Downloads:
37,327
Downloads of v 3.4.1:
2,686
Last Update:
18 Jul 2023
Package Maintainer(s):
Software Author(s):
- IronPython Contributors
- Microsoft
Tags:
ironpython python dynamic dlr
IronPython 3.4.1 | Updated: 18 Jul 2023
IronPython 3.4.1
Legal Disclaimer: Neither this package nor Chocolatey Software, Inc. are affiliated with or endorsed by IronPython Contributors, Microsoft. The inclusion of IronPython Contributors, Microsoft trademark(s), if any, upon this webpage is solely to identify IronPython Contributors, Microsoft goods or services and not for commercial purposes.
All Checks are Passing
3 Passing Tests
Deployment Method: Individual Install, Upgrade, & Uninstall
To install IronPython, run the following command from the command line or from PowerShell:
choco install ironpython
To upgrade IronPython, run the following command from the command line or from PowerShell:
choco upgrade ironpython
To uninstall IronPython, run the following command from the command line or from PowerShell:
choco uninstall ironpython
Deployment Method:
This applies to both open source and commercial editions of Chocolatey.
1. Enter Your Internal Repository Url
(this should look similar to https://community.chocolatey.org/api/v2/)
2. Setup Your Environment
1. Ensure you are set for organizational deployment
Please see the organizational deployment guide.
2. Get the package into your environment
Option 1: Cached Package (Unreliable, Requires Internet - Same As Community)
Open Source or Commercial:
- Proxy Repository - Create a proxy nuget repository on Nexus, Artifactory Pro, or a proxy Chocolatey repository on ProGet. Point your upstream to https://community.chocolatey.org/api/v2/. Packages cache on first access automatically. Make sure your choco clients are using your proxy repository as a source and NOT the default community repository. See source command for more information.
- You can also just download the package and push it to a repository.
Option 2: Internalized Package (Reliable, Scalable)
- Open Source: download the package, then follow the manual internalization instructions.
- Package Internalizer (C4B):
Run (additional options):
choco download ironpython --internalize --source=https://community.chocolatey.org/api/v2/
For package and dependencies run:
choco push --source="'INTERNAL REPO URL'"
Automate package internalization.
3. Copy Your Script
choco upgrade ironpython -y --source="'INTERNAL REPO URL'" [other options]
See options you can pass to upgrade.
See best practices for scripting.
Add this to a PowerShell script or use a Batch script with tools and in places where you are calling directly to Chocolatey. If you are integrating, keep in mind enhanced exit codes.
If you do use a PowerShell script, use the following to ensure bad exit codes are shown as failures:
choco upgrade ironpython -y --source="'INTERNAL REPO URL'"
$exitCode = $LASTEXITCODE
Write-Verbose "Exit code was $exitCode"
$validExitCodes = @(0, 1605, 1614, 1641, 3010)
if ($validExitCodes -contains $exitCode) {
    Exit 0
}
Exit $exitCode
- name: Install ironpython
  win_chocolatey:
    name: ironpython
    version: '3.4.1'
    source: INTERNAL REPO URL
    state: present
See docs at https://docs.ansible.com/ansible/latest/modules/win_chocolatey_module.html.
chocolatey_package 'ironpython' do
  action :install
  source 'INTERNAL REPO URL'
  version '3.4.1'
end
See docs at https://docs.chef.io/resource_chocolatey_package.html.
cChocoPackageInstaller ironpython
{
    Name    = "ironpython"
    Version = "3.4.1"
    Source  = "INTERNAL REPO URL"
}
Requires cChoco DSC Resource. See docs at https://github.com/chocolatey/cChoco.
package { 'ironpython':
  ensure   => '3.4.1',
  provider => 'chocolatey',
  source   => 'INTERNAL REPO URL',
}
Requires Puppet Chocolatey Provider module. See docs at https://forge.puppet.com/puppetlabs/chocolatey.
4. If applicable - Chocolatey configuration/installation
See infrastructure management matrix for Chocolatey configuration elements and examples.
This package was approved as a trusted package on 25 Oct 2024.
IronPython is an open-source implementation of the Python programming language that is tightly integrated with the .NET Framework. IronPython can use the .NET Framework and Python libraries, and other .NET languages can use Python code just as easily.
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Abstract Base Classes (ABCs) according to PEP 3119."""
from _weakrefset import WeakSet
def abstractmethod(funcobj):
"""A decorator indicating abstract methods.
Requires that the metaclass is ABCMeta or derived from it. A
class that has a metaclass derived from ABCMeta cannot be
instantiated unless all of its abstract methods are overridden.
The abstract methods can be called using any of the normal
'super' call mechanisms.
Usage:
class C(metaclass=ABCMeta):
@abstractmethod
def my_abstract_method(self, ...):
...
"""
funcobj.__isabstractmethod__ = True
return funcobj
class abstractclassmethod(classmethod):
"""
A decorator indicating abstract classmethods.
Similar to abstractmethod.
Usage:
class C(metaclass=ABCMeta):
@abstractclassmethod
def my_abstract_classmethod(cls, ...):
...
'abstractclassmethod' is deprecated. Use 'classmethod' with
'abstractmethod' instead.
"""
__isabstractmethod__ = True
def __init__(self, callable):
callable.__isabstractmethod__ = True
super().__init__(callable)
class abstractstaticmethod(staticmethod):
"""
A decorator indicating abstract staticmethods.
Similar to abstractmethod.
Usage:
class C(metaclass=ABCMeta):
@abstractstaticmethod
def my_abstract_staticmethod(...):
...
'abstractstaticmethod' is deprecated. Use 'staticmethod' with
'abstractmethod' instead.
"""
__isabstractmethod__ = True
def __init__(self, callable):
callable.__isabstractmethod__ = True
super().__init__(callable)
class abstractproperty(property):
"""
A decorator indicating abstract properties.
Requires that the metaclass is ABCMeta or derived from it. A
class that has a metaclass derived from ABCMeta cannot be
instantiated unless all of its abstract properties are overridden.
The abstract properties can be called using any of the normal
'super' call mechanisms.
Usage:
class C(metaclass=ABCMeta):
@abstractproperty
def my_abstract_property(self):
...
This defines a read-only property; you can also define a read-write
abstract property using the 'long' form of property declaration:
class C(metaclass=ABCMeta):
def getx(self): ...
def setx(self, value): ...
x = abstractproperty(getx, setx)
'abstractproperty' is deprecated. Use 'property' with 'abstractmethod'
instead.
"""
__isabstractmethod__ = True
class ABCMeta(type):
"""Metaclass for defining Abstract Base Classes (ABCs).
Use this metaclass to create an ABC. An ABC can be subclassed
directly, and then acts as a mix-in class. You can also register
unrelated concrete classes (even built-in classes) and unrelated
ABCs as 'virtual subclasses' -- these and their descendants will
be considered subclasses of the registering ABC by the built-in
issubclass() function, but the registering ABC won't show up in
their MRO (Method Resolution Order) nor will method
implementations defined by the registering ABC be callable (not
even via super()).
"""
# A global counter that is incremented each time a class is
# registered as a virtual subclass of anything. It forces the
# negative cache to be cleared before its next use.
# Note: this counter is private. Use `abc.get_cache_token()` for
# external code.
_abc_invalidation_counter = 0
def __new__(mcls, name, bases, namespace):
cls = super().__new__(mcls, name, bases, namespace)
# Compute set of abstract method names
abstracts = {name
for name, value in namespace.items()
if getattr(value, "__isabstractmethod__", False)}
for base in bases:
for name in getattr(base, "__abstractmethods__", set()):
value = getattr(cls, name, None)
if getattr(value, "__isabstractmethod__", False):
abstracts.add(name)
cls.__abstractmethods__ = frozenset(abstracts)
# Set up inheritance registry
cls._abc_registry = WeakSet()
cls._abc_cache = WeakSet()
cls._abc_negative_cache = WeakSet()
cls._abc_negative_cache_version = ABCMeta._abc_invalidation_counter
return cls
def register(cls, subclass):
"""Register a virtual subclass of an ABC.
Returns the subclass, to allow usage as a class decorator.
"""
if not isinstance(subclass, type):
raise TypeError("Can only register classes")
if issubclass(subclass, cls):
return subclass # Already a subclass
# Subtle: test for cycles *after* testing for "already a subclass";
# this means we allow X.register(X) and interpret it as a no-op.
if issubclass(cls, subclass):
# This would create a cycle, which is bad for the algorithm below
raise RuntimeError("Refusing to create an inheritance cycle")
cls._abc_registry.add(subclass)
ABCMeta._abc_invalidation_counter += 1 # Invalidate negative cache
return subclass
def _dump_registry(cls, file=None):
"""Debug helper to print the ABC registry."""
print("Class: %s.%s" % (cls.__module__, cls.__name__), file=file)
print("Inv.counter: %s" % ABCMeta._abc_invalidation_counter, file=file)
for name in sorted(cls.__dict__.keys()):
if name.startswith("_abc_"):
value = getattr(cls, name)
print("%s: %r" % (name, value), file=file)
def __instancecheck__(cls, instance):
"""Override for isinstance(instance, cls)."""
# Inline the cache checking
subclass = instance.__class__
if subclass in cls._abc_cache:
return True
subtype = type(instance)
if subtype is subclass:
if (cls._abc_negative_cache_version ==
ABCMeta._abc_invalidation_counter and
subclass in cls._abc_negative_cache):
return False
# Fall back to the subclass check.
return cls.__subclasscheck__(subclass)
return any(cls.__subclasscheck__(c) for c in {subclass, subtype})
def __subclasscheck__(cls, subclass):
"""Override for issubclass(subclass, cls)."""
# Check cache
if subclass in cls._abc_cache:
return True
# Check negative cache; may have to invalidate
if cls._abc_negative_cache_version < ABCMeta._abc_invalidation_counter:
# Invalidate the negative cache
cls._abc_negative_cache = WeakSet()
cls._abc_negative_cache_version = ABCMeta._abc_invalidation_counter
elif subclass in cls._abc_negative_cache:
return False
# Check the subclass hook
ok = cls.__subclasshook__(subclass)
if ok is not NotImplemented:
assert isinstance(ok, bool)
if ok:
cls._abc_cache.add(subclass)
else:
cls._abc_negative_cache.add(subclass)
return ok
# Check if it's a direct subclass
if cls in getattr(subclass, '__mro__', ()):
cls._abc_cache.add(subclass)
return True
# Check if it's a subclass of a registered class (recursive)
for rcls in cls._abc_registry:
if issubclass(subclass, rcls):
cls._abc_cache.add(subclass)
return True
# Check if it's a subclass of a subclass (recursive)
for scls in cls.__subclasses__():
if issubclass(subclass, scls):
cls._abc_cache.add(subclass)
return True
# No dice; update negative cache
cls._abc_negative_cache.add(subclass)
return False
class ABC(metaclass=ABCMeta):
"""Helper class that provides a standard way to create an ABC using
inheritance.
"""
pass
def get_cache_token():
"""Returns the current ABC cache token.
The token is an opaque object (supporting equality testing) identifying the
current version of the ABC cache for virtual subclasses. The token changes
with every call to ``register()`` on any ABC.
"""
return ABCMeta._abc_invalidation_counter
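A minimal usage sketch of the ABC machinery defined above (illustrative, not part of abc.py): a class that never inherits from the ABC still passes issubclass() and isinstance() once registered.

class Serializer(ABC):
    @abstractmethod
    def dumps(self, obj): ...

class JsonSerializer:                      # unrelated concrete class
    def dumps(self, obj):
        import json
        return json.dumps(obj)

Serializer.register(JsonSerializer)        # virtual subclass; bumps the cache token
assert issubclass(JsonSerializer, Serializer)
assert isinstance(JsonSerializer(), Serializer)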
"""Stuff to parse AIFF-C and AIFF files.
Unless explicitly stated otherwise, the description below is true
both for AIFF-C files and AIFF files.
An AIFF-C file has the following structure.
+-----------------+
| FORM |
+-----------------+
| <size> |
+----+------------+
| | AIFC |
| +------------+
| | <chunks> |
| | . |
| | . |
| | . |
+----+------------+
An AIFF file has the string "AIFF" instead of "AIFC".
A chunk consists of an identifier (4 bytes) followed by a size (4 bytes,
big endian order), followed by the data. The size field does not include
the size of the 8 byte header.
The following chunk types are recognized.
FVER
<version number of AIFF-C defining document> (AIFF-C only).
MARK
<# of markers> (2 bytes)
list of markers:
<marker ID> (2 bytes, must be > 0)
<position> (4 bytes)
<marker name> ("pstring")
COMM
<# of channels> (2 bytes)
<# of sound frames> (4 bytes)
<size of the samples> (2 bytes)
<sampling frequency> (10 bytes, IEEE 80-bit extended
floating point)
in AIFF-C files only:
<compression type> (4 bytes)
<human-readable version of compression type> ("pstring")
SSND
<offset> (4 bytes, not used by this program)
<blocksize> (4 bytes, not used by this program)
<sound data>
A pstring consists of 1 byte length, a string of characters, and 0 or 1
byte pad to make the total length even.
Usage.
Reading AIFF files:
f = aifc.open(file, 'r')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods read(), seek(), and close().
In some types of audio files, if the setpos() method is not used,
the seek() method is not necessary.
This returns an instance of a class with the following public methods:
getnchannels() -- returns number of audio channels (1 for
mono, 2 for stereo)
getsampwidth() -- returns sample width in bytes
getframerate() -- returns sampling frequency
getnframes() -- returns number of audio frames
getcomptype() -- returns compression type ('NONE' for AIFF files)
getcompname() -- returns human-readable version of
compression type ('not compressed' for AIFF files)
getparams() -- returns a namedtuple consisting of all of the
above in the above order
getmarkers() -- get the list of marks in the audio file or None
if there are no marks
getmark(id) -- get mark with the specified id (raises an error
if the mark does not exist)
readframes(n) -- returns at most n frames of audio
rewind() -- rewind to the beginning of the audio stream
setpos(pos) -- seek to the specified position
tell() -- return the current position
close() -- close the instance (make it unusable)
The position returned by tell(), the position given to setpos() and
the position of marks are all compatible and have nothing to do with
the actual position in the file.
The close() method is called automatically when the class instance
is destroyed.
Writing AIFF files:
f = aifc.open(file, 'w')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods write(), tell(), seek(), and
close().
This returns an instance of a class with the following public methods:
aiff() -- create an AIFF file (AIFF-C default)
aifc() -- create an AIFF-C file
setnchannels(n) -- set the number of channels
setsampwidth(n) -- set the sample width
setframerate(n) -- set the frame rate
setnframes(n) -- set the number of frames
setcomptype(type, name)
-- set the compression type and the
human-readable compression type
setparams(tuple)
-- set all parameters at once
setmark(id, pos, name)
-- add specified mark to the list of marks
tell() -- return current position in output file (useful
in combination with setmark())
writeframesraw(data)
-- write audio frames without patching up the
file header
writeframes(data)
-- write audio frames and patch up the file header
close() -- patch up the file header and close the
output file
You should set the parameters before the first writeframesraw or
writeframes. The total number of frames does not need to be set,
but when it is set to the correct value, the header does not have to
be patched up.
It is best to first set all parameters, possibly including the
compression type, and then write audio frames using writeframesraw.
When all frames have been written, either call writeframes(b'') or
close() to patch up the sizes in the header.
Marks can be added anytime. If there are any marks, you must call
close() after all frames have been written.
The close() method is called automatically when the class instance
is destroyed.
When a file is opened with the extension '.aiff', an AIFF file is
written, otherwise an AIFF-C file is written. This default can be
changed by calling aiff() or aifc() before the first writeframes or
writeframesraw.
"""
import struct
import builtins
import warnings
__all__ = ["Error", "open", "openfp"]
class Error(Exception):
pass
_AIFC_version = 0xA2805140 # Version 1 of AIFF-C
def _read_long(file):
try:
return struct.unpack('>l', file.read(4))[0]
except struct.error:
raise EOFError
def _read_ulong(file):
try:
return struct.unpack('>L', file.read(4))[0]
except struct.error:
raise EOFError
def _read_short(file):
try:
return struct.unpack('>h', file.read(2))[0]
except struct.error:
raise EOFError
def _read_ushort(file):
try:
return struct.unpack('>H', file.read(2))[0]
except struct.error:
raise EOFError
def _read_string(file):
length = ord(file.read(1))
if length == 0:
data = b''
else:
data = file.read(length)
if length & 1 == 0:
dummy = file.read(1)
return data
_HUGE_VAL = 1.79769313486231e+308 # See <limits.h>
def _read_float(f): # 10 bytes
expon = _read_short(f) # 2 bytes
sign = 1
if expon < 0:
sign = -1
expon = expon + 0x8000
himant = _read_ulong(f) # 4 bytes
lomant = _read_ulong(f) # 4 bytes
if expon == himant == lomant == 0:
f = 0.0
elif expon == 0x7FFF:
f = _HUGE_VAL
else:
expon = expon - 16383
f = (himant * 0x100000000 + lomant) * pow(2.0, expon - 63)
return sign * f
def _write_short(f, x):
f.write(struct.pack('>h', x))
def _write_ushort(f, x):
f.write(struct.pack('>H', x))
def _write_long(f, x):
f.write(struct.pack('>l', x))
def _write_ulong(f, x):
f.write(struct.pack('>L', x))
def _write_string(f, s):
if len(s) > 255:
raise ValueError("string exceeds maximum pstring length")
f.write(struct.pack('B', len(s)))
f.write(s)
if len(s) & 1 == 0:
f.write(b'\x00')
def _write_float(f, x):
import math
if x < 0:
sign = 0x8000
x = x * -1
else:
sign = 0
if x == 0:
expon = 0
himant = 0
lomant = 0
else:
fmant, expon = math.frexp(x)
if expon > 16384 or fmant >= 1 or fmant != fmant: # Infinity or NaN
expon = sign|0x7FFF
himant = 0
lomant = 0
else: # Finite
expon = expon + 16382
if expon < 0: # denormalized
fmant = math.ldexp(fmant, expon)
expon = 0
expon = expon | sign
fmant = math.ldexp(fmant, 32)
fsmant = math.floor(fmant)
himant = int(fsmant)
fmant = math.ldexp(fmant - fsmant, 32)
fsmant = math.floor(fmant)
lomant = int(fsmant)
_write_ushort(f, expon)
_write_ulong(f, himant)
_write_ulong(f, lomant)
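# Illustrative round-trip check for the two helpers above (a sketch, not part
# of the original module): the COMM chunk stores the sampling frequency in
# this 80-bit IEEE extended format.
#
#     import io
#     buf = io.BytesIO()
#     _write_float(buf, 44100.0)
#     buf.seek(0)
#     assert _read_float(buf) == 44100.0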
from chunk import Chunk
from collections import namedtuple
_aifc_params = namedtuple('_aifc_params',
'nchannels sampwidth framerate nframes comptype compname')
class Aifc_read:
# Variables used in this class:
#
# These variables are available to the user though appropriate
# methods of this class:
# _file -- the open file with methods read(), close(), and seek()
# set through the __init__() method
# _nchannels -- the number of audio channels
# available through the getnchannels() method
# _nframes -- the number of audio frames
# available through the getnframes() method
# _sampwidth -- the number of bytes per audio sample
# available through the getsampwidth() method
# _framerate -- the sampling frequency
# available through the getframerate() method
# _comptype -- the AIFF-C compression type ('NONE' if AIFF)
# available through the getcomptype() method
# _compname -- the human-readable AIFF-C compression type
# available through the getcomptype() method
# _markers -- the marks in the audio file
# available through the getmarkers() and getmark()
# methods
# _soundpos -- the position in the audio stream
# available through the tell() method, set through the
# setpos() method
#
# These variables are used internally only:
# _version -- the AIFF-C version number
# _decomp -- the decompressor from builtin module cl
# _comm_chunk_read -- 1 iff the COMM chunk has been read
# _aifc -- 1 iff reading an AIFF-C file
# _ssnd_seek_needed -- 1 iff positioned correctly in audio
# file for readframes()
# _ssnd_chunk -- instantiation of a chunk class for the SSND chunk
# _framesize -- size of one frame in the file
def initfp(self, file):
self._version = 0
self._convert = None
self._markers = []
self._soundpos = 0
self._file = file
chunk = Chunk(file)
if chunk.getname() != b'FORM':
raise Error('file does not start with FORM id')
formdata = chunk.read(4)
if formdata == b'AIFF':
self._aifc = 0
elif formdata == b'AIFC':
self._aifc = 1
else:
raise Error('not an AIFF or AIFF-C file')
self._comm_chunk_read = 0
self._ssnd_chunk = None
while 1:
self._ssnd_seek_needed = 1
try:
chunk = Chunk(self._file)
except EOFError:
break
chunkname = chunk.getname()
if chunkname == b'COMM':
self._read_comm_chunk(chunk)
self._comm_chunk_read = 1
elif chunkname == b'SSND':
self._ssnd_chunk = chunk
dummy = chunk.read(8)
self._ssnd_seek_needed = 0
elif chunkname == b'FVER':
self._version = _read_ulong(chunk)
elif chunkname == b'MARK':
self._readmark(chunk)
chunk.skip()
if not self._comm_chunk_read or not self._ssnd_chunk:
raise Error('COMM chunk and/or SSND chunk missing')
def __init__(self, f):
if isinstance(f, str):
f = builtins.open(f, 'rb')
# else, assume it is an open file object already
self.initfp(f)
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
#
# User visible methods.
#
def getfp(self):
return self._file
def rewind(self):
self._ssnd_seek_needed = 1
self._soundpos = 0
def close(self):
file = self._file
if file is not None:
self._file = None
file.close()
def tell(self):
return self._soundpos
def getnchannels(self):
return self._nchannels
def getnframes(self):
return self._nframes
def getsampwidth(self):
return self._sampwidth
def getframerate(self):
return self._framerate
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
## def getversion(self):
## return self._version
def getparams(self):
return _aifc_params(self.getnchannels(), self.getsampwidth(),
self.getframerate(), self.getnframes(),
self.getcomptype(), self.getcompname())
def getmarkers(self):
if len(self._markers) == 0:
return None
return self._markers
def getmark(self, id):
for marker in self._markers:
if id == marker[0]:
return marker
raise Error('marker {0!r} does not exist'.format(id))
def setpos(self, pos):
if pos < 0 or pos > self._nframes:
raise Error('position not in range')
self._soundpos = pos
self._ssnd_seek_needed = 1
def readframes(self, nframes):
if self._ssnd_seek_needed:
self._ssnd_chunk.seek(0)
dummy = self._ssnd_chunk.read(8)
pos = self._soundpos * self._framesize
if pos:
self._ssnd_chunk.seek(pos + 8)
self._ssnd_seek_needed = 0
if nframes == 0:
return b''
data = self._ssnd_chunk.read(nframes * self._framesize)
if self._convert and data:
data = self._convert(data)
self._soundpos = self._soundpos + len(data) // (self._nchannels
* self._sampwidth)
return data
#
# Internal methods.
#
def _alaw2lin(self, data):
import audioop
return audioop.alaw2lin(data, 2)
def _ulaw2lin(self, data):
import audioop
return audioop.ulaw2lin(data, 2)
def _adpcm2lin(self, data):
import audioop
if not hasattr(self, '_adpcmstate'):
# first time
self._adpcmstate = None
data, self._adpcmstate = audioop.adpcm2lin(data, 2, self._adpcmstate)
return data
def _read_comm_chunk(self, chunk):
self._nchannels = _read_short(chunk)
self._nframes = _read_long(chunk)
self._sampwidth = (_read_short(chunk) + 7) // 8
self._framerate = int(_read_float(chunk))
self._framesize = self._nchannels * self._sampwidth
if self._aifc:
#DEBUG: SGI's soundeditor produces a bad size :-(
kludge = 0
if chunk.chunksize == 18:
kludge = 1
warnings.warn('Warning: bad COMM chunk size')
chunk.chunksize = 23
#DEBUG end
self._comptype = chunk.read(4)
#DEBUG start
if kludge:
length = ord(chunk.file.read(1))
if length & 1 == 0:
length = length + 1
chunk.chunksize = chunk.chunksize + length
chunk.file.seek(-1, 1)
#DEBUG end
self._compname = _read_string(chunk)
if self._comptype != b'NONE':
if self._comptype == b'G722':
self._convert = self._adpcm2lin
elif self._comptype in (b'ulaw', b'ULAW'):
self._convert = self._ulaw2lin
elif self._comptype in (b'alaw', b'ALAW'):
self._convert = self._alaw2lin
else:
raise Error('unsupported compression type')
self._sampwidth = 2
else:
self._comptype = b'NONE'
self._compname = b'not compressed'
def _readmark(self, chunk):
nmarkers = _read_short(chunk)
# Some files appear to contain invalid counts.
# Cope with this by testing for EOF.
try:
for i in range(nmarkers):
id = _read_short(chunk)
pos = _read_long(chunk)
name = _read_string(chunk)
if pos or name:
# some files appear to have
# dummy markers consisting of
# a position 0 and name ''
self._markers.append((id, pos, name))
except EOFError:
w = ('Warning: MARK chunk contains only %s marker%s instead of %s' %
(len(self._markers), '' if len(self._markers) == 1 else 's',
nmarkers))
warnings.warn(w)
class Aifc_write:
# Variables used in this class:
#
# These variables are user settable through appropriate methods
# of this class:
# _file -- the open file with methods write(), close(), tell(), seek()
# set through the __init__() method
# _comptype -- the AIFF-C compression type ('NONE' in AIFF)
# set through the setcomptype() or setparams() method
# _compname -- the human-readable AIFF-C compression type
# set through the setcomptype() or setparams() method
# _nchannels -- the number of audio channels
# set through the setnchannels() or setparams() method
# _sampwidth -- the number of bytes per audio sample
# set through the setsampwidth() or setparams() method
# _framerate -- the sampling frequency
# set through the setframerate() or setparams() method
# _nframes -- the number of audio frames written to the header
# set through the setnframes() or setparams() method
# _aifc -- whether we're writing an AIFF-C file or an AIFF file
# set through the aifc() method, reset through the
# aiff() method
#
# These variables are used internally only:
# _version -- the AIFF-C version number
# _comp -- the compressor from builtin module cl
# _nframeswritten -- the number of audio frames actually written
# _datalength -- the size of the audio samples written to the header
# _datawritten -- the size of the audio samples actually written
def __init__(self, f):
if isinstance(f, str):
filename = f
f = builtins.open(f, 'wb')
else:
# else, assume it is an open file object already
filename = '???'
self.initfp(f)
if filename[-5:] == '.aiff':
self._aifc = 0
else:
self._aifc = 1
def initfp(self, file):
self._file = file
self._version = _AIFC_version
self._comptype = b'NONE'
self._compname = b'not compressed'
self._convert = None
self._nchannels = 0
self._sampwidth = 0
self._framerate = 0
self._nframes = 0
self._nframeswritten = 0
self._datawritten = 0
self._datalength = 0
self._markers = []
self._marklength = 0
self._aifc = 1 # AIFF-C is default
def __del__(self):
self.close()
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
#
# User visible methods.
#
def aiff(self):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
self._aifc = 0
def aifc(self):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
self._aifc = 1
def setnchannels(self, nchannels):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if nchannels < 1:
raise Error('bad # of channels')
self._nchannels = nchannels
def getnchannels(self):
if not self._nchannels:
raise Error('number of channels not set')
return self._nchannels
def setsampwidth(self, sampwidth):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if sampwidth < 1 or sampwidth > 4:
raise Error('bad sample width')
self._sampwidth = sampwidth
def getsampwidth(self):
if not self._sampwidth:
raise Error('sample width not set')
return self._sampwidth
def setframerate(self, framerate):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if framerate <= 0:
raise Error('bad frame rate')
self._framerate = framerate
def getframerate(self):
if not self._framerate:
raise Error('frame rate not set')
return self._framerate
def setnframes(self, nframes):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
self._nframes = nframes
def getnframes(self):
return self._nframeswritten
def setcomptype(self, comptype, compname):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if comptype not in (b'NONE', b'ulaw', b'ULAW',
b'alaw', b'ALAW', b'G722'):
raise Error('unsupported compression type')
self._comptype = comptype
self._compname = compname
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
## def setversion(self, version):
## if self._nframeswritten:
## raise Error, 'cannot change parameters after starting to write'
## self._version = version
def setparams(self, params):
nchannels, sampwidth, framerate, nframes, comptype, compname = params
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if comptype not in (b'NONE', b'ulaw', b'ULAW',
b'alaw', b'ALAW', b'G722'):
raise Error('unsupported compression type')
self.setnchannels(nchannels)
self.setsampwidth(sampwidth)
self.setframerate(framerate)
self.setnframes(nframes)
self.setcomptype(comptype, compname)
def getparams(self):
if not self._nchannels or not self._sampwidth or not self._framerate:
raise Error('not all parameters set')
return _aifc_params(self._nchannels, self._sampwidth, self._framerate,
self._nframes, self._comptype, self._compname)
def setmark(self, id, pos, name):
if id <= 0:
raise Error('marker ID must be > 0')
if pos < 0:
raise Error('marker position must be >= 0')
if not isinstance(name, bytes):
raise Error('marker name must be bytes')
for i in range(len(self._markers)):
if id == self._markers[i][0]:
self._markers[i] = id, pos, name
return
self._markers.append((id, pos, name))
def getmark(self, id):
for marker in self._markers:
if id == marker[0]:
return marker
raise Error('marker {0!r} does not exist'.format(id))
def getmarkers(self):
if len(self._markers) == 0:
return None
return self._markers
def tell(self):
return self._nframeswritten
def writeframesraw(self, data):
if not isinstance(data, (bytes, bytearray)):
data = memoryview(data).cast('B')
self._ensure_header_written(len(data))
nframes = len(data) // (self._sampwidth * self._nchannels)
if self._convert:
data = self._convert(data)
self._file.write(data)
self._nframeswritten = self._nframeswritten + nframes
self._datawritten = self._datawritten + len(data)
def writeframes(self, data):
self.writeframesraw(data)
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten:
self._patchheader()
def close(self):
if self._file is None:
return
try:
self._ensure_header_written(0)
if self._datawritten & 1:
# quick pad to even size
self._file.write(b'\x00')
self._datawritten = self._datawritten + 1
self._writemarkers()
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten or \
self._marklength:
self._patchheader()
finally:
# Prevent ref cycles
self._convert = None
f = self._file
self._file = None
f.close()
#
# Internal methods.
#
def _lin2alaw(self, data):
import audioop
return audioop.lin2alaw(data, 2)
def _lin2ulaw(self, data):
import audioop
return audioop.lin2ulaw(data, 2)
def _lin2adpcm(self, data):
import audioop
if not hasattr(self, '_adpcmstate'):
self._adpcmstate = None
data, self._adpcmstate = audioop.lin2adpcm(data, 2, self._adpcmstate)
return data
def _ensure_header_written(self, datasize):
if not self._nframeswritten:
if self._comptype in (b'ULAW', b'ulaw', b'ALAW', b'alaw', b'G722'):
if not self._sampwidth:
self._sampwidth = 2
if self._sampwidth != 2:
raise Error('sample width must be 2 when compressing '
'with ulaw/ULAW, alaw/ALAW or G7.22 (ADPCM)')
if not self._nchannels:
raise Error('# channels not specified')
if not self._sampwidth:
raise Error('sample width not specified')
if not self._framerate:
raise Error('sampling rate not specified')
self._write_header(datasize)
def _init_compression(self):
if self._comptype == b'G722':
self._convert = self._lin2adpcm
elif self._comptype in (b'ulaw', b'ULAW'):
self._convert = self._lin2ulaw
elif self._comptype in (b'alaw', b'ALAW'):
self._convert = self._lin2alaw
def _write_header(self, initlength):
if self._aifc and self._comptype != b'NONE':
self._init_compression()
self._file.write(b'FORM')
if not self._nframes:
self._nframes = initlength // (self._nchannels * self._sampwidth)
self._datalength = self._nframes * self._nchannels * self._sampwidth
if self._datalength & 1:
self._datalength = self._datalength + 1
if self._aifc:
if self._comptype in (b'ulaw', b'ULAW', b'alaw', b'ALAW'):
self._datalength = self._datalength // 2
if self._datalength & 1:
self._datalength = self._datalength + 1
elif self._comptype == b'G722':
self._datalength = (self._datalength + 3) // 4
if self._datalength & 1:
self._datalength = self._datalength + 1
try:
self._form_length_pos = self._file.tell()
except (AttributeError, OSError):
self._form_length_pos = None
commlength = self._write_form_length(self._datalength)
if self._aifc:
self._file.write(b'AIFC')
self._file.write(b'FVER')
_write_ulong(self._file, 4)
_write_ulong(self._file, self._version)
else:
self._file.write(b'AIFF')
self._file.write(b'COMM')
_write_ulong(self._file, commlength)
_write_short(self._file, self._nchannels)
if self._form_length_pos is not None:
self._nframes_pos = self._file.tell()
_write_ulong(self._file, self._nframes)
if self._comptype in (b'ULAW', b'ulaw', b'ALAW', b'alaw', b'G722'):
_write_short(self._file, 8)
else:
_write_short(self._file, self._sampwidth * 8)
_write_float(self._file, self._framerate)
if self._aifc:
self._file.write(self._comptype)
_write_string(self._file, self._compname)
self._file.write(b'SSND')
if self._form_length_pos is not None:
self._ssnd_length_pos = self._file.tell()
_write_ulong(self._file, self._datalength + 8)
_write_ulong(self._file, 0)
_write_ulong(self._file, 0)
def _write_form_length(self, datalength):
if self._aifc:
commlength = 18 + 5 + len(self._compname)
if commlength & 1:
commlength = commlength + 1
verslength = 12
else:
commlength = 18
verslength = 0
_write_ulong(self._file, 4 + verslength + self._marklength + \
8 + commlength + 16 + datalength)
return commlength
def _patchheader(self):
curpos = self._file.tell()
if self._datawritten & 1:
datalength = self._datawritten + 1
self._file.write(b'\x00')
else:
datalength = self._datawritten
if datalength == self._datalength and \
self._nframes == self._nframeswritten and \
self._marklength == 0:
self._file.seek(curpos, 0)
return
self._file.seek(self._form_length_pos, 0)
dummy = self._write_form_length(datalength)
self._file.seek(self._nframes_pos, 0)
_write_ulong(self._file, self._nframeswritten)
self._file.seek(self._ssnd_length_pos, 0)
_write_ulong(self._file, datalength + 8)
self._file.seek(curpos, 0)
self._nframes = self._nframeswritten
self._datalength = datalength
def _writemarkers(self):
if len(self._markers) == 0:
return
self._file.write(b'MARK')
length = 2
for marker in self._markers:
id, pos, name = marker
length = length + len(name) + 1 + 6
if len(name) & 1 == 0:
length = length + 1
_write_ulong(self._file, length)
self._marklength = length + 8
_write_short(self._file, len(self._markers))
for marker in self._markers:
id, pos, name = marker
_write_short(self._file, id)
_write_ulong(self._file, pos)
_write_string(self._file, name)
def open(f, mode=None):
if mode is None:
if hasattr(f, 'mode'):
mode = f.mode
else:
mode = 'rb'
if mode in ('r', 'rb'):
return Aifc_read(f)
elif mode in ('w', 'wb'):
return Aifc_write(f)
else:
raise Error("mode must be 'r', 'rb', 'w', or 'wb'")
openfp = open # B/W compatibility
if __name__ == '__main__':
import sys
if not sys.argv[1:]:
sys.argv.append('/usr/demos/data/audio/bach.aiff')
fn = sys.argv[1]
with open(fn, 'r') as f:
print("Reading", fn)
print("nchannels =", f.getnchannels())
print("nframes =", f.getnframes())
print("sampwidth =", f.getsampwidth())
print("framerate =", f.getframerate())
print("comptype =", f.getcomptype())
print("compname =", f.getcompname())
if sys.argv[2:]:
gn = sys.argv[2]
print("Writing", gn)
with open(gn, 'w') as g:
g.setparams(f.getparams())
while 1:
data = f.readframes(1024)
if not data:
break
g.writeframes(data)
print("Done.")
import webbrowser
import hashlib
webbrowser.open("http://xkcd.com/353/")
def geohash(latitude, longitude, datedow):
'''Compute geohash() using the Munroe algorithm.
>>> geohash(37.421542, -122.085589, b'2005-05-26-10458.68')
37.857713 -122.544543
'''
# http://xkcd.com/426/
h = hashlib.md5(datedow).hexdigest()
p, q = [('%f' % float.fromhex('0.' + x)) for x in (h[:16], h[16:32])]
print('%d%s %d%s' % (latitude, p[1:], longitude, q[1:]))
# Author: Steven J. Bethard <[email protected]>.
"""Command-line parsing library
This module is an optparse-inspired command-line parsing library that:
- handles both optional and positional arguments
- produces highly informative usage messages
- supports parsers that dispatch to sub-parsers
The following is a simple usage example that sums integers from the
command-line and writes the result to a file::
parser = argparse.ArgumentParser(
description='sum the integers at the command line')
parser.add_argument(
'integers', metavar='int', nargs='+', type=int,
help='an integer to be summed')
parser.add_argument(
'--log', default=sys.stdout, type=argparse.FileType('w'),
help='the file where the sum should be written')
args = parser.parse_args()
args.log.write('%s' % sum(args.integers))
args.log.close()
The module contains the following public classes:
- ArgumentParser -- The main entry point for command-line parsing. As the
example above shows, the add_argument() method is used to populate
the parser with actions for optional and positional arguments. Then
the parse_args() method is invoked to convert the args at the
command-line into an object with attributes.
- ArgumentError -- The exception raised by ArgumentParser objects when
there are errors with the parser's actions. Errors raised while
parsing the command-line are caught by ArgumentParser and emitted
as command-line messages.
- FileType -- A factory for defining types of files to be created. As the
example above shows, instances of FileType are typically passed as
the type= argument of add_argument() calls.
- Action -- The base class for parser actions. Typically actions are
selected by passing strings like 'store_true' or 'append_const' to
the action= argument of add_argument(). However, for greater
customization of ArgumentParser actions, subclasses of Action may
be defined and passed as the action= argument.
- HelpFormatter, RawDescriptionHelpFormatter, RawTextHelpFormatter,
ArgumentDefaultsHelpFormatter -- Formatter classes which
may be passed as the formatter_class= argument to the
ArgumentParser constructor. HelpFormatter is the default,
RawDescriptionHelpFormatter and RawTextHelpFormatter tell the parser
not to change the formatting for help text, and
ArgumentDefaultsHelpFormatter adds information about argument defaults
to the help.
All other classes in this module are considered implementation details.
(Also note that HelpFormatter and RawDescriptionHelpFormatter are only
considered public as object names -- the API of the formatter objects is
still considered an implementation detail.)
"""
__version__ = '1.1'
__all__ = [
'ArgumentParser',
'ArgumentError',
'ArgumentTypeError',
'FileType',
'HelpFormatter',
'ArgumentDefaultsHelpFormatter',
'RawDescriptionHelpFormatter',
'RawTextHelpFormatter',
'MetavarTypeHelpFormatter',
'Namespace',
'Action',
'ONE_OR_MORE',
'OPTIONAL',
'PARSER',
'REMAINDER',
'SUPPRESS',
'ZERO_OR_MORE',
]
import collections as _collections
import copy as _copy
import os as _os
import re as _re
import sys as _sys
import textwrap as _textwrap
from gettext import gettext as _, ngettext
SUPPRESS = '==SUPPRESS=='
OPTIONAL = '?'
ZERO_OR_MORE = '*'
ONE_OR_MORE = '+'
PARSER = 'A...'
REMAINDER = '...'
_UNRECOGNIZED_ARGS_ATTR = '_unrecognized_args'
# =============================
# Utility functions and classes
# =============================
class _AttributeHolder(object):
"""Abstract base class that provides __repr__.
The __repr__ method returns a string in the format::
ClassName(attr=name, attr=name, ...)
The attributes are determined either by a class-level attribute,
'_kwarg_names', or by inspecting the instance __dict__.
"""
def __repr__(self):
type_name = type(self).__name__
arg_strings = []
for arg in self._get_args():
arg_strings.append(repr(arg))
for name, value in self._get_kwargs():
arg_strings.append('%s=%r' % (name, value))
return '%s(%s)' % (type_name, ', '.join(arg_strings))
def _get_kwargs(self):
return sorted(self.__dict__.items())
def _get_args(self):
return []
def _ensure_value(namespace, name, value):
if getattr(namespace, name, None) is None:
setattr(namespace, name, value)
return getattr(namespace, name)
# ===============
# Formatting Help
# ===============
class HelpFormatter(object):
"""Formatter for generating usage messages and argument help strings.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def __init__(self,
prog,
indent_increment=2,
max_help_position=24,
width=None):
# default setting for width
if width is None:
try:
width = int(_os.environ['COLUMNS'])
except (KeyError, ValueError):
width = 80
width -= 2
self._prog = prog
self._indent_increment = indent_increment
self._max_help_position = max_help_position
self._max_help_position = min(max_help_position,
max(width - 20, indent_increment * 2))
self._width = width
self._current_indent = 0
self._level = 0
self._action_max_length = 0
self._root_section = self._Section(self, None)
self._current_section = self._root_section
self._whitespace_matcher = _re.compile(r'\s+')
self._long_break_matcher = _re.compile(r'\n\n\n+')
# ===============================
# Section and indentation methods
# ===============================
def _indent(self):
self._current_indent += self._indent_increment
self._level += 1
def _dedent(self):
self._current_indent -= self._indent_increment
assert self._current_indent >= 0, 'Indent decreased below 0.'
self._level -= 1
class _Section(object):
def __init__(self, formatter, parent, heading=None):
self.formatter = formatter
self.parent = parent
self.heading = heading
self.items = []
def format_help(self):
# format the indented section
if self.parent is not None:
self.formatter._indent()
join = self.formatter._join_parts
item_help = join([func(*args) for func, args in self.items])
if self.parent is not None:
self.formatter._dedent()
# return nothing if the section was empty
if not item_help:
return ''
# add the heading if the section was non-empty
if self.heading is not SUPPRESS and self.heading is not None:
current_indent = self.formatter._current_indent
heading = '%*s%s:\n' % (current_indent, '', self.heading)
else:
heading = ''
# join the section-initial newline, the heading and the help
return join(['\n', heading, item_help, '\n'])
def _add_item(self, func, args):
self._current_section.items.append((func, args))
# ========================
# Message building methods
# ========================
def start_section(self, heading):
self._indent()
section = self._Section(self, self._current_section, heading)
self._add_item(section.format_help, [])
self._current_section = section
def end_section(self):
self._current_section = self._current_section.parent
self._dedent()
def add_text(self, text):
if text is not SUPPRESS and text is not None:
self._add_item(self._format_text, [text])
def add_usage(self, usage, actions, groups, prefix=None):
if usage is not SUPPRESS:
args = usage, actions, groups, prefix
self._add_item(self._format_usage, args)
def add_argument(self, action):
if action.help is not SUPPRESS:
# find all invocations
get_invocation = self._format_action_invocation
invocations = [get_invocation(action)]
for subaction in self._iter_indented_subactions(action):
invocations.append(get_invocation(subaction))
# update the maximum item length
invocation_length = max([len(s) for s in invocations])
action_length = invocation_length + self._current_indent
self._action_max_length = max(self._action_max_length,
action_length)
# add the item to the list
self._add_item(self._format_action, [action])
def add_arguments(self, actions):
for action in actions:
self.add_argument(action)
# =======================
# Help-formatting methods
# =======================
def format_help(self):
help = self._root_section.format_help()
if help:
help = self._long_break_matcher.sub('\n\n', help)
help = help.strip('\n') + '\n'
return help
def _join_parts(self, part_strings):
return ''.join([part
for part in part_strings
if part and part is not SUPPRESS])
def _format_usage(self, usage, actions, groups, prefix):
if prefix is None:
prefix = _('usage: ')
# if usage is specified, use that
if usage is not None:
usage = usage % dict(prog=self._prog)
# if no optionals or positionals are available, usage is just prog
elif usage is None and not actions:
usage = '%(prog)s' % dict(prog=self._prog)
# if optionals and positionals are available, calculate usage
elif usage is None:
prog = '%(prog)s' % dict(prog=self._prog)
# split optionals from positionals
optionals = []
positionals = []
for action in actions:
if action.option_strings:
optionals.append(action)
else:
positionals.append(action)
# build full usage string
format = self._format_actions_usage
action_usage = format(optionals + positionals, groups)
usage = ' '.join([s for s in [prog, action_usage] if s])
# wrap the usage parts if it's too long
text_width = self._width - self._current_indent
if len(prefix) + len(usage) > text_width:
# break usage into wrappable parts
part_regexp = r'\(.*?\)+|\[.*?\]+|\S+'
opt_usage = format(optionals, groups)
pos_usage = format(positionals, groups)
opt_parts = _re.findall(part_regexp, opt_usage)
pos_parts = _re.findall(part_regexp, pos_usage)
assert ' '.join(opt_parts) == opt_usage
assert ' '.join(pos_parts) == pos_usage
# helper for wrapping lines
def get_lines(parts, indent, prefix=None):
lines = []
line = []
if prefix is not None:
line_len = len(prefix) - 1
else:
line_len = len(indent) - 1
for part in parts:
if line_len + 1 + len(part) > text_width and line:
lines.append(indent + ' '.join(line))
line = []
line_len = len(indent) - 1
line.append(part)
line_len += len(part) + 1
if line:
lines.append(indent + ' '.join(line))
if prefix is not None:
lines[0] = lines[0][len(indent):]
return lines
# if prog is short, follow it with optionals or positionals
if len(prefix) + len(prog) <= 0.75 * text_width:
indent = ' ' * (len(prefix) + len(prog) + 1)
if opt_parts:
lines = get_lines([prog] + opt_parts, indent, prefix)
lines.extend(get_lines(pos_parts, indent))
elif pos_parts:
lines = get_lines([prog] + pos_parts, indent, prefix)
else:
lines = [prog]
# if prog is long, put it on its own line
else:
indent = ' ' * len(prefix)
parts = opt_parts + pos_parts
lines = get_lines(parts, indent)
if len(lines) > 1:
lines = []
lines.extend(get_lines(opt_parts, indent))
lines.extend(get_lines(pos_parts, indent))
lines = [prog] + lines
# join lines into usage
usage = '\n'.join(lines)
# prefix with 'usage:'
return '%s%s\n\n' % (prefix, usage)
def _format_actions_usage(self, actions, groups):
# find group indices and identify actions in groups
group_actions = set()
inserts = {}
for group in groups:
try:
start = actions.index(group._group_actions[0])
except ValueError:
continue
else:
end = start + len(group._group_actions)
if actions[start:end] == group._group_actions:
for action in group._group_actions:
group_actions.add(action)
if not group.required:
if start in inserts:
inserts[start] += ' ['
else:
inserts[start] = '['
inserts[end] = ']'
else:
if start in inserts:
inserts[start] += ' ('
else:
inserts[start] = '('
inserts[end] = ')'
for i in range(start + 1, end):
inserts[i] = '|'
# collect all actions format strings
parts = []
for i, action in enumerate(actions):
# suppressed arguments are marked with None
# remove | separators for suppressed arguments
if action.help is SUPPRESS:
parts.append(None)
if inserts.get(i) == '|':
inserts.pop(i)
elif inserts.get(i + 1) == '|':
inserts.pop(i + 1)
# produce all arg strings
elif not action.option_strings:
default = self._get_default_metavar_for_positional(action)
part = self._format_args(action, default)
# if it's in a group, strip the outer []
if action in group_actions:
if part[0] == '[' and part[-1] == ']':
part = part[1:-1]
# add the action string to the list
parts.append(part)
# produce the first way to invoke the option in brackets
else:
option_string = action.option_strings[0]
# if the Optional doesn't take a value, format is:
# -s or --long
if action.nargs == 0:
part = '%s' % option_string
# if the Optional takes a value, format is:
# -s ARGS or --long ARGS
else:
default = self._get_default_metavar_for_optional(action)
args_string = self._format_args(action, default)
part = '%s %s' % (option_string, args_string)
# make it look optional if it's not required or in a group
if not action.required and action not in group_actions:
part = '[%s]' % part
# add the action string to the list
parts.append(part)
# insert things at the necessary indices
for i in sorted(inserts, reverse=True):
parts[i:i] = [inserts[i]]
# join all the action items with spaces
text = ' '.join([item for item in parts if item is not None])
# clean up separators for mutually exclusive groups
open = r'[\[(]'
close = r'[\])]'
text = _re.sub(r'(%s) ' % open, r'\1', text)
text = _re.sub(r' (%s)' % close, r'\1', text)
text = _re.sub(r'%s *%s' % (open, close), r'', text)
text = _re.sub(r'\(([^|]*)\)', r'\1', text)
text = text.strip()
# return the text
return text
def _format_text(self, text):
if '%(prog)' in text:
text = text % dict(prog=self._prog)
text_width = max(self._width - self._current_indent, 11)
indent = ' ' * self._current_indent
return self._fill_text(text, text_width, indent) + '\n\n'
def _format_action(self, action):
# determine the required width and the entry label
help_position = min(self._action_max_length + 2,
self._max_help_position)
help_width = max(self._width - help_position, 11)
action_width = help_position - self._current_indent - 2
action_header = self._format_action_invocation(action)
# no help; start on same line and add a final newline
if not action.help:
tup = self._current_indent, '', action_header
action_header = '%*s%s\n' % tup
# short action name; start on the same line and pad two spaces
elif len(action_header) <= action_width:
tup = self._current_indent, '', action_width, action_header
action_header = '%*s%-*s ' % tup
indent_first = 0
# long action name; start on the next line
else:
tup = self._current_indent, '', action_header
action_header = '%*s%s\n' % tup
indent_first = help_position
# collect the pieces of the action help
parts = [action_header]
# if there was help for the action, add lines of help text
if action.help:
help_text = self._expand_help(action)
help_lines = self._split_lines(help_text, help_width)
parts.append('%*s%s\n' % (indent_first, '', help_lines[0]))
for line in help_lines[1:]:
parts.append('%*s%s\n' % (help_position, '', line))
# or add a newline if the description doesn't end with one
elif not action_header.endswith('\n'):
parts.append('\n')
# if there are any sub-actions, add their help as well
for subaction in self._iter_indented_subactions(action):
parts.append(self._format_action(subaction))
# return a single string
return self._join_parts(parts)
def _format_action_invocation(self, action):
if not action.option_strings:
default = self._get_default_metavar_for_positional(action)
metavar, = self._metavar_formatter(action, default)(1)
return metavar
else:
parts = []
# if the Optional doesn't take a value, format is:
# -s, --long
if action.nargs == 0:
parts.extend(action.option_strings)
# if the Optional takes a value, format is:
# -s ARGS, --long ARGS
else:
default = self._get_default_metavar_for_optional(action)
args_string = self._format_args(action, default)
for option_string in action.option_strings:
parts.append('%s %s' % (option_string, args_string))
return ', '.join(parts)
def _metavar_formatter(self, action, default_metavar):
if action.metavar is not None:
result = action.metavar
elif action.choices is not None:
choice_strs = [str(choice) for choice in action.choices]
result = '{%s}' % ','.join(choice_strs)
else:
result = default_metavar
def format(tuple_size):
if isinstance(result, tuple):
return result
else:
return (result, ) * tuple_size
return format
def _format_args(self, action, default_metavar):
get_metavar = self._metavar_formatter(action, default_metavar)
if action.nargs is None:
result = '%s' % get_metavar(1)
elif action.nargs == OPTIONAL:
result = '[%s]' % get_metavar(1)
elif action.nargs == ZERO_OR_MORE:
result = '[%s [%s ...]]' % get_metavar(2)
elif action.nargs == ONE_OR_MORE:
result = '%s [%s ...]' % get_metavar(2)
elif action.nargs == REMAINDER:
result = '...'
elif action.nargs == PARSER:
result = '%s ...' % get_metavar(1)
else:
formats = ['%s' for _ in range(action.nargs)]
result = ' '.join(formats) % get_metavar(action.nargs)
return result
def _expand_help(self, action):
params = dict(vars(action), prog=self._prog)
for name in list(params):
if params[name] is SUPPRESS:
del params[name]
for name in list(params):
if hasattr(params[name], '__name__'):
params[name] = params[name].__name__
if params.get('choices') is not None:
choices_str = ', '.join([str(c) for c in params['choices']])
params['choices'] = choices_str
return self._get_help_string(action) % params
def _iter_indented_subactions(self, action):
try:
get_subactions = action._get_subactions
except AttributeError:
pass
else:
self._indent()
yield from get_subactions()
self._dedent()
def _split_lines(self, text, width):
text = self._whitespace_matcher.sub(' ', text).strip()
return _textwrap.wrap(text, width)
def _fill_text(self, text, width, indent):
text = self._whitespace_matcher.sub(' ', text).strip()
return _textwrap.fill(text, width, initial_indent=indent,
subsequent_indent=indent)
def _get_help_string(self, action):
return action.help
def _get_default_metavar_for_optional(self, action):
return action.dest.upper()
def _get_default_metavar_for_positional(self, action):
return action.dest
class RawDescriptionHelpFormatter(HelpFormatter):
"""Help message formatter which retains any formatting in descriptions.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def _fill_text(self, text, width, indent):
return ''.join(indent + line for line in text.splitlines(keepends=True))
class RawTextHelpFormatter(RawDescriptionHelpFormatter):
"""Help message formatter which retains formatting of all help text.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def _split_lines(self, text, width):
return text.splitlines()
class ArgumentDefaultsHelpFormatter(HelpFormatter):
"""Help message formatter which adds default values to argument help.
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def _get_help_string(self, action):
help = action.help
if '%(default)' not in action.help:
if action.default is not SUPPRESS:
defaulting_nargs = [OPTIONAL, ZERO_OR_MORE]
if action.option_strings or action.nargs in defaulting_nargs:
help += ' (default: %(default)s)'
return help
class MetavarTypeHelpFormatter(HelpFormatter):
"""Help message formatter which uses the argument 'type' as the default
metavar value (instead of the argument 'dest')
Only the name of this class is considered a public API. All the methods
provided by the class are considered an implementation detail.
"""
def _get_default_metavar_for_optional(self, action):
return action.type.__name__
def _get_default_metavar_for_positional(self, action):
return action.type.__name__
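# Illustrative sketch (not part of the module): any of the formatter classes
# above can be supplied via formatter_class= when building a parser. The
# parser and argument names below are hypothetical.
#
#     >>> parser = ArgumentParser(prog='demo',
#     ...                         formatter_class=ArgumentDefaultsHelpFormatter)
#     >>> _ = parser.add_argument('--retries', type=int, default=3,
#     ...                         help='how many times to retry')
#     >>> 'default: 3' in parser.format_help()
#     True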
# =====================
# Options and Arguments
# =====================
def _get_action_name(argument):
if argument is None:
return None
elif argument.option_strings:
return '/'.join(argument.option_strings)
elif argument.metavar not in (None, SUPPRESS):
return argument.metavar
elif argument.dest not in (None, SUPPRESS):
return argument.dest
else:
return None
class ArgumentError(Exception):
"""An error from creating or using an argument (optional or positional).
The string value of this exception is the message, augmented with
information about the argument that caused it.
"""
def __init__(self, argument, message):
self.argument_name = _get_action_name(argument)
self.message = message
def __str__(self):
if self.argument_name is None:
format = '%(message)s'
else:
format = 'argument %(argument_name)s: %(message)s'
return format % dict(message=self.message,
argument_name=self.argument_name)
class ArgumentTypeError(Exception):
"""An error from trying to convert a command line string to a type."""
pass
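# Illustrative sketch: a type= callable may raise ArgumentTypeError to signal
# a bad value, and the parser reports it as 'argument ...: <message>'. The
# function name is hypothetical.
#
#     >>> def positive_int(text):
#     ...     value = int(text)
#     ...     if value <= 0:
#     ...         raise ArgumentTypeError('%r is not positive' % text)
#     ...     return value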
# ==============
# Action classes
# ==============
class Action(_AttributeHolder):
"""Information about how to convert command line strings to Python objects.
Action objects are used by an ArgumentParser to represent the information
needed to parse a single argument from one or more strings from the
command line. The keyword arguments to the Action constructor are also
all attributes of Action instances.
Keyword Arguments:
- option_strings -- A list of command-line option strings which
should be associated with this action.
- dest -- The name of the attribute to hold the created object(s)
- nargs -- The number of command-line arguments that should be
consumed. By default, one argument will be consumed and a single
value will be produced. Other values include:
- N (an integer) consumes N arguments (and produces a list)
- '?' consumes zero or one arguments
- '*' consumes zero or more arguments (and produces a list)
- '+' consumes one or more arguments (and produces a list)
Note that the difference between the default and nargs=1 is that
with the default, a single value will be produced, while with
nargs=1, a list containing a single value will be produced.
- const -- The value to be produced if the option is specified and the
option uses an action that takes no values.
- default -- The value to be produced if the option is not specified.
- type -- A callable that accepts a single string argument, and
returns the converted value. The standard Python types str, int,
float, and complex are useful examples of such callables. If None,
str is used.
- choices -- A container of values that should be allowed. If not None,
after a command-line argument has been converted to the appropriate
type, an exception will be raised if it is not a member of this
collection.
- required -- True if the action must always be specified at the
command line. This is only meaningful for optional command-line
arguments.
- help -- The help string describing the argument.
- metavar -- The name to be used for the option's argument with the
help string. If None, the 'dest' value will be used as the name.
"""
def __init__(self,
option_strings,
dest,
nargs=None,
const=None,
default=None,
type=None,
choices=None,
required=False,
help=None,
metavar=None):
self.option_strings = option_strings
self.dest = dest
self.nargs = nargs
self.const = const
self.default = default
self.type = type
self.choices = choices
self.required = required
self.help = help
self.metavar = metavar
def _get_kwargs(self):
names = [
'option_strings',
'dest',
'nargs',
'const',
'default',
'type',
'choices',
'help',
'metavar',
]
return [(name, getattr(self, name)) for name in names]
def __call__(self, parser, namespace, values, option_string=None):
raise NotImplementedError(_('.__call__() not defined'))
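# Illustrative sketch: custom actions subclass Action and implement __call__
# to decide what lands on the namespace. The class name is hypothetical.
#
#     >>> class UpperAction(Action):
#     ...     def __call__(self, parser, namespace, values, option_string=None):
#     ...         # store the converted value upper-cased
#     ...         setattr(namespace, self.dest, values.upper())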
class _StoreAction(Action):
def __init__(self,
option_strings,
dest,
nargs=None,
const=None,
default=None,
type=None,
choices=None,
required=False,
help=None,
metavar=None):
if nargs == 0:
raise ValueError('nargs for store actions must be > 0; if you '
'have nothing to store, actions such as store '
'true or store const may be more appropriate')
if const is not None and nargs != OPTIONAL:
raise ValueError('nargs must be %r to supply const' % OPTIONAL)
super(_StoreAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=nargs,
const=const,
default=default,
type=type,
choices=choices,
required=required,
help=help,
metavar=metavar)
def __call__(self, parser, namespace, values, option_string=None):
setattr(namespace, self.dest, values)
class _StoreConstAction(Action):
def __init__(self,
option_strings,
dest,
const,
default=None,
required=False,
help=None,
metavar=None):
super(_StoreConstAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=0,
const=const,
default=default,
required=required,
help=help)
def __call__(self, parser, namespace, values, option_string=None):
setattr(namespace, self.dest, self.const)
class _StoreTrueAction(_StoreConstAction):
def __init__(self,
option_strings,
dest,
default=False,
required=False,
help=None):
super(_StoreTrueAction, self).__init__(
option_strings=option_strings,
dest=dest,
const=True,
default=default,
required=required,
help=help)
class _StoreFalseAction(_StoreConstAction):
def __init__(self,
option_strings,
dest,
default=True,
required=False,
help=None):
super(_StoreFalseAction, self).__init__(
option_strings=option_strings,
dest=dest,
const=False,
default=default,
required=required,
help=help)
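# Illustrative sketch: these two classes back the 'store_true'/'store_false'
# action names registered below in _ActionsContainer.
#
#     >>> parser = ArgumentParser(prog='demo')
#     >>> _ = parser.add_argument('--verbose', action='store_true')
#     >>> parser.parse_args(['--verbose']).verbose
#     True
#     >>> parser.parse_args([]).verbose
#     False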
class _AppendAction(Action):
def __init__(self,
option_strings,
dest,
nargs=None,
const=None,
default=None,
type=None,
choices=None,
required=False,
help=None,
metavar=None):
if nargs == 0:
raise ValueError('nargs for append actions must be > 0; if arg '
'strings are not supplying the value to append, '
'the append const action may be more appropriate')
if const is not None and nargs != OPTIONAL:
raise ValueError('nargs must be %r to supply const' % OPTIONAL)
super(_AppendAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=nargs,
const=const,
default=default,
type=type,
choices=choices,
required=required,
help=help,
metavar=metavar)
def __call__(self, parser, namespace, values, option_string=None):
items = _copy.copy(_ensure_value(namespace, self.dest, []))
items.append(values)
setattr(namespace, self.dest, items)
class _AppendConstAction(Action):
def __init__(self,
option_strings,
dest,
const,
default=None,
required=False,
help=None,
metavar=None):
super(_AppendConstAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=0,
const=const,
default=default,
required=required,
help=help,
metavar=metavar)
def __call__(self, parser, namespace, values, option_string=None):
items = _copy.copy(_ensure_value(namespace, self.dest, []))
items.append(self.const)
setattr(namespace, self.dest, items)
class _CountAction(Action):
def __init__(self,
option_strings,
dest,
default=None,
required=False,
help=None):
super(_CountAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=0,
default=default,
required=required,
help=help)
def __call__(self, parser, namespace, values, option_string=None):
new_count = _ensure_value(namespace, self.dest, 0) + 1
setattr(namespace, self.dest, new_count)
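# Illustrative sketch: 'count' increments the destination once per occurrence,
# the usual way to implement stacking -v verbosity flags.
#
#     >>> parser = ArgumentParser(prog='demo')
#     >>> _ = parser.add_argument('-v', action='count', default=0)
#     >>> parser.parse_args(['-vvv']).v
#     3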
class _HelpAction(Action):
def __init__(self,
option_strings,
dest=SUPPRESS,
default=SUPPRESS,
help=None):
super(_HelpAction, self).__init__(
option_strings=option_strings,
dest=dest,
default=default,
nargs=0,
help=help)
def __call__(self, parser, namespace, values, option_string=None):
parser.print_help()
parser.exit()
class _VersionAction(Action):
def __init__(self,
option_strings,
version=None,
dest=SUPPRESS,
default=SUPPRESS,
help="show program's version number and exit"):
super(_VersionAction, self).__init__(
option_strings=option_strings,
dest=dest,
default=default,
nargs=0,
help=help)
self.version = version
def __call__(self, parser, namespace, values, option_string=None):
version = self.version
if version is None:
version = parser.version
formatter = parser._get_formatter()
formatter.add_text(version)
parser._print_message(formatter.format_help(), _sys.stdout)
parser.exit()
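# Illustrative sketch: a 'version' action prints the (possibly %(prog)s-
# expanded) version string to stdout and exits, so no doctest output is
# shown here.
#
#     parser = ArgumentParser(prog='demo')
#     parser.add_argument('--version', action='version', version='%(prog)s 1.0')
#     # running 'demo --version' prints 'demo 1.0' and exits with status 0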
class _SubParsersAction(Action):
class _ChoicesPseudoAction(Action):
def __init__(self, name, aliases, help):
metavar = dest = name
if aliases:
metavar += ' (%s)' % ', '.join(aliases)
sup = super(_SubParsersAction._ChoicesPseudoAction, self)
sup.__init__(option_strings=[], dest=dest, help=help,
metavar=metavar)
def __init__(self,
option_strings,
prog,
parser_class,
dest=SUPPRESS,
help=None,
metavar=None):
self._prog_prefix = prog
self._parser_class = parser_class
self._name_parser_map = _collections.OrderedDict()
self._choices_actions = []
super(_SubParsersAction, self).__init__(
option_strings=option_strings,
dest=dest,
nargs=PARSER,
choices=self._name_parser_map,
help=help,
metavar=metavar)
def add_parser(self, name, **kwargs):
# set prog from the existing prefix
if kwargs.get('prog') is None:
kwargs['prog'] = '%s %s' % (self._prog_prefix, name)
aliases = kwargs.pop('aliases', ())
# create a pseudo-action to hold the choice help
if 'help' in kwargs:
help = kwargs.pop('help')
choice_action = self._ChoicesPseudoAction(name, aliases, help)
self._choices_actions.append(choice_action)
# create the parser and add it to the map
parser = self._parser_class(**kwargs)
self._name_parser_map[name] = parser
# make parser available under aliases also
for alias in aliases:
self._name_parser_map[alias] = parser
return parser
def _get_subactions(self):
return self._choices_actions
def __call__(self, parser, namespace, values, option_string=None):
parser_name = values[0]
arg_strings = values[1:]
# set the parser name if requested
if self.dest is not SUPPRESS:
setattr(namespace, self.dest, parser_name)
# select the parser
try:
parser = self._name_parser_map[parser_name]
except KeyError:
args = {'parser_name': parser_name,
'choices': ', '.join(self._name_parser_map)}
msg = _('unknown parser %(parser_name)r (choices: %(choices)s)') % args
raise ArgumentError(self, msg)
# parse all the remaining options into the namespace
# store any unrecognized options on the object, so that the top
# level parser can decide what to do with them
# In case this subparser defines new defaults, we parse them
# in a new namespace object and then update the original
# namespace for the relevant parts.
subnamespace, arg_strings = parser.parse_known_args(arg_strings, None)
for key, value in vars(subnamespace).items():
setattr(namespace, key, value)
if arg_strings:
vars(namespace).setdefault(_UNRECOGNIZED_ARGS_ATTR, [])
getattr(namespace, _UNRECOGNIZED_ARGS_ATTR).extend(arg_strings)
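# Illustrative sketch: add_subparsers() is the public entry point for this
# action; each add_parser() call registers one sub-command. Names are
# hypothetical.
#
#     >>> parser = ArgumentParser(prog='demo')
#     >>> sub = parser.add_subparsers(dest='command')
#     >>> checkout = sub.add_parser('checkout')
#     >>> _ = checkout.add_argument('branch')
#     >>> args = parser.parse_args(['checkout', 'main'])
#     >>> (args.command, args.branch)
#     ('checkout', 'main')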
# ==============
# Type classes
# ==============
class FileType(object):
"""Factory for creating file object types
Instances of FileType are typically passed as type= arguments to the
ArgumentParser add_argument() method.
Keyword Arguments:
- mode -- A string indicating how the file is to be opened. Accepts the
same values as the builtin open() function.
- bufsize -- The file's desired buffer size. Accepts the same values as
the builtin open() function.
- encoding -- The file's encoding. Accepts the same values as the
builtin open() function.
- errors -- A string indicating how encoding and decoding errors are to
be handled. Accepts the same value as the builtin open() function.
"""
def __init__(self, mode='r', bufsize=-1, encoding=None, errors=None):
self._mode = mode
self._bufsize = bufsize
self._encoding = encoding
self._errors = errors
def __call__(self, string):
# the special argument "-" means sys.std{in,out}
if string == '-':
if 'r' in self._mode:
return _sys.stdin
elif 'w' in self._mode:
return _sys.stdout
else:
msg = _('argument "-" with mode %r') % self._mode
raise ValueError(msg)
# all other arguments are used as file names
try:
return open(string, self._mode, self._bufsize, self._encoding,
self._errors)
except OSError as e:
message = _("can't open '%s': %s")
raise ArgumentTypeError(message % (string, e))
def __repr__(self):
args = self._mode, self._bufsize
kwargs = [('encoding', self._encoding), ('errors', self._errors)]
args_str = ', '.join([repr(arg) for arg in args if arg != -1] +
['%s=%r' % (kw, arg) for kw, arg in kwargs
if arg is not None])
return '%s(%s)' % (type(self).__name__, args_str)
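# Illustrative sketch: FileType instances are passed as type=; the special
# '-' argument maps to the standard streams as implemented in __call__ above.
#
#     >>> import sys
#     >>> parser = ArgumentParser(prog='demo')
#     >>> _ = parser.add_argument('infile', type=FileType('r'))
#     >>> parser.parse_args(['-']).infile is sys.stdin
#     True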
# ===========================
# Optional and Positional Parsing
# ===========================
class Namespace(_AttributeHolder):
"""Simple object for storing attributes.
Implements equality by attribute names and values, and provides a simple
string representation.
"""
def __init__(self, **kwargs):
for name in kwargs:
setattr(self, name, kwargs[name])
def __eq__(self, other):
if not isinstance(other, Namespace):
return NotImplemented
return vars(self) == vars(other)
def __ne__(self, other):
if not isinstance(other, Namespace):
return NotImplemented
return not (self == other)
def __contains__(self, key):
return key in self.__dict__
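# Illustrative sketch: Namespace is a plain attribute bag with equality and
# containment, exactly as defined above.
#
#     >>> ns = Namespace(x=1)
#     >>> ns.x, 'x' in ns, ns == Namespace(x=1)
#     (1, True, True)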
class _ActionsContainer(object):
def __init__(self,
description,
prefix_chars,
argument_default,
conflict_handler):
super(_ActionsContainer, self).__init__()
self.description = description
self.argument_default = argument_default
self.prefix_chars = prefix_chars
self.conflict_handler = conflict_handler
# set up registries
self._registries = {}
# register actions
self.register('action', None, _StoreAction)
self.register('action', 'store', _StoreAction)
self.register('action', 'store_const', _StoreConstAction)
self.register('action', 'store_true', _StoreTrueAction)
self.register('action', 'store_false', _StoreFalseAction)
self.register('action', 'append', _AppendAction)
self.register('action', 'append_const', _AppendConstAction)
self.register('action', 'count', _CountAction)
self.register('action', 'help', _HelpAction)
self.register('action', 'version', _VersionAction)
self.register('action', 'parsers', _SubParsersAction)
# raise an exception if the conflict handler is invalid
self._get_handler()
# action storage
self._actions = []
self._option_string_actions = {}
# groups
self._action_groups = []
self._mutually_exclusive_groups = []
# defaults storage
self._defaults = {}
# determines whether an "option" looks like a negative number
self._negative_number_matcher = _re.compile(r'^-\d+$|^-\d*\.\d+$')
# whether or not there are any optionals that look like negative
# numbers -- uses a list so it can be shared and edited
self._has_negative_number_optionals = []
# ====================
# Registration methods
# ====================
def register(self, registry_name, value, object):
registry = self._registries.setdefault(registry_name, {})
registry[value] = object
def _registry_get(self, registry_name, value, default=None):
return self._registries[registry_name].get(value, default)
# ==================================
# Namespace default accessor methods
# ==================================
def set_defaults(self, **kwargs):
self._defaults.update(kwargs)
# if these defaults match any existing arguments, replace
# the previous default on the object with the new one
for action in self._actions:
if action.dest in kwargs:
action.default = kwargs[action.dest]
def get_default(self, dest):
for action in self._actions:
if action.dest == dest and action.default is not None:
return action.default
return self._defaults.get(dest, None)
# =======================
# Adding argument actions
# =======================
def add_argument(self, *args, **kwargs):
"""
add_argument(dest, ..., name=value, ...)
add_argument(option_string, option_string, ..., name=value, ...)
"""
# if no positional args are supplied or only one is supplied and
# it doesn't look like an option string, parse a positional
# argument
chars = self.prefix_chars
if not args or len(args) == 1 and args[0][0] not in chars:
if args and 'dest' in kwargs:
raise ValueError('dest supplied twice for positional argument')
kwargs = self._get_positional_kwargs(*args, **kwargs)
# otherwise, we're adding an optional argument
else:
kwargs = self._get_optional_kwargs(*args, **kwargs)
# if no default was supplied, use the parser-level default
if 'default' not in kwargs:
dest = kwargs['dest']
if dest in self._defaults:
kwargs['default'] = self._defaults[dest]
elif self.argument_default is not None:
kwargs['default'] = self.argument_default
# create the action object, and add it to the parser
action_class = self._pop_action_class(kwargs)
if not callable(action_class):
raise ValueError('unknown action "%s"' % (action_class,))
action = action_class(**kwargs)
# raise an error if the action type is not callable
type_func = self._registry_get('type', action.type, action.type)
if not callable(type_func):
raise ValueError('%r is not callable' % (type_func,))
# raise an error if the metavar does not match the type
if hasattr(self, "_get_formatter"):
try:
self._get_formatter()._format_args(action, None)
except TypeError:
raise ValueError("length of metavar tuple does not match nargs")
return self._add_action(action)
def add_argument_group(self, *args, **kwargs):
group = _ArgumentGroup(self, *args, **kwargs)
self._action_groups.append(group)
return group
def add_mutually_exclusive_group(self, **kwargs):
group = _MutuallyExclusiveGroup(self, **kwargs)
self._mutually_exclusive_groups.append(group)
return group
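    # Illustrative sketch: argument groups only affect help layout, while a
    # mutually exclusive group makes the parser reject combined options.
    #
    #     >>> parser = ArgumentParser(prog='demo')
    #     >>> group = parser.add_mutually_exclusive_group()
    #     >>> _ = group.add_argument('--json', action='store_true')
    #     >>> _ = group.add_argument('--xml', action='store_true')
    #     # 'demo --json --xml' fails with: argument --xml: not allowed
    #     # with argument --json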
def _add_action(self, action):
# resolve any conflicts
self._check_conflict(action)
# add to actions list
self._actions.append(action)
action.container = self
# index the action by any option strings it has
for option_string in action.option_strings:
self._option_string_actions[option_string] = action
# set the flag if any option strings look like negative numbers
for option_string in action.option_strings:
if self._negative_number_matcher.match(option_string):
if not self._has_negative_number_optionals:
self._has_negative_number_optionals.append(True)
# return the created action
return action
def _remove_action(self, action):
self._actions.remove(action)
def _add_container_actions(self, container):
# collect groups by titles
title_group_map = {}
for group in self._action_groups:
if group.title in title_group_map:
msg = _('cannot merge actions - two groups are named %r')
raise ValueError(msg % (group.title))
title_group_map[group.title] = group
# map each action to its group
group_map = {}
for group in container._action_groups:
# if a group with the title exists, use that, otherwise
# create a new group matching the container's group
if group.title not in title_group_map:
title_group_map[group.title] = self.add_argument_group(
title=group.title,
description=group.description,
conflict_handler=group.conflict_handler)
# map the actions to their new group
for action in group._group_actions:
group_map[action] = title_group_map[group.title]
# add container's mutually exclusive groups
# NOTE: if add_mutually_exclusive_group ever gains title= and
# description= then this code will need to be expanded as above
for group in container._mutually_exclusive_groups:
mutex_group = self.add_mutually_exclusive_group(
required=group.required)
# map the actions to their new mutex group
for action in group._group_actions:
group_map[action] = mutex_group
# add all actions to this container or their group
for action in container._actions:
group_map.get(action, self)._add_action(action)
def _get_positional_kwargs(self, dest, **kwargs):
# make sure required is not specified
if 'required' in kwargs:
msg = _("'required' is an invalid argument for positionals")
raise TypeError(msg)
# mark positional arguments as required if at least one is
# always required
if kwargs.get('nargs') not in [OPTIONAL, ZERO_OR_MORE]:
kwargs['required'] = True
if kwargs.get('nargs') == ZERO_OR_MORE and 'default' not in kwargs:
kwargs['required'] = True
# return the keyword arguments with no option strings
return dict(kwargs, dest=dest, option_strings=[])
def _get_optional_kwargs(self, *args, **kwargs):
# determine short and long option strings
option_strings = []
long_option_strings = []
for option_string in args:
# error on strings that don't start with an appropriate prefix
if not option_string[0] in self.prefix_chars:
args = {'option': option_string,
'prefix_chars': self.prefix_chars}
msg = _('invalid option string %(option)r: '
'must start with a character %(prefix_chars)r')
raise ValueError(msg % args)
# strings starting with two prefix characters are long options
option_strings.append(option_string)
if option_string[0] in self.prefix_chars:
if len(option_string) > 1:
if option_string[1] in self.prefix_chars:
long_option_strings.append(option_string)
# infer destination, '--foo-bar' -> 'foo_bar' and '-x' -> 'x'
dest = kwargs.pop('dest', None)
if dest is None:
if long_option_strings:
dest_option_string = long_option_strings[0]
else:
dest_option_string = option_strings[0]
dest = dest_option_string.lstrip(self.prefix_chars)
if not dest:
msg = _('dest= is required for options like %r')
raise ValueError(msg % option_string)
dest = dest.replace('-', '_')
# return the updated keyword arguments
return dict(kwargs, dest=dest, option_strings=option_strings)
def _pop_action_class(self, kwargs, default=None):
action = kwargs.pop('action', default)
return self._registry_get('action', action, action)
def _get_handler(self):
# determine function from conflict handler string
handler_func_name = '_handle_conflict_%s' % self.conflict_handler
try:
return getattr(self, handler_func_name)
except AttributeError:
msg = _('invalid conflict_resolution value: %r')
raise ValueError(msg % self.conflict_handler)
def _check_conflict(self, action):
# find all options that conflict with this option
confl_optionals = []
for option_string in action.option_strings:
if option_string in self._option_string_actions:
confl_optional = self._option_string_actions[option_string]
confl_optionals.append((option_string, confl_optional))
# resolve any conflicts
if confl_optionals:
conflict_handler = self._get_handler()
conflict_handler(action, confl_optionals)
def _handle_conflict_error(self, action, conflicting_actions):
message = ngettext('conflicting option string: %s',
'conflicting option strings: %s',
len(conflicting_actions))
conflict_string = ', '.join([option_string
for option_string, action
in conflicting_actions])
raise ArgumentError(action, message % conflict_string)
def _handle_conflict_resolve(self, action, conflicting_actions):
# remove all conflicting options
for option_string, action in conflicting_actions:
# remove the conflicting option
action.option_strings.remove(option_string)
self._option_string_actions.pop(option_string, None)
# if the option now has no option string, remove it from the
# container holding it
if not action.option_strings:
action.container._remove_action(action)
class _ArgumentGroup(_ActionsContainer):
def __init__(self, container, title=None, description=None, **kwargs):
# add any missing keyword arguments by checking the container
update = kwargs.setdefault
update('conflict_handler', container.conflict_handler)
update('prefix_chars', container.prefix_chars)
update('argument_default', container.argument_default)
super_init = super(_ArgumentGroup, self).__init__
super_init(description=description, **kwargs)
# group attributes
self.title = title
self._group_actions = []
# share most attributes with the container
self._registries = container._registries
self._actions = container._actions
self._option_string_actions = container._option_string_actions
self._defaults = container._defaults
self._has_negative_number_optionals = \
container._has_negative_number_optionals
self._mutually_exclusive_groups = container._mutually_exclusive_groups
def _add_action(self, action):
action = super(_ArgumentGroup, self)._add_action(action)
self._group_actions.append(action)
return action
def _remove_action(self, action):
super(_ArgumentGroup, self)._remove_action(action)
self._group_actions.remove(action)
class _MutuallyExclusiveGroup(_ArgumentGroup):
def __init__(self, container, required=False):
super(_MutuallyExclusiveGroup, self).__init__(container)
self.required = required
self._container = container
def _add_action(self, action):
if action.required:
msg = _('mutually exclusive arguments must be optional')
raise ValueError(msg)
action = self._container._add_action(action)
self._group_actions.append(action)
return action
def _remove_action(self, action):
self._container._remove_action(action)
self._group_actions.remove(action)
class ArgumentParser(_AttributeHolder, _ActionsContainer):
"""Object for parsing command line strings into Python objects.
Keyword Arguments:
- prog -- The name of the program (default: sys.argv[0])
- usage -- A usage message (default: auto-generated from arguments)
- description -- A description of what the program does
- epilog -- Text following the argument descriptions
- parents -- Parsers whose arguments should be copied into this one
- formatter_class -- HelpFormatter class for printing help messages
- prefix_chars -- Characters that prefix optional arguments
- fromfile_prefix_chars -- Characters that prefix files containing
additional arguments
- argument_default -- The default value for all arguments
- conflict_handler -- String indicating how to handle conflicts
    - add_help -- Add a -h/--help option
"""
def __init__(self,
prog=None,
usage=None,
description=None,
epilog=None,
parents=[],
formatter_class=HelpFormatter,
prefix_chars='-',
fromfile_prefix_chars=None,
argument_default=None,
conflict_handler='error',
add_help=True):
superinit = super(ArgumentParser, self).__init__
superinit(description=description,
prefix_chars=prefix_chars,
argument_default=argument_default,
conflict_handler=conflict_handler)
# default setting for prog
if prog is None:
prog = _os.path.basename(_sys.argv[0])
self.prog = prog
self.usage = usage
self.epilog = epilog
self.formatter_class = formatter_class
self.fromfile_prefix_chars = fromfile_prefix_chars
self.add_help = add_help
add_group = self.add_argument_group
self._positionals = add_group(_('positional arguments'))
self._optionals = add_group(_('optional arguments'))
self._subparsers = None
# register types
def identity(string):
return string
self.register('type', None, identity)
# add help argument if necessary
# (using explicit default to override global argument_default)
default_prefix = '-' if '-' in prefix_chars else prefix_chars[0]
if self.add_help:
self.add_argument(
default_prefix+'h', default_prefix*2+'help',
action='help', default=SUPPRESS,
help=_('show this help message and exit'))
# add parent arguments and defaults
for parent in parents:
self._add_container_actions(parent)
try:
defaults = parent._defaults
except AttributeError:
pass
else:
self._defaults.update(defaults)
# =======================
# Pretty __repr__ methods
# =======================
def _get_kwargs(self):
names = [
'prog',
'usage',
'description',
'formatter_class',
'conflict_handler',
'add_help',
]
return [(name, getattr(self, name)) for name in names]
# ==================================
# Optional/Positional adding methods
# ==================================
def add_subparsers(self, **kwargs):
if self._subparsers is not None:
self.error(_('cannot have multiple subparser arguments'))
# add the parser class to the arguments if it's not present
kwargs.setdefault('parser_class', type(self))
if 'title' in kwargs or 'description' in kwargs:
title = _(kwargs.pop('title', 'subcommands'))
description = _(kwargs.pop('description', None))
self._subparsers = self.add_argument_group(title, description)
else:
self._subparsers = self._positionals
# prog defaults to the usage message of this parser, skipping
# optional arguments and with no "usage:" prefix
if kwargs.get('prog') is None:
formatter = self._get_formatter()
positionals = self._get_positional_actions()
groups = self._mutually_exclusive_groups
formatter.add_usage(self.usage, positionals, groups, '')
kwargs['prog'] = formatter.format_help().strip()
# create the parsers action and add it to the positionals list
parsers_class = self._pop_action_class(kwargs, 'parsers')
action = parsers_class(option_strings=[], **kwargs)
self._subparsers._add_action(action)
# return the created parsers action
return action
def _add_action(self, action):
if action.option_strings:
self._optionals._add_action(action)
else:
self._positionals._add_action(action)
return action
def _get_optional_actions(self):
return [action
for action in self._actions
if action.option_strings]
def _get_positional_actions(self):
return [action
for action in self._actions
if not action.option_strings]
# =====================================
# Command line argument parsing methods
# =====================================
def parse_args(self, args=None, namespace=None):
args, argv = self.parse_known_args(args, namespace)
if argv:
msg = _('unrecognized arguments: %s')
self.error(msg % ' '.join(argv))
return args
def parse_known_args(self, args=None, namespace=None):
if args is None:
# args default to the system args
args = _sys.argv[1:]
else:
# make sure that args are mutable
args = list(args)
# default Namespace built from parser defaults
if namespace is None:
namespace = Namespace()
# add any action defaults that aren't present
for action in self._actions:
if action.dest is not SUPPRESS:
if not hasattr(namespace, action.dest):
if action.default is not SUPPRESS:
setattr(namespace, action.dest, action.default)
# add any parser defaults that aren't present
for dest in self._defaults:
if not hasattr(namespace, dest):
setattr(namespace, dest, self._defaults[dest])
# parse the arguments and exit if there are any errors
try:
namespace, args = self._parse_known_args(args, namespace)
if hasattr(namespace, _UNRECOGNIZED_ARGS_ATTR):
args.extend(getattr(namespace, _UNRECOGNIZED_ARGS_ATTR))
delattr(namespace, _UNRECOGNIZED_ARGS_ATTR)
return namespace, args
except ArgumentError:
err = _sys.exc_info()[1]
self.error(str(err))
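    # Illustrative sketch: parse_known_args() returns the unrecognized extras
    # instead of erroring on them, which parse_args() above builds on.
    #
    #     >>> parser = ArgumentParser(prog='demo')
    #     >>> _ = parser.add_argument('--known', action='store_true')
    #     >>> parser.parse_known_args(['--known', '--other'])
    #     (Namespace(known=True), ['--other'])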
def _parse_known_args(self, arg_strings, namespace):
# replace arg strings that are file references
if self.fromfile_prefix_chars is not None:
arg_strings = self._read_args_from_files(arg_strings)
# map all mutually exclusive arguments to the other arguments
# they can't occur with
action_conflicts = {}
for mutex_group in self._mutually_exclusive_groups:
group_actions = mutex_group._group_actions
for i, mutex_action in enumerate(mutex_group._group_actions):
conflicts = action_conflicts.setdefault(mutex_action, [])
conflicts.extend(group_actions[:i])
conflicts.extend(group_actions[i + 1:])
# find all option indices, and determine the arg_string_pattern
# which has an 'O' if there is an option at an index,
# an 'A' if there is an argument, or a '-' if there is a '--'
option_string_indices = {}
arg_string_pattern_parts = []
arg_strings_iter = iter(arg_strings)
for i, arg_string in enumerate(arg_strings_iter):
# all args after -- are non-options
if arg_string == '--':
arg_string_pattern_parts.append('-')
for arg_string in arg_strings_iter:
arg_string_pattern_parts.append('A')
# otherwise, add the arg to the arg strings
# and note the index if it was an option
else:
option_tuple = self._parse_optional(arg_string)
if option_tuple is None:
pattern = 'A'
else:
option_string_indices[i] = option_tuple
pattern = 'O'
arg_string_pattern_parts.append(pattern)
# join the pieces together to form the pattern
arg_strings_pattern = ''.join(arg_string_pattern_parts)
        # converts arg strings to the appropriate type and then takes the action
seen_actions = set()
seen_non_default_actions = set()
def take_action(action, argument_strings, option_string=None):
seen_actions.add(action)
argument_values = self._get_values(action, argument_strings)
# error if this argument is not allowed with other previously
# seen arguments, assuming that actions that use the default
# value don't really count as "present"
if argument_values is not action.default:
seen_non_default_actions.add(action)
for conflict_action in action_conflicts.get(action, []):
if conflict_action in seen_non_default_actions:
msg = _('not allowed with argument %s')
action_name = _get_action_name(conflict_action)
raise ArgumentError(action, msg % action_name)
# take the action if we didn't receive a SUPPRESS value
# (e.g. from a default)
if argument_values is not SUPPRESS:
action(self, namespace, argument_values, option_string)
# function to convert arg_strings into an optional action
def consume_optional(start_index):
# get the optional identified at this index
option_tuple = option_string_indices[start_index]
action, option_string, explicit_arg = option_tuple
# identify additional optionals in the same arg string
# (e.g. -xyz is the same as -x -y -z if no args are required)
match_argument = self._match_argument
action_tuples = []
while True:
# if we found no optional action, skip it
if action is None:
extras.append(arg_strings[start_index])
return start_index + 1
# if there is an explicit argument, try to match the
# optional's string arguments to only this
if explicit_arg is not None:
arg_count = match_argument(action, 'A')
# if the action is a single-dash option and takes no
# arguments, try to parse more single-dash options out
# of the tail of the option string
chars = self.prefix_chars
if arg_count == 0 and option_string[1] not in chars:
action_tuples.append((action, [], option_string))
char = option_string[0]
option_string = char + explicit_arg[0]
new_explicit_arg = explicit_arg[1:] or None
optionals_map = self._option_string_actions
if option_string in optionals_map:
action = optionals_map[option_string]
explicit_arg = new_explicit_arg
else:
msg = _('ignored explicit argument %r')
raise ArgumentError(action, msg % explicit_arg)
                    # if the action expects exactly one argument, we've
                    # successfully matched the option; exit the loop
elif arg_count == 1:
stop = start_index + 1
args = [explicit_arg]
action_tuples.append((action, args, option_string))
break
# error if a double-dash option did not use the
# explicit argument
else:
msg = _('ignored explicit argument %r')
raise ArgumentError(action, msg % explicit_arg)
# if there is no explicit argument, try to match the
# optional's string arguments with the following strings
# if successful, exit the loop
else:
start = start_index + 1
selected_patterns = arg_strings_pattern[start:]
arg_count = match_argument(action, selected_patterns)
stop = start + arg_count
args = arg_strings[start:stop]
action_tuples.append((action, args, option_string))
break
# add the Optional to the list and return the index at which
# the Optional's string args stopped
assert action_tuples
for action, args, option_string in action_tuples:
take_action(action, args, option_string)
return stop
# the list of Positionals left to be parsed; this is modified
# by consume_positionals()
positionals = self._get_positional_actions()
# function to convert arg_strings into positional actions
def consume_positionals(start_index):
# match as many Positionals as possible
match_partial = self._match_arguments_partial
selected_pattern = arg_strings_pattern[start_index:]
arg_counts = match_partial(positionals, selected_pattern)
# slice off the appropriate arg strings for each Positional
# and add the Positional and its args to the list
for action, arg_count in zip(positionals, arg_counts):
args = arg_strings[start_index: start_index + arg_count]
start_index += arg_count
take_action(action, args)
# slice off the Positionals that we just parsed and return the
# index at which the Positionals' string args stopped
positionals[:] = positionals[len(arg_counts):]
return start_index
# consume Positionals and Optionals alternately, until we have
# passed the last option string
extras = []
start_index = 0
if option_string_indices:
max_option_string_index = max(option_string_indices)
else:
max_option_string_index = -1
while start_index <= max_option_string_index:
# consume any Positionals preceding the next option
next_option_string_index = min([
index
for index in option_string_indices
if index >= start_index])
if start_index != next_option_string_index:
positionals_end_index = consume_positionals(start_index)
# only try to parse the next optional if we didn't consume
# the option string during the positionals parsing
if positionals_end_index > start_index:
start_index = positionals_end_index
continue
else:
start_index = positionals_end_index
# if we consumed all the positionals we could and we're not
# at the index of an option string, there were extra arguments
if start_index not in option_string_indices:
strings = arg_strings[start_index:next_option_string_index]
extras.extend(strings)
start_index = next_option_string_index
# consume the next optional and any arguments for it
start_index = consume_optional(start_index)
# consume any positionals following the last Optional
stop_index = consume_positionals(start_index)
# if we didn't consume all the argument strings, there were extras
extras.extend(arg_strings[stop_index:])
# make sure all required actions were present and also convert
# action defaults which were not given as arguments
required_actions = []
for action in self._actions:
if action not in seen_actions:
if action.required:
required_actions.append(_get_action_name(action))
else:
# Convert action default now instead of doing it before
# parsing arguments to avoid calling convert functions
# twice (which may fail) if the argument was given, but
# only if it was defined already in the namespace
if (action.default is not None and
isinstance(action.default, str) and
hasattr(namespace, action.dest) and
action.default is getattr(namespace, action.dest)):
setattr(namespace, action.dest,
self._get_value(action, action.default))
if required_actions:
self.error(_('the following arguments are required: %s') %
', '.join(required_actions))
# make sure all required groups had one option present
for group in self._mutually_exclusive_groups:
if group.required:
for action in group._group_actions:
if action in seen_non_default_actions:
break
# if no actions were used, report the error
else:
names = [_get_action_name(action)
for action in group._group_actions
if action.help is not SUPPRESS]
msg = _('one of the arguments %s is required')
self.error(msg % ' '.join(names))
# return the updated namespace and the extra arguments
return namespace, extras
def _read_args_from_files(self, arg_strings):
# expand arguments referencing files
new_arg_strings = []
for arg_string in arg_strings:
# for regular arguments, just add them back into the list
if not arg_string or arg_string[0] not in self.fromfile_prefix_chars:
new_arg_strings.append(arg_string)
# replace arguments referencing files with the file content
else:
try:
with open(arg_string[1:]) as args_file:
arg_strings = []
for arg_line in args_file.read().splitlines():
for arg in self.convert_arg_line_to_args(arg_line):
arg_strings.append(arg)
arg_strings = self._read_args_from_files(arg_strings)
new_arg_strings.extend(arg_strings)
except OSError:
err = _sys.exc_info()[1]
self.error(str(err))
# return the modified argument list
return new_arg_strings
def convert_arg_line_to_args(self, arg_line):
return [arg_line]
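    # Illustrative sketch: with fromfile_prefix_chars='@', an argument like
    # '@args.txt' is replaced by that file's lines; overriding
    # convert_arg_line_to_args changes how each line is split. The subclass
    # name is hypothetical.
    #
    #     >>> class WordSplittingParser(ArgumentParser):
    #     ...     def convert_arg_line_to_args(self, arg_line):
    #     ...         return arg_line.split()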
def _match_argument(self, action, arg_strings_pattern):
# match the pattern for this action to the arg strings
nargs_pattern = self._get_nargs_pattern(action)
match = _re.match(nargs_pattern, arg_strings_pattern)
# raise an exception if we weren't able to find a match
if match is None:
nargs_errors = {
None: _('expected one argument'),
OPTIONAL: _('expected at most one argument'),
ONE_OR_MORE: _('expected at least one argument'),
}
default = ngettext('expected %s argument',
'expected %s arguments',
action.nargs) % action.nargs
msg = nargs_errors.get(action.nargs, default)
raise ArgumentError(action, msg)
# return the number of arguments matched
return len(match.group(1))
def _match_arguments_partial(self, actions, arg_strings_pattern):
# progressively shorten the actions list by slicing off the
# final actions until we find a match
result = []
for i in range(len(actions), 0, -1):
actions_slice = actions[:i]
pattern = ''.join([self._get_nargs_pattern(action)
for action in actions_slice])
match = _re.match(pattern, arg_strings_pattern)
if match is not None:
result.extend([len(string) for string in match.groups()])
break
# return the list of arg string counts
return result
def _parse_optional(self, arg_string):
# if it's an empty string, it was meant to be a positional
if not arg_string:
return None
# if it doesn't start with a prefix, it was meant to be positional
if not arg_string[0] in self.prefix_chars:
return None
# if the option string is present in the parser, return the action
if arg_string in self._option_string_actions:
action = self._option_string_actions[arg_string]
return action, arg_string, None
# if it's just a single character, it was meant to be positional
if len(arg_string) == 1:
return None
# if the option string before the "=" is present, return the action
if '=' in arg_string:
option_string, explicit_arg = arg_string.split('=', 1)
if option_string in self._option_string_actions:
action = self._option_string_actions[option_string]
return action, option_string, explicit_arg
# search through all possible prefixes of the option string
# and all actions in the parser for possible interpretations
option_tuples = self._get_option_tuples(arg_string)
# if multiple actions match, the option string was ambiguous
if len(option_tuples) > 1:
options = ', '.join([option_string
for action, option_string, explicit_arg in option_tuples])
args = {'option': arg_string, 'matches': options}
msg = _('ambiguous option: %(option)s could match %(matches)s')
self.error(msg % args)
# if exactly one action matched, this segmentation is good,
# so return the parsed action
elif len(option_tuples) == 1:
option_tuple, = option_tuples
return option_tuple
# if it was not found as an option, but it looks like a negative
# number, it was meant to be positional
# unless there are negative-number-like options
if self._negative_number_matcher.match(arg_string):
if not self._has_negative_number_optionals:
return None
# if it contains a space, it was meant to be a positional
if ' ' in arg_string:
return None
# it was meant to be an optional but there is no such option
# in this parser (though it might be a valid option in a subparser)
return None, arg_string, None
def _get_option_tuples(self, option_string):
result = []
# option strings starting with two prefix characters are only
# split at the '='
chars = self.prefix_chars
if option_string[0] in chars and option_string[1] in chars:
if '=' in option_string:
option_prefix, explicit_arg = option_string.split('=', 1)
else:
option_prefix = option_string
explicit_arg = None
for option_string in self._option_string_actions:
if option_string.startswith(option_prefix):
action = self._option_string_actions[option_string]
tup = action, option_string, explicit_arg
result.append(tup)
        # single-character options can be concatenated with their arguments,
        # but multiple-character options always have to have their argument
        # separate
elif option_string[0] in chars and option_string[1] not in chars:
option_prefix = option_string
explicit_arg = None
short_option_prefix = option_string[:2]
short_explicit_arg = option_string[2:]
for option_string in self._option_string_actions:
if option_string == short_option_prefix:
action = self._option_string_actions[option_string]
tup = action, option_string, short_explicit_arg
result.append(tup)
elif option_string.startswith(option_prefix):
action = self._option_string_actions[option_string]
tup = action, option_string, explicit_arg
result.append(tup)
# shouldn't ever get here
else:
self.error(_('unexpected option string: %s') % option_string)
# return the collected option tuples
return result
def _get_nargs_pattern(self, action):
# in all examples below, we have to allow for '--' args
# which are represented as '-' in the pattern
nargs = action.nargs
# the default (None) is assumed to be a single argument
if nargs is None:
nargs_pattern = '(-*A-*)'
# allow zero or one arguments
elif nargs == OPTIONAL:
nargs_pattern = '(-*A?-*)'
# allow zero or more arguments
elif nargs == ZERO_OR_MORE:
nargs_pattern = '(-*[A-]*)'
# allow one or more arguments
elif nargs == ONE_OR_MORE:
nargs_pattern = '(-*A[A-]*)'
# allow any number of options or arguments
elif nargs == REMAINDER:
nargs_pattern = '([-AO]*)'
# allow one argument followed by any number of options or arguments
elif nargs == PARSER:
nargs_pattern = '(-*A[-AO]*)'
# all others should be integers
else:
nargs_pattern = '(-*%s-*)' % '-*'.join('A' * nargs)
# if this is an optional action, -- is not allowed
if action.option_strings:
nargs_pattern = nargs_pattern.replace('-*', '')
nargs_pattern = nargs_pattern.replace('-', '')
# return the pattern
return nargs_pattern
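    # Illustrative sketch of the nargs values these patterns implement:
    #
    #     >>> parser = ArgumentParser(prog='demo')
    #     >>> _ = parser.add_argument('nums', nargs='+', type=int)
    #     >>> parser.parse_args(['1', '2', '3']).nums
    #     [1, 2, 3]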
# ========================
# Value conversion methods
# ========================
def _get_values(self, action, arg_strings):
        # for everything but PARSER, REMAINDER args, strip out the first '--'
if action.nargs not in [PARSER, REMAINDER]:
try:
arg_strings.remove('--')
except ValueError:
pass
# optional argument produces a default when not present
if not arg_strings and action.nargs == OPTIONAL:
if action.option_strings:
value = action.const
else:
value = action.default
if isinstance(value, str):
value = self._get_value(action, value)
self._check_value(action, value)
# when nargs='*' on a positional, if there were no command-line
# args, use the default if it is anything other than None
elif (not arg_strings and action.nargs == ZERO_OR_MORE and
not action.option_strings):
if action.default is not None:
value = action.default
else:
value = arg_strings
self._check_value(action, value)
# single argument or optional argument produces a single value
elif len(arg_strings) == 1 and action.nargs in [None, OPTIONAL]:
arg_string, = arg_strings
value = self._get_value(action, arg_string)
self._check_value(action, value)
# REMAINDER arguments convert all values, checking none
elif action.nargs == REMAINDER:
value = [self._get_value(action, v) for v in arg_strings]
# PARSER arguments convert all values, but check only the first
elif action.nargs == PARSER:
value = [self._get_value(action, v) for v in arg_strings]
self._check_value(action, value[0])
# all other types of nargs produce a list
else:
value = [self._get_value(action, v) for v in arg_strings]
for v in value:
self._check_value(action, v)
# return the converted value
return value
def _get_value(self, action, arg_string):
type_func = self._registry_get('type', action.type, action.type)
if not callable(type_func):
msg = _('%r is not callable')
raise ArgumentError(action, msg % type_func)
# convert the value to the appropriate type
try:
result = type_func(arg_string)
# ArgumentTypeErrors indicate errors
except ArgumentTypeError:
name = getattr(action.type, '__name__', repr(action.type))
msg = str(_sys.exc_info()[1])
raise ArgumentError(action, msg)
# TypeErrors or ValueErrors also indicate errors
except (TypeError, ValueError):
name = getattr(action.type, '__name__', repr(action.type))
args = {'type': name, 'value': arg_string}
msg = _('invalid %(type)s value: %(value)r')
raise ArgumentError(action, msg % args)
# return the converted value
return result
def _check_value(self, action, value):
# converted value must be one of the choices (if specified)
if action.choices is not None and value not in action.choices:
args = {'value': value,
'choices': ', '.join(map(repr, action.choices))}
msg = _('invalid choice: %(value)r (choose from %(choices)s)')
raise ArgumentError(action, msg % args)
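    # Illustrative sketch: choices= is enforced here, after type conversion.
    #
    #     >>> parser = ArgumentParser(prog='demo')
    #     >>> _ = parser.add_argument('color', choices=['red', 'green'])
    #     >>> parser.parse_args(['red']).color
    #     'red'
    #     # 'blue' fails with: invalid choice: 'blue' (choose from 'red', 'green')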
# =======================
# Help-formatting methods
# =======================
def format_usage(self):
formatter = self._get_formatter()
formatter.add_usage(self.usage, self._actions,
self._mutually_exclusive_groups)
return formatter.format_help()
def format_help(self):
formatter = self._get_formatter()
# usage
formatter.add_usage(self.usage, self._actions,
self._mutually_exclusive_groups)
# description
formatter.add_text(self.description)
# positionals, optionals and user-defined groups
for action_group in self._action_groups:
formatter.start_section(action_group.title)
formatter.add_text(action_group.description)
formatter.add_arguments(action_group._group_actions)
formatter.end_section()
# epilog
formatter.add_text(self.epilog)
# determine help from format above
return formatter.format_help()
def _get_formatter(self):
return self.formatter_class(prog=self.prog)
# =====================
# Help-printing methods
# =====================
def print_usage(self, file=None):
if file is None:
file = _sys.stdout
self._print_message(self.format_usage(), file)
def print_help(self, file=None):
if file is None:
file = _sys.stdout
self._print_message(self.format_help(), file)
def _print_message(self, message, file=None):
if message:
if file is None:
file = _sys.stderr
file.write(message)
# ===============
# Exiting methods
# ===============
def exit(self, status=0, message=None):
if message:
self._print_message(message, _sys.stderr)
_sys.exit(status)
def error(self, message):
"""error(message: string)
Prints a usage message incorporating the message to stderr and
exits.
If you override this in a subclass, it should not return -- it
should either exit or raise an exception.
"""
self.print_usage(_sys.stderr)
args = {'prog': self.prog, 'message': message}
self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
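# Illustrative sketch: because error() calls exit(), test code often
# subclasses the parser so failures raise instead of terminating the
# process. The class name is hypothetical.
#
#     >>> class RaisingParser(ArgumentParser):
#     ...     def error(self, message):
#     ...         raise ValueError(message)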
"""
ast
~~~
The `ast` module helps Python applications to process trees of the Python
abstract syntax grammar. The abstract syntax itself might change with
each Python release; this module helps to find out programmatically what
the current grammar looks like and allows modifications of it.
An abstract syntax tree can be generated by passing `ast.PyCF_ONLY_AST` as
a flag to the `compile()` builtin function or by using the `parse()`
function from this module. The result will be a tree of objects whose
classes all inherit from `ast.AST`.
A modified abstract syntax tree can be compiled into a Python code object
using the built-in `compile()` function.
Additionally, various helper functions are provided that make working with
the trees simpler. The main intention of the helper functions, and of this
module in general, is to provide an easy-to-use interface for libraries
that work tightly with the Python syntax (template engines for example).
:copyright: Copyright 2008 by Armin Ronacher.
:license: Python License.
"""
from _ast import *
def parse(source, filename='<unknown>', mode='exec'):
"""
Parse the source into an AST node.
Equivalent to compile(source, filename, mode, PyCF_ONLY_AST).
"""
return compile(source, filename, mode, PyCF_ONLY_AST)
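# Illustrative sketch: parse() returns a Module tree in the default 'exec'
# mode and an Expression tree in 'eval' mode.
#
#     >>> isinstance(parse('x = 1 + 2'), Module)
#     True
#     >>> isinstance(parse('1 + 2', mode='eval'), Expression)
#     True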
def literal_eval(node_or_string):
"""
Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the following
Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
sets, booleans, and None.
"""
if isinstance(node_or_string, str):
node_or_string = parse(node_or_string, mode='eval')
if isinstance(node_or_string, Expression):
node_or_string = node_or_string.body
def _convert(node):
if isinstance(node, (Str, Bytes)):
return node.s
elif isinstance(node, Num):
return node.n
elif isinstance(node, Tuple):
return tuple(map(_convert, node.elts))
elif isinstance(node, List):
return list(map(_convert, node.elts))
elif isinstance(node, Set):
return set(map(_convert, node.elts))
elif isinstance(node, Dict):
return dict((_convert(k), _convert(v)) for k, v
in zip(node.keys, node.values))
elif isinstance(node, NameConstant):
return node.value
elif isinstance(node, UnaryOp) and \
isinstance(node.op, (UAdd, USub)) and \
isinstance(node.operand, (Num, UnaryOp, BinOp)):
operand = _convert(node.operand)
if isinstance(node.op, UAdd):
return + operand
else:
return - operand
elif isinstance(node, BinOp) and \
isinstance(node.op, (Add, Sub)) and \
isinstance(node.right, (Num, UnaryOp, BinOp)) and \
isinstance(node.left, (Num, UnaryOp, BinOp)):
left = _convert(node.left)
right = _convert(node.right)
if isinstance(node.op, Add):
return left + right
else:
return left - right
raise ValueError('malformed node or string: ' + repr(node))
return _convert(node_or_string)
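# Illustrative sketch: only literal syntax is accepted; names, calls, and
# operators other than the numeric +/- handled above raise ValueError.
#
#     >>> literal_eval('[1, 2.5, (True, None)]')
#     [1, 2.5, (True, None)]
#     >>> literal_eval('-3 + 4j')
#     (-3+4j)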
def dump(node, annotate_fields=True, include_attributes=False):
"""
Return a formatted dump of the tree in *node*. This is mainly useful for
debugging purposes. The returned string will show the names and the values
    for fields. This makes the code impossible to evaluate, so if evaluation is
    wanted, *annotate_fields* must be set to False. Attributes such as line
numbers and column offsets are not dumped by default. If this is wanted,
*include_attributes* can be set to True.
"""
def _format(node):
if isinstance(node, AST):
fields = [(a, _format(b)) for a, b in iter_fields(node)]
rv = '%s(%s' % (node.__class__.__name__, ', '.join(
('%s=%s' % field for field in fields)
if annotate_fields else
(b for a, b in fields)
))
if include_attributes and node._attributes:
                rv += ', ' if fields else ' '
rv += ', '.join('%s=%s' % (a, _format(getattr(node, a)))
for a in node._attributes)
return rv + ')'
elif isinstance(node, list):
return '[%s]' % ', '.join(_format(x) for x in node)
return repr(node)
if not isinstance(node, AST):
raise TypeError('expected AST, got %r' % node.__class__.__name__)
return _format(node)
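# Editor's note: an illustrative sketch of dump() on a freshly parsed tree:
#
#     >>> dump(parse('x = 1'))
#     "Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=1))])"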
def copy_location(new_node, old_node):
"""
Copy source location (`lineno` and `col_offset` attributes) from
*old_node* to *new_node* if possible, and return *new_node*.
"""
for attr in 'lineno', 'col_offset':
if attr in old_node._attributes and attr in new_node._attributes \
and hasattr(old_node, attr):
setattr(new_node, attr, getattr(old_node, attr))
return new_node
def fix_missing_locations(node):
"""
When you compile a node tree with compile(), the compiler expects lineno and
col_offset attributes for every node that supports them. This is rather
tedious to fill in for generated nodes, so this helper adds these attributes
recursively where not already set, by setting them to the values of the
parent node. It works recursively starting at *node*.
"""
def _fix(node, lineno, col_offset):
if 'lineno' in node._attributes:
if not hasattr(node, 'lineno'):
node.lineno = lineno
else:
lineno = node.lineno
if 'col_offset' in node._attributes:
if not hasattr(node, 'col_offset'):
node.col_offset = col_offset
else:
col_offset = node.col_offset
for child in iter_child_nodes(node):
_fix(child, lineno, col_offset)
_fix(node, 1, 0)
return node
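# Editor's note: an illustrative sketch of fix_missing_locations(). A node
# synthesized by hand has no lineno/col_offset, so compile() would reject the
# tree until this helper copies locations down from the parent:
#
#     >>> tree = parse("print('hello')")
#     >>> tree.body[0].value.args.append(Str(s='world'))
#     >>> exec(compile(fix_missing_locations(tree), '<ast>', 'exec'))
#     hello world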
def increment_lineno(node, n=1):
"""
Increment the line number of each node in the tree starting at *node* by *n*.
This is useful to "move code" to a different location in a file.
"""
for child in walk(node):
if 'lineno' in child._attributes:
child.lineno = getattr(child, 'lineno', 0) + n
return node
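# Editor's note: an illustrative sketch of increment_lineno():
#
#     >>> increment_lineno(parse('x = 1'), n=10).body[0].lineno
#     11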
def iter_fields(node):
"""
Yield a tuple of ``(fieldname, value)`` for each field in ``node._fields``
that is present on *node*.
"""
for field in node._fields:
try:
yield field, getattr(node, field)
except AttributeError:
pass
def iter_child_nodes(node):
"""
Yield all direct child nodes of *node*, that is, all fields that are nodes
and all items of fields that are lists of nodes.
"""
for name, field in iter_fields(node):
if isinstance(field, AST):
yield field
elif isinstance(field, list):
for item in field:
if isinstance(item, AST):
yield item
def get_docstring(node, clean=True):
"""
Return the docstring for the given node or None if no docstring can
be found. If the node provided does not have docstrings a TypeError
will be raised.
"""
if not isinstance(node, (FunctionDef, ClassDef, Module)):
raise TypeError("%r can't have docstrings" % node.__class__.__name__)
if node.body and isinstance(node.body[0], Expr) and \
isinstance(node.body[0].value, Str):
if clean:
import inspect
return inspect.cleandoc(node.body[0].value.s)
return node.body[0].value.s
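# Editor's note: an illustrative sketch of get_docstring():
#
#     >>> get_docstring(parse('"""Frobnicate the spam."""'))
#     'Frobnicate the spam.'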
def walk(node):
"""
Recursively yield all descendant nodes in the tree starting at *node*
(including *node* itself), in no specified order. This is useful if you
only want to modify nodes in place and don't care about the context.
"""
from collections import deque
todo = deque([node])
while todo:
node = todo.popleft()
todo.extend(iter_child_nodes(node))
yield node
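# Editor's note: an illustrative sketch of walk(), counting Name nodes:
#
#     >>> sum(isinstance(n, Name) for n in walk(parse('a + b * c')))
#     3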
class NodeVisitor(object):
"""
A node visitor base class that walks the abstract syntax tree and calls a
visitor function for every node found. This function may return a value
which is forwarded by the `visit` method.
This class is meant to be subclassed, with the subclass adding visitor
methods.
    By default the visitor function for a node is named ``'visit_'`` plus the
    class name of the node, so a `TryFinally` node is handled by
    `visit_TryFinally`. This behavior can be changed by overriding
    the `visit` method. If no visitor function exists for a node,
    the `generic_visit` visitor is used instead.
    Don't use `NodeVisitor` if you want to apply changes to nodes during
    traversal; for that a special visitor exists (`NodeTransformer`) that
    allows modifications.
"""
def visit(self, node):
"""Visit a node."""
method = 'visit_' + node.__class__.__name__
visitor = getattr(self, method, self.generic_visit)
return visitor(node)
def generic_visit(self, node):
"""Called if no explicit visitor function exists for a node."""
for field, value in iter_fields(node):
if isinstance(value, list):
for item in value:
if isinstance(item, AST):
self.visit(item)
elif isinstance(value, AST):
self.visit(value)
class NodeTransformer(NodeVisitor):
"""
A :class:`NodeVisitor` subclass that walks the abstract syntax tree and
allows modification of nodes.
The `NodeTransformer` will walk the AST and use the return value of the
visitor methods to replace or remove the old node. If the return value of
the visitor method is ``None``, the node will be removed from its location,
otherwise it is replaced with the return value. The return value may be the
original node in which case no replacement takes place.
Here is an example transformer that rewrites all occurrences of name lookups
(``foo``) to ``data['foo']``::
class RewriteName(NodeTransformer):
def visit_Name(self, node):
return copy_location(Subscript(
value=Name(id='data', ctx=Load()),
slice=Index(value=Str(s=node.id)),
ctx=node.ctx
), node)
Keep in mind that if the node you're operating on has child nodes you must
either transform the child nodes yourself or call the :meth:`generic_visit`
method for the node first.
For nodes that were part of a collection of statements (that applies to all
statement nodes), the visitor may also return a list of nodes rather than
just a single node.
Usually you use the transformer like this::
node = YourTransformer().visit(node)
"""
def generic_visit(self, node):
        for field, old_value in iter_fields(node):
if isinstance(old_value, list):
new_values = []
for value in old_value:
if isinstance(value, AST):
value = self.visit(value)
if value is None:
continue
elif not isinstance(value, AST):
new_values.extend(value)
continue
new_values.append(value)
old_value[:] = new_values
elif isinstance(old_value, AST):
new_node = self.visit(old_value)
if new_node is None:
delattr(node, field)
else:
setattr(node, field, new_node)
return node
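# Editor's note: an illustrative NodeTransformer sketch (hypothetical
# IntDoubler name), doubling every integer literal before compiling:
#
#     >>> class IntDoubler(NodeTransformer):
#     ...     def visit_Num(self, node):
#     ...         return copy_location(Num(n=node.n * 2), node)
#     ...
#     >>> tree = IntDoubler().visit(parse('2 + 3', mode='eval'))
#     >>> eval(compile(tree, '<ast>', 'eval'))
#     10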
# -*- Mode: Python; tab-width: 4 -*-
# Id: asynchat.py,v 2.26 2000/09/07 22:29:26 rushing Exp
# Author: Sam Rushing <[email protected]>
# ======================================================================
# Copyright 1996 by Sam Rushing
#
# All Rights Reserved
#
# Permission to use, copy, modify, and distribute this software and
# its documentation for any purpose and without fee is hereby
# granted, provided that the above copyright notice appear in all
# copies and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of Sam
# Rushing not be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior
# permission.
#
# SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
# INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
# NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
# OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
# NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
# CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
# ======================================================================
r"""A class supporting chat-style (command/response) protocols.
This class adds support for 'chat' style protocols - where one side
sends a 'command', and the other sends a response (examples would be
the common internet protocols - smtp, nntp, ftp, etc.).
The handle_read() method looks at the input stream for the current
'terminator' (usually '\r\n' for single-line responses, '\r\n.\r\n'
for multi-line output), calling self.found_terminator() on its
receipt.
For example:
Say you build an async nntp client using this class. At the start
of the connection, you'll have self.terminator set to '\r\n', in
order to process the single-line greeting. Just before issuing a
'LIST' command you'll set it to '\r\n.\r\n'. The output of the LIST
command will be accumulated (using your own 'collect_incoming_data'
method) up to the terminator, and then control will be returned to
you - by calling your self.found_terminator() method.
"""
import asyncore
from collections import deque
class async_chat(asyncore.dispatcher):
"""This is an abstract class. You must derive from this class, and add
the two methods collect_incoming_data() and found_terminator()"""
# these are overridable defaults
ac_in_buffer_size = 65536
ac_out_buffer_size = 65536
# we don't want to enable the use of encoding by default, because that is a
# sign of an application bug that we don't want to pass silently
use_encoding = 0
encoding = 'latin-1'
def __init__(self, sock=None, map=None):
# for string terminator matching
self.ac_in_buffer = b''
# we use a list here rather than io.BytesIO for a few reasons...
# del lst[:] is faster than bio.truncate(0)
# lst = [] is faster than bio.truncate(0)
self.incoming = []
# we toss the use of the "simple producer" and replace it with
# a pure deque, which the original fifo was a wrapping of
self.producer_fifo = deque()
asyncore.dispatcher.__init__(self, sock, map)
def collect_incoming_data(self, data):
raise NotImplementedError("must be implemented in subclass")
def _collect_incoming_data(self, data):
self.incoming.append(data)
def _get_data(self):
d = b''.join(self.incoming)
del self.incoming[:]
return d
def found_terminator(self):
raise NotImplementedError("must be implemented in subclass")
def set_terminator(self, term):
"""Set the input delimiter.
Can be a fixed string of any length, an integer, or None.
"""
if isinstance(term, str) and self.use_encoding:
term = bytes(term, self.encoding)
elif isinstance(term, int) and term < 0:
raise ValueError('the number of received bytes must be positive')
self.terminator = term
def get_terminator(self):
return self.terminator
# grab some more data from the socket,
# throw it to the collector method,
# check for the terminator,
# if found, transition to the next state.
def handle_read(self):
try:
data = self.recv(self.ac_in_buffer_size)
except BlockingIOError:
return
except OSError as why:
self.handle_error()
return
if isinstance(data, str) and self.use_encoding:
            data = bytes(data, self.encoding)
self.ac_in_buffer = self.ac_in_buffer + data
# Continue to search for self.terminator in self.ac_in_buffer,
# while calling self.collect_incoming_data. The while loop
# is necessary because we might read several data+terminator
# combos with a single recv(4096).
while self.ac_in_buffer:
lb = len(self.ac_in_buffer)
terminator = self.get_terminator()
if not terminator:
# no terminator, collect it all
self.collect_incoming_data(self.ac_in_buffer)
self.ac_in_buffer = b''
elif isinstance(terminator, int):
# numeric terminator
n = terminator
if lb < n:
self.collect_incoming_data(self.ac_in_buffer)
self.ac_in_buffer = b''
self.terminator = self.terminator - lb
else:
self.collect_incoming_data(self.ac_in_buffer[:n])
self.ac_in_buffer = self.ac_in_buffer[n:]
self.terminator = 0
self.found_terminator()
else:
# 3 cases:
# 1) end of buffer matches terminator exactly:
# collect data, transition
# 2) end of buffer matches some prefix:
# collect data to the prefix
# 3) end of buffer does not match any prefix:
# collect data
terminator_len = len(terminator)
index = self.ac_in_buffer.find(terminator)
if index != -1:
# we found the terminator
if index > 0:
# don't bother reporting the empty string
# (source of subtle bugs)
self.collect_incoming_data(self.ac_in_buffer[:index])
self.ac_in_buffer = self.ac_in_buffer[index+terminator_len:]
# This does the Right Thing if the terminator
# is changed here.
self.found_terminator()
else:
# check for a prefix of the terminator
index = find_prefix_at_end(self.ac_in_buffer, terminator)
if index:
if index != lb:
# we found a prefix, collect up to the prefix
self.collect_incoming_data(self.ac_in_buffer[:-index])
self.ac_in_buffer = self.ac_in_buffer[-index:]
break
else:
# no prefix, collect it all
self.collect_incoming_data(self.ac_in_buffer)
self.ac_in_buffer = b''
def handle_write(self):
self.initiate_send()
def handle_close(self):
self.close()
def push(self, data):
if not isinstance(data, (bytes, bytearray, memoryview)):
            raise TypeError('data argument must be byte-ish (%r)'
                            % type(data))
sabs = self.ac_out_buffer_size
if len(data) > sabs:
for i in range(0, len(data), sabs):
self.producer_fifo.append(data[i:i+sabs])
else:
self.producer_fifo.append(data)
self.initiate_send()
def push_with_producer(self, producer):
self.producer_fifo.append(producer)
self.initiate_send()
def readable(self):
"predicate for inclusion in the readable for select()"
# cannot use the old predicate, it violates the claim of the
# set_terminator method.
# return (len(self.ac_in_buffer) <= self.ac_in_buffer_size)
return 1
def writable(self):
"predicate for inclusion in the writable for select()"
return self.producer_fifo or (not self.connected)
def close_when_done(self):
"automatically close this channel once the outgoing queue is empty"
self.producer_fifo.append(None)
def initiate_send(self):
while self.producer_fifo and self.connected:
first = self.producer_fifo[0]
# handle empty string/buffer or None entry
if not first:
del self.producer_fifo[0]
if first is None:
self.handle_close()
return
# handle classic producer behavior
obs = self.ac_out_buffer_size
try:
data = first[:obs]
except TypeError:
data = first.more()
if data:
self.producer_fifo.appendleft(data)
else:
del self.producer_fifo[0]
continue
if isinstance(data, str) and self.use_encoding:
data = bytes(data, self.encoding)
# send the data
try:
num_sent = self.send(data)
except OSError:
self.handle_error()
return
if num_sent:
if num_sent < len(data) or obs < len(first):
self.producer_fifo[0] = first[num_sent:]
else:
del self.producer_fifo[0]
# we tried to send some actual data
return
def discard_buffers(self):
# Emergencies only!
self.ac_in_buffer = b''
del self.incoming[:]
self.producer_fifo.clear()
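# Editor's note: a minimal illustrative subclass (hypothetical line_echoer
# name), showing the two methods every async_chat subclass must provide:
#
#     class line_echoer(async_chat):
#         def __init__(self, sock):
#             async_chat.__init__(self, sock)
#             self.set_terminator(b'\r\n')    # single-line protocol
#             self.buffer = []
#         def collect_incoming_data(self, data):
#             self.buffer.append(data)        # accumulate between terminators
#         def found_terminator(self):
#             line = b''.join(self.buffer)
#             self.buffer = []
#             self.push(line + b'\r\n')       # echo the completed line back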
class simple_producer:
def __init__(self, data, buffer_size=512):
self.data = data
self.buffer_size = buffer_size
def more(self):
if len(self.data) > self.buffer_size:
result = self.data[:self.buffer_size]
self.data = self.data[self.buffer_size:]
return result
else:
result = self.data
self.data = b''
return result
class fifo:
def __init__(self, list=None):
if not list:
self.list = deque()
else:
self.list = deque(list)
def __len__(self):
return len(self.list)
def is_empty(self):
return not self.list
def first(self):
return self.list[0]
def push(self, data):
self.list.append(data)
def pop(self):
if self.list:
return (1, self.list.popleft())
else:
return (0, None)
# Given 'haystack', see if any prefix of 'needle' is at its end. This
# assumes an exact match has already been checked. Return the number of
# characters matched.
# for example:
# f_p_a_e("qwerty\r", "\r\n") => 1
# f_p_a_e("qwertydkjf", "\r\n") => 0
# f_p_a_e("qwerty\r\n", "\r\n") => <undefined>
# this could maybe be made faster with a computed regex?
# [answer: no; circa Python-2.0, Jan 2001]
# new python: 28961/s
# old python: 18307/s
# re: 12820/s
# regex: 14035/s
def find_prefix_at_end(haystack, needle):
l = len(needle) - 1
while l and not haystack.endswith(needle[:l]):
l -= 1
return l
# -*- Mode: Python -*-
# Id: asyncore.py,v 2.51 2000/09/07 22:29:26 rushing Exp
# Author: Sam Rushing <[email protected]>
# ======================================================================
# Copyright 1996 by Sam Rushing
#
# All Rights Reserved
#
# Permission to use, copy, modify, and distribute this software and
# its documentation for any purpose and without fee is hereby
# granted, provided that the above copyright notice appear in all
# copies and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of Sam
# Rushing not be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior
# permission.
#
# SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
# INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
# NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
# OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
# NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
# CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
# ======================================================================
"""Basic infrastructure for asynchronous socket service clients and servers.
There are only two ways to have a program on a single processor do "more
than one thing at a time". Multi-threaded programming is the simplest and
most popular way to do it, but there is another very different technique,
that lets you have nearly all the advantages of multi-threading, without
actually using multiple threads. It's really only practical if your program
is largely I/O bound. If your program is CPU bound, then pre-emptive
scheduled threads are probably what you really need. Network servers are
rarely CPU-bound, however.
If your operating system supports the select() system call in its I/O
library (and nearly all do), then you can use it to juggle multiple
communication channels at once; doing other work while your I/O is taking
place in the "background." Although this strategy can seem strange and
complex, especially at first, it is in many ways easier to understand and
control than multi-threaded programming. The module documented here solves
many of the difficult problems for you, making the task of building
sophisticated high-performance network servers and clients a snap.
"""
import select
import socket
import sys
import time
import warnings
import os
from errno import EALREADY, EINPROGRESS, EWOULDBLOCK, ECONNRESET, EINVAL, \
ENOTCONN, ESHUTDOWN, EISCONN, EBADF, ECONNABORTED, EPIPE, EAGAIN, \
errorcode
_DISCONNECTED = frozenset((ECONNRESET, ENOTCONN, ESHUTDOWN, ECONNABORTED, EPIPE,
EBADF))
try:
socket_map
except NameError:
socket_map = {}
def _strerror(err):
try:
return os.strerror(err)
except (ValueError, OverflowError, NameError):
if err in errorcode:
return errorcode[err]
return "Unknown error %s" %err
class ExitNow(Exception):
pass
_reraised_exceptions = (ExitNow, KeyboardInterrupt, SystemExit)
def read(obj):
try:
obj.handle_read_event()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def write(obj):
try:
obj.handle_write_event()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def _exception(obj):
try:
obj.handle_expt_event()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def readwrite(obj, flags):
try:
if flags & select.POLLIN:
obj.handle_read_event()
if flags & select.POLLOUT:
obj.handle_write_event()
if flags & select.POLLPRI:
obj.handle_expt_event()
if flags & (select.POLLHUP | select.POLLERR | select.POLLNVAL):
obj.handle_close()
except OSError as e:
if e.args[0] not in _DISCONNECTED:
obj.handle_error()
else:
obj.handle_close()
except _reraised_exceptions:
raise
except:
obj.handle_error()
def poll(timeout=0.0, map=None):
if map is None:
map = socket_map
if map:
r = []; w = []; e = []
for fd, obj in list(map.items()):
is_r = obj.readable()
is_w = obj.writable()
if is_r:
r.append(fd)
# accepting sockets should not be writable
if is_w and not obj.accepting:
w.append(fd)
if is_r or is_w:
e.append(fd)
if [] == r == w == e:
time.sleep(timeout)
return
try:
r, w, e = select.select(r, w, e, timeout)
except InterruptedError:
return
for fd in r:
obj = map.get(fd)
if obj is None:
continue
read(obj)
for fd in w:
obj = map.get(fd)
if obj is None:
continue
write(obj)
for fd in e:
obj = map.get(fd)
if obj is None:
continue
_exception(obj)
def poll2(timeout=0.0, map=None):
# Use the poll() support added to the select module in Python 2.0
if map is None:
map = socket_map
if timeout is not None:
# timeout is in milliseconds
timeout = int(timeout*1000)
pollster = select.poll()
if map:
for fd, obj in list(map.items()):
flags = 0
if obj.readable():
flags |= select.POLLIN | select.POLLPRI
# accepting sockets should not be writable
if obj.writable() and not obj.accepting:
flags |= select.POLLOUT
if flags:
pollster.register(fd, flags)
try:
r = pollster.poll(timeout)
except InterruptedError:
r = []
for fd, flags in r:
obj = map.get(fd)
if obj is None:
continue
readwrite(obj, flags)
poll3 = poll2 # Alias for backward compatibility
def loop(timeout=30.0, use_poll=False, map=None, count=None):
if map is None:
map = socket_map
if use_poll and hasattr(select, 'poll'):
poll_fun = poll2
else:
poll_fun = poll
if count is None:
while map:
poll_fun(timeout, map)
else:
while map and count > 0:
poll_fun(timeout, map)
count = count - 1
class dispatcher:
debug = False
connected = False
accepting = False
connecting = False
closing = False
addr = None
ignore_log_types = frozenset(['warning'])
def __init__(self, sock=None, map=None):
if map is None:
self._map = socket_map
else:
self._map = map
self._fileno = None
if sock:
# Set to nonblocking just to make sure for cases where we
# get a socket from a blocking source.
sock.setblocking(0)
self.set_socket(sock, map)
self.connected = True
# The constructor no longer requires that the socket
# passed be connected.
try:
self.addr = sock.getpeername()
except OSError as err:
if err.args[0] in (ENOTCONN, EINVAL):
# To handle the case where we got an unconnected
# socket.
self.connected = False
else:
# The socket is broken in some unknown way, alert
# the user and remove it from the map (to prevent
# polling of broken sockets).
self.del_channel(map)
raise
else:
self.socket = None
def __repr__(self):
status = [self.__class__.__module__+"."+self.__class__.__name__]
if self.accepting and self.addr:
status.append('listening')
elif self.connected:
status.append('connected')
if self.addr is not None:
try:
status.append('%s:%d' % self.addr)
except TypeError:
status.append(repr(self.addr))
return '<%s at %#x>' % (' '.join(status), id(self))
__str__ = __repr__
def add_channel(self, map=None):
#self.log_info('adding channel %s' % self)
if map is None:
map = self._map
map[self._fileno] = self
def del_channel(self, map=None):
fd = self._fileno
if map is None:
map = self._map
if fd in map:
#self.log_info('closing channel %d:%s' % (fd, self))
del map[fd]
self._fileno = None
def create_socket(self, family=socket.AF_INET, type=socket.SOCK_STREAM):
self.family_and_type = family, type
sock = socket.socket(family, type)
sock.setblocking(0)
self.set_socket(sock)
def set_socket(self, sock, map=None):
self.socket = sock
## self.__dict__['socket'] = sock
self._fileno = sock.fileno()
self.add_channel(map)
def set_reuse_addr(self):
# try to re-use a server port if possible
try:
self.socket.setsockopt(
socket.SOL_SOCKET, socket.SO_REUSEADDR,
self.socket.getsockopt(socket.SOL_SOCKET,
socket.SO_REUSEADDR) | 1
)
except OSError:
pass
# ==================================================
# predicates for select()
# these are used as filters for the lists of sockets
# to pass to select().
# ==================================================
def readable(self):
return True
def writable(self):
return True
# ==================================================
# socket object methods.
# ==================================================
def listen(self, num):
self.accepting = True
if os.name == 'nt' and num > 5:
num = 5
return self.socket.listen(num)
def bind(self, addr):
self.addr = addr
return self.socket.bind(addr)
def connect(self, address):
self.connected = False
self.connecting = True
err = self.socket.connect_ex(address)
if err in (EINPROGRESS, EALREADY, EWOULDBLOCK) \
or err == EINVAL and os.name in ('nt', 'ce'):
self.addr = address
return
if err in (0, EISCONN):
self.addr = address
self.handle_connect_event()
else:
raise OSError(err, errorcode[err])
def accept(self):
# XXX can return either an address pair or None
try:
conn, addr = self.socket.accept()
except TypeError:
return None
except OSError as why:
if why.args[0] in (EWOULDBLOCK, ECONNABORTED, EAGAIN):
return None
else:
raise
else:
return conn, addr
def send(self, data):
try:
result = self.socket.send(data)
return result
except OSError as why:
if why.args[0] == EWOULDBLOCK:
return 0
elif why.args[0] in _DISCONNECTED:
self.handle_close()
return 0
else:
raise
def recv(self, buffer_size):
try:
data = self.socket.recv(buffer_size)
if not data:
# a closed connection is indicated by signaling
# a read condition, and having recv() return 0.
self.handle_close()
return b''
else:
return data
except OSError as why:
# winsock sometimes raises ENOTCONN
if why.args[0] in _DISCONNECTED:
self.handle_close()
return b''
else:
raise
def close(self):
self.connected = False
self.accepting = False
self.connecting = False
self.del_channel()
if self.socket is not None:
try:
self.socket.close()
except OSError as why:
if why.args[0] not in (ENOTCONN, EBADF):
raise
# cheap inheritance, used to pass all other attribute
# references to the underlying socket object.
def __getattr__(self, attr):
try:
retattr = getattr(self.socket, attr)
except AttributeError:
raise AttributeError("%s instance has no attribute '%s'"
%(self.__class__.__name__, attr))
else:
msg = "%(me)s.%(attr)s is deprecated; use %(me)s.socket.%(attr)s " \
"instead" % {'me' : self.__class__.__name__, 'attr' : attr}
warnings.warn(msg, DeprecationWarning, stacklevel=2)
return retattr
# log and log_info may be overridden to provide more sophisticated
# logging and warning methods. In general, log is for 'hit' logging
# and 'log_info' is for informational, warning and error logging.
def log(self, message):
sys.stderr.write('log: %s\n' % str(message))
def log_info(self, message, type='info'):
if type not in self.ignore_log_types:
print('%s: %s' % (type, message))
def handle_read_event(self):
if self.accepting:
# accepting sockets are never connected, they "spawn" new
# sockets that are connected
self.handle_accept()
elif not self.connected:
if self.connecting:
self.handle_connect_event()
self.handle_read()
else:
self.handle_read()
def handle_connect_event(self):
err = self.socket.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
if err != 0:
raise OSError(err, _strerror(err))
self.handle_connect()
self.connected = True
self.connecting = False
def handle_write_event(self):
if self.accepting:
# Accepting sockets shouldn't get a write event.
# We will pretend it didn't happen.
return
if not self.connected:
if self.connecting:
self.handle_connect_event()
self.handle_write()
def handle_expt_event(self):
# handle_expt_event() is called if there might be an error on the
# socket, or if there is OOB data
# check for the error condition first
err = self.socket.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
if err != 0:
# we can get here when select.select() says that there is an
# exceptional condition on the socket
# since there is an error, we'll go ahead and close the socket
# like we would in a subclassed handle_read() that received no
# data
self.handle_close()
else:
self.handle_expt()
def handle_error(self):
nil, t, v, tbinfo = compact_traceback()
# sometimes a user repr method will crash.
try:
self_repr = repr(self)
except:
self_repr = '<__repr__(self) failed for object at %0x>' % id(self)
self.log_info(
'uncaptured python exception, closing channel %s (%s:%s %s)' % (
self_repr,
t,
v,
tbinfo
),
'error'
)
self.handle_close()
def handle_expt(self):
self.log_info('unhandled incoming priority event', 'warning')
def handle_read(self):
self.log_info('unhandled read event', 'warning')
def handle_write(self):
self.log_info('unhandled write event', 'warning')
def handle_connect(self):
self.log_info('unhandled connect event', 'warning')
def handle_accept(self):
pair = self.accept()
if pair is not None:
self.handle_accepted(*pair)
def handle_accepted(self, sock, addr):
sock.close()
self.log_info('unhandled accepted event', 'warning')
def handle_close(self):
self.log_info('unhandled close event', 'warning')
self.close()
# ---------------------------------------------------------------------------
# adds simple buffered output capability, useful for simple clients.
# [for more sophisticated usage use asynchat.async_chat]
# ---------------------------------------------------------------------------
class dispatcher_with_send(dispatcher):
def __init__(self, sock=None, map=None):
dispatcher.__init__(self, sock, map)
self.out_buffer = b''
def initiate_send(self):
        num_sent = dispatcher.send(self, self.out_buffer[:65536])
self.out_buffer = self.out_buffer[num_sent:]
def handle_write(self):
self.initiate_send()
def writable(self):
return (not self.connected) or len(self.out_buffer)
def send(self, data):
if self.debug:
self.log_info('sending %s' % repr(data))
self.out_buffer = self.out_buffer + data
self.initiate_send()
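# Editor's note: a minimal illustrative echo server (hypothetical names) built
# from these pieces; a listening dispatcher hands each accepted connection to
# a dispatcher_with_send, and loop() drives every registered channel:
#
#     class echo_handler(dispatcher_with_send):
#         def handle_read(self):
#             data = self.recv(8192)
#             if data:
#                 self.send(data)             # echo back whatever arrived
#
#     class echo_server(dispatcher):
#         def __init__(self, host, port):
#             dispatcher.__init__(self)
#             self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
#             self.set_reuse_addr()
#             self.bind((host, port))
#             self.listen(5)
#         def handle_accepted(self, sock, addr):
#             echo_handler(sock)              # registers itself in socket_map
#
#     # echo_server('localhost', 8007); loop()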
# ---------------------------------------------------------------------------
# used for debugging.
# ---------------------------------------------------------------------------
def compact_traceback():
t, v, tb = sys.exc_info()
tbinfo = []
if not tb: # Must have a traceback
raise AssertionError("traceback does not exist")
while tb:
tbinfo.append((
tb.tb_frame.f_code.co_filename,
tb.tb_frame.f_code.co_name,
str(tb.tb_lineno)
))
tb = tb.tb_next
# just to be safe
del tb
file, function, line = tbinfo[-1]
info = ' '.join(['[%s|%s|%s]' % x for x in tbinfo])
return (file, function, line), t, v, info
def close_all(map=None, ignore_all=False):
if map is None:
map = socket_map
for x in list(map.values()):
try:
x.close()
except OSError as x:
if x.args[0] == EBADF:
pass
elif not ignore_all:
raise
except _reraised_exceptions:
raise
except:
if not ignore_all:
raise
map.clear()
# Asynchronous File I/O:
#
# After a little research (reading man pages on various unixen, and
# digging through the linux kernel), I've determined that select()
# isn't meant for doing asynchronous file i/o.
# Heartening, though - reading linux/mm/filemap.c shows that linux
# supports asynchronous read-ahead. So _MOST_ of the time, the data
# will be sitting in memory for us already when we go to read it.
#
# What other OS's (besides NT) support async file i/o? [VMS?]
#
# Regardless, this is useful for pipes, and stdin/stdout...
if os.name == 'posix':
import fcntl
class file_wrapper:
# Here we override just enough to make a file
# look like a socket for the purposes of asyncore.
# The passed fd is automatically os.dup()'d
def __init__(self, fd):
self.fd = os.dup(fd)
def __del__(self):
if self.fd >= 0:
warnings.warn("unclosed file %r" % self, ResourceWarning)
self.close()
def recv(self, *args):
return os.read(self.fd, *args)
def send(self, *args):
return os.write(self.fd, *args)
def getsockopt(self, level, optname, buflen=None):
if (level == socket.SOL_SOCKET and
optname == socket.SO_ERROR and
not buflen):
return 0
raise NotImplementedError("Only asyncore specific behaviour "
"implemented.")
read = recv
write = send
def close(self):
if self.fd < 0:
return
os.close(self.fd)
self.fd = -1
def fileno(self):
return self.fd
class file_dispatcher(dispatcher):
def __init__(self, fd, map=None):
dispatcher.__init__(self, None, map)
self.connected = True
try:
fd = fd.fileno()
except AttributeError:
pass
self.set_file(fd)
# set it to non-blocking mode
flags = fcntl.fcntl(fd, fcntl.F_GETFL, 0)
flags = flags | os.O_NONBLOCK
fcntl.fcntl(fd, fcntl.F_SETFL, flags)
def set_file(self, fd):
self.socket = file_wrapper(fd)
self._fileno = self.socket.fileno()
self.add_channel()
#! /usr/bin/env python3
"""Base16, Base32, Base64 (RFC 3548), Base85 and Ascii85 data encodings"""
# Modified 04-Oct-1995 by Jack Jansen to use binascii module
# Modified 30-Dec-2003 by Barry Warsaw to add full RFC 3548 support
# Modified 22-May-2007 by Guido van Rossum to use bytes everywhere
import re
import struct
import binascii
__all__ = [
# Legacy interface exports traditional RFC 1521 Base64 encodings
'encode', 'decode', 'encodebytes', 'decodebytes',
# Generalized interface for other encodings
'b64encode', 'b64decode', 'b32encode', 'b32decode',
'b16encode', 'b16decode',
# Base85 and Ascii85 encodings
'b85encode', 'b85decode', 'a85encode', 'a85decode',
# Standard Base64 encoding
'standard_b64encode', 'standard_b64decode',
    # Some common Base64 alternatives. As referenced by RFC 3548, see thread
# starting at:
#
# http://zgp.org/pipermail/p2p-hackers/2001-September/000316.html
'urlsafe_b64encode', 'urlsafe_b64decode',
]
bytes_types = (bytes, bytearray) # Types acceptable as binary data
def _bytes_from_decode_data(s):
if isinstance(s, str):
try:
return s.encode('ascii')
except UnicodeEncodeError:
raise ValueError('string argument should contain only ASCII characters')
if isinstance(s, bytes_types):
return s
try:
return memoryview(s).tobytes()
except TypeError:
raise TypeError("argument should be a bytes-like object or ASCII "
"string, not %r" % s.__class__.__name__) from None
# Base64 encoding/decoding uses binascii
def b64encode(s, altchars=None):
"""Encode a byte string using Base64.
s is the byte string to encode. Optional altchars must be a byte
string of length 2 which specifies an alternative alphabet for the
'+' and '/' characters. This allows an application to
e.g. generate url or filesystem safe Base64 strings.
The encoded byte string is returned.
"""
# Strip off the trailing newline
encoded = binascii.b2a_base64(s)[:-1]
if altchars is not None:
assert len(altchars) == 2, repr(altchars)
return encoded.translate(bytes.maketrans(b'+/', altchars))
return encoded
def b64decode(s, altchars=None, validate=False):
"""Decode a Base64 encoded byte string.
s is the byte string to decode. Optional altchars must be a
string of length 2 which specifies the alternative alphabet used
instead of the '+' and '/' characters.
The decoded string is returned. A binascii.Error is raised if s is
incorrectly padded.
If validate is False (the default), non-base64-alphabet characters are
discarded prior to the padding check. If validate is True,
non-base64-alphabet characters in the input result in a binascii.Error.
"""
s = _bytes_from_decode_data(s)
if altchars is not None:
altchars = _bytes_from_decode_data(altchars)
assert len(altchars) == 2, repr(altchars)
s = s.translate(bytes.maketrans(altchars, b'+/'))
if validate and not re.match(b'^[A-Za-z0-9+/]*={0,2}$', s):
raise binascii.Error('Non-base64 digit found')
return binascii.a2b_base64(s)
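# Editor's note: an illustrative round-trip sketch for the Base64 pair:
#
#     >>> b64encode(b'\xfb\xef\xbe')
#     b'++++'
#     >>> b64encode(b'\xfb\xef\xbe', altchars=b'-_')
#     b'----'
#     >>> b64decode(b64encode(b'data to be encoded'))
#     b'data to be encoded'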
def standard_b64encode(s):
"""Encode a byte string using the standard Base64 alphabet.
s is the byte string to encode. The encoded byte string is returned.
"""
return b64encode(s)
def standard_b64decode(s):
"""Decode a byte string encoded with the standard Base64 alphabet.
s is the byte string to decode. The decoded byte string is
returned. binascii.Error is raised if the input is incorrectly
padded or if there are non-alphabet characters present in the
input.
"""
return b64decode(s)
_urlsafe_encode_translation = bytes.maketrans(b'+/', b'-_')
_urlsafe_decode_translation = bytes.maketrans(b'-_', b'+/')
def urlsafe_b64encode(s):
"""Encode a byte string using a url-safe Base64 alphabet.
s is the byte string to encode. The encoded byte string is
returned. The alphabet uses '-' instead of '+' and '_' instead of
'/'.
"""
return b64encode(s).translate(_urlsafe_encode_translation)
def urlsafe_b64decode(s):
"""Decode a byte string encoded with the standard Base64 alphabet.
s is the byte string to decode. The decoded byte string is
returned. binascii.Error is raised if the input is incorrectly
padded or if there are non-alphabet characters present in the
input.
The alphabet uses '-' instead of '+' and '_' instead of '/'.
"""
s = _bytes_from_decode_data(s)
s = s.translate(_urlsafe_decode_translation)
return b64decode(s)
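# Editor's note: an illustrative sketch of the url-safe variant; bytes that
# would encode to '+' or '/' come out as '-' and '_' instead:
#
#     >>> urlsafe_b64encode(b'\xfb\xef\xbe')
#     b'----'
#     >>> urlsafe_b64decode(b'----')
#     b'\xfb\xef\xbe'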
# Base32 encoding/decoding must be done in Python
_b32alphabet = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'
_b32tab2 = None
_b32rev = None
def b32encode(s):
"""Encode a byte string using Base32.
s is the byte string to encode. The encoded byte string is returned.
"""
global _b32tab2
# Delay the initialization of the table to not waste memory
# if the function is never called
if _b32tab2 is None:
b32tab = [bytes((i,)) for i in _b32alphabet]
_b32tab2 = [a + b for a in b32tab for b in b32tab]
b32tab = None
if not isinstance(s, bytes_types):
s = memoryview(s).tobytes()
leftover = len(s) % 5
# Pad the last quantum with zero bits if necessary
if leftover:
s = s + bytes(5 - leftover) # Don't use += !
encoded = bytearray()
from_bytes = int.from_bytes
b32tab2 = _b32tab2
for i in range(0, len(s), 5):
c = from_bytes(s[i: i + 5], 'big')
encoded += (b32tab2[c >> 30] + # bits 1 - 10
b32tab2[(c >> 20) & 0x3ff] + # bits 11 - 20
b32tab2[(c >> 10) & 0x3ff] + # bits 21 - 30
b32tab2[c & 0x3ff] # bits 31 - 40
)
# Adjust for any leftover partial quanta
if leftover == 1:
encoded[-6:] = b'======'
elif leftover == 2:
encoded[-4:] = b'===='
elif leftover == 3:
encoded[-3:] = b'==='
elif leftover == 4:
encoded[-1:] = b'='
return bytes(encoded)
def b32decode(s, casefold=False, map01=None):
"""Decode a Base32 encoded byte string.
s is the byte string to decode. Optional casefold is a flag
specifying whether a lowercase alphabet is acceptable as input.
For security purposes, the default is False.
RFC 3548 allows for optional mapping of the digit 0 (zero) to the
letter O (oh), and for optional mapping of the digit 1 (one) to
either the letter I (eye) or letter L (el). The optional argument
map01 when not None, specifies which letter the digit 1 should be
mapped to (when map01 is not None, the digit 0 is always mapped to
the letter O). For security purposes the default is None, so that
0 and 1 are not allowed in the input.
The decoded byte string is returned. binascii.Error is raised if
the input is incorrectly padded or if there are non-alphabet
characters present in the input.
"""
global _b32rev
# Delay the initialization of the table to not waste memory
# if the function is never called
if _b32rev is None:
_b32rev = {v: k for k, v in enumerate(_b32alphabet)}
s = _bytes_from_decode_data(s)
if len(s) % 8:
raise binascii.Error('Incorrect padding')
# Handle section 2.4 zero and one mapping. The flag map01 will be either
# False, or the character to map the digit 1 (one) to. It should be
# either L (el) or I (eye).
if map01 is not None:
map01 = _bytes_from_decode_data(map01)
assert len(map01) == 1, repr(map01)
s = s.translate(bytes.maketrans(b'01', b'O' + map01))
if casefold:
s = s.upper()
# Strip off pad characters from the right. We need to count the pad
# characters because this will tell us how many null bytes to remove from
# the end of the decoded string.
l = len(s)
s = s.rstrip(b'=')
padchars = l - len(s)
# Now decode the full quanta
decoded = bytearray()
b32rev = _b32rev
for i in range(0, len(s), 8):
quanta = s[i: i + 8]
acc = 0
try:
for c in quanta:
acc = (acc << 5) + b32rev[c]
except KeyError:
raise binascii.Error('Non-base32 digit found') from None
decoded += acc.to_bytes(5, 'big')
# Process the last, partial quanta
if padchars:
acc <<= 5 * padchars
last = acc.to_bytes(5, 'big')
if padchars == 1:
decoded[-5:] = last[:-1]
elif padchars == 3:
decoded[-5:] = last[:-2]
elif padchars == 4:
decoded[-5:] = last[:-3]
elif padchars == 6:
decoded[-5:] = last[:-4]
else:
raise binascii.Error('Incorrect padding')
return bytes(decoded)
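# Editor's note: an illustrative Base32 round trip; note the heavy '='
# padding, since Base32 works in 40-bit (5-byte) quanta:
#
#     >>> b32encode(b'hi')
#     b'NBUQ===='
#     >>> b32decode(b'nbuq====', casefold=True)
#     b'hi'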
# RFC 3548, Base 16 Alphabet specifies uppercase, but hexlify() returns
# lowercase. The RFC also recommends against accepting input case
# insensitively.
def b16encode(s):
"""Encode a byte string using Base16.
s is the byte string to encode. The encoded byte string is returned.
"""
return binascii.hexlify(s).upper()
def b16decode(s, casefold=False):
"""Decode a Base16 encoded byte string.
s is the byte string to decode. Optional casefold is a flag
specifying whether a lowercase alphabet is acceptable as input.
For security purposes, the default is False.
The decoded byte string is returned. binascii.Error is raised if
s were incorrectly padded or if there are non-alphabet characters
present in the string.
"""
s = _bytes_from_decode_data(s)
if casefold:
s = s.upper()
if re.search(b'[^0-9A-F]', s):
raise binascii.Error('Non-base16 digit found')
return binascii.unhexlify(s)
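# Editor's note: an illustrative Base16 sketch; encoding is just uppercase
# hex, and lowercase input is only accepted with casefold=True:
#
#     >>> b16encode(b'\x01\xab')
#     b'01AB'
#     >>> b16decode(b'01ab', casefold=True)
#     b'\x01\xab'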
#
# Ascii85 encoding/decoding
#
_a85chars = None
_a85chars2 = None
_A85START = b"<~"
_A85END = b"~>"
def _85encode(b, chars, chars2, pad=False, foldnuls=False, foldspaces=False):
# Helper function for a85encode and b85encode
if not isinstance(b, bytes_types):
b = memoryview(b).tobytes()
padding = (-len(b)) % 4
if padding:
b = b + b'\0' * padding
words = struct.Struct('!%dI' % (len(b) // 4)).unpack(b)
chunks = [b'z' if foldnuls and not word else
b'y' if foldspaces and word == 0x20202020 else
(chars2[word // 614125] +
chars2[word // 85 % 7225] +
chars[word % 85])
for word in words]
if padding and not pad:
if chunks[-1] == b'z':
chunks[-1] = chars[0] * 5
chunks[-1] = chunks[-1][:-padding]
return b''.join(chunks)
def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):
"""Encode a byte string using Ascii85.
b is the byte string to encode. The encoded byte string is returned.
foldspaces is an optional flag that uses the special short sequence 'y'
instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
feature is not supported by the "standard" Adobe encoding.
wrapcol controls whether the output should have newline ('\\n') characters
added to it. If this is non-zero, each output line will be at most this
many characters long.
pad controls whether the input string is padded to a multiple of 4 before
encoding. Note that the btoa implementation always pads.
adobe controls whether the encoded byte sequence is framed with <~ and ~>,
which is used by the Adobe implementation.
"""
global _a85chars, _a85chars2
# Delay the initialization of tables to not waste memory
# if the function is never called
if _a85chars is None:
_a85chars = [bytes((i,)) for i in range(33, 118)]
_a85chars2 = [(a + b) for a in _a85chars for b in _a85chars]
result = _85encode(b, _a85chars, _a85chars2, pad, True, foldspaces)
if adobe:
result = _A85START + result
if wrapcol:
wrapcol = max(2 if adobe else 1, wrapcol)
chunks = [result[i: i + wrapcol]
for i in range(0, len(result), wrapcol)]
if adobe:
if len(chunks[-1]) + 2 > wrapcol:
chunks.append(b'')
result = b'\n'.join(chunks)
if adobe:
result += _A85END
return result
def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
"""Decode an Ascii85 encoded byte string.
    b is the byte string to decode.
foldspaces is a flag that specifies whether the 'y' short sequence should be
accepted as shorthand for 4 consecutive spaces (ASCII 0x20). This feature is
not supported by the "standard" Adobe encoding.
adobe controls whether the input sequence is in Adobe Ascii85 format (i.e.
is framed with <~ and ~>).
ignorechars should be a byte string containing characters to ignore from the
input. This should only contain whitespace characters, and by default
contains all whitespace characters in ASCII.
"""
b = _bytes_from_decode_data(b)
if adobe:
if not (b.startswith(_A85START) and b.endswith(_A85END)):
raise ValueError("Ascii85 encoded byte sequences must be bracketed "
"by {!r} and {!r}".format(_A85START, _A85END))
b = b[2:-2] # Strip off start/end markers
#
# We have to go through this stepwise, so as to ignore spaces and handle
# special short sequences
#
packI = struct.Struct('!I').pack
decoded = []
decoded_append = decoded.append
curr = []
curr_append = curr.append
curr_clear = curr.clear
for x in b + b'u' * 4:
if b'!'[0] <= x <= b'u'[0]:
curr_append(x)
if len(curr) == 5:
acc = 0
for x in curr:
acc = 85 * acc + (x - 33)
try:
decoded_append(packI(acc))
except struct.error:
raise ValueError('Ascii85 overflow') from None
curr_clear()
elif x == b'z'[0]:
if curr:
raise ValueError('z inside Ascii85 5-tuple')
decoded_append(b'\0\0\0\0')
elif foldspaces and x == b'y'[0]:
if curr:
raise ValueError('y inside Ascii85 5-tuple')
decoded_append(b'\x20\x20\x20\x20')
elif x in ignorechars:
# Skip whitespace
continue
else:
raise ValueError('Non-Ascii85 digit found: %c' % x)
result = b''.join(decoded)
padding = 4 - len(curr)
if padding:
# Throw away the extra padding
result = result[:-padding]
return result
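# Editor's note: an illustrative Ascii85 sketch; four zero bytes fold to 'z',
# and adobe=True adds the <~ ~> framing:
#
#     >>> a85encode(b'Man ')
#     b'9jqo^'
#     >>> a85encode(bytes(4))
#     b'z'
#     >>> a85decode(b'<~9jqo^~>', adobe=True)
#     b'Man '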
# The following code is originally taken (with permission) from Mercurial
_b85alphabet = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
_b85chars = None
_b85chars2 = None
_b85dec = None
def b85encode(b, pad=False):
"""Encode an ASCII-encoded byte array in base85 format.
If pad is true, the input is padded with "\\0" so its length is a multiple of
4 characters before encoding.
"""
global _b85chars, _b85chars2
# Delay the initialization of tables to not waste memory
# if the function is never called
if _b85chars is None:
_b85chars = [bytes((i,)) for i in _b85alphabet]
_b85chars2 = [(a + b) for a in _b85chars for b in _b85chars]
return _85encode(b, _b85chars, _b85chars2, pad)
def b85decode(b):
"""Decode base85-encoded byte array"""
global _b85dec
# Delay the initialization of tables to not waste memory
# if the function is never called
if _b85dec is None:
_b85dec = [None] * 256
for i, c in enumerate(_b85alphabet):
_b85dec[c] = i
b = _bytes_from_decode_data(b)
padding = (-len(b)) % 5
b = b + b'~' * padding
out = []
packI = struct.Struct('!I').pack
for i in range(0, len(b), 5):
chunk = b[i:i + 5]
acc = 0
try:
for c in chunk:
acc = acc * 85 + _b85dec[c]
except TypeError:
for j, c in enumerate(chunk):
if _b85dec[c] is None:
raise ValueError('bad base85 character at position %d'
% (i + j)) from None
raise
try:
out.append(packI(acc))
except struct.error:
raise ValueError('base85 overflow in hunk starting at byte %d'
% i) from None
result = b''.join(out)
if padding:
result = result[:-padding]
return result
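# Editor's note: an illustrative base85 round trip using the Mercurial-style
# alphabet defined above:
#
#     >>> b85encode(b'Man ')
#     b'O<`^z'
#     >>> b85decode(b'O<`^z')
#     b'Man '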
# Legacy interface. This code could be cleaned up since I don't believe
# binascii has any line length limitations. It just doesn't seem worth it
# though. The files should be opened in binary mode.
MAXLINESIZE = 76 # Excluding the CRLF
MAXBINSIZE = (MAXLINESIZE//4)*3
def encode(input, output):
"""Encode a file; input and output are binary files."""
while True:
s = input.read(MAXBINSIZE)
if not s:
break
while len(s) < MAXBINSIZE:
ns = input.read(MAXBINSIZE-len(s))
if not ns:
break
s += ns
line = binascii.b2a_base64(s)
output.write(line)
def decode(input, output):
"""Decode a file; input and output are binary files."""
while True:
line = input.readline()
if not line:
break
s = binascii.a2b_base64(line)
output.write(s)
def _input_type_check(s):
try:
m = memoryview(s)
except TypeError as err:
msg = "expected bytes-like object, not %s" % s.__class__.__name__
raise TypeError(msg) from err
if m.format not in ('c', 'b', 'B'):
msg = ("expected single byte elements, not %r from %s" %
(m.format, s.__class__.__name__))
raise TypeError(msg)
if m.ndim != 1:
msg = ("expected 1-D data, not %d-D data from %s" %
(m.ndim, s.__class__.__name__))
raise TypeError(msg)
def encodebytes(s):
"""Encode a bytestring into a bytestring containing multiple lines
of base-64 data."""
_input_type_check(s)
pieces = []
for i in range(0, len(s), MAXBINSIZE):
chunk = s[i : i + MAXBINSIZE]
pieces.append(binascii.b2a_base64(chunk))
return b"".join(pieces)
def encodestring(s):
"""Legacy alias of encodebytes()."""
import warnings
warnings.warn("encodestring() is a deprecated alias, use encodebytes()",
DeprecationWarning, 2)
return encodebytes(s)
def decodebytes(s):
"""Decode a bytestring of base-64 data into a bytestring."""
_input_type_check(s)
return binascii.a2b_base64(s)
def decodestring(s):
"""Legacy alias of decodebytes()."""
import warnings
warnings.warn("decodestring() is a deprecated alias, use decodebytes()",
DeprecationWarning, 2)
return decodebytes(s)
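# Editor's note: an illustrative sketch of the legacy interface; encodebytes()
# returns MIME-style output with a trailing newline:
#
#     >>> encodebytes(b'Aladdin:open sesame')
#     b'QWxhZGRpbjpvcGVuIHNlc2FtZQ==\n'
#     >>> decodebytes(b'QWxhZGRpbjpvcGVuIHNlc2FtZQ==\n')
#     b'Aladdin:open sesame'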
# Usable as a script...
def main():
"""Small main program"""
import sys, getopt
try:
opts, args = getopt.getopt(sys.argv[1:], 'deut')
except getopt.error as msg:
sys.stdout = sys.stderr
print(msg)
print("""usage: %s [-d|-e|-u|-t] [file|-]
-d, -u: decode
-e: encode (default)
-t: encode and decode string 'Aladdin:open sesame'"""%sys.argv[0])
sys.exit(2)
func = encode
for o, a in opts:
if o == '-e': func = encode
if o == '-d': func = decode
if o == '-u': func = decode
if o == '-t': test(); return
if args and args[0] != '-':
with open(args[0], 'rb') as f:
func(f, sys.stdout.buffer)
else:
func(sys.stdin.buffer, sys.stdout.buffer)
def test():
s0 = b"Aladdin:open sesame"
print(repr(s0))
s1 = encodebytes(s0)
print(repr(s1))
s2 = decodebytes(s1)
print(repr(s2))
assert s0 == s2
if __name__ == '__main__':
main()
"""Debugger basics"""
import fnmatch
import sys
import os
from inspect import CO_GENERATOR
__all__ = ["BdbQuit", "Bdb", "Breakpoint"]
class BdbQuit(Exception):
"""Exception to give up completely."""
class Bdb:
"""Generic Python debugger base class.
This class takes care of details of the trace facility;
a derived class should implement user interaction.
The standard debugger class (pdb.Pdb) is an example.
"""
def __init__(self, skip=None):
self.skip = set(skip) if skip else None
self.breaks = {}
self.fncache = {}
self.frame_returning = None
def canonic(self, filename):
if filename == "<" + filename[1:-1] + ">":
return filename
canonic = self.fncache.get(filename)
if not canonic:
canonic = os.path.abspath(filename)
canonic = os.path.normcase(canonic)
self.fncache[filename] = canonic
return canonic
def reset(self):
import linecache
linecache.checkcache()
self.botframe = None
self._set_stopinfo(None, None)
def trace_dispatch(self, frame, event, arg):
if self.quitting:
return # None
if event == 'line':
return self.dispatch_line(frame)
if event == 'call':
return self.dispatch_call(frame, arg)
if event == 'return':
return self.dispatch_return(frame, arg)
if event == 'exception':
return self.dispatch_exception(frame, arg)
if event == 'c_call':
return self.trace_dispatch
if event == 'c_exception':
return self.trace_dispatch
if event == 'c_return':
return self.trace_dispatch
print('bdb.Bdb.dispatch: unknown debugging event:', repr(event))
return self.trace_dispatch
def dispatch_line(self, frame):
if self.stop_here(frame) or self.break_here(frame):
self.user_line(frame)
if self.quitting: raise BdbQuit
return self.trace_dispatch
def dispatch_call(self, frame, arg):
# XXX 'arg' is no longer used
if self.botframe is None:
# First call of dispatch since reset()
self.botframe = frame.f_back # (CT) Note that this may also be None!
return self.trace_dispatch
if not (self.stop_here(frame) or self.break_anywhere(frame)):
# No need to trace this function
return # None
# Ignore call events in generator except when stepping.
if self.stopframe and frame.f_code.co_flags & CO_GENERATOR:
return self.trace_dispatch
self.user_call(frame, arg)
if self.quitting: raise BdbQuit
return self.trace_dispatch
def dispatch_return(self, frame, arg):
if self.stop_here(frame) or frame == self.returnframe:
# Ignore return events in generator except when stepping.
if self.stopframe and frame.f_code.co_flags & CO_GENERATOR:
return self.trace_dispatch
try:
self.frame_returning = frame
self.user_return(frame, arg)
finally:
self.frame_returning = None
if self.quitting: raise BdbQuit
# The user issued a 'next' or 'until' command.
if self.stopframe is frame and self.stoplineno != -1:
self._set_stopinfo(None, None)
return self.trace_dispatch
def dispatch_exception(self, frame, arg):
if self.stop_here(frame):
# When stepping with next/until/return in a generator frame, skip
# the internal StopIteration exception (with no traceback)
# triggered by a subiterator run with the 'yield from' statement.
if not (frame.f_code.co_flags & CO_GENERATOR
and arg[0] is StopIteration and arg[2] is None):
self.user_exception(frame, arg)
if self.quitting: raise BdbQuit
# Stop at the StopIteration or GeneratorExit exception when the user
# has set stopframe in a generator by issuing a return command, or a
# next/until command at the last statement in the generator before the
# exception.
elif (self.stopframe and frame is not self.stopframe
and self.stopframe.f_code.co_flags & CO_GENERATOR
and arg[0] in (StopIteration, GeneratorExit)):
self.user_exception(frame, arg)
if self.quitting: raise BdbQuit
return self.trace_dispatch
# Normally derived classes don't override the following
# methods, but they may if they want to redefine the
# definition of stopping and breakpoints.
def is_skipped_module(self, module_name):
for pattern in self.skip:
if fnmatch.fnmatch(module_name, pattern):
return True
return False
def stop_here(self, frame):
# (CT) stopframe may now also be None, see dispatch_call.
# (CT) the former test for None is therefore removed from here.
if self.skip and \
self.is_skipped_module(frame.f_globals.get('__name__')):
return False
if frame is self.stopframe:
if self.stoplineno == -1:
return False
return frame.f_lineno >= self.stoplineno
if not self.stopframe:
return True
return False
def break_here(self, frame):
filename = self.canonic(frame.f_code.co_filename)
if filename not in self.breaks:
return False
lineno = frame.f_lineno
if lineno not in self.breaks[filename]:
# The line itself has no breakpoint, but maybe the line is the
# first line of a function with breakpoint set by function name.
lineno = frame.f_code.co_firstlineno
if lineno not in self.breaks[filename]:
return False
# flag says ok to delete temp. bp
(bp, flag) = effective(filename, lineno, frame)
if bp:
self.currentbp = bp.number
if (flag and bp.temporary):
self.do_clear(str(bp.number))
return True
else:
return False
def do_clear(self, arg):
raise NotImplementedError("subclass of bdb must implement do_clear()")
def break_anywhere(self, frame):
return self.canonic(frame.f_code.co_filename) in self.breaks
# Derived classes should override the user_* methods
# to gain control.
def user_call(self, frame, argument_list):
"""This method is called when there is the remote possibility
that we ever need to stop in this function."""
pass
def user_line(self, frame):
"""This method is called when we stop or break at this line."""
pass
def user_return(self, frame, return_value):
"""This method is called when a return trap is set here."""
pass
def user_exception(self, frame, exc_info):
"""This method is called if an exception occurs,
but only if we are to stop at or just below this level."""
pass
def _set_stopinfo(self, stopframe, returnframe, stoplineno=0):
self.stopframe = stopframe
self.returnframe = returnframe
self.quitting = False
# stoplineno >= 0 means: stop at line >= the stoplineno
# stoplineno -1 means: don't stop at all
self.stoplineno = stoplineno
# Derived classes and clients can call the following methods
# to affect the stepping state.
def set_until(self, frame, lineno=None):
"""Stop when the line with the line no greater than the current one is
reached or when returning from current frame"""
# the name "until" is borrowed from gdb
if lineno is None:
lineno = frame.f_lineno + 1
self._set_stopinfo(frame, frame, lineno)
def set_step(self):
"""Stop after one line of code."""
# Issue #13183: pdb skips frames after hitting a breakpoint and running
# step commands.
# Restore the trace function in the caller (that may not have been set
# for performance reasons) when returning from the current frame.
if self.frame_returning:
caller_frame = self.frame_returning.f_back
if caller_frame and not caller_frame.f_trace:
caller_frame.f_trace = self.trace_dispatch
self._set_stopinfo(None, None)
def set_next(self, frame):
"""Stop on the next line in or below the given frame."""
self._set_stopinfo(frame, None)
def set_return(self, frame):
"""Stop when returning from the given frame."""
if frame.f_code.co_flags & CO_GENERATOR:
self._set_stopinfo(frame, None, -1)
else:
self._set_stopinfo(frame.f_back, frame)
def set_trace(self, frame=None):
"""Start debugging from `frame`.
If frame is not specified, debugging starts from caller's frame.
"""
if frame is None:
frame = sys._getframe().f_back
self.reset()
while frame:
frame.f_trace = self.trace_dispatch
self.botframe = frame
frame = frame.f_back
self.set_step()
sys.settrace(self.trace_dispatch)
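    # Editor's note: an illustrative subclass (hypothetical Tracer name);
    # derived classes override the user_* hooks to gain control:
    #
    #     class Tracer(Bdb):
    #         def user_line(self, frame):
    #             print('line %s:%d' % (frame.f_code.co_filename,
    #                                   frame.f_lineno))
    #             self.set_step()             # keep stopping at every line
    #
    #     # Tracer().run('x = 1 + 2') traces the statement line by line.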
def set_continue(self):
# Don't stop except at breakpoints or when finished
self._set_stopinfo(self.botframe, None, -1)
if not self.breaks:
# no breakpoints; run without debugger overhead
sys.settrace(None)
frame = sys._getframe().f_back
while frame and frame is not self.botframe:
del frame.f_trace
frame = frame.f_back
def set_quit(self):
self.stopframe = self.botframe
self.returnframe = None
self.quitting = True
sys.settrace(None)
# Derived classes and clients can call the following methods
# to manipulate breakpoints. These methods return an
    # error message if something went wrong, None if all is well.
# Set_break prints out the breakpoint line and file:lineno.
# Call self.get_*break*() to see the breakpoints or better
# for bp in Breakpoint.bpbynumber: if bp: bp.bpprint().
def set_break(self, filename, lineno, temporary=False, cond=None,
funcname=None):
filename = self.canonic(filename)
import linecache # Import as late as possible
line = linecache.getline(filename, lineno)
if not line:
return 'Line %s:%d does not exist' % (filename, lineno)
list = self.breaks.setdefault(filename, [])
if lineno not in list:
list.append(lineno)
bp = Breakpoint(filename, lineno, temporary, cond, funcname)
def _prune_breaks(self, filename, lineno):
if (filename, lineno) not in Breakpoint.bplist:
self.breaks[filename].remove(lineno)
if not self.breaks[filename]:
del self.breaks[filename]
def clear_break(self, filename, lineno):
filename = self.canonic(filename)
if filename not in self.breaks:
return 'There are no breakpoints in %s' % filename
if lineno not in self.breaks[filename]:
return 'There is no breakpoint at %s:%d' % (filename, lineno)
# If there's only one bp in the list for that file,line
# pair, then remove the breaks entry
for bp in Breakpoint.bplist[filename, lineno][:]:
bp.deleteMe()
self._prune_breaks(filename, lineno)
def clear_bpbynumber(self, arg):
try:
bp = self.get_bpbynumber(arg)
except ValueError as err:
return str(err)
bp.deleteMe()
self._prune_breaks(bp.file, bp.line)
def clear_all_file_breaks(self, filename):
filename = self.canonic(filename)
if filename not in self.breaks:
return 'There are no breakpoints in %s' % filename
for line in self.breaks[filename]:
blist = Breakpoint.bplist[filename, line]
for bp in blist:
bp.deleteMe()
del self.breaks[filename]
def clear_all_breaks(self):
if not self.breaks:
return 'There are no breakpoints'
for bp in Breakpoint.bpbynumber:
if bp:
bp.deleteMe()
self.breaks = {}
def get_bpbynumber(self, arg):
if not arg:
raise ValueError('Breakpoint number expected')
try:
number = int(arg)
except ValueError:
raise ValueError('Non-numeric breakpoint number %s' % arg)
try:
bp = Breakpoint.bpbynumber[number]
except IndexError:
raise ValueError('Breakpoint number %d out of range' % number)
if bp is None:
raise ValueError('Breakpoint %d already deleted' % number)
return bp
def get_break(self, filename, lineno):
filename = self.canonic(filename)
return filename in self.breaks and \
lineno in self.breaks[filename]
def get_breaks(self, filename, lineno):
filename = self.canonic(filename)
return filename in self.breaks and \
lineno in self.breaks[filename] and \
Breakpoint.bplist[filename, lineno] or []
def get_file_breaks(self, filename):
filename = self.canonic(filename)
if filename in self.breaks:
return self.breaks[filename]
else:
return []
def get_all_breaks(self):
return self.breaks
# Derived classes and clients can call the following method
# to get a data structure representing a stack trace.
def get_stack(self, f, t):
stack = []
if t and t.tb_frame is f:
t = t.tb_next
while f is not None:
stack.append((f, f.f_lineno))
if f is self.botframe:
break
f = f.f_back
stack.reverse()
i = max(0, len(stack) - 1)
while t is not None:
stack.append((t.tb_frame, t.tb_lineno))
t = t.tb_next
if f is None:
i = max(0, len(stack) - 1)
return stack, i
def format_stack_entry(self, frame_lineno, lprefix=': '):
import linecache, reprlib
frame, lineno = frame_lineno
filename = self.canonic(frame.f_code.co_filename)
s = '%s(%r)' % (filename, lineno)
if frame.f_code.co_name:
s += frame.f_code.co_name
else:
s += "<lambda>"
if '__args__' in frame.f_locals:
args = frame.f_locals['__args__']
else:
args = None
if args:
s += reprlib.repr(args)
else:
s += '()'
if '__return__' in frame.f_locals:
rv = frame.f_locals['__return__']
s += '->'
s += reprlib.repr(rv)
line = linecache.getline(filename, lineno, frame.f_globals)
if line:
s += lprefix + line.strip()
return s
# The following methods can be called by clients to use
# a debugger to debug a statement or an expression.
# Both can be given as a string, or a code object.
def run(self, cmd, globals=None, locals=None):
if globals is None:
import __main__
globals = __main__.__dict__
if locals is None:
locals = globals
self.reset()
if isinstance(cmd, str):
cmd = compile(cmd, "<string>", "exec")
sys.settrace(self.trace_dispatch)
try:
exec(cmd, globals, locals)
except BdbQuit:
pass
finally:
self.quitting = True
sys.settrace(None)
def runeval(self, expr, globals=None, locals=None):
if globals is None:
import __main__
globals = __main__.__dict__
if locals is None:
locals = globals
self.reset()
sys.settrace(self.trace_dispatch)
try:
return eval(expr, globals, locals)
except BdbQuit:
pass
finally:
self.quitting = True
sys.settrace(None)
def runctx(self, cmd, globals, locals):
# B/W compatibility
self.run(cmd, globals, locals)
# This method is more useful to debug a single function call.
def runcall(self, func, *args, **kwds):
self.reset()
sys.settrace(self.trace_dispatch)
res = None
try:
res = func(*args, **kwds)
except BdbQuit:
pass
finally:
self.quitting = True
sys.settrace(None)
return res
def set_trace():
Bdb().set_trace()
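# A minimal sketch of the client API described above: subclass Bdb, override
# user_line() to gain control at each stop, and drive it with run(). The
# subclass name and the traced statement are illustrative only.
#
#     class LineTracer(Bdb):
#         def user_line(self, frame):
#             print('stopped at line', frame.f_lineno)
#
#     LineTracer().run('x = 1\ny = x + 1')   # prints "stopped at line 1", then 2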
class Breakpoint:
"""Breakpoint class.
Implements temporary breakpoints, ignore counts, disabling and
(re)-enabling, and conditionals.
Breakpoints are indexed by number through bpbynumber and by
the file,line tuple using bplist. The former points to a
single instance of class Breakpoint. The latter points to a
list of such instances since there may be more than one
breakpoint per line.
"""
# XXX Keeping state in the class is a mistake -- this means
# you cannot have more than one active Bdb instance.
next = 1 # Next bp to be assigned
bplist = {} # indexed by (file, lineno) tuple
    bpbynumber = [None] # Each entry is None or an instance of Breakpoint
# index 0 is unused, except for marking an
# effective break .... see effective()
def __init__(self, file, line, temporary=False, cond=None, funcname=None):
self.funcname = funcname
# Needed if funcname is not None.
self.func_first_executable_line = None
self.file = file # This better be in canonical form!
self.line = line
self.temporary = temporary
self.cond = cond
self.enabled = True
self.ignore = 0
self.hits = 0
self.number = Breakpoint.next
Breakpoint.next += 1
# Build the two lists
self.bpbynumber.append(self)
if (file, line) in self.bplist:
self.bplist[file, line].append(self)
else:
self.bplist[file, line] = [self]
def deleteMe(self):
index = (self.file, self.line)
self.bpbynumber[self.number] = None # No longer in list
self.bplist[index].remove(self)
if not self.bplist[index]:
# No more bp for this f:l combo
del self.bplist[index]
def enable(self):
self.enabled = True
def disable(self):
self.enabled = False
def bpprint(self, out=None):
if out is None:
out = sys.stdout
print(self.bpformat(), file=out)
def bpformat(self):
if self.temporary:
disp = 'del '
else:
disp = 'keep '
if self.enabled:
disp = disp + 'yes '
else:
disp = disp + 'no '
ret = '%-4dbreakpoint %s at %s:%d' % (self.number, disp,
self.file, self.line)
if self.cond:
ret += '\n\tstop only if %s' % (self.cond,)
if self.ignore:
ret += '\n\tignore next %d hits' % (self.ignore,)
if self.hits:
if self.hits > 1:
ss = 's'
else:
ss = ''
ret += '\n\tbreakpoint already hit %d time%s' % (self.hits, ss)
return ret
def __str__(self):
return 'breakpoint %s at %s:%s' % (self.number, self.file, self.line)
# -----------end of Breakpoint class----------
def checkfuncname(b, frame):
"""Check whether we should break here because of `b.funcname`."""
if not b.funcname:
# Breakpoint was set via line number.
if b.line != frame.f_lineno:
# Breakpoint was set at a line with a def statement and the function
# defined is called: don't break.
return False
return True
# Breakpoint set via function name.
if frame.f_code.co_name != b.funcname:
# It's not a function call, but rather execution of def statement.
return False
# We are in the right frame.
if not b.func_first_executable_line:
# The function is entered for the 1st time.
b.func_first_executable_line = frame.f_lineno
if b.func_first_executable_line != frame.f_lineno:
# But we are not at the first line number: don't break.
return False
return True
# Determines if there is an effective (active) breakpoint at this
# line of code. Returns breakpoint number or 0 if none
def effective(file, line, frame):
"""Determine which breakpoint for this file:line is to be acted upon.
Called only if we know there is a bpt at this
location. Returns breakpoint that was triggered and a flag
that indicates if it is ok to delete a temporary bp.
"""
possibles = Breakpoint.bplist[file, line]
for b in possibles:
if not b.enabled:
continue
if not checkfuncname(b, frame):
continue
# Count every hit when bp is enabled
b.hits += 1
if not b.cond:
# If unconditional, and ignoring go on to next, else break
if b.ignore > 0:
b.ignore -= 1
continue
else:
# breakpoint and marker that it's ok to delete if temporary
return (b, True)
else:
# Conditional bp.
# Ignore count applies only to those bpt hits where the
# condition evaluates to true.
try:
val = eval(b.cond, frame.f_globals, frame.f_locals)
if val:
if b.ignore > 0:
b.ignore -= 1
# continue
else:
return (b, True)
# else:
# continue
except:
# if eval fails, most conservative thing is to stop on
# breakpoint regardless of ignore count. Don't delete
# temporary, as another hint to user.
return (b, False)
return (None, None)
# -------------------- testing --------------------
class Tdb(Bdb):
def user_call(self, frame, args):
name = frame.f_code.co_name
if not name: name = '???'
print('+++ call', name, args)
def user_line(self, frame):
import linecache
name = frame.f_code.co_name
if not name: name = '???'
fn = self.canonic(frame.f_code.co_filename)
line = linecache.getline(fn, frame.f_lineno, frame.f_globals)
print('+++', fn, frame.f_lineno, name, ':', line.strip())
def user_return(self, frame, retval):
print('+++ return', retval)
def user_exception(self, frame, exc_stuff):
print('+++ exception', exc_stuff)
self.set_continue()
def foo(n):
print('foo(', n, ')')
x = bar(n*10)
print('bar returned', x)
def bar(a):
print('bar(', a, ')')
return a/2
def test():
t = Tdb()
t.run('import bdb; bdb.foo(10)')
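# A sketch of the breakpoint API above. set_break() verifies (via linecache)
# that the target line exists, so the path below stands in for a real file;
# it returns an error string on failure and None on success.
#
#     d = Tdb()
#     err = d.set_break('/path/to/script.py', 12, cond='x > 10')
#     if err:
#         print(err)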
"""Macintosh binhex compression/decompression.
easy interface:
binhex(inputfilename, outputfilename)
hexbin(inputfilename, outputfilename)
"""
#
# Jack Jansen, CWI, August 1995.
#
# The module is supposed to be as compatible as possible. Especially the
# easy interface should work "as expected" on any platform.
# XXXX Note: currently, textfiles appear in mac-form on all platforms.
# We seem to lack a simple character-translate in python.
# (we should probably use ISO-Latin-1 on all but the mac platform).
# XXXX The simple routines are too simple: they expect to hold the complete
# files in-core. Should be fixed.
# XXXX It would be nice to handle AppleDouble format on unix
# (for servers serving macs).
# XXXX I don't understand what happens when you get 0x90 times the same byte on
# input. The resulting code (xx 90 90) would appear to be interpreted as an
# escaped *value* of 0x90. All coders I've seen appear to ignore this nicety...
#
import io
import os
import struct
import binascii
__all__ = ["binhex","hexbin","Error"]
class Error(Exception):
pass
# States (what have we written)
_DID_HEADER = 0
_DID_DATA = 1
# Various constants
REASONABLY_LARGE = 32768 # Minimal amount we pass the rle-coder
LINELEN = 64
RUNCHAR = b"\x90"
#
# This code is no longer byte-order dependent
class FInfo:
def __init__(self):
self.Type = '????'
self.Creator = '????'
self.Flags = 0
def getfileinfo(name):
finfo = FInfo()
with io.open(name, 'rb') as fp:
# Quick check for textfile
data = fp.read(512)
if 0 not in data:
finfo.Type = 'TEXT'
fp.seek(0, 2)
dsize = fp.tell()
dir, file = os.path.split(name)
file = file.replace(':', '-', 1)
return file, finfo, dsize, 0
class openrsrc:
def __init__(self, *args):
pass
def read(self, *args):
return b''
def write(self, *args):
pass
def close(self):
pass
class _Hqxcoderengine:
"""Write data to the coder in 3-byte chunks"""
def __init__(self, ofp):
self.ofp = ofp
self.data = b''
self.hqxdata = b''
self.linelen = LINELEN - 1
def write(self, data):
self.data = self.data + data
datalen = len(self.data)
todo = (datalen // 3) * 3
data = self.data[:todo]
self.data = self.data[todo:]
if not data:
return
self.hqxdata = self.hqxdata + binascii.b2a_hqx(data)
self._flush(0)
def _flush(self, force):
first = 0
while first <= len(self.hqxdata) - self.linelen:
last = first + self.linelen
self.ofp.write(self.hqxdata[first:last] + b'\n')
self.linelen = LINELEN
first = last
self.hqxdata = self.hqxdata[first:]
if force:
self.ofp.write(self.hqxdata + b':\n')
def close(self):
if self.data:
self.hqxdata = self.hqxdata + binascii.b2a_hqx(self.data)
self._flush(1)
self.ofp.close()
del self.ofp
class _Rlecoderengine:
"""Write data to the RLE-coder in suitably large chunks"""
def __init__(self, ofp):
self.ofp = ofp
self.data = b''
def write(self, data):
self.data = self.data + data
if len(self.data) < REASONABLY_LARGE:
return
rledata = binascii.rlecode_hqx(self.data)
self.ofp.write(rledata)
self.data = b''
def close(self):
if self.data:
rledata = binascii.rlecode_hqx(self.data)
self.ofp.write(rledata)
self.ofp.close()
del self.ofp
class BinHex:
def __init__(self, name_finfo_dlen_rlen, ofp):
name, finfo, dlen, rlen = name_finfo_dlen_rlen
close_on_error = False
if isinstance(ofp, str):
ofname = ofp
ofp = io.open(ofname, 'wb')
close_on_error = True
try:
ofp.write(b'(This file must be converted with BinHex 4.0)\r\r:')
hqxer = _Hqxcoderengine(ofp)
self.ofp = _Rlecoderengine(hqxer)
self.crc = 0
if finfo is None:
finfo = FInfo()
self.dlen = dlen
self.rlen = rlen
self._writeinfo(name, finfo)
self.state = _DID_HEADER
except:
if close_on_error:
ofp.close()
raise
def _writeinfo(self, name, finfo):
nl = len(name)
if nl > 63:
raise Error('Filename too long')
d = bytes([nl]) + name.encode("latin-1") + b'\0'
tp, cr = finfo.Type, finfo.Creator
if isinstance(tp, str):
tp = tp.encode("latin-1")
if isinstance(cr, str):
cr = cr.encode("latin-1")
d2 = tp + cr
# Force all structs to be packed with big-endian
d3 = struct.pack('>h', finfo.Flags)
d4 = struct.pack('>ii', self.dlen, self.rlen)
info = d + d2 + d3 + d4
self._write(info)
self._writecrc()
def _write(self, data):
self.crc = binascii.crc_hqx(data, self.crc)
self.ofp.write(data)
def _writecrc(self):
# XXXX Should this be here??
# self.crc = binascii.crc_hqx('\0\0', self.crc)
if self.crc < 0:
fmt = '>h'
else:
fmt = '>H'
self.ofp.write(struct.pack(fmt, self.crc))
self.crc = 0
def write(self, data):
if self.state != _DID_HEADER:
raise Error('Writing data at the wrong time')
self.dlen = self.dlen - len(data)
self._write(data)
def close_data(self):
if self.dlen != 0:
            raise Error('Incorrect data size, diff=%r' % (self.dlen,))
self._writecrc()
self.state = _DID_DATA
def write_rsrc(self, data):
if self.state < _DID_DATA:
self.close_data()
if self.state != _DID_DATA:
raise Error('Writing resource data at the wrong time')
self.rlen = self.rlen - len(data)
self._write(data)
def close(self):
if self.state is None:
return
try:
if self.state < _DID_DATA:
self.close_data()
if self.state != _DID_DATA:
raise Error('Close at the wrong time')
if self.rlen != 0:
raise Error("Incorrect resource-datasize, diff=%r" % (self.rlen,))
self._writecrc()
finally:
self.state = None
ofp = self.ofp
del self.ofp
ofp.close()
def binhex(inp, out):
"""binhex(infilename, outfilename): create binhex-encoded copy of a file"""
finfo = getfileinfo(inp)
ofp = BinHex(finfo, out)
ifp = io.open(inp, 'rb')
# XXXX Do textfile translation on non-mac systems
while True:
d = ifp.read(128000)
if not d: break
ofp.write(d)
ofp.close_data()
ifp.close()
ifp = openrsrc(inp, 'rb')
while True:
d = ifp.read(128000)
if not d: break
ofp.write_rsrc(d)
ofp.close()
ifp.close()
class _Hqxdecoderengine:
"""Read data via the decoder in 4-byte chunks"""
def __init__(self, ifp):
self.ifp = ifp
self.eof = 0
def read(self, totalwtd):
"""Read at least wtd bytes (or until EOF)"""
decdata = b''
wtd = totalwtd
#
        # The loop here is convoluted, since we don't really know how
        # much to decode: there may be newlines in the incoming data.
while wtd > 0:
if self.eof: return decdata
wtd = ((wtd + 2) // 3) * 4
data = self.ifp.read(wtd)
#
            # Next problem: there may not be a whole number of 4-byte
            # groups in what we pass to a2b_hqx. Solve by yet another
            # loop.
#
while True:
try:
decdatacur, self.eof = binascii.a2b_hqx(data)
break
except binascii.Incomplete:
pass
newdata = self.ifp.read(1)
if not newdata:
raise Error('Premature EOF on binhex file')
data = data + newdata
decdata = decdata + decdatacur
wtd = totalwtd - len(decdata)
if not decdata and not self.eof:
raise Error('Premature EOF on binhex file')
return decdata
def close(self):
self.ifp.close()
class _Rledecoderengine:
"""Read data via the RLE-coder"""
def __init__(self, ifp):
self.ifp = ifp
self.pre_buffer = b''
self.post_buffer = b''
self.eof = 0
def read(self, wtd):
if wtd > len(self.post_buffer):
self._fill(wtd - len(self.post_buffer))
rv = self.post_buffer[:wtd]
self.post_buffer = self.post_buffer[wtd:]
return rv
def _fill(self, wtd):
self.pre_buffer = self.pre_buffer + self.ifp.read(wtd + 4)
if self.ifp.eof:
self.post_buffer = self.post_buffer + \
binascii.rledecode_hqx(self.pre_buffer)
self.pre_buffer = b''
return
#
# Obfuscated code ahead. We have to take care that we don't
# end up with an orphaned RUNCHAR later on. So, we keep a couple
# of bytes in the buffer, depending on what the end of
# the buffer looks like:
# '\220\0\220' - Keep 3 bytes: repeated \220 (escaped as \220\0)
# '?\220' - Keep 2 bytes: repeated something-else
# '\220\0' - Escaped \220: Keep 2 bytes.
# '?\220?' - Complete repeat sequence: decode all
# otherwise: keep 1 byte.
#
mark = len(self.pre_buffer)
if self.pre_buffer[-3:] == RUNCHAR + b'\0' + RUNCHAR:
mark = mark - 3
elif self.pre_buffer[-1:] == RUNCHAR:
mark = mark - 2
elif self.pre_buffer[-2:] == RUNCHAR + b'\0':
mark = mark - 2
elif self.pre_buffer[-2:-1] == RUNCHAR:
pass # Decode all
else:
mark = mark - 1
self.post_buffer = self.post_buffer + \
binascii.rledecode_hqx(self.pre_buffer[:mark])
self.pre_buffer = self.pre_buffer[mark:]
def close(self):
self.ifp.close()
class HexBin:
def __init__(self, ifp):
if isinstance(ifp, str):
ifp = io.open(ifp, 'rb')
#
# Find initial colon.
#
while True:
ch = ifp.read(1)
if not ch:
raise Error("No binhex data found")
# Cater for \r\n terminated lines (which show up as \n\r, hence
# all lines start with \r)
if ch == b'\r':
continue
if ch == b':':
break
hqxifp = _Hqxdecoderengine(ifp)
self.ifp = _Rledecoderengine(hqxifp)
self.crc = 0
self._readheader()
def _read(self, len):
data = self.ifp.read(len)
self.crc = binascii.crc_hqx(data, self.crc)
return data
def _checkcrc(self):
filecrc = struct.unpack('>h', self.ifp.read(2))[0] & 0xffff
#self.crc = binascii.crc_hqx('\0\0', self.crc)
# XXXX Is this needed??
self.crc = self.crc & 0xffff
if filecrc != self.crc:
raise Error('CRC error, computed %x, read %x'
% (self.crc, filecrc))
self.crc = 0
def _readheader(self):
len = self._read(1)
fname = self._read(ord(len))
rest = self._read(1 + 4 + 4 + 2 + 4 + 4)
self._checkcrc()
type = rest[1:5]
creator = rest[5:9]
flags = struct.unpack('>h', rest[9:11])[0]
self.dlen = struct.unpack('>l', rest[11:15])[0]
self.rlen = struct.unpack('>l', rest[15:19])[0]
self.FName = fname
self.FInfo = FInfo()
self.FInfo.Creator = creator
self.FInfo.Type = type
self.FInfo.Flags = flags
self.state = _DID_HEADER
def read(self, *n):
if self.state != _DID_HEADER:
raise Error('Read data at wrong time')
if n:
n = n[0]
n = min(n, self.dlen)
else:
n = self.dlen
rv = b''
while len(rv) < n:
rv = rv + self._read(n-len(rv))
self.dlen = self.dlen - n
return rv
def close_data(self):
if self.state != _DID_HEADER:
raise Error('close_data at wrong time')
if self.dlen:
dummy = self._read(self.dlen)
self._checkcrc()
self.state = _DID_DATA
def read_rsrc(self, *n):
if self.state == _DID_HEADER:
self.close_data()
if self.state != _DID_DATA:
raise Error('Read resource data at wrong time')
if n:
n = n[0]
n = min(n, self.rlen)
else:
n = self.rlen
self.rlen = self.rlen - n
return self._read(n)
def close(self):
if self.state is None:
return
try:
if self.rlen:
dummy = self.read_rsrc(self.rlen)
self._checkcrc()
finally:
self.state = None
self.ifp.close()
def hexbin(inp, out):
"""hexbin(infilename, outfilename) - Decode binhexed file"""
ifp = HexBin(inp)
finfo = ifp.FInfo
if not out:
out = ifp.FName
ofp = io.open(out, 'wb')
# XXXX Do translation on non-mac systems
while True:
d = ifp.read(128000)
if not d: break
ofp.write(d)
ofp.close()
ifp.close_data()
d = ifp.read_rsrc(128000)
if d:
ofp = openrsrc(out, 'wb')
ofp.write(d)
while True:
d = ifp.read_rsrc(128000)
if not d: break
ofp.write(d)
ofp.close()
ifp.close()
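# A minimal round-trip sketch of the easy interface (file names are
# hypothetical; as noted above, both routines hold file data in memory and
# do no text translation on non-mac systems):
#
#     binhex('photo.dat', 'photo.hqx')    # encode
#     hexbin('photo.hqx', 'photo.out')    # decode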
"""Bisection algorithms."""
def insort_right(a, x, lo=0, hi=None):
"""Insert item x in list a, and keep it sorted assuming a is sorted.
If x is already in a, insert it to the right of the rightmost x.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if x < a[mid]: hi = mid
else: lo = mid+1
a.insert(lo, x)
insort = insort_right # backward compatibility
def bisect_right(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
    The return value i is such that all e in a[:i] have e <= x, and all e in
    a[i:] have e > x. So if x already appears in the list, a.insert(i, x) will
    insert just after the rightmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if x < a[mid]: hi = mid
else: lo = mid+1
return lo
bisect = bisect_right # backward compatibility
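# For example, with duplicates present bisect_right() returns the insertion
# point after the existing entries:
#
#     >>> a = [1, 2, 4, 4, 8]
#     >>> bisect_right(a, 4)
#     4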
def insort_left(a, x, lo=0, hi=None):
"""Insert item x in list a, and keep it sorted assuming a is sorted.
If x is already in a, insert it to the left of the leftmost x.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if a[mid] < x: lo = mid+1
else: hi = mid
a.insert(lo, x)
def bisect_left(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
    The return value i is such that all e in a[:i] have e < x, and all e in
    a[i:] have e >= x. So if x already appears in the list, a.insert(i, x) will
    insert just before the leftmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if a[mid] < x: lo = mid+1
else: hi = mid
return lo
# Overwrite above definitions with a fast C implementation
try:
from _bisect import *
except ImportError:
pass
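# Usage sketch: bisect_left() finds the leftmost position, and insort()
# keeps a list sorted as items arrive:
#
#     >>> a = [1, 2, 4, 4, 8]
#     >>> bisect_left(a, 4)
#     2
#     >>> insort(a, 3)
#     >>> a
#     [1, 2, 3, 4, 4, 8]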
"""Interface to the libbzip2 compression library.
This module provides a file interface, classes for incremental
(de)compression, and functions for one-shot (de)compression.
"""
__all__ = ["BZ2File", "BZ2Compressor", "BZ2Decompressor",
"open", "compress", "decompress"]
__author__ = "Nadeem Vawda <[email protected]>"
from builtins import open as _builtin_open
import io
import warnings
try:
from threading import RLock
except ImportError:
from dummy_threading import RLock
from _bz2 import BZ2Compressor, BZ2Decompressor
_MODE_CLOSED = 0
_MODE_READ = 1
_MODE_READ_EOF = 2
_MODE_WRITE = 3
_BUFFER_SIZE = 8192
class BZ2File(io.BufferedIOBase):
"""A file object providing transparent bzip2 (de)compression.
A BZ2File can act as a wrapper for an existing file object, or refer
directly to a named file on disk.
Note that BZ2File provides a *binary* file interface - data read is
returned as bytes, and data to be written should be given as bytes.
"""
def __init__(self, filename, mode="r", buffering=None, compresslevel=9):
"""Open a bzip2-compressed file.
If filename is a str or bytes object, it gives the name
of the file to be opened. Otherwise, it should be a file object,
which will be used to read or write the compressed data.
mode can be 'r' for reading (default), 'w' for (over)writing,
'x' for creating exclusively, or 'a' for appending. These can
equivalently be given as 'rb', 'wb', 'xb', and 'ab'.
buffering is ignored. Its use is deprecated.
If mode is 'w', 'x' or 'a', compresslevel can be a number between 1
and 9 specifying the level of compression: 1 produces the least
compression, and 9 (default) produces the most compression.
If mode is 'r', the input file may be the concatenation of
multiple compressed streams.
"""
# This lock must be recursive, so that BufferedIOBase's
# readline(), readlines() and writelines() don't deadlock.
self._lock = RLock()
self._fp = None
self._closefp = False
self._mode = _MODE_CLOSED
self._pos = 0
self._size = -1
if buffering is not None:
warnings.warn("Use of 'buffering' argument is deprecated",
DeprecationWarning)
if not (1 <= compresslevel <= 9):
raise ValueError("compresslevel must be between 1 and 9")
if mode in ("", "r", "rb"):
mode = "rb"
mode_code = _MODE_READ
self._decompressor = BZ2Decompressor()
self._buffer = b""
self._buffer_offset = 0
elif mode in ("w", "wb"):
mode = "wb"
mode_code = _MODE_WRITE
self._compressor = BZ2Compressor(compresslevel)
elif mode in ("x", "xb"):
mode = "xb"
mode_code = _MODE_WRITE
self._compressor = BZ2Compressor(compresslevel)
elif mode in ("a", "ab"):
mode = "ab"
mode_code = _MODE_WRITE
self._compressor = BZ2Compressor(compresslevel)
else:
raise ValueError("Invalid mode: %r" % (mode,))
if isinstance(filename, (str, bytes)):
self._fp = _builtin_open(filename, mode)
self._closefp = True
self._mode = mode_code
elif hasattr(filename, "read") or hasattr(filename, "write"):
self._fp = filename
self._mode = mode_code
else:
raise TypeError("filename must be a str or bytes object, or a file")
def close(self):
"""Flush and close the file.
May be called more than once without error. Once the file is
closed, any other operation on it will raise a ValueError.
"""
with self._lock:
if self._mode == _MODE_CLOSED:
return
try:
if self._mode in (_MODE_READ, _MODE_READ_EOF):
self._decompressor = None
elif self._mode == _MODE_WRITE:
self._fp.write(self._compressor.flush())
self._compressor = None
finally:
try:
if self._closefp:
self._fp.close()
finally:
self._fp = None
self._closefp = False
self._mode = _MODE_CLOSED
self._buffer = b""
self._buffer_offset = 0
@property
def closed(self):
"""True if this file is closed."""
return self._mode == _MODE_CLOSED
def fileno(self):
"""Return the file descriptor for the underlying file."""
self._check_not_closed()
return self._fp.fileno()
def seekable(self):
"""Return whether the file supports seeking."""
return self.readable() and self._fp.seekable()
def readable(self):
"""Return whether the file was opened for reading."""
self._check_not_closed()
return self._mode in (_MODE_READ, _MODE_READ_EOF)
def writable(self):
"""Return whether the file was opened for writing."""
self._check_not_closed()
return self._mode == _MODE_WRITE
# Mode-checking helper functions.
def _check_not_closed(self):
if self.closed:
raise ValueError("I/O operation on closed file")
def _check_can_read(self):
if self._mode not in (_MODE_READ, _MODE_READ_EOF):
self._check_not_closed()
raise io.UnsupportedOperation("File not open for reading")
def _check_can_write(self):
if self._mode != _MODE_WRITE:
self._check_not_closed()
raise io.UnsupportedOperation("File not open for writing")
def _check_can_seek(self):
if self._mode not in (_MODE_READ, _MODE_READ_EOF):
self._check_not_closed()
raise io.UnsupportedOperation("Seeking is only supported "
"on files open for reading")
if not self._fp.seekable():
raise io.UnsupportedOperation("The underlying file object "
"does not support seeking")
# Fill the readahead buffer if it is empty. Returns False on EOF.
def _fill_buffer(self):
if self._mode == _MODE_READ_EOF:
return False
# Depending on the input data, our call to the decompressor may not
# return any data. In this case, try again after reading another block.
while self._buffer_offset == len(self._buffer):
rawblock = (self._decompressor.unused_data or
self._fp.read(_BUFFER_SIZE))
if not rawblock:
if self._decompressor.eof:
# End-of-stream marker and end of file. We're good.
self._mode = _MODE_READ_EOF
self._size = self._pos
return False
else:
# Problem - we were expecting more compressed data.
raise EOFError("Compressed file ended before the "
"end-of-stream marker was reached")
if self._decompressor.eof:
# Continue to next stream.
self._decompressor = BZ2Decompressor()
try:
self._buffer = self._decompressor.decompress(rawblock)
except OSError:
# Trailing data isn't a valid bzip2 stream. We're done here.
self._mode = _MODE_READ_EOF
self._size = self._pos
return False
else:
self._buffer = self._decompressor.decompress(rawblock)
self._buffer_offset = 0
return True
# Read data until EOF.
# If return_data is false, consume the data without returning it.
def _read_all(self, return_data=True):
# The loop assumes that _buffer_offset is 0. Ensure that this is true.
self._buffer = self._buffer[self._buffer_offset:]
self._buffer_offset = 0
blocks = []
while self._fill_buffer():
if return_data:
blocks.append(self._buffer)
self._pos += len(self._buffer)
self._buffer = b""
if return_data:
return b"".join(blocks)
# Read a block of up to n bytes.
# If return_data is false, consume the data without returning it.
def _read_block(self, n, return_data=True):
# If we have enough data buffered, return immediately.
end = self._buffer_offset + n
if end <= len(self._buffer):
data = self._buffer[self._buffer_offset : end]
self._buffer_offset = end
self._pos += len(data)
return data if return_data else None
# The loop assumes that _buffer_offset is 0. Ensure that this is true.
self._buffer = self._buffer[self._buffer_offset:]
self._buffer_offset = 0
blocks = []
while n > 0 and self._fill_buffer():
if n < len(self._buffer):
data = self._buffer[:n]
self._buffer_offset = n
else:
data = self._buffer
self._buffer = b""
if return_data:
blocks.append(data)
self._pos += len(data)
n -= len(data)
if return_data:
return b"".join(blocks)
def peek(self, n=0):
"""Return buffered data without advancing the file position.
Always returns at least one byte of data, unless at EOF.
The exact number of bytes returned is unspecified.
"""
with self._lock:
self._check_can_read()
if not self._fill_buffer():
return b""
return self._buffer[self._buffer_offset:]
def read(self, size=-1):
"""Read up to size uncompressed bytes from the file.
If size is negative or omitted, read until EOF is reached.
Returns b'' if the file is already at EOF.
"""
with self._lock:
self._check_can_read()
if size == 0:
return b""
elif size < 0:
return self._read_all()
else:
return self._read_block(size)
def read1(self, size=-1):
"""Read up to size uncompressed bytes, while trying to avoid
making multiple reads from the underlying stream.
Returns b'' if the file is at EOF.
"""
# Usually, read1() calls _fp.read() at most once. However, sometimes
# this does not give enough data for the decompressor to make progress.
# In this case we make multiple reads, to avoid returning b"".
with self._lock:
self._check_can_read()
if (size == 0 or
# Only call _fill_buffer() if the buffer is actually empty.
# This gives a significant speedup if *size* is small.
(self._buffer_offset == len(self._buffer) and not self._fill_buffer())):
return b""
if size > 0:
data = self._buffer[self._buffer_offset :
self._buffer_offset + size]
self._buffer_offset += len(data)
else:
data = self._buffer[self._buffer_offset:]
self._buffer = b""
self._buffer_offset = 0
self._pos += len(data)
return data
def readinto(self, b):
"""Read up to len(b) bytes into b.
Returns the number of bytes read (0 for EOF).
"""
with self._lock:
return io.BufferedIOBase.readinto(self, b)
def readline(self, size=-1):
"""Read a line of uncompressed bytes from the file.
The terminating newline (if present) is retained. If size is
non-negative, no more than size bytes will be read (in which
case the line may be incomplete). Returns b'' if already at EOF.
"""
if not isinstance(size, int):
if not hasattr(size, "__index__"):
raise TypeError("Integer argument expected")
size = size.__index__()
with self._lock:
self._check_can_read()
# Shortcut for the common case - the whole line is in the buffer.
if size < 0:
end = self._buffer.find(b"\n", self._buffer_offset) + 1
if end > 0:
line = self._buffer[self._buffer_offset : end]
self._buffer_offset = end
self._pos += len(line)
return line
return io.BufferedIOBase.readline(self, size)
def readlines(self, size=-1):
"""Read a list of lines of uncompressed bytes from the file.
size can be specified to control the number of lines read: no
further lines will be read once the total size of the lines read
so far equals or exceeds size.
"""
if not isinstance(size, int):
if not hasattr(size, "__index__"):
raise TypeError("Integer argument expected")
size = size.__index__()
with self._lock:
return io.BufferedIOBase.readlines(self, size)
def write(self, data):
"""Write a byte string to the file.
Returns the number of uncompressed bytes written, which is
always len(data). Note that due to buffering, the file on disk
may not reflect the data written until close() is called.
"""
with self._lock:
self._check_can_write()
compressed = self._compressor.compress(data)
self._fp.write(compressed)
self._pos += len(data)
return len(data)
def writelines(self, seq):
"""Write a sequence of byte strings to the file.
Returns the number of uncompressed bytes written.
seq can be any iterable yielding byte strings.
Line separators are not added between the written byte strings.
"""
with self._lock:
return io.BufferedIOBase.writelines(self, seq)
# Rewind the file to the beginning of the data stream.
def _rewind(self):
self._fp.seek(0, 0)
self._mode = _MODE_READ
self._pos = 0
self._decompressor = BZ2Decompressor()
self._buffer = b""
self._buffer_offset = 0
def seek(self, offset, whence=0):
"""Change the file position.
The new position is specified by offset, relative to the
position indicated by whence. Values for whence are:
0: start of stream (default); offset must not be negative
1: current stream position
2: end of stream; offset must not be positive
Returns the new file position.
Note that seeking is emulated, so depending on the parameters,
this operation may be extremely slow.
"""
with self._lock:
self._check_can_seek()
# Recalculate offset as an absolute file position.
if whence == 0:
pass
elif whence == 1:
offset = self._pos + offset
elif whence == 2:
# Seeking relative to EOF - we need to know the file's size.
if self._size < 0:
self._read_all(return_data=False)
offset = self._size + offset
else:
raise ValueError("Invalid value for whence: %s" % (whence,))
# Make it so that offset is the number of bytes to skip forward.
if offset < self._pos:
self._rewind()
else:
offset -= self._pos
# Read and discard data until we reach the desired position.
self._read_block(offset, return_data=False)
return self._pos
def tell(self):
"""Return the current file position."""
with self._lock:
self._check_not_closed()
return self._pos
def open(filename, mode="rb", compresslevel=9,
encoding=None, errors=None, newline=None):
"""Open a bzip2-compressed file in binary or text mode.
The filename argument can be an actual filename (a str or bytes
object), or an existing file object to read from or write to.
The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or
"ab" for binary mode, or "rt", "wt", "xt" or "at" for text mode.
The default mode is "rb", and the default compresslevel is 9.
For binary mode, this function is equivalent to the BZ2File
constructor: BZ2File(filename, mode, compresslevel). In this case,
the encoding, errors and newline arguments must not be provided.
For text mode, a BZ2File object is created, and wrapped in an
io.TextIOWrapper instance with the specified encoding, error
handling behavior, and line ending(s).
"""
if "t" in mode:
if "b" in mode:
raise ValueError("Invalid mode: %r" % (mode,))
else:
if encoding is not None:
raise ValueError("Argument 'encoding' not supported in binary mode")
if errors is not None:
raise ValueError("Argument 'errors' not supported in binary mode")
if newline is not None:
raise ValueError("Argument 'newline' not supported in binary mode")
bz_mode = mode.replace("t", "")
binary_file = BZ2File(filename, bz_mode, compresslevel=compresslevel)
if "t" in mode:
return io.TextIOWrapper(binary_file, encoding, errors, newline)
else:
return binary_file
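# A usage sketch for text mode (the file name is hypothetical):
#
#     with open('notes.txt.bz2', 'wt', encoding='utf-8') as f:
#         f.write('hello\n')
#     with open('notes.txt.bz2', 'rt', encoding='utf-8') as f:
#         print(f.read(), end='')         # -> hello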
def compress(data, compresslevel=9):
"""Compress a block of data.
compresslevel, if given, must be a number between 1 and 9.
For incremental compression, use a BZ2Compressor object instead.
"""
comp = BZ2Compressor(compresslevel)
return comp.compress(data) + comp.flush()
def decompress(data):
"""Decompress a block of data.
For incremental decompression, use a BZ2Decompressor object instead.
"""
results = []
while data:
decomp = BZ2Decompressor()
try:
res = decomp.decompress(data)
except OSError:
if results:
break # Leftover data is not a valid bzip2 stream; ignore it.
else:
raise # Error on the first iteration; bail out.
results.append(res)
if not decomp.eof:
raise ValueError("Compressed data ended before the "
"end-of-stream marker was reached")
data = decomp.unused_data
return b"".join(results)
"""Calendar printing functions
Note when comparing these calendars to the ones printed by cal(1): By
default, these calendars have Monday as the first day of the week, and
Sunday as the last (the European convention). Use setfirstweekday() to
set the first day of the week (0=Monday, 6=Sunday)."""
import sys
import datetime
import locale as _locale
__all__ = ["IllegalMonthError", "IllegalWeekdayError", "setfirstweekday",
"firstweekday", "isleap", "leapdays", "weekday", "monthrange",
"monthcalendar", "prmonth", "month", "prcal", "calendar",
"timegm", "month_name", "month_abbr", "day_name", "day_abbr"]
# Exception raised for bad input (with string parameter for details)
error = ValueError
# Exceptions raised for bad input
class IllegalMonthError(ValueError):
def __init__(self, month):
self.month = month
def __str__(self):
return "bad month number %r; must be 1-12" % self.month
class IllegalWeekdayError(ValueError):
def __init__(self, weekday):
self.weekday = weekday
def __str__(self):
return "bad weekday number %r; must be 0 (Monday) to 6 (Sunday)" % self.weekday
# Constants for months referenced later
January = 1
February = 2
# Number of days per month (except for February in leap years)
mdays = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
# This module used to have hard-coded lists of day and month names, as
# English strings. The classes following emulate a read-only version of
# that, but supply localized names. Note that the values are computed
# fresh on each call, in case the user changes locale between calls.
class _localized_month:
_months = [datetime.date(2001, i+1, 1).strftime for i in range(12)]
_months.insert(0, lambda x: "")
def __init__(self, format):
self.format = format
def __getitem__(self, i):
funcs = self._months[i]
if isinstance(i, slice):
return [f(self.format) for f in funcs]
else:
return funcs(self.format)
def __len__(self):
return 13
class _localized_day:
# January 1, 2001, was a Monday.
_days = [datetime.date(2001, 1, i+1).strftime for i in range(7)]
def __init__(self, format):
self.format = format
def __getitem__(self, i):
funcs = self._days[i]
if isinstance(i, slice):
return [f(self.format) for f in funcs]
else:
return funcs(self.format)
def __len__(self):
return 7
# Full and abbreviated names of weekdays
day_name = _localized_day('%A')
day_abbr = _localized_day('%a')
# Full and abbreviated names of months (1-based arrays!!!)
month_name = _localized_month('%B')
month_abbr = _localized_month('%b')
# Constants for weekdays
(MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)
def isleap(year):
"""Return True for leap years, False for non-leap years."""
return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
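# For example, century years are leap years only when divisible by 400:
#
#     >>> isleap(2000), isleap(1900), isleap(2024)
#     (True, False, True)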
def leapdays(y1, y2):
"""Return number of leap years in range [y1, y2).
Assume y1 <= y2."""
y1 -= 1
y2 -= 1
return (y2//4 - y1//4) - (y2//100 - y1//100) + (y2//400 - y1//400)
def weekday(year, month, day):
"""Return weekday (0-6 ~ Mon-Sun) for year (1970-...), month (1-12),
day (1-31)."""
return datetime.date(year, month, day).weekday()
def monthrange(year, month):
"""Return weekday (0-6 ~ Mon-Sun) and number of days (28-31) for
year, month."""
if not 1 <= month <= 12:
raise IllegalMonthError(month)
day1 = weekday(year, month, 1)
ndays = mdays[month] + (month == February and isleap(year))
return day1, ndays
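# For example:
#
#     >>> monthrange(2023, 7)    # July 2023 starts on a Saturday (5); 31 days
#     (5, 31)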
class Calendar(object):
"""
Base calendar class. This class doesn't do any formatting. It simply
provides data to subclasses.
"""
def __init__(self, firstweekday=0):
self.firstweekday = firstweekday # 0 = Monday, 6 = Sunday
def getfirstweekday(self):
return self._firstweekday % 7
def setfirstweekday(self, firstweekday):
self._firstweekday = firstweekday
firstweekday = property(getfirstweekday, setfirstweekday)
def iterweekdays(self):
"""
Return an iterator for one week of weekday numbers starting with the
configured first one.
"""
for i in range(self.firstweekday, self.firstweekday + 7):
yield i%7
def itermonthdates(self, year, month):
"""
Return an iterator for one month. The iterator will yield datetime.date
values and will always iterate through complete weeks, so it will yield
dates outside the specified month.
"""
date = datetime.date(year, month, 1)
# Go back to the beginning of the week
days = (date.weekday() - self.firstweekday) % 7
date -= datetime.timedelta(days=days)
oneday = datetime.timedelta(days=1)
while True:
yield date
try:
date += oneday
except OverflowError:
# Adding one day could fail after datetime.MAXYEAR
break
if date.month != month and date.weekday() == self.firstweekday:
break
def itermonthdays2(self, year, month):
"""
Like itermonthdates(), but will yield (day number, weekday number)
tuples. For days outside the specified month the day number is 0.
"""
for date in self.itermonthdates(year, month):
if date.month != month:
yield (0, date.weekday())
else:
yield (date.day, date.weekday())
def itermonthdays(self, year, month):
"""
Like itermonthdates(), but will yield day numbers. For days outside
the specified month the day number is 0.
"""
for date in self.itermonthdates(year, month):
if date.month != month:
yield 0
else:
yield date.day
def monthdatescalendar(self, year, month):
"""
Return a matrix (list of lists) representing a month's calendar.
Each row represents a week; week entries are datetime.date values.
"""
dates = list(self.itermonthdates(year, month))
return [ dates[i:i+7] for i in range(0, len(dates), 7) ]
def monthdays2calendar(self, year, month):
"""
Return a matrix representing a month's calendar.
Each row represents a week; week entries are
(day number, weekday number) tuples. Day numbers outside this month
are zero.
"""
days = list(self.itermonthdays2(year, month))
return [ days[i:i+7] for i in range(0, len(days), 7) ]
def monthdayscalendar(self, year, month):
"""
Return a matrix representing a month's calendar.
Each row represents a week; days outside this month are zero.
"""
days = list(self.itermonthdays(year, month))
return [ days[i:i+7] for i in range(0, len(days), 7) ]
def yeardatescalendar(self, year, width=3):
"""
Return the data for the specified year ready for formatting. The return
value is a list of month rows. Each month row contains up to width months.
Each month contains between 4 and 6 weeks and each week contains 1-7
days. Days are datetime.date objects.
"""
months = [
self.monthdatescalendar(year, i)
for i in range(January, January+12)
]
return [months[i:i+width] for i in range(0, len(months), width) ]
def yeardays2calendar(self, year, width=3):
"""
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are
(day number, weekday number) tuples. Day numbers outside this month are
zero.
"""
months = [
self.monthdays2calendar(year, i)
for i in range(January, January+12)
]
return [months[i:i+width] for i in range(0, len(months), width) ]
def yeardayscalendar(self, year, width=3):
"""
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are day numbers.
Day numbers outside this month are zero.
"""
months = [
self.monthdayscalendar(year, i)
for i in range(January, January+12)
]
return [months[i:i+width] for i in range(0, len(months), width) ]
class TextCalendar(Calendar):
"""
Subclass of Calendar that outputs a calendar as a simple plain text
similar to the UNIX program cal.
"""
def prweek(self, theweek, width):
"""
Print a single week (no newline).
"""
print(self.formatweek(theweek, width), end=' ')
def formatday(self, day, weekday, width):
"""
Returns a formatted day.
"""
if day == 0:
s = ''
else:
s = '%2i' % day # right-align single-digit days
return s.center(width)
def formatweek(self, theweek, width):
"""
Returns a single week in a string (no newline).
"""
return ' '.join(self.formatday(d, wd, width) for (d, wd) in theweek)
def formatweekday(self, day, width):
"""
Returns a formatted week day name.
"""
if width >= 9:
names = day_name
else:
names = day_abbr
return names[day][:width].center(width)
def formatweekheader(self, width):
"""
Return a header for a week.
"""
return ' '.join(self.formatweekday(i, width) for i in self.iterweekdays())
def formatmonthname(self, theyear, themonth, width, withyear=True):
"""
Return a formatted month name.
"""
s = month_name[themonth]
if withyear:
s = "%s %r" % (s, theyear)
return s.center(width)
def prmonth(self, theyear, themonth, w=0, l=0):
"""
Print a month's calendar.
"""
print(self.formatmonth(theyear, themonth, w, l), end=' ')
def formatmonth(self, theyear, themonth, w=0, l=0):
"""
Return a month's calendar string (multi-line).
"""
w = max(2, w)
l = max(1, l)
s = self.formatmonthname(theyear, themonth, 7 * (w + 1) - 1)
s = s.rstrip()
s += '\n' * l
s += self.formatweekheader(w).rstrip()
s += '\n' * l
for week in self.monthdays2calendar(theyear, themonth):
s += self.formatweek(week, w).rstrip()
s += '\n' * l
return s
def formatyear(self, theyear, w=2, l=1, c=6, m=3):
"""
Returns a year's calendar as a multi-line string.
"""
w = max(2, w)
l = max(1, l)
c = max(2, c)
colwidth = (w + 1) * 7 - 1
v = []
a = v.append
a(repr(theyear).center(colwidth*m+c*(m-1)).rstrip())
a('\n'*l)
header = self.formatweekheader(w)
for (i, row) in enumerate(self.yeardays2calendar(theyear, m)):
# months in this row
months = range(m*i+1, min(m*(i+1)+1, 13))
a('\n'*l)
names = (self.formatmonthname(theyear, k, colwidth, False)
for k in months)
a(formatstring(names, colwidth, c).rstrip())
a('\n'*l)
headers = (header for k in months)
a(formatstring(headers, colwidth, c).rstrip())
a('\n'*l)
# max number of weeks for this row
height = max(len(cal) for cal in row)
for j in range(height):
weeks = []
for cal in row:
if j >= len(cal):
weeks.append('')
else:
weeks.append(self.formatweek(cal[j], w))
a(formatstring(weeks, colwidth, c).rstrip())
a('\n' * l)
return ''.join(v)
def pryear(self, theyear, w=0, l=0, c=6, m=3):
"""Print a year's calendar."""
print(self.formatyear(theyear, w, l, c, m))
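# A usage sketch: format one month as plain text with Sunday first (the
# weekday constants are defined above):
#
#     cal = TextCalendar(firstweekday=SUNDAY)
#     print(cal.formatmonth(2023, 7))     # prints a grid similar to cal(1)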
class HTMLCalendar(Calendar):
"""
This calendar returns complete HTML pages.
"""
# CSS classes for the day <td>s
cssclasses = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
def formatday(self, day, weekday):
"""
Return a day as a table cell.
"""
if day == 0:
return '<td class="noday"> </td>' # day outside month
else:
return '<td class="%s">%d</td>' % (self.cssclasses[weekday], day)
def formatweek(self, theweek):
"""
Return a complete week as a table row.
"""
s = ''.join(self.formatday(d, wd) for (d, wd) in theweek)
return '<tr>%s</tr>' % s
def formatweekday(self, day):
"""
Return a weekday name as a table header.
"""
return '<th class="%s">%s</th>' % (self.cssclasses[day], day_abbr[day])
def formatweekheader(self):
"""
Return a header for a week as a table row.
"""
s = ''.join(self.formatweekday(i) for i in self.iterweekdays())
return '<tr>%s</tr>' % s
def formatmonthname(self, theyear, themonth, withyear=True):
"""
Return a month name as a table row.
"""
if withyear:
s = '%s %s' % (month_name[themonth], theyear)
else:
s = '%s' % month_name[themonth]
return '<tr><th colspan="7" class="month">%s</th></tr>' % s
def formatmonth(self, theyear, themonth, withyear=True):
"""
Return a formatted month as a table.
"""
v = []
a = v.append
a('<table border="0" cellpadding="0" cellspacing="0" class="month">')
a('\n')
a(self.formatmonthname(theyear, themonth, withyear=withyear))
a('\n')
a(self.formatweekheader())
a('\n')
for week in self.monthdays2calendar(theyear, themonth):
a(self.formatweek(week))
a('\n')
a('</table>')
a('\n')
return ''.join(v)
def formatyear(self, theyear, width=3):
"""
Return a formatted year as a table of tables.
"""
v = []
a = v.append
width = max(width, 1)
a('<table border="0" cellpadding="0" cellspacing="0" class="year">')
a('\n')
a('<tr><th colspan="%d" class="year">%s</th></tr>' % (width, theyear))
for i in range(January, January+12, width):
# months in this row
months = range(i, min(i+width, 13))
a('<tr>')
for m in months:
a('<td>')
a(self.formatmonth(theyear, m, withyear=False))
a('</td>')
a('</tr>')
a('</table>')
return ''.join(v)
def formatyearpage(self, theyear, width=3, css='calendar.css', encoding=None):
"""
Return a formatted year as a complete HTML page.
"""
if encoding is None:
encoding = sys.getdefaultencoding()
v = []
a = v.append
a('<?xml version="1.0" encoding="%s"?>\n' % encoding)
a('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n')
a('<html>\n')
a('<head>\n')
a('<meta http-equiv="Content-Type" content="text/html; charset=%s" />\n' % encoding)
if css is not None:
a('<link rel="stylesheet" type="text/css" href="%s" />\n' % css)
a('<title>Calendar for %d</title>\n' % theyear)
a('</head>\n')
a('<body>\n')
a(self.formatyear(theyear, width))
a('</body>\n')
a('</html>\n')
return ''.join(v).encode(encoding, "xmlcharrefreplace")
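# A usage sketch: write a year's calendar as a complete HTML page (the
# output file name is hypothetical):
#
#     page = HTMLCalendar().formatyearpage(2023)    # encoded bytes
#     with open('calendar2023.html', 'wb') as f:
#         f.write(page)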
class different_locale:
def __init__(self, locale):
self.locale = locale
def __enter__(self):
self.oldlocale = _locale.getlocale(_locale.LC_TIME)
_locale.setlocale(_locale.LC_TIME, self.locale)
def __exit__(self, *args):
_locale.setlocale(_locale.LC_TIME, self.oldlocale)
class LocaleTextCalendar(TextCalendar):
"""
This class can be passed a locale name in the constructor and will return
month and weekday names in the specified locale. If this locale includes
an encoding all strings containing month and weekday names will be returned
as unicode.
"""
def __init__(self, firstweekday=0, locale=None):
TextCalendar.__init__(self, firstweekday)
if locale is None:
locale = _locale.getdefaultlocale()
self.locale = locale
def formatweekday(self, day, width):
with different_locale(self.locale):
if width >= 9:
names = day_name
else:
names = day_abbr
name = names[day]
return name[:width].center(width)
def formatmonthname(self, theyear, themonth, width, withyear=True):
with different_locale(self.locale):
s = month_name[themonth]
if withyear:
s = "%s %r" % (s, theyear)
return s.center(width)
class LocaleHTMLCalendar(HTMLCalendar):
"""
This class can be passed a locale name in the constructor and will return
month and weekday names in the specified locale. If this locale includes
an encoding all strings containing month and weekday names will be returned
as unicode.
"""
def __init__(self, firstweekday=0, locale=None):
HTMLCalendar.__init__(self, firstweekday)
if locale is None:
locale = _locale.getdefaultlocale()
self.locale = locale
def formatweekday(self, day):
with different_locale(self.locale):
s = day_abbr[day]
return '<th class="%s">%s</th>' % (self.cssclasses[day], s)
def formatmonthname(self, theyear, themonth, withyear=True):
with different_locale(self.locale):
s = month_name[themonth]
if withyear:
s = '%s %s' % (s, theyear)
return '<tr><th colspan="7" class="month">%s</th></tr>' % s
# Support for old module level interface
c = TextCalendar()
firstweekday = c.getfirstweekday
def setfirstweekday(firstweekday):
if not MONDAY <= firstweekday <= SUNDAY:
raise IllegalWeekdayError(firstweekday)
c.firstweekday = firstweekday
monthcalendar = c.monthdayscalendar
prweek = c.prweek
week = c.formatweek
weekheader = c.formatweekheader
prmonth = c.prmonth
month = c.formatmonth
calendar = c.formatyear
prcal = c.pryear
# Spacing of month columns for multi-column year calendar
_colwidth = 7*3 - 1 # Amount printed by prweek()
_spacing = 6 # Number of spaces between columns
def format(cols, colwidth=_colwidth, spacing=_spacing):
"""Prints multi-column formatting for year calendars"""
print(formatstring(cols, colwidth, spacing))
def formatstring(cols, colwidth=_colwidth, spacing=_spacing):
"""Returns a string formatted from n strings, centered within n columns."""
spacing *= ' '
return spacing.join(c.center(colwidth) for c in cols)
EPOCH = 1970
_EPOCH_ORD = datetime.date(EPOCH, 1, 1).toordinal()
def timegm(tuple):
"""Unrelated but handy function to calculate Unix timestamp from GMT."""
year, month, day, hour, minute, second = tuple[:6]
days = datetime.date(year, month, 1).toordinal() - _EPOCH_ORD + day - 1
hours = days*24 + hour
minutes = hours*60 + minute
seconds = minutes*60 + second
return seconds
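# For example, the billionth second of the Unix epoch:
#
#     >>> timegm((2001, 9, 9, 1, 46, 40))
#     1000000000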
def main(args):
import optparse
parser = optparse.OptionParser(usage="usage: %prog [options] [year [month]]")
parser.add_option(
"-w", "--width",
dest="width", type="int", default=2,
help="width of date column (default 2, text only)"
)
parser.add_option(
"-l", "--lines",
dest="lines", type="int", default=1,
help="number of lines for each week (default 1, text only)"
)
parser.add_option(
"-s", "--spacing",
dest="spacing", type="int", default=6,
help="spacing between months (default 6, text only)"
)
parser.add_option(
"-m", "--months",
dest="months", type="int", default=3,
help="months per row (default 3, text only)"
)
parser.add_option(
"-c", "--css",
dest="css", default="calendar.css",
help="CSS to use for page (html only)"
)
parser.add_option(
"-L", "--locale",
dest="locale", default=None,
help="locale to be used from month and weekday names"
)
parser.add_option(
"-e", "--encoding",
dest="encoding", default=None,
help="Encoding to use for output."
)
parser.add_option(
"-t", "--type",
dest="type", default="text",
choices=("text", "html"),
help="output type (text or html)"
)
(options, args) = parser.parse_args(args)
if options.locale and not options.encoding:
parser.error("if --locale is specified --encoding is required")
sys.exit(1)
locale = options.locale, options.encoding
if options.type == "html":
if options.locale:
cal = LocaleHTMLCalendar(locale=locale)
else:
cal = HTMLCalendar()
encoding = options.encoding
if encoding is None:
encoding = sys.getdefaultencoding()
optdict = dict(encoding=encoding, css=options.css)
write = sys.stdout.buffer.write
if len(args) == 1:
write(cal.formatyearpage(datetime.date.today().year, **optdict))
elif len(args) == 2:
write(cal.formatyearpage(int(args[1]), **optdict))
else:
parser.error("incorrect number of arguments")
sys.exit(1)
else:
if options.locale:
cal = LocaleTextCalendar(locale=locale)
else:
cal = TextCalendar()
optdict = dict(w=options.width, l=options.lines)
if len(args) != 3:
optdict["c"] = options.spacing
optdict["m"] = options.months
if len(args) == 1:
result = cal.formatyear(datetime.date.today().year, **optdict)
elif len(args) == 2:
result = cal.formatyear(int(args[1]), **optdict)
elif len(args) == 3:
result = cal.formatmonth(int(args[1]), int(args[2]), **optdict)
else:
parser.error("incorrect number of arguments")
sys.exit(1)
write = sys.stdout.write
if options.encoding:
result = result.encode(options.encoding)
write = sys.stdout.buffer.write
write(result)
if __name__ == "__main__":
main(sys.argv)
#! /usr/local/bin/python
# NOTE: the above "/usr/local/bin/python" is NOT a mistake. It is
# intentionally NOT "/usr/bin/env python". On many systems
# (e.g. Solaris), /usr/local/bin is not in $PATH as passed to CGI
# scripts, and /usr/local/bin is the default directory where Python is
# installed, so /usr/bin/env would be unable to find python. Granted,
# binary installations by Linux vendors often install Python in
# /usr/bin. So let those vendors patch cgi.py to match their choice
# of installation.
"""Support module for CGI (Common Gateway Interface) scripts.
This module defines a number of utilities for use by CGI scripts
written in Python.
"""
# History
# -------
#
# Michael McLay started this module. Steve Majewski changed the
# interface to SvFormContentDict and FormContentDict. The multipart
# parsing was inspired by code submitted by Andreas Paepcke. Guido van
# Rossum rewrote, reformatted and documented the module and is currently
# responsible for its maintenance.
#
__version__ = "2.6"
# Imports
# =======
from io import StringIO, BytesIO, TextIOWrapper
from collections import Mapping
import sys
import os
import urllib.parse
from email.parser import FeedParser
from email.message import Message
from warnings import warn
import html
import locale
import tempfile
__all__ = ["MiniFieldStorage", "FieldStorage",
"parse", "parse_qs", "parse_qsl", "parse_multipart",
"parse_header", "print_exception", "print_environ",
"print_form", "print_directory", "print_arguments",
"print_environ_usage", "escape"]
# Logging support
# ===============
logfile = "" # Filename to log to, if not empty
logfp = None # File object to log to, if not None
def initlog(*allargs):
"""Write a log message, if there is a log file.
Even though this function is called initlog(), you should always
use log(); log is a variable that is set either to initlog
(initially), to dolog (once the log file has been opened), or to
nolog (when logging is disabled).
The first argument is a format string; the remaining arguments (if
any) are arguments to the % operator, so e.g.
log("%s: %s", "a", "b")
will write "a: b" to the log file, followed by a newline.
If the global logfp is not None, it should be a file object to
which log data is written.
If the global logfp is None, the global logfile may be a string
giving a filename to open, in append mode. This file should be
world writable!!! If the file can't be opened, logging is
silently disabled (since there is no safe place where we could
send an error message).
"""
global log, logfile, logfp
if logfile and not logfp:
try:
logfp = open(logfile, "a")
except OSError:
pass
if not logfp:
log = nolog
else:
log = dolog
log(*allargs)
def dolog(fmt, *args):
"""Write a log message to the log file. See initlog() for docs."""
logfp.write(fmt%args + "\n")
def nolog(*allargs):
"""Dummy function, assigned to log when logging is disabled."""
pass
def closelog():
"""Close the log file."""
global log, logfile, logfp
logfile = ''
if logfp:
logfp.close()
logfp = None
log = initlog
log = initlog # The current logging function
# Parsing functions
# =================
# Maximum input we will accept when REQUEST_METHOD is POST
# 0 ==> unlimited input
maxlen = 0
def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
"""Parse a query in the environment or from a file (default stdin)
Arguments, all optional:
fp : file pointer; default: sys.stdin.buffer
environ : environment dictionary; default: os.environ
keep_blank_values: flag indicating whether blank values in
percent-encoded forms should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
"""
if fp is None:
fp = sys.stdin
# field keys and values (except for files) are returned as strings
# an encoding is required to decode the bytes read from self.fp
if hasattr(fp,'encoding'):
encoding = fp.encoding
else:
encoding = 'latin-1'
# fp.read() must return bytes
if isinstance(fp, TextIOWrapper):
fp = fp.buffer
if not 'REQUEST_METHOD' in environ:
environ['REQUEST_METHOD'] = 'GET' # For testing stand-alone
if environ['REQUEST_METHOD'] == 'POST':
ctype, pdict = parse_header(environ['CONTENT_TYPE'])
if ctype == 'multipart/form-data':
return parse_multipart(fp, pdict)
elif ctype == 'application/x-www-form-urlencoded':
clength = int(environ['CONTENT_LENGTH'])
if maxlen and clength > maxlen:
raise ValueError('Maximum content length exceeded')
qs = fp.read(clength).decode(encoding)
else:
qs = '' # Unknown content-type
if 'QUERY_STRING' in environ:
if qs: qs = qs + '&'
qs = qs + environ['QUERY_STRING']
elif sys.argv[1:]:
if qs: qs = qs + '&'
qs = qs + sys.argv[1]
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
elif 'QUERY_STRING' in environ:
qs = environ['QUERY_STRING']
else:
if sys.argv[1:]:
qs = sys.argv[1]
else:
qs = ""
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
encoding=encoding)
# parse query string function called from urlparse,
# this is done in order to maintain backward compatibility.
def parse_qs(qs, keep_blank_values=0, strict_parsing=0):
"""Parse a query given as a string argument."""
warn("cgi.parse_qs is deprecated, use urllib.parse.parse_qs instead",
DeprecationWarning, 2)
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing)
def parse_qsl(qs, keep_blank_values=0, strict_parsing=0):
"""Parse a query given as a string argument."""
warn("cgi.parse_qsl is deprecated, use urllib.parse.parse_qsl instead",
DeprecationWarning, 2)
return urllib.parse.parse_qsl(qs, keep_blank_values, strict_parsing)
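# Editor's sketch: the urllib.parse replacements that the deprecation warnings
# above point to. The query string is a made-up example.
def _example_urllib_parse():
    qs = "a=1&a=2&b=x"
    pairs = urllib.parse.parse_qsl(qs)   # [('a', '1'), ('a', '2'), ('b', 'x')]
    merged = urllib.parse.parse_qs(qs)   # {'a': ['1', '2'], 'b': ['x']}
    return pairs, merged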
def parse_multipart(fp, pdict):
"""Parse multipart input.
Arguments:
fp : input file
pdict: dictionary containing other parameters of content-type header
Returns a dictionary just like parse_qs(): keys are the field names, each
value is a list of values for that field. This is easy to use but not
much good if you are expecting megabytes to be uploaded -- in that case,
use the FieldStorage class instead which is much more flexible. Note
that content-type is the raw, unparsed contents of the content-type
header.
XXX This does not parse nested multipart parts -- use FieldStorage for
that.
XXX This should really be subsumed by FieldStorage altogether -- no
point in having two implementations of the same parsing algorithm.
Also, FieldStorage protects itself better against certain DoS attacks
by limiting the size of the data read in one chunk. The API here
does not support that kind of protection. This also affects parse()
since it can call parse_multipart().
"""
import http.client
boundary = b""
if 'boundary' in pdict:
boundary = pdict['boundary']
if not valid_boundary(boundary):
raise ValueError('Invalid boundary in multipart form: %r'
% (boundary,))
nextpart = b"--" + boundary
lastpart = b"--" + boundary + b"--"
partdict = {}
terminator = b""
while terminator != lastpart:
bytes = -1
data = None
if terminator:
# At start of next part. Read headers first.
headers = http.client.parse_headers(fp)
clength = headers.get('content-length')
if clength:
try:
bytes = int(clength)
except ValueError:
pass
if bytes > 0:
if maxlen and bytes > maxlen:
raise ValueError('Maximum content length exceeded')
data = fp.read(bytes)
else:
data = b""
# Read lines until end of part.
lines = []
while 1:
line = fp.readline()
if not line:
terminator = lastpart # End outer loop
break
if line.startswith(b"--"):
terminator = line.rstrip()
if terminator in (nextpart, lastpart):
break
lines.append(line)
# Done with part.
if data is None:
continue
if bytes < 0:
if lines:
# Strip final line terminator
line = lines[-1]
if line[-2:] == b"\r\n":
line = line[:-2]
elif line[-1:] == b"\n":
line = line[:-1]
lines[-1] = line
data = b"".join(lines)
line = headers['content-disposition']
if not line:
continue
key, params = parse_header(line)
if key != 'form-data':
continue
if 'name' in params:
name = params['name']
else:
continue
if name in partdict:
partdict[name].append(data)
else:
partdict[name] = [data]
return partdict
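# Editor's sketch: feeding parse_multipart a minimal hand-built body; the
# boundary and field name are hypothetical. Returns {'title': [b'hello']}.
def _example_parse_multipart():
    body = (b"--BOUNDARY\r\n"
            b'Content-Disposition: form-data; name="title"\r\n'
            b"\r\n"
            b"hello\r\n"
            b"--BOUNDARY--\r\n")
    return parse_multipart(BytesIO(body), {'boundary': b'BOUNDARY'})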
def _parseparam(s):
while s[:1] == ';':
s = s[1:]
end = s.find(';')
while end > 0 and (s.count('"', 0, end) - s.count('\\"', 0, end)) % 2:
end = s.find(';', end + 1)
if end < 0:
end = len(s)
f = s[:end]
yield f.strip()
s = s[end:]
def parse_header(line):
"""Parse a Content-type like header.
Return the main content-type and a dictionary of options.
"""
parts = _parseparam(';' + line)
key = parts.__next__()
pdict = {}
for p in parts:
i = p.find('=')
if i >= 0:
name = p[:i].strip().lower()
value = p[i+1:].strip()
if len(value) >= 2 and value[0] == value[-1] == '"':
value = value[1:-1]
value = value.replace('\\\\', '\\').replace('\\"', '"')
pdict[name] = value
return key, pdict
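# Editor's sketch: parse_header on a typical Content-Disposition value
# (the field and file names are made up).
def _example_parse_header():
    # -> ('form-data', {'name': 'files', 'filename': 'report.txt'})
    return parse_header('form-data; name="files"; filename="report.txt"')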
# Classes for field storage
# =========================
class MiniFieldStorage:
"""Like FieldStorage, for use when no file uploads are possible."""
# Dummy attributes
filename = None
list = None
type = None
file = None
type_options = {}
disposition = None
disposition_options = {}
headers = {}
def __init__(self, name, value):
"""Constructor from field name and value."""
self.name = name
self.value = value
# self.file = StringIO(value)
def __repr__(self):
"""Return printable representation."""
return "MiniFieldStorage(%r, %r)" % (self.name, self.value)
class FieldStorage:
"""Store a sequence of fields, reading multipart/form-data.
This class provides naming, typing, files stored on disk, and
more. At the top level, it is accessible like a dictionary, whose
keys are the field names. (Note: None can occur as a field name.)
The items are either a Python list (if there are multiple values) or
another FieldStorage or MiniFieldStorage object. If it's a single
object, it has the following attributes:
name: the field name, if specified; otherwise None
filename: the filename, if specified; otherwise None; this is the
client side filename, *not* the file name on which it is
stored (that's a temporary file you don't deal with)
value: the value as a *string*; for file uploads, this
transparently reads the file every time you request the value
and returns *bytes*
file: the file(-like) object from which you can read the data *as
bytes*; None if the data is stored as a simple string
type: the content-type, or None if not specified
type_options: dictionary of options specified on the content-type
line
disposition: content-disposition, or None if not specified
disposition_options: dictionary of corresponding options
headers: a dictionary(-like) object (sometimes email.message.Message or a
subclass thereof) containing *all* headers
The class is subclassable, mostly for the purpose of overriding
the make_file() method, which is called internally to come up with
a file open for reading and writing. This makes it possible to
override the default choice of storing all files in a temporary
directory and unlinking them as soon as they have been opened.
"""
def __init__(self, fp=None, headers=None, outerboundary=b'',
environ=os.environ, keep_blank_values=0, strict_parsing=0,
limit=None, encoding='utf-8', errors='replace'):
"""Constructor. Read multipart/* until last part.
Arguments, all optional:
fp : file pointer; default: sys.stdin.buffer
(not used when the request method is GET)
Can be :
1. a TextIOWrapper object
2. an object whose read() and readline() methods return bytes
headers : header dictionary-like object; default:
taken from environ as per CGI spec
outerboundary : terminating multipart boundary
(for internal use only)
environ : environment dictionary; default: os.environ
keep_blank_values: flag indicating whether blank values in
percent-encoded forms should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
limit : used internally to read parts of multipart/form-data forms,
to exit from the reading loop when reached. It is the difference
between the form content-length and the number of bytes already
read
encoding, errors : the encoding and error handler used to decode the
binary stream to strings. Must be the same as the charset defined
for the page sending the form (content-type : meta http-equiv or
header)
"""
method = 'GET'
self.keep_blank_values = keep_blank_values
self.strict_parsing = strict_parsing
if 'REQUEST_METHOD' in environ:
method = environ['REQUEST_METHOD'].upper()
self.qs_on_post = None
if method == 'GET' or method == 'HEAD':
if 'QUERY_STRING' in environ:
qs = environ['QUERY_STRING']
elif sys.argv[1:]:
qs = sys.argv[1]
else:
qs = ""
qs = qs.encode(locale.getpreferredencoding(), 'surrogateescape')
fp = BytesIO(qs)
if headers is None:
headers = {'content-type':
"application/x-www-form-urlencoded"}
if headers is None:
headers = {}
if method == 'POST':
# Set default content-type for POST to what's traditional
headers['content-type'] = "application/x-www-form-urlencoded"
if 'CONTENT_TYPE' in environ:
headers['content-type'] = environ['CONTENT_TYPE']
if 'QUERY_STRING' in environ:
self.qs_on_post = environ['QUERY_STRING']
if 'CONTENT_LENGTH' in environ:
headers['content-length'] = environ['CONTENT_LENGTH']
else:
if not (isinstance(headers, (Mapping, Message))):
raise TypeError("headers must be mapping or an instance of "
"email.message.Message")
self.headers = headers
if fp is None:
self.fp = sys.stdin.buffer
# self.fp.read() must return bytes
elif isinstance(fp, TextIOWrapper):
self.fp = fp.buffer
else:
if not (hasattr(fp, 'read') and hasattr(fp, 'readline')):
raise TypeError("fp must be file pointer")
self.fp = fp
self.encoding = encoding
self.errors = errors
if not isinstance(outerboundary, bytes):
raise TypeError('outerboundary must be bytes, not %s'
% type(outerboundary).__name__)
self.outerboundary = outerboundary
self.bytes_read = 0
self.limit = limit
# Process content-disposition header
cdisp, pdict = "", {}
if 'content-disposition' in self.headers:
cdisp, pdict = parse_header(self.headers['content-disposition'])
self.disposition = cdisp
self.disposition_options = pdict
self.name = None
if 'name' in pdict:
self.name = pdict['name']
self.filename = None
if 'filename' in pdict:
self.filename = pdict['filename']
self._binary_file = self.filename is not None
# Process content-type header
#
# Honor any existing content-type header. But if there is no
# content-type header, use some sensible defaults. Assume
# outerboundary is "" at the outer level, but something non-false
# inside a multi-part. The default for an inner part is text/plain,
# but for an outer part it should be urlencoded. This should catch
# bogus clients which erroneously forget to include a content-type
# header.
#
# See below for what we do if there does exist a content-type header,
# but it happens to be something we don't understand.
if 'content-type' in self.headers:
ctype, pdict = parse_header(self.headers['content-type'])
elif self.outerboundary or method != 'POST':
ctype, pdict = "text/plain", {}
else:
ctype, pdict = 'application/x-www-form-urlencoded', {}
self.type = ctype
self.type_options = pdict
if 'boundary' in pdict:
self.innerboundary = pdict['boundary'].encode(self.encoding)
else:
self.innerboundary = b""
clen = -1
if 'content-length' in self.headers:
try:
clen = int(self.headers['content-length'])
except ValueError:
pass
if maxlen and clen > maxlen:
raise ValueError('Maximum content length exceeded')
self.length = clen
if self.limit is None and clen:
self.limit = clen
self.list = self.file = None
self.done = 0
if ctype == 'application/x-www-form-urlencoded':
self.read_urlencoded()
elif ctype[:10] == 'multipart/':
self.read_multi(environ, keep_blank_values, strict_parsing)
else:
self.read_single()
def __del__(self):
try:
self.file.close()
except AttributeError:
pass
def __repr__(self):
"""Return a printable representation."""
return "FieldStorage(%r, %r, %r)" % (
self.name, self.filename, self.value)
def __iter__(self):
return iter(self.keys())
def __getattr__(self, name):
if name != 'value':
raise AttributeError(name)
if self.file:
self.file.seek(0)
value = self.file.read()
self.file.seek(0)
elif self.list is not None:
value = self.list
else:
value = None
return value
def __getitem__(self, key):
"""Dictionary style indexing."""
if self.list is None:
raise TypeError("not indexable")
found = []
for item in self.list:
if item.name == key: found.append(item)
if not found:
raise KeyError(key)
if len(found) == 1:
return found[0]
else:
return found
def getvalue(self, key, default=None):
"""Dictionary style get() method, including 'value' lookup."""
if key in self:
value = self[key]
if isinstance(value, list):
return [x.value for x in value]
else:
return value.value
else:
return default
def getfirst(self, key, default=None):
""" Return the first value received."""
if key in self:
value = self[key]
if isinstance(value, list):
return value[0].value
else:
return value.value
else:
return default
def getlist(self, key):
""" Return list of received values."""
if key in self:
value = self[key]
if isinstance(value, list):
return [x.value for x in value]
else:
return [value.value]
else:
return []
def keys(self):
"""Dictionary style keys() method."""
if self.list is None:
raise TypeError("not indexable")
return list(set(item.name for item in self.list))
def __contains__(self, key):
"""Dictionary style __contains__ method."""
if self.list is None:
raise TypeError("not indexable")
return any(item.name == key for item in self.list)
def __len__(self):
"""Dictionary style len(x) support."""
return len(self.keys())
def __bool__(self):
if self.list is None:
raise TypeError("Cannot be converted to bool.")
return bool(self.list)
def read_urlencoded(self):
"""Internal: read data in query string format."""
qs = self.fp.read(self.length)
if not isinstance(qs, bytes):
raise ValueError("%s should return bytes, got %s" \
% (self.fp, type(qs).__name__))
qs = qs.decode(self.encoding, self.errors)
if self.qs_on_post:
qs += '&' + self.qs_on_post
self.list = []
query = urllib.parse.parse_qsl(
qs, self.keep_blank_values, self.strict_parsing,
encoding=self.encoding, errors=self.errors)
for key, value in query:
self.list.append(MiniFieldStorage(key, value))
self.skip_lines()
FieldStorageClass = None
def read_multi(self, environ, keep_blank_values, strict_parsing):
"""Internal: read a part that is itself multipart."""
ib = self.innerboundary
if not valid_boundary(ib):
raise ValueError('Invalid boundary in multipart form: %r' % (ib,))
self.list = []
if self.qs_on_post:
query = urllib.parse.parse_qsl(
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
encoding=self.encoding, errors=self.errors)
for key, value in query:
self.list.append(MiniFieldStorage(key, value))
klass = self.FieldStorageClass or self.__class__
first_line = self.fp.readline() # bytes
if not isinstance(first_line, bytes):
raise ValueError("%s should return bytes, got %s" \
% (self.fp, type(first_line).__name__))
self.bytes_read += len(first_line)
# Ensure that we consume the file until we've hit our inner boundary
while (first_line.strip() != (b"--" + self.innerboundary) and
first_line):
first_line = self.fp.readline()
self.bytes_read += len(first_line)
while True:
parser = FeedParser()
hdr_text = b""
while True:
data = self.fp.readline()
hdr_text += data
if not data.strip():
break
if not hdr_text:
break
# parser takes strings, not bytes
self.bytes_read += len(hdr_text)
parser.feed(hdr_text.decode(self.encoding, self.errors))
headers = parser.close()
# Some clients add Content-Length for part headers, ignore them
if 'content-length' in headers:
del headers['content-length']
part = klass(self.fp, headers, ib, environ, keep_blank_values,
strict_parsing,self.limit-self.bytes_read,
self.encoding, self.errors)
self.bytes_read += part.bytes_read
self.list.append(part)
if part.done or self.bytes_read >= self.length > 0:
break
self.skip_lines()
def read_single(self):
"""Internal: read an atomic part."""
if self.length >= 0:
self.read_binary()
self.skip_lines()
else:
self.read_lines()
self.file.seek(0)
bufsize = 8*1024 # I/O buffering size for copy to file
def read_binary(self):
"""Internal: read binary data."""
self.file = self.make_file()
todo = self.length
if todo >= 0:
while todo > 0:
data = self.fp.read(min(todo, self.bufsize)) # bytes
if not isinstance(data, bytes):
raise ValueError("%s should return bytes, got %s"
% (self.fp, type(data).__name__))
self.bytes_read += len(data)
if not data:
self.done = -1
break
self.file.write(data)
todo = todo - len(data)
def read_lines(self):
"""Internal: read lines until EOF or outerboundary."""
if self._binary_file:
self.file = self.__file = BytesIO() # store data as bytes for files
else:
self.file = self.__file = StringIO() # as strings for other fields
if self.outerboundary:
self.read_lines_to_outerboundary()
else:
self.read_lines_to_eof()
def __write(self, line):
"""line is always bytes, not string"""
if self.__file is not None:
if self.__file.tell() + len(line) > 1000:
self.file = self.make_file()
data = self.__file.getvalue()
self.file.write(data)
self.__file = None
if self._binary_file:
# keep bytes
self.file.write(line)
else:
# decode to string
self.file.write(line.decode(self.encoding, self.errors))
def read_lines_to_eof(self):
"""Internal: read lines until EOF."""
while 1:
line = self.fp.readline(1<<16) # bytes
self.bytes_read += len(line)
if not line:
self.done = -1
break
self.__write(line)
def read_lines_to_outerboundary(self):
"""Internal: read lines until outerboundary.
Data is read as bytes: boundaries and line ends must be converted
to bytes for comparisons.
"""
next_boundary = b"--" + self.outerboundary
last_boundary = next_boundary + b"--"
delim = b""
last_line_lfend = True
_read = 0
while 1:
if _read >= self.limit:
break
line = self.fp.readline(1<<16) # bytes
self.bytes_read += len(line)
_read += len(line)
if not line:
self.done = -1
break
if delim == b"\r":
line = delim + line
delim = b""
if line.startswith(b"--") and last_line_lfend:
strippedline = line.rstrip()
if strippedline == next_boundary:
break
if strippedline == last_boundary:
self.done = 1
break
odelim = delim
if line.endswith(b"\r\n"):
delim = b"\r\n"
line = line[:-2]
last_line_lfend = True
elif line.endswith(b"\n"):
delim = b"\n"
line = line[:-1]
last_line_lfend = True
elif line.endswith(b"\r"):
# We may interrupt \r\n sequences if they span the 2**16
# byte boundary
delim = b"\r"
line = line[:-1]
last_line_lfend = False
else:
delim = b""
last_line_lfend = False
self.__write(odelim + line)
def skip_lines(self):
"""Internal: skip lines until outer boundary if defined."""
if not self.outerboundary or self.done:
return
next_boundary = b"--" + self.outerboundary
last_boundary = next_boundary + b"--"
last_line_lfend = True
while True:
line = self.fp.readline(1<<16)
self.bytes_read += len(line)
if not line:
self.done = -1
break
if line.endswith(b"--") and last_line_lfend:
strippedline = line.strip()
if strippedline == next_boundary:
break
if strippedline == last_boundary:
self.done = 1
break
last_line_lfend = line.endswith(b'\n')
def make_file(self):
"""Overridable: return a readable & writable file.
The file will be used as follows:
- data is written to it
- seek(0)
- data is read from it
The file is opened in binary mode for files, in text mode
for other fields
This version opens a temporary file for reading and writing,
and immediately deletes (unlinks) it. The trick (on Unix!) is
that the file can still be used, but it can't be opened by
another process, and it will automatically be deleted when it
is closed or when the current process terminates.
If you want a more permanent file, you derive a class which
overrides this method. If you want a visible temporary file
that is nevertheless automatically deleted when the script
terminates, try defining a __del__ method in a derived class
which unlinks the temporary files you have created.
"""
if self._binary_file:
return tempfile.TemporaryFile("wb+")
else:
return tempfile.TemporaryFile("w+",
encoding=self.encoding, newline = '\n')
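# Editor's sketch: overriding make_file, as the class docstring suggests, to
# keep binary uploads as named files in a hypothetical spool directory
# instead of anonymous temporary files.
class _ExampleSpoolingFieldStorage(FieldStorage):
    def make_file(self):
        if self._binary_file:
            return tempfile.NamedTemporaryFile("wb+", dir="/tmp/uploads",
                                               delete=False)
        return super().make_file()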
# Test/debug code
# ===============
def test(environ=os.environ):
"""Robust test CGI script, usable as main program.
Write minimal HTTP headers and dump all information provided to
the script in HTML form.
"""
print("Content-type: text/html")
print()
sys.stderr = sys.stdout
try:
form = FieldStorage() # Replace with other classes to test those
print_directory()
print_arguments()
print_form(form)
print_environ(environ)
print_environ_usage()
def f():
exec("testing print_exception() -- <I>italics?</I>")
def g(f=f):
f()
print("<H3>What follows is a test, not an actual exception:</H3>")
g()
except:
print_exception()
print("<H1>Second try with a small maxlen...</H1>")
global maxlen
maxlen = 50
try:
form = FieldStorage() # Replace with other classes to test those
print_directory()
print_arguments()
print_form(form)
print_environ(environ)
except:
print_exception()
def print_exception(type=None, value=None, tb=None, limit=None):
if type is None:
type, value, tb = sys.exc_info()
import traceback
print()
print("<H3>Traceback (most recent call last):</H3>")
list = traceback.format_tb(tb, limit) + \
traceback.format_exception_only(type, value)
print("<PRE>%s<B>%s</B></PRE>" % (
html.escape("".join(list[:-1])),
html.escape(list[-1]),
))
del tb
def print_environ(environ=os.environ):
"""Dump the shell environment as HTML."""
keys = sorted(environ.keys())
print()
print("<H3>Shell Environment:</H3>")
print("<DL>")
for key in keys:
print("<DT>", html.escape(key), "<DD>", html.escape(environ[key]))
print("</DL>")
print()
def print_form(form):
"""Dump the contents of a form as HTML."""
keys = sorted(form.keys())
print()
print("<H3>Form Contents:</H3>")
if not keys:
print("<P>No form fields.")
print("<DL>")
for key in keys:
print("<DT>" + html.escape(key) + ":", end=' ')
value = form[key]
print("<i>" + html.escape(repr(type(value))) + "</i>")
print("<DD>" + html.escape(repr(value)))
print("</DL>")
print()
def print_directory():
"""Dump the current directory as HTML."""
print()
print("<H3>Current Working Directory:</H3>")
try:
pwd = os.getcwd()
except OSError as msg:
print("OSError:", html.escape(str(msg)))
else:
print(html.escape(pwd))
print()
def print_arguments():
print()
print("<H3>Command Line Arguments:</H3>")
print()
print(sys.argv)
print()
def print_environ_usage():
"""Dump a list of environment variables used by CGI as HTML."""
print("""
<H3>These environment variables could have been set:</H3>
<UL>
<LI>AUTH_TYPE
<LI>CONTENT_LENGTH
<LI>CONTENT_TYPE
<LI>DATE_GMT
<LI>DATE_LOCAL
<LI>DOCUMENT_NAME
<LI>DOCUMENT_ROOT
<LI>DOCUMENT_URI
<LI>GATEWAY_INTERFACE
<LI>LAST_MODIFIED
<LI>PATH
<LI>PATH_INFO
<LI>PATH_TRANSLATED
<LI>QUERY_STRING
<LI>REMOTE_ADDR
<LI>REMOTE_HOST
<LI>REMOTE_IDENT
<LI>REMOTE_USER
<LI>REQUEST_METHOD
<LI>SCRIPT_NAME
<LI>SERVER_NAME
<LI>SERVER_PORT
<LI>SERVER_PROTOCOL
<LI>SERVER_ROOT
<LI>SERVER_SOFTWARE
</UL>
In addition, HTTP headers sent by the server may be passed in the
environment as well. Here are some common variable names:
<UL>
<LI>HTTP_ACCEPT
<LI>HTTP_CONNECTION
<LI>HTTP_HOST
<LI>HTTP_PRAGMA
<LI>HTTP_REFERER
<LI>HTTP_USER_AGENT
</UL>
""")
# Utilities
# =========
def escape(s, quote=None):
"""Deprecated API."""
warn("cgi.escape is deprecated, use html.escape instead",
DeprecationWarning, stacklevel=2)
s = s.replace("&", "&") # Must be done first!
s = s.replace("<", "<")
s = s.replace(">", ">")
if quote:
s = s.replace('"', """)
return s
def valid_boundary(s):
import re
if isinstance(s, bytes):
_vb_pattern = b"^[ -~]{0,200}[!-~]$"
else:
_vb_pattern = "^[ -~]{0,200}[!-~]$"
return re.match(_vb_pattern, s)
# Invoke mainline
# ===============
# Call test() when this file is run as a script (not imported as a module)
if __name__ == '__main__':
test()
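# Editor's sketch: the usual FieldStorage pattern inside a CGI script. The
# field name "username" is hypothetical.
def _example_cgi_script():
    form = FieldStorage()
    username = form.getfirst("username", "")
    print("Content-type: text/html")
    print()
    print("<p>Hello, %s</p>" % html.escape(username))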
"""More comprehensive traceback formatting for Python scripts.
To enable this module, do:
import cgitb; cgitb.enable()
at the top of your script. The optional arguments to enable() are:
display - if true, tracebacks are displayed in the web browser
logdir - if set, tracebacks are written to files in this directory
context - number of lines of source code to show for each stack frame
format - 'text' or 'html' controls the output format
By default, tracebacks are displayed but not saved, the context is 5 lines
and the output format is 'html' (for backwards compatibility with the
original use of this module)
Alternatively, if you have caught an exception and want cgitb to display it
for you, call cgitb.handler(). The optional argument to handler() is a
3-item tuple (etype, evalue, etb) just like the value of sys.exc_info().
The default handler displays output as HTML.
"""
import inspect
import keyword
import linecache
import os
import pydoc
import sys
import tempfile
import time
import tokenize
import traceback
def reset():
"""Return a string that resets the CGI and browser to a known state."""
return '''<!--: spam
Content-Type: text/html
<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> -->
<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> --> -->
</font> </font> </font> </script> </object> </blockquote> </pre>
</table> </table> </table> </table> </table> </font> </font> </font>'''
__UNDEF__ = [] # a special sentinel object
def small(text):
if text:
return '<small>' + text + '</small>'
else:
return ''
def strong(text):
if text:
return '<strong>' + text + '</strong>'
else:
return ''
def grey(text):
if text:
return '<font color="#909090">' + text + '</font>'
else:
return ''
def lookup(name, frame, locals):
"""Find the value for a given name in the given environment."""
if name in locals:
return 'local', locals[name]
if name in frame.f_globals:
return 'global', frame.f_globals[name]
if '__builtins__' in frame.f_globals:
builtins = frame.f_globals['__builtins__']
if type(builtins) is type({}):
if name in builtins:
return 'builtin', builtins[name]
else:
if hasattr(builtins, name):
return 'builtin', getattr(builtins, name)
return None, __UNDEF__
def scanvars(reader, frame, locals):
"""Scan one logical line of Python and look up values of variables used."""
vars, lasttoken, parent, prefix, value = [], None, None, '', __UNDEF__
for ttype, token, start, end, line in tokenize.generate_tokens(reader):
if ttype == tokenize.NEWLINE: break
if ttype == tokenize.NAME and token not in keyword.kwlist:
if lasttoken == '.':
if parent is not __UNDEF__:
value = getattr(parent, token, __UNDEF__)
vars.append((prefix + token, prefix, value))
else:
where, value = lookup(token, frame, locals)
vars.append((token, where, value))
elif token == '.':
prefix += lasttoken + '.'
parent = value
else:
parent, prefix = None, ''
lasttoken = token
return vars
def html(einfo, context=5):
"""Return a nice HTML document describing a given traceback."""
etype, evalue, etb = einfo
if isinstance(etype, type):
etype = etype.__name__
pyver = 'Python ' + sys.version.split()[0] + ': ' + sys.executable
date = time.ctime(time.time())
head = '<body bgcolor="#f0f0f8">' + pydoc.html.heading(
'<big><big>%s</big></big>' %
strong(pydoc.html.escape(str(etype))),
'#ffffff', '#6622aa', pyver + '<br>' + date) + '''
<p>A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.</p>'''
indent = '<tt>' + small('&nbsp;' * 5) + '&nbsp;</tt>'
frames = []
records = inspect.getinnerframes(etb, context)
for frame, file, lnum, func, lines, index in records:
if file:
file = os.path.abspath(file)
link = '<a href="file://%s">%s</a>' % (file, pydoc.html.escape(file))
else:
file = link = '?'
args, varargs, varkw, locals = inspect.getargvalues(frame)
call = ''
if func != '?':
call = 'in ' + strong(func) + \
inspect.formatargvalues(args, varargs, varkw, locals,
formatvalue=lambda value: '=' + pydoc.html.repr(value))
highlight = {}
def reader(lnum=[lnum]):
highlight[lnum[0]] = 1
try: return linecache.getline(file, lnum[0])
finally: lnum[0] += 1
vars = scanvars(reader, frame, locals)
rows = ['<tr><td bgcolor="#d8bbff">%s%s %s</td></tr>' %
('<big>&nbsp;</big>', link, call)]
if index is not None:
i = lnum - index
for line in lines:
num = small('&nbsp;' * (5-len(str(i))) + str(i)) + '&nbsp;'
if i in highlight:
line = '<tt>=&gt;%s%s</tt>' % (num, pydoc.html.preformat(line))
rows.append('<tr><td bgcolor="#ffccee">%s</td></tr>' % line)
else:
line = '<tt>&nbsp;&nbsp;%s%s</tt>' % (num, pydoc.html.preformat(line))
rows.append('<tr><td>%s</td></tr>' % grey(line))
i += 1
done, dump = {}, []
for name, where, value in vars:
if name in done: continue
done[name] = 1
if value is not __UNDEF__:
if where in ('global', 'builtin'):
name = ('<em>%s</em> ' % where) + strong(name)
elif where == 'local':
name = strong(name)
else:
name = where + strong(name.split('.')[-1])
dump.append('%s = %s' % (name, pydoc.html.repr(value)))
else:
dump.append(name + ' <em>undefined</em>')
rows.append('<tr><td>%s</td></tr>' % small(grey(', '.join(dump))))
frames.append('''
<table width="100%%" cellspacing=0 cellpadding=0 border=0>
%s</table>''' % '\n'.join(rows))
exception = ['<p>%s: %s' % (strong(pydoc.html.escape(str(etype))),
pydoc.html.escape(str(evalue)))]
for name in dir(evalue):
if name[:1] == '_': continue
value = pydoc.html.repr(getattr(evalue, name))
exception.append('\n<br>%s%s&nbsp;=\n%s' % (indent, name, value))
return head + ''.join(frames) + ''.join(exception) + '''
<!-- The above is a description of an error in a Python program, formatted
for a Web browser because the 'cgitb' module was enabled. In case you
are not reading this in a Web browser, here is the original traceback:
%s
-->
''' % pydoc.html.escape(
''.join(traceback.format_exception(etype, evalue, etb)))
def text(einfo, context=5):
"""Return a plain text document describing a given traceback."""
etype, evalue, etb = einfo
if isinstance(etype, type):
etype = etype.__name__
pyver = 'Python ' + sys.version.split()[0] + ': ' + sys.executable
date = time.ctime(time.time())
head = "%s\n%s\n%s\n" % (str(etype), pyver, date) + '''
A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.
'''
frames = []
records = inspect.getinnerframes(etb, context)
for frame, file, lnum, func, lines, index in records:
file = file and os.path.abspath(file) or '?'
args, varargs, varkw, locals = inspect.getargvalues(frame)
call = ''
if func != '?':
call = 'in ' + func + \
inspect.formatargvalues(args, varargs, varkw, locals,
formatvalue=lambda value: '=' + pydoc.text.repr(value))
highlight = {}
def reader(lnum=[lnum]):
highlight[lnum[0]] = 1
try: return linecache.getline(file, lnum[0])
finally: lnum[0] += 1
vars = scanvars(reader, frame, locals)
rows = [' %s %s' % (file, call)]
if index is not None:
i = lnum - index
for line in lines:
num = '%5d ' % i
rows.append(num+line.rstrip())
i += 1
done, dump = {}, []
for name, where, value in vars:
if name in done: continue
done[name] = 1
if value is not __UNDEF__:
if where == 'global': name = 'global ' + name
elif where != 'local': name = where + name.split('.')[-1]
dump.append('%s = %s' % (name, pydoc.text.repr(value)))
else:
dump.append(name + ' undefined')
rows.append('\n'.join(dump))
frames.append('\n%s\n' % '\n'.join(rows))
exception = ['%s: %s' % (str(etype), str(evalue))]
for name in dir(evalue):
value = pydoc.text.repr(getattr(evalue, name))
exception.append('\n%s%s = %s' % (" "*4, name, value))
return head + ''.join(frames) + ''.join(exception) + '''
The above is a description of an error in a Python program. Here is
the original traceback:
%s
''' % ''.join(traceback.format_exception(etype, evalue, etb))
class Hook:
"""A hook to replace sys.excepthook that shows tracebacks in HTML."""
def __init__(self, display=1, logdir=None, context=5, file=None,
format="html"):
self.display = display # send tracebacks to browser if true
self.logdir = logdir # log tracebacks to files if not None
self.context = context # number of source code lines per frame
self.file = file or sys.stdout # place to send the output
self.format = format
def __call__(self, etype, evalue, etb):
self.handle((etype, evalue, etb))
def handle(self, info=None):
info = info or sys.exc_info()
if self.format == "html":
self.file.write(reset())
formatter = (self.format=="html") and html or text
plain = False
try:
doc = formatter(info, self.context)
except: # just in case something goes wrong
doc = ''.join(traceback.format_exception(*info))
plain = True
if self.display:
if plain:
doc = doc.replace('&', '&amp;').replace('<', '&lt;')
self.file.write('<pre>' + doc + '</pre>\n')
else:
self.file.write(doc + '\n')
else:
self.file.write('<p>A problem occurred in a Python script.\n')
if self.logdir is not None:
suffix = ['.txt', '.html'][self.format=="html"]
(fd, path) = tempfile.mkstemp(suffix=suffix, dir=self.logdir)
try:
file = os.fdopen(fd, 'w')
file.write(doc)
file.close()
msg = '%s contains the description of this error.' % path
except:
msg = 'Tried to save traceback to %s, but failed.' % path
if self.format == 'html':
self.file.write('<p>%s</p>\n' % msg)
else:
self.file.write(msg + '\n')
try:
self.file.flush()
except: pass
handler = Hook().handle
def enable(display=1, logdir=None, context=5, format="html"):
"""Install an exception handler that formats tracebacks as HTML.
The optional argument 'display' can be set to 0 to suppress sending the
traceback to the browser, and 'logdir' can be set to a directory to cause
tracebacks to be written to files there."""
sys.excepthook = Hook(display=display, logdir=logdir,
context=context, format=format)
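# Editor's sketch: the two documented entry points. The log directory is an
# arbitrary example and must already exist for logging to succeed.
def _example_enable():
    # install the hook for all uncaught exceptions
    enable(display=1, logdir="/tmp/cgitb-logs", context=5, format="html")

def _example_handler():
    # or format one caught exception on demand
    try:
        1 / 0
    except ZeroDivisionError:
        handler()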
"""Simple class to read IFF chunks.
An IFF chunk (used in formats such as AIFF, TIFF, RMFF (RealMedia File
Format)) has the following structure:
+----------------+
| ID (4 bytes) |
+----------------+
| size (4 bytes) |
+----------------+
| data |
| ... |
+----------------+
The ID is a 4-byte string which identifies the type of chunk.
The size field (a 32-bit value, encoded using big-endian byte order)
gives the size of the whole chunk, including the 8-byte header.
Usually an IFF-type file consists of one or more chunks. The proposed
usage of the Chunk class defined here is to instantiate an instance at
the start of each chunk and read from the instance until it reaches
the end, after which a new instance can be instantiated. At the end
of the file, creating a new instance will fail with an EOFError
exception.
Usage:
while True:
try:
chunk = Chunk(file)
except EOFError:
break
chunktype = chunk.getname()
while True:
data = chunk.read(nbytes)
if not data:
break
# do something with data
The interface is file-like. The implemented methods are:
read, close, seek, tell, isatty.
Extra methods are: skip() (called by close, skips to the end of the chunk),
getname() (returns the name (ID) of the chunk)
The __init__ method has one required argument, a file-like object
(including a chunk instance), and one optional argument, a flag which
specifies whether or not chunks are aligned on 2-byte boundaries. The
default is 1, i.e. aligned.
"""
class Chunk:
def __init__(self, file, align=True, bigendian=True, inclheader=False):
import struct
self.closed = False
self.align = align # whether to align to word (2-byte) boundaries
if bigendian:
strflag = '>'
else:
strflag = '<'
self.file = file
self.chunkname = file.read(4)
if len(self.chunkname) < 4:
raise EOFError
try:
self.chunksize = struct.unpack_from(strflag+'L', file.read(4))[0]
except struct.error:
raise EOFError
if inclheader:
self.chunksize = self.chunksize - 8 # subtract header
self.size_read = 0
try:
self.offset = self.file.tell()
except (AttributeError, OSError):
self.seekable = False
else:
self.seekable = True
def getname(self):
"""Return the name (ID) of the current chunk."""
return self.chunkname
def getsize(self):
"""Return the size of the current chunk."""
return self.chunksize
def close(self):
if not self.closed:
try:
self.skip()
finally:
self.closed = True
def isatty(self):
if self.closed:
raise ValueError("I/O operation on closed file")
return False
def seek(self, pos, whence=0):
"""Seek to specified position into the chunk.
Default position is 0 (start of chunk).
If the file is not seekable, this will result in an error.
"""
if self.closed:
raise ValueError("I/O operation on closed file")
if not self.seekable:
raise OSError("cannot seek")
if whence == 1:
pos = pos + self.size_read
elif whence == 2:
pos = pos + self.chunksize
if pos < 0 or pos > self.chunksize:
raise RuntimeError
self.file.seek(self.offset + pos, 0)
self.size_read = pos
def tell(self):
if self.closed:
raise ValueError("I/O operation on closed file")
return self.size_read
def read(self, size=-1):
"""Read at most size bytes from the chunk.
If size is omitted or negative, read until the end
of the chunk.
"""
if self.closed:
raise ValueError("I/O operation on closed file")
if self.size_read >= self.chunksize:
return b''
if size < 0:
size = self.chunksize - self.size_read
if size > self.chunksize - self.size_read:
size = self.chunksize - self.size_read
data = self.file.read(size)
self.size_read = self.size_read + len(data)
if self.size_read == self.chunksize and \
self.align and \
(self.chunksize & 1):
dummy = self.file.read(1)
self.size_read = self.size_read + len(dummy)
return data
def skip(self):
"""Skip the rest of the chunk.
If you are not interested in the contents of the chunk,
this method should be called so that the file points to
the start of the next chunk.
"""
if self.closed:
raise ValueError("I/O operation on closed file")
if self.seekable:
try:
n = self.chunksize - self.size_read
# maybe fix alignment
if self.align and (self.chunksize & 1):
n = n + 1
self.file.seek(n, 1)
self.size_read = self.size_read + n
return
except OSError:
pass
while self.size_read < self.chunksize:
n = min(8192, self.chunksize - self.size_read)
dummy = self.read(n)
if not dummy:
raise EOFError
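# Editor's sketch: the docstring's usage loop made concrete; reads every
# (name, payload) pair from an IFF-style file at a hypothetical path.
def _example_read_chunks(path):
    result = []
    with open(path, 'rb') as f:
        while True:
            try:
                chunk = Chunk(f)
            except EOFError:
                break
            result.append((chunk.getname(), chunk.read()))
            chunk.close()  # skips any unread data and the alignment byte
    return result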
# Licensed to the .NET Foundation under one or more agreements.
# The .NET Foundation licenses this file to you under the Apache 2.0 License.
# See the LICENSE file in the project root for more information.
__all__ = ["ClrClass", "ClrInterface", "accepts", "returns", "attribute", "propagate_attributes"]
import clr
clr.AddReference("Microsoft.Dynamic")
clr.AddReference("Microsoft.Scripting")
clr.AddReference("IronPython")
if clr.IsNetCoreApp:
clr.AddReference("System.Reflection.Emit")
import System
from System import Char, Void, Boolean, Array, Type, AppDomain
from System.Reflection import FieldAttributes, MethodAttributes, PropertyAttributes, ParameterAttributes
from System.Reflection import CallingConventions, TypeAttributes, AssemblyName
from System.Reflection.Emit import OpCodes, CustomAttributeBuilder, AssemblyBuilder, AssemblyBuilderAccess
from System.Runtime.InteropServices import DllImportAttribute, CallingConvention, CharSet
from Microsoft.Scripting.Generation import Snippets
from Microsoft.Scripting.Runtime import DynamicOperations
from Microsoft.Scripting.Utils import ReflectionUtils
from IronPython.Runtime import NameType, PythonContext
from IronPython.Runtime.Types import PythonType, ReflectedField, ReflectedProperty
PropertyAttributes_None = getattr(PropertyAttributes, "None")
ParameterAttributes_None = getattr(ParameterAttributes, "None")
def validate_clr_types(signature_types, var_signature = False):
if not isinstance(signature_types, tuple):
signature_types = (signature_types,)
for t in signature_types:
if type(t) is type(System.IComparable): # type overloaded on generic arity, eg IComparable and IComparable[T]
t = t[()] # select non-generic version
clr_type = clr.GetClrType(t)
if t == Void:
raise TypeError("Void cannot be used in signature")
is_typed = clr.GetPythonType(clr_type) == t
# is_typed needs to be weakened until the generated type
# gets explicitly published as the underlying CLR type
is_typed = is_typed or (hasattr(t, "__metaclass__") and t.__metaclass__ in [ClrInterface, ClrClass])
if not is_typed:
raise Exception("Invalid CLR type %s" % str(t))
if not var_signature:
if clr_type.IsByRef:
raise TypeError("Byref can only be used as arguments and locals")
# ArgIterator is not present in Silverlight
if hasattr(System, "ArgIterator") and t == System.ArgIterator:
raise TypeError("Stack-referencing types can only be used as arguments and locals")
class TypedFunction(object):
"""
A strongly-typed function can get wrapped up as a staticmethod, a property, etc.
This class represents the raw function, but with the type information
it is decorated with.
Other information is stored as attributes on the function. See propagate_attributes
"""
def __init__(self, function, is_static = False, prop_name_if_prop_get = None, prop_name_if_prop_set = None):
self.function = function
self.is_static = is_static
self.prop_name_if_prop_get = prop_name_if_prop_get
self.prop_name_if_prop_set = prop_name_if_prop_set
class ClrType(type):
"""
Base metaclass for creating strongly-typed CLR types
"""
def is_typed_method(self, function):
if hasattr(function, "arg_types") != hasattr(function, "return_type"):
raise TypeError("One of @accepts and @returns is missing for %s" % function.__name__)
return hasattr(function, "arg_types")
def get_typed_properties(self):
for item_name, item in self.__dict__.items():
if isinstance(item, property):
if item.fget:
if not self.is_typed_method(item.fget): continue
prop_type = item.fget.return_type
else:
if not self.is_typed_method(item.fset): continue
prop_type = item.fset.arg_types[0]
validate_clr_types(prop_type)
clr_prop_type = clr.GetClrType(prop_type)
yield item, item_name, clr_prop_type
def emit_properties(self, typebld):
for prop, prop_name, clr_prop_type in self.get_typed_properties():
self.emit_property(typebld, prop, prop_name, clr_prop_type)
def emit_property(self, typebld, prop, name, clrtype):
prpbld = typebld.DefineProperty(name, PropertyAttributes_None, clrtype, None)
if prop.fget:
getter = self.emitted_methods[(prop.fget.__name__, prop.fget.arg_types)]
prpbld.SetGetMethod(getter)
if prop.fset:
setter = self.emitted_methods[(prop.fset.__name__, prop.fset.arg_types)]
prpbld.SetSetMethod(setter)
def dummy_function(self): raise RuntimeError("this should not get called")
def get_typed_methods(self):
"""
Get all the methods with @accepts (and @returns) decorators
Functions are assumed to be instance methods, unless decorated with @staticmethod
"""
# We avoid using the "types" library as it is not a builtin
FunctionType = type(ClrType.__dict__["dummy_function"])
for item_name, item in self.__dict__.items():
function = None
is_static = False
if isinstance(item, FunctionType):
function, is_static = item, False
elif isinstance(item, staticmethod):
function, is_static = getattr(self, item_name), True
elif isinstance(item, property):
if item.fget and self.is_typed_method(item.fget):
if item.fget.__name__ == item_name:
# The property hides the getter. So yield the getter
yield TypedFunction(item.fget, False, item_name, None)
if item.fset and self.is_typed_method(item.fset):
if item.fset.__name__ == item_name:
# The property hides the setter. So yield the setter
yield TypedFunction(item.fset, False, None, item_name)
continue
else:
continue
if self.is_typed_method(function):
yield TypedFunction(function, is_static)
def emit_methods(self, typebld):
# We need to track the generated methods so that we can emit properties
# referring these methods.
# Also, the hash is indexed by name *and signature*. Even though Python does
# not have method overloading, property getter and setter functions can have
# the same __name__ attribute
self.emitted_methods = {}
for function_info in self.get_typed_methods():
method_builder = self.emit_method(typebld, function_info)
function = function_info.function
if (function.__name__, function.arg_types) in self.emitted_methods:
raise TypeError("methods with clashing names")
self.emitted_methods[(function.__name__, function.arg_types)] = method_builder
def emit_classattribs(self, typebld):
if hasattr(self, '_clrclassattribs'):
for attrib_info in self._clrclassattribs:
if isinstance(attrib_info, type):
ci = clr.GetClrType(attrib_info).GetConstructor(())
cab = CustomAttributeBuilder(ci, ())
elif isinstance(attrib_info, CustomAttributeDecorator):
cab = attrib_info.GetBuilder()
else:
make_decorator = attrib_info()
cab = make_decorator.GetBuilder()
typebld.SetCustomAttribute(cab)
def get_clr_type_name(self):
if hasattr(self, "_clrnamespace"):
return self._clrnamespace + "." + self.__name__
else:
return self.__name__
def create_type(self, typebld):
self.emit_members(typebld)
new_type = typebld.CreateType()
self.map_members(new_type)
return new_type
class ClrInterface(ClrType):
"""
Set __metaclass__ in a Python class declaration to declare a
CLR interface type.
You need to specify object as the base-type if you do not specify any other
interfaces as the base interfaces
"""
def __init__(self, *args):
return super(ClrInterface, self).__init__(*args)
def emit_method(self, typebld, function_info):
assert(not function_info.is_static)
function = function_info.function
attributes = MethodAttributes.Public | MethodAttributes.Virtual | MethodAttributes.Abstract
method_builder = typebld.DefineMethod(
function.__name__,
attributes,
function.return_type,
function.arg_types)
instance_offset = 0 if function_info.is_static else 1
arg_names = function.__code__.co_varnames
for i in range(len(function.arg_types)):
# TODO - set non-trivial ParameterAttributes, default value and custom attributes
p = method_builder.DefineParameter(i + 1, ParameterAttributes_None, arg_names[i + instance_offset])
if hasattr(function, "CustomAttributeBuilders"):
for cab in function.CustomAttributeBuilders:
method_builder.SetCustomAttribute(cab)
return method_builder
def emit_members(self, typebld):
self.emit_methods(typebld)
self.emit_properties(typebld)
self.emit_classattribs(typebld)
def map_members(self, new_type): pass
interface_module_builder = None
@staticmethod
def define_interface(typename, bases):
for b in bases:
validate_clr_types(b)
if not ClrInterface.interface_module_builder:
name = AssemblyName("interfaces")
access = AssemblyBuilderAccess.Run
assembly_builder = ReflectionUtils.DefineDynamicAssembly(name, access)
ClrInterface.interface_module_builder = assembly_builder.DefineDynamicModule("interfaces")
attrs = TypeAttributes.Public | TypeAttributes.Interface | TypeAttributes.Abstract
return ClrInterface.interface_module_builder.DefineType(typename, attrs, None, bases)
def map_clr_type(self, clr_type):
"""
TODO - Currently "t = clr.GetPythonType(clr.GetClrType(C)); t == C" will be False
for C where C.__metaclass__ is ClrInterface, even though both t and C
represent the same CLR type. This can be fixed by publishing a mapping
between t and C in the IronPython runtime.
"""
pass
def __clrtype__(self):
# CFoo below will use ClrInterface as its metaclass, but the user will not expect CFoo
# to be an interface in this case:
#
# class IFoo(object):
# __metaclass__ = ClrInterface
# class CFoo(IFoo): pass
if not "__metaclass__" in self.__dict__:
return super(ClrInterface, self).__clrtype__()
bases = list(self.__bases__)
bases.remove(object)
bases = tuple(bases)
if False: # Snippets currently does not support creating interfaces
typegen = Snippets.Shared.DefineType(self.get_clr_type_name(), bases, True, False)
typebld = typegen.TypeBuilder
else:
typebld = ClrInterface.define_interface(self.get_clr_type_name(), bases)
clr_type = self.create_type(typebld)
self.map_clr_type(clr_type)
return clr_type
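# Editor's sketch, in the same commented style this file already uses for its
# examples (kept as comments because defining the class would emit a CLR type
# at import time): declaring a CLR interface with the ClrInterface metaclass.
# IFoo and bar are hypothetical; @accepts/@returns are the decorators exported
# in __all__ above.
#
# class IFoo(object):
#     __metaclass__ = ClrInterface
#     @accepts(int)
#     @returns(str)
#     def bar(self, x):
#         raise RuntimeError("interface methods are never called directly")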
# Note that ClrClass inherits from ClrInterface to satisfy Python requirements of metaclasses.
# A metaclass of a subtype has to be subtype of the metaclass of a base type. As a result,
# if you define a type hierarchy as shown below, it requires ClrClass to be a subtype
# of ClrInterface:
#
# class IFoo(object):
# __metaclass__ = ClrInterface
# class CFoo(IFoo):
# __metaclass__ = ClrClass
class ClrClass(ClrInterface):
"""
Set __metaclass__ in a Python class declaration to specify strong-type
information for the class or its attributes. The Python class
retains its Python attributes, like being able to add or remove methods.
"""
# Holds the FieldInfo for a static CLR field which points to a
# Microsoft.Scripting.Runtime.DynamicOperations corresponding to the current ScriptEngine
dynamic_operations_field = None
def emit_fields(self, typebld):
if hasattr(self, "_clrfields"):
for fldname in self._clrfields:
field_type = self._clrfields[fldname]
validate_clr_types(field_type)
typebld.DefineField(
fldname,
clr.GetClrType(field_type),
FieldAttributes.Public)
def map_fields(self, new_type):
if hasattr(self, "_clrfields"):
for fldname in self._clrfields:
fldinfo = new_type.GetField(fldname)
setattr(self, fldname, ReflectedField(fldinfo))
@staticmethod
def get_dynamic_operations_field():
if ClrClass.dynamic_operations_field:
return ClrClass.dynamic_operations_field
python_context = clr.GetCurrentRuntime().GetLanguage(PythonContext)
dynamic_operations = DynamicOperations(python_context)
typegen = Snippets.Shared.DefineType(
"DynamicOperationsHolder" + str(hash(python_context)),
object,
True,
False)
typebld = typegen.TypeBuilder
typebld.DefineField(
"DynamicOperations",
DynamicOperations,
FieldAttributes.Public | FieldAttributes.Static)
new_type = typebld.CreateType()
ClrClass.dynamic_operations_field = new_type.GetField("DynamicOperations")
ClrClass.dynamic_operations_field.SetValue(None, dynamic_operations)
return ClrClass.dynamic_operations_field
def emit_typed_stub_to_python_method(self, typebld, function_info):
function = function_info.function
"""
Generate a stub method that repushes all the arguments and
dispatches to DynamicOperations.InvokeMember
"""
invoke_member = clr.GetClrType(DynamicOperations).GetMethod(
"InvokeMember",
Array[Type]((object, str, Array[object])))
# Type.GetMethod raises an AmbiguousMatchException if there is a generic and a non-generic method
# (like DynamicOperations.GetMember) with the same name and signature. So we have to do things
# the hard way
get_member_search = [m for m in clr.GetClrType(DynamicOperations).GetMethods() if m.Name == "GetMember" and not m.IsGenericMethod and m.GetParameters().Length == 2]
assert(len(get_member_search) == 1)
get_member = get_member_search[0]
set_member_search = [m for m in clr.GetClrType(DynamicOperations).GetMethods() if m.Name == "SetMember" and not m.IsGenericMethod and m.GetParameters().Length == 3]
assert(len(set_member_search) == 1)
set_member = set_member_search[0]
convert_to = clr.GetClrType(DynamicOperations).GetMethod(
"ConvertTo",
Array[Type]((object, Type)))
get_type_from_handle = clr.GetClrType(Type).GetMethod("GetTypeFromHandle")
attributes = MethodAttributes.Public
if function_info.is_static: attributes |= MethodAttributes.Static
if function.__name__ == "__new__":
if function_info.is_static: raise TypeError
method_builder = typebld.DefineConstructor(
attributes,
CallingConventions.HasThis,
function.arg_types)
raise NotImplementedError("Need to call self.baseType ctor passing in self.get_python_type_field()")
else:
method_builder = typebld.DefineMethod(
function.__name__,
attributes,
function.return_type,
function.arg_types)
instance_offset = 0 if function_info.is_static else 1
arg_names = function.__code__.co_varnames
for i in range(len(function.arg_types)):
# TODO - set non-trivial ParameterAttributes, default value and custom attributes
p = method_builder.DefineParameter(i + 1, ParameterAttributes_None, arg_names[i + instance_offset])
ilgen = method_builder.GetILGenerator()
args_array = ilgen.DeclareLocal(Array[object])
args_count = len(function.arg_types)
ilgen.Emit(OpCodes.Ldc_I4, args_count)
ilgen.Emit(OpCodes.Newarr, object)
ilgen.Emit(OpCodes.Stloc, args_array)
for i in range(args_count):
arg_type = function.arg_types[i]
if clr.GetClrType(arg_type).IsByRef:
raise NotImplementedError("byref params not supported")
ilgen.Emit(OpCodes.Ldloc, args_array)
ilgen.Emit(OpCodes.Ldc_I4, i)
ilgen.Emit(OpCodes.Ldarg, i + int(not function_info.is_static))
ilgen.Emit(OpCodes.Box, arg_type)
ilgen.Emit(OpCodes.Stelem_Ref)
has_return_value = True
if function_info.prop_name_if_prop_get:
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldstr, function_info.prop_name_if_prop_get)
ilgen.Emit(OpCodes.Callvirt, get_member)
elif function_info.prop_name_if_prop_set:
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldstr, function_info.prop_name_if_prop_set)
ilgen.Emit(OpCodes.Ldarg, 1)
ilgen.Emit(OpCodes.Callvirt, set_member)
has_return_value = False
else:
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
if function_info.is_static:
raise NotImplementedError("need to load Python class object from a CLR static field")
# ilgen.Emit(OpCodes.Ldsfld, class_object)
else:
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldstr, function.__name__)
ilgen.Emit(OpCodes.Ldloc, args_array)
ilgen.Emit(OpCodes.Callvirt, invoke_member)
if has_return_value:
if function.return_type == Void:
ilgen.Emit(OpCodes.Pop)
else:
ret_val = ilgen.DeclareLocal(object)
ilgen.Emit(OpCodes.Stloc, ret_val)
ilgen.Emit(OpCodes.Ldsfld, ClrClass.get_dynamic_operations_field())
ilgen.Emit(OpCodes.Ldloc, ret_val)
ilgen.Emit(OpCodes.Ldtoken, clr.GetClrType(function.return_type))
ilgen.Emit(OpCodes.Call, get_type_from_handle)
ilgen.Emit(OpCodes.Callvirt, convert_to)
ilgen.Emit(OpCodes.Unbox_Any, function.return_type)
ilgen.Emit(OpCodes.Ret)
return method_builder
def emit_method(self, typebld, function_info):
function = function_info.function
if hasattr(function, "DllImportAttributeDecorator"):
dllImportAttributeDecorator = function.DllImportAttributeDecorator
name = function.__name__
dllName = dllImportAttributeDecorator.args[0]
entryName = function.__name__
attributes = MethodAttributes.Public | MethodAttributes.Static | MethodAttributes.PinvokeImpl
callingConvention = CallingConventions.Standard
returnType = function.return_type
returnTypeRequiredCustomModifiers = ()
returnTypeOptionalCustomModifiers = ()
parameterTypes = function.arg_types
parameterTypeRequiredCustomModifiers = None
parameterTypeOptionalCustomModifiers = None
nativeCallConv = CallingConvention.Winapi
nativeCharSet = CharSet.Auto
method_builder = typebld.DefinePInvokeMethod(
name,
dllName,
entryName,
attributes,
callingConvention,
returnType,
returnTypeRequiredCustomModifiers,
returnTypeOptionalCustomModifiers,
parameterTypes,
parameterTypeRequiredCustomModifiers,
parameterTypeOptionalCustomModifiers,
nativeCallConv,
nativeCharSet)
else:
method_builder = self.emit_typed_stub_to_python_method(typebld, function_info)
if hasattr(function, "CustomAttributeBuilders"):
for cab in function.CustomAttributeBuilders:
method_builder.SetCustomAttribute(cab)
return method_builder
def map_pinvoke_methods(self, new_type):
pythonType = clr.GetPythonType(new_type)
for function_info in self.get_typed_methods():
function = function_info.function
if hasattr(function, "DllImportAttributeDecorator"):
# Overwrite the Python function with the pinvoke_method
pinvoke_method = getattr(pythonType, function.__name__)
setattr(self, function.__name__, pinvoke_method)
def emit_python_type_field(self, typebld):
return typebld.DefineField(
"PythonType",
PythonType,
FieldAttributes.Public | FieldAttributes.Static)
def set_python_type_field(self, new_type):
self.PythonType = new_type.GetField("PythonType")
self.PythonType.SetValue(None, self)
def add_wrapper_ctors(self, baseType, typebld):
python_type_field = self.emit_python_type_field(typebld)
for ctor in baseType.GetConstructors():
ctorparams = ctor.GetParameters()
# leave out the PythonType argument
assert(ctorparams[0].ParameterType == clr.GetClrType(PythonType))
ctorparams = ctorparams[1:]
ctorbld = typebld.DefineConstructor(
ctor.Attributes,
ctor.CallingConvention,
tuple([p.ParameterType for p in ctorparams]))
ilgen = ctorbld.GetILGenerator()
ilgen.Emit(OpCodes.Ldarg, 0)
ilgen.Emit(OpCodes.Ldsfld, python_type_field)
for index in range(len(ctorparams)):
ilgen.Emit(OpCodes.Ldarg, index + 1)
ilgen.Emit(OpCodes.Call, ctor)
ilgen.Emit(OpCodes.Ret)
def emit_members(self, typebld):
self.emit_fields(typebld)
self.add_wrapper_ctors(self.baseType, typebld)
super(ClrClass, self).emit_members(typebld)
def map_members(self, new_type):
self.map_fields(new_type)
self.map_pinvoke_methods(new_type)
self.set_python_type_field(new_type)
super(ClrClass, self).map_members(new_type)
def __clrtype__(self):
# CDerived below will use ClrClass as its metaclass, but the user may not expect CDerived
# to be a typed .NET class in this case:
#
# class CBase(object):
# __metaclass__ = ClrClass
# class CDerived(CBase): pass
if not "__metaclass__" in self.__dict__:
return super(ClrClass, self).__clrtype__()
# Create a simple Python type first.
self.baseType = super(ClrType, self).__clrtype__()
# We will now subtype it to create a customized class with the
# CLR attributes as defined by the user
typegen = Snippets.Shared.DefineType(self.get_clr_type_name(), self.baseType, True, False)
typebld = typegen.TypeBuilder
return self.create_type(typebld)
def make_cab(attrib_type, *args, **kwds):
clrtype = clr.GetClrType(attrib_type)
argtypes = tuple(map(lambda x:clr.GetClrType(type(x)), args))
ci = clrtype.GetConstructor(argtypes)
props = ([],[])
fields = ([],[])
for kwd in kwds:
pi = clrtype.GetProperty(kwd)
if pi is not None:
props[0].append(pi)
props[1].append(kwds[kwd])
else:
fi = clrtype.GetField(kwd)
if fi is not None:
fields[0].append(fi)
fields[1].append(kwds[kwd])
else:
raise TypeError("No %s Member found on %s" % (kwd, clrtype.Name))
return CustomAttributeBuilder(ci, args,
tuple(props[0]), tuple(props[1]),
tuple(fields[0]), tuple(fields[1]))
def accepts(*args):
"""
TODO - needs to be merged with clr.accepts
"""
validate_clr_types(args, True)
def decorator(function):
function.arg_types = args
return function
return decorator
def returns(return_type = Void):
"""
TODO - needs to be merged with clr.returns
"""
if return_type != Void:
validate_clr_types(return_type)
def decorator(function):
function.return_type = return_type
return function
return decorator
class CustomAttributeDecorator(object):
"""
This represents information about a custom-attribute applied to a type or a method
Note that we cannot use an instance of System.Attribute to capture this information
as it is not possible to go from an instance of System.Attribute to an instance
of System.Reflection.Emit.CustomAttributeBuilder as the latter needs to know
how to represent information in metadata to later *recreate* a similar instance of
System.Attribute.
Also note that once a CustomAttributeBuilder is created, it is not possible to
query it. Hence, we need to store the arguments required to create the
CustomAttributeBuilder so that pseudo-custom-attributes can get to the information.
"""
def __init__(self, attrib_type, *args, **kwargs):
self.attrib_type = attrib_type
self.args = args
self.kwargs = kwargs
def __call__(self, function):
if self.attrib_type == DllImportAttribute:
function.DllImportAttributeDecorator = self
else:
if not hasattr(function, "CustomAttributeBuilders"):
function.CustomAttributeBuilders = []
function.CustomAttributeBuilders.append(self.GetBuilder())
return function
def GetBuilder(self):
assert not self.attrib_type in [DllImportAttribute]
return make_cab(self.attrib_type, *self.args, **self.kwargs)
def attribute(attrib_type):
"""
This decorator is used to specify a CustomAttribute for a type or method.
"""
def make_decorator(*args, **kwargs):
return CustomAttributeDecorator(attrib_type, *args, **kwargs)
return make_decorator
def propagate_attributes(old_function, new_function):
"""
Use this if you replace a function in a type with ClrInterface or ClrClass as the metaclass.
This will typically be needed if you are defining a decorator which wraps functions with
new functions, and want it to work in conjunction with clrtype.
"""
if hasattr(old_function, "return_type"):
new_function.__name__ = old_function.__name__
new_function.return_type = old_function.return_type
new_function.arg_types = old_function.arg_types
if hasattr(old_function, "CustomAttributeBuilders"):
new_function.CustomAttributeBuilders = old_function.CustomAttributeBuilders
if hasattr(old_function, "DllImportAttributeDecorator"):
new_function.DllImportAttributeDecorator = old_function.DllImportAttributeDecorator
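# An illustrative usage sketch, not part of the module above. It is a guess at
# typical usage based only on names defined here (ClrClass, accepts, returns,
# attribute); Calculator and the ObsoleteAttribute example are hypothetical,
# and it assumes IronPython with Python-2-style metaclass declarations (note
# the "__metaclass__" check in ClrClass.__clrtype__ above):
#
#     import clrtype
#     from System import ObsoleteAttribute
#
#     Obsolete = clrtype.attribute(ObsoleteAttribute)
#
#     class Calculator(object):
#         __metaclass__ = clrtype.ClrClass  # emit a real .NET type for this class
#
#         @Obsolete("Use add2 instead")     # attached as a CLR custom attribute
#         @clrtype.accepts(int, int)        # CLR parameter types (self excluded)
#         @clrtype.returns(int)             # CLR return type
#         def add(self, a, b):
#             return a + b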
"""A generic class to build line-oriented command interpreters.
Interpreters constructed with this class obey the following conventions:
1. End of file on input is processed as the command 'EOF'.
2. A command is parsed out of each line by collecting the prefix composed
of characters in the identchars member.
3. A command `foo' is dispatched to a method 'do_foo()'; the do_ method
is passed a single argument consisting of the remainder of the line.
4. Typing an empty line repeats the last command. (Actually, it calls the
method `emptyline', which may be overridden in a subclass.)
5. There is a predefined `help' method. Given an argument `topic', it
calls the command `help_topic'. With no arguments, it lists all topics
with defined help_ functions, broken into up to three topics: documented
commands, miscellaneous help topics, and undocumented commands.
6. The command '?' is a synonym for `help'. The command '!' is a synonym
for `shell', if a do_shell method exists.
7. If completion is enabled, completing commands will be done automatically,
and completing of commands args is done by calling complete_foo() with
arguments text, line, begidx, endidx. text is the string we are matching
against; all returned matches must begin with it. line is the current
input line (lstripped), begidx and endidx are the beginning and end
indexes of the text being matched, which could be used to provide
different completion depending upon which position the argument is in.
The `default' method may be overridden to intercept commands for which there
is no do_ method.
The `completedefault' method may be overridden to intercept completions for
commands that have no complete_ method.
The data member `self.ruler' sets the character used to draw separator lines
in the help messages. If empty, no ruler line is drawn. It defaults to "=".
If the value of `self.intro' is nonempty when the cmdloop method is called,
it is printed out on interpreter startup. This value may be overridden
via an optional argument to the cmdloop() method.
The data members `self.doc_header', `self.misc_header', and
`self.undoc_header' set the headers used for the help function's
listings of documented functions, miscellaneous topics, and undocumented
functions respectively.
"""
import string, sys
__all__ = ["Cmd"]
PROMPT = '(Cmd) '
IDENTCHARS = string.ascii_letters + string.digits + '_'
class Cmd:
"""A simple framework for writing line-oriented command interpreters.
These are often useful for test harnesses, administrative tools, and
prototypes that will later be wrapped in a more sophisticated interface.
A Cmd instance or subclass instance is a line-oriented interpreter
framework. There is no good reason to instantiate Cmd itself; rather,
it's useful as a superclass of an interpreter class you define yourself
in order to inherit Cmd's methods and encapsulate action methods.
"""
prompt = PROMPT
identchars = IDENTCHARS
ruler = '='
lastcmd = ''
intro = None
doc_leader = ""
doc_header = "Documented commands (type help <topic>):"
misc_header = "Miscellaneous help topics:"
undoc_header = "Undocumented commands:"
nohelp = "*** No help on %s"
use_rawinput = 1
def __init__(self, completekey='tab', stdin=None, stdout=None):
"""Instantiate a line-oriented interpreter framework.
The optional argument 'completekey' is the readline name of a
completion key; it defaults to the Tab key. If completekey is
not None and the readline module is available, command completion
is done automatically. The optional arguments stdin and stdout
specify alternate input and output file objects; if not specified,
sys.stdin and sys.stdout are used.
"""
if stdin is not None:
self.stdin = stdin
else:
self.stdin = sys.stdin
if stdout is not None:
self.stdout = stdout
else:
self.stdout = sys.stdout
self.cmdqueue = []
self.completekey = completekey
def cmdloop(self, intro=None):
"""Repeatedly issue a prompt, accept input, parse an initial prefix
off the received input, and dispatch to action methods, passing them
the remainder of the line as argument.
"""
self.preloop()
if self.use_rawinput and self.completekey:
try:
import readline
self.old_completer = readline.get_completer()
readline.set_completer(self.complete)
readline.parse_and_bind(self.completekey+": complete")
except ImportError:
pass
try:
if intro is not None:
self.intro = intro
if self.intro:
self.stdout.write(str(self.intro)+"\n")
stop = None
while not stop:
if self.cmdqueue:
line = self.cmdqueue.pop(0)
else:
if self.use_rawinput:
try:
line = input(self.prompt)
except EOFError:
line = 'EOF'
else:
self.stdout.write(self.prompt)
self.stdout.flush()
line = self.stdin.readline()
if not len(line):
line = 'EOF'
else:
line = line.rstrip('\r\n')
line = self.precmd(line)
stop = self.onecmd(line)
stop = self.postcmd(stop, line)
self.postloop()
finally:
if self.use_rawinput and self.completekey:
try:
import readline
readline.set_completer(self.old_completer)
except ImportError:
pass
def precmd(self, line):
"""Hook method executed just before the command line is
interpreted, but after the input prompt is generated and issued.
"""
return line
def postcmd(self, stop, line):
"""Hook method executed just after a command dispatch is finished."""
return stop
def preloop(self):
"""Hook method executed once when the cmdloop() method is called."""
pass
def postloop(self):
"""Hook method executed once when the cmdloop() method is about to
return.
"""
pass
def parseline(self, line):
"""Parse the line into a command name and a string containing
the arguments. Returns a tuple containing (command, args, line).
'command' and 'args' may be None if the line couldn't be parsed.
"""
line = line.strip()
if not line:
return None, None, line
elif line[0] == '?':
line = 'help ' + line[1:]
elif line[0] == '!':
if hasattr(self, 'do_shell'):
line = 'shell ' + line[1:]
else:
return None, None, line
i, n = 0, len(line)
while i < n and line[i] in self.identchars: i = i+1
cmd, arg = line[:i], line[i:].strip()
return cmd, arg, line
def onecmd(self, line):
"""Interpret the argument as though it had been typed in response
to the prompt.
This may be overridden, but should not normally need to be;
see the precmd() and postcmd() methods for useful execution hooks.
The return value is a flag indicating whether interpretation of
commands by the interpreter should stop.
"""
cmd, arg, line = self.parseline(line)
if not line:
return self.emptyline()
if cmd is None:
return self.default(line)
self.lastcmd = line
if line == 'EOF' :
self.lastcmd = ''
if cmd == '':
return self.default(line)
else:
try:
func = getattr(self, 'do_' + cmd)
except AttributeError:
return self.default(line)
return func(arg)
def emptyline(self):
"""Called when an empty line is entered in response to the prompt.
If this method is not overridden, it repeats the last nonempty
command entered.
"""
if self.lastcmd:
return self.onecmd(self.lastcmd)
def default(self, line):
"""Called on an input line when the command prefix is not recognized.
If this method is not overridden, it prints an error message and
returns.
"""
self.stdout.write('*** Unknown syntax: %s\n'%line)
def completedefault(self, *ignored):
"""Method called to complete an input line when no command-specific
complete_*() method is available.
By default, it returns an empty list.
"""
return []
def completenames(self, text, *ignored):
dotext = 'do_'+text
return [a[3:] for a in self.get_names() if a.startswith(dotext)]
def complete(self, text, state):
"""Return the next possible completion for 'text'.
If a command has not been entered, then complete against command list.
Otherwise try to call complete_<command> to get list of completions.
"""
if state == 0:
import readline
origline = readline.get_line_buffer()
line = origline.lstrip()
stripped = len(origline) - len(line)
begidx = readline.get_begidx() - stripped
endidx = readline.get_endidx() - stripped
if begidx>0:
cmd, args, foo = self.parseline(line)
if cmd == '':
compfunc = self.completedefault
else:
try:
compfunc = getattr(self, 'complete_' + cmd)
except AttributeError:
compfunc = self.completedefault
else:
compfunc = self.completenames
self.completion_matches = compfunc(text, line, begidx, endidx)
try:
return self.completion_matches[state]
except IndexError:
return None
def get_names(self):
# This method used to pull in base class attributes
# at a time dir() didn't do it yet.
return dir(self.__class__)
def complete_help(self, *args):
commands = set(self.completenames(*args))
topics = set(a[5:] for a in self.get_names()
if a.startswith('help_' + args[0]))
return list(commands | topics)
def do_help(self, arg):
'List available commands with "help" or detailed help with "help cmd".'
if arg:
# XXX check arg syntax
try:
func = getattr(self, 'help_' + arg)
except AttributeError:
try:
doc=getattr(self, 'do_' + arg).__doc__
if doc:
self.stdout.write("%s\n"%str(doc))
return
except AttributeError:
pass
self.stdout.write("%s\n"%str(self.nohelp % (arg,)))
return
func()
else:
names = self.get_names()
cmds_doc = []
cmds_undoc = []
help = {}
for name in names:
if name[:5] == 'help_':
help[name[5:]]=1
names.sort()
# There can be duplicates if routines are overridden
prevname = ''
for name in names:
if name[:3] == 'do_':
if name == prevname:
continue
prevname = name
cmd=name[3:]
if cmd in help:
cmds_doc.append(cmd)
del help[cmd]
elif getattr(self, name).__doc__:
cmds_doc.append(cmd)
else:
cmds_undoc.append(cmd)
self.stdout.write("%s\n"%str(self.doc_leader))
self.print_topics(self.doc_header, cmds_doc, 15,80)
self.print_topics(self.misc_header, list(help.keys()),15,80)
self.print_topics(self.undoc_header, cmds_undoc, 15,80)
def print_topics(self, header, cmds, cmdlen, maxcol):
if cmds:
self.stdout.write("%s\n"%str(header))
if self.ruler:
self.stdout.write("%s\n"%str(self.ruler * len(header)))
self.columnize(cmds, maxcol-1)
self.stdout.write("\n")
def columnize(self, list, displaywidth=80):
"""Display a list of strings as a compact set of columns.
Each column is only as wide as necessary.
Columns are separated by two spaces (one was not legible enough).
"""
if not list:
self.stdout.write("<empty>\n")
return
nonstrings = [i for i in range(len(list))
if not isinstance(list[i], str)]
if nonstrings:
raise TypeError("list[i] not a string for i in %s"
% ", ".join(map(str, nonstrings)))
size = len(list)
if size == 1:
self.stdout.write('%s\n'%str(list[0]))
return
# Try every row count from 1 upwards
for nrows in range(1, len(list)):
ncols = (size+nrows-1) // nrows
colwidths = []
totwidth = -2
for col in range(ncols):
colwidth = 0
for row in range(nrows):
i = row + nrows*col
if i >= size:
break
x = list[i]
colwidth = max(colwidth, len(x))
colwidths.append(colwidth)
totwidth += colwidth + 2
if totwidth > displaywidth:
break
if totwidth <= displaywidth:
break
else:
nrows = len(list)
ncols = 1
colwidths = [0]
for row in range(nrows):
texts = []
for col in range(ncols):
i = row + nrows*col
if i >= size:
x = ""
else:
x = list[i]
texts.append(x)
while texts and not texts[-1]:
del texts[-1]
for col in range(len(texts)):
texts[col] = texts[col].ljust(colwidths[col])
self.stdout.write("%s\n"%str(" ".join(texts)))
"""Utilities needed to emulate Python's interactive interpreter.
"""
# Inspired by similar code by Jeff Epler and Fredrik Lundh.
import sys
import traceback
from codeop import CommandCompiler, compile_command
__all__ = ["InteractiveInterpreter", "InteractiveConsole", "interact",
"compile_command"]
class InteractiveInterpreter:
"""Base class for InteractiveConsole.
This class deals with parsing and interpreter state (the user's
namespace); it doesn't deal with input buffering or prompting or
input file naming (the filename is always passed in explicitly).
"""
def __init__(self, locals=None):
"""Constructor.
The optional 'locals' argument specifies the dictionary in
which code will be executed; it defaults to a newly created
dictionary with key "__name__" set to "__console__" and key
"__doc__" set to None.
"""
if locals is None:
locals = {"__name__": "__console__", "__doc__": None}
self.locals = locals
self.compile = CommandCompiler()
def runsource(self, source, filename="<input>", symbol="single"):
"""Compile and run some source in the interpreter.
Arguments are as for compile_command().
One of several things can happen:
1) The input is incorrect; compile_command() raised an
exception (SyntaxError or OverflowError). A syntax traceback
will be printed by calling the showsyntaxerror() method.
2) The input is incomplete, and more input is required;
compile_command() returned None. Nothing happens.
3) The input is complete; compile_command() returned a code
object. The code is executed by calling self.runcode() (which
also handles run-time exceptions, except for SystemExit).
The return value is True in case 2, False in the other cases (unless
an exception is raised). The return value can be used to
decide whether to use sys.ps1 or sys.ps2 to prompt the next
line.
"""
try:
code = self.compile(source, filename, symbol)
except (OverflowError, SyntaxError, ValueError):
# Case 1
self.showsyntaxerror(filename)
return False
if code is None:
# Case 2
return True
# Case 3
self.runcode(code)
return False
def runcode(self, code):
"""Execute a code object.
When an exception occurs, self.showtraceback() is called to
display a traceback. All exceptions are caught except
SystemExit, which is reraised.
A note about KeyboardInterrupt: this exception may occur
elsewhere in this code, and may not always be caught. The
caller should be prepared to deal with it.
"""
try:
exec(code, self.locals)
except SystemExit:
raise
except:
self.showtraceback()
def showsyntaxerror(self, filename=None):
"""Display the syntax error that just occurred.
This doesn't display a stack trace because there isn't one.
If a filename is given, it is stuffed in the exception instead
of what was there before (because Python's parser always uses
"<string>" when reading from a string).
The output is written by self.write(), below.
"""
type, value, tb = sys.exc_info()
sys.last_type = type
sys.last_value = value
sys.last_traceback = tb
if filename and type is SyntaxError:
# Work hard to stuff the correct filename in the exception
try:
msg, (dummy_filename, lineno, offset, line) = value.args
except ValueError:
# Not the format we expect; leave it alone
pass
else:
# Stuff in the right filename
value = SyntaxError(msg, (filename, lineno, offset, line))
sys.last_value = value
if sys.excepthook is sys.__excepthook__:
lines = traceback.format_exception_only(type, value)
self.write(''.join(lines))
else:
# If someone has set sys.excepthook, we let that take precedence
# over self.write
sys.excepthook(type, value, tb)
def showtraceback(self):
"""Display the exception that just occurred.
We remove the first stack item because it is our own code.
The output is written by self.write(), below.
"""
try:
type, value, tb = sys.exc_info()
sys.last_type = type
sys.last_value = value
sys.last_traceback = tb
tblist = traceback.extract_tb(tb)
del tblist[:1]
lines = traceback.format_list(tblist)
if lines:
lines.insert(0, "Traceback (most recent call last):\n")
lines.extend(traceback.format_exception_only(type, value))
finally:
tblist = tb = None
if sys.excepthook is sys.__excepthook__:
self.write(''.join(lines))
else:
# If someone has set sys.excepthook, we let that take precedence
# over self.write
sys.excepthook(type, value, tb)
def write(self, data):
"""Write a string.
The base implementation writes to sys.stderr; a subclass may
replace this with a different implementation.
"""
sys.stderr.write(data)
class InteractiveConsole(InteractiveInterpreter):
"""Closely emulate the behavior of the interactive Python interpreter.
This class builds on InteractiveInterpreter and adds prompting
using the familiar sys.ps1 and sys.ps2, and input buffering.
"""
def __init__(self, locals=None, filename="<console>"):
"""Constructor.
The optional locals argument will be passed to the
InteractiveInterpreter base class.
The optional filename argument should specify the (file)name
of the input stream; it will show up in tracebacks.
"""
InteractiveInterpreter.__init__(self, locals)
self.filename = filename
self.resetbuffer()
def resetbuffer(self):
"""Reset the input buffer."""
self.buffer = []
def interact(self, banner=None):
"""Closely emulate the interactive Python console.
The optional banner argument specifies the banner to print
before the first interaction; by default it prints a banner
similar to the one printed by the real Python interpreter,
followed by the current class name in parentheses (so as not
to confuse this with the real interpreter -- since it's so
close!).
"""
try:
sys.ps1
except AttributeError:
sys.ps1 = ">>> "
try:
sys.ps2
except AttributeError:
sys.ps2 = "... "
cprt = 'Type "help", "copyright", "credits" or "license" for more information.'
if banner is None:
self.write("Python %s on %s\n%s\n(%s)\n" %
(sys.version, sys.platform, cprt,
self.__class__.__name__))
elif banner:
self.write("%s\n" % str(banner))
more = 0
while 1:
try:
if more:
prompt = sys.ps2
else:
prompt = sys.ps1
try:
line = self.raw_input(prompt)
except EOFError:
self.write("\n")
break
else:
more = self.push(line)
except KeyboardInterrupt:
self.write("\nKeyboardInterrupt\n")
self.resetbuffer()
more = 0
def push(self, line):
"""Push a line to the interpreter.
The line should not have a trailing newline; it may have
internal newlines. The line is appended to a buffer and the
interpreter's runsource() method is called with the
concatenated contents of the buffer as source. If this
indicates that the command was executed or invalid, the buffer
is reset; otherwise, the command is incomplete, and the buffer
is left as it was after the line was appended. The return
value is 1 if more input is required, 0 if the line was dealt
with in some way (this is the same as runsource()).
"""
self.buffer.append(line)
source = "\n".join(self.buffer)
more = self.runsource(source, self.filename)
if not more:
self.resetbuffer()
return more
def raw_input(self, prompt=""):
"""Write a prompt and read a line.
The returned line does not include the trailing newline.
When the user enters the EOF key sequence, EOFError is raised.
The base implementation uses the built-in function
input(); a subclass may replace this with a different
implementation.
"""
return input(prompt)
def interact(banner=None, readfunc=None, local=None):
"""Closely emulate the interactive Python interpreter.
This is a backwards compatible interface to the InteractiveConsole
class. When readfunc is not specified, it attempts to import the
readline module to enable GNU readline if it is available.
Arguments (all optional, all default to None):
banner -- passed to InteractiveConsole.interact()
readfunc -- if not None, replaces InteractiveConsole.raw_input()
local -- passed to InteractiveInterpreter.__init__()
"""
console = InteractiveConsole(local)
if readfunc is not None:
console.raw_input = readfunc
else:
try:
import readline
except ImportError:
pass
console.interact(banner)
if __name__ == "__main__":
interact()
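# A short usage sketch (illustrative, not part of the module): push() returns
# a true value while more input is needed, which is what drives the ps1/ps2
# prompt switching in interact() above. demo_console is a hypothetical name.
demo_console = InteractiveConsole({"answer": 42})
assert demo_console.push("if answer:")           # incomplete: buffer and ask for more
assert demo_console.push("    print(answer)")    # still inside the block
assert not demo_console.push("")                 # blank line completes it; prints 42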
""" codecs -- Python Codec Registry, API and helpers.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""#"
import builtins, sys
### Registry and builtin stateless codec functions
try:
from _codecs import *
except ImportError as why:
raise SystemError('Failed to load the builtin codecs: %s' % why)
__all__ = ["register", "lookup", "open", "EncodedFile", "BOM", "BOM_BE",
"BOM_LE", "BOM32_BE", "BOM32_LE", "BOM64_BE", "BOM64_LE",
"BOM_UTF8", "BOM_UTF16", "BOM_UTF16_LE", "BOM_UTF16_BE",
"BOM_UTF32", "BOM_UTF32_LE", "BOM_UTF32_BE",
"CodecInfo", "Codec", "IncrementalEncoder", "IncrementalDecoder",
"StreamReader", "StreamWriter",
"StreamReaderWriter", "StreamRecoder",
"getencoder", "getdecoder", "getincrementalencoder",
"getincrementaldecoder", "getreader", "getwriter",
"encode", "decode", "iterencode", "iterdecode",
"strict_errors", "ignore_errors", "replace_errors",
"xmlcharrefreplace_errors", "backslashreplace_errors",
"register_error", "lookup_error"]
### Constants
#
# Byte Order Mark (BOM = ZERO WIDTH NO-BREAK SPACE = U+FEFF)
# and its possible byte string values
# for UTF8/UTF16/UTF32 output and little/big endian machines
#
# UTF-8
BOM_UTF8 = b'\xef\xbb\xbf'
# UTF-16, little endian
BOM_LE = BOM_UTF16_LE = b'\xff\xfe'
# UTF-16, big endian
BOM_BE = BOM_UTF16_BE = b'\xfe\xff'
# UTF-32, little endian
BOM_UTF32_LE = b'\xff\xfe\x00\x00'
# UTF-32, big endian
BOM_UTF32_BE = b'\x00\x00\xfe\xff'
if sys.byteorder == 'little':
# UTF-16, native endianness
BOM = BOM_UTF16 = BOM_UTF16_LE
# UTF-32, native endianness
BOM_UTF32 = BOM_UTF32_LE
else:
# UTF-16, native endianness
BOM = BOM_UTF16 = BOM_UTF16_BE
# UTF-32, native endianness
BOM_UTF32 = BOM_UTF32_BE
# Old broken names (don't use in new code)
BOM32_LE = BOM_UTF16_LE
BOM32_BE = BOM_UTF16_BE
BOM64_LE = BOM_UTF32_LE
BOM64_BE = BOM_UTF32_BE
### Codec base classes (defining the API)
class CodecInfo(tuple):
"""Codec details when looking up the codec registry"""
# Private API to allow Python 3.4 to blacklist the known non-Unicode
# codecs in the standard library. A more general mechanism to
# reliably distinguish test encodings from other codecs will hopefully
# be defined for Python 3.5
#
# See http://bugs.python.org/issue19619
_is_text_encoding = True # Assume codecs are text encodings by default
def __new__(cls, encode, decode, streamreader=None, streamwriter=None,
incrementalencoder=None, incrementaldecoder=None, name=None,
*, _is_text_encoding=None):
self = tuple.__new__(cls, (encode, decode, streamreader, streamwriter))
self.name = name
self.encode = encode
self.decode = decode
self.incrementalencoder = incrementalencoder
self.incrementaldecoder = incrementaldecoder
self.streamwriter = streamwriter
self.streamreader = streamreader
if _is_text_encoding is not None:
self._is_text_encoding = _is_text_encoding
return self
def __repr__(self):
return "<%s.%s object for encoding %s at 0x%x>" % \
(self.__class__.__module__, self.__class__.__name__,
self.name, id(self))
class Codec:
""" Defines the interface for stateless encoders/decoders.
The .encode()/.decode() methods may use different error
handling schemes by providing the errors argument. These
string values are predefined:
'strict' - raise a ValueError (or a subclass)
'ignore' - ignore the character and continue with the next
'replace' - replace with a suitable replacement character;
Python will use the official U+FFFD REPLACEMENT
CHARACTER for the builtin Unicode codecs on
decoding and '?' on encoding.
'surrogateescape' - replace with private code points U+DCnn.
'xmlcharrefreplace' - Replace with the appropriate XML
character reference (only for encoding).
'backslashreplace' - Replace with backslashed escape sequences
(only for encoding).
The set of allowed values can be extended via register_error.
"""
def encode(self, input, errors='strict'):
""" Encodes the object input and returns a tuple (output
object, length consumed).
errors defines the error handling to apply. It defaults to
'strict' handling.
The method may not store state in the Codec instance. Use
StreamWriter for codecs which have to keep state in order to
make encoding efficient.
The encoder must be able to handle zero length input and
return an empty object of the output object type in this
situation.
"""
raise NotImplementedError
def decode(self, input, errors='strict'):
""" Decodes the object input and returns a tuple (output
object, length consumed).
input must be an object which provides the bf_getreadbuf
buffer slot. Python strings, buffer objects and memory
mapped files are examples of objects providing this slot.
errors defines the error handling to apply. It defaults to
'strict' handling.
The method may not store state in the Codec instance. Use
StreamReader for codecs which have to keep state in order to
make decoding efficient.
The decoder must be able to handle zero length input and
return an empty object of the output object type in this
situation.
"""
raise NotImplementedError
class IncrementalEncoder(object):
"""
An IncrementalEncoder encodes an input in multiple steps. The input can
be passed piece by piece to the encode() method. The IncrementalEncoder
remembers the state of the encoding process between calls to encode().
"""
def __init__(self, errors='strict'):
"""
Creates an IncrementalEncoder instance.
The IncrementalEncoder may use different error handling schemes by
providing the errors keyword argument. See the module docstring
for a list of possible values.
"""
self.errors = errors
self.buffer = ""
def encode(self, input, final=False):
"""
Encodes input and returns the resulting object.
"""
raise NotImplementedError
def reset(self):
"""
Resets the encoder to the initial state.
"""
def getstate(self):
"""
Return the current state of the encoder.
"""
return 0
def setstate(self, state):
"""
Set the current state of the encoder. state must have been
returned by getstate().
"""
class BufferedIncrementalEncoder(IncrementalEncoder):
"""
This subclass of IncrementalEncoder can be used as the baseclass for an
incremental encoder if the encoder must keep some of the output in a
buffer between calls to encode().
"""
def __init__(self, errors='strict'):
IncrementalEncoder.__init__(self, errors)
# unencoded input that is kept between calls to encode()
self.buffer = ""
def _buffer_encode(self, input, errors, final):
# Overwrite this method in subclasses: It must encode input
# and return an (output, length consumed) tuple
raise NotImplementedError
def encode(self, input, final=False):
# encode input (taking the buffer into account)
data = self.buffer + input
(result, consumed) = self._buffer_encode(data, self.errors, final)
# keep unencoded input until the next call
self.buffer = data[consumed:]
return result
def reset(self):
IncrementalEncoder.reset(self)
self.buffer = ""
def getstate(self):
return self.buffer or 0
def setstate(self, state):
self.buffer = state or ""
class IncrementalDecoder(object):
"""
An IncrementalDecoder decodes an input in multiple steps. The input can
be passed piece by piece to the decode() method. The IncrementalDecoder
remembers the state of the decoding process between calls to decode().
"""
def __init__(self, errors='strict'):
"""
Create an IncrementalDecoder instance.
The IncrementalDecoder may use different error handling schemes by
providing the errors keyword argument. See the module docstring
for a list of possible values.
"""
self.errors = errors
def decode(self, input, final=False):
"""
Decode input and return the resulting object.
"""
raise NotImplementedError
def reset(self):
"""
Reset the decoder to the initial state.
"""
def getstate(self):
"""
Return the current state of the decoder.
This must be a (buffered_input, additional_state_info) tuple.
buffered_input must be a bytes object containing bytes that
were passed to decode() that have not yet been converted.
additional_state_info must be a non-negative integer
representing the state of the decoder WITHOUT yet having
processed the contents of buffered_input. In the initial state
and after reset(), getstate() must return (b"", 0).
"""
return (b"", 0)
def setstate(self, state):
"""
Set the current state of the decoder.
state must have been returned by getstate(). The effect of
setstate((b"", 0)) must be equivalent to reset().
"""
class BufferedIncrementalDecoder(IncrementalDecoder):
"""
This subclass of IncrementalDecoder can be used as the baseclass for an
incremental decoder if the decoder must be able to handle incomplete
byte sequences.
"""
def __init__(self, errors='strict'):
IncrementalDecoder.__init__(self, errors)
# undecoded input that is kept between calls to decode()
self.buffer = b""
def _buffer_decode(self, input, errors, final):
# Overwrite this method in subclasses: It must decode input
# and return an (output, length consumed) tuple
raise NotImplementedError
def decode(self, input, final=False):
# decode input (taking the buffer into account)
data = self.buffer + input
(result, consumed) = self._buffer_decode(data, self.errors, final)
# keep undecoded input until the next call
self.buffer = data[consumed:]
return result
def reset(self):
IncrementalDecoder.reset(self)
self.buffer = b""
def getstate(self):
# additional state info is always 0
return (self.buffer, 0)
def setstate(self, state):
# ignore additional state info
self.buffer = state[0]
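# A short sketch (illustrative, standalone) of the incremental decoding
# contract described above: bytes may arrive split anywhere, even inside a
# multi-byte sequence, and the decoder buffers the fragment until it is complete.
import codecs
_dec = codecs.getincrementaldecoder("utf-8")()
_parts = [b"caf", b"\xc3", b"\xa9"]               # b"caf\xc3\xa9" is "café" in UTF-8
_text = "".join(_dec.decode(p) for p in _parts)
_text += _dec.decode(b"", final=True)             # flush; raises if input was truncated
assert _text == "caf\xe9"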
#
# The StreamWriter and StreamReader class provide generic working
# interfaces which can be used to implement new encoding submodules
# very easily. See encodings/utf_8.py for an example on how this is
# done.
#
class StreamWriter(Codec):
def __init__(self, stream, errors='strict'):
""" Creates a StreamWriter instance.
stream must be a file-like object open for writing.
The StreamWriter may use different error handling
schemes by providing the errors keyword argument. These
parameters are predefined:
'strict' - raise a ValueError (or a subclass)
'ignore' - ignore the character and continue with the next
'replace' - replace with a suitable replacement character
'xmlcharrefreplace' - Replace with the appropriate XML
character reference.
'backslashreplace' - Replace with backslashed escape
sequences (only for encoding).
The set of allowed parameter values can be extended via
register_error.
"""
self.stream = stream
self.errors = errors
def write(self, object):
""" Writes the object's contents encoded to self.stream.
"""
data, consumed = self.encode(object, self.errors)
self.stream.write(data)
def writelines(self, list):
""" Writes the concatenated list of strings to the stream
using .write().
"""
self.write(''.join(list))
def reset(self):
""" Flushes and resets the codec buffers used for keeping state.
Calling this method should ensure that the data on the
output is put into a clean state, that allows appending
of new fresh data without having to rescan the whole
stream to recover state.
"""
pass
def seek(self, offset, whence=0):
self.stream.seek(offset, whence)
if whence == 0 and offset == 0:
self.reset()
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
###
class StreamReader(Codec):
charbuffertype = str
def __init__(self, stream, errors='strict'):
""" Creates a StreamReader instance.
stream must be a file-like object open for reading.
The StreamReader may use different error handling
schemes by providing the errors keyword argument. These
parameters are predefined:
'strict' - raise a ValueError (or a subclass)
'ignore' - ignore the character and continue with the next
'replace' - replace with a suitable replacement character;
The set of allowed parameter values can be extended via
register_error.
"""
self.stream = stream
self.errors = errors
self.bytebuffer = b""
self._empty_charbuffer = self.charbuffertype()
self.charbuffer = self._empty_charbuffer
self.linebuffer = None
def decode(self, input, errors='strict'):
raise NotImplementedError
def read(self, size=-1, chars=-1, firstline=False):
""" Decodes data from the stream self.stream and returns the
resulting object.
chars indicates the number of decoded code points or bytes to
return. read() will never return more data than requested,
but it might return less, if there is not enough available.
size indicates the approximate maximum number of decoded
bytes or code points to read for decoding. The decoder
can modify this setting as appropriate. The default value
-1 indicates to read and decode as much as possible. size
is intended to prevent having to decode huge files in one
step.
If firstline is true, and a UnicodeDecodeError happens
after the first line terminator in the input only the first line
will be returned, the rest of the input will be kept until the
next call to read().
The method should use a greedy read strategy, meaning that
it should read as much data as is allowed within the
definition of the encoding and the given size, e.g. if
optional encoding endings or state markers are available
on the stream, these should be read too.
"""
# If we have lines cached, first merge them back into characters
if self.linebuffer:
self.charbuffer = self._empty_charbuffer.join(self.linebuffer)
self.linebuffer = None
# read until we get the required number of characters (if available)
while True:
# can the request be satisfied from the character buffer?
if chars >= 0:
if len(self.charbuffer) >= chars:
break
elif size >= 0:
if len(self.charbuffer) >= size:
break
# we need more data
if size < 0:
newdata = self.stream.read()
else:
newdata = self.stream.read(size)
# decode bytes (those remaining from the last call included)
data = self.bytebuffer + newdata
if not data:
break
try:
newchars, decodedbytes = self.decode(data, self.errors)
except UnicodeDecodeError as exc:
if firstline:
newchars, decodedbytes = \
self.decode(data[:exc.start], self.errors)
lines = newchars.splitlines(keepends=True)
if len(lines)<=1:
raise
else:
raise
# keep undecoded bytes until the next call
self.bytebuffer = data[decodedbytes:]
# put new characters in the character buffer
self.charbuffer += newchars
# there was no data available
if not newdata:
break
if chars < 0:
# Return everything we've got
result = self.charbuffer
self.charbuffer = self._empty_charbuffer
else:
# Return the first chars characters
result = self.charbuffer[:chars]
self.charbuffer = self.charbuffer[chars:]
return result
def readline(self, size=None, keepends=True):
""" Read one line from the input stream and return the
decoded data.
size, if given, is passed as size argument to the
read() method.
"""
# If we have lines cached from an earlier read, return
# them unconditionally
if self.linebuffer:
line = self.linebuffer[0]
del self.linebuffer[0]
if len(self.linebuffer) == 1:
# revert to charbuffer mode; we might need more data
# next time
self.charbuffer = self.linebuffer[0]
self.linebuffer = None
if not keepends:
line = line.splitlines(keepends=False)[0]
return line
readsize = size or 72
line = self._empty_charbuffer
# If size is given, we call read() only once
while True:
data = self.read(readsize, firstline=True)
if data:
# If we're at a "\r" read one extra character (which might
# be a "\n") to get a proper line ending. If the stream is
# temporarily exhausted we return the wrong line ending.
if (isinstance(data, str) and data.endswith("\r")) or \
(isinstance(data, bytes) and data.endswith(b"\r")):
data += self.read(size=1, chars=1)
line += data
lines = line.splitlines(keepends=True)
if lines:
if len(lines) > 1:
# More than one line result; the first line is a full line
# to return
line = lines[0]
del lines[0]
if len(lines) > 1:
# cache the remaining lines
lines[-1] += self.charbuffer
self.linebuffer = lines
self.charbuffer = None
else:
# only one remaining line, put it back into charbuffer
self.charbuffer = lines[0] + self.charbuffer
if not keepends:
line = line.splitlines(keepends=False)[0]
break
line0withend = lines[0]
line0withoutend = lines[0].splitlines(keepends=False)[0]
if line0withend != line0withoutend: # We really have a line end
# Put the rest back together and keep it until the next call
self.charbuffer = self._empty_charbuffer.join(lines[1:]) + \
self.charbuffer
if keepends:
line = line0withend
else:
line = line0withoutend
break
# we didn't get anything or this was our only try
if not data or size is not None:
if line and not keepends:
line = line.splitlines(keepends=False)[0]
break
if readsize < 8000:
readsize *= 2
return line
def readlines(self, sizehint=None, keepends=True):
""" Read all lines available on the input stream
and return them as a list.
Line breaks are implemented using the codec's decoder
method and are included in the list entries.
sizehint, if given, is ignored since there is no efficient
way to find the true end-of-line.
"""
data = self.read()
return data.splitlines(keepends)
def reset(self):
""" Resets the codec buffers used for keeping state.
Note that no stream repositioning should take place.
This method is primarily intended to be able to recover
from decoding errors.
"""
self.bytebuffer = b""
self.charbuffer = self._empty_charbuffer
self.linebuffer = None
def seek(self, offset, whence=0):
""" Set the input stream's current position.
Resets the codec buffers used for keeping state.
"""
self.stream.seek(offset, whence)
self.reset()
def __next__(self):
""" Return the next decoded line from the input stream."""
line = self.readline()
if line:
return line
raise StopIteration
def __iter__(self):
return self
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
###
class StreamReaderWriter:
""" StreamReaderWriter instances allow wrapping streams which
work in both read and write modes.
The design is such that one can use the factory functions
returned by the codec.lookup() function to construct the
instance.
"""
# Optional attributes set by the file wrappers below
encoding = 'unknown'
def __init__(self, stream, Reader, Writer, errors='strict'):
""" Creates a StreamReaderWriter instance.
stream must be a Stream-like object.
Reader, Writer must be factory functions or classes
providing the StreamReader, StreamWriter interface resp.
Error handling is done in the same way as defined for the
StreamWriter/Readers.
"""
self.stream = stream
self.reader = Reader(stream, errors)
self.writer = Writer(stream, errors)
self.errors = errors
def read(self, size=-1):
return self.reader.read(size)
def readline(self, size=None):
return self.reader.readline(size)
def readlines(self, sizehint=None):
return self.reader.readlines(sizehint)
def __next__(self):
""" Return the next decoded line from the input stream."""
return next(self.reader)
def __iter__(self):
return self
def write(self, data):
return self.writer.write(data)
def writelines(self, list):
return self.writer.writelines(list)
def reset(self):
self.reader.reset()
self.writer.reset()
def seek(self, offset, whence=0):
self.stream.seek(offset, whence)
self.reader.reset()
if whence == 0 and offset == 0:
self.writer.reset()
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
# these are needed to make "with codecs.open(...)" work properly
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
###
class StreamRecoder:
""" StreamRecoder instances translate data from one encoding to another.
They use the complete set of APIs returned by the
codecs.lookup() function to implement their task.
Data written to the StreamRecoder is first decoded into an
intermediate format (depending on the "decode" codec) and then
written to the underlying stream using an instance of the provided
Writer class.
In the other direction, data is read from the underlying stream using
a Reader instance and then encoded and returned to the caller.
"""
# Optional attributes set by the file wrappers below
data_encoding = 'unknown'
file_encoding = 'unknown'
def __init__(self, stream, encode, decode, Reader, Writer,
errors='strict'):
""" Creates a StreamRecoder instance which implements a two-way
conversion: encode and decode work on the frontend (the
data visible to .read() and .write()) while Reader and Writer
work on the backend (the data in stream).
You can use these objects to do transparent
transcodings from e.g. latin-1 to utf-8 and back.
stream must be a file-like object.
encode and decode must adhere to the Codec interface; Reader and
Writer must be factory functions or classes providing the
StreamReader and StreamWriter interfaces resp.
Error handling is done in the same way as defined for the
StreamWriter/Readers.
"""
self.stream = stream
self.encode = encode
self.decode = decode
self.reader = Reader(stream, errors)
self.writer = Writer(stream, errors)
self.errors = errors
def read(self, size=-1):
data = self.reader.read(size)
data, bytesencoded = self.encode(data, self.errors)
return data
def readline(self, size=None):
if size is None:
data = self.reader.readline()
else:
data = self.reader.readline(size)
data, bytesencoded = self.encode(data, self.errors)
return data
def readlines(self, sizehint=None):
data = self.reader.read()
data, bytesencoded = self.encode(data, self.errors)
return data.splitlines(keepends=True)
def __next__(self):
""" Return the next decoded line from the input stream."""
data = next(self.reader)
data, bytesencoded = self.encode(data, self.errors)
return data
def __iter__(self):
return self
def write(self, data):
data, bytesdecoded = self.decode(data, self.errors)
return self.writer.write(data)
def writelines(self, list):
data = ''.join(list)
data, bytesdecoded = self.decode(data, self.errors)
return self.writer.write(data)
def reset(self):
self.reader.reset()
self.writer.reset()
def __getattr__(self, name,
getattr=getattr):
""" Inherit all other methods from the underlying stream.
"""
return getattr(self.stream, name)
def __enter__(self):
return self
def __exit__(self, type, value, tb):
self.stream.close()
### Shortcuts
def open(filename, mode='r', encoding=None, errors='strict', buffering=1):
""" Open an encoded file using the given mode and return
a wrapped version providing transparent encoding/decoding.
Note: The wrapped version will only accept the object format
defined by the codecs, i.e. Unicode objects for most builtin
codecs. Output is also codec dependent and will usually be
Unicode as well.
Underlying encoded files are always opened in binary mode.
The default file mode is 'r', meaning to open the file in read mode.
encoding specifies the encoding which is to be used for the
file.
errors may be given to define the error handling. It defaults
to 'strict' which causes ValueErrors to be raised in case an
encoding error occurs.
buffering has the same meaning as for the builtin open() API.
It defaults to line buffered.
The returned wrapped file object provides an extra attribute
.encoding which allows querying the used encoding. This
attribute is only available if an encoding was specified as
parameter.
"""
if encoding is not None and \
'b' not in mode:
# Force opening of the file in binary mode
mode = mode + 'b'
file = builtins.open(filename, mode, buffering)
if encoding is None:
return file
info = lookup(encoding)
srw = StreamReaderWriter(file, info.streamreader, info.streamwriter, errors)
# Add attributes to simplify introspection
srw.encoding = encoding
return srw
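# A short sketch (illustrative, standalone) of open() above: the file is
# opened in binary mode underneath and transparently encoded/decoded.
# The demo path is hypothetical.
import codecs, os, tempfile
_demo_path = os.path.join(tempfile.gettempdir(), "codecs-open-demo.txt")
with codecs.open(_demo_path, "w", encoding="utf-8") as _f:
    _f.write("gr\xfc\xdfe\n")                     # "grüße" goes out as UTF-8 bytes
with codecs.open(_demo_path, "r", encoding="utf-8") as _f:
    assert _f.read() == "gr\xfc\xdfe\n"
os.remove(_demo_path)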
def EncodedFile(file, data_encoding, file_encoding=None, errors='strict'):
""" Return a wrapped version of file which provides transparent
encoding translation.
Data written to the wrapped file is decoded according
to the given data_encoding and then encoded to the underlying
file using file_encoding. The intermediate data type
will usually be Unicode but depends on the specified codecs.
Bytes read from the file are decoded using file_encoding and then
passed back to the caller encoded using data_encoding.
If file_encoding is not given, it defaults to data_encoding.
errors may be given to define the error handling. It defaults
to 'strict' which causes ValueErrors to be raised in case an
encoding error occurs.
The returned wrapped file object provides two extra attributes
.data_encoding and .file_encoding which reflect the given
parameters of the same name. The attributes can be used for
introspection by Python programs.
"""
if file_encoding is None:
file_encoding = data_encoding
data_info = lookup(data_encoding)
file_info = lookup(file_encoding)
sr = StreamRecoder(file, data_info.encode, data_info.decode,
file_info.streamreader, file_info.streamwriter, errors)
# Add attributes to simplify introspection
sr.data_encoding = data_encoding
sr.file_encoding = file_encoding
return sr
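# A short sketch (illustrative, standalone) of EncodedFile(): the caller sees
# data_encoding bytes while the underlying stream stores file_encoding bytes.
import codecs, io
_backing = io.BytesIO()
_wrapped = codecs.EncodedFile(_backing, data_encoding="utf-8", file_encoding="latin-1")
_wrapped.write(b"caf\xc3\xa9")                    # UTF-8 in from the caller...
assert _backing.getvalue() == b"caf\xe9"          # ...stored as Latin-1 on the stream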
### Helpers for codec lookup
def getencoder(encoding):
""" Lookup up the codec for the given encoding and return
its encoder function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).encode
def getdecoder(encoding):
""" Lookup up the codec for the given encoding and return
its decoder function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).decode
def getincrementalencoder(encoding):
""" Lookup up the codec for the given encoding and return
its IncrementalEncoder class or factory function.
Raises a LookupError in case the encoding cannot be found
or the codecs doesn't provide an incremental encoder.
"""
encoder = lookup(encoding).incrementalencoder
if encoder is None:
raise LookupError(encoding)
return encoder
def getincrementaldecoder(encoding):
""" Lookup up the codec for the given encoding and return
its IncrementalDecoder class or factory function.
Raises a LookupError in case the encoding cannot be found
or the codecs doesn't provide an incremental decoder.
"""
decoder = lookup(encoding).incrementaldecoder
if decoder is None:
raise LookupError(encoding)
return decoder
def getreader(encoding):
""" Lookup up the codec for the given encoding and return
its StreamReader class or factory function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).streamreader
def getwriter(encoding):
""" Lookup up the codec for the given encoding and return
its StreamWriter class or factory function.
Raises a LookupError in case the encoding cannot be found.
"""
return lookup(encoding).streamwriter
def iterencode(iterator, encoding, errors='strict', **kwargs):
"""
Encoding iterator.
Encodes the input strings from the iterator using an IncrementalEncoder.
errors and kwargs are passed through to the IncrementalEncoder
constructor.
"""
encoder = getincrementalencoder(encoding)(errors, **kwargs)
for input in iterator:
output = encoder.encode(input)
if output:
yield output
output = encoder.encode("", True)
if output:
yield output
def iterdecode(iterator, encoding, errors='strict', **kwargs):
"""
Decoding iterator.
Decodes the input strings from the iterator using an IncrementalDecoder.
errors and kwargs are passed through to the IncrementalDecoder
constructor.
"""
decoder = getincrementaldecoder(encoding)(errors, **kwargs)
for input in iterator:
output = decoder.decode(input)
if output:
yield output
output = decoder.decode(b"", True)
if output:
yield output
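# A short sketch (illustrative, standalone): iterencode/iterdecode stream an
# iterator of chunks through the incremental codecs defined above.
import codecs
_chunks = ["alpha ", "beta"]
_encoded = list(codecs.iterencode(_chunks, "utf-8"))      # iterator of bytes chunks
assert "".join(codecs.iterdecode(_encoded, "utf-8")) == "alpha beta"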
### Helpers for charmap-based codecs
def make_identity_dict(rng):
""" make_identity_dict(rng) -> dict
Return a dictionary where elements of the rng sequence are
mapped to themselves.
"""
return {i:i for i in rng}
def make_encoding_map(decoding_map):
""" Creates an encoding map from a decoding map.
If a target mapping in the decoding map occurs multiple
times, then that target is mapped to None (undefined mapping),
causing an exception when encountered by the charmap codec
during translation.
One example where this happens is cp875.py which decodes
multiple characters to \\u001a.
"""
m = {}
for k,v in decoding_map.items():
if not v in m:
m[v] = k
else:
m[v] = None
return m
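# A short sketch (illustrative, standalone) of the charmap helpers: start from
# an identity decoding map, add one custom mapping, then invert it to encode.
import codecs
_dec_map = codecs.make_identity_dict(range(32, 127))      # printable ASCII maps to itself
_dec_map[0x80] = 0x20AC                                   # byte 0x80 decodes to U+20AC (euro sign)
_enc_map = codecs.make_encoding_map(_dec_map)
assert _enc_map[0x20AC] == 0x80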
### error handlers
try:
strict_errors = lookup_error("strict")
ignore_errors = lookup_error("ignore")
replace_errors = lookup_error("replace")
xmlcharrefreplace_errors = lookup_error("xmlcharrefreplace")
backslashreplace_errors = lookup_error("backslashreplace")
except LookupError:
# In --disable-unicode builds, these error handlers are missing
strict_errors = None
ignore_errors = None
replace_errors = None
xmlcharrefreplace_errors = None
backslashreplace_errors = None
# Tell modulefinder that using codecs probably needs the encodings
# package
_false = 0
if _false:
import encodings
### Tests
if __name__ == '__main__':
# Make stdout translate Latin-1 output into UTF-8 output
sys.stdout = EncodedFile(sys.stdout, 'latin-1', 'utf-8')
# Have stdin translate Latin-1 input into UTF-8 input
sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1')
r"""Utilities to compile possibly incomplete Python source code.
This module provides two interfaces, broadly similar to the builtin
function compile(), which take program text, a filename and a 'mode'
and:
- Return code object if the command is complete and valid
- Return None if the command is incomplete
- Raise SyntaxError, ValueError or OverflowError if the command is a
syntax error (OverflowError and ValueError can be produced by
malformed literals).
Approach:
First, check if the source consists entirely of blank lines and
comments; if so, replace it with 'pass', because the built-in
parser doesn't always do the right thing for these.
Compile three times: as is, with \n, and with \n\n appended. If it
compiles as is, it's complete. If it compiles with one \n appended,
we expect more. If it doesn't compile either way, we compare the
error we get when compiling with \n or \n\n appended. If the errors
are the same, the code is broken. But if the errors are different, we
expect more. Not intuitive; not even guaranteed to hold in future
releases; but this matches the compiler's behavior from Python 1.4
through 2.2, at least.
Caveat:
It is possible (but not likely) that the parser stops parsing with a
successful outcome before reaching the end of the source; in this
case, trailing symbols may be ignored instead of causing an error.
For example, a backslash followed by two newlines may be followed by
arbitrary garbage. This will be fixed once the API for the parser is
better.
The two interfaces are:
compile_command(source, filename, symbol):
Compiles a single command in the manner described above.
CommandCompiler():
Instances of this class have __call__ methods identical in
signature to compile_command; the difference is that if the
instance compiles program text containing a __future__ statement,
the instance 'remembers' and compiles all subsequent program texts
with the statement in force.
The module also provides another class:
Compile():
Instances of this class act like the built-in function compile,
but with 'memory' in the sense described above.
"""
import __future__
_features = [getattr(__future__, fname)
for fname in __future__.all_feature_names]
__all__ = ["compile_command", "Compile", "CommandCompiler"]
PyCF_DONT_IMPLY_DEDENT = 0x200 # Matches pythonrun.h
def _maybe_compile(compiler, source, filename, symbol):
# Check for source consisting of only blank lines and comments
for line in source.split("\n"):
line = line.strip()
if line and line[0] != '#':
break # Leave it alone
else:
if symbol != "eval":
source = "pass" # Replace it with a 'pass' statement
err = err1 = err2 = None
code = code1 = code2 = None
try:
code = compiler(source, filename, symbol)
except SyntaxError:
pass  # this error is unused; only err1 and err2 below are compared
try:
code1 = compiler(source + "\n", filename, symbol)
except SyntaxError as e:
err1 = e
try:
code2 = compiler(source + "\n\n", filename, symbol)
except SyntaxError as e:
err2 = e
if code:
return code
if not code1 and repr(err1) == repr(err2):
raise err1
def _compile(source, filename, symbol):
return compile(source, filename, symbol, PyCF_DONT_IMPLY_DEDENT)
def compile_command(source, filename="<input>", symbol="single"):
r"""Compile a command and determine whether it is incomplete.
Arguments:
source -- the source string; may contain \n characters
filename -- optional filename from which source was read; default
"<input>"
symbol -- optional grammar start symbol; "single" (default) or "eval"
Return value / exceptions raised:
- Return a code object if the command is complete and valid
- Return None if the command is incomplete
- Raise SyntaxError, ValueError or OverflowError if the command is a
syntax error (OverflowError and ValueError can be produced by
malformed literals).
"""
return _maybe_compile(_compile, source, filename, symbol)
class Compile:
"""Instances of this class behave much like the built-in compile
function, but if one is used to compile text containing a future
statement, it "remembers" and compiles all subsequent program texts
with the statement in force."""
def __init__(self):
self.flags = PyCF_DONT_IMPLY_DEDENT
def __call__(self, source, filename, symbol):
codeob = compile(source, filename, symbol, self.flags, 1)
for feature in _features:
if codeob.co_flags & feature.compiler_flag:
self.flags |= feature.compiler_flag
return codeob
class CommandCompiler:
"""Instances of this class have __call__ methods identical in
signature to compile_command; the difference is that if the
instance compiles program text containing a __future__ statement,
the instance 'remembers' and compiles all subsequent program texts
with the statement in force."""
def __init__(self,):
self.compiler = Compile()
def __call__(self, source, filename="<input>", symbol="single"):
r"""Compile a command and determine whether it is incomplete.
Arguments:
source -- the source string; may contain \n characters
filename -- optional filename from which source was read;
default "<input>"
symbol -- optional grammar start symbol; "single" (default) or
"eval"
Return value / exceptions raised:
- Return a code object if the command is complete and valid
- Return None if the command is incomplete
- Raise SyntaxError, ValueError or OverflowError if the command is a
syntax error (OverflowError and ValueError can be produced by
malformed literals).
"""
return _maybe_compile(self.compiler, source, filename, symbol)
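The return-None-for-incomplete-input contract described above is what interactive shells rely on. A minimal illustrative sketch (not part of the module):

from codeop import compile_command

print(compile_command("x = 1"))   # a code object: the statement is complete
print(compile_command("if x:"))   # None: the command is incomplete, expect more lines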
"""Conversion functions between RGB and other color systems.
This modules provides two functions for each color system ABC:
rgb_to_abc(r, g, b) --> a, b, c
abc_to_rgb(a, b, c) --> r, g, b
All inputs and outputs are triples of floats in the range [0.0...1.0]
(with the exception of I and Q, which cover a slightly larger range).
Inputs outside the valid range may cause exceptions or invalid outputs.
Supported color systems:
RGB: Red, Green, Blue components
YIQ: Luminance, Chrominance (used by composite video signals)
HLS: Hue, Luminance, Saturation
HSV: Hue, Saturation, Value
"""
# References:
# http://en.wikipedia.org/wiki/YIQ
# http://en.wikipedia.org/wiki/HLS_color_space
# http://en.wikipedia.org/wiki/HSV_color_space
__all__ = ["rgb_to_yiq","yiq_to_rgb","rgb_to_hls","hls_to_rgb",
"rgb_to_hsv","hsv_to_rgb"]
# Some floating point constants
ONE_THIRD = 1.0/3.0
ONE_SIXTH = 1.0/6.0
TWO_THIRD = 2.0/3.0
# YIQ: used by composite video signals (linear combinations of RGB)
# Y: perceived grey level (0.0 == black, 1.0 == white)
# I, Q: color components
#
# There are a great many versions of the constants used in these formulae.
# The ones in this library use constants from the FCC version of NTSC.
def rgb_to_yiq(r, g, b):
y = 0.30*r + 0.59*g + 0.11*b
i = 0.74*(r-y) - 0.27*(b-y)
q = 0.48*(r-y) + 0.41*(b-y)
return (y, i, q)
def yiq_to_rgb(y, i, q):
# r = y + (0.27*q + 0.41*i) / (0.74*0.41 + 0.27*0.48)
# b = y + (0.74*q - 0.48*i) / (0.74*0.41 + 0.27*0.48)
# g = y - (0.30*(r-y) + 0.11*(b-y)) / 0.59
r = y + 0.9468822170900693*i + 0.6235565819861433*q
g = y - 0.27478764629897834*i - 0.6356910791873801*q
b = y - 1.1085450346420322*i + 1.7090069284064666*q
if r < 0.0:
r = 0.0
if g < 0.0:
g = 0.0
if b < 0.0:
b = 0.0
if r > 1.0:
r = 1.0
if g > 1.0:
g = 1.0
if b > 1.0:
b = 1.0
return (r, g, b)
# HLS: Hue, Luminance, Saturation
# H: position in the spectrum
# L: color lightness
# S: color saturation
def rgb_to_hls(r, g, b):
maxc = max(r, g, b)
minc = min(r, g, b)
# XXX Can optimize (maxc+minc) and (maxc-minc)
l = (minc+maxc)/2.0
if minc == maxc:
return 0.0, l, 0.0
if l <= 0.5:
s = (maxc-minc) / (maxc+minc)
else:
s = (maxc-minc) / (2.0-maxc-minc)
rc = (maxc-r) / (maxc-minc)
gc = (maxc-g) / (maxc-minc)
bc = (maxc-b) / (maxc-minc)
if r == maxc:
h = bc-gc
elif g == maxc:
h = 2.0+rc-bc
else:
h = 4.0+gc-rc
h = (h/6.0) % 1.0
return h, l, s
def hls_to_rgb(h, l, s):
if s == 0.0:
return l, l, l
if l <= 0.5:
m2 = l * (1.0+s)
else:
m2 = l+s-(l*s)
m1 = 2.0*l - m2
return (_v(m1, m2, h+ONE_THIRD), _v(m1, m2, h), _v(m1, m2, h-ONE_THIRD))
def _v(m1, m2, hue):
hue = hue % 1.0
if hue < ONE_SIXTH:
return m1 + (m2-m1)*hue*6.0
if hue < 0.5:
return m2
if hue < TWO_THIRD:
return m1 + (m2-m1)*(TWO_THIRD-hue)*6.0
return m1
# HSV: Hue, Saturation, Value
# H: position in the spectrum
# S: color saturation ("purity")
# V: color brightness
def rgb_to_hsv(r, g, b):
maxc = max(r, g, b)
minc = min(r, g, b)
v = maxc
if minc == maxc:
return 0.0, 0.0, v
s = (maxc-minc) / maxc
rc = (maxc-r) / (maxc-minc)
gc = (maxc-g) / (maxc-minc)
bc = (maxc-b) / (maxc-minc)
if r == maxc:
h = bc-gc
elif g == maxc:
h = 2.0+rc-bc
else:
h = 4.0+gc-rc
h = (h/6.0) % 1.0
return h, s, v
def hsv_to_rgb(h, s, v):
if s == 0.0:
return v, v, v
i = int(h*6.0) # XXX assume int() truncates!
f = (h*6.0) - i
p = v*(1.0 - s)
q = v*(1.0 - s*f)
t = v*(1.0 - s*(1.0-f))
i = i%6
if i == 0:
return v, t, p
if i == 1:
return q, v, p
if i == 2:
return p, v, t
if i == 3:
return p, q, v
if i == 4:
return t, p, v
if i == 5:
return v, p, q
# Cannot get here
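A small illustrative round-trip (not part of the module source) showing that the conversions above invert each other to within floating-point tolerance; the sample colour is arbitrary:

import colorsys

rgb = (0.2, 0.4, 0.4)
h, l, s = colorsys.rgb_to_hls(*rgb)
assert all(abs(a - b) < 1e-6 for a, b in zip(rgb, colorsys.hls_to_rgb(h, l, s)))
h, s, v = colorsys.rgb_to_hsv(*rgb)
assert all(abs(a - b) < 1e-6 for a, b in zip(rgb, colorsys.hsv_to_rgb(h, s, v)))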
"""Module/script to byte-compile all .py files to .pyc (or .pyo) files.
When called as a script with arguments, this compiles the directories
given as arguments recursively; the -l option prevents it from
recursing into directories.
Without arguments, it compiles all modules on sys.path, without
recursing into subdirectories. (Even though it should do so for
packages -- for now, you'll have to deal with packages separately.)
See module py_compile for details of the actual byte-compilation.
"""
import os
import sys
import importlib.util
import py_compile
import struct
__all__ = ["compile_dir","compile_file","compile_path"]
def compile_dir(dir, maxlevels=10, ddir=None, force=False, rx=None,
quiet=False, legacy=False, optimize=-1):
"""Byte-compile all modules in the given directory tree.
Arguments (only dir is required):
dir: the directory to byte-compile
maxlevels: maximum recursion level (default 10)
ddir: the directory that will be prepended to the path to the
file as it is compiled into each byte-code file.
force: if True, force compilation, even if timestamps are up-to-date
rx: if given, a compiled regular expression; files whose full path it matches (via search) are skipped
quiet: if True, be quiet during compilation
legacy: if True, produce legacy pyc paths instead of PEP 3147 paths
optimize: optimization level or -1 for level of the interpreter
"""
if not quiet:
print('Listing {!r}...'.format(dir))
try:
names = os.listdir(dir)
except OSError:
print("Can't list {!r}".format(dir))
names = []
names.sort()
success = 1
for name in names:
if name == '__pycache__':
continue
fullname = os.path.join(dir, name)
if ddir is not None:
dfile = os.path.join(ddir, name)
else:
dfile = None
if not os.path.isdir(fullname):
if not compile_file(fullname, ddir, force, rx, quiet,
legacy, optimize):
success = 0
elif (maxlevels > 0 and name != os.curdir and name != os.pardir and
os.path.isdir(fullname) and not os.path.islink(fullname)):
if not compile_dir(fullname, maxlevels - 1, dfile, force, rx,
quiet, legacy, optimize):
success = 0
return success
def compile_file(fullname, ddir=None, force=False, rx=None, quiet=False,
legacy=False, optimize=-1):
"""Byte-compile one file.
Arguments (only fullname is required):
fullname: the file to byte-compile
ddir: if given, the directory name compiled in to the
byte-code file.
force: if True, force compilation, even if timestamps are up-to-date
rx: if given, a compiled regular expression; files whose full path it matches (via search) are skipped
quiet: if True, be quiet during compilation
legacy: if True, produce legacy pyc paths instead of PEP 3147 paths
optimize: optimization level or -1 for level of the interpreter
"""
success = 1
name = os.path.basename(fullname)
if ddir is not None:
dfile = os.path.join(ddir, name)
else:
dfile = None
if rx is not None:
mo = rx.search(fullname)
if mo:
return success
if os.path.isfile(fullname):
if legacy:
cfile = fullname + ('c' if __debug__ else 'o')
else:
if optimize >= 0:
cfile = importlib.util.cache_from_source(
fullname, debug_override=not optimize)
else:
cfile = importlib.util.cache_from_source(fullname)
cache_dir = os.path.dirname(cfile)
head, tail = name[:-3], name[-3:]
if tail == '.py':
if not force:
try:
mtime = int(os.stat(fullname).st_mtime)
expect = struct.pack('<4sl', importlib.util.MAGIC_NUMBER,
mtime)
with open(cfile, 'rb') as chandle:
actual = chandle.read(8)
if expect == actual:
return success
except OSError:
pass
if not quiet:
print('Compiling {!r}...'.format(fullname))
try:
ok = py_compile.compile(fullname, cfile, dfile, True,
optimize=optimize)
except py_compile.PyCompileError as err:
if quiet:
print('*** Error compiling {!r}...'.format(fullname))
else:
print('*** ', end='')
# escape non-printable characters in msg
msg = err.msg.encode(sys.stdout.encoding,
errors='backslashreplace')
msg = msg.decode(sys.stdout.encoding)
print(msg)
success = 0
except (SyntaxError, UnicodeError, OSError) as e:
if quiet:
print('*** Error compiling {!r}...'.format(fullname))
else:
print('*** ', end='')
print(e.__class__.__name__ + ':', e)
success = 0
else:
if ok == 0:
success = 0
return success
def compile_path(skip_curdir=1, maxlevels=0, force=False, quiet=False,
legacy=False, optimize=-1):
"""Byte-compile all module on sys.path.
Arguments (all optional):
skip_curdir: if true, skip current directory (default True)
maxlevels: max recursion level (default 0)
force: as for compile_dir() (default False)
quiet: as for compile_dir() (default False)
legacy: as for compile_dir() (default False)
optimize: as for compile_dir() (default -1)
"""
success = 1
for dir in sys.path:
if (not dir or dir == os.curdir) and skip_curdir:
print('Skipping current directory')
else:
success = success and compile_dir(dir, maxlevels, None,
force, quiet=quiet,
legacy=legacy, optimize=optimize)
return success
def main():
"""Script main program."""
import argparse
parser = argparse.ArgumentParser(
description='Utilities to support installing Python libraries.')
parser.add_argument('-l', action='store_const', const=0,
default=10, dest='maxlevels',
help="don't recurse into subdirectories")
parser.add_argument('-f', action='store_true', dest='force',
help='force rebuild even if timestamps are up to date')
parser.add_argument('-q', action='store_true', dest='quiet',
help='output only error messages')
parser.add_argument('-b', action='store_true', dest='legacy',
help='use legacy (pre-PEP3147) compiled file locations')
parser.add_argument('-d', metavar='DESTDIR', dest='ddir', default=None,
help=('directory to prepend to file paths for use in '
'compile-time tracebacks and in runtime '
'tracebacks in cases where the source file is '
'unavailable'))
parser.add_argument('-x', metavar='REGEXP', dest='rx', default=None,
help=('skip files matching the regular expression; '
'the regexp is searched for in the full path '
'of each file considered for compilation'))
parser.add_argument('-i', metavar='FILE', dest='flist',
help=('add all the files and directories listed in '
'FILE to the list considered for compilation; '
'if "-", names are read from stdin'))
parser.add_argument('compile_dest', metavar='FILE|DIR', nargs='*',
help=('zero or more file and directory names '
'to compile; if no arguments given, defaults '
'to the equivalent of -l sys.path'))
args = parser.parse_args()
compile_dests = args.compile_dest
if args.rx:
import re
args.rx = re.compile(args.rx)
# if flist is provided then load it
if args.flist:
try:
with (sys.stdin if args.flist=='-' else open(args.flist)) as f:
for line in f:
compile_dests.append(line.strip())
except OSError:
print("Error reading file list {}".format(args.flist))
return False
success = True
try:
if compile_dests:
for dest in compile_dests:
if os.path.isfile(dest):
if not compile_file(dest, args.ddir, args.force, args.rx,
args.quiet, args.legacy):
success = False
else:
if not compile_dir(dest, args.maxlevels, args.ddir,
args.force, args.rx, args.quiet,
args.legacy):
success = False
return success
else:
return compile_path(legacy=args.legacy, force=args.force,
quiet=args.quiet)
except KeyboardInterrupt:
print("\n[interrupted]")
return False
return True
if __name__ == '__main__':
exit_status = int(not main())
sys.exit(exit_status)
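The module can also be driven programmatically rather than through main(). An illustrative sketch (the 'src' directory and the regex are placeholders, not part of the module):

import re
import compileall

# Byte-compile a hypothetical tree two levels deep, skipping VCS metadata;
# a false return value means at least one file failed to compile.
ok = compileall.compile_dir('src', maxlevels=2,
                            rx=re.compile(r'[/\\][.]git'), quiet=True)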
"""Configuration file parser.
A configuration file consists of sections, led by a "[section]" header,
and followed by "name: value" entries, with continuations and such in
the style of RFC 822.
Intrinsic defaults can be specified by passing them into the
ConfigParser constructor as a dictionary.
class:
ConfigParser -- responsible for parsing a list of
configuration files, and managing the parsed database.
methods:
__init__(defaults=None, dict_type=_default_dict, allow_no_value=False,
delimiters=('=', ':'), comment_prefixes=('#', ';'),
inline_comment_prefixes=None, strict=True,
empty_lines_in_values=True):
Create the parser. When `defaults' is given, it is initialized into the
dictionary of intrinsic defaults. The keys must be strings, the values
must be appropriate for %()s string interpolation.
When `dict_type' is given, it will be used to create the dictionary
objects for the list of sections, for the options within a section, and
for the default values.
When `delimiters' is given, it will be used as the set of substrings
that divide keys from values.
When `comment_prefixes' is given, it will be used as the set of
substrings that prefix comments in empty lines. Comments can be
indented.
When `inline_comment_prefixes' is given, it will be used as the set of
substrings that prefix comments in non-empty lines.
When `strict` is True, the parser won't allow for any section or option
duplicates while reading from a single source (file, string or
dictionary). Default is True.
When `empty_lines_in_values' is False (default: True), each empty line
marks the end of an option. Otherwise, internal empty lines of
a multiline option are kept as part of the value.
When `allow_no_value' is True (default: False), options without
values are accepted; the value presented for these is None.
sections()
Return all the configuration section names, sans DEFAULT.
has_section(section)
Return whether the given section exists.
has_option(section, option)
Return whether the given option exists in the given section.
options(section)
Return list of configuration options for the named section.
read(filenames, encoding=None)
Read and parse the list of named configuration files, given by
name. A single filename is also allowed. Non-existing files
are ignored. Return list of successfully read files.
read_file(f, filename=None)
Read and parse one configuration file, given as a file object.
The filename defaults to f.name; it is only used in error
messages (if f has no `name' attribute, the string `<???>' is used).
read_string(string)
Read configuration from a given string.
read_dict(dictionary)
Read configuration from a dictionary. Keys are section names,
values are dictionaries with keys and values that should be present
in the section. If the used dictionary type preserves order, sections
and their keys will be added in order. Values are automatically
converted to strings.
get(section, option, raw=False, vars=None, fallback=_UNSET)
Return a string value for the named option. All % interpolations are
expanded in the return values, based on the defaults passed into the
constructor and the DEFAULT section. Additional substitutions may be
provided using the `vars' argument, which must be a dictionary whose
contents override any pre-existing defaults. If `option' is a key in
`vars', the value from `vars' is used.
getint(section, option, raw=False, vars=None, fallback=_UNSET)
Like get(), but convert value to an integer.
getfloat(section, option, raw=False, vars=None, fallback=_UNSET)
Like get(), but convert value to a float.
getboolean(section, option, raw=False, vars=None, fallback=_UNSET)
Like get(), but convert value to a boolean (currently case
insensitively defined as 0, false, no, off for False, and 1, true,
yes, on for True). Returns False or True.
items(section=_UNSET, raw=False, vars=None)
If section is given, return a list of tuples with (name, value) for
each option in the section. Otherwise, return a list of tuples with
(section_name, section_proxy) for each section, including DEFAULTSECT.
remove_section(section)
Remove the given file section and all its options.
remove_option(section, option)
Remove the given option from the given section.
set(section, option, value)
Set the given option.
write(fp, space_around_delimiters=True)
Write the configuration state in .ini format. If
`space_around_delimiters' is True (the default), delimiters
between keys and values are surrounded by spaces.
"""
from collections.abc import MutableMapping
from collections import OrderedDict as _default_dict, ChainMap as _ChainMap
import functools
import io
import itertools
import re
import sys
import warnings
__all__ = ["NoSectionError", "DuplicateOptionError", "DuplicateSectionError",
"NoOptionError", "InterpolationError", "InterpolationDepthError",
"InterpolationSyntaxError", "ParsingError",
"MissingSectionHeaderError",
"ConfigParser", "SafeConfigParser", "RawConfigParser",
"DEFAULTSECT", "MAX_INTERPOLATION_DEPTH"]
DEFAULTSECT = "DEFAULT"
MAX_INTERPOLATION_DEPTH = 10
# exception classes
class Error(Exception):
"""Base class for ConfigParser exceptions."""
def __init__(self, msg=''):
self.message = msg
Exception.__init__(self, msg)
def __repr__(self):
return self.message
__str__ = __repr__
class NoSectionError(Error):
"""Raised when no section matches a requested option."""
def __init__(self, section):
Error.__init__(self, 'No section: %r' % (section,))
self.section = section
self.args = (section, )
class DuplicateSectionError(Error):
"""Raised when a section is repeated in an input source.
Possible repetitions that raise this exception are: multiple creation
using the API or in strict parsers when a section is found more than once
in a single input file, string or dictionary.
"""
def __init__(self, section, source=None, lineno=None):
msg = [repr(section), " already exists"]
if source is not None:
message = ["While reading from ", repr(source)]
if lineno is not None:
message.append(" [line {0:2d}]".format(lineno))
message.append(": section ")
message.extend(msg)
msg = message
else:
msg.insert(0, "Section ")
Error.__init__(self, "".join(msg))
self.section = section
self.source = source
self.lineno = lineno
self.args = (section, source, lineno)
class DuplicateOptionError(Error):
"""Raised by strict parsers when an option is repeated in an input source.
Current implementation raises this exception only when an option is found
more than once in a single file, string or dictionary.
"""
def __init__(self, section, option, source=None, lineno=None):
msg = [repr(option), " in section ", repr(section),
" already exists"]
if source is not None:
message = ["While reading from ", repr(source)]
if lineno is not None:
message.append(" [line {0:2d}]".format(lineno))
message.append(": option ")
message.extend(msg)
msg = message
else:
msg.insert(0, "Option ")
Error.__init__(self, "".join(msg))
self.section = section
self.option = option
self.source = source
self.lineno = lineno
self.args = (section, option, source, lineno)
class NoOptionError(Error):
"""A requested option was not found."""
def __init__(self, option, section):
Error.__init__(self, "No option %r in section: %r" %
(option, section))
self.option = option
self.section = section
self.args = (option, section)
class InterpolationError(Error):
"""Base class for interpolation-related exceptions."""
def __init__(self, option, section, msg):
Error.__init__(self, msg)
self.option = option
self.section = section
self.args = (option, section, msg)
class InterpolationMissingOptionError(InterpolationError):
"""A string substitution required a setting which was not available."""
def __init__(self, option, section, rawval, reference):
msg = ("Bad value substitution: option {!r} in section {!r} contains "
"an interpolation key {!r} which is not a valid option name. "
"Raw value: {!r}".format(option, section, reference, rawval))
InterpolationError.__init__(self, option, section, msg)
self.reference = reference
self.args = (option, section, rawval, reference)
class InterpolationSyntaxError(InterpolationError):
"""Raised when the source text contains invalid syntax.
Current implementation raises this exception when the source text into
which substitutions are made does not conform to the required syntax.
"""
class InterpolationDepthError(InterpolationError):
"""Raised when substitutions are nested too deeply."""
def __init__(self, option, section, rawval):
msg = ("Recursion limit exceeded in value substitution: option {!r} "
"in section {!r} contains an interpolation key which "
"cannot be substituted in {} steps. Raw value: {!r}"
"".format(option, section, MAX_INTERPOLATION_DEPTH,
rawval))
InterpolationError.__init__(self, option, section, msg)
self.args = (option, section, rawval)
class ParsingError(Error):
"""Raised when a configuration file does not follow legal syntax."""
def __init__(self, source=None, filename=None):
# Exactly one of `source'/`filename' arguments has to be given.
# `filename' kept for compatibility.
if filename and source:
raise ValueError("Cannot specify both `filename' and `source'. "
"Use `source'.")
elif not filename and not source:
raise ValueError("Required argument `source' not given.")
elif filename:
source = filename
Error.__init__(self, 'Source contains parsing errors: %r' % source)
self.source = source
self.errors = []
self.args = (source, )
@property
def filename(self):
"""Deprecated, use `source'."""
warnings.warn(
"The 'filename' attribute will be removed in future versions. "
"Use 'source' instead.",
DeprecationWarning, stacklevel=2
)
return self.source
@filename.setter
def filename(self, value):
"""Deprecated, user `source'."""
warnings.warn(
"The 'filename' attribute will be removed in future versions. "
"Use 'source' instead.",
DeprecationWarning, stacklevel=2
)
self.source = value
def append(self, lineno, line):
self.errors.append((lineno, line))
self.message += '\n\t[line %2d]: %s' % (lineno, line)
class MissingSectionHeaderError(ParsingError):
"""Raised when a key-value pair is found before any section header."""
def __init__(self, filename, lineno, line):
Error.__init__(
self,
'File contains no section headers.\nfile: %r, line: %d\n%r' %
(filename, lineno, line))
self.source = filename
self.lineno = lineno
self.line = line
self.args = (filename, lineno, line)
# Used in parser getters to indicate the default behaviour when a specific
# option is not found is to raise an exception. Created to enable `None' as
# a valid fallback value.
_UNSET = object()
class Interpolation:
"""Dummy interpolation that passes the value through with no changes."""
def before_get(self, parser, section, option, value, defaults):
return value
def before_set(self, parser, section, option, value):
return value
def before_read(self, parser, section, option, value):
return value
def before_write(self, parser, section, option, value):
return value
class BasicInterpolation(Interpolation):
"""Interpolation as implemented in the classic ConfigParser.
The option values can contain format strings which refer to other values in
the same section, or values in the special default section.
For example:
something: %(dir)s/whatever
would resolve the "%(dir)s" to the value of dir. All reference
expansions are done late, on demand. If a user needs to use a bare % in
a configuration file, she can escape it by writing %%. Other % usage
is considered a user error and raises `InterpolationSyntaxError'."""
_KEYCRE = re.compile(r"%\(([^)]+)\)s")
def before_get(self, parser, section, option, value, defaults):
L = []
self._interpolate_some(parser, option, L, value, section, defaults, 1)
return ''.join(L)
def before_set(self, parser, section, option, value):
tmp_value = value.replace('%%', '') # escaped percent signs
tmp_value = self._KEYCRE.sub('', tmp_value) # valid syntax
if '%' in tmp_value:
raise ValueError("invalid interpolation syntax in %r at "
"position %d" % (value, tmp_value.find('%')))
return value
def _interpolate_some(self, parser, option, accum, rest, section, map,
depth):
rawval = parser.get(section, option, raw=True, fallback=rest)
if depth > MAX_INTERPOLATION_DEPTH:
raise InterpolationDepthError(option, section, rawval)
while rest:
p = rest.find("%")
if p < 0:
accum.append(rest)
return
if p > 0:
accum.append(rest[:p])
rest = rest[p:]
# p is no longer used
c = rest[1:2]
if c == "%":
accum.append("%")
rest = rest[2:]
elif c == "(":
m = self._KEYCRE.match(rest)
if m is None:
raise InterpolationSyntaxError(option, section,
"bad interpolation variable reference %r" % rest)
var = parser.optionxform(m.group(1))
rest = rest[m.end():]
try:
v = map[var]
except KeyError:
raise InterpolationMissingOptionError(
option, section, rawval, var)
if "%" in v:
self._interpolate_some(parser, option, accum, v,
section, map, depth + 1)
else:
accum.append(v)
else:
raise InterpolationSyntaxError(
option, section,
"'%%' must be followed by '%%' or '(', "
"found: %r" % (rest,))
class ExtendedInterpolation(Interpolation):
"""Advanced variant of interpolation, supports the syntax used by
`zc.buildout'. Enables interpolation between sections."""
_KEYCRE = re.compile(r"\$\{([^}]+)\}")
def before_get(self, parser, section, option, value, defaults):
L = []
self._interpolate_some(parser, option, L, value, section, defaults, 1)
return ''.join(L)
def before_set(self, parser, section, option, value):
tmp_value = value.replace('$$', '') # escaped dollar signs
tmp_value = self._KEYCRE.sub('', tmp_value) # valid syntax
if '$' in tmp_value:
raise ValueError("invalid interpolation syntax in %r at "
"position %d" % (value, tmp_value.find('$')))
return value
def _interpolate_some(self, parser, option, accum, rest, section, map,
depth):
rawval = parser.get(section, option, raw=True, fallback=rest)
if depth > MAX_INTERPOLATION_DEPTH:
raise InterpolationDepthError(option, section, rawval)
while rest:
p = rest.find("$")
if p < 0:
accum.append(rest)
return
if p > 0:
accum.append(rest[:p])
rest = rest[p:]
# p is no longer used
c = rest[1:2]
if c == "$":
accum.append("$")
rest = rest[2:]
elif c == "{":
m = self._KEYCRE.match(rest)
if m is None:
raise InterpolationSyntaxError(option, section,
"bad interpolation variable reference %r" % rest)
path = m.group(1).split(':')
rest = rest[m.end():]
sect = section
opt = option
try:
if len(path) == 1:
opt = parser.optionxform(path[0])
v = map[opt]
elif len(path) == 2:
sect = path[0]
opt = parser.optionxform(path[1])
v = parser.get(sect, opt, raw=True)
else:
raise InterpolationSyntaxError(
option, section,
"More than one ':' found: %r" % (rest,))
except (KeyError, NoSectionError, NoOptionError):
raise InterpolationMissingOptionError(
option, section, rawval, ":".join(path))
if "$" in v:
self._interpolate_some(parser, opt, accum, v, sect,
dict(parser.items(sect, raw=True)),
depth + 1)
else:
accum.append(v)
else:
raise InterpolationSyntaxError(
option, section,
"'$' must be followed by '$' or '{', "
"found: %r" % (rest,))
class LegacyInterpolation(Interpolation):
"""Deprecated interpolation used in old versions of ConfigParser.
Use BasicInterpolation or ExtendedInterpolation instead."""
_KEYCRE = re.compile(r"%\(([^)]*)\)s|.")
def before_get(self, parser, section, option, value, vars):
rawval = value
depth = MAX_INTERPOLATION_DEPTH
while depth: # Loop through this until it's done
depth -= 1
if value and "%(" in value:
replace = functools.partial(self._interpolation_replace,
parser=parser)
value = self._KEYCRE.sub(replace, value)
try:
value = value % vars
except KeyError as e:
raise InterpolationMissingOptionError(
option, section, rawval, e.args[0])
else:
break
if value and "%(" in value:
raise InterpolationDepthError(option, section, rawval)
return value
def before_set(self, parser, section, option, value):
return value
@staticmethod
def _interpolation_replace(match, parser):
s = match.group(1)
if s is None:
return match.group()
else:
return "%%(%s)s" % parser.optionxform(s)
class RawConfigParser(MutableMapping):
"""ConfigParser that does not do interpolation."""
# Regular expressions for parsing section headers and options
_SECT_TMPL = r"""
\[ # [
(?P<header>[^]]+) # very permissive!
\] # ]
"""
_OPT_TMPL = r"""
(?P<option>.*?) # very permissive!
\s*(?P<vi>{delim})\s* # any number of space/tab,
# followed by any of the
# allowed delimiters,
# followed by any space/tab
(?P<value>.*)$ # everything up to eol
"""
_OPT_NV_TMPL = r"""
(?P<option>.*?) # very permissive!
\s*(?: # any number of space/tab,
(?P<vi>{delim})\s* # optionally followed by
# any of the allowed
# delimiters, followed by any
# space/tab
(?P<value>.*))?$ # everything up to eol
"""
# Interpolation algorithm to be used if the user does not specify another
_DEFAULT_INTERPOLATION = Interpolation()
# Compiled regular expression for matching sections
SECTCRE = re.compile(_SECT_TMPL, re.VERBOSE)
# Compiled regular expression for matching options with typical separators
OPTCRE = re.compile(_OPT_TMPL.format(delim="=|:"), re.VERBOSE)
# Compiled regular expression for matching options with optional values
# delimited using typical separators
OPTCRE_NV = re.compile(_OPT_NV_TMPL.format(delim="=|:"), re.VERBOSE)
# Compiled regular expression for matching leading whitespace in a line
NONSPACECRE = re.compile(r"\S")
# Possible boolean values in the configuration.
BOOLEAN_STATES = {'1': True, 'yes': True, 'true': True, 'on': True,
'0': False, 'no': False, 'false': False, 'off': False}
def __init__(self, defaults=None, dict_type=_default_dict,
allow_no_value=False, *, delimiters=('=', ':'),
comment_prefixes=('#', ';'), inline_comment_prefixes=None,
strict=True, empty_lines_in_values=True,
default_section=DEFAULTSECT,
interpolation=_UNSET):
self._dict = dict_type
self._sections = self._dict()
self._defaults = self._dict()
self._proxies = self._dict()
self._proxies[default_section] = SectionProxy(self, default_section)
if defaults:
for key, value in defaults.items():
self._defaults[self.optionxform(key)] = value
self._delimiters = tuple(delimiters)
if delimiters == ('=', ':'):
self._optcre = self.OPTCRE_NV if allow_no_value else self.OPTCRE
else:
d = "|".join(re.escape(d) for d in delimiters)
if allow_no_value:
self._optcre = re.compile(self._OPT_NV_TMPL.format(delim=d),
re.VERBOSE)
else:
self._optcre = re.compile(self._OPT_TMPL.format(delim=d),
re.VERBOSE)
self._comment_prefixes = tuple(comment_prefixes or ())
self._inline_comment_prefixes = tuple(inline_comment_prefixes or ())
self._strict = strict
self._allow_no_value = allow_no_value
self._empty_lines_in_values = empty_lines_in_values
self.default_section = default_section
self._interpolation = interpolation
if self._interpolation is _UNSET:
self._interpolation = self._DEFAULT_INTERPOLATION
if self._interpolation is None:
self._interpolation = Interpolation()
def defaults(self):
return self._defaults
def sections(self):
"""Return a list of section names, excluding [DEFAULT]"""
# self._sections will never have [DEFAULT] in it
return list(self._sections.keys())
def add_section(self, section):
"""Create a new section in the configuration.
Raise DuplicateSectionError if a section by the specified name
already exists. Raise ValueError if name is DEFAULT.
"""
if section == self.default_section:
raise ValueError('Invalid section name: %r' % section)
if section in self._sections:
raise DuplicateSectionError(section)
self._sections[section] = self._dict()
self._proxies[section] = SectionProxy(self, section)
def has_section(self, section):
"""Indicate whether the named section is present in the configuration.
The DEFAULT section is not acknowledged.
"""
return section in self._sections
def options(self, section):
"""Return a list of option names for the given section name."""
try:
opts = self._sections[section].copy()
except KeyError:
raise NoSectionError(section)
opts.update(self._defaults)
return list(opts.keys())
def read(self, filenames, encoding=None):
"""Read and parse a filename or a list of filenames.
Files that cannot be opened are silently ignored; this is
designed so that you can specify a list of potential
configuration file locations (e.g. current directory, user's
home directory, systemwide directory), and all existing
configuration files in the list will be read. A single
filename may also be given.
Return list of successfully read files.
"""
if isinstance(filenames, str):
filenames = [filenames]
read_ok = []
for filename in filenames:
try:
with open(filename, encoding=encoding) as fp:
self._read(fp, filename)
except OSError:
continue
read_ok.append(filename)
return read_ok
def read_file(self, f, source=None):
"""Like read() but the argument must be a file-like object.
The `f' argument must be iterable, returning one line at a time.
Optional second argument is the `source' specifying the name of the
file being read. If not given, it is taken from f.name. If `f' has no
`name' attribute, `<???>' is used.
"""
if source is None:
try:
source = f.name
except AttributeError:
source = '<???>'
self._read(f, source)
def read_string(self, string, source='<string>'):
"""Read configuration from a given string."""
sfile = io.StringIO(string)
self.read_file(sfile, source)
def read_dict(self, dictionary, source='<dict>'):
"""Read configuration from a dictionary.
Keys are section names, values are dictionaries with keys and values
that should be present in the section. If the used dictionary type
preserves order, sections and their keys will be added in order.
All types held in the dictionary are converted to strings during
reading, including section names, option names and keys.
Optional second argument is the `source' specifying the name of the
dictionary being read.
"""
elements_added = set()
for section, keys in dictionary.items():
section = str(section)
try:
self.add_section(section)
except (DuplicateSectionError, ValueError):
if self._strict and section in elements_added:
raise
elements_added.add(section)
for key, value in keys.items():
key = self.optionxform(str(key))
if value is not None:
value = str(value)
if self._strict and (section, key) in elements_added:
raise DuplicateOptionError(section, key, source)
elements_added.add((section, key))
self.set(section, key, value)
def readfp(self, fp, filename=None):
"""Deprecated, use read_file instead."""
warnings.warn(
"This method will be removed in future versions. "
"Use 'parser.read_file()' instead.",
DeprecationWarning, stacklevel=2
)
self.read_file(fp, source=filename)
def get(self, section, option, *, raw=False, vars=None, fallback=_UNSET):
"""Get an option value for a given section.
If `vars' is provided, it must be a dictionary. The option is looked up
in `vars' (if provided), `section', and in `DEFAULTSECT' in that order.
If the key is not found and `fallback' is provided, it is used as
a fallback value. `None' can be provided as a `fallback' value.
If interpolation is enabled and the optional argument `raw' is False,
all interpolations are expanded in the return values.
Arguments `raw', `vars', and `fallback' are keyword only.
The section DEFAULT is special.
"""
try:
d = self._unify_values(section, vars)
except NoSectionError:
if fallback is _UNSET:
raise
else:
return fallback
option = self.optionxform(option)
try:
value = d[option]
except KeyError:
if fallback is _UNSET:
raise NoOptionError(option, section)
else:
return fallback
if raw or value is None:
return value
else:
return self._interpolation.before_get(self, section, option, value,
d)
def _get(self, section, conv, option, **kwargs):
return conv(self.get(section, option, **kwargs))
def getint(self, section, option, *, raw=False, vars=None,
fallback=_UNSET):
try:
return self._get(section, int, option, raw=raw, vars=vars)
except (NoSectionError, NoOptionError):
if fallback is _UNSET:
raise
else:
return fallback
def getfloat(self, section, option, *, raw=False, vars=None,
fallback=_UNSET):
try:
return self._get(section, float, option, raw=raw, vars=vars)
except (NoSectionError, NoOptionError):
if fallback is _UNSET:
raise
else:
return fallback
def getboolean(self, section, option, *, raw=False, vars=None,
fallback=_UNSET):
try:
return self._get(section, self._convert_to_boolean, option,
raw=raw, vars=vars)
except (NoSectionError, NoOptionError):
if fallback is _UNSET:
raise
else:
return fallback
def items(self, section=_UNSET, raw=False, vars=None):
"""Return a list of (name, value) tuples for each option in a section.
All % interpolations are expanded in the return values, based on the
defaults passed into the constructor, unless the optional argument
`raw' is true. Additional substitutions may be provided using the
`vars' argument, which must be a dictionary whose contents override
any pre-existing defaults.
The section DEFAULT is special.
"""
if section is _UNSET:
return super().items()
d = self._defaults.copy()
try:
d.update(self._sections[section])
except KeyError:
if section != self.default_section:
raise NoSectionError(section)
# Update with the entry specific variables
if vars:
for key, value in vars.items():
d[self.optionxform(key)] = value
value_getter = lambda option: self._interpolation.before_get(self,
section, option, d[option], d)
if raw:
value_getter = lambda option: d[option]
return [(option, value_getter(option)) for option in d.keys()]
def popitem(self):
"""Remove a section from the parser and return it as
a (section_name, section_proxy) tuple. If no section is present, raise
KeyError.
The section DEFAULT is never returned because it cannot be removed.
"""
for key in self.sections():
value = self[key]
del self[key]
return key, value
raise KeyError
def optionxform(self, optionstr):
return optionstr.lower()
def has_option(self, section, option):
"""Check for the existence of a given option in a given section.
If the specified `section' is None or an empty string, DEFAULT is
assumed. If the specified `section' does not exist, returns False."""
if not section or section == self.default_section:
option = self.optionxform(option)
return option in self._defaults
elif section not in self._sections:
return False
else:
option = self.optionxform(option)
return (option in self._sections[section]
or option in self._defaults)
def set(self, section, option, value=None):
"""Set an option."""
if value:
value = self._interpolation.before_set(self, section, option,
value)
if not section or section == self.default_section:
sectdict = self._defaults
else:
try:
sectdict = self._sections[section]
except KeyError:
raise NoSectionError(section)
sectdict[self.optionxform(option)] = value
def write(self, fp, space_around_delimiters=True):
"""Write an .ini-format representation of the configuration state.
If `space_around_delimiters' is True (the default), delimiters
between keys and values are surrounded by spaces.
"""
if space_around_delimiters:
d = " {} ".format(self._delimiters[0])
else:
d = self._delimiters[0]
if self._defaults:
self._write_section(fp, self.default_section,
self._defaults.items(), d)
for section in self._sections:
self._write_section(fp, section,
self._sections[section].items(), d)
def _write_section(self, fp, section_name, section_items, delimiter):
"""Write a single section to the specified `fp'."""
fp.write("[{}]\n".format(section_name))
for key, value in section_items:
value = self._interpolation.before_write(self, section_name, key,
value)
if value is not None or not self._allow_no_value:
value = delimiter + str(value).replace('\n', '\n\t')
else:
value = ""
fp.write("{}{}\n".format(key, value))
fp.write("\n")
def remove_option(self, section, option):
"""Remove an option."""
if not section or section == self.default_section:
sectdict = self._defaults
else:
try:
sectdict = self._sections[section]
except KeyError:
raise NoSectionError(section)
option = self.optionxform(option)
existed = option in sectdict
if existed:
del sectdict[option]
return existed
def remove_section(self, section):
"""Remove a file section."""
existed = section in self._sections
if existed:
del self._sections[section]
del self._proxies[section]
return existed
def __getitem__(self, key):
if key != self.default_section and not self.has_section(key):
raise KeyError(key)
return self._proxies[key]
def __setitem__(self, key, value):
# To conform with the mapping protocol, overwrites existing values in
# the section.
# XXX this is not atomic if read_dict fails at any point. Then again,
# no update method in configparser is atomic in this implementation.
if key == self.default_section:
self._defaults.clear()
elif key in self._sections:
self._sections[key].clear()
self.read_dict({key: value})
def __delitem__(self, key):
if key == self.default_section:
raise ValueError("Cannot remove the default section.")
if not self.has_section(key):
raise KeyError(key)
self.remove_section(key)
def __contains__(self, key):
return key == self.default_section or self.has_section(key)
def __len__(self):
return len(self._sections) + 1 # the default section
def __iter__(self):
# XXX does it break when underlying container state changed?
return itertools.chain((self.default_section,), self._sections.keys())
def _read(self, fp, fpname):
"""Parse a sectioned configuration file.
Each section in a configuration file contains a header, indicated by
a name in square brackets (`[]'), plus key/value options, indicated by
`name' and `value' delimited with a specific substring (`=' or `:' by
default).
Values can span multiple lines, as long as they are indented deeper
than the first line of the value. Depending on the parser's mode, blank
lines may be treated as parts of multiline values or ignored.
Configuration files may include comments, prefixed by specific
characters (`#' and `;' by default). Comments may appear on their own
in an otherwise empty line or may be entered in lines holding values or
section names.
"""
elements_added = set()
cursect = None # None, or a dictionary
sectname = None
optname = None
lineno = 0
indent_level = 0
e = None # None, or an exception
for lineno, line in enumerate(fp, start=1):
comment_start = sys.maxsize
# strip inline comments
inline_prefixes = {p: -1 for p in self._inline_comment_prefixes}
while comment_start == sys.maxsize and inline_prefixes:
next_prefixes = {}
for prefix, index in inline_prefixes.items():
index = line.find(prefix, index+1)
if index == -1:
continue
next_prefixes[prefix] = index
if index == 0 or (index > 0 and line[index-1].isspace()):
comment_start = min(comment_start, index)
inline_prefixes = next_prefixes
# strip full line comments
for prefix in self._comment_prefixes:
if line.strip().startswith(prefix):
comment_start = 0
break
if comment_start == sys.maxsize:
comment_start = None
value = line[:comment_start].strip()
if not value:
if self._empty_lines_in_values:
# add empty line to the value, but only if there was no
# comment on the line
if (comment_start is None and
cursect is not None and
optname and
cursect[optname] is not None):
cursect[optname].append('') # newlines added at join
else:
# empty line marks end of value
indent_level = sys.maxsize
continue
# continuation line?
first_nonspace = self.NONSPACECRE.search(line)
cur_indent_level = first_nonspace.start() if first_nonspace else 0
if (cursect is not None and optname and
cur_indent_level > indent_level):
cursect[optname].append(value)
# a section header or option header?
else:
indent_level = cur_indent_level
# is it a section header?
mo = self.SECTCRE.match(value)
if mo:
sectname = mo.group('header')
if sectname in self._sections:
if self._strict and sectname in elements_added:
raise DuplicateSectionError(sectname, fpname,
lineno)
cursect = self._sections[sectname]
elements_added.add(sectname)
elif sectname == self.default_section:
cursect = self._defaults
else:
cursect = self._dict()
self._sections[sectname] = cursect
self._proxies[sectname] = SectionProxy(self, sectname)
elements_added.add(sectname)
# So sections can't start with a continuation line
optname = None
# no section header in the file?
elif cursect is None:
raise MissingSectionHeaderError(fpname, lineno, line)
# an option line?
else:
mo = self._optcre.match(value)
if mo:
optname, vi, optval = mo.group('option', 'vi', 'value')
if not optname:
e = self._handle_error(e, fpname, lineno, line)
optname = self.optionxform(optname.rstrip())
if (self._strict and
(sectname, optname) in elements_added):
raise DuplicateOptionError(sectname, optname,
fpname, lineno)
elements_added.add((sectname, optname))
# This check is fine because the OPTCRE cannot
# match if it would set optval to None
if optval is not None:
optval = optval.strip()
cursect[optname] = [optval]
else:
# valueless option handling
cursect[optname] = None
else:
# a non-fatal parsing error occurred. set up the
# exception but keep going. the exception will be
# raised at the end of the file and will contain a
# list of all bogus lines
e = self._handle_error(e, fpname, lineno, line)
# if any parsing errors occurred, raise an exception
if e:
raise e
self._join_multiline_values()
def _join_multiline_values(self):
defaults = self.default_section, self._defaults
all_sections = itertools.chain((defaults,),
self._sections.items())
for section, options in all_sections:
for name, val in options.items():
if isinstance(val, list):
val = '\n'.join(val).rstrip()
options[name] = self._interpolation.before_read(self,
section,
name, val)
def _handle_error(self, exc, fpname, lineno, line):
if not exc:
exc = ParsingError(fpname)
exc.append(lineno, repr(line))
return exc
def _unify_values(self, section, vars):
"""Create a sequence of lookups with 'vars' taking priority over
the 'section' which takes priority over the DEFAULTSECT.
"""
sectiondict = {}
try:
sectiondict = self._sections[section]
except KeyError:
if section != self.default_section:
raise NoSectionError(section)
# Update with the entry specific variables
vardict = {}
if vars:
for key, value in vars.items():
if value is not None:
value = str(value)
vardict[self.optionxform(key)] = value
return _ChainMap(vardict, sectiondict, self._defaults)
def _convert_to_boolean(self, value):
"""Return a boolean value translating from other types if necessary.
"""
if value.lower() not in self.BOOLEAN_STATES:
raise ValueError('Not a boolean: %s' % value)
return self.BOOLEAN_STATES[value.lower()]
def _validate_value_types(self, *, section="", option="", value=""):
"""Raises a TypeError for non-string values.
The only legal non-string value if we allow valueless
options is None, so we need to check if the value is a
string if:
- we do not allow valueless options, or
- we allow valueless options but the value is not None
For compatibility reasons this method is not used in classic set()
for RawConfigParsers. It is invoked in every case for mapping protocol
access and in ConfigParser.set().
"""
if not isinstance(section, str):
raise TypeError("section names must be strings")
if not isinstance(option, str):
raise TypeError("option keys must be strings")
if not self._allow_no_value or value:
if not isinstance(value, str):
raise TypeError("option values must be strings")
class ConfigParser(RawConfigParser):
"""ConfigParser implementing interpolation."""
_DEFAULT_INTERPOLATION = BasicInterpolation()
def set(self, section, option, value=None):
"""Set an option. Extends RawConfigParser.set by validating type and
interpolation syntax on the value."""
self._validate_value_types(option=option, value=value)
super().set(section, option, value)
def add_section(self, section):
"""Create a new section in the configuration. Extends
RawConfigParser.add_section by validating if the section name is
a string."""
self._validate_value_types(section=section)
super().add_section(section)
class SafeConfigParser(ConfigParser):
"""ConfigParser alias for backwards compatibility purposes."""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
warnings.warn(
"The SafeConfigParser class has been renamed to ConfigParser "
"in Python 3.2. This alias will be removed in future versions."
" Use ConfigParser directly instead.",
DeprecationWarning, stacklevel=2
)
class SectionProxy(MutableMapping):
"""A proxy for a single section from a parser."""
def __init__(self, parser, name):
"""Creates a view on a section of the specified `name` in `parser`."""
self._parser = parser
self._name = name
def __repr__(self):
return '<Section: {}>'.format(self._name)
def __getitem__(self, key):
if not self._parser.has_option(self._name, key):
raise KeyError(key)
return self._parser.get(self._name, key)
def __setitem__(self, key, value):
self._parser._validate_value_types(option=key, value=value)
return self._parser.set(self._name, key, value)
def __delitem__(self, key):
if not (self._parser.has_option(self._name, key) and
self._parser.remove_option(self._name, key)):
raise KeyError(key)
def __contains__(self, key):
return self._parser.has_option(self._name, key)
def __len__(self):
return len(self._options())
def __iter__(self):
return self._options().__iter__()
def _options(self):
if self._name != self._parser.default_section:
return self._parser.options(self._name)
else:
return self._parser.defaults()
def get(self, option, fallback=None, *, raw=False, vars=None):
return self._parser.get(self._name, option, raw=raw, vars=vars,
fallback=fallback)
def getint(self, option, fallback=None, *, raw=False, vars=None):
return self._parser.getint(self._name, option, raw=raw, vars=vars,
fallback=fallback)
def getfloat(self, option, fallback=None, *, raw=False, vars=None):
return self._parser.getfloat(self._name, option, raw=raw, vars=vars,
fallback=fallback)
def getboolean(self, option, fallback=None, *, raw=False, vars=None):
return self._parser.getboolean(self._name, option, raw=raw, vars=vars,
fallback=fallback)
@property
def parser(self):
# The parser object of the proxy is read-only.
return self._parser
@property
def name(self):
# The name of the section on a proxy is read-only.
return self._name
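Putting the pieces together, a brief illustrative session (section and option names are invented) using the mapping protocol that SectionProxy provides:

from configparser import ConfigParser

parser = ConfigParser()
parser.read_string("[server]\nport = 8080\ndebug = yes\n")
section = parser['server']            # a SectionProxy
print(section.getint('port'))         # 8080
print(section.getboolean('debug'))    # True
print('host' in section)              # False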
"""Utilities for with-statement contexts. See PEP 343."""
import sys
from collections import deque
from functools import wraps
__all__ = ["contextmanager", "closing", "ContextDecorator", "ExitStack",
"redirect_stdout", "suppress"]
class ContextDecorator(object):
"A base class or mixin that enables context managers to work as decorators."
def _recreate_cm(self):
"""Return a recreated instance of self.
Allows an otherwise one-shot context manager like
_GeneratorContextManager to support use as
a decorator via implicit recreation.
This is a private interface just for _GeneratorContextManager.
See issue #11647 for details.
"""
return self
def __call__(self, func):
@wraps(func)
def inner(*args, **kwds):
with self._recreate_cm():
return func(*args, **kwds)
return inner
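A small illustrative subclass (the class name is invented) showing the decorator behaviour this mixin enables:

from contextlib import ContextDecorator

class track(ContextDecorator):
    def __enter__(self):
        print("entering")
        return self
    def __exit__(self, *exc):
        print("exiting")
        return False

@track()
def task():
    print("working")

task()    # prints entering, working, exiting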
class _GeneratorContextManager(ContextDecorator):
"""Helper for @contextmanager decorator."""
def __init__(self, func, args, kwds):
self.gen = func(*args, **kwds)
self.func, self.args, self.kwds = func, args, kwds
# Issue 19330: ensure context manager instances have good docstrings
doc = getattr(func, "__doc__", None)
if doc is None:
doc = type(self).__doc__
self.__doc__ = doc
# Unfortunately, this still doesn't provide good help output when
# inspecting the created context manager instances, since pydoc
# currently bypasses the instance docstring and shows the docstring
# for the class instead.
# See http://bugs.python.org/issue19404 for more details.
def _recreate_cm(self):
# _GCM instances are one-shot context managers, so the
# CM must be recreated each time a decorated function is
# called
return self.__class__(self.func, self.args, self.kwds)
def __enter__(self):
try:
return next(self.gen)
except StopIteration:
raise RuntimeError("generator didn't yield") from None
def __exit__(self, type, value, traceback):
if type is None:
try:
next(self.gen)
except StopIteration:
return
else:
raise RuntimeError("generator didn't stop")
else:
if value is None:
# Need to force instantiation so we can reliably
# tell if we get the same exception back
value = type()
try:
self.gen.throw(type, value, traceback)
raise RuntimeError("generator didn't stop after throw()")
except StopIteration as exc:
# Suppress the exception *unless* it's the same exception that
# was passed to throw(). This prevents a StopIteration
# raised inside the "with" statement from being suppressed
return exc is not value
except:
# only re-raise if it's *not* the exception that was
# passed to throw(), because __exit__() must not raise
# an exception unless __exit__() itself failed. But throw()
# has to raise the exception to signal propagation, so this
# fixes the impedance mismatch between the throw() protocol
# and the __exit__() protocol.
#
if sys.exc_info()[1] is not value:
raise
def contextmanager(func):
"""@contextmanager decorator.
Typical usage:
@contextmanager
def some_generator(<arguments>):
<setup>
try:
yield <value>
finally:
<cleanup>
This makes this:
with some_generator(<arguments>) as <variable>:
<body>
equivalent to this:
<setup>
try:
<variable> = <value>
<body>
finally:
<cleanup>
"""
@wraps(func)
def helper(*args, **kwds):
return _GeneratorContextManager(func, args, kwds)
return helper
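A minimal illustrative use of the decorator just defined (the tag() helper is invented for the example):

from contextlib import contextmanager

@contextmanager
def tag(name):
    print("<%s>" % name)
    try:
        yield name
    finally:
        print("</%s>" % name)

with tag("h1") as t:
    print("hello,", t)    # prints <h1>, then hello, h1, then </h1>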
class closing(object):
"""Context to automatically close something at the end of a block.
Code like this:
with closing(<module>.open(<arguments>)) as f:
<block>
is equivalent to this:
f = <module>.open(<arguments>)
try:
<block>
finally:
f.close()
"""
def __init__(self, thing):
self.thing = thing
def __enter__(self):
return self.thing
def __exit__(self, *exc_info):
self.thing.close()
class redirect_stdout:
"""Context manager for temporarily redirecting stdout to another file
# How to send help() to stderr
with redirect_stdout(sys.stderr):
help(dir)
# How to write help() to a file
with open('help.txt', 'w') as f:
with redirect_stdout(f):
help(pow)
"""
def __init__(self, new_target):
self._new_target = new_target
# We use a list of old targets to make this CM re-entrant
self._old_targets = []
def __enter__(self):
self._old_targets.append(sys.stdout)
sys.stdout = self._new_target
return self._new_target
def __exit__(self, exctype, excinst, exctb):
sys.stdout = self._old_targets.pop()
class suppress:
"""Context manager to suppress specified exceptions
After the exception is suppressed, execution proceeds with the next
statement following the with statement.
with suppress(FileNotFoundError):
os.remove(somefile)
# Execution still resumes here if the file was already removed
"""
def __init__(self, *exceptions):
self._exceptions = exceptions
def __enter__(self):
pass
def __exit__(self, exctype, excinst, exctb):
# Unlike isinstance and issubclass, CPython exception handling
# currently only looks at the concrete type hierarchy (ignoring
# the instance and subclass checking hooks). While Guido considers
# that a bug rather than a feature, it's a fairly hard one to fix
# due to various internal implementation details. suppress provides
# the simpler issubclass based semantics, rather than trying to
# exactly reproduce the limitations of the CPython interpreter.
#
# See http://bugs.python.org/issue12029 for more details
return exctype is not None and issubclass(exctype, self._exceptions)
# Inspired by discussions on http://bugs.python.org/issue13585
class ExitStack(object):
"""Context manager for dynamic management of a stack of exit callbacks
For example:
with ExitStack() as stack:
files = [stack.enter_context(open(fname)) for fname in filenames]
# All opened files will automatically be closed at the end of
# the with statement, even if attempts to open files later
# in the list raise an exception
"""
def __init__(self):
self._exit_callbacks = deque()
def pop_all(self):
"""Preserve the context stack by transferring it to a new instance"""
new_stack = type(self)()
new_stack._exit_callbacks = self._exit_callbacks
self._exit_callbacks = deque()
return new_stack
def _push_cm_exit(self, cm, cm_exit):
"""Helper to correctly register callbacks to __exit__ methods"""
def _exit_wrapper(*exc_details):
return cm_exit(cm, *exc_details)
_exit_wrapper.__self__ = cm
self.push(_exit_wrapper)
def push(self, exit):
"""Registers a callback with the standard __exit__ method signature
Can suppress exceptions the same way __exit__ methods can.
Also accepts any object with an __exit__ method (registering a call
to the method instead of the object itself)
"""
# We use an unbound method rather than a bound method to follow
# the standard lookup behaviour for special methods
_cb_type = type(exit)
try:
exit_method = _cb_type.__exit__
except AttributeError:
            # Not a context manager, so assume it's a callable
self._exit_callbacks.append(exit)
else:
self._push_cm_exit(exit, exit_method)
return exit # Allow use as a decorator
def callback(self, callback, *args, **kwds):
"""Registers an arbitrary callback and arguments.
Cannot suppress exceptions.
"""
def _exit_wrapper(exc_type, exc, tb):
callback(*args, **kwds)
# We changed the signature, so using @wraps is not appropriate, but
# setting __wrapped__ may still help with introspection
_exit_wrapper.__wrapped__ = callback
self.push(_exit_wrapper)
return callback # Allow use as a decorator
def enter_context(self, cm):
"""Enters the supplied context manager
If successful, also pushes its __exit__ method as a callback and
returns the result of the __enter__ method.
"""
# We look up the special methods on the type to match the with statement
_cm_type = type(cm)
_exit = _cm_type.__exit__
result = _cm_type.__enter__(cm)
self._push_cm_exit(cm, _exit)
return result
def close(self):
"""Immediately unwind the context stack"""
self.__exit__(None, None, None)
def __enter__(self):
return self
def __exit__(self, *exc_details):
received_exc = exc_details[0] is not None
# We manipulate the exception state so it behaves as though
# we were actually nesting multiple with statements
frame_exc = sys.exc_info()[1]
def _fix_exception_context(new_exc, old_exc):
# Context may not be correct, so find the end of the chain
while 1:
exc_context = new_exc.__context__
if exc_context is old_exc:
# Context is already set correctly (see issue 20317)
return
if exc_context is None or exc_context is frame_exc:
break
new_exc = exc_context
# Change the end of the chain to point to the exception
# we expect it to reference
new_exc.__context__ = old_exc
# Callbacks are invoked in LIFO order to match the behaviour of
# nested context managers
suppressed_exc = False
pending_raise = False
while self._exit_callbacks:
cb = self._exit_callbacks.pop()
try:
if cb(*exc_details):
suppressed_exc = True
pending_raise = False
exc_details = (None, None, None)
except:
new_exc_details = sys.exc_info()
# simulate the stack of exceptions by setting the context
_fix_exception_context(new_exc_details[1], exc_details[1])
pending_raise = True
exc_details = new_exc_details
if pending_raise:
try:
# bare "raise exc_details[1]" replaces our carefully
# set-up context
fixed_ctx = exc_details[1].__context__
raise exc_details[1]
except BaseException:
exc_details[1].__context__ = fixed_ctx
raise
return received_exc and suppressed_exc
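# Editor's sketch (hypothetical helper, not in the original module): using
# enter_context() for dynamic registration and pop_all() to hand cleanup
# duties to the caller.
def _example_exitstack(filenames):
    with ExitStack() as stack:
        files = [stack.enter_context(open(name)) for name in filenames]
        # pop_all() moves the registered __exit__ callbacks to a fresh
        # stack, so the files survive this with block; the caller must
        # later invoke the returned close() to unwind them (LIFO).
        return files, stack.pop_all().close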
"""Generic (shallow and deep) copying operations.
Interface summary:
import copy
x = copy.copy(y) # make a shallow copy of y
x = copy.deepcopy(y) # make a deep copy of y
For module specific errors, copy.Error is raised.
The difference between shallow and deep copying is only relevant for
compound objects (objects that contain other objects, like lists or
class instances).
- A shallow copy constructs a new compound object and then (to the
extent possible) inserts *the same objects* into it that the
original contains.
- A deep copy constructs a new compound object and then, recursively,
inserts *copies* into it of the objects found in the original.
Two problems often exist with deep copy operations that don't exist
with shallow copy operations:
a) recursive objects (compound objects that, directly or indirectly,
contain a reference to themselves) may cause a recursive loop
b) because deep copy copies *everything* it may copy too much, e.g.
administrative data structures that should be shared even between
copies
Python's deep copy operation avoids these problems by:
a) keeping a table of objects already copied during the current
copying pass
b) letting user-defined classes override the copying operation or the
set of components copied
This version does not copy types like module, class, function, method,
nor stack trace, stack frame, nor file, socket, window, nor array, nor
any similar types.
Classes can use the same interfaces to control copying that they use
to control pickling: they can define methods called __getinitargs__(),
__getstate__() and __setstate__(). See the documentation for module
"pickle" for information on these methods.
"""
import types
import weakref
from copyreg import dispatch_table
import builtins
class Error(Exception):
pass
error = Error # backward compatibility
try:
from org.python.core import PyStringMap
except ImportError:
PyStringMap = None
__all__ = ["Error", "copy", "deepcopy"]
def copy(x):
"""Shallow copy operation on arbitrary Python objects.
See the module's __doc__ string for more info.
"""
cls = type(x)
copier = _copy_dispatch.get(cls)
if copier:
return copier(x)
try:
issc = issubclass(cls, type)
except TypeError: # cls is not a class
issc = False
if issc:
# treat it as a regular class:
return _copy_immutable(x)
copier = getattr(cls, "__copy__", None)
if copier:
return copier(x)
reductor = dispatch_table.get(cls)
if reductor:
rv = reductor(x)
else:
reductor = getattr(x, "__reduce_ex__", None)
if reductor:
rv = reductor(2)
else:
reductor = getattr(x, "__reduce__", None)
if reductor:
rv = reductor()
else:
raise Error("un(shallow)copyable object of type %s" % cls)
return _reconstruct(x, rv, 0)
_copy_dispatch = d = {}
def _copy_immutable(x):
return x
for t in (type(None), int, float, bool, str, tuple,
bytes, frozenset, type, range,
types.BuiltinFunctionType, type(Ellipsis),
types.FunctionType, weakref.ref):
d[t] = _copy_immutable
t = getattr(types, "CodeType", None)
if t is not None:
d[t] = _copy_immutable
for name in ("complex", "unicode"):
t = getattr(builtins, name, None)
if t is not None:
d[t] = _copy_immutable
def _copy_with_constructor(x):
return type(x)(x)
for t in (list, dict, set):
d[t] = _copy_with_constructor
def _copy_with_copy_method(x):
return x.copy()
if PyStringMap is not None:
d[PyStringMap] = _copy_with_copy_method
del d
def deepcopy(x, memo=None, _nil=[]):
"""Deep copy operation on arbitrary Python objects.
See the module's __doc__ string for more info.
"""
if memo is None:
memo = {}
d = id(x)
y = memo.get(d, _nil)
if y is not _nil:
return y
cls = type(x)
copier = _deepcopy_dispatch.get(cls)
if copier:
y = copier(x, memo)
else:
try:
issc = issubclass(cls, type)
except TypeError: # cls is not a class (old Boost; see SF #502085)
issc = 0
if issc:
y = _deepcopy_atomic(x, memo)
else:
copier = getattr(x, "__deepcopy__", None)
if copier:
y = copier(memo)
else:
reductor = dispatch_table.get(cls)
if reductor:
rv = reductor(x)
else:
reductor = getattr(x, "__reduce_ex__", None)
if reductor:
rv = reductor(2)
else:
reductor = getattr(x, "__reduce__", None)
if reductor:
rv = reductor()
else:
raise Error(
"un(deep)copyable object of type %s" % cls)
y = _reconstruct(x, rv, 1, memo)
    # If x is its own copy, don't memoize.
if y is not x:
memo[d] = y
_keep_alive(x, memo) # Make sure x lives at least as long as d
return y
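# Editor's sketch (illustrative only): the shallow/deep distinction from the
# module docstring, demonstrated on a nested list.
def _example_copy_vs_deepcopy():
    original = [[1, 2], [3, 4]]
    shallow = copy(original)
    deep = deepcopy(original)
    original[0].append(99)
    assert shallow[0] == [1, 2, 99]  # shallow copy shares the inner lists
    assert deep[0] == [1, 2]         # deep copy duplicated them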
_deepcopy_dispatch = d = {}
def _deepcopy_atomic(x, memo):
return x
d[type(None)] = _deepcopy_atomic
d[type(Ellipsis)] = _deepcopy_atomic
d[int] = _deepcopy_atomic
d[float] = _deepcopy_atomic
d[bool] = _deepcopy_atomic
try:
d[complex] = _deepcopy_atomic
except NameError:
pass
d[bytes] = _deepcopy_atomic
d[str] = _deepcopy_atomic
try:
d[types.CodeType] = _deepcopy_atomic
except AttributeError:
pass
d[type] = _deepcopy_atomic
d[range] = _deepcopy_atomic
d[types.BuiltinFunctionType] = _deepcopy_atomic
d[types.FunctionType] = _deepcopy_atomic
d[weakref.ref] = _deepcopy_atomic
def _deepcopy_list(x, memo):
y = []
memo[id(x)] = y
for a in x:
y.append(deepcopy(a, memo))
return y
d[list] = _deepcopy_list
def _deepcopy_tuple(x, memo):
y = []
for a in x:
y.append(deepcopy(a, memo))
# We're not going to put the tuple in the memo, but it's still important we
# check for it, in case the tuple contains recursive mutable structures.
try:
return memo[id(x)]
except KeyError:
pass
for i in range(len(x)):
if x[i] is not y[i]:
y = tuple(y)
break
else:
y = x
return y
d[tuple] = _deepcopy_tuple
def _deepcopy_dict(x, memo):
y = {}
memo[id(x)] = y
for key, value in x.items():
y[deepcopy(key, memo)] = deepcopy(value, memo)
return y
d[dict] = _deepcopy_dict
if PyStringMap is not None:
d[PyStringMap] = _deepcopy_dict
def _deepcopy_method(x, memo): # Copy instance methods
return type(x)(x.__func__, deepcopy(x.__self__, memo))
_deepcopy_dispatch[types.MethodType] = _deepcopy_method
def _keep_alive(x, memo):
"""Keeps a reference to the object x in the memo.
Because we remember objects by their id, we have
to assure that possibly temporary objects are kept
alive by referencing them.
We store a reference at the id of the memo, which should
normally not be used unless someone tries to deepcopy
the memo itself...
"""
try:
memo[id(memo)].append(x)
except KeyError:
# aha, this is the first one :-)
memo[id(memo)]=[x]
def _reconstruct(x, info, deep, memo=None):
if isinstance(info, str):
return x
assert isinstance(info, tuple)
if memo is None:
memo = {}
n = len(info)
assert n in (2, 3, 4, 5)
callable, args = info[:2]
if n > 2:
state = info[2]
else:
state = None
if n > 3:
listiter = info[3]
else:
listiter = None
if n > 4:
dictiter = info[4]
else:
dictiter = None
if deep:
args = deepcopy(args, memo)
y = callable(*args)
memo[id(x)] = y
if state is not None:
if deep:
state = deepcopy(state, memo)
if hasattr(y, '__setstate__'):
y.__setstate__(state)
else:
if isinstance(state, tuple) and len(state) == 2:
state, slotstate = state
else:
slotstate = None
if state is not None:
y.__dict__.update(state)
if slotstate is not None:
for key, value in slotstate.items():
setattr(y, key, value)
if listiter is not None:
for item in listiter:
if deep:
item = deepcopy(item, memo)
y.append(item)
if dictiter is not None:
for key, value in dictiter:
if deep:
key = deepcopy(key, memo)
value = deepcopy(value, memo)
y[key] = value
return y
del d
del types
# Helper for instance creation without calling __init__
class _EmptyClass:
pass
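# Editor's sketch (hypothetical class, not part of the module): as noted in
# the module docstring, a class can steer copying via __deepcopy__; here the
# payload is deep-copied while a cache object is deliberately shared.
class _ExampleSharedCache:
    def __init__(self, data, cache):
        self.data = data
        self.cache = cache
    def __deepcopy__(self, memo):
        return _ExampleSharedCache(deepcopy(self.data, memo), self.cache)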
"""Helper to provide extensibility for pickle.
This is only useful to add pickle support for extension types defined in
C, not for instances of user-defined classes.
"""
__all__ = ["pickle", "constructor",
"add_extension", "remove_extension", "clear_extension_cache"]
dispatch_table = {}
def pickle(ob_type, pickle_function, constructor_ob=None):
if not callable(pickle_function):
raise TypeError("reduction functions must be callable")
dispatch_table[ob_type] = pickle_function
    # The constructor_ob argument is a vestige of the old
    # safe-for-unpickling protocol; there is no reason for the
    # caller to pass it anymore.
if constructor_ob is not None:
constructor(constructor_ob)
def constructor(object):
if not callable(object):
raise TypeError("constructors must be callable")
# Example: provide pickling support for complex numbers.
try:
complex
except NameError:
pass
else:
def pickle_complex(c):
return complex, (c.real, c.imag)
pickle(complex, pickle_complex, complex)
# Support for pickling new-style objects
def _reconstructor(cls, base, state):
if base is object:
obj = object.__new__(cls)
else:
obj = base.__new__(cls, state)
if base.__init__ != object.__init__:
base.__init__(obj, state)
return obj
_HEAPTYPE = 1<<9
# Python code for object.__reduce_ex__ for protocols 0 and 1
def _reduce_ex(self, proto):
assert proto < 2
for base in self.__class__.__mro__:
if hasattr(base, '__flags__') and not base.__flags__ & _HEAPTYPE:
break
else:
base = object # not really reachable
if base is object:
state = None
else:
if base is self.__class__:
raise TypeError("can't pickle %s objects" % base.__name__)
state = base(self)
args = (self.__class__, base, state)
try:
getstate = self.__getstate__
except AttributeError:
if getattr(self, "__slots__", None):
raise TypeError("a class that defines __slots__ without "
"defining __getstate__ cannot be pickled")
try:
dict = self.__dict__
except AttributeError:
dict = None
else:
dict = getstate()
if dict:
return _reconstructor, args, dict
else:
return _reconstructor, args
# Helper for __reduce_ex__ protocol 2
def __newobj__(cls, *args):
return cls.__new__(cls, *args)
def __newobj_ex__(cls, args, kwargs):
"""Used by pickle protocol 4, instead of __newobj__ to allow classes with
keyword-only arguments to be pickled correctly.
"""
return cls.__new__(cls, *args, **kwargs)
def _slotnames(cls):
"""Return a list of slot names for a given class.
This needs to find slots defined by the class and its bases, so we
can't simply return the __slots__ attribute. We must walk down
the Method Resolution Order and concatenate the __slots__ of each
class found there. (This assumes classes don't modify their
__slots__ attribute to misrepresent their slots after the class is
defined.)
"""
# Get the value from a cache in the class if possible
names = cls.__dict__.get("__slotnames__")
if names is not None:
return names
# Not cached -- calculate the value
names = []
if not hasattr(cls, "__slots__"):
# This class has no slots
pass
else:
# Slots found -- gather slot names from all base classes
for c in cls.__mro__:
if "__slots__" in c.__dict__:
slots = c.__dict__['__slots__']
# if class has a single slot, it can be given as a string
if isinstance(slots, str):
slots = (slots,)
for name in slots:
# special descriptors
if name in ("__dict__", "__weakref__"):
continue
# mangled names
elif name.startswith('__') and not name.endswith('__'):
names.append('_%s%s' % (c.__name__, name))
else:
names.append(name)
# Cache the outcome in the class if at all possible
try:
cls.__slotnames__ = names
except:
pass # But don't die if we can't
return names
# A registry of extension codes. This is an ad-hoc compression
# mechanism. Whenever a global reference to <module>, <name> is about
# to be pickled, the (<module>, <name>) tuple is looked up here to see
# if it is a registered extension code for it. Extension codes are
# universal, so that the meaning of a pickle does not depend on
# context. (There are also some codes reserved for local use that
# don't have this restriction.) Codes are positive ints; 0 is
# reserved.
_extension_registry = {} # key -> code
_inverted_registry = {} # code -> key
_extension_cache = {} # code -> object
# Don't ever rebind those names: pickling grabs a reference to them when
# it's initialized, and won't see a rebinding.
def add_extension(module, name, code):
"""Register an extension code."""
code = int(code)
if not 1 <= code <= 0x7fffffff:
raise ValueError("code out of range")
key = (module, name)
if (_extension_registry.get(key) == code and
_inverted_registry.get(code) == key):
return # Redundant registrations are benign
if key in _extension_registry:
raise ValueError("key %s is already registered with code %s" %
(key, _extension_registry[key]))
if code in _inverted_registry:
raise ValueError("code %s is already in use for key %s" %
(code, _inverted_registry[code]))
_extension_registry[key] = code
_inverted_registry[code] = key
def remove_extension(module, name, code):
"""Unregister an extension code. For testing only."""
key = (module, name)
if (_extension_registry.get(key) != code or
_inverted_registry.get(code) != key):
raise ValueError("key %s is not registered with code %s" %
(key, code))
del _extension_registry[key]
del _inverted_registry[code]
if code in _extension_cache:
del _extension_cache[code]
def clear_extension_cache():
_extension_cache.clear()
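# Editor's sketch (hypothetical registration): codes 240-255 are reserved for
# private use per the table below, so a local experiment might look like this.
def _example_extension_codes():
    add_extension('collections', 'OrderedDict', 240)
    assert _extension_registry[('collections', 'OrderedDict')] == 240
    remove_extension('collections', 'OrderedDict', 240)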
# Standard extension code assignments
# Reserved ranges
# First Last Count Purpose
# 1 127 127 Reserved for Python standard library
# 128 191 64 Reserved for Zope
# 192 239 48 Reserved for 3rd parties
# 240 255 16 Reserved for private use (will never be assigned)
# 256 Inf Inf Reserved for future assignment
# Extension codes are assigned by the Python Software Foundation.
#! /usr/bin/env python3
"""Python interface for the 'lsprof' profiler.
Compatible with the 'profile' module.
"""
__all__ = ["run", "runctx", "Profile"]
import _lsprof
import profile as _pyprofile
# ____________________________________________________________
# Simple interface
def run(statement, filename=None, sort=-1):
return _pyprofile._Utils(Profile).run(statement, filename, sort)
def runctx(statement, globals, locals, filename=None, sort=-1):
return _pyprofile._Utils(Profile).runctx(statement, globals, locals,
filename, sort)
run.__doc__ = _pyprofile.run.__doc__
runctx.__doc__ = _pyprofile.runctx.__doc__
# ____________________________________________________________
class Profile(_lsprof.Profiler):
"""Profile(custom_timer=None, time_unit=None, subcalls=True, builtins=True)
Builds a profiler object using the specified timer function.
The default timer is a fast built-in one based on real time.
For custom timer functions returning integers, time_unit can
be a float specifying a scale (i.e. how long each integer unit
is, in seconds).
"""
# Most of the functionality is in the base class.
# This subclass only adds convenient and backward-compatible methods.
def print_stats(self, sort=-1):
import pstats
pstats.Stats(self).strip_dirs().sort_stats(sort).print_stats()
def dump_stats(self, file):
import marshal
with open(file, 'wb') as f:
self.create_stats()
marshal.dump(self.stats, f)
def create_stats(self):
self.disable()
self.snapshot_stats()
def snapshot_stats(self):
entries = self.getstats()
self.stats = {}
callersdicts = {}
# call information
for entry in entries:
func = label(entry.code)
nc = entry.callcount # ncalls column of pstats (before '/')
cc = nc - entry.reccallcount # ncalls column of pstats (after '/')
tt = entry.inlinetime # tottime column of pstats
ct = entry.totaltime # cumtime column of pstats
callers = {}
callersdicts[id(entry.code)] = callers
self.stats[func] = cc, nc, tt, ct, callers
# subcall information
for entry in entries:
if entry.calls:
func = label(entry.code)
for subentry in entry.calls:
try:
callers = callersdicts[id(subentry.code)]
except KeyError:
continue
nc = subentry.callcount
cc = nc - subentry.reccallcount
tt = subentry.inlinetime
ct = subentry.totaltime
if func in callers:
prev = callers[func]
nc += prev[0]
cc += prev[1]
tt += prev[2]
ct += prev[3]
callers[func] = nc, cc, tt, ct
# The following two methods can be called by clients to use
# a profiler to profile a statement, given as a string.
def run(self, cmd):
import __main__
dict = __main__.__dict__
return self.runctx(cmd, dict, dict)
def runctx(self, cmd, globals, locals):
self.enable()
try:
exec(cmd, globals, locals)
finally:
self.disable()
return self
# This method is more useful to profile a single function call.
def runcall(self, func, *args, **kw):
self.enable()
try:
return func(*args, **kw)
finally:
self.disable()
# ____________________________________________________________
def label(code):
if isinstance(code, str):
return ('~', 0, code) # built-in functions ('~' sorts at the end)
else:
return (code.co_filename, code.co_firstlineno, code.co_name)
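# Editor's sketch (illustrative, with a made-up workload; requires _lsprof):
# profiling one call via runcall() and printing cumulative-time stats.
def _example_profile_runcall():
    def workload():
        return sum(i * i for i in range(100000))
    prof = Profile()
    result = prof.runcall(workload)
    prof.print_stats(sort='cumulative')
    return result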
# ____________________________________________________________
def main():
import os, sys
from optparse import OptionParser
usage = "cProfile.py [-o output_file_path] [-s sort] scriptfile [arg] ..."
parser = OptionParser(usage=usage)
parser.allow_interspersed_args = False
parser.add_option('-o', '--outfile', dest="outfile",
help="Save stats to <outfile>", default=None)
parser.add_option('-s', '--sort', dest="sort",
help="Sort order when printing to stdout, based on pstats.Stats class",
default=-1)
if not sys.argv[1:]:
parser.print_usage()
sys.exit(2)
(options, args) = parser.parse_args()
sys.argv[:] = args
if len(args) > 0:
progname = args[0]
sys.path.insert(0, os.path.dirname(progname))
with open(progname, 'rb') as fp:
code = compile(fp.read(), progname, 'exec')
globs = {
'__file__': progname,
'__name__': '__main__',
'__package__': None,
'__cached__': None,
}
runctx(code, globs, None, options.outfile, options.sort)
else:
parser.print_usage()
return parser
# When invoked as main program, invoke the profiler on a script
if __name__ == '__main__':
main()
"""Wrapper to the POSIX crypt library call and associated functionality."""
import _crypt
import string as _string
from random import SystemRandom as _SystemRandom
from collections import namedtuple as _namedtuple
_saltchars = _string.ascii_letters + _string.digits + './'
_sr = _SystemRandom()
class _Method(_namedtuple('_Method', 'name ident salt_chars total_size')):
"""Class representing a salt method per the Modular Crypt Format or the
legacy 2-character crypt method."""
def __repr__(self):
return '<crypt.METHOD_{}>'.format(self.name)
def mksalt(method=None):
"""Generate a salt for the specified method.
If not specified, the strongest available method will be used.
"""
if method is None:
method = methods[0]
s = '${}$'.format(method.ident) if method.ident else ''
s += ''.join(_sr.choice(_saltchars) for char in range(method.salt_chars))
return s
def crypt(word, salt=None):
"""Return a string representing the one-way hash of a password, with a salt
prepended.
If ``salt`` is not specified or is ``None``, the strongest
available method will be selected and a salt generated. Otherwise,
``salt`` may be one of the ``crypt.METHOD_*`` values, or a string as
returned by ``crypt.mksalt()``.
"""
if salt is None or isinstance(salt, _Method):
salt = mksalt(salt)
return _crypt.crypt(word, salt)
# available salting/crypto methods
METHOD_CRYPT = _Method('CRYPT', None, 2, 13)
METHOD_MD5 = _Method('MD5', '1', 8, 34)
METHOD_SHA256 = _Method('SHA256', '5', 16, 63)
METHOD_SHA512 = _Method('SHA512', '6', 16, 106)
methods = []
for _method in (METHOD_SHA512, METHOD_SHA256, METHOD_MD5):
_result = crypt('', _method)
if _result and len(_result) == _method.total_size:
methods.append(_method)
methods.append(METHOD_CRYPT)
del _result, _method
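# Editor's sketch (hypothetical password; requires the POSIX _crypt module):
# hash with an auto-generated salt, then verify by re-hashing with the
# stored hash as the salt.
def _example_crypt(word='hunter2'):
    hashed = crypt(word)                  # strongest method, salt via mksalt()
    assert crypt(word, hashed) == hashed  # matching hash means the word matches
    return hashed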
"""
csv.py - read/write/investigate CSV files
"""
import re
from _csv import Error, __version__, writer, reader, register_dialect, \
unregister_dialect, get_dialect, list_dialects, \
field_size_limit, \
QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC, QUOTE_NONE, \
__doc__
from _csv import Dialect as _Dialect
from io import StringIO
__all__ = [ "QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE",
"Error", "Dialect", "__doc__", "excel", "excel_tab",
"field_size_limit", "reader", "writer",
"register_dialect", "get_dialect", "list_dialects", "Sniffer",
"unregister_dialect", "__version__", "DictReader", "DictWriter" ]
class Dialect:
"""Describe a CSV dialect.
This must be subclassed (see csv.excel). Valid attributes are:
delimiter, quotechar, escapechar, doublequote, skipinitialspace,
lineterminator, quoting.
"""
_name = ""
_valid = False
# placeholders
delimiter = None
quotechar = None
escapechar = None
doublequote = None
skipinitialspace = None
lineterminator = None
quoting = None
def __init__(self):
if self.__class__ != Dialect:
self._valid = True
self._validate()
def _validate(self):
try:
_Dialect(self)
except TypeError as e:
# We do this for compatibility with py2.3
raise Error(str(e))
class excel(Dialect):
"""Describe the usual properties of Excel-generated CSV files."""
delimiter = ','
quotechar = '"'
doublequote = True
skipinitialspace = False
lineterminator = '\r\n'
quoting = QUOTE_MINIMAL
register_dialect("excel", excel)
class excel_tab(excel):
"""Describe the usual properties of Excel-generated TAB-delimited files."""
delimiter = '\t'
register_dialect("excel-tab", excel_tab)
class unix_dialect(Dialect):
"""Describe the usual properties of Unix-generated CSV files."""
delimiter = ','
quotechar = '"'
doublequote = True
skipinitialspace = False
lineterminator = '\n'
quoting = QUOTE_ALL
register_dialect("unix", unix_dialect)
class DictReader:
def __init__(self, f, fieldnames=None, restkey=None, restval=None,
dialect="excel", *args, **kwds):
self._fieldnames = fieldnames # list of keys for the dict
self.restkey = restkey # key to catch long rows
self.restval = restval # default value for short rows
self.reader = reader(f, dialect, *args, **kwds)
self.dialect = dialect
self.line_num = 0
def __iter__(self):
return self
@property
def fieldnames(self):
if self._fieldnames is None:
try:
self._fieldnames = next(self.reader)
except StopIteration:
pass
self.line_num = self.reader.line_num
return self._fieldnames
@fieldnames.setter
def fieldnames(self, value):
self._fieldnames = value
def __next__(self):
if self.line_num == 0:
# Used only for its side effect.
self.fieldnames
row = next(self.reader)
self.line_num = self.reader.line_num
# unlike the basic reader, we prefer not to return blanks,
# because we will typically wind up with a dict full of None
# values
while row == []:
row = next(self.reader)
d = dict(zip(self.fieldnames, row))
lf = len(self.fieldnames)
lr = len(row)
if lf < lr:
d[self.restkey] = row[lf:]
elif lf > lr:
for key in self.fieldnames[lr:]:
d[key] = self.restval
return d
class DictWriter:
def __init__(self, f, fieldnames, restval="", extrasaction="raise",
dialect="excel", *args, **kwds):
self.fieldnames = fieldnames # list of keys for the dict
self.restval = restval # for writing short dicts
if extrasaction.lower() not in ("raise", "ignore"):
raise ValueError("extrasaction (%s) must be 'raise' or 'ignore'"
% extrasaction)
self.extrasaction = extrasaction
self.writer = writer(f, dialect, *args, **kwds)
def writeheader(self):
header = dict(zip(self.fieldnames, self.fieldnames))
self.writerow(header)
def _dict_to_list(self, rowdict):
if self.extrasaction == "raise":
wrong_fields = [k for k in rowdict if k not in self.fieldnames]
if wrong_fields:
raise ValueError("dict contains fields not in fieldnames: "
+ ", ".join([repr(x) for x in wrong_fields]))
return [rowdict.get(key, self.restval) for key in self.fieldnames]
def writerow(self, rowdict):
return self.writer.writerow(self._dict_to_list(rowdict))
def writerows(self, rowdicts):
rows = []
for rowdict in rowdicts:
rows.append(self._dict_to_list(rowdict))
return self.writer.writerows(rows)
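# Editor's sketch (illustrative round-trip, not part of the module): DictWriter
# emits mappings as rows; DictReader maps them back using the header row.
def _example_dict_round_trip():
    buf = StringIO()
    w = DictWriter(buf, fieldnames=['name', 'dept'])
    w.writeheader()
    w.writerow({'name': 'Ada', 'dept': 'Eng'})
    buf.seek(0)
    return [row for row in DictReader(buf)]  # [{'name': 'Ada', 'dept': 'Eng'}]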
# Guard Sniffer's type checking against builds that exclude complex()
try:
complex
except NameError:
complex = float
class Sniffer:
'''
"Sniffs" the format of a CSV file (i.e. delimiter, quotechar)
Returns a Dialect object.
'''
def __init__(self):
# in case there is more than one possible delimiter
self.preferred = [',', '\t', ';', ' ', ':']
def sniff(self, sample, delimiters=None):
"""
Returns a dialect (or None) corresponding to the sample
"""
quotechar, doublequote, delimiter, skipinitialspace = \
self._guess_quote_and_delimiter(sample, delimiters)
if not delimiter:
delimiter, skipinitialspace = self._guess_delimiter(sample,
delimiters)
if not delimiter:
raise Error("Could not determine delimiter")
class dialect(Dialect):
_name = "sniffed"
lineterminator = '\r\n'
quoting = QUOTE_MINIMAL
# escapechar = ''
dialect.doublequote = doublequote
dialect.delimiter = delimiter
# _csv.reader won't accept a quotechar of ''
dialect.quotechar = quotechar or '"'
dialect.skipinitialspace = skipinitialspace
return dialect
def _guess_quote_and_delimiter(self, data, delimiters):
"""
Looks for text enclosed between two identical quotes
(the probable quotechar) which are preceded and followed
by the same character (the probable delimiter).
For example:
,'some text',
The quote with the most wins, same with the delimiter.
If there is no quotechar the delimiter can't be determined
this way.
"""
matches = []
for restr in ('(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?P=delim)', # ,".*?",
'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?P<delim>[^\w\n"\'])(?P<space> ?)', # ".*?",
                      '(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)', # ,".*?"
'(?:^|\n)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'): # ".*?" (no delim, no space)
regexp = re.compile(restr, re.DOTALL | re.MULTILINE)
matches = regexp.findall(data)
if matches:
break
if not matches:
# (quotechar, doublequote, delimiter, skipinitialspace)
return ('', False, None, 0)
quotes = {}
delims = {}
spaces = 0
for m in matches:
n = regexp.groupindex['quote'] - 1
key = m[n]
if key:
quotes[key] = quotes.get(key, 0) + 1
try:
n = regexp.groupindex['delim'] - 1
key = m[n]
except KeyError:
continue
if key and (delimiters is None or key in delimiters):
delims[key] = delims.get(key, 0) + 1
try:
n = regexp.groupindex['space'] - 1
except KeyError:
continue
if m[n]:
spaces += 1
quotechar = max(quotes, key=quotes.get)
if delims:
delim = max(delims, key=delims.get)
skipinitialspace = delims[delim] == spaces
if delim == '\n': # most likely a file with a single column
delim = ''
else:
# there is *no* delimiter, it's a single column of quoted data
delim = ''
skipinitialspace = 0
# if we see an extra quote between delimiters, we've got a
# double quoted format
dq_regexp = re.compile(
r"((%(delim)s)|^)\W*%(quote)s[^%(delim)s\n]*%(quote)s[^%(delim)s\n]*%(quote)s\W*((%(delim)s)|$)" % \
{'delim':re.escape(delim), 'quote':quotechar}, re.MULTILINE)
if dq_regexp.search(data):
doublequote = True
else:
doublequote = False
return (quotechar, doublequote, delim, skipinitialspace)
def _guess_delimiter(self, data, delimiters):
"""
The delimiter /should/ occur the same number of times on
each row. However, due to malformed data, it may not. We don't want
        an all-or-nothing approach, so we allow for small variations in this
number.
1) build a table of the frequency of each character on every line.
2) build a table of frequencies of this frequency (meta-frequency?),
e.g. 'x occurred 5 times in 10 rows, 6 times in 1000 rows,
7 times in 2 rows'
3) use the mode of the meta-frequency to determine the /expected/
frequency for that character
4) find out how often the character actually meets that goal
5) the character that best meets its goal is the delimiter
For performance reasons, the data is evaluated in chunks, so it can
try and evaluate the smallest portion of the data possible, evaluating
additional chunks as necessary.
"""
data = list(filter(None, data.split('\n')))
ascii = [chr(c) for c in range(127)] # 7-bit ASCII
# build frequency tables
chunkLength = min(10, len(data))
iteration = 0
charFrequency = {}
modes = {}
delims = {}
start, end = 0, min(chunkLength, len(data))
while start < len(data):
iteration += 1
for line in data[start:end]:
for char in ascii:
metaFrequency = charFrequency.get(char, {})
# must count even if frequency is 0
freq = line.count(char)
# value is the mode
metaFrequency[freq] = metaFrequency.get(freq, 0) + 1
charFrequency[char] = metaFrequency
for char in charFrequency.keys():
items = list(charFrequency[char].items())
if len(items) == 1 and items[0][0] == 0:
continue
# get the mode of the frequencies
if len(items) > 1:
modes[char] = max(items, key=lambda x: x[1])
# adjust the mode - subtract the sum of all
# other frequencies
items.remove(modes[char])
modes[char] = (modes[char][0], modes[char][1]
- sum(item[1] for item in items))
else:
modes[char] = items[0]
# build a list of possible delimiters
modeList = modes.items()
total = float(chunkLength * iteration)
# (rows of consistent data) / (number of rows) = 100%
consistency = 1.0
# minimum consistency threshold
threshold = 0.9
while len(delims) == 0 and consistency >= threshold:
for k, v in modeList:
if v[0] > 0 and v[1] > 0:
if ((v[1]/total) >= consistency and
(delimiters is None or k in delimiters)):
delims[k] = v
consistency -= 0.01
if len(delims) == 1:
delim = list(delims.keys())[0]
skipinitialspace = (data[0].count(delim) ==
data[0].count("%c " % delim))
return (delim, skipinitialspace)
# analyze another chunkLength lines
start = end
end += chunkLength
if not delims:
return ('', 0)
# if there's more than one, fall back to a 'preferred' list
if len(delims) > 1:
for d in self.preferred:
if d in delims.keys():
skipinitialspace = (data[0].count(d) ==
data[0].count("%c " % d))
return (d, skipinitialspace)
# nothing else indicates a preference, pick the character that
# dominates(?)
items = [(v,k) for (k,v) in delims.items()]
items.sort()
delim = items[-1][1]
skipinitialspace = (data[0].count(delim) ==
data[0].count("%c " % delim))
return (delim, skipinitialspace)
def has_header(self, sample):
# Creates a dictionary of types of data in each column. If any
# column is of a single type (say, integers), *except* for the first
# row, then the first row is presumed to be labels. If the type
# can't be determined, it is assumed to be a string in which case
# the length of the string is the determining factor: if all of the
# rows except for the first are the same length, it's a header.
# Finally, a 'vote' is taken at the end for each column, adding or
# subtracting from the likelihood of the first row being a header.
rdr = reader(StringIO(sample), self.sniff(sample))
header = next(rdr) # assume first row is header
columns = len(header)
columnTypes = {}
for i in range(columns): columnTypes[i] = None
checked = 0
for row in rdr:
# arbitrary number of rows to check, to keep it sane
if checked > 20:
break
checked += 1
if len(row) != columns:
continue # skip rows that have irregular number of columns
for col in list(columnTypes.keys()):
for thisType in [int, float, complex]:
try:
thisType(row[col])
break
except (ValueError, OverflowError):
pass
else:
# fallback to length of string
thisType = len(row[col])
if thisType != columnTypes[col]:
if columnTypes[col] is None: # add new column type
columnTypes[col] = thisType
else:
# type is inconsistent, remove column from
# consideration
del columnTypes[col]
# finally, compare results against first row and "vote"
# on whether it's a header
hasHeader = 0
for col, colType in columnTypes.items():
if type(colType) == type(0): # it's a length
if len(header[col]) != colType:
hasHeader += 1
else:
hasHeader -= 1
else: # attempt typecast
try:
colType(header[col])
except (ValueError, TypeError):
hasHeader += 1
else:
hasHeader -= 1
return hasHeader > 0
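# Editor's sketch (made-up sample data): letting Sniffer deduce the dialect
# and vote on a header row before parsing.
def _example_sniff():
    sample = 'name;dept\nAda;Eng\nGrace;Nav\n'
    sniffer = Sniffer()
    dialect = sniffer.sniff(sample, delimiters=';,')
    assert dialect.delimiter == ';'
    assert sniffer.has_header(sample)
    return list(reader(StringIO(sample), dialect))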
"""Concrete date/time and related types.
See http://www.iana.org/time-zones/repository/tz-link.html for
time zone and DST data sources.
"""
import time as _time
import math as _math
def _cmp(x, y):
return 0 if x == y else 1 if x > y else -1
MINYEAR = 1
MAXYEAR = 9999
_MAXORDINAL = 3652059 # date.max.toordinal()
# Utility functions, adapted from Python's Demo/classes/Dates.py, which
# also assumes the current Gregorian calendar indefinitely extended in
# both directions. Difference: Dates.py calls January 1 of year 0 day
# number 1. The code here calls January 1 of year 1 day number 1. This is
# to match the definition of the "proleptic Gregorian" calendar in Dershowitz
# and Reingold's "Calendrical Calculations", where it's the base calendar
# for all computations. See the book for algorithms for converting between
# proleptic Gregorian ordinals and many other calendar systems.
# -1 is a placeholder for indexing purposes.
_DAYS_IN_MONTH = [-1, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
_DAYS_BEFORE_MONTH = [-1] # -1 is a placeholder for indexing purposes.
dbm = 0
for dim in _DAYS_IN_MONTH[1:]:
_DAYS_BEFORE_MONTH.append(dbm)
dbm += dim
del dbm, dim
def _is_leap(year):
"year -> 1 if leap year, else 0."
return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
def _days_before_year(year):
"year -> number of days before January 1st of year."
y = year - 1
return y*365 + y//4 - y//100 + y//400
def _days_in_month(year, month):
"year, month -> number of days in that month in that year."
assert 1 <= month <= 12, month
if month == 2 and _is_leap(year):
return 29
return _DAYS_IN_MONTH[month]
def _days_before_month(year, month):
"year, month -> number of days in year preceding first day of month."
assert 1 <= month <= 12, 'month must be in 1..12'
return _DAYS_BEFORE_MONTH[month] + (month > 2 and _is_leap(year))
def _ymd2ord(year, month, day):
"year, month, day -> ordinal, considering 01-Jan-0001 as day 1."
assert 1 <= month <= 12, 'month must be in 1..12'
dim = _days_in_month(year, month)
assert 1 <= day <= dim, ('day must be in 1..%d' % dim)
return (_days_before_year(year) +
_days_before_month(year, month) +
day)
_DI400Y = _days_before_year(401) # number of days in 400 years
_DI100Y = _days_before_year(101) # " " " " 100 "
_DI4Y = _days_before_year(5) # " " " " 4 "
# A 4-year cycle has an extra leap day over what we'd get from pasting
# together 4 single years.
assert _DI4Y == 4 * 365 + 1
# Similarly, a 400-year cycle has an extra leap day over what we'd get from
# pasting together 4 100-year cycles.
assert _DI400Y == 4 * _DI100Y + 1
# OTOH, a 100-year cycle has one fewer leap day than we'd get from
# pasting together 25 4-year cycles.
assert _DI100Y == 25 * _DI4Y - 1
def _ord2ymd(n):
"ordinal -> (year, month, day), considering 01-Jan-0001 as day 1."
# n is a 1-based index, starting at 1-Jan-1. The pattern of leap years
# repeats exactly every 400 years. The basic strategy is to find the
# closest 400-year boundary at or before n, then work with the offset
# from that boundary to n. Life is much clearer if we subtract 1 from
# n first -- then the values of n at 400-year boundaries are exactly
# those divisible by _DI400Y:
#
# D M Y n n-1
# -- --- ---- ---------- ----------------
# 31 Dec -400 -_DI400Y -_DI400Y -1
# 1 Jan -399 -_DI400Y +1 -_DI400Y 400-year boundary
# ...
# 30 Dec 000 -1 -2
# 31 Dec 000 0 -1
# 1 Jan 001 1 0 400-year boundary
# 2 Jan 001 2 1
# 3 Jan 001 3 2
# ...
# 31 Dec 400 _DI400Y _DI400Y -1
# 1 Jan 401 _DI400Y +1 _DI400Y 400-year boundary
n -= 1
n400, n = divmod(n, _DI400Y)
year = n400 * 400 + 1 # ..., -399, 1, 401, ...
# Now n is the (non-negative) offset, in days, from January 1 of year, to
# the desired date. Now compute how many 100-year cycles precede n.
# Note that it's possible for n100 to equal 4! In that case 4 full
# 100-year cycles precede the desired day, which implies the desired
# day is December 31 at the end of a 400-year cycle.
n100, n = divmod(n, _DI100Y)
# Now compute how many 4-year cycles precede it.
n4, n = divmod(n, _DI4Y)
# And now how many single years. Again n1 can be 4, and again meaning
# that the desired day is December 31 at the end of the 4-year cycle.
n1, n = divmod(n, 365)
year += n100 * 100 + n4 * 4 + n1
if n1 == 4 or n100 == 4:
assert n == 0
return year-1, 12, 31
# Now the year is correct, and n is the offset from January 1. We find
# the month via an estimate that's either exact or one too large.
leapyear = n1 == 3 and (n4 != 24 or n100 == 3)
assert leapyear == _is_leap(year)
month = (n + 50) >> 5
preceding = _DAYS_BEFORE_MONTH[month] + (month > 2 and leapyear)
if preceding > n: # estimate is too large
month -= 1
preceding -= _DAYS_IN_MONTH[month] + (month == 2 and leapyear)
n -= preceding
assert 0 <= n < _days_in_month(year, month)
# Now the year and month are correct, and n is the offset from the
# start of that month: we're done!
return year, month, n+1
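# Editor's sketch (worked values): _ymd2ord and _ord2ymd are inverses over
# the proleptic Gregorian calendar.
def _example_ordinal_round_trip():
    assert _ymd2ord(1, 1, 1) == 1                # 1-Jan-0001 is day 1
    assert _ord2ymd(_ymd2ord(2000, 2, 29)) == (2000, 2, 29)
    assert _ymd2ord(401, 1, 1) == _DI400Y + 1    # first day past a 400-year cycle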
# Month and day names. For localized versions, see the calendar module.
_MONTHNAMES = [None, "Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
_DAYNAMES = [None, "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
def _build_struct_time(y, m, d, hh, mm, ss, dstflag):
wday = (_ymd2ord(y, m, d) + 6) % 7
dnum = _days_before_month(y, m) + d
return _time.struct_time((y, m, d, hh, mm, ss, wday, dnum, dstflag))
def _format_time(hh, mm, ss, us):
# Skip trailing microseconds when us==0.
result = "%02d:%02d:%02d" % (hh, mm, ss)
if us:
result += ".%06d" % us
return result
# Correctly substitute for %z and %Z escapes in strftime formats.
def _wrap_strftime(object, format, timetuple):
# Don't call utcoffset() or tzname() unless actually needed.
freplace = None # the string to use for %f
zreplace = None # the string to use for %z
Zreplace = None # the string to use for %Z
# Scan format for %z and %Z escapes, replacing as needed.
newformat = []
push = newformat.append
i, n = 0, len(format)
while i < n:
ch = format[i]
i += 1
if ch == '%':
if i < n:
ch = format[i]
i += 1
if ch == 'f':
if freplace is None:
freplace = '%06d' % getattr(object,
'microsecond', 0)
newformat.append(freplace)
elif ch == 'z':
if zreplace is None:
zreplace = ""
if hasattr(object, "utcoffset"):
offset = object.utcoffset()
if offset is not None:
sign = '+'
if offset.days < 0:
offset = -offset
sign = '-'
h, m = divmod(offset, timedelta(hours=1))
assert not m % timedelta(minutes=1), "whole minute"
m //= timedelta(minutes=1)
zreplace = '%c%02d%02d' % (sign, h, m)
assert '%' not in zreplace
newformat.append(zreplace)
elif ch == 'Z':
if Zreplace is None:
Zreplace = ""
if hasattr(object, "tzname"):
s = object.tzname()
if s is not None:
# strftime is going to have at this: escape %
Zreplace = s.replace('%', '%%')
newformat.append(Zreplace)
else:
push('%')
push(ch)
else:
push('%')
else:
push(ch)
newformat = "".join(newformat)
return _time.strftime(newformat, timetuple)
def _call_tzinfo_method(tzinfo, methname, tzinfoarg):
if tzinfo is None:
return None
return getattr(tzinfo, methname)(tzinfoarg)
# Just raise TypeError if the arg isn't None or a string.
def _check_tzname(name):
if name is not None and not isinstance(name, str):
raise TypeError("tzinfo.tzname() must return None or string, "
"not '%s'" % type(name))
# name is the offset-producing method, "utcoffset" or "dst".
# offset is what it returned.
# If offset isn't None or timedelta, raises TypeError.
# If offset is None, returns None.
# Else offset is checked for being in range, and a whole # of minutes.
# If it is, its integer value is returned. Else ValueError is raised.
def _check_utc_offset(name, offset):
assert name in ("utcoffset", "dst")
if offset is None:
return
if not isinstance(offset, timedelta):
raise TypeError("tzinfo.%s() must return None "
"or timedelta, not '%s'" % (name, type(offset)))
if offset % timedelta(minutes=1) or offset.microseconds:
raise ValueError("tzinfo.%s() must return a whole number "
"of minutes, got %s" % (name, offset))
if not -timedelta(1) < offset < timedelta(1):
        raise ValueError("%s()=%s, must be strictly between"
                         " -timedelta(hours=24) and timedelta(hours=24)"
                         % (name, offset))
def _check_date_fields(year, month, day):
if not isinstance(year, int):
raise TypeError('int expected')
if not MINYEAR <= year <= MAXYEAR:
raise ValueError('year must be in %d..%d' % (MINYEAR, MAXYEAR), year)
if not 1 <= month <= 12:
raise ValueError('month must be in 1..12', month)
dim = _days_in_month(year, month)
if not 1 <= day <= dim:
raise ValueError('day must be in 1..%d' % dim, day)
def _check_time_fields(hour, minute, second, microsecond):
if not isinstance(hour, int):
raise TypeError('int expected')
if not 0 <= hour <= 23:
raise ValueError('hour must be in 0..23', hour)
if not 0 <= minute <= 59:
raise ValueError('minute must be in 0..59', minute)
if not 0 <= second <= 59:
raise ValueError('second must be in 0..59', second)
if not 0 <= microsecond <= 999999:
raise ValueError('microsecond must be in 0..999999', microsecond)
def _check_tzinfo_arg(tz):
if tz is not None and not isinstance(tz, tzinfo):
raise TypeError("tzinfo argument must be None or of a tzinfo subclass")
def _cmperror(x, y):
raise TypeError("can't compare '%s' to '%s'" % (
type(x).__name__, type(y).__name__))
def _divide_and_round(a, b):
"""divide a by b and round result to the nearest integer
When the ratio is exactly half-way between two integers,
the even integer is returned.
"""
# Based on the reference implementation for divmod_near
# in Objects/longobject.c.
q, r = divmod(a, b)
# round up if either r / b > 0.5, or r / b == 0.5 and q is odd.
# The expression r / b > 0.5 is equivalent to 2 * r > b if b is
# positive, 2 * r < b if b negative.
r *= 2
greater_than_half = r > b if b > 0 else r < b
if greater_than_half or r == b and q % 2 == 1:
q += 1
return q
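# Editor's sketch (worked values): _divide_and_round rounds half-way cases to
# the even quotient, as used by the true-division paths further down.
def _example_divide_and_round():
    assert _divide_and_round(7, 2) == 4    # 3.5 -> 4 (even)
    assert _divide_and_round(5, 2) == 2    # 2.5 -> 2 (even)
    assert _divide_and_round(-7, 2) == -4  # -3.5 -> -4 (even)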
class timedelta:
"""Represent the difference between two datetime objects.
Supported operators:
- add, subtract timedelta
- unary plus, minus, abs
- compare to timedelta
- multiply, divide by int
In addition, datetime supports subtraction of two datetime objects
returning a timedelta, and addition or subtraction of a datetime
and a timedelta giving a datetime.
Representation: (days, seconds, microseconds). Why? Because I
felt like it.
"""
__slots__ = '_days', '_seconds', '_microseconds'
def __new__(cls, days=0, seconds=0, microseconds=0,
milliseconds=0, minutes=0, hours=0, weeks=0):
# Doing this efficiently and accurately in C is going to be difficult
# and error-prone, due to ubiquitous overflow possibilities, and that
# C double doesn't have enough bits of precision to represent
# microseconds over 10K years faithfully. The code here tries to make
# explicit where go-fast assumptions can be relied on, in order to
# guide the C implementation; it's way more convoluted than speed-
# ignoring auto-overflow-to-long idiomatic Python could be.
# XXX Check that all inputs are ints or floats.
# Final values, all integer.
# s and us fit in 32-bit signed ints; d isn't bounded.
d = s = us = 0
# Normalize everything to days, seconds, microseconds.
days += weeks*7
seconds += minutes*60 + hours*3600
microseconds += milliseconds*1000
# Get rid of all fractions, and normalize s and us.
# Take a deep breath <wink>.
if isinstance(days, float):
dayfrac, days = _math.modf(days)
daysecondsfrac, daysecondswhole = _math.modf(dayfrac * (24.*3600.))
assert daysecondswhole == int(daysecondswhole) # can't overflow
s = int(daysecondswhole)
assert days == int(days)
d = int(days)
else:
daysecondsfrac = 0.0
d = days
assert isinstance(daysecondsfrac, float)
assert abs(daysecondsfrac) <= 1.0
assert isinstance(d, int)
assert abs(s) <= 24 * 3600
# days isn't referenced again before redefinition
if isinstance(seconds, float):
secondsfrac, seconds = _math.modf(seconds)
assert seconds == int(seconds)
seconds = int(seconds)
secondsfrac += daysecondsfrac
assert abs(secondsfrac) <= 2.0
else:
secondsfrac = daysecondsfrac
# daysecondsfrac isn't referenced again
assert isinstance(secondsfrac, float)
assert abs(secondsfrac) <= 2.0
assert isinstance(seconds, int)
days, seconds = divmod(seconds, 24*3600)
d += days
s += int(seconds) # can't overflow
assert isinstance(s, int)
assert abs(s) <= 2 * 24 * 3600
# seconds isn't referenced again before redefinition
usdouble = secondsfrac * 1e6
assert abs(usdouble) < 2.1e6 # exact value not critical
# secondsfrac isn't referenced again
if isinstance(microseconds, float):
microseconds += usdouble
microseconds = round(microseconds, 0)
seconds, microseconds = divmod(microseconds, 1e6)
assert microseconds == int(microseconds)
assert seconds == int(seconds)
days, seconds = divmod(seconds, 24.*3600.)
assert days == int(days)
assert seconds == int(seconds)
d += int(days)
s += int(seconds) # can't overflow
assert isinstance(s, int)
assert abs(s) <= 3 * 24 * 3600
else:
seconds, microseconds = divmod(microseconds, 1000000)
days, seconds = divmod(seconds, 24*3600)
d += days
s += int(seconds) # can't overflow
assert isinstance(s, int)
assert abs(s) <= 3 * 24 * 3600
microseconds = float(microseconds)
microseconds += usdouble
microseconds = round(microseconds, 0)
assert abs(s) <= 3 * 24 * 3600
assert abs(microseconds) < 3.1e6
# Just a little bit of carrying possible for microseconds and seconds.
assert isinstance(microseconds, float)
assert int(microseconds) == microseconds
us = int(microseconds)
seconds, us = divmod(us, 1000000)
        s += seconds # can't overflow
assert isinstance(s, int)
days, s = divmod(s, 24*3600)
d += days
assert isinstance(d, int)
assert isinstance(s, int) and 0 <= s < 24*3600
assert isinstance(us, int) and 0 <= us < 1000000
self = object.__new__(cls)
self._days = d
self._seconds = s
self._microseconds = us
if abs(d) > 999999999:
raise OverflowError("timedelta # of days is too large: %d" % d)
return self
def __repr__(self):
if self._microseconds:
return "%s(%d, %d, %d)" % ('datetime.' + self.__class__.__name__,
self._days,
self._seconds,
self._microseconds)
if self._seconds:
return "%s(%d, %d)" % ('datetime.' + self.__class__.__name__,
self._days,
self._seconds)
return "%s(%d)" % ('datetime.' + self.__class__.__name__, self._days)
def __str__(self):
mm, ss = divmod(self._seconds, 60)
hh, mm = divmod(mm, 60)
s = "%d:%02d:%02d" % (hh, mm, ss)
if self._days:
def plural(n):
return n, abs(n) != 1 and "s" or ""
s = ("%d day%s, " % plural(self._days)) + s
if self._microseconds:
s = s + ".%06d" % self._microseconds
return s
def total_seconds(self):
"""Total seconds in the duration."""
return ((self.days * 86400 + self.seconds)*10**6 +
self.microseconds) / 10**6
# Read-only field accessors
@property
def days(self):
"""days"""
return self._days
@property
def seconds(self):
"""seconds"""
return self._seconds
@property
def microseconds(self):
"""microseconds"""
return self._microseconds
def __add__(self, other):
if isinstance(other, timedelta):
# for CPython compatibility, we cannot use
# our __class__ here, but need a real timedelta
return timedelta(self._days + other._days,
self._seconds + other._seconds,
self._microseconds + other._microseconds)
return NotImplemented
__radd__ = __add__
def __sub__(self, other):
if isinstance(other, timedelta):
# for CPython compatibility, we cannot use
# our __class__ here, but need a real timedelta
return timedelta(self._days - other._days,
self._seconds - other._seconds,
self._microseconds - other._microseconds)
return NotImplemented
def __rsub__(self, other):
if isinstance(other, timedelta):
return -self + other
return NotImplemented
def __neg__(self):
# for CPython compatibility, we cannot use
# our __class__ here, but need a real timedelta
return timedelta(-self._days,
-self._seconds,
-self._microseconds)
def __pos__(self):
return self
def __abs__(self):
if self._days < 0:
return -self
else:
return self
def __mul__(self, other):
if isinstance(other, int):
# for CPython compatibility, we cannot use
# our __class__ here, but need a real timedelta
return timedelta(self._days * other,
self._seconds * other,
self._microseconds * other)
if isinstance(other, float):
usec = self._to_microseconds()
a, b = other.as_integer_ratio()
return timedelta(0, 0, _divide_and_round(usec * a, b))
return NotImplemented
__rmul__ = __mul__
def _to_microseconds(self):
return ((self._days * (24*3600) + self._seconds) * 1000000 +
self._microseconds)
def __floordiv__(self, other):
if not isinstance(other, (int, timedelta)):
return NotImplemented
usec = self._to_microseconds()
if isinstance(other, timedelta):
return usec // other._to_microseconds()
if isinstance(other, int):
return timedelta(0, 0, usec // other)
def __truediv__(self, other):
if not isinstance(other, (int, float, timedelta)):
return NotImplemented
usec = self._to_microseconds()
if isinstance(other, timedelta):
return usec / other._to_microseconds()
if isinstance(other, int):
return timedelta(0, 0, _divide_and_round(usec, other))
if isinstance(other, float):
a, b = other.as_integer_ratio()
return timedelta(0, 0, _divide_and_round(b * usec, a))
def __mod__(self, other):
if isinstance(other, timedelta):
r = self._to_microseconds() % other._to_microseconds()
return timedelta(0, 0, r)
return NotImplemented
def __divmod__(self, other):
if isinstance(other, timedelta):
q, r = divmod(self._to_microseconds(),
other._to_microseconds())
return q, timedelta(0, 0, r)
return NotImplemented
# Comparisons of timedelta objects with other.
def __eq__(self, other):
if isinstance(other, timedelta):
return self._cmp(other) == 0
else:
return False
def __ne__(self, other):
if isinstance(other, timedelta):
return self._cmp(other) != 0
else:
return True
def __le__(self, other):
if isinstance(other, timedelta):
return self._cmp(other) <= 0
else:
_cmperror(self, other)
def __lt__(self, other):
if isinstance(other, timedelta):
return self._cmp(other) < 0
else:
_cmperror(self, other)
def __ge__(self, other):
if isinstance(other, timedelta):
return self._cmp(other) >= 0
else:
_cmperror(self, other)
def __gt__(self, other):
if isinstance(other, timedelta):
return self._cmp(other) > 0
else:
_cmperror(self, other)
def _cmp(self, other):
assert isinstance(other, timedelta)
return _cmp(self._getstate(), other._getstate())
def __hash__(self):
return hash(self._getstate())
def __bool__(self):
return (self._days != 0 or
self._seconds != 0 or
self._microseconds != 0)
# Pickle support.
def _getstate(self):
return (self._days, self._seconds, self._microseconds)
def __reduce__(self):
return (self.__class__, self._getstate())
timedelta.min = timedelta(-999999999)
timedelta.max = timedelta(days=999999999, hours=23, minutes=59, seconds=59,
microseconds=999999)
timedelta.resolution = timedelta(microseconds=1)
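# Editor's sketch (illustrative arithmetic): the constructor normalizes every
# input to the (days, seconds, microseconds) representation described above.
def _example_timedelta():
    td = timedelta(hours=25)
    assert (td.days, td.seconds, td.microseconds) == (1, 3600, 0)
    assert timedelta(weeks=1) / timedelta(days=1) == 7.0  # interval division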
class date:
"""Concrete date type.
Constructors:
__new__()
fromtimestamp()
today()
fromordinal()
Operators:
__repr__, __str__
__eq__, __le__, __lt__, __ge__, __gt__, __hash__
__add__, __radd__, __sub__ (add/radd only with timedelta arg)
Methods:
timetuple()
toordinal()
weekday()
isoweekday(), isocalendar(), isoformat()
ctime()
strftime()
Properties (readonly):
year, month, day
"""
__slots__ = '_year', '_month', '_day'
def __new__(cls, year, month=None, day=None):
"""Constructor.
Arguments:
year, month, day (required, base 1)
"""
if (isinstance(year, bytes) and len(year) == 4 and
1 <= year[2] <= 12 and month is None): # Month is sane
# Pickle support
self = object.__new__(cls)
self.__setstate(year)
return self
_check_date_fields(year, month, day)
self = object.__new__(cls)
self._year = year
self._month = month
self._day = day
return self
# Additional constructors
@classmethod
def fromtimestamp(cls, t):
"Construct a date from a POSIX timestamp (like time.time())."
y, m, d, hh, mm, ss, weekday, jday, dst = _time.localtime(t)
return cls(y, m, d)
@classmethod
def today(cls):
"Construct a date from time.time()."
t = _time.time()
return cls.fromtimestamp(t)
@classmethod
    def fromordinal(cls, n):
        """Construct a date from a proleptic Gregorian ordinal.
January 1 of year 1 is day 1. Only the year, month and day are
non-zero in the result.
"""
y, m, d = _ord2ymd(n)
return cls(y, m, d)
# Conversions to string
def __repr__(self):
"""Convert to formal string, for repr().
>>> dt = datetime(2010, 1, 1)
>>> repr(dt)
'datetime.datetime(2010, 1, 1, 0, 0)'
>>> dt = datetime(2010, 1, 1, tzinfo=timezone.utc)
>>> repr(dt)
'datetime.datetime(2010, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)'
"""
return "%s(%d, %d, %d)" % ('datetime.' + self.__class__.__name__,
self._year,
self._month,
self._day)
# XXX These shouldn't depend on time.localtime(), because that
# clips the usable dates to [1970 .. 2038). At least ctime() is
# easily done without using strftime() -- that's better too because
# strftime("%c", ...) is locale specific.
def ctime(self):
"Return ctime() style string."
weekday = self.toordinal() % 7 or 7
return "%s %s %2d 00:00:00 %04d" % (
_DAYNAMES[weekday],
_MONTHNAMES[self._month],
self._day, self._year)
def strftime(self, fmt):
"Format using strftime()."
return _wrap_strftime(self, fmt, self.timetuple())
def __format__(self, fmt):
if len(fmt) != 0:
return self.strftime(fmt)
return str(self)
def isoformat(self):
"""Return the date formatted according to ISO.
This is 'YYYY-MM-DD'.
References:
- http://www.w3.org/TR/NOTE-datetime
- http://www.cl.cam.ac.uk/~mgk25/iso-time.html
"""
return "%04d-%02d-%02d" % (self._year, self._month, self._day)
__str__ = isoformat
# Read-only field accessors
@property
def year(self):
"""year (1-9999)"""
return self._year
@property
def month(self):
"""month (1-12)"""
return self._month
@property
def day(self):
"""day (1-31)"""
return self._day
# Standard conversions, __eq__, __le__, __lt__, __ge__, __gt__,
# __hash__ (and helpers)
def timetuple(self):
"Return local time tuple compatible with time.localtime()."
return _build_struct_time(self._year, self._month, self._day,
0, 0, 0, -1)
def toordinal(self):
"""Return proleptic Gregorian ordinal for the year, month and day.
January 1 of year 1 is day 1. Only the year, month and day values
contribute to the result.
"""
return _ymd2ord(self._year, self._month, self._day)
def replace(self, year=None, month=None, day=None):
"""Return a new date with new values for the specified fields."""
if year is None:
year = self._year
if month is None:
month = self._month
if day is None:
day = self._day
_check_date_fields(year, month, day)
return date(year, month, day)
# Comparisons of date objects with other.
def __eq__(self, other):
if isinstance(other, date):
return self._cmp(other) == 0
return NotImplemented
def __ne__(self, other):
if isinstance(other, date):
return self._cmp(other) != 0
return NotImplemented
def __le__(self, other):
if isinstance(other, date):
return self._cmp(other) <= 0
return NotImplemented
def __lt__(self, other):
if isinstance(other, date):
return self._cmp(other) < 0
return NotImplemented
def __ge__(self, other):
if isinstance(other, date):
return self._cmp(other) >= 0
return NotImplemented
def __gt__(self, other):
if isinstance(other, date):
return self._cmp(other) > 0
return NotImplemented
def _cmp(self, other):
assert isinstance(other, date)
y, m, d = self._year, self._month, self._day
y2, m2, d2 = other._year, other._month, other._day
return _cmp((y, m, d), (y2, m2, d2))
def __hash__(self):
"Hash."
return hash(self._getstate())
# Computations
def __add__(self, other):
"Add a date to a timedelta."
if isinstance(other, timedelta):
o = self.toordinal() + other.days
if 0 < o <= _MAXORDINAL:
return date.fromordinal(o)
raise OverflowError("result out of range")
return NotImplemented
__radd__ = __add__
def __sub__(self, other):
"""Subtract two dates, or a date and a timedelta."""
if isinstance(other, timedelta):
return self + timedelta(-other.days)
if isinstance(other, date):
days1 = self.toordinal()
days2 = other.toordinal()
return timedelta(days1 - days2)
return NotImplemented
def weekday(self):
"Return day of the week, where Monday == 0 ... Sunday == 6."
return (self.toordinal() + 6) % 7
# Day-of-the-week and week-of-the-year, according to ISO
def isoweekday(self):
"Return day of the week, where Monday == 1 ... Sunday == 7."
# 1-Jan-0001 is a Monday
return self.toordinal() % 7 or 7
def isocalendar(self):
"""Return a 3-tuple containing ISO year, week number, and weekday.
The first ISO week of the year is the (Mon-Sun) week
containing the year's first Thursday; everything else derives
from that.
The first week is 1; Monday is 1 ... Sunday is 7.
ISO calendar algorithm taken from
http://www.phys.uu.nl/~vgent/calendar/isocalendar.htm
"""
year = self._year
week1monday = _isoweek1monday(year)
today = _ymd2ord(self._year, self._month, self._day)
# Internally, week and day have origin 0
week, day = divmod(today - week1monday, 7)
if week < 0:
year -= 1
week1monday = _isoweek1monday(year)
week, day = divmod(today - week1monday, 7)
elif week >= 52:
if today >= _isoweek1monday(year+1):
year += 1
week = 0
return year, week+1, day+1
# Pickle support.
def _getstate(self):
yhi, ylo = divmod(self._year, 256)
return bytes([yhi, ylo, self._month, self._day]),
def __setstate(self, string):
if len(string) != 4 or not (1 <= string[2] <= 12):
raise TypeError("not enough arguments")
yhi, ylo, self._month, self._day = string
self._year = yhi * 256 + ylo
def __reduce__(self):
return (self.__class__, self._getstate())
_date_class = date # so functions w/ args named "date" can get at the class
date.min = date(1, 1, 1)
date.max = date(9999, 12, 31)
date.resolution = timedelta(days=1)
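# Illustrative sketch (not from the upstream stdlib source): a quick tour of
# the date API defined above. The helper name _demo_date is hypothetical and
# it is never called at import time.
def _demo_date():
    d = date(2010, 1, 3)
    assert date.fromordinal(d.toordinal()) == d        # ordinal round-trip
    assert d.isoformat() == "2010-01-03"               # __str__ is isoformat
    # 2010-01-03 is a Sunday before the first ISO week of 2010, so it still
    # belongs to week 53 of ISO year 2009.
    assert d.isocalendar() == (2009, 53, 7)
    assert d + timedelta(days=1) == date(2010, 1, 4)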
class tzinfo:
"""Abstract base class for time zone info classes.
    Subclasses must override the tzname(), utcoffset() and dst() methods.
"""
__slots__ = ()
def tzname(self, dt):
"datetime -> string name of time zone."
raise NotImplementedError("tzinfo subclass must override tzname()")
def utcoffset(self, dt):
"datetime -> minutes east of UTC (negative for west of UTC)"
raise NotImplementedError("tzinfo subclass must override utcoffset()")
def dst(self, dt):
"""datetime -> DST offset in minutes east of UTC.
Return 0 if DST not in effect. utcoffset() must include the DST
offset.
"""
raise NotImplementedError("tzinfo subclass must override dst()")
def fromutc(self, dt):
"datetime in UTC -> datetime in local time."
if not isinstance(dt, datetime):
raise TypeError("fromutc() requires a datetime argument")
if dt.tzinfo is not self:
raise ValueError("dt.tzinfo is not self")
dtoff = dt.utcoffset()
if dtoff is None:
raise ValueError("fromutc() requires a non-None utcoffset() "
"result")
# See the long comment block at the end of this file for an
# explanation of this algorithm.
dtdst = dt.dst()
if dtdst is None:
raise ValueError("fromutc() requires a non-None dst() result")
delta = dtoff - dtdst
if delta:
dt += delta
dtdst = dt.dst()
if dtdst is None:
raise ValueError("fromutc(): dt.dst gave inconsistent "
"results; cannot convert")
return dt + dtdst
# Pickle support.
def __reduce__(self):
getinitargs = getattr(self, "__getinitargs__", None)
if getinitargs:
args = getinitargs()
else:
args = ()
getstate = getattr(self, "__getstate__", None)
if getstate:
state = getstate()
else:
state = getattr(self, "__dict__", None) or None
if state is None:
return (self.__class__, args)
else:
return (self.__class__, args, state)
_tzinfo_class = tzinfo
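# Illustrative sketch (not from the upstream stdlib source): the smallest
# useful concrete tzinfo subclass, a fixed offset with no DST. The class name
# _FixedOffset is hypothetical; the production version is timezone, below.
class _FixedOffset(tzinfo):
    def __init__(self, offset_hours, name):
        self._off = timedelta(hours=offset_hours)
        self._nm = name
    def utcoffset(self, dt):
        return self._off
    def dst(self, dt):
        return timedelta(0)             # never in daylight saving time
    def tzname(self, dt):
        return self._nm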
class time:
"""Time with time zone.
Constructors:
__new__()
Operators:
__repr__, __str__
__eq__, __le__, __lt__, __ge__, __gt__, __hash__
Methods:
strftime()
isoformat()
utcoffset()
tzname()
dst()
Properties (readonly):
hour, minute, second, microsecond, tzinfo
"""
def __new__(cls, hour=0, minute=0, second=0, microsecond=0, tzinfo=None):
"""Constructor.
Arguments:
hour, minute (required)
second, microsecond (default to zero)
tzinfo (default to None)
"""
self = object.__new__(cls)
if isinstance(hour, bytes) and len(hour) == 6:
# Pickle support
self.__setstate(hour, minute or None)
return self
_check_tzinfo_arg(tzinfo)
_check_time_fields(hour, minute, second, microsecond)
self._hour = hour
self._minute = minute
self._second = second
self._microsecond = microsecond
self._tzinfo = tzinfo
return self
# Read-only field accessors
@property
def hour(self):
"""hour (0-23)"""
return self._hour
@property
def minute(self):
"""minute (0-59)"""
return self._minute
@property
def second(self):
"""second (0-59)"""
return self._second
@property
def microsecond(self):
"""microsecond (0-999999)"""
return self._microsecond
@property
def tzinfo(self):
"""timezone info object"""
return self._tzinfo
# Standard conversions, __hash__ (and helpers)
# Comparisons of time objects with other.
def __eq__(self, other):
if isinstance(other, time):
return self._cmp(other, allow_mixed=True) == 0
else:
return False
def __ne__(self, other):
if isinstance(other, time):
return self._cmp(other, allow_mixed=True) != 0
else:
return True
def __le__(self, other):
if isinstance(other, time):
return self._cmp(other) <= 0
else:
_cmperror(self, other)
def __lt__(self, other):
if isinstance(other, time):
return self._cmp(other) < 0
else:
_cmperror(self, other)
def __ge__(self, other):
if isinstance(other, time):
return self._cmp(other) >= 0
else:
_cmperror(self, other)
def __gt__(self, other):
if isinstance(other, time):
return self._cmp(other) > 0
else:
_cmperror(self, other)
def _cmp(self, other, allow_mixed=False):
assert isinstance(other, time)
mytz = self._tzinfo
ottz = other._tzinfo
myoff = otoff = None
if mytz is ottz:
base_compare = True
else:
myoff = self.utcoffset()
otoff = other.utcoffset()
base_compare = myoff == otoff
if base_compare:
return _cmp((self._hour, self._minute, self._second,
self._microsecond),
(other._hour, other._minute, other._second,
other._microsecond))
if myoff is None or otoff is None:
if allow_mixed:
return 2 # arbitrary non-zero value
else:
raise TypeError("cannot compare naive and aware times")
myhhmm = self._hour * 60 + self._minute - myoff//timedelta(minutes=1)
othhmm = other._hour * 60 + other._minute - otoff//timedelta(minutes=1)
return _cmp((myhhmm, self._second, self._microsecond),
(othhmm, other._second, other._microsecond))
def __hash__(self):
"""Hash."""
tzoff = self.utcoffset()
if not tzoff: # zero or None
return hash(self._getstate()[0])
h, m = divmod(timedelta(hours=self.hour, minutes=self.minute) - tzoff,
timedelta(hours=1))
assert not m % timedelta(minutes=1), "whole minute"
m //= timedelta(minutes=1)
if 0 <= h < 24:
return hash(time(h, m, self.second, self.microsecond))
return hash((h, m, self.second, self.microsecond))
# Conversion to string
def _tzstr(self, sep=":"):
"""Return formatted timezone offset (+xx:xx) or None."""
off = self.utcoffset()
if off is not None:
if off.days < 0:
sign = "-"
off = -off
else:
sign = "+"
hh, mm = divmod(off, timedelta(hours=1))
assert not mm % timedelta(minutes=1), "whole minute"
mm //= timedelta(minutes=1)
assert 0 <= hh < 24
off = "%s%02d%s%02d" % (sign, hh, sep, mm)
return off
def __repr__(self):
"""Convert to formal string, for repr()."""
if self._microsecond != 0:
s = ", %d, %d" % (self._second, self._microsecond)
elif self._second != 0:
s = ", %d" % self._second
else:
s = ""
s= "%s(%d, %d%s)" % ('datetime.' + self.__class__.__name__,
self._hour, self._minute, s)
if self._tzinfo is not None:
assert s[-1:] == ")"
s = s[:-1] + ", tzinfo=%r" % self._tzinfo + ")"
return s
def isoformat(self):
"""Return the time formatted according to ISO.
This is 'HH:MM:SS.mmmmmm+zz:zz', or 'HH:MM:SS+zz:zz' if
self.microsecond == 0.
"""
s = _format_time(self._hour, self._minute, self._second,
self._microsecond)
tz = self._tzstr()
if tz:
s += tz
return s
__str__ = isoformat
def strftime(self, fmt):
"""Format using strftime(). The date part of the timestamp passed
to underlying strftime should not be used.
"""
# The year must be >= 1000 else Python's strftime implementation
# can raise a bogus exception.
timetuple = (1900, 1, 1,
self._hour, self._minute, self._second,
0, 1, -1)
return _wrap_strftime(self, fmt, timetuple)
def __format__(self, fmt):
if len(fmt) != 0:
return self.strftime(fmt)
return str(self)
# Timezone functions
def utcoffset(self):
"""Return the timezone offset in minutes east of UTC (negative west of
UTC)."""
if self._tzinfo is None:
return None
offset = self._tzinfo.utcoffset(None)
_check_utc_offset("utcoffset", offset)
return offset
def tzname(self):
"""Return the timezone name.
Note that the name is 100% informational -- there's no requirement that
it mean anything in particular. For example, "GMT", "UTC", "-500",
"-5:00", "EDT", "US/Eastern", "America/New York" are all valid replies.
"""
if self._tzinfo is None:
return None
name = self._tzinfo.tzname(None)
_check_tzname(name)
return name
def dst(self):
"""Return 0 if DST is not in effect, or the DST offset (in minutes
eastward) if DST is in effect.
This is purely informational; the DST offset has already been added to
the UTC offset returned by utcoffset() if applicable, so there's no
need to consult dst() unless you're interested in displaying the DST
info.
"""
if self._tzinfo is None:
return None
offset = self._tzinfo.dst(None)
_check_utc_offset("dst", offset)
return offset
def replace(self, hour=None, minute=None, second=None, microsecond=None,
tzinfo=True):
"""Return a new time with new values for the specified fields."""
if hour is None:
hour = self.hour
if minute is None:
minute = self.minute
if second is None:
second = self.second
if microsecond is None:
microsecond = self.microsecond
if tzinfo is True:
tzinfo = self.tzinfo
_check_time_fields(hour, minute, second, microsecond)
_check_tzinfo_arg(tzinfo)
return time(hour, minute, second, microsecond, tzinfo)
def __bool__(self):
if self.second or self.microsecond:
return True
offset = self.utcoffset() or timedelta(0)
return timedelta(hours=self.hour, minutes=self.minute) != offset
# Pickle support.
def _getstate(self):
us2, us3 = divmod(self._microsecond, 256)
us1, us2 = divmod(us2, 256)
basestate = bytes([self._hour, self._minute, self._second,
us1, us2, us3])
if self._tzinfo is None:
return (basestate,)
else:
return (basestate, self._tzinfo)
def __setstate(self, string, tzinfo):
if len(string) != 6 or string[0] >= 24:
raise TypeError("an integer is required")
(self._hour, self._minute, self._second,
us1, us2, us3) = string
self._microsecond = (((us1 << 8) | us2) << 8) | us3
if tzinfo is None or isinstance(tzinfo, _tzinfo_class):
self._tzinfo = tzinfo
else:
raise TypeError("bad tzinfo state arg %r" % tzinfo)
def __reduce__(self):
return (time, self._getstate())
_time_class = time # so functions w/ args named "time" can get at the class
time.min = time(0, 0, 0)
time.max = time(23, 59, 59, 999999)
time.resolution = timedelta(microseconds=1)
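# Illustrative sketch (not from the upstream stdlib source): naive vs. aware
# time objects, using the hypothetical _FixedOffset helper sketched after
# tzinfo above. Never called at import time.
def _demo_time():
    naive = time(13, 30, 15)
    assert naive.isoformat() == "13:30:15"
    assert naive.utcoffset() is None                   # no tzinfo attached
    aware = naive.replace(tzinfo=_FixedOffset(-5, "EST"))
    assert aware.isoformat() == "13:30:15-05:00"
    assert aware.tzname() == "EST"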
class datetime(date):
"""datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])
The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
"""
__slots__ = date.__slots__ + (
'_hour', '_minute', '_second',
'_microsecond', '_tzinfo')
def __new__(cls, year, month=None, day=None, hour=0, minute=0, second=0,
microsecond=0, tzinfo=None):
if isinstance(year, bytes) and len(year) == 10:
# Pickle support
self = date.__new__(cls, year[:4])
self.__setstate(year, month)
return self
_check_tzinfo_arg(tzinfo)
_check_time_fields(hour, minute, second, microsecond)
self = date.__new__(cls, year, month, day)
self._hour = hour
self._minute = minute
self._second = second
self._microsecond = microsecond
self._tzinfo = tzinfo
return self
# Read-only field accessors
@property
def hour(self):
"""hour (0-23)"""
return self._hour
@property
def minute(self):
"""minute (0-59)"""
return self._minute
@property
def second(self):
"""second (0-59)"""
return self._second
@property
def microsecond(self):
"""microsecond (0-999999)"""
return self._microsecond
@property
def tzinfo(self):
"""timezone info object"""
return self._tzinfo
@classmethod
def _fromtimestamp(cls, t, utc, tz):
"""Construct a datetime from a POSIX timestamp (like time.time()).
A timezone info object may be passed in as well.
"""
frac, t = _math.modf(t)
us = round(frac * 1e6)
if us >= 1000000:
t += 1
us -= 1000000
elif us < 0:
t -= 1
us += 1000000
converter = _time.gmtime if utc else _time.localtime
y, m, d, hh, mm, ss, weekday, jday, dst = converter(t)
ss = min(ss, 59) # clamp out leap seconds if the platform has them
return cls(y, m, d, hh, mm, ss, us, tz)
@classmethod
def fromtimestamp(cls, t, tz=None):
"""Construct a datetime from a POSIX timestamp (like time.time()).
A timezone info object may be passed in as well.
"""
_check_tzinfo_arg(tz)
result = cls._fromtimestamp(t, tz is not None, tz)
if tz is not None:
result = tz.fromutc(result)
return result
@classmethod
def utcfromtimestamp(cls, t):
"""Construct a naive UTC datetime from a POSIX timestamp."""
return cls._fromtimestamp(t, True, None)
# XXX This is supposed to do better than we *can* do by using time.time(),
# XXX if the platform supports a more accurate way. The C implementation
# XXX uses gettimeofday on platforms that have it, but that isn't
# XXX available from Python. So now() may return different results
# XXX across the implementations.
@classmethod
def now(cls, tz=None):
"Construct a datetime from time.time() and optional time zone info."
t = _time.time()
return cls.fromtimestamp(t, tz)
@classmethod
def utcnow(cls):
"Construct a UTC datetime from time.time()."
t = _time.time()
return cls.utcfromtimestamp(t)
@classmethod
def combine(cls, date, time):
"Construct a datetime from a given date and a given time."
if not isinstance(date, _date_class):
raise TypeError("date argument must be a date instance")
if not isinstance(time, _time_class):
raise TypeError("time argument must be a time instance")
return cls(date.year, date.month, date.day,
time.hour, time.minute, time.second, time.microsecond,
time.tzinfo)
def timetuple(self):
"Return local time tuple compatible with time.localtime()."
dst = self.dst()
if dst is None:
dst = -1
elif dst:
dst = 1
else:
dst = 0
return _build_struct_time(self.year, self.month, self.day,
self.hour, self.minute, self.second,
dst)
def timestamp(self):
"Return POSIX timestamp as float"
if self._tzinfo is None:
return _time.mktime((self.year, self.month, self.day,
self.hour, self.minute, self.second,
-1, -1, -1)) + self.microsecond / 1e6
else:
return (self - _EPOCH).total_seconds()
def utctimetuple(self):
"Return UTC time tuple compatible with time.gmtime()."
offset = self.utcoffset()
if offset:
self -= offset
y, m, d = self.year, self.month, self.day
hh, mm, ss = self.hour, self.minute, self.second
return _build_struct_time(y, m, d, hh, mm, ss, 0)
def date(self):
"Return the date part."
return date(self._year, self._month, self._day)
def time(self):
"Return the time part, with tzinfo None."
return time(self.hour, self.minute, self.second, self.microsecond)
def timetz(self):
"Return the time part, with same tzinfo."
return time(self.hour, self.minute, self.second, self.microsecond,
self._tzinfo)
def replace(self, year=None, month=None, day=None, hour=None,
minute=None, second=None, microsecond=None, tzinfo=True):
"""Return a new datetime with new values for the specified fields."""
if year is None:
year = self.year
if month is None:
month = self.month
if day is None:
day = self.day
if hour is None:
hour = self.hour
if minute is None:
minute = self.minute
if second is None:
second = self.second
if microsecond is None:
microsecond = self.microsecond
if tzinfo is True:
tzinfo = self.tzinfo
_check_date_fields(year, month, day)
_check_time_fields(hour, minute, second, microsecond)
_check_tzinfo_arg(tzinfo)
return datetime(year, month, day, hour, minute, second,
microsecond, tzinfo)
def astimezone(self, tz=None):
if tz is None:
if self.tzinfo is None:
raise ValueError("astimezone() requires an aware datetime")
ts = (self - _EPOCH) // timedelta(seconds=1)
localtm = _time.localtime(ts)
local = datetime(*localtm[:6])
try:
# Extract TZ data if available
gmtoff = localtm.tm_gmtoff
zone = localtm.tm_zone
except AttributeError:
# Compute UTC offset and compare with the value implied
# by tm_isdst. If the values match, use the zone name
# implied by tm_isdst.
delta = local - datetime(*_time.gmtime(ts)[:6])
dst = _time.daylight and localtm.tm_isdst > 0
gmtoff = -(_time.altzone if dst else _time.timezone)
if delta == timedelta(seconds=gmtoff):
tz = timezone(delta, _time.tzname[dst])
else:
tz = timezone(delta)
else:
tz = timezone(timedelta(seconds=gmtoff), zone)
elif not isinstance(tz, tzinfo):
raise TypeError("tz argument must be an instance of tzinfo")
mytz = self.tzinfo
if mytz is None:
raise ValueError("astimezone() requires an aware datetime")
if tz is mytz:
return self
# Convert self to UTC, and attach the new time zone object.
myoffset = self.utcoffset()
if myoffset is None:
raise ValueError("astimezone() requires an aware datetime")
utc = (self - myoffset).replace(tzinfo=tz)
# Convert from UTC to tz's local time.
return tz.fromutc(utc)
# Ways to produce a string.
def ctime(self):
"Return ctime() style string."
weekday = self.toordinal() % 7 or 7
return "%s %s %2d %02d:%02d:%02d %04d" % (
_DAYNAMES[weekday],
_MONTHNAMES[self._month],
self._day,
self._hour, self._minute, self._second,
self._year)
def isoformat(self, sep='T'):
"""Return the time formatted according to ISO.
This is 'YYYY-MM-DD HH:MM:SS.mmmmmm', or 'YYYY-MM-DD HH:MM:SS' if
self.microsecond == 0.
If self.tzinfo is not None, the UTC offset is also attached, giving
'YYYY-MM-DD HH:MM:SS.mmmmmm+HH:MM' or 'YYYY-MM-DD HH:MM:SS+HH:MM'.
Optional argument sep specifies the separator between date and
time, default 'T'.
"""
s = ("%04d-%02d-%02d%c" % (self._year, self._month, self._day,
sep) +
_format_time(self._hour, self._minute, self._second,
self._microsecond))
off = self.utcoffset()
if off is not None:
if off.days < 0:
sign = "-"
off = -off
else:
sign = "+"
hh, mm = divmod(off, timedelta(hours=1))
assert not mm % timedelta(minutes=1), "whole minute"
mm //= timedelta(minutes=1)
s += "%s%02d:%02d" % (sign, hh, mm)
return s
def __repr__(self):
"""Convert to formal string, for repr()."""
L = [self._year, self._month, self._day, # These are never zero
self._hour, self._minute, self._second, self._microsecond]
if L[-1] == 0:
del L[-1]
if L[-1] == 0:
del L[-1]
s = ", ".join(map(str, L))
s = "%s(%s)" % ('datetime.' + self.__class__.__name__, s)
if self._tzinfo is not None:
assert s[-1:] == ")"
s = s[:-1] + ", tzinfo=%r" % self._tzinfo + ")"
return s
def __str__(self):
"Convert to string, for str()."
return self.isoformat(sep=' ')
@classmethod
def strptime(cls, date_string, format):
'string, format -> new datetime parsed from a string (like time.strptime()).'
import _strptime
return _strptime._strptime_datetime(cls, date_string, format)
def utcoffset(self):
"""Return the timezone offset in minutes east of UTC (negative west of
UTC)."""
if self._tzinfo is None:
return None
offset = self._tzinfo.utcoffset(self)
_check_utc_offset("utcoffset", offset)
return offset
def tzname(self):
"""Return the timezone name.
Note that the name is 100% informational -- there's no requirement that
it mean anything in particular. For example, "GMT", "UTC", "-500",
"-5:00", "EDT", "US/Eastern", "America/New York" are all valid replies.
"""
name = _call_tzinfo_method(self._tzinfo, "tzname", self)
_check_tzname(name)
return name
def dst(self):
"""Return 0 if DST is not in effect, or the DST offset (in minutes
eastward) if DST is in effect.
This is purely informational; the DST offset has already been added to
the UTC offset returned by utcoffset() if applicable, so there's no
need to consult dst() unless you're interested in displaying the DST
info.
"""
if self._tzinfo is None:
return None
offset = self._tzinfo.dst(self)
_check_utc_offset("dst", offset)
return offset
# Comparisons of datetime objects with other.
def __eq__(self, other):
if isinstance(other, datetime):
return self._cmp(other, allow_mixed=True) == 0
elif not isinstance(other, date):
return NotImplemented
else:
return False
def __ne__(self, other):
if isinstance(other, datetime):
return self._cmp(other, allow_mixed=True) != 0
elif not isinstance(other, date):
return NotImplemented
else:
return True
def __le__(self, other):
if isinstance(other, datetime):
return self._cmp(other) <= 0
elif not isinstance(other, date):
return NotImplemented
else:
_cmperror(self, other)
def __lt__(self, other):
if isinstance(other, datetime):
return self._cmp(other) < 0
elif not isinstance(other, date):
return NotImplemented
else:
_cmperror(self, other)
def __ge__(self, other):
if isinstance(other, datetime):
return self._cmp(other) >= 0
elif not isinstance(other, date):
return NotImplemented
else:
_cmperror(self, other)
def __gt__(self, other):
if isinstance(other, datetime):
return self._cmp(other) > 0
elif not isinstance(other, date):
return NotImplemented
else:
_cmperror(self, other)
def _cmp(self, other, allow_mixed=False):
assert isinstance(other, datetime)
mytz = self._tzinfo
ottz = other._tzinfo
myoff = otoff = None
if mytz is ottz:
base_compare = True
else:
myoff = self.utcoffset()
otoff = other.utcoffset()
base_compare = myoff == otoff
if base_compare:
return _cmp((self._year, self._month, self._day,
self._hour, self._minute, self._second,
self._microsecond),
(other._year, other._month, other._day,
other._hour, other._minute, other._second,
other._microsecond))
if myoff is None or otoff is None:
if allow_mixed:
return 2 # arbitrary non-zero value
else:
raise TypeError("cannot compare naive and aware datetimes")
# XXX What follows could be done more efficiently...
diff = self - other # this will take offsets into account
if diff.days < 0:
return -1
return diff and 1 or 0
def __add__(self, other):
"Add a datetime and a timedelta."
if not isinstance(other, timedelta):
return NotImplemented
delta = timedelta(self.toordinal(),
hours=self._hour,
minutes=self._minute,
seconds=self._second,
microseconds=self._microsecond)
delta += other
hour, rem = divmod(delta.seconds, 3600)
minute, second = divmod(rem, 60)
if 0 < delta.days <= _MAXORDINAL:
return datetime.combine(date.fromordinal(delta.days),
time(hour, minute, second,
delta.microseconds,
tzinfo=self._tzinfo))
raise OverflowError("result out of range")
__radd__ = __add__
def __sub__(self, other):
"Subtract two datetimes, or a datetime and a timedelta."
if not isinstance(other, datetime):
if isinstance(other, timedelta):
return self + -other
return NotImplemented
days1 = self.toordinal()
days2 = other.toordinal()
secs1 = self._second + self._minute * 60 + self._hour * 3600
secs2 = other._second + other._minute * 60 + other._hour * 3600
base = timedelta(days1 - days2,
secs1 - secs2,
self._microsecond - other._microsecond)
if self._tzinfo is other._tzinfo:
return base
myoff = self.utcoffset()
otoff = other.utcoffset()
if myoff == otoff:
return base
if myoff is None or otoff is None:
raise TypeError("cannot mix naive and timezone-aware time")
return base + otoff - myoff
def __hash__(self):
tzoff = self.utcoffset()
if tzoff is None:
return hash(self._getstate()[0])
days = _ymd2ord(self.year, self.month, self.day)
seconds = self.hour * 3600 + self.minute * 60 + self.second
return hash(timedelta(days, seconds, self.microsecond) - tzoff)
# Pickle support.
def _getstate(self):
yhi, ylo = divmod(self._year, 256)
us2, us3 = divmod(self._microsecond, 256)
us1, us2 = divmod(us2, 256)
basestate = bytes([yhi, ylo, self._month, self._day,
self._hour, self._minute, self._second,
us1, us2, us3])
if self._tzinfo is None:
return (basestate,)
else:
return (basestate, self._tzinfo)
def __setstate(self, string, tzinfo):
(yhi, ylo, self._month, self._day, self._hour,
self._minute, self._second, us1, us2, us3) = string
self._year = yhi * 256 + ylo
self._microsecond = (((us1 << 8) | us2) << 8) | us3
if tzinfo is None or isinstance(tzinfo, _tzinfo_class):
self._tzinfo = tzinfo
else:
raise TypeError("bad tzinfo state arg %r" % tzinfo)
def __reduce__(self):
return (self.__class__, self._getstate())
datetime.min = datetime(1, 1, 1)
datetime.max = datetime(9999, 12, 31, 23, 59, 59, 999999)
datetime.resolution = timedelta(microseconds=1)
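# Illustrative sketch (not from the upstream stdlib source): composing and
# converting datetimes with the classes above. timezone and _EPOCH are
# defined further down in this module but exist by the time this hypothetical
# helper would be called.
def _demo_datetime():
    d = date(2021, 7, 18)
    t = time(12, 0, tzinfo=timezone.utc)
    dt = datetime.combine(d, t)                        # aware, 12:00 UTC
    assert dt.isoformat() == "2021-07-18T12:00:00+00:00"
    est = timezone(timedelta(hours=-5), "EST")
    assert dt.astimezone(est).hour == 7                # 12:00 UTC == 07:00 EST
    assert dt.timestamp() == (dt - _EPOCH).total_seconds()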
def _isoweek1monday(year):
# Helper to calculate the day number of the Monday starting week 1
# XXX This could be done more efficiently
THURSDAY = 3
firstday = _ymd2ord(year, 1, 1)
firstweekday = (firstday + 6) % 7 # See weekday() above
week1monday = firstday - firstweekday
if firstweekday > THURSDAY:
week1monday += 7
return week1monday
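# Worked example (editor's addition): January 1, 2010 falls on a Friday, so
# firstweekday == 4 > THURSDAY and week 1 of 2010 starts the following
# Monday; _isoweek1monday(2010) == date(2010, 1, 4).toordinal().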
class timezone(tzinfo):
__slots__ = '_offset', '_name'
# Sentinel value to disallow None
_Omitted = object()
def __new__(cls, offset, name=_Omitted):
if not isinstance(offset, timedelta):
raise TypeError("offset must be a timedelta")
if name is cls._Omitted:
if not offset:
return cls.utc
name = None
elif not isinstance(name, str):
raise TypeError("name must be a string")
if not cls._minoffset <= offset <= cls._maxoffset:
raise ValueError("offset must be a timedelta"
" strictly between -timedelta(hours=24) and"
" timedelta(hours=24).")
if (offset.microseconds != 0 or
offset.seconds % 60 != 0):
raise ValueError("offset must be a timedelta"
" representing a whole number of minutes")
return cls._create(offset, name)
@classmethod
def _create(cls, offset, name=None):
self = tzinfo.__new__(cls)
self._offset = offset
self._name = name
return self
def __getinitargs__(self):
"""pickle support"""
if self._name is None:
return (self._offset,)
return (self._offset, self._name)
def __eq__(self, other):
if type(other) != timezone:
return False
return self._offset == other._offset
def __hash__(self):
return hash(self._offset)
def __repr__(self):
"""Convert to formal string, for repr().
>>> tz = timezone.utc
>>> repr(tz)
'datetime.timezone.utc'
>>> tz = timezone(timedelta(hours=-5), 'EST')
>>> repr(tz)
"datetime.timezone(datetime.timedelta(-1, 68400), 'EST')"
"""
if self is self.utc:
return 'datetime.timezone.utc'
if self._name is None:
return "%s(%r)" % ('datetime.' + self.__class__.__name__,
self._offset)
return "%s(%r, %r)" % ('datetime.' + self.__class__.__name__,
self._offset, self._name)
def __str__(self):
return self.tzname(None)
def utcoffset(self, dt):
if isinstance(dt, datetime) or dt is None:
return self._offset
raise TypeError("utcoffset() argument must be a datetime instance"
" or None")
def tzname(self, dt):
if isinstance(dt, datetime) or dt is None:
if self._name is None:
return self._name_from_offset(self._offset)
return self._name
raise TypeError("tzname() argument must be a datetime instance"
" or None")
def dst(self, dt):
if isinstance(dt, datetime) or dt is None:
return None
raise TypeError("dst() argument must be a datetime instance"
" or None")
def fromutc(self, dt):
if isinstance(dt, datetime):
if dt.tzinfo is not self:
raise ValueError("fromutc: dt.tzinfo "
"is not self")
return dt + self._offset
raise TypeError("fromutc() argument must be a datetime instance"
" or None")
_maxoffset = timedelta(hours=23, minutes=59)
_minoffset = -_maxoffset
@staticmethod
def _name_from_offset(delta):
if delta < timedelta(0):
sign = '-'
delta = -delta
else:
sign = '+'
hours, rest = divmod(delta, timedelta(hours=1))
minutes = rest // timedelta(minutes=1)
return 'UTC{}{:02d}:{:02d}'.format(sign, hours, minutes)
timezone.utc = timezone._create(timedelta(0))
timezone.min = timezone._create(timezone._minoffset)
timezone.max = timezone._create(timezone._maxoffset)
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
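# Illustrative sketch (not from the upstream stdlib source): the timezone
# class in action. The helper name _demo_timezone is hypothetical.
def _demo_timezone():
    est = timezone(timedelta(hours=-5), "EST")
    assert est.utcoffset(None) == timedelta(hours=-5)
    anon = timezone(timedelta(hours=5, minutes=30))    # no name supplied
    assert anon.tzname(None) == "UTC+05:30"            # derived from offset
    assert timezone(timedelta(0)) is timezone.utc      # zero offset singleton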
# Some time zone algebra. For a datetime x, let
# x.n = x stripped of its timezone -- its naive time.
# x.o = x.utcoffset(), and assuming that doesn't raise an exception or
# return None
# x.d = x.dst(), and assuming that doesn't raise an exception or
# return None
# x.s = x's standard offset, x.o - x.d
#
# Now some derived rules, where k is a duration (timedelta).
#
# 1. x.o = x.s + x.d
# This follows from the definition of x.s.
#
# 2. If x and y have the same tzinfo member, x.s = y.s.
# This is actually a requirement, an assumption we need to make about
# sane tzinfo classes.
#
# 3. The naive UTC time corresponding to x is x.n - x.o.
# This is again a requirement for a sane tzinfo class.
#
# 4. (x+k).s = x.s
# This follows from #2, and that datetime+timedelta preserves tzinfo.
#
# 5. (x+k).n = x.n + k
# Again follows from how arithmetic is defined.
#
# Now we can explain tz.fromutc(x). Let's assume it's an interesting case
# (meaning that the various tzinfo methods exist, and don't blow up or return
# None when called).
#
# The function wants to return a datetime y with timezone tz, equivalent to x.
# x is already in UTC.
#
# By #3, we want
#
# y.n - y.o = x.n [1]
#
# The algorithm starts by attaching tz to x.n, and calling that y. So
# x.n = y.n at the start. Then it wants to add a duration k to y, so that [1]
# becomes true; in effect, we want to solve [2] for k:
#
# (y+k).n - (y+k).o = x.n [2]
#
# By #1, this is the same as
#
# (y+k).n - ((y+k).s + (y+k).d) = x.n [3]
#
# By #5, (y+k).n = y.n + k, which equals x.n + k because x.n=y.n at the start.
# Substituting that into [3],
#
# x.n + k - (y+k).s - (y+k).d = x.n; the x.n terms cancel, leaving
# k - (y+k).s - (y+k).d = 0; rearranging,
# k = (y+k).s - (y+k).d; by #4, (y+k).s == y.s, so
# k = y.s - (y+k).d
#
# On the RHS, (y+k).d can't be computed directly, but y.s can be, and we
# approximate k by ignoring the (y+k).d term at first. Note that k can't be
# very large, since all offset-returning methods return a duration of magnitude
# less than 24 hours. For that reason, if y is firmly in std time, (y+k).d must
# be 0, so ignoring it has no consequence then.
#
# In any case, the new value is
#
# z = y + y.s [4]
#
# It's helpful to step back and look at [4] from a higher level: it's simply
# mapping from UTC to tz's standard time.
#
# At this point, if
#
# z.n - z.o = x.n [5]
#
# we have an equivalent time, and are almost done. The insecurity here is
# at the start of daylight time. Picture US Eastern for concreteness. The wall
# time jumps from 1:59 to 3:00, and wall hours of the form 2:MM don't make good
# sense then. The docs ask that an Eastern tzinfo class consider such a time to
# be EDT (because it's "after 2"), which is a redundant spelling of 1:MM EST
# on the day DST starts. We want to return the 1:MM EST spelling because that's
# the only spelling that makes sense on the local wall clock.
#
# In fact, if [5] holds at this point, we do have the standard-time spelling,
# but that takes a bit of proof. We first prove a stronger result. What's the
# difference between the LHS and RHS of [5]? Let
#
# diff = x.n - (z.n - z.o) [6]
#
# Now
# z.n = by [4]
# (y + y.s).n = by #5
# y.n + y.s = since y.n = x.n
# x.n + y.s = since z and y have the same tzinfo member,
# y.s = z.s by #2
# x.n + z.s
#
# Plugging that back into [6] gives
#
# diff =
# x.n - ((x.n + z.s) - z.o) = expanding
# x.n - x.n - z.s + z.o = cancelling
# - z.s + z.o = by #2
# z.d
#
# So diff = z.d.
#
# If [5] is true now, diff = 0, so z.d = 0 too, and we have the standard-time
# spelling we wanted in the endcase described above. We're done. Contrarily,
# if z.d = 0, then we have a UTC equivalent, and are also done.
#
# If [5] is not true now, diff = z.d != 0, and z.d is the offset we need to
# add to z (in effect, z is in tz's standard time, and we need to shift the
# local clock into tz's daylight time).
#
# Let
#
# z' = z + z.d = z + diff [7]
#
# and we can again ask whether
#
# z'.n - z'.o = x.n [8]
#
# If so, we're done. If not, the tzinfo class is insane, according to the
# assumptions we've made. This also requires a bit of proof. As before, let's
# compute the difference between the LHS and RHS of [8] (and skipping some of
# the justifications for the kinds of substitutions we've done several times
# already):
#
# diff' = x.n - (z'.n - z'.o) = replacing z'.n via [7]
# x.n - (z.n + diff - z'.o) = replacing diff via [6]
# x.n - (z.n + x.n - (z.n - z.o) - z'.o) =
# x.n - z.n - x.n + z.n - z.o + z'.o = cancel x.n
# - z.n + z.n - z.o + z'.o = cancel z.n
# - z.o + z'.o = #1 twice
# -z.s - z.d + z'.s + z'.d = z and z' have same tzinfo
# z'.d - z.d
#
# So z' is UTC-equivalent to x iff z'.d = z.d at this point. If they are equal,
# we've found the UTC-equivalent so are done. In fact, we stop with [7] and
# return z', not bothering to compute z'.d.
#
# How could z.d and z'.d differ? z' = z + z.d [7], so merely moving z' by
# a dst() offset, and starting *from* a time already in DST (we know z.d != 0),
# would have to change the result dst() returns: we start in DST, and moving
# a little further into it takes us out of DST.
#
# There isn't a sane case where this can happen. The closest it gets is at
# the end of DST, where there's an hour in UTC with no spelling in a hybrid
# tzinfo class. In US Eastern, that's 5:MM UTC = 0:MM EST = 1:MM EDT. During
# that hour, on an Eastern clock 1:MM is taken as being in standard time (6:MM
# UTC) because the docs insist on that, but 0:MM is taken as being in daylight
# time (4:MM UTC). There is no local time mapping to 5:MM UTC. The local
# clock jumps from 1:59 back to 1:00 again, and repeats the 1:MM hour in
# standard time. Since that's what the local clock *does*, we want to map both
# UTC hours 5:MM and 6:MM to 1:MM Eastern. The result is ambiguous
# in local time, but so it goes -- it's the way the local clock works.
#
# When x = 5:MM UTC is the input to this algorithm, x.o=0, y.o=-5 and y.d=0,
# so z=0:MM. z.d=60 (minutes) then, so [5] doesn't hold and we keep going.
# z' = z + z.d = 1:MM then, and z'.d=0, and z'.d - z.d = -60 != 0 so [8]
# (correctly) concludes that z' is not UTC-equivalent to x.
#
# Because we know z.d said z was in daylight time (else [5] would have held and
# we would have stopped then), and we know z.d != z'.d (else [8] would have held
# and we have stopped then), and there are only 2 possible values dst() can
# return in Eastern, it follows that z'.d must be 0 (which it is in the example,
# but the reasoning doesn't depend on the example -- it depends on there being
# two possible dst() outcomes, one zero and the other non-zero). Therefore
# z' must be in standard time, and is the spelling we want in this case.
#
# Note again that z' is not UTC-equivalent as far as the hybrid tzinfo class is
# concerned (because it takes z' as being in standard time rather than the
# daylight time we intend here), but returning it gives the real-life "local
# clock repeats an hour" behavior when mapping the "unspellable" UTC hour into
# tz.
#
# When the input is 6:MM, z=1:MM and z.d=0, and we stop at once, again with
# the 1:MM standard time spelling we want.
#
# So how can this break? One of the assumptions must be violated. Two
# possibilities:
#
# 1) [2] effectively says that y.s is invariant across all y belonging to a given
# time zone. This isn't true if, for political reasons or continental drift,
# a region decides to change its base offset from UTC.
#
# 2) There may be versions of "double daylight" time where the tail end of
# the analysis gives up a step too early. I haven't thought about that
# enough to say.
#
# In any case, it's clear that the default fromutc() is strong enough to handle
# "almost all" time zones: so long as the standard offset is invariant, it
# doesn't matter if daylight time transition points change from year to year, or
# if daylight time is skipped in some years; it doesn't matter how large or
# small dst() may get within its bounds; and it doesn't even matter if some
# perverse time zone returns a negative dst(). So a breaking case must be
# pretty bizarre, and a tzinfo subclass can override fromutc() if it is.
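# Illustrative sketch (not from the upstream stdlib source): a toy "hybrid"
# tzinfo that exercises the default fromutc() analyzed above. The 2021 US
# Eastern transition dates are hard-coded assumptions for the demo only; see
# the tzinfo examples in the Python documentation for a real implementation.
class _DemoEastern(tzinfo):
    # DST ran 2021-03-14 02:00 (standard time) through 2021-11-07 02:00
    # (daylight time). Per the convention discussed above, the end point is
    # expressed in standard time (01:00), so wall times 1:MM on the
    # fall-back day are taken as standard time.
    _dst_start = datetime(2021, 3, 14, 2)
    _dst_end = datetime(2021, 11, 7, 1)
    def dst(self, dt):
        if self._dst_start <= dt.replace(tzinfo=None) < self._dst_end:
            return timedelta(hours=1)
        return timedelta(0)
    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)
    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

def _demo_fromutc():
    eastern = _DemoEastern()
    # Both 5:30 and 6:30 UTC on the fall-back day map to the repeated 1:30
    # local hour -- the "unspellable UTC hour" behavior derived above.
    for hh in (5, 6):
        u = datetime(2021, 11, 7, hh, 30, tzinfo=eastern)
        assert eastern.fromutc(u).time() == time(1, 30)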
try:
from _datetime import *
except ImportError:
pass
else:
# Clean up unused names
del (_DAYNAMES, _DAYS_BEFORE_MONTH, _DAYS_IN_MONTH,
_DI100Y, _DI400Y, _DI4Y, _MAXORDINAL, _MONTHNAMES,
_build_struct_time, _call_tzinfo_method, _check_date_fields,
_check_time_fields, _check_tzinfo_arg, _check_tzname,
_check_utc_offset, _cmp, _cmperror, _date_class, _days_before_month,
_days_before_year, _days_in_month, _format_time, _is_leap,
_isoweek1monday, _math, _ord2ymd, _time, _time_class, _tzinfo_class,
_wrap_strftime, _ymd2ord)
# XXX Since import * above excludes names that start with _,
# docstring does not get overwritten. In the future, it may be
# appropriate to maintain a single module level docstring and
# remove the following line.
from _datetime import __doc__
# Copyright (c) 2004 Python Software Foundation.
# All rights reserved.
# Written by Eric Price <eprice at tjhsst.edu>
# and Facundo Batista <facundo at taniquetil.com.ar>
# and Raymond Hettinger <python at rcn.com>
# and Aahz <aahz at pobox.com>
# and Tim Peters
# This module should be kept in sync with the latest updates of the
# IBM specification as it evolves. Those updates will be treated
# as bug fixes (deviation from the spec is a compatibility, usability
# bug) and will be backported. At this point the spec is stabilizing
# and the updates are becoming fewer, smaller, and less significant.
"""
This is an implementation of decimal floating point arithmetic based on
the General Decimal Arithmetic Specification:
http://speleotrove.com/decimal/decarith.html
and IEEE standard 854-1987:
http://en.wikipedia.org/wiki/IEEE_854-1987
Decimal floating point has finite precision with arbitrarily large bounds.
The purpose of this module is to support arithmetic using familiar
"schoolhouse" rules and to avoid some of the tricky representation
issues associated with binary floating point. The package is especially
useful for financial applications or for contexts where users have
expectations that are at odds with binary floating point (for instance,
in binary floating point, 1.00 % 0.1 gives 0.09999999999999995 instead
of 0.0; Decimal('1.00') % Decimal('0.1') returns the expected
Decimal('0.00')).
Here are some examples of using the decimal module:
>>> from decimal import *
>>> setcontext(ExtendedContext)
>>> Decimal(0)
Decimal('0')
>>> Decimal('1')
Decimal('1')
>>> Decimal('-.0123')
Decimal('-0.0123')
>>> Decimal(123456)
Decimal('123456')
>>> Decimal('123.45e12345678')
Decimal('1.2345E+12345680')
>>> Decimal('1.33') + Decimal('1.27')
Decimal('2.60')
>>> Decimal('12.34') + Decimal('3.87') - Decimal('18.41')
Decimal('-2.20')
>>> dig = Decimal(1)
>>> print(dig / Decimal(3))
0.333333333
>>> getcontext().prec = 18
>>> print(dig / Decimal(3))
0.333333333333333333
>>> print(dig.sqrt())
1
>>> print(Decimal(3).sqrt())
1.73205080756887729
>>> print(Decimal(3) ** 123)
4.85192780976896427E+58
>>> inf = Decimal(1) / Decimal(0)
>>> print(inf)
Infinity
>>> neginf = Decimal(-1) / Decimal(0)
>>> print(neginf)
-Infinity
>>> print(neginf + inf)
NaN
>>> print(neginf * inf)
-Infinity
>>> print(dig / 0)
Infinity
>>> getcontext().traps[DivisionByZero] = 1
>>> print(dig / 0)
Traceback (most recent call last):
...
...
...
decimal.DivisionByZero: x / 0
>>> c = Context()
>>> c.traps[InvalidOperation] = 0
>>> print(c.flags[InvalidOperation])
0
>>> c.divide(Decimal(0), Decimal(0))
Decimal('NaN')
>>> c.traps[InvalidOperation] = 1
>>> print(c.flags[InvalidOperation])
1
>>> c.flags[InvalidOperation] = 0
>>> print(c.flags[InvalidOperation])
0
>>> print(c.divide(Decimal(0), Decimal(0)))
Traceback (most recent call last):
...
...
...
decimal.InvalidOperation: 0 / 0
>>> print(c.flags[InvalidOperation])
1
>>> c.flags[InvalidOperation] = 0
>>> c.traps[InvalidOperation] = 0
>>> print(c.divide(Decimal(0), Decimal(0)))
NaN
>>> print(c.flags[InvalidOperation])
1
>>>
"""
__all__ = [
# Two major classes
'Decimal', 'Context',
# Named tuple representation
'DecimalTuple',
# Contexts
'DefaultContext', 'BasicContext', 'ExtendedContext',
# Exceptions
'DecimalException', 'Clamped', 'InvalidOperation', 'DivisionByZero',
'Inexact', 'Rounded', 'Subnormal', 'Overflow', 'Underflow',
'FloatOperation',
# Exceptional conditions that trigger InvalidOperation
'DivisionImpossible', 'InvalidContext', 'ConversionSyntax', 'DivisionUndefined',
# Constants for use in setting up contexts
'ROUND_DOWN', 'ROUND_HALF_UP', 'ROUND_HALF_EVEN', 'ROUND_CEILING',
'ROUND_FLOOR', 'ROUND_UP', 'ROUND_HALF_DOWN', 'ROUND_05UP',
# Functions for manipulating contexts
'setcontext', 'getcontext', 'localcontext',
# Limits for the C version for compatibility
'MAX_PREC', 'MAX_EMAX', 'MIN_EMIN', 'MIN_ETINY',
# C version: compile time choice that enables the thread local context
'HAVE_THREADS'
]
__version__ = '1.70' # Highest version of the spec this complies with
# See http://speleotrove.com/decimal/
__libmpdec_version__ = "2.4.1" # compatible libmpdec version
import math as _math
import numbers as _numbers
import sys
is_cli = sys.implementation.name == 'ironpython'
try:
from collections import namedtuple as _namedtuple
DecimalTuple = _namedtuple('DecimalTuple', 'sign digits exponent')
except ImportError:
DecimalTuple = lambda *args: args
# Rounding
ROUND_DOWN = 'ROUND_DOWN'
ROUND_HALF_UP = 'ROUND_HALF_UP'
ROUND_HALF_EVEN = 'ROUND_HALF_EVEN'
ROUND_CEILING = 'ROUND_CEILING'
ROUND_FLOOR = 'ROUND_FLOOR'
ROUND_UP = 'ROUND_UP'
ROUND_HALF_DOWN = 'ROUND_HALF_DOWN'
ROUND_05UP = 'ROUND_05UP'
# Compatibility with the C version
HAVE_THREADS = True
if sys.maxsize == 2**63-1:
MAX_PREC = 999999999999999999
MAX_EMAX = 999999999999999999
MIN_EMIN = -999999999999999999
else:
MAX_PREC = 425000000
MAX_EMAX = 425000000
MIN_EMIN = -425000000
MIN_ETINY = MIN_EMIN - (MAX_PREC-1)
# Errors
class DecimalException(ArithmeticError):
"""Base exception class.
Used exceptions derive from this.
    If an exception derives from another exception besides this (such as
    Underflow, which also derives from Inexact, Rounded, and Subnormal),
    that indicates that it is only raised if the others are present. This
    isn't actually used for
anything, though.
handle -- Called when context._raise_error is called and the
trap_enabler is not set. First argument is self, second is the
context. More arguments can be given, those being after
the explanation in _raise_error (For example,
context._raise_error(NewError, '(-x)!', self._sign) would
call NewError().handle(context, self._sign).)
To define a new exception, it should be sufficient to have it derive
from DecimalException.
"""
def handle(self, context, *args):
pass
class Clamped(DecimalException):
"""Exponent of a 0 changed to fit bounds.
This occurs and signals clamped if the exponent of a result has been
altered in order to fit the constraints of a specific concrete
representation. This may occur when the exponent of a zero result would
be outside the bounds of a representation, or when a large normal
number would have an encoded exponent that cannot be represented. In
this latter case, the exponent is reduced to fit and the corresponding
number of zero digits are appended to the coefficient ("fold-down").
"""
class InvalidOperation(DecimalException):
"""An invalid operation was performed.
Various bad things cause this:
Something creates a signaling NaN
-INF + INF
0 * (+-)INF
(+-)INF / (+-)INF
x % 0
(+-)INF % x
x._rescale( non-integer )
        sqrt(-x), x > 0
0 ** 0
x ** (non-integer)
x ** (+-)INF
An operand is invalid
The result of the operation after these is a quiet positive NaN,
except when the cause is a signaling NaN, in which case the result is
also a quiet NaN, but with the original sign, and an optional
diagnostic information.
"""
def handle(self, context, *args):
if args:
ans = _dec_from_triple(args[0]._sign, args[0]._int, 'n', True)
return ans._fix_nan(context)
return _NaN
class ConversionSyntax(InvalidOperation):
"""Trying to convert badly formed string.
    This occurs and signals invalid-operation if a string is being
converted to a number and it does not conform to the numeric string
syntax. The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class DivisionByZero(DecimalException, ZeroDivisionError):
"""Division by 0.
This occurs and signals division-by-zero if division of a finite number
by zero was attempted (during a divide-integer or divide operation, or a
power operation with negative right-hand operand), and the dividend was
not zero.
The result of the operation is [sign,inf], where sign is the exclusive
or of the signs of the operands for divide, or is 1 for an odd power of
-0, for power.
"""
def handle(self, context, sign, *args):
return _SignedInfinity[sign]
class DivisionImpossible(InvalidOperation):
"""Cannot perform the division adequately.
This occurs and signals invalid-operation if the integer result of a
divide-integer or remainder operation had too many digits (would be
longer than precision). The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class DivisionUndefined(InvalidOperation, ZeroDivisionError):
"""Undefined result of division.
This occurs and signals invalid-operation if division by zero was
attempted (during a divide-integer, divide, or remainder operation), and
the dividend is also zero. The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class Inexact(DecimalException):
"""Had to round, losing information.
This occurs and signals inexact whenever the result of an operation is
not exact (that is, it needed to be rounded and any discarded digits
were non-zero), or if an overflow or underflow condition occurs. The
result in all cases is unchanged.
The inexact signal may be tested (or trapped) to determine if a given
operation (or sequence of operations) was inexact.
"""
class InvalidContext(InvalidOperation):
"""Invalid context. Unknown rounding, for example.
This occurs and signals invalid-operation if an invalid context was
detected during an operation. This can occur if contexts are not checked
on creation and either the precision exceeds the capability of the
underlying concrete representation or an unknown or unsupported rounding
was specified. These aspects of the context need only be checked when
the values are required to be used. The result is [0,qNaN].
"""
def handle(self, context, *args):
return _NaN
class Rounded(DecimalException):
"""Number got rounded (not necessarily changed during rounding).
This occurs and signals rounded whenever the result of an operation is
rounded (that is, some zero or non-zero digits were discarded from the
coefficient), or if an overflow or underflow condition occurs. The
result in all cases is unchanged.
The rounded signal may be tested (or trapped) to determine if a given
operation (or sequence of operations) caused a loss of precision.
"""
class Subnormal(DecimalException):
"""Exponent < Emin before rounding.
This occurs and signals subnormal whenever the result of a conversion or
operation is subnormal (that is, its adjusted exponent is less than
Emin, before any rounding). The result in all cases is unchanged.
The subnormal signal may be tested (or trapped) to determine if a given
    operation (or sequence of operations) yielded a subnormal result.
"""
class Overflow(Inexact, Rounded):
"""Numerical overflow.
This occurs and signals overflow if the adjusted exponent of a result
(from a conversion or from an operation that is not an attempt to divide
by zero), after rounding, would be greater than the largest value that
can be handled by the implementation (the value Emax).
The result depends on the rounding mode:
For round-half-up and round-half-even (and for round-half-down and
round-up, if implemented), the result of the operation is [sign,inf],
where sign is the sign of the intermediate result. For round-down, the
result is the largest finite number that can be represented in the
current precision, with the sign of the intermediate result. For
round-ceiling, the result is the same as for round-down if the sign of
the intermediate result is 1, or is [0,inf] otherwise. For round-floor,
the result is the same as for round-down if the sign of the intermediate
result is 0, or is [1,inf] otherwise. In all cases, Inexact and Rounded
will also be raised.
"""
def handle(self, context, sign, *args):
if context.rounding in (ROUND_HALF_UP, ROUND_HALF_EVEN,
ROUND_HALF_DOWN, ROUND_UP):
return _SignedInfinity[sign]
if sign == 0:
if context.rounding == ROUND_CEILING:
return _SignedInfinity[sign]
return _dec_from_triple(sign, '9'*context.prec,
context.Emax-context.prec+1)
if sign == 1:
if context.rounding == ROUND_FLOOR:
return _SignedInfinity[sign]
return _dec_from_triple(sign, '9'*context.prec,
context.Emax-context.prec+1)
class Underflow(Inexact, Rounded, Subnormal):
"""Numerical underflow with result rounded to 0.
This occurs and signals underflow if a result is inexact and the
adjusted exponent of the result would be smaller (more negative) than
the smallest value that can be handled by the implementation (the value
Emin). That is, the result is both inexact and subnormal.
The result after an underflow will be a subnormal number rounded, if
necessary, so that its exponent is not less than Etiny. This may result
in 0 with the sign of the intermediate result and an exponent of Etiny.
In all cases, Inexact, Rounded, and Subnormal will also be raised.
"""
class FloatOperation(DecimalException, TypeError):
"""Enable stricter semantics for mixing floats and Decimals.
If the signal is not trapped (default), mixing floats and Decimals is
permitted in the Decimal() constructor, context.create_decimal() and
all comparison operators. Both conversion and comparisons are exact.
Any occurrence of a mixed operation is silently recorded by setting
FloatOperation in the context flags. Explicit conversions with
Decimal.from_float() or context.create_decimal_from_float() do not
set the flag.
Otherwise (the signal is trapped), only equality comparisons and explicit
conversions are silent. All other mixed operations raise FloatOperation.
"""
# List of public traps and flags
_signals = [Clamped, DivisionByZero, Inexact, Overflow, Rounded,
Underflow, InvalidOperation, Subnormal, FloatOperation]
# Map conditions (per the spec) to signals
_condition_map = {ConversionSyntax:InvalidOperation,
DivisionImpossible:InvalidOperation,
DivisionUndefined:InvalidOperation,
InvalidContext:InvalidOperation}
# Valid rounding modes
_rounding_modes = (ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_CEILING,
ROUND_FLOOR, ROUND_UP, ROUND_HALF_DOWN, ROUND_05UP)
##### Context Functions ##################################################
# The getcontext() and setcontext() functions manage access to a thread-local
# current context. Py2.4 offers direct support for thread locals. If that
# is not available, use threading.current_thread() which is slower but will
# work for older Pythons. If threads are not part of the build, create a
# mock threading object with threading.local() returning the module namespace.
try:
import threading
except ImportError:
# Python was compiled without threads; create a mock object instead
class MockThreading(object):
def local(self, sys=sys):
return sys.modules[__name__]
threading = MockThreading()
del MockThreading
try:
threading.local
except AttributeError:
# To fix reloading, force it to create a new context
# Old contexts have different exceptions in their dicts, making problems.
if hasattr(threading.current_thread(), '__decimal_context__'):
del threading.current_thread().__decimal_context__
def setcontext(context):
"""Set this thread's context to context."""
if context in (DefaultContext, BasicContext, ExtendedContext):
context = context.copy()
context.clear_flags()
threading.current_thread().__decimal_context__ = context
def getcontext():
"""Returns this thread's context.
If this thread does not yet have a context, returns
a new context and sets this thread's context.
New contexts are copies of DefaultContext.
"""
try:
return threading.current_thread().__decimal_context__
except AttributeError:
context = Context()
threading.current_thread().__decimal_context__ = context
return context
else:
local = threading.local()
if hasattr(local, '__decimal_context__'):
del local.__decimal_context__
def getcontext(_local=local):
"""Returns this thread's context.
If this thread does not yet have a context, returns
a new context and sets this thread's context.
New contexts are copies of DefaultContext.
"""
try:
return _local.__decimal_context__
except AttributeError:
context = Context()
_local.__decimal_context__ = context
return context
def setcontext(context, _local=local):
"""Set this thread's context to context."""
if context in (DefaultContext, BasicContext, ExtendedContext):
context = context.copy()
context.clear_flags()
_local.__decimal_context__ = context
del threading, local # Don't contaminate the namespace
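# Illustrative sketch (not from the upstream stdlib source): each thread sees
# its own context, as the machinery above arranges. The helper name is
# hypothetical and it is never called at import time.
def _demo_thread_local_context():
    import threading
    seen = []
    def worker():
        getcontext().prec = 5           # affects the worker thread only
        seen.append(getcontext().prec)
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    seen.append(getcontext().prec)      # main thread keeps its own precision
    return seen                         # e.g. [5, 28] with default contexts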
def localcontext(ctx=None):
"""Return a context manager for a copy of the supplied context
Uses a copy of the current context if no context is specified
The returned context manager creates a local decimal context
in a with statement:
def sin(x):
with localcontext() as ctx:
ctx.prec += 2
# Rest of sin calculation algorithm
# uses a precision 2 greater than normal
return +s # Convert result to normal precision
def sin(x):
with localcontext(ExtendedContext):
# Rest of sin calculation algorithm
# uses the Extended Context from the
# General Decimal Arithmetic Specification
return +s # Convert result to normal context
>>> setcontext(DefaultContext)
>>> print(getcontext().prec)
28
>>> with localcontext():
... ctx = getcontext()
... ctx.prec += 2
... print(ctx.prec)
...
30
>>> with localcontext(ExtendedContext):
... print(getcontext().prec)
...
9
>>> print(getcontext().prec)
28
"""
if ctx is None: ctx = getcontext()
return _ContextManager(ctx)
##### Decimal class #######################################################
# Do not subclass Decimal from numbers.Real and do not register it as such
# (because Decimals are not interoperable with floats). See the notes in
# numbers.py for more detail.
class Decimal(object):
"""Floating point class for decimal arithmetic."""
__slots__ = ('_exp','_int','_sign', '_is_special')
# Generally, the value of the Decimal instance is given by
# (-1)**_sign * _int * 10**_exp
# Special values are signified by _is_special == True
# We're immutable, so use __new__ not __init__
def __new__(cls, value="0", context=None):
"""Create a decimal point instance.
>>> Decimal('3.14') # string input
Decimal('3.14')
>>> Decimal((0, (3, 1, 4), -2)) # tuple (sign, digit_tuple, exponent)
Decimal('3.14')
>>> Decimal(314) # int
Decimal('314')
>>> Decimal(Decimal(314)) # another decimal instance
Decimal('314')
>>> Decimal(' 3.14 \\n') # leading and trailing whitespace okay
Decimal('3.14')
"""
# Note that the coefficient, self._int, is actually stored as
# a string rather than as a tuple of digits. This speeds up
# the "digits to integer" and "integer to digits" conversions
# that are used in almost every arithmetic operation on
# Decimals. This is an internal detail: the as_tuple function
# and the Decimal constructor still deal with tuples of
# digits.
self = object.__new__(cls)
if is_cli:
import System
if isinstance(value, System.Decimal):
value = str(value)
# From a string
# REs insist on real strings, so we can too.
if isinstance(value, str):
m = _parser(value.strip())
if m is None:
if context is None:
context = getcontext()
return context._raise_error(ConversionSyntax,
"Invalid literal for Decimal: %r" % value)
if m.group('sign') == "-":
self._sign = 1
else:
self._sign = 0
intpart = m.group('int')
if intpart is not None:
# finite number
fracpart = m.group('frac') or ''
exp = int(m.group('exp') or '0')
self._int = str(int(intpart+fracpart))
self._exp = exp - len(fracpart)
self._is_special = False
else:
diag = m.group('diag')
if diag is not None:
# NaN
self._int = str(int(diag or '0')).lstrip('0')
if m.group('signal'):
self._exp = 'N'
else:
self._exp = 'n'
else:
# infinity
self._int = '0'
self._exp = 'F'
self._is_special = True
return self
# From an integer
if isinstance(value, int):
if value >= 0:
self._sign = 0
else:
self._sign = 1
self._exp = 0
self._int = str(abs(value))
self._is_special = False
return self
# From another decimal
if isinstance(value, Decimal):
self._exp = value._exp
self._sign = value._sign
self._int = value._int
self._is_special = value._is_special
return self
# From an internal working value
if isinstance(value, _WorkRep):
self._sign = value.sign
self._int = str(value.int)
self._exp = int(value.exp)
self._is_special = False
return self
# tuple/list conversion (possibly from as_tuple())
if isinstance(value, (list,tuple)):
if len(value) != 3:
raise ValueError('Invalid tuple size in creation of Decimal '
'from list or tuple. The list or tuple '
'should have exactly three elements.')
# process sign. The isinstance test rejects floats
if not (isinstance(value[0], int) and value[0] in (0,1)):
raise ValueError("Invalid sign. The first value in the tuple "
"should be an integer; either 0 for a "
"positive number or 1 for a negative number.")
self._sign = value[0]
if value[2] == 'F':
# infinity: value[1] is ignored
self._int = '0'
self._exp = value[2]
self._is_special = True
else:
# process and validate the digits in value[1]
digits = []
for digit in value[1]:
if isinstance(digit, int) and 0 <= digit <= 9:
# skip leading zeros
if digits or digit != 0:
digits.append(digit)
else:
raise ValueError("The second value in the tuple must "
"be composed of integers in the range "
"0 through 9.")
if value[2] in ('n', 'N'):
# NaN: digits form the diagnostic
self._int = ''.join(map(str, digits))
self._exp = value[2]
self._is_special = True
elif isinstance(value[2], int):
# finite number: digits give the coefficient
self._int = ''.join(map(str, digits or [0]))
self._exp = value[2]
self._is_special = False
else:
raise ValueError("The third value in the tuple must "
"be an integer, or one of the "
"strings 'F', 'n', 'N'.")
return self
if isinstance(value, float):
if context is None:
context = getcontext()
context._raise_error(FloatOperation,
"strict semantics for mixing floats and Decimals are "
"enabled")
value = Decimal.from_float(value)
self._exp = value._exp
self._sign = value._sign
self._int = value._int
self._is_special = value._is_special
return self
raise TypeError("Cannot convert %r to Decimal" % value)
@classmethod
def from_float(cls, f):
"""Converts a float to a decimal number, exactly.
Note that Decimal.from_float(0.1) is not the same as Decimal('0.1').
Since 0.1 is not exactly representable in binary floating point, the
value is stored as the nearest representable value which is
0x1.999999999999ap-4. The exact equivalent of the value in decimal
is 0.1000000000000000055511151231257827021181583404541015625.
>>> Decimal.from_float(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal.from_float(float('nan'))
Decimal('NaN')
>>> Decimal.from_float(float('inf'))
Decimal('Infinity')
>>> Decimal.from_float(-float('inf'))
Decimal('-Infinity')
>>> Decimal.from_float(-0.0)
Decimal('-0')
"""
if isinstance(f, int): # handle integer inputs
return cls(f)
if not isinstance(f, float):
raise TypeError("argument must be int or float.")
if _math.isinf(f) or _math.isnan(f):
return cls(repr(f))
if _math.copysign(1.0, f) == 1.0:
sign = 0
else:
sign = 1
n, d = abs(f).as_integer_ratio()
k = d.bit_length() - 1
result = _dec_from_triple(sign, str(n*5**k), -k)
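# Worked example of the conversion above: for f = 0.5,
# as_integer_ratio() gives (1, 2), so k = 1 and the triple is
# (sign, '5', -1), i.e. Decimal('0.5') exactly; in general
# n/d == n*5**k / 10**k whenever d == 2**k.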
if cls is Decimal:
return result
else:
return cls(result)
def _isnan(self):
"""Returns whether the number is not actually one.
0 if a number
1 if NaN
2 if sNaN
"""
if self._is_special:
exp = self._exp
if exp == 'n':
return 1
elif exp == 'N':
return 2
return 0
def _isinfinity(self):
"""Returns whether the number is infinite
0 if finite or not a number
1 if +INF
-1 if -INF
"""
if self._exp == 'F':
if self._sign:
return -1
return 1
return 0
def _check_nans(self, other=None, context=None):
"""Returns whether the number is not actually one.
if self, other are sNaN, signal
if self, other are NaN return nan
return 0
Done before operations.
"""
self_is_nan = self._isnan()
if other is None:
other_is_nan = False
else:
other_is_nan = other._isnan()
if self_is_nan or other_is_nan:
if context is None:
context = getcontext()
if self_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
self)
if other_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
other)
if self_is_nan:
return self._fix_nan(context)
return other._fix_nan(context)
return 0
def _compare_check_nans(self, other, context):
"""Version of _check_nans used for the signaling comparisons
compare_signal, __le__, __lt__, __ge__, __gt__.
Signal InvalidOperation if either self or other is a (quiet
or signaling) NaN. Signaling NaNs take precedence over quiet
NaNs.
Return 0 if neither operand is a NaN.
"""
if context is None:
context = getcontext()
if self._is_special or other._is_special:
if self.is_snan():
return context._raise_error(InvalidOperation,
'comparison involving sNaN',
self)
elif other.is_snan():
return context._raise_error(InvalidOperation,
'comparison involving sNaN',
other)
elif self.is_qnan():
return context._raise_error(InvalidOperation,
'comparison involving NaN',
self)
elif other.is_qnan():
return context._raise_error(InvalidOperation,
'comparison involving NaN',
other)
return 0
def __bool__(self):
"""Return True if self is nonzero; otherwise return False.
NaNs and infinities are considered nonzero.
"""
return self._is_special or self._int != '0'
def _cmp(self, other):
"""Compare the two non-NaN decimal instances self and other.
Returns -1 if self < other, 0 if self == other and 1
if self > other. This routine is for internal use only."""
if self._is_special or other._is_special:
self_inf = self._isinfinity()
other_inf = other._isinfinity()
if self_inf == other_inf:
return 0
elif self_inf < other_inf:
return -1
else:
return 1
# check for zeros; Decimal('0') == Decimal('-0')
if not self:
if not other:
return 0
else:
return -((-1)**other._sign)
if not other:
return (-1)**self._sign
# If different signs, neg one is less
if other._sign < self._sign:
return -1
if self._sign < other._sign:
return 1
self_adjusted = self.adjusted()
other_adjusted = other.adjusted()
if self_adjusted == other_adjusted:
self_padded = self._int + '0'*(self._exp - other._exp)
other_padded = other._int + '0'*(other._exp - self._exp)
if self_padded == other_padded:
return 0
elif self_padded < other_padded:
return -(-1)**self._sign
else:
return (-1)**self._sign
elif self_adjusted > other_adjusted:
return (-1)**self._sign
else: # self_adjusted < other_adjusted
return -((-1)**self._sign)
# Note: The Decimal standard doesn't cover rich comparisons for
# Decimals. In particular, the specification is silent on the
# subject of what should happen for a comparison involving a NaN.
# We take the following approach:
#
# == comparisons involving a quiet NaN always return False
# != comparisons involving a quiet NaN always return True
# == or != comparisons involving a signaling NaN signal
# InvalidOperation, and return False or True as above if the
# InvalidOperation is not trapped.
# <, >, <= and >= comparisons involving a (quiet or signaling)
# NaN signal InvalidOperation, and return False if the
# InvalidOperation is not trapped.
#
# This behavior is designed to conform as closely as possible to
# that specified by IEEE 754.
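# For example, with quiet NaNs (note that the default context traps
# InvalidOperation, so the ordering comparison below raises):
#   >>> Decimal('NaN') == Decimal('NaN')
#   False
#   >>> Decimal('NaN') != Decimal('NaN')
#   True
#   >>> Decimal('NaN') < Decimal('1')   # signals InvalidOperation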
def __eq__(self, other, context=None):
self, other = _convert_for_comparison(self, other, equality_op=True)
if other is NotImplemented:
return other
if self._check_nans(other, context):
return False
return self._cmp(other) == 0
def __ne__(self, other, context=None):
self, other = _convert_for_comparison(self, other, equality_op=True)
if other is NotImplemented:
return other
if self._check_nans(other, context):
return True
return self._cmp(other) != 0
def __lt__(self, other, context=None):
self, other = _convert_for_comparison(self, other)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) < 0
def __le__(self, other, context=None):
self, other = _convert_for_comparison(self, other)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) <= 0
def __gt__(self, other, context=None):
self, other = _convert_for_comparison(self, other)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) > 0
def __ge__(self, other, context=None):
self, other = _convert_for_comparison(self, other)
if other is NotImplemented:
return other
ans = self._compare_check_nans(other, context)
if ans:
return False
return self._cmp(other) >= 0
def compare(self, other, context=None):
"""Compare self to other. Return a decimal value:
a or b is a NaN ==> Decimal('NaN')
a < b ==> Decimal('-1')
a == b ==> Decimal('0')
a > b ==> Decimal('1')
"""
other = _convert_other(other, raiseit=True)
# Compare(NaN, NaN) = NaN
if (self._is_special or other and other._is_special):
ans = self._check_nans(other, context)
if ans:
return ans
return Decimal(self._cmp(other))
def __hash__(self):
"""x.__hash__() <==> hash(x)"""
# In order to make sure that the hash of a Decimal instance
# agrees with the hash of a numerically equal integer, float
# or Fraction, we follow the rules for numeric hashes outlined
# in the documentation. (See library docs, 'Built-in Types').
if self._is_special:
if self.is_snan():
raise TypeError('Cannot hash a signaling NaN value.')
elif self.is_nan():
return _PyHASH_NAN
else:
if self._sign:
return -_PyHASH_INF
else:
return _PyHASH_INF
if self._exp >= 0:
exp_hash = pow(10, self._exp, _PyHASH_MODULUS)
else:
exp_hash = pow(_PyHASH_10INV, -self._exp, _PyHASH_MODULUS)
hash_ = int(self._int) * exp_hash % _PyHASH_MODULUS
ans = hash_ if self >= 0 else -hash_
return -2 if ans == -1 else ans
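# A consequence of following the numeric-hash rules: numerically equal
# values hash alike across types, e.g. hash(Decimal('1.5')) == hash(1.5)
# and hash(Decimal(42)) == hash(42).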
def as_tuple(self):
"""Represents the number as a triple tuple.
To show the internals exactly as they are.
"""
return DecimalTuple(self._sign, tuple(map(int, self._int)), self._exp)
def __repr__(self):
"""Represents the number as an instance of Decimal."""
# Invariant: eval(repr(d)) == d
return "Decimal('%s')" % str(self)
def __str__(self, eng=False, context=None):
"""Return string representation of the number in scientific notation.
Captures all of the information in the underlying representation.
"""
sign = ['', '-'][self._sign]
if self._is_special:
if self._exp == 'F':
return sign + 'Infinity'
elif self._exp == 'n':
return sign + 'NaN' + self._int
else: # self._exp == 'N'
return sign + 'sNaN' + self._int
# number of digits of self._int to left of decimal point
leftdigits = self._exp + len(self._int)
# dotplace is number of digits of self._int to the left of the
# decimal point in the mantissa of the output string (that is,
# after adjusting the exponent)
if self._exp <= 0 and leftdigits > -6:
# no exponent required
dotplace = leftdigits
elif not eng:
# usual scientific notation: 1 digit on left of the point
dotplace = 1
elif self._int == '0':
# engineering notation, zero
dotplace = (leftdigits + 1) % 3 - 1
else:
# engineering notation, nonzero
dotplace = (leftdigits - 1) % 3 + 1
if dotplace <= 0:
intpart = '0'
fracpart = '.' + '0'*(-dotplace) + self._int
elif dotplace >= len(self._int):
intpart = self._int+'0'*(dotplace-len(self._int))
fracpart = ''
else:
intpart = self._int[:dotplace]
fracpart = '.' + self._int[dotplace:]
if leftdigits == dotplace:
exp = ''
else:
if context is None:
context = getcontext()
exp = ['e', 'E'][context.capitals] + "%+d" % (leftdigits-dotplace)
return sign + intpart + fracpart + exp
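# Examples of the placement rules above:
#   >>> str(Decimal('123E+1'))    # exponent needed; 1 digit before point
#   '1.23E+3'
#   >>> str(Decimal('0.00012'))   # _exp <= 0 and leftdigits > -6
#   '0.00012'
#   >>> str(Decimal('1.23E-8'))   # leftdigits <= -6; exponent used
#   '1.23E-8'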
def to_eng_string(self, context=None):
"""Convert to engineering-type string.
Engineering notation has an exponent which is a multiple of 3, so there
are up to 3 digits left of the decimal place.
The same rules as in __str__ determine when exponential notation is used.
"""
return self.__str__(eng=True, context=context)
def __neg__(self, context=None):
"""Returns a copy with the sign switched.
Rounds if the context requires it.
"""
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if context is None:
context = getcontext()
if not self and context.rounding != ROUND_FLOOR:
# -Decimal('0') is Decimal('0'), not Decimal('-0'), except
# in ROUND_FLOOR rounding mode.
ans = self.copy_abs()
else:
ans = self.copy_negate()
return ans._fix(context)
def __pos__(self, context=None):
"""Returns a copy, unless it is a sNaN.
Rounds the number (if it has more than precision digits).
"""
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if context is None:
context = getcontext()
if not self and context.rounding != ROUND_FLOOR:
# + (-0) = 0, except in ROUND_FLOOR rounding mode.
ans = self.copy_abs()
else:
ans = Decimal(self)
return ans._fix(context)
def __abs__(self, round=True, context=None):
"""Returns the absolute value of self.
If the keyword argument 'round' is false, do not round. The
expression self.__abs__(round=False) is equivalent to
self.copy_abs().
"""
if not round:
return self.copy_abs()
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if self._sign:
ans = self.__neg__(context=context)
else:
ans = self.__pos__(context=context)
return ans
def __add__(self, other, context=None):
"""Returns self + other.
-INF + INF (or the reverse) causes an InvalidOperation error.
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
if self._is_special or other._is_special:
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
# If both INF, same sign => same as both, opposite => error.
if self._sign != other._sign and other._isinfinity():
return context._raise_error(InvalidOperation, '-INF + INF')
return Decimal(self)
if other._isinfinity():
return Decimal(other) # Can't both be infinity here
exp = min(self._exp, other._exp)
negativezero = 0
if context.rounding == ROUND_FLOOR and self._sign != other._sign:
# If the answer is 0, the sign should be negative, in this case.
negativezero = 1
if not self and not other:
sign = min(self._sign, other._sign)
if negativezero:
sign = 1
ans = _dec_from_triple(sign, '0', exp)
ans = ans._fix(context)
return ans
if not self:
exp = max(exp, other._exp - context.prec-1)
ans = other._rescale(exp, context.rounding)
ans = ans._fix(context)
return ans
if not other:
exp = max(exp, self._exp - context.prec-1)
ans = self._rescale(exp, context.rounding)
ans = ans._fix(context)
return ans
op1 = _WorkRep(self)
op2 = _WorkRep(other)
op1, op2 = _normalize(op1, op2, context.prec)
result = _WorkRep()
if op1.sign != op2.sign:
# Equal and opposite
if op1.int == op2.int:
ans = _dec_from_triple(negativezero, '0', exp)
ans = ans._fix(context)
return ans
if op1.int < op2.int:
op1, op2 = op2, op1
# OK, now abs(op1) > abs(op2)
if op1.sign == 1:
result.sign = 1
op1.sign, op2.sign = op2.sign, op1.sign
else:
result.sign = 0
# So we know the sign, and op1 > 0.
elif op1.sign == 1:
result.sign = 1
op1.sign, op2.sign = (0, 0)
else:
result.sign = 0
# Now, op1 > abs(op2) > 0
if op2.sign == 0:
result.int = op1.int + op2.int
else:
result.int = op1.int - op2.int
result.exp = op1.exp
ans = Decimal(result)
ans = ans._fix(context)
return ans
__radd__ = __add__
def __sub__(self, other, context=None):
"""Return self - other"""
other = _convert_other(other)
if other is NotImplemented:
return other
if self._is_special or other._is_special:
ans = self._check_nans(other, context=context)
if ans:
return ans
# self - other is computed as self + other.copy_negate()
return self.__add__(other.copy_negate(), context=context)
def __rsub__(self, other, context=None):
"""Return other - self"""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__sub__(self, context=context)
def __mul__(self, other, context=None):
"""Return self * other.
(+-)INF * 0 (or the reverse) raises InvalidOperation.
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
resultsign = self._sign ^ other._sign
if self._is_special or other._is_special:
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
if not other:
return context._raise_error(InvalidOperation, '(+-)INF * 0')
return _SignedInfinity[resultsign]
if other._isinfinity():
if not self:
return context._raise_error(InvalidOperation, '0 * (+-)INF')
return _SignedInfinity[resultsign]
resultexp = self._exp + other._exp
# Special case for multiplying by zero
if not self or not other:
ans = _dec_from_triple(resultsign, '0', resultexp)
# Fixing in case the exponent is out of bounds
ans = ans._fix(context)
return ans
# Special case for multiplying by power of 10
if self._int == '1':
ans = _dec_from_triple(resultsign, other._int, resultexp)
ans = ans._fix(context)
return ans
if other._int == '1':
ans = _dec_from_triple(resultsign, self._int, resultexp)
ans = ans._fix(context)
return ans
op1 = _WorkRep(self)
op2 = _WorkRep(other)
ans = _dec_from_triple(resultsign, str(op1.int * op2.int), resultexp)
ans = ans._fix(context)
return ans
__rmul__ = __mul__
def __truediv__(self, other, context=None):
"""Return self / other."""
other = _convert_other(other)
if other is NotImplemented:
return NotImplemented
if context is None:
context = getcontext()
sign = self._sign ^ other._sign
if self._is_special or other._is_special:
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity() and other._isinfinity():
return context._raise_error(InvalidOperation, '(+-)INF/(+-)INF')
if self._isinfinity():
return _SignedInfinity[sign]
if other._isinfinity():
context._raise_error(Clamped, 'Division by infinity')
return _dec_from_triple(sign, '0', context.Etiny())
# Special cases for zeroes
if not other:
if not self:
return context._raise_error(DivisionUndefined, '0 / 0')
return context._raise_error(DivisionByZero, 'x / 0', sign)
if not self:
exp = self._exp - other._exp
coeff = 0
else:
# OK, so neither = 0, INF or NaN
shift = len(other._int) - len(self._int) + context.prec + 1
exp = self._exp - other._exp - shift
op1 = _WorkRep(self)
op2 = _WorkRep(other)
if shift >= 0:
coeff, remainder = divmod(op1.int * 10**shift, op2.int)
else:
coeff, remainder = divmod(op1.int, op2.int * 10**-shift)
if remainder:
# result is not exact; adjust to ensure correct rounding
if coeff % 5 == 0:
coeff += 1
else:
# result is exact; get as close to ideal exponent as possible
ideal_exp = self._exp - other._exp
while exp < ideal_exp and coeff % 10 == 0:
coeff //= 10
exp += 1
ans = _dec_from_triple(sign, str(coeff), exp)
return ans._fix(context)
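# For example, with the default 28-digit precision:
#   >>> Decimal(1) / Decimal(7)
#   Decimal('0.1428571428571428571428571429')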
def _divide(self, other, context):
"""Return (self // other, self % other), to context.prec precision.
Assumes that neither self nor other is a NaN, that self is not
infinite and that other is nonzero.
"""
sign = self._sign ^ other._sign
if other._isinfinity():
ideal_exp = self._exp
else:
ideal_exp = min(self._exp, other._exp)
expdiff = self.adjusted() - other.adjusted()
if not self or other._isinfinity() or expdiff <= -2:
return (_dec_from_triple(sign, '0', 0),
self._rescale(ideal_exp, context.rounding))
if expdiff <= context.prec:
op1 = _WorkRep(self)
op2 = _WorkRep(other)
if op1.exp >= op2.exp:
op1.int *= 10**(op1.exp - op2.exp)
else:
op2.int *= 10**(op2.exp - op1.exp)
q, r = divmod(op1.int, op2.int)
if q < 10**context.prec:
return (_dec_from_triple(sign, str(q), 0),
_dec_from_triple(self._sign, str(r), ideal_exp))
# Here the quotient is too large to be representable
ans = context._raise_error(DivisionImpossible,
'quotient too large in //, % or divmod')
return ans, ans
def __rtruediv__(self, other, context=None):
"""Swaps self/other and returns __truediv__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__truediv__(self, context=context)
def __divmod__(self, other, context=None):
"""
Return (self // other, self % other)
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return (ans, ans)
sign = self._sign ^ other._sign
if self._isinfinity():
if other._isinfinity():
ans = context._raise_error(InvalidOperation, 'divmod(INF, INF)')
return ans, ans
else:
return (_SignedInfinity[sign],
context._raise_error(InvalidOperation, 'INF % x'))
if not other:
if not self:
ans = context._raise_error(DivisionUndefined, 'divmod(0, 0)')
return ans, ans
else:
return (context._raise_error(DivisionByZero, 'x // 0', sign),
context._raise_error(InvalidOperation, 'x % 0'))
quotient, remainder = self._divide(other, context)
remainder = remainder._fix(context)
return quotient, remainder
def __rdivmod__(self, other, context=None):
"""Swaps self/other and returns __divmod__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__divmod__(self, context=context)
def __mod__(self, other, context=None):
"""
self % other
"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
return context._raise_error(InvalidOperation, 'INF % x')
elif not other:
if self:
return context._raise_error(InvalidOperation, 'x % 0')
else:
return context._raise_error(DivisionUndefined, '0 % 0')
remainder = self._divide(other, context)[1]
remainder = remainder._fix(context)
return remainder
def __rmod__(self, other, context=None):
"""Swaps self/other and returns __mod__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__mod__(self, context=context)
def remainder_near(self, other, context=None):
"""
Remainder nearest to 0: satisfies abs(remainder_near) <= abs(other)/2.
"""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
# self == +/-infinity -> InvalidOperation
if self._isinfinity():
return context._raise_error(InvalidOperation,
'remainder_near(infinity, x)')
# other == 0 -> either InvalidOperation or DivisionUndefined
if not other:
if self:
return context._raise_error(InvalidOperation,
'remainder_near(x, 0)')
else:
return context._raise_error(DivisionUndefined,
'remainder_near(0, 0)')
# other = +/-infinity -> remainder = self
if other._isinfinity():
ans = Decimal(self)
return ans._fix(context)
# self = 0 -> remainder = self, with ideal exponent
ideal_exponent = min(self._exp, other._exp)
if not self:
ans = _dec_from_triple(self._sign, '0', ideal_exponent)
return ans._fix(context)
# catch most cases of large or small quotient
expdiff = self.adjusted() - other.adjusted()
if expdiff >= context.prec + 1:
# expdiff >= prec+1 => abs(self/other) > 10**prec
return context._raise_error(DivisionImpossible)
if expdiff <= -2:
# expdiff <= -2 => abs(self/other) < 0.1
ans = self._rescale(ideal_exponent, context.rounding)
return ans._fix(context)
# adjust both arguments to have the same exponent, then divide
op1 = _WorkRep(self)
op2 = _WorkRep(other)
if op1.exp >= op2.exp:
op1.int *= 10**(op1.exp - op2.exp)
else:
op2.int *= 10**(op2.exp - op1.exp)
q, r = divmod(op1.int, op2.int)
# remainder is r*10**ideal_exponent; other is +/-op2.int *
# 10**ideal_exponent. Apply correction to ensure that
# abs(remainder) <= abs(other)/2
if 2*r + (q&1) > op2.int:
r -= op2.int
q += 1
if q >= 10**context.prec:
return context._raise_error(DivisionImpossible)
# result has same sign as self unless r is negative
sign = self._sign
if r < 0:
sign = 1-sign
r = -r
ans = _dec_from_triple(sign, str(r), ideal_exponent)
return ans._fix(context)
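# For example, 18 == 2*10 - 2, and -2 is nearer to 0 than +8:
#   >>> Decimal(18).remainder_near(Decimal(10))
#   Decimal('-2')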
def __floordiv__(self, other, context=None):
"""self // other"""
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return ans
if self._isinfinity():
if other._isinfinity():
return context._raise_error(InvalidOperation, 'INF // INF')
else:
return _SignedInfinity[self._sign ^ other._sign]
if not other:
if self:
return context._raise_error(DivisionByZero, 'x // 0',
self._sign ^ other._sign)
else:
return context._raise_error(DivisionUndefined, '0 // 0')
return self._divide(other, context)[0]
def __rfloordiv__(self, other, context=None):
"""Swaps self/other and returns __floordiv__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__floordiv__(self, context=context)
def __float__(self):
"""Float representation."""
if self._isnan():
if self.is_snan():
raise ValueError("Cannot convert signaling NaN to float")
s = "-nan" if self._sign else "nan"
else:
s = str(self)
return float(s)
def __int__(self):
"""Converts self to an int, truncating if necessary."""
if self._is_special:
if self._isnan():
raise ValueError("Cannot convert NaN to integer")
elif self._isinfinity():
raise OverflowError("Cannot convert infinity to integer")
s = (-1)**self._sign
if self._exp >= 0:
return s*int(self._int)*10**self._exp
else:
return s*int(self._int[:self._exp] or '0')
__trunc__ = __int__
def real(self):
return self
real = property(real)
def imag(self):
return Decimal(0)
imag = property(imag)
def conjugate(self):
return self
def __complex__(self):
return complex(float(self))
def _fix_nan(self, context):
"""Decapitate the payload of a NaN to fit the context"""
payload = self._int
# maximum length of payload is precision if clamp=0,
# precision-1 if clamp=1.
max_payload_len = context.prec - context.clamp
if len(payload) > max_payload_len:
payload = payload[len(payload)-max_payload_len:].lstrip('0')
return _dec_from_triple(self._sign, payload, self._exp, True)
return Decimal(self)
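# For example, with context.prec == 3 and clamp == 0, a NaN payload of
# '123456' is decapitated to its last three digits, giving
# Decimal('NaN456').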
def _fix(self, context):
"""Round if it is necessary to keep self within prec precision.
Rounds and fixes the exponent. Does not raise on a sNaN.
Arguments:
self - Decimal instance
context - context used.
"""
if self._is_special:
if self._isnan():
# decapitate payload if necessary
return self._fix_nan(context)
else:
# self is +/-Infinity; return unaltered
return Decimal(self)
# if self is zero then exponent should be between Etiny and
# Emax if clamp==0, and between Etiny and Etop if clamp==1.
Etiny = context.Etiny()
Etop = context.Etop()
if not self:
exp_max = [context.Emax, Etop][context.clamp]
new_exp = min(max(self._exp, Etiny), exp_max)
if new_exp != self._exp:
context._raise_error(Clamped)
return _dec_from_triple(self._sign, '0', new_exp)
else:
return Decimal(self)
# exp_min is the smallest allowable exponent of the result,
# equal to max(self.adjusted()-context.prec+1, Etiny)
exp_min = len(self._int) + self._exp - context.prec
if exp_min > Etop:
# overflow: exp_min > Etop iff self.adjusted() > Emax
ans = context._raise_error(Overflow, 'above Emax', self._sign)
context._raise_error(Inexact)
context._raise_error(Rounded)
return ans
self_is_subnormal = exp_min < Etiny
if self_is_subnormal:
exp_min = Etiny
# round if self has too many digits
if self._exp < exp_min:
digits = len(self._int) + self._exp - exp_min
if digits < 0:
self = _dec_from_triple(self._sign, '1', exp_min-1)
digits = 0
rounding_method = self._pick_rounding_function[context.rounding]
changed = rounding_method(self, digits)
coeff = self._int[:digits] or '0'
if changed > 0:
coeff = str(int(coeff)+1)
if len(coeff) > context.prec:
coeff = coeff[:-1]
exp_min += 1
# check whether the rounding pushed the exponent out of range
if exp_min > Etop:
ans = context._raise_error(Overflow, 'above Emax', self._sign)
else:
ans = _dec_from_triple(self._sign, coeff, exp_min)
# raise the appropriate signals, taking care to respect
# the precedence described in the specification
if changed and self_is_subnormal:
context._raise_error(Underflow)
if self_is_subnormal:
context._raise_error(Subnormal)
if changed:
context._raise_error(Inexact)
context._raise_error(Rounded)
if not ans:
# raise Clamped on underflow to 0
context._raise_error(Clamped)
return ans
if self_is_subnormal:
context._raise_error(Subnormal)
# fold down if clamp == 1 and self has too few digits
if context.clamp == 1 and self._exp > Etop:
context._raise_error(Clamped)
self_padded = self._int + '0'*(self._exp - Etop)
return _dec_from_triple(self._sign, self_padded, Etop)
# here self was representable to begin with; return unchanged
return Decimal(self)
# for each of the rounding functions below:
# self is a finite, nonzero Decimal
# prec is an integer satisfying 0 <= prec < len(self._int)
#
# each function returns either -1, 0, or 1, as follows:
# 1 indicates that self should be rounded up (away from zero)
# 0 indicates that self should be truncated, and that all the
# digits to be truncated are zeros (so the value is unchanged)
# -1 indicates that there are nonzero digits to be truncated
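# Worked example: for self._int == '1450' and prec == 2,
# _round_half_up returns 1 (digit '5' rounds away from zero), while
# _round_half_even returns -1 (an exact half with the even digit '4'
# kept, so the half is simply dropped).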
def _round_down(self, prec):
"""Also known as round-towards-0, truncate."""
if _all_zeros(self._int, prec):
return 0
else:
return -1
def _round_up(self, prec):
"""Rounds away from 0."""
return -self._round_down(prec)
def _round_half_up(self, prec):
"""Rounds 5 up (away from 0)"""
if self._int[prec] in '56789':
return 1
elif _all_zeros(self._int, prec):
return 0
else:
return -1
def _round_half_down(self, prec):
"""Round 5 down"""
if _exact_half(self._int, prec):
return -1
else:
return self._round_half_up(prec)
def _round_half_even(self, prec):
"""Round 5 to even, rest to nearest."""
if _exact_half(self._int, prec) and \
(prec == 0 or self._int[prec-1] in '02468'):
return -1
else:
return self._round_half_up(prec)
def _round_ceiling(self, prec):
"""Rounds up (not away from 0 if negative.)"""
if self._sign:
return self._round_down(prec)
else:
return -self._round_down(prec)
def _round_floor(self, prec):
"""Rounds down (not towards 0 if negative)"""
if not self._sign:
return self._round_down(prec)
else:
return -self._round_down(prec)
def _round_05up(self, prec):
"""Round down unless digit prec-1 is 0 or 5."""
if prec and self._int[prec-1] not in '05':
return self._round_down(prec)
else:
return -self._round_down(prec)
_pick_rounding_function = dict(
ROUND_DOWN = _round_down,
ROUND_UP = _round_up,
ROUND_HALF_UP = _round_half_up,
ROUND_HALF_DOWN = _round_half_down,
ROUND_HALF_EVEN = _round_half_even,
ROUND_CEILING = _round_ceiling,
ROUND_FLOOR = _round_floor,
ROUND_05UP = _round_05up,
)
def __round__(self, n=None):
"""Round self to the nearest integer, or to a given precision.
If only one argument is supplied, round a finite Decimal
instance self to the nearest integer. If self is infinite or
a NaN then a Python exception is raised. If self is finite
and lies exactly halfway between two integers then it is
rounded to the integer with even last digit.
>>> round(Decimal('123.456'))
123
>>> round(Decimal('-456.789'))
-457
>>> round(Decimal('-3.0'))
-3
>>> round(Decimal('2.5'))
2
>>> round(Decimal('3.5'))
4
>>> round(Decimal('Inf'))
Traceback (most recent call last):
...
OverflowError: cannot round an infinity
>>> round(Decimal('NaN'))
Traceback (most recent call last):
...
ValueError: cannot round a NaN
If a second argument n is supplied, self is rounded to n
decimal places using the rounding mode for the current
context.
For an integer n, round(self, -n) is exactly equivalent to
self.quantize(Decimal('1En')).
>>> round(Decimal('123.456'), 0)
Decimal('123')
>>> round(Decimal('123.456'), 2)
Decimal('123.46')
>>> round(Decimal('123.456'), -2)
Decimal('1E+2')
>>> round(Decimal('-Infinity'), 37)
Decimal('NaN')
>>> round(Decimal('sNaN123'), 0)
Decimal('NaN123')
"""
if n is not None:
# two-argument form: use the equivalent quantize call
if not isinstance(n, int):
raise TypeError('Second argument to round should be integral')
exp = _dec_from_triple(0, '1', -n)
return self.quantize(exp)
# one-argument form
if self._is_special:
if self.is_nan():
raise ValueError("cannot round a NaN")
else:
raise OverflowError("cannot round an infinity")
return int(self._rescale(0, ROUND_HALF_EVEN))
def __floor__(self):
"""Return the floor of self, as an integer.
For a finite Decimal instance self, return the greatest
integer n such that n <= self. If self is infinite or a NaN
then a Python exception is raised.
"""
if self._is_special:
if self.is_nan():
raise ValueError("cannot round a NaN")
else:
raise OverflowError("cannot round an infinity")
return int(self._rescale(0, ROUND_FLOOR))
def __ceil__(self):
"""Return the ceiling of self, as an integer.
For a finite Decimal instance self, return the least integer n
such that n >= self. If self is infinite or a NaN then a
Python exception is raised.
"""
if self._is_special:
if self.is_nan():
raise ValueError("cannot round a NaN")
else:
raise OverflowError("cannot round an infinity")
return int(self._rescale(0, ROUND_CEILING))
def fma(self, other, third, context=None):
"""Fused multiply-add.
Returns self*other+third with no rounding of the intermediate
product self*other.
self and other are multiplied together, with no rounding of
the result. The third operand is then added to the result,
and a single final rounding is performed.
"""
other = _convert_other(other, raiseit=True)
third = _convert_other(third, raiseit=True)
# compute product; raise InvalidOperation if either operand is
# a signaling NaN or if the product is zero times infinity.
if self._is_special or other._is_special:
if context is None:
context = getcontext()
if self._exp == 'N':
return context._raise_error(InvalidOperation, 'sNaN', self)
if other._exp == 'N':
return context._raise_error(InvalidOperation, 'sNaN', other)
if self._exp == 'n':
product = self
elif other._exp == 'n':
product = other
elif self._exp == 'F':
if not other:
return context._raise_error(InvalidOperation,
'INF * 0 in fma')
product = _SignedInfinity[self._sign ^ other._sign]
elif other._exp == 'F':
if not self:
return context._raise_error(InvalidOperation,
'0 * INF in fma')
product = _SignedInfinity[self._sign ^ other._sign]
else:
product = _dec_from_triple(self._sign ^ other._sign,
str(int(self._int) * int(other._int)),
self._exp + other._exp)
return product.__add__(third, context)
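# For example, the product is formed exactly before the single final
# addition and rounding:
#   >>> Decimal(2).fma(3, 5)
#   Decimal('11')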
def _power_modulo(self, other, modulo, context=None):
"""Three argument version of __pow__"""
other = _convert_other(other)
if other is NotImplemented:
return other
modulo = _convert_other(modulo)
if modulo is NotImplemented:
return modulo
if context is None:
context = getcontext()
# deal with NaNs: if there are any sNaNs then first one wins,
# (i.e. behaviour for NaNs is identical to that of fma)
self_is_nan = self._isnan()
other_is_nan = other._isnan()
modulo_is_nan = modulo._isnan()
if self_is_nan or other_is_nan or modulo_is_nan:
if self_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
self)
if other_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
other)
if modulo_is_nan == 2:
return context._raise_error(InvalidOperation, 'sNaN',
modulo)
if self_is_nan:
return self._fix_nan(context)
if other_is_nan:
return other._fix_nan(context)
return modulo._fix_nan(context)
# check inputs: we apply same restrictions as Python's pow()
if not (self._isinteger() and
other._isinteger() and
modulo._isinteger()):
return context._raise_error(InvalidOperation,
'pow() 3rd argument not allowed '
'unless all arguments are integers')
if other < 0:
return context._raise_error(InvalidOperation,
'pow() 2nd argument cannot be '
'negative when 3rd argument specified')
if not modulo:
return context._raise_error(InvalidOperation,
'pow() 3rd argument cannot be 0')
# additional restriction for decimal: the modulus must be less
# than 10**prec in absolute value
if modulo.adjusted() >= context.prec:
return context._raise_error(InvalidOperation,
'insufficient precision: pow() 3rd '
'argument must not have more than '
'precision digits')
# define 0**0 == NaN, for consistency with two-argument pow
# (even though it hurts!)
if not other and not self:
return context._raise_error(InvalidOperation,
'at least one of pow() 1st argument '
'and 2nd argument must be nonzero; '
'0**0 is not defined')
# compute sign of result
if other._iseven():
sign = 0
else:
sign = self._sign
# convert modulo to a Python integer, and self and other to
# Decimal integers (i.e. force their exponents to be >= 0)
modulo = abs(int(modulo))
base = _WorkRep(self.to_integral_value())
exponent = _WorkRep(other.to_integral_value())
# compute result using integer pow()
base = (base.int % modulo * pow(10, base.exp, modulo)) % modulo
for i in range(exponent.exp):
base = pow(base, 10, modulo)
base = pow(base, exponent.int, modulo)
return _dec_from_triple(sign, str(base), 0)
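# For example, the three-argument form computes (self**other) % modulo
# exactly:
#   >>> pow(Decimal(4), Decimal(3), Decimal(5))
#   Decimal('4')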
def _power_exact(self, other, p):
"""Attempt to compute self**other exactly.
Given Decimals self and other and an integer p, attempt to
compute an exact result for the power self**other, with p
digits of precision. Return None if self**other is not
exactly representable in p digits.
Assumes that elimination of special cases has already been
performed: self and other must both be nonspecial; self must
be positive and not numerically equal to 1; other must be
nonzero. For efficiency, other._exp should not be too large,
so that 10**abs(other._exp) is a feasible calculation."""
# In the comments below, we write x for the value of self and y for the
# value of other. Write x = xc*10**xe and abs(y) = yc*10**ye, with xc
# and yc positive integers not divisible by 10.
# The main purpose of this method is to identify the *failure*
# of x**y to be exactly representable with as little effort as
# possible. So we look for cheap and easy tests that
# eliminate the possibility of x**y being exact. Only if all
# these tests are passed do we go on to actually compute x**y.
# Here's the main idea. Express y as a rational number m/n, with m and
# n relatively prime and n>0. Then for x**y to be exactly
# representable (at *any* precision), xc must be the nth power of a
# positive integer and xe must be divisible by n. If y is negative
# then additionally xc must be a power of either 2 or 5, hence a power
# of 2**n or 5**n.
#
# There's a limit to how small |y| can be: if y=m/n as above
# then:
#
# (1) if xc != 1 then for the result to be representable we
# need xc**(1/n) >= 2, and hence also xc**|y| >= 2. So
# if |y| <= 1/nbits(xc) then xc < 2**nbits(xc) <=
# 2**(1/|y|), hence xc**|y| < 2 and the result is not
# representable.
#
# (2) if xe != 0, |xe|*(1/n) >= 1, so |xe|*|y| >= 1. Hence if
# |y| < 1/|xe| then the result is not representable.
#
# Note that since x is not equal to 1, at least one of (1) and
# (2) must apply. Now |y| < 1/nbits(xc) iff |yc|*nbits(xc) <
# 10**-ye iff len(str(|yc|*nbits(xc))) <= -ye.
#
# There's also a limit to how large y can be, at least if it's
# positive: the normalized result will have coefficient xc**y,
# so if it's representable then xc**y < 10**p, and y <
# p/log10(xc). Hence if y*log10(xc) >= p then the result is
# not exactly representable.
# if len(str(abs(yc*xe))) <= -ye then abs(yc*xe) < 10**-ye,
# so |y| < 1/xe and the result is not representable.
# Similarly, len(str(abs(yc)*xc_bits)) <= -ye implies |y|
# < 1/nbits(xc).
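# Worked example of the m/n analysis: for self == Decimal('4') and
# other == Decimal('0.5'), y reduces to m/n == 1/2; xc == 4 is a
# perfect square (2**2) and xe == 0 is divisible by n == 2, so the
# method returns the exact result Decimal('2').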
x = _WorkRep(self)
xc, xe = x.int, x.exp
while xc % 10 == 0:
xc //= 10
xe += 1
y = _WorkRep(other)
yc, ye = y.int, y.exp
while yc % 10 == 0:
yc //= 10
ye += 1
# case where xc == 1: result is 10**(xe*y), with xe*y
# required to be an integer
if xc == 1:
xe *= yc
# result is now 10**(xe * 10**ye); xe * 10**ye must be integral
while xe % 10 == 0:
xe //= 10
ye += 1
if ye < 0:
return None
exponent = xe * 10**ye
if y.sign == 1:
exponent = -exponent
# if other is a nonnegative integer, use ideal exponent
if other._isinteger() and other._sign == 0:
ideal_exponent = self._exp*int(other)
zeros = min(exponent-ideal_exponent, p-1)
else:
zeros = 0
return _dec_from_triple(0, '1' + '0'*zeros, exponent-zeros)
# case where y is negative: xc must be either a power
# of 2 or a power of 5.
if y.sign == 1:
last_digit = xc % 10
if last_digit in (2,4,6,8):
# quick test for power of 2
if xc & -xc != xc:
return None
# now xc is a power of 2; e is its exponent
e = _nbits(xc)-1
# We now have:
#
# x = 2**e * 10**xe, e > 0, and y < 0.
#
# The exact result is:
#
# x**y = 5**(-e*y) * 10**(e*y + xe*y)
#
# provided that both e*y and xe*y are integers. Note that if
# 5**(-e*y) >= 10**p, then the result can't be expressed
# exactly with p digits of precision.
#
# Using the above, we can guard against large values of ye.
# 93/65 is an upper bound for log(10)/log(5), so if
#
# ye >= len(str(93*p//65))
#
# then
#
# -e*y >= -y >= 10**ye > 93*p/65 > p*log(10)/log(5),
#
# so 5**(-e*y) >= 10**p, and the coefficient of the result
# can't be expressed in p digits.
# emax >= largest e such that 5**e < 10**p.
emax = p*93//65
if ye >= len(str(emax)):
return None
# Find -e*y and -xe*y; both must be integers
e = _decimal_lshift_exact(e * yc, ye)
xe = _decimal_lshift_exact(xe * yc, ye)
if e is None or xe is None:
return None
if e > emax:
return None
xc = 5**e
elif last_digit == 5:
# e >= log_5(xc) if xc is a power of 5; we have
# equality all the way up to xc=5**2658
e = _nbits(xc)*28//65
xc, remainder = divmod(5**e, xc)
if remainder:
return None
while xc % 5 == 0:
xc //= 5
e -= 1
# Guard against large values of ye, using the same logic as in
# the 'xc is a power of 2' branch. 10/3 is an upper bound for
# log(10)/log(2).
emax = p*10//3
if ye >= len(str(emax)):
return None
e = _decimal_lshift_exact(e * yc, ye)
xe = _decimal_lshift_exact(xe * yc, ye)
if e is None or xe is None:
return None
if e > emax:
return None
xc = 2**e
else:
return None
if xc >= 10**p:
return None
xe = -e-xe
return _dec_from_triple(0, str(xc), xe)
# now y is positive; find m and n such that y = m/n
if ye >= 0:
m, n = yc*10**ye, 1
else:
if xe != 0 and len(str(abs(yc*xe))) <= -ye:
return None
xc_bits = _nbits(xc)
if xc != 1 and len(str(abs(yc)*xc_bits)) <= -ye:
return None
m, n = yc, 10**(-ye)
while m % 2 == n % 2 == 0:
m //= 2
n //= 2
while m % 5 == n % 5 == 0:
m //= 5
n //= 5
# compute nth root of xc*10**xe
if n > 1:
# if 1 < xc < 2**n then xc isn't an nth power
if xc != 1 and xc_bits <= n:
return None
xe, rem = divmod(xe, n)
if rem != 0:
return None
# compute nth root of xc using Newton's method
a = 1 << -(-_nbits(xc)//n) # initial estimate
while True:
q, r = divmod(xc, a**(n-1))
if a <= q:
break
else:
a = (a*(n-1) + q)//n
if not (a == q and r == 0):
return None
xc = a
# now xc*10**xe is the nth root of the original xc*10**xe
# compute mth power of xc*10**xe
# if m > p*100//_log10_lb(xc) then m > p/log10(xc), hence xc**m >
# 10**p and the result is not representable.
if xc > 1 and m > p*100//_log10_lb(xc):
return None
xc = xc**m
xe *= m
if xc > 10**p:
return None
# by this point the result *is* exactly representable
# adjust the exponent to get as close as possible to the ideal
# exponent, if necessary
str_xc = str(xc)
if other._isinteger() and other._sign == 0:
ideal_exponent = self._exp*int(other)
zeros = min(xe-ideal_exponent, p-len(str_xc))
else:
zeros = 0
return _dec_from_triple(0, str_xc+'0'*zeros, xe-zeros)
def __pow__(self, other, modulo=None, context=None):
"""Return self ** other [ % modulo].
With two arguments, compute self**other.
With three arguments, compute (self**other) % modulo. For the
three argument form, the following restrictions on the
arguments hold:
- all three arguments must be integral
- other must be nonnegative
- either self or other (or both) must be nonzero
- modulo must be nonzero and must have at most p digits,
where p is the context precision.
If any of these restrictions is violated the InvalidOperation
flag is raised.
The result of pow(self, other, modulo) is identical to the
result that would be obtained by computing (self**other) %
modulo with unbounded precision, but is computed more
efficiently. It is always exact.
"""
if modulo is not None:
return self._power_modulo(other, modulo, context)
other = _convert_other(other)
if other is NotImplemented:
return other
if context is None:
context = getcontext()
# either argument is a NaN => result is NaN
ans = self._check_nans(other, context)
if ans:
return ans
# 0**0 = NaN (!), x**0 = 1 for nonzero x (including +/-Infinity)
if not other:
if not self:
return context._raise_error(InvalidOperation, '0 ** 0')
else:
return _One
# result has sign 1 iff self._sign is 1 and other is an odd integer
result_sign = 0
if self._sign == 1:
if other._isinteger():
if not other._iseven():
result_sign = 1
else:
# -ve**noninteger = NaN
# (-0)**noninteger = 0**noninteger
if self:
return context._raise_error(InvalidOperation,
'x ** y with x negative and y not an integer')
# negate self, without doing any unwanted rounding
self = self.copy_negate()
# 0**(+ve or Inf)= 0; 0**(-ve or -Inf) = Infinity
if not self:
if other._sign == 0:
return _dec_from_triple(result_sign, '0', 0)
else:
return _SignedInfinity[result_sign]
# Inf**(+ve or Inf) = Inf; Inf**(-ve or -Inf) = 0
if self._isinfinity():
if other._sign == 0:
return _SignedInfinity[result_sign]
else:
return _dec_from_triple(result_sign, '0', 0)
# 1**other = 1, but the choice of exponent and the flags
# depend on the exponent of self, and on whether other is a
# positive integer, a negative integer, or neither
if self == _One:
if other._isinteger():
# exp = max(self._exp*max(int(other), 0),
# 1-context.prec) but evaluating int(other) directly
# is dangerous until we know other is small (other
# could be 1e999999999)
if other._sign == 1:
multiplier = 0
elif other > context.prec:
multiplier = context.prec
else:
multiplier = int(other)
exp = self._exp * multiplier
if exp < 1-context.prec:
exp = 1-context.prec
context._raise_error(Rounded)
else:
context._raise_error(Inexact)
context._raise_error(Rounded)
exp = 1-context.prec
return _dec_from_triple(result_sign, '1'+'0'*-exp, exp)
# compute adjusted exponent of self
self_adj = self.adjusted()
# self ** infinity is infinity if self > 1, 0 if self < 1
# self ** -infinity is infinity if self < 1, 0 if self > 1
if other._isinfinity():
if (other._sign == 0) == (self_adj < 0):
return _dec_from_triple(result_sign, '0', 0)
else:
return _SignedInfinity[result_sign]
# from here on, the result always goes through the call
# to _fix at the end of this function.
ans = None
exact = False
# crude test to catch cases of extreme overflow/underflow. If
# log10(self)*other >= 10**bound and bound >= len(str(Emax))
# then 10**bound >= 10**len(str(Emax)) >= Emax+1 and hence
# self**other >= 10**(Emax+1), so overflow occurs. The test
# for underflow is similar.
bound = self._log10_exp_bound() + other.adjusted()
if (self_adj >= 0) == (other._sign == 0):
# self > 1 and other +ve, or self < 1 and other -ve
# possibility of overflow
if bound >= len(str(context.Emax)):
ans = _dec_from_triple(result_sign, '1', context.Emax+1)
else:
# self > 1 and other -ve, or self < 1 and other +ve
# possibility of underflow to 0
Etiny = context.Etiny()
if bound >= len(str(-Etiny)):
ans = _dec_from_triple(result_sign, '1', Etiny-1)
# try for an exact result with precision +1
if ans is None:
ans = self._power_exact(other, context.prec + 1)
if ans is not None:
if result_sign == 1:
ans = _dec_from_triple(1, ans._int, ans._exp)
exact = True
# usual case: inexact result, x**y computed directly as exp(y*log(x))
if ans is None:
p = context.prec
x = _WorkRep(self)
xc, xe = x.int, x.exp
y = _WorkRep(other)
yc, ye = y.int, y.exp
if y.sign == 1:
yc = -yc
# compute correctly rounded result: start with precision +3,
# then increase precision until result is unambiguously roundable
extra = 3
while True:
coeff, exp = _dpower(xc, xe, yc, ye, p+extra)
if coeff % (5*10**(len(str(coeff))-p-1)):
break
extra += 3
ans = _dec_from_triple(result_sign, str(coeff), exp)
# unlike exp, ln and log10, the power function respects the
# rounding mode; no need to switch to ROUND_HALF_EVEN here
# There's a difficulty here when 'other' is not an integer and
# the result is exact. In this case, the specification
# requires that the Inexact flag be raised (in spite of
# exactness), but since the result is exact _fix won't do this
# for us. (Correspondingly, the Underflow signal should also
# be raised for subnormal results.) We can't directly raise
# these signals either before or after calling _fix, since
# that would violate the precedence for signals. So we wrap
# the ._fix call in a temporary context, and reraise
# afterwards.
if exact and not other._isinteger():
# pad with zeros up to length context.prec+1 if necessary; this
# ensures that the Rounded signal will be raised.
if len(ans._int) <= context.prec:
expdiff = context.prec + 1 - len(ans._int)
ans = _dec_from_triple(ans._sign, ans._int+'0'*expdiff,
ans._exp-expdiff)
# create a copy of the current context, with cleared flags/traps
newcontext = context.copy()
newcontext.clear_flags()
for exception in _signals:
newcontext.traps[exception] = 0
# round in the new context
ans = ans._fix(newcontext)
# raise Inexact, and if necessary, Underflow
newcontext._raise_error(Inexact)
if newcontext.flags[Subnormal]:
newcontext._raise_error(Underflow)
# propagate signals to the original context; _fix could
# have raised any of Overflow, Underflow, Subnormal,
# Inexact, Rounded, Clamped. Overflow needs the correct
# arguments. Note that the order of the exceptions is
# important here.
if newcontext.flags[Overflow]:
context._raise_error(Overflow, 'above Emax', ans._sign)
for exception in Underflow, Subnormal, Inexact, Rounded, Clamped:
if newcontext.flags[exception]:
context._raise_error(exception)
else:
ans = ans._fix(context)
return ans
def __rpow__(self, other, context=None):
"""Swaps self/other and returns __pow__."""
other = _convert_other(other)
if other is NotImplemented:
return other
return other.__pow__(self, context=context)
def normalize(self, context=None):
"""Normalize- strip trailing 0s, change anything equal to 0 to 0e0"""
if context is None:
context = getcontext()
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
dup = self._fix(context)
if dup._isinfinity():
return dup
if not dup:
return _dec_from_triple(dup._sign, '0', 0)
exp_max = [context.Emax, context.Etop()][context.clamp]
end = len(dup._int)
exp = dup._exp
while dup._int[end-1] == '0' and exp < exp_max:
exp += 1
end -= 1
return _dec_from_triple(dup._sign, dup._int[:end], exp)
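# For example:
#   >>> Decimal('120.00').normalize()
#   Decimal('1.2E+2')
#   >>> Decimal('0.00').normalize()
#   Decimal('0')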
def quantize(self, exp, rounding=None, context=None, watchexp=True):
"""Quantize self so its exponent is the same as that of exp.
Similar to self._rescale(exp._exp) but with error checking.
"""
exp = _convert_other(exp, raiseit=True)
if context is None:
context = getcontext()
if rounding is None:
rounding = context.rounding
if self._is_special or exp._is_special:
ans = self._check_nans(exp, context)
if ans:
return ans
if exp._isinfinity() or self._isinfinity():
if exp._isinfinity() and self._isinfinity():
return Decimal(self) # if both are inf, it is OK
return context._raise_error(InvalidOperation,
'quantize with one INF')
# if we're not watching exponents, do a simple rescale
if not watchexp:
ans = self._rescale(exp._exp, rounding)
# raise Inexact and Rounded where appropriate
if ans._exp > self._exp:
context._raise_error(Rounded)
if ans != self:
context._raise_error(Inexact)
return ans
# exp._exp should be between Etiny and Emax
if not (context.Etiny() <= exp._exp <= context.Emax):
return context._raise_error(InvalidOperation,
'target exponent out of bounds in quantize')
if not self:
ans = _dec_from_triple(self._sign, '0', exp._exp)
return ans._fix(context)
self_adjusted = self.adjusted()
if self_adjusted > context.Emax:
return context._raise_error(InvalidOperation,
'exponent of quantize result too large for current context')
if self_adjusted - exp._exp + 1 > context.prec:
return context._raise_error(InvalidOperation,
'quantize result has too many digits for current context')
ans = self._rescale(exp._exp, rounding)
if ans.adjusted() > context.Emax:
return context._raise_error(InvalidOperation,
'exponent of quantize result too large for current context')
if len(ans._int) > context.prec:
return context._raise_error(InvalidOperation,
'quantize result has too many digits for current context')
# raise appropriate flags
if ans and ans.adjusted() < context.Emin:
context._raise_error(Subnormal)
if ans._exp > self._exp:
if ans != self:
context._raise_error(Inexact)
context._raise_error(Rounded)
# call to fix takes care of any necessary folddown, and
# signals Clamped if necessary
ans = ans._fix(context)
return ans
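# For example, quantize is the idiomatic way to round to a fixed
# number of decimal places:
#   >>> Decimal('1.41421356').quantize(Decimal('1.000'))
#   Decimal('1.414')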
def same_quantum(self, other, context=None):
"""Return True if self and other have the same exponent; otherwise
return False.
If either operand is a special value, the following rules are used:
* return True if both operands are infinities
* return True if both operands are NaNs
* otherwise, return False.
"""
other = _convert_other(other, raiseit=True)
if self._is_special or other._is_special:
return (self.is_nan() and other.is_nan() or
self.is_infinite() and other.is_infinite())
return self._exp == other._exp
def _rescale(self, exp, rounding):
"""Rescale self so that the exponent is exp, either by padding with zeros
or by truncating digits, using the given rounding mode.
Specials are returned without change. This operation is
quiet: it raises no flags, and uses no information from the
context.
exp = exp to scale to (an integer)
rounding = rounding mode
"""
if self._is_special:
return Decimal(self)
if not self:
return _dec_from_triple(self._sign, '0', exp)
if self._exp >= exp:
# pad answer with zeros if necessary
return _dec_from_triple(self._sign,
self._int + '0'*(self._exp - exp), exp)
# too many digits; round and lose data. If self.adjusted() <
# exp-1, replace self by 10**(exp-1) before rounding
digits = len(self._int) + self._exp - exp
if digits < 0:
self = _dec_from_triple(self._sign, '1', exp-1)
digits = 0
this_function = self._pick_rounding_function[rounding]
changed = this_function(self, digits)
coeff = self._int[:digits] or '0'
if changed == 1:
coeff = str(int(coeff)+1)
return _dec_from_triple(self._sign, coeff, exp)
def _round(self, places, rounding):
"""Round a nonzero, nonspecial Decimal to a fixed number of
significant figures, using the given rounding mode.
Infinities, NaNs and zeros are returned unaltered.
This operation is quiet: it raises no flags, and uses no
information from the context.
"""
if places <= 0:
raise ValueError("argument should be at least 1 in _round")
if self._is_special or not self:
return Decimal(self)
ans = self._rescale(self.adjusted()+1-places, rounding)
# it can happen that the rescale alters the adjusted exponent;
# for example when rounding 99.97 to 3 significant figures.
# When this happens we end up with an extra 0 at the end of
# the number; a second rescale fixes this.
if ans.adjusted() != self.adjusted():
ans = ans._rescale(ans.adjusted()+1-places, rounding)
return ans
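# Worked example of the double rescale: rounding Decimal('99.97') to 3
# significant figures with ROUND_HALF_EVEN first yields '100.0' (the
# adjusted exponent grew from 1 to 2), and the second rescale trims
# the trailing zero to give '100'.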
def to_integral_exact(self, rounding=None, context=None):
"""Rounds to a nearby integer.
If no rounding mode is specified, take the rounding mode from
the context. This method raises the Rounded and Inexact flags
when appropriate.
See also: to_integral_value, which does exactly the same as
this method except that it doesn't raise Inexact or Rounded.
"""
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
return Decimal(self)
if self._exp >= 0:
return Decimal(self)
if not self:
return _dec_from_triple(self._sign, '0', 0)
if context is None:
context = getcontext()
if rounding is None:
rounding = context.rounding
ans = self._rescale(0, rounding)
if ans != self:
context._raise_error(Inexact)
context._raise_error(Rounded)
return ans
def to_integral_value(self, rounding=None, context=None):
"""Rounds to the nearest integer, without raising inexact, rounded."""
if context is None:
context = getcontext()
if rounding is None:
rounding = context.rounding
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
return Decimal(self)
if self._exp >= 0:
return Decimal(self)
else:
return self._rescale(0, rounding)
# the method name changed, but we provide also the old one, for compatibility
to_integral = to_integral_value
def sqrt(self, context=None):
"""Return the square root of self."""
if context is None:
context = getcontext()
if self._is_special:
ans = self._check_nans(context=context)
if ans:
return ans
if self._isinfinity() and self._sign == 0:
return Decimal(self)
if not self:
# exponent = self._exp // 2. sqrt(-0) = -0
ans = _dec_from_triple(self._sign, '0', self._exp // 2)
return ans._fix(context)
if self._sign == 1:
return context._raise_error(InvalidOperation, 'sqrt(-x), x > 0')
# At this point self represents a positive number. Let p be
# the desired precision and express self in the form c*100**e
# with c a positive real number and e an integer, c and e
# being chosen so that 100**(p-1) <= c < 100**p. Then the
# (exact) square root of self is sqrt(c)*10**e, and 10**(p-1)
# <= sqrt(c) < 10**p, so the closest representable Decimal at
# precision p is n*10**e where n = round_half_even(sqrt(c)),
# the closest integer to sqrt(c) with the even integer chosen
# in the case of a tie.
#
# To ensure correct rounding in all cases, we use the
# following trick: we compute the square root to an extra
# place (precision p+1 instead of precision p), rounding down.
# Then, if the result is inexact and its last digit is 0 or 5,
# we increase the last digit to 1 or 6 respectively; if it's
# exact we leave the last digit alone. Now the final round to
# p places (or fewer in the case of underflow) will round
# correctly and raise the appropriate flags.
# use an extra digit of precision
prec = context.prec+1
# write argument in the form c*100**e where e = self._exp//2
# is the 'ideal' exponent, to be used if the square root is
# exactly representable. l is the number of 'digits' of c in
# base 100, so that 100**(l-1) <= c < 100**l.
op = _WorkRep(self)
e = op.exp >> 1
if op.exp & 1:
c = op.int * 10
l = (len(self._int) >> 1) + 1
else:
c = op.int
l = len(self._int)+1 >> 1
# rescale so that c has exactly prec base 100 'digits'
shift = prec-l
if shift >= 0:
c *= 100**shift
exact = True
else:
c, remainder = divmod(c, 100**-shift)
exact = not remainder
e -= shift
# find n = floor(sqrt(c)) using Newton's method
n = 10**prec
while True:
q = c//n
if n <= q:
break
else:
n = n + q >> 1
exact = exact and n*n == c
if exact:
# result is exact; rescale to use ideal exponent e
if shift >= 0:
# assert n % 10**shift == 0
n //= 10**shift
else:
n *= 10**-shift
e += shift
else:
# result is not exact; fix last digit as described above
if n % 5 == 0:
n += 1
ans = _dec_from_triple(0, str(n), e)
# round, and fit to current context
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
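# Illustrative example, assuming the default context (prec=28):
#   >>> Decimal(2).sqrt()
#   Decimal('1.414213562373095048801688724')
#   >>> Decimal('1.44').sqrt()    # exact result keeps the ideal exponent
#   Decimal('1.2')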
def max(self, other, context=None):
"""Returns the larger value.
Like max(self, other) except if one is not a number, returns
NaN (and signals if one is sNaN). Also rounds.
"""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then
# the number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self._cmp(other)
if c == 0:
# If both operands are finite and equal in numerical value
# then an ordering is applied:
#
# If the signs differ then max returns the operand with the
# positive sign and min returns the operand with the negative sign
#
# If the signs are the same then the exponent is used to select
# the result. This is exactly the ordering used in compare_total.
c = self.compare_total(other)
if c == -1:
ans = other
else:
ans = self
return ans._fix(context)
def min(self, other, context=None):
"""Returns the smaller value.
Like min(self, other) except if one is not a number, returns
NaN (and signals if one is sNaN). Also rounds.
"""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then
# the number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self._cmp(other)
if c == 0:
c = self.compare_total(other)
if c == -1:
ans = self
else:
ans = other
return ans._fix(context)
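# Illustrative example of the tie-breaking described above: the operands
# are numerically equal, so compare_total decides the result:
#   >>> Decimal('1').max(Decimal('1.0'))
#   Decimal('1')
#   >>> Decimal('1').min(Decimal('1.0'))
#   Decimal('1.0')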
def _isinteger(self):
"""Returns whether self is an integer"""
if self._is_special:
return False
if self._exp >= 0:
return True
rest = self._int[self._exp:]
return rest == '0'*len(rest)
def _iseven(self):
"""Returns True if self is even. Assumes self is an integer."""
if not self or self._exp > 0:
return True
return self._int[-1+self._exp] in '02468'
def adjusted(self):
"""Return the adjusted exponent of self"""
try:
return self._exp + len(self._int) - 1
# If NaN or Infinity, self._exp is string
except TypeError:
return 0
def canonical(self):
"""Returns the same Decimal object.
As we do not have different encodings for the same number, the
received object already is in its canonical form.
"""
return self
def compare_signal(self, other, context=None):
"""Compares self to the other operand numerically.
It's pretty much like compare(), but all NaNs signal, with signaling
NaNs taking precedence over quiet NaNs.
"""
other = _convert_other(other, raiseit=True)
ans = self._compare_check_nans(other, context)
if ans:
return ans
return self.compare(other, context=context)
def compare_total(self, other, context=None):
"""Compares self to other using the abstract representations.
This is not like the standard compare, which uses the operands' numerical
values. Note that a total ordering is defined for all possible abstract
representations.
"""
other = _convert_other(other, raiseit=True)
# if one is negative and the other is positive, it's easy
if self._sign and not other._sign:
return _NegativeOne
if not self._sign and other._sign:
return _One
sign = self._sign
# let's handle both NaN types
self_nan = self._isnan()
other_nan = other._isnan()
if self_nan or other_nan:
if self_nan == other_nan:
# compare payloads as though they're integers
self_key = len(self._int), self._int
other_key = len(other._int), other._int
if self_key < other_key:
if sign:
return _One
else:
return _NegativeOne
if self_key > other_key:
if sign:
return _NegativeOne
else:
return _One
return _Zero
if sign:
if self_nan == 1:
return _NegativeOne
if other_nan == 1:
return _One
if self_nan == 2:
return _NegativeOne
if other_nan == 2:
return _One
else:
if self_nan == 1:
return _One
if other_nan == 1:
return _NegativeOne
if self_nan == 2:
return _One
if other_nan == 2:
return _NegativeOne
if self < other:
return _NegativeOne
if self > other:
return _One
if self._exp < other._exp:
if sign:
return _One
else:
return _NegativeOne
if self._exp > other._exp:
if sign:
return _NegativeOne
else:
return _One
return _Zero
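# Illustrative example: NaNs sort above finite numbers in the total
# ordering, and NaN payloads are compared as though they were integers:
#   >>> Decimal('12.3').compare_total(Decimal('NaN'))
#   Decimal('-1')
#   >>> Decimal('NaN123').compare_total(Decimal('NaN45'))
#   Decimal('1')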
def compare_total_mag(self, other, context=None):
"""Compares self to other using abstract repr., ignoring sign.
Like compare_total, but with the operands' signs ignored and assumed to be 0.
"""
other = _convert_other(other, raiseit=True)
s = self.copy_abs()
o = other.copy_abs()
return s.compare_total(o)
def copy_abs(self):
"""Returns a copy with the sign set to 0. """
return _dec_from_triple(0, self._int, self._exp, self._is_special)
def copy_negate(self):
"""Returns a copy with the sign inverted."""
if self._sign:
return _dec_from_triple(0, self._int, self._exp, self._is_special)
else:
return _dec_from_triple(1, self._int, self._exp, self._is_special)
def copy_sign(self, other, context=None):
"""Returns self with the sign of other."""
other = _convert_other(other, raiseit=True)
return _dec_from_triple(other._sign, self._int,
self._exp, self._is_special)
def exp(self, context=None):
"""Returns e ** self."""
if context is None:
context = getcontext()
# exp(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
# exp(-Infinity) = 0
if self._isinfinity() == -1:
return _Zero
# exp(0) = 1
if not self:
return _One
# exp(Infinity) = Infinity
if self._isinfinity() == 1:
return Decimal(self)
# the result is now guaranteed to be inexact (the true
# mathematical result is transcendental). There's no need to
# raise Rounded and Inexact here---they'll always be raised as
# a result of the call to _fix.
p = context.prec
adj = self.adjusted()
# we only need to do any computation for quite a small range
# of adjusted exponents---for example, -29 <= adj <= 10 for
# the default context. For smaller exponent the result is
# indistinguishable from 1 at the given precision, while for
# larger exponent the result either overflows or underflows.
if self._sign == 0 and adj > len(str((context.Emax+1)*3)):
# overflow
ans = _dec_from_triple(0, '1', context.Emax+1)
elif self._sign == 1 and adj > len(str((-context.Etiny()+1)*3)):
# underflow to 0
ans = _dec_from_triple(0, '1', context.Etiny()-1)
elif self._sign == 0 and adj < -p:
# p+1 digits; final round will raise correct flags
ans = _dec_from_triple(0, '1' + '0'*(p-1) + '1', -p)
elif self._sign == 1 and adj < -p-1:
# p+1 digits; final round will raise correct flags
ans = _dec_from_triple(0, '9'*(p+1), -p-1)
# general case
else:
op = _WorkRep(self)
c, e = op.int, op.exp
if op.sign == 1:
c = -c
# compute correctly rounded result: increase precision by
# 3 digits at a time until we get an unambiguously
# roundable result
extra = 3
while True:
coeff, exp = _dexp(c, e, p+extra)
if coeff % (5*10**(len(str(coeff))-p-1)):
break
extra += 3
ans = _dec_from_triple(0, str(coeff), exp)
# at this stage, ans should round correctly with *any*
# rounding mode, not just with ROUND_HALF_EVEN
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
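# Illustrative example, assuming the default context (prec=28):
#   >>> Decimal(1).exp()
#   Decimal('2.718281828459045235360287471')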
def is_canonical(self):
"""Return True if self is canonical; otherwise return False.
Currently, the encoding of a Decimal instance is always
canonical, so this method returns True for any Decimal.
"""
return True
def is_finite(self):
"""Return True if self is finite; otherwise return False.
A Decimal instance is considered finite if it is neither
infinite nor a NaN.
"""
return not self._is_special
def is_infinite(self):
"""Return True if self is infinite; otherwise return False."""
return self._exp == 'F'
def is_nan(self):
"""Return True if self is a qNaN or sNaN; otherwise return False."""
return self._exp in ('n', 'N')
def is_normal(self, context=None):
"""Return True if self is a normal number; otherwise return False."""
if self._is_special or not self:
return False
if context is None:
context = getcontext()
return context.Emin <= self.adjusted()
def is_qnan(self):
"""Return True if self is a quiet NaN; otherwise return False."""
return self._exp == 'n'
def is_signed(self):
"""Return True if self is negative; otherwise return False."""
return self._sign == 1
def is_snan(self):
"""Return True if self is a signaling NaN; otherwise return False."""
return self._exp == 'N'
def is_subnormal(self, context=None):
"""Return True if self is subnormal; otherwise return False."""
if self._is_special or not self:
return False
if context is None:
context = getcontext()
return self.adjusted() < context.Emin
def is_zero(self):
"""Return True if self is a zero; otherwise return False."""
return not self._is_special and self._int == '0'
def _ln_exp_bound(self):
"""Compute a lower bound for the adjusted exponent of self.ln().
In other words, compute r such that self.ln() >= 10**r. Assumes
that self is finite and positive and that self != 1.
"""
# for 0.1 <= x <= 10 we use the inequalities 1-1/x <= ln(x) <= x-1
adj = self._exp + len(self._int) - 1
if adj >= 1:
# argument >= 10; we use 23/10 = 2.3 as a lower bound for ln(10)
return len(str(adj*23//10)) - 1
if adj <= -2:
# argument <= 0.1
return len(str((-1-adj)*23//10)) - 1
op = _WorkRep(self)
c, e = op.int, op.exp
if adj == 0:
# 1 < self < 10
num = str(c-10**-e)
den = str(c)
return len(num) - len(den) - (num < den)
# adj == -1, 0.1 <= self < 1
return e + len(str(10**-e - c)) - 1
def ln(self, context=None):
"""Returns the natural (base e) logarithm of self."""
if context is None:
context = getcontext()
# ln(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
# ln(0.0) == -Infinity
if not self:
return _NegativeInfinity
# ln(Infinity) = Infinity
if self._isinfinity() == 1:
return _Infinity
# ln(1.0) == 0.0
if self == _One:
return _Zero
# ln(negative) raises InvalidOperation
if self._sign == 1:
return context._raise_error(InvalidOperation,
'ln of a negative value')
# result is irrational, so necessarily inexact
op = _WorkRep(self)
c, e = op.int, op.exp
p = context.prec
# correctly rounded result: repeatedly increase precision by 3
# until we get an unambiguously roundable result
places = p - self._ln_exp_bound() + 2 # at least p+3 places
while True:
coeff = _dlog(c, e, places)
# assert len(str(abs(coeff)))-p >= 1
if coeff % (5*10**(len(str(abs(coeff)))-p-1)):
break
places += 3
ans = _dec_from_triple(int(coeff<0), str(abs(coeff)), -places)
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
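# Illustrative example, assuming the default context (prec=28):
#   >>> Decimal(10).ln()
#   Decimal('2.302585092994045684017991455')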
def _log10_exp_bound(self):
"""Compute a lower bound for the adjusted exponent of self.log10().
In other words, find r such that self.log10() >= 10**r.
Assumes that self is finite and positive and that self != 1.
"""
# For x >= 10 or x < 0.1 we only need a bound on the integer
# part of log10(self), and this comes directly from the
# exponent of x. For 0.1 <= x <= 10 we use the inequalities
# 1-1/x <= log(x) <= x-1. If x > 1 we have |log10(x)| >
# (1-1/x)/2.31 > 0. If x < 1 then |log10(x)| > (1-x)/2.31 > 0
adj = self._exp + len(self._int) - 1
if adj >= 1:
# self >= 10
return len(str(adj))-1
if adj <= -2:
# self < 0.1
return len(str(-1-adj))-1
op = _WorkRep(self)
c, e = op.int, op.exp
if adj == 0:
# 1 < self < 10
num = str(c-10**-e)
den = str(231*c)
return len(num) - len(den) - (num < den) + 2
# adj == -1, 0.1 <= self < 1
num = str(10**-e-c)
return len(num) + e - (num < "231") - 1
def log10(self, context=None):
"""Returns the base 10 logarithm of self."""
if context is None:
context = getcontext()
# log10(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
# log10(0.0) == -Infinity
if not self:
return _NegativeInfinity
# log10(Infinity) = Infinity
if self._isinfinity() == 1:
return _Infinity
# log10(negative or -Infinity) raises InvalidOperation
if self._sign == 1:
return context._raise_error(InvalidOperation,
'log10 of a negative value')
# log10(10**n) = n
if self._int[0] == '1' and self._int[1:] == '0'*(len(self._int) - 1):
# answer may need rounding
ans = Decimal(self._exp + len(self._int) - 1)
else:
# result is irrational, so necessarily inexact
op = _WorkRep(self)
c, e = op.int, op.exp
p = context.prec
# correctly rounded result: repeatedly increase precision
# until result is unambiguously roundable
places = p-self._log10_exp_bound()+2
while True:
coeff = _dlog10(c, e, places)
# assert len(str(abs(coeff)))-p >= 1
if coeff % (5*10**(len(str(abs(coeff)))-p-1)):
break
places += 3
ans = _dec_from_triple(int(coeff<0), str(abs(coeff)), -places)
context = context._shallow_copy()
rounding = context._set_rounding(ROUND_HALF_EVEN)
ans = ans._fix(context)
context.rounding = rounding
return ans
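# Illustrative example: exact powers of ten take the shortcut above, while
# other values are computed to the context precision (default prec=28):
#   >>> Decimal('0.01').log10()
#   Decimal('-2')
#   >>> Decimal(2).log10()
#   Decimal('0.3010299956639811952137388947')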
def logb(self, context=None):
""" Returns the exponent of the magnitude of self's MSD.
The result is the integer which is the exponent of the magnitude
of the most significant digit of self (as though it were truncated
to a single digit while maintaining the value of that digit and
without limiting the resulting exponent).
"""
# logb(NaN) = NaN
ans = self._check_nans(context=context)
if ans:
return ans
if context is None:
context = getcontext()
# logb(+/-Inf) = +Inf
if self._isinfinity():
return _Infinity
# logb(0) = -Inf, DivisionByZero
if not self:
return context._raise_error(DivisionByZero, 'logb(0)', 1)
# otherwise, simply return the adjusted exponent of self, as a
# Decimal. Note that no attempt is made to fit the result
# into the current context.
ans = Decimal(self.adjusted())
return ans._fix(context)
def _islogical(self):
"""Return True if self is a logical operand.
To be logical, it must be a finite number with a sign of 0,
an exponent of 0, and a coefficient whose digits are all
either 0 or 1.
"""
if self._sign != 0 or self._exp != 0:
return False
for dig in self._int:
if dig not in '01':
return False
return True
def _fill_logical(self, context, opa, opb):
dif = context.prec - len(opa)
if dif > 0:
opa = '0'*dif + opa
elif dif < 0:
opa = opa[-context.prec:]
dif = context.prec - len(opb)
if dif > 0:
opb = '0'*dif + opb
elif dif < 0:
opb = opb[-context.prec:]
return opa, opb
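# A sketch of the padding/truncation behaviour (hypothetical inputs,
# with context.prec == 5):
#   '101'     -> '00101'  (left-padded with zeros up to the precision)
#   '1111111' -> '11111'  (truncated to the rightmost prec digits)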
def logical_and(self, other, context=None):
"""Applies an 'and' operation between self and other's digits."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
if not self._islogical() or not other._islogical():
return context._raise_error(InvalidOperation)
# fill to context.prec
(opa, opb) = self._fill_logical(context, self._int, other._int)
# make the operation, and clean starting zeroes
result = "".join([str(int(a)&int(b)) for a,b in zip(opa,opb)])
return _dec_from_triple(0, result.lstrip('0') or '0', 0)
def logical_invert(self, context=None):
"""Invert all its digits."""
if context is None:
context = getcontext()
return self.logical_xor(_dec_from_triple(0,'1'*context.prec,0),
context)
def logical_or(self, other, context=None):
"""Applies an 'or' operation between self and other's digits."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
if not self._islogical() or not other._islogical():
return context._raise_error(InvalidOperation)
# fill to context.prec
(opa, opb) = self._fill_logical(context, self._int, other._int)
# make the operation, and clean starting zeroes
result = "".join([str(int(a)|int(b)) for a,b in zip(opa,opb)])
return _dec_from_triple(0, result.lstrip('0') or '0', 0)
def logical_xor(self, other, context=None):
"""Applies an 'xor' operation between self and other's digits."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
if not self._islogical() or not other._islogical():
return context._raise_error(InvalidOperation)
# fill to context.prec
(opa, opb) = self._fill_logical(context, self._int, other._int)
# make the operation, and clean starting zeroes
result = "".join([str(int(a)^int(b)) for a,b in zip(opa,opb)])
return _dec_from_triple(0, result.lstrip('0') or '0', 0)
def max_mag(self, other, context=None):
"""Compares the values numerically with their sign ignored."""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then
# the number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self.copy_abs()._cmp(other.copy_abs())
if c == 0:
c = self.compare_total(other)
if c == -1:
ans = other
else:
ans = self
return ans._fix(context)
def min_mag(self, other, context=None):
"""Compares the values numerically with their sign ignored."""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
if self._is_special or other._is_special:
# If one operand is a quiet NaN and the other is a number, then
# the number is always returned
sn = self._isnan()
on = other._isnan()
if sn or on:
if on == 1 and sn == 0:
return self._fix(context)
if sn == 1 and on == 0:
return other._fix(context)
return self._check_nans(other, context)
c = self.copy_abs()._cmp(other.copy_abs())
if c == 0:
c = self.compare_total(other)
if c == -1:
ans = self
else:
ans = other
return ans._fix(context)
def next_minus(self, context=None):
"""Returns the largest representable number smaller than itself."""
if context is None:
context = getcontext()
ans = self._check_nans(context=context)
if ans:
return ans
if self._isinfinity() == -1:
return _NegativeInfinity
if self._isinfinity() == 1:
return _dec_from_triple(0, '9'*context.prec, context.Etop())
context = context.copy()
context._set_rounding(ROUND_FLOOR)
context._ignore_all_flags()
new_self = self._fix(context)
if new_self != self:
return new_self
return self.__sub__(_dec_from_triple(0, '1', context.Etiny()-1),
context)
def next_plus(self, context=None):
"""Returns the smallest representable number larger than itself."""
if context is None:
context = getcontext()
ans = self._check_nans(context=context)
if ans:
return ans
if self._isinfinity() == 1:
return _Infinity
if self._isinfinity() == -1:
return _dec_from_triple(1, '9'*context.prec, context.Etop())
context = context.copy()
context._set_rounding(ROUND_CEILING)
context._ignore_all_flags()
new_self = self._fix(context)
if new_self != self:
return new_self
return self.__add__(_dec_from_triple(0, '1', context.Etiny()-1),
context)
def next_toward(self, other, context=None):
"""Returns the number closest to self, in the direction towards other.
The result is the closest representable number to self
(excluding self) that is in the direction towards other,
unless both have the same value. If the two operands are
numerically equal, then the result is a copy of self with the
sign set to be the same as the sign of other.
"""
other = _convert_other(other, raiseit=True)
if context is None:
context = getcontext()
ans = self._check_nans(other, context)
if ans:
return ans
comparison = self._cmp(other)
if comparison == 0:
return self.copy_sign(other)
if comparison == -1:
ans = self.next_plus(context)
else: # comparison == 1
ans = self.next_minus(context)
# decide which flags to raise using value of ans
if ans._isinfinity():
context._raise_error(Overflow,
'Infinite result from next_toward',
ans._sign)
context._raise_error(Inexact)
context._raise_error(Rounded)
elif ans.adjusted() < context.Emin:
context._raise_error(Underflow)
context._raise_error(Subnormal)
context._raise_error(Inexact)
context._raise_error(Rounded)
# if precision == 1 then we don't raise Clamped for a
# result 0E-Etiny.
if not ans:
context._raise_error(Clamped)
return ans
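# Illustrative example, assuming the default context (prec=28):
#   >>> Decimal(1).next_plus()
#   Decimal('1.000000000000000000000000001')
#   >>> Decimal(1).next_toward(Decimal(2))
#   Decimal('1.000000000000000000000000001')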
def number_class(self, context=None):
"""Returns an indication of the class of self.
The class is one of the following strings:
sNaN
NaN
-Infinity
-Normal
-Subnormal
-Zero
+Zero
+Subnormal
+Normal
+Infinity
"""
if self.is_snan():
return "sNaN"
if self.is_qnan():
return "NaN"
inf = self._isinfinity()
if inf == 1:
return "+Infinity"
if inf == -1:
return "-Infinity"
if self.is_zero():
if self._sign:
return "-Zero"
else:
return "+Zero"
if context is None:
context = getcontext()
if self.is_subnormal(context=context):
if self._sign:
return "-Subnormal"
else:
return "+Subnormal"
# just a normal, regular, boring number, :)
if self._sign:
return "-Normal"
else:
return "+Normal"
def radix(self):
"""Just returns 10, as this is Decimal, :)"""
return Decimal(10)
def rotate(self, other, context=None):
"""Returns a rotated copy of self, value-of-other times."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
if other._exp != 0:
return context._raise_error(InvalidOperation)
if not (-context.prec <= int(other) <= context.prec):
return context._raise_error(InvalidOperation)
if self._isinfinity():
return Decimal(self)
# get values, pad if necessary
torot = int(other)
rotdig = self._int
topad = context.prec - len(rotdig)
if topad > 0:
rotdig = '0'*topad + rotdig
elif topad < 0:
rotdig = rotdig[-topad:]
# let's rotate!
rotated = rotdig[torot:] + rotdig[:torot]
return _dec_from_triple(self._sign,
rotated.lstrip('0') or '0', self._exp)
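# Illustrative example with a hypothetical 9-digit context; the digits are
# padded or truncated to the precision before rotating:
#   >>> c = Context(prec=9)
#   >>> Decimal('34').rotate(8, context=c)
#   Decimal('400000003')
#   >>> Decimal('1234567890').rotate(-2, context=c)
#   Decimal('902345678')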
def scaleb(self, other, context=None):
"""Returns self operand after adding the second value to its exp."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
if other._exp != 0:
return context._raise_error(InvalidOperation)
liminf = -2 * (context.Emax + context.prec)
limsup = 2 * (context.Emax + context.prec)
if not (liminf <= int(other) <= limsup):
return context._raise_error(InvalidOperation)
if self._isinfinity():
return Decimal(self)
d = _dec_from_triple(self._sign, self._int, self._exp + int(other))
d = d._fix(context)
return d
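# Illustrative example, assuming the default context:
#   >>> Decimal('7.50').scaleb(3)
#   Decimal('7.50E+3')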
def shift(self, other, context=None):
"""Returns a shifted copy of self, value-of-other times."""
if context is None:
context = getcontext()
other = _convert_other(other, raiseit=True)
ans = self._check_nans(other, context)
if ans:
return ans
if other._exp != 0:
return context._raise_error(InvalidOperation)
if not (-context.prec <= int(other) <= context.prec):
return context._raise_error(InvalidOperation)
if self._isinfinity():
return Decimal(self)
# get values, pad if necessary
torot = int(other)
rotdig = self._int
topad = context.prec - len(rotdig)
if topad > 0:
rotdig = '0'*topad + rotdig
elif topad < 0:
rotdig = rotdig[-topad:]
# let's shift!
if torot < 0:
shifted = rotdig[:torot]
else:
shifted = rotdig + '0'*torot
shifted = shifted[-context.prec:]
return _dec_from_triple(self._sign,
shifted.lstrip('0') or '0', self._exp)
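# Illustrative example with a hypothetical 9-digit context; unlike rotate,
# digits shifted past either end are simply lost:
#   >>> c = Context(prec=9)
#   >>> Decimal('34').shift(8, context=c)
#   Decimal('400000000')
#   >>> Decimal('34').shift(-2, context=c)
#   Decimal('0')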
# Support for pickling, copy, and deepcopy
def __reduce__(self):
return (self.__class__, (str(self),))
def __copy__(self):
if type(self) is Decimal:
return self # I'm immutable; therefore I am my own clone
return self.__class__(str(self))
def __deepcopy__(self, memo):
if type(self) is Decimal:
return self # My components are also immutable
return self.__class__(str(self))
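# Illustrative example: pickling round-trips through str(self), while
# copying a plain (immutable) Decimal returns the object itself:
#   >>> import copy, pickle
#   >>> d = Decimal('3.14')
#   >>> pickle.loads(pickle.dumps(d)) == d
#   True
#   >>> copy.copy(d) is d
#   True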
# PEP 3101 support. The _localeconv keyword argument should be
# considered private: it's provided for ease of testing only.
def __format__(self, specifier, context=None, _localeconv=None):
"""Format a Decimal instance according to the given specifier.
The specifier should be a standard format specifier, with the
form described in PEP 3101. Formatting types 'e', 'E', 'f',
'F', 'g', 'G', 'n' and '%' are supported. If the formatting
type is omitted it defaults to 'g' or 'G', depending on the
value of context.capitals.
"""
# Note: PEP 3101 says that if the type is not present then
# there should be at least one digit after the decimal point.
# We take the liberty of ignoring this requirement for
# Decimal---it's presumably there to make sure that
# format(float, '') behaves similarly to str(float).
if context is None:
context = getcontext()
spec = _parse_format_specifier(specifier, _localeconv=_localeconv)
# special values don't care about the type or precision
if self._is_special:
sign = _format_sign(self._sign, spec)
body = str(self.copy_abs())
if spec['type'] == '%':
body += '%'
return _format_align(sign, body, spec)
# a type of None defaults to 'g' or 'G', depending on context
if spec['type'] is None:
spec['type'] = ['g', 'G'][context.capitals]
# if type is '%', adjust exponent of self accordingly
if spec['type'] == '%':
self = _dec_from_triple(self._sign, self._int, self._exp+2)
# round if necessary, taking rounding mode from the context
rounding = context.rounding
precision = spec['precision']
if precision is not None:
if spec['type'] in 'eE':
self = self._round(precision+1, rounding)
elif spec['type'] in 'fF%':
self = self._rescale(-precision, rounding)
elif spec['type'] in 'gG' and len(self._int) > precision:
self = self._round(precision, rounding)
# special case: zeros with a positive exponent can't be
# represented in fixed point; rescale them to 0e0.
if not self and self._exp > 0 and spec['type'] in 'fF%':
self = self._rescale(0, rounding)
# figure out placement of the decimal point
leftdigits = self._exp + len(self._int)
if spec['type'] in 'eE':
if not self and precision is not None:
dotplace = 1 - precision
else:
dotplace = 1
elif spec['type'] in 'fF%':
dotplace = leftdigits
elif spec['type'] in 'gG':
if self._exp <= 0 and leftdigits > -6:
dotplace = leftdigits
else:
dotplace = 1
# find digits before and after decimal point, and get exponent
if dotplace < 0:
intpart = '0'
fracpart = '0'*(-dotplace) + self._int
elif dotplace > len(self._int):
intpart = self._int + '0'*(dotplace-len(self._int))
fracpart = ''
else:
intpart = self._int[:dotplace] or '0'
fracpart = self._int[dotplace:]
exp = leftdigits-dotplace
# done with the decimal-specific stuff; hand over the rest
# of the formatting to the _format_number function
return _format_number(self._sign, intpart, fracpart, exp, spec)
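# Illustrative examples, assuming the default context:
#   >>> format(Decimal('1234.5678'), '.2f')
#   '1234.57'
#   >>> format(Decimal('99.97'), '.3g')
#   '100'
#   >>> format(Decimal('0.25'), '.1%')
#   '25.0%'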
def _dec_from_triple(sign, coefficient, exponent, special=False):
"""Create a decimal instance directly, without any validation,
normalization (e.g. removal of leading zeros) or argument
conversion.
This function is for *internal use only*.
"""
self = object.__new__(Decimal)
self._sign = sign
self._int = coefficient
self._exp = exponent
self._is_special = special
return self
# Register Decimal as a kind of Number (an abstract base class).
# However, do not register it as Real (because Decimals are not
# interoperable with floats).
_numbers.Number.register(Decimal)
##### Context class #######################################################
class _ContextManager(object):
"""Context manager class to support localcontext().
Sets a copy of the supplied context in __enter__() and restores
the previous decimal context in __exit__()
"""
def __init__(self, new_context):
self.new_context = new_context.copy()
def __enter__(self):
self.saved_context = getcontext()
setcontext(self.new_context)
return self.new_context
def __exit__(self, t, v, tb):
setcontext(self.saved_context)
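# Typical use is via the module-level localcontext() helper, which returns
# one of these managers (a sketch, assuming the default context outside):
#   >>> with localcontext() as ctx:
#   ...     ctx.prec = 5
#   ...     Decimal(1) / Decimal(7)
#   Decimal('0.14286')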
class Context(object):
"""Contains the context for a Decimal instance.
Contains:
prec - precision (for use in rounding, division, square roots..)
rounding - rounding type (how you round)
traps - If traps[exception] = 1, then the exception is
raised when it is caused. Otherwise, a value is
substituted in.
flags - When an exception is caused, flags[exception] is set.
(Whether or not the trap_enabler is set)
Should be reset by the user of the Decimal instance.
Emin - Minimum exponent
Emax - Maximum exponent
capitals - If 1, 1*10^1 is printed as 1E+1.
If 0, printed as 1e1
clamp - If 1, change exponents if too high (Default 0)
"""
def __init__(self, prec=None, rounding=None, Emin=None, Emax=None,
capitals=None, clamp=None, flags=None, traps=None,
_ignored_flags=None):
# Set defaults; for everything except flags and _ignored_flags,
# inherit from DefaultContext.
try:
dc = DefaultContext
except NameError:
pass
self.prec = prec if prec is not None else dc.prec
self.rounding = rounding if rounding is not None else dc.rounding
self.Emin = Emin if Emin is not None else dc.Emin
self.Emax = Emax if Emax is not None else dc.Emax
self.capitals = capitals if capitals is not None else dc.capitals
self.clamp = clamp if clamp is not None else dc.clamp
if _ignored_flags is None:
self._ignored_flags = []
else:
self._ignored_flags = _ignored_flags
if traps is None:
self.traps = dc.traps.copy()
elif not isinstance(traps, dict):
self.traps = dict((s, int(s in traps)) for s in _signals + traps)
else:
self.traps = traps
if flags is None:
self.flags = dict.fromkeys(_signals, 0)
elif not isinstance(flags, dict):
self.flags = dict((s, int(s in flags)) for s in _signals + flags)
else:
self.flags = flags
def _set_integer_check(self, name, value, vmin, vmax):
if not isinstance(value, int):
raise TypeError("%s must be an integer" % name)
if vmin == '-inf':
if value > vmax:
raise ValueError("%s must be in [%s, %d]. got: %s" % (name, vmin, vmax, value))
elif vmax == 'inf':
if value < vmin:
raise ValueError("%s must be in [%d, %s]. got: %s" % (name, vmin, vmax, value))
else:
if value < vmin or value > vmax:
raise ValueError("%s must be in [%d, %d]. got %s" % (name, vmin, vmax, value))
return object.__setattr__(self, name, value)
def _set_signal_dict(self, name, d):
if not isinstance(d, dict):
raise TypeError("%s must be a signal dict" % d)
for key in d:
if key not in _signals:
raise KeyError("%s is not a valid signal dict" % d)
for key in _signals:
if key not in d:
raise KeyError("%s is not a valid signal dict" % d)
return object.__setattr__(self, name, d)
def __setattr__(self, name, value):
if name == 'prec':
return self._set_integer_check(name, value, 1, 'inf')
elif name == 'Emin':
return self._set_integer_check(name, value, '-inf', 0)
elif name == 'Emax':
return self._set_integer_check(name, value, 0, 'inf')
elif name == 'capitals':
return self._set_integer_check(name, value, 0, 1)
elif name == 'clamp':
return self._set_integer_check(name, value, 0, 1)
elif name == 'rounding':
if value not in _rounding_modes:
# raise TypeError even for strings to have consistency
# among various implementations.
raise TypeError("%s: invalid rounding mode" % value)
return object.__setattr__(self, name, value)
elif name == 'flags' or name == 'traps':
return self._set_signal_dict(name, value)
elif name == '_ignored_flags':
return object.__setattr__(self, name, value)
else:
raise AttributeError(
"'decimal.Context' object has no attribute '%s'" % name)
def __delattr__(self, name):
raise AttributeError("%s cannot be deleted" % name)
# Support for pickling, copy, and deepcopy
def __reduce__(self):
flags = [sig for sig, v in self.flags.items() if v]
traps = [sig for sig, v in self.traps.items() if v]
return (self.__class__,
(self.prec, self.rounding, self.Emin, self.Emax,
self.capitals, self.clamp, flags, traps))
def __repr__(self):
"""Show the current context."""
s = []
s.append('Context(prec=%(prec)d, rounding=%(rounding)s, '
'Emin=%(Emin)d, Emax=%(Emax)d, capitals=%(capitals)d, '
'clamp=%(clamp)d'
% vars(self))
names = [f.__name__ for f, v in self.flags.items() if v]
s.append('flags=[' + ', '.join(names) + ']')
names = [t.__name__ for t, v in self.traps.items() if v]
s.append('traps=[' + ', '.join(names) + ']')
return ', '.join(s) + ')'
def clear_flags(self):
"""Reset all flags to zero"""
for flag in self.flags:
self.flags[flag] = 0
def clear_traps(self):
"""Reset all traps to zero"""
for flag in self.traps:
self.traps[flag] = 0
def _shallow_copy(self):
"""Returns a shallow copy from self."""
nc = Context(self.prec, self.rounding, self.Emin, self.Emax,
self.capitals, self.clamp, self.flags, self.traps,
self._ignored_flags)
return nc
def copy(self):
"""Returns a deep copy from self."""
nc = Context(self.prec, self.rounding, self.Emin, self.Emax,
self.capitals, self.clamp,
self.flags.copy(), self.traps.copy(),
self._ignored_flags)
return nc
__copy__ = copy
def _raise_error(self, condition, explanation=None, *args):
"""Handles an error
If the flag is in _ignored_flags, returns the default response.
Otherwise, it sets the flag, then, if the corresponding
trap_enabler is set, it reraises the exception. Otherwise, it returns
the default value after setting the flag.
"""
error = _condition_map.get(condition, condition)
if error in self._ignored_flags:
# Don't touch the flag
return error().handle(self, *args)
self.flags[error] = 1
if not self.traps[error]:
# The errors define how to handle themselves.
return condition().handle(self, *args)
# Errors should only be risked on copies of the context
# self._ignored_flags = []
raise error(explanation)
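# Illustrative example: with the trap disabled, the flag is still set and
# the signal's default result is returned instead of raising:
#   >>> c = Context()
#   >>> c.traps[DivisionByZero] = 0
#   >>> c.divide(1, 0)
#   Decimal('Infinity')
#   >>> c.flags[DivisionByZero]
#   1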
def _ignore_all_flags(self):
"""Ignore all flags, if they are raised"""
return self._ignore_flags(*_signals)
def _ignore_flags(self, *flags):
"""Ignore the flags, if they are raised"""
# Do not mutate; this way, copies of a context leave the original
# alone.
self._ignored_flags = (self._ignored_flags + list(flags))
return list(flags)
def _regard_flags(self, *flags):
"""Stop ignoring the flags, if they are raised"""
if flags and isinstance(flags[0], (tuple,list)):
flags = flags[0]
for flag in flags:
self._ignored_flags.remove(flag)
# We inherit object.__hash__, so we must deny this explicitly
__hash__ = None
def Etiny(self):
"""Returns Etiny (= Emin - prec + 1)"""
return int(self.Emin - self.prec + 1)
def Etop(self):
"""Returns maximum exponent (= Emax - prec + 1)"""
return int(self.Emax - self.prec + 1)
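# Illustrative example with a hypothetical small context:
#   >>> c = Context(prec=9, Emin=-999, Emax=999)
#   >>> c.Etiny()
#   -1007
#   >>> c.Etop()
#   991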
def _set_rounding(self, type):
"""Sets the rounding type.
Sets the rounding type, and returns the current (previous)
rounding type. Often used like:
context = context.copy()
# so you don't change the calling context
# if an error occurs in the middle.
rounding = context._set_rounding(ROUND_UP)
val = self.__sub__(other, context=context)
context._set_rounding(rounding)
This will make it round up for that operation.
"""
rounding = self.rounding
self.rounding = type
return rounding
def create_decimal(self, num='0'):
"""Creates a new Decimal instance but using self as context.
This method implements the to-number operation of the
IBM Decimal specification."""
if isinstance(num, str) and num != num.strip():
return self._raise_error(ConversionSyntax,
"no trailing or leading whitespace is "
"permitted.")
d = Decimal(num, context=self)
if d._isnan() and len(d._int) > self.prec - self.clamp:
return self._raise_error(ConversionSyntax,
"diagnostic info too long in NaN")
return d._fix(self)
def create_decimal_from_float(self, f):
"""Creates a new Decimal instance from a float but rounding using self
as the context.
>>> context = Context(prec=5, rounding=ROUND_DOWN)
>>> context.create_decimal_from_float(3.1415926535897932)
Decimal('3.1415')
>>> context = Context(prec=5, traps=[Inexact])
>>> context.create_decimal_from_float(3.1415926535897932)
Traceback (most recent call last):
...
decimal.Inexact: None
"""
d = Decimal.from_float(f) # An exact conversion
return d._fix(self) # Apply the context rounding
# Methods
def abs(self, a):
"""Returns the absolute value of the operand.
If the operand is negative, the result is the same as using the minus
operation on the operand. Otherwise, the result is the same as using
the plus operation on the operand.
>>> ExtendedContext.abs(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.abs(Decimal('-100'))
Decimal('100')
>>> ExtendedContext.abs(Decimal('101.5'))
Decimal('101.5')
>>> ExtendedContext.abs(Decimal('-101.5'))
Decimal('101.5')
>>> ExtendedContext.abs(-1)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.__abs__(context=self)
def add(self, a, b):
"""Return the sum of the two operands.
>>> ExtendedContext.add(Decimal('12'), Decimal('7.00'))
Decimal('19.00')
>>> ExtendedContext.add(Decimal('1E+2'), Decimal('1.01E+4'))
Decimal('1.02E+4')
>>> ExtendedContext.add(1, Decimal(2))
Decimal('3')
>>> ExtendedContext.add(Decimal(8), 5)
Decimal('13')
>>> ExtendedContext.add(5, 5)
Decimal('10')
"""
a = _convert_other(a, raiseit=True)
r = a.__add__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def _apply(self, a):
return str(a._fix(self))
def canonical(self, a):
"""Returns the same Decimal object.
As we do not have different encodings for the same number, the
received object already is in its canonical form.
>>> ExtendedContext.canonical(Decimal('2.50'))
Decimal('2.50')
"""
if not isinstance(a, Decimal):
raise TypeError("canonical requires a Decimal as an argument.")
return a.canonical()
def compare(self, a, b):
"""Compares values numerically.
If the signs of the operands differ, a value representing each operand
('-1' if the operand is less than zero, '0' if the operand is zero or
negative zero, or '1' if the operand is greater than zero) is used in
place of that operand for the comparison instead of the actual
operand.
The comparison is then effected by subtracting the second operand from
the first and then returning a value according to the result of the
subtraction: '-1' if the result is less than zero, '0' if the result is
zero or negative zero, or '1' if the result is greater than zero.
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('3'))
Decimal('-1')
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('2.1'))
Decimal('0')
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('2.10'))
Decimal('0')
>>> ExtendedContext.compare(Decimal('3'), Decimal('2.1'))
Decimal('1')
>>> ExtendedContext.compare(Decimal('2.1'), Decimal('-3'))
Decimal('1')
>>> ExtendedContext.compare(Decimal('-3'), Decimal('2.1'))
Decimal('-1')
>>> ExtendedContext.compare(1, 2)
Decimal('-1')
>>> ExtendedContext.compare(Decimal(1), 2)
Decimal('-1')
>>> ExtendedContext.compare(1, Decimal(2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.compare(b, context=self)
def compare_signal(self, a, b):
"""Compares the values of the two operands numerically.
It's pretty much like compare(), but all NaNs signal, with signaling
NaNs taking precedence over quiet NaNs.
>>> c = ExtendedContext
>>> c.compare_signal(Decimal('2.1'), Decimal('3'))
Decimal('-1')
>>> c.compare_signal(Decimal('2.1'), Decimal('2.1'))
Decimal('0')
>>> c.flags[InvalidOperation] = 0
>>> print(c.flags[InvalidOperation])
0
>>> c.compare_signal(Decimal('NaN'), Decimal('2.1'))
Decimal('NaN')
>>> print(c.flags[InvalidOperation])
1
>>> c.flags[InvalidOperation] = 0
>>> print(c.flags[InvalidOperation])
0
>>> c.compare_signal(Decimal('sNaN'), Decimal('2.1'))
Decimal('NaN')
>>> print(c.flags[InvalidOperation])
1
>>> c.compare_signal(-1, 2)
Decimal('-1')
>>> c.compare_signal(Decimal(-1), 2)
Decimal('-1')
>>> c.compare_signal(-1, Decimal(2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.compare_signal(b, context=self)
def compare_total(self, a, b):
"""Compares two operands using their abstract representation.
This is not like the standard compare, which uses the operands' numerical
values. Note that a total ordering is defined for all possible abstract
representations.
>>> ExtendedContext.compare_total(Decimal('12.73'), Decimal('127.9'))
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal('-127'), Decimal('12'))
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal('12.30'), Decimal('12.3'))
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal('12.30'), Decimal('12.30'))
Decimal('0')
>>> ExtendedContext.compare_total(Decimal('12.3'), Decimal('12.300'))
Decimal('1')
>>> ExtendedContext.compare_total(Decimal('12.3'), Decimal('NaN'))
Decimal('-1')
>>> ExtendedContext.compare_total(1, 2)
Decimal('-1')
>>> ExtendedContext.compare_total(Decimal(1), 2)
Decimal('-1')
>>> ExtendedContext.compare_total(1, Decimal(2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.compare_total(b)
def compare_total_mag(self, a, b):
"""Compares two operands using their abstract representation ignoring sign.
Like compare_total, but with the operands' signs ignored and assumed to be 0.
"""
a = _convert_other(a, raiseit=True)
return a.compare_total_mag(b)
def copy_abs(self, a):
"""Returns a copy of the operand with the sign set to 0.
>>> ExtendedContext.copy_abs(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.copy_abs(Decimal('-100'))
Decimal('100')
>>> ExtendedContext.copy_abs(-1)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.copy_abs()
def copy_decimal(self, a):
"""Returns a copy of the decimal object.
>>> ExtendedContext.copy_decimal(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.copy_decimal(Decimal('-1.00'))
Decimal('-1.00')
>>> ExtendedContext.copy_decimal(1)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return Decimal(a)
def copy_negate(self, a):
"""Returns a copy of the operand with the sign inverted.
>>> ExtendedContext.copy_negate(Decimal('101.5'))
Decimal('-101.5')
>>> ExtendedContext.copy_negate(Decimal('-101.5'))
Decimal('101.5')
>>> ExtendedContext.copy_negate(1)
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.copy_negate()
def copy_sign(self, a, b):
"""Copies the second operand's sign to the first one.
In detail, it returns a copy of the first operand with the sign
equal to the sign of the second operand.
>>> ExtendedContext.copy_sign(Decimal( '1.50'), Decimal('7.33'))
Decimal('1.50')
>>> ExtendedContext.copy_sign(Decimal('-1.50'), Decimal('7.33'))
Decimal('1.50')
>>> ExtendedContext.copy_sign(Decimal( '1.50'), Decimal('-7.33'))
Decimal('-1.50')
>>> ExtendedContext.copy_sign(Decimal('-1.50'), Decimal('-7.33'))
Decimal('-1.50')
>>> ExtendedContext.copy_sign(1, -2)
Decimal('-1')
>>> ExtendedContext.copy_sign(Decimal(1), -2)
Decimal('-1')
>>> ExtendedContext.copy_sign(1, Decimal(-2))
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.copy_sign(b)
def divide(self, a, b):
"""Decimal division in a specified context.
>>> ExtendedContext.divide(Decimal('1'), Decimal('3'))
Decimal('0.333333333')
>>> ExtendedContext.divide(Decimal('2'), Decimal('3'))
Decimal('0.666666667')
>>> ExtendedContext.divide(Decimal('5'), Decimal('2'))
Decimal('2.5')
>>> ExtendedContext.divide(Decimal('1'), Decimal('10'))
Decimal('0.1')
>>> ExtendedContext.divide(Decimal('12'), Decimal('12'))
Decimal('1')
>>> ExtendedContext.divide(Decimal('8.00'), Decimal('2'))
Decimal('4.00')
>>> ExtendedContext.divide(Decimal('2.400'), Decimal('2.0'))
Decimal('1.20')
>>> ExtendedContext.divide(Decimal('1000'), Decimal('100'))
Decimal('10')
>>> ExtendedContext.divide(Decimal('1000'), Decimal('1'))
Decimal('1000')
>>> ExtendedContext.divide(Decimal('2.40E+6'), Decimal('2'))
Decimal('1.20E+6')
>>> ExtendedContext.divide(5, 5)
Decimal('1')
>>> ExtendedContext.divide(Decimal(5), 5)
Decimal('1')
>>> ExtendedContext.divide(5, Decimal(5))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
r = a.__truediv__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def divide_int(self, a, b):
"""Divides two numbers and returns the integer part of the result.
>>> ExtendedContext.divide_int(Decimal('2'), Decimal('3'))
Decimal('0')
>>> ExtendedContext.divide_int(Decimal('10'), Decimal('3'))
Decimal('3')
>>> ExtendedContext.divide_int(Decimal('1'), Decimal('0.3'))
Decimal('3')
>>> ExtendedContext.divide_int(10, 3)
Decimal('3')
>>> ExtendedContext.divide_int(Decimal(10), 3)
Decimal('3')
>>> ExtendedContext.divide_int(10, Decimal(3))
Decimal('3')
"""
a = _convert_other(a, raiseit=True)
r = a.__floordiv__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def divmod(self, a, b):
"""Return (a // b, a % b).
>>> ExtendedContext.divmod(Decimal(8), Decimal(3))
(Decimal('2'), Decimal('2'))
>>> ExtendedContext.divmod(Decimal(8), Decimal(4))
(Decimal('2'), Decimal('0'))
>>> ExtendedContext.divmod(8, 4)
(Decimal('2'), Decimal('0'))
>>> ExtendedContext.divmod(Decimal(8), 4)
(Decimal('2'), Decimal('0'))
>>> ExtendedContext.divmod(8, Decimal(4))
(Decimal('2'), Decimal('0'))
"""
a = _convert_other(a, raiseit=True)
r = a.__divmod__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def exp(self, a):
"""Returns e ** a.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.exp(Decimal('-Infinity'))
Decimal('0')
>>> c.exp(Decimal('-1'))
Decimal('0.367879441')
>>> c.exp(Decimal('0'))
Decimal('1')
>>> c.exp(Decimal('1'))
Decimal('2.71828183')
>>> c.exp(Decimal('0.693147181'))
Decimal('2.00000000')
>>> c.exp(Decimal('+Infinity'))
Decimal('Infinity')
>>> c.exp(10)
Decimal('22026.4658')
"""
a = _convert_other(a, raiseit=True)
return a.exp(context=self)
def fma(self, a, b, c):
"""Returns a multiplied by b, plus c.
The first two operands are multiplied together, using multiply,
the third operand is then added to the result of that
multiplication, using add, all with only one final rounding.
>>> ExtendedContext.fma(Decimal('3'), Decimal('5'), Decimal('7'))
Decimal('22')
>>> ExtendedContext.fma(Decimal('3'), Decimal('-5'), Decimal('7'))
Decimal('-8')
>>> ExtendedContext.fma(Decimal('888565290'), Decimal('1557.96930'), Decimal('-86087.7578'))
Decimal('1.38435736E+12')
>>> ExtendedContext.fma(1, 3, 4)
Decimal('7')
>>> ExtendedContext.fma(1, Decimal(3), 4)
Decimal('7')
>>> ExtendedContext.fma(1, 3, Decimal(4))
Decimal('7')
"""
a = _convert_other(a, raiseit=True)
return a.fma(b, c, context=self)
def is_canonical(self, a):
"""Return True if the operand is canonical; otherwise return False.
Currently, the encoding of a Decimal instance is always
canonical, so this method returns True for any Decimal.
>>> ExtendedContext.is_canonical(Decimal('2.50'))
True
"""
if not isinstance(a, Decimal):
raise TypeError("is_canonical requires a Decimal as an argument.")
return a.is_canonical()
def is_finite(self, a):
"""Return True if the operand is finite; otherwise return False.
A Decimal instance is considered finite if it is neither
infinite nor a NaN.
>>> ExtendedContext.is_finite(Decimal('2.50'))
True
>>> ExtendedContext.is_finite(Decimal('-0.3'))
True
>>> ExtendedContext.is_finite(Decimal('0'))
True
>>> ExtendedContext.is_finite(Decimal('Inf'))
False
>>> ExtendedContext.is_finite(Decimal('NaN'))
False
>>> ExtendedContext.is_finite(1)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_finite()
def is_infinite(self, a):
"""Return True if the operand is infinite; otherwise return False.
>>> ExtendedContext.is_infinite(Decimal('2.50'))
False
>>> ExtendedContext.is_infinite(Decimal('-Inf'))
True
>>> ExtendedContext.is_infinite(Decimal('NaN'))
False
>>> ExtendedContext.is_infinite(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_infinite()
def is_nan(self, a):
"""Return True if the operand is a qNaN or sNaN;
otherwise return False.
>>> ExtendedContext.is_nan(Decimal('2.50'))
False
>>> ExtendedContext.is_nan(Decimal('NaN'))
True
>>> ExtendedContext.is_nan(Decimal('-sNaN'))
True
>>> ExtendedContext.is_nan(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_nan()
def is_normal(self, a):
"""Return True if the operand is a normal number;
otherwise return False.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.is_normal(Decimal('2.50'))
True
>>> c.is_normal(Decimal('0.1E-999'))
False
>>> c.is_normal(Decimal('0.00'))
False
>>> c.is_normal(Decimal('-Inf'))
False
>>> c.is_normal(Decimal('NaN'))
False
>>> c.is_normal(1)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_normal(context=self)
def is_qnan(self, a):
"""Return True if the operand is a quiet NaN; otherwise return False.
>>> ExtendedContext.is_qnan(Decimal('2.50'))
False
>>> ExtendedContext.is_qnan(Decimal('NaN'))
True
>>> ExtendedContext.is_qnan(Decimal('sNaN'))
False
>>> ExtendedContext.is_qnan(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_qnan()
def is_signed(self, a):
"""Return True if the operand is negative; otherwise return False.
>>> ExtendedContext.is_signed(Decimal('2.50'))
False
>>> ExtendedContext.is_signed(Decimal('-12'))
True
>>> ExtendedContext.is_signed(Decimal('-0'))
True
>>> ExtendedContext.is_signed(8)
False
>>> ExtendedContext.is_signed(-8)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_signed()
def is_snan(self, a):
"""Return True if the operand is a signaling NaN;
otherwise return False.
>>> ExtendedContext.is_snan(Decimal('2.50'))
False
>>> ExtendedContext.is_snan(Decimal('NaN'))
False
>>> ExtendedContext.is_snan(Decimal('sNaN'))
True
>>> ExtendedContext.is_snan(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_snan()
def is_subnormal(self, a):
"""Return True if the operand is subnormal; otherwise return False.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.is_subnormal(Decimal('2.50'))
False
>>> c.is_subnormal(Decimal('0.1E-999'))
True
>>> c.is_subnormal(Decimal('0.00'))
False
>>> c.is_subnormal(Decimal('-Inf'))
False
>>> c.is_subnormal(Decimal('NaN'))
False
>>> c.is_subnormal(1)
False
"""
a = _convert_other(a, raiseit=True)
return a.is_subnormal(context=self)
def is_zero(self, a):
"""Return True if the operand is a zero; otherwise return False.
>>> ExtendedContext.is_zero(Decimal('0'))
True
>>> ExtendedContext.is_zero(Decimal('2.50'))
False
>>> ExtendedContext.is_zero(Decimal('-0E+2'))
True
>>> ExtendedContext.is_zero(1)
False
>>> ExtendedContext.is_zero(0)
True
"""
a = _convert_other(a, raiseit=True)
return a.is_zero()
def ln(self, a):
"""Returns the natural (base e) logarithm of the operand.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.ln(Decimal('0'))
Decimal('-Infinity')
>>> c.ln(Decimal('1.000'))
Decimal('0')
>>> c.ln(Decimal('2.71828183'))
Decimal('1.00000000')
>>> c.ln(Decimal('10'))
Decimal('2.30258509')
>>> c.ln(Decimal('+Infinity'))
Decimal('Infinity')
>>> c.ln(1)
Decimal('0')
"""
a = _convert_other(a, raiseit=True)
return a.ln(context=self)
def log10(self, a):
"""Returns the base 10 logarithm of the operand.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.log10(Decimal('0'))
Decimal('-Infinity')
>>> c.log10(Decimal('0.001'))
Decimal('-3')
>>> c.log10(Decimal('1.000'))
Decimal('0')
>>> c.log10(Decimal('2'))
Decimal('0.301029996')
>>> c.log10(Decimal('10'))
Decimal('1')
>>> c.log10(Decimal('70'))
Decimal('1.84509804')
>>> c.log10(Decimal('+Infinity'))
Decimal('Infinity')
>>> c.log10(0)
Decimal('-Infinity')
>>> c.log10(1)
Decimal('0')
"""
a = _convert_other(a, raiseit=True)
return a.log10(context=self)
def logb(self, a):
""" Returns the exponent of the magnitude of the operand's MSD.
The result is the integer which is the exponent of the magnitude
of the most significant digit of the operand (as though the
operand were truncated to a single digit while maintaining the
value of that digit and without limiting the resulting exponent).
>>> ExtendedContext.logb(Decimal('250'))
Decimal('2')
>>> ExtendedContext.logb(Decimal('2.50'))
Decimal('0')
>>> ExtendedContext.logb(Decimal('0.03'))
Decimal('-2')
>>> ExtendedContext.logb(Decimal('0'))
Decimal('-Infinity')
>>> ExtendedContext.logb(1)
Decimal('0')
>>> ExtendedContext.logb(10)
Decimal('1')
>>> ExtendedContext.logb(100)
Decimal('2')
"""
a = _convert_other(a, raiseit=True)
return a.logb(context=self)
def logical_and(self, a, b):
"""Applies the logical operation 'and' between each operand's digits.
The operands must be both logical numbers.
>>> ExtendedContext.logical_and(Decimal('0'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_and(Decimal('0'), Decimal('1'))
Decimal('0')
>>> ExtendedContext.logical_and(Decimal('1'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_and(Decimal('1'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_and(Decimal('1100'), Decimal('1010'))
Decimal('1000')
>>> ExtendedContext.logical_and(Decimal('1111'), Decimal('10'))
Decimal('10')
>>> ExtendedContext.logical_and(110, 1101)
Decimal('100')
>>> ExtendedContext.logical_and(Decimal(110), 1101)
Decimal('100')
>>> ExtendedContext.logical_and(110, Decimal(1101))
Decimal('100')
"""
a = _convert_other(a, raiseit=True)
return a.logical_and(b, context=self)
def logical_invert(self, a):
"""Invert all the digits in the operand.
The operand must be a logical number.
>>> ExtendedContext.logical_invert(Decimal('0'))
Decimal('111111111')
>>> ExtendedContext.logical_invert(Decimal('1'))
Decimal('111111110')
>>> ExtendedContext.logical_invert(Decimal('111111111'))
Decimal('0')
>>> ExtendedContext.logical_invert(Decimal('101010101'))
Decimal('10101010')
>>> ExtendedContext.logical_invert(1101)
Decimal('111110010')
"""
a = _convert_other(a, raiseit=True)
return a.logical_invert(context=self)
def logical_or(self, a, b):
"""Applies the logical operation 'or' between each operand's digits.
The operands must be both logical numbers.
>>> ExtendedContext.logical_or(Decimal('0'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_or(Decimal('0'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_or(Decimal('1'), Decimal('0'))
Decimal('1')
>>> ExtendedContext.logical_or(Decimal('1'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_or(Decimal('1100'), Decimal('1010'))
Decimal('1110')
>>> ExtendedContext.logical_or(Decimal('1110'), Decimal('10'))
Decimal('1110')
>>> ExtendedContext.logical_or(110, 1101)
Decimal('1111')
>>> ExtendedContext.logical_or(Decimal(110), 1101)
Decimal('1111')
>>> ExtendedContext.logical_or(110, Decimal(1101))
Decimal('1111')
"""
a = _convert_other(a, raiseit=True)
return a.logical_or(b, context=self)
def logical_xor(self, a, b):
"""Applies the logical operation 'xor' between each operand's digits.
The operands must be both logical numbers.
>>> ExtendedContext.logical_xor(Decimal('0'), Decimal('0'))
Decimal('0')
>>> ExtendedContext.logical_xor(Decimal('0'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.logical_xor(Decimal('1'), Decimal('0'))
Decimal('1')
>>> ExtendedContext.logical_xor(Decimal('1'), Decimal('1'))
Decimal('0')
>>> ExtendedContext.logical_xor(Decimal('1100'), Decimal('1010'))
Decimal('110')
>>> ExtendedContext.logical_xor(Decimal('1111'), Decimal('10'))
Decimal('1101')
>>> ExtendedContext.logical_xor(110, 1101)
Decimal('1011')
>>> ExtendedContext.logical_xor(Decimal(110), 1101)
Decimal('1011')
>>> ExtendedContext.logical_xor(110, Decimal(1101))
Decimal('1011')
"""
a = _convert_other(a, raiseit=True)
return a.logical_xor(b, context=self)
def max(self, a, b):
"""max compares two values numerically and returns the maximum.
If either operand is a NaN then the general rules apply.
Otherwise, the operands are compared as though by the compare
operation. If they are numerically equal then the left-hand operand
is chosen as the result. Otherwise the maximum (closer to positive
infinity) of the two operands is chosen as the result.
>>> ExtendedContext.max(Decimal('3'), Decimal('2'))
Decimal('3')
>>> ExtendedContext.max(Decimal('-10'), Decimal('3'))
Decimal('3')
>>> ExtendedContext.max(Decimal('1.0'), Decimal('1'))
Decimal('1')
>>> ExtendedContext.max(Decimal('7'), Decimal('NaN'))
Decimal('7')
>>> ExtendedContext.max(1, 2)
Decimal('2')
>>> ExtendedContext.max(Decimal(1), 2)
Decimal('2')
>>> ExtendedContext.max(1, Decimal(2))
Decimal('2')
"""
a = _convert_other(a, raiseit=True)
return a.max(b, context=self)
def max_mag(self, a, b):
"""Compares the values numerically with their sign ignored.
>>> ExtendedContext.max_mag(Decimal('7'), Decimal('NaN'))
Decimal('7')
>>> ExtendedContext.max_mag(Decimal('7'), Decimal('-10'))
Decimal('-10')
>>> ExtendedContext.max_mag(1, -2)
Decimal('-2')
>>> ExtendedContext.max_mag(Decimal(1), -2)
Decimal('-2')
>>> ExtendedContext.max_mag(1, Decimal(-2))
Decimal('-2')
"""
a = _convert_other(a, raiseit=True)
return a.max_mag(b, context=self)
def min(self, a, b):
"""min compares two values numerically and returns the minimum.
If either operand is a NaN then the general rules apply.
Otherwise, the operands are compared as though by the compare
operation. If they are numerically equal then the left-hand operand
is chosen as the result. Otherwise the minimum (closer to negative
infinity) of the two operands is chosen as the result.
>>> ExtendedContext.min(Decimal('3'), Decimal('2'))
Decimal('2')
>>> ExtendedContext.min(Decimal('-10'), Decimal('3'))
Decimal('-10')
>>> ExtendedContext.min(Decimal('1.0'), Decimal('1'))
Decimal('1.0')
>>> ExtendedContext.min(Decimal('7'), Decimal('NaN'))
Decimal('7')
>>> ExtendedContext.min(1, 2)
Decimal('1')
>>> ExtendedContext.min(Decimal(1), 2)
Decimal('1')
>>> ExtendedContext.min(1, Decimal(29))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.min(b, context=self)
def min_mag(self, a, b):
"""Compares the values numerically with their sign ignored.
>>> ExtendedContext.min_mag(Decimal('3'), Decimal('-2'))
Decimal('-2')
>>> ExtendedContext.min_mag(Decimal('-3'), Decimal('NaN'))
Decimal('-3')
>>> ExtendedContext.min_mag(1, -2)
Decimal('1')
>>> ExtendedContext.min_mag(Decimal(1), -2)
Decimal('1')
>>> ExtendedContext.min_mag(1, Decimal(-2))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.min_mag(b, context=self)
def minus(self, a):
"""Minus corresponds to unary prefix minus in Python.
The operation is evaluated using the same rules as subtract; the
operation minus(a) is calculated as subtract('0', a) where the '0'
has the same exponent as the operand.
>>> ExtendedContext.minus(Decimal('1.3'))
Decimal('-1.3')
>>> ExtendedContext.minus(Decimal('-1.3'))
Decimal('1.3')
>>> ExtendedContext.minus(1)
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.__neg__(context=self)
def multiply(self, a, b):
"""multiply multiplies two operands.
If either operand is a special value then the general rules apply.
Otherwise, the operands are multiplied together
('long multiplication'), resulting in a number which may be as long as
the sum of the lengths of the two operands.
>>> ExtendedContext.multiply(Decimal('1.20'), Decimal('3'))
Decimal('3.60')
>>> ExtendedContext.multiply(Decimal('7'), Decimal('3'))
Decimal('21')
>>> ExtendedContext.multiply(Decimal('0.9'), Decimal('0.8'))
Decimal('0.72')
>>> ExtendedContext.multiply(Decimal('0.9'), Decimal('-0'))
Decimal('-0.0')
>>> ExtendedContext.multiply(Decimal('654321'), Decimal('654321'))
Decimal('4.28135971E+11')
>>> ExtendedContext.multiply(7, 7)
Decimal('49')
>>> ExtendedContext.multiply(Decimal(7), 7)
Decimal('49')
>>> ExtendedContext.multiply(7, Decimal(7))
Decimal('49')
"""
a = _convert_other(a, raiseit=True)
r = a.__mul__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def next_minus(self, a):
"""Returns the largest representable number smaller than a.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> ExtendedContext.next_minus(Decimal('1'))
Decimal('0.999999999')
>>> c.next_minus(Decimal('1E-1007'))
Decimal('0E-1007')
>>> ExtendedContext.next_minus(Decimal('-1.00000003'))
Decimal('-1.00000004')
>>> c.next_minus(Decimal('Infinity'))
Decimal('9.99999999E+999')
>>> c.next_minus(1)
Decimal('0.999999999')
"""
a = _convert_other(a, raiseit=True)
return a.next_minus(context=self)
def next_plus(self, a):
"""Returns the smallest representable number larger than a.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> ExtendedContext.next_plus(Decimal('1'))
Decimal('1.00000001')
>>> c.next_plus(Decimal('-1E-1007'))
Decimal('-0E-1007')
>>> ExtendedContext.next_plus(Decimal('-1.00000003'))
Decimal('-1.00000002')
>>> c.next_plus(Decimal('-Infinity'))
Decimal('-9.99999999E+999')
>>> c.next_plus(1)
Decimal('1.00000001')
"""
a = _convert_other(a, raiseit=True)
return a.next_plus(context=self)
def next_toward(self, a, b):
"""Returns the number closest to a, in direction towards b.
The result is the closest representable number from the first
operand (but not the first operand) that is in the direction
towards the second operand, unless the operands have the same
value.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.next_toward(Decimal('1'), Decimal('2'))
Decimal('1.00000001')
>>> c.next_toward(Decimal('-1E-1007'), Decimal('1'))
Decimal('-0E-1007')
>>> c.next_toward(Decimal('-1.00000003'), Decimal('0'))
Decimal('-1.00000002')
>>> c.next_toward(Decimal('1'), Decimal('0'))
Decimal('0.999999999')
>>> c.next_toward(Decimal('1E-1007'), Decimal('-100'))
Decimal('0E-1007')
>>> c.next_toward(Decimal('-1.00000003'), Decimal('-10'))
Decimal('-1.00000004')
>>> c.next_toward(Decimal('0.00'), Decimal('-0.0000'))
Decimal('-0.00')
>>> c.next_toward(0, 1)
Decimal('1E-1007')
>>> c.next_toward(Decimal(0), 1)
Decimal('1E-1007')
>>> c.next_toward(0, Decimal(1))
Decimal('1E-1007')
"""
a = _convert_other(a, raiseit=True)
return a.next_toward(b, context=self)
def normalize(self, a):
"""normalize reduces an operand to its simplest form.
Essentially a plus operation with all trailing zeros removed from the
result.
>>> ExtendedContext.normalize(Decimal('2.1'))
Decimal('2.1')
>>> ExtendedContext.normalize(Decimal('-2.0'))
Decimal('-2')
>>> ExtendedContext.normalize(Decimal('1.200'))
Decimal('1.2')
>>> ExtendedContext.normalize(Decimal('-120'))
Decimal('-1.2E+2')
>>> ExtendedContext.normalize(Decimal('120.00'))
Decimal('1.2E+2')
>>> ExtendedContext.normalize(Decimal('0.00'))
Decimal('0')
>>> ExtendedContext.normalize(6)
Decimal('6')
"""
a = _convert_other(a, raiseit=True)
return a.normalize(context=self)
def number_class(self, a):
"""Returns an indication of the class of the operand.
The class is one of the following strings:
-sNaN
-NaN
-Infinity
-Normal
-Subnormal
-Zero
+Zero
+Subnormal
+Normal
+Infinity
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.number_class(Decimal('Infinity'))
'+Infinity'
>>> c.number_class(Decimal('1E-10'))
'+Normal'
>>> c.number_class(Decimal('2.50'))
'+Normal'
>>> c.number_class(Decimal('0.1E-999'))
'+Subnormal'
>>> c.number_class(Decimal('0'))
'+Zero'
>>> c.number_class(Decimal('-0'))
'-Zero'
>>> c.number_class(Decimal('-0.1E-999'))
'-Subnormal'
>>> c.number_class(Decimal('-1E-10'))
'-Normal'
>>> c.number_class(Decimal('-2.50'))
'-Normal'
>>> c.number_class(Decimal('-Infinity'))
'-Infinity'
>>> c.number_class(Decimal('NaN'))
'NaN'
>>> c.number_class(Decimal('-NaN'))
'NaN'
>>> c.number_class(Decimal('sNaN'))
'sNaN'
>>> c.number_class(123)
'+Normal'
"""
a = _convert_other(a, raiseit=True)
return a.number_class(context=self)
def plus(self, a):
"""Plus corresponds to unary prefix plus in Python.
The operation is evaluated using the same rules as add; the
operation plus(a) is calculated as add('0', a) where the '0'
has the same exponent as the operand.
>>> ExtendedContext.plus(Decimal('1.3'))
Decimal('1.3')
>>> ExtendedContext.plus(Decimal('-1.3'))
Decimal('-1.3')
>>> ExtendedContext.plus(-1)
Decimal('-1')
"""
a = _convert_other(a, raiseit=True)
return a.__pos__(context=self)
def power(self, a, b, modulo=None):
"""Raises a to the power of b, to modulo if given.
With two arguments, compute a**b. If a is negative then b
must be integral. The result will be inexact unless b is
integral and the result is finite and can be expressed exactly
in 'precision' digits.
With three arguments, compute (a**b) % modulo. For the
three argument form, the following restrictions on the
arguments hold:
- all three arguments must be integral
- b must be nonnegative
- at least one of a or b must be nonzero
- modulo must be nonzero and have at most 'precision' digits
The result of pow(a, b, modulo) is identical to the result
that would be obtained by computing (a**b) % modulo with
unbounded precision, but is computed more efficiently. It is
always exact.
>>> c = ExtendedContext.copy()
>>> c.Emin = -999
>>> c.Emax = 999
>>> c.power(Decimal('2'), Decimal('3'))
Decimal('8')
>>> c.power(Decimal('-2'), Decimal('3'))
Decimal('-8')
>>> c.power(Decimal('2'), Decimal('-3'))
Decimal('0.125')
>>> c.power(Decimal('1.7'), Decimal('8'))
Decimal('69.7575744')
>>> c.power(Decimal('10'), Decimal('0.301029996'))
Decimal('2.00000000')
>>> c.power(Decimal('Infinity'), Decimal('-1'))
Decimal('0')
>>> c.power(Decimal('Infinity'), Decimal('0'))
Decimal('1')
>>> c.power(Decimal('Infinity'), Decimal('1'))
Decimal('Infinity')
>>> c.power(Decimal('-Infinity'), Decimal('-1'))
Decimal('-0')
>>> c.power(Decimal('-Infinity'), Decimal('0'))
Decimal('1')
>>> c.power(Decimal('-Infinity'), Decimal('1'))
Decimal('-Infinity')
>>> c.power(Decimal('-Infinity'), Decimal('2'))
Decimal('Infinity')
>>> c.power(Decimal('0'), Decimal('0'))
Decimal('NaN')
>>> c.power(Decimal('3'), Decimal('7'), Decimal('16'))
Decimal('11')
>>> c.power(Decimal('-3'), Decimal('7'), Decimal('16'))
Decimal('-11')
>>> c.power(Decimal('-3'), Decimal('8'), Decimal('16'))
Decimal('1')
>>> c.power(Decimal('3'), Decimal('7'), Decimal('-16'))
Decimal('11')
>>> c.power(Decimal('23E12345'), Decimal('67E189'), Decimal('123456789'))
Decimal('11729830')
>>> c.power(Decimal('-0'), Decimal('17'), Decimal('1729'))
Decimal('-0')
>>> c.power(Decimal('-23'), Decimal('0'), Decimal('65537'))
Decimal('1')
>>> ExtendedContext.power(7, 7)
Decimal('823543')
>>> ExtendedContext.power(Decimal(7), 7)
Decimal('823543')
>>> ExtendedContext.power(7, Decimal(7), 2)
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
r = a.__pow__(b, modulo, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def quantize(self, a, b):
"""Returns a value equal to 'a' (rounded), having the exponent of 'b'.
The coefficient of the result is derived from that of the left-hand
operand. It may be rounded using the current rounding setting (if the
exponent is being increased), multiplied by a positive power of ten (if
the exponent is being decreased), or is unchanged (if the exponent is
already equal to that of the right-hand operand).
Unlike other operations, if the length of the coefficient after the
quantize operation would be greater than precision then an Invalid
operation condition is raised. This guarantees that, unless there is
an error condition, the exponent of the result of a quantize is always
equal to that of the right-hand operand.
Also unlike other operations, quantize will never raise Underflow, even
if the result is subnormal and inexact.
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('0.001'))
Decimal('2.170')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('0.01'))
Decimal('2.17')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('0.1'))
Decimal('2.2')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('1e+0'))
Decimal('2')
>>> ExtendedContext.quantize(Decimal('2.17'), Decimal('1e+1'))
Decimal('0E+1')
>>> ExtendedContext.quantize(Decimal('-Inf'), Decimal('Infinity'))
Decimal('-Infinity')
>>> ExtendedContext.quantize(Decimal('2'), Decimal('Infinity'))
Decimal('NaN')
>>> ExtendedContext.quantize(Decimal('-0.1'), Decimal('1'))
Decimal('-0')
>>> ExtendedContext.quantize(Decimal('-0'), Decimal('1e+5'))
Decimal('-0E+5')
>>> ExtendedContext.quantize(Decimal('+35236450.6'), Decimal('1e-2'))
Decimal('NaN')
>>> ExtendedContext.quantize(Decimal('-35236450.6'), Decimal('1e-2'))
Decimal('NaN')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e-1'))
Decimal('217.0')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e-0'))
Decimal('217')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e+1'))
Decimal('2.2E+2')
>>> ExtendedContext.quantize(Decimal('217'), Decimal('1e+2'))
Decimal('2E+2')
>>> ExtendedContext.quantize(1, 2)
Decimal('1')
>>> ExtendedContext.quantize(Decimal(1), 2)
Decimal('1')
>>> ExtendedContext.quantize(1, Decimal(2))
Decimal('1')
"""
a = _convert_other(a, raiseit=True)
return a.quantize(b, context=self)
def radix(self):
"""Just returns 10, as this is Decimal, :)
>>> ExtendedContext.radix()
Decimal('10')
"""
return Decimal(10)
def remainder(self, a, b):
"""Returns the remainder from integer division.
The result is the residue of the dividend after the operation of
calculating integer division as described for divide-integer, rounded
to precision digits if necessary. The sign of the result, if
non-zero, is the same as that of the original dividend.
This operation will fail under the same conditions as integer division
(that is, if integer division on the same two operands would fail, the
remainder cannot be calculated).
>>> ExtendedContext.remainder(Decimal('2.1'), Decimal('3'))
Decimal('2.1')
>>> ExtendedContext.remainder(Decimal('10'), Decimal('3'))
Decimal('1')
>>> ExtendedContext.remainder(Decimal('-10'), Decimal('3'))
Decimal('-1')
>>> ExtendedContext.remainder(Decimal('10.2'), Decimal('1'))
Decimal('0.2')
>>> ExtendedContext.remainder(Decimal('10'), Decimal('0.3'))
Decimal('0.1')
>>> ExtendedContext.remainder(Decimal('3.6'), Decimal('1.3'))
Decimal('1.0')
>>> ExtendedContext.remainder(22, 6)
Decimal('4')
>>> ExtendedContext.remainder(Decimal(22), 6)
Decimal('4')
>>> ExtendedContext.remainder(22, Decimal(6))
Decimal('4')
"""
a = _convert_other(a, raiseit=True)
r = a.__mod__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def remainder_near(self, a, b):
"""Returns to be "a - b * n", where n is the integer nearest the exact
value of "x / b" (if two integers are equally near then the even one
is chosen). If the result is equal to 0 then its sign will be the
sign of a.
This operation will fail under the same conditions as integer division
(that is, if integer division on the same two operands would fail, the
remainder cannot be calculated).
>>> ExtendedContext.remainder_near(Decimal('2.1'), Decimal('3'))
Decimal('-0.9')
>>> ExtendedContext.remainder_near(Decimal('10'), Decimal('6'))
Decimal('-2')
>>> ExtendedContext.remainder_near(Decimal('10'), Decimal('3'))
Decimal('1')
>>> ExtendedContext.remainder_near(Decimal('-10'), Decimal('3'))
Decimal('-1')
>>> ExtendedContext.remainder_near(Decimal('10.2'), Decimal('1'))
Decimal('0.2')
>>> ExtendedContext.remainder_near(Decimal('10'), Decimal('0.3'))
Decimal('0.1')
>>> ExtendedContext.remainder_near(Decimal('3.6'), Decimal('1.3'))
Decimal('-0.3')
>>> ExtendedContext.remainder_near(3, 11)
Decimal('3')
>>> ExtendedContext.remainder_near(Decimal(3), 11)
Decimal('3')
>>> ExtendedContext.remainder_near(3, Decimal(11))
Decimal('3')
"""
a = _convert_other(a, raiseit=True)
return a.remainder_near(b, context=self)
def rotate(self, a, b):
"""Returns a rotated copy of a, b times.
The coefficient of the result is a rotated copy of the digits in
the coefficient of the first operand. The number of places of
rotation is taken from the absolute value of the second operand,
with the rotation being to the left if the second operand is
positive or to the right otherwise.
>>> ExtendedContext.rotate(Decimal('34'), Decimal('8'))
Decimal('400000003')
>>> ExtendedContext.rotate(Decimal('12'), Decimal('9'))
Decimal('12')
>>> ExtendedContext.rotate(Decimal('123456789'), Decimal('-2'))
Decimal('891234567')
>>> ExtendedContext.rotate(Decimal('123456789'), Decimal('0'))
Decimal('123456789')
>>> ExtendedContext.rotate(Decimal('123456789'), Decimal('+2'))
Decimal('345678912')
>>> ExtendedContext.rotate(1333333, 1)
Decimal('13333330')
>>> ExtendedContext.rotate(Decimal(1333333), 1)
Decimal('13333330')
>>> ExtendedContext.rotate(1333333, Decimal(1))
Decimal('13333330')
"""
a = _convert_other(a, raiseit=True)
return a.rotate(b, context=self)
def same_quantum(self, a, b):
"""Returns True if the two operands have the same exponent.
The result is never affected by either the sign or the coefficient of
either operand.
>>> ExtendedContext.same_quantum(Decimal('2.17'), Decimal('0.001'))
False
>>> ExtendedContext.same_quantum(Decimal('2.17'), Decimal('0.01'))
True
>>> ExtendedContext.same_quantum(Decimal('2.17'), Decimal('1'))
False
>>> ExtendedContext.same_quantum(Decimal('Inf'), Decimal('-Inf'))
True
>>> ExtendedContext.same_quantum(10000, -1)
True
>>> ExtendedContext.same_quantum(Decimal(10000), -1)
True
>>> ExtendedContext.same_quantum(10000, Decimal(-1))
True
"""
a = _convert_other(a, raiseit=True)
return a.same_quantum(b)
def scaleb(self, a, b):
"""Returns the first operand after adding the second value to its exponent.
>>> ExtendedContext.scaleb(Decimal('7.50'), Decimal('-2'))
Decimal('0.0750')
>>> ExtendedContext.scaleb(Decimal('7.50'), Decimal('0'))
Decimal('7.50')
>>> ExtendedContext.scaleb(Decimal('7.50'), Decimal('3'))
Decimal('7.50E+3')
>>> ExtendedContext.scaleb(1, 4)
Decimal('1E+4')
>>> ExtendedContext.scaleb(Decimal(1), 4)
Decimal('1E+4')
>>> ExtendedContext.scaleb(1, Decimal(4))
Decimal('1E+4')
"""
a = _convert_other(a, raiseit=True)
return a.scaleb(b, context=self)
def shift(self, a, b):
"""Returns a shifted copy of a, b times.
The coefficient of the result is a shifted copy of the digits
in the coefficient of the first operand. The number of places
to shift is taken from the absolute value of the second operand,
with the shift being to the left if the second operand is
positive or to the right otherwise. Digits shifted into the
coefficient are zeros.
>>> ExtendedContext.shift(Decimal('34'), Decimal('8'))
Decimal('400000000')
>>> ExtendedContext.shift(Decimal('12'), Decimal('9'))
Decimal('0')
>>> ExtendedContext.shift(Decimal('123456789'), Decimal('-2'))
Decimal('1234567')
>>> ExtendedContext.shift(Decimal('123456789'), Decimal('0'))
Decimal('123456789')
>>> ExtendedContext.shift(Decimal('123456789'), Decimal('+2'))
Decimal('345678900')
>>> ExtendedContext.shift(88888888, 2)
Decimal('888888800')
>>> ExtendedContext.shift(Decimal(88888888), 2)
Decimal('888888800')
>>> ExtendedContext.shift(88888888, Decimal(2))
Decimal('888888800')
"""
a = _convert_other(a, raiseit=True)
return a.shift(b, context=self)
def sqrt(self, a):
"""Square root of a non-negative number to context precision.
If the result must be inexact, it is rounded using the round-half-even
algorithm.
>>> ExtendedContext.sqrt(Decimal('0'))
Decimal('0')
>>> ExtendedContext.sqrt(Decimal('-0'))
Decimal('-0')
>>> ExtendedContext.sqrt(Decimal('0.39'))
Decimal('0.624499800')
>>> ExtendedContext.sqrt(Decimal('100'))
Decimal('10')
>>> ExtendedContext.sqrt(Decimal('1'))
Decimal('1')
>>> ExtendedContext.sqrt(Decimal('1.0'))
Decimal('1.0')
>>> ExtendedContext.sqrt(Decimal('1.00'))
Decimal('1.0')
>>> ExtendedContext.sqrt(Decimal('7'))
Decimal('2.64575131')
>>> ExtendedContext.sqrt(Decimal('10'))
Decimal('3.16227766')
>>> ExtendedContext.sqrt(2)
Decimal('1.41421356')
>>> ExtendedContext.prec
9
"""
a = _convert_other(a, raiseit=True)
return a.sqrt(context=self)
def subtract(self, a, b):
"""Return the difference between the two operands.
>>> ExtendedContext.subtract(Decimal('1.3'), Decimal('1.07'))
Decimal('0.23')
>>> ExtendedContext.subtract(Decimal('1.3'), Decimal('1.30'))
Decimal('0.00')
>>> ExtendedContext.subtract(Decimal('1.3'), Decimal('2.07'))
Decimal('-0.77')
>>> ExtendedContext.subtract(8, 5)
Decimal('3')
>>> ExtendedContext.subtract(Decimal(8), 5)
Decimal('3')
>>> ExtendedContext.subtract(8, Decimal(5))
Decimal('3')
"""
a = _convert_other(a, raiseit=True)
r = a.__sub__(b, context=self)
if r is NotImplemented:
raise TypeError("Unable to convert %s to Decimal" % b)
else:
return r
def to_eng_string(self, a):
"""Converts a number to a string, using scientific notation.
The operation is not affected by the context.
"""
a = _convert_other(a, raiseit=True)
return a.to_eng_string(context=self)
def to_sci_string(self, a):
"""Converts a number to a string, using scientific notation.
The operation is not affected by the context.
"""
a = _convert_other(a, raiseit=True)
return a.__str__(context=self)
def to_integral_exact(self, a):
"""Rounds to an integer.
When the operand has a negative exponent, the result is the same
as using the quantize() operation using the given operand as the
left-hand-operand, 1E+0 as the right-hand-operand, and the precision
of the operand as the precision setting; Inexact and Rounded flags
are allowed in this operation. The rounding mode is taken from the
context.
>>> ExtendedContext.to_integral_exact(Decimal('2.1'))
Decimal('2')
>>> ExtendedContext.to_integral_exact(Decimal('100'))
Decimal('100')
>>> ExtendedContext.to_integral_exact(Decimal('100.0'))
Decimal('100')
>>> ExtendedContext.to_integral_exact(Decimal('101.5'))
Decimal('102')
>>> ExtendedContext.to_integral_exact(Decimal('-101.5'))
Decimal('-102')
>>> ExtendedContext.to_integral_exact(Decimal('10E+5'))
Decimal('1.0E+6')
>>> ExtendedContext.to_integral_exact(Decimal('7.89E+77'))
Decimal('7.89E+77')
>>> ExtendedContext.to_integral_exact(Decimal('-Inf'))
Decimal('-Infinity')
"""
a = _convert_other(a, raiseit=True)
return a.to_integral_exact(context=self)
def to_integral_value(self, a):
"""Rounds to an integer.
When the operand has a negative exponent, the result is the same
as using the quantize() operation using the given operand as the
left-hand-operand, 1E+0 as the right-hand-operand, and the precision
of the operand as the precision setting, except that no flags will
be set. The rounding mode is taken from the context.
>>> ExtendedContext.to_integral_value(Decimal('2.1'))
Decimal('2')
>>> ExtendedContext.to_integral_value(Decimal('100'))
Decimal('100')
>>> ExtendedContext.to_integral_value(Decimal('100.0'))
Decimal('100')
>>> ExtendedContext.to_integral_value(Decimal('101.5'))
Decimal('102')
>>> ExtendedContext.to_integral_value(Decimal('-101.5'))
Decimal('-102')
>>> ExtendedContext.to_integral_value(Decimal('10E+5'))
Decimal('1.0E+6')
>>> ExtendedContext.to_integral_value(Decimal('7.89E+77'))
Decimal('7.89E+77')
>>> ExtendedContext.to_integral_value(Decimal('-Inf'))
Decimal('-Infinity')
"""
a = _convert_other(a, raiseit=True)
return a.to_integral_value(context=self)
# the method name changed, but we also keep the old one for compatibility
to_integral = to_integral_value
class _WorkRep(object):
__slots__ = ('sign','int','exp')
# sign: 0 or 1
# int: int
# exp: None, int, or string
def __init__(self, value=None):
if value is None:
self.sign = None
self.int = 0
self.exp = None
elif isinstance(value, Decimal):
self.sign = value._sign
self.int = int(value._int)
self.exp = value._exp
else:
# assert isinstance(value, tuple)
self.sign = value[0]
self.int = value[1]
self.exp = value[2]
def __repr__(self):
return "(%r, %r, %r)" % (self.sign, self.int, self.exp)
__str__ = __repr__
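# Illustrative sketch (not part of the original module): _WorkRep holds
# the (sign, coefficient, exponent) triple of a Decimal with the
# coefficient as an int, which is cheaper for arithmetic than the
# string form used by Decimal itself:
#
#     >>> _WorkRep(Decimal('-1.23'))
#     (1, 123, -2)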
def _normalize(op1, op2, prec = 0):
"""Normalizes op1, op2 to have the same exp and length of coefficient.
Done during addition.
"""
if op1.exp < op2.exp:
tmp = op2
other = op1
else:
tmp = op1
other = op2
# Let exp = min(tmp.exp - 1, tmp.adjusted() - precision - 1).
# Then adding 10**exp to tmp has the same effect (after rounding)
# as adding any positive quantity smaller than 10**exp; similarly
# for subtraction. So if other is smaller than 10**exp we replace
# it with 10**exp. This avoids tmp.exp - other.exp getting too large.
tmp_len = len(str(tmp.int))
other_len = len(str(other.int))
exp = tmp.exp + min(-1, tmp_len - prec - 2)
if other_len + other.exp - 1 < exp:
other.int = 1
other.exp = exp
tmp.int *= 10 ** (tmp.exp - other.exp)
tmp.exp = other.exp
return op1, op2
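# Illustrative sketch (not part of the original module): with prec=5,
# aligning 123E+1 and 5E-3 pads the larger coefficient so that both
# operands end up sharing the smaller exponent:
#
#     >>> op1, op2 = _WorkRep((0, 123, 1)), _WorkRep((0, 5, -3))
#     >>> _normalize(op1, op2, prec=5)
#     ((0, 1230000, -3), (0, 5, -3))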
##### Integer arithmetic functions used by ln, log10, exp and __pow__ #####
_nbits = int.bit_length
def _decimal_lshift_exact(n, e):
""" Given integers n and e, return n * 10**e if it's an integer, else None.
The computation is designed to avoid computing large powers of 10
unnecessarily.
>>> _decimal_lshift_exact(3, 4)
30000
>>> _decimal_lshift_exact(300, -999999999) # returns None
"""
if n == 0:
return 0
elif e >= 0:
return n * 10**e
else:
# val_n = exponent of the largest power of 10 dividing n.
str_n = str(abs(n))
val_n = len(str_n) - len(str_n.rstrip('0'))
return None if val_n < -e else n // 10**-e
def _sqrt_nearest(n, a):
"""Closest integer to the square root of the positive integer n. a is
an initial approximation to the square root. Any positive integer
will do for a, but the closer a is to the square root of n the
faster convergence will be.
"""
if n <= 0 or a <= 0:
raise ValueError("Both arguments to _sqrt_nearest should be positive.")
b = 0
while a != b:
b, a = a, a - (-n // a) >> 1  # Newton step: a <- (a + ceil(n/a)) // 2
return a
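# Illustrative sketch (not part of the original module): each pass of
# the loop above is a Newton-Raphson step for sqrt(n):
#
#     >>> _sqrt_nearest(10**10, 10**5)   # exact square
#     100000
#     >>> _sqrt_nearest(2, 1)            # sqrt(2) = 1.41..., nearest is 1
#     1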
def _rshift_nearest(x, shift):
"""Given an integer x and a nonnegative integer shift, return closest
integer to x / 2**shift; use round-to-even in case of a tie.
"""
b, q = 1 << shift, x >> shift
return q + (2*(x & (b-1)) + (q&1) > b)
def _div_nearest(a, b):
"""Closest integer to a/b, a and b positive integers; rounds to even
in the case of a tie.
"""
q, r = divmod(a, b)
return q + (2*r + (q&1) > b)
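# Illustrative sketch (not part of the original module): both helpers
# above round to nearest with ties going to the even quotient:
#
#     >>> _rshift_nearest(5, 1), _rshift_nearest(7, 1)   # 2.5 -> 2, 3.5 -> 4
#     (2, 4)
#     >>> _div_nearest(25, 10), _div_nearest(35, 10)     # 2.5 -> 2, 3.5 -> 4
#     (2, 4)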
def _ilog(x, M, L = 8):
"""Integer approximation to M*log(x/M), with absolute error boundable
in terms only of x/M.
Given positive integers x and M, return an integer approximation to
M * log(x/M). For L = 8 and 0.1 <= x/M <= 10 the difference
between the approximation and the exact result is at most 22. For
L = 8 and 1.0 <= x/M <= 10.0 the difference is at most 15. In
both cases these are upper bounds on the error; it will usually be
much smaller."""
# The basic algorithm is the following: let log1p be the function
# log1p(x) = log(1+x). Then log(x/M) = log1p((x-M)/M). We use
# the reduction
#
# log1p(y) = 2*log1p(y/(1+sqrt(1+y)))
#
# repeatedly until the argument to log1p is small (< 2**-L in
# absolute value). For small y we can use the Taylor series
# expansion
#
# log1p(y) ~ y - y**2/2 + y**3/3 - ... - (-y)**T/T
#
# truncating at T such that y**T is small enough. The whole
# computation is carried out in a form of fixed-point arithmetic,
# with a real number z being represented by an integer
# approximation to z*M. To avoid loss of precision, the y below
# is actually an integer approximation to 2**R*y*M, where R is the
# number of reductions performed so far.
y = x-M
# argument reduction; R = number of reductions performed
R = 0
while (R <= L and abs(y) << L-R >= M or
R > L and abs(y) >> R-L >= M):
y = _div_nearest((M*y) << 1,
M + _sqrt_nearest(M*(M+_rshift_nearest(y, R)), M))
R += 1
# Taylor series with T terms
T = -int(-10*len(str(M))//(3*L))
yshift = _rshift_nearest(y, R)
w = _div_nearest(M, T)
for k in range(T-1, 0, -1):
w = _div_nearest(M, k) - _div_nearest(yshift*w, M)
return _div_nearest(w*y, M)
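# Illustrative sketch (not part of the original module): with M = 10**6
# and x = 10**7 (so x/M = 10), the exact value M*log(x/M) is
# 10**6 * ln(10) = 2302585.09..., so the docstring's error bound makes
# this check safe:
#
#     >>> abs(_ilog(10**7, 10**6) - 2302585) <= 22
#     True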
def _dlog10(c, e, p):
"""Given integers c, e and p with c > 0, p >= 0, compute an integer
approximation to 10**p * log10(c*10**e), with an absolute error of
at most 1. Assumes that c*10**e is not exactly 1."""
# increase precision by 2; compensate for this by dividing
# final result by 100
p += 2
# write c*10**e as d*10**f with either:
# f >= 0 and 1 <= d <= 10, or
# f <= 0 and 0.1 <= d <= 1.
# Thus for c*10**e close to 1, f = 0
l = len(str(c))
f = e+l - (e+l >= 1)
if p > 0:
M = 10**p
k = e+p-f
if k >= 0:
c *= 10**k
else:
c = _div_nearest(c, 10**-k)
log_d = _ilog(c, M) # error < 5 + 22 = 27
log_10 = _log10_digits(p) # error < 1
log_d = _div_nearest(log_d*M, log_10)
log_tenpower = f*M # exact
else:
log_d = 0 # error < 2.31
log_tenpower = _div_nearest(f, 10**-p) # error < 0.5
return _div_nearest(log_tenpower+log_d, 100)
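# Illustrative sketch (not part of the original module): 10**9 * log10(2)
# is 301029995.66..., so with the stated error of at most 1:
#
#     >>> _dlog10(2, 0, 9) in (301029995, 301029996)
#     True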
def _dlog(c, e, p):
"""Given integers c, e and p with c > 0, compute an integer
approximation to 10**p * log(c*10**e), with an absolute error of
at most 1. Assumes that c*10**e is not exactly 1."""
# Increase precision by 2. The precision increase is compensated
# for at the end with a division by 100.
p += 2
# rewrite c*10**e as d*10**f with either f >= 0 and 1 <= d <= 10,
# or f <= 0 and 0.1 <= d <= 1. Then we can compute 10**p * log(c*10**e)
# as 10**p * log(d) + 10**p*f * log(10).
l = len(str(c))
f = e+l - (e+l >= 1)
# compute approximation to 10**p*log(d), with error < 27
if p > 0:
k = e+p-f
if k >= 0:
c *= 10**k
else:
c = _div_nearest(c, 10**-k) # error of <= 0.5 in c
# _ilog magnifies existing error in c by a factor of at most 10
log_d = _ilog(c, 10**p) # error < 5 + 22 = 27
else:
# p <= 0: just approximate the whole thing by 0; error < 2.31
log_d = 0
# compute approximation to f*10**p*log(10), with error < 11.
if f:
extra = len(str(abs(f)))-1
if p + extra >= 0:
# error in f * _log10_digits(p+extra) < |f| * 1 = |f|
# after division, error < |f|/10**extra + 0.5 < 10 + 0.5 < 11
f_log_ten = _div_nearest(f*_log10_digits(p+extra), 10**extra)
else:
f_log_ten = 0
else:
f_log_ten = 0
# error in sum < 11+27 = 38; error after division < 0.38 + 0.5 < 1
return _div_nearest(f_log_ten + log_d, 100)
class _Log10Memoize(object):
"""Class to compute, store, and allow retrieval of, digits of the
constant log(10) = 2.302585.... This constant is needed by
Decimal.ln, Decimal.log10, Decimal.exp and Decimal.__pow__."""
def __init__(self):
self.digits = "23025850929940456840179914546843642076011014886"
def getdigits(self, p):
"""Given an integer p >= 0, return floor(10**p)*log(10).
For example, self.getdigits(3) returns 2302.
"""
# digits are stored as a string, for quick conversion to
# integer in the case that we've already computed enough
# digits; the stored digits should always be correct
# (truncated, not rounded to nearest).
if p < 0:
raise ValueError("p should be nonnegative")
if p >= len(self.digits):
# compute p+3, p+6, p+9, ... digits; continue until at
# least one of the extra digits is nonzero
extra = 3
while True:
# compute p+extra digits, correct to within 1ulp
M = 10**(p+extra+2)
digits = str(_div_nearest(_ilog(10*M, M), 100))
if digits[-extra:] != '0'*extra:
break
extra += 3
# keep all reliable digits so far; remove trailing zeros
# and next nonzero digit
self.digits = digits.rstrip('0')[:-1]
return int(self.digits[:p+1])
_log10_digits = _Log10Memoize().getdigits
def _iexp(x, M, L=8):
"""Given integers x and M, M > 0, such that x/M is small in absolute
value, compute an integer approximation to M*exp(x/M). For 0 <=
x/M <= 2.4, the absolute error in the result is bounded by 60 (and
is usually much smaller)."""
# Algorithm: to compute exp(z) for a real number z, first divide z
# by a suitable power R of 2 so that |z/2**R| < 2**-L. Then
# compute expm1(z/2**R) = exp(z/2**R) - 1 using the usual Taylor
# series
#
# expm1(x) = x + x**2/2! + x**3/3! + ...
#
# Now use the identity
#
# expm1(2x) = expm1(x)*(expm1(x)+2)
#
# R times to compute the sequence expm1(z/2**R),
# expm1(z/2**(R-1)), ... , exp(z/2), exp(z).
# Find R such that x/2**R/M <= 2**-L
R = _nbits((x<<L)//M)
# Taylor series. (2**L)**T > M
T = -int(-10*len(str(M))//(3*L))
y = _div_nearest(x, T)
Mshift = M<<R
for i in range(T-1, 0, -1):
y = _div_nearest(x*(Mshift + y), Mshift * i)
# Expansion
for k in range(R-1, -1, -1):
Mshift = M<<(k+2)
y = _div_nearest(y*(y+Mshift), Mshift)
return M+y
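# Illustrative sketch (not part of the original module): with x = M =
# 10**6 (so x/M = 1), _iexp approximates e * 10**6 = 2718281.83...;
# allowing for the stated error bound of 60:
#
#     >>> abs(_iexp(10**6, 10**6) - 2718282) <= 61
#     True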
def _dexp(c, e, p):
"""Compute an approximation to exp(c*10**e), with p decimal places of
precision.
Returns integers d, f such that:
10**(p-1) <= d <= 10**p, and
(d-1)*10**f < exp(c*10**e) < (d+1)*10**f
In other words, d*10**f is an approximation to exp(c*10**e) with p
digits of precision, and with an error in d of at most 1. This is
almost, but not quite, the same as the error being < 1ulp: when d
= 10**(p-1) the error could be up to 10 ulp."""
# we'll call iexp with M = 10**(p+2), giving p+3 digits of precision
p += 2
# compute log(10) with extra precision = adjusted exponent of c*10**e
extra = max(0, e + len(str(c)) - 1)
q = p + extra
# compute quotient c*10**e/(log(10)) = c*10**(e+q)/(log(10)*10**q),
# rounding down
shift = e+q
if shift >= 0:
cshift = c*10**shift
else:
cshift = c//10**-shift
quot, rem = divmod(cshift, _log10_digits(q))
# reduce remainder back to original precision
rem = _div_nearest(rem, 10**extra)
# error in result of _iexp < 120; error after division < 0.62
return _div_nearest(_iexp(rem, 10**p), 1000), quot - p + 3
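# Illustrative sketch (not part of the original module): _dexp(1, 0, 9)
# approximates exp(1) = 2.71828183... to 9 digits; given the stated
# error in d of at most 1:
#
#     >>> d, f = _dexp(1, 0, 9)
#     >>> abs(d - 271828183) <= 1, f
#     (True, -8)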
def _dpower(xc, xe, yc, ye, p):
"""Given integers xc, xe, yc and ye representing Decimals x = xc*10**xe and
y = yc*10**ye, compute x**y. Returns a pair of integers (c, e) such that:
10**(p-1) <= c <= 10**p, and
(c-1)*10**e < x**y < (c+1)*10**e
in other words, c*10**e is an approximation to x**y with p digits
of precision, and with an error in c of at most 1. (This is
almost, but not quite, the same as the error being < 1ulp: when c
== 10**(p-1) we can only guarantee error < 10ulp.)
We assume that: x is positive and not equal to 1, and y is nonzero.
"""
# Find b such that 10**(b-1) <= |y| <= 10**b
b = len(str(abs(yc))) + ye
# log(x) = lxc*10**(-p-b-1), to p+b+1 places after the decimal point
lxc = _dlog(xc, xe, p+b+1)
# compute product y*log(x) = yc*lxc*10**(-p-b-1+ye) = pc*10**(-p-1)
shift = ye-b
if shift >= 0:
pc = lxc*yc*10**shift
else:
pc = _div_nearest(lxc*yc, 10**-shift)
if pc == 0:
# we prefer a result that isn't exactly 1; this makes it
# easier to compute a correctly rounded result in __pow__
if ((len(str(xc)) + xe >= 1) == (yc > 0)): # if x**y > 1:
coeff, exp = 10**(p-1)+1, 1-p
else:
coeff, exp = 10**p-1, -p
else:
coeff, exp = _dexp(pc, -(p+1), p+1)
coeff = _div_nearest(coeff, 10)
exp += 1
return coeff, exp
def _log10_lb(c, correction = {
'1': 100, '2': 70, '3': 53, '4': 40, '5': 31,
'6': 23, '7': 16, '8': 10, '9': 5}):
"""Compute a lower bound for 100*log10(c) for a positive integer c."""
if c <= 0:
raise ValueError("The argument to _log10_lb should be nonnegative.")
str_c = str(c)
return 100*len(str_c) - correction[str_c[0]]
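# Illustrative sketch (not part of the original module): the correction
# table is keyed on the leading digit of c, giving a cheap lower bound:
#
#     >>> _log10_lb(2)     # 100*log10(2) = 30.10...
#     30
#     >>> _log10_lb(456)   # 100*log10(456) = 265.90...
#     260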
##### Helper Functions ####################################################
def _convert_other(other, raiseit=False, allow_float=False):
"""Convert other to Decimal.
Verifies that it's ok to use in an implicit construction.
If allow_float is true, allow conversion from float; this
is used in the comparison methods (__eq__ and friends).
"""
if isinstance(other, Decimal):
return other
if isinstance(other, int):
return Decimal(other)
if allow_float and isinstance(other, float):
return Decimal.from_float(other)
if raiseit:
raise TypeError("Unable to convert %s to Decimal" % other)
return NotImplemented
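# Illustrative sketch (not part of the original module): ints convert
# implicitly, floats only when allow_float is set, and anything else
# yields NotImplemented (or raises, with raiseit=True):
#
#     >>> _convert_other(5)
#     Decimal('5')
#     >>> _convert_other(0.5, allow_float=True)
#     Decimal('0.5')
#     >>> _convert_other('5')
#     NotImplemented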
def _convert_for_comparison(self, other, equality_op=False):
"""Given a Decimal instance self and a Python object other, return
a pair (s, o) of Decimal instances such that "s op o" is
equivalent to "self op other" for any of the 6 comparison
operators "op".
"""
if isinstance(other, Decimal):
return self, other
# Comparison with a Rational instance (also includes integers):
# self op n/d <=> self*d op n (for n and d integers, d positive).
# A NaN or infinity can be left unchanged without affecting the
# comparison result.
if isinstance(other, _numbers.Rational):
if not self._is_special:
self = _dec_from_triple(self._sign,
str(int(self._int) * other.denominator),
self._exp)
return self, Decimal(other.numerator)
# Comparisons with float and complex types. == and != comparisons
# with complex numbers should succeed, returning either True or False
# as appropriate. Other comparisons return NotImplemented.
if equality_op and isinstance(other, _numbers.Complex) and other.imag == 0:
other = other.real
if isinstance(other, float):
context = getcontext()
if equality_op:
context.flags[FloatOperation] = 1
else:
context._raise_error(FloatOperation,
"strict semantics for mixing floats and Decimals are enabled")
return self, Decimal.from_float(other)
return NotImplemented, NotImplemented
##### Setup Specific Contexts ############################################
# The default context prototype used by Context()
# Is mutable, so that new contexts can have different default values
DefaultContext = Context(
prec=28, rounding=ROUND_HALF_EVEN,
traps=[DivisionByZero, Overflow, InvalidOperation],
flags=[],
Emax=999999,
Emin=-999999,
capitals=1,
clamp=0
)
# Pre-made alternate contexts offered by the specification
# Don't change these; the user should be able to select these
# contexts and be able to reproduce results from other implementations
# of the spec.
BasicContext = Context(
prec=9, rounding=ROUND_HALF_UP,
traps=[DivisionByZero, Overflow, InvalidOperation, Clamped, Underflow],
flags=[],
)
ExtendedContext = Context(
prec=9, rounding=ROUND_HALF_EVEN,
traps=[],
flags=[],
)
##### crud for parsing strings #############################################
#
# Regular expression used for parsing numeric strings. Additional
# comments:
#
# 1. Uncomment the two '\s*' lines to allow leading and/or trailing
# whitespace. But note that the specification disallows whitespace in
# a numeric string.
#
# 2. For finite numbers (not infinities and NaNs) the body of the
# number between the optional sign and the optional exponent must have
# at least one decimal digit, possibly after the decimal point. The
# lookahead expression '(?=\d|\.\d)' checks this.
import re
_parser = re.compile(r""" # A numeric string consists of:
# \s*
(?P<sign>[-+])? # an optional sign, followed by either...
(
(?=\d|\.\d) # ...a number (with at least one digit)
(?P<int>\d*) # having a (possibly empty) integer part
(\.(?P<frac>\d*))? # followed by an optional fractional part
(E(?P<exp>[-+]?\d+))? # followed by an optional exponent, or...
|
Inf(inity)? # ...an infinity, or...
|
(?P<signal>s)? # ...an (optionally signaling)
NaN # NaN
(?P<diag>\d*) # with (possibly empty) diagnostic info.
)
# \s*
\Z
""", re.VERBOSE | re.IGNORECASE).match
_all_zeros = re.compile('0*$').match
_exact_half = re.compile('50*$').match
##### PEP3101 support functions ##############################################
# The functions in this section have little to do with the Decimal
# class, and could potentially be reused or adapted for other pure
# Python numeric classes that want to implement __format__
#
# A format specifier for Decimal looks like:
#
# [[fill]align][sign][#][0][minimumwidth][,][.precision][type]
_parse_format_specifier_regex = re.compile(r"""\A
(?:
(?P<fill>.)?
(?P<align>[<>=^])
)?
(?P<sign>[-+ ])?
(?P<alt>\#)?
(?P<zeropad>0)?
(?P<minimumwidth>(?!0)\d+)?
(?P<thousands_sep>,)?
(?:\.(?P<precision>0|(?!0)\d+))?
(?P<type>[eEfFgGn%])?
\Z
""", re.VERBOSE|re.DOTALL)
del re
# The locale module is only needed for the 'n' format specifier. The
# rest of the PEP 3101 code functions quite happily without it, so we
# don't care too much if locale isn't present.
try:
import locale as _locale
except ImportError:
pass
def _parse_format_specifier(format_spec, _localeconv=None):
"""Parse and validate a format specifier.
Turns a standard numeric format specifier into a dict, with the
following entries:
fill: fill character to pad field to minimum width
align: alignment type, either '<', '>', '=' or '^'
sign: either '+', '-' or ' '
minimumwidth: nonnegative integer giving minimum width
zeropad: boolean, indicating whether to pad with zeros
thousands_sep: string to use as thousands separator, or ''
grouping: grouping for thousands separators, in format
used by localeconv
decimal_point: string to use for decimal point
precision: nonnegative integer giving precision, or None
type: one of the characters 'eEfFgG%', or None
"""
m = _parse_format_specifier_regex.match(format_spec)
if m is None:
raise ValueError("Invalid format specifier: " + format_spec)
# get the dictionary
format_dict = m.groupdict()
# zeropad; defaults for fill and alignment. If zero padding
# is requested, the fill and align fields should be absent.
fill = format_dict['fill']
align = format_dict['align']
format_dict['zeropad'] = (format_dict['zeropad'] is not None)
if format_dict['zeropad']:
if fill is not None:
raise ValueError("Fill character conflicts with '0'"
" in format specifier: " + format_spec)
if align is not None:
raise ValueError("Alignment conflicts with '0' in "
"format specifier: " + format_spec)
format_dict['fill'] = fill or ' '
# PEP 3101 originally specified that the default alignment should
# be left; it was later agreed that right-aligned makes more sense
# for numeric types. See http://bugs.python.org/issue6857.
format_dict['align'] = align or '>'
# default sign handling: '-' for negative, '' for positive
if format_dict['sign'] is None:
format_dict['sign'] = '-'
# minimumwidth defaults to 0; precision remains None if not given
format_dict['minimumwidth'] = int(format_dict['minimumwidth'] or '0')
if format_dict['precision'] is not None:
format_dict['precision'] = int(format_dict['precision'])
# if format type is 'g' or 'G' then a precision of 0 makes little
# sense; convert it to 1. Same if format type is unspecified.
if format_dict['precision'] == 0:
if format_dict['type'] is None or format_dict['type'] in 'gGn':
format_dict['precision'] = 1
# determine thousands separator, grouping, and decimal separator, and
# add appropriate entries to format_dict
if format_dict['type'] == 'n':
# apart from separators, 'n' behaves just like 'g'
format_dict['type'] = 'g'
if _localeconv is None:
_localeconv = _locale.localeconv()
if format_dict['thousands_sep'] is not None:
raise ValueError("Explicit thousands separator conflicts with "
"'n' type in format specifier: " + format_spec)
format_dict['thousands_sep'] = _localeconv['thousands_sep']
format_dict['grouping'] = _localeconv['grouping']
format_dict['decimal_point'] = _localeconv['decimal_point']
else:
if format_dict['thousands_sep'] is None:
format_dict['thousands_sep'] = ''
format_dict['grouping'] = [3, 0]
format_dict['decimal_point'] = '.'
return format_dict
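# Illustrative sketch (not part of the original module): parsing
# '+012,.3f' fills in the documented defaults alongside the explicit
# fields:
#
#     >>> d = _parse_format_specifier('+012,.3f')
#     >>> (d['sign'], d['zeropad'], d['minimumwidth'],
#     ...  d['thousands_sep'], d['precision'], d['type'])
#     ('+', True, 12, ',', 3, 'f')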
def _format_align(sign, body, spec):
"""Given an unpadded, non-aligned numeric string 'body' and sign
string 'sign', add padding and alignment conforming to the given
format specifier dictionary 'spec' (as produced by
parse_format_specifier).
"""
# how much extra space do we have to play with?
minimumwidth = spec['minimumwidth']
fill = spec['fill']
padding = fill*(minimumwidth - len(sign) - len(body))
align = spec['align']
if align == '<':
result = sign + body + padding
elif align == '>':
result = padding + sign + body
elif align == '=':
result = sign + padding + body
elif align == '^':
half = len(padding)//2
result = padding[:half] + sign + body + padding[half:]
else:
raise ValueError('Unrecognised alignment field')
return result
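# Illustrative sketch (not part of the original module): centering pads
# both sides, with the sign kept attached to the body:
#
#     >>> _format_align('-', '1.5', {'minimumwidth': 8, 'fill': '*', 'align': '^'})
#     '**-1.5**'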
def _group_lengths(grouping):
"""Convert a localeconv-style grouping into a (possibly infinite)
iterable of integers representing group lengths.
"""
# The result from localeconv()['grouping'], and the input to this
# function, should be a list of integers in one of the
# following three forms:
#
# (1) an empty list, or
# (2) a nonempty list of positive integers + [0], or
# (3) a list of positive integers + [locale.CHAR_MAX]
from itertools import chain, repeat
if not grouping:
return []
elif grouping[-1] == 0 and len(grouping) >= 2:
return chain(grouping[:-1], repeat(grouping[-2]))
elif grouping[-1] == _locale.CHAR_MAX:
return grouping[:-1]
else:
raise ValueError('unrecognised format for grouping')
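# Illustrative sketch (not part of the original module): the common
# [3, 0] grouping repeats its last positive length forever:
#
#     >>> from itertools import islice
#     >>> list(islice(_group_lengths([3, 0]), 5))
#     [3, 3, 3, 3, 3]
#     >>> list(_group_lengths([]))
#     []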
def _insert_thousands_sep(digits, spec, min_width=1):
"""Insert thousands separators into a digit string.
spec is a dictionary whose keys should include 'thousands_sep' and
'grouping'; typically it's the result of parsing the format
specifier using _parse_format_specifier.
The min_width keyword argument gives the minimum length of the
result, which will be padded on the left with zeros if necessary.
If necessary, the zero padding adds an extra '0' on the left to
avoid a leading thousands separator. For example, inserting
commas every three digits in '123456', with min_width=8, gives
'0,123,456', even though that has length 9.
"""
sep = spec['thousands_sep']
grouping = spec['grouping']
groups = []
for l in _group_lengths(grouping):
if l <= 0:
raise ValueError("group length should be positive")
# max(..., 1) forces at least 1 digit to the left of a separator
l = min(max(len(digits), min_width, 1), l)
groups.append('0'*(l - len(digits)) + digits[-l:])
digits = digits[:-l]
min_width -= l
if not digits and min_width <= 0:
break
min_width -= len(sep)
else:
l = max(len(digits), min_width, 1)
groups.append('0'*(l - len(digits)) + digits[-l:])
return sep.join(reversed(groups))
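# Illustrative sketch (not part of the original module):
#
#     >>> spec = {'thousands_sep': ',', 'grouping': [3, 0]}
#     >>> _insert_thousands_sep('1234567', spec)
#     '1,234,567'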
def _format_sign(is_negative, spec):
"""Determine sign character."""
if is_negative:
return '-'
elif spec['sign'] in ' +':
return spec['sign']
else:
return ''
def _format_number(is_negative, intpart, fracpart, exp, spec):
"""Format a number, given the following data:
is_negative: true if the number is negative, else false
intpart: string of digits that must appear before the decimal point
fracpart: string of digits that must come after the point
exp: exponent, as an integer
spec: dictionary resulting from parsing the format specifier
This function uses the information in spec to:
insert separators (decimal separator and thousands separators)
format the sign
format the exponent
add trailing '%' for the '%' type
zero-pad if necessary
fill and align if necessary
"""
sign = _format_sign(is_negative, spec)
if fracpart or spec['alt']:
fracpart = spec['decimal_point'] + fracpart
if exp != 0 or spec['type'] in 'eE':
echar = {'E': 'E', 'e': 'e', 'G': 'E', 'g': 'e'}[spec['type']]
fracpart += "{0}{1:+}".format(echar, exp)
if spec['type'] == '%':
fracpart += '%'
if spec['zeropad']:
min_width = spec['minimumwidth'] - len(fracpart) - len(sign)
else:
min_width = 0
intpart = _insert_thousands_sep(intpart, spec, min_width)
return _format_align(sign, intpart+fracpart, spec)
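# Illustrative sketch (not part of the original module): combining the
# helpers above with a parsed specifier:
#
#     >>> spec = _parse_format_specifier(',.2f')
#     >>> _format_number(False, '1234', '56', 0, spec)
#     '1,234.56'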
##### Useful Constants (internal use only) ################################
# Reusable defaults
_Infinity = Decimal('Inf')
_NegativeInfinity = Decimal('-Inf')
_NaN = Decimal('NaN')
_Zero = Decimal(0)
_One = Decimal(1)
_NegativeOne = Decimal(-1)
# _SignedInfinity[sign] is infinity w/ that sign
_SignedInfinity = (_Infinity, _NegativeInfinity)
# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo _PyHASH_MODULUS
_PyHASH_MODULUS = sys.hash_info.modulus
# hash values to use for positive and negative infinities, and nans
_PyHASH_INF = sys.hash_info.inf
_PyHASH_NAN = sys.hash_info.nan
# _PyHASH_10INV is the inverse of 10 modulo the prime _PyHASH_MODULUS
_PyHASH_10INV = pow(10, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
del sys
try:
import _decimal
except ImportError:
pass
else:
s1 = set(dir())
s2 = set(dir(_decimal))
for name in s1 - s2:
del globals()[name]
del s1, s2, name
from _decimal import *
if __name__ == '__main__':
import doctest, decimal
doctest.testmod(decimal)
"""
Module difflib -- helpers for computing deltas between objects.
Function get_close_matches(word, possibilities, n=3, cutoff=0.6):
Use SequenceMatcher to return list of the best "good enough" matches.
Function context_diff(a, b):
For two lists of strings, return a delta in context diff format.
Function ndiff(a, b):
Return a delta: the difference between `a` and `b` (lists of strings).
Function restore(delta, which):
Return one of the two sequences that generated an ndiff delta.
Function unified_diff(a, b):
For two lists of strings, return a delta in unified diff format.
Class SequenceMatcher:
A flexible class for comparing pairs of sequences of any type.
Class Differ:
For producing human-readable deltas from sequences of lines of text.
Class HtmlDiff:
For producing HTML side by side comparison with change highlights.
"""
__all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher',
'Differ','IS_CHARACTER_JUNK', 'IS_LINE_JUNK', 'context_diff',
'unified_diff', 'HtmlDiff', 'Match']
import heapq
from collections import namedtuple as _namedtuple
Match = _namedtuple('Match', 'a b size')
def _calculate_ratio(matches, length):
if length:
return 2.0 * matches / length
return 1.0
class SequenceMatcher:
"""
SequenceMatcher is a flexible class for comparing pairs of sequences of
any type, so long as the sequence elements are hashable. The basic
algorithm predates, and is a little fancier than, an algorithm
published in the late 1980's by Ratcliff and Obershelp under the
hyperbolic name "gestalt pattern matching". The basic idea is to find
the longest contiguous matching subsequence that contains no "junk"
elements (R-O doesn't address junk). The same idea is then applied
recursively to the pieces of the sequences to the left and to the right
of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that "look right" to people.
SequenceMatcher tries to compute a "human-friendly diff" between two
sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the
longest *contiguous* & junk-free matching subsequence. That's what
catches people's eyes. The Windows(tm) windiff has another interesting
notion, pairing up elements that appear uniquely in each sequence.
That, and the method here, appear to yield more intuitive difference
reports than does diff. This method appears to be the least vulnerable
to synching up on blocks of "junk lines", though (like blank lines in
ordinary text files, or maybe "<P>" lines in HTML files). That may be
because this is the only method of the 3 that has a *concept* of
"junk" <wink>.
Example, comparing two strings, and considering blanks to be "junk":
>>> s = SequenceMatcher(lambda x: x == " ",
... "private Thread currentThread;",
... "private volatile Thread currentThread;")
>>>
.ratio() returns a float in [0, 1], measuring the "similarity" of the
sequences. As a rule of thumb, a .ratio() value over 0.6 means the
sequences are close matches:
>>> print(round(s.ratio(), 3))
0.866
>>>
If you're only interested in where the sequences match,
.get_matching_blocks() is handy:
>>> for block in s.get_matching_blocks():
... print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
Note that the last tuple returned by .get_matching_blocks() is always a
dummy, (len(a), len(b), 0), and this is the only case in which the last
tuple element (number of elements matched) is 0.
If you want to know how to change the first sequence into the second,
use .get_opcodes():
>>> for opcode in s.get_opcodes():
... print("%6s a[%d:%d] b[%d:%d]" % opcode)
equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
equal a[8:29] b[17:38]
See the Differ class for a fancy human-friendly file differencer, which
uses SequenceMatcher both to compare sequences of lines, and to compare
sequences of characters within similar (near-matching) lines.
See also function get_close_matches() in this module, which shows how
simple code building on SequenceMatcher can be used to do useful work.
Timing: Basic R-O is cubic time worst case and quadratic time expected
case. SequenceMatcher is quadratic time for the worst case and has
expected-case behavior dependent in a complicated way on how many
elements the sequences have in common; best case time is linear.
Methods:
__init__(isjunk=None, a='', b='')
Construct a SequenceMatcher.
set_seqs(a, b)
Set the two sequences to be compared.
set_seq1(a)
Set the first sequence to be compared.
set_seq2(b)
Set the second sequence to be compared.
find_longest_match(alo, ahi, blo, bhi)
Find longest matching block in a[alo:ahi] and b[blo:bhi].
get_matching_blocks()
Return list of triples describing matching subsequences.
get_opcodes()
Return list of 5-tuples describing how to turn a into b.
ratio()
Return a measure of the sequences' similarity (float in [0,1]).
quick_ratio()
Return an upper bound on .ratio() relatively quickly.
real_quick_ratio()
Return an upper bound on ratio() very quickly.
"""
def __init__(self, isjunk=None, a='', b='', autojunk=True):
"""Construct a SequenceMatcher.
Optional arg isjunk is None (the default), or a one-argument
function that takes a sequence element and returns true iff the
element is junk. None is equivalent to passing "lambda x: 0", i.e.
no elements are considered to be junk. For example, pass
lambda x: x in " \\t"
if you're comparing lines as sequences of characters, and don't
want to synch up on blanks or hard tabs.
Optional arg a is the first of two sequences to be compared. By
default, an empty string. The elements of a must be hashable. See
also .set_seqs() and .set_seq1().
Optional arg b is the second of two sequences to be compared. By
default, an empty string. The elements of b must be hashable. See
also .set_seqs() and .set_seq2().
Optional arg autojunk should be set to False to disable the
"automatic junk heuristic" that treats popular elements as junk
(see module documentation for more information).
"""
# Members:
# a
# first sequence
# b
# second sequence; differences are computed as "what do
# we need to do to 'a' to change it into 'b'?"
# b2j
# for x in b, b2j[x] is a list of the indices (into b)
# at which x appears; junk and popular elements do not appear
# fullbcount
# for x in b, fullbcount[x] == the number of times x
# appears in b; only materialized if really needed (used
# only for computing quick_ratio())
# matching_blocks
# a list of (i, j, k) triples, where a[i:i+k] == b[j:j+k];
# ascending & non-overlapping in i and in j; terminated by
# a dummy (len(a), len(b), 0) sentinel
# opcodes
# a list of (tag, i1, i2, j1, j2) tuples, where tag is
# one of
# 'replace' a[i1:i2] should be replaced by b[j1:j2]
# 'delete' a[i1:i2] should be deleted
# 'insert' b[j1:j2] should be inserted
# 'equal' a[i1:i2] == b[j1:j2]
# isjunk
# a user-supplied function taking a sequence element and
# returning true iff the element is "junk" -- this has
# subtle but helpful effects on the algorithm, which I'll
# get around to writing up someday <0.9 wink>.
# DON'T USE! Only __chain_b uses this. Use "in self.bjunk".
# bjunk
# the items in b for which isjunk is True.
# bpopular
# nonjunk items in b treated as junk by the heuristic (if used).
self.isjunk = isjunk
self.a = self.b = None
self.autojunk = autojunk
self.set_seqs(a, b)
def set_seqs(self, a, b):
"""Set the two sequences to be compared.
>>> s = SequenceMatcher()
>>> s.set_seqs("abcd", "bcde")
>>> s.ratio()
0.75
"""
self.set_seq1(a)
self.set_seq2(b)
def set_seq1(self, a):
"""Set the first sequence to be compared.
The second sequence to be compared is not changed.
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq1("bcde")
>>> s.ratio()
1.0
>>>
SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence S against
many sequences, use .set_seq2(S) once and call .set_seq1(x)
repeatedly for each of the other sequences.
See also set_seqs() and set_seq2().
"""
if a is self.a:
return
self.a = a
self.matching_blocks = self.opcodes = None
def set_seq2(self, b):
"""Set the second sequence to be compared.
The first sequence to be compared is not changed.
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.set_seq2("abcd")
>>> s.ratio()
1.0
>>>
SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence S against
many sequences, use .set_seq2(S) once and call .set_seq1(x)
repeatedly for each of the other sequences.
See also set_seqs() and set_seq1().
"""
if b is self.b:
return
self.b = b
self.matching_blocks = self.opcodes = None
self.fullbcount = None
self.__chain_b()
# For each element x in b, set b2j[x] to a list of the indices in
# b where x appears; the indices are in increasing order; note that
# the number of times x appears in b is len(b2j[x]) ...
# when self.isjunk is defined, junk elements don't show up in this
# map at all, which stops the central find_longest_match method
# from starting any matching block at a junk element ...
# b2j also does not contain entries for "popular" elements, meaning
# elements that account for more than 1 + 1% of the total elements, and
# when the sequence is reasonably large (>= 200 elements); this can
# be viewed as an adaptive notion of semi-junk, and yields an enormous
# speedup when, e.g., comparing program files with hundreds of
# instances of "return NULL;" ...
# note that this is only called when b changes; so for cross-product
# kinds of matches, it's best to call set_seq2 once, then set_seq1
# repeatedly
def __chain_b(self):
# Because isjunk is a user-defined (not C) function, and we test
# for junk a LOT, it's important to minimize the number of calls.
# Before the tricks described here, __chain_b was by far the most
# time-consuming routine in the whole module! If anyone sees
# Jim Roskind, thank him again for profile.py -- I never would
# have guessed that.
# The first trick is to build b2j ignoring the possibility
# of junk. I.e., we don't call isjunk at all yet. Throwing
# out the junk later is much cheaper than building b2j "right"
# from the start.
b = self.b
self.b2j = b2j = {}
for i, elt in enumerate(b):
indices = b2j.setdefault(elt, [])
indices.append(i)
# Purge junk elements
self.bjunk = junk = set()
isjunk = self.isjunk
if isjunk:
for elt in b2j.keys():
if isjunk(elt):
junk.add(elt)
for elt in junk: # separate loop avoids separate list of keys
del b2j[elt]
# Purge popular elements that are not junk
self.bpopular = popular = set()
n = len(b)
if self.autojunk and n >= 200:
ntest = n // 100 + 1
for elt, idxs in b2j.items():
if len(idxs) > ntest:
popular.add(elt)
for elt in popular: # ditto; as fast for 1% deletion
del b2j[elt]
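# Illustrative sketch (not part of the original module): with autojunk
# enabled and b at least 200 elements long, any element occupying more
# than about 1% of b is dropped from b2j as "popular":
#
#     >>> s = SequenceMatcher(None, 'a', ' ' * 200 + 'x')
#     >>> ' ' in s.b2j, 'x' in s.b2j
#     (False, True)
#     >>> s.bpopular
#     {' '}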
def find_longest_match(self, alo, ahi, blo, bhi):
"""Find longest matching block in a[alo:ahi] and b[blo:bhi].
If isjunk is not defined:
Return (i,j,k) such that a[i:i+k] is equal to b[j:j+k], where
alo <= i <= i+k <= ahi
blo <= j <= j+k <= bhi
and for all (i',j',k') meeting those conditions,
k >= k'
i <= i'
and if i == i', j <= j'
In other words, of all maximal matching blocks, return one that
starts earliest in a, and of all those maximal matching blocks that
start earliest in a, return the one that starts earliest in b.
>>> s = SequenceMatcher(None, " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
If isjunk is defined, first the longest matching block is
determined as above, but with the additional restriction that no
junk element appears in the block. Then that block is extended as
far as possible by matching (only) junk elements on both sides. So
the resulting block never matches on junk except as identical junk
happens to be adjacent to an "interesting" match.
Here's the same example as before, but considering blanks to be
junk. That prevents " abcd" from matching the " abcd" at the tail
end of the second sequence directly. Instead only the "abcd" can
match, and matches the leftmost "abcd" in the second sequence:
>>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd")
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=1, b=0, size=4)
If no blocks match, return (alo, blo, 0).
>>> s = SequenceMatcher(None, "ab", "c")
>>> s.find_longest_match(0, 2, 0, 1)
Match(a=0, b=0, size=0)
"""
# CAUTION: stripping common prefix or suffix would be incorrect.
# E.g.,
# ab
# acab
# Longest matching block is "ab", but if common prefix is
# stripped, it's "a" (tied with "b"). UNIX(tm) diff does so
# strip, so ends up claiming that ab is changed to acab by
# inserting "ca" in the middle. That's minimal but unintuitive:
# "it's obvious" that someone inserted "ac" at the front.
# Windiff ends up at the same place as diff, but by pairing up
# the unique 'b's and then matching the first two 'a's.
a, b, b2j, isbjunk = self.a, self.b, self.b2j, self.bjunk.__contains__
besti, bestj, bestsize = alo, blo, 0
# find longest junk-free match
# during an iteration of the loop, j2len[j] = length of longest
# junk-free match ending with a[i-1] and b[j]
j2len = {}
nothing = []
for i in range(alo, ahi):
# look at all instances of a[i] in b; note that because
# b2j has no junk keys, the loop is skipped if a[i] is junk
j2lenget = j2len.get
newj2len = {}
for j in b2j.get(a[i], nothing):
# a[i] matches b[j]
if j < blo:
continue
if j >= bhi:
break
k = newj2len[j] = j2lenget(j-1, 0) + 1
if k > bestsize:
besti, bestj, bestsize = i-k+1, j-k+1, k
j2len = newj2len
# Extend the best by non-junk elements on each end. In particular,
# "popular" non-junk elements aren't in b2j, which greatly speeds
# the inner loop above, but also means "the best" match so far
# doesn't contain any junk *or* popular non-junk elements.
while besti > alo and bestj > blo and \
not isbjunk(b[bestj-1]) and \
a[besti-1] == b[bestj-1]:
besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
while besti+bestsize < ahi and bestj+bestsize < bhi and \
not isbjunk(b[bestj+bestsize]) and \
a[besti+bestsize] == b[bestj+bestsize]:
bestsize += 1
# Now that we have a wholly interesting match (albeit possibly
# empty!), we may as well suck up the matching junk on each
# side of it too. Can't think of a good reason not to, and it
# saves post-processing the (possibly considerable) expense of
# figuring out what to do with it. In the case of an empty
# interesting match, this is clearly the right thing to do,
# because no other kind of match is possible in the regions.
while besti > alo and bestj > blo and \
isbjunk(b[bestj-1]) and \
a[besti-1] == b[bestj-1]:
besti, bestj, bestsize = besti-1, bestj-1, bestsize+1
while besti+bestsize < ahi and bestj+bestsize < bhi and \
isbjunk(b[bestj+bestsize]) and \
a[besti+bestsize] == b[bestj+bestsize]:
bestsize = bestsize + 1
return Match(besti, bestj, bestsize)
def get_matching_blocks(self):
"""Return list of triples describing matching subsequences.
Each triple is of the form (i, j, n), and means that
a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in
i and in j. New in Python 2.5, it's also guaranteed that if
(i, j, n) and (i', j', n') are adjacent triples in the list, and
the second is not the last triple in the list, then i+n != i' or
j+n != j'. IOW, adjacent triples never describe adjacent equal
blocks.
The last triple is a dummy, (len(a), len(b), 0), and is the only
triple with n==0.
>>> s = SequenceMatcher(None, "abxcd", "abcd")
>>> list(s.get_matching_blocks())
[Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)]
"""
if self.matching_blocks is not None:
return self.matching_blocks
la, lb = len(self.a), len(self.b)
# This is most naturally expressed as a recursive algorithm, but
# at least one user bumped into extreme use cases that exceeded
# the recursion limit on their box. So, now we maintain a list
# (`queue`) of blocks we still need to look at, and append partial
# results to `matching_blocks` in a loop; the matches are sorted
# at the end.
queue = [(0, la, 0, lb)]
matching_blocks = []
while queue:
alo, ahi, blo, bhi = queue.pop()
i, j, k = x = self.find_longest_match(alo, ahi, blo, bhi)
# a[alo:i] vs b[blo:j] unknown
# a[i:i+k] same as b[j:j+k]
# a[i+k:ahi] vs b[j+k:bhi] unknown
if k: # if k is 0, there was no matching block
matching_blocks.append(x)
if alo < i and blo < j:
queue.append((alo, i, blo, j))
if i+k < ahi and j+k < bhi:
queue.append((i+k, ahi, j+k, bhi))
matching_blocks.sort()
# It's possible that we have adjacent equal blocks in the
# matching_blocks list now. Starting with 2.5, this code was added
# to collapse them.
i1 = j1 = k1 = 0
non_adjacent = []
for i2, j2, k2 in matching_blocks:
# Is this block adjacent to i1, j1, k1?
if i1 + k1 == i2 and j1 + k1 == j2:
# Yes, so collapse them -- this just increases the length of
# the first block by the length of the second, and the first
# block so lengthened remains the block to compare against.
k1 += k2
else:
# Not adjacent. Remember the first block (k1==0 means it's
# the dummy we started with), and make the second block the
# new block to compare against.
if k1:
non_adjacent.append((i1, j1, k1))
i1, j1, k1 = i2, j2, k2
if k1:
non_adjacent.append((i1, j1, k1))
non_adjacent.append( (la, lb, 0) )
self.matching_blocks = list(map(Match._make, non_adjacent))
return self.matching_blocks
def get_opcodes(self):
"""Return list of 5-tuples describing how to turn a into b.
Each tuple is of the form (tag, i1, i2, j1, j2). The first tuple
has i1 == j1 == 0, and remaining tuples have i1 == the i2 from the
tuple preceding it, and likewise for j1 == the previous j2.
The tags are strings, with these meanings:
'replace': a[i1:i2] should be replaced by b[j1:j2]
'delete': a[i1:i2] should be deleted.
Note that j1==j2 in this case.
'insert': b[j1:j2] should be inserted at a[i1:i1].
Note that i1==i2 in this case.
'equal': a[i1:i2] == b[j1:j2]
>>> a = "qabxcd"
>>> b = "abycdf"
>>> s = SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
... print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" %
... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2])))
delete a[0:1] (q) b[0:0] ()
equal a[1:3] (ab) b[0:2] (ab)
replace a[3:4] (x) b[2:3] (y)
equal a[4:6] (cd) b[3:5] (cd)
insert a[6:6] () b[5:6] (f)
"""
if self.opcodes is not None:
return self.opcodes
i = j = 0
self.opcodes = answer = []
for ai, bj, size in self.get_matching_blocks():
# invariant: we've pumped out correct diffs to change
# a[:i] into b[:j], and the next matching block is
# a[ai:ai+size] == b[bj:bj+size]. So we need to pump
# out a diff to change a[i:ai] into b[j:bj], pump out
# the matching block, and move (i,j) beyond the match
tag = ''
if i < ai and j < bj:
tag = 'replace'
elif i < ai:
tag = 'delete'
elif j < bj:
tag = 'insert'
if tag:
answer.append( (tag, i, ai, j, bj) )
i, j = ai+size, bj+size
# the list of matching blocks is terminated by a
# sentinel with size 0
if size:
answer.append( ('equal', ai, i, bj, j) )
return answer
def get_grouped_opcodes(self, n=3):
""" Isolate change clusters by eliminating ranges with no changes.
Return a generator of groups with up to n lines of context.
Each group is in the same format as returned by get_opcodes().
>>> from pprint import pprint
>>> a = list(map(str, range(1,40)))
>>> b = a[:]
>>> b[8:8] = ['i'] # Make an insertion
>>> b[20] += 'x' # Make a replacement
>>> b[23:28] = [] # Make a deletion
>>> b[30] += 'y' # Make another replacement
>>> pprint(list(SequenceMatcher(None,a,b).get_grouped_opcodes()))
[[('equal', 5, 8, 5, 8), ('insert', 8, 8, 8, 9), ('equal', 8, 11, 9, 12)],
 [('equal', 16, 19, 17, 20),
  ('replace', 19, 20, 20, 21),
  ('equal', 20, 22, 21, 23),
  ('delete', 22, 27, 23, 23),
  ('equal', 27, 30, 23, 26)],
 [('equal', 31, 34, 27, 30),
  ('replace', 34, 35, 30, 31),
  ('equal', 35, 38, 31, 34)]]
"""
codes = self.get_opcodes()
if not codes:
codes = [("equal", 0, 1, 0, 1)]
# Fixup leading and trailing groups if they show no changes.
if codes[0][0] == 'equal':
tag, i1, i2, j1, j2 = codes[0]
codes[0] = tag, max(i1, i2-n), i2, max(j1, j2-n), j2
if codes[-1][0] == 'equal':
tag, i1, i2, j1, j2 = codes[-1]
codes[-1] = tag, i1, min(i2, i1+n), j1, min(j2, j1+n)
nn = n + n
group = []
for tag, i1, i2, j1, j2 in codes:
# End the current group and start a new one whenever
# there is a large range with no changes.
if tag == 'equal' and i2-i1 > nn:
group.append((tag, i1, min(i2, i1+n), j1, min(j2, j1+n)))
yield group
group = []
i1, j1 = max(i1, i2-n), max(j1, j2-n)
group.append((tag, i1, i2, j1 ,j2))
if group and not (len(group)==1 and group[0][0] == 'equal'):
yield group
def ratio(self):
"""Return a measure of the sequences' similarity (float in [0,1]).
Where T is the total number of elements in both sequences, and
M is the number of matches, this is 2.0*M / T.
Note that this is 1 if the sequences are identical, and 0 if
they have nothing in common.
.ratio() is expensive to compute if you haven't already computed
.get_matching_blocks() or .get_opcodes(), in which case you may
want to try .quick_ratio() or .real_quick_ratio() first to get an
upper bound.
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
>>> s.quick_ratio()
0.75
>>> s.real_quick_ratio()
1.0
"""
matches = sum(triple[-1] for triple in self.get_matching_blocks())
return _calculate_ratio(matches, len(self.a) + len(self.b))
def quick_ratio(self):
"""Return an upper bound on ratio() relatively quickly.
This isn't defined beyond that it is an upper bound on .ratio(), and
is faster to compute.
"""
# viewing a and b as multisets, set matches to the cardinality
# of their intersection; this counts the number of matches
# without regard to order, so is clearly an upper bound
if self.fullbcount is None:
self.fullbcount = fullbcount = {}
for elt in self.b:
fullbcount[elt] = fullbcount.get(elt, 0) + 1
fullbcount = self.fullbcount
# avail[x] is the number of times x appears in 'b' less the
# number of times we've seen it in 'a' so far ... kinda
avail = {}
availhas, matches = avail.__contains__, 0
for elt in self.a:
if availhas(elt):
numb = avail[elt]
else:
numb = fullbcount.get(elt, 0)
avail[elt] = numb - 1
if numb > 0:
matches = matches + 1
return _calculate_ratio(matches, len(self.a) + len(self.b))
def real_quick_ratio(self):
"""Return an upper bound on ratio() very quickly.
This isn't defined beyond that it is an upper bound on .ratio(), and
is faster to compute than either .ratio() or .quick_ratio().
"""
la, lb = len(self.a), len(self.b)
# can't have more matches than the number of elements in the
# shorter sequence
return _calculate_ratio(min(la, lb), la + lb)
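# Illustrative sketch (not part of the upstream source): the
# cheap-to-expensive filtering idiom that ratio()'s docstring suggests.
# Both quick ratios are upper bounds on ratio(), so they can reject a
# candidate before the full (cached but expensive) computation runs;
# get_close_matches() below applies exactly this chain:
#
#     def is_close(a, b, cutoff=0.8):
#         sm = SequenceMatcher(None, a, b)
#         return (sm.real_quick_ratio() >= cutoff and
#                 sm.quick_ratio() >= cutoff and
#                 sm.ratio() >= cutoff)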
def get_close_matches(word, possibilities, n=3, cutoff=0.6):
"""Use SequenceMatcher to return list of the best "good enough" matches.
word is a sequence for which close matches are desired (typically a
string).
possibilities is a list of sequences against which to match word
(typically a list of strings).
Optional arg n (default 3) is the maximum number of close matches to
return. n must be > 0.
Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities
that don't score at least that similar to word are ignored.
The best (no more than n) matches among the possibilities are returned
in a list, sorted by similarity score, most similar first.
>>> get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
['apple', 'ape']
>>> import keyword as _keyword
>>> get_close_matches("wheel", _keyword.kwlist)
['while']
>>> get_close_matches("Apple", _keyword.kwlist)
[]
>>> get_close_matches("accept", _keyword.kwlist)
['except']
"""
if not n > 0:
raise ValueError("n must be > 0: %r" % (n,))
if not 0.0 <= cutoff <= 1.0:
raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,))
result = []
s = SequenceMatcher()
s.set_seq2(word)
for x in possibilities:
s.set_seq1(x)
if s.real_quick_ratio() >= cutoff and \
s.quick_ratio() >= cutoff and \
s.ratio() >= cutoff:
result.append((s.ratio(), x))
# Move the best scorers to head of list
result = heapq.nlargest(n, result)
# Strip scores for the best n matches
return [x for score, x in result]
def _count_leading(line, ch):
"""
Return number of `ch` characters at the start of `line`.
Example:
>>> _count_leading(' abc', ' ')
3
"""
i, n = 0, len(line)
while i < n and line[i] == ch:
i += 1
return i
class Differ:
r"""
Differ is a class for comparing sequences of lines of text, and
producing human-readable differences or deltas. Differ uses
SequenceMatcher both to compare sequences of lines, and to compare
sequences of characters within similar (near-matching) lines.
Each line of a Differ delta begins with a two-letter code:
'- ' line unique to sequence 1
'+ ' line unique to sequence 2
'  ' line common to both sequences
'? ' line not present in either input sequence
Lines beginning with '? ' attempt to guide the eye to intraline
differences, and were not present in either input sequence. These lines
can be confusing if the sequences contain tab characters.
Note that Differ makes no claim to produce a *minimal* diff. To the
contrary, minimal diffs are often counter-intuitive, because they synch
up anywhere possible, sometimes on accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.
Example: Comparing two texts.
First we set up the texts, sequences of individual single-line strings
ending with newlines (such sequences can also be obtained from the
`readlines()` method of file-like objects):
>>> text1 = '''  1. Beautiful is better than ugly.
...   2. Explicit is better than implicit.
...   3. Simple is better than complex.
...   4. Complex is better than complicated.
... '''.splitlines(keepends=True)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = '''  1. Beautiful is better than ugly.
...   3.   Simple is better than complex.
...   4. Complicated is better than complex.
...   5. Flat is better than nested.
... '''.splitlines(keepends=True)
Next we instantiate a Differ object:
>>> d = Differ()
Note that when instantiating a Differ object we may pass functions to
filter out line and character 'junk'. See Differ.__init__ for details.
Finally, we compare the two:
>>> result = list(d.compare(text1, text2))
'result' is a list of strings, so let's pretty-print it:
>>> from pprint import pprint as _pprint
>>> _pprint(result)
['    1. Beautiful is better than ugly.\n',
 '-   2. Explicit is better than implicit.\n',
 '-   3. Simple is better than complex.\n',
 '+   3.   Simple is better than complex.\n',
 '?     ++\n',
 '-   4. Complex is better than complicated.\n',
 '?            ^                     ---- ^\n',
 '+   4. Complicated is better than complex.\n',
 '?           ++++ ^                      ^\n',
 '+   5. Flat is better than nested.\n']
As a single multi-line string it looks like this:
>>> print(''.join(result), end="")
    1. Beautiful is better than ugly.
-   2. Explicit is better than implicit.
-   3. Simple is better than complex.
+   3.   Simple is better than complex.
?     ++
-   4. Complex is better than complicated.
?            ^                     ---- ^
+   4. Complicated is better than complex.
?           ++++ ^                      ^
+   5. Flat is better than nested.
Methods:
__init__(linejunk=None, charjunk=None)
Construct a text differencer, with optional filters.
compare(a, b)
Compare two sequences of lines; generate the resulting delta.
"""
def __init__(self, linejunk=None, charjunk=None):
"""
Construct a text differencer, with optional filters.
The two optional keyword parameters are for filter functions:
- `linejunk`: A function that should accept a single string argument,
and return true iff the string is junk. The module-level function
`IS_LINE_JUNK` may be used to filter out lines without visible
characters, except for at most one splat ('#'). It is recommended
to leave linejunk None; as of Python 2.3, the underlying
SequenceMatcher class has grown an adaptive notion of "noise" lines
that's better than any static definition the author has ever been
able to craft.
- `charjunk`: A function that should accept a string of length 1. The
module-level function `IS_CHARACTER_JUNK` may be used to filter out
whitespace characters (a blank or tab; **note**: bad idea to include
newline in this!). Use of IS_CHARACTER_JUNK is recommended.
"""
self.linejunk = linejunk
self.charjunk = charjunk
def compare(self, a, b):
r"""
Compare two sequences of lines; generate the resulting delta.
Each sequence must contain individual single-line strings ending with
newlines. Such sequences can be obtained from the `readlines()` method
of file-like objects. The delta generated also consists of newline-
terminated strings, ready to be printed as-is via the writelines()
method of a file-like object.
Example:
>>> print(''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(True),
... 'ore\ntree\nemu\n'.splitlines(True))),
... end="")
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
"""
cruncher = SequenceMatcher(self.linejunk, a, b)
for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
if tag == 'replace':
g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
elif tag == 'delete':
g = self._dump('-', a, alo, ahi)
elif tag == 'insert':
g = self._dump('+', b, blo, bhi)
elif tag == 'equal':
g = self._dump(' ', a, alo, ahi)
else:
raise ValueError('unknown tag %r' % (tag,))
yield from g
def _dump(self, tag, x, lo, hi):
"""Generate comparison results for a same-tagged range."""
for i in range(lo, hi):
yield '%s %s' % (tag, x[i])
def _plain_replace(self, a, alo, ahi, b, blo, bhi):
assert alo < ahi and blo < bhi
# dump the shorter block first -- reduces the burden on short-term
# memory if the blocks are of very different sizes
if bhi - blo < ahi - alo:
first = self._dump('+', b, blo, bhi)
second = self._dump('-', a, alo, ahi)
else:
first = self._dump('-', a, alo, ahi)
second = self._dump('+', b, blo, bhi)
for g in first, second:
yield from g
def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
r"""
When replacing one block of lines with another, search the blocks
for *similar* lines; the best-matching pair (if any) is used as a
synch point, and intraline difference marking is done on the
similar pair. Lots of work, but often worth it.
Example:
>>> d = Differ()
>>> results = d._fancy_replace(['abcDefghiJkl\n'], 0, 1,
... ['abcdefGhijkl\n'], 0, 1)
>>> print(''.join(results), end="")
- abcDefghiJkl
?    ^  ^  ^
+ abcdefGhijkl
?    ^  ^  ^
"""
# don't synch up unless the lines have a similarity score of at
# least cutoff; best_ratio tracks the best score seen so far
best_ratio, cutoff = 0.74, 0.75
cruncher = SequenceMatcher(self.charjunk)
eqi, eqj = None, None # 1st indices of equal lines (if any)
# search for the pair that matches best without being identical
# (identical lines must be junk lines, & we don't want to synch up
# on junk -- unless we have to)
for j in range(blo, bhi):
bj = b[j]
cruncher.set_seq2(bj)
for i in range(alo, ahi):
ai = a[i]
if ai == bj:
if eqi is None:
eqi, eqj = i, j
continue
cruncher.set_seq1(ai)
# computing similarity is expensive, so use the quick
# upper bounds first -- have seen this speed up messy
# compares by a factor of 3.
# note that ratio() is only expensive to compute the first
# time it's called on a sequence pair; the expensive part
# of the computation is cached by cruncher
if cruncher.real_quick_ratio() > best_ratio and \
cruncher.quick_ratio() > best_ratio and \
cruncher.ratio() > best_ratio:
best_ratio, best_i, best_j = cruncher.ratio(), i, j
if best_ratio < cutoff:
# no non-identical "pretty close" pair
if eqi is None:
# no identical pair either -- treat it as a straight replace
yield from self._plain_replace(a, alo, ahi, b, blo, bhi)
return
# no close pair, but an identical pair -- synch up on that
best_i, best_j, best_ratio = eqi, eqj, 1.0
else:
# there's a close pair, so forget the identical pair (if any)
eqi = None
# a[best_i] very similar to b[best_j]; eqi is None iff they're not
# identical
# pump out diffs from before the synch point
yield from self._fancy_helper(a, alo, best_i, b, blo, best_j)
# do intraline marking on the synch pair
aelt, belt = a[best_i], b[best_j]
if eqi is None:
# pump out a '-', '?', '+', '?' quad for the synched lines
atags = btags = ""
cruncher.set_seqs(aelt, belt)
for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
la, lb = ai2 - ai1, bj2 - bj1
if tag == 'replace':
atags += '^' * la
btags += '^' * lb
elif tag == 'delete':
atags += '-' * la
elif tag == 'insert':
btags += '+' * lb
elif tag == 'equal':
atags += ' ' * la
btags += ' ' * lb
else:
raise ValueError('unknown tag %r' % (tag,))
yield from self._qformat(aelt, belt, atags, btags)
else:
# the synch pair is identical
yield ' ' + aelt
# pump out diffs from after the synch point
yield from self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
g = []
if alo < ahi:
if blo < bhi:
g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
else:
g = self._dump('-', a, alo, ahi)
elif blo < bhi:
g = self._dump('+', b, blo, bhi)
yield from g
def _qformat(self, aline, bline, atags, btags):
r"""
Format "?" output and deal with leading tabs.
Example:
>>> d = Differ()
>>> results = d._qformat('\tabcDefghiJkl\n', '\tabcdefGhijkl\n',
...                      '  ^ ^  ^      ', '  ^ ^  ^      ')
>>> for line in results: print(repr(line))
...
'- \tabcDefghiJkl\n'
'? \t ^ ^  ^\n'
'+ \tabcdefGhijkl\n'
'? \t ^ ^  ^\n'
"""
# Can hurt, but will probably help most of the time.
common = min(_count_leading(aline, "\t"),
_count_leading(bline, "\t"))
common = min(common, _count_leading(atags[:common], " "))
common = min(common, _count_leading(btags[:common], " "))
atags = atags[common:].rstrip()
btags = btags[common:].rstrip()
yield "- " + aline
if atags:
yield "? %s%s\n" % ("\t" * common, atags)
yield "+ " + bline
if btags:
yield "? %s%s\n" % ("\t" * common, btags)
# With respect to junk, an earlier version of ndiff simply refused to
# *start* a match with a junk element. The result was cases like this:
# before: private Thread currentThread;
# after: private volatile Thread currentThread;
# If you consider whitespace to be junk, the longest contiguous match
# not starting with junk is "e Thread currentThread". So ndiff reported
# that "e volatil" was inserted between the 't' and the 'e' in "private".
# While an accurate view, to people that's absurd. The current version
# looks for matching blocks that are entirely junk-free, then extends the
# longest one of those as far as possible but only with matching junk.
# So now "currentThread" is matched, then extended to suck up the
# preceding blank; then "private" is matched, and extended to suck up the
# following blank; then "Thread" is matched; and finally ndiff reports
# that "volatile " was inserted before "Thread". The only quibble
# remaining is that perhaps it was really the case that " volatile"
# was inserted after "private". I can live with that <wink>.
import re
def IS_LINE_JUNK(line, pat=re.compile(r"\s*(?:#\s*)?$").match):
r"""
Return 1 for ignorable line: iff `line` is blank or contains a single '#'.
Examples:
>>> IS_LINE_JUNK('\n')
True
>>> IS_LINE_JUNK(' # \n')
True
>>> IS_LINE_JUNK('hello\n')
False
"""
return pat(line) is not None
def IS_CHARACTER_JUNK(ch, ws=" \t"):
r"""
Return 1 for ignorable character: iff `ch` is a space or tab.
Examples:
>>> IS_CHARACTER_JUNK(' ')
True
>>> IS_CHARACTER_JUNK('\t')
True
>>> IS_CHARACTER_JUNK('\n')
False
>>> IS_CHARACTER_JUNK('x')
False
"""
return ch in ws
########################################################################
### Unified Diff
########################################################################
def _format_range_unified(start, stop):
'Convert range to the "ed" format'
# Per the diff spec at http://www.unix.org/single_unix_specification/
beginning = start + 1 # lines start numbering with one
length = stop - start
if length == 1:
return '{}'.format(beginning)
if not length:
beginning -= 1 # empty ranges begin at line just before the range
return '{},{}'.format(beginning, length)
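# Illustrative values (derived from the code above, not upstream
# comments): unified hunk ranges are start,length with two special
# cases, a bare line number for length 1 and "line-before,0" for empty:
#
#     _format_range_unified(0, 1)   # -> '1'    single line
#     _format_range_unified(0, 3)   # -> '1,3'  start,length
#     _format_range_unified(3, 3)   # -> '3,0'  empty range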
def unified_diff(a, b, fromfile='', tofile='', fromfiledate='',
tofiledate='', n=3, lineterm='\n'):
r"""
Compare two sequences of lines; generate the delta as a unified diff.
Unified diffs are a compact way of showing line changes and a few
lines of context. The number of context lines is set by 'n' which
defaults to three.
By default, the diff control lines (those with ---, +++, or @@) are
created with a trailing newline. This is helpful so that inputs
created from file.readlines() result in diffs that are suitable for
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm
argument to "" so that the output will be uniformly newline free.
The unidiff format normally has a header for filenames and modification
times. Any or all of these may be specified using strings for
'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'.
The modification times are normally expressed in the ISO 8601 format.
Example:
>>> for line in unified_diff('one two three four'.split(),
... 'zero one tree four'.split(), 'Original', 'Current',
... '2005-01-26 23:30:50', '2010-04-02 10:20:52',
... lineterm=''):
... print(line) # doctest: +NORMALIZE_WHITESPACE
--- Original 2005-01-26 23:30:50
+++ Current 2010-04-02 10:20:52
@@ -1,4 +1,4 @@
+zero
one
-two
-three
+tree
four
"""
started = False
for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n):
if not started:
started = True
fromdate = '\t{}'.format(fromfiledate) if fromfiledate else ''
todate = '\t{}'.format(tofiledate) if tofiledate else ''
yield '--- {}{}{}'.format(fromfile, fromdate, lineterm)
yield '+++ {}{}{}'.format(tofile, todate, lineterm)
first, last = group[0], group[-1]
file1_range = _format_range_unified(first[1], last[2])
file2_range = _format_range_unified(first[3], last[4])
yield '@@ -{} +{} @@{}'.format(file1_range, file2_range, lineterm)
for tag, i1, i2, j1, j2 in group:
if tag == 'equal':
for line in a[i1:i2]:
yield ' ' + line
continue
if tag in {'replace', 'delete'}:
for line in a[i1:i2]:
yield '-' + line
if tag in {'replace', 'insert'}:
for line in b[j1:j2]:
yield '+' + line
########################################################################
### Context Diff
########################################################################
def _format_range_context(start, stop):
'Convert range to the "ed" format'
# Per the diff spec at http://www.unix.org/single_unix_specification/
beginning = start + 1 # lines start numbering with one
length = stop - start
if not length:
beginning -= 1 # empty ranges begin at line just before the range
if length <= 1:
return '{}'.format(beginning)
return '{},{}'.format(beginning, beginning + length - 1)
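# Illustrative values (derived from the code above): unlike the unified
# form, the context form shows start,end line numbers rather than
# start,length, collapsing empty and single-line ranges to one number:
#
#     _format_range_context(0, 1)   # -> '1'
#     _format_range_context(0, 3)   # -> '1,3'  lines 1 through 3
#     _format_range_context(3, 3)   # -> '3'    empty range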
# See http://www.unix.org/single_unix_specification/
def context_diff(a, b, fromfile='', tofile='',
fromfiledate='', tofiledate='', n=3, lineterm='\n'):
r"""
Compare two sequences of lines; generate the delta as a context diff.
Context diffs are a compact way of showing line changes and a few
lines of context. The number of context lines is set by 'n' which
defaults to three.
By default, the diff control lines (those with *** or ---) are
created with a trailing newline. This is helpful so that inputs
created from file.readlines() result in diffs that are suitable for
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm
argument to "" so that the output will be uniformly newline free.
The context diff format normally has a header for filenames and
modification times. Any or all of these may be specified using
strings for 'fromfile', 'tofile', 'fromfiledate', and 'tofiledate'.
The modification times are normally expressed in the ISO 8601 format.
If not specified, the strings default to blanks.
Example:
>>> print(''.join(context_diff('one\ntwo\nthree\nfour\n'.splitlines(True),
... 'zero\none\ntree\nfour\n'.splitlines(True), 'Original', 'Current')),
... end="")
*** Original
--- Current
***************
*** 1,4 ****
one
! two
! three
four
--- 1,4 ----
+ zero
one
! tree
four
"""
prefix = dict(insert='+ ', delete='- ', replace='! ', equal=' ')
started = False
for group in SequenceMatcher(None,a,b).get_grouped_opcodes(n):
if not started:
started = True
fromdate = '\t{}'.format(fromfiledate) if fromfiledate else ''
todate = '\t{}'.format(tofiledate) if tofiledate else ''
yield '*** {}{}{}'.format(fromfile, fromdate, lineterm)
yield '--- {}{}{}'.format(tofile, todate, lineterm)
first, last = group[0], group[-1]
yield '***************' + lineterm
file1_range = _format_range_context(first[1], last[2])
yield '*** {} ****{}'.format(file1_range, lineterm)
if any(tag in {'replace', 'delete'} for tag, _, _, _, _ in group):
for tag, i1, i2, _, _ in group:
if tag != 'insert':
for line in a[i1:i2]:
yield prefix[tag] + line
file2_range = _format_range_context(first[3], last[4])
yield '--- {} ----{}'.format(file2_range, lineterm)
if any(tag in {'replace', 'insert'} for tag, _, _, _, _ in group):
for tag, _, _, j1, j2 in group:
if tag != 'delete':
for line in b[j1:j2]:
yield prefix[tag] + line
def ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK):
r"""
Compare `a` and `b` (lists of strings); return a `Differ`-style delta.
Optional keyword parameters `linejunk` and `charjunk` are for filter
functions (or None):
- linejunk: A function that should accept a single string argument, and
return true iff the string is junk. The default is None, and is
recommended; as of Python 2.3, an adaptive notion of "noise" lines is
used that does a good job on its own.
- charjunk: A function that should accept a string of length 1. The
default is module-level function IS_CHARACTER_JUNK, which filters out
whitespace characters (a blank or tab; note: bad idea to include newline
in this!).
Tools/scripts/ndiff.py is a command-line front-end to this function.
Example:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
... 'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> print(''.join(diff), end="")
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
"""
return Differ(linejunk, charjunk).compare(a, b)
def _mdiff(fromlines, tolines, context=None, linejunk=None,
charjunk=IS_CHARACTER_JUNK):
r"""Returns generator yielding marked up from/to side by side differences.
Arguments:
fromlines -- list of text lines to be compared to tolines
tolines -- list of text lines to be compared to fromlines
context -- number of context lines to display on each side of difference,
if None, all from/to text lines will be generated.
linejunk -- passed on to ndiff (see ndiff documentation)
charjunk -- passed on to ndiff (see ndiff documentation)
This function returns an iterator which returns a tuple:
(from line tuple, to line tuple, boolean flag)
from/to line tuple -- (line num, line text)
line num -- integer or None (to indicate a context separation)
line text -- original line text with following markers inserted:
'\0+' -- marks start of added text
'\0-' -- marks start of deleted text
'\0^' -- marks start of changed text
'\1' -- marks end of added/deleted/changed text
boolean flag -- None indicates context separation, True indicates
either "from" or "to" line contains a change, otherwise False.
This function/iterator was originally developed to generate side by side
file difference for making HTML pages (see HtmlDiff class for example
usage).
Note, this function utilizes the ndiff function to generate the side by
side difference markup. Optional ndiff arguments may be passed to this
function and they in turn will be passed to ndiff.
"""
import re
# regular expression for finding intraline change indices
change_re = re.compile(r'(\++|\-+|\^+)')
# create the difference iterator to generate the differences
diff_lines_iterator = ndiff(fromlines,tolines,linejunk,charjunk)
def _make_line(lines, format_key, side, num_lines=[0,0]):
"""Returns line of text with user's change markup and line formatting.
lines -- list of lines from the ndiff generator to produce a line of
text from. When producing the line of text to return, the
lines used are removed from this list.
format_key -- '+' return first line in list with "add" markup around
the entire line.
'-' return first line in list with "delete" markup around
the entire line.
'?' return first line in list with add/delete/change
intraline markup (indices obtained from second line)
None return first line in list with no markup
side -- index into the num_lines list (0=from,1=to)
num_lines -- from/to current line number. This is NOT intended to be a
passed parameter. It is present as a keyword argument to
maintain memory of the current line numbers between calls
of this function.
Note, this function is purposefully not defined at the module scope so
that data it needs from its parent function (within whose context it
is defined) does not need to be of module scope.
"""
num_lines[side] += 1
# Handle case where no user markup is to be added, just return line of
# text with user's line format to allow for usage of the line number.
if format_key is None:
return (num_lines[side],lines.pop(0)[2:])
# Handle case of intraline changes
if format_key == '?':
text, markers = lines.pop(0), lines.pop(0)
# find intraline changes (store change type and indices in tuples)
sub_info = []
def record_sub_info(match_object,sub_info=sub_info):
sub_info.append([match_object.group(1)[0],match_object.span()])
return match_object.group(1)
change_re.sub(record_sub_info,markers)
# process each tuple inserting our special marks that won't be
# noticed by an xml/html escaper.
for key,(begin,end) in sub_info[::-1]:
text = text[0:begin]+'\0'+key+text[begin:end]+'\1'+text[end:]
text = text[2:]
# Handle case of add/delete entire line
else:
text = lines.pop(0)[2:]
# if line of text is just a newline, insert a space so there is
# something for the user to highlight and see.
if not text:
text = ' '
# insert marks that won't be noticed by an xml/html escaper.
text = '\0' + format_key + text + '\1'
# Return line of text; first allow user's line formatter to do its
# thing (such as adding the line number), then replace the special
# marks with the user's change markup.
return (num_lines[side],text)
def _line_iterator():
"""Yields from/to lines of text with a change indication.
This function is an iterator. It itself pulls lines from a
differencing iterator, processes them and yields them. When it can
it yields both a "from" and a "to" line, otherwise it will yield one
or the other. In addition to yielding the lines of from/to text, a
boolean flag is yielded to indicate if the text line(s) have
differences in them.
Note, this function is purposefully not defined at the module scope so
that data it needs from its parent function (within whose context it
is defined) does not need to be of module scope.
"""
lines = []
num_blanks_pending, num_blanks_to_yield = 0, 0
while True:
# Load up next 4 lines so we can look ahead, create strings which
# are a concatenation of the first character of each of the 4 lines
# so we can do some very readable comparisons.
while len(lines) < 4:
try:
lines.append(next(diff_lines_iterator))
except StopIteration:
lines.append('X')
s = ''.join([line[0] for line in lines])
if s.startswith('X'):
# When no more lines, pump out any remaining blank lines so the
# corresponding add/delete lines get a matching blank line so
# all line pairs get yielded at the next level.
num_blanks_to_yield = num_blanks_pending
elif s.startswith('-?+?'):
# simple intraline change
yield _make_line(lines,'?',0), _make_line(lines,'?',1), True
continue
elif s.startswith('--++'):
# in delete block, add block coming: we do NOT want to get
# caught up on blank lines yet, just process the delete line
num_blanks_pending -= 1
yield _make_line(lines,'-',0), None, True
continue
elif s.startswith(('--?+', '--+', '- ')):
# in delete block and see an intraline change or unchanged line
# coming: yield the delete line and then blanks
from_line,to_line = _make_line(lines,'-',0), None
num_blanks_to_yield,num_blanks_pending = num_blanks_pending-1,0
elif s.startswith('-+?'):
# intraline change
yield _make_line(lines,None,0), _make_line(lines,'?',1), True
continue
elif s.startswith('-?+'):
# intraline change
yield _make_line(lines,'?',0), _make_line(lines,None,1), True
continue
elif s.startswith('-'):
# delete FROM line
num_blanks_pending -= 1
yield _make_line(lines,'-',0), None, True
continue
elif s.startswith('+--'):
# in add block, delete block coming: we do NOT want to get
# caught up on blank lines yet, just process the add line
num_blanks_pending += 1
yield None, _make_line(lines,'+',1), True
continue
elif s.startswith(('+ ', '+-')):
# will be leaving an add block: yield blanks then add line
from_line, to_line = None, _make_line(lines,'+',1)
num_blanks_to_yield,num_blanks_pending = num_blanks_pending+1,0
elif s.startswith('+'):
# inside an add block, yield the add line
num_blanks_pending += 1
yield None, _make_line(lines,'+',1), True
continue
elif s.startswith(' '):
# unchanged text, yield it to both sides
yield _make_line(lines[:],None,0),_make_line(lines,None,1),False
continue
# Catch up on the blank lines so when we yield the next from/to
# pair, they are lined up.
while(num_blanks_to_yield < 0):
num_blanks_to_yield += 1
yield None,('','\n'),True
while(num_blanks_to_yield > 0):
num_blanks_to_yield -= 1
yield ('','\n'),None,True
if s.startswith('X'):
return  # exhausted input; PEP 479 forbids StopIteration escaping a generator
else:
yield from_line,to_line,True
def _line_pair_iterator():
"""Yields from/to lines of text with a change indication.
This function is an iterator. It itself pulls lines from the line
iterator. Its difference from that iterator is that this function
always yields a pair of from/to text lines (with the change
indication). If necessary it will collect single from/to lines
until it has a matching pair from/to pair to yield.
Note, this function is purposefully not defined at the module scope so
that data it needs from its parent function (within whose context it
is defined) does not need to be of module scope.
"""
line_iterator = _line_iterator()
fromlines,tolines=[],[]
while True:
# Collecting lines of text until we have a from/to pair
while (len(fromlines)==0 or len(tolines)==0):
from_line, to_line, found_diff = next(line_iterator)
if from_line is not None:
fromlines.append((from_line,found_diff))
if to_line is not None:
tolines.append((to_line,found_diff))
# Once we have a pair, remove them from the collection and yield it
from_line, fromDiff = fromlines.pop(0)
to_line, to_diff = tolines.pop(0)
yield (from_line,to_line,fromDiff or to_diff)
# Handle case where user does not want context differencing, just yield
# them up without doing anything else with them.
line_pair_iterator = _line_pair_iterator()
if context is None:
while True:
yield next(line_pair_iterator)
# Handle case where user wants context differencing. We must do some
# storage of lines until we know for sure that they are to be yielded.
else:
context += 1
lines_to_write = 0
while True:
# Store lines up until we find a difference, note use of a
# circular queue because we only need to keep around what
# we need for context.
index, contextLines = 0, [None]*(context)
found_diff = False
while(found_diff is False):
from_line, to_line, found_diff = next(line_pair_iterator)
i = index % context
contextLines[i] = (from_line, to_line, found_diff)
index += 1
# Yield lines that we have collected so far, but first yield
# the user's separator.
if index > context:
yield None, None, None
lines_to_write = context
else:
lines_to_write = index
index = 0
while(lines_to_write):
i = index % context
index += 1
yield contextLines[i]
lines_to_write -= 1
# Now yield the context lines after the change
lines_to_write = context-1
while(lines_to_write):
from_line, to_line, found_diff = next(line_pair_iterator)
# If another change within the context, extend the context
if found_diff:
lines_to_write = context-1
else:
lines_to_write -= 1
yield from_line, to_line, found_diff
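# Illustrative sketch (not part of the upstream source): what _mdiff
# yields for identical one-line inputs -- numbered from/to tuples plus
# a change flag, False here because nothing differs:
#
#     next(_mdiff(['a\n'], ['a\n']))
#     # -> ((1, 'a\n'), (1, 'a\n'), False)
#
# Changed lines instead carry the '\0+'/'\0-'/'\0^' ... '\1' markers
# around the added/deleted/changed spans, as documented above.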
_file_template = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1" />
<title></title>
<style type="text/css">%(styles)s
</style>
</head>
<body>
%(table)s%(legend)s
</body>
</html>"""
_styles = """
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}"""
_table_template = """
<table class="diff" id="difflib_chg_%(prefix)s_top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
%(header_row)s
<tbody>
%(data_rows)s </tbody>
</table>"""
_legend = """
<table class="diff" summary="Legends">
<tr> <th colspan="2"> Legends </th> </tr>
<tr> <td> <table border="" summary="Colors">
<tr><th> Colors </th> </tr>
<tr><td class="diff_add"> Added </td></tr>
<tr><td class="diff_chg">Changed</td> </tr>
<tr><td class="diff_sub">Deleted</td> </tr>
</table></td>
<td> <table border="" summary="Links">
<tr><th colspan="2"> Links </th> </tr>
<tr><td>(f)irst change</td> </tr>
<tr><td>(n)ext change</td> </tr>
<tr><td>(t)op</td> </tr>
</table></td> </tr>
</table>"""
class HtmlDiff(object):
"""For producing HTML side by side comparison with change highlights.
This class can be used to create an HTML table (or a complete HTML file
containing the table) showing a side by side, line by line comparison
of text with inter-line and intra-line change highlights. The table can
be generated in either full or contextual difference mode.
The following methods are provided for HTML generation:
make_table -- generates HTML for a single side by side table
make_file -- generates complete HTML file with a single side by side table
See tools/scripts/diff.py for an example usage of this class.
"""
_file_template = _file_template
_styles = _styles
_table_template = _table_template
_legend = _legend
_default_prefix = 0
def __init__(self,tabsize=8,wrapcolumn=None,linejunk=None,
charjunk=IS_CHARACTER_JUNK):
"""HtmlDiff instance initializer
Arguments:
tabsize -- tab stop spacing, defaults to 8.
wrapcolumn -- column number where lines are broken and wrapped,
defaults to None where lines are not wrapped.
linejunk,charjunk -- keyword arguments passed into ndiff() (used by
HtmlDiff() to generate the side by side HTML differences). See
ndiff() documentation for argument default values and descriptions.
"""
self._tabsize = tabsize
self._wrapcolumn = wrapcolumn
self._linejunk = linejunk
self._charjunk = charjunk
def make_file(self,fromlines,tolines,fromdesc='',todesc='',context=False,
numlines=5):
"""Returns HTML file of side by side comparison with change highlights
Arguments:
fromlines -- list of "from" lines
tolines -- list of "to" lines
fromdesc -- "from" file column header string
todesc -- "to" file column header string
context -- set to True for contextual differences (defaults to False
which shows full differences).
numlines -- number of context lines. When context is set True,
controls number of lines displayed before and after the change.
When context is False, controls the number of lines to place
the "next" link anchors before the next change (so click of
"next" link jumps to just before the change).
"""
return self._file_template % dict(
styles = self._styles,
legend = self._legend,
table = self.make_table(fromlines,tolines,fromdesc,todesc,
context=context,numlines=numlines))
def _tab_newline_replace(self,fromlines,tolines):
"""Returns from/to line lists with tabs expanded and newlines removed.
Instead of tab characters being replaced by the number of spaces
needed to fill in to the next tab stop, this function will fill
the space with tab characters. This is done so that the difference
algorithms can identify changes in a file when tabs are replaced by
spaces and vice versa. At the end of the HTML generation, the tab
characters will be replaced with a nonbreakable space.
"""
def expand_tabs(line):
# hide real spaces
line = line.replace(' ','\0')
# expand tabs into spaces
line = line.expandtabs(self._tabsize)
# replace spaces from expanded tabs back into tab characters
# (we'll replace them with markup after we do differencing)
line = line.replace(' ','\t')
return line.replace('\0',' ').rstrip('\n')
fromlines = [expand_tabs(line) for line in fromlines]
tolines = [expand_tabs(line) for line in tolines]
return fromlines,tolines
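# Illustrative sketch (not part of the upstream source; calls a private
# method purely for illustration): with the default tabsize of 8, a
# leading tab becomes eight literal tab characters while real spaces
# survive the round trip through the '\0' placeholder:
#
#     HtmlDiff()._tab_newline_replace(['\ta b\n'], [])
#     # -> (['\t\t\t\t\t\t\t\ta b'], [])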
def _split_line(self,data_list,line_num,text):
"""Builds list of text lines by splitting text lines at wrap point
This function will determine if the input text line needs to be
wrapped (split) into separate lines. If so, the first wrap point
will be determined and the first line appended to the output
text line list. This function is used recursively to handle
the second part of the split line to further split it.
"""
# if blank line or context separator, just add it to the output list
if not line_num:
data_list.append((line_num,text))
return
# if line text doesn't need wrapping, just add it to the output list
size = len(text)
max = self._wrapcolumn
if (size <= max) or ((size -(text.count('\0')*3)) <= max):
data_list.append((line_num,text))
return
# scan text looking for the wrap point, keeping track if the wrap
# point is inside markers
i = 0
n = 0
mark = ''
while n < max and i < size:
if text[i] == '\0':
i += 1
mark = text[i]
i += 1
elif text[i] == '\1':
i += 1
mark = ''
else:
i += 1
n += 1
# wrap point is inside text, break it up into separate lines
line1 = text[:i]
line2 = text[i:]
# if wrap point is inside markers, place end marker at end of first
# line and start marker at beginning of second line because each
# line will have its own table tag markup around it.
if mark:
line1 = line1 + '\1'
line2 = '\0' + mark + line2
# tack on first line onto the output list
data_list.append((line_num,line1))
# use this routine again to wrap the remaining text
self._split_line(data_list,'>',line2)
def _line_wrapper(self,diffs):
"""Returns iterator that splits (wraps) mdiff text lines"""
# pull from/to data and flags from mdiff iterator
for fromdata,todata,flag in diffs:
# check for context separators and pass them through
if flag is None:
yield fromdata,todata,flag
continue
(fromline,fromtext),(toline,totext) = fromdata,todata
# for each from/to line split it at the wrap column to form
# list of text lines.
fromlist,tolist = [],[]
self._split_line(fromlist,fromline,fromtext)
self._split_line(tolist,toline,totext)
# yield from/to line in pairs inserting blank lines as
# necessary when one side has more wrapped lines
while fromlist or tolist:
if fromlist:
fromdata = fromlist.pop(0)
else:
fromdata = ('',' ')
if tolist:
todata = tolist.pop(0)
else:
todata = ('',' ')
yield fromdata,todata,flag
def _collect_lines(self,diffs):
"""Collects mdiff output into separate lists
Before storing the mdiff from/to data into a list, it is converted
into a single line of text with HTML markup.
"""
fromlist,tolist,flaglist = [],[],[]
# pull from/to data and flags from mdiff style iterator
for fromdata,todata,flag in diffs:
try:
# store HTML markup of the lines into the lists
fromlist.append(self._format_line(0,flag,*fromdata))
tolist.append(self._format_line(1,flag,*todata))
except TypeError:
# exceptions occur for lines where context separators go
fromlist.append(None)
tolist.append(None)
flaglist.append(flag)
return fromlist,tolist,flaglist
def _format_line(self,side,flag,linenum,text):
"""Returns HTML markup of "from" / "to" text lines
side -- 0 or 1 indicating "from" or "to" text
flag -- indicates if difference on line
linenum -- line number (used for line number column)
text -- line text to be marked up
"""
try:
linenum = '%d' % linenum
id = ' id="%s%s"' % (self._prefix[side],linenum)
except TypeError:
# handle blank lines where linenum is '>' or ''
id = ''
# replace those things that would get confused with HTML symbols
text=text.replace("&","&amp;").replace(">","&gt;").replace("<","&lt;")
# make spaces non-breakable so they don't get compressed or line wrapped
text = text.replace(' ','&nbsp;').rstrip()
return '<td class="diff_header"%s>%s</td><td nowrap="nowrap">%s</td>' \
% (id,linenum,text)
def _make_prefix(self):
"""Create unique anchor prefixes"""
# Generate a unique anchor prefix so multiple tables
# can exist on the same HTML page without conflicts.
fromprefix = "from%d_" % HtmlDiff._default_prefix
toprefix = "to%d_" % HtmlDiff._default_prefix
HtmlDiff._default_prefix += 1
# store prefixes so line format method has access
self._prefix = [fromprefix,toprefix]
def _convert_flags(self,fromlist,tolist,flaglist,context,numlines):
"""Makes list of "next" links"""
# all anchor names will be generated using the unique "to" prefix
toprefix = self._prefix[1]
# process change flags, generating middle column of next anchors/links
next_id = ['']*len(flaglist)
next_href = ['']*len(flaglist)
num_chg, in_change = 0, False
last = 0
for i,flag in enumerate(flaglist):
if flag:
if not in_change:
in_change = True
last = i
# at the beginning of a change, drop an anchor a few lines
# (the context lines) before the change for the previous
# link
i = max([0,i-numlines])
next_id[i] = ' id="difflib_chg_%s_%d"' % (toprefix,num_chg)
# at the beginning of a change, drop a link to the next
# change
num_chg += 1
next_href[last] = '<a href="#difflib_chg_%s_%d">n</a>' % (
toprefix,num_chg)
else:
in_change = False
# check for cases where there is no content to avoid exceptions
if not flaglist:
flaglist = [False]
next_id = ['']
next_href = ['']
last = 0
if context:
fromlist = ['<td></td><td> No Differences Found </td>']
tolist = fromlist
else:
fromlist = tolist = ['<td></td><td> Empty File </td>']
# if not a change on first line, drop a link
if not flaglist[0]:
next_href[0] = '<a href="#difflib_chg_%s_0">f</a>' % toprefix
# redo the last link to link to the top
next_href[last] = '<a href="#difflib_chg_%s_top">t</a>' % (toprefix)
return fromlist,tolist,flaglist,next_href,next_id
def make_table(self,fromlines,tolines,fromdesc='',todesc='',context=False,
numlines=5):
"""Returns HTML table of side by side comparison with change highlights
Arguments:
fromlines -- list of "from" lines
tolines -- list of "to" lines
fromdesc -- "from" file column header string
todesc -- "to" file column header string
context -- set to True for contextual differences (defaults to False
which shows full differences).
numlines -- number of context lines. When context is set True,
controls number of lines displayed before and after the change.
When context is False, controls the number of lines to place
the "next" link anchors before the next change (so click of
"next" link jumps to just before the change).
"""
# make unique anchor prefixes so that multiple tables may exist
# on the same page without conflict.
self._make_prefix()
# change tabs to spaces before it gets more difficult after we insert
# markup
fromlines,tolines = self._tab_newline_replace(fromlines,tolines)
# create diffs iterator which generates side by side from/to data
if context:
context_lines = numlines
else:
context_lines = None
diffs = _mdiff(fromlines,tolines,context_lines,linejunk=self._linejunk,
charjunk=self._charjunk)
# set up iterator to wrap lines that exceed desired width
if self._wrapcolumn:
diffs = self._line_wrapper(diffs)
# collect up from/to lines and flags into lists (also format the lines)
fromlist,tolist,flaglist = self._collect_lines(diffs)
# process change flags, generating middle column of next anchors/links
fromlist,tolist,flaglist,next_href,next_id = self._convert_flags(
fromlist,tolist,flaglist,context,numlines)
s = []
fmt = ' <tr><td class="diff_next"%s>%s</td>%s' + \
'<td class="diff_next">%s</td>%s</tr>\n'
for i in range(len(flaglist)):
if flaglist[i] is None:
# mdiff yields None on separator lines; skip the bogus ones
# generated for the first line
if i > 0:
s.append(' </tbody> \n <tbody>\n')
else:
s.append( fmt % (next_id[i],next_href[i],fromlist[i],
next_href[i],tolist[i]))
if fromdesc or todesc:
header_row = '<thead><tr>%s%s%s%s</tr></thead>' % (
'<th class="diff_next"><br /></th>',
'<th colspan="2" class="diff_header">%s</th>' % fromdesc,
'<th class="diff_next"><br /></th>',
'<th colspan="2" class="diff_header">%s</th>' % todesc)
else:
header_row = ''
table = self._table_template % dict(
data_rows=''.join(s),
header_row=header_row,
prefix=self._prefix[1])
return table.replace('\0+','<span class="diff_add">'). \
replace('\0-','<span class="diff_sub">'). \
replace('\0^','<span class="diff_chg">'). \
replace('\1','</span>'). \
replace('\t','&nbsp;')
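# Illustrative sketch (assumed usage, not part of the upstream source):
# producing a complete side-by-side HTML diff file with the class above:
#
#     html = HtmlDiff(wrapcolumn=70).make_file(
#         'one\ntwo\n'.splitlines(True), 'one\n2wo\n'.splitlines(True),
#         'before', 'after', context=True, numlines=2)
#     with open('diff.html', 'w') as f:
#         f.write(html)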
del re
def restore(delta, which):
r"""
Generate one of the two sequences that generated a delta.
Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract
lines originating from file 1 or 2 (parameter `which`), stripping off line
prefixes.
Examples:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
... 'ore\ntree\nemu\n'.splitlines(keepends=True))
>>> diff = list(diff)
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu
"""
try:
tag = {1: "- ", 2: "+ "}[int(which)]
except KeyError:
raise ValueError('unknown delta choice (must be 1 or 2): %r'
% which)
prefixes = (" ", tag)
for line in delta:
if line[:2] in prefixes:
yield line[2:]
def _test():
import doctest, difflib
return doctest.testmod(difflib)
if __name__ == "__main__":
_test()
"""Disassembler of Python byte code into mnemonics."""
import sys
import types
import collections
import io
from opcode import *
from opcode import __all__ as _opcodes_all
__all__ = ["code_info", "dis", "disassemble", "distb", "disco",
"findlinestarts", "findlabels", "show_code",
"get_instructions", "Instruction", "Bytecode"] + _opcodes_all
del _opcodes_all
_have_code = (types.MethodType, types.FunctionType, types.CodeType, type)
def _try_compile(source, name):
"""Attempts to compile the given source, first as an expression and
then as a statement if the first approach fails.
Utility function to accept strings in functions that otherwise
expect code objects
"""
try:
c = compile(source, name, 'eval')
except SyntaxError:
c = compile(source, name, 'exec')
return c
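# Illustrative sketch (not part of the upstream source): an expression
# compiles in 'eval' mode, while a statement raises SyntaxError there
# and falls back to 'exec' mode:
#
#     _try_compile('1 + 1', '<demo>')   # compiled with mode='eval'
#     _try_compile('x = 1', '<demo>')   # falls back to mode='exec'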
def dis(x=None, *, file=None):
"""Disassemble classes, methods, functions, or code.
With no argument, disassemble the last traceback.
"""
if x is None:
distb(file=file)
return
if hasattr(x, '__func__'): # Method
x = x.__func__
if hasattr(x, '__code__'): # Function
x = x.__code__
if hasattr(x, '__dict__'): # Class or module
items = sorted(x.__dict__.items())
for name, x1 in items:
if isinstance(x1, _have_code):
print("Disassembly of %s:" % name, file=file)
try:
dis(x1, file=file)
except TypeError as msg:
print("Sorry:", msg, file=file)
print(file=file)
elif hasattr(x, 'co_code'): # Code object
disassemble(x, file=file)
elif isinstance(x, (bytes, bytearray)): # Raw bytecode
_disassemble_bytes(x, file=file)
elif isinstance(x, str): # Source code
_disassemble_str(x, file=file)
else:
raise TypeError("don't know how to disassemble %s objects" %
type(x).__name__)
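# Illustrative sketch (assumed usage, not part of the upstream source):
# dis() accepts functions, methods, classes, code objects, raw bytecode,
# or source strings, dispatching as in the chain of checks above:
#
#     def add(a, b):
#         return a + b
#     dis(add)        # unwraps to add.__code__ and disassembles it
#     dis('a + b')    # a string is compiled first, then disassembled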
def distb(tb=None, *, file=None):
"""Disassemble a traceback (default: last traceback)."""
if tb is None:
try:
tb = sys.last_traceback
except AttributeError:
raise RuntimeError("no last traceback to disassemble")
while tb.tb_next: tb = tb.tb_next
disassemble(tb.tb_frame.f_code, tb.tb_lasti, file=file)
# The inspect module interrogates this dictionary to build its
# list of CO_* constants. It is also used by pretty_flags to
# turn the co_flags field into a human readable list.
COMPILER_FLAG_NAMES = {
1: "OPTIMIZED",
2: "NEWLOCALS",
4: "VARARGS",
8: "VARKEYWORDS",
16: "NESTED",
32: "GENERATOR",
64: "NOFREE",
}
def pretty_flags(flags):
"""Return pretty representation of code flags."""
names = []
for i in range(32):
flag = 1<<i
if flags & flag:
names.append(COMPILER_FLAG_NAMES.get(flag, hex(flag)))
flags ^= flag
if not flags:
break
else:
names.append(hex(flags))
return ", ".join(names)
def _get_code_object(x):
"""Helper to handle methods, functions, strings and raw code objects"""
if hasattr(x, '__func__'): # Method
x = x.__func__
if hasattr(x, '__code__'): # Function
x = x.__code__
if isinstance(x, str): # Source code
x = _try_compile(x, "<disassembly>")
if hasattr(x, 'co_code'): # Code object
return x
raise TypeError("don't know how to disassemble %s objects" %
type(x).__name__)
def code_info(x):
"""Formatted details of methods, functions, or code."""
return _format_code_info(_get_code_object(x))
def _format_code_info(co):
lines = []
lines.append("Name: %s" % co.co_name)
lines.append("Filename: %s" % co.co_filename)
lines.append("Argument count: %s" % co.co_argcount)
lines.append("Kw-only arguments: %s" % co.co_kwonlyargcount)
lines.append("Number of locals: %s" % co.co_nlocals)
lines.append("Stack size: %s" % co.co_stacksize)
lines.append("Flags: %s" % pretty_flags(co.co_flags))
if co.co_consts:
lines.append("Constants:")
for i_c in enumerate(co.co_consts):
lines.append("%4d: %r" % i_c)
if co.co_names:
lines.append("Names:")
for i_n in enumerate(co.co_names):
lines.append("%4d: %s" % i_n)
if co.co_varnames:
lines.append("Variable names:")
for i_n in enumerate(co.co_varnames):
lines.append("%4d: %s" % i_n)
if co.co_freevars:
lines.append("Free variables:")
for i_n in enumerate(co.co_freevars):
lines.append("%4d: %s" % i_n)
if co.co_cellvars:
lines.append("Cell variables:")
for i_n in enumerate(co.co_cellvars):
lines.append("%4d: %s" % i_n)
return "\n".join(lines)
def show_code(co, *, file=None):
"""Print details of methods, functions, or code to *file*.
If *file* is not provided, the output is printed on stdout.
"""
print(code_info(co), file=file)
_Instruction = collections.namedtuple("_Instruction",
"opname opcode arg argval argrepr offset starts_line is_jump_target")
class Instruction(_Instruction):
"""Details for a bytecode operation
Defined fields:
opname - human readable name for operation
opcode - numeric code for operation
arg - numeric argument to operation (if any), otherwise None
argval - resolved arg value (if known), otherwise same as arg
argrepr - human readable description of operation argument
offset - start index of operation within bytecode sequence
starts_line - line started by this opcode (if any), otherwise None
is_jump_target - True if other code jumps to here, otherwise False
"""
def _disassemble(self, lineno_width=3, mark_as_current=False):
"""Format instruction details for inclusion in disassembly output
*lineno_width* sets the width of the line number field (0 omits it)
*mark_as_current* inserts a '-->' marker arrow as part of the line
"""
fields = []
# Column: Source code line number
if lineno_width:
if self.starts_line is not None:
lineno_fmt = "%%%dd" % lineno_width
fields.append(lineno_fmt % self.starts_line)
else:
fields.append(' ' * lineno_width)
# Column: Current instruction indicator
if mark_as_current:
fields.append('-->')
else:
fields.append(' ')
# Column: Jump target marker
if self.is_jump_target:
fields.append('>>')
else:
fields.append(' ')
# Column: Instruction offset from start of code sequence
fields.append(repr(self.offset).rjust(4))
# Column: Opcode name
fields.append(self.opname.ljust(20))
# Column: Opcode argument
if self.arg is not None:
fields.append(repr(self.arg).rjust(5))
# Column: Opcode argument details
if self.argrepr:
fields.append('(' + self.argrepr + ')')
return ' '.join(fields).rstrip()
def get_instructions(x, *, first_line=None):
"""Iterator for the opcodes in methods, functions or code
Generates a series of Instruction named tuples giving the details of
each operation in the supplied code.
If *first_line* is not None, it indicates the line number that should
be reported for the first source line in the disassembled code.
Otherwise, the source line information (if any) is taken directly from
the disassembled code object.
"""
co = _get_code_object(x)
cell_names = co.co_cellvars + co.co_freevars
linestarts = dict(findlinestarts(co))
if first_line is not None:
line_offset = first_line - co.co_firstlineno
else:
line_offset = 0
return _get_instructions_bytes(co.co_code, co.co_varnames, co.co_names,
co.co_consts, cell_names, linestarts,
line_offset)
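# Hedged sketch of iterating get_instructions(); the opcode sequence shown
# is what CPython 3.4 emits for this source and may differ elsewhere:
# >>> for ins in get_instructions(compile('x = 1', '<demo>', 'exec')):
# ...     print(ins.opname, ins.argval)
# LOAD_CONST 1
# STORE_NAME x
# LOAD_CONST None
# RETURN_VALUE None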
def _get_const_info(const_index, const_list):
"""Helper to get optional details about const references
Returns the dereferenced constant and its repr if the constant
list is defined.
Otherwise returns the constant index and its repr().
"""
argval = const_index
if const_list is not None:
argval = const_list[const_index]
return argval, repr(argval)
def _get_name_info(name_index, name_list):
"""Helper to get optional details about named references
Returns the dereferenced name as both value and repr if the name
list is defined.
Otherwise returns the name index and its repr().
"""
argval = name_index
if name_list is not None:
argval = name_list[name_index]
argrepr = argval
else:
argrepr = repr(argval)
return argval, argrepr
def _get_instructions_bytes(code, varnames=None, names=None, constants=None,
cells=None, linestarts=None, line_offset=0):
"""Iterate over the instructions in a bytecode string.
Generates a sequence of Instruction namedtuples giving the details of each
opcode. Additional information about the code's runtime environment
(e.g. variable names, constants) can be specified using optional
arguments.
"""
labels = findlabels(code)
extended_arg = 0
starts_line = None
free = None
# enumerate() is not an option, since we sometimes process
# multiple elements on a single pass through the loop
n = len(code)
i = 0
while i < n:
op = code[i]
offset = i
if linestarts is not None:
starts_line = linestarts.get(i, None)
if starts_line is not None:
starts_line += line_offset
is_jump_target = i in labels
i = i+1
arg = None
argval = None
argrepr = ''
if op >= HAVE_ARGUMENT:
arg = code[i] + code[i+1]*256 + extended_arg
extended_arg = 0
i = i+2
if op == EXTENDED_ARG:
extended_arg = arg*65536
# Set argval to the dereferenced value of the argument when
# available, and argrepr to the string representation of argval.
# _disassemble_bytes needs the string repr of the
# raw name index for LOAD_GLOBAL, LOAD_CONST, etc.
argval = arg
if op in hasconst:
argval, argrepr = _get_const_info(arg, constants)
elif op in hasname:
argval, argrepr = _get_name_info(arg, names)
elif op in hasjrel:
argval = i + arg
argrepr = "to " + repr(argval)
elif op in haslocal:
argval, argrepr = _get_name_info(arg, varnames)
elif op in hascompare:
argval = cmp_op[arg]
argrepr = argval
elif op in hasfree:
argval, argrepr = _get_name_info(arg, cells)
elif op in hasnargs:
argrepr = "%d positional, %d keyword pair" % (code[i-2], code[i-1])
yield Instruction(opname[op], op,
arg, argval, argrepr,
offset, starts_line, is_jump_target)
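# Note on the decoding above (pre-wordcode bytecode): an opcode at or above
# HAVE_ARGUMENT is followed by a two-byte little-endian argument, hence
# arg = code[i] + code[i+1]*256; an EXTENDED_ARG prefix supplies the high
# 16 bits via the arg*65536 term, allowing arguments up to 32 bits.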
def disassemble(co, lasti=-1, *, file=None):
"""Disassemble a code object."""
cell_names = co.co_cellvars + co.co_freevars
linestarts = dict(findlinestarts(co))
_disassemble_bytes(co.co_code, lasti, co.co_varnames, co.co_names,
co.co_consts, cell_names, linestarts, file=file)
def _disassemble_bytes(code, lasti=-1, varnames=None, names=None,
constants=None, cells=None, linestarts=None,
*, file=None, line_offset=0):
# Omit the line number column entirely if we have no line number info
show_lineno = linestarts is not None
# TODO?: Adjust width upwards if max(linestarts.values()) >= 1000?
lineno_width = 3 if show_lineno else 0
for instr in _get_instructions_bytes(code, varnames, names,
constants, cells, linestarts,
line_offset=line_offset):
new_source_line = (show_lineno and
instr.starts_line is not None and
instr.offset > 0)
if new_source_line:
print(file=file)
is_current_instr = instr.offset == lasti
print(instr._disassemble(lineno_width, is_current_instr), file=file)
def _disassemble_str(source, *, file=None):
"""Compile the source string, then disassemble the code object."""
disassemble(_try_compile(source, '<dis>'), file=file)
disco = disassemble # XXX For backwards compatibility
def findlabels(code):
"""Detect all offsets in a byte code which are jump targets.
Return the list of offsets.
"""
labels = []
# enumerate() is not an option, since we sometimes process
# multiple elements on a single pass through the loop
n = len(code)
i = 0
while i < n:
op = code[i]
i = i+1
if op >= HAVE_ARGUMENT:
arg = code[i] + code[i+1]*256
i = i+2
label = -1
if op in hasjrel:
label = i+arg
elif op in hasjabs:
label = arg
if label >= 0:
if label not in labels:
labels.append(label)
return labels
def findlinestarts(code):
"""Find the offsets in a byte code which are start of lines in the source.
Generate pairs (offset, lineno) as described in Python/compile.c.
"""
byte_increments = list(code.co_lnotab[0::2])
line_increments = list(code.co_lnotab[1::2])
lastlineno = None
lineno = code.co_firstlineno
addr = 0
for byte_incr, line_incr in zip(byte_increments, line_increments):
if byte_incr:
if lineno != lastlineno:
yield (addr, lineno)
lastlineno = lineno
addr += byte_incr
lineno += line_incr
if lineno != lastlineno:
yield (addr, lineno)
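# Hedged illustration of the lnotab format consumed above: co_lnotab holds
# alternating (byte_increment, line_increment) bytes, so b'\x06\x01\x08\x02'
# with co_firstlineno == 10 would yield (0, 10), (6, 11), (14, 13).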
class Bytecode:
"""The bytecode operations of a piece of code
Instantiate this with a function, method, string of code, or a code object
(as returned by compile()).
Iterating over this yields the bytecode operations as Instruction instances.
"""
def __init__(self, x, *, first_line=None, current_offset=None):
self.codeobj = co = _get_code_object(x)
if first_line is None:
self.first_line = co.co_firstlineno
self._line_offset = 0
else:
self.first_line = first_line
self._line_offset = first_line - co.co_firstlineno
self._cell_names = co.co_cellvars + co.co_freevars
self._linestarts = dict(findlinestarts(co))
self._original_object = x
self.current_offset = current_offset
def __iter__(self):
co = self.codeobj
return _get_instructions_bytes(co.co_code, co.co_varnames, co.co_names,
co.co_consts, self._cell_names,
self._linestarts,
line_offset=self._line_offset)
def __repr__(self):
return "{}({!r})".format(self.__class__.__name__,
self._original_object)
@classmethod
def from_traceback(cls, tb):
""" Construct a Bytecode from the given traceback """
while tb.tb_next:
tb = tb.tb_next
return cls(tb.tb_frame.f_code, current_offset=tb.tb_lasti)
def info(self):
"""Return formatted information about the code object."""
return _format_code_info(self.codeobj)
def dis(self):
"""Return a formatted view of the bytecode operations."""
co = self.codeobj
if self.current_offset is not None:
offset = self.current_offset
else:
offset = -1
with io.StringIO() as output:
_disassemble_bytes(co.co_code, varnames=co.co_varnames,
names=co.co_names, constants=co.co_consts,
cells=self._cell_names,
linestarts=self._linestarts,
line_offset=self._line_offset,
file=output,
lasti=offset)
return output.getvalue()
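# Hedged usage sketch for the Bytecode class above:
# >>> bc = Bytecode('x = 1')
# >>> [ins.opname for ins in bc][:2]
# ['LOAD_CONST', 'STORE_NAME']
# >>> text = bc.dis()   # the same text dis() would print, as a string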
def _test():
"""Simple test program to disassemble a file."""
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('infile', type=argparse.FileType(), nargs='?', default='-')
args = parser.parse_args()
with args.infile as infile:
source = infile.read()
code = compile(source, args.infile.name, "exec")
dis(code)
if __name__ == "__main__":
_test()
# Module doctest.
# Released to the public domain 16-Jan-2001, by Tim Peters ([email protected]).
# Major enhancements and refactoring by:
# Jim Fulton
# Edward Loper
# Provided as-is; use at your own risk; no warranty; no promises; enjoy!
r"""Module doctest -- a framework for running examples in docstrings.
In simplest use, end each module M to be tested with:
def _test():
import doctest
doctest.testmod()
if __name__ == "__main__":
_test()
Then running the module as a script will cause the examples in the
docstrings to get executed and verified:
python M.py
This won't display anything unless an example fails, in which case the
failing example(s) and the cause(s) of the failure(s) are printed to stdout
(why not stderr? because stderr is a lame hack <0.2 wink>), and the final
line of output is "Test failed.".
Run it with the -v switch instead:
python M.py -v
and a detailed report of all examples tried is printed to stdout, along
with assorted summaries at the end.
You can force verbose mode by passing "verbose=True" to testmod, or prohibit
it by passing "verbose=False". In either of those cases, sys.argv is not
examined by testmod.
There are a variety of other ways to run doctests, including integration
with the unittest framework, and support for running non-Python text
files containing doctests. There are also many ways to override parts
of doctest's default behaviors. See the Library Reference Manual for
details.
"""
__docformat__ = 'reStructuredText en'
__all__ = [
# 0, Option Flags
'register_optionflag',
'DONT_ACCEPT_TRUE_FOR_1',
'DONT_ACCEPT_BLANKLINE',
'NORMALIZE_WHITESPACE',
'ELLIPSIS',
'SKIP',
'IGNORE_EXCEPTION_DETAIL',
'COMPARISON_FLAGS',
'REPORT_UDIFF',
'REPORT_CDIFF',
'REPORT_NDIFF',
'REPORT_ONLY_FIRST_FAILURE',
'REPORTING_FLAGS',
'FAIL_FAST',
# 1. Utility Functions
# 2. Example & DocTest
'Example',
'DocTest',
# 3. Doctest Parser
'DocTestParser',
# 4. Doctest Finder
'DocTestFinder',
# 5. Doctest Runner
'DocTestRunner',
'OutputChecker',
'DocTestFailure',
'UnexpectedException',
'DebugRunner',
# 6. Test Functions
'testmod',
'testfile',
'run_docstring_examples',
# 7. Unittest Support
'DocTestSuite',
'DocFileSuite',
'set_unittest_reportflags',
# 8. Debugging Support
'script_from_examples',
'testsource',
'debug_src',
'debug',
]
import __future__
import argparse
import difflib
import inspect
import linecache
import os
import pdb
import re
import sys
import traceback
import unittest
from io import StringIO
from collections import namedtuple
TestResults = namedtuple('TestResults', 'failed attempted')
# There are 4 basic classes:
# - Example: a <source, want> pair, plus an intra-docstring line number.
# - DocTest: a collection of examples, parsed from a docstring, plus
# info about where the docstring came from (name, filename, lineno).
# - DocTestFinder: extracts DocTests from a given object's docstring and
# its contained objects' docstrings.
# - DocTestRunner: runs DocTest cases, and accumulates statistics.
#
# So the basic picture is:
#
# list of:
# +------+ +---------+ +-------+
# |object| --DocTestFinder-> | DocTest | --DocTestRunner-> |results|
# +------+ +---------+ +-------+
# | Example |
# | ... |
# | Example |
# +---------+
# Option constants.
OPTIONFLAGS_BY_NAME = {}
def register_optionflag(name):
# Create a new flag unless `name` is already known.
return OPTIONFLAGS_BY_NAME.setdefault(name, 1 << len(OPTIONFLAGS_BY_NAME))
DONT_ACCEPT_TRUE_FOR_1 = register_optionflag('DONT_ACCEPT_TRUE_FOR_1')
DONT_ACCEPT_BLANKLINE = register_optionflag('DONT_ACCEPT_BLANKLINE')
NORMALIZE_WHITESPACE = register_optionflag('NORMALIZE_WHITESPACE')
ELLIPSIS = register_optionflag('ELLIPSIS')
SKIP = register_optionflag('SKIP')
IGNORE_EXCEPTION_DETAIL = register_optionflag('IGNORE_EXCEPTION_DETAIL')
COMPARISON_FLAGS = (DONT_ACCEPT_TRUE_FOR_1 |
DONT_ACCEPT_BLANKLINE |
NORMALIZE_WHITESPACE |
ELLIPSIS |
SKIP |
IGNORE_EXCEPTION_DETAIL)
REPORT_UDIFF = register_optionflag('REPORT_UDIFF')
REPORT_CDIFF = register_optionflag('REPORT_CDIFF')
REPORT_NDIFF = register_optionflag('REPORT_NDIFF')
REPORT_ONLY_FIRST_FAILURE = register_optionflag('REPORT_ONLY_FIRST_FAILURE')
FAIL_FAST = register_optionflag('FAIL_FAST')
REPORTING_FLAGS = (REPORT_UDIFF |
REPORT_CDIFF |
REPORT_NDIFF |
REPORT_ONLY_FIRST_FAILURE |
FAIL_FAST)
# Special string markers for use in `want` strings:
BLANKLINE_MARKER = '<BLANKLINE>'
ELLIPSIS_MARKER = '...'
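# Hedged note: each register_optionflag() call above allocates the next free
# bit, so flags are disjoint and combine with "|", e.g.
# testmod(optionflags=ELLIPSIS | NORMALIZE_WHITESPACE).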
######################################################################
## Table of Contents
######################################################################
# 1. Utility Functions
# 2. Example & DocTest -- store test cases
# 3. DocTest Parser -- extracts examples from strings
# 4. DocTest Finder -- extracts test cases from objects
# 5. DocTest Runner -- runs test cases
# 6. Test Functions -- convenient wrappers for testing
# 7. Unittest Support
# 8. Debugging Support
# 9. Example Usage
######################################################################
## 1. Utility Functions
######################################################################
def _extract_future_flags(globs):
"""
Return the compiler-flags associated with the future features that
have been imported into the given namespace (globs).
"""
flags = 0
for fname in __future__.all_feature_names:
feature = globs.get(fname, None)
if feature is getattr(__future__, fname):
flags |= feature.compiler_flag
return flags
def _normalize_module(module, depth=2):
"""
Return the module specified by `module`. In particular:
- If `module` is a module, then return module.
- If `module` is a string, then import and return the
module with that name.
- If `module` is None, then return the calling module.
The calling module is assumed to be the module of
the stack frame at the given depth in the call stack.
"""
if inspect.ismodule(module):
return module
elif isinstance(module, str):
return __import__(module, globals(), locals(), ["*"])
elif module is None:
return sys.modules[sys._getframe(depth).f_globals['__name__']]
else:
raise TypeError("Expected a module, string, or None")
def _load_testfile(filename, package, module_relative, encoding):
if module_relative:
package = _normalize_module(package, 3)
filename = _module_relative_path(package, filename)
if getattr(package, '__loader__', None) is not None:
if hasattr(package.__loader__, 'get_data'):
file_contents = package.__loader__.get_data(filename)
file_contents = file_contents.decode(encoding)
# get_data() opens files as 'rb', so perform the same newline
# conversion that universal-newlines mode would.
return file_contents.replace(os.linesep, '\n'), filename
with open(filename, encoding=encoding) as f:
return f.read(), filename
def _indent(s, indent=4):
"""
Add the given number of space characters to the beginning of
every non-blank line in `s`, and return the result.
"""
# This regexp matches the start of non-blank lines:
return re.sub('(?m)^(?!$)', indent*' ', s)
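# Hedged quick check of _indent (blank lines are left untouched):
# >>> _indent('a\n\nb\n')
# '    a\n\n    b\n'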
def _exception_traceback(exc_info):
"""
Return a string containing a traceback message for the given
exc_info tuple (as returned by sys.exc_info()).
"""
# Get a traceback message.
excout = StringIO()
exc_type, exc_val, exc_tb = exc_info
traceback.print_exception(exc_type, exc_val, exc_tb, file=excout)
return excout.getvalue()
# Override some StringIO methods.
class _SpoofOut(StringIO):
def getvalue(self):
result = StringIO.getvalue(self)
# If anything at all was written, make sure there's a trailing
# newline. There's no way for the expected output to indicate
# that a trailing newline is missing.
if result and not result.endswith("\n"):
result += "\n"
return result
def truncate(self, size=None):
self.seek(size)
StringIO.truncate(self)
# Worst-case linear-time ellipsis matching.
def _ellipsis_match(want, got):
"""
Essentially the only subtle case:
>>> _ellipsis_match('aa...aa', 'aaa')
False
"""
if ELLIPSIS_MARKER not in want:
return want == got
# Find "the real" strings.
ws = want.split(ELLIPSIS_MARKER)
assert len(ws) >= 2
# Deal with exact matches possibly needed at one or both ends.
startpos, endpos = 0, len(got)
w = ws[0]
if w: # starts with exact match
if got.startswith(w):
startpos = len(w)
del ws[0]
else:
return False
w = ws[-1]
if w: # ends with exact match
if got.endswith(w):
endpos -= len(w)
del ws[-1]
else:
return False
if startpos > endpos:
# Exact end matches required more characters than we have, as in
# _ellipsis_match('aa...aa', 'aaa')
return False
# For the rest, we only need to find the leftmost non-overlapping
# match for each piece. If there's no overall match that way alone,
# there's no overall match period.
for w in ws:
# w may be '' at times, if there are consecutive ellipses, or
# due to an ellipsis at the start or end of `want`. That's OK.
# Search for an empty string succeeds, and doesn't change startpos.
startpos = got.find(w, startpos, endpos)
if startpos < 0:
return False
startpos += len(w)
return True
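# Hedged examples of the matcher above:
# >>> _ellipsis_match('a...c', 'abc')
# True
# >>> _ellipsis_match('a...c', 'abd')
# False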
def _comment_line(line):
"Return a commented form of the given line"
line = line.rstrip()
if line:
return '# '+line
else:
return '#'
def _strip_exception_details(msg):
# Support for IGNORE_EXCEPTION_DETAIL.
# Get rid of everything except the exception name; in particular, drop
# the possibly dotted module path (if any) and the exception message (if
# any). We assume that a colon is never part of a dotted name, or of an
# exception name.
# E.g., given
# "foo.bar.MyError: la di da"
# return "MyError"
# Or for "abc.def" or "abc.def:\n" return "def".
start, end = 0, len(msg)
# The exception name must appear on the first line.
i = msg.find("\n")
if i >= 0:
end = i
# retain up to the first colon (if any)
i = msg.find(':', 0, end)
if i >= 0:
end = i
# retain just the exception name
i = msg.rfind('.', 0, end)
if i >= 0:
start = i+1
return msg[start: end]
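# Hedged sanity example, mirroring the comment above:
# >>> _strip_exception_details('foo.bar.MyError: la di da')
# 'MyError'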
class _OutputRedirectingPdb(pdb.Pdb):
"""
A specialized version of the python debugger that redirects stdout
to a given stream when interacting with the user. Stdout is *not*
redirected when traced code is executed.
"""
def __init__(self, out):
self.__out = out
self.__debugger_used = False
# do not play signal games in the pdb
pdb.Pdb.__init__(self, stdout=out, nosigint=True)
# still use input() to get user input
self.use_rawinput = 1
def set_trace(self, frame=None):
self.__debugger_used = True
if frame is None:
frame = sys._getframe().f_back
pdb.Pdb.set_trace(self, frame)
def set_continue(self):
# Calling set_continue unconditionally would break unit test
# coverage reporting, as Bdb.set_continue calls sys.settrace(None).
if self.__debugger_used:
pdb.Pdb.set_continue(self)
def trace_dispatch(self, *args):
# Redirect stdout to the given stream.
save_stdout = sys.stdout
sys.stdout = self.__out
# Call Pdb's trace dispatch method.
try:
return pdb.Pdb.trace_dispatch(self, *args)
finally:
sys.stdout = save_stdout
# [XX] Normalize with respect to os.path.pardir?
def _module_relative_path(module, path):
if not inspect.ismodule(module):
raise TypeError('Expected a module: %r' % module)
if path.startswith('/'):
raise ValueError('Module-relative files may not have absolute paths')
# Find the base directory for the path.
if hasattr(module, '__file__'):
# A normal module/package
basedir = os.path.split(module.__file__)[0]
elif module.__name__ == '__main__':
# An interactive session.
if len(sys.argv)>0 and sys.argv[0] != '':
basedir = os.path.split(sys.argv[0])[0]
else:
basedir = os.curdir
else:
# A module w/o __file__ (this includes builtins)
raise ValueError("Can't resolve paths relative to the module " +
module + " (it has no __file__)")
# Combine the base directory and the path.
return os.path.join(basedir, *(path.split('/')))
######################################################################
## 2. Example & DocTest
######################################################################
## - An "example" is a <source, want> pair, where "source" is a
## fragment of source code, and "want" is the expected output for
## "source." The Example class also includes information about
## where the example was extracted from.
##
## - A "doctest" is a collection of examples, typically extracted from
## a string (such as an object's docstring). The DocTest class also
## includes information about where the string was extracted from.
class Example:
"""
A single doctest example, consisting of source code and expected
output. `Example` defines the following attributes:
- source: A single Python statement, always ending with a newline.
The constructor adds a newline if needed.
- want: The expected output from running the source code (either
from stdout, or a traceback in case of exception). `want` ends
with a newline unless it's empty, in which case it's an empty
string. The constructor adds a newline if needed.
- exc_msg: The exception message generated by the example, if
the example is expected to generate an exception; or `None` if
it is not expected to generate an exception. This exception
message is compared against the return value of
`traceback.format_exception_only()`. `exc_msg` ends with a
newline unless it's `None`. The constructor adds a newline
if needed.
- lineno: The line number within the DocTest string containing
this Example where the Example begins. This line number is
zero-based, with respect to the beginning of the DocTest.
- indent: The example's indentation in the DocTest string.
I.e., the number of space characters that precede the
example's first prompt.
- options: A dictionary mapping from option flags to True or
False, which is used to override default options for this
example. Any option flags not contained in this dictionary
are left at their default value (as specified by the
DocTestRunner's optionflags). By default, no options are set.
"""
def __init__(self, source, want, exc_msg=None, lineno=0, indent=0,
options=None):
# Normalize inputs.
if not source.endswith('\n'):
source += '\n'
if want and not want.endswith('\n'):
want += '\n'
if exc_msg is not None and not exc_msg.endswith('\n'):
exc_msg += '\n'
# Store properties.
self.source = source
self.want = want
self.lineno = lineno
self.indent = indent
if options is None: options = {}
self.options = options
self.exc_msg = exc_msg
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self.source == other.source and \
self.want == other.want and \
self.lineno == other.lineno and \
self.indent == other.indent and \
self.options == other.options and \
self.exc_msg == other.exc_msg
def __hash__(self):
return hash((self.source, self.want, self.lineno, self.indent,
self.exc_msg))
class DocTest:
"""
A collection of doctest examples that should be run in a single
namespace. Each `DocTest` defines the following attributes:
- examples: the list of examples.
- globs: The namespace (aka globals) that the examples should
be run in.
- name: A name identifying the DocTest (typically, the name of
the object whose docstring this DocTest was extracted from).
- filename: The name of the file that this DocTest was extracted
from, or `None` if the filename is unknown.
- lineno: The line number within filename where this DocTest
begins, or `None` if the line number is unavailable. This
line number is zero-based, with respect to the beginning of
the file.
- docstring: The string that the examples were extracted from,
or `None` if the string is unavailable.
"""
def __init__(self, examples, globs, name, filename, lineno, docstring):
"""
Create a new DocTest containing the given examples. The
DocTest's globals are initialized with a copy of `globs`.
"""
assert not isinstance(examples, str), \
"DocTest no longer accepts str; use DocTestParser instead"
self.examples = examples
self.docstring = docstring
self.globs = globs.copy()
self.name = name
self.filename = filename
self.lineno = lineno
def __repr__(self):
if len(self.examples) == 0:
examples = 'no examples'
elif len(self.examples) == 1:
examples = '1 example'
else:
examples = '%d examples' % len(self.examples)
return ('<DocTest %s from %s:%s (%s)>' %
(self.name, self.filename, self.lineno, examples))
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self.examples == other.examples and \
self.docstring == other.docstring and \
self.globs == other.globs and \
self.name == other.name and \
self.filename == other.filename and \
self.lineno == other.lineno
def __hash__(self):
return hash((self.docstring, self.name, self.filename, self.lineno))
# This lets us sort tests by name:
def __lt__(self, other):
if not isinstance(other, DocTest):
return NotImplemented
return ((self.name, self.filename, self.lineno, id(self))
<
(other.name, other.filename, other.lineno, id(other)))
######################################################################
## 3. DocTestParser
######################################################################
class DocTestParser:
"""
A class used to parse strings containing doctest examples.
"""
# This regular expression is used to find doctest examples in a
# string. It defines three groups: `source` is the source code
# (including leading indentation and prompts); `indent` is the
# indentation of the first (PS1) line of the source code; and
# `want` is the expected output (including leading indentation).
_EXAMPLE_RE = re.compile(r'''
# Source consists of a PS1 line followed by zero or more PS2 lines.
(?P<source>
(?:^(?P<indent> [ ]*) >>> .*) # PS1 line
(?:\n [ ]* \.\.\. .*)*) # PS2 lines
\n?
# Want consists of any non-blank lines that do not start with PS1.
(?P<want> (?:(?![ ]*$) # Not a blank line
(?![ ]*>>>) # Not a line starting with PS1
.+$\n? # But any other line
)*)
''', re.MULTILINE | re.VERBOSE)
# A regular expression for handling `want` strings that contain
# expected exceptions. It divides `want` into three pieces:
# - the traceback header line (`hdr`)
# - the traceback stack (`stack`)
# - the exception message (`msg`), as generated by
# traceback.format_exception_only()
# `msg` may have multiple lines. We assume/require that the
# exception message is the first non-indented line starting with a word
# character following the traceback header line.
_EXCEPTION_RE = re.compile(r"""
# Grab the traceback header. Different versions of Python have
# said different things on the first traceback line.
^(?P<hdr> Traceback\ \(
(?: most\ recent\ call\ last
| innermost\ last
) \) :
)
\s* $ # toss trailing whitespace on the header.
(?P<stack> .*?) # don't blink: absorb stuff until...
^ (?P<msg> \w+ .*) # a line *starts* with alphanum.
""", re.VERBOSE | re.MULTILINE | re.DOTALL)
# A callable returning a true value iff its argument is a blank line
# or contains a single comment.
_IS_BLANK_OR_COMMENT = re.compile(r'^[ ]*(#.*)?$').match
def parse(self, string, name='<string>'):
"""
Divide the given string into examples and intervening text,
and return them as a list of alternating Examples and strings.
Line numbers for the Examples are 0-based. The optional
argument `name` is a name identifying this string, and is only
used for error messages.
"""
string = string.expandtabs()
# If all lines begin with the same indentation, then strip it.
min_indent = self._min_indent(string)
if min_indent > 0:
string = '\n'.join([l[min_indent:] for l in string.split('\n')])
output = []
charno, lineno = 0, 0
# Find all doctest examples in the string:
for m in self._EXAMPLE_RE.finditer(string):
# Add the pre-example text to `output`.
output.append(string[charno:m.start()])
# Update lineno (lines before this example)
lineno += string.count('\n', charno, m.start())
# Extract info from the regexp match.
(source, options, want, exc_msg) = \
self._parse_example(m, name, lineno)
# Create an Example, and add it to the list.
if not self._IS_BLANK_OR_COMMENT(source):
output.append( Example(source, want, exc_msg,
lineno=lineno,
indent=min_indent+len(m.group('indent')),
options=options) )
# Update lineno (lines inside this example)
lineno += string.count('\n', m.start(), m.end())
# Update charno.
charno = m.end()
# Add any remaining post-example text to `output`.
output.append(string[charno:])
return output
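# Hedged sketch of the shape parse() returns (alternating text and Examples):
# >>> parts = DocTestParser().parse('Text\n>>> 1 + 1\n2\n')
# >>> [type(p).__name__ for p in parts]
# ['str', 'Example', 'str']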
def get_doctest(self, string, globs, name, filename, lineno):
"""
Extract all doctest examples from the given string, and
collect them into a `DocTest` object.
`globs`, `name`, `filename`, and `lineno` are attributes for
the new `DocTest` object. See the documentation for `DocTest`
for more information.
"""
return DocTest(self.get_examples(string, name), globs,
name, filename, lineno, string)
def get_examples(self, string, name='<string>'):
"""
Extract all doctest examples from the given string, and return
them as a list of `Example` objects. Line numbers are
0-based, because it's most common in doctests that nothing
interesting appears on the same line as the opening triple-quote,
and so the first interesting line is called \"line 1\" then.
The optional argument `name` is a name identifying this
string, and is only used for error messages.
"""
return [x for x in self.parse(string, name)
if isinstance(x, Example)]
def _parse_example(self, m, name, lineno):
"""
Given a regular expression match from `_EXAMPLE_RE` (`m`),
return a pair `(source, want)`, where `source` is the matched
example's source code (with prompts and indentation stripped);
and `want` is the example's expected output (with indentation
stripped).
`name` is the string's name, and `lineno` is the line number
where the example starts; both are used for error messages.
"""
# Get the example's indentation level.
indent = len(m.group('indent'))
# Divide source into lines; check that they're properly
# indented; and then strip their indentation & prompts.
source_lines = m.group('source').split('\n')
self._check_prompt_blank(source_lines, indent, name, lineno)
self._check_prefix(source_lines[1:], ' '*indent + '.', name, lineno)
source = '\n'.join([sl[indent+4:] for sl in source_lines])
# Divide want into lines; check that it's properly indented; and
# then strip the indentation. Spaces before the last newline should
# be preserved, so plain rstrip() isn't good enough.
want = m.group('want')
want_lines = want.split('\n')
if len(want_lines) > 1 and re.match(r' *$', want_lines[-1]):
del want_lines[-1] # forget final newline & spaces after it
self._check_prefix(want_lines, ' '*indent, name,
lineno + len(source_lines))
want = '\n'.join([wl[indent:] for wl in want_lines])
# If `want` contains a traceback message, then extract it.
m = self._EXCEPTION_RE.match(want)
if m:
exc_msg = m.group('msg')
else:
exc_msg = None
# Extract options from the source.
options = self._find_options(source, name, lineno)
return source, options, want, exc_msg
# This regular expression looks for option directives in the
# source code of an example. Option directives are comments
# starting with "doctest:". Warning: this may give false
# positives for string-literals that contain the string
# "#doctest:". Eliminating these false positives would require
# actually parsing the string; but we limit them by ignoring any
# line containing "#doctest:" that is *followed* by a quote mark.
_OPTION_DIRECTIVE_RE = re.compile(r'#\s*doctest:\s*([^\n\'"]*)$',
re.MULTILINE)
def _find_options(self, source, name, lineno):
"""
Return a dictionary containing option overrides extracted from
option directives in the given source string.
`name` is the string's name, and `lineno` is the line number
where the example starts; both are used for error messages.
"""
options = {}
# (note: with the current regexp, this will match at most once:)
for m in self._OPTION_DIRECTIVE_RE.finditer(source):
option_strings = m.group(1).replace(',', ' ').split()
for option in option_strings:
if (option[0] not in '+-' or
option[1:] not in OPTIONFLAGS_BY_NAME):
raise ValueError('line %r of the doctest for %s '
'has an invalid option: %r' %
(lineno+1, name, option))
flag = OPTIONFLAGS_BY_NAME[option[1:]]
options[flag] = (option[0] == '+')
if options and self._IS_BLANK_OR_COMMENT(source):
raise ValueError('line %r of the doctest for %s has an option '
'directive on a line with no example: %r' %
(lineno, name, source))
return options
# This regular expression finds the indentation of every non-blank
# line in a string.
_INDENT_RE = re.compile(r'^([ ]*)(?=\S)', re.MULTILINE)
def _min_indent(self, s):
"Return the minimum indentation of any non-blank line in `s`"
indents = [len(indent) for indent in self._INDENT_RE.findall(s)]
if len(indents) > 0:
return min(indents)
else:
return 0
def _check_prompt_blank(self, lines, indent, name, lineno):
"""
Given the lines of a source string (including prompts and
leading indentation), check to make sure that every prompt is
followed by a space character. If any line is not followed by
a space character, then raise ValueError.
"""
for i, line in enumerate(lines):
if len(line) >= indent+4 and line[indent+3] != ' ':
raise ValueError('line %r of the docstring for %s '
'lacks blank after %s: %r' %
(lineno+i+1, name,
line[indent:indent+3], line))
def _check_prefix(self, lines, prefix, name, lineno):
"""
Check that every line in the given list starts with the given
prefix; if any line does not, then raise a ValueError.
"""
for i, line in enumerate(lines):
if line and not line.startswith(prefix):
raise ValueError('line %r of the docstring for %s has '
'inconsistent leading whitespace: %r' %
(lineno+i+1, name, line))
######################################################################
## 4. DocTest Finder
######################################################################
class DocTestFinder:
"""
A class used to extract the DocTests that are relevant to a given
object, from its docstring and the docstrings of its contained
objects. Doctests can currently be extracted from the following
object types: modules, functions, classes, methods, staticmethods,
classmethods, and properties.
"""
def __init__(self, verbose=False, parser=DocTestParser(),
recurse=True, exclude_empty=True):
"""
Create a new doctest finder.
The optional argument `parser` specifies a class or
function that should be used to create new DocTest objects (or
objects that implement the same interface as DocTest). The
signature for this factory function should match the signature
of the DocTest constructor.
If the optional argument `recurse` is false, then `find` will
only examine the given object, and not any contained objects.
If the optional argument `exclude_empty` is false, then `find`
will include tests for objects with empty docstrings.
"""
self._parser = parser
self._verbose = verbose
self._recurse = recurse
self._exclude_empty = exclude_empty
def find(self, obj, name=None, module=None, globs=None, extraglobs=None):
"""
Return a list of the DocTests that are defined by the given
object's docstring, or by any of its contained objects'
docstrings.
The optional parameter `module` is the module that contains
the given object. If the module is not specified or is None, then
the test finder will attempt to automatically determine the
correct module. The object's module is used:
- As a default namespace, if `globs` is not specified.
- To prevent the DocTestFinder from extracting DocTests
from objects that are imported from other modules.
- To find the name of the file containing the object.
- To help find the line number of the object within its
file.
Contained objects whose module does not match `module` are ignored.
If `module` is False, no attempt to find the module will be made.
This is obscure, of use mostly in tests: if `module` is False, or
is None but cannot be found automatically, then all objects are
considered to belong to the (non-existent) module, so all contained
objects will (recursively) be searched for doctests.
The globals for each DocTest is formed by combining `globs`
and `extraglobs` (bindings in `extraglobs` override bindings
in `globs`). A new copy of the globals dictionary is created
for each DocTest. If `globs` is not specified, then it
defaults to the module's `__dict__`, if specified, or {}
otherwise. If `extraglobs` is not specified, then it defaults
to {}.
"""
# If name was not specified, then extract it from the object.
if name is None:
name = getattr(obj, '__name__', None)
if name is None:
raise ValueError("DocTestFinder.find: name must be given "
"when obj.__name__ doesn't exist: %r" %
(type(obj),))
# Find the module that contains the given object (if obj is
# a module, then module=obj.). Note: this may fail, in which
# case module will be None.
if module is False:
module = None
elif module is None:
module = inspect.getmodule(obj)
# Read the module's source code. This is used by
# DocTestFinder._find_lineno to find the line number for a
# given object's docstring.
try:
file = inspect.getsourcefile(obj)
except TypeError:
source_lines = None
else:
if not file:
# Check to see if it's one of our special internal "files"
# (see __patched_linecache_getlines).
file = inspect.getfile(obj)
if not file[0]+file[-2:] == '<]>': file = None
if file is None:
source_lines = None
else:
if module is not None:
# Supply the module globals in case the module was
# originally loaded via a PEP 302 loader and
# file is not a valid filesystem path
source_lines = linecache.getlines(file, module.__dict__)
else:
# No access to a loader, so assume it's a normal
# filesystem path
source_lines = linecache.getlines(file)
if not source_lines:
source_lines = None
# Initialize globals, and merge in extraglobs.
if globs is None:
if module is None:
globs = {}
else:
globs = module.__dict__.copy()
else:
globs = globs.copy()
if extraglobs is not None:
globs.update(extraglobs)
if '__name__' not in globs:
globs['__name__'] = '__main__' # provide a default module name
# Recursively explore `obj`, extracting DocTests.
tests = []
self._find(tests, obj, name, module, source_lines, globs, {})
# Sort the tests by alpha order of names, for consistency in
# verbose-mode output. This was a feature of doctest in Pythons
# <= 2.3 that got lost by accident in 2.4. It was repaired in
# 2.4.4 and 2.5.
tests.sort()
return tests
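# Hedged usage sketch for find(); names and counts are illustrative:
# >>> def f():
# ...     '''
# ...     >>> 1 + 1
# ...     2
# ...     '''
# >>> tests = DocTestFinder().find(f)
# >>> len(tests), len(tests[0].examples)
# (1, 1)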
def _from_module(self, module, object):
"""
Return true if the given object is defined in the given
module.
"""
if module is None:
return True
elif inspect.getmodule(object) is not None:
return module is inspect.getmodule(object)
elif inspect.isfunction(object):
return module.__dict__ is object.__globals__
elif inspect.ismethoddescriptor(object):
if hasattr(object, '__objclass__'):
obj_mod = object.__objclass__.__module__
elif hasattr(object, '__module__'):
obj_mod = object.__module__
else:
return True # [XX] no easy way to tell otherwise
return module.__name__ == obj_mod
elif inspect.isclass(object):
return module.__name__ == object.__module__
elif hasattr(object, '__module__'):
return module.__name__ == object.__module__
elif isinstance(object, property):
return True # [XX] no easy way to be sure.
else:
raise ValueError("object must be a class or function")
def _find(self, tests, obj, name, module, source_lines, globs, seen):
"""
Find tests for the given object and any contained objects, and
add them to `tests`.
"""
if self._verbose:
print('Finding tests in %s' % name)
# If we've already processed this object, then ignore it.
if id(obj) in seen:
return
seen[id(obj)] = 1
# Find a test for this object, and add it to the list of tests.
test = self._get_test(obj, name, module, globs, source_lines)
if test is not None:
tests.append(test)
# Look for tests in a module's contained objects.
if inspect.ismodule(obj) and self._recurse:
for valname, val in obj.__dict__.items():
valname = '%s.%s' % (name, valname)
# Recurse to functions & classes.
if ((inspect.isroutine(val) or inspect.isclass(val)) and
self._from_module(module, val)):
self._find(tests, val, valname, module, source_lines,
globs, seen)
# Look for tests in a module's __test__ dictionary.
if inspect.ismodule(obj) and self._recurse:
for valname, val in getattr(obj, '__test__', {}).items():
if not isinstance(valname, str):
raise ValueError("DocTestFinder.find: __test__ keys "
"must be strings: %r" %
(type(valname),))
if not (inspect.isroutine(val) or inspect.isclass(val) or
inspect.ismodule(val) or isinstance(val, str)):
raise ValueError("DocTestFinder.find: __test__ values "
"must be strings, functions, methods, "
"classes, or modules: %r" %
(type(val),))
valname = '%s.__test__.%s' % (name, valname)
self._find(tests, val, valname, module, source_lines,
globs, seen)
# Look for tests in a class's contained objects.
if inspect.isclass(obj) and self._recurse:
for valname, val in obj.__dict__.items():
# Special handling for staticmethod/classmethod.
if isinstance(val, staticmethod):
val = getattr(obj, valname)
if isinstance(val, classmethod):
val = getattr(obj, valname).__func__
# Recurse to methods, properties, and nested classes.
if ((inspect.isroutine(val) or inspect.isclass(val) or
isinstance(val, property)) and
self._from_module(module, val)):
valname = '%s.%s' % (name, valname)
self._find(tests, val, valname, module, source_lines,
globs, seen)
def _get_test(self, obj, name, module, globs, source_lines):
"""
Return a DocTest for the given object, if it defines a docstring;
otherwise, return None.
"""
# Extract the object's docstring. If it doesn't have one,
# then return None (no test for this object).
if isinstance(obj, str):
docstring = obj
else:
try:
if obj.__doc__ is None:
docstring = ''
else:
docstring = obj.__doc__
if not isinstance(docstring, str):
docstring = str(docstring)
except (TypeError, AttributeError):
docstring = ''
# Find the docstring's location in the file.
lineno = self._find_lineno(obj, source_lines)
# Don't bother if the docstring is empty.
if self._exclude_empty and not docstring:
return None
# Return a DocTest for this object.
if module is None:
filename = None
else:
filename = getattr(module, '__file__', module.__name__)
if filename[-4:] in (".pyc", ".pyo"):
filename = filename[:-1]
return self._parser.get_doctest(docstring, globs, name,
filename, lineno)
def _find_lineno(self, obj, source_lines):
"""
Return a line number of the given object's docstring. Note:
this method assumes that the object has a docstring.
"""
lineno = None
# Find the line number for modules.
if inspect.ismodule(obj):
lineno = 0
# Find the line number for classes.
# Note: this could be fooled if a class is defined multiple
# times in a single file.
if inspect.isclass(obj):
if source_lines is None:
return None
pat = re.compile(r'^\s*class\s*%s\b' %
getattr(obj, '__name__', '-'))
for i, line in enumerate(source_lines):
if pat.match(line):
lineno = i
break
# Find the line number for functions & methods.
if inspect.ismethod(obj): obj = obj.__func__
if inspect.isfunction(obj): obj = obj.__code__
if inspect.istraceback(obj): obj = obj.tb_frame
if inspect.isframe(obj): obj = obj.f_code
if inspect.iscode(obj):
lineno = getattr(obj, 'co_firstlineno', None)-1
# Find the line number where the docstring starts. Assume
# that it's the first line that begins with a quote mark.
# Note: this could be fooled by a multiline function
# signature, where a continuation line begins with a quote
# mark.
if lineno is not None:
if source_lines is None:
return lineno+1
pat = re.compile(r'(^|.*:)\s*\w*("|\')')
for lineno in range(lineno, len(source_lines)):
if pat.match(source_lines[lineno]):
return lineno
# We couldn't find the line number.
return None
######################################################################
## 5. DocTest Runner
######################################################################
class DocTestRunner:
"""
A class used to run DocTest test cases, and accumulate statistics.
The `run` method is used to process a single DocTest case. It
returns a tuple `(f, t)`, where `t` is the number of test cases
tried, and `f` is the number of test cases that failed.
>>> tests = DocTestFinder().find(_TestClass)
>>> runner = DocTestRunner(verbose=False)
>>> tests.sort(key = lambda test: test.name)
>>> for test in tests:
... print(test.name, '->', runner.run(test))
_TestClass -> TestResults(failed=0, attempted=2)
_TestClass.__init__ -> TestResults(failed=0, attempted=2)
_TestClass.get -> TestResults(failed=0, attempted=2)
_TestClass.square -> TestResults(failed=0, attempted=1)
The `summarize` method prints a summary of all the test cases that
have been run by the runner, and returns an aggregated `(f, t)`
tuple:
>>> runner.summarize(verbose=1)
4 items passed all tests:
2 tests in _TestClass
2 tests in _TestClass.__init__
2 tests in _TestClass.get
1 tests in _TestClass.square
7 tests in 4 items.
7 passed and 0 failed.
Test passed.
TestResults(failed=0, attempted=7)
The aggregated number of tried examples and failed examples is
also available via the `tries` and `failures` attributes:
>>> runner.tries
7
>>> runner.failures
0
The comparison between expected outputs and actual outputs is done
by an `OutputChecker`. This comparison may be customized with a
number of option flags; see the documentation for `testmod` for
more information. If the option flags are insufficient, then the
comparison may also be customized by passing a subclass of
`OutputChecker` to the constructor.
The test runner's display output can be controlled in two ways.
First, an output function (`out`) can be passed to
`DocTestRunner.run`; this function will be called with strings that
should be displayed. It defaults to `sys.stdout.write`. If
capturing the output is not sufficient, then the display output
can be also customized by subclassing DocTestRunner, and
overriding the methods `report_start`, `report_success`,
`report_unexpected_exception`, and `report_failure`.
"""
# This divider string is used to separate failure messages, and to
# separate sections of the summary.
DIVIDER = "*" * 70
def __init__(self, checker=None, verbose=None, optionflags=0):
"""
Create a new test runner.
Optional keyword arg `checker` is the `OutputChecker` that
should be used to compare the expected outputs and actual
outputs of doctest examples.
Optional keyword arg 'verbose' prints lots of stuff if true,
only failures if false; by default, it's true iff '-v' is in
sys.argv.
Optional argument `optionflags` can be used to control how the
test runner compares expected output to actual output, and how
it displays failures. See the documentation for `testmod` for
more information.
"""
self._checker = checker or OutputChecker()
if verbose is None:
verbose = '-v' in sys.argv
self._verbose = verbose
self.optionflags = optionflags
self.original_optionflags = optionflags
# Keep track of the examples we've run.
self.tries = 0
self.failures = 0
self._name2ft = {}
# Create a fake output target for capturing doctest output.
self._fakeout = _SpoofOut()
#/////////////////////////////////////////////////////////////////
# Reporting methods
#/////////////////////////////////////////////////////////////////
def report_start(self, out, test, example):
"""
Report that the test runner is about to process the given
example. (Only displays a message if verbose=True)
"""
if self._verbose:
if example.want:
out('Trying:\n' + _indent(example.source) +
'Expecting:\n' + _indent(example.want))
else:
out('Trying:\n' + _indent(example.source) +
'Expecting nothing\n')
def report_success(self, out, test, example, got):
"""
Report that the given example ran successfully. (Only
displays a message if verbose=True)
"""
if self._verbose:
out("ok\n")
def report_failure(self, out, test, example, got):
"""
Report that the given example failed.
"""
out(self._failure_header(test, example) +
self._checker.output_difference(example, got, self.optionflags))
def report_unexpected_exception(self, out, test, example, exc_info):
"""
Report that the given example raised an unexpected exception.
"""
out(self._failure_header(test, example) +
'Exception raised:\n' + _indent(_exception_traceback(exc_info)))
def _failure_header(self, test, example):
out = [self.DIVIDER]
if test.filename:
if test.lineno is not None and example.lineno is not None:
lineno = test.lineno + example.lineno + 1
else:
lineno = '?'
out.append('File "%s", line %s, in %s' %
(test.filename, lineno, test.name))
else:
out.append('Line %s, in %s' % (example.lineno+1, test.name))
out.append('Failed example:')
source = example.source
out.append(_indent(source))
return '\n'.join(out)
#/////////////////////////////////////////////////////////////////
# DocTest Running
#/////////////////////////////////////////////////////////////////
def __run(self, test, compileflags, out):
"""
Run the examples in `test`. Write the outcome of each example
with one of the `DocTestRunner.report_*` methods, using the
writer function `out`. `compileflags` is the set of compiler
flags that should be used to execute examples. Return a tuple
`(f, t)`, where `t` is the number of examples tried, and `f`
is the number of examples that failed. The examples are run
in the namespace `test.globs`.
"""
# Keep track of the number of failures and tries.
failures = tries = 0
# Save the option flags (since option directives can be used
# to modify them).
original_optionflags = self.optionflags
SUCCESS, FAILURE, BOOM = range(3) # `outcome` state
check = self._checker.check_output
# Process each example.
for examplenum, example in enumerate(test.examples):
# If REPORT_ONLY_FIRST_FAILURE is set, then suppress
# reporting after the first failure.
quiet = (self.optionflags & REPORT_ONLY_FIRST_FAILURE and
failures > 0)
# Merge in the example's options.
self.optionflags = original_optionflags
if example.options:
for (optionflag, val) in example.options.items():
if val:
self.optionflags |= optionflag
else:
self.optionflags &= ~optionflag
# If 'SKIP' is set, then skip this example.
if self.optionflags & SKIP:
continue
# Record that we started this example.
tries += 1
if not quiet:
self.report_start(out, test, example)
# Use a special filename for compile(), so we can retrieve
# the source code during interactive debugging (see
# __patched_linecache_getlines).
filename = '<doctest %s[%d]>' % (test.name, examplenum)
# Run the example in the given context (globs), and record
# any exception that gets raised. (But don't intercept
# keyboard interrupts.)
try:
# Don't blink! This is where the user's code gets run.
exec(compile(example.source, filename, "single",
compileflags, 1), test.globs)
self.debugger.set_continue() # ==== Example Finished ====
exception = None
except KeyboardInterrupt:
raise
except:
exception = sys.exc_info()
self.debugger.set_continue() # ==== Example Finished ====
got = self._fakeout.getvalue() # the actual output
self._fakeout.truncate(0)
outcome = FAILURE # guilty until proved innocent or insane
# If the example executed without raising any exceptions,
# verify its output.
if exception is None:
if check(example.want, got, self.optionflags):
outcome = SUCCESS
# The example raised an exception: check if it was expected.
else:
exc_msg = traceback.format_exception_only(*exception[:2])[-1]
if not quiet:
got += _exception_traceback(exception)
# If `example.exc_msg` is None, then we weren't expecting
# an exception.
if example.exc_msg is None:
outcome = BOOM
# We expected an exception: see whether it matches.
elif check(example.exc_msg, exc_msg, self.optionflags):
outcome = SUCCESS
# Another chance if they didn't care about the detail.
elif self.optionflags & IGNORE_EXCEPTION_DETAIL:
if check(_strip_exception_details(example.exc_msg),
_strip_exception_details(exc_msg),
self.optionflags):
outcome = SUCCESS
# Report the outcome.
if outcome is SUCCESS:
if not quiet:
self.report_success(out, test, example, got)
elif outcome is FAILURE:
if not quiet:
self.report_failure(out, test, example, got)
failures += 1
elif outcome is BOOM:
if not quiet:
self.report_unexpected_exception(out, test, example,
exception)
failures += 1
else:
assert False, ("unknown outcome", outcome)
if failures and self.optionflags & FAIL_FAST:
break
# Restore the option flags (in case they were modified)
self.optionflags = original_optionflags
# Record and return the number of failures and tries.
self.__record_outcome(test, failures, tries)
return TestResults(failures, tries)
def __record_outcome(self, test, f, t):
"""
Record the fact that the given DocTest (`test`) generated `f`
failures out of `t` tried examples.
"""
f2, t2 = self._name2ft.get(test.name, (0,0))
self._name2ft[test.name] = (f+f2, t+t2)
self.failures += f
self.tries += t
__LINECACHE_FILENAME_RE = re.compile(r'<doctest '
r'(?P<name>.+)'
r'\[(?P<examplenum>\d+)\]>$')
def __patched_linecache_getlines(self, filename, module_globals=None):
m = self.__LINECACHE_FILENAME_RE.match(filename)
if m and m.group('name') == self.test.name:
example = self.test.examples[int(m.group('examplenum'))]
return example.source.splitlines(keepends=True)
else:
return self.save_linecache_getlines(filename, module_globals)
def run(self, test, compileflags=None, out=None, clear_globs=True):
"""
Run the examples in `test`, and display the results using the
writer function `out`.
The examples are run in the namespace `test.globs`. If
`clear_globs` is true (the default), then this namespace will
be cleared after the test runs, to help with garbage
collection. If you would like to examine the namespace after
the test completes, then use `clear_globs=False`.
`compileflags` gives the set of flags that should be used by
the Python compiler when running the examples. If not
specified, then it will default to the set of future-import
flags that apply to `globs`.
The output of each example is checked using
`DocTestRunner.check_output`, and the results are formatted by
the `DocTestRunner.report_*` methods.
"""
self.test = test
if compileflags is None:
compileflags = _extract_future_flags(test.globs)
save_stdout = sys.stdout
if out is None:
encoding = save_stdout.encoding
if encoding is None or encoding.lower() == 'utf-8':
out = save_stdout.write
else:
# Use backslashreplace error handling on write
def out(s):
s = str(s.encode(encoding, 'backslashreplace'), encoding)
save_stdout.write(s)
sys.stdout = self._fakeout
# Patch pdb.set_trace to restore sys.stdout during interactive
# debugging (so it's not still redirected to self._fakeout).
# Note that the interactive output will go to *our*
# save_stdout, even if that's not the real sys.stdout; this
# allows us to write test cases for the set_trace behavior.
save_trace = sys.gettrace()
save_set_trace = pdb.set_trace
self.debugger = _OutputRedirectingPdb(save_stdout)
self.debugger.reset()
pdb.set_trace = self.debugger.set_trace
# Patch linecache.getlines, so we can see the example's source
# when we're inside the debugger.
self.save_linecache_getlines = linecache.getlines
linecache.getlines = self.__patched_linecache_getlines
# Make sure sys.displayhook just prints the value to stdout
save_displayhook = sys.displayhook
sys.displayhook = sys.__displayhook__
try:
return self.__run(test, compileflags, out)
finally:
sys.stdout = save_stdout
pdb.set_trace = save_set_trace
sys.settrace(save_trace)
linecache.getlines = self.save_linecache_getlines
sys.displayhook = save_displayhook
if clear_globs:
test.globs.clear()
import builtins
builtins._ = None
#/////////////////////////////////////////////////////////////////
# Summarization
#/////////////////////////////////////////////////////////////////
def summarize(self, verbose=None):
"""
Print a summary of all the test cases that have been run by
this DocTestRunner, and return a tuple `(f, t)`, where `f` is
the total number of failed examples, and `t` is the total
number of tried examples.
The optional `verbose` argument controls how detailed the
summary is. If the verbosity is not specified, then the
DocTestRunner's verbosity is used.
"""
if verbose is None:
verbose = self._verbose
notests = []
passed = []
failed = []
totalt = totalf = 0
for x in self._name2ft.items():
name, (f, t) = x
assert f <= t
totalt += t
totalf += f
if t == 0:
notests.append(name)
elif f == 0:
passed.append( (name, t) )
else:
failed.append(x)
if verbose:
if notests:
print(len(notests), "items had no tests:")
notests.sort()
for thing in notests:
print(" ", thing)
if passed:
print(len(passed), "items passed all tests:")
passed.sort()
for thing, count in passed:
print(" %3d tests in %s" % (count, thing))
if failed:
print(self.DIVIDER)
print(len(failed), "items had failures:")
failed.sort()
for thing, (f, t) in failed:
print(" %3d of %3d in %s" % (f, t, thing))
if verbose:
print(totalt, "tests in", len(self._name2ft), "items.")
print(totalt - totalf, "passed and", totalf, "failed.")
if totalf:
print("***Test Failed***", totalf, "failures.")
elif verbose:
print("Test passed.")
return TestResults(totalf, totalt)
#/////////////////////////////////////////////////////////////////
# Backward compatibility cruft to maintain doctest.master.
#/////////////////////////////////////////////////////////////////
def merge(self, other):
d = self._name2ft
for name, (f, t) in other._name2ft.items():
if name in d:
# Don't print here by default, since doing
# so breaks some of the buildbots
#print("*** DocTestRunner.merge: '" + name + "' in both" \
# " testers; summing outcomes.")
f2, t2 = d[name]
f = f + f2
t = t + t2
            d[name] = f, t
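# --- Illustrative sketch (added for this listing, not part of the module):
# driving DocTestRunner by hand. It uses only the public API defined above;
# the helper name below is hypothetical and the function is never called.
def _demo_doctest_runner():
    parser = DocTestParser()
    test = parser.get_doctest('>>> 2 + 2\n4\n', {}, 'demo', 'demo.py', 0)
    runner = DocTestRunner(verbose=False)
    results = runner.run(test)      # TestResults(failed=0, attempted=1)
    runner.summarize(verbose=True)  # print per-item pass/fail totals
    return results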
class OutputChecker:
"""
    A class used to check whether the actual output from a doctest
example matches the expected output. `OutputChecker` defines two
methods: `check_output`, which compares a given pair of outputs,
and returns true if they match; and `output_difference`, which
returns a string describing the differences between two outputs.
"""
def _toAscii(self, s):
"""
Convert string to hex-escaped ASCII string.
"""
return str(s.encode('ASCII', 'backslashreplace'), "ASCII")
def check_output(self, want, got, optionflags):
"""
Return True iff the actual output from an example (`got`)
matches the expected output (`want`). These strings are
always considered to match if they are identical; but
depending on what option flags the test runner is using,
several non-exact match types are also possible. See the
        documentation for `DocTestRunner` for more information about
option flags.
"""
        # If `want` contains a hex-escaped character such as "\u1234",
        # then `want` is a string of six characters (e.g. [\,u,1,2,3,4]).
# On the other hand, `got` could be another sequence of
# characters such as [\u1234], so `want` and `got` should
# be folded to hex-escaped ASCII string to compare.
got = self._toAscii(got)
want = self._toAscii(want)
# Handle the common case first, for efficiency:
# if they're string-identical, always return true.
if got == want:
return True
# The values True and False replaced 1 and 0 as the return
# value for boolean comparisons in Python 2.3.
if not (optionflags & DONT_ACCEPT_TRUE_FOR_1):
if (got,want) == ("True\n", "1\n"):
return True
if (got,want) == ("False\n", "0\n"):
return True
# <BLANKLINE> can be used as a special sequence to signify a
# blank line, unless the DONT_ACCEPT_BLANKLINE flag is used.
if not (optionflags & DONT_ACCEPT_BLANKLINE):
# Replace <BLANKLINE> in want with a blank line.
            want = re.sub(r'(?m)^%s\s*?$' % re.escape(BLANKLINE_MARKER),
                          '', want)
# If a line in got contains only spaces, then remove the
# spaces.
            got = re.sub(r'(?m)^\s*?$', '', got)
if got == want:
return True
# This flag causes doctest to ignore any differences in the
# contents of whitespace strings. Note that this can be used
# in conjunction with the ELLIPSIS flag.
if optionflags & NORMALIZE_WHITESPACE:
got = ' '.join(got.split())
want = ' '.join(want.split())
if got == want:
return True
# The ELLIPSIS flag says to let the sequence "..." in `want`
# match any substring in `got`.
if optionflags & ELLIPSIS:
if _ellipsis_match(want, got):
return True
# We didn't find any match; return false.
return False
# Should we do a fancy diff?
def _do_a_fancy_diff(self, want, got, optionflags):
# Not unless they asked for a fancy diff.
if not optionflags & (REPORT_UDIFF |
REPORT_CDIFF |
REPORT_NDIFF):
return False
# If expected output uses ellipsis, a meaningful fancy diff is
# too hard ... or maybe not. In two real-life failures Tim saw,
# a diff was a major help anyway, so this is commented out.
# [todo] _ellipsis_match() knows which pieces do and don't match,
# and could be the basis for a kick-ass diff in this case.
##if optionflags & ELLIPSIS and ELLIPSIS_MARKER in want:
## return False
# ndiff does intraline difference marking, so can be useful even
# for 1-line differences.
if optionflags & REPORT_NDIFF:
return True
# The other diff types need at least a few lines to be helpful.
return want.count('\n') > 2 and got.count('\n') > 2
def output_difference(self, example, got, optionflags):
"""
Return a string describing the differences between the
expected output for a given example (`example`) and the actual
output (`got`). `optionflags` is the set of option flags used
to compare `want` and `got`.
"""
want = example.want
# If <BLANKLINE>s are being used, then replace blank lines
# with <BLANKLINE> in the actual output string.
if not (optionflags & DONT_ACCEPT_BLANKLINE):
            got = re.sub(r'(?m)^[ ]*(?=\n)', BLANKLINE_MARKER, got)
# Check if we should use diff.
if self._do_a_fancy_diff(want, got, optionflags):
# Split want & got into lines.
want_lines = want.splitlines(keepends=True)
got_lines = got.splitlines(keepends=True)
# Use difflib to find their differences.
if optionflags & REPORT_UDIFF:
diff = difflib.unified_diff(want_lines, got_lines, n=2)
diff = list(diff)[2:] # strip the diff header
kind = 'unified diff with -expected +actual'
elif optionflags & REPORT_CDIFF:
diff = difflib.context_diff(want_lines, got_lines, n=2)
diff = list(diff)[2:] # strip the diff header
kind = 'context diff with expected followed by actual'
elif optionflags & REPORT_NDIFF:
engine = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
diff = list(engine.compare(want_lines, got_lines))
kind = 'ndiff with -expected +actual'
else:
assert 0, 'Bad diff option'
# Remove trailing whitespace on diff output.
diff = [line.rstrip() + '\n' for line in diff]
return 'Differences (%s):\n' % kind + _indent(''.join(diff))
# If we're not using diff, then simply list the expected
# output followed by the actual output.
if want and got:
return 'Expected:\n%sGot:\n%s' % (_indent(want), _indent(got))
elif want:
return 'Expected:\n%sGot nothing\n' % _indent(want)
elif got:
return 'Expected nothing\nGot:\n%s' % _indent(got)
else:
return 'Expected nothing\nGot nothing\n'
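# --- Illustrative sketch (added for this listing, not part of the module):
# OutputChecker can be used standalone to compare expected vs. actual output
# under option flags. The helper name is hypothetical; it is never called.
def _demo_output_checker():
    checker = OutputChecker()
    # Identical strings always match, regardless of flags.
    assert checker.check_output('4\n', '4\n', 0)
    # ELLIPSIS lets '...' in the expected output match any substring.
    assert checker.check_output('[0, 1, ..., 9]\n',
                                '[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n', ELLIPSIS)
    # NORMALIZE_WHITESPACE collapses whitespace runs before comparing.
    assert checker.check_output('a  b\n', 'a b\n', NORMALIZE_WHITESPACE)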
class DocTestFailure(Exception):
"""A DocTest example has failed in debugging mode.
The exception instance has variables:
- test: the DocTest object being run
- example: the Example object that failed
- got: the actual output
"""
def __init__(self, test, example, got):
self.test = test
self.example = example
self.got = got
def __str__(self):
return str(self.test)
class UnexpectedException(Exception):
"""A DocTest example has encountered an unexpected exception
The exception instance has variables:
- test: the DocTest object being run
- example: the Example object that failed
- exc_info: the exception info
"""
def __init__(self, test, example, exc_info):
self.test = test
self.example = example
self.exc_info = exc_info
def __str__(self):
return str(self.test)
class DebugRunner(DocTestRunner):
r"""Run doc tests but raise an exception as soon as there is a failure.
If an unexpected exception occurs, an UnexpectedException is raised.
It contains the test, the example, and the original exception:
>>> runner = DebugRunner(verbose=False)
>>> test = DocTestParser().get_doctest('>>> raise KeyError\n42',
... {}, 'foo', 'foo.py', 0)
>>> try:
... runner.run(test)
... except UnexpectedException as f:
... failure = f
>>> failure.test is test
True
>>> failure.example.want
'42\n'
>>> exc_info = failure.exc_info
>>> raise exc_info[1] # Already has the traceback
Traceback (most recent call last):
...
KeyError
We wrap the original exception to give the calling application
access to the test and example information.
If the output doesn't match, then a DocTestFailure is raised:
>>> test = DocTestParser().get_doctest('''
... >>> x = 1
... >>> x
... 2
... ''', {}, 'foo', 'foo.py', 0)
>>> try:
... runner.run(test)
... except DocTestFailure as f:
... failure = f
DocTestFailure objects provide access to the test:
>>> failure.test is test
True
As well as to the example:
>>> failure.example.want
'2\n'
and the actual output:
>>> failure.got
'1\n'
If a failure or error occurs, the globals are left intact:
>>> del test.globs['__builtins__']
>>> test.globs
{'x': 1}
>>> test = DocTestParser().get_doctest('''
... >>> x = 2
... >>> raise KeyError
... ''', {}, 'foo', 'foo.py', 0)
>>> runner.run(test)
Traceback (most recent call last):
...
doctest.UnexpectedException: <DocTest foo from foo.py:0 (2 examples)>
>>> del test.globs['__builtins__']
>>> test.globs
{'x': 2}
But the globals are cleared if there is no error:
>>> test = DocTestParser().get_doctest('''
... >>> x = 2
... ''', {}, 'foo', 'foo.py', 0)
>>> runner.run(test)
TestResults(failed=0, attempted=1)
>>> test.globs
{}
"""
def run(self, test, compileflags=None, out=None, clear_globs=True):
r = DocTestRunner.run(self, test, compileflags, out, False)
if clear_globs:
test.globs.clear()
return r
def report_unexpected_exception(self, out, test, example, exc_info):
raise UnexpectedException(test, example, exc_info)
def report_failure(self, out, test, example, got):
raise DocTestFailure(test, example, got)
######################################################################
## 6. Test Functions
######################################################################
# These should be backwards compatible.
# For backward compatibility, a global instance of a DocTestRunner
# class, updated by testmod.
master = None
def testmod(m=None, name=None, globs=None, verbose=None,
report=True, optionflags=0, extraglobs=None,
raise_on_error=False, exclude_empty=False):
"""m=None, name=None, globs=None, verbose=None, report=True,
optionflags=0, extraglobs=None, raise_on_error=False,
exclude_empty=False
Test examples in docstrings in functions and classes reachable
from module m (or the current module if m is not supplied), starting
with m.__doc__.
Also test examples reachable from dict m.__test__ if it exists and is
not None. m.__test__ maps names to functions, classes and strings;
function and class docstrings are tested even if the name is private;
strings are tested directly, as if they were docstrings.
Return (#failures, #tests).
See help(doctest) for an overview.
Optional keyword arg "name" gives the name of the module; by default
use m.__name__.
Optional keyword arg "globs" gives a dict to be used as the globals
when executing examples; by default, use m.__dict__. A copy of this
dict is actually used for each docstring, so that each docstring's
examples start with a clean slate.
Optional keyword arg "extraglobs" gives a dictionary that should be
merged into the globals that are used to execute examples. By
default, no extra globals are used. This is new in 2.4.
Optional keyword arg "verbose" prints lots of stuff if true, prints
only failures if false; by default, it's true iff "-v" is in sys.argv.
Optional keyword arg "report" prints a summary at the end when true,
else prints nothing at the end. In verbose mode, the summary is
detailed, else very brief (in fact, empty if all tests passed).
Optional keyword arg "optionflags" or's together module constants,
and defaults to 0. This is new in 2.3. Possible values (see the
docs for details):
DONT_ACCEPT_TRUE_FOR_1
DONT_ACCEPT_BLANKLINE
NORMALIZE_WHITESPACE
ELLIPSIS
SKIP
IGNORE_EXCEPTION_DETAIL
REPORT_UDIFF
REPORT_CDIFF
REPORT_NDIFF
REPORT_ONLY_FIRST_FAILURE
Optional keyword arg "raise_on_error" raises an exception on the
first unexpected exception or failure. This allows failures to be
post-mortem debugged.
Advanced tomfoolery: testmod runs methods of a local instance of
class doctest.Tester, then merges the results into (or creates)
global Tester instance doctest.master. Methods of doctest.master
can be called directly too, if you want to do something unusual.
Passing report=0 to testmod is especially useful then, to delay
displaying a summary. Invoke doctest.master.summarize(verbose)
when you're done fiddling.
"""
global master
# If no module was given, then use __main__.
if m is None:
# DWA - m will still be None if this wasn't invoked from the command
# line, in which case the following TypeError is about as good an error
# as we should expect
m = sys.modules.get('__main__')
# Check that we were actually given a module.
if not inspect.ismodule(m):
raise TypeError("testmod: module required; %r" % (m,))
# If no name was given, then use the module's name.
if name is None:
name = m.__name__
# Find, parse, and run all tests in the given module.
finder = DocTestFinder(exclude_empty=exclude_empty)
if raise_on_error:
runner = DebugRunner(verbose=verbose, optionflags=optionflags)
else:
runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
for test in finder.find(m, name, globs=globs, extraglobs=extraglobs):
runner.run(test)
if report:
runner.summarize()
if master is None:
master = runner
else:
master.merge(runner)
return TestResults(runner.failures, runner.tries)
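# --- Illustrative sketch (added for this listing, not part of the module):
# the usual way to invoke testmod from a module under test. The helper name
# is hypothetical and the function is never called here.
def _demo_testmod():
    # Typically placed at the bottom of the module being tested:
    #     if __name__ == '__main__':
    #         import doctest
    #         doctest.testmod()
    failures, attempted = testmod(verbose=False, optionflags=ELLIPSIS)
    return failures == 0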
def testfile(filename, module_relative=True, name=None, package=None,
globs=None, verbose=None, report=True, optionflags=0,
extraglobs=None, raise_on_error=False, parser=DocTestParser(),
encoding=None):
"""
Test examples in the given file. Return (#failures, #tests).
Optional keyword arg "module_relative" specifies how filenames
should be interpreted:
- If "module_relative" is True (the default), then "filename"
specifies a module-relative path. By default, this path is
relative to the calling module's directory; but if the
"package" argument is specified, then it is relative to that
package. To ensure os-independence, "filename" should use
"/" characters to separate path segments, and should not
be an absolute path (i.e., it may not begin with "/").
- If "module_relative" is False, then "filename" specifies an
os-specific path. The path may be absolute or relative (to
the current working directory).
Optional keyword arg "name" gives the name of the test; by default
use the file's basename.
Optional keyword argument "package" is a Python package or the
name of a Python package whose directory should be used as the
base directory for a module relative filename. If no package is
specified, then the calling module's directory is used as the base
directory for module relative filenames. It is an error to
specify "package" if "module_relative" is False.
Optional keyword arg "globs" gives a dict to be used as the globals
when executing examples; by default, use {}. A copy of this dict
is actually used for each docstring, so that each docstring's
examples start with a clean slate.
Optional keyword arg "extraglobs" gives a dictionary that should be
merged into the globals that are used to execute examples. By
default, no extra globals are used.
Optional keyword arg "verbose" prints lots of stuff if true, prints
only failures if false; by default, it's true iff "-v" is in sys.argv.
Optional keyword arg "report" prints a summary at the end when true,
else prints nothing at the end. In verbose mode, the summary is
detailed, else very brief (in fact, empty if all tests passed).
Optional keyword arg "optionflags" or's together module constants,
and defaults to 0. Possible values (see the docs for details):
DONT_ACCEPT_TRUE_FOR_1
DONT_ACCEPT_BLANKLINE
NORMALIZE_WHITESPACE
ELLIPSIS
SKIP
IGNORE_EXCEPTION_DETAIL
REPORT_UDIFF
REPORT_CDIFF
REPORT_NDIFF
REPORT_ONLY_FIRST_FAILURE
Optional keyword arg "raise_on_error" raises an exception on the
first unexpected exception or failure. This allows failures to be
post-mortem debugged.
Optional keyword arg "parser" specifies a DocTestParser (or
subclass) that should be used to extract tests from the files.
Optional keyword arg "encoding" specifies an encoding that should
be used to convert the file to unicode.
    Advanced tomfoolery: testfile runs methods of a local instance of
class doctest.Tester, then merges the results into (or creates)
global Tester instance doctest.master. Methods of doctest.master
can be called directly too, if you want to do something unusual.
    Passing report=0 to testfile is especially useful then, to delay
displaying a summary. Invoke doctest.master.summarize(verbose)
when you're done fiddling.
"""
global master
if package and not module_relative:
raise ValueError("Package may only be specified for module-"
"relative paths.")
# Relativize the path
text, filename = _load_testfile(filename, package, module_relative,
encoding or "utf-8")
# If no name was given, then use the file's name.
if name is None:
name = os.path.basename(filename)
# Assemble the globals.
if globs is None:
globs = {}
else:
globs = globs.copy()
if extraglobs is not None:
globs.update(extraglobs)
if '__name__' not in globs:
globs['__name__'] = '__main__'
if raise_on_error:
runner = DebugRunner(verbose=verbose, optionflags=optionflags)
else:
runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
# Read the file, convert it to a test, and run it.
test = parser.get_doctest(text, globs, name, filename, 0)
runner.run(test)
if report:
runner.summarize()
if master is None:
master = runner
else:
master.merge(runner)
return TestResults(runner.failures, runner.tries)
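# --- Illustrative sketch (added for this listing, not part of the module):
# running the examples embedded in a text file. The file path is hypothetical
# and the helper is never called here.
def _demo_testfile():
    results = testfile('example.txt', module_relative=False,
                       optionflags=NORMALIZE_WHITESPACE)
    return results.failed == 0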
def run_docstring_examples(f, globs, verbose=False, name="NoName",
compileflags=None, optionflags=0):
"""
Test examples in the given object's docstring (`f`), using `globs`
as globals. Optional argument `name` is used in failure messages.
If the optional argument `verbose` is true, then generate output
even if there are no failures.
`compileflags` gives the set of flags that should be used by the
Python compiler when running the examples. If not specified, then
it will default to the set of future-import flags that apply to
`globs`.
Optional keyword arg `optionflags` specifies options for the
testing and output. See the documentation for `testmod` for more
information.
"""
# Find, parse, and run all tests in the given module.
finder = DocTestFinder(verbose=verbose, recurse=False)
runner = DocTestRunner(verbose=verbose, optionflags=optionflags)
for test in finder.find(f, name, globs=globs):
runner.run(test, compileflags=compileflags)
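# --- Illustrative sketch (added for this listing, not part of the module):
# checking a single object's docstring without scanning its whole module.
# The helper and the inner function are hypothetical; neither is called here.
def _demo_run_docstring_examples():
    def add(a, b):
        """
        >>> add(2, 3)
        5
        """
        return a + b
    # `globs` must contain any names the examples reference:
    run_docstring_examples(add, {'add': add}, verbose=False, name='add')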
######################################################################
## 7. Unittest Support
######################################################################
_unittest_reportflags = 0
def set_unittest_reportflags(flags):
"""Sets the unittest option flags.
The old flag is returned so that a runner could restore the old
value if it wished to:
>>> import doctest
>>> old = doctest._unittest_reportflags
>>> doctest.set_unittest_reportflags(REPORT_NDIFF |
... REPORT_ONLY_FIRST_FAILURE) == old
True
>>> doctest._unittest_reportflags == (REPORT_NDIFF |
... REPORT_ONLY_FIRST_FAILURE)
True
Only reporting flags can be set:
>>> doctest.set_unittest_reportflags(ELLIPSIS)
Traceback (most recent call last):
...
ValueError: ('Only reporting flags allowed', 8)
>>> doctest.set_unittest_reportflags(old) == (REPORT_NDIFF |
... REPORT_ONLY_FIRST_FAILURE)
True
"""
global _unittest_reportflags
if (flags & REPORTING_FLAGS) != flags:
raise ValueError("Only reporting flags allowed", flags)
old = _unittest_reportflags
_unittest_reportflags = flags
return old
class DocTestCase(unittest.TestCase):
def __init__(self, test, optionflags=0, setUp=None, tearDown=None,
checker=None):
unittest.TestCase.__init__(self)
self._dt_optionflags = optionflags
self._dt_checker = checker
self._dt_test = test
self._dt_setUp = setUp
self._dt_tearDown = tearDown
def setUp(self):
test = self._dt_test
if self._dt_setUp is not None:
self._dt_setUp(test)
def tearDown(self):
test = self._dt_test
if self._dt_tearDown is not None:
self._dt_tearDown(test)
test.globs.clear()
def runTest(self):
test = self._dt_test
old = sys.stdout
new = StringIO()
optionflags = self._dt_optionflags
if not (optionflags & REPORTING_FLAGS):
# The option flags don't include any reporting flags,
# so add the default reporting flags
optionflags |= _unittest_reportflags
runner = DocTestRunner(optionflags=optionflags,
checker=self._dt_checker, verbose=False)
try:
runner.DIVIDER = "-"*70
failures, tries = runner.run(
test, out=new.write, clear_globs=False)
finally:
sys.stdout = old
if failures:
raise self.failureException(self.format_failure(new.getvalue()))
def format_failure(self, err):
test = self._dt_test
if test.lineno is None:
lineno = 'unknown line number'
else:
lineno = '%s' % test.lineno
lname = '.'.join(test.name.split('.')[-1:])
return ('Failed doctest test for %s\n'
' File "%s", line %s, in %s\n\n%s'
% (test.name, test.filename, lineno, lname, err)
)
def debug(self):
r"""Run the test case without results and without catching exceptions
The unit test framework includes a debug method on test cases
and test suites to support post-mortem debugging. The test code
is run in such a way that errors are not caught. This way a
caller can catch the errors and initiate post-mortem debugging.
The DocTestCase provides a debug method that raises
UnexpectedException errors if there is an unexpected
exception:
>>> test = DocTestParser().get_doctest('>>> raise KeyError\n42',
... {}, 'foo', 'foo.py', 0)
>>> case = DocTestCase(test)
>>> try:
... case.debug()
... except UnexpectedException as f:
... failure = f
The UnexpectedException contains the test, the example, and
the original exception:
>>> failure.test is test
True
>>> failure.example.want
'42\n'
>>> exc_info = failure.exc_info
>>> raise exc_info[1] # Already has the traceback
Traceback (most recent call last):
...
KeyError
If the output doesn't match, then a DocTestFailure is raised:
>>> test = DocTestParser().get_doctest('''
... >>> x = 1
... >>> x
... 2
... ''', {}, 'foo', 'foo.py', 0)
>>> case = DocTestCase(test)
>>> try:
... case.debug()
... except DocTestFailure as f:
... failure = f
DocTestFailure objects provide access to the test:
>>> failure.test is test
True
As well as to the example:
>>> failure.example.want
'2\n'
and the actual output:
>>> failure.got
'1\n'
"""
self.setUp()
runner = DebugRunner(optionflags=self._dt_optionflags,
checker=self._dt_checker, verbose=False)
runner.run(self._dt_test, clear_globs=False)
self.tearDown()
def id(self):
return self._dt_test.name
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self._dt_test == other._dt_test and \
self._dt_optionflags == other._dt_optionflags and \
self._dt_setUp == other._dt_setUp and \
self._dt_tearDown == other._dt_tearDown and \
self._dt_checker == other._dt_checker
def __hash__(self):
return hash((self._dt_optionflags, self._dt_setUp, self._dt_tearDown,
self._dt_checker))
def __repr__(self):
name = self._dt_test.name.split('.')
return "%s (%s)" % (name[-1], '.'.join(name[:-1]))
__str__ = __repr__
def shortDescription(self):
return "Doctest: " + self._dt_test.name
class SkipDocTestCase(DocTestCase):
def __init__(self, module):
self.module = module
DocTestCase.__init__(self, None)
def setUp(self):
self.skipTest("DocTestSuite will not work with -O2 and above")
def test_skip(self):
pass
def shortDescription(self):
return "Skipping tests from %s" % self.module.__name__
__str__ = shortDescription
class _DocTestSuite(unittest.TestSuite):
def _removeTestAtIndex(self, index):
pass
def DocTestSuite(module=None, globs=None, extraglobs=None, test_finder=None,
**options):
"""
Convert doctest tests for a module to a unittest test suite.
This converts each documentation string in a module that
contains doctest tests to a unittest test case. If any of the
tests in a doc string fail, then the test case fails. An exception
is raised showing the name of the file containing the test and a
(sometimes approximate) line number.
The `module` argument provides the module to be tested. The argument
can be either a module or a module name.
If no argument is given, the calling module is used.
A number of options may be provided as keyword arguments:
setUp
A set-up function. This is called before running the
tests in each file. The setUp function will be passed a DocTest
object. The setUp function can access the test globals as the
globs attribute of the test passed.
tearDown
A tear-down function. This is called after running the
tests in each file. The tearDown function will be passed a DocTest
object. The tearDown function can access the test globals as the
globs attribute of the test passed.
globs
A dictionary containing initial global variables for the tests.
optionflags
A set of doctest option flags expressed as an integer.
"""
if test_finder is None:
test_finder = DocTestFinder()
module = _normalize_module(module)
tests = test_finder.find(module, globs=globs, extraglobs=extraglobs)
    if not tests and sys.flags.optimize >= 2:
# Skip doctests when running with -O2
suite = _DocTestSuite()
suite.addTest(SkipDocTestCase(module))
return suite
elif not tests:
# Why do we want to do this? Because it reveals a bug that might
# otherwise be hidden.
# It is probably a bug that this exception is not also raised if the
# number of doctest examples in tests is zero (i.e. if no doctest
# examples were found). However, we should probably not be raising
# an exception at all here, though it is too late to make this change
# for a maintenance release. See also issue #14649.
raise ValueError(module, "has no docstrings")
tests.sort()
suite = _DocTestSuite()
for test in tests:
if len(test.examples) == 0:
continue
if not test.filename:
filename = module.__file__
if filename[-4:] in (".pyc", ".pyo"):
filename = filename[:-1]
test.filename = filename
suite.addTest(DocTestCase(test, **options))
return suite
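# --- Illustrative sketch (added for this listing, not part of the module):
# wiring doctests into a unittest run. The module name is hypothetical and
# the helper is never called here.
def _demo_doctest_suite():
    import unittest
    suite = DocTestSuite('mypackage.mymodule',
                         optionflags=NORMALIZE_WHITESPACE)
    unittest.TextTestRunner(verbosity=1).run(suite)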
class DocFileCase(DocTestCase):
def id(self):
return '_'.join(self._dt_test.name.split('.'))
def __repr__(self):
return self._dt_test.filename
__str__ = __repr__
def format_failure(self, err):
return ('Failed doctest test for %s\n File "%s", line 0\n\n%s'
% (self._dt_test.name, self._dt_test.filename, err)
)
def DocFileTest(path, module_relative=True, package=None,
globs=None, parser=DocTestParser(),
encoding=None, **options):
if globs is None:
globs = {}
else:
globs = globs.copy()
if package and not module_relative:
raise ValueError("Package may only be specified for module-"
"relative paths.")
# Relativize the path.
doc, path = _load_testfile(path, package, module_relative,
encoding or "utf-8")
if "__file__" not in globs:
globs["__file__"] = path
# Find the file and read it.
name = os.path.basename(path)
# Convert it to a test, and wrap it in a DocFileCase.
test = parser.get_doctest(doc, globs, name, path, 0)
return DocFileCase(test, **options)
def DocFileSuite(*paths, **kw):
"""A unittest suite for one or more doctest files.
The path to each doctest file is given as a string; the
interpretation of that string depends on the keyword argument
"module_relative".
A number of options may be provided as keyword arguments:
module_relative
If "module_relative" is True, then the given file paths are
interpreted as os-independent module-relative paths. By
default, these paths are relative to the calling module's
directory; but if the "package" argument is specified, then
they are relative to that package. To ensure os-independence,
"filename" should use "/" characters to separate path
segments, and may not be an absolute path (i.e., it may not
begin with "/").
If "module_relative" is False, then the given file paths are
interpreted as os-specific paths. These paths may be absolute
or relative (to the current working directory).
package
A Python package or the name of a Python package whose directory
should be used as the base directory for module relative paths.
If "package" is not specified, then the calling module's
directory is used as the base directory for module relative
filenames. It is an error to specify "package" if
"module_relative" is False.
setUp
A set-up function. This is called before running the
tests in each file. The setUp function will be passed a DocTest
object. The setUp function can access the test globals as the
globs attribute of the test passed.
tearDown
A tear-down function. This is called after running the
tests in each file. The tearDown function will be passed a DocTest
object. The tearDown function can access the test globals as the
globs attribute of the test passed.
globs
A dictionary containing initial global variables for the tests.
optionflags
A set of doctest option flags expressed as an integer.
parser
A DocTestParser (or subclass) that should be used to extract
tests from the files.
encoding
An encoding that will be used to convert the files to unicode.
"""
suite = _DocTestSuite()
# We do this here so that _normalize_module is called at the right
# level. If it were called in DocFileTest, then this function
# would be the caller and we might guess the package incorrectly.
if kw.get('module_relative', True):
kw['package'] = _normalize_module(kw.get('package'))
for path in paths:
suite.addTest(DocFileTest(path, **kw))
return suite
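# --- Illustrative sketch (added for this listing, not part of the module):
# collecting several doctest files into one unittest suite. The paths are
# hypothetical and the helper is never called here.
def _demo_docfile_suite():
    import unittest
    suite = DocFileSuite('docs/intro.txt', 'docs/advanced.txt',
                         module_relative=True, optionflags=ELLIPSIS)
    unittest.TextTestRunner().run(suite)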
######################################################################
## 8. Debugging Support
######################################################################
def script_from_examples(s):
r"""Extract script from text with examples.
Converts text with examples to a Python script. Example input is
converted to regular code. Example output and all other words
are converted to comments:
>>> text = '''
... Here are examples of simple math.
...
... Python has super accurate integer addition
...
... >>> 2 + 2
... 5
...
... And very friendly error messages:
...
... >>> 1/0
... To Infinity
... And
... Beyond
...
... You can use logic if you want:
...
... >>> if 0:
... ... blah
... ... blah
... ...
...
... Ho hum
... '''
>>> print(script_from_examples(text))
# Here are examples of simple math.
#
# Python has super accurate integer addition
#
2 + 2
# Expected:
## 5
#
# And very friendly error messages:
#
1/0
# Expected:
## To Infinity
## And
## Beyond
#
# You can use logic if you want:
#
if 0:
blah
blah
#
# Ho hum
<BLANKLINE>
"""
output = []
for piece in DocTestParser().parse(s):
if isinstance(piece, Example):
# Add the example's source code (strip trailing NL)
output.append(piece.source[:-1])
# Add the expected output:
want = piece.want
if want:
output.append('# Expected:')
output += ['## '+l for l in want.split('\n')[:-1]]
else:
# Add non-example text.
output += [_comment_line(l)
for l in piece.split('\n')[:-1]]
# Trim junk on both ends.
while output and output[-1] == '#':
output.pop()
while output and output[0] == '#':
output.pop(0)
# Combine the output, and return it.
# Add a courtesy newline to prevent exec from choking (see bug #1172785)
return '\n'.join(output) + '\n'
def testsource(module, name):
"""Extract the test sources from a doctest docstring as a script.
Provide the module (or dotted name of the module) containing the
test to be debugged and the name (within the module) of the object
with the doc string with tests to be debugged.
"""
module = _normalize_module(module)
tests = DocTestFinder().find(module)
test = [t for t in tests if t.name == name]
if not test:
raise ValueError(name, "not found in tests")
test = test[0]
testsrc = script_from_examples(test.docstring)
return testsrc
def debug_src(src, pm=False, globs=None):
"""Debug a single doctest docstring, in argument `src`'"""
testsrc = script_from_examples(src)
debug_script(testsrc, pm, globs)
def debug_script(src, pm=False, globs=None):
"Debug a test script. `src` is the script, as a string."
import pdb
if globs:
globs = globs.copy()
else:
globs = {}
if pm:
try:
exec(src, globs, globs)
except:
print(sys.exc_info()[1])
p = pdb.Pdb(nosigint=True)
p.reset()
p.interaction(None, sys.exc_info()[2])
else:
pdb.Pdb(nosigint=True).run("exec(%r)" % src, globs, globs)
def debug(module, name, pm=False):
"""Debug a single doctest docstring.
Provide the module (or dotted name of the module) containing the
test to be debugged and the name (within the module) of the object
with the docstring with tests to be debugged.
"""
module = _normalize_module(module)
testsrc = testsource(module, name)
debug_script(testsrc, pm, module.__dict__)
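# --- Illustrative sketch (added for this listing, not part of the module):
# the debug helpers above are interactive, but the script extraction they
# rely on can be exercised directly. The helper name is hypothetical.
def _demo_script_extraction():
    src = script_from_examples('>>> x = 1\n>>> x + 1\n2\n')
    # src is now "x = 1\nx + 1\n# Expected:\n## 2\n", a plain script that
    # could be handed to debug_script(src) for stepping through under pdb.
    return src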
######################################################################
## 9. Example Usage
######################################################################
class _TestClass:
"""
A pointless class, for sanity-checking of docstring testing.
Methods:
square()
get()
>>> _TestClass(13).get() + _TestClass(-12).get()
1
>>> hex(_TestClass(13).square().get())
'0xa9'
"""
def __init__(self, val):
"""val -> _TestClass object with associated value val.
>>> t = _TestClass(123)
>>> print(t.get())
123
"""
self.val = val
def square(self):
"""square() -> square TestClass's associated value
>>> _TestClass(13).square().get()
169
"""
self.val = self.val ** 2
return self
def get(self):
"""get() -> return TestClass's associated value.
>>> x = _TestClass(-42)
>>> print(x.get())
-42
"""
return self.val
__test__ = {"_TestClass": _TestClass,
"string": r"""
Example of a string object, searched as-is.
>>> x = 1; y = 2
>>> x + y, x * y
(3, 2)
""",
"bool-int equivalence": r"""
In 2.2, boolean expressions displayed
0 or 1. By default, we still accept
them. This can be disabled by passing
DONT_ACCEPT_TRUE_FOR_1 to the new
optionflags argument.
>>> 4 == 4
1
>>> 4 == 4
True
>>> 4 > 4
0
>>> 4 > 4
False
""",
"blank lines": r"""
Blank lines can be marked with <BLANKLINE>:
>>> print('foo\n\nbar\n')
foo
<BLANKLINE>
bar
<BLANKLINE>
""",
"ellipsis": r"""
If the ellipsis flag is used, then '...' can be used to
elide substrings in the desired output:
>>> print(list(range(1000))) #doctest: +ELLIPSIS
[0, 1, 2, ..., 999]
""",
"whitespace normalization": r"""
If the whitespace normalization flag is used, then
differences in whitespace are ignored.
>>> print(list(range(30))) #doctest: +NORMALIZE_WHITESPACE
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29]
""",
}
def _test():
parser = argparse.ArgumentParser(description="doctest runner")
parser.add_argument('-v', '--verbose', action='store_true', default=False,
help='print very verbose output for all tests')
parser.add_argument('-o', '--option', action='append',
choices=OPTIONFLAGS_BY_NAME.keys(), default=[],
help=('specify a doctest option flag to apply'
' to the test run; may be specified more'
' than once to apply multiple options'))
parser.add_argument('-f', '--fail-fast', action='store_true',
help=('stop running tests after first failure (this'
' is a shorthand for -o FAIL_FAST, and is'
' in addition to any other -o options)'))
parser.add_argument('file', nargs='+',
help='file containing the tests to run')
args = parser.parse_args()
testfiles = args.file
# Verbose used to be handled by the "inspect argv" magic in DocTestRunner,
# but since we are using argparse we are passing it manually now.
verbose = args.verbose
options = 0
for option in args.option:
options |= OPTIONFLAGS_BY_NAME[option]
if args.fail_fast:
options |= FAIL_FAST
for filename in testfiles:
if filename.endswith(".py"):
# It is a module -- insert its dir into sys.path and try to
# import it. If it is part of a package, that possibly
# won't work because of package imports.
dirname, filename = os.path.split(filename)
sys.path.insert(0, dirname)
m = __import__(filename[:-3])
del sys.path[0]
failures, _ = testmod(m, verbose=verbose, optionflags=options)
else:
failures, _ = testfile(filename, module_relative=False,
verbose=verbose, optionflags=options)
if failures:
return 1
return 0
if __name__ == "__main__":
sys.exit(_test())
"""Faux ``threading`` version using ``dummy_thread`` instead of ``thread``.
The module ``_dummy_threading`` is added to ``sys.modules`` in order
to not have ``threading`` considered imported. Had ``threading`` been
directly imported it would have made all subsequent imports succeed
regardless of whether ``_thread`` was available which is not desired.
"""
from sys import modules as sys_modules
import _dummy_thread
# Declaring now so as to not have to nest ``try``s to get proper clean-up.
holding_thread = False
holding_threading = False
holding__threading_local = False
try:
# Could have checked if ``_thread`` was not in sys.modules and gone
# a different route, but decided to mirror technique used with
# ``threading`` below.
if '_thread' in sys_modules:
held_thread = sys_modules['_thread']
holding_thread = True
# Must have some module named ``_thread`` that implements its API
# in order to initially import ``threading``.
sys_modules['_thread'] = sys_modules['_dummy_thread']
if 'threading' in sys_modules:
# If ``threading`` is already imported, might as well prevent
# trying to import it more than needed by saving it if it is
# already imported before deleting it.
held_threading = sys_modules['threading']
holding_threading = True
del sys_modules['threading']
if '_threading_local' in sys_modules:
# If ``_threading_local`` is already imported, might as well prevent
# trying to import it more than needed by saving it if it is
# already imported before deleting it.
held__threading_local = sys_modules['_threading_local']
holding__threading_local = True
del sys_modules['_threading_local']
import threading
# Need a copy of the code kept somewhere...
sys_modules['_dummy_threading'] = sys_modules['threading']
del sys_modules['threading']
sys_modules['_dummy__threading_local'] = sys_modules['_threading_local']
del sys_modules['_threading_local']
from _dummy_threading import *
from _dummy_threading import __all__
finally:
# Put back ``threading`` if we overwrote earlier
if holding_threading:
sys_modules['threading'] = held_threading
del held_threading
del holding_threading
# Put back ``_threading_local`` if we overwrote earlier
if holding__threading_local:
sys_modules['_threading_local'] = held__threading_local
del held__threading_local
del holding__threading_local
# Put back ``thread`` if we overwrote, else del the entry we made
if holding_thread:
sys_modules['_thread'] = held_thread
del held_thread
else:
del sys_modules['_thread']
del holding_thread
del _dummy_thread
del sys_modules
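# --- Illustrative sketch (added for this listing, not part of the module):
# how a caller would fall back to this faux version on interpreters built
# without ``_thread``. The helper name is hypothetical; it is never called.
def _demo_threading_fallback():
    try:
        import threading
    except ImportError:
        import _dummy_threading as threading
    return threading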
import sys
from collections import OrderedDict
from types import MappingProxyType, DynamicClassAttribute
__all__ = ['Enum', 'IntEnum', 'unique']
def _is_descriptor(obj):
"""Returns True if obj is a descriptor, False otherwise."""
return (
hasattr(obj, '__get__') or
hasattr(obj, '__set__') or
hasattr(obj, '__delete__'))
def _is_dunder(name):
"""Returns True if a __dunder__ name, False otherwise."""
return (name[:2] == name[-2:] == '__' and
name[2:3] != '_' and
name[-3:-2] != '_' and
len(name) > 4)
def _is_sunder(name):
"""Returns True if a _sunder_ name, False otherwise."""
return (name[0] == name[-1] == '_' and
name[1:2] != '_' and
name[-2:-1] != '_' and
len(name) > 2)
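# --- Illustrative sketch (added for this listing, not part of the module):
# how the two name predicates above classify attribute names. The helper
# name is hypothetical; it is never called.
def _demo_name_predicates():
    assert _is_dunder('__repr__') and not _is_dunder('_name_')
    assert _is_sunder('_name_') and not _is_sunder('__repr__')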
def _make_class_unpicklable(cls):
"""Make the given class un-picklable."""
def _break_on_call_reduce(self, proto):
raise TypeError('%r cannot be pickled' % self)
cls.__reduce_ex__ = _break_on_call_reduce
cls.__module__ = '<unknown>'
class _EnumDict(dict):
"""Track enum member order and ensure member names are not reused.
EnumMeta will use the names found in self._member_names as the
enumeration member names.
"""
def __init__(self):
super().__init__()
self._member_names = []
def __setitem__(self, key, value):
"""Changes anything not dundered or not a descriptor.
If an enum member name is used twice, an error is raised; duplicate
values are not checked for.
Single underscore (sunder) names are reserved.
"""
if _is_sunder(key):
raise ValueError('_names_ are reserved for future Enum use')
elif _is_dunder(key):
pass
elif key in self._member_names:
# descriptor overwriting an enum?
raise TypeError('Attempted to reuse key: %r' % key)
elif not _is_descriptor(value):
if key in self:
# enum overwriting a descriptor?
raise TypeError('Key already defined as: %r' % self[key])
self._member_names.append(key)
super().__setitem__(key, value)
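# --- Illustrative sketch (added for this listing, not part of the module):
# _EnumDict records member names and rejects reuse. The helper name is
# hypothetical; it is never called.
def _demo_enumdict():
    d = _EnumDict()
    d['red'] = 1                      # recorded as a member name
    assert d._member_names == ['red']
    # d['red'] = 2                    # would raise TypeError (key reuse)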
# Dummy value for Enum as EnumMeta explicitly checks for it, but of course
# until EnumMeta finishes running the first time the Enum class doesn't exist.
# This is also why there are checks in EnumMeta like `if Enum is not None`
Enum = None
class EnumMeta(type):
"""Metaclass for Enum"""
@classmethod
def __prepare__(metacls, cls, bases):
return _EnumDict()
def __new__(metacls, cls, bases, classdict):
# an Enum class is final once enumeration items have been defined; it
# cannot be mixed with other types (int, float, etc.) if it has an
# inherited __new__ unless a new __new__ is defined (or the resulting
# class will fail).
member_type, first_enum = metacls._get_mixins_(bases)
__new__, save_new, use_args = metacls._find_new_(classdict, member_type,
first_enum)
# save enum items into separate mapping so they don't get baked into
# the new class
members = {k: classdict[k] for k in classdict._member_names}
for name in classdict._member_names:
del classdict[name]
# check for illegal enum names (any others?)
invalid_names = set(members) & {'mro', }
if invalid_names:
raise ValueError('Invalid enum member name: {0}'.format(
','.join(invalid_names)))
# create our new Enum type
enum_class = super().__new__(metacls, cls, bases, classdict)
enum_class._member_names_ = [] # names in definition order
enum_class._member_map_ = OrderedDict() # name->value map
enum_class._member_type_ = member_type
# Reverse value->name map for hashable values.
enum_class._value2member_map_ = {}
# If a custom type is mixed into the Enum, and it does not know how
# to pickle itself, pickle.dumps will succeed but pickle.loads will
# fail. Rather than have the error show up later and possibly far
# from the source, sabotage the pickle protocol for this class so
# that pickle.dumps also fails.
#
# However, if the new class implements its own __reduce_ex__, do not
# sabotage -- it's on them to make sure it works correctly. We use
# __reduce_ex__ instead of any of the others as it is preferred by
# pickle over __reduce__, and it handles all pickle protocols.
if '__reduce_ex__' not in classdict:
if member_type is not object:
methods = ('__getnewargs_ex__', '__getnewargs__',
'__reduce_ex__', '__reduce__')
if not any(m in member_type.__dict__ for m in methods):
_make_class_unpicklable(enum_class)
# instantiate them, checking for duplicates as we go
# we instantiate first instead of checking for duplicates first in case
# a custom __new__ is doing something funky with the values -- such as
# auto-numbering ;)
for member_name in classdict._member_names:
value = members[member_name]
if not isinstance(value, tuple):
args = (value, )
else:
args = value
if member_type is tuple: # special case for tuple enums
args = (args, ) # wrap it one more time
if not use_args:
enum_member = __new__(enum_class)
if not hasattr(enum_member, '_value_'):
enum_member._value_ = value
else:
enum_member = __new__(enum_class, *args)
if not hasattr(enum_member, '_value_'):
enum_member._value_ = member_type(*args)
value = enum_member._value_
enum_member._name_ = member_name
enum_member.__objclass__ = enum_class
enum_member.__init__(*args)
# If another member with the same value was already defined, the
# new member becomes an alias to the existing one.
for name, canonical_member in enum_class._member_map_.items():
if canonical_member._value_ == enum_member._value_:
enum_member = canonical_member
break
else:
# Aliases don't appear in member names (only in __members__).
enum_class._member_names_.append(member_name)
enum_class._member_map_[member_name] = enum_member
try:
# This may fail if value is not hashable. We can't add the value
# to the map, and by-value lookups for this value will be
# linear.
enum_class._value2member_map_[value] = enum_member
except TypeError:
pass
# double check that repr and friends are not the mixin's or various
# things break (such as pickle)
for name in ('__repr__', '__str__', '__format__', '__reduce_ex__'):
class_method = getattr(enum_class, name)
obj_method = getattr(member_type, name, None)
enum_method = getattr(first_enum, name, None)
if obj_method is not None and obj_method is class_method:
setattr(enum_class, name, enum_method)
# replace any other __new__ with our own (as long as Enum is not None,
# anyway) -- again, this is to support pickle
if Enum is not None:
# if the user defined their own __new__, save it before it gets
# clobbered in case they subclass later
if save_new:
enum_class.__new_member__ = __new__
enum_class.__new__ = Enum.__new__
return enum_class
def __bool__(self):
"""
        Classes/types should always be True.
"""
return True
def __call__(cls, value, names=None, *, module=None, qualname=None, type=None):
"""Either returns an existing member, or creates a new enum class.
This method is used both when an enum class is given a value to match
to an enumeration member (i.e. Color(3)) and for the functional API
(i.e. Color = Enum('Color', names='red green blue')).
When used for the functional API:
`value` will be the name of the new class.
`names` should be either a string of white-space/comma delimited names
(values will start at 1), or an iterator/mapping of name, value pairs.
`module` should be set to the module this class is being created in;
if it is not set, an attempt to find that module will be made, but if
it fails the class will not be picklable.
`qualname` should be set to the actual location this class can be found
at in its module; by default it is set to the global scope. If this is
not correct, unpickling will fail in some circumstances.
`type`, if set, will be mixed in as the first base class.
"""
if names is None: # simple value lookup
return cls.__new__(cls, value)
# otherwise, functional API: we're creating a new Enum type
return cls._create_(value, names, module=module, qualname=qualname, type=type)
def __contains__(cls, member):
return isinstance(member, cls) and member._name_ in cls._member_map_
def __delattr__(cls, attr):
# nicer error message when someone tries to delete an attribute
# (see issue19025).
if attr in cls._member_map_:
raise AttributeError(
"%s: cannot delete Enum member." % cls.__name__)
super().__delattr__(attr)
def __dir__(self):
return (['__class__', '__doc__', '__members__', '__module__'] +
self._member_names_)
def __getattr__(cls, name):
"""Return the enum member matching `name`
We use __getattr__ instead of descriptors or inserting into the enum
class' __dict__ in order to support `name` and `value` being both
properties for enum members (which live in the class' __dict__) and
enum members themselves.
"""
if _is_dunder(name):
raise AttributeError(name)
try:
return cls._member_map_[name]
except KeyError:
raise AttributeError(name) from None
def __getitem__(cls, name):
return cls._member_map_[name]
def __iter__(cls):
return (cls._member_map_[name] for name in cls._member_names_)
def __len__(cls):
return len(cls._member_names_)
@property
def __members__(cls):
"""Returns a mapping of member name->value.
This mapping lists all enum members, including aliases. Note that this
is a read-only view of the internal mapping.
"""
return MappingProxyType(cls._member_map_)
def __repr__(cls):
return "<enum %r>" % cls.__name__
def __reversed__(cls):
return (cls._member_map_[name] for name in reversed(cls._member_names_))
def __setattr__(cls, name, value):
"""Block attempts to reassign Enum members.
A simple assignment to the class namespace only changes one of the
several possible ways to get an Enum member from the Enum class,
resulting in an inconsistent Enumeration.
"""
member_map = cls.__dict__.get('_member_map_', {})
if name in member_map:
raise AttributeError('Cannot reassign members.')
super().__setattr__(name, value)
def _create_(cls, class_name, names=None, *, module=None, qualname=None, type=None):
"""Convenience method to create a new Enum class.
`names` can be:
* A string containing member names, separated either with spaces or
commas. Values are auto-numbered from 1.
* An iterable of member names. Values are auto-numbered from 1.
* An iterable of (member name, value) pairs.
* A mapping of member name -> value.
"""
metacls = cls.__class__
bases = (cls, ) if type is None else (type, cls)
classdict = metacls.__prepare__(class_name, bases)
# special processing needed for names?
if isinstance(names, str):
names = names.replace(',', ' ').split()
if isinstance(names, (tuple, list)) and isinstance(names[0], str):
names = [(e, i) for (i, e) in enumerate(names, 1)]
# Here, names is either an iterable of (name, value) or a mapping.
for item in names:
if isinstance(item, str):
member_name, member_value = item, names[item]
else:
member_name, member_value = item
classdict[member_name] = member_value
enum_class = metacls.__new__(metacls, class_name, bases, classdict)
# TODO: replace the frame hack if a blessed way to know the calling
# module is ever developed
if module is None:
try:
module = sys._getframe(2).f_globals['__name__']
except (AttributeError, ValueError) as exc:
pass
if module is None:
_make_class_unpicklable(enum_class)
else:
enum_class.__module__ = module
if qualname is not None:
enum_class.__qualname__ = qualname
return enum_class
@staticmethod
def _get_mixins_(bases):
"""Returns the type for creating enum members, and the first inherited
enum class.
bases: the tuple of bases that was given to __new__
"""
if not bases:
return object, Enum
# double check that we are not subclassing a class with existing
# enumeration members; while we're at it, see if any other data
# type has been mixed in so we can use the correct __new__
member_type = first_enum = None
for base in bases:
if (base is not Enum and
issubclass(base, Enum) and
base._member_names_):
raise TypeError("Cannot extend enumerations")
# base is now the last base in bases
if not issubclass(base, Enum):
raise TypeError("new enumerations must be created as "
"`ClassName([mixin_type,] enum_type)`")
# get correct mix-in type (either mix-in type of Enum subclass, or
# first base if last base is Enum)
if not issubclass(bases[0], Enum):
member_type = bases[0] # first data type
first_enum = bases[-1] # enum type
else:
for base in bases[0].__mro__:
# most common: (IntEnum, int, Enum, object)
# possible: (<Enum 'AutoIntEnum'>, <Enum 'IntEnum'>,
# <class 'int'>, <Enum 'Enum'>,
# <class 'object'>)
if issubclass(base, Enum):
if first_enum is None:
first_enum = base
else:
if member_type is None:
member_type = base
return member_type, first_enum
@staticmethod
def _find_new_(classdict, member_type, first_enum):
"""Returns the __new__ to be used for creating the enum members.
classdict: the class dictionary given to __new__
member_type: the data type whose __new__ will be used by default
first_enum: enumeration to check for an overriding __new__
"""
        # now find the correct __new__, checking to see if one was defined
# by the user; also check earlier enum classes in case a __new__ was
# saved as __new_member__
__new__ = classdict.get('__new__', None)
# should __new__ be saved as __new_member__ later?
save_new = __new__ is not None
if __new__ is None:
# check all possibles for __new_member__ before falling back to
# __new__
for method in ('__new_member__', '__new__'):
for possible in (member_type, first_enum):
target = getattr(possible, method, None)
if target not in {
None,
None.__new__,
object.__new__,
Enum.__new__,
}:
__new__ = target
break
if __new__ is not None:
break
else:
__new__ = object.__new__
# if a non-object.__new__ is used then whatever value/tuple was
# assigned to the enum member name will be passed to __new__ and to the
# new enum member's __init__
if __new__ is object.__new__:
use_args = False
else:
use_args = True
return __new__, save_new, use_args
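# --- Illustrative sketch (added for this listing, not part of the module):
# EnumMeta.__call__ supports both member lookup and the functional API.
# The names are hypothetical; the helper is never called.
def _demo_functional_api():
    Color = Enum('Color', 'red green blue')  # values auto-number from 1
    assert Color.red.value == 1
    assert Color(2) is Color.green           # value lookup via __call__
    assert Color['blue'] is Color.blue       # name lookup via __getitem__
    return Color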
class Enum(metaclass=EnumMeta):
"""Generic enumeration.
Derive from this class to define new enumerations.
"""
def __new__(cls, value):
# all enum instances are actually created during class construction
# without calling this method; this method is called by the metaclass'
# __call__ (i.e. Color(3) ), and by pickle
if type(value) is cls:
# For lookups like Color(Color.red)
return value
# by-value search for a matching enum member
# see if it's in the reverse mapping (for hashable values)
try:
if value in cls._value2member_map_:
return cls._value2member_map_[value]
except TypeError:
# not there, now do long search -- O(n) behavior
for member in cls._member_map_.values():
if member._value_ == value:
return member
raise ValueError("%r is not a valid %s" % (value, cls.__name__))
def __repr__(self):
return "<%s.%s: %r>" % (
self.__class__.__name__, self._name_, self._value_)
def __str__(self):
return "%s.%s" % (self.__class__.__name__, self._name_)
def __dir__(self):
added_behavior = [
m
for cls in self.__class__.mro()
for m in cls.__dict__
if m[0] != '_'
]
return (['__class__', '__doc__', '__module__', 'name', 'value'] +
added_behavior)
def __format__(self, format_spec):
# mixed-in Enums should use the mixed-in type's __format__, otherwise
# we can get strange results with the Enum name showing up instead of
# the value
# pure Enum branch
if self._member_type_ is object:
cls = str
val = str(self)
# mix-in branch
else:
cls = self._member_type_
val = self._value_
return cls.__format__(val, format_spec)
def __hash__(self):
return hash(self._name_)
def __reduce_ex__(self, proto):
return self.__class__, (self._value_, )
# DynamicClassAttribute is used to provide access to the `name` and
# `value` properties of enum members while keeping some measure of
# protection from modification, while still allowing for an enumeration
# to have members named `name` and `value`. This works because enumeration
# members are not set directly on the enum class -- __getattr__ is
# used to look them up.
@DynamicClassAttribute
def name(self):
"""The name of the Enum member."""
return self._name_
@DynamicClassAttribute
def value(self):
"""The value of the Enum member."""
return self._value_
@classmethod
def _convert(cls, name, module, filter, source=None):
"""
Create a new Enum subclass that replaces a collection of global constants
"""
# convert all constants from source (or module) that pass filter() to
# a new Enum called name, and export the enum and its members back to
# module;
# also, replace the __reduce_ex__ method so unpickling works in
# previous Python versions
module_globals = vars(sys.modules[module])
if source:
source = vars(source)
else:
source = module_globals
members = {name: value for name, value in source.items()
if filter(name)}
cls = cls(name, members, module=module)
cls.__reduce_ex__ = _reduce_ex_by_name
module_globals.update(cls.__members__)
module_globals[name] = cls
return cls
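# --- Illustrative sketch (added for this listing, not part of the module):
# the declarative way to define an enumeration. The names are hypothetical;
# the helper is never called.
def _demo_enum_basics():
    class Weekday(Enum):
        monday = 1
        tuesday = 2
    assert Weekday.monday.name == 'monday'
    assert Weekday.monday.value == 1
    assert list(Weekday) == [Weekday.monday, Weekday.tuesday]
    assert repr(Weekday.monday) == '<Weekday.monday: 1>'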
class IntEnum(int, Enum):
"""Enum where members are also (and must be) ints"""
def _reduce_ex_by_name(self, proto):
return self.name
def unique(enumeration):
"""Class decorator for enumerations ensuring unique member values."""
duplicates = []
for name, member in enumeration.__members__.items():
if name != member.name:
duplicates.append((name, member.name))
if duplicates:
alias_details = ', '.join(
["%s -> %s" % (alias, name) for (alias, name) in duplicates])
raise ValueError('duplicate values found in %r: %s' %
(enumeration, alias_details))
return enumeration
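# --- Illustrative sketch (added for this listing, not part of the module):
# @unique rejects aliases created by duplicate values. The names are
# hypothetical; the helper is never called.
def _demo_unique():
    @unique
    class Shade(IntEnum):
        black = 0
        white = 1
    # A duplicate value would fail at decoration time:
    #     @unique
    #     class Bad(Enum):
    #         a = 1
    #         b = 1   # alias for 'a' -> ValueError
    return Shade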
"""Utilities for comparing files and directories.
Classes:
dircmp
Functions:
cmp(f1, f2, shallow=True) -> int
cmpfiles(a, b, common) -> ([], [], [])
clear_cache()
"""
import os
import stat
from itertools import filterfalse
__all__ = ['clear_cache', 'cmp', 'dircmp', 'cmpfiles', 'DEFAULT_IGNORES']
_cache = {}
BUFSIZE = 8*1024
DEFAULT_IGNORES = [
'RCS', 'CVS', 'tags', '.git', '.hg', '.bzr', '_darcs', '__pycache__']
def clear_cache():
"""Clear the filecmp cache."""
_cache.clear()
def cmp(f1, f2, shallow=True):
"""Compare two files.
Arguments:
f1 -- First file name
f2 -- Second file name
shallow -- Just check stat signature (do not read the files).
defaults to True.
Return value:
True if the files are the same, False otherwise.
This function uses a cache for past comparisons and the results,
with cache entries invalidated if their stat information
changes. The cache may be cleared by calling clear_cache().
"""
s1 = _sig(os.stat(f1))
s2 = _sig(os.stat(f2))
if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
return False
if shallow and s1 == s2:
return True
if s1[1] != s2[1]:
return False
outcome = _cache.get((f1, f2, s1, s2))
if outcome is None:
outcome = _do_cmp(f1, f2)
if len(_cache) > 100: # limit the maximum size of the cache
clear_cache()
_cache[f1, f2, s1, s2] = outcome
return outcome
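# Illustrative example (not part of the original module); the file names are
# hypothetical:
#
#     >>> import filecmp
#     >>> filecmp.cmp('a.txt', 'a_copy.txt')            # stat signatures match
#     True
#     >>> filecmp.cmp('a.txt', 'b.txt', shallow=False)  # byte-by-byte compare
#     False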
def _sig(st):
return (stat.S_IFMT(st.st_mode),
st.st_size,
st.st_mtime)
def _do_cmp(f1, f2):
bufsize = BUFSIZE
with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
while True:
b1 = fp1.read(bufsize)
b2 = fp2.read(bufsize)
if b1 != b2:
return False
if not b1:
return True
# Directory comparison class.
#
class dircmp:
"""A class that manages the comparison of 2 directories.
dircmp(a, b, ignore=None, hide=None)
A and B are directories.
IGNORE is a list of names to ignore,
defaults to DEFAULT_IGNORES.
HIDE is a list of names to hide,
defaults to [os.curdir, os.pardir].
High level usage:
x = dircmp(dir1, dir2)
x.report() -> prints a report on the differences between dir1 and dir2
or
x.report_partial_closure() -> prints report on differences between dir1
and dir2, and reports on common immediate subdirectories.
x.report_full_closure() -> like report_partial_closure,
but fully recursive.
Attributes:
left_list, right_list: The files in dir1 and dir2,
filtered by hide and ignore.
common: a list of names in both dir1 and dir2.
left_only, right_only: names only in dir1, dir2.
common_dirs: subdirectories in both dir1 and dir2.
common_files: files in both dir1 and dir2.
common_funny: names in both dir1 and dir2 where the type differs between
dir1 and dir2, or the name is not stat-able.
same_files: list of identical files.
diff_files: list of filenames which differ.
funny_files: list of files which could not be compared.
subdirs: a dictionary of dircmp objects, keyed by names in common_dirs.
"""
def __init__(self, a, b, ignore=None, hide=None): # Initialize
self.left = a
self.right = b
if hide is None:
self.hide = [os.curdir, os.pardir] # Names never to be shown
else:
self.hide = hide
if ignore is None:
self.ignore = DEFAULT_IGNORES
else:
self.ignore = ignore
def phase0(self): # Compare everything except common subdirectories
self.left_list = _filter(os.listdir(self.left),
self.hide+self.ignore)
self.right_list = _filter(os.listdir(self.right),
self.hide+self.ignore)
self.left_list.sort()
self.right_list.sort()
def phase1(self): # Compute common names
a = dict(zip(map(os.path.normcase, self.left_list), self.left_list))
b = dict(zip(map(os.path.normcase, self.right_list), self.right_list))
self.common = list(map(a.__getitem__, filter(b.__contains__, a)))
self.left_only = list(map(a.__getitem__, filterfalse(b.__contains__, a)))
self.right_only = list(map(b.__getitem__, filterfalse(a.__contains__, b)))
def phase2(self): # Distinguish files, directories, funnies
self.common_dirs = []
self.common_files = []
self.common_funny = []
for x in self.common:
a_path = os.path.join(self.left, x)
b_path = os.path.join(self.right, x)
ok = 1
try:
a_stat = os.stat(a_path)
except OSError as why:
# print('Can\'t stat', a_path, ':', why.args[1])
ok = 0
try:
b_stat = os.stat(b_path)
except OSError as why:
# print('Can\'t stat', b_path, ':', why.args[1])
ok = 0
if ok:
a_type = stat.S_IFMT(a_stat.st_mode)
b_type = stat.S_IFMT(b_stat.st_mode)
if a_type != b_type:
self.common_funny.append(x)
elif stat.S_ISDIR(a_type):
self.common_dirs.append(x)
elif stat.S_ISREG(a_type):
self.common_files.append(x)
else:
self.common_funny.append(x)
else:
self.common_funny.append(x)
def phase3(self): # Find out differences between common files
xx = cmpfiles(self.left, self.right, self.common_files)
self.same_files, self.diff_files, self.funny_files = xx
def phase4(self): # Find out differences between common subdirectories
# A new dircmp object is created for each common subdirectory,
# these are stored in a dictionary indexed by filename.
# The hide and ignore properties are inherited from the parent
self.subdirs = {}
for x in self.common_dirs:
a_x = os.path.join(self.left, x)
b_x = os.path.join(self.right, x)
self.subdirs[x] = dircmp(a_x, b_x, self.ignore, self.hide)
def phase4_closure(self): # Recursively call phase4() on subdirectories
self.phase4()
for sd in self.subdirs.values():
sd.phase4_closure()
def report(self): # Print a report on the differences between a and b
# Output format is purposely lousy
print('diff', self.left, self.right)
if self.left_only:
self.left_only.sort()
print('Only in', self.left, ':', self.left_only)
if self.right_only:
self.right_only.sort()
print('Only in', self.right, ':', self.right_only)
if self.same_files:
self.same_files.sort()
print('Identical files :', self.same_files)
if self.diff_files:
self.diff_files.sort()
print('Differing files :', self.diff_files)
if self.funny_files:
self.funny_files.sort()
print('Trouble with common files :', self.funny_files)
if self.common_dirs:
self.common_dirs.sort()
print('Common subdirectories :', self.common_dirs)
if self.common_funny:
self.common_funny.sort()
print('Common funny cases :', self.common_funny)
def report_partial_closure(self): # Print reports on self and on subdirs
self.report()
for sd in self.subdirs.values():
print()
sd.report()
def report_full_closure(self): # Report on self and subdirs recursively
self.report()
for sd in self.subdirs.values():
print()
sd.report_full_closure()
methodmap = dict(subdirs=phase4,
same_files=phase3, diff_files=phase3, funny_files=phase3,
common_dirs = phase2, common_files=phase2, common_funny=phase2,
common=phase1, left_only=phase1, right_only=phase1,
left_list=phase0, right_list=phase0)
def __getattr__(self, attr):
if attr not in self.methodmap:
raise AttributeError(attr)
self.methodmap[attr](self)
return getattr(self, attr)
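# Illustrative example (not part of the original module); 'dir1'/'dir2' are
# hypothetical paths. Attributes such as diff_files are computed lazily via
# __getattr__ and the methodmap above:
#
#     >>> from filecmp import dircmp
#     >>> d = dircmp('dir1', 'dir2')
#     >>> d.report_full_closure()   # recursive difference report
#     >>> d.diff_files              # triggers phase3 on first access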
def cmpfiles(a, b, common, shallow=True):
"""Compare common files in two directories.
a, b -- directory names
common -- list of file names found in both directories
shallow -- if true, do comparison based solely on stat() information
Returns a tuple of three lists:
files that compare equal
files that are different
filenames that aren't regular files.
"""
res = ([], [], [])
for x in common:
ax = os.path.join(a, x)
bx = os.path.join(b, x)
res[_cmp(ax, bx, shallow)].append(x)
return res
# Compare two files.
# Return:
# 0 for equal
# 1 for different
# 2 for funny cases (can't stat, etc.)
#
def _cmp(a, b, sh, abs=abs, cmp=cmp):
try:
return not abs(cmp(a, b, sh))
except OSError:
return 2
# Return a copy with items that occur in skip removed.
#
def _filter(flist, skip):
return list(filterfalse(skip.__contains__, flist))
# Demonstration and testing.
#
def demo():
import sys
import getopt
options, args = getopt.getopt(sys.argv[1:], 'r')
if len(args) != 2:
raise getopt.GetoptError('need exactly two args', None)
dd = dircmp(args[0], args[1])
if ('-r', '') in options:
dd.report_full_closure()
else:
dd.report()
if __name__ == '__main__':
demo()
"""Helper class to quickly write a loop over all standard input files.
Typical use is:
import fileinput
for line in fileinput.input():
process(line)
This iterates over the lines of all files listed in sys.argv[1:],
defaulting to sys.stdin if the list is empty. If a filename is '-' it
is also replaced by sys.stdin. To specify an alternative list of
filenames, pass it as the argument to input(). A single file name is
also allowed.
Functions filename(), lineno() return the filename and cumulative line
number of the line that has just been read; filelineno() returns its
line number in the current file; isfirstline() returns true iff the
line just read is the first line of its file; isstdin() returns true
iff the line was read from sys.stdin. Function nextfile() closes the
current file so that the next iteration will read the first line from
the next file (if any); lines not read from the file will not count
towards the cumulative line count; the filename is not changed until
after the first line of the next file has been read. Function close()
closes the sequence.
Before any lines have been read, filename() returns None and both line
numbers are zero; nextfile() has no effect. After all lines have been
read, filename() and the line number functions return the values
pertaining to the last line read; nextfile() has no effect.
All files are opened in text mode by default; you can override this by
setting the mode parameter to input() or FileInput.__init__().
If an I/O error occurs during opening or reading a file, the OSError
exception is raised.
If sys.stdin is used more than once, the second and further use will
return no lines, except perhaps for interactive use, or if it has been
explicitly reset (e.g. using sys.stdin.seek(0)).
Empty files are opened and immediately closed; the only time their
presence in the list of filenames is noticeable at all is when the
last file opened is empty.
It is possible that the last line of a file doesn't end in a newline
character; otherwise lines are returned including the trailing
newline.
Class FileInput is the implementation; its methods filename(),
lineno(), filelineno(), isfirstline(), isstdin(), nextfile() and close()
correspond to the functions in the module. In addition it has a
readline() method which returns the next input line, and a
__getitem__() method which implements the sequence behavior. The
sequence must be accessed in strictly sequential order; sequence
access and readline() cannot be mixed.
Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved
to a backup file and standard output is directed to the input file.
This makes it possible to write a filter that rewrites its input file
in place. If the keyword argument backup=".<some extension>" is also
given, it specifies the extension for the backup file, and the backup
file remains around; by default, the extension is ".bak" and it is
deleted when the output file is closed. In-place filtering is
disabled when standard input is read. XXX The current implementation
does not work for MS-DOS 8+3 filesystems.
Performance: this module is unfortunately one of the slower ways of
processing large numbers of input lines. Nevertheless, a significant
speed-up has been obtained by using readlines(bufsize) instead of
readline(). A new keyword argument, bufsize=N, is present on the
input() function and the FileInput() class to override the default
buffer size.
XXX Possible additions:
- optional getopt argument processing
- isatty()
- read(), read(size), even readlines()
"""
import sys, os
__all__ = ["input", "close", "nextfile", "filename", "lineno", "filelineno",
"isfirstline", "isstdin", "FileInput"]
_state = None
DEFAULT_BUFSIZE = 8*1024
def input(files=None, inplace=False, backup="", bufsize=0,
mode="r", openhook=None):
"""Return an instance of the FileInput class, which can be iterated.
The parameters are passed to the constructor of the FileInput class.
The returned instance, in addition to being an iterator,
keeps global state for the functions of this module.
"""
global _state
if _state and _state._file:
raise RuntimeError("input() already active")
_state = FileInput(files, inplace, backup, bufsize, mode, openhook)
return _state
def close():
"""Close the sequence."""
global _state
state = _state
_state = None
if state:
state.close()
def nextfile():
"""
Close the current file so that the next iteration will read the first
line from the next file (if any); lines not read from the file will
not count towards the cumulative line count. The filename is not
changed until after the first line of the next file has been read.
Before the first line has been read, this function has no effect;
it cannot be used to skip the first file. After the last line of the
last file has been read, this function has no effect.
"""
if not _state:
raise RuntimeError("no active input()")
return _state.nextfile()
def filename():
"""
Return the name of the file currently being read.
Before the first line has been read, returns None.
"""
if not _state:
raise RuntimeError("no active input()")
return _state.filename()
def lineno():
"""
Return the cumulative line number of the line that has just been read.
Before the first line has been read, returns 0. After the last line
of the last file has been read, returns the line number of that line.
"""
if not _state:
raise RuntimeError("no active input()")
return _state.lineno()
def filelineno():
"""
Return the line number in the current file. Before the first line
has been read, returns 0. After the last line of the last file has
been read, returns the line number of that line within the file.
"""
if not _state:
raise RuntimeError("no active input()")
return _state.filelineno()
def fileno():
"""
Return the file number of the current file. When no file is currently
opened, returns -1.
"""
if not _state:
raise RuntimeError("no active input()")
return _state.fileno()
def isfirstline():
"""
Returns true if the line just read is the first line of its file,
otherwise returns false.
"""
if not _state:
raise RuntimeError("no active input()")
return _state.isfirstline()
def isstdin():
"""
Returns true if the last line was read from sys.stdin,
otherwise returns false.
"""
if not _state:
raise RuntimeError("no active input()")
return _state.isstdin()
class FileInput:
"""FileInput([files[, inplace[, backup[, bufsize, [, mode[, openhook]]]]]])
Class FileInput is the implementation of the module; its methods
filename(), lineno(), filelineno(), isfirstline(), isstdin(), fileno(),
nextfile() and close() correspond to the functions of the same name
in the module.
In addition it has a readline() method which returns the next
input line, and a __getitem__() method which implements the
sequence behavior. The sequence must be accessed in strictly
sequential order; random access and readline() cannot be mixed.
"""
def __init__(self, files=None, inplace=False, backup="", bufsize=0,
mode="r", openhook=None):
if isinstance(files, str):
files = (files,)
else:
if files is None:
files = sys.argv[1:]
if not files:
files = ('-',)
else:
files = tuple(files)
self._files = files
self._inplace = inplace
self._backup = backup
self._bufsize = bufsize or DEFAULT_BUFSIZE
self._savestdout = None
self._output = None
self._filename = None
self._lineno = 0
self._filelineno = 0
self._file = None
self._isstdin = False
self._backupfilename = None
self._buffer = []
self._bufindex = 0
# restrict mode argument to reading modes
if mode not in ('r', 'rU', 'U', 'rb'):
raise ValueError("FileInput opening mode must be one of "
"'r', 'rU', 'U' and 'rb'")
if 'U' in mode:
import warnings
warnings.warn("'U' mode is deprecated",
DeprecationWarning, 2)
self._mode = mode
if openhook:
if inplace:
raise ValueError("FileInput cannot use an opening hook in inplace mode")
if not callable(openhook):
raise ValueError("FileInput openhook must be callable")
self._openhook = openhook
def __del__(self):
self.close()
def close(self):
try:
self.nextfile()
finally:
self._files = ()
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
self.close()
def __iter__(self):
return self
def __next__(self):
try:
line = self._buffer[self._bufindex]
except IndexError:
pass
else:
self._bufindex += 1
self._lineno += 1
self._filelineno += 1
return line
line = self.readline()
if not line:
raise StopIteration
return line
def __getitem__(self, i):
if i != self._lineno:
raise RuntimeError("accessing lines out of order")
try:
return self.__next__()
except StopIteration:
raise IndexError("end of input reached")
def nextfile(self):
savestdout = self._savestdout
self._savestdout = 0
if savestdout:
sys.stdout = savestdout
output = self._output
self._output = 0
try:
if output:
output.close()
finally:
file = self._file
self._file = 0
try:
if file and not self._isstdin:
file.close()
finally:
backupfilename = self._backupfilename
self._backupfilename = 0
if backupfilename and not self._backup:
try: os.unlink(backupfilename)
except OSError: pass
self._isstdin = False
self._buffer = []
self._bufindex = 0
def readline(self):
try:
line = self._buffer[self._bufindex]
except IndexError:
pass
else:
self._bufindex += 1
self._lineno += 1
self._filelineno += 1
return line
if not self._file:
if not self._files:
if 'b' in self._mode:
return b''
else:
return ''
self._filename = self._files[0]
self._files = self._files[1:]
self._filelineno = 0
self._file = None
self._isstdin = False
self._backupfilename = 0
if self._filename == '-':
self._filename = '<stdin>'
if 'b' in self._mode:
self._file = sys.stdin.buffer
else:
self._file = sys.stdin
self._isstdin = True
else:
if self._inplace:
self._backupfilename = (
self._filename + (self._backup or ".bak"))
try:
os.unlink(self._backupfilename)
except OSError:
pass
# The next few lines may raise OSError
os.rename(self._filename, self._backupfilename)
self._file = open(self._backupfilename, self._mode)
try:
perm = os.fstat(self._file.fileno()).st_mode
except OSError:
self._output = open(self._filename, "w")
else:
mode = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
if hasattr(os, 'O_BINARY'):
mode |= os.O_BINARY
fd = os.open(self._filename, mode, perm)
self._output = os.fdopen(fd, "w")
try:
if hasattr(os, 'chmod'):
os.chmod(self._filename, perm)
except OSError:
pass
self._savestdout = sys.stdout
sys.stdout = self._output
else:
# This may raise OSError
if self._openhook:
self._file = self._openhook(self._filename, self._mode)
else:
self._file = open(self._filename, self._mode)
self._buffer = self._file.readlines(self._bufsize)
self._bufindex = 0
if not self._buffer:
self.nextfile()
# Recursive call
return self.readline()
def filename(self):
return self._filename
def lineno(self):
return self._lineno
def filelineno(self):
return self._filelineno
def fileno(self):
if self._file:
try:
return self._file.fileno()
except ValueError:
return -1
else:
return -1
def isfirstline(self):
return self._filelineno == 1
def isstdin(self):
return self._isstdin
def hook_compressed(filename, mode):
ext = os.path.splitext(filename)[1]
if ext == '.gz':
import gzip
return gzip.open(filename, mode)
elif ext == '.bz2':
import bz2
return bz2.BZ2File(filename, mode)
else:
return open(filename, mode)
def hook_encoded(encoding):
def openhook(filename, mode):
return open(filename, mode, encoding=encoding)
return openhook
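# Illustrative example (not part of the original module): combining input()
# with the hooks above; 'notes.txt.gz' is a hypothetical file name.
#
#     import fileinput
#     for line in fileinput.input(['notes.txt.gz'],
#                                 openhook=fileinput.hook_compressed):
#         print(fileinput.filename(), fileinput.filelineno(), line.rstrip())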
def _test():
import getopt
inplace = False
backup = False
opts, args = getopt.getopt(sys.argv[1:], "ib:")
for o, a in opts:
if o == '-i': inplace = True
if o == '-b': backup = a
for line in input(args, inplace=inplace, backup=backup):
if line[-1:] == '\n': line = line[:-1]
if line[-1:] == '\r': line = line[:-1]
print("%d: %s[%d]%s %s" % (lineno(), filename(), filelineno(),
isfirstline() and "*" or "", line))
print("%d: %s[%d]" % (lineno(), filename(), filelineno()))
if __name__ == '__main__':
_test()
"""Filename matching with shell patterns.
fnmatch(FILENAME, PATTERN) matches according to the local convention.
fnmatchcase(FILENAME, PATTERN) always takes case into account.
The functions operate by translating the pattern into a regular
expression. They cache the compiled regular expressions for speed.
The function translate(PATTERN) returns a regular expression
corresponding to PATTERN. (It does not compile it.)
"""
import os
import posixpath
import re
import functools
__all__ = ["filter", "fnmatch", "fnmatchcase", "translate"]
def fnmatch(name, pat):
"""Test whether FILENAME matches PATTERN.
Patterns are Unix shell style:
* matches everything
? matches any single character
[seq] matches any character in seq
[!seq] matches any char not in seq
An initial period in FILENAME is not special.
Both FILENAME and PATTERN are first case-normalized
if the operating system requires it.
If you don't want this, use fnmatchcase(FILENAME, PATTERN).
"""
name = os.path.normcase(name)
pat = os.path.normcase(pat)
return fnmatchcase(name, pat)
@functools.lru_cache(maxsize=256, typed=True)
def _compile_pattern(pat):
if isinstance(pat, bytes):
pat_str = str(pat, 'ISO-8859-1')
res_str = translate(pat_str)
res = bytes(res_str, 'ISO-8859-1')
else:
res = translate(pat)
return re.compile(res).match
def filter(names, pat):
"""Return the subset of the list NAMES that match PAT."""
result = []
pat = os.path.normcase(pat)
match = _compile_pattern(pat)
if os.path is posixpath:
# normcase on posix is NOP. Optimize it away from the loop.
for name in names:
if match(name):
result.append(name)
else:
for name in names:
if match(os.path.normcase(name)):
result.append(name)
return result
def fnmatchcase(name, pat):
"""Test whether FILENAME matches PATTERN, including case.
This is a version of fnmatch() which doesn't case-normalize
its arguments.
"""
match = _compile_pattern(pat)
return match(name) is not None
def translate(pat):
"""Translate a shell PATTERN to a regular expression.
There is no way to quote meta-characters.
"""
i, n = 0, len(pat)
res = ''
while i < n:
c = pat[i]
i = i+1
if c == '*':
res = res + '.*'
elif c == '?':
res = res + '.'
elif c == '[':
j = i
if j < n and pat[j] == '!':
j = j+1
if j < n and pat[j] == ']':
j = j+1
while j < n and pat[j] != ']':
j = j+1
if j >= n:
res = res + '\\['
else:
stuff = pat[i:j].replace('\\','\\\\')
i = j+1
if stuff[0] == '!':
stuff = '^' + stuff[1:]
elif stuff[0] == '^':
stuff = '\\' + stuff
res = '%s[%s]' % (res, stuff)
else:
res = res + re.escape(c)
return res + '\Z(?ms)'
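# Illustrative example (not part of the original module) of the translation
# performed above:
#
#     >>> import fnmatch
#     >>> fnmatch.translate('*.py')
#     '.*\\.py\\Z(?ms)'
#     >>> fnmatch.fnmatchcase('setup.py', '*.py')
#     True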
"""Generic output formatting.
Formatter objects transform an abstract flow of formatting events into
specific output events on writer objects. Formatters manage several stack
structures to allow various properties of a writer object to be changed and
restored; writers need not be able to handle relative changes nor any sort
of ``change back'' operation. Specific writer properties which may be
controlled via formatter objects are horizontal alignment, font, and left
margin indentations. A mechanism is provided which supports providing
arbitrary, non-exclusive style settings to a writer as well. Additional
interfaces facilitate formatting events which are not reversible, such as
paragraph separation.
Writer objects encapsulate device interfaces. Abstract devices, such as
file formats, are supported as well as physical devices. The provided
implementations all work with abstract devices. The interface makes
available mechanisms for setting the properties which formatter objects
manage and inserting data into the output.
"""
import sys
import warnings
warnings.warn('the formatter module is deprecated and will be removed in '
'Python 3.6', PendingDeprecationWarning)
AS_IS = None
class NullFormatter:
"""A formatter which does nothing.
If the writer parameter is omitted, a NullWriter instance is created.
No methods of the writer are called by NullFormatter instances.
Implementations should inherit from this class if implementing a writer
interface but don't need to inherit any implementation.
"""
def __init__(self, writer=None):
if writer is None:
writer = NullWriter()
self.writer = writer
def end_paragraph(self, blankline): pass
def add_line_break(self): pass
def add_hor_rule(self, *args, **kw): pass
def add_label_data(self, format, counter, blankline=None): pass
def add_flowing_data(self, data): pass
def add_literal_data(self, data): pass
def flush_softspace(self): pass
def push_alignment(self, align): pass
def pop_alignment(self): pass
def push_font(self, x): pass
def pop_font(self): pass
def push_margin(self, margin): pass
def pop_margin(self): pass
def set_spacing(self, spacing): pass
def push_style(self, *styles): pass
def pop_style(self, n=1): pass
def assert_line_data(self, flag=1): pass
class AbstractFormatter:
"""The standard formatter.
This implementation has demonstrated wide applicability to many writers,
and may be used directly in most circumstances. It has been used to
implement a full-featured World Wide Web browser.
"""
# Space handling policy: blank spaces at the boundary between elements
# are handled by the outermost context. "Literal" data is not checked
# to determine context, so spaces in literal data are handled directly
# in all circumstances.
def __init__(self, writer):
self.writer = writer # Output device
self.align = None # Current alignment
self.align_stack = [] # Alignment stack
self.font_stack = [] # Font state
self.margin_stack = [] # Margin state
self.spacing = None # Vertical spacing state
self.style_stack = [] # Other state, e.g. color
self.nospace = 1 # Should leading space be suppressed
self.softspace = 0 # Should a space be inserted
self.para_end = 1 # Just ended a paragraph
self.parskip = 0 # Skipped space between paragraphs?
self.hard_break = 1 # Have a hard break
self.have_label = 0
def end_paragraph(self, blankline):
if not self.hard_break:
self.writer.send_line_break()
self.have_label = 0
if self.parskip < blankline and not self.have_label:
self.writer.send_paragraph(blankline - self.parskip)
self.parskip = blankline
self.have_label = 0
self.hard_break = self.nospace = self.para_end = 1
self.softspace = 0
def add_line_break(self):
if not (self.hard_break or self.para_end):
self.writer.send_line_break()
self.have_label = self.parskip = 0
self.hard_break = self.nospace = 1
self.softspace = 0
def add_hor_rule(self, *args, **kw):
if not self.hard_break:
self.writer.send_line_break()
self.writer.send_hor_rule(*args, **kw)
self.hard_break = self.nospace = 1
self.have_label = self.para_end = self.softspace = self.parskip = 0
def add_label_data(self, format, counter, blankline = None):
if self.have_label or not self.hard_break:
self.writer.send_line_break()
if not self.para_end:
self.writer.send_paragraph((blankline and 1) or 0)
if isinstance(format, str):
self.writer.send_label_data(self.format_counter(format, counter))
else:
self.writer.send_label_data(format)
self.nospace = self.have_label = self.hard_break = self.para_end = 1
self.softspace = self.parskip = 0
def format_counter(self, format, counter):
label = ''
for c in format:
if c == '1':
label = label + ('%d' % counter)
elif c in 'aA':
if counter > 0:
label = label + self.format_letter(c, counter)
elif c in 'iI':
if counter > 0:
label = label + self.format_roman(c, counter)
else:
label = label + c
return label
def format_letter(self, case, counter):
label = ''
while counter > 0:
counter, x = divmod(counter-1, 26)
# This makes a strong assumption that lowercase letters
# and uppercase letters form two contiguous blocks, with
# letters in order!
s = chr(ord(case) + x)
label = s + label
return label
def format_roman(self, case, counter):
ones = ['i', 'x', 'c', 'm']
fives = ['v', 'l', 'd']
label, index = '', 0
# This will die of IndexError when counter is too big
while counter > 0:
counter, x = divmod(counter, 10)
if x == 9:
label = ones[index] + ones[index+1] + label
elif x == 4:
label = ones[index] + fives[index] + label
else:
if x >= 5:
s = fives[index]
x = x-5
else:
s = ''
s = s + ones[index]*x
label = s + label
index = index + 1
if case == 'I':
return label.upper()
return label
def add_flowing_data(self, data):
if not data: return
prespace = data[:1].isspace()
postspace = data[-1:].isspace()
data = " ".join(data.split())
if self.nospace and not data:
return
elif prespace or self.softspace:
if not data:
if not self.nospace:
self.softspace = 1
self.parskip = 0
return
if not self.nospace:
data = ' ' + data
self.hard_break = self.nospace = self.para_end = \
self.parskip = self.have_label = 0
self.softspace = postspace
self.writer.send_flowing_data(data)
def add_literal_data(self, data):
if not data: return
if self.softspace:
self.writer.send_flowing_data(" ")
self.hard_break = data[-1:] == '\n'
self.nospace = self.para_end = self.softspace = \
self.parskip = self.have_label = 0
self.writer.send_literal_data(data)
def flush_softspace(self):
if self.softspace:
self.hard_break = self.para_end = self.parskip = \
self.have_label = self.softspace = 0
self.nospace = 1
self.writer.send_flowing_data(' ')
def push_alignment(self, align):
if align and align != self.align:
self.writer.new_alignment(align)
self.align = align
self.align_stack.append(align)
else:
self.align_stack.append(self.align)
def pop_alignment(self):
if self.align_stack:
del self.align_stack[-1]
if self.align_stack:
self.align = align = self.align_stack[-1]
self.writer.new_alignment(align)
else:
self.align = None
self.writer.new_alignment(None)
def push_font(self, font):
size, i, b, tt = font
if self.softspace:
self.hard_break = self.para_end = self.softspace = 0
self.nospace = 1
self.writer.send_flowing_data(' ')
if self.font_stack:
csize, ci, cb, ctt = self.font_stack[-1]
if size is AS_IS: size = csize
if i is AS_IS: i = ci
if b is AS_IS: b = cb
if tt is AS_IS: tt = ctt
font = (size, i, b, tt)
self.font_stack.append(font)
self.writer.new_font(font)
def pop_font(self):
if self.font_stack:
del self.font_stack[-1]
if self.font_stack:
font = self.font_stack[-1]
else:
font = None
self.writer.new_font(font)
def push_margin(self, margin):
self.margin_stack.append(margin)
fstack = [m for m in self.margin_stack if m]
if not margin and fstack:
margin = fstack[-1]
self.writer.new_margin(margin, len(fstack))
def pop_margin(self):
if self.margin_stack:
del self.margin_stack[-1]
fstack = [m for m in self.margin_stack if m]
if fstack:
margin = fstack[-1]
else:
margin = None
self.writer.new_margin(margin, len(fstack))
def set_spacing(self, spacing):
self.spacing = spacing
self.writer.new_spacing(spacing)
def push_style(self, *styles):
if self.softspace:
self.hard_break = self.para_end = self.softspace = 0
self.nospace = 1
self.writer.send_flowing_data(' ')
for style in styles:
self.style_stack.append(style)
self.writer.new_styles(tuple(self.style_stack))
def pop_style(self, n=1):
del self.style_stack[-n:]
self.writer.new_styles(tuple(self.style_stack))
def assert_line_data(self, flag=1):
self.nospace = self.hard_break = not flag
self.para_end = self.parskip = self.have_label = 0
class NullWriter:
"""Minimal writer interface to use in testing & inheritance.
A writer which only provides the interface definition; no actions are
taken on any methods. This should be the base class for all writers
which do not need to inherit any implementation methods.
"""
def __init__(self): pass
def flush(self): pass
def new_alignment(self, align): pass
def new_font(self, font): pass
def new_margin(self, margin, level): pass
def new_spacing(self, spacing): pass
def new_styles(self, styles): pass
def send_paragraph(self, blankline): pass
def send_line_break(self): pass
def send_hor_rule(self, *args, **kw): pass
def send_label_data(self, data): pass
def send_flowing_data(self, data): pass
def send_literal_data(self, data): pass
class AbstractWriter(NullWriter):
"""A writer which can be used in debugging formatters, but not much else.
Each method simply announces itself by printing its name and
arguments on standard output.
"""
def new_alignment(self, align):
print("new_alignment(%r)" % (align,))
def new_font(self, font):
print("new_font(%r)" % (font,))
def new_margin(self, margin, level):
print("new_margin(%r, %d)" % (margin, level))
def new_spacing(self, spacing):
print("new_spacing(%r)" % (spacing,))
def new_styles(self, styles):
print("new_styles(%r)" % (styles,))
def send_paragraph(self, blankline):
print("send_paragraph(%r)" % (blankline,))
def send_line_break(self):
print("send_line_break()")
def send_hor_rule(self, *args, **kw):
print("send_hor_rule()")
def send_label_data(self, data):
print("send_label_data(%r)" % (data,))
def send_flowing_data(self, data):
print("send_flowing_data(%r)" % (data,))
def send_literal_data(self, data):
print("send_literal_data(%r)" % (data,))
class DumbWriter(NullWriter):
"""Simple writer class which writes output on the file object passed in
as the file parameter or, if file is omitted, on standard output. The
output is simply word-wrapped to the number of columns specified by
the maxcol parameter. This class is suitable for reflowing a sequence
of paragraphs.
"""
def __init__(self, file=None, maxcol=72):
self.file = file or sys.stdout
self.maxcol = maxcol
NullWriter.__init__(self)
self.reset()
def reset(self):
self.col = 0
self.atbreak = 0
def send_paragraph(self, blankline):
self.file.write('\n'*blankline)
self.col = 0
self.atbreak = 0
def send_line_break(self):
self.file.write('\n')
self.col = 0
self.atbreak = 0
def send_hor_rule(self, *args, **kw):
self.file.write('\n')
self.file.write('-'*self.maxcol)
self.file.write('\n')
self.col = 0
self.atbreak = 0
def send_literal_data(self, data):
self.file.write(data)
i = data.rfind('\n')
if i >= 0:
self.col = 0
data = data[i+1:]
data = data.expandtabs()
self.col = self.col + len(data)
self.atbreak = 0
def send_flowing_data(self, data):
if not data: return
atbreak = self.atbreak or data[0].isspace()
col = self.col
maxcol = self.maxcol
write = self.file.write
for word in data.split():
if atbreak:
if col + len(word) >= maxcol:
write('\n')
col = 0
else:
write(' ')
col = col + 1
write(word)
col = col + len(word)
atbreak = 1
self.col = col
self.atbreak = data[-1].isspace()
def test(file = None):
w = DumbWriter()
f = AbstractFormatter(w)
if file is not None:
fp = open(file)
elif sys.argv[1:]:
fp = open(sys.argv[1])
else:
fp = sys.stdin
try:
for line in fp:
if line == '\n':
f.end_paragraph(1)
else:
f.add_flowing_data(line)
finally:
if fp is not sys.stdin:
fp.close()
f.end_paragraph(0)
if __name__ == '__main__':
test()
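# Illustrative example (not part of the original module): reflowing text to
# 40 columns with the classes above.
#
#     import io
#     buf = io.StringIO()
#     writer = DumbWriter(buf, maxcol=40)
#     fmt = AbstractFormatter(writer)
#     fmt.add_flowing_data('a   run of    irregularly   spaced   words')
#     fmt.end_paragraph(0)
#     wrapped = buf.getvalue()   # word-wrapped output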
# Originally contributed by Sjoerd Mullender.
# Significantly modified by Jeffrey Yasskin <jyasskin at gmail.com>.
"""Fraction, infinite-precision, real numbers."""
from decimal import Decimal
import math
import numbers
import operator
import re
import sys
__all__ = ['Fraction', 'gcd']
def gcd(a, b):
"""Calculate the Greatest Common Divisor of a and b.
Unless b==0, the result will have the same sign as b (so that when
b is divided by it, the result comes out positive).
"""
while b:
a, b = b, a%b
return a
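# Illustrative examples (not part of the original module): per the docstring,
# the result takes the sign of b.
#
#     >>> gcd(12, 8)
#     4
#     >>> gcd(12, -8)
#     -4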
# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo the prime _PyHASH_MODULUS.
_PyHASH_MODULUS = sys.hash_info.modulus
# Value to be used for rationals that reduce to infinity modulo
# _PyHASH_MODULUS.
_PyHASH_INF = sys.hash_info.inf
_RATIONAL_FORMAT = re.compile(r"""
\A\s* # optional whitespace at the start, then
(?P<sign>[-+]?) # an optional sign, then
(?=\d|\.\d) # lookahead for digit or .digit
(?P<num>\d*) # numerator (possibly empty)
(?: # followed by
(?:/(?P<denom>\d+))? # an optional denominator
| # or
(?:\.(?P<decimal>\d*))? # an optional fractional part
(?:E(?P<exp>[-+]?\d+))? # and optional exponent
)
\s*\Z # and optional whitespace to finish
""", re.VERBOSE | re.IGNORECASE)
class Fraction(numbers.Rational):
"""This class implements rational numbers.
In the two-argument form of the constructor, Fraction(8, 6) will
produce a rational number equivalent to 4/3. Both arguments must
be Rational. The numerator defaults to 0 and the denominator
defaults to 1 so that Fraction(3) == 3 and Fraction() == 0.
Fractions can also be constructed from:
- numeric strings similar to those accepted by the
float constructor (for example, '-2.3' or '1e10')
- strings of the form '123/456'
- float and Decimal instances
- other Rational instances (including integers)
"""
__slots__ = ('_numerator', '_denominator')
# We're immutable, so use __new__ not __init__
def __new__(cls, numerator=0, denominator=None):
"""Constructs a Rational.
Takes a string like '3/2' or '1.5', another Rational instance, a
numerator/denominator pair, or a float.
Examples
--------
>>> Fraction(10, -8)
Fraction(-5, 4)
>>> Fraction(Fraction(1, 7), 5)
Fraction(1, 35)
>>> Fraction(Fraction(1, 7), Fraction(2, 3))
Fraction(3, 14)
>>> Fraction('314')
Fraction(314, 1)
>>> Fraction('-35/4')
Fraction(-35, 4)
>>> Fraction('3.1415') # conversion from numeric string
Fraction(6283, 2000)
>>> Fraction('-47e-2') # string may include a decimal exponent
Fraction(-47, 100)
>>> Fraction(1.47) # direct construction from float (exact conversion)
Fraction(6620291452234629, 4503599627370496)
>>> Fraction(2.25)
Fraction(9, 4)
>>> Fraction(Decimal('1.47'))
Fraction(147, 100)
"""
self = super(Fraction, cls).__new__(cls)
if denominator is None:
if isinstance(numerator, numbers.Rational):
self._numerator = numerator.numerator
self._denominator = numerator.denominator
return self
elif isinstance(numerator, float):
# Exact conversion from float
value = Fraction.from_float(numerator)
self._numerator = value._numerator
self._denominator = value._denominator
return self
elif isinstance(numerator, Decimal):
value = Fraction.from_decimal(numerator)
self._numerator = value._numerator
self._denominator = value._denominator
return self
elif isinstance(numerator, str):
# Handle construction from strings.
m = _RATIONAL_FORMAT.match(numerator)
if m is None:
raise ValueError('Invalid literal for Fraction: %r' %
numerator)
numerator = int(m.group('num') or '0')
denom = m.group('denom')
if denom:
denominator = int(denom)
else:
denominator = 1
decimal = m.group('decimal')
if decimal:
scale = 10**len(decimal)
numerator = numerator * scale + int(decimal)
denominator *= scale
exp = m.group('exp')
if exp:
exp = int(exp)
if exp >= 0:
numerator *= 10**exp
else:
denominator *= 10**-exp
if m.group('sign') == '-':
numerator = -numerator
else:
raise TypeError("argument should be a string "
"or a Rational instance")
elif (isinstance(numerator, numbers.Rational) and
isinstance(denominator, numbers.Rational)):
numerator, denominator = (
numerator.numerator * denominator.denominator,
denominator.numerator * numerator.denominator
)
else:
raise TypeError("both arguments should be "
"Rational instances")
if denominator == 0:
raise ZeroDivisionError('Fraction(%s, 0)' % numerator)
g = gcd(numerator, denominator)
self._numerator = numerator // g
self._denominator = denominator // g
return self
@classmethod
def from_float(cls, f):
"""Converts a finite float to a rational number, exactly.
Beware that Fraction.from_float(0.3) != Fraction(3, 10).
"""
if isinstance(f, numbers.Integral):
return cls(f)
elif not isinstance(f, float):
raise TypeError("%s.from_float() only takes floats, not %r (%s)" %
(cls.__name__, f, type(f).__name__))
if math.isnan(f):
raise ValueError("Cannot convert %r to %s." % (f, cls.__name__))
if math.isinf(f):
raise OverflowError("Cannot convert %r to %s." % (f, cls.__name__))
return cls(*f.as_integer_ratio())
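# Illustrative example (not part of the original module): the conversion is
# exact, so a float literal like 0.3 yields its true binary value.
#
#     >>> Fraction.from_float(0.5)
#     Fraction(1, 2)
#     >>> Fraction.from_float(0.3)   # not Fraction(3, 10)
#     Fraction(5404319552844595, 18014398509481984)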
@classmethod
def from_decimal(cls, dec):
"""Converts a finite Decimal instance to a rational number, exactly."""
from decimal import Decimal
if isinstance(dec, numbers.Integral):
dec = Decimal(int(dec))
elif not isinstance(dec, Decimal):
raise TypeError(
"%s.from_decimal() only takes Decimals, not %r (%s)" %
(cls.__name__, dec, type(dec).__name__))
if dec.is_infinite():
raise OverflowError(
"Cannot convert %s to %s." % (dec, cls.__name__))
if dec.is_nan():
raise ValueError("Cannot convert %s to %s." % (dec, cls.__name__))
sign, digits, exp = dec.as_tuple()
digits = int(''.join(map(str, digits)))
if sign:
digits = -digits
if exp >= 0:
return cls(digits * 10 ** exp)
else:
return cls(digits, 10 ** -exp)
def limit_denominator(self, max_denominator=1000000):
"""Closest Fraction to self with denominator at most max_denominator.
>>> Fraction('3.141592653589793').limit_denominator(10)
Fraction(22, 7)
>>> Fraction('3.141592653589793').limit_denominator(100)
Fraction(311, 99)
>>> Fraction(4321, 8765).limit_denominator(10000)
Fraction(4321, 8765)
"""
# Algorithm notes: For any real number x, define a *best upper
# approximation* to x to be a rational number p/q such that:
#
# (1) p/q >= x, and
# (2) if p/q > r/s >= x then s > q, for any rational r/s.
#
# Define *best lower approximation* similarly. Then it can be
# proved that a rational number is a best upper or lower
# approximation to x if, and only if, it is a convergent or
# semiconvergent of the (unique shortest) continued fraction
# associated to x.
#
# To find a best rational approximation with denominator <= M,
# we find the best upper and lower approximations with
# denominator <= M and take whichever of these is closer to x.
# In the event of a tie, the bound with smaller denominator is
# chosen. If both denominators are equal (which can happen
# only when max_denominator == 1 and self is midway between
# two integers) the lower bound---i.e., the floor of self---is
# taken.
if max_denominator < 1:
raise ValueError("max_denominator should be at least 1")
if self._denominator <= max_denominator:
return Fraction(self)
p0, q0, p1, q1 = 0, 1, 1, 0
n, d = self._numerator, self._denominator
while True:
a = n//d
q2 = q0+a*q1
if q2 > max_denominator:
break
p0, q0, p1, q1 = p1, q1, p0+a*p1, q2
n, d = d, n-a*d
k = (max_denominator-q0)//q1
bound1 = Fraction(p0+k*p1, q0+k*q1)
bound2 = Fraction(p1, q1)
if abs(bound2 - self) <= abs(bound1-self):
return bound2
else:
return bound1
@property
def numerator(a):
return a._numerator
@property
def denominator(a):
return a._denominator
def __repr__(self):
"""repr(self)"""
return ('Fraction(%s, %s)' % (self._numerator, self._denominator))
def __str__(self):
"""str(self)"""
if self._denominator == 1:
return str(self._numerator)
else:
return '%s/%s' % (self._numerator, self._denominator)
def _operator_fallbacks(monomorphic_operator, fallback_operator):
"""Generates forward and reverse operators given a purely-rational
operator and a function from the operator module.
Use this like:
__op__, __rop__ = _operator_fallbacks(just_rational_op, operator.op)
In general, we want to implement the arithmetic operations so
that mixed-mode operations either call an implementation whose
author knew about the types of both arguments, or convert both
to the nearest built in type and do the operation there. In
Fraction, that means that we define __add__ and __radd__ as:
def __add__(self, other):
# Both types have numerators/denominator attributes,
# so do the operation directly
if isinstance(other, (int, Fraction)):
return Fraction(self.numerator * other.denominator +
other.numerator * self.denominator,
self.denominator * other.denominator)
# float and complex don't have those operations, but we
# know about those types, so special case them.
elif isinstance(other, float):
return float(self) + other
elif isinstance(other, complex):
return complex(self) + other
# Let the other type take over.
return NotImplemented
def __radd__(self, other):
# radd handles more types than add because there's
# nothing left to fall back to.
if isinstance(other, numbers.Rational):
return Fraction(self.numerator * other.denominator +
other.numerator * self.denominator,
self.denominator * other.denominator)
elif isinstance(other, Real):
return float(other) + float(self)
elif isinstance(other, Complex):
return complex(other) + complex(self)
return NotImplemented
There are 5 different cases for a mixed-type addition on
Fraction. I'll refer to all of the above code that doesn't
refer to Fraction, float, or complex as "boilerplate". 'r'
will be an instance of Fraction, which is a subtype of
Rational (r : Fraction <: Rational), and b : B <:
Complex. The first three involve 'r + b':
1. If B <: Fraction, int, float, or complex, we handle
that specially, and all is well.
2. If Fraction falls back to the boilerplate code, and it
were to return a value from __add__, we'd miss the
possibility that B defines a more intelligent __radd__,
so the boilerplate should return NotImplemented from
__add__. In particular, we don't handle Rational
here, even though we could get an exact answer, in case
the other type wants to do something special.
3. If B <: Fraction, Python tries B.__radd__ before
Fraction.__add__. This is ok, because it was
implemented with knowledge of Fraction, so it can
handle those instances before delegating to Real or
Complex.
The next two situations describe 'b + r'. We assume that b
didn't know about Fraction in its implementation, and that it
uses similar boilerplate code:
4. If B <: Rational, then __radd__ converts both to the
builtin rational type (hey look, that's us) and
proceeds.
5. Otherwise, __radd__ tries to find the nearest common
base ABC, and fall back to its builtin type. Since this
class doesn't subclass a concrete type, there's no
implementation to fall back to, so we need to try as
hard as possible to return an actual value, or the user
will get a TypeError.
"""
def forward(a, b):
if isinstance(b, (int, Fraction)):
return monomorphic_operator(a, b)
elif isinstance(b, float):
return fallback_operator(float(a), b)
elif isinstance(b, complex):
return fallback_operator(complex(a), b)
else:
return NotImplemented
forward.__name__ = '__' + fallback_operator.__name__ + '__'
forward.__doc__ = monomorphic_operator.__doc__
def reverse(b, a):
if isinstance(a, numbers.Rational):
# Includes ints.
return monomorphic_operator(a, b)
elif isinstance(a, numbers.Real):
return fallback_operator(float(a), float(b))
elif isinstance(a, numbers.Complex):
return fallback_operator(complex(a), complex(b))
else:
return NotImplemented
reverse.__name__ = '__r' + fallback_operator.__name__ + '__'
reverse.__doc__ = monomorphic_operator.__doc__
return forward, reverse
def _add(a, b):
"""a + b"""
return Fraction(a.numerator * b.denominator +
b.numerator * a.denominator,
a.denominator * b.denominator)
__add__, __radd__ = _operator_fallbacks(_add, operator.add)
def _sub(a, b):
"""a - b"""
return Fraction(a.numerator * b.denominator -
b.numerator * a.denominator,
a.denominator * b.denominator)
__sub__, __rsub__ = _operator_fallbacks(_sub, operator.sub)
def _mul(a, b):
"""a * b"""
return Fraction(a.numerator * b.numerator, a.denominator * b.denominator)
__mul__, __rmul__ = _operator_fallbacks(_mul, operator.mul)
def _div(a, b):
"""a / b"""
return Fraction(a.numerator * b.denominator,
a.denominator * b.numerator)
__truediv__, __rtruediv__ = _operator_fallbacks(_div, operator.truediv)
def __floordiv__(a, b):
"""a // b"""
return math.floor(a / b)
def __rfloordiv__(b, a):
"""a // b"""
return math.floor(a / b)
def __mod__(a, b):
"""a % b"""
div = a // b
return a - b * div
def __rmod__(b, a):
"""a % b"""
div = a // b
return a - b * div
def __pow__(a, b):
"""a ** b
If b is not an integer, the result will be a float or complex
since roots are generally irrational. If b is an integer, the
result will be rational.
"""
if isinstance(b, numbers.Rational):
if b.denominator == 1:
power = b.numerator
if power >= 0:
return Fraction(a._numerator ** power,
a._denominator ** power)
else:
return Fraction(a._denominator ** -power,
a._numerator ** -power)
else:
# A fractional power will generally produce an
# irrational number.
return float(a) ** float(b)
else:
return float(a) ** b
def __rpow__(b, a):
"""a ** b"""
if b._denominator == 1 and b._numerator >= 0:
# If a is an int, keep it that way if possible.
return a ** b._numerator
if isinstance(a, numbers.Rational):
return Fraction(a.numerator, a.denominator) ** b
if b._denominator == 1:
return a ** b._numerator
return a ** float(b)
def __pos__(a):
"""+a: Coerces a subclass instance to Fraction"""
return Fraction(a._numerator, a._denominator)
def __neg__(a):
"""-a"""
return Fraction(-a._numerator, a._denominator)
def __abs__(a):
"""abs(a)"""
return Fraction(abs(a._numerator), a._denominator)
def __trunc__(a):
"""trunc(a)"""
if a._numerator < 0:
return -(-a._numerator // a._denominator)
else:
return a._numerator // a._denominator
def __floor__(a):
"""Will be math.floor(a) in 3.0."""
return a.numerator // a.denominator
def __ceil__(a):
"""Will be math.ceil(a) in 3.0."""
# The negations cleverly convince floordiv to return the ceiling.
return -(-a.numerator // a.denominator)
def __round__(self, ndigits=None):
"""Will be round(self, ndigits) in 3.0.
Rounds half toward even.
"""
if ndigits is None:
floor, remainder = divmod(self.numerator, self.denominator)
if remainder * 2 < self.denominator:
return floor
elif remainder * 2 > self.denominator:
return floor + 1
# Deal with the half case:
elif floor % 2 == 0:
return floor
else:
return floor + 1
shift = 10**abs(ndigits)
# See _operator_fallbacks.forward to check that the results of
# these operations will always be Fraction and therefore have
# round().
if ndigits > 0:
return Fraction(round(self * shift), shift)
else:
return Fraction(round(self / shift) * shift)
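# Illustrative examples (not part of the original module): halves round to
# the nearest even integer, matching round() on floats.
#
#     >>> round(Fraction(1, 2)), round(Fraction(3, 2))
#     (0, 2)
#     >>> round(Fraction(22, 7), 2)
#     Fraction(157, 50)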
def __hash__(self):
"""hash(self)"""
# XXX since this method is expensive, consider caching the result
# In order to make sure that the hash of a Fraction agrees
# with the hash of a numerically equal integer, float or
# Decimal instance, we follow the rules for numeric hashes
# outlined in the documentation. (See library docs, 'Built-in
# Types').
# dinv is the inverse of self._denominator modulo the prime
# _PyHASH_MODULUS, or 0 if self._denominator is divisible by
# _PyHASH_MODULUS.
dinv = pow(self._denominator, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
if not dinv:
hash_ = _PyHASH_INF
else:
hash_ = abs(self._numerator) * dinv % _PyHASH_MODULUS
result = hash_ if self >= 0 else -hash_
return -2 if result == -1 else result
def __eq__(a, b):
"""a == b"""
if isinstance(b, numbers.Rational):
return (a._numerator == b.numerator and
a._denominator == b.denominator)
if isinstance(b, numbers.Complex) and b.imag == 0:
b = b.real
if isinstance(b, float):
if math.isnan(b) or math.isinf(b):
# comparisons with an infinity or nan should behave in
# the same way for any finite a, so treat a as zero.
return 0.0 == b
else:
return a == a.from_float(b)
else:
# Since a doesn't know how to compare with b, let's give b
# a chance to compare itself with a.
return NotImplemented
def _richcmp(self, other, op):
"""Helper for comparison operators, for internal use only.
Implement comparison between a Rational instance `self`, and
either another Rational instance or a float `other`. If
`other` is not a Rational instance or a float, return
NotImplemented. `op` should be one of the six standard
comparison operators.
"""
# convert other to a Rational instance where reasonable.
if isinstance(other, numbers.Rational):
return op(self._numerator * other.denominator,
self._denominator * other.numerator)
if isinstance(other, float):
if math.isnan(other) or math.isinf(other):
return op(0.0, other)
else:
return op(self, self.from_float(other))
else:
return NotImplemented
def __lt__(a, b):
"""a < b"""
return a._richcmp(b, operator.lt)
def __gt__(a, b):
"""a > b"""
return a._richcmp(b, operator.gt)
def __le__(a, b):
"""a <= b"""
return a._richcmp(b, operator.le)
def __ge__(a, b):
"""a >= b"""
return a._richcmp(b, operator.ge)
def __bool__(a):
"""a != 0"""
return a._numerator != 0
# support for pickling, copy, and deepcopy
def __reduce__(self):
return (self.__class__, (str(self),))
def __copy__(self):
if type(self) == Fraction:
return self # I'm immutable; therefore I am my own clone
return self.__class__(self._numerator, self._denominator)
def __deepcopy__(self, memo):
if type(self) == Fraction:
return self # My components are also immutable
return self.__class__(self._numerator, self._denominator)
"""An FTP client class and some helper functions.
Based on RFC 959: File Transfer Protocol (FTP), by J. Postel and J. Reynolds
Example:
>>> from ftplib import FTP
>>> ftp = FTP('ftp.python.org') # connect to host, default port
>>> ftp.login() # default, i.e.: user anonymous, passwd anonymous@
'230 Guest login ok, access restrictions apply.'
>>> ftp.retrlines('LIST') # list directory contents
total 9
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 .
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 ..
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 bin
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 etc
d-wxrwxr-x 2 ftp wheel 1024 Sep 5 13:43 incoming
drwxr-xr-x 2 root wheel 1024 Nov 17 1993 lib
drwxr-xr-x 6 1094 wheel 1024 Sep 13 19:07 pub
drwxr-xr-x 3 root wheel 1024 Jan 3 1994 usr
-rw-r--r-- 1 root root 312 Aug 1 1994 welcome.msg
'226 Transfer complete.'
>>> ftp.quit()
'221 Goodbye.'
>>>
A nice test that reveals some of the network dialogue would be:
python ftplib.py -d localhost -l -p -l
"""
#
# Changes and improvements suggested by Steve Majewski.
# Modified by Jack to work on the mac.
# Modified by Siebren to support docstrings and PASV.
# Modified by Phil Schwartz to add storbinary and storlines callbacks.
# Modified by Giampaolo Rodola' to add TLS support.
#
import os
import sys
import socket
import warnings
from socket import _GLOBAL_DEFAULT_TIMEOUT
__all__ = ["FTP", "Netrc"]
# Magic number from <socket.h>
MSG_OOB = 0x1 # Process data out of band
# The standard FTP server control port
FTP_PORT = 21
# The sizehint parameter passed to readline() calls
MAXLINE = 8192
# Exception raised when an error or invalid response is received
class Error(Exception): pass
class error_reply(Error): pass # unexpected [123]xx reply
class error_temp(Error): pass # 4xx errors
class error_perm(Error): pass # 5xx errors
class error_proto(Error): pass # response does not begin with [1-5]
# All exceptions (hopefully) that may be raised here and that aren't
# (always) programming errors on our side
all_errors = (Error, OSError, EOFError)
# Line terminators (we always output CRLF, but accept any of CRLF, CR, LF)
CRLF = '\r\n'
B_CRLF = b'\r\n'
# The class itself
class FTP:
'''An FTP client class.
To create a connection, call the class using these arguments:
host, user, passwd, acct, timeout
The first four arguments are all strings, and have default value ''.
timeout must be numeric and defaults to None if not passed,
meaning that no timeout will be set on any ftp socket(s).
If a timeout is passed, then this is now the default timeout for all ftp
socket operations for this instance.
Then use self.connect() with optional host and port argument.
To download a file, use ftp.retrlines('RETR ' + filename),
or ftp.retrbinary() with slightly different arguments.
To upload a file, use ftp.storlines() or ftp.storbinary(),
which have an open file as argument (see their definitions
below for details).
The download/upload functions first issue appropriate TYPE
and PORT or PASV commands.
'''
debugging = 0
host = ''
port = FTP_PORT
maxline = MAXLINE
sock = None
file = None
welcome = None
passiveserver = 1
encoding = "latin-1"
# Initialization method (called by class instantiation).
# Initialize host to localhost, port to standard ftp port
# Optional arguments are host (for connect()),
# and user, passwd, acct (for login())
def __init__(self, host='', user='', passwd='', acct='',
timeout=_GLOBAL_DEFAULT_TIMEOUT, source_address=None):
self.source_address = source_address
self.timeout = timeout
if host:
self.connect(host)
if user:
self.login(user, passwd, acct)
def __enter__(self):
return self
# Context management protocol: try to quit() if active
def __exit__(self, *args):
if self.sock is not None:
try:
self.quit()
except (OSError, EOFError):
pass
finally:
if self.sock is not None:
self.close()
def connect(self, host='', port=0, timeout=-999, source_address=None):
'''Connect to host. Arguments are:
- host: hostname to connect to (string, default previous host)
- port: port to connect to (integer, default previous port)
- timeout: the timeout to set against the ftp socket(s)
- source_address: a 2-tuple (host, port) for the socket to bind
to as its source address before connecting.
'''
if host != '':
self.host = host
if port > 0:
self.port = port
if timeout != -999:
self.timeout = timeout
if source_address is not None:
self.source_address = source_address
self.sock = socket.create_connection((self.host, self.port), self.timeout,
source_address=self.source_address)
self.af = self.sock.family
self.file = self.sock.makefile('r', encoding=self.encoding)
self.welcome = self.getresp()
return self.welcome
def getwelcome(self):
'''Get the welcome message from the server.
(this is read and squirreled away by connect())'''
if self.debugging:
print('*welcome*', self.sanitize(self.welcome))
return self.welcome
def set_debuglevel(self, level):
'''Set the debugging level.
The required argument level means:
0: no debugging output (default)
1: print commands and responses but not body text etc.
2: also print raw lines read and sent before stripping CR/LF'''
self.debugging = level
debug = set_debuglevel
def set_pasv(self, val):
'''Use passive or active mode for data transfers.
With a false argument, use the normal PORT mode,
With a true argument, use the PASV command.'''
self.passiveserver = val
# Internal: "sanitize" a string for printing
def sanitize(self, s):
if s[:5] in {'pass ', 'PASS '}:
i = len(s.rstrip('\r\n'))
s = s[:5] + '*'*(i-5) + s[i:]
return repr(s)
# Internal: send one line to the server, appending CRLF
def putline(self, line):
if '\r' in line or '\n' in line:
raise ValueError('line must not contain CR or LF characters')
line = line + CRLF
if self.debugging > 1:
print('*put*', self.sanitize(line))
self.sock.sendall(line.encode(self.encoding))
# Internal: send one command to the server (through putline())
def putcmd(self, line):
if self.debugging: print('*cmd*', self.sanitize(line))
self.putline(line)
# Internal: return one line from the server, stripping CRLF.
# Raise EOFError if the connection is closed
def getline(self):
line = self.file.readline(self.maxline + 1)
if len(line) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if self.debugging > 1:
print('*get*', self.sanitize(line))
if not line:
raise EOFError
if line[-2:] == CRLF:
line = line[:-2]
elif line[-1:] in CRLF:
line = line[:-1]
return line
# Internal: get a response from the server, which may possibly
# consist of multiple lines. Return a single string with no
# trailing CRLF. If the response consists of multiple lines,
# these are separated by '\n' characters in the string
def getmultiline(self):
line = self.getline()
if line[3:4] == '-':
code = line[:3]
while 1:
nextline = self.getline()
line = line + ('\n' + nextline)
if nextline[:3] == code and \
nextline[3:4] != '-':
break
return line
# Internal: get a response from the server.
# Raise various errors if the response indicates an error
def getresp(self):
resp = self.getmultiline()
if self.debugging:
print('*resp*', self.sanitize(resp))
self.lastresp = resp[:3]
c = resp[:1]
if c in {'1', '2', '3'}:
return resp
if c == '4':
raise error_temp(resp)
if c == '5':
raise error_perm(resp)
raise error_proto(resp)
def voidresp(self):
"""Expect a response beginning with '2'."""
resp = self.getresp()
if resp[:1] != '2':
raise error_reply(resp)
return resp
def abort(self):
'''Abort a file transfer. Uses out-of-band data.
This does not follow the procedure from the RFC to send Telnet
IP and Synch; that doesn't seem to work with the servers I've
tried. Instead, just send the ABOR command as OOB data.'''
line = b'ABOR' + B_CRLF
if self.debugging > 1:
print('*put urgent*', self.sanitize(line))
self.sock.sendall(line, MSG_OOB)
resp = self.getmultiline()
if resp[:3] not in {'426', '225', '226'}:
raise error_proto(resp)
return resp
def sendcmd(self, cmd):
'''Send a command and return the response.'''
self.putcmd(cmd)
return self.getresp()
def voidcmd(self, cmd):
"""Send a command and expect a response beginning with '2'."""
self.putcmd(cmd)
return self.voidresp()
def sendport(self, host, port):
'''Send a PORT command with the current host and the given
port number.
'''
hbytes = host.split('.')
pbytes = [repr(port//256), repr(port%256)]
bytes = hbytes + pbytes
cmd = 'PORT ' + ','.join(bytes)
return self.voidcmd(cmd)
def sendeprt(self, host, port):
'''Send an EPRT command with the current host and the given port number.'''
af = 0
if self.af == socket.AF_INET:
af = 1
if self.af == socket.AF_INET6:
af = 2
if af == 0:
raise error_proto('unsupported address family')
fields = ['', repr(af), host, repr(port), '']
cmd = 'EPRT ' + '|'.join(fields)
return self.voidcmd(cmd)
def makeport(self):
'''Create a new socket and send a PORT command for it.'''
err = None
sock = None
for res in socket.getaddrinfo(None, 0, self.af, socket.SOCK_STREAM, 0, socket.AI_PASSIVE):
af, socktype, proto, canonname, sa = res
try:
sock = socket.socket(af, socktype, proto)
sock.bind(sa)
except OSError as _:
err = _
if sock:
sock.close()
sock = None
continue
break
if sock is None:
if err is not None:
raise err
else:
raise OSError("getaddrinfo returns an empty list")
sock.listen(1)
port = sock.getsockname()[1] # Get proper port
host = self.sock.getsockname()[0] # Get proper host
if self.af == socket.AF_INET:
resp = self.sendport(host, port)
else:
resp = self.sendeprt(host, port)
if self.timeout is not _GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(self.timeout)
return sock
def makepasv(self):
if self.af == socket.AF_INET:
host, port = parse227(self.sendcmd('PASV'))
else:
host, port = parse229(self.sendcmd('EPSV'), self.sock.getpeername())
return host, port
def ntransfercmd(self, cmd, rest=None):
"""Initiate a transfer over the data connection.
If the transfer is active, send a port command and the
transfer command, and accept the connection. If the server is
passive, send a pasv command, connect to it, and start the
transfer command. Either way, return the socket for the
connection and the expected size of the transfer. The
expected size may be None if it could not be determined.
Optional `rest' argument can be a string that is sent as the
argument to a REST command. This is essentially a server
marker used to tell the server to skip over any data up to the
given marker.
"""
size = None
if self.passiveserver:
host, port = self.makepasv()
conn = socket.create_connection((host, port), self.timeout,
source_address=self.source_address)
try:
if rest is not None:
self.sendcmd("REST %s" % rest)
resp = self.sendcmd(cmd)
# Some servers apparently send a 200 reply to
# a LIST or STOR command, before the 150 reply
# (and way before the 226 reply). This seems to
# be in violation of the protocol (which only allows
# 1xx or error messages for LIST), so we just discard
# this response.
if resp[0] == '2':
resp = self.getresp()
if resp[0] != '1':
raise error_reply(resp)
except:
conn.close()
raise
else:
with self.makeport() as sock:
if rest is not None:
self.sendcmd("REST %s" % rest)
resp = self.sendcmd(cmd)
# See above.
if resp[0] == '2':
resp = self.getresp()
if resp[0] != '1':
raise error_reply(resp)
conn, sockaddr = sock.accept()
if self.timeout is not _GLOBAL_DEFAULT_TIMEOUT:
conn.settimeout(self.timeout)
if resp[:3] == '150':
# this is conditional in case we received a 125
size = parse150(resp)
return conn, size
def transfercmd(self, cmd, rest=None):
"""Like ntransfercmd() but returns only the socket."""
return self.ntransfercmd(cmd, rest)[0]
def login(self, user = '', passwd = '', acct = ''):
'''Login, default anonymous.'''
if not user:
user = 'anonymous'
if not passwd:
passwd = ''
if not acct:
acct = ''
if user == 'anonymous' and passwd in {'', '-'}:
# If there is no anonymous ftp password specified
# then we'll just use anonymous@
# We don't send any other thing because:
# - We want to remain anonymous
# - We want to stop SPAM
            # - We don't want to let ftp sites discriminate by user,
            #   host or country.
passwd = passwd + 'anonymous@'
resp = self.sendcmd('USER ' + user)
if resp[0] == '3':
resp = self.sendcmd('PASS ' + passwd)
if resp[0] == '3':
resp = self.sendcmd('ACCT ' + acct)
if resp[0] != '2':
raise error_reply(resp)
return resp
def retrbinary(self, cmd, callback, blocksize=8192, rest=None):
"""Retrieve data in binary mode. A new port is created for you.
Args:
cmd: A RETR command.
callback: A single parameter callable to be called on each
block of data read.
blocksize: The maximum number of bytes to read from the
socket at one time. [default: 8192]
rest: Passed to transfercmd(). [default: None]
Returns:
The response code.
"""
self.voidcmd('TYPE I')
with self.transfercmd(cmd, rest) as conn:
while 1:
data = conn.recv(blocksize)
if not data:
break
callback(data)
# shutdown ssl layer
if _SSLSocket is not None and isinstance(conn, _SSLSocket):
conn.unwrap()
return self.voidresp()
def retrlines(self, cmd, callback = None):
"""Retrieve data in line mode. A new port is created for you.
Args:
cmd: A RETR, LIST, or NLST command.
callback: An optional single parameter callable that is called
for each line with the trailing CRLF stripped.
[default: print_line()]
Returns:
The response code.
"""
if callback is None:
callback = print_line
resp = self.sendcmd('TYPE A')
with self.transfercmd(cmd) as conn, \
conn.makefile('r', encoding=self.encoding) as fp:
while 1:
line = fp.readline(self.maxline + 1)
if len(line) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if self.debugging > 2:
print('*retr*', repr(line))
if not line:
break
if line[-2:] == CRLF:
line = line[:-2]
elif line[-1:] == '\n':
line = line[:-1]
callback(line)
# shutdown ssl layer
if _SSLSocket is not None and isinstance(conn, _SSLSocket):
conn.unwrap()
return self.voidresp()
def storbinary(self, cmd, fp, blocksize=8192, callback=None, rest=None):
"""Store a file in binary mode. A new port is created for you.
Args:
cmd: A STOR command.
fp: A file-like object with a read(num_bytes) method.
blocksize: The maximum data size to read from fp and send over
the connection at once. [default: 8192]
callback: An optional single parameter callable that is called on
each block of data after it is sent. [default: None]
rest: Passed to transfercmd(). [default: None]
Returns:
The response code.
"""
self.voidcmd('TYPE I')
with self.transfercmd(cmd, rest) as conn:
while 1:
buf = fp.read(blocksize)
if not buf:
break
conn.sendall(buf)
if callback:
callback(buf)
# shutdown ssl layer
if _SSLSocket is not None and isinstance(conn, _SSLSocket):
conn.unwrap()
return self.voidresp()
def storlines(self, cmd, fp, callback=None):
"""Store a file in line mode. A new port is created for you.
Args:
cmd: A STOR command.
fp: A file-like object with a readline() method.
callback: An optional single parameter callable that is called on
each line after it is sent. [default: None]
Returns:
The response code.
"""
self.voidcmd('TYPE A')
with self.transfercmd(cmd) as conn:
while 1:
buf = fp.readline(self.maxline + 1)
if len(buf) > self.maxline:
raise Error("got more than %d bytes" % self.maxline)
if not buf:
break
if buf[-2:] != B_CRLF:
if buf[-1] in B_CRLF: buf = buf[:-1]
buf = buf + B_CRLF
conn.sendall(buf)
if callback:
callback(buf)
# shutdown ssl layer
if _SSLSocket is not None and isinstance(conn, _SSLSocket):
conn.unwrap()
return self.voidresp()
def acct(self, password):
'''Send new account name.'''
cmd = 'ACCT ' + password
return self.voidcmd(cmd)
def nlst(self, *args):
'''Return a list of files in a given directory (default the current).'''
cmd = 'NLST'
for arg in args:
cmd = cmd + (' ' + arg)
files = []
self.retrlines(cmd, files.append)
return files
def dir(self, *args):
'''List a directory in long form.
By default list current directory to stdout.
Optional last argument is callback function; all
non-empty arguments before it are concatenated to the
LIST command. (This *should* only be used for a pathname.)'''
cmd = 'LIST'
func = None
if args[-1:] and type(args[-1]) != type(''):
args, func = args[:-1], args[-1]
for arg in args:
if arg:
cmd = cmd + (' ' + arg)
self.retrlines(cmd, func)
def mlsd(self, path="", facts=[]):
'''List a directory in a standardized format by using MLSD
command (RFC-3659). If path is omitted the current directory
is assumed. "facts" is a list of strings representing the type
of information desired (e.g. ["type", "size", "perm"]).
Return a generator object yielding a tuple of two elements
for every file found in path.
First element is the file name, the second one is a dictionary
including a variable number of "facts" depending on the server
and whether "facts" argument has been provided.
'''
if facts:
self.sendcmd("OPTS MLST " + ";".join(facts) + ";")
if path:
cmd = "MLSD %s" % path
else:
cmd = "MLSD"
lines = []
self.retrlines(cmd, lines.append)
for line in lines:
facts_found, _, name = line.rstrip(CRLF).partition(' ')
entry = {}
for fact in facts_found[:-1].split(";"):
key, _, value = fact.partition("=")
entry[key.lower()] = value
yield (name, entry)
def rename(self, fromname, toname):
'''Rename a file.'''
resp = self.sendcmd('RNFR ' + fromname)
if resp[0] != '3':
raise error_reply(resp)
return self.voidcmd('RNTO ' + toname)
def delete(self, filename):
'''Delete a file.'''
resp = self.sendcmd('DELE ' + filename)
if resp[:3] in {'250', '200'}:
return resp
else:
raise error_reply(resp)
def cwd(self, dirname):
'''Change to a directory.'''
if dirname == '..':
try:
return self.voidcmd('CDUP')
except error_perm as msg:
if msg.args[0][:3] != '500':
raise
elif dirname == '':
dirname = '.' # does nothing, but could return error
cmd = 'CWD ' + dirname
return self.voidcmd(cmd)
def size(self, filename):
'''Retrieve the size of a file.'''
# The SIZE command is defined in RFC-3659
resp = self.sendcmd('SIZE ' + filename)
if resp[:3] == '213':
s = resp[3:].strip()
return int(s)
def mkd(self, dirname):
'''Make a directory, return its full pathname.'''
resp = self.voidcmd('MKD ' + dirname)
        # work around non-compliant implementations such as the IIS
        # shipped with Windows Server 2003
if not resp.startswith('257'):
return ''
return parse257(resp)
def rmd(self, dirname):
'''Remove a directory.'''
return self.voidcmd('RMD ' + dirname)
def pwd(self):
'''Return current working directory.'''
resp = self.voidcmd('PWD')
        # work around non-compliant implementations such as the IIS
        # shipped with Windows Server 2003
if not resp.startswith('257'):
return ''
return parse257(resp)
def quit(self):
'''Quit, and close the connection.'''
resp = self.voidcmd('QUIT')
self.close()
return resp
def close(self):
'''Close the connection without assuming anything about it.'''
try:
file = self.file
self.file = None
if file is not None:
file.close()
finally:
sock = self.sock
self.sock = None
if sock is not None:
sock.close()
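# --- Illustrative usage sketch (editor's addition, not part of ftplib) ------
# A minimal, hedged example of driving the FTP class defined above; the host
# "ftp.example.com" and the file name "README" are hypothetical placeholders.
def _example_ftp_session():
    with FTP('ftp.example.com') as ftp:   # __exit__ sends QUIT for us
        ftp.login()                       # anonymous login (see login())
        ftp.cwd('pub')                    # change directory via CWD
        names = ftp.nlst()                # NLST -> list of file names
        with open('README', 'wb') as f:
            ftp.retrbinary('RETR README', f.write)   # binary download
    return names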
try:
import ssl
except ImportError:
_SSLSocket = None
else:
_SSLSocket = ssl.SSLSocket
class FTP_TLS(FTP):
'''A FTP subclass which adds TLS support to FTP as described
in RFC-4217.
Connect as usual to port 21 implicitly securing the FTP control
connection before authenticating.
Securing the data connection requires user to explicitly ask
for it by calling prot_p() method.
Usage example:
>>> from ftplib import FTP_TLS
>>> ftps = FTP_TLS('ftp.python.org')
>>> ftps.login() # login anonymously previously securing control channel
'230 Guest login ok, access restrictions apply.'
>>> ftps.prot_p() # switch to secure data connection
'200 Protection level set to P'
>>> ftps.retrlines('LIST') # list directory content securely
total 9
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 .
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 ..
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 bin
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 etc
d-wxrwxr-x 2 ftp wheel 1024 Sep 5 13:43 incoming
drwxr-xr-x 2 root wheel 1024 Nov 17 1993 lib
drwxr-xr-x 6 1094 wheel 1024 Sep 13 19:07 pub
drwxr-xr-x 3 root wheel 1024 Jan 3 1994 usr
-rw-r--r-- 1 root root 312 Aug 1 1994 welcome.msg
'226 Transfer complete.'
>>> ftps.quit()
'221 Goodbye.'
>>>
'''
ssl_version = ssl.PROTOCOL_SSLv23
def __init__(self, host='', user='', passwd='', acct='', keyfile=None,
certfile=None, context=None,
timeout=_GLOBAL_DEFAULT_TIMEOUT, source_address=None):
if context is not None and keyfile is not None:
raise ValueError("context and keyfile arguments are mutually "
"exclusive")
if context is not None and certfile is not None:
raise ValueError("context and certfile arguments are mutually "
"exclusive")
self.keyfile = keyfile
self.certfile = certfile
if context is None:
context = ssl._create_stdlib_context(self.ssl_version,
certfile=certfile,
keyfile=keyfile)
self.context = context
self._prot_p = False
FTP.__init__(self, host, user, passwd, acct, timeout, source_address)
def login(self, user='', passwd='', acct='', secure=True):
if secure and not isinstance(self.sock, ssl.SSLSocket):
self.auth()
return FTP.login(self, user, passwd, acct)
def auth(self):
'''Set up secure control connection by using TLS/SSL.'''
if isinstance(self.sock, ssl.SSLSocket):
raise ValueError("Already using TLS")
if self.ssl_version >= ssl.PROTOCOL_SSLv23:
resp = self.voidcmd('AUTH TLS')
else:
resp = self.voidcmd('AUTH SSL')
self.sock = self.context.wrap_socket(self.sock,
server_hostname=self.host)
self.file = self.sock.makefile(mode='r', encoding=self.encoding)
return resp
def ccc(self):
'''Switch back to a clear-text control connection.'''
if not isinstance(self.sock, ssl.SSLSocket):
raise ValueError("not using TLS")
resp = self.voidcmd('CCC')
self.sock = self.sock.unwrap()
return resp
def prot_p(self):
'''Set up secure data connection.'''
# PROT defines whether or not the data channel is to be protected.
# Though RFC-2228 defines four possible protection levels,
# RFC-4217 only recommends two, Clear and Private.
# Clear (PROT C) means that no security is to be used on the
# data-channel, Private (PROT P) means that the data-channel
# should be protected by TLS.
# PBSZ command MUST still be issued, but must have a parameter of
# '0' to indicate that no buffering is taking place and the data
# connection should not be encapsulated.
self.voidcmd('PBSZ 0')
resp = self.voidcmd('PROT P')
self._prot_p = True
return resp
def prot_c(self):
'''Set up clear text data connection.'''
resp = self.voidcmd('PROT C')
self._prot_p = False
return resp
# --- Overridden FTP methods
def ntransfercmd(self, cmd, rest=None):
conn, size = FTP.ntransfercmd(self, cmd, rest)
if self._prot_p:
conn = self.context.wrap_socket(conn,
server_hostname=self.host)
return conn, size
def abort(self):
# overridden as we can't pass MSG_OOB flag to sendall()
line = b'ABOR' + B_CRLF
self.sock.sendall(line)
resp = self.getmultiline()
if resp[:3] not in {'426', '225', '226'}:
raise error_proto(resp)
return resp
__all__.append('FTP_TLS')
all_errors = (Error, OSError, EOFError, ssl.SSLError)
_150_re = None
def parse150(resp):
'''Parse the '150' response for a RETR request.
Returns the expected transfer size or None; size is not guaranteed to
be present in the 150 message.
'''
if resp[:3] != '150':
raise error_reply(resp)
global _150_re
if _150_re is None:
import re
_150_re = re.compile(
"150 .* \((\d+) bytes\)", re.IGNORECASE | re.ASCII)
m = _150_re.match(resp)
if not m:
return None
return int(m.group(1))
_227_re = None
def parse227(resp):
'''Parse the '227' response for a PASV request.
Raises error_proto if it does not contain '(h1,h2,h3,h4,p1,p2)'
Return ('host.addr.as.numbers', port#) tuple.'''
if resp[:3] != '227':
raise error_reply(resp)
global _227_re
if _227_re is None:
import re
_227_re = re.compile(r'(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)', re.ASCII)
m = _227_re.search(resp)
if not m:
raise error_proto(resp)
numbers = m.groups()
host = '.'.join(numbers[:4])
port = (int(numbers[4]) << 8) + int(numbers[5])
return host, port
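# Editor's sketch: parse227() turns a PASV reply into a (host, port) pair.
# The reply text below is synthetic.
def _example_parse227():
    host, port = parse227('227 Entering Passive Mode (192,168,1,2,19,137).')
    assert host == '192.168.1.2'
    assert port == 19 * 256 + 137         # p1 * 256 + p2 == 5001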
def parse229(resp, peer):
'''Parse the '229' response for an EPSV request.
Raises error_proto if it does not contain '(|||port|)'
Return ('host.addr.as.numbers', port#) tuple.'''
if resp[:3] != '229':
raise error_reply(resp)
left = resp.find('(')
if left < 0: raise error_proto(resp)
right = resp.find(')', left + 1)
if right < 0:
raise error_proto(resp) # should contain '(|||port|)'
if resp[left + 1] != resp[right - 1]:
raise error_proto(resp)
parts = resp[left + 1:right].split(resp[left+1])
if len(parts) != 5:
raise error_proto(resp)
host = peer[0]
port = int(parts[3])
return host, port
def parse257(resp):
'''Parse the '257' response for a MKD or PWD request.
This is a response to a MKD or PWD request: a directory name.
    Returns the directory name in the 257 reply.'''
if resp[:3] != '257':
raise error_reply(resp)
if resp[3:5] != ' "':
return '' # Not compliant to RFC 959, but UNIX ftpd does this
dirname = ''
i = 5
n = len(resp)
while i < n:
c = resp[i]
i = i+1
if c == '"':
if i >= n or resp[i] != '"':
break
i = i+1
dirname = dirname + c
return dirname
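# Editor's sketch: parse257() extracts the quoted directory name, unescaping
# doubled quotes as RFC 959 requires. The sample replies are synthetic.
def _example_parse257():
    assert parse257('257 "/pub/dir" created.') == '/pub/dir'
    assert parse257('257 "/a""b" created.') == '/a"b'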
def print_line(line):
'''Default retrlines callback to print a line.'''
print(line)
def ftpcp(source, sourcename, target, targetname = '', type = 'I'):
'''Copy file from one FTP-instance to another.'''
if not targetname:
targetname = sourcename
type = 'TYPE ' + type
source.voidcmd(type)
target.voidcmd(type)
sourcehost, sourceport = parse227(source.sendcmd('PASV'))
target.sendport(sourcehost, sourceport)
# RFC 959: the user must "listen" [...] BEFORE sending the
# transfer request.
# So: STOR before RETR, because here the target is a "user".
treply = target.sendcmd('STOR ' + targetname)
if treply[:3] not in {'125', '150'}:
raise error_proto # RFC 959
sreply = source.sendcmd('RETR ' + sourcename)
if sreply[:3] not in {'125', '150'}:
raise error_proto # RFC 959
source.voidresp()
target.voidresp()
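# Editor's sketch: a server-to-server copy with ftpcp(); both host names and
# paths are hypothetical, and both connections are logged in first.
def _example_ftpcp():
    with FTP('ftp.src.example') as src, FTP('ftp.dst.example') as dst:
        src.login()
        dst.login()
        ftpcp(src, 'file.bin', dst, 'incoming/file.bin')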
class Netrc:
"""Class to parse & provide access to 'netrc' format files.
See the netrc(4) man page for information on the file format.
WARNING: This class is obsolete -- use module netrc instead.
"""
__defuser = None
__defpasswd = None
__defacct = None
def __init__(self, filename=None):
warnings.warn("This class is deprecated, use the netrc module instead",
DeprecationWarning, 2)
if filename is None:
if "HOME" in os.environ:
filename = os.path.join(os.environ["HOME"],
".netrc")
else:
raise OSError("specify file to load or set $HOME")
self.__hosts = {}
self.__macros = {}
fp = open(filename, "r")
in_macro = 0
while 1:
line = fp.readline()
if not line:
break
if in_macro and line.strip():
macro_lines.append(line)
continue
elif in_macro:
self.__macros[macro_name] = tuple(macro_lines)
in_macro = 0
words = line.split()
host = user = passwd = acct = None
default = 0
i = 0
while i < len(words):
w1 = words[i]
if i+1 < len(words):
w2 = words[i + 1]
else:
w2 = None
if w1 == 'default':
default = 1
elif w1 == 'machine' and w2:
host = w2.lower()
i = i + 1
elif w1 == 'login' and w2:
user = w2
i = i + 1
elif w1 == 'password' and w2:
passwd = w2
i = i + 1
elif w1 == 'account' and w2:
acct = w2
i = i + 1
elif w1 == 'macdef' and w2:
macro_name = w2
macro_lines = []
in_macro = 1
break
i = i + 1
if default:
self.__defuser = user or self.__defuser
self.__defpasswd = passwd or self.__defpasswd
self.__defacct = acct or self.__defacct
if host:
if host in self.__hosts:
ouser, opasswd, oacct = \
self.__hosts[host]
user = user or ouser
passwd = passwd or opasswd
acct = acct or oacct
self.__hosts[host] = user, passwd, acct
fp.close()
def get_hosts(self):
"""Return a list of hosts mentioned in the .netrc file."""
return self.__hosts.keys()
def get_account(self, host):
"""Returns login information for the named host.
The return value is a triple containing userid,
password, and the accounting field.
"""
host = host.lower()
user = passwd = acct = None
if host in self.__hosts:
user, passwd, acct = self.__hosts[host]
user = user or self.__defuser
passwd = passwd or self.__defpasswd
acct = acct or self.__defacct
return user, passwd, acct
def get_macros(self):
"""Return a list of all defined macro names."""
return self.__macros.keys()
def get_macro(self, macro):
"""Return a sequence of lines which define a named macro."""
return self.__macros[macro]
def test():
'''Test program.
Usage: ftp [-d] [-r[file]] host [-l[dir]] [-d[dir]] [-p] [file] ...
    Before host: -d raises the debugging level and -r[file] selects an
    alternate ~/.netrc file.  After host: -l[dir] lists a directory,
    -d[dir] changes directory, -p toggles passive mode, and any other
    argument is retrieved with RETR and written to stdout.
'''
if len(sys.argv) < 2:
print(test.__doc__)
sys.exit(0)
debugging = 0
rcfile = None
while sys.argv[1] == '-d':
debugging = debugging+1
del sys.argv[1]
if sys.argv[1][:2] == '-r':
# get name of alternate ~/.netrc file:
rcfile = sys.argv[1][2:]
del sys.argv[1]
host = sys.argv[1]
ftp = FTP(host)
ftp.set_debuglevel(debugging)
userid = passwd = acct = ''
try:
netrc = Netrc(rcfile)
except OSError:
if rcfile is not None:
sys.stderr.write("Could not open account file"
" -- using anonymous login.")
else:
try:
userid, passwd, acct = netrc.get_account(host)
except KeyError:
# no account for host
sys.stderr.write(
"No account -- using anonymous login.")
ftp.login(userid, passwd, acct)
for file in sys.argv[2:]:
if file[:2] == '-l':
ftp.dir(file[2:])
elif file[:2] == '-d':
cmd = 'CWD'
if file[2:]: cmd = cmd + ' ' + file[2:]
resp = ftp.sendcmd(cmd)
elif file == '-p':
ftp.set_pasv(not ftp.passiveserver)
else:
ftp.retrbinary('RETR ' + file, \
sys.stdout.write, 1024)
ftp.quit()
if __name__ == '__main__':
test()
"""functools.py - Tools for working with functions and callable objects
"""
# Python module wrapper for _functools C module
# to allow utilities written in Python to be added
# to the functools module.
# Written by Nick Coghlan <ncoghlan at gmail.com>,
# Raymond Hettinger <python at rcn.com>,
# and Łukasz Langa <lukasz at langa.pl>.
# Copyright (C) 2006-2013 Python Software Foundation.
# See C source code for _functools credits/copyright
__all__ = ['update_wrapper', 'wraps', 'WRAPPER_ASSIGNMENTS', 'WRAPPER_UPDATES',
'total_ordering', 'cmp_to_key', 'lru_cache', 'reduce', 'partial',
'partialmethod', 'singledispatch']
try:
from _functools import reduce
except ImportError:
pass
from abc import get_cache_token
from collections import namedtuple
from types import MappingProxyType
from weakref import WeakKeyDictionary
try:
from _thread import RLock
except ImportError:
class RLock:
'Dummy reentrant lock for builds without threads'
def __enter__(self): pass
def __exit__(self, exctype, excinst, exctb): pass
################################################################################
### update_wrapper() and wraps() decorator
################################################################################
# update_wrapper() and wraps() are tools to help write
# wrapper functions that can handle naive introspection
WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__qualname__', '__doc__',
'__annotations__')
WRAPPER_UPDATES = ('__dict__',)
def update_wrapper(wrapper,
wrapped,
assigned = WRAPPER_ASSIGNMENTS,
updated = WRAPPER_UPDATES):
"""Update a wrapper function to look like the wrapped function
wrapper is the function to be updated
wrapped is the original function
assigned is a tuple naming the attributes assigned directly
from the wrapped function to the wrapper function (defaults to
functools.WRAPPER_ASSIGNMENTS)
updated is a tuple naming the attributes of the wrapper that
are updated with the corresponding attribute from the wrapped
function (defaults to functools.WRAPPER_UPDATES)
"""
for attr in assigned:
try:
value = getattr(wrapped, attr)
except AttributeError:
pass
else:
setattr(wrapper, attr, value)
for attr in updated:
getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
# Issue #17482: set __wrapped__ last so we don't inadvertently copy it
# from the wrapped function when updating __dict__
wrapper.__wrapped__ = wrapped
# Return the wrapper so this can be used as a decorator via partial()
return wrapper
def wraps(wrapped,
assigned = WRAPPER_ASSIGNMENTS,
updated = WRAPPER_UPDATES):
"""Decorator factory to apply update_wrapper() to a wrapper function
Returns a decorator that invokes update_wrapper() with the decorated
function as the wrapper argument and the arguments to wraps() as the
remaining arguments. Default arguments are as for update_wrapper().
This is a convenience function to simplify applying partial() to
update_wrapper().
"""
return partial(update_wrapper, wrapped=wrapped,
assigned=assigned, updated=updated)
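# Editor's sketch: a decorator built with @wraps so the wrapper keeps the
# wrapped function's name, docstring, and __wrapped__ link (the names below
# are hypothetical).
def _traced(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print('calling', func.__name__)
        return func(*args, **kwargs)
    return wrapper
# @_traced
# def greet(): "Say hello."
# greet.__name__ == 'greet' and greet.__doc__ == 'Say hello.'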
################################################################################
### total_ordering class decorator
################################################################################
# The total ordering functions all invoke the root magic method directly
# rather than using the corresponding operator. This avoids possible
# infinite recursion that could occur when the operator dispatch logic
# detects a NotImplemented result and then calls a reflected method.
def _gt_from_lt(self, other):
'Return a > b. Computed by @total_ordering from (not a < b) and (a != b).'
op_result = self.__lt__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result and self != other
def _le_from_lt(self, other):
'Return a <= b. Computed by @total_ordering from (a < b) or (a == b).'
op_result = self.__lt__(other)
return op_result or self == other
def _ge_from_lt(self, other):
'Return a >= b. Computed by @total_ordering from (not a < b).'
op_result = self.__lt__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result
def _ge_from_le(self, other):
'Return a >= b. Computed by @total_ordering from (not a <= b) or (a == b).'
op_result = self.__le__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result or self == other
def _lt_from_le(self, other):
'Return a < b. Computed by @total_ordering from (a <= b) and (a != b).'
op_result = self.__le__(other)
if op_result is NotImplemented:
return NotImplemented
return op_result and self != other
def _gt_from_le(self, other):
'Return a > b. Computed by @total_ordering from (not a <= b).'
op_result = self.__le__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result
def _lt_from_gt(self, other):
'Return a < b. Computed by @total_ordering from (not a > b) and (a != b).'
op_result = self.__gt__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result and self != other
def _ge_from_gt(self, other):
'Return a >= b. Computed by @total_ordering from (a > b) or (a == b).'
op_result = self.__gt__(other)
return op_result or self == other
def _le_from_gt(self, other):
'Return a <= b. Computed by @total_ordering from (not a > b).'
op_result = self.__gt__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result
def _le_from_ge(self, other):
'Return a <= b. Computed by @total_ordering from (not a >= b) or (a == b).'
op_result = self.__ge__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result or self == other
def _gt_from_ge(self, other):
'Return a > b. Computed by @total_ordering from (a >= b) and (a != b).'
op_result = self.__ge__(other)
if op_result is NotImplemented:
return NotImplemented
return op_result and self != other
def _lt_from_ge(self, other):
'Return a < b. Computed by @total_ordering from (not a >= b).'
op_result = self.__ge__(other)
if op_result is NotImplemented:
return NotImplemented
return not op_result
def total_ordering(cls):
"""Class decorator that fills in missing ordering methods"""
convert = {
'__lt__': [('__gt__', _gt_from_lt),
('__le__', _le_from_lt),
('__ge__', _ge_from_lt)],
'__le__': [('__ge__', _ge_from_le),
('__lt__', _lt_from_le),
('__gt__', _gt_from_le)],
'__gt__': [('__lt__', _lt_from_gt),
('__ge__', _ge_from_gt),
('__le__', _le_from_gt)],
'__ge__': [('__le__', _le_from_ge),
('__gt__', _gt_from_ge),
('__lt__', _lt_from_ge)]
}
# Find user-defined comparisons (not those inherited from object).
roots = [op for op in convert if getattr(cls, op, None) is not getattr(object, op, None)]
if not roots:
raise ValueError('must define at least one ordering operation: < > <= >=')
root = max(roots) # prefer __lt__ to __le__ to __gt__ to __ge__
for opname, opfunc in convert[root]:
if opname not in roots:
opfunc.__name__ = opname
setattr(cls, opname, opfunc)
return cls
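# Editor's sketch: with __eq__ and one ordering method supplied, the decorator
# above derives the other three comparisons (the class is a hypothetical demo).
@total_ordering
class _Version:
    def __init__(self, *parts):
        self.parts = parts
    def __eq__(self, other):
        return self.parts == other.parts
    def __lt__(self, other):
        return self.parts < other.parts
# _Version(3, 4) <= _Version(3, 5) and _Version(3, 4) > _Version(3, 3)
# both hold although only __eq__ and __lt__ were written by hand.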
################################################################################
### cmp_to_key() function converter
################################################################################
def cmp_to_key(mycmp):
"""Convert a cmp= function into a key= function"""
class K(object):
__slots__ = ['obj']
def __init__(self, obj):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) < 0
def __gt__(self, other):
return mycmp(self.obj, other.obj) > 0
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) <= 0
def __ge__(self, other):
return mycmp(self.obj, other.obj) >= 0
def __ne__(self, other):
return mycmp(self.obj, other.obj) != 0
__hash__ = None
return K
try:
from _functools import cmp_to_key
except ImportError:
pass
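# Editor's sketch: adapting an old-style three-way cmp function so it can be
# used as a sort key (the ordering below is a hypothetical example).
def _by_length_then_text(a, b):
    if len(a) != len(b):
        return len(a) - len(b)
    return -1 if a < b else (1 if a > b else 0)
# sorted(['bb', 'a', 'ab'], key=cmp_to_key(_by_length_then_text))
# -> ['a', 'ab', 'bb']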
################################################################################
### partial() argument application
################################################################################
# Purely functional, no descriptor behaviour
def partial(func, *args, **keywords):
"""New function with partial application of the given arguments
and keywords.
"""
def newfunc(*fargs, **fkeywords):
newkeywords = keywords.copy()
newkeywords.update(fkeywords)
return func(*(args + fargs), **newkeywords)
newfunc.func = func
newfunc.args = args
newfunc.keywords = keywords
return newfunc
try:
from _functools import partial
except ImportError:
pass
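# Editor's sketch: partial() freezes arguments up front; here int() becomes a
# one-argument base-16 parser.
_hex_to_int = partial(int, base=16)
# _hex_to_int('ff') == 255; the frozen call is introspectable, e.g.
# _hex_to_int.func is int and _hex_to_int.keywords == {'base': 16}.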
# Descriptor version
class partialmethod(object):
"""Method descriptor with partial application of the given arguments
and keywords.
Supports wrapping existing descriptors and handles non-descriptor
callables as instance methods.
"""
def __init__(self, func, *args, **keywords):
if not callable(func) and not hasattr(func, "__get__"):
raise TypeError("{!r} is not callable or a descriptor"
.format(func))
# func could be a descriptor like classmethod which isn't callable,
# so we can't inherit from partial (it verifies func is callable)
if isinstance(func, partialmethod):
# flattening is mandatory in order to place cls/self before all
# other arguments
# it's also more efficient since only one function will be called
self.func = func.func
self.args = func.args + args
self.keywords = func.keywords.copy()
self.keywords.update(keywords)
else:
self.func = func
self.args = args
self.keywords = keywords
def __repr__(self):
args = ", ".join(map(repr, self.args))
keywords = ", ".join("{}={!r}".format(k, v)
for k, v in self.keywords.items())
format_string = "{module}.{cls}({func}, {args}, {keywords})"
return format_string.format(module=self.__class__.__module__,
cls=self.__class__.__name__,
func=self.func,
args=args,
keywords=keywords)
def _make_unbound_method(self):
def _method(*args, **keywords):
call_keywords = self.keywords.copy()
call_keywords.update(keywords)
cls_or_self, *rest = args
call_args = (cls_or_self,) + self.args + tuple(rest)
return self.func(*call_args, **call_keywords)
_method.__isabstractmethod__ = self.__isabstractmethod__
_method._partialmethod = self
return _method
def __get__(self, obj, cls):
get = getattr(self.func, "__get__", None)
result = None
if get is not None:
new_func = get(obj, cls)
if new_func is not self.func:
# Assume __get__ returning something new indicates the
# creation of an appropriate callable
result = partial(new_func, *self.args, **self.keywords)
try:
result.__self__ = new_func.__self__
except AttributeError:
pass
if result is None:
# If the underlying descriptor didn't do anything, treat this
# like an instance method
result = self._make_unbound_method().__get__(obj, cls)
return result
@property
def __isabstractmethod__(self):
return getattr(self.func, "__isabstractmethod__", False)
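# Editor's sketch: partialmethod() pre-binds arguments of a method; the class
# below is a hypothetical demo modeled on the usual on/off-switch pattern.
class _Cell:
    def __init__(self):
        self._alive = False
    def _set_state(self, state):
        self._alive = state
    set_alive = partialmethod(_set_state, True)
    set_dead = partialmethod(_set_state, False)
# c = _Cell(); c.set_alive()  ->  c._alive is now True.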
################################################################################
### LRU Cache function decorator
################################################################################
_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
class _HashedSeq(list):
""" This class guarantees that hash() will be called no more than once
per element. This is important because the lru_cache() will hash
the key multiple times on a cache miss.
"""
__slots__ = 'hashvalue'
def __init__(self, tup, hash=hash):
self[:] = tup
self.hashvalue = hash(tup)
def __hash__(self):
return self.hashvalue
def _make_key(args, kwds, typed,
kwd_mark = (object(),),
fasttypes = {int, str, frozenset, type(None)},
sorted=sorted, tuple=tuple, type=type, len=len):
"""Make a cache key from optionally typed positional and keyword arguments
    The key is constructed in a way that is as flat as possible rather than
    as a nested structure that would take more memory.
If there is only a single argument and its data type is known to cache
its hash value, then that argument is returned without a wrapper. This
saves space and improves lookup speed.
"""
key = args
if kwds:
sorted_items = sorted(kwds.items())
key += kwd_mark
for item in sorted_items:
key += item
if typed:
key += tuple(type(v) for v in args)
if kwds:
key += tuple(type(v) for k, v in sorted_items)
elif len(key) == 1 and type(key[0]) in fasttypes:
return key[0]
return _HashedSeq(key)
def lru_cache(maxsize=128, typed=False):
"""Least-recently-used cache decorator.
If *maxsize* is set to None, the LRU features are disabled and the cache
can grow without bound.
If *typed* is True, arguments of different types will be cached separately.
For example, f(3.0) and f(3) will be treated as distinct calls with
distinct results.
Arguments to the cached function must be hashable.
View the cache statistics named tuple (hits, misses, maxsize, currsize)
with f.cache_info(). Clear the cache and statistics with f.cache_clear().
Access the underlying function with f.__wrapped__.
See: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
"""
# Users should only access the lru_cache through its public API:
# cache_info, cache_clear, and f.__wrapped__
# The internals of the lru_cache are encapsulated for thread safety and
# to allow the implementation to change (including a possible C version).
# Early detection of an erroneous call to @lru_cache without any arguments
# resulting in the inner function being passed to maxsize instead of an
# integer or None.
if maxsize is not None and not isinstance(maxsize, int):
raise TypeError('Expected maxsize to be an integer or None')
# Constants shared by all lru cache instances:
sentinel = object() # unique object used to signal cache misses
make_key = _make_key # build a key from the function arguments
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3 # names for the link fields
def decorating_function(user_function):
cache = {}
hits = misses = 0
full = False
cache_get = cache.get # bound method to lookup a key or return None
lock = RLock() # because linkedlist updates aren't threadsafe
root = [] # root of the circular doubly linked list
root[:] = [root, root, None, None] # initialize by pointing to self
if maxsize == 0:
def wrapper(*args, **kwds):
# No caching -- just a statistics update after a successful call
nonlocal misses
result = user_function(*args, **kwds)
misses += 1
return result
elif maxsize is None:
def wrapper(*args, **kwds):
# Simple caching without ordering or size limit
nonlocal hits, misses
key = make_key(args, kwds, typed)
result = cache_get(key, sentinel)
if result is not sentinel:
hits += 1
return result
result = user_function(*args, **kwds)
cache[key] = result
misses += 1
return result
else:
def wrapper(*args, **kwds):
# Size limited caching that tracks accesses by recency
nonlocal root, hits, misses, full
key = make_key(args, kwds, typed)
with lock:
link = cache_get(key)
if link is not None:
# Move the link to the front of the circular queue
link_prev, link_next, _key, result = link
link_prev[NEXT] = link_next
link_next[PREV] = link_prev
last = root[PREV]
last[NEXT] = root[PREV] = link
link[PREV] = last
link[NEXT] = root
hits += 1
return result
result = user_function(*args, **kwds)
with lock:
if key in cache:
# Getting here means that this same key was added to the
# cache while the lock was released. Since the link
# update is already done, we need only return the
# computed result and update the count of misses.
pass
elif full:
# Use the old root to store the new key and result.
oldroot = root
oldroot[KEY] = key
oldroot[RESULT] = result
# Empty the oldest link and make it the new root.
# Keep a reference to the old key and old result to
# prevent their ref counts from going to zero during the
# update. That will prevent potentially arbitrary object
# clean-up code (i.e. __del__) from running while we're
# still adjusting the links.
root = oldroot[NEXT]
oldkey = root[KEY]
oldresult = root[RESULT]
root[KEY] = root[RESULT] = None
# Now update the cache dictionary.
del cache[oldkey]
# Save the potentially reentrant cache[key] assignment
# for last, after the root and links have been put in
# a consistent state.
cache[key] = oldroot
else:
# Put result in a new link at the front of the queue.
last = root[PREV]
link = [last, root, key, result]
last[NEXT] = root[PREV] = cache[key] = link
full = (len(cache) >= maxsize)
misses += 1
return result
def cache_info():
"""Report cache statistics"""
with lock:
return _CacheInfo(hits, misses, maxsize, len(cache))
def cache_clear():
"""Clear the cache and cache statistics"""
nonlocal hits, misses, full
with lock:
cache.clear()
root[:] = [root, root, None, None]
hits = misses = 0
full = False
wrapper.cache_info = cache_info
wrapper.cache_clear = cache_clear
return update_wrapper(wrapper, user_function)
return decorating_function
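# Editor's sketch: memoizing a recursive function with the decorator above;
# cache_info() exposes the counters maintained inside wrapper().
@lru_cache(maxsize=None)
def _fib(n):
    return n if n < 2 else _fib(n - 1) + _fib(n - 2)
# _fib(30) -> 832040; a single call records 31 misses and 28 hits.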
################################################################################
### singledispatch() - single-dispatch generic function decorator
################################################################################
def _c3_merge(sequences):
"""Merges MROs in *sequences* to a single MRO using the C3 algorithm.
Adapted from http://www.python.org/download/releases/2.3/mro/.
"""
result = []
while True:
sequences = [s for s in sequences if s] # purge empty sequences
if not sequences:
return result
for s1 in sequences: # find merge candidates among seq heads
candidate = s1[0]
for s2 in sequences:
if candidate in s2[1:]:
candidate = None
break # reject the current head, it appears later
else:
break
if candidate is None:
raise RuntimeError("Inconsistent hierarchy")
result.append(candidate)
# remove the chosen candidate
for seq in sequences:
if seq[0] == candidate:
del seq[0]
def _c3_mro(cls, abcs=None):
"""Computes the method resolution order using extended C3 linearization.
If no *abcs* are given, the algorithm works exactly like the built-in C3
linearization used for method resolution.
If given, *abcs* is a list of abstract base classes that should be inserted
into the resulting MRO. Unrelated ABCs are ignored and don't end up in the
result. The algorithm inserts ABCs where their functionality is introduced,
i.e. issubclass(cls, abc) returns True for the class itself but returns
False for all its direct base classes. Implicit ABCs for a given class
(either registered or inferred from the presence of a special method like
__len__) are inserted directly after the last ABC explicitly listed in the
MRO of said class. If two implicit ABCs end up next to each other in the
resulting MRO, their ordering depends on the order of types in *abcs*.
"""
for i, base in enumerate(reversed(cls.__bases__)):
if hasattr(base, '__abstractmethods__'):
boundary = len(cls.__bases__) - i
break # Bases up to the last explicit ABC are considered first.
else:
boundary = 0
abcs = list(abcs) if abcs else []
explicit_bases = list(cls.__bases__[:boundary])
abstract_bases = []
other_bases = list(cls.__bases__[boundary:])
for base in abcs:
if issubclass(cls, base) and not any(
issubclass(b, base) for b in cls.__bases__
):
# If *cls* is the class that introduces behaviour described by
# an ABC *base*, insert said ABC to its MRO.
abstract_bases.append(base)
for base in abstract_bases:
abcs.remove(base)
explicit_c3_mros = [_c3_mro(base, abcs=abcs) for base in explicit_bases]
abstract_c3_mros = [_c3_mro(base, abcs=abcs) for base in abstract_bases]
other_c3_mros = [_c3_mro(base, abcs=abcs) for base in other_bases]
return _c3_merge(
[[cls]] +
explicit_c3_mros + abstract_c3_mros + other_c3_mros +
[explicit_bases] + [abstract_bases] + [other_bases]
)
def _compose_mro(cls, types):
"""Calculates the method resolution order for a given class *cls*.
Includes relevant abstract base classes (with their respective bases) from
the *types* iterable. Uses a modified C3 linearization algorithm.
"""
bases = set(cls.__mro__)
# Remove entries which are already present in the __mro__ or unrelated.
def is_related(typ):
return (typ not in bases and hasattr(typ, '__mro__')
and issubclass(cls, typ))
types = [n for n in types if is_related(n)]
    # Remove entries which are strict bases of other entries (they will end
    # up in the MRO anyway).
def is_strict_base(typ):
for other in types:
if typ != other and typ in other.__mro__:
return True
return False
types = [n for n in types if not is_strict_base(n)]
# Subclasses of the ABCs in *types* which are also implemented by
# *cls* can be used to stabilize ABC ordering.
type_set = set(types)
mro = []
for typ in types:
found = []
for sub in typ.__subclasses__():
if sub not in bases and issubclass(cls, sub):
found.append([s for s in sub.__mro__ if s in type_set])
if not found:
mro.append(typ)
continue
# Favor subclasses with the biggest number of useful bases
found.sort(key=len, reverse=True)
for sub in found:
for subcls in sub:
if subcls not in mro:
mro.append(subcls)
return _c3_mro(cls, abcs=mro)
def _find_impl(cls, registry):
"""Returns the best matching implementation from *registry* for type *cls*.
Where there is no registered implementation for a specific type, its method
resolution order is used to find a more generic implementation.
Note: if *registry* does not contain an implementation for the base
*object* type, this function may return None.
"""
mro = _compose_mro(cls, registry.keys())
match = None
for t in mro:
if match is not None:
# If *match* is an implicit ABC but there is another unrelated,
# equally matching implicit ABC, refuse the temptation to guess.
if (t in registry and t not in cls.__mro__
and match not in cls.__mro__
and not issubclass(match, t)):
raise RuntimeError("Ambiguous dispatch: {} or {}".format(
match, t))
break
if t in registry:
match = t
return registry.get(match)
def singledispatch(func):
"""Single-dispatch generic function decorator.
Transforms a function into a generic function, which can have different
behaviours depending upon the type of its first argument. The decorated
function acts as the default implementation, and additional
implementations can be registered using the register() attribute of the
generic function.
"""
registry = {}
dispatch_cache = WeakKeyDictionary()
cache_token = None
def dispatch(cls):
"""generic_func.dispatch(cls) -> <function implementation>
Runs the dispatch algorithm to return the best available implementation
for the given *cls* registered on *generic_func*.
"""
nonlocal cache_token
if cache_token is not None:
current_token = get_cache_token()
if cache_token != current_token:
dispatch_cache.clear()
cache_token = current_token
try:
impl = dispatch_cache[cls]
except KeyError:
try:
impl = registry[cls]
except KeyError:
impl = _find_impl(cls, registry)
dispatch_cache[cls] = impl
return impl
def register(cls, func=None):
"""generic_func.register(cls, func) -> func
Registers a new implementation for the given *cls* on a *generic_func*.
"""
nonlocal cache_token
if func is None:
return lambda f: register(cls, f)
registry[cls] = func
if cache_token is None and hasattr(cls, '__abstractmethods__'):
cache_token = get_cache_token()
dispatch_cache.clear()
return func
def wrapper(*args, **kw):
return dispatch(args[0].__class__)(*args, **kw)
registry[object] = func
wrapper.register = register
wrapper.dispatch = dispatch
wrapper.registry = MappingProxyType(registry)
wrapper._clear_cache = dispatch_cache.clear
update_wrapper(wrapper, func)
return wrapper
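# Editor's sketch: a generic function plus one registered override; dispatch
# is on the type of the first argument (the functions are hypothetical).
@singledispatch
def _describe(obj):
    return 'object: {!r}'.format(obj)
@_describe.register(int)
def _(obj):
    return 'int with bit_length {}'.format(obj.bit_length())
# _describe('x') -> "object: 'x'"; _describe(5) -> 'int with bit_length 3'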
"""
Path operations common to more than one OS
Do not use directly. The OS specific modules import the appropriate
functions from this module themselves.
"""
import os
import stat
__all__ = ['commonprefix', 'exists', 'getatime', 'getctime', 'getmtime',
'getsize', 'isdir', 'isfile', 'samefile', 'sameopenfile',
'samestat']
# Does a path exist?
# This is false for dangling symbolic links on systems that support them.
def exists(path):
"""Test whether a path exists. Returns False for broken symbolic links"""
try:
os.stat(path)
except OSError:
return False
return True
# This follows symbolic links, so both islink() and isdir() can be true
# for the same path on systems that support symlinks
def isfile(path):
"""Test whether a path is a regular file"""
try:
st = os.stat(path)
except OSError:
return False
return stat.S_ISREG(st.st_mode)
# Is a path a directory?
# This follows symbolic links, so both islink() and isdir()
# can be true for the same path on systems that support symlinks
def isdir(s):
"""Return true if the pathname refers to an existing directory."""
try:
st = os.stat(s)
except OSError:
return False
return stat.S_ISDIR(st.st_mode)
def getsize(filename):
"""Return the size of a file, reported by os.stat()."""
return os.stat(filename).st_size
def getmtime(filename):
"""Return the last modification time of a file, reported by os.stat()."""
return os.stat(filename).st_mtime
def getatime(filename):
"""Return the last access time of a file, reported by os.stat()."""
return os.stat(filename).st_atime
def getctime(filename):
"""Return the metadata change time of a file, reported by os.stat()."""
return os.stat(filename).st_ctime
# Return the longest prefix of all list elements.
def commonprefix(m):
"Given a list of pathnames, returns the longest common leading component"
if not m: return ''
s1 = min(m)
s2 = max(m)
for i, c in enumerate(s1):
if c != s2[i]:
return s1[:i]
return s1
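# Editor's note: commonprefix() compares character by character, so the result
# is not necessarily a whole path component.
def _example_commonprefix():
    assert commonprefix(['/usr/lib', '/usr/local']) == '/usr/l'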
# Are two stat buffers (obtained from stat, fstat or lstat)
# describing the same file?
def samestat(s1, s2):
"""Test whether two stat buffers reference the same file"""
return (s1.st_ino == s2.st_ino and
s1.st_dev == s2.st_dev)
# Are two filenames really pointing to the same file?
def samefile(f1, f2):
"""Test whether two pathnames reference the same actual file"""
s1 = os.stat(f1)
s2 = os.stat(f2)
return samestat(s1, s2)
# Are two open files really referencing the same file?
# (Not necessarily the same file descriptor!)
def sameopenfile(fp1, fp2):
"""Test whether two open file objects reference the same file"""
s1 = os.fstat(fp1)
s2 = os.fstat(fp2)
return samestat(s1, s2)
# Split a path in root and extension.
# The extension is everything starting at the last dot in the last
# pathname component; the root is everything before that.
# It is always true that root + ext == p.
# Generic implementation of splitext, to be parametrized with
# the separators
def _splitext(p, sep, altsep, extsep):
"""Split the extension from a pathname.
Extension is everything from the last dot to the end, ignoring
leading dots. Returns "(root, ext)"; ext may be empty."""
# NOTE: This code must work for text and bytes strings.
sepIndex = p.rfind(sep)
if altsep:
altsepIndex = p.rfind(altsep)
sepIndex = max(sepIndex, altsepIndex)
dotIndex = p.rfind(extsep)
if dotIndex > sepIndex:
# skip all leading dots
filenameIndex = sepIndex + 1
while filenameIndex < dotIndex:
if p[filenameIndex:filenameIndex+1] != extsep:
return p[:dotIndex], p[dotIndex:]
filenameIndex += 1
return p, p[:0]
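# Editor's sketch: behaviour of the generic splitext once parametrized with
# POSIX-style separators ('/', None, '.'); leading dots never begin an
# extension.
def _example_splitext():
    assert _splitext('/tmp/archive.tar.gz', '/', None, '.') == ('/tmp/archive.tar', '.gz')
    assert _splitext('/tmp/.hidden', '/', None, '.') == ('/tmp/.hidden', '')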
"""Parser for command line options.
This module helps scripts to parse the command line arguments in
sys.argv. It supports the same conventions as the Unix getopt()
function (including the special meanings of arguments of the form `-'
and `--'). Long options similar to those supported by GNU software
may be used as well via an optional third argument. This module
provides two functions and an exception:
getopt() -- Parse command line options
gnu_getopt() -- Like getopt(), but allow option and non-option arguments
to be intermixed.
GetoptError -- exception (class) raised with 'opt' attribute, which is the
option involved with the exception.
"""
# Long option support added by Lars Wirzenius <[email protected]>.
#
# Gerrit Holl <[email protected]> moved the string-based exceptions
# to class-based exceptions.
#
# Peter Åstrand <[email protected]> added gnu_getopt().
#
# TODO for gnu_getopt():
#
# - GNU getopt_long_only mechanism
# - allow the caller to specify ordering
# - RETURN_IN_ORDER option
# - GNU extension with '-' as first character of option string
# - optional arguments, specified by double colons
# - an option string with a W followed by semicolon should
# treat "-W foo" as "--foo"
__all__ = ["GetoptError","error","getopt","gnu_getopt"]
import os
try:
from gettext import gettext as _
except ImportError:
# Bootstrapping Python: gettext's dependencies not built yet
def _(s): return s
class GetoptError(Exception):
opt = ''
msg = ''
def __init__(self, msg, opt=''):
self.msg = msg
self.opt = opt
Exception.__init__(self, msg, opt)
def __str__(self):
return self.msg
error = GetoptError # backward compatibility
def getopt(args, shortopts, longopts = []):
"""getopt(args, options[, long_options]) -> opts, args
Parses command line options and parameter list. args is the
argument list to be parsed, without the leading reference to the
running program. Typically, this means "sys.argv[1:]". shortopts
is the string of option letters that the script wants to
recognize, with options that require an argument followed by a
colon (i.e., the same format that Unix getopt() uses). If
specified, longopts is a list of strings with the names of the
long options which should be supported. The leading '--'
characters should not be included in the option name. Options
which require an argument should be followed by an equal sign
('=').
The return value consists of two elements: the first is a list of
(option, value) pairs; the second is the list of program arguments
left after the option list was stripped (this is a trailing slice
of the first argument). Each option-and-value pair returned has
the option as its first element, prefixed with a hyphen (e.g.,
'-x'), and the option argument as its second element, or an empty
string if the option has no argument. The options occur in the
list in the same order in which they were found, thus allowing
multiple occurrences. Long and short options may be mixed.
"""
opts = []
if type(longopts) == type(""):
longopts = [longopts]
else:
longopts = list(longopts)
while args and args[0].startswith('-') and args[0] != '-':
if args[0] == '--':
args = args[1:]
break
if args[0].startswith('--'):
opts, args = do_longs(opts, args[0][2:], longopts, args[1:])
else:
opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])
return opts, args
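# Editor's sketch: a typical call to the getopt() defined above; the option
# letters and long names are hypothetical.
def _example_getopt():
    opts, args = getopt(['-a', 'out.txt', '-b', 'input'],
                        'a:b', ['alpha=', 'beta'])
    assert opts == [('-a', 'out.txt'), ('-b', '')]
    assert args == ['input']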
def gnu_getopt(args, shortopts, longopts = []):
"""getopt(args, options[, long_options]) -> opts, args
This function works like getopt(), except that GNU style scanning
mode is used by default. This means that option and non-option
arguments may be intermixed. The getopt() function stops
processing options as soon as a non-option argument is
encountered.
If the first character of the option string is `+', or if the
environment variable POSIXLY_CORRECT is set, then option
processing stops as soon as a non-option argument is encountered.
"""
opts = []
prog_args = []
if isinstance(longopts, str):
longopts = [longopts]
else:
longopts = list(longopts)
# Allow options after non-option arguments?
if shortopts.startswith('+'):
shortopts = shortopts[1:]
all_options_first = True
elif os.environ.get("POSIXLY_CORRECT"):
all_options_first = True
else:
all_options_first = False
while args:
if args[0] == '--':
prog_args += args[1:]
break
if args[0][:2] == '--':
opts, args = do_longs(opts, args[0][2:], longopts, args[1:])
elif args[0][:1] == '-' and args[0] != '-':
opts, args = do_shorts(opts, args[0][1:], shortopts, args[1:])
else:
if all_options_first:
prog_args += args
break
else:
prog_args.append(args[0])
args = args[1:]
return opts, prog_args
def do_longs(opts, opt, longopts, args):
try:
i = opt.index('=')
except ValueError:
optarg = None
else:
opt, optarg = opt[:i], opt[i+1:]
has_arg, opt = long_has_args(opt, longopts)
if has_arg:
if optarg is None:
if not args:
raise GetoptError(_('option --%s requires argument') % opt, opt)
optarg, args = args[0], args[1:]
elif optarg is not None:
raise GetoptError(_('option --%s must not have an argument') % opt, opt)
opts.append(('--' + opt, optarg or ''))
return opts, args
# Return:
# has_arg?
# full option name
def long_has_args(opt, longopts):
possibilities = [o for o in longopts if o.startswith(opt)]
if not possibilities:
raise GetoptError(_('option --%s not recognized') % opt, opt)
# Is there an exact match?
if opt in possibilities:
return False, opt
elif opt + '=' in possibilities:
return True, opt
# No exact match, so better be unique.
if len(possibilities) > 1:
# XXX since possibilities contains all valid continuations, might be
# nice to work them into the error msg
raise GetoptError(_('option --%s not a unique prefix') % opt, opt)
assert len(possibilities) == 1
unique_match = possibilities[0]
has_arg = unique_match.endswith('=')
if has_arg:
unique_match = unique_match[:-1]
return has_arg, unique_match
def do_shorts(opts, optstring, shortopts, args):
while optstring != '':
opt, optstring = optstring[0], optstring[1:]
if short_has_arg(opt, shortopts):
if optstring == '':
if not args:
raise GetoptError(_('option -%s requires argument') % opt,
opt)
optstring, args = args[0], args[1:]
optarg, optstring = optstring, ''
else:
optarg = ''
opts.append(('-' + opt, optarg))
return opts, args
def short_has_arg(opt, shortopts):
for i in range(len(shortopts)):
if opt == shortopts[i] != ':':
return shortopts.startswith(':', i+1)
raise GetoptError(_('option -%s not recognized') % opt, opt)
if __name__ == '__main__':
import sys
print(getopt(sys.argv[1:], "a:b", ["alpha=", "beta"]))
"""Utilities to get a password and/or the current user name.
getpass(prompt[, stream]) - Prompt for a password, with echo turned off.
getuser() - Get the user name from the environment or password database.
GetPassWarning - This UserWarning is issued when getpass() cannot prevent
echoing of the password contents while reading.
On Windows, the msvcrt module will be used.
On the Mac EasyDialogs.AskPassword is used, if available.
"""
# Authors: Piers Lauder (original)
# Guido van Rossum (Windows support and cleanup)
# Gregory P. Smith (tty support & GetPassWarning)
import contextlib
import io
import os
import sys
import warnings
__all__ = ["getpass","getuser","GetPassWarning"]
class GetPassWarning(UserWarning): pass
def unix_getpass(prompt='Password: ', stream=None):
"""Prompt for a password, with echo turned off.
Args:
prompt: Written on stream to ask for the input. Default: 'Password: '
stream: A writable file object to display the prompt. Defaults to
the tty. If no tty is available defaults to sys.stderr.
Returns:
        The secret input.
Raises:
EOFError: If our input tty or stdin was closed.
GetPassWarning: When we were unable to turn echo off on the input.
Always restores terminal settings before returning.
"""
passwd = None
with contextlib.ExitStack() as stack:
try:
# Always try reading and writing directly on the tty first.
fd = os.open('/dev/tty', os.O_RDWR|os.O_NOCTTY)
tty = io.FileIO(fd, 'w+')
stack.enter_context(tty)
input = io.TextIOWrapper(tty)
stack.enter_context(input)
if not stream:
stream = input
except OSError as e:
# If that fails, see if stdin can be controlled.
stack.close()
try:
fd = sys.stdin.fileno()
except (AttributeError, ValueError):
fd = None
                passwd = fallback_getpass(prompt, stream)
input = sys.stdin
if not stream:
stream = sys.stderr
if fd is not None:
try:
old = termios.tcgetattr(fd) # a copy to save
new = old[:]
new[3] &= ~termios.ECHO # 3 == 'lflags'
tcsetattr_flags = termios.TCSAFLUSH
if hasattr(termios, 'TCSASOFT'):
tcsetattr_flags |= termios.TCSASOFT
try:
termios.tcsetattr(fd, tcsetattr_flags, new)
passwd = _raw_input(prompt, stream, input=input)
finally:
termios.tcsetattr(fd, tcsetattr_flags, old)
stream.flush() # issue7208
except termios.error:
if passwd is not None:
# _raw_input succeeded. The final tcsetattr failed. Reraise
# instead of leaving the terminal in an unknown state.
raise
# We can't control the tty or stdin. Give up and use normal IO.
# fallback_getpass() raises an appropriate warning.
if stream is not input:
# clean up unused file objects before blocking
stack.close()
passwd = fallback_getpass(prompt, stream)
stream.write('\n')
return passwd
def win_getpass(prompt='Password: ', stream=None):
"""Prompt for password with echo off, using Windows getch()."""
if sys.stdin is not sys.__stdin__:
return fallback_getpass(prompt, stream)
import msvcrt
for c in prompt:
msvcrt.putwch(c)
pw = ""
while 1:
c = msvcrt.getwch()
if c == '\r' or c == '\n':
break
if c == '\003':
raise KeyboardInterrupt
if c == '\b':
pw = pw[:-1]
else:
pw = pw + c
msvcrt.putwch('\r')
msvcrt.putwch('\n')
return pw
def fallback_getpass(prompt='Password: ', stream=None):
warnings.warn("Can not control echo on the terminal.", GetPassWarning,
stacklevel=2)
if not stream:
stream = sys.stderr
print("Warning: Password input may be echoed.", file=stream)
return _raw_input(prompt, stream)
def _raw_input(prompt="", stream=None, input=None):
# This doesn't save the string in the GNU readline history.
if not stream:
stream = sys.stderr
if not input:
input = sys.stdin
prompt = str(prompt)
if prompt:
try:
stream.write(prompt)
except UnicodeEncodeError:
# Use replace error handler to get as much as possible printed.
prompt = prompt.encode(stream.encoding, 'replace')
prompt = prompt.decode(stream.encoding)
stream.write(prompt)
stream.flush()
# NOTE: The Python C API calls flockfile() (and unlock) during readline.
line = input.readline()
if not line:
raise EOFError
if line[-1] == '\n':
line = line[:-1]
return line
def getuser():
"""Get the username from the environment or password database.
First try various environment variables, then the password
database. This works on Windows as long as USERNAME is set.
"""
for name in ('LOGNAME', 'USER', 'LNAME', 'USERNAME'):
user = os.environ.get(name)
if user:
return user
# If this fails, the exception will "explain" why
import pwd
return pwd.getpwuid(os.getuid())[0]
# Bind the name getpass to the appropriate function
try:
import termios
    # it's possible there is an incompatible termios from the
    # McMillan Installer; make sure we have a UNIX-compatible termios
termios.tcgetattr, termios.tcsetattr
except (ImportError, AttributeError):
try:
import msvcrt
except ImportError:
getpass = fallback_getpass
else:
getpass = win_getpass
else:
getpass = unix_getpass
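if __name__ == '__main__':
    # Hedged demo (not part of the module): prompt on the controlling
    # terminal with echo disabled where the platform allows; a
    # GetPassWarning is issued if echo cannot be suppressed.
    pw = getpass('Password (echo off if supported): ')
    print('Read %d characters for user %r' % (len(pw), getuser()))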
"""Internationalization and localization support.
This module provides internationalization (I18N) and localization (L10N)
support for your Python programs by providing an interface to the GNU gettext
message catalog library.
I18N refers to the operation by which a program is made aware of multiple
languages. L10N refers to the adaptation of your program, once
internationalized, to the local language and cultural habits.
"""
# This module represents the integration of work, contributions, feedback, and
# suggestions from the following people:
#
# Martin von Loewis, who wrote the initial implementation of the underlying
# C-based libintlmodule (later renamed _gettext), along with a skeletal
# gettext.py implementation.
#
# Peter Funk, who wrote fintl.py, a fairly complete wrapper around intlmodule,
# which also included a pure-Python implementation to read .mo files if
# intlmodule wasn't available.
#
# James Henstridge, who also wrote a gettext.py module, which has some
# interesting, but currently unsupported experimental features: the notion of
# a Catalog class and instances, and the ability to add to a catalog file via
# a Python API.
#
# Barry Warsaw integrated these modules, wrote the .install() API and code,
# and conformed all C and Python code to Python's coding standards.
#
# Francois Pinard and Marc-Andre Lemburg also contributed valuably to this
# module.
#
# J. David Ibanez implemented plural forms. Bruno Haible fixed some bugs.
#
# TODO:
# - Lazy loading of .mo files. Currently the entire catalog is loaded into
# memory, but that's probably bad for large translated programs. Instead,
# the lexical sort of original strings in GNU .mo files should be exploited
# to do binary searches and lazy initializations. Or you might want to use
# the undocumented double-hash algorithm for .mo files with hash tables, but
# you'll need to study the GNU gettext code to do this.
#
# - Support Solaris .mo file formats. Unfortunately, we've been unable to
# find this format documented anywhere.
import locale, copy, io, os, re, struct, sys
from errno import ENOENT
__all__ = ['NullTranslations', 'GNUTranslations', 'Catalog',
'find', 'translation', 'install', 'textdomain', 'bindtextdomain',
'bind_textdomain_codeset',
'dgettext', 'dngettext', 'gettext', 'lgettext', 'ldgettext',
'ldngettext', 'lngettext', 'ngettext',
]
_default_localedir = os.path.join(sys.base_prefix, 'share', 'locale')
# Expression parsing for plural form selection.
#
# The gettext library supports a small subset of C syntax. The only
# incompatible difference is that integer literals starting with zero are
# decimal.
#
# https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms
# http://git.savannah.gnu.org/cgit/gettext.git/tree/gettext-runtime/intl/plural.y
_token_pattern = re.compile(r"""
(?P<WHITESPACES>[ \t]+) | # spaces and horizontal tabs
(?P<NUMBER>[0-9]+\b) | # decimal integer
(?P<NAME>n\b) | # only n is allowed
(?P<PARENTHESIS>[()]) |
(?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
# <=, >=, ==, !=, &&, ||,
# ? :
# unary and bitwise ops
# not allowed
(?P<INVALID>\w+|.) # invalid token
""", re.VERBOSE|re.DOTALL)
def _tokenize(plural):
for mo in re.finditer(_token_pattern, plural):
kind = mo.lastgroup
if kind == 'WHITESPACES':
continue
value = mo.group(kind)
if kind == 'INVALID':
raise ValueError('invalid token in plural form: %s' % value)
yield value
yield ''
def _error(value):
if value:
return ValueError('unexpected token in plural form: %s' % value)
else:
return ValueError('unexpected end of plural form')
_binary_ops = (
('||',),
('&&',),
('==', '!='),
('<', '>', '<=', '>='),
('+', '-'),
('*', '/', '%'),
)
_binary_ops = {op: i for i, ops in enumerate(_binary_ops, 1) for op in ops}
_c2py_ops = {'||': 'or', '&&': 'and', '/': '//'}
def _parse(tokens, priority=-1):
result = ''
nexttok = next(tokens)
while nexttok == '!':
result += 'not '
nexttok = next(tokens)
if nexttok == '(':
sub, nexttok = _parse(tokens)
result = '%s(%s)' % (result, sub)
if nexttok != ')':
raise ValueError('unbalanced parenthesis in plural form')
elif nexttok == 'n':
result = '%s%s' % (result, nexttok)
else:
try:
value = int(nexttok, 10)
except ValueError:
raise _error(nexttok) from None
result = '%s%d' % (result, value)
nexttok = next(tokens)
j = 100
while nexttok in _binary_ops:
i = _binary_ops[nexttok]
if i < priority:
break
# Break chained comparisons
if i in (3, 4) and j in (3, 4): # '==', '!=', '<', '>', '<=', '>='
result = '(%s)' % result
# Replace some C operators by their Python equivalents
op = _c2py_ops.get(nexttok, nexttok)
right, nexttok = _parse(tokens, i + 1)
result = '%s %s %s' % (result, op, right)
j = i
if j == priority == 4: # '<', '>', '<=', '>='
result = '(%s)' % result
if nexttok == '?' and priority <= 0:
if_true, nexttok = _parse(tokens, 0)
if nexttok != ':':
raise _error(nexttok)
if_false, nexttok = _parse(tokens)
result = '%s if %s else %s' % (if_true, result, if_false)
if priority == 0:
result = '(%s)' % result
return result, nexttok
def _as_int(n):
try:
i = round(n)
except TypeError:
raise TypeError('Plural value must be an integer, got %s' %
(n.__class__.__name__,)) from None
return n
def c2py(plural):
"""Gets a C expression as used in PO files for plural forms and returns a
Python function that implements an equivalent expression.
"""
if len(plural) > 1000:
raise ValueError('plural form expression is too long')
try:
result, nexttok = _parse(_tokenize(plural))
if nexttok:
raise _error(nexttok)
depth = 0
for c in result:
if c == '(':
depth += 1
if depth > 20:
# Python compiler limit is about 90.
# The most complex example has 2.
raise ValueError('plural form expression is too complex')
elif c == ')':
depth -= 1
ns = {'_as_int': _as_int}
exec('''if True:
def func(n):
if not isinstance(n, int):
n = _as_int(n)
return int(%s)
''' % result, ns)
return ns['func']
except RuntimeError:
# Recursion error can be raised in _parse() or exec().
raise ValueError('plural form expression is too complex')
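if __name__ == '__main__':
    # Hedged sketch: c2py() compiles the C plural-forms expression from a
    # PO-file header into a Python callable.  The three-form Slavic rule
    # below maps n=1 -> 0, n=2 -> 1, n=5 -> 2, n=22 -> 1.
    plural = c2py('n==1 ? 0 : n%10>=2 && n%10<=4 && '
                  '(n%100<10 || n%100>=20) ? 1 : 2')
    assert [plural(n) for n in (1, 2, 5, 22)] == [0, 1, 2, 1]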
def _expand_lang(loc):
loc = locale.normalize(loc)
COMPONENT_CODESET = 1 << 0
COMPONENT_TERRITORY = 1 << 1
COMPONENT_MODIFIER = 1 << 2
# split up the locale into its base components
mask = 0
pos = loc.find('@')
if pos >= 0:
modifier = loc[pos:]
loc = loc[:pos]
mask |= COMPONENT_MODIFIER
else:
modifier = ''
pos = loc.find('.')
if pos >= 0:
codeset = loc[pos:]
loc = loc[:pos]
mask |= COMPONENT_CODESET
else:
codeset = ''
pos = loc.find('_')
if pos >= 0:
territory = loc[pos:]
loc = loc[:pos]
mask |= COMPONENT_TERRITORY
else:
territory = ''
language = loc
ret = []
for i in range(mask+1):
if not (i & ~mask): # if all components for this combo exist ...
val = language
if i & COMPONENT_TERRITORY: val += territory
if i & COMPONENT_CODESET: val += codeset
if i & COMPONENT_MODIFIER: val += modifier
ret.append(val)
ret.reverse()
return ret
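if __name__ == '__main__':
    # Hedged sketch: _expand_lang() normalizes a locale name and returns
    # every combination of language, territory, codeset and modifier,
    # most specific first (e.g. 'de_DE.ISO8859-15@euro' down to 'de').
    print(_expand_lang('de_DE@euro'))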
class NullTranslations:
def __init__(self, fp=None):
self._info = {}
self._charset = None
self._output_charset = None
self._fallback = None
if fp is not None:
self._parse(fp)
def _parse(self, fp):
pass
def add_fallback(self, fallback):
if self._fallback:
self._fallback.add_fallback(fallback)
else:
self._fallback = fallback
def gettext(self, message):
if self._fallback:
return self._fallback.gettext(message)
return message
def lgettext(self, message):
if self._fallback:
return self._fallback.lgettext(message)
return message
def ngettext(self, msgid1, msgid2, n):
if self._fallback:
return self._fallback.ngettext(msgid1, msgid2, n)
if n == 1:
return msgid1
else:
return msgid2
def lngettext(self, msgid1, msgid2, n):
if self._fallback:
return self._fallback.lngettext(msgid1, msgid2, n)
if n == 1:
return msgid1
else:
return msgid2
def info(self):
return self._info
def charset(self):
return self._charset
def output_charset(self):
return self._output_charset
def set_output_charset(self, charset):
self._output_charset = charset
def install(self, names=None):
import builtins
builtins.__dict__['_'] = self.gettext
if hasattr(names, "__contains__"):
if "gettext" in names:
builtins.__dict__['gettext'] = builtins.__dict__['_']
if "ngettext" in names:
builtins.__dict__['ngettext'] = self.ngettext
if "lgettext" in names:
builtins.__dict__['lgettext'] = self.lgettext
if "lngettext" in names:
builtins.__dict__['lngettext'] = self.lngettext
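if __name__ == '__main__':
    # Hedged sketch: NullTranslations is the identity catalog that
    # translation(..., fallback=True) returns when no .mo file is found.
    nt = NullTranslations()
    assert nt.gettext('spam') == 'spam'
    assert nt.ngettext('egg', 'eggs', 2) == 'eggs'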
class GNUTranslations(NullTranslations):
# Magic number of .mo files
LE_MAGIC = 0x950412de
BE_MAGIC = 0xde120495
def _parse(self, fp):
"""Override this method to support alternative .mo formats."""
unpack = struct.unpack
filename = getattr(fp, 'name', '')
# Parse the .mo file header, which consists of 5 little endian 32
# bit words.
self._catalog = catalog = {}
self.plural = lambda n: int(n != 1) # germanic plural by default
buf = fp.read()
buflen = len(buf)
# Are we big endian or little endian?
magic = unpack('<I', buf[:4])[0]
if magic == self.LE_MAGIC:
version, msgcount, masteridx, transidx = unpack('<4I', buf[4:20])
ii = '<II'
elif magic == self.BE_MAGIC:
version, msgcount, masteridx, transidx = unpack('>4I', buf[4:20])
ii = '>II'
else:
raise OSError(0, 'Bad magic number', filename)
# Now put all messages from the .mo file buffer into the catalog
# dictionary.
for i in range(0, msgcount):
mlen, moff = unpack(ii, buf[masteridx:masteridx+8])
mend = moff + mlen
tlen, toff = unpack(ii, buf[transidx:transidx+8])
tend = toff + tlen
if mend < buflen and tend < buflen:
msg = buf[moff:mend]
tmsg = buf[toff:tend]
else:
raise OSError(0, 'File is corrupt', filename)
# See if we're looking at GNU .mo conventions for metadata
if mlen == 0:
# Catalog description
lastk = None
for b_item in tmsg.split('\n'.encode("ascii")):
item = b_item.decode().strip()
if not item:
continue
k = v = None
if ':' in item:
k, v = item.split(':', 1)
k = k.strip().lower()
v = v.strip()
self._info[k] = v
lastk = k
elif lastk:
self._info[lastk] += '\n' + item
if k == 'content-type':
self._charset = v.split('charset=')[1]
elif k == 'plural-forms':
v = v.split(';')
plural = v[1].split('plural=')[1]
self.plural = c2py(plural)
# Note: we unconditionally convert both msgids and msgstrs to
# Unicode using the character encoding specified in the charset
# parameter of the Content-Type header. The gettext documentation
# strongly encourages msgids to be us-ascii, but some applications
# require alternative encodings (e.g. Zope's ZCML and ZPT). For
# traditional gettext applications, the msgid conversion will
# cause no problems since us-ascii should always be a subset of
# the charset encoding. We may want to fall back to 8-bit msgids
# if the Unicode conversion fails.
charset = self._charset or 'ascii'
if b'\x00' in msg:
# Plural forms
msgid1, msgid2 = msg.split(b'\x00')
tmsg = tmsg.split(b'\x00')
msgid1 = str(msgid1, charset)
for i, x in enumerate(tmsg):
catalog[(msgid1, i)] = str(x, charset)
else:
catalog[str(msg, charset)] = str(tmsg, charset)
# advance to next entry in the seek tables
masteridx += 8
transidx += 8
def lgettext(self, message):
missing = object()
tmsg = self._catalog.get(message, missing)
if tmsg is missing:
if self._fallback:
return self._fallback.lgettext(message)
return message
if self._output_charset:
return tmsg.encode(self._output_charset)
return tmsg.encode(locale.getpreferredencoding())
def lngettext(self, msgid1, msgid2, n):
try:
tmsg = self._catalog[(msgid1, self.plural(n))]
if self._output_charset:
return tmsg.encode(self._output_charset)
return tmsg.encode(locale.getpreferredencoding())
except KeyError:
if self._fallback:
return self._fallback.lngettext(msgid1, msgid2, n)
if n == 1:
return msgid1
else:
return msgid2
def gettext(self, message):
missing = object()
tmsg = self._catalog.get(message, missing)
if tmsg is missing:
if self._fallback:
return self._fallback.gettext(message)
return message
return tmsg
def ngettext(self, msgid1, msgid2, n):
try:
tmsg = self._catalog[(msgid1, self.plural(n))]
except KeyError:
if self._fallback:
return self._fallback.ngettext(msgid1, msgid2, n)
if n == 1:
tmsg = msgid1
else:
tmsg = msgid2
return tmsg
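if __name__ == '__main__':
    # Hedged sketch (illustrative data): hand-assemble a one-entry
    # little-endian .mo image -- a 28-byte header, one 8-byte originals
    # entry at offset 28, one 8-byte translations entry at offset 36,
    # then the NUL-terminated strings -- and parse it from a BytesIO.
    header = struct.pack('<7I', 0x950412de, 0, 1, 28, 36, 0, 0)
    entries = struct.pack('<II', 4, 44) + struct.pack('<II', 4, 49)
    buf = header + entries + b'spam\x00' + b'eggs\x00'
    t = GNUTranslations(io.BytesIO(buf))
    assert t.gettext('spam') == 'eggs'
    assert t.gettext('unknown') == 'unknown'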
# Locate a .mo file using the gettext strategy
def find(domain, localedir=None, languages=None, all=False):
# Get some reasonable defaults for arguments that were not supplied
if localedir is None:
localedir = _default_localedir
if languages is None:
languages = []
for envar in ('LANGUAGE', 'LC_ALL', 'LC_MESSAGES', 'LANG'):
val = os.environ.get(envar)
if val:
languages = val.split(':')
break
if 'C' not in languages:
languages.append('C')
# now normalize and expand the languages
nelangs = []
for lang in languages:
for nelang in _expand_lang(lang):
if nelang not in nelangs:
nelangs.append(nelang)
# select a language
if all:
result = []
else:
result = None
for lang in nelangs:
if lang == 'C':
break
mofile = os.path.join(localedir, lang, 'LC_MESSAGES', '%s.mo' % domain)
if os.path.exists(mofile):
if all:
result.append(mofile)
else:
return mofile
return result
# a mapping between absolute .mo file path and Translation object
_translations = {}
def translation(domain, localedir=None, languages=None,
class_=None, fallback=False, codeset=None):
if class_ is None:
class_ = GNUTranslations
mofiles = find(domain, localedir, languages, all=True)
if not mofiles:
if fallback:
return NullTranslations()
raise OSError(ENOENT, 'No translation file found for domain', domain)
# Avoid opening, reading, and parsing the .mo file after it's been done
# once.
result = None
for mofile in mofiles:
key = (class_, os.path.abspath(mofile))
t = _translations.get(key)
if t is None:
with open(mofile, 'rb') as fp:
t = _translations.setdefault(key, class_(fp))
# Copy the translation object to allow setting fallbacks and
# output charset. All other instance data is shared with the
# cached object.
t = copy.copy(t)
if codeset:
t.set_output_charset(codeset)
if result is None:
result = t
else:
result.add_fallback(t)
return result
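if __name__ == '__main__':
    # Hedged sketch: with fallback=True a missing catalog degrades to the
    # identity NullTranslations instead of raising OSError.
    t = translation('nosuchdomain', localedir='/nonexistent', fallback=True)
    assert isinstance(t, NullTranslations)
    assert t.gettext('untranslated') == 'untranslated'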
def install(domain, localedir=None, codeset=None, names=None):
t = translation(domain, localedir, fallback=True, codeset=codeset)
t.install(names)
# a mapping b/w domains and locale directories
_localedirs = {}
# a mapping b/w domains and codesets
_localecodesets = {}
# current global domain, `messages' used for compatibility w/ GNU gettext
_current_domain = 'messages'
def textdomain(domain=None):
global _current_domain
if domain is not None:
_current_domain = domain
return _current_domain
def bindtextdomain(domain, localedir=None):
global _localedirs
if localedir is not None:
_localedirs[domain] = localedir
return _localedirs.get(domain, _default_localedir)
def bind_textdomain_codeset(domain, codeset=None):
global _localecodesets
if codeset is not None:
_localecodesets[domain] = codeset
return _localecodesets.get(domain)
def dgettext(domain, message):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except OSError:
return message
return t.gettext(message)
def ldgettext(domain, message):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except OSError:
return message
return t.lgettext(message)
def dngettext(domain, msgid1, msgid2, n):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except OSError:
if n == 1:
return msgid1
else:
return msgid2
return t.ngettext(msgid1, msgid2, n)
def ldngettext(domain, msgid1, msgid2, n):
try:
t = translation(domain, _localedirs.get(domain, None),
codeset=_localecodesets.get(domain))
except OSError:
if n == 1:
return msgid1
else:
return msgid2
return t.lngettext(msgid1, msgid2, n)
def gettext(message):
return dgettext(_current_domain, message)
def lgettext(message):
return ldgettext(_current_domain, message)
def ngettext(msgid1, msgid2, n):
return dngettext(_current_domain, msgid1, msgid2, n)
def lngettext(msgid1, msgid2, n):
return ldngettext(_current_domain, msgid1, msgid2, n)
# dcgettext() has been deemed unnecessary and is not implemented.
# James Henstridge's Catalog constructor from GNOME gettext. Documented usage
# was:
#
# import gettext
# cat = gettext.Catalog(PACKAGE, localedir=LOCALEDIR)
# _ = cat.gettext
# print _('Hello World')
# The resulting catalog object currently doesn't support access through a
# dictionary API, which was supported (but apparently unused) in GNOME
# gettext.
Catalog = translation
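if __name__ == '__main__':
    # Hedged sketch of the classic GNU-style module API.  The locale
    # directory below is hypothetical; with no catalog installed, every
    # call falls back to returning its argument(s) unchanged.
    bindtextdomain('myapp', '/usr/share/locale')
    textdomain('myapp')
    assert gettext('Hello') == 'Hello'
    assert ngettext('%d file', '%d files', 3) == '%d files'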
"""Filename globbing utility."""
import os
import re
import fnmatch
__all__ = ["glob", "iglob"]
def glob(pathname):
"""Return a list of paths matching a pathname pattern.
The pattern may contain simple shell-style wildcards a la
fnmatch. However, unlike fnmatch, filenames starting with a
dot are special cases that are not matched by '*' and '?'
patterns.
"""
return list(iglob(pathname))
def iglob(pathname):
"""Return an iterator which yields the paths matching a pathname pattern.
The pattern may contain simple shell-style wildcards a la
fnmatch. However, unlike fnmatch, filenames starting with a
dot are special cases that are not matched by '*' and '?'
patterns.
"""
dirname, basename = os.path.split(pathname)
if not has_magic(pathname):
if basename:
if os.path.lexists(pathname):
yield pathname
else:
# Patterns ending with a slash should match only directories
if os.path.isdir(dirname):
yield pathname
return
if not dirname:
yield from glob1(None, basename)
return
# `os.path.split()` returns the argument itself as a dirname if it is a
# drive or UNC path. Prevent an infinite recursion if a drive or UNC path
# contains magic characters (i.e. r'\\?\C:').
if dirname != pathname and has_magic(dirname):
dirs = iglob(dirname)
else:
dirs = [dirname]
if has_magic(basename):
glob_in_dir = glob1
else:
glob_in_dir = glob0
for dirname in dirs:
for name in glob_in_dir(dirname, basename):
yield os.path.join(dirname, name)
# These 2 helper functions non-recursively glob inside a literal directory.
# They return a list of basenames. `glob1` accepts a pattern while `glob0`
# takes a literal basename (so it only has to check for its existence).
def glob1(dirname, pattern):
if not dirname:
if isinstance(pattern, bytes):
dirname = bytes(os.curdir, 'ASCII')
else:
dirname = os.curdir
try:
names = os.listdir(dirname)
except OSError:
return []
if not _ishidden(pattern):
names = [x for x in names if not _ishidden(x)]
return fnmatch.filter(names, pattern)
def glob0(dirname, basename):
if not basename:
# `os.path.split()` returns an empty basename for paths ending with a
# directory separator. 'q*x/' should match only directories.
if os.path.isdir(dirname):
return [basename]
else:
if os.path.lexists(os.path.join(dirname, basename)):
return [basename]
return []
magic_check = re.compile('([*?[])')
magic_check_bytes = re.compile(b'([*?[])')
def has_magic(s):
if isinstance(s, bytes):
match = magic_check_bytes.search(s)
else:
match = magic_check.search(s)
return match is not None
def _ishidden(path):
return path[0] in ('.', b'.'[0])
def escape(pathname):
"""Escape all special characters.
"""
# Escaping is done by wrapping any of "*?[" between square brackets.
# Metacharacters do not work in the drive part and shouldn't be escaped.
drive, pathname = os.path.splitdrive(pathname)
if isinstance(pathname, bytes):
pathname = magic_check_bytes.sub(br'[\1]', pathname)
else:
pathname = magic_check.sub(r'[\1]', pathname)
return drive + pathname
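if __name__ == '__main__':
    # Hedged sketch: glob() materializes the lazy iglob() iterator, and
    # escape() brackets the '*?[' metacharacters for literal matching.
    import tempfile
    with tempfile.TemporaryDirectory() as d:
        for name in ('a.txt', 'b.txt', '.hidden.txt'):
            open(os.path.join(d, name), 'w').close()
        # '.hidden.txt' is excluded: '*' does not match a leading dot.
        found = sorted(glob(os.path.join(d, '*.txt')))
        assert found == [os.path.join(d, n) for n in ('a.txt', 'b.txt')]
    assert escape('what?.txt') == 'what[?].txt'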
"""Functions that read and write gzipped files.
The user of the file doesn't have to worry about the compression,
but random access is not allowed."""
# based on Andrew Kuchling's minigzip.py distributed with the zlib module
import struct, sys, time, os
import zlib
import builtins
import io
__all__ = ["GzipFile", "open", "compress", "decompress"]
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16
READ, WRITE = 1, 2
def open(filename, mode="rb", compresslevel=9,
encoding=None, errors=None, newline=None):
"""Open a gzip-compressed file in binary or text mode.
The filename argument can be an actual filename (a str or bytes object), or
an existing file object to read from or write to.
The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for
binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is
"rb", and the default compresslevel is 9.
For binary mode, this function is equivalent to the GzipFile constructor:
GzipFile(filename, mode, compresslevel). In this case, the encoding, errors
and newline arguments must not be provided.
For text mode, a GzipFile object is created, and wrapped in an
io.TextIOWrapper instance with the specified encoding, error handling
behavior, and line ending(s).
"""
if "t" in mode:
if "b" in mode:
raise ValueError("Invalid mode: %r" % (mode,))
else:
if encoding is not None:
raise ValueError("Argument 'encoding' not supported in binary mode")
if errors is not None:
raise ValueError("Argument 'errors' not supported in binary mode")
if newline is not None:
raise ValueError("Argument 'newline' not supported in binary mode")
gz_mode = mode.replace("t", "")
if isinstance(filename, (str, bytes)):
binary_file = GzipFile(filename, gz_mode, compresslevel)
elif hasattr(filename, "read") or hasattr(filename, "write"):
binary_file = GzipFile(None, gz_mode, compresslevel, filename)
else:
raise TypeError("filename must be a str or bytes object, or a file")
if "t" in mode:
return io.TextIOWrapper(binary_file, encoding, errors, newline)
else:
return binary_file
def write32u(output, value):
# The L format writes the bit pattern correctly whether signed
# or unsigned.
output.write(struct.pack("<L", value))
class _PaddedFile:
"""Minimal read-only file object that prepends a string to the contents
of an actual file. Shouldn't be used outside of gzip.py, as it lacks
essential functionality."""
def __init__(self, f, prepend=b''):
self._buffer = prepend
self._length = len(prepend)
self.file = f
self._read = 0
def read(self, size):
if self._read is None:
return self.file.read(size)
if self._read + size <= self._length:
read = self._read
self._read += size
return self._buffer[read:self._read]
else:
read = self._read
self._read = None
return self._buffer[read:] + \
self.file.read(size-self._length+read)
def prepend(self, prepend=b'', readprevious=False):
if self._read is None:
self._buffer = prepend
elif readprevious and len(prepend) <= self._read:
self._read -= len(prepend)
return
else:
self._buffer = self._buffer[self._read:] + prepend
self._length = len(self._buffer)
self._read = 0
def unused(self):
if self._read is None:
return b''
return self._buffer[self._read:]
def seek(self, offset, whence=0):
# This is only ever called with offset=whence=0
if whence == 1 and self._read is not None:
if 0 <= offset + self._read <= self._length:
self._read += offset
return
else:
offset += self._length - self._read
self._read = None
self._buffer = None
return self.file.seek(offset, whence)
def __getattr__(self, name):
return getattr(self.file, name)
class GzipFile(io.BufferedIOBase):
"""The GzipFile class simulates most of the methods of a file object with
the exception of the readinto() and truncate() methods.
This class only supports opening files in binary mode. If you need to open a
compressed file in text mode, use the gzip.open() function.
"""
myfileobj = None
max_read_chunk = 10 * 1024 * 1024 # 10Mb
def __init__(self, filename=None, mode=None,
compresslevel=9, fileobj=None, mtime=None):
"""Constructor for the GzipFile class.
At least one of fileobj and filename must be given a
non-trivial value.
The new class instance is based on fileobj, which can be a regular
file, an io.BytesIO object, or any other object which simulates a file.
It defaults to None, in which case filename is opened to provide
a file object.
When fileobj is not None, the filename argument is only used to be
        included in the gzip file header, which may include the original
filename of the uncompressed file. It defaults to the filename of
fileobj, if discernible; otherwise, it defaults to the empty string,
and in this case the original filename is not included in the header.
The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x', or
'xb' depending on whether the file will be read or written. The default
is the mode of fileobj if discernible; otherwise, the default is 'rb'.
A mode of 'r' is equivalent to one of 'rb', and similarly for 'w' and
'wb', 'a' and 'ab', and 'x' and 'xb'.
The compresslevel argument is an integer from 0 to 9 controlling the
level of compression; 1 is fastest and produces the least compression,
and 9 is slowest and produces the most compression. 0 is no compression
at all. The default is 9.
The mtime argument is an optional numeric timestamp to be written
to the stream when compressing. All gzip compressed streams
are required to contain a timestamp. If omitted or None, the
current time is used. This module ignores the timestamp when
decompressing; however, some programs, such as gunzip, make use
of it. The format of the timestamp is the same as that of the
return value of time.time() and of the st_mtime member of the
object returned by os.stat().
"""
if mode and ('t' in mode or 'U' in mode):
raise ValueError("Invalid mode: {!r}".format(mode))
if mode and 'b' not in mode:
mode += 'b'
if fileobj is None:
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
if filename is None:
filename = getattr(fileobj, 'name', '')
if not isinstance(filename, (str, bytes)):
filename = ''
if mode is None:
mode = getattr(fileobj, 'mode', 'rb')
if mode.startswith('r'):
self.mode = READ
# Set flag indicating start of a new member
self._new_member = True
# Buffer data read from gzip file. extrastart is offset in
# stream where buffer starts. extrasize is number of
# bytes remaining in buffer from current stream position.
self.extrabuf = b""
self.extrasize = 0
self.extrastart = 0
self.name = filename
# Starts small, scales exponentially
self.min_readsize = 100
fileobj = _PaddedFile(fileobj)
elif mode.startswith(('w', 'a', 'x')):
self.mode = WRITE
self._init_write(filename)
self.compress = zlib.compressobj(compresslevel,
zlib.DEFLATED,
-zlib.MAX_WBITS,
zlib.DEF_MEM_LEVEL,
0)
else:
raise ValueError("Invalid mode: {!r}".format(mode))
self.fileobj = fileobj
self.offset = 0
self.mtime = mtime
if self.mode == WRITE:
self._write_gzip_header()
@property
def filename(self):
import warnings
warnings.warn("use the name attribute", DeprecationWarning, 2)
if self.mode == WRITE and self.name[-3:] != ".gz":
return self.name + ".gz"
return self.name
def __repr__(self):
fileobj = self.fileobj
if isinstance(fileobj, _PaddedFile):
fileobj = fileobj.file
s = repr(fileobj)
return '<gzip ' + s[1:-1] + ' ' + hex(id(self)) + '>'
def _check_closed(self):
"""Raises a ValueError if the underlying file object has been closed.
"""
if self.closed:
raise ValueError('I/O operation on closed file.')
def _init_write(self, filename):
self.name = filename
self.crc = zlib.crc32(b"") & 0xffffffff
self.size = 0
self.writebuf = []
self.bufsize = 0
def _write_gzip_header(self):
self.fileobj.write(b'\037\213') # magic header
self.fileobj.write(b'\010') # compression method
try:
# RFC 1952 requires the FNAME field to be Latin-1. Do not
# include filenames that cannot be represented that way.
fname = os.path.basename(self.name)
if not isinstance(fname, bytes):
fname = fname.encode('latin-1')
if fname.endswith(b'.gz'):
fname = fname[:-3]
except UnicodeEncodeError:
fname = b''
flags = 0
if fname:
flags = FNAME
self.fileobj.write(chr(flags).encode('latin-1'))
mtime = self.mtime
if mtime is None:
mtime = time.time()
write32u(self.fileobj, int(mtime))
self.fileobj.write(b'\002')
self.fileobj.write(b'\377')
if fname:
self.fileobj.write(fname + b'\000')
def _init_read(self):
self.crc = zlib.crc32(b"") & 0xffffffff
self.size = 0
def _read_exact(self, n):
data = self.fileobj.read(n)
while len(data) < n:
b = self.fileobj.read(n - len(data))
if not b:
raise EOFError("Compressed file ended before the "
"end-of-stream marker was reached")
data += b
return data
def _read_gzip_header(self):
magic = self.fileobj.read(2)
if magic == b'':
return False
if magic != b'\037\213':
raise OSError('Not a gzipped file')
method, flag, self.mtime = struct.unpack("<BBIxx", self._read_exact(8))
if method != 8:
raise OSError('Unknown compression method')
if flag & FEXTRA:
# Read & discard the extra field, if present
extra_len, = struct.unpack("<H", self._read_exact(2))
self._read_exact(extra_len)
if flag & FNAME:
# Read and discard a null-terminated string containing the filename
while True:
s = self.fileobj.read(1)
if not s or s==b'\000':
break
if flag & FCOMMENT:
# Read and discard a null-terminated string containing a comment
while True:
s = self.fileobj.read(1)
if not s or s==b'\000':
break
if flag & FHCRC:
self._read_exact(2) # Read & discard the 16-bit header CRC
unused = self.fileobj.unused()
if unused:
uncompress = self.decompress.decompress(unused)
self._add_read_data(uncompress)
return True
def write(self,data):
self._check_closed()
if self.mode != WRITE:
import errno
raise OSError(errno.EBADF, "write() on read-only GzipFile object")
if self.fileobj is None:
raise ValueError("write() on closed GzipFile object")
# Convert data type if called by io.BufferedWriter.
if isinstance(data, memoryview):
data = data.tobytes()
if len(data) > 0:
self.fileobj.write(self.compress.compress(data))
self.size += len(data)
self.crc = zlib.crc32(data, self.crc) & 0xffffffff
self.offset += len(data)
return len(data)
def read(self, size=-1):
self._check_closed()
if self.mode != READ:
import errno
raise OSError(errno.EBADF, "read() on write-only GzipFile object")
if self.extrasize <= 0 and self.fileobj is None:
return b''
readsize = 1024
if size < 0: # get the whole thing
while self._read(readsize):
readsize = min(self.max_read_chunk, readsize * 2)
size = self.extrasize
else: # just get some more of it
while size > self.extrasize:
if not self._read(readsize):
if size > self.extrasize:
size = self.extrasize
break
readsize = min(self.max_read_chunk, readsize * 2)
offset = self.offset - self.extrastart
chunk = self.extrabuf[offset: offset + size]
self.extrasize = self.extrasize - size
self.offset += size
return chunk
def read1(self, size=-1):
self._check_closed()
if self.mode != READ:
import errno
raise OSError(errno.EBADF, "read1() on write-only GzipFile object")
if self.extrasize <= 0 and self.fileobj is None:
return b''
# For certain input data, a single call to _read() may not return
# any data. In this case, retry until we get some data or reach EOF.
while self.extrasize <= 0 and self._read():
pass
if size < 0 or size > self.extrasize:
size = self.extrasize
offset = self.offset - self.extrastart
chunk = self.extrabuf[offset: offset + size]
self.extrasize -= size
self.offset += size
return chunk
def peek(self, n):
if self.mode != READ:
import errno
raise OSError(errno.EBADF, "peek() on write-only GzipFile object")
# Do not return ridiculously small buffers, for one common idiom
# is to call peek(1) and expect more bytes in return.
if n < 100:
n = 100
if self.extrasize == 0:
if self.fileobj is None:
return b''
# Ensure that we don't return b"" if we haven't reached EOF.
# 1024 is the same buffering heuristic used in read()
while self.extrasize == 0 and self._read(max(n, 1024)):
pass
offset = self.offset - self.extrastart
remaining = self.extrasize
assert remaining == len(self.extrabuf) - offset
return self.extrabuf[offset:offset + n]
def _unread(self, buf):
self.extrasize = len(buf) + self.extrasize
self.offset -= len(buf)
def _read(self, size=1024):
if self.fileobj is None:
return False
if self._new_member:
# If the _new_member flag is set, we have to
# jump to the next member, if there is one.
self._init_read()
if not self._read_gzip_header():
return False
self.decompress = zlib.decompressobj(-zlib.MAX_WBITS)
self._new_member = False
# Read a chunk of data from the file
buf = self.fileobj.read(size)
# If the EOF has been reached, flush the decompression object
# and mark this object as finished.
if buf == b"":
uncompress = self.decompress.flush()
            # Prepend the already read bytes to the fileobj so they can be
            # seen by _read_eof()
self.fileobj.prepend(self.decompress.unused_data, True)
self._read_eof()
            self._add_read_data(uncompress)
return False
uncompress = self.decompress.decompress(buf)
        self._add_read_data(uncompress)
if self.decompress.unused_data != b"":
# Ending case: we've come to the end of a member in the file,
# so seek back to the start of the unused data, finish up
# this member, and read a new gzip header.
            # Prepend the already read bytes to the fileobj so they can be
            # seen by _read_eof() and _read_gzip_header()
self.fileobj.prepend(self.decompress.unused_data, True)
# Check the CRC and file size, and set the flag so we read
# a new member on the next call
self._read_eof()
self._new_member = True
return True
def _add_read_data(self, data):
self.crc = zlib.crc32(data, self.crc) & 0xffffffff
offset = self.offset - self.extrastart
self.extrabuf = self.extrabuf[offset:] + data
self.extrasize = self.extrasize + len(data)
self.extrastart = self.offset
self.size = self.size + len(data)
def _read_eof(self):
# We've read to the end of the file
# We check the that the computed CRC and size of the
# uncompressed data matches the stored values. Note that the size
# stored is the true file size mod 2**32.
crc32, isize = struct.unpack("<II", self._read_exact(8))
if crc32 != self.crc:
raise OSError("CRC check failed %s != %s" % (hex(crc32),
hex(self.crc)))
elif isize != (self.size & 0xffffffff):
raise OSError("Incorrect length of data produced")
# Gzip files can be padded with zeroes and still have archives.
# Consume all zero bytes and set the file position to the first
# non-zero byte. See http://www.gzip.org/#faq8
c = b"\x00"
while c == b"\x00":
c = self.fileobj.read(1)
if c:
self.fileobj.prepend(c, True)
@property
def closed(self):
return self.fileobj is None
def close(self):
fileobj = self.fileobj
if fileobj is None:
return
self.fileobj = None
try:
if self.mode == WRITE:
fileobj.write(self.compress.flush())
write32u(fileobj, self.crc)
# self.size may exceed 2GB, or even 4GB
write32u(fileobj, self.size & 0xffffffff)
finally:
myfileobj = self.myfileobj
if myfileobj:
self.myfileobj = None
myfileobj.close()
def flush(self,zlib_mode=zlib.Z_SYNC_FLUSH):
self._check_closed()
if self.mode == WRITE:
# Ensure the compressor's buffer is flushed
self.fileobj.write(self.compress.flush(zlib_mode))
self.fileobj.flush()
def fileno(self):
"""Invoke the underlying file object's fileno() method.
This will raise AttributeError if the underlying file object
doesn't support fileno().
"""
return self.fileobj.fileno()
def rewind(self):
'''Return the uncompressed stream file position indicator to the
beginning of the file'''
if self.mode != READ:
raise OSError("Can't rewind in write mode")
self.fileobj.seek(0)
self._new_member = True
self.extrabuf = b""
self.extrasize = 0
self.extrastart = 0
self.offset = 0
def readable(self):
return self.mode == READ
def writable(self):
return self.mode == WRITE
def seekable(self):
return True
def seek(self, offset, whence=0):
if whence:
if whence == 1:
offset = self.offset + offset
else:
raise ValueError('Seek from end not supported')
if self.mode == WRITE:
if offset < self.offset:
raise OSError('Negative seek in write mode')
count = offset - self.offset
chunk = bytes(1024)
for i in range(count // 1024):
self.write(chunk)
self.write(bytes(count % 1024))
elif self.mode == READ:
if offset < self.offset:
# for negative seek, rewind and do positive seek
self.rewind()
count = offset - self.offset
for i in range(count // 1024):
self.read(1024)
self.read(count % 1024)
return self.offset
def readline(self, size=-1):
if size < 0:
# Shortcut common case - newline found in buffer.
offset = self.offset - self.extrastart
i = self.extrabuf.find(b'\n', offset) + 1
if i > 0:
self.extrasize -= i - offset
self.offset += i - offset
return self.extrabuf[offset: i]
size = sys.maxsize
readsize = self.min_readsize
else:
readsize = size
bufs = []
while size != 0:
c = self.read(readsize)
i = c.find(b'\n')
# We set i=size to break out of the loop under two
# conditions: 1) there's no newline, and the chunk is
# larger than size, or 2) there is a newline, but the
# resulting line would be longer than 'size'.
if (size <= i) or (i == -1 and len(c) > size):
i = size - 1
if i >= 0 or c == b'':
bufs.append(c[:i + 1]) # Add portion of last chunk
self._unread(c[i + 1:]) # Push back rest of chunk
break
# Append chunk to list, decrease 'size',
bufs.append(c)
size = size - len(c)
readsize = min(size, readsize * 2)
if readsize > self.min_readsize:
self.min_readsize = min(readsize, self.min_readsize * 2, 512)
return b''.join(bufs) # Return resulting line
def compress(data, compresslevel=9):
"""Compress data in one shot and return the compressed string.
Optional argument is the compression level, in range of 0-9.
"""
buf = io.BytesIO()
with GzipFile(fileobj=buf, mode='wb', compresslevel=compresslevel) as f:
f.write(data)
return buf.getvalue()
def decompress(data):
"""Decompress a gzip compressed string in one shot.
Return the decompressed string.
"""
with GzipFile(fileobj=io.BytesIO(data)) as f:
return f.read()
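if __name__ == '__main__':
    # Hedged sketch: compress() and decompress() are inverses for
    # in-memory bytes, so a round trip needs no file on disk.
    payload = b'spam and eggs' * 100
    assert decompress(compress(payload)) == payload
    assert len(compress(payload)) < len(payload)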
def _test():
# Act like gzip; with -d, act like gunzip.
# The input file is not deleted, however, nor are any other gzip
# options or features supported.
args = sys.argv[1:]
decompress = args and args[0] == "-d"
if decompress:
args = args[1:]
if not args:
args = ["-"]
for arg in args:
if decompress:
if arg == "-":
f = GzipFile(filename="", mode="rb", fileobj=sys.stdin.buffer)
g = sys.stdout.buffer
else:
if arg[-3:] != ".gz":
print("filename doesn't end in .gz:", repr(arg))
continue
f = open(arg, "rb")
g = builtins.open(arg[:-3], "wb")
else:
if arg == "-":
f = sys.stdin.buffer
g = GzipFile(filename="", mode="wb", fileobj=sys.stdout.buffer)
else:
f = builtins.open(arg, "rb")
g = open(arg + ".gz", "wb")
while True:
chunk = f.read(1024)
if not chunk:
break
g.write(chunk)
if g is not sys.stdout.buffer:
g.close()
if f is not sys.stdin.buffer:
f.close()
if __name__ == '__main__':
_test()
# Copyright (C) 2005-2010 Gregory P. Smith ([email protected])
# Licensed to PSF under a Contributor Agreement.
#
__doc__ = """hashlib module - A common interface to many hash functions.
new(name, data=b'') - returns a new hash object implementing the
given hash function; initializing the hash
using the given binary data.
Named constructor functions are also available; these are faster
than using new(name):
md5(), sha1(), sha224(), sha256(), sha384(), and sha512()
More algorithms may be available on your platform but the above are guaranteed
to exist. See the algorithms_guaranteed and algorithms_available attributes
to find out what algorithm names can be passed to new().
NOTE: If you want the adler32 or crc32 hash functions they are available in
the zlib module.
Choose your hash function wisely. Some have known collision weaknesses.
sha384 and sha512 will be slow on 32 bit platforms.
Hash objects have these methods:
- update(arg): Update the hash object with the bytes in arg. Repeated calls
are equivalent to a single call with the concatenation of all
the arguments.
- digest(): Return the digest of the bytes passed to the update() method
so far.
- hexdigest(): Like digest() except the digest is returned as a unicode
object of double length, containing only hexadecimal digits.
- copy(): Return a copy (clone) of the hash object. This can be used to
efficiently compute the digests of strings that share a common
initial substring.
For example, to obtain the digest of the string 'Nobody inspects the
spammish repetition':
>>> import hashlib
>>> m = hashlib.md5()
>>> m.update(b"Nobody inspects")
>>> m.update(b" the spammish repetition")
>>> m.digest()
b'\\xbbd\\x9c\\x83\\xdd\\x1e\\xa5\\xc9\\xd9\\xde\\xc9\\xa1\\x8d\\xf0\\xff\\xe9'
More condensed:
>>> hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest()
'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
"""
# This tuple and __get_builtin_constructor() must be modified if a new
# always available algorithm is added.
__always_supported = ('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')
algorithms_guaranteed = set(__always_supported)
algorithms_available = set(__always_supported)
__all__ = __always_supported + ('new', 'algorithms_guaranteed',
'algorithms_available', 'pbkdf2_hmac')
__builtin_constructor_cache = {}
def __get_builtin_constructor(name):
cache = __builtin_constructor_cache
constructor = cache.get(name)
if constructor is not None:
return constructor
try:
if name in ('SHA1', 'sha1'):
import _sha1
cache['SHA1'] = cache['sha1'] = _sha1.sha1
elif name in ('MD5', 'md5'):
import _md5
cache['MD5'] = cache['md5'] = _md5.md5
elif name in ('SHA256', 'sha256', 'SHA224', 'sha224'):
import _sha256
cache['SHA224'] = cache['sha224'] = _sha256.sha224
cache['SHA256'] = cache['sha256'] = _sha256.sha256
elif name in ('SHA512', 'sha512', 'SHA384', 'sha384'):
import _sha512
cache['SHA384'] = cache['sha384'] = _sha512.sha384
cache['SHA512'] = cache['sha512'] = _sha512.sha512
except ImportError:
pass # no extension module, this hash is unsupported.
constructor = cache.get(name)
if constructor is not None:
return constructor
raise ValueError('unsupported hash type ' + name)
def __get_openssl_constructor(name):
try:
f = getattr(_hashlib, 'openssl_' + name)
# Allow the C module to raise ValueError. The function will be
# defined but the hash not actually available thanks to OpenSSL.
f()
# Use the C function directly (very fast)
return f
except (AttributeError, ValueError):
return __get_builtin_constructor(name)
def __py_new(name, data=b''):
"""new(name, data=b'') - Return a new hashing object using the named algorithm;
optionally initialized with data (which must be bytes).
"""
return __get_builtin_constructor(name)(data)
def __hash_new(name, data=b''):
"""new(name, data=b'') - Return a new hashing object using the named algorithm;
optionally initialized with data (which must be bytes).
"""
try:
return _hashlib.new(name, data)
except ValueError:
# If the _hashlib module (OpenSSL) doesn't support the named
# hash, try using our builtin implementations.
# This allows for SHA224/256 and SHA384/512 support even though
# the OpenSSL library prior to 0.9.8 doesn't provide them.
return __get_builtin_constructor(name)(data)
try:
import _hashlib
new = __hash_new
__get_hash = __get_openssl_constructor
algorithms_available = algorithms_available.union(
_hashlib.openssl_md_meth_names)
except ImportError:
new = __py_new
__get_hash = __get_builtin_constructor
try:
# OpenSSL's PKCS5_PBKDF2_HMAC requires OpenSSL 1.0+ with HMAC and SHA
from _hashlib import pbkdf2_hmac
except ImportError:
_trans_5C = bytes((x ^ 0x5C) for x in range(256))
_trans_36 = bytes((x ^ 0x36) for x in range(256))
def pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None):
"""Password based key derivation function 2 (PKCS #5 v2.0)
        This Python implementation, based on the hmac module, is about as
        fast as OpenSSL's PKCS5_PBKDF2_HMAC for short passwords and much
        faster for long passwords.
"""
if not isinstance(hash_name, str):
raise TypeError(hash_name)
if not isinstance(password, (bytes, bytearray)):
password = bytes(memoryview(password))
if not isinstance(salt, (bytes, bytearray)):
salt = bytes(memoryview(salt))
# Fast inline HMAC implementation
inner = new(hash_name)
outer = new(hash_name)
blocksize = getattr(inner, 'block_size', 64)
if len(password) > blocksize:
password = new(hash_name, password).digest()
password = password + b'\x00' * (blocksize - len(password))
inner.update(password.translate(_trans_36))
outer.update(password.translate(_trans_5C))
def prf(msg, inner=inner, outer=outer):
# PBKDF2_HMAC uses the password as key. We can re-use the same
# digest objects and just update copies to skip initialization.
icpy = inner.copy()
ocpy = outer.copy()
icpy.update(msg)
ocpy.update(icpy.digest())
return ocpy.digest()
if iterations < 1:
raise ValueError(iterations)
if dklen is None:
dklen = outer.digest_size
if dklen < 1:
raise ValueError(dklen)
dkey = b''
loop = 1
from_bytes = int.from_bytes
while len(dkey) < dklen:
prev = prf(salt + loop.to_bytes(4, 'big'))
            # endianness doesn't matter here as long as to_bytes / from_bytes use the same
rkey = int.from_bytes(prev, 'big')
for i in range(iterations - 1):
prev = prf(prev)
# rkey = rkey ^ prev
rkey ^= from_bytes(prev, 'big')
loop += 1
dkey += rkey.to_bytes(inner.digest_size, 'big')
return dkey[:dklen]
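if __name__ == '__main__':
    # Hedged sketch: PBKDF2 is deterministic -- the same password, salt,
    # iteration count and key length always derive the same key.
    dk = pbkdf2_hmac('sha256', b'password', b'salt', 1000, dklen=16)
    assert len(dk) == 16
    assert dk == pbkdf2_hmac('sha256', b'password', b'salt', 1000, 16)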
for __func_name in __always_supported:
# try them all, some may not work due to the OpenSSL
# version not supporting that algorithm.
try:
globals()[__func_name] = __get_hash(__func_name)
except ValueError:
import logging
logging.exception('code for hash %s was not found.', __func_name)
# Cleanup locals()
del __always_supported, __func_name, __get_hash
del __py_new, __hash_new, __get_openssl_constructor
"""Heap queue algorithm (a.k.a. priority queue).
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0. For the sake of comparison,
non-existing elements are considered to be infinite. The interesting
property of a heap is that a[0] is always its smallest element.
Usage:
heap = [] # creates an empty heap
heappush(heap, item) # pushes a new item on the heap
item = heappop(heap) # pops the smallest item from the heap
item = heap[0] # smallest item on the heap without popping it
heapify(x) # transforms list into a heap, in-place, in linear time
item = heapreplace(heap, item) # pops and returns smallest item, and adds
# new item; the heap size is unchanged
Our API differs from textbook heap algorithms as follows:
- We use 0-based indexing. This makes the relationship between the
index for a node and the indexes for its children slightly less
obvious, but is more suitable since Python uses 0-based indexing.
- Our heappop() method returns the smallest item, not the largest.
These two make it possible to view the heap as a regular Python list
without surprises: heap[0] is the smallest item, and heap.sort()
maintains the heap invariant!
"""
# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
__about__ = """Heap queues
[explanation by François Pinard]
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0. For the sake of comparison,
non-existing elements are considered to be infinite. The interesting
property of a heap is that a[0] is always its smallest element.
The strange invariant above is meant to be an efficient memory
representation for a tournament. The numbers below are `k', not a[k]:
0
1 2
3 4 5 6
7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'. In
a usual binary tournament we see in sports, each cell is the winner
over the two cells it tops, and we can trace the winner down the tree
to see all opponents s/he had. However, in many computer applications
of such tournaments, we do not need to trace the history of a winner.
To be more memory efficient, when a winner is promoted, we try to
replace it by something else at a lower level, and the rule becomes
that a cell and the two cells it tops contain three different items,
but the top cell "wins" over the two topped cells.
If this heap invariant is protected at all times, index 0 is clearly
the overall winner. The simplest algorithmic way to remove it and
find the "next" winner is to move some loser (let's say cell 30 in the
diagram above) into the 0 position, and then percolate this new 0 down
the tree, exchanging values, until the invariant is re-established.
This is clearly logarithmic on the total number of items in the tree.
By iterating over all items, you get an O(n ln n) sort.
A nice feature of this sort is that you can efficiently insert new
items while the sort is going on, provided that the inserted items are
not "better" than the last 0'th element you extracted. This is
especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled
time. When an event schedules other events for execution, they are
scheduled into the future, so they can easily go into the heap. So, a
heap is a good structure for implementing schedulers (this is what I
used for my MIDI sequencer :-).
Various structures for implementing schedulers have been extensively
studied, and heaps are good for this, as they are reasonably speedy,
the speed is almost constant, and the worst case is not much different
than the average case. However, there are other representations which
are more efficient overall, yet the worst cases might be terrible.
Heaps are also very useful in big disk sorts. You most probably all
know that a big sort implies producing "runs" (which are pre-sorted
sequences, whose size is usually related to the amount of CPU memory),
followed by merging passes for these runs, and this merging is often
very cleverly organised[1]. It is very important that the initial
sort produces the longest runs possible. Tournaments are a good way
to achieve that. If, using all the memory available to hold a tournament, you
replace and percolate items that happen to fit the current run, you'll
produce runs which are twice the size of the memory for random input,
and much better for input fuzzily ordered.
Moreover, if you output the 0'th item on disk and get an input which
may not fit in the current tournament (because the value "wins" over
the last output value), it cannot fit in the heap, so the size of the
heap decreases. The freed memory could be cleverly reused immediately
for progressively building a second heap, which grows at exactly the
same rate the first heap is melting. When the first heap completely
vanishes, you switch heaps and start a new run. Clever and quite
effective!
In a word, heaps are useful memory structures to know. I use them in
a few applications, and I think it is good to keep a `heap' module
around. :-)
--------------------
[1] The disk balancing algorithms which are current, nowadays, are
more annoying than clever, and this is a consequence of the seeking
capabilities of the disks. On devices which cannot seek, like big
tape drives, the story was quite different, and one had to be very
clever to ensure (far in advance) that each tape movement will be the
most effective possible (that is, will best participate in
"progressing" the merge). Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time.
Believe me, real good tape sorts were quite spectacular to watch!
From all times, sorting has always been a Great Art! :-)
"""
__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'merge',
'nlargest', 'nsmallest', 'heappushpop']
from itertools import islice, count, tee, chain
def heappush(heap, item):
"""Push item onto heap, maintaining the heap invariant."""
heap.append(item)
_siftdown(heap, 0, len(heap)-1)
def heappop(heap):
"""Pop the smallest item off the heap, maintaining the heap invariant."""
lastelt = heap.pop() # raises appropriate IndexError if heap is empty
if heap:
returnitem = heap[0]
heap[0] = lastelt
_siftup(heap, 0)
else:
returnitem = lastelt
return returnitem
def heapreplace(heap, item):
"""Pop and return the current smallest value, and add the new item.
This is more efficient than heappop() followed by heappush(), and can be
more appropriate when using a fixed-size heap. Note that the value
returned may be larger than item! That constrains reasonable uses of
this routine unless written as part of a conditional replacement:
if item > heap[0]:
item = heapreplace(heap, item)
"""
returnitem = heap[0] # raises appropriate IndexError if heap is empty
heap[0] = item
_siftup(heap, 0)
return returnitem
def heappushpop(heap, item):
"""Fast version of a heappush followed by a heappop."""
if heap and heap[0] < item:
item, heap[0] = heap[0], item
_siftup(heap, 0)
return item
def heapify(x):
"""Transform list into a heap, in-place, in O(len(x)) time."""
n = len(x)
# Transform bottom-up. The largest index there's any point to looking at
# is the largest with a child index in-range, so must have 2*i + 1 < n,
# or i < (n-1)/2. If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
# j-1 is the largest, which is n//2 - 1. If n is odd = 2*j+1, this is
# (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
for i in reversed(range(n//2)):
_siftup(x, i)
def _heappushpop_max(heap, item):
"""Maxheap version of a heappush followed by a heappop."""
if heap and item < heap[0]:
item, heap[0] = heap[0], item
_siftup_max(heap, 0)
return item
def _heapify_max(x):
"""Transform list into a maxheap, in-place, in O(len(x)) time."""
n = len(x)
for i in reversed(range(n//2)):
_siftup_max(x, i)
def nlargest(n, iterable):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, reverse=True)[:n]
"""
if n < 0:
return []
it = iter(iterable)
result = list(islice(it, n))
if not result:
return result
heapify(result)
_heappushpop = heappushpop
for elem in it:
_heappushpop(result, elem)
result.sort(reverse=True)
return result
def nsmallest(n, iterable):
"""Find the n smallest elements in a dataset.
Equivalent to: sorted(iterable)[:n]
"""
if n < 0:
return []
it = iter(iterable)
result = list(islice(it, n))
if not result:
return result
_heapify_max(result)
_heappushpop = _heappushpop_max
for elem in it:
_heappushpop(result, elem)
result.sort()
return result
# 'heap' is a heap at all indices >= startpos, except possibly for pos. pos
# is the index of a leaf with a possibly out-of-order value. Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if newitem < parent:
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too. We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but it turns out that's not a good idea, even though
# many books write the algorithm that way. During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early). See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom comparison methods, or in array elements
# storing (priority, record) tuples. Comparisons are thus potentially
# expensive.
#
# On random arrays of length 1000, making this change cut the number of
# comparisons made by heapify() a little, and those made by exhaustive
# heappop() a lot, in accord with theory. Here are typical results from 3
# runs (3 just to demonstrate how small the variance is):
#
# Compares needed by heapify Compares needed by 1000 heappops
# -------------------------- --------------------------------
# 1837 cut to 1663 14996 cut to 8680
# 1855 cut to 1659 14966 cut to 8678
# 1847 cut to 1660 15024 cut to 8703
#
# Building the heap by using heappush() 1000 times instead required
# 2198, 2148, and 2219 compares: heapify() is more efficient, when
# you can use it.
#
# The total compares needed by list.sort() on the same lists were 8627,
# 8627, and 8632 (this should be compared to the sum of heapify() and
# heappop() compares): list.sort() is (unsurprisingly!) more efficient
# for sorting.
def _siftup(heap, pos):
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the smaller child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of smaller child.
rightpos = childpos + 1
if rightpos < endpos and not heap[childpos] < heap[rightpos]:
childpos = rightpos
# Move the smaller child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown(heap, startpos, pos)
def _siftdown_max(heap, startpos, pos):
'Maxheap variant of _siftdown'
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if parent < newitem:
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
def _siftup_max(heap, pos):
'Maxheap variant of _siftup'
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the larger child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of larger child.
rightpos = childpos + 1
if rightpos < endpos and not heap[rightpos] < heap[childpos]:
childpos = rightpos
# Move the larger child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown_max(heap, startpos, pos)
# If available, use C implementation
try:
from _heapq import *
except ImportError:
pass
def merge(*iterables):
'''Merge multiple sorted inputs into a single sorted output.
Similar to sorted(itertools.chain(*iterables)) but returns a generator,
does not pull the data into memory all at once, and assumes that each of
the input streams is already sorted (smallest to largest).
>>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
'''
_heappop, _heapreplace, _StopIteration = heappop, heapreplace, StopIteration
_len = len
h = []
h_append = h.append
for itnum, it in enumerate(map(iter, iterables)):
try:
next = it.__next__
h_append([next(), itnum, next])
except _StopIteration:
pass
heapify(h)
while _len(h) > 1:
try:
while True:
v, itnum, next = s = h[0]
yield v
s[0] = next() # raises StopIteration when exhausted
_heapreplace(h, s) # restore heap condition
except _StopIteration:
_heappop(h) # remove empty iterator
if h:
# fast case when only a single iterator remains
v, itnum, next = h[0]
yield v
yield from next.__self__
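# Editor's note: an illustrative sketch of merge() on already-sorted inputs;
# _merge_example is a hypothetical name, not part of this module.
def _merge_example():
    a = [1, 4, 7]
    b = [2, 5, 8]
    c = [0, 3, 6, 9]
    # merge() is lazy: it yields one element at a time and never pulls
    # all of the input data into memory at once.
    return list(merge(a, b, c))       # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]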
# Extend the implementations of nsmallest and nlargest to use a key= argument
_nsmallest = nsmallest
def nsmallest(n, iterable, key=None):
"""Find the n smallest elements in a dataset.
Equivalent to: sorted(iterable, key=key)[:n]
"""
# Short-cut for n==1 is to use min() when len(iterable)>0
if n == 1:
it = iter(iterable)
head = list(islice(it, 1))
if not head:
return []
if key is None:
return [min(chain(head, it))]
return [min(chain(head, it), key=key)]
# When n>=size, it's faster to use sorted()
try:
size = len(iterable)
except (TypeError, AttributeError):
pass
else:
if n >= size:
return sorted(iterable, key=key)[:n]
# When key is none, use simpler decoration
if key is None:
it = zip(iterable, count()) # decorate
result = _nsmallest(n, it)
return [r[0] for r in result] # undecorate
# General case, slowest method
in1, in2 = tee(iterable)
it = zip(map(key, in1), count(), in2) # decorate
result = _nsmallest(n, it)
return [r[2] for r in result] # undecorate
_nlargest = nlargest
def nlargest(n, iterable, key=None):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
"""
# Short-cut for n==1 is to use max() when len(iterable)>0
if n == 1:
it = iter(iterable)
head = list(islice(it, 1))
if not head:
return []
if key is None:
return [max(chain(head, it))]
return [max(chain(head, it), key=key)]
# When n>=size, it's faster to use sorted()
try:
size = len(iterable)
except (TypeError, AttributeError):
pass
else:
if n >= size:
return sorted(iterable, key=key, reverse=True)[:n]
# When key is none, use simpler decoration
if key is None:
it = zip(iterable, count(0,-1)) # decorate
result = _nlargest(n, it)
return [r[0] for r in result] # undecorate
# General case, slowest method
in1, in2 = tee(iterable)
it = zip(map(key, in1), count(0,-1), in2) # decorate
result = _nlargest(n, it)
return [r[2] for r in result] # undecorate
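# Editor's note: a minimal sketch of the key-aware nsmallest()/nlargest()
# wrappers defined above; _n_best_example is a hypothetical name.
def _n_best_example():
    words = ['pear', 'fig', 'banana', 'kiwi', 'apple']
    shortest = nsmallest(2, words, key=len)    # -> ['fig', 'pear'] (ties keep input order)
    longest = nlargest(1, words, key=len)      # -> ['banana']
    return shortest, longest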
if __name__ == "__main__":
# Simple sanity test
heap = []
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
for item in data:
heappush(heap, item)
sort = []
while heap:
sort.append(heappop(heap))
print(sort)
import doctest
doctest.testmod()
"""HMAC (Keyed-Hashing for Message Authentication) Python module.
Implements the HMAC algorithm as described by RFC 2104.
"""
import warnings as _warnings
from _operator import _compare_digest as compare_digest
import hashlib as _hashlib
trans_5C = bytes((x ^ 0x5C) for x in range(256))
trans_36 = bytes((x ^ 0x36) for x in range(256))
# The size of the digests returned by HMAC depends on the underlying
# hashing module used. Use digest_size from the instance of HMAC instead.
digest_size = None
class HMAC:
"""RFC 2104 HMAC class. Also complies with RFC 4231.
This supports the API for Cryptographic Hash Functions (PEP 247).
"""
blocksize = 64 # 512-bit HMAC; can be changed in subclasses.
def __init__(self, key, msg = None, digestmod = None):
"""Create a new HMAC object.
key: key for the keyed hash object.
msg: Initial input for the hash, if provided.
digestmod: A module supporting PEP 247. *OR*
A hashlib constructor returning a new hash object. *OR*
A hash name suitable for hashlib.new().
Defaults to hashlib.md5.
Implicit default to hashlib.md5 is deprecated and will be
removed in Python 3.6.
Note: key and msg must be bytes or bytearray objects.
"""
if not isinstance(key, (bytes, bytearray)):
raise TypeError("key: expected bytes or bytearray, but got %r" % type(key).__name__)
if digestmod is None:
_warnings.warn("HMAC() without an explicit digestmod argument "
"is deprecated.", PendingDeprecationWarning, 2)
digestmod = _hashlib.md5
if callable(digestmod):
self.digest_cons = digestmod
elif isinstance(digestmod, str):
self.digest_cons = lambda d=b'': _hashlib.new(digestmod, d)
else:
self.digest_cons = lambda d=b'': digestmod.new(d)
self.outer = self.digest_cons()
self.inner = self.digest_cons()
self.digest_size = self.inner.digest_size
if hasattr(self.inner, 'block_size'):
blocksize = self.inner.block_size
if blocksize < 16:
_warnings.warn('block_size of %d seems too small; using our '
'default of %d.' % (blocksize, self.blocksize),
RuntimeWarning, 2)
blocksize = self.blocksize
else:
_warnings.warn('No block_size attribute on given digest object; '
'Assuming %d.' % (self.blocksize),
RuntimeWarning, 2)
blocksize = self.blocksize
# self.blocksize is the default blocksize. self.block_size is
# effective block size as well as the public API attribute.
self.block_size = blocksize
if len(key) > blocksize:
key = self.digest_cons(key).digest()
key = key + bytes(blocksize - len(key))
self.outer.update(key.translate(trans_5C))
self.inner.update(key.translate(trans_36))
if msg is not None:
self.update(msg)
@property
def name(self):
return "hmac-" + self.inner.name
def update(self, msg):
"""Update this hashing object with the string msg.
"""
self.inner.update(msg)
def copy(self):
"""Return a separate copy of this hashing object.
An update to this copy won't affect the original object.
"""
# Call __new__ directly to avoid the expensive __init__.
other = self.__class__.__new__(self.__class__)
other.digest_cons = self.digest_cons
other.digest_size = self.digest_size
other.inner = self.inner.copy()
other.outer = self.outer.copy()
return other
def _current(self):
"""Return a hash object for the current state.
To be used only internally with digest() and hexdigest().
"""
h = self.outer.copy()
h.update(self.inner.digest())
return h
def digest(self):
"""Return the hash value of this hashing object.
This returns a string containing 8-bit data. The object is
not altered in any way by this function; you can continue
updating the object after calling this function.
"""
h = self._current()
return h.digest()
def hexdigest(self):
"""Like digest(), but returns a string of hexadecimal digits instead.
"""
h = self._current()
return h.hexdigest()
def new(key, msg = None, digestmod = None):
"""Create a new hashing object and return it.
key: The starting key for the hash.
msg: if available, will immediately be hashed into the object's starting
state.
You can now feed arbitrary strings into the object using its update()
method, and can ask for the hash value at any time by calling its digest()
method.
"""
return HMAC(key, msg, digestmod)
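# Editor's note: a minimal sketch of the module API above, assuming the
# sha256 constructor from hashlib; _hmac_example is a hypothetical name.
def _hmac_example():
    key = b'secret-key'
    h = new(key, b'part one, ', digestmod='sha256')
    h.update(b'part two')                    # incremental updates are allowed
    expected = new(key, b'part one, part two', digestmod='sha256')
    # compare_digest() is a constant-time comparison; prefer it over ==
    # when verifying MACs, to avoid timing side channels.
    return compare_digest(h.digest(), expected.digest())   # -> True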
"""IMAP4 client.
Based on RFC 2060.
Public class: IMAP4
Public variable: Debug
Public functions: Internaldate2tuple
Int2AP
ParseFlags
Time2Internaldate
"""
# Author: Piers Lauder <[email protected]> December 1997.
#
# Authentication code contributed by Donn Cave <[email protected]> June 1998.
# String method conversion by ESR, February 2001.
# GET/SETACL contributed by Anthony Baxter <[email protected]> April 2001.
# IMAP4_SSL contributed by Tino Lange <[email protected]> March 2002.
# GET/SETQUOTA contributed by Andreas Zeidler <[email protected]> June 2002.
# PROXYAUTH contributed by Rick Holbert <[email protected]> November 2002.
# GET/SETANNOTATION contributed by Tomas Lindroos <[email protected]> June 2005.
__version__ = "2.58"
import binascii, errno, random, re, socket, subprocess, sys, time, calendar
from datetime import datetime, timezone, timedelta
from io import DEFAULT_BUFFER_SIZE
try:
import ssl
HAVE_SSL = True
except ImportError:
HAVE_SSL = False
__all__ = ["IMAP4", "IMAP4_stream", "Internaldate2tuple",
"Int2AP", "ParseFlags", "Time2Internaldate"]
# Globals
CRLF = b'\r\n'
Debug = 0
IMAP4_PORT = 143
IMAP4_SSL_PORT = 993
AllowedVersions = ('IMAP4REV1', 'IMAP4') # Most recent first
# Maximal line length when calling readline(). This is to prevent
# reading arbitrary length lines. RFC 3501 and 2060 (IMAP 4rev1)
# don't specify a line length. RFC 2683 suggests limiting client
# command lines to 1000 octets and that servers should be prepared
# to accept command lines up to 8000 octets, so we used to use 10K here.
# In the modern world (eg: gmail) the response to, for example, a
# search command can be quite large, so we now use 1M.
_MAXLINE = 1000000
# Commands
Commands = {
# name valid states
'APPEND': ('AUTH', 'SELECTED'),
'AUTHENTICATE': ('NONAUTH',),
'CAPABILITY': ('NONAUTH', 'AUTH', 'SELECTED', 'LOGOUT'),
'CHECK': ('SELECTED',),
'CLOSE': ('SELECTED',),
'COPY': ('SELECTED',),
'CREATE': ('AUTH', 'SELECTED'),
'DELETE': ('AUTH', 'SELECTED'),
'DELETEACL': ('AUTH', 'SELECTED'),
'EXAMINE': ('AUTH', 'SELECTED'),
'EXPUNGE': ('SELECTED',),
'FETCH': ('SELECTED',),
'GETACL': ('AUTH', 'SELECTED'),
'GETANNOTATION':('AUTH', 'SELECTED'),
'GETQUOTA': ('AUTH', 'SELECTED'),
'GETQUOTAROOT': ('AUTH', 'SELECTED'),
'MYRIGHTS': ('AUTH', 'SELECTED'),
'LIST': ('AUTH', 'SELECTED'),
'LOGIN': ('NONAUTH',),
'LOGOUT': ('NONAUTH', 'AUTH', 'SELECTED', 'LOGOUT'),
'LSUB': ('AUTH', 'SELECTED'),
'NAMESPACE': ('AUTH', 'SELECTED'),
'NOOP': ('NONAUTH', 'AUTH', 'SELECTED', 'LOGOUT'),
'PARTIAL': ('SELECTED',), # NB: obsolete
'PROXYAUTH': ('AUTH',),
'RENAME': ('AUTH', 'SELECTED'),
'SEARCH': ('SELECTED',),
'SELECT': ('AUTH', 'SELECTED'),
'SETACL': ('AUTH', 'SELECTED'),
'SETANNOTATION':('AUTH', 'SELECTED'),
'SETQUOTA': ('AUTH', 'SELECTED'),
'SORT': ('SELECTED',),
'STARTTLS': ('NONAUTH',),
'STATUS': ('AUTH', 'SELECTED'),
'STORE': ('SELECTED',),
'SUBSCRIBE': ('AUTH', 'SELECTED'),
'THREAD': ('SELECTED',),
'UID': ('SELECTED',),
'UNSUBSCRIBE': ('AUTH', 'SELECTED'),
}
# Patterns to match server responses
Continuation = re.compile(br'\+( (?P<data>.*))?')
Flags = re.compile(br'.*FLAGS \((?P<flags>[^\)]*)\)')
InternalDate = re.compile(br'.*INTERNALDATE "'
br'(?P<day>[ 0123][0-9])-(?P<mon>[A-Z][a-z][a-z])-(?P<year>[0-9][0-9][0-9][0-9])'
br' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
br' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
br'"')
Literal = re.compile(br'.*{(?P<size>\d+)}$', re.ASCII)
MapCRLF = re.compile(br'\r\n|\r|\n')
Response_code = re.compile(br'\[(?P<type>[A-Z-]+)( (?P<data>[^\]]*))?\]')
Untagged_response = re.compile(br'\* (?P<type>[A-Z-]+)( (?P<data>.*))?')
Untagged_status = re.compile(
br'\* (?P<data>\d+) (?P<type>[A-Z-]+)( (?P<data2>.*))?', re.ASCII)
class IMAP4:
"""IMAP4 client class.
Instantiate with: IMAP4([host[, port]])
host - host's name (default: localhost);
port - port number (default: standard IMAP4 port).
All IMAP4rev1 commands are supported by methods of the same
name (in lower-case).
All arguments to commands are converted to strings, except for
AUTHENTICATE, and the last argument to APPEND which is passed as
an IMAP4 literal. If necessary (the string contains any
non-printing characters or white-space and isn't enclosed with
either parentheses or double quotes) each string is quoted.
However, the 'password' argument to the LOGIN command is always
quoted. If you want to avoid having an argument string quoted
(eg: the 'flags' argument to STORE) then enclose the string in
parentheses (eg: "(\Deleted)").
Each command returns a tuple: (type, [data, ...]) where 'type'
is usually 'OK' or 'NO', and 'data' is either the text from the
tagged response, or untagged results from command. Each 'data'
is either a string, or a tuple. If a tuple, then the first part
is the header of the response, and the second part contains
the data (ie: 'literal' value).
Errors raise the exception class <instance>.error("<reason>").
IMAP4 server errors raise <instance>.abort("<reason>"),
which is a sub-class of 'error'. Mailbox status changes
from READ-WRITE to READ-ONLY raise the exception class
<instance>.readonly("<reason>"), which is a sub-class of 'abort'.
"error" exceptions imply a program error.
"abort" exceptions imply the connection should be reset, and
the command re-tried.
"readonly" exceptions imply the command should be re-tried.
Note: to use this module, you must read the RFCs pertaining to the
IMAP4 protocol, as the semantics of the arguments to each IMAP4
command are left to the invoker, not to mention the results. Also,
most IMAP servers implement a sub-set of the commands available here.
"""
class error(Exception): pass # Logical errors - debug required
class abort(error): pass # Service errors - close and retry
class readonly(abort): pass # Mailbox status changed to READ-ONLY
def __init__(self, host = '', port = IMAP4_PORT):
self.debug = Debug
self.state = 'LOGOUT'
self.literal = None # A literal argument to a command
self.tagged_commands = {} # Tagged commands awaiting response
self.untagged_responses = {} # {typ: [data, ...], ...}
self.continuation_response = '' # Last continuation response
self.is_readonly = False # READ-ONLY desired state
self.tagnum = 0
self._tls_established = False
# Open socket to server.
self.open(host, port)
try:
self._connect()
except Exception:
try:
self.shutdown()
except OSError:
pass
raise
def _connect(self):
# Create unique tag for this session,
# and compile tagged response matcher.
self.tagpre = Int2AP(random.randint(4096, 65535))
self.tagre = re.compile(br'(?P<tag>'
+ self.tagpre
+ br'\d+) (?P<type>[A-Z]+) (?P<data>.*)', re.ASCII)
# Get server welcome message,
# request and store CAPABILITY response.
if __debug__:
self._cmd_log_len = 10
self._cmd_log_idx = 0
self._cmd_log = {} # Last `_cmd_log_len' interactions
if self.debug >= 1:
self._mesg('imaplib version %s' % __version__)
self._mesg('new IMAP4 connection, tag=%s' % self.tagpre)
self.welcome = self._get_response()
if 'PREAUTH' in self.untagged_responses:
self.state = 'AUTH'
elif 'OK' in self.untagged_responses:
self.state = 'NONAUTH'
else:
raise self.error(self.welcome)
self._get_capabilities()
if __debug__:
if self.debug >= 3:
self._mesg('CAPABILITIES: %r' % (self.capabilities,))
for version in AllowedVersions:
if not version in self.capabilities:
continue
self.PROTOCOL_VERSION = version
return
raise self.error('server not IMAP4 compliant')
def __getattr__(self, attr):
# Allow UPPERCASE variants of IMAP4 command methods.
if attr in Commands:
return getattr(self, attr.lower())
raise AttributeError("Unknown IMAP4 command: '%s'" % attr)
# Overridable methods
def _create_socket(self):
return socket.create_connection((self.host, self.port))
def open(self, host = '', port = IMAP4_PORT):
"""Setup connection to remote server on "host:port"
(default: localhost:standard IMAP4 port).
This connection will be used by the routines:
read, readline, send, shutdown.
"""
self.host = host
self.port = port
self.sock = self._create_socket()
self.file = self.sock.makefile('rb')
def read(self, size):
"""Read 'size' bytes from remote."""
return self.file.read(size)
def readline(self):
"""Read line from remote."""
line = self.file.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise self.error("got more than %d bytes" % _MAXLINE)
return line
def send(self, data):
"""Send data to remote."""
self.sock.sendall(data)
def shutdown(self):
"""Close I/O established in "open"."""
self.file.close()
try:
self.sock.shutdown(socket.SHUT_RDWR)
except OSError as e:
# The server might already have closed the connection
if e.errno != errno.ENOTCONN:
raise
finally:
self.sock.close()
def socket(self):
"""Return socket instance used to connect to IMAP4 server.
socket = <instance>.socket()
"""
return self.sock
# Utility methods
def recent(self):
"""Return most recent 'RECENT' responses if any exist,
else prompt server for an update using the 'NOOP' command.
(typ, [data]) = <instance>.recent()
'data' is None if no new messages,
else list of RECENT responses, most recent last.
"""
name = 'RECENT'
typ, dat = self._untagged_response('OK', [None], name)
if dat[-1]:
return typ, dat
typ, dat = self.noop() # Prod server for response
return self._untagged_response(typ, dat, name)
def response(self, code):
"""Return data for response 'code' if received, or None.
Old value for response 'code' is cleared.
(code, [data]) = <instance>.response(code)
"""
return self._untagged_response(code, [None], code.upper())
# IMAP4 commands
def append(self, mailbox, flags, date_time, message):
"""Append message to named mailbox.
(typ, [data]) = <instance>.append(mailbox, flags, date_time, message)
All args except `message' can be None.
"""
name = 'APPEND'
if not mailbox:
mailbox = 'INBOX'
if flags:
if (flags[0],flags[-1]) != ('(',')'):
flags = '(%s)' % flags
else:
flags = None
if date_time:
date_time = Time2Internaldate(date_time)
else:
date_time = None
self.literal = MapCRLF.sub(CRLF, message)
return self._simple_command(name, mailbox, flags, date_time)
def authenticate(self, mechanism, authobject):
"""Authenticate command - requires response processing.
'mechanism' specifies which authentication mechanism is to
be used - it must appear in <instance>.capabilities in the
form AUTH=<mechanism>.
'authobject' must be a callable object:
data = authobject(response)
It will be called to process server continuation responses; the
response argument it is passed will be a bytes. It should return bytes
data that will be base64 encoded and sent to the server. It should
return None if the client abort response '*' should be sent instead.
"""
mech = mechanism.upper()
# XXX: shouldn't this code be removed, not commented out?
#cap = 'AUTH=%s' % mech
#if not cap in self.capabilities: # Let the server decide!
# raise self.error("Server doesn't allow %s authentication." % mech)
self.literal = _Authenticator(authobject).process
typ, dat = self._simple_command('AUTHENTICATE', mech)
if typ != 'OK':
raise self.error(dat[-1])
self.state = 'AUTH'
return typ, dat
def capability(self):
"""(typ, [data]) = <instance>.capability()
Fetch capabilities list from server."""
name = 'CAPABILITY'
typ, dat = self._simple_command(name)
return self._untagged_response(typ, dat, name)
def check(self):
"""Checkpoint mailbox on server.
(typ, [data]) = <instance>.check()
"""
return self._simple_command('CHECK')
def close(self):
"""Close currently selected mailbox.
Deleted messages are removed from writable mailbox.
This is the recommended command before 'LOGOUT'.
(typ, [data]) = <instance>.close()
"""
try:
typ, dat = self._simple_command('CLOSE')
finally:
self.state = 'AUTH'
return typ, dat
def copy(self, message_set, new_mailbox):
"""Copy 'message_set' messages onto end of 'new_mailbox'.
(typ, [data]) = <instance>.copy(message_set, new_mailbox)
"""
return self._simple_command('COPY', message_set, new_mailbox)
def create(self, mailbox):
"""Create new mailbox.
(typ, [data]) = <instance>.create(mailbox)
"""
return self._simple_command('CREATE', mailbox)
def delete(self, mailbox):
"""Delete old mailbox.
(typ, [data]) = <instance>.delete(mailbox)
"""
return self._simple_command('DELETE', mailbox)
def deleteacl(self, mailbox, who):
"""Delete the ACLs (remove any rights) set for who on mailbox.
(typ, [data]) = <instance>.deleteacl(mailbox, who)
"""
return self._simple_command('DELETEACL', mailbox, who)
def expunge(self):
"""Permanently remove deleted items from selected mailbox.
Generates 'EXPUNGE' response for each deleted message.
(typ, [data]) = <instance>.expunge()
'data' is list of 'EXPUNGE'd message numbers in order received.
"""
name = 'EXPUNGE'
typ, dat = self._simple_command(name)
return self._untagged_response(typ, dat, name)
def fetch(self, message_set, message_parts):
"""Fetch (parts of) messages.
(typ, [data, ...]) = <instance>.fetch(message_set, message_parts)
'message_parts' should be a string of selected parts
enclosed in parentheses, eg: "(UID BODY[TEXT])".
'data' are tuples of message part envelope and data.
"""
name = 'FETCH'
typ, dat = self._simple_command(name, message_set, message_parts)
return self._untagged_response(typ, dat, name)
def getacl(self, mailbox):
"""Get the ACLs for a mailbox.
(typ, [data]) = <instance>.getacl(mailbox)
"""
typ, dat = self._simple_command('GETACL', mailbox)
return self._untagged_response(typ, dat, 'ACL')
def getannotation(self, mailbox, entry, attribute):
"""(typ, [data]) = <instance>.getannotation(mailbox, entry, attribute)
Retrieve ANNOTATIONs."""
typ, dat = self._simple_command('GETANNOTATION', mailbox, entry, attribute)
return self._untagged_response(typ, dat, 'ANNOTATION')
def getquota(self, root):
"""Get the quota root's resource usage and limits.
Part of the IMAP4 QUOTA extension defined in rfc2087.
(typ, [data]) = <instance>.getquota(root)
"""
typ, dat = self._simple_command('GETQUOTA', root)
return self._untagged_response(typ, dat, 'QUOTA')
def getquotaroot(self, mailbox):
"""Get the list of quota roots for the named mailbox.
(typ, [[QUOTAROOT responses...], [QUOTA responses]]) = <instance>.getquotaroot(mailbox)
"""
typ, dat = self._simple_command('GETQUOTAROOT', mailbox)
typ, quota = self._untagged_response(typ, dat, 'QUOTA')
typ, quotaroot = self._untagged_response(typ, dat, 'QUOTAROOT')
return typ, [quotaroot, quota]
def list(self, directory='""', pattern='*'):
"""List mailbox names in directory matching pattern.
(typ, [data]) = <instance>.list(directory='""', pattern='*')
'data' is list of LIST responses.
"""
name = 'LIST'
typ, dat = self._simple_command(name, directory, pattern)
return self._untagged_response(typ, dat, name)
def login(self, user, password):
"""Identify client using plaintext password.
(typ, [data]) = <instance>.login(user, password)
NB: 'password' will be quoted.
"""
typ, dat = self._simple_command('LOGIN', user, self._quote(password))
if typ != 'OK':
raise self.error(dat[-1])
self.state = 'AUTH'
return typ, dat
def login_cram_md5(self, user, password):
""" Force use of CRAM-MD5 authentication.
(typ, [data]) = <instance>.login_cram_md5(user, password)
"""
self.user, self.password = user, password
return self.authenticate('CRAM-MD5', self._CRAM_MD5_AUTH)
def _CRAM_MD5_AUTH(self, challenge):
""" Authobject to use with CRAM-MD5 authentication. """
import hmac
pwd = (self.password.encode('ASCII') if isinstance(self.password, str)
else self.password)
return self.user + " " + hmac.HMAC(pwd, challenge, 'md5').hexdigest()
def logout(self):
"""Shutdown connection to server.
(typ, [data]) = <instance>.logout()
Returns server 'BYE' response.
"""
self.state = 'LOGOUT'
try: typ, dat = self._simple_command('LOGOUT')
except: typ, dat = 'NO', ['%s: %s' % sys.exc_info()[:2]]
self.shutdown()
if 'BYE' in self.untagged_responses:
return 'BYE', self.untagged_responses['BYE']
return typ, dat
def lsub(self, directory='""', pattern='*'):
"""List 'subscribed' mailbox names in directory matching pattern.
(typ, [data, ...]) = <instance>.lsub(directory='""', pattern='*')
'data' are tuples of message part envelope and data.
"""
name = 'LSUB'
typ, dat = self._simple_command(name, directory, pattern)
return self._untagged_response(typ, dat, name)
def myrights(self, mailbox):
"""Show my ACLs for a mailbox (i.e. the rights that I have on mailbox).
(typ, [data]) = <instance>.myrights(mailbox)
"""
typ,dat = self._simple_command('MYRIGHTS', mailbox)
return self._untagged_response(typ, dat, 'MYRIGHTS')
def namespace(self):
""" Returns IMAP namespaces ala rfc2342
(typ, [data, ...]) = <instance>.namespace()
"""
name = 'NAMESPACE'
typ, dat = self._simple_command(name)
return self._untagged_response(typ, dat, name)
def noop(self):
"""Send NOOP command.
(typ, [data]) = <instance>.noop()
"""
if __debug__:
if self.debug >= 3:
self._dump_ur(self.untagged_responses)
return self._simple_command('NOOP')
def partial(self, message_num, message_part, start, length):
"""Fetch truncated part of a message.
(typ, [data, ...]) = <instance>.partial(message_num, message_part, start, length)
'data' is tuple of message part envelope and data.
"""
name = 'PARTIAL'
typ, dat = self._simple_command(name, message_num, message_part, start, length)
return self._untagged_response(typ, dat, 'FETCH')
def proxyauth(self, user):
"""Assume authentication as "user".
Allows an authorised administrator to proxy into any user's
mailbox.
(typ, [data]) = <instance>.proxyauth(user)
"""
name = 'PROXYAUTH'
return self._simple_command('PROXYAUTH', user)
def rename(self, oldmailbox, newmailbox):
"""Rename old mailbox name to new.
(typ, [data]) = <instance>.rename(oldmailbox, newmailbox)
"""
return self._simple_command('RENAME', oldmailbox, newmailbox)
def search(self, charset, *criteria):
"""Search mailbox for matching messages.
(typ, [data]) = <instance>.search(charset, criterion, ...)
'data' is space separated list of matching message numbers.
"""
name = 'SEARCH'
if charset:
typ, dat = self._simple_command(name, 'CHARSET', charset, *criteria)
else:
typ, dat = self._simple_command(name, *criteria)
return self._untagged_response(typ, dat, name)
def select(self, mailbox='INBOX', readonly=False):
"""Select a mailbox.
Flush all untagged responses.
(typ, [data]) = <instance>.select(mailbox='INBOX', readonly=False)
'data' is count of messages in mailbox ('EXISTS' response).
Mandated responses are ('FLAGS', 'EXISTS', 'RECENT', 'UIDVALIDITY'), so
other responses should be obtained via <instance>.response('FLAGS') etc.
"""
self.untagged_responses = {} # Flush old responses.
self.is_readonly = readonly
if readonly:
name = 'EXAMINE'
else:
name = 'SELECT'
typ, dat = self._simple_command(name, mailbox)
if typ != 'OK':
self.state = 'AUTH' # Might have been 'SELECTED'
return typ, dat
self.state = 'SELECTED'
if 'READ-ONLY' in self.untagged_responses \
and not readonly:
if __debug__:
if self.debug >= 1:
self._dump_ur(self.untagged_responses)
raise self.readonly('%s is not writable' % mailbox)
return typ, self.untagged_responses.get('EXISTS', [None])
def setacl(self, mailbox, who, what):
"""Set a mailbox acl.
(typ, [data]) = <instance>.setacl(mailbox, who, what)
"""
return self._simple_command('SETACL', mailbox, who, what)
def setannotation(self, *args):
"""(typ, [data]) = <instance>.setannotation(mailbox[, entry, attribute]+)
Set ANNOTATIONs."""
typ, dat = self._simple_command('SETANNOTATION', *args)
return self._untagged_response(typ, dat, 'ANNOTATION')
def setquota(self, root, limits):
"""Set the quota root's resource limits.
(typ, [data]) = <instance>.setquota(root, limits)
"""
typ, dat = self._simple_command('SETQUOTA', root, limits)
return self._untagged_response(typ, dat, 'QUOTA')
def sort(self, sort_criteria, charset, *search_criteria):
"""IMAP4rev1 extension SORT command.
(typ, [data]) = <instance>.sort(sort_criteria, charset, search_criteria, ...)
"""
name = 'SORT'
#if not name in self.capabilities: # Let the server decide!
# raise self.error('unimplemented extension command: %s' % name)
if (sort_criteria[0],sort_criteria[-1]) != ('(',')'):
sort_criteria = '(%s)' % sort_criteria
typ, dat = self._simple_command(name, sort_criteria, charset, *search_criteria)
return self._untagged_response(typ, dat, name)
def starttls(self, ssl_context=None):
name = 'STARTTLS'
if not HAVE_SSL:
raise self.error('SSL support missing')
if self._tls_established:
raise self.abort('TLS session already established')
if name not in self.capabilities:
raise self.abort('TLS not supported by server')
# Generate a default SSL context if none was passed.
if ssl_context is None:
ssl_context = ssl._create_stdlib_context()
typ, dat = self._simple_command(name)
if typ == 'OK':
self.sock = ssl_context.wrap_socket(self.sock,
server_hostname=self.host)
self.file = self.sock.makefile('rb')
self._tls_established = True
self._get_capabilities()
else:
raise self.error("Couldn't establish TLS session")
return self._untagged_response(typ, dat, name)
def status(self, mailbox, names):
"""Request named status conditions for mailbox.
(typ, [data]) = <instance>.status(mailbox, names)
"""
name = 'STATUS'
#if self.PROTOCOL_VERSION == 'IMAP4': # Let the server decide!
# raise self.error('%s unimplemented in IMAP4 (obtain IMAP4rev1 server, or re-code)' % name)
typ, dat = self._simple_command(name, mailbox, names)
return self._untagged_response(typ, dat, name)
def store(self, message_set, command, flags):
"""Alters flag dispositions for messages in mailbox.
(typ, [data]) = <instance>.store(message_set, command, flags)
"""
if (flags[0],flags[-1]) != ('(',')'):
flags = '(%s)' % flags # Avoid quoting the flags
typ, dat = self._simple_command('STORE', message_set, command, flags)
return self._untagged_response(typ, dat, 'FETCH')
def subscribe(self, mailbox):
"""Subscribe to new mailbox.
(typ, [data]) = <instance>.subscribe(mailbox)
"""
return self._simple_command('SUBSCRIBE', mailbox)
def thread(self, threading_algorithm, charset, *search_criteria):
"""IMAPrev1 extension THREAD command.
(type, [data]) = <instance>.thread(threading_algorithm, charset, search_criteria, ...)
"""
name = 'THREAD'
typ, dat = self._simple_command(name, threading_algorithm, charset, *search_criteria)
return self._untagged_response(typ, dat, name)
def uid(self, command, *args):
"""Execute "command arg ..." with messages identified by UID,
rather than message number.
(typ, [data]) = <instance>.uid(command, arg1, arg2, ...)
Returns response appropriate to 'command'.
"""
command = command.upper()
if not command in Commands:
raise self.error("Unknown IMAP4 UID command: %s" % command)
if self.state not in Commands[command]:
raise self.error("command %s illegal in state %s, "
"only allowed in states %s" %
(command, self.state,
', '.join(Commands[command])))
name = 'UID'
typ, dat = self._simple_command(name, command, *args)
if command in ('SEARCH', 'SORT', 'THREAD'):
name = command
else:
name = 'FETCH'
return self._untagged_response(typ, dat, name)
def unsubscribe(self, mailbox):
"""Unsubscribe from old mailbox.
(typ, [data]) = <instance>.unsubscribe(mailbox)
"""
return self._simple_command('UNSUBSCRIBE', mailbox)
def xatom(self, name, *args):
"""Allow simple extension commands
notified by server in CAPABILITY response.
Assumes command is legal in current state.
(typ, [data]) = <instance>.xatom(name, arg, ...)
Returns response appropriate to extension command `name'.
"""
name = name.upper()
#if not name in self.capabilities: # Let the server decide!
# raise self.error('unknown extension command: %s' % name)
if not name in Commands:
Commands[name] = (self.state,)
return self._simple_command(name, *args)
# Private methods
def _append_untagged(self, typ, dat):
if dat is None:
dat = b''
ur = self.untagged_responses
if __debug__:
if self.debug >= 5:
self._mesg('untagged_responses[%s] %s += ["%r"]' %
(typ, len(ur.get(typ,'')), dat))
if typ in ur:
ur[typ].append(dat)
else:
ur[typ] = [dat]
def _check_bye(self):
bye = self.untagged_responses.get('BYE')
if bye:
raise self.abort(bye[-1].decode('ascii', 'replace'))
def _command(self, name, *args):
if self.state not in Commands[name]:
self.literal = None
raise self.error("command %s illegal in state %s, "
"only allowed in states %s" %
(name, self.state,
', '.join(Commands[name])))
for typ in ('OK', 'NO', 'BAD'):
if typ in self.untagged_responses:
del self.untagged_responses[typ]
if 'READ-ONLY' in self.untagged_responses \
and not self.is_readonly:
raise self.readonly('mailbox status changed to READ-ONLY')
tag = self._new_tag()
name = bytes(name, 'ASCII')
data = tag + b' ' + name
for arg in args:
if arg is None: continue
if isinstance(arg, str):
arg = bytes(arg, "ASCII")
data = data + b' ' + arg
literal = self.literal
if literal is not None:
self.literal = None
if type(literal) is type(self._command):
literator = literal
else:
literator = None
data = data + bytes(' {%s}' % len(literal), 'ASCII')
if __debug__:
if self.debug >= 4:
self._mesg('> %r' % data)
else:
self._log('> %r' % data)
try:
self.send(data + CRLF)
except OSError as val:
raise self.abort('socket error: %s' % val)
if literal is None:
return tag
while 1:
# Wait for continuation response
while self._get_response():
if self.tagged_commands[tag]: # BAD/NO?
return tag
# Send literal
if literator:
literal = literator(self.continuation_response)
if __debug__:
if self.debug >= 4:
self._mesg('write literal size %s' % len(literal))
try:
self.send(literal)
self.send(CRLF)
except OSError as val:
raise self.abort('socket error: %s' % val)
if not literator:
break
return tag
def _command_complete(self, name, tag):
# BYE is expected after LOGOUT
if name != 'LOGOUT':
self._check_bye()
try:
typ, data = self._get_tagged_response(tag)
except self.abort as val:
raise self.abort('command: %s => %s' % (name, val))
except self.error as val:
raise self.error('command: %s => %s' % (name, val))
if name != 'LOGOUT':
self._check_bye()
if typ == 'BAD':
raise self.error('%s command error: %s %s' % (name, typ, data))
return typ, data
def _get_capabilities(self):
typ, dat = self.capability()
if dat == [None]:
raise self.error('no CAPABILITY response from server')
dat = str(dat[-1], "ASCII")
dat = dat.upper()
self.capabilities = tuple(dat.split())
def _get_response(self):
# Read response and store.
#
# Returns None for continuation responses,
# otherwise first response line received.
resp = self._get_line()
# Command completion response?
if self._match(self.tagre, resp):
tag = self.mo.group('tag')
if not tag in self.tagged_commands:
raise self.abort('unexpected tagged response: %s' % resp)
typ = self.mo.group('type')
typ = str(typ, 'ASCII')
dat = self.mo.group('data')
self.tagged_commands[tag] = (typ, [dat])
else:
dat2 = None
# '*' (untagged) responses?
if not self._match(Untagged_response, resp):
if self._match(Untagged_status, resp):
dat2 = self.mo.group('data2')
if self.mo is None:
# Only other possibility is '+' (continuation) response...
if self._match(Continuation, resp):
self.continuation_response = self.mo.group('data')
return None # NB: indicates continuation
raise self.abort("unexpected response: '%s'" % resp)
typ = self.mo.group('type')
typ = str(typ, 'ascii')
dat = self.mo.group('data')
if dat is None: dat = b'' # Null untagged response
if dat2: dat = dat + b' ' + dat2
# Is there a literal to come?
while self._match(Literal, dat):
# Read literal direct from connection.
size = int(self.mo.group('size'))
if __debug__:
if self.debug >= 4:
self._mesg('read literal size %s' % size)
data = self.read(size)
# Store response with literal as tuple
self._append_untagged(typ, (dat, data))
# Read trailer - possibly containing another literal
dat = self._get_line()
self._append_untagged(typ, dat)
# Bracketed response information?
if typ in ('OK', 'NO', 'BAD') and self._match(Response_code, dat):
typ = self.mo.group('type')
typ = str(typ, "ASCII")
self._append_untagged(typ, self.mo.group('data'))
if __debug__:
if self.debug >= 1 and typ in ('NO', 'BAD', 'BYE'):
self._mesg('%s response: %r' % (typ, dat))
return resp
def _get_tagged_response(self, tag):
while 1:
result = self.tagged_commands[tag]
if result is not None:
del self.tagged_commands[tag]
return result
# If we've seen a BYE at this point, the socket will be
# closed, so report the BYE now.
self._check_bye()
# Some have reported "unexpected response" exceptions.
# Note that ignoring them here causes loops.
# Instead, send me details of the unexpected response and
# I'll update the code in `_get_response()'.
try:
self._get_response()
except self.abort as val:
if __debug__:
if self.debug >= 1:
self.print_log()
raise
def _get_line(self):
line = self.readline()
if not line:
raise self.abort('socket error: EOF')
# Protocol mandates all lines terminated by CRLF
if not line.endswith(b'\r\n'):
raise self.abort('socket error: unterminated line: %r' % line)
line = line[:-2]
if __debug__:
if self.debug >= 4:
self._mesg('< %r' % line)
else:
self._log('< %r' % line)
return line
def _match(self, cre, s):
# Run compiled regular expression match method on 's'.
# Save result, return success.
self.mo = cre.match(s)
if __debug__:
if self.mo is not None and self.debug >= 5:
self._mesg("\tmatched r'%r' => %r" % (cre.pattern, self.mo.groups()))
return self.mo is not None
def _new_tag(self):
tag = self.tagpre + bytes(str(self.tagnum), 'ASCII')
self.tagnum = self.tagnum + 1
self.tagged_commands[tag] = None
return tag
def _quote(self, arg):
arg = arg.replace('\\', '\\\\')
arg = arg.replace('"', '\\"')
return '"' + arg + '"'
def _simple_command(self, name, *args):
return self._command_complete(name, self._command(name, *args))
def _untagged_response(self, typ, dat, name):
if typ == 'NO':
return typ, dat
if not name in self.untagged_responses:
return typ, [None]
data = self.untagged_responses.pop(name)
if __debug__:
if self.debug >= 5:
self._mesg('untagged_responses[%s] => %s' % (name, data))
return typ, data
if __debug__:
def _mesg(self, s, secs=None):
if secs is None:
secs = time.time()
tm = time.strftime('%M:%S', time.localtime(secs))
sys.stderr.write(' %s.%02d %s\n' % (tm, (secs*100)%100, s))
sys.stderr.flush()
def _dump_ur(self, dict):
# Dump untagged responses (in `dict').
l = dict.items()
if not l: return
t = '\n\t\t'
l = map(lambda x:'%s: "%s"' % (x[0], x[1][0] and '" "'.join(x[1]) or ''), l)
self._mesg('untagged responses dump:%s%s' % (t, t.join(l)))
def _log(self, line):
# Keep log of last `_cmd_log_len' interactions for debugging.
self._cmd_log[self._cmd_log_idx] = (line, time.time())
self._cmd_log_idx += 1
if self._cmd_log_idx >= self._cmd_log_len:
self._cmd_log_idx = 0
def print_log(self):
self._mesg('last %d IMAP4 interactions:' % len(self._cmd_log))
i, n = self._cmd_log_idx, self._cmd_log_len
while n:
try:
self._mesg(*self._cmd_log[i])
except:
pass
i += 1
if i >= self._cmd_log_len:
i = 0
n -= 1
if HAVE_SSL:
class IMAP4_SSL(IMAP4):
"""IMAP4 client class over SSL connection
Instantiate with: IMAP4_SSL([host[, port[, keyfile[, certfile[, ssl_context]]]]])
host - host's name (default: localhost);
port - port number (default: standard IMAP4 SSL port);
keyfile - PEM formatted file that contains your private key (default: None);
certfile - PEM formatted certificate chain file (default: None);
ssl_context - a SSLContext object that contains your certificate chain
and private key (default: None)
Note: if ssl_context is provided, the keyfile and certfile parameters
must not be set; otherwise ValueError is raised.
for more documentation see the docstring of the parent class IMAP4.
"""
def __init__(self, host='', port=IMAP4_SSL_PORT, keyfile=None, certfile=None, ssl_context=None):
if ssl_context is not None and keyfile is not None:
raise ValueError("ssl_context and keyfile arguments are mutually "
"exclusive")
if ssl_context is not None and certfile is not None:
raise ValueError("ssl_context and certfile arguments are mutually "
"exclusive")
self.keyfile = keyfile
self.certfile = certfile
if ssl_context is None:
ssl_context = ssl._create_stdlib_context(certfile=certfile,
keyfile=keyfile)
self.ssl_context = ssl_context
IMAP4.__init__(self, host, port)
def _create_socket(self):
sock = IMAP4._create_socket(self)
return self.ssl_context.wrap_socket(sock,
server_hostname=self.host)
def open(self, host='', port=IMAP4_SSL_PORT):
"""Setup connection to remote server on "host:port".
(default: localhost:standard IMAP4 SSL port).
This connection will be used by the routines:
read, readline, send, shutdown.
"""
IMAP4.open(self, host, port)
__all__.append("IMAP4_SSL")
class IMAP4_stream(IMAP4):
"""IMAP4 client class over a stream
Instantiate with: IMAP4_stream(command)
where "command" is a string that can be passed to subprocess.Popen()
for more documentation see the docstring of the parent class IMAP4.
"""
def __init__(self, command):
self.command = command
IMAP4.__init__(self)
def open(self, host = None, port = None):
"""Setup a stream connection.
This connection will be used by the routines:
read, readline, send, shutdown.
"""
self.host = None # For compatibility with parent class
self.port = None
self.sock = None
self.file = None
self.process = subprocess.Popen(self.command,
bufsize=DEFAULT_BUFFER_SIZE,
stdin=subprocess.PIPE, stdout=subprocess.PIPE,
shell=True, close_fds=True)
self.writefile = self.process.stdin
self.readfile = self.process.stdout
def read(self, size):
"""Read 'size' bytes from remote."""
return self.readfile.read(size)
def readline(self):
"""Read line from remote."""
return self.readfile.readline()
def send(self, data):
"""Send data to remote."""
self.writefile.write(data)
self.writefile.flush()
def shutdown(self):
"""Close I/O established in "open"."""
self.readfile.close()
self.writefile.close()
self.process.wait()
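# Editor's note: a hypothetical end-to-end session sketch using the classes
# above; host, user, and password are placeholders, and no connection is
# made unless the function is actually called.
def _imap_session_example(host='imap.example.com', user='user', password='secret'):
    M = IMAP4_SSL(host) if HAVE_SSL else IMAP4(host)
    try:
        M.login(user, password)                     # raises IMAP4.error on failure
        typ, data = M.select('INBOX', readonly=True)
        typ, nums = M.search(None, 'ALL')           # nums[0] is b'1 2 3 ...'
        for num in nums[0].split()[:1]:
            typ, msg = M.fetch(num, '(RFC822)')     # msg[0][1] is the raw message
    finally:
        M.logout()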
class _Authenticator:
"""Private class to provide en/decoding
for base64-based authentication conversation.
"""
def __init__(self, mechinst):
self.mech = mechinst # Callable object to provide/process data
def process(self, data):
ret = self.mech(self.decode(data))
if ret is None:
return b'*' # Abort conversation
return self.encode(ret)
def encode(self, inp):
#
# Invoke binascii.b2a_base64 iteratively on short buffers whose
# byte length is a multiple of 3 (48 here, i.e. a bit count divisible
# by both 6 and 8), stripping the trailing line feed from each result
# and appending; that way, reaching the end of the 8-bit input never
# leaves a partial 6-bit output.
#
oup = b''
if isinstance(inp, str):
inp = inp.encode('ASCII')
while inp:
if len(inp) > 48:
t = inp[:48]
inp = inp[48:]
else:
t = inp
inp = b''
e = binascii.b2a_base64(t)
if e:
oup = oup + e[:-1]
return oup
def decode(self, inp):
if not inp:
return b''
return binascii.a2b_base64(inp)
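# Editor's note: a hypothetical authobject for SASL PLAIN, following the
# protocol documented in IMAP4.authenticate(): it is called with the decoded
# continuation data and returns raw bytes, which _Authenticator then
# base64-encodes before sending. user and password are placeholders.
def _plain_authobject(user, password):
    def auth(server_response):
        # PLAIN sends NUL-separated authzid (empty), authcid, and password.
        return ('\0%s\0%s' % (user, password)).encode('utf-8')
    return auth
# Usage sketch (M is assumed to be a connected IMAP4 instance):
#     M.authenticate('PLAIN', _plain_authobject('user', 'secret'))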
Months = ' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split(' ')
Mon2num = {s.encode():n+1 for n, s in enumerate(Months[1:])}
def Internaldate2tuple(resp):
"""Parse an IMAP4 INTERNALDATE string.
Return the corresponding local time. The return value is a
time.struct_time tuple, or None if the string has the wrong format.
"""
mo = InternalDate.match(resp)
if not mo:
return None
mon = Mon2num[mo.group('mon')]
zonen = mo.group('zonen')
day = int(mo.group('day'))
year = int(mo.group('year'))
hour = int(mo.group('hour'))
min = int(mo.group('min'))
sec = int(mo.group('sec'))
zoneh = int(mo.group('zoneh'))
zonem = int(mo.group('zonem'))
# INTERNALDATE timezone must be subtracted to get UT
zone = (zoneh*60 + zonem)*60
if zonen == b'-':
zone = -zone
tt = (year, mon, day, hour, min, sec, -1, -1, -1)
utc = calendar.timegm(tt) - zone
return time.localtime(utc)
def Int2AP(num):
"""Convert integer to A-P string representation."""
val = b''; AP = b'ABCDEFGHIJKLMNOP'
num = int(abs(num))
while num:
num, mod = divmod(num, 16)
val = AP[mod:mod+1] + val
return val
def ParseFlags(resp):
"""Convert IMAP4 flags response to python tuple."""
mo = Flags.match(resp)
if not mo:
return ()
return tuple(mo.group('flags').split())
def Time2Internaldate(date_time):
"""Convert date_time to IMAP4 INTERNALDATE representation.
Return string in form: '"DD-Mmm-YYYY HH:MM:SS +HHMM"'. The
date_time argument can be a number (int or float) representing
seconds since epoch (as returned by time.time()), a 9-tuple
representing local time, an instance of time.struct_time (as
returned by time.localtime()), an aware datetime instance or a
double-quoted string. In the last case, it is assumed to already
be in the correct format.
"""
if isinstance(date_time, (int, float)):
dt = datetime.fromtimestamp(date_time,
timezone.utc).astimezone()
elif isinstance(date_time, tuple):
try:
gmtoff = date_time.tm_gmtoff
except AttributeError:
if time.daylight:
dst = date_time[8]
if dst == -1:
dst = time.localtime(time.mktime(date_time))[8]
gmtoff = -(time.timezone, time.altzone)[dst]
else:
gmtoff = -time.timezone
delta = timedelta(seconds=gmtoff)
dt = datetime(*date_time[:6], tzinfo=timezone(delta))
elif isinstance(date_time, datetime):
if date_time.tzinfo is None:
raise ValueError("date_time must be aware")
dt = date_time
elif isinstance(date_time, str) and (date_time[0],date_time[-1]) == ('"','"'):
return date_time # Assume in correct format
else:
raise ValueError("date_time not of a known type")
fmt = '"%d-{}-%Y %H:%M:%S %z"'.format(Months[dt.month])
return dt.strftime(fmt)
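# Editor's note: an illustrative local round-trip through the two date
# helpers above; no server is involved, and _internaldate_example is a
# hypothetical name.
def _internaldate_example():
    stamp = Time2Internaldate(time.time())     # e.g. '"05-Jun-2024 12:34:56 +0000"'
    resp = b'INTERNALDATE ' + stamp.encode('ascii')
    return Internaldate2tuple(resp)            # parsed back to a struct_time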
if __name__ == '__main__':
# To test: invoke either as 'python imaplib.py [IMAP4_server_hostname]'
# or 'python imaplib.py -s "rsh IMAP4_server_hostname exec /etc/rimapd"'
# to test the IMAP4_stream class
import getopt, getpass
try:
optlist, args = getopt.getopt(sys.argv[1:], 'd:s:')
except getopt.error as val:
optlist, args = (), ()
stream_command = None
for opt,val in optlist:
if opt == '-d':
Debug = int(val)
elif opt == '-s':
stream_command = val
if not args: args = (stream_command,)
if not args: args = ('',)
host = args[0]
USER = getpass.getuser()
PASSWD = getpass.getpass("IMAP password for %s on %s: " % (USER, host or "localhost"))
test_mesg = 'From: %(user)s@localhost%(lf)sSubject: IMAP4 test%(lf)s%(lf)sdata...%(lf)s' % {'user':USER, 'lf':'\n'}
test_seq1 = (
('login', (USER, PASSWD)),
('create', ('/tmp/xxx 1',)),
('rename', ('/tmp/xxx 1', '/tmp/yyy')),
('CREATE', ('/tmp/yyz 2',)),
('append', ('/tmp/yyz 2', None, None, test_mesg)),
('list', ('/tmp', 'yy*')),
('select', ('/tmp/yyz 2',)),
('search', (None, 'SUBJECT', 'test')),
('fetch', ('1', '(FLAGS INTERNALDATE RFC822)')),
('store', ('1', 'FLAGS', '(\Deleted)')),
('namespace', ()),
('expunge', ()),
('recent', ()),
('close', ()),
)
test_seq2 = (
('select', ()),
('response',('UIDVALIDITY',)),
('uid', ('SEARCH', 'ALL')),
('response', ('EXISTS',)),
('append', (None, None, None, test_mesg)),
('recent', ()),
('logout', ()),
)
def run(cmd, args):
M._mesg('%s %s' % (cmd, args))
typ, dat = getattr(M, cmd)(*args)
M._mesg('%s => %s %s' % (cmd, typ, dat))
if typ == 'NO': raise dat[0]
return dat
try:
if stream_command:
M = IMAP4_stream(stream_command)
else:
M = IMAP4(host)
if M.state == 'AUTH':
test_seq1 = test_seq1[1:] # Login not needed
M._mesg('PROTOCOL_VERSION = %s' % M.PROTOCOL_VERSION)
M._mesg('CAPABILITIES = %r' % (M.capabilities,))
for cmd,args in test_seq1:
run(cmd, args)
for ml in run('list', ('/tmp/', 'yy%')):
mo = re.match(r'.*"([^"]+)"$', ml)
if mo: path = mo.group(1)
else: path = ml.split()[-1]
run('delete', (path,))
for cmd,args in test_seq2:
dat = run(cmd, args)
if (cmd,args) != ('uid', ('SEARCH', 'ALL')):
continue
uid = dat[-1].split()
if not uid: continue
run('uid', ('FETCH', '%s' % uid[-1],
'(FLAGS INTERNALDATE RFC822.SIZE RFC822.HEADER RFC822.TEXT)'))
print('\nAll tests OK.')
except:
print('\nTests failed.')
if not Debug:
print('''
If you would like to see debugging output,
try: %s -d5
''' % sys.argv[0])
raise
"""Recognize image file formats based on their first few bytes."""
__all__ = ["what"]
#-------------------------#
# Recognize image headers #
#-------------------------#
def what(file, h=None):
f = None
try:
if h is None:
if isinstance(file, str):
f = open(file, 'rb')
h = f.read(32)
else:
location = file.tell()
h = file.read(32)
file.seek(location)
for tf in tests:
res = tf(h, f)
if res:
return res
finally:
if f: f.close()
return None
#---------------------------------#
# Subroutines per image file type #
#---------------------------------#
tests = []
def test_jpeg(h, f):
"""JPEG data in JFIF or Exif format"""
if h[6:10] in (b'JFIF', b'Exif'):
return 'jpeg'
tests.append(test_jpeg)
def test_png(h, f):
if h.startswith(b'\211PNG\r\n\032\n'):
return 'png'
tests.append(test_png)
def test_gif(h, f):
"""GIF ('87 and '89 variants)"""
if h[:6] in (b'GIF87a', b'GIF89a'):
return 'gif'
tests.append(test_gif)
def test_tiff(h, f):
"""TIFF (can be in Motorola or Intel byte order)"""
if h[:2] in (b'MM', b'II'):
return 'tiff'
tests.append(test_tiff)
def test_rgb(h, f):
"""SGI image library"""
if h.startswith(b'\001\332'):
return 'rgb'
tests.append(test_rgb)
def test_pbm(h, f):
"""PBM (portable bitmap)"""
if len(h) >= 3 and \
h[0] == ord(b'P') and h[1] in b'14' and h[2] in b' \t\n\r':
return 'pbm'
tests.append(test_pbm)
def test_pgm(h, f):
"""PGM (portable graymap)"""
if len(h) >= 3 and \
h[0] == ord(b'P') and h[1] in b'25' and h[2] in b' \t\n\r':
return 'pgm'
tests.append(test_pgm)
def test_ppm(h, f):
"""PPM (portable pixmap)"""
if len(h) >= 3 and \
h[0] == ord(b'P') and h[1] in b'36' and h[2] in b' \t\n\r':
return 'ppm'
tests.append(test_ppm)
def test_rast(h, f):
"""Sun raster file"""
if h.startswith(b'\x59\xA6\x6A\x95'):
return 'rast'
tests.append(test_rast)
def test_xbm(h, f):
"""X bitmap (X10 or X11)"""
if h.startswith(b'#define '):
return 'xbm'
tests.append(test_xbm)
def test_bmp(h, f):
if h.startswith(b'BM'):
return 'bmp'
tests.append(test_bmp)
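# Editor's note: a minimal sketch of what(). When the optional 'h' argument
# is supplied, the header bytes are tested directly and no file is opened,
# so the first argument can be anything. _imghdr_example is hypothetical.
def _imghdr_example():
    png_header = b'\211PNG\r\n\032\n' + b'\0' * 24   # fake 32-byte header
    return what(None, h=png_header)                  # -> 'png'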
#--------------------#
# Small test program #
#--------------------#
def test():
import sys
recursive = 0
if sys.argv[1:] and sys.argv[1] == '-r':
del sys.argv[1:2]
recursive = 1
try:
if sys.argv[1:]:
testall(sys.argv[1:], recursive, 1)
else:
testall(['.'], recursive, 1)
except KeyboardInterrupt:
sys.stderr.write('\n[Interrupted]\n')
sys.exit(1)
def testall(list, recursive, toplevel):
import sys
import os
for filename in list:
if os.path.isdir(filename):
print(filename + '/:', end=' ')
if recursive or toplevel:
print('recursing down:')
import glob
names = glob.glob(os.path.join(filename, '*'))
testall(names, recursive, 0)
else:
print('*** directory (use -r) ***')
else:
print(filename + ':', end=' ')
sys.stdout.flush()
try:
print(what(filename))
except OSError:
print('*** not found ***')
if __name__ == '__main__':
test()
"""This module provides the components needed to build your own __import__
function. Undocumented functions are obsolete.
In most cases it is preferable to use the importlib module's
functionality rather than this module.
"""
# (Probably) need to stay in _imp
from _imp import (lock_held, acquire_lock, release_lock,
get_frozen_object, is_frozen_package,
init_builtin, init_frozen, is_builtin, is_frozen,
_fix_co_filename)
try:
from _imp import load_dynamic
except ImportError:
# Platform doesn't support dynamic loading.
load_dynamic = None
from importlib._bootstrap import SourcelessFileLoader, _ERR_MSG, _SpecMethods
from importlib import machinery
from importlib import util
import importlib
import os
import sys
import tokenize
import types
import warnings
warnings.warn("the imp module is deprecated in favour of importlib; "
"see the module's documentation for alternative uses",
PendingDeprecationWarning)
# DEPRECATED
SEARCH_ERROR = 0
PY_SOURCE = 1
PY_COMPILED = 2
C_EXTENSION = 3
PY_RESOURCE = 4
PKG_DIRECTORY = 5
C_BUILTIN = 6
PY_FROZEN = 7
PY_CODERESOURCE = 8
IMP_HOOK = 9
def new_module(name):
"""**DEPRECATED**
Create a new module.
The module is not entered into sys.modules.
"""
return types.ModuleType(name)
def get_magic():
"""**DEPRECATED**
Return the magic number for .pyc or .pyo files.
"""
return util.MAGIC_NUMBER
def get_tag():
"""Return the magic tag for .pyc or .pyo files."""
return sys.implementation.cache_tag
def cache_from_source(path, debug_override=None):
"""**DEPRECATED**
Given the path to a .py file, return the path to its .pyc/.pyo file.
The .py file does not need to exist; this simply returns the path to the
.pyc/.pyo file calculated as if the .py file were imported. The extension
will be .pyc unless sys.flags.optimize is non-zero, in which case it will be .pyo.
If debug_override is not None, then it must be a boolean and is used in
place of sys.flags.optimize.
If sys.implementation.cache_tag is None then NotImplementedError is raised.
"""
return util.cache_from_source(path, debug_override)
def source_from_cache(path):
"""**DEPRECATED**
Given the path to a .pyc/.pyo file, return the path to its .py file.
The .pyc/.pyo file does not need to exist; this simply returns the path to
the .py file calculated to correspond to the .pyc/.pyo file. If path does
not conform to PEP 3147 format, ValueError will be raised. If
sys.implementation.cache_tag is None then NotImplementedError is raised.
"""
return util.source_from_cache(path)
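# Editor's note: an illustrative round-trip through the two path helpers
# above; the path is hypothetical and need not exist on disk.
def _cache_path_example():
    pyc = cache_from_source('pkg/mod.py')   # e.g. 'pkg/__pycache__/mod.cpython-34.pyc'
    return source_from_cache(pyc)           # -> 'pkg/mod.py'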
def get_suffixes():
"""**DEPRECATED**"""
extensions = [(s, 'rb', C_EXTENSION) for s in machinery.EXTENSION_SUFFIXES]
source = [(s, 'r', PY_SOURCE) for s in machinery.SOURCE_SUFFIXES]
bytecode = [(s, 'rb', PY_COMPILED) for s in machinery.BYTECODE_SUFFIXES]
return extensions + source + bytecode
class NullImporter:
"""**DEPRECATED**
Null import object.
"""
def __init__(self, path):
if path == '':
raise ImportError('empty pathname', path='')
elif os.path.isdir(path):
raise ImportError('existing directory', path=path)
def find_module(self, fullname):
"""Always returns None."""
return None
class _HackedGetData:
"""Compatibility support for 'file' arguments of various load_*()
functions."""
def __init__(self, fullname, path, file=None):
super().__init__(fullname, path)
self.file = file
def get_data(self, path):
"""Gross hack to contort loader to deal w/ load_*()'s bad API."""
if self.file and path == self.path:
if not self.file.closed:
file = self.file
else:
self.file = file = open(self.path, 'r')
with file:
# Technically should be returning bytes, but
# SourceLoader.get_code() just passed what is returned to
# compile() which can handle str. And converting to bytes would
# require figuring out the encoding to decode to and
# tokenize.detect_encoding() only accepts bytes.
return file.read()
else:
return super().get_data(path)
class _LoadSourceCompatibility(_HackedGetData, machinery.SourceFileLoader):
"""Compatibility support for implementing load_source()."""
def load_source(name, pathname, file=None):
loader = _LoadSourceCompatibility(name, pathname, file)
spec = util.spec_from_file_location(name, pathname, loader=loader)
methods = _SpecMethods(spec)
if name in sys.modules:
module = methods.exec(sys.modules[name])
else:
module = methods.load()
# To allow reloading to potentially work, use a non-hacked loader which
# won't rely on a now-closed file object.
module.__loader__ = machinery.SourceFileLoader(name, pathname)
module.__spec__.loader = module.__loader__
return module
class _LoadCompiledCompatibility(_HackedGetData, SourcelessFileLoader):
"""Compatibility support for implementing load_compiled()."""
def load_compiled(name, pathname, file=None):
"""**DEPRECATED**"""
loader = _LoadCompiledCompatibility(name, pathname, file)
spec = util.spec_from_file_location(name, pathname, loader=loader)
methods = _SpecMethods(spec)
if name in sys.modules:
module = methods.exec(sys.modules[name])
else:
module = methods.load()
# To allow reloading to potentially work, use a non-hacked loader which
# won't rely on a now-closed file object.
module.__loader__ = SourcelessFileLoader(name, pathname)
module.__spec__.loader = module.__loader__
return module
def load_package(name, path):
"""**DEPRECATED**"""
if os.path.isdir(path):
extensions = (machinery.SOURCE_SUFFIXES[:] +
machinery.BYTECODE_SUFFIXES[:])
for extension in extensions:
path = os.path.join(path, '__init__'+extension)
if os.path.exists(path):
break
else:
raise ValueError('{!r} is not a package'.format(path))
spec = util.spec_from_file_location(name, path,
submodule_search_locations=[])
methods = _SpecMethods(spec)
if name in sys.modules:
return methods.exec(sys.modules[name])
else:
return methods.load()
def load_module(name, file, filename, details):
"""**DEPRECATED**
Load a module, given information returned by find_module().
The module name must include the full package name, if any.
"""
suffix, mode, type_ = details
if mode and (not mode.startswith(('r', 'U')) or '+' in mode):
raise ValueError('invalid file open mode {!r}'.format(mode))
elif file is None and type_ in {PY_SOURCE, PY_COMPILED}:
msg = 'file object required for import (type code {})'.format(type_)
raise ValueError(msg)
elif type_ == PY_SOURCE:
return load_source(name, filename, file)
elif type_ == PY_COMPILED:
return load_compiled(name, filename, file)
elif type_ == C_EXTENSION and load_dynamic is not None:
if file is None:
with open(filename, 'rb') as opened_file:
return load_dynamic(name, filename, opened_file)
else:
return load_dynamic(name, filename, file)
elif type_ == PKG_DIRECTORY:
return load_package(name, filename)
elif type_ == C_BUILTIN:
return init_builtin(name)
elif type_ == PY_FROZEN:
return init_frozen(name)
else:
msg = "Don't know how to import {} (type code {})".format(name, type_)
raise ImportError(msg, name=name)
def find_module(name, path=None):
"""**DEPRECATED**
Search for a module.
If path is omitted or None, search for a built-in, frozen or special
module and continue search in sys.path. The module name cannot
contain '.'; to search for a submodule of a package, pass the
submodule name and the package's __path__.
"""
if not isinstance(name, str):
raise TypeError("'name' must be a str, not {}".format(type(name)))
elif not isinstance(path, (type(None), list)):
# Backwards-compatibility
raise RuntimeError("'list' must be None or a list, "
"not {}".format(type(name)))
if path is None:
if is_builtin(name):
return None, None, ('', '', C_BUILTIN)
elif is_frozen(name):
return None, None, ('', '', PY_FROZEN)
else:
path = sys.path
for entry in path:
package_directory = os.path.join(entry, name)
for suffix in ['.py', machinery.BYTECODE_SUFFIXES[0]]:
package_file_name = '__init__' + suffix
file_path = os.path.join(package_directory, package_file_name)
if os.path.isfile(file_path):
return None, package_directory, ('', '', PKG_DIRECTORY)
for suffix, mode, type_ in get_suffixes():
file_name = name + suffix
file_path = os.path.join(entry, file_name)
if os.path.isfile(file_path):
break
else:
continue
break # Break out of outer loop when breaking out of inner loop.
else:
raise ImportError(_ERR_MSG.format(name), name=name)
encoding = None
if 'b' not in mode:
with open(file_path, 'rb') as file:
encoding = tokenize.detect_encoding(file.readline)[0]
file = open(file_path, mode, encoding=encoding)
return file, file_path, (suffix, mode, type_)
def reload(module):
"""**DEPRECATED**
Reload the module and return it.
The module must have been successfully imported before.
"""
return importlib.reload(module)
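# Illustrative sketch (not part of the original source): the importlib calls
# that replace this deprecated module. 'json' is just a stdlib module picked
# for illustration.
def _demo_importlib_reload():
    mod = importlib.import_module('json')  # preferred over imp.find_module/load_module
    return importlib.reload(mod)           # preferred over imp.reload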
"""Get useful information from live Python objects.
This module encapsulates the interface provided by the internal special
attributes (co_*, im_*, tb_*, etc.) in a friendlier fashion.
It also provides some help for examining source code and class layout.
Here are some of the useful functions provided by this module:
ismodule(), isclass(), ismethod(), isfunction(), isgeneratorfunction(),
isgenerator(), istraceback(), isframe(), iscode(), isbuiltin(),
isroutine() - check object types
getmembers() - get members of an object that satisfy a given condition
getfile(), getsourcefile(), getsource() - find an object's source code
getdoc(), getcomments() - get documentation on an object
getmodule() - determine the module that an object came from
getclasstree() - arrange classes so as to represent their hierarchy
getargspec(), getargvalues(), getcallargs() - get info about function arguments
    getfullargspec() - same, with support for Python 3 features
formatargspec(), formatargvalues() - format an argument spec
getouterframes(), getinnerframes() - get info about frames
currentframe() - get the current stack frame
stack(), trace() - get info about frames on the stack or in a traceback
signature() - get a Signature object for the callable
"""
# This module is in the public domain. No warranties.
__author__ = ('Ka-Ping Yee <[email protected]>',
'Yury Selivanov <[email protected]>')
import ast
import importlib.machinery
import itertools
import linecache
import os
import re
import sys
import tokenize
import token
import types
import warnings
import functools
import builtins
from operator import attrgetter
from collections import namedtuple, OrderedDict
# Create constants for the compiler flags in Include/code.h
# We try to get them from dis to avoid duplication, but fall
# back to hard-coding so the dependency is optional
try:
from dis import COMPILER_FLAG_NAMES as _flag_names
except ImportError:
CO_OPTIMIZED, CO_NEWLOCALS = 0x1, 0x2
CO_VARARGS, CO_VARKEYWORDS = 0x4, 0x8
CO_NESTED, CO_GENERATOR, CO_NOFREE = 0x10, 0x20, 0x40
else:
mod_dict = globals()
for k, v in _flag_names.items():
mod_dict["CO_" + v] = k
# See Include/object.h
TPFLAGS_IS_ABSTRACT = 1 << 20
# ----------------------------------------------------------- type-checking
def ismodule(object):
"""Return true if the object is a module.
Module objects provide these attributes:
__cached__ pathname to byte compiled file
__doc__ documentation string
__file__ filename (missing for built-in modules)"""
return isinstance(object, types.ModuleType)
def isclass(object):
"""Return true if the object is a class.
Class objects provide these attributes:
__doc__ documentation string
__module__ name of module in which this class was defined"""
return isinstance(object, type)
def ismethod(object):
"""Return true if the object is an instance method.
Instance method objects provide these attributes:
__doc__ documentation string
__name__ name with which this method was defined
__func__ function object containing implementation of method
__self__ instance to which this method is bound"""
return isinstance(object, types.MethodType)
def ismethoddescriptor(object):
"""Return true if the object is a method descriptor.
But not if ismethod() or isclass() or isfunction() are true.
This is new in Python 2.2, and, for example, is true of int.__add__.
An object passing this test has a __get__ attribute but not a __set__
attribute, but beyond that the set of attributes varies. __name__ is
usually sensible, and __doc__ often is.
Methods implemented via descriptors that also pass one of the other
tests return false from the ismethoddescriptor() test, simply because
the other tests promise more -- you can, e.g., count on having the
__func__ attribute (etc) when an object passes ismethod()."""
if isclass(object) or ismethod(object) or isfunction(object):
# mutual exclusion
return False
tp = type(object)
return hasattr(tp, "__get__") and not hasattr(tp, "__set__")
def isdatadescriptor(object):
"""Return true if the object is a data descriptor.
Data descriptors have both a __get__ and a __set__ attribute. Examples are
properties (defined in Python) and getsets and members (defined in C).
Typically, data descriptors will also have __name__ and __doc__ attributes
(properties, getsets, and members have both of these attributes), but this
is not guaranteed."""
if isclass(object) or ismethod(object) or isfunction(object):
# mutual exclusion
return False
tp = type(object)
return hasattr(tp, "__set__") and hasattr(tp, "__get__")
if hasattr(types, 'MemberDescriptorType'):
# CPython and equivalent
def ismemberdescriptor(object):
"""Return true if the object is a member descriptor.
Member descriptors are specialized descriptors defined in extension
modules."""
return isinstance(object, types.MemberDescriptorType)
else:
# Other implementations
def ismemberdescriptor(object):
"""Return true if the object is a member descriptor.
Member descriptors are specialized descriptors defined in extension
modules."""
return False
if hasattr(types, 'GetSetDescriptorType'):
# CPython and equivalent
def isgetsetdescriptor(object):
"""Return true if the object is a getset descriptor.
getset descriptors are specialized descriptors defined in extension
modules."""
return isinstance(object, types.GetSetDescriptorType)
else:
# Other implementations
def isgetsetdescriptor(object):
"""Return true if the object is a getset descriptor.
getset descriptors are specialized descriptors defined in extension
modules."""
return False
def isfunction(object):
"""Return true if the object is a user-defined function.
Function objects provide these attributes:
__doc__ documentation string
__name__ name with which this function was defined
__code__ code object containing compiled function bytecode
__defaults__ tuple of any default values for arguments
__globals__ global namespace in which this function was defined
__annotations__ dict of parameter annotations
__kwdefaults__ dict of keyword only parameters with defaults"""
return isinstance(object, types.FunctionType)
def isgeneratorfunction(object):
"""Return true if the object is a user-defined generator function.
    Generator function objects provide the same attributes as functions.
See help(isfunction) for attributes listing."""
return bool((isfunction(object) or ismethod(object)) and
object.__code__.co_flags & CO_GENERATOR)
def isgenerator(object):
"""Return true if the object is a generator.
Generator objects provide these attributes:
__iter__ defined to support iteration over container
close raises a new GeneratorExit exception inside the
generator to terminate the iteration
gi_code code object
gi_frame frame object or possibly None once the generator has
been exhausted
gi_running set to 1 when generator is executing, 0 otherwise
next return the next item from the container
send resumes the generator and "sends" a value that becomes
the result of the current yield-expression
throw used to raise an exception inside the generator"""
return isinstance(object, types.GeneratorType)
def istraceback(object):
"""Return true if the object is a traceback.
Traceback objects provide these attributes:
tb_frame frame object at this level
tb_lasti index of last attempted instruction in bytecode
tb_lineno current line number in Python source code
tb_next next inner traceback object (called by this level)"""
return isinstance(object, types.TracebackType)
def isframe(object):
"""Return true if the object is a frame object.
Frame objects provide these attributes:
f_back next outer frame object (this frame's caller)
f_builtins built-in namespace seen by this frame
f_code code object being executed in this frame
f_globals global namespace seen by this frame
f_lasti index of last attempted instruction in bytecode
f_lineno current line number in Python source code
f_locals local namespace seen by this frame
f_trace tracing function for this frame, or None"""
return isinstance(object, types.FrameType)
def iscode(object):
"""Return true if the object is a code object.
Code objects provide these attributes:
co_argcount number of arguments (not including * or ** args)
co_code string of raw compiled bytecode
co_consts tuple of constants used in the bytecode
co_filename name of file in which this code object was created
co_firstlineno number of first line in Python source code
co_flags bitmap: 1=optimized | 2=newlocals | 4=*arg | 8=**arg
co_lnotab encoded mapping of line numbers to bytecode indices
co_name name with which this code object was defined
co_names tuple of names of local variables
co_nlocals number of local variables
co_stacksize virtual machine stack space required
co_varnames tuple of names of arguments and local variables"""
return isinstance(object, types.CodeType)
def isbuiltin(object):
"""Return true if the object is a built-in function or method.
Built-in functions and methods provide these attributes:
__doc__ documentation string
__name__ original name of this function or method
__self__ instance to which a method is bound, or None"""
return isinstance(object, types.BuiltinFunctionType)
def isroutine(object):
"""Return true if the object is any kind of function or method."""
return (isbuiltin(object)
or isfunction(object)
or ismethod(object)
or ismethoddescriptor(object))
def isabstract(object):
"""Return true if the object is an abstract base class (ABC)."""
return bool(isinstance(object, type) and object.__flags__ & TPFLAGS_IS_ABSTRACT)
def getmembers(object, predicate=None):
"""Return all members of an object as (name, value) pairs sorted by name.
Optionally, only return members that satisfy a given predicate."""
if isclass(object):
mro = (object,) + getmro(object)
else:
mro = ()
results = []
processed = set()
names = dir(object)
    # add any DynamicClassAttributes to the list of names if object is a class;
# this may result in duplicate entries if, for example, a virtual
# attribute with the same name as a DynamicClassAttribute exists
try:
for base in object.__bases__:
for k, v in base.__dict__.items():
if isinstance(v, types.DynamicClassAttribute):
names.append(k)
except AttributeError:
pass
for key in names:
# First try to get the value via getattr. Some descriptors don't
# like calling their __get__ (see bug #1785), so fall back to
# looking in the __dict__.
try:
value = getattr(object, key)
# handle the duplicate key
if key in processed:
raise AttributeError
except AttributeError:
for base in mro:
if key in base.__dict__:
value = base.__dict__[key]
break
else:
# could be a (currently) missing slot member, or a buggy
# __dir__; discard and move on
continue
if not predicate or predicate(value):
results.append((key, value))
processed.add(key)
results.sort(key=lambda pair: pair[0])
return results
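# Illustrative sketch (not part of the original source): getmembers() with a
# predicate. The class 'Greeter' is hypothetical.
def _demo_getmembers():
    class Greeter:
        def hello(self): return 'hello'
        def bye(self): return 'bye'
    # Returns [('bye', <function ...>), ('hello', <function ...>)], sorted by name.
    return getmembers(Greeter, isfunction)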
Attribute = namedtuple('Attribute', 'name kind defining_class object')
def classify_class_attrs(cls):
"""Return list of attribute-descriptor tuples.
For each name in dir(cls), the return list contains a 4-tuple
with these elements:
0. The name (a string).
1. The kind of attribute this is, one of these strings:
'class method' created via classmethod()
'static method' created via staticmethod()
'property' created via property()
'method' any other flavor of method or descriptor
'data' not a method
2. The class which defined this attribute (a class).
3. The object as obtained by calling getattr; if this fails, or if the
resulting object does not live anywhere in the class' mro (including
metaclasses) then the object is looked up in the defining class's
dict (found by walking the mro).
    If one of the items in dir(cls) is stored in the metaclass it will now
    be discovered, rather than being listed with None as its defining class.
    Any items whose home class cannot be discovered are skipped.
"""
mro = getmro(cls)
metamro = getmro(type(cls)) # for attributes stored in the metaclass
metamro = tuple([cls for cls in metamro if cls not in (type, object)])
class_bases = (cls,) + mro
all_bases = class_bases + metamro
names = dir(cls)
    # add any DynamicClassAttributes to the list of names;
# this may result in duplicate entries if, for example, a virtual
# attribute with the same name as a DynamicClassAttribute exists.
for base in mro:
for k, v in base.__dict__.items():
if isinstance(v, types.DynamicClassAttribute):
names.append(k)
result = []
processed = set()
for name in names:
# Get the object associated with the name, and where it was defined.
# Normal objects will be looked up with both getattr and directly in
# its class' dict (in case getattr fails [bug #1785], and also to look
# for a docstring).
# For DynamicClassAttributes on the second pass we only look in the
# class's dict.
#
# Getting an obj from the __dict__ sometimes reveals more than
# using getattr. Static and class methods are dramatic examples.
homecls = None
get_obj = None
dict_obj = None
if name not in processed:
try:
if name == '__dict__':
raise Exception("__dict__ is special, don't want the proxy")
get_obj = getattr(cls, name)
except Exception as exc:
pass
else:
homecls = getattr(get_obj, "__objclass__", homecls)
if homecls not in class_bases:
# if the resulting object does not live somewhere in the
# mro, drop it and search the mro manually
homecls = None
last_cls = None
# first look in the classes
for srch_cls in class_bases:
srch_obj = getattr(srch_cls, name, None)
if srch_obj is get_obj:
last_cls = srch_cls
# then check the metaclasses
for srch_cls in metamro:
try:
srch_obj = srch_cls.__getattr__(cls, name)
except AttributeError:
continue
if srch_obj is get_obj:
last_cls = srch_cls
if last_cls is not None:
homecls = last_cls
for base in all_bases:
if name in base.__dict__:
dict_obj = base.__dict__[name]
if homecls not in metamro:
homecls = base
break
if homecls is None:
# unable to locate the attribute anywhere, most likely due to
# buggy custom __dir__; discard and move on
continue
obj = get_obj if get_obj is not None else dict_obj
# Classify the object or its descriptor.
if isinstance(dict_obj, staticmethod):
kind = "static method"
obj = dict_obj
elif isinstance(dict_obj, classmethod):
kind = "class method"
obj = dict_obj
elif isinstance(dict_obj, property):
kind = "property"
obj = dict_obj
elif isroutine(obj):
kind = "method"
else:
kind = "data"
result.append(Attribute(name, kind, homecls, obj))
processed.add(name)
return result
# ----------------------------------------------------------- class helpers
def getmro(cls):
"Return tuple of base classes (including cls) in method resolution order."
return cls.__mro__
# -------------------------------------------------------- function helpers
def unwrap(func, *, stop=None):
"""Get the object wrapped by *func*.
Follows the chain of :attr:`__wrapped__` attributes returning the last
object in the chain.
*stop* is an optional callback accepting an object in the wrapper chain
as its sole argument that allows the unwrapping to be terminated early if
the callback returns a true value. If the callback never returns a true
value, the last object in the chain is returned as usual. For example,
:func:`signature` uses this to stop unwrapping if any object in the
chain has a ``__signature__`` attribute defined.
:exc:`ValueError` is raised if a cycle is encountered.
"""
if stop is None:
def _is_wrapper(f):
return hasattr(f, '__wrapped__')
else:
def _is_wrapper(f):
return hasattr(f, '__wrapped__') and not stop(f)
f = func # remember the original func for error reporting
memo = {id(f)} # Memoise by id to tolerate non-hashable objects
while _is_wrapper(func):
func = func.__wrapped__
id_func = id(func)
if id_func in memo:
raise ValueError('wrapper loop when unwrapping {!r}'.format(f))
memo.add(id_func)
return func
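# Illustrative sketch (not part of the original source): unwrap() follows the
# __wrapped__ chain that functools.wraps leaves behind.
def _demo_unwrap():
    def deco(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs): return f(*args, **kwargs)
        return wrapper
    @deco
    def target(): pass
    return unwrap(target)  # the original, undecorated 'target'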
# -------------------------------------------------- source code extraction
def indentsize(line):
"""Return the indent size, in spaces, at the start of a line of text."""
expline = line.expandtabs()
return len(expline) - len(expline.lstrip())
def getdoc(object):
"""Get the documentation string for an object.
All tabs are expanded to spaces. To clean up docstrings that are
    indented to line up with blocks of code, any whitespace that can be
uniformly removed from the second line onwards is removed."""
try:
doc = object.__doc__
except AttributeError:
return None
if not isinstance(doc, str):
return None
return cleandoc(doc)
def cleandoc(doc):
"""Clean up indentation from docstrings.
Any whitespace that can be uniformly removed from the second line
onwards is removed."""
try:
lines = doc.expandtabs().split('\n')
except UnicodeError:
return None
else:
# Find minimum indentation of any non-blank lines after first line.
margin = sys.maxsize
for line in lines[1:]:
content = len(line.lstrip())
if content:
indent = len(line) - content
margin = min(margin, indent)
# Remove indentation.
if lines:
lines[0] = lines[0].lstrip()
if margin < sys.maxsize:
for i in range(1, len(lines)): lines[i] = lines[i][margin:]
# Remove any trailing or leading blank lines.
while lines and not lines[-1]:
lines.pop()
while lines and not lines[0]:
lines.pop(0)
return '\n'.join(lines)
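# Illustrative sketch (not part of the original source): cleandoc() removes
# the uniform indentation a docstring picks up from surrounding code.
def _demo_cleandoc():
    doc = """First line.
        Indented second line.
        Indented third line."""
    # Returns 'First line.\nIndented second line.\nIndented third line.'
    return cleandoc(doc)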
def getfile(object):
"""Work out which source or compiled file an object was defined in."""
if ismodule(object):
if hasattr(object, '__file__'):
return object.__file__
raise TypeError('{!r} is a built-in module'.format(object))
if isclass(object):
if hasattr(object, '__module__'):
object = sys.modules.get(object.__module__)
if hasattr(object, '__file__'):
return object.__file__
raise TypeError('{!r} is a built-in class'.format(object))
if ismethod(object):
object = object.__func__
if isfunction(object):
object = object.__code__
if istraceback(object):
object = object.tb_frame
if isframe(object):
object = object.f_code
if iscode(object):
return object.co_filename
raise TypeError('{!r} is not a module, class, method, '
'function, traceback, frame, or code object'.format(object))
ModuleInfo = namedtuple('ModuleInfo', 'name suffix mode module_type')
def getmoduleinfo(path):
"""Get the module name, suffix, mode, and module type for a given file."""
warnings.warn('inspect.getmoduleinfo() is deprecated', DeprecationWarning,
2)
with warnings.catch_warnings():
warnings.simplefilter('ignore', PendingDeprecationWarning)
import imp
filename = os.path.basename(path)
suffixes = [(-len(suffix), suffix, mode, mtype)
for suffix, mode, mtype in imp.get_suffixes()]
suffixes.sort() # try longest suffixes first, in case they overlap
for neglen, suffix, mode, mtype in suffixes:
if filename[neglen:] == suffix:
return ModuleInfo(filename[:neglen], suffix, mode, mtype)
def getmodulename(path):
"""Return the module name for a given file, or None."""
fname = os.path.basename(path)
# Check for paths that look like an actual module file
suffixes = [(-len(suffix), suffix)
for suffix in importlib.machinery.all_suffixes()]
suffixes.sort() # try longest suffixes first, in case they overlap
for neglen, suffix in suffixes:
if fname.endswith(suffix):
return fname[:neglen]
return None
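# Illustrative sketch (not part of the original source): getmodulename() only
# recognizes paths ending in a real module suffix.
def _demo_getmodulename():
    assert getmodulename('/tmp/spam.py') == 'spam'
    assert getmodulename('/tmp/spam.txt') is None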
def getsourcefile(object):
"""Return the filename that can be used to locate an object's source.
Return None if no way can be identified to get the source.
"""
filename = getfile(object)
all_bytecode_suffixes = importlib.machinery.DEBUG_BYTECODE_SUFFIXES[:]
all_bytecode_suffixes += importlib.machinery.OPTIMIZED_BYTECODE_SUFFIXES[:]
if any(filename.endswith(s) for s in all_bytecode_suffixes):
filename = (os.path.splitext(filename)[0] +
importlib.machinery.SOURCE_SUFFIXES[0])
elif any(filename.endswith(s) for s in
importlib.machinery.EXTENSION_SUFFIXES):
return None
if os.path.exists(filename):
return filename
# only return a non-existent filename if the module has a PEP 302 loader
if getattr(getmodule(object, filename), '__loader__', None) is not None:
return filename
# or it is in the linecache
if filename in linecache.cache:
return filename
def getabsfile(object, _filename=None):
"""Return an absolute path to the source or compiled file for an object.
The idea is for each object to have a unique origin, so this routine
normalizes the result as much as possible."""
if _filename is None:
_filename = getsourcefile(object) or getfile(object)
return os.path.normcase(os.path.abspath(_filename))
modulesbyfile = {}
_filesbymodname = {}
def getmodule(object, _filename=None):
"""Return the module an object was defined in, or None if not found."""
if ismodule(object):
return object
if hasattr(object, '__module__'):
return sys.modules.get(object.__module__)
# Try the filename to modulename cache
if _filename is not None and _filename in modulesbyfile:
return sys.modules.get(modulesbyfile[_filename])
# Try the cache again with the absolute file name
try:
file = getabsfile(object, _filename)
except TypeError:
return None
if file in modulesbyfile:
return sys.modules.get(modulesbyfile[file])
# Update the filename to module name cache and check yet again
# Copy sys.modules in order to cope with changes while iterating
for modname, module in list(sys.modules.items()):
if ismodule(module) and hasattr(module, '__file__'):
f = module.__file__
if f == _filesbymodname.get(modname, None):
# Have already mapped this module, so skip it
continue
_filesbymodname[modname] = f
f = getabsfile(module)
# Always map to the name the module knows itself by
modulesbyfile[f] = modulesbyfile[
os.path.realpath(f)] = module.__name__
if file in modulesbyfile:
return sys.modules.get(modulesbyfile[file])
# Check the main module
main = sys.modules['__main__']
if not hasattr(object, '__name__'):
return None
if hasattr(main, object.__name__):
mainobject = getattr(main, object.__name__)
if mainobject is object:
return main
# Check builtins
builtin = sys.modules['builtins']
if hasattr(builtin, object.__name__):
builtinobject = getattr(builtin, object.__name__)
if builtinobject is object:
return builtin
def findsource(object):
"""Return the entire source file and starting line number for an object.
The argument may be a module, class, method, function, traceback, frame,
or code object. The source code is returned as a list of all the lines
in the file and the line number indexes a line in that list. An OSError
is raised if the source code cannot be retrieved."""
file = getsourcefile(object)
if file:
# Invalidate cache if needed.
linecache.checkcache(file)
else:
file = getfile(object)
        # Allow filenames in the form of "<something>" to pass through.
        # `doctest` monkeypatches the `linecache` module to enable
        # inspection, so let `linecache.getlines` be called.
if not (file.startswith('<') and file.endswith('>')):
raise OSError('source code not available')
module = getmodule(object, file)
if module:
lines = linecache.getlines(file, module.__dict__)
else:
lines = linecache.getlines(file)
if not lines:
raise OSError('could not get source code')
if ismodule(object):
return lines, 0
if isclass(object):
name = object.__name__
pat = re.compile(r'^(\s*)class\s*' + name + r'\b')
# make some effort to find the best matching class definition:
# use the one with the least indentation, which is the one
# that's most probably not inside a function definition.
candidates = []
for i in range(len(lines)):
match = pat.match(lines[i])
if match:
# if it's at toplevel, it's already the best one
if lines[i][0] == 'c':
return lines, i
# else add whitespace to candidate list
candidates.append((match.group(1), i))
if candidates:
# this will sort by whitespace, and by line number,
# less whitespace first
candidates.sort()
return lines, candidates[0][1]
else:
raise OSError('could not find class definition')
if ismethod(object):
object = object.__func__
if isfunction(object):
object = object.__code__
if istraceback(object):
object = object.tb_frame
if isframe(object):
object = object.f_code
if iscode(object):
if not hasattr(object, 'co_firstlineno'):
raise OSError('could not find function definition')
lnum = object.co_firstlineno - 1
pat = re.compile(r'^(\s*def\s)|(.*(?<!\w)lambda(:|\s))|^(\s*@)')
while lnum > 0:
if pat.match(lines[lnum]): break
lnum = lnum - 1
return lines, lnum
raise OSError('could not find code object')
def getcomments(object):
"""Get lines of comments immediately preceding an object's source code.
Returns None when source can't be found.
"""
try:
lines, lnum = findsource(object)
except (OSError, TypeError):
return None
if ismodule(object):
# Look for a comment block at the top of the file.
start = 0
if lines and lines[0][:2] == '#!': start = 1
while start < len(lines) and lines[start].strip() in ('', '#'):
start = start + 1
if start < len(lines) and lines[start][:1] == '#':
comments = []
end = start
while end < len(lines) and lines[end][:1] == '#':
comments.append(lines[end].expandtabs())
end = end + 1
return ''.join(comments)
# Look for a preceding block of comments at the same indentation.
elif lnum > 0:
indent = indentsize(lines[lnum])
end = lnum - 1
if end >= 0 and lines[end].lstrip()[:1] == '#' and \
indentsize(lines[end]) == indent:
comments = [lines[end].expandtabs().lstrip()]
if end > 0:
end = end - 1
comment = lines[end].expandtabs().lstrip()
while comment[:1] == '#' and indentsize(lines[end]) == indent:
comments[:0] = [comment]
end = end - 1
if end < 0: break
comment = lines[end].expandtabs().lstrip()
while comments and comments[0].strip() == '#':
comments[:1] = []
while comments and comments[-1].strip() == '#':
comments[-1:] = []
return ''.join(comments)
class EndOfBlock(Exception): pass
class BlockFinder:
"""Provide a tokeneater() method to detect the end of a code block."""
def __init__(self):
self.indent = 0
self.islambda = False
self.started = False
self.passline = False
self.last = 1
def tokeneater(self, type, token, srowcol, erowcol, line):
if not self.started:
# look for the first "def", "class" or "lambda"
if token in ("def", "class", "lambda"):
if token == "lambda":
self.islambda = True
self.started = True
self.passline = True # skip to the end of the line
elif type == tokenize.NEWLINE:
self.passline = False # stop skipping when a NEWLINE is seen
self.last = srowcol[0]
if self.islambda: # lambdas always end at the first NEWLINE
raise EndOfBlock
elif self.passline:
pass
elif type == tokenize.INDENT:
self.indent = self.indent + 1
self.passline = True
elif type == tokenize.DEDENT:
self.indent = self.indent - 1
            # the end of matching indent/dedent pairs ends a block
# (note that this only works for "def"/"class" blocks,
# not e.g. for "if: else:" or "try: finally:" blocks)
if self.indent <= 0:
raise EndOfBlock
elif self.indent == 0 and type not in (tokenize.COMMENT, tokenize.NL):
            # any other token on the same indentation level ends the previous
            # block as well, except the pseudo-tokens COMMENT and NL.
raise EndOfBlock
def getblock(lines):
"""Extract the block of code at the top of the given list of lines."""
blockfinder = BlockFinder()
try:
tokens = tokenize.generate_tokens(iter(lines).__next__)
for _token in tokens:
blockfinder.tokeneater(*_token)
except (EndOfBlock, IndentationError):
pass
return lines[:blockfinder.last]
def getsourcelines(object):
"""Return a list of source lines and starting line number for an object.
The argument may be a module, class, method, function, traceback, frame,
or code object. The source code is returned as a list of the lines
corresponding to the object and the line number indicates where in the
original source file the first line of code was found. An OSError is
raised if the source code cannot be retrieved."""
lines, lnum = findsource(object)
if ismodule(object): return lines, 0
else: return getblock(lines[lnum:]), lnum + 1
def getsource(object):
"""Return the text of the source code for an object.
The argument may be a module, class, method, function, traceback, frame,
or code object. The source code is returned as a single string. An
OSError is raised if the source code cannot be retrieved."""
lines, lnum = getsourcelines(object)
return ''.join(lines)
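# Illustrative sketch (not part of the original source): fetching the source
# of a pure-Python stdlib function defined in a real file on disk.
def _demo_getsource():
    import json
    text = getsource(json.dumps)               # full text of the def statement
    lines, start = getsourcelines(json.dumps)  # same lines plus a line number
    assert text == ''.join(lines) and start > 0
    return start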
# --------------------------------------------------- class tree extraction
def walktree(classes, children, parent):
"""Recursive helper function for getclasstree()."""
results = []
classes.sort(key=attrgetter('__module__', '__name__'))
for c in classes:
results.append((c, c.__bases__))
if c in children:
results.append(walktree(children[c], children, c))
return results
def getclasstree(classes, unique=False):
"""Arrange the given list of classes into a hierarchy of nested lists.
Where a nested list appears, it contains classes derived from the class
whose entry immediately precedes the list. Each entry is a 2-tuple
containing a class and a tuple of its base classes. If the 'unique'
argument is true, exactly one entry appears in the returned structure
for each class in the given list. Otherwise, classes using multiple
inheritance and their descendants will appear multiple times."""
children = {}
roots = []
for c in classes:
if c.__bases__:
for parent in c.__bases__:
if not parent in children:
children[parent] = []
if c not in children[parent]:
children[parent].append(c)
if unique and parent in classes: break
elif c not in roots:
roots.append(c)
for parent in children:
if parent not in classes:
roots.append(parent)
return walktree(roots, children, None)
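# Illustrative sketch (not part of the original source): a tiny hierarchy and
# its nested-list representation. A, B and C are hypothetical classes.
def _demo_getclasstree():
    class A: pass
    class B(A): pass
    class C(A): pass
    # Roughly: [(object, ()), [(A, (object,)), [(B, (A,)), (C, (A,))]]]
    return getclasstree([A, B, C])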
# ------------------------------------------------ argument list extraction
Arguments = namedtuple('Arguments', 'args, varargs, varkw')
def getargs(co):
"""Get information about the arguments accepted by a code object.
Three things are returned: (args, varargs, varkw), where
'args' is the list of argument names. Keyword-only arguments are
appended. 'varargs' and 'varkw' are the names of the * and **
arguments or None."""
args, varargs, kwonlyargs, varkw = _getfullargs(co)
return Arguments(args + kwonlyargs, varargs, varkw)
def _getfullargs(co):
"""Get information about the arguments accepted by a code object.
Four things are returned: (args, varargs, kwonlyargs, varkw), where
'args' and 'kwonlyargs' are lists of argument names, and 'varargs'
and 'varkw' are the names of the * and ** arguments or None."""
if not iscode(co):
raise TypeError('{!r} is not a code object'.format(co))
nargs = co.co_argcount
names = co.co_varnames
nkwargs = co.co_kwonlyargcount
args = list(names[:nargs])
kwonlyargs = list(names[nargs:nargs+nkwargs])
step = 0
nargs += nkwargs
varargs = None
if co.co_flags & CO_VARARGS:
varargs = co.co_varnames[nargs]
nargs = nargs + 1
varkw = None
if co.co_flags & CO_VARKEYWORDS:
varkw = co.co_varnames[nargs]
return args, varargs, kwonlyargs, varkw
ArgSpec = namedtuple('ArgSpec', 'args varargs keywords defaults')
def getargspec(func):
"""Get the names and default values of a function's arguments.
A tuple of four things is returned: (args, varargs, varkw, defaults).
'args' is a list of the argument names.
'args' will include keyword-only argument names.
'varargs' and 'varkw' are the names of the * and ** arguments or None.
'defaults' is an n-tuple of the default values of the last n arguments.
    Use the getfullargspec() API for Python 3 code, as annotations
    and keyword-only arguments are supported. getargspec() will raise
    ValueError if the func has either annotations or keyword-only arguments.
"""
args, varargs, varkw, defaults, kwonlyargs, kwonlydefaults, ann = \
getfullargspec(func)
if kwonlyargs or ann:
raise ValueError("Function has keyword-only arguments or annotations"
", use getfullargspec() API which can support them")
return ArgSpec(args, varargs, varkw, defaults)
FullArgSpec = namedtuple('FullArgSpec',
'args, varargs, varkw, defaults, kwonlyargs, kwonlydefaults, annotations')
def getfullargspec(func):
"""Get the names and default values of a callable object's arguments.
A tuple of seven things is returned:
    (args, varargs, varkw, defaults, kwonlyargs, kwonlydefaults, annotations).
'args' is a list of the argument names.
'varargs' and 'varkw' are the names of the * and ** arguments or None.
'defaults' is an n-tuple of the default values of the last n arguments.
'kwonlyargs' is a list of keyword-only argument names.
'kwonlydefaults' is a dictionary mapping names from kwonlyargs to defaults.
'annotations' is a dictionary mapping argument names to annotations.
The first four items in the tuple correspond to getargspec().
"""
try:
# Re: `skip_bound_arg=False`
#
# There is a notable difference in behaviour between getfullargspec
# and Signature: the former always returns 'self' parameter for bound
# methods, whereas the Signature always shows the actual calling
# signature of the passed object.
#
# To simulate this behaviour, we "unbind" bound methods, to trick
# inspect.signature to always return their first parameter ("self",
# usually)
# Re: `follow_wrapper_chains=False`
#
# getfullargspec() historically ignored __wrapped__ attributes,
# so we ensure that remains the case in 3.3+
sig = _signature_internal(func,
follow_wrapper_chains=False,
skip_bound_arg=False)
except Exception as ex:
        # Most of the time 'signature' will raise ValueError.
        # But it can also raise AttributeError, and maybe something
        # else. So to be fully backwards compatible, we catch all
        # possible exceptions here and reraise a TypeError.
raise TypeError('unsupported callable') from ex
args = []
varargs = None
varkw = None
kwonlyargs = []
    defaults = ()
    annotations = {}
    kwdefaults = {}
if sig.return_annotation is not sig.empty:
annotations['return'] = sig.return_annotation
for param in sig.parameters.values():
kind = param.kind
name = param.name
if kind is _POSITIONAL_ONLY:
args.append(name)
elif kind is _POSITIONAL_OR_KEYWORD:
args.append(name)
if param.default is not param.empty:
defaults += (param.default,)
elif kind is _VAR_POSITIONAL:
varargs = name
elif kind is _KEYWORD_ONLY:
kwonlyargs.append(name)
if param.default is not param.empty:
kwdefaults[name] = param.default
elif kind is _VAR_KEYWORD:
varkw = name
if param.annotation is not param.empty:
annotations[name] = param.annotation
if not kwdefaults:
# compatibility with 'func.__kwdefaults__'
kwdefaults = None
if not defaults:
# compatibility with 'func.__defaults__'
defaults = None
return FullArgSpec(args, varargs, varkw, defaults,
kwonlyargs, kwdefaults, annotations)
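# Illustrative sketch (not part of the original source): what getfullargspec()
# reports for a function using every parameter kind.
def _demo_getfullargspec():
    def f(a, b=1, *args, c, d=2, **kw): pass
    spec = getfullargspec(f)
    assert spec.args == ['a', 'b'] and spec.defaults == (1,)
    assert spec.varargs == 'args' and spec.varkw == 'kw'
    assert spec.kwonlyargs == ['c', 'd'] and spec.kwonlydefaults == {'d': 2}
    return spec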
ArgInfo = namedtuple('ArgInfo', 'args varargs keywords locals')
def getargvalues(frame):
"""Get information about arguments passed into a particular frame.
A tuple of four things is returned: (args, varargs, varkw, locals).
'args' is a list of the argument names.
'varargs' and 'varkw' are the names of the * and ** arguments or None.
'locals' is the locals dictionary of the given frame."""
args, varargs, varkw = getargs(frame.f_code)
return ArgInfo(args, varargs, varkw, frame.f_locals)
def formatannotation(annotation, base_module=None):
if isinstance(annotation, type):
if annotation.__module__ in ('builtins', base_module):
return annotation.__name__
return annotation.__module__+'.'+annotation.__name__
return repr(annotation)
def formatannotationrelativeto(object):
module = getattr(object, '__module__', None)
def _formatannotation(annotation):
return formatannotation(annotation, module)
return _formatannotation
def formatargspec(args, varargs=None, varkw=None, defaults=None,
kwonlyargs=(), kwonlydefaults={}, annotations={},
formatarg=str,
formatvarargs=lambda name: '*' + name,
formatvarkw=lambda name: '**' + name,
formatvalue=lambda value: '=' + repr(value),
formatreturns=lambda text: ' -> ' + text,
formatannotation=formatannotation):
"""Format an argument spec from the values returned by getargspec
or getfullargspec.
The first seven arguments are (args, varargs, varkw, defaults,
    kwonlyargs, kwonlydefaults, annotations). The other six arguments
    are the corresponding optional formatting functions that are called to
    turn names and values into strings."""
def formatargandannotation(arg):
result = formatarg(arg)
if arg in annotations:
result += ': ' + formatannotation(annotations[arg])
return result
specs = []
if defaults:
firstdefault = len(args) - len(defaults)
for i, arg in enumerate(args):
spec = formatargandannotation(arg)
if defaults and i >= firstdefault:
spec = spec + formatvalue(defaults[i - firstdefault])
specs.append(spec)
if varargs is not None:
specs.append(formatvarargs(formatargandannotation(varargs)))
else:
if kwonlyargs:
specs.append('*')
if kwonlyargs:
for kwonlyarg in kwonlyargs:
spec = formatargandannotation(kwonlyarg)
if kwonlydefaults and kwonlyarg in kwonlydefaults:
spec += formatvalue(kwonlydefaults[kwonlyarg])
specs.append(spec)
if varkw is not None:
specs.append(formatvarkw(formatargandannotation(varkw)))
result = '(' + ', '.join(specs) + ')'
if 'return' in annotations:
result += formatreturns(formatannotation(annotations['return']))
return result
def formatargvalues(args, varargs, varkw, locals,
formatarg=str,
formatvarargs=lambda name: '*' + name,
formatvarkw=lambda name: '**' + name,
formatvalue=lambda value: '=' + repr(value)):
"""Format an argument spec from the 4 values returned by getargvalues.
The first four arguments are (args, varargs, varkw, locals). The
next four arguments are the corresponding optional formatting functions
    that are called to turn names and values into strings."""
def convert(name, locals=locals,
formatarg=formatarg, formatvalue=formatvalue):
return formatarg(name) + formatvalue(locals[name])
specs = []
for i in range(len(args)):
specs.append(convert(args[i]))
if varargs:
specs.append(formatvarargs(varargs) + formatvalue(locals[varargs]))
if varkw:
specs.append(formatvarkw(varkw) + formatvalue(locals[varkw]))
return '(' + ', '.join(specs) + ')'
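# Illustrative sketch (not part of the original source): formatargspec() turns
# a spec back into a readable signature string.
def _demo_formatargspec():
    def f(a, b=1, *args, c, **kw): pass
    # Returns '(a, b=1, *args, c, **kw)'
    return formatargspec(*getfullargspec(f))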
def _missing_arguments(f_name, argnames, pos, values):
names = [repr(name) for name in argnames if name not in values]
missing = len(names)
if missing == 1:
s = names[0]
elif missing == 2:
s = "{} and {}".format(*names)
else:
tail = ", {} and {}".format(*names[-2:])
del names[-2:]
s = ", ".join(names) + tail
raise TypeError("%s() missing %i required %s argument%s: %s" %
(f_name, missing,
"positional" if pos else "keyword-only",
"" if missing == 1 else "s", s))
def _too_many(f_name, args, kwonly, varargs, defcount, given, values):
atleast = len(args) - defcount
kwonly_given = len([arg for arg in kwonly if arg in values])
if varargs:
plural = atleast != 1
sig = "at least %d" % (atleast,)
elif defcount:
plural = True
sig = "from %d to %d" % (atleast, len(args))
else:
plural = len(args) != 1
sig = str(len(args))
kwonly_sig = ""
if kwonly_given:
msg = " positional argument%s (and %d keyword-only argument%s)"
kwonly_sig = (msg % ("s" if given != 1 else "", kwonly_given,
"s" if kwonly_given != 1 else ""))
raise TypeError("%s() takes %s positional argument%s but %d%s %s given" %
(f_name, sig, "s" if plural else "", given, kwonly_sig,
"was" if given == 1 and not kwonly_given else "were"))
def getcallargs(*func_and_positional, **named):
"""Get the mapping of arguments to values.
A dict is returned, with keys the function argument names (including the
names of the * and ** arguments, if any), and values the respective bound
values from 'positional' and 'named'."""
func = func_and_positional[0]
positional = func_and_positional[1:]
spec = getfullargspec(func)
args, varargs, varkw, defaults, kwonlyargs, kwonlydefaults, ann = spec
f_name = func.__name__
arg2value = {}
if ismethod(func) and func.__self__ is not None:
# implicit 'self' (or 'cls' for classmethods) argument
positional = (func.__self__,) + positional
num_pos = len(positional)
num_args = len(args)
num_defaults = len(defaults) if defaults else 0
n = min(num_pos, num_args)
for i in range(n):
arg2value[args[i]] = positional[i]
if varargs:
arg2value[varargs] = tuple(positional[n:])
possible_kwargs = set(args + kwonlyargs)
if varkw:
arg2value[varkw] = {}
for kw, value in named.items():
if kw not in possible_kwargs:
if not varkw:
raise TypeError("%s() got an unexpected keyword argument %r" %
(f_name, kw))
arg2value[varkw][kw] = value
continue
if kw in arg2value:
raise TypeError("%s() got multiple values for argument %r" %
(f_name, kw))
arg2value[kw] = value
if num_pos > num_args and not varargs:
_too_many(f_name, args, kwonlyargs, varargs, num_defaults,
num_pos, arg2value)
if num_pos < num_args:
req = args[:num_args - num_defaults]
for arg in req:
if arg not in arg2value:
_missing_arguments(f_name, req, True, arg2value)
for i, arg in enumerate(args[num_args - num_defaults:]):
if arg not in arg2value:
arg2value[arg] = defaults[i]
missing = 0
for kwarg in kwonlyargs:
if kwarg not in arg2value:
if kwonlydefaults and kwarg in kwonlydefaults:
arg2value[kwarg] = kwonlydefaults[kwarg]
else:
missing += 1
if missing:
_missing_arguments(f_name, kwonlyargs, False, arg2value)
return arg2value
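# Illustrative sketch (not part of the original source): getcallargs() mirrors
# how Python itself would bind these arguments.
def _demo_getcallargs():
    def f(a, b=10, *pos, **named): pass
    # Returns {'a': 1, 'b': 2, 'pos': (3,), 'named': {'x': 4}}
    return getcallargs(f, 1, 2, 3, x=4)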
ClosureVars = namedtuple('ClosureVars', 'nonlocals globals builtins unbound')
def getclosurevars(func):
"""
Get the mapping of free variables to their current values.
Returns a named tuple of dicts mapping the current nonlocal, global
and builtin references as seen by the body of the function. A final
set of unbound names that could not be resolved is also provided.
"""
if ismethod(func):
func = func.__func__
if not isfunction(func):
raise TypeError("'{!r}' is not a Python function".format(func))
code = func.__code__
# Nonlocal references are named in co_freevars and resolved
# by looking them up in __closure__ by positional index
if func.__closure__ is None:
nonlocal_vars = {}
else:
nonlocal_vars = {
var : cell.cell_contents
for var, cell in zip(code.co_freevars, func.__closure__)
}
# Global and builtin references are named in co_names and resolved
# by looking them up in __globals__ or __builtins__
global_ns = func.__globals__
builtin_ns = global_ns.get("__builtins__", builtins.__dict__)
if ismodule(builtin_ns):
builtin_ns = builtin_ns.__dict__
global_vars = {}
builtin_vars = {}
unbound_names = set()
for name in code.co_names:
if name in ("None", "True", "False"):
# Because these used to be builtins instead of keywords, they
# may still show up as name references. We ignore them.
continue
try:
global_vars[name] = global_ns[name]
except KeyError:
try:
builtin_vars[name] = builtin_ns[name]
except KeyError:
unbound_names.add(name)
return ClosureVars(nonlocal_vars, global_vars,
builtin_vars, unbound_names)
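# Illustrative sketch (not part of the original source): classifying the free
# names referenced by a closure.
def _demo_getclosurevars():
    def outer():
        eggs = 2
        def inner():
            return len(str(eggs))  # 'eggs' is nonlocal; 'len' and 'str' are builtins
        return inner
    cv = getclosurevars(outer())
    assert cv.nonlocals == {'eggs': 2}
    assert 'len' in cv.builtins and 'str' in cv.builtins
    return cv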
# -------------------------------------------------- stack frame extraction
Traceback = namedtuple('Traceback', 'filename lineno function code_context index')
def getframeinfo(frame, context=1):
"""Get information about a frame or traceback object.
A tuple of five things is returned: the filename, the line number of
the current line, the function name, a list of lines of context from
the source code, and the index of the current line within that list.
The optional second argument specifies the number of lines of context
to return, which are centered around the current line."""
if istraceback(frame):
lineno = frame.tb_lineno
frame = frame.tb_frame
else:
lineno = frame.f_lineno
if not isframe(frame):
raise TypeError('{!r} is not a frame or traceback object'.format(frame))
filename = getsourcefile(frame) or getfile(frame)
if context > 0:
start = lineno - 1 - context//2
try:
lines, lnum = findsource(frame)
except OSError:
lines = index = None
else:
start = max(start, 1)
start = max(0, min(start, len(lines) - context))
lines = lines[start:start+context]
index = lineno - 1 - start
else:
lines = index = None
return Traceback(filename, lineno, frame.f_code.co_name, lines, index)
def getlineno(frame):
"""Get the line number from a frame object, allowing for optimization."""
# FrameType.f_lineno is now a descriptor that grovels co_lnotab
return frame.f_lineno
def getouterframes(frame, context=1):
"""Get a list of records for a frame and all higher (calling) frames.
Each record contains a frame object, filename, line number, function
name, a list of lines of context, and index within the context."""
framelist = []
while frame:
framelist.append((frame,) + getframeinfo(frame, context))
frame = frame.f_back
return framelist
def getinnerframes(tb, context=1):
"""Get a list of records for a traceback's frame and all lower frames.
Each record contains a frame object, filename, line number, function
name, a list of lines of context, and index within the context."""
framelist = []
while tb:
framelist.append((tb.tb_frame,) + getframeinfo(tb, context))
tb = tb.tb_next
return framelist
def currentframe():
"""Return the frame of the caller or None if this is not possible."""
return sys._getframe(1) if hasattr(sys, "_getframe") else None
def stack(context=1):
"""Return a list of records for the stack above the caller's frame."""
return getouterframes(sys._getframe(1), context)
def trace(context=1):
"""Return a list of records for the stack below the current exception."""
return getinnerframes(sys.exc_info()[2], context)
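# Illustrative sketch (not part of the original source): reading the caller's
# name out of a stack() record (frame, filename, lineno, function, context, index).
def _demo_stack():
    def inner():
        return stack()[1][3]  # function name of the caller, here '_demo_stack'
    return inner()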
# ------------------------------------------------ static version of getattr
_sentinel = object()
def _static_getmro(klass):
return type.__dict__['__mro__'].__get__(klass)
def _check_instance(obj, attr):
instance_dict = {}
try:
instance_dict = object.__getattribute__(obj, "__dict__")
except AttributeError:
pass
return dict.get(instance_dict, attr, _sentinel)
def _check_class(klass, attr):
for entry in _static_getmro(klass):
if _shadowed_dict(type(entry)) is _sentinel:
try:
return entry.__dict__[attr]
except KeyError:
pass
return _sentinel
def _is_type(obj):
try:
_static_getmro(obj)
except TypeError:
return False
return True
def _shadowed_dict(klass):
dict_attr = type.__dict__["__dict__"]
for entry in _static_getmro(klass):
try:
class_dict = dict_attr.__get__(entry)["__dict__"]
except KeyError:
pass
else:
if not (type(class_dict) is types.GetSetDescriptorType and
class_dict.__name__ == "__dict__" and
class_dict.__objclass__ is entry):
return class_dict
return _sentinel
def getattr_static(obj, attr, default=_sentinel):
"""Retrieve attributes without triggering dynamic lookup via the
descriptor protocol, __getattr__ or __getattribute__.
Note: this function may not be able to retrieve all attributes
that getattr can fetch (like dynamically created attributes)
and may find attributes that getattr can't (like descriptors
that raise AttributeError). It can also return descriptor objects
instead of instance members in some cases. See the
documentation for details.
"""
instance_result = _sentinel
if not _is_type(obj):
klass = type(obj)
dict_attr = _shadowed_dict(klass)
if (dict_attr is _sentinel or
type(dict_attr) is types.MemberDescriptorType):
instance_result = _check_instance(obj, attr)
else:
klass = obj
klass_result = _check_class(klass, attr)
if instance_result is not _sentinel and klass_result is not _sentinel:
if (_check_class(type(klass_result), '__get__') is not _sentinel and
_check_class(type(klass_result), '__set__') is not _sentinel):
return klass_result
if instance_result is not _sentinel:
return instance_result
if klass_result is not _sentinel:
return klass_result
if obj is klass:
# for types we check the metaclass too
for entry in _static_getmro(type(klass)):
if _shadowed_dict(type(entry)) is _sentinel:
try:
return entry.__dict__[attr]
except KeyError:
pass
if default is not _sentinel:
return default
raise AttributeError(attr)
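# Illustrative sketch (not part of the original source): getattr_static() skips
# the descriptor protocol that plain getattr() would trigger.
def _demo_getattr_static():
    class C:
        @property
        def answer(self): return 42
    c = C()
    assert getattr(c, 'answer') == 42    # the property is invoked
    return getattr_static(c, 'answer')   # the property object itself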
# ------------------------------------------------ generator introspection
GEN_CREATED = 'GEN_CREATED'
GEN_RUNNING = 'GEN_RUNNING'
GEN_SUSPENDED = 'GEN_SUSPENDED'
GEN_CLOSED = 'GEN_CLOSED'
def getgeneratorstate(generator):
"""Get current state of a generator-iterator.
Possible states are:
GEN_CREATED: Waiting to start execution.
GEN_RUNNING: Currently being executed by the interpreter.
GEN_SUSPENDED: Currently suspended at a yield expression.
GEN_CLOSED: Execution has completed.
"""
if generator.gi_running:
return GEN_RUNNING
if generator.gi_frame is None:
return GEN_CLOSED
if generator.gi_frame.f_lasti == -1:
return GEN_CREATED
return GEN_SUSPENDED
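# Illustrative sketch (not part of the original source): a generator's state
# over its lifetime.
def _demo_getgeneratorstate():
    def gen():
        yield 1
    g = gen()
    assert getgeneratorstate(g) == GEN_CREATED
    next(g)
    assert getgeneratorstate(g) == GEN_SUSPENDED
    g.close()
    assert getgeneratorstate(g) == GEN_CLOSED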
def getgeneratorlocals(generator):
"""
Get the mapping of generator local variables to their current values.
A dict is returned, with the keys the local variable names and values the
bound values."""
if not isgenerator(generator):
raise TypeError("'{!r}' is not a Python generator".format(generator))
frame = getattr(generator, "gi_frame", None)
if frame is not None:
return generator.gi_frame.f_locals
else:
return {}
###############################################################################
### Function Signature Object (PEP 362)
###############################################################################
_WrapperDescriptor = type(type.__call__)
_MethodWrapper = type(all.__call__)
_ClassMethodWrapper = type(int.__dict__['from_bytes'])
_NonUserDefinedCallables = (_WrapperDescriptor,
_MethodWrapper,
_ClassMethodWrapper,
types.BuiltinFunctionType)
def _signature_get_user_defined_method(cls, method_name):
try:
meth = getattr(cls, method_name)
except AttributeError:
return
else:
if not isinstance(meth, _NonUserDefinedCallables):
            # Once '__signature__' is added to 'C'-level
            # callables, this check won't be necessary
return meth
def _signature_get_partial(wrapped_sig, partial, extra_args=()):
    # Internal helper to calculate what the 'wrapped_sig' signature will
    # look like after applying a 'functools.partial' object (or the like)
    # to it.
old_params = wrapped_sig.parameters
new_params = OrderedDict(old_params.items())
partial_args = partial.args or ()
partial_keywords = partial.keywords or {}
if extra_args:
partial_args = extra_args + partial_args
try:
ba = wrapped_sig.bind_partial(*partial_args, **partial_keywords)
except TypeError as ex:
msg = 'partial object {!r} has incorrect arguments'.format(partial)
raise ValueError(msg) from ex
transform_to_kwonly = False
for param_name, param in old_params.items():
try:
arg_value = ba.arguments[param_name]
except KeyError:
pass
else:
if param.kind is _POSITIONAL_ONLY:
# If positional-only parameter is bound by partial,
# it effectively disappears from the signature
new_params.pop(param_name)
continue
if param.kind is _POSITIONAL_OR_KEYWORD:
if param_name in partial_keywords:
# This means that this parameter, and all parameters
# after it should be keyword-only (and var-positional
# should be removed). Here's why. Consider the following
# function:
# foo(a, b, *args, c):
# pass
#
# "partial(foo, a='spam')" will have the following
# signature: "(*, a='spam', b, c)". Because attempting
# to call that partial with "(10, 20)" arguments will
# raise a TypeError, saying that "a" argument received
# multiple values.
transform_to_kwonly = True
# Set the new default value
new_params[param_name] = param.replace(default=arg_value)
else:
# was passed as a positional argument
new_params.pop(param.name)
continue
if param.kind is _KEYWORD_ONLY:
# Set the new default value
new_params[param_name] = param.replace(default=arg_value)
if transform_to_kwonly:
assert param.kind is not _POSITIONAL_ONLY
if param.kind is _POSITIONAL_OR_KEYWORD:
new_param = new_params[param_name].replace(kind=_KEYWORD_ONLY)
new_params[param_name] = new_param
new_params.move_to_end(param_name)
elif param.kind in (_KEYWORD_ONLY, _VAR_KEYWORD):
new_params.move_to_end(param_name)
elif param.kind is _VAR_POSITIONAL:
new_params.pop(param.name)
return wrapped_sig.replace(parameters=new_params.values())
def _signature_bound_method(sig):
# Internal helper to transform signatures for unbound
# functions to bound methods
params = tuple(sig.parameters.values())
if not params or params[0].kind in (_VAR_KEYWORD, _KEYWORD_ONLY):
raise ValueError('invalid method signature')
kind = params[0].kind
if kind in (_POSITIONAL_OR_KEYWORD, _POSITIONAL_ONLY):
# Drop first parameter:
# '(p1, p2[, ...])' -> '(p2[, ...])'
params = params[1:]
else:
if kind is not _VAR_POSITIONAL:
# Unless we add a new parameter type we never
# get here
raise ValueError('invalid argument type')
# It's a var-positional parameter.
# Do nothing. '(*args[, ...])' -> '(*args[, ...])'
return sig.replace(parameters=params)
def _signature_is_builtin(obj):
# Internal helper to test if `obj` is a callable that might
# support Argument Clinic's __text_signature__ protocol.
return (isbuiltin(obj) or
ismethoddescriptor(obj) or
isinstance(obj, _NonUserDefinedCallables) or
            # Can't test 'isinstance(obj, type)' here, as that would
            # also be True for regular Python classes
obj in (type, object))
def _signature_is_functionlike(obj):
# Internal helper to test if `obj` is a duck type of FunctionType.
# A good example of such objects are functions compiled with
# Cython, which have all attributes that a pure Python function
# would have, but have their code statically compiled.
if not callable(obj) or isclass(obj):
# All function-like objects are obviously callables,
# and not classes.
return False
name = getattr(obj, '__name__', None)
code = getattr(obj, '__code__', None)
defaults = getattr(obj, '__defaults__', _void) # Important to use _void ...
kwdefaults = getattr(obj, '__kwdefaults__', _void) # ... and not None here
annotations = getattr(obj, '__annotations__', None)
return (isinstance(code, types.CodeType) and
isinstance(name, str) and
(defaults is None or isinstance(defaults, tuple)) and
(kwdefaults is None or isinstance(kwdefaults, dict)) and
isinstance(annotations, dict))
def _signature_get_bound_param(spec):
# Internal helper to get first parameter name from a
# __text_signature__ of a builtin method, which should
# be in the following format: '($param1, ...)'.
# Assumptions are that the first argument won't have
# a default value or an annotation.
assert spec.startswith('($')
pos = spec.find(',')
if pos == -1:
pos = spec.find(')')
cpos = spec.find(':')
assert cpos == -1 or cpos > pos
cpos = spec.find('=')
assert cpos == -1 or cpos > pos
return spec[2:pos]
def _signature_strip_non_python_syntax(signature):
"""
Takes a signature in Argument Clinic's extended signature format.
Returns a tuple of three things:
* that signature re-rendered in standard Python syntax,
* the index of the "self" parameter (generally 0), or None if
the function does not have a "self" parameter, and
* the index of the last "positional only" parameter,
or None if the signature has no positional-only parameters.
"""
if not signature:
return signature, None, None
self_parameter = None
last_positional_only = None
lines = [l.encode('ascii') for l in signature.split('\n')]
generator = iter(lines).__next__
token_stream = tokenize.tokenize(generator)
delayed_comma = False
skip_next_comma = False
text = []
add = text.append
current_parameter = 0
OP = token.OP
ERRORTOKEN = token.ERRORTOKEN
# token stream always starts with ENCODING token, skip it
t = next(token_stream)
assert t.type == tokenize.ENCODING
for t in token_stream:
type, string = t.type, t.string
if type == OP:
if string == ',':
if skip_next_comma:
skip_next_comma = False
else:
assert not delayed_comma
delayed_comma = True
current_parameter += 1
continue
if string == '/':
assert not skip_next_comma
assert last_positional_only is None
skip_next_comma = True
last_positional_only = current_parameter - 1
continue
if (type == ERRORTOKEN) and (string == '$'):
assert self_parameter is None
self_parameter = current_parameter
continue
if delayed_comma:
delayed_comma = False
if not ((type == OP) and (string == ')')):
add(', ')
add(string)
if (string == ','):
add(' ')
clean_signature = ''.join(text)
return clean_signature, self_parameter, last_positional_only
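# --- Example (annotation for this listing): a made-up Argument Clinic
# signature run through the helper above:
#
#     >>> _signature_strip_non_python_syntax('($module, /, path)')
#     ('(module, path)', 0, 0)
#
# The '$' marker names the bound parameter (index 0), and the '/' marks
# 'module' (also index 0) as the last positional-only parameter.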
def _signature_fromstr(cls, obj, s, skip_bound_arg=True):
# Internal helper to parse content of '__text_signature__'
# and return a Signature based on it
Parameter = cls._parameter_cls
clean_signature, self_parameter, last_positional_only = \
_signature_strip_non_python_syntax(s)
program = "def foo" + clean_signature + ": pass"
try:
module = ast.parse(program)
except SyntaxError:
module = None
if not isinstance(module, ast.Module):
raise ValueError("{!r} builtin has invalid signature".format(obj))
f = module.body[0]
parameters = []
empty = Parameter.empty
invalid = object()
module = None
module_dict = {}
module_name = getattr(obj, '__module__', None)
if module_name:
module = sys.modules.get(module_name, None)
if module:
module_dict = module.__dict__
sys_module_dict = sys.modules
def parse_name(node):
assert isinstance(node, ast.arg)
        if node.annotation is not None:
raise ValueError("Annotations are not currently supported")
return node.arg
def wrap_value(s):
try:
value = eval(s, module_dict)
except NameError:
try:
value = eval(s, sys_module_dict)
except NameError:
raise RuntimeError()
if isinstance(value, str):
return ast.Str(value)
if isinstance(value, (int, float)):
return ast.Num(value)
if isinstance(value, bytes):
return ast.Bytes(value)
if value in (True, False, None):
return ast.NameConstant(value)
raise RuntimeError()
class RewriteSymbolics(ast.NodeTransformer):
def visit_Attribute(self, node):
a = []
n = node
while isinstance(n, ast.Attribute):
a.append(n.attr)
n = n.value
if not isinstance(n, ast.Name):
raise RuntimeError()
a.append(n.id)
value = ".".join(reversed(a))
return wrap_value(value)
def visit_Name(self, node):
if not isinstance(node.ctx, ast.Load):
raise ValueError()
return wrap_value(node.id)
def p(name_node, default_node, default=empty):
name = parse_name(name_node)
if name is invalid:
return None
if default_node and default_node is not _empty:
try:
default_node = RewriteSymbolics().visit(default_node)
o = ast.literal_eval(default_node)
except ValueError:
o = invalid
if o is invalid:
return None
default = o if o is not invalid else default
parameters.append(Parameter(name, kind, default=default, annotation=empty))
# non-keyword-only parameters
args = reversed(f.args.args)
defaults = reversed(f.args.defaults)
iter = itertools.zip_longest(args, defaults, fillvalue=None)
if last_positional_only is not None:
kind = Parameter.POSITIONAL_ONLY
else:
kind = Parameter.POSITIONAL_OR_KEYWORD
for i, (name, default) in enumerate(reversed(list(iter))):
p(name, default)
if i == last_positional_only:
kind = Parameter.POSITIONAL_OR_KEYWORD
# *args
if f.args.vararg:
kind = Parameter.VAR_POSITIONAL
p(f.args.vararg, empty)
# keyword-only arguments
kind = Parameter.KEYWORD_ONLY
for name, default in zip(f.args.kwonlyargs, f.args.kw_defaults):
p(name, default)
# **kwargs
if f.args.kwarg:
kind = Parameter.VAR_KEYWORD
p(f.args.kwarg, empty)
if self_parameter is not None:
# Possibly strip the bound argument:
# - We *always* strip first bound argument if
# it is a module.
# - We don't strip first bound argument if
# skip_bound_arg is False.
assert parameters
_self = getattr(obj, '__self__', None)
self_isbound = _self is not None
self_ismodule = ismodule(_self)
if self_isbound and (self_ismodule or skip_bound_arg):
parameters.pop(0)
else:
# for builtins, self parameter is always positional-only!
p = parameters[0].replace(kind=Parameter.POSITIONAL_ONLY)
parameters[0] = p
return cls(parameters, return_annotation=cls.empty)
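# --- Example (annotation for this listing): on builds where a builtin
# carries an Argument Clinic '__text_signature__', signature() parses it
# with the helpers above; on other builds it raises ValueError instead:
#
#     >>> import inspect
#     >>> abs.__text_signature__        # doctest: +SKIP
#     '($module, x, /)'
#     >>> print(inspect.signature(abs)) # doctest: +SKIP
#     (x, /)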
def _signature_from_builtin(cls, func, skip_bound_arg=True):
# Internal helper function to get signature for
# builtin callables
if not _signature_is_builtin(func):
raise TypeError("{!r} is not a Python builtin "
"function".format(func))
s = getattr(func, "__text_signature__", None)
if not s:
raise ValueError("no signature found for builtin {!r}".format(func))
return _signature_fromstr(cls, func, s, skip_bound_arg)
def _signature_internal(obj, follow_wrapper_chains=True, skip_bound_arg=True):
if not callable(obj):
raise TypeError('{!r} is not a callable object'.format(obj))
if isinstance(obj, types.MethodType):
# In this case we skip the first parameter of the underlying
# function (usually `self` or `cls`).
sig = _signature_internal(obj.__func__,
follow_wrapper_chains,
skip_bound_arg)
if skip_bound_arg:
return _signature_bound_method(sig)
else:
return sig
# Was this function wrapped by a decorator?
if follow_wrapper_chains:
obj = unwrap(obj, stop=(lambda f: hasattr(f, "__signature__")))
if isinstance(obj, types.MethodType):
# If the unwrapped object is a *method*, we might want to
# skip its first parameter (self).
# See test_signature_wrapped_bound_method for details.
return _signature_internal(
obj,
follow_wrapper_chains=follow_wrapper_chains,
skip_bound_arg=skip_bound_arg)
try:
sig = obj.__signature__
except AttributeError:
pass
else:
if sig is not None:
if not isinstance(sig, Signature):
raise TypeError(
'unexpected object {!r} in __signature__ '
'attribute'.format(sig))
return sig
try:
partialmethod = obj._partialmethod
except AttributeError:
pass
else:
if isinstance(partialmethod, functools.partialmethod):
# Unbound partialmethod (see functools.partialmethod)
            # This means that we need to calculate the signature as if
            # it's a regular partial object, but taking into account
            # that the first positional argument (usually `self` or
            # `cls`) will not be passed automatically (as it is for
            # bound methods)
wrapped_sig = _signature_internal(partialmethod.func,
follow_wrapper_chains,
skip_bound_arg)
sig = _signature_get_partial(wrapped_sig, partialmethod, (None,))
first_wrapped_param = tuple(wrapped_sig.parameters.values())[0]
new_params = (first_wrapped_param,) + tuple(sig.parameters.values())
return sig.replace(parameters=new_params)
if isfunction(obj) or _signature_is_functionlike(obj):
# If it's a pure Python function, or an object that is duck type
# of a Python function (Cython functions, for instance), then:
return Signature.from_function(obj)
if _signature_is_builtin(obj):
return _signature_from_builtin(Signature, obj,
skip_bound_arg=skip_bound_arg)
if isinstance(obj, functools.partial):
wrapped_sig = _signature_internal(obj.func,
follow_wrapper_chains,
skip_bound_arg)
return _signature_get_partial(wrapped_sig, obj)
sig = None
if isinstance(obj, type):
# obj is a class or a metaclass
# First, let's see if it has an overloaded __call__ defined
# in its metaclass
call = _signature_get_user_defined_method(type(obj), '__call__')
if call is not None:
sig = _signature_internal(call,
follow_wrapper_chains,
skip_bound_arg)
else:
# Now we check if the 'obj' class has a '__new__' method
new = _signature_get_user_defined_method(obj, '__new__')
if new is not None:
sig = _signature_internal(new,
follow_wrapper_chains,
skip_bound_arg)
else:
# Finally, we should have at least __init__ implemented
init = _signature_get_user_defined_method(obj, '__init__')
if init is not None:
sig = _signature_internal(init,
follow_wrapper_chains,
skip_bound_arg)
if sig is None:
            # At this point we know that `obj` is a class with no user-
            # defined '__init__', '__new__', or class-level '__call__'
for base in obj.__mro__[:-1]:
# Since '__text_signature__' is implemented as a
# descriptor that extracts text signature from the
# class docstring, if 'obj' is derived from a builtin
# class, its own '__text_signature__' may be 'None'.
# Therefore, we go through the MRO (except the last
# class in there, which is 'object') to find the first
# class with non-empty text signature.
try:
text_sig = base.__text_signature__
except AttributeError:
pass
else:
if text_sig:
# If 'obj' class has a __text_signature__ attribute:
# return a signature based on it
return _signature_fromstr(Signature, obj, text_sig)
# No '__text_signature__' was found for the 'obj' class.
# Last option is to check if its '__init__' is
# object.__init__ or type.__init__.
if type not in obj.__mro__:
# We have a class (not metaclass), but no user-defined
# __init__ or __new__ for it
if obj.__init__ is object.__init__:
# Return a signature of 'object' builtin.
return signature(object)
elif not isinstance(obj, _NonUserDefinedCallables):
# An object with __call__
# We also check that the 'obj' is not an instance of
# _WrapperDescriptor or _MethodWrapper to avoid
# infinite recursion (and even potential segfault)
call = _signature_get_user_defined_method(type(obj), '__call__')
if call is not None:
try:
sig = _signature_internal(call,
follow_wrapper_chains,
skip_bound_arg)
except ValueError as ex:
msg = 'no signature found for {!r}'.format(obj)
raise ValueError(msg) from ex
if sig is not None:
# For classes and objects we skip the first parameter of their
# __call__, __new__, or __init__ methods
if skip_bound_arg:
return _signature_bound_method(sig)
else:
return sig
if isinstance(obj, types.BuiltinFunctionType):
# Raise a nicer error message for builtins
msg = 'no signature found for builtin function {!r}'.format(obj)
raise ValueError(msg)
raise ValueError('callable {!r} is not supported by signature'.format(obj))
def signature(obj):
'''Get a signature object for the passed callable.'''
return _signature_internal(obj)
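# --- Example (annotation for this listing): typical use of the public
# entry point defined above; `greet` is illustrative:
#
#     >>> import inspect
#     >>> def greet(name, greeting='hello', *, shout=False):
#     ...     pass
#     >>> sig = inspect.signature(greet)
#     >>> print(sig)
#     (name, greeting='hello', *, shout=False)
#     >>> list(sig.parameters)
#     ['name', 'greeting', 'shout']
#     >>> sig.parameters['shout'].kind
#     <_ParameterKind: 'KEYWORD_ONLY'>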
class _void:
'''A private marker - used in Parameter & Signature'''
class _empty:
pass
class _ParameterKind(int):
    def __new__(cls, *args, name):
        obj = int.__new__(cls, *args)
obj._name = name
return obj
def __str__(self):
return self._name
def __repr__(self):
return '<_ParameterKind: {!r}>'.format(self._name)
_POSITIONAL_ONLY = _ParameterKind(0, name='POSITIONAL_ONLY')
_POSITIONAL_OR_KEYWORD = _ParameterKind(1, name='POSITIONAL_OR_KEYWORD')
_VAR_POSITIONAL = _ParameterKind(2, name='VAR_POSITIONAL')
_KEYWORD_ONLY = _ParameterKind(3, name='KEYWORD_ONLY')
_VAR_KEYWORD = _ParameterKind(4, name='VAR_KEYWORD')
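# --- Note (annotation for this listing): because _ParameterKind subclasses
# int, the kinds order naturally, which is what Signature.__init__ below
# relies on to validate parameter order:
#
#     >>> _POSITIONAL_ONLY < _POSITIONAL_OR_KEYWORD < _VAR_POSITIONAL
#     True
#     >>> str(_VAR_KEYWORD)
#     'VAR_KEYWORD'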
class Parameter:
'''Represents a parameter in a function signature.
Has the following public attributes:
* name : str
The name of the parameter as a string.
* default : object
The default value for the parameter if specified. If the
parameter has no default value, this attribute is set to
`Parameter.empty`.
* annotation
The annotation for the parameter if specified. If the
parameter has no annotation, this attribute is set to
`Parameter.empty`.
    * kind : _ParameterKind
Describes how argument values are bound to the parameter.
Possible values: `Parameter.POSITIONAL_ONLY`,
`Parameter.POSITIONAL_OR_KEYWORD`, `Parameter.VAR_POSITIONAL`,
`Parameter.KEYWORD_ONLY`, `Parameter.VAR_KEYWORD`.
'''
__slots__ = ('_name', '_kind', '_default', '_annotation')
POSITIONAL_ONLY = _POSITIONAL_ONLY
POSITIONAL_OR_KEYWORD = _POSITIONAL_OR_KEYWORD
VAR_POSITIONAL = _VAR_POSITIONAL
KEYWORD_ONLY = _KEYWORD_ONLY
VAR_KEYWORD = _VAR_KEYWORD
empty = _empty
def __init__(self, name, kind, *, default=_empty, annotation=_empty):
if kind not in (_POSITIONAL_ONLY, _POSITIONAL_OR_KEYWORD,
_VAR_POSITIONAL, _KEYWORD_ONLY, _VAR_KEYWORD):
raise ValueError("invalid value for 'Parameter.kind' attribute")
self._kind = kind
if default is not _empty:
if kind in (_VAR_POSITIONAL, _VAR_KEYWORD):
msg = '{} parameters cannot have default values'.format(kind)
raise ValueError(msg)
self._default = default
self._annotation = annotation
if name is _empty:
raise ValueError('name is a required attribute for Parameter')
if not isinstance(name, str):
raise TypeError("name must be a str, not a {!r}".format(name))
if not name.isidentifier():
raise ValueError('{!r} is not a valid parameter name'.format(name))
self._name = name
@property
def name(self):
return self._name
@property
def default(self):
return self._default
@property
def annotation(self):
return self._annotation
@property
def kind(self):
return self._kind
def replace(self, *, name=_void, kind=_void,
annotation=_void, default=_void):
'''Creates a customized copy of the Parameter.'''
if name is _void:
name = self._name
if kind is _void:
kind = self._kind
if annotation is _void:
annotation = self._annotation
if default is _void:
default = self._default
return type(self)(name, kind, default=default, annotation=annotation)
def __str__(self):
kind = self.kind
formatted = self._name
# Add annotation and default value
if self._annotation is not _empty:
formatted = '{}:{}'.format(formatted,
formatannotation(self._annotation))
if self._default is not _empty:
formatted = '{}={}'.format(formatted, repr(self._default))
if kind == _VAR_POSITIONAL:
formatted = '*' + formatted
elif kind == _VAR_KEYWORD:
formatted = '**' + formatted
return formatted
def __repr__(self):
return '<{} at {:#x} {!r}>'.format(self.__class__.__name__,
id(self), self.name)
def __eq__(self, other):
if not isinstance(other, Parameter):
return NotImplemented
return (self._name == other._name and
self._kind == other._kind and
self._default == other._default and
self._annotation == other._annotation)
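# --- Example (annotation for this listing): building Parameter objects by
# hand and deriving variants with replace(); the values are illustrative:
#
#     >>> p = Parameter('x', Parameter.POSITIONAL_OR_KEYWORD, default=42)
#     >>> str(p)
#     'x=42'
#     >>> str(p.replace(name='y', default=Parameter.empty))
#     'y'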
class BoundArguments:
'''Result of `Signature.bind` call. Holds the mapping of arguments
to the function's parameters.
Has the following public attributes:
* arguments : OrderedDict
An ordered mutable mapping of parameters' names to arguments' values.
Does not contain arguments' default values.
* signature : Signature
The Signature object that created this instance.
* args : tuple
Tuple of positional arguments values.
* kwargs : dict
Dict of keyword arguments values.
'''
def __init__(self, signature, arguments):
self.arguments = arguments
self._signature = signature
@property
def signature(self):
return self._signature
@property
def args(self):
args = []
for param_name, param in self._signature.parameters.items():
if param.kind in (_VAR_KEYWORD, _KEYWORD_ONLY):
break
try:
arg = self.arguments[param_name]
except KeyError:
# We're done here. Other arguments
# will be mapped in 'BoundArguments.kwargs'
break
else:
if param.kind == _VAR_POSITIONAL:
# *args
args.extend(arg)
else:
# plain argument
args.append(arg)
return tuple(args)
@property
def kwargs(self):
kwargs = {}
kwargs_started = False
for param_name, param in self._signature.parameters.items():
if not kwargs_started:
if param.kind in (_VAR_KEYWORD, _KEYWORD_ONLY):
kwargs_started = True
else:
if param_name not in self.arguments:
kwargs_started = True
continue
if not kwargs_started:
continue
try:
arg = self.arguments[param_name]
except KeyError:
pass
else:
if param.kind == _VAR_KEYWORD:
# **kwargs
kwargs.update(arg)
else:
# plain keyword argument
kwargs[param_name] = arg
return kwargs
def __eq__(self, other):
if not isinstance(other, BoundArguments):
return NotImplemented
return (self.signature == other.signature and
self.arguments == other.arguments)
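# --- Example (annotation for this listing): how bound values are split
# back into the args/kwargs properties above; `f` is illustrative:
#
#     >>> import inspect
#     >>> def f(a, b=1, *args, c, **kw):
#     ...     pass
#     >>> ba = inspect.signature(f).bind(10, 20, 30, c=3, d=4)
#     >>> ba.args
#     (10, 20, 30)
#     >>> ba.kwargs
#     {'c': 3, 'd': 4}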
class Signature:
'''A Signature object represents the overall signature of a function.
It stores a Parameter object for each parameter accepted by the
function, as well as information specific to the function itself.
A Signature object has the following public attributes and methods:
* parameters : OrderedDict
An ordered mapping of parameters' names to the corresponding
Parameter objects (keyword-only arguments are in the same order
as listed in `code.co_varnames`).
* return_annotation : object
The annotation for the return type of the function if specified.
If the function has no annotation for its return type, this
attribute is set to `Signature.empty`.
* bind(*args, **kwargs) -> BoundArguments
Creates a mapping from positional and keyword arguments to
parameters.
* bind_partial(*args, **kwargs) -> BoundArguments
Creates a partial mapping from positional and keyword arguments
        to parameters (simulating 'functools.partial' behavior).
'''
__slots__ = ('_return_annotation', '_parameters')
_parameter_cls = Parameter
_bound_arguments_cls = BoundArguments
empty = _empty
def __init__(self, parameters=None, *, return_annotation=_empty,
__validate_parameters__=True):
'''Constructs Signature from the given list of Parameter
objects and 'return_annotation'. All arguments are optional.
'''
if parameters is None:
params = OrderedDict()
else:
if __validate_parameters__:
params = OrderedDict()
top_kind = _POSITIONAL_ONLY
kind_defaults = False
for idx, param in enumerate(parameters):
kind = param.kind
name = param.name
if kind < top_kind:
msg = 'wrong parameter order: {!r} before {!r}'
msg = msg.format(top_kind, kind)
raise ValueError(msg)
elif kind > top_kind:
kind_defaults = False
top_kind = kind
if kind in (_POSITIONAL_ONLY, _POSITIONAL_OR_KEYWORD):
if param.default is _empty:
if kind_defaults:
# No default for this parameter, but the
# previous parameter of the same kind had
# a default
msg = 'non-default argument follows default ' \
'argument'
raise ValueError(msg)
else:
# There is a default for this parameter.
kind_defaults = True
if name in params:
msg = 'duplicate parameter name: {!r}'.format(name)
raise ValueError(msg)
params[name] = param
else:
params = OrderedDict(((param.name, param)
for param in parameters))
self._parameters = types.MappingProxyType(params)
self._return_annotation = return_annotation
@classmethod
def from_function(cls, func):
'''Constructs Signature for the given python function'''
is_duck_function = False
if not isfunction(func):
if _signature_is_functionlike(func):
is_duck_function = True
else:
# If it's not a pure Python function, and not a duck type
# of pure function:
raise TypeError('{!r} is not a Python function'.format(func))
Parameter = cls._parameter_cls
# Parameter information.
func_code = func.__code__
pos_count = func_code.co_argcount
arg_names = func_code.co_varnames
positional = tuple(arg_names[:pos_count])
keyword_only_count = func_code.co_kwonlyargcount
keyword_only = arg_names[pos_count:(pos_count + keyword_only_count)]
annotations = func.__annotations__
defaults = func.__defaults__
kwdefaults = func.__kwdefaults__
if defaults:
pos_default_count = len(defaults)
else:
pos_default_count = 0
parameters = []
# Non-keyword-only parameters w/o defaults.
non_default_count = pos_count - pos_default_count
for name in positional[:non_default_count]:
annotation = annotations.get(name, _empty)
parameters.append(Parameter(name, annotation=annotation,
kind=_POSITIONAL_OR_KEYWORD))
# ... w/ defaults.
for offset, name in enumerate(positional[non_default_count:]):
annotation = annotations.get(name, _empty)
parameters.append(Parameter(name, annotation=annotation,
kind=_POSITIONAL_OR_KEYWORD,
default=defaults[offset]))
# *args
if func_code.co_flags & CO_VARARGS:
name = arg_names[pos_count + keyword_only_count]
annotation = annotations.get(name, _empty)
parameters.append(Parameter(name, annotation=annotation,
kind=_VAR_POSITIONAL))
# Keyword-only parameters.
for name in keyword_only:
default = _empty
if kwdefaults is not None:
default = kwdefaults.get(name, _empty)
annotation = annotations.get(name, _empty)
parameters.append(Parameter(name, annotation=annotation,
kind=_KEYWORD_ONLY,
default=default))
# **kwargs
if func_code.co_flags & CO_VARKEYWORDS:
index = pos_count + keyword_only_count
if func_code.co_flags & CO_VARARGS:
index += 1
name = arg_names[index]
annotation = annotations.get(name, _empty)
parameters.append(Parameter(name, annotation=annotation,
kind=_VAR_KEYWORD))
        # If 'func' is a pure Python function, don't validate the
        # parameter list (order and defaults); it should already be correct.
return cls(parameters,
return_annotation=annotations.get('return', _empty),
__validate_parameters__=is_duck_function)
@classmethod
def from_builtin(cls, func):
return _signature_from_builtin(cls, func)
@property
def parameters(self):
return self._parameters
@property
def return_annotation(self):
return self._return_annotation
def replace(self, *, parameters=_void, return_annotation=_void):
'''Creates a customized copy of the Signature.
Pass 'parameters' and/or 'return_annotation' arguments
to override them in the new copy.
'''
if parameters is _void:
parameters = self.parameters.values()
if return_annotation is _void:
return_annotation = self._return_annotation
return type(self)(parameters,
return_annotation=return_annotation)
def __eq__(self, other):
if not isinstance(other, Signature):
return NotImplemented
if (self.return_annotation != other.return_annotation or
len(self.parameters) != len(other.parameters)):
return False
other_positions = {param: idx
for idx, param in enumerate(other.parameters.keys())}
for idx, (param_name, param) in enumerate(self.parameters.items()):
if param.kind == _KEYWORD_ONLY:
try:
other_param = other.parameters[param_name]
except KeyError:
return False
else:
if param != other_param:
return False
else:
try:
other_idx = other_positions[param_name]
except KeyError:
return False
else:
if (idx != other_idx or
param != other.parameters[param_name]):
return False
return True
def _bind(self, args, kwargs, *, partial=False):
'''Private method. Don't use directly.'''
arguments = OrderedDict()
parameters = iter(self.parameters.values())
parameters_ex = ()
arg_vals = iter(args)
while True:
# Let's iterate through the positional arguments and corresponding
# parameters
try:
arg_val = next(arg_vals)
except StopIteration:
# No more positional arguments
try:
param = next(parameters)
except StopIteration:
# No more parameters. That's it. Just need to check that
# we have no `kwargs` after this while loop
break
else:
if param.kind == _VAR_POSITIONAL:
# That's OK, just empty *args. Let's start parsing
# kwargs
break
elif param.name in kwargs:
if param.kind == _POSITIONAL_ONLY:
msg = '{arg!r} parameter is positional only, ' \
'but was passed as a keyword'
msg = msg.format(arg=param.name)
raise TypeError(msg) from None
parameters_ex = (param,)
break
elif (param.kind == _VAR_KEYWORD or
param.default is not _empty):
# That's fine too - we have a default value for this
                        # parameter. So, let's start parsing `kwargs`, starting
# with the current parameter
parameters_ex = (param,)
break
else:
# No default, not VAR_KEYWORD, not VAR_POSITIONAL,
# not in `kwargs`
if partial:
parameters_ex = (param,)
break
else:
msg = '{arg!r} parameter lacking default value'
msg = msg.format(arg=param.name)
raise TypeError(msg) from None
else:
# We have a positional argument to process
try:
param = next(parameters)
except StopIteration:
raise TypeError('too many positional arguments') from None
else:
if param.kind in (_VAR_KEYWORD, _KEYWORD_ONLY):
# Looks like we have no parameter for this positional
# argument
raise TypeError('too many positional arguments')
if param.kind == _VAR_POSITIONAL:
# We have an '*args'-like argument, let's fill it with
# all positional arguments we have left and move on to
# the next phase
values = [arg_val]
values.extend(arg_vals)
arguments[param.name] = tuple(values)
break
if param.name in kwargs:
raise TypeError('multiple values for argument '
'{arg!r}'.format(arg=param.name))
arguments[param.name] = arg_val
# Now, we iterate through the remaining parameters to process
# keyword arguments
kwargs_param = None
for param in itertools.chain(parameters_ex, parameters):
if param.kind == _VAR_KEYWORD:
# Memorize that we have a '**kwargs'-like parameter
kwargs_param = param
continue
if param.kind == _VAR_POSITIONAL:
# Named arguments don't refer to '*args'-like parameters.
# We only arrive here if the positional arguments ended
# before reaching the last parameter before *args.
continue
param_name = param.name
try:
arg_val = kwargs.pop(param_name)
except KeyError:
# We have no value for this parameter. It's fine though,
# if it has a default value, or it is an '*args'-like
# parameter, left alone by the processing of positional
# arguments.
if (not partial and param.kind != _VAR_POSITIONAL and
param.default is _empty):
raise TypeError('{arg!r} parameter lacking default value'. \
format(arg=param_name)) from None
else:
if param.kind == _POSITIONAL_ONLY:
# This should never happen in case of a properly built
# Signature object (but let's have this check here
# to ensure correct behaviour just in case)
raise TypeError('{arg!r} parameter is positional only, '
'but was passed as a keyword'. \
format(arg=param.name))
arguments[param_name] = arg_val
if kwargs:
if kwargs_param is not None:
# Process our '**kwargs'-like parameter
arguments[kwargs_param.name] = kwargs
else:
raise TypeError('too many keyword arguments')
return self._bound_arguments_cls(self, arguments)
def bind(*args, **kwargs):
        '''Get a BoundArguments object that maps the passed `args`
        and `kwargs` to the function's signature. Raises `TypeError`
        if the passed arguments cannot be bound.
'''
return args[0]._bind(args[1:], kwargs)
def bind_partial(*args, **kwargs):
        '''Get a BoundArguments object that partially maps the
        passed `args` and `kwargs` to the function's signature.
        Raises `TypeError` if the passed arguments cannot be bound.
'''
return args[0]._bind(args[1:], kwargs, partial=True)
def __str__(self):
result = []
render_pos_only_separator = False
render_kw_only_separator = True
for param in self.parameters.values():
formatted = str(param)
kind = param.kind
if kind == _POSITIONAL_ONLY:
render_pos_only_separator = True
elif render_pos_only_separator:
# It's not a positional-only parameter, and the flag
# is set to 'True' (there were pos-only params before.)
result.append('/')
render_pos_only_separator = False
if kind == _VAR_POSITIONAL:
# OK, we have an '*args'-like parameter, so we won't need
# a '*' to separate keyword-only arguments
render_kw_only_separator = False
elif kind == _KEYWORD_ONLY and render_kw_only_separator:
# We have a keyword-only parameter to render and we haven't
# rendered an '*args'-like parameter before, so add a '*'
# separator to the parameters list ("foo(arg1, *, arg2)" case)
result.append('*')
# This condition should be only triggered once, so
# reset the flag
render_kw_only_separator = False
result.append(formatted)
if render_pos_only_separator:
# There were only positional-only parameters, hence the
# flag was not reset to 'False'
result.append('/')
rendered = '({})'.format(', '.join(result))
if self.return_annotation is not _empty:
anno = formatannotation(self.return_annotation)
rendered += ' -> {}'.format(anno)
return rendered
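# --- Example (annotation for this listing): __str__ above inserts the '/'
# and '*' separators; the parameters are built by hand for the demo:
#
#     >>> s = Signature([
#     ...     Parameter('x', Parameter.POSITIONAL_ONLY),
#     ...     Parameter('y', Parameter.POSITIONAL_OR_KEYWORD),
#     ...     Parameter('z', Parameter.KEYWORD_ONLY, default=0),
#     ... ])
#     >>> str(s)
#     '(x, /, y, *, z=0)'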
def _main():
""" Logic for inspecting an object given at command line """
import argparse
import importlib
parser = argparse.ArgumentParser()
parser.add_argument(
'object',
help="The object to be analysed. "
"It supports the 'module:qualname' syntax")
parser.add_argument(
'-d', '--details', action='store_true',
help='Display info about the module rather than its source code')
args = parser.parse_args()
target = args.object
mod_name, has_attrs, attrs = target.partition(":")
try:
obj = module = importlib.import_module(mod_name)
except Exception as exc:
msg = "Failed to import {} ({}: {})".format(mod_name,
type(exc).__name__,
exc)
print(msg, file=sys.stderr)
exit(2)
if has_attrs:
parts = attrs.split(".")
obj = module
for part in parts:
obj = getattr(obj, part)
if module.__name__ in sys.builtin_module_names:
print("Can't get info for builtin modules.", file=sys.stderr)
exit(1)
if args.details:
print('Target: {}'.format(target))
print('Origin: {}'.format(getsourcefile(module)))
print('Cached: {}'.format(module.__cached__))
if obj is module:
print('Loader: {}'.format(repr(module.__loader__)))
if hasattr(module, '__path__'):
print('Submodule search path: {}'.format(module.__path__))
else:
try:
__, lineno = findsource(obj)
except Exception:
pass
else:
print('Line: {}'.format(lineno))
print('\n')
else:
print(getsource(obj))
if __name__ == "__main__":
_main()
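# --- Note (annotation for this listing): the block above gives the module
# a small command-line interface, e.g. (illustrative invocations):
#
#     python -m inspect json:dumps        # print the source of json.dumps
#     python -m inspect --details json    # show module metadata instead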
"""The io module provides the Python interfaces to stream handling. The
builtin open function is defined in this module.
At the top of the I/O hierarchy is the abstract base class IOBase. It
defines the basic interface to a stream. Note, however, that there is no
separation between reading and writing to streams; implementations are
allowed to raise an OSError if they do not support a given operation.
Extending IOBase is RawIOBase which deals simply with the reading and
writing of raw bytes to a stream. FileIO subclasses RawIOBase to provide
an interface to OS files.
BufferedIOBase deals with buffering on a raw byte stream (RawIOBase). Its
subclasses, BufferedWriter, BufferedReader, and BufferedRWPair buffer
streams that are readable, writable, and both respectively.
BufferedRandom provides a buffered interface to random access
streams. BytesIO is a simple stream of in-memory bytes.
Another IOBase subclass, TextIOBase, deals with the encoding and decoding
of streams into text. TextIOWrapper, which extends it, is a buffered text
interface to a buffered raw stream (`BufferedIOBase`). Finally, StringIO
is an in-memory stream for text.
Argument names are not part of the specification, and only the arguments
of open() are intended to be used as keyword arguments.
data:
DEFAULT_BUFFER_SIZE
An int containing the default buffer size used by the module's buffered
I/O classes. open() uses the file's blksize (as obtained by os.stat) if
possible.
"""
# New I/O library conforming to PEP 3116.
__author__ = ("Guido van Rossum <[email protected]>, "
"Mike Verdone <[email protected]>, "
"Mark Russell <[email protected]>, "
"Antoine Pitrou <[email protected]>, "
"Amaury Forgeot d'Arc <[email protected]>, "
"Benjamin Peterson <[email protected]>")
__all__ = ["BlockingIOError", "open", "IOBase", "RawIOBase", "FileIO",
"BytesIO", "StringIO", "BufferedIOBase",
"BufferedReader", "BufferedWriter", "BufferedRWPair",
"BufferedRandom", "TextIOBase", "TextIOWrapper",
"UnsupportedOperation", "SEEK_SET", "SEEK_CUR", "SEEK_END"]
import _io
import abc
from _io import (DEFAULT_BUFFER_SIZE, BlockingIOError, UnsupportedOperation,
open, FileIO, BytesIO, StringIO, BufferedReader,
BufferedWriter, BufferedRWPair, BufferedRandom,
IncrementalNewlineDecoder, TextIOWrapper)
OpenWrapper = _io.open # for compatibility with _pyio
# Pretend this exception was created here.
UnsupportedOperation.__module__ = "io"
# for seek()
SEEK_SET = 0
SEEK_CUR = 1
SEEK_END = 2
# Declaring ABCs in C is tricky so we do it here.
# Method descriptions and default implementations are inherited from the C
# version however.
class IOBase(_io._IOBase, metaclass=abc.ABCMeta):
__doc__ = _io._IOBase.__doc__
class RawIOBase(_io._RawIOBase, IOBase):
__doc__ = _io._RawIOBase.__doc__
class BufferedIOBase(_io._BufferedIOBase, IOBase):
__doc__ = _io._BufferedIOBase.__doc__
class TextIOBase(_io._TextIOBase, IOBase):
__doc__ = _io._TextIOBase.__doc__
RawIOBase.register(FileIO)
for klass in (BytesIO, BufferedReader, BufferedWriter, BufferedRandom,
BufferedRWPair):
BufferedIOBase.register(klass)
for klass in (StringIO, TextIOWrapper):
TextIOBase.register(klass)
del klass
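# --- Example (annotation for this listing): the register() calls above are
# what make these checks pass, even though the concrete classes live in
# the C-accelerated _io module rather than inheriting from the ABCs:
#
#     >>> import io
#     >>> isinstance(io.BytesIO(), io.BufferedIOBase)
#     True
#     >>> isinstance(io.StringIO(), io.TextIOBase)
#     True
#     >>> issubclass(io.FileIO, io.RawIOBase)
#     True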
# Copyright 2007 Google Inc.
# Licensed to PSF under a Contributor Agreement.
"""A fast, lightweight IPv4/IPv6 manipulation library in Python.
This library is used to create/poke/manipulate IPv4 and IPv6 addresses
and networks.
"""
__version__ = '1.0'
import functools
IPV4LENGTH = 32
IPV6LENGTH = 128
class AddressValueError(ValueError):
"""A Value Error related to the address."""
class NetmaskValueError(ValueError):
"""A Value Error related to the netmask."""
def ip_address(address):
"""Take an IP string/int and return an object of the correct type.
Args:
address: A string or integer, the IP address. Either IPv4 or
IPv6 addresses may be supplied; integers less than 2**32 will
be considered to be IPv4 by default.
Returns:
An IPv4Address or IPv6Address object.
Raises:
ValueError: if the *address* passed isn't either a v4 or a v6
address
"""
try:
return IPv4Address(address)
except (AddressValueError, NetmaskValueError):
pass
try:
return IPv6Address(address)
except (AddressValueError, NetmaskValueError):
pass
raise ValueError('%r does not appear to be an IPv4 or IPv6 address' %
address)
def ip_network(address, strict=True):
"""Take an IP string/int and return an object of the correct type.
Args:
address: A string or integer, the IP network. Either IPv4 or
IPv6 networks may be supplied; integers less than 2**32 will
be considered to be IPv4 by default.
Returns:
An IPv4Network or IPv6Network object.
Raises:
ValueError: if the string passed isn't either a v4 or a v6
address. Or if the network has host bits set.
"""
try:
return IPv4Network(address, strict)
except (AddressValueError, NetmaskValueError):
pass
try:
return IPv6Network(address, strict)
except (AddressValueError, NetmaskValueError):
pass
raise ValueError('%r does not appear to be an IPv4 or IPv6 network' %
address)
def ip_interface(address):
"""Take an IP string/int and return an object of the correct type.
Args:
address: A string or integer, the IP address. Either IPv4 or
IPv6 addresses may be supplied; integers less than 2**32 will
be considered to be IPv4 by default.
Returns:
An IPv4Interface or IPv6Interface object.
Raises:
ValueError: if the string passed isn't either a v4 or a v6
address.
Notes:
The IPv?Interface classes describe an Address on a particular
Network, so they're basically a combination of both the Address
and Network classes.
"""
try:
return IPv4Interface(address)
except (AddressValueError, NetmaskValueError):
pass
try:
return IPv6Interface(address)
except (AddressValueError, NetmaskValueError):
pass
raise ValueError('%r does not appear to be an IPv4 or IPv6 interface' %
address)
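# --- Example (annotation for this listing): the three factory functions
# above, using RFC 5737 documentation addresses:
#
#     >>> import ipaddress
#     >>> ipaddress.ip_address('192.0.2.1')
#     IPv4Address('192.0.2.1')
#     >>> ipaddress.ip_network('192.0.2.1/24', strict=False)
#     IPv4Network('192.0.2.0/24')
#     >>> iface = ipaddress.ip_interface('192.0.2.1/24')
#     >>> iface.ip, iface.network
#     (IPv4Address('192.0.2.1'), IPv4Network('192.0.2.0/24'))
#
# With the default strict=True, ip_network('192.0.2.1/24') raises
# ValueError because host bits are set.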
def v4_int_to_packed(address):
"""Represent an address as 4 packed bytes in network (big-endian) order.
Args:
address: An integer representation of an IPv4 IP address.
Returns:
The integer address packed as 4 bytes in network (big-endian) order.
Raises:
ValueError: If the integer is negative or too large to be an
IPv4 IP address.
"""
try:
return address.to_bytes(4, 'big')
    except OverflowError:
        raise ValueError("Address negative or too large for IPv4")
def v6_int_to_packed(address):
"""Represent an address as 16 packed bytes in network (big-endian) order.
Args:
address: An integer representation of an IPv6 IP address.
Returns:
The integer address packed as 16 bytes in network (big-endian) order.
"""
try:
return address.to_bytes(16, 'big')
    except OverflowError:
raise ValueError("Address negative or too large for IPv6")
def _split_optional_netmask(address):
"""Helper to split the netmask and raise AddressValueError if needed"""
addr = str(address).split('/')
if len(addr) > 2:
raise AddressValueError("Only one '/' permitted in %r" % address)
return addr
def _find_address_range(addresses):
"""Find a sequence of IPv#Address.
Args:
addresses: a list of IPv#Address objects.
Returns:
A tuple containing the first and last IP addresses in the sequence.
"""
first = last = addresses[0]
for ip in addresses[1:]:
if ip._ip == last._ip + 1:
last = ip
else:
break
return (first, last)
def _count_righthand_zero_bits(number, bits):
"""Count the number of zero bits on the right hand side.
Args:
number: an integer.
bits: maximum number of bits to count.
Returns:
The number of zero bits on the right hand side of the number.
"""
if number == 0:
return bits
for i in range(bits):
if (number >> i) & 1:
return i
# All bits of interest were zero, even if there are more in the number
return bits
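# --- Example (annotation for this listing): 0b10100000 has five trailing
# zero bits; an input of zero is capped at the 'bits' limit:
#
#     >>> _count_righthand_zero_bits(0b10100000, 32)
#     5
#     >>> _count_righthand_zero_bits(0, 32)
#     32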
def summarize_address_range(first, last):
"""Summarize a network range given the first and last IP addresses.
Example:
>>> list(summarize_address_range(IPv4Address('192.0.2.0'),
... IPv4Address('192.0.2.130')))
... #doctest: +NORMALIZE_WHITESPACE
[IPv4Network('192.0.2.0/25'), IPv4Network('192.0.2.128/31'),
IPv4Network('192.0.2.130/32')]
Args:
first: the first IPv4Address or IPv6Address in the range.
last: the last IPv4Address or IPv6Address in the range.
Returns:
An iterator of the summarized IPv(4|6) network objects.
Raise:
TypeError:
If the first and last objects are not IP addresses.
If the first and last objects are not the same version.
ValueError:
If the last object is not greater than the first.
If the version of the first address is not 4 or 6.
"""
if (not (isinstance(first, _BaseAddress) and
isinstance(last, _BaseAddress))):
raise TypeError('first and last must be IP addresses, not networks')
if first.version != last.version:
raise TypeError("%s and %s are not of the same version" % (
first, last))
if first > last:
raise ValueError('last IP address must be greater than first')
if first.version == 4:
ip = IPv4Network
elif first.version == 6:
ip = IPv6Network
else:
raise ValueError('unknown IP version')
ip_bits = first._max_prefixlen
first_int = first._ip
last_int = last._ip
while first_int <= last_int:
nbits = min(_count_righthand_zero_bits(first_int, ip_bits),
(last_int - first_int + 1).bit_length() - 1)
net = ip('%s/%d' % (first, ip_bits - nbits))
yield net
first_int += 1 << nbits
if first_int - 1 == ip._ALL_ONES:
break
first = first.__class__(first_int)
def _collapse_addresses_recursive(addresses):
"""Loops through the addresses, collapsing concurrent netblocks.
Example:
ip1 = IPv4Network('192.0.2.0/26')
ip2 = IPv4Network('192.0.2.64/26')
ip3 = IPv4Network('192.0.2.128/26')
ip4 = IPv4Network('192.0.2.192/26')
_collapse_addresses_recursive([ip1, ip2, ip3, ip4]) ->
[IPv4Network('192.0.2.0/24')]
This shouldn't be called directly; it is called via
collapse_addresses([]).
Args:
addresses: A list of IPv4Network's or IPv6Network's
Returns:
A list of IPv4Network's or IPv6Network's depending on what we were
passed.
"""
while True:
last_addr = None
ret_array = []
optimized = False
for cur_addr in addresses:
if not ret_array:
last_addr = cur_addr
ret_array.append(cur_addr)
elif (cur_addr.network_address >= last_addr.network_address and
cur_addr.broadcast_address <= last_addr.broadcast_address):
optimized = True
elif cur_addr == list(last_addr.supernet().subnets())[1]:
ret_array[-1] = last_addr = last_addr.supernet()
optimized = True
else:
last_addr = cur_addr
ret_array.append(cur_addr)
addresses = ret_array
if not optimized:
return addresses
def collapse_addresses(addresses):
"""Collapse a list of IP objects.
Example:
collapse_addresses([IPv4Network('192.0.2.0/25'),
IPv4Network('192.0.2.128/25')]) ->
[IPv4Network('192.0.2.0/24')]
Args:
addresses: An iterator of IPv4Network or IPv6Network objects.
Returns:
An iterator of the collapsed IPv(4|6)Network objects.
Raises:
TypeError: If passed a list of mixed version objects.
"""
i = 0
addrs = []
ips = []
nets = []
# split IP addresses and networks
for ip in addresses:
if isinstance(ip, _BaseAddress):
if ips and ips[-1]._version != ip._version:
raise TypeError("%s and %s are not of the same version" % (
ip, ips[-1]))
ips.append(ip)
elif ip._prefixlen == ip._max_prefixlen:
if ips and ips[-1]._version != ip._version:
raise TypeError("%s and %s are not of the same version" % (
ip, ips[-1]))
try:
ips.append(ip.ip)
except AttributeError:
ips.append(ip.network_address)
else:
if nets and nets[-1]._version != ip._version:
raise TypeError("%s and %s are not of the same version" % (
ip, nets[-1]))
nets.append(ip)
# sort and dedup
ips = sorted(set(ips))
nets = sorted(set(nets))
while i < len(ips):
(first, last) = _find_address_range(ips[i:])
i = ips.index(last) + 1
addrs.extend(summarize_address_range(first, last))
return iter(_collapse_addresses_recursive(sorted(
addrs + nets, key=_BaseNetwork._get_networks_key)))
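# --- Example (annotation for this listing): two adjacent /25 siblings
# collapse into their /24 parent:
#
#     >>> import ipaddress
#     >>> list(ipaddress.collapse_addresses(
#     ...     [ipaddress.ip_network('192.0.2.0/25'),
#     ...      ipaddress.ip_network('192.0.2.128/25')]))
#     [IPv4Network('192.0.2.0/24')]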
def get_mixed_type_key(obj):
"""Return a key suitable for sorting between networks and addresses.
Address and Network objects are not sortable by default; they're
fundamentally different so the expression
IPv4Address('192.0.2.0') <= IPv4Network('192.0.2.0/24')
    doesn't make any sense. There are times, however, when you may wish
    to have ipaddress sort these for you anyway. If you need to do this,
    you can use this function as the key= argument to sorted().
Args:
obj: either a Network or Address object.
Returns:
appropriate key.
"""
if isinstance(obj, _BaseNetwork):
return obj._get_networks_key()
elif isinstance(obj, _BaseAddress):
return obj._get_address_key()
return NotImplemented
class _IPAddressBase:
"""The mother class."""
@property
def exploded(self):
"""Return the longhand version of the IP address as a string."""
return self._explode_shorthand_ip_string()
@property
def compressed(self):
"""Return the shorthand version of the IP address as a string."""
return str(self)
@property
def version(self):
msg = '%200s has no version specified' % (type(self),)
raise NotImplementedError(msg)
def _check_int_address(self, address):
if address < 0:
msg = "%d (< 0) is not permitted as an IPv%d address"
raise AddressValueError(msg % (address, self._version))
if address > self._ALL_ONES:
msg = "%d (>= 2**%d) is not permitted as an IPv%d address"
raise AddressValueError(msg % (address, self._max_prefixlen,
self._version))
def _check_packed_address(self, address, expected_len):
address_len = len(address)
if address_len != expected_len:
msg = "%r (len %d != %d) is not permitted as an IPv%d address"
raise AddressValueError(msg % (address, address_len,
expected_len, self._version))
def _ip_int_from_prefix(self, prefixlen):
"""Turn the prefix length into a bitwise netmask
Args:
prefixlen: An integer, the prefix length.
Returns:
An integer.
"""
return self._ALL_ONES ^ (self._ALL_ONES >> prefixlen)
def _prefix_from_ip_int(self, ip_int):
"""Return prefix length from the bitwise netmask.
Args:
ip_int: An integer, the netmask in expanded bitwise format
Returns:
An integer, the prefix length.
Raises:
ValueError: If the input intermingles zeroes & ones
"""
trailing_zeroes = _count_righthand_zero_bits(ip_int,
self._max_prefixlen)
prefixlen = self._max_prefixlen - trailing_zeroes
leading_ones = ip_int >> trailing_zeroes
all_ones = (1 << prefixlen) - 1
if leading_ones != all_ones:
byteslen = self._max_prefixlen // 8
details = ip_int.to_bytes(byteslen, 'big')
msg = 'Netmask pattern %r mixes zeroes & ones'
raise ValueError(msg % details)
return prefixlen
def _report_invalid_netmask(self, netmask_str):
msg = '%r is not a valid netmask' % netmask_str
raise NetmaskValueError(msg) from None
def _prefix_from_prefix_string(self, prefixlen_str):
"""Return prefix length from a numeric string
Args:
prefixlen_str: The string to be converted
Returns:
An integer, the prefix length.
Raises:
NetmaskValueError: If the input is not a valid netmask
"""
# int allows a leading +/- as well as surrounding whitespace,
# so we ensure that isn't the case
if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
self._report_invalid_netmask(prefixlen_str)
try:
prefixlen = int(prefixlen_str)
except ValueError:
self._report_invalid_netmask(prefixlen_str)
if not (0 <= prefixlen <= self._max_prefixlen):
self._report_invalid_netmask(prefixlen_str)
return prefixlen
def _prefix_from_ip_string(self, ip_str):
"""Turn a netmask/hostmask string into a prefix length
Args:
ip_str: The netmask/hostmask to be converted
Returns:
An integer, the prefix length.
Raises:
NetmaskValueError: If the input is not a valid netmask/hostmask
"""
# Parse the netmask/hostmask like an IP address.
try:
ip_int = self._ip_int_from_string(ip_str)
except AddressValueError:
self._report_invalid_netmask(ip_str)
# Try matching a netmask (this would be /1*0*/ as a bitwise regexp).
# Note that the two ambiguous cases (all-ones and all-zeroes) are
# treated as netmasks.
try:
return self._prefix_from_ip_int(ip_int)
except ValueError:
pass
# Invert the bits, and try matching a /0+1+/ hostmask instead.
ip_int ^= self._ALL_ONES
try:
return self._prefix_from_ip_int(ip_int)
except ValueError:
self._report_invalid_netmask(ip_str)
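# --- Example (annotation for this listing): netmask and hostmask strings
# both normalize to the same prefix length via the helper above:
#
#     >>> import ipaddress
#     >>> ipaddress.ip_network('192.0.2.0/255.255.255.0')
#     IPv4Network('192.0.2.0/24')
#     >>> ipaddress.ip_network('192.0.2.0/0.0.0.255')
#     IPv4Network('192.0.2.0/24')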
@functools.total_ordering
class _BaseAddress(_IPAddressBase):
"""A generic IP object.
This IP class contains the version independent methods which are
used by single IP addresses.
"""
def __init__(self, address):
if (not isinstance(address, bytes)
and '/' in str(address)):
raise AddressValueError("Unexpected '/' in %r" % address)
def __int__(self):
return self._ip
def __eq__(self, other):
try:
return (self._ip == other._ip
and self._version == other._version)
except AttributeError:
return NotImplemented
def __lt__(self, other):
if not isinstance(other, _BaseAddress):
return NotImplemented
if self._version != other._version:
raise TypeError('%s and %s are not of the same version' % (
self, other))
if self._ip != other._ip:
return self._ip < other._ip
return False
    # Shorthand for integer addition and subtraction. This is not
    # meant to ever support addition/subtraction of two addresses.
def __add__(self, other):
if not isinstance(other, int):
return NotImplemented
return self.__class__(int(self) + other)
def __sub__(self, other):
if not isinstance(other, int):
return NotImplemented
return self.__class__(int(self) - other)
def __repr__(self):
return '%s(%r)' % (self.__class__.__name__, str(self))
def __str__(self):
return str(self._string_from_ip_int(self._ip))
def __hash__(self):
return hash(hex(int(self._ip)))
def _get_address_key(self):
return (self._version, self)
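# --- Example (annotation for this listing): addresses convert to int and
# support +/- with plain integers, per __add__/__sub__ above:
#
#     >>> import ipaddress
#     >>> a = ipaddress.ip_address('192.0.2.1')
#     >>> int(a)
#     3221225985
#     >>> a + 1
#     IPv4Address('192.0.2.2')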
@functools.total_ordering
class _BaseNetwork(_IPAddressBase):
"""A generic IP network object.
This IP class contains the version independent methods which are
used by networks.
"""
def __init__(self, address):
self._cache = {}
def __repr__(self):
return '%s(%r)' % (self.__class__.__name__, str(self))
def __str__(self):
return '%s/%d' % (self.network_address, self.prefixlen)
def hosts(self):
"""Generate Iterator over usable hosts in a network.
This is like __iter__ except it doesn't return the network
or broadcast addresses.
"""
network = int(self.network_address)
broadcast = int(self.broadcast_address)
for x in range(network + 1, broadcast):
yield self._address_class(x)
def __iter__(self):
network = int(self.network_address)
broadcast = int(self.broadcast_address)
for x in range(network, broadcast + 1):
yield self._address_class(x)
def __getitem__(self, n):
network = int(self.network_address)
broadcast = int(self.broadcast_address)
if n >= 0:
if network + n > broadcast:
raise IndexError
return self._address_class(network + n)
else:
n += 1
if broadcast + n < network:
raise IndexError
return self._address_class(broadcast + n)
def __lt__(self, other):
if not isinstance(other, _BaseNetwork):
return NotImplemented
if self._version != other._version:
raise TypeError('%s and %s are not of the same version' % (
self, other))
if self.network_address != other.network_address:
return self.network_address < other.network_address
if self.netmask != other.netmask:
return self.netmask < other.netmask
return False
def __eq__(self, other):
try:
return (self._version == other._version and
self.network_address == other.network_address and
int(self.netmask) == int(other.netmask))
except AttributeError:
return NotImplemented
def __hash__(self):
return hash(int(self.network_address) ^ int(self.netmask))
def __contains__(self, other):
# always false if one is v4 and the other is v6.
if self._version != other._version:
return False
# dealing with another network.
if isinstance(other, _BaseNetwork):
return False
# dealing with another address
else:
# address
return (int(self.network_address) <= int(other._ip) <=
int(self.broadcast_address))
def overlaps(self, other):
"""Tell if self is partly contained in other."""
return self.network_address in other or (
self.broadcast_address in other or (
other.network_address in self or (
other.broadcast_address in self)))
@property
def broadcast_address(self):
x = self._cache.get('broadcast_address')
if x is None:
x = self._address_class(int(self.network_address) |
int(self.hostmask))
self._cache['broadcast_address'] = x
return x
@property
def hostmask(self):
x = self._cache.get('hostmask')
if x is None:
x = self._address_class(int(self.netmask) ^ self._ALL_ONES)
self._cache['hostmask'] = x
return x
@property
def with_prefixlen(self):
return '%s/%d' % (self.network_address, self._prefixlen)
@property
def with_netmask(self):
return '%s/%s' % (self.network_address, self.netmask)
@property
def with_hostmask(self):
return '%s/%s' % (self.network_address, self.hostmask)
@property
def num_addresses(self):
"""Number of hosts in the current subnet."""
return int(self.broadcast_address) - int(self.network_address) + 1
@property
def _address_class(self):
# Returning bare address objects (rather than interfaces) allows for
# more consistent behaviour across the network address, broadcast
# address and individual host addresses.
msg = '%200s has no associated address class' % (type(self),)
raise NotImplementedError(msg)
@property
def prefixlen(self):
return self._prefixlen
def address_exclude(self, other):
"""Remove an address from a larger block.
For example:
addr1 = ip_network('192.0.2.0/28')
addr2 = ip_network('192.0.2.1/32')
addr1.address_exclude(addr2) =
[IPv4Network('192.0.2.0/32'), IPv4Network('192.0.2.2/31'),
IPv4Network('192.0.2.4/30'), IPv4Network('192.0.2.8/29')]
or IPv6:
addr1 = ip_network('2001:db8::1/32')
addr2 = ip_network('2001:db8::1/128')
addr1.address_exclude(addr2) =
[ip_network('2001:db8::1/128'),
ip_network('2001:db8::2/127'),
ip_network('2001:db8::4/126'),
ip_network('2001:db8::8/125'),
...
ip_network('2001:db8:8000::/33')]
Args:
other: An IPv4Network or IPv6Network object of the same type.
Returns:
An iterator of the IPv(4|6)Network objects which is self
minus other.
Raises:
TypeError: If self and other are of differing address
versions, or if other is not a network object.
ValueError: If other is not completely contained by self.
"""
if not self._version == other._version:
raise TypeError("%s and %s are not of the same version" % (
self, other))
if not isinstance(other, _BaseNetwork):
raise TypeError("%s is not a network object" % other)
if not (other.network_address >= self.network_address and
other.broadcast_address <= self.broadcast_address):
raise ValueError('%s not contained in %s' % (other, self))
        if other == self:
            # Nothing left over; end the generator (raising StopIteration
            # inside a generator is no longer allowed per PEP 479).
            return
# Make sure we're comparing the network of other.
other = other.__class__('%s/%s' % (other.network_address,
other.prefixlen))
s1, s2 = self.subnets()
while s1 != other and s2 != other:
if (other.network_address >= s1.network_address and
other.broadcast_address <= s1.broadcast_address):
yield s2
s1, s2 = s1.subnets()
elif (other.network_address >= s2.network_address and
other.broadcast_address <= s2.broadcast_address):
yield s1
s1, s2 = s2.subnets()
else:
# If we got here, there's a bug somewhere.
raise AssertionError('Error performing exclusion: '
's1: %s s2: %s other: %s' %
(s1, s2, other))
if s1 == other:
yield s2
elif s2 == other:
yield s1
else:
# If we got here, there's a bug somewhere.
raise AssertionError('Error performing exclusion: '
's1: %s s2: %s other: %s' %
(s1, s2, other))
def compare_networks(self, other):
"""Compare two IP objects.
This is only concerned about the comparison of the integer
representation of the network addresses. This means that the
host bits aren't considered at all in this method. If you want
to compare host bits, you can easily enough do a
'HostA._ip < HostB._ip'
Args:
other: An IP object.
Returns:
If the IP versions of self and other are the same, returns:
-1 if self < other:
eg: IPv4Network('192.0.2.0/25') < IPv4Network('192.0.2.128/25')
IPv6Network('2001:db8::1000/124') <
IPv6Network('2001:db8::2000/124')
0 if self == other
eg: IPv4Network('192.0.2.0/24') == IPv4Network('192.0.2.0/24')
IPv6Network('2001:db8::1000/124') ==
IPv6Network('2001:db8::1000/124')
1 if self > other
eg: IPv4Network('192.0.2.128/25') > IPv4Network('192.0.2.0/25')
IPv6Network('2001:db8::2000/124') >
IPv6Network('2001:db8::1000/124')
Raises:
TypeError if the IP versions are different.
"""
# does this need to raise a ValueError?
if self._version != other._version:
raise TypeError('%s and %s are not of the same type' % (
self, other))
# self._version == other._version below here:
if self.network_address < other.network_address:
return -1
if self.network_address > other.network_address:
return 1
# self.network_address == other.network_address below here:
if self.netmask < other.netmask:
return -1
if self.netmask > other.netmask:
return 1
return 0
def _get_networks_key(self):
"""Network-only key function.
Returns an object that identifies this address' network and
netmask. This function is a suitable "key" argument for sorted()
and list.sort().
"""
return (self._version, self.network_address, self.netmask)
def subnets(self, prefixlen_diff=1, new_prefix=None):
"""The subnets which join to make the current subnet.
In the case that self contains only one IP
(self._prefixlen == 32 for IPv4 or self._prefixlen == 128
for IPv6), yield an iterator with just ourself.
Args:
prefixlen_diff: An integer, the amount the prefix length
should be increased by. This should not be set if
new_prefix is also set.
new_prefix: The desired new prefix length. This must be a
larger number (smaller prefix) than the existing prefix.
This should not be set if prefixlen_diff is also set.
Returns:
An iterator of IPv(4|6) objects.
Raises:
ValueError: The prefixlen_diff is too small or too large.
OR
prefixlen_diff and new_prefix are both set or new_prefix
is a smaller number than the current prefix (smaller
number means a larger network)
"""
if self._prefixlen == self._max_prefixlen:
yield self
return
if new_prefix is not None:
if new_prefix < self._prefixlen:
raise ValueError('new prefix must be longer')
if prefixlen_diff != 1:
raise ValueError('cannot set prefixlen_diff and new_prefix')
prefixlen_diff = new_prefix - self._prefixlen
if prefixlen_diff < 0:
raise ValueError('prefix length diff must be > 0')
new_prefixlen = self._prefixlen + prefixlen_diff
if new_prefixlen > self._max_prefixlen:
raise ValueError(
'prefix length diff %d is invalid for netblock %s' % (
new_prefixlen, self))
first = self.__class__('%s/%s' %
(self.network_address,
self._prefixlen + prefixlen_diff))
yield first
current = first
while True:
broadcast = current.broadcast_address
if broadcast == self.broadcast_address:
return
new_addr = self._address_class(int(broadcast) + 1)
current = self.__class__('%s/%s' % (new_addr,
new_prefixlen))
yield current
def supernet(self, prefixlen_diff=1, new_prefix=None):
"""The supernet containing the current network.
Args:
prefixlen_diff: An integer, the amount the prefix length of
the network should be decreased by. For example, given a
/24 network and a prefixlen_diff of 3, a supernet with a
/21 netmask is returned.
new_prefix: The desired new prefix length. This must be a
smaller number (larger network) than the existing prefix.
This should not be set if prefixlen_diff is also set.
Returns:
An IPv(4|6) network object matching the version of self.
Raises:
ValueError: If self.prefixlen - prefixlen_diff < 0. I.e., you have
a negative prefix length.
OR
If prefixlen_diff and new_prefix are both set or new_prefix is a
larger number than the current prefix (larger number means a
smaller network)
"""
if self._prefixlen == 0:
return self
if new_prefix is not None:
if new_prefix > self._prefixlen:
raise ValueError('new prefix must be shorter')
if prefixlen_diff != 1:
raise ValueError('cannot set prefixlen_diff and new_prefix')
prefixlen_diff = self._prefixlen - new_prefix
if self.prefixlen - prefixlen_diff < 0:
raise ValueError(
'current prefixlen is %d, cannot have a prefixlen_diff of %d' %
(self.prefixlen, prefixlen_diff))
# TODO (pmoody): optimize this.
t = self.__class__('%s/%d' % (self.network_address,
self.prefixlen - prefixlen_diff),
strict=False)
return t.__class__('%s/%d' % (t.network_address, t.prefixlen))
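# Illustrative usage (editor's sketch): supernet() walks the other way and
# shortens the prefix. Assuming the standard 'ipaddress' import:
#
#   >>> import ipaddress
#   >>> ipaddress.IPv4Network('192.0.2.0/24').supernet(prefixlen_diff=2)
#   IPv4Network('192.0.0.0/22')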
@property
def is_multicast(self):
"""Test if the address is reserved for multicast use.
Returns:
A boolean, True if the address is a multicast address.
See RFC 2373 2.7 for details.
"""
return (self.network_address.is_multicast and
self.broadcast_address.is_multicast)
@property
def is_reserved(self):
"""Test if the address is otherwise IETF reserved.
Returns:
A boolean, True if the address is within one of the
reserved IPv6 Network ranges.
"""
return (self.network_address.is_reserved and
self.broadcast_address.is_reserved)
@property
def is_link_local(self):
"""Test if the address is reserved for link-local.
Returns:
A boolean, True if the address is reserved per RFC 4291.
"""
return (self.network_address.is_link_local and
self.broadcast_address.is_link_local)
@property
def is_private(self):
"""Test if this address is allocated for private networks.
Returns:
A boolean, True if the address is reserved per
iana-ipv4-special-registry or iana-ipv6-special-registry.
"""
return (self.network_address.is_private and
self.broadcast_address.is_private)
@property
def is_global(self):
"""Test if this address is allocated for public networks.
Returns:
A boolean, True if the address is not reserved per
iana-ipv4-special-registry or iana-ipv6-special-registry.
"""
return not self.is_private
@property
def is_unspecified(self):
"""Test if the address is unspecified.
Returns:
A boolean, True if this is the unspecified address as defined in
RFC 2373 2.5.2.
"""
return (self.network_address.is_unspecified and
self.broadcast_address.is_unspecified)
@property
def is_loopback(self):
"""Test if the address is a loopback address.
Returns:
A boolean, True if the address is a loopback address as defined in
RFC 2373 2.5.3.
"""
return (self.network_address.is_loopback and
self.broadcast_address.is_loopback)
class _BaseV4:
"""Base IPv4 object.
The following methods are used by IPv4 objects in both single IP
addresses and networks.
"""
# Equivalent to 255.255.255.255 or 32 bits of 1's.
_ALL_ONES = (2**IPV4LENGTH) - 1
_DECIMAL_DIGITS = frozenset('0123456789')
# the valid octets for host and netmasks. only useful for IPv4.
_valid_mask_octets = frozenset((255, 254, 252, 248, 240, 224, 192, 128, 0))
def __init__(self, address):
self._version = 4
self._max_prefixlen = IPV4LENGTH
def _explode_shorthand_ip_string(self):
return str(self)
def _ip_int_from_string(self, ip_str):
"""Turn the given IP string into an integer for comparison.
Args:
ip_str: A string, the IP ip_str.
Returns:
The IP ip_str as an integer.
Raises:
AddressValueError: if ip_str isn't a valid IPv4 Address.
"""
if not ip_str:
raise AddressValueError('Address cannot be empty')
octets = ip_str.split('.')
if len(octets) != 4:
raise AddressValueError("Expected 4 octets in %r" % ip_str)
try:
return int.from_bytes(map(self._parse_octet, octets), 'big')
except ValueError as exc:
raise AddressValueError("%s in %r" % (exc, ip_str)) from None
def _parse_octet(self, octet_str):
"""Convert a decimal octet into an integer.
Args:
octet_str: A string, the number to parse.
Returns:
The octet as an integer.
Raises:
ValueError: if the octet isn't strictly a decimal from [0..255].
"""
if not octet_str:
raise ValueError("Empty octet not permitted")
# Whitelist the characters, since int() allows a lot of bizarre stuff.
if not self._DECIMAL_DIGITS.issuperset(octet_str):
msg = "Only decimal digits permitted in %r"
raise ValueError(msg % octet_str)
# We do the length check second, since the invalid character error
# is likely to be more informative for the user
if len(octet_str) > 3:
msg = "At most 3 characters permitted in %r"
raise ValueError(msg % octet_str)
# Convert to integer (we know digits are legal)
octet_int = int(octet_str, 10)
# Any octets that look like they *might* be written in octal,
# and which don't look exactly the same in both octal and
# decimal are rejected as ambiguous
if octet_int > 7 and octet_str[0] == '0':
msg = "Ambiguous (octal/decimal) value in %r not permitted"
raise ValueError(msg % octet_str)
if octet_int > 255:
raise ValueError("Octet %d (> 255) not permitted" % octet_int)
return octet_int
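# Illustrative usage (editor's sketch) of the octet rules above; a leading
# zero on a value above 7 is rejected as ambiguous octal/decimal. Assuming
# the standard 'ipaddress' import:
#
#   >>> import ipaddress
#   >>> ipaddress.IPv4Address('192.0.2.08')
#   Traceback (most recent call last):
#     ...
#   AddressValueError: Ambiguous (octal/decimal) value in '08' not permitted in '192.0.2.08'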
def _string_from_ip_int(self, ip_int):
"""Turns a 32-bit integer into dotted decimal notation.
Args:
ip_int: An integer, the IP address.
Returns:
The IP address as a string in dotted decimal notation.
"""
return '.'.join(map(str, ip_int.to_bytes(4, 'big')))
def _is_valid_netmask(self, netmask):
"""Verify that the netmask is valid.
Args:
netmask: A string, either a prefix or dotted decimal
netmask.
Returns:
A boolean, True if the prefix represents a valid IPv4
netmask.
"""
mask = netmask.split('.')
if len(mask) == 4:
try:
for x in mask:
if int(x) not in self._valid_mask_octets:
return False
except ValueError:
# Found something that isn't an integer or isn't valid
return False
for idx, y in enumerate(mask):
if idx > 0 and int(y) > int(mask[idx - 1]):
return False
return True
try:
netmask = int(netmask)
except ValueError:
return False
return 0 <= netmask <= self._max_prefixlen
def _is_hostmask(self, ip_str):
"""Test if the IP string is a hostmask (rather than a netmask).
Args:
ip_str: A string, the potential hostmask.
Returns:
A boolean, True if the IP string is a hostmask.
"""
bits = ip_str.split('.')
try:
parts = [x for x in map(int, bits) if x in self._valid_mask_octets]
except ValueError:
return False
if len(parts) != len(bits):
return False
if parts[0] < parts[-1]:
return True
return False
@property
def max_prefixlen(self):
return self._max_prefixlen
@property
def version(self):
return self._version
class IPv4Address(_BaseV4, _BaseAddress):
"""Represent and manipulate single IPv4 Addresses."""
def __init__(self, address):
"""
Args:
address: A string or integer representing the IP
Additionally, an integer can be passed, so
IPv4Address('192.0.2.1') == IPv4Address(3221225985).
or, more generally
IPv4Address(int(IPv4Address('192.0.2.1'))) ==
IPv4Address('192.0.2.1')
Raises:
AddressValueError: If ipaddress isn't a valid IPv4 address.
"""
_BaseAddress.__init__(self, address)
_BaseV4.__init__(self, address)
# Efficient constructor from integer.
if isinstance(address, int):
self._check_int_address(address)
self._ip = address
return
# Constructing from a packed address
if isinstance(address, bytes):
self._check_packed_address(address, 4)
self._ip = int.from_bytes(address, 'big')
return
# Assume input argument to be string or any object representation
# which converts into a formatted IP string.
addr_str = str(address)
self._ip = self._ip_int_from_string(addr_str)
@property
def packed(self):
"""The binary representation of this address."""
return v4_int_to_packed(self._ip)
@property
def is_reserved(self):
"""Test if the address is otherwise IETF reserved.
Returns:
A boolean, True if the address is within the
reserved IPv4 Network range.
"""
reserved_network = IPv4Network('240.0.0.0/4')
return self in reserved_network
@property
@functools.lru_cache()
def is_private(self):
"""Test if this address is allocated for private networks.
Returns:
A boolean, True if the address is reserved per
iana-ipv4-special-registry.
"""
return (self in IPv4Network('0.0.0.0/8') or
self in IPv4Network('10.0.0.0/8') or
self in IPv4Network('127.0.0.0/8') or
self in IPv4Network('169.254.0.0/16') or
self in IPv4Network('172.16.0.0/12') or
self in IPv4Network('192.0.0.0/29') or
self in IPv4Network('192.0.0.170/31') or
self in IPv4Network('192.0.2.0/24') or
self in IPv4Network('192.168.0.0/16') or
self in IPv4Network('198.18.0.0/15') or
self in IPv4Network('198.51.100.0/24') or
self in IPv4Network('203.0.113.0/24') or
self in IPv4Network('240.0.0.0/4') or
self in IPv4Network('255.255.255.255/32'))
@property
def is_multicast(self):
"""Test if the address is reserved for multicast use.
Returns:
A boolean, True if the address is multicast.
See RFC 3171 for details.
"""
multicast_network = IPv4Network('224.0.0.0/4')
return self in multicast_network
@property
def is_unspecified(self):
"""Test if the address is unspecified.
Returns:
A boolean, True if this is the unspecified address as defined in
RFC 5735 3.
"""
unspecified_address = IPv4Address('0.0.0.0')
return self == unspecified_address
@property
def is_loopback(self):
"""Test if the address is a loopback address.
Returns:
A boolean, True if the address is a loopback per RFC 3330.
"""
loopback_network = IPv4Network('127.0.0.0/8')
return self in loopback_network
@property
def is_link_local(self):
"""Test if the address is reserved for link-local.
Returns:
A boolean, True if the address is link-local per RFC 3927.
"""
linklocal_network = IPv4Network('169.254.0.0/16')
return self in linklocal_network
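# Illustrative usage (editor's sketch): the interchangeable constructor
# forms described in __init__ above. Assuming the standard 'ipaddress':
#
#   >>> import ipaddress
#   >>> int(ipaddress.IPv4Address('192.0.2.1'))
#   3221225985
#   >>> ipaddress.IPv4Address(3221225985) == ipaddress.IPv4Address(b'\xc0\x00\x02\x01')
#   True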
class IPv4Interface(IPv4Address):
def __init__(self, address):
if isinstance(address, (bytes, int)):
IPv4Address.__init__(self, address)
self.network = IPv4Network(self._ip)
self._prefixlen = self._max_prefixlen
return
addr = _split_optional_netmask(address)
IPv4Address.__init__(self, addr[0])
self.network = IPv4Network(address, strict=False)
self._prefixlen = self.network._prefixlen
self.netmask = self.network.netmask
self.hostmask = self.network.hostmask
def __str__(self):
return '%s/%d' % (self._string_from_ip_int(self._ip),
self.network.prefixlen)
def __eq__(self, other):
address_equal = IPv4Address.__eq__(self, other)
if not address_equal or address_equal is NotImplemented:
return address_equal
try:
return self.network == other.network
except AttributeError:
# An interface with an associated network is NOT the
# same as an unassociated address. That's why the hash
# takes the extra info into account.
return False
def __lt__(self, other):
address_less = IPv4Address.__lt__(self, other)
if address_less is NotImplemented:
return NotImplemented
try:
return self.network < other.network
except AttributeError:
# We *do* allow addresses and interfaces to be sorted. The
# unassociated address is considered less than all interfaces.
return False
def __hash__(self):
return self._ip ^ self._prefixlen ^ int(self.network.network_address)
@property
def ip(self):
return IPv4Address(self._ip)
@property
def with_prefixlen(self):
return '%s/%s' % (self._string_from_ip_int(self._ip),
self._prefixlen)
@property
def with_netmask(self):
return '%s/%s' % (self._string_from_ip_int(self._ip),
self.netmask)
@property
def with_hostmask(self):
return '%s/%s' % (self._string_from_ip_int(self._ip),
self.hostmask)
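# Illustrative usage (editor's sketch) of the three string forms above.
# Assuming the standard 'ipaddress' import:
#
#   >>> import ipaddress
#   >>> iface = ipaddress.IPv4Interface('192.0.2.5/24')
#   >>> iface.with_prefixlen, iface.with_netmask, iface.with_hostmask
#   ('192.0.2.5/24', '192.0.2.5/255.255.255.0', '192.0.2.5/0.0.0.255')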
class IPv4Network(_BaseV4, _BaseNetwork):
"""This class represents and manipulates 32-bit IPv4 network + addresses..
Attributes: [examples for IPv4Network('192.0.2.0/27')]
.network_address: IPv4Address('192.0.2.0')
.hostmask: IPv4Address('0.0.0.31')
.broadcast_address: IPv4Address('192.0.2.32')
.netmask: IPv4Address('255.255.255.224')
.prefixlen: 27
"""
# Class to use when creating address objects
_address_class = IPv4Address
def __init__(self, address, strict=True):
"""Instantiate a new IPv4 network object.
Args:
address: A string or integer representing the IP [& network].
'192.0.2.0/24'
'192.0.2.0/255.255.255.0'
'192.0.0.2/0.0.0.255'
are all functionally the same in IPv4. Similarly,
'192.0.2.1'
'192.0.2.1/255.255.255.255'
'192.0.2.1/32'
are also functionally equivalent. That is to say, failing to
provide a subnetmask will create an object with a mask of /32.
If the mask (portion after the / in the argument) is given in
dotted quad form, it is treated as a netmask if it starts with a
non-zero field (e.g. /255.0.0.0 == /8) and as a hostmask if it
starts with a zero field (e.g. 0.255.255.255 == /8), with the
single exception of an all-zero mask which is treated as a
netmask == /0. If no mask is given, a default of /32 is used.
Additionally, an integer can be passed, so
IPv4Network('192.0.2.1') == IPv4Network(3221225985)
or, more generally
IPv4Interface(int(IPv4Interface('192.0.2.1'))) ==
IPv4Interface('192.0.2.1')
Raises:
AddressValueError: If ipaddress isn't a valid IPv4 address.
NetmaskValueError: If the netmask isn't valid for
an IPv4 address.
ValueError: If strict is True and a network address is not
supplied.
"""
_BaseV4.__init__(self, address)
_BaseNetwork.__init__(self, address)
# Constructing from a packed address
if isinstance(address, bytes):
self.network_address = IPv4Address(address)
self._prefixlen = self._max_prefixlen
self.netmask = IPv4Address(self._ALL_ONES)
#fixme: address/network test here
return
# Efficient constructor from integer.
if isinstance(address, int):
self.network_address = IPv4Address(address)
self._prefixlen = self._max_prefixlen
self.netmask = IPv4Address(self._ALL_ONES)
#fixme: address/network test here.
return
# Assume input argument to be string or any object representation
# which converts into a formatted IP prefix string.
addr = _split_optional_netmask(address)
self.network_address = IPv4Address(self._ip_int_from_string(addr[0]))
if len(addr) == 2:
try:
# Check for a netmask in prefix length form
self._prefixlen = self._prefix_from_prefix_string(addr[1])
except NetmaskValueError:
# Check for a netmask or hostmask in dotted-quad form.
# This may raise NetmaskValueError.
self._prefixlen = self._prefix_from_ip_string(addr[1])
else:
self._prefixlen = self._max_prefixlen
self.netmask = IPv4Address(self._ip_int_from_prefix(self._prefixlen))
if strict:
if (IPv4Address(int(self.network_address) & int(self.netmask)) !=
self.network_address):
raise ValueError('%s has host bits set' % self)
self.network_address = IPv4Address(int(self.network_address) &
int(self.netmask))
if self._prefixlen == (self._max_prefixlen - 1):
self.hosts = self.__iter__
@property
@functools.lru_cache()
def is_global(self):
"""Test if this address is allocated for public networks.
Returns:
A boolean, True if the address is not reserved per
iana-ipv4-special-registry.
"""
return (not (self.network_address in IPv4Network('100.64.0.0/10') and
self.broadcast_address in IPv4Network('100.64.0.0/10')) and
not self.is_private)
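# Illustrative usage (editor's sketch) of strict vs. non-strict construction
# per the __init__ docstring above. Assuming the standard 'ipaddress':
#
#   >>> import ipaddress
#   >>> ipaddress.IPv4Network('192.0.2.1/24')
#   Traceback (most recent call last):
#     ...
#   ValueError: 192.0.2.1/24 has host bits set
#   >>> ipaddress.IPv4Network('192.0.2.1/24', strict=False)
#   IPv4Network('192.0.2.0/24')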
class _BaseV6:
"""Base IPv6 object.
The following methods are used by IPv6 objects in both single IP
addresses and networks.
"""
_ALL_ONES = (2**IPV6LENGTH) - 1
_HEXTET_COUNT = 8
_HEX_DIGITS = frozenset('0123456789ABCDEFabcdef')
def __init__(self, address):
self._version = 6
self._max_prefixlen = IPV6LENGTH
def _ip_int_from_string(self, ip_str):
"""Turn an IPv6 ip_str into an integer.
Args:
ip_str: A string, the IPv6 ip_str.
Returns:
An int, the IPv6 address
Raises:
AddressValueError: if ip_str isn't a valid IPv6 Address.
"""
if not ip_str:
raise AddressValueError('Address cannot be empty')
parts = ip_str.split(':')
# An IPv6 address needs at least 2 colons (3 parts).
_min_parts = 3
if len(parts) < _min_parts:
msg = "At least %d parts expected in %r" % (_min_parts, ip_str)
raise AddressValueError(msg)
# If the address has an IPv4-style suffix, convert it to hexadecimal.
if '.' in parts[-1]:
try:
ipv4_int = IPv4Address(parts.pop())._ip
except AddressValueError as exc:
raise AddressValueError("%s in %r" % (exc, ip_str)) from None
parts.append('%x' % ((ipv4_int >> 16) & 0xFFFF))
parts.append('%x' % (ipv4_int & 0xFFFF))
# An IPv6 address can't have more than 8 colons (9 parts).
# The extra colon comes from using the "::" notation for a single
# leading or trailing zero part.
_max_parts = self._HEXTET_COUNT + 1
if len(parts) > _max_parts:
msg = "At most %d colons permitted in %r" % (_max_parts-1, ip_str)
raise AddressValueError(msg)
# Disregarding the endpoints, find '::' with nothing in between.
# This indicates that a run of zeroes has been skipped.
skip_index = None
for i in range(1, len(parts) - 1):
if not parts[i]:
if skip_index is not None:
# Can't have more than one '::'
msg = "At most one '::' permitted in %r" % ip_str
raise AddressValueError(msg)
skip_index = i
# parts_hi is the number of parts to copy from above/before the '::'
# parts_lo is the number of parts to copy from below/after the '::'
if skip_index is not None:
# If we found a '::', then check if it also covers the endpoints.
parts_hi = skip_index
parts_lo = len(parts) - skip_index - 1
if not parts[0]:
parts_hi -= 1
if parts_hi:
msg = "Leading ':' only permitted as part of '::' in %r"
raise AddressValueError(msg % ip_str) # ^: requires ^::
if not parts[-1]:
parts_lo -= 1
if parts_lo:
msg = "Trailing ':' only permitted as part of '::' in %r"
raise AddressValueError(msg % ip_str) # :$ requires ::$
parts_skipped = self._HEXTET_COUNT - (parts_hi + parts_lo)
if parts_skipped < 1:
msg = "Expected at most %d other parts with '::' in %r"
raise AddressValueError(msg % (self._HEXTET_COUNT-1, ip_str))
else:
# Otherwise, allocate the entire address to parts_hi. The
# endpoints could still be empty, but _parse_hextet() will check
# for that.
if len(parts) != self._HEXTET_COUNT:
msg = "Exactly %d parts expected without '::' in %r"
raise AddressValueError(msg % (self._HEXTET_COUNT, ip_str))
if not parts[0]:
msg = "Leading ':' only permitted as part of '::' in %r"
raise AddressValueError(msg % ip_str) # ^: requires ^::
if not parts[-1]:
msg = "Trailing ':' only permitted as part of '::' in %r"
raise AddressValueError(msg % ip_str) # :$ requires ::$
parts_hi = len(parts)
parts_lo = 0
parts_skipped = 0
try:
# Now, parse the hextets into a 128-bit integer.
ip_int = 0
for i in range(parts_hi):
ip_int <<= 16
ip_int |= self._parse_hextet(parts[i])
ip_int <<= 16 * parts_skipped
for i in range(-parts_lo, 0):
ip_int <<= 16
ip_int |= self._parse_hextet(parts[i])
return ip_int
except ValueError as exc:
raise AddressValueError("%s in %r" % (exc, ip_str)) from None
def _parse_hextet(self, hextet_str):
"""Convert an IPv6 hextet string into an integer.
Args:
hextet_str: A string, the number to parse.
Returns:
The hextet as an integer.
Raises:
ValueError: if the input isn't strictly a hex number from
[0..FFFF].
"""
# Whitelist the characters, since int() allows a lot of bizarre stuff.
if not self._HEX_DIGITS.issuperset(hextet_str):
raise ValueError("Only hex digits permitted in %r" % hextet_str)
# We do the length check second, since the invalid character error
# is likely to be more informative for the user
if len(hextet_str) > 4:
msg = "At most 4 characters permitted in %r"
raise ValueError(msg % hextet_str)
# Length check means we can skip checking the integer value
return int(hextet_str, 16)
def _compress_hextets(self, hextets):
"""Compresses a list of hextets.
Compresses a list of strings, replacing the longest continuous
sequence of "0" in the list with "" and adding empty strings at
the beginning or at the end of the string such that subsequently
calling ":".join(hextets) will produce the compressed version of
the IPv6 address.
Args:
hextets: A list of strings, the hextets to compress.
Returns:
A list of strings.
"""
best_doublecolon_start = -1
best_doublecolon_len = 0
doublecolon_start = -1
doublecolon_len = 0
for index, hextet in enumerate(hextets):
if hextet == '0':
doublecolon_len += 1
if doublecolon_start == -1:
# Start of a sequence of zeros.
doublecolon_start = index
if doublecolon_len > best_doublecolon_len:
# This is the longest sequence of zeros so far.
best_doublecolon_len = doublecolon_len
best_doublecolon_start = doublecolon_start
else:
doublecolon_len = 0
doublecolon_start = -1
if best_doublecolon_len > 1:
best_doublecolon_end = (best_doublecolon_start +
best_doublecolon_len)
# For zeros at the end of the address.
if best_doublecolon_end == len(hextets):
hextets += ['']
hextets[best_doublecolon_start:best_doublecolon_end] = ['']
# For zeros at the beginning of the address.
if best_doublecolon_start == 0:
hextets = [''] + hextets
return hextets
def _string_from_ip_int(self, ip_int=None):
"""Turns a 128-bit integer into hexadecimal notation.
Args:
ip_int: An integer, the IP address.
Returns:
A string, the hexadecimal representation of the address.
Raises:
ValueError: The address is bigger than 128 bits of all ones.
"""
if ip_int is None:
ip_int = int(self._ip)
if ip_int > self._ALL_ONES:
raise ValueError('IPv6 address is too large')
hex_str = '%032x' % ip_int
hextets = ['%x' % int(hex_str[x:x+4], 16) for x in range(0, 32, 4)]
hextets = self._compress_hextets(hextets)
return ':'.join(hextets)
def _explode_shorthand_ip_string(self):
"""Expand a shortened IPv6 address.
Args:
ip_str: A string, the IPv6 address.
Returns:
A string, the expanded IPv6 address.
"""
if isinstance(self, IPv6Network):
ip_str = str(self.network_address)
elif isinstance(self, IPv6Interface):
ip_str = str(self.ip)
else:
ip_str = str(self)
ip_int = self._ip_int_from_string(ip_str)
hex_str = '%032x' % ip_int
parts = [hex_str[x:x+4] for x in range(0, 32, 4)]
if isinstance(self, (_BaseNetwork, IPv6Interface)):
return '%s/%d' % (':'.join(parts), self._prefixlen)
return ':'.join(parts)
@property
def max_prefixlen(self):
return self._max_prefixlen
@property
def version(self):
return self._version
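# Illustrative usage (editor's sketch): the compressed and exploded forms
# produced by the helpers above, via the module's public 'compressed' and
# 'exploded' properties (assumed to be defined elsewhere in this file):
#
#   >>> import ipaddress
#   >>> ipaddress.IPv6Address('2001:0db8:0000:0000:0000:0000:0000:0001').compressed
#   '2001:db8::1'
#   >>> ipaddress.IPv6Address('2001:db8::1').exploded
#   '2001:0db8:0000:0000:0000:0000:0000:0001'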
class IPv6Address(_BaseV6, _BaseAddress):
"""Represent and manipulate single IPv6 Addresses."""
def __init__(self, address):
"""Instantiate a new IPv6 address object.
Args:
address: A string or integer representing the IP
Additionally, an integer can be passed, so
IPv6Address('2001:db8::') ==
IPv6Address(42540766411282592856903984951653826560)
or, more generally
IPv6Address(int(IPv6Address('2001:db8::'))) ==
IPv6Address('2001:db8::')
Raises:
AddressValueError: If address isn't a valid IPv6 address.
"""
_BaseAddress.__init__(self, address)
_BaseV6.__init__(self, address)
# Efficient constructor from integer.
if isinstance(address, int):
self._check_int_address(address)
self._ip = address
return
# Constructing from a packed address
if isinstance(address, bytes):
self._check_packed_address(address, 16)
self._ip = int.from_bytes(address, 'big')
return
# Assume input argument to be string or any object representation
# which converts into a formatted IP string.
addr_str = str(address)
self._ip = self._ip_int_from_string(addr_str)
@property
def packed(self):
"""The binary representation of this address."""
return v6_int_to_packed(self._ip)
@property
def is_multicast(self):
"""Test if the address is reserved for multicast use.
Returns:
A boolean, True if the address is a multicast address.
See RFC 2373 2.7 for details.
"""
multicast_network = IPv6Network('ff00::/8')
return self in multicast_network
@property
def is_reserved(self):
"""Test if the address is otherwise IETF reserved.
Returns:
A boolean, True if the address is within one of the
reserved IPv6 Network ranges.
"""
reserved_networks = [IPv6Network('::/8'), IPv6Network('100::/8'),
IPv6Network('200::/7'), IPv6Network('400::/6'),
IPv6Network('800::/5'), IPv6Network('1000::/4'),
IPv6Network('4000::/3'), IPv6Network('6000::/3'),
IPv6Network('8000::/3'), IPv6Network('A000::/3'),
IPv6Network('C000::/3'), IPv6Network('E000::/4'),
IPv6Network('F000::/5'), IPv6Network('F800::/6'),
IPv6Network('FE00::/9')]
return any(self in x for x in reserved_networks)
@property
def is_link_local(self):
"""Test if the address is reserved for link-local.
Returns:
A boolean, True if the address is reserved per RFC 4291.
"""
linklocal_network = IPv6Network('fe80::/10')
return self in linklocal_network
@property
def is_site_local(self):
"""Test if the address is reserved for site-local.
Note that the site-local address space has been deprecated by RFC 3879.
Use is_private to test if this address is in the space of unique local
addresses as defined by RFC 4193.
Returns:
A boolean, True if the address is reserved per RFC 3513 2.5.6.
"""
sitelocal_network = IPv6Network('fec0::/10')
return self in sitelocal_network
@property
@functools.lru_cache()
def is_private(self):
"""Test if this address is allocated for private networks.
Returns:
A boolean, True if the address is reserved per
iana-ipv6-special-registry.
"""
return (self in IPv6Network('::1/128') or
self in IPv6Network('::/128') or
self in IPv6Network('::ffff:0:0/96') or
self in IPv6Network('100::/64') or
self in IPv6Network('2001::/23') or
self in IPv6Network('2001:2::/48') or
self in IPv6Network('2001:db8::/32') or
self in IPv6Network('2001:10::/28') or
self in IPv6Network('fc00::/7') or
self in IPv6Network('fe80::/10'))
@property
def is_global(self):
"""Test if this address is allocated for public networks.
Returns:
A boolean, true if the address is not reserved per
iana-ipv6-special-registry.
"""
return not self.is_private
@property
def is_unspecified(self):
"""Test if the address is unspecified.
Returns:
A boolean, True if this is the unspecified address as defined in
RFC 2373 2.5.2.
"""
return self._ip == 0
@property
def is_loopback(self):
"""Test if the address is a loopback address.
Returns:
A boolean, True if the address is a loopback address as defined in
RFC 2373 2.5.3.
"""
return self._ip == 1
@property
def ipv4_mapped(self):
"""Return the IPv4 mapped address.
Returns:
If the IPv6 address is a v4 mapped address, return the
IPv4 mapped address. Return None otherwise.
"""
if (self._ip >> 32) != 0xFFFF:
return None
return IPv4Address(self._ip & 0xFFFFFFFF)
@property
def teredo(self):
"""Tuple of embedded teredo IPs.
Returns:
Tuple of the (server, client) IPs or None if the address
doesn't appear to be a teredo address (doesn't start with
2001::/32)
"""
if (self._ip >> 96) != 0x20010000:
return None
return (IPv4Address((self._ip >> 64) & 0xFFFFFFFF),
IPv4Address(~self._ip & 0xFFFFFFFF))
@property
def sixtofour(self):
"""Return the IPv4 6to4 embedded address.
Returns:
The IPv4 6to4-embedded address if present or None if the
address doesn't appear to contain a 6to4 embedded address.
"""
if (self._ip >> 112) != 0x2002:
return None
return IPv4Address((self._ip >> 80) & 0xFFFFFFFF)
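# Illustrative usage (editor's sketch) of the embedded-address helpers
# above. Assuming the standard 'ipaddress' import:
#
#   >>> import ipaddress
#   >>> ipaddress.IPv6Address('::ffff:192.0.2.1').ipv4_mapped
#   IPv4Address('192.0.2.1')
#   >>> ipaddress.IPv6Address('2002:c000:201::').sixtofour
#   IPv4Address('192.0.2.1')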
class IPv6Interface(IPv6Address):
def __init__(self, address):
if isinstance(address, (bytes, int)):
IPv6Address.__init__(self, address)
self.network = IPv6Network(self._ip)
self._prefixlen = self._max_prefixlen
return
addr = _split_optional_netmask(address)
IPv6Address.__init__(self, addr[0])
self.network = IPv6Network(address, strict=False)
self.netmask = self.network.netmask
self._prefixlen = self.network._prefixlen
self.hostmask = self.network.hostmask
def __str__(self):
return '%s/%d' % (self._string_from_ip_int(self._ip),
self.network.prefixlen)
def __eq__(self, other):
address_equal = IPv6Address.__eq__(self, other)
if not address_equal or address_equal is NotImplemented:
return address_equal
try:
return self.network == other.network
except AttributeError:
# An interface with an associated network is NOT the
# same as an unassociated address. That's why the hash
# takes the extra info into account.
return False
def __lt__(self, other):
address_less = IPv6Address.__lt__(self, other)
if address_less is NotImplemented:
return NotImplemented
try:
return self.network < other.network
except AttributeError:
# We *do* allow addresses and interfaces to be sorted. The
# unassociated address is considered less than all interfaces.
return False
def __hash__(self):
return self._ip ^ self._prefixlen ^ int(self.network.network_address)
@property
def ip(self):
return IPv6Address(self._ip)
@property
def with_prefixlen(self):
return '%s/%s' % (self._string_from_ip_int(self._ip),
self._prefixlen)
@property
def with_netmask(self):
return '%s/%s' % (self._string_from_ip_int(self._ip),
self.netmask)
@property
def with_hostmask(self):
return '%s/%s' % (self._string_from_ip_int(self._ip),
self.hostmask)
@property
def is_unspecified(self):
return self._ip == 0 and self.network.is_unspecified
@property
def is_loopback(self):
return self._ip == 1 and self.network.is_loopback
class IPv6Network(_BaseV6, _BaseNetwork):
"""This class represents and manipulates 128-bit IPv6 networks.
Attributes: [examples for IPv6Network('2001:db8::1000/124')]
.network_address: IPv6Address('2001:db8::1000')
.hostmask: IPv6Address('::f')
.broadcast_address: IPv6Address('2001:db8::100f')
.netmask: IPv6Address('ffff:ffff:ffff:ffff:ffff:ffff:ffff:fff0')
.prefixlen: 124
"""
# Class to use when creating address objects
_address_class = IPv6Address
def __init__(self, address, strict=True):
"""Instantiate a new IPv6 Network object.
Args:
address: A string or integer representing the IPv6 network or the
IP and prefix/netmask.
'2001:db8::/128'
'2001:db8:0000:0000:0000:0000:0000:0000/128'
'2001:db8::'
are all functionally the same in IPv6. That is to say,
failing to provide a subnetmask will create an object with
a mask of /128.
Additionally, an integer can be passed, so
IPv6Network('2001:db8::') ==
IPv6Network(42540766411282592856903984951653826560)
or, more generally
IPv6Network(int(IPv6Network('2001:db8::'))) ==
IPv6Network('2001:db8::')
strict: A boolean. If true, ensure that we have been passed
a true network address, e.g., 2001:db8::1000/124, and not an
IP address on a network, e.g., 2001:db8::1/124.
Raises:
AddressValueError: If address isn't a valid IPv6 address.
NetmaskValueError: If the netmask isn't valid for
an IPv6 address.
ValueError: If strict was True and a network address was not
supplied.
"""
_BaseV6.__init__(self, address)
_BaseNetwork.__init__(self, address)
# Efficient constructor from integer.
if isinstance(address, int):
self.network_address = IPv6Address(address)
self._prefixlen = self._max_prefixlen
self.netmask = IPv6Address(self._ALL_ONES)
return
# Constructing from a packed address
if isinstance(address, bytes):
self.network_address = IPv6Address(address)
self._prefixlen = self._max_prefixlen
self.netmask = IPv6Address(self._ALL_ONES)
return
# Assume input argument to be string or any object representation
# which converts into a formatted IP prefix string.
addr = _split_optional_netmask(address)
self.network_address = IPv6Address(self._ip_int_from_string(addr[0]))
if len(addr) == 2:
# This may raise NetmaskValueError
self._prefixlen = self._prefix_from_prefix_string(addr[1])
else:
self._prefixlen = self._max_prefixlen
self.netmask = IPv6Address(self._ip_int_from_prefix(self._prefixlen))
if strict:
if (IPv6Address(int(self.network_address) & int(self.netmask)) !=
self.network_address):
raise ValueError('%s has host bits set' % self)
self.network_address = IPv6Address(int(self.network_address) &
int(self.netmask))
if self._prefixlen == (self._max_prefixlen - 1):
self.hosts = self.__iter__
def hosts(self):
"""Generate Iterator over usable hosts in a network.
This is like __iter__ except it doesn't return the
Subnet-Router anycast address.
"""
network = int(self.network_address)
broadcast = int(self.broadcast_address)
for x in range(network + 1, broadcast + 1):
yield self._address_class(x)
@property
def is_site_local(self):
"""Test if the address is reserved for site-local.
Note that the site-local address space has been deprecated by RFC 3879.
Use is_private to test if this address is in the space of unique local
addresses as defined by RFC 4193.
Returns:
A boolean, True if the address is reserved per RFC 3513 2.5.6.
"""
return (self.network_address.is_site_local and
self.broadcast_address.is_site_local)
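# Illustrative usage (editor's sketch): hosts() skips only the Subnet-Router
# anycast address (the all-zeros host). Assuming the standard 'ipaddress':
#
#   >>> import ipaddress
#   >>> hosts = list(ipaddress.IPv6Network('2001:db8::/125').hosts())
#   >>> hosts[0], len(hosts)
#   (IPv6Address('2001:db8::1'), 7)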
#! /usr/bin/env python3
"""Keywords (from "graminit.c")
This file is automatically generated; please don't muck it up!
To update the symbols in this file, 'cd' to the top directory of
the python source tree after building the interpreter and run:
./python Lib/keyword.py
"""
__all__ = ["iskeyword", "kwlist"]
kwlist = [
#--start keywords--
'False',
'None',
'True',
'and',
'as',
'assert',
'break',
'class',
'continue',
'def',
'del',
'elif',
'else',
'except',
'finally',
'for',
'from',
'global',
'if',
'import',
'in',
'is',
'lambda',
'nonlocal',
'not',
'or',
'pass',
'raise',
'return',
'try',
'while',
'with',
'yield',
#--end keywords--
]
iskeyword = frozenset(kwlist).__contains__
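# Illustrative usage (editor's sketch), assuming this file is importable as
# the standard 'keyword' module:
#
#   >>> import keyword
#   >>> keyword.iskeyword('for'), keyword.iskeyword('format')
#   (True, False)
#   >>> len(keyword.kwlist)
#   33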
def main():
import sys, re
args = sys.argv[1:]
iptfile = args and args[0] or "Python/graminit.c"
if len(args) > 1: optfile = args[1]
else: optfile = "Lib/keyword.py"
# load the output skeleton from the target, taking care to preserve its
# newline convention.
with open(optfile, newline='') as fp:
format = fp.readlines()
nl = format[0][len(format[0].strip()):] if format else '\n'
# scan the source file for keywords
with open(iptfile) as fp:
strprog = re.compile('"([^"]+)"')
lines = []
for line in fp:
if '{1, "' in line:
match = strprog.search(line)
if match:
lines.append(" '" + match.group(1) + "'," + nl)
lines.sort()
# insert the lines of keywords into the skeleton
try:
start = format.index("#--start keywords--" + nl) + 1
end = format.index("#--end keywords--" + nl)
format[start:end] = lines
except ValueError:
sys.stderr.write("target does not contain format markers\n")
sys.exit(1)
# write the output file
with open(optfile, 'w', newline='') as fp:
fp.writelines(format)
if __name__ == "__main__":
main()
"""Cache lines from Python source files.
This is intended to read lines from modules imported -- hence if a filename
is not found, it will look down the module search path for a file by
that name.
"""
import sys
import os
import tokenize
__all__ = ["getline", "clearcache", "checkcache"]
def getline(filename, lineno, module_globals=None):
lines = getlines(filename, module_globals)
if 1 <= lineno <= len(lines):
return lines[lineno-1]
else:
return ''
# The cache. Maps filenames to (size, mtime, lines, fullname) tuples.
cache = {}
def clearcache():
"""Clear the cache entirely."""
global cache
cache = {}
def getlines(filename, module_globals=None):
"""Get the lines for a Python source file from the cache.
Update the cache if it doesn't contain an entry for this file already."""
if filename in cache:
return cache[filename][2]
try:
return updatecache(filename, module_globals)
except MemoryError:
clearcache()
return []
def checkcache(filename=None):
"""Discard cache entries that are out of date.
(This is not checked upon each call!)"""
if filename is None:
filenames = list(cache.keys())
else:
if filename in cache:
filenames = [filename]
else:
return
for filename in filenames:
size, mtime, lines, fullname = cache[filename]
if mtime is None:
continue # no-op for files loaded via a __loader__
try:
stat = os.stat(fullname)
except OSError:
del cache[filename]
continue
if size != stat.st_size or mtime != stat.st_mtime:
del cache[filename]
def updatecache(filename, module_globals=None):
"""Update a cache entry and return its list of lines.
If something's wrong, print a message, discard the cache entry,
and return an empty list."""
if filename in cache:
del cache[filename]
if not filename or (filename.startswith('<') and filename.endswith('>')):
return []
fullname = filename
try:
stat = os.stat(fullname)
except OSError:
basename = filename
# Try for a __loader__, if available
if module_globals and '__loader__' in module_globals:
name = module_globals.get('__name__')
loader = module_globals['__loader__']
get_source = getattr(loader, 'get_source', None)
if name and get_source:
try:
data = get_source(name)
except (ImportError, OSError):
pass
else:
if data is None:
# No luck, the PEP302 loader cannot find the source
# for this module.
return []
cache[filename] = (
len(data), None,
[line+'\n' for line in data.splitlines()], fullname
)
return cache[filename][2]
# Try looking through the module search path, which is only useful
# when handling a relative filename.
if os.path.isabs(filename):
return []
for dirname in sys.path:
try:
fullname = os.path.join(dirname, basename)
except (TypeError, AttributeError):
# Not sufficiently string-like to do anything useful with.
continue
try:
stat = os.stat(fullname)
break
except OSError:
pass
else:
return []
try:
with tokenize.open(fullname) as fp:
lines = fp.readlines()
except OSError:
return []
if lines and not lines[-1].endswith('\n'):
lines[-1] += '\n'
size, mtime = stat.st_size, stat.st_mtime
cache[filename] = size, mtime, lines, fullname
return lines
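# Illustrative usage (editor's sketch), assuming this file is importable as
# the standard 'linecache' module and that 'os' was loaded from a .py file:
#
#   >>> import linecache, os
#   >>> first = linecache.getline(os.__file__, 1)
#   >>> isinstance(first, str)
#   True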
""" Locale support.
The module provides low-level access to the C lib's locale APIs
and adds high level number formatting APIs as well as a locale
aliasing engine to complement these.
The aliasing engine includes support for many commonly used locale
names and maps them to values suitable for passing to the C lib's
setlocale() function. It also includes default encodings for all
supported locale names.
"""
import sys
import encodings
import encodings.aliases
import re
import collections
from builtins import str as _builtin_str
import functools
# Try importing the _locale module.
#
# If this fails, fall back on a basic 'C' locale emulation.
# Yuck: LC_MESSAGES is non-standard: can't tell whether it exists before
# trying the import. So __all__ is also fiddled at the end of the file.
__all__ = ["getlocale", "getdefaultlocale", "getpreferredencoding", "Error",
"setlocale", "resetlocale", "localeconv", "strcoll", "strxfrm",
"str", "atof", "atoi", "format", "format_string", "currency",
"normalize", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MONETARY",
"LC_NUMERIC", "LC_ALL", "CHAR_MAX"]
def _strcoll(a,b):
""" strcoll(string,string) -> int.
Compares two strings according to the locale.
"""
return (a > b) - (a < b)
def _strxfrm(s):
""" strxfrm(string) -> string.
Returns a string that can be used for locale-aware comparisons.
"""
return s
try:
from _locale import *
except ImportError:
# Locale emulation
CHAR_MAX = 127
LC_ALL = 6
LC_COLLATE = 3
LC_CTYPE = 0
LC_MESSAGES = 5
LC_MONETARY = 4
LC_NUMERIC = 1
LC_TIME = 2
Error = ValueError
def localeconv():
""" localeconv() -> dict.
Returns numeric and monetary locale-specific parameters.
"""
# 'C' locale default values
return {'grouping': [127],
'currency_symbol': '',
'n_sign_posn': 127,
'p_cs_precedes': 127,
'n_cs_precedes': 127,
'mon_grouping': [],
'n_sep_by_space': 127,
'decimal_point': '.',
'negative_sign': '',
'positive_sign': '',
'p_sep_by_space': 127,
'int_curr_symbol': '',
'p_sign_posn': 127,
'thousands_sep': '',
'mon_thousands_sep': '',
'frac_digits': 127,
'mon_decimal_point': '',
'int_frac_digits': 127}
def setlocale(category, value=None):
""" setlocale(integer,string=None) -> string.
Activates/queries locale processing.
"""
if value not in (None, '', 'C'):
raise Error('_locale emulation only supports "C" locale')
return 'C'
# These may or may not exist in _locale, so be sure to set them.
if 'strxfrm' not in globals():
strxfrm = _strxfrm
if 'strcoll' not in globals():
strcoll = _strcoll
_localeconv = localeconv
# With this dict, you can override some items of localeconv's return value.
# This is useful for testing purposes.
_override_localeconv = {}
@functools.wraps(_localeconv)
def localeconv():
d = _localeconv()
if _override_localeconv:
d.update(_override_localeconv)
return d
### Number formatting APIs
# Author: Martin von Loewis
# improved by Georg Brandl
# Iterate over grouping intervals
def _grouping_intervals(grouping):
last_interval = None
for interval in grouping:
# if grouping is -1, we are done
if interval == CHAR_MAX:
return
# 0: re-use last group ad infinitum
if interval == 0:
if last_interval is None:
raise ValueError("invalid grouping")
while True:
yield last_interval
yield interval
last_interval = interval
#perform the grouping from right to left
def _group(s, monetary=False):
conv = localeconv()
thousands_sep = conv[monetary and 'mon_thousands_sep' or 'thousands_sep']
grouping = conv[monetary and 'mon_grouping' or 'grouping']
if not grouping:
return (s, 0)
if s[-1] == ' ':
stripped = s.rstrip()
right_spaces = s[len(stripped):]
s = stripped
else:
right_spaces = ''
left_spaces = ''
groups = []
for interval in _grouping_intervals(grouping):
if not s or s[-1] not in "0123456789":
# only non-digit characters remain (sign, spaces)
left_spaces = s
s = ''
break
groups.append(s[-interval:])
s = s[:-interval]
if s:
groups.append(s)
groups.reverse()
return (
left_spaces + thousands_sep.join(groups) + right_spaces,
len(thousands_sep) * (len(groups) - 1)
)
# Strip a given amount of excess padding from the given string
def _strip_padding(s, amount):
lpos = 0
while amount and s[lpos] == ' ':
lpos += 1
amount -= 1
rpos = len(s) - 1
while amount and s[rpos] == ' ':
rpos -= 1
amount -= 1
return s[lpos:rpos+1]
_percent_re = re.compile(r'%(?:\((?P<key>.*?)\))?'
r'(?P<modifiers>[-#0-9 +*.hlL]*?)[eEfFgGdiouxXcrs%]')
def format(percent, value, grouping=False, monetary=False, *additional):
"""Returns the locale-aware substitution of a %? specifier
(percent).
additional is for format strings which contain one or more
'*' modifiers."""
# this is only for one-percent-specifier strings and this should be checked
match = _percent_re.match(percent)
if not match or len(match.group()) != len(percent):
raise ValueError(("format() must be given exactly one %%char "
"format specifier, %s not valid") % repr(percent))
return _format(percent, value, grouping, monetary, *additional)
def _format(percent, value, grouping=False, monetary=False, *additional):
if additional:
formatted = percent % ((value,) + additional)
else:
formatted = percent % value
# floats and decimal ints need special action!
if percent[-1] in 'eEfFgG':
seps = 0
parts = formatted.split('.')
if grouping:
parts[0], seps = _group(parts[0], monetary=monetary)
decimal_point = localeconv()[monetary and 'mon_decimal_point'
or 'decimal_point']
formatted = decimal_point.join(parts)
if seps:
formatted = _strip_padding(formatted, seps)
elif percent[-1] in 'diu':
seps = 0
if grouping:
formatted, seps = _group(formatted, monetary=monetary)
if seps:
formatted = _strip_padding(formatted, seps)
return formatted
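# Illustrative usage (editor's sketch) of grouping via format(). The
# 'en_US.UTF-8' locale name is an assumption and may not exist everywhere:
#
#   >>> import locale
#   >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
#   'en_US.UTF-8'
#   >>> locale.format('%d', 1234567, grouping=True)
#   '1,234,567'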
def format_string(f, val, grouping=False):
"""Formats a string in the same way that the % formatting would use,
but takes the current locale into account.
Grouping is applied if the third parameter is true."""
percents = list(_percent_re.finditer(f))
new_f = _percent_re.sub('%s', f)
if isinstance(val, collections.Mapping):
new_val = []
for perc in percents:
if perc.group()[-1]=='%':
new_val.append('%')
else:
new_val.append(format(perc.group(), val, grouping))
else:
if not isinstance(val, tuple):
val = (val,)
new_val = []
i = 0
for perc in percents:
if perc.group()[-1]=='%':
new_val.append('%')
else:
starcount = perc.group('modifiers').count('*')
new_val.append(_format(perc.group(),
val[i],
grouping,
False,
*val[i+1:i+1+starcount]))
i += (1 + starcount)
val = tuple(new_val)
return new_f % val
def currency(val, symbol=True, grouping=False, international=False):
"""Formats val according to the currency settings
in the current locale."""
conv = localeconv()
# check for illegal values
digits = conv[international and 'int_frac_digits' or 'frac_digits']
if digits == 127:
raise ValueError("Currency formatting is not possible using "
"the 'C' locale.")
s = format('%%.%if' % digits, abs(val), grouping, monetary=True)
# '<' and '>' are markers if the sign must be inserted between symbol and value
s = '<' + s + '>'
if symbol:
smb = conv[international and 'int_curr_symbol' or 'currency_symbol']
precedes = conv[val<0 and 'n_cs_precedes' or 'p_cs_precedes']
separated = conv[val<0 and 'n_sep_by_space' or 'p_sep_by_space']
if precedes:
s = smb + (separated and ' ' or '') + s
else:
s = s + (separated and ' ' or '') + smb
sign_pos = conv[val<0 and 'n_sign_posn' or 'p_sign_posn']
sign = conv[val<0 and 'negative_sign' or 'positive_sign']
if sign_pos == 0:
s = '(' + s + ')'
elif sign_pos == 1:
s = sign + s
elif sign_pos == 2:
s = s + sign
elif sign_pos == 3:
s = s.replace('<', sign)
elif sign_pos == 4:
s = s.replace('>', sign)
else:
# the default if nothing specified;
# this should be the most fitting sign position
s = sign + s
return s.replace('<', '').replace('>', '')
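# Illustrative usage (editor's sketch) of currency(), under the same assumed
# 'en_US.UTF-8' locale as above:
#
#   >>> import locale
#   >>> locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
#   'en_US.UTF-8'
#   >>> locale.currency(1234.5, grouping=True)
#   '$1,234.50'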
def str(val):
"""Convert float to integer, taking the locale into account."""
return format("%.12g", val)
def atof(string, func=float):
"Parses a string as a float according to the locale settings."
#First, get rid of the grouping
ts = localeconv()['thousands_sep']
if ts:
string = string.replace(ts, '')
#next, replace the decimal point with a dot
dd = localeconv()['decimal_point']
if dd:
string = string.replace(dd, '.')
#finally, parse the string
return func(string)
def atoi(str):
"Converts a string to an integer according to the locale settings."
return atof(str, int)
def _test():
setlocale(LC_ALL, "")
#do grouping
s1 = format("%d", 123456789,1)
print(s1, "is", atoi(s1))
#standard formatting
s1 = str(3.14)
print(s1, "is", atof(s1))
### Locale name aliasing engine
# Author: Marc-Andre Lemburg, [email protected]
# Various tweaks by Fredrik Lundh <[email protected]>
# store away the low-level version of setlocale (it's
# overridden below)
_setlocale = setlocale
def _replace_encoding(code, encoding):
if '.' in code:
langname = code[:code.index('.')]
else:
langname = code
# Convert the encoding to a C lib compatible encoding string
norm_encoding = encodings.normalize_encoding(encoding)
#print('norm encoding: %r' % norm_encoding)
norm_encoding = encodings.aliases.aliases.get(norm_encoding.lower(),
norm_encoding)
#print('aliased encoding: %r' % norm_encoding)
encoding = norm_encoding
norm_encoding = norm_encoding.lower()
if norm_encoding in locale_encoding_alias:
encoding = locale_encoding_alias[norm_encoding]
else:
norm_encoding = norm_encoding.replace('_', '')
norm_encoding = norm_encoding.replace('-', '')
if norm_encoding in locale_encoding_alias:
encoding = locale_encoding_alias[norm_encoding]
#print('found encoding %r' % encoding)
return langname + '.' + encoding
def _append_modifier(code, modifier):
if modifier == 'euro':
if '.' not in code:
return code + '.ISO8859-15'
_, _, encoding = code.partition('.')
if encoding in ('ISO8859-15', 'UTF-8'):
return code
if encoding == 'ISO8859-1':
return _replace_encoding(code, 'ISO8859-15')
return code + '@' + modifier
def normalize(localename):
""" Returns a normalized locale code for the given locale
name.
The returned locale code is formatted for use with
setlocale().
If normalization fails, the original name is returned
unchanged.
If the given encoding is not known, the function defaults to
the default encoding for the locale code just like setlocale()
does.
"""
# Normalize the locale name and extract the encoding and modifier
code = localename.lower()
if ':' in code:
# ':' is sometimes used as encoding delimiter.
code = code.replace(':', '.')
if '@' in code:
code, modifier = code.split('@', 1)
else:
modifier = ''
if '.' in code:
langname, encoding = code.split('.')[:2]
else:
langname = code
encoding = ''
# First lookup: fullname (possibly with encoding and modifier)
lang_enc = langname
if encoding:
norm_encoding = encoding.replace('-', '')
norm_encoding = norm_encoding.replace('_', '')
lang_enc += '.' + norm_encoding
lookup_name = lang_enc
if modifier:
lookup_name += '@' + modifier
code = locale_alias.get(lookup_name, None)
if code is not None:
return code
#print('first lookup failed')
if modifier:
# Second try: fullname without modifier (possibly with encoding)
code = locale_alias.get(lang_enc, None)
if code is not None:
#print('lookup without modifier succeeded')
if '@' not in code:
return _append_modifier(code, modifier)
if code.split('@', 1)[1].lower() == modifier:
return code
#print('second lookup failed')
if encoding:
# Third try: langname (without encoding, possibly with modifier)
lookup_name = langname
if modifier:
lookup_name += '@' + modifier
code = locale_alias.get(lookup_name, None)
if code is not None:
#print('lookup without encoding succeeded')
if '@' not in code:
return _replace_encoding(code, encoding)
code, modifier = code.split('@', 1)
return _replace_encoding(code, encoding) + '@' + modifier
if modifier:
# Fourth try: langname (without encoding and modifier)
code = locale_alias.get(langname, None)
if code is not None:
#print('lookup without modifier and encoding succeeded')
if '@' not in code:
code = _replace_encoding(code, encoding)
return _append_modifier(code, modifier)
code, defmod = code.split('@', 1)
if defmod.lower() == modifier:
return _replace_encoding(code, encoding) + '@' + defmod
return localename
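# Illustrative usage (editor's sketch) of normalize(); the exact result
# depends on the alias table bundled further below:
#
#   >>> import locale
#   >>> locale.normalize('en_US')
#   'en_US.ISO8859-1'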
def _parse_localename(localename):
""" Parses the locale code for localename and returns the
result as tuple (language code, encoding).
The localename is normalized and passed through the locale
alias engine. A ValueError is raised in case the locale name
cannot be parsed.
The language code corresponds to RFC 1766. code and encoding
can be None in case the values cannot be determined or are
unknown to this implementation.
"""
code = normalize(localename)
if '@' in code:
# Deal with locale modifiers
code, modifier = code.split('@', 1)
if modifier == 'euro' and '.' not in code:
# Assume Latin-9 for @euro locales. This is bogus,
# since some systems may use other encodings for these
# locales. Also, we ignore other modifiers.
return code, 'iso-8859-15'
if '.' in code:
return tuple(code.split('.')[:2])
elif code == 'C':
return None, None
raise ValueError('unknown locale: %s' % localename)
def _build_localename(localetuple):
""" Builds a locale code from the given tuple (language code,
encoding).
No aliasing or normalizing takes place.
"""
try:
language, encoding = localetuple
if language is None:
language = 'C'
if encoding is None:
return language
else:
return language + '.' + encoding
except (TypeError, ValueError):
raise TypeError('Locale must be None, a string, or an iterable of two strings -- language code, encoding.')
def getdefaultlocale(envvars=('LC_ALL', 'LC_CTYPE', 'LANG', 'LANGUAGE')):
""" Tries to determine the default locale settings and returns
them as tuple (language code, encoding).
According to POSIX, a program which has not called
setlocale(LC_ALL, "") runs using the portable 'C' locale.
Calling setlocale(LC_ALL, "") lets it use the default locale as
defined by the LANG variable. Since we don't want to interfere
with the current locale setting we thus emulate the behavior
in the way described above.
To maintain compatibility with other platforms, not only the
LANG variable is tested, but a list of variables given as
envvars parameter. The first found to be defined will be
used. envvars defaults to the search path used in GNU gettext;
it must always contain the variable name 'LANG'.
Except for the code 'C', the language code corresponds to RFC
1766. code and encoding can be None in case the values cannot
be determined.
"""
try:
# check if it's supported by the _locale module
import _locale
code, encoding = _locale._getdefaultlocale()
except (ImportError, AttributeError):
pass
else:
# make sure the code/encoding values are valid
if sys.platform == "win32" and code and code[:2] == "0x":
# map windows language identifier to language name
code = windows_locale.get(int(code, 0))
# ...add other platform-specific processing here, if
# necessary...
return code, encoding
# fall back on POSIX behaviour
import os
lookup = os.environ.get
for variable in envvars:
localename = lookup(variable,None)
if localename:
if variable == 'LANGUAGE':
localename = localename.split(':')[0]
break
else:
localename = 'C'
return _parse_localename(localename)
def getlocale(category=LC_CTYPE):
""" Returns the current setting for the given locale category as
tuple (language code, encoding).
category may be one of the LC_* value except LC_ALL. It
defaults to LC_CTYPE.
Except for the code 'C', the language code corresponds to RFC
1766. code and encoding can be None in case the values cannot
be determined.
"""
localename = _setlocale(category)
if category == LC_ALL and ';' in localename:
raise TypeError('category LC_ALL is not supported')
return _parse_localename(localename)
def setlocale(category, locale=None):
""" Set the locale for the given category. The locale can be
a string, an iterable of two strings (language code and encoding),
or None.
Iterables are converted to strings using the locale aliasing
engine. Locale strings are passed directly to the C lib.
category may be given as one of the LC_* values.
"""
if locale and not isinstance(locale, _builtin_str):
# convert to string
locale = normalize(_build_localename(locale))
return _setlocale(category, locale)
def resetlocale(category=LC_ALL):
""" Sets the locale for category to the default setting.
The default setting is determined by calling
getdefaultlocale(). category defaults to LC_ALL.
"""
_setlocale(category, _build_localename(getdefaultlocale()))
if sys.platform.startswith("win"):
# On Win32, this will return the ANSI code page
def getpreferredencoding(do_setlocale = True):
"""Return the charset that the user is likely using."""
if sys.flags.utf8_mode:
return 'UTF-8'
import _bootlocale
return _bootlocale.getpreferredencoding(False)
else:
# On Unix, if CODESET is available, use that.
try:
CODESET
except NameError:
# Fall back to parsing environment variables :-(
def getpreferredencoding(do_setlocale = True):
"""Return the charset that the user is likely using,
by looking at environment variables."""
if sys.flags.utf8_mode:
return 'UTF-8'
res = getdefaultlocale()[1]
if res is None:
# LANG not set, default conservatively to ASCII
res = 'ascii'
return res
else:
def getpreferredencoding(do_setlocale = True):
"""Return the charset that the user is likely using,
according to the system configuration."""
if sys.flags.utf8_mode:
return 'UTF-8'
import _bootlocale
if do_setlocale:
oldloc = setlocale(LC_CTYPE)
try:
setlocale(LC_CTYPE, "")
except Error:
pass
result = _bootlocale.getpreferredencoding(False)
if do_setlocale:
setlocale(LC_CTYPE, oldloc)
return result
### Database
#
# The following data was extracted from the locale.alias file which
# comes with X11 and then hand edited removing the explicit encoding
# definitions and adding some more aliases. The file is usually
# available as /usr/lib/X11/locale/locale.alias.
#
#
# The local_encoding_alias table maps lowercase encoding alias names
# to C locale encoding names (case-sensitive). Note that normalize()
# first looks up the encoding in the encodings.aliases dictionary and
# then applies this mapping to find the correct C lib name for the
# encoding.
#
locale_encoding_alias = {
# Mappings for non-standard encoding names used in locale names
'437': 'C',
'c': 'C',
'en': 'ISO8859-1',
'jis': 'JIS7',
'jis7': 'JIS7',
'ajec': 'eucJP',
'koi8c': 'KOI8-C',
'microsoftcp1251': 'CP1251',
'microsoftcp1255': 'CP1255',
'microsoftcp1256': 'CP1256',
'88591': 'ISO8859-1',
'88592': 'ISO8859-2',
'88595': 'ISO8859-5',
'885915': 'ISO8859-15',
# Mappings from Python codec names to C lib encoding names
'ascii': 'ISO8859-1',
'latin_1': 'ISO8859-1',
'iso8859_1': 'ISO8859-1',
'iso8859_10': 'ISO8859-10',
'iso8859_11': 'ISO8859-11',
'iso8859_13': 'ISO8859-13',
'iso8859_14': 'ISO8859-14',
'iso8859_15': 'ISO8859-15',
'iso8859_16': 'ISO8859-16',
'iso8859_2': 'ISO8859-2',
'iso8859_3': 'ISO8859-3',
'iso8859_4': 'ISO8859-4',
'iso8859_5': 'ISO8859-5',
'iso8859_6': 'ISO8859-6',
'iso8859_7': 'ISO8859-7',
'iso8859_8': 'ISO8859-8',
'iso8859_9': 'ISO8859-9',
'iso2022_jp': 'JIS7',
'shift_jis': 'SJIS',
'tactis': 'TACTIS',
'euc_jp': 'eucJP',
'euc_kr': 'eucKR',
'utf_8': 'UTF-8',
'koi8_r': 'KOI8-R',
'koi8_u': 'KOI8-U',
'cp1251': 'CP1251',
'cp1255': 'CP1255',
'cp1256': 'CP1256',
# XXX This list is still incomplete. If you know more
# mappings, please file a bug report. Thanks.
}
for k, v in sorted(locale_encoding_alias.items()):
k = k.replace('_', '')
locale_encoding_alias.setdefault(k, v)
#
# The locale_alias table maps lowercase alias names to C locale names
# (case-sensitive). Encodings are always separated from the locale
# name using a dot ('.'); they should only be given in case the
# language name is needed to interpret the given encoding alias
# correctly (CJK codes often have this need).
#
# Note that the normalize() function which uses this tables
# removes '_' and '-' characters from the encoding part of the
# locale name before doing the lookup. This saves a lot of
# space in the table.
#
# MAL 2004-12-10:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 2.4
# and older):
#
# updated 'bg' -> 'bg_BG.ISO8859-5' to 'bg_BG.CP1251'
# updated 'bg_bg' -> 'bg_BG.ISO8859-5' to 'bg_BG.CP1251'
# updated 'bulgarian' -> 'bg_BG.ISO8859-5' to 'bg_BG.CP1251'
# updated 'cz' -> 'cz_CZ.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'cz_cz' -> 'cz_CZ.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'czech' -> 'cs_CS.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'dutch' -> 'nl_BE.ISO8859-1' to 'nl_NL.ISO8859-1'
# updated 'et' -> 'et_EE.ISO8859-4' to 'et_EE.ISO8859-15'
# updated 'et_ee' -> 'et_EE.ISO8859-4' to 'et_EE.ISO8859-15'
# updated 'fi' -> 'fi_FI.ISO8859-1' to 'fi_FI.ISO8859-15'
# updated 'fi_fi' -> 'fi_FI.ISO8859-1' to 'fi_FI.ISO8859-15'
# updated 'iw' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'iw_il' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'japanese' -> 'ja_JP.SJIS' to 'ja_JP.eucJP'
# updated 'lt' -> 'lt_LT.ISO8859-4' to 'lt_LT.ISO8859-13'
# updated 'lv' -> 'lv_LV.ISO8859-4' to 'lv_LV.ISO8859-13'
# updated 'sl' -> 'sl_CS.ISO8859-2' to 'sl_SI.ISO8859-2'
# updated 'slovene' -> 'sl_CS.ISO8859-2' to 'sl_SI.ISO8859-2'
# updated 'th_th' -> 'th_TH.TACTIS' to 'th_TH.ISO8859-11'
# updated 'zh_cn' -> 'zh_CN.eucCN' to 'zh_CN.gb2312'
# updated 'zh_cn.big5' -> 'zh_TW.eucTW' to 'zh_TW.big5'
# updated 'zh_tw' -> 'zh_TW.eucTW' to 'zh_TW.big5'
#
# MAL 2008-05-30:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 2.5
# and older):
#
# updated 'cs_cs.iso88592' -> 'cs_CZ.ISO8859-2' to 'cs_CS.ISO8859-2'
# updated 'serbocroatian' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sh' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sh_hr.iso88592' -> 'sh_HR.ISO8859-2' to 'hr_HR.ISO8859-2'
# updated 'sh_sp' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sh_yu' -> 'sh_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sp' -> 'sp_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sp_yu' -> 'sp_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr@cyrillic' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_sp' -> 'sr_SP.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sr_yu' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_yu.cp1251@cyrillic' -> 'sr_YU.CP1251' to 'sr_CS.CP1251'
# updated 'sr_yu.iso88592' -> 'sr_YU.ISO8859-2' to 'sr_CS.ISO8859-2'
# updated 'sr_yu.iso88595' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_yu.iso88595@cyrillic' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
# updated 'sr_yu.microsoftcp1251@cyrillic' -> 'sr_YU.CP1251' to 'sr_CS.CP1251'
# updated 'sr_yu.utf8@cyrillic' -> 'sr_YU.UTF-8' to 'sr_CS.UTF-8'
# updated 'sr_yu@cyrillic' -> 'sr_YU.ISO8859-5' to 'sr_CS.ISO8859-5'
#
# AP 2010-04-12:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 2.6.5
# and older):
#
# updated 'ru' -> 'ru_RU.ISO8859-5' to 'ru_RU.UTF-8'
# updated 'ru_ru' -> 'ru_RU.ISO8859-5' to 'ru_RU.UTF-8'
# updated 'serbocroatian' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sh' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sh_yu' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sr' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8'
# updated 'sr@cyrillic' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8'
# updated 'sr@latn' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sr_cs.utf8@latn' -> 'sr_CS.UTF-8' to 'sr_RS.UTF-8@latin'
# updated 'sr_cs@latn' -> 'sr_CS.ISO8859-2' to 'sr_RS.UTF-8@latin'
# updated 'sr_yu' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8@latin'
# updated 'sr_yu.utf8@cyrillic' -> 'sr_CS.UTF-8' to 'sr_RS.UTF-8'
# updated 'sr_yu@cyrillic' -> 'sr_CS.ISO8859-5' to 'sr_RS.UTF-8'
#
# SS 2013-12-20:
# Updated alias mapping to most recent locale.alias file
# from X.org distribution using makelocalealias.py.
#
# These are the differences compared to the old mapping (Python 3.3.3
# and older):
#
# updated 'a3' -> 'a3_AZ.KOI8-C' to 'az_AZ.KOI8-C'
# updated 'a3_az' -> 'a3_AZ.KOI8-C' to 'az_AZ.KOI8-C'
# updated 'a3_az.koi8c' -> 'a3_AZ.KOI8-C' to 'az_AZ.KOI8-C'
# updated 'cs_cs.iso88592' -> 'cs_CS.ISO8859-2' to 'cs_CZ.ISO8859-2'
# updated 'hebrew' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'hebrew.iso88598' -> 'iw_IL.ISO8859-8' to 'he_IL.ISO8859-8'
# updated 'sd' -> 'sd_IN@devanagari.UTF-8' to 'sd_IN.UTF-8'
# updated 'sr@latn' -> 'sr_RS.UTF-8@latin' to 'sr_CS.UTF-8@latin'
# updated 'sr_cs' -> 'sr_RS.UTF-8' to 'sr_CS.UTF-8'
# updated 'sr_cs.utf8@latn' -> 'sr_RS.UTF-8@latin' to 'sr_CS.UTF-8@latin'
# updated 'sr_cs@latn' -> 'sr_RS.UTF-8@latin' to 'sr_CS.UTF-8@latin'
#
# SS 2014-10-01:
# Updated alias mapping with glibc 2.19 supported locales.
locale_alias = {
'a3': 'az_AZ.KOI8-C',
'a3_az': 'az_AZ.KOI8-C',
'a3_az.koic': 'az_AZ.KOI8-C',
'aa_dj': 'aa_DJ.ISO8859-1',
'aa_er': 'aa_ER.UTF-8',
'aa_et': 'aa_ET.UTF-8',
'af': 'af_ZA.ISO8859-1',
'af_za': 'af_ZA.ISO8859-1',
'am': 'am_ET.UTF-8',
'am_et': 'am_ET.UTF-8',
'american': 'en_US.ISO8859-1',
'an_es': 'an_ES.ISO8859-15',
'ar': 'ar_AA.ISO8859-6',
'ar_aa': 'ar_AA.ISO8859-6',
'ar_ae': 'ar_AE.ISO8859-6',
'ar_bh': 'ar_BH.ISO8859-6',
'ar_dz': 'ar_DZ.ISO8859-6',
'ar_eg': 'ar_EG.ISO8859-6',
'ar_in': 'ar_IN.UTF-8',
'ar_iq': 'ar_IQ.ISO8859-6',
'ar_jo': 'ar_JO.ISO8859-6',
'ar_kw': 'ar_KW.ISO8859-6',
'ar_lb': 'ar_LB.ISO8859-6',
'ar_ly': 'ar_LY.ISO8859-6',
'ar_ma': 'ar_MA.ISO8859-6',
'ar_om': 'ar_OM.ISO8859-6',
'ar_qa': 'ar_QA.ISO8859-6',
'ar_sa': 'ar_SA.ISO8859-6',
'ar_sd': 'ar_SD.ISO8859-6',
'ar_sy': 'ar_SY.ISO8859-6',
'ar_tn': 'ar_TN.ISO8859-6',
'ar_ye': 'ar_YE.ISO8859-6',
'arabic': 'ar_AA.ISO8859-6',
'as': 'as_IN.UTF-8',
'as_in': 'as_IN.UTF-8',
'ast_es': 'ast_ES.ISO8859-15',
'ayc_pe': 'ayc_PE.UTF-8',
'az': 'az_AZ.ISO8859-9E',
'az_az': 'az_AZ.ISO8859-9E',
'az_az.iso88599e': 'az_AZ.ISO8859-9E',
'be': 'be_BY.CP1251',
'be@latin': 'be_BY.UTF-8@latin',
'be_bg.utf8': 'bg_BG.UTF-8',
'be_by': 'be_BY.CP1251',
'be_by@latin': 'be_BY.UTF-8@latin',
'bem_zm': 'bem_ZM.UTF-8',
'ber_dz': 'ber_DZ.UTF-8',
'ber_ma': 'ber_MA.UTF-8',
'bg': 'bg_BG.CP1251',
'bg_bg': 'bg_BG.CP1251',
'bho_in': 'bho_IN.UTF-8',
'bn_bd': 'bn_BD.UTF-8',
'bn_in': 'bn_IN.UTF-8',
'bo_cn': 'bo_CN.UTF-8',
'bo_in': 'bo_IN.UTF-8',
'bokmal': 'nb_NO.ISO8859-1',
'bokm\xe5l': 'nb_NO.ISO8859-1',
'br': 'br_FR.ISO8859-1',
'br_fr': 'br_FR.ISO8859-1',
'brx_in': 'brx_IN.UTF-8',
'bs': 'bs_BA.ISO8859-2',
'bs_ba': 'bs_BA.ISO8859-2',
'bulgarian': 'bg_BG.CP1251',
'byn_er': 'byn_ER.UTF-8',
'c': 'C',
'c-french': 'fr_CA.ISO8859-1',
'c.ascii': 'C',
'c.en': 'C',
'c.iso88591': 'en_US.ISO8859-1',
'c.utf8': 'en_US.UTF-8',
'c_c': 'C',
'c_c.c': 'C',
'ca': 'ca_ES.ISO8859-1',
'ca_ad': 'ca_AD.ISO8859-1',
'ca_es': 'ca_ES.ISO8859-1',
'ca_es@valencia': 'ca_ES.ISO8859-15@valencia',
'ca_fr': 'ca_FR.ISO8859-1',
'ca_it': 'ca_IT.ISO8859-1',
'catalan': 'ca_ES.ISO8859-1',
'cextend': 'en_US.ISO8859-1',
'chinese-s': 'zh_CN.eucCN',
'chinese-t': 'zh_TW.eucTW',
'crh_ua': 'crh_UA.UTF-8',
'croatian': 'hr_HR.ISO8859-2',
'cs': 'cs_CZ.ISO8859-2',
'cs_cs': 'cs_CZ.ISO8859-2',
'cs_cz': 'cs_CZ.ISO8859-2',
'csb_pl': 'csb_PL.UTF-8',
'cv_ru': 'cv_RU.UTF-8',
'cy': 'cy_GB.ISO8859-1',
'cy_gb': 'cy_GB.ISO8859-1',
'cz': 'cs_CZ.ISO8859-2',
'cz_cz': 'cs_CZ.ISO8859-2',
'czech': 'cs_CZ.ISO8859-2',
'da': 'da_DK.ISO8859-1',
'da_dk': 'da_DK.ISO8859-1',
'danish': 'da_DK.ISO8859-1',
'dansk': 'da_DK.ISO8859-1',
'de': 'de_DE.ISO8859-1',
'de_at': 'de_AT.ISO8859-1',
'de_be': 'de_BE.ISO8859-1',
'de_ch': 'de_CH.ISO8859-1',
'de_de': 'de_DE.ISO8859-1',
'de_li.utf8': 'de_LI.UTF-8',
'de_lu': 'de_LU.ISO8859-1',
'deutsch': 'de_DE.ISO8859-1',
'doi_in': 'doi_IN.UTF-8',
'dutch': 'nl_NL.ISO8859-1',
'dutch.iso88591': 'nl_BE.ISO8859-1',
'dv_mv': 'dv_MV.UTF-8',
'dz_bt': 'dz_BT.UTF-8',
'ee': 'ee_EE.ISO8859-4',
'ee_ee': 'ee_EE.ISO8859-4',
'eesti': 'et_EE.ISO8859-1',
'el': 'el_GR.ISO8859-7',
'el_cy': 'el_CY.ISO8859-7',
'el_gr': 'el_GR.ISO8859-7',
'el_gr@euro': 'el_GR.ISO8859-15',
'en': 'en_US.ISO8859-1',
'en_ag': 'en_AG.UTF-8',
'en_au': 'en_AU.ISO8859-1',
'en_be': 'en_BE.ISO8859-1',
'en_bw': 'en_BW.ISO8859-1',
'en_ca': 'en_CA.ISO8859-1',
'en_dk': 'en_DK.ISO8859-1',
'en_dl.utf8': 'en_DL.UTF-8',
'en_gb': 'en_GB.ISO8859-1',
'en_hk': 'en_HK.ISO8859-1',
'en_ie': 'en_IE.ISO8859-1',
'en_in': 'en_IN.ISO8859-1',
'en_ng': 'en_NG.UTF-8',
'en_nz': 'en_NZ.ISO8859-1',
'en_ph': 'en_PH.ISO8859-1',
'en_sg': 'en_SG.ISO8859-1',
'en_uk': 'en_GB.ISO8859-1',
'en_us': 'en_US.ISO8859-1',
'en_us@euro@euro': 'en_US.ISO8859-15',
'en_za': 'en_ZA.ISO8859-1',
'en_zm': 'en_ZM.UTF-8',
'en_zw': 'en_ZW.ISO8859-1',
'en_zw.utf8': 'en_ZS.UTF-8',
'eng_gb': 'en_GB.ISO8859-1',
'english': 'en_EN.ISO8859-1',
'english_uk': 'en_GB.ISO8859-1',
'english_united-states': 'en_US.ISO8859-1',
'english_united-states.437': 'C',
'english_us': 'en_US.ISO8859-1',
'eo': 'eo_XX.ISO8859-3',
'eo.utf8': 'eo.UTF-8',
'eo_eo': 'eo_EO.ISO8859-3',
'eo_us.utf8': 'eo_US.UTF-8',
'eo_xx': 'eo_XX.ISO8859-3',
'es': 'es_ES.ISO8859-1',
'es_ar': 'es_AR.ISO8859-1',
'es_bo': 'es_BO.ISO8859-1',
'es_cl': 'es_CL.ISO8859-1',
'es_co': 'es_CO.ISO8859-1',
'es_cr': 'es_CR.ISO8859-1',
'es_cu': 'es_CU.UTF-8',
'es_do': 'es_DO.ISO8859-1',
'es_ec': 'es_EC.ISO8859-1',
'es_es': 'es_ES.ISO8859-1',
'es_gt': 'es_GT.ISO8859-1',
'es_hn': 'es_HN.ISO8859-1',
'es_mx': 'es_MX.ISO8859-1',
'es_ni': 'es_NI.ISO8859-1',
'es_pa': 'es_PA.ISO8859-1',
'es_pe': 'es_PE.ISO8859-1',
'es_pr': 'es_PR.ISO8859-1',
'es_py': 'es_PY.ISO8859-1',
'es_sv': 'es_SV.ISO8859-1',
'es_us': 'es_US.ISO8859-1',
'es_uy': 'es_UY.ISO8859-1',
'es_ve': 'es_VE.ISO8859-1',
'estonian': 'et_EE.ISO8859-1',
'et': 'et_EE.ISO8859-15',
'et_ee': 'et_EE.ISO8859-15',
'eu': 'eu_ES.ISO8859-1',
'eu_es': 'eu_ES.ISO8859-1',
'eu_fr': 'eu_FR.ISO8859-1',
'fa': 'fa_IR.UTF-8',
'fa_ir': 'fa_IR.UTF-8',
'fa_ir.isiri3342': 'fa_IR.ISIRI-3342',
'ff_sn': 'ff_SN.UTF-8',
'fi': 'fi_FI.ISO8859-15',
'fi_fi': 'fi_FI.ISO8859-15',
'fil_ph': 'fil_PH.UTF-8',
'finnish': 'fi_FI.ISO8859-1',
'fo': 'fo_FO.ISO8859-1',
'fo_fo': 'fo_FO.ISO8859-1',
'fr': 'fr_FR.ISO8859-1',
'fr_be': 'fr_BE.ISO8859-1',
'fr_ca': 'fr_CA.ISO8859-1',
'fr_ch': 'fr_CH.ISO8859-1',
'fr_fr': 'fr_FR.ISO8859-1',
'fr_lu': 'fr_LU.ISO8859-1',
'fran\xe7ais': 'fr_FR.ISO8859-1',
'fre_fr': 'fr_FR.ISO8859-1',
'french': 'fr_FR.ISO8859-1',
'french.iso88591': 'fr_CH.ISO8859-1',
'french_france': 'fr_FR.ISO8859-1',
'fur_it': 'fur_IT.UTF-8',
'fy_de': 'fy_DE.UTF-8',
'fy_nl': 'fy_NL.UTF-8',
'ga': 'ga_IE.ISO8859-1',
'ga_ie': 'ga_IE.ISO8859-1',
'galego': 'gl_ES.ISO8859-1',
'galician': 'gl_ES.ISO8859-1',
'gd': 'gd_GB.ISO8859-1',
'gd_gb': 'gd_GB.ISO8859-1',
'ger_de': 'de_DE.ISO8859-1',
'german': 'de_DE.ISO8859-1',
'german.iso88591': 'de_CH.ISO8859-1',
'german_germany': 'de_DE.ISO8859-1',
'gez_er': 'gez_ER.UTF-8',
'gez_et': 'gez_ET.UTF-8',
'gl': 'gl_ES.ISO8859-1',
'gl_es': 'gl_ES.ISO8859-1',
'greek': 'el_GR.ISO8859-7',
'gu_in': 'gu_IN.UTF-8',
'gv': 'gv_GB.ISO8859-1',
'gv_gb': 'gv_GB.ISO8859-1',
'ha_ng': 'ha_NG.UTF-8',
'he': 'he_IL.ISO8859-8',
'he_il': 'he_IL.ISO8859-8',
'hebrew': 'he_IL.ISO8859-8',
'hi': 'hi_IN.ISCII-DEV',
'hi_in': 'hi_IN.ISCII-DEV',
'hi_in.isciidev': 'hi_IN.ISCII-DEV',
'hne': 'hne_IN.UTF-8',
'hne_in': 'hne_IN.UTF-8',
'hr': 'hr_HR.ISO8859-2',
'hr_hr': 'hr_HR.ISO8859-2',
'hrvatski': 'hr_HR.ISO8859-2',
'hsb_de': 'hsb_DE.ISO8859-2',
'ht_ht': 'ht_HT.UTF-8',
'hu': 'hu_HU.ISO8859-2',
'hu_hu': 'hu_HU.ISO8859-2',
'hungarian': 'hu_HU.ISO8859-2',
'hy_am': 'hy_AM.UTF-8',
'hy_am.armscii8': 'hy_AM.ARMSCII_8',
'ia': 'ia.UTF-8',
'ia_fr': 'ia_FR.UTF-8',
'icelandic': 'is_IS.ISO8859-1',
'id': 'id_ID.ISO8859-1',
'id_id': 'id_ID.ISO8859-1',
'ig_ng': 'ig_NG.UTF-8',
'ik_ca': 'ik_CA.UTF-8',
'in': 'id_ID.ISO8859-1',
'in_id': 'id_ID.ISO8859-1',
'is': 'is_IS.ISO8859-1',
'is_is': 'is_IS.ISO8859-1',
'iso-8859-1': 'en_US.ISO8859-1',
'iso-8859-15': 'en_US.ISO8859-15',
'iso8859-1': 'en_US.ISO8859-1',
'iso8859-15': 'en_US.ISO8859-15',
'iso_8859_1': 'en_US.ISO8859-1',
'iso_8859_15': 'en_US.ISO8859-15',
'it': 'it_IT.ISO8859-1',
'it_ch': 'it_CH.ISO8859-1',
'it_it': 'it_IT.ISO8859-1',
'italian': 'it_IT.ISO8859-1',
'iu': 'iu_CA.NUNACOM-8',
'iu_ca': 'iu_CA.NUNACOM-8',
'iu_ca.nunacom8': 'iu_CA.NUNACOM-8',
'iw': 'he_IL.ISO8859-8',
'iw_il': 'he_IL.ISO8859-8',
'iw_il.utf8': 'iw_IL.UTF-8',
'ja': 'ja_JP.eucJP',
'ja_jp': 'ja_JP.eucJP',
'ja_jp.euc': 'ja_JP.eucJP',
'ja_jp.mscode': 'ja_JP.SJIS',
'ja_jp.pck': 'ja_JP.SJIS',
'japan': 'ja_JP.eucJP',
'japanese': 'ja_JP.eucJP',
'japanese-euc': 'ja_JP.eucJP',
'japanese.euc': 'ja_JP.eucJP',
'jp_jp': 'ja_JP.eucJP',
'ka': 'ka_GE.GEORGIAN-ACADEMY',
'ka_ge': 'ka_GE.GEORGIAN-ACADEMY',
'ka_ge.georgianacademy': 'ka_GE.GEORGIAN-ACADEMY',
'ka_ge.georgianps': 'ka_GE.GEORGIAN-PS',
'ka_ge.georgianrs': 'ka_GE.GEORGIAN-ACADEMY',
'kk_kz': 'kk_KZ.RK1048',
'kl': 'kl_GL.ISO8859-1',
'kl_gl': 'kl_GL.ISO8859-1',
'km_kh': 'km_KH.UTF-8',
'kn': 'kn_IN.UTF-8',
'kn_in': 'kn_IN.UTF-8',
'ko': 'ko_KR.eucKR',
'ko_kr': 'ko_KR.eucKR',
'ko_kr.euc': 'ko_KR.eucKR',
'kok_in': 'kok_IN.UTF-8',
'korean': 'ko_KR.eucKR',
'korean.euc': 'ko_KR.eucKR',
'ks': 'ks_IN.UTF-8',
'ks_in': 'ks_IN.UTF-8',
'ks_in@devanagari.utf8': 'ks_IN.UTF-8@devanagari',
'ku_tr': 'ku_TR.ISO8859-9',
'kw': 'kw_GB.ISO8859-1',
'kw_gb': 'kw_GB.ISO8859-1',
'ky': 'ky_KG.UTF-8',
'ky_kg': 'ky_KG.UTF-8',
'lb_lu': 'lb_LU.UTF-8',
'lg_ug': 'lg_UG.ISO8859-10',
'li_be': 'li_BE.UTF-8',
'li_nl': 'li_NL.UTF-8',
'lij_it': 'lij_IT.UTF-8',
'lithuanian': 'lt_LT.ISO8859-13',
'lo': 'lo_LA.MULELAO-1',
'lo_la': 'lo_LA.MULELAO-1',
'lo_la.cp1133': 'lo_LA.IBM-CP1133',
'lo_la.ibmcp1133': 'lo_LA.IBM-CP1133',
'lo_la.mulelao1': 'lo_LA.MULELAO-1',
'lt': 'lt_LT.ISO8859-13',
'lt_lt': 'lt_LT.ISO8859-13',
'lv': 'lv_LV.ISO8859-13',
'lv_lv': 'lv_LV.ISO8859-13',
'mag_in': 'mag_IN.UTF-8',
'mai': 'mai_IN.UTF-8',
'mai_in': 'mai_IN.UTF-8',
'mg_mg': 'mg_MG.ISO8859-15',
'mhr_ru': 'mhr_RU.UTF-8',
'mi': 'mi_NZ.ISO8859-1',
'mi_nz': 'mi_NZ.ISO8859-1',
'mk': 'mk_MK.ISO8859-5',
'mk_mk': 'mk_MK.ISO8859-5',
'ml': 'ml_IN.UTF-8',
'ml_in': 'ml_IN.UTF-8',
'mn_mn': 'mn_MN.UTF-8',
'mni_in': 'mni_IN.UTF-8',
'mr': 'mr_IN.UTF-8',
'mr_in': 'mr_IN.UTF-8',
'ms': 'ms_MY.ISO8859-1',
'ms_my': 'ms_MY.ISO8859-1',
'mt': 'mt_MT.ISO8859-3',
'mt_mt': 'mt_MT.ISO8859-3',
'my_mm': 'my_MM.UTF-8',
'nan_tw@latin': 'nan_TW.UTF-8@latin',
'nb': 'nb_NO.ISO8859-1',
'nb_no': 'nb_NO.ISO8859-1',
'nds_de': 'nds_DE.UTF-8',
'nds_nl': 'nds_NL.UTF-8',
'ne_np': 'ne_NP.UTF-8',
'nhn_mx': 'nhn_MX.UTF-8',
'niu_nu': 'niu_NU.UTF-8',
'niu_nz': 'niu_NZ.UTF-8',
'nl': 'nl_NL.ISO8859-1',
'nl_aw': 'nl_AW.UTF-8',
'nl_be': 'nl_BE.ISO8859-1',
'nl_nl': 'nl_NL.ISO8859-1',
'nn': 'nn_NO.ISO8859-1',
'nn_no': 'nn_NO.ISO8859-1',
'no': 'no_NO.ISO8859-1',
'no@nynorsk': 'ny_NO.ISO8859-1',
'no_no': 'no_NO.ISO8859-1',
'no_no.iso88591@bokmal': 'no_NO.ISO8859-1',
'no_no.iso88591@nynorsk': 'no_NO.ISO8859-1',
'norwegian': 'no_NO.ISO8859-1',
'nr': 'nr_ZA.ISO8859-1',
'nr_za': 'nr_ZA.ISO8859-1',
'nso': 'nso_ZA.ISO8859-15',
'nso_za': 'nso_ZA.ISO8859-15',
'ny': 'ny_NO.ISO8859-1',
'ny_no': 'ny_NO.ISO8859-1',
'nynorsk': 'nn_NO.ISO8859-1',
'oc': 'oc_FR.ISO8859-1',
'oc_fr': 'oc_FR.ISO8859-1',
'om_et': 'om_ET.UTF-8',
'om_ke': 'om_KE.ISO8859-1',
'or': 'or_IN.UTF-8',
'or_in': 'or_IN.UTF-8',
'os_ru': 'os_RU.UTF-8',
'pa': 'pa_IN.UTF-8',
'pa_in': 'pa_IN.UTF-8',
'pa_pk': 'pa_PK.UTF-8',
'pap_an': 'pap_AN.UTF-8',
'pd': 'pd_US.ISO8859-1',
'pd_de': 'pd_DE.ISO8859-1',
'pd_us': 'pd_US.ISO8859-1',
'ph': 'ph_PH.ISO8859-1',
'ph_ph': 'ph_PH.ISO8859-1',
'pl': 'pl_PL.ISO8859-2',
'pl_pl': 'pl_PL.ISO8859-2',
'polish': 'pl_PL.ISO8859-2',
'portuguese': 'pt_PT.ISO8859-1',
'portuguese_brazil': 'pt_BR.ISO8859-1',
'posix': 'C',
'posix-utf2': 'C',
'pp': 'pp_AN.ISO8859-1',
'pp_an': 'pp_AN.ISO8859-1',
'ps_af': 'ps_AF.UTF-8',
'pt': 'pt_PT.ISO8859-1',
'pt_br': 'pt_BR.ISO8859-1',
'pt_pt': 'pt_PT.ISO8859-1',
'ro': 'ro_RO.ISO8859-2',
'ro_ro': 'ro_RO.ISO8859-2',
'romanian': 'ro_RO.ISO8859-2',
'ru': 'ru_RU.UTF-8',
'ru_ru': 'ru_RU.UTF-8',
'ru_ua': 'ru_UA.KOI8-U',
'rumanian': 'ro_RO.ISO8859-2',
'russian': 'ru_RU.ISO8859-5',
'rw': 'rw_RW.ISO8859-1',
'rw_rw': 'rw_RW.ISO8859-1',
'sa_in': 'sa_IN.UTF-8',
'sat_in': 'sat_IN.UTF-8',
'sc_it': 'sc_IT.UTF-8',
'sd': 'sd_IN.UTF-8',
'sd_in': 'sd_IN.UTF-8',
'sd_in@devanagari.utf8': 'sd_IN.UTF-8@devanagari',
'sd_pk': 'sd_PK.UTF-8',
'se_no': 'se_NO.UTF-8',
'serbocroatian': 'sr_RS.UTF-8@latin',
'sh': 'sr_RS.UTF-8@latin',
'sh_ba.iso88592@bosnia': 'sr_CS.ISO8859-2',
'sh_hr': 'sh_HR.ISO8859-2',
'sh_hr.iso88592': 'hr_HR.ISO8859-2',
'sh_sp': 'sr_CS.ISO8859-2',
'sh_yu': 'sr_RS.UTF-8@latin',
'shs_ca': 'shs_CA.UTF-8',
'si': 'si_LK.UTF-8',
'si_lk': 'si_LK.UTF-8',
'sid_et': 'sid_ET.UTF-8',
'sinhala': 'si_LK.UTF-8',
'sk': 'sk_SK.ISO8859-2',
'sk_sk': 'sk_SK.ISO8859-2',
'sl': 'sl_SI.ISO8859-2',
'sl_cs': 'sl_CS.ISO8859-2',
'sl_si': 'sl_SI.ISO8859-2',
'slovak': 'sk_SK.ISO8859-2',
'slovene': 'sl_SI.ISO8859-2',
'slovenian': 'sl_SI.ISO8859-2',
'so_dj': 'so_DJ.ISO8859-1',
'so_et': 'so_ET.UTF-8',
'so_ke': 'so_KE.ISO8859-1',
'so_so': 'so_SO.ISO8859-1',
'sp': 'sr_CS.ISO8859-5',
'sp_yu': 'sr_CS.ISO8859-5',
'spanish': 'es_ES.ISO8859-1',
'spanish_spain': 'es_ES.ISO8859-1',
'sq': 'sq_AL.ISO8859-2',
'sq_al': 'sq_AL.ISO8859-2',
'sq_mk': 'sq_MK.UTF-8',
'sr': 'sr_RS.UTF-8',
'sr@cyrillic': 'sr_RS.UTF-8',
'sr@latn': 'sr_CS.UTF-8@latin',
'sr_cs': 'sr_CS.UTF-8',
'sr_cs.iso88592@latn': 'sr_CS.ISO8859-2',
'sr_cs@latn': 'sr_CS.UTF-8@latin',
'sr_me': 'sr_ME.UTF-8',
'sr_rs': 'sr_RS.UTF-8',
'sr_rs@latn': 'sr_RS.UTF-8@latin',
'sr_sp': 'sr_CS.ISO8859-2',
'sr_yu': 'sr_RS.UTF-8@latin',
'sr_yu.cp1251@cyrillic': 'sr_CS.CP1251',
'sr_yu.iso88592': 'sr_CS.ISO8859-2',
'sr_yu.iso88595': 'sr_CS.ISO8859-5',
'sr_yu.iso88595@cyrillic': 'sr_CS.ISO8859-5',
'sr_yu.microsoftcp1251@cyrillic': 'sr_CS.CP1251',
'sr_yu.utf8': 'sr_RS.UTF-8',
'sr_yu.utf8@cyrillic': 'sr_RS.UTF-8',
'sr_yu@cyrillic': 'sr_RS.UTF-8',
'ss': 'ss_ZA.ISO8859-1',
'ss_za': 'ss_ZA.ISO8859-1',
'st': 'st_ZA.ISO8859-1',
'st_za': 'st_ZA.ISO8859-1',
'sv': 'sv_SE.ISO8859-1',
'sv_fi': 'sv_FI.ISO8859-1',
'sv_se': 'sv_SE.ISO8859-1',
'sw_ke': 'sw_KE.UTF-8',
'sw_tz': 'sw_TZ.UTF-8',
'swedish': 'sv_SE.ISO8859-1',
'szl_pl': 'szl_PL.UTF-8',
'ta': 'ta_IN.TSCII-0',
'ta_in': 'ta_IN.TSCII-0',
'ta_in.tscii': 'ta_IN.TSCII-0',
'ta_in.tscii0': 'ta_IN.TSCII-0',
'ta_lk': 'ta_LK.UTF-8',
'te': 'te_IN.UTF-8',
'te_in': 'te_IN.UTF-8',
'tg': 'tg_TJ.KOI8-C',
'tg_tj': 'tg_TJ.KOI8-C',
'th': 'th_TH.ISO8859-11',
'th_th': 'th_TH.ISO8859-11',
'th_th.tactis': 'th_TH.TIS620',
'th_th.tis620': 'th_TH.TIS620',
'thai': 'th_TH.ISO8859-11',
'ti_er': 'ti_ER.UTF-8',
'ti_et': 'ti_ET.UTF-8',
'tig_er': 'tig_ER.UTF-8',
'tk_tm': 'tk_TM.UTF-8',
'tl': 'tl_PH.ISO8859-1',
'tl_ph': 'tl_PH.ISO8859-1',
'tn': 'tn_ZA.ISO8859-15',
'tn_za': 'tn_ZA.ISO8859-15',
'tr': 'tr_TR.ISO8859-9',
'tr_cy': 'tr_CY.ISO8859-9',
'tr_tr': 'tr_TR.ISO8859-9',
'ts': 'ts_ZA.ISO8859-1',
'ts_za': 'ts_ZA.ISO8859-1',
'tt': 'tt_RU.TATAR-CYR',
'tt_ru': 'tt_RU.TATAR-CYR',
'tt_ru.tatarcyr': 'tt_RU.TATAR-CYR',
'tt_ru@iqtelif': 'tt_RU.UTF-8@iqtelif',
'turkish': 'tr_TR.ISO8859-9',
'ug_cn': 'ug_CN.UTF-8',
'uk': 'uk_UA.KOI8-U',
'uk_ua': 'uk_UA.KOI8-U',
'univ': 'en_US.utf',
'universal': 'en_US.utf',
'universal.utf8@ucs4': 'en_US.UTF-8',
'unm_us': 'unm_US.UTF-8',
'ur': 'ur_PK.CP1256',
'ur_in': 'ur_IN.UTF-8',
'ur_pk': 'ur_PK.CP1256',
'uz': 'uz_UZ.UTF-8',
'uz_uz': 'uz_UZ.UTF-8',
'uz_uz@cyrillic': 'uz_UZ.UTF-8',
've': 've_ZA.UTF-8',
've_za': 've_ZA.UTF-8',
'vi': 'vi_VN.TCVN',
'vi_vn': 'vi_VN.TCVN',
'vi_vn.tcvn': 'vi_VN.TCVN',
'vi_vn.tcvn5712': 'vi_VN.TCVN',
'vi_vn.viscii': 'vi_VN.VISCII',
'vi_vn.viscii111': 'vi_VN.VISCII',
'wa': 'wa_BE.ISO8859-1',
'wa_be': 'wa_BE.ISO8859-1',
'wae_ch': 'wae_CH.UTF-8',
'wal_et': 'wal_ET.UTF-8',
'wo_sn': 'wo_SN.UTF-8',
'xh': 'xh_ZA.ISO8859-1',
'xh_za': 'xh_ZA.ISO8859-1',
'yi': 'yi_US.CP1255',
'yi_us': 'yi_US.CP1255',
'yo_ng': 'yo_NG.UTF-8',
'yue_hk': 'yue_HK.UTF-8',
'zh': 'zh_CN.eucCN',
'zh_cn': 'zh_CN.gb2312',
'zh_cn.big5': 'zh_TW.big5',
'zh_cn.euc': 'zh_CN.eucCN',
'zh_hk': 'zh_HK.big5hkscs',
'zh_hk.big5hk': 'zh_HK.big5hkscs',
'zh_sg': 'zh_SG.GB2312',
'zh_sg.gbk': 'zh_SG.GBK',
'zh_tw': 'zh_TW.big5',
'zh_tw.euc': 'zh_TW.eucTW',
'zh_tw.euctw': 'zh_TW.eucTW',
'zu': 'zu_ZA.ISO8859-1',
'zu_za': 'zu_ZA.ISO8859-1',
}
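# Illustration: locale.normalize() resolves lowercase aliases through the
# locale_alias table above and fills in the default encoding (the results
# simply reflect the table entries):
#
# >>> import locale
# >>> locale.normalize('en_us')
# 'en_US.ISO8859-1'
# >>> locale.normalize('sv')
# 'sv_SE.ISO8859-1'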
#
# This maps Windows language identifiers to locale strings.
#
# This list has been updated from
# http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_238z.asp
# to include every locale up to Windows Vista.
#
# NOTE: this mapping is incomplete. If your language is missing, please
# submit a bug report to the Python bug tracker at http://bugs.python.org/
# Make sure you include the missing language identifier and the suggested
# locale code.
#
windows_locale = {
0x0436: "af_ZA", # Afrikaans
0x041c: "sq_AL", # Albanian
0x0484: "gsw_FR",# Alsatian - France
0x045e: "am_ET", # Amharic - Ethiopia
0x0401: "ar_SA", # Arabic - Saudi Arabia
0x0801: "ar_IQ", # Arabic - Iraq
0x0c01: "ar_EG", # Arabic - Egypt
0x1001: "ar_LY", # Arabic - Libya
0x1401: "ar_DZ", # Arabic - Algeria
0x1801: "ar_MA", # Arabic - Morocco
0x1c01: "ar_TN", # Arabic - Tunisia
0x2001: "ar_OM", # Arabic - Oman
0x2401: "ar_YE", # Arabic - Yemen
0x2801: "ar_SY", # Arabic - Syria
0x2c01: "ar_JO", # Arabic - Jordan
0x3001: "ar_LB", # Arabic - Lebanon
0x3401: "ar_KW", # Arabic - Kuwait
0x3801: "ar_AE", # Arabic - United Arab Emirates
0x3c01: "ar_BH", # Arabic - Bahrain
0x4001: "ar_QA", # Arabic - Qatar
0x042b: "hy_AM", # Armenian
0x044d: "as_IN", # Assamese - India
0x042c: "az_AZ", # Azeri - Latin
0x082c: "az_AZ", # Azeri - Cyrillic
0x046d: "ba_RU", # Bashkir
0x042d: "eu_ES", # Basque - Russia
0x0423: "be_BY", # Belarusian
0x0445: "bn_IN", # Begali
0x201a: "bs_BA", # Bosnian - Cyrillic
0x141a: "bs_BA", # Bosnian - Latin
0x047e: "br_FR", # Breton - France
0x0402: "bg_BG", # Bulgarian
# 0x0455: "my_MM", # Burmese - Not supported
0x0403: "ca_ES", # Catalan
0x0004: "zh_CHS",# Chinese - Simplified
0x0404: "zh_TW", # Chinese - Taiwan
0x0804: "zh_CN", # Chinese - PRC
0x0c04: "zh_HK", # Chinese - Hong Kong S.A.R.
0x1004: "zh_SG", # Chinese - Singapore
0x1404: "zh_MO", # Chinese - Macao S.A.R.
0x7c04: "zh_CHT",# Chinese - Traditional
0x0483: "co_FR", # Corsican - France
0x041a: "hr_HR", # Croatian
0x101a: "hr_BA", # Croatian - Bosnia
0x0405: "cs_CZ", # Czech
0x0406: "da_DK", # Danish
0x048c: "gbz_AF",# Dari - Afghanistan
0x0465: "div_MV",# Divehi - Maldives
0x0413: "nl_NL", # Dutch - The Netherlands
0x0813: "nl_BE", # Dutch - Belgium
0x0409: "en_US", # English - United States
0x0809: "en_GB", # English - United Kingdom
0x0c09: "en_AU", # English - Australia
0x1009: "en_CA", # English - Canada
0x1409: "en_NZ", # English - New Zealand
0x1809: "en_IE", # English - Ireland
0x1c09: "en_ZA", # English - South Africa
0x2009: "en_JA", # English - Jamaica
0x2409: "en_CB", # English - Carribbean
0x2809: "en_BZ", # English - Belize
0x2c09: "en_TT", # English - Trinidad
0x3009: "en_ZW", # English - Zimbabwe
0x3409: "en_PH", # English - Philippines
0x4009: "en_IN", # English - India
0x4409: "en_MY", # English - Malaysia
0x4809: "en_IN", # English - Singapore
0x0425: "et_EE", # Estonian
0x0438: "fo_FO", # Faroese
0x0464: "fil_PH",# Filipino
0x040b: "fi_FI", # Finnish
0x040c: "fr_FR", # French - France
0x080c: "fr_BE", # French - Belgium
0x0c0c: "fr_CA", # French - Canada
0x100c: "fr_CH", # French - Switzerland
0x140c: "fr_LU", # French - Luxembourg
0x180c: "fr_MC", # French - Monaco
0x0462: "fy_NL", # Frisian - Netherlands
0x0456: "gl_ES", # Galician
0x0437: "ka_GE", # Georgian
0x0407: "de_DE", # German - Germany
0x0807: "de_CH", # German - Switzerland
0x0c07: "de_AT", # German - Austria
0x1007: "de_LU", # German - Luxembourg
0x1407: "de_LI", # German - Liechtenstein
0x0408: "el_GR", # Greek
0x046f: "kl_GL", # Greenlandic - Greenland
0x0447: "gu_IN", # Gujarati
0x0468: "ha_NG", # Hausa - Latin
0x040d: "he_IL", # Hebrew
0x0439: "hi_IN", # Hindi
0x040e: "hu_HU", # Hungarian
0x040f: "is_IS", # Icelandic
0x0421: "id_ID", # Indonesian
0x045d: "iu_CA", # Inuktitut - Syllabics
0x085d: "iu_CA", # Inuktitut - Latin
0x083c: "ga_IE", # Irish - Ireland
0x0410: "it_IT", # Italian - Italy
0x0810: "it_CH", # Italian - Switzerland
0x0411: "ja_JP", # Japanese
0x044b: "kn_IN", # Kannada - India
0x043f: "kk_KZ", # Kazakh
0x0453: "kh_KH", # Khmer - Cambodia
0x0486: "qut_GT",# K'iche - Guatemala
0x0487: "rw_RW", # Kinyarwanda - Rwanda
0x0457: "kok_IN",# Konkani
0x0412: "ko_KR", # Korean
0x0440: "ky_KG", # Kyrgyz
0x0454: "lo_LA", # Lao - Lao PDR
0x0426: "lv_LV", # Latvian
0x0427: "lt_LT", # Lithuanian
0x082e: "dsb_DE",# Lower Sorbian - Germany
0x046e: "lb_LU", # Luxembourgish
0x042f: "mk_MK", # FYROM Macedonian
0x043e: "ms_MY", # Malay - Malaysia
0x083e: "ms_BN", # Malay - Brunei Darussalam
0x044c: "ml_IN", # Malayalam - India
0x043a: "mt_MT", # Maltese
0x0481: "mi_NZ", # Maori
0x047a: "arn_CL",# Mapudungun
0x044e: "mr_IN", # Marathi
0x047c: "moh_CA",# Mohawk - Canada
0x0450: "mn_MN", # Mongolian - Cyrillic
0x0850: "mn_CN", # Mongolian - PRC
0x0461: "ne_NP", # Nepali
0x0414: "nb_NO", # Norwegian - Bokmal
0x0814: "nn_NO", # Norwegian - Nynorsk
0x0482: "oc_FR", # Occitan - France
0x0448: "or_IN", # Oriya - India
0x0463: "ps_AF", # Pashto - Afghanistan
0x0429: "fa_IR", # Persian
0x0415: "pl_PL", # Polish
0x0416: "pt_BR", # Portuguese - Brazil
0x0816: "pt_PT", # Portuguese - Portugal
0x0446: "pa_IN", # Punjabi
0x046b: "quz_BO",# Quechua (Bolivia)
0x086b: "quz_EC",# Quechua (Ecuador)
0x0c6b: "quz_PE",# Quechua (Peru)
0x0418: "ro_RO", # Romanian - Romania
0x0417: "rm_CH", # Romansh
0x0419: "ru_RU", # Russian
0x243b: "smn_FI",# Sami Finland
0x103b: "smj_NO",# Sami Norway
0x143b: "smj_SE",# Sami Sweden
0x043b: "se_NO", # Sami Northern Norway
0x083b: "se_SE", # Sami Northern Sweden
0x0c3b: "se_FI", # Sami Northern Finland
0x203b: "sms_FI",# Sami Skolt
0x183b: "sma_NO",# Sami Southern Norway
0x1c3b: "sma_SE",# Sami Southern Sweden
0x044f: "sa_IN", # Sanskrit
0x0c1a: "sr_SP", # Serbian - Cyrillic
0x1c1a: "sr_BA", # Serbian - Bosnia Cyrillic
0x081a: "sr_SP", # Serbian - Latin
0x181a: "sr_BA", # Serbian - Bosnia Latin
0x045b: "si_LK", # Sinhala - Sri Lanka
0x046c: "ns_ZA", # Northern Sotho
0x0432: "tn_ZA", # Setswana - Southern Africa
0x041b: "sk_SK", # Slovak
0x0424: "sl_SI", # Slovenian
0x040a: "es_ES", # Spanish - Spain
0x080a: "es_MX", # Spanish - Mexico
0x0c0a: "es_ES", # Spanish - Spain (Modern)
0x100a: "es_GT", # Spanish - Guatemala
0x140a: "es_CR", # Spanish - Costa Rica
0x180a: "es_PA", # Spanish - Panama
0x1c0a: "es_DO", # Spanish - Dominican Republic
0x200a: "es_VE", # Spanish - Venezuela
0x240a: "es_CO", # Spanish - Colombia
0x280a: "es_PE", # Spanish - Peru
0x2c0a: "es_AR", # Spanish - Argentina
0x300a: "es_EC", # Spanish - Ecuador
0x340a: "es_CL", # Spanish - Chile
0x380a: "es_UR", # Spanish - Uruguay
0x3c0a: "es_PY", # Spanish - Paraguay
0x400a: "es_BO", # Spanish - Bolivia
0x440a: "es_SV", # Spanish - El Salvador
0x480a: "es_HN", # Spanish - Honduras
0x4c0a: "es_NI", # Spanish - Nicaragua
0x500a: "es_PR", # Spanish - Puerto Rico
0x540a: "es_US", # Spanish - United States
# 0x0430: "", # Sutu - Not supported
0x0441: "sw_KE", # Swahili
0x041d: "sv_SE", # Swedish - Sweden
0x081d: "sv_FI", # Swedish - Finland
0x045a: "syr_SY",# Syriac
0x0428: "tg_TJ", # Tajik - Cyrillic
0x085f: "tmz_DZ",# Tamazight - Latin
0x0449: "ta_IN", # Tamil
0x0444: "tt_RU", # Tatar
0x044a: "te_IN", # Telugu
0x041e: "th_TH", # Thai
0x0851: "bo_BT", # Tibetan - Bhutan
0x0451: "bo_CN", # Tibetan - PRC
0x041f: "tr_TR", # Turkish
0x0442: "tk_TM", # Turkmen - Cyrillic
0x0480: "ug_CN", # Uighur - Arabic
0x0422: "uk_UA", # Ukrainian
0x042e: "wen_DE",# Upper Sorbian - Germany
0x0420: "ur_PK", # Urdu
0x0820: "ur_IN", # Urdu - India
0x0443: "uz_UZ", # Uzbek - Latin
0x0843: "uz_UZ", # Uzbek - Cyrillic
0x042a: "vi_VN", # Vietnamese
0x0452: "cy_GB", # Welsh
0x0488: "wo_SN", # Wolof - Senegal
0x0434: "xh_ZA", # Xhosa - South Africa
0x0485: "sah_RU",# Yakut - Cyrillic
0x0478: "ii_CN", # Yi - PRC
0x046a: "yo_NG", # Yoruba - Nigeria
0x0435: "zu_ZA", # Zulu
}
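# Illustration: a Windows language identifier can be looked up directly in
# this table; 0x0409 is the identifier for English - United States:
#
# >>> windows_locale[0x0409]
# 'en_US'
# >>> windows_locale.get(0x0407)
# 'de_DE'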
def _print_locale():
""" Test function.
"""
categories = {}
def _init_categories(categories=categories):
for k,v in globals().items():
if k[:3] == 'LC_':
categories[k] = v
_init_categories()
del categories['LC_ALL']
print('Locale defaults as determined by getdefaultlocale():')
print('-'*72)
lang, enc = getdefaultlocale()
print('Language: ', lang or '(undefined)')
print('Encoding: ', enc or '(undefined)')
print()
print('Locale settings on startup:')
print('-'*72)
for name,category in categories.items():
print(name, '...')
lang, enc = getlocale(category)
print(' Language: ', lang or '(undefined)')
print(' Encoding: ', enc or '(undefined)')
print()
print()
print('Locale settings after calling resetlocale():')
print('-'*72)
resetlocale()
for name,category in categories.items():
print(name, '...')
lang, enc = getlocale(category)
print(' Language: ', lang or '(undefined)')
print(' Encoding: ', enc or '(undefined)')
print()
try:
setlocale(LC_ALL, "")
except:
print('NOTE:')
print('setlocale(LC_ALL, "") does not support the default locale')
print('given in the OS environment variables.')
else:
print()
print('Locale settings after calling setlocale(LC_ALL, ""):')
print('-'*72)
for name,category in categories.items():
print(name, '...')
lang, enc = getlocale(category)
print(' Language: ', lang or '(undefined)')
print(' Encoding: ', enc or '(undefined)')
print()
###
try:
LC_MESSAGES
except NameError:
pass
else:
__all__.append("LC_MESSAGES")
if __name__=='__main__':
print('Locale aliasing:')
print()
_print_locale()
print()
print('Number formatting:')
print()
_test()
"""Interface to the liblzma compression library.
This module provides a class for reading and writing compressed files,
classes for incremental (de)compression, and convenience functions for
one-shot (de)compression.
These classes and functions support both the XZ and legacy LZMA
container formats, as well as raw compressed data streams.
"""
__all__ = [
"CHECK_NONE", "CHECK_CRC32", "CHECK_CRC64", "CHECK_SHA256",
"CHECK_ID_MAX", "CHECK_UNKNOWN",
"FILTER_LZMA1", "FILTER_LZMA2", "FILTER_DELTA", "FILTER_X86", "FILTER_IA64",
"FILTER_ARM", "FILTER_ARMTHUMB", "FILTER_POWERPC", "FILTER_SPARC",
"FORMAT_AUTO", "FORMAT_XZ", "FORMAT_ALONE", "FORMAT_RAW",
"MF_HC3", "MF_HC4", "MF_BT2", "MF_BT3", "MF_BT4",
"MODE_FAST", "MODE_NORMAL", "PRESET_DEFAULT", "PRESET_EXTREME",
"LZMACompressor", "LZMADecompressor", "LZMAFile", "LZMAError",
"open", "compress", "decompress", "is_check_supported",
]
import builtins
import io
from _lzma import *
from _lzma import _encode_filter_properties, _decode_filter_properties
_MODE_CLOSED = 0
_MODE_READ = 1
_MODE_READ_EOF = 2
_MODE_WRITE = 3
_BUFFER_SIZE = 8192
class LZMAFile(io.BufferedIOBase):
"""A file object providing transparent LZMA (de)compression.
An LZMAFile can act as a wrapper for an existing file object, or
refer directly to a named file on disk.
Note that LZMAFile provides a *binary* file interface - data read
is returned as bytes, and data to be written must be given as bytes.
"""
def __init__(self, filename=None, mode="r", *,
format=None, check=-1, preset=None, filters=None):
"""Open an LZMA-compressed file in binary mode.
filename can be either an actual file name (given as a str or
bytes object), in which case the named file is opened, or it can
be an existing file object to read from or write to.
mode can be "r" for reading (default), "w" for (over)writing,
"x" for creating exclusively, or "a" for appending. These can
equivalently be given as "rb", "wb", "xb" and "ab" respectively.
format specifies the container format to use for the file.
If mode is "r", this defaults to FORMAT_AUTO. Otherwise, the
default is FORMAT_XZ.
check specifies the integrity check to use. This argument can
only be used when opening a file for writing. For FORMAT_XZ,
the default is CHECK_CRC64. FORMAT_ALONE and FORMAT_RAW do not
support integrity checks - for these formats, check must be
omitted, or be CHECK_NONE.
When opening a file for reading, the *preset* argument is not
meaningful, and should be omitted. The *filters* argument should
also be omitted, except when format is FORMAT_RAW (in which case
it is required).
When opening a file for writing, the settings used by the
compressor can be specified either as a preset compression
level (with the *preset* argument), or in detail as a custom
filter chain (with the *filters* argument). For FORMAT_XZ and
FORMAT_ALONE, the default is to use the PRESET_DEFAULT preset
level. For FORMAT_RAW, the caller must always specify a filter
chain; the raw compressor does not support preset compression
levels.
preset (if provided) should be an integer in the range 0-9,
optionally OR-ed with the constant PRESET_EXTREME.
filters (if provided) should be a sequence of dicts. Each dict
should have an entry for "id" indicating the ID of the filter, plus
additional entries for options to the filter.
"""
self._fp = None
self._closefp = False
self._mode = _MODE_CLOSED
self._pos = 0
self._size = -1
if mode in ("r", "rb"):
if check != -1:
raise ValueError("Cannot specify an integrity check "
"when opening a file for reading")
if preset is not None:
raise ValueError("Cannot specify a preset compression "
"level when opening a file for reading")
if format is None:
format = FORMAT_AUTO
mode_code = _MODE_READ
# Save the args to pass to the LZMADecompressor initializer.
# If the file contains multiple compressed streams, each
# stream will need a separate decompressor object.
self._init_args = {"format":format, "filters":filters}
self._decompressor = LZMADecompressor(**self._init_args)
self._buffer = b""
self._buffer_offset = 0
elif mode in ("w", "wb", "a", "ab", "x", "xb"):
if format is None:
format = FORMAT_XZ
mode_code = _MODE_WRITE
self._compressor = LZMACompressor(format=format, check=check,
preset=preset, filters=filters)
else:
raise ValueError("Invalid mode: {!r}".format(mode))
if isinstance(filename, (str, bytes)):
if "b" not in mode:
mode += "b"
self._fp = builtins.open(filename, mode)
self._closefp = True
self._mode = mode_code
elif hasattr(filename, "read") or hasattr(filename, "write"):
self._fp = filename
self._mode = mode_code
else:
raise TypeError("filename must be a str or bytes object, or a file")
def close(self):
"""Flush and close the file.
May be called more than once without error. Once the file is
closed, any other operation on it will raise a ValueError.
"""
if self._mode == _MODE_CLOSED:
return
try:
if self._mode in (_MODE_READ, _MODE_READ_EOF):
self._decompressor = None
self._buffer = b""
elif self._mode == _MODE_WRITE:
self._fp.write(self._compressor.flush())
self._compressor = None
finally:
try:
if self._closefp:
self._fp.close()
finally:
self._fp = None
self._closefp = False
self._mode = _MODE_CLOSED
@property
def closed(self):
"""True if this file is closed."""
return self._mode == _MODE_CLOSED
def fileno(self):
"""Return the file descriptor for the underlying file."""
self._check_not_closed()
return self._fp.fileno()
def seekable(self):
"""Return whether the file supports seeking."""
return self.readable() and self._fp.seekable()
def readable(self):
"""Return whether the file was opened for reading."""
self._check_not_closed()
return self._mode in (_MODE_READ, _MODE_READ_EOF)
def writable(self):
"""Return whether the file was opened for writing."""
self._check_not_closed()
return self._mode == _MODE_WRITE
# Mode-checking helper functions.
def _check_not_closed(self):
if self.closed:
raise ValueError("I/O operation on closed file")
def _check_can_read(self):
if self._mode not in (_MODE_READ, _MODE_READ_EOF):
self._check_not_closed()
raise io.UnsupportedOperation("File not open for reading")
def _check_can_write(self):
if self._mode != _MODE_WRITE:
self._check_not_closed()
raise io.UnsupportedOperation("File not open for writing")
def _check_can_seek(self):
if self._mode not in (_MODE_READ, _MODE_READ_EOF):
self._check_not_closed()
raise io.UnsupportedOperation("Seeking is only supported "
"on files open for reading")
if not self._fp.seekable():
raise io.UnsupportedOperation("The underlying file object "
"does not support seeking")
# Fill the readahead buffer if it is empty. Returns False on EOF.
def _fill_buffer(self):
if self._mode == _MODE_READ_EOF:
return False
# Depending on the input data, our call to the decompressor may not
# return any data. In this case, try again after reading another block.
while self._buffer_offset == len(self._buffer):
rawblock = (self._decompressor.unused_data or
self._fp.read(_BUFFER_SIZE))
if not rawblock:
if self._decompressor.eof:
self._mode = _MODE_READ_EOF
self._size = self._pos
return False
else:
raise EOFError("Compressed file ended before the "
"end-of-stream marker was reached")
if self._decompressor.eof:
# Continue to next stream.
self._decompressor = LZMADecompressor(**self._init_args)
try:
self._buffer = self._decompressor.decompress(rawblock)
except LZMAError:
# Trailing data isn't a valid compressed stream; ignore it.
self._mode = _MODE_READ_EOF
self._size = self._pos
return False
else:
self._buffer = self._decompressor.decompress(rawblock)
self._buffer_offset = 0
return True
# Read data until EOF.
# If return_data is false, consume the data without returning it.
def _read_all(self, return_data=True):
# The loop assumes that _buffer_offset is 0. Ensure that this is true.
self._buffer = self._buffer[self._buffer_offset:]
self._buffer_offset = 0
blocks = []
while self._fill_buffer():
if return_data:
blocks.append(self._buffer)
self._pos += len(self._buffer)
self._buffer = b""
if return_data:
return b"".join(blocks)
# Read a block of up to n bytes.
# If return_data is false, consume the data without returning it.
def _read_block(self, n, return_data=True):
# If we have enough data buffered, return immediately.
end = self._buffer_offset + n
if end <= len(self._buffer):
data = self._buffer[self._buffer_offset : end]
self._buffer_offset = end
self._pos += len(data)
return data if return_data else None
# The loop assumes that _buffer_offset is 0. Ensure that this is true.
self._buffer = self._buffer[self._buffer_offset:]
self._buffer_offset = 0
blocks = []
while n > 0 and self._fill_buffer():
if n < len(self._buffer):
data = self._buffer[:n]
self._buffer_offset = n
else:
data = self._buffer
self._buffer = b""
if return_data:
blocks.append(data)
self._pos += len(data)
n -= len(data)
if return_data:
return b"".join(blocks)
def peek(self, size=-1):
"""Return buffered data without advancing the file position.
Always returns at least one byte of data, unless at EOF.
The exact number of bytes returned is unspecified.
"""
self._check_can_read()
if not self._fill_buffer():
return b""
return self._buffer[self._buffer_offset:]
def read(self, size=-1):
"""Read up to size uncompressed bytes from the file.
If size is negative or omitted, read until EOF is reached.
Returns b"" if the file is already at EOF.
"""
self._check_can_read()
if size == 0:
return b""
elif size < 0:
return self._read_all()
else:
return self._read_block(size)
def read1(self, size=-1):
"""Read up to size uncompressed bytes, while trying to avoid
making multiple reads from the underlying stream.
Returns b"" if the file is at EOF.
"""
# Usually, read1() calls _fp.read() at most once. However, sometimes
# this does not give enough data for the decompressor to make progress.
# In this case we make multiple reads, to avoid returning b"".
self._check_can_read()
if (size == 0 or
# Only call _fill_buffer() if the buffer is actually empty.
# This gives a significant speedup if *size* is small.
(self._buffer_offset == len(self._buffer) and not self._fill_buffer())):
return b""
if size > 0:
data = self._buffer[self._buffer_offset :
self._buffer_offset + size]
self._buffer_offset += len(data)
else:
data = self._buffer[self._buffer_offset:]
self._buffer = b""
self._buffer_offset = 0
self._pos += len(data)
return data
def readline(self, size=-1):
"""Read a line of uncompressed bytes from the file.
The terminating newline (if present) is retained. If size is
non-negative, no more than size bytes will be read (in which
case the line may be incomplete). Returns b'' if already at EOF.
"""
self._check_can_read()
# Shortcut for the common case - the whole line is in the buffer.
if size < 0:
end = self._buffer.find(b"\n", self._buffer_offset) + 1
if end > 0:
line = self._buffer[self._buffer_offset : end]
self._buffer_offset = end
self._pos += len(line)
return line
return io.BufferedIOBase.readline(self, size)
def write(self, data):
"""Write a bytes object to the file.
Returns the number of uncompressed bytes written, which is
always len(data). Note that due to buffering, the file on disk
may not reflect the data written until close() is called.
"""
self._check_can_write()
compressed = self._compressor.compress(data)
self._fp.write(compressed)
self._pos += len(data)
return len(data)
# Rewind the file to the beginning of the data stream.
def _rewind(self):
self._fp.seek(0, 0)
self._mode = _MODE_READ
self._pos = 0
self._decompressor = LZMADecompressor(**self._init_args)
self._buffer = b""
self._buffer_offset = 0
def seek(self, offset, whence=0):
"""Change the file position.
The new position is specified by offset, relative to the
position indicated by whence. Possible values for whence are:
0: start of stream (default): offset must not be negative
1: current stream position
2: end of stream; offset must not be positive
Returns the new file position.
Note that seeking is emulated, so depending on the parameters,
this operation may be extremely slow.
"""
self._check_can_seek()
# Recalculate offset as an absolute file position.
if whence == 0:
pass
elif whence == 1:
offset = self._pos + offset
elif whence == 2:
# Seeking relative to EOF - we need to know the file's size.
if self._size < 0:
self._read_all(return_data=False)
offset = self._size + offset
else:
raise ValueError("Invalid value for whence: {}".format(whence))
# Make it so that offset is the number of bytes to skip forward.
if offset < self._pos:
self._rewind()
else:
offset -= self._pos
# Read and discard data until we reach the desired position.
self._read_block(offset, return_data=False)
return self._pos
def tell(self):
"""Return the current file position."""
self._check_not_closed()
return self._pos
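# Illustrative sketch of seek()/tell(), reusing the hypothetical
# 'archive.xz' from the sketch above: a backward seek rewinds and
# re-decompresses from the start, which is why it can be slow.
#
# >>> f = LZMAFile('archive.xz')
# >>> f.read(3)
# b'pay'
# >>> f.tell()
# 3
# >>> f.seek(0)
# 0
# >>> f.read()
# b'payload'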
def open(filename, mode="rb", *,
format=None, check=-1, preset=None, filters=None,
encoding=None, errors=None, newline=None):
"""Open an LZMA-compressed file in binary or text mode.
filename can be either an actual file name (given as a str or bytes
object), in which case the named file is opened, or it can be an
existing file object to read from or write to.
The mode argument can be "r", "rb" (default), "w", "wb", "x", "xb",
"a", or "ab" for binary mode, or "rt", "wt", "xt", or "at" for text
mode.
The format, check, preset and filters arguments specify the
compression settings, as for LZMACompressor, LZMADecompressor and
LZMAFile.
For binary mode, this function is equivalent to the LZMAFile
constructor: LZMAFile(filename, mode, ...). In this case, the
encoding, errors and newline arguments must not be provided.
For text mode, a LZMAFile object is created, and wrapped in an
io.TextIOWrapper instance with the specified encoding, error
handling behavior, and line ending(s).
"""
if "t" in mode:
if "b" in mode:
raise ValueError("Invalid mode: %r" % (mode,))
else:
if encoding is not None:
raise ValueError("Argument 'encoding' not supported in binary mode")
if errors is not None:
raise ValueError("Argument 'errors' not supported in binary mode")
if newline is not None:
raise ValueError("Argument 'newline' not supported in binary mode")
lz_mode = mode.replace("t", "")
binary_file = LZMAFile(filename, lz_mode, format=format, check=check,
preset=preset, filters=filters)
if "t" in mode:
return io.TextIOWrapper(binary_file, encoding, errors, newline)
else:
return binary_file
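# Illustrative sketch of text mode ('notes.xz' is a hypothetical path):
# the LZMAFile is wrapped in an io.TextIOWrapper, so str goes in and out.
#
# >>> with open('notes.xz', 'wt') as f:
# ...     f.write('hello\n')
# 6
# >>> with open('notes.xz', 'rt') as f:
# ...     f.read()
# 'hello\n'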
def compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None):
"""Compress a block of data.
Refer to LZMACompressor's docstring for a description of the
optional arguments *format*, *check*, *preset* and *filters*.
For incremental compression, use an LZMACompressor instead.
"""
comp = LZMACompressor(format, check, preset, filters)
return comp.compress(data) + comp.flush()
def decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None):
"""Decompress a block of data.
Refer to LZMADecompressor's docstring for a description of the
optional arguments *format*, *memlimit* and *filters*.
For incremental decompression, use an LZMADecompressor instead.
"""
results = []
while True:
decomp = LZMADecompressor(format, memlimit, filters)
try:
res = decomp.decompress(data)
except LZMAError:
if results:
break # Leftover data is not a valid LZMA/XZ stream; ignore it.
else:
raise # Error on the first iteration; bail out.
results.append(res)
if not decomp.eof:
raise LZMAError("Compressed data ended before the "
"end-of-stream marker was reached")
data = decomp.unused_data
if not data:
break
return b"".join(results)
"""Pathname and path-related operations for the Macintosh."""
import os
from stat import *
import genericpath
from genericpath import *
__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
"basename","dirname","commonprefix","getsize","getmtime",
"getatime","getctime", "islink","exists","lexists","isdir","isfile",
"expanduser","expandvars","normpath","abspath",
"curdir","pardir","sep","pathsep","defpath","altsep","extsep",
"devnull","realpath","supports_unicode_filenames"]
# strings representing various path-related bits and pieces
# These are primarily for export; internally, they are hardcoded.
curdir = ':'
pardir = '::'
extsep = '.'
sep = ':'
pathsep = '\n'
defpath = ':'
altsep = None
devnull = 'Dev:Null'
def _get_colon(path):
if isinstance(path, bytes):
return b':'
else:
return ':'
# Normalize the case of a pathname. Dummy in Posix, but <s>.lower() here.
def normcase(path):
if not isinstance(path, (bytes, str)):
raise TypeError("normcase() argument must be str or bytes, "
"not '{}'".format(path.__class__.__name__))
return path.lower()
def isabs(s):
"""Return true if a path is absolute.
On the Mac, relative paths begin with a colon,
but as a special case, paths with no colons at all are also relative.
Anything else is absolute (the string up to the first colon is the
volume name)."""
colon = _get_colon(s)
return colon in s and s[:1] != colon
def join(s, *p):
colon = _get_colon(s)
path = s
for t in p:
if (not path) or isabs(t):
path = t
continue
if t[:1] == colon:
t = t[1:]
if colon not in path:
path = colon + path
if path[-1:] != colon:
path = path + colon
path = path + t
return path
def split(s):
"""Split a pathname into two parts: the directory leading up to the final
bit, and the basename (the filename, without colons, in that directory).
The result (s, t) is such that join(s, t) yields the original argument."""
colon = _get_colon(s)
if colon not in s: return s[:0], s
col = 0
for i in range(len(s)):
if s[i:i+1] == colon: col = i + 1
path, file = s[:col-1], s[col:]
if path and not colon in path:
path = path + colon
return path, file
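# Illustration of the colon conventions described above ('MacHD' is a
# hypothetical volume name):
#
# >>> join('MacHD:Docs', 'file.txt')
# 'MacHD:Docs:file.txt'
# >>> split('MacHD:Docs:file.txt')
# ('MacHD:Docs', 'file.txt')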
def splitext(p):
if isinstance(p, bytes):
return genericpath._splitext(p, b':', altsep, b'.')
else:
return genericpath._splitext(p, sep, altsep, extsep)
splitext.__doc__ = genericpath._splitext.__doc__
def splitdrive(p):
"""Split a pathname into a drive specification and the rest of the
path. Useful on DOS/Windows/NT; on the Mac, the drive is always
empty (don't use the volume name -- it doesn't have the same
syntactic and semantic oddities as DOS drive letters, such as there
being a separate current directory per drive)."""
return p[:0], p
# Short interfaces to split()
def dirname(s): return split(s)[0]
def basename(s): return split(s)[1]
def ismount(s):
if not isabs(s):
return False
components = split(s)
return len(components) == 2 and not components[1]
def islink(s):
"""Return true if the pathname refers to a symbolic link."""
try:
import Carbon.File
return Carbon.File.ResolveAliasFile(s, 0)[2]
except:
return False
# Is `stat`/`lstat` a meaningful difference on the Mac? This is safe in any
# case.
def lexists(path):
"""Test whether a path exists. Returns True for broken symbolic links"""
try:
st = os.lstat(path)
except OSError:
return False
return True
def expandvars(path):
"""Dummy to retain interface-compatibility with other operating systems."""
return path
def expanduser(path):
"""Dummy to retain interface-compatibility with other operating systems."""
return path
class norm_error(Exception):
"""Path cannot be normalized"""
def normpath(s):
"""Normalize a pathname. Will return the same result for
equivalent paths."""
colon = _get_colon(s)
if colon not in s:
return colon + s
comps = s.split(colon)
i = 1
while i < len(comps)-1:
if not comps[i] and comps[i-1]:
if i > 1:
del comps[i-1:i+1]
i = i - 1
else:
# best way to handle this is to raise an exception
raise norm_error('Cannot use :: immediately after volume name')
else:
i = i + 1
s = colon.join(comps)
# remove trailing ":" except for ":" and "Volume:"
if s[-1:] == colon and len(comps) > 2 and s != colon*len(s):
s = s[:-1]
return s
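# Illustration: '::' steps up one directory level, and a name without any
# colons is made explicitly relative:
#
# >>> normpath('MacHD:Docs::file')
# 'MacHD:file'
# >>> normpath('file')
# ':file'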
def abspath(path):
"""Return an absolute path."""
if not isabs(path):
if isinstance(path, bytes):
cwd = os.getcwdb()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)
# realpath is a no-op on systems without islink support
def realpath(path):
path = abspath(path)
try:
import Carbon.File
except ImportError:
return path
if not path:
return path
colon = _get_colon(path)
components = path.split(colon)
path = components[0] + colon
for c in components[1:]:
path = join(path, c)
try:
path = Carbon.File.FSResolveAliasFile(path, 1)[0].as_pathname()
except Carbon.File.Error:
pass
return path
supports_unicode_filenames = True
"""Macintosh-specific module for conversion between pathnames and URLs.
Do not import directly; use urllib instead."""
import urllib.parse
import os
__all__ = ["url2pathname","pathname2url"]
def url2pathname(pathname):
"""OS-specific conversion from a relative URL of the 'file' scheme
to a file system path; not recommended for general use."""
#
# XXXX The .. handling should be fixed...
#
tp = urllib.parse.splittype(pathname)[0]
if tp and tp != 'file':
raise RuntimeError('Cannot convert non-local URL to pathname')
# Turn a leading /// into /; an empty hostname means the current host
if pathname[:3] == '///':
pathname = pathname[2:]
elif pathname[:2] == '//':
raise RuntimeError('Cannot convert non-local URL to pathname')
components = pathname.split('/')
# Remove . and embedded ..
i = 0
while i < len(components):
if components[i] == '.':
del components[i]
elif components[i] == '..' and i > 0 and \
components[i-1] not in ('', '..'):
del components[i-1:i+1]
i = i-1
elif components[i] == '' and i > 0 and components[i-1] != '':
del components[i]
else:
i = i+1
if not components[0]:
# Absolute unix path, don't start with colon
rv = ':'.join(components[1:])
else:
# relative unix path, start with colon. First replace
# leading .. by empty strings (giving ::file)
i = 0
while i < len(components) and components[i] == '..':
components[i] = ''
i = i + 1
rv = ':' + ':'.join(components)
# and finally unquote slashes and other funny characters
return urllib.parse.unquote(rv)
def pathname2url(pathname):
"""OS-specific conversion from a file system path to a relative URL
of the 'file' scheme; not recommended for general use."""
if '/' in pathname:
raise RuntimeError("Cannot convert pathname containing slashes")
components = pathname.split(':')
# Remove empty first and/or last component
if components[0] == '':
del components[0]
if components[-1] == '':
del components[-1]
# Replace empty string ('::') by .. (will result in '/../' later)
for i in range(len(components)):
if components[i] == '':
components[i] = '..'
# Truncate names longer than 31 bytes
components = map(_pncomp2url, components)
if os.path.isabs(pathname):
return '/' + '/'.join(components)
else:
return '/'.join(components)
def _pncomp2url(component):
# We want to quote slashes
return urllib.parse.quote(component[:31], safe='')
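# Illustration of the URL-to-pathname direction: a leading '/' marks an
# absolute (volume-first) Mac path, while a relative URL gains a leading
# colon:
#
# >>> url2pathname('/foo/bar')
# 'foo:bar'
# >>> url2pathname('foo/bar')
# ':foo:bar'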
"""Read/write support for Maildir, mbox, MH, Babyl, and MMDF mailboxes."""
# Notes for authors of new mailbox subclasses:
#
# Remember to fsync() changes to disk before closing a modified file
# or returning from a flush() method. See functions _sync_flush() and
# _sync_close().
import os
import time
import calendar
import socket
import errno
import copy
import warnings
import email
import email.message
import email.generator
import io
import contextlib
try:
import fcntl
except ImportError:
fcntl = None
__all__ = [ 'Mailbox', 'Maildir', 'mbox', 'MH', 'Babyl', 'MMDF',
'Message', 'MaildirMessage', 'mboxMessage', 'MHMessage',
'BabylMessage', 'MMDFMessage']
linesep = os.linesep.encode('ascii')
class Mailbox:
"""A group of messages in a particular place."""
def __init__(self, path, factory=None, create=True):
"""Initialize a Mailbox instance."""
self._path = os.path.abspath(os.path.expanduser(path))
self._factory = factory
def add(self, message):
"""Add message and return assigned key."""
raise NotImplementedError('Method must be implemented by subclass')
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
raise NotImplementedError('Method must be implemented by subclass')
def __delitem__(self, key):
self.remove(key)
def discard(self, key):
"""If the keyed message exists, remove it."""
try:
self.remove(key)
except KeyError:
pass
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
raise NotImplementedError('Method must be implemented by subclass')
def get(self, key, default=None):
"""Return the keyed message, or default if it doesn't exist."""
try:
return self.__getitem__(key)
except KeyError:
return default
def __getitem__(self, key):
"""Return the keyed message; raise KeyError if it doesn't exist."""
if not self._factory:
return self.get_message(key)
else:
with contextlib.closing(self.get_file(key)) as file:
return self._factory(file)
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
raise NotImplementedError('Method must be implemented by subclass')
def get_string(self, key):
"""Return a string representation or raise a KeyError.
Uses email.message.Message to create a 7bit clean string
representation of the message."""
return email.message_from_bytes(self.get_bytes(key)).as_string()
def get_bytes(self, key):
"""Return a byte string representation or raise a KeyError."""
raise NotImplementedError('Method must be implemented by subclass')
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
raise NotImplementedError('Method must be implemented by subclass')
def iterkeys(self):
"""Return an iterator over keys."""
raise NotImplementedError('Method must be implemented by subclass')
def keys(self):
"""Return a list of keys."""
return list(self.iterkeys())
def itervalues(self):
"""Return an iterator over all messages."""
for key in self.iterkeys():
try:
value = self[key]
except KeyError:
continue
yield value
def __iter__(self):
return self.itervalues()
def values(self):
"""Return a list of messages. Memory intensive."""
return list(self.itervalues())
def iteritems(self):
"""Return an iterator over (key, message) tuples."""
for key in self.iterkeys():
try:
value = self[key]
except KeyError:
continue
yield (key, value)
def items(self):
"""Return a list of (key, message) tuples. Memory intensive."""
return list(self.iteritems())
def __contains__(self, key):
"""Return True if the keyed message exists, False otherwise."""
raise NotImplementedError('Method must be implemented by subclass')
def __len__(self):
"""Return a count of messages in the mailbox."""
raise NotImplementedError('Method must be implemented by subclass')
def clear(self):
"""Delete all messages."""
for key in self.keys():
self.discard(key)
def pop(self, key, default=None):
"""Delete the keyed message and return it, or default."""
try:
result = self[key]
except KeyError:
return default
self.discard(key)
return result
def popitem(self):
"""Delete an arbitrary (key, message) pair and return it."""
for key in self.iterkeys():
return (key, self.pop(key)) # This is only run once.
else:
raise KeyError('No messages in mailbox')
def update(self, arg=None):
"""Change the messages that correspond to certain keys."""
if hasattr(arg, 'iteritems'):
source = arg.iteritems()
elif hasattr(arg, 'items'):
source = arg.items()
else:
source = arg
bad_key = False
for key, message in source:
try:
self[key] = message
except KeyError:
bad_key = True
if bad_key:
raise KeyError('No message with key(s)')
def flush(self):
"""Write any pending changes to the disk."""
raise NotImplementedError('Method must be implemented by subclass')
def lock(self):
"""Lock the mailbox."""
raise NotImplementedError('Method must be implemented by subclass')
def unlock(self):
"""Unlock the mailbox if it is locked."""
raise NotImplementedError('Method must be implemented by subclass')
def close(self):
"""Flush and close the mailbox."""
raise NotImplementedError('Method must be implemented by subclass')
def _string_to_bytes(self, message):
# If a message is not 7bit clean, we refuse to handle it since it
# likely came from reading invalid messages in text mode, and that way
# lies mojibake.
try:
return message.encode('ascii')
except UnicodeError:
raise ValueError("String input must be ASCII-only; "
"use bytes or a Message instead")
# Whether each message must end in a newline
_append_newline = False
def _dump_message(self, message, target, mangle_from_=False):
# This assumes the target file is open in binary mode.
"""Dump message contents to target file."""
if isinstance(message, email.message.Message):
buffer = io.BytesIO()
gen = email.generator.BytesGenerator(buffer, mangle_from_, 0)
gen.flatten(message)
buffer.seek(0)
data = buffer.read()
data = data.replace(b'\n', linesep)
target.write(data)
if self._append_newline and not data.endswith(linesep):
# Make sure the message ends with a newline
target.write(linesep)
elif isinstance(message, (str, bytes, io.StringIO)):
if isinstance(message, io.StringIO):
warnings.warn("Use of StringIO input is deprecated, "
"use BytesIO instead", DeprecationWarning, 3)
message = message.getvalue()
if isinstance(message, str):
message = self._string_to_bytes(message)
if mangle_from_:
message = message.replace(b'\nFrom ', b'\n>From ')
message = message.replace(b'\n', linesep)
target.write(message)
if self._append_newline and not message.endswith(linesep):
# Make sure the message ends with a newline
target.write(linesep)
elif hasattr(message, 'read'):
if hasattr(message, 'buffer'):
warnings.warn("Use of text mode files is deprecated, "
"use a binary mode file instead", DeprecationWarning, 3)
message = message.buffer
lastline = None
while True:
line = message.readline()
# Universal newline support.
if line.endswith(b'\r\n'):
line = line[:-2] + b'\n'
elif line.endswith(b'\r'):
line = line[:-1] + b'\n'
if not line:
break
if mangle_from_ and line.startswith(b'From '):
line = b'>From ' + line[5:]
line = line.replace(b'\n', linesep)
target.write(line)
lastline = line
if self._append_newline and lastline and not lastline.endswith(linesep):
# Make sure the message ends with a newline
target.write(linesep)
else:
raise TypeError('Invalid message type: %s' % type(message))
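# The Mailbox base class above layers a dict-like API (keys/values/items,
# pop, update, clear) on top of the abstract primitives that subclasses
# supply. A minimal sketch of that protocol, using a hypothetical path and
# the concrete mbox subclass defined further below:
#
#     box = mbox('/tmp/example.mbox')
#     key = box.add('From: a@example.com\n\nhello\n')
#     for k, msg in box.items():       # iteration skips keys deleted meanwhile
#         print(k, msg['From'])
#     box.discard(key)                 # unlike remove(), never raises KeyError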
class Maildir(Mailbox):
"""A qmail-style Maildir mailbox."""
colon = ':'
def __init__(self, dirname, factory=None, create=True):
"""Initialize a Maildir instance."""
Mailbox.__init__(self, dirname, factory, create)
self._paths = {
'tmp': os.path.join(self._path, 'tmp'),
'new': os.path.join(self._path, 'new'),
'cur': os.path.join(self._path, 'cur'),
}
if not os.path.exists(self._path):
if create:
os.mkdir(self._path, 0o700)
for path in self._paths.values():
os.mkdir(path, 0o700)
else:
raise NoSuchMailboxError(self._path)
self._toc = {}
self._toc_mtimes = {'cur': 0, 'new': 0}
self._last_read = 0 # Records last time we read cur/new
self._skewfactor = 0.1 # Adjust if os/fs clocks are skewing
def add(self, message):
"""Add message and return assigned key."""
tmp_file = self._create_tmp()
try:
self._dump_message(message, tmp_file)
except BaseException:
tmp_file.close()
os.remove(tmp_file.name)
raise
_sync_close(tmp_file)
if isinstance(message, MaildirMessage):
subdir = message.get_subdir()
suffix = self.colon + message.get_info()
if suffix == self.colon:
suffix = ''
else:
subdir = 'new'
suffix = ''
uniq = os.path.basename(tmp_file.name).split(self.colon)[0]
dest = os.path.join(self._path, subdir, uniq + suffix)
if isinstance(message, MaildirMessage):
os.utime(tmp_file.name,
(os.path.getatime(tmp_file.name), message.get_date()))
# No file modification should be done after the file is moved to its
# final position in order to prevent race conditions with changes
# from other programs
try:
if hasattr(os, 'link'):
os.link(tmp_file.name, dest)
os.remove(tmp_file.name)
else:
os.rename(tmp_file.name, dest)
except OSError as e:
os.remove(tmp_file.name)
if e.errno == errno.EEXIST:
raise ExternalClashError('Name clash with existing message: %s'
% dest)
else:
raise
return uniq
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
os.remove(os.path.join(self._path, self._lookup(key)))
def discard(self, key):
"""If the keyed message exists, remove it."""
# This overrides an inapplicable implementation in the superclass.
try:
self.remove(key)
except (KeyError, FileNotFoundError):
pass
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
old_subpath = self._lookup(key)
temp_key = self.add(message)
temp_subpath = self._lookup(temp_key)
if isinstance(message, MaildirMessage):
# temp's subdir and suffix were specified by message.
dominant_subpath = temp_subpath
else:
# temp's subdir and suffix were defaults from add().
dominant_subpath = old_subpath
subdir = os.path.dirname(dominant_subpath)
if self.colon in dominant_subpath:
suffix = self.colon + dominant_subpath.split(self.colon)[-1]
else:
suffix = ''
self.discard(key)
tmp_path = os.path.join(self._path, temp_subpath)
new_path = os.path.join(self._path, subdir, key + suffix)
if isinstance(message, MaildirMessage):
os.utime(tmp_path,
(os.path.getatime(tmp_path), message.get_date()))
# No file modification should be done after the file is moved to its
# final position in order to prevent race conditions with changes
# from other programs
os.rename(tmp_path, new_path)
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
subpath = self._lookup(key)
with open(os.path.join(self._path, subpath), 'rb') as f:
if self._factory:
msg = self._factory(f)
else:
msg = MaildirMessage(f)
subdir, name = os.path.split(subpath)
msg.set_subdir(subdir)
if self.colon in name:
msg.set_info(name.split(self.colon)[-1])
msg.set_date(os.path.getmtime(os.path.join(self._path, subpath)))
return msg
def get_bytes(self, key):
"""Return a bytes representation or raise a KeyError."""
with open(os.path.join(self._path, self._lookup(key)), 'rb') as f:
return f.read().replace(linesep, b'\n')
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
f = open(os.path.join(self._path, self._lookup(key)), 'rb')
return _ProxyFile(f)
def iterkeys(self):
"""Return an iterator over keys."""
self._refresh()
for key in self._toc:
try:
self._lookup(key)
except KeyError:
continue
yield key
def __contains__(self, key):
"""Return True if the keyed message exists, False otherwise."""
self._refresh()
return key in self._toc
def __len__(self):
"""Return a count of messages in the mailbox."""
self._refresh()
return len(self._toc)
def flush(self):
"""Write any pending changes to disk."""
# Maildir changes are always written immediately, so there's nothing
# to do.
pass
def lock(self):
"""Lock the mailbox."""
return
def unlock(self):
"""Unlock the mailbox if it is locked."""
return
def close(self):
"""Flush and close the mailbox."""
return
def list_folders(self):
"""Return a list of folder names."""
result = []
for entry in os.listdir(self._path):
if len(entry) > 1 and entry[0] == '.' and \
os.path.isdir(os.path.join(self._path, entry)):
result.append(entry[1:])
return result
def get_folder(self, folder):
"""Return a Maildir instance for the named folder."""
return Maildir(os.path.join(self._path, '.' + folder),
factory=self._factory,
create=False)
def add_folder(self, folder):
"""Create a folder and return a Maildir instance representing it."""
path = os.path.join(self._path, '.' + folder)
result = Maildir(path, factory=self._factory)
maildirfolder_path = os.path.join(path, 'maildirfolder')
if not os.path.exists(maildirfolder_path):
os.close(os.open(maildirfolder_path, os.O_CREAT | os.O_WRONLY,
0o666))
return result
def remove_folder(self, folder):
"""Delete the named folder, which must be empty."""
path = os.path.join(self._path, '.' + folder)
for entry in os.listdir(os.path.join(path, 'new')) + \
os.listdir(os.path.join(path, 'cur')):
if len(entry) < 1 or entry[0] != '.':
raise NotEmptyError('Folder contains message(s): %s' % folder)
for entry in os.listdir(path):
if entry != 'new' and entry != 'cur' and entry != 'tmp' and \
os.path.isdir(os.path.join(path, entry)):
raise NotEmptyError("Folder contains subdirectory '%s': %s" %
(folder, entry))
for root, dirs, files in os.walk(path, topdown=False):
for entry in files:
os.remove(os.path.join(root, entry))
for entry in dirs:
os.rmdir(os.path.join(root, entry))
os.rmdir(path)
def clean(self):
"""Delete old files in "tmp"."""
now = time.time()
for entry in os.listdir(os.path.join(self._path, 'tmp')):
path = os.path.join(self._path, 'tmp', entry)
if now - os.path.getatime(path) > 129600: # 60 * 60 * 36
os.remove(path)
_count = 1 # This is used to generate unique file names.
def _create_tmp(self):
"""Create a file in the tmp subdirectory and open and return it."""
now = time.time()
hostname = socket.gethostname()
if '/' in hostname:
hostname = hostname.replace('/', r'\057')
if ':' in hostname:
hostname = hostname.replace(':', r'\072')
uniq = "%s.M%sP%sQ%s.%s" % (int(now), int(now % 1 * 1e6), os.getpid(),
Maildir._count, hostname)
path = os.path.join(self._path, 'tmp', uniq)
try:
os.stat(path)
except FileNotFoundError:
Maildir._count += 1
try:
return _create_carefully(path)
except FileExistsError:
pass
# Fall through to here if stat succeeded or open raised EEXIST.
raise ExternalClashError('Name clash prevented file creation: %s' %
path)
def _refresh(self):
"""Update table of contents mapping."""
# If it has been less than two seconds since the last _refresh() call,
# we have to unconditionally re-read the mailbox just in case it has
        # been modified, because os.path.getmtime() has a 2 sec resolution in the
# most common worst case (FAT) and a 1 sec resolution typically. This
# results in a few unnecessary re-reads when _refresh() is called
# multiple times in that interval, but once the clock ticks over, we
# will only re-read as needed. Because the filesystem might be being
# served by an independent system with its own clock, we record and
# compare with the mtimes from the filesystem. Because the other
# system's clock might be skewing relative to our clock, we add an
# extra delta to our wait. The default is one tenth second, but is an
# instance variable and so can be adjusted if dealing with a
# particularly skewed or irregular system.
if time.time() - self._last_read > 2 + self._skewfactor:
refresh = False
for subdir in self._toc_mtimes:
mtime = os.path.getmtime(self._paths[subdir])
if mtime > self._toc_mtimes[subdir]:
refresh = True
self._toc_mtimes[subdir] = mtime
if not refresh:
return
# Refresh toc
self._toc = {}
for subdir in self._toc_mtimes:
path = self._paths[subdir]
for entry in os.listdir(path):
p = os.path.join(path, entry)
if os.path.isdir(p):
continue
uniq = entry.split(self.colon)[0]
self._toc[uniq] = os.path.join(subdir, entry)
self._last_read = time.time()
def _lookup(self, key):
"""Use TOC to return subpath for given key, or raise a KeyError."""
try:
if os.path.exists(os.path.join(self._path, self._toc[key])):
return self._toc[key]
except KeyError:
pass
self._refresh()
try:
return self._toc[key]
except KeyError:
raise KeyError('No message with key: %s' % key)
# This method is for backward compatibility only.
def next(self):
"""Return the next message in a one-time iteration."""
if not hasattr(self, '_onetime_keys'):
self._onetime_keys = self.iterkeys()
while True:
try:
return self[next(self._onetime_keys)]
except StopIteration:
return None
except KeyError:
continue
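# A minimal Maildir usage sketch (the directory path is hypothetical). add()
# writes to tmp/ and then link/renames into new/ or cur/, as implemented above:
#
#     md = Maildir('/tmp/example_maildir', create=True)
#     key = md.add(b'From: a@example.com\nSubject: hi\n\nbody\n')
#     msg = md.get_message(key)        # a MaildirMessage
#     msg.set_subdir('cur')
#     msg.add_flag('S')
#     md[key] = msg                    # rewrites into cur/ with info ':2,S'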
class _singlefileMailbox(Mailbox):
"""A single-file mailbox."""
def __init__(self, path, factory=None, create=True):
"""Initialize a single-file mailbox."""
Mailbox.__init__(self, path, factory, create)
try:
f = open(self._path, 'rb+')
except OSError as e:
if e.errno == errno.ENOENT:
if create:
f = open(self._path, 'wb+')
else:
raise NoSuchMailboxError(self._path)
elif e.errno in (errno.EACCES, errno.EROFS):
f = open(self._path, 'rb')
else:
raise
self._file = f
self._toc = None
self._next_key = 0
self._pending = False # No changes require rewriting the file.
self._pending_sync = False # No need to sync the file
self._locked = False
self._file_length = None # Used to record mailbox size
def add(self, message):
"""Add message and return assigned key."""
self._lookup()
self._toc[self._next_key] = self._append_message(message)
self._next_key += 1
# _append_message appends the message to the mailbox file. We
# don't need a full rewrite + rename, sync is enough.
self._pending_sync = True
return self._next_key - 1
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
self._lookup(key)
del self._toc[key]
self._pending = True
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
self._lookup(key)
self._toc[key] = self._append_message(message)
self._pending = True
def iterkeys(self):
"""Return an iterator over keys."""
self._lookup()
yield from self._toc.keys()
def __contains__(self, key):
"""Return True if the keyed message exists, False otherwise."""
self._lookup()
return key in self._toc
def __len__(self):
"""Return a count of messages in the mailbox."""
self._lookup()
return len(self._toc)
def lock(self):
"""Lock the mailbox."""
if not self._locked:
_lock_file(self._file)
self._locked = True
def unlock(self):
"""Unlock the mailbox if it is locked."""
if self._locked:
_unlock_file(self._file)
self._locked = False
def flush(self):
"""Write any pending changes to disk."""
if not self._pending:
if self._pending_sync:
# Messages have only been added, so syncing the file
# is enough.
_sync_flush(self._file)
self._pending_sync = False
return
# In order to be writing anything out at all, self._toc must
# already have been generated (and presumably has been modified
# by adding or deleting an item).
assert self._toc is not None
# Check length of self._file; if it's changed, some other process
# has modified the mailbox since we scanned it.
self._file.seek(0, 2)
cur_len = self._file.tell()
if cur_len != self._file_length:
raise ExternalClashError('Size of mailbox file changed '
'(expected %i, found %i)' %
(self._file_length, cur_len))
new_file = _create_temporary(self._path)
try:
new_toc = {}
self._pre_mailbox_hook(new_file)
for key in sorted(self._toc.keys()):
start, stop = self._toc[key]
self._file.seek(start)
self._pre_message_hook(new_file)
new_start = new_file.tell()
while True:
buffer = self._file.read(min(4096,
stop - self._file.tell()))
if not buffer:
break
new_file.write(buffer)
new_toc[key] = (new_start, new_file.tell())
self._post_message_hook(new_file)
self._file_length = new_file.tell()
except:
new_file.close()
os.remove(new_file.name)
raise
_sync_close(new_file)
# self._file is about to get replaced, so no need to sync.
self._file.close()
# Make sure the new file's mode is the same as the old file's
mode = os.stat(self._path).st_mode
os.chmod(new_file.name, mode)
try:
os.rename(new_file.name, self._path)
except FileExistsError:
os.remove(self._path)
os.rename(new_file.name, self._path)
self._file = open(self._path, 'rb+')
self._toc = new_toc
self._pending = False
self._pending_sync = False
if self._locked:
_lock_file(self._file, dotlock=False)
def _pre_mailbox_hook(self, f):
"""Called before writing the mailbox to file f."""
return
def _pre_message_hook(self, f):
"""Called before writing each message to file f."""
return
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
return
def close(self):
"""Flush and close the mailbox."""
try:
self.flush()
finally:
try:
if self._locked:
self.unlock()
finally:
self._file.close() # Sync has been done by self.flush() above.
def _lookup(self, key=None):
"""Return (start, stop) or raise KeyError."""
if self._toc is None:
self._generate_toc()
if key is not None:
try:
return self._toc[key]
except KeyError:
raise KeyError('No message with key: %s' % key)
def _append_message(self, message):
"""Append message to mailbox and return (start, stop) offsets."""
self._file.seek(0, 2)
before = self._file.tell()
if len(self._toc) == 0 and not self._pending:
# This is the first message, and the _pre_mailbox_hook
# hasn't yet been called. If self._pending is True,
# messages have been removed, so _pre_mailbox_hook must
# have been called already.
self._pre_mailbox_hook(self._file)
try:
self._pre_message_hook(self._file)
offsets = self._install_message(message)
self._post_message_hook(self._file)
except BaseException:
self._file.truncate(before)
raise
self._file.flush()
self._file_length = self._file.tell() # Record current length of mailbox
return offsets
class _mboxMMDF(_singlefileMailbox):
"""An mbox or MMDF mailbox."""
_mangle_from_ = True
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
from_line = self._file.readline().replace(linesep, b'')
string = self._file.read(stop - self._file.tell())
msg = self._message_factory(string.replace(linesep, b'\n'))
msg.set_from(from_line[5:].decode('ascii'))
return msg
def get_string(self, key, from_=False):
"""Return a string representation or raise a KeyError."""
return email.message_from_bytes(
self.get_bytes(key)).as_string(unixfrom=from_)
def get_bytes(self, key, from_=False):
"""Return a string representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
if not from_:
self._file.readline()
string = self._file.read(stop - self._file.tell())
return string.replace(linesep, b'\n')
def get_file(self, key, from_=False):
"""Return a file-like representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
if not from_:
self._file.readline()
return _PartialFile(self._file, self._file.tell(), stop)
def _install_message(self, message):
"""Format a message and blindly write to self._file."""
from_line = None
if isinstance(message, str):
message = self._string_to_bytes(message)
if isinstance(message, bytes) and message.startswith(b'From '):
newline = message.find(b'\n')
if newline != -1:
from_line = message[:newline]
message = message[newline + 1:]
else:
from_line = message
message = b''
elif isinstance(message, _mboxMMDFMessage):
author = message.get_from().encode('ascii')
from_line = b'From ' + author
elif isinstance(message, email.message.Message):
from_line = message.get_unixfrom() # May be None.
if from_line is not None:
from_line = from_line.encode('ascii')
if from_line is None:
from_line = b'From MAILER-DAEMON ' + time.asctime(time.gmtime()).encode()
start = self._file.tell()
self._file.write(from_line + linesep)
self._dump_message(message, self._file, self._mangle_from_)
stop = self._file.tell()
return (start, stop)
class mbox(_mboxMMDF):
"""A classic mbox mailbox."""
_mangle_from_ = True
# All messages must end in a newline character, and
    # _post_message_hook() outputs an empty line between messages.
_append_newline = True
def __init__(self, path, factory=None, create=True):
"""Initialize an mbox mailbox."""
self._message_factory = mboxMessage
_mboxMMDF.__init__(self, path, factory, create)
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
f.write(linesep)
def _generate_toc(self):
"""Generate key-to-(start, stop) table of contents."""
starts, stops = [], []
last_was_empty = False
self._file.seek(0)
while True:
line_pos = self._file.tell()
line = self._file.readline()
if line.startswith(b'From '):
if len(stops) < len(starts):
if last_was_empty:
stops.append(line_pos - len(linesep))
else:
# The last line before the "From " line wasn't
# blank, but we consider it a start of a
# message anyway.
stops.append(line_pos)
starts.append(line_pos)
last_was_empty = False
elif not line:
if last_was_empty:
stops.append(line_pos - len(linesep))
else:
stops.append(line_pos)
break
elif line == linesep:
last_was_empty = True
else:
last_was_empty = False
self._toc = dict(enumerate(zip(starts, stops)))
self._next_key = len(self._toc)
self._file_length = self._file.tell()
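# An mbox usage sketch (hypothetical path). Each stored message is preceded
# by a "From " separator line, which get_bytes(key, from_=True) preserves:
#
#     box = mbox('/tmp/example.mbox')
#     key = box.add('From: a@example.com\n\nhello\n')
#     box.flush()                      # only appends happened, so just a sync
#     print(box.get_bytes(key, from_=True).splitlines()[0])
#     # -> b'From MAILER-DAEMON ...'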
class MMDF(_mboxMMDF):
"""An MMDF mailbox."""
def __init__(self, path, factory=None, create=True):
"""Initialize an MMDF mailbox."""
self._message_factory = MMDFMessage
_mboxMMDF.__init__(self, path, factory, create)
def _pre_message_hook(self, f):
"""Called before writing each message to file f."""
f.write(b'\001\001\001\001' + linesep)
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
f.write(linesep + b'\001\001\001\001' + linesep)
def _generate_toc(self):
"""Generate key-to-(start, stop) table of contents."""
starts, stops = [], []
self._file.seek(0)
next_pos = 0
while True:
line_pos = next_pos
line = self._file.readline()
next_pos = self._file.tell()
if line.startswith(b'\001\001\001\001' + linesep):
starts.append(next_pos)
while True:
line_pos = next_pos
line = self._file.readline()
next_pos = self._file.tell()
if line == b'\001\001\001\001' + linesep:
stops.append(line_pos - len(linesep))
break
elif not line:
stops.append(line_pos)
break
elif not line:
break
self._toc = dict(enumerate(zip(starts, stops)))
self._next_key = len(self._toc)
self._file.seek(0, 2)
self._file_length = self._file.tell()
class MH(Mailbox):
"""An MH mailbox."""
def __init__(self, path, factory=None, create=True):
"""Initialize an MH instance."""
Mailbox.__init__(self, path, factory, create)
if not os.path.exists(self._path):
if create:
os.mkdir(self._path, 0o700)
os.close(os.open(os.path.join(self._path, '.mh_sequences'),
os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600))
else:
raise NoSuchMailboxError(self._path)
self._locked = False
def add(self, message):
"""Add message and return assigned key."""
keys = self.keys()
if len(keys) == 0:
new_key = 1
else:
new_key = max(keys) + 1
new_path = os.path.join(self._path, str(new_key))
f = _create_carefully(new_path)
closed = False
try:
if self._locked:
_lock_file(f)
try:
try:
self._dump_message(message, f)
except BaseException:
# Unlock and close so it can be deleted on Windows
if self._locked:
_unlock_file(f)
_sync_close(f)
closed = True
os.remove(new_path)
raise
if isinstance(message, MHMessage):
self._dump_sequences(message, new_key)
finally:
if self._locked:
_unlock_file(f)
finally:
if not closed:
_sync_close(f)
return new_key
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
path = os.path.join(self._path, str(key))
try:
f = open(path, 'rb+')
except OSError as e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
else:
f.close()
os.remove(path)
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
path = os.path.join(self._path, str(key))
try:
f = open(path, 'rb+')
except OSError as e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
try:
if self._locked:
_lock_file(f)
try:
os.close(os.open(path, os.O_WRONLY | os.O_TRUNC))
self._dump_message(message, f)
if isinstance(message, MHMessage):
self._dump_sequences(message, key)
finally:
if self._locked:
_unlock_file(f)
finally:
_sync_close(f)
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
try:
if self._locked:
f = open(os.path.join(self._path, str(key)), 'rb+')
else:
f = open(os.path.join(self._path, str(key)), 'rb')
except OSError as e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
with f:
if self._locked:
_lock_file(f)
try:
msg = MHMessage(f)
finally:
if self._locked:
_unlock_file(f)
for name, key_list in self.get_sequences().items():
if key in key_list:
msg.add_sequence(name)
return msg
def get_bytes(self, key):
"""Return a bytes representation or raise a KeyError."""
try:
if self._locked:
f = open(os.path.join(self._path, str(key)), 'rb+')
else:
f = open(os.path.join(self._path, str(key)), 'rb')
except OSError as e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
with f:
if self._locked:
_lock_file(f)
try:
return f.read().replace(linesep, b'\n')
finally:
if self._locked:
_unlock_file(f)
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
try:
f = open(os.path.join(self._path, str(key)), 'rb')
except OSError as e:
if e.errno == errno.ENOENT:
raise KeyError('No message with key: %s' % key)
else:
raise
return _ProxyFile(f)
def iterkeys(self):
"""Return an iterator over keys."""
return iter(sorted(int(entry) for entry in os.listdir(self._path)
if entry.isdigit()))
def __contains__(self, key):
"""Return True if the keyed message exists, False otherwise."""
return os.path.exists(os.path.join(self._path, str(key)))
def __len__(self):
"""Return a count of messages in the mailbox."""
return len(list(self.iterkeys()))
def lock(self):
"""Lock the mailbox."""
if not self._locked:
self._file = open(os.path.join(self._path, '.mh_sequences'), 'rb+')
_lock_file(self._file)
self._locked = True
def unlock(self):
"""Unlock the mailbox if it is locked."""
if self._locked:
_unlock_file(self._file)
_sync_close(self._file)
del self._file
self._locked = False
def flush(self):
"""Write any pending changes to the disk."""
return
def close(self):
"""Flush and close the mailbox."""
if self._locked:
self.unlock()
def list_folders(self):
"""Return a list of folder names."""
result = []
for entry in os.listdir(self._path):
if os.path.isdir(os.path.join(self._path, entry)):
result.append(entry)
return result
def get_folder(self, folder):
"""Return an MH instance for the named folder."""
return MH(os.path.join(self._path, folder),
factory=self._factory, create=False)
def add_folder(self, folder):
"""Create a folder and return an MH instance representing it."""
return MH(os.path.join(self._path, folder),
factory=self._factory)
def remove_folder(self, folder):
"""Delete the named folder, which must be empty."""
path = os.path.join(self._path, folder)
entries = os.listdir(path)
if entries == ['.mh_sequences']:
os.remove(os.path.join(path, '.mh_sequences'))
elif entries == []:
pass
else:
            raise NotEmptyError('Folder not empty: %s' % path)
os.rmdir(path)
def get_sequences(self):
"""Return a name-to-key-list dictionary to define each sequence."""
results = {}
with open(os.path.join(self._path, '.mh_sequences'), 'r', encoding='ASCII') as f:
all_keys = set(self.keys())
for line in f:
try:
name, contents = line.split(':')
keys = set()
for spec in contents.split():
if spec.isdigit():
keys.add(int(spec))
else:
start, stop = (int(x) for x in spec.split('-'))
keys.update(range(start, stop + 1))
                results[name] = [key for key in sorted(keys)
                                 if key in all_keys]
if len(results[name]) == 0:
del results[name]
except ValueError:
raise FormatError('Invalid sequence specification: %s' %
line.rstrip())
return results
def set_sequences(self, sequences):
"""Set sequences using the given name-to-key-list dictionary."""
f = open(os.path.join(self._path, '.mh_sequences'), 'r+', encoding='ASCII')
try:
os.close(os.open(f.name, os.O_WRONLY | os.O_TRUNC))
for name, keys in sequences.items():
if len(keys) == 0:
continue
f.write(name + ':')
prev = None
completing = False
for key in sorted(set(keys)):
if key - 1 == prev:
if not completing:
completing = True
f.write('-')
elif completing:
completing = False
f.write('%s %s' % (prev, key))
else:
f.write(' %s' % key)
prev = key
if completing:
f.write(str(prev) + '\n')
else:
f.write('\n')
finally:
_sync_close(f)
def pack(self):
"""Re-name messages to eliminate numbering gaps. Invalidates keys."""
sequences = self.get_sequences()
prev = 0
changes = []
for key in self.iterkeys():
if key - 1 != prev:
changes.append((key, prev + 1))
if hasattr(os, 'link'):
os.link(os.path.join(self._path, str(key)),
os.path.join(self._path, str(prev + 1)))
os.unlink(os.path.join(self._path, str(key)))
else:
os.rename(os.path.join(self._path, str(key)),
os.path.join(self._path, str(prev + 1)))
prev += 1
self._next_key = prev + 1
if len(changes) == 0:
return
for name, key_list in sequences.items():
for old, new in changes:
if old in key_list:
key_list[key_list.index(old)] = new
self.set_sequences(sequences)
def _dump_sequences(self, message, key):
"""Inspect a new MHMessage and update sequences appropriately."""
pending_sequences = message.get_sequences()
all_sequences = self.get_sequences()
for name, key_list in all_sequences.items():
if name in pending_sequences:
key_list.append(key)
elif key in key_list:
del key_list[key_list.index(key)]
for sequence in pending_sequences:
if sequence not in all_sequences:
all_sequences[sequence] = [key]
self.set_sequences(all_sequences)
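# An MH usage sketch (hypothetical path), showing the sequence bookkeeping
# that get_sequences()/set_sequences() persist in the .mh_sequences file:
#
#     mh = MH('/tmp/example_mh')
#     key = mh.add('From: a@example.com\n\nhello\n')
#     seqs = mh.get_sequences()
#     seqs.setdefault('flagged', []).append(key)
#     mh.set_sequences(seqs)           # consecutive keys are written as '1-3'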
class Babyl(_singlefileMailbox):
"""An Rmail-style Babyl mailbox."""
_special_labels = frozenset(('unseen', 'deleted', 'filed', 'answered',
'forwarded', 'edited', 'resent'))
def __init__(self, path, factory=None, create=True):
"""Initialize a Babyl mailbox."""
_singlefileMailbox.__init__(self, path, factory, create)
self._labels = {}
def add(self, message):
"""Add message and return assigned key."""
key = _singlefileMailbox.add(self, message)
if isinstance(message, BabylMessage):
self._labels[key] = message.get_labels()
return key
def remove(self, key):
"""Remove the keyed message; raise KeyError if it doesn't exist."""
_singlefileMailbox.remove(self, key)
if key in self._labels:
del self._labels[key]
def __setitem__(self, key, message):
"""Replace the keyed message; raise KeyError if it doesn't exist."""
_singlefileMailbox.__setitem__(self, key, message)
if isinstance(message, BabylMessage):
self._labels[key] = message.get_labels()
def get_message(self, key):
"""Return a Message representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
self._file.readline() # Skip b'1,' line specifying labels.
original_headers = io.BytesIO()
while True:
line = self._file.readline()
if line == b'*** EOOH ***' + linesep or not line:
break
original_headers.write(line.replace(linesep, b'\n'))
visible_headers = io.BytesIO()
while True:
line = self._file.readline()
if line == linesep or not line:
break
visible_headers.write(line.replace(linesep, b'\n'))
# Read up to the stop, or to the end
n = stop - self._file.tell()
assert n >= 0
body = self._file.read(n)
body = body.replace(linesep, b'\n')
msg = BabylMessage(original_headers.getvalue() + body)
msg.set_visible(visible_headers.getvalue())
if key in self._labels:
msg.set_labels(self._labels[key])
return msg
def get_bytes(self, key):
"""Return a string representation or raise a KeyError."""
start, stop = self._lookup(key)
self._file.seek(start)
self._file.readline() # Skip b'1,' line specifying labels.
original_headers = io.BytesIO()
while True:
line = self._file.readline()
if line == b'*** EOOH ***' + linesep or not line:
break
original_headers.write(line.replace(linesep, b'\n'))
while True:
line = self._file.readline()
if line == linesep or not line:
break
headers = original_headers.getvalue()
n = stop - self._file.tell()
assert n >= 0
data = self._file.read(n)
data = data.replace(linesep, b'\n')
return headers + data
def get_file(self, key):
"""Return a file-like representation or raise a KeyError."""
return io.BytesIO(self.get_bytes(key).replace(b'\n', linesep))
def get_labels(self):
"""Return a list of user-defined labels in the mailbox."""
self._lookup()
labels = set()
for label_list in self._labels.values():
labels.update(label_list)
labels.difference_update(self._special_labels)
return list(labels)
def _generate_toc(self):
"""Generate key-to-(start, stop) table of contents."""
starts, stops = [], []
self._file.seek(0)
next_pos = 0
label_lists = []
while True:
line_pos = next_pos
line = self._file.readline()
next_pos = self._file.tell()
if line == b'\037\014' + linesep:
if len(stops) < len(starts):
stops.append(line_pos - len(linesep))
starts.append(next_pos)
labels = [label.strip() for label
in self._file.readline()[1:].split(b',')
if label.strip()]
label_lists.append(labels)
elif line == b'\037' or line == b'\037' + linesep:
if len(stops) < len(starts):
stops.append(line_pos - len(linesep))
elif not line:
stops.append(line_pos - len(linesep))
break
self._toc = dict(enumerate(zip(starts, stops)))
self._labels = dict(enumerate(label_lists))
self._next_key = len(self._toc)
self._file.seek(0, 2)
self._file_length = self._file.tell()
def _pre_mailbox_hook(self, f):
"""Called before writing the mailbox to file f."""
babyl = b'BABYL OPTIONS:' + linesep
babyl += b'Version: 5' + linesep
labels = self.get_labels()
labels = (label.encode() for label in labels)
babyl += b'Labels:' + b','.join(labels) + linesep
babyl += b'\037'
f.write(babyl)
def _pre_message_hook(self, f):
"""Called before writing each message to file f."""
f.write(b'\014' + linesep)
def _post_message_hook(self, f):
"""Called after writing each message to file f."""
f.write(linesep + b'\037')
def _install_message(self, message):
"""Write message contents and return (start, stop)."""
start = self._file.tell()
if isinstance(message, BabylMessage):
special_labels = []
labels = []
for label in message.get_labels():
if label in self._special_labels:
special_labels.append(label)
else:
labels.append(label)
self._file.write(b'1')
for label in special_labels:
self._file.write(b', ' + label.encode())
self._file.write(b',,')
for label in labels:
self._file.write(b' ' + label.encode() + b',')
self._file.write(linesep)
else:
self._file.write(b'1,,' + linesep)
if isinstance(message, email.message.Message):
orig_buffer = io.BytesIO()
orig_generator = email.generator.BytesGenerator(orig_buffer, False, 0)
orig_generator.flatten(message)
orig_buffer.seek(0)
while True:
line = orig_buffer.readline()
self._file.write(line.replace(b'\n', linesep))
if line == b'\n' or not line:
break
self._file.write(b'*** EOOH ***' + linesep)
if isinstance(message, BabylMessage):
vis_buffer = io.BytesIO()
vis_generator = email.generator.BytesGenerator(vis_buffer, False, 0)
vis_generator.flatten(message.get_visible())
while True:
line = vis_buffer.readline()
self._file.write(line.replace(b'\n', linesep))
if line == b'\n' or not line:
break
else:
orig_buffer.seek(0)
while True:
line = orig_buffer.readline()
self._file.write(line.replace(b'\n', linesep))
if line == b'\n' or not line:
break
while True:
buffer = orig_buffer.read(4096) # Buffer size is arbitrary.
if not buffer:
break
self._file.write(buffer.replace(b'\n', linesep))
elif isinstance(message, (bytes, str, io.StringIO)):
if isinstance(message, io.StringIO):
warnings.warn("Use of StringIO input is deprecated, "
"use BytesIO instead", DeprecationWarning, 3)
message = message.getvalue()
if isinstance(message, str):
message = self._string_to_bytes(message)
            body_start = message.find(b'\n\n') + 2
            if body_start - 2 != -1:    # a blank line separates headers from body
self._file.write(message[:body_start].replace(b'\n', linesep))
self._file.write(b'*** EOOH ***' + linesep)
self._file.write(message[:body_start].replace(b'\n', linesep))
self._file.write(message[body_start:].replace(b'\n', linesep))
else:
self._file.write(b'*** EOOH ***' + linesep + linesep)
self._file.write(message.replace(b'\n', linesep))
elif hasattr(message, 'readline'):
if hasattr(message, 'buffer'):
warnings.warn("Use of text mode files is deprecated, "
"use a binary mode file instead", DeprecationWarning, 3)
message = message.buffer
original_pos = message.tell()
first_pass = True
while True:
line = message.readline()
# Universal newline support.
if line.endswith(b'\r\n'):
line = line[:-2] + b'\n'
elif line.endswith(b'\r'):
line = line[:-1] + b'\n'
self._file.write(line.replace(b'\n', linesep))
if line == b'\n' or not line:
if first_pass:
first_pass = False
self._file.write(b'*** EOOH ***' + linesep)
message.seek(original_pos)
else:
break
while True:
line = message.readline()
if not line:
break
# Universal newline support.
if line.endswith(b'\r\n'):
line = line[:-2] + linesep
elif line.endswith(b'\r'):
line = line[:-1] + linesep
elif line.endswith(b'\n'):
line = line[:-1] + linesep
self._file.write(line)
else:
raise TypeError('Invalid message type: %s' % type(message))
stop = self._file.tell()
return (start, stop)
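# A Babyl usage sketch (hypothetical path). User-defined labels travel with
# each message, and get_labels() reports them across the whole mailbox:
#
#     bb = Babyl('/tmp/example.babyl')
#     msg = BabylMessage('From: a@example.com\n\nhello\n')
#     msg.add_label('todo')
#     key = bb.add(msg)
#     print(bb.get_labels())           # ['todo']; special labels are filtered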
class Message(email.message.Message):
"""Message with mailbox-format-specific properties."""
def __init__(self, message=None):
"""Initialize a Message instance."""
if isinstance(message, email.message.Message):
self._become_message(copy.deepcopy(message))
if isinstance(message, Message):
message._explain_to(self)
elif isinstance(message, bytes):
self._become_message(email.message_from_bytes(message))
elif isinstance(message, str):
self._become_message(email.message_from_string(message))
elif isinstance(message, io.TextIOWrapper):
self._become_message(email.message_from_file(message))
elif hasattr(message, "read"):
self._become_message(email.message_from_binary_file(message))
elif message is None:
email.message.Message.__init__(self)
else:
raise TypeError('Invalid message type: %s' % type(message))
def _become_message(self, message):
"""Assume the non-format-specific state of message."""
type_specific = getattr(message, '_type_specific_attributes', [])
for name in message.__dict__:
if name not in type_specific:
self.__dict__[name] = message.__dict__[name]
def _explain_to(self, message):
"""Copy format-specific state to message insofar as possible."""
if isinstance(message, Message):
return # There's nothing format-specific to explain.
else:
raise TypeError('Cannot convert to specified type')
class MaildirMessage(Message):
"""Message with Maildir-specific properties."""
_type_specific_attributes = ['_subdir', '_info', '_date']
def __init__(self, message=None):
"""Initialize a MaildirMessage instance."""
self._subdir = 'new'
self._info = ''
self._date = time.time()
Message.__init__(self, message)
def get_subdir(self):
"""Return 'new' or 'cur'."""
return self._subdir
def set_subdir(self, subdir):
"""Set subdir to 'new' or 'cur'."""
if subdir == 'new' or subdir == 'cur':
self._subdir = subdir
else:
raise ValueError("subdir must be 'new' or 'cur': %s" % subdir)
def get_flags(self):
"""Return as a string the flags that are set."""
if self._info.startswith('2,'):
return self._info[2:]
else:
return ''
def set_flags(self, flags):
"""Set the given flags and unset all others."""
self._info = '2,' + ''.join(sorted(flags))
def add_flag(self, flag):
"""Set the given flag(s) without changing others."""
self.set_flags(''.join(set(self.get_flags()) | set(flag)))
def remove_flag(self, flag):
"""Unset the given string flag(s) without changing others."""
if self.get_flags():
self.set_flags(''.join(set(self.get_flags()) - set(flag)))
def get_date(self):
"""Return delivery date of message, in seconds since the epoch."""
return self._date
def set_date(self, date):
"""Set delivery date of message, in seconds since the epoch."""
try:
self._date = float(date)
except ValueError:
raise TypeError("can't convert to float: %s" % date)
def get_info(self):
"""Get the message's "info" as a string."""
return self._info
def set_info(self, info):
"""Set the message's "info" string."""
if isinstance(info, str):
self._info = info
else:
raise TypeError('info must be a string: %s' % type(info))
def _explain_to(self, message):
"""Copy Maildir-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
message.set_flags(self.get_flags())
message.set_subdir(self.get_subdir())
message.set_date(self.get_date())
elif isinstance(message, _mboxMMDFMessage):
flags = set(self.get_flags())
if 'S' in flags:
message.add_flag('R')
if self.get_subdir() == 'cur':
message.add_flag('O')
if 'T' in flags:
message.add_flag('D')
if 'F' in flags:
message.add_flag('F')
if 'R' in flags:
message.add_flag('A')
message.set_from('MAILER-DAEMON', time.gmtime(self.get_date()))
elif isinstance(message, MHMessage):
flags = set(self.get_flags())
if 'S' not in flags:
message.add_sequence('unseen')
if 'R' in flags:
message.add_sequence('replied')
if 'F' in flags:
message.add_sequence('flagged')
elif isinstance(message, BabylMessage):
flags = set(self.get_flags())
if 'S' not in flags:
message.add_label('unseen')
if 'T' in flags:
message.add_label('deleted')
if 'R' in flags:
message.add_label('answered')
if 'P' in flags:
message.add_label('forwarded')
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
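# Format conversions run through _explain_to(), as above. A small sketch of
# the Maildir -> mbox flag translation ('S' seen becomes 'R' read, and a
# message in cur/ gains 'O' for "not new"):
#
#     m = MaildirMessage(b'From: a@example.com\n\nbody\n')
#     m.set_subdir('cur')
#     m.add_flag('S')
#     print(mboxMessage(m).get_flags())    # 'RO'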
class _mboxMMDFMessage(Message):
"""Message with mbox- or MMDF-specific properties."""
_type_specific_attributes = ['_from']
def __init__(self, message=None):
"""Initialize an mboxMMDFMessage instance."""
self.set_from('MAILER-DAEMON', True)
if isinstance(message, email.message.Message):
unixfrom = message.get_unixfrom()
if unixfrom is not None and unixfrom.startswith('From '):
self.set_from(unixfrom[5:])
Message.__init__(self, message)
def get_from(self):
"""Return contents of "From " line."""
return self._from
def set_from(self, from_, time_=None):
"""Set "From " line, formatting and appending time_ if specified."""
if time_ is not None:
if time_ is True:
time_ = time.gmtime()
from_ += ' ' + time.asctime(time_)
self._from = from_
def get_flags(self):
"""Return as a string the flags that are set."""
return self.get('Status', '') + self.get('X-Status', '')
def set_flags(self, flags):
"""Set the given flags and unset all others."""
flags = set(flags)
status_flags, xstatus_flags = '', ''
for flag in ('R', 'O'):
if flag in flags:
status_flags += flag
flags.remove(flag)
for flag in ('D', 'F', 'A'):
if flag in flags:
xstatus_flags += flag
flags.remove(flag)
xstatus_flags += ''.join(sorted(flags))
try:
self.replace_header('Status', status_flags)
except KeyError:
self.add_header('Status', status_flags)
try:
self.replace_header('X-Status', xstatus_flags)
except KeyError:
self.add_header('X-Status', xstatus_flags)
def add_flag(self, flag):
"""Set the given flag(s) without changing others."""
self.set_flags(''.join(set(self.get_flags()) | set(flag)))
def remove_flag(self, flag):
"""Unset the given string flag(s) without changing others."""
if 'Status' in self or 'X-Status' in self:
self.set_flags(''.join(set(self.get_flags()) - set(flag)))
def _explain_to(self, message):
"""Copy mbox- or MMDF-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
flags = set(self.get_flags())
if 'O' in flags:
message.set_subdir('cur')
if 'F' in flags:
message.add_flag('F')
if 'A' in flags:
message.add_flag('R')
if 'R' in flags:
message.add_flag('S')
if 'D' in flags:
message.add_flag('T')
del message['status']
del message['x-status']
maybe_date = ' '.join(self.get_from().split()[-5:])
try:
message.set_date(calendar.timegm(time.strptime(maybe_date,
'%a %b %d %H:%M:%S %Y')))
except (ValueError, OverflowError):
pass
elif isinstance(message, _mboxMMDFMessage):
message.set_flags(self.get_flags())
message.set_from(self.get_from())
elif isinstance(message, MHMessage):
flags = set(self.get_flags())
if 'R' not in flags:
message.add_sequence('unseen')
if 'A' in flags:
message.add_sequence('replied')
if 'F' in flags:
message.add_sequence('flagged')
del message['status']
del message['x-status']
elif isinstance(message, BabylMessage):
flags = set(self.get_flags())
if 'R' not in flags:
message.add_label('unseen')
if 'D' in flags:
message.add_label('deleted')
if 'A' in flags:
message.add_label('answered')
del message['status']
del message['x-status']
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
class mboxMessage(_mboxMMDFMessage):
"""Message with mbox-specific properties."""
class MHMessage(Message):
"""Message with MH-specific properties."""
_type_specific_attributes = ['_sequences']
def __init__(self, message=None):
"""Initialize an MHMessage instance."""
self._sequences = []
Message.__init__(self, message)
def get_sequences(self):
"""Return a list of sequences that include the message."""
return self._sequences[:]
def set_sequences(self, sequences):
"""Set the list of sequences that include the message."""
self._sequences = list(sequences)
def add_sequence(self, sequence):
"""Add sequence to list of sequences including the message."""
if isinstance(sequence, str):
            if sequence not in self._sequences:
self._sequences.append(sequence)
else:
raise TypeError('sequence type must be str: %s' % type(sequence))
def remove_sequence(self, sequence):
"""Remove sequence from the list of sequences including the message."""
try:
self._sequences.remove(sequence)
except ValueError:
pass
def _explain_to(self, message):
"""Copy MH-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
sequences = set(self.get_sequences())
if 'unseen' in sequences:
message.set_subdir('cur')
else:
message.set_subdir('cur')
message.add_flag('S')
if 'flagged' in sequences:
message.add_flag('F')
if 'replied' in sequences:
message.add_flag('R')
elif isinstance(message, _mboxMMDFMessage):
sequences = set(self.get_sequences())
if 'unseen' not in sequences:
message.add_flag('RO')
else:
message.add_flag('O')
if 'flagged' in sequences:
message.add_flag('F')
if 'replied' in sequences:
message.add_flag('A')
elif isinstance(message, MHMessage):
for sequence in self.get_sequences():
message.add_sequence(sequence)
elif isinstance(message, BabylMessage):
sequences = set(self.get_sequences())
if 'unseen' in sequences:
message.add_label('unseen')
if 'replied' in sequences:
message.add_label('answered')
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
class BabylMessage(Message):
"""Message with Babyl-specific properties."""
_type_specific_attributes = ['_labels', '_visible']
def __init__(self, message=None):
"""Initialize an BabylMessage instance."""
self._labels = []
self._visible = Message()
Message.__init__(self, message)
def get_labels(self):
"""Return a list of labels on the message."""
return self._labels[:]
def set_labels(self, labels):
"""Set the list of labels on the message."""
self._labels = list(labels)
def add_label(self, label):
"""Add label to list of labels on the message."""
if isinstance(label, str):
if label not in self._labels:
self._labels.append(label)
else:
raise TypeError('label must be a string: %s' % type(label))
def remove_label(self, label):
"""Remove label from the list of labels on the message."""
try:
self._labels.remove(label)
except ValueError:
pass
def get_visible(self):
"""Return a Message representation of visible headers."""
return Message(self._visible)
def set_visible(self, visible):
"""Set the Message representation of visible headers."""
self._visible = Message(visible)
def update_visible(self):
"""Update and/or sensibly generate a set of visible headers."""
for header in self._visible.keys():
if header in self:
self._visible.replace_header(header, self[header])
else:
del self._visible[header]
for header in ('Date', 'From', 'Reply-To', 'To', 'CC', 'Subject'):
if header in self and header not in self._visible:
self._visible[header] = self[header]
def _explain_to(self, message):
"""Copy Babyl-specific state to message insofar as possible."""
if isinstance(message, MaildirMessage):
labels = set(self.get_labels())
if 'unseen' in labels:
message.set_subdir('cur')
else:
message.set_subdir('cur')
message.add_flag('S')
if 'forwarded' in labels or 'resent' in labels:
message.add_flag('P')
if 'answered' in labels:
message.add_flag('R')
if 'deleted' in labels:
message.add_flag('T')
elif isinstance(message, _mboxMMDFMessage):
labels = set(self.get_labels())
if 'unseen' not in labels:
message.add_flag('RO')
else:
message.add_flag('O')
if 'deleted' in labels:
message.add_flag('D')
if 'answered' in labels:
message.add_flag('A')
elif isinstance(message, MHMessage):
labels = set(self.get_labels())
if 'unseen' in labels:
message.add_sequence('unseen')
if 'answered' in labels:
message.add_sequence('replied')
elif isinstance(message, BabylMessage):
message.set_visible(self.get_visible())
for label in self.get_labels():
message.add_label(label)
elif isinstance(message, Message):
pass
else:
raise TypeError('Cannot convert to specified type: %s' %
type(message))
class MMDFMessage(_mboxMMDFMessage):
"""Message with MMDF-specific properties."""
class _ProxyFile:
"""A read-only wrapper of a file."""
def __init__(self, f, pos=None):
"""Initialize a _ProxyFile."""
self._file = f
if pos is None:
self._pos = f.tell()
else:
self._pos = pos
def read(self, size=None):
"""Read bytes."""
return self._read(size, self._file.read)
def read1(self, size=None):
"""Read bytes."""
return self._read(size, self._file.read1)
def readline(self, size=None):
"""Read a line."""
return self._read(size, self._file.readline)
def readlines(self, sizehint=None):
"""Read multiple lines."""
result = []
for line in self:
result.append(line)
if sizehint is not None:
sizehint -= len(line)
if sizehint <= 0:
break
return result
def __iter__(self):
"""Iterate over lines."""
while True:
line = self.readline()
if not line:
                return  # under PEP 479, raising StopIteration here would become RuntimeError
yield line
def tell(self):
"""Return the position."""
return self._pos
def seek(self, offset, whence=0):
"""Change position."""
if whence == 1:
self._file.seek(self._pos)
self._file.seek(offset, whence)
self._pos = self._file.tell()
def close(self):
"""Close the file."""
if hasattr(self, '_file'):
if hasattr(self._file, 'close'):
self._file.close()
del self._file
def _read(self, size, read_method):
"""Read size bytes using read_method."""
if size is None:
size = -1
self._file.seek(self._pos)
result = read_method(size)
self._pos = self._file.tell()
return result
def __enter__(self):
"""Context management protocol support."""
return self
def __exit__(self, *exc):
self.close()
def readable(self):
return self._file.readable()
def writable(self):
return self._file.writable()
def seekable(self):
return self._file.seekable()
def flush(self):
return self._file.flush()
@property
def closed(self):
if not hasattr(self, '_file'):
return True
if not hasattr(self._file, 'closed'):
return False
return self._file.closed
class _PartialFile(_ProxyFile):
"""A read-only wrapper of part of a file."""
def __init__(self, f, start=None, stop=None):
"""Initialize a _PartialFile."""
_ProxyFile.__init__(self, f, start)
self._start = start
self._stop = stop
def tell(self):
"""Return the position with respect to start."""
return _ProxyFile.tell(self) - self._start
def seek(self, offset, whence=0):
"""Change position, possibly with respect to start or stop."""
if whence == 0:
self._pos = self._start
whence = 1
elif whence == 2:
self._pos = self._stop
whence = 1
_ProxyFile.seek(self, offset, whence)
def _read(self, size, read_method):
"""Read size bytes using read_method, honoring start and stop."""
remaining = self._stop - self._pos
if remaining <= 0:
return b''
if size is None or size < 0 or size > remaining:
size = remaining
return _ProxyFile._read(self, size, read_method)
def close(self):
# do *not* close the underlying file object for partial files,
# since it's global to the mailbox object
if hasattr(self, '_file'):
del self._file
def _lock_file(f, dotlock=True):
"""Lock file f using lockf and dot locking."""
dotlock_done = False
try:
if fcntl:
try:
fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError as e:
if e.errno in (errno.EAGAIN, errno.EACCES, errno.EROFS):
raise ExternalClashError('lockf: lock unavailable: %s' %
f.name)
else:
raise
if dotlock:
try:
pre_lock = _create_temporary(f.name + '.lock')
pre_lock.close()
except OSError as e:
if e.errno in (errno.EACCES, errno.EROFS):
return # Without write access, just skip dotlocking.
else:
raise
try:
if hasattr(os, 'link'):
os.link(pre_lock.name, f.name + '.lock')
dotlock_done = True
os.unlink(pre_lock.name)
else:
os.rename(pre_lock.name, f.name + '.lock')
dotlock_done = True
except FileExistsError:
os.remove(pre_lock.name)
raise ExternalClashError('dot lock unavailable: %s' %
f.name)
except:
if fcntl:
fcntl.lockf(f, fcntl.LOCK_UN)
if dotlock_done:
os.remove(f.name + '.lock')
raise
def _unlock_file(f):
"""Unlock file f using lockf and dot locking."""
if fcntl:
fcntl.lockf(f, fcntl.LOCK_UN)
if os.path.exists(f.name + '.lock'):
os.remove(f.name + '.lock')
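# _lock_file()/_unlock_file() combine an fcntl lockf() lock (where available)
# with a dot-lock file, and back the public lock()/unlock() methods. The
# intended usage pattern on a single-file mailbox, sketched:
#
#     box = mbox('/tmp/example.mbox')
#     box.lock()                       # may raise ExternalClashError
#     try:
#         ...                          # mutate the mailbox
#         box.flush()
#     finally:
#         box.unlock()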
def _create_carefully(path):
"""Create a file if it doesn't exist and open for reading and writing."""
fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o666)
try:
return open(path, 'rb+')
finally:
os.close(fd)
def _create_temporary(path):
"""Create a temp file based on path and open for reading and writing."""
return _create_carefully('%s.%s.%s.%s' % (path, int(time.time()),
socket.gethostname(),
os.getpid()))
def _sync_flush(f):
"""Ensure changes to file f are physically on disk."""
f.flush()
if hasattr(os, 'fsync'):
os.fsync(f.fileno())
def _sync_close(f):
"""Close file f, ensuring all changes are physically on disk."""
_sync_flush(f)
f.close()
class Error(Exception):
"""Raised for module-specific errors."""
class NoSuchMailboxError(Error):
"""The specified mailbox does not exist and won't be created."""
class NotEmptyError(Error):
"""The specified mailbox is not empty and deletion was requested."""
class ExternalClashError(Error):
"""Another process caused an action to fail."""
class FormatError(Error):
"""A file appears to have an invalid format."""
"""Mailcap file handling. See RFC 1524."""
import os
__all__ = ["getcaps","findmatch"]
# Part 1: top-level interface.
def getcaps():
"""Return a dictionary containing the mailcap database.
The dictionary maps a MIME type (in all lowercase, e.g. 'text/plain')
to a list of dictionaries corresponding to mailcap entries. The list
collects all the entries for that MIME type from all available mailcap
files. Each dictionary contains key-value pairs for that MIME type,
where the viewing command is stored with the key "view".
"""
caps = {}
for mailcap in listmailcapfiles():
try:
fp = open(mailcap, 'r')
except OSError:
continue
with fp:
morecaps = readmailcapfile(fp)
for key, value in morecaps.items():
                if key not in caps:
caps[key] = value
else:
caps[key] = caps[key] + value
return caps
def listmailcapfiles():
"""Return a list of all mailcap files found on the system."""
# This is mostly a Unix thing, but we use the OS path separator anyway
if 'MAILCAPS' in os.environ:
pathstr = os.environ['MAILCAPS']
mailcaps = pathstr.split(os.pathsep)
else:
if 'HOME' in os.environ:
home = os.environ['HOME']
else:
# Don't bother with getpwuid()
home = '.' # Last resort
mailcaps = [home + '/.mailcap', '/etc/mailcap',
'/usr/etc/mailcap', '/usr/local/etc/mailcap']
return mailcaps
# Part 2: the parser.
def readmailcapfile(fp):
"""Read a mailcap file and return a dictionary keyed by MIME type.
Each MIME type is mapped to an entry consisting of a list of
dictionaries; the list will contain more than one such dictionary
if a given MIME type appears more than once in the mailcap file.
Each dictionary contains key-value pairs for that MIME type, where
the viewing command is stored with the key "view".
"""
caps = {}
    while True:
line = fp.readline()
if not line: break
# Ignore comments and blank lines
if line[0] == '#' or line.strip() == '':
continue
nextline = line
# Join continuation lines
while nextline[-2:] == '\\\n':
nextline = fp.readline()
if not nextline: nextline = '\n'
line = line[:-2] + nextline
# Parse the line
key, fields = parseline(line)
if not (key and fields):
continue
# Normalize the key
types = key.split('/')
for j in range(len(types)):
types[j] = types[j].strip()
key = '/'.join(types).lower()
# Update the database
if key in caps:
caps[key].append(fields)
else:
caps[key] = [fields]
return caps
def parseline(line):
"""Parse one entry in a mailcap file and return a dictionary.
The viewing command is stored as the value with the key "view",
and the rest of the fields produce key-value pairs in the dict.
"""
fields = []
i, n = 0, len(line)
while i < n:
field, i = parsefield(line, i, n)
fields.append(field)
i = i+1 # Skip semicolon
if len(fields) < 2:
return None, None
key, view, rest = fields[0], fields[1], fields[2:]
fields = {'view': view}
for field in rest:
i = field.find('=')
if i < 0:
fkey = field
fvalue = ""
else:
fkey = field[:i].strip()
fvalue = field[i+1:].strip()
if fkey in fields:
# Ignore it
pass
else:
fields[fkey] = fvalue
return key, fields
def parsefield(line, i, n):
"""Separate one key-value pair in a mailcap entry."""
start = i
while i < n:
c = line[i]
if c == ';':
break
elif c == '\\':
i = i+2
else:
i = i+1
return line[start:i].strip(), i
# Part 3: using the database.
def findmatch(caps, MIMEtype, key='view', filename="/dev/null", plist=[]):
"""Find a match for a mailcap entry.
Return a tuple containing the command line, and the mailcap entry
used; (None, None) if no match is found. This may invoke the
'test' command of several matching entries before deciding which
entry to use.
"""
entries = lookup(caps, MIMEtype, key)
# XXX This code should somehow check for the needsterminal flag.
for e in entries:
if 'test' in e:
            test = subst(e['test'], MIMEtype, filename, plist)
if test and os.system(test) != 0:
continue
command = subst(e[key], MIMEtype, filename, plist)
return command, e
return None, None
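# A mailcap lookup sketch (the file name is hypothetical). findmatch() may
# run each candidate entry's 'test' command via os.system() before choosing:
#
#     caps = getcaps()
#     command, entry = findmatch(caps, 'image/png', 'view', '/tmp/pic.png')
#     if command:
#         os.system(command)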
def lookup(caps, MIMEtype, key=None):
entries = []
if MIMEtype in caps:
entries = entries + caps[MIMEtype]
MIMEtypes = MIMEtype.split('/')
MIMEtype = MIMEtypes[0] + '/*'
if MIMEtype in caps:
entries = entries + caps[MIMEtype]
if key is not None:
entries = [e for e in entries if key in e]
return entries
def subst(field, MIMEtype, filename, plist=[]):
# XXX Actually, this is Unix-specific
res = ''
i, n = 0, len(field)
while i < n:
c = field[i]; i = i+1
if c != '%':
if c == '\\':
c = field[i:i+1]; i = i+1
res = res + c
else:
c = field[i]; i = i+1
if c == '%':
res = res + c
elif c == 's':
res = res + filename
elif c == 't':
res = res + MIMEtype
elif c == '{':
start = i
while i < n and field[i] != '}':
i = i+1
name = field[start:i]
i = i+1
res = res + findparam(name, plist)
# XXX To do:
# %n == number of parts if type is multipart/*
# %F == list of alternating type and filename for parts
else:
res = res + '%' + c
return res
def findparam(name, plist):
name = name.lower() + '='
n = len(name)
for p in plist:
if p[:n].lower() == name:
return p[n:]
return ''
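# Illustrative sketch (not part of the original module): the %-substitutions
# performed by subst(). The command template, filename and parameter list
# below are made-up values.
def _subst_demo():
    # %s -> filename, %t -> MIME type, %{name} -> value taken from plist
    cmd = subst('xmpeg -title %{title} %s', 'video/mpeg',
                '/tmp/movie.mpeg', ['title=Demo'])
    # cmd == 'xmpeg -title Demo /tmp/movie.mpeg'
    return cmd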
# Part 4: test program.
def test():
import sys
caps = getcaps()
if not sys.argv[1:]:
show(caps)
return
for i in range(1, len(sys.argv), 2):
args = sys.argv[i:i+2]
if len(args) < 2:
print("usage: mailcap [MIMEtype file] ...")
return
MIMEtype = args[0]
file = args[1]
command, e = findmatch(caps, MIMEtype, 'view', file)
if not command:
print("No viewer found for", MIMEtype)
else:
print("Executing:", command)
sts = os.system(command)
if sts:
print("Exit status:", sts)
def show(caps):
print("Mailcap files:")
for fn in listmailcapfiles(): print("\t" + fn)
print()
if not caps: caps = getcaps()
print("Mailcap entries:")
print()
ckeys = sorted(caps)
for type in ckeys:
print(type)
entries = caps[type]
for e in entries:
keys = sorted(e)
for k in keys:
print(" %-15s" % k, e[k])
print()
if __name__ == '__main__':
test()
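# Hedged usage sketch for the module above. Whether a viewer is found, and
# which command comes back, depends entirely on the local mailcap files;
# the MIME type and filename are placeholders.
def _mailcap_usage_demo():
    caps = getcaps()
    command, entry = findmatch(caps, 'image/png', filename='/tmp/image.png')
    if command:
        os.system(command)  # run the viewer configured for this MIME type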
"""Guess the MIME type of a file.
This module defines two useful functions:
guess_type(url, strict=True) -- guess the MIME type and encoding of a URL.
guess_extension(type, strict=True) -- guess the extension for a given MIME type.
It also contains the following, for tuning the behavior:
Data:
knownfiles -- list of files to parse
inited -- flag set when init() has been called
suffix_map -- dictionary mapping suffixes to suffixes
encodings_map -- dictionary mapping suffixes to encodings
types_map -- dictionary mapping suffixes to types
Functions:
init([files]) -- parse a list of files, default knownfiles (on Windows, the
default values are taken from the registry)
read_mime_types(file) -- parse one file, return a dictionary or None
"""
import os
import sys
import posixpath
import urllib.parse
try:
import winreg as _winreg
except ImportError:
_winreg = None
__all__ = [
"guess_type","guess_extension","guess_all_extensions",
"add_type","read_mime_types","init"
]
knownfiles = [
"/etc/mime.types",
"/etc/httpd/mime.types", # Mac OS X
"/etc/httpd/conf/mime.types", # Apache
"/etc/apache/mime.types", # Apache 1
"/etc/apache2/mime.types", # Apache 2
"/usr/local/etc/httpd/conf/mime.types",
"/usr/local/lib/netscape/mime.types",
"/usr/local/etc/httpd/conf/mime.types", # Apache 1.2
"/usr/local/etc/mime.types", # Apache 1.3
]
inited = False
_db = None
class MimeTypes:
"""MIME-types datastore.
This datastore can handle information from mime.types-style files
and supports basic determination of MIME type from a filename or
URL, and can guess a reasonable extension given a MIME type.
"""
def __init__(self, filenames=(), strict=True):
if not inited:
init()
self.encodings_map = encodings_map.copy()
self.suffix_map = suffix_map.copy()
self.types_map = ({}, {}) # dict for (non-strict, strict)
self.types_map_inv = ({}, {})
for (ext, type) in types_map.items():
self.add_type(type, ext, True)
for (ext, type) in common_types.items():
self.add_type(type, ext, False)
for name in filenames:
self.read(name, strict)
def add_type(self, type, ext, strict=True):
"""Add a mapping between a type and an extension.
When the extension is already known, the new
type will replace the old one. When the type
is already known the extension will be added
to the list of known extensions.
If strict is true, information will be added to
list of standard types, else to the list of non-standard
types.
"""
self.types_map[strict][ext] = type
exts = self.types_map_inv[strict].setdefault(type, [])
if ext not in exts:
exts.append(ext)
def guess_type(self, url, strict=True):
"""Guess the type of a file based on its URL.
Return value is a tuple (type, encoding) where type is None if
the type can't be guessed (no or unknown suffix) or a string
of the form type/subtype, usable for a MIME Content-type
header; and encoding is None for no encoding or the name of
the program used to encode (e.g. compress or gzip). The
mappings are table driven. Encoding suffixes are case
sensitive; type suffixes are first tried case sensitive, then
case insensitive.
The suffixes .tgz, .taz and .tz (case sensitive!) are all
mapped to '.tar.gz'. (This is table-driven too, using the
dictionary suffix_map.)
Optional `strict' argument when False adds a bunch of commonly found,
but non-standard types.
"""
scheme, url = urllib.parse.splittype(url)
if scheme == 'data':
# syntax of data URLs:
# dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
# mediatype := [ type "/" subtype ] *( ";" parameter )
# data := *urlchar
# parameter := attribute "=" value
# type/subtype defaults to "text/plain"
comma = url.find(',')
if comma < 0:
# bad data URL
return None, None
semi = url.find(';', 0, comma)
if semi >= 0:
type = url[:semi]
else:
type = url[:comma]
if '=' in type or '/' not in type:
type = 'text/plain'
return type, None # never compressed, so encoding is None
base, ext = posixpath.splitext(url)
while ext in self.suffix_map:
base, ext = posixpath.splitext(base + self.suffix_map[ext])
if ext in self.encodings_map:
encoding = self.encodings_map[ext]
base, ext = posixpath.splitext(base)
else:
encoding = None
types_map = self.types_map[True]
if ext in types_map:
return types_map[ext], encoding
elif ext.lower() in types_map:
return types_map[ext.lower()], encoding
elif strict:
return None, encoding
types_map = self.types_map[False]
if ext in types_map:
return types_map[ext], encoding
elif ext.lower() in types_map:
return types_map[ext.lower()], encoding
else:
return None, encoding
def guess_all_extensions(self, type, strict=True):
"""Guess the extensions for a file based on its MIME type.
Return value is a list of strings giving the possible filename
extensions, including the leading dot ('.'). The extension is not
guaranteed to have been associated with any particular data stream,
but would be mapped to the MIME type `type' by guess_type().
Optional `strict' argument when false adds a bunch of commonly found,
but non-standard types.
"""
type = type.lower()
extensions = self.types_map_inv[True].get(type, [])
if not strict:
for ext in self.types_map_inv[False].get(type, []):
if ext not in extensions:
extensions.append(ext)
return extensions
def guess_extension(self, type, strict=True):
"""Guess the extension for a file based on its MIME type.
Return value is a string giving a filename extension,
including the leading dot ('.'). The extension is not
guaranteed to have been associated with any particular data
stream, but would be mapped to the MIME type `type' by
guess_type(). If no extension can be guessed for `type', None
is returned.
Optional `strict' argument when false adds a bunch of commonly found,
but non-standard types.
"""
extensions = self.guess_all_extensions(type, strict)
if not extensions:
return None
return extensions[0]
def read(self, filename, strict=True):
"""
Read a single mime.types-format file, specified by pathname.
If strict is true, information will be added to
list of standard types, else to the list of non-standard
types.
"""
with open(filename, encoding='utf-8') as fp:
self.readfp(fp, strict)
def readfp(self, fp, strict=True):
"""
Read a single mime.types-format file.
If strict is true, information will be added to
list of standard types, else to the list of non-standard
types.
"""
while 1:
line = fp.readline()
if not line:
break
words = line.split()
for i in range(len(words)):
if words[i][0] == '#':
del words[i:]
break
if not words:
continue
type, suffixes = words[0], words[1:]
for suff in suffixes:
self.add_type(type, '.' + suff, strict)
def read_windows_registry(self, strict=True):
"""
Load the MIME types database from Windows registry.
If strict is true, information will be added to
list of standard types, else to the list of non-standard
types.
"""
# Windows only
if not _winreg:
return
def enum_types(mimedb):
i = 0
while True:
try:
ctype = _winreg.EnumKey(mimedb, i)
except EnvironmentError:
break
else:
if '\0' not in ctype:
yield ctype
i += 1
with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT, '') as hkcr:
for subkeyname in enum_types(hkcr):
# ironpython: code modified to avoid StackOverflowException - https://github.com/IronLanguages/ironpython3/issues/1182
with _winreg.OpenKey(hkcr, subkeyname) as subkey:
# Only check file extensions
if not subkeyname.startswith("."):
continue
try:
# raises EnvironmentError if no 'Content Type' value
mimetype, datatype = _winreg.QueryValueEx(
subkey, 'Content Type')
except EnvironmentError:
pass
else:
if datatype != _winreg.REG_SZ:
continue
self.add_type(mimetype, subkeyname, strict)
def guess_type(url, strict=True):
"""Guess the type of a file based on its URL.
Return value is a tuple (type, encoding) where type is None if the
type can't be guessed (no or unknown suffix) or a string of the
form type/subtype, usable for a MIME Content-type header; and
encoding is None for no encoding or the name of the program used
to encode (e.g. compress or gzip). The mappings are table
driven. Encoding suffixes are case sensitive; type suffixes are
first tried case sensitive, then case insensitive.
The suffixes .tgz, .taz and .tz (case sensitive!) are all mapped
to ".tar.gz". (This is table-driven too, using the dictionary
suffix_map).
Optional `strict' argument when false adds a bunch of commonly found, but
non-standard types.
"""
if _db is None:
init()
return _db.guess_type(url, strict)
def guess_all_extensions(type, strict=True):
"""Guess the extensions for a file based on its MIME type.
Return value is a list of strings giving the possible filename
extensions, including the leading dot ('.'). The extension is not
guaranteed to have been associated with any particular data
stream, but would be mapped to the MIME type `type' by
guess_type(). If no extension can be guessed for `type', None
is returned.
Optional `strict' argument when false adds a bunch of commonly found,
but non-standard types.
"""
if _db is None:
init()
return _db.guess_all_extensions(type, strict)
def guess_extension(type, strict=True):
"""Guess the extension for a file based on its MIME type.
Return value is a string giving a filename extension, including the
leading dot ('.'). The extension is not guaranteed to have been
associated with any particular data stream, but would be mapped to the
MIME type `type' by guess_type(). If no extension can be guessed for
`type', None is returned.
Optional `strict' argument when false adds a bunch of commonly found,
but non-standard types.
"""
if _db is None:
init()
return _db.guess_extension(type, strict)
def add_type(type, ext, strict=True):
"""Add a mapping between a type and an extension.
When the extension is already known, the new
type will replace the old one. When the type
is already known the extension will be added
to the list of known extensions.
If strict is true, information will be added to
list of standard types, else to the list of non-standard
types.
"""
if _db is None:
init()
return _db.add_type(type, ext, strict)
def init(files=None):
global suffix_map, types_map, encodings_map, common_types
global inited, _db
inited = True # so that MimeTypes.__init__() doesn't call us again
db = MimeTypes()
if files is None:
if _winreg:
db.read_windows_registry()
files = knownfiles
for file in files:
if os.path.isfile(file):
db.read(file)
encodings_map = db.encodings_map
suffix_map = db.suffix_map
types_map = db.types_map[True]
common_types = db.types_map[False]
# Make the DB a global variable now that it is fully initialized
_db = db
def read_mime_types(file):
try:
f = open(file)
except OSError:
return None
with f:
db = MimeTypes()
db.readfp(f, True)
return db.types_map[True]
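# Hedged sketch (not part of the module): the module-level convenience
# functions above all delegate to the shared _db created by init(). The
# 'application/x-example' type is a hypothetical registration.
def _mimetypes_api_demo():
    print(guess_type('archive.tar.gz'))  # ('application/x-tar', 'gzip')
    print(guess_extension('text/html'))  # '.htm' (first extension registered)
    add_type('application/x-example', '.example')
    print(guess_type('file.example'))    # ('application/x-example', None)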
def _default_mime_types():
global suffix_map
global encodings_map
global types_map
global common_types
suffix_map = {
'.svgz': '.svg.gz',
'.tgz': '.tar.gz',
'.taz': '.tar.gz',
'.tz': '.tar.gz',
'.tbz2': '.tar.bz2',
'.txz': '.tar.xz',
}
encodings_map = {
'.gz': 'gzip',
'.Z': 'compress',
'.bz2': 'bzip2',
'.xz': 'xz',
}
# Before adding new types, make sure they are either registered with IANA,
# at http://www.iana.org/assignments/media-types
# or extensions, i.e. using the x- prefix
# If you add to these, please keep them sorted!
types_map = {
'.a' : 'application/octet-stream',
'.ai' : 'application/postscript',
'.aif' : 'audio/x-aiff',
'.aifc' : 'audio/x-aiff',
'.aiff' : 'audio/x-aiff',
'.au' : 'audio/basic',
'.avi' : 'video/x-msvideo',
'.bat' : 'text/plain',
'.bcpio' : 'application/x-bcpio',
'.bin' : 'application/octet-stream',
'.bmp' : 'image/x-ms-bmp',
'.c' : 'text/plain',
# Duplicates :(
'.cdf' : 'application/x-cdf',
'.cdf' : 'application/x-netcdf',
'.cpio' : 'application/x-cpio',
'.csh' : 'application/x-csh',
'.css' : 'text/css',
'.dll' : 'application/octet-stream',
'.doc' : 'application/msword',
'.dot' : 'application/msword',
'.dvi' : 'application/x-dvi',
'.eml' : 'message/rfc822',
'.eps' : 'application/postscript',
'.etx' : 'text/x-setext',
'.exe' : 'application/octet-stream',
'.gif' : 'image/gif',
'.gtar' : 'application/x-gtar',
'.h' : 'text/plain',
'.hdf' : 'application/x-hdf',
'.htm' : 'text/html',
'.html' : 'text/html',
'.ico' : 'image/vnd.microsoft.icon',
'.ief' : 'image/ief',
'.jpe' : 'image/jpeg',
'.jpeg' : 'image/jpeg',
'.jpg' : 'image/jpeg',
'.js' : 'application/javascript',
'.ksh' : 'text/plain',
'.latex' : 'application/x-latex',
'.m1v' : 'video/mpeg',
'.m3u' : 'application/vnd.apple.mpegurl',
'.m3u8' : 'application/vnd.apple.mpegurl',
'.man' : 'application/x-troff-man',
'.me' : 'application/x-troff-me',
'.mht' : 'message/rfc822',
'.mhtml' : 'message/rfc822',
'.mif' : 'application/x-mif',
'.mov' : 'video/quicktime',
'.movie' : 'video/x-sgi-movie',
'.mp2' : 'audio/mpeg',
'.mp3' : 'audio/mpeg',
'.mp4' : 'video/mp4',
'.mpa' : 'video/mpeg',
'.mpe' : 'video/mpeg',
'.mpeg' : 'video/mpeg',
'.mpg' : 'video/mpeg',
'.ms' : 'application/x-troff-ms',
'.nc' : 'application/x-netcdf',
'.nws' : 'message/rfc822',
'.o' : 'application/octet-stream',
'.obj' : 'application/octet-stream',
'.oda' : 'application/oda',
'.p12' : 'application/x-pkcs12',
'.p7c' : 'application/pkcs7-mime',
'.pbm' : 'image/x-portable-bitmap',
'.pdf' : 'application/pdf',
'.pfx' : 'application/x-pkcs12',
'.pgm' : 'image/x-portable-graymap',
'.pl' : 'text/plain',
'.png' : 'image/png',
'.pnm' : 'image/x-portable-anymap',
'.pot' : 'application/vnd.ms-powerpoint',
'.ppa' : 'application/vnd.ms-powerpoint',
'.ppm' : 'image/x-portable-pixmap',
'.pps' : 'application/vnd.ms-powerpoint',
'.ppt' : 'application/vnd.ms-powerpoint',
'.ps' : 'application/postscript',
'.pwz' : 'application/vnd.ms-powerpoint',
'.py' : 'text/x-python',
'.pyc' : 'application/x-python-code',
'.pyo' : 'application/x-python-code',
'.qt' : 'video/quicktime',
'.ra' : 'audio/x-pn-realaudio',
'.ram' : 'application/x-pn-realaudio',
'.ras' : 'image/x-cmu-raster',
'.rdf' : 'application/xml',
'.rgb' : 'image/x-rgb',
'.roff' : 'application/x-troff',
'.rtx' : 'text/richtext',
'.sgm' : 'text/x-sgml',
'.sgml' : 'text/x-sgml',
'.sh' : 'application/x-sh',
'.shar' : 'application/x-shar',
'.snd' : 'audio/basic',
'.so' : 'application/octet-stream',
'.src' : 'application/x-wais-source',
'.sv4cpio': 'application/x-sv4cpio',
'.sv4crc' : 'application/x-sv4crc',
'.svg' : 'image/svg+xml',
'.swf' : 'application/x-shockwave-flash',
'.t' : 'application/x-troff',
'.tar' : 'application/x-tar',
'.tcl' : 'application/x-tcl',
'.tex' : 'application/x-tex',
'.texi' : 'application/x-texinfo',
'.texinfo': 'application/x-texinfo',
'.tif' : 'image/tiff',
'.tiff' : 'image/tiff',
'.tr' : 'application/x-troff',
'.tsv' : 'text/tab-separated-values',
'.txt' : 'text/plain',
'.ustar' : 'application/x-ustar',
'.vcf' : 'text/x-vcard',
'.wav' : 'audio/x-wav',
'.wiz' : 'application/msword',
'.wsdl' : 'application/xml',
'.xbm' : 'image/x-xbitmap',
'.xlb' : 'application/vnd.ms-excel',
# Duplicates :(
'.xls' : 'application/excel',
'.xls' : 'application/vnd.ms-excel',
'.xml' : 'text/xml',
'.xpdl' : 'application/xml',
'.xpm' : 'image/x-xpixmap',
'.xsl' : 'application/xml',
'.xwd' : 'image/x-xwindowdump',
'.zip' : 'application/zip',
}
# These are non-standard types, commonly found in the wild. They will
# only match if strict=0 flag is given to the API methods.
# Please sort these too
common_types = {
'.jpg' : 'image/jpg',
'.mid' : 'audio/midi',
'.midi': 'audio/midi',
'.pct' : 'image/pict',
'.pic' : 'image/pict',
'.pict': 'image/pict',
'.rtf' : 'application/rtf',
'.xul' : 'text/xul'
}
_default_mime_types()
if __name__ == '__main__':
import getopt
USAGE = """\
Usage: mimetypes.py [options] type
Options:
--help / -h -- print this message and exit
--lenient / -l -- additionally search some common, but non-standard
types.
--extension / -e -- guess extension instead of type
More than one type argument may be given.
"""
def usage(code, msg=''):
print(USAGE)
if msg: print(msg)
sys.exit(code)
try:
opts, args = getopt.getopt(sys.argv[1:], 'hle',
['help', 'lenient', 'extension'])
except getopt.error as msg:
usage(1, msg)
strict = 1
extension = 0
for opt, arg in opts:
if opt in ('-h', '--help'):
usage(0)
elif opt in ('-l', '--lenient'):
strict = 0
elif opt in ('-e', '--extension'):
extension = 1
for gtype in args:
if extension:
guess = guess_extension(gtype, strict)
if not guess: print("I don't know anything about type", gtype)
else: print(guess)
else:
guess, encoding = guess_type(gtype, strict)
if not guess: print("I don't know anything about type", gtype)
else: print('type:', guess, 'encoding:', encoding)
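# Hedged sketch (not part of the module): guess_type() also understands the
# data: URL syntax documented in MimeTypes.guess_type() above.
def _mimetypes_data_url_demo():
    print(guess_type('data:image/png;base64,iVBORw0KGgo='))  # ('image/png', None)
    print(guess_type('data:,hello'))  # no mediatype -> ('text/plain', None)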
"""Find modules used by a script, using introspection."""
import dis
import importlib._bootstrap
import importlib.machinery
import marshal
import os
import sys
import types
import struct
import warnings
with warnings.catch_warnings():
warnings.simplefilter('ignore', PendingDeprecationWarning)
import imp
# XXX Clean up once str8's cstor matches bytes.
LOAD_CONST = bytes([dis.opname.index('LOAD_CONST')])
IMPORT_NAME = bytes([dis.opname.index('IMPORT_NAME')])
STORE_NAME = bytes([dis.opname.index('STORE_NAME')])
STORE_GLOBAL = bytes([dis.opname.index('STORE_GLOBAL')])
STORE_OPS = [STORE_NAME, STORE_GLOBAL]
HAVE_ARGUMENT = bytes([dis.HAVE_ARGUMENT])
# Modulefinder does a good job of simulating Python's import mechanism,
# but it cannot handle __path__ modifications packages make at runtime.
# Therefore there is a mechanism whereby you can register extra paths in
# this map for a package, and it will be honored.
# Note this is a mapping of package names to lists of paths.
packagePathMap = {}
# Public interface
def AddPackagePath(packagename, path):
packagePathMap.setdefault(packagename, []).append(path)
replacePackageMap = {}
# This ReplacePackage mechanism allows modulefinder to work around
# situations in which a package injects itself under the name
# of another package into sys.modules at runtime by calling
# ReplacePackage("real_package_name", "faked_package_name")
# before running ModuleFinder.
def ReplacePackage(oldname, newname):
replacePackageMap[oldname] = newname
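# Hedged sketch (not part of the module): registering the two hooks described
# above. 'mypkg', '_mypkg' and the path are hypothetical names.
def _hooks_demo():
    # Honor a __path__ entry that 'mypkg' would normally add at runtime:
    AddPackagePath('mypkg', '/opt/mypkg/plugins')
    # Record the package found as '_mypkg' under the name 'mypkg' instead:
    ReplacePackage('_mypkg', 'mypkg')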
class Module:
def __init__(self, name, file=None, path=None):
self.__name__ = name
self.__file__ = file
self.__path__ = path
self.__code__ = None
# The set of global names that are assigned to in the module.
# This includes those names imported through starimports of
# Python modules.
self.globalnames = {}
# The set of starimports this module did that could not be
# resolved, ie. a starimport from a non-Python module.
self.starimports = {}
def __repr__(self):
s = "Module(%r" % (self.__name__,)
if self.__file__ is not None:
s = s + ", %r" % (self.__file__,)
if self.__path__ is not None:
s = s + ", %r" % (self.__path__,)
s = s + ")"
return s
class ModuleFinder:
def __init__(self, path=None, debug=0, excludes=[], replace_paths=[]):
if path is None:
path = sys.path
self.path = path
self.modules = {}
self.badmodules = {}
self.debug = debug
self.indent = 0
self.excludes = excludes
self.replace_paths = replace_paths
self.processed_paths = [] # Used in debugging only
def msg(self, level, str, *args):
if level <= self.debug:
for i in range(self.indent):
print(" ", end=' ')
print(str, end=' ')
for arg in args:
print(repr(arg), end=' ')
print()
def msgin(self, *args):
level = args[0]
if level <= self.debug:
self.indent = self.indent + 1
self.msg(*args)
def msgout(self, *args):
level = args[0]
if level <= self.debug:
self.indent = self.indent - 1
self.msg(*args)
def run_script(self, pathname):
self.msg(2, "run_script", pathname)
with open(pathname) as fp:
stuff = ("", "r", imp.PY_SOURCE)
self.load_module('__main__', fp, pathname, stuff)
def load_file(self, pathname):
dir, name = os.path.split(pathname)
name, ext = os.path.splitext(name)
with open(pathname) as fp:
stuff = (ext, "r", imp.PY_SOURCE)
self.load_module(name, fp, pathname, stuff)
def import_hook(self, name, caller=None, fromlist=None, level=-1):
self.msg(3, "import_hook", name, caller, fromlist, level)
parent = self.determine_parent(caller, level=level)
q, tail = self.find_head_package(parent, name)
m = self.load_tail(q, tail)
if not fromlist:
return q
if m.__path__:
self.ensure_fromlist(m, fromlist)
return None
def determine_parent(self, caller, level=-1):
self.msgin(4, "determine_parent", caller, level)
if not caller or level == 0:
self.msgout(4, "determine_parent -> None")
return None
pname = caller.__name__
if level >= 1: # relative import
if caller.__path__:
level -= 1
if level == 0:
parent = self.modules[pname]
assert parent is caller
self.msgout(4, "determine_parent ->", parent)
return parent
if pname.count(".") < level:
raise ImportError("relative importpath too deep")
pname = ".".join(pname.split(".")[:-level])
parent = self.modules[pname]
self.msgout(4, "determine_parent ->", parent)
return parent
if caller.__path__:
parent = self.modules[pname]
assert caller is parent
self.msgout(4, "determine_parent ->", parent)
return parent
if '.' in pname:
i = pname.rfind('.')
pname = pname[:i]
parent = self.modules[pname]
assert parent.__name__ == pname
self.msgout(4, "determine_parent ->", parent)
return parent
self.msgout(4, "determine_parent -> None")
return None
def find_head_package(self, parent, name):
self.msgin(4, "find_head_package", parent, name)
if '.' in name:
i = name.find('.')
head = name[:i]
tail = name[i+1:]
else:
head = name
tail = ""
if parent:
qname = "%s.%s" % (parent.__name__, head)
else:
qname = head
q = self.import_module(head, qname, parent)
if q:
self.msgout(4, "find_head_package ->", (q, tail))
return q, tail
if parent:
qname = head
parent = None
q = self.import_module(head, qname, parent)
if q:
self.msgout(4, "find_head_package ->", (q, tail))
return q, tail
self.msgout(4, "raise ImportError: No module named", qname)
raise ImportError("No module named " + qname)
def load_tail(self, q, tail):
self.msgin(4, "load_tail", q, tail)
m = q
while tail:
i = tail.find('.')
if i < 0: i = len(tail)
head, tail = tail[:i], tail[i+1:]
mname = "%s.%s" % (m.__name__, head)
m = self.import_module(head, mname, m)
if not m:
self.msgout(4, "raise ImportError: No module named", mname)
raise ImportError("No module named " + mname)
self.msgout(4, "load_tail ->", m)
return m
def ensure_fromlist(self, m, fromlist, recursive=0):
self.msg(4, "ensure_fromlist", m, fromlist, recursive)
for sub in fromlist:
if sub == "*":
if not recursive:
all = self.find_all_submodules(m)
if all:
self.ensure_fromlist(m, all, 1)
elif not hasattr(m, sub):
subname = "%s.%s" % (m.__name__, sub)
submod = self.import_module(sub, subname, m)
if not submod:
raise ImportError("No module named " + subname)
def find_all_submodules(self, m):
if not m.__path__:
return
modules = {}
# 'suffixes' used to be a list hardcoded to [".py", ".pyc", ".pyo"].
# But we must also collect Python extension modules - although
# we cannot separate normal dlls from Python extensions.
suffixes = []
suffixes += importlib.machinery.EXTENSION_SUFFIXES[:]
suffixes += importlib.machinery.SOURCE_SUFFIXES[:]
suffixes += importlib.machinery.BYTECODE_SUFFIXES[:]
for dir in m.__path__:
try:
names = os.listdir(dir)
except OSError:
self.msg(2, "can't list directory", dir)
continue
for name in names:
mod = None
for suff in suffixes:
n = len(suff)
if name[-n:] == suff:
mod = name[:-n]
break
if mod and mod != "__init__":
modules[mod] = mod
return modules.keys()
def import_module(self, partname, fqname, parent):
self.msgin(3, "import_module", partname, fqname, parent)
try:
m = self.modules[fqname]
except KeyError:
pass
else:
self.msgout(3, "import_module ->", m)
return m
if fqname in self.badmodules:
self.msgout(3, "import_module -> None")
return None
if parent and parent.__path__ is None:
self.msgout(3, "import_module -> None")
return None
try:
fp, pathname, stuff = self.find_module(partname,
parent and parent.__path__, parent)
except ImportError:
self.msgout(3, "import_module ->", None)
return None
try:
m = self.load_module(fqname, fp, pathname, stuff)
finally:
if fp:
fp.close()
if parent:
setattr(parent, partname, m)
self.msgout(3, "import_module ->", m)
return m
def load_module(self, fqname, fp, pathname, file_info):
suffix, mode, type = file_info
self.msgin(2, "load_module", fqname, fp and "fp", pathname)
if type == imp.PKG_DIRECTORY:
m = self.load_package(fqname, pathname)
self.msgout(2, "load_module ->", m)
return m
if type == imp.PY_SOURCE:
co = compile(fp.read()+'\n', pathname, 'exec')
elif type == imp.PY_COMPILED:
try:
marshal_data = importlib._bootstrap._validate_bytecode_header(fp.read())
except ImportError as exc:
self.msgout(2, "raise ImportError: " + str(exc), pathname)
raise
co = marshal.loads(marshal_data)
else:
co = None
m = self.add_module(fqname)
m.__file__ = pathname
if co:
if self.replace_paths:
co = self.replace_paths_in_code(co)
m.__code__ = co
self.scan_code(co, m)
self.msgout(2, "load_module ->", m)
return m
def _add_badmodule(self, name, caller):
if name not in self.badmodules:
self.badmodules[name] = {}
if caller:
self.badmodules[name][caller.__name__] = 1
else:
self.badmodules[name]["-"] = 1
def _safe_import_hook(self, name, caller, fromlist, level=-1):
# wrapper for self.import_hook() that won't raise ImportError
if name in self.badmodules:
self._add_badmodule(name, caller)
return
try:
self.import_hook(name, caller, level=level)
except ImportError as msg:
self.msg(2, "ImportError:", str(msg))
self._add_badmodule(name, caller)
else:
if fromlist:
for sub in fromlist:
if sub in self.badmodules:
self._add_badmodule(sub, caller)
continue
try:
self.import_hook(name, caller, [sub], level=level)
except ImportError as msg:
self.msg(2, "ImportError:", str(msg))
fullname = name + "." + sub
self._add_badmodule(fullname, caller)
def scan_opcodes_25(self, co,
unpack = struct.unpack):
# Scan the code, and yield 'interesting' opcode combinations
# Python 2.5 version (has absolute and relative imports)
code = co.co_code
names = co.co_names
consts = co.co_consts
LOAD_LOAD_AND_IMPORT = LOAD_CONST + LOAD_CONST + IMPORT_NAME
while code:
c = bytes([code[0]])
if c in STORE_OPS:
oparg, = unpack('<H', code[1:3])
yield "store", (names[oparg],)
code = code[3:]
continue
if code[:9:3] == LOAD_LOAD_AND_IMPORT:
oparg_1, oparg_2, oparg_3 = unpack('<xHxHxH', code[:9])
level = consts[oparg_1]
if level == 0: # absolute import
yield "absolute_import", (consts[oparg_2], names[oparg_3])
else: # relative import
yield "relative_import", (level, consts[oparg_2], names[oparg_3])
code = code[9:]
continue
if c >= HAVE_ARGUMENT:
code = code[3:]
else:
code = code[1:]
def scan_code(self, co, m):
code = co.co_code
scanner = self.scan_opcodes_25
for what, args in scanner(co):
if what == "store":
name, = args
m.globalnames[name] = 1
elif what == "absolute_import":
fromlist, name = args
have_star = 0
if fromlist is not None:
if "*" in fromlist:
have_star = 1
fromlist = [f for f in fromlist if f != "*"]
self._safe_import_hook(name, m, fromlist, level=0)
if have_star:
# We've encountered an "import *". If it is a Python module,
# the code has already been parsed and we can suck out the
# global names.
mm = None
if m.__path__:
# At this point we don't know whether 'name' is a
# submodule of 'm' or a global module. Let's just try
# the full name first.
mm = self.modules.get(m.__name__ + "." + name)
if mm is None:
mm = self.modules.get(name)
if mm is not None:
m.globalnames.update(mm.globalnames)
m.starimports.update(mm.starimports)
if mm.__code__ is None:
m.starimports[name] = 1
else:
m.starimports[name] = 1
elif what == "relative_import":
level, fromlist, name = args
if name:
self._safe_import_hook(name, m, fromlist, level=level)
else:
parent = self.determine_parent(m, level=level)
self._safe_import_hook(parent.__name__, None, fromlist, level=0)
else:
# We don't expect anything else from the generator.
raise RuntimeError(what)
for c in co.co_consts:
if isinstance(c, type(co)):
self.scan_code(c, m)
def load_package(self, fqname, pathname):
self.msgin(2, "load_package", fqname, pathname)
newname = replacePackageMap.get(fqname)
if newname:
fqname = newname
m = self.add_module(fqname)
m.__file__ = pathname
m.__path__ = [pathname]
# As per comment at top of file, simulate runtime __path__ additions.
m.__path__ = m.__path__ + packagePathMap.get(fqname, [])
fp, buf, stuff = self.find_module("__init__", m.__path__)
try:
self.load_module(fqname, fp, buf, stuff)
self.msgout(2, "load_package ->", m)
return m
finally:
if fp:
fp.close()
def add_module(self, fqname):
if fqname in self.modules:
return self.modules[fqname]
self.modules[fqname] = m = Module(fqname)
return m
def find_module(self, name, path, parent=None):
if parent is not None:
# assert path is not None
fullname = parent.__name__+'.'+name
else:
fullname = name
if fullname in self.excludes:
self.msgout(3, "find_module -> Excluded", fullname)
raise ImportError(name)
if path is None:
if name in sys.builtin_module_names:
return (None, None, ("", "", imp.C_BUILTIN))
path = self.path
return imp.find_module(name, path)
def report(self):
"""Print a report to stdout, listing the found modules with their
paths, as well as modules that are missing, or seem to be missing.
"""
print()
print(" %-25s %s" % ("Name", "File"))
print(" %-25s %s" % ("----", "----"))
# Print modules found
keys = sorted(self.modules.keys())
for key in keys:
m = self.modules[key]
if m.__path__:
print("P", end=' ')
else:
print("m", end=' ')
print("%-25s" % key, m.__file__ or "")
# Print missing modules
missing, maybe = self.any_missing_maybe()
if missing:
print()
print("Missing modules:")
for name in missing:
mods = sorted(self.badmodules[name].keys())
print("?", name, "imported from", ', '.join(mods))
# Print modules that may be missing, but then again, maybe not...
if maybe:
print()
print("Submodules that appear to be missing, but could also be", end=' ')
print("global names in the parent package:")
for name in maybe:
mods = sorted(self.badmodules[name].keys())
print("?", name, "imported from", ', '.join(mods))
def any_missing(self):
"""Return a list of modules that appear to be missing. Use
any_missing_maybe() if you want to know which modules are
certain to be missing, and which *may* be missing.
"""
missing, maybe = self.any_missing_maybe()
return missing + maybe
def any_missing_maybe(self):
"""Return two lists, one with modules that are certainly missing
and one with modules that *may* be missing. The latter names could
either be submodules *or* just global names in the package.
The reason it can't always be determined is that it's impossible to
tell which names are imported when "from module import *" is done
with an extension module, short of actually importing it.
"""
missing = []
maybe = []
for name in self.badmodules:
if name in self.excludes:
continue
i = name.rfind(".")
if i < 0:
missing.append(name)
continue
subname = name[i+1:]
pkgname = name[:i]
pkg = self.modules.get(pkgname)
if pkg is not None:
if pkgname in self.badmodules[name]:
# The package tried to import this module itself and
# failed. It's definitely missing.
missing.append(name)
elif subname in pkg.globalnames:
# It's a global in the package: definitely not missing.
pass
elif pkg.starimports:
# It could be missing, but the package did an "import *"
# from a non-Python module, so we simply can't be sure.
maybe.append(name)
else:
# It's not a global in the package, the package didn't
# do funny star imports, it's very likely to be missing.
# The symbol could be inserted into the package from the
# outside, but since that's not good style we simply list
# it missing.
missing.append(name)
else:
missing.append(name)
missing.sort()
maybe.sort()
return missing, maybe
def replace_paths_in_code(self, co):
new_filename = original_filename = os.path.normpath(co.co_filename)
for f, r in self.replace_paths:
if original_filename.startswith(f):
new_filename = r + original_filename[len(f):]
break
if self.debug and original_filename not in self.processed_paths:
if new_filename != original_filename:
self.msgout(2, "co_filename %r changed to %r" \
% (original_filename,new_filename,))
else:
self.msgout(2, "co_filename %r remains unchanged" \
% (original_filename,))
self.processed_paths.append(original_filename)
consts = list(co.co_consts)
for i in range(len(consts)):
if isinstance(consts[i], type(co)):
consts[i] = self.replace_paths_in_code(consts[i])
return types.CodeType(co.co_argcount, co.co_kwonlyargcount,
co.co_nlocals, co.co_stacksize, co.co_flags,
co.co_code, tuple(consts), co.co_names,
co.co_varnames, new_filename, co.co_name,
co.co_firstlineno, co.co_lnotab, co.co_freevars,
co.co_cellvars)
def test():
# Parse command line
import getopt
try:
opts, args = getopt.getopt(sys.argv[1:], "dmp:qx:")
except getopt.error as msg:
print(msg)
return
# Process options
debug = 1
domods = 0
addpath = []
exclude = []
for o, a in opts:
if o == '-d':
debug = debug + 1
if o == '-m':
domods = 1
if o == '-p':
addpath = addpath + a.split(os.pathsep)
if o == '-q':
debug = 0
if o == '-x':
exclude.append(a)
# Provide default arguments
if not args:
script = "hello.py"
else:
script = args[0]
# Set the path based on sys.path and the script directory
path = sys.path[:]
path[0] = os.path.dirname(script)
path = addpath + path
if debug > 1:
print("path:")
for item in path:
print(" ", repr(item))
# Create the module finder and turn its crank
mf = ModuleFinder(path, debug, exclude)
for arg in args[1:]:
if arg == '-m':
domods = 1
continue
if domods:
if arg[-2:] == '.*':
mf.import_hook(arg[:-2], None, ["*"])
else:
mf.import_hook(arg)
else:
mf.load_file(arg)
mf.run_script(script)
mf.report()
return mf # for -i debugging
if __name__ == '__main__':
try:
mf = test()
except KeyboardInterrupt:
print("\n[interrupted]")
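# Hedged usage sketch (not part of the module); 'hello.py' is a placeholder
# script path, mirroring the default used by test() above.
def _modulefinder_usage_demo():
    mf = ModuleFinder()
    mf.run_script('hello.py')  # analyze the script's imports without running it
    for name, mod in sorted(mf.modules.items()):
        print(name, mod.__file__)
    mf.report()  # same listing plus modules that appear to be missing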
"""An object-oriented interface to .netrc files."""
# Module and documentation by Eric S. Raymond, 21 Dec 1998
import os, shlex, stat
__all__ = ["netrc", "NetrcParseError"]
class NetrcParseError(Exception):
"""Exception raised on syntax errors in the .netrc file."""
def __init__(self, msg, filename=None, lineno=None):
self.filename = filename
self.lineno = lineno
self.msg = msg
Exception.__init__(self, msg)
def __str__(self):
return "%s (%s, line %s)" % (self.msg, self.filename, self.lineno)
class netrc:
def __init__(self, file=None):
default_netrc = file is None
if file is None:
try:
file = os.path.join(os.environ['HOME'], ".netrc")
except KeyError:
raise OSError("Could not find .netrc: $HOME is not set")
self.hosts = {}
self.macros = {}
with open(file) as fp:
self._parse(file, fp, default_netrc)
def _parse(self, file, fp, default_netrc):
lexer = shlex.shlex(fp)
lexer.wordchars += r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
lexer.commenters = lexer.commenters.replace('#', '')
while 1:
# Look for a machine, default, or macdef top-level keyword
saved_lineno = lexer.lineno
toplevel = tt = lexer.get_token()
if not tt:
break
elif tt[0] == '#':
if lexer.lineno == saved_lineno and len(tt) == 1:
lexer.instream.readline()
continue
elif tt == 'machine':
entryname = lexer.get_token()
elif tt == 'default':
entryname = 'default'
elif tt == 'macdef': # Just skip to end of macdefs
entryname = lexer.get_token()
self.macros[entryname] = []
lexer.whitespace = ' \t'
while 1:
line = lexer.instream.readline()
if not line or line == '\012':
lexer.whitespace = ' \t\r\n'
break
self.macros[entryname].append(line)
continue
else:
raise NetrcParseError(
"bad toplevel token %r" % tt, file, lexer.lineno)
# We're looking at start of an entry for a named machine or default.
login = ''
account = password = None
self.hosts[entryname] = {}
while 1:
tt = lexer.get_token()
if (tt.startswith('#') or
tt in {'', 'machine', 'default', 'macdef'}):
if password:
self.hosts[entryname] = (login, account, password)
lexer.push_token(tt)
break
else:
raise NetrcParseError(
"malformed %s entry %s terminated by %s"
% (toplevel, entryname, repr(tt)),
file, lexer.lineno)
elif tt == 'login' or tt == 'user':
login = lexer.get_token()
elif tt == 'account':
account = lexer.get_token()
elif tt == 'password':
if os.name == 'posix' and default_netrc:
prop = os.fstat(fp.fileno())
if prop.st_uid != os.getuid():
import pwd
try:
fowner = pwd.getpwuid(prop.st_uid)[0]
except KeyError:
fowner = 'uid %s' % prop.st_uid
try:
user = pwd.getpwuid(os.getuid())[0]
except KeyError:
user = 'uid %s' % os.getuid()
raise NetrcParseError(
("~/.netrc file owner (%s) does not match"
" current user (%s)") % (fowner, user),
file, lexer.lineno)
if (prop.st_mode & (stat.S_IRWXG | stat.S_IRWXO)):
raise NetrcParseError(
"~/.netrc access too permissive: access"
" permissions must restrict access to only"
" the owner", file, lexer.lineno)
password = lexer.get_token()
else:
raise NetrcParseError("bad follower token %r" % tt,
file, lexer.lineno)
def authenticators(self, host):
"""Return a (user, account, password) tuple for given host."""
if host in self.hosts:
return self.hosts[host]
elif 'default' in self.hosts:
return self.hosts['default']
else:
return None
def __repr__(self):
"""Dump the class data in the format of a .netrc file."""
rep = ""
for host in self.hosts.keys():
attrs = self.hosts[host]
rep = rep + "machine " + host + "\n\tlogin " + repr(attrs[0]) + "\n"
if attrs[1]:
rep = rep + "account " + repr(attrs[1])
rep = rep + "\tpassword " + repr(attrs[2]) + "\n"
for macro in self.macros.keys():
rep = rep + "macdef " + macro + "\n"
for line in self.macros[macro]:
rep = rep + line
rep = rep + "\n"
return rep
if __name__ == '__main__':
print(netrc())
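# Hedged usage sketch (not part of the module): looking up credentials for a
# host. 'ftp.example.org' is a placeholder; authenticators() falls back to a
# 'default' entry and returns None when nothing matches.
def _netrc_usage_demo():
    auth = netrc().authenticators('ftp.example.org')
    if auth is not None:
        login, account, password = auth
        print('login:', login)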
"""An NNTP client class based on:
- RFC 977: Network News Transfer Protocol
- RFC 2980: Common NNTP Extensions
- RFC 3977: Network News Transfer Protocol (version 2)
Example:
>>> from nntplib import NNTP
>>> s = NNTP('news')
>>> resp, count, first, last, name = s.group('comp.lang.python')
>>> print('Group', name, 'has', count, 'articles, range', first, 'to', last)
Group comp.lang.python has 51 articles, range 5770 to 5821
>>> resp, subs = s.xhdr('subject', '{0}-{1}'.format(first, last))
>>> resp = s.quit()
>>>
Here 'resp' is the server response line.
Error responses are turned into exceptions.
To post an article from a file:
>>> f = open(filename, 'rb') # file containing article, including header
>>> resp = s.post(f)
>>>
For descriptions of all methods, read the comments in the code below.
Note that all arguments and return values representing article numbers
are strings, not numbers, since they are rarely used for calculations.
"""
# RFC 977 by Brian Kantor and Phil Lapsley.
# xover, xgtitle, xpath, date methods by Kevan Heydon
# Incompatible changes from the 2.x nntplib:
# - all commands are encoded as UTF-8 data (using the "surrogateescape"
# error handler), except for raw message data (POST, IHAVE)
# - all responses are decoded as UTF-8 data (using the "surrogateescape"
# error handler), except for raw message data (ARTICLE, HEAD, BODY)
# - the `file` argument to various methods is keyword-only
#
# - NNTP.date() returns a datetime object
# - NNTP.newgroups() and NNTP.newnews() take a datetime (or date) object,
# rather than a pair of (date, time) strings.
# - NNTP.newgroups() and NNTP.list() return a list of GroupInfo named tuples
# - NNTP.descriptions() returns a dict mapping group names to descriptions
# - NNTP.xover() returns a list of dicts mapping field names (header or metadata)
# to field values; each dict representing a message overview.
# - NNTP.article(), NNTP.head() and NNTP.body() return a (response, ArticleInfo)
# tuple.
# - the "internal" methods have been marked private (they now start with
# an underscore)
# Other changes from the 2.x/3.1 nntplib:
# - automatic querying of capabilities at connect
# - New method NNTP.getcapabilities()
# - New method NNTP.over()
# - New helper function decode_header()
# - NNTP.post() and NNTP.ihave() accept file objects, bytes-like objects and
# arbitrary iterables yielding lines.
# - An extensive test suite :-)
# TODO:
# - return structured data (GroupInfo etc.) everywhere
# - support HDR
# Imports
import re
import socket
import collections
import datetime
import warnings
try:
import ssl
except ImportError:
_have_ssl = False
else:
_have_ssl = True
from email.header import decode_header as _email_decode_header
from socket import _GLOBAL_DEFAULT_TIMEOUT
__all__ = ["NNTP",
"NNTPError", "NNTPReplyError", "NNTPTemporaryError",
"NNTPPermanentError", "NNTPProtocolError", "NNTPDataError",
"decode_header",
]
# maximal line length when calling readline(). This is to prevent
# reading arbitrary length lines. RFC 3977 limits NNTP line length to
# 512 characters, including CRLF. We have selected 2048 just to be on
# the safe side.
_MAXLINE = 2048
# Exceptions raised when an error or invalid response is received
class NNTPError(Exception):
"""Base class for all nntplib exceptions"""
def __init__(self, *args):
Exception.__init__(self, *args)
try:
self.response = args[0]
except IndexError:
self.response = 'No response given'
class NNTPReplyError(NNTPError):
"""Unexpected [123]xx reply"""
pass
class NNTPTemporaryError(NNTPError):
"""4xx errors"""
pass
class NNTPPermanentError(NNTPError):
"""5xx errors"""
pass
class NNTPProtocolError(NNTPError):
"""Response does not begin with [1-5]"""
pass
class NNTPDataError(NNTPError):
"""Error in response data"""
pass
# Standard port used by NNTP servers
NNTP_PORT = 119
NNTP_SSL_PORT = 563
# Response numbers that are followed by additional text (e.g. article)
_LONGRESP = {
'100', # HELP
'101', # CAPABILITIES
'211', # LISTGROUP (also not multi-line with GROUP)
'215', # LIST
'220', # ARTICLE
'221', # HEAD, XHDR
'222', # BODY
'224', # OVER, XOVER
'225', # HDR
'230', # NEWNEWS
'231', # NEWGROUPS
'282', # XGTITLE
}
# Default decoded value for LIST OVERVIEW.FMT if not supported
_DEFAULT_OVERVIEW_FMT = [
"subject", "from", "date", "message-id", "references", ":bytes", ":lines"]
# Alternative names allowed in LIST OVERVIEW.FMT response
_OVERVIEW_FMT_ALTERNATIVES = {
'bytes': ':bytes',
'lines': ':lines',
}
# Line terminators (we always output CRLF, but accept any of CRLF, CR, LF)
_CRLF = b'\r\n'
GroupInfo = collections.namedtuple('GroupInfo',
['group', 'last', 'first', 'flag'])
ArticleInfo = collections.namedtuple('ArticleInfo',
['number', 'message_id', 'lines'])
# Helper function(s)
def decode_header(header_str):
"""Takes a unicode string representing a munged header value
and decodes it as a (possibly non-ASCII) readable value."""
parts = []
for v, enc in _email_decode_header(header_str):
if isinstance(v, bytes):
parts.append(v.decode(enc or 'ascii'))
else:
parts.append(v)
return ''.join(parts)
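# Hedged sketch (not part of the module): decoding an RFC 2047 encoded-word
# header value with the helper above.
def _decode_header_demo():
    assert decode_header('=?utf-8?q?caf=C3=A9?=') == 'café'
    assert decode_header('plain text') == 'plain text'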
def _parse_overview_fmt(lines):
"""Parse a list of string representing the response to LIST OVERVIEW.FMT
and return a list of header/metadata names.
Raises NNTPDataError if the response is not compliant
(cf. RFC 3977, section 8.4)."""
fmt = []
for line in lines:
if line[0] == ':':
# Metadata name (e.g. ":bytes")
name, _, suffix = line[1:].partition(':')
name = ':' + name
else:
# Header name (e.g. "Subject:" or "Xref:full")
name, _, suffix = line.partition(':')
name = name.lower()
name = _OVERVIEW_FMT_ALTERNATIVES.get(name, name)
# Should we do something with the suffix?
fmt.append(name)
defaults = _DEFAULT_OVERVIEW_FMT
if len(fmt) < len(defaults):
raise NNTPDataError("LIST OVERVIEW.FMT response too short")
if fmt[:len(defaults)] != defaults:
raise NNTPDataError("LIST OVERVIEW.FMT redefines default fields")
return fmt
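# Hedged sketch (not part of the module): a compliant LIST OVERVIEW.FMT
# response repeats the seven default fields, in order, before any extras.
def _overview_fmt_demo():
    lines = ['Subject:', 'From:', 'Date:', 'Message-ID:', 'References:',
             ':bytes', ':lines', 'Xref:full']
    assert _parse_overview_fmt(lines) == _DEFAULT_OVERVIEW_FMT + ['xref']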
def _parse_overview(lines, fmt, data_process_func=None):
"""Parse the response to an OVER or XOVER command according to the
overview format `fmt`."""
n_defaults = len(_DEFAULT_OVERVIEW_FMT)
overview = []
for line in lines:
fields = {}
article_number, *tokens = line.split('\t')
article_number = int(article_number)
for i, token in enumerate(tokens):
if i >= len(fmt):
# XXX should we raise an error? Some servers might not
# support LIST OVERVIEW.FMT and still return additional
# headers.
continue
field_name = fmt[i]
is_metadata = field_name.startswith(':')
if i >= n_defaults and not is_metadata:
# Non-default header names are included in full in the response
# (unless the field is totally empty)
h = field_name + ": "
if token and token[:len(h)].lower() != h:
raise NNTPDataError("OVER/XOVER response doesn't include "
"names of additional headers")
token = token[len(h):] if token else None
fields[fmt[i]] = token
overview.append((article_number, fields))
return overview
def _parse_datetime(date_str, time_str=None):
"""Parse a pair of (date, time) strings, and return a datetime object.
If only the date is given, it is assumed to be date and time
concatenated together (e.g. response to the DATE command).
"""
if time_str is None:
time_str = date_str[-6:]
date_str = date_str[:-6]
hours = int(time_str[:2])
minutes = int(time_str[2:4])
seconds = int(time_str[4:])
year = int(date_str[:-4])
month = int(date_str[-4:-2])
day = int(date_str[-2:])
# RFC 3977 doesn't say how to interpret 2-char years. Assume that
# there are no dates before 1970 on Usenet.
if year < 70:
year += 2000
elif year < 100:
year += 1900
return datetime.datetime(year, month, day, hours, minutes, seconds)
def _unparse_datetime(dt, legacy=False):
"""Format a date or datetime object as a pair of (date, time) strings
in the format required by the NEWNEWS and NEWGROUPS commands. If a
date object is passed, the time is assumed to be midnight (00h00).
The returned representation depends on the legacy flag:
* if legacy is False (the default):
date has the YYYYMMDD format and time the HHMMSS format
* if legacy is True:
date has the YYMMDD format and time the HHMMSS format.
RFC 3977 compliant servers should understand both formats; therefore,
legacy is only needed when talking to old servers.
"""
if not isinstance(dt, datetime.datetime):
time_str = "000000"
else:
time_str = "{0.hour:02d}{0.minute:02d}{0.second:02d}".format(dt)
y = dt.year
if legacy:
y = y % 100
date_str = "{0:02d}{1.month:02d}{1.day:02d}".format(y, dt)
else:
date_str = "{0:04d}{1.month:02d}{1.day:02d}".format(y, dt)
return date_str, time_str
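# Hedged sketch (not part of the module): round-tripping the two helpers
# above. The DATE-style timestamp is a made-up value.
def _datetime_helpers_demo():
    dt = _parse_datetime('20230718120000')  # YYYYMMDDHHMMSS, as from DATE
    assert dt == datetime.datetime(2023, 7, 18, 12, 0, 0)
    assert _unparse_datetime(dt) == ('20230718', '120000')
    assert _unparse_datetime(dt, legacy=True) == ('230718', '120000')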
if _have_ssl:
def _encrypt_on(sock, context, hostname):
"""Wrap a socket in SSL/TLS. Arguments:
- sock: Socket to wrap
- context: SSL context to use for the encrypted connection
Returns:
- sock: New, encrypted socket.
"""
# Generate a default SSL context if none was passed.
if context is None:
context = ssl._create_stdlib_context()
return context.wrap_socket(sock, server_hostname=hostname)
# The classes themselves
class _NNTPBase:
# UTF-8 is the character set for all NNTP commands and responses: they
# are automatically encoded (when sending) and decoded (when receiving)
# by this class.
# However, some multi-line data blocks can contain arbitrary bytes (for
# example, latin-1 or utf-16 data in the body of a message). Commands
# taking (POST, IHAVE) or returning (HEAD, BODY, ARTICLE) raw message
# data will therefore only accept and produce bytes objects.
# Furthermore, since there could be non-compliant servers out there,
# we use 'surrogateescape' as the error handler for fault tolerance
# and easy round-tripping. This could be useful for some applications
# (e.g. NNTP gateways).
encoding = 'utf-8'
errors = 'surrogateescape'
def __init__(self, file, host,
readermode=None, timeout=_GLOBAL_DEFAULT_TIMEOUT):
"""Initialize an instance. Arguments:
- file: file-like object (open for read/write in binary mode)
- host: hostname of the server
- readermode: if true, send 'mode reader' command after
connecting.
- timeout: timeout (in seconds) used for socket connections
readermode is sometimes necessary if you are connecting to an
NNTP server on the local machine and intend to call
reader-specific commands, such as `group'. If you get
unexpected NNTPPermanentErrors, you might need to set
readermode.
"""
self.host = host
self.file = file
self.debugging = 0
self.welcome = self._getresp()
# Inquire about capabilities (RFC 3977).
self._caps = None
self.getcapabilities()
# 'MODE READER' is sometimes necessary to enable 'reader' mode.
# However, the order in which 'MODE READER' and 'AUTHINFO' need to
# arrive differs between some NNTP servers. If _setreadermode() fails
# with an authorization failed error, it will set this to True;
# the login() routine will interpret that as a request to try again
# after performing its normal function.
# Enable only if we're not already in READER mode anyway.
self.readermode_afterauth = False
if readermode and 'READER' not in self._caps:
self._setreadermode()
if not self.readermode_afterauth:
# Capabilities might have changed after MODE READER
self._caps = None
self.getcapabilities()
# RFC 4642 2.2.2: Both the client and the server MUST know if there is
# a TLS session active. A client MUST NOT attempt to start a TLS
# session if a TLS session is already active.
self.tls_on = False
# Log in and encryption setup order is left to subclasses.
self.authenticated = False
def __enter__(self):
return self
def __exit__(self, *args):
is_connected = lambda: hasattr(self, "file")
if is_connected():
try:
self.quit()
except (OSError, EOFError):
pass
finally:
if is_connected():
self._close()
def getwelcome(self):
"""Get the welcome message from the server
(this is read and squirreled away by __init__()).
If the response code is 200, posting is allowed;
if it is 201, posting is not allowed."""
if self.debugging: print('*welcome*', repr(self.welcome))
return self.welcome
def getcapabilities(self):
"""Get the server capabilities, as read by __init__().
If the CAPABILITIES command is not supported, an empty dict is
returned."""
if self._caps is None:
self.nntp_version = 1
self.nntp_implementation = None
try:
resp, caps = self.capabilities()
except (NNTPPermanentError, NNTPTemporaryError):
# Server doesn't support capabilities
self._caps = {}
else:
self._caps = caps
if 'VERSION' in caps:
# The server can advertise several supported versions,
# choose the highest.
self.nntp_version = max(map(int, caps['VERSION']))
if 'IMPLEMENTATION' in caps:
self.nntp_implementation = ' '.join(caps['IMPLEMENTATION'])
return self._caps
def set_debuglevel(self, level):
"""Set the debugging level. Argument 'level' means:
0: no debugging output (default)
1: print commands and responses but not body text etc.
2: also print raw lines read and sent before stripping CR/LF"""
self.debugging = level
debug = set_debuglevel
def _putline(self, line):
"""Internal: send one line to the server, appending CRLF.
The `line` must be a bytes-like object."""
line = line + _CRLF
if self.debugging > 1: print('*put*', repr(line))
self.file.write(line)
self.file.flush()
def _putcmd(self, line):
"""Internal: send one command to the server (through _putline()).
The `line` must be a unicode string."""
if self.debugging: print('*cmd*', repr(line))
line = line.encode(self.encoding, self.errors)
self._putline(line)
def _getline(self, strip_crlf=True):
"""Internal: return one line from the server, stripping _CRLF.
Raise EOFError if the connection is closed.
Returns a bytes object."""
line = self.file.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise NNTPDataError('line too long')
if self.debugging > 1:
print('*get*', repr(line))
if not line: raise EOFError
if strip_crlf:
if line[-2:] == _CRLF:
line = line[:-2]
elif line[-1:] in _CRLF:
line = line[:-1]
return line
def _getresp(self):
"""Internal: get a response from the server.
Raise various errors if the response indicates an error.
Returns a unicode string."""
resp = self._getline()
if self.debugging: print('*resp*', repr(resp))
resp = resp.decode(self.encoding, self.errors)
c = resp[:1]
if c == '4':
raise NNTPTemporaryError(resp)
if c == '5':
raise NNTPPermanentError(resp)
if c not in '123':
raise NNTPProtocolError(resp)
return resp
def _getlongresp(self, file=None):
"""Internal: get a response plus following text from the server.
Raise various errors if the response indicates an error.
Returns a (response, lines) tuple where `response` is a unicode
string and `lines` is a list of bytes objects.
If `file` is a file-like object, it must be open in binary mode.
"""
openedFile = None
try:
# If a string was passed then open a file with that name
if isinstance(file, (str, bytes)):
openedFile = file = open(file, "wb")
resp = self._getresp()
if resp[:3] not in _LONGRESP:
raise NNTPReplyError(resp)
lines = []
if file is not None:
# XXX lines = None instead?
terminators = (b'.' + _CRLF, b'.\n')
while 1:
line = self._getline(False)
if line in terminators:
break
if line.startswith(b'..'):
line = line[1:]
file.write(line)
else:
terminator = b'.'
while 1:
line = self._getline()
if line == terminator:
break
if line.startswith(b'..'):
line = line[1:]
lines.append(line)
finally:
# If this method created the file, then it must close it
if openedFile:
openedFile.close()
return resp, lines
def _shortcmd(self, line):
"""Internal: send a command and get the response.
Same return value as _getresp()."""
self._putcmd(line)
return self._getresp()
def _longcmd(self, line, file=None):
"""Internal: send a command and get the response plus following text.
Same return value as _getlongresp()."""
self._putcmd(line)
return self._getlongresp(file)
def _longcmdstring(self, line, file=None):
"""Internal: send a command and get the response plus following text.
Same as _longcmd() and _getlongresp(), except that the returned `lines`
are unicode strings rather than bytes objects.
"""
self._putcmd(line)
resp, list = self._getlongresp(file)
return resp, [line.decode(self.encoding, self.errors)
for line in list]
def _getoverviewfmt(self):
"""Internal: get the overview format. Queries the server if not
already done, else returns the cached value."""
try:
return self._cachedoverviewfmt
except AttributeError:
pass
try:
resp, lines = self._longcmdstring("LIST OVERVIEW.FMT")
except NNTPPermanentError:
# Not supported by server?
fmt = _DEFAULT_OVERVIEW_FMT[:]
else:
fmt = _parse_overview_fmt(lines)
self._cachedoverviewfmt = fmt
return fmt
def _grouplist(self, lines):
# Parse lines into "group last first flag"
return [GroupInfo(*line.split()) for line in lines]
def capabilities(self):
"""Process a CAPABILITIES command. Not supported by all servers.
Return:
- resp: server response if successful
- caps: a dictionary mapping capability names to lists of tokens
(for example {'VERSION': ['2'], 'OVER': [], 'LIST': ['ACTIVE', 'HEADERS']})
"""
caps = {}
resp, lines = self._longcmdstring("CAPABILITIES")
for line in lines:
name, *tokens = line.split()
caps[name] = tokens
return resp, caps
def newgroups(self, date, *, file=None):
"""Process a NEWGROUPS command. Arguments:
- date: a date or datetime object
Return:
- resp: server response if successful
- list: list of newsgroup names
"""
if not isinstance(date, (datetime.date, datetime.datetime)):
raise TypeError(
"the date parameter must be a date or datetime object, "
"not '{:40}'".format(date.__class__.__name__))
date_str, time_str = _unparse_datetime(date, self.nntp_version < 2)
cmd = 'NEWGROUPS {0} {1}'.format(date_str, time_str)
resp, lines = self._longcmdstring(cmd, file)
return resp, self._grouplist(lines)
def newnews(self, group, date, *, file=None):
"""Process a NEWNEWS command. Arguments:
- group: group name or '*'
- date: a date or datetime object
Return:
- resp: server response if successful
- list: list of message ids
"""
if not isinstance(date, (datetime.date, datetime.datetime)):
raise TypeError(
"the date parameter must be a date or datetime object, "
"not '{:40}'".format(date.__class__.__name__))
date_str, time_str = _unparse_datetime(date, self.nntp_version < 2)
cmd = 'NEWNEWS {0} {1} {2}'.format(group, date_str, time_str)
return self._longcmdstring(cmd, file)
def list(self, group_pattern=None, *, file=None):
"""Process a LIST or LIST ACTIVE command. Arguments:
- group_pattern: a pattern indicating which groups to query
- file: Filename string or file object to store the result in
Returns:
- resp: server response if successful
- list: list of (group, last, first, flag) (strings)
"""
if group_pattern is not None:
command = 'LIST ACTIVE ' + group_pattern
else:
command = 'LIST'
resp, lines = self._longcmdstring(command, file)
return resp, self._grouplist(lines)
def _getdescriptions(self, group_pattern, return_all):
line_pat = re.compile('^(?P<group>[^ \t]+)[ \t]+(.*)$')
# Try the more standard (according to RFC 2980) LIST NEWSGROUPS first
resp, lines = self._longcmdstring('LIST NEWSGROUPS ' + group_pattern)
if not resp.startswith('215'):
# Now the deprecated XGTITLE. This either raises an error
# or succeeds with the same output structure as LIST
# NEWSGROUPS.
resp, lines = self._longcmdstring('XGTITLE ' + group_pattern)
groups = {}
for raw_line in lines:
match = line_pat.search(raw_line.strip())
if match:
name, desc = match.group(1, 2)
if not return_all:
return desc
groups[name] = desc
if return_all:
return resp, groups
else:
# Nothing found
return ''
def description(self, group):
"""Get a description for a single group. If more than one
group matches ('group' is a pattern), return the first. If no
group matches, return an empty string.
This elides the response code from the server, since it can
only be '215' or '285' (for xgtitle) anyway. If the response
code is needed, use the 'descriptions' method.
NOTE: This neither checks for a wildcard in 'group' nor does
it check whether the group actually exists."""
return self._getdescriptions(group, False)
def descriptions(self, group_pattern):
"""Get descriptions for a range of groups."""
return self._getdescriptions(group_pattern, True)
def group(self, name):
"""Process a GROUP command. Argument:
- group: the group name
Returns:
- resp: server response if successful
- count: number of articles
- first: first article number
- last: last article number
- name: the group name
"""
resp = self._shortcmd('GROUP ' + name)
if not resp.startswith('211'):
raise NNTPReplyError(resp)
words = resp.split()
count = first = last = 0
n = len(words)
if n > 1:
count = words[1]
if n > 2:
first = words[2]
if n > 3:
last = words[3]
if n > 4:
name = words[4].lower()
return resp, int(count), int(first), int(last), name
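# Example of the parsing above: an (illustrative) server reply of
#     '211 1234 3000 4233 misc.test'
# is returned as (resp, 1234, 3000, 4233, 'misc.test').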
def help(self, *, file=None):
"""Process a HELP command. Argument:
- file: Filename string or file object to store the result in
Returns:
- resp: server response if successful
- list: list of strings returned by the server in response to the
HELP command
"""
return self._longcmdstring('HELP', file)
def _statparse(self, resp):
"""Internal: parse the response line of a STAT, NEXT, LAST,
ARTICLE, HEAD or BODY command."""
if not resp.startswith('22'):
raise NNTPReplyError(resp)
words = resp.split()
art_num = int(words[1])
message_id = words[2]
return resp, art_num, message_id
def _statcmd(self, line):
"""Internal: process a STAT, NEXT or LAST command."""
resp = self._shortcmd(line)
return self._statparse(resp)
def stat(self, message_spec=None):
"""Process a STAT command. Argument:
- message_spec: article number or message id (if not specified,
the current article is selected)
Returns:
- resp: server response if successful
- art_num: the article number
- message_id: the message id
"""
if message_spec:
return self._statcmd('STAT {0}'.format(message_spec))
else:
return self._statcmd('STAT')
def next(self):
"""Process a NEXT command. No arguments. Return as for STAT."""
return self._statcmd('NEXT')
def last(self):
"""Process a LAST command. No arguments. Return as for STAT."""
return self._statcmd('LAST')
def _artcmd(self, line, file=None):
"""Internal: process a HEAD, BODY or ARTICLE command."""
resp, lines = self._longcmd(line, file)
resp, art_num, message_id = self._statparse(resp)
return resp, ArticleInfo(art_num, message_id, lines)
def head(self, message_spec=None, *, file=None):
"""Process a HEAD command. Argument:
- message_spec: article number or message id
- file: filename string or file object to store the headers in
Returns:
- resp: server response if successful
- ArticleInfo: (article number, message id, list of header lines)
"""
if message_spec is not None:
cmd = 'HEAD {0}'.format(message_spec)
else:
cmd = 'HEAD'
return self._artcmd(cmd, file)
def body(self, message_spec=None, *, file=None):
"""Process a BODY command. Argument:
- message_spec: article number or message id
- file: filename string or file object to store the body in
Returns:
- resp: server response if successful
- ArticleInfo: (article number, message id, list of body lines)
"""
if message_spec is not None:
cmd = 'BODY {0}'.format(message_spec)
else:
cmd = 'BODY'
return self._artcmd(cmd, file)
def article(self, message_spec=None, *, file=None):
"""Process an ARTICLE command. Argument:
- message_spec: article number or message id
- file: filename string or file object to store the article in
Returns:
- resp: server response if successful
- ArticleInfo: (article number, message id, list of article lines)
"""
if message_spec is not None:
cmd = 'ARTICLE {0}'.format(message_spec)
else:
cmd = 'ARTICLE'
return self._artcmd(cmd, file)
def slave(self):
"""Process a SLAVE command. Returns:
- resp: server response if successful
"""
return self._shortcmd('SLAVE')
def xhdr(self, hdr, str, *, file=None):
"""Process an XHDR command (optional server extension). Arguments:
- hdr: the header type (e.g. 'subject')
- str: an article nr, a message id, or a range nr1-nr2
- file: Filename string or file object to store the result in
Returns:
- resp: server response if successful
- list: list of (nr, value) strings
"""
pat = re.compile('^([0-9]+) ?(.*)\n?')
resp, lines = self._longcmdstring('XHDR {0} {1}'.format(hdr, str), file)
def remove_number(line):
m = pat.match(line)
return m.group(1, 2) if m else line
return resp, [remove_number(line) for line in lines]
def xover(self, start, end, *, file=None):
"""Process an XOVER command (optional server extension) Arguments:
- start: start of range
- end: end of range
- file: Filename string or file object to store the result in
Returns:
- resp: server response if successful
- list: list of dicts containing the response fields
"""
resp, lines = self._longcmdstring('XOVER {0}-{1}'.format(start, end),
file)
fmt = self._getoverviewfmt()
return resp, _parse_overview(lines, fmt)
def over(self, message_spec, *, file=None):
"""Process an OVER command. If the command isn't supported, fall
back to XOVER. Arguments:
- message_spec:
- either a message id, indicating the article to fetch
information about
- or a (start, end) tuple, indicating a range of article numbers;
if end is None, information up to the newest message will be
retrieved
- or None, indicating the current article number must be used
- file: Filename string or file object to store the result in
Returns:
- resp: server response if successful
- list: list of dicts containing the response fields
NOTE: the "message id" form isn't supported by XOVER
"""
cmd = 'OVER' if 'OVER' in self._caps else 'XOVER'
if isinstance(message_spec, (tuple, list)):
start, end = message_spec
cmd += ' {0}-{1}'.format(start, end or '')
elif message_spec is not None:
cmd = cmd + ' ' + message_spec
resp, lines = self._longcmdstring(cmd, file)
fmt = self._getoverviewfmt()
return resp, _parse_overview(lines, fmt)
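# A short usage sketch of the three message_spec forms accepted above
# (hypothetical article numbers; `s` is a connected instance):
#     resp, overviews = s.over((3000, 3010))  # explicit range
#     resp, overviews = s.over((3000, None))  # from 3000 to the newest
#     resp, overviews = s.over(None)          # current article only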
def xgtitle(self, group, *, file=None):
"""Process an XGTITLE command (optional server extension) Arguments:
- group: group name wildcard (i.e. news.*)
Returns:
- resp: server response if successful
- list: list of (name,title) strings"""
warnings.warn("The XGTITLE extension is not actively used, "
"use descriptions() instead",
DeprecationWarning, 2)
line_pat = re.compile('^([^ \t]+)[ \t]+(.*)$')
resp, raw_lines = self._longcmdstring('XGTITLE ' + group, file)
lines = []
for raw_line in raw_lines:
match = line_pat.search(raw_line.strip())
if match:
lines.append(match.group(1, 2))
return resp, lines
def xpath(self, id):
"""Process an XPATH command (optional server extension) Arguments:
- id: Message id of article
Returns:
resp: server response if successful
path: directory path to article
"""
warnings.warn("The XPATH extension is not actively used",
DeprecationWarning, 2)
resp = self._shortcmd('XPATH {0}'.format(id))
if not resp.startswith('223'):
raise NNTPReplyError(resp)
try:
[resp_num, path] = resp.split()
except ValueError:
raise NNTPReplyError(resp)
else:
return resp, path
def date(self):
"""Process the DATE command.
Returns:
- resp: server response if successful
- date: datetime object
"""
resp = self._shortcmd("DATE")
if not resp.startswith('111'):
raise NNTPReplyError(resp)
elem = resp.split()
if len(elem) != 2:
raise NNTPDataError(resp)
date = elem[1]
if len(date) != 14:
raise NNTPDataError(resp)
return resp, _parse_datetime(date, None)
def _post(self, command, f):
resp = self._shortcmd(command)
# Raises a specific exception if posting is not allowed
if not resp.startswith('3'):
raise NNTPReplyError(resp)
if isinstance(f, (bytes, bytearray)):
f = f.splitlines()
# We don't use _putline() because:
# - we don't want additional CRLF if the file or iterable is already
# in the right format
# - we don't want a spurious flush() after each line is written
for line in f:
if not line.endswith(_CRLF):
line = line.rstrip(b"\r\n") + _CRLF
if line.startswith(b'.'):
line = b'.' + line
self.file.write(line)
self.file.write(b".\r\n")
self.file.flush()
return self._getresp()
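# Dot-stuffing example for the loop above: a line whose first byte is
# '.' is escaped on the wire, and a lone '.' terminates the article:
#     b'.hidden\r\n' is sent as b'..hidden\r\n'
#     b'.\r\n'       is sent last, marking the end of the article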
def post(self, data):
"""Process a POST command. Arguments:
- data: bytes object, iterable or file containing the article
Returns:
- resp: server response if successful"""
return self._post('POST', data)
def ihave(self, message_id, data):
"""Process an IHAVE command. Arguments:
- message_id: message-id of the article
- data: file containing the article
Returns:
- resp: server response if successful
Note that if the server refuses the article an exception is raised."""
return self._post('IHAVE {0}'.format(message_id), data)
def _close(self):
self.file.close()
del self.file
def quit(self):
"""Process a QUIT command and close the socket. Returns:
- resp: server response if successful"""
try:
resp = self._shortcmd('QUIT')
finally:
self._close()
return resp
def login(self, user=None, password=None, usenetrc=True):
if self.authenticated:
raise ValueError("Already logged in.")
if not user and not usenetrc:
raise ValueError(
"At least one of `user` and `usenetrc` must be specified")
# If no login/password was specified but netrc was requested,
# try to get them from ~/.netrc
# Presume that if .netrc has an entry, NNRP authentication is required.
try:
if usenetrc and not user:
import netrc
credentials = netrc.netrc()
auth = credentials.authenticators(self.host)
if auth:
user = auth[0]
password = auth[2]
except OSError:
pass
# Perform NNTP authentication if needed.
if not user:
return
resp = self._shortcmd('authinfo user ' + user)
if resp.startswith('381'):
if not password:
raise NNTPReplyError(resp)
else:
resp = self._shortcmd('authinfo pass ' + password)
if not resp.startswith('281'):
raise NNTPPermanentError(resp)
# Capabilities might have changed after login
self._caps = None
self.getcapabilities()
# Attempt to send mode reader if it was requested after login.
# Only do so if we're not in reader mode already.
if self.readermode_afterauth and 'READER' not in self._caps:
self._setreadermode()
# Capabilities might have changed after MODE READER
self._caps = None
self.getcapabilities()
def _setreadermode(self):
try:
self.welcome = self._shortcmd('mode reader')
except NNTPPermanentError:
# Error 5xx, probably 'not implemented'
pass
except NNTPTemporaryError as e:
if e.response.startswith('480'):
# Need authorization before 'mode reader'
self.readermode_afterauth = True
else:
raise
if _have_ssl:
def starttls(self, context=None):
"""Process a STARTTLS command. Arguments:
- context: SSL context to use for the encrypted connection
"""
# Per RFC 4642, STARTTLS MUST NOT be sent after authentication or if
# a TLS session already exists.
if self.tls_on:
raise ValueError("TLS is already enabled.")
if self.authenticated:
raise ValueError("TLS cannot be started after authentication.")
resp = self._shortcmd('STARTTLS')
if resp.startswith('382'):
self.file.close()
self.sock = _encrypt_on(self.sock, context, self.host)
self.file = self.sock.makefile("rwb")
self.tls_on = True
# Capabilities may change after TLS starts up, so ask for them
# again.
self._caps = None
self.getcapabilities()
else:
raise NNTPError("TLS failed to start.")
class NNTP(_NNTPBase):
def __init__(self, host, port=NNTP_PORT, user=None, password=None,
readermode=None, usenetrc=False,
timeout=_GLOBAL_DEFAULT_TIMEOUT):
"""Initialize an instance. Arguments:
- host: hostname to connect to
- port: port to connect to (default the standard NNTP port)
- user: username to authenticate with
- password: password to use with username
- readermode: if true, send 'mode reader' command after
connecting.
- usenetrc: allow loading username and password from ~/.netrc file
if not specified explicitly
- timeout: timeout (in seconds) used for socket connections
readermode is sometimes necessary if you are connecting to an
NNTP server on the local machine and intend to call
reader-specific commands, such as `group'. If you get
unexpected NNTPPermanentErrors, you might need to set
readermode.
"""
self.host = host
self.port = port
self.sock = socket.create_connection((host, port), timeout)
file = None
try:
file = self.sock.makefile("rwb")
_NNTPBase.__init__(self, file, host,
readermode, timeout)
if user or usenetrc:
self.login(user, password, usenetrc)
except:
if file:
file.close()
self.sock.close()
raise
def _close(self):
try:
_NNTPBase._close(self)
finally:
self.sock.close()
if _have_ssl:
class NNTP_SSL(_NNTPBase):
def __init__(self, host, port=NNTP_SSL_PORT,
user=None, password=None, ssl_context=None,
readermode=None, usenetrc=False,
timeout=_GLOBAL_DEFAULT_TIMEOUT):
"""This works identically to NNTP.__init__, except for the change
in default port and the `ssl_context` argument for SSL connections.
"""
self.sock = socket.create_connection((host, port), timeout)
file = None
try:
self.sock = _encrypt_on(self.sock, ssl_context, host)
file = self.sock.makefile("rwb")
_NNTPBase.__init__(self, file, host,
readermode=readermode, timeout=timeout)
if user or usenetrc:
self.login(user, password, usenetrc)
except:
if file:
file.close()
self.sock.close()
raise
def _close(self):
try:
_NNTPBase._close(self)
finally:
self.sock.close()
__all__.append("NNTP_SSL")
# Test retrieval when run as a script.
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description="""\
nntplib built-in demo - display the latest articles in a newsgroup""")
parser.add_argument('-g', '--group', default='gmane.comp.python.general',
help='group to fetch messages from (default: %(default)s)')
parser.add_argument('-s', '--server', default='news.gmane.org',
help='NNTP server hostname (default: %(default)s)')
parser.add_argument('-p', '--port', default=-1, type=int,
help='NNTP port number (default: %s / %s)' % (NNTP_PORT, NNTP_SSL_PORT))
parser.add_argument('-n', '--nb-articles', default=10, type=int,
help='number of articles to fetch (default: %(default)s)')
parser.add_argument('-S', '--ssl', action='store_true', default=False,
help='use NNTP over SSL')
args = parser.parse_args()
port = args.port
if not args.ssl:
if port == -1:
port = NNTP_PORT
s = NNTP(host=args.server, port=port)
else:
if port == -1:
port = NNTP_SSL_PORT
s = NNTP_SSL(host=args.server, port=port)
caps = s.getcapabilities()
if 'STARTTLS' in caps:
s.starttls()
resp, count, first, last, name = s.group(args.group)
print('Group', name, 'has', count, 'articles, range', first, 'to', last)
def cut(s, lim):
if len(s) > lim:
s = s[:lim - 4] + "..."
return s
first = str(int(last) - args.nb_articles + 1)
resp, overviews = s.xover(first, last)
for artnum, over in overviews:
author = decode_header(over['from']).split('<', 1)[0]
subject = decode_header(over['subject'])
lines = int(over[':lines'])
print("{:7} {:20} {:42} ({})".format(
artnum, cut(author, 20), cut(subject, 42), lines)
)
s.quit()
# Module 'ntpath' -- common operations on WinNT/Win95 pathnames
"""Common pathname manipulations, WindowsNT/95 version.
Instead of importing this module directly, import os and refer to this
module as os.path.
"""
import os
import sys
import stat
import genericpath
from genericpath import *
__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
"basename","dirname","commonprefix","getsize","getmtime",
"getatime","getctime", "islink","exists","lexists","isdir","isfile",
"ismount", "expanduser","expandvars","normpath","abspath",
"splitunc","curdir","pardir","sep","pathsep","defpath","altsep",
"extsep","devnull","realpath","supports_unicode_filenames","relpath",
"samefile", "sameopenfile", "samestat",]
# strings representing various path-related bits and pieces
# These are primarily for export; internally, they are hardcoded.
curdir = '.'
pardir = '..'
extsep = '.'
sep = '\\'
pathsep = ';'
altsep = '/'
defpath = '.;C:\\bin'
if 'ce' in sys.builtin_module_names:
defpath = '\\Windows'
devnull = 'nul'
def _get_empty(path):
if isinstance(path, bytes):
return b''
else:
return ''
def _get_sep(path):
if isinstance(path, bytes):
return b'\\'
else:
return '\\'
def _get_altsep(path):
if isinstance(path, bytes):
return b'/'
else:
return '/'
def _get_bothseps(path):
if isinstance(path, bytes):
return b'\\/'
else:
return '\\/'
def _get_dot(path):
if isinstance(path, bytes):
return b'.'
else:
return '.'
def _get_colon(path):
if isinstance(path, bytes):
return b':'
else:
return ':'
def _get_special(path):
if isinstance(path, bytes):
return (b'\\\\.\\', b'\\\\?\\')
else:
return ('\\\\.\\', '\\\\?\\')
# Normalize the case of a pathname and map slashes to backslashes.
# Other normalizations (such as optimizing '../' away) are not done
# (this is done by normpath).
def normcase(s):
"""Normalize case of pathname.
Makes all characters lowercase and all slashes into backslashes."""
if not isinstance(s, (bytes, str)):
raise TypeError("normcase() argument must be str or bytes, "
"not '{}'".format(s.__class__.__name__))
return s.replace(_get_altsep(s), _get_sep(s)).lower()
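# Example (hand-checked against the code above):
#     normcase('C:/Program Files\\X') -> 'c:\\program files\\x'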
# Return whether a path is absolute.
# Trivial in Posix, harder on Windows.
# For Windows it is absolute if it starts with a slash or backslash (current
# volume), or if a pathname after the volume-letter-and-colon or UNC-resource
# starts with a slash or backslash.
def isabs(s):
"""Test whether a path is absolute"""
s = splitdrive(s)[1]
return len(s) > 0 and s[:1] in _get_bothseps(s)
# Join two (or more) paths.
def join(path, *paths):
sep = _get_sep(path)
seps = _get_bothseps(path)
colon = _get_colon(path)
result_drive, result_path = splitdrive(path)
for p in paths:
p_drive, p_path = splitdrive(p)
if p_path and p_path[0] in seps:
# Second path is absolute
if p_drive or not result_drive:
result_drive = p_drive
result_path = p_path
continue
elif p_drive and p_drive != result_drive:
if p_drive.lower() != result_drive.lower():
# Different drives => ignore the first path entirely
result_drive = p_drive
result_path = p_path
continue
# Same drive in different case
result_drive = p_drive
# Second path is relative to the first
if result_path and result_path[-1] not in seps:
result_path = result_path + sep
result_path = result_path + p_path
## add separator between UNC and non-absolute path
if (result_path and result_path[0] not in seps and
result_drive and result_drive[-1:] != colon):
return result_drive + sep + result_path
return result_drive + result_path
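# A few worked examples of the drive handling above (hand-traced against
# the code, not against any particular Windows release):
#     join('C:\\one', 'two', 'three') -> 'C:\\one\\two\\three'
#     join('C:\\one', 'D:\\two')      -> 'D:\\two'   (different drive wins)
#     join('C:\\one', '\\two')        -> 'C:\\two'   (absolute, same drive)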
# Split a path in a drive specification (a drive letter followed by a
# colon) and the path specification.
# It is always true that drivespec + pathspec == p
def splitdrive(p):
"""Split a pathname into drive/UNC sharepoint and relative path specifiers.
Returns a 2-tuple (drive_or_unc, path); either part may be empty.
If you assign
result = splitdrive(p)
It is always true that:
result[0] + result[1] == p
If the path contained a drive letter, drive_or_unc will contain everything
up to and including the colon. e.g. splitdrive("c:/dir") returns ("c:", "/dir")
If the path contained a UNC path, the drive_or_unc will contain the host name
and share up to but not including the fourth directory separator character.
e.g. splitdrive("//host/computer/dir") returns ("//host/computer", "/dir")
Paths cannot contain both a drive letter and a UNC path.
"""
empty = _get_empty(p)
if len(p) > 1:
sep = _get_sep(p)
normp = p.replace(_get_altsep(p), sep)
if (normp[0:2] == sep*2) and (normp[2:3] != sep):
# is a UNC path:
# vvvvvvvvvvvvvvvvvvvv drive letter or UNC path
# \\machine\mountpoint\directory\etc\...
# directory ^^^^^^^^^^^^^^^
index = normp.find(sep, 2)
if index == -1:
return empty, p
index2 = normp.find(sep, index + 1)
# a UNC path can't have two slashes in a row
# (after the initial two)
if index2 == index + 1:
return empty, p
if index2 == -1:
index2 = len(p)
return p[:index2], p[index2:]
if normp[1:2] == _get_colon(p):
return p[:2], p[2:]
return empty, p
# Parse UNC paths
def splitunc(p):
"""Deprecated since Python 3.1. Please use splitdrive() instead;
it now handles UNC paths.
Split a pathname into UNC mount point and relative path specifiers.
Return a 2-tuple (unc, rest); either part may be empty.
If unc is not empty, it has the form '//host/mount' (or similar
using backslashes). unc+rest is always the input path.
Paths containing drive letters never have an UNC part.
"""
import warnings
warnings.warn("ntpath.splitunc is deprecated, use ntpath.splitdrive instead",
DeprecationWarning, 2)
drive, path = splitdrive(p)
if len(drive) == 2:
# Drive letter present
return p[:0], p
return drive, path
# Split a path in head (everything up to the last '/') and tail (the
# rest). After the trailing '/' is stripped, the invariant
# join(head, tail) == p holds.
# The resulting head won't end in '/' unless it is the root.
def split(p):
"""Split a pathname.
Return tuple (head, tail) where tail is everything after the final slash.
Either part may be empty."""
seps = _get_bothseps(p)
d, p = splitdrive(p)
# set i to index beyond p's last slash
i = len(p)
while i and p[i-1] not in seps:
i -= 1
head, tail = p[:i], p[i:] # now tail has no slashes
# remove trailing slashes from head, unless it's all slashes
head2 = head
while head2 and head2[-1:] in seps:
head2 = head2[:-1]
head = head2 or head
return d + head, tail
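# Examples (hand-traced against the code above):
#     split('c:\\foo\\bar') -> ('c:\\foo', 'bar')
#     split('c:\\foo\\')    -> ('c:\\foo', '')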
# Split a path in root and extension.
# The extension is everything starting at the last dot in the last
# pathname component; the root is everything before that.
# It is always true that root + ext == p.
def splitext(p):
return genericpath._splitext(p, _get_sep(p), _get_altsep(p),
_get_dot(p))
splitext.__doc__ = genericpath._splitext.__doc__
# Return the tail (basename) part of a path.
def basename(p):
"""Returns the final component of a pathname"""
return split(p)[1]
# Return the head (dirname) part of a path.
def dirname(p):
"""Returns the directory component of a pathname"""
return split(p)[0]
# Is a path a symbolic link?
# This will always return false on systems where os.lstat doesn't exist.
def islink(path):
"""Test whether a path is a symbolic link.
This will always return false for Windows prior to 6.0.
"""
try:
st = os.lstat(path)
except (OSError, AttributeError):
return False
return stat.S_ISLNK(st.st_mode)
# Being true for dangling symbolic links is also useful.
def lexists(path):
"""Test whether a path exists. Returns True for broken symbolic links"""
try:
st = os.lstat(path)
except OSError:
return False
return True
# Is a path a mount point?
# Any drive letter root (eg c:\)
# Any share UNC (eg \\server\share)
# Any volume mounted on a filesystem folder
#
# No one method detects all three situations. Historically we've lexically
# detected drive letter roots and share UNCs. The canonical approach to
# detecting mounted volumes (querying the reparse tag) fails for the most
# common case: drive letter roots. The alternative which uses GetVolumePathName
# fails if the drive letter is the result of a SUBST.
try:
from nt import _getvolumepathname
except ImportError:
_getvolumepathname = None
def ismount(path):
"""Test whether a path is a mount point (a drive root, the root of a
share, or a mounted volume)"""
seps = _get_bothseps(path)
path = abspath(path)
root, rest = splitdrive(path)
if root and root[0] in seps:
return (not rest) or (rest in seps)
if rest in seps:
return True
if _getvolumepathname:
return path.rstrip(seps) == _getvolumepathname(path).rstrip(seps)
else:
return False
# Expand paths beginning with '~' or '~user'.
# '~' means $HOME; '~user' means that user's home directory.
# If the path doesn't begin with '~', or if the user or $HOME is unknown,
# the path is returned unchanged (leaving error reporting to whatever
# function is called with the expanded path as argument).
# See also module 'glob' for expansion of *, ? and [...] in pathnames.
# (A function should also be defined to do full *sh-style environment
# variable expansion.)
def expanduser(path):
"""Expand ~ and ~user constructs.
If user or $HOME is unknown, do nothing."""
if isinstance(path, bytes):
tilde = b'~'
else:
tilde = '~'
if not path.startswith(tilde):
return path
i, n = 1, len(path)
while i < n and path[i] not in _get_bothseps(path):
i += 1
if 'HOME' in os.environ:
userhome = os.environ['HOME']
elif 'USERPROFILE' in os.environ:
userhome = os.environ['USERPROFILE']
elif 'HOMEPATH' not in os.environ:
return path
else:
try:
drive = os.environ['HOMEDRIVE']
except KeyError:
drive = ''
userhome = join(drive, os.environ['HOMEPATH'])
if isinstance(path, bytes):
userhome = userhome.encode(sys.getfilesystemencoding())
if i != 1: #~user
userhome = join(dirname(userhome), path[1:i])
return userhome + path[i:]
# Expand paths containing shell variable substitutions.
# The following rules apply:
# - no expansion within single quotes
# - '$$' is translated into '$'
# - '%%' is translated into '%', unless the two percent signs close and
#   open adjacent references, as in %var1%%var2%
# - ${varname} is accepted.
# - $varname is accepted.
# - %varname% is accepted.
# - varnames can be made out of letters, digits and the characters '_-'
# (though this is not verified in the ${varname} and %varname% cases)
# XXX With COMMAND.COM you can use any characters in a variable name,
# XXX except '^|<>='.
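# Worked examples of the rules above, assuming HOME is set to '/u' and
# the variable 'u' is undefined (purely illustrative values):
#     expandvars('$HOME/x')   -> '/u/x'
#     expandvars('${HOME}x')  -> '/ux'
#     expandvars('%HOME%x')   -> '/ux'
#     expandvars('$u')        -> '$u'       (unknown: left unchanged)
#     expandvars("'$HOME'")   -> "'$HOME'"  (no expansion in single quotes)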
def expandvars(path):
"""Expand shell variables of the forms $var, ${var} and %var%.
Unknown variables are left unchanged."""
if isinstance(path, bytes):
if ord('$') not in path and ord('%') not in path:
return path
import string
varchars = bytes(string.ascii_letters + string.digits + '_-', 'ascii')
quote = b'\''
percent = b'%'
brace = b'{'
dollar = b'$'
environ = getattr(os, 'environb', None)
else:
if '$' not in path and '%' not in path:
return path
import string
varchars = string.ascii_letters + string.digits + '_-'
quote = '\''
percent = '%'
brace = '{'
dollar = '$'
environ = os.environ
res = path[:0]
index = 0
pathlen = len(path)
while index < pathlen:
c = path[index:index+1]
if c == quote: # no expansion within single quotes
path = path[index + 1:]
pathlen = len(path)
try:
index = path.index(c)
res += c + path[:index + 1]
except ValueError:
res += c + path
index = pathlen - 1
elif c == percent: # variable or '%'
if path[index + 1:index + 2] == percent:
res += c
index += 1
else:
path = path[index+1:]
pathlen = len(path)
try:
index = path.index(percent)
except ValueError:
res += percent + path
index = pathlen - 1
else:
var = path[:index]
try:
if environ is None:
value = os.fsencode(os.environ[os.fsdecode(var)])
else:
value = environ[var]
except KeyError:
value = percent + var + percent
res += value
elif c == dollar: # variable or '$$'
if path[index + 1:index + 2] == dollar:
res += c
index += 1
elif path[index + 1:index + 2] == brace:
path = path[index+2:]
pathlen = len(path)
try:
if isinstance(path, bytes):
index = path.index(b'}')
else:
index = path.index('}')
except ValueError:
if isinstance(path, bytes):
res += b'${' + path
else:
res += '${' + path
index = pathlen - 1
else:
var = path[:index]
try:
if environ is None:
value = os.fsencode(os.environ[os.fsdecode(var)])
else:
value = environ[var]
except KeyError:
if isinstance(path, bytes):
value = b'${' + var + b'}'
else:
value = '${' + var + '}'
res += value
else:
var = path[:0]
index += 1
c = path[index:index + 1]
while c and c in varchars:
var += c
index += 1
c = path[index:index + 1]
try:
if environ is None:
value = os.fsencode(os.environ[os.fsdecode(var)])
else:
value = environ[var]
except KeyError:
value = dollar + var
res += value
if c:
index -= 1
else:
res += c
index += 1
return res
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A\B.
# Previously, this function also truncated pathnames to 8+3 format,
# but as this module is called "ntpath", that's obviously wrong!
def normpath(path):
"""Normalize path, eliminating double slashes, etc."""
sep = _get_sep(path)
dotdot = _get_dot(path) * 2
special_prefixes = _get_special(path)
if path.startswith(special_prefixes):
# in the case of paths with these prefixes:
# \\.\ -> device names
# \\?\ -> literal paths
# do not do any normalization, but return the path unchanged
return path
path = path.replace(_get_altsep(path), sep)
prefix, path = splitdrive(path)
# collapse initial backslashes
if path.startswith(sep):
prefix += sep
path = path.lstrip(sep)
comps = path.split(sep)
i = 0
while i < len(comps):
if not comps[i] or comps[i] == _get_dot(path):
del comps[i]
elif comps[i] == dotdot:
if i > 0 and comps[i-1] != dotdot:
del comps[i-1:i+1]
i -= 1
elif i == 0 and prefix.endswith(_get_sep(path)):
del comps[i]
else:
i += 1
else:
i += 1
# If the path is now empty, substitute '.'
if not prefix and not comps:
comps.append(_get_dot(path))
return prefix + sep.join(comps)
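# Examples (hand-traced against the code above):
#     normpath('A//B/./C/../D')  -> 'A\\B\\D'
#     normpath('\\\\?\\C:/x/..') -> '\\\\?\\C:/x/..'  (special prefix: unchanged)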
# Return an absolute path.
try:
from nt import _getfullpathname
except ImportError: # not running on Windows - mock up something sensible
def abspath(path):
"""Return the absolute version of a path."""
if not isabs(path):
if isinstance(path, bytes):
cwd = os.getcwdb()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)
else: # use native Windows method on Windows
def abspath(path):
"""Return the absolute version of a path."""
if path: # Empty path must return current working directory.
try:
path = _getfullpathname(path)
except OSError:
pass # Bad path - return unchanged.
elif isinstance(path, bytes):
path = os.getcwdb()
else:
path = os.getcwd()
return normpath(path)
# realpath is a no-op on systems without islink support
realpath = abspath
# Win9x family and earlier have no Unicode filename support.
supports_unicode_filenames = (hasattr(sys, "getwindowsversion") and
sys.getwindowsversion()[3] >= 2)
def relpath(path, start=curdir):
"""Return a relative version of a path"""
sep = _get_sep(path)
if start is curdir:
start = _get_dot(path)
if not path:
raise ValueError("no path specified")
start_abs = abspath(normpath(start))
path_abs = abspath(normpath(path))
start_drive, start_rest = splitdrive(start_abs)
path_drive, path_rest = splitdrive(path_abs)
if normcase(start_drive) != normcase(path_drive):
error = "path is on mount '{0}', start on mount '{1}'".format(
path_drive, start_drive)
raise ValueError(error)
start_list = [x for x in start_rest.split(sep) if x]
path_list = [x for x in path_rest.split(sep) if x]
# Work out how much of the filepath is shared by start and path.
i = 0
for e1, e2 in zip(start_list, path_list):
if normcase(e1) != normcase(e2):
break
i += 1
if isinstance(path, bytes):
pardir = b'..'
else:
pardir = '..'
rel_list = [pardir] * (len(start_list)-i) + path_list[i:]
if not rel_list:
return _get_dot(path)
return join(*rel_list)
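# Example (hypothetical paths, both already absolute so abspath() leaves
# them unchanged):
#     relpath('C:\\a\\b\\c', 'C:\\a\\x') -> '..\\b\\c'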
# determine if two files are in fact the same file
try:
# GetFinalPathNameByHandle is available starting with Windows 6.0.
# Windows XP and non-Windows OSes will mock _getfinalpathname.
if sys.getwindowsversion()[:2] >= (6, 0):
from nt import _getfinalpathname
else:
raise ImportError
except (AttributeError, ImportError):
# On Windows XP and earlier, two files are the same if their absolute
# pathnames are the same.
# Non-Windows operating systems fake this method with an XP
# approximation.
def _getfinalpathname(f):
return normcase(abspath(f))
try:
# The genericpath.isdir implementation uses os.stat and checks the mode
# attribute to tell whether or not the path is a directory.
# This is overkill on Windows - just pass the path to GetFileAttributes
# and check the attribute from there.
from nt import _isdir as isdir
except ImportError:
# Use genericpath.isdir as imported above.
pass
"""Convert a NT pathname to a file URL and vice versa."""
def url2pathname(url):
"""OS-specific conversion from a relative URL of the 'file' scheme
to a file system path; not recommended for general use."""
# e.g.
# ///C|/foo/bar/spam.foo
# and
# ///C:/foo/bar/spam.foo
# become
# C:\foo\bar\spam.foo
import string, urllib.parse
# Windows itself uses ":" even in URLs.
url = url.replace(':', '|')
if '|' not in url:
# No drive specifier, just convert slashes
if url[:4] == '////':
# path is something like ////host/path/on/remote/host
# convert this to \\host\path\on\remote\host
# (notice halving of slashes at the start of the path)
url = url[2:]
components = url.split('/')
# make sure not to convert quoted slashes :-)
return urllib.parse.unquote('\\'.join(components))
comp = url.split('|')
if len(comp) != 2 or comp[0][-1] not in string.ascii_letters:
error = 'Bad URL: ' + url
raise OSError(error)
drive = comp[0][-1].upper()
components = comp[1].split('/')
path = drive + ':'
for comp in components:
if comp:
path = path + '\\' + urllib.parse.unquote(comp)
# Issue #11474 - handling url such as |c/|
if path.endswith(':') and url.endswith('/'):
path += '\\'
return path
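# Example conversions (hand-traced against the code above):
#     url2pathname('///C:/foo/bar/spam.foo') -> 'C:\\foo\\bar\\spam.foo'
#     url2pathname('////host/share/x')       -> '\\\\host\\share\\x'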
def pathname2url(p):
"""OS-specific conversion from a file system path to a relative URL
of the 'file' scheme; not recommended for general use."""
# e.g.
# C:\foo\bar\spam.foo
# becomes
# ///C:/foo/bar/spam.foo
import urllib.parse
if ':' not in p:
# No drive specifier, just convert slashes and quote the name
if p[:2] == '\\\\':
# path is something like \\host\path\on\remote\host
# convert this to ////host/path/on/remote/host
# (notice doubling of slashes at the start of the path)
p = '\\\\' + p
components = p.split('\\')
return urllib.parse.quote('/'.join(components))
comp = p.split(':')
if len(comp) != 2 or len(comp[0]) > 1:
error = 'Bad path: ' + p
raise OSError(error)
drive = urllib.parse.quote(comp[0].upper())
components = comp[1].split('\\')
path = '///' + drive + ':'
for comp in components:
if comp:
path = path + '/' + urllib.parse.quote(comp)
return path
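# Example conversions (hand-traced against the code above):
#     pathname2url('C:\\foo\\bar\\spam.foo') -> '///C:/foo/bar/spam.foo'
#     pathname2url('\\\\host\\share\\x')     -> '////host/share/x'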
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Abstract Base Classes (ABCs) for numbers, according to PEP 3141.
TODO: Fill out more detailed documentation on the operators."""
from abc import ABCMeta, abstractmethod
__all__ = ["Number", "Complex", "Real", "Rational", "Integral"]
class Number(metaclass=ABCMeta):
"""All numbers inherit from this class.
If you just want to check if an argument x is a number, without
caring what kind, use isinstance(x, Number).
"""
__slots__ = ()
# Concrete numeric types must provide their own hash implementation
__hash__ = None
## Notes on Decimal
## ----------------
## Decimal has all of the methods specified by the Real abc, but it should
## not be registered as a Real because decimals do not interoperate with
## binary floats (i.e. Decimal('3.14') + 2.71828 is undefined). But,
## abstract reals are expected to interoperate (i.e. R1 + R2 should be
## expected to work if R1 and R2 are both Reals).
class Complex(Number):
"""Complex defines the operations that work on the builtin complex type.
In short, those are: a conversion to complex, .real, .imag, +, -,
*, /, abs(), .conjugate, ==, and !=.
If it is given heterogeneous arguments, and doesn't have special
knowledge about them, it should fall back to the builtin complex
type as described below.
"""
__slots__ = ()
@abstractmethod
def __complex__(self):
"""Return a builtin complex instance. Called for complex(self)."""
def __bool__(self):
"""True if self != 0. Called for bool(self)."""
return self != 0
@property
@abstractmethod
def real(self):
"""Retrieve the real component of this number.
This should subclass Real.
"""
raise NotImplementedError
@property
@abstractmethod
def imag(self):
"""Retrieve the imaginary component of this number.
This should subclass Real.
"""
raise NotImplementedError
@abstractmethod
def __add__(self, other):
"""self + other"""
raise NotImplementedError
@abstractmethod
def __radd__(self, other):
"""other + self"""
raise NotImplementedError
@abstractmethod
def __neg__(self):
"""-self"""
raise NotImplementedError
@abstractmethod
def __pos__(self):
"""+self"""
raise NotImplementedError
def __sub__(self, other):
"""self - other"""
return self + -other
def __rsub__(self, other):
"""other - self"""
return -self + other
@abstractmethod
def __mul__(self, other):
"""self * other"""
raise NotImplementedError
@abstractmethod
def __rmul__(self, other):
"""other * self"""
raise NotImplementedError
@abstractmethod
def __truediv__(self, other):
"""self / other: Should promote to float when necessary."""
raise NotImplementedError
@abstractmethod
def __rtruediv__(self, other):
"""other / self"""
raise NotImplementedError
@abstractmethod
def __pow__(self, exponent):
"""self**exponent; should promote to float or complex when necessary."""
raise NotImplementedError
@abstractmethod
def __rpow__(self, base):
"""base ** self"""
raise NotImplementedError
@abstractmethod
def __abs__(self):
"""Returns the Real distance from 0. Called for abs(self)."""
raise NotImplementedError
@abstractmethod
def conjugate(self):
"""(x+y*i).conjugate() returns (x-y*i)."""
raise NotImplementedError
@abstractmethod
def __eq__(self, other):
"""self == other"""
raise NotImplementedError
Complex.register(complex)
class Real(Complex):
"""To Complex, Real adds the operations that work on real numbers.
In short, those are: a conversion to float, trunc(), divmod,
%, <, <=, >, and >=.
Real also provides defaults for the derived operations.
"""
__slots__ = ()
@abstractmethod
def __float__(self):
"""Any Real can be converted to a native float object.
Called for float(self)."""
raise NotImplementedError
@abstractmethod
def __trunc__(self):
"""trunc(self): Truncates self to an Integral.
Returns an Integral i such that:
* i>0 iff self>0;
* abs(i) <= abs(self);
* for any Integral j satisfying the first two conditions,
abs(i) >= abs(j) [i.e. i has "maximal" abs among those].
i.e. "truncate towards 0".
"""
raise NotImplementedError
@abstractmethod
def __floor__(self):
"""Finds the greatest Integral <= self."""
raise NotImplementedError
@abstractmethod
def __ceil__(self):
"""Finds the least Integral >= self."""
raise NotImplementedError
@abstractmethod
def __round__(self, ndigits=None):
"""Rounds self to ndigits decimal places, defaulting to 0.
If ndigits is omitted or None, returns an Integral, otherwise
returns a Real. Rounds half toward even.
"""
raise NotImplementedError
def __divmod__(self, other):
"""divmod(self, other): The pair (self // other, self % other).
Sometimes this can be computed faster than the pair of
operations.
"""
return (self // other, self % other)
def __rdivmod__(self, other):
"""divmod(other, self): The pair (self // other, self % other).
Sometimes this can be computed faster than the pair of
operations.
"""
return (other // self, other % self)
@abstractmethod
def __floordiv__(self, other):
"""self // other: The floor() of self/other."""
raise NotImplementedError
@abstractmethod
def __rfloordiv__(self, other):
"""other // self: The floor() of other/self."""
raise NotImplementedError
@abstractmethod
def __mod__(self, other):
"""self % other"""
raise NotImplementedError
@abstractmethod
def __rmod__(self, other):
"""other % self"""
raise NotImplementedError
@abstractmethod
def __lt__(self, other):
"""self < other
< on Reals defines a total ordering, except perhaps for NaN."""
raise NotImplementedError
@abstractmethod
def __le__(self, other):
"""self <= other"""
raise NotImplementedError
# Concrete implementations of Complex abstract methods.
def __complex__(self):
"""complex(self) == complex(float(self), 0)"""
return complex(float(self))
@property
def real(self):
"""Real numbers are their real component."""
return +self
@property
def imag(self):
"""Real numbers have no imaginary component."""
return 0
def conjugate(self):
"""Conjugate is a no-op for Reals."""
return +self
Real.register(float)
class Rational(Real):
""".numerator and .denominator should be in lowest terms."""
__slots__ = ()
@property
@abstractmethod
def numerator(self):
raise NotImplementedError
@property
@abstractmethod
def denominator(self):
raise NotImplementedError
# Concrete implementation of Real's conversion to float.
def __float__(self):
"""float(self) = self.numerator / self.denominator
It's important that this conversion use the integer's "true"
division rather than casting one side to float before dividing
so that ratios of huge integers convert without overflowing.
"""
return self.numerator / self.denominator
class Integral(Rational):
"""Integral adds a conversion to int and the bit-string operations."""
__slots__ = ()
@abstractmethod
def __int__(self):
"""int(self)"""
raise NotImplementedError
def __index__(self):
"""Called whenever an index is needed, such as in slicing"""
return int(self)
@abstractmethod
def __pow__(self, exponent, modulus=None):
"""self ** exponent % modulus, but maybe faster.
Accept the modulus argument if you want to support the
3-argument version of pow(). Raise a TypeError if exponent < 0
or any argument isn't Integral. Otherwise, just implement the
2-argument version described in Complex.
"""
raise NotImplementedError
@abstractmethod
def __lshift__(self, other):
"""self << other"""
raise NotImplementedError
@abstractmethod
def __rlshift__(self, other):
"""other << self"""
raise NotImplementedError
@abstractmethod
def __rshift__(self, other):
"""self >> other"""
raise NotImplementedError
@abstractmethod
def __rrshift__(self, other):
"""other >> self"""
raise NotImplementedError
@abstractmethod
def __and__(self, other):
"""self & other"""
raise NotImplementedError
@abstractmethod
def __rand__(self, other):
"""other & self"""
raise NotImplementedError
@abstractmethod
def __xor__(self, other):
"""self ^ other"""
raise NotImplementedError
@abstractmethod
def __rxor__(self, other):
"""other ^ self"""
raise NotImplementedError
@abstractmethod
def __or__(self, other):
"""self | other"""
raise NotImplementedError
@abstractmethod
def __ror__(self, other):
"""other | self"""
raise NotImplementedError
@abstractmethod
def __invert__(self):
"""~self"""
raise NotImplementedError
# Concrete implementations of Rational and Real abstract methods.
def __float__(self):
"""float(self) == float(int(self))"""
return float(int(self))
@property
def numerator(self):
"""Integers are their own numerators."""
return +self
@property
def denominator(self):
"""Integers have a denominator of 1."""
return 1
Integral.register(int)
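# A quick sketch of how the registered tower behaves (standard Python
# behaviour, easy to verify interactively):
#     isinstance(5, Integral)    -> True
#     isinstance(5, Real)        -> True   (the tower nests)
#     isinstance(3.14, Real)     -> True
#     isinstance(3.14, Rational) -> False
#     isinstance(1j, Complex)    -> True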
"""
opcode module - potentially shared between dis and other modules which
operate on bytecodes (e.g. peephole optimizers).
"""
__all__ = ["cmp_op", "hasconst", "hasname", "hasjrel", "hasjabs",
"haslocal", "hascompare", "hasfree", "opname", "opmap",
"HAVE_ARGUMENT", "EXTENDED_ARG", "hasnargs"]
# It's a chicken-and-egg I'm afraid:
# We're imported before _opcode's made.
# With exception unheeded
# (stack_effect is not needed)
# Both our chickens and eggs are allayed.
# --Larry Hastings, 2013/11/23
try:
from _opcode import stack_effect
__all__.append('stack_effect')
except ImportError:
pass
cmp_op = ('<', '<=', '==', '!=', '>', '>=', 'in', 'not in', 'is',
'is not', 'exception match', 'BAD')
hasconst = []
hasname = []
hasjrel = []
hasjabs = []
haslocal = []
hascompare = []
hasfree = []
hasnargs = []
opmap = {}
opname = [''] * 256
for op in range(256): opname[op] = '<%r>' % (op,)
del op
def def_op(name, op):
opname[op] = name
opmap[name] = op
def name_op(name, op):
def_op(name, op)
hasname.append(op)
def jrel_op(name, op):
def_op(name, op)
hasjrel.append(op)
def jabs_op(name, op):
def_op(name, op)
hasjabs.append(op)
# Instruction opcodes for compiled code
# Blank lines correspond to available opcodes
def_op('POP_TOP', 1)
def_op('ROT_TWO', 2)
def_op('ROT_THREE', 3)
def_op('DUP_TOP', 4)
def_op('DUP_TOP_TWO', 5)
def_op('NOP', 9)
def_op('UNARY_POSITIVE', 10)
def_op('UNARY_NEGATIVE', 11)
def_op('UNARY_NOT', 12)
def_op('UNARY_INVERT', 15)
def_op('BINARY_POWER', 19)
def_op('BINARY_MULTIPLY', 20)
def_op('BINARY_MODULO', 22)
def_op('BINARY_ADD', 23)
def_op('BINARY_SUBTRACT', 24)
def_op('BINARY_SUBSCR', 25)
def_op('BINARY_FLOOR_DIVIDE', 26)
def_op('BINARY_TRUE_DIVIDE', 27)
def_op('INPLACE_FLOOR_DIVIDE', 28)
def_op('INPLACE_TRUE_DIVIDE', 29)
def_op('STORE_MAP', 54)
def_op('INPLACE_ADD', 55)
def_op('INPLACE_SUBTRACT', 56)
def_op('INPLACE_MULTIPLY', 57)
def_op('INPLACE_MODULO', 59)
def_op('STORE_SUBSCR', 60)
def_op('DELETE_SUBSCR', 61)
def_op('BINARY_LSHIFT', 62)
def_op('BINARY_RSHIFT', 63)
def_op('BINARY_AND', 64)
def_op('BINARY_XOR', 65)
def_op('BINARY_OR', 66)
def_op('INPLACE_POWER', 67)
def_op('GET_ITER', 68)
def_op('PRINT_EXPR', 70)
def_op('LOAD_BUILD_CLASS', 71)
def_op('YIELD_FROM', 72)
def_op('INPLACE_LSHIFT', 75)
def_op('INPLACE_RSHIFT', 76)
def_op('INPLACE_AND', 77)
def_op('INPLACE_XOR', 78)
def_op('INPLACE_OR', 79)
def_op('BREAK_LOOP', 80)
def_op('WITH_CLEANUP', 81)
def_op('RETURN_VALUE', 83)
def_op('IMPORT_STAR', 84)
def_op('YIELD_VALUE', 86)
def_op('POP_BLOCK', 87)
def_op('END_FINALLY', 88)
def_op('POP_EXCEPT', 89)
HAVE_ARGUMENT = 90 # Opcodes from here have an argument:
name_op('STORE_NAME', 90) # Index in name list
name_op('DELETE_NAME', 91) # ""
def_op('UNPACK_SEQUENCE', 92) # Number of tuple items
jrel_op('FOR_ITER', 93)
def_op('UNPACK_EX', 94)
name_op('STORE_ATTR', 95) # Index in name list
name_op('DELETE_ATTR', 96) # ""
name_op('STORE_GLOBAL', 97) # ""
name_op('DELETE_GLOBAL', 98) # ""
def_op('LOAD_CONST', 100) # Index in const list
hasconst.append(100)
name_op('LOAD_NAME', 101) # Index in name list
def_op('BUILD_TUPLE', 102) # Number of tuple items
def_op('BUILD_LIST', 103) # Number of list items
def_op('BUILD_SET', 104) # Number of set items
def_op('BUILD_MAP', 105) # Number of dict entries (up to 255)
name_op('LOAD_ATTR', 106) # Index in name list
def_op('COMPARE_OP', 107) # Comparison operator
hascompare.append(107)
name_op('IMPORT_NAME', 108) # Index in name list
name_op('IMPORT_FROM', 109) # Index in name list
jrel_op('JUMP_FORWARD', 110) # Number of bytes to skip
jabs_op('JUMP_IF_FALSE_OR_POP', 111) # Target byte offset from beginning of code
jabs_op('JUMP_IF_TRUE_OR_POP', 112) # ""
jabs_op('JUMP_ABSOLUTE', 113) # ""
jabs_op('POP_JUMP_IF_FALSE', 114) # ""
jabs_op('POP_JUMP_IF_TRUE', 115) # ""
name_op('LOAD_GLOBAL', 116) # Index in name list
jabs_op('CONTINUE_LOOP', 119) # Target address
jrel_op('SETUP_LOOP', 120) # Distance to target address
jrel_op('SETUP_EXCEPT', 121) # ""
jrel_op('SETUP_FINALLY', 122) # ""
def_op('LOAD_FAST', 124) # Local variable number
haslocal.append(124)
def_op('STORE_FAST', 125) # Local variable number
haslocal.append(125)
def_op('DELETE_FAST', 126) # Local variable number
haslocal.append(126)
def_op('RAISE_VARARGS', 130) # Number of raise arguments (1, 2, or 3)
def_op('CALL_FUNCTION', 131) # #args + (#kwargs << 8)
hasnargs.append(131)
def_op('MAKE_FUNCTION', 132) # Number of args with default values
def_op('BUILD_SLICE', 133) # Number of items
def_op('MAKE_CLOSURE', 134)
def_op('LOAD_CLOSURE', 135)
hasfree.append(135)
def_op('LOAD_DEREF', 136)
hasfree.append(136)
def_op('STORE_DEREF', 137)
hasfree.append(137)
def_op('DELETE_DEREF', 138)
hasfree.append(138)
def_op('CALL_FUNCTION_VAR', 140) # #args + (#kwargs << 8)
hasnargs.append(140)
def_op('CALL_FUNCTION_KW', 141) # #args + (#kwargs << 8)
hasnargs.append(141)
def_op('CALL_FUNCTION_VAR_KW', 142) # #args + (#kwargs << 8)
hasnargs.append(142)
jrel_op('SETUP_WITH', 143)
def_op('LIST_APPEND', 145)
def_op('SET_ADD', 146)
def_op('MAP_ADD', 147)
def_op('LOAD_CLASSDEREF', 148)
hasfree.append(148)
def_op('EXTENDED_ARG', 144)
EXTENDED_ARG = 144
del def_op, name_op, jrel_op, jabs_op
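# A minimal sketch of how these tables are consumed (e.g. by the dis
# module); the values follow directly from the definitions above:
#     opmap['LOAD_CONST']              -> 100
#     opname[100]                      -> 'LOAD_CONST'
#     100 in hasconst                  -> True
#     opmap['JUMP_FORWARD'] in hasjrel -> True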
"""
Operator Interface
This module exports a set of functions corresponding to the intrinsic
operators of Python. For example, operator.add(x, y) is equivalent
to the expression x+y. The function names are those used for special
methods; variants without leading and trailing '__' are also provided
for convenience.
This is the pure Python implementation of the module.
"""
__all__ = ['abs', 'add', 'and_', 'attrgetter', 'concat', 'contains', 'countOf',
'delitem', 'eq', 'floordiv', 'ge', 'getitem', 'gt', 'iadd', 'iand',
'iconcat', 'ifloordiv', 'ilshift', 'imod', 'imul', 'index',
'indexOf', 'inv', 'invert', 'ior', 'ipow', 'irshift', 'is_',
'is_not', 'isub', 'itemgetter', 'itruediv', 'ixor', 'le',
'length_hint', 'lshift', 'lt', 'methodcaller', 'mod', 'mul', 'ne',
'neg', 'not_', 'or_', 'pos', 'pow', 'rshift', 'setitem', 'sub',
'truediv', 'truth', 'xor']
from builtins import abs as _abs
# Comparison Operations *******************************************************#
def lt(a, b):
"Same as a < b."
return a < b
def le(a, b):
"Same as a <= b."
return a <= b
def eq(a, b):
"Same as a == b."
return a == b
def ne(a, b):
"Same as a != b."
return a != b
def ge(a, b):
"Same as a >= b."
return a >= b
def gt(a, b):
"Same as a > b."
return a > b
# Logical Operations **********************************************************#
def not_(a):
"Same as not a."
return not a
def truth(a):
"Return True if a is true, False otherwise."
return True if a else False
def is_(a, b):
"Same as a is b."
return a is b
def is_not(a, b):
"Same as a is not b."
return a is not b
# Mathematical/Bitwise Operations *********************************************#
def abs(a):
"Same as abs(a)."
return _abs(a)
def add(a, b):
"Same as a + b."
return a + b
def and_(a, b):
"Same as a & b."
return a & b
def floordiv(a, b):
"Same as a // b."
return a // b
def index(a):
"Same as a.__index__()."
return a.__index__()
def inv(a):
"Same as ~a."
return ~a
invert = inv
def lshift(a, b):
"Same as a << b."
return a << b
def mod(a, b):
"Same as a % b."
return a % b
def mul(a, b):
"Same as a * b."
return a * b
def neg(a):
"Same as -a."
return -a
def or_(a, b):
"Same as a | b."
return a | b
def pos(a):
"Same as +a."
return +a
def pow(a, b):
"Same as a ** b."
return a ** b
def rshift(a, b):
"Same as a >> b."
return a >> b
def sub(a, b):
"Same as a - b."
return a - b
def truediv(a, b):
"Same as a / b."
return a / b
def xor(a, b):
"Same as a ^ b."
return a ^ b
# Sequence Operations *********************************************************#
def concat(a, b):
"Same as a + b, for a and b sequences."
if not hasattr(a, '__getitem__'):
msg = "'%s' object can't be concatenated" % type(a).__name__
raise TypeError(msg)
return a + b
def contains(a, b):
"Same as b in a (note reversed operands)."
return b in a
def countOf(a, b):
"Return the number of times b occurs in a."
count = 0
for i in a:
if i == b:
count += 1
return count
def delitem(a, b):
"Same as del a[b]."
del a[b]
def getitem(a, b):
"Same as a[b]."
return a[b]
def indexOf(a, b):
"Return the first index of b in a."
for i, j in enumerate(a):
if j == b:
return i
else:
raise ValueError('sequence.index(x): x not in sequence')
def setitem(a, b, c):
"Same as a[b] = c."
a[b] = c
def length_hint(obj, default=0):
"""
Return an estimate of the number of items in obj.
This is useful for presizing containers when building from an iterable.
If the object supports len(), the result will be exact. Otherwise, it may
over- or under-estimate by an arbitrary amount. The result will be an
integer >= 0.
"""
if not isinstance(default, int):
msg = ("'%s' object cannot be interpreted as an integer" %
type(default).__name__)
raise TypeError(msg)
try:
return len(obj)
except TypeError:
pass
try:
hint = type(obj).__length_hint__
except AttributeError:
return default
try:
val = hint(obj)
except TypeError:
return default
if val is NotImplemented:
return default
if not isinstance(val, int):
msg = ('__length_hint__ must be integer, not %s' %
type(val).__name__)
raise TypeError(msg)
if val < 0:
msg = '__length_hint__() should return >= 0'
raise ValueError(msg)
return val
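# A hedged sketch of the fallback order above (Hinted is a made-up class):
#     class Hinted:
#         def __length_hint__(self):
#             return 5
#     length_hint([1, 2, 3])   -> 3  (len() wins)
#     length_hint(Hinted())    -> 5  (falls back to __length_hint__)
#     length_hint(object(), 7) -> 7  (no len(), no hint: the default)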
# Generalized Lookup Objects **************************************************#
class attrgetter:
"""
Return a callable object that fetches the given attribute(s) from its operand.
After f = attrgetter('name'), the call f(r) returns r.name.
After g = attrgetter('name', 'date'), the call g(r) returns (r.name, r.date).
After h = attrgetter('name.first', 'name.last'), the call h(r) returns
(r.name.first, r.name.last).
"""
def __init__(self, attr, *attrs):
if not attrs:
if not isinstance(attr, str):
raise TypeError('attribute name must be a string')
names = attr.split('.')
def func(obj):
for name in names:
obj = getattr(obj, name)
return obj
self._call = func
else:
getters = tuple(map(attrgetter, (attr,) + attrs))
def func(obj):
return tuple(getter(obj) for getter in getters)
self._call = func
def __call__(self, obj):
return self._call(obj)
class itemgetter:
"""
Return a callable object that fetches the given item(s) from its operand.
After f = itemgetter(2), the call f(r) returns r[2].
After g = itemgetter(2, 5, 3), the call g(r) returns (r[2], r[5], r[3])
"""
def __init__(self, item, *items):
if not items:
def func(obj):
return obj[item]
self._call = func
else:
items = (item,) + items
def func(obj):
return tuple(obj[i] for i in items)
self._call = func
def __call__(self, obj):
return self._call(obj)
class methodcaller:
"""
Return a callable object that calls the given method on its operand.
After f = methodcaller('name'), the call f(r) returns r.name().
After g = methodcaller('name', 'date', foo=1), the call g(r) returns
r.name('date', foo=1).
"""
def __init__(*args, **kwargs):
if len(args) < 2:
msg = "methodcaller needs at least one argument, the method name"
raise TypeError(msg)
self = args[0]
self._name = args[1]
self._args = args[2:]
self._kwargs = kwargs
def __call__(self, obj):
return getattr(obj, self._name)(*self._args, **self._kwargs)
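# A small combined sketch of the three lookup objects (standard-library
# behaviour, easy to verify interactively):
#     attrgetter('imag')(3+4j)     -> 4.0
#     itemgetter(1, 0)(['a', 'b']) -> ('b', 'a')
#     methodcaller('upper')('abc') -> 'ABC'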
# In-place Operations *********************************************************#
def iadd(a, b):
"Same as a += b."
a += b
return a
def iand(a, b):
"Same as a &= b."
a &= b
return a
def iconcat(a, b):
"Same as a += b, for a and b sequences."
if not hasattr(a, '__getitem__'):
msg = "'%s' object can't be concatenated" % type(a).__name__
raise TypeError(msg)
a += b
return a
def ifloordiv(a, b):
"Same as a //= b."
a //= b
return a
def ilshift(a, b):
"Same as a <<= b."
a <<= b
return a
def imod(a, b):
"Same as a %= b."
a %= b
return a
def imul(a, b):
"Same as a *= b."
a *= b
return a
def ior(a, b):
"Same as a |= b."
a |= b
return a
def ipow(a, b):
"Same as a **= b."
a **= b
return a
def irshift(a, b):
"Same as a >>= b."
a >>= b
return a
def isub(a, b):
"Same as a -= b."
a -= b
return a
def itruediv(a, b):
"Same as a /= b."
a /= b
return a
def ixor(a, b):
"Same as a ^= b."
a ^= b
return a
try:
from _operator import *
except ImportError:
pass
else:
from _operator import __doc__
# All of these "__func__ = func" assignments have to happen after importing
# from _operator to make sure they're set to the right function
__lt__ = lt
__le__ = le
__eq__ = eq
__ne__ = ne
__ge__ = ge
__gt__ = gt
__not__ = not_
__abs__ = abs
__add__ = add
__and__ = and_
__floordiv__ = floordiv
__index__ = index
__inv__ = inv
__invert__ = invert
__lshift__ = lshift
__mod__ = mod
__mul__ = mul
__neg__ = neg
__or__ = or_
__pos__ = pos
__pow__ = pow
__rshift__ = rshift
__sub__ = sub
__truediv__ = truediv
__xor__ = xor
__concat__ = concat
__contains__ = contains
__delitem__ = delitem
__getitem__ = getitem
__setitem__ = setitem
__iadd__ = iadd
__iand__ = iand
__iconcat__ = iconcat
__ifloordiv__ = ifloordiv
__ilshift__ = ilshift
__imod__ = imod
__imul__ = imul
__ior__ = ior
__ipow__ = ipow
__irshift__ = irshift
__isub__ = isub
__itruediv__ = itruediv
__ixor__ = ixor
"""A powerful, extensible, and easy-to-use option parser.
By Greg Ward <[email protected]>
Originally distributed as Optik.
For support, use the [email protected] mailing list
(http://lists.sourceforge.net/lists/listinfo/optik-users).
Simple usage example:
from optparse import OptionParser
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
action="store_false", dest="verbose", default=True,
help="don't print status messages to stdout")
(options, args) = parser.parse_args()
"""
__version__ = "1.5.3"
__all__ = ['Option',
'make_option',
'SUPPRESS_HELP',
'SUPPRESS_USAGE',
'Values',
'OptionContainer',
'OptionGroup',
'OptionParser',
'HelpFormatter',
'IndentedHelpFormatter',
'TitledHelpFormatter',
'OptParseError',
'OptionError',
'OptionConflictError',
'OptionValueError',
'BadOptionError']
__copyright__ = """
Copyright (c) 2001-2006 Gregory P. Ward. All rights reserved.
Copyright (c) 2002-2006 Python Software Foundation. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the author nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
import sys, os
import textwrap
def _repr(self):
return "<%s at 0x%x: %s>" % (self.__class__.__name__, id(self), self)
# This file was generated from:
# Id: option_parser.py 527 2006-07-23 15:21:30Z greg
# Id: option.py 522 2006-06-11 16:22:03Z gward
# Id: help.py 527 2006-07-23 15:21:30Z greg
# Id: errors.py 509 2006-04-20 00:58:24Z gward
try:
from gettext import gettext, ngettext
except ImportError:
def gettext(message):
return message
def ngettext(singular, plural, n):
if n == 1:
return singular
return plural
_ = gettext
class OptParseError (Exception):
def __init__(self, msg):
self.msg = msg
def __str__(self):
return self.msg
class OptionError (OptParseError):
"""
Raised if an Option instance is created with invalid or
inconsistent arguments.
"""
def __init__(self, msg, option):
self.msg = msg
self.option_id = str(option)
def __str__(self):
if self.option_id:
return "option %s: %s" % (self.option_id, self.msg)
else:
return self.msg
class OptionConflictError (OptionError):
"""
Raised if conflicting options are added to an OptionParser.
"""
class OptionValueError (OptParseError):
"""
Raised if an invalid option value is encountered on the command
line.
"""
class BadOptionError (OptParseError):
"""
Raised if an invalid option is seen on the command line.
"""
def __init__(self, opt_str):
self.opt_str = opt_str
def __str__(self):
return _("no such option: %s") % self.opt_str
class AmbiguousOptionError (BadOptionError):
"""
Raised if an ambiguous option is seen on the command line.
"""
def __init__(self, opt_str, possibilities):
BadOptionError.__init__(self, opt_str)
self.possibilities = possibilities
def __str__(self):
return (_("ambiguous option: %s (%s?)")
% (self.opt_str, ", ".join(self.possibilities)))
class HelpFormatter:
"""
Abstract base class for formatting option help. OptionParser
instances should use one of the HelpFormatter subclasses for
formatting help; by default IndentedHelpFormatter is used.
Instance attributes:
parser : OptionParser
the controlling OptionParser instance
indent_increment : int
the number of columns to indent per nesting level
max_help_position : int
the maximum starting column for option help text
help_position : int
the calculated starting column for option help text;
initially the same as the maximum
width : int
total number of columns for output (pass None to constructor for
this value to be taken from the $COLUMNS environment variable)
level : int
current indentation level
current_indent : int
current indentation level (in columns)
help_width : int
number of columns available for option help text (calculated)
default_tag : str
text to replace with each option's default value, "%default"
by default. Set to false value to disable default value expansion.
option_strings : { Option : str }
maps Option instances to the snippet of help text explaining
the syntax of that option, e.g. "-h, --help" or
"-fFILE, --file=FILE"
_short_opt_fmt : str
format string controlling how short options with values are
printed in help text. Must be either "%s%s" ("-fFILE") or
"%s %s" ("-f FILE"), because those are the two syntaxes that
Optik supports.
_long_opt_fmt : str
similar but for long options; must be either "%s %s" ("--file FILE")
or "%s=%s" ("--file=FILE").
"""
NO_DEFAULT_VALUE = "none"
def __init__(self,
indent_increment,
max_help_position,
width,
short_first):
self.parser = None
self.indent_increment = indent_increment
if width is None:
try:
width = int(os.environ['COLUMNS'])
except (KeyError, ValueError):
width = 80
width -= 2
self.width = width
self.help_position = self.max_help_position = \
min(max_help_position, max(width - 20, indent_increment * 2))
self.current_indent = 0
self.level = 0
self.help_width = None # computed later
self.short_first = short_first
self.default_tag = "%default"
self.option_strings = {}
self._short_opt_fmt = "%s %s"
self._long_opt_fmt = "%s=%s"
def set_parser(self, parser):
self.parser = parser
def set_short_opt_delimiter(self, delim):
if delim not in ("", " "):
raise ValueError(
"invalid metavar delimiter for short options: %r" % delim)
self._short_opt_fmt = "%s" + delim + "%s"
def set_long_opt_delimiter(self, delim):
if delim not in ("=", " "):
raise ValueError(
"invalid metavar delimiter for long options: %r" % delim)
self._long_opt_fmt = "%s" + delim + "%s"
def indent(self):
self.current_indent += self.indent_increment
self.level += 1
def dedent(self):
self.current_indent -= self.indent_increment
assert self.current_indent >= 0, "Indent decreased below 0."
self.level -= 1
def format_usage(self, usage):
raise NotImplementedError("subclasses must implement")
def format_heading(self, heading):
raise NotImplementedError("subclasses must implement")
def _format_text(self, text):
"""
Format a paragraph of free-form text for inclusion in the
help output at the current indentation level.
"""
text_width = max(self.width - self.current_indent, 11)
indent = " "*self.current_indent
return textwrap.fill(text,
text_width,
initial_indent=indent,
subsequent_indent=indent)
def format_description(self, description):
if description:
return self._format_text(description) + "\n"
else:
return ""
def format_epilog(self, epilog):
if epilog:
return "\n" + self._format_text(epilog) + "\n"
else:
return ""
def expand_default(self, option):
if self.parser is None or not self.default_tag:
return option.help
default_value = self.parser.defaults.get(option.dest)
if default_value is NO_DEFAULT or default_value is None:
default_value = self.NO_DEFAULT_VALUE
return option.help.replace(self.default_tag, str(default_value))
def format_option(self, option):
# The help for each option consists of two parts:
# * the opt strings and metavars
# eg. ("-x", or "-fFILENAME, --file=FILENAME")
# * the user-supplied help string
# eg. ("turn on expert mode", "read data from FILENAME")
#
# If possible, we write both of these on the same line:
# -x turn on expert mode
#
# But if the opt string list is too long, we put the help
# string on a second line, indented to the same column it would
# start in if it fit on the first line.
# -fFILENAME, --file=FILENAME
# read data from FILENAME
result = []
opts = self.option_strings[option]
opt_width = self.help_position - self.current_indent - 2
if len(opts) > opt_width:
opts = "%*s%s\n" % (self.current_indent, "", opts)
indent_first = self.help_position
else: # start help on same line as opts
opts = "%*s%-*s " % (self.current_indent, "", opt_width, opts)
indent_first = 0
result.append(opts)
if option.help:
help_text = self.expand_default(option)
help_lines = textwrap.wrap(help_text, self.help_width)
result.append("%*s%s\n" % (indent_first, "", help_lines[0]))
result.extend(["%*s%s\n" % (self.help_position, "", line)
for line in help_lines[1:]])
elif opts[-1] != "\n":
result.append("\n")
return "".join(result)
def store_option_strings(self, parser):
self.indent()
max_len = 0
for opt in parser.option_list:
strings = self.format_option_strings(opt)
self.option_strings[opt] = strings
max_len = max(max_len, len(strings) + self.current_indent)
self.indent()
for group in parser.option_groups:
for opt in group.option_list:
strings = self.format_option_strings(opt)
self.option_strings[opt] = strings
max_len = max(max_len, len(strings) + self.current_indent)
self.dedent()
self.dedent()
self.help_position = min(max_len + 2, self.max_help_position)
self.help_width = max(self.width - self.help_position, 11)
def format_option_strings(self, option):
"""Return a comma-separated list of option strings & metavariables."""
if option.takes_value():
metavar = option.metavar or option.dest.upper()
short_opts = [self._short_opt_fmt % (sopt, metavar)
for sopt in option._short_opts]
long_opts = [self._long_opt_fmt % (lopt, metavar)
for lopt in option._long_opts]
else:
short_opts = option._short_opts
long_opts = option._long_opts
if self.short_first:
opts = short_opts + long_opts
else:
opts = long_opts + short_opts
return ", ".join(opts)
class IndentedHelpFormatter (HelpFormatter):
"""Format help with indented section bodies.
"""
def __init__(self,
indent_increment=2,
max_help_position=24,
width=None,
short_first=1):
HelpFormatter.__init__(
self, indent_increment, max_help_position, width, short_first)
def format_usage(self, usage):
return _("Usage: %s\n") % usage
def format_heading(self, heading):
return "%*s%s:\n" % (self.current_indent, "", heading)
class TitledHelpFormatter (HelpFormatter):
"""Format help with underlined section headers.
"""
def __init__(self,
indent_increment=0,
max_help_position=24,
width=None,
short_first=0):
HelpFormatter.__init__ (
self, indent_increment, max_help_position, width, short_first)
def format_usage(self, usage):
return "%s %s\n" % (self.format_heading(_("Usage")), usage)
def format_heading(self, heading):
return "%s\n%s\n" % (heading, "=-"[self.level] * len(heading))
def _parse_num(val, type):
if val[:2].lower() == "0x": # hexadecimal
radix = 16
elif val[:2].lower() == "0b": # binary
radix = 2
val = val[2:] or "0" # have to remove "0b" prefix
elif val[:1] == "0": # octal
radix = 8
else: # decimal
radix = 10
return type(val, radix)
def _parse_int(val):
return _parse_num(val, int)
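# Illustrative sketch (demo name ours): _parse_num() picks the radix from
# the literal's prefix, so "0x", "0b" and a leading "0" convert as hex,
# binary and octal respectively.
def _demo_parse_num():
    assert _parse_int("0x1f") == 31   # hexadecimal
    assert _parse_int("0b101") == 5   # binary
    assert _parse_int("010") == 8     # leading zero means octal
    assert _parse_int("42") == 42     # plain decimal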
_builtin_cvt = { "int" : (_parse_int, _("integer")),
"long" : (_parse_int, _("integer")),
"float" : (float, _("floating-point")),
"complex" : (complex, _("complex")) }
def check_builtin(option, opt, value):
(cvt, what) = _builtin_cvt[option.type]
try:
return cvt(value)
except ValueError:
raise OptionValueError(
_("option %s: invalid %s value: %r") % (opt, what, value))
def check_choice(option, opt, value):
if value in option.choices:
return value
else:
choices = ", ".join(map(repr, option.choices))
raise OptionValueError(
_("option %s: invalid choice: %r (choose from %s)")
% (opt, value, choices))
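# Illustrative sketch (demo name ours): a checker either returns the
# converted value or raises OptionValueError with a user-oriented message.
# Option is defined further below, so this can only run after the module
# has finished loading.
def _demo_checkers():
    opt = Option("-n", type="int")
    assert check_builtin(opt, "-n", "0x10") == 16
    try:
        check_builtin(opt, "-n", "junk")
    except OptionValueError as err:
        assert "invalid integer value" in str(err)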
# Not supplying a default is different from a default of None,
# so we need an explicit "not supplied" value.
NO_DEFAULT = ("NO", "DEFAULT")
class Option:
"""
Instance attributes:
_short_opts : [string]
_long_opts : [string]
action : string
type : string
dest : string
default : any
nargs : int
const : any
choices : [string]
callback : function
callback_args : (any*)
callback_kwargs : { string : any }
help : string
metavar : string
"""
# The list of instance attributes that may be set through
# keyword args to the constructor.
ATTRS = ['action',
'type',
'dest',
'default',
'nargs',
'const',
'choices',
'callback',
'callback_args',
'callback_kwargs',
'help',
'metavar']
# The set of actions allowed by option parsers. Explicitly listed
# here so the constructor can validate its arguments.
ACTIONS = ("store",
"store_const",
"store_true",
"store_false",
"append",
"append_const",
"count",
"callback",
"help",
"version")
# The set of actions that involve storing a value somewhere;
# also listed just for constructor argument validation. (If
# the action is one of these, there must be a destination.)
STORE_ACTIONS = ("store",
"store_const",
"store_true",
"store_false",
"append",
"append_const",
"count")
# The set of actions for which it makes sense to supply a value
# type, ie. which may consume an argument from the command line.
TYPED_ACTIONS = ("store",
"append",
"callback")
# The set of actions which *require* a value type, ie. that
# always consume an argument from the command line.
ALWAYS_TYPED_ACTIONS = ("store",
"append")
# The set of actions which take a 'const' attribute.
CONST_ACTIONS = ("store_const",
"append_const")
# The set of known types for option parsers. Again, listed here for
# constructor argument validation.
TYPES = ("string", "int", "long", "float", "complex", "choice")
# Dictionary of argument checking functions, which convert and
# validate option arguments according to the option type.
#
# Signature of checking functions is:
# check(option : Option, opt : string, value : string) -> any
# where
# option is the Option instance calling the checker
# opt is the actual option seen on the command-line
# (eg. "-a", "--file")
# value is the option argument seen on the command-line
#
# The return value should be in the appropriate Python type
# for option.type -- eg. an integer if option.type == "int".
#
# If no checker is defined for a type, arguments will be
# unchecked and remain strings.
TYPE_CHECKER = { "int" : check_builtin,
"long" : check_builtin,
"float" : check_builtin,
"complex": check_builtin,
"choice" : check_choice,
}
# CHECK_METHODS is a list of unbound method objects; they are called
# by the constructor, in order, after all attributes are
# initialized. The list is created and filled in later, after all
# the methods are actually defined. (I just put it here because I
# like to define and document all class attributes in the same
# place.) Subclasses that add another _check_*() method should
# define their own CHECK_METHODS list that adds their check method
# to those from this class.
CHECK_METHODS = None
# -- Constructor/initialization methods ----------------------------
def __init__(self, *opts, **attrs):
# Set _short_opts, _long_opts attrs from 'opts' tuple.
# Have to be set now, in case no option strings are supplied.
self._short_opts = []
self._long_opts = []
opts = self._check_opt_strings(opts)
self._set_opt_strings(opts)
# Set all other attrs (action, type, etc.) from 'attrs' dict
self._set_attrs(attrs)
# Check all the attributes we just set. There are lots of
# complicated interdependencies, but luckily they can be farmed
# out to the _check_*() methods listed in CHECK_METHODS -- which
# could be handy for subclasses! The one thing these all share
# is that they raise OptionError if they discover a problem.
for checker in self.CHECK_METHODS:
checker(self)
def _check_opt_strings(self, opts):
# Filter out None because early versions of Optik had exactly
# one short option and one long option, either of which
# could be None.
opts = [opt for opt in opts if opt]
if not opts:
raise TypeError("at least one option string must be supplied")
return opts
def _set_opt_strings(self, opts):
for opt in opts:
if len(opt) < 2:
raise OptionError(
"invalid option string %r: "
"must be at least two characters long" % opt, self)
elif len(opt) == 2:
if not (opt[0] == "-" and opt[1] != "-"):
raise OptionError(
"invalid short option string %r: "
"must be of the form -x, (x any non-dash char)" % opt,
self)
self._short_opts.append(opt)
else:
if not (opt[0:2] == "--" and opt[2] != "-"):
raise OptionError(
"invalid long option string %r: "
"must start with --, followed by non-dash" % opt,
self)
self._long_opts.append(opt)
def _set_attrs(self, attrs):
for attr in self.ATTRS:
if attr in attrs:
setattr(self, attr, attrs[attr])
del attrs[attr]
else:
if attr == 'default':
setattr(self, attr, NO_DEFAULT)
else:
setattr(self, attr, None)
if attrs:
attrs = sorted(attrs.keys())
raise OptionError(
"invalid keyword arguments: %s" % ", ".join(attrs),
self)
# -- Constructor validation methods --------------------------------
def _check_action(self):
if self.action is None:
self.action = "store"
elif self.action not in self.ACTIONS:
raise OptionError("invalid action: %r" % self.action, self)
def _check_type(self):
if self.type is None:
if self.action in self.ALWAYS_TYPED_ACTIONS:
if self.choices is not None:
# The "choices" attribute implies "choice" type.
self.type = "choice"
else:
# No type given? "string" is the most sensible default.
self.type = "string"
else:
# Allow type objects or builtin type conversion functions
# (int, str, etc.) as an alternative to their names.
if isinstance(self.type, type):
self.type = self.type.__name__
if self.type == "str":
self.type = "string"
if self.type not in self.TYPES:
raise OptionError("invalid option type: %r" % self.type, self)
if self.action not in self.TYPED_ACTIONS:
raise OptionError(
"must not supply a type for action %r" % self.action, self)
def _check_choice(self):
if self.type == "choice":
if self.choices is None:
raise OptionError(
"must supply a list of choices for type 'choice'", self)
elif not isinstance(self.choices, (tuple, list)):
raise OptionError(
"choices must be a list of strings ('%s' supplied)"
% str(type(self.choices)).split("'")[1], self)
elif self.choices is not None:
raise OptionError(
"must not supply choices for type %r" % self.type, self)
def _check_dest(self):
# No destination given, and we need one for this action. The
# self.type check is for callbacks that take a value.
takes_value = (self.action in self.STORE_ACTIONS or
self.type is not None)
if self.dest is None and takes_value:
# Glean a destination from the first long option string,
# or from the first short option string if no long options.
if self._long_opts:
# eg. "--foo-bar" -> "foo_bar"
self.dest = self._long_opts[0][2:].replace('-', '_')
else:
self.dest = self._short_opts[0][1]
def _check_const(self):
if self.action not in self.CONST_ACTIONS and self.const is not None:
raise OptionError(
"'const' must not be supplied for action %r" % self.action,
self)
def _check_nargs(self):
if self.action in self.TYPED_ACTIONS:
if self.nargs is None:
self.nargs = 1
elif self.nargs is not None:
raise OptionError(
"'nargs' must not be supplied for action %r" % self.action,
self)
def _check_callback(self):
if self.action == "callback":
if not callable(self.callback):
raise OptionError(
"callback not callable: %r" % self.callback, self)
if (self.callback_args is not None and
not isinstance(self.callback_args, tuple)):
raise OptionError(
"callback_args, if supplied, must be a tuple: not %r"
% self.callback_args, self)
if (self.callback_kwargs is not None and
not isinstance(self.callback_kwargs, dict)):
raise OptionError(
"callback_kwargs, if supplied, must be a dict: not %r"
% self.callback_kwargs, self)
else:
if self.callback is not None:
raise OptionError(
"callback supplied (%r) for non-callback option"
% self.callback, self)
if self.callback_args is not None:
raise OptionError(
"callback_args supplied for non-callback option", self)
if self.callback_kwargs is not None:
raise OptionError(
"callback_kwargs supplied for non-callback option", self)
CHECK_METHODS = [_check_action,
_check_type,
_check_choice,
_check_dest,
_check_const,
_check_nargs,
_check_callback]
# -- Miscellaneous methods -----------------------------------------
def __str__(self):
return "/".join(self._short_opts + self._long_opts)
__repr__ = _repr
def takes_value(self):
return self.type is not None
def get_opt_string(self):
if self._long_opts:
return self._long_opts[0]
else:
return self._short_opts[0]
# -- Processing methods --------------------------------------------
def check_value(self, opt, value):
checker = self.TYPE_CHECKER.get(self.type)
if checker is None:
return value
else:
return checker(self, opt, value)
def convert_value(self, opt, value):
if value is not None:
if self.nargs == 1:
return self.check_value(opt, value)
else:
return tuple([self.check_value(opt, v) for v in value])
def process(self, opt, value, values, parser):
# First, convert the value(s) to the right type. Howl if any
# value(s) are bogus.
value = self.convert_value(opt, value)
# And then take whatever action is expected of us.
# This is a separate method to make life easier for
# subclasses to add new actions.
return self.take_action(
self.action, self.dest, opt, value, values, parser)
def take_action(self, action, dest, opt, value, values, parser):
if action == "store":
setattr(values, dest, value)
elif action == "store_const":
setattr(values, dest, self.const)
elif action == "store_true":
setattr(values, dest, True)
elif action == "store_false":
setattr(values, dest, False)
elif action == "append":
values.ensure_value(dest, []).append(value)
elif action == "append_const":
values.ensure_value(dest, []).append(self.const)
elif action == "count":
setattr(values, dest, values.ensure_value(dest, 0) + 1)
elif action == "callback":
args = self.callback_args or ()
kwargs = self.callback_kwargs or {}
self.callback(self, opt, value, parser, *args, **kwargs)
elif action == "help":
parser.print_help()
parser.exit()
elif action == "version":
parser.print_version()
parser.exit()
else:
raise ValueError("unknown action %r" % self.action)
return 1
# class Option
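# Illustrative sketch (demo name ours): constructing an Option directly
# shows the _check_*() methods filling in defaults -- "store" action,
# "string" type, nargs of 1, and a dest gleaned from the long option string.
def _demo_option_defaults():
    opt = Option("-o", "--out-file")
    assert opt.action == "store"
    assert opt.type == "string" and opt.nargs == 1
    assert opt.dest == "out_file"
    assert opt.takes_value()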
SUPPRESS_HELP = "SUPPRESS"+"HELP"
SUPPRESS_USAGE = "SUPPRESS"+"USAGE"
class Values:
def __init__(self, defaults=None):
if defaults:
for (attr, val) in defaults.items():
setattr(self, attr, val)
def __str__(self):
return str(self.__dict__)
__repr__ = _repr
def __eq__(self, other):
if isinstance(other, Values):
return self.__dict__ == other.__dict__
elif isinstance(other, dict):
return self.__dict__ == other
else:
return NotImplemented
def _update_careful(self, dict):
"""
Update the option values from an arbitrary dictionary, but only
use keys from dict that already have a corresponding attribute
in self. Any keys in dict without a corresponding attribute
are silently ignored.
"""
for attr in dir(self):
if attr in dict:
dval = dict[attr]
if dval is not None:
setattr(self, attr, dval)
def _update_loose(self, dict):
"""
Update the option values from an arbitrary dictionary,
using all keys from the dictionary regardless of whether
they have a corresponding attribute in self or not.
"""
self.__dict__.update(dict)
def _update(self, dict, mode):
if mode == "careful":
self._update_careful(dict)
elif mode == "loose":
self._update_loose(dict)
else:
raise ValueError("invalid update mode: %r" % mode)
def read_module(self, modname, mode="careful"):
__import__(modname)
mod = sys.modules[modname]
self._update(vars(mod), mode)
def read_file(self, filename, mode="careful"):
vars = {}
exec(open(filename).read(), vars)
self._update(vars, mode)
def ensure_value(self, attr, value):
if not hasattr(self, attr) or getattr(self, attr) is None:
setattr(self, attr, value)
return getattr(self, attr)
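# Illustrative sketch (demo name ours): Values objects are plain attribute
# bags; _update_careful() only touches attributes that already exist, and
# ensure_value() lazily supplies a default.
def _demo_values():
    vals = Values({"verbose": True})
    assert vals.ensure_value("count", 0) == 0
    vals._update_careful({"verbose": False, "unknown": 1})
    assert vals.verbose is False
    assert not hasattr(vals, "unknown")   # silently ignored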
class OptionContainer:
"""
Abstract base class.
Class attributes:
standard_option_list : [Option]
list of standard options that will be accepted by all instances
of this parser class (intended to be overridden by subclasses).
Instance attributes:
option_list : [Option]
the list of Option objects contained by this OptionContainer
_short_opt : { string : Option }
dictionary mapping short option strings, eg. "-f" or "-X",
to the Option instances that implement them. If an Option
has multiple short option strings, it will appear in this
dictionary multiple times. [1]
_long_opt : { string : Option }
dictionary mapping long option strings, eg. "--file" or
"--exclude", to the Option instances that implement them.
Again, a given Option can occur multiple times in this
dictionary. [1]
defaults : { string : any }
dictionary mapping option destination names to default
values for each destination [1]
[1] These mappings are common to (shared by) all components of the
controlling OptionParser, where they are initially created.
"""
def __init__(self, option_class, conflict_handler, description):
# Initialize the option list and related data structures.
# This method must be provided by subclasses, and it must
# initialize at least the following instance attributes:
# option_list, _short_opt, _long_opt, defaults.
self._create_option_list()
self.option_class = option_class
self.set_conflict_handler(conflict_handler)
self.set_description(description)
def _create_option_mappings(self):
# For use by OptionParser constructor -- create the master
# option mappings used by this OptionParser and all
# OptionGroups that it owns.
self._short_opt = {} # single letter -> Option instance
self._long_opt = {} # long option -> Option instance
self.defaults = {} # maps option dest -> default value
def _share_option_mappings(self, parser):
# For use by OptionGroup constructor -- use shared option
# mappings from the OptionParser that owns this OptionGroup.
self._short_opt = parser._short_opt
self._long_opt = parser._long_opt
self.defaults = parser.defaults
def set_conflict_handler(self, handler):
if handler not in ("error", "resolve"):
raise ValueError("invalid conflict_resolution value %r" % handler)
self.conflict_handler = handler
def set_description(self, description):
self.description = description
def get_description(self):
return self.description
def destroy(self):
"""see OptionParser.destroy()."""
del self._short_opt
del self._long_opt
del self.defaults
# -- Option-adding methods -----------------------------------------
def _check_conflict(self, option):
conflict_opts = []
for opt in option._short_opts:
if opt in self._short_opt:
conflict_opts.append((opt, self._short_opt[opt]))
for opt in option._long_opts:
if opt in self._long_opt:
conflict_opts.append((opt, self._long_opt[opt]))
if conflict_opts:
handler = self.conflict_handler
if handler == "error":
raise OptionConflictError(
"conflicting option string(s): %s"
% ", ".join([co[0] for co in conflict_opts]),
option)
elif handler == "resolve":
for (opt, c_option) in conflict_opts:
if opt.startswith("--"):
c_option._long_opts.remove(opt)
del self._long_opt[opt]
else:
c_option._short_opts.remove(opt)
del self._short_opt[opt]
if not (c_option._short_opts or c_option._long_opts):
c_option.container.option_list.remove(c_option)
def add_option(self, *args, **kwargs):
"""add_option(Option)
add_option(opt_str, ..., kwarg=val, ...)
"""
if isinstance(args[0], str):
option = self.option_class(*args, **kwargs)
elif len(args) == 1 and not kwargs:
option = args[0]
if not isinstance(option, Option):
raise TypeError("not an Option instance: %r" % option)
else:
raise TypeError("invalid arguments")
self._check_conflict(option)
self.option_list.append(option)
option.container = self
for opt in option._short_opts:
self._short_opt[opt] = option
for opt in option._long_opts:
self._long_opt[opt] = option
if option.dest is not None: # option has a dest, we need a default
if option.default is not NO_DEFAULT:
self.defaults[option.dest] = option.default
elif option.dest not in self.defaults:
self.defaults[option.dest] = None
return option
def add_options(self, option_list):
for option in option_list:
self.add_option(option)
# -- Option query/removal methods ----------------------------------
def get_option(self, opt_str):
return (self._short_opt.get(opt_str) or
self._long_opt.get(opt_str))
def has_option(self, opt_str):
return (opt_str in self._short_opt or
opt_str in self._long_opt)
def remove_option(self, opt_str):
option = self._short_opt.get(opt_str)
if option is None:
option = self._long_opt.get(opt_str)
if option is None:
raise ValueError("no such option %r" % opt_str)
for opt in option._short_opts:
del self._short_opt[opt]
for opt in option._long_opts:
del self._long_opt[opt]
option.container.option_list.remove(option)
# -- Help-formatting methods ---------------------------------------
def format_option_help(self, formatter):
if not self.option_list:
return ""
result = []
for option in self.option_list:
if option.help is not SUPPRESS_HELP:
result.append(formatter.format_option(option))
return "".join(result)
def format_description(self, formatter):
return formatter.format_description(self.get_description())
def format_help(self, formatter):
result = []
if self.description:
result.append(self.format_description(formatter))
if self.option_list:
result.append(self.format_option_help(formatter))
return "\n".join(result)
class OptionGroup (OptionContainer):
def __init__(self, parser, title, description=None):
self.parser = parser
OptionContainer.__init__(
self, parser.option_class, parser.conflict_handler, description)
self.title = title
def _create_option_list(self):
self.option_list = []
self._share_option_mappings(self.parser)
def set_title(self, title):
self.title = title
def destroy(self):
"""see OptionParser.destroy()."""
OptionContainer.destroy(self)
del self.option_list
# -- Help-formatting methods ---------------------------------------
def format_help(self, formatter):
result = formatter.format_heading(self.title)
formatter.indent()
result += OptionContainer.format_help(self, formatter)
formatter.dedent()
return result
class OptionParser (OptionContainer):
"""
Class attributes:
standard_option_list : [Option]
list of standard options that will be accepted by all instances
of this parser class (intended to be overridden by subclasses).
Instance attributes:
usage : string
a usage string for your program. Before it is displayed
to the user, "%prog" will be expanded to the name of
your program (self.prog or os.path.basename(sys.argv[0])).
prog : string
the name of the current program (to override
os.path.basename(sys.argv[0])).
description : string
A paragraph of text giving a brief overview of your program.
optparse reformats this paragraph to fit the current terminal
width and prints it when the user requests help (after usage,
but before the list of options).
epilog : string
paragraph of help text to print after option help
option_groups : [OptionGroup]
list of option groups in this parser (option groups are
irrelevant for parsing the command-line, but very useful
for generating help)
allow_interspersed_args : bool = true
if true, positional arguments may be interspersed with options.
Assuming -a and -b each take a single argument, the command-line
-ablah foo bar -bboo baz
will be interpreted the same as
-ablah -bboo -- foo bar baz
If this flag were false, that command line would be interpreted as
-ablah -- foo bar -bboo baz
-- ie. we stop processing options as soon as we see the first
non-option argument. (This is the tradition followed by
Python's getopt module, Perl's Getopt::Std, and other argument-
parsing libraries, but it is generally annoying to users.)
process_default_values : bool = true
if true, option default values are processed similarly to option
values from the command line: that is, they are passed to the
type-checking function for the option's type (as long as the
default value is a string). (This really only matters if you
have defined custom types; see SF bug #955889.) Set it to false
to restore the behaviour of Optik 1.4.1 and earlier.
rargs : [string]
the argument list currently being parsed. Only set when
parse_args() is active, and continually trimmed down as
we consume arguments. Mainly there for the benefit of
callback options.
largs : [string]
the list of leftover arguments that we have skipped while
parsing options. If allow_interspersed_args is false, this
list is always empty.
values : Values
the set of option values currently being accumulated. Only
set when parse_args() is active. Also mainly for callbacks.
Because of the 'rargs', 'largs', and 'values' attributes,
OptionParser is not thread-safe. If, for some perverse reason, you
need to parse command-line arguments simultaneously in different
threads, use different OptionParser instances.
"""
standard_option_list = []
def __init__(self,
usage=None,
option_list=None,
option_class=Option,
version=None,
conflict_handler="error",
description=None,
formatter=None,
add_help_option=True,
prog=None,
epilog=None):
OptionContainer.__init__(
self, option_class, conflict_handler, description)
self.set_usage(usage)
self.prog = prog
self.version = version
self.allow_interspersed_args = True
self.process_default_values = True
if formatter is None:
formatter = IndentedHelpFormatter()
self.formatter = formatter
self.formatter.set_parser(self)
self.epilog = epilog
# Populate the option list; initial sources are the
# standard_option_list class attribute, the 'option_list'
# argument, and (if applicable) the _add_version_option() and
# _add_help_option() methods.
self._populate_option_list(option_list,
add_help=add_help_option)
self._init_parsing_state()
def destroy(self):
"""
Declare that you are done with this OptionParser. This cleans up
reference cycles so the OptionParser (and all objects referenced by
it) can be garbage-collected promptly. After calling destroy(), the
OptionParser is unusable.
"""
OptionContainer.destroy(self)
for group in self.option_groups:
group.destroy()
del self.option_list
del self.option_groups
del self.formatter
# -- Private methods -----------------------------------------------
# (used by our or OptionContainer's constructor)
def _create_option_list(self):
self.option_list = []
self.option_groups = []
self._create_option_mappings()
def _add_help_option(self):
self.add_option("-h", "--help",
action="help",
help=_("show this help message and exit"))
def _add_version_option(self):
self.add_option("--version",
action="version",
help=_("show program's version number and exit"))
def _populate_option_list(self, option_list, add_help=True):
if self.standard_option_list:
self.add_options(self.standard_option_list)
if option_list:
self.add_options(option_list)
if self.version:
self._add_version_option()
if add_help:
self._add_help_option()
def _init_parsing_state(self):
# These are set in parse_args() for the convenience of callbacks.
self.rargs = None
self.largs = None
self.values = None
# -- Simple modifier methods ---------------------------------------
def set_usage(self, usage):
if usage is None:
self.usage = _("%prog [options]")
elif usage is SUPPRESS_USAGE:
self.usage = None
# For backwards compatibility with Optik 1.3 and earlier.
elif usage.lower().startswith("usage: "):
self.usage = usage[7:]
else:
self.usage = usage
def enable_interspersed_args(self):
"""Set parsing to not stop on the first non-option, allowing
interspersing switches with command arguments. This is the
default behavior. See also disable_interspersed_args() and the
class documentation description of the attribute
allow_interspersed_args."""
self.allow_interspersed_args = True
def disable_interspersed_args(self):
"""Set parsing to stop on the first non-option. Use this if
you have a command processor which runs another command that
has options of its own and you want to make sure these options
don't get confused.
"""
self.allow_interspersed_args = False
def set_process_default_values(self, process):
self.process_default_values = process
def set_default(self, dest, value):
self.defaults[dest] = value
def set_defaults(self, **kwargs):
self.defaults.update(kwargs)
def _get_all_options(self):
options = self.option_list[:]
for group in self.option_groups:
options.extend(group.option_list)
return options
def get_default_values(self):
if not self.process_default_values:
# Old, pre-Optik 1.5 behaviour.
return Values(self.defaults)
defaults = self.defaults.copy()
for option in self._get_all_options():
default = defaults.get(option.dest)
if isinstance(default, str):
opt_str = option.get_opt_string()
defaults[option.dest] = option.check_value(opt_str, default)
return Values(defaults)
# -- OptionGroup methods -------------------------------------------
def add_option_group(self, *args, **kwargs):
# XXX lots of overlap with OptionContainer.add_option()
if isinstance(args[0], str):
group = OptionGroup(self, *args, **kwargs)
elif len(args) == 1 and not kwargs:
group = args[0]
if not isinstance(group, OptionGroup):
raise TypeError("not an OptionGroup instance: %r" % group)
if group.parser is not self:
raise ValueError("invalid OptionGroup (wrong parser)")
else:
raise TypeError("invalid arguments")
self.option_groups.append(group)
return group
def get_option_group(self, opt_str):
option = (self._short_opt.get(opt_str) or
self._long_opt.get(opt_str))
if option and option.container is not self:
return option.container
return None
# -- Option-parsing methods ----------------------------------------
def _get_args(self, args):
if args is None:
return sys.argv[1:]
else:
return args[:] # don't modify caller's list
def parse_args(self, args=None, values=None):
"""
parse_args(args : [string] = sys.argv[1:],
values : Values = None)
-> (values : Values, args : [string])
Parse the command-line options found in 'args' (default:
sys.argv[1:]). Any errors result in a call to 'error()', which
by default prints the usage message to stderr and calls
sys.exit() with an error message. On success returns a pair
(values, args) where 'values' is a Values instance (with all
your option values) and 'args' is the list of arguments left
over after parsing options.
"""
rargs = self._get_args(args)
if values is None:
values = self.get_default_values()
# Store the halves of the argument list as attributes for the
# convenience of callbacks:
# rargs
# the rest of the command-line (the "r" stands for
# "remaining" or "right-hand")
# largs
# the leftover arguments -- ie. what's left after removing
# options and their arguments (the "l" stands for "leftover"
# or "left-hand")
self.rargs = rargs
self.largs = largs = []
self.values = values
try:
stop = self._process_args(largs, rargs, values)
except (BadOptionError, OptionValueError) as err:
self.error(str(err))
args = largs + rargs
return self.check_values(values, args)
def check_values(self, values, args):
"""
check_values(values : Values, args : [string])
-> (values : Values, args : [string])
Check that the supplied option values and leftover arguments are
valid. Returns the option values and leftover arguments
(possibly adjusted, possibly completely new -- whatever you
like). Default implementation just returns the passed-in
values; subclasses may override as desired.
"""
return (values, args)
def _process_args(self, largs, rargs, values):
"""_process_args(largs : [string],
rargs : [string],
values : Values)
Process command-line arguments and populate 'values', consuming
options and arguments from 'rargs'. If 'allow_interspersed_args' is
false, stop at the first non-option argument. If true, accumulate any
interspersed non-option arguments in 'largs'.
"""
while rargs:
arg = rargs[0]
# We handle bare "--" explicitly, and bare "-" is handled by the
# standard arg handler since the short arg case ensures that the
# len of the opt string is greater than 1.
if arg == "--":
del rargs[0]
return
elif arg[0:2] == "--":
# process a single long option (possibly with value(s))
self._process_long_opt(rargs, values)
elif arg[:1] == "-" and len(arg) > 1:
# process a cluster of short options (possibly with
# value(s) for the last one only)
self._process_short_opts(rargs, values)
elif self.allow_interspersed_args:
largs.append(arg)
del rargs[0]
else:
return # stop now, leave this arg in rargs
# Say this is the original argument list:
# [arg0, arg1, ..., arg(i-1), arg(i), arg(i+1), ..., arg(N-1)]
# ^
# (we are about to process arg(i)).
#
# Then rargs is [arg(i), ..., arg(N-1)] and largs is a *subset* of
# [arg0, ..., arg(i-1)] (any options and their arguments will have
# been removed from largs).
#
# The while loop will usually consume 1 or more arguments per pass.
# If it consumes 1 (eg. arg is an option that takes no arguments),
# then after _process_arg() is done the situation is:
#
# largs = subset of [arg0, ..., arg(i)]
# rargs = [arg(i+1), ..., arg(N-1)]
#
# If allow_interspersed_args is false, largs will always be
# *empty* -- still a subset of [arg0, ..., arg(i-1)], but
# not a very interesting subset!
def _match_long_opt(self, opt):
"""_match_long_opt(opt : string) -> string
Determine which long option string 'opt' matches, ie. which one
it is an unambiguous abbreviation for. Raises BadOptionError if
'opt' doesn't unambiguously match any long option string.
"""
return _match_abbrev(opt, self._long_opt)
def _process_long_opt(self, rargs, values):
arg = rargs.pop(0)
# Value explicitly attached to arg? Pretend it's the next
# argument.
if "=" in arg:
(opt, next_arg) = arg.split("=", 1)
rargs.insert(0, next_arg)
had_explicit_value = True
else:
opt = arg
had_explicit_value = False
opt = self._match_long_opt(opt)
option = self._long_opt[opt]
if option.takes_value():
nargs = option.nargs
if len(rargs) < nargs:
self.error(ngettext(
"%(option)s option requires %(number)d argument",
"%(option)s option requires %(number)d arguments",
nargs) % {"option": opt, "number": nargs})
elif nargs == 1:
value = rargs.pop(0)
else:
value = tuple(rargs[0:nargs])
del rargs[0:nargs]
elif had_explicit_value:
self.error(_("%s option does not take a value") % opt)
else:
value = None
option.process(opt, value, values, self)
def _process_short_opts(self, rargs, values):
arg = rargs.pop(0)
stop = False
i = 1
for ch in arg[1:]:
opt = "-" + ch
option = self._short_opt.get(opt)
i += 1 # we have consumed a character
if not option:
raise BadOptionError(opt)
if option.takes_value():
# Any characters left in arg? Pretend they're the
# next arg, and stop consuming characters of arg.
if i < len(arg):
rargs.insert(0, arg[i:])
stop = True
nargs = option.nargs
if len(rargs) < nargs:
self.error(ngettext(
"%(option)s option requires %(number)d argument",
"%(option)s option requires %(number)d arguments",
nargs) % {"option": opt, "number": nargs})
elif nargs == 1:
value = rargs.pop(0)
else:
value = tuple(rargs[0:nargs])
del rargs[0:nargs]
else: # option doesn't take a value
value = None
option.process(opt, value, values, self)
if stop:
break
# -- Feedback methods ----------------------------------------------
def get_prog_name(self):
if self.prog is None:
return os.path.basename(sys.argv[0])
else:
return self.prog
def expand_prog_name(self, s):
return s.replace("%prog", self.get_prog_name())
def get_description(self):
return self.expand_prog_name(self.description)
def exit(self, status=0, msg=None):
if msg:
sys.stderr.write(msg)
sys.exit(status)
def error(self, msg):
"""error(msg : string)
Print a usage message incorporating 'msg' to stderr and exit.
If you override this in a subclass, it should not return -- it
should either exit or raise an exception.
"""
self.print_usage(sys.stderr)
self.exit(2, "%s: error: %s\n" % (self.get_prog_name(), msg))
def get_usage(self):
if self.usage:
return self.formatter.format_usage(
self.expand_prog_name(self.usage))
else:
return ""
def print_usage(self, file=None):
"""print_usage(file : file = stdout)
Print the usage message for the current program (self.usage) to
'file' (default stdout). Any occurrence of the string "%prog" in
self.usage is replaced with the name of the current program
(basename of sys.argv[0]). Does nothing if self.usage is empty
or not defined.
"""
if self.usage:
print(self.get_usage(), file=file)
def get_version(self):
if self.version:
return self.expand_prog_name(self.version)
else:
return ""
def print_version(self, file=None):
"""print_version(file : file = stdout)
Print the version message for this program (self.version) to
'file' (default stdout). As with print_usage(), any occurrence
of "%prog" in self.version is replaced by the current program's
name. Does nothing if self.version is empty or undefined.
"""
if self.version:
print(self.get_version(), file=file)
def format_option_help(self, formatter=None):
if formatter is None:
formatter = self.formatter
formatter.store_option_strings(self)
result = []
result.append(formatter.format_heading(_("Options")))
formatter.indent()
if self.option_list:
result.append(OptionContainer.format_option_help(self, formatter))
result.append("\n")
for group in self.option_groups:
result.append(group.format_help(formatter))
result.append("\n")
formatter.dedent()
# Drop the last "\n", or the header if no options or option groups:
return "".join(result[:-1])
def format_epilog(self, formatter):
return formatter.format_epilog(self.epilog)
def format_help(self, formatter=None):
if formatter is None:
formatter = self.formatter
result = []
if self.usage:
result.append(self.get_usage() + "\n")
if self.description:
result.append(self.format_description(formatter) + "\n")
result.append(self.format_option_help(formatter))
result.append(self.format_epilog(formatter))
return "".join(result)
def print_help(self, file=None):
"""print_help(file : file = stdout)
Print an extended help message, listing all options and any
help text provided with them, to 'file' (default stdout).
"""
if file is None:
file = sys.stdout
file.write(self.format_help())
# class OptionParser
def _match_abbrev(s, wordmap):
"""_match_abbrev(s : string, wordmap : {string : Option}) -> string
Return the string key in 'wordmap' for which 's' is an unambiguous
abbreviation. If 's' is found to be ambiguous or doesn't match any of
'wordmap', raise BadOptionError.
"""
# Is there an exact match?
if s in wordmap:
return s
else:
# Isolate all words with s as a prefix.
possibilities = [word for word in wordmap.keys()
if word.startswith(s)]
# No exact match, so there had better be just one possibility.
if len(possibilities) == 1:
return possibilities[0]
elif not possibilities:
raise BadOptionError(s)
else:
# More than one possible completion: ambiguous prefix.
possibilities.sort()
raise AmbiguousOptionError(s, possibilities)
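# Illustrative sketch (demo name ours): _match_abbrev() resolves a unique
# prefix and complains about an ambiguous one. Only the keys of the word
# map matter, so plain None values suffice here.
def _demo_match_abbrev():
    opts = {"--verbose": None, "--version": None, "--quiet": None}
    assert _match_abbrev("--q", opts) == "--quiet"
    try:
        _match_abbrev("--ver", opts)
    except AmbiguousOptionError as err:
        assert err.possibilities == ["--verbose", "--version"]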
# Some day, there might be many Option classes. As of Optik 1.3, the
# preferred way to instantiate Options is indirectly, via make_option(),
# which will become a factory function when there are many Option
# classes.
make_option = Option
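# Illustrative end-to-end sketch (demo name, option names and file names are
# ours): a parser with an option group, fed an explicit argument list so the
# demo has no side effects.
def _demo_option_parser():
    parser = OptionParser(usage="%prog [options] input")
    parser.add_option("-v", action="count", dest="verbosity", default=0)
    group = OptionGroup(parser, "Output", "Where results go.")
    group.add_option("-o", "--output", metavar="FILE")
    parser.add_option_group(group)
    options, args = parser.parse_args(["-vv", "-o", "out.txt", "input.csv"])
    assert options.verbosity == 2
    assert options.output == "out.txt"
    assert args == ["input.csv"]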
r"""OS routines for NT or Posix depending on what system we're on.
This exports:
- all functions from posix, nt or ce, e.g. unlink, stat, etc.
- os.path is either posixpath or ntpath
- os.name is either 'posix', 'nt' or 'ce'.
- os.curdir is a string representing the current directory ('.' or ':')
- os.pardir is a string representing the parent directory ('..' or '::')
- os.sep is the (or a most common) pathname separator ('/' or ':' or '\\')
- os.extsep is the extension separator (always '.')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
- os.defpath is the default search path for executables
- os.devnull is the file path of the null device ('/dev/null', etc.)
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then
only use functions that are defined by all platforms (e.g., unlink
and opendir), and leave all pathname manipulation to os.path
(e.g., split and join).
"""
#'
import sys, errno
import stat as st
_names = sys.builtin_module_names
# Note: more names are added to __all__ later.
__all__ = ["altsep", "curdir", "pardir", "sep", "pathsep", "linesep",
"defpath", "name", "path", "devnull", "SEEK_SET", "SEEK_CUR",
"SEEK_END", "fsencode", "fsdecode", "get_exec_path", "fdopen",
"popen", "extsep"]
def _exists(name):
return name in globals()
def _get_exports_list(module):
try:
return list(module.__all__)
except AttributeError:
return [n for n in dir(module) if n[0] != '_']
# Any new dependencies of the os module and/or changes in path separator
# requires updating importlib as well.
if 'posix' in _names:
name = 'posix'
linesep = '\n'
from posix import *
try:
from posix import _exit
__all__.append('_exit')
except ImportError:
pass
import posixpath as path
try:
from posix import _have_functions
except ImportError:
pass
elif 'nt' in _names:
name = 'nt'
linesep = '\r\n'
from nt import *
try:
from nt import _exit
__all__.append('_exit')
except ImportError:
pass
import ntpath as path
import nt
__all__.extend(_get_exports_list(nt))
del nt
try:
from nt import _have_functions
except ImportError:
pass
elif 'ce' in _names:
name = 'ce'
linesep = '\r\n'
from ce import *
try:
from ce import _exit
__all__.append('_exit')
except ImportError:
pass
# We can use the standard Windows path.
import ntpath as path
import ce
__all__.extend(_get_exports_list(ce))
del ce
try:
from ce import _have_functions
except ImportError:
pass
else:
raise ImportError('no os specific module found')
sys.modules['os.path'] = path
from os.path import (curdir, pardir, sep, pathsep, defpath, extsep, altsep,
devnull)
del _names
if _exists("_have_functions"):
_globals = globals()
def _add(str, fn):
if (fn in _globals) and (str in _have_functions):
_set.add(_globals[fn])
_set = set()
_add("HAVE_FACCESSAT", "access")
_add("HAVE_FCHMODAT", "chmod")
_add("HAVE_FCHOWNAT", "chown")
_add("HAVE_FSTATAT", "stat")
_add("HAVE_FUTIMESAT", "utime")
_add("HAVE_LINKAT", "link")
_add("HAVE_MKDIRAT", "mkdir")
_add("HAVE_MKFIFOAT", "mkfifo")
_add("HAVE_MKNODAT", "mknod")
_add("HAVE_OPENAT", "open")
_add("HAVE_READLINKAT", "readlink")
_add("HAVE_RENAMEAT", "rename")
_add("HAVE_SYMLINKAT", "symlink")
_add("HAVE_UNLINKAT", "unlink")
_add("HAVE_UNLINKAT", "rmdir")
_add("HAVE_UTIMENSAT", "utime")
supports_dir_fd = _set
_set = set()
_add("HAVE_FACCESSAT", "access")
supports_effective_ids = _set
_set = set()
_add("HAVE_FCHDIR", "chdir")
_add("HAVE_FCHMOD", "chmod")
_add("HAVE_FCHOWN", "chown")
_add("HAVE_FDOPENDIR", "listdir")
_add("HAVE_FEXECVE", "execve")
_set.add(stat) # fstat always works
_add("HAVE_FTRUNCATE", "truncate")
_add("HAVE_FUTIMENS", "utime")
_add("HAVE_FUTIMES", "utime")
_add("HAVE_FPATHCONF", "pathconf")
if _exists("statvfs") and _exists("fstatvfs"): # mac os x10.3
_add("HAVE_FSTATVFS", "statvfs")
supports_fd = _set
_set = set()
_add("HAVE_FACCESSAT", "access")
# Some platforms don't support lchmod(). Often the function exists
anyway, as a stub that always returns ENOTSUP or perhaps EOPNOTSUPP.
# (No, I don't know why that's a good design.) ./configure will detect
# this and reject it--so HAVE_LCHMOD still won't be defined on such
# platforms. This is Very Helpful.
#
# However, sometimes platforms without a working lchmod() *do* have
# fchmodat(). (Examples: Linux kernel 3.2 with glibc 2.15,
# OpenIndiana 3.x.) And fchmodat() has a flag that theoretically makes
# it behave like lchmod(). So in theory it would be a suitable
# replacement for lchmod(). But when lchmod() doesn't work, fchmodat()'s
# flag doesn't work *either*. Sadly ./configure isn't sophisticated
# enough to detect this condition--it only determines whether or not
# fchmodat() minimally works.
#
# Therefore we simply ignore fchmodat() when deciding whether or not
# os.chmod supports follow_symlinks. Just checking lchmod() is
# sufficient. After all--if you have a working fchmodat(), your
# lchmod() almost certainly works too.
#
# _add("HAVE_FCHMODAT", "chmod")
_add("HAVE_FCHOWNAT", "chown")
_add("HAVE_FSTATAT", "stat")
_add("HAVE_LCHFLAGS", "chflags")
_add("HAVE_LCHMOD", "chmod")
if _exists("lchown"): # mac os x10.3
_add("HAVE_LCHOWN", "chown")
_add("HAVE_LINKAT", "link")
_add("HAVE_LUTIMES", "utime")
_add("HAVE_LSTAT", "stat")
_add("HAVE_FSTATAT", "stat")
_add("HAVE_UTIMENSAT", "utime")
_add("MS_WINDOWS", "stat")
supports_follow_symlinks = _set
del _set
del _have_functions
del _globals
del _add
# Python uses fixed values for the SEEK_ constants; they are mapped
# to native constants if necessary in posixmodule.c
# Other possible SEEK values are directly imported from posixmodule.c
SEEK_SET = 0
SEEK_CUR = 1
SEEK_END = 2
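# Illustrative sketch (demo and file names ours): the SEEK_* constants
# combine with the low-level open()/lseek()/read() imported above from the
# platform module.
def _demo_seek(tmp="demo-seek.bin"):
    fd = open(tmp, O_RDWR | O_CREAT, 0o600)
    try:
        write(fd, b"hello world")
        lseek(fd, 6, SEEK_SET)            # absolute: jump to "world"
        assert read(fd, 5) == b"world"
        lseek(fd, -5, SEEK_END)           # relative to end of file
        assert read(fd, 5) == b"world"
    finally:
        close(fd)
        unlink(tmp)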
# Super directory utilities.
# (Inspired by Eric Raymond; the doc strings are mostly his)
def makedirs(name, mode=0o777, exist_ok=False):
"""makedirs(name [, mode=0o777][, exist_ok=False])
Super-mkdir; create a leaf directory and all intermediate ones. Works like
mkdir, except that any intermediate path segment (not just the rightmost)
will be created if it does not exist. If the target directory already
exists, raise an OSError if exist_ok is False. Otherwise no exception is
raised. This is recursive.
"""
head, tail = path.split(name)
if not tail:
head, tail = path.split(head)
if head and tail and not path.exists(head):
try:
makedirs(head, mode, exist_ok)
except FileExistsError:
# Defeats race condition when another thread created the path
pass
cdir = curdir
if isinstance(tail, bytes):
cdir = bytes(curdir, 'ASCII')
if tail == cdir: # xxx/newdir/. exists if xxx/newdir exists
return
try:
mkdir(name, mode)
except OSError:
# Cannot rely on checking for EEXIST, since the operating system
# could give priority to other errors like EACCES or EROFS
if not exist_ok or not path.isdir(name):
raise
def removedirs(name):
"""removedirs(name)
Super-rmdir; remove a leaf directory and all empty intermediate
ones. Works like rmdir except that, if the leaf directory is
successfully removed, directories corresponding to rightmost path
segments will be pruned away until either the whole path is
consumed or an error occurs. Errors during this latter phase are
ignored -- they generally mean that a directory was not empty.
"""
rmdir(name)
head, tail = path.split(name)
if not tail:
head, tail = path.split(head)
while head and tail:
try:
rmdir(head)
except OSError:
break
head, tail = path.split(head)
def renames(old, new):
"""renames(old, new)
Super-rename; create directories as necessary and delete any left
empty. Works like rename, except creation of any intermediate
directories needed to make the new pathname good is attempted
first. After the rename, directories corresponding to rightmost
path segments of the old name will be pruned until either the
whole path is consumed or a nonempty directory is found.
Note: this function can fail with the new directory structure made
if you lack permissions needed to unlink the leaf directory or
file.
"""
head, tail = path.split(new)
if head and tail and not path.exists(head):
makedirs(head)
rename(old, new)
head, tail = path.split(old)
if head and tail:
try:
removedirs(head)
except OSError:
pass
__all__.extend(["makedirs", "removedirs", "renames"])
def walk(top, topdown=True, onerror=None, followlinks=False):
"""Directory tree generator.
For each directory in the directory tree rooted at top (including top
itself, but excluding '.' and '..'), yields a 3-tuple
dirpath, dirnames, filenames
dirpath is a string, the path to the directory. dirnames is a list of
the names of the subdirectories in dirpath (excluding '.' and '..').
filenames is a list of the names of the non-directory files in dirpath.
Note that the names in the lists are just names, with no path components.
To get a full path (which begins with top) to a file or directory in
dirpath, do os.path.join(dirpath, name).
If optional arg 'topdown' is true or not specified, the triple for a
directory is generated before the triples for any of its subdirectories
(directories are generated top down). If topdown is false, the triple
for a directory is generated after the triples for all of its
subdirectories (directories are generated bottom up).
When topdown is true, the caller can modify the dirnames list in-place
(e.g., via del or slice assignment), and walk will only recurse into the
subdirectories whose names remain in dirnames; this can be used to prune the
search, or to impose a specific order of visiting. Modifying dirnames when
topdown is false is ineffective, since the directories in dirnames have
already been generated by the time dirnames itself is generated. No matter
the value of topdown, the list of subdirectories is retrieved before the
tuples for the directory and its subdirectories are generated.
By default errors from the os.listdir() call are ignored. If
optional arg 'onerror' is specified, it should be a function; it
will be called with one argument, an OSError instance. It can
report the error to continue with the walk, or raise the exception
to abort the walk. Note that the filename is available as the
filename attribute of the exception object.
By default, os.walk does not follow symbolic links to subdirectories on
systems that support them. In order to get this functionality, set the
optional argument 'followlinks' to true.
Caution: if you pass a relative pathname for top, don't change the
current working directory between resumptions of walk. walk never
changes the current directory, and assumes that the client doesn't
either.
Example:
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
print(root, "consumes", end="")
print(sum([getsize(join(root, name)) for name in files]), end="")
print("bytes in", len(files), "non-directory files")
if 'CVS' in dirs:
dirs.remove('CVS') # don't visit CVS directories
"""
islink, join, isdir = path.islink, path.join, path.isdir
# We may not have read permission for top, in which case we can't
# get a list of the files the directory contains. os.walk
# always suppressed the exception then, rather than blow up for a
# minor reason when (say) a thousand readable directories are still
# left to visit. That logic is copied here.
try:
# Note that listdir is global in this module due
# to earlier import-*.
names = listdir(top)
except OSError as err:
if onerror is not None:
onerror(err)
return
dirs, nondirs = [], []
for name in names:
if isdir(join(top, name)):
dirs.append(name)
else:
nondirs.append(name)
if topdown:
yield top, dirs, nondirs
for name in dirs:
new_path = join(top, name)
if followlinks or not islink(new_path):
yield from walk(new_path, topdown, onerror, followlinks)
if not topdown:
yield top, dirs, nondirs
__all__.append("walk")
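# --- Editor's example (illustrative sketch, not part of the module) ---
# Pruning with walk(): while topdown is true, editing `dirs` in place
# keeps the walk out of the pruned subtrees. '.git' is illustrative.
def _example_walk_prune(top):
    for root, dirs, files in walk(top):
        dirs[:] = [d for d in dirs if d != '.git']   # prune in place
        for name in files:
            yield path.join(root, name)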
if {open, stat} <= supports_dir_fd and {listdir, stat} <= supports_fd:
def fwalk(top=".", topdown=True, onerror=None, *, follow_symlinks=False, dir_fd=None):
"""Directory tree generator.
This behaves exactly like walk(), except that it yields a 4-tuple
dirpath, dirnames, filenames, dirfd
`dirpath`, `dirnames` and `filenames` are identical to walk() output,
and `dirfd` is a file descriptor referring to the directory `dirpath`.
The advantage of fwalk() over walk() is that it's safe against symlink
races (when follow_symlinks is False).
If dir_fd is not None, it should be a file descriptor open to a directory,
and top should be relative; top will then be relative to that directory.
(dir_fd is always supported for fwalk.)
Caution:
Since fwalk() yields file descriptors, those are only valid until the
next iteration step, so you should dup() them if you want to keep them
for a longer period.
Example:
import os
for root, dirs, files, rootfd in os.fwalk('python/Lib/email'):
print(root, "consumes", end="")
print(sum([os.stat(name, dir_fd=rootfd).st_size for name in files]),
end="")
print("bytes in", len(files), "non-directory files")
if 'CVS' in dirs:
dirs.remove('CVS') # don't visit CVS directories
"""
# Note: To guard against symlink races, we use the standard
# lstat()/open()/fstat() trick.
orig_st = stat(top, follow_symlinks=False, dir_fd=dir_fd)
topfd = open(top, O_RDONLY, dir_fd=dir_fd)
try:
if (follow_symlinks or (st.S_ISDIR(orig_st.st_mode) and
path.samestat(orig_st, stat(topfd)))):
yield from _fwalk(topfd, top, topdown, onerror, follow_symlinks)
finally:
close(topfd)
def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
# Note: This uses O(depth of the directory tree) file descriptors: if
# necessary, it can be adapted to only require O(1) FDs, see issue
# #13734.
names = listdir(topfd)
dirs, nondirs = [], []
for name in names:
try:
# Here, we don't use AT_SYMLINK_NOFOLLOW to be consistent with
# walk() which reports symlinks to directories as directories.
# We do however check for symlinks before recursing into
# a subdirectory.
if st.S_ISDIR(stat(name, dir_fd=topfd).st_mode):
dirs.append(name)
else:
nondirs.append(name)
except FileNotFoundError:
try:
# Add dangling symlinks, ignore disappeared files
if st.S_ISLNK(stat(name, dir_fd=topfd, follow_symlinks=False)
.st_mode):
nondirs.append(name)
except FileNotFoundError:
continue
if topdown:
yield toppath, dirs, nondirs, topfd
for name in dirs:
try:
orig_st = stat(name, dir_fd=topfd, follow_symlinks=follow_symlinks)
dirfd = open(name, O_RDONLY, dir_fd=topfd)
except OSError as err:
if onerror is not None:
onerror(err)
return
try:
if follow_symlinks or path.samestat(orig_st, stat(dirfd)):
dirpath = path.join(toppath, name)
yield from _fwalk(dirfd, dirpath, topdown, onerror, follow_symlinks)
finally:
close(dirfd)
if not topdown:
yield toppath, dirs, nondirs, topfd
__all__.append("fwalk")
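# --- Editor's example (illustrative sketch, not part of the module) ---
# fwalk() yields a directory file descriptor alongside the usual triple,
# so per-file stat() calls can be made relative to it via dir_fd, which
# is what makes the traversal robust against symlink races. Note that
# fwalk() only exists where dir_fd/fd support is available (see guard).
def _example_fwalk_total_size(top):
    total = 0
    for root, dirs, files, rootfd in fwalk(top):
        total += sum(stat(name, dir_fd=rootfd).st_size for name in files)
    return total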
# Make sure os.environ exists, at least
try:
environ
except NameError:
environ = {}
def execl(file, *args):
"""execl(file, *args)
Execute the executable file with argument list args, replacing the
current process. """
execv(file, args)
def execle(file, *args):
"""execle(file, *args, env)
Execute the executable file with argument list args and
environment env, replacing the current process. """
env = args[-1]
execve(file, args[:-1], env)
def execlp(file, *args):
"""execlp(file, *args)
Execute the executable file (which is searched for along $PATH)
with argument list args, replacing the current process. """
execvp(file, args)
def execlpe(file, *args):
"""execlpe(file, *args, env)
Execute the executable file (which is searched for along $PATH)
with argument list args and environment env, replacing the current
process. """
env = args[-1]
execvpe(file, args[:-1], env)
def execvp(file, args):
"""execvp(file, args)
Execute the executable file (which is searched for along $PATH)
with argument list args, replacing the current process.
args may be a list or tuple of strings. """
_execvpe(file, args)
def execvpe(file, args, env):
"""execvpe(file, args, env)
Execute the executable file (which is searched for along $PATH)
with argument list args and environment env, replacing the
current process.
args may be a list or tuple of strings. """
_execvpe(file, args, env)
__all__.extend(["execl","execle","execlp","execlpe","execvp","execvpe"])
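# --- Editor's example (illustrative sketch, not part of the module) ---
# The exec* family replaces the current process image, so a successful
# call never returns. By convention args[0] repeats the program name.
# '/bin/echo' assumes a POSIX system and is purely illustrative; the
# function is defined here but never called.
def _example_exec_echo():
    execv('/bin/echo', ('/bin/echo', 'hello from the new process image'))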
def _execvpe(file, args, env=None):
if env is not None:
exec_func = execve
argrest = (args, env)
else:
exec_func = execv
argrest = (args,)
env = environ
head, tail = path.split(file)
if head:
exec_func(file, *argrest)
return
last_exc = saved_exc = None
saved_tb = None
path_list = get_exec_path(env)
if name != 'nt':
file = fsencode(file)
path_list = map(fsencode, path_list)
for dir in path_list:
fullname = path.join(dir, file)
try:
exec_func(fullname, *argrest)
except OSError as e:
last_exc = e
tb = sys.exc_info()[2]
if (e.errno != errno.ENOENT and e.errno != errno.ENOTDIR
and saved_exc is None):
saved_exc = e
saved_tb = tb
if saved_exc:
raise saved_exc.with_traceback(saved_tb)
raise last_exc.with_traceback(tb)
def get_exec_path(env=None):
"""Returns the sequence of directories that will be searched for the
named executable (similar to a shell) when launching a process.
*env* must be an environment variable dict or None. If *env* is None,
os.environ will be used.
"""
# Use a local import instead of a global import to limit the number of
# modules loaded at startup: the os module is always loaded at startup by
# Python. It may also avoid a bootstrap issue.
import warnings
if env is None:
env = environ
# {b'PATH': ...}.get('PATH') and {'PATH': ...}.get(b'PATH') emit a
# BytesWarning when using python -b or python -bb: ignore the warning
with warnings.catch_warnings():
warnings.simplefilter("ignore", BytesWarning)
try:
path_list = env.get('PATH')
except TypeError:
path_list = None
if supports_bytes_environ:
try:
path_listb = env[b'PATH']
except (KeyError, TypeError):
pass
else:
if path_list is not None:
raise ValueError(
"env cannot contain 'PATH' and b'PATH' keys")
path_list = path_listb
if path_list is not None and isinstance(path_list, bytes):
path_list = fsdecode(path_list)
if path_list is None:
path_list = defpath
return path_list.split(pathsep)
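# --- Editor's example (illustrative sketch, not part of the module) ---
# get_exec_path() splits PATH out of the given mapping (or os.environ
# when env is None), falling back to defpath when no PATH entry exists.
def _example_exec_path():
    # On POSIX (pathsep ':') this returns ['/usr/local/bin', '/usr/bin'].
    return get_exec_path({'PATH': '/usr/local/bin:/usr/bin'})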
# Change environ to automatically call putenv() and unsetenv() if they exist.
from _collections_abc import MutableMapping
class _Environ(MutableMapping):
def __init__(self, data, encodekey, decodekey, encodevalue, decodevalue, putenv, unsetenv):
self.encodekey = encodekey
self.decodekey = decodekey
self.encodevalue = encodevalue
self.decodevalue = decodevalue
self.putenv = putenv
self.unsetenv = unsetenv
self._data = data
def __getitem__(self, key):
try:
value = self._data[self.encodekey(key)]
except KeyError:
# raise KeyError with the original key value
raise KeyError(key) from None
return self.decodevalue(value)
def __setitem__(self, key, value):
key = self.encodekey(key)
value = self.encodevalue(value)
self.putenv(key, value)
self._data[key] = value
def __delitem__(self, key):
encodedkey = self.encodekey(key)
self.unsetenv(encodedkey)
try:
del self._data[encodedkey]
except KeyError:
# raise KeyError with the original key value
raise KeyError(key) from None
def __iter__(self):
for key in self._data:
yield self.decodekey(key)
def __len__(self):
return len(self._data)
def __repr__(self):
return 'environ({{{}}})'.format(', '.join(
('{!r}: {!r}'.format(self.decodekey(key), self.decodevalue(value))
for key, value in self._data.items())))
def copy(self):
return dict(self)
def setdefault(self, key, value):
if key not in self:
self[key] = value
return self[key]
try:
_putenv = putenv
except NameError:
_putenv = lambda key, value: None
else:
if "putenv" not in __all__:
__all__.append("putenv")
try:
_unsetenv = unsetenv
except NameError:
_unsetenv = lambda key: _putenv(key, "")
else:
if "unsetenv" not in __all__:
__all__.append("unsetenv")
def _createenviron():
if name == 'nt':
# Where Env Var Names Must Be UPPERCASE
def check_str(value):
if not isinstance(value, str):
raise TypeError("str expected, not %s" % type(value).__name__)
return value
encode = check_str
decode = str
def encodekey(key):
return encode(key).upper()
data = {}
for key, value in environ.items():
data[encodekey(key)] = value
else:
# Where Env Var Names Can Be Mixed Case
encoding = sys.getfilesystemencoding()
def encode(value):
if not isinstance(value, str):
raise TypeError("str expected, not %s" % type(value).__name__)
if sys.implementation.name == "ironpython":
return value
return value.encode(encoding, 'surrogateescape')
def decode(value):
if sys.implementation.name == "ironpython":
return value
return value.decode(encoding, 'surrogateescape')
encodekey = encode
data = environ
return _Environ(data,
encodekey, decode,
encode, decode,
_putenv, _unsetenv)
# unicode environ
environ = _createenviron()
del _createenviron
def getenv(key, default=None):
"""Get an environment variable, return None if it doesn't exist.
The optional second argument can specify an alternate default.
key, default and the result are str."""
return environ.get(key, default)
supports_bytes_environ = (name != 'nt')
__all__.extend(("getenv", "supports_bytes_environ"))
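# --- Editor's example (illustrative sketch, not part of the module) ---
# environ is a MutableMapping: assignment calls putenv() and deletion
# calls unsetenv() behind the scenes. 'DEMO_VAR' is an illustrative name.
def _example_environ_roundtrip():
    environ['DEMO_VAR'] = 'value'       # exported to the process env
    found = getenv('DEMO_VAR')          # -> 'value'
    del environ['DEMO_VAR']             # removed from the process env
    return found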
if supports_bytes_environ:
def _check_bytes(value):
if not isinstance(value, bytes):
raise TypeError("bytes expected, not %s" % type(value).__name__)
return value
# bytes environ
environb = _Environ(environ._data,
_check_bytes, bytes,
_check_bytes, bytes,
_putenv, _unsetenv)
del _check_bytes
def getenvb(key, default=None):
"""Get an environment variable, return None if it doesn't exist.
The optional second argument can specify an alternate default.
key, default and the result are bytes."""
return environb.get(key, default)
__all__.extend(("environb", "getenvb"))
def _fscodec():
encoding = sys.getfilesystemencoding()
if encoding == 'mbcs':
errors = 'strict'
else:
errors = 'surrogateescape'
def fsencode(filename):
"""
Encode a str filename to the filesystem encoding with the 'surrogateescape'
error handler; bytes are returned unchanged. On Windows, use the 'strict' error handler if
the file system encoding is 'mbcs' (which is the default encoding).
"""
if isinstance(filename, bytes):
return filename
elif isinstance(filename, str):
return filename.encode(encoding, errors)
else:
raise TypeError("expected bytes or str, not %s" % type(filename).__name__)
def fsdecode(filename):
"""
Decode a bytes filename from the filesystem encoding with the 'surrogateescape'
error handler; str is returned unchanged. On Windows, use the 'strict' error handler if
the file system encoding is 'mbcs' (which is the default encoding).
"""
if isinstance(filename, str):
return filename
elif isinstance(filename, bytes):
return filename.decode(encoding, errors)
else:
raise TypeError("expected bytes or str, not %s" % type(filename).__name__)
return fsencode, fsdecode
fsencode, fsdecode = _fscodec()
del _fscodec
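# --- Editor's example (illustrative sketch, not part of the module) ---
# fsencode()/fsdecode() round-trip names through the filesystem encoding
# with the 'surrogateescape' handler, so undecodable bytes survive.
def _example_fscodec_roundtrip(name='caf\xe9.txt'):
    encoded = fsencode(name)     # str -> bytes (bytes pass through)
    return fsdecode(encoded)     # bytes -> the original str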
# Supply spawn*() (probably only for Unix)
if _exists("fork") and not _exists("spawnv") and _exists("execv"):
P_WAIT = 0
P_NOWAIT = P_NOWAITO = 1
__all__.extend(["P_WAIT", "P_NOWAIT", "P_NOWAITO"])
# XXX Should we support P_DETACH? I suppose it could fork()**2
# and close the std I/O streams. Also, P_OVERLAY is the same
# as execv*()?
def _spawnvef(mode, file, args, env, func):
# Internal helper; func is the exec*() function to use
pid = fork()
if not pid:
# Child
try:
if env is None:
func(file, args)
else:
func(file, args, env)
except:
_exit(127)
else:
# Parent
if mode == P_NOWAIT:
return pid # Caller is responsible for waiting!
while 1:
wpid, sts = waitpid(pid, 0)
if WIFSTOPPED(sts):
continue
elif WIFSIGNALED(sts):
return -WTERMSIG(sts)
elif WIFEXITED(sts):
return WEXITSTATUS(sts)
else:
raise OSError("Not stopped, signaled or exited???")
def spawnv(mode, file, args):
"""spawnv(mode, file, args) -> integer
Execute file with arguments from args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, None, execv)
def spawnve(mode, file, args, env):
"""spawnve(mode, file, args, env) -> integer
Execute file with arguments from args in a subprocess with the
specified environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, env, execve)
# Note: spawnvp[e] isn't currently supported on Windows
def spawnvp(mode, file, args):
"""spawnvp(mode, file, args) -> integer
Execute file (which is looked for along $PATH) with arguments from
args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, None, execvp)
def spawnvpe(mode, file, args, env):
"""spawnvpe(mode, file, args, env) -> integer
Execute file (which is looked for along $PATH) with arguments from
args in a subprocess with the supplied environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return _spawnvef(mode, file, args, env, execvpe)
__all__.extend(["spawnv", "spawnve", "spawnvp", "spawnvpe"])
if _exists("spawnv"):
# These aren't supplied by the basic Windows code
# but can be easily implemented in Python
def spawnl(mode, file, *args):
"""spawnl(mode, file, *args) -> integer
Execute file with arguments from args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return spawnv(mode, file, args)
def spawnle(mode, file, *args):
"""spawnle(mode, file, *args, env) -> integer
Execute file with arguments from args in a subprocess with the
supplied environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
env = args[-1]
return spawnve(mode, file, args[:-1], env)
__all__.extend(["spawnl", "spawnle"])
if _exists("spawnvp"):
# At the moment, Windows doesn't implement spawnvp[e],
# so it won't have spawnlp[e] either.
def spawnlp(mode, file, *args):
"""spawnlp(mode, file, *args) -> integer
Execute file (which is looked for along $PATH) with arguments from
args in a subprocess.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
return spawnvp(mode, file, args)
def spawnlpe(mode, file, *args):
"""spawnlpe(mode, file, *args, env) -> integer
Execute file (which is looked for along $PATH) with arguments from
args in a subprocess with the supplied environment.
If mode == P_NOWAIT return the pid of the process.
If mode == P_WAIT return the process's exit code if it exits normally;
otherwise return -SIG, where SIG is the signal that killed it. """
env = args[-1]
return spawnvpe(mode, file, args[:-1], env)
__all__.extend(["spawnlp", "spawnlpe"])
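# --- Editor's example (illustrative sketch, not part of the module) ---
# With P_WAIT the spawn* functions run the program in a child process
# and return its exit status; with P_NOWAIT they return the child's pid.
# '/bin/true' assumes a POSIX system, as does spawnvp itself.
def _example_spawn_wait():
    return spawnvp(P_WAIT, '/bin/true', ('/bin/true',))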
# Supply os.popen()
def popen(cmd, mode="r", buffering=-1):
if not isinstance(cmd, str):
raise TypeError("invalid cmd type (%s, expected string)" % type(cmd))
if mode not in ("r", "w"):
raise ValueError("invalid mode %r" % mode)
if buffering == 0 or buffering is None:
raise ValueError("popen() does not support unbuffered streams")
import subprocess, io
if mode == "r":
proc = subprocess.Popen(cmd,
shell=True,
stdout=subprocess.PIPE,
bufsize=buffering)
return _wrap_close(io.TextIOWrapper(proc.stdout), proc)
else:
proc = subprocess.Popen(cmd,
shell=True,
stdin=subprocess.PIPE,
bufsize=buffering)
return _wrap_close(io.TextIOWrapper(proc.stdin), proc)
# Helper for popen() -- a proxy for a file whose close waits for the process
class _wrap_close:
def __init__(self, stream, proc):
self._stream = stream
self._proc = proc
def close(self):
self._stream.close()
returncode = self._proc.wait()
if returncode == 0:
return None
if name == 'nt':
return returncode
else:
return returncode << 8 # Shift left to match old behavior
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
def __getattr__(self, name):
return getattr(self._stream, name)
def __iter__(self):
return iter(self._stream)
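# --- Editor's example (illustrative sketch, not part of the module) ---
# popen() returns a file-like _wrap_close; closing it (here via the
# with-statement) waits for the child and reports its exit status.
def _example_popen_read():
    with popen('echo hello') as stream:   # 'echo hello' is illustrative
        return stream.read()              # -> 'hello\n'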
# Supply os.fdopen()
def fdopen(fd, *args, **kwargs):
if not isinstance(fd, int):
raise TypeError("invalid fd type (%s, expected integer)" % type(fd))
import io
return io.open(fd, *args, **kwargs)
import fnmatch
import functools
import io
import ntpath
import os
import posixpath
import re
import sys
from collections import Sequence
from contextlib import contextmanager
from errno import EINVAL, ENOENT, ENOTDIR
from operator import attrgetter
from stat import S_ISDIR, S_ISLNK, S_ISREG, S_ISSOCK, S_ISBLK, S_ISCHR, S_ISFIFO
from urllib.parse import quote_from_bytes as urlquote_from_bytes
supports_symlinks = True
if os.name == 'nt':
import nt
if sys.getwindowsversion()[:2] >= (6, 0):
from nt import _getfinalpathname
else:
supports_symlinks = False
_getfinalpathname = None
else:
nt = None
__all__ = [
"PurePath", "PurePosixPath", "PureWindowsPath",
"Path", "PosixPath", "WindowsPath",
]
#
# Internals
#
def _is_wildcard_pattern(pat):
# Whether this pattern needs actual matching using fnmatch, or can
# be looked up directly as a file.
return "*" in pat or "?" in pat or "[" in pat
class _Flavour(object):
"""A flavour implements a particular (platform-specific) set of path
semantics."""
def __init__(self):
self.join = self.sep.join
def parse_parts(self, parts):
parsed = []
sep = self.sep
altsep = self.altsep
drv = root = ''
it = reversed(parts)
for part in it:
if not part:
continue
if altsep:
part = part.replace(altsep, sep)
drv, root, rel = self.splitroot(part)
if sep in rel:
for x in reversed(rel.split(sep)):
if x and x != '.':
parsed.append(sys.intern(x))
else:
if rel and rel != '.':
parsed.append(sys.intern(rel))
if drv or root:
if not drv:
# If no drive is present, try to find one in the previous
# parts. This makes the result of parsing e.g.
# ("C:", "/", "a") reasonably intuitive.
for part in it:
if not part:
continue
if altsep:
part = part.replace(altsep, sep)
drv = self.splitroot(part)[0]
if drv:
break
break
if drv or root:
parsed.append(drv + root)
parsed.reverse()
return drv, root, parsed
def join_parsed_parts(self, drv, root, parts, drv2, root2, parts2):
"""
Join the two paths represented by the respective
(drive, root, parts) tuples. Return a new (drive, root, parts) tuple.
"""
if root2:
if not drv2 and drv:
return drv, root2, [drv + root2] + parts2[1:]
elif drv2:
if drv2 == drv or self.casefold(drv2) == self.casefold(drv):
# Same drive => second path is relative to the first
return drv, root, parts + parts2[1:]
else:
# Second path is non-anchored (common case)
return drv, root, parts + parts2
return drv2, root2, parts2
class _WindowsFlavour(_Flavour):
# Reference for Windows paths can be found at
# http://msdn.microsoft.com/en-us/library/aa365247%28v=vs.85%29.aspx
sep = '\\'
altsep = '/'
has_drv = True
pathmod = ntpath
is_supported = (os.name == 'nt')
drive_letters = (
set(chr(x) for x in range(ord('a'), ord('z') + 1)) |
set(chr(x) for x in range(ord('A'), ord('Z') + 1))
)
ext_namespace_prefix = '\\\\?\\'
reserved_names = (
{'CON', 'PRN', 'AUX', 'NUL'} |
{'COM%d' % i for i in range(1, 10)} |
{'LPT%d' % i for i in range(1, 10)}
)
# Interesting findings about extended paths:
# - '\\?\c:\a', '//?/c:\a' and '//?/c:/a' are all supported
# but '\\?\c:/a' is not
# - extended paths are always absolute; "relative" extended paths will
# fail.
def splitroot(self, part, sep=sep):
first = part[0:1]
second = part[1:2]
if (second == sep and first == sep):
# XXX extended paths should also disable the collapsing of "."
# components (according to MSDN docs).
prefix, part = self._split_extended_path(part)
first = part[0:1]
second = part[1:2]
else:
prefix = ''
third = part[2:3]
if (second == sep and first == sep and third != sep):
# is a UNC path:
# vvvvvvvvvvvvvvvvvvvvv root
# \\machine\mountpoint\directory\etc\...
# directory ^^^^^^^^^^^^^^
index = part.find(sep, 2)
if index != -1:
index2 = part.find(sep, index + 1)
# a UNC path can't have two slashes in a row
# (after the initial two)
if index2 != index + 1:
if index2 == -1:
index2 = len(part)
if prefix:
return prefix + part[1:index2], sep, part[index2+1:]
else:
return part[:index2], sep, part[index2+1:]
drv = root = ''
if second == ':' and first in self.drive_letters:
drv = part[:2]
part = part[2:]
first = third
if first == sep:
root = first
part = part.lstrip(sep)
return prefix + drv, root, part
def casefold(self, s):
return s.lower()
def casefold_parts(self, parts):
return [p.lower() for p in parts]
def resolve(self, path):
s = str(path)
if not s:
return os.getcwd()
if _getfinalpathname is not None:
return self._ext_to_normal(_getfinalpathname(s))
# Means fallback on absolute
return None
def _split_extended_path(self, s, ext_prefix=ext_namespace_prefix):
prefix = ''
if s.startswith(ext_prefix):
prefix = s[:4]
s = s[4:]
if s.startswith('UNC\\'):
prefix += s[:3]
s = '\\' + s[3:]
return prefix, s
def _ext_to_normal(self, s):
# Turn back an extended path into a normal DOS-like path
return self._split_extended_path(s)[1]
def is_reserved(self, parts):
# NOTE: the rules for reserved names seem somewhat complicated
# (e.g. r"..\NUL" is reserved but not r"foo\NUL").
# We err on the side of caution and may return True for some paths
# that Windows does not actually consider reserved.
if not parts:
return False
if parts[0].startswith('\\\\'):
# UNC paths are never reserved
return False
return parts[-1].partition('.')[0].upper() in self.reserved_names
def make_uri(self, path):
# Under Windows, file URIs use the UTF-8 encoding.
drive = path.drive
if len(drive) == 2 and drive[1] == ':':
# It's a path on a local drive => 'file:///c:/a/b'
rest = path.as_posix()[2:].lstrip('/')
return 'file:///%s/%s' % (
drive, urlquote_from_bytes(rest.encode('utf-8')))
else:
# It's a path on a network drive => 'file://host/share/a/b'
return 'file:' + urlquote_from_bytes(path.as_posix().encode('utf-8'))
class _PosixFlavour(_Flavour):
sep = '/'
altsep = ''
has_drv = False
pathmod = posixpath
is_supported = (os.name != 'nt')
def splitroot(self, part, sep=sep):
if part and part[0] == sep:
stripped_part = part.lstrip(sep)
# According to POSIX path resolution:
# http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_11
# "A pathname that begins with two successive slashes may be
# interpreted in an implementation-defined manner, although more
# than two leading slashes shall be treated as a single slash".
if len(part) - len(stripped_part) == 2:
return '', sep * 2, stripped_part
else:
return '', sep, stripped_part
else:
return '', '', part
def casefold(self, s):
return s
def casefold_parts(self, parts):
return parts
def resolve(self, path):
sep = self.sep
accessor = path._accessor
seen = {}
def _resolve(path, rest):
if rest.startswith(sep):
path = ''
for name in rest.split(sep):
if not name or name == '.':
# current dir
continue
if name == '..':
# parent dir
path, _, _ = path.rpartition(sep)
continue
newpath = path + sep + name
if newpath in seen:
# Already seen this path
path = seen[newpath]
if path is not None:
# use cached value
continue
# The symlink is not resolved, so we must have a symlink loop.
raise RuntimeError("Symlink loop from %r" % newpath)
# Resolve the symbolic link
try:
target = accessor.readlink(newpath)
except OSError as e:
if e.errno != EINVAL:
raise
# Not a symlink
path = newpath
else:
seen[newpath] = None # not resolved symlink
path = _resolve(path, target)
seen[newpath] = path # resolved symlink
return path
# NOTE: according to POSIX, getcwd() cannot contain path components
# which are symlinks.
base = '' if path.is_absolute() else os.getcwd()
return _resolve(base, str(path)) or sep
def is_reserved(self, parts):
return False
def make_uri(self, path):
# We represent the path using the local filesystem encoding,
# for portability to other applications.
bpath = bytes(path)
return 'file://' + urlquote_from_bytes(bpath)
_windows_flavour = _WindowsFlavour()
_posix_flavour = _PosixFlavour()
class _Accessor:
"""An accessor implements a particular (system-specific or not) way of
accessing paths on the filesystem."""
class _NormalAccessor(_Accessor):
def _wrap_strfunc(strfunc):
@functools.wraps(strfunc)
def wrapped(pathobj, *args):
return strfunc(str(pathobj), *args)
return staticmethod(wrapped)
def _wrap_binary_strfunc(strfunc):
@functools.wraps(strfunc)
def wrapped(pathobjA, pathobjB, *args):
return strfunc(str(pathobjA), str(pathobjB), *args)
return staticmethod(wrapped)
stat = _wrap_strfunc(os.stat)
lstat = _wrap_strfunc(os.lstat)
open = _wrap_strfunc(os.open)
listdir = _wrap_strfunc(os.listdir)
chmod = _wrap_strfunc(os.chmod)
if hasattr(os, "lchmod"):
lchmod = _wrap_strfunc(os.lchmod)
else:
def lchmod(self, pathobj, mode):
raise NotImplementedError("lchmod() not available on this system")
mkdir = _wrap_strfunc(os.mkdir)
unlink = _wrap_strfunc(os.unlink)
rmdir = _wrap_strfunc(os.rmdir)
rename = _wrap_binary_strfunc(os.rename)
replace = _wrap_binary_strfunc(os.replace)
if nt:
if supports_symlinks:
symlink = _wrap_binary_strfunc(os.symlink)
else:
def symlink(a, b, target_is_directory):
raise NotImplementedError("symlink() not available on this system")
else:
# Under POSIX, os.symlink() takes two args
@staticmethod
def symlink(a, b, target_is_directory):
return os.symlink(str(a), str(b))
utime = _wrap_strfunc(os.utime)
# Helper for resolve()
def readlink(self, path):
return os.readlink(path)
_normal_accessor = _NormalAccessor()
#
# Globbing helpers
#
@contextmanager
def _cached(func):
try:
func.__cached__
yield func
except AttributeError:
cache = {}
def wrapper(*args):
try:
return cache[args]
except KeyError:
value = cache[args] = func(*args)
return value
wrapper.__cached__ = True
try:
yield wrapper
finally:
cache.clear()
def _make_selector(pattern_parts):
pat = pattern_parts[0]
child_parts = pattern_parts[1:]
if pat == '**':
cls = _RecursiveWildcardSelector
elif '**' in pat:
raise ValueError("Invalid pattern: '**' can only be an entire path component")
elif _is_wildcard_pattern(pat):
cls = _WildcardSelector
else:
cls = _PreciseSelector
return cls(pat, child_parts)
if hasattr(functools, "lru_cache"):
_make_selector = functools.lru_cache()(_make_selector)
class _Selector:
"""A selector matches a specific glob pattern part against the children
of a given path."""
def __init__(self, child_parts):
self.child_parts = child_parts
if child_parts:
self.successor = _make_selector(child_parts)
else:
self.successor = _TerminatingSelector()
def select_from(self, parent_path):
"""Iterate over all child paths of `parent_path` matched by this
selector. This can contain parent_path itself."""
path_cls = type(parent_path)
is_dir = path_cls.is_dir
exists = path_cls.exists
listdir = parent_path._accessor.listdir
return self._select_from(parent_path, is_dir, exists, listdir)
class _TerminatingSelector:
def _select_from(self, parent_path, is_dir, exists, listdir):
yield parent_path
class _PreciseSelector(_Selector):
def __init__(self, name, child_parts):
self.name = name
_Selector.__init__(self, child_parts)
def _select_from(self, parent_path, is_dir, exists, listdir):
try:
if not is_dir(parent_path):
return
path = parent_path._make_child_relpath(self.name)
if exists(path):
for p in self.successor._select_from(path, is_dir, exists, listdir):
yield p
except PermissionError:
return
class _WildcardSelector(_Selector):
def __init__(self, pat, child_parts):
self.pat = re.compile(fnmatch.translate(pat))
_Selector.__init__(self, child_parts)
def _select_from(self, parent_path, is_dir, exists, listdir):
try:
if not is_dir(parent_path):
return
cf = parent_path._flavour.casefold
for name in listdir(parent_path):
casefolded = cf(name)
if self.pat.match(casefolded):
path = parent_path._make_child_relpath(name)
for p in self.successor._select_from(path, is_dir, exists, listdir):
yield p
except PermissionError:
return
class _RecursiveWildcardSelector(_Selector):
def __init__(self, pat, child_parts):
_Selector.__init__(self, child_parts)
def _iterate_directories(self, parent_path, is_dir, listdir):
yield parent_path
try:
for name in listdir(parent_path):
path = parent_path._make_child_relpath(name)
if is_dir(path) and not path.is_symlink():
for p in self._iterate_directories(path, is_dir, listdir):
yield p
except PermissionError:
return
def _select_from(self, parent_path, is_dir, exists, listdir):
try:
if not is_dir(parent_path):
return
with _cached(listdir) as listdir:
yielded = set()
try:
successor_select = self.successor._select_from
for starting_point in self._iterate_directories(parent_path, is_dir, listdir):
for p in successor_select(starting_point, is_dir, exists, listdir):
if p not in yielded:
yield p
yielded.add(p)
finally:
yielded.clear()
except PermissionError:
return
#
# Public API
#
class _PathParents(Sequence):
"""This object provides sequence-like access to the logical ancestors
of a path. Don't try to construct it yourself."""
__slots__ = ('_pathcls', '_drv', '_root', '_parts')
def __init__(self, path):
# We don't store the instance to avoid reference cycles
self._pathcls = type(path)
self._drv = path._drv
self._root = path._root
self._parts = path._parts
def __len__(self):
if self._drv or self._root:
return len(self._parts) - 1
else:
return len(self._parts)
def __getitem__(self, idx):
if idx < 0 or idx >= len(self):
raise IndexError(idx)
return self._pathcls._from_parsed_parts(self._drv, self._root,
self._parts[:-idx - 1])
def __repr__(self):
return "<{}.parents>".format(self._pathcls.__name__)
class PurePath(object):
"""PurePath represents a filesystem path and offers operations which
don't imply any actual filesystem I/O. Depending on your system,
instantiating a PurePath will return either a PurePosixPath or a
PureWindowsPath object. You can also instantiate either of these classes
directly, regardless of your system.
"""
__slots__ = (
'_drv', '_root', '_parts',
'_str', '_hash', '_pparts', '_cached_cparts',
)
def __new__(cls, *args):
"""Construct a PurePath from one or several strings and/or existing
PurePath objects. The strings and path objects are combined so as
to yield a canonicalized path, which is incorporated into the
new PurePath object.
"""
if cls is PurePath:
cls = PureWindowsPath if os.name == 'nt' else PurePosixPath
return cls._from_parts(args)
def __reduce__(self):
# Using the parts tuple helps share interned path parts
# when pickling related paths.
return (self.__class__, tuple(self._parts))
@classmethod
def _parse_args(cls, args):
# This is useful when you don't want to create an instance, just
# canonicalize some constructor arguments.
parts = []
for a in args:
if isinstance(a, PurePath):
parts += a._parts
elif isinstance(a, str):
# Force-cast str subclasses to str (issue #21127)
parts.append(str(a))
else:
raise TypeError(
"argument should be a path or str object, not %r"
% type(a))
return cls._flavour.parse_parts(parts)
@classmethod
def _from_parts(cls, args, init=True):
# We need to call _parse_args on the instance, so as to get the
# right flavour.
self = object.__new__(cls)
drv, root, parts = self._parse_args(args)
self._drv = drv
self._root = root
self._parts = parts
if init:
self._init()
return self
@classmethod
def _from_parsed_parts(cls, drv, root, parts, init=True):
self = object.__new__(cls)
self._drv = drv
self._root = root
self._parts = parts
if init:
self._init()
return self
@classmethod
def _format_parsed_parts(cls, drv, root, parts):
if drv or root:
return drv + root + cls._flavour.join(parts[1:])
else:
return cls._flavour.join(parts)
def _init(self):
# Overridden in concrete Path
pass
def _make_child(self, args):
drv, root, parts = self._parse_args(args)
drv, root, parts = self._flavour.join_parsed_parts(
self._drv, self._root, self._parts, drv, root, parts)
return self._from_parsed_parts(drv, root, parts)
def __str__(self):
"""Return the string representation of the path, suitable for
passing to system calls."""
try:
return self._str
except AttributeError:
self._str = self._format_parsed_parts(self._drv, self._root,
self._parts) or '.'
return self._str
def as_posix(self):
"""Return the string representation of the path with forward (/)
slashes."""
f = self._flavour
return str(self).replace(f.sep, '/')
def __bytes__(self):
"""Return the bytes representation of the path. This is only
recommended to use under Unix."""
return os.fsencode(str(self))
def __repr__(self):
return "{}({!r})".format(self.__class__.__name__, self.as_posix())
def as_uri(self):
"""Return the path as a 'file' URI."""
if not self.is_absolute():
raise ValueError("relative path can't be expressed as a file URI")
return self._flavour.make_uri(self)
@property
def _cparts(self):
# Cached casefolded parts, for hashing and comparison
try:
return self._cached_cparts
except AttributeError:
self._cached_cparts = self._flavour.casefold_parts(self._parts)
return self._cached_cparts
def __eq__(self, other):
if not isinstance(other, PurePath):
return NotImplemented
return self._cparts == other._cparts and self._flavour is other._flavour
def __hash__(self):
try:
return self._hash
except AttributeError:
self._hash = hash(tuple(self._cparts))
return self._hash
def __lt__(self, other):
if not isinstance(other, PurePath) or self._flavour is not other._flavour:
return NotImplemented
return self._cparts < other._cparts
def __le__(self, other):
if not isinstance(other, PurePath) or self._flavour is not other._flavour:
return NotImplemented
return self._cparts <= other._cparts
def __gt__(self, other):
if not isinstance(other, PurePath) or self._flavour is not other._flavour:
return NotImplemented
return self._cparts > other._cparts
def __ge__(self, other):
if not isinstance(other, PurePath) or self._flavour is not other._flavour:
return NotImplemented
return self._cparts >= other._cparts
drive = property(attrgetter('_drv'),
doc="""The drive prefix (letter or UNC path), if any.""")
root = property(attrgetter('_root'),
doc="""The root of the path, if any.""")
@property
def anchor(self):
"""The concatenation of the drive and root, or ''."""
anchor = self._drv + self._root
return anchor
@property
def name(self):
"""The final path component, if any."""
parts = self._parts
if len(parts) == (1 if (self._drv or self._root) else 0):
return ''
return parts[-1]
@property
def suffix(self):
"""The final component's last suffix, if any."""
name = self.name
i = name.rfind('.')
if 0 < i < len(name) - 1:
return name[i:]
else:
return ''
@property
def suffixes(self):
"""A list of the final component's suffixes, if any."""
name = self.name
if name.endswith('.'):
return []
name = name.lstrip('.')
return ['.' + suffix for suffix in name.split('.')[1:]]
@property
def stem(self):
"""The final path component, minus its last suffix."""
name = self.name
i = name.rfind('.')
if 0 < i < len(name) - 1:
return name[:i]
else:
return name
def with_name(self, name):
"""Return a new path with the file name changed."""
if not self.name:
raise ValueError("%r has an empty name" % (self,))
drv, root, parts = self._flavour.parse_parts((name,))
if (not name or name[-1] in [self._flavour.sep, self._flavour.altsep]
or drv or root or len(parts) != 1):
raise ValueError("Invalid name %r" % (name))
return self._from_parsed_parts(self._drv, self._root,
self._parts[:-1] + [name])
def with_suffix(self, suffix):
"""Return a new path with the file suffix changed (or added, if none)."""
# XXX if suffix is None, should the current suffix be removed?
f = self._flavour
if f.sep in suffix or f.altsep and f.altsep in suffix:
raise ValueError("Invalid suffix %r" % (suffix))
if suffix and not suffix.startswith('.') or suffix == '.':
raise ValueError("Invalid suffix %r" % (suffix))
name = self.name
if not name:
raise ValueError("%r has an empty name" % (self,))
old_suffix = self.suffix
if not old_suffix:
name = name + suffix
else:
name = name[:-len(old_suffix)] + suffix
return self._from_parsed_parts(self._drv, self._root,
self._parts[:-1] + [name])
def relative_to(self, *other):
"""Return the relative path to another path identified by the passed
arguments. If the operation is not possible (because this is not
a subpath of the other path), raise ValueError.
"""
# For the purpose of this method, drive and root are considered
# separate parts, i.e.:
# Path('c:/').relative_to('c:') gives Path('/')
# Path('c:/').relative_to('/') raises ValueError
if not other:
raise TypeError("need at least one argument")
parts = self._parts
drv = self._drv
root = self._root
if root:
abs_parts = [drv, root] + parts[1:]
else:
abs_parts = parts
to_drv, to_root, to_parts = self._parse_args(other)
if to_root:
to_abs_parts = [to_drv, to_root] + to_parts[1:]
else:
to_abs_parts = to_parts
n = len(to_abs_parts)
cf = self._flavour.casefold_parts
if (root or drv) if n == 0 else cf(abs_parts[:n]) != cf(to_abs_parts):
formatted = self._format_parsed_parts(to_drv, to_root, to_parts)
raise ValueError("{!r} does not start with {!r}"
.format(str(self), str(formatted)))
return self._from_parsed_parts('', root if n == 1 else '',
abs_parts[n:])
@property
def parts(self):
"""An object providing sequence-like access to the
components in the filesystem path."""
# We cache the tuple to avoid building a new one each time .parts
# is accessed. XXX is this necessary?
try:
return self._pparts
except AttributeError:
self._pparts = tuple(self._parts)
return self._pparts
def joinpath(self, *args):
"""Combine this path with one or several arguments, and return a
new path representing either a subpath (if all arguments are relative
paths) or a totally different path (if one of the arguments is
anchored).
"""
return self._make_child(args)
def __truediv__(self, key):
return self._make_child((key,))
def __rtruediv__(self, key):
return self._from_parts([key] + self._parts)
@property
def parent(self):
"""The logical parent of the path."""
drv = self._drv
root = self._root
parts = self._parts
if len(parts) == 1 and (drv or root):
return self
return self._from_parsed_parts(drv, root, parts[:-1])
@property
def parents(self):
"""A sequence of this path's logical parents."""
return _PathParents(self)
def is_absolute(self):
"""True if the path is absolute (has both a root and, if applicable,
a drive)."""
if not self._root:
return False
return not self._flavour.has_drv or bool(self._drv)
def is_reserved(self):
"""Return True if the path contains one of the special names reserved
by the system, if any."""
return self._flavour.is_reserved(self._parts)
def match(self, path_pattern):
"""
Return True if this path matches the given pattern.
"""
cf = self._flavour.casefold
path_pattern = cf(path_pattern)
drv, root, pat_parts = self._flavour.parse_parts((path_pattern,))
if not pat_parts:
raise ValueError("empty pattern")
if drv and drv != cf(self._drv):
return False
if root and root != cf(self._root):
return False
parts = self._cparts
if drv or root:
if len(pat_parts) != len(parts):
return False
pat_parts = pat_parts[1:]
elif len(pat_parts) > len(parts):
return False
for part, pat in zip(reversed(parts), reversed(pat_parts)):
if not fnmatch.fnmatchcase(part, pat):
return False
return True
class PurePosixPath(PurePath):
_flavour = _posix_flavour
__slots__ = ()
class PureWindowsPath(PurePath):
_flavour = _windows_flavour
__slots__ = ()
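# --- Editor's example (illustrative sketch, not part of the module) ---
# PurePath operations are purely lexical: no filesystem access occurs.
def _example_purepath_ops():
    p = PurePosixPath('/usr/lib') / 'python3.4' / 'os.py'
    assert p.name == 'os.py' and p.suffix == '.py'
    assert p.with_suffix('.pyc').name == 'os.pyc'
    assert p.match('*.py')
    assert p.relative_to('/usr') == PurePosixPath('lib/python3.4/os.py')
    return p.parts   # ('/', 'usr', 'lib', 'python3.4', 'os.py')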
# Filesystem-accessing classes
class Path(PurePath):
__slots__ = (
'_accessor',
'_closed',
)
def __new__(cls, *args, **kwargs):
if cls is Path:
cls = WindowsPath if os.name == 'nt' else PosixPath
self = cls._from_parts(args, init=False)
if not self._flavour.is_supported:
raise NotImplementedError("cannot instantiate %r on your system"
% (cls.__name__,))
self._init()
return self
def _init(self,
# Private non-constructor arguments
template=None,
):
self._closed = False
if template is not None:
self._accessor = template._accessor
else:
self._accessor = _normal_accessor
def _make_child_relpath(self, part):
# This is an optimization used for dir walking. `part` must be
# a single part relative to this path.
parts = self._parts + [part]
return self._from_parsed_parts(self._drv, self._root, parts)
def __enter__(self):
if self._closed:
self._raise_closed()
return self
def __exit__(self, t, v, tb):
self._closed = True
def _raise_closed(self):
raise ValueError("I/O operation on closed path")
def _opener(self, name, flags, mode=0o666):
# A stub for the opener argument to built-in open()
return self._accessor.open(self, flags, mode)
def _raw_open(self, flags, mode=0o777):
"""
Open the file pointed by this path and return a file descriptor,
as os.open() does.
"""
if self._closed:
self._raise_closed()
return self._accessor.open(self, flags, mode)
# Public API
@classmethod
def cwd(cls):
"""Return a new path pointing to the current working directory
(as returned by os.getcwd()).
"""
return cls(os.getcwd())
def iterdir(self):
"""Iterate over the files in this directory. Does not yield any
result for the special paths '.' and '..'.
"""
if self._closed:
self._raise_closed()
for name in self._accessor.listdir(self):
if name in {'.', '..'}:
# Yielding a path object for these makes little sense
continue
yield self._make_child_relpath(name)
if self._closed:
self._raise_closed()
def glob(self, pattern):
"""Iterate over this subtree and yield all existing files (of any
kind, including directories) matching the given pattern.
"""
pattern = self._flavour.casefold(pattern)
drv, root, pattern_parts = self._flavour.parse_parts((pattern,))
if drv or root:
raise NotImplementedError("Non-relative patterns are unsupported")
selector = _make_selector(tuple(pattern_parts))
for p in selector.select_from(self):
yield p
def rglob(self, pattern):
"""Recursively yield all existing files (of any kind, including
directories) matching the given pattern, anywhere in this subtree.
"""
pattern = self._flavour.casefold(pattern)
drv, root, pattern_parts = self._flavour.parse_parts((pattern,))
if drv or root:
raise NotImplementedError("Non-relative patterns are unsupported")
selector = _make_selector(("**",) + tuple(pattern_parts))
for p in selector.select_from(self):
yield p
def absolute(self):
"""Return an absolute version of this path. This function works
even if the path doesn't point to anything.
No normalization is done, i.e. all '.' and '..' will be kept along.
Use resolve() to get the canonical path to a file.
"""
# XXX untested yet!
if self._closed:
self._raise_closed()
if self.is_absolute():
return self
# FIXME this must defer to the specific flavour (and, under Windows,
# use nt._getfullpathname())
obj = self._from_parts([os.getcwd()] + self._parts, init=False)
obj._init(template=self)
return obj
def resolve(self):
"""
Make the path absolute, resolving all symlinks on the way and also
normalizing it (for example turning slashes into backslashes under
Windows).
"""
if self._closed:
self._raise_closed()
s = self._flavour.resolve(self)
if s is None:
# No symlink resolution => for consistency, raise an error if
# the path doesn't exist or is forbidden
self.stat()
s = str(self.absolute())
# Now we have no symlinks in the path, it's safe to normalize it.
normed = self._flavour.pathmod.normpath(s)
obj = self._from_parts((normed,), init=False)
obj._init(template=self)
return obj
def stat(self):
"""
Return the result of the stat() system call on this path, like
os.stat() does.
"""
return self._accessor.stat(self)
def owner(self):
"""
Return the login name of the file owner.
"""
import pwd
return pwd.getpwuid(self.stat().st_uid).pw_name
def group(self):
"""
Return the group name of the file gid.
"""
import grp
return grp.getgrgid(self.stat().st_gid).gr_name
def open(self, mode='r', buffering=-1, encoding=None,
errors=None, newline=None):
"""
Open the file pointed by this path and return a file object, as
the built-in open() function does.
"""
if self._closed:
self._raise_closed()
return io.open(str(self), mode, buffering, encoding, errors, newline,
opener=self._opener)
def touch(self, mode=0o666, exist_ok=True):
"""
Create this file with the given access mode, if it doesn't exist.
"""
if self._closed:
self._raise_closed()
if exist_ok:
# First try to bump modification time
# Implementation note: GNU touch uses the UTIME_NOW option of
# the utimensat() / futimens() functions.
try:
self._accessor.utime(self, None)
except OSError:
# Avoid exception chaining
pass
else:
return
flags = os.O_CREAT | os.O_WRONLY
if not exist_ok:
flags |= os.O_EXCL
fd = self._raw_open(flags, mode)
os.close(fd)
def mkdir(self, mode=0o777, parents=False):
if self._closed:
self._raise_closed()
if not parents:
self._accessor.mkdir(self, mode)
else:
try:
self._accessor.mkdir(self, mode)
except OSError as e:
if e.errno != ENOENT:
raise
self.parent.mkdir(parents=True)
self._accessor.mkdir(self, mode)
def chmod(self, mode):
"""
Change the permissions of the path, like os.chmod().
"""
if self._closed:
self._raise_closed()
self._accessor.chmod(self, mode)
def lchmod(self, mode):
"""
Like chmod(), except if the path points to a symlink, the symlink's
permissions are changed, rather than its target's.
"""
if self._closed:
self._raise_closed()
self._accessor.lchmod(self, mode)
def unlink(self):
"""
Remove this file or link.
If the path is a directory, use rmdir() instead.
"""
if self._closed:
self._raise_closed()
self._accessor.unlink(self)
def rmdir(self):
"""
Remove this directory. The directory must be empty.
"""
if self._closed:
self._raise_closed()
self._accessor.rmdir(self)
def lstat(self):
"""
Like stat(), except if the path points to a symlink, the symlink's
status information is returned, rather than its target's.
"""
if self._closed:
self._raise_closed()
return self._accessor.lstat(self)
def rename(self, target):
"""
Rename this path to the given path.
"""
if self._closed:
self._raise_closed()
self._accessor.rename(self, target)
def replace(self, target):
"""
Rename this path to the given path, clobbering the existing
destination if it exists.
"""
if self._closed:
self._raise_closed()
self._accessor.replace(self, target)
def symlink_to(self, target, target_is_directory=False):
"""
Make this path a symlink pointing to the given path.
Note the order of arguments (self, target) is the reverse of os.symlink's.
"""
if self._closed:
self._raise_closed()
self._accessor.symlink(target, self, target_is_directory)
# Convenience functions for querying the stat results
def exists(self):
"""
Whether this path exists.
"""
try:
self.stat()
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
return False
return True
def is_dir(self):
"""
Whether this path is a directory.
"""
try:
return S_ISDIR(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
def is_file(self):
"""
Whether this path is a regular file (also True for symlinks pointing
to regular files).
"""
try:
return S_ISREG(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
def is_symlink(self):
"""
Whether this path is a symbolic link.
"""
try:
return S_ISLNK(self.lstat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist
return False
def is_block_device(self):
"""
Whether this path is a block device.
"""
try:
return S_ISBLK(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
def is_char_device(self):
"""
Whether this path is a character device.
"""
try:
return S_ISCHR(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
def is_fifo(self):
"""
Whether this path is a FIFO.
"""
try:
return S_ISFIFO(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
def is_socket(self):
"""
Whether this path is a socket.
"""
try:
return S_ISSOCK(self.stat().st_mode)
except OSError as e:
if e.errno not in (ENOENT, ENOTDIR):
raise
# Path doesn't exist or is a broken symlink
# (see https://bitbucket.org/pitrou/pathlib/issue/12/)
return False
class PosixPath(Path, PurePosixPath):
__slots__ = ()
class WindowsPath(Path, PureWindowsPath):
__slots__ = ()
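# --- Editor's example (illustrative sketch, not part of the module) ---
# Path adds real filesystem access on top of PurePath. 'demo.txt' is an
# illustrative name created in (and removed from) the current directory.
def _example_path_io():
    p = Path.cwd() / 'demo.txt'
    with p.open('w') as f:       # creates the file
        f.write('hello')
    matches = list(Path.cwd().glob('*.txt'))
    p.unlink()                   # clean up
    return p.exists(), matches   # -> (False, [...])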
#! /usr/bin/env python3
"""
The Python Debugger Pdb
=======================
To use the debugger in its simplest form:
>>> import pdb
>>> pdb.run('<a statement>')
The debugger's prompt is '(Pdb) '. This will stop in the first
function call in <a statement>.
Alternatively, if a statement terminated with an unhandled exception,
you can use pdb's post-mortem facility to inspect the contents of the
traceback:
>>> <a statement>
<exception traceback>
>>> import pdb
>>> pdb.pm()
The commands recognized by the debugger are listed in the next
section. Most can be abbreviated as indicated; e.g., h(elp) means
that 'help' can be typed as 'h' or 'help' (but not as 'he' or 'hel',
nor as 'H' or 'Help' or 'HELP'). Optional arguments are enclosed in
square brackets. Alternatives in the command syntax are separated
by a vertical bar (|).
A blank line repeats the previous command literally, except for
'list', where it lists the next 11 lines.
Commands that the debugger doesn't recognize are assumed to be Python
statements and are executed in the context of the program being
debugged. Python statements can also be prefixed with an exclamation
point ('!'). This is a powerful way to inspect the program being
debugged; it is even possible to change variables or call functions.
When an exception occurs in such a statement, the exception name is
printed but the debugger's state is not changed.
The debugger supports aliases, which can save typing. Aliases can
have parameters (see the alias help entry), which allows a certain
level of adaptability to the context under examination.
Multiple commands may be entered on a single line, separated by the
pair ';;'. No intelligence is applied to separating the commands; the
input is split at the first ';;', even if it is in the middle of a
quoted string.
If a file ".pdbrc" exists in your home directory or in the current
directory, it is read in and executed as if it had been typed at the
debugger prompt. This is particularly useful for aliases. If both
files exist, the one in the home directory is read first and aliases
defined there can be overridden by the local file.
Aside from aliases, the debugger is not directly programmable; but it
is implemented as a class from which you can derive your own debugger
class, which you can make as fancy as you like.
Debugger commands
=================
"""
# NOTE: the actual command documentation is collected from docstrings of the
# commands and is appended to __doc__ after the class has been defined.
import os
import re
import sys
import cmd
import bdb
import dis
import code
import glob
import pprint
import signal
import inspect
import traceback
import linecache
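# --- Editor's example (illustrative sketch, not part of the module) ---
# The usual programmatic entry points: run a statement under the
# debugger, or break at a chosen spot with set_trace(). Defined here
# but never called, since both start an interactive (Pdb) session.
def _example_debug_session():
    import pdb
    pdb.run('x = 1 + 1')    # stops before executing the statement
    # Inside any function, pdb.set_trace() breaks at that exact line.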
class Restart(Exception):
"""Causes a debugger to be restarted for the debugged python program."""
pass
__all__ = ["run", "pm", "Pdb", "runeval", "runctx", "runcall", "set_trace",
"post_mortem", "help"]
def find_function(funcname, filename):
cre = re.compile(r'def\s+%s\s*[(]' % re.escape(funcname))
try:
fp = open(filename)
except OSError:
return None
# consumer of this info expects the first line to be 1
with fp:
for lineno, line in enumerate(fp, start=1):
if cre.match(line):
return funcname, filename, lineno
return None
def getsourcelines(obj):
lines, lineno = inspect.findsource(obj)
if inspect.isframe(obj) and obj.f_globals is obj.f_locals:
# must be a module frame: do not try to cut a block out of it
return lines, 1
elif inspect.ismodule(obj):
return lines, 1
return inspect.getblock(lines[lineno:]), lineno+1
def lasti2lineno(code, lasti):
linestarts = list(dis.findlinestarts(code))
linestarts.reverse()
for i, lineno in linestarts:
if lasti >= i:
return lineno
return 0
class _rstr(str):
"""String that doesn't quote its repr."""
def __repr__(self):
return self
# Interaction prompt line will separate file and call info from code
# text using value of line_prefix string. A newline and arrow may
# be to your liking. You can set it once pdb is imported using the
# command "pdb.line_prefix = '\n% '".
# line_prefix = ': ' # Use this to get the old situation back
line_prefix = '\n-> ' # Probably a better default
class Pdb(bdb.Bdb, cmd.Cmd):
def __init__(self, completekey='tab', stdin=None, stdout=None, skip=None,
nosigint=False):
bdb.Bdb.__init__(self, skip=skip)
cmd.Cmd.__init__(self, completekey, stdin, stdout)
if stdout:
self.use_rawinput = 0
self.prompt = '(Pdb) '
self.aliases = {}
self.displaying = {}
self.mainpyfile = ''
self._wait_for_mainpyfile = False
self.tb_lineno = {}
# Try to load readline if it exists
try:
import readline
# remove some common file name delimiters
readline.set_completer_delims(' \t\n`@#$%^&*()=+[{]}\\|;:\'",<>?')
except ImportError:
pass
self.allow_kbdint = False
self.nosigint = nosigint
# Read $HOME/.pdbrc and ./.pdbrc
self.rcLines = []
if 'HOME' in os.environ:
envHome = os.environ['HOME']
try:
with open(os.path.join(envHome, ".pdbrc")) as rcFile:
self.rcLines.extend(rcFile)
except OSError:
pass
try:
with open(".pdbrc") as rcFile:
self.rcLines.extend(rcFile)
except OSError:
pass
self.commands = {} # associates a command list to breakpoint numbers
self.commands_doprompt = {} # for each bp num, tells if the prompt
# must be disp. after execing the cmd list
self.commands_silent = {} # for each bp num, tells if the stack trace
# must be disp. after execing the cmd list
self.commands_defining = False # True while in the process of defining
# a command list
self.commands_bnum = None # The breakpoint number for which we are
# defining a list
def sigint_handler(self, signum, frame):
if self.allow_kbdint:
raise KeyboardInterrupt
self.message("\nProgram interrupted. (Use 'cont' to resume).")
self.set_step()
self.set_trace(frame)
# restore previous signal handler
signal.signal(signal.SIGINT, self._previous_sigint_handler)
def reset(self):
bdb.Bdb.reset(self)
self.forget()
def forget(self):
self.lineno = None
self.stack = []
self.curindex = 0
self.curframe = None
self.tb_lineno.clear()
def setup(self, f, tb):
self.forget()
self.stack, self.curindex = self.get_stack(f, tb)
while tb:
# when setting up post-mortem debugging with a traceback, save all
# the original line numbers to be displayed along the current line
# numbers (which can be different, e.g. due to finally clauses)
lineno = lasti2lineno(tb.tb_frame.f_code, tb.tb_lasti)
self.tb_lineno[tb.tb_frame] = lineno
tb = tb.tb_next
self.curframe = self.stack[self.curindex][0]
# The f_locals dictionary is updated from the actual frame
# locals whenever the .f_locals accessor is called, so we
# cache it here to ensure that modifications are not overwritten.
self.curframe_locals = self.curframe.f_locals
return self.execRcLines()
# Can be executed earlier than 'setup' if desired
def execRcLines(self):
if not self.rcLines:
return
# local copy because of recursion
rcLines = self.rcLines
rcLines.reverse()
# execute every line only once
self.rcLines = []
while rcLines:
line = rcLines.pop().strip()
if line and line[0] != '#':
if self.onecmd(line):
# if onecmd returns True, the command wants to exit
# from the interaction, save leftover rc lines
# to execute before next interaction
self.rcLines += reversed(rcLines)
return True
# Override Bdb methods
def user_call(self, frame, argument_list):
"""This method is called when there is the remote possibility
that we ever need to stop in this function."""
if self._wait_for_mainpyfile:
return
if self.stop_here(frame):
self.message('--Call--')
self.interaction(frame, None)
def user_line(self, frame):
"""This function is called when we stop or break at this line."""
if self._wait_for_mainpyfile:
if (self.mainpyfile != self.canonic(frame.f_code.co_filename)
or frame.f_lineno <= 0):
return
self._wait_for_mainpyfile = False
if self.bp_commands(frame):
self.interaction(frame, None)
def bp_commands(self, frame):
"""Call every command that was set for the current active breakpoint
(if there is one).
Returns True if the normal interaction function must be called,
False otherwise."""
# self.currentbp is set in bdb in Bdb.break_here if a breakpoint was hit
if getattr(self, "currentbp", False) and \
self.currentbp in self.commands:
currentbp = self.currentbp
self.currentbp = 0
lastcmd_back = self.lastcmd
self.setup(frame, None)
for line in self.commands[currentbp]:
self.onecmd(line)
self.lastcmd = lastcmd_back
if not self.commands_silent[currentbp]:
self.print_stack_entry(self.stack[self.curindex])
if self.commands_doprompt[currentbp]:
self._cmdloop()
self.forget()
return False
return True
def user_return(self, frame, return_value):
"""This function is called when a return trap is set here."""
if self._wait_for_mainpyfile:
return
frame.f_locals['__return__'] = return_value
self.message('--Return--')
self.interaction(frame, None)
def user_exception(self, frame, exc_info):
"""This function is called if an exception occurs,
but only if we are to stop at or just below this level."""
if self._wait_for_mainpyfile:
return
exc_type, exc_value, exc_traceback = exc_info
frame.f_locals['__exception__'] = exc_type, exc_value
# An 'Internal StopIteration' exception is an exception debug event
# issued by the interpreter when handling a subgenerator run with
# 'yield from' or a generator controlled by a for loop. No exception has
# actually occurred in this case. The debugger uses this debug event to
# stop when the debuggee is returning from such generators.
prefix = 'Internal ' if (not exc_traceback
and exc_type is StopIteration) else ''
self.message('%s%s' % (prefix,
traceback.format_exception_only(exc_type, exc_value)[-1].strip()))
self.interaction(frame, exc_traceback)
# General interaction function
def _cmdloop(self):
while True:
try:
# keyboard interrupts allow for an easy way to cancel
# the current command, so allow them during interactive input
self.allow_kbdint = True
self.cmdloop()
self.allow_kbdint = False
break
except KeyboardInterrupt:
self.message('--KeyboardInterrupt--')
# Called before loop, handles display expressions
def preloop(self):
displaying = self.displaying.get(self.curframe)
if displaying:
for expr, oldvalue in displaying.items():
newvalue = self._getval_except(expr)
# check for identity first; this prevents a custom __eq__ from
# being called on every pass through the loop, and also prevents
# instances whose fields are changed from being displayed
if newvalue is not oldvalue and newvalue != oldvalue:
displaying[expr] = newvalue
self.message('display %s: %r [old: %r]' %
(expr, newvalue, oldvalue))
def interaction(self, frame, traceback):
if self.setup(frame, traceback):
# no interaction desired at this time (happens if .pdbrc contains
# a command like "continue")
self.forget()
return
self.print_stack_entry(self.stack[self.curindex])
self._cmdloop()
self.forget()
def displayhook(self, obj):
"""Custom displayhook for the exec in default(), which prevents
assignment of the _ variable in the builtins.
"""
# reproduce the behavior of the standard displayhook, not printing None
if obj is not None:
self.message(repr(obj))
def default(self, line):
if line[:1] == '!': line = line[1:]
locals = self.curframe_locals
globals = self.curframe.f_globals
try:
code = compile(line + '\n', '<stdin>', 'single')
save_stdout = sys.stdout
save_stdin = sys.stdin
save_displayhook = sys.displayhook
try:
sys.stdin = self.stdin
sys.stdout = self.stdout
sys.displayhook = self.displayhook
exec(code, globals, locals)
finally:
sys.stdout = save_stdout
sys.stdin = save_stdin
sys.displayhook = save_displayhook
except:
exc_info = sys.exc_info()[:2]
self.error(traceback.format_exception_only(*exc_info)[-1].strip())
def precmd(self, line):
"""Handle alias expansion and ';;' separator."""
if not line.strip():
return line
args = line.split()
while args[0] in self.aliases:
line = self.aliases[args[0]]
ii = 1
for tmpArg in args[1:]:
line = line.replace("%" + str(ii),
tmpArg)
ii += 1
line = line.replace("%*", ' '.join(args[1:]))
args = line.split()
# split into ';;' separated commands
# unless it's an alias command
if args[0] != 'alias':
marker = line.find(';;')
if marker >= 0:
# queue up everything after marker
next = line[marker+2:].lstrip()
self.cmdqueue.append(next)
line = line[:marker].rstrip()
return line
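# A sketch of precmd() in action (assuming the "pi" and "ps" aliases from
# the do_alias docstring below; nothing here is executed):
#
#     (Pdb) ps ;; next
#     # "ps" expands to "pi self", then "pi" expands in turn (aliases nest),
#     # and "next" is pushed onto self.cmdqueue as the following command.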
def onecmd(self, line):
"""Interpret the argument as though it had been typed in response
to the prompt.
Checks whether this line is typed at the normal prompt or in
a breakpoint command list definition.
"""
if not self.commands_defining:
return cmd.Cmd.onecmd(self, line)
else:
return self.handle_command_def(line)
def handle_command_def(self, line):
"""Handles one command line during command list definition."""
cmd, arg, line = self.parseline(line)
if not cmd:
return
if cmd == 'silent':
self.commands_silent[self.commands_bnum] = True
return # continue to handle other cmd def in the cmd list
elif cmd == 'end':
self.cmdqueue = []
return 1 # end of cmd list
cmdlist = self.commands[self.commands_bnum]
if arg:
cmdlist.append(cmd+' '+arg)
else:
cmdlist.append(cmd)
# Determine if we must stop
try:
func = getattr(self, 'do_' + cmd)
except AttributeError:
func = self.default
# one of the resuming commands
if func.__name__ in self.commands_resuming:
self.commands_doprompt[self.commands_bnum] = False
self.cmdqueue = []
return 1
return
# interface abstraction functions
def message(self, msg):
print(msg, file=self.stdout)
def error(self, msg):
print('***', msg, file=self.stdout)
# Generic completion functions. Individual complete_foo methods can be
# assigned below to one of these functions.
def _complete_location(self, text, line, begidx, endidx):
# Complete a file/module/function location for break/tbreak/clear.
if line.strip().endswith((':', ',')):
# Here comes a line number or a condition which we can't complete.
return []
# First, try to find matching functions (i.e. expressions).
try:
ret = self._complete_expression(text, line, begidx, endidx)
except Exception:
ret = []
# Then, try to complete file names as well.
globs = glob.glob(text + '*')
for fn in globs:
if os.path.isdir(fn):
ret.append(fn + '/')
elif os.path.isfile(fn) and fn.lower().endswith(('.py', '.pyw')):
ret.append(fn + ':')
return ret
def _complete_bpnumber(self, text, line, begidx, endidx):
# Complete a breakpoint number. (This would be more helpful if we could
# display additional info along with the completions, such as file/line
# of the breakpoint.)
return [str(i) for i, bp in enumerate(bdb.Breakpoint.bpbynumber)
if bp is not None and str(i).startswith(text)]
def _complete_expression(self, text, line, begidx, endidx):
# Complete an arbitrary expression.
if not self.curframe:
return []
# Collect globals and locals. It is usually not really sensible to also
# complete builtins, and they clutter the namespace quite heavily, so we
# leave them out.
ns = self.curframe.f_globals.copy()
ns.update(self.curframe_locals)
if '.' in text:
# Walk an attribute chain up to the last part, similar to what
# rlcompleter does. This will bail if any of the parts are not
# simple attribute access, which is what we want.
dotted = text.split('.')
try:
obj = ns[dotted[0]]
for part in dotted[1:-1]:
obj = getattr(obj, part)
except (KeyError, AttributeError):
return []
prefix = '.'.join(dotted[:-1]) + '.'
return [prefix + n for n in dir(obj) if n.startswith(dotted[-1])]
else:
# Complete a simple name.
return [n for n in ns.keys() if n.startswith(text)]
# Command definitions, called by cmdloop()
# The argument is the remaining string on the command line
# Return true to exit from the command loop
def do_commands(self, arg):
"""commands [bpnumber]
(com) ...
(com) end
(Pdb)
Specify a list of commands for breakpoint number bpnumber.
The commands themselves are entered on the following lines.
Type a line containing just 'end' to terminate the commands.
The commands are executed when the breakpoint is hit.
To remove all commands from a breakpoint, type commands and
follow it immediately with end; that is, give no commands.
With no bpnumber argument, commands refers to the last
breakpoint set.
You can use breakpoint commands to start your program up
again. Simply use the continue command, or step, or any other
command that resumes execution.
Specifying any command resuming execution (currently continue,
step, next, return, jump, quit and their abbreviations)
terminates the command list (as if that command was
immediately followed by end). This is because any time you
resume execution (even with a simple next or step), you may
encounter another breakpoint -- which could have its own
command list, leading to ambiguities about which list to
execute.
If you use the 'silent' command in the command list, the usual
message about stopping at a breakpoint is not printed. This
may be desirable for breakpoints that are to print a specific
message and then continue. If none of the other commands
print anything, you will see no sign that the breakpoint was
reached.
"""
if not arg:
bnum = len(bdb.Breakpoint.bpbynumber) - 1
else:
try:
bnum = int(arg)
except ValueError:
self.error("Usage: commands [bnum]\n ...\n end")
return
self.commands_bnum = bnum
# Save old definitions for the case of a keyboard interrupt.
if bnum in self.commands:
old_command_defs = (self.commands[bnum],
self.commands_doprompt[bnum],
self.commands_silent[bnum])
else:
old_command_defs = None
self.commands[bnum] = []
self.commands_doprompt[bnum] = True
self.commands_silent[bnum] = False
prompt_back = self.prompt
self.prompt = '(com) '
self.commands_defining = True
try:
self.cmdloop()
except KeyboardInterrupt:
# Restore old definitions.
if old_command_defs:
self.commands[bnum] = old_command_defs[0]
self.commands_doprompt[bnum] = old_command_defs[1]
self.commands_silent[bnum] = old_command_defs[2]
else:
del self.commands[bnum]
del self.commands_doprompt[bnum]
del self.commands_silent[bnum]
self.error('command definition aborted, old commands restored')
finally:
self.commands_defining = False
self.prompt = prompt_back
complete_commands = _complete_bpnumber
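# An example session defining breakpoint commands, following the docstring
# above ("some_var" is a hypothetical name):
#
#     (Pdb) commands 1
#     (com) silent
#     (com) p some_var
#     (com) end
#     (Pdb)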
def do_break(self, arg, temporary=0):
"""b(reak) [ ([filename:]lineno | function) [, condition] ]
Without argument, list all breaks.
With a line number argument, set a break at this line in the
current file. With a function name, set a break at the first
executable line of that function. If a second argument is
present, it is a string specifying an expression which must
evaluate to true before the breakpoint is honored.
The line number may be prefixed with a filename and a colon,
to specify a breakpoint in another file (probably one that
hasn't been loaded yet). The file is searched for on
sys.path; the .py suffix may be omitted.
"""
if not arg:
if self.breaks: # There's at least one
self.message("Num Type Disp Enb Where")
for bp in bdb.Breakpoint.bpbynumber:
if bp:
self.message(bp.bpformat())
return
# parse arguments; comma has lowest precedence
# and cannot occur in filename
filename = None
lineno = None
cond = None
comma = arg.find(',')
if comma > 0:
# parse stuff after comma: "condition"
cond = arg[comma+1:].lstrip()
arg = arg[:comma].rstrip()
# parse stuff before comma: [filename:]lineno | function
colon = arg.rfind(':')
funcname = None
if colon >= 0:
filename = arg[:colon].rstrip()
f = self.lookupmodule(filename)
if not f:
self.error('%r not found from sys.path' % filename)
return
else:
filename = f
arg = arg[colon+1:].lstrip()
try:
lineno = int(arg)
except ValueError:
self.error('Bad lineno: %s' % arg)
return
else:
# no colon; can be lineno or function
try:
lineno = int(arg)
except ValueError:
try:
func = eval(arg,
self.curframe.f_globals,
self.curframe_locals)
except:
func = arg
try:
if hasattr(func, '__func__'):
func = func.__func__
code = func.__code__
#use co_name to identify the bkpt (function names
#could be aliased, but co_name is invariant)
funcname = code.co_name
lineno = code.co_firstlineno
filename = code.co_filename
except:
# last thing to try
(ok, filename, ln) = self.lineinfo(arg)
if not ok:
self.error('The specified object %r is not a function '
'or was not found along sys.path.' % arg)
return
funcname = ok # ok contains a function name
lineno = int(ln)
if not filename:
filename = self.defaultFile()
# Check for reasonable breakpoint
line = self.checkline(filename, lineno)
if line:
# now set the break point
err = self.set_break(filename, line, temporary, cond, funcname)
if err:
self.error(err)
else:
bp = self.get_breaks(filename, line)[-1]
self.message("Breakpoint %d at %s:%d" %
(bp.number, bp.file, bp.line))
# To be overridden in derived debuggers
def defaultFile(self):
"""Produce a reasonable default."""
filename = self.curframe.f_code.co_filename
if filename == '<string>' and self.mainpyfile:
filename = self.mainpyfile
return filename
do_b = do_break
complete_break = _complete_location
complete_b = _complete_location
def do_tbreak(self, arg):
"""tbreak [ ([filename:]lineno | function) [, condition] ]
Same arguments as break, but sets a temporary breakpoint: it
is automatically deleted when first hit.
"""
self.do_break(arg, 1)
complete_tbreak = _complete_location
def lineinfo(self, identifier):
failed = (None, None, None)
# Input is identifier, may be in single quotes
idstring = identifier.split("'")
if len(idstring) == 1:
# not in single quotes
id = idstring[0].strip()
elif len(idstring) == 3:
# quoted
id = idstring[1].strip()
else:
return failed
if id == '': return failed
parts = id.split('.')
# Protection for derived debuggers
if parts[0] == 'self':
del parts[0]
if len(parts) == 0:
return failed
# Best first guess at file to look at
fname = self.defaultFile()
if len(parts) == 1:
item = parts[0]
else:
# More than one part.
# First is module, second is method/class
f = self.lookupmodule(parts[0])
if f:
fname = f
item = parts[1]
answer = find_function(item, fname)
return answer or failed
def checkline(self, filename, lineno):
"""Check whether specified line seems to be executable.
Return `lineno` if it is, 0 if not (e.g. a docstring, comment, blank
line or EOF). Warning: testing is not comprehensive.
"""
# this method should be callable before starting debugging, so default
# to "no globals" if there is no current frame
globs = self.curframe.f_globals if hasattr(self, 'curframe') else None
line = linecache.getline(filename, lineno, globs)
if not line:
self.message('End of file')
return 0
line = line.strip()
# Don't allow setting breakpoint at a blank line
if (not line or (line[0] == '#') or
(line[:3] == '"""') or line[:3] == "'''"):
self.error('Blank or comment')
return 0
return lineno
def do_enable(self, arg):
"""enable bpnumber [bpnumber ...]
Enables the breakpoints given as a space separated list of
breakpoint numbers.
"""
args = arg.split()
for i in args:
try:
bp = self.get_bpbynumber(i)
except ValueError as err:
self.error(err)
else:
bp.enable()
self.message('Enabled %s' % bp)
complete_enable = _complete_bpnumber
def do_disable(self, arg):
"""disable bpnumber [bpnumber ...]
Disables the breakpoints given as a space separated list of
breakpoint numbers. Disabling a breakpoint means it cannot
cause the program to stop execution, but unlike clearing a
breakpoint, it remains in the list of breakpoints and can be
(re-)enabled.
"""
args = arg.split()
for i in args:
try:
bp = self.get_bpbynumber(i)
except ValueError as err:
self.error(err)
else:
bp.disable()
self.message('Disabled %s' % bp)
complete_disable = _complete_bpnumber
def do_condition(self, arg):
"""condition bpnumber [condition]
Set a new condition for the breakpoint, an expression which
must evaluate to true before the breakpoint is honored. If
condition is absent, any existing condition is removed; i.e.,
the breakpoint is made unconditional.
"""
args = arg.split(' ', 1)
try:
cond = args[1]
except IndexError:
cond = None
try:
bp = self.get_bpbynumber(args[0].strip())
except IndexError:
self.error('Breakpoint number expected')
except ValueError as err:
self.error(err)
else:
bp.cond = cond
if not cond:
self.message('Breakpoint %d is now unconditional.' % bp.number)
else:
self.message('New condition set for breakpoint %d.' % bp.number)
complete_condition = _complete_bpnumber
def do_ignore(self, arg):
"""ignore bpnumber [count]
Set the ignore count for the given breakpoint number. If
count is omitted, the ignore count is set to 0. A breakpoint
becomes active when the ignore count is zero. When non-zero,
the count is decremented each time the breakpoint is reached
and the breakpoint is not disabled and any associated
condition evaluates to true.
"""
args = arg.split()
try:
count = int(args[1].strip())
except (IndexError, ValueError):
count = 0
try:
bp = self.get_bpbynumber(args[0].strip())
except IndexError:
self.error('Breakpoint number expected')
except ValueError as err:
self.error(err)
else:
bp.ignore = count
if count > 0:
if count > 1:
countstr = '%d crossings' % count
else:
countstr = '1 crossing'
self.message('Will ignore next %s of breakpoint %d.' %
(countstr, bp.number))
else:
self.message('Will stop next time breakpoint %d is reached.'
% bp.number)
complete_ignore = _complete_bpnumber
def do_clear(self, arg):
"""cl(ear) filename:lineno\ncl(ear) [bpnumber [bpnumber...]]
With a space separated list of breakpoint numbers, clear
those breakpoints. Without argument, clear all breaks (but
first ask confirmation). With a filename:lineno argument,
clear all breaks at that line in that file.
"""
if not arg:
try:
reply = input('Clear all breaks? ')
except EOFError:
reply = 'no'
reply = reply.strip().lower()
if reply in ('y', 'yes'):
bplist = [bp for bp in bdb.Breakpoint.bpbynumber if bp]
self.clear_all_breaks()
for bp in bplist:
self.message('Deleted %s' % bp)
return
if ':' in arg:
# Make sure it works for "clear C:\foo\bar.py:12"
i = arg.rfind(':')
filename = arg[:i]
arg = arg[i+1:]
try:
lineno = int(arg)
except ValueError:
err = "Invalid line number (%s)" % arg
else:
bplist = self.get_breaks(filename, lineno)
err = self.clear_break(filename, lineno)
if err:
self.error(err)
else:
for bp in bplist:
self.message('Deleted %s' % bp)
return
numberlist = arg.split()
for i in numberlist:
try:
bp = self.get_bpbynumber(i)
except ValueError as err:
self.error(err)
else:
self.clear_bpbynumber(i)
self.message('Deleted %s' % bp)
do_cl = do_clear # 'c' is already an abbreviation for 'continue'
complete_clear = _complete_location
complete_cl = _complete_location
def do_where(self, arg):
"""w(here)
Print a stack trace, with the most recent frame at the bottom.
An arrow indicates the "current frame", which determines the
context of most commands. 'bt' is an alias for this command.
"""
self.print_stack_trace()
do_w = do_where
do_bt = do_where
def _select_frame(self, number):
assert 0 <= number < len(self.stack)
self.curindex = number
self.curframe = self.stack[self.curindex][0]
self.curframe_locals = self.curframe.f_locals
self.print_stack_entry(self.stack[self.curindex])
self.lineno = None
def do_up(self, arg):
"""u(p) [count]
Move the current frame count (default one) levels up in the
stack trace (to an older frame).
"""
if self.curindex == 0:
self.error('Oldest frame')
return
try:
count = int(arg or 1)
except ValueError:
self.error('Invalid frame count (%s)' % arg)
return
if count < 0:
newframe = 0
else:
newframe = max(0, self.curindex - count)
self._select_frame(newframe)
do_u = do_up
def do_down(self, arg):
"""d(own) [count]
Move the current frame count (default one) levels down in the
stack trace (to a newer frame).
"""
if self.curindex + 1 == len(self.stack):
self.error('Newest frame')
return
try:
count = int(arg or 1)
except ValueError:
self.error('Invalid frame count (%s)' % arg)
return
if count < 0:
newframe = len(self.stack) - 1
else:
newframe = min(len(self.stack) - 1, self.curindex + count)
self._select_frame(newframe)
do_d = do_down
def do_until(self, arg):
"""unt(il) [lineno]
Without argument, continue execution until the line with a
number greater than the current one is reached. With a line
number, continue execution until a line with a number greater
or equal to that is reached. In both cases, also stop when
the current frame returns.
"""
if arg:
try:
lineno = int(arg)
except ValueError:
self.error('Error in argument: %r' % arg)
return
if lineno <= self.curframe.f_lineno:
self.error('"until" line number is smaller than current '
'line number')
return
else:
lineno = None
self.set_until(self.curframe, lineno)
return 1
do_unt = do_until
def do_step(self, arg):
"""s(tep)
Execute the current line, stop at the first possible occasion
(either in a function that is called or in the current
function).
"""
self.set_step()
return 1
do_s = do_step
def do_next(self, arg):
"""n(ext)
Continue execution until the next line in the current function
is reached or it returns.
"""
self.set_next(self.curframe)
return 1
do_n = do_next
def do_run(self, arg):
"""run [args...]
Restart the debugged python program. If a string is supplied
it is split with "shlex", and the result is used as the new
sys.argv. History, breakpoints, actions and debugger options
are preserved. "restart" is an alias for "run".
"""
if arg:
import shlex
argv0 = sys.argv[0:1]
sys.argv = shlex.split(arg)
sys.argv[:0] = argv0
# this is caught in the main debugger loop
raise Restart
do_restart = do_run
def do_return(self, arg):
"""r(eturn)
Continue execution until the current function returns.
"""
self.set_return(self.curframe)
return 1
do_r = do_return
def do_continue(self, arg):
"""c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
"""
if not self.nosigint:
try:
self._previous_sigint_handler = \
signal.signal(signal.SIGINT, self.sigint_handler)
except ValueError:
# ValueError happens when do_continue() is invoked from
# a non-main thread in which case we just continue without
# SIGINT set. Would printing a message here (once) make
# sense?
pass
self.set_continue()
return 1
do_c = do_cont = do_continue
def do_jump(self, arg):
"""j(ump) lineno
Set the next line that will be executed. Only available in
the bottom-most frame. This lets you jump back and execute
code again, or jump forward to skip code that you don't want
to run.
It should be noted that not all jumps are allowed -- for
instance it is not possible to jump into the middle of a
for loop or out of a finally clause.
"""
if self.curindex + 1 != len(self.stack):
self.error('You can only jump within the bottom frame')
return
try:
arg = int(arg)
except ValueError:
self.error("The 'jump' command requires a line number")
else:
try:
# Do the jump, fix up our copy of the stack, and display the
# new position
self.curframe.f_lineno = arg
self.stack[self.curindex] = self.stack[self.curindex][0], arg
self.print_stack_entry(self.stack[self.curindex])
except ValueError as e:
self.error('Jump failed: %s' % e)
do_j = do_jump
def do_debug(self, arg):
"""debug code
Enter a recursive debugger that steps through the code
argument (which is an arbitrary expression or statement to be
executed in the current environment).
"""
sys.settrace(None)
globals = self.curframe.f_globals
locals = self.curframe_locals
p = Pdb(self.completekey, self.stdin, self.stdout)
p.prompt = "(%s) " % self.prompt.strip()
self.message("ENTERING RECURSIVE DEBUGGER")
sys.call_tracing(p.run, (arg, globals, locals))
self.message("LEAVING RECURSIVE DEBUGGER")
sys.settrace(self.trace_dispatch)
self.lastcmd = p.lastcmd
complete_debug = _complete_expression
def do_quit(self, arg):
"""q(uit)\nexit
Quit from the debugger. The program being executed is aborted.
"""
self._user_requested_quit = True
self.set_quit()
return 1
do_q = do_quit
do_exit = do_quit
def do_EOF(self, arg):
"""EOF
Handles the receipt of EOF as a command.
"""
self.message('')
self._user_requested_quit = True
self.set_quit()
return 1
def do_args(self, arg):
"""a(rgs)
Print the argument list of the current function.
"""
co = self.curframe.f_code
dict = self.curframe_locals
n = co.co_argcount
if co.co_flags & 4: n = n+1    # CO_VARARGS: function accepts *args
if co.co_flags & 8: n = n+1    # CO_VARKEYWORDS: function accepts **kwargs
for i in range(n):
name = co.co_varnames[i]
if name in dict:
self.message('%s = %r' % (name, dict[name]))
else:
self.message('%s = *** undefined ***' % (name,))
do_a = do_args
def do_retval(self, arg):
"""retval
Print the return value for the last return of a function.
"""
if '__return__' in self.curframe_locals:
self.message(repr(self.curframe_locals['__return__']))
else:
self.error('Not yet returned!')
do_rv = do_retval
def _getval(self, arg):
try:
return eval(arg, self.curframe.f_globals, self.curframe_locals)
except:
exc_info = sys.exc_info()[:2]
self.error(traceback.format_exception_only(*exc_info)[-1].strip())
raise
def _getval_except(self, arg, frame=None):
try:
if frame is None:
return eval(arg, self.curframe.f_globals, self.curframe_locals)
else:
return eval(arg, frame.f_globals, frame.f_locals)
except:
exc_info = sys.exc_info()[:2]
err = traceback.format_exception_only(*exc_info)[-1].strip()
return _rstr('** raised %s **' % err)
def do_p(self, arg):
"""p expression
Print the value of the expression.
"""
try:
self.message(repr(self._getval(arg)))
except:
pass
def do_pp(self, arg):
"""pp expression
Pretty-print the value of the expression.
"""
try:
self.message(pprint.pformat(self._getval(arg)))
except:
pass
complete_print = _complete_expression
complete_p = _complete_expression
complete_pp = _complete_expression
def do_list(self, arg):
"""l(ist) [first [,last] | .]
List source code for the current file. Without arguments,
list 11 lines around the current line or continue the previous
listing. With . as argument, list 11 lines around the current
line. With one argument, list 11 lines starting at that line.
With two arguments, list the given range; if the second
argument is less than the first, it is a count.
The current line in the current frame is indicated by "->".
If an exception is being debugged, the line where the
exception was originally raised or propagated is indicated by
">>", if it differs from the current line.
"""
self.lastcmd = 'list'
last = None
if arg and arg != '.':
try:
if ',' in arg:
first, last = arg.split(',')
first = int(first.strip())
last = int(last.strip())
if last < first:
# assume it's a count
last = first + last
else:
first = int(arg.strip())
first = max(1, first - 5)
except ValueError:
self.error('Error in argument: %r' % arg)
return
elif self.lineno is None or arg == '.':
first = max(1, self.curframe.f_lineno - 5)
else:
first = self.lineno + 1
if last is None:
last = first + 10
filename = self.curframe.f_code.co_filename
breaklist = self.get_file_breaks(filename)
try:
lines = linecache.getlines(filename, self.curframe.f_globals)
self._print_lines(lines[first-1:last], first, breaklist,
self.curframe)
self.lineno = min(last, len(lines))
if len(lines) < last:
self.message('[EOF]')
except KeyboardInterrupt:
pass
do_l = do_list
def do_longlist(self, arg):
"""longlist | ll
List the whole source code for the current function or frame.
"""
filename = self.curframe.f_code.co_filename
breaklist = self.get_file_breaks(filename)
try:
lines, lineno = getsourcelines(self.curframe)
except OSError as err:
self.error(err)
return
self._print_lines(lines, lineno, breaklist, self.curframe)
do_ll = do_longlist
def do_source(self, arg):
"""source expression
Try to get source code for the given object and display it.
"""
try:
obj = self._getval(arg)
except:
return
try:
lines, lineno = getsourcelines(obj)
except (OSError, TypeError) as err:
self.error(err)
return
self._print_lines(lines, lineno)
complete_source = _complete_expression
def _print_lines(self, lines, start, breaks=(), frame=None):
"""Print a range of lines."""
if frame:
current_lineno = frame.f_lineno
exc_lineno = self.tb_lineno.get(frame, -1)
else:
current_lineno = exc_lineno = -1
for lineno, line in enumerate(lines, start):
s = str(lineno).rjust(3)
if len(s) < 4:
s += ' '
if lineno in breaks:
s += 'B'
else:
s += ' '
if lineno == current_lineno:
s += '->'
elif lineno == exc_lineno:
s += '>>'
self.message(s + '\t' + line.rstrip())
def do_whatis(self, arg):
"""whatis arg
Print the type of the argument.
"""
try:
value = self._getval(arg)
except:
# _getval() already printed the error
return
code = None
# Is it a function?
try:
code = value.__code__
except Exception:
pass
if code:
self.message('Function %s' % code.co_name)
return
# Is it an instance method?
try:
code = value.__func__.__code__
except Exception:
pass
if code:
self.message('Method %s' % code.co_name)
return
# Is it a class?
if value.__class__ is type:
self.message('Class %s.%s' % (value.__module__, value.__name__))
return
# None of the above...
self.message(type(value))
complete_whatis = _complete_expression
def do_display(self, arg):
"""display [expression]
Display the value of the expression if it changed, each time execution
stops in the current frame.
Without expression, list all display expressions for the current frame.
"""
if not arg:
self.message('Currently displaying:')
for item in self.displaying.get(self.curframe, {}).items():
self.message('%s: %r' % item)
else:
val = self._getval_except(arg)
self.displaying.setdefault(self.curframe, {})[arg] = val
self.message('display %s: %r' % (arg, val))
complete_display = _complete_expression
def do_undisplay(self, arg):
"""undisplay [expression]
Do not display the expression any more in the current frame.
Without expression, clear all display expressions for the current frame.
"""
if arg:
try:
del self.displaying.get(self.curframe, {})[arg]
except KeyError:
self.error('not displaying %s' % arg)
else:
self.displaying.pop(self.curframe, None)
def complete_undisplay(self, text, line, begidx, endidx):
return [e for e in self.displaying.get(self.curframe, {})
if e.startswith(text)]
def do_interact(self, arg):
"""interact
Start an interactive interpreter whose global namespace
contains all the (global and local) names found in the current scope.
"""
ns = self.curframe.f_globals.copy()
ns.update(self.curframe_locals)
code.interact("*interactive*", local=ns)
def do_alias(self, arg):
"""alias [name [command [parameter parameter ...] ]]
Create an alias called 'name' that executes 'command'. The
command must *not* be enclosed in quotes. Replaceable
parameters can be indicated by %1, %2, and so on, while %* is
replaced by all the parameters. If no command is given, the
current alias for name is shown. If no name is given, all
aliases are listed.
Aliases may be nested and can contain anything that can be
legally typed at the pdb prompt. Note! You *can* override
internal pdb commands with aliases! Those internal commands
are then hidden until the alias is removed. Aliasing is
recursively applied to the first word of the command line; all
other words in the line are left alone.
As an example, here are two useful aliases (especially when
placed in the .pdbrc file):
# Print instance variables (usage "pi classInst")
alias pi for k in %1.__dict__.keys(): print("%1.",k,"=",%1.__dict__[k])
# Print instance variables in self
alias ps pi self
"""
args = arg.split()
if len(args) == 0:
keys = sorted(self.aliases.keys())
for alias in keys:
self.message("%s = %s" % (alias, self.aliases[alias]))
return
if args[0] in self.aliases and len(args) == 1:
self.message("%s = %s" % (args[0], self.aliases[args[0]]))
else:
self.aliases[args[0]] = ' '.join(args[1:])
def do_unalias(self, arg):
"""unalias name
Delete the specified alias.
"""
args = arg.split()
if len(args) == 0: return
if args[0] in self.aliases:
del self.aliases[args[0]]
def complete_unalias(self, text, line, begidx, endidx):
return [a for a in self.aliases if a.startswith(text)]
# List of all the commands making the program resume execution.
commands_resuming = ['do_continue', 'do_step', 'do_next', 'do_return',
'do_quit', 'do_jump']
# Print a traceback starting at the top stack frame.
# The most recently entered frame is printed last;
# this is different from dbx and gdb, but consistent with
# the Python interpreter's stack trace.
# It is also consistent with the up/down commands (which are
# compatible with dbx and gdb: up moves towards 'main()'
# and down moves towards the most recent stack frame).
def print_stack_trace(self):
try:
for frame_lineno in self.stack:
self.print_stack_entry(frame_lineno)
except KeyboardInterrupt:
pass
def print_stack_entry(self, frame_lineno, prompt_prefix=line_prefix):
frame, lineno = frame_lineno
if frame is self.curframe:
prefix = '> '
else:
prefix = ' '
self.message(prefix +
self.format_stack_entry(frame_lineno, prompt_prefix))
# Provide help
def do_help(self, arg):
"""h(elp)
Without argument, print the list of available commands.
With a command name as argument, print help about that command.
"help pdb" shows the full pdb documentation.
"help exec" gives help on the ! command.
"""
if not arg:
return cmd.Cmd.do_help(self, arg)
try:
try:
topic = getattr(self, 'help_' + arg)
return topic()
except AttributeError:
command = getattr(self, 'do_' + arg)
except AttributeError:
self.error('No help for %r' % arg)
else:
if sys.flags.optimize >= 2:
self.error('No help for %r; please do not run Python with -OO '
'if you need command help' % arg)
return
self.message(command.__doc__.rstrip())
do_h = do_help
def help_exec(self):
"""(!) statement
Execute the (one-line) statement in the context of the current
stack frame. The exclamation point can be omitted unless the
first word of the statement resembles a debugger command. To
assign to a global variable you must always prefix the command
with a 'global' command, e.g.:
(Pdb) global list_options; list_options = ['-l']
(Pdb)
"""
self.message((self.help_exec.__doc__ or '').strip())
def help_pdb(self):
help()
# other helper functions
def lookupmodule(self, filename):
"""Helper function for break/clear parsing -- may be overridden.
lookupmodule() translates (possibly incomplete) file or module name
into an absolute file name.
"""
if os.path.isabs(filename) and os.path.exists(filename):
return filename
f = os.path.join(sys.path[0], filename)
if os.path.exists(f) and self.canonic(f) == self.mainpyfile:
return f
root, ext = os.path.splitext(filename)
if ext == '':
filename = filename + '.py'
if os.path.isabs(filename):
return filename
for dirname in sys.path:
while os.path.islink(dirname):
dirname = os.readlink(dirname)
fullname = os.path.join(dirname, filename)
if os.path.exists(fullname):
return fullname
return None
def _runscript(self, filename):
# The script has to run in __main__ namespace (or imports from
# __main__ will break).
#
# So we clear up the __main__ and set several special variables
# (this gets rid of pdb's globals and cleans old variables on restarts).
import __main__
__main__.__dict__.clear()
__main__.__dict__.update({"__name__" : "__main__",
"__file__" : filename,
"__builtins__": __builtins__,
})
# When bdb sets tracing, a number of call and line events happen
# BEFORE the debugger even reaches the user's code (and the exact sequence of
# events depends on python version). So we take special measures to
# avoid stopping before we reach the main script (see user_line and
# user_call for details).
self._wait_for_mainpyfile = True
self.mainpyfile = self.canonic(filename)
self._user_requested_quit = False
with open(filename, "rb") as fp:
statement = "exec(compile(%r, %r, 'exec'))" % \
(fp.read(), self.mainpyfile)
self.run(statement)
# Collect all command help into docstring, if not run with -OO
if __doc__ is not None:
# unfortunately we can't guess this order from the class definition
_help_order = [
'help', 'where', 'down', 'up', 'break', 'tbreak', 'clear', 'disable',
'enable', 'ignore', 'condition', 'commands', 'step', 'next', 'until',
'jump', 'return', 'retval', 'run', 'continue', 'list', 'longlist',
'args', 'p', 'pp', 'whatis', 'source', 'display', 'undisplay',
'interact', 'alias', 'unalias', 'debug', 'quit',
]
for _command in _help_order:
__doc__ += getattr(Pdb, 'do_' + _command).__doc__.strip() + '\n\n'
__doc__ += Pdb.help_exec.__doc__
del _help_order, _command
# Simplified interface
def run(statement, globals=None, locals=None):
Pdb().run(statement, globals, locals)
def runeval(expression, globals=None, locals=None):
return Pdb().runeval(expression, globals, locals)
def runctx(statement, globals, locals):
# B/W compatibility
run(statement, globals, locals)
def runcall(*args, **kwds):
return Pdb().runcall(*args, **kwds)
def set_trace():
Pdb().set_trace(sys._getframe().f_back)
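# Typical uses of the simplified interface (sketch; "mymodule" is a
# hypothetical module name):
#
#     import pdb
#     pdb.run('import mymodule; mymodule.test()')   # debug a statement
#
#     # or, from inside the code under test:
#     import pdb; pdb.set_trace()   # hard-coded breakpoint at this line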
# Post-Mortem interface
def post_mortem(t=None):
# handling the default
if t is None:
# sys.exc_info() returns (type, value, traceback) if an exception is
# being handled, otherwise it returns None
t = sys.exc_info()[2]
if t is None:
raise ValueError("A valid traceback must be passed if no "
"exception is being handled")
p = Pdb()
p.reset()
p.interaction(None, t)
def pm():
post_mortem(sys.last_traceback)
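# Post-mortem sketch, e.g. at the interactive prompt after an uncaught
# exception ("mymodule" is hypothetical; sys.last_traceback is set by the
# interpreter once the traceback has been printed):
#
#     >>> import mymodule
#     >>> mymodule.main()        # raises somewhere
#     >>> import pdb; pdb.pm()   # inspect the frame where it raised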
# Main program for testing
TESTCMD = 'import x; x.main()'
def test():
run(TESTCMD)
# print help
def help():
import pydoc
pydoc.pager(__doc__)
_usage = """\
usage: pdb.py [-c command] ... pyfile [arg] ...
Debug the Python program given by pyfile.
Initial commands are read from .pdbrc files in your home directory
and in the current directory, if they exist. Commands supplied with
-c are executed after commands from .pdbrc files.
To let the script run until an exception occurs, use "-c continue".
To let the script run up to a given line X in the debugged file, use
"-c 'until X'"."""
def main():
import getopt
# note: getopt long option names are given without the leading dashes
opts, args = getopt.getopt(sys.argv[1:], 'hc:', ['help', 'command='])
if not args:
print(_usage)
sys.exit(2)
commands = []
for opt, optarg in opts:
if opt in ['-h', '--help']:
print(_usage)
sys.exit()
elif opt in ['-c', '--command']:
commands.append(optarg)
mainpyfile = args[0] # Get script filename
if not os.path.exists(mainpyfile):
print('Error:', mainpyfile, 'does not exist')
sys.exit(1)
sys.argv[:] = args # Hide "pdb.py" and pdb options from argument list
# Replace pdb's dir with script's dir in front of module search path.
sys.path[0] = os.path.dirname(mainpyfile)
# Note on saving/restoring sys.argv: it's a good idea when sys.argv was
# modified by the script being debugged. It's a bad idea when it was
# changed by the user from the command line. There is a "restart" command
# which allows explicit specification of command line arguments.
pdb = Pdb()
pdb.rcLines.extend(commands)
while True:
try:
pdb._runscript(mainpyfile)
if pdb._user_requested_quit:
break
print("The program finished and will be restarted")
except Restart:
print("Restarting", mainpyfile, "with arguments:")
print("\t" + " ".join(args))
except SystemExit:
# In most cases SystemExit does not warrant a post-mortem session.
print("The program exited via sys.exit(). Exit status:", end=' ')
print(sys.exc_info()[1])
except SyntaxError:
traceback.print_exc()
sys.exit(1)
except:
traceback.print_exc()
print("Uncaught exception. Entering post mortem debugging")
print("Running 'cont' or 'step' will restart the program")
t = sys.exc_info()[2]
pdb.interaction(None, t)
print("Post mortem debugger finished. The " + mainpyfile +
" will be restarted")
# When invoked as main program, invoke the debugger on a script
if __name__ == '__main__':
import pdb
pdb.main()
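# Command-line sketches matching the usage text above ("myscript.py" is a
# hypothetical file):
#
#     python -m pdb myscript.py arg1 arg2
#     python -m pdb -c continue myscript.py    # run until an exception occurs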
"""Create portable serialized representations of Python objects.
See module copyreg for a mechanism for registering custom picklers.
See module pickletools source for extensive comments.
Classes:
Pickler
Unpickler
Functions:
dump(object, file)
dumps(object) -> bytes
load(file) -> object
loads(bytes) -> object
Misc variables:
__version__
format_version
compatible_formats
"""
from types import FunctionType
from copyreg import dispatch_table
from copyreg import _extension_registry, _inverted_registry, _extension_cache
from itertools import islice
import sys
from sys import maxsize
from struct import pack, unpack
import re
import io
import codecs
import _compat_pickle
__all__ = ["PickleError", "PicklingError", "UnpicklingError", "Pickler",
"Unpickler", "dump", "dumps", "load", "loads"]
# Shortcut for use in isinstance testing
bytes_types = (bytes, bytearray)
# These are purely informational; no code uses these.
format_version = "4.0" # File format version we write
compatible_formats = ["1.0", # Original protocol 0
"1.1", # Protocol 0 with INST added
"1.2", # Original protocol 1
"1.3", # Protocol 1 with BINFLOAT added
"2.0", # Protocol 2
"3.0", # Protocol 3
"4.0", # Protocol 4
] # Old format versions we can read
# This is the highest protocol number we know how to read.
HIGHEST_PROTOCOL = 4
# The protocol we write by default. May be less than HIGHEST_PROTOCOL.
# We intentionally write a protocol that Python 2.x cannot read;
# there are too many issues with that.
DEFAULT_PROTOCOL = 3
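# A round-trip sketch using these constants (a negative protocol selects
# HIGHEST_PROTOCOL, per the Pickler docstring below):
#
#     import io, pickle
#     buf = io.BytesIO()
#     pickle.Pickler(buf, protocol=-1).dump({'a': 1})
#     buf.seek(0)
#     assert pickle.Unpickler(buf).load() == {'a': 1}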
class PickleError(Exception):
"""A common base class for the other pickling exceptions."""
pass
class PicklingError(PickleError):
"""This exception is raised when an unpicklable object is passed to the
dump() method.
"""
pass
class UnpicklingError(PickleError):
"""This exception is raised when there is a problem unpickling an object,
such as a security violation.
Note that other exceptions may also be raised during unpickling, including
(but not necessarily limited to) AttributeError, EOFError, ImportError,
and IndexError.
"""
pass
# An instance of _Stop is raised by Unpickler.load_stop() in response to
# the STOP opcode, passing the object that is the result of unpickling.
class _Stop(Exception):
def __init__(self, value):
self.value = value
# Jython has PyStringMap; it's a dict subclass with string keys
try:
from org.python.core import PyStringMap
except ImportError:
PyStringMap = None
# Pickle opcodes. See pickletools.py for extensive docs. The listing
# here is in kind-of alphabetical order of 1-character pickle code.
# pickletools groups them by purpose.
MARK = b'(' # push special markobject on stack
STOP = b'.' # every pickle ends with STOP
POP = b'0' # discard topmost stack item
POP_MARK = b'1' # discard stack top through topmost markobject
DUP = b'2' # duplicate top stack item
FLOAT = b'F' # push float object; decimal string argument
INT = b'I' # push integer or bool; decimal string argument
BININT = b'J' # push four-byte signed int
BININT1 = b'K' # push 1-byte unsigned int
LONG = b'L' # push long; decimal string argument
BININT2 = b'M' # push 2-byte unsigned int
NONE = b'N' # push None
PERSID = b'P' # push persistent object; id is taken from string arg
BINPERSID = b'Q' # " " " ; " " " " stack
REDUCE = b'R' # apply callable to argtuple, both on stack
STRING = b'S' # push string; NL-terminated string argument
BINSTRING = b'T' # push string; counted binary string argument
SHORT_BINSTRING= b'U' # " " ; " " " " < 256 bytes
UNICODE = b'V' # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE = b'X' # " " " ; counted UTF-8 string argument
APPEND = b'a' # append stack top to list below it
BUILD = b'b' # call __setstate__ or __dict__.update()
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
DICT = b'd' # build a dict from stack items
EMPTY_DICT = b'}' # push empty dict
APPENDS = b'e' # extend list on stack by topmost stack slice
GET = b'g' # push item from memo on stack; index is string arg
BINGET = b'h' # " " " " " " ; " " 1-byte arg
INST = b'i' # build & push class instance
LONG_BINGET = b'j' # push item from memo on stack; index is 4-byte arg
LIST = b'l' # build list from topmost stack items
EMPTY_LIST = b']' # push empty list
OBJ = b'o' # build & push class instance
PUT = b'p' # store stack top in memo; index is string arg
BINPUT = b'q' # " " " " " ; " " 1-byte arg
LONG_BINPUT = b'r' # " " " " " ; " " 4-byte arg
SETITEM = b's' # add key+value pair to dict
TUPLE = b't' # build tuple from topmost stack items
EMPTY_TUPLE = b')' # push empty tuple
SETITEMS = b'u' # modify dict by adding topmost key+value pairs
BINFLOAT = b'G' # push float; arg is 8-byte float encoding
TRUE = b'I01\n' # not an opcode; see INT docs in pickletools.py
FALSE = b'I00\n' # not an opcode; see INT docs in pickletools.py
# Protocol 2
PROTO = b'\x80' # identify pickle protocol
NEWOBJ = b'\x81' # build object by applying cls.__new__ to argtuple
EXT1 = b'\x82' # push object from extension registry; 1-byte index
EXT2 = b'\x83' # ditto, but 2-byte index
EXT4 = b'\x84' # ditto, but 4-byte index
TUPLE1 = b'\x85' # build 1-tuple from stack top
TUPLE2 = b'\x86' # build 2-tuple from two topmost stack items
TUPLE3 = b'\x87' # build 3-tuple from three topmost stack items
NEWTRUE = b'\x88' # push True
NEWFALSE = b'\x89' # push False
LONG1 = b'\x8a' # push long from < 256 bytes
LONG4 = b'\x8b' # push really big long
_tuplesize2code = [EMPTY_TUPLE, TUPLE1, TUPLE2, TUPLE3]
# Protocol 3 (Python 3.x)
BINBYTES = b'B' # push bytes; counted binary string argument
SHORT_BINBYTES = b'C' # " " ; " " " " < 256 bytes
# Protocol 4
SHORT_BINUNICODE = b'\x8c' # push short string; UTF-8 length < 256 bytes
BINUNICODE8 = b'\x8d' # push very long string
BINBYTES8 = b'\x8e' # push very long bytes string
EMPTY_SET = b'\x8f' # push empty set on the stack
ADDITEMS = b'\x90' # modify set by adding topmost stack items
FROZENSET = b'\x91' # build frozenset from topmost stack items
NEWOBJ_EX = b'\x92' # like NEWOBJ but work with keyword only arguments
STACK_GLOBAL = b'\x93' # same as GLOBAL but using names on the stacks
MEMOIZE = b'\x94' # store top of the stack in memo
FRAME = b'\x95' # indicate the beginning of a new frame
__all__.extend([x for x in dir() if re.match("[A-Z][A-Z0-9_]+$", x)])
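# To see these opcodes in a real stream, the standard pickletools module can
# disassemble one (sketch):
#
#     import pickle, pickletools
#     pickletools.dis(pickle.dumps([1, 2], protocol=2))
#     # prints PROTO, EMPTY_LIST, BINPUT, MARK, BININT1, BININT1, APPENDS, STOP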
class _Framer:
_FRAME_SIZE_TARGET = 64 * 1024
def __init__(self, file_write):
self.file_write = file_write
self.current_frame = None
def start_framing(self):
self.current_frame = io.BytesIO()
def end_framing(self):
if self.current_frame and self.current_frame.tell() > 0:
self.commit_frame(force=True)
self.current_frame = None
def commit_frame(self, force=False):
if self.current_frame:
f = self.current_frame
if f.tell() >= self._FRAME_SIZE_TARGET or force:
with f.getbuffer() as data:
n = len(data)
write = self.file_write
write(FRAME)
write(pack("<Q", n))
write(data)
f.seek(0)
f.truncate()
def write(self, data):
if self.current_frame:
return self.current_frame.write(data)
else:
return self.file_write(data)
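# In protocol 4 streams, each committed frame is emitted as FRAME followed
# by an 8-byte little-endian length and then the frame's bytes (see
# commit_frame above); _Unframer below reads streams back one frame at a time.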
class _Unframer:
def __init__(self, file_read, file_readline, file_tell=None):
self.file_read = file_read
self.file_readline = file_readline
self.current_frame = None
def read(self, n):
if self.current_frame:
data = self.current_frame.read(n)
if not data and n != 0:
self.current_frame = None
return self.file_read(n)
if len(data) < n:
raise UnpicklingError(
"pickle exhausted before end of frame")
return data
else:
return self.file_read(n)
def readline(self):
if self.current_frame:
data = self.current_frame.readline()
if not data:
self.current_frame = None
return self.file_readline()
if data[-1] != b'\n'[0]:
raise UnpicklingError(
"pickle exhausted before end of frame")
return data
else:
return self.file_readline()
def load_frame(self, frame_size):
if self.current_frame and self.current_frame.read() != b'':
raise UnpicklingError(
"beginning of a new frame before end of current frame")
self.current_frame = io.BytesIO(self.file_read(frame_size))
# Tools used for pickling.
def _getattribute(obj, name, allow_qualname=False):
dotted_path = name.split(".")
if not allow_qualname and len(dotted_path) > 1:
raise AttributeError("Can't get qualified attribute {!r} on {!r}; " +
"use protocols >= 4 to enable support"
.format(name, obj))
for subpath in dotted_path:
if subpath == '<locals>':
raise AttributeError("Can't get local attribute {!r} on {!r}"
.format(name, obj))
try:
obj = getattr(obj, subpath)
except AttributeError:
raise AttributeError("Can't get attribute {!r} on {!r}"
.format(name, obj))
return obj
def whichmodule(obj, name, allow_qualname=False):
"""Find the module an object belong to."""
module_name = getattr(obj, '__module__', None)
if module_name is not None:
return module_name
# Protect the iteration by using a list copy of sys.modules against dynamic
# modules that trigger imports of other modules upon calls to getattr.
for module_name, module in list(sys.modules.items()):
if module_name == '__main__' or module is None:
continue
try:
if _getattribute(module, name, allow_qualname) is obj:
return module_name
except AttributeError:
pass
return '__main__'
def encode_long(x):
r"""Encode a long to a two's complement little-endian binary string.
Note that 0 is a special case, returning an empty string, to save a
byte in the LONG1 pickling context.
>>> encode_long(0)
b''
>>> encode_long(255)
b'\xff\x00'
>>> encode_long(32767)
b'\xff\x7f'
>>> encode_long(-256)
b'\x00\xff'
>>> encode_long(-32768)
b'\x00\x80'
>>> encode_long(-128)
b'\x80'
>>> encode_long(127)
b'\x7f'
>>>
"""
if x == 0:
return b''
nbytes = (x.bit_length() >> 3) + 1
result = x.to_bytes(nbytes, byteorder='little', signed=True)
if x < 0 and nbytes > 1:
if result[-1] == 0xff and (result[-2] & 0x80) != 0:
result = result[:-1]
return result
def decode_long(data):
r"""Decode a long from a two's complement little-endian binary string.
>>> decode_long(b'')
0
>>> decode_long(b"\xff\x00")
255
>>> decode_long(b"\xff\x7f")
32767
>>> decode_long(b"\x00\xff")
-256
>>> decode_long(b"\x00\x80")
-32768
>>> decode_long(b"\x80")
-128
>>> decode_long(b"\x7f")
127
"""
return int.from_bytes(data, byteorder='little', signed=True)
# Pickling machinery
class _Pickler:
def __init__(self, file, protocol=None, *, fix_imports=True):
"""This takes a binary file for writing a pickle data stream.
The optional *protocol* argument tells the pickler to use the
given protocol; supported protocols are 0, 1, 2, 3 and 4. The
default protocol is 3; a backward-incompatible protocol designed
for Python 3.
Specifying a negative protocol version selects the highest
protocol version supported. The higher the protocol used, the
more recent the version of Python needed to read the pickle
produced.
The *file* argument must have a write() method that accepts a
single bytes argument. It can thus be a file object opened for
binary writing, an io.BytesIO instance, or any other custom
object that meets this interface.
If *fix_imports* is True and *protocol* is less than 3, pickle
will try to map the new Python 3 names to the old module names
used in Python 2, so that the pickle data stream is readable
with Python 2.
"""
if protocol is None:
protocol = DEFAULT_PROTOCOL
if protocol < 0:
protocol = HIGHEST_PROTOCOL
elif not 0 <= protocol <= HIGHEST_PROTOCOL:
raise ValueError("pickle protocol must be <= %d" % HIGHEST_PROTOCOL)
try:
self._file_write = file.write
except AttributeError:
raise TypeError("file must have a 'write' attribute")
self.framer = _Framer(self._file_write)
self.write = self.framer.write
self.memo = {}
self.proto = int(protocol)
self.bin = protocol >= 1
self.fast = 0
self.fix_imports = fix_imports and protocol < 3
def clear_memo(self):
"""Clears the pickler's "memo".
The memo is the data structure that remembers which objects the
pickler has already seen, so that shared or recursive objects
are pickled by reference and not by value. This method is
useful when re-using picklers.
"""
self.memo.clear()
def dump(self, obj):
"""Write a pickled representation of obj to the open file."""
# Check whether Pickler was initialized correctly. This is
# only needed to mimic the behavior of _pickle.Pickler.dump().
if not hasattr(self, "_file_write"):
raise PicklingError("Pickler.__init__() was not called by "
"%s.__init__()" % (self.__class__.__name__,))
if self.proto >= 2:
self.write(PROTO + pack("<B", self.proto))
if self.proto >= 4:
self.framer.start_framing()
self.save(obj)
self.write(STOP)
self.framer.end_framing()
def memoize(self, obj):
"""Store an object in the memo."""
# The Pickler memo is a dictionary mapping object ids to 2-tuples
# that contain the Unpickler memo key and the object being memoized.
# The memo key is written to the pickle and will become
# the key in the Unpickler's memo. The object is stored in the
# Pickler memo so that transient objects are kept alive during
# pickling.
# The use of the Unpickler memo length as the memo key is just a
# convention. The only requirement is that the memo values be unique.
# But there appears no advantage to any other scheme, and this
# scheme allows the Unpickler memo to be implemented as a plain (but
# growable) array, indexed by memo key.
if self.fast:
return
assert id(obj) not in self.memo
idx = len(self.memo)
self.write(self.put(idx))
self.memo[id(obj)] = idx, obj
# Return a PUT (BINPUT, LONG_BINPUT) opcode string, with argument i.
def put(self, idx):
if self.proto >= 4:
return MEMOIZE
elif self.bin:
if idx < 256:
return BINPUT + pack("<B", idx)
else:
return LONG_BINPUT + pack("<I", idx)
else:
return PUT + repr(idx).encode("ascii") + b'\n'
# Return a GET (BINGET, LONG_BINGET) opcode string, with argument i.
def get(self, i):
if self.bin:
if i < 256:
return BINGET + pack("<B", i)
else:
return LONG_BINGET + pack("<I", i)
return GET + repr(i).encode("ascii") + b'\n'
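# Memoization sketch: pickling the same object twice emits one PUT-family
# opcode and then a GET-family reference instead of a second copy:
#
#     import pickle, pickletools
#     x = [1, 2]
#     pickletools.dis(pickle.dumps((x, x), protocol=2))
#     # the second element of the tuple appears as BINGET, not another list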
def save(self, obj, save_persistent_id=True):
self.framer.commit_frame()
# Check for persistent id (defined by a subclass)
pid = self.persistent_id(obj)
if pid is not None and save_persistent_id:
self.save_pers(pid)
return
# Check the memo
x = self.memo.get(id(obj))
if x is not None:
self.write(self.get(x[0]))
return
# Check the type dispatch table
t = type(obj)
f = self.dispatch.get(t)
if f is not None:
f(self, obj) # Call unbound method with explicit self
return
# Check private dispatch table if any, or else copyreg.dispatch_table
reduce = getattr(self, 'dispatch_table', dispatch_table).get(t)
if reduce is not None:
rv = reduce(obj)
else:
# Check for a class with a custom metaclass; treat as regular class
try:
issc = issubclass(t, type)
except TypeError: # t is not a class (old Boost; see SF #502085)
issc = False
if issc:
self.save_global(obj)
return
# Check for a __reduce_ex__ method, fall back to __reduce__
reduce = getattr(obj, "__reduce_ex__", None)
if reduce is not None:
rv = reduce(self.proto)
else:
reduce = getattr(obj, "__reduce__", None)
if reduce is not None:
rv = reduce()
else:
raise PicklingError("Can't pickle %r object: %r" %
(t.__name__, obj))
# Check for string returned by reduce(), meaning "save as global"
if isinstance(rv, str):
self.save_global(obj, rv)
return
# Assert that reduce() returned a tuple
if not isinstance(rv, tuple):
raise PicklingError("%s must return string or tuple" % reduce)
# Assert that it returned an appropriately sized tuple
l = len(rv)
if not (2 <= l <= 5):
raise PicklingError("Tuple returned by %s must have "
"two to five elements" % reduce)
# Save the reduce() output and finally memoize the object
self.save_reduce(obj=obj, *rv)
def persistent_id(self, obj):
# This exists so a subclass can override it
return None
def save_pers(self, pid):
# Save a persistent id reference
if self.bin:
self.save(pid, save_persistent_id=False)
self.write(BINPERSID)
else:
self.write(PERSID + str(pid).encode("ascii") + b'\n')
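# Sketch of a subclass using persistent IDs (hypothetical "db_key" scheme;
# persistent_id() returning None falls back to ordinary pickling, as in
# save() above):
#
#     import pickle
#     class DBPickler(pickle.Pickler):
#         def persistent_id(self, obj):
#             return getattr(obj, 'db_key', None)   # None -> pickle normally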
def save_reduce(self, func, args, state=None, listitems=None,
dictitems=None, obj=None):
# This API is called by some subclasses
if not isinstance(args, tuple):
raise PicklingError("args from save_reduce() must be a tuple")
if not callable(func):
raise PicklingError("func from save_reduce() must be callable")
save = self.save
write = self.write
func_name = getattr(func, "__name__", "")
if self.proto >= 4 and func_name == "__newobj_ex__":
cls, args, kwargs = args
if not hasattr(cls, "__new__"):
raise PicklingError("args[0] from {} args has no __new__"
.format(func_name))
if obj is not None and cls is not obj.__class__:
raise PicklingError("args[0] from {} args has the wrong class"
.format(func_name))
save(cls)
save(args)
save(kwargs)
write(NEWOBJ_EX)
elif self.proto >= 2 and func_name == "__newobj__":
# A __reduce__ implementation can direct protocol 2 or newer to
# use the more efficient NEWOBJ opcode, while still
# allowing protocol 0 and 1 to work normally. For this to
# work, the function returned by __reduce__ should be
# called __newobj__, and its first argument should be a
# class. The implementation for __newobj__
# should be as follows, although pickle has no way to
# verify this:
#
# def __newobj__(cls, *args):
# return cls.__new__(cls, *args)
#
# Protocols 0 and 1 will pickle a reference to __newobj__,
# while protocol 2 (and above) will pickle a reference to
# cls, the remaining args tuple, and the NEWOBJ code,
# which calls cls.__new__(cls, *args) at unpickling time
# (see load_newobj below). If __reduce__ returns a
# three-tuple, the state from the third tuple item will be
# pickled regardless of the protocol, calling __setstate__
# at unpickling time (see load_build below).
#
# Note that no standard __newobj__ implementation exists;
# you have to provide your own. This is to enforce
# compatibility with Python 2.2 (pickles written using
# protocol 0 or 1 in Python 2.3 should be unpicklable by
# Python 2.2).
cls = args[0]
if not hasattr(cls, "__new__"):
raise PicklingError(
"args[0] from __newobj__ args has no __new__")
if obj is not None and cls is not obj.__class__:
raise PicklingError(
"args[0] from __newobj__ args has the wrong class")
args = args[1:]
save(cls)
save(args)
write(NEWOBJ)
else:
save(func)
save(args)
write(REDUCE)
if obj is not None:
# If the object is already in the memo, this means it is
# recursive. In this case, throw away everything we put on the
# stack, and fetch the object back from the memo.
if id(obj) in self.memo:
write(POP + self.get(self.memo[id(obj)][0]))
else:
self.memoize(obj)
# More new special cases (that work with older protocols as
# well): when __reduce__ returns a tuple with 4 or 5 items,
# the 4th and 5th item should be iterators that provide list
# items and dict items (as (key, value) tuples), or None.
if listitems is not None:
self._batch_appends(listitems)
if dictitems is not None:
self._batch_setitems(dictitems)
if state is not None:
save(state)
write(BUILD)
# Methods below this point are dispatched through the dispatch table
dispatch = {}
def save_none(self, obj):
self.write(NONE)
dispatch[type(None)] = save_none
def save_bool(self, obj):
if self.proto >= 2:
self.write(NEWTRUE if obj else NEWFALSE)
else:
self.write(TRUE if obj else FALSE)
dispatch[bool] = save_bool
def save_long(self, obj):
if self.bin:
# If the int is small enough to fit in a signed 4-byte 2's-comp
# format, we can store it more efficiently than the general
# case.
# First one- and two-byte unsigned ints:
if obj >= 0:
if obj <= 0xff:
self.write(BININT1 + pack("<B", obj))
return
if obj <= 0xffff:
self.write(BININT2 + pack("<H", obj))
return
# Next check for 4-byte signed ints:
if -0x80000000 <= obj <= 0x7fffffff:
self.write(BININT + pack("<i", obj))
return
if self.proto >= 2:
encoded = encode_long(obj)
n = len(encoded)
if n < 256:
self.write(LONG1 + pack("<B", n) + encoded)
else:
self.write(LONG4 + pack("<i", n) + encoded)
return
self.write(LONG + repr(obj).encode("ascii") + b'L\n')
dispatch[int] = save_long
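# A quick check of the size tiers above (a sketch; byte strings shown are
# what this logic produces for protocol 1):
#
#     >>> import pickle
#     >>> pickle.dumps(255, protocol=1)      # fits BININT1
#     b'K\xff.'
#     >>> pickle.dumps(0xffff, protocol=1)   # fits BININT2
#     b'M\xff\xff.'
#     >>> pickle.dumps(-1, protocol=1)       # falls back to 4-byte BININT
#     b'J\xff\xff\xff\xff.'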
def save_float(self, obj):
if self.bin:
self.write(BINFLOAT + pack('>d', obj))
else:
self.write(FLOAT + repr(obj).encode("ascii") + b'\n')
dispatch[float] = save_float
def save_bytes(self, obj):
if self.proto < 3:
if not obj: # bytes object is empty
self.save_reduce(bytes, (), obj=obj)
else:
self.save_reduce(codecs.encode,
(str(obj, 'latin1'), 'latin1'), obj=obj)
return
n = len(obj)
if n <= 0xff:
self.write(SHORT_BINBYTES + pack("<B", n) + obj)
elif n > 0xffffffff and self.proto >= 4:
self.write(BINBYTES8 + pack("<Q", n) + obj)
else:
self.write(BINBYTES + pack("<I", n) + obj)
self.memoize(obj)
dispatch[bytes] = save_bytes
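# Note that the protocol < 3 path round-trips bytes through a latin-1 str
# via codecs.encode, so non-ASCII bytes still survive (a sketch):
#
#     >>> import pickle
#     >>> pickle.loads(pickle.dumps(b'\xff\x00', protocol=2))
#     b'\xff\x00'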
def save_str(self, obj):
if self.bin:
encoded = obj.encode('utf-8', 'surrogatepass')
n = len(encoded)
if n <= 0xff and self.proto >= 4:
self.write(SHORT_BINUNICODE + pack("<B", n) + encoded)
elif n > 0xffffffff and self.proto >= 4:
self.write(BINUNICODE8 + pack("<Q", n) + encoded)
else:
self.write(BINUNICODE + pack("<I", n) + encoded)
else:
obj = obj.replace("\\", "\\u005c")
obj = obj.replace("\n", "\\u000a")
self.write(UNICODE + obj.encode('raw-unicode-escape') +
b'\n')
self.memoize(obj)
dispatch[str] = save_str
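# In text mode, backslashes and newlines are escaped before the
# raw-unicode-escape encoding so the newline-terminated argument stays
# unambiguous (a sketch):
#
#     >>> import pickle
#     >>> pickle.dumps('a\nb', protocol=0)
#     b'Va\\u000ab\np0\n.'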
def save_tuple(self, obj):
if not obj: # tuple is empty
if self.bin:
self.write(EMPTY_TUPLE)
else:
self.write(MARK + TUPLE)
return
n = len(obj)
save = self.save
memo = self.memo
if n <= 3 and self.proto >= 2:
for element in obj:
save(element)
# Subtle. Same as in the big comment below.
if id(obj) in memo:
get = self.get(memo[id(obj)][0])
self.write(POP * n + get)
else:
self.write(_tuplesize2code[n])
self.memoize(obj)
return
# proto 0 or proto 1 and tuple isn't empty, or proto > 1 and tuple
# has more than 3 elements.
write = self.write
write(MARK)
for element in obj:
save(element)
if id(obj) in memo:
# Subtle. d was not in memo when we entered save_tuple(), so
# the process of saving the tuple's elements must have saved
# the tuple itself: the tuple is recursive. The proper action
# now is to throw away everything we put on the stack, and
# simply GET the tuple (it's already constructed). This check
# could have been done in the "for element" loop instead, but
# recursive tuples are a rare thing.
get = self.get(memo[id(obj)][0])
if self.bin:
write(POP_MARK + get)
else: # proto 0 -- POP_MARK not available
write(POP * (n+1) + get)
return
# No recursion.
write(TUPLE)
self.memoize(obj)
dispatch[tuple] = save_tuple
def save_list(self, obj):
if self.bin:
self.write(EMPTY_LIST)
else: # proto 0 -- can't use EMPTY_LIST
self.write(MARK + LIST)
self.memoize(obj)
self._batch_appends(obj)
dispatch[list] = save_list
_BATCHSIZE = 1000
def _batch_appends(self, items):
# Helper to batch up APPENDS sequences
save = self.save
write = self.write
if not self.bin:
for x in items:
save(x)
write(APPEND)
return
it = iter(items)
while True:
tmp = list(islice(it, self._BATCHSIZE))
n = len(tmp)
if n > 1:
write(MARK)
for x in tmp:
save(x)
write(APPENDS)
elif n:
save(tmp[0])
write(APPEND)
# else tmp is empty, and we're done
if n < self._BATCHSIZE:
return
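# Batching sketch: a 2500-item list becomes
#
#     MARK item*1000 APPENDS  MARK item*1000 APPENDS  MARK item*500 APPENDS
#
# and a final short batch (< _BATCHSIZE) signals the end. This bounds the
# unpickler's working stack at _BATCHSIZE entries per run.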
def save_dict(self, obj):
if self.bin:
self.write(EMPTY_DICT)
else: # proto 0 -- can't use EMPTY_DICT
self.write(MARK + DICT)
self.memoize(obj)
self._batch_setitems(obj.items())
dispatch[dict] = save_dict
if PyStringMap is not None:
dispatch[PyStringMap] = save_dict
def _batch_setitems(self, items):
# Helper to batch up SETITEMS sequences; proto >= 1 only
save = self.save
write = self.write
if not self.bin:
for k, v in items:
save(k)
save(v)
write(SETITEM)
return
it = iter(items)
while True:
tmp = list(islice(it, self._BATCHSIZE))
n = len(tmp)
if n > 1:
write(MARK)
for k, v in tmp:
save(k)
save(v)
write(SETITEMS)
elif n:
k, v = tmp[0]
save(k)
save(v)
write(SETITEM)
# else tmp is empty, and we're done
if n < self._BATCHSIZE:
return
def save_set(self, obj):
save = self.save
write = self.write
if self.proto < 4:
self.save_reduce(set, (list(obj),), obj=obj)
return
write(EMPTY_SET)
self.memoize(obj)
it = iter(obj)
while True:
batch = list(islice(it, self._BATCHSIZE))
n = len(batch)
if n > 0:
write(MARK)
for item in batch:
save(item)
write(ADDITEMS)
if n < self._BATCHSIZE:
return
dispatch[set] = save_set
def save_frozenset(self, obj):
save = self.save
write = self.write
if self.proto < 4:
self.save_reduce(frozenset, (list(obj),), obj=obj)
return
write(MARK)
for item in obj:
save(item)
if id(obj) in self.memo:
# If the object is already in the memo, this means it is
# recursive. In this case, throw away everything we put on the
# stack, and fetch the object back from the memo.
write(POP_MARK + self.get(self.memo[id(obj)][0]))
return
write(FROZENSET)
self.memoize(obj)
dispatch[frozenset] = save_frozenset
def save_global(self, obj, name=None):
write = self.write
memo = self.memo
if name is None and self.proto >= 4:
name = getattr(obj, '__qualname__', None)
if name is None:
name = obj.__name__
module_name = whichmodule(obj, name, allow_qualname=self.proto >= 4)
try:
__import__(module_name, level=0)
module = sys.modules[module_name]
obj2 = _getattribute(module, name, allow_qualname=self.proto >= 4)
except (ImportError, KeyError, AttributeError):
raise PicklingError(
"Can't pickle %r: it's not found as %s.%s" %
(obj, module_name, name))
else:
if obj2 is not obj:
raise PicklingError(
"Can't pickle %r: it's not the same object as %s.%s" %
(obj, module_name, name))
if self.proto >= 2:
code = _extension_registry.get((module_name, name))
if code:
assert code > 0
if code <= 0xff:
write(EXT1 + pack("<B", code))
elif code <= 0xffff:
write(EXT2 + pack("<H", code))
else:
write(EXT4 + pack("<i", code))
return
# Non-ASCII identifiers are supported only with protocols >= 3.
if self.proto >= 4:
self.save(module_name)
self.save(name)
write(STACK_GLOBAL)
elif self.proto >= 3:
write(GLOBAL + bytes(module_name, "utf-8") + b'\n' +
bytes(name, "utf-8") + b'\n')
else:
if self.fix_imports:
r_name_mapping = _compat_pickle.REVERSE_NAME_MAPPING
r_import_mapping = _compat_pickle.REVERSE_IMPORT_MAPPING
if (module_name, name) in r_name_mapping:
module_name, name = r_name_mapping[(module_name, name)]
elif module_name in r_import_mapping:
module_name = r_import_mapping[module_name]
try:
write(GLOBAL + bytes(module_name, "ascii") + b'\n' +
bytes(name, "ascii") + b'\n')
except UnicodeEncodeError:
raise PicklingError(
"can't pickle global identifier '%s.%s' using "
"pickle protocol %i" % (module, name, self.proto))
self.memoize(obj)
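# Classes and functions are thus pickled by reference, not by value; e.g.
# (a sketch -- note the fix_imports reverse mapping of 'builtins' for
# protocols < 3):
#
#     >>> import pickle
#     >>> pickle.dumps(len, protocol=0)
#     b'c__builtin__\nlen\np0\n.'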
def save_type(self, obj):
if obj is type(None):
return self.save_reduce(type, (None,), obj=obj)
elif obj is type(NotImplemented):
return self.save_reduce(type, (NotImplemented,), obj=obj)
elif obj is type(...):
return self.save_reduce(type, (...,), obj=obj)
return self.save_global(obj)
dispatch[FunctionType] = save_global
dispatch[type] = save_type
# Unpickling machinery
class _Unpickler:
def __init__(self, file, *, fix_imports=True,
encoding="ASCII", errors="strict"):
"""This takes a binary file for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so
no proto argument is needed.
The argument *file* must have two methods, a read() method that
takes an integer argument, and a readline() method that requires
no arguments. Both methods should return bytes. Thus *file*
can be a binary file object opened for reading, an io.BytesIO
object, or any other custom object that meets this interface.
Optional keyword arguments are *fix_imports*, *encoding* and
*errors*, which are used to control compatibility support for
pickle streams generated by Python 2. If *fix_imports* is True,
pickle will try to map the old Python 2 names to the new names
used in Python 3. The *encoding* and *errors* tell pickle how
to decode 8-bit string instances pickled by Python 2; these
default to 'ASCII' and 'strict', respectively. *encoding* can be
'bytes' to read these 8-bit string instances as bytes objects.
"""
self._file_readline = file.readline
self._file_read = file.read
self.memo = {}
self.encoding = encoding
self.errors = errors
self.proto = 0
self.fix_imports = fix_imports
def load(self):
"""Read a pickled object representation from the open file.
Return the reconstituted object hierarchy specified in the file.
"""
# Check whether Unpickler was initialized correctly. This is
# only needed to mimic the behavior of _pickle.Unpickler.load().
if not hasattr(self, "_file_read"):
raise UnpicklingError("Unpickler.__init__() was not called by "
"%s.__init__()" % (self.__class__.__name__,))
self._unframer = _Unframer(self._file_read, self._file_readline)
self.read = self._unframer.read
self.readline = self._unframer.readline
self.mark = object() # any new unique object
self.stack = []
self.append = self.stack.append
self.proto = 0
read = self.read
dispatch = self.dispatch
try:
while True:
key = read(1)
if not key:
raise EOFError
assert isinstance(key, bytes_types)
dispatch[key[0]](self)
except _Stop as stopinst:
return stopinst.value
# Return largest index k such that self.stack[k] is self.mark.
# If the stack doesn't contain a mark, eventually raises IndexError.
# This could be sped by maintaining another stack, of indices at which
# the mark appears. For that matter, the latter stack would suffice,
# and we wouldn't need to push mark objects on self.stack at all.
# Doing so is probably a good thing, though, since if the pickle is
# corrupt (or hostile) we may get a clue from finding self.mark embedded
# in unpickled objects.
def marker(self):
stack = self.stack
mark = self.mark
k = len(stack)-1
while stack[k] is not mark: k = k-1
return k
def persistent_load(self, pid):
raise UnpicklingError("unsupported persistent id encountered")
dispatch = {}
def load_proto(self):
proto = self.read(1)[0]
if not 0 <= proto <= HIGHEST_PROTOCOL:
raise ValueError("unsupported pickle protocol: %d" % proto)
self.proto = proto
dispatch[PROTO[0]] = load_proto
def load_frame(self):
frame_size, = unpack('<Q', self.read(8))
if frame_size > sys.maxsize:
raise ValueError("frame size > sys.maxsize: %d" % frame_size)
self._unframer.load_frame(frame_size)
dispatch[FRAME[0]] = load_frame
def load_persid(self):
pid = self.readline()[:-1].decode("ascii")
self.append(self.persistent_load(pid))
dispatch[PERSID[0]] = load_persid
def load_binpersid(self):
pid = self.stack.pop()
self.append(self.persistent_load(pid))
dispatch[BINPERSID[0]] = load_binpersid
def load_none(self):
self.append(None)
dispatch[NONE[0]] = load_none
def load_false(self):
self.append(False)
dispatch[NEWFALSE[0]] = load_false
def load_true(self):
self.append(True)
dispatch[NEWTRUE[0]] = load_true
def load_int(self):
data = self.readline()
if data == FALSE[1:]:
val = False
elif data == TRUE[1:]:
val = True
else:
val = int(data, 0)
self.append(val)
dispatch[INT[0]] = load_int
def load_binint(self):
self.append(unpack('<i', self.read(4))[0])
dispatch[BININT[0]] = load_binint
def load_binint1(self):
self.append(self.read(1)[0])
dispatch[BININT1[0]] = load_binint1
def load_binint2(self):
self.append(unpack('<H', self.read(2))[0])
dispatch[BININT2[0]] = load_binint2
def load_long(self):
val = self.readline()[:-1]
if val and val[-1] == b'L'[0]:
val = val[:-1]
self.append(int(val, 0))
dispatch[LONG[0]] = load_long
def load_long1(self):
n = self.read(1)[0]
data = self.read(n)
self.append(decode_long(data))
dispatch[LONG1[0]] = load_long1
def load_long4(self):
n, = unpack('<i', self.read(4))
if n < 0:
# Corrupt or hostile pickle -- we never write one like this
raise UnpicklingError("LONG pickle has negative byte count")
data = self.read(n)
self.append(decode_long(data))
dispatch[LONG4[0]] = load_long4
def load_float(self):
self.append(float(self.readline()[:-1]))
dispatch[FLOAT[0]] = load_float
def load_binfloat(self):
self.append(unpack('>d', self.read(8))[0])
dispatch[BINFLOAT[0]] = load_binfloat
def _decode_string(self, value):
# Used to allow strings from Python 2 to be decoded either as
# bytes or Unicode strings. This should be used only with the
# STRING, BINSTRING and SHORT_BINSTRING opcodes.
if self.encoding == "bytes":
return value
else:
return value.decode(self.encoding, self.errors)
def load_string(self):
data = self.readline()[:-1]
# Strip outermost quotes
if len(data) >= 2 and data[0] == data[-1] and data[0] in b'"\'':
data = data[1:-1]
else:
raise UnpicklingError("the STRING opcode argument must be quoted")
self.append(self._decode_string(codecs.escape_decode(data)[0]))
dispatch[STRING[0]] = load_string
def load_binstring(self):
# Deprecated BINSTRING uses signed 32-bit length
len, = unpack('<i', self.read(4))
if len < 0:
raise UnpicklingError("BINSTRING pickle has negative byte count")
data = self.read(len)
self.append(self._decode_string(data))
dispatch[BINSTRING[0]] = load_binstring
def load_binbytes(self):
len, = unpack('<I', self.read(4))
if len > maxsize:
raise UnpicklingError("BINBYTES exceeds system's maximum size "
"of %d bytes" % maxsize)
self.append(self.read(len))
dispatch[BINBYTES[0]] = load_binbytes
def load_unicode(self):
self.append(str(self.readline()[:-1], 'raw-unicode-escape'))
dispatch[UNICODE[0]] = load_unicode
def load_binunicode(self):
len, = unpack('<I', self.read(4))
if len > maxsize:
raise UnpicklingError("BINUNICODE exceeds system's maximum size "
"of %d bytes" % maxsize)
self.append(str(self.read(len), 'utf-8', 'surrogatepass'))
dispatch[BINUNICODE[0]] = load_binunicode
def load_binunicode8(self):
len, = unpack('<Q', self.read(8))
if len > maxsize:
raise UnpicklingError("BINUNICODE8 exceeds system's maximum size "
"of %d bytes" % maxsize)
self.append(str(self.read(len), 'utf-8', 'surrogatepass'))
dispatch[BINUNICODE8[0]] = load_binunicode8
def load_binbytes8(self):
len, = unpack('<Q', self.read(8))
if len > maxsize:
raise UnpicklingError("BINBYTES8 exceeds system's maximum size "
"of %d bytes" % maxsize)
self.append(self.read(len))
dispatch[BINBYTES8[0]] = load_binbytes8
def load_short_binstring(self):
len = self.read(1)[0]
data = self.read(len)
self.append(self._decode_string(data))
dispatch[SHORT_BINSTRING[0]] = load_short_binstring
def load_short_binbytes(self):
len = self.read(1)[0]
self.append(self.read(len))
dispatch[SHORT_BINBYTES[0]] = load_short_binbytes
def load_short_binunicode(self):
len = self.read(1)[0]
self.append(str(self.read(len), 'utf-8', 'surrogatepass'))
dispatch[SHORT_BINUNICODE[0]] = load_short_binunicode
def load_tuple(self):
k = self.marker()
self.stack[k:] = [tuple(self.stack[k+1:])]
dispatch[TUPLE[0]] = load_tuple
def load_empty_tuple(self):
self.append(())
dispatch[EMPTY_TUPLE[0]] = load_empty_tuple
def load_tuple1(self):
self.stack[-1] = (self.stack[-1],)
dispatch[TUPLE1[0]] = load_tuple1
def load_tuple2(self):
self.stack[-2:] = [(self.stack[-2], self.stack[-1])]
dispatch[TUPLE2[0]] = load_tuple2
def load_tuple3(self):
self.stack[-3:] = [(self.stack[-3], self.stack[-2], self.stack[-1])]
dispatch[TUPLE3[0]] = load_tuple3
def load_empty_list(self):
self.append([])
dispatch[EMPTY_LIST[0]] = load_empty_list
def load_empty_dictionary(self):
self.append({})
dispatch[EMPTY_DICT[0]] = load_empty_dictionary
def load_empty_set(self):
self.append(set())
dispatch[EMPTY_SET[0]] = load_empty_set
def load_frozenset(self):
k = self.marker()
self.stack[k:] = [frozenset(self.stack[k+1:])]
dispatch[FROZENSET[0]] = load_frozenset
def load_list(self):
k = self.marker()
self.stack[k:] = [self.stack[k+1:]]
dispatch[LIST[0]] = load_list
def load_dict(self):
k = self.marker()
items = self.stack[k+1:]
d = {items[i]: items[i+1]
for i in range(0, len(items), 2)}
self.stack[k:] = [d]
dispatch[DICT[0]] = load_dict
# INST and OBJ differ only in how they get a class object. It's not
# only sensible to do the rest in a common routine, the two routines
# previously diverged and grew different bugs.
# klass is the class to instantiate, and k points to the topmost mark
# object, following which are the arguments for klass.__init__.
def _instantiate(self, klass, k):
args = tuple(self.stack[k+1:])
del self.stack[k:]
if (args or not isinstance(klass, type) or
hasattr(klass, "__getinitargs__")):
try:
value = klass(*args)
except TypeError as err:
raise TypeError("in constructor for %s: %s" %
(klass.__name__, str(err)), sys.exc_info()[2])
else:
value = klass.__new__(klass)
self.append(value)
def load_inst(self):
module = self.readline()[:-1].decode("ascii")
name = self.readline()[:-1].decode("ascii")
klass = self.find_class(module, name)
self._instantiate(klass, self.marker())
dispatch[INST[0]] = load_inst
def load_obj(self):
# Stack is ... markobject classobject arg1 arg2 ...
k = self.marker()
klass = self.stack.pop(k+1)
self._instantiate(klass, k)
dispatch[OBJ[0]] = load_obj
def load_newobj(self):
args = self.stack.pop()
cls = self.stack.pop()
obj = cls.__new__(cls, *args)
self.append(obj)
dispatch[NEWOBJ[0]] = load_newobj
def load_newobj_ex(self):
kwargs = self.stack.pop()
args = self.stack.pop()
cls = self.stack.pop()
obj = cls.__new__(cls, *args, **kwargs)
self.append(obj)
dispatch[NEWOBJ_EX[0]] = load_newobj_ex
def load_global(self):
module = self.readline()[:-1].decode("utf-8")
name = self.readline()[:-1].decode("utf-8")
klass = self.find_class(module, name)
self.append(klass)
dispatch[GLOBAL[0]] = load_global
def load_stack_global(self):
name = self.stack.pop()
module = self.stack.pop()
if type(name) is not str or type(module) is not str:
raise UnpicklingError("STACK_GLOBAL requires str")
self.append(self.find_class(module, name))
dispatch[STACK_GLOBAL[0]] = load_stack_global
def load_ext1(self):
code = self.read(1)[0]
self.get_extension(code)
dispatch[EXT1[0]] = load_ext1
def load_ext2(self):
code, = unpack('<H', self.read(2))
self.get_extension(code)
dispatch[EXT2[0]] = load_ext2
def load_ext4(self):
code, = unpack('<i', self.read(4))
self.get_extension(code)
dispatch[EXT4[0]] = load_ext4
def get_extension(self, code):
nil = []
obj = _extension_cache.get(code, nil)
if obj is not nil:
self.append(obj)
return
key = _inverted_registry.get(code)
if not key:
if code <= 0: # note that 0 is forbidden
# Corrupt or hostile pickle.
raise UnpicklingError("EXT specifies code <= 0")
raise ValueError("unregistered extension code %d" % code)
obj = self.find_class(*key)
_extension_cache[code] = obj
self.append(obj)
def find_class(self, module, name):
# Subclasses may override this.
if self.proto < 3 and self.fix_imports:
if (module, name) in _compat_pickle.NAME_MAPPING:
module, name = _compat_pickle.NAME_MAPPING[(module, name)]
elif module in _compat_pickle.IMPORT_MAPPING:
module = _compat_pickle.IMPORT_MAPPING[module]
__import__(module, level=0)
return _getattribute(sys.modules[module], name,
allow_qualname=self.proto >= 4)
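# A common hardening pattern is to override find_class() in a subclass so
# only an allow-listed set of globals can be loaded (a sketch; the allowed
# names here are arbitrary examples):
#
#     import builtins
#     class RestrictedUnpickler(Unpickler):
#         def find_class(self, module, name):
#             if module == "builtins" and name in {"range", "complex"}:
#                 return getattr(builtins, name)
#             raise UnpicklingError("global '%s.%s' is forbidden"
#                                   % (module, name))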
def load_reduce(self):
stack = self.stack
args = stack.pop()
func = stack[-1]
stack[-1] = func(*args)
dispatch[REDUCE[0]] = load_reduce
def load_pop(self):
del self.stack[-1]
dispatch[POP[0]] = load_pop
def load_pop_mark(self):
k = self.marker()
del self.stack[k:]
dispatch[POP_MARK[0]] = load_pop_mark
def load_dup(self):
self.append(self.stack[-1])
dispatch[DUP[0]] = load_dup
def load_get(self):
i = int(self.readline()[:-1])
self.append(self.memo[i])
dispatch[GET[0]] = load_get
def load_binget(self):
i = self.read(1)[0]
self.append(self.memo[i])
dispatch[BINGET[0]] = load_binget
def load_long_binget(self):
i, = unpack('<I', self.read(4))
self.append(self.memo[i])
dispatch[LONG_BINGET[0]] = load_long_binget
def load_put(self):
i = int(self.readline()[:-1])
if i < 0:
raise ValueError("negative PUT argument")
self.memo[i] = self.stack[-1]
dispatch[PUT[0]] = load_put
def load_binput(self):
i = self.read(1)[0]
if i < 0:
raise ValueError("negative BINPUT argument")
self.memo[i] = self.stack[-1]
dispatch[BINPUT[0]] = load_binput
def load_long_binput(self):
i, = unpack('<I', self.read(4))
if i > maxsize:
raise ValueError("negative LONG_BINPUT argument")
self.memo[i] = self.stack[-1]
dispatch[LONG_BINPUT[0]] = load_long_binput
def load_memoize(self):
memo = self.memo
memo[len(memo)] = self.stack[-1]
dispatch[MEMOIZE[0]] = load_memoize
def load_append(self):
stack = self.stack
value = stack.pop()
list = stack[-1]
list.append(value)
dispatch[APPEND[0]] = load_append
def load_appends(self):
stack = self.stack
mark = self.marker()
list_obj = stack[mark - 1]
items = stack[mark + 1:]
if isinstance(list_obj, list):
list_obj.extend(items)
else:
append = list_obj.append
for item in items:
append(item)
del stack[mark:]
dispatch[APPENDS[0]] = load_appends
def load_setitem(self):
stack = self.stack
value = stack.pop()
key = stack.pop()
dict = stack[-1]
dict[key] = value
dispatch[SETITEM[0]] = load_setitem
def load_setitems(self):
stack = self.stack
mark = self.marker()
dict = stack[mark - 1]
for i in range(mark + 1, len(stack), 2):
dict[stack[i]] = stack[i + 1]
del stack[mark:]
dispatch[SETITEMS[0]] = load_setitems
def load_additems(self):
stack = self.stack
mark = self.marker()
set_obj = stack[mark - 1]
items = stack[mark + 1:]
if isinstance(set_obj, set):
set_obj.update(items)
else:
add = set_obj.add
for item in items:
add(item)
del stack[mark:]
dispatch[ADDITEMS[0]] = load_additems
def load_build(self):
stack = self.stack
state = stack.pop()
inst = stack[-1]
setstate = getattr(inst, "__setstate__", None)
if setstate is not None:
setstate(state)
return
slotstate = None
if isinstance(state, tuple) and len(state) == 2:
state, slotstate = state
if state:
inst_dict = inst.__dict__
intern = sys.intern
for k, v in state.items():
if type(k) is str:
inst_dict[intern(k)] = v
else:
inst_dict[k] = v
if slotstate:
for k, v in slotstate.items():
setattr(inst, k, v)
dispatch[BUILD[0]] = load_build
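# The two-tuple state form above is what instances with both a __dict__
# and __slots__ produce by default under protocol 2+ when they define no
# __setstate__; e.g. an object with __slots__ = ('tag',) plus ordinary
# attributes arrives here as something like ({'x': 1}, {'tag': 'a'}) and
# is applied attribute by attribute (values here are illustrative).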
def load_mark(self):
self.append(self.mark)
dispatch[MARK[0]] = load_mark
def load_stop(self):
value = self.stack.pop()
raise _Stop(value)
dispatch[STOP[0]] = load_stop
# Shorthands
def _dump(obj, file, protocol=None, *, fix_imports=True):
_Pickler(file, protocol, fix_imports=fix_imports).dump(obj)
def _dumps(obj, protocol=None, *, fix_imports=True):
f = io.BytesIO()
_Pickler(f, protocol, fix_imports=fix_imports).dump(obj)
res = f.getvalue()
assert isinstance(res, bytes_types)
return res
def _load(file, *, fix_imports=True, encoding="ASCII", errors="strict"):
return _Unpickler(file, fix_imports=fix_imports,
encoding=encoding, errors=errors).load()
def _loads(s, *, fix_imports=True, encoding="ASCII", errors="strict"):
if isinstance(s, str):
raise TypeError("Can't load pickle from unicode string")
file = io.BytesIO(s)
return _Unpickler(file, fix_imports=fix_imports,
encoding=encoding, errors=errors).load()
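# Round-trip sanity check for these shorthands (a sketch):
#
#     >>> _loads(_dumps([1, 2.5, (None, True), b'bytes']))
#     [1, 2.5, (None, True), b'bytes']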
# Use the faster _pickle if possible
try:
from _pickle import (
PickleError,
PicklingError,
UnpicklingError,
Pickler,
Unpickler,
dump,
dumps,
load,
loads
)
except ImportError:
Pickler, Unpickler = _Pickler, _Unpickler
dump, dumps, load, loads = _dump, _dumps, _load, _loads
# Doctest
def _test():
import doctest
return doctest.testmod()
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(
description='display contents of the pickle files')
parser.add_argument(
'pickle_file', type=argparse.FileType('br'),
nargs='*', help='the pickle file')
parser.add_argument(
'-t', '--test', action='store_true',
help='run self-test suite')
parser.add_argument(
'-v', action='store_true',
help='run verbosely; only affects self-test run')
args = parser.parse_args()
if args.test:
_test()
else:
if not args.pickle_file:
parser.print_help()
else:
import pprint
for f in args.pickle_file:
obj = load(f)
pprint.pprint(obj)
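# Example invocations of this command-line interface (a sketch; data.pkl
# is a placeholder filename):
#
#     $ python pickle.py data.pkl    # pretty-print the unpickled contents
#     $ python pickle.py -t -v      # run the doctest-based self-test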
'''"Executable documentation" for the pickle module.
Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here. Some functions meant for external use:
genops(pickle)
Generate all the opcodes in a pickle, as (opcode, arg, position) triples.
dis(pickle, out=None, memo=None, indentlevel=4)
Print a symbolic disassembly of a pickle.
'''
import codecs
import io
import pickle
import re
import sys
__all__ = ['dis', 'genops', 'optimize']
bytes_types = pickle.bytes_types
# Other ideas:
#
# - A pickle verifier: read a pickle and check it exhaustively for
# well-formedness. dis() does a lot of this already.
#
# - A protocol identifier: examine a pickle and return its protocol number
# (== the highest .proto attr value among all the opcodes in the pickle).
# dis() already prints this info at the end.
#
# - A pickle optimizer: for example, tuple-building code is sometimes more
# elaborate than necessary, catering for the possibility that the tuple
# is recursive. Or lots of times a PUT is generated that's never accessed
# by a later GET.
# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
# called an unpickling machine). It's a sequence of opcodes, interpreted by the
# PM, building an arbitrarily complex Python object.
#
# For the most part, the PM is very simple: there are no looping, testing, or
# conditional instructions, no arithmetic and no function calls. Opcodes are
# executed once each, from first to last, until a STOP opcode is reached.
#
# The PM has two data areas, "the stack" and "the memo".
#
# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
# integer object on the stack, whose value is gotten from a decimal string
# literal immediately following the INT opcode in the pickle bytestream. Other
# opcodes take Python objects off the stack. The result of unpickling is
# whatever object is left on the stack when the final STOP opcode is executed.
#
# The memo is simply an array of objects, or it can be implemented as a dict
# mapping little integers to objects. The memo serves as the PM's "long term
# memory", and the little integers indexing the memo are akin to variable
# names. Some opcodes pop a stack object into the memo at a given index,
# and others push a memo object at a given index onto the stack again.
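#
# A concrete look at one tiny "program" (a sketch; the disassembly below
# follows the format of dis(), defined later in this module, and the
# exact column spacing is illustrative):
#
#     >>> import pickle, pickletools
#     >>> pickletools.dis(pickle.dumps((1, 2), protocol=2))
#         0: \x80 PROTO      2
#         2: K    BININT1    1
#         4: K    BININT1    2
#         6: \x86 TUPLE2
#         7: q    BINPUT     0
#         9: .    STOP
#     highest protocol among opcodes = 2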
#
# At heart, that's all the PM has. Subtleties arise for these reasons:
#
# + Object identity. Objects can be arbitrarily complex, and subobjects
# may be shared (for example, the list [a, a] refers to the same object a
# twice). It can be vital that unpickling recreate an isomorphic object
# graph, faithfully reproducing sharing.
#
# + Recursive objects. For example, after "L = []; L.append(L)", L is a
# list, and L[0] is the same list. This is related to the object identity
# point, and some sequences of pickle opcodes are subtle in order to
# get the right result in all cases.
#
# + Things pickle doesn't know everything about. Examples of things pickle
# does know everything about are Python's builtin scalar and container
# types, like ints and tuples. They generally have opcodes dedicated to
# them. For things like module references and instances of user-defined
# classes, pickle's knowledge is limited. Historically, many enhancements
# have been made to the pickle protocol in order to do a better (faster,
# and/or more compact) job on those.
#
# + Backward compatibility and micro-optimization. As explained below,
# pickle opcodes never go away, not even when better ways to do a thing
# get invented. The repertoire of the PM just keeps growing over time.
# For example, protocol 0 had two opcodes for building Python integers (INT
# and LONG), protocol 1 added three more for more-efficient pickling of short
# integers, and protocol 2 added two more for more-efficient pickling of
# long integers (before protocol 2, the only ways to pickle a Python long
# took time quadratic in the number of digits, for both pickling and
# unpickling). "Opcode bloat" isn't so much a subtlety as a source of
# wearying complication.
#
#
# Pickle protocols:
#
# For compatibility, the meaning of a pickle opcode never changes. Instead new
# pickle opcodes get added, and each version's unpickler can handle all the
# pickle opcodes in all protocol versions to date. So old pickles continue to
# be readable forever. The pickler can generally be told to restrict itself to
# the subset of opcodes available under previous protocol versions too, so that
# users can create pickles under the current version readable by older
# versions. However, a pickle does not contain its version number embedded
# within it. If an older unpickler tries to read a pickle using a later
# protocol, the result is most likely an exception due to seeing an unknown (in
# the older unpickler) opcode.
#
# The original pickle used what's now called "protocol 0", and what was called
# "text mode" before Python 2.3. The entire pickle bytestream is made up of
# printable 7-bit ASCII characters, plus the newline character, in protocol 0.
# That's why it was called text mode. Protocol 0 is small and elegant, but
# sometimes painfully inefficient.
#
# The second major set of additions is now called "protocol 1", and was called
# "binary mode" before Python 2.3. This added many opcodes with arguments
# consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"
# bytes. Binary mode pickles can be substantially smaller than equivalent
# text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte
# int as 4 bytes following the opcode, which is cheaper to unpickle than the
# (perhaps) 11-character decimal string attached to INT. Protocol 1 also added
# a number of opcodes that operate on many stack elements at once (like APPENDS
# and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).
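#
# For instance (a sketch): with this module the int 255 pickles as
# b'L255L\n.' under protocol 0 (a 7-byte LONG text literal) but as
# b'K\xff.' under protocol 1 (BININT1, 3 bytes).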
#
# The third major set of additions came in Python 2.3, and is called "protocol
# 2". This added:
#
# - A better way to pickle instances of new-style classes (NEWOBJ).
#
# - A way for a pickle to identify its protocol (PROTO).
#
# - Time- and space- efficient pickling of long ints (LONG{1,4}).
#
# - Shortcuts for small tuples (TUPLE{1,2,3}).
#
# - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).
#
# - The "extension registry", a vector of popular objects that can be pushed
# efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but
# the registry contents are predefined (there's nothing akin to the memo's
# PUT).
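#
#   Applications opt in through copyreg; a sketch, where the module, name
#   and code 240 are arbitrary app-private examples:
#
#       import copyreg
#       copyreg.add_extension("myapp.shapes", "Point", 240)
#
#   after which pickling myapp.shapes.Point emits EXT1 240 instead of the
#   module/name pair.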
#
# Another independent change with Python 2.3 is the abandonment of any
# pretense that it might be safe to load pickles received from untrusted
# parties -- no sufficient security analysis has been done to guarantee
# this and there isn't a use case that warrants the expense of such an
# analysis.
#
# To this end, all tests for __safe_for_unpickling__ or for
# copyreg.safe_constructors are removed from the unpickling code.
# References to these variables in the descriptions below are to be seen
# as describing unpickling in Python 2.2 and before.
# Meta-rule: Descriptions are stored in instances of descriptor objects,
# with plain constructors. No meta-language is defined from which
# descriptors could be constructed. If you want, e.g., XML, write a little
# program to generate XML from the objects.
##############################################################################
# Some pickle opcodes have an argument, following the opcode in the
# bytestream. An argument is of a specific type, described by an instance
# of ArgumentDescriptor. These are not to be confused with arguments taken
# off the stack -- ArgumentDescriptor applies only to arguments embedded in
# the opcode stream, immediately following an opcode.
# Represents the number of bytes consumed by an argument delimited by the
# next newline character.
UP_TO_NEWLINE = -1
# Represents the number of bytes consumed by a two-argument opcode where
# the first argument gives the number of bytes in the second argument.
TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int
TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int
TAKEN_FROM_ARGUMENT4U = -4 # num bytes is 4-byte unsigned little-endian int
TAKEN_FROM_ARGUMENT8U = -5 # num bytes is 8-byte unsigned little-endian int
class ArgumentDescriptor(object):
__slots__ = (
# name of descriptor record, also a module global name; a string
'name',
# length of argument, in bytes; an int; UP_TO_NEWLINE and
# TAKEN_FROM_ARGUMENT{1,4,8} are negative values for variable-length
# cases
'n',
# a function taking a file-like object, reading this kind of argument
# from the object at the current position, advancing the current
# position by n bytes, and returning the value of the argument
'reader',
# human-readable docs for this arg descriptor; a string
'doc',
)
def __init__(self, name, n, reader, doc):
assert isinstance(name, str)
self.name = name
assert isinstance(n, int) and (n >= 0 or
n in (UP_TO_NEWLINE,
TAKEN_FROM_ARGUMENT1,
TAKEN_FROM_ARGUMENT4,
TAKEN_FROM_ARGUMENT4U,
TAKEN_FROM_ARGUMENT8U))
self.n = n
self.reader = reader
assert isinstance(doc, str)
self.doc = doc
from struct import unpack as _unpack
def read_uint1(f):
r"""
>>> import io
>>> read_uint1(io.BytesIO(b'\xff'))
255
"""
data = f.read(1)
if data:
return data[0]
raise ValueError("not enough data in stream to read uint1")
uint1 = ArgumentDescriptor(
name='uint1',
n=1,
reader=read_uint1,
doc="One-byte unsigned integer.")
def read_uint2(f):
r"""
>>> import io
>>> read_uint2(io.BytesIO(b'\xff\x00'))
255
>>> read_uint2(io.BytesIO(b'\xff\xff'))
65535
"""
data = f.read(2)
if len(data) == 2:
return _unpack("<H", data)[0]
raise ValueError("not enough data in stream to read uint2")
uint2 = ArgumentDescriptor(
name='uint2',
n=2,
reader=read_uint2,
doc="Two-byte unsigned integer, little-endian.")
def read_int4(f):
r"""
>>> import io
>>> read_int4(io.BytesIO(b'\xff\x00\x00\x00'))
255
>>> read_int4(io.BytesIO(b'\x00\x00\x00\x80')) == -(2**31)
True
"""
data = f.read(4)
if len(data) == 4:
return _unpack("<i", data)[0]
raise ValueError("not enough data in stream to read int4")
int4 = ArgumentDescriptor(
name='int4',
n=4,
reader=read_int4,
doc="Four-byte signed integer, little-endian, 2's complement.")
def read_uint4(f):
r"""
>>> import io
>>> read_uint4(io.BytesIO(b'\xff\x00\x00\x00'))
255
>>> read_uint4(io.BytesIO(b'\x00\x00\x00\x80')) == 2**31
True
"""
data = f.read(4)
if len(data) == 4:
return _unpack("<I", data)[0]
raise ValueError("not enough data in stream to read uint4")
uint4 = ArgumentDescriptor(
name='uint4',
n=4,
reader=read_uint4,
doc="Four-byte unsigned integer, little-endian.")
def read_uint8(f):
r"""
>>> import io
>>> read_uint8(io.BytesIO(b'\xff\x00\x00\x00\x00\x00\x00\x00'))
255
>>> read_uint8(io.BytesIO(b'\xff' * 8)) == 2**64-1
True
"""
data = f.read(8)
if len(data) == 8:
return _unpack("<Q", data)[0]
raise ValueError("not enough data in stream to read uint8")
uint8 = ArgumentDescriptor(
name='uint8',
n=8,
reader=read_uint8,
doc="Eight-byte unsigned integer, little-endian.")
def read_stringnl(f, decode=True, stripquotes=True):
r"""
>>> import io
>>> read_stringnl(io.BytesIO(b"'abcd'\nefg\n"))
'abcd'
>>> read_stringnl(io.BytesIO(b"\n"))
Traceback (most recent call last):
...
ValueError: no string quotes around b''
>>> read_stringnl(io.BytesIO(b"\n"), stripquotes=False)
''
>>> read_stringnl(io.BytesIO(b"''\n"))
''
>>> read_stringnl(io.BytesIO(b'"abcd"'))
Traceback (most recent call last):
...
ValueError: no newline found when trying to read stringnl
Embedded escapes are undone in the result.
>>> read_stringnl(io.BytesIO(br"'a\n\\b\x00c\td'" + b"\n'e'"))
'a\n\\b\x00c\td'
"""
data = f.readline()
if not data.endswith(b'\n'):
raise ValueError("no newline found when trying to read stringnl")
data = data[:-1] # lose the newline
if stripquotes:
for q in (b'"', b"'"):
if data.startswith(q):
if not data.endswith(q):
raise ValueError("strinq quote %r not found at both "
"ends of %r" % (q, data))
data = data[1:-1]
break
else:
raise ValueError("no string quotes around %r" % data)
if decode:
data = codecs.escape_decode(data)[0].decode("ascii")
return data
stringnl = ArgumentDescriptor(
name='stringnl',
n=UP_TO_NEWLINE,
reader=read_stringnl,
doc="""A newline-terminated string.
This is a repr-style string, with embedded escapes, and
bracketing quotes.
""")
def read_stringnl_noescape(f):
return read_stringnl(f, stripquotes=False)
stringnl_noescape = ArgumentDescriptor(
name='stringnl_noescape',
n=UP_TO_NEWLINE,
reader=read_stringnl_noescape,
doc="""A newline-terminated string.
This is a str-style string, without embedded escapes,
or bracketing quotes. It should consist solely of
printable ASCII characters.
""")
def read_stringnl_noescape_pair(f):
r"""
>>> import io
>>> read_stringnl_noescape_pair(io.BytesIO(b"Queue\nEmpty\njunk"))
'Queue Empty'
"""
return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))
stringnl_noescape_pair = ArgumentDescriptor(
name='stringnl_noescape_pair',
n=UP_TO_NEWLINE,
reader=read_stringnl_noescape_pair,
doc="""A pair of newline-terminated strings.
These are str-style strings, without embedded
escapes, or bracketing quotes. They should
consist solely of printable ASCII characters.
The pair is returned as a single string, with
a single blank separating the two strings.
""")
def read_string1(f):
r"""
>>> import io
>>> read_string1(io.BytesIO(b"\x00"))
''
>>> read_string1(io.BytesIO(b"\x03abcdef"))
'abc'
"""
n = read_uint1(f)
assert n >= 0
data = f.read(n)
if len(data) == n:
return data.decode("latin-1")
raise ValueError("expected %d bytes in a string1, but only %d remain" %
(n, len(data)))
string1 = ArgumentDescriptor(
name="string1",
n=TAKEN_FROM_ARGUMENT1,
reader=read_string1,
doc="""A counted string.
The first argument is a 1-byte unsigned int giving the number
of bytes in the string, and the second argument is that many
bytes.
""")
def read_string4(f):
r"""
>>> import io
>>> read_string4(io.BytesIO(b"\x00\x00\x00\x00abc"))
''
>>> read_string4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
'abc'
>>> read_string4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
Traceback (most recent call last):
...
ValueError: expected 50331648 bytes in a string4, but only 6 remain
"""
n = read_int4(f)
if n < 0:
raise ValueError("string4 byte count < 0: %d" % n)
data = f.read(n)
if len(data) == n:
return data.decode("latin-1")
raise ValueError("expected %d bytes in a string4, but only %d remain" %
(n, len(data)))
string4 = ArgumentDescriptor(
name="string4",
n=TAKEN_FROM_ARGUMENT4,
reader=read_string4,
doc="""A counted string.
The first argument is a 4-byte little-endian signed int giving
the number of bytes in the string, and the second argument is
that many bytes.
""")
def read_bytes1(f):
r"""
>>> import io
>>> read_bytes1(io.BytesIO(b"\x00"))
b''
>>> read_bytes1(io.BytesIO(b"\x03abcdef"))
b'abc'
"""
n = read_uint1(f)
assert n >= 0
data = f.read(n)
if len(data) == n:
return data
raise ValueError("expected %d bytes in a bytes1, but only %d remain" %
(n, len(data)))
bytes1 = ArgumentDescriptor(
name="bytes1",
n=TAKEN_FROM_ARGUMENT1,
reader=read_bytes1,
doc="""A counted bytes string.
The first argument is a 1-byte unsigned int giving the number
of bytes in the string, and the second argument is that many
bytes.
""")
def read_bytes4(f):
r"""
>>> import io
>>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x00abc"))
b''
>>> read_bytes4(io.BytesIO(b"\x03\x00\x00\x00abcdef"))
b'abc'
>>> read_bytes4(io.BytesIO(b"\x00\x00\x00\x03abcdef"))
Traceback (most recent call last):
...
ValueError: expected 50331648 bytes in a bytes4, but only 6 remain
"""
n = read_uint4(f)
assert n >= 0
if n > sys.maxsize:
raise ValueError("bytes4 byte count > sys.maxsize: %d" % n)
data = f.read(n)
if len(data) == n:
return data
raise ValueError("expected %d bytes in a bytes4, but only %d remain" %
(n, len(data)))
bytes4 = ArgumentDescriptor(
name="bytes4",
n=TAKEN_FROM_ARGUMENT4U,
reader=read_bytes4,
doc="""A counted bytes string.
The first argument is a 4-byte little-endian unsigned int giving
the number of bytes, and the second argument is that many bytes.
""")
def read_bytes8(f):
r"""
>>> import io, struct, sys
>>> read_bytes8(io.BytesIO(b"\x00\x00\x00\x00\x00\x00\x00\x00abc"))
b''
>>> read_bytes8(io.BytesIO(b"\x03\x00\x00\x00\x00\x00\x00\x00abcdef"))
b'abc'
>>> bigsize8 = struct.pack("<Q", sys.maxsize//3)
>>> read_bytes8(io.BytesIO(bigsize8 + b"abcdef")) #doctest: +ELLIPSIS
Traceback (most recent call last):
...
ValueError: expected ... bytes in a bytes8, but only 6 remain
"""
n = read_uint8(f)
assert n >= 0
if n > sys.maxsize:
raise ValueError("bytes8 byte count > sys.maxsize: %d" % n)
data = f.read(n)
if len(data) == n:
return data
raise ValueError("expected %d bytes in a bytes8, but only %d remain" %
(n, len(data)))
bytes8 = ArgumentDescriptor(
name="bytes8",
n=TAKEN_FROM_ARGUMENT8U,
reader=read_bytes8,
doc="""A counted bytes string.
The first argument is an 8-byte little-endian unsigned int giving
the number of bytes, and the second argument is that many bytes.
""")
def read_unicodestringnl(f):
r"""
>>> import io
>>> read_unicodestringnl(io.BytesIO(b"abc\\uabcd\njunk")) == 'abc\uabcd'
True
"""
data = f.readline()
if not data.endswith(b'\n'):
raise ValueError("no newline found when trying to read "
"unicodestringnl")
data = data[:-1] # lose the newline
return str(data, 'raw-unicode-escape')
unicodestringnl = ArgumentDescriptor(
name='unicodestringnl',
n=UP_TO_NEWLINE,
reader=read_unicodestringnl,
doc="""A newline-terminated Unicode string.
This is raw-unicode-escape encoded, so consists of
printable ASCII characters, and may contain embedded
escape sequences.
""")
def read_unicodestring1(f):
r"""
>>> import io
>>> s = 'abcd\uabcd'
>>> enc = s.encode('utf-8')
>>> enc
b'abcd\xea\xaf\x8d'
>>> n = bytes([len(enc)]) # little-endian 1-byte length
>>> t = read_unicodestring1(io.BytesIO(n + enc + b'junk'))
>>> s == t
True
>>> read_unicodestring1(io.BytesIO(n + enc[:-1]))
Traceback (most recent call last):
...
ValueError: expected 7 bytes in a unicodestring1, but only 6 remain
"""
n = read_uint1(f)
assert n >= 0
data = f.read(n)
if len(data) == n:
return str(data, 'utf-8', 'surrogatepass')
raise ValueError("expected %d bytes in a unicodestring1, but only %d "
"remain" % (n, len(data)))
unicodestring1 = ArgumentDescriptor(
name="unicodestring1",
n=TAKEN_FROM_ARGUMENT1,
reader=read_unicodestring1,
doc="""A counted Unicode string.
The first argument is a 1-byte unsigned int
giving the number of bytes in the string, and the second
argument-- the UTF-8 encoding of the Unicode string --
contains that many bytes.
""")
def read_unicodestring4(f):
r"""
>>> import io
>>> s = 'abcd\uabcd'
>>> enc = s.encode('utf-8')
>>> enc
b'abcd\xea\xaf\x8d'
>>> n = bytes([len(enc), 0, 0, 0]) # little-endian 4-byte length
>>> t = read_unicodestring4(io.BytesIO(n + enc + b'junk'))
>>> s == t
True
>>> read_unicodestring4(io.BytesIO(n + enc[:-1]))
Traceback (most recent call last):
...
ValueError: expected 7 bytes in a unicodestring4, but only 6 remain
"""
n = read_uint4(f)
assert n >= 0
if n > sys.maxsize:
raise ValueError("unicodestring4 byte count > sys.maxsize: %d" % n)
data = f.read(n)
if len(data) == n:
return str(data, 'utf-8', 'surrogatepass')
raise ValueError("expected %d bytes in a unicodestring4, but only %d "
"remain" % (n, len(data)))
unicodestring4 = ArgumentDescriptor(
name="unicodestring4",
n=TAKEN_FROM_ARGUMENT4U,
reader=read_unicodestring4,
doc="""A counted Unicode string.
The first argument is a 4-byte little-endian unsigned int
giving the number of bytes in the string, and the second
argument-- the UTF-8 encoding of the Unicode string --
contains that many bytes.
""")
def read_unicodestring8(f):
r"""
>>> import io
>>> s = 'abcd\uabcd'
>>> enc = s.encode('utf-8')
>>> enc
b'abcd\xea\xaf\x8d'
>>> n = bytes([len(enc)]) + bytes(7) # little-endian 8-byte length
>>> t = read_unicodestring8(io.BytesIO(n + enc + b'junk'))
>>> s == t
True
>>> read_unicodestring8(io.BytesIO(n + enc[:-1]))
Traceback (most recent call last):
...
ValueError: expected 7 bytes in a unicodestring8, but only 6 remain
"""
n = read_uint8(f)
assert n >= 0
if n > sys.maxsize:
raise ValueError("unicodestring8 byte count > sys.maxsize: %d" % n)
data = f.read(n)
if len(data) == n:
return str(data, 'utf-8', 'surrogatepass')
raise ValueError("expected %d bytes in a unicodestring8, but only %d "
"remain" % (n, len(data)))
unicodestring8 = ArgumentDescriptor(
name="unicodestring8",
n=TAKEN_FROM_ARGUMENT8U,
reader=read_unicodestring8,
doc="""A counted Unicode string.
The first argument is an 8-byte little-endian unsigned int
giving the number of bytes in the string, and the second
argument-- the UTF-8 encoding of the Unicode string --
contains that many bytes.
""")
def read_decimalnl_short(f):
r"""
>>> import io
>>> read_decimalnl_short(io.BytesIO(b"1234\n56"))
1234
>>> read_decimalnl_short(io.BytesIO(b"1234L\n56"))
Traceback (most recent call last):
...
ValueError: invalid literal for int() with base 10: b'1234L'
"""
s = read_stringnl(f, decode=False, stripquotes=False)
# There's a hack for True and False here.
if s == b"00":
return False
elif s == b"01":
return True
return int(s)
def read_decimalnl_long(f):
r"""
>>> import io
>>> read_decimalnl_long(io.BytesIO(b"1234L\n56"))
1234
>>> read_decimalnl_long(io.BytesIO(b"123456789012345678901234L\n6"))
123456789012345678901234
"""
s = read_stringnl(f, decode=False, stripquotes=False)
if s[-1:] == b'L':
s = s[:-1]
return int(s)
decimalnl_short = ArgumentDescriptor(
name='decimalnl_short',
n=UP_TO_NEWLINE,
reader=read_decimalnl_short,
doc="""A newline-terminated decimal integer literal.
This never has a trailing 'L', and the integer fit
in a short Python int on the box where the pickle
was written -- but there's no guarantee it will fit
in a short Python int on the box where the pickle
is read.
""")
decimalnl_long = ArgumentDescriptor(
name='decimalnl_long',
n=UP_TO_NEWLINE,
reader=read_decimalnl_long,
doc="""A newline-terminated decimal integer literal.
This has a trailing 'L', and can represent integers
of any size.
""")
def read_floatnl(f):
r"""
>>> import io
>>> read_floatnl(io.BytesIO(b"-1.25\n6"))
-1.25
"""
s = read_stringnl(f, decode=False, stripquotes=False)
return float(s)
floatnl = ArgumentDescriptor(
name='floatnl',
n=UP_TO_NEWLINE,
reader=read_floatnl,
doc="""A newline-terminated decimal floating literal.
In general this requires 17 significant digits for roundtrip
identity, and pickling then unpickling infinities, NaNs, and
minus zero doesn't work across boxes, or on some boxes even
on itself (e.g., Windows can't read the strings it produces
for infinities or NaNs).
""")
def read_float8(f):
r"""
>>> import io, struct
>>> raw = struct.pack(">d", -1.25)
>>> raw
b'\xbf\xf4\x00\x00\x00\x00\x00\x00'
>>> read_float8(io.BytesIO(raw + b"\n"))
-1.25
"""
data = f.read(8)
if len(data) == 8:
return _unpack(">d", data)[0]
raise ValueError("not enough data in stream to read float8")
float8 = ArgumentDescriptor(
name='float8',
n=8,
reader=read_float8,
doc="""An 8-byte binary representation of a float, big-endian.
The format is unique to Python, and shared with the struct
module (format string '>d') "in theory" (the struct and pickle
implementations don't share the code -- they should). It's
strongly related to the IEEE-754 double format, and, in normal
cases, is in fact identical to the big-endian 754 double format.
On boxes with other floating-point formats, the dynamic range is limited to that of a 754
double, and "add a half and chop" rounding is used to reduce
the precision to 53 bits. However, even on a 754 box,
infinities, NaNs, and minus zero may not be handled correctly
(may not survive roundtrip pickling intact).
""")
# Protocol 2 formats
from pickle import decode_long
def read_long1(f):
r"""
>>> import io
>>> read_long1(io.BytesIO(b"\x00"))
0
>>> read_long1(io.BytesIO(b"\x02\xff\x00"))
255
>>> read_long1(io.BytesIO(b"\x02\xff\x7f"))
32767
>>> read_long1(io.BytesIO(b"\x02\x00\xff"))
-256
>>> read_long1(io.BytesIO(b"\x02\x00\x80"))
-32768
"""
n = read_uint1(f)
data = f.read(n)
if len(data) != n:
raise ValueError("not enough data in stream to read long1")
return decode_long(data)
long1 = ArgumentDescriptor(
name="long1",
n=TAKEN_FROM_ARGUMENT1,
reader=read_long1,
doc="""A binary long, little-endian, using 1-byte size.
This first reads one byte as an unsigned size, then reads that
many bytes and interprets them as a little-endian 2's-complement long.
If the size is 0, that's taken as a shortcut for the long 0L.
""")
def read_long4(f):
r"""
>>> import io
>>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x00"))
255
>>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\xff\x7f"))
32767
>>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\xff"))
-256
>>> read_long4(io.BytesIO(b"\x02\x00\x00\x00\x00\x80"))
-32768
>>> read_long4(io.BytesIO(b"\x00\x00\x00\x00"))
0
"""
n = read_int4(f)
if n < 0:
raise ValueError("long4 byte count < 0: %d" % n)
data = f.read(n)
if len(data) != n:
raise ValueError("not enough data in stream to read long4")
return decode_long(data)
long4 = ArgumentDescriptor(
name="long4",
n=TAKEN_FROM_ARGUMENT4,
reader=read_long4,
doc="""A binary representation of a long, little-endian.
This first reads four bytes as a signed size (but requires the
size to be >= 0), then reads that many bytes and interprets them
as a little-endian 2's-complement long. If the size is 0, that's taken
as a shortcut for the int 0, although LONG1 should really be used
then instead (and in any case where # of bytes < 256).
""")
##############################################################################
# Object descriptors. The stack used by the pickle machine holds objects,
# and in the stack_before and stack_after attributes of OpcodeInfo
# descriptors we need names to describe the various types of objects that can
# appear on the stack.
class StackObject(object):
__slots__ = (
# name of descriptor record, for info only
'name',
# type of object, or tuple of type objects (meaning the object can
# be of any type in the tuple)
'obtype',
# human-readable docs for this kind of stack object; a string
'doc',
)
def __init__(self, name, obtype, doc):
assert isinstance(name, str)
self.name = name
assert isinstance(obtype, type) or isinstance(obtype, tuple)
if isinstance(obtype, tuple):
for contained in obtype:
assert isinstance(contained, type)
self.obtype = obtype
assert isinstance(doc, str)
self.doc = doc
def __repr__(self):
return self.name
pyint = pylong = StackObject(
name='int',
obtype=int,
doc="A Python integer object.")
pyinteger_or_bool = StackObject(
name='int_or_bool',
obtype=(int, bool),
doc="A Python integer or boolean object.")
pybool = StackObject(
name='bool',
obtype=bool,
doc="A Python boolean object.")
pyfloat = StackObject(
name='float',
obtype=float,
doc="A Python float object.")
pybytes_or_str = pystring = StackObject(
name='bytes_or_str',
obtype=(bytes, str),
doc="A Python bytes or (Unicode) string object.")
pybytes = StackObject(
name='bytes',
obtype=bytes,
doc="A Python bytes object.")
pyunicode = StackObject(
name='str',
obtype=str,
doc="A Python (Unicode) string object.")
pynone = StackObject(
name="None",
obtype=type(None),
doc="The Python None object.")
pytuple = StackObject(
name="tuple",
obtype=tuple,
doc="A Python tuple object.")
pylist = StackObject(
name="list",
obtype=list,
doc="A Python list object.")
pydict = StackObject(
name="dict",
obtype=dict,
doc="A Python dict object.")
pyset = StackObject(
name="set",
obtype=set,
doc="A Python set object.")
pyfrozenset = StackObject(
name="frozenset",
obtype=frozenset,
doc="A Python frozenset object.")
anyobject = StackObject(
name='any',
obtype=object,
doc="Any kind of object whatsoever.")
markobject = StackObject(
name="mark",
obtype=StackObject,
doc="""'The mark' is a unique object.
Opcodes that operate on a variable number of objects
generally don't embed the count of objects in the opcode,
or pull it off the stack. Instead the MARK opcode is used
to push a special marker object on the stack, and then
some other opcodes grab all the objects from the top of
the stack down to (but not including) the topmost marker
object.
""")
stackslice = StackObject(
name="stackslice",
obtype=StackObject,
doc="""An object representing a contiguous slice of the stack.
This is used in conjunction with markobject, to represent all
of the stack following the topmost markobject. For example,
the POP_MARK opcode changes the stack from
[..., markobject, stackslice]
to
[...]
No matter how many objects are on the stack after the topmost
markobject, POP_MARK gets rid of all of them (including the
topmost markobject too).
""")
##############################################################################
# Descriptors for pickle opcodes.
class OpcodeInfo(object):
__slots__ = (
# symbolic name of opcode; a string
'name',
# the code used in a bytestream to represent the opcode; a
# one-character string
'code',
# If the opcode has an argument embedded in the byte string, an
# instance of ArgumentDescriptor specifying its type. Note that
# arg.reader(s) can be used to read and decode the argument from
# the bytestream s, and arg.doc documents the format of the raw
# argument bytes. If the opcode doesn't have an argument embedded
# in the bytestream, arg should be None.
'arg',
# what the stack looks like before this opcode runs; a list
'stack_before',
# what the stack looks like after this opcode runs; a list
'stack_after',
# the protocol number in which this opcode was introduced; an int
'proto',
# human-readable docs for this opcode; a string
'doc',
)
def __init__(self, name, code, arg,
stack_before, stack_after, proto, doc):
assert isinstance(name, str)
self.name = name
assert isinstance(code, str)
assert len(code) == 1
self.code = code
assert arg is None or isinstance(arg, ArgumentDescriptor)
self.arg = arg
assert isinstance(stack_before, list)
for x in stack_before:
assert isinstance(x, StackObject)
self.stack_before = stack_before
assert isinstance(stack_after, list)
for x in stack_after:
assert isinstance(x, StackObject)
self.stack_after = stack_after
assert isinstance(proto, int) and 0 <= proto <= pickle.HIGHEST_PROTOCOL
self.proto = proto
assert isinstance(doc, str)
self.doc = doc
I = OpcodeInfo
opcodes = [
# Ways to spell integers.
I(name='INT',
code='I',
arg=decimalnl_short,
stack_before=[],
stack_after=[pyinteger_or_bool],
proto=0,
doc="""Push an integer or bool.
The argument is a newline-terminated decimal literal string.
The intent may have been that this always fit in a short Python int,
but INT can be generated in pickles written on a 64-bit box that
require a Python long on a 32-bit box. The difference between this
and LONG then is that INT skips a trailing 'L', and produces a short
int whenever possible.
Another difference arises because, when bool was introduced as a
distinct type in 2.3, the builtin names True and False were also added
to 2.2.2, mapping to ints 1 and 0. For compatibility in both directions,
True gets pickled as INT + "I01\\n", and False as INT + "I00\\n".
Leading zeroes are never produced for a genuine integer. The 2.3
(and later) unpicklers special-case these and return bool instead;
earlier unpicklers ignore the leading "0" and return the int.
"""),
I(name='BININT',
code='J',
arg=int4,
stack_before=[],
stack_after=[pyint],
proto=1,
doc="""Push a four-byte signed integer.
This handles the full range of Python (short) integers on a 32-bit
box, directly as binary bytes (1 for the opcode and 4 for the integer).
If the integer is non-negative and fits in 1 or 2 bytes, pickling via
BININT1 or BININT2 saves space.
"""),
I(name='BININT1',
code='K',
arg=uint1,
stack_before=[],
stack_after=[pyint],
proto=1,
doc="""Push a one-byte unsigned integer.
This is a space optimization for pickling very small non-negative ints,
in range(256).
"""),
I(name='BININT2',
code='M',
arg=uint2,
stack_before=[],
stack_after=[pyint],
proto=1,
doc="""Push a two-byte unsigned integer.
This is a space optimization for pickling small positive ints, in
range(256, 2**16). Integers in range(256) can also be pickled via
BININT2, but BININT1 instead saves a byte.
"""),
I(name='LONG',
code='L',
arg=decimalnl_long,
stack_before=[],
stack_after=[pyint],
proto=0,
doc="""Push a long integer.
The same as INT, except that the literal ends with 'L', and always
unpickles to a Python long. There doesn't seem to be a real purpose
for the trailing 'L'.
Note that LONG takes time quadratic in the number of digits when
unpickling (this is simply due to the nature of decimal->binary
conversion). Proto 2 added linear-time (in C; still quadratic-time
in Python) LONG1 and LONG4 opcodes.
"""),
I(name="LONG1",
code='\x8a',
arg=long1,
stack_before=[],
stack_after=[pyint],
proto=2,
doc="""Long integer using one-byte length.
A more efficient encoding of a Python long; the long1 encoding
says it all."""),
I(name="LONG4",
code='\x8b',
arg=long4,
stack_before=[],
stack_after=[pyint],
proto=2,
doc="""Long integer using found-byte length.
A more efficient encoding of a Python long; the long4 encoding
says it all."""),
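# A concrete illustration of the LONG1 encoding (a sketch, stated here
# for reference): pickling 2**70 under protocol 2 emits LONG1 as
# b'\x8a\x09' followed by nine little-endian two's-complement bytes,
# b'\x00' * 8 + b'@'.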
# Ways to spell strings (8-bit, not Unicode).
I(name='STRING',
code='S',
arg=stringnl,
stack_before=[],
stack_after=[pybytes_or_str],
proto=0,
doc="""Push a Python string object.
The argument is a repr-style string, with bracketing quote characters,
and perhaps embedded escapes. The argument extends until the next
newline character. These are usually decoded into a str instance
using the encoding given to the Unpickler constructor, or the default,
'ASCII'. If the encoding given was 'bytes', however, they are
decoded as a bytes object instead.
"""),
I(name='BINSTRING',
code='T',
arg=string4,
stack_before=[],
stack_after=[pybytes_or_str],
proto=1,
doc="""Push a Python string object.
There are two arguments: the first is a 4-byte little-endian
signed int giving the number of bytes in the string, and the
second is that many bytes, which are taken literally as the string
content. These are usually decoded into a str instance using the
encoding given to the Unpickler constructor, or the default,
'ASCII'. If the encoding given was 'bytes', however, they are
decoded as a bytes object instead.
"""),
I(name='SHORT_BINSTRING',
code='U',
arg=string1,
stack_before=[],
stack_after=[pybytes_or_str],
proto=1,
doc="""Push a Python string object.
There are two arguments: the first is a 1-byte unsigned int giving
the number of bytes in the string, and the second is that many
bytes, which are taken literally as the string content. These are
usually decoded into a str instance using the encoding given to
the Unpickler constructor, or the default, 'ASCII'. If the
encoding given was 'bytes', however, they are decoded as a bytes
object instead.
"""),
# Bytes (protocol 3 and higher; older protocols don't support bytes at all)
I(name='BINBYTES',
code='B',
arg=bytes4,
stack_before=[],
stack_after=[pybytes],
proto=3,
doc="""Push a Python bytes object.
There are two arguments: the first is a 4-byte little-endian unsigned int
giving the number of bytes, and the second is that many bytes, which are
taken literally as the bytes content.
"""),
I(name='SHORT_BINBYTES',
code='C',
arg=bytes1,
stack_before=[],
stack_after=[pybytes],
proto=3,
doc="""Push a Python bytes object.
There are two arguments: the first is a 1-byte unsigned int giving
the number of bytes, and the second is that many bytes, which are taken
literally as the bytes content.
"""),
I(name='BINBYTES8',
code='\x8e',
arg=bytes8,
stack_before=[],
stack_after=[pybytes],
proto=4,
doc="""Push a Python bytes object.
There are two arguments: the first is an 8-byte little-endian unsigned
int giving the number of bytes, and the second is that many bytes,
which are taken literally as the bytes content.
"""),
# Ways to spell None.
I(name='NONE',
code='N',
arg=None,
stack_before=[],
stack_after=[pynone],
proto=0,
doc="Push None on the stack."),
# Ways to spell bools, starting with proto 2. See INT for how this was
# done before proto 2.
I(name='NEWTRUE',
code='\x88',
arg=None,
stack_before=[],
stack_after=[pybool],
proto=2,
doc="""True.
Push True onto the stack."""),
I(name='NEWFALSE',
code='\x89',
arg=None,
stack_before=[],
stack_after=[pybool],
proto=2,
doc="""True.
Push False onto the stack."""),
# Ways to spell Unicode strings.
I(name='UNICODE',
code='V',
arg=unicodestringnl,
stack_before=[],
stack_after=[pyunicode],
proto=0, # this may be pure-text, but it's a later addition
doc="""Push a Python Unicode string object.
The argument is a raw-unicode-escape encoding of a Unicode string,
and so may contain embedded escape sequences. The argument extends
until the next newline character.
"""),
I(name='SHORT_BINUNICODE',
code='\x8c',
arg=unicodestring1,
stack_before=[],
stack_after=[pyunicode],
proto=4,
doc="""Push a Python Unicode string object.
There are two arguments: the first is a 1-byte unsigned int
giving the number of bytes in the string. The second is that many
bytes, and is the UTF-8 encoding of the Unicode string.
"""),
I(name='BINUNICODE',
code='X',
arg=unicodestring4,
stack_before=[],
stack_after=[pyunicode],
proto=1,
doc="""Push a Python Unicode string object.
There are two arguments: the first is a 4-byte little-endian unsigned int
giving the number of bytes in the string. The second is that many
bytes, and is the UTF-8 encoding of the Unicode string.
"""),
I(name='BINUNICODE8',
code='\x8d',
arg=unicodestring8,
stack_before=[],
stack_after=[pyunicode],
proto=4,
doc="""Push a Python Unicode string object.
There are two arguments: the first is an 8-byte little-endian unsigned
giving the number of bytes in the string. The second is that many
bytes, and is the UTF-8 encoding of the Unicode string.
"""),
# Ways to spell floats.
I(name='FLOAT',
code='F',
arg=floatnl,
stack_before=[],
stack_after=[pyfloat],
proto=0,
doc="""Newline-terminated decimal float literal.
The argument is repr(a_float), and in general requires 17 significant
digits for roundtrip conversion to be an identity (this is so for
IEEE-754 double precision values, which is what Python float maps to
on most boxes).
In general, FLOAT cannot be used to transport infinities, NaNs, or
minus zero across boxes (or even on a single box, if the platform C
library can't read the strings it produces for such things -- Windows
is like that), but may do less damage than BINFLOAT on boxes with
greater precision or dynamic range than IEEE-754 double.
"""),
I(name='BINFLOAT',
code='G',
arg=float8,
stack_before=[],
stack_after=[pyfloat],
proto=1,
doc="""Float stored in binary form, with 8 bytes of data.
This generally requires less than half the space of FLOAT encoding.
In general, BINFLOAT cannot be used to transport infinities, NaNs, or
minus zero, raises an exception if the exponent exceeds the range of
an IEEE-754 double, and retains no more than 53 bits of precision (if
there are more than that, "add a half and chop" rounding is used to
cut it back to 53 significant bits).
"""),
# Ways to build lists.
I(name='EMPTY_LIST',
code=']',
arg=None,
stack_before=[],
stack_after=[pylist],
proto=1,
doc="Push an empty list."),
I(name='APPEND',
code='a',
arg=None,
stack_before=[pylist, anyobject],
stack_after=[pylist],
proto=0,
doc="""Append an object to a list.
Stack before: ... pylist anyobject
Stack after: ... pylist+[anyobject]
although pylist is really extended in-place.
"""),
I(name='APPENDS',
code='e',
arg=None,
stack_before=[pylist, markobject, stackslice],
stack_after=[pylist],
proto=1,
doc="""Extend a list by a slice of stack objects.
Stack before: ... pylist markobject stackslice
Stack after: ... pylist+stackslice
although pylist is really extended in-place.
"""),
I(name='LIST',
code='l',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[pylist],
proto=0,
doc="""Build a list out of the topmost stack slice, after markobject.
All the stack entries following the topmost markobject are placed into
a single Python list, which single list object replaces all of the
stack from the topmost markobject onward. For example,
Stack before: ... markobject 1 2 3 'abc'
Stack after: ... [1, 2, 3, 'abc']
"""),
# Ways to build tuples.
I(name='EMPTY_TUPLE',
code=')',
arg=None,
stack_before=[],
stack_after=[pytuple],
proto=1,
doc="Push an empty tuple."),
I(name='TUPLE',
code='t',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[pytuple],
proto=0,
doc="""Build a tuple out of the topmost stack slice, after markobject.
All the stack entries following the topmost markobject are placed into
a single Python tuple, which single tuple object replaces all of the
stack from the topmost markobject onward. For example,
Stack before: ... markobject 1 2 3 'abc'
Stack after: ... (1, 2, 3, 'abc')
"""),
I(name='TUPLE1',
code='\x85',
arg=None,
stack_before=[anyobject],
stack_after=[pytuple],
proto=2,
doc="""Build a one-tuple out of the topmost item on the stack.
This code pops one value off the stack and pushes a tuple of
length 1 whose one item is that value back onto it. In other
words:
stack[-1] = tuple(stack[-1:])
"""),
I(name='TUPLE2',
code='\x86',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[pytuple],
proto=2,
doc="""Build a two-tuple out of the top two items on the stack.
This code pops two values off the stack and pushes a tuple of
length 2 whose items are those values back onto it. In other
words:
stack[-2:] = [tuple(stack[-2:])]
"""),
I(name='TUPLE3',
code='\x87',
arg=None,
stack_before=[anyobject, anyobject, anyobject],
stack_after=[pytuple],
proto=2,
doc="""Build a three-tuple out of the top three items on the stack.
This code pops three values off the stack and pushes a tuple of
length 3 whose items are those values back onto it. In other
words:
stack[-3:] = [tuple(stack[-3:])]
"""),
# Ways to build dicts.
I(name='EMPTY_DICT',
code='}',
arg=None,
stack_before=[],
stack_after=[pydict],
proto=1,
doc="Push an empty dict."),
I(name='DICT',
code='d',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[pydict],
proto=0,
doc="""Build a dict out of the topmost stack slice, after markobject.
All the stack entries following the topmost markobject are placed into
a single Python dict, which single dict object replaces all of the
stack from the topmost markobject onward. The stack slice alternates
key, value, key, value, .... For example,
Stack before: ... markobject 1 2 3 'abc'
Stack after: ... {1: 2, 3: 'abc'}
"""),
I(name='SETITEM',
code='s',
arg=None,
stack_before=[pydict, anyobject, anyobject],
stack_after=[pydict],
proto=0,
doc="""Add a key+value pair to an existing dict.
Stack before: ... pydict key value
Stack after: ... pydict
where pydict has been modified via pydict[key] = value.
"""),
I(name='SETITEMS',
code='u',
arg=None,
stack_before=[pydict, markobject, stackslice],
stack_after=[pydict],
proto=1,
doc="""Add an arbitrary number of key+value pairs to an existing dict.
The slice of the stack following the topmost markobject is taken as
an alternating sequence of keys and values, added to the dict
immediately under the topmost markobject. Everything at and after the
topmost markobject is popped, leaving the mutated dict at the top
of the stack.
Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
Stack after: ... pydict
where pydict has been modified via pydict[key_i] = value_i for i in
1, 2, ..., n, and in that order.
"""),
# Ways to build sets
I(name='EMPTY_SET',
code='\x8f',
arg=None,
stack_before=[],
stack_after=[pyset],
proto=4,
doc="Push an empty set."),
I(name='ADDITEMS',
code='\x90',
arg=None,
stack_before=[pyset, markobject, stackslice],
stack_after=[pyset],
proto=4,
doc="""Add an arbitrary number of items to an existing set.
The slice of the stack following the topmost markobject is taken as
a sequence of items, added to the set immediately under the topmost
markobject. Everything at and after the topmost markobject is popped,
leaving the mutated set at the top of the stack.
Stack before: ... pyset markobject item_1 ... item_n
Stack after: ... pyset
where pyset has been modified via pyset.add(item_i) for i in
1, 2, ..., n, and in that order.
"""),
# Way to build frozensets
I(name='FROZENSET',
code='\x91',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[pyfrozenset],
proto=4,
doc="""Build a frozenset out of the topmost slice, after markobject.
All the stack entries following the topmost markobject are placed into
a single Python frozenset, which single frozenset object replaces all
of the stack from the topmost markobject onward. For example,
Stack before: ... markobject 1 2 3
Stack after: ... frozenset({1, 2, 3})
"""),
# Stack manipulation.
I(name='POP',
code='0',
arg=None,
stack_before=[anyobject],
stack_after=[],
proto=0,
doc="Discard the top stack item, shrinking the stack by one item."),
I(name='DUP',
code='2',
arg=None,
stack_before=[anyobject],
stack_after=[anyobject, anyobject],
proto=0,
doc="Push the top stack item onto the stack again, duplicating it."),
I(name='MARK',
code='(',
arg=None,
stack_before=[],
stack_after=[markobject],
proto=0,
doc="""Push markobject onto the stack.
markobject is a unique object, used by other opcodes to identify a
region of the stack containing a variable number of objects for them
to work on. See markobject.doc for more detail.
"""),
I(name='POP_MARK',
code='1',
arg=None,
stack_before=[markobject, stackslice],
stack_after=[],
proto=1,
doc="""Pop all the stack objects at and above the topmost markobject.
When an opcode using a variable number of stack objects is done,
POP_MARK is used to remove those objects, and to remove the markobject
that delimited their starting position on the stack.
"""),
# Memo manipulation. There are really only two operations (get and put),
# each in all-text, "short binary", and "long binary" flavors.
I(name='GET',
code='g',
arg=decimalnl_short,
stack_before=[],
stack_after=[anyobject],
proto=0,
doc="""Read an object from the memo and push it on the stack.
The index of the memo object to push is given by the newline-terminated
decimal string following. BINGET and LONG_BINGET are space-optimized
versions.
"""),
I(name='BINGET',
code='h',
arg=uint1,
stack_before=[],
stack_after=[anyobject],
proto=1,
doc="""Read an object from the memo and push it on the stack.
The index of the memo object to push is given by the 1-byte unsigned
integer following.
"""),
I(name='LONG_BINGET',
code='j',
arg=uint4,
stack_before=[],
stack_after=[anyobject],
proto=1,
doc="""Read an object from the memo and push it on the stack.
The index of the memo object to push is given by the 4-byte unsigned
little-endian integer following.
"""),
I(name='PUT',
code='p',
arg=decimalnl_short,
stack_before=[],
stack_after=[],
proto=0,
doc="""Store the stack top into the memo. The stack is not popped.
The index of the memo location to write into is given by the newline-
terminated decimal string following. BINPUT and LONG_BINPUT are
space-optimized versions.
"""),
I(name='BINPUT',
code='q',
arg=uint1,
stack_before=[],
stack_after=[],
proto=1,
doc="""Store the stack top into the memo. The stack is not popped.
The index of the memo location to write into is given by the 1-byte
unsigned integer following.
"""),
I(name='LONG_BINPUT',
code='r',
arg=uint4,
stack_before=[],
stack_after=[],
proto=1,
doc="""Store the stack top into the memo. The stack is not popped.
The index of the memo location to write into is given by the 4-byte
unsigned little-endian integer following.
"""),
I(name='MEMOIZE',
code='\x94',
arg=None,
stack_before=[anyobject],
stack_after=[anyobject],
proto=4,
doc="""Store the stack top into the memo. The stack is not popped.
The index of the memo location to write is the number of
elements currently present in the memo.
"""),
# Access the extension registry (predefined objects). Akin to the GET
# family.
I(name='EXT1',
code='\x82',
arg=uint1,
stack_before=[],
stack_after=[anyobject],
proto=2,
doc="""Extension code.
This code and the similar EXT2 and EXT4 allow using a registry
of popular objects that are pickled by name, typically classes.
It is envisioned that through a global negotiation and
registration process, third parties can set up a mapping between
ints and object names.
In order to guarantee pickle interchangeability, the extension
code registry ought to be global, although a range of codes may
be reserved for private use.
EXT1 has a 1-byte integer argument. This is used to index into the
extension registry, and the object at that index is pushed on the stack.
"""),
I(name='EXT2',
code='\x83',
arg=uint2,
stack_before=[],
stack_after=[anyobject],
proto=2,
doc="""Extension code.
See EXT1. EXT2 has a two-byte integer argument.
"""),
I(name='EXT4',
code='\x84',
arg=int4,
stack_before=[],
stack_after=[anyobject],
proto=2,
doc="""Extension code.
See EXT1. EXT4 has a four-byte integer argument.
"""),
# Push a class object, or module function, on the stack, via its module
# and name.
I(name='GLOBAL',
code='c',
arg=stringnl_noescape_pair,
stack_before=[],
stack_after=[anyobject],
proto=0,
doc="""Push a global object (module.attr) on the stack.
Two newline-terminated strings follow the GLOBAL opcode. The first is
taken as a module name, and the second as a class name. The class
object module.class is pushed on the stack. More accurately, the
object returned by self.find_class(module, class) is pushed on the
stack, so unpickling subclasses can override this form of lookup.
"""),
I(name='STACK_GLOBAL',
code='\x93',
arg=None,
stack_before=[pyunicode, pyunicode],
stack_after=[anyobject],
proto=4,
doc="""Push a global object (module.attr) on the stack.
"""),
# Ways to build objects of classes pickle doesn't know about directly
# (user-defined classes). I despair of documenting this accurately
# and comprehensibly -- you really have to read the pickle code to
# find all the special cases.
I(name='REDUCE',
code='R',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[anyobject],
proto=0,
doc="""Push an object built from a callable and an argument tuple.
The opcode is named after the __reduce__() method.
Stack before: ... callable pytuple
Stack after: ... callable(*pytuple)
The callable and the argument tuple are the first two items returned
by a __reduce__ method. Applying the callable to the argtuple is
supposed to reproduce the original object, or at least get it started.
If the __reduce__ method returns a 3-tuple, the last component is an
argument to be passed to the object's __setstate__, and then the REDUCE
opcode is followed by code to create setstate's argument, and then a
BUILD opcode to apply __setstate__ to that argument.
If not isinstance(callable, type), REDUCE complains unless the
callable has been registered with the copyreg module's
safe_constructors dict, or the callable has a magic
'__safe_for_unpickling__' attribute with a true value. I'm not sure
why it does this, but I've sure seen this complaint often enough when
I didn't want to <wink>.
"""),
I(name='BUILD',
code='b',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[anyobject],
proto=0,
doc="""Finish building an object, via __setstate__ or dict update.
Stack before: ... anyobject argument
Stack after: ... anyobject
where anyobject may have been mutated, as follows:
If the object has a __setstate__ method,
anyobject.__setstate__(argument)
is called.
Else the argument must be a dict, the object must have a __dict__, and
the object is updated via
anyobject.__dict__.update(argument)
"""),
I(name='INST',
code='i',
arg=stringnl_noescape_pair,
stack_before=[markobject, stackslice],
stack_after=[anyobject],
proto=0,
doc="""Build a class instance.
This is the protocol 0 version of protocol 1's OBJ opcode.
INST is followed by two newline-terminated strings, giving a
module and class name, just as for the GLOBAL opcode (and see
GLOBAL for more details about that). self.find_class(module, name)
is used to get a class object.
In addition, all the objects on the stack following the topmost
markobject are gathered into a tuple and popped (along with the
topmost markobject), just as for the TUPLE opcode.
Now it gets complicated. If all of these are true:
+ The argtuple is empty (markobject was at the top of the stack
at the start).
+ The class object does not have a __getinitargs__ attribute.
then we want to create an old-style class instance without invoking
its __init__() method (pickle has waffled on this over the years; not
calling __init__() is current wisdom). In this case, an instance of
an old-style dummy class is created, and then we try to rebind its
__class__ attribute to the desired class object. If this succeeds,
the new instance object is pushed on the stack, and we're done.
Else (the argtuple is not empty, it's not an old-style class object,
or the class object does have a __getinitargs__ attribute), the code
first insists that the class object have a __safe_for_unpickling__
attribute. Unlike the __safe_for_unpickling__ check in REDUCE,
it doesn't matter whether this attribute has a true or false value, it
only matters whether it exists (XXX this is a bug). If
__safe_for_unpickling__ doesn't exist, UnpicklingError is raised.
Else (the class object does have a __safe_for_unpickling__ attr),
the class object obtained from INST's arguments is applied to the
argtuple obtained from the stack, and the resulting instance object
is pushed on the stack.
NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.
NOTE: the distinction between old-style and new-style classes does
not make sense in Python 3.
"""),
I(name='OBJ',
code='o',
arg=None,
stack_before=[markobject, anyobject, stackslice],
stack_after=[anyobject],
proto=1,
doc="""Build a class instance.
This is the protocol 1 version of protocol 0's INST opcode, and is
very much like it. The major difference is that the class object
is taken off the stack, allowing it to be retrieved from the memo
repeatedly if several instances of the same class are created. This
can be much more efficient (in both time and space) than repeatedly
embedding the module and class names in INST opcodes.
Unlike INST, OBJ takes no arguments from the opcode stream. Instead
the class object is taken off the stack, immediately above the
topmost markobject:
Stack before: ... markobject classobject stackslice
Stack after: ... new_instance_object
As for INST, the remainder of the stack above the markobject is
gathered into an argument tuple, and then the logic seems identical,
except that no __safe_for_unpickling__ check is done (XXX this is
a bug). See INST for the gory details.
NOTE: In Python 2.3, INST and OBJ are identical except for how they
get the class object. That was always the intent; the implementations
had diverged for accidental reasons.
"""),
I(name='NEWOBJ',
code='\x81',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[anyobject],
proto=2,
doc="""Build an object instance.
The stack before should be thought of as containing a class
object followed by an argument tuple (the tuple being the stack
top). Call these cls and args. They are popped off the stack,
and the value returned by cls.__new__(cls, *args) is pushed back
onto the stack.
"""),
I(name='NEWOBJ_EX',
code='\x92',
arg=None,
stack_before=[anyobject, anyobject, anyobject],
stack_after=[anyobject],
proto=4,
doc="""Build an object instance.
The stack before should be thought of as containing a class
object followed by an argument tuple and by a keyword argument dict
(the dict being the stack top). Call these cls, args, and kwargs.
They are popped off the stack, and the value returned by
cls.__new__(cls, *args, **kwargs) is pushed back onto the stack.
"""),
# Machine control.
I(name='PROTO',
code='\x80',
arg=uint1,
stack_before=[],
stack_after=[],
proto=2,
doc="""Protocol version indicator.
For protocol 2 and above, a pickle must start with this opcode.
The argument is the protocol version, an int in range(2, 256).
"""),
I(name='STOP',
code='.',
arg=None,
stack_before=[anyobject],
stack_after=[],
proto=0,
doc="""Stop the unpickling machine.
Every pickle ends with this opcode. The object at the top of the stack
is popped, and that's the result of unpickling. The stack should be
empty then.
"""),
# Framing support.
I(name='FRAME',
code='\x95',
arg=uint8,
stack_before=[],
stack_after=[],
proto=4,
doc="""Indicate the beginning of a new frame.
The unpickler may use this opcode to safely prefetch data from its
underlying stream.
"""),
# Ways to deal with persistent IDs.
I(name='PERSID',
code='P',
arg=stringnl_noescape,
stack_before=[],
stack_after=[anyobject],
proto=0,
doc="""Push an object identified by a persistent ID.
The pickle module doesn't define what a persistent ID means. PERSID's
argument is a newline-terminated str-style (no embedded escapes, no
bracketing quote characters) string, which *is* "the persistent ID".
The unpickler passes this string to self.persistent_load(). Whatever
object that returns is pushed on the stack. There is no implementation
of persistent_load() in Python's unpickler: it must be supplied by an
unpickler subclass.
"""),
I(name='BINPERSID',
code='Q',
arg=None,
stack_before=[anyobject],
stack_after=[anyobject],
proto=1,
doc="""Push an object identified by a persistent ID.
Like PERSID, except the persistent ID is popped off the stack (instead
of being a string embedded in the opcode bytestream). The persistent
ID is passed to self.persistent_load(), and whatever object that
returns is pushed on the stack. See PERSID for more detail.
"""),
]
del I
# Verify uniqueness of .name and .code members.
name2i = {}
code2i = {}
for i, d in enumerate(opcodes):
if d.name in name2i:
raise ValueError("repeated name %r at indices %d and %d" %
(d.name, name2i[d.name], i))
if d.code in code2i:
raise ValueError("repeated code %r at indices %d and %d" %
(d.code, code2i[d.code], i))
name2i[d.name] = i
code2i[d.code] = i
del name2i, code2i, i, d
##############################################################################
# Build a code2op dict, mapping opcode characters to OpcodeInfo records.
# Also ensure we've got the same stuff as pickle.py, although the
# introspection here is dicey.
code2op = {}
for d in opcodes:
code2op[d.code] = d
del d
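# Usage sketch: code2op maps a raw opcode character to its OpcodeInfo
# record, for example:
#
#     >>> code2op['.'].name
#     'STOP'
#     >>> code2op[']'].doc
#     'Push an empty list.'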
def assure_pickle_consistency(verbose=False):
copy = code2op.copy()
for name in pickle.__all__:
if not re.match("[A-Z][A-Z0-9_]+$", name):
if verbose:
print("skipping %r: it doesn't look like an opcode name" % name)
continue
picklecode = getattr(pickle, name)
if not isinstance(picklecode, bytes) or len(picklecode) != 1:
if verbose:
print(("skipping %r: value %r doesn't look like a pickle "
"code" % (name, picklecode)))
continue
picklecode = picklecode.decode("latin-1")
if picklecode in copy:
if verbose:
print("checking name %r w/ code %r for consistency" % (
name, picklecode))
d = copy[picklecode]
if d.name != name:
raise ValueError("for pickle code %r, pickle.py uses name %r "
"but we're using name %r" % (picklecode,
name,
d.name))
# Forget this one. Any left over in copy at the end are a problem
# of a different kind.
del copy[picklecode]
else:
raise ValueError("pickle.py appears to have a pickle opcode with "
"name %r and code %r, but we don't" %
(name, picklecode))
if copy:
msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]
for code, d in copy.items():
msg.append(" name %r with code %r" % (d.name, code))
raise ValueError("\n".join(msg))
assure_pickle_consistency()
del assure_pickle_consistency
##############################################################################
# A pickle opcode generator.
def _genops(data, yield_end_pos=False):
if isinstance(data, bytes_types):
data = io.BytesIO(data)
if hasattr(data, "tell"):
getpos = data.tell
else:
getpos = lambda: None
while True:
pos = getpos()
code = data.read(1)
opcode = code2op.get(code.decode("latin-1"))
if opcode is None:
if code == b"":
raise ValueError("pickle exhausted before seeing STOP")
else:
raise ValueError("at position %s, opcode %r unknown" % (
"<unknown>" if pos is None else pos,
code))
if opcode.arg is None:
arg = None
else:
arg = opcode.arg.reader(data)
if yield_end_pos:
yield opcode, arg, pos, getpos()
else:
yield opcode, arg, pos
if code == b'.':
assert opcode.name == 'STOP'
break
def genops(pickle):
"""Generate all the opcodes in a pickle.
'pickle' is a file-like object, or string, containing the pickle.
Each opcode in the pickle is generated, from the current pickle position,
stopping after a STOP opcode is delivered. A triple is generated for
each opcode:
opcode, arg, pos
opcode is an OpcodeInfo record, describing the current opcode.
If the opcode has an argument embedded in the pickle, arg is its decoded
value, as a Python object. If the opcode doesn't have an argument, arg
is None.
If the pickle has a tell() method, pos was the value of pickle.tell()
before reading the current opcode. If the pickle is a bytes object,
it's wrapped in a BytesIO object, and the latter's tell() result is
used. Else (the pickle doesn't have a tell(), and it's not obvious how
to query its current position) pos is None.
"""
return _genops(pickle)
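# A minimal usage sketch (output shown for a protocol 0 pickle of the
# int 42):
#
#     >>> import pickle
#     >>> for opcode, arg, pos in genops(pickle.dumps(42, 0)):
#     ...     print(pos, opcode.name, arg)
#     0 INT 42
#     4 STOP None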
##############################################################################
# A pickle optimizer.
def optimize(p):
'Optimize a pickle string by removing unused PUT opcodes'
put = 'PUT'
get = 'GET'
oldids = set() # set of all PUT ids
newids = {} # ids used by a GET opcode, later remapped to new indices
opcodes = [] # (op, idx) or (pos, end_pos)
proto = 0
protoheader = b''
for opcode, arg, pos, end_pos in _genops(p, yield_end_pos=True):
if 'PUT' in opcode.name:
oldids.add(arg)
opcodes.append((put, arg))
elif opcode.name == 'MEMOIZE':
idx = len(oldids)
oldids.add(idx)
opcodes.append((put, idx))
elif 'FRAME' in opcode.name:
pass
elif 'GET' in opcode.name:
if opcode.proto > proto:
proto = opcode.proto
newids[arg] = None
opcodes.append((get, arg))
elif opcode.name == 'PROTO':
if arg > proto:
proto = arg
if pos == 0:
protoheader = p[pos: end_pos]
else:
opcodes.append((pos, end_pos))
else:
opcodes.append((pos, end_pos))
del oldids
# Copy the opcodes except for PUTS without a corresponding GET
out = io.BytesIO()
# Write the PROTO header before any framing
out.write(protoheader)
pickler = pickle._Pickler(out, proto)
if proto >= 4:
pickler.framer.start_framing()
idx = 0
for op, arg in opcodes:
if op is put:
if arg not in newids:
continue
data = pickler.put(idx)
newids[arg] = idx
idx += 1
elif op is get:
data = pickler.get(newids[arg])
else:
data = p[op:arg]
pickler.framer.commit_frame()
pickler.write(data)
pickler.framer.end_framing()
return out.getvalue()
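# Usage sketch: unused PUT/MEMOIZE opcodes are dropped, so for typical
# non-recursive data the optimized pickle is never longer than the
# input and unpickles to an equal object:
#
#     >>> import pickle
#     >>> p = pickle.dumps([1, 2, 3])
#     >>> len(optimize(p)) <= len(p)
#     True
#     >>> pickle.loads(optimize(p))
#     [1, 2, 3]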
##############################################################################
# A symbolic pickle disassembler.
def dis(pickle, out=None, memo=None, indentlevel=4, annotate=0):
"""Produce a symbolic disassembly of a pickle.
'pickle' is a file-like object, or string, containing at least one
pickle. The pickle is disassembled from the current position, through
the first STOP opcode encountered.
Optional arg 'out' is a file-like object to which the disassembly is
printed. It defaults to sys.stdout.
Optional arg 'memo' is a Python dict, used as the pickle's memo. It
may be mutated by dis(), if the pickle contains PUT or BINPUT opcodes.
Passing the same memo object to another dis() call then allows disassembly
to proceed across multiple pickles that were all created by the same
pickler with the same memo. Ordinarily you don't need to worry about this.
Optional arg 'indentlevel' is the number of blanks by which to indent
a new MARK level. It defaults to 4.
Optional arg 'annotate', if nonzero, instructs dis() to add a short
description of the opcode on each line of disassembled output.
The value given to 'annotate' must be an integer and is used as a
hint for the column where annotation should start. The default
value is 0, meaning no annotations.
In addition to printing the disassembly, some sanity checks are made:
+ All embedded opcode arguments "make sense".
+ Explicit and implicit pop operations have enough items on the stack.
+ When an opcode implicitly refers to a markobject, a markobject is
actually on the stack.
+ A memo entry isn't referenced before it's defined.
+ The markobject isn't stored in the memo.
+ A memo entry isn't redefined.
"""
# Most of the hair here is for sanity checks, but most of it is needed
# anyway to detect when a protocol 0 POP takes a MARK off the stack
# (which in turn is needed to indent MARK blocks correctly).
stack = [] # crude emulation of unpickler stack
if memo is None:
memo = {} # crude emulation of unpickler memo
maxproto = -1 # max protocol number seen
markstack = [] # bytecode positions of MARK opcodes
indentchunk = ' ' * indentlevel
errormsg = None
annocol = annotate # column hint for annotations
for opcode, arg, pos in genops(pickle):
if pos is not None:
print("%5d:" % pos, end=' ', file=out)
line = "%-4s %s%s" % (repr(opcode.code)[1:-1],
indentchunk * len(markstack),
opcode.name)
maxproto = max(maxproto, opcode.proto)
before = opcode.stack_before # don't mutate
after = opcode.stack_after # don't mutate
numtopop = len(before)
# See whether a MARK should be popped.
markmsg = None
if markobject in before or (opcode.name == "POP" and
stack and
stack[-1] is markobject):
assert markobject not in after
if __debug__:
if markobject in before:
assert before[-1] is stackslice
if markstack:
markpos = markstack.pop()
if markpos is None:
markmsg = "(MARK at unknown opcode offset)"
else:
markmsg = "(MARK at %d)" % markpos
# Pop everything at and after the topmost markobject.
while stack[-1] is not markobject:
stack.pop()
stack.pop()
# Stop later code from popping too much.
try:
numtopop = before.index(markobject)
except ValueError:
assert opcode.name == "POP"
numtopop = 0
else:
errormsg = markmsg = "no MARK exists on stack"
# Check for correct memo usage.
if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE"):
if opcode.name == "MEMOIZE":
memo_idx = len(memo)
else:
assert arg is not None
memo_idx = arg
if memo_idx in memo:
errormsg = "memo key %r already defined" % arg
elif not stack:
errormsg = "stack is empty -- can't store into memo"
elif stack[-1] is markobject:
errormsg = "can't store markobject in the memo"
else:
memo[memo_idx] = stack[-1]
elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
if arg in memo:
assert len(after) == 1
after = [memo[arg]] # for better stack emulation
else:
errormsg = "memo key %r has never been stored into" % arg
if arg is not None or markmsg:
# make a mild effort to align arguments
line += ' ' * (10 - len(opcode.name))
if arg is not None:
line += ' ' + repr(arg)
if markmsg:
line += ' ' + markmsg
if annotate:
line += ' ' * (annocol - len(line))
# make a mild effort to align annotations
annocol = len(line)
if annocol > 50:
annocol = annotate
line += ' ' + opcode.doc.split('\n', 1)[0]
print(line, file=out)
if errormsg:
# Note that we delayed complaining until the offending opcode
# was printed.
raise ValueError(errormsg)
# Emulate the stack effects.
if len(stack) < numtopop:
raise ValueError("tries to pop %d items from stack with "
"only %d items" % (numtopop, len(stack)))
if numtopop:
del stack[-numtopop:]
if markobject in after:
assert markobject not in before
markstack.append(pos)
stack.extend(after)
print("highest protocol among opcodes =", maxproto, file=out)
if stack:
raise ValueError("stack not empty after STOP: %r" % stack)
# For use in the doctest, simply as an example of a class to pickle.
class _Example:
def __init__(self, value):
self.value = value
_dis_test = r"""
>>> import pickle
>>> x = [1, 2, (3, 4), {b'abc': "def"}]
>>> pkl0 = pickle.dumps(x, 0)
>>> dis(pkl0)
0: ( MARK
1: l LIST (MARK at 0)
2: p PUT 0
5: L LONG 1
9: a APPEND
10: L LONG 2
14: a APPEND
15: ( MARK
16: L LONG 3
20: L LONG 4
24: t TUPLE (MARK at 15)
25: p PUT 1
28: a APPEND
29: ( MARK
30: d DICT (MARK at 29)
31: p PUT 2
34: c GLOBAL '_codecs encode'
50: p PUT 3
53: ( MARK
54: V UNICODE 'abc'
59: p PUT 4
62: V UNICODE 'latin1'
70: p PUT 5
73: t TUPLE (MARK at 53)
74: p PUT 6
77: R REDUCE
78: p PUT 7
81: V UNICODE 'def'
86: p PUT 8
89: s SETITEM
90: a APPEND
91: . STOP
highest protocol among opcodes = 0
Try again with a "binary" pickle.
>>> pkl1 = pickle.dumps(x, 1)
>>> dis(pkl1)
0: ] EMPTY_LIST
1: q BINPUT 0
3: ( MARK
4: K BININT1 1
6: K BININT1 2
8: ( MARK
9: K BININT1 3
11: K BININT1 4
13: t TUPLE (MARK at 8)
14: q BINPUT 1
16: } EMPTY_DICT
17: q BINPUT 2
19: c GLOBAL '_codecs encode'
35: q BINPUT 3
37: ( MARK
38: X BINUNICODE 'abc'
46: q BINPUT 4
48: X BINUNICODE 'latin1'
59: q BINPUT 5
61: t TUPLE (MARK at 37)
62: q BINPUT 6
64: R REDUCE
65: q BINPUT 7
67: X BINUNICODE 'def'
75: q BINPUT 8
77: s SETITEM
78: e APPENDS (MARK at 3)
79: . STOP
highest protocol among opcodes = 1
Exercise the INST/OBJ/BUILD family.
>>> import pickletools
>>> dis(pickle.dumps(pickletools.dis, 0))
0: c GLOBAL 'pickletools dis'
17: p PUT 0
20: . STOP
highest protocol among opcodes = 0
>>> from pickletools import _Example
>>> x = [_Example(42)] * 2
>>> dis(pickle.dumps(x, 0))
0: ( MARK
1: l LIST (MARK at 0)
2: p PUT 0
5: c GLOBAL 'copy_reg _reconstructor'
30: p PUT 1
33: ( MARK
34: c GLOBAL 'pickletools _Example'
56: p PUT 2
59: c GLOBAL '__builtin__ object'
79: p PUT 3
82: N NONE
83: t TUPLE (MARK at 33)
84: p PUT 4
87: R REDUCE
88: p PUT 5
91: ( MARK
92: d DICT (MARK at 91)
93: p PUT 6
96: V UNICODE 'value'
103: p PUT 7
106: L LONG 42
111: s SETITEM
112: b BUILD
113: a APPEND
114: g GET 5
117: a APPEND
118: . STOP
highest protocol among opcodes = 0
>>> dis(pickle.dumps(x, 1))
0: ] EMPTY_LIST
1: q BINPUT 0
3: ( MARK
4: c GLOBAL 'copy_reg _reconstructor'
29: q BINPUT 1
31: ( MARK
32: c GLOBAL 'pickletools _Example'
54: q BINPUT 2
56: c GLOBAL '__builtin__ object'
76: q BINPUT 3
78: N NONE
79: t TUPLE (MARK at 31)
80: q BINPUT 4
82: R REDUCE
83: q BINPUT 5
85: } EMPTY_DICT
86: q BINPUT 6
88: X BINUNICODE 'value'
98: q BINPUT 7
100: K BININT1 42
102: s SETITEM
103: b BUILD
104: h BINGET 5
106: e APPENDS (MARK at 3)
107: . STOP
highest protocol among opcodes = 1
Try "the canonical" recursive-object test.
>>> L = []
>>> T = L,
>>> L.append(T)
>>> L[0] is T
True
>>> T[0] is L
True
>>> L[0][0] is L
True
>>> T[0][0] is T
True
>>> dis(pickle.dumps(L, 0))
0: ( MARK
1: l LIST (MARK at 0)
2: p PUT 0
5: ( MARK
6: g GET 0
9: t TUPLE (MARK at 5)
10: p PUT 1
13: a APPEND
14: . STOP
highest protocol among opcodes = 0
>>> dis(pickle.dumps(L, 1))
0: ] EMPTY_LIST
1: q BINPUT 0
3: ( MARK
4: h BINGET 0
6: t TUPLE (MARK at 3)
7: q BINPUT 1
9: a APPEND
10: . STOP
highest protocol among opcodes = 1
Note that, in the protocol 0 pickle of the recursive tuple, the disassembler
has to emulate the stack in order to realize that the POP opcode at 16 gets
rid of the MARK at 0.
>>> dis(pickle.dumps(T, 0))
0: ( MARK
1: ( MARK
2: l LIST (MARK at 1)
3: p PUT 0
6: ( MARK
7: g GET 0
10: t TUPLE (MARK at 6)
11: p PUT 1
14: a APPEND
15: 0 POP
16: 0 POP (MARK at 0)
17: g GET 1
20: . STOP
highest protocol among opcodes = 0
>>> dis(pickle.dumps(T, 1))
0: ( MARK
1: ] EMPTY_LIST
2: q BINPUT 0
4: ( MARK
5: h BINGET 0
7: t TUPLE (MARK at 4)
8: q BINPUT 1
10: a APPEND
11: 1 POP_MARK (MARK at 0)
12: h BINGET 1
14: . STOP
highest protocol among opcodes = 1
Try protocol 2.
>>> dis(pickle.dumps(L, 2))
0: \x80 PROTO 2
2: ] EMPTY_LIST
3: q BINPUT 0
5: h BINGET 0
7: \x85 TUPLE1
8: q BINPUT 1
10: a APPEND
11: . STOP
highest protocol among opcodes = 2
>>> dis(pickle.dumps(T, 2))
0: \x80 PROTO 2
2: ] EMPTY_LIST
3: q BINPUT 0
5: h BINGET 0
7: \x85 TUPLE1
8: q BINPUT 1
10: a APPEND
11: 0 POP
12: h BINGET 1
14: . STOP
highest protocol among opcodes = 2
Try protocol 3 with annotations:
>>> dis(pickle.dumps(T, 3), annotate=1)
0: \x80 PROTO 3 Protocol version indicator.
2: ] EMPTY_LIST Push an empty list.
3: q BINPUT 0 Store the stack top into the memo. The stack is not popped.
5: h BINGET 0 Read an object from the memo and push it on the stack.
7: \x85 TUPLE1 Build a one-tuple out of the topmost item on the stack.
8: q BINPUT 1 Store the stack top into the memo. The stack is not popped.
10: a APPEND Append an object to a list.
11: 0 POP Discard the top stack item, shrinking the stack by one item.
12: h BINGET 1 Read an object from the memo and push it on the stack.
14: . STOP Stop the unpickling machine.
highest protocol among opcodes = 2
"""
_memo_test = r"""
>>> import pickle
>>> import io
>>> f = io.BytesIO()
>>> p = pickle.Pickler(f, 2)
>>> x = [1, 2, 3]
>>> p.dump(x)
>>> p.dump(x)
>>> f.seek(0)
0
>>> memo = {}
>>> dis(f, memo=memo)
0: \x80 PROTO 2
2: ] EMPTY_LIST
3: q BINPUT 0
5: ( MARK
6: K BININT1 1
8: K BININT1 2
10: K BININT1 3
12: e APPENDS (MARK at 5)
13: . STOP
highest protocol among opcodes = 2
>>> dis(f, memo=memo)
14: \x80 PROTO 2
16: h BINGET 0
18: . STOP
highest protocol among opcodes = 2
"""
__test__ = {'disassembler_test': _dis_test,
'disassembler_memo_test': _memo_test,
}
def _test():
import doctest
return doctest.testmod()
if __name__ == "__main__":
import sys, argparse
parser = argparse.ArgumentParser(
description='disassemble one or more pickle files')
parser.add_argument(
'pickle_file', type=argparse.FileType('br'),
nargs='*', help='the pickle file')
parser.add_argument(
'-o', '--output', default=sys.stdout, type=argparse.FileType('w'),
help='the file where the output should be written')
parser.add_argument(
'-m', '--memo', action='store_true',
help='preserve memo between disassemblies')
parser.add_argument(
'-l', '--indentlevel', default=4, type=int,
help='the number of blanks by which to indent a new MARK level')
parser.add_argument(
'-a', '--annotate', action='store_true',
help='annotate each line with a short opcode description')
parser.add_argument(
'-p', '--preamble', default="==> {name} <==",
help='if more than one pickle file is specified, print this before'
' each disassembly')
parser.add_argument(
'-t', '--test', action='store_true',
help='run self-test suite')
parser.add_argument(
'-v', action='store_true',
help='run verbosely; only affects self-test run')
args = parser.parse_args()
if args.test:
_test()
else:
annotate = 30 if args.annotate else 0
if not args.pickle_file:
parser.print_help()
elif len(args.pickle_file) == 1:
dis(args.pickle_file[0], args.output, None,
args.indentlevel, annotate)
else:
memo = {} if args.memo else None
for f in args.pickle_file:
preamble = args.preamble.format(name=f.name)
args.output.write(preamble + '\n')
dis(f, args.output, memo, args.indentlevel, annotate)
"""Conversion pipeline templates.
The problem:
------------
Suppose you have some data that you want to convert to another format,
such as from GIF image format to PPM image format. Maybe the
conversion involves several steps (e.g. piping it through compress or
uuencode). Some of the conversion steps may require that their input
is a disk file, others may be able to read standard input; similar for
their output. The input to the entire conversion may also be read
from a disk file or from an open file, and similar for its output.
The module lets you construct a pipeline template by sticking one or
more conversion steps together. It will take care of creating and
removing temporary files if they are necessary to hold intermediate
data. You can then use the template to do conversions from many
different sources to many different destinations. The temporary
file names used are different each time the template is used.
The templates are objects so you can create templates for many
different conversion steps and store them in a dictionary, for
instance.
Directions:
-----------
To create a template:
t = Template()
To add a conversion step to a template:
t.append(command, kind)
where kind is a string of two characters: the first is '-' if the
command reads its standard input or 'f' if it requires a file; the
second likewise for the output. The command must be valid /bin/sh
syntax. If input or output files are required, they are passed as
$IN and $OUT; otherwise, it must be possible to use the command in
a pipeline.
To add a conversion step at the beginning:
t.prepend(command, kind)
To convert a file to another file using a template:
sts = t.copy(infile, outfile)
If infile or outfile are the empty string, standard input is read or
standard output is written, respectively. The return value is the
exit status of the conversion pipeline.
To open a file for reading or writing through a conversion pipeline:
fp = t.open(file, mode)
where mode is 'r' to read the file, or 'w' to write it -- just like
for the built-in function open() or for os.popen().
To create a new template object initialized to a given one:
t2 = t.clone()
""" # '
import re
import os
import tempfile
# we import the quote function rather than the module for backward compat
# (quote used to be an undocumented but used function in pipes)
from shlex import quote
__all__ = ["Template"]
# Conversion step kinds
FILEIN_FILEOUT = 'ff' # Must read & write real files
STDIN_FILEOUT = '-f' # Must write a real file
FILEIN_STDOUT = 'f-' # Must read a real file
STDIN_STDOUT = '--' # Normal pipeline element
SOURCE = '.-' # Must be first, writes stdout
SINK = '-.' # Must be last, reads stdin
stepkinds = [FILEIN_FILEOUT, STDIN_FILEOUT, FILEIN_STDOUT, STDIN_STDOUT, \
SOURCE, SINK]
class Template:
"""Class representing a pipeline template."""
def __init__(self):
"""Template() returns a fresh pipeline template."""
self.debugging = 0
self.reset()
def __repr__(self):
"""t.__repr__() implements repr(t)."""
return '<Template instance, steps=%r>' % (self.steps,)
def reset(self):
"""t.reset() restores a pipeline template to its initial state."""
self.steps = []
def clone(self):
"""t.clone() returns a new pipeline template with identical
initial state as the current one."""
t = Template()
t.steps = self.steps[:]
t.debugging = self.debugging
return t
def debug(self, flag):
"""t.debug(flag) turns debugging on or off."""
self.debugging = flag
def append(self, cmd, kind):
"""t.append(cmd, kind) adds a new step at the end."""
if type(cmd) is not type(''):
raise TypeError('Template.append: cmd must be a string')
if kind not in stepkinds:
raise ValueError('Template.append: bad kind %r' % (kind,))
if kind == SOURCE:
raise ValueError('Template.append: SOURCE can only be prepended')
if self.steps and self.steps[-1][1] == SINK:
raise ValueError('Template.append: already ends with SINK')
if kind[0] == 'f' and not re.search(r'\$IN\b', cmd):
raise ValueError('Template.append: missing $IN in cmd')
if kind[1] == 'f' and not re.search(r'\$OUT\b', cmd):
raise ValueError('Template.append: missing $OUT in cmd')
self.steps.append((cmd, kind))
def prepend(self, cmd, kind):
"""t.prepend(cmd, kind) adds a new step at the front."""
if type(cmd) is not type(''):
raise TypeError('Template.prepend: cmd must be a string')
if kind not in stepkinds:
raise ValueError('Template.prepend: bad kind %r' % (kind,))
if kind == SINK:
raise ValueError('Template.prepend: SINK can only be appended')
if self.steps and self.steps[0][1] == SOURCE:
raise ValueError('Template.prepend: already begins with SOURCE')
if kind[0] == 'f' and not re.search(r'\$IN\b', cmd):
raise ValueError('Template.prepend: missing $IN in cmd')
if kind[1] == 'f' and not re.search(r'\$OUT\b', cmd):
raise ValueError('Template.prepend: missing $OUT in cmd')
self.steps.insert(0, (cmd, kind))
def open(self, file, rw):
"""t.open(file, rw) returns a pipe or file object open for
reading or writing; the file is the other end of the pipeline."""
if rw == 'r':
return self.open_r(file)
if rw == 'w':
return self.open_w(file)
raise ValueError('Template.open: rw must be \'r\' or \'w\', not %r'
% (rw,))
def open_r(self, file):
"""t.open_r(file) and t.open_w(file) implement
t.open(file, 'r') and t.open(file, 'w') respectively."""
if not self.steps:
return open(file, 'r')
if self.steps[-1][1] == SINK:
raise ValueError('Template.open_r: pipeline ends with SINK')
cmd = self.makepipeline(file, '')
return os.popen(cmd, 'r')
def open_w(self, file):
if not self.steps:
return open(file, 'w')
if self.steps[0][1] == SOURCE:
raise ValueError('Template.open_w: pipeline begins with SOURCE')
cmd = self.makepipeline('', file)
return os.popen(cmd, 'w')
def copy(self, infile, outfile):
return os.system(self.makepipeline(infile, outfile))
def makepipeline(self, infile, outfile):
cmd = makepipeline(infile, self.steps, outfile)
if self.debugging:
print(cmd)
cmd = 'set -x; ' + cmd
return cmd
def makepipeline(infile, steps, outfile):
# Build a list with for each command:
# [input filename or '', command string, kind, output filename or '']
list = []
for cmd, kind in steps:
list.append(['', cmd, kind, ''])
#
# Make sure there is at least one step
#
if not list:
list.append(['', 'cat', '--', ''])
#
# Take care of the input and output ends
#
[cmd, kind] = list[0][1:3]
if kind[0] == 'f' and not infile:
list.insert(0, ['', 'cat', '--', ''])
list[0][0] = infile
#
[cmd, kind] = list[-1][1:3]
if kind[1] == 'f' and not outfile:
list.append(['', 'cat', '--', ''])
list[-1][-1] = outfile
#
# Invent temporary files to connect stages that need files
#
garbage = []
for i in range(1, len(list)):
lkind = list[i-1][2]
rkind = list[i][2]
if lkind[1] == 'f' or rkind[0] == 'f':
(fd, temp) = tempfile.mkstemp()
os.close(fd)
garbage.append(temp)
list[i-1][-1] = list[i][0] = temp
#
for item in list:
[inf, cmd, kind, outf] = item
if kind[1] == 'f':
cmd = 'OUT=' + quote(outf) + '; ' + cmd
if kind[0] == 'f':
cmd = 'IN=' + quote(inf) + '; ' + cmd
if kind[0] == '-' and inf:
cmd = cmd + ' <' + quote(inf)
if kind[1] == '-' and outf:
cmd = cmd + ' >' + quote(outf)
item[1] = cmd
#
cmdlist = list[0][1]
for item in list[1:]:
[cmd, kind] = item[1:3]
if item[0] == '':
if 'f' in kind:
cmd = '{ ' + cmd + '; }'
cmdlist = cmdlist + ' |\n' + cmd
else:
cmdlist = cmdlist + '\n' + cmd
#
if garbage:
rmcmd = 'rm -f'
for file in garbage:
rmcmd = rmcmd + ' ' + quote(file)
trapcmd = 'trap ' + quote(rmcmd + '; exit') + ' 1 2 3 13 14 15'
cmdlist = trapcmd + '\n' + cmdlist + '\n' + rmcmd
#
return cmdlist
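# Sketch of the generated shell text: a single stdin/stdout step with
# named input and output files becomes plain redirections:
#
#     >>> makepipeline('in.txt', [('tr a-z A-Z', '--')], 'out.txt')
#     'tr a-z A-Z <in.txt >out.txt'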
"""Utilities to support packages."""
from functools import singledispatch as simplegeneric
import importlib
import importlib.util
import importlib.machinery
import os
import os.path
import sys
from types import ModuleType
import warnings
__all__ = [
'get_importer', 'iter_importers', 'get_loader', 'find_loader',
'walk_packages', 'iter_modules', 'get_data',
'ImpImporter', 'ImpLoader', 'read_code', 'extend_path',
]
def _get_spec(finder, name):
"""Return the finder-specific module spec."""
# Works with legacy finders.
try:
find_spec = finder.find_spec
except AttributeError:
loader = finder.find_module(name)
if loader is None:
return None
return importlib.util.spec_from_loader(name, loader)
else:
return find_spec(name)
def read_code(stream):
# This helper is needed in order for the PEP 302 emulation to
# correctly handle compiled files
import marshal
magic = stream.read(4)
if magic != importlib.util.MAGIC_NUMBER:
return None
stream.read(8) # Skip timestamp and size
return marshal.load(stream)
def walk_packages(path=None, prefix='', onerror=None):
"""Yields (module_loader, name, ispkg) for all modules recursively
on path, or, if path is None, all accessible modules.
'path' should be either None or a list of paths to look for
modules in.
'prefix' is a string to output on the front of every module name
on output.
Note that this function must import all *packages* (NOT all
modules!) on the given path, in order to access the __path__
attribute to find submodules.
'onerror' is a function which gets called with one argument (the
name of the package which was being imported) if any exception
occurs while trying to import a package. If no onerror function is
supplied, ImportErrors are caught and ignored, while all other
exceptions are propagated, terminating the search.
Examples:
# list all modules python can access
walk_packages()
# list all submodules of ctypes
walk_packages(ctypes.__path__, ctypes.__name__+'.')
"""
def seen(p, m={}):
if p in m:
return True
m[p] = True
for importer, name, ispkg in iter_modules(path, prefix):
yield importer, name, ispkg
if ispkg:
try:
__import__(name)
except ImportError:
if onerror is not None:
onerror(name)
except Exception:
if onerror is not None:
onerror(name)
else:
raise
else:
path = getattr(sys.modules[name], '__path__', None) or []
# don't traverse path items we've seen before
path = [p for p in path if not seen(p)]
yield from walk_packages(path, name+'.', onerror)
def iter_modules(path=None, prefix=''):
"""Yields (module_loader, name, ispkg) for all submodules on path,
or, if path is None, all top-level modules on sys.path.
'path' should be either None or a list of paths to look for
modules in.
'prefix' is a string to output on the front of every module name
on output.
"""
if path is None:
importers = iter_importers()
else:
importers = map(get_importer, path)
yielded = {}
for i in importers:
for name, ispkg in iter_importer_modules(i, prefix):
if name not in yielded:
yielded[name] = 1
yield i, name, ispkg
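# Usage sketch: collect top-level module names on sys.path; the exact
# contents depend on the interpreter's environment:
#
#     >>> names = {name for _, name, _ in iter_modules()}
#     >>> 'os' in names
#     True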
@simplegeneric
def iter_importer_modules(importer, prefix=''):
if not hasattr(importer, 'iter_modules'):
return []
return importer.iter_modules(prefix)
# Implement a file walker for the normal importlib path hook
def _iter_file_finder_modules(importer, prefix=''):
if importer.path is None or not os.path.isdir(importer.path):
return
yielded = {}
import inspect
try:
filenames = os.listdir(importer.path)
except OSError:
# ignore unreadable directories like import does
filenames = []
filenames.sort() # handle packages before same-named modules
for fn in filenames:
modname = inspect.getmodulename(fn)
if modname=='__init__' or modname in yielded:
continue
path = os.path.join(importer.path, fn)
ispkg = False
if not modname and os.path.isdir(path) and '.' not in fn:
modname = fn
try:
dircontents = os.listdir(path)
except OSError:
# ignore unreadable directories like import does
dircontents = []
for fn in dircontents:
subname = inspect.getmodulename(fn)
if subname=='__init__':
ispkg = True
break
else:
continue # not a package
if modname and '.' not in modname:
yielded[modname] = 1
yield prefix + modname, ispkg
iter_importer_modules.register(
importlib.machinery.FileFinder, _iter_file_finder_modules)
def _import_imp():
global imp
with warnings.catch_warnings():
warnings.simplefilter('ignore', PendingDeprecationWarning)
imp = importlib.import_module('imp')
class ImpImporter:
"""PEP 302 Importer that wraps Python's "classic" import algorithm
ImpImporter(dirname) produces a PEP 302 importer that searches that
directory. ImpImporter(None) produces a PEP 302 importer that searches
the current sys.path, plus any modules that are frozen or built-in.
Note that ImpImporter does not currently support being used by placement
on sys.meta_path.
"""
def __init__(self, path=None):
global imp
warnings.warn("This emulation is deprecated, use 'importlib' instead",
DeprecationWarning)
_import_imp()
self.path = path
def find_module(self, fullname, path=None):
# Note: we ignore 'path' argument since it is only used via meta_path
subname = fullname.split(".")[-1]
if subname != fullname and self.path is None:
return None
if self.path is None:
path = None
else:
path = [os.path.realpath(self.path)]
try:
file, filename, etc = imp.find_module(subname, path)
except ImportError:
return None
return ImpLoader(fullname, file, filename, etc)
def iter_modules(self, prefix=''):
if self.path is None or not os.path.isdir(self.path):
return
yielded = {}
import inspect
try:
filenames = os.listdir(self.path)
except OSError:
# ignore unreadable directories like import does
filenames = []
filenames.sort() # handle packages before same-named modules
for fn in filenames:
modname = inspect.getmodulename(fn)
if modname=='__init__' or modname in yielded:
continue
path = os.path.join(self.path, fn)
ispkg = False
if not modname and os.path.isdir(path) and '.' not in fn:
modname = fn
try:
dircontents = os.listdir(path)
except OSError:
# ignore unreadable directories like import does
dircontents = []
for fn in dircontents:
subname = inspect.getmodulename(fn)
if subname=='__init__':
ispkg = True
break
else:
continue # not a package
if modname and '.' not in modname:
yielded[modname] = 1
yield prefix + modname, ispkg
class ImpLoader:
"""PEP 302 Loader that wraps Python's "classic" import algorithm
"""
code = source = None
def __init__(self, fullname, file, filename, etc):
warnings.warn("This emulation is deprecated, use 'importlib' instead",
DeprecationWarning)
_import_imp()
self.file = file
self.filename = filename
self.fullname = fullname
self.etc = etc
def load_module(self, fullname):
self._reopen()
try:
mod = imp.load_module(fullname, self.file, self.filename, self.etc)
finally:
if self.file:
self.file.close()
# Note: we don't set __loader__ because we want the module to look
# normal; i.e. this is just a wrapper for standard import machinery
return mod
def get_data(self, pathname):
with open(pathname, "rb") as file:
return file.read()
def _reopen(self):
if self.file and self.file.closed:
mod_type = self.etc[2]
if mod_type==imp.PY_SOURCE:
self.file = open(self.filename, 'r')
elif mod_type in (imp.PY_COMPILED, imp.C_EXTENSION):
self.file = open(self.filename, 'rb')
def _fix_name(self, fullname):
if fullname is None:
fullname = self.fullname
elif fullname != self.fullname:
raise ImportError("Loader for module %s cannot handle "
"module %s" % (self.fullname, fullname))
return fullname
def is_package(self, fullname):
fullname = self._fix_name(fullname)
return self.etc[2]==imp.PKG_DIRECTORY
def get_code(self, fullname=None):
fullname = self._fix_name(fullname)
if self.code is None:
mod_type = self.etc[2]
if mod_type==imp.PY_SOURCE:
source = self.get_source(fullname)
self.code = compile(source, self.filename, 'exec')
elif mod_type==imp.PY_COMPILED:
self._reopen()
try:
self.code = read_code(self.file)
finally:
self.file.close()
elif mod_type==imp.PKG_DIRECTORY:
self.code = self._get_delegate().get_code()
return self.code
def get_source(self, fullname=None):
fullname = self._fix_name(fullname)
if self.source is None:
mod_type = self.etc[2]
if mod_type==imp.PY_SOURCE:
self._reopen()
try:
self.source = self.file.read()
finally:
self.file.close()
elif mod_type==imp.PY_COMPILED:
if os.path.exists(self.filename[:-1]):
with open(self.filename[:-1], 'r') as f:
self.source = f.read()
elif mod_type==imp.PKG_DIRECTORY:
self.source = self._get_delegate().get_source()
return self.source
def _get_delegate(self):
finder = ImpImporter(self.filename)
spec = _get_spec(finder, '__init__')
return spec.loader
def get_filename(self, fullname=None):
fullname = self._fix_name(fullname)
mod_type = self.etc[2]
if mod_type==imp.PKG_DIRECTORY:
return self._get_delegate().get_filename()
elif mod_type in (imp.PY_SOURCE, imp.PY_COMPILED, imp.C_EXTENSION):
return self.filename
return None
try:
import zipimport
from zipimport import zipimporter
def iter_zipimport_modules(importer, prefix=''):
dirlist = sorted(zipimport._zip_directory_cache[importer.archive])
_prefix = importer.prefix
plen = len(_prefix)
yielded = {}
import inspect
for fn in dirlist:
if not fn.startswith(_prefix):
continue
fn = fn[plen:].split(os.sep)
if len(fn)==2 and fn[1].startswith('__init__.py'):
if fn[0] not in yielded:
yielded[fn[0]] = 1
yield fn[0], True
if len(fn)!=1:
continue
modname = inspect.getmodulename(fn[0])
if modname=='__init__':
continue
if modname and '.' not in modname and modname not in yielded:
yielded[modname] = 1
yield prefix + modname, False
iter_importer_modules.register(zipimporter, iter_zipimport_modules)
except ImportError:
pass
def get_importer(path_item):
"""Retrieve a PEP 302 importer for the given path item
The returned importer is cached in sys.path_importer_cache
if it was newly created by a path hook.
The cache (or part of it) can be cleared manually if a
rescan of sys.path_hooks is necessary.
"""
try:
importer = sys.path_importer_cache[path_item]
except KeyError:
for path_hook in sys.path_hooks:
try:
importer = path_hook(path_item)
sys.path_importer_cache.setdefault(path_item, importer)
break
except ImportError:
pass
else:
importer = None
return importer
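# Illustrative sketch (editor's addition, not part of the original module):
# repeated calls for the same path entry hit sys.path_importer_cache, so
# only the first lookup runs the sys.path_hooks.
def _example_get_importer():
    import os
    finder = get_importer(os.getcwd())
    # Typically a FileFinder for an existing directory; None if no hook
    # accepted the entry.
    return finder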
def iter_importers(fullname=""):
"""Yield PEP 302 importers for the given module name
If fullname contains a '.', the importers will be for the package
containing fullname, otherwise they will be all registered top level
importers (i.e. those on both sys.meta_path and sys.path_hooks).
If the named module is in a package, that package is imported as a side
effect of invoking this function.
If no module name is specified, all top level importers are produced.
"""
if fullname.startswith('.'):
msg = "Relative module name {!r} not supported".format(fullname)
raise ImportError(msg)
if '.' in fullname:
# Get the containing package's __path__
pkg_name = fullname.rpartition(".")[0]
pkg = importlib.import_module(pkg_name)
path = getattr(pkg, '__path__', None)
if path is None:
return
else:
yield from sys.meta_path
path = sys.path
for item in path:
yield get_importer(item)
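# Illustrative sketch (editor's addition, not part of the original module):
# with no argument, iter_importers() yields the sys.meta_path entries
# followed by one finder (possibly None) per sys.path entry.
def _example_iter_importers():
    return [finder for finder in iter_importers() if finder is not None]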
def get_loader(module_or_name):
"""Get a PEP 302 "loader" object for module_or_name
Returns None if the module cannot be found or imported.
If the named module is not already imported, its containing package
(if any) is imported, in order to establish the package __path__.
"""
if module_or_name in sys.modules:
module_or_name = sys.modules[module_or_name]
if module_or_name is None:
return None
if isinstance(module_or_name, ModuleType):
module = module_or_name
loader = getattr(module, '__loader__', None)
if loader is not None:
return loader
if getattr(module, '__spec__', None) is None:
return None
fullname = module.__name__
else:
fullname = module_or_name
return find_loader(fullname)
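# Illustrative sketch (editor's addition, not part of the original module):
# get_loader() accepts either a module object or a dotted name; a module
# that is already imported takes the fast __loader__ path.
def _example_get_loader():
    import json
    return get_loader(json), get_loader('json.tool')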
def find_loader(fullname):
"""Find a PEP 302 "loader" object for fullname
This is a backwards compatibility wrapper around
importlib.util.find_spec that converts most failures to ImportError
and only returns the loader rather than the full spec
"""
if fullname.startswith('.'):
msg = "Relative module name {!r} not supported".format(fullname)
raise ImportError(msg)
try:
spec = importlib.util.find_spec(fullname)
except (ImportError, AttributeError, TypeError, ValueError) as ex:
# This hack fixes an impedance mismatch between pkgutil and
# importlib, where the latter raises other errors for cases where
# pkgutil previously raised ImportError
msg = "Error while finding loader for {!r} ({}: {})"
raise ImportError(msg.format(fullname, type(ex), ex)) from ex
return spec.loader if spec is not None else None
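# Illustrative sketch (editor's addition, not part of the original module):
# find_loader() narrows importlib's assorted failures down to ImportError,
# which is what older pkgutil callers expect. The module name is
# hypothetical; a missing top-level module simply yields None.
def _example_find_loader():
    try:
        return find_loader('no_such_module_xyz')
    except ImportError:
        return None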
def extend_path(path, name):
"""Extend a package's path.
Intended use is to place the following code in a package's __init__.py:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
This will add to the package's __path__ all subdirectories of
directories on sys.path named after the package. This is useful
if one wants to distribute different parts of a single logical
package as multiple directories.
It also looks for *.pkg files beginning where * matches the name
argument. This feature is similar to *.pth files (see site.py),
except that it doesn't special-case lines starting with 'import'.
A *.pkg file is trusted at face value: apart from checking for
duplicates, all entries found in a *.pkg file are added to the
    path, regardless of whether they exist on the filesystem. (This
is a feature.)
If the input path is not a list (as is the case for frozen
packages) it is returned unchanged. The input path is not
modified; an extended copy is returned. Items are only appended
to the copy at the end.
It is assumed that sys.path is a sequence. Items of sys.path that
are not (unicode or 8-bit) strings referring to existing
directories are ignored. Unicode items of sys.path that cause
errors when used as filenames may cause this function to raise an
exception (in line with os.path.isdir() behavior).
"""
if not isinstance(path, list):
# This could happen e.g. when this is called from inside a
# frozen package. Return the path unchanged in that case.
return path
sname_pkg = name + ".pkg"
path = path[:] # Start with a copy of the existing path
parent_package, _, final_name = name.rpartition('.')
if parent_package:
try:
search_path = sys.modules[parent_package].__path__
except (KeyError, AttributeError):
# We can't do anything: find_loader() returns None when
# passed a dotted name.
return path
else:
search_path = sys.path
for dir in search_path:
if not isinstance(dir, str):
continue
finder = get_importer(dir)
if finder is not None:
portions = []
if hasattr(finder, 'find_spec'):
spec = finder.find_spec(final_name)
if spec is not None:
portions = spec.submodule_search_locations or []
# Is this finder PEP 420 compliant?
elif hasattr(finder, 'find_loader'):
_, portions = finder.find_loader(final_name)
for portion in portions:
# XXX This may still add duplicate entries to path on
# case-insensitive filesystems
if portion not in path:
path.append(portion)
# XXX Is this the right thing for subpackages like zope.app?
# It looks for a file named "zope.app.pkg"
pkgfile = os.path.join(dir, sname_pkg)
if os.path.isfile(pkgfile):
try:
f = open(pkgfile)
except OSError as msg:
sys.stderr.write("Can't open %s: %s\n" %
(pkgfile, msg))
else:
with f:
for line in f:
line = line.rstrip('\n')
if not line or line.startswith('#'):
continue
path.append(line) # Don't check for existence!
return path
def get_data(package, resource):
"""Get a resource from a package.
This is a wrapper round the PEP 302 loader get_data API. The package
argument should be the name of a package, in standard module format
(foo.bar). The resource argument should be in the form of a relative
filename, using '/' as the path separator. The parent directory name '..'
is not allowed, and nor is a rooted name (starting with a '/').
The function returns a binary string, which is the contents of the
specified resource.
For packages located in the filesystem, which have already been imported,
this is the rough equivalent of
d = os.path.dirname(sys.modules[package].__file__)
data = open(os.path.join(d, resource), 'rb').read()
If the package cannot be located or loaded, or it uses a PEP 302 loader
which does not support get_data(), then None is returned.
"""
spec = importlib.util.find_spec(package)
if spec is None:
return None
loader = spec.loader
if loader is None or not hasattr(loader, 'get_data'):
return None
# XXX needs test
mod = (sys.modules.get(package) or
importlib._bootstrap._SpecMethods(spec).load())
if mod is None or not hasattr(mod, '__file__'):
return None
# Modify the resource name to be compatible with the loader.get_data
# signature - an os.path format "filename" starting with the dirname of
# the package's __file__
parts = resource.split('/')
parts.insert(0, os.path.dirname(mod.__file__))
resource_name = os.path.join(*parts)
return loader.get_data(resource_name)
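# Illustrative sketch (editor's addition, not part of the original module):
# the package and resource names below are hypothetical; the resource path
# always uses '/' regardless of the host OS.
def _example_get_data():
    data = get_data('mypackage', 'templates/default.tmpl')
    return data  # bytes, or None if the package/loader cannot provide it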
#!/usr/bin/env python3
""" This module tries to retrieve as much platform-identifying data as
possible. It makes this information available via function APIs.
If called from the command line, it prints the platform
    information concatenated as a single string to stdout. The output
    format is usable as part of a filename.
"""
# This module is maintained by Marc-Andre Lemburg <[email protected]>.
# If you find problems, please submit bug reports/patches via the
# Python bug tracker (http://bugs.python.org) and assign them to "lemburg".
#
# Still needed:
# * more support for WinCE
# * support for MS-DOS (PythonDX ?)
# * support for Amiga and other still unsupported platforms running Python
# * support for additional Linux distributions
#
# Many thanks to all those who helped adding platform-specific
# checks (in no particular order):
#
# Charles G Waldman, David Arnold, Gordon McMillan, Ben Darnell,
# Jeff Bauer, Cliff Crawford, Ivan Van Laningham, Josef
# Betancourt, Randall Hopper, Karl Putland, John Farrell, Greg
# Andruk, Just van Rossum, Thomas Heller, Mark R. Levinson, Mark
# Hammond, Bill Tutt, Hans Nowak, Uwe Zessin (OpenVMS support),
# Colin Kong, Trent Mick, Guido van Rossum, Anthony Baxter, Steve
# Dower
#
# History:
#
# <see CVS and SVN checkin messages for history>
#
# 1.0.8 - changed Windows support to read version from kernel32.dll
# 1.0.7 - added DEV_NULL
# 1.0.6 - added linux_distribution()
# 1.0.5 - fixed Java support to allow running the module on Jython
# 1.0.4 - added IronPython support
# 1.0.3 - added normalization of Windows system name
# 1.0.2 - added more Windows support
# 1.0.1 - reformatted to make doc.py happy
# 1.0.0 - reformatted a bit and checked into Python CVS
# 0.8.0 - added sys.version parser and various new access
# APIs (python_version(), python_compiler(), etc.)
# 0.7.2 - fixed architecture() to use sizeof(pointer) where available
# 0.7.1 - added support for Caldera OpenLinux
# 0.7.0 - some fixes for WinCE; untabified the source file
# 0.6.2 - support for OpenVMS - requires version 1.5.2-V006 or higher and
# vms_lib.getsyi() configured
# 0.6.1 - added code to prevent 'uname -p' on platforms which are
# known not to support it
# 0.6.0 - fixed win32_ver() to hopefully work on Win95,98,NT and Win2k;
# did some cleanup of the interfaces - some APIs have changed
# 0.5.5 - fixed another typo in the MacOS code... should have
# used more coffee today ;-)
# 0.5.4 - fixed a few typos in the MacOS code
# 0.5.3 - added experimental MacOS support; added better popen()
# workarounds in _syscmd_ver() -- still not 100% elegant
# though
# 0.5.2 - fixed uname() to return '' instead of 'unknown' in all
# return values (the system uname command tends to return
# 'unknown' instead of just leaving the field empty)
# 0.5.1 - included code for slackware dist; added exception handlers
# to cover up situations where platforms don't have os.popen
# (e.g. Mac) or fail on socket.gethostname(); fixed libc
# detection RE
# 0.5.0 - changed the API names referring to system commands to *syscmd*;
# added java_ver(); made syscmd_ver() a private
# API (was system_ver() in previous versions) -- use uname()
# instead; extended the win32_ver() to also return processor
# type information
# 0.4.0 - added win32_ver() and modified the platform() output for WinXX
# 0.3.4 - fixed a bug in _follow_symlinks()
# 0.3.3 - fixed popen() and "file" command invocation bugs
# 0.3.2 - added architecture() API and support for it in platform()
# 0.3.1 - fixed syscmd_ver() RE to support Windows NT
# 0.3.0 - added system alias support
# 0.2.3 - removed 'wince' again... oh well.
# 0.2.2 - added 'wince' to syscmd_ver() supported platforms
# 0.2.1 - added cache logic and changed the platform string format
# 0.2.0 - changed the API to use functions instead of module globals
# since some actions take too long to run on module import
# 0.1.0 - first release
#
# You can always get the latest version of this module at:
#
# http://www.egenix.com/files/python/platform.py
#
# If that URL should fail, try contacting the author.
__copyright__ = """
Copyright (c) 1999-2000, Marc-Andre Lemburg; mailto:[email protected]
Copyright (c) 2000-2010, eGenix.com Software GmbH; mailto:[email protected]
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee or royalty is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation or portions thereof, including modifications,
that you make.
EGENIX.COM SOFTWARE GMBH DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL,
INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING
FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
WITH THE USE OR PERFORMANCE OF THIS SOFTWARE !
"""
__version__ = '1.0.8'
import collections
import sys, os, re, subprocess
### Globals & Constants
# Determine the platform's /dev/null device
try:
DEV_NULL = os.devnull
except AttributeError:
# os.devnull was added in Python 2.4, so emulate it for earlier
# Python versions
if sys.platform in ('dos', 'win32', 'win16'):
# Use the old CP/M NUL as device name
DEV_NULL = 'NUL'
else:
# Standard Unix uses /dev/null
DEV_NULL = '/dev/null'
# Directory to search for configuration information on Unix.
# Constant used by test_platform to test linux_distribution().
_UNIXCONFDIR = '/etc'
### Platform specific APIs
_libc_search = re.compile(b'(__libc_init)'
b'|'
b'(GLIBC_([0-9.]+))'
b'|'
br'(libc(_\w+)?\.so(?:\.(\d[0-9.]*))?)', re.ASCII)
def libc_ver(executable=sys.executable, lib='', version='',
chunksize=16384):
""" Tries to determine the libc version that the file executable
(which defaults to the Python interpreter) is linked against.
Returns a tuple of strings (lib,version) which default to the
given parameters in case the lookup fails.
Note that the function has intimate knowledge of how different
libc versions add symbols to the executable and thus is probably
        only usable for executables compiled using gcc.
The file is read and scanned in chunks of chunksize bytes.
"""
if hasattr(os.path, 'realpath'):
# Python 2.2 introduced os.path.realpath(); it is used
# here to work around problems with Cygwin not being
# able to open symlinks for reading
executable = os.path.realpath(executable)
f = open(executable, 'rb')
binary = f.read(chunksize)
pos = 0
while 1:
if b'libc' in binary or b'GLIBC' in binary:
m = _libc_search.search(binary, pos)
else:
m = None
if not m:
binary = f.read(chunksize)
if not binary:
break
pos = 0
continue
libcinit, glibc, glibcversion, so, threads, soversion = [
s.decode('latin1') if s is not None else s
for s in m.groups()]
if libcinit and not lib:
lib = 'libc'
elif glibc:
if lib != 'glibc':
lib = 'glibc'
version = glibcversion
elif glibcversion > version:
version = glibcversion
elif so:
if lib != 'glibc':
lib = 'libc'
if soversion and soversion > version:
version = soversion
if threads and version[-len(threads):] != threads:
version = version + threads
pos = m.end()
f.close()
return lib, version
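# Illustrative sketch (editor's addition, not part of the original module):
# the values in the comment are typical, not guaranteed.
def _example_libc_ver():
    lib, version = libc_ver(sys.executable)
    # e.g. ('glibc', '2.17') for a glibc-linked interpreter; the passed-in
    # defaults ('', '') come back when nothing can be detected.
    return lib, version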
def _dist_try_harder(distname, version, id):
""" Tries some special tricks to get the distribution
information in case the default method fails.
Currently supports older SuSE Linux, Caldera OpenLinux and
Slackware Linux distributions.
"""
if os.path.exists('/var/adm/inst-log/info'):
# SuSE Linux stores distribution information in that file
distname = 'SuSE'
for line in open('/var/adm/inst-log/info'):
tv = line.split()
if len(tv) == 2:
tag, value = tv
else:
continue
if tag == 'MIN_DIST_VERSION':
version = value.strip()
elif tag == 'DIST_IDENT':
values = value.split('-')
id = values[2]
return distname, version, id
if os.path.exists('/etc/.installed'):
# Caldera OpenLinux has some infos in that file (thanks to Colin Kong)
for line in open('/etc/.installed'):
pkg = line.split('-')
if len(pkg) >= 2 and pkg[0] == 'OpenLinux':
# XXX does Caldera support non Intel platforms ? If yes,
# where can we find the needed id ?
return 'OpenLinux', pkg[1], id
if os.path.isdir('/usr/lib/setup'):
# Check for slackware version tag file (thanks to Greg Andruk)
verfiles = os.listdir('/usr/lib/setup')
for n in range(len(verfiles)-1, -1, -1):
if verfiles[n][:14] != 'slack-version-':
del verfiles[n]
if verfiles:
verfiles.sort()
distname = 'slackware'
version = verfiles[-1][14:]
return distname, version, id
return distname, version, id
_release_filename = re.compile(r'(\w+)[-_](release|version)', re.ASCII)
_lsb_release_version = re.compile(r'(.+)'
                                  r' release '
                                  r'([\d.]+)'
                                  r'[^(]*(?:\((.+)\))?', re.ASCII)
_release_version = re.compile(r'([^0-9]+)'
                              r'(?: release )?'
                              r'([\d.]+)'
                              r'[^(]*(?:\((.+)\))?', re.ASCII)
# See also http://www.novell.com/coolsolutions/feature/11251.html
# and http://linuxmafia.com/faq/Admin/release-files.html
# and http://data.linux-ntfs.org/rpm/whichrpm
# and http://www.die.net/doc/linux/man/man1/lsb_release.1.html
_supported_dists = (
'SuSE', 'debian', 'fedora', 'redhat', 'centos',
'mandrake', 'mandriva', 'rocks', 'slackware', 'yellowdog', 'gentoo',
'UnitedLinux', 'turbolinux', 'arch', 'mageia')
def _parse_release_file(firstline):
# Default to empty 'version' and 'id' strings. Both defaults are used
# when 'firstline' is empty. 'id' defaults to empty when an id can not
# be deduced.
version = ''
id = ''
# Parse the first line
m = _lsb_release_version.match(firstline)
if m is not None:
# LSB format: "distro release x.x (codename)"
return tuple(m.groups())
# Pre-LSB format: "distro x.x (codename)"
m = _release_version.match(firstline)
if m is not None:
return tuple(m.groups())
# Unknown format... take the first two words
l = firstline.strip().split()
if l:
version = l[0]
if len(l) > 1:
id = l[1]
return '', version, id
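# Illustrative sketch (editor's addition, not part of the original module):
# an LSB-format first line parses into (distname, version, id).
def _example_parse_release_file():
    # -> ('Fedora', '21', 'Twenty One')
    return _parse_release_file('Fedora release 21 (Twenty One)')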
def linux_distribution(distname='', version='', id='',
supported_dists=_supported_dists,
full_distribution_name=1):
""" Tries to determine the name of the Linux OS distribution name.
The function first looks for a distribution release file in
/etc and then reverts to _dist_try_harder() in case no
suitable files are found.
supported_dists may be given to define the set of Linux
distributions to look for. It defaults to a list of currently
supported Linux distributions identified by their release file
name.
If full_distribution_name is true (default), the full
distribution read from the OS is returned. Otherwise the short
name taken from supported_dists is used.
Returns a tuple (distname, version, id) which default to the
args given as parameters.
"""
try:
etc = os.listdir(_UNIXCONFDIR)
except OSError:
# Probably not a Unix system
return distname, version, id
etc.sort()
for file in etc:
m = _release_filename.match(file)
if m is not None:
_distname, dummy = m.groups()
if _distname in supported_dists:
distname = _distname
break
else:
return _dist_try_harder(distname, version, id)
# Read the first line
with open(os.path.join(_UNIXCONFDIR, file), 'r',
encoding='utf-8', errors='surrogateescape') as f:
firstline = f.readline()
_distname, _version, _id = _parse_release_file(firstline)
if _distname and full_distribution_name:
distname = _distname
if _version:
version = _version
if _id:
id = _id
return distname, version, id
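# Illustrative sketch (editor's addition, not part of the original module):
# on non-Linux systems the passed-in defaults come straight back.
def _example_linux_distribution():
    distname, version, distid = linux_distribution()
    # e.g. ('Fedora', '21', 'Twenty One') on Fedora; ('', '', '') elsewhere
    return distname, version, distid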
# To maintain backwards compatibility:
def dist(distname='', version='', id='',
supported_dists=_supported_dists):
""" Tries to determine the name of the Linux OS distribution name.
The function first looks for a distribution release file in
/etc and then reverts to _dist_try_harder() in case no
suitable files are found.
Returns a tuple (distname, version, id) which default to the
args given as parameters.
"""
return linux_distribution(distname, version, id,
supported_dists=supported_dists,
full_distribution_name=0)
def popen(cmd, mode='r', bufsize=-1):
""" Portable popen() interface.
"""
import warnings
warnings.warn('use os.popen instead', DeprecationWarning, stacklevel=2)
return os.popen(cmd, mode, bufsize)
def _norm_version(version, build=''):
""" Normalize the version and build strings and return a single
version string using the format major.minor.build (or patchlevel).
"""
l = version.split('.')
if build:
l.append(build)
    try:
        # map() is lazy in Python 3; force the conversion inside the try
        # block so a non-numeric component actually raises ValueError here
        ints = list(map(int, l))
    except ValueError:
        strings = l
    else:
        strings = list(map(str, ints))
version = '.'.join(strings[:3])
return version
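# Illustrative sketch (editor's addition, not part of the original module):
# numeric components are reprinted without leading zeros and at most three
# components are kept.
def _example_norm_version():
    return _norm_version('5.00.2195')  # -> '5.0.2195'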
_ver_output = re.compile(r'(?:([\w ]+) ([\w.]+) '
                         r'.*'
                         r'\[.* ([\d.]+)\])')
# Examples of VER command output:
#
# Windows 2000: Microsoft Windows 2000 [Version 5.00.2195]
# Windows XP: Microsoft Windows XP [Version 5.1.2600]
# Windows Vista: Microsoft Windows [Version 6.0.6002]
#
# Note that the "Version" string gets localized on different
# Windows versions.
def _syscmd_ver(system='', release='', version='',
supported_platforms=('win32', 'win16', 'dos')):
""" Tries to figure out the OS version used and returns
a tuple (system, release, version).
        It uses the "ver" shell command for this, which is known
        to exist on Windows and DOS. XXX Others too ?
In case this fails, the given parameters are used as
defaults.
"""
if sys.platform not in supported_platforms:
return system, release, version
# Try some common cmd strings
for cmd in ('ver', 'command /c ver', 'cmd /c ver'):
try:
pipe = popen(cmd)
info = pipe.read()
if pipe.close():
raise OSError('command failed')
# XXX How can I suppress shell errors from being written
# to stderr ?
except OSError as why:
#print 'Command %s failed: %s' % (cmd, why)
continue
else:
break
else:
return system, release, version
# Parse the output
info = info.strip()
m = _ver_output.match(info)
if m is not None:
system, release, version = m.groups()
# Strip trailing dots from version and release
if release[-1] == '.':
release = release[:-1]
if version[-1] == '.':
version = version[:-1]
# Normalize the version and build strings (eliminating additional
# zeros)
version = _norm_version(version)
return system, release, version
_WIN32_CLIENT_RELEASES = {
(5, 0): "2000",
(5, 1): "XP",
# Strictly, 5.2 client is XP 64-bit, but platform.py historically
# has always called it 2003 Server
(5, 2): "2003Server",
(5, None): "post2003",
(6, 0): "Vista",
(6, 1): "7",
(6, 2): "8",
(6, 3): "8.1",
(6, None): "post8.1",
(10, 0): "10",
(10, None): "post10",
}
# Server release name lookup will default to client names if necessary
_WIN32_SERVER_RELEASES = {
(5, 2): "2003Server",
(6, 0): "2008Server",
(6, 1): "2008ServerR2",
(6, 2): "2012Server",
(6, 3): "2012ServerR2",
(6, None): "post2012ServerR2",
}
def _get_real_winver(maj, min, build):
if maj < 6 or (maj == 6 and min < 2):
return maj, min, build
from ctypes import (c_buffer, POINTER, byref, create_unicode_buffer,
Structure, WinDLL)
from ctypes.wintypes import DWORD, HANDLE
class VS_FIXEDFILEINFO(Structure):
_fields_ = [
("dwSignature", DWORD),
("dwStrucVersion", DWORD),
("dwFileVersionMS", DWORD),
("dwFileVersionLS", DWORD),
("dwProductVersionMS", DWORD),
("dwProductVersionLS", DWORD),
("dwFileFlagsMask", DWORD),
("dwFileFlags", DWORD),
("dwFileOS", DWORD),
("dwFileType", DWORD),
("dwFileSubtype", DWORD),
("dwFileDateMS", DWORD),
("dwFileDateLS", DWORD),
]
kernel32 = WinDLL('kernel32')
version = WinDLL('version')
# We will immediately double the length up to MAX_PATH, but the
# path may be longer, so we retry until the returned string is
# shorter than our buffer.
name_len = actual_len = 130
while actual_len == name_len:
name_len *= 2
name = create_unicode_buffer(name_len)
actual_len = kernel32.GetModuleFileNameW(HANDLE(kernel32._handle),
name, len(name))
if not actual_len:
return maj, min, build
size = version.GetFileVersionInfoSizeW(name, None)
if not size:
return maj, min, build
ver_block = c_buffer(size)
if (not version.GetFileVersionInfoW(name, None, size, ver_block) or
not ver_block):
return maj, min, build
pvi = POINTER(VS_FIXEDFILEINFO)()
if not version.VerQueryValueW(ver_block, "", byref(pvi), byref(DWORD())):
return maj, min, build
maj = pvi.contents.dwProductVersionMS >> 16
min = pvi.contents.dwProductVersionMS & 0xFFFF
build = pvi.contents.dwProductVersionLS >> 16
return maj, min, build
def win32_ver(release='', version='', csd='', ptype=''):
try:
from sys import getwindowsversion
except ImportError:
return release, version, csd, ptype
try:
from winreg import OpenKeyEx, QueryValueEx, CloseKey, HKEY_LOCAL_MACHINE
except ImportError:
from _winreg import OpenKeyEx, QueryValueEx, CloseKey, HKEY_LOCAL_MACHINE
winver = getwindowsversion()
maj, min, build = _get_real_winver(*winver[:3])
version = '{0}.{1}.{2}'.format(maj, min, build)
release = (_WIN32_CLIENT_RELEASES.get((maj, min)) or
_WIN32_CLIENT_RELEASES.get((maj, None)) or
release)
    # getwindowsversion() reflects the compatibility mode Python is
# running under, and so the service pack value is only going to be
# valid if the versions match.
if winver[:2] == (maj, min):
try:
csd = 'SP{}'.format(winver.service_pack_major)
except AttributeError:
if csd[:13] == 'Service Pack ':
csd = 'SP' + csd[13:]
# VER_NT_SERVER = 3
if getattr(winver, 'product_type', None) == 3:
release = (_WIN32_SERVER_RELEASES.get((maj, min)) or
_WIN32_SERVER_RELEASES.get((maj, None)) or
release)
key = None
try:
key = OpenKeyEx(HKEY_LOCAL_MACHINE,
r'SOFTWARE\Microsoft\Windows NT\CurrentVersion')
ptype = QueryValueEx(key, 'CurrentType')[0]
except:
pass
finally:
if key:
CloseKey(key)
return release, version, csd, ptype
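# Illustrative sketch (editor's addition, not part of the original module):
# the concrete values in the comment only show the shape of the result.
def _example_win32_ver():
    release, version, csd, ptype = win32_ver()
    # e.g. ('10', '10.0.19041', 'SP0', 'Multiprocessor Free') on Windows 10;
    # the passed-in defaults come back on other platforms.
    return release, version, csd, ptype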
def _mac_ver_xml():
fn = '/System/Library/CoreServices/SystemVersion.plist'
if not os.path.exists(fn):
return None
try:
import plistlib
except ImportError:
return None
with open(fn, 'rb') as f:
pl = plistlib.load(f)
release = pl['ProductVersion']
versioninfo = ('', '', '')
machine = os.uname().machine
if machine in ('ppc', 'Power Macintosh'):
# Canonical name
machine = 'PowerPC'
return release, versioninfo, machine
def mac_ver(release='', versioninfo=('', '', ''), machine=''):
""" Get MacOS version information and return it as tuple (release,
versioninfo, machine) with versioninfo being a tuple (version,
dev_stage, non_release_version).
Entries which cannot be determined are set to the parameter values
which default to ''. All tuple entries are strings.
"""
# First try reading the information from an XML file which should
# always be present
info = _mac_ver_xml()
if info is not None:
return info
# If that also doesn't work return the default values
return release, versioninfo, machine
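# Illustrative sketch (editor's addition, not part of the original module):
# the example values are only indicative.
def _example_mac_ver():
    release, versioninfo, machine = mac_ver()
    # e.g. ('10.11.6', ('', '', ''), 'x86_64') on OS X; defaults elsewhere
    return release, versioninfo, machine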
def _java_getprop(name, default):
from java.lang import System
try:
value = System.getProperty(name)
if value is None:
return default
return value
except AttributeError:
return default
def java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')):
""" Version interface for Jython.
Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being
a tuple (vm_name, vm_release, vm_vendor) and osinfo being a
tuple (os_name, os_version, os_arch).
Values which cannot be determined are set to the defaults
given as parameters (which all default to '').
"""
# Import the needed APIs
try:
import java.lang
except ImportError:
return release, vendor, vminfo, osinfo
vendor = _java_getprop('java.vendor', vendor)
release = _java_getprop('java.version', release)
vm_name, vm_release, vm_vendor = vminfo
vm_name = _java_getprop('java.vm.name', vm_name)
vm_vendor = _java_getprop('java.vm.vendor', vm_vendor)
vm_release = _java_getprop('java.vm.version', vm_release)
vminfo = vm_name, vm_release, vm_vendor
os_name, os_version, os_arch = osinfo
os_arch = _java_getprop('java.os.arch', os_arch)
os_name = _java_getprop('java.os.name', os_name)
os_version = _java_getprop('java.os.version', os_version)
osinfo = os_name, os_version, os_arch
return release, vendor, vminfo, osinfo
### System name aliasing
def system_alias(system, release, version):
""" Returns (system, release, version) aliased to common
marketing names used for some systems.
It also does some reordering of the information in some cases
where it would otherwise cause confusion.
"""
if system == 'Rhapsody':
# Apple's BSD derivative
# XXX How can we determine the marketing release number ?
return 'MacOS X Server', system+release, version
elif system == 'SunOS':
# Sun's OS
if release < '5':
# These releases use the old name SunOS
return system, release, version
# Modify release (marketing release = SunOS release - 3)
l = release.split('.')
if l:
try:
major = int(l[0])
except ValueError:
pass
else:
major = major - 3
l[0] = str(major)
release = '.'.join(l)
if release < '6':
system = 'Solaris'
else:
# XXX Whatever the new SunOS marketing name is...
system = 'Solaris'
elif system == 'IRIX64':
# IRIX reports IRIX64 on platforms with 64-bit support; yet it
# is really a version and not a different platform, since 32-bit
# apps are also supported..
system = 'IRIX'
if version:
version = version + ' (64bit)'
else:
version = '64bit'
elif system in ('win32', 'win16'):
# In case one of the other tricks
system = 'Windows'
return system, release, version
### Various internal helpers
def _platform(*args):
""" Helper to format the platform string in a filename
compatible format e.g. "system-version-machine".
"""
# Format the platform string
platform = '-'.join(x.strip() for x in filter(len, args))
# Cleanup some possible filename obstacles...
platform = platform.replace(' ', '_')
platform = platform.replace('/', '-')
platform = platform.replace('\\', '-')
platform = platform.replace(':', '-')
platform = platform.replace(';', '-')
platform = platform.replace('"', '-')
platform = platform.replace('(', '-')
platform = platform.replace(')', '-')
# No need to report 'unknown' information...
platform = platform.replace('unknown', '')
# Fold '--'s and remove trailing '-'
while 1:
cleaned = platform.replace('--', '-')
if cleaned == platform:
break
platform = cleaned
while platform[-1] == '-':
platform = platform[:-1]
return platform
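# Illustrative sketch (editor's addition, not part of the original module):
# spaces and other filename-hostile characters are folded away before the
# parts are joined.
def _example_platform_helper():
    return _platform('Windows', '10', 'AMD64')  # -> 'Windows-10-AMD64'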
def _node(default=''):
""" Helper to determine the node name of this machine.
"""
try:
import socket
except ImportError:
# No sockets...
return default
try:
return socket.gethostname()
except OSError:
# Still not working...
return default
def _follow_symlinks(filepath):
""" In case filepath is a symlink, follow it until a
real file is reached.
"""
filepath = os.path.abspath(filepath)
while os.path.islink(filepath):
filepath = os.path.normpath(
os.path.join(os.path.dirname(filepath), os.readlink(filepath)))
return filepath
def _syscmd_uname(option, default=''):
""" Interface to the system's uname command.
"""
if sys.platform in ('dos', 'win32', 'win16'):
# XXX Others too ?
return default
try:
f = os.popen('uname %s 2> %s' % (option, DEV_NULL))
except (AttributeError, OSError):
return default
output = f.read().strip()
rc = f.close()
if not output or rc:
return default
else:
return output
def _syscmd_file(target, default=''):
""" Interface to the system's file command.
The function uses the -b option of the file command to have it
        omit the filename in its output. Symlinks are followed. It returns
        the default in case the command fails.
"""
if sys.platform in ('dos', 'win32', 'win16'):
# XXX Others too ?
return default
target = _follow_symlinks(target)
try:
proc = subprocess.Popen(['file', target],
stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
except (AttributeError, OSError):
return default
output = proc.communicate()[0].decode('latin-1')
rc = proc.wait()
if not output or rc:
return default
else:
return output
### Information about the used architecture
# Default values for architecture; non-empty strings override the
# defaults given as parameters
_default_architecture = {
'win32': ('', 'WindowsPE'),
'win16': ('', 'Windows'),
'dos': ('', 'MSDOS'),
}
def architecture(executable=sys.executable, bits='', linkage=''):
""" Queries the given executable (defaults to the Python interpreter
binary) for various architecture information.
Returns a tuple (bits, linkage) which contains information about
the bit architecture and the linkage format used for the
executable. Both values are returned as strings.
Values that cannot be determined are returned as given by the
parameter presets. If bits is given as '', the sizeof(pointer)
(or sizeof(long) on Python version < 1.5.2) is used as
indicator for the supported pointer size.
The function relies on the system's "file" command to do the
actual work. This is available on most if not all Unix
platforms. On some non-Unix platforms where the "file" command
does not exist and the executable is set to the Python interpreter
binary defaults from _default_architecture are used.
"""
# Use the sizeof(pointer) as default number of bits if nothing
# else is given as default.
if not bits:
import struct
try:
size = struct.calcsize('P')
except struct.error:
# Older installations can only query longs
size = struct.calcsize('l')
bits = str(size*8) + 'bit'
# Get data from the 'file' system command
if executable:
fileout = _syscmd_file(executable, '')
else:
fileout = ''
if not fileout and \
executable == sys.executable:
# "file" command did not return anything; we'll try to provide
# some sensible defaults then...
if sys.platform in _default_architecture:
b, l = _default_architecture[sys.platform]
if b:
bits = b
if l:
linkage = l
return bits, linkage
if 'executable' not in fileout:
# Format not supported
return bits, linkage
# Bits
if '32-bit' in fileout:
bits = '32bit'
elif 'N32' in fileout:
# On Irix only
bits = 'n32bit'
elif '64-bit' in fileout:
bits = '64bit'
# Linkage
if 'ELF' in fileout:
linkage = 'ELF'
elif 'PE' in fileout:
# E.g. Windows uses this format
if 'Windows' in fileout:
linkage = 'WindowsPE'
else:
linkage = 'PE'
elif 'COFF' in fileout:
linkage = 'COFF'
elif 'MS-DOS' in fileout:
linkage = 'MSDOS'
else:
# XXX the A.OUT format also falls under this class...
pass
return bits, linkage
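# Illustrative sketch (editor's addition, not part of the original module):
# typical shapes of the result, not guaranteed values.
def _example_architecture():
    bits, linkage = architecture(sys.executable)
    # e.g. ('64bit', 'ELF') on Linux or ('64bit', 'WindowsPE') on Windows
    return bits, linkage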
### Portable uname() interface
uname_result = collections.namedtuple("uname_result",
"system node release version machine processor")
_uname_cache = None
def uname():
""" Fairly portable uname interface. Returns a tuple
of strings (system, node, release, version, machine, processor)
identifying the underlying platform.
Note that unlike the os.uname function this also returns
possible processor information as an additional tuple entry.
Entries which cannot be determined are set to ''.
"""
global _uname_cache
no_os_uname = 0
if _uname_cache is not None:
return _uname_cache
processor = ''
# Get some infos from the builtin os.uname API...
try:
system, node, release, version, machine = os.uname()
except AttributeError:
no_os_uname = 1
if no_os_uname or not list(filter(None, (system, node, release, version, machine))):
        # Hmm, either there is no uname or uname has returned only
        # 'unknowns'... we'll have to poke around the system then.
if no_os_uname:
system = sys.platform
release = ''
version = ''
node = _node()
machine = ''
use_syscmd_ver = 1
# Try win32_ver() on win32 platforms
if system == 'win32':
release, version, csd, ptype = win32_ver()
if release and version:
use_syscmd_ver = 0
# Try to use the PROCESSOR_* environment variables
# available on Win XP and later; see
# http://support.microsoft.com/kb/888731 and
# http://www.geocities.com/rick_lively/MANUALS/ENV/MSWIN/PROCESSI.HTM
if not machine:
# WOW64 processes mask the native architecture
if "PROCESSOR_ARCHITEW6432" in os.environ:
machine = os.environ.get("PROCESSOR_ARCHITEW6432", '')
else:
machine = os.environ.get('PROCESSOR_ARCHITECTURE', '')
if not processor:
processor = os.environ.get('PROCESSOR_IDENTIFIER', machine)
# Try the 'ver' system command available on some
# platforms
if use_syscmd_ver:
system, release, version = _syscmd_ver(system)
# Normalize system to what win32_ver() normally returns
# (_syscmd_ver() tends to return the vendor name as well)
if system == 'Microsoft Windows':
system = 'Windows'
elif system == 'Microsoft' and release == 'Windows':
# Under Windows Vista and Windows Server 2008,
# Microsoft changed the output of the ver command. The
# release is no longer printed. This causes the
# system and release to be misidentified.
system = 'Windows'
if '6.0' == version[:3]:
release = 'Vista'
else:
release = ''
# In case we still don't know anything useful, we'll try to
# help ourselves
if system in ('win32', 'win16'):
if not version:
if system == 'win32':
version = '32bit'
else:
version = '16bit'
system = 'Windows'
elif system[:4] == 'java':
release, vendor, vminfo, osinfo = java_ver()
system = 'Java'
version = ', '.join(vminfo)
if not version:
version = vendor
# System specific extensions
if system == 'OpenVMS':
# OpenVMS seems to have release and version mixed up
if not release or release == '0':
release = version
version = ''
# Get processor information
try:
import vms_lib
except ImportError:
pass
else:
csid, cpu_number = vms_lib.getsyi('SYI$_CPU', 0)
if (cpu_number >= 128):
processor = 'Alpha'
else:
processor = 'VAX'
if not processor:
# Get processor information from the uname system command
processor = _syscmd_uname('-p', '')
    # If any unknowns still exist, replace them with ''s, which are more portable
if system == 'unknown':
system = ''
if node == 'unknown':
node = ''
if release == 'unknown':
release = ''
if version == 'unknown':
version = ''
if machine == 'unknown':
machine = ''
if processor == 'unknown':
processor = ''
# normalize name
if system == 'Microsoft' and release == 'Windows':
system = 'Windows'
release = 'Vista'
_uname_cache = uname_result(system, node, release, version,
machine, processor)
return _uname_cache
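# Illustrative sketch (editor's addition, not part of the original module):
# the result is a namedtuple, so both index and attribute access work, and
# repeated calls return the cached object.
def _example_uname():
    info = uname()
    return info.system, info.machine, info is uname()  # third item is True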
### Direct interfaces to some of the uname() return values
def system():
""" Returns the system/OS name, e.g. 'Linux', 'Windows' or 'Java'.
An empty string is returned if the value cannot be determined.
"""
return uname().system
def node():
""" Returns the computer's network name (which may not be fully
qualified)
An empty string is returned if the value cannot be determined.
"""
return uname().node
def release():
""" Returns the system's release, e.g. '2.2.0' or 'NT'
An empty string is returned if the value cannot be determined.
"""
return uname().release
def version():
""" Returns the system's release version, e.g. '#3 on degas'
An empty string is returned if the value cannot be determined.
"""
return uname().version
def machine():
""" Returns the machine type, e.g. 'i386'
An empty string is returned if the value cannot be determined.
"""
return uname().machine
def processor():
""" Returns the (true) processor name, e.g. 'amdk6'
An empty string is returned if the value cannot be
determined. Note that many platforms do not provide this
information or simply return the same value as for machine(),
e.g. NetBSD does this.
"""
return uname().processor
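# Illustrative sketch (editor's addition, not part of the original module):
# the accessors above are thin wrappers over the cached uname() result.
def _example_accessors():
    return '%s %s (%s)' % (system(), release(), machine())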
### Various APIs for extracting information from sys.version
_sys_version_parser = re.compile(
r'([\w.+]+)\s*'
    r'\(#?([^,]+),\s*([\w ]+),\s*([\w :]+)\)\s*'
    r'\[([^\]]+)\]?', re.ASCII)
_ironpython_sys_version_parser = re.compile(
r'([\w.+]+)\s*' # "version<space>"
r'(?: DEBUG)?\s*' # DEBUG - IronPython DEBUG builds only
r'\(([^,]+)\)\s*' # "(fileversion)<space>"
r'\[([^\]]+)\]?') # "[compiler]"
_pypy_sys_version_parser = re.compile(
r'([\w.+]+)\s*'
    r'\(#?([^,]+),\s*([\w ]+),\s*([\w :]+)\)\s*'
    r'\[PyPy [^\]]+\]?')
_sys_version_cache = {}
def _sys_version(sys_version=None):
""" Returns a parsed version of Python's sys.version as tuple
(name, version, branch, revision, buildno, builddate, compiler)
referring to the Python implementation name, version, branch,
revision, build number, build date/time as string and the compiler
identification string.
Note that unlike the Python sys.version, the returned value
for the Python version will always include the patchlevel (it
defaults to '.0').
The function returns empty strings for tuple entries that
cannot be determined.
sys_version may be given to parse an alternative version
string, e.g. if the version was read from a different Python
interpreter.
"""
# Get the Python version
if sys_version is None:
sys_version = sys.version
# Try the cache first
result = _sys_version_cache.get(sys_version, None)
if result is not None:
return result
# Parse it
if sys.implementation.name == "ironpython":
# IronPython
name = 'IronPython'
match = _ironpython_sys_version_parser.match(sys_version)
if match is None:
raise ValueError(
'failed to parse IronPython sys.version: %s' %
repr(sys_version))
version, _, _ = match.groups()
buildno = ''
builddate = ''
compiler = ''
elif sys.platform.startswith('java'):
# Jython
name = 'Jython'
match = _sys_version_parser.match(sys_version)
if match is None:
raise ValueError(
'failed to parse Jython sys.version: %s' %
repr(sys_version))
version, buildno, builddate, buildtime, _ = match.groups()
compiler = sys.platform
elif "PyPy" in sys_version:
# PyPy
name = "PyPy"
match = _pypy_sys_version_parser.match(sys_version)
if match is None:
raise ValueError("failed to parse PyPy sys.version: %s" %
repr(sys_version))
version, buildno, builddate, buildtime = match.groups()
compiler = ""
else:
# CPython
match = _sys_version_parser.match(sys_version)
if match is None:
raise ValueError(
'failed to parse CPython sys.version: %s' %
repr(sys_version))
version, buildno, builddate, buildtime, compiler = \
match.groups()
name = 'CPython'
builddate = builddate + ' ' + buildtime
if hasattr(sys, '_mercurial'):
_, branch, revision = sys._mercurial
elif hasattr(sys, 'subversion'):
# sys.subversion was added in Python 2.5
_, branch, revision = sys.subversion
else:
branch = ''
revision = ''
# Add the patchlevel version if missing
l = version.split('.')
if len(l) == 2:
l.append('0')
version = '.'.join(l)
# Build and cache the result
result = (name, version, branch, revision, buildno, builddate, compiler)
_sys_version_cache[sys_version] = result
return result
def python_implementation():
""" Returns a string identifying the Python implementation.
Currently, the following implementations are identified:
'CPython' (C implementation of Python),
'IronPython' (.NET implementation of Python),
'Jython' (Java implementation of Python),
'PyPy' (Python implementation of Python).
"""
return _sys_version()[0]
def python_version():
""" Returns the Python version as string 'major.minor.patchlevel'
Note that unlike the Python sys.version, the returned value
will always include the patchlevel (it defaults to 0).
"""
return _sys_version()[1]
def python_version_tuple():
""" Returns the Python version as tuple (major, minor, patchlevel)
of strings.
Note that unlike the Python sys.version, the returned value
will always include the patchlevel (it defaults to 0).
"""
return tuple(_sys_version()[1].split('.'))
def python_branch():
""" Returns a string identifying the Python implementation
branch.
For CPython this is the Subversion branch from which the
Python binary was built.
If not available, an empty string is returned.
"""
return _sys_version()[2]
def python_revision():
""" Returns a string identifying the Python implementation
revision.
For CPython this is the Subversion revision from which the
Python binary was built.
If not available, an empty string is returned.
"""
return _sys_version()[3]
def python_build():
""" Returns a tuple (buildno, builddate) stating the Python
build number and date as strings.
"""
return _sys_version()[4:6]
def python_compiler():
""" Returns a string identifying the compiler used for compiling
Python.
"""
return _sys_version()[6]
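# Illustrative sketch (editor's addition, not part of the original module):
# all python_*() helpers index into the cached _sys_version() tuple.
def _example_python_info():
    return python_implementation(), python_version_tuple(), python_compiler()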
### The Opus Magnum of platform strings :-)
_platform_cache = {}
def platform(aliased=0, terse=0):
""" Returns a single string identifying the underlying platform
with as much useful information as possible (but no more :).
The output is intended to be human readable rather than
machine parseable. It may look different on different
platforms and this is intended.
If "aliased" is true, the function will use aliases for
various platforms that report system names which differ from
their common names, e.g. SunOS will be reported as
Solaris. The system_alias() function is used to implement
this.
Setting terse to true causes the function to return only the
absolute minimum information needed to identify the platform.
"""
result = _platform_cache.get((aliased, terse), None)
if result is not None:
return result
# Get uname information and then apply platform specific cosmetics
# to it...
system, node, release, version, machine, processor = uname()
if machine == processor:
processor = ''
if aliased:
system, release, version = system_alias(system, release, version)
if system == 'Windows':
# MS platforms
rel, vers, csd, ptype = win32_ver(version)
if terse:
platform = _platform(system, release)
else:
platform = _platform(system, release, version, csd)
elif system in ('Linux',):
# Linux based systems
distname, distversion, distid = dist('')
if distname and not terse:
platform = _platform(system, release, machine, processor,
'with',
distname, distversion, distid)
else:
# If the distribution name is unknown check for libc vs. glibc
libcname, libcversion = libc_ver(sys.executable)
platform = _platform(system, release, machine, processor,
'with',
libcname+libcversion)
elif system == 'Java':
# Java platforms
r, v, vminfo, (os_name, os_version, os_arch) = java_ver()
if terse or not os_name:
platform = _platform(system, release, version)
else:
platform = _platform(system, release, version,
'on',
os_name, os_version, os_arch)
elif system == 'MacOS':
# MacOS platforms
if terse:
platform = _platform(system, release)
else:
platform = _platform(system, release, machine)
else:
# Generic handler
if terse:
platform = _platform(system, release)
else:
bits, linkage = architecture(sys.executable)
platform = _platform(system, release, machine,
processor, bits, linkage)
_platform_cache[(aliased, terse)] = platform
return platform
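# Illustrative sketch (editor's addition, not part of the original module):
# results are cached per (aliased, terse) combination.
def _example_platform():
    return platform(), platform(aliased=1, terse=1)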
### Command line interface
if __name__ == '__main__':
# Default is to print the aliased verbose platform string
terse = ('terse' in sys.argv or '--terse' in sys.argv)
    aliased = ('nonaliased' not in sys.argv and '--nonaliased' not in sys.argv)
print(platform(aliased, terse))
sys.exit(0)
r"""plistlib.py -- a tool to generate and parse MacOSX .plist files.
The property list (.plist) file format is a simple XML pickle supporting
basic object types, like dictionaries, lists, numbers and strings.
Usually the top level object is a dictionary.
To write out a plist file, use the dump(value, file)
function. 'value' is the top level object, 'file' is
a (writable) file object.
To parse a plist from a file, use the load(file) function,
with a (readable) file object as the only argument. It
returns the top level object (again, usually a dictionary).
To work with plist data in bytes objects, you can use loads()
and dumps().
Values can be strings, integers, floats, booleans, tuples, lists,
dictionaries (but only with string keys), Data, bytes, bytearray, or
datetime.datetime objects.
Generate Plist example:
pl = dict(
aString = "Doodah",
aList = ["A", "B", 12, 32.1, [1, 2, 3]],
aFloat = 0.1,
anInt = 728,
aDict = dict(
anotherString = "<hello & hi there!>",
aUnicodeValue = "M\xe4ssig, Ma\xdf",
aTrueValue = True,
aFalseValue = False,
),
someData = b"<binary gunk>",
someMoreData = b"<lots of binary gunk>" * 10,
aDate = datetime.datetime.fromtimestamp(time.mktime(time.gmtime())),
)
with open(fileName, 'wb') as fp:
dump(pl, fp)
Parse Plist example:
with open(fileName, 'rb') as fp:
pl = load(fp)
print(pl["aKey"])
"""
__all__ = [
"readPlist", "writePlist", "readPlistFromBytes", "writePlistToBytes",
"Plist", "Data", "Dict", "FMT_XML", "FMT_BINARY",
"load", "dump", "loads", "dumps"
]
import binascii
import codecs
import contextlib
import datetime
import enum
from io import BytesIO
import itertools
import os
import re
import struct
from warnings import warn
from xml.parsers.expat import ParserCreate
PlistFormat = enum.Enum('PlistFormat', 'FMT_XML FMT_BINARY', module=__name__)
globals().update(PlistFormat.__members__)
#
#
# Deprecated functionality
#
#
class _InternalDict(dict):
# This class is needed while Dict is scheduled for deprecation:
# we only need to warn when a *user* instantiates Dict or when
# the "attribute notation for dict keys" is used.
__slots__ = ()
def __getattr__(self, attr):
try:
value = self[attr]
except KeyError:
raise AttributeError(attr)
warn("Attribute access from plist dicts is deprecated, use d[key] "
"notation instead", DeprecationWarning, 2)
return value
def __setattr__(self, attr, value):
warn("Attribute access from plist dicts is deprecated, use d[key] "
"notation instead", DeprecationWarning, 2)
self[attr] = value
def __delattr__(self, attr):
try:
del self[attr]
except KeyError:
raise AttributeError(attr)
warn("Attribute access from plist dicts is deprecated, use d[key] "
"notation instead", DeprecationWarning, 2)
class Dict(_InternalDict):
def __init__(self, **kwargs):
warn("The plistlib.Dict class is deprecated, use builtin dict instead",
DeprecationWarning, 2)
super().__init__(**kwargs)
@contextlib.contextmanager
def _maybe_open(pathOrFile, mode):
if isinstance(pathOrFile, str):
with open(pathOrFile, mode) as fp:
yield fp
else:
yield pathOrFile
class Plist(_InternalDict):
"""This class has been deprecated. Use dump() and load()
functions instead, together with regular dict objects.
"""
def __init__(self, **kwargs):
warn("The Plist class is deprecated, use the load() and "
"dump() functions instead", DeprecationWarning, 2)
super().__init__(**kwargs)
@classmethod
def fromFile(cls, pathOrFile):
"""Deprecated. Use the load() function instead."""
with _maybe_open(pathOrFile, 'rb') as fp:
value = load(fp)
plist = cls()
plist.update(value)
return plist
def write(self, pathOrFile):
"""Deprecated. Use the dump() function instead."""
with _maybe_open(pathOrFile, 'wb') as fp:
dump(self, fp)
def readPlist(pathOrFile):
"""
Read a .plist from a path or file. pathOrFile should either
be a file name, or a readable binary file object.
This function is deprecated, use load instead.
"""
warn("The readPlist function is deprecated, use load() instead",
DeprecationWarning, 2)
with _maybe_open(pathOrFile, 'rb') as fp:
return load(fp, fmt=None, use_builtin_types=False,
dict_type=_InternalDict)
def writePlist(value, pathOrFile):
"""
Write 'value' to a .plist file. 'pathOrFile' may either be a
file name or a (writable) file object.
This function is deprecated, use dump instead.
"""
warn("The writePlist function is deprecated, use dump() instead",
DeprecationWarning, 2)
with _maybe_open(pathOrFile, 'wb') as fp:
dump(value, fp, fmt=FMT_XML, sort_keys=True, skipkeys=False)
def readPlistFromBytes(data):
"""
Read a plist data from a bytes object. Return the root object.
This function is deprecated, use loads instead.
"""
warn("The readPlistFromBytes function is deprecated, use loads() instead",
DeprecationWarning, 2)
return load(BytesIO(data), fmt=None, use_builtin_types=False,
dict_type=_InternalDict)
def writePlistToBytes(value):
"""
Return 'value' as a plist-formatted bytes object.
This function is deprecated, use dumps instead.
"""
warn("The writePlistToBytes function is deprecated, use dumps() instead",
DeprecationWarning, 2)
f = BytesIO()
dump(value, f, fmt=FMT_XML, sort_keys=True, skipkeys=False)
return f.getvalue()
class Data:
"""
Wrapper for binary data.
This class is deprecated, use a bytes object instead.
"""
def __init__(self, data):
if not isinstance(data, bytes):
raise TypeError("data must be as bytes")
self.data = data
@classmethod
def fromBase64(cls, data):
# base64.decodebytes just calls binascii.a2b_base64;
# it seems overkill to use both base64 and binascii.
return cls(_decode_base64(data))
def asBase64(self, maxlinelength=76):
return _encode_base64(self.data, maxlinelength)
def __eq__(self, other):
if isinstance(other, self.__class__):
return self.data == other.data
elif isinstance(other, str):
return self.data == other
else:
return id(self) == id(other)
def __repr__(self):
return "%s(%s)" % (self.__class__.__name__, repr(self.data))
#
#
# End of deprecated functionality
#
#
#
# XML support
#
# XML 'header'
PLISTHEADER = b"""\
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
"""
# Regex to find any control chars, except for \t \n and \r
_controlCharPat = re.compile(
r"[\x00\x01\x02\x03\x04\x05\x06\x07\x08\x0b\x0c\x0e\x0f"
r"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f]")
def _encode_base64(s, maxlinelength=76):
# copied from base64.encodebytes(), with added maxlinelength argument
maxbinsize = (maxlinelength//4)*3
pieces = []
for i in range(0, len(s), maxbinsize):
chunk = s[i : i + maxbinsize]
pieces.append(binascii.b2a_base64(chunk))
return b''.join(pieces)
def _decode_base64(s):
if isinstance(s, str):
return binascii.a2b_base64(s.encode("utf-8"))
else:
return binascii.a2b_base64(s)
# Contents should conform to a subset of ISO 8601
# (in particular, YYYY '-' MM '-' DD 'T' HH ':' MM ':' SS 'Z'. Smaller units
# may be omitted with a loss of precision)
_dateParser = re.compile(r"(?P<year>\d\d\d\d)(?:-(?P<month>\d\d)(?:-(?P<day>\d\d)(?:T(?P<hour>\d\d)(?::(?P<minute>\d\d)(?::(?P<second>\d\d))?)?)?)?)?Z", re.ASCII)
def _date_from_string(s):
order = ('year', 'month', 'day', 'hour', 'minute', 'second')
gd = _dateParser.match(s).groupdict()
lst = []
for key in order:
val = gd[key]
if val is None:
break
lst.append(int(val))
return datetime.datetime(*lst)
def _date_to_string(d):
return '%04d-%02d-%02dT%02d:%02d:%02dZ' % (
d.year, d.month, d.day,
d.hour, d.minute, d.second
)
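# Illustrative sketch (editor's addition, not part of the original module):
# the two date helpers round-trip the plist subset of ISO 8601.
def _example_plist_dates():
    d = _date_from_string('2004-10-26T10:33:33Z')
    return _date_to_string(d)  # -> '2004-10-26T10:33:33Z'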
def _escape(text):
m = _controlCharPat.search(text)
if m is not None:
raise ValueError("strings can't contains control characters; "
"use bytes instead")
text = text.replace("\r\n", "\n") # convert DOS line endings
text = text.replace("\r", "\n") # convert Mac line endings
text = text.replace("&", "&") # escape '&'
text = text.replace("<", "<") # escape '<'
text = text.replace(">", ">") # escape '>'
return text
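# Illustrative sketch (editor's addition, not part of the original module):
# line endings are normalized and XML-special characters entity-escaped.
def _example_escape():
    return _escape('a & b\r\n<c>')  # -> 'a &amp; b\n&lt;c&gt;'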
class _PlistParser:
def __init__(self, use_builtin_types, dict_type):
self.stack = []
self.current_key = None
self.root = None
self._use_builtin_types = use_builtin_types
self._dict_type = dict_type
def parse(self, fileobj):
self.parser = ParserCreate()
self.parser.StartElementHandler = self.handle_begin_element
self.parser.EndElementHandler = self.handle_end_element
self.parser.CharacterDataHandler = self.handle_data
self.parser.ParseFile(fileobj)
return self.root
def handle_begin_element(self, element, attrs):
self.data = []
handler = getattr(self, "begin_" + element, None)
if handler is not None:
handler(attrs)
def handle_end_element(self, element):
handler = getattr(self, "end_" + element, None)
if handler is not None:
handler()
def handle_data(self, data):
self.data.append(data)
def add_object(self, value):
if self.current_key is not None:
if not isinstance(self.stack[-1], type({})):
raise ValueError("unexpected element at line %d" %
self.parser.CurrentLineNumber)
self.stack[-1][self.current_key] = value
self.current_key = None
elif not self.stack:
# this is the root object
self.root = value
else:
if not isinstance(self.stack[-1], type([])):
raise ValueError("unexpected element at line %d" %
self.parser.CurrentLineNumber)
self.stack[-1].append(value)
def get_data(self):
data = ''.join(self.data)
self.data = []
return data
# element handlers
def begin_dict(self, attrs):
d = self._dict_type()
self.add_object(d)
self.stack.append(d)
def end_dict(self):
if self.current_key:
raise ValueError("missing value for key '%s' at line %d" %
(self.current_key,self.parser.CurrentLineNumber))
self.stack.pop()
def end_key(self):
if self.current_key or not isinstance(self.stack[-1], type({})):
raise ValueError("unexpected key at line %d" %
self.parser.CurrentLineNumber)
self.current_key = self.get_data()
def begin_array(self, attrs):
a = []
self.add_object(a)
self.stack.append(a)
def end_array(self):
self.stack.pop()
def end_true(self):
self.add_object(True)
def end_false(self):
self.add_object(False)
def end_integer(self):
self.add_object(int(self.get_data()))
def end_real(self):
self.add_object(float(self.get_data()))
def end_string(self):
self.add_object(self.get_data())
def end_data(self):
if self._use_builtin_types:
self.add_object(_decode_base64(self.get_data()))
else:
self.add_object(Data.fromBase64(self.get_data()))
def end_date(self):
self.add_object(_date_from_string(self.get_data()))
class _DumbXMLWriter:
def __init__(self, file, indent_level=0, indent="\t"):
self.file = file
self.stack = []
self._indent_level = indent_level
self.indent = indent
def begin_element(self, element):
self.stack.append(element)
self.writeln("<%s>" % element)
self._indent_level += 1
def end_element(self, element):
assert self._indent_level > 0
assert self.stack.pop() == element
self._indent_level -= 1
self.writeln("</%s>" % element)
def simple_element(self, element, value=None):
if value is not None:
value = _escape(value)
self.writeln("<%s>%s</%s>" % (element, value, element))
else:
self.writeln("<%s/>" % element)
def writeln(self, line):
if line:
# plist has fixed encoding of utf-8
# XXX: is this test needed?
if isinstance(line, str):
line = line.encode('utf-8')
self.file.write(self._indent_level * self.indent)
self.file.write(line)
self.file.write(b'\n')
class _PlistWriter(_DumbXMLWriter):
def __init__(
self, file, indent_level=0, indent=b"\t", writeHeader=1,
sort_keys=True, skipkeys=False):
if writeHeader:
file.write(PLISTHEADER)
_DumbXMLWriter.__init__(self, file, indent_level, indent)
self._sort_keys = sort_keys
self._skipkeys = skipkeys
def write(self, value):
self.writeln("<plist version=\"1.0\">")
self.write_value(value)
self.writeln("</plist>")
def write_value(self, value):
if isinstance(value, str):
self.simple_element("string", value)
elif value is True:
self.simple_element("true")
elif value is False:
self.simple_element("false")
elif isinstance(value, int):
if -1 << 63 <= value < 1 << 64:
self.simple_element("integer", "%d" % value)
else:
raise OverflowError(value)
elif isinstance(value, float):
self.simple_element("real", repr(value))
elif isinstance(value, dict):
self.write_dict(value)
elif isinstance(value, Data):
self.write_data(value)
elif isinstance(value, (bytes, bytearray)):
self.write_bytes(value)
elif isinstance(value, datetime.datetime):
self.simple_element("date", _date_to_string(value))
elif isinstance(value, (tuple, list)):
self.write_array(value)
else:
raise TypeError("unsupported type: %s" % type(value))
def write_data(self, data):
self.write_bytes(data.data)
def write_bytes(self, data):
self.begin_element("data")
self._indent_level -= 1
maxlinelength = max(
16,
76 - len(self.indent.replace(b"\t", b" " * 8) * self._indent_level))
for line in _encode_base64(data, maxlinelength).split(b"\n"):
if line:
self.writeln(line)
self._indent_level += 1
self.end_element("data")
def write_dict(self, d):
if d:
self.begin_element("dict")
if self._sort_keys:
items = sorted(d.items())
else:
items = d.items()
for key, value in items:
if not isinstance(key, str):
if self._skipkeys:
continue
raise TypeError("keys must be strings")
self.simple_element("key", key)
self.write_value(value)
self.end_element("dict")
else:
self.simple_element("dict")
def write_array(self, array):
if array:
self.begin_element("array")
for value in array:
self.write_value(value)
self.end_element("array")
else:
self.simple_element("array")
def _is_fmt_xml(header):
prefixes = (b'<?xml', b'<plist')
for pfx in prefixes:
if header.startswith(pfx):
return True
# Also check for alternative XML encodings, this is slightly
# overkill because the Apple tools (and plistlib) will not
# generate files with these encodings.
for bom, encoding in (
(codecs.BOM_UTF8, "utf-8"),
(codecs.BOM_UTF16_BE, "utf-16-be"),
(codecs.BOM_UTF16_LE, "utf-16-le"),
# expat does not support utf-32
#(codecs.BOM_UTF32_BE, "utf-32-be"),
#(codecs.BOM_UTF32_LE, "utf-32-le"),
):
if not header.startswith(bom):
continue
for start in prefixes:
prefix = bom + start.decode('ascii').encode(encoding)
if header[:len(prefix)] == prefix:
return True
return False
#
# Binary Plist
#
class InvalidFileException (ValueError):
def __init__(self, message="Invalid file"):
ValueError.__init__(self, message)
_BINARY_FORMAT = {1: 'B', 2: 'H', 4: 'L', 8: 'Q'}
_undefined = object()
class _BinaryPlistParser:
"""
Read a binary plist file, following the description of the binary
format. Raise InvalidFileException in case of error, otherwise return the
root object.
see also: http://opensource.apple.com/source/CF/CF-744.18/CFBinaryPList.c
"""
def __init__(self, use_builtin_types, dict_type):
self._use_builtin_types = use_builtin_types
self._dict_type = dict_type
def parse(self, fp):
try:
# The basic file format:
# HEADER
# object...
# refid->offset...
# TRAILER
self._fp = fp
self._fp.seek(-32, os.SEEK_END)
trailer = self._fp.read(32)
if len(trailer) != 32:
raise InvalidFileException()
(
offset_size, self._ref_size, num_objects, top_object,
offset_table_offset
) = struct.unpack('>6xBBQQQ', trailer)
self._fp.seek(offset_table_offset)
self._object_offsets = self._read_ints(num_objects, offset_size)
self._objects = [_undefined] * num_objects
return self._read_object(top_object)
except (OSError, IndexError, struct.error):
raise InvalidFileException()
def _get_size(self, tokenL):
""" return the size of the next object."""
if tokenL == 0xF:
m = self._fp.read(1)[0] & 0x3
s = 1 << m
f = '>' + _BINARY_FORMAT[s]
return struct.unpack(f, self._fp.read(s))[0]
return tokenL
def _read_ints(self, n, size):
data = self._fp.read(size * n)
if size in _BINARY_FORMAT:
return struct.unpack('>' + _BINARY_FORMAT[size] * n, data)
else:
return tuple(int.from_bytes(data[i: i + size], 'big')
for i in range(0, size * n, size))
def _read_refs(self, n):
return self._read_ints(n, self._ref_size)
def _read_object(self, ref):
"""
read the object by reference.
May recursively read sub-objects (content of an array/dict/set)
"""
result = self._objects[ref]
if result is not _undefined:
return result
offset = self._object_offsets[ref]
self._fp.seek(offset)
token = self._fp.read(1)[0]
tokenH, tokenL = token & 0xF0, token & 0x0F
if token == 0x00:
result = None
elif token == 0x08:
result = False
elif token == 0x09:
result = True
# The referenced source code also mentions URL (0x0c, 0x0d) and
# UUID (0x0e), but neither can be generated using the Cocoa libraries.
elif token == 0x0f:
result = b''
elif tokenH == 0x10: # int
result = int.from_bytes(self._fp.read(1 << tokenL),
'big', signed=tokenL >= 3)
elif token == 0x22: # real
result = struct.unpack('>f', self._fp.read(4))[0]
elif token == 0x23: # real
result = struct.unpack('>d', self._fp.read(8))[0]
elif token == 0x33: # date
f = struct.unpack('>d', self._fp.read(8))[0]
# timestamp 0 of binary plists corresponds to 1/1/2001
# (year of Mac OS X 10.0), instead of 1/1/1970.
result = datetime.datetime.utcfromtimestamp(f + (31 * 365 + 8) * 86400)
elif tokenH == 0x40: # data
s = self._get_size(tokenL)
if self._use_builtin_types:
result = self._fp.read(s)
else:
result = Data(self._fp.read(s))
elif tokenH == 0x50: # ascii string
s = self._get_size(tokenL)
result = self._fp.read(s).decode('ascii')
elif tokenH == 0x60: # unicode string
s = self._get_size(tokenL)
result = self._fp.read(s * 2).decode('utf-16be')
# tokenH == 0x80 is documented as 'UID' and appears to be used for
# keyed-archiving, not in plists.
elif tokenH == 0xA0: # array
s = self._get_size(tokenL)
obj_refs = self._read_refs(s)
result = []
self._objects[ref] = result
result.extend(self._read_object(x) for x in obj_refs)
# tokenH == 0xB0 is documented as 'ordset', but is not actually
# implemented in the Apple reference code.
# tokenH == 0xC0 is documented as 'set', but sets cannot be used in
# plists.
elif tokenH == 0xD0: # dict
s = self._get_size(tokenL)
key_refs = self._read_refs(s)
obj_refs = self._read_refs(s)
result = self._dict_type()
self._objects[ref] = result
for k, o in zip(key_refs, obj_refs):
result[self._read_object(k)] = self._read_object(o)
else:
raise InvalidFileException()
self._objects[ref] = result
return result
def _count_to_size(count):
if count < 1 << 8:
return 1
elif count < 1 << 16:
return 2
elif count < 1 << 32:
return 4
else:
return 8
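# Sketch (assuming the corrected _count_to_size above): object references
# are stored in the smallest unsigned integer width that holds the count.
assert [_count_to_size(n) for n in (1, 255, 256, 65536, 1 << 32)] == [1, 1, 2, 4, 8]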
_scalars = (str, int, float, datetime.datetime, bytes)
class _BinaryPlistWriter (object):
def __init__(self, fp, sort_keys, skipkeys):
self._fp = fp
self._sort_keys = sort_keys
self._skipkeys = skipkeys
def write(self, value):
# Flattened object list:
self._objlist = []
# Mappings from object->objectid
# First dict has (type(object), object) as the key,
# second dict is used when object is not hashable and
# has id(object) as the key.
self._objtable = {}
self._objidtable = {}
# Create list of all objects in the plist
self._flatten(value)
# Size of object references in serialized containers
# depends on the number of objects in the plist.
num_objects = len(self._objlist)
self._object_offsets = [0]*num_objects
self._ref_size = _count_to_size(num_objects)
self._ref_format = _BINARY_FORMAT[self._ref_size]
# Write file header
self._fp.write(b'bplist00')
# Write object list
for obj in self._objlist:
self._write_object(obj)
# Write refnum->object offset table
top_object = self._getrefnum(value)
offset_table_offset = self._fp.tell()
offset_size = _count_to_size(offset_table_offset)
offset_format = '>' + _BINARY_FORMAT[offset_size] * num_objects
self._fp.write(struct.pack(offset_format, *self._object_offsets))
# Write trailer
sort_version = 0
trailer = (
sort_version, offset_size, self._ref_size, num_objects,
top_object, offset_table_offset
)
self._fp.write(struct.pack('>5xBBBQQQ', *trailer))
def _flatten(self, value):
# First check whether the object is already in the object table. This is
# not done for containers, to ensure that two subcontainers with the same
# contents are serialized as distinct values.
if isinstance(value, _scalars):
if (type(value), value) in self._objtable:
return
elif isinstance(value, Data):
if (type(value.data), value.data) in self._objtable:
return
elif id(value) in self._objidtable:
return
# Add to objectreference map
refnum = len(self._objlist)
self._objlist.append(value)
if isinstance(value, _scalars):
self._objtable[(type(value), value)] = refnum
elif isinstance(value, Data):
self._objtable[(type(value.data), value.data)] = refnum
else:
self._objidtable[id(value)] = refnum
# And finally recurse into containers
if isinstance(value, dict):
keys = []
values = []
items = value.items()
if self._sort_keys:
items = sorted(items)
for k, v in items:
if not isinstance(k, str):
if self._skipkeys:
continue
raise TypeError("keys must be strings")
keys.append(k)
values.append(v)
for o in itertools.chain(keys, values):
self._flatten(o)
elif isinstance(value, (list, tuple)):
for o in value:
self._flatten(o)
def _getrefnum(self, value):
if isinstance(value, _scalars):
return self._objtable[(type(value), value)]
elif isinstance(value, Data):
return self._objtable[(type(value.data), value.data)]
else:
return self._objidtable[id(value)]
def _write_size(self, token, size):
if size < 15:
self._fp.write(struct.pack('>B', token | size))
elif size < 1 << 8:
self._fp.write(struct.pack('>BBB', token | 0xF, 0x10, size))
elif size < 1 << 16:
self._fp.write(struct.pack('>BBH', token | 0xF, 0x11, size))
elif size < 1 << 32:
self._fp.write(struct.pack('>BBL', token | 0xF, 0x12, size))
else:
self._fp.write(struct.pack('>BBQ', token | 0xF, 0x13, size))
def _write_object(self, value):
ref = self._getrefnum(value)
self._object_offsets[ref] = self._fp.tell()
if value is None:
self._fp.write(b'\x00')
elif value is False:
self._fp.write(b'\x08')
elif value is True:
self._fp.write(b'\x09')
elif isinstance(value, int):
if value < 0:
try:
self._fp.write(struct.pack('>Bq', 0x13, value))
except struct.error:
raise OverflowError(value) from None
elif value < 1 << 8:
self._fp.write(struct.pack('>BB', 0x10, value))
elif value < 1 << 16:
self._fp.write(struct.pack('>BH', 0x11, value))
elif value < 1 << 32:
self._fp.write(struct.pack('>BL', 0x12, value))
elif value < 1 << 63:
self._fp.write(struct.pack('>BQ', 0x13, value))
elif value < 1 << 64:
self._fp.write(b'\x14' + value.to_bytes(16, 'big', signed=True))
else:
raise OverflowError(value)
elif isinstance(value, float):
self._fp.write(struct.pack('>Bd', 0x23, value))
elif isinstance(value, datetime.datetime):
f = (value - datetime.datetime(2001, 1, 1)).total_seconds()
self._fp.write(struct.pack('>Bd', 0x33, f))
elif isinstance(value, Data):
self._write_size(0x40, len(value.data))
self._fp.write(value.data)
elif isinstance(value, (bytes, bytearray)):
self._write_size(0x40, len(value))
self._fp.write(value)
elif isinstance(value, str):
try:
t = value.encode('ascii')
self._write_size(0x50, len(value))
except UnicodeEncodeError:
t = value.encode('utf-16be')
self._write_size(0x60, len(value))
self._fp.write(t)
elif isinstance(value, (list, tuple)):
refs = [self._getrefnum(o) for o in value]
s = len(refs)
self._write_size(0xA0, s)
self._fp.write(struct.pack('>' + self._ref_format * s, *refs))
elif isinstance(value, dict):
keyRefs, valRefs = [], []
if self._sort_keys:
rootItems = sorted(value.items())
else:
rootItems = value.items()
for k, v in rootItems:
if not isinstance(k, str):
if self._skipkeys:
continue
raise TypeError("keys must be strings")
keyRefs.append(self._getrefnum(k))
valRefs.append(self._getrefnum(v))
s = len(keyRefs)
self._write_size(0xD0, s)
self._fp.write(struct.pack('>' + self._ref_format * s, *keyRefs))
self._fp.write(struct.pack('>' + self._ref_format * s, *valRefs))
else:
raise TypeError(value)
def _is_fmt_binary(header):
return header[:8] == b'bplist00'
#
# Generic bits
#
_FORMATS={
FMT_XML: dict(
detect=_is_fmt_xml,
parser=_PlistParser,
writer=_PlistWriter,
),
FMT_BINARY: dict(
detect=_is_fmt_binary,
parser=_BinaryPlistParser,
writer=_BinaryPlistWriter,
)
}
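# --- Illustrative sketch (not part of the module) ------------------------
# Format auto-detection amounts to prefix tests on the first bytes of the
# file; sniff_plist_format is a hypothetical helper mirroring the
# _is_fmt_binary / _is_fmt_xml checks above.
def sniff_plist_format(header):
    if header.startswith(b'bplist00'):
        return 'binary'
    if header.startswith((b'<?xml', b'<plist')):
        return 'xml'
    raise ValueError('unrecognized plist header')
assert sniff_plist_format(b'bplist00\x00') == 'binary'
assert sniff_plist_format(b'<?xml version="1.0"?>') == 'xml'
# --------------------------------------------------------------------------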
def load(fp, *, fmt=None, use_builtin_types=True, dict_type=dict):
"""Read a .plist file. 'fp' should be (readable) file object.
Return the unpacked root object (which usually is a dictionary).
"""
if fmt is None:
header = fp.read(32)
fp.seek(0)
for info in _FORMATS.values():
if info['detect'](header):
P = info['parser']
break
else:
raise InvalidFileException()
else:
P = _FORMATS[fmt]['parser']
p = P(use_builtin_types=use_builtin_types, dict_type=dict_type)
return p.parse(fp)
def loads(value, *, fmt=None, use_builtin_types=True, dict_type=dict):
"""Read a .plist file from a bytes object.
Return the unpacked root object (which usually is a dictionary).
"""
fp = BytesIO(value)
return load(
fp, fmt=fmt, use_builtin_types=use_builtin_types, dict_type=dict_type)
def dump(value, fp, *, fmt=FMT_XML, sort_keys=True, skipkeys=False):
"""Write 'value' to a .plist file. 'fp' should be a (writable)
file object.
"""
if fmt not in _FORMATS:
raise ValueError("Unsupported format: %r"%(fmt,))
writer = _FORMATS[fmt]["writer"](fp, sort_keys=sort_keys, skipkeys=skipkeys)
writer.write(value)
def dumps(value, *, fmt=FMT_XML, skipkeys=False, sort_keys=True):
"""Return a bytes object with the contents for a .plist file.
"""
fp = BytesIO()
dump(value, fp, fmt=fmt, skipkeys=skipkeys, sort_keys=sort_keys)
return fp.getvalue()
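# --- Usage sketch for the public API above --------------------------------
# A minimal round-trip through both serializations; the dictionary below is
# arbitrary sample data.
import datetime
import plistlib
_sample = {
    'name': 'example',
    'count': 3,
    'enabled': True,
    'blob': b'\x00\x01\x02',
    'when': datetime.datetime(2001, 1, 1),
    'items': ['a', 'b'],
}
_xml = plistlib.dumps(_sample, fmt=plistlib.FMT_XML)
_bin = plistlib.dumps(_sample, fmt=plistlib.FMT_BINARY)
# load()/loads() auto-detect the format from the header when fmt is None
assert plistlib.loads(_xml) == _sample
assert plistlib.loads(_bin) == _sample
# ----------------------------------------------------------------------------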
"""A POP3 client class.
Based on the J. Myers POP3 draft, Jan. 96
"""
# Author: David Ascher <[email protected]>
# [heavily stealing from nntplib.py]
# Updated: Piers Lauder <[email protected]> [Jul '97]
# String method conversion and test jig improvements by ESR, February 2001.
# Added the POP3_SSL class. Methods loosely based on IMAP_SSL. Hector Urtubia <[email protected]> Aug 2003
# Example (see the test function at the end of this file)
# Imports
import errno
import re
import socket
try:
import ssl
HAVE_SSL = True
except ImportError:
HAVE_SSL = False
__all__ = ["POP3","error_proto"]
# Exception raised when an error or invalid response is received:
class error_proto(Exception): pass
# Standard Port
POP3_PORT = 110
# POP SSL PORT
POP3_SSL_PORT = 995
# Line terminators (we always output CRLF, but accept any of CRLF, LFCR, LF)
CR = b'\r'
LF = b'\n'
CRLF = CR+LF
# maximal line length when calling readline(). This is to prevent
# reading arbitrary length lines. RFC 1939 limits POP3 line length to
# 512 characters, including CRLF. We have selected 2048 just to be on
# the safe side.
_MAXLINE = 2048
class POP3:
"""This class supports both the minimal and optional command sets.
Arguments can be strings or integers (where appropriate)
(e.g.: retr(1) and retr('1') both work equally well).
Minimal Command Set:
USER name user(name)
PASS string pass_(string)
STAT stat()
LIST [msg] list(msg = None)
RETR msg retr(msg)
DELE msg dele(msg)
NOOP noop()
RSET rset()
QUIT quit()
Optional Commands (some servers support these):
RPOP name rpop(name)
APOP name digest apop(name, digest)
TOP msg n top(msg, n)
UIDL [msg] uidl(msg = None)
CAPA capa()
STLS stls()
Raises one exception: 'error_proto'.
Instantiate with:
POP3(hostname, port=110)
NB: the POP protocol locks the mailbox from user
authorization until QUIT, so be sure to get in, suck
the messages, and quit, each time you access the
mailbox.
POP is a line-based protocol, which means large mail
messages consume lots of Python cycles reading them
line-by-line.
If it's available on your mail server, use IMAP4
instead; it doesn't suffer from the two problems
above.
"""
encoding = 'UTF-8'
def __init__(self, host, port=POP3_PORT,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
self.host = host
self.port = port
self._tls_established = False
self.sock = self._create_socket(timeout)
self.file = self.sock.makefile('rb')
self._debugging = 0
self.welcome = self._getresp()
def _create_socket(self, timeout):
return socket.create_connection((self.host, self.port), timeout)
def _putline(self, line):
if self._debugging > 1: print('*put*', repr(line))
self.sock.sendall(line + CRLF)
# Internal: send one command to the server (through _putline())
def _putcmd(self, line):
if self._debugging: print('*cmd*', repr(line))
line = bytes(line, self.encoding)
self._putline(line)
# Internal: return one line from the server, stripping CRLF.
# This is where all the CPU time of this module is consumed.
# Raise error_proto('-ERR EOF') if the connection is closed.
def _getline(self):
line = self.file.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise error_proto('line too long')
if self._debugging > 1: print('*get*', repr(line))
if not line: raise error_proto('-ERR EOF')
octets = len(line)
# server can send any combination of CR & LF
# however, 'readline()' returns lines ending in LF
# so only possibilities are ...LF, ...CRLF, CR...LF
if line[-2:] == CRLF:
return line[:-2], octets
if line[:1] == CR:
return line[1:-1], octets
return line[:-1], octets
# Internal: get a response from the server.
# Raise 'error_proto' if the response doesn't start with '+'.
def _getresp(self):
resp, o = self._getline()
if self._debugging > 1: print('*resp*', repr(resp))
if not resp.startswith(b'+'):
raise error_proto(resp)
return resp
# Internal: get a response plus following text from the server.
def _getlongresp(self):
resp = self._getresp()
list = []; octets = 0
line, o = self._getline()
while line != b'.':
if line.startswith(b'..'):
o = o-1
line = line[1:]
octets = octets + o
list.append(line)
line, o = self._getline()
return resp, list, octets
# Internal: send a command and get the response
def _shortcmd(self, line):
self._putcmd(line)
return self._getresp()
# Internal: send a command and get the response plus following text
def _longcmd(self, line):
self._putcmd(line)
return self._getlongresp()
# These can be useful:
def getwelcome(self):
return self.welcome
def set_debuglevel(self, level):
self._debugging = level
# Here are all the POP commands:
def user(self, user):
"""Send user name, return response
(should indicate password required).
"""
return self._shortcmd('USER %s' % user)
def pass_(self, pswd):
"""Send password, return response
(response includes message count, mailbox size).
NB: mailbox is locked by server from here to 'quit()'
"""
return self._shortcmd('PASS %s' % pswd)
def stat(self):
"""Get mailbox status.
Result is tuple of 2 ints (message count, mailbox size)
"""
retval = self._shortcmd('STAT')
rets = retval.split()
if self._debugging: print('*stat*', repr(rets))
numMessages = int(rets[1])
sizeMessages = int(rets[2])
return (numMessages, sizeMessages)
def list(self, which=None):
"""Request listing, return result.
Result without a message number argument is in form
['response', ['mesg_num octets', ...], octets].
Result when a message number argument is given is a
single response: the "scan listing" for that message.
"""
if which is not None:
return self._shortcmd('LIST %s' % which)
return self._longcmd('LIST')
def retr(self, which):
"""Retrieve whole message number 'which'.
Result is in form ['response', ['line', ...], octets].
"""
return self._longcmd('RETR %s' % which)
def dele(self, which):
"""Delete message number 'which'.
Result is 'response'.
"""
return self._shortcmd('DELE %s' % which)
def noop(self):
"""Does nothing.
One supposes the response indicates the server is alive.
"""
return self._shortcmd('NOOP')
def rset(self):
"""Unmark all messages marked for deletion."""
return self._shortcmd('RSET')
def quit(self):
"""Signoff: commit changes on server, unlock mailbox, close connection."""
resp = self._shortcmd('QUIT')
self.close()
return resp
def close(self):
"""Close the connection without assuming anything about it."""
try:
file = self.file
self.file = None
if file is not None:
file.close()
finally:
sock = self.sock
self.sock = None
if sock is not None:
try:
sock.shutdown(socket.SHUT_RDWR)
except OSError as e:
# The server might already have closed the connection
if e.errno != errno.ENOTCONN:
raise
finally:
sock.close()
#__del__ = quit
# optional commands:
def rpop(self, user):
"""Not sure what this does."""
return self._shortcmd('RPOP %s' % user)
timestamp = re.compile(br'\+OK.[^<]*(<.*>)')
def apop(self, user, password):
"""Authorisation
- only possible if server has supplied a timestamp in initial greeting.
Args:
user - mailbox user;
password - mailbox password.
NB: mailbox is locked by server from here to 'quit()'
"""
secret = bytes(password, self.encoding)
m = self.timestamp.match(self.welcome)
if not m:
raise error_proto('-ERR APOP not supported by server')
import hashlib
digest = m.group(1)+secret
digest = hashlib.md5(digest).hexdigest()
return self._shortcmd('APOP %s %s' % (user, digest))
def top(self, which, howmuch):
"""Retrieve message header of message number 'which'
and first 'howmuch' lines of message body.
Result is in form ['response', ['line', ...], octets].
"""
return self._longcmd('TOP %s %s' % (which, howmuch))
def uidl(self, which=None):
"""Return message digest (unique id) list.
If 'which', result contains unique id for that message
in the form 'response mesgnum uid', otherwise result is
the list ['response', ['mesgnum uid', ...], octets]
"""
if which is not None:
return self._shortcmd('UIDL %s' % which)
return self._longcmd('UIDL')
def capa(self):
"""Return server capabilities (RFC 2449) as a dictionary
>>> c=poplib.POP3('localhost')
>>> c.capa()
{'IMPLEMENTATION': ['Cyrus', 'POP3', 'server', 'v2.2.12'],
'TOP': [], 'LOGIN-DELAY': ['0'], 'AUTH-RESP-CODE': [],
'EXPIRE': ['NEVER'], 'USER': [], 'STLS': [], 'PIPELINING': [],
'UIDL': [], 'RESP-CODES': []}
>>>
Really, according to RFC 2449, the cyrus folks should avoid
having the implementation split into multiple arguments...
"""
def _parsecap(line):
lst = line.decode('ascii').split()
return lst[0], lst[1:]
caps = {}
try:
resp = self._longcmd('CAPA')
rawcaps = resp[1]
for capline in rawcaps:
capnm, capargs = _parsecap(capline)
caps[capnm] = capargs
except error_proto as _err:
raise error_proto('-ERR CAPA not supported by server')
return caps
def stls(self, context=None):
"""Start a TLS session on the active connection as specified in RFC 2595.
context - a ssl.SSLContext
"""
if not HAVE_SSL:
raise error_proto('-ERR TLS support missing')
if self._tls_established:
raise error_proto('-ERR TLS session already established')
caps = self.capa()
if not 'STLS' in caps:
raise error_proto('-ERR STLS not supported by server')
if context is None:
context = ssl._create_stdlib_context()
resp = self._shortcmd('STLS')
self.sock = context.wrap_socket(self.sock,
server_hostname=self.host)
self.file = self.sock.makefile('rb')
self._tls_established = True
return resp
if HAVE_SSL:
class POP3_SSL(POP3):
"""POP3 client class over SSL connection
Instantiate with: POP3_SSL(hostname, port=995, keyfile=None, certfile=None,
context=None)
hostname - the hostname of the pop3 over ssl server
port - port number
keyfile - PEM formatted file that contains your private key
certfile - PEM formatted certificate chain file
context - a ssl.SSLContext
See the methods of the parent class POP3 for more documentation.
"""
def __init__(self, host, port=POP3_SSL_PORT, keyfile=None, certfile=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT, context=None):
if context is not None and keyfile is not None:
raise ValueError("context and keyfile arguments are mutually "
"exclusive")
if context is not None and certfile is not None:
raise ValueError("context and certfile arguments are mutually "
"exclusive")
self.keyfile = keyfile
self.certfile = certfile
if context is None:
context = ssl._create_stdlib_context(certfile=certfile,
keyfile=keyfile)
self.context = context
POP3.__init__(self, host, port, timeout)
def _create_socket(self, timeout):
sock = POP3._create_socket(self, timeout)
sock = self.context.wrap_socket(sock,
server_hostname=self.host)
return sock
def stls(self, keyfile=None, certfile=None, context=None):
"""The method unconditionally raises an exception since the
STLS command doesn't make any sense on an already established
SSL/TLS session.
"""
raise error_proto('-ERR TLS session already established')
__all__.append("POP3_SSL")
if __name__ == "__main__":
import sys
a = POP3(sys.argv[1])
print(a.getwelcome())
a.user(sys.argv[2])
a.pass_(sys.argv[3])
a.list()
(numMsgs, totalSize) = a.stat()
for i in range(1, numMsgs + 1):
(header, msg, octets) = a.retr(i)
print("Message %d:" % i)
for line in msg:
print(' ' + line)
print('-----------------------')
a.quit()
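# --- Usage sketch for the POP3 class above ---------------------------------
# Hostname and credentials are placeholders, not a real account.
import poplib
conn = poplib.POP3('pop.example.com')      # hypothetical server
conn.user('alice')                         # hypothetical mailbox user
conn.pass_('secret')                       # mailbox is locked from here on
num_msgs, total_size = conn.stat()
for i in range(1, num_msgs + 1):
    resp, lines, octets = conn.retr(i)
    raw = b'\r\n'.join(lines)              # reassemble the message bytes
    # ... hand 'raw' to the email package, dele(i) if desired ...
conn.quit()                                # commit changes and unlock
# ------------------------------------------------------------------------------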
"""Common operations on Posix pathnames.
Instead of importing this module directly, import os and refer to
this module as os.path. The "os.path" name is an alias for this
module on Posix systems; on other systems (e.g. Mac, Windows),
os.path provides the same operations in a manner specific to that
platform, and is an alias to another module (e.g. macpath, ntpath).
Some of this can actually be useful on non-Posix systems too, e.g.
for manipulation of the pathname component of URLs.
"""
import os
import sys
import stat
import genericpath
from genericpath import *
__all__ = ["normcase","isabs","join","splitdrive","split","splitext",
"basename","dirname","commonprefix","getsize","getmtime",
"getatime","getctime","islink","exists","lexists","isdir","isfile",
"ismount", "expanduser","expandvars","normpath","abspath",
"samefile","sameopenfile","samestat",
"curdir","pardir","sep","pathsep","defpath","altsep","extsep",
"devnull","realpath","supports_unicode_filenames","relpath"]
# Strings representing various path-related bits and pieces.
# These are primarily for export; internally, they are hardcoded.
curdir = '.'
pardir = '..'
extsep = '.'
sep = '/'
pathsep = ':'
defpath = ':/bin:/usr/bin'
altsep = None
devnull = '/dev/null'
def _get_sep(path):
if isinstance(path, bytes):
return b'/'
else:
return '/'
# Normalize the case of a pathname. Trivial in Posix, string.lower on Mac.
# On MS-DOS this may also turn slashes into backslashes; however, other
# normalizations (such as optimizing '../' away) are not allowed
# (another function should be defined to do that).
def normcase(s):
"""Normalize case of pathname. Has no effect under Posix"""
if not isinstance(s, (bytes, str)):
raise TypeError("normcase() argument must be str or bytes, "
"not '{}'".format(s.__class__.__name__))
return s
# Return whether a path is absolute.
# Trivial in Posix, harder on the Mac or MS-DOS.
def isabs(s):
"""Test whether a path is absolute"""
sep = _get_sep(s)
return s.startswith(sep)
# Join pathnames.
# Ignore the previous parts if a part is absolute.
# Insert a '/' unless the first part is empty or already ends in '/'.
def join(a, *p):
"""Join two or more pathname components, inserting '/' as needed.
If any component is an absolute path, all previous path components
will be discarded. An empty last part will result in a path that
ends with a separator."""
sep = _get_sep(a)
path = a
try:
for b in p:
if b.startswith(sep):
path = b
elif not path or path.endswith(sep):
path += b
else:
path += sep + b
except TypeError:
if all(isinstance(s, (str, bytes)) for s in (a,) + p):
# Must have a mixture of text and binary data
raise TypeError("Can't mix strings and bytes in path "
"components") from None
raise
return path
# Split a path in head (everything up to the last '/') and tail (the
# rest). If the path ends in '/', tail will be empty. If there is no
# '/' in the path, head will be empty.
# Trailing '/'es are stripped from head unless it is the root.
def split(p):
"""Split a pathname. Returns tuple "(head, tail)" where "tail" is
everything after the final slash. Either part may be empty."""
sep = _get_sep(p)
i = p.rfind(sep) + 1
head, tail = p[:i], p[i:]
if head and head != sep*len(head):
head = head.rstrip(sep)
return head, tail
# Split a path in root and extension.
# The extension is everything starting at the last dot in the last
# pathname component; the root is everything before that.
# It is always true that root + ext == p.
def splitext(p):
if isinstance(p, bytes):
sep = b'/'
extsep = b'.'
else:
sep = '/'
extsep = '.'
return genericpath._splitext(p, sep, None, extsep)
splitext.__doc__ = genericpath._splitext.__doc__
# Split a pathname into a drive specification and the rest of the
# path. Useful on DOS/Windows/NT; on Unix, the drive is always empty.
def splitdrive(p):
"""Split a pathname into drive and path. On Posix, drive is always
empty."""
return p[:0], p
# Return the tail (basename) part of a path, same as split(path)[1].
def basename(p):
"""Returns the final component of a pathname"""
sep = _get_sep(p)
i = p.rfind(sep) + 1
return p[i:]
# Return the head (dirname) part of a path, same as split(path)[0].
def dirname(p):
"""Returns the directory component of a pathname"""
sep = _get_sep(p)
i = p.rfind(sep) + 1
head = p[:i]
if head and head != sep*len(head):
head = head.rstrip(sep)
return head
# Is a path a symbolic link?
# This will always return false on systems where os.lstat doesn't exist.
def islink(path):
"""Test whether a path is a symbolic link"""
try:
st = os.lstat(path)
except (OSError, AttributeError):
return False
return stat.S_ISLNK(st.st_mode)
# Being true for dangling symbolic links is also useful.
def lexists(path):
"""Test whether a path exists. Returns True for broken symbolic links"""
try:
os.lstat(path)
except OSError:
return False
return True
# Is a path a mount point?
# (Does this work for all UNIXes? Is it even guaranteed to work by Posix?)
def ismount(path):
"""Test whether a path is a mount point"""
try:
s1 = os.lstat(path)
except OSError:
# It doesn't exist -- so not a mount point. :-)
return False
else:
# A symlink can never be a mount point
if stat.S_ISLNK(s1.st_mode):
return False
if isinstance(path, bytes):
parent = join(path, b'..')
else:
parent = join(path, '..')
try:
s2 = os.lstat(parent)
except OSError:
return False
dev1 = s1.st_dev
dev2 = s2.st_dev
if dev1 != dev2:
return True # path/.. on a different device than path
ino1 = s1.st_ino
ino2 = s2.st_ino
if ino1 == ino2:
return True # path/.. is the same i-node as path
return False
# Expand paths beginning with '~' or '~user'.
# '~' means $HOME; '~user' means that user's home directory.
# If the path doesn't begin with '~', or if the user or $HOME is unknown,
# the path is returned unchanged (leaving error reporting to whatever
# function is called with the expanded path as argument).
# See also module 'glob' for expansion of *, ? and [...] in pathnames.
# (A function should also be defined to do full *sh-style environment
# variable expansion.)
def expanduser(path):
"""Expand ~ and ~user constructions. If user or $HOME is unknown,
do nothing."""
if isinstance(path, bytes):
tilde = b'~'
else:
tilde = '~'
if not path.startswith(tilde):
return path
sep = _get_sep(path)
i = path.find(sep, 1)
if i < 0:
i = len(path)
if i == 1:
if 'HOME' not in os.environ:
import pwd
userhome = pwd.getpwuid(os.getuid()).pw_dir
else:
userhome = os.environ['HOME']
else:
import pwd
name = path[1:i]
if isinstance(name, bytes):
name = str(name, 'ASCII')
try:
pwent = pwd.getpwnam(name)
except KeyError:
return path
userhome = pwent.pw_dir
if isinstance(path, bytes):
userhome = os.fsencode(userhome)
root = b'/'
else:
root = '/'
userhome = userhome.rstrip(root)
return (userhome + path[i:]) or root
# Expand paths containing shell variable substitutions.
# This expands the forms $variable and ${variable} only.
# Non-existent variables are left unchanged.
_varprog = None
_varprogb = None
def expandvars(path):
"""Expand shell variables of form $var and ${var}. Unknown variables
are left unchanged."""
global _varprog, _varprogb
if isinstance(path, bytes):
if b'$' not in path:
return path
if not _varprogb:
import re
_varprogb = re.compile(br'\$(\w+|\{[^}]*\})', re.ASCII)
search = _varprogb.search
start = b'{'
end = b'}'
environ = getattr(os, 'environb', None)
else:
if '$' not in path:
return path
if not _varprog:
import re
_varprog = re.compile(r'\$(\w+|\{[^}]*\})', re.ASCII)
search = _varprog.search
start = '{'
end = '}'
environ = os.environ
i = 0
while True:
m = search(path, i)
if not m:
break
i, j = m.span(0)
name = m.group(1)
if name.startswith(start) and name.endswith(end):
name = name[1:-1]
try:
if environ is None:
value = os.fsencode(os.environ[os.fsdecode(name)])
else:
value = environ[name]
except KeyError:
i = j
else:
tail = path[j:]
path = path[:i] + value
i = len(path)
path += tail
return path
# Normalize a path, e.g. A//B, A/./B and A/foo/../B all become A/B.
# It should be understood that this may change the meaning of the path
# if it contains symbolic links!
def normpath(path):
"""Normalize path, eliminating double slashes, etc."""
if isinstance(path, bytes):
sep = b'/'
empty = b''
dot = b'.'
dotdot = b'..'
else:
sep = '/'
empty = ''
dot = '.'
dotdot = '..'
if path == empty:
return dot
initial_slashes = path.startswith(sep)
# POSIX allows one or two initial slashes, but treats three or more
# as single slash.
if (initial_slashes and
path.startswith(sep*2) and not path.startswith(sep*3)):
initial_slashes = 2
comps = path.split(sep)
new_comps = []
for comp in comps:
if comp in (empty, dot):
continue
if (comp != dotdot or (not initial_slashes and not new_comps) or
(new_comps and new_comps[-1] == dotdot)):
new_comps.append(comp)
elif new_comps:
new_comps.pop()
comps = new_comps
path = sep.join(comps)
if initial_slashes:
path = sep*initial_slashes + path
return path or dot
def abspath(path):
"""Return an absolute path."""
if not isabs(path):
if isinstance(path, bytes):
cwd = os.getcwdb()
else:
cwd = os.getcwd()
path = join(cwd, path)
return normpath(path)
# Return a canonical path (i.e. the absolute location of a file on the
# filesystem).
def realpath(filename):
"""Return the canonical path of the specified filename, eliminating any
symbolic links encountered in the path."""
path, ok = _joinrealpath(filename[:0], filename, {})
return abspath(path)
# Join two paths, normalizing and eliminating any symbolic links
# encountered in the second path.
def _joinrealpath(path, rest, seen):
if isinstance(path, bytes):
sep = b'/'
curdir = b'.'
pardir = b'..'
else:
sep = '/'
curdir = '.'
pardir = '..'
if isabs(rest):
rest = rest[1:]
path = sep
while rest:
name, _, rest = rest.partition(sep)
if not name or name == curdir:
# current dir
continue
if name == pardir:
# parent dir
if path:
path, name = split(path)
if name == pardir:
path = join(path, pardir, pardir)
else:
path = pardir
continue
newpath = join(path, name)
if not islink(newpath):
path = newpath
continue
# Resolve the symbolic link
if newpath in seen:
# Already seen this path
path = seen[newpath]
if path is not None:
# use cached value
continue
# The symlink is not resolved, so we must have a symlink loop.
# Return already resolved part + rest of the path unchanged.
return join(newpath, rest), False
seen[newpath] = None # not resolved symlink
path, ok = _joinrealpath(path, os.readlink(newpath), seen)
if not ok:
return join(path, rest), False
seen[newpath] = path # resolved symlink
return path, True
supports_unicode_filenames = (sys.platform == 'darwin')
def relpath(path, start=None):
"""Return a relative version of a path"""
if not path:
raise ValueError("no path specified")
if isinstance(path, bytes):
curdir = b'.'
sep = b'/'
pardir = b'..'
else:
curdir = '.'
sep = '/'
pardir = '..'
if start is None:
start = curdir
start_list = [x for x in abspath(start).split(sep) if x]
path_list = [x for x in abspath(path).split(sep) if x]
# Work out how much of the filepath is shared by start and path.
i = len(commonprefix([start_list, path_list]))
rel_list = [pardir] * (len(start_list)-i) + path_list[i:]
if not rel_list:
return curdir
return join(*rel_list)
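# --- Usage sketch for the functions above ------------------------------------
# posixpath can be imported directly to get POSIX-style behaviour on any
# platform (on POSIX systems it is also reachable as os.path).
import posixpath
assert posixpath.join('usr', 'local', 'bin') == 'usr/local/bin'
assert posixpath.join('/usr', '/opt') == '/opt'          # absolute part wins
assert posixpath.normpath('/a//b/./c/../d') == '/a/b/d'
assert posixpath.split('/usr/local/bin') == ('/usr/local', 'bin')
assert posixpath.splitext('archive.tar.gz') == ('archive.tar', '.gz')
assert posixpath.relpath('/a/b/c', '/a/d') == '../b/c'
# --------------------------------------------------------------------------------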
# Author: Fred L. Drake, Jr.
# [email protected]
#
# This is a simple little module I wrote to make life easier. I didn't
# see anything quite like it in the library, though I may have overlooked
# something. I wrote this when I was trying to read some heavily nested
# tuples with fairly non-descriptive content. This is modeled very much
# after Lisp/Scheme - style pretty-printing of lists. If you find it
# useful, thank small children who sleep at night.
"""Support to pretty-print lists, tuples, & dictionaries recursively.
Very simple, but useful, especially in debugging data structures.
Classes
-------
PrettyPrinter()
Handle pretty-printing operations onto a stream using a configured
set of formatting parameters.
Functions
---------
pformat()
Format a Python object into a pretty-printed representation.
pprint()
Pretty-print a Python object to a stream [default is sys.stdout].
saferepr()
Generate a 'standard' repr()-like value, but protect against recursive
data structures.
"""
import re
import sys as _sys
from collections import OrderedDict as _OrderedDict
from io import StringIO as _StringIO
__all__ = ["pprint","pformat","isreadable","isrecursive","saferepr",
"PrettyPrinter"]
def pprint(object, stream=None, indent=1, width=80, depth=None, *,
compact=False):
"""Pretty-print a Python object to a stream [default is sys.stdout]."""
printer = PrettyPrinter(
stream=stream, indent=indent, width=width, depth=depth,
compact=compact)
printer.pprint(object)
def pformat(object, indent=1, width=80, depth=None, *, compact=False):
"""Format a Python object into a pretty-printed representation."""
return PrettyPrinter(indent=indent, width=width, depth=depth,
compact=compact).pformat(object)
def saferepr(object):
"""Version of repr() which can handle recursive data structures."""
return _safe_repr(object, {}, None, 0)[0]
def isreadable(object):
"""Determine if saferepr(object) is readable by eval()."""
return _safe_repr(object, {}, None, 0)[1]
def isrecursive(object):
"""Determine if object requires a recursive representation."""
return _safe_repr(object, {}, None, 0)[2]
class _safe_key:
"""Helper function for key functions when sorting unorderable objects.
The wrapped object will fall back to a Py2.x-style comparison for
unorderable types (sorting first by comparing the type name and then by
the obj ids). Does not work recursively, so dict.items() must have
_safe_key applied to both the key and the value.
"""
__slots__ = ['obj']
def __init__(self, obj):
self.obj = obj
def __lt__(self, other):
try:
rv = self.obj.__lt__(other.obj)
except TypeError:
rv = NotImplemented
if rv is NotImplemented:
rv = (str(type(self.obj)), id(self.obj)) < \
(str(type(other.obj)), id(other.obj))
return rv
def _safe_tuple(t):
"Helper function for comparing 2-tuples"
return _safe_key(t[0]), _safe_key(t[1])
class PrettyPrinter:
def __init__(self, indent=1, width=80, depth=None, stream=None, *,
compact=False):
"""Handle pretty printing operations onto a stream using a set of
configured parameters.
indent
Number of spaces to indent for each level of nesting.
width
Attempted maximum number of columns in the output.
depth
The maximum depth to print out nested structures.
stream
The desired output stream. If omitted (or false), the standard
output stream available at construction will be used.
compact
If true, several items will be combined in one line.
"""
indent = int(indent)
width = int(width)
assert indent >= 0, "indent must be >= 0"
assert depth is None or depth > 0, "depth must be > 0"
assert width, "width must be != 0"
self._depth = depth
self._indent_per_level = indent
self._width = width
if stream is not None:
self._stream = stream
else:
self._stream = _sys.stdout
self._compact = bool(compact)
def pprint(self, object):
self._format(object, self._stream, 0, 0, {}, 0)
self._stream.write("\n")
def pformat(self, object):
sio = _StringIO()
self._format(object, sio, 0, 0, {}, 0)
return sio.getvalue()
def isrecursive(self, object):
return self.format(object, {}, 0, 0)[2]
def isreadable(self, object):
s, readable, recursive = self.format(object, {}, 0, 0)
return readable and not recursive
def _format(self, object, stream, indent, allowance, context, level):
level = level + 1
objid = id(object)
if objid in context:
stream.write(_recursion(object))
self._recursive = True
self._readable = False
return
rep = self._repr(object, context, level - 1)
typ = type(object)
max_width = self._width - 1 - indent - allowance
sepLines = len(rep) > max_width
write = stream.write
if sepLines:
r = getattr(typ, "__repr__", None)
if issubclass(typ, dict):
write('{')
if self._indent_per_level > 1:
write((self._indent_per_level - 1) * ' ')
length = len(object)
if length:
context[objid] = 1
indent = indent + self._indent_per_level
if issubclass(typ, _OrderedDict):
items = list(object.items())
else:
items = sorted(object.items(), key=_safe_tuple)
key, ent = items[0]
rep = self._repr(key, context, level)
write(rep)
write(': ')
self._format(ent, stream, indent + len(rep) + 2,
allowance + 1, context, level)
if length > 1:
for key, ent in items[1:]:
rep = self._repr(key, context, level)
write(',\n%s%s: ' % (' '*indent, rep))
self._format(ent, stream, indent + len(rep) + 2,
allowance + 1, context, level)
indent = indent - self._indent_per_level
del context[objid]
write('}')
return
if ((issubclass(typ, list) and r is list.__repr__) or
(issubclass(typ, tuple) and r is tuple.__repr__) or
(issubclass(typ, set) and r is set.__repr__) or
(issubclass(typ, frozenset) and r is frozenset.__repr__)
):
length = len(object)
if issubclass(typ, list):
write('[')
endchar = ']'
elif issubclass(typ, tuple):
write('(')
endchar = ')'
else:
if not length:
write(rep)
return
if typ is set:
write('{')
endchar = '}'
else:
write(typ.__name__)
write('({')
endchar = '})'
indent += len(typ.__name__) + 1
object = sorted(object, key=_safe_key)
if self._indent_per_level > 1:
write((self._indent_per_level - 1) * ' ')
if length:
context[objid] = 1
self._format_items(object, stream,
indent + self._indent_per_level,
allowance + 1, context, level)
del context[objid]
if issubclass(typ, tuple) and length == 1:
write(',')
write(endchar)
return
if issubclass(typ, str) and len(object) > 0 and r is str.__repr__:
chunks = []
lines = object.splitlines(True)
if level == 1:
indent += 1
max_width -= 2
for i, line in enumerate(lines):
rep = repr(line)
if len(rep) <= max_width:
chunks.append(rep)
else:
# A list of alternating (non-space, space) strings
parts = re.split(r'(\s+)', line) + ['']
current = ''
for i in range(0, len(parts), 2):
part = parts[i] + parts[i+1]
candidate = current + part
if len(repr(candidate)) > max_width:
if current:
chunks.append(repr(current))
current = part
else:
current = candidate
if current:
chunks.append(repr(current))
if len(chunks) == 1:
write(rep)
return
if level == 1:
write('(')
for i, rep in enumerate(chunks):
if i > 0:
write('\n' + ' '*indent)
write(rep)
if level == 1:
write(')')
return
write(rep)
def _format_items(self, items, stream, indent, allowance, context, level):
write = stream.write
delimnl = ',\n' + ' ' * indent
delim = ''
width = max_width = self._width - indent - allowance + 2
for ent in items:
if self._compact:
rep = self._repr(ent, context, level)
w = len(rep) + 2
if width < w:
width = max_width
if delim:
delim = delimnl
if width >= w:
width -= w
write(delim)
delim = ', '
write(rep)
continue
write(delim)
delim = delimnl
self._format(ent, stream, indent, allowance, context, level)
def _repr(self, object, context, level):
repr, readable, recursive = self.format(object, context.copy(),
self._depth, level)
if not readable:
self._readable = False
if recursive:
self._recursive = True
return repr
def format(self, object, context, maxlevels, level):
"""Format object for a specific context, returning a string
and flags indicating whether the representation is 'readable'
and whether the object represents a recursive construct.
"""
return _safe_repr(object, context, maxlevels, level)
# Return triple (repr_string, isreadable, isrecursive).
def _safe_repr(object, context, maxlevels, level):
typ = type(object)
if typ is str:
if 'locale' not in _sys.modules:
return repr(object), True, False
if "'" in object and '"' not in object:
closure = '"'
quotes = {'"': '\\"'}
else:
closure = "'"
quotes = {"'": "\\'"}
qget = quotes.get
sio = _StringIO()
write = sio.write
for char in object:
if char.isalpha():
write(char)
else:
write(qget(char, repr(char)[1:-1]))
return ("%s%s%s" % (closure, sio.getvalue(), closure)), True, False
r = getattr(typ, "__repr__", None)
if issubclass(typ, dict) and r is dict.__repr__:
if not object:
return "{}", True, False
objid = id(object)
if maxlevels and level >= maxlevels:
return "{...}", False, objid in context
if objid in context:
return _recursion(object), False, True
context[objid] = 1
readable = True
recursive = False
components = []
append = components.append
level += 1
saferepr = _safe_repr
items = sorted(object.items(), key=_safe_tuple)
for k, v in items:
krepr, kreadable, krecur = saferepr(k, context, maxlevels, level)
vrepr, vreadable, vrecur = saferepr(v, context, maxlevels, level)
append("%s: %s" % (krepr, vrepr))
readable = readable and kreadable and vreadable
if krecur or vrecur:
recursive = True
del context[objid]
return "{%s}" % ", ".join(components), readable, recursive
if (issubclass(typ, list) and r is list.__repr__) or \
(issubclass(typ, tuple) and r is tuple.__repr__):
if issubclass(typ, list):
if not object:
return "[]", True, False
format = "[%s]"
elif len(object) == 1:
format = "(%s,)"
else:
if not object:
return "()", True, False
format = "(%s)"
objid = id(object)
if maxlevels and level >= maxlevels:
return format % "...", False, objid in context
if objid in context:
return _recursion(object), False, True
context[objid] = 1
readable = True
recursive = False
components = []
append = components.append
level += 1
for o in object:
orepr, oreadable, orecur = _safe_repr(o, context, maxlevels, level)
append(orepr)
if not oreadable:
readable = False
if orecur:
recursive = True
del context[objid]
return format % ", ".join(components), readable, recursive
rep = repr(object)
return rep, (rep and not rep.startswith('<')), False
def _recursion(object):
return ("<Recursion on %s with id=%s>"
% (type(object).__name__, id(object)))
def _perfcheck(object=None):
import time
if object is None:
object = [("string", (1, 2), [3, 4], {5: 6, 7: 8})] * 100000
p = PrettyPrinter()
t1 = time.time()
_safe_repr(object, {}, None, 0)
t2 = time.time()
p.pformat(object)
t3 = time.time()
print("_safe_repr:", t2 - t1)
print("pformat:", t3 - t2)
if __name__ == "__main__":
_perfcheck()
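# --- Usage sketch for the pprint API above -------------------------------------
import pprint
_data = {'roles': ['admin', 'user'], 'limits': {'cpu': 4, 'mem': 16}}
pprint.pprint(_data, width=40)            # wraps output to fit the column budget
_text = pprint.pformat(_data, indent=2)   # same rendering, returned as a string
# saferepr()/isrecursive() stay safe on self-referential structures
_loop = []
_loop.append(_loop)
assert pprint.isrecursive(_loop)
print(pprint.saferepr(_loop))             # e.g. '[<Recursion on list with id=...>]'
# ----------------------------------------------------------------------------------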
#! /usr/bin/env python3
#
# Class for profiling python code. rev 1.0 6/2/94
#
# Written by James Roskind
# Based on prior profile module by Sjoerd Mullender...
# which was hacked somewhat by: Guido van Rossum
"""Class for profiling Python code."""
# Copyright Disney Enterprises, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied. See the License for the specific language
# governing permissions and limitations under the License.
import sys
import os
import time
import marshal
from optparse import OptionParser
__all__ = ["run", "runctx", "Profile"]
# Sample timer for use with
#i_count = 0
#def integer_timer():
# global i_count
# i_count = i_count + 1
# return i_count
#itimes = integer_timer # replace with C coded timer returning integers
class _Utils:
"""Support class for utility functions which are shared by
profile.py and cProfile.py modules.
Not supposed to be used directly.
"""
def __init__(self, profiler):
self.profiler = profiler
def run(self, statement, filename, sort):
prof = self.profiler()
try:
prof.run(statement)
except SystemExit:
pass
finally:
self._show(prof, filename, sort)
def runctx(self, statement, globals, locals, filename, sort):
prof = self.profiler()
try:
prof.runctx(statement, globals, locals)
except SystemExit:
pass
finally:
self._show(prof, filename, sort)
def _show(self, prof, filename, sort):
if filename is not None:
prof.dump_stats(filename)
else:
prof.print_stats(sort)
#**************************************************************************
# The following are the static member functions for the profiler class
# Note that an instance of Profile() is *not* needed to call them.
#**************************************************************************
def run(statement, filename=None, sort=-1):
"""Run statement under profiler optionally saving results in filename
This function takes a single argument that can be passed to the
"exec" statement, and an optional file name. In all cases this
routine attempts to "exec" its first argument and gather profiling
statistics from the execution. If no file name is present, then this
function automatically prints a simple profiling report, sorted by the
standard name string (file/line/function-name) that is presented in
each line.
"""
return _Utils(Profile).run(statement, filename, sort)
def runctx(statement, globals, locals, filename=None, sort=-1):
"""Run statement under profiler, supplying your own globals and locals,
optionally saving results in filename.
statement and filename have the same semantics as profile.run
"""
return _Utils(Profile).runctx(statement, globals, locals, filename, sort)
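# --- Usage sketch for run()/runctx() above (standalone snippet) ----------------
# 'fib' is a throwaway demo function; sort accepts pstats keys such as
# 'cumulative', 'time' or 'calls'.
import profile
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
# print a report sorted by cumulative time; pass filename= to save raw stats
profile.runctx('fib(15)', globals(), locals(), sort='cumulative')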
class Profile:
"""Profiler class.
self.cur is always a tuple. Each such tuple corresponds to a stack
frame that is currently active (self.cur[-2]). The following are the
definitions of its members. We use this external "parallel stack" to
avoid contaminating the program that we are profiling. (The old profiler
used to write into the frame's local dictionary!) Derived classes
can change the definition of some entries, as long as they leave
[-2:] intact (frame and previous tuple). In case an internal error is
detected, the -3 element is used as the function name.
[ 0] = Time that needs to be charged to the parent frame's function.
It is used so that a function call will not have to access the
timing data for the parent frame.
[ 1] = Total time spent in this frame's function, excluding time in
subfunctions (this latter is tallied in cur[2]).
[ 2] = Total time spent in subfunctions, excluding time executing the
frame's function (this latter is tallied in cur[1]).
[-3] = Name of the function that corresponds to this frame.
[-2] = Actual frame that we correspond to (used to sync exception handling).
[-1] = Our parent 6-tuple (corresponds to frame.f_back).
Timing data for each function is stored as a 5-tuple in the dictionary
self.timings[]. The index is always the name stored in self.cur[-3].
The following are the definitions of the members:
[0] = The number of times this function was called, not counting direct
or indirect recursion,
[1] = Number of times this function appears on the stack, minus one
[2] = Total time spent internal to this function
[3] = Cumulative time that this function was present on the stack. In
non-recursive functions, this is the total execution time from start
to finish of each invocation of a function, including time spent in
all subfunctions.
[4] = A dictionary indicating for each function name, the number of times
it was called by us.
"""
bias = 0 # calibration constant
def __init__(self, timer=None, bias=None):
self.timings = {}
self.cur = None
self.cmd = ""
self.c_func_name = ""
if bias is None:
bias = self.bias
self.bias = bias # Materialize in local dict for lookup speed.
if not timer:
self.timer = self.get_time = time.process_time
self.dispatcher = self.trace_dispatch_i
else:
self.timer = timer
t = self.timer() # test out timer function
try:
length = len(t)
except TypeError:
self.get_time = timer
self.dispatcher = self.trace_dispatch_i
else:
if length == 2:
self.dispatcher = self.trace_dispatch
else:
self.dispatcher = self.trace_dispatch_l
# This get_time() implementation needs to be defined
# here to capture the passed-in timer in the parameter
# list (for performance). Note that we can't assume
# the timer() result contains two values in all
# cases.
def get_time_timer(timer=timer, sum=sum):
return sum(timer())
self.get_time = get_time_timer
self.t = self.get_time()
self.simulate_call('profiler')
# Heavily optimized dispatch routine for os.times() timer
def trace_dispatch(self, frame, event, arg):
timer = self.timer
t = timer()
t = t[0] + t[1] - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame,t):
t = timer()
self.t = t[0] + t[1]
else:
r = timer()
self.t = r[0] + r[1] - t # put back unrecorded delta
# Dispatch routine for best timer program (return = scalar, fastest if
# an integer but float works too -- and time.clock() relies on that).
def trace_dispatch_i(self, frame, event, arg):
timer = self.timer
t = timer() - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame, t):
self.t = timer()
else:
self.t = timer() - t # put back unrecorded delta
# Dispatch routine for macintosh (timer returns time in ticks of
# 1/60th second)
def trace_dispatch_mac(self, frame, event, arg):
timer = self.timer
t = timer()/60.0 - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame, t):
self.t = timer()/60.0
else:
self.t = timer()/60.0 - t # put back unrecorded delta
# SLOW generic dispatch routine for timer returning lists of numbers
def trace_dispatch_l(self, frame, event, arg):
get_time = self.get_time
t = get_time() - self.t - self.bias
if event == "c_call":
self.c_func_name = arg.__name__
if self.dispatch[event](self, frame, t):
self.t = get_time()
else:
self.t = get_time() - t # put back unrecorded delta
# In the event handlers, the first 3 elements of self.cur are unpacked
# into vrbls w/ 3-letter names. The last two characters are meant to be
# mnemonic:
# _pt self.cur[0] "parent time" time to be charged to parent frame
# _it self.cur[1] "internal time" time spent directly in the function
# _et self.cur[2] "external time" time spent in subfunctions
def trace_dispatch_exception(self, frame, t):
rpt, rit, ret, rfn, rframe, rcur = self.cur
if (rframe is not frame) and rcur:
return self.trace_dispatch_return(rframe, t)
self.cur = rpt, rit+t, ret, rfn, rframe, rcur
return 1
def trace_dispatch_call(self, frame, t):
if self.cur and frame.f_back is not self.cur[-2]:
rpt, rit, ret, rfn, rframe, rcur = self.cur
if not isinstance(rframe, Profile.fake_frame):
assert rframe.f_back is frame.f_back, ("Bad call", rfn,
rframe, rframe.f_back,
frame, frame.f_back)
self.trace_dispatch_return(rframe, 0)
assert (self.cur is None or \
frame.f_back is self.cur[-2]), ("Bad call",
self.cur[-3])
fcode = frame.f_code
fn = (fcode.co_filename, fcode.co_firstlineno, fcode.co_name)
self.cur = (t, 0, 0, fn, frame, self.cur)
timings = self.timings
if fn in timings:
cc, ns, tt, ct, callers = timings[fn]
timings[fn] = cc, ns + 1, tt, ct, callers
else:
timings[fn] = 0, 0, 0, 0, {}
return 1
def trace_dispatch_c_call (self, frame, t):
fn = ("", 0, self.c_func_name)
self.cur = (t, 0, 0, fn, frame, self.cur)
timings = self.timings
if fn in timings:
cc, ns, tt, ct, callers = timings[fn]
timings[fn] = cc, ns+1, tt, ct, callers
else:
timings[fn] = 0, 0, 0, 0, {}
return 1
def trace_dispatch_return(self, frame, t):
if frame is not self.cur[-2]:
assert frame is self.cur[-2].f_back, ("Bad return", self.cur[-3])
self.trace_dispatch_return(self.cur[-2], 0)
# Prefix "r" means part of the Returning or exiting frame.
# Prefix "p" means part of the Previous or Parent or older frame.
rpt, rit, ret, rfn, frame, rcur = self.cur
rit = rit + t
frame_total = rit + ret
ppt, pit, pet, pfn, pframe, pcur = rcur
self.cur = ppt, pit + rpt, pet + frame_total, pfn, pframe, pcur
timings = self.timings
cc, ns, tt, ct, callers = timings[rfn]
if not ns:
# This is the only occurrence of the function on the stack.
# Else this is a (directly or indirectly) recursive call, and
# its cumulative time will get updated when the topmost call to
# it returns.
ct = ct + frame_total
cc = cc + 1
if pfn in callers:
callers[pfn] = callers[pfn] + 1 # hack: gather more
# stats such as the amount of time added to ct courtesy
# of this specific call, and the contribution to cc
# courtesy of this call.
else:
callers[pfn] = 1
timings[rfn] = cc, ns - 1, tt + rit, ct, callers
return 1
dispatch = {
"call": trace_dispatch_call,
"exception": trace_dispatch_exception,
"return": trace_dispatch_return,
"c_call": trace_dispatch_c_call,
"c_exception": trace_dispatch_return, # the C function returned
"c_return": trace_dispatch_return,
}
# The next few functions play with self.cmd. By carefully preloading
# our parallel stack, we can force the profiled result to include
# an arbitrary string as the name of the calling function.
# We use self.cmd as that string, and the resulting stats look
# very nice :-).
def set_cmd(self, cmd):
if self.cur[-1]: return # already set
self.cmd = cmd
self.simulate_call(cmd)
class fake_code:
def __init__(self, filename, line, name):
self.co_filename = filename
self.co_line = line
self.co_name = name
self.co_firstlineno = 0
def __repr__(self):
return repr((self.co_filename, self.co_line, self.co_name))
class fake_frame:
def __init__(self, code, prior):
self.f_code = code
self.f_back = prior
def simulate_call(self, name):
code = self.fake_code('profile', 0, name)
if self.cur:
pframe = self.cur[-2]
else:
pframe = None
frame = self.fake_frame(code, pframe)
self.dispatch['call'](self, frame, 0)
# collect stats from pending stack, including getting final
# timings for self.cmd frame.
def simulate_cmd_complete(self):
get_time = self.get_time
t = get_time() - self.t
while self.cur[-1]:
# We *can* cause assertion errors here if
# dispatch_trace_return checks for a frame match!
self.dispatch['return'](self, self.cur[-2], t)
t = 0
self.t = get_time() - t
def print_stats(self, sort=-1):
import pstats
pstats.Stats(self).strip_dirs().sort_stats(sort). \
print_stats()
def dump_stats(self, file):
with open(file, 'wb') as f:
self.create_stats()
marshal.dump(self.stats, f)
def create_stats(self):
self.simulate_cmd_complete()
self.snapshot_stats()
def snapshot_stats(self):
self.stats = {}
for func, (cc, ns, tt, ct, callers) in self.timings.items():
callers = callers.copy()
nc = 0
for callcnt in callers.values():
nc += callcnt
self.stats[func] = cc, nc, tt, ct, callers
# The following two methods can be called by clients to use
# a profiler to profile a statement, given as a string.
def run(self, cmd):
import __main__
dict = __main__.__dict__
return self.runctx(cmd, dict, dict)
def runctx(self, cmd, globals, locals):
self.set_cmd(cmd)
sys.setprofile(self.dispatcher)
try:
exec(cmd, globals, locals)
finally:
sys.setprofile(None)
return self
# This method is more useful to profile a single function call.
def runcall(self, func, *args, **kw):
self.set_cmd(repr(func))
sys.setprofile(self.dispatcher)
try:
return func(*args, **kw)
finally:
sys.setprofile(None)
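# A minimal sketch of the two profiling entry points above; the companion
# pstats module renders the collected timings (the profiled call is just
# an illustration):
#
#     import pstats
#     pr = Profile()
#     pr.runcall(sorted, list(range(100000)), reverse=True)
#     pstats.Stats(pr).sort_stats('time').print_stats(5)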
#******************************************************************
# The following calculates the overhead for using a profiler. The
# problem is that it takes a fair amount of time for the profiler
# to stop the stopwatch (from the time it receives an event).
# Similarly, there is a delay from the time that the profiler
# re-starts the stopwatch before the user's code really gets to
# continue. The following code tries to measure the difference on
# a per-event basis.
#
# Note that this difference is only significant if there are a lot of
# events, and relatively little user code per event. For example,
# code with small functions will typically benefit from having the
# profiler calibrated for the current platform. This *could* be
# done on the fly during init() time, but it is not worth the
# effort. Also note that if too large a value is specified, then
# execution time on some functions will actually appear as a
# negative number. It is *normal* for some functions (with very
# low call counts) to have such negative stats, even if the
# calibration figure is "correct."
#
# One alternative to profile-time calibration adjustments (i.e.,
# adding in the magic little delta during each event) is to track
# more carefully the number of events (and cumulatively, the number
# of events during sub functions) that are seen. If this were
# done, then the arithmetic could be done after the fact (i.e., at
# display time). Currently, we track only call/return events.
# These values can be deduced by examining the callees and callers
# vectors for each function. Hence we *can* almost correct the
# internal time figure at print time (note that we currently don't
# track exception event processing counts). Unfortunately, there
# is currently no similar information for cumulative sub-function
# time. It would not be hard to "get all this info" at profiler
# time. Specifically, we would have to extend the tuples to keep
# counts of this in each frame, and then extend the defs of timing
# tuples to include the significant two figures. I'm a bit fearful
# that this additional feature will slow the heavily optimized
# event/time ratio (i.e., the profiler would run slower, for a very
# low "value added" feature.)
#**************************************************************
def calibrate(self, m, verbose=0):
if self.__class__ is not Profile:
raise TypeError("Subclasses must override .calibrate().")
saved_bias = self.bias
self.bias = 0
try:
return self._calibrate_inner(m, verbose)
finally:
self.bias = saved_bias
def _calibrate_inner(self, m, verbose):
get_time = self.get_time
# Set up a test case to be run with and without profiling. Include
# lots of calls, because we're trying to quantify stopwatch overhead.
# Do not raise any exceptions, though, because we want to know
# exactly how many profile events are generated (one call event, +
# one return event, per Python-level call).
def f1(n):
for i in range(n):
x = 1
def f(m, f1=f1):
for i in range(m):
f1(100)
f(m) # warm up the cache
# elapsed_noprofile <- time f(m) takes without profiling.
t0 = get_time()
f(m)
t1 = get_time()
elapsed_noprofile = t1 - t0
if verbose:
print("elapsed time without profiling =", elapsed_noprofile)
# elapsed_profile <- time f(m) takes with profiling. The difference
# is profiling overhead, only some of which the profiler subtracts
# out on its own.
p = Profile()
t0 = get_time()
p.runctx('f(m)', globals(), locals())
t1 = get_time()
elapsed_profile = t1 - t0
if verbose:
print("elapsed time with profiling =", elapsed_profile)
# reported_time <- "CPU seconds" the profiler charged to f and f1.
total_calls = 0.0
reported_time = 0.0
for (filename, line, funcname), (cc, ns, tt, ct, callers) in \
p.timings.items():
if funcname in ("f", "f1"):
total_calls += cc
reported_time += tt
if verbose:
print("'CPU seconds' profiler reported =", reported_time)
print("total # calls =", total_calls)
if total_calls != m + 1:
raise ValueError("internal error: total calls = %d" % total_calls)
# reported_time - elapsed_noprofile = overhead the profiler wasn't
# able to measure. Divide by twice the number of calls (since there
# are two profiler events per call in this test) to get the hidden
# overhead per event.
mean = (reported_time - elapsed_noprofile) / 2.0 / total_calls
if verbose:
print("mean stopwatch overhead per profile event =", mean)
return mean
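# A calibration sketch: sample the per-event overhead several times and
# install the smallest observed value as the class-wide bias, so later
# Profile instances subtract it automatically:
#
#     pr = Profile()
#     Profile.bias = min(pr.calibrate(10000) for _ in range(5))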
#****************************************************************************
def main():
usage = "profile.py [-o output_file_path] [-s sort] scriptfile [arg] ..."
parser = OptionParser(usage=usage)
parser.allow_interspersed_args = False
parser.add_option('-o', '--outfile', dest="outfile",
help="Save stats to <outfile>", default=None)
parser.add_option('-s', '--sort', dest="sort",
help="Sort order when printing to stdout, based on pstats.Stats class",
default=-1)
if not sys.argv[1:]:
parser.print_usage()
sys.exit(2)
(options, args) = parser.parse_args()
sys.argv[:] = args
if len(args) > 0:
progname = args[0]
sys.path.insert(0, os.path.dirname(progname))
with open(progname, 'rb') as fp:
code = compile(fp.read(), progname, 'exec')
globs = {
'__file__': progname,
'__name__': '__main__',
'__package__': None,
'__cached__': None,
}
runctx(code, globs, None, options.outfile, options.sort)
else:
parser.print_usage()
return parser
# When invoked as main program, invoke the profiler on a script
if __name__ == '__main__':
main()
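# Example invocation of the command-line interface above (a sketch;
# "myscript.py" is a placeholder):
#
#     python profile.py -o out.prof myscript.py arg1 arg2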
"""Class for printing reports on profiled python code."""
# Written by James Roskind
# Based on prior profile module by Sjoerd Mullender...
# which was hacked somewhat by: Guido van Rossum
# Copyright Disney Enterprises, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied. See the License for the specific language
# governing permissions and limitations under the License.
import sys
import os
import time
import marshal
import re
from functools import cmp_to_key
__all__ = ["Stats"]
class Stats:
"""This class is used for creating reports from data generated by the
Profile class. It is a "friend" of that class, and imports data either
by direct access to members of Profile class, or by reading in a dictionary
that was emitted (via marshal) from the Profile class.
The big change from the previous Profiler (in terms of raw functionality)
is that an "add()" method has been provided to combine Stats from
several distinct profile runs. Both the constructor and the add()
method now take arbitrarily many file names as arguments.
All the print methods now take an argument that indicates how many lines
to print. If the arg is a floating point number between 0 and 1.0, then
it is taken as a decimal percentage of the available lines to be printed
(e.g., .1 means print 10% of all available lines). If it is an integer,
it is taken to mean the number of lines of data that you wish to have
printed.
The sort_stats() method now processes some additional options (i.e., in
addition to the old -1, 0, 1, or 2). It takes an arbitrary number of
quoted strings to select the sort order. For example sort_stats('time',
'name') sorts on the major key of 'internal function time', and on the
minor key of 'the name of the function'. Look at the two tables in
sort_stats() and get_sort_arg_defs(self) for more examples.
All methods return self, so you can string together commands like:
Stats('foo', 'goo').strip_dirs().sort_stats('calls').\
print_stats(5).print_callers(5)
"""
def __init__(self, *args, stream=None):
self.stream = stream or sys.stdout
if not len(args):
arg = None
else:
arg = args[0]
args = args[1:]
self.init(arg)
self.add(*args)
def init(self, arg):
self.all_callees = None # calc only if needed
self.files = []
self.fcn_list = None
self.total_tt = 0
self.total_calls = 0
self.prim_calls = 0
self.max_name_len = 0
self.top_level = set()
self.stats = {}
self.sort_arg_dict = {}
self.load_stats(arg)
try:
self.get_top_level_stats()
except Exception:
print("Invalid timing data %s" %
(self.files[-1] if self.files else ''), file=self.stream)
raise
def load_stats(self, arg):
if arg is None:
self.stats = {}
return
elif isinstance(arg, str):
with open(arg, 'rb') as f:
self.stats = marshal.load(f)
try:
file_stats = os.stat(arg)
arg = time.ctime(file_stats.st_mtime) + " " + arg
except: # in case this is not unix
pass
self.files = [arg]
elif hasattr(arg, 'create_stats'):
arg.create_stats()
self.stats = arg.stats
arg.stats = {}
if not self.stats:
raise TypeError("Cannot create or construct a %r object from %r"
% (self.__class__, arg))
return
def get_top_level_stats(self):
for func, (cc, nc, tt, ct, callers) in self.stats.items():
self.total_calls += nc
self.prim_calls += cc
self.total_tt += tt
if ("jprofile", 0, "profiler") in callers:
self.top_level.add(func)
if len(func_std_string(func)) > self.max_name_len:
self.max_name_len = len(func_std_string(func))
def add(self, *arg_list):
if not arg_list:
return self
for item in reversed(arg_list):
if type(self) != type(item):
item = Stats(item)
self.files += item.files
self.total_calls += item.total_calls
self.prim_calls += item.prim_calls
self.total_tt += item.total_tt
for func in item.top_level:
self.top_level.add(func)
if self.max_name_len < item.max_name_len:
self.max_name_len = item.max_name_len
self.fcn_list = None
for func, stat in item.stats.items():
if func in self.stats:
old_func_stat = self.stats[func]
else:
old_func_stat = (0, 0, 0, 0, {},)
self.stats[func] = add_func_stats(old_func_stat, stat)
return self
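# An aggregation sketch: fold several dump files into one report (the
# file names are placeholders):
#
#     combined = Stats('run1.prof', 'run2.prof').add('run3.prof')
#     combined.sort_stats('calls').print_stats()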
def dump_stats(self, filename):
"""Write the profile data to a file we know how to load back."""
with open(filename, 'wb') as f:
marshal.dump(self.stats, f)
# list the tuple indices and directions for sorting,
# along with some printable description
sort_arg_dict_default = {
"calls" : (((1,-1), ), "call count"),
"ncalls" : (((1,-1), ), "call count"),
"cumtime" : (((3,-1), ), "cumulative time"),
"cumulative": (((3,-1), ), "cumulative time"),
"file" : (((4, 1), ), "file name"),
"filename" : (((4, 1), ), "file name"),
"line" : (((5, 1), ), "line number"),
"module" : (((4, 1), ), "file name"),
"name" : (((6, 1), ), "function name"),
"nfl" : (((6, 1),(4, 1),(5, 1),), "name/file/line"),
"pcalls" : (((0,-1), ), "primitive call count"),
"stdname" : (((7, 1), ), "standard name"),
"time" : (((2,-1), ), "internal time"),
"tottime" : (((2,-1), ), "internal time"),
}
def get_sort_arg_defs(self):
"""Expand all abbreviations that are unique."""
if not self.sort_arg_dict:
self.sort_arg_dict = dict = {}
bad_list = {}
for word, tup in self.sort_arg_dict_default.items():
fragment = word
while fragment:
if not fragment:
break
if fragment in dict:
bad_list[fragment] = 0
break
dict[fragment] = tup
fragment = fragment[:-1]
for word in bad_list:
del dict[word]
return self.sort_arg_dict
def sort_stats(self, *field):
if not field:
self.fcn_list = 0
return self
if len(field) == 1 and isinstance(field[0], int):
# Be compatible with old profiler
field = [ {-1: "stdname",
0: "calls",
1: "time",
2: "cumulative"}[field[0]] ]
sort_arg_defs = self.get_sort_arg_defs()
sort_tuple = ()
self.sort_type = ""
connector = ""
for word in field:
sort_tuple = sort_tuple + sort_arg_defs[word][0]
self.sort_type += connector + sort_arg_defs[word][1]
connector = ", "
stats_list = []
for func, (cc, nc, tt, ct, callers) in self.stats.items():
stats_list.append((cc, nc, tt, ct) + func +
(func_std_string(func), func))
stats_list.sort(key=cmp_to_key(TupleComp(sort_tuple).compare))
self.fcn_list = fcn_list = []
for tuple in stats_list:
fcn_list.append(tuple[-1])
return self
def reverse_order(self):
if self.fcn_list:
self.fcn_list.reverse()
return self
def strip_dirs(self):
oldstats = self.stats
self.stats = newstats = {}
max_name_len = 0
for func, (cc, nc, tt, ct, callers) in oldstats.items():
newfunc = func_strip_path(func)
if len(func_std_string(newfunc)) > max_name_len:
max_name_len = len(func_std_string(newfunc))
newcallers = {}
for func2, caller in callers.items():
newcallers[func_strip_path(func2)] = caller
if newfunc in newstats:
newstats[newfunc] = add_func_stats(
newstats[newfunc],
(cc, nc, tt, ct, newcallers))
else:
newstats[newfunc] = (cc, nc, tt, ct, newcallers)
old_top = self.top_level
self.top_level = new_top = set()
for func in old_top:
new_top.add(func_strip_path(func))
self.max_name_len = max_name_len
self.fcn_list = None
self.all_callees = None
return self
def calc_callees(self):
if self.all_callees:
return
self.all_callees = all_callees = {}
for func, (cc, nc, tt, ct, callers) in self.stats.items():
if not func in all_callees:
all_callees[func] = {}
for func2, caller in callers.items():
if not func2 in all_callees:
all_callees[func2] = {}
all_callees[func2][func] = caller
return
#******************************************************************
# The following functions support actual printing of reports
#******************************************************************
# Optional "amount" is either a line count, or a percentage of lines.
def eval_print_amount(self, sel, list, msg):
new_list = list
if isinstance(sel, str):
try:
rex = re.compile(sel)
except re.error:
msg += " <Invalid regular expression %r>\n" % sel
return new_list, msg
new_list = []
for func in list:
if rex.search(func_std_string(func)):
new_list.append(func)
else:
count = len(list)
if isinstance(sel, float) and 0.0 <= sel < 1.0:
count = int(count * sel + .5)
new_list = list[:count]
elif isinstance(sel, int) and 0 <= sel < count:
count = sel
new_list = list[:count]
if len(list) != len(new_list):
msg += " List reduced from %r to %r due to restriction <%r>\n" % (
len(list), len(new_list), sel)
return new_list, msg
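# Restriction sketch: selections are applied in order, so a float in
# [0, 1) first keeps that fraction of the entries, then a string is used
# as a regular expression over the standard function name ("stats" is a
# Stats instance):
#
#     stats.print_stats(.3, 'init')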
def get_print_list(self, sel_list):
width = self.max_name_len
if self.fcn_list:
stat_list = self.fcn_list[:]
msg = " Ordered by: " + self.sort_type + '\n'
else:
stat_list = list(self.stats.keys())
msg = " Random listing order was used\n"
for selection in sel_list:
stat_list, msg = self.eval_print_amount(selection, stat_list, msg)
count = len(stat_list)
if not stat_list:
return 0, stat_list
print(msg, file=self.stream)
if count < len(self.stats):
width = 0
for func in stat_list:
if len(func_std_string(func)) > width:
width = len(func_std_string(func))
return width+2, stat_list
def print_stats(self, *amount):
for filename in self.files:
print(filename, file=self.stream)
if self.files:
print(file=self.stream)
indent = ' ' * 8
for func in self.top_level:
print(indent, func_get_function_name(func), file=self.stream)
print(indent, self.total_calls, "function calls", end=' ', file=self.stream)
if self.total_calls != self.prim_calls:
print("(%d primitive calls)" % self.prim_calls, end=' ', file=self.stream)
print("in %.3f seconds" % self.total_tt, file=self.stream)
print(file=self.stream)
width, list = self.get_print_list(amount)
if list:
self.print_title()
for func in list:
self.print_line(func)
print(file=self.stream)
print(file=self.stream)
return self
def print_callees(self, *amount):
width, list = self.get_print_list(amount)
if list:
self.calc_callees()
self.print_call_heading(width, "called...")
for func in list:
if func in self.all_callees:
self.print_call_line(width, func, self.all_callees[func])
else:
self.print_call_line(width, func, {})
print(file=self.stream)
print(file=self.stream)
return self
def print_callers(self, *amount):
width, list = self.get_print_list(amount)
if list:
self.print_call_heading(width, "was called by...")
for func in list:
cc, nc, tt, ct, callers = self.stats[func]
self.print_call_line(width, func, callers, "<-")
print(file=self.stream)
print(file=self.stream)
return self
def print_call_heading(self, name_size, column_title):
print("Function ".ljust(name_size) + column_title, file=self.stream)
# print sub-header only if we have new-style callers
subheader = False
for cc, nc, tt, ct, callers in self.stats.values():
if callers:
value = next(iter(callers.values()))
subheader = isinstance(value, tuple)
break
if subheader:
print(" "*name_size + " ncalls tottime cumtime", file=self.stream)
def print_call_line(self, name_size, source, call_dict, arrow="->"):
print(func_std_string(source).ljust(name_size) + arrow, end=' ', file=self.stream)
if not call_dict:
print(file=self.stream)
return
clist = sorted(call_dict.keys())
indent = ""
for func in clist:
name = func_std_string(func)
value = call_dict[func]
if isinstance(value, tuple):
nc, cc, tt, ct = value
if nc != cc:
substats = '%d/%d' % (nc, cc)
else:
substats = '%d' % (nc,)
substats = '%s %s %s  %s' % (substats.rjust(7+2*len(indent)),
f8(tt), f8(ct), name)
left_width = name_size + 1
else:
substats = '%s(%r) %s' % (name, value, f8(self.stats[func][3]))
left_width = name_size + 3
print(indent*left_width + substats, file=self.stream)
indent = " "
def print_title(self):
print('   ncalls  tottime  percall  cumtime  percall', end=' ', file=self.stream)
print('filename:lineno(function)', file=self.stream)
def print_line(self, func): # hack: should print percentages
cc, nc, tt, ct, callers = self.stats[func]
c = str(nc)
if nc != cc:
c = c + '/' + str(cc)
print(c.rjust(9), end=' ', file=self.stream)
print(f8(tt), end=' ', file=self.stream)
if nc == 0:
print(' '*8, end=' ', file=self.stream)
else:
print(f8(tt/nc), end=' ', file=self.stream)
print(f8(ct), end=' ', file=self.stream)
if cc == 0:
print(' '*8, end=' ', file=self.stream)
else:
print(f8(ct/cc), end=' ', file=self.stream)
print(func_std_string(func), file=self.stream)
class TupleComp:
"""This class provides a generic function for comparing any two tuples.
Each instance records a list of tuple-indices (from most significant
to least significant), and sort direction (ascending or descending) for
each tuple-index. The compare functions can then be used as the function
argument to the system sort() function when a list of tuples needs to be
sorted in the instance's order."""
def __init__(self, comp_select_list):
self.comp_select_list = comp_select_list
def compare (self, left, right):
for index, direction in self.comp_select_list:
l = left[index]
r = right[index]
if l < r:
return -direction
if l > r:
return direction
return 0
#**************************************************************************
# func_name is a triple (file:string, line:int, name:string)
def func_strip_path(func_name):
filename, line, name = func_name
return os.path.basename(filename), line, name
def func_get_function_name(func):
return func[2]
def func_std_string(func_name): # match what old profile produced
if func_name[:2] == ('~', 0):
# special case for built-in functions
name = func_name[2]
if name.startswith('<') and name.endswith('>'):
return '{%s}' % name[1:-1]
else:
return name
else:
return "%s:%d(%s)" % func_name
#**************************************************************************
# The following functions combine statistics for pairs of functions.
# The bulk of the processing involves correctly handling "call" lists,
# such as callers and callees.
#**************************************************************************
def add_func_stats(target, source):
"""Add together all the stats for two profile entries."""
cc, nc, tt, ct, callers = source
t_cc, t_nc, t_tt, t_ct, t_callers = target
return (cc+t_cc, nc+t_nc, tt+t_tt, ct+t_ct,
add_callers(t_callers, callers))
def add_callers(target, source):
"""Combine two caller lists in a single list."""
new_callers = {}
for func, caller in target.items():
new_callers[func] = caller
for func, caller in source.items():
if func in new_callers:
if isinstance(caller, tuple):
# format used by cProfile
new_callers[func] = tuple([i[0] + i[1] for i in
zip(caller, new_callers[func])])
else:
# format used by profile
new_callers[func] += caller
else:
new_callers[func] = caller
return new_callers
def count_calls(callers):
"""Sum the caller statistics to get total number of calls received."""
nc = 0
for calls in callers.values():
nc += calls
return nc
#**************************************************************************
# The following functions support printing of reports
#**************************************************************************
def f8(x):
return "%8.3f" % x
#**************************************************************************
# Statistics browser added by ESR, April 2001
#**************************************************************************
if __name__ == '__main__':
import cmd
try:
import readline
except ImportError:
pass
class ProfileBrowser(cmd.Cmd):
def __init__(self, profile=None):
cmd.Cmd.__init__(self)
self.prompt = "% "
self.stats = None
self.stream = sys.stdout
if profile is not None:
self.do_read(profile)
def generic(self, fn, line):
args = line.split()
processed = []
for term in args:
try:
processed.append(int(term))
continue
except ValueError:
pass
try:
frac = float(term)
if frac > 1 or frac < 0:
print("Fraction argument must be in [0, 1]", file=self.stream)
continue
processed.append(frac)
continue
except ValueError:
pass
processed.append(term)
if self.stats:
getattr(self.stats, fn)(*processed)
else:
print("No statistics object is loaded.", file=self.stream)
return 0
def generic_help(self):
print("Arguments may be:", file=self.stream)
print("* An integer maximum number of entries to print.", file=self.stream)
print("* A decimal fractional number between 0 and 1, controlling", file=self.stream)
print(" what fraction of selected entries to print.", file=self.stream)
print("* A regular expression; only entries with function names", file=self.stream)
print(" that match it are printed.", file=self.stream)
def do_add(self, line):
if self.stats:
self.stats.add(line)
else:
print("No statistics object is loaded.", file=self.stream)
return 0
def help_add(self):
print("Add profile info from given file to current statistics object.", file=self.stream)
def do_callees(self, line):
return self.generic('print_callees', line)
def help_callees(self):
print("Print callees statistics from the current stat object.", file=self.stream)
self.generic_help()
def do_callers(self, line):
return self.generic('print_callers', line)
def help_callers(self):
print("Print callers statistics from the current stat object.", file=self.stream)
self.generic_help()
def do_EOF(self, line):
print("", file=self.stream)
return 1
def help_EOF(self):
print("Leave the profile brower.", file=self.stream)
def do_quit(self, line):
return 1
def help_quit(self):
print("Leave the profile brower.", file=self.stream)
def do_read(self, line):
if line:
try:
self.stats = Stats(line)
except OSError as err:
print(err.args[1], file=self.stream)
return
except Exception as err:
print(err.__class__.__name__ + ':', err, file=self.stream)
return
self.prompt = line + "% "
elif len(self.prompt) > 2:
line = self.prompt[:-2]
self.do_read(line)
else:
print("No statistics object is current -- cannot reload.", file=self.stream)
return 0
def help_read(self):
print("Read in profile data from a specified file.", file=self.stream)
print("Without argument, reload the current file.", file=self.stream)
def do_reverse(self, line):
if self.stats:
self.stats.reverse_order()
else:
print("No statistics object is loaded.", file=self.stream)
return 0
def help_reverse(self):
print("Reverse the sort order of the profiling report.", file=self.stream)
def do_sort(self, line):
if not self.stats:
print("No statistics object is loaded.", file=self.stream)
return
abbrevs = self.stats.get_sort_arg_defs()
if line and all((x in abbrevs) for x in line.split()):
self.stats.sort_stats(*line.split())
else:
print("Valid sort keys (unique prefixes are accepted):", file=self.stream)
for (key, value) in Stats.sort_arg_dict_default.items():
print("%s -- %s" % (key, value[1]), file=self.stream)
return 0
def help_sort(self):
print("Sort profile data according to specified keys.", file=self.stream)
print("(Typing `sort' without arguments lists valid keys.)", file=self.stream)
def complete_sort(self, text, *args):
return [a for a in Stats.sort_arg_dict_default if a.startswith(text)]
def do_stats(self, line):
return self.generic('print_stats', line)
def help_stats(self):
print("Print statistics from the current stat object.", file=self.stream)
self.generic_help()
def do_strip(self, line):
if self.stats:
self.stats.strip_dirs()
else:
print("No statistics object is loaded.", file=self.stream)
def help_strip(self):
print("Strip leading path information from filenames in the report.", file=self.stream)
def help_help(self):
print("Show help for a given command.", file=self.stream)
def postcmd(self, stop, line):
if stop:
return stop
return None
if len(sys.argv) > 1:
initprofile = sys.argv[1]
else:
initprofile = None
try:
browser = ProfileBrowser(initprofile)
for profile in sys.argv[2:]:
browser.do_add(profile)
print("Welcome to the profile statistics browser.", file=browser.stream)
browser.cmdloop()
print("Goodbye.", file=browser.stream)
except KeyboardInterrupt:
pass
# That's all, folks.
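# A session sketch for the browser above ("out.prof" is a placeholder):
#
#     $ python pstats.py out.prof
#     out.prof% sort cumulative
#     out.prof% stats 10
#     out.prof% quit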
"""Pseudo terminal utilities."""
# Bugs: No signal handling. Doesn't set slave termios and window size.
# Only tested on Linux.
# See: W. Richard Stevens. 1992. Advanced Programming in the
# UNIX Environment. Chapter 19.
# Author: Steen Lumholt -- with additions by Guido.
from select import select
import os
import tty
__all__ = ["openpty","fork","spawn"]
STDIN_FILENO = 0
STDOUT_FILENO = 1
STDERR_FILENO = 2
CHILD = 0
def openpty():
"""openpty() -> (master_fd, slave_fd)
Open a pty master/slave pair, using os.openpty() if possible."""
try:
return os.openpty()
except (AttributeError, OSError):
pass
master_fd, slave_name = _open_terminal()
slave_fd = slave_open(slave_name)
return master_fd, slave_fd
def master_open():
"""master_open() -> (master_fd, slave_name)
Open a pty master and return the fd, and the filename of the slave end.
Deprecated, use openpty() instead."""
try:
master_fd, slave_fd = os.openpty()
except (AttributeError, OSError):
pass
else:
slave_name = os.ttyname(slave_fd)
os.close(slave_fd)
return master_fd, slave_name
return _open_terminal()
def _open_terminal():
"""Open pty master and return (master_fd, tty_name)."""
for x in 'pqrstuvwxyzPQRST':
for y in '0123456789abcdef':
pty_name = '/dev/pty' + x + y
try:
fd = os.open(pty_name, os.O_RDWR)
except OSError:
continue
return (fd, '/dev/tty' + x + y)
raise OSError('out of pty devices')
def slave_open(tty_name):
"""slave_open(tty_name) -> slave_fd
Open the pty slave and acquire the controlling terminal, returning
the opened file descriptor.
Deprecated, use openpty() instead."""
result = os.open(tty_name, os.O_RDWR)
try:
from fcntl import ioctl, I_PUSH
except ImportError:
return result
try:
ioctl(result, I_PUSH, "ptem")
ioctl(result, I_PUSH, "ldterm")
except OSError:
pass
return result
def fork():
"""fork() -> (pid, master_fd)
Fork and make the child a session leader with a controlling terminal."""
try:
pid, fd = os.forkpty()
except (AttributeError, OSError):
pass
else:
if pid == CHILD:
try:
os.setsid()
except OSError:
# os.forkpty() already set us session leader
pass
return pid, fd
master_fd, slave_fd = openpty()
pid = os.fork()
if pid == CHILD:
# Establish a new session.
os.setsid()
os.close(master_fd)
# Slave becomes stdin/stdout/stderr of child.
os.dup2(slave_fd, STDIN_FILENO)
os.dup2(slave_fd, STDOUT_FILENO)
os.dup2(slave_fd, STDERR_FILENO)
if (slave_fd > STDERR_FILENO):
os.close (slave_fd)
# Explicitly open the tty to make it become a controlling tty.
tmp_fd = os.open(os.ttyname(STDOUT_FILENO), os.O_RDWR)
os.close(tmp_fd)
else:
os.close(slave_fd)
# Parent and child process.
return pid, master_fd
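# A sketch of driving fork() directly (Unix only; "_example_fork" is a
# hypothetical helper, not part of this module): the child runs a command
# on the slave end while the parent relays output until EOF.
def _example_fork():
    pid, master_fd = fork()
    if pid == CHILD:
        os.execlp('echo', 'echo', 'hello from the pty')
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:  # slave side closed
            break
        if not data:
            break
        os.write(STDOUT_FILENO, data)
    os.waitpid(pid, 0)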
def _writen(fd, data):
"""Write all the data to a descriptor."""
while data:
n = os.write(fd, data)
data = data[n:]
def _read(fd):
"""Default read function."""
return os.read(fd, 1024)
def _copy(master_fd, master_read=_read, stdin_read=_read):
"""Parent copy loop.
Copies
pty master -> standard output (master_read)
standard input -> pty master (stdin_read)"""
fds = [master_fd, STDIN_FILENO]
while True:
rfds, wfds, xfds = select(fds, [], [])
if master_fd in rfds:
data = master_read(master_fd)
if not data: # Reached EOF.
fds.remove(master_fd)
else:
os.write(STDOUT_FILENO, data)
if STDIN_FILENO in rfds:
data = stdin_read(STDIN_FILENO)
if not data:
fds.remove(STDIN_FILENO)
else:
_writen(master_fd, data)
def spawn(argv, master_read=_read, stdin_read=_read):
"""Create a spawned process."""
if type(argv) == type(''):
argv = (argv,)
pid, master_fd = fork()
if pid == CHILD:
os.execlp(argv[0], *argv)
try:
mode = tty.tcgetattr(STDIN_FILENO)
tty.setraw(STDIN_FILENO)
restore = 1
except tty.error: # This is the same as termios.error
restore = 0
try:
_copy(master_fd, master_read, stdin_read)
except OSError:
if restore:
tty.tcsetattr(STDIN_FILENO, tty.TCSAFLUSH, mode)
os.close(master_fd)
return os.waitpid(pid, 0)[1]
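# spawn() usage sketch (Unix only): run a command on a fresh pty with the
# calling terminal in raw mode, then recover the exit status:
#
#     status = spawn(['ls', '-l'])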
"""Parse a Python module and describe its classes and methods.
Parse enough of a Python file to recognize imports and class and
method definitions, and to find out the superclasses of a class.
The interface consists of a single function:
readmodule_ex(module [, path])
where module is the name of a Python module, and path is an optional
list of directories where the module is to be searched. If present,
path is prepended to the system search path sys.path. The return
value is a dictionary. The keys of the dictionary are the names of
the classes defined in the module (including classes that are defined
via the from XXX import YYY construct). The values are class
instances of the class Class defined here. One special key/value pair
is present for packages: the key '__path__' has a list as its value
which contains the package search path.
A class is described by the class Class in this module. Instances
of this class have the following instance variables:
module -- the module name
name -- the name of the class
super -- a list of super classes (Class instances)
methods -- a dictionary of methods
file -- the file in which the class was defined
lineno -- the line in the file on which the class statement occurred
The dictionary of methods uses the method names as keys and the line
numbers on which the method was defined as values.
If the name of a super class is not recognized, the corresponding
entry in the list of super classes is not a class instance but a
string giving the name of the super class. Since import statements
are recognized and imported modules are scanned as well, this
shouldn't happen often.
A function is described by the class Function in this module.
Instances of this class have the following instance variables:
module -- the module name
name -- the name of the function
file -- the file in which the function was defined
lineno -- the line in the file on which the def statement occurred
"""
import io
import os
import sys
import importlib.util
import tokenize
from token import NAME, DEDENT, OP
from operator import itemgetter
__all__ = ["readmodule", "readmodule_ex", "Class", "Function"]
_modules = {} # cache of modules we've seen
# each Python class is represented by an instance of this class
class Class:
'''Class to represent a Python class.'''
def __init__(self, module, name, super, file, lineno):
self.module = module
self.name = name
if super is None:
super = []
self.super = super
self.methods = {}
self.file = file
self.lineno = lineno
def _addmethod(self, name, lineno):
self.methods[name] = lineno
class Function:
'''Class to represent a top-level Python function'''
def __init__(self, module, name, file, lineno):
self.module = module
self.name = name
self.file = file
self.lineno = lineno
def readmodule(module, path=None):
'''Backwards compatible interface.
Call readmodule_ex() and then only keep Class objects from the
resulting dictionary.'''
res = {}
for key, value in _readmodule(module, path or []).items():
if isinstance(value, Class):
res[key] = value
return res
def readmodule_ex(module, path=None):
'''Read a module file and return a dictionary of classes.
Search for MODULE in PATH and sys.path, read and parse the
module and return a dictionary with one entry for each class
found in the module.
'''
return _readmodule(module, path or [])
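# A usage sketch ("_example_readmodule_ex" is a hypothetical helper; the
# stdlib "queue" module is just a convenient target):
def _example_readmodule_ex():
    for name, obj in readmodule_ex('queue').items():
        if isinstance(obj, Class):
            print(name, sorted(obj.methods))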
def _readmodule(module, path, inpackage=None):
'''Do the hard work for readmodule[_ex].
If INPACKAGE is given, it must be the dotted name of the package in
which we are searching for a submodule, and then PATH must be the
package search path; otherwise, we are searching for a top-level
module, and PATH is combined with sys.path.
'''
# Compute the full module name (prepending inpackage if set)
if inpackage is not None:
fullmodule = "%s.%s" % (inpackage, module)
else:
fullmodule = module
# Check in the cache
if fullmodule in _modules:
return _modules[fullmodule]
# Initialize the dict for this module's contents
dict = {}
# Check if it is a built-in module; we don't do much for these
if module in sys.builtin_module_names and inpackage is None:
_modules[module] = dict
return dict
# Check for a dotted module name
i = module.rfind('.')
if i >= 0:
package = module[:i]
submodule = module[i+1:]
parent = _readmodule(package, path, inpackage)
if inpackage is not None:
package = "%s.%s" % (inpackage, package)
if not '__path__' in parent:
raise ImportError('No package named {}'.format(package))
return _readmodule(submodule, parent['__path__'], package)
# Search the path for the module
f = None
if inpackage is not None:
search_path = path
else:
search_path = path + sys.path
# XXX This will change once issue19944 lands.
spec = importlib.util._find_spec_from_path(fullmodule, search_path)
fname = spec.loader.get_filename(fullmodule)
_modules[fullmodule] = dict
if spec.loader.is_package(fullmodule):
dict['__path__'] = [os.path.dirname(fname)]
try:
source = spec.loader.get_source(fullmodule)
if source is None:
return dict
except (AttributeError, ImportError):
# not Python source, can't do anything with this module
return dict
f = io.StringIO(source)
stack = [] # stack of (class, indent) pairs
g = tokenize.generate_tokens(f.readline)
try:
for tokentype, token, start, _end, _line in g:
if tokentype == DEDENT:
lineno, thisindent = start
# close nested classes and defs
while stack and stack[-1][1] >= thisindent:
del stack[-1]
elif token == 'def':
lineno, thisindent = start
# close previous nested classes and defs
while stack and stack[-1][1] >= thisindent:
del stack[-1]
tokentype, meth_name, start = next(g)[0:3]
if tokentype != NAME:
continue # Syntax error
if stack:
cur_class = stack[-1][0]
if isinstance(cur_class, Class):
# it's a method
cur_class._addmethod(meth_name, lineno)
# else it's a nested def
else:
# it's a function
dict[meth_name] = Function(fullmodule, meth_name,
fname, lineno)
stack.append((None, thisindent)) # Marker for nested fns
elif token == 'class':
lineno, thisindent = start
# close previous nested classes and defs
while stack and stack[-1][1] >= thisindent:
del stack[-1]
tokentype, class_name, start = next(g)[0:3]
if tokentype != NAME:
continue # Syntax error
# parse what follows the class name
tokentype, token, start = next(g)[0:3]
inherit = None
if token == '(':
names = [] # List of superclasses
# there's a list of superclasses
level = 1
super = [] # Tokens making up current superclass
while True:
tokentype, token, start = next(g)[0:3]
if token in (')', ',') and level == 1:
n = "".join(super)
if n in dict:
# we know this super class
n = dict[n]
else:
c = n.split('.')
if len(c) > 1:
# super class is of the form
# module.class: look in module for
# class
m = c[-2]
c = c[-1]
if m in _modules:
d = _modules[m]
if c in d:
n = d[c]
names.append(n)
super = []
if token == '(':
level += 1
elif token == ')':
level -= 1
if level == 0:
break
elif token == ',' and level == 1:
pass
# only use NAME and OP (== dot) tokens for type name
elif tokentype in (NAME, OP) and level == 1:
super.append(token)
# expressions in the base list are not supported
inherit = names
cur_class = Class(fullmodule, class_name, inherit,
fname, lineno)
if not stack:
dict[class_name] = cur_class
stack.append((cur_class, thisindent))
elif token == 'import' and start[1] == 0:
modules = _getnamelist(g)
for mod, _mod2 in modules:
try:
# Recursively read the imported module
if inpackage is None:
_readmodule(mod, path)
else:
try:
_readmodule(mod, path, inpackage)
except ImportError:
_readmodule(mod, [])
except:
# If we can't find or parse the imported module,
# too bad -- don't die here.
pass
elif token == 'from' and start[1] == 0:
mod, token = _getname(g)
if not mod or token != "import":
continue
names = _getnamelist(g)
try:
# Recursively read the imported module
d = _readmodule(mod, path, inpackage)
except:
# If we can't find or parse the imported module,
# too bad -- don't die here.
continue
# add any classes that were defined in the imported module
# to our name space if they were mentioned in the list
for n, n2 in names:
if n in d:
dict[n2 or n] = d[n]
elif n == '*':
# don't add names that start with _
for n in d:
if n[0] != '_':
dict[n] = d[n]
except StopIteration:
pass
f.close()
return dict
def _getnamelist(g):
# Helper to get a comma-separated list of dotted names plus 'as'
# clauses. Return a list of pairs (name, name2) where name2 is
# the 'as' name, or None if there is no 'as' clause.
names = []
while True:
name, token = _getname(g)
if not name:
break
if token == 'as':
name2, token = _getname(g)
else:
name2 = None
names.append((name, name2))
while token != "," and "\n" not in token:
token = next(g)[1]
if token != ",":
break
return names
def _getname(g):
# Helper to get a dotted name, return a pair (name, token) where
# name is the dotted name, or None if there was no dotted name,
# and token is the next input token.
parts = []
tokentype, token = next(g)[0:2]
if tokentype != NAME and token != '*':
return (None, token)
parts.append(token)
while True:
tokentype, token = next(g)[0:2]
if token != '.':
break
tokentype, token = next(g)[0:2]
if tokentype != NAME:
break
parts.append(token)
return (".".join(parts), token)
def _main():
# Main program for testing.
import os
mod = sys.argv[1]
if os.path.exists(mod):
path = [os.path.dirname(mod)]
mod = os.path.basename(mod)
if mod.lower().endswith(".py"):
mod = mod[:-3]
else:
path = []
dict = readmodule_ex(mod, path)
objs = list(dict.values())
objs.sort(key=lambda a: getattr(a, 'lineno', 0))
for obj in objs:
if isinstance(obj, Class):
print("class", obj.name, obj.super, obj.lineno)
methods = sorted(obj.methods.items(), key=itemgetter(1))
for name, lineno in methods:
if name != "__path__":
print(" def", name, lineno)
elif isinstance(obj, Function):
print("def", obj.name, obj.lineno)
if __name__ == "__main__":
_main()
#!/usr/bin/env python3
"""Generate Python documentation in HTML or text for interactive use.
At the Python interactive prompt, calling help(thing) on a Python object
documents the object, and calling help() starts up an interactive
help session.
Or, at the shell command line outside of Python:
Run "pydoc <name>" to show documentation on something. <name> may be
the name of a function, module, package, or a dotted reference to a
class or function within a module or module in a package. If the
argument contains a path segment delimiter (e.g. slash on Unix,
backslash on Windows) it is treated as the path to a Python source file.
Run "pydoc -k <keyword>" to search for a keyword in the synopsis lines
of all available modules.
Run "pydoc -p <port>" to start an HTTP server on the given port on the
local machine. Port number 0 can be used to get an arbitrary unused port.
Run "pydoc -b" to start an HTTP server on an arbitrary unused port and
open a Web browser to interactively browse documentation. The -p option
can be used with the -b option to explicitly specify the server port.
Run "pydoc -w <name>" to write out the HTML documentation for a module
to a file named "<name>.html".
Module docs for core modules are assumed to be in
http://docs.python.org/X.Y/library/
This can be overridden by setting the PYTHONDOCS environment variable
to a different URL or to a local directory containing the Library
Reference Manual pages.
"""
__all__ = ['help']
__author__ = "Ka-Ping Yee <[email protected]>"
__date__ = "26 February 2001"
__credits__ = """Guido van Rossum, for an excellent programming language.
Tommy Burnette, the original creator of manpy.
Paul Prescod, for all his work on onlinehelp.
Richard Chamberlain, for the first implementation of textdoc.
"""
# Known bugs that can't be fixed here:
# - synopsis() cannot be prevented from clobbering existing
# loaded modules.
# - If the __file__ attribute on a module is a relative path and
# the current directory is changed with os.chdir(), an incorrect
# path will be displayed.
import builtins
import importlib._bootstrap
import importlib.machinery
import importlib.util
import inspect
import io
import os
import pkgutil
import platform
import re
import sys
import time
import tokenize
import urllib.parse
import warnings
from collections import deque
from reprlib import Repr
from traceback import format_exception_only
# --------------------------------------------------------- common routines
def pathdirs():
"""Convert sys.path into a list of absolute, existing, unique paths."""
dirs = []
normdirs = []
for dir in sys.path:
dir = os.path.abspath(dir or '.')
normdir = os.path.normcase(dir)
if normdir not in normdirs and os.path.isdir(dir):
dirs.append(dir)
normdirs.append(normdir)
return dirs
def getdoc(object):
"""Get the doc string or comments for an object."""
result = inspect.getdoc(object) or inspect.getcomments(object)
return result and re.sub('^ *\n', '', result.rstrip()) or ''
def splitdoc(doc):
"""Split a doc string into a synopsis line (if any) and the rest."""
lines = doc.strip().split('\n')
if len(lines) == 1:
return lines[0], ''
elif len(lines) >= 2 and not lines[1].rstrip():
return lines[0], '\n'.join(lines[2:])
return '', '\n'.join(lines)
def classname(object, modname):
"""Get a class name and qualify it with a module name if necessary."""
name = object.__name__
if object.__module__ != modname:
name = object.__module__ + '.' + name
return name
def isdata(object):
"""Check if an object is of a type that probably means it's data."""
return not (inspect.ismodule(object) or inspect.isclass(object) or
inspect.isroutine(object) or inspect.isframe(object) or
inspect.istraceback(object) or inspect.iscode(object))
def replace(text, *pairs):
"""Do a series of global replacements on a string."""
while pairs:
text = pairs[1].join(text.split(pairs[0]))
pairs = pairs[2:]
return text
def cram(text, maxlen):
"""Omit part of a string if needed to make it fit in a maximum length."""
if len(text) > maxlen:
pre = max(0, (maxlen-3)//2)
post = max(0, maxlen-3-pre)
return text[:pre] + '...' + text[len(text)-post:]
return text
_re_stripid = re.compile(r' at 0x[0-9a-f]{6,16}(>+)$', re.IGNORECASE)
def stripid(text):
"""Remove the hexadecimal id from a Python object representation."""
# The behaviour of %p is implementation-dependent in terms of case.
return _re_stripid.sub(r'\1', text)
def _is_some_method(obj):
return (inspect.isfunction(obj) or
inspect.ismethod(obj) or
inspect.isbuiltin(obj) or
inspect.ismethoddescriptor(obj))
def _is_bound_method(fn):
"""
Returns True if fn is a bound method, regardless of whether
fn was implemented in Python or in C.
"""
if inspect.ismethod(fn):
return True
if inspect.isbuiltin(fn):
self = getattr(fn, '__self__', None)
return not (inspect.ismodule(self) or (self is None))
return False
def allmethods(cl):
methods = {}
for key, value in inspect.getmembers(cl, _is_some_method):
methods[key] = 1
for base in cl.__bases__:
methods.update(allmethods(base)) # all your base are belong to us
for key in methods.keys():
methods[key] = getattr(cl, key)
return methods
def _split_list(s, predicate):
"""Split sequence s via predicate, and return pair ([true], [false]).
The return value is a 2-tuple of lists,
([x for x in s if predicate(x)],
[x for x in s if not predicate(x)])
"""
yes = []
no = []
for x in s:
if predicate(x):
yes.append(x)
else:
no.append(x)
return yes, no
def visiblename(name, all=None, obj=None):
"""Decide whether to show documentation on a variable."""
# Certain special names are redundant or internal.
# XXX Remove __initializing__?
if name in {'__author__', '__builtins__', '__cached__', '__credits__',
'__date__', '__doc__', '__file__', '__spec__',
'__loader__', '__module__', '__name__', '__package__',
'__path__', '__qualname__', '__slots__', '__version__'}:
return 0
# Private names are hidden, but special names are displayed.
if name.startswith('__') and name.endswith('__'): return 1
# Namedtuples have public fields and methods with a single leading underscore
if name.startswith('_') and hasattr(obj, '_fields'):
return True
if all is not None:
# only document that which the programmer exported in __all__
return name in all
else:
return not name.startswith('_')
def classify_class_attrs(object):
"""Wrap inspect.classify_class_attrs, with fixup for data descriptors."""
results = []
for (name, kind, cls, value) in inspect.classify_class_attrs(object):
if inspect.isdatadescriptor(value):
kind = 'data descriptor'
results.append((name, kind, cls, value))
return results
# ----------------------------------------------------- module manipulation
def ispackage(path):
"""Guess whether a path refers to a package directory."""
if os.path.isdir(path):
for ext in ('.py', '.pyc', '.pyo'):
if os.path.isfile(os.path.join(path, '__init__' + ext)):
return True
return False
def source_synopsis(file):
line = file.readline()
while line[:1] == '#' or not line.strip():
line = file.readline()
if not line: break
line = line.strip()
if line[:4] == 'r"""': line = line[1:]
if line[:3] == '"""':
line = line[3:]
if line[-1:] == '\\': line = line[:-1]
while not line.strip():
line = file.readline()
if not line: break
result = line.split('"""')[0].strip()
else: result = None
return result
def synopsis(filename, cache={}):
"""Get the one-line summary out of a module file."""
mtime = os.stat(filename).st_mtime
lastupdate, result = cache.get(filename, (None, None))
if lastupdate is None or lastupdate < mtime:
# Look for binary suffixes first, falling back to source.
if filename.endswith(tuple(importlib.machinery.BYTECODE_SUFFIXES)):
loader_cls = importlib.machinery.SourcelessFileLoader
elif filename.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
loader_cls = importlib.machinery.ExtensionFileLoader
else:
loader_cls = None
# Now handle the choice.
if loader_cls is None:
# Must be a source file.
try:
file = tokenize.open(filename)
except OSError:
# module can't be opened, so skip it
return None
# text modules can be directly examined
with file:
result = source_synopsis(file)
else:
# Must be a binary module, which has to be imported.
loader = loader_cls('__temp__', filename)
# XXX We probably don't need to pass in the loader here.
spec = importlib.util.spec_from_file_location('__temp__', filename,
loader=loader)
_spec = importlib._bootstrap._SpecMethods(spec)
try:
module = _spec.load()
except:
return None
del sys.modules['__temp__']
result = module.__doc__.splitlines()[0] if module.__doc__ else None
# Cache the result.
cache[filename] = (mtime, result)
return result
class ErrorDuringImport(Exception):
"""Errors that occurred while trying to import something to document it."""
def __init__(self, filename, exc_info):
self.filename = filename
self.exc, self.value, self.tb = exc_info
def __str__(self):
exc = self.exc.__name__
return 'problem in %s - %s: %s' % (self.filename, exc, self.value)
def importfile(path):
"""Import a Python source file or compiled file given its path."""
magic = importlib.util.MAGIC_NUMBER
with open(path, 'rb') as file:
is_bytecode = magic == file.read(len(magic))
filename = os.path.basename(path)
name, ext = os.path.splitext(filename)
if is_bytecode:
loader = importlib._bootstrap.SourcelessFileLoader(name, path)
else:
loader = importlib._bootstrap.SourceFileLoader(name, path)
# XXX We probably don't need to pass in the loader here.
spec = importlib.util.spec_from_file_location(name, path, loader=loader)
_spec = importlib._bootstrap._SpecMethods(spec)
try:
return _spec.load()
except:
raise ErrorDuringImport(path, sys.exc_info())
def safeimport(path, forceload=0, cache={}):
"""Import a module; handle errors; return None if the module isn't found.
If the module *is* found but an exception occurs, it's wrapped in an
ErrorDuringImport exception and reraised. Unlike __import__, if a
package path is specified, the module at the end of the path is returned,
not the package at the beginning. If the optional 'forceload' argument
is 1, we reload the module from disk (unless it's a dynamic extension)."""
try:
# If forceload is 1 and the module has been previously loaded from
# disk, we always have to reload the module. Checking the file's
# mtime isn't good enough (e.g. the module could contain a class
# that inherits from another module that has changed).
if forceload and path in sys.modules:
if path not in sys.builtin_module_names:
# Remove the module from sys.modules and re-import to try
# and avoid problems with partially loaded modules.
# Also remove any submodules because they won't appear
# in the newly loaded module's namespace if they're already
# in sys.modules.
subs = [m for m in sys.modules if m.startswith(path + '.')]
for key in [path] + subs:
# Prevent garbage collection.
cache[key] = sys.modules[key]
del sys.modules[key]
module = __import__(path)
except:
# Did the error occur before or after the module was found?
(exc, value, tb) = info = sys.exc_info()
if path in sys.modules:
# An error occurred while executing the imported module.
raise ErrorDuringImport(sys.modules[path].__file__, info)
elif exc is SyntaxError:
# A SyntaxError occurred before we could execute the module.
raise ErrorDuringImport(value.filename, info)
elif exc is ImportError and value.name == path:
# No such module in the path.
return None
else:
# Some other error occurred during the importing process.
raise ErrorDuringImport(path, sys.exc_info())
for part in path.split('.')[1:]:
try: module = getattr(module, part)
except AttributeError: return None
return module
# ---------------------------------------------------- formatter base class
class Doc:
PYTHONDOCS = os.environ.get("PYTHONDOCS",
"http://docs.python.org/%d.%d/library"
% sys.version_info[:2])
def document(self, object, name=None, *args):
"""Generate documentation for an object."""
args = (object, name) + args
# 'try' clause is to attempt to handle the possibility that inspect
# identifies something in a way that pydoc itself has issues handling;
# think 'super' and how it is a descriptor (which raises the exception
# by lacking a __name__ attribute) and an instance.
if inspect.isgetsetdescriptor(object): return self.docdata(*args)
if inspect.ismemberdescriptor(object): return self.docdata(*args)
try:
if inspect.ismodule(object): return self.docmodule(*args)
if inspect.isclass(object): return self.docclass(*args)
if inspect.isroutine(object): return self.docroutine(*args)
except AttributeError:
pass
if isinstance(object, property): return self.docproperty(*args)
return self.docother(*args)
def fail(self, object, name=None, *args):
"""Raise an exception for unimplemented types."""
message = "don't know how to document object%s of type %s" % (
name and ' ' + repr(name), type(object).__name__)
raise TypeError(message)
docmodule = docclass = docroutine = docother = docproperty = docdata = fail
def getdocloc(self, object):
"""Return the location of module docs or None"""
try:
file = inspect.getabsfile(object)
except TypeError:
file = '(built-in)'
docloc = os.environ.get("PYTHONDOCS", self.PYTHONDOCS)
basedir = os.path.join(sys.base_exec_prefix, "lib",
"python%d.%d" % sys.version_info[:2])
if (isinstance(object, type(os)) and
(object.__name__ in ('errno', 'exceptions', 'gc', 'imp',
'marshal', 'posix', 'signal', 'sys',
'_thread', 'zipimport') or
(file.startswith(basedir) and
not file.startswith(os.path.join(basedir, 'site-packages')))) and
object.__name__ not in ('xml.etree', 'test.pydoc_mod')):
if docloc.startswith("http://"):
docloc = "%s/%s" % (docloc.rstrip("/"), object.__name__)
else:
docloc = os.path.join(docloc, object.__name__ + ".html")
else:
docloc = None
return docloc
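# A minimal sketch of how Doc.document dispatches: subclasses override the
# doc* hooks they support and inherit Doc.fail for the rest. OneLineDoc and
# _demo_doc_dispatch are illustrative names, not part of pydoc.
def _demo_doc_dispatch():
    class OneLineDoc(Doc):
        def docother(self, object, name=None, *args):
            return '%s = %r' % (name, object)
    return OneLineDoc().document(42, 'answer')  # -> "answer = 42"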
# -------------------------------------------- HTML documentation generator
class HTMLRepr(Repr):
"""Class for safely making an HTML representation of a Python object."""
def __init__(self):
Repr.__init__(self)
self.maxlist = self.maxtuple = 20
self.maxdict = 10
self.maxstring = self.maxother = 100
def escape(self, text):
return replace(text, '&', '&', '<', '<', '>', '>')
def repr(self, object):
return Repr.repr(self, object)
def repr1(self, x, level):
if hasattr(type(x), '__name__'):
methodname = 'repr_' + '_'.join(type(x).__name__.split())
if hasattr(self, methodname):
return getattr(self, methodname)(x, level)
return self.escape(cram(stripid(repr(x)), self.maxother))
def repr_string(self, x, level):
test = cram(x, self.maxstring)
testrepr = repr(test)
if '\\' in test and '\\' not in replace(testrepr, r'\\', ''):
# Backslashes are only literal in the string and are never
# needed to make any special characters, so show a raw string.
return 'r' + testrepr[0] + self.escape(test) + testrepr[0]
return re.sub(r'((\\[\\abfnrtv\'"]|\\[0-9]..|\\x..|\\u....)+)',
r'<font color="#c040c0">\1</font>',
self.escape(testrepr))
repr_str = repr_string
def repr_instance(self, x, level):
try:
return self.escape(cram(stripid(repr(x)), self.maxstring))
except:
return self.escape('<%s instance>' % x.__class__.__name__)
repr_unicode = repr_string
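# Usage sketch: HTMLRepr escapes markup-significant characters and truncates
# long values, so arbitrary reprs can be embedded safely in generated pages.
def _demo_htmlrepr():
    r = HTMLRepr()
    short = r.repr('<b> & </b>')       # <, > and & come back as entities
    long_ = r.repr(list(range(1000)))  # truncated at the maxlist limit
    return short, long_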
class HTMLDoc(Doc):
"""Formatter class for HTML documentation."""
# ------------------------------------------- HTML formatting utilities
_repr_instance = HTMLRepr()
repr = _repr_instance.repr
escape = _repr_instance.escape
def page(self, title, contents):
"""Format an HTML page."""
return '''\
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Python: %s</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body bgcolor="#f0f0f8">
%s
</body></html>''' % (title, contents)
def heading(self, title, fgcol, bgcol, extras=''):
"""Format a page heading."""
return '''
<table width="100%%" cellspacing=0 cellpadding=2 border=0 summary="heading">
<tr bgcolor="%s">
<td valign=bottom>&nbsp;<br>
<font color="%s" face="helvetica, arial">&nbsp;<br>%s</font></td
><td align=right valign=bottom
><font color="%s" face="helvetica, arial">%s</font></td></tr></table>
''' % (bgcol, fgcol, title, fgcol, extras or '&nbsp;')
def section(self, title, fgcol, bgcol, contents, width=6,
prelude='', marginalia=None, gap='&nbsp;'):
"""Format a section with a heading."""
if marginalia is None:
marginalia = '<tt>' + '&nbsp;' * width + '</tt>'
result = '''<p>
<table width="100%%" cellspacing=0 cellpadding=2 border=0 summary="section">
<tr bgcolor="%s">
<td colspan=3 valign=bottom>&nbsp;<br>
<font color="%s" face="helvetica, arial">%s</font></td></tr>
''' % (bgcol, fgcol, title)
if prelude:
result = result + '''
<tr bgcolor="%s"><td rowspan=2>%s</td>
<td colspan=2>%s</td></tr>
<tr><td>%s</td>''' % (bgcol, marginalia, prelude, gap)
else:
result = result + '''
<tr><td bgcolor="%s">%s</td><td>%s</td>''' % (bgcol, marginalia, gap)
return result + '\n<td width="100%%">%s</td></tr></table>' % contents
def bigsection(self, title, *args):
"""Format a section with a big heading."""
title = '<big><strong>%s</strong></big>' % title
return self.section(title, *args)
def preformat(self, text):
"""Format literal preformatted text."""
text = self.escape(text.expandtabs())
return replace(text, '\n\n', '\n \n', '\n\n', '\n \n',
' ', '&nbsp;', '\n', '<br>\n')
def multicolumn(self, list, format, cols=4):
"""Format a list of items into a multi-column list."""
result = ''
rows = (len(list)+cols-1)//cols
for col in range(cols):
result = result + '<td width="%d%%" valign=top>' % (100//cols)
for i in range(rows*col, rows*col+rows):
if i < len(list):
result = result + format(list[i]) + '<br>\n'
result = result + '</td>'
return '<table width="100%%" summary="list"><tr>%s</tr></table>' % result
def grey(self, text): return '<font color="#909090">%s</font>' % text
def namelink(self, name, *dicts):
"""Make a link for an identifier, given name-to-URL mappings."""
for dict in dicts:
if name in dict:
return '<a href="%s">%s</a>' % (dict[name], name)
return name
def classlink(self, object, modname):
"""Make a link for a class."""
name, module = object.__name__, sys.modules.get(object.__module__)
if hasattr(module, name) and getattr(module, name) is object:
return '<a href="%s.html#%s">%s</a>' % (
module.__name__, name, classname(object, modname))
return classname(object, modname)
def modulelink(self, object):
"""Make a link for a module."""
return '<a href="%s.html">%s</a>' % (object.__name__, object.__name__)
def modpkglink(self, modpkginfo):
"""Make a link for a module or package to display in an index."""
name, path, ispackage, shadowed = modpkginfo
if shadowed:
return self.grey(name)
if path:
url = '%s.%s.html' % (path, name)
else:
url = '%s.html' % name
if ispackage:
text = '<strong>%s</strong> (package)' % name
else:
text = name
return '<a href="%s">%s</a>' % (url, text)
def filelink(self, url, path):
"""Make a link to source file."""
return '<a href="file:%s">%s</a>' % (url, path)
def markup(self, text, escape=None, funcs={}, classes={}, methods={}):
"""Mark up some plain text, given a context of symbols to look for.
Each context dictionary maps object names to anchor names."""
escape = escape or self.escape
results = []
here = 0
pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
r'RFC[- ]?(\d+)|'
r'PEP[- ]?(\d+)|'
r'(self\.)?(\w+))')
while True:
match = pattern.search(text, here)
if not match: break
start, end = match.span()
results.append(escape(text[here:start]))
all, scheme, rfc, pep, selfdot, name = match.groups()
if scheme:
url = escape(all).replace('"', '&quot;')
results.append('<a href="%s">%s</a>' % (url, url))
elif rfc:
url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif pep:
url = 'http://www.python.org/dev/peps/pep-%04d/' % int(pep)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif selfdot:
# Create a link for methods like 'self.method(...)'
# and use <strong> for attributes like 'self.attr'
if text[end:end+1] == '(':
results.append('self.' + self.namelink(name, methods))
else:
results.append('self.<strong>%s</strong>' % name)
elif text[end:end+1] == '(':
results.append(self.namelink(name, methods, funcs, classes))
else:
results.append(self.namelink(name, classes))
here = end
results.append(escape(text[here:]))
return ''.join(results)
# ---------------------------------------------- type-specific routines
def formattree(self, tree, modname, parent=None):
"""Produce HTML for a class tree as given by inspect.getclasstree()."""
result = ''
for entry in tree:
if type(entry) is type(()):
c, bases = entry
result = result + '<dt><font face="helvetica, arial">'
result = result + self.classlink(c, modname)
if bases and bases != (parent,):
parents = []
for base in bases:
parents.append(self.classlink(base, modname))
result = result + '(' + ', '.join(parents) + ')'
result = result + '\n</font></dt>'
elif type(entry) is type([]):
result = result + '<dd>\n%s</dd>\n' % self.formattree(
entry, modname, c)
return '<dl>\n%s</dl>\n' % result
def docmodule(self, object, name=None, mod=None, *ignored):
"""Produce HTML documentation for a module object."""
name = object.__name__ # ignore the passed-in name
try:
all = object.__all__
except AttributeError:
all = None
parts = name.split('.')
links = []
for i in range(len(parts)-1):
links.append(
'<a href="%s.html"><font color="#ffffff">%s</font></a>' %
('.'.join(parts[:i+1]), parts[i]))
linkedname = '.'.join(links + parts[-1:])
head = '<big><big><strong>%s</strong></big></big>' % linkedname
try:
path = inspect.getabsfile(object)
url = urllib.parse.quote(path)
filelink = self.filelink(url, path)
except TypeError:
filelink = '(built-in)'
info = []
if hasattr(object, '__version__'):
version = str(object.__version__)
if version[:11] == '$' + 'Revision: ' and version[-1:] == '$':
version = version[11:-1].strip()
info.append('version %s' % self.escape(version))
if hasattr(object, '__date__'):
info.append(self.escape(str(object.__date__)))
if info:
head = head + ' (%s)' % ', '.join(info)
docloc = self.getdocloc(object)
if docloc is not None:
docloc = '<br><a href="%(docloc)s">Module Reference</a>' % locals()
else:
docloc = ''
result = self.heading(
head, '#ffffff', '#7799ee',
'<a href=".">index</a><br>' + filelink + docloc)
modules = inspect.getmembers(object, inspect.ismodule)
classes, cdict = [], {}
for key, value in inspect.getmembers(object, inspect.isclass):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None or
(inspect.getmodule(value) or object) is object):
if visiblename(key, all, object):
classes.append((key, value))
cdict[key] = cdict[value] = '#' + key
for key, value in classes:
for base in value.__bases__:
key, modname = base.__name__, base.__module__
module = sys.modules.get(modname)
if modname != name and module and hasattr(module, key):
if getattr(module, key) is base:
if key not in cdict:
cdict[key] = cdict[base] = modname + '.html#' + key
funcs, fdict = [], {}
for key, value in inspect.getmembers(object, inspect.isroutine):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None or
inspect.isbuiltin(value) or inspect.getmodule(value) is object):
if visiblename(key, all, object):
funcs.append((key, value))
fdict[key] = '#-' + key
if inspect.isfunction(value): fdict[value] = fdict[key]
data = []
for key, value in inspect.getmembers(object, isdata):
if visiblename(key, all, object):
data.append((key, value))
doc = self.markup(getdoc(object), self.preformat, fdict, cdict)
doc = doc and '<tt>%s</tt>' % doc
result = result + '<p>%s</p>\n' % doc
if hasattr(object, '__path__'):
modpkgs = []
for importer, modname, ispkg in pkgutil.iter_modules(object.__path__):
modpkgs.append((modname, name, ispkg, 0))
modpkgs.sort()
contents = self.multicolumn(modpkgs, self.modpkglink)
result = result + self.bigsection(
'Package Contents', '#ffffff', '#aa55cc', contents)
elif modules:
contents = self.multicolumn(
modules, lambda t: self.modulelink(t[1]))
result = result + self.bigsection(
'Modules', '#ffffff', '#aa55cc', contents)
if classes:
classlist = [value for (key, value) in classes]
contents = [
self.formattree(inspect.getclasstree(classlist, 1), name)]
for key, value in classes:
contents.append(self.document(value, key, name, fdict, cdict))
result = result + self.bigsection(
'Classes', '#ffffff', '#ee77aa', ' '.join(contents))
if funcs:
contents = []
for key, value in funcs:
contents.append(self.document(value, key, name, fdict, cdict))
result = result + self.bigsection(
'Functions', '#ffffff', '#eeaa77', ' '.join(contents))
if data:
contents = []
for key, value in data:
contents.append(self.document(value, key))
result = result + self.bigsection(
'Data', '#ffffff', '#55aa55', '<br>\n'.join(contents))
if hasattr(object, '__author__'):
contents = self.markup(str(object.__author__), self.preformat)
result = result + self.bigsection(
'Author', '#ffffff', '#7799ee', contents)
if hasattr(object, '__credits__'):
contents = self.markup(str(object.__credits__), self.preformat)
result = result + self.bigsection(
'Credits', '#ffffff', '#7799ee', contents)
return result
def docclass(self, object, name=None, mod=None, funcs={}, classes={},
*ignored):
"""Produce HTML documentation for a class object."""
realname = object.__name__
name = name or realname
bases = object.__bases__
contents = []
push = contents.append
# Cute little class to pump out a horizontal rule between sections.
class HorizontalRule:
def __init__(self):
self.needone = 0
def maybe(self):
if self.needone:
push('<hr>\n')
self.needone = 1
hr = HorizontalRule()
# List the mro, if non-trivial.
mro = deque(inspect.getmro(object))
if len(mro) > 2:
hr.maybe()
push('<dl><dt>Method resolution order:</dt>\n')
for base in mro:
push('<dd>%s</dd>\n' % self.classlink(base,
object.__module__))
push('</dl>\n')
def spill(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
try:
value = getattr(object, name)
except Exception:
# Some descriptors may meet a failure in their __get__.
# (bug #1785)
push(self._docdescriptor(name, value, mod))
else:
push(self.document(value, name, mod,
funcs, classes, mdict, object))
push('\n')
return attrs
def spilldescriptors(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
push(self._docdescriptor(name, value, mod))
return attrs
def spilldata(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
base = self.docother(getattr(object, name), name, mod)
if callable(value) or inspect.isdatadescriptor(value):
doc = getattr(value, "__doc__", None)
else:
doc = None
if doc is None:
push('<dl><dt>%s</dl>\n' % base)
else:
doc = self.markup(getdoc(value), self.preformat,
funcs, classes, mdict)
doc = '<dd><tt>%s</tt>' % doc
push('<dl><dt>%s%s</dl>\n' % (base, doc))
push('\n')
return attrs
attrs = [(name, kind, cls, value)
for name, kind, cls, value in classify_class_attrs(object)
if visiblename(name, obj=object)]
mdict = {}
for key, kind, homecls, value in attrs:
mdict[key] = anchor = '#' + name + '-' + key
try:
value = getattr(object, name)
except Exception:
# Some descriptors may meet a failure in their __get__.
# (bug #1785)
pass
try:
# The value may not be hashable (e.g., a data attr with
# a dict or list value).
mdict[value] = anchor
except TypeError:
pass
while attrs:
if mro:
thisclass = mro.popleft()
else:
thisclass = attrs[0][2]
attrs, inherited = _split_list(attrs, lambda t: t[2] is thisclass)
if thisclass is builtins.object:
attrs = inherited
continue
elif thisclass is object:
tag = 'defined here'
else:
tag = 'inherited from %s' % self.classlink(thisclass,
object.__module__)
tag += ':<br>\n'
# Sort attrs by name.
attrs.sort(key=lambda t: t[0])
# Pump out the attrs, segregated by kind.
attrs = spill('Methods %s' % tag, attrs,
lambda t: t[1] == 'method')
attrs = spill('Class methods %s' % tag, attrs,
lambda t: t[1] == 'class method')
attrs = spill('Static methods %s' % tag, attrs,
lambda t: t[1] == 'static method')
attrs = spilldescriptors('Data descriptors %s' % tag, attrs,
lambda t: t[1] == 'data descriptor')
attrs = spilldata('Data and other attributes %s' % tag, attrs,
lambda t: t[1] == 'data')
assert attrs == []
attrs = inherited
contents = ''.join(contents)
if name == realname:
title = '<a name="%s">class <strong>%s</strong></a>' % (
name, realname)
else:
title = '<strong>%s</strong> = <a name="%s">class %s</a>' % (
name, name, realname)
if bases:
parents = []
for base in bases:
parents.append(self.classlink(base, object.__module__))
title = title + '(%s)' % ', '.join(parents)
doc = self.markup(getdoc(object), self.preformat, funcs, classes, mdict)
doc = doc and '<tt>%s<br>&nbsp;</tt>' % doc
return self.section(title, '#000000', '#ffc8d8', contents, 3, doc)
def formatvalue(self, object):
"""Format an argument default value as text."""
return self.grey('=' + self.repr(object))
def docroutine(self, object, name=None, mod=None,
funcs={}, classes={}, methods={}, cl=None):
"""Produce HTML documentation for a function or method object."""
realname = object.__name__
name = name or realname
anchor = (cl and cl.__name__ or '') + '-' + name
note = ''
skipdocs = 0
if _is_bound_method(object):
imclass = object.__self__.__class__
if cl:
if imclass is not cl:
note = ' from ' + self.classlink(imclass, mod)
else:
if object.__self__ is not None:
note = ' method of %s instance' % self.classlink(
object.__self__.__class__, mod)
else:
note = ' unbound %s method' % self.classlink(imclass,mod)
if name == realname:
title = '<a name="%s"><strong>%s</strong></a>' % (anchor, realname)
else:
if (cl and realname in cl.__dict__ and
cl.__dict__[realname] is object):
reallink = '<a href="#%s">%s</a>' % (
cl.__name__ + '-' + realname, realname)
skipdocs = 1
else:
reallink = realname
title = '<a name="%s"><strong>%s</strong></a> = %s' % (
anchor, name, reallink)
argspec = None
if inspect.isroutine(object):
try:
signature = inspect.signature(object)
except (ValueError, TypeError):
signature = None
if signature:
argspec = str(signature)
if realname == '<lambda>':
title = '<strong>%s</strong> <em>lambda</em> ' % name
# XXX lambdas won't usually have func_annotations['return']
# since the syntax doesn't support it, but it is possible.
# So removing parentheses isn't truly safe.
argspec = argspec[1:-1] # remove parentheses
if not argspec:
argspec = '(...)'
decl = title + self.escape(argspec) + (note and self.grey(
'<font face="helvetica, arial">%s</font>' % note))
if skipdocs:
return '<dl><dt>%s</dt></dl>\n' % decl
else:
doc = self.markup(
getdoc(object), self.preformat, funcs, classes, methods)
doc = doc and '<dd><tt>%s</tt></dd>' % doc
return '<dl><dt>%s</dt>%s</dl>\n' % (decl, doc)
def _docdescriptor(self, name, value, mod):
results = []
push = results.append
if name:
push('<dl><dt><strong>%s</strong></dt>\n' % name)
if value.__doc__ is not None:
doc = self.markup(getdoc(value), self.preformat)
push('<dd><tt>%s</tt></dd>\n' % doc)
push('</dl>\n')
return ''.join(results)
def docproperty(self, object, name=None, mod=None, cl=None):
"""Produce html documentation for a property."""
return self._docdescriptor(name, object, mod)
def docother(self, object, name=None, mod=None, *ignored):
"""Produce HTML documentation for a data object."""
lhs = name and '<strong>%s</strong> = ' % name or ''
return lhs + self.repr(object)
def docdata(self, object, name=None, mod=None, cl=None):
"""Produce html documentation for a data descriptor."""
return self._docdescriptor(name, object, mod)
def index(self, dir, shadowed=None):
"""Generate an HTML index for a directory of modules."""
modpkgs = []
if shadowed is None: shadowed = {}
for importer, name, ispkg in pkgutil.iter_modules([dir]):
if any((0xD800 <= ord(ch) <= 0xDFFF) for ch in name):
# ignore a module if its name contains a surrogate character
continue
modpkgs.append((name, '', ispkg, name in shadowed))
shadowed[name] = 1
modpkgs.sort()
contents = self.multicolumn(modpkgs, self.modpkglink)
return self.bigsection(dir, '#ffffff', '#ee77aa', contents)
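# Usage sketch: HTMLDoc drives the whole page pipeline; this mirrors what
# writedoc() below does for one module (json is just an example target).
def _demo_htmldoc():
    import json
    formatter = HTMLDoc()
    return formatter.page('json', formatter.document(json, 'json'))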
# -------------------------------------------- text documentation generator
class TextRepr(Repr):
"""Class for safely making a text representation of a Python object."""
def __init__(self):
Repr.__init__(self)
self.maxlist = self.maxtuple = 20
self.maxdict = 10
self.maxstring = self.maxother = 100
def repr1(self, x, level):
if hasattr(type(x), '__name__'):
methodname = 'repr_' + '_'.join(type(x).__name__.split())
if hasattr(self, methodname):
return getattr(self, methodname)(x, level)
return cram(stripid(repr(x)), self.maxother)
def repr_string(self, x, level):
test = cram(x, self.maxstring)
testrepr = repr(test)
if '\\' in test and '\\' not in replace(testrepr, r'\\', ''):
# Backslashes are only literal in the string and are never
# needed to make any special characters, so show a raw string.
return 'r' + testrepr[0] + test + testrepr[0]
return testrepr
repr_str = repr_string
def repr_instance(self, x, level):
try:
return cram(stripid(repr(x)), self.maxstring)
except:
return '<%s instance>' % x.__class__.__name__
class TextDoc(Doc):
"""Formatter class for text documentation."""
# ------------------------------------------- text formatting utilities
_repr_instance = TextRepr()
repr = _repr_instance.repr
def bold(self, text):
"""Format a string in bold by overstriking."""
return ''.join(ch + '\b' + ch for ch in text)
def indent(self, text, prefix=' '):
"""Indent text by prepending a given prefix to each line."""
if not text: return ''
lines = [prefix + line for line in text.split('\n')]
if lines: lines[-1] = lines[-1].rstrip()
return '\n'.join(lines)
def section(self, title, contents):
"""Format a section with a given heading."""
clean_contents = self.indent(contents).rstrip()
return self.bold(title) + '\n' + clean_contents + '\n\n'
# ---------------------------------------------- type-specific routines
def formattree(self, tree, modname, parent=None, prefix=''):
"""Render in text a class tree as returned by inspect.getclasstree()."""
result = ''
for entry in tree:
if type(entry) is type(()):
c, bases = entry
result = result + prefix + classname(c, modname)
if bases and bases != (parent,):
parents = (classname(c, modname) for c in bases)
result = result + '(%s)' % ', '.join(parents)
result = result + '\n'
elif type(entry) is type([]):
result = result + self.formattree(
entry, modname, c, prefix + ' ')
return result
def docmodule(self, object, name=None, mod=None):
"""Produce text documentation for a given module object."""
name = object.__name__ # ignore the passed-in name
synop, desc = splitdoc(getdoc(object))
result = self.section('NAME', name + (synop and ' - ' + synop))
all = getattr(object, '__all__', None)
docloc = self.getdocloc(object)
if docloc is not None:
result = result + self.section('MODULE REFERENCE', docloc + """
The following documentation is automatically generated from the Python
source files. It may be incomplete, incorrect or include features that
are considered implementation detail and may vary between Python
implementations. When in doubt, consult the module reference at the
location listed above.
""")
if desc:
result = result + self.section('DESCRIPTION', desc)
classes = []
for key, value in inspect.getmembers(object, inspect.isclass):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None
or (inspect.getmodule(value) or object) is object):
if visiblename(key, all, object):
classes.append((key, value))
funcs = []
for key, value in inspect.getmembers(object, inspect.isroutine):
# if __all__ exists, believe it. Otherwise use old heuristic.
if (all is not None or
inspect.isbuiltin(value) or inspect.getmodule(value) is object):
if visiblename(key, all, object):
funcs.append((key, value))
data = []
for key, value in inspect.getmembers(object, isdata):
if visiblename(key, all, object):
data.append((key, value))
modpkgs = []
modpkgs_names = set()
if hasattr(object, '__path__'):
for importer, modname, ispkg in pkgutil.iter_modules(object.__path__):
modpkgs_names.add(modname)
if ispkg:
modpkgs.append(modname + ' (package)')
else:
modpkgs.append(modname)
modpkgs.sort()
result = result + self.section(
'PACKAGE CONTENTS', '\n'.join(modpkgs))
# Detect submodules as sometimes created by C extensions
submodules = []
for key, value in inspect.getmembers(object, inspect.ismodule):
if value.__name__.startswith(name + '.') and key not in modpkgs_names:
submodules.append(key)
if submodules:
submodules.sort()
result = result + self.section(
'SUBMODULES', '\n'.join(submodules))
if classes:
classlist = [value for key, value in classes]
contents = [self.formattree(
inspect.getclasstree(classlist, 1), name)]
for key, value in classes:
contents.append(self.document(value, key, name))
result = result + self.section('CLASSES', '\n'.join(contents))
if funcs:
contents = []
for key, value in funcs:
contents.append(self.document(value, key, name))
result = result + self.section('FUNCTIONS', '\n'.join(contents))
if data:
contents = []
for key, value in data:
contents.append(self.docother(value, key, name, maxlen=70))
result = result + self.section('DATA', '\n'.join(contents))
if hasattr(object, '__version__'):
version = str(object.__version__)
if version[:11] == '$' + 'Revision: ' and version[-1:] == '$':
version = version[11:-1].strip()
result = result + self.section('VERSION', version)
if hasattr(object, '__date__'):
result = result + self.section('DATE', str(object.__date__))
if hasattr(object, '__author__'):
result = result + self.section('AUTHOR', str(object.__author__))
if hasattr(object, '__credits__'):
result = result + self.section('CREDITS', str(object.__credits__))
try:
file = inspect.getabsfile(object)
except TypeError:
file = '(built-in)'
result = result + self.section('FILE', file)
return result
def docclass(self, object, name=None, mod=None, *ignored):
"""Produce text documentation for a given class object."""
realname = object.__name__
name = name or realname
bases = object.__bases__
def makename(c, m=object.__module__):
return classname(c, m)
if name == realname:
title = 'class ' + self.bold(realname)
else:
title = self.bold(name) + ' = class ' + realname
if bases:
parents = map(makename, bases)
title = title + '(%s)' % ', '.join(parents)
doc = getdoc(object)
contents = doc and [doc + '\n'] or []
push = contents.append
# List the mro, if non-trivial.
mro = deque(inspect.getmro(object))
if len(mro) > 2:
push("Method resolution order:")
for base in mro:
push(' ' + makename(base))
push('')
# Cute little class to pump out a horizontal rule between sections.
class HorizontalRule:
def __init__(self):
self.needone = 0
def maybe(self):
if self.needone:
push('-' * 70)
self.needone = 1
hr = HorizontalRule()
def spill(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
try:
value = getattr(object, name)
except Exception:
# Some descriptors may meet a failure in their __get__.
# (bug #1785)
push(self._docdescriptor(name, value, mod))
else:
push(self.document(value,
name, mod, object))
return attrs
def spilldescriptors(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
push(self._docdescriptor(name, value, mod))
return attrs
def spilldata(msg, attrs, predicate):
ok, attrs = _split_list(attrs, predicate)
if ok:
hr.maybe()
push(msg)
for name, kind, homecls, value in ok:
if callable(value) or inspect.isdatadescriptor(value):
doc = getdoc(value)
else:
doc = None
try:
obj = getattr(object, name)
except AttributeError:
obj = homecls.__dict__[name]
push(self.docother(obj, name, mod, maxlen=70, doc=doc) +
'\n')
return attrs
attrs = [(name, kind, cls, value)
for name, kind, cls, value in classify_class_attrs(object)
if visiblename(name, obj=object)]
while attrs:
if mro:
thisclass = mro.popleft()
else:
thisclass = attrs[0][2]
attrs, inherited = _split_list(attrs, lambda t: t[2] is thisclass)
if thisclass is builtins.object:
attrs = inherited
continue
elif thisclass is object:
tag = "defined here"
else:
tag = "inherited from %s" % classname(thisclass,
object.__module__)
# Sort attrs by name.
attrs.sort()
# Pump out the attrs, segregated by kind.
attrs = spill("Methods %s:\n" % tag, attrs,
lambda t: t[1] == 'method')
attrs = spill("Class methods %s:\n" % tag, attrs,
lambda t: t[1] == 'class method')
attrs = spill("Static methods %s:\n" % tag, attrs,
lambda t: t[1] == 'static method')
attrs = spilldescriptors("Data descriptors %s:\n" % tag, attrs,
lambda t: t[1] == 'data descriptor')
attrs = spilldata("Data and other attributes %s:\n" % tag, attrs,
lambda t: t[1] == 'data')
assert attrs == []
attrs = inherited
contents = '\n'.join(contents)
if not contents:
return title + '\n'
return title + '\n' + self.indent(contents.rstrip(), ' | ') + '\n'
def formatvalue(self, object):
"""Format an argument default value as text."""
return '=' + self.repr(object)
def docroutine(self, object, name=None, mod=None, cl=None):
"""Produce text documentation for a function or method object."""
realname = object.__name__
name = name or realname
note = ''
skipdocs = 0
if _is_bound_method(object):
imclass = object.__self__.__class__
if cl:
if imclass is not cl:
note = ' from ' + classname(imclass, mod)
else:
if object.__self__ is not None:
note = ' method of %s instance' % classname(
object.__self__.__class__, mod)
else:
note = ' unbound %s method' % classname(imclass,mod)
if name == realname:
title = self.bold(realname)
else:
if (cl and realname in cl.__dict__ and
cl.__dict__[realname] is object):
skipdocs = 1
title = self.bold(name) + ' = ' + realname
argspec = None
if inspect.isroutine(object):
try:
signature = inspect.signature(object)
except (ValueError, TypeError):
signature = None
if signature:
argspec = str(signature)
if realname == '<lambda>':
title = self.bold(name) + ' lambda '
# XXX lambdas won't usually have func_annotations['return']
# since the syntax doesn't support it, but it is possible.
# So removing parentheses isn't truly safe.
argspec = argspec[1:-1] # remove parentheses
if not argspec:
argspec = '(...)'
decl = title + argspec + note
if skipdocs:
return decl + '\n'
else:
doc = getdoc(object) or ''
return decl + '\n' + (doc and self.indent(doc).rstrip() + '\n')
def _docdescriptor(self, name, value, mod):
results = []
push = results.append
if name:
push(self.bold(name))
push('\n')
doc = getdoc(value) or ''
if doc:
push(self.indent(doc))
push('\n')
return ''.join(results)
def docproperty(self, object, name=None, mod=None, cl=None):
"""Produce text documentation for a property."""
return self._docdescriptor(name, object, mod)
def docdata(self, object, name=None, mod=None, cl=None):
"""Produce text documentation for a data descriptor."""
return self._docdescriptor(name, object, mod)
def docother(self, object, name=None, mod=None, parent=None, maxlen=None, doc=None):
"""Produce text documentation for a data object."""
repr = self.repr(object)
if maxlen:
line = (name and name + ' = ' or '') + repr
chop = maxlen - len(line)
if chop < 0: repr = repr[:chop] + '...'
line = (name and self.bold(name) + ' = ' or '') + repr
if doc is not None:
line += '\n' + self.indent(str(doc))
return line
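# Usage sketch: TextDoc renders the same structure as HTMLDoc but as
# overstruck plain text, which is what the console help() shows.
def _demo_textdoc():
    import textwrap
    return TextDoc().docmodule(textwrap)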
class _PlainTextDoc(TextDoc):
"""Subclass of TextDoc which overrides string styling"""
def bold(self, text):
return text
# --------------------------------------------------------- user interfaces
def pager(text):
"""The first time this is called, determine what kind of pager to use."""
global pager
pager = getpager()
pager(text)
def getpager():
"""Decide what method to use for paging through text."""
if not hasattr(sys.stdin, "isatty"):
return plainpager
if not hasattr(sys.stdout, "isatty"):
return plainpager
if not sys.stdin.isatty() or not sys.stdout.isatty():
return plainpager
if 'PAGER' in os.environ:
if sys.platform == 'win32': # pipes completely broken in Windows
return lambda text: tempfilepager(plain(text), os.environ['PAGER'])
elif os.environ.get('TERM') in ('dumb', 'emacs'):
return lambda text: pipepager(plain(text), os.environ['PAGER'])
else:
return lambda text: pipepager(text, os.environ['PAGER'])
if os.environ.get('TERM') in ('dumb', 'emacs'):
return plainpager
if sys.platform == 'win32':
return lambda text: tempfilepager(plain(text), 'more <')
if hasattr(os, 'system') and os.system('(less) 2>/dev/null') == 0:
return lambda text: pipepager(text, 'less')
import tempfile
(fd, filename) = tempfile.mkstemp()
os.close(fd)
try:
if hasattr(os, 'system') and os.system('more "%s"' % filename) == 0:
return lambda text: pipepager(text, 'more')
else:
return ttypager
finally:
os.unlink(filename)
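# Usage sketch: getpager() probes stdin/stdout, PAGER, TERM and the platform
# once and hands back a callable; the strategy chosen varies by environment,
# so calling it may launch an external pager.
def _demo_getpager():
    show = getpager()
    show('line\n' * 100)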
def plain(text):
"""Remove boldface formatting from text."""
return re.sub('.\b', '', text)
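# Sketch of the bold/plain round trip: bold() interleaves backspaces for
# overstriking on terminals, and plain() strips them back out.
def _demo_overstrike():
    s = TextDoc().bold('NAME')  # 'N\x08N' 'A\x08A' ...
    assert plain(s) == 'NAME'
    return s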
def pipepager(text, cmd):
"""Page through text by feeding it to another program."""
import subprocess
proc = subprocess.Popen(cmd, shell=True, stdin=subprocess.PIPE)
try:
with io.TextIOWrapper(proc.stdin, errors='backslashreplace') as pipe:
try:
pipe.write(text)
except KeyboardInterrupt:
# We've hereby abandoned whatever text hasn't been written,
# but the pager is still in control of the terminal.
pass
except OSError:
pass # Ignore broken pipes caused by quitting the pager program.
while True:
try:
proc.wait()
break
except KeyboardInterrupt:
# Ignore ctrl-c like the pager itself does. Otherwise the pager is
# left running and the terminal is in raw mode and unusable.
pass
def tempfilepager(text, cmd):
"""Page through text by invoking a program on a temporary file."""
import tempfile
filename = tempfile.mktemp()
with open(filename, 'w', errors='backslashreplace') as file:
file.write(text)
try:
os.system(cmd + ' "' + filename + '"')
finally:
os.unlink(filename)
def _escape_stdout(text):
# Escape non-encodable characters to avoid encoding errors later
encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
return text.encode(encoding, 'backslashreplace').decode(encoding)
def ttypager(text):
"""Page through text on a text terminal."""
lines = plain(_escape_stdout(text)).split('\n')
try:
import tty
fd = sys.stdin.fileno()
old = tty.tcgetattr(fd)
tty.setcbreak(fd)
getchar = lambda: sys.stdin.read(1)
except (ImportError, AttributeError, io.UnsupportedOperation):
tty = None
getchar = lambda: sys.stdin.readline()[:-1][:1]
try:
try:
h = int(os.environ.get('LINES', 0))
except ValueError:
h = 0
if h <= 1:
h = 25
r = inc = h - 1
sys.stdout.write('\n'.join(lines[:inc]) + '\n')
while lines[r:]:
sys.stdout.write('-- more --')
sys.stdout.flush()
c = getchar()
if c in ('q', 'Q'):
sys.stdout.write('\r          \r')
break
elif c in ('\r', '\n'):
sys.stdout.write('\r          \r' + lines[r] + '\n')
r = r + 1
continue
if c in ('b', 'B', '\x1b'):
r = r - inc - inc
if r < 0: r = 0
sys.stdout.write('\n' + '\n'.join(lines[r:r+inc]) + '\n')
r = r + inc
finally:
if tty:
tty.tcsetattr(fd, tty.TCSAFLUSH, old)
def plainpager(text):
"""Simply print unformatted text. This is the ultimate fallback."""
sys.stdout.write(plain(_escape_stdout(text)))
def describe(thing):
"""Produce a short description of the given thing."""
if inspect.ismodule(thing):
if thing.__name__ in sys.builtin_module_names:
return 'built-in module ' + thing.__name__
if hasattr(thing, '__path__'):
return 'package ' + thing.__name__
else:
return 'module ' + thing.__name__
if inspect.isbuiltin(thing):
return 'built-in function ' + thing.__name__
if inspect.isgetsetdescriptor(thing):
return 'getset descriptor %s.%s.%s' % (
thing.__objclass__.__module__, thing.__objclass__.__name__,
thing.__name__)
if inspect.ismemberdescriptor(thing):
return 'member descriptor %s.%s.%s' % (
thing.__objclass__.__module__, thing.__objclass__.__name__,
thing.__name__)
if inspect.isclass(thing):
return 'class ' + thing.__name__
if inspect.isfunction(thing):
return 'function ' + thing.__name__
if inspect.ismethod(thing):
return 'method ' + thing.__name__
return type(thing).__name__
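# Usage sketch of describe() on a few kinds of objects (json is just an
# arbitrary stdlib example):
def _demo_describe():
    import json
    return (describe(json),              # 'package json'
            describe(json.dumps),        # 'function dumps'
            describe(json.JSONDecoder))  # 'class JSONDecoder'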
def locate(path, forceload=0):
"""Locate an object by name or dotted path, importing as necessary."""
parts = [part for part in path.split('.') if part]
module, n = None, 0
while n < len(parts):
nextmodule = safeimport('.'.join(parts[:n+1]), forceload)
if nextmodule: module, n = nextmodule, n + 1
else: break
if module:
object = module
else:
object = builtins
for part in parts[n:]:
try:
object = getattr(object, part)
except AttributeError:
return None
return object
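# Usage sketch: locate() walks a dotted path, importing as deep as it can
# and falling back to getattr for the remainder; failures yield None.
def _demo_locate():
    cls = locate('json.decoder.JSONDecoder')  # imports json.decoder on demand
    missing = locate('json.no_such_name')     # -> None rather than raising
    return cls, missing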
# --------------------------------------- interactive interpreter interface
text = TextDoc()
plaintext = _PlainTextDoc()
html = HTMLDoc()
def resolve(thing, forceload=0):
"""Given an object or a path to an object, get the object and its name."""
if isinstance(thing, str):
object = locate(thing, forceload)
if object is None:
raise ImportError('no Python documentation found for %r' % thing)
return object, thing
else:
name = getattr(thing, '__name__', None)
return thing, name if isinstance(name, str) else None
def render_doc(thing, title='Python Library Documentation: %s', forceload=0,
renderer=None):
"""Render text documentation, given an object or a path to an object."""
if renderer is None:
renderer = text
object, name = resolve(thing, forceload)
desc = describe(object)
module = inspect.getmodule(object)
if name and '.' in name:
desc += ' in ' + name[:name.rfind('.')]
elif module and module is not object:
desc += ' in module ' + module.__name__
if not (inspect.ismodule(object) or
inspect.isclass(object) or
inspect.isroutine(object) or
inspect.isgetsetdescriptor(object) or
inspect.ismemberdescriptor(object) or
isinstance(object, property)):
# If the passed object is a piece of data or an instance,
# document its available methods instead of its value.
object = type(object)
desc += ' object'
return title % desc + '\n\n' + renderer.document(object, name)
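# Usage sketch: render text docs for a dotted path without paging, using the
# unstyled renderer so no overstrike control characters appear.
def _demo_render_doc():
    return render_doc('json.dumps', renderer=plaintext)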
def doc(thing, title='Python Library Documentation: %s', forceload=0,
output=None):
"""Display text documentation, given an object or a path to an object."""
try:
if output is None:
pager(render_doc(thing, title, forceload))
else:
output.write(render_doc(thing, title, forceload, plaintext))
except (ImportError, ErrorDuringImport) as value:
print(value)
def writedoc(thing, forceload=0):
"""Write HTML documentation to a file in the current directory."""
try:
object, name = resolve(thing, forceload)
page = html.page(describe(object), html.document(object, name))
file = open(name + '.html', 'w', encoding='utf-8')
file.write(page)
file.close()
print('wrote', name + '.html')
except (ImportError, ErrorDuringImport) as value:
print(value)
def writedocs(dir, pkgpath='', done=None):
"""Write out HTML documentation for all modules in a directory tree."""
if done is None: done = {}
for importer, modname, ispkg in pkgutil.walk_packages([dir], pkgpath):
writedoc(modname)
return
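# Usage sketch: writedoc() resolves a name and drops <name>.html into the
# current directory; writedocs() repeats that over a directory tree.
def _demo_writedoc():
    writedoc('json')  # writes ./json.html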
class Helper:
# These dictionaries map a topic name to either an alias, or a tuple
# (label, seealso-items). The "label" is the label of the corresponding
# section in the .rst file under Doc/ and an index into the dictionary
# in pydoc_data/topics.py.
#
# CAUTION: if you change one of these dictionaries, be sure to adapt the
# list of needed labels in Doc/tools/pyspecific.py and
# regenerate the pydoc_data/topics.py file by running
# make pydoc-topics
# in Doc/ and copying the output file into the Lib/ directory.
keywords = {
'False': '',
'None': '',
'True': '',
'and': 'BOOLEAN',
'as': 'with',
'assert': ('assert', ''),
'break': ('break', 'while for'),
'class': ('class', 'CLASSES SPECIALMETHODS'),
'continue': ('continue', 'while for'),
'def': ('function', ''),
'del': ('del', 'BASICMETHODS'),
'elif': 'if',
'else': ('else', 'while for'),
'except': 'try',
'finally': 'try',
'for': ('for', 'break continue while'),
'from': 'import',
'global': ('global', 'nonlocal NAMESPACES'),
'if': ('if', 'TRUTHVALUE'),
'import': ('import', 'MODULES'),
'in': ('in', 'SEQUENCEMETHODS'),
'is': 'COMPARISON',
'lambda': ('lambda', 'FUNCTIONS'),
'nonlocal': ('nonlocal', 'global NAMESPACES'),
'not': 'BOOLEAN',
'or': 'BOOLEAN',
'pass': ('pass', ''),
'raise': ('raise', 'EXCEPTIONS'),
'return': ('return', 'FUNCTIONS'),
'try': ('try', 'EXCEPTIONS'),
'while': ('while', 'break continue if TRUTHVALUE'),
'with': ('with', 'CONTEXTMANAGERS EXCEPTIONS yield'),
'yield': ('yield', ''),
}
# Either add symbols to this dictionary or to the symbols dictionary
# directly: Whichever is easier. They are merged later.
_symbols_inverse = {
'STRINGS' : ("'", "'''", "r'", "b'", '"""', '"', 'r"', 'b"'),
'OPERATORS' : ('+', '-', '*', '**', '/', '//', '%', '<<', '>>', '&',
'|', '^', '~', '<', '>', '<=', '>=', '==', '!=', '<>'),
'COMPARISON' : ('<', '>', '<=', '>=', '==', '!=', '<>'),
'UNARY' : ('-', '~'),
'AUGMENTEDASSIGNMENT' : ('+=', '-=', '*=', '/=', '%=', '&=', '|=',
'^=', '<<=', '>>=', '**=', '//='),
'BITWISE' : ('<<', '>>', '&', '|', '^', '~'),
'COMPLEX' : ('j', 'J')
}
symbols = {
'%': 'OPERATORS FORMATTING',
'**': 'POWER',
',': 'TUPLES LISTS FUNCTIONS',
'.': 'ATTRIBUTES FLOAT MODULES OBJECTS',
'...': 'ELLIPSIS',
':': 'SLICINGS DICTIONARYLITERALS',
'@': 'def class',
'\\': 'STRINGS',
'_': 'PRIVATENAMES',
'__': 'PRIVATENAMES SPECIALMETHODS',
'`': 'BACKQUOTES',
'(': 'TUPLES FUNCTIONS CALLS',
')': 'TUPLES FUNCTIONS CALLS',
'[': 'LISTS SUBSCRIPTS SLICINGS',
']': 'LISTS SUBSCRIPTS SLICINGS'
}
for topic, symbols_ in _symbols_inverse.items():
for symbol in symbols_:
topics = symbols.get(symbol, topic)
if topic not in topics:
topics = topics + ' ' + topic
symbols[symbol] = topics
topics = {
'TYPES': ('types', 'STRINGS UNICODE NUMBERS SEQUENCES MAPPINGS '
'FUNCTIONS CLASSES MODULES FILES inspect'),
'STRINGS': ('strings', 'str UNICODE SEQUENCES STRINGMETHODS '
'FORMATTING TYPES'),
'STRINGMETHODS': ('string-methods', 'STRINGS FORMATTING'),
'FORMATTING': ('formatstrings', 'OPERATORS'),
'UNICODE': ('strings', 'encodings unicode SEQUENCES STRINGMETHODS '
'FORMATTING TYPES'),
'NUMBERS': ('numbers', 'INTEGER FLOAT COMPLEX TYPES'),
'INTEGER': ('integers', 'int range'),
'FLOAT': ('floating', 'float math'),
'COMPLEX': ('imaginary', 'complex cmath'),
'SEQUENCES': ('typesseq', 'STRINGMETHODS FORMATTING range LISTS'),
'MAPPINGS': 'DICTIONARIES',
'FUNCTIONS': ('typesfunctions', 'def TYPES'),
'METHODS': ('typesmethods', 'class def CLASSES TYPES'),
'CODEOBJECTS': ('bltin-code-objects', 'compile FUNCTIONS TYPES'),
'TYPEOBJECTS': ('bltin-type-objects', 'types TYPES'),
'FRAMEOBJECTS': 'TYPES',
'TRACEBACKS': 'TYPES',
'NONE': ('bltin-null-object', ''),
'ELLIPSIS': ('bltin-ellipsis-object', 'SLICINGS'),
'SPECIALATTRIBUTES': ('specialattrs', ''),
'CLASSES': ('types', 'class SPECIALMETHODS PRIVATENAMES'),
'MODULES': ('typesmodules', 'import'),
'PACKAGES': 'import',
'EXPRESSIONS': ('operator-summary', 'lambda or and not in is BOOLEAN '
'COMPARISON BITWISE SHIFTING BINARY FORMATTING POWER '
'UNARY ATTRIBUTES SUBSCRIPTS SLICINGS CALLS TUPLES '
'LISTS DICTIONARIES'),
'OPERATORS': 'EXPRESSIONS',
'PRECEDENCE': 'EXPRESSIONS',
'OBJECTS': ('objects', 'TYPES'),
'SPECIALMETHODS': ('specialnames', 'BASICMETHODS ATTRIBUTEMETHODS '
'CALLABLEMETHODS SEQUENCEMETHODS MAPPINGMETHODS '
'NUMBERMETHODS CLASSES'),
'BASICMETHODS': ('customization', 'hash repr str SPECIALMETHODS'),
'ATTRIBUTEMETHODS': ('attribute-access', 'ATTRIBUTES SPECIALMETHODS'),
'CALLABLEMETHODS': ('callable-types', 'CALLS SPECIALMETHODS'),
'SEQUENCEMETHODS': ('sequence-types', 'SEQUENCES SEQUENCEMETHODS '
'SPECIALMETHODS'),
'MAPPINGMETHODS': ('sequence-types', 'MAPPINGS SPECIALMETHODS'),
'NUMBERMETHODS': ('numeric-types', 'NUMBERS AUGMENTEDASSIGNMENT '
'SPECIALMETHODS'),
'EXECUTION': ('execmodel', 'NAMESPACES DYNAMICFEATURES EXCEPTIONS'),
'NAMESPACES': ('naming', 'global nonlocal ASSIGNMENT DELETION DYNAMICFEATURES'),
'DYNAMICFEATURES': ('dynamic-features', ''),
'SCOPING': 'NAMESPACES',
'FRAMES': 'NAMESPACES',
'EXCEPTIONS': ('exceptions', 'try except finally raise'),
'CONVERSIONS': ('conversions', ''),
'IDENTIFIERS': ('identifiers', 'keywords SPECIALIDENTIFIERS'),
'SPECIALIDENTIFIERS': ('id-classes', ''),
'PRIVATENAMES': ('atom-identifiers', ''),
'LITERALS': ('atom-literals', 'STRINGS NUMBERS TUPLELITERALS '
'LISTLITERALS DICTIONARYLITERALS'),
'TUPLES': 'SEQUENCES',
'TUPLELITERALS': ('exprlists', 'TUPLES LITERALS'),
'LISTS': ('typesseq-mutable', 'LISTLITERALS'),
'LISTLITERALS': ('lists', 'LISTS LITERALS'),
'DICTIONARIES': ('typesmapping', 'DICTIONARYLITERALS'),
'DICTIONARYLITERALS': ('dict', 'DICTIONARIES LITERALS'),
'ATTRIBUTES': ('attribute-references', 'getattr hasattr setattr ATTRIBUTEMETHODS'),
'SUBSCRIPTS': ('subscriptions', 'SEQUENCEMETHODS'),
'SLICINGS': ('slicings', 'SEQUENCEMETHODS'),
'CALLS': ('calls', 'EXPRESSIONS'),
'POWER': ('power', 'EXPRESSIONS'),
'UNARY': ('unary', 'EXPRESSIONS'),
'BINARY': ('binary', 'EXPRESSIONS'),
'SHIFTING': ('shifting', 'EXPRESSIONS'),
'BITWISE': ('bitwise', 'EXPRESSIONS'),
'COMPARISON': ('comparisons', 'EXPRESSIONS BASICMETHODS'),
'BOOLEAN': ('booleans', 'EXPRESSIONS TRUTHVALUE'),
'ASSERTION': 'assert',
'ASSIGNMENT': ('assignment', 'AUGMENTEDASSIGNMENT'),
'AUGMENTEDASSIGNMENT': ('augassign', 'NUMBERMETHODS'),
'DELETION': 'del',
'RETURNING': 'return',
'IMPORTING': 'import',
'CONDITIONAL': 'if',
'LOOPING': ('compound', 'for while break continue'),
'TRUTHVALUE': ('truth', 'if while and or not BASICMETHODS'),
'DEBUGGING': ('debugger', 'pdb'),
'CONTEXTMANAGERS': ('context-managers', 'with'),
}
def __init__(self, input=None, output=None):
self._input = input
self._output = output
input = property(lambda self: self._input or sys.stdin)
output = property(lambda self: self._output or sys.stdout)
def __repr__(self):
if inspect.stack()[1][3] == '?':
self()
return ''
return '<pydoc.Helper instance>'
_GoInteractive = object()
def __call__(self, request=_GoInteractive):
if request is not self._GoInteractive:
self.help(request)
else:
self.intro()
self.interact()
self.output.write('''
You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)". Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.
''')
def interact(self):
self.output.write('\n')
while True:
try:
request = self.getline('help> ')
if not request: break
except (KeyboardInterrupt, EOFError):
break
request = replace(request, '"', '', "'", '').strip()
if request.lower() in ('q', 'quit'): break
self.help(request)
def getline(self, prompt):
"""Read one line, using input() when appropriate."""
if self.input is sys.stdin:
return input(prompt)
else:
self.output.write(prompt)
self.output.flush()
return self.input.readline()
def help(self, request):
if type(request) is type(''):
request = request.strip()
if request == 'help': self.intro()
elif request == 'keywords': self.listkeywords()
elif request == 'symbols': self.listsymbols()
elif request == 'topics': self.listtopics()
elif request == 'modules': self.listmodules()
elif request[:8] == 'modules ':
self.listmodules(request.split()[1])
elif request in self.symbols: self.showsymbol(request)
elif request in ['True', 'False', 'None']:
# special case these keywords since they are objects too
doc(eval(request), 'Help on %s:')
elif request in self.keywords: self.showtopic(request)
elif request in self.topics: self.showtopic(request)
elif request: doc(request, 'Help on %s:', output=self._output)
elif isinstance(request, Helper): self()
else: doc(request, 'Help on %s:', output=self._output)
self.output.write('\n')
def intro(self):
self.output.write('''
Welcome to Python %s's help utility!
If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/%s/tutorial/.
Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules. To quit this help utility and
return to the interpreter, just type "quit".
To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics". Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".
''' % tuple([sys.version[:3]]*2))
def list(self, items, columns=4, width=80):
items = list(sorted(items))
colw = width // columns
rows = (len(items) + columns - 1) // columns
for row in range(rows):
for col in range(columns):
i = col * rows + row
if i < len(items):
self.output.write(items[i])
if col < columns - 1:
self.output.write(' ' + ' ' * (colw - 1 - len(items[i])))
self.output.write('\n')
def listkeywords(self):
self.output.write('''
Here is a list of the Python keywords. Enter any keyword to get more help.
''')
self.list(self.keywords.keys())
def listsymbols(self):
self.output.write('''
Here is a list of the punctuation symbols which Python assigns special meaning
to. Enter any symbol to get more help.
''')
self.list(self.symbols.keys())
def listtopics(self):
self.output.write('''
Here is a list of available topics. Enter any topic name to get more help.
''')
self.list(self.topics.keys())
def showtopic(self, topic, more_xrefs=''):
try:
import pydoc_data.topics
except ImportError:
self.output.write('''
Sorry, topic and keyword documentation is not available because the
module "pydoc_data.topics" could not be found.
''')
return
target = self.topics.get(topic, self.keywords.get(topic))
if not target:
self.output.write('no documentation found for %s\n' % repr(topic))
return
if type(target) is type(''):
return self.showtopic(target, more_xrefs)
label, xrefs = target
try:
doc = pydoc_data.topics.topics[label]
except KeyError:
self.output.write('no documentation found for %s\n' % repr(topic))
return
pager(doc.strip() + '\n')
if more_xrefs:
xrefs = (xrefs or '') + ' ' + more_xrefs
if xrefs:
import textwrap
text = 'Related help topics: ' + ', '.join(xrefs.split()) + '\n'
wrapped_text = textwrap.wrap(text, 72)
self.output.write('\n%s\n' % ''.join(wrapped_text))
def _gettopic(self, topic, more_xrefs=''):
"""Return unbuffered tuple of (topic, xrefs).
If an error occurs here, the exception is caught and displayed by
the url handler.
This function duplicates the showtopic method but returns its
result directly so it can be formatted for display in an html page.
"""
try:
import pydoc_data.topics
except ImportError:
return ('''
Sorry, topic and keyword documentation is not available because the
module "pydoc_data.topics" could not be found.
''', '')
target = self.topics.get(topic, self.keywords.get(topic))
if not target:
raise ValueError('could not find topic')
if isinstance(target, str):
return self._gettopic(target, more_xrefs)
label, xrefs = target
doc = pydoc_data.topics.topics[label]
if more_xrefs:
xrefs = (xrefs or '') + ' ' + more_xrefs
return doc, xrefs
def showsymbol(self, symbol):
target = self.symbols[symbol]
topic, _, xrefs = target.partition(' ')
self.showtopic(topic, xrefs)
def listmodules(self, key=''):
if key:
self.output.write('''
Here is a list of modules whose name or summary contains '{}'.
If there are any, enter a module name to get more help.
'''.format(key))
apropos(key)
else:
self.output.write('''
Please wait a moment while I gather a list of all available modules...
''')
modules = {}
def callback(path, modname, desc, modules=modules):
if modname and modname[-9:] == '.__init__':
modname = modname[:-9] + ' (package)'
if modname.find('.') < 0:
modules[modname] = 1
def onerror(modname):
callback(None, modname, None)
ModuleScanner().run(callback, onerror=onerror)
self.list(modules.keys())
self.output.write('''
Enter any module name to get more help. Or, type "modules spam" to search
for modules whose name or summary contain the string "spam".
''')
help = Helper()
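# Usage sketch: Helper can be pointed at any file-like output instead of the
# interactive terminal, which is how the web interface below reuses it.
def _demo_helper():
    import io
    buf = io.StringIO()
    Helper(output=buf).help('keywords')  # keyword listing, captured
    return buf.getvalue()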
class ModuleScanner:
"""An interruptible scanner that searches module synopses."""
def run(self, callback, key=None, completer=None, onerror=None):
if key: key = key.lower()
self.quit = False
seen = {}
for modname in sys.builtin_module_names:
if modname != '__main__':
seen[modname] = 1
if key is None:
callback(None, modname, '')
else:
name = __import__(modname).__doc__ or ''
desc = name.split('\n')[0]
name = modname + ' - ' + desc
if name.lower().find(key) >= 0:
callback(None, modname, desc)
for importer, modname, ispkg in pkgutil.walk_packages(onerror=onerror):
if self.quit:
break
if key is None:
callback(None, modname, '')
else:
try:
spec = pkgutil._get_spec(importer, modname)
except SyntaxError:
# raised by tests for bad coding cookies or BOM
continue
loader = spec.loader
if hasattr(loader, 'get_source'):
try:
source = loader.get_source(modname)
except Exception:
if onerror:
onerror(modname)
continue
desc = source_synopsis(io.StringIO(source)) or ''
if hasattr(loader, 'get_filename'):
path = loader.get_filename(modname)
else:
path = None
else:
_spec = importlib._bootstrap._SpecMethods(spec)
try:
module = _spec.load()
except ImportError:
if onerror:
onerror(modname)
continue
desc = module.__doc__.splitlines()[0] if module.__doc__ else ''
path = getattr(module,'__file__',None)
name = modname + ' - ' + desc
if name.lower().find(key) >= 0:
callback(path, modname, desc)
if completer:
completer()
def apropos(key):
"""Print all the one-line module summaries that contain a substring."""
def callback(path, modname, desc):
if modname[-9:] == '.__init__':
modname = modname[:-9] + ' (package)'
print(modname, desc and '- ' + desc)
def onerror(modname):
pass
with warnings.catch_warnings():
warnings.filterwarnings('ignore') # ignore problems during import
ModuleScanner().run(callback, key, onerror=onerror)
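# Usage sketch: apropos() is the "pydoc -k" entry point; it prints one line
# per module whose name or one-line synopsis mentions the key.
def _demo_apropos():
    apropos('xml')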
# --------------------------------------- enhanced Web browser interface
def _start_server(urlhandler, port):
"""Start an HTTP server thread on a specific port.
Start an HTML/text server thread, so HTML or text documents can be
browsed dynamically and interactively with a Web browser. Example use:
>>> import time
>>> import pydoc
Define a URL handler. To determine what the client is asking
for, check the URL and content_type.
Then get or generate some text or HTML code and return it.
>>> def my_url_handler(url, content_type):
... text = 'the URL sent was: (%s, %s)' % (url, content_type)
... return text
Start server thread on port 0.
If you use port 0, the server will pick a random port number.
You can then use serverthread.port to get the port number.
>>> port = 0
>>> serverthread = pydoc._start_server(my_url_handler, port)
Check that the server is really started. If it is, open browser
and get first page. Use serverthread.url as the starting page.
>>> if serverthread.serving:
... import webbrowser
The next two lines are commented out so a browser doesn't open if
doctest is run on this module.
#... webbrowser.open(serverthread.url)
#True
Let the server do its thing. We just need to monitor its status.
Use time.sleep so the loop doesn't hog the CPU.
>>> starttime = time.time()
>>> timeout = 1 #seconds
This is a short timeout for testing purposes.
>>> while serverthread.serving:
... time.sleep(.01)
... if serverthread.serving and time.time() - starttime > timeout:
... serverthread.stop()
... break
Print any errors that may have occurred.
>>> print(serverthread.error)
None
"""
import http.server
import email.message
import select
import threading
class DocHandler(http.server.BaseHTTPRequestHandler):
def do_GET(self):
"""Process a request from an HTML browser.
The URL received is in self.path.
Get an HTML page from self.urlhandler and send it.
"""
if self.path.endswith('.css'):
content_type = 'text/css'
else:
content_type = 'text/html'
self.send_response(200)
self.send_header('Content-Type', '%s; charset=UTF-8' % content_type)
self.end_headers()
self.wfile.write(self.urlhandler(
self.path, content_type).encode('utf-8'))
def log_message(self, *args):
# Don't log messages.
pass
class DocServer(http.server.HTTPServer):
def __init__(self, port, callback):
self.host = 'localhost'
self.address = (self.host, port)
self.callback = callback
self.base.__init__(self, self.address, self.handler)
self.quit = False
def serve_until_quit(self):
while not self.quit:
rd, wr, ex = select.select([self.socket.fileno()], [], [], 1)
if rd:
self.handle_request()
self.server_close()
def server_activate(self):
self.base.server_activate(self)
if self.callback:
self.callback(self)
class ServerThread(threading.Thread):
def __init__(self, urlhandler, port):
self.urlhandler = urlhandler
self.port = int(port)
threading.Thread.__init__(self)
self.serving = False
self.error = None
def run(self):
"""Start the server."""
try:
DocServer.base = http.server.HTTPServer
DocServer.handler = DocHandler
DocHandler.MessageClass = email.message.Message
DocHandler.urlhandler = staticmethod(self.urlhandler)
docsvr = DocServer(self.port, self.ready)
self.docserver = docsvr
docsvr.serve_until_quit()
except Exception as e:
self.error = e
def ready(self, server):
self.serving = True
self.host = server.host
self.port = server.server_port
self.url = 'http://%s:%d/' % (self.host, self.port)
def stop(self):
"""Stop the server and this thread nicely"""
self.docserver.quit = True
self.serving = False
self.url = None
thread = ServerThread(urlhandler, port)
thread.start()
# Wait until thread.serving is True to make sure we are
# really up before returning.
while not thread.error and not thread.serving:
time.sleep(.01)
return thread
def _url_handler(url, content_type="text/html"):
"""The pydoc url handler for use with the pydoc server.
If the content_type is 'text/css', the _pydoc.css style
sheet is read and returned if it exists.
If the content_type is 'text/html', then the result of
get_html_page(url) is returned.
"""
class _HTMLDoc(HTMLDoc):
def page(self, title, contents):
"""Format an HTML page."""
css_path = "pydoc_data/_pydoc.css"
css_link = (
'<link rel="stylesheet" type="text/css" href="%s">' %
css_path)
return '''\
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head><title>Pydoc: %s</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
%s</head><body bgcolor="#f0f0f8">%s<div style="clear:both;padding-top:.5em;">%s</div>
</body></html>''' % (title, css_link, html_navbar(), contents)
def filelink(self, url, path):
return '<a href="getfile?key=%s">%s</a>' % (url, path)
html = _HTMLDoc()
def html_navbar():
version = html.escape("%s [%s, %s]" % (platform.python_version(),
platform.python_build()[0],
platform.python_compiler()))
return """
<div style='float:left'>
Python %s<br>%s
</div>
<div style='float:right'>
<div style='text-align:center'>
<a href="index.html">Module Index</a>
: <a href="topics.html">Topics</a>
: <a href="keywords.html">Keywords</a>
</div>
<div>
<form action="get" style='display:inline;'>
<input type=text name=key size=15>
<input type=submit value="Get">
</form>
<form action="search" style='display:inline;'>
<input type=text name=key size=15>
<input type=submit value="Search">
</form>
</div>
</div>
""" % (version, html.escape(platform.platform(terse=True)))
def html_index():
"""Module Index page."""
def bltinlink(name):
return '<a href="%s.html">%s</a>' % (name, name)
heading = html.heading(
'<big><big><strong>Index of Modules</strong></big></big>',
'#ffffff', '#7799ee')
names = [name for name in sys.builtin_module_names
if name != '__main__']
contents = html.multicolumn(names, bltinlink)
contents = [heading, '<p>' + html.bigsection(
'Built-in Modules', '#ffffff', '#ee77aa', contents)]
seen = {}
for dir in sys.path:
contents.append(html.index(dir, seen))
contents.append(
'<p align=right><font color="#909090" face="helvetica,'
'arial"><strong>pydoc</strong> by Ka-Ping Yee'
'&lt;ping@lfw.org&gt;</font>')
return 'Index of Modules', ''.join(contents)
def html_search(key):
"""Search results page."""
# scan for modules
search_result = []
def callback(path, modname, desc):
if modname[-9:] == '.__init__':
modname = modname[:-9] + ' (package)'
search_result.append((modname, desc and '- ' + desc))
with warnings.catch_warnings():
warnings.filterwarnings('ignore') # ignore problems during import
def onerror(modname):
pass
ModuleScanner().run(callback, key, onerror=onerror)
# format page
def bltinlink(name):
return '<a href="%s.html">%s</a>' % (name, name)
results = []
heading = html.heading(
'<big><big><strong>Search Results</strong></big></big>',
'#ffffff', '#7799ee')
for name, desc in search_result:
results.append(bltinlink(name) + desc)
contents = heading + html.bigsection(
'key = %s' % key, '#ffffff', '#ee77aa', '<br>'.join(results))
return 'Search Results', contents
def html_getfile(path):
"""Get and display a source file listing safely."""
path = urllib.parse.unquote(path)
with tokenize.open(path) as fp:
lines = html.escape(fp.read())
body = '<pre>%s</pre>' % lines
heading = html.heading(
'<big><big><strong>File Listing</strong></big></big>',
'#ffffff', '#7799ee')
contents = heading + html.bigsection(
'File: %s' % path, '#ffffff', '#ee77aa', body)
return 'getfile %s' % path, contents
def html_topics():
"""Index of topic texts available."""
def bltinlink(name):
return '<a href="topic?key=%s">%s</a>' % (name, name)
heading = html.heading(
'<big><big><strong>INDEX</strong></big></big>',
'#ffffff', '#7799ee')
names = sorted(Helper.topics.keys())
contents = html.multicolumn(names, bltinlink)
contents = heading + html.bigsection(
'Topics', '#ffffff', '#ee77aa', contents)
return 'Topics', contents
def html_keywords():
"""Index of keywords."""
heading = html.heading(
'<big><big><strong>INDEX</strong></big></big>',
'#ffffff', '#7799ee')
names = sorted(Helper.keywords.keys())
def bltinlink(name):
return '<a href="topic?key=%s">%s</a>' % (name, name)
contents = html.multicolumn(names, bltinlink)
contents = heading + html.bigsection(
'Keywords', '#ffffff', '#ee77aa', contents)
return 'Keywords', contents
def html_topicpage(topic):
"""Topic or keyword help page."""
buf = io.StringIO()
htmlhelp = Helper(buf, buf)
contents, xrefs = htmlhelp._gettopic(topic)
if topic in htmlhelp.keywords:
title = 'KEYWORD'
else:
title = 'TOPIC'
heading = html.heading(
'<big><big><strong>%s</strong></big></big>' % title,
'#ffffff', '#7799ee')
contents = '<pre>%s</pre>' % html.markup(contents)
contents = html.bigsection(topic , '#ffffff','#ee77aa', contents)
if xrefs:
xrefs = sorted(xrefs.split())
def bltinlink(name):
return '<a href="topic?key=%s">%s</a>' % (name, name)
xrefs = html.multicolumn(xrefs, bltinlink)
xrefs = html.section('Related help topics: ',
'#ffffff', '#ee77aa', xrefs)
return ('%s %s' % (title, topic),
''.join((heading, contents, xrefs)))
def html_getobj(url):
obj = locate(url, forceload=1)
if obj is None and url != 'None':
raise ValueError('could not find object')
title = describe(obj)
content = html.document(obj, url)
return title, content
def html_error(url, exc):
heading = html.heading(
'<big><big><strong>Error</strong></big></big>',
'#ffffff', '#7799ee')
contents = '<br>'.join(html.escape(line) for line in
format_exception_only(type(exc), exc))
contents = heading + html.bigsection(url, '#ffffff', '#bb0000',
contents)
return "Error - %s" % url, contents
def get_html_page(url):
"""Generate an HTML page for url."""
complete_url = url
if url.endswith('.html'):
url = url[:-5]
try:
if url in ("", "index"):
title, content = html_index()
elif url == "topics":
title, content = html_topics()
elif url == "keywords":
title, content = html_keywords()
elif '=' in url:
op, _, url = url.partition('=')
if op == "search?key":
title, content = html_search(url)
elif op == "getfile?key":
title, content = html_getfile(url)
elif op == "topic?key":
# try topics first, then objects.
try:
title, content = html_topicpage(url)
except ValueError:
title, content = html_getobj(url)
elif op == "get?key":
# try objects first, then topics.
if url in ("", "index"):
title, content = html_index()
else:
try:
title, content = html_getobj(url)
except ValueError:
title, content = html_topicpage(url)
else:
raise ValueError('bad pydoc url')
else:
title, content = html_getobj(url)
except Exception as exc:
# Catch any errors and display them in an error page.
title, content = html_error(complete_url, exc)
return html.page(title, content)
if url.startswith('/'):
url = url[1:]
if content_type == 'text/css':
path_here = os.path.dirname(os.path.realpath(__file__))
css_path = os.path.join(path_here, url)
with open(css_path) as fp:
return ''.join(fp.readlines())
elif content_type == 'text/html':
return get_html_page(url)
# Errors outside the url handler are caught by the server.
raise TypeError('unknown content type %r for url %s' % (content_type, url))
def browse(port=0, *, open_browser=True):
"""Start the enhanced pydoc Web server and open a Web browser.
Use port '0' to start the server on an arbitrary port.
Set open_browser to False to suppress opening a browser.
"""
import webbrowser
serverthread = _start_server(_url_handler, port)
if serverthread.error:
print(serverthread.error)
return
if serverthread.serving:
server_help_msg = 'Server commands: [b]rowser, [q]uit'
if open_browser:
webbrowser.open(serverthread.url)
try:
print('Server ready at', serverthread.url)
print(server_help_msg)
while serverthread.serving:
cmd = input('server> ')
cmd = cmd.lower()
if cmd == 'q':
break
elif cmd == 'b':
webbrowser.open(serverthread.url)
else:
print(server_help_msg)
except (KeyboardInterrupt, EOFError):
print()
finally:
if serverthread.serving:
serverthread.stop()
print('Server stopped')
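# Usage sketch: serve the documentation from an interactive session on an
# ephemeral port without opening a browser; enter 'q' at the prompt to quit.
#
#     browse(port=0, open_browser=False)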
# -------------------------------------------------- command-line interface
def ispath(x):
return isinstance(x, str) and x.find(os.sep) >= 0
def cli():
"""Command-line interface (looks at sys.argv to decide what to do)."""
import getopt
class BadUsage(Exception): pass
# Scripts don't get the current directory in their path by default
# unless they are run with the '-m' switch
if '' not in sys.path:
scriptdir = os.path.dirname(sys.argv[0])
if scriptdir in sys.path:
sys.path.remove(scriptdir)
sys.path.insert(0, '.')
try:
opts, args = getopt.getopt(sys.argv[1:], 'bk:p:w')
writing = False
start_server = False
open_browser = False
port = None
for opt, val in opts:
if opt == '-b':
start_server = True
open_browser = True
if opt == '-k':
apropos(val)
return
if opt == '-p':
start_server = True
port = val
if opt == '-w':
writing = True
if start_server:
if port is None:
port = 0
browse(port, open_browser=open_browser)
return
if not args: raise BadUsage
for arg in args:
if ispath(arg) and not os.path.exists(arg):
print('file %r does not exist' % arg)
break
try:
if ispath(arg) and os.path.isfile(arg):
arg = importfile(arg)
if writing:
if ispath(arg) and os.path.isdir(arg):
writedocs(arg)
else:
writedoc(arg)
else:
help.help(arg)
except ErrorDuringImport as value:
print(value)
except (getopt.error, BadUsage):
cmd = os.path.splitext(os.path.basename(sys.argv[0]))[0]
print("""pydoc - the Python documentation tool
{cmd} <name> ...
Show text documentation on something. <name> may be the name of a
Python keyword, topic, function, module, or package, or a dotted
reference to a class or function within a module or module in a
package. If <name> contains a '{sep}', it is used as the path to a
Python source file to document. If name is 'keywords', 'topics',
or 'modules', a listing of these things is displayed.
{cmd} -k <keyword>
Search for a keyword in the synopsis lines of all available modules.
{cmd} -p <port>
Start an HTTP server on the given port on the local machine. Port
number 0 can be used to get an arbitrary unused port.
{cmd} -b
Start an HTTP server on an arbitrary unused port and open a Web browser
to interactively browse documentation. The -p option can be used with
the -b option to explicitly specify the server port.
{cmd} -w <name> ...
Write out the HTML documentation for a module to a file in the current
directory. If <name> contains a '{sep}', it is treated as a filename; if
it names a directory, documentation is written for all the contents.
""".format(cmd=cmd, sep=os.sep))
if __name__ == '__main__':
cli()
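# Command-line usage sketches for the interface above:
#
#     python -m pydoc queue         # text documentation for the queue module
#     python -m pydoc -k socket     # search module synopses for 'socket'
#     python -m pydoc -p 8080       # serve HTML documentation on port 8080
#     python -m pydoc -b            # serve on a free port and open a browser
#     python -m pydoc -w queue      # write queue.html to the current directory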
"""Routine to "compile" a .py file to a .pyc (or .pyo) file.
This module has intimate knowledge of the format of .pyc files.
"""
import importlib._bootstrap
import importlib.machinery
import importlib.util
import os
import os.path
import sys
import traceback
__all__ = ["compile", "main", "PyCompileError"]
class PyCompileError(Exception):
"""Exception raised when an error occurs while attempting to
compile the file.
To raise this exception, use
raise PyCompileError(exc_type,exc_value,file[,msg])
where
exc_type: exception type to be used in error message
type name can be accessed as class variable
'exc_type_name'
exc_value: exception value to be used in error message
can be accessed as class variable 'exc_value'
file: name of file being compiled to be used in error message
can be accessed as class variable 'file'
msg: string message to be written as error message
If no value is given, a default exception message will be
given, consistent with 'standard' py_compile output.
message (or default) can be accessed as class variable
'msg'
"""
def __init__(self, exc_type, exc_value, file, msg=''):
exc_type_name = exc_type.__name__
if exc_type is SyntaxError:
tbtext = ''.join(traceback.format_exception_only(
exc_type, exc_value))
errmsg = tbtext.replace('File "<string>"', 'File "%s"' % file)
else:
errmsg = "Sorry: %s: %s" % (exc_type_name,exc_value)
Exception.__init__(self,msg or errmsg,exc_type_name,exc_value,file)
self.exc_type_name = exc_type_name
self.exc_value = exc_value
self.file = file
self.msg = msg or errmsg
def __str__(self):
return self.msg
def compile(file, cfile=None, dfile=None, doraise=False, optimize=-1):
"""Byte-compile one Python source file to Python bytecode.
:param file: The source file name.
:param cfile: The target byte compiled file name. When not given, this
defaults to the PEP 3147 location.
:param dfile: Purported file name, i.e. the file name that shows up in
error messages. Defaults to the source file name.
:param doraise: Flag indicating whether or not an exception should be
raised when a compile error is found. If an exception occurs and this
flag is set to False, a string indicating the nature of the exception
will be printed, and the function will return to the caller. If an
exception occurs and this flag is set to True, a PyCompileError
exception will be raised.
:param optimize: The optimization level for the compiler. Valid values
are -1, 0, 1 and 2. A value of -1 means to use the optimization
level of the current interpreter, as given by -O command line options.
:return: Path to the resulting byte compiled file.
Note that it isn't necessary to byte-compile Python modules for
execution efficiency -- Python itself byte-compiles a module when
it is loaded, and if it can, writes out the bytecode to the
corresponding .pyc (or .pyo) file.
However, if a Python installation is shared between users, it is a
good idea to byte-compile all modules upon installation, since
other users may not be able to write in the source directories,
and thus they won't be able to write the .pyc/.pyo file, and then
they would be byte-compiling every module each time it is loaded.
This can slow down program start-up considerably.
See compileall.py for a script/module that uses this module to
byte-compile all installed files (or all files in selected
directories).
Do note that FileExistsError is raised if cfile ends up pointing at a
non-regular file or symlink. Because the compilation uses a file renaming,
the resulting file would be regular and thus not the same type of file as
it was previously.
"""
if sys.implementation.name == "ironpython": return  # byte-compilation is not supported on IronPython
if cfile is None:
if optimize >= 0:
cfile = importlib.util.cache_from_source(file,
debug_override=not optimize)
else:
cfile = importlib.util.cache_from_source(file)
if os.path.islink(cfile):
msg = ('{} is a symlink and will be changed into a regular file if '
'import writes a byte-compiled file to it')
raise FileExistsError(msg.format(cfile))
elif os.path.exists(cfile) and not os.path.isfile(cfile):
msg = ('{} is a non-regular file and will be changed into a regular '
'one if import writes a byte-compiled file to it')
raise FileExistsError(msg.format(cfile))
loader = importlib.machinery.SourceFileLoader('<py_compile>', file)
source_bytes = loader.get_data(file)
try:
code = loader.source_to_code(source_bytes, dfile or file,
_optimize=optimize)
except Exception as err:
py_exc = PyCompileError(err.__class__, err, dfile or file)
if doraise:
raise py_exc
else:
sys.stderr.write(py_exc.msg + '\n')
return
try:
dirname = os.path.dirname(cfile)
if dirname:
os.makedirs(dirname)
except FileExistsError:
pass
source_stats = loader.path_stats(file)
bytecode = importlib._bootstrap._code_to_bytecode(
code, source_stats['mtime'], source_stats['size'])
mode = importlib._bootstrap._calc_mode(file)
importlib._bootstrap._write_atomic(cfile, bytecode, mode)
return cfile
def main(args=None):
"""Compile several source files.
The files named in 'args' (or on the command line, if 'args' is
not specified) are compiled and the resulting bytecode is cached
in the normal manner. This function does not search a directory
structure to locate source files; it only compiles files named
explicitly. If '-' is the only parameter in args, the list of
files is taken from standard input.
"""
if args is None:
args = sys.argv[1:]
rv = 0
if args == ['-']:
while True:
filename = sys.stdin.readline()
if not filename:
break
filename = filename.rstrip('\n')
try:
compile(filename, doraise=True)
except PyCompileError as error:
rv = 1
sys.stderr.write("%s\n" % error.msg)
except OSError as error:
rv = 1
sys.stderr.write("%s\n" % error)
else:
for filename in args:
try:
compile(filename, doraise=True)
except PyCompileError as error:
# return value to indicate at least one failure
rv = 1
sys.stderr.write("%s\n" % error.msg)
return rv
if __name__ == "__main__":
sys.exit(main())
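# Usage sketch (separate from the module above): byte-compile a small
# generated file and handle failures via PyCompileError.  Note that under
# IronPython the compile() above returns None without writing bytecode.
import os
import tempfile
import py_compile

src = os.path.join(tempfile.mkdtemp(), 'example.py')
with open(src, 'w') as f:
    f.write('print("hello")\n')
try:
    cached = py_compile.compile(src, doraise=True)
    print('bytecode written to', cached)
except py_compile.PyCompileError as err:
    print('compilation failed:', err.msg)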
'''A multi-producer, multi-consumer queue.'''
try:
import threading
except ImportError:
import dummy_threading as threading
from collections import deque
from heapq import heappush, heappop
try:
from time import monotonic as time
except ImportError:
from time import time
__all__ = ['Empty', 'Full', 'Queue', 'PriorityQueue', 'LifoQueue']
class Empty(Exception):
'Exception raised by Queue.get(block=0)/get_nowait().'
pass
class Full(Exception):
'Exception raised by Queue.put(block=0)/put_nowait().'
pass
class Queue:
'''Create a queue object with a given maximum size.
If maxsize is <= 0, the queue size is infinite.
'''
def __init__(self, maxsize=0):
self.maxsize = maxsize
self._init(maxsize)
# mutex must be held whenever the queue is mutating. All methods
# that acquire mutex must release it before returning. mutex
# is shared between the three conditions, so acquiring and
# releasing the conditions also acquires and releases mutex.
self.mutex = threading.Lock()
# Notify not_empty whenever an item is added to the queue; a
# thread waiting to get is notified then.
self.not_empty = threading.Condition(self.mutex)
# Notify not_full whenever an item is removed from the queue;
# a thread waiting to put is notified then.
self.not_full = threading.Condition(self.mutex)
# Notify all_tasks_done whenever the number of unfinished tasks
# drops to zero; thread waiting to join() is notified to resume
self.all_tasks_done = threading.Condition(self.mutex)
self.unfinished_tasks = 0
def task_done(self):
'''Indicate that a formerly enqueued task is complete.
Used by Queue consumer threads. For each get() used to fetch a task,
a subsequent call to task_done() tells the queue that the processing
on the task is complete.
If a join() is currently blocking, it will resume when all items
have been processed (meaning that a task_done() call was received
for every item that had been put() into the queue).
Raises a ValueError if called more times than there were items
placed in the queue.
'''
with self.all_tasks_done:
unfinished = self.unfinished_tasks - 1
if unfinished <= 0:
if unfinished < 0:
raise ValueError('task_done() called too many times')
self.all_tasks_done.notify_all()
self.unfinished_tasks = unfinished
def join(self):
'''Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls task_done()
to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
'''
with self.all_tasks_done:
while self.unfinished_tasks:
self.all_tasks_done.wait()
def qsize(self):
'''Return the approximate size of the queue (not reliable!).'''
with self.mutex:
return self._qsize()
def empty(self):
'''Return True if the queue is empty, False otherwise (not reliable!).
This method is likely to be removed at some point. Use qsize() == 0
as a direct substitute, but be aware that either approach risks a race
condition where a queue can grow before the result of empty() or
qsize() can be used.
To create code that needs to wait for all queued tasks to be
completed, the preferred technique is to use the join() method.
'''
with self.mutex:
return not self._qsize()
def full(self):
'''Return True if the queue is full, False otherwise (not reliable!).
This method is likely to be removed at some point. Use qsize() >= n
as a direct substitute, but be aware that either approach risks a race
condition where a queue can shrink before the result of full() or
qsize() can be used.
'''
with self.mutex:
return 0 < self.maxsize <= self._qsize()
def put(self, item, block=True, timeout=None):
'''Put an item into the queue.
If optional args 'block' is true and 'timeout' is None (the default),
block if necessary until a free slot is available. If 'timeout' is
a non-negative number, it blocks at most 'timeout' seconds and raises
the Full exception if no free slot was available within that time.
Otherwise ('block' is false), put an item on the queue if a free slot
is immediately available, else raise the Full exception ('timeout'
is ignored in that case).
'''
with self.not_full:
if self.maxsize > 0:
if not block:
if self._qsize() >= self.maxsize:
raise Full
elif timeout is None:
while self._qsize() >= self.maxsize:
self.not_full.wait()
elif timeout < 0:
raise ValueError("'timeout' must be a non-negative number")
else:
endtime = time() + timeout
while self._qsize() >= self.maxsize:
remaining = endtime - time()
if remaining <= 0.0:
raise Full
self.not_full.wait(remaining)
self._put(item)
self.unfinished_tasks += 1
self.not_empty.notify()
def get(self, block=True, timeout=None):
'''Remove and return an item from the queue.
If optional args 'block' is true and 'timeout' is None (the default),
block if necessary until an item is available. If 'timeout' is
a non-negative number, it blocks at most 'timeout' seconds and raises
the Empty exception if no item was available within that time.
Otherwise ('block' is false), return an item if one is immediately
available, else raise the Empty exception ('timeout' is ignored
in that case).
'''
with self.not_empty:
if not block:
if not self._qsize():
raise Empty
elif timeout is None:
while not self._qsize():
self.not_empty.wait()
elif timeout < 0:
raise ValueError("'timeout' must be a non-negative number")
else:
endtime = time() + timeout
while not self._qsize():
remaining = endtime - time()
if remaining <= 0.0:
raise Empty
self.not_empty.wait(remaining)
item = self._get()
self.not_full.notify()
return item
def put_nowait(self, item):
'''Put an item into the queue without blocking.
Only enqueue the item if a free slot is immediately available.
Otherwise raise the Full exception.
'''
return self.put(item, block=False)
def get_nowait(self):
'''Remove and return an item from the queue without blocking.
Only get an item if one is immediately available. Otherwise
raise the Empty exception.
'''
return self.get(block=False)
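# Usage sketch: the non-blocking and bounded-wait forms raise Full/Empty
# instead of blocking indefinitely, e.g.
#
#     q = Queue(maxsize=1)
#     q.put('a')
#     q.put_nowait('b')        # raises Full immediately
#     q.get(timeout=0.5)       # returns 'a'; would raise Empty after 0.5s
#                              # if the queue stayed empty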
# Override these methods to implement other queue organizations
# (e.g. stack or priority queue).
# These will only be called with appropriate locks held
# Initialize the queue representation
def _init(self, maxsize):
self.queue = deque()
def _qsize(self):
return len(self.queue)
# Put a new item in the queue
def _put(self, item):
self.queue.append(item)
# Get an item from the queue
def _get(self):
return self.queue.popleft()
class PriorityQueue(Queue):
'''Variant of Queue that retrieves open entries in priority order (lowest first).
Entries are typically tuples of the form: (priority number, data).
'''
def _init(self, maxsize):
self.queue = []
def _qsize(self):
return len(self.queue)
def _put(self, item):
heappush(self.queue, item)
def _get(self):
return heappop(self.queue)
class LifoQueue(Queue):
'''Variant of Queue that retrieves most recently added entries first.'''
def _init(self, maxsize):
self.queue = []
def _qsize(self):
return len(self.queue)
def _put(self, item):
self.queue.append(item)
def _get(self):
return self.queue.pop()
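# Usage sketch (separate from the module above): a bounded queue shared by a
# producer and a consumer thread, shut down with a None sentinel.
import queue
import threading

q = queue.Queue(maxsize=4)      # put() blocks once 4 items are waiting

def consumer():
    while True:
        item = q.get()
        if item is None:        # sentinel: exit the consumer
            q.task_done()
            break
        print('processed', item)
        q.task_done()

t = threading.Thread(target=consumer)
t.start()
for i in range(8):
    q.put(i)
q.put(None)
q.join()                        # returns once every item is task_done()'d
t.join()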
#! /usr/bin/env python3
"""Conversions to/from quoted-printable transport encoding as per RFC 1521."""
# (Dec 1991 version).
__all__ = ["encode", "decode", "encodestring", "decodestring"]
ESCAPE = b'='
MAXLINESIZE = 76
HEX = b'0123456789ABCDEF'
EMPTYSTRING = b''
try:
from binascii import a2b_qp, b2a_qp
except ImportError:
a2b_qp = None
b2a_qp = None
def needsquoting(c, quotetabs, header):
"""Decide whether a particular byte ordinal needs to be quoted.
The 'quotetabs' flag indicates whether embedded tabs and spaces should be
quoted. Note that line-ending tabs and spaces are always encoded, as per
RFC 1521.
"""
assert isinstance(c, bytes)
if c in b' \t':
return quotetabs
# if header, we have to escape _ because _ is used to escape space
if c == b'_':
return header
return c == ESCAPE or not (b' ' <= c <= b'~')
def quote(c):
"""Quote a single character."""
assert isinstance(c, bytes) and len(c)==1
c = ord(c)
return ESCAPE + bytes((HEX[c//16], HEX[c%16]))
def encode(input, output, quotetabs, header=False):
"""Read 'input', apply quoted-printable encoding, and write to 'output'.
'input' and 'output' are binary file objects. The 'quotetabs' flag
indicates whether embedded tabs and spaces should be quoted. Note that
line-ending tabs and spaces are always encoded, as per RFC 1521.
The 'header' flag indicates whether we are encoding spaces as _ as per RFC
1522."""
if b2a_qp is not None:
data = input.read()
odata = b2a_qp(data, quotetabs=quotetabs, header=header)
output.write(odata)
return
def write(s, output=output, lineEnd=b'\n'):
# RFC 1521 requires that the line ending in a space or tab must have
# that trailing character encoded.
if s and s[-1:] in b' \t':
output.write(s[:-1] + quote(s[-1:]) + lineEnd)
elif s == b'.':
output.write(quote(s) + lineEnd)
else:
output.write(s + lineEnd)
prevline = None
while 1:
line = input.readline()
if not line:
break
outline = []
# Strip off any readline induced trailing newline
stripped = b''
if line[-1:] == b'\n':
line = line[:-1]
stripped = b'\n'
# Calculate the un-length-limited encoded line
for c in line:
c = bytes((c,))
if needsquoting(c, quotetabs, header):
c = quote(c)
if header and c == b' ':
outline.append(b'_')
else:
outline.append(c)
# First, write out the previous line
if prevline is not None:
write(prevline)
# Now see if we need any soft line breaks because of RFC-imposed
# length limitations. Then do the thisline->prevline dance.
thisline = EMPTYSTRING.join(outline)
while len(thisline) > MAXLINESIZE:
# Don't forget to include the soft line break `=' sign in the
# length calculation!
write(thisline[:MAXLINESIZE-1], lineEnd=b'=\n')
thisline = thisline[MAXLINESIZE-1:]
# Write out the current line
prevline = thisline
# Write out the last line, without a trailing newline
if prevline is not None:
write(prevline, lineEnd=stripped)
def encodestring(s, quotetabs=False, header=False):
if b2a_qp is not None:
return b2a_qp(s, quotetabs=quotetabs, header=header)
from io import BytesIO
infp = BytesIO(s)
outfp = BytesIO()
encode(infp, outfp, quotetabs, header)
return outfp.getvalue()
def decode(input, output, header=False):
"""Read 'input', apply quoted-printable decoding, and write to 'output'.
'input' and 'output' are binary file objects.
If 'header' is true, decode underscore as space (per RFC 1522)."""
if a2b_qp is not None:
data = input.read()
odata = a2b_qp(data, header=header)
output.write(odata)
return
new = b''
while 1:
line = input.readline()
if not line: break
i, n = 0, len(line)
if n > 0 and line[n-1:n] == b'\n':
partial = 0; n = n-1
# Strip trailing whitespace
while n > 0 and line[n-1:n] in b" \t\r":
n = n-1
else:
partial = 1
while i < n:
c = line[i:i+1]
if c == b'_' and header:
new = new + b' '; i = i+1
elif c != ESCAPE:
new = new + c; i = i+1
elif i+1 == n and not partial:
partial = 1; break
elif i+1 < n and line[i+1:i+2] == ESCAPE:
new = new + ESCAPE; i = i+2
elif i+2 < n and ishex(line[i+1:i+2]) and ishex(line[i+2:i+3]):
new = new + bytes((unhex(line[i+1:i+3]),)); i = i+3
else: # Bad escape sequence -- leave it in
new = new + c; i = i+1
if not partial:
output.write(new + b'\n')
new = b''
if new:
output.write(new)
def decodestring(s, header=False):
if a2b_qp is not None:
return a2b_qp(s, header=header)
from io import BytesIO
infp = BytesIO(s)
outfp = BytesIO()
decode(infp, outfp, header=header)
return outfp.getvalue()
# Other helper functions
def ishex(c):
"""Return true if the byte ordinal 'c' is a hexadecimal digit in ASCII."""
assert isinstance(c, bytes)
return b'0' <= c <= b'9' or b'a' <= c <= b'f' or b'A' <= c <= b'F'
def unhex(s):
"""Get the integer value of a hexadecimal number."""
bits = 0
for c in s:
c = bytes((c,))
if b'0' <= c <= b'9':
i = ord('0')
elif b'a' <= c <= b'f':
i = ord('a')-10
elif b'A' <= c <= b'F':
i = ord(b'A')-10
else:
assert False, "non-hex digit "+repr(c)
bits = bits*16 + (ord(c) - i)
return bits
def main():
import sys
import getopt
try:
opts, args = getopt.getopt(sys.argv[1:], 'td')
except getopt.error as msg:
sys.stdout = sys.stderr
print(msg)
print("usage: quopri [-t | -d] [file] ...")
print("-t: quote tabs")
print("-d: decode; default encode")
sys.exit(2)
deco = 0
tabs = 0
for o, a in opts:
if o == '-t': tabs = 1
if o == '-d': deco = 1
if tabs and deco:
sys.stdout = sys.stderr
print("-t and -d are mutually exclusive")
sys.exit(2)
if not args: args = ['-']
sts = 0
for file in args:
if file == '-':
fp = sys.stdin.buffer
else:
try:
fp = open(file, "rb")
except OSError as msg:
sys.stderr.write("%s: can't open (%s)\n" % (file, msg))
sts = 1
continue
try:
if deco:
decode(fp, sys.stdout.buffer)
else:
encode(fp, sys.stdout.buffer, tabs)
finally:
if file != '-':
fp.close()
if sts:
sys.exit(sts)
if __name__ == '__main__':
main()
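# Usage sketch: round-trip a byte string through the public helpers above.
import quopri

raw = b'caf\xc3\xa9 = coffee\tshop'
encoded = quopri.encodestring(raw)
print(encoded)                  # e.g. b'caf=C3=A9 =3D coffee\tshop'
assert quopri.decodestring(encoded) == raw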
"""Random variable generators.
integers
--------
uniform within range
sequences
---------
pick random element
pick random sample
generate random permutation
distributions on the real line:
------------------------------
uniform
triangular
normal (Gaussian)
lognormal
negative exponential
gamma
beta
pareto
Weibull
distributions on the circle (angles 0 to 2pi)
---------------------------------------------
circular uniform
von Mises
General notes on the underlying Mersenne Twister core generator:
* The period is 2**19937-1.
* It is one of the most extensively tested generators in existence.
* The random() method is implemented in C, executes in a single Python step,
and is, therefore, threadsafe.
"""
from warnings import warn as _warn
from types import MethodType as _MethodType, BuiltinMethodType as _BuiltinMethodType
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
from os import urandom as _urandom
from _collections_abc import Set as _Set, Sequence as _Sequence
from hashlib import sha512 as _sha512
__all__ = ["Random","seed","random","uniform","randint","choice","sample",
"randrange","shuffle","normalvariate","lognormvariate",
"expovariate","vonmisesvariate","gammavariate","triangular",
"gauss","betavariate","paretovariate","weibullvariate",
"getstate","setstate", "getrandbits",
"SystemRandom"]
NV_MAGICCONST = 4 * _exp(-0.5)/_sqrt(2.0)
TWOPI = 2.0*_pi
LOG4 = _log(4.0)
SG_MAGICCONST = 1.0 + _log(4.5)
BPF = 53 # Number of bits in a float
RECIP_BPF = 2**-BPF
# Translated by Guido van Rossum from C source provided by
# Adrian Baddeley. Adapted by Raymond Hettinger for use with
# the Mersenne Twister and os.urandom() core generators.
import _random
class Random(_random.Random):
"""Random number generator base class used by bound module functions.
Used to instantiate instances of Random to get generators that don't
share state.
Class Random can also be subclassed if you want to use a different basic
generator of your own devising: in that case, override the following
methods: random(), seed(), getstate(), and setstate().
Optionally, implement a getrandbits() method so that randrange()
can cover arbitrarily large ranges.
"""
VERSION = 3 # used by getstate/setstate
def __init__(self, x=None):
"""Initialize an instance.
Optional argument x controls seeding, as for Random.seed().
"""
self.seed(x)
self.gauss_next = None
def seed(self, a=None, version=2):
"""Initialize internal state from hashable object.
None or no argument seeds from current time or from an operating
system specific randomness source if available.
For version 2 (the default), all of the bits are used if *a* is a str,
bytes, or bytearray. For version 1, the hash() of *a* is used instead.
If *a* is an int, all bits are used.
"""
if a is None:
try:
# Seed with enough bytes to span the 19937 bit
# state space for the Mersenne Twister
a = int.from_bytes(_urandom(2500), 'big')
except NotImplementedError:
import time
a = int(time.time() * 256) # use fractional seconds
if version == 2:
if isinstance(a, (str, bytes, bytearray)):
if isinstance(a, str):
a = a.encode()
a += _sha512(a).digest()
a = int.from_bytes(a, 'big')
super().seed(a)
self.gauss_next = None
def getstate(self):
"""Return internal state; can be passed to setstate() later."""
return self.VERSION, super().getstate(), self.gauss_next
def setstate(self, state):
"""Restore internal state from object returned by getstate()."""
version = state[0]
if version == 3:
version, internalstate, self.gauss_next = state
super().setstate(internalstate)
elif version == 2:
version, internalstate, self.gauss_next = state
# In version 2, the state was saved as signed ints, which causes
# inconsistencies between 32/64-bit systems. The state is
# really unsigned 32-bit ints, so we convert negative ints from
# version 2 to positive longs for version 3.
try:
internalstate = tuple(x % (2**32) for x in internalstate)
except ValueError as e:
raise TypeError from e
super().setstate(internalstate)
else:
raise ValueError("state with version %s passed to "
"Random.setstate() of version %s" %
(version, self.VERSION))
## ---- Methods below this point do not need to be overridden when
## ---- subclassing for the purpose of using a different core generator.
## -------------------- pickle support -------------------
# Issue 17489: Since __reduce__ was defined to fix #759889 this is no
# longer called; we leave it here because it has been here since random was
# rewritten back in 2001 and why risk breaking something.
def __getstate__(self): # for pickle
return self.getstate()
def __setstate__(self, state): # for pickle
self.setstate(state)
def __reduce__(self):
return self.__class__, (), self.getstate()
## -------------------- integer methods -------------------
def randrange(self, start, stop=None, step=1, _int=int):
"""Choose a random item from range(start, stop[, step]).
This fixes the problem with randint() which includes the
endpoint; in Python this is usually not what you want.
"""
# This code is a bit messy to make it fast for the
# common case while still doing adequate error checking.
istart = _int(start)
if istart != start:
raise ValueError("non-integer arg 1 for randrange()")
if stop is None:
if istart > 0:
return self._randbelow(istart)
raise ValueError("empty range for randrange()")
# stop argument supplied.
istop = _int(stop)
if istop != stop:
raise ValueError("non-integer stop for randrange()")
width = istop - istart
if step == 1 and width > 0:
return istart + self._randbelow(width)
if step == 1:
raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
# Non-unit step argument supplied.
istep = _int(step)
if istep != step:
raise ValueError("non-integer step for randrange()")
if istep > 0:
n = (width + istep - 1) // istep
elif istep < 0:
n = (width + istep + 1) // istep
else:
raise ValueError("zero step for randrange()")
if n <= 0:
raise ValueError("empty range for randrange()")
return istart + istep*self._randbelow(n)
def randint(self, a, b):
"""Return random integer in range [a, b], including both end points.
"""
return self.randrange(a, b+1)
def _randbelow(self, n, int=int, maxsize=1<<BPF, type=type,
Method=_MethodType, BuiltinMethod=_BuiltinMethodType):
"Return a random int in the range [0,n). Raises ValueError if n==0."
random = self.random
getrandbits = self.getrandbits
# Only call self.getrandbits if the original random() builtin method
# has not been overridden or if a new getrandbits() was supplied.
if type(random) is BuiltinMethod or type(getrandbits) is Method:
k = n.bit_length() # don't use (n-1) here because n can be 1
r = getrandbits(k) # 0 <= r < 2**k
while r >= n:
r = getrandbits(k)
return r
# There's an overridden random() method but no new getrandbits() method,
# so we can only use random() from here.
if n >= maxsize:
_warn("Underlying random() generator does not supply \n"
"enough bits to choose from a population range this large.\n"
"To remove the range limitation, add a getrandbits() method.")
return int(random() * n)
rem = maxsize % n
limit = (maxsize - rem) / maxsize # int(limit * maxsize) % n == 0
r = random()
while r >= limit:
r = random()
return int(r*maxsize) % n
## -------------------- sequence methods -------------------
def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
try:
i = self._randbelow(len(seq))
except ValueError:
raise IndexError('Cannot choose from an empty sequence')
return seq[i]
def shuffle(self, x, random=None):
"""Shuffle list x in place, and return None.
Optional argument random is a 0-argument function returning a
random float in [0.0, 1.0); if it is the default None, the
standard random.random will be used.
"""
if random is None:
randbelow = self._randbelow
for i in reversed(range(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = randbelow(i+1)
x[i], x[j] = x[j], x[i]
else:
_int = int
for i in reversed(range(1, len(x))):
# pick an element in x[:i+1] with which to exchange x[i]
j = _int(random() * (i+1))
x[i], x[j] = x[j], x[i]
def sample(self, population, k):
"""Chooses k unique random elements from a population sequence or set.
Returns a new list containing elements from the population while
leaving the original population unchanged. The resulting list is
in selection order so that all sub-slices will also be valid random
samples. This allows raffle winners (the sample) to be partitioned
into grand prize and second place winners (the subslices).
Members of the population need not be hashable or unique. If the
population contains repeats, then each occurrence is a possible
selection in the sample.
To choose a sample in a range of integers, use range as an argument.
This is especially fast and space efficient for sampling from a
large population: sample(range(10000000), 60)
"""
# Sampling without replacement entails tracking either potential
# selections (the pool) in a list or previous selections in a set.
# When the number of selections is small compared to the
# population, then tracking selections is efficient, requiring
# only a small set and an occasional reselection. For
# a larger number of selections, the pool tracking method is
# preferred since the list takes less space than the
# set and it doesn't suffer from frequent reselections.
if isinstance(population, _Set):
population = tuple(population)
if not isinstance(population, _Sequence):
raise TypeError("Population must be a sequence or set. For dicts, use list(d).")
randbelow = self._randbelow
n = len(population)
if not 0 <= k <= n:
raise ValueError("Sample larger than population")
result = [None] * k
setsize = 21 # size of a small set minus size of an empty list
if k > 5:
setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
if n <= setsize:
# An n-length list is smaller than a k-length set
pool = list(population)
for i in range(k): # invariant: non-selected at [0,n-i)
j = randbelow(n-i)
result[i] = pool[j]
pool[j] = pool[n-i-1] # move non-selected item into vacancy
else:
selected = set()
selected_add = selected.add
for i in range(k):
j = randbelow(n)
while j in selected:
j = randbelow(n)
selected_add(j)
result[i] = population[j]
return result
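# Usage sketch: via the module-level wrappers bound near the end of this
# file, e.g.
#
#     sample(range(10000000), 60)   # 60 unique ints without materializing
#                                   # the ten-million element range
#     shuffle(cards)                # in-place shuffle of a list 'cards'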
## -------------------- real-valued distributions -------------------
## -------------------- uniform distribution -------------------
def uniform(self, a, b):
"Get a random number in the range [a, b) or [a, b] depending on rounding."
return a + (b-a) * self.random()
## -------------------- triangular --------------------
def triangular(self, low=0.0, high=1.0, mode=None):
"""Triangular distribution.
Continuous distribution bounded by given lower and upper limits,
and having a given mode value in-between.
http://en.wikipedia.org/wiki/Triangular_distribution
"""
u = self.random()
try:
c = 0.5 if mode is None else (mode - low) / (high - low)
except ZeroDivisionError:
return low
if u > c:
u = 1.0 - u
c = 1.0 - c
low, high = high, low
return low + (high - low) * (u * c) ** 0.5
## -------------------- normal distribution --------------------
def normalvariate(self, mu, sigma):
"""Normal distribution.
mu is the mean, and sigma is the standard deviation.
"""
# mu = mean, sigma = standard deviation
# Uses Kinderman and Monahan method. Reference: Kinderman,
# A.J. and Monahan, J.F., "Computer generation of random
# variables using the ratio of uniform deviates", ACM Trans
# Math Software, 3, (1977), pp257-260.
random = self.random
while 1:
u1 = random()
u2 = 1.0 - random()
z = NV_MAGICCONST*(u1-0.5)/u2
zz = z*z/4.0
if zz <= -_log(u2):
break
return mu + z*sigma
## -------------------- lognormal distribution --------------------
def lognormvariate(self, mu, sigma):
"""Log normal distribution.
If you take the natural logarithm of this distribution, you'll get a
normal distribution with mean mu and standard deviation sigma.
mu can have any value, and sigma must be greater than zero.
"""
return _exp(self.normalvariate(mu, sigma))
## -------------------- exponential distribution --------------------
def expovariate(self, lambd):
"""Exponential distribution.
lambd is 1.0 divided by the desired mean. It should be
nonzero. (The parameter would be called "lambda", but that is
a reserved word in Python.) Returned values range from 0 to
positive infinity if lambd is positive, and from negative
infinity to 0 if lambd is negative.
"""
# lambd: rate lambd = 1/mean
# ('lambda' is a Python reserved word)
# we use 1-random() instead of random() to preclude the
# possibility of taking the log of zero.
return -_log(1.0 - self.random())/lambd
## -------------------- von Mises distribution --------------------
def vonmisesvariate(self, mu, kappa):
"""Circular data distribution.
mu is the mean angle, expressed in radians between 0 and 2*pi, and
kappa is the concentration parameter, which must be greater than or
equal to zero. If kappa is equal to zero, this distribution reduces
to a uniform random angle over the range 0 to 2*pi.
"""
# mu: mean angle (in radians between 0 and 2*pi)
# kappa: concentration parameter kappa (>= 0)
# if kappa = 0 generate uniform random angle
# Based upon an algorithm published in: Fisher, N.I.,
# "Statistical Analysis of Circular Data", Cambridge
# University Press, 1993.
# Thanks to Magnus Kessler for a correction to the
# implementation of step 4.
random = self.random
if kappa <= 1e-6:
return TWOPI * random()
s = 0.5 / kappa
r = s + _sqrt(1.0 + s * s)
while 1:
u1 = random()
z = _cos(_pi * u1)
d = z / (r + z)
u2 = random()
if u2 < 1.0 - d * d or u2 <= (1.0 - d) * _exp(d):
break
q = 1.0 / r
f = (q + z) / (1.0 + q * z)
u3 = random()
if u3 > 0.5:
theta = (mu + _acos(f)) % TWOPI
else:
theta = (mu - _acos(f)) % TWOPI
return theta
## -------------------- gamma distribution --------------------
def gammavariate(self, alpha, beta):
"""Gamma distribution. Not the gamma function!
Conditions on the parameters are alpha > 0 and beta > 0.
The probability distribution function is:
x ** (alpha - 1) * math.exp(-x / beta)
pdf(x) = --------------------------------------
math.gamma(alpha) * beta ** alpha
"""
# alpha > 0, beta > 0, mean is alpha*beta, variance is alpha*beta**2
# Warning: a few older sources define the gamma distribution in terms
# of alpha > -1.0
if alpha <= 0.0 or beta <= 0.0:
raise ValueError('gammavariate: alpha and beta must be > 0.0')
random = self.random
if alpha > 1.0:
# Uses R.C.H. Cheng, "The generation of Gamma
# variables with non-integral shape parameters",
# Applied Statistics, (1977), 26, No. 1, p71-74
ainv = _sqrt(2.0 * alpha - 1.0)
bbb = alpha - LOG4
ccc = alpha + ainv
while 1:
u1 = random()
if not 1e-7 < u1 < .9999999:
continue
u2 = 1.0 - random()
v = _log(u1/(1.0-u1))/ainv
x = alpha*_exp(v)
z = u1*u1*u2
r = bbb+ccc*v-x
if r + SG_MAGICCONST - 4.5*z >= 0.0 or r >= _log(z):
return x * beta
elif alpha == 1.0:
# expovariate(1)
u = random()
while u <= 1e-7:
u = random()
return -_log(u) * beta
else: # alpha is between 0 and 1 (exclusive)
# Uses ALGORITHM GS of Statistical Computing - Kennedy & Gentle
while 1:
u = random()
b = (_e + alpha)/_e
p = b*u
if p <= 1.0:
x = p ** (1.0/alpha)
else:
x = -_log((b-p)/alpha)
u1 = random()
if p > 1.0:
if u1 <= x ** (alpha - 1.0):
break
elif u1 <= _exp(-x):
break
return x * beta
## -------------------- Gauss (faster alternative) --------------------
def gauss(self, mu, sigma):
"""Gaussian distribution.
mu is the mean, and sigma is the standard deviation. This is
slightly faster than the normalvariate() function.
Not thread-safe without a lock around calls.
"""
# When x and y are two variables from [0, 1), uniformly
# distributed, then
#
# cos(2*pi*x)*sqrt(-2*log(1-y))
# sin(2*pi*x)*sqrt(-2*log(1-y))
#
# are two *independent* variables with normal distribution
# (mu = 0, sigma = 1).
# (Lambert Meertens)
# (corrected version; bug discovered by Mike Miller, fixed by LM)
# Multithreading note: When two threads call this function
# simultaneously, it is possible that they will receive the
# same return value. The window is very small though. To
# avoid this, you have to use a lock around all calls. (I
# didn't want to slow this down in the serial case by using a
# lock here.)
random = self.random
z = self.gauss_next
self.gauss_next = None
if z is None:
x2pi = random() * TWOPI
g2rad = _sqrt(-2.0 * _log(1.0 - random()))
z = _cos(x2pi) * g2rad
self.gauss_next = _sin(x2pi) * g2rad
return mu + z*sigma
## -------------------- beta --------------------
## See
## http://mail.python.org/pipermail/python-bugs-list/2001-January/003752.html
## for Ivan Frohne's insightful analysis of why the original implementation:
##
## def betavariate(self, alpha, beta):
## # Discrete Event Simulation in C, pp 87-88.
##
## y = self.expovariate(alpha)
## z = self.expovariate(1.0/beta)
## return z/(y+z)
##
## was dead wrong, and how it probably got that way.
def betavariate(self, alpha, beta):
"""Beta distribution.
Conditions on the parameters are alpha > 0 and beta > 0.
Returned values range between 0 and 1.
"""
# This version due to Janne Sinkkonen, and matches all the std
# texts (e.g., Knuth Vol 2 Ed 3 pg 134 "the beta distribution").
y = self.gammavariate(alpha, 1.)
if y == 0:
return 0.0
else:
return y / (y + self.gammavariate(beta, 1.))
## -------------------- Pareto --------------------
def paretovariate(self, alpha):
"""Pareto distribution. alpha is the shape parameter."""
# Jain, pg. 495
u = 1.0 - self.random()
return 1.0 / u ** (1.0/alpha)
## -------------------- Weibull --------------------
def weibullvariate(self, alpha, beta):
"""Weibull distribution.
alpha is the scale parameter and beta is the shape parameter.
"""
# Jain, pg. 499; bug fix courtesy Bill Arms
u = 1.0 - self.random()
return alpha * (-_log(u)) ** (1.0/beta)
## --------------- Operating System Random Source ------------------
class SystemRandom(Random):
"""Alternate random number generator using sources provided
by the operating system (such as /dev/urandom on Unix or
CryptGenRandom on Windows).
Not available on all systems (see os.urandom() for details).
"""
def random(self):
"""Get the next random number in the range [0.0, 1.0)."""
return (int.from_bytes(_urandom(7), 'big') >> 3) * RECIP_BPF
def getrandbits(self, k):
"""getrandbits(k) -> x. Generates an int with k random bits."""
if k <= 0:
raise ValueError('number of bits must be greater than zero')
if k != int(k):
raise TypeError('number of bits should be an integer')
numbytes = (k + 7) // 8 # bits / 8 and rounded up
x = int.from_bytes(_urandom(numbytes), 'big')
return x >> (numbytes * 8 - k) # trim excess bits
def seed(self, *args, **kwds):
"Stub method. Not used for a system random number generator."
return None
def _notimplemented(self, *args, **kwds):
"Method should not be called for a system random number generator."
raise NotImplementedError('System entropy source does not have state.')
getstate = setstate = _notimplemented
## -------------------- test program --------------------
def _test_generator(n, func, args):
import time
print(n, 'times', func.__name__)
total = 0.0
sqsum = 0.0
smallest = 1e10
largest = -1e10
t0 = time.time()
for i in range(n):
x = func(*args)
total += x
sqsum = sqsum + x*x
smallest = min(x, smallest)
largest = max(x, largest)
t1 = time.time()
print(round(t1-t0, 3), 'sec,', end=' ')
avg = total/n
stddev = _sqrt(sqsum/n - avg*avg)
print('avg %g, stddev %g, min %g, max %g' % \
(avg, stddev, smallest, largest))
def _test(N=2000):
_test_generator(N, random, ())
_test_generator(N, normalvariate, (0.0, 1.0))
_test_generator(N, lognormvariate, (0.0, 1.0))
_test_generator(N, vonmisesvariate, (0.0, 1.0))
_test_generator(N, gammavariate, (0.01, 1.0))
_test_generator(N, gammavariate, (0.1, 1.0))
_test_generator(N, gammavariate, (0.1, 2.0))
_test_generator(N, gammavariate, (0.5, 1.0))
_test_generator(N, gammavariate, (0.9, 1.0))
_test_generator(N, gammavariate, (1.0, 1.0))
_test_generator(N, gammavariate, (2.0, 1.0))
_test_generator(N, gammavariate, (20.0, 1.0))
_test_generator(N, gammavariate, (200.0, 1.0))
_test_generator(N, gauss, (0.0, 1.0))
_test_generator(N, betavariate, (3.0, 3.0))
_test_generator(N, triangular, (0.0, 1.0, 1.0/3.0))
# Create one instance, seeded from current time, and export its methods
# as module-level functions. The functions share state across all uses
#(both in the user's code and in the Python libraries), but that's fine
# for most programs and is easier for the casual user than making them
# instantiate their own Random() instance.
_inst = Random()
seed = _inst.seed
random = _inst.random
uniform = _inst.uniform
triangular = _inst.triangular
randint = _inst.randint
choice = _inst.choice
randrange = _inst.randrange
sample = _inst.sample
shuffle = _inst.shuffle
normalvariate = _inst.normalvariate
lognormvariate = _inst.lognormvariate
expovariate = _inst.expovariate
vonmisesvariate = _inst.vonmisesvariate
gammavariate = _inst.gammavariate
gauss = _inst.gauss
betavariate = _inst.betavariate
paretovariate = _inst.paretovariate
weibullvariate = _inst.weibullvariate
getstate = _inst.getstate
setstate = _inst.setstate
getrandbits = _inst.getrandbits
if __name__ == '__main__':
_test()
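# Usage sketch: the module-level API exported just above, with a seeded
# generator for reproducible results and SystemRandom for OS entropy.
import random

random.seed(12345)                      # reproducible sequence from here on
print(random.randrange(0, 100, 5))      # a multiple of 5 in [0, 100)
print(random.sample(range(1000), 3))    # three unique values
print(random.gauss(0.0, 1.0))           # one normal deviate

sr = random.SystemRandom()              # stateless, OS-provided entropy
print(sr.getrandbits(32))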
#
# Secret Labs' Regular Expression Engine
#
# re-compatible interface for the sre matching engine
#
# Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved.
#
# This version of the SRE library can be redistributed under CNRI's
# Python 1.6 license. For any other use, please contact Secret Labs
# AB ([email protected]).
#
# Portions of this engine have been developed in cooperation with
# CNRI. Hewlett-Packard provided funding for 1.6 integration and
# other compatibility work.
#
r"""Support for regular expressions (RE).
This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.
Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.
The special characters are:
"." Matches any character except a newline.
"^" Matches the start of the string.
"$" Matches the end of the string or just before the newline at
the end of the string.
"*" Matches 0 or more (greedy) repetitions of the preceding RE.
Greedy means that it will match as many repetitions as possible.
"+" Matches 1 or more (greedy) repetitions of the preceding RE.
"?" Matches 0 or 1 (greedy) of the preceding RE.
*?,+?,?? Non-greedy versions of the previous three special characters.
{m,n} Matches from m to n repetitions of the preceding RE.
{m,n}? Non-greedy version of the above.
"\\" Either escapes special characters or signals a special sequence.
[] Indicates a set of characters.
A "^" as the first character indicates a complementing set.
"|" A|B, creates an RE that will match either A or B.
(...) Matches the RE inside the parentheses.
The contents can be retrieved or matched later in the string.
(?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
(?:...) Non-grouping version of regular parentheses.
(?P<name>...) The substring matched by the group is accessible by name.
(?P=name) Matches the text matched earlier by the group named name.
(?#...) A comment; ignored.
(?=...) Matches if ... matches next, but doesn't consume the string.
(?!...) Matches if ... doesn't match next.
(?<=...) Matches if preceded by ... (must be fixed length).
(?<!...) Matches if not preceded by ... (must be fixed length).
(?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
the (optional) no pattern otherwise.
The special sequences consist of "\\" and a character from the list
below. If the ordinary character is not on the list, then the
resulting RE will match the second character.
\number Matches the contents of the group of the same number.
\A Matches only at the start of the string.
\Z Matches only at the end of the string.
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
\d Matches any decimal digit; equivalent to the set [0-9] in
bytes patterns or string patterns with the ASCII flag.
In string patterns without the ASCII flag, it will match the whole
range of Unicode digits.
\D Matches any non-digit character; equivalent to [^\d].
\s Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
bytes patterns or string patterns with the ASCII flag.
In string patterns without the ASCII flag, it will match the whole
range of Unicode whitespace characters.
\S Matches any non-whitespace character; equivalent to [^\s].
\w Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
in bytes patterns or string patterns with the ASCII flag.
In string patterns without the ASCII flag, it will match the
range of Unicode alphanumeric characters (letters plus digits
plus underscore).
With LOCALE, it will match the set [0-9_] plus characters defined
as letters for the current locale.
\W Matches the complement of \w.
\\ Matches a literal backslash.
This module exports the following functions:
match Match a regular expression pattern to the beginning of a string.
fullmatch Match a regular expression pattern to all of a string.
search Search a string for the presence of a pattern.
sub Substitute occurrences of a pattern found in a string.
subn Same as sub, but also return the number of substitutions made.
split Split a string by the occurrences of a pattern.
findall Find all occurrences of a pattern in a string.
finditer Return an iterator yielding a match object for each match.
compile Compile a pattern into a RegexObject.
purge Clear the regular expression cache.
escape Backslash all non-alphanumerics in a string.
Some of the functions in this module takes flags as optional parameters:
A ASCII For string patterns, make \w, \W, \b, \B, \d, \D
match the corresponding ASCII character categories
(rather than the whole Unicode categories, which is the
default).
For bytes patterns, this flag is the only available
behaviour and needn't be specified.
I IGNORECASE Perform case-insensitive matching.
L LOCALE Make \w, \W, \b, \B, dependent on the current locale.
M MULTILINE "^" matches the beginning of lines (after a newline)
as well as the string.
"$" matches the end of lines (before a newline) as well
as the end of the string.
S DOTALL "." matches any character at all, including the newline.
X VERBOSE Ignore whitespace and comments for nicer looking RE's.
U UNICODE For compatibility only. Ignored for string patterns (it
is the default), and forbidden for bytes patterns.
This module also defines an exception 'error'.
"""
import sys
import sre_compile
import sre_parse
try:
import _locale
except ImportError:
_locale = None
# public symbols
__all__ = [ "match", "fullmatch", "search", "sub", "subn", "split", "findall",
"compile", "purge", "template", "escape", "A", "I", "L", "M", "S", "X",
"U", "ASCII", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
"UNICODE", "error" ]
__version__ = "2.2.1"
# flags
A = ASCII = sre_compile.SRE_FLAG_ASCII # assume ascii "locale"
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode "locale"
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
# sre extensions (experimental, don't rely on these)
T = TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation
# sre exception
error = sre_compile.error
# --------------------------------------------------------------------
# public interface
def match(pattern, string, flags=0):
"""Try to apply the pattern at the start of the string, returning
a match object, or None if no match was found."""
return _compile(pattern, flags).match(string)
def fullmatch(pattern, string, flags=0):
"""Try to apply the pattern to all of the string, returning
a match object, or None if no match was found."""
return _compile(pattern, flags).fullmatch(string)
def search(pattern, string, flags=0):
"""Scan through string looking for a match to the pattern, returning
a match object, or None if no match was found."""
return _compile(pattern, flags).search(string)
def sub(pattern, repl, string, count=0, flags=0):
"""Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the match object and must return
a replacement string to be used."""
return _compile(pattern, flags).sub(repl, string, count)
def subn(pattern, repl, string, count=0, flags=0):
"""Return a 2-tuple containing (new_string, number).
new_string is the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in the source
string by the replacement repl. number is the number of
substitutions that were made. repl can be either a string or a
callable; if a string, backslash escapes in it are processed.
If it is a callable, it's passed the match object and must
return a replacement string to be used."""
return _compile(pattern, flags).subn(repl, string, count)
def split(pattern, string, maxsplit=0, flags=0):
"""Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings. If
capturing parentheses are used in pattern, then the text of all
groups in the pattern are also returned as part of the resulting
list. If maxsplit is nonzero, at most maxsplit splits occur,
and the remainder of the string is returned as the final element
of the list."""
return _compile(pattern, flags).split(string, maxsplit)
def findall(pattern, string, flags=0):
"""Return a list of all non-overlapping matches in the string.
If one or more capturing groups are present in the pattern, return
a list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result."""
return _compile(pattern, flags).findall(string)
if sys.hexversion >= 0x02020000:
__all__.append("finditer")
def finditer(pattern, string, flags=0):
"""Return an iterator over all non-overlapping matches in the
string. For each match, the iterator returns a match object.
Empty matches are included in the result."""
return _compile(pattern, flags).finditer(string)
def compile(pattern, flags=0):
"Compile a regular expression pattern, returning a pattern object."
return _compile(pattern, flags)
def purge():
"Clear the regular expression caches"
_cache.clear()
_cache_repl.clear()
def template(pattern, flags=0):
"Compile a template pattern, returning a pattern object"
return _compile(pattern, flags|T)
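# Usage sketch (illustrative, not part of the original module): how the
# flag constants above combine with the public helpers. Wrapped in a
# function so nothing runs at import time; the sample text is made up.
def _demo_flags():
    text = "First line\nsecond LINE"
    # Flags are bit masks, so they combine with bitwise OR.
    pat = compile(r"^\w+ line$", I | M)
    print(pat.findall(text))    # ['First line', 'second LINE']
    print(sub(r"line", "row", text, flags=IGNORECASE))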
_alphanum_str = frozenset(
    "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
_alphanum_bytes = frozenset(
    b"_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
def escape(pattern):
"""
Escape all the characters in pattern except ASCII letters, numbers and '_'.
"""
if isinstance(pattern, str):
alphanum = _alphanum_str
s = list(pattern)
for i, c in enumerate(pattern):
if c not in alphanum:
if c == "\000":
s[i] = "\\000"
else:
s[i] = "\\" + c
return "".join(s)
else:
alphanum = _alphanum_bytes
s = []
esc = ord(b"\\")
for c in pattern:
if c in alphanum:
s.append(c)
else:
if c == 0:
s.extend(b"\\000")
else:
s.append(esc)
s.append(c)
return bytes(s)
# --------------------------------------------------------------------
# internals
_cache = {}
_cache_repl = {}
_pattern_type = type(sre_compile.compile("", 0))
_MAXCACHE = 512
def _compile(pattern, flags):
# internal: compile pattern
bypass_cache = flags & DEBUG
if not bypass_cache:
try:
p, loc = _cache[type(pattern), pattern, flags]
if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
return p
except KeyError:
pass
if isinstance(pattern, _pattern_type):
if flags:
raise ValueError(
"Cannot process flags argument with a compiled pattern")
return pattern
if not sre_compile.isstring(pattern):
raise TypeError("first argument must be string or compiled pattern")
p = sre_compile.compile(pattern, flags)
if not bypass_cache:
if len(_cache) >= _MAXCACHE:
_cache.clear()
if p.flags & LOCALE:
if not _locale:
return p
loc = _locale.setlocale(_locale.LC_CTYPE)
else:
loc = None
_cache[type(pattern), pattern, flags] = p, loc
return p
def _compile_repl(repl, pattern):
# internal: compile replacement pattern
try:
return _cache_repl[repl, pattern]
except KeyError:
pass
p = sre_parse.parse_template(repl, pattern)
if len(_cache_repl) >= _MAXCACHE:
_cache_repl.clear()
_cache_repl[repl, pattern] = p
return p
def _expand(pattern, match, template):
# internal: match.expand implementation hook
template = sre_parse.parse_template(template, pattern)
return sre_parse.expand_template(template, match)
def _subx(pattern, template):
# internal: pattern.sub/subn implementation helper
template = _compile_repl(template, pattern)
if not template[0] and len(template[1]) == 1:
# literal replacement
return template[1][0]
def filter(match, template=template):
return sre_parse.expand_template(template, match)
return filter
# register myself for pickling
import copyreg
def _pickle(p):
return _compile, (p.pattern, p.flags)
copyreg.pickle(_pattern_type, _pickle, _compile)
# --------------------------------------------------------------------
# experimental stuff (see python-dev discussions for details)
class Scanner:
def __init__(self, lexicon, flags=0):
from sre_constants import BRANCH, SUBPATTERN
self.lexicon = lexicon
# combine phrases into a compound pattern
p = []
s = sre_parse.Pattern()
s.flags = flags
for phrase, action in lexicon:
p.append(sre_parse.SubPattern(s, [
(SUBPATTERN, (len(p)+1, sre_parse.parse(phrase, flags))),
]))
s.groups = len(p)+1
p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
self.scanner = sre_compile.compile(p)
def scan(self, string):
result = []
append = result.append
match = self.scanner.scanner(string).match
i = 0
while 1:
m = match()
if not m:
break
j = m.end()
if i == j:
break
action = self.lexicon[m.lastindex-1][1]
if callable(action):
self.match = m
action = action(self, m.group())
if action is not None:
append(action)
i = j
return result, string[i:]
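# Usage sketch (illustrative, not part of the original module): driving
# the experimental Scanner class above with a tiny, made-up lexicon.
# Each action receives the scanner and the matched text.
def _demo_scanner():
    scanner = Scanner([
        (r"\d+", lambda s, tok: ("NUM", int(tok))),
        (r"[a-z]+", lambda s, tok: ("WORD", tok)),
        (r"\s+", None),    # None actions skip the match
    ])
    tokens, remainder = scanner.scan("abc 42 !")
    print(tokens)       # [('WORD', 'abc'), ('NUM', 42)]
    print(remainder)    # '!' (first text no lexicon pattern matched)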
"""Redo the builtin repr() (representation) but with limits on most sizes."""
__all__ = ["Repr", "repr", "recursive_repr"]
import builtins
from itertools import islice
try:
from _thread import get_ident
except ImportError:
from _dummy_thread import get_ident
def recursive_repr(fillvalue='...'):
'Decorator to make a repr function return fillvalue for a recursive call'
def decorating_function(user_function):
repr_running = set()
def wrapper(self):
key = id(self), get_ident()
if key in repr_running:
return fillvalue
repr_running.add(key)
try:
result = user_function(self)
finally:
repr_running.discard(key)
return result
# Can't use functools.wraps() here because of bootstrap issues
wrapper.__module__ = getattr(user_function, '__module__')
wrapper.__doc__ = getattr(user_function, '__doc__')
wrapper.__name__ = getattr(user_function, '__name__')
wrapper.__annotations__ = getattr(user_function, '__annotations__', {})
return wrapper
return decorating_function
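# Usage sketch (illustrative, not part of the original module): without
# recursive_repr, a self-referential container's __repr__ would recurse
# until the stack overflowed; with it, the inner call yields '...'.
def _demo_recursive_repr():
    class Box(list):
        @recursive_repr()
        def __repr__(self):
            return 'Box(' + ', '.join(map(repr, self)) + ')'
    b = Box()
    b.append(b)         # the box now contains itself
    print(b)            # Box(...)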
class Repr:
def __init__(self):
self.maxlevel = 6
self.maxtuple = 6
self.maxlist = 6
self.maxarray = 5
self.maxdict = 4
self.maxset = 6
self.maxfrozenset = 6
self.maxdeque = 6
self.maxstring = 30
self.maxlong = 40
self.maxother = 30
def repr(self, x):
return self.repr1(x, self.maxlevel)
def repr1(self, x, level):
typename = type(x).__name__
if ' ' in typename:
parts = typename.split()
typename = '_'.join(parts)
if hasattr(self, 'repr_' + typename):
return getattr(self, 'repr_' + typename)(x, level)
else:
return self.repr_instance(x, level)
def _repr_iterable(self, x, level, left, right, maxiter, trail=''):
n = len(x)
if level <= 0 and n:
s = '...'
else:
newlevel = level - 1
repr1 = self.repr1
pieces = [repr1(elem, newlevel) for elem in islice(x, maxiter)]
if n > maxiter: pieces.append('...')
s = ', '.join(pieces)
if n == 1 and trail: right = trail + right
return '%s%s%s' % (left, s, right)
def repr_tuple(self, x, level):
return self._repr_iterable(x, level, '(', ')', self.maxtuple, ',')
def repr_list(self, x, level):
return self._repr_iterable(x, level, '[', ']', self.maxlist)
def repr_array(self, x, level):
header = "array('%s', [" % x.typecode
return self._repr_iterable(x, level, header, '])', self.maxarray)
def repr_set(self, x, level):
x = _possibly_sorted(x)
return self._repr_iterable(x, level, 'set([', '])', self.maxset)
def repr_frozenset(self, x, level):
x = _possibly_sorted(x)
return self._repr_iterable(x, level, 'frozenset([', '])',
self.maxfrozenset)
def repr_deque(self, x, level):
return self._repr_iterable(x, level, 'deque([', '])', self.maxdeque)
def repr_dict(self, x, level):
n = len(x)
if n == 0: return '{}'
if level <= 0: return '{...}'
newlevel = level - 1
repr1 = self.repr1
pieces = []
for key in islice(_possibly_sorted(x), self.maxdict):
keyrepr = repr1(key, newlevel)
valrepr = repr1(x[key], newlevel)
pieces.append('%s: %s' % (keyrepr, valrepr))
if n > self.maxdict: pieces.append('...')
s = ', '.join(pieces)
return '{%s}' % (s,)
def repr_str(self, x, level):
s = builtins.repr(x[:self.maxstring])
if len(s) > self.maxstring:
i = max(0, (self.maxstring-3)//2)
j = max(0, self.maxstring-3-i)
s = builtins.repr(x[:i] + x[len(x)-j:])
s = s[:i] + '...' + s[len(s)-j:]
return s
def repr_int(self, x, level):
s = builtins.repr(x) # XXX Hope this isn't too slow...
if len(s) > self.maxlong:
i = max(0, (self.maxlong-3)//2)
j = max(0, self.maxlong-3-i)
s = s[:i] + '...' + s[len(s)-j:]
return s
def repr_instance(self, x, level):
try:
s = builtins.repr(x)
# Bugs in x.__repr__() can cause arbitrary
# exceptions -- then make up something
except Exception:
return '<%s instance at %x>' % (x.__class__.__name__, id(x))
if len(s) > self.maxother:
i = max(0, (self.maxother-3)//2)
j = max(0, self.maxother-3-i)
s = s[:i] + '...' + s[len(s)-j:]
return s
def _possibly_sorted(x):
# Since not all sequences of items can be sorted and comparison
# functions may raise arbitrary exceptions, return an unsorted
# sequence in that case.
try:
return sorted(x)
except Exception:
return list(x)
aRepr = Repr()
repr = aRepr.repr
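# Usage sketch (illustrative, not part of the original module): the
# module-level repr() bound above truncates long containers and strings,
# and the per-instance limits are adjustable.
def _demo_limited_repr():
    print(repr(list(range(100))))     # '[0, 1, 2, 3, 4, 5, ...]'
    print(repr('x' * 100))            # middle of the string elided with '...'
    short = Repr()
    short.maxlist = 2
    print(short.repr(list(range(100))))    # '[0, 1, ...]'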
"""Word completion for GNU readline.
The completer completes keywords, built-ins and globals in a selectable
namespace (which defaults to __main__); when completing NAME.NAME..., it
evaluates (!) the expression up to the last dot and completes its attributes.
It's very cool to do "import sys", type "sys.", hit the completion key (twice),
and see the list of names defined by the sys module!
Tip: to use the tab key as the completion key, call
readline.parse_and_bind("tab: complete")
Notes:
- Exceptions raised by the completer function are *ignored* (and generally cause
the completion to fail). This is a feature -- since readline sets the tty
device in raw (or cbreak) mode, printing a traceback wouldn't work well
without some complicated hoopla to save, reset and restore the tty state.
- The evaluation of the NAME.NAME... form may cause arbitrary application
defined code to be executed if an object with a __getattr__ hook is found.
Since it is the responsibility of the application (or the user) to enable this
feature, I consider this an acceptable risk. More complicated expressions
(e.g. function calls or indexing operations) are *not* evaluated.
- When the original stdin is not a tty device, GNU readline is never
used, and this module (and the readline module) are silently inactive.
"""
import atexit
import builtins
import __main__
__all__ = ["Completer"]
class Completer:
def __init__(self, namespace = None):
"""Create a new completer for the command line.
Completer([namespace]) -> completer instance.
If unspecified, the default namespace where completions are performed
is __main__ (technically, __main__.__dict__). Namespaces should be
given as dictionaries.
Completer instances should be used as the completion mechanism of
readline via the set_completer() call:
readline.set_completer(Completer(my_namespace).complete)
"""
if namespace and not isinstance(namespace, dict):
raise TypeError('namespace must be a dictionary')
# Don't bind to namespace quite yet, but flag whether the user wants a
# specific namespace or to use __main__.__dict__. This will allow us
# to bind to __main__.__dict__ at completion time, not now.
if namespace is None:
self.use_main_ns = 1
else:
self.use_main_ns = 0
self.namespace = namespace
def complete(self, text, state):
"""Return the next possible completion for 'text'.
This is called successively with state == 0, 1, 2, ... until it
returns None. The completion should begin with 'text'.
"""
if self.use_main_ns:
self.namespace = __main__.__dict__
if not text.strip():
if state == 0:
return '\t'
else:
return None
if state == 0:
if "." in text:
self.matches = self.attr_matches(text)
else:
self.matches = self.global_matches(text)
try:
return self.matches[state]
except IndexError:
return None
def _callable_postfix(self, val, word):
if callable(val):
word = word + "("
return word
def global_matches(self, text):
"""Compute matches when text is a simple name.
Return a list of all keywords, built-in functions and names currently
defined in self.namespace that match.
"""
import keyword
matches = []
seen = {"__builtins__"}
n = len(text)
for word in keyword.kwlist:
if word[:n] == text:
seen.add(word)
matches.append(word)
for nspace in [self.namespace, builtins.__dict__]:
for word, val in nspace.items():
if word[:n] == text and word not in seen:
seen.add(word)
matches.append(self._callable_postfix(val, word))
return matches
def attr_matches(self, text):
"""Compute matches when text contains a dot.
Assuming the text is of the form NAME.NAME....[NAME], and is
evaluable in self.namespace, it will be evaluated and its attributes
(as revealed by dir()) are used as possible completions. (For class
instances, class members are also considered.)
WARNING: this can still invoke arbitrary C code, if an object
with a __getattr__ hook is evaluated.
"""
import re
m = re.match(r"(\w+(\.\w+)*)\.(\w*)", text)
if not m:
return []
expr, attr = m.group(1, 3)
try:
thisobject = eval(expr, self.namespace)
except Exception:
return []
# get the content of the object, except __builtins__
words = set(dir(thisobject))
words.discard("__builtins__")
if hasattr(thisobject, '__class__'):
words.add('__class__')
words.update(get_class_members(thisobject.__class__))
matches = []
n = len(attr)
for word in words:
if word[:n] == attr:
try:
val = getattr(thisobject, word)
except Exception:
continue # Exclude properties that are not set
word = self._callable_postfix(val, "%s.%s" % (expr, word))
matches.append(word)
matches.sort()
return matches
def get_class_members(klass):
ret = dir(klass)
if hasattr(klass,'__bases__'):
for base in klass.__bases__:
ret = ret + get_class_members(base)
return ret
try:
import readline
except ImportError:
pass
else:
readline.set_completer(Completer().complete)
# Release references early at shutdown (the readline module's
# contents are quasi-immortal, and the completer function holds a
# reference to globals).
atexit.register(lambda: readline.set_completer(None))
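# Usage sketch (illustrative, not part of the original module): Completer
# can be driven without readline by stepping the state argument until it
# returns None.
def _demo_completer():
    import sys
    completer = Completer({'sys': sys})
    state = 0
    while True:
        word = completer.complete('sys.pa', state)
        if word is None:
            break
        print(word)     # e.g. sys.path, sys.path_hooks(, sys.path_importer_cache
        state += 1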
"""runpy.py - locating and running Python code using the module namespace
Provides support for locating and running Python scripts using the Python
module namespace instead of the native filesystem.
This allows Python code to play nicely with non-filesystem based PEP 302
importers when locating support scripts as well as when importing modules.
"""
# Written by Nick Coghlan <ncoghlan at gmail.com>
# to implement PEP 338 (Executing Modules as Scripts)
import sys
import importlib.machinery # importlib first so we can test #15386 via -m
import importlib.util
import types
from pkgutil import read_code, get_importer
__all__ = [
"run_module", "run_path",
]
class _TempModule(object):
"""Temporarily replace a module in sys.modules with an empty namespace"""
def __init__(self, mod_name):
self.mod_name = mod_name
self.module = types.ModuleType(mod_name)
self._saved_module = []
def __enter__(self):
mod_name = self.mod_name
try:
self._saved_module.append(sys.modules[mod_name])
except KeyError:
pass
sys.modules[mod_name] = self.module
return self
def __exit__(self, *args):
if self._saved_module:
sys.modules[self.mod_name] = self._saved_module[0]
else:
del sys.modules[self.mod_name]
self._saved_module = []
class _ModifiedArgv0(object):
def __init__(self, value):
self.value = value
self._saved_value = self._sentinel = object()
def __enter__(self):
if self._saved_value is not self._sentinel:
raise RuntimeError("Already preserving saved value")
self._saved_value = sys.argv[0]
sys.argv[0] = self.value
def __exit__(self, *args):
self.value = self._sentinel
sys.argv[0] = self._saved_value
# TODO: Replace these helpers with importlib._bootstrap._SpecMethods
def _run_code(code, run_globals, init_globals=None,
mod_name=None, mod_spec=None,
pkg_name=None, script_name=None):
"""Helper to run code in nominated namespace"""
if init_globals is not None:
run_globals.update(init_globals)
if mod_spec is None:
loader = None
fname = script_name
cached = None
else:
loader = mod_spec.loader
fname = mod_spec.origin
cached = mod_spec.cached
if pkg_name is None:
pkg_name = mod_spec.parent
run_globals.update(__name__ = mod_name,
__file__ = fname,
__cached__ = cached,
__doc__ = None,
__loader__ = loader,
__package__ = pkg_name,
__spec__ = mod_spec)
exec(code, run_globals)
return run_globals
def _run_module_code(code, init_globals=None,
mod_name=None, mod_spec=None,
pkg_name=None, script_name=None):
"""Helper to run code in new namespace with sys modified"""
fname = script_name if mod_spec is None else mod_spec.origin
with _TempModule(mod_name) as temp_module, _ModifiedArgv0(fname):
mod_globals = temp_module.module.__dict__
_run_code(code, mod_globals, init_globals,
mod_name, mod_spec, pkg_name, script_name)
# Copy the globals of the temporary module, as they
# may be cleared when the temporary module goes away
return mod_globals.copy()
# Helper to get the loader, code and filename for a module
def _get_module_details(mod_name):
try:
spec = importlib.util.find_spec(mod_name)
except (ImportError, AttributeError, TypeError, ValueError) as ex:
# This hack fixes an impedance mismatch between pkgutil and
# importlib, where the latter raises other errors for cases where
# pkgutil previously raised ImportError
msg = "Error while finding spec for {!r} ({}: {})"
raise ImportError(msg.format(mod_name, type(ex), ex)) from ex
if spec is None:
raise ImportError("No module named %s" % mod_name)
if spec.submodule_search_locations is not None:
if mod_name == "__main__" or mod_name.endswith(".__main__"):
raise ImportError("Cannot use package as __main__ module")
try:
pkg_main_name = mod_name + ".__main__"
return _get_module_details(pkg_main_name)
except ImportError as e:
raise ImportError(("%s; %r is a package and cannot " +
"be directly executed") %(e, mod_name))
loader = spec.loader
if loader is None:
raise ImportError("%r is a namespace package and cannot be executed"
% mod_name)
code = loader.get_code(mod_name)
if code is None:
raise ImportError("No code object available for %s" % mod_name)
return mod_name, spec, code
# XXX ncoghlan: Should this be documented and made public?
# (Current thoughts: don't repeat the mistake that led to its
# creation when run_module() no longer met the needs of
# mainmodule.c, but couldn't be changed because it was public)
def _run_module_as_main(mod_name, alter_argv=True):
"""Runs the designated module in the __main__ namespace
Note that the executed module will have full access to the
__main__ namespace. If this is not desirable, the run_module()
function should be used to run the module code in a fresh namespace.
At the very least, these variables in __main__ will be overwritten:
__name__
__file__
__cached__
__loader__
__package__
"""
try:
if alter_argv or mod_name != "__main__": # i.e. -m switch
mod_name, mod_spec, code = _get_module_details(mod_name)
else: # i.e. directory or zipfile execution
mod_name, mod_spec, code = _get_main_module_details()
except ImportError as exc:
# Try to provide a good error message
# for directories, zip files and the -m switch
if alter_argv:
# For -m switch, just display the exception
info = str(exc)
else:
# For directories/zipfiles, let the user
# know what the code was looking for
info = "can't find '__main__' module in %r" % sys.argv[0]
msg = "%s: %s" % (sys.executable, info)
sys.exit(msg)
main_globals = sys.modules["__main__"].__dict__
if alter_argv:
sys.argv[0] = mod_spec.origin
return _run_code(code, main_globals, None,
"__main__", mod_spec)
def run_module(mod_name, init_globals=None,
run_name=None, alter_sys=False):
"""Execute a module's code without importing it
Returns the resulting top level namespace dictionary
"""
mod_name, mod_spec, code = _get_module_details(mod_name)
if run_name is None:
run_name = mod_name
if alter_sys:
return _run_module_code(code, init_globals, run_name, mod_spec)
else:
# Leave the sys module alone
return _run_code(code, {}, init_globals, run_name, mod_spec)
def _get_main_module_details():
# Helper that gives a nicer error message when attempting to
# execute a zipfile or directory by invoking __main__.py
# Also moves the standard __main__ out of the way so that the
# preexisting __loader__ entry doesn't cause issues
main_name = "__main__"
saved_main = sys.modules[main_name]
del sys.modules[main_name]
try:
return _get_module_details(main_name)
except ImportError as exc:
if main_name in str(exc):
raise ImportError("can't find %r module in %r" %
(main_name, sys.path[0])) from exc
raise
finally:
sys.modules[main_name] = saved_main
def _get_code_from_file(run_name, fname):
# Check for a compiled file first
with open(fname, "rb") as f:
code = read_code(f)
if code is None:
# That didn't work, so try it as normal source code
with open(fname, "rb") as f:
code = compile(f.read(), fname, 'exec')
return code, fname
def run_path(path_name, init_globals=None, run_name=None):
"""Execute code located at the specified filesystem location
Returns the resulting top level namespace dictionary
The file path may refer directly to a Python script (i.e.
one that could be directly executed with execfile) or else
it may refer to a zipfile or directory containing a top
level __main__.py script.
"""
if run_name is None:
run_name = "<run_path>"
pkg_name = run_name.rpartition(".")[0]
importer = get_importer(path_name)
# Trying to avoid importing imp so as to not consume the deprecation warning.
is_NullImporter = False
if type(importer).__module__ == 'imp':
if type(importer).__name__ == 'NullImporter':
is_NullImporter = True
if isinstance(importer, type(None)) or is_NullImporter:
# Not a valid sys.path entry, so run the code directly
# execfile() doesn't help as we want to allow compiled files
code, fname = _get_code_from_file(run_name, path_name)
return _run_module_code(code, init_globals, run_name,
pkg_name=pkg_name, script_name=fname)
else:
# Importer is defined for path, so add it to
# the start of sys.path
sys.path.insert(0, path_name)
try:
# Here's where things are a little different from the run_module
# case. There, we only had to replace the module in sys while the
# code was running and doing so was somewhat optional. Here, we
# have no choice and we have to remove it even while we read the
# code. If we don't do this, a __loader__ attribute in the
# existing __main__ module may prevent location of the new module.
mod_name, mod_spec, code = _get_main_module_details()
with _TempModule(run_name) as temp_module, \
_ModifiedArgv0(path_name):
mod_globals = temp_module.module.__dict__
return _run_code(code, mod_globals, init_globals,
run_name, mod_spec, pkg_name).copy()
finally:
try:
sys.path.remove(path_name)
except ValueError:
pass
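# Usage sketch (illustrative, not part of the original module):
# run_module() executes a module's code in a fresh namespace and returns
# the resulting globals; 'this' (the Zen of Python) is used as a harmless
# target here.
def _demo_run_module():
    ns = run_module('this', run_name='__main__')    # prints the Zen
    print(sorted(k for k in ns if not k.startswith('__')))    # ['c', 'd', 'i', 's']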
if __name__ == "__main__":
# Run the module specified as the next command line argument
if len(sys.argv) < 2:
print("No module specified for execution", file=sys.stderr)
else:
del sys.argv[0] # Make the requested module sys.argv[0]
_run_module_as_main(sys.argv[0])
"""A generally useful event scheduler class.
Each instance of this class manages its own queue.
No multi-threading is implied; you are supposed to hack that
yourself, or use a single instance per application.
Each instance is parametrized with two functions, one that is
supposed to return the current time, one that is supposed to
implement a delay. You can implement real-time scheduling by
substituting time and sleep from built-in module time, or you can
implement simulated time by writing your own functions. This can
also be used to integrate scheduling with STDWIN events; the delay
function is allowed to modify the queue. Time can be expressed as
integers or floating point numbers, as long as it is consistent.
Events are specified by tuples (time, priority, action, argument, kwargs).
As in UNIX, lower priority numbers mean higher priority; in this
way the queue can be maintained as a priority queue. Execution of the
event means calling the action function, passing it the argument
sequence in "argument" (remember that in Python, multiple function
arguments are be packed in a sequence) and keyword parameters in "kwargs".
The action function may be an instance method so it
has another way to reference private data (besides global variables).
"""
# XXX The timefunc and delayfunc should have been defined as methods
# XXX so you can define new kinds of schedulers using subclassing
# XXX instead of having to define a module or class just to hold
# XXX the global state of your particular time and delay functions.
import time
import heapq
from collections import namedtuple
try:
import threading
except ImportError:
import dummy_threading as threading
try:
from time import monotonic as _time
except ImportError:
from time import time as _time
__all__ = ["scheduler"]
class Event(namedtuple('Event', 'time, priority, action, argument, kwargs')):
def __eq__(s, o): return (s.time, s.priority) == (o.time, o.priority)
def __ne__(s, o): return (s.time, s.priority) != (o.time, o.priority)
def __lt__(s, o): return (s.time, s.priority) < (o.time, o.priority)
def __le__(s, o): return (s.time, s.priority) <= (o.time, o.priority)
def __gt__(s, o): return (s.time, s.priority) > (o.time, o.priority)
def __ge__(s, o): return (s.time, s.priority) >= (o.time, o.priority)
_sentinel = object()
class scheduler:
def __init__(self, timefunc=_time, delayfunc=time.sleep):
"""Initialize a new instance, passing the time and delay
functions"""
self._queue = []
self._lock = threading.RLock()
self.timefunc = timefunc
self.delayfunc = delayfunc
def enterabs(self, time, priority, action, argument=(), kwargs=_sentinel):
"""Enter a new event in the queue at an absolute time.
Returns an ID for the event which can be used to remove it,
if necessary.
"""
if kwargs is _sentinel:
kwargs = {}
event = Event(time, priority, action, argument, kwargs)
with self._lock:
heapq.heappush(self._queue, event)
return event # The ID
def enter(self, delay, priority, action, argument=(), kwargs=_sentinel):
"""A variant that specifies the time as a relative time.
This is actually the more commonly used interface.
"""
time = self.timefunc() + delay
return self.enterabs(time, priority, action, argument, kwargs)
def cancel(self, event):
"""Remove an event from the queue.
This must be presented the ID as returned by enter().
If the event is not in the queue, this raises ValueError.
"""
with self._lock:
self._queue.remove(event)
heapq.heapify(self._queue)
def empty(self):
"""Check whether the queue is empty."""
with self._lock:
return not self._queue
def run(self, blocking=True):
"""Execute events until the queue is empty.
If blocking is False, execute the scheduled events due to
expire soonest (if any) and then return the deadline of the
next scheduled call in the scheduler.
When there is a positive delay until the first event, the
delay function is called and the event is left in the queue;
otherwise, the event is removed from the queue and executed
(its action function is called, passing it the argument). If
the delay function returns prematurely, it is simply
restarted.
It is legal for both the delay function and the action
function to modify the queue or to raise an exception;
exceptions are not caught but the scheduler's state remains
well-defined so run() may be called again.
A questionable hack is added to allow other threads to run:
just after an event is executed, a delay of 0 is executed, to
avoid monopolizing the CPU when other threads are also
runnable.
"""
# localize variable access to minimize overhead
# and to improve thread safety
lock = self._lock
q = self._queue
delayfunc = self.delayfunc
timefunc = self.timefunc
pop = heapq.heappop
while True:
with lock:
if not q:
break
time, priority, action, argument, kwargs = q[0]
now = timefunc()
if time > now:
delay = True
else:
delay = False
pop(q)
if delay:
if not blocking:
return time - now
delayfunc(time - now)
else:
action(*argument, **kwargs)
delayfunc(0) # Let other threads run
@property
def queue(self):
"""An ordered list of upcoming events.
Events are named tuples with fields for:
time, priority, action, arguments, kwargs
"""
# Use heapq to sort the queue rather than using 'sorted(self._queue)'.
# With heapq, two events scheduled at the same time will show in
# the actual order they would be retrieved.
with self._lock:
events = self._queue[:]
return list(map(heapq.heappop, [events]*len(events)))
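# Usage sketch (illustrative, not part of the original module): real-time
# scheduling with the default time and delay functions. With equal
# priorities, the earlier deadline runs first.
def _demo_scheduler():
    s = scheduler()
    s.enter(0.2, 1, print, argument=('second',))
    s.enter(0.1, 1, print, argument=('first',))
    s.run()    # prints 'first', then 'second'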
"""Selectors module.
This module allows high-level and efficient I/O multiplexing, built upon the
`select` module primitives.
"""
from abc import ABCMeta, abstractmethod
from collections import namedtuple, Mapping
import math
import select
import sys
# generic events, that must be mapped to implementation-specific ones
EVENT_READ = (1 << 0)
EVENT_WRITE = (1 << 1)
def _fileobj_to_fd(fileobj):
"""Return a file descriptor from a file object.
Parameters:
fileobj -- file object or file descriptor
Returns:
corresponding file descriptor
Raises:
ValueError if the object is invalid
"""
if isinstance(fileobj, int):
fd = fileobj
else:
try:
fd = int(fileobj.fileno())
except (AttributeError, TypeError, ValueError):
raise ValueError("Invalid file object: "
"{!r}".format(fileobj)) from None
if fd < 0:
raise ValueError("Invalid file descriptor: {}".format(fd))
return fd
SelectorKey = namedtuple('SelectorKey', ['fileobj', 'fd', 'events', 'data'])
"""Object used to associate a file object to its backing file descriptor,
selected event mask and attached data."""
class _SelectorMapping(Mapping):
"""Mapping of file objects to selector keys."""
def __init__(self, selector):
self._selector = selector
def __len__(self):
return len(self._selector._fd_to_key)
def __getitem__(self, fileobj):
try:
fd = self._selector._fileobj_lookup(fileobj)
return self._selector._fd_to_key[fd]
except KeyError:
raise KeyError("{!r} is not registered".format(fileobj)) from None
def __iter__(self):
return iter(self._selector._fd_to_key)
class BaseSelector(metaclass=ABCMeta):
"""Selector abstract base class.
A selector supports registering file objects to be monitored for specific
I/O events.
A file object is a file descriptor or any object with a `fileno()` method.
An arbitrary object can be attached to the file object, which can be used
for example to store context information, a callback, etc.
A selector can use various implementations (select(), poll(), epoll()...)
depending on the platform. The default `Selector` class uses the most
efficient implementation on the current platform.
"""
@abstractmethod
def register(self, fileobj, events, data=None):
"""Register a file object.
Parameters:
fileobj -- file object or file descriptor
events -- events to monitor (bitwise mask of EVENT_READ|EVENT_WRITE)
data -- attached data
Returns:
SelectorKey instance
Raises:
ValueError if events is invalid
KeyError if fileobj is already registered
OSError if fileobj is closed or otherwise is unacceptable to
the underlying system call (if a system call is made)
Note:
OSError may or may not be raised
"""
raise NotImplementedError
@abstractmethod
def unregister(self, fileobj):
"""Unregister a file object.
Parameters:
fileobj -- file object or file descriptor
Returns:
SelectorKey instance
Raises:
KeyError if fileobj is not registered
Note:
If fileobj is registered but has since been closed this does
*not* raise OSError (even if the wrapped syscall does)
"""
raise NotImplementedError
def modify(self, fileobj, events, data=None):
"""Change a registered file object monitored events or attached data.
Parameters:
fileobj -- file object or file descriptor
events -- events to monitor (bitwise mask of EVENT_READ|EVENT_WRITE)
data -- attached data
Returns:
SelectorKey instance
Raises:
Anything that unregister() or register() raises
"""
self.unregister(fileobj)
return self.register(fileobj, events, data)
@abstractmethod
def select(self, timeout=None):
"""Perform the actual selection, until some monitored file objects are
ready or a timeout expires.
Parameters:
timeout -- if timeout > 0, this specifies the maximum wait time, in
seconds
if timeout <= 0, the select() call won't block, and will
report the currently ready file objects
if timeout is None, select() will block until a monitored
file object becomes ready
Returns:
list of (key, events) for ready file objects
`events` is a bitwise mask of EVENT_READ|EVENT_WRITE
"""
raise NotImplementedError
def close(self):
"""Close the selector.
This must be called to make sure that any underlying resource is freed.
"""
pass
def get_key(self, fileobj):
"""Return the key associated to a registered file object.
Returns:
SelectorKey for this file object
"""
mapping = self.get_map()
try:
if mapping is None:
raise KeyError
return mapping[fileobj]
except KeyError:
raise KeyError("{!r} is not registered".format(fileobj)) from None
@abstractmethod
def get_map(self):
"""Return a mapping of file objects to selector keys."""
raise NotImplementedError
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
class _BaseSelectorImpl(BaseSelector):
"""Base selector implementation."""
def __init__(self):
# this maps file descriptors to keys
self._fd_to_key = {}
# read-only mapping returned by get_map()
self._map = _SelectorMapping(self)
def _fileobj_lookup(self, fileobj):
"""Return a file descriptor from a file object.
This wraps _fileobj_to_fd() to do an exhaustive search in case
the object is invalid but we still have it in our map. This
is used by unregister() so we can unregister an object that
was previously registered even if it is closed. It is also
used by _SelectorMapping.
"""
try:
return _fileobj_to_fd(fileobj)
except ValueError:
# Do an exhaustive search.
for key in self._fd_to_key.values():
if key.fileobj is fileobj:
return key.fd
# Raise ValueError after all.
raise
def register(self, fileobj, events, data=None):
if (not events) or (events & ~(EVENT_READ | EVENT_WRITE)):
raise ValueError("Invalid events: {!r}".format(events))
key = SelectorKey(fileobj, self._fileobj_lookup(fileobj), events, data)
if key.fd in self._fd_to_key:
raise KeyError("{!r} (FD {}) is already registered"
.format(fileobj, key.fd))
self._fd_to_key[key.fd] = key
return key
def unregister(self, fileobj):
try:
key = self._fd_to_key.pop(self._fileobj_lookup(fileobj))
except KeyError:
raise KeyError("{!r} is not registered".format(fileobj)) from None
return key
def modify(self, fileobj, events, data=None):
# TODO: Subclasses can probably optimize this even further.
try:
key = self._fd_to_key[self._fileobj_lookup(fileobj)]
except KeyError:
raise KeyError("{!r} is not registered".format(fileobj)) from None
if events != key.events:
self.unregister(fileobj)
key = self.register(fileobj, events, data)
elif data != key.data:
# Use a shortcut to update the data.
key = key._replace(data=data)
self._fd_to_key[key.fd] = key
return key
def close(self):
self._fd_to_key.clear()
self._map = None
def get_map(self):
return self._map
def _key_from_fd(self, fd):
"""Return the key associated to a given file descriptor.
Parameters:
fd -- file descriptor
Returns:
corresponding key, or None if not found
"""
try:
return self._fd_to_key[fd]
except KeyError:
return None
class SelectSelector(_BaseSelectorImpl):
"""Select-based selector."""
def __init__(self):
super().__init__()
self._readers = set()
self._writers = set()
def register(self, fileobj, events, data=None):
key = super().register(fileobj, events, data)
if events & EVENT_READ:
self._readers.add(key.fd)
if events & EVENT_WRITE:
self._writers.add(key.fd)
return key
def unregister(self, fileobj):
key = super().unregister(fileobj)
self._readers.discard(key.fd)
self._writers.discard(key.fd)
return key
if sys.platform == 'win32':
def _select(self, r, w, _, timeout=None):
r, w, x = select.select(r, w, w, timeout)
return r, w + x, []
else:
_select = select.select
def select(self, timeout=None):
timeout = None if timeout is None else max(timeout, 0)
ready = []
try:
r, w, _ = self._select(self._readers, self._writers, [], timeout)
except InterruptedError:
return ready
r = set(r)
w = set(w)
for fd in r | w:
events = 0
if fd in r:
events |= EVENT_READ
if fd in w:
events |= EVENT_WRITE
key = self._key_from_fd(fd)
if key:
ready.append((key, events & key.events))
return ready
if hasattr(select, 'poll'):
class PollSelector(_BaseSelectorImpl):
"""Poll-based selector."""
def __init__(self):
super().__init__()
self._poll = select.poll()
def register(self, fileobj, events, data=None):
key = super().register(fileobj, events, data)
poll_events = 0
if events & EVENT_READ:
poll_events |= select.POLLIN
if events & EVENT_WRITE:
poll_events |= select.POLLOUT
self._poll.register(key.fd, poll_events)
return key
def unregister(self, fileobj):
key = super().unregister(fileobj)
self._poll.unregister(key.fd)
return key
def select(self, timeout=None):
if timeout is None:
timeout = None
elif timeout <= 0:
timeout = 0
else:
# poll() has a resolution of 1 millisecond, round away from
# zero to wait *at least* timeout seconds.
timeout = math.ceil(timeout * 1e3)
ready = []
try:
fd_event_list = self._poll.poll(timeout)
except InterruptedError:
return ready
for fd, event in fd_event_list:
events = 0
if event & ~select.POLLIN:
events |= EVENT_WRITE
if event & ~select.POLLOUT:
events |= EVENT_READ
key = self._key_from_fd(fd)
if key:
ready.append((key, events & key.events))
return ready
if hasattr(select, 'epoll'):
class EpollSelector(_BaseSelectorImpl):
"""Epoll-based selector."""
def __init__(self):
super().__init__()
self._epoll = select.epoll()
def fileno(self):
return self._epoll.fileno()
def register(self, fileobj, events, data=None):
key = super().register(fileobj, events, data)
epoll_events = 0
if events & EVENT_READ:
epoll_events |= select.EPOLLIN
if events & EVENT_WRITE:
epoll_events |= select.EPOLLOUT
try:
self._epoll.register(key.fd, epoll_events)
except BaseException:
super().unregister(fileobj)
raise
return key
def unregister(self, fileobj):
key = super().unregister(fileobj)
try:
self._epoll.unregister(key.fd)
except OSError:
# This can happen if the FD was closed since it
# was registered.
pass
return key
def select(self, timeout=None):
if timeout is None:
timeout = -1
elif timeout <= 0:
timeout = 0
else:
# epoll_wait() has a resolution of 1 millisecond, round away
# from zero to wait *at least* timeout seconds.
timeout = math.ceil(timeout * 1e3) * 1e-3
# epoll_wait() expects `maxevents` to be greater than zero;
# we want to make sure that `select()` can be called when no
# FD is registered.
max_ev = max(len(self._fd_to_key), 1)
ready = []
try:
fd_event_list = self._epoll.poll(timeout, max_ev)
except InterruptedError:
return ready
for fd, event in fd_event_list:
events = 0
if event & ~select.EPOLLIN:
events |= EVENT_WRITE
if event & ~select.EPOLLOUT:
events |= EVENT_READ
key = self._key_from_fd(fd)
if key:
ready.append((key, events & key.events))
return ready
def close(self):
try:
self._epoll.close()
finally:
super().close()
if hasattr(select, 'kqueue'):
class KqueueSelector(_BaseSelectorImpl):
"""Kqueue-based selector."""
def __init__(self):
super().__init__()
self._kqueue = select.kqueue()
def fileno(self):
return self._kqueue.fileno()
def register(self, fileobj, events, data=None):
key = super().register(fileobj, events, data)
try:
if events & EVENT_READ:
kev = select.kevent(key.fd, select.KQ_FILTER_READ,
select.KQ_EV_ADD)
self._kqueue.control([kev], 0, 0)
if events & EVENT_WRITE:
kev = select.kevent(key.fd, select.KQ_FILTER_WRITE,
select.KQ_EV_ADD)
self._kqueue.control([kev], 0, 0)
except BaseException:
super().unregister(fileobj)
raise
return key
def unregister(self, fileobj):
key = super().unregister(fileobj)
if key.events & EVENT_READ:
kev = select.kevent(key.fd, select.KQ_FILTER_READ,
select.KQ_EV_DELETE)
try:
self._kqueue.control([kev], 0, 0)
except OSError:
# This can happen if the FD was closed since it
# was registered.
pass
if key.events & EVENT_WRITE:
kev = select.kevent(key.fd, select.KQ_FILTER_WRITE,
select.KQ_EV_DELETE)
try:
self._kqueue.control([kev], 0, 0)
except OSError:
# See comment above.
pass
return key
def select(self, timeout=None):
timeout = None if timeout is None else max(timeout, 0)
max_ev = len(self._fd_to_key)
ready = []
try:
kev_list = self._kqueue.control(None, max_ev, timeout)
except InterruptedError:
return ready
for kev in kev_list:
fd = kev.ident
flag = kev.filter
events = 0
if flag == select.KQ_FILTER_READ:
events |= EVENT_READ
if flag == select.KQ_FILTER_WRITE:
events |= EVENT_WRITE
key = self._key_from_fd(fd)
if key:
ready.append((key, events & key.events))
return ready
def close(self):
try:
self._kqueue.close()
finally:
super().close()
# Choose the best implementation: roughly, epoll|kqueue > poll > select.
# select() also can't accept a FD > FD_SETSIZE (usually around 1024)
if 'KqueueSelector' in globals():
DefaultSelector = KqueueSelector
elif 'EpollSelector' in globals():
DefaultSelector = EpollSelector
elif 'PollSelector' in globals():
DefaultSelector = PollSelector
else:
DefaultSelector = SelectSelector
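# Usage sketch (illustrative, not part of the original module): waiting
# for readability on a socket pair with whatever DefaultSelector resolved
# to above (socket.socketpair is POSIX-only at this Python version).
def _demo_selector():
    import socket
    a, b = socket.socketpair()
    with DefaultSelector() as sel:
        sel.register(a, EVENT_READ, data='a is readable')
        b.send(b'ping')
        for key, events in sel.select(timeout=1):
            print(key.data, key.fileobj.recv(16))    # a is readable b'ping'
    a.close()
    b.close()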
"""Manage shelves of pickled objects.
A "shelf" is a persistent, dictionary-like object. The difference
with dbm databases is that the values (not the keys!) in a shelf can
be essentially arbitrary Python objects -- anything that the "pickle"
module can handle. This includes most class instances, recursive data
types, and objects containing lots of shared sub-objects. The keys
are ordinary strings.
To summarize the interface (key is a string, data is an arbitrary
object):
import shelve
d = shelve.open(filename) # open, with (g)dbm filename -- no suffix
d[key] = data # store data at key (overwrites old data if
# using an existing key)
data = d[key] # retrieve a COPY of the data at key (raise
# KeyError if no such key) -- NOTE that this
# access returns a *copy* of the entry!
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists
list = d.keys() # a list of all existing keys (slow!)
d.close() # close it
Depending on the implementation, closing a persistent dictionary may
or may not be necessary to flush changes to disk.
Normally, d[key] returns a COPY of the entry. This needs care when
mutable entries are mutated: for example, if d[key] is a list,
d[key].append(anitem)
does NOT modify the entry d[key] itself, as stored in the persistent
mapping -- it only modifies the copy, which is then immediately
discarded, so that the append has NO effect whatsoever. To append an
item to d[key] in a way that will affect the persistent mapping, use:
data = d[key]
data.append(anitem)
d[key] = data
To avoid the problem with mutable entries, you may pass the keyword
argument writeback=True in the call to shelve.open. When you use:
d = shelve.open(filename, writeback=True)
then d keeps a cache of all entries you access, and writes them all back
to the persistent mapping when you call d.close(). This ensures that
such usage as d[key].append(anitem) works as intended.
However, using keyword argument writeback=True may consume vast amounts
of memory for the cache, and it may make d.close() very slow, if you
access many of d's entries after opening it in this way: d has no way to
check which of the entries you access are mutable and/or which ones you
actually mutate, so it must cache, and write back at close, all of the
entries that you access. You can call d.sync() to write back all the
entries in the cache, and empty the cache (d.sync() also synchronizes
the persistent dictionary on disk, if feasible).
"""
from pickle import Pickler, Unpickler
from io import BytesIO
import collections
__all__ = ["Shelf", "BsdDbShelf", "DbfilenameShelf", "open"]
class _ClosedDict(collections.MutableMapping):
'Marker for a closed dict. Access attempts raise a ValueError.'
def closed(self, *args):
raise ValueError('invalid operation on closed shelf')
__iter__ = __len__ = __getitem__ = __setitem__ = __delitem__ = keys = closed
def __repr__(self):
return '<Closed Dictionary>'
class Shelf(collections.MutableMapping):
"""Base class for shelf implementations.
This is initialized with a dictionary-like object.
See the module's __doc__ string for an overview of the interface.
"""
def __init__(self, dict, protocol=None, writeback=False,
keyencoding="utf-8"):
self.dict = dict
if protocol is None:
protocol = 3
self._protocol = protocol
self.writeback = writeback
self.cache = {}
self.keyencoding = keyencoding
def __iter__(self):
for k in self.dict.keys():
yield k.decode(self.keyencoding)
def __len__(self):
return len(self.dict)
def __contains__(self, key):
return key.encode(self.keyencoding) in self.dict
def get(self, key, default=None):
if key.encode(self.keyencoding) in self.dict:
return self[key]
return default
def __getitem__(self, key):
try:
value = self.cache[key]
except KeyError:
f = BytesIO(self.dict[key.encode(self.keyencoding)])
value = Unpickler(f).load()
if self.writeback:
self.cache[key] = value
return value
def __setitem__(self, key, value):
if self.writeback:
self.cache[key] = value
f = BytesIO()
p = Pickler(f, self._protocol)
p.dump(value)
self.dict[key.encode(self.keyencoding)] = f.getvalue()
def __delitem__(self, key):
del self.dict[key.encode(self.keyencoding)]
try:
del self.cache[key]
except KeyError:
pass
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
self.close()
def close(self):
if self.dict is None:
return
try:
self.sync()
try:
self.dict.close()
except AttributeError:
pass
finally:
# Catch errors that may happen when close is called from __del__
# because CPython is in interpreter shutdown.
try:
self.dict = _ClosedDict()
except:
self.dict = None
def __del__(self):
if not hasattr(self, 'writeback'):
# __init__ didn't succeed, so don't bother closing
# see http://bugs.python.org/issue1339007 for details
return
self.close()
def sync(self):
if self.writeback and self.cache:
self.writeback = False
for key, entry in self.cache.items():
self[key] = entry
self.writeback = True
self.cache = {}
if hasattr(self.dict, 'sync'):
self.dict.sync()
class BsdDbShelf(Shelf):
"""Shelf implementation using the "BSD" db interface.
This adds methods first(), next(), previous(), last() and
set_location() that have no counterpart in [g]dbm databases.
The actual database must be opened using one of the "bsddb"
modules "open" routines (i.e. bsddb.hashopen, bsddb.btopen or
bsddb.rnopen) and passed to the constructor.
See the module's __doc__ string for an overview of the interface.
"""
def __init__(self, dict, protocol=None, writeback=False,
keyencoding="utf-8"):
Shelf.__init__(self, dict, protocol, writeback, keyencoding)
def set_location(self, key):
(key, value) = self.dict.set_location(key)
f = BytesIO(value)
return (key.decode(self.keyencoding), Unpickler(f).load())
def next(self):
(key, value) = next(self.dict)
f = BytesIO(value)
return (key.decode(self.keyencoding), Unpickler(f).load())
def previous(self):
(key, value) = self.dict.previous()
f = BytesIO(value)
return (key.decode(self.keyencoding), Unpickler(f).load())
def first(self):
(key, value) = self.dict.first()
f = BytesIO(value)
return (key.decode(self.keyencoding), Unpickler(f).load())
def last(self):
(key, value) = self.dict.last()
f = BytesIO(value)
return (key.decode(self.keyencoding), Unpickler(f).load())
class DbfilenameShelf(Shelf):
"""Shelf implementation using the "dbm" generic dbm interface.
This is initialized with the filename for the dbm database.
See the module's __doc__ string for an overview of the interface.
"""
def __init__(self, filename, flag='c', protocol=None, writeback=False):
import dbm
Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
def open(filename, flag='c', protocol=None, writeback=False):
"""Open a persistent dictionary for reading and writing.
The filename parameter is the base filename for the underlying
database. As a side-effect, an extension may be added to the
filename and more than one file may be created. The optional flag
parameter has the same interpretation as the flag parameter of
dbm.open(). The optional protocol parameter specifies the
version of the pickle protocol (this module defaults to protocol 3).
See the module's __doc__ string for an overview of the interface.
"""
return DbfilenameShelf(filename, flag, protocol, writeback)
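# Usage sketch (illustrative, not part of the original module): round-
# tripping a mutable value through a shelf; writeback=True makes the
# in-place append stick, as the module docstring explains.
def _demo_shelf():
    import os.path
    import tempfile
    path = os.path.join(tempfile.mkdtemp(), 'demo_shelf')
    with open(path, writeback=True) as db:    # this module's open(), not builtins
        db['numbers'] = [1, 2]
        db['numbers'].append(3)    # cached; written back on close
    with open(path) as db:
        print(db['numbers'])       # [1, 2, 3]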
"""A lexical analyzer class for simple shell-like syntaxes."""
# Module and documentation by Eric S. Raymond, 21 Dec 1998
# Input stacking and error message cleanup added by ESR, March 2000
# push_source() and pop_source() made explicit by ESR, January 2001.
# Posix compliance, split(), string arguments, and
# iterator interface by Gustavo Niemeyer, April 2003.
import os
import re
import sys
from collections import deque
from io import StringIO
__all__ = ["shlex", "split", "quote"]
class shlex:
"A lexical analyzer class for simple shell-like syntaxes."
def __init__(self, instream=None, infile=None, posix=False):
if isinstance(instream, str):
instream = StringIO(instream)
if instream is not None:
self.instream = instream
self.infile = infile
else:
self.instream = sys.stdin
self.infile = None
self.posix = posix
if posix:
self.eof = None
else:
self.eof = ''
self.commenters = '#'
self.wordchars = ('abcdefghijklmnopqrstuvwxyz'
                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_')
if self.posix:
self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')
self.whitespace = ' \t\r\n'
self.whitespace_split = False
self.quotes = '\'"'
self.escape = '\\'
self.escapedquotes = '"'
self.state = ' '
self.pushback = deque()
self.lineno = 1
self.debug = 0
self.token = ''
self.filestack = deque()
self.source = None
if self.debug:
print('shlex: reading from %s, line %d' \
% (self.instream, self.lineno))
def push_token(self, tok):
"Push a token onto the stack popped by the get_token method"
if self.debug >= 1:
print("shlex: pushing token " + repr(tok))
self.pushback.appendleft(tok)
def push_source(self, newstream, newfile=None):
"Push an input source onto the lexer's input source stack."
if isinstance(newstream, str):
newstream = StringIO(newstream)
self.filestack.appendleft((self.infile, self.instream, self.lineno))
self.infile = newfile
self.instream = newstream
self.lineno = 1
if self.debug:
if newfile is not None:
print('shlex: pushing to file %s' % (self.infile,))
else:
print('shlex: pushing to stream %s' % (self.instream,))
def pop_source(self):
"Pop the input source stack."
self.instream.close()
(self.infile, self.instream, self.lineno) = self.filestack.popleft()
if self.debug:
print('shlex: popping to %s, line %d' \
% (self.instream, self.lineno))
self.state = ' '
def get_token(self):
"Get a token from the input stream (or from stack if it's nonempty)"
if self.pushback:
tok = self.pushback.popleft()
if self.debug >= 1:
print("shlex: popping token " + repr(tok))
return tok
# No pushback. Get a token.
raw = self.read_token()
# Handle inclusions
if self.source is not None:
while raw == self.source:
spec = self.sourcehook(self.read_token())
if spec:
(newfile, newstream) = spec
self.push_source(newstream, newfile)
raw = self.get_token()
# Maybe we got EOF instead?
while raw == self.eof:
if not self.filestack:
return self.eof
else:
self.pop_source()
raw = self.get_token()
# Neither inclusion nor EOF
if self.debug >= 1:
if raw != self.eof:
print("shlex: token=" + repr(raw))
else:
print("shlex: token=EOF")
return raw
def read_token(self):
quoted = False
escapedstate = ' '
while True:
nextchar = self.instream.read(1)
if nextchar == '\n':
self.lineno = self.lineno + 1
if self.debug >= 3:
print("shlex: in state", repr(self.state), \
"I see character:", repr(nextchar))
if self.state is None:
self.token = '' # past end of file
break
elif self.state == ' ':
if not nextchar:
self.state = None # end of file
break
elif nextchar in self.whitespace:
if self.debug >= 2:
print("shlex: I see whitespace in whitespace state")
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif nextchar in self.commenters:
self.instream.readline()
self.lineno = self.lineno + 1
elif self.posix and nextchar in self.escape:
escapedstate = 'a'
self.state = nextchar
elif nextchar in self.wordchars:
self.token = nextchar
self.state = 'a'
elif nextchar in self.quotes:
if not self.posix:
self.token = nextchar
self.state = nextchar
elif self.whitespace_split:
self.token = nextchar
self.state = 'a'
else:
self.token = nextchar
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif self.state in self.quotes:
quoted = True
if not nextchar: # end of file
if self.debug >= 2:
print("shlex: I see EOF in quotes state")
# XXX what error should be raised here?
raise ValueError("No closing quotation")
if nextchar == self.state:
if not self.posix:
self.token = self.token + nextchar
self.state = ' '
break
else:
self.state = 'a'
elif self.posix and nextchar in self.escape and \
self.state in self.escapedquotes:
escapedstate = self.state
self.state = nextchar
else:
self.token = self.token + nextchar
elif self.state in self.escape:
if not nextchar: # end of file
if self.debug >= 2:
print("shlex: I see EOF in escape state")
# XXX what error should be raised here?
raise ValueError("No escaped character")
# In posix shells, only the quote itself or the escape
# character may be escaped within quotes.
if escapedstate in self.quotes and \
nextchar != self.state and nextchar != escapedstate:
self.token = self.token + self.state
self.token = self.token + nextchar
self.state = escapedstate
elif self.state == 'a':
if not nextchar:
self.state = None # end of file
break
elif nextchar in self.whitespace:
if self.debug >= 2:
print("shlex: I see whitespace in word state")
self.state = ' '
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif nextchar in self.commenters:
self.instream.readline()
self.lineno = self.lineno + 1
if self.posix:
self.state = ' '
if self.token or (self.posix and quoted):
break # emit current token
else:
continue
elif self.posix and nextchar in self.quotes:
self.state = nextchar
elif self.posix and nextchar in self.escape:
escapedstate = 'a'
self.state = nextchar
elif nextchar in self.wordchars or nextchar in self.quotes \
or self.whitespace_split:
self.token = self.token + nextchar
else:
self.pushback.appendleft(nextchar)
if self.debug >= 2:
print("shlex: I see punctuation in word state")
self.state = ' '
if self.token:
break # emit current token
else:
continue
result = self.token
self.token = ''
if self.posix and not quoted and result == '':
result = None
if self.debug > 1:
if result:
print("shlex: raw token=" + repr(result))
else:
print("shlex: raw token=EOF")
return result
def sourcehook(self, newfile):
"Hook called on a filename to be sourced."
if newfile[0] == '"':
newfile = newfile[1:-1]
# This implements cpp-like semantics for relative-path inclusion.
if isinstance(self.infile, str) and not os.path.isabs(newfile):
newfile = os.path.join(os.path.dirname(self.infile), newfile)
return (newfile, open(newfile, "r"))
def error_leader(self, infile=None, lineno=None):
"Emit a C-compiler-like, Emacs-friendly error-message leader."
if infile is None:
infile = self.infile
if lineno is None:
lineno = self.lineno
return "\"%s\", line %d: " % (infile, lineno)
def __iter__(self):
return self
def __next__(self):
token = self.get_token()
if token == self.eof:
raise StopIteration
return token
def split(s, comments=False, posix=True):
lex = shlex(s, posix=posix)
lex.whitespace_split = True
if not comments:
lex.commenters = ''
return list(lex)
_find_unsafe = re.compile(r'[^\w@%+=:,./-]', re.ASCII).search
def quote(s):
"""Return a shell-escaped version of the string *s*."""
if not s:
return "''"
if _find_unsafe(s) is None:
return s
# use single quotes, and put single quotes into double quotes
# the string $'b is then quoted as '$'"'"'b'
return "'" + s.replace("'", "'\"'\"'") + "'"
def _print_tokens(lexer):
while 1:
tt = lexer.get_token()
if not tt:
break
print("Token: " + repr(tt))
if __name__ == '__main__':
if len(sys.argv) == 1:
_print_tokens(shlex())
else:
fn = sys.argv[1]
with open(fn) as f:
_print_tokens(shlex(f, fn))
"""Utility functions for copying and archiving files and directory trees.
XXX The functions here don't copy the resource fork or other metadata on Mac.
"""
import os
import sys
import stat
from os.path import abspath
import fnmatch
import collections
import errno
import tarfile
try:
import bz2
del bz2
_BZ2_SUPPORTED = True
except ImportError:
_BZ2_SUPPORTED = False
try:
from pwd import getpwnam
except ImportError:
getpwnam = None
try:
from grp import getgrnam
except ImportError:
getgrnam = None
__all__ = ["copyfileobj", "copyfile", "copymode", "copystat", "copy", "copy2",
"copytree", "move", "rmtree", "Error", "SpecialFileError",
"ExecError", "make_archive", "get_archive_formats",
"register_archive_format", "unregister_archive_format",
"get_unpack_formats", "register_unpack_format",
"unregister_unpack_format", "unpack_archive",
"ignore_patterns", "chown", "which", "get_terminal_size",
"SameFileError"]
# disk_usage is added later, if available on the platform
class Error(OSError):
pass
class SameFileError(Error):
"""Raised when source and destination are the same file."""
class SpecialFileError(OSError):
"""Raised when trying to do a kind of operation (e.g. copying) which is
not supported on a special file (e.g. a named pipe)"""
class ExecError(OSError):
"""Raised when a command could not be executed"""
class ReadError(OSError):
"""Raised when an archive cannot be read"""
class RegistryError(Exception):
"""Raised when a registry operation with the archiving
and unpacking registries fails"""
def copyfileobj(fsrc, fdst, length=16*1024):
"""copy data from file-like object fsrc to file-like object fdst"""
while 1:
buf = fsrc.read(length)
if not buf:
break
fdst.write(buf)
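# Usage sketch with in-memory file objects (any pair of objects exposing
# read()/write() works, which is the whole contract of this helper):
#
#     >>> import io
#     >>> src, dst = io.BytesIO(b'payload'), io.BytesIO()
#     >>> copyfileobj(src, dst)
#     >>> dst.getvalue()
#     b'payload'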
def _samefile(src, dst):
# Macintosh, Unix.
if hasattr(os.path, 'samefile'):
try:
return os.path.samefile(src, dst)
except OSError:
return False
# All other platforms: check for same pathname.
return (os.path.normcase(os.path.abspath(src)) ==
os.path.normcase(os.path.abspath(dst)))
def copyfile(src, dst, *, follow_symlinks=True):
"""Copy data from src to dst.
If follow_symlinks is not set and src is a symbolic link, a new
symlink will be created instead of copying the file it points to.
"""
if _samefile(src, dst):
raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
for fn in [src, dst]:
try:
st = os.stat(fn)
except OSError:
# File most likely does not exist
pass
else:
# XXX What about other special files? (sockets, devices...)
if stat.S_ISFIFO(st.st_mode):
raise SpecialFileError("`%s` is a named pipe" % fn)
if not follow_symlinks and os.path.islink(src):
os.symlink(os.readlink(src), dst)
else:
with open(src, 'rb') as fsrc:
with open(dst, 'wb') as fdst:
copyfileobj(fsrc, fdst)
return dst
def copymode(src, dst, *, follow_symlinks=True):
"""Copy mode bits from src to dst.
If follow_symlinks is not set, symlinks aren't followed if and only
if both `src` and `dst` are symlinks. If `lchmod` isn't available
(e.g. Linux) this method does nothing.
"""
if not follow_symlinks and os.path.islink(src) and os.path.islink(dst):
if hasattr(os, 'lchmod'):
stat_func, chmod_func = os.lstat, os.lchmod
else:
return
elif hasattr(os, 'chmod'):
stat_func, chmod_func = os.stat, os.chmod
else:
return
st = stat_func(src)
chmod_func(dst, stat.S_IMODE(st.st_mode))
if hasattr(os, 'listxattr'):
def _copyxattr(src, dst, *, follow_symlinks=True):
"""Copy extended filesystem attributes from `src` to `dst`.
Overwrite existing attributes.
If `follow_symlinks` is false, symlinks won't be followed.
"""
try:
names = os.listxattr(src, follow_symlinks=follow_symlinks)
except OSError as e:
if e.errno not in (errno.ENOTSUP, errno.ENODATA):
raise
return
for name in names:
try:
value = os.getxattr(src, name, follow_symlinks=follow_symlinks)
os.setxattr(dst, name, value, follow_symlinks=follow_symlinks)
except OSError as e:
if e.errno not in (errno.EPERM, errno.ENOTSUP, errno.ENODATA):
raise
else:
def _copyxattr(*args, **kwargs):
pass
def copystat(src, dst, *, follow_symlinks=True):
"""Copy all stat info (mode bits, atime, mtime, flags) from src to dst.
If the optional flag `follow_symlinks` is not set, symlinks aren't followed if and
only if both `src` and `dst` are symlinks.
"""
def _nop(*args, ns=None, follow_symlinks=None):
pass
# follow symlinks (aka don't not follow symlinks)
follow = follow_symlinks or not (os.path.islink(src) and os.path.islink(dst))
if follow:
# use the real function if it exists
def lookup(name):
return getattr(os, name, _nop)
else:
# use the real function only if it exists
# *and* it supports follow_symlinks
def lookup(name):
fn = getattr(os, name, _nop)
if fn in os.supports_follow_symlinks:
return fn
return _nop
st = lookup("stat")(src, follow_symlinks=follow)
mode = stat.S_IMODE(st.st_mode)
lookup("utime")(dst, ns=(st.st_atime_ns, st.st_mtime_ns),
follow_symlinks=follow)
try:
lookup("chmod")(dst, mode, follow_symlinks=follow)
except NotImplementedError:
# if we got a NotImplementedError, it's because
# * follow_symlinks=False,
# * lchmod() is unavailable, and
# * either
# * fchmodat() is unavailable or
# * fchmodat() doesn't implement AT_SYMLINK_NOFOLLOW.
# (it returned ENOTSUP.)
# therefore we're out of options--we simply cannot chmod the
# symlink. give up, suppress the error.
# (which is what shutil always did in this circumstance.)
pass
if hasattr(st, 'st_flags'):
try:
lookup("chflags")(dst, st.st_flags, follow_symlinks=follow)
except OSError as why:
for err in 'EOPNOTSUPP', 'ENOTSUP':
if hasattr(errno, err) and why.errno == getattr(errno, err):
break
else:
raise
_copyxattr(src, dst, follow_symlinks=follow)
def copy(src, dst, *, follow_symlinks=True):
"""Copy data and mode bits ("cp src dst"). Return the file's destination.
The destination may be a directory.
If follow_symlinks is false, symlinks won't be followed. This
resembles GNU's "cp -P src dst".
If source and destination are the same file, a SameFileError will be
raised.
"""
if os.path.isdir(dst):
dst = os.path.join(dst, os.path.basename(src))
copyfile(src, dst, follow_symlinks=follow_symlinks)
copymode(src, dst, follow_symlinks=follow_symlinks)
return dst
def copy2(src, dst, *, follow_symlinks=True):
"""Copy data and all stat info ("cp -p src dst"). Return the file's
destination."
The destination may be a directory.
If follow_symlinks is false, symlinks won't be followed. This
resembles GNU's "cp -P src dst".
"""
if os.path.isdir(dst):
dst = os.path.join(dst, os.path.basename(src))
copyfile(src, dst, follow_symlinks=follow_symlinks)
copystat(src, dst, follow_symlinks=follow_symlinks)
return dst
def ignore_patterns(*patterns):
"""Function that can be used as copytree() ignore parameter.
Patterns is a sequence of glob-style patterns
that are used to exclude files"""
def _ignore_patterns(path, names):
ignored_names = []
for pattern in patterns:
ignored_names.extend(fnmatch.filter(names, pattern))
return set(ignored_names)
return _ignore_patterns
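# Usage sketch: the returned callable receives a directory and its entry
# names and reports which names to skip, which is exactly the shape
# copytree() expects for its `ignore` argument.
#
#     >>> ignore = ignore_patterns('*.pyc', 'tmp*')
#     >>> sorted(ignore('/src', ['a.py', 'b.pyc', 'tmpdir']))
#     ['b.pyc', 'tmpdir']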
def copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2,
ignore_dangling_symlinks=False):
"""Recursively copy a directory tree.
The destination directory must not already exist.
If exception(s) occur, an Error is raised with a list of reasons.
If the optional symlinks flag is true, symbolic links in the
source tree result in symbolic links in the destination tree; if
it is false, the contents of the files pointed to by symbolic
links are copied. If the file pointed to by the symlink doesn't
exist, an exception will be added in the list of errors raised in
an Error exception at the end of the copy process.
You can set the optional ignore_dangling_symlinks flag to true if you
want to silence this exception. Notice that this has no effect on
platforms that don't support os.symlink.
The optional ignore argument is a callable. If given, it
is called with the `src` parameter, which is the directory
being visited by copytree(), and `names` which is the list of
`src` contents, as returned by os.listdir():
callable(src, names) -> ignored_names
Since copytree() is called recursively, the callable will be
called once for each directory that is copied. It returns a
list of names relative to the `src` directory that should
not be copied.
The optional copy_function argument is a callable that will be used
to copy each file. It will be called with the source path and the
destination path as arguments. By default, copy2() is used, but any
function that supports the same signature (like copy()) can be used.
"""
names = os.listdir(src)
if ignore is not None:
ignored_names = ignore(src, names)
else:
ignored_names = set()
os.makedirs(dst)
errors = []
for name in names:
if name in ignored_names:
continue
srcname = os.path.join(src, name)
dstname = os.path.join(dst, name)
try:
if os.path.islink(srcname):
linkto = os.readlink(srcname)
if symlinks:
# We can't just leave it to `copy_function` because legacy
# code with a custom `copy_function` may rely on copytree
# doing the right thing.
os.symlink(linkto, dstname)
copystat(srcname, dstname, follow_symlinks=not symlinks)
else:
# ignore dangling symlink if the flag is on
if not os.path.exists(linkto) and ignore_dangling_symlinks:
continue
# otherwise let the copy occur; copy2 will raise an error if the file doesn't exist
if os.path.isdir(srcname):
copytree(srcname, dstname, symlinks, ignore,
copy_function)
else:
copy_function(srcname, dstname)
elif os.path.isdir(srcname):
copytree(srcname, dstname, symlinks, ignore, copy_function)
else:
# Will raise a SpecialFileError for unsupported file types
copy_function(srcname, dstname)
# catch the Error from the recursive copytree so that we can
# continue with other files
except Error as err:
errors.extend(err.args[0])
except OSError as why:
errors.append((srcname, dstname, str(why)))
try:
copystat(src, dst)
except OSError as why:
# Copying file access times may fail on Windows
if getattr(why, 'winerror', None) is None:
errors.append((src, dst, str(why)))
if errors:
raise Error(errors)
return dst
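# Usage sketch (paths are hypothetical): mirror a tree while skipping
# compiled files and VCS metadata; the destination must not exist yet.
#
#     copytree('/srv/app', '/backup/app',
#              ignore=ignore_patterns('*.pyc', '.git'))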
# version vulnerable to race conditions
def _rmtree_unsafe(path, onerror):
try:
if os.path.islink(path):
# symlinks to directories are forbidden, see bug #1669
raise OSError("Cannot call rmtree on a symbolic link")
except OSError:
onerror(os.path.islink, path, sys.exc_info())
# can't continue even if onerror hook returns
return
names = []
try:
names = os.listdir(path)
except OSError:
onerror(os.listdir, path, sys.exc_info())
for name in names:
fullname = os.path.join(path, name)
try:
mode = os.lstat(fullname).st_mode
except OSError:
mode = 0
if stat.S_ISDIR(mode):
_rmtree_unsafe(fullname, onerror)
else:
try:
os.unlink(fullname)
except OSError:
onerror(os.unlink, fullname, sys.exc_info())
try:
os.rmdir(path)
except OSError:
onerror(os.rmdir, path, sys.exc_info())
# Version using fd-based APIs to protect against races
def _rmtree_safe_fd(topfd, path, onerror):
names = []
try:
names = os.listdir(topfd)
except OSError as err:
err.filename = path
onerror(os.listdir, path, sys.exc_info())
for name in names:
fullname = os.path.join(path, name)
try:
orig_st = os.stat(name, dir_fd=topfd, follow_symlinks=False)
mode = orig_st.st_mode
except OSError:
mode = 0
if stat.S_ISDIR(mode):
try:
dirfd = os.open(name, os.O_RDONLY, dir_fd=topfd)
except OSError:
onerror(os.open, fullname, sys.exc_info())
else:
try:
if os.path.samestat(orig_st, os.fstat(dirfd)):
_rmtree_safe_fd(dirfd, fullname, onerror)
try:
os.rmdir(name, dir_fd=topfd)
except OSError:
onerror(os.rmdir, fullname, sys.exc_info())
else:
try:
# This can only happen if someone replaces
# a directory with a symlink after the call to
# stat.S_ISDIR above.
raise OSError("Cannot call rmtree on a symbolic "
"link")
except OSError:
onerror(os.path.islink, fullname, sys.exc_info())
finally:
os.close(dirfd)
else:
try:
os.unlink(name, dir_fd=topfd)
except OSError:
onerror(os.unlink, fullname, sys.exc_info())
_use_fd_functions = ({os.open, os.stat, os.unlink, os.rmdir} <=
os.supports_dir_fd and
os.listdir in os.supports_fd and
os.stat in os.supports_follow_symlinks)
def rmtree(path, ignore_errors=False, onerror=None):
"""Recursively delete a directory tree.
If ignore_errors is set, errors are ignored; otherwise, if onerror
is set, it is called to handle the error with arguments (func,
path, exc_info) where func is platform and implementation dependent;
path is the argument to that function that caused it to fail; and
exc_info is a tuple returned by sys.exc_info(). If ignore_errors
is false and onerror is None, an exception is raised.
"""
if ignore_errors:
def onerror(*args):
pass
elif onerror is None:
def onerror(*args):
raise
if _use_fd_functions:
# While the unsafe rmtree works fine on bytes, the fd-based variant does not.
if isinstance(path, bytes):
path = os.fsdecode(path)
# Note: To guard against symlink races, we use the standard
# lstat()/open()/fstat() trick.
try:
orig_st = os.lstat(path)
except Exception:
onerror(os.lstat, path, sys.exc_info())
return
try:
fd = os.open(path, os.O_RDONLY)
except Exception:
onerror(os.open, path, sys.exc_info())
return
try:
if os.path.samestat(orig_st, os.fstat(fd)):
_rmtree_safe_fd(fd, path, onerror)
try:
os.rmdir(path)
except OSError:
onerror(os.rmdir, path, sys.exc_info())
else:
try:
# symlinks to directories are forbidden, see bug #1669
raise OSError("Cannot call rmtree on a symbolic link")
except OSError:
onerror(os.path.islink, path, sys.exc_info())
finally:
os.close(fd)
else:
return _rmtree_unsafe(path, onerror)
# Allow introspection of whether or not the hardening against symlink
# attacks is supported on the current platform
rmtree.avoids_symlink_attacks = _use_fd_functions
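# Usage sketch (path and handler are hypothetical): a common onerror recipe
# on Windows clears the read-only bit and retries the failing operation,
# instead of aborting the whole removal.
#
#     def _force_remove(func, path, exc_info):
#         os.chmod(path, stat.S_IWRITE)  # drop read-only, then retry
#         func(path)
#
#     rmtree('build', onerror=_force_remove)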
def _basename(path):
# A basename() variant which first strips the trailing slash, if present.
# Thus we always get the last component of the path, even for directories.
sep = os.path.sep + (os.path.altsep or '')
return os.path.basename(path.rstrip(sep))
def move(src, dst):
"""Recursively move a file or directory to another location. This is
similar to the Unix "mv" command. Return the file or directory's
destination.
If the destination is a directory or a symlink to a directory, the source
is moved inside the directory. The destination path must not already
exist.
If the destination already exists but is not a directory, it may be
overwritten depending on os.rename() semantics.
If the destination is on our current filesystem, then rename() is used.
Otherwise, src is copied to the destination and then removed. Symlinks are
recreated under the new name if os.rename() fails because of cross
filesystem renames.
A lot more could be done here... A look at a mv.c shows a lot of
the issues this implementation glosses over.
"""
real_dst = dst
if os.path.isdir(dst):
if _samefile(src, dst):
# We might be on a case insensitive filesystem,
# perform the rename anyway.
os.rename(src, dst)
return
real_dst = os.path.join(dst, _basename(src))
if os.path.exists(real_dst):
raise Error("Destination path '%s' already exists" % real_dst)
try:
os.rename(src, real_dst)
except OSError:
if os.path.islink(src):
linkto = os.readlink(src)
os.symlink(linkto, real_dst)
os.unlink(src)
elif os.path.isdir(src):
if _destinsrc(src, dst):
raise Error("Cannot move a directory '%s' into itself '%s'." % (src, dst))
copytree(src, real_dst, symlinks=True)
rmtree(src)
else:
copy2(src, real_dst)
os.unlink(src)
return real_dst
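# Usage sketch (names are hypothetical): moving into a directory keeps the
# source's base name; moving across filesystems falls back to the
# copy2()+unlink() path described in the docstring.
#
#     move('report.txt', '/archive/')   # -> '/archive/report.txt'
#     move('olddir', 'newdir')          # rename, or copytree()+rmtree()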
def _destinsrc(src, dst):
src = abspath(src)
dst = abspath(dst)
if not src.endswith(os.path.sep):
src += os.path.sep
if not dst.endswith(os.path.sep):
dst += os.path.sep
return dst.startswith(src)
def _get_gid(name):
"""Returns a gid, given a group name."""
if getgrnam is None or name is None:
return None
try:
result = getgrnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def _get_uid(name):
"""Returns an uid, given a user name."""
if getpwnam is None or name is None:
return None
try:
result = getpwnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def _make_tarball(base_name, base_dir, compress="gzip", verbose=0, dry_run=0,
owner=None, group=None, logger=None):
"""Create a (possibly compressed) tar file from all the files under
'base_dir'.
'compress' must be "gzip" (the default), "bzip2", or None.
'owner' and 'group' can be used to define an owner and a group for the
archive that is being built. If not provided, the current owner and group
will be used.
The output tar file will be named 'base_name' + ".tar", possibly plus
the appropriate compression extension (".gz", or ".bz2").
Returns the output filename.
"""
tar_compression = {'gzip': 'gz', None: ''}
compress_ext = {'gzip': '.gz'}
if _BZ2_SUPPORTED:
tar_compression['bzip2'] = 'bz2'
compress_ext['bzip2'] = '.bz2'
# flags for compression program, each element of list will be an argument
if compress is not None and compress not in compress_ext:
raise ValueError("bad value for 'compress', or compression format not "
"supported : {0}".format(compress))
archive_name = base_name + '.tar' + compress_ext.get(compress, '')
archive_dir = os.path.dirname(archive_name)
if archive_dir and not os.path.exists(archive_dir):
if logger is not None:
logger.info("creating %s", archive_dir)
if not dry_run:
os.makedirs(archive_dir)
# creating the tarball
if logger is not None:
logger.info('Creating tar archive')
uid = _get_uid(owner)
gid = _get_gid(group)
def _set_uid_gid(tarinfo):
if gid is not None:
tarinfo.gid = gid
tarinfo.gname = group
if uid is not None:
tarinfo.uid = uid
tarinfo.uname = owner
return tarinfo
if not dry_run:
tar = tarfile.open(archive_name, 'w|%s' % tar_compression[compress])
try:
tar.add(base_dir, filter=_set_uid_gid)
finally:
tar.close()
return archive_name
def _call_external_zip(base_dir, zip_filename, verbose=False, dry_run=False):
# XXX see if we want to keep an external call here
if verbose:
zipoptions = "-r"
else:
zipoptions = "-rq"
from distutils.errors import DistutilsExecError
from distutils.spawn import spawn
try:
spawn(["zip", zipoptions, zip_filename, base_dir], dry_run=dry_run)
except DistutilsExecError:
# XXX really should distinguish between "couldn't find
# external 'zip' command" and "zip failed".
raise ExecError("unable to create zip file '%s': "
"could neither import the 'zipfile' module nor "
"find a standalone zip utility") % zip_filename
def _make_zipfile(base_name, base_dir, verbose=0, dry_run=0, logger=None):
"""Create a zip file from all the files under 'base_dir'.
The output zip file will be named 'base_name' + ".zip". Uses either the
"zipfile" Python module (if available) or the InfoZIP "zip" utility
(if installed and found on the default search path). If neither tool is
available, raises ExecError. Returns the name of the output zip
file.
"""
zip_filename = base_name + ".zip"
archive_dir = os.path.dirname(base_name)
if archive_dir and not os.path.exists(archive_dir):
if logger is not None:
logger.info("creating %s", archive_dir)
if not dry_run:
os.makedirs(archive_dir)
# If zipfile module is not available, try spawning an external 'zip'
# command.
try:
import zipfile
except ImportError:
zipfile = None
if zipfile is None:
_call_external_zip(base_dir, zip_filename, verbose, dry_run)
else:
if logger is not None:
logger.info("creating '%s' and adding '%s' to it",
zip_filename, base_dir)
if not dry_run:
with zipfile.ZipFile(zip_filename, "w",
compression=zipfile.ZIP_DEFLATED) as zf:
path = os.path.normpath(base_dir)
zf.write(path, path)
if logger is not None:
logger.info("adding '%s'", path)
for dirpath, dirnames, filenames in os.walk(base_dir):
for name in sorted(dirnames):
path = os.path.normpath(os.path.join(dirpath, name))
zf.write(path, path)
if logger is not None:
logger.info("adding '%s'", path)
for name in filenames:
path = os.path.normpath(os.path.join(dirpath, name))
if os.path.isfile(path):
zf.write(path, path)
if logger is not None:
logger.info("adding '%s'", path)
return zip_filename
_ARCHIVE_FORMATS = {
'gztar': (_make_tarball, [('compress', 'gzip')], "gzip'ed tar-file"),
'tar': (_make_tarball, [('compress', None)], "uncompressed tar file"),
'zip': (_make_zipfile, [], "ZIP file")
}
if _BZ2_SUPPORTED:
_ARCHIVE_FORMATS['bztar'] = (_make_tarball, [('compress', 'bzip2')],
"bzip2'ed tar-file")
def get_archive_formats():
"""Returns a list of supported formats for archiving and unarchiving.
Each element of the returned sequence is a tuple (name, description)
"""
formats = [(name, registry[2]) for name, registry in
_ARCHIVE_FORMATS.items()]
formats.sort()
return formats
def register_archive_format(name, function, extra_args=None, description=''):
"""Registers an archive format.
name is the name of the format. function is the callable that will be
used to create archives. If provided, extra_args is a sequence of
(name, value) tuples that will be passed as arguments to the callable.
description can be provided to describe the format, and will be returned
by the get_archive_formats() function.
"""
if extra_args is None:
extra_args = []
if not callable(function):
raise TypeError('The %s object is not callable' % function)
if not isinstance(extra_args, (tuple, list)):
raise TypeError('extra_args needs to be a sequence')
for element in extra_args:
if not isinstance(element, (tuple, list)) or len(element) != 2:
raise TypeError('extra_args elements must be (arg_name, value) tuples')
_ARCHIVE_FORMATS[name] = (function, extra_args, description)
def unregister_archive_format(name):
del _ARCHIVE_FORMATS[name]
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
dry_run=0, owner=None, group=None, logger=None):
"""Create an archive file (eg. zip or tar).
'base_name' is the name of the file to create, minus any format-specific
extension; 'format' is the archive format: one of "zip", "tar", "bztar"
or "gztar".
'root_dir' is a directory that will be the root directory of the
archive; ie. we typically chdir into 'root_dir' before creating the
archive. 'base_dir' is the directory where we start archiving from;
ie. 'base_dir' will be the common prefix of all files and
directories in the archive. 'root_dir' and 'base_dir' both default
to the current directory. Returns the name of the archive file.
'owner' and 'group' are used when creating a tar archive. By default,
uses the current owner and group.
"""
save_cwd = os.getcwd()
if root_dir is not None:
if logger is not None:
logger.debug("changing into '%s'", root_dir)
base_name = os.path.abspath(base_name)
if not dry_run:
os.chdir(root_dir)
if base_dir is None:
base_dir = os.curdir
kwargs = {'dry_run': dry_run, 'logger': logger}
try:
format_info = _ARCHIVE_FORMATS[format]
except KeyError:
raise ValueError("unknown archive format '%s'" % format)
func = format_info[0]
for arg, val in format_info[1]:
kwargs[arg] = val
if format != 'zip':
kwargs['owner'] = owner
kwargs['group'] = group
try:
filename = func(base_name, base_dir, **kwargs)
finally:
if root_dir is not None:
if logger is not None:
logger.debug("changing back to '%s'", save_cwd)
os.chdir(save_cwd)
return filename
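# Usage sketch (paths are hypothetical): archive /var/www/html as html/...
# entries inside /backups/site.tar.gz; the format-specific extension is
# appended for you.
#
#     make_archive('/backups/site', 'gztar',
#                  root_dir='/var/www', base_dir='html')
#     # -> '/backups/site.tar.gz'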
def get_unpack_formats():
"""Returns a list of supported formats for unpacking.
Each element of the returned sequence is a tuple
(name, extensions, description)
"""
formats = [(name, info[0], info[3]) for name, info in
_UNPACK_FORMATS.items()]
formats.sort()
return formats
def _check_unpack_options(extensions, function, extra_args):
"""Checks what gets registered as an unpacker."""
# first make sure no other unpacker is registered for this extension
existing_extensions = {}
for name, info in _UNPACK_FORMATS.items():
for ext in info[0]:
existing_extensions[ext] = name
for extension in extensions:
if extension in existing_extensions:
msg = '%s is already registered for "%s"'
raise RegistryError(msg % (extension,
existing_extensions[extension]))
if not callable(function):
raise TypeError('The registered function must be a callable')
def register_unpack_format(name, extensions, function, extra_args=None,
description=''):
"""Registers an unpack format.
`name` is the name of the format. `extensions` is a list of extensions
corresponding to the format.
`function` is the callable that will be
used to unpack archives. The callable will receive archives to unpack.
If it's unable to handle an archive, it needs to raise a ReadError
exception.
If provided, `extra_args` is a sequence of
(name, value) tuples that will be passed as arguments to the callable.
description can be provided to describe the format, and will be returned
by the get_unpack_formats() function.
"""
if extra_args is None:
extra_args = []
_check_unpack_options(extensions, function, extra_args)
_UNPACK_FORMATS[name] = extensions, function, extra_args, description
def unregister_unpack_format(name):
"""Removes the pack format from the registery."""
del _UNPACK_FORMATS[name]
def _ensure_directory(path):
"""Ensure that the parent directory of `path` exists"""
dirname = os.path.dirname(path)
if not os.path.isdir(dirname):
os.makedirs(dirname)
def _unpack_zipfile(filename, extract_dir):
"""Unpack zip `filename` to `extract_dir`
"""
try:
import zipfile
except ImportError:
raise ReadError('zlib not supported, cannot unpack this archive.')
if not zipfile.is_zipfile(filename):
raise ReadError("%s is not a zip file" % filename)
zip = zipfile.ZipFile(filename)
try:
for info in zip.infolist():
name = info.filename
# don't extract absolute paths or ones with .. in them
if name.startswith('/') or '..' in name:
continue
target = os.path.join(extract_dir, *name.split('/'))
if not target:
continue
_ensure_directory(target)
if not name.endswith('/'):
# file
data = zip.read(info.filename)
f = open(target, 'wb')
try:
f.write(data)
finally:
f.close()
del data
finally:
zip.close()
def _unpack_tarfile(filename, extract_dir):
"""Unpack tar/tar.gz/tar.bz2 `filename` to `extract_dir`
"""
try:
tarobj = tarfile.open(filename)
except tarfile.TarError:
raise ReadError(
"%s is not a compressed or uncompressed tar file" % filename)
try:
tarobj.extractall(extract_dir)
finally:
tarobj.close()
_UNPACK_FORMATS = {
'gztar': (['.tar.gz', '.tgz'], _unpack_tarfile, [], "gzip'ed tar-file"),
'tar': (['.tar'], _unpack_tarfile, [], "uncompressed tar file"),
'zip': (['.zip'], _unpack_zipfile, [], "ZIP file")
}
if _BZ2_SUPPORTED:
_UNPACK_FORMATS['bztar'] = (['.bz2'], _unpack_tarfile, [],
"bzip2'ed tar-file")
def _find_unpack_format(filename):
for name, info in _UNPACK_FORMATS.items():
for extension in info[0]:
if filename.endswith(extension):
return name
return None
def unpack_archive(filename, extract_dir=None, format=None):
"""Unpack an archive.
`filename` is the name of the archive.
`extract_dir` is the name of the target directory, where the archive
is unpacked. If not provided, the current working directory is used.
`format` is the archive format: one of "zip", "tar", or "gztar". Or any
other registered format. If not provided, unpack_archive will use the
filename extension and see if an unpacker was registered for that
extension.
In case none is found, a ReadError is raised.
"""
if extract_dir is None:
extract_dir = os.getcwd()
if format is not None:
try:
format_info = _UNPACK_FORMATS[format]
except KeyError:
raise ValueError("Unknown unpack format '{0}'".format(format))
func = format_info[1]
func(filename, extract_dir, **dict(format_info[2]))
else:
# we need to look at the registered unpackers supported extensions
format = _find_unpack_format(filename)
if format is None:
raise ReadError("Unknown archive format '{0}'".format(filename))
func = _UNPACK_FORMATS[format][1]
kwargs = dict(_UNPACK_FORMATS[format][2])
func(filename, extract_dir, **kwargs)
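# Usage sketch (filenames are hypothetical): the format is normally inferred
# from the extension via the registry above, but it can be forced explicitly.
#
#     unpack_archive('site.tar.gz', '/tmp/restore')        # inferred
#     unpack_archive('blob.dat', '/tmp/x', format='zip')   # explicit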
if hasattr(os, 'statvfs'):
__all__.append('disk_usage')
_ntuple_diskusage = collections.namedtuple('usage', 'total used free')
def disk_usage(path):
"""Return disk usage statistics about the given path.
Returned value is a named tuple with attributes 'total', 'used' and
'free', which are the amount of total, used and free space, in bytes.
"""
st = os.statvfs(path)
free = st.f_bavail * st.f_frsize
total = st.f_blocks * st.f_frsize
used = (st.f_blocks - st.f_bfree) * st.f_frsize
return _ntuple_diskusage(total, used, free)
elif os.name == 'nt':
import nt
__all__.append('disk_usage')
_ntuple_diskusage = collections.namedtuple('usage', 'total used free')
def disk_usage(path):
"""Return disk usage statistics about the given path.
Returned value is a named tuple with attributes 'total', 'used' and
'free', which are the amount of total, used and free space, in bytes.
"""
total, free = nt._getdiskusage(path)
used = total - free
return _ntuple_diskusage(total, used, free)
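# Usage sketch: on platforms where disk_usage is defined (statvfs or
# Windows), the named tuple makes free-space checks straightforward.
#
#     >>> du = disk_usage('/')
#     >>> du.total >= du.used + du.free >= 0
#     True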
def chown(path, user=None, group=None):
"""Change owner user and group of the given path.
user and group can be the uid/gid or the user/group names, and in that case,
they are converted to their respective uid/gid.
"""
if user is None and group is None:
raise ValueError("user and/or group must be set")
_user = user
_group = group
# -1 means don't change it
if user is None:
_user = -1
# user can either be an int (the uid) or a string (the system username)
elif isinstance(user, str):
_user = _get_uid(user)
if _user is None:
raise LookupError("no such user: {!r}".format(user))
if group is None:
_group = -1
elif not isinstance(group, int):
_group = _get_gid(group)
if _group is None:
raise LookupError("no such group: {!r}".format(group))
os.chown(path, _user, _group)
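# Usage sketch (names are hypothetical): string names are resolved through
# pwd/grp as described above, while ints are passed straight to os.chown.
#
#     chown('/srv/app', user='deploy', group='www-data')
#     chown('/srv/app', user=1000)        # group stays unchanged (-1)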
def get_terminal_size(fallback=(80, 24)):
"""Get the size of the terminal window.
For each of the two dimensions, the environment variable, COLUMNS
and LINES respectively, is checked. If the variable is defined and
the value is a positive integer, it is used.
When COLUMNS or LINES is not defined, which is the common case,
the terminal connected to sys.__stdout__ is queried
by invoking os.get_terminal_size.
If the terminal size cannot be successfully queried, either because
the system doesn't support querying, or because we are not
connected to a terminal, the value given in fallback parameter
is used. Fallback defaults to (80, 24) which is the default
size used by many terminal emulators.
The value returned is a named tuple of type os.terminal_size.
"""
# columns, lines are the working values
try:
columns = int(os.environ['COLUMNS'])
except (KeyError, ValueError):
columns = 0
try:
lines = int(os.environ['LINES'])
except (KeyError, ValueError):
lines = 0
# only query if necessary
if columns <= 0 or lines <= 0:
try:
size = os.get_terminal_size(sys.__stdout__.fileno())
except (NameError, OSError):
size = os.terminal_size(fallback)
if columns <= 0:
columns = size.columns
if lines <= 0:
lines = size.lines
return os.terminal_size((columns, lines))
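# Usage sketch: thanks to the fallback, a usable size comes back even when
# stdout is not a terminal (e.g. when output is piped).
#
#     >>> size = get_terminal_size()
#     >>> size.columns > 0 and size.lines > 0
#     True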
def which(cmd, mode=os.F_OK | os.X_OK, path=None):
"""Given a command, mode, and a PATH string, return the path which
conforms to the given mode on the PATH, or None if there is no such
file.
`mode` defaults to os.F_OK | os.X_OK. `path` defaults to the result
of os.environ.get("PATH"), or can be overridden with a custom search
path.
"""
# Check that a given file can be accessed with the correct mode.
# Additionally check that `file` is not a directory, as on Windows
# directories pass the os.access check.
def _access_check(fn, mode):
return (os.path.exists(fn) and os.access(fn, mode)
and not os.path.isdir(fn))
# If we're given a path with a directory part, look it up directly rather
# than referring to PATH directories. This includes checking relative to the
# current directory, e.g. ./script
if os.path.dirname(cmd):
if _access_check(cmd, mode):
return cmd
return None
if path is None:
path = os.environ.get("PATH", os.defpath)
if not path:
return None
path = path.split(os.pathsep)
if sys.platform == "win32":
# The current directory takes precedence on Windows.
if not os.curdir in path:
path.insert(0, os.curdir)
# PATHEXT is necessary to check on Windows.
pathext = os.environ.get("PATHEXT", "").split(os.pathsep)
# See if the given file matches any of the expected path extensions.
# This will allow us to short circuit when given "python.exe".
# If it does match, only test that one, otherwise we have to try
# others.
if any(cmd.lower().endswith(ext.lower()) for ext in pathext):
files = [cmd]
else:
files = [cmd + ext for ext in pathext]
else:
# On other platforms you don't have things like PATHEXT to tell you
# what file suffixes are executable, so just pass on cmd as-is.
files = [cmd]
seen = set()
for dir in path:
normdir = os.path.normcase(dir)
if not normdir in seen:
seen.add(normdir)
for thefile in files:
name = os.path.join(dir, thefile)
if _access_check(name, mode):
return name
return None
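# Usage sketch (results vary per machine): a bare command name is searched
# along PATH (plus PATHEXT on Windows); a path with a directory part is
# checked directly.
#
#     which('python')      # e.g. '/usr/bin/python', or None
#     which('./script')    # checked relative to the current directory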
"""Append module search paths for third-party packages to sys.path.
****************************************************************
* This module is automatically imported during initialization. *
****************************************************************
This will append site-specific paths to the module search path. On
Unix (including Mac OSX), it starts with sys.prefix and
sys.exec_prefix (if different) and appends
lib/python<version>/site-packages as well as lib/site-python.
On other platforms (such as Windows), it tries each of the
prefixes directly, as well as with lib/site-packages appended. The
resulting directories, if they exist, are appended to sys.path, and
also inspected for path configuration files.
If a file named "pyvenv.cfg" exists one directory above sys.executable,
sys.prefix and sys.exec_prefix are set to that directory and
it is also checked for site-packages and site-python (sys.base_prefix and
sys.base_exec_prefix will always be the "real" prefixes of the Python
installation). If "pyvenv.cfg" (a bootstrap configuration file) contains
the key "include-system-site-packages" set to anything other than "false"
(case-insensitive), the system-level prefixes will still also be
searched for site-packages; otherwise they won't.
All of the resulting site-specific directories, if they exist, are
appended to sys.path, and also inspected for path configuration
files.
A path configuration file is a file whose name has the form
<package>.pth; its contents are additional directories (one per line)
to be added to sys.path. Non-existing directories (or
non-directories) are never added to sys.path; no directory is added to
sys.path more than once. Blank lines and lines beginning with
'#' are skipped. Lines starting with 'import' are executed.
For example, suppose sys.prefix and sys.exec_prefix are set to
/usr/local and there is a directory /usr/local/lib/python2.5/site-packages
with three subdirectories, foo, bar and spam, and two path
configuration files, foo.pth and bar.pth. Assume foo.pth contains the
following:
# foo package configuration
foo
bar
bletch
and bar.pth contains:
# bar package configuration
bar
Then the following directories are added to sys.path, in this order:
/usr/local/lib/python2.5/site-packages/bar
/usr/local/lib/python2.5/site-packages/foo
Note that bletch is omitted because it doesn't exist; bar precedes foo
because bar.pth comes alphabetically before foo.pth; and spam is
omitted because it is not mentioned in either path configuration file.
The readline module is also automatically configured to enable
completion for systems that support it. This can be overridden in
sitecustomize, usercustomize or PYTHONSTARTUP.
After these operations, an attempt is made to import a module
named sitecustomize, which can perform arbitrary additional
site-specific customizations. If this import fails with an
ImportError exception, it is silently ignored.
"""
import sys
import os
import builtins
import _sitebuiltins
# Prefixes for site-packages; add additional prefixes like /usr/local here
PREFIXES = [sys.prefix, sys.exec_prefix]
# Enable per user site-packages directory
# set it to False to disable the feature or True to force the feature
ENABLE_USER_SITE = None
# for distutils.commands.install
# These values are initialized by the getuserbase() and getusersitepackages()
# functions, through the main() function when Python starts.
USER_SITE = None
USER_BASE = None
def makepath(*paths):
dir = os.path.join(*paths)
try:
dir = os.path.abspath(dir)
except OSError:
pass
return dir, os.path.normcase(dir)
def abs_paths():
"""Set all module __file__ and __cached__ attributes to an absolute path"""
for m in set(sys.modules.values()):
if (getattr(getattr(m, '__loader__', None), '__module__', None) !=
'_frozen_importlib'):
continue # don't mess with a PEP 302-supplied __file__
try:
m.__file__ = os.path.abspath(m.__file__)
except (AttributeError, OSError):
pass
try:
m.__cached__ = os.path.abspath(m.__cached__)
except (AttributeError, OSError):
pass
def removeduppaths():
""" Remove duplicate entries from sys.path along with making them
absolute"""
# This ensures that the initial path provided by the interpreter contains
# only absolute pathnames, even if we're running from the build directory.
L = []
known_paths = set()
for dir in sys.path:
# Filter out duplicate paths (on case-insensitive file systems also
# if they only differ in case); turn relative paths into absolute
# paths.
dir, dircase = makepath(dir)
if not dircase in known_paths:
L.append(dir)
known_paths.add(dircase)
sys.path[:] = L
return known_paths
def _init_pathinfo():
"""Return a set containing all existing directory entries from sys.path"""
d = set()
for dir in sys.path:
try:
if os.path.isdir(dir):
dir, dircase = makepath(dir)
d.add(dircase)
except TypeError:
continue
return d
def addpackage(sitedir, name, known_paths):
"""Process a .pth file within the site-packages directory:
For each line in the file, either combine it with sitedir to a path
and add that to known_paths, or execute it if it starts with 'import '.
"""
if known_paths is None:
known_paths = _init_pathinfo()
reset = 1
else:
reset = 0
fullname = os.path.join(sitedir, name)
try:
f = open(fullname, "r")
except OSError:
return
with f:
for n, line in enumerate(f):
if line.startswith("#"):
continue
try:
if line.startswith(("import ", "import\t")):
exec(line)
continue
line = line.rstrip()
dir, dircase = makepath(sitedir, line)
if not dircase in known_paths and os.path.exists(dir):
sys.path.append(dir)
known_paths.add(dircase)
except Exception:
print("Error processing line {:d} of {}:\n".format(n+1, fullname),
file=sys.stderr)
import traceback
for record in traceback.format_exception(*sys.exc_info()):
for line in record.splitlines():
print(' '+line, file=sys.stderr)
print("\nRemainder of file ignored", file=sys.stderr)
break
if reset:
known_paths = None
return known_paths
def addsitedir(sitedir, known_paths=None):
"""Add 'sitedir' argument to sys.path if missing and handle .pth files in
'sitedir'"""
if known_paths is None:
known_paths = _init_pathinfo()
reset = 1
else:
reset = 0
sitedir, sitedircase = makepath(sitedir)
if not sitedircase in known_paths:
sys.path.append(sitedir) # Add path component
known_paths.add(sitedircase)
try:
names = os.listdir(sitedir)
except OSError:
return
names = [name for name in names if name.endswith(".pth")]
for name in sorted(names):
addpackage(sitedir, name, known_paths)
if reset:
known_paths = None
return known_paths
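# Usage sketch (directory is hypothetical): appends the directory to
# sys.path if missing, then processes every *.pth file inside it in sorted
# order, exactly as addpackage() above describes.
#
#     addsitedir('/opt/plugins')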
def check_enableusersite():
"""Check if user site directory is safe for inclusion
The function tests for the command line flag (including the environment
variable) and whether the process uid/gid match the effective uid/gid.
None: Disabled for security reasons
False: Disabled by user (command line option)
True: Safe and enabled
"""
if sys.flags.no_user_site:
return False
if hasattr(os, "getuid") and hasattr(os, "geteuid"):
# check process uid == effective uid
if os.geteuid() != os.getuid():
return None
if hasattr(os, "getgid") and hasattr(os, "getegid"):
# check process gid == effective gid
if os.getegid() != os.getgid():
return None
return True
def getuserbase():
"""Returns the `user base` directory path.
The `user base` directory can be used to store data. If the global
variable ``USER_BASE`` is not initialized yet, this function will also set
it.
"""
global USER_BASE
if USER_BASE is not None:
return USER_BASE
from sysconfig import get_config_var
USER_BASE = get_config_var('userbase')
return USER_BASE
def getusersitepackages():
"""Returns the user-specific site-packages directory path.
If the global variable ``USER_SITE`` is not initialized yet, this
function will also set it.
"""
global USER_SITE
user_base = getuserbase() # this will also set USER_BASE
if USER_SITE is not None:
return USER_SITE
from sysconfig import get_path
if sys.platform == 'darwin':
from sysconfig import get_config_var
if get_config_var('PYTHONFRAMEWORK'):
USER_SITE = get_path('purelib', 'osx_framework_user')
return USER_SITE
USER_SITE = get_path('purelib', '%s_user' % os.name)
return USER_SITE
def addusersitepackages(known_paths):
"""Add a per user site-package to sys.path
Each user has its own python directory with site-packages in the
home directory.
"""
# get the per user site-package path
# this call will also make sure USER_BASE and USER_SITE are set
user_site = getusersitepackages()
if ENABLE_USER_SITE and os.path.isdir(user_site):
addsitedir(user_site, known_paths)
return known_paths
def getsitepackages(prefixes=None):
"""Returns a list containing all global site-packages directories
(and possibly site-python).
For each directory present in ``prefixes`` (or the global ``PREFIXES``),
this function will find its `site-packages` subdirectory depending on the
system environment, and will return a list of full paths.
"""
sitepackages = []
seen = set()
if prefixes is None:
prefixes = PREFIXES
for prefix in prefixes:
if not prefix or prefix in seen:
continue
seen.add(prefix)
if os.sep == '/':
sitepackages.append(os.path.join(prefix, "lib",
"python" + sys.version[:3],
"site-packages"))
sitepackages.append(os.path.join(prefix, "lib", "site-python"))
else:
sitepackages.append(prefix)
sitepackages.append(os.path.join(prefix, "lib", "site-packages"))
if sys.platform == "darwin":
# for framework builds *only* we add the standard Apple
# locations.
from sysconfig import get_config_var
framework = get_config_var("PYTHONFRAMEWORK")
if framework:
sitepackages.append(
os.path.join("/Library", framework,
sys.version[:3], "site-packages"))
return sitepackages
def addsitepackages(known_paths, prefixes=None):
"""Add site-packages (and possibly site-python) to sys.path"""
for sitedir in getsitepackages(prefixes):
if os.path.isdir(sitedir):
if "site-python" in sitedir:
import warnings
warnings.warn('"site-python" directories will not be '
'supported in 3.5 anymore',
DeprecationWarning)
addsitedir(sitedir, known_paths)
return known_paths
def setquit():
"""Define new builtins 'quit' and 'exit'.
These are objects which make the interpreter exit when called.
The repr of each object contains a hint at how it works.
"""
if os.sep == ':':
eof = 'Cmd-Q'
elif os.sep == '\\':
eof = 'Ctrl-Z plus Return'
else:
eof = 'Ctrl-D (i.e. EOF)'
builtins.quit = _sitebuiltins.Quitter('quit', eof)
builtins.exit = _sitebuiltins.Quitter('exit', eof)
def setcopyright():
"""Set 'copyright' and 'credits' in builtins"""
builtins.copyright = _sitebuiltins._Printer("copyright", sys.copyright)
if sys.platform[:4] == 'java':
builtins.credits = _sitebuiltins._Printer(
"credits",
"Jython is maintained by the Jython developers (www.jython.org).")
else:
builtins.credits = _sitebuiltins._Printer("credits", """\
Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development. See www.python.org for more information.""")
files, dirs = [], []
# Not all modules are required to have a __file__ attribute. See
# PEP 420 for more details.
if hasattr(os, '__file__'):
here = os.path.dirname(os.__file__)
files.extend(["LICENSE.txt", "LICENSE"])
dirs.extend([os.path.join(here, os.pardir), here, os.curdir])
builtins.license = _sitebuiltins._Printer(
"license",
"See https://www.python.org/psf/license/",
files, dirs)
def sethelper():
builtins.help = _sitebuiltins._Helper()
def enablerlcompleter():
"""Enable default readline configuration on interactive prompts, by
registering a sys.__interactivehook__.
If the readline module can be imported, the hook will set the Tab key
as completion key and register ~/.python_history as history file.
This can be overridden in the sitecustomize or usercustomize module,
or in a PYTHONSTARTUP file.
"""
def register_readline():
import atexit
try:
import readline
import rlcompleter
except ImportError:
return
# Reading the initialization (config) file may not be enough to set a
# completion key, so we set one first and then read the file.
readline_doc = getattr(readline, '__doc__', '')
if readline_doc is not None and 'libedit' in readline_doc:
readline.parse_and_bind('bind ^I rl_complete')
else:
readline.parse_and_bind('tab: complete')
try:
readline.read_init_file()
except OSError:
# An OSError here could have many causes, but the most likely one
# is that there's no .inputrc file (or .editrc file in the case of
# Mac OS X + libedit) in the expected location. In that case, we
# want to ignore the exception.
pass
if readline.get_current_history_length() == 0:
# If no history was loaded, default to .python_history.
# The guard is necessary to avoid doubling history size at
# each interpreter exit when readline was already configured
# through a PYTHONSTARTUP hook, see:
# http://bugs.python.org/issue5845#msg198636
history = os.path.join(os.path.expanduser('~'),
'.python_history')
try:
readline.read_history_file(history)
except IOError:
pass
atexit.register(readline.write_history_file, history)
sys.__interactivehook__ = register_readline
def aliasmbcs():
"""On Windows, some default encodings are not provided by Python,
while they are always available as "mbcs" in each locale. Make
them usable by aliasing to "mbcs" in such a case."""
if sys.platform == 'win32':
import _bootlocale, codecs
enc = _bootlocale.getpreferredencoding(False)
if enc.startswith('cp'): # "cp***" ?
try:
codecs.lookup(enc)
except LookupError:
import encodings
encodings._cache[enc] = encodings._unknown
encodings.aliases.aliases[enc] = 'mbcs'
CONFIG_LINE = r'^(?P<key>(\w|[-_])+)\s*=\s*(?P<value>.*)\s*$'
def venv(known_paths):
global PREFIXES, ENABLE_USER_SITE
env = os.environ
if sys.platform == 'darwin' and '__PYVENV_LAUNCHER__' in env:
executable = os.environ['__PYVENV_LAUNCHER__']
else:
executable = sys.executable
exe_dir, _ = os.path.split(os.path.abspath(executable))
site_prefix = os.path.dirname(exe_dir)
sys._home = None
conf_basename = 'pyvenv.cfg'
candidate_confs = [
conffile for conffile in (
os.path.join(exe_dir, conf_basename),
os.path.join(site_prefix, conf_basename)
)
if os.path.isfile(conffile)
]
if candidate_confs:
import re
config_line = re.compile(CONFIG_LINE)
virtual_conf = candidate_confs[0]
system_site = "true"
# Issue 25185: Use UTF-8, as that's what the venv module uses when
# writing the file.
with open(virtual_conf, encoding='utf-8') as f:
for line in f:
line = line.strip()
m = config_line.match(line)
if m:
d = m.groupdict()
key, value = d['key'].lower(), d['value']
if key == 'include-system-site-packages':
system_site = value.lower()
elif key == 'home':
sys._home = value
sys.prefix = sys.exec_prefix = site_prefix
# Doing this here ensures venv takes precedence over user-site
addsitepackages(known_paths, [sys.prefix])
# addsitepackages will process site_prefix again if it's in PREFIXES,
# but that's ok; known_paths will prevent anything being added twice
if system_site == "true":
PREFIXES.insert(0, sys.prefix)
else:
PREFIXES = [sys.prefix]
ENABLE_USER_SITE = False
return known_paths
def execsitecustomize():
"""Run custom site specific code, if available."""
try:
import sitecustomize
except ImportError:
pass
except Exception as err:
if sys.flags.verbose:
sys.excepthook(*sys.exc_info())
else:
sys.stderr.write(
"Error in sitecustomize; set PYTHONVERBOSE for traceback:\n"
"%s: %s\n" %
(err.__class__.__name__, err))
def execusercustomize():
"""Run custom user specific code, if available."""
try:
import usercustomize
except ImportError:
pass
except Exception as err:
if sys.flags.verbose:
sys.excepthook(*sys.exc_info())
else:
sys.stderr.write(
"Error in usercustomize; set PYTHONVERBOSE for traceback:\n"
"%s: %s\n" %
(err.__class__.__name__, err))
def main():
"""Add standard site-specific directories to the module search path.
This function is called automatically when this module is imported,
unless the python interpreter was started with the -S flag.
"""
global ENABLE_USER_SITE
abs_paths()
known_paths = removeduppaths()
known_paths = venv(known_paths)
if ENABLE_USER_SITE is None:
ENABLE_USER_SITE = check_enableusersite()
known_paths = addusersitepackages(known_paths)
known_paths = addsitepackages(known_paths)
setquit()
setcopyright()
sethelper()
enablerlcompleter()
aliasmbcs()
execsitecustomize()
if ENABLE_USER_SITE:
execusercustomize()
# Prevent modification of sys.path when Python was started with -S and
# site is imported later.
if not sys.flags.no_site:
main()
def _script():
help = """\
%s [--user-base] [--user-site]
Without arguments print some useful information
With arguments print the value of USER_BASE and/or USER_SITE separated
by '%s'.
Exit codes with --user-base or --user-site:
0 - user site directory is enabled
1 - user site directory is disabled by user
2 - user site directory is disabled by super user
or for security reasons
>2 - unknown error
"""
args = sys.argv[1:]
if not args:
user_base = getuserbase()
user_site = getusersitepackages()
print("sys.path = [")
for dir in sys.path:
print(" %r," % (dir,))
print("]")
print("USER_BASE: %r (%s)" % (user_base,
"exists" if os.path.isdir(user_base) else "doesn't exist"))
print("USER_SITE: %r (%s)" % (user_site,
"exists" if os.path.isdir(user_site) else "doesn't exist"))
print("ENABLE_USER_SITE: %r" % ENABLE_USER_SITE)
sys.exit(0)
buffer = []
if '--user-base' in args:
buffer.append(USER_BASE)
if '--user-site' in args:
buffer.append(USER_SITE)
if buffer:
print(os.pathsep.join(buffer))
if ENABLE_USER_SITE:
sys.exit(0)
elif ENABLE_USER_SITE is False:
sys.exit(1)
elif ENABLE_USER_SITE is None:
sys.exit(2)
else:
sys.exit(3)
else:
import textwrap
print(textwrap.dedent(help % (sys.argv[0], os.pathsep)))
sys.exit(10)
if __name__ == '__main__':
_script()
#! /usr/bin/env python3
"""An RFC 5321 smtp proxy.
Usage: %(program)s [options] [localhost:localport [remotehost:remoteport]]
Options:
--nosetuid
-n
This program generally tries to setuid `nobody', unless this flag is
set. The setuid call will fail if this program is not run as root (in
which case, use this flag).
--version
-V
Print the version number and exit.
--class classname
-c classname
Use `classname' as the concrete SMTP proxy class. Uses `PureProxy' by
default.
--size limit
-s limit
Restrict the total size of the incoming message to "limit" number of
bytes via the RFC 1870 SIZE extension. Defaults to 33554432 bytes.
--debug
-d
Turn on debugging prints.
--help
-h
Print this message and exit.
Version: %(__version__)s
If localhost is not given then `localhost' is used, and if localport is not
given then 8025 is used. If remotehost is not given then `localhost' is used,
and if remoteport is not given, then 25 is used.
"""
# Overview:
#
# This file implements the minimal SMTP protocol as defined in RFC 5321. It
# has a hierarchy of classes which implement the backend functionality for the
# smtpd. A number of classes are provided:
#
# SMTPServer - the base class for the backend. Raises NotImplementedError
# if you try to use it.
#
# DebuggingServer - simply prints each message it receives on stdout.
#
# PureProxy - Proxies all messages to a real smtpd which does final
# delivery. One known problem with this class is that it doesn't handle
# SMTP errors from the backend server at all. This should be fixed
# (contributions are welcome!).
#
# MailmanProxy - An experimental hack to work with GNU Mailman
# <www.list.org>. Using this server as your real incoming smtpd, your
# mailhost will automatically recognize and accept mail destined to Mailman
# lists when those lists are created. Every message not destined for a list
# gets forwarded to a real backend smtpd, as with PureProxy. Again, errors
# are not handled correctly yet.
#
#
# Author: Barry Warsaw <[email protected]>
#
# TODO:
#
# - support mailbox delivery
# - alias files
# - Handle more ESMTP extensions
# - handle error codes from the backend smtpd
import sys
import os
import errno
import getopt
import time
import socket
import asyncore
import asynchat
import collections
from warnings import warn
from email._header_value_parser import get_addr_spec, get_angle_addr
__all__ = ["SMTPServer","DebuggingServer","PureProxy","MailmanProxy"]
program = sys.argv[0]
__version__ = 'Python SMTP proxy version 0.3'
class Devnull:
def write(self, msg): pass
def flush(self): pass
DEBUGSTREAM = Devnull()
NEWLINE = '\n'
EMPTYSTRING = ''
COMMASPACE = ', '
DATA_SIZE_DEFAULT = 33554432
def usage(code, msg=''):
print(__doc__ % globals(), file=sys.stderr)
if msg:
print(msg, file=sys.stderr)
sys.exit(code)
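# A minimal way to exercise this module (sketch; DebuggingServer is defined
# further below, and asyncore drives the event loop):
#
#     import smtpd, asyncore
#     server = smtpd.DebuggingServer(('localhost', 8025), ('localhost', 25))
#     asyncore.loop()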
class SMTPChannel(asynchat.async_chat):
COMMAND = 0
DATA = 1
command_size_limit = 512
command_size_limits = collections.defaultdict(lambda x=command_size_limit: x)
command_size_limits.update({
'MAIL': command_size_limit + 26,
})
max_command_size_limit = max(command_size_limits.values())
def __init__(self, server, conn, addr, data_size_limit=DATA_SIZE_DEFAULT,
map=None):
asynchat.async_chat.__init__(self, conn, map=map)
self.smtp_server = server
self.conn = conn
self.addr = addr
self.data_size_limit = data_size_limit
self.received_lines = []
self.smtp_state = self.COMMAND
self.seen_greeting = ''
self.mailfrom = None
self.rcpttos = []
self.received_data = ''
self.fqdn = socket.getfqdn()
self.num_bytes = 0
try:
self.peer = conn.getpeername()
except OSError as err:
# a race condition may occur if the other end is closing
# before we can get the peername
self.close()
if err.args[0] != errno.ENOTCONN:
raise
return
print('Peer:', repr(self.peer), file=DEBUGSTREAM)
self.push('220 %s %s' % (self.fqdn, __version__))
self.set_terminator(b'\r\n')
self.extended_smtp = False
# properties for backwards-compatibility
@property
def __server(self):
warn("Access to __server attribute on SMTPChannel is deprecated, "
"use 'smtp_server' instead", DeprecationWarning, 2)
return self.smtp_server
@__server.setter
def __server(self, value):
warn("Setting __server attribute on SMTPChannel is deprecated, "
"set 'smtp_server' instead", DeprecationWarning, 2)
self.smtp_server = value
@property
def __line(self):
warn("Access to __line attribute on SMTPChannel is deprecated, "
"use 'received_lines' instead", DeprecationWarning, 2)
return self.received_lines
@__line.setter
def __line(self, value):
warn("Setting __line attribute on SMTPChannel is deprecated, "
"set 'received_lines' instead", DeprecationWarning, 2)
self.received_lines = value
@property
def __state(self):
warn("Access to __state attribute on SMTPChannel is deprecated, "
"use 'smtp_state' instead", DeprecationWarning, 2)
return self.smtp_state
@__state.setter
def __state(self, value):
warn("Setting __state attribute on SMTPChannel is deprecated, "
"set 'smtp_state' instead", DeprecationWarning, 2)
self.smtp_state = value
@property
def __greeting(self):
warn("Access to __greeting attribute on SMTPChannel is deprecated, "
"use 'seen_greeting' instead", DeprecationWarning, 2)
return self.seen_greeting
@__greeting.setter
def __greeting(self, value):
warn("Setting __greeting attribute on SMTPChannel is deprecated, "
"set 'seen_greeting' instead", DeprecationWarning, 2)
self.seen_greeting = value
@property
def __mailfrom(self):
warn("Access to __mailfrom attribute on SMTPChannel is deprecated, "
"use 'mailfrom' instead", DeprecationWarning, 2)
return self.mailfrom
@__mailfrom.setter
def __mailfrom(self, value):
warn("Setting __mailfrom attribute on SMTPChannel is deprecated, "
"set 'mailfrom' instead", DeprecationWarning, 2)
self.mailfrom = value
@property
def __rcpttos(self):
warn("Access to __rcpttos attribute on SMTPChannel is deprecated, "
"use 'rcpttos' instead", DeprecationWarning, 2)
return self.rcpttos
@__rcpttos.setter
def __rcpttos(self, value):
warn("Setting __rcpttos attribute on SMTPChannel is deprecated, "
"set 'rcpttos' instead", DeprecationWarning, 2)
self.rcpttos = value
@property
def __data(self):
warn("Access to __data attribute on SMTPChannel is deprecated, "
"use 'received_data' instead", DeprecationWarning, 2)
return self.received_data
@__data.setter
def __data(self, value):
warn("Setting __data attribute on SMTPChannel is deprecated, "
"set 'received_data' instead", DeprecationWarning, 2)
self.received_data = value
@property
def __fqdn(self):
warn("Access to __fqdn attribute on SMTPChannel is deprecated, "
"use 'fqdn' instead", DeprecationWarning, 2)
return self.fqdn
@__fqdn.setter
def __fqdn(self, value):
warn("Setting __fqdn attribute on SMTPChannel is deprecated, "
"set 'fqdn' instead", DeprecationWarning, 2)
self.fqdn = value
@property
def __peer(self):
warn("Access to __peer attribute on SMTPChannel is deprecated, "
"use 'peer' instead", DeprecationWarning, 2)
return self.peer
@__peer.setter
def __peer(self, value):
warn("Setting __peer attribute on SMTPChannel is deprecated, "
"set 'peer' instead", DeprecationWarning, 2)
self.peer = value
@property
def __conn(self):
warn("Access to __conn attribute on SMTPChannel is deprecated, "
"use 'conn' instead", DeprecationWarning, 2)
return self.conn
@__conn.setter
def __conn(self, value):
warn("Setting __conn attribute on SMTPChannel is deprecated, "
"set 'conn' instead", DeprecationWarning, 2)
self.conn = value
@property
def __addr(self):
warn("Access to __addr attribute on SMTPChannel is deprecated, "
"use 'addr' instead", DeprecationWarning, 2)
return self.addr
@__addr.setter
def __addr(self, value):
warn("Setting __addr attribute on SMTPChannel is deprecated, "
"set 'addr' instead", DeprecationWarning, 2)
self.addr = value
# Overrides base class for convenience
def push(self, msg):
asynchat.async_chat.push(self, bytes(msg + '\r\n', 'ascii'))
# Implementation of base class abstract method
def collect_incoming_data(self, data):
limit = None
if self.smtp_state == self.COMMAND:
limit = self.max_command_size_limit
elif self.smtp_state == self.DATA:
limit = self.data_size_limit
if limit and self.num_bytes > limit:
return
elif limit:
self.num_bytes += len(data)
self.received_lines.append(str(data, "utf-8"))
# Implementation of base class abstract method
def found_terminator(self):
line = EMPTYSTRING.join(self.received_lines)
print('Data:', repr(line), file=DEBUGSTREAM)
self.received_lines = []
if self.smtp_state == self.COMMAND:
sz, self.num_bytes = self.num_bytes, 0
if not line:
self.push('500 Error: bad syntax')
return
method = None
i = line.find(' ')
if i < 0:
command = line.upper()
arg = None
else:
command = line[:i].upper()
arg = line[i+1:].strip()
max_sz = (self.command_size_limits[command]
if self.extended_smtp else self.command_size_limit)
if sz > max_sz:
self.push('500 Error: line too long')
return
method = getattr(self, 'smtp_' + command, None)
if not method:
self.push('500 Error: command "%s" not recognized' % command)
return
method(arg)
return
else:
if self.smtp_state != self.DATA:
self.push('451 Internal confusion')
self.num_bytes = 0
return
if self.data_size_limit and self.num_bytes > self.data_size_limit:
self.push('552 Error: Too much mail data')
self.num_bytes = 0
return
# Remove extraneous carriage returns and de-transparency according
# to RFC 5321, Section 4.5.2.
data = []
for text in line.split('\r\n'):
if text and text[0] == '.':
data.append(text[1:])
else:
data.append(text)
self.received_data = NEWLINE.join(data)
status = self.smtp_server.process_message(self.peer,
self.mailfrom,
self.rcpttos,
self.received_data)
self.rcpttos = []
self.mailfrom = None
self.smtp_state = self.COMMAND
self.num_bytes = 0
self.set_terminator(b'\r\n')
if not status:
self.push('250 OK')
else:
self.push(status)
# SMTP and ESMTP commands
def smtp_HELO(self, arg):
if not arg:
self.push('501 Syntax: HELO hostname')
return
if self.seen_greeting:
self.push('503 Duplicate HELO/EHLO')
else:
self.seen_greeting = arg
self.extended_smtp = False
self.push('250 %s' % self.fqdn)
def smtp_EHLO(self, arg):
if not arg:
self.push('501 Syntax: EHLO hostname')
return
if self.seen_greeting:
self.push('503 Duplicate HELO/EHLO')
else:
self.seen_greeting = arg
self.extended_smtp = True
self.push('250-%s' % self.fqdn)
if self.data_size_limit:
self.push('250-SIZE %s' % self.data_size_limit)
self.push('250 HELP')
def smtp_NOOP(self, arg):
if arg:
self.push('501 Syntax: NOOP')
else:
self.push('250 OK')
def smtp_QUIT(self, arg):
# arg is ignored
self.push('221 Bye')
self.close_when_done()
def _strip_command_keyword(self, keyword, arg):
keylen = len(keyword)
if arg[:keylen].upper() == keyword:
return arg[keylen:].strip()
return ''
def _getaddr(self, arg):
if not arg:
return '', ''
if arg.lstrip().startswith('<'):
address, rest = get_angle_addr(arg)
else:
address, rest = get_addr_spec(arg)
if not address:
return address, rest
return address.addr_spec, rest
def _getparams(self, params):
# Return any parameters that appear to be syntactically valid according
# to RFC 1869, ignoring all others. (Postel rule: accept what we can.)
params = [param.split('=', 1) for param in params.split()
if '=' in param]
return {k: v for k, v in params if k.isalnum()}
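# A minimal usage sketch, assuming an SMTPChannel instance `chan` (the dict
# shown is what the comprehension above would produce):
#
#     >>> chan._getparams('SIZE=1000 BODY=8BITMIME BAD')
#     {'SIZE': '1000', 'BODY': '8BITMIME'}
#
# Tokens without '=' are dropped, and keys must be alphanumeric.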
def smtp_HELP(self, arg):
if arg:
extended = ' [SP <mail-parameters>]'
lc_arg = arg.upper()
if lc_arg == 'EHLO':
self.push('250 Syntax: EHLO hostname')
elif lc_arg == 'HELO':
self.push('250 Syntax: HELO hostname')
elif lc_arg == 'MAIL':
msg = '250 Syntax: MAIL FROM: <address>'
if self.extended_smtp:
msg += extended
self.push(msg)
elif lc_arg == 'RCPT':
msg = '250 Syntax: RCPT TO: <address>'
if self.extended_smtp:
msg += extended
self.push(msg)
elif lc_arg == 'DATA':
self.push('250 Syntax: DATA')
elif lc_arg == 'RSET':
self.push('250 Syntax: RSET')
elif lc_arg == 'NOOP':
self.push('250 Syntax: NOOP')
elif lc_arg == 'QUIT':
self.push('250 Syntax: QUIT')
elif lc_arg == 'VRFY':
self.push('250 Syntax: VRFY <address>')
else:
self.push('501 Supported commands: EHLO HELO MAIL RCPT '
'DATA RSET NOOP QUIT VRFY')
else:
self.push('250 Supported commands: EHLO HELO MAIL RCPT DATA '
'RSET NOOP QUIT VRFY')
def smtp_VRFY(self, arg):
if arg:
address, params = self._getaddr(arg)
if address:
self.push('252 Cannot VRFY user, but will accept message '
'and attempt delivery')
else:
self.push('502 Could not VRFY %s' % arg)
else:
self.push('501 Syntax: VRFY <address>')
def smtp_MAIL(self, arg):
if not self.seen_greeting:
self.push('503 Error: send HELO first')
return
print('===> MAIL', arg, file=DEBUGSTREAM)
syntaxerr = '501 Syntax: MAIL FROM: <address>'
if self.extended_smtp:
syntaxerr += ' [SP <mail-parameters>]'
if arg is None:
self.push(syntaxerr)
return
arg = self._strip_command_keyword('FROM:', arg)
address, params = self._getaddr(arg)
if not address:
self.push(syntaxerr)
return
if not self.extended_smtp and params:
self.push(syntaxerr)
return
if self.mailfrom:
self.push('503 Error: nested MAIL command')
return
params = self._getparams(params.upper())
if params is None:
self.push(syntaxerr)
return
size = params.pop('SIZE', None)
if size:
if not size.isdigit():
self.push(syntaxerr)
return
elif self.data_size_limit and int(size) > self.data_size_limit:
self.push('552 Error: message size exceeds fixed maximum message size')
return
if len(params.keys()) > 0:
self.push('555 MAIL FROM parameters not recognized or not implemented')
return
self.mailfrom = address
print('sender:', self.mailfrom, file=DEBUGSTREAM)
self.push('250 OK')
def smtp_RCPT(self, arg):
if not self.seen_greeting:
self.push('503 Error: send HELO first')
return
print('===> RCPT', arg, file=DEBUGSTREAM)
if not self.mailfrom:
self.push('503 Error: need MAIL command')
return
syntaxerr = '501 Syntax: RCPT TO: <address>'
if self.extended_smtp:
syntaxerr += ' [SP <mail-parameters>]'
if arg is None:
self.push(syntaxerr)
return
arg = self._strip_command_keyword('TO:', arg)
address, params = self._getaddr(arg)
if not address:
self.push(syntaxerr)
return
if params:
if self.extended_smtp:
params = self._getparams(params.upper())
if params is None:
self.push(syntaxerr)
return
else:
self.push(syntaxerr)
return
if params and len(params.keys()) > 0:
self.push('555 RCPT TO parameters not recognized or not implemented')
return
self.rcpttos.append(address)
print('recips:', self.rcpttos, file=DEBUGSTREAM)
self.push('250 OK')
def smtp_RSET(self, arg):
if arg:
self.push('501 Syntax: RSET')
return
# Resets the sender, recipients, and data, but not the greeting
self.mailfrom = None
self.rcpttos = []
self.received_data = ''
self.smtp_state = self.COMMAND
self.push('250 OK')
def smtp_DATA(self, arg):
if not self.seen_greeting:
self.push('503 Error: send HELO first')
return
if not self.rcpttos:
self.push('503 Error: need RCPT command')
return
if arg:
self.push('501 Syntax: DATA')
return
self.smtp_state = self.DATA
self.set_terminator(b'\r\n.\r\n')
self.push('354 End data with <CR><LF>.<CR><LF>')
# Commands that have not been implemented
def smtp_EXPN(self, arg):
self.push('502 EXPN not implemented')
class SMTPServer(asyncore.dispatcher):
# SMTPChannel class to use for managing client connections
channel_class = SMTPChannel
def __init__(self, localaddr, remoteaddr,
data_size_limit=DATA_SIZE_DEFAULT, map=None):
self._localaddr = localaddr
self._remoteaddr = remoteaddr
self.data_size_limit = data_size_limit
asyncore.dispatcher.__init__(self, map=map)
try:
self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
# try to re-use a server port if possible
self.set_reuse_addr()
self.bind(localaddr)
self.listen(5)
except:
self.close()
raise
else:
print('%s started at %s\n\tLocal addr: %s\n\tRemote addr:%s' % (
self.__class__.__name__, time.ctime(time.time()),
localaddr, remoteaddr), file=DEBUGSTREAM)
def handle_accepted(self, conn, addr):
print('Incoming connection from %s' % repr(addr), file=DEBUGSTREAM)
channel = self.channel_class(self, conn, addr, self.data_size_limit,
self._map)
# API for "doing something useful with the message"
def process_message(self, peer, mailfrom, rcpttos, data):
"""Override this abstract method to handle messages from the client.
peer is a tuple containing (ipaddr, port) of the client that made the
socket connection to our smtp port.
mailfrom is the raw address the client claims the message is coming
from.
rcpttos is a list of raw addresses the client wishes to deliver the
message to.
data is a string containing the entire full text of the message,
headers (if supplied) and all. It has been `de-transparencied'
according to RFC 821, Section 4.5.2. In other words, a line
containing a `.' followed by other text has had the leading dot
removed.
This function should return None, for a normal `250 Ok' response;
otherwise it returns the desired response string in RFC 821 format.
"""
raise NotImplementedError
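# A minimal sketch of a concrete subclass, following the contract in the
# docstring above; the class name and prints are illustrative only:
#
#     class PrintingServer(SMTPServer):
#         def process_message(self, peer, mailfrom, rcpttos, data):
#             print('peer:', peer)
#             print('from:', mailfrom)
#             print('to:  ', rcpttos)
#             print('size:', len(data))
#             return None          # None -> the channel replies '250 OK'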
class DebuggingServer(SMTPServer):
# Do something with the gathered message
def process_message(self, peer, mailfrom, rcpttos, data):
inheaders = 1
lines = data.split('\n')
print('---------- MESSAGE FOLLOWS ----------')
for line in lines:
# headers first
if inheaders and not line:
print('X-Peer:', peer[0])
inheaders = 0
print(line)
print('------------ END MESSAGE ------------')
class PureProxy(SMTPServer):
def process_message(self, peer, mailfrom, rcpttos, data):
lines = data.split('\n')
# Look for the last header
i = 0
for line in lines:
if not line:
break
i += 1
lines.insert(i, 'X-Peer: %s' % peer[0])
data = NEWLINE.join(lines)
refused = self._deliver(mailfrom, rcpttos, data)
# TBD: what to do with refused addresses?
print('we got some refusals:', refused, file=DEBUGSTREAM)
def _deliver(self, mailfrom, rcpttos, data):
import smtplib
refused = {}
try:
s = smtplib.SMTP()
s.connect(self._remoteaddr[0], self._remoteaddr[1])
try:
refused = s.sendmail(mailfrom, rcpttos, data)
finally:
s.quit()
except smtplib.SMTPRecipientsRefused as e:
print('got SMTPRecipientsRefused', file=DEBUGSTREAM)
refused = e.recipients
except (OSError, smtplib.SMTPException) as e:
print('got', e.__class__, file=DEBUGSTREAM)
# All recipients were refused. If the exception had an associated
# error code, use it. Otherwise, fake it with a non-triggering
# exception code.
errcode = getattr(e, 'smtp_code', -1)
errmsg = getattr(e, 'smtp_error', 'ignore')
for r in rcpttos:
refused[r] = (errcode, errmsg)
return refused
class MailmanProxy(PureProxy):
def process_message(self, peer, mailfrom, rcpttos, data):
from io import StringIO
from Mailman import Utils
from Mailman import Message
from Mailman import MailList
# If the message is to a Mailman mailing list, then we'll invoke the
# Mailman script directly, without going through the real smtpd.
# Otherwise we'll forward it to the local proxy for disposition.
listnames = []
for rcpt in rcpttos:
local = rcpt.lower().split('@')[0]
# We allow the following variations on the theme
# listname
# listname-admin
# listname-owner
# listname-request
# listname-join
# listname-leave
parts = local.split('-')
if len(parts) > 2:
continue
listname = parts[0]
if len(parts) == 2:
command = parts[1]
else:
command = ''
if not Utils.list_exists(listname) or command not in (
'', 'admin', 'owner', 'request', 'join', 'leave'):
continue
listnames.append((rcpt, listname, command))
# Remove all list recipients from rcpttos and forward what we're not
# going to take care of ourselves. Linear removal should be fine
# since we don't expect a large number of recipients.
for rcpt, listname, command in listnames:
rcpttos.remove(rcpt)
# If there are any non-list-destined recipients left, forward to them.
print('forwarding recips:', ' '.join(rcpttos), file=DEBUGSTREAM)
if rcpttos:
refused = self._deliver(mailfrom, rcpttos, data)
# TBD: what to do with refused addresses?
print('we got refusals:', refused, file=DEBUGSTREAM)
# Now deliver directly to the list commands
mlists = {}
s = StringIO(data)
msg = Message.Message(s)
# These headers are required for the proper execution of Mailman. All
# MTAs in existence seem to add these if the original message doesn't
# have them.
if not msg.get('from'):
msg['From'] = mailfrom
if not msg.get('date'):
msg['Date'] = time.ctime(time.time())
for rcpt, listname, command in listnames:
print('sending message to', rcpt, file=DEBUGSTREAM)
mlist = mlists.get(listname)
if not mlist:
mlist = MailList.MailList(listname, lock=0)
mlists[listname] = mlist
# dispatch on the type of command
if command == '':
# post
msg.Enqueue(mlist, tolist=1)
elif command == 'admin':
msg.Enqueue(mlist, toadmin=1)
elif command == 'owner':
msg.Enqueue(mlist, toowner=1)
elif command == 'request':
msg.Enqueue(mlist, torequest=1)
elif command in ('join', 'leave'):
# TBD: this is a hack!
if command == 'join':
msg['Subject'] = 'subscribe'
else:
msg['Subject'] = 'unsubscribe'
msg.Enqueue(mlist, torequest=1)
class Options:
setuid = 1
classname = 'PureProxy'
size_limit = None
def parseargs():
global DEBUGSTREAM
try:
opts, args = getopt.getopt(
sys.argv[1:], 'nVhc:s:d',
['class=', 'nosetuid', 'version', 'help', 'size=', 'debug'])
except getopt.error as e:
usage(1, e)
options = Options()
for opt, arg in opts:
if opt in ('-h', '--help'):
usage(0)
elif opt in ('-V', '--version'):
print(__version__)
sys.exit(0)
elif opt in ('-n', '--nosetuid'):
options.setuid = 0
elif opt in ('-c', '--class'):
options.classname = arg
elif opt in ('-d', '--debug'):
DEBUGSTREAM = sys.stderr
elif opt in ('-s', '--size'):
try:
int_size = int(arg)
options.size_limit = int_size
except ValueError:
print('Invalid size: ' + arg, file=sys.stderr)
sys.exit(1)
# parse the rest of the arguments
if len(args) < 1:
localspec = 'localhost:8025'
remotespec = 'localhost:25'
elif len(args) < 2:
localspec = args[0]
remotespec = 'localhost:25'
elif len(args) < 3:
localspec = args[0]
remotespec = args[1]
else:
usage(1, 'Invalid arguments: %s' % COMMASPACE.join(args))
# split into host/port pairs
i = localspec.find(':')
if i < 0:
usage(1, 'Bad local spec: %s' % localspec)
options.localhost = localspec[:i]
try:
options.localport = int(localspec[i+1:])
except ValueError:
usage(1, 'Bad local port: %s' % localspec)
i = remotespec.find(':')
if i < 0:
usage(1, 'Bad remote spec: %s' % remotespec)
options.remotehost = remotespec[:i]
try:
options.remoteport = int(remotespec[i+1:])
except ValueError:
usage(1, 'Bad remote port: %s' % remotespec)
return options
if __name__ == '__main__':
options = parseargs()
# Become nobody
classname = options.classname
if "." in classname:
lastdot = classname.rfind(".")
mod = __import__(classname[:lastdot], globals(), locals(), [""])
classname = classname[lastdot+1:]
else:
import __main__ as mod
class_ = getattr(mod, classname)
proxy = class_((options.localhost, options.localport),
(options.remotehost, options.remoteport),
options.size_limit)
if options.setuid:
try:
import pwd
except ImportError:
print('Cannot import module "pwd"; try running with -n option.', file=sys.stderr)
sys.exit(1)
nobody = pwd.getpwnam('nobody')[2]
try:
os.setuid(nobody)
except PermissionError:
print('Cannot setuid "nobody"; try running with -n option.', file=sys.stderr)
sys.exit(1)
try:
asyncore.loop()
except KeyboardInterrupt:
pass
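# A minimal sketch of driving this module from code rather than the command
# line; the port is an arbitrary example:
#
#     >>> import smtpd, asyncore
#     >>> server = smtpd.DebuggingServer(('localhost', 1025), None)
#     >>> asyncore.loop()    # blocks; received messages print to stdout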
#! /usr/bin/env python3
'''SMTP/ESMTP client class.
This should follow RFC 821 (SMTP), RFC 1869 (ESMTP), RFC 2554 (SMTP
Authentication) and RFC 2487 (Secure SMTP over TLS).
Notes:
Please remember, when doing ESMTP, that the names of the SMTP service
extensions are NOT the same thing as the option keywords for the RCPT
and MAIL commands!
Example:
>>> import smtplib
>>> s=smtplib.SMTP("localhost")
>>> print(s.help())
This is Sendmail version 8.8.4
Topics:
HELO EHLO MAIL RCPT DATA
RSET NOOP QUIT HELP VRFY
EXPN VERB ETRN DSN
For more info use "HELP <topic>".
To report bugs in the implementation send email to
[email protected].
For local information send email to Postmaster at your site.
End of HELP info
>>> s.putcmd("vrfy","someone@here")
>>> s.getreply()
(250, "Somebody OverHere <[email protected]>")
>>> s.quit()
'''
# Author: The Dragon De Monsyne <[email protected]>
# ESMTP support, test code and doc fixes added by
# Eric S. Raymond <[email protected]>
# Better RFC 821 compliance (MAIL and RCPT, and CRLF in data)
# by Carey Evans <[email protected]>, for picky mail servers.
# RFC 2554 (authentication) support by Gerhard Haering <[email protected]>.
#
# This was modified from the Python 1.5 library HTTP lib.
import socket
import io
import re
import email.utils
import email.message
import email.generator
import base64
import hmac
import copy
from email.base64mime import body_encode as encode_base64
from sys import stderr
__all__ = ["SMTPException", "SMTPServerDisconnected", "SMTPResponseException",
"SMTPSenderRefused", "SMTPRecipientsRefused", "SMTPDataError",
"SMTPConnectError", "SMTPHeloError", "SMTPAuthenticationError",
"quoteaddr", "quotedata", "SMTP"]
SMTP_PORT = 25
SMTP_SSL_PORT = 465
CRLF = "\r\n"
bCRLF = b"\r\n"
_MAXLINE = 8192 # more than 8 times larger than RFC 821, 4.5.3
OLDSTYLE_AUTH = re.compile(r"auth=(.*)", re.I)
# Exception classes used by this module.
class SMTPException(OSError):
"""Base class for all exceptions raised by this module."""
class SMTPServerDisconnected(SMTPException):
"""Not connected to any SMTP server.
This exception is raised when the server unexpectedly disconnects,
or when an attempt is made to use the SMTP instance before
connecting it to a server.
"""
class SMTPResponseException(SMTPException):
"""Base class for all exceptions that include an SMTP error code.
These exceptions are generated in some instances when the SMTP
server returns an error code. The error code is stored in the
`smtp_code' attribute of the error, and the `smtp_error' attribute
is set to the error message.
"""
def __init__(self, code, msg):
self.smtp_code = code
self.smtp_error = msg
self.args = (code, msg)
class SMTPSenderRefused(SMTPResponseException):
"""Sender address refused.
In addition to the attributes set on all SMTPResponseException
exceptions, this sets `sender' to the string that the SMTP refused.
"""
def __init__(self, code, msg, sender):
self.smtp_code = code
self.smtp_error = msg
self.sender = sender
self.args = (code, msg, sender)
class SMTPRecipientsRefused(SMTPException):
"""All recipient addresses refused.
The errors for each recipient are accessible through the attribute
'recipients', which is a dictionary of exactly the same sort as
SMTP.sendmail() returns.
"""
def __init__(self, recipients):
self.recipients = recipients
self.args = (recipients,)
class SMTPDataError(SMTPResponseException):
"""The SMTP server didn't accept the data."""
class SMTPConnectError(SMTPResponseException):
"""Error during connection establishment."""
class SMTPHeloError(SMTPResponseException):
"""The server refused our HELO reply."""
class SMTPAuthenticationError(SMTPResponseException):
"""Authentication error.
Most probably the server didn't accept the username/password
combination provided.
"""
def quoteaddr(addrstring):
"""Quote a subset of the email addresses defined by RFC 821.
Should be able to handle anything email.utils.parseaddr can handle.
"""
displayname, addr = email.utils.parseaddr(addrstring)
if (displayname, addr) == ('', ''):
# parseaddr couldn't parse it, use it as is and hope for the best.
if addrstring.strip().startswith('<'):
return addrstring
return "<%s>" % addrstring
return "<%s>" % addr
def _addr_only(addrstring):
displayname, addr = email.utils.parseaddr(addrstring)
if (displayname, addr) == ('', ''):
# parseaddr couldn't parse it, so use it as is.
return addrstring
return addr
# Legacy method kept for backward compatibility.
def quotedata(data):
"""Quote data for email.
Double leading '.', and change Unix newline '\\n', or Mac '\\r' into
Internet CRLF end-of-line.
"""
return re.sub(r'(?m)^\.', '..',
re.sub(r'(?:\r\n|\n|\r(?!\n))', CRLF, data))
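# A quick sketch: a lone '\n' becomes CRLF, then a leading '.' is doubled:
#
#     >>> quotedata('one\n.two')
#     'one\r\n..two'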
def _quote_periods(bindata):
return re.sub(br'(?m)^\.', b'..', bindata)
def _fix_eols(data):
return re.sub(r'(?:\r\n|\n|\r(?!\n))', CRLF, data)
try:
import ssl
except ImportError:
_have_ssl = False
else:
_have_ssl = True
class SMTP:
"""This class manages a connection to an SMTP or ESMTP server.
SMTP Objects:
SMTP objects have the following attributes:
helo_resp
This is the message given by the server in response to the
most recent HELO command.
ehlo_resp
This is the message given by the server in response to the
most recent EHLO command. This is usually multiline.
does_esmtp
This is a True value _after you do an EHLO command_, if the
server supports ESMTP.
esmtp_features
This is a dictionary, which, if the server supports ESMTP,
will _after you do an EHLO command_, contain the names of the
SMTP service extensions this server supports, and their
parameters (if any).
Note, all extension names are mapped to lower case in the
dictionary.
See each method's docstrings for details. In general, there is a
method of the same name to perform each SMTP command. There is also a
method called 'sendmail' that will do an entire mail transaction.
"""
debuglevel = 0
file = None
helo_resp = None
ehlo_msg = "ehlo"
ehlo_resp = None
does_esmtp = 0
default_port = SMTP_PORT
def __init__(self, host='', port=0, local_hostname=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None):
"""Initialize a new instance.
If specified, `host' is the name of the remote host to which to
connect. If specified, `port' specifies the port to which to connect.
By default, smtplib.SMTP_PORT is used. If a host is specified the
connect method is called, and if it returns anything other than a
success code an SMTPConnectError is raised. If specified,
`local_hostname` is used as the FQDN of the local host in the HELO/EHLO
command. Otherwise, the local hostname is found using
socket.getfqdn(). The `source_address` parameter takes a 2-tuple (host,
port) for the socket to bind to as its source address before
connecting. If the host is '' and port is 0, the OS default behavior
will be used.
"""
self._host = host
self.timeout = timeout
self.esmtp_features = {}
self.source_address = source_address
if host:
(code, msg) = self.connect(host, port)
if code != 220:
raise SMTPConnectError(code, msg)
if local_hostname is not None:
self.local_hostname = local_hostname
else:
# RFC 2821 says we should use the fqdn in the EHLO/HELO verb, and
# if that can't be calculated, that we should use a domain literal
# instead (essentially an encoded IP address like [A.B.C.D]).
fqdn = socket.getfqdn()
if '.' in fqdn:
self.local_hostname = fqdn
else:
# We can't find an fqdn hostname, so use a domain literal
addr = '127.0.0.1'
try:
addr = socket.gethostbyname(socket.gethostname())
except socket.gaierror:
pass
self.local_hostname = '[%s]' % addr
def __enter__(self):
return self
def __exit__(self, *args):
try:
code, message = self.docmd("QUIT")
if code != 221:
raise SMTPResponseException(code, message)
except SMTPServerDisconnected:
pass
finally:
self.close()
def set_debuglevel(self, debuglevel):
"""Set the debug output level.
A non-false value results in debug messages for connection and for all
messages sent to and received from the server.
"""
self.debuglevel = debuglevel
def _get_socket(self, host, port, timeout):
# This makes it simpler for SMTP_SSL to use the SMTP connect code
# and just alter the socket connection bit.
if self.debuglevel > 0:
print('connect: to', (host, port), self.source_address,
file=stderr)
return socket.create_connection((host, port), timeout,
self.source_address)
def connect(self, host='localhost', port=0, source_address=None):
"""Connect to a host on a given port.
If the hostname ends with a colon (`:') followed by a number, and
there is no port specified, that suffix will be stripped off and the
number interpreted as the port number to use.
Note: This method is automatically invoked by __init__, if a host is
specified during instantiation.
"""
if source_address:
self.source_address = source_address
if not port and (host.find(':') == host.rfind(':')):
i = host.rfind(':')
if i >= 0:
host, port = host[:i], host[i + 1:]
try:
port = int(port)
except ValueError:
raise OSError("nonnumeric port")
if not port:
port = self.default_port
if self.debuglevel > 0:
print('connect:', (host, port), file=stderr)
self.sock = self._get_socket(host, port, self.timeout)
self.file = None
(code, msg) = self.getreply()
if self.debuglevel > 0:
print("connect:", msg, file=stderr)
return (code, msg)
def send(self, s):
"""Send `s' to the server."""
if self.debuglevel > 0:
print('send:', repr(s), file=stderr)
if hasattr(self, 'sock') and self.sock:
if isinstance(s, str):
s = s.encode("ascii")
try:
self.sock.sendall(s)
except OSError:
self.close()
raise SMTPServerDisconnected('Server not connected')
else:
raise SMTPServerDisconnected('please run connect() first')
def putcmd(self, cmd, args=""):
"""Send a command to the server."""
if args == "":
str = '%s%s' % (cmd, CRLF)
else:
str = '%s %s%s' % (cmd, args, CRLF)
self.send(str)
def getreply(self):
"""Get a reply from the server.
Returns a tuple consisting of:
- server response code (e.g. '250', or such, if all goes well)
Note: returns -1 if it can't read response code.
- server response string corresponding to response code (multiline
responses are converted to a single, multiline string).
Raises SMTPServerDisconnected if end-of-file is reached.
"""
resp = []
if self.file is None:
self.file = self.sock.makefile('rb')
while 1:
try:
line = self.file.readline(_MAXLINE + 1)
except OSError as e:
self.close()
raise SMTPServerDisconnected("Connection unexpectedly closed: "
+ str(e))
if not line:
self.close()
raise SMTPServerDisconnected("Connection unexpectedly closed")
if self.debuglevel > 0:
print('reply:', repr(line), file=stderr)
if len(line) > _MAXLINE:
self.close()
raise SMTPResponseException(500, "Line too long.")
resp.append(line[4:].strip(b' \t\r\n'))
code = line[:3]
# Check that the error code is syntactically correct.
# Don't attempt to read a continuation line if it is broken.
try:
errcode = int(code)
except ValueError:
errcode = -1
break
# Check if multiline response.
if line[3:4] != b"-":
break
errmsg = b"\n".join(resp)
if self.debuglevel > 0:
print('reply: retcode (%s); Msg: %s' % (errcode, errmsg),
file=stderr)
return errcode, errmsg
def docmd(self, cmd, args=""):
"""Send a command, and return its response code."""
self.putcmd(cmd, args)
return self.getreply()
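# A minimal sketch, assuming `s` is a connected SMTP instance and the server
# answers NOOP in the usual way:
#
#     >>> s.docmd('NOOP')
#     (250, b'OK')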
# std smtp commands
def helo(self, name=''):
"""SMTP 'helo' command.
Hostname to send for this command defaults to the FQDN of the local
host.
"""
self.putcmd("helo", name or self.local_hostname)
(code, msg) = self.getreply()
self.helo_resp = msg
return (code, msg)
def ehlo(self, name=''):
""" SMTP 'ehlo' command.
Hostname to send for this command defaults to the FQDN of the local
host.
"""
self.esmtp_features = {}
self.putcmd(self.ehlo_msg, name or self.local_hostname)
(code, msg) = self.getreply()
# According to RFC 1869 some (badly written)
# MTAs will disconnect on an ehlo. Toss an exception if
# that happens -ddm
if code == -1 and len(msg) == 0:
self.close()
raise SMTPServerDisconnected("Server not connected")
self.ehlo_resp = msg
if code != 250:
return (code, msg)
self.does_esmtp = 1
#parse the ehlo response -ddm
assert isinstance(self.ehlo_resp, bytes), repr(self.ehlo_resp)
resp = self.ehlo_resp.decode("latin-1").split('\n')
del resp[0]
for each in resp:
# To be able to communicate with as many SMTP servers as possible,
# we have to take the old-style auth advertisement into account,
# because:
# 1) Else our SMTP feature parser gets confused.
# 2) There are some servers that only advertise the auth methods we
# support using the old style.
auth_match = OLDSTYLE_AUTH.match(each)
if auth_match:
# This doesn't remove duplicates, but that's no problem
self.esmtp_features["auth"] = self.esmtp_features.get("auth", "") \
+ " " + auth_match.groups(0)[0]
continue
# RFC 1869 requires a space between ehlo keyword and parameters.
# It's actually stricter, in that only spaces are allowed between
# parameters, but we're not going to check for that here. Note
# that the space isn't present if there are no parameters.
m = re.match(r'(?P<feature>[A-Za-z0-9][A-Za-z0-9\-]*) ?', each)
if m:
feature = m.group("feature").lower()
params = m.string[m.end("feature"):].strip()
if feature == "auth":
self.esmtp_features[feature] = self.esmtp_features.get(feature, "") \
+ " " + params
else:
self.esmtp_features[feature] = params
return (code, msg)
def has_extn(self, opt):
"""Does the server support a given SMTP service extension?"""
return opt.lower() in self.esmtp_features
def help(self, args=''):
"""SMTP 'help' command.
Returns help text from server."""
self.putcmd("help", args)
return self.getreply()[1]
def rset(self):
"""SMTP 'rset' command -- resets session."""
return self.docmd("rset")
def _rset(self):
"""Internal 'rset' command which ignores any SMTPServerDisconnected error.
Used internally in the library, since the server disconnected error
should appear to the application when the *next* command is issued, if
we are doing an internal "safety" reset.
"""
try:
self.rset()
except SMTPServerDisconnected:
pass
def noop(self):
"""SMTP 'noop' command -- doesn't do anything :>"""
return self.docmd("noop")
def mail(self, sender, options=[]):
"""SMTP 'mail' command -- begins mail xfer session."""
optionlist = ''
if options and self.does_esmtp:
optionlist = ' ' + ' '.join(options)
self.putcmd("mail", "FROM:%s%s" % (quoteaddr(sender), optionlist))
return self.getreply()
def rcpt(self, recip, options=[]):
"""SMTP 'rcpt' command -- indicates 1 recipient for this mail."""
optionlist = ''
if options and self.does_esmtp:
optionlist = ' ' + ' '.join(options)
self.putcmd("rcpt", "TO:%s%s" % (quoteaddr(recip), optionlist))
return self.getreply()
def data(self, msg):
"""SMTP 'DATA' command -- sends message data to server.
Automatically quotes lines beginning with a period per rfc821.
Raises SMTPDataError if there is an unexpected reply to the
DATA command; the return value from this method is the final
response code received when all data is sent. If msg
is a string, lone '\\r' and '\\n' characters are converted to
'\\r\\n' characters. If msg is bytes, it is transmitted as is.
"""
self.putcmd("data")
(code, repl) = self.getreply()
if self.debuglevel > 0:
print("data:", (code, repl), file=stderr)
if code != 354:
raise SMTPDataError(code, repl)
else:
if isinstance(msg, str):
msg = _fix_eols(msg).encode('ascii')
q = _quote_periods(msg)
if q[-2:] != bCRLF:
q = q + bCRLF
q = q + b"." + bCRLF
self.send(q)
(code, msg) = self.getreply()
if self.debuglevel > 0:
print("data:", (code, msg), file=stderr)
return (code, msg)
def verify(self, address):
"""SMTP 'verify' command -- checks for address validity."""
self.putcmd("vrfy", _addr_only(address))
return self.getreply()
# a.k.a.
vrfy = verify
def expn(self, address):
"""SMTP 'expn' command -- expands a mailing list."""
self.putcmd("expn", _addr_only(address))
return self.getreply()
# some useful methods
def ehlo_or_helo_if_needed(self):
"""Call self.ehlo() and/or self.helo() if needed.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
"""
if self.helo_resp is None and self.ehlo_resp is None:
if not (200 <= self.ehlo()[0] <= 299):
(code, resp) = self.helo()
if not (200 <= code <= 299):
raise SMTPHeloError(code, resp)
def login(self, user, password):
"""Log in on an SMTP server that requires authentication.
The arguments are:
- user: The user name to authenticate with.
- password: The password for the authentication.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first.
This method will return normally if the authentication was successful.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
SMTPAuthenticationError The server didn't accept the username/
password combination.
SMTPException No suitable authentication method was
found.
"""
def encode_cram_md5(challenge, user, password):
challenge = base64.decodebytes(challenge)
response = user + " " + hmac.HMAC(password.encode('ascii'),
challenge, 'md5').hexdigest()
return encode_base64(response.encode('ascii'), eol='')
def encode_plain(user, password):
s = "\0%s\0%s" % (user, password)
return encode_base64(s.encode('ascii'), eol='')
AUTH_PLAIN = "PLAIN"
AUTH_CRAM_MD5 = "CRAM-MD5"
AUTH_LOGIN = "LOGIN"
self.ehlo_or_helo_if_needed()
if not self.has_extn("auth"):
raise SMTPException("SMTP AUTH extension not supported by server.")
# Authentication methods the server claims to support
advertised_authlist = self.esmtp_features["auth"].split()
# List of authentication methods we support: from preferred to
# less preferred methods. Except for the purpose of testing the weaker
# ones, we prefer stronger methods like CRAM-MD5:
preferred_auths = [AUTH_CRAM_MD5, AUTH_PLAIN, AUTH_LOGIN]
# We try the authentication methods the server advertises, but only the
# ones *we* support. And in our preferred order.
authlist = [auth for auth in preferred_auths if auth in advertised_authlist]
if not authlist:
raise SMTPException("No suitable authentication method found.")
# Some servers advertise authentication methods they don't really
# support, so if authentication fails, we continue until we've tried
# all methods.
for authmethod in authlist:
if authmethod == AUTH_CRAM_MD5:
(code, resp) = self.docmd("AUTH", AUTH_CRAM_MD5)
if code == 334:
(code, resp) = self.docmd(encode_cram_md5(resp, user, password))
elif authmethod == AUTH_PLAIN:
(code, resp) = self.docmd("AUTH",
AUTH_PLAIN + " " + encode_plain(user, password))
elif authmethod == AUTH_LOGIN:
(code, resp) = self.docmd("AUTH",
"%s %s" % (AUTH_LOGIN, encode_base64(user.encode('ascii'), eol='')))
if code == 334:
(code, resp) = self.docmd(encode_base64(password.encode('ascii'), eol=''))
# 235 == 'Authentication successful'
# 503 == 'Error: already authenticated'
if code in (235, 503):
return (code, resp)
# We could not log in successfully. Return the result of the last attempt.
raise SMTPAuthenticationError(code, resp)
def starttls(self, keyfile=None, certfile=None, context=None):
"""Puts the connection to the SMTP server into TLS mode.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first.
If the server supports TLS, this will encrypt the rest of the SMTP
session. If you provide the keyfile and certfile parameters,
the identity of the SMTP server and client can be checked. This,
however, depends on whether the socket module really checks the
certificates.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
"""
self.ehlo_or_helo_if_needed()
if not self.has_extn("starttls"):
raise SMTPException("STARTTLS extension not supported by server.")
(resp, reply) = self.docmd("STARTTLS")
if resp == 220:
if not _have_ssl:
raise RuntimeError("No SSL support included in this Python")
if context is not None and keyfile is not None:
raise ValueError("context and keyfile arguments are mutually "
"exclusive")
if context is not None and certfile is not None:
raise ValueError("context and certfile arguments are mutually "
"exclusive")
if context is None:
context = ssl._create_stdlib_context(certfile=certfile,
keyfile=keyfile)
self.sock = context.wrap_socket(self.sock,
server_hostname=self._host)
self.file = None
# RFC 3207:
# The client MUST discard any knowledge obtained from
# the server, such as the list of SMTP service extensions,
# which was not obtained from the TLS negotiation itself.
self.helo_resp = None
self.ehlo_resp = None
self.esmtp_features = {}
self.does_esmtp = 0
else:
# RFC 3207:
# 501 Syntax error (no parameters allowed)
# 454 TLS not available due to temporary reason
raise SMTPResponseException(resp, reply)
return (resp, reply)
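# A minimal sketch of the usual STARTTLS flow; host, port and credentials
# are placeholders:
#
#     >>> s = SMTP('mail.example.com', 587)
#     >>> s.starttls()             # performs EHLO first if needed
#     >>> s.ehlo()                 # re-issue EHLO after TLS, per RFC 3207
#     >>> s.login('user', 'secret')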
def sendmail(self, from_addr, to_addrs, msg, mail_options=[],
rcpt_options=[]):
"""This command performs an entire mail transaction.
The arguments are:
- from_addr : The address sending this mail.
- to_addrs : A list of addresses to send this mail to. A bare
string will be treated as a list with 1 address.
- msg : The message to send.
- mail_options : List of ESMTP options (such as 8bitmime) for the
mail command.
- rcpt_options : List of ESMTP options (such as DSN commands) for
all the rcpt commands.
msg may be a string containing characters in the ASCII range, or a byte
string. A string is encoded to bytes using the ascii codec, and lone
\\r and \\n characters are converted to \\r\\n characters.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first. If the server does ESMTP, message size
and each of the specified options will be passed to it. If EHLO
fails, HELO will be tried and ESMTP options suppressed.
This method will return normally if the mail is accepted for at least
one recipient. It returns a dictionary, with one entry for each
recipient that was refused. Each entry contains a tuple of the SMTP
error code and the accompanying error message sent by the server.
This method may raise the following exceptions:
SMTPHeloError The server didn't reply properly to
the helo greeting.
SMTPRecipientsRefused The server rejected ALL recipients
(no mail was sent).
SMTPSenderRefused The server didn't accept the from_addr.
SMTPDataError The server replied with an unexpected
error code (other than a refusal of
a recipient).
Note: the connection will be open even after an exception is raised.
Example:
>>> import smtplib
>>> s=smtplib.SMTP("localhost")
>>> tolist=["[email protected]","[email protected]","[email protected]","[email protected]"]
>>> msg = '''\\
... From: [email protected]
... Subject: testin'...
...
... This is a test '''
>>> s.sendmail("[email protected]",tolist,msg)
{ "[email protected]" : ( 550 ,"User unknown" ) }
>>> s.quit()
In the above example, the message was accepted for delivery to three
of the four addresses, and one was rejected, with the error code
550. If all addresses are accepted, then the method will return an
empty dictionary.
"""
self.ehlo_or_helo_if_needed()
esmtp_opts = []
if isinstance(msg, str):
msg = _fix_eols(msg).encode('ascii')
if self.does_esmtp:
# Hmmm? what's this? -ddm
# self.esmtp_features['7bit']=""
if self.has_extn('size'):
esmtp_opts.append("size=%d" % len(msg))
for option in mail_options:
esmtp_opts.append(option)
(code, resp) = self.mail(from_addr, esmtp_opts)
if code != 250:
if code == 421:
self.close()
else:
self._rset()
raise SMTPSenderRefused(code, resp, from_addr)
senderrs = {}
if isinstance(to_addrs, str):
to_addrs = [to_addrs]
for each in to_addrs:
(code, resp) = self.rcpt(each, rcpt_options)
if (code != 250) and (code != 251):
senderrs[each] = (code, resp)
if code == 421:
self.close()
raise SMTPRecipientsRefused(senderrs)
if len(senderrs) == len(to_addrs):
# the server refused all our recipients
self._rset()
raise SMTPRecipientsRefused(senderrs)
(code, resp) = self.data(msg)
if code != 250:
if code == 421:
self.close()
else:
self._rset()
raise SMTPDataError(code, resp)
#if we got here then somebody got our mail
return senderrs
def send_message(self, msg, from_addr=None, to_addrs=None,
mail_options=[], rcpt_options={}):
"""Converts message to a bytestring and passes it to sendmail.
The arguments are as for sendmail, except that msg is an
email.message.Message object. If from_addr is None or to_addrs is
None, these arguments are taken from the headers of the Message as
described in RFC 2822 (a ValueError is raised if there is more than
one set of 'Resent-' headers). Regardless of the values of from_addr and
to_addrs, any Bcc field (or Resent-Bcc field, when the Message is
resent) of the Message object won't be transmitted. The Message
object is then serialized using email.generator.BytesGenerator and
sendmail is called to transmit the message.
"""
# 'Resent-Date' is a mandatory field if the Message is resent (RFC 2822
# Section 3.6.6). In such a case, we use the 'Resent-*' fields. However,
# if there is more than one 'Resent-' block there's no way to
# unambiguously determine which one is the most recent in all cases,
# so rather than guess we raise a ValueError in that case.
#
# TODO implement heuristics to guess the correct Resent-* block with an
# option allowing the user to enable the heuristics. (It should be
# possible to guess correctly almost all of the time.)
resent = msg.get_all('Resent-Date')
if resent is None:
header_prefix = ''
elif len(resent) == 1:
header_prefix = 'Resent-'
else:
raise ValueError("message has more than one 'Resent-' header block")
if from_addr is None:
# Prefer the sender field per RFC 2822:3.6.2.
from_addr = (msg[header_prefix + 'Sender']
if (header_prefix + 'Sender') in msg
else msg[header_prefix + 'From'])
if to_addrs is None:
addr_fields = [f for f in (msg[header_prefix + 'To'],
msg[header_prefix + 'Bcc'],
msg[header_prefix + 'Cc']) if f is not None]
to_addrs = [a[1] for a in email.utils.getaddresses(addr_fields)]
# Make a local copy so we can delete the bcc headers.
msg_copy = copy.copy(msg)
del msg_copy['Bcc']
del msg_copy['Resent-Bcc']
with io.BytesIO() as bytesmsg:
g = email.generator.BytesGenerator(bytesmsg)
g.flatten(msg_copy, linesep='\r\n')
flatmsg = bytesmsg.getvalue()
return self.sendmail(from_addr, to_addrs, flatmsg, mail_options,
rcpt_options)
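# A minimal sketch, using a throwaway MIMEText message and the context
# manager defined above; the addresses are placeholders:
#
#     >>> from email.mime.text import MIMEText
#     >>> msg = MIMEText('body text')
#     >>> msg['Subject'] = 'Hello'
#     >>> msg['From'] = '[email protected]'
#     >>> msg['To'] = '[email protected]'
#     >>> with SMTP('localhost') as s:
#     ...     s.send_message(msg)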
def close(self):
"""Close the connection to the SMTP server."""
try:
file = self.file
self.file = None
if file:
file.close()
finally:
sock = self.sock
self.sock = None
if sock:
sock.close()
def quit(self):
"""Terminate the SMTP session."""
res = self.docmd("quit")
# A new EHLO is required after reconnecting with connect()
self.ehlo_resp = self.helo_resp = None
self.esmtp_features = {}
self.does_esmtp = False
self.close()
return res
if _have_ssl:
class SMTP_SSL(SMTP):
""" This is a subclass derived from SMTP that connects over an SSL
encrypted socket (to use this class you need a socket module that was
compiled with SSL support). If host is not specified, '' (the local
host) is used. If port is omitted, the standard SMTP-over-SSL port
(465) is used. local_hostname and source_address have the same meaning
as they do in the SMTP class. keyfile and certfile are also optional -
they can contain a PEM formatted private key and certificate chain file
for the SSL connection. context is also optional and can contain an
SSLContext; it is an alternative to keyfile and certfile, and if it is
specified, both keyfile and certfile must be None.
"""
default_port = SMTP_SSL_PORT
def __init__(self, host='', port=0, local_hostname=None,
keyfile=None, certfile=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None, context=None):
if context is not None and keyfile is not None:
raise ValueError("context and keyfile arguments are mutually "
"exclusive")
if context is not None and certfile is not None:
raise ValueError("context and certfile arguments are mutually "
"exclusive")
self.keyfile = keyfile
self.certfile = certfile
if context is None:
context = ssl._create_stdlib_context(certfile=certfile,
keyfile=keyfile)
self.context = context
SMTP.__init__(self, host, port, local_hostname, timeout,
source_address)
def _get_socket(self, host, port, timeout):
if self.debuglevel > 0:
print('connect:', (host, port), file=stderr)
new_socket = socket.create_connection((host, port), timeout,
self.source_address)
new_socket = self.context.wrap_socket(new_socket,
server_hostname=self._host)
return new_socket
__all__.append("SMTP_SSL")
#
# LMTP extension
#
LMTP_PORT = 2003
class LMTP(SMTP):
"""LMTP - Local Mail Transfer Protocol
The LMTP protocol, which is very similar to ESMTP, is heavily based
on the standard SMTP client. It's common to use Unix sockets for
LMTP, so our connect() method must support that as well as a regular
host:port server. local_hostname and source_address have the same
meaning as they do in the SMTP class. To specify a Unix socket,
you must use an absolute path as the host, starting with a '/'.
Authentication is supported, using the regular SMTP mechanism. When
using a Unix socket, LMTP servers generally don't support or require any
authentication, but your mileage might vary."""
ehlo_msg = "lhlo"
def __init__(self, host='', port=LMTP_PORT, local_hostname=None,
source_address=None):
"""Initialize a new instance."""
SMTP.__init__(self, host, port, local_hostname=local_hostname,
source_address=source_address)
def connect(self, host='localhost', port=0, source_address=None):
"""Connect to the LMTP daemon, on either a Unix or a TCP socket."""
if host[0] != '/':
return SMTP.connect(self, host, port, source_address=source_address)
# Handle Unix-domain sockets.
try:
self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self.file = None
self.sock.connect(host)
except OSError:
if self.debuglevel > 0:
print('connect fail:', host, file=stderr)
if self.sock:
self.sock.close()
self.sock = None
raise
(code, msg) = self.getreply()
if self.debuglevel > 0:
print('connect:', msg, file=stderr)
return (code, msg)
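# A minimal sketch; the socket path is a placeholder for wherever the local
# delivery agent actually listens:
#
#     >>> lmtp = LMTP('/var/run/lmtp.sock')
#     >>> lmtp.sendmail('[email protected]', ['[email protected]'],
#     ...               'Subject: hi\r\n\r\nbody')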
# Test the sendmail method, which tests most of the others.
# Note: This always sends to localhost.
if __name__ == '__main__':
import sys
def prompt(prompt):
sys.stdout.write(prompt + ": ")
sys.stdout.flush()
return sys.stdin.readline().strip()
fromaddr = prompt("From")
toaddrs = prompt("To").split(',')
print("Enter message, end with ^D:")
msg = ''
while 1:
line = sys.stdin.readline()
if not line:
break
msg = msg + line
print("Message length is %d" % len(msg))
server = SMTP('localhost')
server.set_debuglevel(1)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
"""Routines to help recognizing sound files.
Function whathdr() recognizes various types of sound file headers.
It understands almost all headers that SOX can decode.
The return tuple contains the following items, in this order:
- file type (as SOX understands it)
- sampling rate (0 if unknown or hard to decode)
- number of channels (0 if unknown or hard to decode)
- number of frames in the file (-1 if unknown or hard to decode)
- number of bits/sample, or 'U' for U-LAW, or 'A' for A-LAW
If the file doesn't have a recognizable type, it returns None.
If the file can't be opened, OSError is raised.
To compute the total time, divide the number of frames by the
sampling rate (a frame contains a sample for each channel).
Function what() calls whathdr(). (It used to also use some
heuristics for raw data, but this doesn't work very well.)
Finally, the function test() is a simple main program that calls
what() for all files mentioned on the argument list. For directory
arguments it calls what() for all files in that directory. Default
argument is "." (testing all files in the current directory). The
option -r tells it to recurse down directories found inside
explicitly given directories.
"""
# The file structure is top-down except that the test program and its
# subroutine come last.
__all__ = ['what', 'whathdr']
def what(filename):
"""Guess the type of a sound file."""
res = whathdr(filename)
return res
def whathdr(filename):
"""Recognize sound headers."""
with open(filename, 'rb') as f:
h = f.read(512)
for tf in tests:
res = tf(h, f)
if res:
return res
return None
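# A minimal sketch; the file name and the tuple values are placeholders for
# whatever the header tests below actually report:
#
#     >>> import sndhdr
#     >>> sndhdr.what('example.wav')
#     ('wav', 44100, 2, 52000, 16)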
#-----------------------------------#
# Subroutines per sound header type #
#-----------------------------------#
tests = []
def test_aifc(h, f):
import aifc
if not h.startswith(b'FORM'):
return None
if h[8:12] == b'AIFC':
fmt = 'aifc'
elif h[8:12] == b'AIFF':
fmt = 'aiff'
else:
return None
f.seek(0)
try:
a = aifc.open(f, 'r')
except (EOFError, aifc.Error):
return None
return (fmt, a.getframerate(), a.getnchannels(),
a.getnframes(), 8 * a.getsampwidth())
tests.append(test_aifc)
def test_au(h, f):
if h.startswith(b'.snd'):
func = get_long_be
elif h[:4] in (b'\0ds.', b'dns.'):
func = get_long_le
else:
return None
filetype = 'au'
hdr_size = func(h[4:8])
data_size = func(h[8:12])
encoding = func(h[12:16])
rate = func(h[16:20])
nchannels = func(h[20:24])
sample_size = 1 # default
if encoding == 1:
sample_bits = 'U'
elif encoding == 2:
sample_bits = 8
elif encoding == 3:
sample_bits = 16
sample_size = 2
else:
sample_bits = '?'
frame_size = sample_size * nchannels
if frame_size:
nframe = data_size / frame_size
else:
nframe = -1
return filetype, rate, nchannels, nframe, sample_bits
tests.append(test_au)
def test_hcom(h, f):
if h[65:69] != b'FSSD' or h[128:132] != b'HCOM':
return None
divisor = get_long_be(h[144:148])
if divisor:
rate = 22050 / divisor
else:
rate = 0
return 'hcom', rate, 1, -1, 8
tests.append(test_hcom)
def test_voc(h, f):
if not h.startswith(b'Creative Voice File\032'):
return None
sbseek = get_short_le(h[20:22])
rate = 0
if 0 <= sbseek < 500 and h[sbseek] == 1:
ratecode = 256 - h[sbseek+4]
if ratecode:
rate = int(1000000.0 / ratecode)
return 'voc', rate, 1, -1, 8
tests.append(test_voc)
def test_wav(h, f):
import wave
# 'RIFF' <len> 'WAVE' 'fmt ' <len>
if not h.startswith(b'RIFF') or h[8:12] != b'WAVE' or h[12:16] != b'fmt ':
return None
f.seek(0)
try:
w = wave.openfp(f, 'r')
except (EOFError, wave.Error):
return None
return ('wav', w.getframerate(), w.getnchannels(),
w.getnframes(), 8*w.getsampwidth())
tests.append(test_wav)
def test_8svx(h, f):
if not h.startswith(b'FORM') or h[8:12] != b'8SVX':
return None
# Should decode it to get #channels -- assume always 1
return '8svx', 0, 1, 0, 8
tests.append(test_8svx)
def test_sndt(h, f):
if h.startswith(b'SOUND'):
nsamples = get_long_le(h[8:12])
rate = get_short_le(h[20:22])
return 'sndt', rate, 1, nsamples, 8
tests.append(test_sndt)
def test_sndr(h, f):
if h.startswith(b'\0\0'):
rate = get_short_le(h[2:4])
if 4000 <= rate <= 25000:
return 'sndr', rate, 1, -1, 8
tests.append(test_sndr)
#-------------------------------------------#
# Subroutines to extract numbers from bytes #
#-------------------------------------------#
def get_long_be(b):
return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]
def get_long_le(b):
return (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | b[0]
def get_short_be(b):
return (b[0] << 8) | b[1]
def get_short_le(b):
return (b[1] << 8) | b[0]
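# Quick sanity checks for the byte helpers above:
#
#     >>> get_long_be(b'\x00\x00\x01\x00')
#     256
#     >>> get_short_le(b'\x34\x12')    # little-endian: 0x1234
#     4660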
#--------------------#
# Small test program #
#--------------------#
def test():
import sys
recursive = 0
if sys.argv[1:] and sys.argv[1] == '-r':
del sys.argv[1:2]
recursive = 1
try:
if sys.argv[1:]:
testall(sys.argv[1:], recursive, 1)
else:
testall(['.'], recursive, 1)
except KeyboardInterrupt:
sys.stderr.write('\n[Interrupted]\n')
sys.exit(1)
def testall(filenames, recursive, toplevel):
import sys
import os
for filename in filenames:
if os.path.isdir(filename):
print(filename + '/:', end=' ')
if recursive or toplevel:
print('recursing down:')
import glob
names = glob.glob(os.path.join(filename, '*'))
testall(names, recursive, 0)
else:
print('*** directory (use -r) ***')
else:
print(filename + ':', end=' ')
sys.stdout.flush()
try:
print(what(filename))
except OSError:
print('*** not found ***')
if __name__ == '__main__':
test()
# Wrapper module for _socket, providing some additional facilities
# implemented in Python.
"""\
This module provides socket operations and some related functions.
On Unix, it supports IP (Internet Protocol) and Unix domain sockets.
On other systems, it only supports IP. Functions specific for a
socket are available as methods of the socket object.
Functions:
socket() -- create a new socket object
socketpair() -- create a pair of new socket objects [*]
fromfd() -- create a socket object from an open file descriptor [*]
fromshare() -- create a socket object from data received from socket.share() [*]
gethostname() -- return the current hostname
gethostbyname() -- map a hostname to its IP number
gethostbyaddr() -- map an IP number or hostname to DNS info
getservbyname() -- map a service name and a protocol name to a port number
getprotobyname() -- map a protocol name (e.g. 'tcp') to a number
ntohs(), ntohl() -- convert 16, 32 bit int from network to host byte order
htons(), htonl() -- convert 16, 32 bit int from host to network byte order
inet_aton() -- convert IP addr string (123.45.67.89) to 32-bit packed format
inet_ntoa() -- convert 32-bit packed format IP to string (123.45.67.89)
socket.getdefaulttimeout() -- get the default timeout value
socket.setdefaulttimeout() -- set the default timeout value
create_connection() -- connects to an address, with an optional timeout and
optional source address.
[*] not available on all platforms!
Special objects:
SocketType -- type object for socket objects
error -- exception raised for I/O errors
has_ipv6 -- boolean value indicating if IPv6 is supported
IntEnum constants:
AF_INET, AF_UNIX -- socket domains (first argument to socket() call)
SOCK_STREAM, SOCK_DGRAM, SOCK_RAW -- socket types (second argument)
Integer constants:
Many other constants may be defined; these may be used in calls to
the setsockopt() and getsockopt() methods.
"""
import _socket
from _socket import *
import os, sys, io
from enum import IntEnum
try:
import errno
except ImportError:
errno = None
EBADF = getattr(errno, 'EBADF', 9)
EAGAIN = getattr(errno, 'EAGAIN', 11)
EWOULDBLOCK = getattr(errno, 'EWOULDBLOCK', 11)
__all__ = ["fromfd", "getfqdn", "create_connection",
"AddressFamily", "SocketKind"]
__all__.extend(os._get_exports_list(_socket))
# Set up the socket.AF_* and socket.SOCK_* constants as members of IntEnums for
# nicer string representations.
# Note that _socket only knows about the integer values. The public interface
# in this module understands the enums and translates them back from integers
# where needed (e.g. .family property of a socket object).
IntEnum._convert(
'AddressFamily',
__name__,
lambda C: C.isupper() and C.startswith('AF_'))
IntEnum._convert(
'SocketKind',
__name__,
lambda C: C.isupper() and C.startswith('SOCK_'))
_LOCALHOST = '127.0.0.1'
_LOCALHOST_V6 = '::1'
def _intenum_converter(value, enum_klass):
"""Convert a numeric family value to an IntEnum member.
If it's not a known member, return the numeric value itself.
"""
try:
return enum_klass(value)
except ValueError:
return value
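# A minimal sketch; the exact enum member shown depends on the platform's
# constants:
#
#     >>> _intenum_converter(2, AddressFamily)
#     <AddressFamily.AF_INET: 2>
#     >>> _intenum_converter(99999, AddressFamily)   # unknown values pass through
#     99999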
_realsocket = socket
# WSA error codes
if sys.platform.lower().startswith("win"):
errorTab = {}
errorTab[10004] = "The operation was interrupted."
errorTab[10009] = "A bad file handle was passed."
errorTab[10013] = "Permission denied."
errorTab[10014] = "A fault occurred on the network??" # WSAEFAULT
errorTab[10022] = "An invalid operation was attempted."
errorTab[10035] = "The socket operation would block"
errorTab[10036] = "A blocking operation is already in progress."
errorTab[10048] = "The network address is in use."
errorTab[10054] = "The connection has been reset."
errorTab[10058] = "The network has been shut down."
errorTab[10060] = "The operation timed out."
errorTab[10061] = "Connection refused."
errorTab[10063] = "The name is too long."
errorTab[10064] = "The host is down."
errorTab[10065] = "The host is unreachable."
__all__.append("errorTab")
class socket(_socket.socket):
"""A subclass of _socket.socket adding the makefile() method."""
__slots__ = ["__weakref__", "_io_refs", "_closed"]
def __init__(self, family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None):
# For user code address family and type values are IntEnum members, but
# for the underlying _socket.socket they're just integers. The
# constructor of _socket.socket converts the given argument to an
# integer automatically.
_socket.socket.__init__(self, family, type, proto, fileno)
self._io_refs = 0
self._closed = False
def __enter__(self):
return self
def __exit__(self, *args):
if not self._closed:
self.close()
def __repr__(self):
"""Wrap __repr__() to reveal the real class name and socket
address(es).
"""
closed = getattr(self, '_closed', False)
s = "<%s.%s%s fd=%i, family=%s, type=%s, proto=%i" \
% (self.__class__.__module__,
self.__class__.__name__,
" [closed]" if closed else "",
self.fileno(),
self.family,
self.type,
self.proto)
if not closed:
try:
laddr = self.getsockname()
if laddr:
s += ", laddr=%s" % str(laddr)
except error:
pass
try:
raddr = self.getpeername()
if raddr:
s += ", raddr=%s" % str(raddr)
except error:
pass
s += '>'
return s
def __getstate__(self):
raise TypeError("Cannot serialize socket object")
def dup(self):
"""dup() -> socket object
Duplicate the socket. Return a new socket object connected to the same
system resource. The new socket is non-inheritable.
"""
fd = dup(self.fileno())
sock = self.__class__(self.family, self.type, self.proto, fileno=fd)
sock.settimeout(self.gettimeout())
return sock
def accept(self):
"""accept() -> (socket object, address info)
Wait for an incoming connection. Return a new socket
representing the connection, and the address of the client.
For IP sockets, the address info is a pair (hostaddr, port).
"""
fd, addr = self._accept()
# If our type has the SOCK_NONBLOCK flag, we shouldn't pass it onto the
# new socket. We do not currently allow passing SOCK_NONBLOCK to
# accept4, so the returned socket is always blocking.
type = self.type & ~globals().get("SOCK_NONBLOCK", 0)
sock = socket(self.family, type, self.proto, fileno=fd)
# Issue #7995: if no default timeout is set and the listening
# socket had a (non-zero) timeout, force the new socket in blocking
# mode to override platform-specific socket flags inheritance.
if getdefaulttimeout() is None and self.gettimeout():
sock.setblocking(True)
return sock, addr
def makefile(self, mode="r", buffering=None, *,
encoding=None, errors=None, newline=None):
"""makefile(...) -> an I/O stream connected to the socket
The arguments are as for io.open() after the filename,
except the only mode characters supported are 'r', 'w' and 'b'.
The semantics are similar too. (XXX refactor to share code?)
"""
if not set(mode) <= {"r", "w", "b"}:
raise ValueError("invalid mode %r (only r, w, b allowed)" % (mode,))
writing = "w" in mode
reading = "r" in mode or not writing
assert reading or writing
binary = "b" in mode
rawmode = ""
if reading:
rawmode += "r"
if writing:
rawmode += "w"
raw = SocketIO(self, rawmode)
self._io_refs += 1
if buffering is None:
buffering = -1
if buffering < 0:
buffering = io.DEFAULT_BUFFER_SIZE
if buffering == 0:
if not binary:
raise ValueError("unbuffered streams must be binary")
return raw
if reading and writing:
buffer = io.BufferedRWPair(raw, raw, buffering)
elif reading:
buffer = io.BufferedReader(raw, buffering)
else:
assert writing
buffer = io.BufferedWriter(raw, buffering)
if binary:
return buffer
text = io.TextIOWrapper(buffer, encoding, errors, newline)
text.mode = mode
return text
def _decref_socketios(self):
if self._io_refs > 0:
self._io_refs -= 1
if self._closed:
self.close()
def _real_close(self, _ss=_socket.socket):
# This function should not reference any globals. See issue #808164.
_ss.close(self)
def close(self):
# This function should not reference any globals. See issue #808164.
self._closed = True
if self._io_refs <= 0:
self._real_close()
def detach(self):
"""detach() -> file descriptor
Close the socket object without closing the underlying file descriptor.
The object cannot be used after this call, but the file descriptor
can be reused for other purposes. The file descriptor is returned.
"""
self._closed = True
return super().detach()
@property
def family(self):
"""Read-only access to the address family for this socket.
"""
return _intenum_converter(super().family, AddressFamily)
@property
def type(self):
"""Read-only access to the socket type.
"""
return _intenum_converter(super().type, SocketKind)
if os.name == 'nt':
def get_inheritable(self):
return os.get_handle_inheritable(self.fileno())
def set_inheritable(self, inheritable):
os.set_handle_inheritable(self.fileno(), inheritable)
else:
def get_inheritable(self):
return os.get_inheritable(self.fileno())
def set_inheritable(self, inheritable):
os.set_inheritable(self.fileno(), inheritable)
get_inheritable.__doc__ = "Get the inheritable flag of the socket"
set_inheritable.__doc__ = "Set the inheritable flag of the socket"
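# Added illustrative sketch (not part of the original module): the class
# above is a context manager, and makefile() layers buffered I/O over a
# SocketIO object while tracking open file wrappers in _io_refs, so that
# closing the socket is deferred until the last wrapper is closed.
def _demo_socket_makefile():
    with socket(AF_INET, SOCK_STREAM) as s:
        f = s.makefile("rb")
        assert s._io_refs == 1      # one live file wrapper
        f.close()                   # drops the reference again
        assert s._io_refs == 0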
def fromfd(fd, family, type, proto=0):
""" fromfd(fd, family, type[, proto]) -> socket object
Create a socket object from a duplicate of the given file
descriptor. The remaining arguments are the same as for socket().
"""
nfd = dup(fd)
return socket(family, type, proto, nfd)
if hasattr(_socket.socket, "share"):
def fromshare(info):
""" fromshare(info) -> socket object
Create a socket object from the bytes object returned by
socket.share(pid).
"""
return socket(0, 0, 0, info)
__all__.append("fromshare")
if hasattr(_socket, "socketpair"):
def socketpair(family=None, type=SOCK_STREAM, proto=0):
"""socketpair([family[, type[, proto]]]) -> (socket object, socket object)
Create a pair of socket objects from the sockets returned by the platform
socketpair() function.
The arguments are the same as for socket() except the default family is
AF_UNIX if defined on the platform; otherwise, the default is AF_INET.
"""
if family is None:
try:
family = AF_UNIX
except NameError:
family = AF_INET
a, b = _socket.socketpair(family, type, proto)
a = socket(family, type, proto, a.detach())
b = socket(family, type, proto, b.detach())
return a, b
else:
# Origin: https://gist.github.com/4325783, by Geert Jansen. Public domain.
def socketpair(family=AF_INET, type=SOCK_STREAM, proto=0):
if family == AF_INET:
host = _LOCALHOST
elif family == AF_INET6:
host = _LOCALHOST_V6
else:
raise ValueError("Only AF_INET and AF_INET6 socket address families "
"are supported")
if type != SOCK_STREAM:
raise ValueError("Only SOCK_STREAM socket type is supported")
if proto != 0:
raise ValueError("Only protocol zero is supported")
# We create a connected TCP socket. Note the trick with
# setblocking(False) that prevents us from having to create a thread.
lsock = socket(family, type, proto)
try:
lsock.bind((host, 0))
lsock.listen()
# On IPv6, ignore flow_info and scope_id
addr, port = lsock.getsockname()[:2]
csock = socket(family, type, proto)
try:
csock.setblocking(False)
try:
csock.connect((addr, port))
except (BlockingIOError, InterruptedError):
pass
csock.setblocking(True)
ssock, _ = lsock.accept()
except:
csock.close()
raise
finally:
lsock.close()
return (ssock, csock)
__all__.append("socketpair")
socketpair.__doc__ = """socketpair([family[, type[, proto]]]) -> (socket object, socket object)
Create a pair of socket objects from the sockets returned by the platform
socketpair() function.
The arguments are the same as for socket() except the default family is AF_UNIX
if defined on the platform; otherwise, the default is AF_INET.
"""
_blocking_errnos = { EAGAIN, EWOULDBLOCK }
class SocketIO(io.RawIOBase):
"""Raw I/O implementation for stream sockets.
This class supports the makefile() method on sockets. It provides
the raw I/O interface on top of a socket object.
"""
# One might wonder why not let FileIO do the job instead. There are two
# main reasons why FileIO is not adapted:
# - it wouldn't work under Windows (where you can't use read() and
# write() on a socket handle)
# - it wouldn't work with socket timeouts (FileIO would ignore the
# timeout and consider the socket non-blocking)
# XXX More docs
def __init__(self, sock, mode):
if mode not in ("r", "w", "rw", "rb", "wb", "rwb"):
raise ValueError("invalid mode: %r" % mode)
io.RawIOBase.__init__(self)
self._sock = sock
if "b" not in mode:
mode += "b"
self._mode = mode
self._reading = "r" in mode
self._writing = "w" in mode
self._timeout_occurred = False
def readinto(self, b):
"""Read up to len(b) bytes into the writable buffer *b* and return
the number of bytes read. If the socket is non-blocking and no bytes
are available, None is returned.
If *b* is non-empty, a 0 return value indicates that the connection
was shut down at the other end.
"""
self._checkClosed()
self._checkReadable()
if self._timeout_occurred:
raise OSError("cannot read from timed out object")
while True:
try:
return self._sock.recv_into(b)
except timeout:
self._timeout_occurred = True
raise
except InterruptedError:
continue
except error as e:
if e.args[0] in _blocking_errnos:
return None
raise
def write(self, b):
"""Write the given bytes or bytearray object *b* to the socket
and return the number of bytes written. This can be less than
len(b) if not all data could be written. If the socket is
non-blocking and no bytes could be written None is returned.
"""
self._checkClosed()
self._checkWritable()
try:
return self._sock.send(b)
except error as e:
# XXX what about EINTR?
if e.args[0] in _blocking_errnos:
return None
raise
def readable(self):
"""True if the SocketIO is open for reading.
"""
if self.closed:
raise ValueError("I/O operation on closed socket.")
return self._reading
def writable(self):
"""True if the SocketIO is open for writing.
"""
if self.closed:
raise ValueError("I/O operation on closed socket.")
return self._writing
def seekable(self):
"""True if the SocketIO is open for seeking.
"""
if self.closed:
raise ValueError("I/O operation on closed socket.")
return super().seekable()
def fileno(self):
"""Return the file descriptor of the underlying socket.
"""
self._checkClosed()
return self._sock.fileno()
@property
def name(self):
if not self.closed:
return self.fileno()
else:
return -1
@property
def mode(self):
return self._mode
def close(self):
"""Close the SocketIO object. This doesn't close the underlying
socket, except if all references to it have disappeared.
"""
if self.closed:
return
io.RawIOBase.close(self)
self._sock._decref_socketios()
self._sock = None
def getfqdn(name=''):
"""Get fully qualified domain name from name.
An empty argument is interpreted as meaning the local host.
First the hostname returned by gethostbyaddr() is checked, then
possibly existing aliases. In case no FQDN is available, hostname
from gethostname() is returned.
"""
name = name.strip()
if not name or name == '0.0.0.0':
name = gethostname()
try:
hostname, aliases, ipaddrs = gethostbyaddr(name)
except error:
pass
else:
aliases.insert(0, hostname)
for name in aliases:
if '.' in name:
break
else:
name = hostname
return name
_GLOBAL_DEFAULT_TIMEOUT = object()
def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
source_address=None):
"""Connect to *address* and return the socket object.
Convenience function. Connect to *address* (a 2-tuple ``(host,
port)``) and return the socket object. Passing the optional
*timeout* parameter will set the timeout on the socket instance
before attempting to connect. If no *timeout* is supplied, the
global default timeout setting returned by :func:`getdefaulttimeout`
is used. If *source_address* is set it must be a tuple of (host, port)
for the socket to bind as a source address before making the connection.
A host of '' or port 0 tells the OS to use the default.
"""
host, port = address
err = None
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
af, socktype, proto, canonname, sa = res
sock = None
try:
sock = socket(af, socktype, proto)
if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(sa)
return sock
except error as _:
err = _
if sock is not None:
sock.close()
if err is not None:
raise err
else:
raise error("getaddrinfo returns an empty list")
def getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
"""Resolve host and port into list of address info entries.
Translate the host/port argument into a sequence of 5-tuples that contain
all the necessary arguments for creating a socket connected to that service.
host is a domain name, a string representation of an IPv4/v6 address or
None. port is a string service name such as 'http', a numeric port number or
None. By passing None as the value of host and port, you can pass NULL to
the underlying C API.
The family, type and proto arguments can be optionally specified in order to
narrow the list of addresses returned. Passing zero as a value for each of
these arguments selects the full range of results.
"""
# We override this function since we want to translate the numeric family
# and socket type values to enum constants.
addrlist = []
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
af, socktype, proto, canonname, sa = res
addrlist.append((_intenum_converter(af, AddressFamily),
_intenum_converter(socktype, SocketKind),
proto, canonname, sa))
return addrlist
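# Added illustrative sketch (not part of the original module): the
# override above upgrades the numeric family/type fields of each result
# tuple to enum members whenever the value is a known constant.
def _demo_getaddrinfo():
    for af, kind, proto, cname, sa in getaddrinfo("localhost", 80, type=SOCK_STREAM):
        assert kind is SocketKind.SOCK_STREAM
        assert af in (AddressFamily.AF_INET, AddressFamily.AF_INET6)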
"""Generic socket server classes.
This module tries to capture the various aspects of defining a server:
For socket-based servers:
- address family:
- AF_INET{,6}: IP (Internet Protocol) sockets (default)
- AF_UNIX: Unix domain sockets
- others, e.g. AF_DECNET, are conceivable (see <socket.h>)
- socket type:
- SOCK_STREAM (reliable stream, e.g. TCP)
- SOCK_DGRAM (datagrams, e.g. UDP)
For request-based servers (including socket-based):
- client address verification before further looking at the request
(This is actually a hook for any processing that needs to look
at the request before anything else, e.g. logging)
- how to handle multiple requests:
- synchronous (one request is handled at a time)
- forking (each request is handled by a new process)
- threading (each request is handled by a new thread)
The classes in this module favor the server type that is simplest to
write: a synchronous TCP/IP server. This is bad class design, but
saves some typing. (There's also the issue that a deep class hierarchy
slows down method lookups.)
There are five classes in an inheritance diagram, four of which represent
synchronous servers of four types:
+------------+
| BaseServer |
+------------+
|
v
+-----------+ +------------------+
| TCPServer |------->| UnixStreamServer |
+-----------+ +------------------+
|
v
+-----------+ +--------------------+
| UDPServer |------->| UnixDatagramServer |
+-----------+ +--------------------+
Note that UnixDatagramServer derives from UDPServer, not from
UnixStreamServer -- the only difference between an IP and a Unix
stream server is the address family, which is simply repeated in both
unix server classes.
Forking and threading versions of each type of server can be created
using the ForkingMixIn and ThreadingMixIn mix-in classes. For
instance, a threading UDP server class is created as follows:
class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
The Mix-in class must come first, since it overrides a method defined
in UDPServer! Setting the various member variables also changes
the behavior of the underlying server mechanism.
To implement a service, you must derive a class from
BaseRequestHandler and redefine its handle() method. You can then run
various versions of the service by combining one of the server classes
with your request handler class.
The request handler class must be different for datagram or stream
services. This can be hidden by using the request handler
subclasses StreamRequestHandler or DatagramRequestHandler.
Of course, you still have to use your head!
For instance, it makes no sense to use a forking server if the service
contains state in memory that can be modified by requests (since the
modifications in the child process would never reach the initial state
kept in the parent process and passed to each child). In this case,
you can use a threading server, but you will probably have to use
locks to prevent two requests that come in nearly simultaneously from applying
conflicting changes to the server state.
On the other hand, if you are building e.g. an HTTP server, where all
data is stored externally (e.g. in the file system), a synchronous
class will essentially render the service "deaf" while one request is
being handled -- which may be for a very long time if a client is slow
to read all the data it has requested. Here a threading or forking
server is appropriate.
In some cases, it may be appropriate to process part of a request
synchronously, but to finish processing in a forked child depending on
the request data. This can be implemented by using a synchronous
server and doing an explicit fork in the request handler class
handle() method.
Another approach to handling multiple simultaneous requests in an
environment that supports neither threads nor fork (or where these are
too expensive or inappropriate for the service) is to maintain an
explicit table of partially finished requests and to use select() to
decide which request to work on next (or whether to handle a new
incoming request). This is particularly important for stream services
where each client can potentially be connected for a long time (if
threads or subprocesses cannot be used).
Future work:
- Standard classes for Sun RPC (which uses either UDP or TCP)
- Standard mix-in classes to implement various authentication
and encryption schemes
- Standard framework for select-based multiplexing
XXX Open problems:
- What to do with out-of-band data?
BaseServer:
- split generic "request" functionality out into BaseServer class.
Copyright (C) 2000 Luke Kenneth Casson Leighton <[email protected]>
example: read entries from a SQL database (requires overriding
get_request() to return a table entry from the database).
entry is processed by a RequestHandlerClass.
"""
# Author of the BaseServer patch: Luke Kenneth Casson Leighton
# XXX Warning!
# There is a test suite for this module, but it cannot be run by the
# standard regression test.
# To run it manually, run Lib/test/test_socketserver.py.
__version__ = "0.4"
import socket
import select
import os
import errno
try:
import threading
except ImportError:
import dummy_threading as threading
__all__ = ["BaseServer", "TCPServer", "UDPServer", "ForkingUDPServer",
"ForkingTCPServer", "ThreadingUDPServer", "ThreadingTCPServer",
"BaseRequestHandler", "StreamRequestHandler",
"DatagramRequestHandler", "ThreadingMixIn", "ForkingMixIn"]
if hasattr(socket, "AF_UNIX"):
__all__.extend(["UnixStreamServer","UnixDatagramServer",
"ThreadingUnixStreamServer",
"ThreadingUnixDatagramServer"])
def _eintr_retry(func, *args):
"""restart a system call interrupted by EINTR"""
while True:
try:
return func(*args)
except OSError as e:
if e.errno != errno.EINTR:
raise
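# Added illustrative sketch (not part of the original module): the helper
# above keeps re-invoking the call while it fails with EINTR and returns
# the first successful result.
def _demo_eintr_retry():
    calls = []
    def flaky():
        calls.append(None)
        if len(calls) < 3:
            raise OSError(errno.EINTR, "interrupted system call")
        return "ok"
    assert _eintr_retry(flaky) == "ok"
    assert len(calls) == 3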
class BaseServer:
"""Base class for server classes.
Methods for the caller:
- __init__(server_address, RequestHandlerClass)
- serve_forever(poll_interval=0.5)
- shutdown()
- handle_request() # if you do not use serve_forever()
- fileno() -> int # for select()
Methods that may be overridden:
- server_bind()
- server_activate()
- get_request() -> request, client_address
- handle_timeout()
- verify_request(request, client_address)
- server_close()
- process_request(request, client_address)
- shutdown_request(request)
- close_request(request)
- service_actions()
- handle_error()
Methods for derived classes:
- finish_request(request, client_address)
Class variables that may be overridden by derived classes or
instances:
- timeout
- address_family
- socket_type
- allow_reuse_address
Instance variables:
- RequestHandlerClass
- socket
"""
timeout = None
def __init__(self, server_address, RequestHandlerClass):
"""Constructor. May be extended, do not override."""
self.server_address = server_address
self.RequestHandlerClass = RequestHandlerClass
self.__is_shut_down = threading.Event()
self.__shutdown_request = False
def server_activate(self):
"""Called by constructor to activate the server.
May be overridden.
"""
pass
def serve_forever(self, poll_interval=0.5):
"""Handle one request at a time until shutdown.
Polls for shutdown every poll_interval seconds. Ignores
self.timeout. If you need to do periodic tasks, do them in
another thread.
"""
self.__is_shut_down.clear()
try:
while not self.__shutdown_request:
# XXX: Consider using another file descriptor or
# connecting to the socket to wake this up instead of
# polling. Polling reduces our responsiveness to a
# shutdown request and wastes cpu at all other times.
r, w, e = _eintr_retry(select.select, [self], [], [],
poll_interval)
if self in r:
self._handle_request_noblock()
self.service_actions()
finally:
self.__shutdown_request = False
self.__is_shut_down.set()
def shutdown(self):
"""Stops the serve_forever loop.
Blocks until the loop has finished. This must be called while
serve_forever() is running in another thread, or it will
deadlock.
"""
self.__shutdown_request = True
self.__is_shut_down.wait()
def service_actions(self):
"""Called by the serve_forever() loop.
May be overridden by a subclass / Mixin to implement any code that
needs to be run during the loop.
"""
pass
# The distinction between handling, getting, processing and
# finishing a request is fairly arbitrary. Remember:
#
# - handle_request() is the top-level call. It calls
# select, get_request(), verify_request() and process_request()
# - get_request() is different for stream or datagram sockets
# - process_request() is the place that may fork a new process
# or create a new thread to finish the request
# - finish_request() instantiates the request handler class;
# this constructor will handle the request all by itself
def handle_request(self):
"""Handle one request, possibly blocking.
Respects self.timeout.
"""
# Support people who used socket.settimeout() to escape
# handle_request before self.timeout was available.
timeout = self.socket.gettimeout()
if timeout is None:
timeout = self.timeout
elif self.timeout is not None:
timeout = min(timeout, self.timeout)
fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
if not fd_sets[0]:
self.handle_timeout()
return
self._handle_request_noblock()
def _handle_request_noblock(self):
"""Handle one request, without blocking.
I assume that select.select has returned that the socket is
readable before this function was called, so there should be
no risk of blocking in get_request().
"""
try:
request, client_address = self.get_request()
except OSError:
return
if self.verify_request(request, client_address):
try:
self.process_request(request, client_address)
except:
self.handle_error(request, client_address)
self.shutdown_request(request)
def handle_timeout(self):
"""Called if no new request arrives within self.timeout.
Overridden by ForkingMixIn.
"""
pass
def verify_request(self, request, client_address):
"""Verify the request. May be overridden.
Return True if we should proceed with this request.
"""
return True
def process_request(self, request, client_address):
"""Call finish_request.
Overridden by ForkingMixIn and ThreadingMixIn.
"""
self.finish_request(request, client_address)
self.shutdown_request(request)
def server_close(self):
"""Called to clean-up the server.
May be overridden.
"""
pass
def finish_request(self, request, client_address):
"""Finish one request by instantiating RequestHandlerClass."""
self.RequestHandlerClass(request, client_address, self)
def shutdown_request(self, request):
"""Called to shutdown and close an individual request."""
self.close_request(request)
def close_request(self, request):
"""Called to clean up an individual request."""
pass
def handle_error(self, request, client_address):
"""Handle an error gracefully. May be overridden.
The default is to print a traceback and continue.
"""
print('-'*40)
print('Exception happened during processing of request from', end=' ')
print(client_address)
import traceback
traceback.print_exc() # XXX But this goes to stderr!
print('-'*40)
class TCPServer(BaseServer):
"""Base class for various socket-based server classes.
Defaults to synchronous IP stream (i.e., TCP).
Methods for the caller:
- __init__(server_address, RequestHandlerClass, bind_and_activate=True)
- serve_forever(poll_interval=0.5)
- shutdown()
- handle_request() # if you don't use serve_forever()
- fileno() -> int # for select()
Methods that may be overridden:
- server_bind()
- server_activate()
- get_request() -> request, client_address
- handle_timeout()
- verify_request(request, client_address)
- process_request(request, client_address)
- shutdown_request(request)
- close_request(request)
- handle_error()
Methods for derived classes:
- finish_request(request, client_address)
Class variables that may be overridden by derived classes or
instances:
- timeout
- address_family
- socket_type
- request_queue_size (only for stream sockets)
- allow_reuse_address
Instance variables:
- server_address
- RequestHandlerClass
- socket
"""
address_family = socket.AF_INET
socket_type = socket.SOCK_STREAM
request_queue_size = 5
allow_reuse_address = False
def __init__(self, server_address, RequestHandlerClass, bind_and_activate=True):
"""Constructor. May be extended, do not override."""
BaseServer.__init__(self, server_address, RequestHandlerClass)
self.socket = socket.socket(self.address_family,
self.socket_type)
if bind_and_activate:
try:
self.server_bind()
self.server_activate()
except:
self.server_close()
raise
def server_bind(self):
"""Called by constructor to bind the socket.
May be overridden.
"""
if self.allow_reuse_address:
self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
self.socket.bind(self.server_address)
self.server_address = self.socket.getsockname()
def server_activate(self):
"""Called by constructor to activate the server.
May be overridden.
"""
self.socket.listen(self.request_queue_size)
def server_close(self):
"""Called to clean-up the server.
May be overridden.
"""
self.socket.close()
def fileno(self):
"""Return socket file number.
Interface required by select().
"""
return self.socket.fileno()
def get_request(self):
"""Get the request and client address from the socket.
May be overridden.
"""
return self.socket.accept()
def shutdown_request(self, request):
"""Called to shutdown and close an individual request."""
try:
# Explicitly shut down. socket.close() merely releases
# the socket and waits for GC to perform the actual close.
request.shutdown(socket.SHUT_WR)
except OSError:
pass  # some platforms may raise ENOTCONN here
self.close_request(request)
def close_request(self, request):
"""Called to clean up an individual request."""
request.close()
class UDPServer(TCPServer):
"""UDP server class."""
allow_reuse_address = False
socket_type = socket.SOCK_DGRAM
max_packet_size = 8192
def get_request(self):
data, client_addr = self.socket.recvfrom(self.max_packet_size)
return (data, self.socket), client_addr
def server_activate(self):
# No need to call listen() for UDP.
pass
def shutdown_request(self, request):
# No need to shutdown anything.
self.close_request(request)
def close_request(self, request):
# No need to close anything.
pass
class ForkingMixIn:
"""Mix-in class to handle each request in a new process."""
timeout = 300
active_children = None
max_children = 40
def collect_children(self):
"""Internal routine to wait for children that have exited."""
if self.active_children is None:
return
# If we're above the max number of children, wait and reap them until
# we go back below threshold. Note that we use waitpid(-1) below to be
# able to collect children in O(<defunct children>) syscalls instead
# of O(<children>): the downside is that this might reap children
# which we didn't spawn, which is why we only resort to this when we're
# above max_children.
while len(self.active_children) >= self.max_children:
try:
pid, _ = os.waitpid(-1, 0)
self.active_children.discard(pid)
except InterruptedError:
pass
except ChildProcessError:
# we don't have any children, we're done
self.active_children.clear()
except OSError:
break
# Now reap all defunct children.
for pid in self.active_children.copy():
try:
pid, _ = os.waitpid(pid, os.WNOHANG)
# if the child hasn't exited yet, pid will be 0 and ignored by
# discard() below
self.active_children.discard(pid)
except ChildProcessError:
# someone else reaped it
self.active_children.discard(pid)
except OSError:
pass
def handle_timeout(self):
"""Wait for zombies after self.timeout seconds of inactivity.
May be extended, do not override.
"""
self.collect_children()
def service_actions(self):
"""Collect the zombie child processes regularly in the ForkingMixIn.
service_actions is called in the BaseServer's serve_forever loop.
"""
self.collect_children()
def process_request(self, request, client_address):
"""Fork a new subprocess to process the request."""
pid = os.fork()
if pid:
# Parent process
if self.active_children is None:
self.active_children = set()
self.active_children.add(pid)
self.close_request(request)
return
else:
# Child process.
# This must never return, hence os._exit()!
try:
self.finish_request(request, client_address)
self.shutdown_request(request)
os._exit(0)
except:
try:
self.handle_error(request, client_address)
self.shutdown_request(request)
finally:
os._exit(1)
class ThreadingMixIn:
"""Mix-in class to handle each request in a new thread."""
# Decides how threads will act upon termination of the
# main process
daemon_threads = False
def process_request_thread(self, request, client_address):
"""Same as in BaseServer but as a thread.
In addition, exception handling is done here.
"""
try:
self.finish_request(request, client_address)
self.shutdown_request(request)
except:
self.handle_error(request, client_address)
self.shutdown_request(request)
def process_request(self, request, client_address):
"""Start a new thread to process the request."""
t = threading.Thread(target = self.process_request_thread,
args = (request, client_address))
t.daemon = self.daemon_threads
t.start()
class ForkingUDPServer(ForkingMixIn, UDPServer): pass
class ForkingTCPServer(ForkingMixIn, TCPServer): pass
class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
class ThreadingTCPServer(ThreadingMixIn, TCPServer): pass
if hasattr(socket, 'AF_UNIX'):
class UnixStreamServer(TCPServer):
address_family = socket.AF_UNIX
class UnixDatagramServer(UDPServer):
address_family = socket.AF_UNIX
class ThreadingUnixStreamServer(ThreadingMixIn, UnixStreamServer): pass
class ThreadingUnixDatagramServer(ThreadingMixIn, UnixDatagramServer): pass
class BaseRequestHandler:
"""Base class for request handler classes.
This class is instantiated for each request to be handled. The
constructor sets the instance variables request, client_address
and server, and then calls the handle() method. To implement a
specific service, all you need to do is to derive a class which
defines a handle() method.
The handle() method can find the request as self.request, the
client address as self.client_address, and the server (in case it
needs access to per-server information) as self.server. Since a
separate instance is created for each request, the handle() method
can define arbitrary other instance variables.
"""
def __init__(self, request, client_address, server):
self.request = request
self.client_address = client_address
self.server = server
self.setup()
try:
self.handle()
finally:
self.finish()
def setup(self):
pass
def handle(self):
pass
def finish(self):
pass
# The following two classes make it possible to use the same service
# class for stream or datagram servers.
# Each class sets up these instance variables:
# - rfile: a file object from which the request is read
# - wfile: a file object to which the reply is written
# When the handle() method returns, wfile is flushed properly
class StreamRequestHandler(BaseRequestHandler):
"""Define self.rfile and self.wfile for stream sockets."""
# Default buffer sizes for rfile, wfile.
# We default rfile to buffered because otherwise it could be
# really slow for large data (a getc() call per byte); we make
# wfile unbuffered because (a) often after a write() we want to
# read and we need to flush the line; (b) big writes to unbuffered
# files are typically optimized by stdio even when big reads
# aren't.
rbufsize = -1
wbufsize = 0
# A timeout to apply to the request socket, if not None.
timeout = None
# Disable the Nagle algorithm for this socket, if True.
# Use only when wbufsize != 0, to avoid small packets.
disable_nagle_algorithm = False
def setup(self):
self.connection = self.request
if self.timeout is not None:
self.connection.settimeout(self.timeout)
if self.disable_nagle_algorithm:
self.connection.setsockopt(socket.IPPROTO_TCP,
socket.TCP_NODELAY, True)
self.rfile = self.connection.makefile('rb', self.rbufsize)
self.wfile = self.connection.makefile('wb', self.wbufsize)
def finish(self):
if not self.wfile.closed:
try:
self.wfile.flush()
except socket.error:
# A final socket error may have occurred here, such as
# the local error ECONNABORTED.
pass
self.wfile.close()
self.rfile.close()
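# Added end-to-end sketch (not part of the original module): a minimal
# threaded echo service assembled from ThreadingTCPServer and
# StreamRequestHandler, bound to an ephemeral loopback port.
class _EchoHandler(StreamRequestHandler):
    def handle(self):
        for line in self.rfile:     # rfile/wfile were set up in setup()
            self.wfile.write(line)

def _demo_echo_server():
    server = ThreadingTCPServer(("127.0.0.1", 0), _EchoHandler)
    t = threading.Thread(target=server.serve_forever)
    t.daemon = True
    t.start()
    with socket.create_connection(server.server_address) as conn:
        conn.sendall(b"hello\n")
        assert conn.makefile("rb").readline() == b"hello\n"
    server.shutdown()
    server.server_close()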
class DatagramRequestHandler(BaseRequestHandler):
# XXX Regrettably, I cannot get this working on Linux;
# s.recvfrom() doesn't return a meaningful client address.
"""Define self.rfile and self.wfile for datagram sockets."""
def setup(self):
from io import BytesIO
self.packet, self.socket = self.request
self.rfile = BytesIO(self.packet)
self.wfile = BytesIO()
def finish(self):
self.socket.sendto(self.wfile.getvalue(), self.client_address)
#
# Secret Labs' Regular Expression Engine
#
# convert template to internal format
#
# Copyright (c) 1997-2001 by Secret Labs AB. All rights reserved.
#
# See the sre.py file for information on usage and redistribution.
#
"""Internal support module for sre"""
import _sre
import sre_parse
from sre_constants import *
from _sre import MAXREPEAT
assert _sre.MAGIC == MAGIC, "SRE module mismatch"
if _sre.CODESIZE == 2:
MAXCODE = 65535
else:
MAXCODE = 0xFFFFFFFF
_LITERAL_CODES = set([LITERAL, NOT_LITERAL])
_REPEATING_CODES = set([REPEAT, MIN_REPEAT, MAX_REPEAT])
_SUCCESS_CODES = set([SUCCESS, FAILURE])
_ASSERT_CODES = set([ASSERT, ASSERT_NOT])
# Sets of lowercase characters which have the same uppercase.
_equivalences = (
# LATIN SMALL LETTER I, LATIN SMALL LETTER DOTLESS I
(0x69, 0x131), # iı
# LATIN SMALL LETTER S, LATIN SMALL LETTER LONG S
(0x73, 0x17f), # sſ
# MICRO SIGN, GREEK SMALL LETTER MU
(0xb5, 0x3bc), # µμ
# COMBINING GREEK YPOGEGRAMMENI, GREEK SMALL LETTER IOTA, GREEK PROSGEGRAMMENI
(0x345, 0x3b9, 0x1fbe), # \u0345ιι
# GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS, GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
(0x390, 0x1fd3), # ΐΐ
# GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS, GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
(0x3b0, 0x1fe3), # ΰΰ
# GREEK SMALL LETTER BETA, GREEK BETA SYMBOL
(0x3b2, 0x3d0), # βϐ
# GREEK SMALL LETTER EPSILON, GREEK LUNATE EPSILON SYMBOL
(0x3b5, 0x3f5), # εϵ
# GREEK SMALL LETTER THETA, GREEK THETA SYMBOL
(0x3b8, 0x3d1), # θϑ
# GREEK SMALL LETTER KAPPA, GREEK KAPPA SYMBOL
(0x3ba, 0x3f0), # κϰ
# GREEK SMALL LETTER PI, GREEK PI SYMBOL
(0x3c0, 0x3d6), # πϖ
# GREEK SMALL LETTER RHO, GREEK RHO SYMBOL
(0x3c1, 0x3f1), # ρϱ
# GREEK SMALL LETTER FINAL SIGMA, GREEK SMALL LETTER SIGMA
(0x3c2, 0x3c3), # ςσ
# GREEK SMALL LETTER PHI, GREEK PHI SYMBOL
(0x3c6, 0x3d5), # φϕ
# LATIN SMALL LETTER S WITH DOT ABOVE, LATIN SMALL LETTER LONG S WITH DOT ABOVE
(0x1e61, 0x1e9b), # ṡẛ
# LATIN SMALL LIGATURE LONG S T, LATIN SMALL LIGATURE ST
(0xfb05, 0xfb06), # ſtst
)
# Maps the lowercase code to lowercase codes which have the same uppercase.
_ignorecase_fixes = {i: tuple(j for j in t if i != j)
for t in _equivalences for i in t}
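# Added illustrative sketch (not part of the original module): the map
# above links every lowercase code point to the other lowercase code
# points that share its uppercase form, e.g. 's' and LONG S.
def _demo_ignorecase_fixes():
    assert 0x17f in _ignorecase_fixes[0x73]   # 's' <-> 'ſ'
    assert 0x73 in _ignorecase_fixes[0x17f]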
def _compile(code, pattern, flags):
# internal: compile a (sub)pattern
emit = code.append
_len = len
LITERAL_CODES = _LITERAL_CODES
REPEATING_CODES = _REPEATING_CODES
SUCCESS_CODES = _SUCCESS_CODES
ASSERT_CODES = _ASSERT_CODES
if (flags & SRE_FLAG_IGNORECASE and
not (flags & SRE_FLAG_LOCALE) and
flags & SRE_FLAG_UNICODE):
fixes = _ignorecase_fixes
else:
fixes = None
for op, av in pattern:
if op in LITERAL_CODES:
if flags & SRE_FLAG_IGNORECASE:
lo = _sre.getlower(av, flags)
if fixes and lo in fixes:
emit(OPCODES[IN_IGNORE])
skip = _len(code); emit(0)
if op is NOT_LITERAL:
emit(OPCODES[NEGATE])
for k in (lo,) + fixes[lo]:
emit(OPCODES[LITERAL])
emit(k)
emit(OPCODES[FAILURE])
code[skip] = _len(code) - skip
else:
emit(OPCODES[OP_IGNORE[op]])
emit(lo)
else:
emit(OPCODES[op])
emit(av)
elif op is IN:
if flags & SRE_FLAG_IGNORECASE:
emit(OPCODES[OP_IGNORE[op]])
def fixup(literal, flags=flags):
return _sre.getlower(literal, flags)
else:
emit(OPCODES[op])
fixup = None
skip = _len(code); emit(0)
_compile_charset(av, flags, code, fixup, fixes)
code[skip] = _len(code) - skip
elif op is ANY:
if flags & SRE_FLAG_DOTALL:
emit(OPCODES[ANY_ALL])
else:
emit(OPCODES[ANY])
elif op in REPEATING_CODES:
if flags & SRE_FLAG_TEMPLATE:
raise error("internal: unsupported template operator")
elif _simple(av) and op is not REPEAT:
if op is MAX_REPEAT:
emit(OPCODES[REPEAT_ONE])
else:
emit(OPCODES[MIN_REPEAT_ONE])
skip = _len(code); emit(0)
emit(av[0])
emit(av[1])
_compile(code, av[2], flags)
emit(OPCODES[SUCCESS])
code[skip] = _len(code) - skip
else:
emit(OPCODES[REPEAT])
skip = _len(code); emit(0)
emit(av[0])
emit(av[1])
_compile(code, av[2], flags)
code[skip] = _len(code) - skip
if op is MAX_REPEAT:
emit(OPCODES[MAX_UNTIL])
else:
emit(OPCODES[MIN_UNTIL])
elif op is SUBPATTERN:
if av[0]:
emit(OPCODES[MARK])
emit((av[0]-1)*2)
# _compile_info(code, av[1], flags)
_compile(code, av[1], flags)
if av[0]:
emit(OPCODES[MARK])
emit((av[0]-1)*2+1)
elif op in SUCCESS_CODES:
emit(OPCODES[op])
elif op in ASSERT_CODES:
emit(OPCODES[op])
skip = _len(code); emit(0)
if av[0] >= 0:
emit(0) # look ahead
else:
lo, hi = av[1].getwidth()
if lo != hi:
raise error("look-behind requires fixed-width pattern")
emit(lo) # look behind
_compile(code, av[1], flags)
emit(OPCODES[SUCCESS])
code[skip] = _len(code) - skip
elif op is CALL:
emit(OPCODES[op])
skip = _len(code); emit(0)
_compile(code, av, flags)
emit(OPCODES[SUCCESS])
code[skip] = _len(code) - skip
elif op is AT:
emit(OPCODES[op])
if flags & SRE_FLAG_MULTILINE:
av = AT_MULTILINE.get(av, av)
if flags & SRE_FLAG_LOCALE:
av = AT_LOCALE.get(av, av)
elif flags & SRE_FLAG_UNICODE:
av = AT_UNICODE.get(av, av)
emit(ATCODES[av])
elif op is BRANCH:
emit(OPCODES[op])
tail = []
tailappend = tail.append
for av in av[1]:
skip = _len(code); emit(0)
# _compile_info(code, av, flags)
_compile(code, av, flags)
emit(OPCODES[JUMP])
tailappend(_len(code)); emit(0)
code[skip] = _len(code) - skip
emit(0) # end of branch
for pos in tail:
    code[pos] = _len(code) - pos
elif op is CATEGORY:
emit(OPCODES[op])
if flags & SRE_FLAG_LOCALE:
av = CH_LOCALE[av]
elif flags & SRE_FLAG_UNICODE:
av = CH_UNICODE[av]
emit(CHCODES[av])
elif op is GROUPREF:
if flags & SRE_FLAG_IGNORECASE:
emit(OPCODES[OP_IGNORE[op]])
else:
emit(OPCODES[op])
emit(av-1)
elif op is GROUPREF_EXISTS:
emit(OPCODES[op])
emit(av[0]-1)
skipyes = _len(code); emit(0)
_compile(code, av[1], flags)
if av[2]:
emit(OPCODES[JUMP])
skipno = _len(code); emit(0)
code[skipyes] = _len(code) - skipyes + 1
_compile(code, av[2], flags)
code[skipno] = _len(code) - skipno
else:
code[skipyes] = _len(code) - skipyes + 1
else:
raise ValueError("unsupported operand type", op)
def _compile_charset(charset, flags, code, fixup=None, fixes=None):
# compile charset subprogram
emit = code.append
for op, av in _optimize_charset(charset, fixup, fixes,
flags & SRE_FLAG_UNICODE):
emit(OPCODES[op])
if op is NEGATE:
pass
elif op is LITERAL:
emit(av)
elif op is RANGE:
emit(av[0])
emit(av[1])
elif op is CHARSET:
code.extend(av)
elif op is BIGCHARSET:
code.extend(av)
elif op is CATEGORY:
if flags & SRE_FLAG_LOCALE:
emit(CHCODES[CH_LOCALE[av]])
elif flags & SRE_FLAG_UNICODE:
emit(CHCODES[CH_UNICODE[av]])
else:
emit(CHCODES[av])
else:
raise error("internal: unsupported set operator")
emit(OPCODES[FAILURE])
def _optimize_charset(charset, fixup, fixes, isunicode):
# internal: optimize character set
out = []
tail = []
charmap = bytearray(256)
for op, av in charset:
while True:
try:
if op is LITERAL:
if fixup:
i = fixup(av)
charmap[i] = 1
if fixes and i in fixes:
for k in fixes[i]:
charmap[k] = 1
else:
charmap[av] = 1
elif op is RANGE:
r = range(av[0], av[1]+1)
if fixup:
r = map(fixup, r)
if fixup and fixes:
for i in r:
charmap[i] = 1
if i in fixes:
for k in fixes[i]:
charmap[k] = 1
else:
for i in r:
charmap[i] = 1
elif op is NEGATE:
out.append((op, av))
else:
tail.append((op, av))
except IndexError:
if len(charmap) == 256:
# character set contains non-UCS1 character codes
charmap += b'\0' * 0xff00
continue
# character set contains non-BMP character codes
if fixup and isunicode and op is RANGE:
lo, hi = av
ranges = [av]
# There are only two ranges of cased astral characters:
# 10400-1044F (Deseret) and 118A0-118DF (Warang Citi).
_fixup_range(max(0x10000, lo), min(0x11fff, hi),
ranges, fixup)
for lo, hi in ranges:
if lo == hi:
tail.append((LITERAL, hi))
else:
tail.append((RANGE, (lo, hi)))
else:
tail.append((op, av))
break
# compress character map
runs = []
q = 0
while True:
p = charmap.find(1, q)
if p < 0:
break
if len(runs) >= 2:
runs = None
break
q = charmap.find(0, p)
if q < 0:
runs.append((p, len(charmap)))
break
runs.append((p, q))
if runs is not None:
# use literal/range
for p, q in runs:
if q - p == 1:
out.append((LITERAL, p))
else:
out.append((RANGE, (p, q - 1)))
out += tail
# if the case was changed or new representation is more compact
if fixup or len(out) < len(charset):
return out
# else original character set is good enough
return charset
# use bitmap
if len(charmap) == 256:
data = _mk_bitmap(charmap)
out.append((CHARSET, data))
out += tail
return out
# To represent a big charset, first a bitmap of all characters in the
# set is constructed. Then, this bitmap is sliced into chunks of 256
# characters, duplicate chunks are eliminated, and each chunk is
# given a number. In the compiled expression, the charset is
# represented by a 32-bit word sequence, consisting of one word for
# the number of different chunks, a sequence of 256 bytes (64 words)
# of chunk numbers indexed by their original chunk position, and a
# sequence of 256-bit chunks (8 words each).
# Compression is normally good: in a typical charset, large ranges of
# Unicode will be either completely excluded (e.g. if only cyrillic
# letters are to be matched), or completely included (e.g. if large
# subranges of Kanji match). These ranges will be represented by
# chunks of all one-bits or all zero-bits.
# Matching can be also done efficiently: the more significant byte of
# the Unicode character is an index into the chunk number, and the
# less significant byte is a bit index in the chunk (just like the
# CHARSET matching).
charmap = bytes(charmap) # should be hashable
comps = {}
mapping = bytearray(256)
block = 0
data = bytearray()
for i in range(0, 65536, 256):
chunk = charmap[i: i + 256]
if chunk in comps:
mapping[i // 256] = comps[chunk]
else:
mapping[i // 256] = comps[chunk] = block
block += 1
data += chunk
data = _mk_bitmap(data)
data[0:0] = [block] + _bytes_to_codes(mapping)
out.append((BIGCHARSET, data))
out += tail
return out
def _fixup_range(lo, hi, ranges, fixup):
for i in map(fixup, range(lo, hi+1)):
for k, (lo, hi) in enumerate(ranges):
if i < lo:
if i == lo - 1:
ranges[k] = (i, hi)
else:
ranges.insert(k, (i, i))
break
elif i > hi:
if i == hi + 1:
ranges[k] = (lo, i)
break
else:
break
else:
ranges.append((i, i))
_CODEBITS = _sre.CODESIZE * 8
_BITS_TRANS = b'0' + b'1' * 255
def _mk_bitmap(bits, _CODEBITS=_CODEBITS, _int=int):
s = bits.translate(_BITS_TRANS)[::-1]
return [_int(s[i - _CODEBITS: i], 2)
for i in range(len(s), 0, -_CODEBITS)]
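# Added illustrative sketch (not part of the original module): the helper
# above packs a byte-per-character membership map into _CODEBITS-wide
# integer words, least significant bit first.
def _demo_mk_bitmap():
    charmap = bytearray(256)
    charmap[0] = 1                     # only character code 0 is set
    words = _mk_bitmap(bytes(charmap))
    assert words[0] == 1 and not any(words[1:])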
def _bytes_to_codes(b):
# Convert block indices to word array
a = memoryview(b).cast('I')
assert a.itemsize == _sre.CODESIZE
assert len(a) * a.itemsize == len(b)
return a.tolist()
def _simple(av):
# check if av is a "simple" operator
lo, hi = av[2].getwidth()
return lo == hi == 1 and av[2][0][0] != SUBPATTERN
def _generate_overlap_table(prefix):
"""
Generate an overlap table for the given prefix.
An overlap table is a table of the same size as the prefix which
informs about the potential self-overlap for each index in the prefix:
- if overlap[i] == 0, prefix[i:] can't overlap prefix[0:...]
- if overlap[i] == k with 0 < k <= i, prefix[i-k+1:i+1] overlaps with
prefix[0:k]
"""
table = [0] * len(prefix)
for i in range(1, len(prefix)):
idx = table[i - 1]
while prefix[i] != prefix[idx]:
if idx == 0:
table[i] = 0
break
idx = table[idx - 1]
else:
table[i] = idx + 1
return table
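# Added illustrative sketch (not part of the original module): the table
# above is the classic KMP failure function over the prefix sequence.
def _demo_overlap_table():
    prefix = [ord(c) for c in "abab"]
    assert _generate_overlap_table(prefix) == [0, 0, 1, 2]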
def _compile_info(code, pattern, flags):
# internal: compile an info block. in the current version,
# this contains min/max pattern width, and an optional literal
# prefix or a character map
lo, hi = pattern.getwidth()
if lo == 0:
return # not worth it
# look for a literal prefix
prefix = []
prefixappend = prefix.append
prefix_skip = 0
charset = [] # not used
charsetappend = charset.append
if not (flags & SRE_FLAG_IGNORECASE):
# look for literal prefix
for op, av in pattern.data:
if op is LITERAL:
if len(prefix) == prefix_skip:
prefix_skip = prefix_skip + 1
prefixappend(av)
elif op is SUBPATTERN and len(av[1]) == 1:
op, av = av[1][0]
if op is LITERAL:
prefixappend(av)
else:
break
else:
break
# if no prefix, look for charset prefix
if not prefix and pattern.data:
op, av = pattern.data[0]
if op is SUBPATTERN and av[1]:
op, av = av[1][0]
if op is LITERAL:
charsetappend((op, av))
elif op is BRANCH:
c = []
cappend = c.append
for p in av[1]:
if not p:
break
op, av = p[0]
if op is LITERAL:
cappend((op, av))
else:
break
else:
charset = c
elif op is BRANCH:
c = []
cappend = c.append
for p in av[1]:
if not p:
break
op, av = p[0]
if op is LITERAL:
cappend((op, av))
else:
break
else:
charset = c
elif op is IN:
charset = av
## if prefix:
## print "*** PREFIX", prefix, prefix_skip
## if charset:
## print "*** CHARSET", charset
# add an info block
emit = code.append
emit(OPCODES[INFO])
skip = len(code); emit(0)
# literal flag
mask = 0
if prefix:
mask = SRE_INFO_PREFIX
if len(prefix) == prefix_skip == len(pattern.data):
mask = mask + SRE_INFO_LITERAL
elif charset:
mask = mask + SRE_INFO_CHARSET
emit(mask)
# pattern length
if lo < MAXCODE:
emit(lo)
else:
emit(MAXCODE)
prefix = prefix[:MAXCODE]
if hi < MAXCODE:
emit(hi)
else:
emit(0)
# add literal prefix
if prefix:
emit(len(prefix)) # length
emit(prefix_skip) # skip
code.extend(prefix)
# generate overlap table
code.extend(_generate_overlap_table(prefix))
elif charset:
_compile_charset(charset, flags, code)
code[skip] = len(code) - skip
def isstring(obj):
return isinstance(obj, (str, bytes))
def _code(p, flags):
flags = p.pattern.flags | flags
code = []
# compile info block
_compile_info(code, p, flags)
# compile the pattern
_compile(code, p.data, flags)
code.append(OPCODES[SUCCESS])
return code
def compile(p, flags=0):
# internal: convert pattern list to internal format
if isstring(p):
pattern = p
p = sre_parse.parse(p, flags)
else:
pattern = None
code = _code(p, flags)
# print code
# XXX: <fl> get rid of this limitation!
if p.pattern.groups > 100:
raise AssertionError(
"sorry, but this version only supports 100 named groups"
)
# map in either direction
groupindex = p.pattern.groupdict
indexgroup = [None] * p.pattern.groups
for k, i in groupindex.items():
indexgroup[i] = k
return _sre.compile(
pattern, flags | p.pattern.flags, code,
p.pattern.groups-1,
groupindex, indexgroup
)
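# Added illustrative sketch (not part of the original module): driving the
# module-level compile() above directly, assuming the accompanying
# sre_parse and _sre modules behave as in CPython 3.4.
def _demo_sre_compile():
    pat = compile(r"ab+c")
    assert pat.match("abbc")
    assert pat.match("ac") is None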
#
# Secret Labs' Regular Expression Engine
#
# various symbols used by the regular expression engine.
# run this script to update the _sre include files!
#
# Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved.
#
# See the sre.py file for information on usage and redistribution.
#
"""Internal support module for sre"""
# update when constants are added or removed
MAGIC = 20031017
from _sre import MAXREPEAT
# SRE standard exception (access as sre.error)
# should this really be here?
class error(Exception):
pass
# operators
FAILURE = "failure"
SUCCESS = "success"
ANY = "any"
ANY_ALL = "any_all"
ASSERT = "assert"
ASSERT_NOT = "assert_not"
AT = "at"
BIGCHARSET = "bigcharset"
BRANCH = "branch"
CALL = "call"
CATEGORY = "category"
CHARSET = "charset"
GROUPREF = "groupref"
GROUPREF_IGNORE = "groupref_ignore"
GROUPREF_EXISTS = "groupref_exists"
IN = "in"
IN_IGNORE = "in_ignore"
INFO = "info"
JUMP = "jump"
LITERAL = "literal"
LITERAL_IGNORE = "literal_ignore"
MARK = "mark"
MAX_REPEAT = "max_repeat"
MAX_UNTIL = "max_until"
MIN_REPEAT = "min_repeat"
MIN_UNTIL = "min_until"
NEGATE = "negate"
NOT_LITERAL = "not_literal"
NOT_LITERAL_IGNORE = "not_literal_ignore"
RANGE = "range"
REPEAT = "repeat"
REPEAT_ONE = "repeat_one"
SUBPATTERN = "subpattern"
MIN_REPEAT_ONE = "min_repeat_one"
# positions
AT_BEGINNING = "at_beginning"
AT_BEGINNING_LINE = "at_beginning_line"
AT_BEGINNING_STRING = "at_beginning_string"
AT_BOUNDARY = "at_boundary"
AT_NON_BOUNDARY = "at_non_boundary"
AT_END = "at_end"
AT_END_LINE = "at_end_line"
AT_END_STRING = "at_end_string"
AT_LOC_BOUNDARY = "at_loc_boundary"
AT_LOC_NON_BOUNDARY = "at_loc_non_boundary"
AT_UNI_BOUNDARY = "at_uni_boundary"
AT_UNI_NON_BOUNDARY = "at_uni_non_boundary"
# categories
CATEGORY_DIGIT = "category_digit"
CATEGORY_NOT_DIGIT = "category_not_digit"
CATEGORY_SPACE = "category_space"
CATEGORY_NOT_SPACE = "category_not_space"
CATEGORY_WORD = "category_word"
CATEGORY_NOT_WORD = "category_not_word"
CATEGORY_LINEBREAK = "category_linebreak"
CATEGORY_NOT_LINEBREAK = "category_not_linebreak"
CATEGORY_LOC_WORD = "category_loc_word"
CATEGORY_LOC_NOT_WORD = "category_loc_not_word"
CATEGORY_UNI_DIGIT = "category_uni_digit"
CATEGORY_UNI_NOT_DIGIT = "category_uni_not_digit"
CATEGORY_UNI_SPACE = "category_uni_space"
CATEGORY_UNI_NOT_SPACE = "category_uni_not_space"
CATEGORY_UNI_WORD = "category_uni_word"
CATEGORY_UNI_NOT_WORD = "category_uni_not_word"
CATEGORY_UNI_LINEBREAK = "category_uni_linebreak"
CATEGORY_UNI_NOT_LINEBREAK = "category_uni_not_linebreak"
OPCODES = [
# failure=0 success=1 (just because it looks better that way :-)
FAILURE, SUCCESS,
ANY, ANY_ALL,
ASSERT, ASSERT_NOT,
AT,
BRANCH,
CALL,
CATEGORY,
CHARSET, BIGCHARSET,
GROUPREF, GROUPREF_EXISTS, GROUPREF_IGNORE,
IN, IN_IGNORE,
INFO,
JUMP,
LITERAL, LITERAL_IGNORE,
MARK,
MAX_UNTIL,
MIN_UNTIL,
NOT_LITERAL, NOT_LITERAL_IGNORE,
NEGATE,
RANGE,
REPEAT,
REPEAT_ONE,
SUBPATTERN,
MIN_REPEAT_ONE
]
ATCODES = [
AT_BEGINNING, AT_BEGINNING_LINE, AT_BEGINNING_STRING, AT_BOUNDARY,
AT_NON_BOUNDARY, AT_END, AT_END_LINE, AT_END_STRING,
AT_LOC_BOUNDARY, AT_LOC_NON_BOUNDARY, AT_UNI_BOUNDARY,
AT_UNI_NON_BOUNDARY
]
CHCODES = [
CATEGORY_DIGIT, CATEGORY_NOT_DIGIT, CATEGORY_SPACE,
CATEGORY_NOT_SPACE, CATEGORY_WORD, CATEGORY_NOT_WORD,
CATEGORY_LINEBREAK, CATEGORY_NOT_LINEBREAK, CATEGORY_LOC_WORD,
CATEGORY_LOC_NOT_WORD, CATEGORY_UNI_DIGIT, CATEGORY_UNI_NOT_DIGIT,
CATEGORY_UNI_SPACE, CATEGORY_UNI_NOT_SPACE, CATEGORY_UNI_WORD,
CATEGORY_UNI_NOT_WORD, CATEGORY_UNI_LINEBREAK,
CATEGORY_UNI_NOT_LINEBREAK
]
def makedict(items):
    d = {}
    for i, item in enumerate(items):
        d[item] = i
    return d
OPCODES = makedict(OPCODES)
ATCODES = makedict(ATCODES)
CHCODES = makedict(CHCODES)
# replacement operations for "ignore case" mode
OP_IGNORE = {
GROUPREF: GROUPREF_IGNORE,
IN: IN_IGNORE,
LITERAL: LITERAL_IGNORE,
NOT_LITERAL: NOT_LITERAL_IGNORE
}
AT_MULTILINE = {
AT_BEGINNING: AT_BEGINNING_LINE,
AT_END: AT_END_LINE
}
AT_LOCALE = {
AT_BOUNDARY: AT_LOC_BOUNDARY,
AT_NON_BOUNDARY: AT_LOC_NON_BOUNDARY
}
AT_UNICODE = {
AT_BOUNDARY: AT_UNI_BOUNDARY,
AT_NON_BOUNDARY: AT_UNI_NON_BOUNDARY
}
CH_LOCALE = {
CATEGORY_DIGIT: CATEGORY_DIGIT,
CATEGORY_NOT_DIGIT: CATEGORY_NOT_DIGIT,
CATEGORY_SPACE: CATEGORY_SPACE,
CATEGORY_NOT_SPACE: CATEGORY_NOT_SPACE,
CATEGORY_WORD: CATEGORY_LOC_WORD,
CATEGORY_NOT_WORD: CATEGORY_LOC_NOT_WORD,
CATEGORY_LINEBREAK: CATEGORY_LINEBREAK,
CATEGORY_NOT_LINEBREAK: CATEGORY_NOT_LINEBREAK
}
CH_UNICODE = {
CATEGORY_DIGIT: CATEGORY_UNI_DIGIT,
CATEGORY_NOT_DIGIT: CATEGORY_UNI_NOT_DIGIT,
CATEGORY_SPACE: CATEGORY_UNI_SPACE,
CATEGORY_NOT_SPACE: CATEGORY_UNI_NOT_SPACE,
CATEGORY_WORD: CATEGORY_UNI_WORD,
CATEGORY_NOT_WORD: CATEGORY_UNI_NOT_WORD,
CATEGORY_LINEBREAK: CATEGORY_UNI_LINEBREAK,
CATEGORY_NOT_LINEBREAK: CATEGORY_UNI_NOT_LINEBREAK
}
# flags
SRE_FLAG_TEMPLATE = 1 # template mode (disable backtracking)
SRE_FLAG_IGNORECASE = 2 # case insensitive
SRE_FLAG_LOCALE = 4 # honour system locale
SRE_FLAG_MULTILINE = 8 # treat target as multiline string
SRE_FLAG_DOTALL = 16 # treat target as a single string
SRE_FLAG_UNICODE = 32 # use unicode "locale"
SRE_FLAG_VERBOSE = 64 # ignore whitespace and comments
SRE_FLAG_DEBUG = 128 # debugging
SRE_FLAG_ASCII = 256 # use ascii "locale"
# flags for INFO primitive
SRE_INFO_PREFIX = 1 # has prefix
SRE_INFO_LITERAL = 2 # entire pattern is literal (given by prefix)
SRE_INFO_CHARSET = 4 # pattern starts with character from given set
if __name__ == "__main__":
def dump(f, d, prefix):
items = sorted(d.items(), key=lambda a: a[1])
for k, v in items:
f.write("#define %s_%s %s\n" % (prefix, k.upper(), v))
f = open("sre_constants.h", "w")
f.write("""\
/*
* Secret Labs' Regular Expression Engine
*
* regular expression matching engine
*
* NOTE: This file is generated by sre_constants.py. If you need
* to change anything in here, edit sre_constants.py and run it.
*
* Copyright (c) 1997-2001 by Secret Labs AB. All rights reserved.
*
* See the _sre.c file for information on usage and redistribution.
*/
""")
f.write("#define SRE_MAGIC %d\n" % MAGIC)
dump(f, OPCODES, "SRE_OP")
dump(f, ATCODES, "SRE")
dump(f, CHCODES, "SRE")
f.write("#define SRE_FLAG_TEMPLATE %d\n" % SRE_FLAG_TEMPLATE)
f.write("#define SRE_FLAG_IGNORECASE %d\n" % SRE_FLAG_IGNORECASE)
f.write("#define SRE_FLAG_LOCALE %d\n" % SRE_FLAG_LOCALE)
f.write("#define SRE_FLAG_MULTILINE %d\n" % SRE_FLAG_MULTILINE)
f.write("#define SRE_FLAG_DOTALL %d\n" % SRE_FLAG_DOTALL)
f.write("#define SRE_FLAG_UNICODE %d\n" % SRE_FLAG_UNICODE)
f.write("#define SRE_FLAG_VERBOSE %d\n" % SRE_FLAG_VERBOSE)
f.write("#define SRE_FLAG_DEBUG %d\n" % SRE_FLAG_DEBUG)
f.write("#define SRE_FLAG_ASCII %d\n" % SRE_FLAG_ASCII)
f.write("#define SRE_INFO_PREFIX %d\n" % SRE_INFO_PREFIX)
f.write("#define SRE_INFO_LITERAL %d\n" % SRE_INFO_LITERAL)
f.write("#define SRE_INFO_CHARSET %d\n" % SRE_INFO_CHARSET)
f.close()
print("done")
#
# Secret Labs' Regular Expression Engine
#
# convert re-style regular expression to sre pattern
#
# Copyright (c) 1998-2001 by Secret Labs AB. All rights reserved.
#
# See the sre.py file for information on usage and redistribution.
#
"""Internal support module for sre"""
# XXX: show string offset and offending character for all errors
from sre_constants import *
from _sre import MAXREPEAT
SPECIAL_CHARS = ".\\[{()*+?^$|"
REPEAT_CHARS = "*+?{"
DIGITS = set("0123456789")
OCTDIGITS = set("01234567")
HEXDIGITS = set("0123456789abcdefABCDEF")
WHITESPACE = set(" \t\n\r\v\f")
ESCAPES = {
r"\a": (LITERAL, ord("\a")),
r"\b": (LITERAL, ord("\b")),
r"\f": (LITERAL, ord("\f")),
r"\n": (LITERAL, ord("\n")),
r"\r": (LITERAL, ord("\r")),
r"\t": (LITERAL, ord("\t")),
r"\v": (LITERAL, ord("\v")),
r"\\": (LITERAL, ord("\\"))
}
CATEGORIES = {
r"\A": (AT, AT_BEGINNING_STRING), # start of string
r"\b": (AT, AT_BOUNDARY),
r"\B": (AT, AT_NON_BOUNDARY),
r"\d": (IN, [(CATEGORY, CATEGORY_DIGIT)]),
r"\D": (IN, [(CATEGORY, CATEGORY_NOT_DIGIT)]),
r"\s": (IN, [(CATEGORY, CATEGORY_SPACE)]),
r"\S": (IN, [(CATEGORY, CATEGORY_NOT_SPACE)]),
r"\w": (IN, [(CATEGORY, CATEGORY_WORD)]),
r"\W": (IN, [(CATEGORY, CATEGORY_NOT_WORD)]),
r"\Z": (AT, AT_END_STRING), # end of string
}
FLAGS = {
# standard flags
"i": SRE_FLAG_IGNORECASE,
"L": SRE_FLAG_LOCALE,
"m": SRE_FLAG_MULTILINE,
"s": SRE_FLAG_DOTALL,
"x": SRE_FLAG_VERBOSE,
# extensions
"a": SRE_FLAG_ASCII,
"t": SRE_FLAG_TEMPLATE,
"u": SRE_FLAG_UNICODE,
}
class Pattern:
# master pattern object. keeps track of global attributes
def __init__(self):
self.flags = 0
self.open = []
self.groups = 1
self.groupdict = {}
self.lookbehind = 0
def opengroup(self, name=None):
gid = self.groups
self.groups = gid + 1
if name is not None:
ogid = self.groupdict.get(name, None)
if ogid is not None:
raise error("redefinition of group name %s as group %d; "
"was group %d" % (repr(name), gid, ogid))
self.groupdict[name] = gid
self.open.append(gid)
return gid
def closegroup(self, gid):
self.open.remove(gid)
def checkgroup(self, gid):
return gid < self.groups and gid not in self.open
class SubPattern:
# a subpattern, in intermediate form
def __init__(self, pattern, data=None):
self.pattern = pattern
if data is None:
data = []
self.data = data
self.width = None
def dump(self, level=0):
nl = True
seqtypes = (tuple, list)
for op, av in self.data:
print(level*" " + op, end='')
if op == IN:
# member sublanguage
print()
for op, a in av:
print((level+1)*" " + op, a)
elif op == BRANCH:
print()
for i, a in enumerate(av[1]):
if i:
print(level*" " + "or")
a.dump(level+1)
elif op == GROUPREF_EXISTS:
condgroup, item_yes, item_no = av
print('', condgroup)
item_yes.dump(level+1)
if item_no:
print(level*" " + "else")
item_no.dump(level+1)
elif isinstance(av, seqtypes):
nl = False
for a in av:
if isinstance(a, SubPattern):
if not nl:
print()
a.dump(level+1)
nl = True
else:
if not nl:
print(' ', end='')
print(a, end='')
nl = False
if not nl:
print()
else:
print('', av)
def __repr__(self):
return repr(self.data)
def __len__(self):
return len(self.data)
def __delitem__(self, index):
del self.data[index]
def __getitem__(self, index):
if isinstance(index, slice):
return SubPattern(self.pattern, self.data[index])
return self.data[index]
def __setitem__(self, index, code):
self.data[index] = code
def insert(self, index, code):
self.data.insert(index, code)
def append(self, code):
self.data.append(code)
def getwidth(self):
# determine the width (min, max) for this subpattern
if self.width:
return self.width
lo = hi = 0
UNITCODES = (ANY, RANGE, IN, LITERAL, NOT_LITERAL, CATEGORY)
REPEATCODES = (MIN_REPEAT, MAX_REPEAT)
for op, av in self.data:
if op is BRANCH:
i = MAXREPEAT - 1
j = 0
for av in av[1]:
l, h = av.getwidth()
i = min(i, l)
j = max(j, h)
lo = lo + i
hi = hi + j
elif op is CALL:
i, j = av.getwidth()
lo = lo + i
hi = hi + j
elif op is SUBPATTERN:
i, j = av[1].getwidth()
lo = lo + i
hi = hi + j
elif op in REPEATCODES:
i, j = av[2].getwidth()
lo = lo + i * av[0]
hi = hi + j * av[1]
elif op in UNITCODES:
lo = lo + 1
hi = hi + 1
elif op == SUCCESS:
break
self.width = min(lo, MAXREPEAT - 1), min(hi, MAXREPEAT)
return self.width
class Tokenizer:
def __init__(self, string):
self.istext = isinstance(string, str)
self.string = string
self.index = 0
self.__next()
def __next(self):
if self.index >= len(self.string):
self.next = None
return
char = self.string[self.index:self.index+1]
# Special case for the str8, since indexing returns an integer
# XXX This is only needed for test_bug_926075 in test_re.py
if char and not self.istext:
char = chr(char[0])
if char == "\\":
try:
c = self.string[self.index + 1]
except IndexError:
raise error("bogus escape (end of line)")
if not self.istext:
c = chr(c)
char = char + c
self.index = self.index + len(char)
self.next = char
def match(self, char, skip=1):
if char == self.next:
if skip:
self.__next()
return 1
return 0
def get(self):
this = self.next
self.__next()
return this
def getwhile(self, n, charset):
result = ''
for _ in range(n):
c = self.next
if c not in charset:
break
result += c
self.__next()
return result
def tell(self):
return self.index, self.next
def seek(self, index):
self.index, self.next = index
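# --- Illustrative sketch (added to this listing; not part of the module) ---
# How the Tokenizer walks a pattern one token at a time; backslash escapes
# come back as two-character tokens. The helper name is ours and the
# function is never called.
def _demo_tokenizer():
    tok = Tokenizer(r"a\d+")
    assert tok.get() == "a"
    assert tok.get() == "\\d"  # escape kept together as one token
    assert tok.get() == "+"
    assert tok.get() is None   # end of input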
# The following three functions are not used in this module anymore, but we keep
# them here (with DeprecationWarnings) for backwards compatibility.
def isident(char):
import warnings
warnings.warn('sre_parse.isident() will be removed in 3.5',
DeprecationWarning, stacklevel=2)
return "a" <= char <= "z" or "A" <= char <= "Z" or char == "_"
def isdigit(char):
import warnings
warnings.warn('sre_parse.isdigit() will be removed in 3.5',
DeprecationWarning, stacklevel=2)
return "0" <= char <= "9"
def isname(name):
import warnings
warnings.warn('sre_parse.isname() will be removed in 3.5',
DeprecationWarning, stacklevel=2)
# check that group name is a valid string
if not isident(name[0]):
return False
for char in name[1:]:
if not isident(char) and not isdigit(char):
return False
return True
def _class_escape(source, escape):
# handle escape code inside character class
code = ESCAPES.get(escape)
if code:
return code
code = CATEGORIES.get(escape)
if code and code[0] == IN:
return code
try:
c = escape[1:2]
if c == "x":
# hexadecimal escape (exactly two digits)
escape += source.getwhile(2, HEXDIGITS)
if len(escape) != 4:
raise ValueError
return LITERAL, int(escape[2:], 16) & 0xff
elif c == "u" and source.istext:
# unicode escape (exactly four digits)
escape += source.getwhile(4, HEXDIGITS)
if len(escape) != 6:
raise ValueError
return LITERAL, int(escape[2:], 16)
elif c == "U" and source.istext:
# unicode escape (exactly eight digits)
escape += source.getwhile(8, HEXDIGITS)
if len(escape) != 10:
raise ValueError
c = int(escape[2:], 16)
chr(c) # raise ValueError for invalid code
return LITERAL, c
elif c in OCTDIGITS:
# octal escape (up to three digits)
escape += source.getwhile(2, OCTDIGITS)
return LITERAL, int(escape[1:], 8) & 0xff
elif c in DIGITS:
raise ValueError
if len(escape) == 2:
return LITERAL, ord(escape[1])
except ValueError:
pass
raise error("bogus escape: %s" % repr(escape))
def _escape(source, escape, state):
# handle escape code in expression
code = CATEGORIES.get(escape)
if code:
return code
code = ESCAPES.get(escape)
if code:
return code
try:
c = escape[1:2]
if c == "x":
# hexadecimal escape
escape += source.getwhile(2, HEXDIGITS)
if len(escape) != 4:
raise ValueError
return LITERAL, int(escape[2:], 16) & 0xff
elif c == "u" and source.istext:
# unicode escape (exactly four digits)
escape += source.getwhile(4, HEXDIGITS)
if len(escape) != 6:
raise ValueError
return LITERAL, int(escape[2:], 16)
elif c == "U" and source.istext:
# unicode escape (exactly eight digits)
escape += source.getwhile(8, HEXDIGITS)
if len(escape) != 10:
raise ValueError
c = int(escape[2:], 16)
chr(c) # raise ValueError for invalid code
return LITERAL, c
elif c == "0":
# octal escape
escape += source.getwhile(2, OCTDIGITS)
return LITERAL, int(escape[1:], 8) & 0xff
elif c in DIGITS:
# octal escape *or* decimal group reference (sigh)
if source.next in DIGITS:
escape = escape + source.get()
if (escape[1] in OCTDIGITS and escape[2] in OCTDIGITS and
source.next in OCTDIGITS):
# got three octal digits; this is an octal escape
escape = escape + source.get()
return LITERAL, int(escape[1:], 8) & 0xff
# not an octal escape, so this is a group reference
group = int(escape[1:])
if group < state.groups:
if not state.checkgroup(group):
raise error("cannot refer to open group")
if state.lookbehind:
import warnings
warnings.warn('group references in lookbehind '
'assertions are not supported',
RuntimeWarning)
return GROUPREF, group
raise ValueError
if len(escape) == 2:
return LITERAL, ord(escape[1])
except ValueError:
pass
raise error("bogus escape: %s" % repr(escape))
def _parse_sub(source, state, nested=1):
# parse an alternation: a|b|c
items = []
itemsappend = items.append
sourcematch = source.match
while 1:
itemsappend(_parse(source, state))
if sourcematch("|"):
continue
if not nested:
break
if not source.next or sourcematch(")", 0):
break
else:
raise error("pattern not properly closed")
if len(items) == 1:
return items[0]
subpattern = SubPattern(state)
subpatternappend = subpattern.append
# check if all items share a common prefix
while 1:
prefix = None
for item in items:
if not item:
break
if prefix is None:
prefix = item[0]
elif item[0] != prefix:
break
else:
# all subitems start with a common "prefix".
# move it out of the branch
for item in items:
del item[0]
subpatternappend(prefix)
continue # check next one
break
# check if the branch can be replaced by a character set
for item in items:
if len(item) != 1 or item[0][0] != LITERAL:
break
else:
# we can store this as a character set instead of a
# branch (the compiler may optimize this even more)
set = []
setappend = set.append
for item in items:
setappend(item[0])
subpatternappend((IN, set))
return subpattern
subpattern.append((BRANCH, (None, items)))
return subpattern
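# --- Illustrative sketch (added to this listing; not part of the module) ---
# _parse_sub() hoists a common prefix out of an alternation and collapses
# single-literal branches into a character set, as the comments above
# describe. Uses parse() from further down; the helper is ours and is
# never called.
def _demo_parse_sub():
    p = parse(r"abc|abd")
    assert p[0] == (LITERAL, ord("a"))  # hoisted common prefix
    assert p[1] == (LITERAL, ord("b"))
    assert p[2][0] == IN                # branch replaced by a character set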
def _parse_sub_cond(source, state, condgroup):
item_yes = _parse(source, state)
if source.match("|"):
item_no = _parse(source, state)
if source.match("|"):
raise error("conditional backref with more than two branches")
else:
item_no = None
if source.next and not source.match(")", 0):
raise error("pattern not properly closed")
subpattern = SubPattern(state)
subpattern.append((GROUPREF_EXISTS, (condgroup, item_yes, item_no)))
return subpattern
_PATTERNENDERS = set("|)")
_ASSERTCHARS = set("=!<")
_LOOKBEHINDASSERTCHARS = set("=!")
_REPEATCODES = set([MIN_REPEAT, MAX_REPEAT])
def _parse(source, state):
# parse a simple pattern
subpattern = SubPattern(state)
# precompute constants into local variables
subpatternappend = subpattern.append
sourceget = source.get
sourcematch = source.match
_len = len
PATTERNENDERS = _PATTERNENDERS
ASSERTCHARS = _ASSERTCHARS
LOOKBEHINDASSERTCHARS = _LOOKBEHINDASSERTCHARS
REPEATCODES = _REPEATCODES
while 1:
if source.next in PATTERNENDERS:
break # end of subpattern
this = sourceget()
if this is None:
break # end of pattern
if state.flags & SRE_FLAG_VERBOSE:
# skip whitespace and comments
if this in WHITESPACE:
continue
if this == "#":
while 1:
this = sourceget()
if this in (None, "\n"):
break
continue
if this and this[0] not in SPECIAL_CHARS:
subpatternappend((LITERAL, ord(this)))
elif this == "[":
# character set
set = []
setappend = set.append
## if sourcematch(":"):
## pass # handle character classes
if sourcematch("^"):
setappend((NEGATE, None))
# check remaining characters
start = set[:]
while 1:
this = sourceget()
if this == "]" and set != start:
break
elif this and this[0] == "\\":
code1 = _class_escape(source, this)
elif this:
code1 = LITERAL, ord(this)
else:
raise error("unexpected end of regular expression")
if sourcematch("-"):
# potential range
this = sourceget()
if this == "]":
if code1[0] is IN:
code1 = code1[1][0]
setappend(code1)
setappend((LITERAL, ord("-")))
break
elif this:
if this[0] == "\\":
code2 = _class_escape(source, this)
else:
code2 = LITERAL, ord(this)
if code1[0] != LITERAL or code2[0] != LITERAL:
raise error("bad character range")
lo = code1[1]
hi = code2[1]
if hi < lo:
raise error("bad character range")
setappend((RANGE, (lo, hi)))
else:
raise error("unexpected end of regular expression")
else:
if code1[0] is IN:
code1 = code1[1][0]
setappend(code1)
# XXX: <fl> should move set optimization to compiler!
            if _len(set) == 1 and set[0][0] is LITERAL:
                subpatternappend(set[0]) # optimization
            elif _len(set) == 2 and set[0][0] is NEGATE and set[1][0] is LITERAL:
                subpatternappend((NOT_LITERAL, set[1][1])) # optimization
else:
# XXX: <fl> should add charmap optimization here
subpatternappend((IN, set))
elif this and this[0] in REPEAT_CHARS:
# repeat previous item
if this == "?":
min, max = 0, 1
elif this == "*":
min, max = 0, MAXREPEAT
elif this == "+":
min, max = 1, MAXREPEAT
elif this == "{":
if source.next == "}":
subpatternappend((LITERAL, ord(this)))
continue
here = source.tell()
min, max = 0, MAXREPEAT
lo = hi = ""
while source.next in DIGITS:
lo = lo + source.get()
if sourcematch(","):
while source.next in DIGITS:
hi = hi + sourceget()
else:
hi = lo
if not sourcematch("}"):
subpatternappend((LITERAL, ord(this)))
source.seek(here)
continue
if lo:
min = int(lo)
if min >= MAXREPEAT:
raise OverflowError("the repetition number is too large")
if hi:
max = int(hi)
if max >= MAXREPEAT:
raise OverflowError("the repetition number is too large")
if max < min:
raise error("bad repeat interval")
else:
raise error("not supported")
# figure out which item to repeat
if subpattern:
item = subpattern[-1:]
else:
item = None
if not item or (_len(item) == 1 and item[0][0] == AT):
raise error("nothing to repeat")
if item[0][0] in REPEATCODES:
raise error("multiple repeat")
if sourcematch("?"):
subpattern[-1] = (MIN_REPEAT, (min, max, item))
else:
subpattern[-1] = (MAX_REPEAT, (min, max, item))
elif this == ".":
subpatternappend((ANY, None))
elif this == "(":
group = 1
name = None
condgroup = None
if sourcematch("?"):
group = 0
# options
if sourcematch("P"):
# python extensions
if sourcematch("<"):
# named group: skip forward to end of name
name = ""
while 1:
char = sourceget()
if char is None:
raise error("unterminated name")
if char == ">":
break
name = name + char
group = 1
if not name:
raise error("missing group name")
if not name.isidentifier():
raise error("bad character in group name %r" % name)
elif sourcematch("="):
# named backreference
name = ""
while 1:
char = sourceget()
if char is None:
raise error("unterminated name")
if char == ")":
break
name = name + char
if not name:
raise error("missing group name")
if not name.isidentifier():
raise error("bad character in backref group name "
"%r" % name)
gid = state.groupdict.get(name)
if gid is None:
msg = "unknown group name: {0!r}".format(name)
raise error(msg)
if state.lookbehind:
import warnings
warnings.warn('group references in lookbehind '
'assertions are not supported',
RuntimeWarning)
subpatternappend((GROUPREF, gid))
continue
else:
char = sourceget()
if char is None:
raise error("unexpected end of pattern")
raise error("unknown specifier: ?P%s" % char)
elif sourcematch(":"):
# non-capturing group
group = 2
elif sourcematch("#"):
# comment
while 1:
if source.next is None or source.next == ")":
break
sourceget()
if not sourcematch(")"):
raise error("unbalanced parenthesis")
continue
elif source.next in ASSERTCHARS:
# lookahead assertions
char = sourceget()
dir = 1
if char == "<":
if source.next not in LOOKBEHINDASSERTCHARS:
raise error("syntax error")
dir = -1 # lookbehind
char = sourceget()
state.lookbehind += 1
p = _parse_sub(source, state)
if dir < 0:
state.lookbehind -= 1
if not sourcematch(")"):
raise error("unbalanced parenthesis")
if char == "=":
subpatternappend((ASSERT, (dir, p)))
else:
subpatternappend((ASSERT_NOT, (dir, p)))
continue
elif sourcematch("("):
# conditional backreference group
condname = ""
while 1:
char = sourceget()
if char is None:
raise error("unterminated name")
if char == ")":
break
condname = condname + char
group = 2
if not condname:
raise error("missing group name")
if condname.isidentifier():
condgroup = state.groupdict.get(condname)
if condgroup is None:
msg = "unknown group name: {0!r}".format(condname)
raise error(msg)
else:
try:
condgroup = int(condname)
except ValueError:
raise error("bad character in group name")
if state.lookbehind:
import warnings
warnings.warn('group references in lookbehind '
'assertions are not supported',
RuntimeWarning)
else:
# flags
                    if source.next not in FLAGS:
raise error("unexpected end of pattern")
while source.next in FLAGS:
state.flags = state.flags | FLAGS[sourceget()]
if group:
# parse group contents
if group == 2:
# anonymous group
group = None
else:
group = state.opengroup(name)
if condgroup:
p = _parse_sub_cond(source, state, condgroup)
else:
p = _parse_sub(source, state)
if not sourcematch(")"):
raise error("unbalanced parenthesis")
if group is not None:
state.closegroup(group)
subpatternappend((SUBPATTERN, (group, p)))
else:
while 1:
char = sourceget()
if char is None:
raise error("unexpected end of pattern")
if char == ")":
break
raise error("unknown extension")
elif this == "^":
subpatternappend((AT, AT_BEGINNING))
elif this == "$":
            subpatternappend((AT, AT_END))
elif this and this[0] == "\\":
code = _escape(source, this, state)
subpatternappend(code)
else:
raise error("parser error")
return subpattern
def fix_flags(src, flags):
# Check and fix flags according to the type of pattern (str or bytes)
if isinstance(src, str):
if not flags & SRE_FLAG_ASCII:
flags |= SRE_FLAG_UNICODE
elif flags & SRE_FLAG_UNICODE:
raise ValueError("ASCII and UNICODE flags are incompatible")
else:
if flags & SRE_FLAG_UNICODE:
raise ValueError("can't use UNICODE flag with a bytes pattern")
return flags
def parse(str, flags=0, pattern=None):
# parse 're' pattern into list of (opcode, argument) tuples
source = Tokenizer(str)
if pattern is None:
pattern = Pattern()
pattern.flags = flags
pattern.str = str
p = _parse_sub(source, pattern, 0)
p.pattern.flags = fix_flags(str, p.pattern.flags)
tail = source.get()
if tail == ")":
raise error("unbalanced parenthesis")
elif tail:
raise error("bogus characters at end of regular expression")
if flags & SRE_FLAG_DEBUG:
p.dump()
if not (flags & SRE_FLAG_VERBOSE) and p.pattern.flags & SRE_FLAG_VERBOSE:
# the VERBOSE flag was switched on inside the pattern. to be
# on the safe side, we'll parse the whole thing again...
return parse(str, p.pattern.flags)
return p
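# --- Illustrative sketch (added to this listing; not part of the module) ---
# The parse() entry point yields a SubPattern plus a Pattern state that
# records group numbering. The helper is ours and is never called.
def _demo_parse():
    p = parse(r"(?P<tag>\w+)")
    assert p.pattern.groupdict == {"tag": 1}
    assert p[0][0] == SUBPATTERN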
def parse_template(source, pattern):
# parse 're' replacement string into list of literals and
# group references
s = Tokenizer(source)
sget = s.get
groups = []
literals = []
literal = []
lappend = literal.append
def addgroup(index):
if literal:
literals.append(''.join(literal))
del literal[:]
groups.append((len(literals), index))
literals.append(None)
while True:
this = sget()
if this is None:
break # end of replacement string
if this[0] == "\\":
# group
c = this[1]
if c == "g":
name = ""
if s.match("<"):
while True:
char = sget()
if char is None:
raise error("unterminated group name")
if char == ">":
break
name += char
if not name:
raise error("missing group name")
try:
index = int(name)
if index < 0:
raise error("negative group number")
except ValueError:
if not name.isidentifier():
raise error("bad character in group name")
try:
index = pattern.groupindex[name]
except KeyError:
msg = "unknown group name: {0!r}".format(name)
raise IndexError(msg)
addgroup(index)
elif c == "0":
if s.next in OCTDIGITS:
this += sget()
if s.next in OCTDIGITS:
this += sget()
lappend(chr(int(this[1:], 8) & 0xff))
elif c in DIGITS:
isoctal = False
if s.next in DIGITS:
this += sget()
if (c in OCTDIGITS and this[2] in OCTDIGITS and
s.next in OCTDIGITS):
this += sget()
isoctal = True
lappend(chr(int(this[1:], 8) & 0xff))
if not isoctal:
addgroup(int(this[1:]))
else:
try:
this = chr(ESCAPES[this][1])
except KeyError:
pass
lappend(this)
else:
lappend(this)
if literal:
literals.append(''.join(literal))
if not isinstance(source, str):
        # The tokenizer implicitly decodes bytes objects as latin-1; we must
        # therefore re-encode the final representation.
literals = [None if s is None else s.encode('latin-1') for s in literals]
return groups, literals
def expand_template(template, match):
g = match.group
sep = match.string[:0]
groups, literals = template
literals = literals[:]
try:
for index, group in groups:
literals[index] = s = g(group)
if s is None:
raise error("unmatched group")
except IndexError:
raise error("invalid group reference")
return sep.join(literals)
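# --- Illustrative sketch (added to this listing; not part of the module) ---
# parse_template()/expand_template() back re.sub()'s replacement handling.
# A by-hand round trip with a compiled pattern; the helper is ours and is
# never called (the re import only happens at call time).
def _demo_template():
    import re
    pat = re.compile(r"(?P<word>\w+)")
    m = pat.match("hello")
    template = parse_template(r"<\g<word>>", pat)
    assert expand_template(template, m) == "<hello>"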
# Wrapper module for _ssl, providing some additional facilities
# implemented in Python. Written by Bill Janssen.
"""This module provides some more Pythonic support for SSL.
Object types:
SSLSocket -- subtype of socket.socket which does SSL over the socket
Exceptions:
SSLError -- exception raised for I/O errors
Functions:
cert_time_to_seconds -- convert time string used for certificate
notBefore and notAfter functions to integer
seconds past the Epoch (the time values
returned from time.time())
  get_server_certificate (addr) -- fetch the certificate provided
                 by the server at the given (host, port) address. No
                 validation of the certificate is performed.
Integer constants:
SSL_ERROR_ZERO_RETURN
SSL_ERROR_WANT_READ
SSL_ERROR_WANT_WRITE
SSL_ERROR_WANT_X509_LOOKUP
SSL_ERROR_SYSCALL
SSL_ERROR_SSL
SSL_ERROR_WANT_CONNECT
SSL_ERROR_EOF
SSL_ERROR_INVALID_ERROR_CODE
The following constants define the certificate requirements that one side is
allowing/requiring from the other side:
CERT_NONE - no certificates from the other side are required (or will
be looked at if provided)
CERT_OPTIONAL - certificates are not required, but if provided will be
validated, and if validation fails, the connection will
also fail
CERT_REQUIRED - certificates are required, and will be validated, and
if validation fails, the connection will also fail
The following constants identify various SSL protocol variants:
PROTOCOL_SSLv2
PROTOCOL_SSLv3
PROTOCOL_SSLv23
PROTOCOL_TLSv1
PROTOCOL_TLSv1_1
PROTOCOL_TLSv1_2
The following constants identify various SSL alert message descriptions as per
http://www.iana.org/assignments/tls-parameters/tls-parameters.xml#tls-parameters-6
ALERT_DESCRIPTION_CLOSE_NOTIFY
ALERT_DESCRIPTION_UNEXPECTED_MESSAGE
ALERT_DESCRIPTION_BAD_RECORD_MAC
ALERT_DESCRIPTION_RECORD_OVERFLOW
ALERT_DESCRIPTION_DECOMPRESSION_FAILURE
ALERT_DESCRIPTION_HANDSHAKE_FAILURE
ALERT_DESCRIPTION_BAD_CERTIFICATE
ALERT_DESCRIPTION_UNSUPPORTED_CERTIFICATE
ALERT_DESCRIPTION_CERTIFICATE_REVOKED
ALERT_DESCRIPTION_CERTIFICATE_EXPIRED
ALERT_DESCRIPTION_CERTIFICATE_UNKNOWN
ALERT_DESCRIPTION_ILLEGAL_PARAMETER
ALERT_DESCRIPTION_UNKNOWN_CA
ALERT_DESCRIPTION_ACCESS_DENIED
ALERT_DESCRIPTION_DECODE_ERROR
ALERT_DESCRIPTION_DECRYPT_ERROR
ALERT_DESCRIPTION_PROTOCOL_VERSION
ALERT_DESCRIPTION_INSUFFICIENT_SECURITY
ALERT_DESCRIPTION_INTERNAL_ERROR
ALERT_DESCRIPTION_USER_CANCELLED
ALERT_DESCRIPTION_NO_RENEGOTIATION
ALERT_DESCRIPTION_UNSUPPORTED_EXTENSION
ALERT_DESCRIPTION_CERTIFICATE_UNOBTAINABLE
ALERT_DESCRIPTION_UNRECOGNIZED_NAME
ALERT_DESCRIPTION_BAD_CERTIFICATE_STATUS_RESPONSE
ALERT_DESCRIPTION_BAD_CERTIFICATE_HASH_VALUE
ALERT_DESCRIPTION_UNKNOWN_PSK_IDENTITY
"""
import textwrap
import re
import sys
import os
from collections import namedtuple
from enum import Enum as _Enum
import _ssl # if we can't import it, let the error propagate
from _ssl import OPENSSL_VERSION_NUMBER, OPENSSL_VERSION_INFO, OPENSSL_VERSION
from _ssl import _SSLContext
from _ssl import (
SSLError, SSLZeroReturnError, SSLWantReadError, SSLWantWriteError,
SSLSyscallError, SSLEOFError,
)
from _ssl import CERT_NONE, CERT_OPTIONAL, CERT_REQUIRED
from _ssl import txt2obj as _txt2obj, nid2obj as _nid2obj
from _ssl import RAND_status, RAND_add, RAND_bytes, RAND_pseudo_bytes
try:
from _ssl import RAND_egd
except ImportError:
# LibreSSL does not provide RAND_egd
pass
def _import_symbols(prefix):
for n in dir(_ssl):
if n.startswith(prefix):
globals()[n] = getattr(_ssl, n)
_import_symbols('OP_')
_import_symbols('ALERT_DESCRIPTION_')
_import_symbols('SSL_ERROR_')
_import_symbols('PROTOCOL_')
_import_symbols('VERIFY_')
from _ssl import HAS_SNI, HAS_ECDH, HAS_NPN
from _ssl import _OPENSSL_API_VERSION
_PROTOCOL_NAMES = {value: name for name, value in globals().items() if name.startswith('PROTOCOL_')}
try:
from _ssl import PROTOCOL_SSLv2
_SSLv2_IF_EXISTS = PROTOCOL_SSLv2
except ImportError:
_SSLv2_IF_EXISTS = None
else:
_PROTOCOL_NAMES[PROTOCOL_SSLv2] = "SSLv2"
try:
from _ssl import PROTOCOL_TLSv1_1, PROTOCOL_TLSv1_2
except ImportError:
pass
else:
_PROTOCOL_NAMES[PROTOCOL_TLSv1_1] = "TLSv1.1"
_PROTOCOL_NAMES[PROTOCOL_TLSv1_2] = "TLSv1.2"
if sys.platform == "win32":
from _ssl import enum_certificates, enum_crls
from socket import socket, AF_INET, SOCK_STREAM, create_connection
from socket import SOL_SOCKET, SO_TYPE
import base64 # for DER-to-PEM translation
import errno
socket_error = OSError # keep that public name in module namespace
if _ssl.HAS_TLS_UNIQUE:
CHANNEL_BINDING_TYPES = ['tls-unique']
else:
CHANNEL_BINDING_TYPES = []
# Disable weak or insecure ciphers by default
# (OpenSSL's default setting is 'DEFAULT:!aNULL:!eNULL')
# Enable a better set of ciphers by default
# This list has been explicitly chosen to:
# * Prefer cipher suites that offer perfect forward secrecy (DHE/ECDHE)
# * Prefer ECDHE over DHE for better performance
# * Prefer AEAD over CBC for better performance and security
# * Prefer AES-GCM over ChaCha20 because most platforms have AES-NI
# (ChaCha20 needs OpenSSL 1.1.0 or patched 1.0.2)
# * Prefer any AES-GCM and ChaCha20 over any AES-CBC for better
# performance and security
# * Then Use HIGH cipher suites as a fallback
# * Disable NULL authentication, NULL encryption, 3DES and MD5 MACs
# for security reasons
_DEFAULT_CIPHERS = (
'ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:DH+CHACHA20:ECDH+AES256:DH+AES256:'
'ECDH+AES128:DH+AES:ECDH+HIGH:DH+HIGH:RSA+AESGCM:RSA+AES:RSA+HIGH:'
'!aNULL:!eNULL:!MD5:!3DES'
)
# Restricted and more secure ciphers for the server side
# This list has been explicitly chosen to:
# * Prefer cipher suites that offer perfect forward secrecy (DHE/ECDHE)
# * Prefer ECDHE over DHE for better performance
# * Prefer AEAD over CBC for better performance and security
# * Prefer AES-GCM over ChaCha20 because most platforms have AES-NI
# * Prefer any AES-GCM and ChaCha20 over any AES-CBC for better
# performance and security
# * Then Use HIGH cipher suites as a fallback
# * Disable NULL authentication, NULL encryption, MD5 MACs, DSS, RC4, and
# 3DES for security reasons
_RESTRICTED_SERVER_CIPHERS = (
'ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:DH+CHACHA20:ECDH+AES256:DH+AES256:'
'ECDH+AES128:DH+AES:ECDH+HIGH:DH+HIGH:RSA+AESGCM:RSA+AES:RSA+HIGH:'
'!aNULL:!eNULL:!MD5:!DSS:!RC4:!3DES'
)
class CertificateError(ValueError):
pass
def _dnsname_match(dn, hostname, max_wildcards=1):
"""Matching according to RFC 6125, section 6.4.3
http://tools.ietf.org/html/rfc6125#section-6.4.3
"""
pats = []
if not dn:
return False
leftmost, *remainder = dn.split(r'.')
wildcards = leftmost.count('*')
if wildcards > max_wildcards:
# Issue #17980: avoid denials of service by refusing more
# than one wildcard per fragment. A survey of established
# policy among SSL implementations showed it to be a
# reasonable choice.
raise CertificateError(
"too many wildcards in certificate DNS name: " + repr(dn))
# speed up common case w/o wildcards
if not wildcards:
return dn.lower() == hostname.lower()
# RFC 6125, section 6.4.3, subitem 1.
# The client SHOULD NOT attempt to match a presented identifier in which
# the wildcard character comprises a label other than the left-most label.
if leftmost == '*':
# When '*' is a fragment by itself, it matches a non-empty dotless
# fragment.
pats.append('[^.]+')
elif leftmost.startswith('xn--') or hostname.startswith('xn--'):
# RFC 6125, section 6.4.3, subitem 3.
# The client SHOULD NOT attempt to match a presented identifier
# where the wildcard character is embedded within an A-label or
# U-label of an internationalized domain name.
pats.append(re.escape(leftmost))
else:
# Otherwise, '*' matches any dotless string, e.g. www*
pats.append(re.escape(leftmost).replace(r'\*', '[^.]*'))
# add the remaining fragments, ignore any wildcards
for frag in remainder:
pats.append(re.escape(frag))
pat = re.compile(r'\A' + r'\.'.join(pats) + r'\Z', re.IGNORECASE)
return pat.match(hostname)
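# --- Illustrative sketch (added to this listing; not part of the module) ---
# Wildcard behaviour of _dnsname_match(): '*' matches within a single DNS
# label, never across dots, and plain names compare case-insensitively.
# The helper is ours and is never called.
def _demo_dnsname_match():
    assert _dnsname_match("*.example.com", "www.example.com")
    assert not _dnsname_match("*.example.com", "a.b.example.com")
    assert _dnsname_match("example.com", "EXAMPLE.com")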
def match_hostname(cert, hostname):
"""Verify that *cert* (in decoded format as returned by
SSLSocket.getpeercert()) matches the *hostname*. RFC 2818 and RFC 6125
rules are followed, but IP addresses are not accepted for *hostname*.
CertificateError is raised on failure. On success, the function
returns nothing.
"""
if not cert:
raise ValueError("empty or no certificate, match_hostname needs a "
"SSL socket or SSL context with either "
"CERT_OPTIONAL or CERT_REQUIRED")
dnsnames = []
san = cert.get('subjectAltName', ())
for key, value in san:
if key == 'DNS':
if _dnsname_match(value, hostname):
return
dnsnames.append(value)
if not dnsnames:
# The subject is only checked when there is no dNSName entry
# in subjectAltName
for sub in cert.get('subject', ()):
for key, value in sub:
# XXX according to RFC 2818, the most specific Common Name
# must be used.
if key == 'commonName':
if _dnsname_match(value, hostname):
return
dnsnames.append(value)
if len(dnsnames) > 1:
raise CertificateError("hostname %r "
"doesn't match either of %s"
% (hostname, ', '.join(map(repr, dnsnames))))
elif len(dnsnames) == 1:
raise CertificateError("hostname %r "
"doesn't match %r"
% (hostname, dnsnames[0]))
else:
raise CertificateError("no appropriate commonName or "
"subjectAltName fields were found")
DefaultVerifyPaths = namedtuple("DefaultVerifyPaths",
"cafile capath openssl_cafile_env openssl_cafile openssl_capath_env "
"openssl_capath")
def get_default_verify_paths():
"""Return paths to default cafile and capath.
"""
parts = _ssl.get_default_verify_paths()
# environment vars shadow paths
cafile = os.environ.get(parts[0], parts[1])
capath = os.environ.get(parts[2], parts[3])
return DefaultVerifyPaths(cafile if os.path.isfile(cafile) else None,
capath if os.path.isdir(capath) else None,
*parts)
class _ASN1Object(namedtuple("_ASN1Object", "nid shortname longname oid")):
"""ASN.1 object identifier lookup
"""
__slots__ = ()
def __new__(cls, oid):
return super().__new__(cls, *_txt2obj(oid, name=False))
@classmethod
def fromnid(cls, nid):
"""Create _ASN1Object from OpenSSL numeric ID
"""
return super().__new__(cls, *_nid2obj(nid))
@classmethod
def fromname(cls, name):
"""Create _ASN1Object from short name, long name or OID
"""
return super().__new__(cls, *_txt2obj(name, name=True))
class Purpose(_ASN1Object, _Enum):
"""SSLContext purpose flags with X509v3 Extended Key Usage objects
"""
SERVER_AUTH = '1.3.6.1.5.5.7.3.1'
CLIENT_AUTH = '1.3.6.1.5.5.7.3.2'
class SSLContext(_SSLContext):
"""An SSLContext holds various SSL-related configuration options and
data, such as certificates and possibly a private key."""
__slots__ = ('protocol', '__weakref__')
_windows_cert_stores = ("CA", "ROOT")
def __new__(cls, protocol, *args, **kwargs):
self = _SSLContext.__new__(cls, protocol)
if protocol != _SSLv2_IF_EXISTS:
self.set_ciphers(_DEFAULT_CIPHERS)
return self
def __init__(self, protocol):
self.protocol = protocol
def wrap_socket(self, sock, server_side=False,
do_handshake_on_connect=True,
suppress_ragged_eofs=True,
server_hostname=None):
return SSLSocket(sock=sock, server_side=server_side,
do_handshake_on_connect=do_handshake_on_connect,
suppress_ragged_eofs=suppress_ragged_eofs,
server_hostname=server_hostname,
_context=self)
def set_npn_protocols(self, npn_protocols):
protos = bytearray()
for protocol in npn_protocols:
b = bytes(protocol, 'ascii')
if len(b) == 0 or len(b) > 255:
raise SSLError('NPN protocols must be 1 to 255 in length')
protos.append(len(b))
protos.extend(b)
self._set_npn_protocols(protos)
def _load_windows_store_certs(self, storename, purpose):
certs = bytearray()
for cert, encoding, trust in enum_certificates(storename):
# CA certs are never PKCS#7 encoded
if encoding == "x509_asn":
if trust is True or purpose.oid in trust:
certs.extend(cert)
self.load_verify_locations(cadata=certs)
return certs
def load_default_certs(self, purpose=Purpose.SERVER_AUTH):
if not isinstance(purpose, _ASN1Object):
raise TypeError(purpose)
if sys.platform == "win32":
for storename in self._windows_cert_stores:
self._load_windows_store_certs(storename, purpose)
self.set_default_verify_paths()
def create_default_context(purpose=Purpose.SERVER_AUTH, *, cafile=None,
capath=None, cadata=None):
"""Create a SSLContext object with default settings.
NOTE: The protocol and settings may change anytime without prior
deprecation. The values represent a fair balance between maximum
compatibility and security.
"""
if not isinstance(purpose, _ASN1Object):
raise TypeError(purpose)
context = SSLContext(PROTOCOL_SSLv23)
# SSLv2 considered harmful.
context.options |= OP_NO_SSLv2
# SSLv3 has problematic security and is only required for really old
# clients such as IE6 on Windows XP
context.options |= OP_NO_SSLv3
# disable compression to prevent CRIME attacks (OpenSSL 1.0+)
context.options |= getattr(_ssl, "OP_NO_COMPRESSION", 0)
if purpose == Purpose.SERVER_AUTH:
# verify certs and host name in client mode
context.verify_mode = CERT_REQUIRED
context.check_hostname = True
elif purpose == Purpose.CLIENT_AUTH:
# Prefer the server's ciphers by default so that we get stronger
# encryption
context.options |= getattr(_ssl, "OP_CIPHER_SERVER_PREFERENCE", 0)
# Use single use keys in order to improve forward secrecy
context.options |= getattr(_ssl, "OP_SINGLE_DH_USE", 0)
context.options |= getattr(_ssl, "OP_SINGLE_ECDH_USE", 0)
# disallow ciphers with known vulnerabilities
context.set_ciphers(_RESTRICTED_SERVER_CIPHERS)
if cafile or capath or cadata:
context.load_verify_locations(cafile, capath, cadata)
elif context.verify_mode != CERT_NONE:
# no explicit cafile, capath or cadata but the verify mode is
# CERT_OPTIONAL or CERT_REQUIRED. Let's try to load default system
# root CA certificates for the given purpose. This may fail silently.
context.load_default_certs(purpose)
return context
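# --- Illustrative sketch (added to this listing; not part of the module) ---
# Typical client-side use of create_default_context(). The host and port
# below are placeholders, and calling this performs network I/O, so the
# helper (ours) is defined but never invoked here.
def _demo_default_context():
    ctx = create_default_context()
    with create_connection(("example.com", 443)) as raw:
        with ctx.wrap_socket(raw, server_hostname="example.com") as tls:
            return tls.cipher()  # e.g. (name, protocol version, secret bits)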
def _create_unverified_context(protocol=PROTOCOL_SSLv23, *, cert_reqs=None,
check_hostname=False, purpose=Purpose.SERVER_AUTH,
certfile=None, keyfile=None,
cafile=None, capath=None, cadata=None):
"""Create a SSLContext object for Python stdlib modules
All Python stdlib modules shall use this function to create SSLContext
objects in order to keep common settings in one place. The configuration
is less restrict than create_default_context()'s to increase backward
compatibility.
"""
if not isinstance(purpose, _ASN1Object):
raise TypeError(purpose)
context = SSLContext(protocol)
# SSLv2 considered harmful.
context.options |= OP_NO_SSLv2
# SSLv3 has problematic security and is only required for really old
# clients such as IE6 on Windows XP
context.options |= OP_NO_SSLv3
if cert_reqs is not None:
context.verify_mode = cert_reqs
context.check_hostname = check_hostname
if keyfile and not certfile:
raise ValueError("certfile must be specified")
if certfile or keyfile:
context.load_cert_chain(certfile, keyfile)
# load CA root certs
if cafile or capath or cadata:
context.load_verify_locations(cafile, capath, cadata)
elif context.verify_mode != CERT_NONE:
# no explicit cafile, capath or cadata but the verify mode is
# CERT_OPTIONAL or CERT_REQUIRED. Let's try to load default system
# root CA certificates for the given purpose. This may fail silently.
context.load_default_certs(purpose)
return context
# Used by http.client if no context is explicitly passed.
_create_default_https_context = create_default_context
# Backwards compatibility alias, even though it's not a public name.
_create_stdlib_context = _create_unverified_context
class SSLSocket(socket):
"""This class implements a subtype of socket.socket that wraps
the underlying OS socket in an SSL context when necessary, and
provides read and write methods over that channel."""
def __init__(self, sock=None, keyfile=None, certfile=None,
server_side=False, cert_reqs=CERT_NONE,
ssl_version=PROTOCOL_SSLv23, ca_certs=None,
do_handshake_on_connect=True,
family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None,
suppress_ragged_eofs=True, npn_protocols=None, ciphers=None,
server_hostname=None,
_context=None):
if _context:
self._context = _context
else:
if server_side and not certfile:
raise ValueError("certfile must be specified for server-side "
"operations")
if keyfile and not certfile:
raise ValueError("certfile must be specified")
if certfile and not keyfile:
keyfile = certfile
self._context = SSLContext(ssl_version)
self._context.verify_mode = cert_reqs
if ca_certs:
self._context.load_verify_locations(ca_certs)
if certfile:
self._context.load_cert_chain(certfile, keyfile)
if npn_protocols:
self._context.set_npn_protocols(npn_protocols)
if ciphers:
self._context.set_ciphers(ciphers)
self.keyfile = keyfile
self.certfile = certfile
self.cert_reqs = cert_reqs
self.ssl_version = ssl_version
self.ca_certs = ca_certs
self.ciphers = ciphers
# Can't use sock.type as other flags (such as SOCK_NONBLOCK) get
# mixed in.
if sock.getsockopt(SOL_SOCKET, SO_TYPE) != SOCK_STREAM:
raise NotImplementedError("only stream sockets are supported")
if server_side and server_hostname:
raise ValueError("server_hostname can only be specified "
"in client mode")
if self._context.check_hostname and not server_hostname:
raise ValueError("check_hostname requires server_hostname")
self.server_side = server_side
self.server_hostname = server_hostname
self.do_handshake_on_connect = do_handshake_on_connect
self.suppress_ragged_eofs = suppress_ragged_eofs
if sock is not None:
socket.__init__(self,
family=sock.family,
type=sock.type,
proto=sock.proto,
fileno=sock.fileno())
self.settimeout(sock.gettimeout())
sock.detach()
elif fileno is not None:
socket.__init__(self, fileno=fileno)
else:
socket.__init__(self, family=family, type=type, proto=proto)
# See if we are connected
try:
self.getpeername()
except OSError as e:
if e.errno != errno.ENOTCONN:
raise
connected = False
else:
connected = True
self._closed = False
self._sslobj = None
self._connected = connected
if connected:
# create the SSL object
try:
self._sslobj = self._context._wrap_socket(self, server_side,
server_hostname)
if do_handshake_on_connect:
timeout = self.gettimeout()
if timeout == 0.0:
# non-blocking
raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
self.do_handshake()
except (OSError, ValueError):
self.close()
raise
@property
def context(self):
return self._context
@context.setter
def context(self, ctx):
self._context = ctx
self._sslobj.context = ctx
def dup(self):
        raise NotImplementedError("Can't dup() %s instances" %
                                  self.__class__.__name__)
def _checkClosed(self, msg=None):
# raise an exception here if you wish to check for spurious closes
pass
def _check_connected(self):
if not self._connected:
# getpeername() will raise ENOTCONN if the socket is really
# not connected; note that we can be connected even without
# _connected being set, e.g. if connect() first returned
# EAGAIN.
self.getpeername()
def read(self, len=0, buffer=None):
"""Read up to LEN bytes and return them.
Return zero-length string on EOF."""
self._checkClosed()
if not self._sslobj:
raise ValueError("Read on closed or unwrapped SSL socket.")
try:
if buffer is not None:
v = self._sslobj.read(len, buffer)
else:
v = self._sslobj.read(len or 1024)
return v
except SSLError as x:
if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs:
if buffer is not None:
return 0
else:
return b''
else:
raise
def write(self, data):
"""Write DATA to the underlying SSL channel. Returns
number of bytes of DATA actually transmitted."""
self._checkClosed()
if not self._sslobj:
raise ValueError("Write on closed or unwrapped SSL socket.")
return self._sslobj.write(data)
def getpeercert(self, binary_form=False):
"""Returns a formatted version of the data in the
certificate provided by the other end of the SSL channel.
Return None if no certificate was provided, {} if a
certificate was provided, but not validated."""
self._checkClosed()
self._check_connected()
return self._sslobj.peer_certificate(binary_form)
def selected_npn_protocol(self):
self._checkClosed()
if not self._sslobj or not _ssl.HAS_NPN:
return None
else:
return self._sslobj.selected_npn_protocol()
def cipher(self):
self._checkClosed()
if not self._sslobj:
return None
else:
return self._sslobj.cipher()
def compression(self):
self._checkClosed()
if not self._sslobj:
return None
else:
return self._sslobj.compression()
def send(self, data, flags=0):
self._checkClosed()
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to send() on %s" %
self.__class__)
try:
v = self._sslobj.write(data)
except SSLError as x:
if x.args[0] == SSL_ERROR_WANT_READ:
return 0
elif x.args[0] == SSL_ERROR_WANT_WRITE:
return 0
else:
raise
else:
return v
else:
return socket.send(self, data, flags)
def sendto(self, data, flags_or_addr, addr=None):
self._checkClosed()
if self._sslobj:
raise ValueError("sendto not allowed on instances of %s" %
self.__class__)
elif addr is None:
return socket.sendto(self, data, flags_or_addr)
else:
return socket.sendto(self, data, flags_or_addr, addr)
def sendmsg(self, *args, **kwargs):
# Ensure programs don't send data unencrypted if they try to
# use this method.
raise NotImplementedError("sendmsg not allowed on instances of %s" %
self.__class__)
def sendall(self, data, flags=0):
self._checkClosed()
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to sendall() on %s" %
self.__class__)
amount = len(data)
count = 0
            while count < amount:
v = self.send(data[count:])
count += v
return amount
else:
return socket.sendall(self, data, flags)
def recv(self, buflen=1024, flags=0):
self._checkClosed()
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to recv() on %s" %
self.__class__)
return self.read(buflen)
else:
return socket.recv(self, buflen, flags)
def recv_into(self, buffer, nbytes=None, flags=0):
self._checkClosed()
if buffer and (nbytes is None):
nbytes = len(buffer)
elif nbytes is None:
nbytes = 1024
if self._sslobj:
if flags != 0:
raise ValueError(
"non-zero flags not allowed in calls to recv_into() on %s" %
self.__class__)
return self.read(nbytes, buffer)
else:
return socket.recv_into(self, buffer, nbytes, flags)
def recvfrom(self, buflen=1024, flags=0):
self._checkClosed()
if self._sslobj:
raise ValueError("recvfrom not allowed on instances of %s" %
self.__class__)
else:
return socket.recvfrom(self, buflen, flags)
def recvfrom_into(self, buffer, nbytes=None, flags=0):
self._checkClosed()
if self._sslobj:
raise ValueError("recvfrom_into not allowed on instances of %s" %
self.__class__)
else:
return socket.recvfrom_into(self, buffer, nbytes, flags)
def recvmsg(self, *args, **kwargs):
raise NotImplementedError("recvmsg not allowed on instances of %s" %
self.__class__)
def recvmsg_into(self, *args, **kwargs):
raise NotImplementedError("recvmsg_into not allowed on instances of "
"%s" % self.__class__)
def pending(self):
self._checkClosed()
if self._sslobj:
return self._sslobj.pending()
else:
return 0
def shutdown(self, how):
self._checkClosed()
self._sslobj = None
socket.shutdown(self, how)
def unwrap(self):
if self._sslobj:
s = self._sslobj.shutdown()
self._sslobj = None
return s
else:
raise ValueError("No SSL wrapper around " + str(self))
def _real_close(self):
self._sslobj = None
socket._real_close(self)
def do_handshake(self, block=False):
"""Perform a TLS/SSL handshake."""
self._check_connected()
timeout = self.gettimeout()
try:
if timeout == 0.0 and block:
self.settimeout(None)
self._sslobj.do_handshake()
finally:
self.settimeout(timeout)
if self.context.check_hostname:
if not self.server_hostname:
raise ValueError("check_hostname needs server_hostname "
"argument")
match_hostname(self.getpeercert(), self.server_hostname)
def _real_connect(self, addr, connect_ex):
if self.server_side:
raise ValueError("can't connect in server-side mode")
# Here we assume that the socket is client-side, and not
# connected at the time of the call. We connect it, then wrap it.
if self._connected:
raise ValueError("attempt to connect already-connected SSLSocket!")
self._sslobj = self.context._wrap_socket(self, False, self.server_hostname)
try:
if connect_ex:
rc = socket.connect_ex(self, addr)
else:
rc = None
socket.connect(self, addr)
if not rc:
self._connected = True
if self.do_handshake_on_connect:
self.do_handshake()
return rc
except (OSError, ValueError):
self._sslobj = None
raise
def connect(self, addr):
"""Connects to remote ADDR, and then wraps the connection in
an SSL channel."""
self._real_connect(addr, False)
def connect_ex(self, addr):
"""Connects to remote ADDR, and then wraps the connection in
an SSL channel."""
return self._real_connect(addr, True)
def accept(self):
"""Accepts a new connection from a remote client, and returns
a tuple containing that new connection wrapped with a server-side
SSL channel, and the address of the remote client."""
newsock, addr = socket.accept(self)
newsock = self.context.wrap_socket(newsock,
do_handshake_on_connect=self.do_handshake_on_connect,
suppress_ragged_eofs=self.suppress_ragged_eofs,
server_side=True)
return newsock, addr
def get_channel_binding(self, cb_type="tls-unique"):
"""Get channel binding data for current connection. Raise ValueError
if the requested `cb_type` is not supported. Return bytes of the data
or None if the data is not available (e.g. before the handshake).
"""
if cb_type not in CHANNEL_BINDING_TYPES:
raise ValueError("Unsupported channel binding type")
if cb_type != "tls-unique":
raise NotImplementedError(
"{0} channel binding type not implemented"
.format(cb_type))
if self._sslobj is None:
return None
return self._sslobj.tls_unique_cb()
def wrap_socket(sock, keyfile=None, certfile=None,
server_side=False, cert_reqs=CERT_NONE,
ssl_version=PROTOCOL_SSLv23, ca_certs=None,
do_handshake_on_connect=True,
suppress_ragged_eofs=True,
ciphers=None):
return SSLSocket(sock=sock, keyfile=keyfile, certfile=certfile,
server_side=server_side, cert_reqs=cert_reqs,
ssl_version=ssl_version, ca_certs=ca_certs,
do_handshake_on_connect=do_handshake_on_connect,
suppress_ragged_eofs=suppress_ragged_eofs,
ciphers=ciphers)
# some utility functions
def cert_time_to_seconds(cert_time):
"""Takes a date-time string in standard ASN1_print form
("MON DAY 24HOUR:MINUTE:SEC YEAR TIMEZONE") and return
a Python time value in seconds past the epoch."""
import time
return time.mktime(time.strptime(cert_time, "%b %d %H:%M:%S %Y GMT"))
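# --- Illustrative sketch (added to this listing; not part of the module) ---
# Note that this implementation feeds the parsed value to time.mktime(),
# which interprets it in *local* time, so the result shifts with the
# caller's timezone despite the mandatory "GMT" suffix. The helper is
# ours and is never called.
def _demo_cert_time():
    return cert_time_to_seconds("May  9 00:00:00 2007 GMT")  # float, timezone-dependent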
PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"
def DER_cert_to_PEM_cert(der_cert_bytes):
"""Takes a certificate in binary DER format and returns the
PEM version of it as a string."""
f = str(base64.standard_b64encode(der_cert_bytes), 'ASCII', 'strict')
return (PEM_HEADER + '\n' +
textwrap.fill(f, 64) + '\n' +
PEM_FOOTER + '\n')
def PEM_cert_to_DER_cert(pem_cert_string):
"""Takes a certificate in ASCII PEM format and returns the
DER-encoded version of it as a byte sequence"""
if not pem_cert_string.startswith(PEM_HEADER):
raise ValueError("Invalid PEM encoding; must start with %s"
% PEM_HEADER)
if not pem_cert_string.strip().endswith(PEM_FOOTER):
raise ValueError("Invalid PEM encoding; must end with %s"
% PEM_FOOTER)
d = pem_cert_string.strip()[len(PEM_HEADER):-len(PEM_FOOTER)]
return base64.decodebytes(d.encode('ASCII', 'strict'))
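# --- Illustrative sketch (added to this listing; not part of the module) ---
# The DER/PEM converters round-trip; the byte string below is an arbitrary
# stand-in for a real DER certificate. The helper is ours, never called.
def _demo_pem_roundtrip():
    der = b"\x30\x03\x02\x01\x01"  # placeholder bytes, not a real certificate
    pem = DER_cert_to_PEM_cert(der)
    assert pem.startswith(PEM_HEADER)
    assert PEM_cert_to_DER_cert(pem) == der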
def get_server_certificate(addr, ssl_version=PROTOCOL_SSLv23, ca_certs=None):
"""Retrieve the certificate from the server at the specified address,
and return it as a PEM-encoded string.
If 'ca_certs' is specified, validate the server cert against it.
If 'ssl_version' is specified, use it in the connection attempt."""
host, port = addr
if ca_certs is not None:
cert_reqs = CERT_REQUIRED
else:
cert_reqs = CERT_NONE
context = _create_stdlib_context(ssl_version,
cert_reqs=cert_reqs,
cafile=ca_certs)
with create_connection(addr) as sock:
with context.wrap_socket(sock) as sslsock:
dercert = sslsock.getpeercert(True)
return DER_cert_to_PEM_cert(dercert)
def get_protocol_name(protocol_code):
return _PROTOCOL_NAMES.get(protocol_code, '<unknown>')
"""Constants/functions for interpreting results of os.stat() and os.lstat().
Suggested usage: from stat import *
"""
# Indices for stat struct members in the tuple returned by os.stat()
ST_MODE = 0
ST_INO = 1
ST_DEV = 2
ST_NLINK = 3
ST_UID = 4
ST_GID = 5
ST_SIZE = 6
ST_ATIME = 7
ST_MTIME = 8
ST_CTIME = 9
# Extract bits from the mode
def S_IMODE(mode):
"""Return the portion of the file's mode that can be set by
os.chmod().
"""
return mode & 0o7777
def S_IFMT(mode):
"""Return the portion of the file's mode that describes the
file type.
"""
return mode & 0o170000
# Constants used as S_IFMT() for various file types
# (not all are implemented on all systems)
S_IFDIR = 0o040000 # directory
S_IFCHR = 0o020000 # character device
S_IFBLK = 0o060000 # block device
S_IFREG = 0o100000 # regular file
S_IFIFO = 0o010000 # fifo (named pipe)
S_IFLNK = 0o120000 # symbolic link
S_IFSOCK = 0o140000 # socket file
# Functions to test for each file type
def S_ISDIR(mode):
"""Return True if mode is from a directory."""
return S_IFMT(mode) == S_IFDIR
def S_ISCHR(mode):
"""Return True if mode is from a character special device file."""
return S_IFMT(mode) == S_IFCHR
def S_ISBLK(mode):
"""Return True if mode is from a block special device file."""
return S_IFMT(mode) == S_IFBLK
def S_ISREG(mode):
"""Return True if mode is from a regular file."""
return S_IFMT(mode) == S_IFREG
def S_ISFIFO(mode):
"""Return True if mode is from a FIFO (named pipe)."""
return S_IFMT(mode) == S_IFIFO
def S_ISLNK(mode):
"""Return True if mode is from a symbolic link."""
return S_IFMT(mode) == S_IFLNK
def S_ISSOCK(mode):
"""Return True if mode is from a socket."""
return S_IFMT(mode) == S_IFSOCK
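# --- Illustrative sketch (added to this listing; not part of the module) ---
# The predicates above combine S_IFMT() with the S_IF* constants; e.g. the
# current directory tests as a directory and not as a regular file. The
# helper is ours and is never called.
def _demo_stat_checks():
    import os
    st = os.stat(os.getcwd())
    assert S_ISDIR(st.st_mode)
    assert not S_ISREG(st.st_mode)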
# Names for permission bits
S_ISUID = 0o4000 # set UID bit
S_ISGID = 0o2000 # set GID bit
S_ENFMT = S_ISGID # file locking enforcement
S_ISVTX = 0o1000 # sticky bit
S_IREAD = 0o0400 # Unix V7 synonym for S_IRUSR
S_IWRITE = 0o0200 # Unix V7 synonym for S_IWUSR
S_IEXEC = 0o0100 # Unix V7 synonym for S_IXUSR
S_IRWXU = 0o0700 # mask for owner permissions
S_IRUSR = 0o0400 # read by owner
S_IWUSR = 0o0200 # write by owner
S_IXUSR = 0o0100 # execute by owner
S_IRWXG = 0o0070 # mask for group permissions
S_IRGRP = 0o0040 # read by group
S_IWGRP = 0o0020 # write by group
S_IXGRP = 0o0010 # execute by group
S_IRWXO = 0o0007 # mask for others (not in group) permissions
S_IROTH = 0o0004 # read by others
S_IWOTH = 0o0002 # write by others
S_IXOTH = 0o0001 # execute by others
# Names for file flags
UF_NODUMP = 0x00000001 # do not dump file
UF_IMMUTABLE = 0x00000002 # file may not be changed
UF_APPEND = 0x00000004 # file may only be appended to
UF_OPAQUE = 0x00000008 # directory is opaque when viewed through a union stack
UF_NOUNLINK = 0x00000010 # file may not be renamed or deleted
UF_COMPRESSED = 0x00000020 # OS X: file is hfs-compressed
UF_HIDDEN = 0x00008000 # OS X: file should not be displayed
SF_ARCHIVED = 0x00010000 # file may be archived
SF_IMMUTABLE = 0x00020000 # file may not be changed
SF_APPEND = 0x00040000 # file may only be appended to
SF_NOUNLINK = 0x00100000 # file may not be renamed or deleted
SF_SNAPSHOT = 0x00200000 # file is a snapshot file
_filemode_table = (
((S_IFLNK, "l"),
(S_IFREG, "-"),
(S_IFBLK, "b"),
(S_IFDIR, "d"),
(S_IFCHR, "c"),
(S_IFIFO, "p")),
((S_IRUSR, "r"),),
((S_IWUSR, "w"),),
((S_IXUSR|S_ISUID, "s"),
(S_ISUID, "S"),
(S_IXUSR, "x")),
((S_IRGRP, "r"),),
((S_IWGRP, "w"),),
((S_IXGRP|S_ISGID, "s"),
(S_ISGID, "S"),
(S_IXGRP, "x")),
((S_IROTH, "r"),),
((S_IWOTH, "w"),),
((S_IXOTH|S_ISVTX, "t"),
(S_ISVTX, "T"),
(S_IXOTH, "x"))
)
def filemode(mode):
"""Convert a file's mode to a string of the form '-rwxrwxrwx'."""
perm = []
for table in _filemode_table:
for bit, char in table:
if mode & bit == bit:
perm.append(char)
break
else:
perm.append("-")
return "".join(perm)
# If available, use C implementation
try:
from _stat import *
except ImportError:
pass
## Module statistics.py
##
## Copyright (c) 2013 Steven D'Aprano <[email protected]>.
##
## Licensed under the Apache License, Version 2.0 (the "License");
## you may not use this file except in compliance with the License.
## You may obtain a copy of the License at
##
## http://www.apache.org/licenses/LICENSE-2.0
##
## Unless required by applicable law or agreed to in writing, software
## distributed under the License is distributed on an "AS IS" BASIS,
## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
## See the License for the specific language governing permissions and
## limitations under the License.
"""
Basic statistics module.
This module provides functions for calculating statistics of data, including
averages, variance, and standard deviation.
Calculating averages
--------------------
================== =============================================
Function Description
================== =============================================
mean Arithmetic mean (average) of data.
median Median (middle value) of data.
median_low Low median of data.
median_high High median of data.
median_grouped Median, or 50th percentile, of grouped data.
mode Mode (most common value) of data.
================== =============================================
Calculate the arithmetic mean ("the average") of data:
>>> mean([-1.0, 2.5, 3.25, 5.75])
2.625
Calculate the standard median of discrete data:
>>> median([2, 3, 4, 5])
3.5
Calculate the median, or 50th percentile, of data grouped into class intervals
centred on the data values provided. E.g. if your data points are rounded to
the nearest whole number:
>>> median_grouped([2, 2, 3, 3, 3, 4]) #doctest: +ELLIPSIS
2.8333333333...
This should be interpreted in this way: you have two data points in the class
interval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in
the class interval 3.5-4.5. The median of these data points is 2.8333...
Calculating variability or spread
---------------------------------
================== =============================================
Function Description
================== =============================================
pvariance Population variance of data.
variance Sample variance of data.
pstdev Population standard deviation of data.
stdev Sample standard deviation of data.
================== =============================================
Calculate the standard deviation of sample data:
>>> stdev([2.5, 3.25, 5.5, 11.25, 11.75]) #doctest: +ELLIPSIS
4.38961843444...
If you have previously calculated the mean, you can pass it as the optional
second argument to the four "spread" functions to avoid recalculating it:
>>> data = [1, 2, 2, 4, 4, 4, 5, 6]
>>> mu = mean(data)
>>> pvariance(data, mu)
2.5
Exceptions
----------
A single exception is defined: StatisticsError is a subclass of ValueError.
"""
__all__ = [ 'StatisticsError',
'pstdev', 'pvariance', 'stdev', 'variance',
'median', 'median_low', 'median_high', 'median_grouped',
'mean', 'mode',
]
import collections
import math
from fractions import Fraction
from decimal import Decimal
from itertools import groupby
# === Exceptions ===
class StatisticsError(ValueError):
pass
# === Private utilities ===
def _sum(data, start=0):
"""_sum(data [, start]) -> (type, sum, count)
Return a high-precision sum of the given numeric data as a fraction,
together with the type to be converted to and the count of items.
If optional argument ``start`` is given, it is added to the total.
If ``data`` is empty, ``start`` (defaulting to 0) is returned.
Examples
--------
>>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
(<class 'float'>, Fraction(11, 1), 5)
Some sources of round-off error will be avoided:
>>> _sum([1e50, 1, -1e50] * 1000) # Built-in sum returns zero.
(<class 'float'>, Fraction(1000, 1), 3000)
Fractions and Decimals are also supported:
>>> from fractions import Fraction as F
>>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
(<class 'fractions.Fraction'>, Fraction(63, 20), 4)
>>> from decimal import Decimal as D
>>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
>>> _sum(data)
(<class 'decimal.Decimal'>, Fraction(6963, 10000), 4)
Mixed types are currently treated as an error, except that int is
allowed.
"""
count = 0
n, d = _exact_ratio(start)
partials = {d: n}
partials_get = partials.get
T = _coerce(int, type(start))
for typ, values in groupby(data, type):
T = _coerce(T, typ) # or raise TypeError
for n,d in map(_exact_ratio, values):
count += 1
partials[d] = partials_get(d, 0) + n
if None in partials:
# The sum will be a NAN or INF. We can ignore all the finite
# partials, and just look at this special one.
total = partials[None]
assert not _isfinite(total)
else:
# Sum all the partial sums using builtin sum.
# FIXME is this faster if we sum them in order of the denominator?
total = sum(Fraction(n, d) for d, n in sorted(partials.items()))
return (T, total, count)
def _isfinite(x):
try:
return x.is_finite() # Likely a Decimal.
except AttributeError:
return math.isfinite(x) # Coerces to float first.
def _coerce(T, S):
"""Coerce types T and S to a common type, or raise TypeError.
Coercion rules are currently an implementation detail. See the CoerceTest
test class in test_statistics for details.
"""
# See http://bugs.python.org/issue24068.
assert T is not bool, "initial type T is bool"
# If the types are the same, no need to coerce anything. Put this
# first, so that the usual case (no coercion needed) happens as soon
# as possible.
if T is S: return T
# Mixed int & other coerce to the other type.
if S is int or S is bool: return T
if T is int: return S
# If one is a (strict) subclass of the other, coerce to the subclass.
if issubclass(S, T): return S
if issubclass(T, S): return T
# Ints coerce to the other type.
if issubclass(T, int): return S
if issubclass(S, int): return T
# Mixed fraction & float coerces to float (or float subclass).
if issubclass(T, Fraction) and issubclass(S, float):
return S
if issubclass(T, float) and issubclass(S, Fraction):
return T
# Any other combination is disallowed.
msg = "don't know how to coerce %s and %s"
raise TypeError(msg % (T.__name__, S.__name__))
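# --- Illustrative sketch (added to this listing; not part of the module) ---
# A few concrete coercions under the rules above; the helper is ours and
# is never called.
def _demo_coerce():
    assert _coerce(int, float) is float       # int defers to the other type
    assert _coerce(float, Fraction) is float  # Fraction & float -> float
    assert _coerce(int, bool) is int          # bool folds into int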
def _exact_ratio(x):
"""Return Real number x to exact (numerator, denominator) pair.
>>> _exact_ratio(0.25)
(1, 4)
x is expected to be an int, Fraction, Decimal or float.
"""
try:
# Optimise the common case of floats. We expect that the most often
# used numeric type will be builtin floats, so try to make this as
# fast as possible.
if type(x) is float:
return x.as_integer_ratio()
try:
# x may be an int, Fraction, or Integral ABC.
return (x.numerator, x.denominator)
except AttributeError:
try:
# x may be a float subclass.
return x.as_integer_ratio()
except AttributeError:
try:
# x may be a Decimal.
return _decimal_to_ratio(x)
except AttributeError:
# Just give up?
pass
except (OverflowError, ValueError):
# float NAN or INF.
assert not math.isfinite(x)
return (x, None)
msg = "can't convert type '{}' to numerator/denominator"
raise TypeError(msg.format(type(x).__name__))
# FIXME This is faster than Fraction.from_decimal, but still too slow.
def _decimal_to_ratio(d):
"""Convert Decimal d to exact integer ratio (numerator, denominator).
>>> from decimal import Decimal
>>> _decimal_to_ratio(Decimal("2.6"))
(26, 10)
"""
sign, digits, exp = d.as_tuple()
if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
assert not d.is_finite()
return (d, None)
num = 0
for digit in digits:
num = num*10 + digit
if exp < 0:
den = 10**-exp
else:
num *= 10**exp
den = 1
if sign:
num = -num
return (num, den)
def _convert(value, T):
"""Convert value to given numeric type T."""
if type(value) is T:
# This covers the cases where T is Fraction, or where value is
# a NAN or INF (Decimal or float).
return value
if issubclass(T, int) and value.denominator != 1:
T = float
try:
# FIXME: what do we do if this overflows?
return T(value)
except TypeError:
if issubclass(T, Decimal):
return T(value.numerator)/T(value.denominator)
else:
raise
def _counts(data):
# Generate a table of sorted (value, frequency) pairs.
table = collections.Counter(iter(data)).most_common()
if not table:
return table
# Extract the values with the highest frequency.
maxfreq = table[0][1]
for i in range(1, len(table)):
if table[i][1] != maxfreq:
table = table[:i]
break
return table
# === Measures of central tendency (averages) ===
def mean(data):
"""Return the sample arithmetic mean of data.
>>> mean([1, 2, 3, 4, 4])
2.8
>>> from fractions import Fraction as F
>>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
Fraction(13, 21)
>>> from decimal import Decimal as D
>>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
Decimal('0.5625')
If ``data`` is empty, StatisticsError will be raised.
"""
if iter(data) is data:
data = list(data)
n = len(data)
if n < 1:
raise StatisticsError('mean requires at least one data point')
T, total, count = _sum(data)
assert count == n
return _convert(total/n, T)
# FIXME: investigate ways to calculate medians without sorting? Quickselect?
def median(data):
"""Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point.
When the number of data points is even, the median is interpolated by
taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0
"""
data = sorted(data)
n = len(data)
if n == 0:
raise StatisticsError("no median for empty data")
if n%2 == 1:
return data[n//2]
else:
i = n//2
return (data[i - 1] + data[i])/2
def median_low(data):
"""Return the low median of numeric data.
When the number of data points is odd, the middle value is returned.
When it is even, the smaller of the two middle values is returned.
>>> median_low([1, 3, 5])
3
>>> median_low([1, 3, 5, 7])
3
"""
data = sorted(data)
n = len(data)
if n == 0:
raise StatisticsError("no median for empty data")
if n%2 == 1:
return data[n//2]
else:
return data[n//2 - 1]
def median_high(data):
"""Return the high median of data.
When the number of data points is odd, the middle value is returned.
When it is even, the larger of the two middle values is returned.
>>> median_high([1, 3, 5])
3
>>> median_high([1, 3, 5, 7])
5
"""
data = sorted(data)
n = len(data)
if n == 0:
raise StatisticsError("no median for empty data")
return data[n//2]
def median_grouped(data, interval=1):
"""Return the 50th percentile (median) of grouped continuous data.
>>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
3.7
>>> median_grouped([52, 52, 53, 54])
52.5
This calculates the median as the 50th percentile, and should be
used when your data is continuous and grouped. In the above example,
the values 1, 2, 3, etc. actually represent the midpoint of classes
0.5-1.5, 1.5-2.5, 2.5-3.5, etc. The middle value falls somewhere in
class 3.5-4.5, and interpolation is used to estimate it.
Optional argument ``interval`` represents the class interval, and
defaults to 1. Changing the class interval naturally will change the
interpolated 50th percentile value:
>>> median_grouped([1, 3, 3, 5, 7], interval=1)
3.25
>>> median_grouped([1, 3, 3, 5, 7], interval=2)
3.5
This function does not check whether the data points are at least
``interval`` apart.
"""
data = sorted(data)
n = len(data)
if n == 0:
raise StatisticsError("no median for empty data")
elif n == 1:
return data[0]
# Find the value at the midpoint. Remember this corresponds to the
# centre of the class interval.
x = data[n//2]
for obj in (x, interval):
if isinstance(obj, (str, bytes)):
raise TypeError('expected number but got %r' % obj)
try:
L = x - interval/2 # The lower limit of the median interval.
except TypeError:
# Mixed type. For now we just coerce to float.
L = float(x) - float(interval)/2
cf = data.index(x) # Number of values below the median interval.
# FIXME The following line could be more efficient for big lists.
f = data.count(x) # Number of data points in the median interval.
return L + interval*(n/2 - cf)/f
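# Worked example (illustrative): in the first doctest above, x = data[5] = 4,
# so L = 3.5, cf = data.index(4) = 4, f = data.count(4) = 5 and n = 10,
# giving 3.5 + 1*(10/2 - 4)/5 = 3.7.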
def mode(data):
"""Return the most common data point from discrete or nominal data.
``mode`` assumes discrete data, and returns a single value. This is the
standard treatment of the mode as commonly taught in schools:
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
This also works with nominal (non-numeric) data:
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
If there is not exactly one most common value, ``mode`` will raise
StatisticsError.
"""
# Generate a table of sorted (value, frequency) pairs.
table = _counts(data)
if len(table) == 1:
return table[0][0]
elif table:
raise StatisticsError(
'no unique mode; found %d equally common values' % len(table)
)
else:
raise StatisticsError('no mode for empty data')
# === Measures of spread ===
# See http://mathworld.wolfram.com/Variance.html
# http://mathworld.wolfram.com/SampleVariance.html
# http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
#
# Under no circumstances use the so-called "computational formula for
# variance", as that is only suitable for hand calculations with a small
# amount of low-precision data. It has terrible numeric properties.
#
# See a comparison of three computational methods here:
# http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
def _ss(data, c=None):
"""Return sum of square deviations of sequence data.
If ``c`` is None, the mean is calculated in one pass, and the deviations
from the mean are calculated in a second pass. Otherwise, deviations are
calculated from ``c`` as given. Use the second case with care, as it can
lead to garbage results.
"""
if c is None:
c = mean(data)
T, total, count = _sum((x-c)**2 for x in data)
# The following sum should mathematically equal zero, but due to rounding
# error may not.
U, total2, count2 = _sum((x-c) for x in data)
assert T == U and count == count2
total -= total2**2/len(data)
assert not total < 0, 'negative sum of square deviations: %f' % total
return (T, total)
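# Illustrative note: the subtraction above applies the correction
# SS = sum((x-c)**2) - (sum(x-c))**2/n, which compensates for a c that is
# not exactly the mean; when c is the exact mean the second term is
# mathematically zero.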
def variance(data, xbar=None):
"""Return the sample variance of data.
data should be an iterable of Real-valued numbers, with at least two
values. The optional argument xbar, if given, should be the mean of
the data. If it is missing or None, the mean is automatically calculated.
Use this function when your data is a sample from a population. To
calculate the variance from the entire population, see ``pvariance``.
Examples:
>>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
>>> variance(data)
1.3720238095238095
If you have already calculated the mean of your data, you can pass it as
the optional second argument ``xbar`` to avoid recalculating it:
>>> m = mean(data)
>>> variance(data, m)
1.3720238095238095
This function does not check that ``xbar`` is actually the mean of
``data``. Giving arbitrary values for ``xbar`` may lead to invalid or
impossible results.
Decimals and Fractions are supported:
>>> from decimal import Decimal as D
>>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
Decimal('31.01875')
>>> from fractions import Fraction as F
>>> variance([F(1, 6), F(1, 2), F(5, 3)])
Fraction(67, 108)
"""
if iter(data) is data:
data = list(data)
n = len(data)
if n < 2:
raise StatisticsError('variance requires at least two data points')
T, ss = _ss(data, xbar)
return _convert(ss/(n-1), T)
def pvariance(data, mu=None):
"""Return the population variance of ``data``.
data should be an iterable of Real-valued numbers, with at least one
value. The optional argument mu, if given, should be the mean of
the data. If it is missing or None, the mean is automatically calculated.
Use this function to calculate the variance from the entire population.
To estimate the variance from a sample, the ``variance`` function is
usually a better choice.
Examples:
>>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
>>> pvariance(data)
1.25
If you have already calculated the mean of the data, you can pass it as
the optional second argument to avoid recalculating it:
>>> mu = mean(data)
>>> pvariance(data, mu)
1.25
This function does not check that ``mu`` is actually the mean of ``data``.
Giving arbitrary values for ``mu`` may lead to invalid or impossible
results.
Decimals and Fractions are supported:
>>> from decimal import Decimal as D
>>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")])
Decimal('24.815')
>>> from fractions import Fraction as F
>>> pvariance([F(1, 4), F(5, 4), F(1, 2)])
Fraction(13, 72)
"""
if iter(data) is data:
data = list(data)
n = len(data)
if n < 1:
raise StatisticsError('pvariance requires at least one data point')
T, ss = _ss(data, mu)
return _convert(ss/n, T)
def stdev(data, xbar=None):
"""Return the square root of the sample variance.
See ``variance`` for arguments and other details.
>>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
1.0810874155219827
"""
var = variance(data, xbar)
try:
return var.sqrt()
except AttributeError:
return math.sqrt(var)
def pstdev(data, mu=None):
"""Return the square root of the population variance.
See ``pvariance`` for arguments and other details.
>>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
0.986893273527251
"""
var = pvariance(data, mu)
try:
return var.sqrt()
except AttributeError:
return math.sqrt(var)
"""A collection of string constants.
Public module variables:
whitespace -- a string containing all ASCII whitespace
ascii_lowercase -- a string containing all ASCII lowercase letters
ascii_uppercase -- a string containing all ASCII uppercase letters
ascii_letters -- a string containing all ASCII letters
digits -- a string containing all ASCII decimal digits
hexdigits -- a string containing all ASCII hexadecimal digits
octdigits -- a string containing all ASCII octal digits
punctuation -- a string containing all ASCII punctuation characters
printable -- a string containing all ASCII characters considered printable
"""
import _string
# Some strings for ctype-style character classification
whitespace = ' \t\n\r\v\f'
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
ascii_letters = ascii_lowercase + ascii_uppercase
digits = '0123456789'
hexdigits = digits + 'abcdef' + 'ABCDEF'
octdigits = '01234567'
punctuation = """!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
printable = digits + ascii_letters + punctuation + whitespace
# Functions which aren't available as string methods.
# Capitalize the words in a string, e.g. " aBc dEf " -> "Abc Def".
def capwords(s, sep=None):
"""capwords(s [,sep]) -> string
Split the argument into words using split, capitalize each
word using capitalize, and join the capitalized words using
join. If the optional second argument sep is absent or None,
runs of whitespace characters are replaced by a single space
and leading and trailing whitespace are removed, otherwise
sep is used to split and join the words.
"""
return (sep or ' ').join(x.capitalize() for x in s.split(sep))
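# Illustrative usage (doctest style):
# >>> capwords(' aBc  dEf ')
# 'Abc Def'
# >>> capwords('x,y,z', sep=',')
# 'X,Y,Z'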
####################################################################
import re as _re
from collections import ChainMap
class _TemplateMetaclass(type):
pattern = r"""
%(delim)s(?:
(?P<escaped>%(delim)s) | # Escape sequence of two delimiters
(?P<named>%(id)s) | # delimiter and a Python identifier
{(?P<braced>%(id)s)} | # delimiter and a braced identifier
(?P<invalid>) # Other ill-formed delimiter exprs
)
"""
def __init__(cls, name, bases, dct):
super(_TemplateMetaclass, cls).__init__(name, bases, dct)
if 'pattern' in dct:
pattern = cls.pattern
else:
pattern = _TemplateMetaclass.pattern % {
'delim' : _re.escape(cls.delimiter),
'id' : cls.idpattern,
}
cls.pattern = _re.compile(pattern, cls.flags | _re.VERBOSE)
class Template(metaclass=_TemplateMetaclass):
"""A string class for supporting $-substitutions."""
delimiter = '$'
idpattern = r'[_a-z][_a-z0-9]*'
flags = _re.IGNORECASE
def __init__(self, template):
self.template = template
# Search for $$, $identifier, ${identifier}, and any bare $'s
def _invalid(self, mo):
i = mo.start('invalid')
lines = self.template[:i].splitlines(keepends=True)
if not lines:
colno = 1
lineno = 1
else:
colno = i - len(''.join(lines[:-1]))
lineno = len(lines)
raise ValueError('Invalid placeholder in string: line %d, col %d' %
(lineno, colno))
def substitute(*args, **kws):
if not args:
raise TypeError("descriptor 'substitute' of 'Template' object "
"needs an argument")
self, *args = args # allow the "self" keyword to be passed
if len(args) > 1:
raise TypeError('Too many positional arguments')
if not args:
mapping = kws
elif kws:
mapping = ChainMap(kws, args[0])
else:
mapping = args[0]
# Helper function for .sub()
def convert(mo):
# Check the most common path first.
named = mo.group('named') or mo.group('braced')
if named is not None:
val = mapping[named]
# We use this idiom instead of str() because the latter will
# fail if val is a Unicode containing non-ASCII characters.
return '%s' % (val,)
if mo.group('escaped') is not None:
return self.delimiter
if mo.group('invalid') is not None:
self._invalid(mo)
raise ValueError('Unrecognized named group in pattern',
self.pattern)
return self.pattern.sub(convert, self.template)
def safe_substitute(*args, **kws):
if not args:
raise TypeError("descriptor 'safe_substitute' of 'Template' object "
"needs an argument")
self, *args = args # allow the "self" keyword to be passed
if len(args) > 1:
raise TypeError('Too many positional arguments')
if not args:
mapping = kws
elif kws:
mapping = ChainMap(kws, args[0])
else:
mapping = args[0]
# Helper function for .sub()
def convert(mo):
named = mo.group('named') or mo.group('braced')
if named is not None:
try:
# We use this idiom instead of str() because the latter
# will fail if val is a Unicode containing non-ASCII
return '%s' % (mapping[named],)
except KeyError:
return mo.group()
if mo.group('escaped') is not None:
return self.delimiter
if mo.group('invalid') is not None:
return mo.group()
raise ValueError('Unrecognized named group in pattern',
self.pattern)
return self.pattern.sub(convert, self.template)
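# Illustrative usage (doctest style): substitute() raises on missing keys,
# while safe_substitute() leaves unresolved placeholders in place.
# >>> t = Template('$who owes $$${amount}')
# >>> t.substitute(who='tim', amount=100)
# 'tim owes $100'
# >>> t.safe_substitute(who='tim')
# 'tim owes $${amount}'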
########################################################################
# the Formatter class
# see PEP 3101 for details and purpose of this class
# The hard parts are reused from the C implementation. They're exposed as "_"
# prefixed methods of str.
# The overall parser is implemented in _string.formatter_parser.
# The field name parser is implemented in _string.formatter_field_name_split
class Formatter:
def format(*args, **kwargs):
if not args:
raise TypeError("descriptor 'format' of 'Formatter' object "
"needs an argument")
self, *args = args # allow the "self" keyword to be passed
try:
format_string, *args = args # allow the "format_string" keyword to be passed
except ValueError:
if 'format_string' in kwargs:
format_string = kwargs.pop('format_string')
else:
raise TypeError("format() missing 1 required positional "
"argument: 'format_string'") from None
return self.vformat(format_string, args, kwargs)
def vformat(self, format_string, args, kwargs):
used_args = set()
result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
self.check_unused_args(used_args, args, kwargs)
return result
def _vformat(self, format_string, args, kwargs, used_args, recursion_depth,
auto_arg_index=0):
if recursion_depth < 0:
raise ValueError('Max string recursion exceeded')
result = []
for literal_text, field_name, format_spec, conversion in \
self.parse(format_string):
# output the literal text
if literal_text:
result.append(literal_text)
# if there's a field, output it
if field_name is not None:
# this is some markup, find the object and do
# the formatting
# handle arg indexing when empty field_names are given.
if field_name == '':
if auto_arg_index is False:
raise ValueError('cannot switch from manual field '
'specification to automatic field '
'numbering')
field_name = str(auto_arg_index)
auto_arg_index += 1
elif field_name.isdigit():
if auto_arg_index:
raise ValueError('cannot switch from manual field '
'specification to automatic field '
'numbering')
# disable auto arg incrementing, if it gets
# used later on, then an exception will be raised
auto_arg_index = False
# given the field_name, find the object it references
# and the argument it came from
obj, arg_used = self.get_field(field_name, args, kwargs)
used_args.add(arg_used)
# do any conversion on the resulting object
obj = self.convert_field(obj, conversion)
# expand the format spec, if needed
format_spec, auto_arg_index = self._vformat(
format_spec, args, kwargs,
used_args, recursion_depth-1,
auto_arg_index=auto_arg_index)
# format the object and append to the result
result.append(self.format_field(obj, format_spec))
return ''.join(result), auto_arg_index
def get_value(self, key, args, kwargs):
if isinstance(key, int):
return args[key]
else:
return kwargs[key]
def check_unused_args(self, used_args, args, kwargs):
pass
def format_field(self, value, format_spec):
return format(value, format_spec)
def convert_field(self, value, conversion):
# do any conversion on the resulting object
if conversion is None:
return value
elif conversion == 's':
return str(value)
elif conversion == 'r':
return repr(value)
elif conversion == 'a':
return ascii(value)
raise ValueError("Unknown conversion specifier {0!s}".format(conversion))
# returns an iterable that contains tuples of the form:
# (literal_text, field_name, format_spec, conversion)
# literal_text can be zero length
# field_name can be None, in which case there's no
# object to format and output
# if field_name is not None, it is looked up, formatted
# with format_spec and conversion and then used
def parse(self, format_string):
return _string.formatter_parser(format_string)
# given a field_name, find the object it references.
# field_name: the field being looked up, e.g. "0.name"
# or "lookup[3]"
# used_args: a set of which args have been used
# args, kwargs: as passed in to vformat
def get_field(self, field_name, args, kwargs):
first, rest = _string.formatter_field_name_split(field_name)
obj = self.get_value(first, args, kwargs)
# loop through the rest of the field_name, doing
# getattr or getitem as needed
for is_attr, i in rest:
if is_attr:
obj = getattr(obj, i)
else:
obj = obj[i]
return obj, first
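# Illustrative usage: the stock Formatter reproduces str.format() semantics,
# and get_value() is the documented hook for customising lookups.
# >>> Formatter().format('{0} + {1} = {2}', 1, 2, 3)
# '1 + 2 = 3'
# A minimal subclass sketch (hypothetical, not part of this module):
# class DefaultFormatter(Formatter):
#     def get_value(self, key, args, kwargs):
#         # Fall back to a placeholder for missing keyword fields.
#         if isinstance(key, str) and key not in kwargs:
#             return '<missing>'
#         return super().get_value(key, args, kwargs)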
# This file is generated by mkstringprep.py. DO NOT EDIT.
"""Library that exposes various tables found in the StringPrep RFC 3454.
There are two kinds of tables: sets, for which a member test is provided,
and mappings, for which a mapping function is provided.
"""
from unicodedata import ucd_3_2_0 as unicodedata
assert unicodedata.unidata_version == '3.2.0'
def in_table_a1(code):
if unicodedata.category(code) != 'Cn': return False
c = ord(code)
if 0xFDD0 <= c < 0xFDF0: return False
return (c & 0xFFFF) not in (0xFFFE, 0xFFFF)
b1_set = set([173, 847, 6150, 6155, 6156, 6157, 8203, 8204, 8205, 8288, 65279] + list(range(65024,65040)))
def in_table_b1(code):
return ord(code) in b1_set
b3_exceptions = {
0xb5:'\u03bc', 0xdf:'ss', 0x130:'i\u0307', 0x149:'\u02bcn',
0x17f:'s', 0x1f0:'j\u030c', 0x345:'\u03b9', 0x37a:' \u03b9',
0x390:'\u03b9\u0308\u0301', 0x3b0:'\u03c5\u0308\u0301', 0x3c2:'\u03c3', 0x3d0:'\u03b2',
0x3d1:'\u03b8', 0x3d2:'\u03c5', 0x3d3:'\u03cd', 0x3d4:'\u03cb',
0x3d5:'\u03c6', 0x3d6:'\u03c0', 0x3f0:'\u03ba', 0x3f1:'\u03c1',
0x3f2:'\u03c3', 0x3f5:'\u03b5', 0x587:'\u0565\u0582', 0x1e96:'h\u0331',
0x1e97:'t\u0308', 0x1e98:'w\u030a', 0x1e99:'y\u030a', 0x1e9a:'a\u02be',
0x1e9b:'\u1e61', 0x1f50:'\u03c5\u0313', 0x1f52:'\u03c5\u0313\u0300', 0x1f54:'\u03c5\u0313\u0301',
0x1f56:'\u03c5\u0313\u0342', 0x1f80:'\u1f00\u03b9', 0x1f81:'\u1f01\u03b9', 0x1f82:'\u1f02\u03b9',
0x1f83:'\u1f03\u03b9', 0x1f84:'\u1f04\u03b9', 0x1f85:'\u1f05\u03b9', 0x1f86:'\u1f06\u03b9',
0x1f87:'\u1f07\u03b9', 0x1f88:'\u1f00\u03b9', 0x1f89:'\u1f01\u03b9', 0x1f8a:'\u1f02\u03b9',
0x1f8b:'\u1f03\u03b9', 0x1f8c:'\u1f04\u03b9', 0x1f8d:'\u1f05\u03b9', 0x1f8e:'\u1f06\u03b9',
0x1f8f:'\u1f07\u03b9', 0x1f90:'\u1f20\u03b9', 0x1f91:'\u1f21\u03b9', 0x1f92:'\u1f22\u03b9',
0x1f93:'\u1f23\u03b9', 0x1f94:'\u1f24\u03b9', 0x1f95:'\u1f25\u03b9', 0x1f96:'\u1f26\u03b9',
0x1f97:'\u1f27\u03b9', 0x1f98:'\u1f20\u03b9', 0x1f99:'\u1f21\u03b9', 0x1f9a:'\u1f22\u03b9',
0x1f9b:'\u1f23\u03b9', 0x1f9c:'\u1f24\u03b9', 0x1f9d:'\u1f25\u03b9', 0x1f9e:'\u1f26\u03b9',
0x1f9f:'\u1f27\u03b9', 0x1fa0:'\u1f60\u03b9', 0x1fa1:'\u1f61\u03b9', 0x1fa2:'\u1f62\u03b9',
0x1fa3:'\u1f63\u03b9', 0x1fa4:'\u1f64\u03b9', 0x1fa5:'\u1f65\u03b9', 0x1fa6:'\u1f66\u03b9',
0x1fa7:'\u1f67\u03b9', 0x1fa8:'\u1f60\u03b9', 0x1fa9:'\u1f61\u03b9', 0x1faa:'\u1f62\u03b9',
0x1fab:'\u1f63\u03b9', 0x1fac:'\u1f64\u03b9', 0x1fad:'\u1f65\u03b9', 0x1fae:'\u1f66\u03b9',
0x1faf:'\u1f67\u03b9', 0x1fb2:'\u1f70\u03b9', 0x1fb3:'\u03b1\u03b9', 0x1fb4:'\u03ac\u03b9',
0x1fb6:'\u03b1\u0342', 0x1fb7:'\u03b1\u0342\u03b9', 0x1fbc:'\u03b1\u03b9', 0x1fbe:'\u03b9',
0x1fc2:'\u1f74\u03b9', 0x1fc3:'\u03b7\u03b9', 0x1fc4:'\u03ae\u03b9', 0x1fc6:'\u03b7\u0342',
0x1fc7:'\u03b7\u0342\u03b9', 0x1fcc:'\u03b7\u03b9', 0x1fd2:'\u03b9\u0308\u0300', 0x1fd3:'\u03b9\u0308\u0301',
0x1fd6:'\u03b9\u0342', 0x1fd7:'\u03b9\u0308\u0342', 0x1fe2:'\u03c5\u0308\u0300', 0x1fe3:'\u03c5\u0308\u0301',
0x1fe4:'\u03c1\u0313', 0x1fe6:'\u03c5\u0342', 0x1fe7:'\u03c5\u0308\u0342', 0x1ff2:'\u1f7c\u03b9',
0x1ff3:'\u03c9\u03b9', 0x1ff4:'\u03ce\u03b9', 0x1ff6:'\u03c9\u0342', 0x1ff7:'\u03c9\u0342\u03b9',
0x1ffc:'\u03c9\u03b9', 0x20a8:'rs', 0x2102:'c', 0x2103:'\xb0c',
0x2107:'\u025b', 0x2109:'\xb0f', 0x210b:'h', 0x210c:'h',
0x210d:'h', 0x2110:'i', 0x2111:'i', 0x2112:'l',
0x2115:'n', 0x2116:'no', 0x2119:'p', 0x211a:'q',
0x211b:'r', 0x211c:'r', 0x211d:'r', 0x2120:'sm',
0x2121:'tel', 0x2122:'tm', 0x2124:'z', 0x2128:'z',
0x212c:'b', 0x212d:'c', 0x2130:'e', 0x2131:'f',
0x2133:'m', 0x213e:'\u03b3', 0x213f:'\u03c0', 0x2145:'d',
0x3371:'hpa', 0x3373:'au', 0x3375:'ov', 0x3380:'pa',
0x3381:'na', 0x3382:'\u03bca', 0x3383:'ma', 0x3384:'ka',
0x3385:'kb', 0x3386:'mb', 0x3387:'gb', 0x338a:'pf',
0x338b:'nf', 0x338c:'\u03bcf', 0x3390:'hz', 0x3391:'khz',
0x3392:'mhz', 0x3393:'ghz', 0x3394:'thz', 0x33a9:'pa',
0x33aa:'kpa', 0x33ab:'mpa', 0x33ac:'gpa', 0x33b4:'pv',
0x33b5:'nv', 0x33b6:'\u03bcv', 0x33b7:'mv', 0x33b8:'kv',
0x33b9:'mv', 0x33ba:'pw', 0x33bb:'nw', 0x33bc:'\u03bcw',
0x33bd:'mw', 0x33be:'kw', 0x33bf:'mw', 0x33c0:'k\u03c9',
0x33c1:'m\u03c9', 0x33c3:'bq', 0x33c6:'c\u2215kg', 0x33c7:'co.',
0x33c8:'db', 0x33c9:'gy', 0x33cb:'hp', 0x33cd:'kk',
0x33ce:'km', 0x33d7:'ph', 0x33d9:'ppm', 0x33da:'pr',
0x33dc:'sv', 0x33dd:'wb', 0xfb00:'ff', 0xfb01:'fi',
0xfb02:'fl', 0xfb03:'ffi', 0xfb04:'ffl', 0xfb05:'st',
0xfb06:'st', 0xfb13:'\u0574\u0576', 0xfb14:'\u0574\u0565', 0xfb15:'\u0574\u056b',
0xfb16:'\u057e\u0576', 0xfb17:'\u0574\u056d', 0x1d400:'a', 0x1d401:'b',
0x1d402:'c', 0x1d403:'d', 0x1d404:'e', 0x1d405:'f',
0x1d406:'g', 0x1d407:'h', 0x1d408:'i', 0x1d409:'j',
0x1d40a:'k', 0x1d40b:'l', 0x1d40c:'m', 0x1d40d:'n',
0x1d40e:'o', 0x1d40f:'p', 0x1d410:'q', 0x1d411:'r',
0x1d412:'s', 0x1d413:'t', 0x1d414:'u', 0x1d415:'v',
0x1d416:'w', 0x1d417:'x', 0x1d418:'y', 0x1d419:'z',
0x1d434:'a', 0x1d435:'b', 0x1d436:'c', 0x1d437:'d',
0x1d438:'e', 0x1d439:'f', 0x1d43a:'g', 0x1d43b:'h',
0x1d43c:'i', 0x1d43d:'j', 0x1d43e:'k', 0x1d43f:'l',
0x1d440:'m', 0x1d441:'n', 0x1d442:'o', 0x1d443:'p',
0x1d444:'q', 0x1d445:'r', 0x1d446:'s', 0x1d447:'t',
0x1d448:'u', 0x1d449:'v', 0x1d44a:'w', 0x1d44b:'x',
0x1d44c:'y', 0x1d44d:'z', 0x1d468:'a', 0x1d469:'b',
0x1d46a:'c', 0x1d46b:'d', 0x1d46c:'e', 0x1d46d:'f',
0x1d46e:'g', 0x1d46f:'h', 0x1d470:'i', 0x1d471:'j',
0x1d472:'k', 0x1d473:'l', 0x1d474:'m', 0x1d475:'n',
0x1d476:'o', 0x1d477:'p', 0x1d478:'q', 0x1d479:'r',
0x1d47a:'s', 0x1d47b:'t', 0x1d47c:'u', 0x1d47d:'v',
0x1d47e:'w', 0x1d47f:'x', 0x1d480:'y', 0x1d481:'z',
0x1d49c:'a', 0x1d49e:'c', 0x1d49f:'d', 0x1d4a2:'g',
0x1d4a5:'j', 0x1d4a6:'k', 0x1d4a9:'n', 0x1d4aa:'o',
0x1d4ab:'p', 0x1d4ac:'q', 0x1d4ae:'s', 0x1d4af:'t',
0x1d4b0:'u', 0x1d4b1:'v', 0x1d4b2:'w', 0x1d4b3:'x',
0x1d4b4:'y', 0x1d4b5:'z', 0x1d4d0:'a', 0x1d4d1:'b',
0x1d4d2:'c', 0x1d4d3:'d', 0x1d4d4:'e', 0x1d4d5:'f',
0x1d4d6:'g', 0x1d4d7:'h', 0x1d4d8:'i', 0x1d4d9:'j',
0x1d4da:'k', 0x1d4db:'l', 0x1d4dc:'m', 0x1d4dd:'n',
0x1d4de:'o', 0x1d4df:'p', 0x1d4e0:'q', 0x1d4e1:'r',
0x1d4e2:'s', 0x1d4e3:'t', 0x1d4e4:'u', 0x1d4e5:'v',
0x1d4e6:'w', 0x1d4e7:'x', 0x1d4e8:'y', 0x1d4e9:'z',
0x1d504:'a', 0x1d505:'b', 0x1d507:'d', 0x1d508:'e',
0x1d509:'f', 0x1d50a:'g', 0x1d50d:'j', 0x1d50e:'k',
0x1d50f:'l', 0x1d510:'m', 0x1d511:'n', 0x1d512:'o',
0x1d513:'p', 0x1d514:'q', 0x1d516:'s', 0x1d517:'t',
0x1d518:'u', 0x1d519:'v', 0x1d51a:'w', 0x1d51b:'x',
0x1d51c:'y', 0x1d538:'a', 0x1d539:'b', 0x1d53b:'d',
0x1d53c:'e', 0x1d53d:'f', 0x1d53e:'g', 0x1d540:'i',
0x1d541:'j', 0x1d542:'k', 0x1d543:'l', 0x1d544:'m',
0x1d546:'o', 0x1d54a:'s', 0x1d54b:'t', 0x1d54c:'u',
0x1d54d:'v', 0x1d54e:'w', 0x1d54f:'x', 0x1d550:'y',
0x1d56c:'a', 0x1d56d:'b', 0x1d56e:'c', 0x1d56f:'d',
0x1d570:'e', 0x1d571:'f', 0x1d572:'g', 0x1d573:'h',
0x1d574:'i', 0x1d575:'j', 0x1d576:'k', 0x1d577:'l',
0x1d578:'m', 0x1d579:'n', 0x1d57a:'o', 0x1d57b:'p',
0x1d57c:'q', 0x1d57d:'r', 0x1d57e:'s', 0x1d57f:'t',
0x1d580:'u', 0x1d581:'v', 0x1d582:'w', 0x1d583:'x',
0x1d584:'y', 0x1d585:'z', 0x1d5a0:'a', 0x1d5a1:'b',
0x1d5a2:'c', 0x1d5a3:'d', 0x1d5a4:'e', 0x1d5a5:'f',
0x1d5a6:'g', 0x1d5a7:'h', 0x1d5a8:'i', 0x1d5a9:'j',
0x1d5aa:'k', 0x1d5ab:'l', 0x1d5ac:'m', 0x1d5ad:'n',
0x1d5ae:'o', 0x1d5af:'p', 0x1d5b0:'q', 0x1d5b1:'r',
0x1d5b2:'s', 0x1d5b3:'t', 0x1d5b4:'u', 0x1d5b5:'v',
0x1d5b6:'w', 0x1d5b7:'x', 0x1d5b8:'y', 0x1d5b9:'z',
0x1d5d4:'a', 0x1d5d5:'b', 0x1d5d6:'c', 0x1d5d7:'d',
0x1d5d8:'e', 0x1d5d9:'f', 0x1d5da:'g', 0x1d5db:'h',
0x1d5dc:'i', 0x1d5dd:'j', 0x1d5de:'k', 0x1d5df:'l',
0x1d5e0:'m', 0x1d5e1:'n', 0x1d5e2:'o', 0x1d5e3:'p',
0x1d5e4:'q', 0x1d5e5:'r', 0x1d5e6:'s', 0x1d5e7:'t',
0x1d5e8:'u', 0x1d5e9:'v', 0x1d5ea:'w', 0x1d5eb:'x',
0x1d5ec:'y', 0x1d5ed:'z', 0x1d608:'a', 0x1d609:'b',
0x1d60a:'c', 0x1d60b:'d', 0x1d60c:'e', 0x1d60d:'f',
0x1d60e:'g', 0x1d60f:'h', 0x1d610:'i', 0x1d611:'j',
0x1d612:'k', 0x1d613:'l', 0x1d614:'m', 0x1d615:'n',
0x1d616:'o', 0x1d617:'p', 0x1d618:'q', 0x1d619:'r',
0x1d61a:'s', 0x1d61b:'t', 0x1d61c:'u', 0x1d61d:'v',
0x1d61e:'w', 0x1d61f:'x', 0x1d620:'y', 0x1d621:'z',
0x1d63c:'a', 0x1d63d:'b', 0x1d63e:'c', 0x1d63f:'d',
0x1d640:'e', 0x1d641:'f', 0x1d642:'g', 0x1d643:'h',
0x1d644:'i', 0x1d645:'j', 0x1d646:'k', 0x1d647:'l',
0x1d648:'m', 0x1d649:'n', 0x1d64a:'o', 0x1d64b:'p',
0x1d64c:'q', 0x1d64d:'r', 0x1d64e:'s', 0x1d64f:'t',
0x1d650:'u', 0x1d651:'v', 0x1d652:'w', 0x1d653:'x',
0x1d654:'y', 0x1d655:'z', 0x1d670:'a', 0x1d671:'b',
0x1d672:'c', 0x1d673:'d', 0x1d674:'e', 0x1d675:'f',
0x1d676:'g', 0x1d677:'h', 0x1d678:'i', 0x1d679:'j',
0x1d67a:'k', 0x1d67b:'l', 0x1d67c:'m', 0x1d67d:'n',
0x1d67e:'o', 0x1d67f:'p', 0x1d680:'q', 0x1d681:'r',
0x1d682:'s', 0x1d683:'t', 0x1d684:'u', 0x1d685:'v',
0x1d686:'w', 0x1d687:'x', 0x1d688:'y', 0x1d689:'z',
0x1d6a8:'\u03b1', 0x1d6a9:'\u03b2', 0x1d6aa:'\u03b3', 0x1d6ab:'\u03b4',
0x1d6ac:'\u03b5', 0x1d6ad:'\u03b6', 0x1d6ae:'\u03b7', 0x1d6af:'\u03b8',
0x1d6b0:'\u03b9', 0x1d6b1:'\u03ba', 0x1d6b2:'\u03bb', 0x1d6b3:'\u03bc',
0x1d6b4:'\u03bd', 0x1d6b5:'\u03be', 0x1d6b6:'\u03bf', 0x1d6b7:'\u03c0',
0x1d6b8:'\u03c1', 0x1d6b9:'\u03b8', 0x1d6ba:'\u03c3', 0x1d6bb:'\u03c4',
0x1d6bc:'\u03c5', 0x1d6bd:'\u03c6', 0x1d6be:'\u03c7', 0x1d6bf:'\u03c8',
0x1d6c0:'\u03c9', 0x1d6d3:'\u03c3', 0x1d6e2:'\u03b1', 0x1d6e3:'\u03b2',
0x1d6e4:'\u03b3', 0x1d6e5:'\u03b4', 0x1d6e6:'\u03b5', 0x1d6e7:'\u03b6',
0x1d6e8:'\u03b7', 0x1d6e9:'\u03b8', 0x1d6ea:'\u03b9', 0x1d6eb:'\u03ba',
0x1d6ec:'\u03bb', 0x1d6ed:'\u03bc', 0x1d6ee:'\u03bd', 0x1d6ef:'\u03be',
0x1d6f0:'\u03bf', 0x1d6f1:'\u03c0', 0x1d6f2:'\u03c1', 0x1d6f3:'\u03b8',
0x1d6f4:'\u03c3', 0x1d6f5:'\u03c4', 0x1d6f6:'\u03c5', 0x1d6f7:'\u03c6',
0x1d6f8:'\u03c7', 0x1d6f9:'\u03c8', 0x1d6fa:'\u03c9', 0x1d70d:'\u03c3',
0x1d71c:'\u03b1', 0x1d71d:'\u03b2', 0x1d71e:'\u03b3', 0x1d71f:'\u03b4',
0x1d720:'\u03b5', 0x1d721:'\u03b6', 0x1d722:'\u03b7', 0x1d723:'\u03b8',
0x1d724:'\u03b9', 0x1d725:'\u03ba', 0x1d726:'\u03bb', 0x1d727:'\u03bc',
0x1d728:'\u03bd', 0x1d729:'\u03be', 0x1d72a:'\u03bf', 0x1d72b:'\u03c0',
0x1d72c:'\u03c1', 0x1d72d:'\u03b8', 0x1d72e:'\u03c3', 0x1d72f:'\u03c4',
0x1d730:'\u03c5', 0x1d731:'\u03c6', 0x1d732:'\u03c7', 0x1d733:'\u03c8',
0x1d734:'\u03c9', 0x1d747:'\u03c3', 0x1d756:'\u03b1', 0x1d757:'\u03b2',
0x1d758:'\u03b3', 0x1d759:'\u03b4', 0x1d75a:'\u03b5', 0x1d75b:'\u03b6',
0x1d75c:'\u03b7', 0x1d75d:'\u03b8', 0x1d75e:'\u03b9', 0x1d75f:'\u03ba',
0x1d760:'\u03bb', 0x1d761:'\u03bc', 0x1d762:'\u03bd', 0x1d763:'\u03be',
0x1d764:'\u03bf', 0x1d765:'\u03c0', 0x1d766:'\u03c1', 0x1d767:'\u03b8',
0x1d768:'\u03c3', 0x1d769:'\u03c4', 0x1d76a:'\u03c5', 0x1d76b:'\u03c6',
0x1d76c:'\u03c7', 0x1d76d:'\u03c8', 0x1d76e:'\u03c9', 0x1d781:'\u03c3',
0x1d790:'\u03b1', 0x1d791:'\u03b2', 0x1d792:'\u03b3', 0x1d793:'\u03b4',
0x1d794:'\u03b5', 0x1d795:'\u03b6', 0x1d796:'\u03b7', 0x1d797:'\u03b8',
0x1d798:'\u03b9', 0x1d799:'\u03ba', 0x1d79a:'\u03bb', 0x1d79b:'\u03bc',
0x1d79c:'\u03bd', 0x1d79d:'\u03be', 0x1d79e:'\u03bf', 0x1d79f:'\u03c0',
0x1d7a0:'\u03c1', 0x1d7a1:'\u03b8', 0x1d7a2:'\u03c3', 0x1d7a3:'\u03c4',
0x1d7a4:'\u03c5', 0x1d7a5:'\u03c6', 0x1d7a6:'\u03c7', 0x1d7a7:'\u03c8',
0x1d7a8:'\u03c9', 0x1d7bb:'\u03c3', }
def map_table_b3(code):
r = b3_exceptions.get(ord(code))
if r is not None: return r
return code.lower()
def map_table_b2(a):
al = map_table_b3(a)
b = unicodedata.normalize("NFKC", al)
bl = "".join([map_table_b3(ch) for ch in b])
c = unicodedata.normalize("NFKC", bl)
if b != c:
return c
else:
return al
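# Illustrative sketch of the B.3 mapping above, derived from the exceptions
# table:
# >>> map_table_b3('\u00df')    # LATIN SMALL LETTER SHARP S
# 'ss'
# >>> map_table_b3('A')
# 'a'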
def in_table_c11(code):
return code == " "
def in_table_c12(code):
return unicodedata.category(code) == "Zs" and code != " "
def in_table_c11_c12(code):
return unicodedata.category(code) == "Zs"
def in_table_c21(code):
return ord(code) < 128 and unicodedata.category(code) == "Cc"
c22_specials = set([1757, 1807, 6158, 8204, 8205, 8232, 8233, 65279] + list(range(8288,8292)) + list(range(8298,8304)) + list(range(65529,65533)) + list(range(119155,119163)))
def in_table_c22(code):
c = ord(code)
if c < 128: return False
if unicodedata.category(code) == "Cc": return True
return c in c22_specials
def in_table_c21_c22(code):
return unicodedata.category(code) == "Cc" or \
ord(code) in c22_specials
def in_table_c3(code):
return unicodedata.category(code) == "Co"
def in_table_c4(code):
c = ord(code)
if c < 0xFDD0: return False
if c < 0xFDF0: return True
return (c & 0xFFFF) in (0xFFFE, 0xFFFF)
def in_table_c5(code):
return unicodedata.category(code) == "Cs"
c6_set = set(range(65529,65534))
def in_table_c6(code):
return ord(code) in c6_set
c7_set = set(range(12272,12284))
def in_table_c7(code):
return ord(code) in c7_set
c8_set = set([832, 833, 8206, 8207] + list(range(8234,8239)) + list(range(8298,8304)))
def in_table_c8(code):
return ord(code) in c8_set
c9_set = set([917505] + list(range(917536,917632)))
def in_table_c9(code):
return ord(code) in c9_set
def in_table_d1(code):
return unicodedata.bidirectional(code) in ("R","AL")
def in_table_d2(code):
return unicodedata.bidirectional(code) == "L"
__all__ = [
# Functions
'calcsize', 'pack', 'pack_into', 'unpack', 'unpack_from',
'iter_unpack',
# Classes
'Struct',
# Exceptions
'error'
]
from _struct import *
from _struct import _clearcache
from _struct import __doc__
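# Illustrative usage (doctest style): pack two shorts and a long,
# little-endian with standard sizes, then round-trip them.
# >>> from struct import pack, unpack, calcsize
# >>> calcsize('<hhl')
# 8
# >>> pack('<hhl', 1, 2, 3)
# b'\x01\x00\x02\x00\x03\x00\x00\x00'
# >>> unpack('<hhl', b'\x01\x00\x02\x00\x03\x00\x00\x00')
# (1, 2, 3)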
# subprocess - Subprocesses with accessible I/O streams
#
# For more information about this module, see PEP 324.
#
# Copyright (c) 2003-2005 by Peter Astrand <[email protected]>
#
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/2.4/license for licensing details.
r"""subprocess - Subprocesses with accessible I/O streams
This module allows you to spawn processes, connect to their
input/output/error pipes, and obtain their return codes. This module
intends to replace several older modules and functions:
os.system
os.spawn*
Information about how the subprocess module can be used to replace these
modules and functions can be found below.
Using the subprocess module
===========================
This module defines one class called Popen:
class Popen(args, bufsize=-1, executable=None,
stdin=None, stdout=None, stderr=None,
preexec_fn=None, close_fds=True, shell=False,
cwd=None, env=None, universal_newlines=False,
startupinfo=None, creationflags=0,
restore_signals=True, start_new_session=False, pass_fds=()):
Arguments are:
args should be a string, or a sequence of program arguments. The
program to execute is normally the first item in the args sequence or
string, but can be explicitly set by using the executable argument.
On POSIX, with shell=False (default): In this case, the Popen class
uses os.execvp() to execute the child program. args should normally
be a sequence. A string will be treated as a sequence with the string
as the only item (the program to execute).
On POSIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.
On Windows: the Popen class uses CreateProcess() to execute the child
program, which operates on strings. If args is a sequence, it will be
converted to a string using the list2cmdline method. Please note that
not all MS Windows applications interpret the command line the same
way: The list2cmdline is designed for applications using the same
rules as the MS C runtime.
bufsize will be supplied as the corresponding argument to the io.open()
function when creating the stdin/stdout/stderr pipe file objects:
0 means unbuffered (read & write are one system call and can return short),
1 means line buffered, any other positive value means use a buffer of
approximately that size. A negative bufsize, the default, means the system
default of io.DEFAULT_BUFFER_SIZE will be used.
stdin, stdout and stderr specify the executed program's standard
input, standard output and standard error file handles, respectively.
Valid values are PIPE, an existing file descriptor (a positive
integer), an existing file object, and None. PIPE indicates that a
new pipe to the child should be created. With None, no redirection
will occur; the child's file handles will be inherited from the
parent. Additionally, stderr can be STDOUT, which indicates that the
stderr data from the applications should be captured into the same
file handle as for stdout.
On POSIX, if preexec_fn is set to a callable object, this object will be
called in the child process just before the child is executed. The use
of preexec_fn is not thread safe, using it in the presence of threads
could lead to a deadlock in the child process before the new executable
is executed.
If close_fds is true, all file descriptors except 0, 1 and 2 will be
closed before the child process is executed. The default for close_fds
varies by platform: Always true on POSIX. True when stdin/stdout/stderr
are None on Windows, false otherwise.
pass_fds is an optional sequence of file descriptors to keep open between the
parent and child. Providing any pass_fds implicitly sets close_fds to true.
if shell is true, the specified command will be executed through the
shell.
If cwd is not None, the current directory will be changed to cwd
before the child is executed.
On POSIX, if restore_signals is True all signals that Python sets to
SIG_IGN are restored to SIG_DFL in the child process before the exec.
Currently this includes the SIGPIPE, SIGXFZ and SIGXFSZ signals. This
parameter does nothing on Windows.
On POSIX, if start_new_session is True, the setsid() system call will be made
in the child process prior to executing the command.
If env is not None, it defines the environment variables for the new
process.
If universal_newlines is false, the file objects stdin, stdout and stderr
are opened as binary files, and no line ending conversion is done.
If universal_newlines is true, the file objects stdout and stderr are
opened as text files, but lines may be terminated by any of '\n',
the Unix end-of-line convention, '\r', the old Macintosh convention or
'\r\n', the Windows convention. All of these external representations
are seen as '\n' by the Python program. Also, the newlines attribute
of the file objects stdout, stdin and stderr are not updated by the
communicate() method.
The startupinfo and creationflags, if given, will be passed to the
underlying CreateProcess() function. They can specify things such as
appearance of the main window and priority for the new process.
(Windows only)
This module also defines some shortcut functions:
call(*popenargs, **kwargs):
Run command with arguments. Wait for command to complete, then
return the returncode attribute.
The arguments are the same as for the Popen constructor. Example:
>>> retcode = subprocess.call(["ls", "-l"])
check_call(*popenargs, **kwargs):
Run command with arguments. Wait for command to complete. If the
exit code was zero then return, otherwise raise
CalledProcessError. The CalledProcessError object will have the
return code in the returncode attribute.
The arguments are the same as for the Popen constructor. Example:
>>> subprocess.check_call(["ls", "-l"])
0
getstatusoutput(cmd):
Return (status, output) of executing cmd in a shell.
Execute the string 'cmd' in a shell with 'check_output' and
return a 2-tuple (status, output). Universal newlines mode is used,
meaning that the result will be decoded to a string.
A trailing newline is stripped from the output.
The exit status for the command can be interpreted
according to the rules for the function 'wait'. Example:
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')
>>> subprocess.getstatusoutput('cat /bin/junk')
(256, 'cat: /bin/junk: No such file or directory')
>>> subprocess.getstatusoutput('/bin/junk')
(256, 'sh: /bin/junk: not found')
getoutput(cmd):
Return output (stdout or stderr) of executing cmd in a shell.
Like getstatusoutput(), except the exit status is ignored and the return
value is a string containing the command's output. Example:
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'
check_output(*popenargs, **kwargs):
Run command with arguments and return its output.
If the exit code was non-zero it raises a CalledProcessError. The
CalledProcessError object will have the return code in the returncode
attribute and output in the output attribute.
The arguments are the same as for the Popen constructor. Example:
>>> output = subprocess.check_output(["ls", "-l", "/dev/null"])
There is an additional optional argument, "input", allowing you to
pass a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument.
Exceptions
----------
Exceptions raised in the child process, before the new program has
started to execute, will be re-raised in the parent. Additionally,
the exception object will have one extra attribute called
'child_traceback', which is a string containing traceback information
from the child's point of view.
The most common exception raised is OSError. This occurs, for
example, when trying to execute a non-existent file. Applications
should prepare for OSErrors.
A ValueError will be raised if Popen is called with invalid arguments.
Exceptions defined within this module inherit from SubprocessError.
check_call() and check_output() will raise CalledProcessError if the
called process returns a non-zero return code. TimeoutExpired will
be raised if a timeout was specified and expired.
Security
--------
Unlike some other popen functions, this implementation will never call
/bin/sh implicitly. This means that all characters, including shell
metacharacters, can safely be passed to child processes.
Popen objects
=============
Instances of the Popen class have the following methods:
poll()
Check if child process has terminated. Returns returncode
attribute.
wait()
Wait for child process to terminate. Returns returncode attribute.
communicate(input=None)
Interact with process: Send data to stdin. Read data from stdout
and stderr, until end-of-file is reached. Wait for process to
terminate. The optional input argument should be a string to be
sent to the child process, or None, if no data should be sent to
the child.
communicate() returns a tuple (stdout, stderr).
Note: The data read is buffered in memory, so do not use this
method if the data size is large or unlimited.
The following attributes are also available:
stdin
If the stdin argument is PIPE, this attribute is a file object
that provides input to the child process. Otherwise, it is None.
stdout
If the stdout argument is PIPE, this attribute is a file object
that provides output from the child process. Otherwise, it is
None.
stderr
If the stderr argument is PIPE, this attribute is file object that
provides error output from the child process. Otherwise, it is
None.
pid
The process ID of the child process.
returncode
The child return code. A None value indicates that the process
hasn't terminated yet. A negative value -N indicates that the
child was terminated by signal N (POSIX only).
Replacing older functions with the subprocess module
====================================================
In this section, "a ==> b" means that b can be used as a replacement
for a.
Note: the older functions in this section fail (more or less) silently if
the executed program cannot be found, whereas this module raises an OSError
exception instead.
In the following examples, we assume that the subprocess module is
imported with "from subprocess import *".
Replacing /bin/sh shell backquote
---------------------------------
output=`mycmd myarg`
==>
output = Popen(["mycmd", "myarg"], stdout=PIPE).communicate()[0]
Replacing shell pipe line
-------------------------
output=`dmesg | grep hda`
==>
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
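(Illustrative note: it is common to also call p1.stdout.close() in the
parent after starting p2, so that p1 receives SIGPIPE if p2 exits early.)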
Replacing os.system()
---------------------
sts = os.system("mycmd" + " myarg")
==>
p = Popen("mycmd" + " myarg", shell=True)
pid, sts = os.waitpid(p.pid, 0)
Note:
* Calling the program through the shell is usually not required.
* It's easier to look at the returncode attribute than the
exitstatus.
A more real-world example would look like this:
try:
retcode = call("mycmd" + " myarg", shell=True)
if retcode < 0:
print("Child was terminated by signal", -retcode, file=sys.stderr)
else:
print("Child returned", retcode, file=sys.stderr)
except OSError as e:
print("Execution failed:", e, file=sys.stderr)
Replacing os.spawn*
-------------------
P_NOWAIT example:
pid = os.spawnlp(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg")
==>
pid = Popen(["/bin/mycmd", "myarg"]).pid
P_WAIT example:
retcode = os.spawnlp(os.P_WAIT, "/bin/mycmd", "mycmd", "myarg")
==>
retcode = call(["/bin/mycmd", "myarg"])
Vector example:
os.spawnvp(os.P_NOWAIT, path, args)
==>
Popen([path] + args[1:])
Environment example:
os.spawnlpe(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg", env)
==>
Popen(["/bin/mycmd", "myarg"], env={"PATH": "/usr/bin"})
"""
import sys
ironpython = (sys.implementation.name == 'ironpython')
mswindows = (sys.platform == "win32")
import io
import os
import time
import signal
import builtins
import warnings
import errno
try:
from time import monotonic as _time
except ImportError:
from time import time as _time
# Exception classes used by this module.
class SubprocessError(Exception): pass
class CalledProcessError(SubprocessError):
"""This exception is raised when a process run by check_call() or
check_output() returns a non-zero exit status.
The exit status will be stored in the returncode attribute;
check_output() will also store the output in the output attribute.
"""
def __init__(self, returncode, cmd, output=None):
self.returncode = returncode
self.cmd = cmd
self.output = output
def __str__(self):
return "Command '%s' returned non-zero exit status %d" % (self.cmd, self.returncode)
class TimeoutExpired(SubprocessError):
"""This exception is raised when the timeout expires while waiting for a
child process.
"""
def __init__(self, cmd, timeout, output=None):
self.cmd = cmd
self.timeout = timeout
self.output = output
def __str__(self):
return ("Command '%s' timed out after %s seconds" %
(self.cmd, self.timeout))
if mswindows:
import threading
import msvcrt
import _winapi
class STARTUPINFO:
dwFlags = 0
hStdInput = None
hStdOutput = None
hStdError = None
wShowWindow = 0
elif ironpython:
import threading
import clr
clr.AddReference("System")
from System.Diagnostics import Process
else:
import _posixsubprocess
import select
import selectors
try:
import threading
except ImportError:
import dummy_threading as threading
# When select or poll has indicated that the file is writable,
# we can write up to _PIPE_BUF bytes without risk of blocking.
# POSIX defines PIPE_BUF as >= 512.
_PIPE_BUF = getattr(select, 'PIPE_BUF', 512)
# poll/select have the advantage of not requiring any extra file
# descriptor, unlike epoll/kqueue (also, they require a single
# syscall).
if hasattr(selectors, 'PollSelector'):
_PopenSelector = selectors.PollSelector
else:
_PopenSelector = selectors.SelectSelector
__all__ = ["Popen", "PIPE", "STDOUT", "call", "check_call", "getstatusoutput",
"getoutput", "check_output", "CalledProcessError", "DEVNULL"]
if mswindows:
from _winapi import (CREATE_NEW_CONSOLE, CREATE_NEW_PROCESS_GROUP,
STD_INPUT_HANDLE, STD_OUTPUT_HANDLE,
STD_ERROR_HANDLE, SW_HIDE,
STARTF_USESTDHANDLES, STARTF_USESHOWWINDOW)
__all__.extend(["CREATE_NEW_CONSOLE", "CREATE_NEW_PROCESS_GROUP",
"STD_INPUT_HANDLE", "STD_OUTPUT_HANDLE",
"STD_ERROR_HANDLE", "SW_HIDE",
"STARTF_USESTDHANDLES", "STARTF_USESHOWWINDOW"])
class Handle(int):
closed = False
def Close(self, CloseHandle=_winapi.CloseHandle):
if not self.closed:
self.closed = True
CloseHandle(self)
def Detach(self):
if not self.closed:
self.closed = True
return int(self)
raise ValueError("already closed")
def __repr__(self):
return "Handle(%d)" % int(self)
__del__ = Close
__str__ = __repr__
elif ironpython:
__all__.extend(["Process"])
try:
MAXFD = os.sysconf("SC_OPEN_MAX")
except Exception:
MAXFD = 256
# This list holds Popen instances for which the underlying process had not
# exited at the time its __del__ method got called: those processes are wait()ed
# for synchronously from _cleanup() when a new Popen object is created, to avoid
# zombie processes.
_active = []
def _cleanup():
for inst in _active[:]:
res = inst._internal_poll(_deadstate=sys.maxsize)
if res is not None:
try:
_active.remove(inst)
except ValueError:
# This can happen if two threads create a new Popen instance.
# It's harmless that it was already removed, so ignore.
pass
PIPE = -1
STDOUT = -2
DEVNULL = -3
def _eintr_retry_call(func, *args):
while True:
try:
return func(*args)
except InterruptedError:
continue
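# Illustrative usage: wrap a blocking call so that an EINTR caused by a
# signal arriving mid-call retries instead of bubbling up, e.g.
#   data = _eintr_retry_call(os.read, fd, 1024)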
# XXX This function is only used by multiprocessing and the test suite,
# but it's here so that it can be imported when Python is compiled without
# threads.
def _args_from_interpreter_flags():
"""Return a list of command-line arguments reproducing the current
settings in sys.flags and sys.warnoptions."""
flag_opt_map = {
'debug': 'd',
# 'inspect': 'i',
# 'interactive': 'i',
'optimize': 'O',
'dont_write_bytecode': 'B',
'no_user_site': 's',
'no_site': 'S',
'ignore_environment': 'E',
'verbose': 'v',
'bytes_warning': 'b',
'quiet': 'q',
}
args = []
for flag, opt in flag_opt_map.items():
v = getattr(sys.flags, flag)
if v > 0:
args.append('-' + opt * v)
for opt in sys.warnoptions:
args.append('-W' + opt)
return args
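# Illustrative sketch: under "python -O -W error" this returns
# ['-O', '-Werror'], letting a child interpreter inherit the parent's flags.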
def call(*popenargs, timeout=None, **kwargs):
"""Run command with arguments. Wait for command to complete or
timeout, then return the returncode attribute.
The arguments are the same as for the Popen constructor. Example:
retcode = call(["ls", "-l"])
"""
with Popen(*popenargs, **kwargs) as p:
try:
return p.wait(timeout=timeout)
except:
p.kill()
p.wait()
raise
def check_call(*popenargs, **kwargs):
"""Run command with arguments. Wait for command to complete. If
the exit code was zero then return, otherwise raise
CalledProcessError. The CalledProcessError object will have the
return code in the returncode attribute.
The arguments are the same as for the call function. Example:
check_call(["ls", "-l"])
"""
retcode = call(*popenargs, **kwargs)
if retcode:
cmd = kwargs.get("args")
if cmd is None:
cmd = popenargs[0]
raise CalledProcessError(retcode, cmd)
return 0
def check_output(*popenargs, timeout=None, **kwargs):
r"""Run command with arguments and return its output.
If the exit code was non-zero it raises a CalledProcessError. The
CalledProcessError object will have the return code in the returncode
attribute and output in the output attribute.
The arguments are the same as for the Popen constructor. Example:
>>> check_output(["ls", "-l", "/dev/null"])
b'crw-rw-rw- 1 root root 1, 3 Oct 18 2007 /dev/null\n'
The stdout argument is not allowed as it is used internally.
To capture standard error in the result, use stderr=STDOUT.
>>> check_output(["/bin/sh", "-c",
... "ls -l non_existent_file ; exit 0"],
... stderr=STDOUT)
b'ls: non_existent_file: No such file or directory\n'
There is an additional optional argument, "input", allowing you to
pass a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
it too will be used internally. Example:
>>> check_output(["sed", "-e", "s/foo/bar/"],
... input=b"when in the course of fooman events\n")
b'when in the course of barman events\n'
If universal_newlines=True is passed, the return value will be a
string rather than bytes.
"""
if 'stdout' in kwargs:
raise ValueError('stdout argument not allowed, it will be overridden.')
if 'input' in kwargs:
if 'stdin' in kwargs:
raise ValueError('stdin and input arguments may not both be used.')
inputdata = kwargs['input']
del kwargs['input']
kwargs['stdin'] = PIPE
else:
inputdata = None
with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
try:
output, unused_err = process.communicate(inputdata, timeout=timeout)
except TimeoutExpired:
process.kill()
output, unused_err = process.communicate()
raise TimeoutExpired(process.args, timeout, output=output)
except:
process.kill()
process.wait()
raise
retcode = process.poll()
if retcode:
raise CalledProcessError(retcode, process.args, output=output)
return output
def list2cmdline(seq):
"""
Translate a sequence of arguments into a command line
string, using the same rules as the MS C runtime:
1) Arguments are delimited by white space, which is either a
space or a tab.
2) A string surrounded by double quotation marks is
interpreted as a single argument, regardless of white space
contained within. A quoted string can be embedded in an
argument.
3) A double quotation mark preceded by a backslash is
interpreted as a literal double quotation mark.
4) Backslashes are interpreted literally, unless they
immediately precede a double quotation mark.
5) If backslashes immediately precede a double quotation mark,
every pair of backslashes is interpreted as a literal
backslash. If the number of backslashes is odd, the last
backslash escapes the next double quotation mark as
described in rule 3.
"""
# See
# http://msdn.microsoft.com/en-us/library/17w5ykft.aspx
# or search http://msdn.microsoft.com for
# "Parsing C++ Command-Line Arguments"
result = []
needquote = False
for arg in seq:
bs_buf = []
# Add a space to separate this argument from the others
if result:
result.append(' ')
needquote = (" " in arg) or ("\t" in arg) or not arg
if needquote:
result.append('"')
for c in arg:
if c == '\\':
# Don't know if we need to double yet.
bs_buf.append(c)
elif c == '"':
# Double backslashes.
result.append('\\' * len(bs_buf)*2)
bs_buf = []
result.append('\\"')
else:
# Normal char
if bs_buf:
result.extend(bs_buf)
bs_buf = []
result.append(c)
# Add remaining backslashes, if any.
if bs_buf:
result.extend(bs_buf)
if needquote:
result.extend(bs_buf)
result.append('"')
return ''.join(result)
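# Illustrative doctest-style examples of the quoting rules above:
# >>> list2cmdline(['ab', 'a b'])
# 'ab "a b"'
# >>> list2cmdline(['a\\b', 'a"b'])
# 'a\\b a\\"b'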
# Various tools for executing commands and looking at their output and status.
#
def getstatusoutput(cmd):
""" Return (status, output) of executing cmd in a shell.
Execute the string 'cmd' in a shell with 'check_output' and
return a 2-tuple (status, output). Universal newlines mode is used,
meaning that the result will be decoded to a string.
A trailing newline is stripped from the output.
The exit status for the command can be interpreted
according to the rules for the function 'wait'. Example:
>>> import subprocess
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')
>>> subprocess.getstatusoutput('cat /bin/junk')
(256, 'cat: /bin/junk: No such file or directory')
>>> subprocess.getstatusoutput('/bin/junk')
(256, 'sh: /bin/junk: not found')
"""
try:
data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
status = 0
except CalledProcessError as ex:
data = ex.output
status = ex.returncode
if data[-1:] == '\n':
data = data[:-1]
return status, data
def getoutput(cmd):
"""Return output (stdout or stderr) of executing cmd in a shell.
Like getstatusoutput(), except the exit status is ignored and the return
value is a string containing the command's output. Example:
>>> import subprocess
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'
"""
return getstatusoutput(cmd)[1]
_PLATFORM_DEFAULT_CLOSE_FDS = object()
class Popen(object):
_child_created = False # Set here since __del__ checks it
def __init__(self, args, bufsize=-1, executable=None,
stdin=None, stdout=None, stderr=None,
preexec_fn=None, close_fds=_PLATFORM_DEFAULT_CLOSE_FDS,
shell=False, cwd=None, env=None, universal_newlines=False,
startupinfo=None, creationflags=0,
restore_signals=True, start_new_session=False,
pass_fds=()):
"""Create new Popen instance."""
_cleanup()
# Held while anything is calling waitpid before returncode has been
# updated to prevent clobbering returncode if wait() or poll() are
# called from multiple threads at once. After acquiring the lock,
# code must re-check self.returncode to see if another thread just
# finished a waitpid() call.
self._waitpid_lock = threading.Lock()
self._input = None
self._communication_started = False
if bufsize is None:
bufsize = -1 # Restore default
if not isinstance(bufsize, int):
raise TypeError("bufsize must be an integer")
if mswindows:
if preexec_fn is not None:
raise ValueError("preexec_fn is not supported on Windows "
"platforms")
any_stdio_set = (stdin is not None or stdout is not None or
stderr is not None)
if close_fds is _PLATFORM_DEFAULT_CLOSE_FDS:
if any_stdio_set:
close_fds = False
else:
close_fds = True
elif close_fds and any_stdio_set:
raise ValueError(
"close_fds is not supported on Windows platforms"
" if you redirect stdin/stdout/stderr")
elif ironpython:
if preexec_fn is not None:
raise ValueError("preexec_fn is not supported by IronPython")
# if close_fds:
# raise ValueError("close_fds is not supported by IronPython")
else:
# POSIX
if close_fds is _PLATFORM_DEFAULT_CLOSE_FDS:
close_fds = True
if pass_fds and not close_fds:
warnings.warn("pass_fds overriding close_fds.", RuntimeWarning)
close_fds = True
if startupinfo is not None:
raise ValueError("startupinfo is only supported on Windows "
"platforms")
if creationflags != 0:
raise ValueError("creationflags is only supported on Windows "
"platforms")
self.args = args
self.stdin = None
self.stdout = None
self.stderr = None
self.pid = None
self.returncode = None
self.universal_newlines = universal_newlines
# Input and output objects. The general principle is like
# this:
#
# Parent Child
# ------ -----
# p2cwrite ---stdin---> p2cread
# c2pread <--stdout--- c2pwrite
# errread <--stderr--- errwrite
#
# On POSIX, the child objects are file descriptors. On
# Windows, these are Windows file handles. The parent objects
# are file descriptors on both platforms. The parent objects
# are -1 when not using PIPEs. The child objects are -1
# when not redirecting.
(p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite) = self._get_handles(stdin, stdout, stderr)
# We wrap OS handles *before* launching the child, otherwise a
# quickly terminating child could make our fds unwrappable
# (see #8458).
if mswindows:
if p2cwrite != -1:
p2cwrite = msvcrt.open_osfhandle(p2cwrite.Detach(), 0)
if c2pread != -1:
c2pread = msvcrt.open_osfhandle(c2pread.Detach(), 0)
if errread != -1:
errread = msvcrt.open_osfhandle(errread.Detach(), 0)
if p2cwrite != -1:
self.stdin = io.open(p2cwrite, 'wb', bufsize)
if universal_newlines:
self.stdin = io.TextIOWrapper(self.stdin, write_through=True,
line_buffering=(bufsize == 1))
if c2pread != -1:
self.stdout = io.open(c2pread, 'rb', bufsize)
if universal_newlines:
self.stdout = io.TextIOWrapper(self.stdout)
if errread != -1:
self.stderr = io.open(errread, 'rb', bufsize)
if universal_newlines:
self.stderr = io.TextIOWrapper(self.stderr)
self._closed_child_pipe_fds = False
try:
self._execute_child(args, executable, preexec_fn, close_fds,
pass_fds, cwd, env,
startupinfo, creationflags, shell,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite,
restore_signals, start_new_session)
except:
# Cleanup if the child failed starting.
for f in filter(None, (self.stdin, self.stdout, self.stderr)):
try:
f.close()
except OSError:
pass # Ignore EBADF or other errors.
if not self._closed_child_pipe_fds:
to_close = []
if stdin == PIPE:
to_close.append(p2cread)
if stdout == PIPE:
to_close.append(c2pwrite)
if stderr == PIPE:
to_close.append(errwrite)
if hasattr(self, '_devnull'):
to_close.append(self._devnull)
for fd in to_close:
try:
os.close(fd)
except OSError:
pass
raise
def _translate_newlines(self, data, encoding):
data = data.decode(encoding)
return data.replace("\r\n", "\n").replace("\r", "\n")
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
if self.stdout:
self.stdout.close()
if self.stderr:
self.stderr.close()
try: # Flushing a BufferedWriter may raise an error
if self.stdin:
self.stdin.close()
finally:
# Wait for the process to terminate, to avoid zombies.
self.wait()
def __del__(self, _maxsize=sys.maxsize):
if not self._child_created:
# We didn't get to successfully create a child process.
return
# In case the child hasn't been waited on, check if it's done.
self._internal_poll(_deadstate=_maxsize)
if self.returncode is None and _active is not None:
# Child is still running, keep us alive until we can wait on it.
_active.append(self)
def _get_devnull(self):
if not hasattr(self, '_devnull'):
self._devnull = os.open(os.devnull, os.O_RDWR)
return self._devnull
def communicate(self, input=None, timeout=None):
"""Interact with process: Send data to stdin. Read data from
stdout and stderr, until end-of-file is reached. Wait for
process to terminate. The optional input argument should be
bytes to be sent to the child process, or None, if no data
should be sent to the child.
communicate() returns a tuple (stdout, stderr)."""
if self._communication_started and input:
raise ValueError("Cannot send input after starting communication")
# Optimization: If we are not worried about timeouts, we haven't
# started communicating, and we have one or zero pipes, using select()
# or threads is unnecessary.
if (timeout is None and not self._communication_started and
[self.stdin, self.stdout, self.stderr].count(None) >= 2):
stdout = None
stderr = None
if self.stdin:
if input:
try:
self.stdin.write(input)
except OSError as e:
if e.errno != errno.EPIPE and e.errno != errno.EINVAL:
raise
self.stdin.close()
elif self.stdout:
stdout = _eintr_retry_call(self.stdout.read)
self.stdout.close()
elif self.stderr:
stderr = _eintr_retry_call(self.stderr.read)
self.stderr.close()
self.wait()
else:
if timeout is not None:
endtime = _time() + timeout
else:
endtime = None
try:
stdout, stderr = self._communicate(input, endtime, timeout)
finally:
self._communication_started = True
sts = self.wait(timeout=self._remaining_time(endtime))
return (stdout, stderr)
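    # Editor's sketch (hedged, not part of the original source): typical
    # communicate() usage with pipes; the command line is an arbitrary
    # example and assumes an "ipy" interpreter on PATH.
    #
    #     p = Popen(["ipy", "-c", "import sys; sys.stdout.write(sys.stdin.read())"],
    #               stdin=PIPE, stdout=PIPE)
    #     out, err = p.communicate(input=b"ping", timeout=30)
    #     assert out == b"ping" and p.returncode == 0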
def poll(self):
return self._internal_poll()
def _remaining_time(self, endtime):
"""Convenience for _communicate when computing timeouts."""
if endtime is None:
return None
else:
return endtime - _time()
def _check_timeout(self, endtime, orig_timeout):
"""Convenience for checking if a timeout has expired."""
if endtime is None:
return
if _time() > endtime:
raise TimeoutExpired(self.args, orig_timeout)
if mswindows:
#
# Windows methods
#
def _get_handles(self, stdin, stdout, stderr):
"""Construct and return tuple with IO objects:
p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite
"""
if stdin is None and stdout is None and stderr is None:
return (-1, -1, -1, -1, -1, -1)
p2cread, p2cwrite = -1, -1
c2pread, c2pwrite = -1, -1
errread, errwrite = -1, -1
if stdin is None:
p2cread = _winapi.GetStdHandle(_winapi.STD_INPUT_HANDLE)
if p2cread is None:
p2cread, _ = _winapi.CreatePipe(None, 0)
p2cread = Handle(p2cread)
_winapi.CloseHandle(_)
elif stdin == PIPE:
p2cread, p2cwrite = _winapi.CreatePipe(None, 0)
p2cread, p2cwrite = Handle(p2cread), Handle(p2cwrite)
elif stdin == DEVNULL:
p2cread = msvcrt.get_osfhandle(self._get_devnull())
elif isinstance(stdin, int):
p2cread = msvcrt.get_osfhandle(stdin)
else:
# Assuming file-like object
p2cread = msvcrt.get_osfhandle(stdin.fileno())
p2cread = self._make_inheritable(p2cread)
if stdout is None:
c2pwrite = _winapi.GetStdHandle(_winapi.STD_OUTPUT_HANDLE)
if c2pwrite is None:
_, c2pwrite = _winapi.CreatePipe(None, 0)
c2pwrite = Handle(c2pwrite)
_winapi.CloseHandle(_)
elif stdout == PIPE:
c2pread, c2pwrite = _winapi.CreatePipe(None, 0)
c2pread, c2pwrite = Handle(c2pread), Handle(c2pwrite)
elif stdout == DEVNULL:
c2pwrite = msvcrt.get_osfhandle(self._get_devnull())
elif isinstance(stdout, int):
c2pwrite = msvcrt.get_osfhandle(stdout)
else:
# Assuming file-like object
c2pwrite = msvcrt.get_osfhandle(stdout.fileno())
c2pwrite = self._make_inheritable(c2pwrite)
if stderr is None:
errwrite = _winapi.GetStdHandle(_winapi.STD_ERROR_HANDLE)
if errwrite is None:
_, errwrite = _winapi.CreatePipe(None, 0)
errwrite = Handle(errwrite)
_winapi.CloseHandle(_)
elif stderr == PIPE:
errread, errwrite = _winapi.CreatePipe(None, 0)
errread, errwrite = Handle(errread), Handle(errwrite)
elif stderr == STDOUT:
errwrite = c2pwrite
elif stderr == DEVNULL:
errwrite = msvcrt.get_osfhandle(self._get_devnull())
elif isinstance(stderr, int):
errwrite = msvcrt.get_osfhandle(stderr)
else:
# Assuming file-like object
errwrite = msvcrt.get_osfhandle(stderr.fileno())
errwrite = self._make_inheritable(errwrite)
return (p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite)
def _make_inheritable(self, handle):
"""Return a duplicate of handle, which is inheritable"""
h = _winapi.DuplicateHandle(
_winapi.GetCurrentProcess(), handle,
_winapi.GetCurrentProcess(), 0, 1,
_winapi.DUPLICATE_SAME_ACCESS)
# IronPython: Not closing this handle may cause read operations to block down the line.
# Because of the GC of .NET, it may not be closed fast enough so we force it to close.
# This is not an issue with CPython because the handle is closed via __del__.
if isinstance(handle, Handle): handle.Close()
return Handle(h)
def _execute_child(self, args, executable, preexec_fn, close_fds,
pass_fds, cwd, env,
startupinfo, creationflags, shell,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite,
unused_restore_signals, unused_start_new_session):
"""Execute program (MS Windows version)"""
assert not pass_fds, "pass_fds not supported on Windows."
if not isinstance(args, str):
args = list2cmdline(args)
# Process startup details
if startupinfo is None:
startupinfo = STARTUPINFO()
if -1 not in (p2cread, c2pwrite, errwrite):
startupinfo.dwFlags |= _winapi.STARTF_USESTDHANDLES
startupinfo.hStdInput = p2cread
startupinfo.hStdOutput = c2pwrite
startupinfo.hStdError = errwrite
if shell:
startupinfo.dwFlags |= _winapi.STARTF_USESHOWWINDOW
startupinfo.wShowWindow = _winapi.SW_HIDE
comspec = os.environ.get("COMSPEC", "cmd.exe")
args = '{} /c "{}"'.format (comspec, args)
# Start the process
try:
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
# no special security
None, None,
int(not close_fds),
creationflags,
env,
cwd,
startupinfo)
finally:
# Child is launched. Close the parent's copy of those pipe
# handles that only the child should have open. You need
# to make sure that no handles to the write end of the
# output pipe are maintained in this process or else the
# pipe will not close when the child process exits and the
# ReadFile will hang.
if p2cread != -1:
p2cread.Close()
if c2pwrite != -1:
c2pwrite.Close()
if errwrite != -1:
errwrite.Close()
if hasattr(self, '_devnull'):
os.close(self._devnull)
# Retain the process handle, but close the thread handle
self._child_created = True
self._handle = Handle(hp)
self.pid = pid
_winapi.CloseHandle(ht)
def _internal_poll(self, _deadstate=None,
_WaitForSingleObject=_winapi.WaitForSingleObject,
_WAIT_OBJECT_0=_winapi.WAIT_OBJECT_0,
_GetExitCodeProcess=_winapi.GetExitCodeProcess):
"""Check if child process has terminated. Returns returncode
attribute.
This method is called by __del__, so it can only refer to objects
in its local scope.
"""
if self.returncode is None:
if _WaitForSingleObject(self._handle, 0) == _WAIT_OBJECT_0:
self.returncode = _GetExitCodeProcess(self._handle)
return self.returncode
def wait(self, timeout=None, endtime=None):
"""Wait for child process to terminate. Returns returncode
attribute."""
if endtime is not None:
timeout = self._remaining_time(endtime)
if timeout is None:
timeout_millis = _winapi.INFINITE
else:
timeout_millis = int(timeout * 1000)
if self.returncode is None:
result = _winapi.WaitForSingleObject(self._handle,
timeout_millis)
if result == _winapi.WAIT_TIMEOUT:
raise TimeoutExpired(self.args, timeout)
self.returncode = _winapi.GetExitCodeProcess(self._handle)
return self.returncode
def _readerthread(self, fh, buffer):
buffer.append(fh.read())
fh.close()
def _communicate(self, input, endtime, orig_timeout):
# Start reader threads feeding into a list hanging off of this
# object, unless they've already been started.
if self.stdout and not hasattr(self, "_stdout_buff"):
self._stdout_buff = []
self.stdout_thread = \
threading.Thread(target=self._readerthread,
args=(self.stdout, self._stdout_buff))
self.stdout_thread.daemon = True
self.stdout_thread.start()
if self.stderr and not hasattr(self, "_stderr_buff"):
self._stderr_buff = []
self.stderr_thread = \
threading.Thread(target=self._readerthread,
args=(self.stderr, self._stderr_buff))
self.stderr_thread.daemon = True
self.stderr_thread.start()
if self.stdin:
if input is not None:
try:
self.stdin.write(input)
except OSError as e:
if e.errno == errno.EPIPE:
# communicate() should ignore pipe full error
pass
elif (e.errno == errno.EINVAL
and self.poll() is not None):
# Issue #19612: stdin.write() fails with EINVAL
# if the process already exited before the write
pass
else:
raise
self.stdin.close()
# Wait for the reader threads, or time out. If we time out, the
# threads keep reading and the fds are left open in case the user
# calls communicate() again.
if self.stdout is not None:
self.stdout_thread.join(self._remaining_time(endtime))
if self.stdout_thread.is_alive():
raise TimeoutExpired(self.args, orig_timeout)
if self.stderr is not None:
self.stderr_thread.join(self._remaining_time(endtime))
if self.stderr_thread.is_alive():
raise TimeoutExpired(self.args, orig_timeout)
# Collect the output from and close both pipes, now that we know
# both have been read successfully.
stdout = None
stderr = None
if self.stdout:
stdout = self._stdout_buff
self.stdout.close()
if self.stderr:
stderr = self._stderr_buff
self.stderr.close()
# All data exchanged. Translate lists into strings.
if stdout is not None:
stdout = stdout[0]
if stderr is not None:
stderr = stderr[0]
return (stdout, stderr)
def send_signal(self, sig):
"""Send a signal to the process."""
# Don't signal a process that we know has already died.
if self.returncode is not None:
return
if sig == signal.SIGTERM:
self.terminate()
elif sig == signal.CTRL_C_EVENT:
os.kill(self.pid, signal.CTRL_C_EVENT)
elif sig == signal.CTRL_BREAK_EVENT:
os.kill(self.pid, signal.CTRL_BREAK_EVENT)
else:
raise ValueError("Unsupported signal: {}".format(sig))
def terminate(self):
"""Terminates the process."""
# Don't terminate a process that we know has already died.
if self.returncode is not None:
return
try:
_winapi.TerminateProcess(self._handle, 1)
except PermissionError:
# ERROR_ACCESS_DENIED (winerror 5) is received when the
# process already died.
rc = _winapi.GetExitCodeProcess(self._handle)
if rc == _winapi.STILL_ACTIVE:
raise
self.returncode = rc
kill = terminate
elif ironpython:
#
# Dotnet methods
#
def _get_handles(self, stdin, stdout, stderr):
# Can't get the redirected streams before Process.Start() is called,
# so postpone the work to _execute_child.
return (stdin, -1, -1, stdout, -1, stderr)
def _execute_child(self, args, executable, preexec_fn, close_fds,
pass_fds, cwd, env,
startupinfo, creationflags, shell,
stdin, unused_p2cwrite,
unused_c2pread, stdout,
unused_errread, stderr,
unused_restore_signals, unused_start_new_session):
"""Execute program (Dotnet version)"""
p = Process()
s = p.StartInfo
if env:
for k, v in env.items():
s.Environment[k] = v
if shell:
if not isinstance(args, str):
args = list2cmdline(args)
# escape backslash and double quote
args = ''.join('\\' + c if c in {'\\', '"'} else c for c in args)
s.Arguments = '-c "{}"'.format(args)
s.FileName = executable or '/bin/sh'
else:
if not isinstance(args, str):
s.FileName = args[0]
s.Arguments = list2cmdline(args[1:])
else:
s.FileName = args
s.RedirectStandardInput = stdin is not None
s.RedirectStandardOutput = stdout is not None
s.RedirectStandardError = stderr is not None
s.WorkingDirectory = cwd
s.UseShellExecute = False
p.Start()
self.pid = p.Id
self._child_created = True
self._handle = p
if stdin == PIPE:
self.stdin = open(p.StandardInput.BaseStream)
if stdout == PIPE:
self.stdout = io.BufferedReader(open(p.StandardOutput.BaseStream))
if stderr == PIPE:
self.stderr = io.BufferedReader(open(p.StandardError.BaseStream))
# .NET can't redirect stdio to an arbitrary file/stream, so the parent has to feed it
if stdin not in (None, DEVNULL, PIPE):
# assume file-like object
input = stdin.read()
with open(self._handle.StandardInput.BaseStream) as f:
f.write(input)
def _internal_poll(self):
"""Check if child process has terminated. Returns returncode
attribute.
This method is called by __del__, so it can only refer to objects
in its local scope.
"""
if self.returncode is None and self._handle.HasExited:
self.returncode = self._handle.ExitCode
return self.returncode
def wait(self, timeout=None, endtime=None):
"""Wait for child process to terminate. Returns returncode
attribute."""
if endtime is not None:
timeout = self._remaining_time(endtime)
if timeout is None:
self._handle.WaitForExit()
else:
self._handle.WaitForExit(int(timeout * 1000))
self.returncode = self._handle.ExitCode
return self.returncode
def _communicate(self, input, endtime, orig_timeout):
# The .NET framework buffers stdout and stderr,
# so we can simply wait and then read.
if self.stdin:
if input:
self.stdin.write(input)
self.stdin.close()
if orig_timeout is not None:
self.wait(endtime=endtime)
return (
self.stdout.read() if self.stdout else None,
self.stderr.read() if self.stderr else None,
)
def terminate(self):
"""Terminates the process."""
self._handle.Kill()
kill = terminate
else:
#
# POSIX methods
#
def _get_handles(self, stdin, stdout, stderr):
"""Construct and return tuple with IO objects:
p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite
"""
p2cread, p2cwrite = -1, -1
c2pread, c2pwrite = -1, -1
errread, errwrite = -1, -1
if stdin is None:
pass
elif stdin == PIPE:
p2cread, p2cwrite = os.pipe()
elif stdin == DEVNULL:
p2cread = self._get_devnull()
elif isinstance(stdin, int):
p2cread = stdin
else:
# Assuming file-like object
p2cread = stdin.fileno()
if stdout is None:
pass
elif stdout == PIPE:
c2pread, c2pwrite = os.pipe()
elif stdout == DEVNULL:
c2pwrite = self._get_devnull()
elif isinstance(stdout, int):
c2pwrite = stdout
else:
# Assuming file-like object
c2pwrite = stdout.fileno()
if stderr is None:
pass
elif stderr == PIPE:
errread, errwrite = os.pipe()
elif stderr == STDOUT:
errwrite = c2pwrite
elif stderr == DEVNULL:
errwrite = self._get_devnull()
elif isinstance(stderr, int):
errwrite = stderr
else:
# Assuming file-like object
errwrite = stderr.fileno()
return (p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite)
def _close_fds(self, fds_to_keep):
start_fd = 3
for fd in sorted(fds_to_keep):
if fd >= start_fd:
os.closerange(start_fd, fd)
start_fd = fd + 1
if start_fd <= MAXFD:
os.closerange(start_fd, MAXFD)
def _execute_child(self, args, executable, preexec_fn, close_fds,
pass_fds, cwd, env,
startupinfo, creationflags, shell,
p2cread, p2cwrite,
c2pread, c2pwrite,
errread, errwrite,
restore_signals, start_new_session):
"""Execute program (POSIX version)"""
if isinstance(args, (str, bytes)):
args = [args]
else:
args = list(args)
if shell:
args = ["/bin/sh", "-c"] + args
if executable:
args[0] = executable
if executable is None:
executable = args[0]
orig_executable = executable
# For transferring possible exec failure from child to parent.
# Data format: "exception name:hex errno:description"
# Pickle is not used; it is complex and involves memory allocation.
errpipe_read, errpipe_write = os.pipe()
# errpipe_write must not be in the standard io 0, 1, or 2 fd range.
low_fds_to_close = []
while errpipe_write < 3:
low_fds_to_close.append(errpipe_write)
errpipe_write = os.dup(errpipe_write)
for low_fd in low_fds_to_close:
os.close(low_fd)
try:
try:
# We must avoid complex work that could involve
# malloc or free in the child process to avoid
# potential deadlocks, thus we do all this here
# and pass it to fork_exec().
if env is not None:
env_list = []
for k, v in env.items():
k = os.fsencode(k)
if b'=' in k:
raise ValueError("illegal environment variable name")
env_list.append(k + b'=' + os.fsencode(v))
else:
env_list = None # Use execv instead of execve.
executable = os.fsencode(executable)
if os.path.dirname(executable):
executable_list = (executable,)
else:
# This matches the behavior of os._execvpe().
executable_list = tuple(
os.path.join(os.fsencode(dir), executable)
for dir in os.get_exec_path(env))
fds_to_keep = set(pass_fds)
fds_to_keep.add(errpipe_write)
self.pid = _posixsubprocess.fork_exec(
args, executable_list,
close_fds, sorted(fds_to_keep), cwd, env_list,
p2cread, p2cwrite, c2pread, c2pwrite,
errread, errwrite,
errpipe_read, errpipe_write,
restore_signals, start_new_session, preexec_fn)
self._child_created = True
finally:
# be sure the FD is closed no matter what
os.close(errpipe_write)
# self._devnull is not always defined.
devnull_fd = getattr(self, '_devnull', None)
if p2cread != -1 and p2cwrite != -1 and p2cread != devnull_fd:
os.close(p2cread)
if c2pwrite != -1 and c2pread != -1 and c2pwrite != devnull_fd:
os.close(c2pwrite)
if errwrite != -1 and errread != -1 and errwrite != devnull_fd:
os.close(errwrite)
if devnull_fd is not None:
os.close(devnull_fd)
# Prevent a double close of these fds from __init__ on error.
self._closed_child_pipe_fds = True
# Wait for exec to fail or succeed; possibly raising an
# exception (limited in size)
errpipe_data = bytearray()
while True:
part = _eintr_retry_call(os.read, errpipe_read, 50000)
errpipe_data += part
if not part or len(errpipe_data) > 50000:
break
finally:
# be sure the FD is closed no matter what
os.close(errpipe_read)
if errpipe_data:
try:
_eintr_retry_call(os.waitpid, self.pid, 0)
except OSError as e:
if e.errno != errno.ECHILD:
raise
try:
exception_name, hex_errno, err_msg = (
errpipe_data.split(b':', 2))
except ValueError:
exception_name = b'SubprocessError'
hex_errno = b'0'
err_msg = (b'Bad exception data from child: ' +
repr(errpipe_data))
child_exception_type = getattr(
builtins, exception_name.decode('ascii'),
SubprocessError)
err_msg = err_msg.decode(errors="surrogatepass")
if issubclass(child_exception_type, OSError) and hex_errno:
errno_num = int(hex_errno, 16)
child_exec_never_called = (err_msg == "noexec")
if child_exec_never_called:
err_msg = ""
if errno_num != 0:
err_msg = os.strerror(errno_num)
if errno_num == errno.ENOENT:
if child_exec_never_called:
# The error must be from chdir(cwd).
err_msg += ': ' + repr(cwd)
else:
err_msg += ': ' + repr(orig_executable)
raise child_exception_type(errno_num, err_msg)
raise child_exception_type(err_msg)
def _handle_exitstatus(self, sts, _WIFSIGNALED=os.WIFSIGNALED,
_WTERMSIG=os.WTERMSIG, _WIFEXITED=os.WIFEXITED,
_WEXITSTATUS=os.WEXITSTATUS):
"""All callers to this function MUST hold self._waitpid_lock."""
# This method is called (indirectly) by __del__, so it cannot
# refer to anything outside of its local scope.
if _WIFSIGNALED(sts):
self.returncode = -_WTERMSIG(sts)
elif _WIFEXITED(sts):
self.returncode = _WEXITSTATUS(sts)
else:
# Should never happen
raise SubprocessError("Unknown child exit status!")
def _internal_poll(self, _deadstate=None, _waitpid=os.waitpid,
_WNOHANG=os.WNOHANG, _ECHILD=errno.ECHILD):
"""Check if child process has terminated. Returns returncode
attribute.
This method is called by __del__, so it cannot reference anything
outside of the local scope (nor can any methods it calls).
"""
if self.returncode is None:
if not self._waitpid_lock.acquire(False):
# Something else is busy calling waitpid. Don't allow two
# at once. We know nothing yet.
return None
try:
if self.returncode is not None:
return self.returncode # Another thread waited.
pid, sts = _waitpid(self.pid, _WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except OSError as e:
if _deadstate is not None:
self.returncode = _deadstate
elif e.errno == _ECHILD:
# This happens if SIGCLD is set to be ignored or
# waiting for child processes has otherwise been
# disabled for our process. This child is dead, we
# can't get the status.
# http://bugs.python.org/issue15756
self.returncode = 0
finally:
self._waitpid_lock.release()
return self.returncode
def _try_wait(self, wait_flags):
"""All callers to this function MUST hold self._waitpid_lock."""
try:
(pid, sts) = _eintr_retry_call(os.waitpid, self.pid, wait_flags)
except OSError as e:
if e.errno != errno.ECHILD:
raise
# This happens if SIGCLD is set to be ignored or waiting
# for child processes has otherwise been disabled for our
# process. This child is dead, we can't get the status.
pid = self.pid
sts = 0
return (pid, sts)
def wait(self, timeout=None, endtime=None):
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode is not None:
return self.returncode
# endtime is preferred to timeout. timeout is only used for
# printing.
if endtime is not None or timeout is not None:
if endtime is None:
endtime = _time() + timeout
elif timeout is None:
timeout = self._remaining_time(endtime)
if endtime is not None:
# Enter a busy loop if we have a timeout. This busy loop was
# cribbed from Lib/threading.py in Thread.wait() at r71065.
delay = 0.0005 # 500 us -> initial delay of 1 ms
while True:
if self._waitpid_lock.acquire(False):
try:
if self.returncode is not None:
break # Another thread waited.
(pid, sts) = self._try_wait(os.WNOHANG)
assert pid == self.pid or pid == 0
if pid == self.pid:
self._handle_exitstatus(sts)
break
finally:
self._waitpid_lock.release()
remaining = self._remaining_time(endtime)
if remaining <= 0:
raise TimeoutExpired(self.args, timeout)
delay = min(delay * 2, remaining, .05)
time.sleep(delay)
else:
while self.returncode is None:
with self._waitpid_lock:
if self.returncode is not None:
break # Another thread waited.
(pid, sts) = self._try_wait(0)
# Check the pid and loop as waitpid has been known to
# return 0 even without WNOHANG in odd situations.
# http://bugs.python.org/issue14396.
if pid == self.pid:
self._handle_exitstatus(sts)
return self.returncode
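    # Editor's sketch (illustrative): the usual timeout pattern built on
    # wait() and kill():
    #
    #     try:
    #         proc.wait(timeout=5)
    #     except TimeoutExpired:
    #         proc.kill()
    #         proc.wait()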
def _communicate(self, input, endtime, orig_timeout):
if self.stdin and not self._communication_started:
# Flush stdio buffer. This might block, if the user has
# been writing to .stdin in an uncontrolled fashion.
self.stdin.flush()
if not input:
self.stdin.close()
stdout = None
stderr = None
# Only create this mapping if we haven't already.
if not self._communication_started:
self._fileobj2output = {}
if self.stdout:
self._fileobj2output[self.stdout] = []
if self.stderr:
self._fileobj2output[self.stderr] = []
if self.stdout:
stdout = self._fileobj2output[self.stdout]
if self.stderr:
stderr = self._fileobj2output[self.stderr]
self._save_input(input)
if self._input:
input_view = memoryview(self._input)
with _PopenSelector() as selector:
if self.stdin and input:
selector.register(self.stdin, selectors.EVENT_WRITE)
if self.stdout:
selector.register(self.stdout, selectors.EVENT_READ)
if self.stderr:
selector.register(self.stderr, selectors.EVENT_READ)
while selector.get_map():
timeout = self._remaining_time(endtime)
if timeout is not None and timeout < 0:
raise TimeoutExpired(self.args, orig_timeout)
ready = selector.select(timeout)
self._check_timeout(endtime, orig_timeout)
# XXX Rewrite these to use non-blocking I/O on the file
# objects; they are no longer using C stdio!
for key, events in ready:
if key.fileobj is self.stdin:
chunk = input_view[self._input_offset :
self._input_offset + _PIPE_BUF]
try:
self._input_offset += os.write(key.fd, chunk)
except OSError as e:
if e.errno == errno.EPIPE:
selector.unregister(key.fileobj)
key.fileobj.close()
else:
raise
else:
if self._input_offset >= len(self._input):
selector.unregister(key.fileobj)
key.fileobj.close()
elif key.fileobj in (self.stdout, self.stderr):
data = os.read(key.fd, 32768)
if not data:
selector.unregister(key.fileobj)
key.fileobj.close()
self._fileobj2output[key.fileobj].append(data)
self.wait(timeout=self._remaining_time(endtime))
# All data exchanged. Translate lists into strings.
if stdout is not None:
stdout = b''.join(stdout)
if stderr is not None:
stderr = b''.join(stderr)
# Translate newlines, if requested.
# This also turns bytes into strings.
if self.universal_newlines:
if stdout is not None:
stdout = self._translate_newlines(stdout,
self.stdout.encoding)
if stderr is not None:
stderr = self._translate_newlines(stderr,
self.stderr.encoding)
return (stdout, stderr)
def _save_input(self, input):
# This method is called from the _communicate_with_*() methods
# so that if we time out while communicating, we can continue
# sending input if we retry.
if self.stdin and self._input is None:
self._input_offset = 0
self._input = input
if self.universal_newlines and input is not None:
self._input = self._input.encode(self.stdin.encoding)
def send_signal(self, sig):
"""Send a signal to the process."""
# Skip signalling a process that we know has already died.
if self.returncode is None:
os.kill(self.pid, sig)
def terminate(self):
"""Terminate the process with SIGTERM
"""
self.send_signal(signal.SIGTERM)
def kill(self):
"""Kill the process with SIGKILL
"""
self.send_signal(signal.SIGKILL)
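# Editor's sketch (hedged, not part of the original module): a minimal
# run-and-capture helper built on the Popen class above; Popen and PIPE
# are the module-level names defined earlier, and the example command in
# the docstring is an assumption.
def _demo_run(cmd, data=b""):
    """Run cmd, feed it data on stdin, return (returncode, stdout).

    e.g. _demo_run(["cat"], b"ping") -> (0, b"ping") on POSIX.
    """
    with Popen(cmd, stdin=PIPE, stdout=PIPE) as proc:
        out, _ = proc.communicate(data)
    return proc.returncode, out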
"""Stuff to parse Sun and NeXT audio files.
An audio file consists of a header followed by the data. The structure
of the header is as follows.
+---------------+
| magic word |
+---------------+
| header size |
+---------------+
| data size |
+---------------+
| encoding |
+---------------+
| sample rate |
+---------------+
| # of channels |
+---------------+
| info |
| |
+---------------+
The magic word consists of the 4 characters '.snd'. Apart from the
info field, all header fields are 4 bytes in size. They are all
32-bit unsigned integers encoded in big-endian byte order.
The header size really gives the start of the data.
The data size is the physical size of the data. From the other
parameters the number of frames can be calculated.
The encoding gives the way in which audio samples are encoded.
Possible values are listed below.
The info field currently consists of an ASCII string giving a
human-readable description of the audio file. The info field is
padded with NUL bytes to the header size.
Usage.
Reading audio files:
f = sunau.open(file, 'r')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods read(), seek(), and close().
When the setpos() and rewind() methods are not used, the seek()
method is not necessary.
This returns an instance of a class with the following public methods:
getnchannels() -- returns number of audio channels (1 for
mono, 2 for stereo)
getsampwidth() -- returns sample width in bytes
getframerate() -- returns sampling frequency
getnframes() -- returns number of audio frames
getcomptype() -- returns compression type ('NONE' or 'ULAW')
getcompname() -- returns human-readable version of
compression type ('not compressed' matches 'NONE')
getparams() -- returns a namedtuple consisting of all of the
above in the above order
getmarkers() -- returns None (for compatibility with the
aifc module)
getmark(id) -- raises an error since the mark does not
exist (for compatibility with the aifc module)
readframes(n) -- returns at most n frames of audio
rewind() -- rewind to the beginning of the audio stream
setpos(pos) -- seek to the specified position
tell() -- return the current position
close() -- close the instance (make it unusable)
The position returned by tell() and the position given to setpos()
are compatible and have nothing to do with the actual position in the
file.
The close() method is called automatically when the class instance
is destroyed.
Writing audio files:
f = sunau.open(file, 'w')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods write(), tell(), seek(), and
close().
This returns an instance of a class with the following public methods:
setnchannels(n) -- set the number of channels
setsampwidth(n) -- set the sample width
setframerate(n) -- set the frame rate
setnframes(n) -- set the number of frames
setcomptype(type, name)
-- set the compression type and the
human-readable compression type
setparams(tuple)-- set all parameters at once
tell() -- return current position in output file
writeframesraw(data)
-- write audio frames without patching up the
file header
writeframes(data)
-- write audio frames and patch up the file header
close() -- patch up the file header and close the
output file
You should set the parameters before the first writeframesraw or
writeframes. The total number of frames does not need to be set,
but when it is set to the correct value, the header does not have to
be patched up.
It is best to first set all parameters, except possibly the
compression type, and then write audio frames using writeframesraw.
When all frames have been written, either call writeframes(b'') or
close() to patch up the sizes in the header.
The close() method is called automatically when the class instance
is destroyed.
"""
from collections import namedtuple
_sunau_params = namedtuple('_sunau_params',
'nchannels sampwidth framerate nframes comptype compname')
# from <multimedia/audio_filehdr.h>
AUDIO_FILE_MAGIC = 0x2e736e64
AUDIO_FILE_ENCODING_MULAW_8 = 1
AUDIO_FILE_ENCODING_LINEAR_8 = 2
AUDIO_FILE_ENCODING_LINEAR_16 = 3
AUDIO_FILE_ENCODING_LINEAR_24 = 4
AUDIO_FILE_ENCODING_LINEAR_32 = 5
AUDIO_FILE_ENCODING_FLOAT = 6
AUDIO_FILE_ENCODING_DOUBLE = 7
AUDIO_FILE_ENCODING_ADPCM_G721 = 23
AUDIO_FILE_ENCODING_ADPCM_G722 = 24
AUDIO_FILE_ENCODING_ADPCM_G723_3 = 25
AUDIO_FILE_ENCODING_ADPCM_G723_5 = 26
AUDIO_FILE_ENCODING_ALAW_8 = 27
# from <multimedia/audio_hdr.h>
AUDIO_UNKNOWN_SIZE = 0xFFFFFFFF # ((unsigned)(~0))
_simple_encodings = [AUDIO_FILE_ENCODING_MULAW_8,
AUDIO_FILE_ENCODING_LINEAR_8,
AUDIO_FILE_ENCODING_LINEAR_16,
AUDIO_FILE_ENCODING_LINEAR_24,
AUDIO_FILE_ENCODING_LINEAR_32,
AUDIO_FILE_ENCODING_ALAW_8]
class Error(Exception):
pass
def _read_u32(file):
x = 0
for i in range(4):
byte = file.read(1)
if not byte:
raise EOFError
x = x*256 + ord(byte)
return x
def _write_u32(file, x):
data = []
for i in range(4):
d, m = divmod(x, 256)
data.insert(0, int(m))
x = d
file.write(bytes(data))
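# Editor's sketch (not part of the original module): _read_u32 and
# _write_u32 implement big-endian 32-bit unsigned integers by hand; the
# helper below checks they agree with struct's ">I" format for the
# '.snd' magic word.
def _demo_u32_roundtrip():
    import io, struct
    buf = io.BytesIO()
    _write_u32(buf, AUDIO_FILE_MAGIC)
    assert buf.getvalue() == struct.pack(">I", AUDIO_FILE_MAGIC) == b".snd"
    buf.seek(0)
    assert _read_u32(buf) == AUDIO_FILE_MAGIC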
class Au_read:
def __init__(self, f):
if isinstance(f, str):
import builtins
f = builtins.open(f, 'rb')
self._opened = True
else:
self._opened = False
self.initfp(f)
def __del__(self):
if self._file:
self.close()
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
def initfp(self, file):
self._file = file
self._soundpos = 0
magic = int(_read_u32(file))
if magic != AUDIO_FILE_MAGIC:
raise Error('bad magic number')
self._hdr_size = int(_read_u32(file))
if self._hdr_size < 24:
raise Error('header size too small')
if self._hdr_size > 100:
raise Error('header size ridiculously large')
self._data_size = _read_u32(file)
if self._data_size != AUDIO_UNKNOWN_SIZE:
self._data_size = int(self._data_size)
self._encoding = int(_read_u32(file))
if self._encoding not in _simple_encodings:
raise Error('encoding not (yet) supported')
if self._encoding in (AUDIO_FILE_ENCODING_MULAW_8,
AUDIO_FILE_ENCODING_ALAW_8):
self._sampwidth = 2
self._framesize = 1
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_8:
self._framesize = self._sampwidth = 1
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_16:
self._framesize = self._sampwidth = 2
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_24:
self._framesize = self._sampwidth = 3
elif self._encoding == AUDIO_FILE_ENCODING_LINEAR_32:
self._framesize = self._sampwidth = 4
else:
raise Error('unknown encoding')
self._framerate = int(_read_u32(file))
self._nchannels = int(_read_u32(file))
self._framesize = self._framesize * self._nchannels
if self._hdr_size > 24:
self._info = file.read(self._hdr_size - 24)
self._info, _, _ = self._info.partition(b'\0')
else:
self._info = b''
try:
self._data_pos = file.tell()
except (AttributeError, OSError):
self._data_pos = None
def getfp(self):
return self._file
def getnchannels(self):
return self._nchannels
def getsampwidth(self):
return self._sampwidth
def getframerate(self):
return self._framerate
def getnframes(self):
if self._data_size == AUDIO_UNKNOWN_SIZE:
return AUDIO_UNKNOWN_SIZE
if self._encoding in _simple_encodings:
return self._data_size // self._framesize
return 0 # XXX--must do some arithmetic here
def getcomptype(self):
if self._encoding == AUDIO_FILE_ENCODING_MULAW_8:
return 'ULAW'
elif self._encoding == AUDIO_FILE_ENCODING_ALAW_8:
return 'ALAW'
else:
return 'NONE'
def getcompname(self):
if self._encoding == AUDIO_FILE_ENCODING_MULAW_8:
return 'CCITT G.711 u-law'
elif self._encoding == AUDIO_FILE_ENCODING_ALAW_8:
return 'CCITT G.711 A-law'
else:
return 'not compressed'
def getparams(self):
return _sunau_params(self.getnchannels(), self.getsampwidth(),
self.getframerate(), self.getnframes(),
self.getcomptype(), self.getcompname())
def getmarkers(self):
return None
def getmark(self, id):
raise Error('no marks')
def readframes(self, nframes):
if self._encoding in _simple_encodings:
if nframes == AUDIO_UNKNOWN_SIZE:
data = self._file.read()
else:
data = self._file.read(nframes * self._framesize)
self._soundpos += len(data) // self._framesize
if self._encoding == AUDIO_FILE_ENCODING_MULAW_8:
import audioop
data = audioop.ulaw2lin(data, self._sampwidth)
return data
return None # XXX--not implemented yet
def rewind(self):
if self._data_pos is None:
raise OSError('cannot seek')
self._file.seek(self._data_pos)
self._soundpos = 0
def tell(self):
return self._soundpos
def setpos(self, pos):
if pos < 0 or pos > self.getnframes():
raise Error('position not in range')
if self._data_pos is None:
raise OSError('cannot seek')
self._file.seek(self._data_pos + pos * self._framesize)
self._soundpos = pos
def close(self):
file = self._file
if file:
self._file = None
if self._opened:
file.close()
class Au_write:
def __init__(self, f):
if isinstance(f, str):
import builtins
f = builtins.open(f, 'wb')
self._opened = True
else:
self._opened = False
self.initfp(f)
def __del__(self):
if self._file:
self.close()
self._file = None
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
def initfp(self, file):
self._file = file
self._framerate = 0
self._nchannels = 0
self._sampwidth = 0
self._framesize = 0
self._nframes = AUDIO_UNKNOWN_SIZE
self._nframeswritten = 0
self._datawritten = 0
self._datalength = 0
self._info = b''
self._comptype = 'ULAW' # default is U-law
def setnchannels(self, nchannels):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if nchannels not in (1, 2, 4):
raise Error('only 1, 2, or 4 channels supported')
self._nchannels = nchannels
def getnchannels(self):
if not self._nchannels:
raise Error('number of channels not set')
return self._nchannels
def setsampwidth(self, sampwidth):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if sampwidth not in (1, 2, 3, 4):
raise Error('bad sample width')
self._sampwidth = sampwidth
def getsampwidth(self):
if not self._sampwidth:
raise Error('sample width not specified')
return self._sampwidth
def setframerate(self, framerate):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
self._framerate = framerate
def getframerate(self):
if not self._framerate:
raise Error('frame rate not set')
return self._framerate
def setnframes(self, nframes):
if self._nframeswritten:
raise Error('cannot change parameters after starting to write')
if nframes < 0:
raise Error('# of frames cannot be negative')
self._nframes = nframes
def getnframes(self):
return self._nframeswritten
def setcomptype(self, type, name):
if type in ('NONE', 'ULAW'):
self._comptype = type
else:
raise Error('unknown compression type')
def getcomptype(self):
return self._comptype
def getcompname(self):
if self._comptype == 'ULAW':
return 'CCITT G.711 u-law'
elif self._comptype == 'ALAW':
return 'CCITT G.711 A-law'
else:
return 'not compressed'
def setparams(self, params):
nchannels, sampwidth, framerate, nframes, comptype, compname = params
self.setnchannels(nchannels)
self.setsampwidth(sampwidth)
self.setframerate(framerate)
self.setnframes(nframes)
self.setcomptype(comptype, compname)
def getparams(self):
return _sunau_params(self.getnchannels(), self.getsampwidth(),
self.getframerate(), self.getnframes(),
self.getcomptype(), self.getcompname())
def tell(self):
return self._nframeswritten
def writeframesraw(self, data):
if not isinstance(data, (bytes, bytearray)):
data = memoryview(data).cast('B')
self._ensure_header_written()
if self._comptype == 'ULAW':
import audioop
data = audioop.lin2ulaw(data, self._sampwidth)
nframes = len(data) // self._framesize
self._file.write(data)
self._nframeswritten = self._nframeswritten + nframes
self._datawritten = self._datawritten + len(data)
def writeframes(self, data):
self.writeframesraw(data)
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten:
self._patchheader()
def close(self):
if self._file:
try:
self._ensure_header_written()
if self._nframeswritten != self._nframes or \
self._datalength != self._datawritten:
self._patchheader()
self._file.flush()
finally:
file = self._file
self._file = None
if self._opened:
file.close()
#
# private methods
#
def _ensure_header_written(self):
if not self._nframeswritten:
if not self._nchannels:
raise Error('# of channels not specified')
if not self._sampwidth:
raise Error('sample width not specified')
if not self._framerate:
raise Error('frame rate not specified')
self._write_header()
def _write_header(self):
if self._comptype == 'NONE':
if self._sampwidth == 1:
encoding = AUDIO_FILE_ENCODING_LINEAR_8
self._framesize = 1
elif self._sampwidth == 2:
encoding = AUDIO_FILE_ENCODING_LINEAR_16
self._framesize = 2
elif self._sampwidth == 3:
encoding = AUDIO_FILE_ENCODING_LINEAR_24
self._framesize = 3
elif self._sampwidth == 4:
encoding = AUDIO_FILE_ENCODING_LINEAR_32
self._framesize = 4
else:
raise Error('internal error')
elif self._comptype == 'ULAW':
encoding = AUDIO_FILE_ENCODING_MULAW_8
self._framesize = 1
else:
raise Error('internal error')
self._framesize = self._framesize * self._nchannels
_write_u32(self._file, AUDIO_FILE_MAGIC)
header_size = 25 + len(self._info)
header_size = (header_size + 7) & ~7
_write_u32(self._file, header_size)
if self._nframes == AUDIO_UNKNOWN_SIZE:
length = AUDIO_UNKNOWN_SIZE
else:
length = self._nframes * self._framesize
try:
self._form_length_pos = self._file.tell()
except (AttributeError, OSError):
self._form_length_pos = None
_write_u32(self._file, length)
self._datalength = length
_write_u32(self._file, encoding)
_write_u32(self._file, self._framerate)
_write_u32(self._file, self._nchannels)
self._file.write(self._info)
self._file.write(b'\0'*(header_size - len(self._info) - 24))
def _patchheader(self):
if self._form_length_pos is None:
raise OSError('cannot seek')
self._file.seek(self._form_length_pos)
_write_u32(self._file, self._datawritten)
self._datalength = self._datawritten
self._file.seek(0, 2)
def open(f, mode=None):
if mode is None:
if hasattr(f, 'mode'):
mode = f.mode
else:
mode = 'rb'
if mode in ('r', 'rb'):
return Au_read(f)
elif mode in ('w', 'wb'):
return Au_write(f)
else:
raise Error("mode must be 'r', 'rb', 'w', or 'wb'")
openfp = open
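# Editor's sketch (hedged, not part of the original module): writing one
# second of 16-bit silence and reading it back with the classes above;
# the file name is an arbitrary example.
def _demo_au_roundtrip(path="demo.au"):
    with open(path, "w") as w:            # sunau's open(), defined above
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.setcomptype("NONE", "not compressed")
        w.writeframes(b"\0\0" * 8000)     # 8000 frames of silence
    with open(path, "r") as r:
        assert r.getparams()[:4] == (1, 2, 8000, 8000)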
#! /usr/bin/env python3
"""Non-terminal symbols of Python grammar (from "graminit.h")."""
# This file is automatically generated; please don't muck it up!
#
# To update the symbols in this file, 'cd' to the top directory of
# the python source tree after building the interpreter and run:
#
# ./python Lib/symbol.py
#--start constants--
single_input = 256
file_input = 257
eval_input = 258
decorator = 259
decorators = 260
decorated = 261
funcdef = 262
parameters = 263
typedargslist = 264
tfpdef = 265
varargslist = 266
vfpdef = 267
stmt = 268
simple_stmt = 269
small_stmt = 270
expr_stmt = 271
testlist_star_expr = 272
augassign = 273
del_stmt = 274
pass_stmt = 275
flow_stmt = 276
break_stmt = 277
continue_stmt = 278
return_stmt = 279
yield_stmt = 280
raise_stmt = 281
import_stmt = 282
import_name = 283
import_from = 284
import_as_name = 285
dotted_as_name = 286
import_as_names = 287
dotted_as_names = 288
dotted_name = 289
global_stmt = 290
nonlocal_stmt = 291
assert_stmt = 292
compound_stmt = 293
if_stmt = 294
while_stmt = 295
for_stmt = 296
try_stmt = 297
with_stmt = 298
with_item = 299
except_clause = 300
suite = 301
test = 302
test_nocond = 303
lambdef = 304
lambdef_nocond = 305
or_test = 306
and_test = 307
not_test = 308
comparison = 309
comp_op = 310
star_expr = 311
expr = 312
xor_expr = 313
and_expr = 314
shift_expr = 315
arith_expr = 316
term = 317
factor = 318
power = 319
atom = 320
testlist_comp = 321
trailer = 322
subscriptlist = 323
subscript = 324
sliceop = 325
exprlist = 326
testlist = 327
dictorsetmaker = 328
classdef = 329
arglist = 330
argument = 331
comp_iter = 332
comp_for = 333
comp_if = 334
encoding_decl = 335
yield_expr = 336
yield_arg = 337
#--end constants--
sym_name = {}
for _name, _value in list(globals().items()):
if type(_value) is type(0):
sym_name[_value] = _name
def _main():
import sys
import token
if len(sys.argv) == 1:
sys.argv = sys.argv + ["Include/graminit.h", "Lib/symbol.py"]
token._main()
if __name__ == "__main__":
_main()
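# Editor's note (illustrative): sym_name inverts the constants above,
# mapping grammar symbol numbers back to their names, e.g.
# sym_name[funcdef] == "funcdef" and sym_name[256] == "single_input".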
"""Interface to the compiler's internal symbol tables"""
import _symtable
from _symtable import (USE, DEF_GLOBAL, DEF_LOCAL, DEF_PARAM,
DEF_IMPORT, DEF_BOUND, OPT_IMPORT_STAR, SCOPE_OFF, SCOPE_MASK, FREE,
LOCAL, GLOBAL_IMPLICIT, GLOBAL_EXPLICIT, CELL)
import weakref
__all__ = ["symtable", "SymbolTable", "Class", "Function", "Symbol"]
def symtable(code, filename, compile_type):
top = _symtable.symtable(code, filename, compile_type)
return _newSymbolTable(top, filename)
class SymbolTableFactory:
def __init__(self):
self.__memo = weakref.WeakValueDictionary()
def new(self, table, filename):
if table.type == _symtable.TYPE_FUNCTION:
return Function(table, filename)
if table.type == _symtable.TYPE_CLASS:
return Class(table, filename)
return SymbolTable(table, filename)
def __call__(self, table, filename):
key = table, filename
obj = self.__memo.get(key, None)
if obj is None:
obj = self.__memo[key] = self.new(table, filename)
return obj
_newSymbolTable = SymbolTableFactory()
class SymbolTable(object):
def __init__(self, raw_table, filename):
self._table = raw_table
self._filename = filename
self._symbols = {}
def __repr__(self):
if self.__class__ == SymbolTable:
kind = ""
else:
kind = "%s " % self.__class__.__name__
if self._table.name == "global":
return "<{0}SymbolTable for module {1}>".format(kind, self._filename)
else:
return "<{0}SymbolTable for {1} in {2}>".format(kind,
self._table.name,
self._filename)
def get_type(self):
if self._table.type == _symtable.TYPE_MODULE:
return "module"
if self._table.type == _symtable.TYPE_FUNCTION:
return "function"
if self._table.type == _symtable.TYPE_CLASS:
return "class"
assert self._table.type in (1, 2, 3), \
"unexpected type: {0}".format(self._table.type)
def get_id(self):
return self._table.id
def get_name(self):
return self._table.name
def get_lineno(self):
return self._table.lineno
def is_optimized(self):
return bool(self._table.type == _symtable.TYPE_FUNCTION
and not self._table.optimized)
def is_nested(self):
return bool(self._table.nested)
def has_children(self):
return bool(self._table.children)
def has_exec(self):
"""Return true if the scope uses exec. Deprecated method."""
return False
def has_import_star(self):
"""Return true if the scope uses import *"""
return bool(self._table.optimized & OPT_IMPORT_STAR)
def get_identifiers(self):
return self._table.symbols.keys()
def lookup(self, name):
sym = self._symbols.get(name)
if sym is None:
flags = self._table.symbols[name]
namespaces = self.__check_children(name)
sym = self._symbols[name] = Symbol(name, flags, namespaces)
return sym
def get_symbols(self):
return [self.lookup(ident) for ident in self.get_identifiers()]
def __check_children(self, name):
return [_newSymbolTable(st, self._filename)
for st in self._table.children
if st.name == name]
def get_children(self):
return [_newSymbolTable(st, self._filename)
for st in self._table.children]
class Function(SymbolTable):
# Default values for instance variables
__params = None
__locals = None
__frees = None
__globals = None
def __idents_matching(self, test_func):
return tuple([ident for ident in self.get_identifiers()
if test_func(self._table.symbols[ident])])
def get_parameters(self):
if self.__params is None:
self.__params = self.__idents_matching(lambda x:x & DEF_PARAM)
return self.__params
def get_locals(self):
if self.__locals is None:
locs = (LOCAL, CELL)
test = lambda x: ((x >> SCOPE_OFF) & SCOPE_MASK) in locs
self.__locals = self.__idents_matching(test)
return self.__locals
def get_globals(self):
if self.__globals is None:
glob = (GLOBAL_IMPLICIT, GLOBAL_EXPLICIT)
test = lambda x:((x >> SCOPE_OFF) & SCOPE_MASK) in glob
self.__globals = self.__idents_matching(test)
return self.__globals
def get_frees(self):
if self.__frees is None:
is_free = lambda x:((x >> SCOPE_OFF) & SCOPE_MASK) == FREE
self.__frees = self.__idents_matching(is_free)
return self.__frees
class Class(SymbolTable):
__methods = None
def get_methods(self):
if self.__methods is None:
d = {}
for st in self._table.children:
d[st.name] = 1
self.__methods = tuple(d)
return self.__methods
class Symbol(object):
def __init__(self, name, flags, namespaces=None):
self.__name = name
self.__flags = flags
self.__scope = (flags >> SCOPE_OFF) & SCOPE_MASK # like PyST_GetScope()
self.__namespaces = namespaces or ()
def __repr__(self):
return "<symbol {0!r}>".format(self.__name)
def get_name(self):
return self.__name
def is_referenced(self):
return bool(self.__flags & _symtable.USE)
def is_parameter(self):
return bool(self.__flags & DEF_PARAM)
def is_global(self):
return bool(self.__scope in (GLOBAL_IMPLICIT, GLOBAL_EXPLICIT))
def is_declared_global(self):
return bool(self.__scope == GLOBAL_EXPLICIT)
def is_local(self):
return bool(self.__flags & DEF_BOUND)
def is_free(self):
return bool(self.__scope == FREE)
def is_imported(self):
return bool(self.__flags & DEF_IMPORT)
def is_assigned(self):
return bool(self.__flags & DEF_LOCAL)
def is_namespace(self):
"""Returns true if name binding introduces new namespace.
If the name is used as the target of a function or class
statement, this will be true.
Note that a single name can be bound to multiple objects. If
is_namespace() is true, the name may also be bound to other
objects, like an int or list, that do not introduce a new
namespace.
"""
return bool(self.__namespaces)
def get_namespaces(self):
"""Return a list of namespaces bound to this name"""
return self.__namespaces
def get_namespace(self):
"""Returns the single namespace bound to this name.
Raises ValueError if the name is bound to multiple namespaces.
"""
if len(self.__namespaces) != 1:
raise ValueError("name is bound to multiple namespaces")
return self.__namespaces[0]
if __name__ == "__main__":
import os, sys
with open(sys.argv[0]) as f:
src = f.read()
mod = symtable(src, os.path.split(sys.argv[0])[1], "exec")
for ident in mod.get_identifiers():
info = mod.lookup(ident)
print(info, info.is_local(), info.is_namespace())
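# Editor's sketch (illustrative): inspecting a function scope with the
# classes above; the source string is an arbitrary example.
#
#     code = "def f(a):\n    b = a + 1\n    return b\n"
#     mod = symtable(code, "<demo>", "exec")
#     func = mod.get_children()[0]          # the Function table for f
#     assert func.get_parameters() == ("a",)
#     assert set(func.get_locals()) == {"a", "b"}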
"""Access to Python's configuration information."""
import os
import sys
from os.path import pardir, realpath
__all__ = [
'get_config_h_filename',
'get_config_var',
'get_config_vars',
'get_makefile_filename',
'get_path',
'get_path_names',
'get_paths',
'get_platform',
'get_python_version',
'get_scheme_names',
'parse_config_h',
]
_INSTALL_SCHEMES = {
'posix_prefix': {
'stdlib': '{installed_base}/lib/{implementation_lower}{py_version_short}',
'platstdlib': '{platbase}/lib/{implementation_lower}{py_version_short}',
'purelib': '{base}/lib/{implementation_lower}{py_version_short}/site-packages',
'platlib': '{platbase}/lib/{implementation_lower}{py_version_short}/site-packages',
'include':
'{installed_base}/include/{implementation_lower}{py_version_short}{abiflags}',
'platinclude':
'{installed_platbase}/include/{implementation_lower}{py_version_short}{abiflags}',
'scripts': '{base}/bin',
'data': '{base}',
},
'posix_home': {
'stdlib': '{installed_base}/lib/{implementation_lower}',
'platstdlib': '{base}/lib/{implementation_lower}',
'purelib': '{base}/lib/{implementation_lower}',
'platlib': '{base}/lib/{implementation_lower}',
'include': '{installed_base}/include/{implementation_lower}',
'platinclude': '{installed_base}/include/{implementation_lower}',
'scripts': '{base}/bin',
'data': '{base}',
},
'nt': {
'stdlib': '{installed_base}/Lib',
'platstdlib': '{base}/Lib',
'purelib': '{base}/Lib/site-packages',
'platlib': '{base}/Lib/site-packages',
'include': '{installed_base}/Include',
'platinclude': '{installed_base}/Include',
'scripts': '{base}/Scripts',
'data': '{base}',
},
'nt_user': {
'stdlib': '{userbase}/{implementation}{py_version_nodot}',
'platstdlib': '{userbase}/{implementation}{py_version_nodot}',
'purelib': '{userbase}/{implementation}{py_version_nodot}/site-packages',
'platlib': '{userbase}/{implementation}{py_version_nodot}/site-packages',
'include': '{userbase}/{implementation}{py_version_nodot}/Include',
'scripts': '{userbase}/Scripts',
'data': '{userbase}',
},
'posix_user': {
'stdlib': '{userbase}/lib/{implementation_lower}{py_version_short}',
'platstdlib': '{userbase}/lib/{implementation_lower}{py_version_short}',
'purelib': '{userbase}/lib/{implementation_lower}{py_version_short}/site-packages',
'platlib': '{userbase}/lib/{implementation_lower}{py_version_short}/site-packages',
'include': '{userbase}/include/{implementation_lower}{py_version_short}',
'scripts': '{userbase}/bin',
'data': '{userbase}',
},
'osx_framework_user': {
'stdlib': '{userbase}/lib/{implementation_lower}',
'platstdlib': '{userbase}/lib/{implementation_lower}',
'purelib': '{userbase}/lib/{implementation_lower}/site-packages',
'platlib': '{userbase}/lib/{implementation_lower}/site-packages',
'include': '{userbase}/include',
'scripts': '{userbase}/bin',
'data': '{userbase}',
},
}
_SCHEME_KEYS = ('stdlib', 'platstdlib', 'purelib', 'platlib', 'include',
'scripts', 'data')
# FIXME don't rely on sys.version here, its format is an implementation detail
# of CPython, use sys.version_info or sys.hexversion
_PY_VERSION = sys.version.split()[0]
_PY_VERSION_SHORT = sys.version[:3]
_PY_VERSION_SHORT_NO_DOT = _PY_VERSION[0] + _PY_VERSION[2]
_PREFIX = os.path.normpath(sys.prefix)
_BASE_PREFIX = os.path.normpath(sys.base_prefix)
_EXEC_PREFIX = os.path.normpath(sys.exec_prefix)
_BASE_EXEC_PREFIX = os.path.normpath(sys.base_exec_prefix)
_CONFIG_VARS = None
_USER_BASE = None
def _get_implementation():
if sys.implementation.name == 'ironpython':
return 'IronPython'
return 'Python'
def _safe_realpath(path):
try:
return realpath(path)
except OSError:
return path
if sys.executable:
_PROJECT_BASE = os.path.dirname(_safe_realpath(sys.executable))
else:
# sys.executable can be empty if argv[0] has been changed and Python is
# unable to retrieve the real program name
_PROJECT_BASE = _safe_realpath(os.getcwd())
if os.name == "nt" and "pcbuild" in _PROJECT_BASE[-8:].lower():
_PROJECT_BASE = _safe_realpath(os.path.join(_PROJECT_BASE, pardir))
# PC/VS7.1
if os.name == "nt" and "\\pc\\v" in _PROJECT_BASE[-10:].lower():
_PROJECT_BASE = _safe_realpath(os.path.join(_PROJECT_BASE, pardir, pardir))
# PC/AMD64
if os.name == "nt" and "\\pcbuild\\amd64" in _PROJECT_BASE[-14:].lower():
_PROJECT_BASE = _safe_realpath(os.path.join(_PROJECT_BASE, pardir, pardir))
# set for cross builds
if "_PYTHON_PROJECT_BASE" in os.environ:
_PROJECT_BASE = _safe_realpath(os.environ["_PYTHON_PROJECT_BASE"])
def _is_python_source_dir(d):
for fn in ("Setup.dist", "Setup.local"):
if os.path.isfile(os.path.join(d, "Modules", fn)):
return True
return False
_sys_home = getattr(sys, '_home', None)
if _sys_home and os.name == 'nt' and \
_sys_home.lower().endswith(('pcbuild', 'pcbuild\\amd64')):
_sys_home = os.path.dirname(_sys_home)
if _sys_home.endswith('pcbuild'): # must be amd64
_sys_home = os.path.dirname(_sys_home)
def is_python_build(check_home=False):
if check_home and _sys_home:
return _is_python_source_dir(_sys_home)
return _is_python_source_dir(_PROJECT_BASE)
_PYTHON_BUILD = is_python_build(True)
if _PYTHON_BUILD:
for scheme in ('posix_prefix', 'posix_home'):
_INSTALL_SCHEMES[scheme]['include'] = '{srcdir}/Include'
_INSTALL_SCHEMES[scheme]['platinclude'] = '{projectbase}/.'
def _subst_vars(s, local_vars):
try:
return s.format(**local_vars)
except KeyError:
try:
return s.format(**os.environ)
except KeyError as var:
raise AttributeError('{%s}' % var)
def _extend_dict(target_dict, other_dict):
target_keys = target_dict.keys()
for key, value in other_dict.items():
if key in target_keys:
continue
target_dict[key] = value
def _expand_vars(scheme, vars):
res = {}
if vars is None:
vars = {}
_extend_dict(vars, get_config_vars())
for key, value in _INSTALL_SCHEMES[scheme].items():
if os.name in ('posix', 'nt'):
value = os.path.expanduser(value)
res[key] = os.path.normpath(_subst_vars(value, vars))
return res
def _get_default_scheme():
if os.name == 'posix':
# the default scheme for posix is posix_prefix
return 'posix_prefix'
return os.name
def _getuserbase():
env_base = os.environ.get("PYTHONUSERBASE", None)
def joinuser(*args):
return os.path.expanduser(os.path.join(*args))
if os.name == "nt":
base = os.environ.get("APPDATA") or "~"
if env_base:
return env_base
else:
return joinuser(base, "Python")
if sys.platform == "darwin":
framework = get_config_var("PYTHONFRAMEWORK")
if framework:
if env_base:
return env_base
else:
return joinuser("~", "Library", framework, "%d.%d" %
sys.version_info[:2])
if env_base:
return env_base
else:
return joinuser("~", ".local")
def _parse_makefile(filename, vars=None):
"""Parse a Makefile-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
# Regexes needed for parsing Makefile (and similar syntaxes,
# like old-style Setup files).
import re
_variable_rx = re.compile(r"([a-zA-Z][a-zA-Z0-9_]+)\s*=\s*(.*)")
_findvar1_rx = re.compile(r"\$\(([A-Za-z][A-Za-z0-9_]*)\)")
_findvar2_rx = re.compile(r"\${([A-Za-z][A-Za-z0-9_]*)}")
if vars is None:
vars = {}
done = {}
notdone = {}
with open(filename, errors="surrogateescape") as f:
lines = f.readlines()
for line in lines:
if line.startswith('#') or line.strip() == '':
continue
m = _variable_rx.match(line)
if m:
n, v = m.group(1, 2)
v = v.strip()
# `$$' is a literal `$' in make
tmpv = v.replace('$$', '')
if "$" in tmpv:
notdone[n] = v
else:
try:
v = int(v)
except ValueError:
# insert literal `$'
done[n] = v.replace('$$', '$')
else:
done[n] = v
# do variable interpolation here
variables = list(notdone.keys())
# Variables with a 'PY_' prefix in the makefile. These need to
# be made available without that prefix through sysconfig.
# Special care is needed to ensure that variable expansion works, even
# if the expansion uses the name without a prefix.
renamed_variables = ('CFLAGS', 'LDFLAGS', 'CPPFLAGS')
while len(variables) > 0:
for name in tuple(variables):
value = notdone[name]
m = _findvar1_rx.search(value) or _findvar2_rx.search(value)
if m is not None:
n = m.group(1)
found = True
if n in done:
item = str(done[n])
elif n in notdone:
# get it on a subsequent round
found = False
elif n in os.environ:
# do it like make: fall back to environment
item = os.environ[n]
elif n in renamed_variables:
if (name.startswith('PY_') and
name[3:] in renamed_variables):
item = ""
elif 'PY_' + n in notdone:
found = False
else:
item = str(done['PY_' + n])
else:
done[n] = item = ""
if found:
after = value[m.end():]
value = value[:m.start()] + item + after
if "$" in after:
notdone[name] = value
else:
try:
value = int(value)
except ValueError:
done[name] = value.strip()
else:
done[name] = value
variables.remove(name)
if name.startswith('PY_') \
and name[3:] in renamed_variables:
name = name[3:]
if name not in done:
done[name] = value
else:
# bogus variable reference (e.g. "prefix=$/opt/python");
# just drop it since we can't deal
done[name] = value
variables.remove(name)
# strip spurious spaces
for k, v in done.items():
if isinstance(v, str):
done[k] = v.strip()
# save the results in the global dictionary
vars.update(done)
return vars
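# Editor's sketch (illustrative): _parse_makefile resolves $(VAR) and
# ${VAR} references and coerces integer values. For a file containing
#
#     CC = gcc
#     PY_CFLAGS = -O2
#     OPT = $(PY_CFLAGS) -g
#     LEVEL = 2
#
# it returns {'CC': 'gcc', 'PY_CFLAGS': '-O2', 'OPT': '-O2 -g', 'LEVEL': 2}.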
def get_makefile_filename():
"""Return the path of the Makefile."""
if _PYTHON_BUILD:
return os.path.join(_sys_home or _PROJECT_BASE, "Makefile")
if hasattr(sys, 'abiflags'):
config_dir_name = 'config-%s%s' % (_PY_VERSION_SHORT, sys.abiflags)
else:
config_dir_name = 'config'
return os.path.join(get_path('stdlib'), config_dir_name, 'Makefile')
def _generate_posix_vars():
"""Generate the Python module containing build-time variables."""
import pprint
vars = {}
# load the installed Makefile:
makefile = get_makefile_filename()
try:
_parse_makefile(makefile, vars)
except OSError as e:
msg = "invalid Python installation: unable to open %s" % makefile
if hasattr(e, "strerror"):
msg = msg + " (%s)" % e.strerror
raise OSError(msg)
# load the installed pyconfig.h:
config_h = get_config_h_filename()
try:
with open(config_h) as f:
parse_config_h(f, vars)
except OSError as e:
msg = "invalid Python installation: unable to open %s" % config_h
if hasattr(e, "strerror"):
msg = msg + " (%s)" % e.strerror
raise OSError(msg)
# On AIX, there are wrong paths to the linker scripts in the Makefile
# -- these paths are relative to the Python source, but when installed
# the scripts are in another directory.
if _PYTHON_BUILD:
vars['BLDSHARED'] = vars['LDSHARED']
# There's a chicken-and-egg situation on OS X with regards to the
# _sysconfigdata module after the changes introduced by #15298:
# get_config_vars() is called by get_platform() as part of the
# `make pybuilddir.txt` target -- which is a precursor to the
# _sysconfigdata.py module being constructed. Unfortunately,
# get_config_vars() eventually calls _init_posix(), which attempts
# to import _sysconfigdata, which we won't have built yet. In order
# for _init_posix() to work, if we're on Darwin, just mock up the
# _sysconfigdata module manually and populate it with the build vars.
# This is more than sufficient for ensuring the subsequent call to
# get_platform() succeeds.
name = '_sysconfigdata'
if 'darwin' in sys.platform:
import types
module = types.ModuleType(name)
module.build_time_vars = vars
sys.modules[name] = module
pybuilddir = 'build/lib.%s-%s' % (get_platform(), sys.version[:3])
if hasattr(sys, "gettotalrefcount"):
pybuilddir += '-pydebug'
os.makedirs(pybuilddir, exist_ok=True)
destfile = os.path.join(pybuilddir, name + '.py')
with open(destfile, 'w', encoding='utf8') as f:
f.write('# system configuration generated and used by'
' the sysconfig module\n')
f.write('build_time_vars = ')
pprint.pprint(vars, stream=f)
# Create file used for sys.path fixup -- see Modules/getpath.c
with open('pybuilddir.txt', 'w', encoding='ascii') as f:
f.write(pybuilddir)
def _init_posix(vars):
"""Initialize the module as appropriate for POSIX systems."""
# _sysconfigdata is generated at build time, see _generate_posix_vars()
from _sysconfigdata import build_time_vars
vars.update(build_time_vars)
def _init_non_posix(vars):
"""Initialize the module as appropriate for NT"""
# set basic install directories
vars['LIBDEST'] = get_path('stdlib')
vars['BINLIBDEST'] = get_path('platstdlib')
vars['INCLUDEPY'] = get_path('include')
vars['EXT_SUFFIX'] = '.pyd'
vars['EXE'] = '.exe'
vars['VERSION'] = _PY_VERSION_SHORT_NO_DOT
vars['BINDIR'] = os.path.dirname(_safe_realpath(sys.executable))
#
# public APIs
#
def parse_config_h(fp, vars=None):
"""Parse a config.h-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
if vars is None:
vars = {}
import re
define_rx = re.compile("#define ([A-Z][A-Za-z0-9_]+) (.*)\n")
undef_rx = re.compile("/[*] #undef ([A-Z][A-Za-z0-9_]+) [*]/\n")
while True:
line = fp.readline()
if not line:
break
m = define_rx.match(line)
if m:
n, v = m.group(1, 2)
try:
v = int(v)
except ValueError:
pass
vars[n] = v
else:
m = undef_rx.match(line)
if m:
vars[m.group(1)] = 0
return vars
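# A minimal usage sketch for parse_config_h(), with hypothetical header
# content (any file-like object with readline() works):
#
#     import io
#     vars = parse_config_h(io.StringIO(
#         '#define HAVE_FORK 1\n'
#         '/* #undef HAVE_SPAWN */\n'))
#     # vars == {'HAVE_FORK': 1, 'HAVE_SPAWN': 0}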
def get_config_h_filename():
"""Return the path of pyconfig.h."""
if _PYTHON_BUILD:
if os.name == "nt":
inc_dir = os.path.join(_sys_home or _PROJECT_BASE, "PC")
else:
inc_dir = _sys_home or _PROJECT_BASE
else:
inc_dir = get_path('platinclude')
return os.path.join(inc_dir, 'pyconfig.h')
def get_scheme_names():
"""Return a tuple containing the schemes names."""
return tuple(sorted(_INSTALL_SCHEMES))
def get_path_names():
"""Return a tuple containing the paths names."""
return _SCHEME_KEYS
def get_paths(scheme=_get_default_scheme(), vars=None, expand=True):
"""Return a mapping containing an install scheme.
``scheme`` is the install scheme name. If not provided, it will
return the default scheme for the current platform.
"""
if expand:
return _expand_vars(scheme, vars)
else:
return _INSTALL_SCHEMES[scheme]
def get_path(name, scheme=_get_default_scheme(), vars=None, expand=True):
"""Return a path corresponding to the scheme.
``scheme`` is the install scheme name.
"""
return get_paths(scheme, vars, expand)[name]
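# Usage sketch for the two accessors above; the concrete paths depend on
# the installation, so the values shown are illustrative only:
#
#     import sysconfig
#     sysconfig.get_paths()['stdlib']    # e.g. '/usr/lib/python3.4'
#     sysconfig.get_path('include')      # directory holding Python.h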
def get_config_vars(*args):
"""With no arguments, return a dictionary of all configuration
variables relevant for the current platform.
On Unix, this means every variable defined in Python's installed Makefile;
on Windows it's a much smaller set.
With arguments, return a list of values that result from looking up
each argument in the configuration variable dictionary.
"""
global _CONFIG_VARS
if _CONFIG_VARS is None:
_CONFIG_VARS = {}
# Normalized versions of prefix and exec_prefix are handy to have;
# in fact, these are the standard versions used most places in the
# Distutils.
_CONFIG_VARS['prefix'] = _PREFIX
_CONFIG_VARS['exec_prefix'] = _EXEC_PREFIX
_CONFIG_VARS['py_version'] = _PY_VERSION
_CONFIG_VARS['py_version_short'] = _PY_VERSION_SHORT
_CONFIG_VARS['py_version_nodot'] = _PY_VERSION[0] + _PY_VERSION[2]
_CONFIG_VARS['installed_base'] = _BASE_PREFIX
_CONFIG_VARS['base'] = _PREFIX
_CONFIG_VARS['installed_platbase'] = _BASE_EXEC_PREFIX
_CONFIG_VARS['platbase'] = _EXEC_PREFIX
_CONFIG_VARS['projectbase'] = _PROJECT_BASE
try:
_CONFIG_VARS['abiflags'] = sys.abiflags
except AttributeError:
# sys.abiflags may not be defined on all platforms.
_CONFIG_VARS['abiflags'] = ''
_CONFIG_VARS['implementation'] = _get_implementation()
_CONFIG_VARS['implementation_lower'] = _get_implementation().lower()
if os.name == 'nt':
_init_non_posix(_CONFIG_VARS)
if os.name == 'posix':
_init_posix(_CONFIG_VARS)
# For backward compatibility, see issue19555
SO = _CONFIG_VARS.get('EXT_SUFFIX')
if SO is not None:
_CONFIG_VARS['SO'] = SO
# Setting 'userbase' is done below the call to the
# init function to enable using 'get_config_var' in
# the init-function.
_CONFIG_VARS['userbase'] = _getuserbase()
# Always convert srcdir to an absolute path
srcdir = _CONFIG_VARS.get('srcdir', _PROJECT_BASE)
if os.name == 'posix':
if _PYTHON_BUILD:
# If srcdir is a relative path (typically '.' or '..')
# then it should be interpreted relative to the directory
# containing Makefile.
base = os.path.dirname(get_makefile_filename())
srcdir = os.path.join(base, srcdir)
else:
# srcdir is not meaningful since the installation is
# spread about the filesystem. We choose the
# directory containing the Makefile since we know it
# exists.
srcdir = os.path.dirname(get_makefile_filename())
_CONFIG_VARS['srcdir'] = _safe_realpath(srcdir)
# OS X platforms require special customization to handle
# multi-architecture, multi-os-version installers
if sys.platform == 'darwin':
import _osx_support
_osx_support.customize_config_vars(_CONFIG_VARS)
if args:
vals = []
for name in args:
vals.append(_CONFIG_VARS.get(name))
return vals
else:
return _CONFIG_VARS
def get_config_var(name):
"""Return the value of a single variable using the dictionary returned by
'get_config_vars()'.
Equivalent to get_config_vars().get(name)
"""
if name == 'SO':
import warnings
warnings.warn('SO is deprecated, use EXT_SUFFIX', DeprecationWarning, 2)
return get_config_vars().get(name)
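# Usage sketch (the values vary by platform and build; '.pyd' is what
# _init_non_posix() sets on Windows):
#
#     import sysconfig
#     sysconfig.get_config_var('EXT_SUFFIX')              # e.g. '.pyd'
#     sysconfig.get_config_vars('prefix', 'exec_prefix')  # list of 2 values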
def get_platform():
"""Return a string that identifies the current platform.
This is used mainly to distinguish platform-specific build directories and
platform-specific built distributions. Typically includes the OS name
and version and the architecture (as supplied by 'os.uname()'),
although the exact information included depends on the OS; e.g. for IRIX
the architecture isn't particularly important (IRIX only runs on SGI
hardware), but for Linux the kernel version isn't particularly
important.
Examples of returned values:
linux-i586
linux-alpha (?)
solaris-2.6-sun4u
irix-5.3
irix64-6.2
Windows will return one of:
win-amd64 (64bit Windows on AMD64, aka x86_64, Intel64, EM64T, etc.)
win-ia64 (64bit Windows on Itanium)
win32 (all others - specifically, sys.platform is returned)
For other non-POSIX platforms, currently just returns 'sys.platform'.
"""
if os.name == 'nt':
# sniff sys.version for architecture.
prefix = " bit ("
i = sys.version.find(prefix)
if i == -1:
return sys.platform
j = sys.version.find(")", i)
look = sys.version[i+len(prefix):j].lower()
if look == 'amd64':
return 'win-amd64'
if look == 'itanium':
return 'win-ia64'
return sys.platform
if os.name != "posix" or not hasattr(os, 'uname'):
# XXX what about the architecture? NT is Intel or Alpha
return sys.platform
# Set for cross builds explicitly
if "_PYTHON_HOST_PLATFORM" in os.environ:
return os.environ["_PYTHON_HOST_PLATFORM"]
# Try to distinguish various flavours of Unix
osname, host, release, version, machine = os.uname()
# Convert the OS name to lowercase, remove '/' characters
# (to accommodate BSD/OS), and translate spaces (for "Power Macintosh")
osname = osname.lower().replace('/', '')
machine = machine.replace(' ', '_')
machine = machine.replace('/', '-')
if osname[:5] == "linux":
# At least on Linux/Intel, 'machine' is the processor --
# i386, etc.
# XXX what about Alpha, SPARC, etc?
return "%s-%s" % (osname, machine)
elif osname[:5] == "sunos":
if release[0] >= "5": # SunOS 5 == Solaris 2
osname = "solaris"
release = "%d.%s" % (int(release[0]) - 3, release[2:])
# We can't use "platform.architecture()[0]" because of a
# bootstrap problem. We use a dict to get an error
# if something suspicious happens.
bitness = {2147483647:"32bit", 9223372036854775807:"64bit"}
machine += ".%s" % bitness[sys.maxsize]
# fall through to standard osname-release-machine representation
elif osname[:4] == "irix": # could be "irix64"!
return "%s-%s" % (osname, release)
elif osname[:3] == "aix":
return "%s-%s.%s" % (osname, version, release)
elif osname[:6] == "cygwin":
osname = "cygwin"
import re
rel_re = re.compile(r'[\d.]+')
m = rel_re.match(release)
if m:
release = m.group()
elif osname[:6] == "darwin":
import _osx_support
osname, release, machine = _osx_support.get_platform_osx(
get_config_vars(),
osname, release, machine)
return "%s-%s-%s" % (osname, release, machine)
def get_python_version():
return _PY_VERSION_SHORT
def _print_dict(title, data):
for index, (key, value) in enumerate(sorted(data.items())):
if index == 0:
print('%s: ' % (title))
print('\t%s = "%s"' % (key, value))
def _main():
"""Display all information sysconfig detains."""
if '--generate-posix-vars' in sys.argv:
_generate_posix_vars()
return
print('Platform: "%s"' % get_platform())
print('Python version: "%s"' % get_python_version())
print('Current installation scheme: "%s"' % _get_default_scheme())
print()
_print_dict('Paths', get_paths())
print()
_print_dict('Variables', get_config_vars())
if __name__ == '__main__':
_main()
#! /usr/bin/env python3
"""The Tab Nanny despises ambiguous indentation. She knows no mercy.
tabnanny -- Detection of ambiguous indentation
For the time being this module is intended to be called as a script.
However it is possible to import it into an IDE and use the function
check() described below.
Warning: The API provided by this module is likely to change in future
releases; such changes may not be backward compatible.
"""
# Released to the public domain, by Tim Peters, 15 April 1998.
# XXX Note: this is now a standard library module.
# XXX The API needs to undergo changes however; the current code is too
# XXX script-like. This will be addressed later.
__version__ = "6"
import os
import sys
import getopt
import tokenize
if not hasattr(tokenize, 'NL'):
raise ValueError("tokenize.NL doesn't exist -- tokenize module too old")
__all__ = ["check", "NannyNag", "process_tokens"]
verbose = 0
filename_only = 0
def errprint(*args):
sep = ""
for arg in args:
sys.stderr.write(sep + str(arg))
sep = " "
sys.stderr.write("\n")
def main():
global verbose, filename_only
try:
opts, args = getopt.getopt(sys.argv[1:], "qv")
except getopt.error as msg:
errprint(msg)
return
for o, a in opts:
if o == '-q':
filename_only = filename_only + 1
if o == '-v':
verbose = verbose + 1
if not args:
errprint("Usage:", sys.argv[0], "[-v] file_or_directory ...")
return
for arg in args:
check(arg)
class NannyNag(Exception):
"""
Raised by process_tokens() if detecting an ambiguous indent.
Captured and handled in check().
"""
def __init__(self, lineno, msg, line):
self.lineno, self.msg, self.line = lineno, msg, line
def get_lineno(self):
return self.lineno
def get_msg(self):
return self.msg
def get_line(self):
return self.line
def check(file):
"""check(file_or_dir)
If file_or_dir is a directory and not a symbolic link, then recursively
descend the directory tree named by file_or_dir, checking all .py files
along the way. If file_or_dir is an ordinary Python source file, it is
checked for whitespace-related problems. The diagnostic messages are
written to standard output using the print function.
"""
if os.path.isdir(file) and not os.path.islink(file):
if verbose:
print("%r: listing directory" % (file,))
names = os.listdir(file)
for name in names:
fullname = os.path.join(file, name)
if (os.path.isdir(fullname) and
not os.path.islink(fullname) or
os.path.normcase(name[-3:]) == ".py"):
check(fullname)
return
try:
f = tokenize.open(file)
except OSError as msg:
errprint("%r: I/O Error: %s" % (file, msg))
return
if verbose > 1:
print("checking %r ..." % file)
try:
process_tokens(tokenize.generate_tokens(f.readline))
except tokenize.TokenError as msg:
errprint("%r: Token Error: %s" % (file, msg))
return
except IndentationError as msg:
errprint("%r: Indentation Error: %s" % (file, msg))
return
except NannyNag as nag:
badline = nag.get_lineno()
line = nag.get_line()
if verbose:
print("%r: *** Line %d: trouble in tab city! ***" % (file, badline))
print("offending line: %r" % (line,))
print(nag.get_msg())
else:
if ' ' in file: file = '"' + file + '"'
if filename_only: print(file)
else: print(file, badline, repr(line))
return
finally:
f.close()
if verbose:
print("%r: Clean bill of health." % (file,))
class Whitespace:
# the characters used for space and tab
S, T = ' \t'
# members:
# raw
# the original string
# n
# the number of leading whitespace characters in raw
# nt
# the number of tabs in raw[:n]
# norm
# the normal form as a pair (count, trailing), where:
# count
# a tuple such that raw[:n] contains count[i]
# instances of S * i + T
# trailing
# the number of trailing spaces in raw[:n]
# It's A Theorem that m.indent_level(t) ==
# n.indent_level(t) for all t >= 1 iff m.norm == n.norm.
# is_simple
# true iff raw[:n] is of the form (T*)(S*)
def __init__(self, ws):
self.raw = ws
S, T = Whitespace.S, Whitespace.T
count = []
b = n = nt = 0
for ch in self.raw:
if ch == S:
n = n + 1
b = b + 1
elif ch == T:
n = n + 1
nt = nt + 1
if b >= len(count):
count = count + [0] * (b - len(count) + 1)
count[b] = count[b] + 1
b = 0
else:
break
self.n = n
self.nt = nt
self.norm = tuple(count), b
self.is_simple = len(count) <= 1
# return length of longest contiguous run of spaces (whether or not
# preceding a tab)
def longest_run_of_spaces(self):
count, trailing = self.norm
return max(len(count)-1, trailing)
def indent_level(self, tabsize):
# count, il = self.norm
# for i in range(len(count)):
# if count[i]:
# il = il + (i//tabsize + 1)*tabsize * count[i]
# return il
# quicker:
# il = trailing + sum (i//ts + 1)*ts*count[i] =
# trailing + ts * sum (i//ts + 1)*count[i] =
# trailing + ts * sum i//ts*count[i] + count[i] =
# trailing + ts * [(sum i//ts*count[i]) + (sum count[i])] =
# trailing + ts * [(sum i//ts*count[i]) + num_tabs]
# and note that i//ts*count[i] is 0 when i < ts
count, trailing = self.norm
il = 0
for i in range(tabsize, len(count)):
il = il + i//tabsize * count[i]
return trailing + tabsize * (il + self.nt)
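# Worked example of the arithmetic above: for raw == " \t" (a space, then
# a tab) the constructor yields norm == ((0, 1), 0) and nt == 1, so
# indent_level(8) == 8 but indent_level(1) == 2.  For raw == "\t" alone,
# indent_level(8) == 8 while indent_level(1) == 1 -- the two strings agree
# at tab size 8 and disagree at tab size 1, which is exactly the kind of
# ambiguity tabnanny hunts for.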
# return true iff self.indent_level(t) == other.indent_level(t)
# for all t >= 1
def equal(self, other):
return self.norm == other.norm
# return a list of tuples (ts, i1, i2) such that
# i1 == self.indent_level(ts) != other.indent_level(ts) == i2.
# Intended to be used after not self.equal(other) is known, in which
# case it will return at least one witnessing tab size.
def not_equal_witness(self, other):
n = max(self.longest_run_of_spaces(),
other.longest_run_of_spaces()) + 1
a = []
for ts in range(1, n+1):
if self.indent_level(ts) != other.indent_level(ts):
a.append( (ts,
self.indent_level(ts),
other.indent_level(ts)) )
return a
# Return True iff self.indent_level(t) < other.indent_level(t)
# for all t >= 1.
# The algorithm is due to Vincent Broman.
# Easy to prove it's correct.
# XXXpost that.
# Trivial to prove n is sharp (consider T vs ST).
# Unknown whether there's a faster general way. I suspected so at
# first, but no longer.
# For the special (but common!) case where M and N are both of the
# form (T*)(S*), M.less(N) iff M.len() < N.len() and
# M.num_tabs() <= N.num_tabs(). Proof is easy but kinda long-winded.
# XXXwrite that up.
# Note that M is of the form (T*)(S*) iff len(M.norm[0]) <= 1.
def less(self, other):
if self.n >= other.n:
return False
if self.is_simple and other.is_simple:
return self.nt <= other.nt
n = max(self.longest_run_of_spaces(),
other.longest_run_of_spaces()) + 1
# the self.n >= other.n test already did it for ts=1
for ts in range(2, n+1):
if self.indent_level(ts) >= other.indent_level(ts):
return False
return True
# return a list of tuples (ts, i1, i2) such that
# i1 == self.indent_level(ts) >= other.indent_level(ts) == i2.
# Intended to be used after not self.less(other) is known, in which
# case it will return at least one witnessing tab size.
def not_less_witness(self, other):
n = max(self.longest_run_of_spaces(),
other.longest_run_of_spaces()) + 1
a = []
for ts in range(1, n+1):
if self.indent_level(ts) >= other.indent_level(ts):
a.append( (ts,
self.indent_level(ts),
other.indent_level(ts)) )
return a
def format_witnesses(w):
firsts = (str(tup[0]) for tup in w)
prefix = "at tab size"
if len(w) > 1:
prefix = prefix + "s"
return prefix + " " + ', '.join(firsts)
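# Sketch: with a single witness, say w == [(1, 2, 1)], format_witnesses(w)
# returns 'at tab size 1'; with two or more witnesses the prefix becomes
# plural, e.g. 'at tab sizes 1, 2'.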
def process_tokens(tokens):
INDENT = tokenize.INDENT
DEDENT = tokenize.DEDENT
NEWLINE = tokenize.NEWLINE
JUNK = tokenize.COMMENT, tokenize.NL
indents = [Whitespace("")]
check_equal = 0
for (type, token, start, end, line) in tokens:
if type == NEWLINE:
# a program statement, or ENDMARKER, will eventually follow,
# after some (possibly empty) run of tokens of the form
# (NL | COMMENT)* (INDENT | DEDENT+)?
# If an INDENT appears, setting check_equal is wrong, and will
# be undone when we see the INDENT.
check_equal = 1
elif type == INDENT:
check_equal = 0
thisguy = Whitespace(token)
if not indents[-1].less(thisguy):
witness = indents[-1].not_less_witness(thisguy)
msg = "indent not greater e.g. " + format_witnesses(witness)
raise NannyNag(start[0], msg, line)
indents.append(thisguy)
elif type == DEDENT:
# there's nothing we need to check here! what's important is
# that when the run of DEDENTs ends, the indentation of the
# program statement (or ENDMARKER) that triggered the run is
# equal to what's left at the top of the indents stack
# Ouch! This assert triggers if the last line of the source
# is indented *and* lacks a newline -- then DEDENTs pop out
# of thin air.
# assert check_equal # else no earlier NEWLINE, or an earlier INDENT
check_equal = 1
del indents[-1]
elif check_equal and type not in JUNK:
# this is the first "real token" following a NEWLINE, so it
# must be the first token of the next program statement, or an
# ENDMARKER; the "line" argument exposes the leading whitespace
# for this statement; in the case of ENDMARKER, line is an empty
# string, so will properly match the empty string with which the
# "indents" stack was seeded
check_equal = 0
thisguy = Whitespace(line)
if not indents[-1].equal(thisguy):
witness = indents[-1].not_equal_witness(thisguy)
msg = "indent not equal e.g. " + format_witnesses(witness)
raise NannyNag(start[0], msg, line)
if __name__ == '__main__':
main()
#!/usr/bin/env python3
#-------------------------------------------------------------------
# tarfile.py
#-------------------------------------------------------------------
# Copyright (C) 2002 Lars Gustaebel <[email protected]>
# All rights reserved.
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
#
"""Read from and write to tar format archives.
"""
version = "0.9.0"
__author__ = "Lars Gust\u00e4bel ([email protected])"
__date__ = "$Date: 2011-02-25 17:42:01 +0200 (Fri, 25 Feb 2011) $"
__cvsid__ = "$Id: tarfile.py 88586 2011-02-25 15:42:01Z marc-andre.lemburg $"
__credits__ = "Gustavo Niemeyer, Niels Gust\u00e4bel, Richard Townsend."
#---------
# Imports
#---------
from builtins import open as bltn_open
import sys
import os
import io
import shutil
import stat
import time
import struct
import copy
import re
try:
import grp, pwd
except ImportError:
grp = pwd = None
# os.symlink on Windows prior to 6.0 raises NotImplementedError
symlink_exception = (AttributeError, NotImplementedError)
try:
# OSError (winerror=1314) will be raised if the caller does not hold the
# SeCreateSymbolicLinkPrivilege privilege
symlink_exception += (OSError,)
except NameError:
pass
# from tarfile import *
__all__ = ["TarFile", "TarInfo", "is_tarfile", "TarError"]
#---------------------------------------------------------
# tar constants
#---------------------------------------------------------
NUL = b"\0" # the null character
BLOCKSIZE = 512 # length of processing blocks
RECORDSIZE = BLOCKSIZE * 20 # length of records
GNU_MAGIC = b"ustar \0" # magic gnu tar string
POSIX_MAGIC = b"ustar\x0000" # magic posix tar string
LENGTH_NAME = 100 # maximum length of a filename
LENGTH_LINK = 100 # maximum length of a linkname
LENGTH_PREFIX = 155 # maximum length of the prefix field
REGTYPE = b"0" # regular file
AREGTYPE = b"\0" # regular file
LNKTYPE = b"1" # link (inside tarfile)
SYMTYPE = b"2" # symbolic link
CHRTYPE = b"3" # character special device
BLKTYPE = b"4" # block special device
DIRTYPE = b"5" # directory
FIFOTYPE = b"6" # fifo special device
CONTTYPE = b"7" # contiguous file
GNUTYPE_LONGNAME = b"L" # GNU tar longname
GNUTYPE_LONGLINK = b"K" # GNU tar longlink
GNUTYPE_SPARSE = b"S" # GNU tar sparse file
XHDTYPE = b"x" # POSIX.1-2001 extended header
XGLTYPE = b"g" # POSIX.1-2001 global header
SOLARIS_XHDTYPE = b"X" # Solaris extended header
USTAR_FORMAT = 0 # POSIX.1-1988 (ustar) format
GNU_FORMAT = 1 # GNU tar format
PAX_FORMAT = 2 # POSIX.1-2001 (pax) format
DEFAULT_FORMAT = GNU_FORMAT
#---------------------------------------------------------
# tarfile constants
#---------------------------------------------------------
# File types that tarfile supports:
SUPPORTED_TYPES = (REGTYPE, AREGTYPE, LNKTYPE,
SYMTYPE, DIRTYPE, FIFOTYPE,
CONTTYPE, CHRTYPE, BLKTYPE,
GNUTYPE_LONGNAME, GNUTYPE_LONGLINK,
GNUTYPE_SPARSE)
# File types that will be treated as a regular file.
REGULAR_TYPES = (REGTYPE, AREGTYPE,
CONTTYPE, GNUTYPE_SPARSE)
# File types that are part of the GNU tar format.
GNU_TYPES = (GNUTYPE_LONGNAME, GNUTYPE_LONGLINK,
GNUTYPE_SPARSE)
# Fields from a pax header that override a TarInfo attribute.
PAX_FIELDS = ("path", "linkpath", "size", "mtime",
"uid", "gid", "uname", "gname")
# Fields from a pax header that are affected by hdrcharset.
PAX_NAME_FIELDS = {"path", "linkpath", "uname", "gname"}
# Fields in a pax header that are numbers, all other fields
# are treated as strings.
PAX_NUMBER_FIELDS = {
"atime": float,
"ctime": float,
"mtime": float,
"uid": int,
"gid": int,
"size": int
}
#---------------------------------------------------------
# initialization
#---------------------------------------------------------
if os.name in ("nt", "ce"):
ENCODING = "utf-8"
else:
ENCODING = sys.getfilesystemencoding()
#---------------------------------------------------------
# Some useful functions
#---------------------------------------------------------
def stn(s, length, encoding, errors):
"""Convert a string to a null-terminated bytes object.
"""
s = s.encode(encoding, errors)
return s[:length] + (length - len(s)) * NUL
def nts(s, encoding, errors):
"""Convert a null-terminated bytes object to a string.
"""
p = s.find(b"\0")
if p != -1:
s = s[:p]
return s.decode(encoding, errors)
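# Round-trip sketch for the two helpers above (illustrative values):
#
#     stn("foo", 8, "ascii", "strict")  == b"foo\x00\x00\x00\x00\x00"
#     nts(b"foo\x00\x00\x00\x00\x00", "ascii", "strict") == "foo"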
def nti(s):
"""Convert a number field to a python number.
"""
# There are two possible encodings for a number field, see
# itn() below.
if s[0] in (0o200, 0o377):
n = 0
for i in range(len(s) - 1):
n <<= 8
n += s[i + 1]
if s[0] == 0o377:
n = -(256 ** (len(s) - 1) - n)
else:
try:
s = nts(s, "ascii", "strict")
n = int(s.strip() or "0", 8)
except ValueError:
raise InvalidHeaderError("invalid header")
return n
def itn(n, digits=8, format=DEFAULT_FORMAT):
"""Convert a python number to a number field.
"""
# POSIX 1003.1-1988 requires numbers to be encoded as a string of
# octal digits followed by a null-byte, this allows values up to
# (8**(digits-1))-1. GNU tar allows storing numbers greater than
# that if necessary. A leading 0o200 or 0o377 byte indicates this
# particular encoding, the following digits-1 bytes are a big-endian
# base-256 representation. This allows values up to (256**(digits-1))-1.
# A 0o200 byte indicates a positive number, a 0o377 byte a negative
# number.
if 0 <= n < 8 ** (digits - 1):
s = bytes("%0*o" % (digits - 1, int(n)), "ascii") + NUL
elif format == GNU_FORMAT and -256 ** (digits - 1) <= n < 256 ** (digits - 1):
if n >= 0:
s = bytearray([0o200])
else:
s = bytearray([0o377])
n = 256 ** digits + n
for i in range(digits - 1):
s.insert(1, n & 0o377)
n >>= 8
else:
raise ValueError("overflow in number field")
return s
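# Round-trip sketch (default digits=8, so seven octal digits plus NUL):
#
#     itn(123)              == b"0000173\x00"   # 0o173 == 123
#     nti(b"0000173\x00")   == 123
#
# Values outside the octal range fall back to the base-256 (GNU) encoding
# described above.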
def calc_chksums(buf):
"""Calculate the checksum for a member's header by summing up all
characters except for the chksum field which is treated as if
it was filled with spaces. According to the GNU tar sources,
some tars (Sun and NeXT) calculate chksum with signed char,
which will be different if there are chars in the buffer with
the high bit set. So we calculate two checksums, unsigned and
signed.
"""
unsigned_chksum = 256 + sum(struct.unpack_from("148B8x356B", buf))
signed_chksum = 256 + sum(struct.unpack_from("148b8x356b", buf))
return unsigned_chksum, signed_chksum
def copyfileobj(src, dst, length=None, exception=OSError):
"""Copy length bytes from fileobj src to fileobj dst.
If length is None, copy the entire content.
"""
if length == 0:
return
if length is None:
shutil.copyfileobj(src, dst)
return
BUFSIZE = 16 * 1024
blocks, remainder = divmod(length, BUFSIZE)
for b in range(blocks):
buf = src.read(BUFSIZE)
if len(buf) < BUFSIZE:
raise exception("unexpected end of data")
dst.write(buf)
if remainder != 0:
buf = src.read(remainder)
if len(buf) < remainder:
raise exception("unexpected end of data")
dst.write(buf)
return
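# Usage sketch with in-memory file objects (io is already imported at the
# top of this module):
#
#     src = io.BytesIO(b"x" * 1024)
#     dst = io.BytesIO()
#     copyfileobj(src, dst, length=512)   # copies exactly 512 bytes
#     # copyfileobj(src, dst) would copy everything that remains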
def filemode(mode):
"""Deprecated in this location; use stat.filemode."""
import warnings
warnings.warn("deprecated in favor of stat.filemode",
DeprecationWarning, 2)
return stat.filemode(mode)
def _safe_print(s):
encoding = getattr(sys.stdout, 'encoding', None)
if encoding is not None:
s = s.encode(encoding, 'backslashreplace').decode(encoding)
print(s, end=' ')
class TarError(Exception):
"""Base exception."""
pass
class ExtractError(TarError):
"""General exception for extract errors."""
pass
class ReadError(TarError):
"""Exception for unreadable tar archives."""
pass
class CompressionError(TarError):
"""Exception for unavailable compression methods."""
pass
class StreamError(TarError):
"""Exception for unsupported operations on stream-like TarFiles."""
pass
class HeaderError(TarError):
"""Base exception for header errors."""
pass
class EmptyHeaderError(HeaderError):
"""Exception for empty headers."""
pass
class TruncatedHeaderError(HeaderError):
"""Exception for truncated headers."""
pass
class EOFHeaderError(HeaderError):
"""Exception for end of file headers."""
pass
class InvalidHeaderError(HeaderError):
"""Exception for invalid headers."""
pass
class SubsequentHeaderError(HeaderError):
"""Exception for missing and invalid extended headers."""
pass
#---------------------------
# internal stream interface
#---------------------------
class _LowLevelFile:
"""Low-level file object. Supports reading and writing.
It is used instead of a regular file object for streaming
access.
"""
def __init__(self, name, mode):
mode = {
"r": os.O_RDONLY,
"w": os.O_WRONLY | os.O_CREAT | os.O_TRUNC,
}[mode]
if hasattr(os, "O_BINARY"):
mode |= os.O_BINARY
self.fd = os.open(name, mode, 0o666)
def close(self):
os.close(self.fd)
def read(self, size):
return os.read(self.fd, size)
def write(self, s):
os.write(self.fd, s)
class _Stream:
"""Class that serves as an adapter between TarFile and
a stream-like object. The stream-like object only
needs to have a read() or write() method and is accessed
blockwise. Use of gzip or bzip2 compression is possible.
A stream-like object could be, for example, sys.stdin,
sys.stdout, a socket, a tape device, etc.
_Stream is intended to be used only internally.
"""
def __init__(self, name, mode, comptype, fileobj, bufsize):
"""Construct a _Stream object.
"""
self._extfileobj = True
if fileobj is None:
fileobj = _LowLevelFile(name, mode)
self._extfileobj = False
if comptype == '*':
# Enable transparent compression detection for the
# stream interface
fileobj = _StreamProxy(fileobj)
comptype = fileobj.getcomptype()
self.name = name or ""
self.mode = mode
self.comptype = comptype
self.fileobj = fileobj
self.bufsize = bufsize
self.buf = b""
self.pos = 0
self.closed = False
try:
if comptype == "gz":
try:
import zlib
except ImportError:
raise CompressionError("zlib module is not available")
self.zlib = zlib
self.crc = zlib.crc32(b"")
if mode == "r":
self._init_read_gz()
self.exception = zlib.error
else:
self._init_write_gz()
elif comptype == "bz2":
try:
import bz2
except ImportError:
raise CompressionError("bz2 module is not available")
if mode == "r":
self.dbuf = b""
self.cmp = bz2.BZ2Decompressor()
self.exception = OSError
else:
self.cmp = bz2.BZ2Compressor()
elif comptype == "xz":
try:
import lzma
except ImportError:
raise CompressionError("lzma module is not available")
if mode == "r":
self.dbuf = b""
self.cmp = lzma.LZMADecompressor()
self.exception = lzma.LZMAError
else:
self.cmp = lzma.LZMACompressor()
elif comptype != "tar":
raise CompressionError("unknown compression type %r" % comptype)
except:
if not self._extfileobj:
self.fileobj.close()
self.closed = True
raise
def __del__(self):
if hasattr(self, "closed") and not self.closed:
self.close()
def _init_write_gz(self):
"""Initialize for writing with gzip compression.
"""
self.cmp = self.zlib.compressobj(9, self.zlib.DEFLATED,
-self.zlib.MAX_WBITS,
self.zlib.DEF_MEM_LEVEL,
0)
timestamp = struct.pack("<L", int(time.time()))
self.__write(b"\037\213\010\010" + timestamp + b"\002\377")
if self.name.endswith(".gz"):
self.name = self.name[:-3]
# RFC1952 says we must use ISO-8859-1 for the FNAME field.
self.__write(self.name.encode("iso-8859-1", "replace") + NUL)
def write(self, s):
"""Write string s to the stream.
"""
if self.comptype == "gz":
self.crc = self.zlib.crc32(s, self.crc)
self.pos += len(s)
if self.comptype != "tar":
s = self.cmp.compress(s)
self.__write(s)
def __write(self, s):
"""Write string s to the stream if a whole new block
is ready to be written.
"""
self.buf += s
while len(self.buf) > self.bufsize:
self.fileobj.write(self.buf[:self.bufsize])
self.buf = self.buf[self.bufsize:]
def close(self):
"""Close the _Stream object. No operation should be
done on it afterwards.
"""
if self.closed:
return
self.closed = True
try:
if self.mode == "w" and self.comptype != "tar":
self.buf += self.cmp.flush()
if self.mode == "w" and self.buf:
self.fileobj.write(self.buf)
self.buf = b""
if self.comptype == "gz":
# The native zlib crc is an unsigned 32-bit integer, but
# the Python wrapper implicitly casts that to a signed C
# long. So, on a 32-bit box self.crc may "look negative",
# while the same crc on a 64-bit box may "look positive".
# To avoid irksome warnings from the `struct` module, force
# it to look positive on all boxes.
self.fileobj.write(struct.pack("<L", self.crc & 0xffffffff))
self.fileobj.write(struct.pack("<L", self.pos & 0xffffffff))
finally:
if not self._extfileobj:
self.fileobj.close()
def _init_read_gz(self):
"""Initialize for reading a gzip compressed fileobj.
"""
self.cmp = self.zlib.decompressobj(-self.zlib.MAX_WBITS)
self.dbuf = b""
# taken from gzip.GzipFile with some alterations
if self.__read(2) != b"\037\213":
raise ReadError("not a gzip file")
if self.__read(1) != b"\010":
raise CompressionError("unsupported compression method")
flag = ord(self.__read(1))
self.__read(6)
if flag & 4:
xlen = ord(self.__read(1)) + 256 * ord(self.__read(1))
self.read(xlen)
if flag & 8:
while True:
s = self.__read(1)
if not s or s == NUL:
break
if flag & 16:
while True:
s = self.__read(1)
if not s or s == NUL:
break
if flag & 2:
self.__read(2)
def tell(self):
"""Return the stream's file pointer position.
"""
return self.pos
def seek(self, pos=0):
"""Set the stream's file pointer to pos. Negative seeking
is forbidden.
"""
if pos - self.pos >= 0:
blocks, remainder = divmod(pos - self.pos, self.bufsize)
for i in range(blocks):
self.read(self.bufsize)
self.read(remainder)
else:
raise StreamError("seeking backwards is not allowed")
return self.pos
def read(self, size=None):
"""Return the next size number of bytes from the stream.
If size is not defined, return all bytes of the stream
up to EOF.
"""
if size is None:
t = []
while True:
buf = self._read(self.bufsize)
if not buf:
break
t.append(buf)
buf = "".join(t)
else:
buf = self._read(size)
self.pos += len(buf)
return buf
def _read(self, size):
"""Return size bytes from the stream.
"""
if self.comptype == "tar":
return self.__read(size)
c = len(self.dbuf)
while c < size:
buf = self.__read(self.bufsize)
if not buf:
break
try:
buf = self.cmp.decompress(buf)
except self.exception:
raise ReadError("invalid compressed data")
self.dbuf += buf
c += len(buf)
buf = self.dbuf[:size]
self.dbuf = self.dbuf[size:]
return buf
def __read(self, size):
"""Return size bytes from stream. If internal buffer is empty,
read another block from the stream.
"""
c = len(self.buf)
while c < size:
buf = self.fileobj.read(self.bufsize)
if not buf:
break
self.buf += buf
c += len(buf)
buf = self.buf[:size]
self.buf = self.buf[size:]
return buf
# class _Stream
class _StreamProxy(object):
"""Small proxy class that enables transparent compression
detection for the Stream interface (mode 'r|*').
"""
def __init__(self, fileobj):
self.fileobj = fileobj
self.buf = self.fileobj.read(BLOCKSIZE)
def read(self, size):
self.read = self.fileobj.read
return self.buf
def getcomptype(self):
if self.buf.startswith(b"\x1f\x8b\x08"):
return "gz"
elif self.buf[0:3] == b"BZh" and self.buf[4:10] == b"1AY&SY":
return "bz2"
elif self.buf.startswith((b"\x5d\x00\x00\x80", b"\xfd7zXZ")):
return "xz"
else:
return "tar"
def close(self):
self.fileobj.close()
# class StreamProxy
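# Sketch of the detection above: only the first BLOCKSIZE bytes are
# inspected, e.g. a stream beginning with b"\x1f\x8b\x08" is treated as
# "gz", one with b"BZh" at offset 0 and b"1AY&SY" at offset 4 as "bz2",
# one starting with b"\xfd7zXZ" as "xz", and anything else as plain "tar".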
#------------------------
# Extraction file object
#------------------------
class _FileInFile(object):
"""A thin wrapper around an existing file object that
provides a part of its data as an individual file
object.
"""
def __init__(self, fileobj, offset, size, blockinfo=None):
self.fileobj = fileobj
self.offset = offset
self.size = size
self.position = 0
self.name = getattr(fileobj, "name", None)
self.closed = False
if blockinfo is None:
blockinfo = [(0, size)]
# Construct a map with data and zero blocks.
self.map_index = 0
self.map = []
lastpos = 0
realpos = self.offset
for offset, size in blockinfo:
if offset > lastpos:
self.map.append((False, lastpos, offset, None))
self.map.append((True, offset, offset + size, realpos))
realpos += size
lastpos = offset + size
if lastpos < self.size:
self.map.append((False, lastpos, self.size, None))
def flush(self):
pass
def readable(self):
return True
def writable(self):
return False
def seekable(self):
return self.fileobj.seekable()
def tell(self):
"""Return the current file position.
"""
return self.position
def seek(self, position, whence=io.SEEK_SET):
"""Seek to a position in the file.
"""
if whence == io.SEEK_SET:
self.position = min(max(position, 0), self.size)
elif whence == io.SEEK_CUR:
if position < 0:
self.position = max(self.position + position, 0)
else:
self.position = min(self.position + position, self.size)
elif whence == io.SEEK_END:
self.position = max(min(self.size + position, self.size), 0)
else:
raise ValueError("Invalid argument")
return self.position
def read(self, size=None):
"""Read data from the file.
"""
if size is None:
size = self.size - self.position
else:
size = min(size, self.size - self.position)
buf = b""
while size > 0:
while True:
data, start, stop, offset = self.map[self.map_index]
if start <= self.position < stop:
break
else:
self.map_index += 1
if self.map_index == len(self.map):
self.map_index = 0
length = min(size, stop - self.position)
if data:
self.fileobj.seek(offset + (self.position - start))
b = self.fileobj.read(length)
if len(b) != length:
raise ReadError("unexpected end of data")
buf += b
else:
buf += NUL * length
size -= length
self.position += length
return buf
def readinto(self, b):
buf = self.read(len(b))
b[:len(buf)] = buf
return len(buf)
def close(self):
self.closed = True
#class _FileInFile
class ExFileObject(io.BufferedReader):
def __init__(self, tarfile, tarinfo):
fileobj = _FileInFile(tarfile.fileobj, tarinfo.offset_data,
tarinfo.size, tarinfo.sparse)
super().__init__(fileobj)
#class ExFileObject
#------------------
# Exported Classes
#------------------
class TarInfo(object):
"""Informational class which holds the details about an
archive member given by a tar header block.
TarInfo objects are returned by TarFile.getmember(),
TarFile.getmembers() and TarFile.gettarinfo() and are
usually created internally.
"""
__slots__ = ("name", "mode", "uid", "gid", "size", "mtime",
"chksum", "type", "linkname", "uname", "gname",
"devmajor", "devminor",
"offset", "offset_data", "pax_headers", "sparse",
"tarfile", "_sparse_structs", "_link_target")
def __init__(self, name=""):
"""Construct a TarInfo object. name is the optional name
of the member.
"""
self.name = name # member name
self.mode = 0o644 # file permissions
self.uid = 0 # user id
self.gid = 0 # group id
self.size = 0 # file size
self.mtime = 0 # modification time
self.chksum = 0 # header checksum
self.type = REGTYPE # member type
self.linkname = "" # link name
self.uname = "" # user name
self.gname = "" # group name
self.devmajor = 0 # device major number
self.devminor = 0 # device minor number
self.offset = 0 # the tar header starts here
self.offset_data = 0 # the file's data starts here
self.sparse = None # sparse member information
self.pax_headers = {} # pax header information
# In pax headers the "name" and "linkname" field are called
# "path" and "linkpath".
def _getpath(self):
return self.name
def _setpath(self, name):
self.name = name
path = property(_getpath, _setpath)
def _getlinkpath(self):
return self.linkname
def _setlinkpath(self, linkname):
self.linkname = linkname
linkpath = property(_getlinkpath, _setlinkpath)
def __repr__(self):
return "<%s %r at %#x>" % (self.__class__.__name__,self.name,id(self))
def get_info(self):
"""Return the TarInfo's attributes as a dictionary.
"""
info = {
"name": self.name,
"mode": self.mode & 0o7777,
"uid": self.uid,
"gid": self.gid,
"size": self.size,
"mtime": self.mtime,
"chksum": self.chksum,
"type": self.type,
"linkname": self.linkname,
"uname": self.uname,
"gname": self.gname,
"devmajor": self.devmajor,
"devminor": self.devminor
}
if info["type"] == DIRTYPE and not info["name"].endswith("/"):
info["name"] += "/"
return info
def tobuf(self, format=DEFAULT_FORMAT, encoding=ENCODING, errors="surrogateescape"):
"""Return a tar header as a string of 512 byte blocks.
"""
info = self.get_info()
if format == USTAR_FORMAT:
return self.create_ustar_header(info, encoding, errors)
elif format == GNU_FORMAT:
return self.create_gnu_header(info, encoding, errors)
elif format == PAX_FORMAT:
return self.create_pax_header(info, encoding)
else:
raise ValueError("invalid format")
def create_ustar_header(self, info, encoding, errors):
"""Return the object as a ustar header block.
"""
info["magic"] = POSIX_MAGIC
if len(info["linkname"]) > LENGTH_LINK:
raise ValueError("linkname is too long")
if len(info["name"]) > LENGTH_NAME:
info["prefix"], info["name"] = self._posix_split_name(info["name"])
return self._create_header(info, USTAR_FORMAT, encoding, errors)
def create_gnu_header(self, info, encoding, errors):
"""Return the object as a GNU header block sequence.
"""
info["magic"] = GNU_MAGIC
buf = b""
if len(info["linkname"]) > LENGTH_LINK:
buf += self._create_gnu_long_header(info["linkname"], GNUTYPE_LONGLINK, encoding, errors)
if len(info["name"]) > LENGTH_NAME:
buf += self._create_gnu_long_header(info["name"], GNUTYPE_LONGNAME, encoding, errors)
return buf + self._create_header(info, GNU_FORMAT, encoding, errors)
def create_pax_header(self, info, encoding):
"""Return the object as a ustar header block. If it cannot be
represented this way, prepend a pax extended header sequence
with supplemental information.
"""
info["magic"] = POSIX_MAGIC
pax_headers = self.pax_headers.copy()
# Test string fields for values that exceed the field length or cannot
# be represented in ASCII encoding.
for name, hname, length in (
("name", "path", LENGTH_NAME), ("linkname", "linkpath", LENGTH_LINK),
("uname", "uname", 32), ("gname", "gname", 32)):
if hname in pax_headers:
# The pax header has priority.
continue
# Try to encode the string as ASCII.
try:
info[name].encode("ascii", "strict")
except UnicodeEncodeError:
pax_headers[hname] = info[name]
continue
if len(info[name]) > length:
pax_headers[hname] = info[name]
# Test number fields for values that exceed the field limit or
# that are floats (which cannot go in a plain octal field).
for name, digits in (("uid", 8), ("gid", 8), ("size", 12), ("mtime", 12)):
if name in pax_headers:
# The pax header has priority. Avoid overflow.
info[name] = 0
continue
val = info[name]
if not 0 <= val < 8 ** (digits - 1) or isinstance(val, float):
pax_headers[name] = str(val)
info[name] = 0
# Create a pax extended header if necessary.
if pax_headers:
buf = self._create_pax_generic_header(pax_headers, XHDTYPE, encoding)
else:
buf = b""
return buf + self._create_header(info, USTAR_FORMAT, "ascii", "replace")
@classmethod
def create_pax_global_header(cls, pax_headers):
"""Return the object as a pax global header block sequence.
"""
return cls._create_pax_generic_header(pax_headers, XGLTYPE, "utf-8")
def _posix_split_name(self, name):
"""Split a name longer than 100 chars into a prefix
and a name part.
"""
prefix = name[:LENGTH_PREFIX + 1]
while prefix and prefix[-1] != "/":
prefix = prefix[:-1]
name = name[len(prefix):]
prefix = prefix[:-1]
if not prefix or len(name) > LENGTH_NAME:
raise ValueError("name is too long")
return prefix, name
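# Sketch: for a 120-character path whose last component is "file", the
# method cuts at the last "/" so that the leading directories go into the
# prefix field (at most LENGTH_PREFIX == 155 bytes) and "file" goes into
# the 100-byte name field; if no "/" lets both parts fit, it raises
# ValueError("name is too long").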
@staticmethod
def _create_header(info, format, encoding, errors):
"""Return a header block. info is a dictionary with file
information, format must be one of the *_FORMAT constants.
"""
parts = [
stn(info.get("name", ""), 100, encoding, errors),
itn(info.get("mode", 0) & 0o7777, 8, format),
itn(info.get("uid", 0), 8, format),
itn(info.get("gid", 0), 8, format),
itn(info.get("size", 0), 12, format),
itn(info.get("mtime", 0), 12, format),
b" ", # checksum field
info.get("type", REGTYPE),
stn(info.get("linkname", ""), 100, encoding, errors),
info.get("magic", POSIX_MAGIC),
stn(info.get("uname", ""), 32, encoding, errors),
stn(info.get("gname", ""), 32, encoding, errors),
itn(info.get("devmajor", 0), 8, format),
itn(info.get("devminor", 0), 8, format),
stn(info.get("prefix", ""), 155, encoding, errors)
]
buf = struct.pack("%ds" % BLOCKSIZE, b"".join(parts))
chksum = calc_chksums(buf[-BLOCKSIZE:])[0]
buf = buf[:-364] + bytes("%06o\0" % chksum, "ascii") + buf[-357:]
return buf
@staticmethod
def _create_payload(payload):
"""Return the string payload filled with zero bytes
up to the next 512 byte border.
"""
blocks, remainder = divmod(len(payload), BLOCKSIZE)
if remainder > 0:
payload += (BLOCKSIZE - remainder) * NUL
return payload
@classmethod
def _create_gnu_long_header(cls, name, type, encoding, errors):
"""Return a GNUTYPE_LONGNAME or GNUTYPE_LONGLINK sequence
for name.
"""
name = name.encode(encoding, errors) + NUL
info = {}
info["name"] = "././@LongLink"
info["type"] = type
info["size"] = len(name)
info["magic"] = GNU_MAGIC
# create extended header + name blocks.
return cls._create_header(info, USTAR_FORMAT, encoding, errors) + \
cls._create_payload(name)
@classmethod
def _create_pax_generic_header(cls, pax_headers, type, encoding):
"""Return a POSIX.1-2008 extended or global header sequence
that contains a list of keyword, value pairs. The values
must be strings.
"""
# Check if one of the fields contains surrogate characters and thereby
# forces hdrcharset=BINARY, see _proc_pax() for more information.
binary = False
for keyword, value in pax_headers.items():
try:
value.encode("utf-8", "strict")
except UnicodeEncodeError:
binary = True
break
records = b""
if binary:
# Put the hdrcharset field at the beginning of the header.
records += b"21 hdrcharset=BINARY\n"
for keyword, value in pax_headers.items():
keyword = keyword.encode("utf-8")
if binary:
# Try to restore the original byte representation of `value'.
# Needless to say, the encoding must match the string.
value = value.encode(encoding, "surrogateescape")
else:
value = value.encode("utf-8")
l = len(keyword) + len(value) + 3 # ' ' + '=' + '\n'
n = p = 0
while True:
n = l + len(str(p))
if n == p:
break
p = n
records += bytes(str(p), "ascii") + b" " + keyword + b"=" + value + b"\n"
# We use a hardcoded "././@PaxHeader" name like star does
# instead of the one that POSIX recommends.
info = {}
info["name"] = "././@PaxHeader"
info["type"] = type
info["size"] = len(records)
info["magic"] = POSIX_MAGIC
# Create pax header + record blocks.
return cls._create_header(info, USTAR_FORMAT, "ascii", "replace") + \
cls._create_payload(records)
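# Sketch of the length fixed point computed above: each record embeds its
# own total byte length, so for keyword "path" and value "foo/bar" the
# loop settles on 16 because len("16 path=foo/bar\n") == 16.  The
# hardcoded b"21 hdrcharset=BINARY\n" record earlier in this method is the
# same calculation done by hand (that record is exactly 21 bytes long).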
@classmethod
def frombuf(cls, buf, encoding, errors):
"""Construct a TarInfo object from a 512 byte bytes object.
"""
if len(buf) == 0:
raise EmptyHeaderError("empty header")
if len(buf) != BLOCKSIZE:
raise TruncatedHeaderError("truncated header")
if buf.count(NUL) == BLOCKSIZE:
raise EOFHeaderError("end of file header")
chksum = nti(buf[148:156])
if chksum not in calc_chksums(buf):
raise InvalidHeaderError("bad checksum")
obj = cls()
obj.name = nts(buf[0:100], encoding, errors)
obj.mode = nti(buf[100:108])
obj.uid = nti(buf[108:116])
obj.gid = nti(buf[116:124])
obj.size = nti(buf[124:136])
obj.mtime = nti(buf[136:148])
obj.chksum = chksum
obj.type = buf[156:157]
obj.linkname = nts(buf[157:257], encoding, errors)
obj.uname = nts(buf[265:297], encoding, errors)
obj.gname = nts(buf[297:329], encoding, errors)
obj.devmajor = nti(buf[329:337])
obj.devminor = nti(buf[337:345])
prefix = nts(buf[345:500], encoding, errors)
# Old V7 tar format represents a directory as a regular
# file with a trailing slash.
if obj.type == AREGTYPE and obj.name.endswith("/"):
obj.type = DIRTYPE
# The old GNU sparse format occupies some of the unused
# space in the buffer for up to 4 sparse structures.
# Save them for later processing in _proc_sparse().
if obj.type == GNUTYPE_SPARSE:
pos = 386
structs = []
for i in range(4):
try:
offset = nti(buf[pos:pos + 12])
numbytes = nti(buf[pos + 12:pos + 24])
except ValueError:
break
structs.append((offset, numbytes))
pos += 24
isextended = bool(buf[482])
origsize = nti(buf[483:495])
obj._sparse_structs = (structs, isextended, origsize)
# Remove redundant slashes from directories.
if obj.isdir():
obj.name = obj.name.rstrip("/")
# Reconstruct a ustar longname.
if prefix and obj.type not in GNU_TYPES:
obj.name = prefix + "/" + obj.name
return obj
@classmethod
def fromtarfile(cls, tarfile):
"""Return the next TarInfo object from TarFile object
tarfile.
"""
buf = tarfile.fileobj.read(BLOCKSIZE)
obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
return obj._proc_member(tarfile)
#--------------------------------------------------------------------------
# The following are methods that are called depending on the type of a
# member. The entry point is _proc_member() which can be overridden in a
# subclass to add custom _proc_*() methods. A _proc_*() method MUST
# implement the following operations:
# 1. Set self.offset_data to the position where the data blocks begin,
# if there is data that follows.
# 2. Set tarfile.offset to the position where the next member's header will
# begin.
# 3. Return self or another valid TarInfo object.
def _proc_member(self, tarfile):
"""Choose the right processing method depending on
the type and call it.
"""
if self.type in (GNUTYPE_LONGNAME, GNUTYPE_LONGLINK):
return self._proc_gnulong(tarfile)
elif self.type == GNUTYPE_SPARSE:
return self._proc_sparse(tarfile)
elif self.type in (XHDTYPE, XGLTYPE, SOLARIS_XHDTYPE):
return self._proc_pax(tarfile)
else:
return self._proc_builtin(tarfile)
def _proc_builtin(self, tarfile):
"""Process a builtin type or an unknown type which
will be treated as a regular file.
"""
self.offset_data = tarfile.fileobj.tell()
offset = self.offset_data
if self.isreg() or self.type not in SUPPORTED_TYPES:
# Skip the following data blocks.
offset += self._block(self.size)
tarfile.offset = offset
# Patch the TarInfo object with saved global
# header information.
self._apply_pax_info(tarfile.pax_headers, tarfile.encoding, tarfile.errors)
return self
def _proc_gnulong(self, tarfile):
"""Process the blocks that hold a GNU longname
or longlink member.
"""
buf = tarfile.fileobj.read(self._block(self.size))
# Fetch the next header and process it.
try:
next = self.fromtarfile(tarfile)
except HeaderError:
raise SubsequentHeaderError("missing or bad subsequent header")
# Patch the TarInfo object from the next header with
# the longname information.
next.offset = self.offset
if self.type == GNUTYPE_LONGNAME:
next.name = nts(buf, tarfile.encoding, tarfile.errors)
elif self.type == GNUTYPE_LONGLINK:
next.linkname = nts(buf, tarfile.encoding, tarfile.errors)
return next
def _proc_sparse(self, tarfile):
"""Process a GNU sparse header plus extra headers.
"""
# We already collected some sparse structures in frombuf().
structs, isextended, origsize = self._sparse_structs
del self._sparse_structs
# Collect sparse structures from extended header blocks.
while isextended:
buf = tarfile.fileobj.read(BLOCKSIZE)
pos = 0
for i in range(21):
try:
offset = nti(buf[pos:pos + 12])
numbytes = nti(buf[pos + 12:pos + 24])
except ValueError:
break
if offset and numbytes:
structs.append((offset, numbytes))
pos += 24
isextended = bool(buf[504])
self.sparse = structs
self.offset_data = tarfile.fileobj.tell()
tarfile.offset = self.offset_data + self._block(self.size)
self.size = origsize
return self
def _proc_pax(self, tarfile):
"""Process an extended or global header as described in
POSIX.1-2008.
"""
# Read the header information.
buf = tarfile.fileobj.read(self._block(self.size))
# A pax header stores supplemental information for either
# the following file (extended) or all following files
# (global).
if self.type == XGLTYPE:
pax_headers = tarfile.pax_headers
else:
pax_headers = tarfile.pax_headers.copy()
# Check if the pax header contains a hdrcharset field. This tells us
# the encoding of the path, linkpath, uname and gname fields. Normally,
# these fields are UTF-8 encoded, but POSIX.1-2008 allows tar
# implementations to store them as raw binary strings if the
# translation to UTF-8 fails.
match = re.search(br"\d+ hdrcharset=([^\n]+)\n", buf)
if match is not None:
pax_headers["hdrcharset"] = match.group(1).decode("utf-8")
# For the time being, we don't care about anything other than "BINARY".
# The only other value that is currently allowed by the standard is
# "ISO-IR 10646 2000 UTF-8" in other words UTF-8.
hdrcharset = pax_headers.get("hdrcharset")
if hdrcharset == "BINARY":
encoding = tarfile.encoding
else:
encoding = "utf-8"
# Parse pax header information. A record looks like this:
# "%d %s=%s\n" % (length, keyword, value). length is the size
# of the complete record including the length field itself and
# the newline. keyword and value are both UTF-8 encoded strings.
regex = re.compile(br"(\d+) ([^=]+)=")
pos = 0
while True:
match = regex.match(buf, pos)
if not match:
break
length, keyword = match.groups()
length = int(length)
value = buf[match.end(2) + 1:match.start(1) + length - 1]
# Normally, we could just use "utf-8" as the encoding and "strict"
# as the error handler, but we better not take the risk. For
# example, GNU tar <= 1.23 is known to store filenames it cannot
# translate to UTF-8 as raw strings (unfortunately without a
# hdrcharset=BINARY header).
# We first try the strict standard encoding, and if that fails we
# fall back on the user's encoding and error handler.
keyword = self._decode_pax_field(keyword, "utf-8", "utf-8",
tarfile.errors)
if keyword in PAX_NAME_FIELDS:
value = self._decode_pax_field(value, encoding, tarfile.encoding,
tarfile.errors)
else:
value = self._decode_pax_field(value, "utf-8", "utf-8",
tarfile.errors)
pax_headers[keyword] = value
pos += length
# Fetch the next header.
try:
next = self.fromtarfile(tarfile)
except HeaderError:
raise SubsequentHeaderError("missing or bad subsequent header")
# Process GNU sparse information.
if "GNU.sparse.map" in pax_headers:
# GNU extended sparse format version 0.1.
self._proc_gnusparse_01(next, pax_headers)
elif "GNU.sparse.size" in pax_headers:
# GNU extended sparse format version 0.0.
self._proc_gnusparse_00(next, pax_headers, buf)
elif pax_headers.get("GNU.sparse.major") == "1" and pax_headers.get("GNU.sparse.minor") == "0":
# GNU extended sparse format version 1.0.
self._proc_gnusparse_10(next, pax_headers, tarfile)
if self.type in (XHDTYPE, SOLARIS_XHDTYPE):
# Patch the TarInfo object with the extended header info.
next._apply_pax_info(pax_headers, tarfile.encoding, tarfile.errors)
next.offset = self.offset
if "size" in pax_headers:
# If the extended header replaces the size field,
# we need to recalculate the offset where the next
# header starts.
offset = next.offset_data
if next.isreg() or next.type not in SUPPORTED_TYPES:
offset += next._block(next.size)
tarfile.offset = offset
return next
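# Sketch of the record format parsed above, with hypothetical values: a
# raw extended header of b"16 path=foo/bar\n22 mtime=1434826800.0\n"
# yields pax_headers == {"path": "foo/bar", "mtime": "1434826800.0"};
# each record's leading number is its own total length, so pos advances
# by 16 and then by 22.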
def _proc_gnusparse_00(self, next, pax_headers, buf):
"""Process a GNU tar extended sparse header, version 0.0.
"""
offsets = []
for match in re.finditer(br"\d+ GNU.sparse.offset=(\d+)\n", buf):
offsets.append(int(match.group(1)))
numbytes = []
for match in re.finditer(br"\d+ GNU.sparse.numbytes=(\d+)\n", buf):
numbytes.append(int(match.group(1)))
next.sparse = list(zip(offsets, numbytes))
def _proc_gnusparse_01(self, next, pax_headers):
"""Process a GNU tar extended sparse header, version 0.1.
"""
sparse = [int(x) for x in pax_headers["GNU.sparse.map"].split(",")]
next.sparse = list(zip(sparse[::2], sparse[1::2]))
def _proc_gnusparse_10(self, next, pax_headers, tarfile):
"""Process a GNU tar extended sparse header, version 1.0.
"""
fields = None
sparse = []
buf = tarfile.fileobj.read(BLOCKSIZE)
fields, buf = buf.split(b"\n", 1)
fields = int(fields)
while len(sparse) < fields * 2:
if b"\n" not in buf:
buf += tarfile.fileobj.read(BLOCKSIZE)
number, buf = buf.split(b"\n", 1)
sparse.append(int(number))
next.offset_data = tarfile.fileobj.tell()
next.sparse = list(zip(sparse[::2], sparse[1::2]))
def _apply_pax_info(self, pax_headers, encoding, errors):
"""Replace fields with supplemental information from a previous
pax extended or global header.
"""
for keyword, value in pax_headers.items():
if keyword == "GNU.sparse.name":
setattr(self, "path", value)
elif keyword == "GNU.sparse.size":
setattr(self, "size", int(value))
elif keyword == "GNU.sparse.realsize":
setattr(self, "size", int(value))
elif keyword in PAX_FIELDS:
if keyword in PAX_NUMBER_FIELDS:
try:
value = PAX_NUMBER_FIELDS[keyword](value)
except ValueError:
value = 0
if keyword == "path":
value = value.rstrip("/")
setattr(self, keyword, value)
self.pax_headers = pax_headers.copy()
def _decode_pax_field(self, value, encoding, fallback_encoding, fallback_errors):
"""Decode a single field from a pax record.
"""
try:
return value.decode(encoding, "strict")
except UnicodeDecodeError:
return value.decode(fallback_encoding, fallback_errors)
def _block(self, count):
"""Round up a byte count by BLOCKSIZE and return it,
e.g. _block(834) => 1024.
"""
blocks, remainder = divmod(count, BLOCKSIZE)
if remainder:
blocks += 1
return blocks * BLOCKSIZE
def isreg(self):
return self.type in REGULAR_TYPES
def isfile(self):
return self.isreg()
def isdir(self):
return self.type == DIRTYPE
def issym(self):
return self.type == SYMTYPE
def islnk(self):
return self.type == LNKTYPE
def ischr(self):
return self.type == CHRTYPE
def isblk(self):
return self.type == BLKTYPE
def isfifo(self):
return self.type == FIFOTYPE
def issparse(self):
return self.sparse is not None
def isdev(self):
return self.type in (CHRTYPE, BLKTYPE, FIFOTYPE)
# class TarInfo
class TarFile(object):
"""The TarFile Class provides an interface to tar archives.
"""
debug = 0 # May be set from 0 (no msgs) to 3 (all msgs)
dereference = False # If true, add content of linked file to the
# tar file, else the link.
ignore_zeros = False # If true, skips empty or invalid blocks and
# continues processing.
errorlevel = 1 # If 0, fatal errors only appear in debug
# messages (if debug >= 0). If > 0, errors
# are passed to the caller as exceptions.
format = DEFAULT_FORMAT # The format to use when creating an archive.
encoding = ENCODING # Encoding for 8-bit character strings.
errors = None # Error handler for unicode conversion.
tarinfo = TarInfo # The default TarInfo class to use.
fileobject = ExFileObject # The file-object for extractfile().
def __init__(self, name=None, mode="r", fileobj=None, format=None,
tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
errors="surrogateescape", pax_headers=None, debug=None, errorlevel=None):
"""Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
read from an existing archive, 'a' to append data to an existing
file or 'w' to create a new file overwriting an existing one. `mode'
defaults to 'r'.
If `fileobj' is given, it is used for reading or writing data. If it
can be determined, `mode' is overridden by `fileobj's mode.
`fileobj' is not closed when TarFile is closed.
"""
modes = {"r": "rb", "a": "r+b", "w": "wb"}
if mode not in modes:
raise ValueError("mode must be 'r', 'a' or 'w'")
self.mode = mode
self._mode = modes[mode]
if not fileobj:
if self.mode == "a" and not os.path.exists(name):
# Create nonexistent files in append mode.
self.mode = "w"
self._mode = "wb"
fileobj = bltn_open(name, self._mode)
self._extfileobj = False
else:
if (name is None and hasattr(fileobj, "name") and
isinstance(fileobj.name, (str, bytes))):
name = fileobj.name
if hasattr(fileobj, "mode"):
self._mode = fileobj.mode
self._extfileobj = True
self.name = os.path.abspath(name) if name else None
self.fileobj = fileobj
# Init attributes.
if format is not None:
self.format = format
if tarinfo is not None:
self.tarinfo = tarinfo
if dereference is not None:
self.dereference = dereference
if ignore_zeros is not None:
self.ignore_zeros = ignore_zeros
if encoding is not None:
self.encoding = encoding
self.errors = errors
if pax_headers is not None and self.format == PAX_FORMAT:
self.pax_headers = pax_headers
else:
self.pax_headers = {}
if debug is not None:
self.debug = debug
if errorlevel is not None:
self.errorlevel = errorlevel
# Init datastructures.
self.closed = False
self.members = [] # list of members as TarInfo objects
self._loaded = False # flag if all members have been read
self.offset = self.fileobj.tell()
# current position in the archive file
self.inodes = {} # dictionary caching the inodes of
# archive members already added
try:
if self.mode == "r":
self.firstmember = None
self.firstmember = self.next()
if self.mode == "a":
# Move to the end of the archive,
# before the first empty block.
while True:
self.fileobj.seek(self.offset)
try:
tarinfo = self.tarinfo.fromtarfile(self)
self.members.append(tarinfo)
except EOFHeaderError:
self.fileobj.seek(self.offset)
break
except HeaderError as e:
raise ReadError(str(e))
if self.mode in "aw":
self._loaded = True
if self.pax_headers:
buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())
self.fileobj.write(buf)
self.offset += len(buf)
except:
if not self._extfileobj:
self.fileobj.close()
self.closed = True
raise
#--------------------------------------------------------------------------
# Below are the classmethods which act as alternate constructors to the
# TarFile class. The open() method is the only one that is needed for
# public use; it is the "super"-constructor and is able to select an
# adequate "sub"-constructor for a particular compression using the mapping
# from OPEN_METH.
#
# This concept allows one to subclass TarFile without losing the comfort of
# the super-constructor. A sub-constructor is registered and made available
# by adding it to the mapping in OPEN_METH.
@classmethod
def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
"""Open a tar archive for reading, writing or appending. Return
an appropriate TarFile class.
mode:
'r' or 'r:*' open for reading with transparent compression
'r:' open for reading exclusively uncompressed
'r:gz' open for reading with gzip compression
'r:bz2' open for reading with bzip2 compression
'r:xz' open for reading with lzma compression
'a' or 'a:' open for appending, creating the file if necessary
'w' or 'w:' open for writing without compression
'w:gz' open for writing with gzip compression
'w:bz2' open for writing with bzip2 compression
'w:xz' open for writing with lzma compression
'r|*' open a stream of tar blocks with transparent compression
'r|' open an uncompressed stream of tar blocks for reading
'r|gz' open a gzip compressed stream of tar blocks
'r|bz2' open a bzip2 compressed stream of tar blocks
'r|xz' open an lzma compressed stream of tar blocks
'w|' open an uncompressed stream for writing
'w|gz' open a gzip compressed stream for writing
'w|bz2' open a bzip2 compressed stream for writing
'w|xz' open an lzma compressed stream for writing
"""
if not name and not fileobj:
raise ValueError("nothing to open")
if mode in ("r", "r:*"):
# Find out which *open() is appropriate for opening the file.
for comptype in cls.OPEN_METH:
func = getattr(cls, cls.OPEN_METH[comptype])
if fileobj is not None:
saved_pos = fileobj.tell()
try:
return func(name, "r", fileobj, **kwargs)
except (ReadError, CompressionError) as e:
if fileobj is not None:
fileobj.seek(saved_pos)
continue
raise ReadError("file could not be opened successfully")
elif ":" in mode:
filemode, comptype = mode.split(":", 1)
filemode = filemode or "r"
comptype = comptype or "tar"
# Select the *open() function according to
# given compression.
if comptype in cls.OPEN_METH:
func = getattr(cls, cls.OPEN_METH[comptype])
else:
raise CompressionError("unknown compression type %r" % comptype)
return func(name, filemode, fileobj, **kwargs)
elif "|" in mode:
filemode, comptype = mode.split("|", 1)
filemode = filemode or "r"
comptype = comptype or "tar"
if filemode not in ("r", "w"):
raise ValueError("mode must be 'r' or 'w'")
stream = _Stream(name, filemode, comptype, fileobj, bufsize)
try:
t = cls(name, filemode, stream, **kwargs)
except:
stream.close()
raise
t._extfileobj = False
return t
elif mode in ("a", "w"):
return cls.taropen(name, mode, fileobj, **kwargs)
raise ValueError("undiscernible mode")
@classmethod
def taropen(cls, name, mode="r", fileobj=None, **kwargs):
"""Open uncompressed tar archive name for reading or writing.
"""
if mode not in ("r", "a", "w"):
raise ValueError("mode must be 'r', 'a' or 'w'")
return cls(name, mode, fileobj, **kwargs)
@classmethod
def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
"""Open gzip compressed tar archive name for reading or writing.
Appending is not allowed.
"""
if mode not in ("r", "w"):
raise ValueError("mode must be 'r' or 'w'")
try:
import gzip
gzip.GzipFile
except (ImportError, AttributeError):
raise CompressionError("gzip module is not available")
try:
fileobj = gzip.GzipFile(name, mode + "b", compresslevel, fileobj)
except OSError:
if fileobj is not None and mode == 'r':
raise ReadError("not a gzip file")
raise
try:
t = cls.taropen(name, mode, fileobj, **kwargs)
except OSError:
fileobj.close()
if mode == 'r':
raise ReadError("not a gzip file")
raise
except:
fileobj.close()
raise
t._extfileobj = False
return t
@classmethod
def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
"""Open bzip2 compressed tar archive name for reading or writing.
Appending is not allowed.
"""
if mode not in ("r", "w"):
raise ValueError("mode must be 'r' or 'w'.")
try:
import bz2
except ImportError:
raise CompressionError("bz2 module is not available")
fileobj = bz2.BZ2File(fileobj or name, mode,
compresslevel=compresslevel)
try:
t = cls.taropen(name, mode, fileobj, **kwargs)
except (OSError, EOFError):
fileobj.close()
if mode == 'r':
raise ReadError("not a bzip2 file")
raise
except:
fileobj.close()
raise
t._extfileobj = False
return t
@classmethod
def xzopen(cls, name, mode="r", fileobj=None, preset=None, **kwargs):
"""Open lzma compressed tar archive name for reading or writing.
Appending is not allowed.
"""
if mode not in ("r", "w"):
raise ValueError("mode must be 'r' or 'w'")
try:
import lzma
except ImportError:
raise CompressionError("lzma module is not available")
fileobj = lzma.LZMAFile(fileobj or name, mode, preset=preset)
try:
t = cls.taropen(name, mode, fileobj, **kwargs)
except (lzma.LZMAError, EOFError):
fileobj.close()
if mode == 'r':
raise ReadError("not an lzma file")
raise
except:
fileobj.close()
raise
t._extfileobj = False
return t
# All *open() methods are registered here.
OPEN_METH = {
"tar": "taropen", # uncompressed tar
"gz": "gzopen", # gzip compressed tar
"bz2": "bz2open", # bzip2 compressed tar
"xz": "xzopen" # lzma compressed tar
}
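# ---- Editor's note: a hedged sketch of the "sub"-constructor registration
# described in the comment block above open(). The zstandard handler is
# hypothetical -- this module supports only tar/gz/bz2/xz.
#
#     class ZstTarFile(TarFile):
#         @classmethod
#         def zstopen(cls, name, mode="r", fileobj=None, **kwargs):
#             ...  # wrap fileobj in a zstd stream, then call cls.taropen()
#         OPEN_METH = dict(TarFile.OPEN_METH, zst="zstopen")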
#--------------------------------------------------------------------------
# The public methods which TarFile provides:
def close(self):
"""Close the TarFile. In write-mode, two finishing zero blocks are
appended to the archive.
"""
if self.closed:
return
self.closed = True
try:
if self.mode in "aw":
self.fileobj.write(NUL * (BLOCKSIZE * 2))
self.offset += (BLOCKSIZE * 2)
# fill up the end with zero-blocks
# (like option -b20 for tar does)
blocks, remainder = divmod(self.offset, RECORDSIZE)
if remainder > 0:
self.fileobj.write(NUL * (RECORDSIZE - remainder))
finally:
if not self._extfileobj:
self.fileobj.close()
def getmember(self, name):
"""Return a TarInfo object for member `name'. If `name' can not be
found in the archive, KeyError is raised. If a member occurs more
than once in the archive, its last occurrence is assumed to be the
most up-to-date version.
"""
tarinfo = self._getmember(name)
if tarinfo is None:
raise KeyError("filename %r not found" % name)
return tarinfo
def getmembers(self):
"""Return the members of the archive as a list of TarInfo objects. The
list has the same order as the members in the archive.
"""
self._check()
if not self._loaded: # if we want to obtain a list of
self._load() # all members, we first have to
# scan the whole archive.
return self.members
def getnames(self):
"""Return the members of the archive as a list of their names. It has
the same order as the list returned by getmembers().
"""
return [tarinfo.name for tarinfo in self.getmembers()]
def gettarinfo(self, name=None, arcname=None, fileobj=None):
"""Create a TarInfo object for either the file `name' or the file
object `fileobj' (using os.fstat on its file descriptor). You can
modify some of the TarInfo's attributes before you add it using
addfile(). If given, `arcname' specifies an alternative name for the
file in the archive.
"""
self._check("aw")
# When fileobj is given, replace name by
# fileobj's real name.
if fileobj is not None:
name = fileobj.name
# Building the name of the member in the archive:
# backward slashes are converted to forward slashes and
# absolute paths are turned into relative paths.
if arcname is None:
arcname = name
drv, arcname = os.path.splitdrive(arcname)
arcname = arcname.replace(os.sep, "/")
arcname = arcname.lstrip("/")
# Now, fill the TarInfo object with
# information specific for the file.
tarinfo = self.tarinfo()
tarinfo.tarfile = self
# Use os.stat or os.lstat, depending on platform
# and if symlinks shall be resolved.
if fileobj is None:
if hasattr(os, "lstat") and not self.dereference:
statres = os.lstat(name)
else:
statres = os.stat(name)
else:
statres = os.fstat(fileobj.fileno())
linkname = ""
stmd = statres.st_mode
if stat.S_ISREG(stmd):
inode = (statres.st_ino, statres.st_dev)
if not self.dereference and statres.st_nlink > 1 and \
inode in self.inodes and arcname != self.inodes[inode]:
# Is it a hardlink to an already
# archived file?
type = LNKTYPE
linkname = self.inodes[inode]
else:
# The inode is added only if it's valid.
# For win32 it is always 0.
type = REGTYPE
if inode[0]:
self.inodes[inode] = arcname
elif stat.S_ISDIR(stmd):
type = DIRTYPE
elif stat.S_ISFIFO(stmd):
type = FIFOTYPE
elif stat.S_ISLNK(stmd):
type = SYMTYPE
linkname = os.readlink(name)
elif stat.S_ISCHR(stmd):
type = CHRTYPE
elif stat.S_ISBLK(stmd):
type = BLKTYPE
else:
return None
# Fill the TarInfo object with all
# information we can get.
tarinfo.name = arcname
tarinfo.mode = stmd
tarinfo.uid = statres.st_uid
tarinfo.gid = statres.st_gid
if type == REGTYPE:
tarinfo.size = statres.st_size
else:
tarinfo.size = 0
tarinfo.mtime = statres.st_mtime
tarinfo.type = type
tarinfo.linkname = linkname
if pwd:
try:
tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]
except KeyError:
pass
if grp:
try:
tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
except KeyError:
pass
if type in (CHRTYPE, BLKTYPE):
if hasattr(os, "major") and hasattr(os, "minor"):
tarinfo.devmajor = os.major(statres.st_rdev)
tarinfo.devminor = os.minor(statres.st_rdev)
return tarinfo
def list(self, verbose=True):
"""Print a table of contents to sys.stdout. If `verbose' is False, only
the names of the members are printed. If it is True, an `ls -l'-like
output is produced.
"""
self._check()
for tarinfo in self:
if verbose:
_safe_print(stat.filemode(tarinfo.mode))
_safe_print("%s/%s" % (tarinfo.uname or tarinfo.uid,
tarinfo.gname or tarinfo.gid))
if tarinfo.ischr() or tarinfo.isblk():
_safe_print("%10s" %
("%d,%d" % (tarinfo.devmajor, tarinfo.devminor)))
else:
_safe_print("%10d" % tarinfo.size)
_safe_print("%d-%02d-%02d %02d:%02d:%02d" \
% time.localtime(tarinfo.mtime)[:6])
_safe_print(tarinfo.name + ("/" if tarinfo.isdir() else ""))
if verbose:
if tarinfo.issym():
_safe_print("-> " + tarinfo.linkname)
if tarinfo.islnk():
_safe_print("link to " + tarinfo.linkname)
print()
def add(self, name, arcname=None, recursive=True, exclude=None, *, filter=None):
"""Add the file `name' to the archive. `name' may be any type of file
(directory, fifo, symbolic link, etc.). If given, `arcname'
specifies an alternative name for the file in the archive.
Directories are added recursively by default. This can be avoided by
setting `recursive' to False. `exclude' is a function that should
return True for each filename to be excluded. `filter' is a function
that expects a TarInfo object argument and returns the changed
TarInfo object, if it returns None the TarInfo object will be
excluded from the archive.
"""
self._check("aw")
if arcname is None:
arcname = name
# Exclude pathnames.
if exclude is not None:
import warnings
warnings.warn("use the filter argument instead",
DeprecationWarning, 2)
if exclude(name):
self._dbg(2, "tarfile: Excluded %r" % name)
return
# Skip if somebody tries to archive the archive...
if self.name is not None and os.path.abspath(name) == self.name:
self._dbg(2, "tarfile: Skipped %r" % name)
return
self._dbg(1, name)
# Create a TarInfo object from the file.
tarinfo = self.gettarinfo(name, arcname)
if tarinfo is None:
self._dbg(1, "tarfile: Unsupported type %r" % name)
return
# Change or exclude the TarInfo object.
if filter is not None:
tarinfo = filter(tarinfo)
if tarinfo is None:
self._dbg(2, "tarfile: Excluded %r" % name)
return
# Append the tar header and data to the archive.
if tarinfo.isreg():
with bltn_open(name, "rb") as f:
self.addfile(tarinfo, f)
elif tarinfo.isdir():
self.addfile(tarinfo)
if recursive:
for f in os.listdir(name):
self.add(os.path.join(name, f), os.path.join(arcname, f),
recursive, exclude, filter=filter)
else:
self.addfile(tarinfo)
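# ---- Editor's note: a hedged sketch of the `filter' hook described in
# add() above; the callable may rewrite or drop members. Names illustrative.
#
#     def anonymize(tarinfo):
#         if tarinfo.name.endswith(".log"):
#             return None                      # drop log files entirely
#         tarinfo.uid = tarinfo.gid = 0
#         tarinfo.uname = tarinfo.gname = "root"
#         return tarinfo
#
#     with tarfile.open("out.tar", "w") as tf:
#         tf.add("project", filter=anonymize)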
def addfile(self, tarinfo, fileobj=None):
"""Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
given, tarinfo.size bytes are read from it and added to the archive.
You can create TarInfo objects using gettarinfo().
On Windows platforms, `fileobj' should always be opened with mode
'rb', since text-mode newline translation would make the file size
unreliable.
"""
self._check("aw")
tarinfo = copy.copy(tarinfo)
buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
self.fileobj.write(buf)
self.offset += len(buf)
# If there's data to follow, append it.
if fileobj is not None:
copyfileobj(fileobj, self.fileobj, tarinfo.size)
blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)
if remainder > 0:
self.fileobj.write(NUL * (BLOCKSIZE - remainder))
blocks += 1
self.offset += blocks * BLOCKSIZE
self.members.append(tarinfo)
def extractall(self, path=".", members=None):
"""Extract all members from the archive to the current working
directory and set owner, modification time and permissions on
directories afterwards. `path' specifies a different directory
to extract to. `members' is optional and must be a subset of the
list returned by getmembers().
"""
directories = []
if members is None:
members = self
for tarinfo in members:
if tarinfo.isdir():
# Extract directories with a safe mode.
directories.append(tarinfo)
tarinfo = copy.copy(tarinfo)
tarinfo.mode = 0o700
# Do not set_attrs directories, as we will do that further down
self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
# Reverse sort directories.
directories.sort(key=lambda a: a.name)
directories.reverse()
# Set correct owner, mtime and filemode on directories.
for tarinfo in directories:
dirpath = os.path.join(path, tarinfo.name)
try:
self.chown(tarinfo, dirpath)
self.utime(tarinfo, dirpath)
self.chmod(tarinfo, dirpath)
except ExtractError as e:
if self.errorlevel > 1:
raise
else:
self._dbg(1, "tarfile: %s" % e)
def extract(self, member, path="", set_attrs=True):
"""Extract a member from the archive to the current working directory,
using its full name. Its file information is extracted as accurately
as possible. `member' may be a filename or a TarInfo object. You can
specify a different directory using `path'. File attributes (owner,
mtime, mode) are set unless `set_attrs' is False.
"""
self._check("r")
if isinstance(member, str):
tarinfo = self.getmember(member)
else:
tarinfo = member
# Prepare the link target for makelink().
if tarinfo.islnk():
tarinfo._link_target = os.path.join(path, tarinfo.linkname)
try:
self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
set_attrs=set_attrs)
except OSError as e:
if self.errorlevel > 0:
raise
else:
if e.filename is None:
self._dbg(1, "tarfile: %s" % e.strerror)
else:
self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))
except ExtractError as e:
if self.errorlevel > 1:
raise
else:
self._dbg(1, "tarfile: %s" % e)
def extractfile(self, member):
"""Extract a member from the archive as a file object. `member' may be
a filename or a TarInfo object. If `member' is a regular file or a
link, an io.BufferedReader object is returned. Otherwise, None is
returned.
"""
self._check("r")
if isinstance(member, str):
tarinfo = self.getmember(member)
else:
tarinfo = member
if tarinfo.isreg() or tarinfo.type not in SUPPORTED_TYPES:
# Members with unknown types are treated as regular files.
return self.fileobject(self, tarinfo)
elif tarinfo.islnk() or tarinfo.issym():
if isinstance(self.fileobj, _Stream):
# A small but ugly workaround for the case that someone tries
# to extract a (sym)link as a file-object from a non-seekable
# stream of tar blocks.
raise StreamError("cannot extract (sym)link as file object")
else:
# A (sym)link's file object is its target's file object.
return self.extractfile(self._find_link_target(tarinfo))
else:
# If there's no data associated with the member (directory, chrdev,
# blkdev, etc.), return None instead of a file object.
return None
def _extract_member(self, tarinfo, targetpath, set_attrs=True):
"""Extract the TarInfo object tarinfo to a physical
file called targetpath.
"""
# Fetch the TarInfo object for the given name
# and build the destination pathname, replacing
# forward slashes to platform specific separators.
targetpath = targetpath.rstrip("/")
targetpath = targetpath.replace("/", os.sep)
# Create all upper directories.
upperdirs = os.path.dirname(targetpath)
if upperdirs and not os.path.exists(upperdirs):
# Create directories that are not part of the archive with
# default permissions.
os.makedirs(upperdirs)
if tarinfo.islnk() or tarinfo.issym():
self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))
else:
self._dbg(1, tarinfo.name)
if tarinfo.isreg():
self.makefile(tarinfo, targetpath)
elif tarinfo.isdir():
self.makedir(tarinfo, targetpath)
elif tarinfo.isfifo():
self.makefifo(tarinfo, targetpath)
elif tarinfo.ischr() or tarinfo.isblk():
self.makedev(tarinfo, targetpath)
elif tarinfo.islnk() or tarinfo.issym():
self.makelink(tarinfo, targetpath)
elif tarinfo.type not in SUPPORTED_TYPES:
self.makeunknown(tarinfo, targetpath)
else:
self.makefile(tarinfo, targetpath)
if set_attrs:
self.chown(tarinfo, targetpath)
if not tarinfo.issym():
self.chmod(tarinfo, targetpath)
self.utime(tarinfo, targetpath)
#--------------------------------------------------------------------------
# Below are the different file methods. They are called via
# _extract_member() when extract() is called. They can be replaced in a
# subclass to implement other functionality.
def makedir(self, tarinfo, targetpath):
"""Make a directory called targetpath.
"""
try:
# Use a safe mode for the directory, the real mode is set
# later in _extract_member().
os.mkdir(targetpath, 0o700)
except FileExistsError:
pass
def makefile(self, tarinfo, targetpath):
"""Make a file called targetpath.
"""
source = self.fileobj
source.seek(tarinfo.offset_data)
with bltn_open(targetpath, "wb") as target:
if tarinfo.sparse is not None:
for offset, size in tarinfo.sparse:
target.seek(offset)
copyfileobj(source, target, size, ReadError)
else:
copyfileobj(source, target, tarinfo.size, ReadError)
target.seek(tarinfo.size)
target.truncate()
def makeunknown(self, tarinfo, targetpath):
"""Make a file from a TarInfo object with an unknown type
at targetpath.
"""
self.makefile(tarinfo, targetpath)
self._dbg(1, "tarfile: Unknown file type %r, " \
"extracted as regular file." % tarinfo.type)
def makefifo(self, tarinfo, targetpath):
"""Make a fifo called targetpath.
"""
if hasattr(os, "mkfifo"):
os.mkfifo(targetpath)
else:
raise ExtractError("fifo not supported by system")
def makedev(self, tarinfo, targetpath):
"""Make a character or block device called targetpath.
"""
if not hasattr(os, "mknod") or not hasattr(os, "makedev"):
raise ExtractError("special devices not supported by system")
mode = tarinfo.mode
if tarinfo.isblk():
mode |= stat.S_IFBLK
else:
mode |= stat.S_IFCHR
os.mknod(targetpath, mode,
os.makedev(tarinfo.devmajor, tarinfo.devminor))
def makelink(self, tarinfo, targetpath):
"""Make a (symbolic) link called targetpath. If it cannot be created
(platform limitation), we try to make a copy of the referenced file
instead of a link.
"""
try:
# For systems that support symbolic and hard links.
if tarinfo.issym():
os.symlink(tarinfo.linkname, targetpath)
else:
# See extract().
if os.path.exists(tarinfo._link_target):
os.link(tarinfo._link_target, targetpath)
else:
self._extract_member(self._find_link_target(tarinfo),
targetpath)
except symlink_exception:
try:
self._extract_member(self._find_link_target(tarinfo),
targetpath)
except KeyError:
raise ExtractError("unable to resolve link inside archive")
def chown(self, tarinfo, targetpath):
"""Set owner of targetpath according to tarinfo.
"""
if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
# We have to be root to do so.
try:
g = grp.getgrnam(tarinfo.gname)[2]
except KeyError:
g = tarinfo.gid
try:
u = pwd.getpwnam(tarinfo.uname)[2]
except KeyError:
u = tarinfo.uid
try:
if tarinfo.issym() and hasattr(os, "lchown"):
os.lchown(targetpath, u, g)
else:
os.chown(targetpath, u, g)
except OSError as e:
raise ExtractError("could not change owner")
def chmod(self, tarinfo, targetpath):
"""Set file permissions of targetpath according to tarinfo.
"""
if hasattr(os, 'chmod'):
try:
os.chmod(targetpath, tarinfo.mode)
except OSError as e:
raise ExtractError("could not change mode")
def utime(self, tarinfo, targetpath):
"""Set modification time of targetpath according to tarinfo.
"""
if not hasattr(os, 'utime'):
return
try:
os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))
except OSError as e:
raise ExtractError("could not change modification time")
#--------------------------------------------------------------------------
def next(self):
"""Return the next member of the archive as a TarInfo object, when
TarFile is opened for reading. Return None if there are no more
members available.
"""
self._check("ra")
if self.firstmember is not None:
m = self.firstmember
self.firstmember = None
return m
# Advance the file pointer.
if self.offset != self.fileobj.tell():
self.fileobj.seek(self.offset - 1)
if not self.fileobj.read(1):
raise ReadError("unexpected end of data")
# Read the next block.
tarinfo = None
while True:
try:
tarinfo = self.tarinfo.fromtarfile(self)
except EOFHeaderError as e:
if self.ignore_zeros:
self._dbg(2, "0x%X: %s" % (self.offset, e))
self.offset += BLOCKSIZE
continue
except InvalidHeaderError as e:
if self.ignore_zeros:
self._dbg(2, "0x%X: %s" % (self.offset, e))
self.offset += BLOCKSIZE
continue
elif self.offset == 0:
raise ReadError(str(e))
except EmptyHeaderError:
if self.offset == 0:
raise ReadError("empty file")
except TruncatedHeaderError as e:
if self.offset == 0:
raise ReadError(str(e))
except SubsequentHeaderError as e:
raise ReadError(str(e))
break
if tarinfo is not None:
self.members.append(tarinfo)
else:
self._loaded = True
return tarinfo
#--------------------------------------------------------------------------
# Little helper methods:
def _getmember(self, name, tarinfo=None, normalize=False):
"""Find an archive member by name from bottom to top.
If tarinfo is given, it is used as the starting point.
"""
# Ensure that all members have been loaded.
members = self.getmembers()
# Limit the member search list up to tarinfo.
if tarinfo is not None:
members = members[:members.index(tarinfo)]
if normalize:
name = os.path.normpath(name)
for member in reversed(members):
if normalize:
member_name = os.path.normpath(member.name)
else:
member_name = member.name
if name == member_name:
return member
def _load(self):
"""Read through the entire archive file and look for readable
members.
"""
while True:
tarinfo = self.next()
if tarinfo is None:
break
self._loaded = True
def _check(self, mode=None):
"""Check if TarFile is still open, and if the operation's mode
corresponds to TarFile's mode.
"""
if self.closed:
raise OSError("%s is closed" % self.__class__.__name__)
if mode is not None and self.mode not in mode:
raise OSError("bad operation for mode %r" % self.mode)
def _find_link_target(self, tarinfo):
"""Find the target member of a symlink or hardlink member in the
archive.
"""
if tarinfo.issym():
# Always search the entire archive.
linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))
limit = None
else:
# Search the archive before the link, because a hard link is
# just a reference to an already archived file.
linkname = tarinfo.linkname
limit = tarinfo
member = self._getmember(linkname, tarinfo=limit, normalize=True)
if member is None:
raise KeyError("linkname %r not found" % linkname)
return member
def __iter__(self):
"""Provide an iterator object.
"""
if self._loaded:
return iter(self.members)
else:
return TarIter(self)
def _dbg(self, level, msg):
"""Write debugging output to sys.stderr.
"""
if level <= self.debug:
print(msg, file=sys.stderr)
def __enter__(self):
self._check()
return self
def __exit__(self, type, value, traceback):
if type is None:
self.close()
else:
# An exception occurred. We must not call close() because
# it would try to write end-of-archive blocks and padding.
if not self._extfileobj:
self.fileobj.close()
self.closed = True
# class TarFile
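# ---- Editor's note: a small end-to-end sketch of the TarFile API above,
# not part of the original module. It writes and re-reads a throwaway
# archive; the file names are illustrative.
def _tarfile_roundtrip_demo(archive="demo.tar", payload="payload.txt"):
    with bltn_open(payload, "wb") as f:
        f.write(b"hello tar\n")
    with TarFile.open(archive, "w") as tf:
        tf.add(payload)
    with TarFile.open(archive, "r") as tf:
        data = tf.extractfile(tf.getmember(payload)).read()
    assert data == b"hello tar\n"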
class TarIter:
"""Iterator Class.
for tarinfo in TarFile(...):
suite...
"""
def __init__(self, tarfile):
"""Construct a TarIter object.
"""
self.tarfile = tarfile
self.index = 0
def __iter__(self):
"""Return iterator object.
"""
return self
def __next__(self):
"""Return the next item using TarFile's next() method.
When all members have been read, set TarFile as _loaded.
"""
# Fix for SF #1100429: Under rare circumstances it can
# happen that getmembers() is called during iteration,
# which will cause TarIter to stop prematurely.
if self.index == 0 and self.tarfile.firstmember is not None:
tarinfo = self.tarfile.next()
elif self.index < len(self.tarfile.members):
tarinfo = self.tarfile.members[self.index]
elif not self.tarfile._loaded:
tarinfo = self.tarfile.next()
if not tarinfo:
self.tarfile._loaded = True
raise StopIteration
else:
raise StopIteration
self.index += 1
return tarinfo
#--------------------
# exported functions
#--------------------
def is_tarfile(name):
"""Return True if name points to a tar archive that we
are able to handle, else return False.
"""
try:
t = open(name)
t.close()
return True
except TarError:
return False
open = TarFile.open
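# ---- Editor's note: is_tarfile() simply attempts an open(). A hedged
# sketch guarding an extraction with it; "incoming.tar" is illustrative.
def _guarded_extract_demo(src="incoming.tar", dest="."):
    if is_tarfile(src):
        with open(src, "r:*") as tf:   # `open' is TarFile.open here
            tf.extractall(path=dest)
    else:
        raise ValueError("%r is not a tar archive" % src)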
def main():
import argparse
description = 'A simple command line interface for tarfile module.'
parser = argparse.ArgumentParser(description=description)
parser.add_argument('-v', '--verbose', action='store_true', default=False,
help='Verbose output')
group = parser.add_mutually_exclusive_group()
group.add_argument('-l', '--list', metavar='<tarfile>',
help='Show listing of a tarfile')
group.add_argument('-e', '--extract', nargs='+',
metavar=('<tarfile>', '<output_dir>'),
help='Extract tarfile into target dir')
group.add_argument('-c', '--create', nargs='+',
metavar=('<name>', '<file>'),
help='Create tarfile from sources')
group.add_argument('-t', '--test', metavar='<tarfile>',
help='Test if a tarfile is valid')
args = parser.parse_args()
if args.test:
src = args.test
if is_tarfile(src):
with open(src, 'r') as tar:
tar.getmembers()
print(tar.getmembers(), file=sys.stderr)
if args.verbose:
print('{!r} is a tar archive.'.format(src))
else:
parser.exit(1, '{!r} is not a tar archive.\n'.format(src))
elif args.list:
src = args.list
if is_tarfile(src):
with TarFile.open(src, 'r:*') as tf:
tf.list(verbose=args.verbose)
else:
parser.exit(1, '{!r} is not a tar archive.\n'.format(src))
elif args.extract:
if len(args.extract) == 1:
src = args.extract[0]
curdir = os.curdir
elif len(args.extract) == 2:
src, curdir = args.extract
else:
parser.exit(1, parser.format_help())
if is_tarfile(src):
with TarFile.open(src, 'r:*') as tf:
tf.extractall(path=curdir)
if args.verbose:
if curdir == '.':
msg = '{!r} file is extracted.'.format(src)
else:
msg = ('{!r} file is extracted '
'into {!r} directory.').format(src, curdir)
print(msg)
else:
parser.exit(1, '{!r} is not a tar archive.\n'.format(src))
elif args.create:
tar_name = args.create.pop(0)
_, ext = os.path.splitext(tar_name)
compressions = {
# gz
'.gz': 'gz',
'.tgz': 'gz',
# xz
'.xz': 'xz',
'.txz': 'xz',
# bz2
'.bz2': 'bz2',
'.tbz': 'bz2',
'.tbz2': 'bz2',
'.tb2': 'bz2',
}
tar_mode = 'w:' + compressions[ext] if ext in compressions else 'w'
tar_files = args.create
with TarFile.open(tar_name, tar_mode) as tf:
for file_name in tar_files:
tf.add(file_name)
if args.verbose:
print('{!r} file created.'.format(tar_name))
else:
parser.exit(1, parser.format_help())
if __name__ == '__main__':
main()
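# ---- Editor's note: the argparse-based main() above gives this module a
# small command line interface. Hedged invocation examples (file names
# illustrative):
#
#     python -m tarfile -l backup.tar.gz           # list contents
#     python -m tarfile -e backup.tar.gz /tmp/out  # extract into a directory
#     python -m tarfile -c backup.tgz src README   # create, gzip by suffix
#     python -m tarfile -t backup.tar              # test validity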
r"""TELNET client class.
Based on RFC 854: TELNET Protocol Specification, by J. Postel and
J. Reynolds
Example:
>>> from telnetlib import Telnet
>>> tn = Telnet('www.python.org', 79) # connect to finger port
>>> tn.write(b'guido\r\n')
>>> print(tn.read_all())
Login Name TTY Idle When Where
guido Guido van Rossum pts/2 <Dec 2 11:10> snag.cnri.reston..
>>>
Note that read_some() won't read until eof -- it just reads some data
-- but it guarantees to read at least one byte unless EOF is hit.
It is possible to pass a Telnet object to a selector in order to wait until
more data is available. Note that in this case, read_eager() may return b''
even if there was data on the socket, because the protocol negotiation may have
eaten the data. This is why EOFError is needed in some cases to distinguish
between "no data" and "connection closed" (since the socket also appears ready
for reading when it is closed).
To do:
- option negotiation
- timeout should be intrinsic to the connection object instead of an
option on one of the read calls only
"""
# Imported modules
import sys
import socket
import selectors
try:
from time import monotonic as _time
except ImportError:
from time import time as _time
__all__ = ["Telnet"]
# Tunable parameters
DEBUGLEVEL = 0
# Telnet protocol defaults
TELNET_PORT = 23
# Telnet protocol characters (don't change)
IAC = bytes([255]) # "Interpret As Command"
DONT = bytes([254])
DO = bytes([253])
WONT = bytes([252])
WILL = bytes([251])
theNULL = bytes([0])
SE = bytes([240]) # Subnegotiation End
NOP = bytes([241]) # No Operation
DM = bytes([242]) # Data Mark
BRK = bytes([243]) # Break
IP = bytes([244]) # Interrupt process
AO = bytes([245]) # Abort output
AYT = bytes([246]) # Are You There
EC = bytes([247]) # Erase Character
EL = bytes([248]) # Erase Line
GA = bytes([249]) # Go Ahead
SB = bytes([250]) # Subnegotiation Begin
# Telnet protocol options code (don't change)
# These ones all come from arpa/telnet.h
BINARY = bytes([0]) # 8-bit data path
ECHO = bytes([1]) # echo
RCP = bytes([2]) # prepare to reconnect
SGA = bytes([3]) # suppress go ahead
NAMS = bytes([4]) # approximate message size
STATUS = bytes([5]) # give status
TM = bytes([6]) # timing mark
RCTE = bytes([7]) # remote controlled transmission and echo
NAOL = bytes([8]) # negotiate about output line width
NAOP = bytes([9]) # negotiate about output page size
NAOCRD = bytes([10]) # negotiate about CR disposition
NAOHTS = bytes([11]) # negotiate about horizontal tabstops
NAOHTD = bytes([12]) # negotiate about horizontal tab disposition
NAOFFD = bytes([13]) # negotiate about formfeed disposition
NAOVTS = bytes([14]) # negotiate about vertical tab stops
NAOVTD = bytes([15]) # negotiate about vertical tab disposition
NAOLFD = bytes([16]) # negotiate about output LF disposition
XASCII = bytes([17]) # extended ascii character set
LOGOUT = bytes([18]) # force logout
BM = bytes([19]) # byte macro
DET = bytes([20]) # data entry terminal
SUPDUP = bytes([21]) # supdup protocol
SUPDUPOUTPUT = bytes([22]) # supdup output
SNDLOC = bytes([23]) # send location
TTYPE = bytes([24]) # terminal type
EOR = bytes([25]) # end of record
TUID = bytes([26]) # TACACS user identification
OUTMRK = bytes([27]) # output marking
TTYLOC = bytes([28]) # terminal location number
VT3270REGIME = bytes([29]) # 3270 regime
X3PAD = bytes([30]) # X.3 PAD
NAWS = bytes([31]) # window size
TSPEED = bytes([32]) # terminal speed
LFLOW = bytes([33]) # remote flow control
LINEMODE = bytes([34]) # Linemode option
XDISPLOC = bytes([35]) # X Display Location
OLD_ENVIRON = bytes([36]) # Old - Environment variables
AUTHENTICATION = bytes([37]) # Authenticate
ENCRYPT = bytes([38]) # Encryption option
NEW_ENVIRON = bytes([39]) # New - Environment variables
# the following ones come from
# http://www.iana.org/assignments/telnet-options
# Unfortunately, that document does not assign identifiers
# to all of them, so we are making them up
TN3270E = bytes([40]) # TN3270E
XAUTH = bytes([41]) # XAUTH
CHARSET = bytes([42]) # CHARSET
RSP = bytes([43]) # Telnet Remote Serial Port
COM_PORT_OPTION = bytes([44]) # Com Port Control Option
SUPPRESS_LOCAL_ECHO = bytes([45]) # Telnet Suppress Local Echo
TLS = bytes([46]) # Telnet Start TLS
KERMIT = bytes([47]) # KERMIT
SEND_URL = bytes([48]) # SEND-URL
FORWARD_X = bytes([49]) # FORWARD_X
PRAGMA_LOGON = bytes([138]) # TELOPT PRAGMA LOGON
SSPI_LOGON = bytes([139]) # TELOPT SSPI LOGON
PRAGMA_HEARTBEAT = bytes([140]) # TELOPT PRAGMA HEARTBEAT
EXOPL = bytes([255]) # Extended-Options-List
NOOPT = bytes([0])
# poll/select have the advantage of not requiring any extra file descriptor,
# unlike epoll/kqueue (also, they require a single syscall).
if hasattr(selectors, 'PollSelector'):
_TelnetSelector = selectors.PollSelector
else:
_TelnetSelector = selectors.SelectSelector
class Telnet:
"""Telnet interface class.
An instance of this class represents a connection to a telnet
server. The instance is initially not connected; the open()
method must be used to establish a connection. Alternatively, the
host name and optional port number can be passed to the
constructor, too.
Don't try to reopen an already connected instance.
This class has many read_*() methods. Note that some of them
raise EOFError when the end of the connection is read, because
they can return an empty string for other reasons. See the
individual doc strings.
read_until(expected, [timeout])
Read until the expected string has been seen, or a timeout is
hit (default is no timeout); may block.
read_all()
Read all data until EOF; may block.
read_some()
Read at least one byte or EOF; may block.
read_very_eager()
Read all data available already queued or on the socket,
without blocking.
read_eager()
Read either data already queued or some data available on the
socket, without blocking.
read_lazy()
Read all data in the raw queue (processing it first), without
doing any socket I/O.
read_very_lazy()
Reads all data in the cooked queue, without doing any socket
I/O.
read_sb_data()
Reads available data between SB ... SE sequence. Don't block.
set_option_negotiation_callback(callback)
Each time a telnet option is read on the input flow, this callback
(if set) is called with the following parameters:
callback(telnet socket, command, option)
option will be chr(0) when there is no option.
No other action is done afterwards by telnetlib.
"""
def __init__(self, host=None, port=0,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
"""Constructor.
When called without arguments, create an unconnected instance.
With a hostname argument, it connects the instance; port number
and timeout are optional.
"""
self.debuglevel = DEBUGLEVEL
self.host = host
self.port = port
self.timeout = timeout
self.sock = None
self.rawq = b''
self.irawq = 0
self.cookedq = b''
self.eof = 0
self.iacseq = b'' # Buffer for IAC sequence.
self.sb = 0 # flag for SB and SE sequence.
self.sbdataq = b''
self.option_callback = None
if host is not None:
self.open(host, port, timeout)
def open(self, host, port=0, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
"""Connect to a host.
The optional second argument is the port number, which
defaults to the standard telnet port (23).
Don't try to reopen an already connected instance.
"""
self.eof = 0
if not port:
port = TELNET_PORT
self.host = host
self.port = port
self.timeout = timeout
self.sock = socket.create_connection((host, port), timeout)
def __del__(self):
"""Destructor -- close the connection."""
self.close()
def msg(self, msg, *args):
"""Print a debug message, when the debug level is > 0.
If extra arguments are present, they are substituted in the
message using the standard string formatting operator.
"""
if self.debuglevel > 0:
print('Telnet(%s,%s):' % (self.host, self.port), end=' ')
if args:
print(msg % args)
else:
print(msg)
def set_debuglevel(self, debuglevel):
"""Set the debug level.
The higher it is, the more debug output you get (on sys.stdout).
"""
self.debuglevel = debuglevel
def close(self):
"""Close the connection."""
sock = self.sock
self.sock = 0
self.eof = 1
self.iacseq = b''
self.sb = 0
if sock:
sock.close()
def get_socket(self):
"""Return the socket object used internally."""
return self.sock
def fileno(self):
"""Return the fileno() of the socket object used internally."""
return self.sock.fileno()
def write(self, buffer):
"""Write a string to the socket, doubling any IAC characters.
Can block if the connection is blocked. May raise
OSError if the connection is closed.
"""
if IAC in buffer:
buffer = buffer.replace(IAC, IAC+IAC)
self.msg("send %r", buffer)
self.sock.sendall(buffer)
def read_until(self, match, timeout=None):
"""Read until a given string is encountered or until timeout.
When no match is found, return whatever is available instead,
possibly the empty string. Raise EOFError if the connection
is closed and no cooked data is available.
"""
n = len(match)
self.process_rawq()
i = self.cookedq.find(match)
if i >= 0:
i = i+n
buf = self.cookedq[:i]
self.cookedq = self.cookedq[i:]
return buf
if timeout is not None:
deadline = _time() + timeout
with _TelnetSelector() as selector:
selector.register(self, selectors.EVENT_READ)
while not self.eof:
if selector.select(timeout):
i = max(0, len(self.cookedq)-n)
self.fill_rawq()
self.process_rawq()
i = self.cookedq.find(match, i)
if i >= 0:
i = i+n
buf = self.cookedq[:i]
self.cookedq = self.cookedq[i:]
return buf
if timeout is not None:
timeout = deadline - _time()
if timeout < 0:
break
return self.read_very_lazy()
def read_all(self):
"""Read all data until EOF; block until connection closed."""
self.process_rawq()
while not self.eof:
self.fill_rawq()
self.process_rawq()
buf = self.cookedq
self.cookedq = b''
return buf
def read_some(self):
"""Read at least one byte of cooked data unless EOF is hit.
Return b'' if EOF is hit. Block if no data is immediately
available.
"""
self.process_rawq()
while not self.cookedq and not self.eof:
self.fill_rawq()
self.process_rawq()
buf = self.cookedq
self.cookedq = b''
return buf
def read_very_eager(self):
"""Read everything that's possible without blocking in I/O (eager).
Raise EOFError if connection closed and no cooked data
available. Return b'' if no cooked data available otherwise.
Don't block unless in the midst of an IAC sequence.
"""
self.process_rawq()
while not self.eof and self.sock_avail():
self.fill_rawq()
self.process_rawq()
return self.read_very_lazy()
def read_eager(self):
"""Read readily available data.
Raise EOFError if connection closed and no cooked data
available. Return b'' if no cooked data available otherwise.
Don't block unless in the midst of an IAC sequence.
"""
self.process_rawq()
while not self.cookedq and not self.eof and self.sock_avail():
self.fill_rawq()
self.process_rawq()
return self.read_very_lazy()
def read_lazy(self):
"""Process and return data that's already in the queues (lazy).
Raise EOFError if connection closed and no data available.
Return b'' if no cooked data available otherwise. Don't block
unless in the midst of an IAC sequence.
"""
self.process_rawq()
return self.read_very_lazy()
def read_very_lazy(self):
"""Return any data available in the cooked queue (very lazy).
Raise EOFError if connection closed and no data available.
Return b'' if no cooked data available otherwise. Don't block.
"""
buf = self.cookedq
self.cookedq = b''
if not buf and self.eof and not self.rawq:
raise EOFError('telnet connection closed')
return buf
def read_sb_data(self):
"""Return any data available in the SB ... SE queue.
Return b'' if no SB ... SE available. Should only be called
after seeing a SB or SE command. When a new SB command is
found, old unread SB data will be discarded. Don't block.
"""
buf = self.sbdataq
self.sbdataq = b''
return buf
def set_option_negotiation_callback(self, callback):
"""Provide a callback function called after each receipt of a telnet option."""
self.option_callback = callback
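# ---- Editor's note: a hedged sketch of an option-negotiation callback
# that refuses every option the peer proposes, mirroring the default
# replies in process_rawq() below. The host name is illustrative.
#
#     def deny(sock, cmd, opt):
#         if cmd in (DO, DONT):
#             sock.sendall(IAC + WONT + opt)
#         elif cmd in (WILL, WONT):
#             sock.sendall(IAC + DONT + opt)
#
#     tn = Telnet("host.example.com")
#     tn.set_option_negotiation_callback(deny)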
def process_rawq(self):
"""Transfer from raw queue to cooked queue.
Set self.eof when connection is closed. Don't block unless in
the midst of an IAC sequence.
"""
buf = [b'', b'']
try:
while self.rawq:
c = self.rawq_getchar()
if not self.iacseq:
if c == theNULL:
continue
if c == b"\021":
continue
if c != IAC:
buf[self.sb] = buf[self.sb] + c
continue
else:
self.iacseq += c
elif len(self.iacseq) == 1:
# 'IAC: IAC CMD [OPTION only for WILL/WONT/DO/DONT]'
if c in (DO, DONT, WILL, WONT):
self.iacseq += c
continue
self.iacseq = b''
if c == IAC:
buf[self.sb] = buf[self.sb] + c
else:
if c == SB: # SB ... SE start.
self.sb = 1
self.sbdataq = b''
elif c == SE:
self.sb = 0
self.sbdataq = self.sbdataq + buf[1]
buf[1] = b''
if self.option_callback:
# Callback is supposed to look into
# the sbdataq
self.option_callback(self.sock, c, NOOPT)
else:
# We can't offer automatic processing of
# suboptions. Alas, we should not get any
# unless we did a WILL/DO before.
self.msg('IAC %d not recognized' % ord(c))
elif len(self.iacseq) == 2:
cmd = self.iacseq[1:2]
self.iacseq = b''
opt = c
if cmd in (DO, DONT):
self.msg('IAC %s %d',
cmd == DO and 'DO' or 'DONT', ord(opt))
if self.option_callback:
self.option_callback(self.sock, cmd, opt)
else:
self.sock.sendall(IAC + WONT + opt)
elif cmd in (WILL, WONT):
self.msg('IAC %s %d',
cmd == WILL and 'WILL' or 'WONT', ord(opt))
if self.option_callback:
self.option_callback(self.sock, cmd, opt)
else:
self.sock.sendall(IAC + DONT + opt)
except EOFError: # raised by self.rawq_getchar()
self.iacseq = b'' # Reset on EOF
self.sb = 0
self.cookedq = self.cookedq + buf[0]
self.sbdataq = self.sbdataq + buf[1]
def rawq_getchar(self):
"""Get next char from raw queue.
Block if no data is immediately available. Raise EOFError
when connection is closed.
"""
if not self.rawq:
self.fill_rawq()
if self.eof:
raise EOFError
c = self.rawq[self.irawq:self.irawq+1]
self.irawq = self.irawq + 1
if self.irawq >= len(self.rawq):
self.rawq = b''
self.irawq = 0
return c
def fill_rawq(self):
"""Fill raw queue from exactly one recv() system call.
Block if no data is immediately available. Set self.eof when
connection is closed.
"""
if self.irawq >= len(self.rawq):
self.rawq = b''
self.irawq = 0
# The buffer size should be fairly small so as to avoid quadratic
# behavior in process_rawq() above
buf = self.sock.recv(50)
self.msg("recv %r", buf)
self.eof = (not buf)
self.rawq = self.rawq + buf
def sock_avail(self):
"""Test whether data is available on the socket."""
with _TelnetSelector() as selector:
selector.register(self, selectors.EVENT_READ)
return bool(selector.select(0))
def interact(self):
"""Interaction function, emulates a very dumb telnet client."""
if sys.platform == "win32":
self.mt_interact()
return
with _TelnetSelector() as selector:
selector.register(self, selectors.EVENT_READ)
selector.register(sys.stdin, selectors.EVENT_READ)
while True:
for key, events in selector.select():
if key.fileobj is self:
try:
text = self.read_eager()
except EOFError:
print('*** Connection closed by remote host ***')
return
if text:
sys.stdout.write(text.decode('ascii'))
sys.stdout.flush()
elif key.fileobj is sys.stdin:
line = sys.stdin.readline().encode('ascii')
if not line:
return
self.write(line)
def mt_interact(self):
"""Multithreaded version of interact()."""
import _thread
_thread.start_new_thread(self.listener, ())
while 1:
line = sys.stdin.readline()
if not line:
break
self.write(line.encode('ascii'))
def listener(self):
"""Helper for mt_interact() -- this executes in the other thread."""
while 1:
try:
data = self.read_eager()
except EOFError:
print('*** Connection closed by remote host ***')
return
if data:
sys.stdout.write(data.decode('ascii'))
else:
sys.stdout.flush()
def expect(self, list, timeout=None):
"""Read until one from a list of a regular expressions matches.
The first argument is a list of regular expressions, either
compiled (re.RegexObject instances) or uncompiled (strings).
The optional second argument is a timeout, in seconds; default
is no timeout.
Return a tuple of three items: the index in the list of the
first regular expression that matches; the match object
returned; and the text read up till and including the match.
If EOF is read and no text was read, raise EOFError.
Otherwise, when nothing matches, return (-1, None, text) where
text is the text received so far (may be the empty string if a
timeout happened).
If a regular expression ends with a greedy match (e.g. '.*')
or if more than one expression can match the same input, the
results are nondeterministic and may depend on the I/O timing.
"""
re = None
list = list[:]
indices = range(len(list))
for i in indices:
if not hasattr(list[i], "search"):
if not re: import re
list[i] = re.compile(list[i])
if timeout is not None:
deadline = _time() + timeout
with _TelnetSelector() as selector:
selector.register(self, selectors.EVENT_READ)
while not self.eof:
self.process_rawq()
for i in indices:
m = list[i].search(self.cookedq)
if m:
e = m.end()
text = self.cookedq[:e]
self.cookedq = self.cookedq[e:]
return (i, m, text)
if timeout is not None:
ready = selector.select(timeout)
timeout = deadline - _time()
if not ready:
if timeout < 0:
break
else:
continue
self.fill_rawq()
text = self.read_very_lazy()
if not text and self.eof:
raise EOFError
return (-1, None, text)
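# ---- Editor's note: a minimal scripted session using the class above,
# not part of the original module. Host, prompts and credentials are
# illustrative; the target must actually speak telnet.
def _telnet_login_demo(host="host.example.com", user=b"guest", pw=b"guest"):
    tn = Telnet(host, TELNET_PORT, timeout=10)
    tn.read_until(b"login: ", timeout=5)
    tn.write(user + b"\r\n")
    tn.read_until(b"Password: ", timeout=5)
    tn.write(pw + b"\r\n")
    idx, match, text = tn.expect([b"\\$ ", b"incorrect"], timeout=5)
    tn.close()
    return idx == 0   # True when the shell prompt was reached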
def test():
"""Test program for telnetlib.
Usage: python telnetlib.py [-d] ... [host [port]]
Default host is localhost; default port is 23.
"""
debuglevel = 0
while sys.argv[1:] and sys.argv[1] == '-d':
debuglevel = debuglevel+1
del sys.argv[1]
host = 'localhost'
if sys.argv[1:]:
host = sys.argv[1]
port = 0
if sys.argv[2:]:
portstr = sys.argv[2]
try:
port = int(portstr)
except ValueError:
port = socket.getservbyname(portstr, 'tcp')
tn = Telnet()
tn.set_debuglevel(debuglevel)
tn.open(host, port, timeout=0.5)
tn.interact()
tn.close()
if __name__ == '__main__':
test()
"""Temporary files.
This module provides generic, low- and high-level interfaces for
creating temporary files and directories. All of the interfaces
provided by this module can be used without fear of race conditions
except for 'mktemp'. 'mktemp' is subject to race conditions and
should not be used; it is provided for backward compatibility only.
This module also provides some data items to the user:
TMP_MAX - maximum number of names that will be tried before
giving up.
tempdir - If this is set to a string before the first use of
any routine from this module, it will be considered as
another candidate location to store temporary files.
"""
__all__ = [
"NamedTemporaryFile", "TemporaryFile", # high level safe interfaces
"SpooledTemporaryFile", "TemporaryDirectory",
"mkstemp", "mkdtemp", # low level safe interfaces
"mktemp", # deprecated unsafe interface
"TMP_MAX", "gettempprefix", # constants
"tempdir", "gettempdir"
]
# Imports.
import functools as _functools
import warnings as _warnings
import io as _io
import os as _os
import shutil as _shutil
import errno as _errno
from random import Random as _Random
import weakref as _weakref
try:
import _thread
except ImportError:
import _dummy_thread as _thread
_allocate_lock = _thread.allocate_lock
_text_openflags = _os.O_RDWR | _os.O_CREAT | _os.O_EXCL
if hasattr(_os, 'O_NOFOLLOW'):
_text_openflags |= _os.O_NOFOLLOW
_bin_openflags = _text_openflags
if hasattr(_os, 'O_BINARY'):
_bin_openflags |= _os.O_BINARY
if hasattr(_os, 'TMP_MAX'):
TMP_MAX = _os.TMP_MAX
else:
TMP_MAX = 10000
# Although it does not have an underscore for historical reasons, this
# variable is an internal implementation detail (see issue 10354).
template = "tmp"
# Internal routines.
_once_lock = _allocate_lock()
if hasattr(_os, "lstat"):
_stat = _os.lstat
elif hasattr(_os, "stat"):
_stat = _os.stat
else:
# Fallback. All we need is something that raises OSError if the
# file doesn't exist.
def _stat(fn):
fd = _os.open(fn, _os.O_RDONLY)
_os.close(fd)
def _exists(fn):
try:
_stat(fn)
except OSError:
return False
else:
return True
class _RandomNameSequence:
"""An instance of _RandomNameSequence generates an endless
sequence of unpredictable strings which can safely be incorporated
into file names. Each string is eight characters long. Multiple
threads can safely use the same instance at the same time.
_RandomNameSequence is an iterator."""
characters = "abcdefghijklmnopqrstuvwxyz0123456789_"
@property
def rng(self):
cur_pid = _os.getpid()
if cur_pid != getattr(self, '_rng_pid', None):
self._rng = _Random()
self._rng_pid = cur_pid
return self._rng
def __iter__(self):
return self
def __next__(self):
c = self.characters
choose = self.rng.choice
letters = [choose(c) for dummy in range(8)]
return ''.join(letters)
def _candidate_tempdir_list():
"""Generate a list of candidate temporary directories which
_get_default_tempdir will try."""
dirlist = []
# First, try the environment.
for envname in 'TMPDIR', 'TEMP', 'TMP':
dirname = _os.getenv(envname)
if dirname: dirlist.append(dirname)
# Failing that, try OS-specific locations.
if _os.name == 'nt':
dirlist.extend([ r'c:\temp', r'c:\tmp', r'\temp', r'\tmp' ])
else:
dirlist.extend([ '/tmp', '/var/tmp', '/usr/tmp' ])
# As a last resort, the current directory.
try:
dirlist.append(_os.getcwd())
except (AttributeError, OSError):
dirlist.append(_os.curdir)
return dirlist
def _get_default_tempdir():
"""Calculate the default directory to use for temporary files.
This routine should be called exactly once.
We determine whether or not a candidate temp dir is usable by
trying to create and write to a file in that directory. If this
is successful, the test file is deleted. To prevent denial of
service, the name of the test file must be randomized."""
namer = _RandomNameSequence()
dirlist = _candidate_tempdir_list()
for dir in dirlist:
if dir != _os.curdir:
dir = _os.path.abspath(dir)
# Try only a few names per directory.
for seq in range(100):
name = next(namer)
filename = _os.path.join(dir, name)
try:
fd = _os.open(filename, _bin_openflags, 0o600)
try:
try:
with _io.open(fd, 'wb', closefd=False) as fp:
fp.write(b'blat')
finally:
_os.close(fd)
finally:
_os.unlink(filename)
return dir
except FileExistsError:
pass
except PermissionError:
# This exception is raised when a directory with the chosen name
# already exists on Windows.
if (_os.name == 'nt' and _os.path.isdir(dir) and
_os.access(dir, _os.W_OK)):
continue
break # no point trying more names in this directory
except OSError:
break # no point trying more names in this directory
raise FileNotFoundError(_errno.ENOENT,
"No usable temporary directory found in %s" %
dirlist)
_name_sequence = None
def _get_candidate_names():
"""Common setup sequence for all user-callable interfaces."""
global _name_sequence
if _name_sequence is None:
_once_lock.acquire()
try:
if _name_sequence is None:
_name_sequence = _RandomNameSequence()
finally:
_once_lock.release()
return _name_sequence
def _mkstemp_inner(dir, pre, suf, flags):
"""Code common to mkstemp, TemporaryFile, and NamedTemporaryFile."""
names = _get_candidate_names()
for seq in range(TMP_MAX):
name = next(names)
file = _os.path.join(dir, pre + name + suf)
try:
fd = _os.open(file, flags, 0o600)
return (fd, _os.path.abspath(file))
except FileExistsError:
continue # try again
except PermissionError:
# This exception is raised when a directory with the chosen name
# already exists on Windows.
if (_os.name == 'nt' and _os.path.isdir(dir) and
_os.access(dir, _os.W_OK)):
continue
else:
raise
raise FileExistsError(_errno.EEXIST,
"No usable temporary file name found")
# User visible interfaces.
def gettempprefix():
"""Accessor for tempdir.template."""
return template
tempdir = None
def gettempdir():
"""Accessor for tempfile.tempdir."""
global tempdir
if tempdir is None:
_once_lock.acquire()
try:
if tempdir is None:
tempdir = _get_default_tempdir()
finally:
_once_lock.release()
return tempdir
def mkstemp(suffix="", prefix=template, dir=None, text=False):
"""User-callable function to create and return a unique temporary
file. The return value is a pair (fd, name) where fd is the
file descriptor returned by os.open, and name is the filename.
If 'suffix' is specified, the file name will end with that suffix,
otherwise there will be no suffix.
If 'prefix' is specified, the file name will begin with that prefix,
otherwise a default prefix is used.
If 'dir' is specified, the file will be created in that directory,
otherwise a default directory is used.
If 'text' is specified and true, the file is opened in text
mode. Else (the default) the file is opened in binary mode. On
some operating systems, this makes no difference.
The file is readable and writable only by the creating user ID.
If the operating system uses permission bits to indicate whether a
file is executable, the file is executable by no one. The file
descriptor is not inherited by children of this process.
Caller is responsible for deleting the file when done with it.
"""
if dir is None:
dir = gettempdir()
if text:
flags = _text_openflags
else:
flags = _bin_openflags
return _mkstemp_inner(dir, prefix, suffix, flags)
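# ---- Editor's note: mkstemp() hands back a raw fd plus the path and
# leaves deletion to the caller, as documented above. A minimal sketch
# of the full create/use/unlink cycle:
def _mkstemp_demo():
    fd, path = mkstemp(suffix=".txt", prefix="demo-")
    try:
        _os.write(fd, b"scratch data")
    finally:
        _os.close(fd)
        _os.unlink(path)   # caller-owned cleanup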
def mkdtemp(suffix="", prefix=template, dir=None):
"""User-callable function to create and return a unique temporary
directory. The return value is the pathname of the directory.
Arguments are as for mkstemp, except that the 'text' argument is
not accepted.
The directory is readable, writable, and searchable only by the
creating user.
Caller is responsible for deleting the directory when done with it.
"""
if dir is None:
dir = gettempdir()
names = _get_candidate_names()
for seq in range(TMP_MAX):
name = next(names)
file = _os.path.join(dir, prefix + name + suffix)
try:
_os.mkdir(file, 0o700)
return file
except FileExistsError:
continue # try again
except PermissionError:
# This exception is raised when a directory with the chosen name
# already exists on Windows.
if (_os.name == 'nt' and _os.path.isdir(dir) and
_os.access(dir, _os.W_OK)):
continue
else:
raise
raise FileExistsError(_errno.EEXIST,
"No usable temporary directory name found")
def mktemp(suffix="", prefix=template, dir=None):
"""User-callable function to return a unique temporary file name. The
file is not created.
Arguments are as for mkstemp, except that the 'text' argument is
not accepted.
This function is unsafe and should not be used. The file name
refers to a file that did not exist at some point, but by the time
you get around to creating it, someone else may have beaten you to
the punch.
"""
## from warnings import warn as _warn
## _warn("mktemp is a potential security risk to your program",
## RuntimeWarning, stacklevel=2)
if dir is None:
dir = gettempdir()
names = _get_candidate_names()
for seq in range(TMP_MAX):
name = next(names)
file = _os.path.join(dir, prefix + name + suffix)
if not _exists(file):
return file
raise FileExistsError(_errno.EEXIST,
"No usable temporary filename found")
class _TemporaryFileCloser:
"""A separate object allowing proper closing of a temporary file's
underlying file object, without adding a __del__ method to the
temporary file."""
file = None # Set here since __del__ checks it
close_called = False
def __init__(self, file, name, delete=True):
self.file = file
self.name = name
self.delete = delete
# NT provides delete-on-close as a primitive, so we don't need
# the wrapper to do anything special. We still use it so that
# file.name is useful (i.e. not "(fdopen)") with NamedTemporaryFile.
if _os.name != 'nt':
# Cache the unlinker so we don't get spurious errors at
# shutdown when the module-level "os" is None'd out. Note
# that this must be referenced as self.unlink, because the
# name TemporaryFileWrapper may also get None'd out before
# __del__ is called.
def close(self, unlink=_os.unlink):
if not self.close_called and self.file is not None:
self.close_called = True
try:
self.file.close()
finally:
if self.delete:
unlink(self.name)
# Need to ensure the file is deleted on __del__
def __del__(self):
self.close()
else:
def close(self):
if not self.close_called:
self.close_called = True
self.file.close()
class _TemporaryFileWrapper:
"""Temporary file wrapper
This class provides a wrapper around files opened for
temporary use. In particular, it seeks to automatically
remove the file when it is no longer needed.
"""
def __init__(self, file, name, delete=True):
self.file = file
self.name = name
self.delete = delete
self._closer = _TemporaryFileCloser(file, name, delete)
def __getattr__(self, name):
# Attribute lookups are delegated to the underlying file
# and cached for non-numeric results
# (i.e. methods are cached, closed and friends are not)
file = self.__dict__['file']
a = getattr(file, name)
if hasattr(a, '__call__'):
func = a
@_functools.wraps(func)
def func_wrapper(*args, **kwargs):
return func(*args, **kwargs)
# Avoid closing the file as long as the wrapper is alive,
# see issue #18879.
func_wrapper._closer = self._closer
a = func_wrapper
if not isinstance(a, int):
setattr(self, name, a)
return a
# The underlying __enter__ method returns the wrong object
# (self.file) so override it to return the wrapper
def __enter__(self):
self.file.__enter__()
return self
# Need to trap __exit__ as well to ensure the file gets
# deleted when used in a with statement
def __exit__(self, exc, value, tb):
result = self.file.__exit__(exc, value, tb)
self.close()
return result
def close(self):
"""
Close the temporary file, possibly deleting it.
"""
self._closer.close()
# iter() doesn't use __getattr__ to find the __iter__ method
def __iter__(self):
# Don't return iter(self.file), but yield from it to avoid closing
# file as long as it's being used as iterator (see issue #23700). We
# can't use 'yield from' here because iter(file) returns the file
# object itself, which has a close method, and thus the file would get
# closed when the generator is finalized, due to PEP 380 semantics.
for line in self.file:
yield line
def NamedTemporaryFile(mode='w+b', buffering=-1, encoding=None,
newline=None, suffix="", prefix=template,
dir=None, delete=True):
"""Create and return a temporary file.
Arguments:
'prefix', 'suffix', 'dir' -- as for mkstemp.
'mode' -- the mode argument to io.open (default "w+b").
'buffering' -- the buffer size argument to io.open (default -1).
'encoding' -- the encoding argument to io.open (default None)
'newline' -- the newline argument to io.open (default None)
'delete' -- whether the file is deleted on close (default True).
The file is created as mkstemp() would do it.
Returns an object with a file-like interface; the name of the file
is accessible as file.name. The file will be automatically deleted
when it is closed unless the 'delete' argument is set to False.
"""
if dir is None:
dir = gettempdir()
flags = _bin_openflags
# Setting O_TEMPORARY in the flags causes the OS to delete
# the file when it is closed. This is only supported by Windows.
if _os.name == 'nt' and delete:
flags |= _os.O_TEMPORARY
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
try:
file = _io.open(fd, mode, buffering=buffering,
newline=newline, encoding=encoding)
return _TemporaryFileWrapper(file, name, delete)
except Exception:
_os.close(fd)
raise
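# Editor's note: illustrative usage sketch, not part of the module.
# NamedTemporaryFile returns a wrapper whose .name is a real path; with
# the default delete=True the file disappears on close.
def _demo_named_temporary_file():
    import tempfile
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as f:
        f.write("visible at f.name until close\n")
        f.flush()
        print(f.name)  # on POSIX other code can open this path; on
                       # Windows the delete-on-close handle usually
                       # prevents reopening it
    # the file has been deleted at this point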
if _os.name != 'posix' or _os.sys.platform == 'cygwin':
# On non-POSIX and Cygwin systems, assume that we cannot unlink a file
# while it is open.
TemporaryFile = NamedTemporaryFile
else:
def TemporaryFile(mode='w+b', buffering=-1, encoding=None,
newline=None, suffix="", prefix=template,
dir=None):
"""Create and return a temporary file.
Arguments:
'prefix', 'suffix', 'dir' -- as for mkstemp.
'mode' -- the mode argument to io.open (default "w+b").
'buffering' -- the buffer size argument to io.open (default -1).
'encoding' -- the encoding argument to io.open (default None)
'newline' -- the newline argument to io.open (default None)
The file is created as mkstemp() would do it.
Returns an object with a file-like interface. The file has no
name, and will cease to exist when it is closed.
"""
if dir is None:
dir = gettempdir()
flags = _bin_openflags
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
try:
_os.unlink(name)
return _io.open(fd, mode, buffering=buffering,
newline=newline, encoding=encoding)
except:
_os.close(fd)
raise
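# Editor's note: illustrative sketch, not part of the module. On POSIX
# the file is unlinked immediately, so it has no visible name and is
# reclaimed as soon as it is closed.
def _demo_temporary_file():
    import tempfile
    with tempfile.TemporaryFile(mode="w+b") as f:
        f.write(b"scratch bytes")
        f.seek(0)
        assert f.read() == b"scratch bytes"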
class SpooledTemporaryFile:
"""Temporary file wrapper, specialized to switch from BytesIO
or StringIO to a real file when it exceeds a certain size or
when a fileno is needed.
"""
_rolled = False
def __init__(self, max_size=0, mode='w+b', buffering=-1,
encoding=None, newline=None,
suffix="", prefix=template, dir=None):
if 'b' in mode:
self._file = _io.BytesIO()
else:
# Setting newline="\n" avoids newline translation;
# this is important because otherwise on Windows we'd
# get double newline translation upon rollover().
self._file = _io.StringIO(newline="\n")
self._max_size = max_size
self._rolled = False
self._TemporaryFileArgs = {'mode': mode, 'buffering': buffering,
'suffix': suffix, 'prefix': prefix,
'encoding': encoding, 'newline': newline,
'dir': dir}
def _check(self, file):
if self._rolled: return
max_size = self._max_size
if max_size and file.tell() > max_size:
self.rollover()
def rollover(self):
if self._rolled: return
file = self._file
newfile = self._file = TemporaryFile(**self._TemporaryFileArgs)
del self._TemporaryFileArgs
newfile.write(file.getvalue())
newfile.seek(file.tell(), 0)
self._rolled = True
# The method caching trick from NamedTemporaryFile
# won't work here, because _file may change from a
# BytesIO/StringIO instance to a real file. So we list
# all the methods directly.
# Context management protocol
def __enter__(self):
if self._file.closed:
raise ValueError("Cannot enter context with closed file")
return self
def __exit__(self, exc, value, tb):
self._file.close()
# file protocol
def __iter__(self):
return self._file.__iter__()
def close(self):
self._file.close()
@property
def closed(self):
return self._file.closed
@property
def encoding(self):
try:
return self._file.encoding
except AttributeError:
if 'b' in self._TemporaryFileArgs['mode']:
raise
return self._TemporaryFileArgs['encoding']
def fileno(self):
self.rollover()
return self._file.fileno()
def flush(self):
self._file.flush()
def isatty(self):
return self._file.isatty()
@property
def mode(self):
try:
return self._file.mode
except AttributeError:
return self._TemporaryFileArgs['mode']
@property
def name(self):
try:
return self._file.name
except AttributeError:
return None
@property
def newlines(self):
try:
return self._file.newlines
except AttributeError:
if 'b' in self._TemporaryFileArgs['mode']:
raise
return self._TemporaryFileArgs['newline']
def read(self, *args):
return self._file.read(*args)
def readline(self, *args):
return self._file.readline(*args)
def readlines(self, *args):
return self._file.readlines(*args)
def seek(self, *args):
self._file.seek(*args)
@property
def softspace(self):
return self._file.softspace
def tell(self):
return self._file.tell()
def truncate(self, size=None):
if size is None:
self._file.truncate()
else:
if size > self._max_size:
self.rollover()
self._file.truncate(size)
def write(self, s):
file = self._file
rv = file.write(s)
self._check(file)
return rv
def writelines(self, iterable):
file = self._file
rv = file.writelines(iterable)
self._check(file)
return rv
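# Editor's note: illustrative sketch, not part of the module. It pokes
# at the private _rolled flag purely to show the spooling behaviour:
# data stays in memory until max_size is exceeded (or fileno() is
# requested), at which point rollover() moves it to a real file.
def _demo_spooled_temporary_file():
    import tempfile
    with tempfile.SpooledTemporaryFile(max_size=16) as f:
        f.write(b"tiny")        # still an in-memory BytesIO
        assert not f._rolled
        f.write(b"x" * 32)      # crosses max_size: triggers rollover()
        assert f._rolled        # now backed by a real temporary file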
class TemporaryDirectory(object):
"""Create and return a temporary directory. This has the same
behavior as mkdtemp but can be used as a context manager. For
example:
with TemporaryDirectory() as tmpdir:
...
Upon exiting the context, the directory and everything contained
in it are removed.
"""
def __init__(self, suffix="", prefix=template, dir=None):
self.name = mkdtemp(suffix, prefix, dir)
self._finalizer = _weakref.finalize(
self, self._cleanup, self.name,
warn_message="Implicitly cleaning up {!r}".format(self))
@classmethod
def _cleanup(cls, name, warn_message):
_shutil.rmtree(name)
_warnings.warn(warn_message, ResourceWarning)
def __repr__(self):
return "<{} {!r}>".format(self.__class__.__name__, self.name)
def __enter__(self):
return self.name
def __exit__(self, exc, value, tb):
self.cleanup()
def cleanup(self):
if self._finalizer.detach():
_shutil.rmtree(self.name)
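# Editor's note: illustrative sketch, not part of the module, of the
# context-manager use described in the class docstring above.
def _demo_temporary_directory():
    import os
    import tempfile
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "data.bin")
        with open(target, "wb") as f:
            f.write(b"\x00\x01")
        assert os.path.exists(target)
    assert not os.path.exists(tmpdir)  # removed on context exit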
"""Text wrapping and filling.
"""
# Copyright (C) 1999-2001 Gregory P. Ward.
# Copyright (C) 2002, 2003 Python Software Foundation.
# Written by Greg Ward <[email protected]>
import re
__all__ = ['TextWrapper', 'wrap', 'fill', 'dedent', 'indent', 'shorten']
# Hardcode the recognized whitespace characters to the US-ASCII
# whitespace characters. The main reason for doing this is that in
# ISO-8859-1, 0xa0 is non-breaking whitespace, so in certain locales
# that character winds up in string.whitespace. Respecting
# string.whitespace in those cases would 1) make textwrap treat 0xa0 the
# same as any other whitespace char, which is clearly wrong (it's a
# *non-breaking* space), 2) possibly cause problems with Unicode,
# since 0xa0 is not in range(128).
_whitespace = '\t\n\x0b\x0c\r '
class TextWrapper:
"""
Object for wrapping/filling text. The public interface consists of
the wrap() and fill() methods; the other methods are just there for
subclasses to override in order to tweak the default behaviour.
If you want to completely replace the main wrapping algorithm,
you'll probably have to override _wrap_chunks().
Several instance attributes control various aspects of wrapping:
width (default: 70)
the maximum width of wrapped lines (unless break_long_words
is false)
initial_indent (default: "")
string that will be prepended to the first line of wrapped
output. Counts towards the line's width.
subsequent_indent (default: "")
string that will be prepended to all lines save the first
of wrapped output; also counts towards each line's width.
expand_tabs (default: true)
Expand tabs in input text to spaces before further processing.
Each tab will become 0 .. 'tabsize' spaces, depending on its position
in its line. If false, each tab is treated as a single character.
tabsize (default: 8)
Expand tabs in input text to 0 .. 'tabsize' spaces, unless
'expand_tabs' is false.
replace_whitespace (default: true)
Replace all whitespace characters in the input text by spaces
after tab expansion. Note that if expand_tabs is false and
replace_whitespace is true, every tab will be converted to a
single space!
fix_sentence_endings (default: false)
Ensure that sentence-ending punctuation is always followed
by two spaces. Off by default because the algorithm is
(unavoidably) imperfect.
break_long_words (default: true)
Break words longer than 'width'. If false, those words will not
be broken, and some lines might be longer than 'width'.
break_on_hyphens (default: true)
Allow breaking hyphenated words. If true, wrapping will occur
preferably on whitespaces and right after hyphens part of
compound words.
drop_whitespace (default: true)
Drop leading and trailing whitespace from lines.
max_lines (default: None)
Truncate wrapped lines.
placeholder (default: ' [...]')
Append to the last line of truncated text.
"""
unicode_whitespace_trans = {}
uspace = ord(' ')
for x in _whitespace:
unicode_whitespace_trans[ord(x)] = uspace
# This funky little regex is just the trick for splitting
# text up into word-wrappable chunks. E.g.
# "Hello there -- you goof-ball, use the -b option!"
# splits into
# Hello/ /there/ /--/ /you/ /goof-/ball,/ /use/ /the/ /-b/ /option!
# (after stripping out empty strings).
wordsep_re = re.compile(
r'(\s+|' # any whitespace
r'[^\s\w]*\w+[^0-9\W]-(?=\w+[^0-9\W])|' # hyphenated words
r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash
# This less funky little regex just splits on recognized spaces. E.g.
# "Hello there -- you goof-ball, use the -b option!"
# splits into
# Hello/ /there/ /--/ /you/ /goof-ball,/ /use/ /the/ /-b/ /option!/
wordsep_simple_re = re.compile(r'(\s+)')
# XXX this is not locale- or charset-aware -- string.lowercase
# is US-ASCII only (and therefore English-only)
sentence_end_re = re.compile(r'[a-z]' # lowercase letter
r'[\.\!\?]' # sentence-ending punct.
r'[\"\']?' # optional end-of-quote
r'\Z') # end of chunk
def __init__(self,
width=70,
initial_indent="",
subsequent_indent="",
expand_tabs=True,
replace_whitespace=True,
fix_sentence_endings=False,
break_long_words=True,
drop_whitespace=True,
break_on_hyphens=True,
tabsize=8,
*,
max_lines=None,
placeholder=' [...]'):
self.width = width
self.initial_indent = initial_indent
self.subsequent_indent = subsequent_indent
self.expand_tabs = expand_tabs
self.replace_whitespace = replace_whitespace
self.fix_sentence_endings = fix_sentence_endings
self.break_long_words = break_long_words
self.drop_whitespace = drop_whitespace
self.break_on_hyphens = break_on_hyphens
self.tabsize = tabsize
self.max_lines = max_lines
self.placeholder = placeholder
# -- Private methods -----------------------------------------------
# (possibly useful for subclasses to override)
def _munge_whitespace(self, text):
"""_munge_whitespace(text : string) -> string
Munge whitespace in text: expand tabs and convert all other
whitespace characters to spaces. Eg. " foo\\tbar\\n\\nbaz"
becomes " foo bar baz".
"""
if self.expand_tabs:
text = text.expandtabs(self.tabsize)
if self.replace_whitespace:
text = text.translate(self.unicode_whitespace_trans)
return text
def _split(self, text):
"""_split(text : string) -> [string]
Split the text to wrap into indivisible chunks. Chunks are
not quite the same as words; see _wrap_chunks() for full
details. As an example, the text
Look, goof-ball -- use the -b option!
breaks into the following chunks:
'Look,', ' ', 'goof-', 'ball', ' ', '--', ' ',
'use', ' ', 'the', ' ', '-b', ' ', 'option!'
if break_on_hyphens is True, or in:
'Look,', ' ', 'goof-ball', ' ', '--', ' ',
'use', ' ', 'the', ' ', '-b', ' ', 'option!'
otherwise.
"""
if self.break_on_hyphens is True:
chunks = self.wordsep_re.split(text)
else:
chunks = self.wordsep_simple_re.split(text)
chunks = [c for c in chunks if c]
return chunks
def _fix_sentence_endings(self, chunks):
"""_fix_sentence_endings(chunks : [string])
Correct for sentence endings buried in 'chunks'. Eg. when the
original text contains "... foo.\\nBar ...", munge_whitespace()
and split() will convert that to [..., "foo.", " ", "Bar", ...]
which has one too few spaces; this method simply changes the one
space to two.
"""
i = 0
patsearch = self.sentence_end_re.search
while i < len(chunks)-1:
if chunks[i+1] == " " and patsearch(chunks[i]):
chunks[i+1] = "  "  # two spaces, as the docstring above specifies
i += 2
else:
i += 1
def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
"""_handle_long_word(chunks : [string],
cur_line : [string],
cur_len : int, width : int)
Handle a chunk of text (most likely a word, not whitespace) that
is too long to fit in any line.
"""
# Figure out when indent is larger than the specified width, and make
# sure at least one character is stripped off on every pass
if width < 1:
space_left = 1
else:
space_left = width - cur_len
# If we're allowed to break long words, then do so: put as much
# of the next chunk onto the current line as will fit.
if self.break_long_words:
cur_line.append(reversed_chunks[-1][:space_left])
reversed_chunks[-1] = reversed_chunks[-1][space_left:]
# Otherwise, we have to preserve the long word intact. Only add
# it to the current line if there's nothing already there --
# that minimizes how much we violate the width constraint.
elif not cur_line:
cur_line.append(reversed_chunks.pop())
# If we're not allowed to break long words, and there's already
# text on the current line, do nothing. Next time through the
# main loop of _wrap_chunks(), we'll wind up here again, but
# cur_len will be zero, so the next line will be entirely
# devoted to the long word that we can't handle right now.
def _wrap_chunks(self, chunks):
"""_wrap_chunks(chunks : [string]) -> [string]
Wrap a sequence of text chunks and return a list of lines of
length 'self.width' or less. (If 'break_long_words' is false,
some lines may be longer than this.) Chunks correspond roughly
to words and the whitespace between them: each chunk is
indivisible (modulo 'break_long_words'), but a line break can
come between any two chunks. Chunks should not have internal
whitespace; ie. a chunk is either all whitespace or a "word".
Whitespace chunks will be removed from the beginning and end of
lines, but apart from that whitespace is preserved.
"""
lines = []
if self.width <= 0:
raise ValueError("invalid width %r (must be > 0)" % self.width)
if self.max_lines is not None:
if self.max_lines > 1:
indent = self.subsequent_indent
else:
indent = self.initial_indent
if len(indent) + len(self.placeholder.lstrip()) > self.width:
raise ValueError("placeholder too large for max width")
# Arrange in reverse order so items can be efficiently popped
# from a stack of chunks.
chunks.reverse()
while chunks:
# Start the list of chunks that will make up the current line.
# cur_len is just the length of all the chunks in cur_line.
cur_line = []
cur_len = 0
# Figure out which static string will prefix this line.
if lines:
indent = self.subsequent_indent
else:
indent = self.initial_indent
# Maximum width for this line.
width = self.width - len(indent)
# First chunk on line is whitespace -- drop it, unless this
# is the very beginning of the text (ie. no lines started yet).
if self.drop_whitespace and chunks[-1].strip() == '' and lines:
del chunks[-1]
while chunks:
l = len(chunks[-1])
# Can at least squeeze this chunk onto the current line.
if cur_len + l <= width:
cur_line.append(chunks.pop())
cur_len += l
# Nope, this line is full.
else:
break
# The current line is full, and the next chunk is too big to
# fit on *any* line (not just this one).
if chunks and len(chunks[-1]) > width:
self._handle_long_word(chunks, cur_line, cur_len, width)
cur_len = sum(map(len, cur_line))
# If the last chunk on this line is all whitespace, drop it.
if self.drop_whitespace and cur_line and cur_line[-1].strip() == '':
cur_len -= len(cur_line[-1])
del cur_line[-1]
if cur_line:
if (self.max_lines is None or
len(lines) + 1 < self.max_lines or
(not chunks or
self.drop_whitespace and
len(chunks) == 1 and
not chunks[0].strip()) and cur_len <= width):
# Convert current line back to a string and store it in
# list of all lines (return value).
lines.append(indent + ''.join(cur_line))
else:
while cur_line:
if (cur_line[-1].strip() and
cur_len + len(self.placeholder) <= width):
cur_line.append(self.placeholder)
lines.append(indent + ''.join(cur_line))
break
cur_len -= len(cur_line[-1])
del cur_line[-1]
else:
if lines:
prev_line = lines[-1].rstrip()
if (len(prev_line) + len(self.placeholder) <=
self.width):
lines[-1] = prev_line + self.placeholder
break
lines.append(indent + self.placeholder.lstrip())
break
return lines
def _split_chunks(self, text):
text = self._munge_whitespace(text)
return self._split(text)
# -- Public interface ----------------------------------------------
def wrap(self, text):
"""wrap(text : string) -> [string]
Reformat the single paragraph in 'text' so it fits in lines of
no more than 'self.width' columns, and return a list of wrapped
lines. Tabs in 'text' are expanded with string.expandtabs(),
and all other whitespace characters (including newline) are
converted to space.
"""
chunks = self._split_chunks(text)
if self.fix_sentence_endings:
self._fix_sentence_endings(chunks)
return self._wrap_chunks(chunks)
def fill(self, text):
"""fill(text : string) -> string
Reformat the single paragraph in 'text' to fit in lines of no
more than 'self.width' columns, and return a new string
containing the entire wrapped paragraph.
"""
return "\n".join(self.wrap(text))
# -- Convenience interface ---------------------------------------------
def wrap(text, width=70, **kwargs):
"""Wrap a single paragraph of text, returning a list of wrapped lines.
Reformat the single paragraph in 'text' so it fits in lines of no
more than 'width' columns, and return a list of wrapped lines. By
default, tabs in 'text' are expanded with string.expandtabs(), and
all other whitespace characters (including newline) are converted to
space. See TextWrapper class for available keyword args to customize
wrapping behaviour.
"""
w = TextWrapper(width=width, **kwargs)
return w.wrap(text)
def fill(text, width=70, **kwargs):
"""Fill a single paragraph of text, returning a new string.
Reformat the single paragraph in 'text' to fit in lines of no more
than 'width' columns, and return a new string containing the entire
wrapped paragraph. As with wrap(), tabs are expanded and other
whitespace characters converted to space. See TextWrapper class for
available keyword args to customize wrapping behaviour.
"""
w = TextWrapper(width=width, **kwargs)
return w.fill(text)
def shorten(text, width, **kwargs):
"""Collapse and truncate the given text to fit in the given width.
The text first has its whitespace collapsed. If it then fits in
the *width*, it is returned as is. Otherwise, as many words
as possible are joined and then the placeholder is appended::
>>> textwrap.shorten("Hello world!", width=12)
'Hello world!'
>>> textwrap.shorten("Hello world!", width=11)
'Hello [...]'
"""
w = TextWrapper(width=width, max_lines=1, **kwargs)
return w.fill(' '.join(text.strip().split()))
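# Editor's note: illustrative sketch, not part of the module, putting
# the three convenience functions side by side.
def _demo_textwrap_convenience():
    import textwrap
    text = "The quick brown fox jumps over the lazy dog"
    print(textwrap.wrap(text, width=20))     # list of wrapped lines
    print(textwrap.fill(text, width=20))     # one newline-joined string
    print(textwrap.shorten(text, width=25))  # truncated, with the
                                             # default ' [...]' placeholder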
# -- Loosely related functionality -------------------------------------
_whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)
def dedent(text):
"""Remove any common leading whitespace from every line in `text`.
This can be used to make triple-quoted strings line up with the left
edge of the display, while still presenting them in the source code
in indented form.
Note that tabs and spaces are both treated as whitespace, but they
are not equal: the lines " hello" and "\\thello" are
considered to have no common leading whitespace. (This behaviour is
new in Python 2.5; older versions of this module incorrectly
expanded tabs before searching for common leading whitespace.)
"""
# Look for the longest leading string of spaces and tabs common to
# all lines.
margin = None
text = _whitespace_only_re.sub('', text)
indents = _leading_whitespace_re.findall(text)
for indent in indents:
if margin is None:
margin = indent
# Current line more deeply indented than previous winner:
# no change (previous winner is still on top).
elif indent.startswith(margin):
pass
# Current line consistent with and no deeper than previous winner:
# it's the new winner.
elif margin.startswith(indent):
margin = indent
# Find the largest common whitespace between current line and previous
# winner.
else:
for i, (x, y) in enumerate(zip(margin, indent)):
if x != y:
margin = margin[:i]
break
else:
margin = margin[:len(indent)]
# sanity check (testing/debugging only)
if 0 and margin:
for line in text.split("\n"):
assert not line or line.startswith(margin), \
"line = %r, margin = %r" % (line, margin)
if margin:
text = re.sub(r'(?m)^' + margin, '', text)
return text
def indent(text, prefix, predicate=None):
"""Adds 'prefix' to the beginning of selected lines in 'text'.
If 'predicate' is provided, 'prefix' will only be added to the lines
where 'predicate(line)' is True. If 'predicate' is not provided,
it will default to adding 'prefix' to all non-empty lines that do not
consist solely of whitespace characters.
"""
if predicate is None:
def predicate(line):
return line.strip()
def prefixed_lines():
for line in text.splitlines(True):
yield (prefix + line if predicate(line) else line)
return ''.join(prefixed_lines())
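# Editor's note: illustrative sketch, not part of the module, pairing
# dedent() with indent(); the default predicate skips whitespace-only
# lines, as described in the docstring above.
def _demo_dedent_indent():
    import textwrap
    body = textwrap.dedent("""\
        first line
        second line
    """)
    quoted = textwrap.indent(body, "> ")
    assert quoted == "> first line\n> second line\n"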
if __name__ == "__main__":
#print dedent("\tfoo\n\tbar")
#print dedent(" \thello there\n \t how are you?")
print(dedent("Hello there.\n This is indented."))
s = """Gur Mra bs Clguba, ol Gvz Crgref
Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"""
d = {}
for c in (65, 97):
for i in range(26):
d[chr(i+c)] = chr((i+13) % 26 + c)
print("".join([d.get(c, c) for c in s]))
"""Thread module emulating a subset of Java's threading model."""
import sys as _sys
import _thread
try:
from time import monotonic as _time
except ImportError:
from time import time as _time
from traceback import format_exc as _format_exc
from _weakrefset import WeakSet
from itertools import islice as _islice, count as _count
try:
from _collections import deque as _deque
except ImportError:
from collections import deque as _deque
# Note regarding PEP 8 compliant names
# This threading model was originally inspired by Java, and inherited
# the convention of camelCase function and method names from that
# language. Those original names are not in any imminent danger of
# being deprecated (even for Py3k), so this module provides them as
# aliases for the PEP 8 compliant names.
# Note that using the new PEP 8 compliant names facilitates substitution
# with the multiprocessing module, which doesn't provide the old
# Java inspired names.
__all__ = ['active_count', 'Condition', 'current_thread', 'enumerate', 'Event',
'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Thread', 'Barrier',
'Timer', 'ThreadError', 'setprofile', 'settrace', 'local', 'stack_size']
# Rename some stuff so "from threading import *" is safe
_start_new_thread = _thread.start_new_thread
_allocate_lock = _thread.allocate_lock
_set_sentinel = _thread._set_sentinel
get_ident = _thread.get_ident
ThreadError = _thread.error
try:
_CRLock = _thread.RLock
except AttributeError:
_CRLock = None
TIMEOUT_MAX = _thread.TIMEOUT_MAX
del _thread
# Support for profile and trace hooks
_profile_hook = None
_trace_hook = None
def setprofile(func):
"""Set a profile function for all threads started from the threading module.
The func will be passed to sys.setprofile() for each thread, before its
run() method is called.
"""
global _profile_hook
_profile_hook = func
def settrace(func):
"""Set a trace function for all threads started from the threading module.
The func will be passed to sys.settrace() for each thread, before its run()
method is called.
"""
global _trace_hook
_trace_hook = func
# Synchronization classes
Lock = _allocate_lock
def RLock(*args, **kwargs):
"""Factory function that returns a new reentrant lock.
A reentrant lock must be released by the thread that acquired it. Once a
thread has acquired a reentrant lock, the same thread may acquire it again
without blocking; the thread must release it once for each time it has
acquired it.
"""
if _CRLock is None:
return _PyRLock(*args, **kwargs)
return _CRLock(*args, **kwargs)
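# Editor's note: illustrative sketch, not part of the module. The
# owning thread may re-acquire a reentrant lock without blocking; each
# acquire must be balanced by a release.
def _demo_rlock():
    import threading
    lock = threading.RLock()
    with lock:        # first acquisition, recursion level 1
        with lock:    # same thread re-enters, level 2
            pass
    # fully released here; another thread could now acquire it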
class _RLock:
"""This class implements reentrant lock objects.
A reentrant lock must be released by the thread that acquired it. Once a
thread has acquired a reentrant lock, the same thread may acquire it
again without blocking; the thread must release it once for each time it
has acquired it.
"""
def __init__(self):
self._block = _allocate_lock()
self._owner = None
self._count = 0
def __repr__(self):
owner = self._owner
try:
owner = _active[owner].name
except KeyError:
pass
return "<%s owner=%r count=%d>" % (
self.__class__.__name__, owner, self._count)
def acquire(self, blocking=True, timeout=-1):
"""Acquire a lock, blocking or non-blocking.
When invoked without arguments: if this thread already owns the lock,
increment the recursion level by one, and return immediately. Otherwise,
if another thread owns the lock, block until the lock is unlocked. Once
the lock is unlocked (not owned by any thread), then grab ownership, set
the recursion level to one, and return. If more than one thread is
blocked waiting until the lock is unlocked, only one at a time will be
able to grab ownership of the lock. There is no return value in this
case.
When invoked with the blocking argument set to true, do the same thing
as when called without arguments, and return true.
When invoked with the blocking argument set to false, do not block. If a
call without an argument would block, return false immediately;
otherwise, do the same thing as when called without arguments, and
return true.
When invoked with the floating-point timeout argument set to a positive
value, block for at most the number of seconds specified by timeout
and as long as the lock cannot be acquired. Return true if the lock has
been acquired, false if the timeout has elapsed.
"""
me = get_ident()
if self._owner == me:
self._count += 1
return 1
rc = self._block.acquire(blocking, timeout)
if rc:
self._owner = me
self._count = 1
return rc
__enter__ = acquire
def release(self):
"""Release a lock, decrementing the recursion level.
If after the decrement it is zero, reset the lock to unlocked (not owned
by any thread), and if any other threads are blocked waiting for the
lock to become unlocked, allow exactly one of them to proceed. If after
the decrement the recursion level is still nonzero, the lock remains
locked and owned by the calling thread.
Only call this method when the calling thread owns the lock. A
RuntimeError is raised if this method is called when the lock is
unlocked.
There is no return value.
"""
if self._owner != get_ident():
raise RuntimeError("cannot release un-acquired lock")
self._count = count = self._count - 1
if not count:
self._owner = None
self._block.release()
def __exit__(self, t, v, tb):
self.release()
# Internal methods used by condition variables
def _acquire_restore(self, state):
self._block.acquire()
self._count, self._owner = state
def _release_save(self):
if self._count == 0:
raise RuntimeError("cannot release un-acquired lock")
count = self._count
self._count = 0
owner = self._owner
self._owner = None
self._block.release()
return (count, owner)
def _is_owned(self):
return self._owner == get_ident()
_PyRLock = _RLock
class Condition:
"""Class that implements a condition variable.
A condition variable allows one or more threads to wait until they are
notified by another thread.
If the lock argument is given and not None, it must be a Lock or RLock
object, and it is used as the underlying lock. Otherwise, a new RLock object
is created and used as the underlying lock.
"""
def __init__(self, lock=None):
if lock is None:
lock = RLock()
self._lock = lock
# Export the lock's acquire() and release() methods
self.acquire = lock.acquire
self.release = lock.release
# If the lock defines _release_save() and/or _acquire_restore(),
# these override the default implementations (which just call
# release() and acquire() on the lock). Ditto for _is_owned().
try:
self._release_save = lock._release_save
except AttributeError:
pass
try:
self._acquire_restore = lock._acquire_restore
except AttributeError:
pass
try:
self._is_owned = lock._is_owned
except AttributeError:
pass
self._waiters = _deque()
def __enter__(self):
return self._lock.__enter__()
def __exit__(self, *args):
return self._lock.__exit__(*args)
def __repr__(self):
return "<Condition(%s, %d)>" % (self._lock, len(self._waiters))
def _release_save(self):
self._lock.release() # No state to save
def _acquire_restore(self, x):
self._lock.acquire() # Ignore saved state
def _is_owned(self):
# Return True if lock is owned by current_thread.
# This method is called only if _lock doesn't have _is_owned().
if self._lock.acquire(0):
self._lock.release()
return False
else:
return True
def wait(self, timeout=None):
"""Wait until notified or until a timeout occurs.
If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.
This method releases the underlying lock, and then blocks until it is
awakened by a notify() or notify_all() call for the same condition
variable in another thread, or until the optional timeout occurs. Once
awakened or timed out, it re-acquires the lock and returns.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
When the underlying lock is an RLock, it is not released using its
release() method, since this may not actually unlock the lock when it
was acquired multiple times recursively. Instead, an internal interface
of the RLock class is used, which really unlocks it even when it has
been recursively acquired several times. Another internal interface is
then used to restore the recursion level when the lock is reacquired.
"""
if not self._is_owned():
raise RuntimeError("cannot wait on un-acquired lock")
waiter = _allocate_lock()
waiter.acquire()
self._waiters.append(waiter)
saved_state = self._release_save()
gotit = False
try: # restore state no matter what (e.g., KeyboardInterrupt)
if timeout is None:
waiter.acquire()
gotit = True
else:
if timeout > 0:
gotit = waiter.acquire(True, timeout)
else:
gotit = waiter.acquire(False)
return gotit
finally:
self._acquire_restore(saved_state)
if not gotit:
try:
self._waiters.remove(waiter)
except ValueError:
pass
def wait_for(self, predicate, timeout=None):
"""Wait until a condition evaluates to True.
predicate should be a callable whose result will be interpreted as a
boolean value. A timeout may be provided giving the maximum time to
wait.
"""
endtime = None
waittime = timeout
result = predicate()
while not result:
if waittime is not None:
if endtime is None:
endtime = _time() + waittime
else:
waittime = endtime - _time()
if waittime <= 0:
break
self.wait(waittime)
result = predicate()
return result
def notify(self, n=1):
"""Wake up one or more threads waiting on this condition, if any.
If the calling thread has not acquired the lock when this method is
called, a RuntimeError is raised.
This method wakes up at most n of the threads waiting for the condition
variable; it is a no-op if no threads are waiting.
"""
if not self._is_owned():
raise RuntimeError("cannot notify on un-acquired lock")
all_waiters = self._waiters
waiters_to_notify = _deque(_islice(all_waiters, n))
if not waiters_to_notify:
return
for waiter in waiters_to_notify:
waiter.release()
try:
all_waiters.remove(waiter)
except ValueError:
pass
def notify_all(self):
"""Wake up all threads waiting on this condition.
If the calling thread has not acquired the lock when this method
is called, a RuntimeError is raised.
"""
self.notify(len(self._waiters))
notifyAll = notify_all
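# Editor's note: illustrative sketch, not part of the module, of the
# usual produce/consume handshake around a Condition.
def _demo_condition():
    import threading
    cond = threading.Condition()
    items = []

    def consumer():
        with cond:
            while not items:  # re-check the predicate after each wakeup
                cond.wait()
            items.pop()

    t = threading.Thread(target=consumer)
    t.start()
    with cond:
        items.append("job")
        cond.notify()         # wake one waiting thread
    t.join()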
class Semaphore:
"""This class implements semaphore objects.
Semaphores manage a counter representing the number of release() calls minus
the number of acquire() calls, plus an initial value. The acquire() method
blocks if necessary until it can return without making the counter
negative. If not given, value defaults to 1.
"""
# After Tim Peters' semaphore class, but not quite the same (no maximum)
def __init__(self, value=1):
if value < 0:
raise ValueError("semaphore initial value must be >= 0")
self._cond = Condition(Lock())
self._value = value
def acquire(self, blocking=True, timeout=None):
"""Acquire a semaphore, decrementing the internal counter by one.
When invoked without arguments: if the internal counter is larger than
zero on entry, decrement it by one and return immediately. If it is zero
on entry, block, waiting until some other thread has called release() to
make it larger than zero. This is done with proper interlocking so that
if multiple acquire() calls are blocked, release() will wake exactly one
of them up. The implementation may pick one at random, so the order in
which blocked threads are awakened should not be relied on. There is no
return value in this case.
When invoked with blocking set to true, do the same thing as when called
without arguments, and return true.
When invoked with blocking set to false, do not block. If a call without
an argument would block, return false immediately; otherwise, do the
same thing as when called without arguments, and return true.
When invoked with a timeout other than None, it will block for at
most timeout seconds. If acquire does not complete successfully in
that interval, return false. Return true otherwise.
"""
if not blocking and timeout is not None:
raise ValueError("can't specify timeout for non-blocking acquire")
rc = False
endtime = None
with self._cond:
while self._value == 0:
if not blocking:
break
if timeout is not None:
if endtime is None:
endtime = _time() + timeout
else:
timeout = endtime - _time()
if timeout <= 0:
break
self._cond.wait(timeout)
else:
self._value -= 1
rc = True
return rc
__enter__ = acquire
def release(self):
"""Release a semaphore, incrementing the internal counter by one.
When the counter is zero on entry and another thread is waiting for it
to become larger than zero again, wake up that thread.
"""
with self._cond:
self._value += 1
self._cond.notify()
def __exit__(self, t, v, tb):
self.release()
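# Editor's note: illustrative sketch, not part of the module. The
# counter bounds concurrency: here at most two threads hold the
# semaphore at any moment.
def _demo_semaphore():
    import threading
    sem = threading.Semaphore(2)

    def worker():
        with sem:  # blocks while two other threads hold the semaphore
            pass   # ... use the limited resource here ...

    threads = [threading.Thread(target=worker) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()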
class BoundedSemaphore(Semaphore):
"""Implements a bounded semaphore.
A bounded semaphore checks to make sure its current value doesn't exceed its
initial value. If it does, ValueError is raised. In most situations
semaphores are used to guard resources with limited capacity.
If the semaphore is released too many times it's a sign of a bug. If not
given, value defaults to 1.
Like regular semaphores, bounded semaphores manage a counter representing
the number of release() calls minus the number of acquire() calls, plus an
initial value. The acquire() method blocks if necessary until it can return
without making the counter negative.
"""
def __init__(self, value=1):
Semaphore.__init__(self, value)
self._initial_value = value
def release(self):
"""Release a semaphore, incrementing the internal counter by one.
When the counter is zero on entry and another thread is waiting for it
to become larger than zero again, wake up that thread.
If the number of releases exceeds the number of acquires,
raise a ValueError.
"""
with self._cond:
if self._value >= self._initial_value:
raise ValueError("Semaphore released too many times")
self._value += 1
self._cond.notify()
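# Editor's note: illustrative sketch, not part of the module. The
# bounded variant turns an unmatched release() into an error instead of
# silently growing the counter.
def _demo_bounded_semaphore():
    import threading
    sem = threading.BoundedSemaphore(1)
    sem.acquire()
    sem.release()      # counter back at its initial value
    try:
        sem.release()  # one release too many
    except ValueError:
        pass           # "Semaphore released too many times"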
class Event:
"""Class implementing event objects.
Events manage a flag that can be set to true with the set() method and reset
to false with the clear() method. The wait() method blocks until the flag is
true. The flag is initially false.
"""
# After Tim Peters' event class (without is_posted())
def __init__(self):
self._cond = Condition(Lock())
self._flag = False
def _reset_internal_locks(self):
# private! called by Thread._reset_internal_locks by _after_fork()
self._cond.__init__(Lock())
def is_set(self):
"""Return true if and only if the internal flag is true."""
return self._flag
isSet = is_set
def set(self):
"""Set the internal flag to true.
All threads waiting for it to become true are awakened. Threads
that call wait() once the flag is true will not block at all.
"""
with self._cond:
self._flag = True
self._cond.notify_all()
def clear(self):
"""Reset the internal flag to false.
Subsequently, threads calling wait() will block until set() is called to
set the internal flag to true again.
"""
with self._cond:
self._flag = False
def wait(self, timeout=None):
"""Block until the internal flag is true.
If the internal flag is true on entry, return immediately. Otherwise,
block until another thread calls set() to set the flag to true, or until
the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
This method returns the internal flag on exit, so it will always return
True except if a timeout is given and the operation times out.
"""
with self._cond:
signaled = self._flag
if not signaled:
signaled = self._cond.wait(timeout)
return signaled
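# Editor's note: illustrative sketch, not part of the module. wait()
# returns the flag itself, so a False result means the timeout elapsed
# before set() was called.
def _demo_event():
    import threading
    ready = threading.Event()

    def waiter():
        if ready.wait(timeout=5.0):  # True once the flag is set
            pass                     # proceed with dependent work

    t = threading.Thread(target=waiter)
    t.start()
    ready.set()  # releases the waiter (immediately, in this sketch)
    t.join()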
# A barrier class. Inspired in part by the pthread_barrier_* api and
# the CyclicBarrier class from Java. See
# http://sourceware.org/pthreads-win32/manual/pthread_barrier_init.html and
# http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/
# CyclicBarrier.html
# for information.
# We maintain two main states, 'filling' and 'draining' enabling the barrier
# to be cyclic. Threads are not allowed into it until it has fully drained
# since the previous cycle. In addition, a 'resetting' state exists which is
# similar to 'draining' except that threads leave with a BrokenBarrierError,
# and a 'broken' state in which all threads get the exception.
class Barrier:
"""Implements a Barrier.
Useful for synchronizing a fixed number of threads at known synchronization
points. Threads block on 'wait()' and are simultaneously awoken once
they have all made that call.
"""
def __init__(self, parties, action=None, timeout=None):
"""Create a barrier, initialised to 'parties' threads.
'action' is a callable which, when supplied, will be called by one of
the threads after they have all entered the barrier and just prior to
releasing them all. If a 'timeout' is provided, it is used as the
default for all subsequent 'wait()' calls.
"""
self._cond = Condition(Lock())
self._action = action
self._timeout = timeout
self._parties = parties
self._state = 0  # 0 filling, 1 draining, -1 resetting, -2 broken
self._count = 0
def wait(self, timeout=None):
"""Wait for the barrier.
When the specified number of threads have started waiting, they are all
simultaneously awoken. If an 'action' was provided for the barrier, one
of the threads will have executed that callback prior to returning.
Returns an individual index number from 0 to 'parties-1'.
"""
if timeout is None:
timeout = self._timeout
with self._cond:
self._enter() # Block while the barrier drains.
index = self._count
self._count += 1
try:
if index + 1 == self._parties:
# We release the barrier
self._release()
else:
# We wait until someone releases us
self._wait(timeout)
return index
finally:
self._count -= 1
# Wake up any threads waiting for barrier to drain.
self._exit()
# Block until the barrier is ready for us, or raise an exception
# if it is broken.
def _enter(self):
while self._state in (-1, 1):
# It is draining or resetting, wait until done
self._cond.wait()
#see if the barrier is in a broken state
if self._state < 0:
raise BrokenBarrierError
assert self._state == 0
# Optionally run the 'action' and release the threads waiting
# in the barrier.
def _release(self):
try:
if self._action:
self._action()
# enter draining state
self._state = 1
self._cond.notify_all()
except:
#an exception during the _action handler. Break and reraise
self._break()
raise
# Wait in the barrier until we are released. Raise an exception
# if the barrier is reset or broken.
def _wait(self, timeout):
if not self._cond.wait_for(lambda : self._state != 0, timeout):
#timed out. Break the barrier
self._break()
raise BrokenBarrierError
if self._state < 0:
raise BrokenBarrierError
assert self._state == 1
# If we are the last thread to exit the barrier, signal any threads
# waiting for the barrier to drain.
def _exit(self):
if self._count == 0:
if self._state in (-1, 1):
#resetting or draining
self._state = 0
self._cond.notify_all()
def reset(self):
"""Reset the barrier to the initial state.
Any threads currently waiting will get the BrokenBarrier exception
raised.
"""
with self._cond:
if self._count > 0:
if self._state == 0:
#reset the barrier, waking up threads
self._state = -1
elif self._state == -2:
#was broken, set it to reset state
#which clears when the last thread exits
self._state = -1
else:
self._state = 0
self._cond.notify_all()
def abort(self):
"""Place the barrier into a 'broken' state.
Useful in case of error. Any currently waiting threads and threads
attempting to 'wait()' will have BrokenBarrierError raised.
"""
with self._cond:
self._break()
def _break(self):
# An internal error was detected. The barrier is set to
# a broken state, and all parties are awakened.
self._state = -2
self._cond.notify_all()
@property
def parties(self):
"""Return the number of threads required to trip the barrier."""
return self._parties
@property
def n_waiting(self):
"""Return the number of threads currently waiting at the barrier."""
# We don't need synchronization here since this is an ephemeral result
# anyway. It returns the correct value in the steady state.
if self._state == 0:
return self._count
return 0
@property
def broken(self):
"""Return True if the barrier is in a broken state."""
return self._state == -2
# exception raised by the Barrier class
class BrokenBarrierError(RuntimeError):
pass
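# Editor's note: illustrative sketch, not part of the module. All
# parties block in wait() until the last one arrives; the optional
# action runs once, in exactly one thread, before any are released.
def _demo_barrier():
    import threading
    barrier = threading.Barrier(3, action=lambda: print("all arrived"))

    def party():
        barrier.wait()  # returns a distinct index in range(3)

    threads = [threading.Thread(target=party) for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()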
# Helper to generate new thread names
_counter = _count().__next__
_counter() # Consume 0 so first non-main thread has id 1.
def _newname(template="Thread-%d"):
return template % _counter()
# Active thread administration
_active_limbo_lock = _allocate_lock()
_active = {} # maps thread id to Thread object
_limbo = {}
_dangling = WeakSet()
# Main class for threads
class Thread:
"""A class that represents a thread of control.
This class can be safely subclassed in a limited fashion. There are two ways
to specify the activity: by passing a callable object to the constructor, or
by overriding the run() method in a subclass.
"""
_initialized = False
# Need to store a reference to sys.exc_info for printing
# out exceptions when a thread tries to use a global var. during interp.
# shutdown and thus raises an exception about trying to perform some
# operation on/with a NoneType
_exc_info = _sys.exc_info
# Keep sys.exc_clear too to clear the exception just before
# allowing .join() to return.
#XXX __exc_clear = _sys.exc_clear
def __init__(self, group=None, target=None, name=None,
args=(), kwargs=None, *, daemon=None):
"""This constructor should always be called with keyword arguments. Arguments are:
*group* should be None; reserved for future extension when a ThreadGroup
class is implemented.
*target* is the callable object to be invoked by the run()
method. Defaults to None, meaning nothing is called.
*name* is the thread name. By default, a unique name is constructed of
the form "Thread-N" where N is a small decimal number.
*args* is the argument tuple for the target invocation. Defaults to ().
*kwargs* is a dictionary of keyword arguments for the target
invocation. Defaults to {}.
If a subclass overrides the constructor, it must make sure to invoke
the base class constructor (Thread.__init__()) before doing anything
else to the thread.
"""
assert group is None, "group argument must be None for now"
if kwargs is None:
kwargs = {}
self._target = target
self._name = str(name or _newname())
self._args = args
self._kwargs = kwargs
if daemon is not None:
self._daemonic = daemon
else:
self._daemonic = current_thread().daemon
self._ident = None
self._tstate_lock = None
self._started = Event()
self._is_stopped = False
self._initialized = True
# sys.stderr is not stored in the class like
# sys.exc_info since it can be changed between instances
self._stderr = _sys.stderr
# For debugging and _after_fork()
_dangling.add(self)
def _reset_internal_locks(self, is_alive):
# private! Called by _after_fork() to reset our internal locks as
# they may be in an invalid state leading to a deadlock or crash.
self._started._reset_internal_locks()
if is_alive:
self._set_tstate_lock()
else:
# The thread isn't alive after fork: it doesn't have a tstate
# anymore.
self._is_stopped = True
self._tstate_lock = None
def __repr__(self):
assert self._initialized, "Thread.__init__() was not called"
status = "initial"
if self._started.is_set():
status = "started"
self.is_alive() # easy way to get ._is_stopped set when appropriate
if self._is_stopped:
status = "stopped"
if self._daemonic:
status += " daemon"
if self._ident is not None:
status += " %s" % self._ident
return "<%s(%s, %s)>" % (self.__class__.__name__, self._name, status)
def start(self):
"""Start the thread's activity.
It must be called at most once per thread object. It arranges for the
object's run() method to be invoked in a separate thread of control.
This method will raise a RuntimeError if called more than once on the
same thread object.
"""
if not self._initialized:
raise RuntimeError("thread.__init__() not called")
if self._started.is_set():
raise RuntimeError("threads can only be started once")
with _active_limbo_lock:
_limbo[self] = self
try:
_start_new_thread(self._bootstrap, ())
except Exception:
with _active_limbo_lock:
del _limbo[self]
raise
self._started.wait()
def run(self):
"""Method representing the thread's activity.
You may override this method in a subclass. The standard run() method
invokes the callable object passed to the object's constructor as the
target argument, if any, with sequential and keyword arguments taken
from the args and kwargs arguments, respectively.
"""
try:
if self._target:
self._target(*self._args, **self._kwargs)
finally:
# Avoid a refcycle if the thread is running a function with
# an argument that has a member that points to the thread.
del self._target, self._args, self._kwargs
def _bootstrap(self):
# Wrapper around the real bootstrap code that ignores
# exceptions during interpreter cleanup. Those typically
# happen when a daemon thread wakes up at an unfortunate
# moment, finds the world around it destroyed, and raises some
# random exception *** while trying to report the exception in
# _bootstrap_inner() below ***. Those random exceptions
# don't help anybody, and they confuse users, so we suppress
# them. We suppress them only when it appears that the world
# indeed has already been destroyed, so that exceptions in
# _bootstrap_inner() during normal business hours are properly
# reported. Also, we only suppress them for daemonic threads;
# if a non-daemonic encounters this, something else is wrong.
try:
self._bootstrap_inner()
except:
if self._daemonic and _sys is None:
return
raise
def _set_ident(self):
self._ident = get_ident()
def _set_tstate_lock(self):
"""
Set a lock object which will be released by the interpreter when
the underlying thread state (see pystate.h) gets deleted.
"""
self._tstate_lock = _set_sentinel()
self._tstate_lock.acquire()
def _bootstrap_inner(self):
try:
self._set_ident()
self._set_tstate_lock()
self._started.set()
with _active_limbo_lock:
_active[self._ident] = self
del _limbo[self]
if _trace_hook:
_sys.settrace(_trace_hook)
if _profile_hook:
_sys.setprofile(_profile_hook)
try:
self.run()
except SystemExit:
pass
except:
# If sys.stderr is no more (most likely from interpreter
# shutdown) use self._stderr. Otherwise still use sys (as in
# _sys) in case sys.stderr was redefined since the creation of
# self.
if _sys and _sys.stderr is not None:
print("Exception in thread %s:\n%s" %
(self.name, _format_exc()), file=self._stderr)
elif self._stderr is not None:
# Do the best job possible w/o a huge amt. of code to
# approximate a traceback (code ideas from
# Lib/traceback.py)
exc_type, exc_value, exc_tb = self._exc_info()
try:
print((
"Exception in thread " + self.name +
" (most likely raised during interpreter shutdown):"), file=self._stderr)
print((
"Traceback (most recent call last):"), file=self._stderr)
while exc_tb:
print((
' File "%s", line %s, in %s' %
(exc_tb.tb_frame.f_code.co_filename,
exc_tb.tb_lineno,
exc_tb.tb_frame.f_code.co_name)), file=self._stderr)
exc_tb = exc_tb.tb_next
print(("%s: %s" % (exc_type, exc_value)), file=self._stderr)
# Make sure that exc_tb gets deleted since it is a memory
# hog; deleting everything else is just for thoroughness
finally:
del exc_type, exc_value, exc_tb
finally:
# Prevent a race in
# test_threading.test_no_refcycle_through_target when
# the exception keeps the target alive past when we
# assert that it's dead.
#XXX self._exc_clear()
pass
finally:
with _active_limbo_lock:
try:
# We don't call self._delete() because it also
# grabs _active_limbo_lock.
del _active[get_ident()]
except:
pass
def _stop(self):
# After calling ._stop(), .is_alive() returns False and .join() returns
# immediately. ._tstate_lock must be released before calling ._stop().
#
# Normal case: C code at the end of the thread's life
# (release_sentinel in _threadmodule.c) releases ._tstate_lock, and
# that's detected by our ._wait_for_tstate_lock(), called by .join()
# and .is_alive(). Any number of threads _may_ call ._stop()
# simultaneously (for example, if multiple threads are blocked in
# .join() calls), and they're not serialized. That's harmless -
# they'll just make redundant rebindings of ._is_stopped and
# ._tstate_lock. Obscure: we rebind ._tstate_lock last so that the
# "assert self._is_stopped" in ._wait_for_tstate_lock() always works
# (the assert is executed only if ._tstate_lock is None).
#
# Special case: _main_thread releases ._tstate_lock via this
# module's _shutdown() function.
lock = self._tstate_lock
if lock is not None:
assert not lock.locked()
self._is_stopped = True
self._tstate_lock = None
def _delete(self):
"Remove current thread from the dict of currently running threads."
# Notes about running with _dummy_thread:
#
# Must take care to not raise an exception if _dummy_thread is being
# used (and thus this module is being used as an instance of
# dummy_threading). _dummy_thread.get_ident() always returns -1 since
# there is only one thread if _dummy_thread is being used. Thus
# len(_active) is always <= 1 here, and any Thread instance created
# overwrites the (if any) thread currently registered in _active.
#
# An instance of _MainThread is always created by 'threading'. This
# gets overwritten the instant an instance of Thread is created; both
# threads return -1 from _dummy_thread.get_ident() and thus have the
# same key in the dict. So when the _MainThread instance created by
# 'threading' tries to clean itself up when atexit calls this method
# it gets a KeyError if another Thread instance was created.
#
# This all means that KeyError from trying to delete something from
# _active if dummy_threading is being used is a red herring. But
# since it isn't if dummy_threading is *not* being used then don't
# hide the exception.
try:
with _active_limbo_lock:
del _active[get_ident()]
# There must not be any python code between the previous line
# and after the lock is released. Otherwise a tracing function
# could try to acquire the lock again in the same thread, (in
# current_thread()), and would block.
except KeyError:
if 'dummy_threading' not in _sys.modules:
raise
def join(self, timeout=None):
"""Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is
called terminates -- either normally or through an unhandled exception
or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof). As join() always returns None, you must call
is_alive() after join() to decide whether a timeout happened -- if the
thread is still alive, the join() call timed out.
When the timeout argument is not present or None, the operation will
block until the thread terminates.
A thread can be join()ed many times.
join() raises a RuntimeError if an attempt is made to join the current
thread as that would cause a deadlock. It is also an error to join() a
thread before it has been started and attempts to do so raises the same
exception.
"""
if not self._initialized:
raise RuntimeError("Thread.__init__() not called")
if not self._started.is_set():
raise RuntimeError("cannot join thread before it is started")
if self is current_thread():
raise RuntimeError("cannot join current thread")
if timeout is None:
self._wait_for_tstate_lock()
else:
# the behavior of a negative timeout isn't documented, but
# historically .join(timeout=x) for x<0 has acted as if timeout=0
self._wait_for_tstate_lock(timeout=max(timeout, 0))
def _wait_for_tstate_lock(self, block=True, timeout=-1):
# Issue #18808: wait for the thread state to be gone.
# At the end of the thread's life, after all knowledge of the thread
# is removed from C data structures, C code releases our _tstate_lock.
# This method passes its arguments to _tstate_lock.acquire().
# If the lock is acquired, the C code is done, and self._stop() is
# called. That sets ._is_stopped to True, and ._tstate_lock to None.
lock = self._tstate_lock
if lock is None: # already determined that the C code is done
assert self._is_stopped
elif lock.acquire(block, timeout):
lock.release()
self._stop()
@property
def name(self):
"""A string used for identification purposes only.
It has no semantics. Multiple threads may be given the same name. The
initial name is set by the constructor.
"""
assert self._initialized, "Thread.__init__() not called"
return self._name
@name.setter
def name(self, name):
assert self._initialized, "Thread.__init__() not called"
self._name = str(name)
@property
def ident(self):
"""Thread identifier of this thread or None if it has not been started.
This is a nonzero integer. See the thread.get_ident() function. Thread
identifiers may be recycled when a thread exits and another thread is
created. The identifier is available even after the thread has exited.
"""
assert self._initialized, "Thread.__init__() not called"
return self._ident
def is_alive(self):
"""Return whether the thread is alive.
This method returns True just before the run() method starts until just
after the run() method terminates. The module function enumerate()
returns a list of all alive threads.
"""
assert self._initialized, "Thread.__init__() not called"
if self._is_stopped or not self._started.is_set():
return False
self._wait_for_tstate_lock(False)
return not self._is_stopped
isAlive = is_alive
@property
def daemon(self):
"""A boolean value indicating whether this thread is a daemon thread.
This must be set before start() is called, otherwise RuntimeError is
raised. Its initial value is inherited from the creating thread; the
main thread is not a daemon thread and therefore all threads created in
the main thread default to daemon = False.
The entire Python program exits when no alive non-daemon threads are
left.
"""
assert self._initialized, "Thread.__init__() not called"
return self._daemonic
@daemon.setter
def daemon(self, daemonic):
if not self._initialized:
raise RuntimeError("Thread.__init__() not called")
if self._started.is_set():
raise RuntimeError("cannot set daemon status of active thread")
self._daemonic = daemonic
def isDaemon(self):
return self.daemon
def setDaemon(self, daemonic):
self.daemon = daemonic
def getName(self):
return self.name
def setName(self, name):
self.name = name
# The timer class was contributed by Itamar Shtull-Trauring
class Timer(Thread):
"""Call a function after a specified number of seconds:
t = Timer(30.0, f, args=None, kwargs=None)
t.start()
t.cancel() # stop the timer's action if it's still waiting
"""
def __init__(self, interval, function, args=None, kwargs=None):
Thread.__init__(self)
self.interval = interval
self.function = function
self.args = args if args is not None else []
self.kwargs = kwargs if kwargs is not None else {}
self.finished = Event()
def cancel(self):
"""Stop the timer if it hasn't finished yet."""
self.finished.set()
def run(self):
self.finished.wait(self.interval)
if not self.finished.is_set():
self.function(*self.args, **self.kwargs)
self.finished.set()
# Special thread class to represent the main thread
# This is garbage collected through an exit handler
class _MainThread(Thread):
def __init__(self):
Thread.__init__(self, name="MainThread", daemon=False)
self._set_tstate_lock()
self._started.set()
self._set_ident()
with _active_limbo_lock:
_active[self._ident] = self
# Dummy thread class to represent threads not started here.
# These aren't garbage collected when they die, nor can they be waited for.
# If they invoke anything in threading.py that calls current_thread(), they
# leave an entry in the _active dict forever after.
# Their purpose is to return *something* from current_thread().
# They are marked as daemon threads so we won't wait for them
# when we exit (conforming to previous semantics).
class _DummyThread(Thread):
def __init__(self):
Thread.__init__(self, name=_newname("Dummy-%d"), daemon=True)
self._started.set()
self._set_ident()
with _active_limbo_lock:
_active[self._ident] = self
def _stop(self):
pass
def join(self, timeout=None):
assert False, "cannot join a dummy thread"
# Global API functions
def current_thread():
"""Return the current Thread object, corresponding to the caller's thread of control.
If the caller's thread of control was not created through the threading
module, a dummy thread object with limited functionality is returned.
"""
try:
return _active[get_ident()]
except KeyError:
return _DummyThread()
currentThread = current_thread
def active_count():
"""Return the number of Thread objects currently alive.
The returned count is equal to the length of the list returned by
enumerate().
"""
with _active_limbo_lock:
return len(_active) + len(_limbo)
activeCount = active_count
def _enumerate():
# Same as enumerate(), but without the lock. Internal use only.
return list(_active.values()) + list(_limbo.values())
def enumerate():
"""Return a list of all Thread objects currently alive.
The list includes daemonic threads, dummy thread objects created by
current_thread(), and the main thread. It excludes terminated threads and
threads that have not yet been started.
"""
with _active_limbo_lock:
return list(_active.values()) + list(_limbo.values())
from _thread import stack_size
# Create the main thread object,
# and make it available for the interpreter
# (Py_Main) as threading._shutdown.
_main_thread = _MainThread()
def _shutdown():
# Obscure: other threads may be waiting to join _main_thread. That's
# dubious, but some code does it. We can't wait for C code to release
# the main thread's tstate_lock - that won't happen until the interpreter
# is nearly dead. So we release it here. Note that just calling _stop()
# isn't enough: other threads may already be waiting on _tstate_lock.
tlock = _main_thread._tstate_lock
# The main thread isn't finished yet, so its thread state lock can't have
# been released.
assert tlock is not None
assert tlock.locked()
tlock.release()
_main_thread._stop()
t = _pickSomeNonDaemonThread()
while t:
t.join()
t = _pickSomeNonDaemonThread()
_main_thread._delete()
def _pickSomeNonDaemonThread():
for t in enumerate():
if not t.daemon and t.is_alive():
return t
return None
def main_thread():
"""Return the main thread object.
In normal conditions, the main thread is the thread from which the
Python interpreter was started.
"""
return _main_thread
# get thread-local implementation, either from the thread
# module, or from the python fallback
try:
from _thread import _local as local
except ImportError:
from _threading_local import local
def _after_fork():
# This function is called by Python/ceval.c:PyEval_ReInitThreads which
# is called from PyOS_AfterFork. Here we cleanup threading module state
# that should not exist after a fork.
# Reset _active_limbo_lock, in case we forked while the lock was held
# by another (non-forked) thread. http://bugs.python.org/issue874900
global _active_limbo_lock, _main_thread
_active_limbo_lock = _allocate_lock()
# fork() only copied the current thread; clear references to others.
new_active = {}
current = current_thread()
_main_thread = current
with _active_limbo_lock:
# Dangling thread instances must still have their locks reset,
# because someone may join() them.
threads = set(_enumerate())
threads.update(_dangling)
for thread in threads:
# Any lock/condition variable may be currently locked or in an
# invalid state, so we reinitialize them.
if thread is current:
# There is only one active thread. We reset the ident to
# its new value since it can have changed.
thread._reset_internal_locks(True)
ident = get_ident()
thread._ident = ident
new_active[ident] = thread
else:
# All the others are already stopped.
thread._reset_internal_locks(False)
thread._stop()
_limbo.clear()
_active.clear()
_active.update(new_active)
assert len(_active) == 1
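# --- Illustrative usage sketch (added; not part of the module above) ---
# A minimal, hedged example of the Thread API documented above. join()
# always returns None, so is_alive() is how a caller detects that the
# timeout elapsed. The worker function and its timings are hypothetical.
import time as _demo_time

def _demo_worker():
    _demo_time.sleep(0.2)  # stand-in for real work

_demo_t = Thread(target=_demo_worker, name="worker-1")
_demo_t.daemon = False        # daemon status must be set before start()
_demo_t.start()
_demo_t.join(timeout=0.05)    # may time out; join() itself returns None
if _demo_t.is_alive():        # still running => the join timed out
    _demo_t.join()            # now block until the thread finishes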
#! /usr/bin/env python3
"""Tool for measuring execution time of small code snippets.
This module avoids a number of common traps for measuring execution
times. See also Tim Peters' introduction to the Algorithms chapter in
the Python Cookbook, published by O'Reilly.
Library usage: see the Timer class.
Command line usage:
python timeit.py [-n N] [-r N] [-s S] [-t] [-c] [-p] [-v] [-h] [--] [statement]
Options:
-n/--number N: how many times to execute 'statement' (default: see below)
-r/--repeat N: how many times to repeat the timer (default 3)
-s/--setup S: statement to be executed once initially (default 'pass').
Execution time of this setup statement is NOT timed.
-p/--process: use time.process_time() (default is time.perf_counter())
-t/--time: use time.time() (deprecated)
-c/--clock: use time.clock() (deprecated)
-v/--verbose: print raw timing results; repeat for more digits precision
-h/--help: print this usage message and exit
--: separate options from statement, use when statement starts with -
statement: statement to be timed (default 'pass')
A multi-line statement may be given by specifying each line as a
separate argument; indented lines are possible by enclosing an
argument in quotes and using leading spaces. Multiple -s options are
treated similarly.
If -n is not given, a suitable number of loops is calculated by trying
successive powers of 10 until the total time is at least 0.2 seconds.
Note: there is a certain baseline overhead associated with executing a
pass statement. It differs between versions. The code here doesn't try
to hide it, but you should be aware of it. The baseline overhead can be
measured by invoking the program without arguments.
Classes:
Timer
Functions:
timeit(string, string) -> float
repeat(string, string) -> list
default_timer() -> float
"""
import gc
import sys
import time
import itertools
__all__ = ["Timer", "timeit", "repeat", "default_timer"]
dummy_src_name = "<timeit-src>"
default_number = 1000000
default_repeat = 3
default_timer = time.perf_counter
# Don't change the indentation of the template; the reindent() calls
# in Timer.__init__() depend on setup being indented 4 spaces and stmt
# being indented 8 spaces.
template = """
def inner(_it, _timer{init}):
{setup}
_t0 = _timer()
for _i in _it:
{stmt}
_t1 = _timer()
return _t1 - _t0
"""
def reindent(src, indent):
"""Helper to reindent a multi-line statement."""
return src.replace("\n", "\n" + " "*indent)
def _template_func(setup, func):
"""Create a timer function. Used if the "statement" is a callable."""
def inner(_it, _timer, _func=func):
setup()
_t0 = _timer()
for _i in _it:
_func()
_t1 = _timer()
return _t1 - _t0
return inner
class Timer:
"""Class for timing execution speed of small code snippets.
The constructor takes a statement to be timed, an additional
statement used for setup, and a timer function. Both statements
default to 'pass'; the timer function is platform-dependent (see
module doc string).
To measure the execution time of the first statement, use the
timeit() method. The repeat() method is a convenience to call
timeit() multiple times and return a list of results.
The statements may contain newlines, as long as they don't contain
multi-line string literals.
"""
def __init__(self, stmt="pass", setup="pass", timer=default_timer):
"""Constructor. See class doc string."""
self.timer = timer
ns = {}
if isinstance(stmt, str):
# Check that the code can be compiled outside a function
if isinstance(setup, str):
compile(setup, dummy_src_name, "exec")
compile(setup + '\n' + stmt, dummy_src_name, "exec")
else:
compile(stmt, dummy_src_name, "exec")
stmt = reindent(stmt, 8)
if isinstance(setup, str):
setup = reindent(setup, 4)
src = template.format(stmt=stmt, setup=setup, init='')
elif callable(setup):
src = template.format(stmt=stmt, setup='_setup()',
init=', _setup=_setup')
ns['_setup'] = setup
else:
raise ValueError("setup is neither a string nor callable")
self.src = src # Save for traceback display
code = compile(src, dummy_src_name, "exec")
exec(code, globals(), ns)
self.inner = ns["inner"]
elif callable(stmt):
self.src = None
if isinstance(setup, str):
_setup = setup
def setup():
exec(_setup, globals(), ns)
elif not callable(setup):
raise ValueError("setup is neither a string nor callable")
self.inner = _template_func(setup, stmt)
else:
raise ValueError("stmt is neither a string nor callable")
def print_exc(self, file=None):
"""Helper to print a traceback from the timed code.
Typical use:
t = Timer(...) # outside the try/except
try:
t.timeit(...) # or t.repeat(...)
except:
t.print_exc()
The advantage over the standard traceback is that source lines
in the compiled template will be displayed.
The optional file argument directs where the traceback is
sent; it defaults to sys.stderr.
"""
import linecache, traceback
if self.src is not None:
linecache.cache[dummy_src_name] = (len(self.src),
None,
self.src.split("\n"),
dummy_src_name)
# else the source is already stored somewhere else
traceback.print_exc(file=file)
def timeit(self, number=default_number):
"""Time 'number' executions of the main statement.
To be precise, this executes the setup statement once, and
then returns the time it takes to execute the main statement
a number of times, as a float measured in seconds. The
argument is the number of times through the loop, defaulting
to one million. The main statement, the setup statement and
the timer function to be used are passed to the constructor.
"""
it = itertools.repeat(None, number)
gcold = gc.isenabled()
gc.disable()
try:
timing = self.inner(it, self.timer)
finally:
if gcold:
gc.enable()
return timing
def repeat(self, repeat=default_repeat, number=default_number):
"""Call timeit() a few times.
This is a convenience function that calls timeit() repeatedly,
returning a list of results. The first argument specifies how many
times to call timeit(), defaulting to 3; the second argument specifies
the 'number' argument passed to timeit(), defaulting to one million.
Note: it's tempting to calculate mean and standard deviation
from the result vector and report these. However, this is not
very useful. In a typical case, the lowest value gives a
lower bound for how fast your machine can run the given code
snippet; higher values in the result vector are typically not
caused by variability in Python's speed, but by other
processes interfering with your timing accuracy. So the min()
of the result is probably the only number you should be
interested in. After that, you should look at the entire
vector and apply common sense rather than statistics.
"""
r = []
for i in range(repeat):
t = self.timeit(number)
r.append(t)
return r
def timeit(stmt="pass", setup="pass", timer=default_timer,
number=default_number):
"""Convenience function to create Timer object and call timeit method."""
return Timer(stmt, setup, timer).timeit(number)
def repeat(stmt="pass", setup="pass", timer=default_timer,
repeat=default_repeat, number=default_number):
"""Convenience function to create Timer object and call repeat method."""
return Timer(stmt, setup, timer).repeat(repeat, number)
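# --- Illustrative usage sketch (added; not part of the module above) ---
# A small example of the convenience functions defined above. Per the
# advice in Timer.repeat(), only min() of the repeat() vector is reported;
# the statement and setup strings are arbitrary placeholders.
_demo_total = timeit("sorted(xs)", setup="xs = list(range(100))",
                     number=10000)
_demo_runs = repeat("sorted(xs)", setup="xs = list(range(100))",
                    repeat=3, number=10000)
print("10000 loops: %.6f sec; best of 3: %.6f sec"
      % (_demo_total, min(_demo_runs)))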
def main(args=None, *, _wrap_timer=None):
"""Main program, used when run as a script.
The optional 'args' argument specifies the command line to be parsed,
defaulting to sys.argv[1:].
The return value is an exit code to be passed to sys.exit(); it
may be None to indicate success.
When an exception happens during timing, a traceback is printed to
stderr and the return value is 1. Exceptions at other times
(including the template compilation) are not caught.
'_wrap_timer' is an internal interface used for unit testing. If it
is not None, it must be a callable that accepts a timer function
and returns another timer function (used for unit testing).
"""
if args is None:
args = sys.argv[1:]
import getopt
try:
opts, args = getopt.getopt(args, "n:s:r:tcpvh",
["number=", "setup=", "repeat=",
"time", "clock", "process",
"verbose", "help"])
except getopt.error as err:
print(err)
print("use -h/--help for command line help")
return 2
timer = default_timer
stmt = "\n".join(args) or "pass"
number = 0 # auto-determine
setup = []
repeat = default_repeat
verbose = 0
precision = 3
for o, a in opts:
if o in ("-n", "--number"):
number = int(a)
if o in ("-s", "--setup"):
setup.append(a)
if o in ("-r", "--repeat"):
repeat = int(a)
if repeat <= 0:
repeat = 1
if o in ("-t", "--time"):
timer = time.time
if o in ("-c", "--clock"):
timer = time.clock
if o in ("-p", "--process"):
timer = time.process_time
if o in ("-v", "--verbose"):
if verbose:
precision += 1
verbose += 1
if o in ("-h", "--help"):
print(__doc__, end=' ')
return 0
setup = "\n".join(setup) or "pass"
# Include the current directory, so that local imports work (sys.path
# contains the directory of this script, rather than the current
# directory)
import os
sys.path.insert(0, os.curdir)
if _wrap_timer is not None:
timer = _wrap_timer(timer)
t = Timer(stmt, setup, timer)
if number == 0:
# determine number so that 0.2 <= total time < 2.0
for i in range(1, 10):
number = 10**i
try:
x = t.timeit(number)
except:
t.print_exc()
return 1
if verbose:
print("%d loops -> %.*g secs" % (number, precision, x))
if x >= 0.2:
break
try:
r = t.repeat(repeat, number)
except:
t.print_exc()
return 1
best = min(r)
if verbose:
print("raw times:", " ".join(["%.*g" % (precision, x) for x in r]))
print("%d loops," % number, end=' ')
usec = best * 1e6 / number
if usec < 1000:
print("best of %d: %.*g usec per loop" % (repeat, precision, usec))
else:
msec = usec / 1000
if msec < 1000:
print("best of %d: %.*g msec per loop" % (repeat, precision, msec))
else:
sec = msec / 1000
print("best of %d: %.*g sec per loop" % (repeat, precision, sec))
return None
if __name__ == "__main__":
sys.exit(main())
"""Token constants (from "token.h")."""
__all__ = ['tok_name', 'ISTERMINAL', 'ISNONTERMINAL', 'ISEOF']
# This file is automatically generated; please don't muck it up!
#
# To update the symbols in this file, 'cd' to the top directory of
# the python source tree after building the interpreter and run:
#
# ./python Lib/token.py
#--start constants--
ENDMARKER = 0
NAME = 1
NUMBER = 2
STRING = 3
NEWLINE = 4
INDENT = 5
DEDENT = 6
LPAR = 7
RPAR = 8
LSQB = 9
RSQB = 10
COLON = 11
COMMA = 12
SEMI = 13
PLUS = 14
MINUS = 15
STAR = 16
SLASH = 17
VBAR = 18
AMPER = 19
LESS = 20
GREATER = 21
EQUAL = 22
DOT = 23
PERCENT = 24
LBRACE = 25
RBRACE = 26
EQEQUAL = 27
NOTEQUAL = 28
LESSEQUAL = 29
GREATEREQUAL = 30
TILDE = 31
CIRCUMFLEX = 32
LEFTSHIFT = 33
RIGHTSHIFT = 34
DOUBLESTAR = 35
PLUSEQUAL = 36
MINEQUAL = 37
STAREQUAL = 38
SLASHEQUAL = 39
PERCENTEQUAL = 40
AMPEREQUAL = 41
VBAREQUAL = 42
CIRCUMFLEXEQUAL = 43
LEFTSHIFTEQUAL = 44
RIGHTSHIFTEQUAL = 45
DOUBLESTAREQUAL = 46
DOUBLESLASH = 47
DOUBLESLASHEQUAL = 48
AT = 49
RARROW = 50
ELLIPSIS = 51
OP = 52
ERRORTOKEN = 53
N_TOKENS = 54
NT_OFFSET = 256
#--end constants--
tok_name = {value: name
for name, value in globals().items()
if isinstance(value, int) and not name.startswith('_')}
__all__.extend(tok_name.values())
def ISTERMINAL(x):
return x < NT_OFFSET
def ISNONTERMINAL(x):
return x >= NT_OFFSET
def ISEOF(x):
return x == ENDMARKER
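# --- Illustrative usage sketch (added; not part of the module above) ---
# A quick example of the constants and predicates defined above: mapping a
# token code back to its name and classifying terminal vs. non-terminal.
print(tok_name[NAME])            # -> 'NAME'
print(ISTERMINAL(NAME))          # True: token codes sit below NT_OFFSET
print(ISNONTERMINAL(NT_OFFSET))  # True: grammar symbols start at 256
print(ISEOF(ENDMARKER))          # True: ENDMARKER marks end of input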
def _main():
import re
import sys
args = sys.argv[1:]
inFileName = args and args[0] or "Include/token.h"
outFileName = "Lib/token.py"
if len(args) > 1:
outFileName = args[1]
try:
fp = open(inFileName)
except OSError as err:
sys.stdout.write("I/O error: %s\n" % str(err))
sys.exit(1)
lines = fp.read().split("\n")
fp.close()
prog = re.compile(
"#define[ \t][ \t]*([A-Z0-9][A-Z0-9_]*)[ \t][ \t]*([0-9][0-9]*)",
re.IGNORECASE)
tokens = {}
for line in lines:
match = prog.match(line)
if match:
name, val = match.group(1, 2)
val = int(val)
tokens[val] = name # reverse so we can sort them...
keys = sorted(tokens.keys())
# load the output skeleton from the target:
try:
fp = open(outFileName)
except OSError as err:
sys.stderr.write("I/O error: %s\n" % str(err))
sys.exit(2)
format = fp.read().split("\n")
fp.close()
try:
start = format.index("#--start constants--") + 1
end = format.index("#--end constants--")
except ValueError:
sys.stderr.write("target does not contain format markers")
sys.exit(3)
lines = []
for val in keys:
lines.append("%s = %d" % (tokens[val], val))
format[start:end] = lines
try:
fp = open(outFileName, 'w')
except OSError as err:
sys.stderr.write("I/O error: %s\n" % str(err))
sys.exit(4)
fp.write("\n".join(format))
fp.close()
if __name__ == "__main__":
_main()
"""Tokenization help for Python programs.
tokenize(readline) is a generator that breaks a stream of bytes into
Python tokens. It decodes the bytes according to PEP-0263 for
determining source file encoding.
It accepts a readline-like method which is called repeatedly to get the
next line of input (or b"" for EOF). It generates 5-tuples with these
members:
the token type (see token.py)
the token (a string)
the starting (row, column) indices of the token (a 2-tuple of ints)
the ending (row, column) indices of the token (a 2-tuple of ints)
the original line (string)
It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators. Additionally, all token lists start with an ENCODING token
which tells you which encoding was used to decode the bytes stream.
"""
__author__ = 'Ka-Ping Yee <[email protected]>'
__credits__ = ('GvR, ESR, Tim Peters, Thomas Wouters, Fred Drake, '
'Skip Montanaro, Raymond Hettinger, Trent Nelson, '
'Michael Foord')
from builtins import open as _builtin_open
from codecs import lookup, BOM_UTF8
import collections
from io import TextIOWrapper
from itertools import chain
import re
import sys
from token import *
cookie_re = re.compile(r'^[ \t\f]*#.*coding[:=][ \t]*([-\w.]+)', re.ASCII)
blank_re = re.compile(br'^[ \t\f]*(?:[#\r\n]|$)', re.ASCII)
import token
__all__ = token.__all__ + ["COMMENT", "tokenize", "detect_encoding",
"NL", "untokenize", "ENCODING", "TokenInfo"]
del token
COMMENT = N_TOKENS
tok_name[COMMENT] = 'COMMENT'
NL = N_TOKENS + 1
tok_name[NL] = 'NL'
ENCODING = N_TOKENS + 2
tok_name[ENCODING] = 'ENCODING'
N_TOKENS += 3
EXACT_TOKEN_TYPES = {
'(': LPAR,
')': RPAR,
'[': LSQB,
']': RSQB,
':': COLON,
',': COMMA,
';': SEMI,
'+': PLUS,
'-': MINUS,
'*': STAR,
'/': SLASH,
'|': VBAR,
'&': AMPER,
'<': LESS,
'>': GREATER,
'=': EQUAL,
'.': DOT,
'%': PERCENT,
'{': LBRACE,
'}': RBRACE,
'==': EQEQUAL,
'!=': NOTEQUAL,
'<=': LESSEQUAL,
'>=': GREATEREQUAL,
'~': TILDE,
'^': CIRCUMFLEX,
'<<': LEFTSHIFT,
'>>': RIGHTSHIFT,
'**': DOUBLESTAR,
'+=': PLUSEQUAL,
'-=': MINEQUAL,
'*=': STAREQUAL,
'/=': SLASHEQUAL,
'%=': PERCENTEQUAL,
'&=': AMPEREQUAL,
'|=': VBAREQUAL,
'^=': CIRCUMFLEXEQUAL,
'<<=': LEFTSHIFTEQUAL,
'>>=': RIGHTSHIFTEQUAL,
'**=': DOUBLESTAREQUAL,
'//': DOUBLESLASH,
'//=': DOUBLESLASHEQUAL,
'@': AT
}
class TokenInfo(collections.namedtuple('TokenInfo', 'type string start end line')):
def __repr__(self):
annotated_type = '%d (%s)' % (self.type, tok_name[self.type])
return ('TokenInfo(type=%s, string=%r, start=%r, end=%r, line=%r)' %
self._replace(type=annotated_type))
@property
def exact_type(self):
if self.type == OP and self.string in EXACT_TOKEN_TYPES:
return EXACT_TOKEN_TYPES[self.string]
else:
return self.type
def group(*choices): return '(' + '|'.join(choices) + ')'
def any(*choices): return group(*choices) + '*'
def maybe(*choices): return group(*choices) + '?'
# Note: we use unicode matching for names ("\w") but ascii matching for
# number literals.
Whitespace = r'[ \f\t]*'
Comment = r'#[^\r\n]*'
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
Name = r'\w+'
Hexnumber = r'0[xX][0-9a-fA-F]+'
Binnumber = r'0[bB][01]+'
Octnumber = r'0[oO][0-7]+'
Decnumber = r'(?:0+|[1-9][0-9]*)'
Intnumber = group(Hexnumber, Binnumber, Octnumber, Decnumber)
Exponent = r'[eE][-+]?[0-9]+'
Pointfloat = group(r'[0-9]+\.[0-9]*', r'\.[0-9]+') + maybe(Exponent)
Expfloat = r'[0-9]+' + Exponent
Floatnumber = group(Pointfloat, Expfloat)
Imagnumber = group(r'[0-9]+[jJ]', Floatnumber + r'[jJ]')
Number = group(Imagnumber, Floatnumber, Intnumber)
StringPrefix = r'(?:[bB][rR]?|[rR][bB]?|[uU])?'
# Tail end of ' string.
Single = r"[^'\\]*(?:\\.[^'\\]*)*'"
# Tail end of " string.
Double = r'[^"\\]*(?:\\.[^"\\]*)*"'
# Tail end of ''' string.
Single3 = r"[^'\\]*(?:(?:\\.|'(?!''))[^'\\]*)*'''"
# Tail end of """ string.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'
Triple = group(StringPrefix + "'''", StringPrefix + '"""')
# Single-line ' or " string.
String = group(StringPrefix + r"'[^\n'\\]*(?:\\.[^\n'\\]*)*'",
StringPrefix + r'"[^\n"\\]*(?:\\.[^\n"\\]*)*"')
# Because of leftmost-then-longest match semantics, be sure to put the
# longest operators first (e.g., if = came before ==, == would get
# recognized as two instances of =).
Operator = group(r"\*\*=?", r">>=?", r"<<=?", r"!=",
r"//=?", r"->",
r"[+\-*/%&|^=<>]=?",
r"~")
Bracket = '[][(){}]'
Special = group(r'\r?\n', r'\.\.\.', r'[:;.,@]')
Funny = group(Operator, Bracket, Special)
PlainToken = group(Number, Funny, String, Name)
Token = Ignore + PlainToken
# First (or only) line of ' or " string.
ContStr = group(StringPrefix + r"'[^\n'\\]*(?:\\.[^\n'\\]*)*" +
group("'", r'\\\r?\n'),
StringPrefix + r'"[^\n"\\]*(?:\\.[^\n"\\]*)*' +
group('"', r'\\\r?\n'))
PseudoExtras = group(r'\\\r?\n|\Z', Comment, Triple)
PseudoToken = Whitespace + group(PseudoExtras, Number, Funny, ContStr, Name)
def _compile(expr):
return re.compile(expr, re.UNICODE)
endpats = {"'": Single, '"': Double,
"'''": Single3, '"""': Double3,
"r'''": Single3, 'r"""': Double3,
"b'''": Single3, 'b"""': Double3,
"R'''": Single3, 'R"""': Double3,
"B'''": Single3, 'B"""': Double3,
"br'''": Single3, 'br"""': Double3,
"bR'''": Single3, 'bR"""': Double3,
"Br'''": Single3, 'Br"""': Double3,
"BR'''": Single3, 'BR"""': Double3,
"rb'''": Single3, 'rb"""': Double3,
"Rb'''": Single3, 'Rb"""': Double3,
"rB'''": Single3, 'rB"""': Double3,
"RB'''": Single3, 'RB"""': Double3,
"u'''": Single3, 'u"""': Double3,
"R'''": Single3, 'R"""': Double3,
"U'''": Single3, 'U"""': Double3,
'r': None, 'R': None, 'b': None, 'B': None,
'u': None, 'U': None}
triple_quoted = {}
for t in ("'''", '"""',
"r'''", 'r"""', "R'''", 'R"""',
"b'''", 'b"""', "B'''", 'B"""',
"br'''", 'br"""', "Br'''", 'Br"""',
"bR'''", 'bR"""', "BR'''", 'BR"""',
"rb'''", 'rb"""', "rB'''", 'rB"""',
"Rb'''", 'Rb"""', "RB'''", 'RB"""',
"u'''", 'u"""', "U'''", 'U"""',
):
triple_quoted[t] = t
single_quoted = {}
for t in ("'", '"',
"r'", 'r"', "R'", 'R"',
"b'", 'b"', "B'", 'B"',
"br'", 'br"', "Br'", 'Br"',
"bR'", 'bR"', "BR'", 'BR"' ,
"rb'", 'rb"', "rB'", 'rB"',
"Rb'", 'Rb"', "RB'", 'RB"' ,
"u'", 'u"', "U'", 'U"',
):
single_quoted[t] = t
tabsize = 8
class TokenError(Exception): pass
class StopTokenizing(Exception): pass
class Untokenizer:
def __init__(self):
self.tokens = []
self.prev_row = 1
self.prev_col = 0
self.encoding = None
def add_whitespace(self, start):
row, col = start
if row < self.prev_row or row == self.prev_row and col < self.prev_col:
raise ValueError("start ({},{}) precedes previous end ({},{})"
.format(row, col, self.prev_row, self.prev_col))
row_offset = row - self.prev_row
if row_offset:
self.tokens.append("\\\n" * row_offset)
self.prev_col = 0
col_offset = col - self.prev_col
if col_offset:
self.tokens.append(" " * col_offset)
def untokenize(self, iterable):
it = iter(iterable)
indents = []
startline = False
for t in it:
if len(t) == 2:
self.compat(t, it)
break
tok_type, token, start, end, line = t
if tok_type == ENCODING:
self.encoding = token
continue
if tok_type == ENDMARKER:
break
if tok_type == INDENT:
indents.append(token)
continue
elif tok_type == DEDENT:
indents.pop()
self.prev_row, self.prev_col = end
continue
elif tok_type in (NEWLINE, NL):
startline = True
elif startline and indents:
indent = indents[-1]
if start[1] >= len(indent):
self.tokens.append(indent)
self.prev_col = len(indent)
startline = False
self.add_whitespace(start)
self.tokens.append(token)
self.prev_row, self.prev_col = end
if tok_type in (NEWLINE, NL):
self.prev_row += 1
self.prev_col = 0
return "".join(self.tokens)
def compat(self, token, iterable):
indents = []
toks_append = self.tokens.append
startline = token[0] in (NEWLINE, NL)
prevstring = False
for tok in chain([token], iterable):
toknum, tokval = tok[:2]
if toknum == ENCODING:
self.encoding = tokval
continue
if toknum in (NAME, NUMBER):
tokval += ' '
# Insert a space between two consecutive strings
if toknum == STRING:
if prevstring:
tokval = ' ' + tokval
prevstring = True
else:
prevstring = False
if toknum == INDENT:
indents.append(tokval)
continue
elif toknum == DEDENT:
indents.pop()
continue
elif toknum in (NEWLINE, NL):
startline = True
elif startline and indents:
toks_append(indents[-1])
startline = False
toks_append(tokval)
def untokenize(iterable):
"""Transform tokens back into Python source code.
It returns a bytes object, encoded using the ENCODING
token, which is the first token sequence output by tokenize.
Each element returned by the iterable must be a token sequence
with at least two elements, a token number and token value. If
only two tokens are passed, the resulting output is poor.
Round-trip invariant for full input:
Untokenized source will match input source exactly
Round-trip invariant for limited input:
# Output bytes will tokenize back to the input
t1 = [tok[:2] for tok in tokenize(f.readline)]
newcode = untokenize(t1)
readline = BytesIO(newcode).readline
t2 = [tok[:2] for tok in tokenize(readline)]
assert t1 == t2
"""
ut = Untokenizer()
out = ut.untokenize(iterable)
if ut.encoding is not None:
out = out.encode(ut.encoding)
return out
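# --- Illustrative usage sketch (added; not part of the module above) ---
# Feeding untokenize() two-element tuples exercises the compat() path
# described in its docstring: with no ENCODING token the result stays a
# str, and the spacing is "poor" but still valid source.
_demo_out = untokenize([(NAME, "x"), (OP, "="), (NUMBER, "1"),
                        (NEWLINE, "\n")])
print(repr(_demo_out))  # e.g. 'x =1 \n' -- ugly, but retokenizes the same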
def _get_normal_name(orig_enc):
"""Imitates get_normal_name in tokenizer.c."""
# Only care about the first 12 characters.
enc = orig_enc[:12].lower().replace("_", "-")
if enc == "utf-8" or enc.startswith("utf-8-"):
return "utf-8"
if enc in ("latin-1", "iso-8859-1", "iso-latin-1") or \
enc.startswith(("latin-1-", "iso-8859-1-", "iso-latin-1-")):
return "iso-8859-1"
return orig_enc
def detect_encoding(readline):
"""
The detect_encoding() function is used to detect the encoding that should
be used to decode a Python source file. It requires one argument, readline,
in the same way as the tokenize() generator.
It will call readline a maximum of twice, and return the encoding used
(as a string) and a list of any lines (left as bytes) it has read in.
It detects the encoding from the presence of a utf-8 bom or an encoding
cookie as specified in pep-0263. If both a bom and a cookie are present,
but disagree, a SyntaxError will be raised. If the encoding cookie is an
invalid charset, raise a SyntaxError. Note that if a utf-8 bom is found,
'utf-8-sig' is returned.
If no encoding is specified, then the default of 'utf-8' will be returned.
"""
try:
filename = readline.__self__.name
except AttributeError:
filename = None
bom_found = False
encoding = None
default = 'utf-8'
def read_or_stop():
try:
return readline()
except StopIteration:
return b''
def find_cookie(line):
try:
# Decode as UTF-8. Either the line is an encoding declaration,
# in which case it should be pure ASCII, or it must be UTF-8
# per default encoding.
line_string = line.decode('utf-8')
except UnicodeDecodeError:
msg = "invalid or missing encoding declaration"
if filename is not None:
msg = '{} for {!r}'.format(msg, filename)
raise SyntaxError(msg)
match = cookie_re.match(line_string)
if not match:
return None
encoding = _get_normal_name(match.group(1))
try:
codec = lookup(encoding)
except LookupError:
# This behaviour mimics the Python interpreter
if filename is None:
msg = "unknown encoding: " + encoding
else:
msg = "unknown encoding for {!r}: {}".format(filename,
encoding)
raise SyntaxError(msg)
if bom_found:
if encoding != 'utf-8':
# This behaviour mimics the Python interpreter
if filename is None:
msg = 'encoding problem: utf-8'
else:
msg = 'encoding problem for {!r}: utf-8'.format(filename)
raise SyntaxError(msg)
encoding += '-sig'
return encoding
first = read_or_stop()
if first.startswith(BOM_UTF8):
bom_found = True
first = first[3:]
default = 'utf-8-sig'
if not first:
return default, []
encoding = find_cookie(first)
if encoding:
return encoding, [first]
if not blank_re.match(first):
return default, [first]
second = read_or_stop()
if not second:
return default, [first]
encoding = find_cookie(second)
if encoding:
return encoding, [first, second]
return default, [first, second]
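# --- Illustrative usage sketch (added; not part of the module above) ---
# detect_encoding() only needs a readline-of-bytes callable, so an
# in-memory buffer carrying a PEP 263 cookie is enough to exercise it.
from io import BytesIO as _DemoBytesIO

_demo_buf = _DemoBytesIO(b"# -*- coding: latin-1 -*-\nx = 1\n")
_demo_enc, _demo_read = detect_encoding(_demo_buf.readline)
print(_demo_enc)        # -> 'iso-8859-1' (normalized from 'latin-1')
print(len(_demo_read))  # 1: only the cookie line was consumed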
def open(filename):
"""Open a file in read only mode using the encoding detected by
detect_encoding().
"""
buffer = _builtin_open(filename, 'rb')
try:
encoding, lines = detect_encoding(buffer.readline)
buffer.seek(0)
text = TextIOWrapper(buffer, encoding, line_buffering=True)
text.mode = 'r'
return text
except:
buffer.close()
raise
def tokenize(readline):
"""
The tokenize() generator requires one argument, readline, which
must be a callable object which provides the same interface as the
readline() method of built-in file objects. Each call to the function
should return one line of input as bytes. Alternatively, readline
can be a callable function terminating with StopIteration:
readline = open(myfile, 'rb').__next__ # Example of alternate readline
The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow, scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow, ecol) of
ints specifying the row and column where the token ends in the source;
and the line on which the token was found. The line passed is the
logical line; continuation lines are included.
The first token sequence will always be an ENCODING token
which tells you which encoding was used to decode the bytes stream.
"""
# This import is here to avoid problems when the itertools module is not
# built yet and tokenize is imported.
from itertools import chain, repeat
encoding, consumed = detect_encoding(readline)
rl_gen = iter(readline, b"")
empty = repeat(b"")
return _tokenize(chain(consumed, rl_gen, empty).__next__, encoding)
def _tokenize(readline, encoding):
lnum = parenlev = continued = 0
numchars = '0123456789'
contstr, needcont = '', 0
contline = None
indents = [0]
if encoding is not None:
if encoding == "utf-8-sig":
# BOM will already have been stripped.
encoding = "utf-8"
yield TokenInfo(ENCODING, encoding, (0, 0), (0, 0), '')
while True: # loop over lines in stream
try:
line = readline()
except StopIteration:
line = b''
if encoding is not None:
line = line.decode(encoding)
lnum += 1
pos, max = 0, len(line)
if contstr: # continued string
if not line:
raise TokenError("EOF in multi-line string", strstart)
endmatch = endprog.match(line)
if endmatch:
pos = end = endmatch.end(0)
yield TokenInfo(STRING, contstr + line[:end],
strstart, (lnum, end), contline + line)
contstr, needcont = '', 0
contline = None
elif needcont and line[-2:] != '\\\n' and line[-3:] != '\\\r\n':
yield TokenInfo(ERRORTOKEN, contstr + line,
strstart, (lnum, len(line)), contline)
contstr = ''
contline = None
continue
else:
contstr = contstr + line
contline = contline + line
continue
elif parenlev == 0 and not continued: # new statement
if not line: break
column = 0
while pos < max: # measure leading whitespace
if line[pos] == ' ':
column += 1
elif line[pos] == '\t':
column = (column//tabsize + 1)*tabsize
elif line[pos] == '\f':
column = 0
else:
break
pos += 1
if pos == max:
break
if line[pos] in '#\r\n': # skip comments or blank lines
if line[pos] == '#':
comment_token = line[pos:].rstrip('\r\n')
nl_pos = pos + len(comment_token)
yield TokenInfo(COMMENT, comment_token,
(lnum, pos), (lnum, pos + len(comment_token)), line)
yield TokenInfo(NL, line[nl_pos:],
(lnum, nl_pos), (lnum, len(line)), line)
else:
yield TokenInfo((NL, COMMENT)[line[pos] == '#'], line[pos:],
(lnum, pos), (lnum, len(line)), line)
continue
if column > indents[-1]: # count indents or dedents
indents.append(column)
yield TokenInfo(INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
while column < indents[-1]:
if column not in indents:
raise IndentationError(
"unindent does not match any outer indentation level",
("<tokenize>", lnum, pos, line))
indents = indents[:-1]
yield TokenInfo(DEDENT, '', (lnum, pos), (lnum, pos), line)
else: # continued statement
if not line:
raise TokenError("EOF in multi-line statement", (lnum, 0))
continued = 0
while pos < max:
pseudomatch = _compile(PseudoToken).match(line, pos)
if pseudomatch: # scan for tokens
start, end = pseudomatch.span(1)
spos, epos, pos = (lnum, start), (lnum, end), end
if start == end:
continue
token, initial = line[start:end], line[start]
if (initial in numchars or # ordinary number
(initial == '.' and token != '.' and token != '...')):
yield TokenInfo(NUMBER, token, spos, epos, line)
elif initial in '\r\n':
yield TokenInfo(NL if parenlev > 0 else NEWLINE,
token, spos, epos, line)
elif initial == '#':
assert not token.endswith("\n")
yield TokenInfo(COMMENT, token, spos, epos, line)
elif token in triple_quoted:
endprog = _compile(endpats[token])
endmatch = endprog.match(line, pos)
if endmatch: # all on one line
pos = endmatch.end(0)
token = line[start:pos]
yield TokenInfo(STRING, token, spos, (lnum, pos), line)
else:
strstart = (lnum, start) # multiple lines
contstr = line[start:]
contline = line
break
elif initial in single_quoted or \
token[:2] in single_quoted or \
token[:3] in single_quoted:
if token[-1] == '\n': # continued string
strstart = (lnum, start)
endprog = _compile(endpats[initial] or
endpats[token[1]] or
endpats[token[2]])
contstr, needcont = line[start:], 1
contline = line
break
else: # ordinary string
yield TokenInfo(STRING, token, spos, epos, line)
elif initial.isidentifier(): # ordinary name
yield TokenInfo(NAME, token, spos, epos, line)
elif initial == '\\': # continued stmt
continued = 1
else:
if initial in '([{':
parenlev += 1
elif initial in ')]}':
parenlev -= 1
yield TokenInfo(OP, token, spos, epos, line)
else:
yield TokenInfo(ERRORTOKEN, line[pos],
(lnum, pos), (lnum, pos+1), line)
pos += 1
for indent in indents[1:]: # pop remaining indent levels
yield TokenInfo(DEDENT, '', (lnum, 0), (lnum, 0), '')
yield TokenInfo(ENDMARKER, '', (lnum, 0), (lnum, 0), '')
# An undocumented, backwards compatible, API for all the places in the standard
# library that expect to be able to use tokenize with strings
def generate_tokens(readline):
return _tokenize(readline, None)
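# --- Illustrative usage sketch (added; not part of the module above) ---
# The full round trip promised by the untokenize() docstring, using an
# in-memory bytes buffer as the readline source.
from io import BytesIO as _DemoBytesIO2

_demo_src = b"x = 1\nprint(x)\n"
_demo_t1 = [tok[:2] for tok in tokenize(_DemoBytesIO2(_demo_src).readline)]
_demo_new = untokenize(_demo_t1)   # bytes, encoded per the ENCODING token
_demo_t2 = [tok[:2] for tok in tokenize(_DemoBytesIO2(_demo_new).readline)]
assert _demo_t1 == _demo_t2        # limited-input round-trip invariant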
def main():
import argparse
# Helper error handling routines
def perror(message):
print(message, file=sys.stderr)
def error(message, filename=None, location=None):
if location:
args = (filename,) + location + (message,)
perror("%s:%d:%d: error: %s" % args)
elif filename:
perror("%s: error: %s" % (filename, message))
else:
perror("error: %s" % message)
sys.exit(1)
# Parse the arguments and options
parser = argparse.ArgumentParser(prog='python -m tokenize')
parser.add_argument(dest='filename', nargs='?',
metavar='filename.py',
help='the file to tokenize; defaults to stdin')
parser.add_argument('-e', '--exact', dest='exact', action='store_true',
help='display token names using the exact type')
args = parser.parse_args()
try:
# Tokenize the input
if args.filename:
filename = args.filename
with _builtin_open(filename, 'rb') as f:
tokens = list(tokenize(f.readline))
else:
filename = "<stdin>"
tokens = _tokenize(sys.stdin.readline, None)
# Output the tokenization
for token in tokens:
token_type = token.type
if args.exact:
token_type = token.exact_type
token_range = "%d,%d-%d,%d:" % (token.start + token.end)
print("%-20s%-15s%-15r" %
(token_range, tok_name[token_type], token.string))
except IndentationError as err:
line, column = err.args[1][1:3]
error(err.args[0], filename, (line, column))
except TokenError as err:
line, column = err.args[1]
error(err.args[0], filename, (line, column))
except SyntaxError as err:
error(err, filename)
except OSError as err:
error(err)
except KeyboardInterrupt:
print("interrupted\n")
except Exception as err:
perror("unexpected error: %s" % err)
raise
if __name__ == "__main__":
main()
#!/usr/bin/env python3
# portions copyright 2001, Autonomous Zones Industries, Inc., all rights...
# err... reserved and offered to the public under the terms of the
# Python 2.2 license.
# Author: Zooko O'Whielacronx
# http://zooko.com/
# mailto:[email protected]
#
# Copyright 2000, Mojam Media, Inc., all rights reserved.
# Author: Skip Montanaro
#
# Copyright 1999, Bioreason, Inc., all rights reserved.
# Author: Andrew Dalke
#
# Copyright 1995-1997, Automatrix, Inc., all rights reserved.
# Author: Skip Montanaro
#
# Copyright 1991-1995, Stichting Mathematisch Centrum, all rights reserved.
#
#
# Permission to use, copy, modify, and distribute this Python software and
# its associated documentation for any purpose without fee is hereby
# granted, provided that the above copyright notice appears in all copies,
# and that both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of neither Automatrix,
# Bioreason or Mojam Media be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior permission.
#
"""program/module to trace Python program or function execution
Sample use, command line:
trace.py -c -f counts --ignore-dir '$prefix' spam.py eggs
trace.py -t --ignore-dir '$prefix' spam.py eggs
trace.py --trackcalls spam.py eggs
Sample use, programmatically
import sys
# create a Trace object, telling it what to ignore, and whether to
# do tracing or line-counting or both.
tracer = trace.Trace(ignoredirs=[sys.base_prefix, sys.base_exec_prefix,],
trace=0, count=1)
# run the new command using the given tracer
tracer.run('main()')
# make a report, placing output in /tmp
r = tracer.results()
r.write_results(show_missing=True, coverdir="/tmp")
"""
__all__ = ['Trace', 'CoverageResults']
import linecache
import os
import re
import sys
import token
import tokenize
import inspect
import gc
import dis
import pickle
from warnings import warn as _warn
try:
from time import monotonic as _time
except ImportError:
from time import time as _time
try:
import threading
except ImportError:
_settrace = sys.settrace
def _unsettrace():
sys.settrace(None)
else:
def _settrace(func):
threading.settrace(func)
sys.settrace(func)
def _unsettrace():
sys.settrace(None)
threading.settrace(None)
def _usage(outfile):
outfile.write("""Usage: %s [OPTIONS] <file> [ARGS]
Meta-options:
--help Display this help then exit.
--version Output version information then exit.
Otherwise, exactly one of the following three options must be given:
-t, --trace Print each line to sys.stdout before it is executed.
-c, --count Count the number of times each line is executed
and write the counts to <module>.cover for each
module executed, in the module's directory.
See also `--coverdir', `--file', `--no-report' below.
-l, --listfuncs Keep track of which functions are executed at least
once and write the results to sys.stdout after the
program exits.
-T, --trackcalls Keep track of caller/called pairs and write the
results to sys.stdout after the program exits.
-r, --report Generate a report from a counts file; do not execute
any code. `--file' must specify the results file to
read, which must have been created in a previous run
with `--count --file=FILE'.
Modifiers:
-f, --file=<file> File to accumulate counts over several runs.
-R, --no-report Do not generate the coverage report files.
Useful if you want to accumulate over several runs.
-C, --coverdir=<dir> Directory where the report files go. The coverage
report for <package>.<module> is written to file
<dir>/<package>/<module>.cover.
-m, --missing Annotate executable lines that were not executed
with '>>>>>> '.
-s, --summary Write a brief summary on stdout for each file.
(Can only be used with --count or --report.)
-g, --timing Prefix each line with the time since the program started.
Only used while tracing.
Filters, may be repeated multiple times:
--ignore-module=<mod> Ignore the given module(s) and its submodules
(if it is a package). Accepts comma separated
list of module names
--ignore-dir=<dir> Ignore files in the given directory (multiple
directories can be joined by os.pathsep).
""" % sys.argv[0])
PRAGMA_NOCOVER = "#pragma NO COVER"
# Simple rx to find lines with no code.
rx_blank = re.compile(r'^\s*(#.*)?$')
class _Ignore:
def __init__(self, modules=None, dirs=None):
self._mods = set() if not modules else set(modules)
self._dirs = [] if not dirs else [os.path.normpath(d)
for d in dirs]
self._ignore = { '<string>': 1 }
def names(self, filename, modulename):
if modulename in self._ignore:
return self._ignore[modulename]
# haven't seen this one before, so see if the module name is
# on the ignore list.
if modulename in self._mods: # Identical names, so ignore
self._ignore[modulename] = 1
return 1
# check if the module is a proper submodule of something on
# the ignore list
for mod in self._mods:
# Need to take some care since ignoring
# "cmp" mustn't mean ignoring "cmpcache" but ignoring
# "Spam" must also mean ignoring "Spam.Eggs".
if modulename.startswith(mod + '.'):
self._ignore[modulename] = 1
return 1
# Now check that filename isn't in one of the directories
if filename is None:
# must be a built-in, so we must ignore
self._ignore[modulename] = 1
return 1
# Ignore a file when it contains one of the ignorable paths
for d in self._dirs:
# The '+ os.sep' is to ensure that d is a parent directory,
# as compared to cases like:
# d = "/usr/local"
# filename = "/usr/local.py"
# or
# d = "/usr/local.py"
# filename = "/usr/local.py"
if filename.startswith(d + os.sep):
self._ignore[modulename] = 1
return 1
# Tried the different ways, so we don't ignore this module
self._ignore[modulename] = 0
return 0
def _modname(path):
"""Return a plausible module name for the patch."""
base = os.path.basename(path)
filename, ext = os.path.splitext(base)
return filename
def _fullmodname(path):
"""Return a plausible module name for the path."""
# If the file 'path' is part of a package, then the filename isn't
# enough to uniquely identify it. Try to do the right thing by
# looking in sys.path for the longest matching prefix. We'll
# assume that the rest is the package name.
comparepath = os.path.normcase(path)
longest = ""
for dir in sys.path:
dir = os.path.normcase(dir)
if comparepath.startswith(dir) and comparepath[len(dir)] == os.sep:
if len(dir) > len(longest):
longest = dir
if longest:
base = path[len(longest) + 1:]
else:
base = path
# the drive letter is never part of the module name
drive, base = os.path.splitdrive(base)
base = base.replace(os.sep, ".")
if os.altsep:
base = base.replace(os.altsep, ".")
filename, ext = os.path.splitext(base)
return filename.lstrip(".")
class CoverageResults:
def __init__(self, counts=None, calledfuncs=None, infile=None,
callers=None, outfile=None):
self.counts = counts
if self.counts is None:
self.counts = {}
self.counter = self.counts.copy() # map (filename, lineno) to count
self.calledfuncs = calledfuncs
if self.calledfuncs is None:
self.calledfuncs = {}
self.calledfuncs = self.calledfuncs.copy()
self.callers = callers
if self.callers is None:
self.callers = {}
self.callers = self.callers.copy()
self.infile = infile
self.outfile = outfile
if self.infile:
# Try to merge existing counts file.
try:
counts, calledfuncs, callers = \
pickle.load(open(self.infile, 'rb'))
self.update(self.__class__(counts, calledfuncs, callers))
except (OSError, EOFError, ValueError) as err:
print(("Skipping counts file %r: %s"
% (self.infile, err)), file=sys.stderr)
def is_ignored_filename(self, filename):
"""Return True if the filename does not refer to a file
we want to have reported.
"""
return filename.startswith('<') and filename.endswith('>')
def update(self, other):
"""Merge in the data from another CoverageResults"""
counts = self.counts
calledfuncs = self.calledfuncs
callers = self.callers
other_counts = other.counts
other_calledfuncs = other.calledfuncs
other_callers = other.callers
for key in other_counts:
counts[key] = counts.get(key, 0) + other_counts[key]
for key in other_calledfuncs:
calledfuncs[key] = 1
for key in other_callers:
callers[key] = 1
def write_results(self, show_missing=True, summary=False, coverdir=None):
"""
@param coverdir
"""
if self.calledfuncs:
print()
print("functions called:")
calls = self.calledfuncs
for filename, modulename, funcname in sorted(calls):
print(("filename: %s, modulename: %s, funcname: %s"
% (filename, modulename, funcname)))
if self.callers:
print()
print("calling relationships:")
lastfile = lastcfile = ""
for ((pfile, pmod, pfunc), (cfile, cmod, cfunc)) \
in sorted(self.callers):
if pfile != lastfile:
print()
print("***", pfile, "***")
lastfile = pfile
lastcfile = ""
if cfile != pfile and lastcfile != cfile:
print(" -->", cfile)
lastcfile = cfile
print(" %s.%s -> %s.%s" % (pmod, pfunc, cmod, cfunc))
# turn the counts data ("(filename, lineno) = count") into something
# accessible on a per-file basis
per_file = {}
for filename, lineno in self.counts:
lines_hit = per_file[filename] = per_file.get(filename, {})
lines_hit[lineno] = self.counts[(filename, lineno)]
# accumulate summary info, if needed
sums = {}
for filename, count in per_file.items():
if self.is_ignored_filename(filename):
continue
if filename.endswith((".pyc", ".pyo")):
filename = filename[:-1]
if coverdir is None:
dir = os.path.dirname(os.path.abspath(filename))
modulename = _modname(filename)
else:
dir = coverdir
if not os.path.exists(dir):
os.makedirs(dir)
modulename = _fullmodname(filename)
# If desired, get a list of the line numbers which represent
# executable content (returned as a dict for better lookup speed)
if show_missing:
lnotab = _find_executable_linenos(filename)
else:
lnotab = {}
source = linecache.getlines(filename)
coverpath = os.path.join(dir, modulename + ".cover")
with open(filename, 'rb') as fp:
encoding, _ = tokenize.detect_encoding(fp.readline)
n_hits, n_lines = self.write_results_file(coverpath, source,
lnotab, count, encoding)
if summary and n_lines:
percent = int(100 * n_hits / n_lines)
sums[modulename] = n_lines, percent, modulename, filename
if summary and sums:
print("lines cov% module (path)")
for m in sorted(sums):
n_lines, percent, modulename, filename = sums[m]
print("%5d %3d%% %s (%s)" % sums[m])
if self.outfile:
# try and store counts and module info into self.outfile
try:
pickle.dump((self.counts, self.calledfuncs, self.callers),
open(self.outfile, 'wb'), 1)
except OSError as err:
print("Can't save counts files because %s" % err, file=sys.stderr)
def write_results_file(self, path, lines, lnotab, lines_hit, encoding=None):
"""Return a coverage results file in path."""
try:
outfile = open(path, "w", encoding=encoding)
except OSError as err:
print(("trace: Could not open %r for writing: %s"
"- skipping" % (path, err)), file=sys.stderr)
return 0, 0
n_lines = 0
n_hits = 0
for lineno, line in enumerate(lines, 1):
# do the blank/comment match to try to mark more lines
# (help the reader find stuff that hasn't been covered)
if lineno in lines_hit:
outfile.write("%5d: " % lines_hit[lineno])
n_hits += 1
n_lines += 1
elif rx_blank.match(line):
outfile.write(" ")
else:
# lines preceded by no marks weren't hit
# Highlight them if so indicated, unless the line contains
# #pragma: NO COVER
if lineno in lnotab and not PRAGMA_NOCOVER in line:
outfile.write(">>>>>> ")
n_lines += 1
else:
outfile.write(" ")
outfile.write(line.expandtabs(8))
outfile.close()
return n_hits, n_lines
def _find_lines_from_code(code, strs):
"""Return dict where keys are lines in the line number table."""
linenos = {}
for _, lineno in dis.findlinestarts(code):
if lineno not in strs:
linenos[lineno] = 1
return linenos
def _find_lines(code, strs):
"""Return lineno dict for all code objects reachable from code."""
# get all of the lineno information from the code of this scope level
linenos = _find_lines_from_code(code, strs)
# and check the constants for references to other code objects
for c in code.co_consts:
if inspect.iscode(c):
# find another code object, so recurse into it
linenos.update(_find_lines(c, strs))
return linenos
def _find_strings(filename, encoding=None):
"""Return a dict of possible docstring positions.
The dict maps line numbers to strings. There is an entry for
each line that contains only a string or a part of a triple-quoted
string.
"""
d = {}
# If the first token is a string, then it's the module docstring.
# Add this special case so that the test in the loop passes.
prev_ttype = token.INDENT
with open(filename, encoding=encoding) as f:
tok = tokenize.generate_tokens(f.readline)
for ttype, tstr, start, end, line in tok:
if ttype == token.STRING:
if prev_ttype == token.INDENT:
sline, scol = start
eline, ecol = end
for i in range(sline, eline + 1):
d[i] = 1
prev_ttype = ttype
return d
def _find_executable_linenos(filename):
"""Return dict where keys are line numbers in the line number table."""
try:
with tokenize.open(filename) as f:
prog = f.read()
encoding = f.encoding
except OSError as err:
print(("Not printing coverage data for %r: %s"
% (filename, err)), file=sys.stderr)
return {}
code = compile(prog, filename, "exec")
strs = _find_strings(filename, encoding)
return _find_lines(code, strs)
class Trace:
def __init__(self, count=1, trace=1, countfuncs=0, countcallers=0,
ignoremods=(), ignoredirs=(), infile=None, outfile=None,
timing=False):
"""
@param count true iff it should count number of times each
line is executed
@param trace true iff it should print out each line that is
being counted
@param countfuncs true iff it should just output a list of
(filename, modulename, funcname,) for functions
that were called at least once; This overrides
`count' and `trace'
@param ignoremods a list of the names of modules to ignore
@param ignoredirs a list of the names of directories to ignore
all of the (recursive) contents of
@param infile file from which to read stored counts to be
added into the results
@param outfile file in which to write the results
@param timing true iff timing information be displayed
"""
self.infile = infile
self.outfile = outfile
self.ignore = _Ignore(ignoremods, ignoredirs)
self.counts = {} # keys are (filename, linenumber)
self.pathtobasename = {} # for memoizing os.path.basename
self.donothing = 0
self.trace = trace
self._calledfuncs = {}
self._callers = {}
self._caller_cache = {}
self.start_time = None
if timing:
self.start_time = _time()
if countcallers:
self.globaltrace = self.globaltrace_trackcallers
elif countfuncs:
self.globaltrace = self.globaltrace_countfuncs
elif trace and count:
self.globaltrace = self.globaltrace_lt
self.localtrace = self.localtrace_trace_and_count
elif trace:
self.globaltrace = self.globaltrace_lt
self.localtrace = self.localtrace_trace
elif count:
self.globaltrace = self.globaltrace_lt
self.localtrace = self.localtrace_count
else:
# Ahem -- do nothing? Okay.
self.donothing = 1
def run(self, cmd):
import __main__
dict = __main__.__dict__
self.runctx(cmd, dict, dict)
def runctx(self, cmd, globals=None, locals=None):
if globals is None: globals = {}
if locals is None: locals = {}
if not self.donothing:
_settrace(self.globaltrace)
try:
exec(cmd, globals, locals)
finally:
if not self.donothing:
_unsettrace()
def runfunc(self, func, *args, **kw):
result = None
if not self.donothing:
sys.settrace(self.globaltrace)
try:
result = func(*args, **kw)
finally:
if not self.donothing:
sys.settrace(None)
return result
def file_module_function_of(self, frame):
code = frame.f_code
filename = code.co_filename
if filename:
modulename = _modname(filename)
else:
modulename = None
funcname = code.co_name
clsname = None
if code in self._caller_cache:
if self._caller_cache[code] is not None:
clsname = self._caller_cache[code]
else:
self._caller_cache[code] = None
## use of gc.get_referrers() was suggested by Michael Hudson
# all functions which refer to this code object
funcs = [f for f in gc.get_referrers(code)
if inspect.isfunction(f)]
# require len(func) == 1 to avoid ambiguity caused by calls to
# new.function(): "In the face of ambiguity, refuse the
# temptation to guess."
if len(funcs) == 1:
dicts = [d for d in gc.get_referrers(funcs[0])
if isinstance(d, dict)]
if len(dicts) == 1:
classes = [c for c in gc.get_referrers(dicts[0])
if hasattr(c, "__bases__")]
if len(classes) == 1:
# ditto for new.classobj()
clsname = classes[0].__name__
# cache the result - assumption is that new.* is
# not called later to disturb this relationship
# _caller_cache could be flushed if functions in
# the new module get called.
self._caller_cache[code] = clsname
if clsname is not None:
funcname = "%s.%s" % (clsname, funcname)
return filename, modulename, funcname
def globaltrace_trackcallers(self, frame, why, arg):
"""Handler for call events.
Adds information about who called who to the self._callers dict.
"""
if why == 'call':
# XXX Should do a better job of identifying methods
this_func = self.file_module_function_of(frame)
parent_func = self.file_module_function_of(frame.f_back)
self._callers[(parent_func, this_func)] = 1
def globaltrace_countfuncs(self, frame, why, arg):
"""Handler for call events.
Adds (filename, modulename, funcname) to the self._calledfuncs dict.
"""
if why == 'call':
this_func = self.file_module_function_of(frame)
self._calledfuncs[this_func] = 1
def globaltrace_lt(self, frame, why, arg):
"""Handler for call events.
If the code block being entered is to be ignored, returns `None',
else returns self.localtrace.
"""
if why == 'call':
code = frame.f_code
filename = frame.f_globals.get('__file__', None)
if filename:
# XXX _modname() doesn't work right for packages, so
# the ignore support won't work right for packages
modulename = _modname(filename)
if modulename is not None:
ignore_it = self.ignore.names(filename, modulename)
if not ignore_it:
if self.trace:
print((" --- modulename: %s, funcname: %s"
% (modulename, code.co_name)))
return self.localtrace
else:
return None
def localtrace_trace_and_count(self, frame, why, arg):
if why == "line":
# record the file name and line number of every trace
filename = frame.f_code.co_filename
lineno = frame.f_lineno
key = filename, lineno
self.counts[key] = self.counts.get(key, 0) + 1
if self.start_time:
print('%.2f' % (_time() - self.start_time), end=' ')
bname = os.path.basename(filename)
print("%s(%d): %s" % (bname, lineno,
linecache.getline(filename, lineno)), end='')
return self.localtrace
def localtrace_trace(self, frame, why, arg):
if why == "line":
# record the file name and line number of every trace
filename = frame.f_code.co_filename
lineno = frame.f_lineno
if self.start_time:
print('%.2f' % (_time() - self.start_time), end=' ')
bname = os.path.basename(filename)
print("%s(%d): %s" % (bname, lineno,
linecache.getline(filename, lineno)), end='')
return self.localtrace
def localtrace_count(self, frame, why, arg):
if why == "line":
filename = frame.f_code.co_filename
lineno = frame.f_lineno
key = filename, lineno
self.counts[key] = self.counts.get(key, 0) + 1
return self.localtrace
def results(self):
return CoverageResults(self.counts, infile=self.infile,
outfile=self.outfile,
calledfuncs=self._calledfuncs,
callers=self._callers)
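# A usage sketch for the Trace class, following the pattern from the stdlib
# trace documentation ('main' is a hypothetical function in the traced code):
#   >>> tracer = Trace(count=1, trace=0,
#   ...                ignoredirs=[sys.prefix, sys.exec_prefix])
#   >>> tracer.run('main()')
#   >>> r = tracer.results()
#   >>> r.write_results(show_missing=True, coverdir=".")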
def _err_exit(msg):
sys.stderr.write("%s: %s\n" % (sys.argv[0], msg))
sys.exit(1)
def main(argv=None):
import getopt
if argv is None:
argv = sys.argv
try:
opts, prog_argv = getopt.getopt(argv[1:], "tcrRf:d:msC:lTg",
["help", "version", "trace", "count",
"report", "no-report", "summary",
"file=", "missing",
"ignore-module=", "ignore-dir=",
"coverdir=", "listfuncs",
"trackcalls", "timing"])
except getopt.error as msg:
sys.stderr.write("%s: %s\n" % (sys.argv[0], msg))
sys.stderr.write("Try `%s --help' for more information\n"
% sys.argv[0])
sys.exit(1)
trace = 0
count = 0
report = 0
no_report = 0
counts_file = None
missing = 0
ignore_modules = []
ignore_dirs = []
coverdir = None
summary = 0
listfuncs = False
countcallers = False
timing = False
for opt, val in opts:
if opt == "--help":
_usage(sys.stdout)
sys.exit(0)
if opt == "--version":
sys.stdout.write("trace 2.0\n")
sys.exit(0)
if opt == "-T" or opt == "--trackcalls":
countcallers = True
continue
if opt == "-l" or opt == "--listfuncs":
listfuncs = True
continue
if opt == "-g" or opt == "--timing":
timing = True
continue
if opt == "-t" or opt == "--trace":
trace = 1
continue
if opt == "-c" or opt == "--count":
count = 1
continue
if opt == "-r" or opt == "--report":
report = 1
continue
if opt == "-R" or opt == "--no-report":
no_report = 1
continue
if opt == "-f" or opt == "--file":
counts_file = val
continue
if opt == "-m" or opt == "--missing":
missing = 1
continue
if opt == "-C" or opt == "--coverdir":
coverdir = val
continue
if opt == "-s" or opt == "--summary":
summary = 1
continue
if opt == "--ignore-module":
for mod in val.split(","):
ignore_modules.append(mod.strip())
continue
if opt == "--ignore-dir":
for s in val.split(os.pathsep):
s = os.path.expandvars(s)
# should I also call expanduser? (after all, could use $HOME)
s = s.replace("$prefix",
os.path.join(sys.base_prefix, "lib",
"python" + sys.version[:3]))
s = s.replace("$exec_prefix",
os.path.join(sys.base_exec_prefix, "lib",
"python" + sys.version[:3]))
s = os.path.normpath(s)
ignore_dirs.append(s)
continue
assert 0, "Should never get here"
if listfuncs and (count or trace):
_err_exit("cannot specify both --listfuncs and (--trace or --count)")
if not (count or trace or report or listfuncs or countcallers):
_err_exit("must specify one of --trace, --count, --report, "
"--listfuncs, or --trackcalls")
if report and no_report:
_err_exit("cannot specify both --report and --no-report")
if report and not counts_file:
_err_exit("--report requires a --file")
if no_report and len(prog_argv) == 0:
_err_exit("missing name of file to run")
# everything is ready
if report:
results = CoverageResults(infile=counts_file, outfile=counts_file)
results.write_results(missing, summary=summary, coverdir=coverdir)
else:
sys.argv = prog_argv
progname = prog_argv[0]
sys.path[0] = os.path.split(progname)[0]
t = Trace(count, trace, countfuncs=listfuncs,
countcallers=countcallers, ignoremods=ignore_modules,
ignoredirs=ignore_dirs, infile=counts_file,
outfile=counts_file, timing=timing)
try:
with open(progname) as fp:
code = compile(fp.read(), progname, 'exec')
# try to emulate __main__ namespace as much as possible
globs = {
'__file__': progname,
'__name__': '__main__',
'__package__': None,
'__cached__': None,
}
t.runctx(code, globs, globs)
except OSError as err:
_err_exit("Cannot run file %r because: %s" % (sys.argv[0], err))
except SystemExit:
pass
results = t.results()
if not no_report:
results.write_results(missing, summary=summary, coverdir=coverdir)
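# Illustrative command-line use of this module ('somescript.py' and
# 'counts.dat' are hypothetical names):
#   python -m trace --count --file counts.dat somescript.py
#   python -m trace --report --file counts.dat --missing --coverdir cover somescript.py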
# Deprecated API
def usage(outfile):
_warn("The trace.usage() function is deprecated",
DeprecationWarning, 2)
_usage(outfile)
class Ignore(_Ignore):
def __init__(self, modules=None, dirs=None):
_warn("The class trace.Ignore is deprecated",
DeprecationWarning, 2)
_Ignore.__init__(self, modules, dirs)
def modname(path):
_warn("The trace.modname() function is deprecated",
DeprecationWarning, 2)
return _modname(path)
def fullmodname(path):
_warn("The trace.fullmodname() function is deprecated",
DeprecationWarning, 2)
return _fullmodname(path)
def find_lines_from_code(code, strs):
_warn("The trace.find_lines_from_code() function is deprecated",
DeprecationWarning, 2)
return _find_lines_from_code(code, strs)
def find_lines(code, strs):
_warn("The trace.find_lines() function is deprecated",
DeprecationWarning, 2)
return _find_lines(code, strs)
def find_strings(filename, encoding=None):
_warn("The trace.find_strings() function is deprecated",
DeprecationWarning, 2)
return _find_strings(filename, encoding=encoding)
def find_executable_linenos(filename):
_warn("The trace.find_executable_linenos() function is deprecated",
DeprecationWarning, 2)
return _find_executable_linenos(filename)
if __name__=='__main__':
main()
"""Extract, format and print information about Python stack traces."""
import linecache
import sys
import operator
__all__ = ['extract_stack', 'extract_tb', 'format_exception',
'format_exception_only', 'format_list', 'format_stack',
'format_tb', 'print_exc', 'format_exc', 'print_exception',
'print_last', 'print_stack', 'print_tb',
'clear_frames']
#
# Formatting and printing lists of traceback lines.
#
def _format_list_iter(extracted_list):
for filename, lineno, name, line in extracted_list:
item = ' File "{}", line {}, in {}\n'.format(filename, lineno, name)
if line:
item = item + ' {}\n'.format(line.strip())
yield item
def print_list(extracted_list, file=None):
"""Print the list of tuples as returned by extract_tb() or
extract_stack() as a formatted stack trace to the given file."""
if file is None:
file = sys.stderr
for item in _format_list_iter(extracted_list):
print(item, file=file, end="")
def format_list(extracted_list):
"""Format a list of traceback entry tuples for printing.
Given a list of tuples as returned by extract_tb() or
extract_stack(), return a list of strings ready for printing.
Each string in the resulting list corresponds to the item with the
same index in the argument list. Each string ends in a newline;
the strings may contain internal newlines as well, for those items
whose source text line is not None.
"""
return list(_format_list_iter(extracted_list))
#
# Printing and Extracting Tracebacks.
#
# extractor takes curr and needs to return a tuple of:
# - Frame object
# - Line number
# - Next item (same type as curr)
# In practice, curr is either a traceback or a frame.
def _extract_tb_or_stack_iter(curr, limit, extractor):
if limit is None:
limit = getattr(sys, 'tracebacklimit', None)
n = 0
while curr is not None and (limit is None or n < limit):
f, lineno, next_item = extractor(curr)
co = f.f_code
filename = co.co_filename
name = co.co_name
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, f.f_globals)
if line:
line = line.strip()
else:
line = None
yield (filename, lineno, name, line)
curr = next_item
n += 1
def _extract_tb_iter(tb, limit):
return _extract_tb_or_stack_iter(
tb, limit,
operator.attrgetter("tb_frame", "tb_lineno", "tb_next"))
def print_tb(tb, limit=None, file=None):
"""Print up to 'limit' stack trace entries from the traceback 'tb'.
If 'limit' is omitted or None, all entries are printed. If 'file'
is omitted or None, the output goes to sys.stderr; otherwise
'file' should be an open file or file-like object with a write()
method.
"""
print_list(extract_tb(tb, limit=limit), file=file)
def format_tb(tb, limit=None):
"""A shorthand for 'format_list(extract_tb(tb, limit))'."""
return format_list(extract_tb(tb, limit=limit))
def extract_tb(tb, limit=None):
"""Return list of up to limit pre-processed entries from traceback.
This is useful for alternate formatting of stack traces. If
'limit' is omitted or None, all entries are extracted. A
pre-processed stack trace entry is a quadruple (filename, line
number, function name, text) representing the information that is
usually printed for a stack trace. The text is a string with
leading and trailing whitespace stripped; if the source is not
available it is None.
"""
return list(_extract_tb_iter(tb, limit=limit))
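# Sketch of the quadruple format (values illustrative, as if run from a
# hypothetical file named example.py):
#   >>> try:
#   ...     1/0
#   ... except ZeroDivisionError:
#   ...     tb = sys.exc_info()[2]
#   >>> extract_tb(tb)
#   [('example.py', 2, '<module>', '1/0')]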
#
# Exception formatting and output.
#
_cause_message = (
"\nThe above exception was the direct cause "
"of the following exception:\n")
_context_message = (
"\nDuring handling of the above exception, "
"another exception occurred:\n")
def _iter_chain(exc, custom_tb=None, seen=None):
if seen is None:
seen = set()
seen.add(exc)
its = []
context = exc.__context__
cause = exc.__cause__
if cause is not None and cause not in seen:
its.append(_iter_chain(cause, False, seen))
its.append([(_cause_message, None)])
elif (context is not None and
not exc.__suppress_context__ and
context not in seen):
its.append(_iter_chain(context, None, seen))
its.append([(_context_message, None)])
its.append([(exc, custom_tb or exc.__traceback__)])
# itertools.chain is in an extension module and may be unavailable
for it in its:
yield from it
def _format_exception_iter(etype, value, tb, limit, chain):
if chain:
values = _iter_chain(value, tb)
else:
values = [(value, tb)]
for value, tb in values:
if isinstance(value, str):
# This is a cause/context message line
yield value + '\n'
continue
if tb:
yield 'Traceback (most recent call last):\n'
yield from _format_list_iter(_extract_tb_iter(tb, limit=limit))
yield from _format_exception_only_iter(type(value), value)
def print_exception(etype, value, tb, limit=None, file=None, chain=True):
"""Print exception up to 'limit' stack trace entries from 'tb' to 'file'.
This differs from print_tb() in the following ways: (1) if
traceback is not None, it prints a header "Traceback (most recent
call last):"; (2) it prints the exception type and value after the
stack trace; (3) if type is SyntaxError and value has the
appropriate format, it prints the line where the syntax error
occurred with a caret on the next line indicating the approximate
position of the error.
"""
if file is None:
file = sys.stderr
for line in _format_exception_iter(etype, value, tb, limit, chain):
print(line, file=file, end="")
def format_exception(etype, value, tb, limit=None, chain=True):
"""Format a stack trace and the exception information.
The arguments have the same meaning as the corresponding arguments
to print_exception(). The return value is a list of strings, each
ending in a newline and some containing internal newlines. When
these lines are concatenated and printed, exactly the same text is
printed as does print_exception().
"""
return list(_format_exception_iter(etype, value, tb, limit, chain))
def format_exception_only(etype, value):
"""Format the exception part of a traceback.
The arguments are the exception type and value such as given by
sys.last_type and sys.last_value. The return value is a list of
strings, each ending in a newline.
Normally, the list contains a single string; however, for
SyntaxError exceptions, it contains several lines that (when
printed) display detailed information about where the syntax
error occurred.
The message indicating which exception occurred is always the last
string in the list.
"""
return list(_format_exception_only_iter(etype, value))
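# Illustrative call (a plain, non-SyntaxError exception yields one line):
#   >>> format_exception_only(ValueError, ValueError("bad value"))
#   ['ValueError: bad value\n']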
def _format_exception_only_iter(etype, value):
# Gracefully handle (the way Python 2.4 and earlier did) the case of
# being called with (None, None).
if etype is None:
yield _format_final_exc_line(etype, value)
return
stype = etype.__name__
smod = etype.__module__
if smod not in ("__main__", "builtins"):
stype = smod + '.' + stype
if not issubclass(etype, SyntaxError):
yield _format_final_exc_line(stype, value)
return
# It was a syntax error; show exactly where the problem was found.
filename = value.filename or "<string>"
lineno = str(value.lineno) if value.lineno is not None else '?'
yield ' File "{}", line {}\n'.format(filename, lineno)
badline = value.text
offset = value.offset
if badline is not None:
yield ' {}\n'.format(badline.strip())
if offset is not None:
caretspace = badline.rstrip('\n')
offset = min(len(caretspace), offset) - 1
caretspace = caretspace[:offset].lstrip()
# non-space whitespace (like tabs) must be kept for alignment
caretspace = ((c.isspace() and c or ' ') for c in caretspace)
yield ' {}^\n'.format(''.join(caretspace))
msg = value.msg or "<no detail available>"
yield "{}: {}\n".format(stype, msg)
def _format_final_exc_line(etype, value):
valuestr = _some_str(value)
if value is None or not valuestr:
line = "%s\n" % etype
else:
line = "%s: %s\n" % (etype, valuestr)
return line
def _some_str(value):
try:
return str(value)
except:
return '<unprintable %s object>' % type(value).__name__
def print_exc(limit=None, file=None, chain=True):
"""Shorthand for 'print_exception(*sys.exc_info(), limit, file)'."""
print_exception(*sys.exc_info(), limit=limit, file=file, chain=chain)
def format_exc(limit=None, chain=True):
"""Like print_exc() but return a string."""
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
def print_last(limit=None, file=None, chain=True):
"""This is a shorthand for 'print_exception(sys.last_type,
sys.last_value, sys.last_traceback, limit, file)'."""
if not hasattr(sys, "last_type"):
raise ValueError("no last exception")
print_exception(sys.last_type, sys.last_value, sys.last_traceback,
limit, file, chain)
#
# Printing and Extracting Stacks.
#
def _extract_stack_iter(f, limit=None):
return _extract_tb_or_stack_iter(
f, limit, lambda f: (f, f.f_lineno, f.f_back))
def _get_stack(f):
if f is None:
f = sys._getframe().f_back.f_back
return f
def print_stack(f=None, limit=None, file=None):
"""Print a stack trace from its invocation point.
The optional 'f' argument can be used to specify an alternate
stack frame at which to start. The optional 'limit' and 'file'
arguments have the same meaning as for print_exception().
"""
print_list(extract_stack(_get_stack(f), limit=limit), file=file)
def format_stack(f=None, limit=None):
"""Shorthand for 'format_list(extract_stack(f, limit))'."""
return format_list(extract_stack(_get_stack(f), limit=limit))
def extract_stack(f=None, limit=None):
"""Extract the raw traceback from the current stack frame.
The return value has the same format as for extract_tb(). The
optional 'f' and 'limit' arguments have the same meaning as for
print_stack(). Each item in the list is a quadruple (filename,
line number, function name, text), and the entries are in order
from oldest to newest stack frame.
"""
stack = list(_extract_stack_iter(_get_stack(f), limit=limit))
stack.reverse()
return stack
def clear_frames(tb):
"Clear all references to local variables in the frames of a traceback."
while tb is not None:
try:
tb.tb_frame.clear()
except RuntimeError:
# Ignore the exception raised if the frame is still executing.
pass
tb = tb.tb_next
from collections import Sequence, Iterable
from functools import total_ordering
import fnmatch
import linecache
import os.path
import pickle
# Import types and functions implemented in C
from _tracemalloc import *
from _tracemalloc import _get_object_traceback, _get_traces
def _format_size(size, sign):
for unit in ('B', 'KiB', 'MiB', 'GiB', 'TiB'):
if abs(size) < 100 and unit != 'B':
# 3 digits (xx.x UNIT)
if sign:
return "%+.1f %s" % (size, unit)
else:
return "%.1f %s" % (size, unit)
if abs(size) < 10 * 1024 or unit == 'TiB':
# 4 or 5 digits (xxxx UNIT)
if sign:
return "%+.0f %s" % (size, unit)
else:
return "%.0f %s" % (size, unit)
size /= 1024
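# Illustrative values for _format_size():
#   >>> _format_size(512, False)
#   '512 B'
#   >>> _format_size(15360, True)
#   '+15.0 KiB'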
class Statistic:
"""
Statistic on memory allocations.
"""
__slots__ = ('traceback', 'size', 'count')
def __init__(self, traceback, size, count):
self.traceback = traceback
self.size = size
self.count = count
def __hash__(self):
return hash((self.traceback, self.size, self.count))
def __eq__(self, other):
return (self.traceback == other.traceback
and self.size == other.size
and self.count == other.count)
def __str__(self):
text = ("%s: size=%s, count=%i"
% (self.traceback,
_format_size(self.size, False),
self.count))
if self.count:
average = self.size / self.count
text += ", average=%s" % _format_size(average, False)
return text
def __repr__(self):
return ('<Statistic traceback=%r size=%i count=%i>'
% (self.traceback, self.size, self.count))
def _sort_key(self):
return (self.size, self.count, self.traceback)
class StatisticDiff:
"""
Statistic difference on memory allocations between an old and a new
Snapshot instance.
"""
__slots__ = ('traceback', 'size', 'size_diff', 'count', 'count_diff')
def __init__(self, traceback, size, size_diff, count, count_diff):
self.traceback = traceback
self.size = size
self.size_diff = size_diff
self.count = count
self.count_diff = count_diff
def __hash__(self):
return hash((self.traceback, self.size, self.size_diff,
self.count, self.count_diff))
def __eq__(self, other):
return (self.traceback == other.traceback
and self.size == other.size
and self.size_diff == other.size_diff
and self.count == other.count
and self.count_diff == other.count_diff)
def __str__(self):
text = ("%s: size=%s (%s), count=%i (%+i)"
% (self.traceback,
_format_size(self.size, False),
_format_size(self.size_diff, True),
self.count,
self.count_diff))
if self.count:
average = self.size / self.count
text += ", average=%s" % _format_size(average, False)
return text
def __repr__(self):
return ('<StatisticDiff traceback=%r size=%i (%+i) count=%i (%+i)>'
% (self.traceback, self.size, self.size_diff,
self.count, self.count_diff))
def _sort_key(self):
return (abs(self.size_diff), self.size,
abs(self.count_diff), self.count,
self.traceback)
def _compare_grouped_stats(old_group, new_group):
statistics = []
for traceback, stat in new_group.items():
previous = old_group.pop(traceback, None)
if previous is not None:
stat = StatisticDiff(traceback,
stat.size, stat.size - previous.size,
stat.count, stat.count - previous.count)
else:
stat = StatisticDiff(traceback,
stat.size, stat.size,
stat.count, stat.count)
statistics.append(stat)
for traceback, stat in old_group.items():
stat = StatisticDiff(traceback, 0, -stat.size, 0, -stat.count)
statistics.append(stat)
return statistics
@total_ordering
class Frame:
"""
Frame of a traceback.
"""
__slots__ = ("_frame",)
def __init__(self, frame):
# frame is a tuple: (filename: str, lineno: int)
self._frame = frame
@property
def filename(self):
return self._frame[0]
@property
def lineno(self):
return self._frame[1]
def __eq__(self, other):
return (self._frame == other._frame)
def __lt__(self, other):
return (self._frame < other._frame)
def __hash__(self):
return hash(self._frame)
def __str__(self):
return "%s:%s" % (self.filename, self.lineno)
def __repr__(self):
return "<Frame filename=%r lineno=%r>" % (self.filename, self.lineno)
@total_ordering
class Traceback(Sequence):
"""
Sequence of Frame instances sorted from the most recent frame
to the oldest frame.
"""
__slots__ = ("_frames",)
def __init__(self, frames):
Sequence.__init__(self)
# frames is a tuple of frame tuples: see Frame constructor for the
# format of a frame tuple
self._frames = frames
def __len__(self):
return len(self._frames)
def __getitem__(self, index):
if isinstance(index, slice):
return tuple(Frame(trace) for trace in self._frames[index])
else:
return Frame(self._frames[index])
def __contains__(self, frame):
return frame._frame in self._frames
def __hash__(self):
return hash(self._frames)
def __eq__(self, other):
return (self._frames == other._frames)
def __lt__(self, other):
return (self._frames < other._frames)
def __str__(self):
return str(self[0])
def __repr__(self):
return "<Traceback %r>" % (tuple(self),)
def format(self, limit=None):
lines = []
if limit is not None and limit < 0:
return lines
for frame in self[:limit]:
lines.append(' File "%s", line %s'
% (frame.filename, frame.lineno))
line = linecache.getline(frame.filename, frame.lineno).strip()
if line:
lines.append(' %s' % line)
return lines
def get_object_traceback(obj):
"""
Get the traceback where the Python object *obj* was allocated.
Return a Traceback instance.
Return None if the tracemalloc module is not tracing memory allocations or
did not trace the allocation of the object.
"""
frames = _get_object_traceback(obj)
if frames is not None:
return Traceback(frames)
else:
return None
class Trace:
"""
Trace of a memory block.
"""
__slots__ = ("_trace",)
def __init__(self, trace):
# trace is a tuple: (size, traceback), see Traceback constructor
# for the format of the traceback tuple
self._trace = trace
@property
def size(self):
return self._trace[0]
@property
def traceback(self):
return Traceback(self._trace[1])
def __eq__(self, other):
return (self._trace == other._trace)
def __hash__(self):
return hash(self._trace)
def __str__(self):
return "%s: %s" % (self.traceback, _format_size(self.size, False))
def __repr__(self):
return ("<Trace size=%s, traceback=%r>"
% (_format_size(self.size, False), self.traceback))
class _Traces(Sequence):
def __init__(self, traces):
Sequence.__init__(self)
# traces is a tuple of trace tuples: see Trace constructor
self._traces = traces
def __len__(self):
return len(self._traces)
def __getitem__(self, index):
if isinstance(index, slice):
return tuple(Trace(trace) for trace in self._traces[index])
else:
return Trace(self._traces[index])
def __contains__(self, trace):
return trace._trace in self._traces
def __eq__(self, other):
return (self._traces == other._traces)
def __repr__(self):
return "<Traces len=%s>" % len(self)
def _normalize_filename(filename):
filename = os.path.normcase(filename)
if filename.endswith(('.pyc', '.pyo')):
filename = filename[:-1]
return filename
class Filter:
def __init__(self, inclusive, filename_pattern,
lineno=None, all_frames=False):
self.inclusive = inclusive
self._filename_pattern = _normalize_filename(filename_pattern)
self.lineno = lineno
self.all_frames = all_frames
@property
def filename_pattern(self):
return self._filename_pattern
def __match_frame(self, filename, lineno):
filename = _normalize_filename(filename)
if not fnmatch.fnmatch(filename, self._filename_pattern):
return False
if self.lineno is None:
return True
else:
return (lineno == self.lineno)
def _match_frame(self, filename, lineno):
# XOR with (not inclusive): an inclusive filter reports True for
# matching frames, an exclusive filter reports True for the
# non-matching ones.
return self.__match_frame(filename, lineno) ^ (not self.inclusive)
def _match_traceback(self, traceback):
if self.all_frames:
if any(self.__match_frame(filename, lineno)
for filename, lineno in traceback):
return self.inclusive
else:
return (not self.inclusive)
else:
filename, lineno = traceback[0]
return self._match_frame(filename, lineno)
class Snapshot:
"""
Snapshot of traces of memory blocks allocated by Python.
"""
def __init__(self, traces, traceback_limit):
# traces is a tuple of trace tuples: see _Traces constructor for
# the exact format
self.traces = _Traces(traces)
self.traceback_limit = traceback_limit
def dump(self, filename):
"""
Write the snapshot into a file.
"""
with open(filename, "wb") as fp:
pickle.dump(self, fp, pickle.HIGHEST_PROTOCOL)
@staticmethod
def load(filename):
"""
Load a snapshot from a file.
"""
with open(filename, "rb") as fp:
return pickle.load(fp)
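# Illustrative round trip ('snapshot.pickle' is a hypothetical filename):
#   >>> snapshot.dump("snapshot.pickle")
#   >>> restored = Snapshot.load("snapshot.pickle")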
def _filter_trace(self, include_filters, exclude_filters, trace):
traceback = trace[1]
if include_filters:
if not any(trace_filter._match_traceback(traceback)
for trace_filter in include_filters):
return False
if exclude_filters:
if any(not trace_filter._match_traceback(traceback)
for trace_filter in exclude_filters):
return False
return True
def filter_traces(self, filters):
"""
Create a new Snapshot instance with a filtered traces sequence, filters
is a list of Filter instances. If filters is an empty list, return a
new Snapshot instance with a copy of the traces.
"""
if not isinstance(filters, Iterable):
raise TypeError("filters must be a list of filters, not %s"
% type(filters).__name__)
if filters:
include_filters = []
exclude_filters = []
for trace_filter in filters:
if trace_filter.inclusive:
include_filters.append(trace_filter)
else:
exclude_filters.append(trace_filter)
new_traces = [trace for trace in self.traces._traces
if self._filter_trace(include_filters,
exclude_filters,
trace)]
else:
new_traces = self.traces._traces.copy()
return Snapshot(new_traces, self.traceback_limit)
def _group_by(self, key_type, cumulative):
if key_type not in ('traceback', 'filename', 'lineno'):
raise ValueError("unknown key_type: %r" % (key_type,))
if cumulative and key_type not in ('lineno', 'filename'):
raise ValueError("cumulative mode cannot by used "
"with key type %r" % key_type)
stats = {}
tracebacks = {}
if not cumulative:
for trace in self.traces._traces:
size, trace_traceback = trace
try:
traceback = tracebacks[trace_traceback]
except KeyError:
if key_type == 'traceback':
frames = trace_traceback
elif key_type == 'lineno':
frames = trace_traceback[:1]
else: # key_type == 'filename':
frames = ((trace_traceback[0][0], 0),)
traceback = Traceback(frames)
tracebacks[trace_traceback] = traceback
try:
stat = stats[traceback]
stat.size += size
stat.count += 1
except KeyError:
stats[traceback] = Statistic(traceback, size, 1)
else:
# cumulative statistics
for trace in self.traces._traces:
size, trace_traceback = trace
for frame in trace_traceback:
try:
traceback = tracebacks[frame]
except KeyError:
if key_type == 'lineno':
frames = (frame,)
else: # key_type == 'filename':
frames = ((frame[0], 0),)
traceback = Traceback(frames)
tracebacks[frame] = traceback
try:
stat = stats[traceback]
stat.size += size
stat.count += 1
except KeyError:
stats[traceback] = Statistic(traceback, size, 1)
return stats
def statistics(self, key_type, cumulative=False):
"""
Group statistics by key_type. Return a sorted list of Statistic
instances.
"""
grouped = self._group_by(key_type, cumulative)
statistics = list(grouped.values())
statistics.sort(reverse=True, key=Statistic._sort_key)
return statistics
def compare_to(self, old_snapshot, key_type, cumulative=False):
"""
Compute the differences with an old snapshot old_snapshot. Get
statistics as a sorted list of StatisticDiff instances, grouped by
group_by.
"""
new_group = self._group_by(key_type, cumulative)
old_group = old_snapshot._group_by(key_type, cumulative)
statistics = _compare_grouped_stats(old_group, new_group)
statistics.sort(reverse=True, key=StatisticDiff._sort_key)
return statistics
def take_snapshot():
"""
Take a snapshot of traces of memory blocks allocated by Python.
"""
if not is_tracing():
raise RuntimeError("the tracemalloc module must be tracing memory "
"allocations to take a snapshot")
traces = _get_traces()
traceback_limit = get_traceback_limit()
return Snapshot(traces, traceback_limit)
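# A typical-use sketch for this module:
#   >>> import tracemalloc
#   >>> tracemalloc.start()
#   >>> data = [bytes(1000) for _ in range(100)]
#   >>> snapshot = tracemalloc.take_snapshot()
#   >>> for stat in snapshot.statistics('lineno')[:3]:
#   ...     print(stat)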
"""Terminal utilities."""
# Author: Steen Lumholt.
from termios import *
__all__ = ["setraw", "setcbreak"]
# Indexes for termios list.
IFLAG = 0
OFLAG = 1
CFLAG = 2
LFLAG = 3
ISPEED = 4
OSPEED = 5
CC = 6
def setraw(fd, when=TCSAFLUSH):
"""Put terminal into a raw mode."""
mode = tcgetattr(fd)
mode[IFLAG] = mode[IFLAG] & ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON)
mode[OFLAG] = mode[OFLAG] & ~(OPOST)
mode[CFLAG] = mode[CFLAG] & ~(CSIZE | PARENB)
mode[CFLAG] = mode[CFLAG] | CS8
mode[LFLAG] = mode[LFLAG] & ~(ECHO | ICANON | IEXTEN | ISIG)
mode[CC][VMIN] = 1
mode[CC][VTIME] = 0
tcsetattr(fd, when, mode)
def setcbreak(fd, when=TCSAFLUSH):
"""Put terminal into a cbreak mode."""
mode = tcgetattr(fd)
mode[LFLAG] = mode[LFLAG] & ~(ECHO | ICANON)
mode[CC][VMIN] = 1
mode[CC][VTIME] = 0
tcsetattr(fd, when, mode)
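# Usage sketch (POSIX only; save and restore the terminal attributes):
#   >>> import sys, termios, tty
#   >>> fd = sys.stdin.fileno()
#   >>> old = termios.tcgetattr(fd)
#   >>> try:
#   ...     tty.setcbreak(fd)
#   ...     ch = sys.stdin.read(1)   # read a single keypress, unbuffered
#   ... finally:
#   ...     termios.tcsetattr(fd, termios.TCSADRAIN, old)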
#
# turtle.py: a Tkinter based turtle graphics module for Python
# Version 1.1b - 4. 5. 2009
#
# Copyright (C) 2006 - 2010 Gregor Lingl
# email: [email protected]
#
# This software is provided 'as-is', without any express or implied
# warranty. In no event will the authors be held liable for any damages
# arising from the use of this software.
#
# Permission is granted to anyone to use this software for any purpose,
# including commercial applications, and to alter it and redistribute it
# freely, subject to the following restrictions:
#
# 1. The origin of this software must not be misrepresented; you must not
# claim that you wrote the original software. If you use this software
# in a product, an acknowledgment in the product documentation would be
# appreciated but is not required.
# 2. Altered source versions must be plainly marked as such, and must not be
# misrepresented as being the original software.
# 3. This notice may not be removed or altered from any source distribution.
"""
Turtle graphics is a popular way for introducing programming to
kids. It was part of the original Logo programming language developed
by Wally Feurzig and Seymour Papert in 1966.
Imagine a robotic turtle starting at (0, 0) in the x-y plane. After an ``import turtle``, give it
the command turtle.forward(15), and it moves (on-screen!) 15 pixels in
the direction it is facing, drawing a line as it moves. Give it the
command turtle.right(25), and it rotates in-place 25 degrees clockwise.
By combining together these and similar commands, intricate shapes and
pictures can easily be drawn.
----- turtle.py
This module is an extended reimplementation of turtle.py from the
Python standard distribution up to Python 2.5. (See: http://www.python.org)
It tries to keep the merits of turtle.py and to be (nearly) 100%
compatible with it. This means in the first place to enable the
learning programmer to use all the commands, classes and methods
interactively when using the module from within IDLE run with
the -n switch.
Roughly it has the following features added:
- Better animation of the turtle movements, especially of turning the
turtle. So the turtles can more easily be used as a visual feedback
instrument by the (beginning) programmer.
- Different turtle shapes, gif-images as turtle shapes, user defined
and user controllable turtle shapes, among them compound
(multicolored) shapes. Turtle shapes can be stretched and tilted, which
makes turtles very versatile geometrical objects.
- Fine control over turtle movement and screen updates via delay(),
and enhanced tracer() and speed() methods.
- Aliases for the most commonly used commands, like fd for forward etc.,
following the early Logo traditions. This reduces the boring work of
typing long sequences of commands, which often occur in a natural way
when kids try to program fancy pictures on their first encounter with
turtle graphics.
- Turtles now have an undo()-method with configurable undo-buffer.
- Some simple commands/methods for creating event driven programs
(mouse-, key-, timer-events). Especially useful for programming games.
- A scrollable Canvas class. The default scrollable Canvas can be
extended interactively as needed while playing around with the turtle(s).
- A TurtleScreen class with methods controlling background color or
background image, window and canvas size and other properties of the
TurtleScreen.
- There is a method, setworldcoordinates(), to install a user defined
coordinate-system for the TurtleScreen.
- The implementation uses a 2-vector class named Vec2D, derived from tuple.
This class is public, so it can be imported by the application programmer,
which makes certain types of computations very natural and compact.
- Appearance of the TurtleScreen and the Turtles at startup/import can be
configured by means of a turtle.cfg configuration file.
The default configuration mimics the appearance of the old turtle module.
- If configured appropriately, the module reads in docstrings from a docstring
dictionary in a different language, supplied separately, and replaces
the English ones with those read in. There is a utility function
write_docstringdict() to write a dictionary with the original (English)
docstrings to disc, so it can serve as a template for translations.
Behind the scenes there are some features included with possible
extensions in mind. These will be commented and documented elsewhere.
"""
_ver = "turtle 1.1b- - for Python 3.1 - 4. 5. 2009"
# print(_ver)
import tkinter as TK
import types
import math
import time
import inspect
import sys
from os.path import isfile, split, join
from copy import deepcopy
from tkinter import simpledialog
_tg_classes = ['ScrolledCanvas', 'TurtleScreen', 'Screen',
'RawTurtle', 'Turtle', 'RawPen', 'Pen', 'Shape', 'Vec2D']
_tg_screen_functions = ['addshape', 'bgcolor', 'bgpic', 'bye',
'clearscreen', 'colormode', 'delay', 'exitonclick', 'getcanvas',
'getshapes', 'listen', 'mainloop', 'mode', 'numinput',
'onkey', 'onkeypress', 'onkeyrelease', 'onscreenclick', 'ontimer',
'register_shape', 'resetscreen', 'screensize', 'setup',
'setworldcoordinates', 'textinput', 'title', 'tracer', 'turtles', 'update',
'window_height', 'window_width']
_tg_turtle_functions = ['back', 'backward', 'begin_fill', 'begin_poly', 'bk',
'circle', 'clear', 'clearstamp', 'clearstamps', 'clone', 'color',
'degrees', 'distance', 'dot', 'down', 'end_fill', 'end_poly', 'fd',
'fillcolor', 'filling', 'forward', 'get_poly', 'getpen', 'getscreen', 'get_shapepoly',
'getturtle', 'goto', 'heading', 'hideturtle', 'home', 'ht', 'isdown',
'isvisible', 'left', 'lt', 'onclick', 'ondrag', 'onrelease', 'pd',
'pen', 'pencolor', 'pendown', 'pensize', 'penup', 'pos', 'position',
'pu', 'radians', 'right', 'reset', 'resizemode', 'rt',
'seth', 'setheading', 'setpos', 'setposition', 'settiltangle',
'setundobuffer', 'setx', 'sety', 'shape', 'shapesize', 'shapetransform', 'shearfactor', 'showturtle',
'speed', 'st', 'stamp', 'tilt', 'tiltangle', 'towards',
'turtlesize', 'undo', 'undobufferentries', 'up', 'width',
'write', 'xcor', 'ycor']
_tg_utilities = ['write_docstringdict', 'done']
__all__ = (_tg_classes + _tg_screen_functions + _tg_turtle_functions +
_tg_utilities + ['Terminator']) # + _math_functions)
_alias_list = ['addshape', 'backward', 'bk', 'fd', 'ht', 'lt', 'pd', 'pos',
'pu', 'rt', 'seth', 'setpos', 'setposition', 'st',
'turtlesize', 'up', 'width']
_CFG = {"width" : 0.5, # Screen
"height" : 0.75,
"canvwidth" : 400,
"canvheight": 300,
"leftright": None,
"topbottom": None,
"mode": "standard", # TurtleScreen
"colormode": 1.0,
"delay": 10,
"undobuffersize": 1000, # RawTurtle
"shape": "classic",
"pencolor" : "black",
"fillcolor" : "black",
"resizemode" : "noresize",
"visible" : True,
"language": "english", # docstrings
"exampleturtle": "turtle",
"examplescreen": "screen",
"title": "Python Turtle Graphics",
"using_IDLE": False
}
def config_dict(filename):
"""Convert content of config-file into dictionary."""
with open(filename, "r") as f:
cfglines = f.readlines()
cfgdict = {}
for line in cfglines:
line = line.strip()
if not line or line.startswith("#"):
continue
try:
key, value = line.split("=")
except:
print("Bad line in config-file %s:\n%s" % (filename,line))
continue
key = key.strip()
value = value.strip()
if value in ["True", "False", "None", "''", '""']:
value = eval(value)
else:
try:
if "." in value:
value = float(value)
else:
value = int(value)
except:
pass # value need not be converted
cfgdict[key] = value
return cfgdict
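# Illustrative turtle.cfg content as parsed by config_dict() (keys mirror
# entries of the _CFG dictionary below):
#   width = 800
#   height = 600
#   shape = turtle
#   visible = True
# which yields {'width': 800, 'height': 600, 'shape': 'turtle', 'visible': True}.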
def readconfig(cfgdict):
"""Read config-files, change configuration-dict accordingly.
If there is a turtle.cfg file in the current working directory,
read it from there. If this contains an importconfig-value,
say 'myway', construct filename turtle_myway.cfg, else use
turtle.cfg and read it from the import-directory, where
turtle.py is located.
Update configuration dictionary first according to config-file
in the import directory, then according to config-file in the
current working directory.
If no config-file is found, the default configuration is used.
"""
default_cfg = "turtle.cfg"
cfgdict1 = {}
cfgdict2 = {}
if isfile(default_cfg):
cfgdict1 = config_dict(default_cfg)
if "importconfig" in cfgdict1:
default_cfg = "turtle_%s.cfg" % cfgdict1["importconfig"]
try:
head, tail = split(__file__)
cfg_file2 = join(head, default_cfg)
except:
cfg_file2 = ""
if isfile(cfg_file2):
cfgdict2 = config_dict(cfg_file2)
_CFG.update(cfgdict2)
_CFG.update(cfgdict1)
try:
readconfig(_CFG)
except:
print ("No configfile read, reason unknown")
class Vec2D(tuple):
"""A 2 dimensional vector class, used as a helper class
for implementing turtle graphics.
May be useful for turtle graphics programs also.
Derived from tuple, so a vector is a tuple!
Provides (for a, b vectors, k number):
a+b vector addition
a-b vector subtraction
a*b inner product
k*a and a*k multiplication with scalar
|a| absolute value of a
a.rotate(angle) rotation
"""
def __new__(cls, x, y):
return tuple.__new__(cls, (x, y))
def __add__(self, other):
return Vec2D(self[0]+other[0], self[1]+other[1])
def __mul__(self, other):
if isinstance(other, Vec2D):
return self[0]*other[0]+self[1]*other[1]
return Vec2D(self[0]*other, self[1]*other)
def __rmul__(self, other):
if isinstance(other, int) or isinstance(other, float):
return Vec2D(self[0]*other, self[1]*other)
def __sub__(self, other):
return Vec2D(self[0]-other[0], self[1]-other[1])
def __neg__(self):
return Vec2D(-self[0], -self[1])
def __abs__(self):
return (self[0]**2 + self[1]**2)**0.5
def rotate(self, angle):
"""rotate self counterclockwise by angle
"""
perp = Vec2D(-self[1], self[0])
angle = angle * math.pi / 180.0
c, s = math.cos(angle), math.sin(angle)
return Vec2D(self[0]*c+perp[0]*s, self[1]*c+perp[1]*s)
def __getnewargs__(self):
return (self[0], self[1])
def __repr__(self):
return "(%.2f,%.2f)" % self
##############################################################################
### From here up to line : Tkinter - Interface for turtle.py ###
### May be replaced by an interface to some different graphics toolkit ###
##############################################################################
## helper functions for Scrolled Canvas, to forward Canvas-methods
## to ScrolledCanvas class
def __methodDict(cls, _dict):
"""helper function for Scrolled Canvas"""
baseList = list(cls.__bases__)
baseList.reverse()
for _super in baseList:
__methodDict(_super, _dict)
for key, value in cls.__dict__.items():
if type(value) == types.FunctionType:
_dict[key] = value
def __methods(cls):
"""helper function for Scrolled Canvas"""
_dict = {}
__methodDict(cls, _dict)
return _dict.keys()
__stringBody = (
'def %(method)s(self, *args, **kw): return ' +
'self.%(attribute)s.%(method)s(*args, **kw)')
def __forwardmethods(fromClass, toClass, toPart, exclude = ()):
### MANY CHANGES ###
_dict_1 = {}
__methodDict(toClass, _dict_1)
_dict = {}
mfc = __methods(fromClass)
for ex in _dict_1.keys():
if ex[:1] == '_' or ex[-1:] == '_' or ex in exclude or ex in mfc:
pass
else:
_dict[ex] = _dict_1[ex]
for method, func in _dict.items():
d = {'method': method, 'func': func}
if isinstance(toPart, str):
execString = \
__stringBody % {'method' : method, 'attribute' : toPart}
exec(execString, d)
setattr(fromClass, method, d[method]) ### NEWU!
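# In effect, the call __forwardmethods(ScrolledCanvas, TK.Canvas, '_canvas')
# below generates, for each public TK.Canvas method that ScrolledCanvas does
# not define itself, a wrapper of the form (illustrative):
#   def coords(self, *args, **kw): return self._canvas.coords(*args, **kw)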
class ScrolledCanvas(TK.Frame):
"""Modeled after the scrolled canvas class from Grayons's Tkinter book.
Used as the default canvas, which pops up automatically when
using turtle graphics functions or the Turtle class.
"""
def __init__(self, master, width=500, height=350,
canvwidth=600, canvheight=500):
TK.Frame.__init__(self, master, width=width, height=height)
self._rootwindow = self.winfo_toplevel()
self.width, self.height = width, height
self.canvwidth, self.canvheight = canvwidth, canvheight
self.bg = "white"
self._canvas = TK.Canvas(master, width=width, height=height,
bg=self.bg, relief=TK.SUNKEN, borderwidth=2)
self.hscroll = TK.Scrollbar(master, command=self._canvas.xview,
orient=TK.HORIZONTAL)
self.vscroll = TK.Scrollbar(master, command=self._canvas.yview)
self._canvas.configure(xscrollcommand=self.hscroll.set,
yscrollcommand=self.vscroll.set)
self.rowconfigure(0, weight=1, minsize=0)
self.columnconfigure(0, weight=1, minsize=0)
self._canvas.grid(padx=1, in_ = self, pady=1, row=0,
column=0, rowspan=1, columnspan=1, sticky='news')
self.vscroll.grid(padx=1, in_ = self, pady=1, row=0,
column=1, rowspan=1, columnspan=1, sticky='news')
self.hscroll.grid(padx=1, in_ = self, pady=1, row=1,
column=0, rowspan=1, columnspan=1, sticky='news')
self.reset()
self._rootwindow.bind('<Configure>', self.onResize)
def reset(self, canvwidth=None, canvheight=None, bg = None):
"""Adjust canvas and scrollbars according to given canvas size."""
if canvwidth:
self.canvwidth = canvwidth
if canvheight:
self.canvheight = canvheight
if bg:
self.bg = bg
self._canvas.config(bg=bg,
scrollregion=(-self.canvwidth//2, -self.canvheight//2,
self.canvwidth//2, self.canvheight//2))
self._canvas.xview_moveto(0.5*(self.canvwidth - self.width + 30) /
self.canvwidth)
self._canvas.yview_moveto(0.5*(self.canvheight- self.height + 30) /
self.canvheight)
self.adjustScrolls()
def adjustScrolls(self):
""" Adjust scrollbars according to window- and canvas-size.
"""
cwidth = self._canvas.winfo_width()
cheight = self._canvas.winfo_height()
self._canvas.xview_moveto(0.5*(self.canvwidth-cwidth)/self.canvwidth)
self._canvas.yview_moveto(0.5*(self.canvheight-cheight)/self.canvheight)
if cwidth < self.canvwidth or cheight < self.canvheight:
self.hscroll.grid(padx=1, in_ = self, pady=1, row=1,
column=0, rowspan=1, columnspan=1, sticky='news')
self.vscroll.grid(padx=1, in_ = self, pady=1, row=0,
column=1, rowspan=1, columnspan=1, sticky='news')
else:
self.hscroll.grid_forget()
self.vscroll.grid_forget()
def onResize(self, event):
"""self-explanatory"""
self.adjustScrolls()
def bbox(self, *args):
""" 'forward' method, which canvas itself has inherited...
"""
return self._canvas.bbox(*args)
def cget(self, *args, **kwargs):
""" 'forward' method, which canvas itself has inherited...
"""
return self._canvas.cget(*args, **kwargs)
def config(self, *args, **kwargs):
""" 'forward' method, which canvas itself has inherited...
"""
self._canvas.config(*args, **kwargs)
def bind(self, *args, **kwargs):
""" 'forward' method, which canvas itself has inherited...
"""
self._canvas.bind(*args, **kwargs)
def unbind(self, *args, **kwargs):
""" 'forward' method, which canvas itself has inherited...
"""
self._canvas.unbind(*args, **kwargs)
def focus_force(self):
""" 'forward' method, which canvas itself has inherited...
"""
self._canvas.focus_force()
__forwardmethods(ScrolledCanvas, TK.Canvas, '_canvas')
class _Root(TK.Tk):
"""Root class for Screen based on Tkinter."""
def __init__(self):
TK.Tk.__init__(self)
def setupcanvas(self, width, height, cwidth, cheight):
self._canvas = ScrolledCanvas(self, width, height, cwidth, cheight)
self._canvas.pack(expand=1, fill="both")
def _getcanvas(self):
return self._canvas
def set_geometry(self, width, height, startx, starty):
self.geometry("%dx%d%+d%+d"%(width, height, startx, starty))
def ondestroy(self, destroy):
self.wm_protocol("WM_DELETE_WINDOW", destroy)
def win_width(self):
return self.winfo_screenwidth()
def win_height(self):
return self.winfo_screenheight()
Canvas = TK.Canvas
class TurtleScreenBase(object):
"""Provide the basic graphics functionality.
Interface between Tkinter and turtle.py.
To port turtle.py to some different graphics toolkit
a corresponding TurtleScreenBase class has to be implemented.
"""
@staticmethod
def _blankimage():
"""return a blank image object
"""
img = TK.PhotoImage(width=1, height=1)
img.blank()
return img
@staticmethod
def _image(filename):
"""return an image object containing the
imagedata from a gif-file named filename.
"""
return TK.PhotoImage(file=filename)
def __init__(self, cv):
self.cv = cv
if isinstance(cv, ScrolledCanvas):
w = self.cv.canvwidth
h = self.cv.canvheight
else: # expected: ordinary TK.Canvas
w = int(self.cv.cget("width"))
h = int(self.cv.cget("height"))
self.cv.config(scrollregion = (-w//2, -h//2, w//2, h//2 ))
self.canvwidth = w
self.canvheight = h
self.xscale = self.yscale = 1.0
def _createpoly(self):
"""Create an invisible polygon item on canvas self.cv)
"""
return self.cv.create_polygon((0, 0, 0, 0, 0, 0), fill="", outline="")
def _drawpoly(self, polyitem, coordlist, fill=None,
outline=None, width=None, top=False):
"""Configure polygonitem polyitem according to provided
arguments:
coordlist is sequence of coordinates
fill is filling color
outline is outline color
top is a boolean value, which specifies if polyitem
will be put on top of the canvas' displaylist so it
will not be covered by other items.
"""
cl = []
for x, y in coordlist:
cl.append(x * self.xscale)
cl.append(-y * self.yscale)
self.cv.coords(polyitem, *cl)
if fill is not None:
self.cv.itemconfigure(polyitem, fill=fill)
if outline is not None:
self.cv.itemconfigure(polyitem, outline=outline)
if width is not None:
self.cv.itemconfigure(polyitem, width=width)
if top:
self.cv.tag_raise(polyitem)
def _createline(self):
"""Create an invisible line item on canvas self.cv)
"""
return self.cv.create_line(0, 0, 0, 0, fill="", width=2,
capstyle = TK.ROUND)
def _drawline(self, lineitem, coordlist=None,
fill=None, width=None, top=False):
"""Configure lineitem according to provided arguments:
coordlist is sequence of coordinates
fill is drawing color
width is width of drawn line.
top is a boolean value, which specifies if polyitem
will be put on top of the canvas' displaylist so it
will not be covered by other items.
"""
if coordlist is not None:
cl = []
for x, y in coordlist:
cl.append(x * self.xscale)
cl.append(-y * self.yscale)
self.cv.coords(lineitem, *cl)
if fill is not None:
self.cv.itemconfigure(lineitem, fill=fill)
if width is not None:
self.cv.itemconfigure(lineitem, width=width)
if top:
self.cv.tag_raise(lineitem)
def _delete(self, item):
"""Delete graphics item from canvas.
If item is"all" delete all graphics items.
"""
self.cv.delete(item)
def _update(self):
"""Redraw graphics items on canvas
"""
self.cv.update()
def _delay(self, delay):
"""Delay subsequent canvas actions for delay ms."""
self.cv.after(delay)
def _iscolorstring(self, color):
"""Check if the string color is a legal Tkinter color string.
"""
try:
rgb = self.cv.winfo_rgb(color)
ok = True
except TK.TclError:
ok = False
return ok
def _bgcolor(self, color=None):
"""Set canvas' backgroundcolor if color is not None,
else return backgroundcolor."""
if color is not None:
self.cv.config(bg = color)
self._update()
else:
return self.cv.cget("bg")
def _write(self, pos, txt, align, font, pencolor):
"""Write txt at pos in canvas with specified font
and color.
Return text item and x-coord of right bottom corner
of text's bounding box."""
x, y = pos
x = x * self.xscale
y = y * self.yscale
anchor = {"left":"sw", "center":"s", "right":"se" }
item = self.cv.create_text(x-1, -y, text = txt, anchor = anchor[align],
fill = pencolor, font = font)
x0, y0, x1, y1 = self.cv.bbox(item)
self.cv.update()
return item, x1-1
## def _dot(self, pos, size, color):
## """may be implemented for some other graphics toolkit"""
def _onclick(self, item, fun, num=1, add=None):
"""Bind fun to mouse-click event on turtle.
fun must be a function with two arguments, the coordinates
of the clicked point on the canvas.
num, the number of the mouse-button defaults to 1
"""
if fun is None:
self.cv.tag_unbind(item, "<Button-%s>" % num)
else:
def eventfun(event):
x, y = (self.cv.canvasx(event.x)/self.xscale,
-self.cv.canvasy(event.y)/self.yscale)
fun(x, y)
self.cv.tag_bind(item, "<Button-%s>" % num, eventfun, add)
def _onrelease(self, item, fun, num=1, add=None):
"""Bind fun to mouse-button-release event on turtle.
fun must be a function with two arguments, the coordinates
of the point on the canvas where mouse button is released.
num, the number of the mouse-button defaults to 1
If a turtle is clicked, first _onclick-event will be performed,
then _onscreenclick-event.
"""
if fun is None:
self.cv.tag_unbind(item, "<Button%s-ButtonRelease>" % num)
else:
def eventfun(event):
x, y = (self.cv.canvasx(event.x)/self.xscale,
-self.cv.canvasy(event.y)/self.yscale)
fun(x, y)
self.cv.tag_bind(item, "<Button%s-ButtonRelease>" % num,
eventfun, add)
def _ondrag(self, item, fun, num=1, add=None):
"""Bind fun to mouse-move-event (with pressed mouse button) on turtle.
fun must be a function with two arguments, the coordinates of the
actual mouse position on the canvas.
num, the number of the mouse-button defaults to 1
Every sequence of mouse-move-events on a turtle is preceded by a
mouse-click event on that turtle.
"""
if fun is None:
self.cv.tag_unbind(item, "<Button%s-Motion>" % num)
else:
def eventfun(event):
try:
x, y = (self.cv.canvasx(event.x)/self.xscale,
-self.cv.canvasy(event.y)/self.yscale)
fun(x, y)
except:
pass
self.cv.tag_bind(item, "<Button%s-Motion>" % num, eventfun, add)
def _onscreenclick(self, fun, num=1, add=None):
"""Bind fun to mouse-click event on canvas.
fun must be a function with two arguments, the coordinates
of the clicked point on the canvas.
num, the number of the mouse-button defaults to 1
If a turtle is clicked, first _onclick-event will be performed,
then _onscreenclick-event.
"""
if fun is None:
self.cv.unbind("<Button-%s>" % num)
else:
def eventfun(event):
x, y = (self.cv.canvasx(event.x)/self.xscale,
-self.cv.canvasy(event.y)/self.yscale)
fun(x, y)
self.cv.bind("<Button-%s>" % num, eventfun, add)
def _onkeyrelease(self, fun, key):
"""Bind fun to key-release event of key.
Canvas must have focus. See method listen
"""
if fun is None:
self.cv.unbind("<KeyRelease-%s>" % key, None)
else:
def eventfun(event):
fun()
self.cv.bind("<KeyRelease-%s>" % key, eventfun)
def _onkeypress(self, fun, key=None):
"""If key is given, bind fun to key-press event of key.
Otherwise bind fun to any key-press.
Canvas must have focus. See method listen.
"""
if fun is None:
if key is None:
self.cv.unbind("<KeyPress>", None)
else:
self.cv.unbind("<KeyPress-%s>" % key, None)
else:
def eventfun(event):
fun()
if key is None:
self.cv.bind("<KeyPress>", eventfun)
else:
self.cv.bind("<KeyPress-%s>" % key, eventfun)
def _listen(self):
"""Set focus on canvas (in order to collect key-events)
"""
self.cv.focus_force()
def _ontimer(self, fun, t):
"""Install a timer, which calls fun after t milliseconds.
"""
if t == 0:
self.cv.after_idle(fun)
else:
self.cv.after(t, fun)
def _createimage(self, image):
"""Create and return image item on canvas.
"""
return self.cv.create_image(0, 0, image=image)
def _drawimage(self, item, pos, image):
"""Configure image item as to draw image object
at position (x,y) on canvas.
"""
x, y = pos
self.cv.coords(item, (x * self.xscale, -y * self.yscale))
self.cv.itemconfig(item, image=image)
def _setbgpic(self, item, image):
"""Configure image item as to draw image object
at center of canvas. Set item to the first item
in the displaylist, so it will be drawn below
any other item ."""
self.cv.itemconfig(item, image=image)
self.cv.tag_lower(item)
def _type(self, item):
"""Return 'line' or 'polygon' or 'image' depending on
type of item.
"""
return self.cv.type(item)
def _pointlist(self, item):
"""returns list of coordinate-pairs of points of item
Example (for insiders):
>>> from turtle import *
>>> getscreen()._pointlist(getturtle().turtle._item)
[(0.0, 9.9999999999999982), (0.0, -9.9999999999999982),
(9.9999999999999982, 0.0)]
>>> """
cl = self.cv.coords(item)
pl = [(cl[i], -cl[i+1]) for i in range(0, len(cl), 2)]
return pl
def _setscrollregion(self, srx1, sry1, srx2, sry2):
self.cv.config(scrollregion=(srx1, sry1, srx2, sry2))
def _rescale(self, xscalefactor, yscalefactor):
items = self.cv.find_all()
for item in items:
coordinates = list(self.cv.coords(item))
newcoordlist = []
while coordinates:
x, y = coordinates[:2]
newcoordlist.append(x * xscalefactor)
newcoordlist.append(y * yscalefactor)
coordinates = coordinates[2:]
self.cv.coords(item, *newcoordlist)
def _resize(self, canvwidth=None, canvheight=None, bg=None):
"""Resize the canvas the turtles are drawing on. Does
not alter the drawing window.
"""
# needs amendment
if not isinstance(self.cv, ScrolledCanvas):
return self.canvwidth, self.canvheight
if canvwidth is canvheight is bg is None:
return self.cv.canvwidth, self.cv.canvheight
if canvwidth is not None:
self.canvwidth = canvwidth
if canvheight is not None:
self.canvheight = canvheight
self.cv.reset(canvwidth, canvheight, bg)
def _window_size(self):
""" Return the width and height of the turtle window.
"""
width = self.cv.winfo_width()
if width <= 1: # the window isn't managed by a geometry manager
width = self.cv['width']
height = self.cv.winfo_height()
if height <= 1: # the window isn't managed by a geometry manager
height = self.cv['height']
return width, height
def mainloop(self):
"""Starts event loop - calling Tkinter's mainloop function.
No argument.
Must be last statement in a turtle graphics program.
Must NOT be used if a script is run from within IDLE in -n mode
(No subprocess) - for interactive use of turtle graphics.
Example (for a TurtleScreen instance named screen):
>>> screen.mainloop()
"""
TK.mainloop()
def textinput(self, title, prompt):
"""Pop up a dialog window for input of a string.
Arguments: title is the title of the dialog window,
prompt is a text mostly describing what information to input.
Return the string input
If the dialog is canceled, return None.
Example (for a TurtleScreen instance named screen):
>>> screen.textinput("NIM", "Name of first player:")
"""
return simpledialog.askstring(title, prompt)
def numinput(self, title, prompt, default=None, minval=None, maxval=None):
"""Pop up a dialog window for input of a number.
Arguments: title is the title of the dialog window,
prompt is a text mostly describing what numerical information to input.
default: default value
minval: minimum value for input
maxval: maximum value for input
The number input must be in the range minval .. maxval if these are
given. If not, a hint is issued and the dialog remains open for
correction. Return the number input.
If the dialog is canceled, return None.
Example (for a TurtleScreen instance named screen):
>>> screen.numinput("Poker", "Your stakes:", 1000, minval=10, maxval=10000)
"""
return simpledialog.askfloat(title, prompt, initialvalue=default,
minvalue=minval, maxvalue=maxval)
##############################################################################
### End of Tkinter - interface ###
##############################################################################
class Terminator (Exception):
"""Will be raised in TurtleScreen.update, if _RUNNING becomes False.
This stops execution of a turtle graphics script.
Main purpose: use in the Demo-Viewer turtle.Demo.py.
"""
pass
class TurtleGraphicsError(Exception):
"""Some TurtleGraphics Error
"""
class Shape(object):
"""Data structure modeling shapes.
attribute _type is one of "polygon", "image", "compound"
attribute _data is - depending on _type - a polygon-tuple,
an image or a list constructed using the addcomponent method.
"""
def __init__(self, type_, data=None):
self._type = type_
if type_ == "polygon":
if isinstance(data, list):
data = tuple(data)
elif type_ == "image":
if isinstance(data, str):
if data.lower().endswith(".gif") and isfile(data):
data = TurtleScreen._image(data)
# else data assumed to be Photoimage
elif type_ == "compound":
data = []
else:
raise TurtleGraphicsError("There is no shape type %s" % type_)
self._data = data
def addcomponent(self, poly, fill, outline=None):
"""Add component to a shape of type compound.
Arguments: poly is a polygon, i. e. a tuple of number pairs.
fill is the fillcolor of the component,
outline is the outline color of the component.
call (for a Shape object named s):
-- s.addcomponent(((0,0), (10,10), (-10,10)), "red", "blue")
Example:
>>> poly = ((0,0),(10,-5),(0,10),(-10,-5))
>>> s = Shape("compound")
>>> s.addcomponent(poly, "red", "blue")
>>> # .. add more components and then use register_shape()
"""
if self._type != "compound":
raise TurtleGraphicsError("Cannot add component to %s Shape"
% self._type)
if outline is None:
outline = fill
self._data.append([poly, fill, outline])
class Tbuffer(object):
"""Ring buffer used as undobuffer for RawTurtle objects."""
def __init__(self, bufsize=10):
self.bufsize = bufsize
self.buffer = [[None]] * bufsize
self.ptr = -1
self.cumulate = False
def reset(self, bufsize=None):
if bufsize is None:
for i in range(self.bufsize):
self.buffer[i] = [None]
else:
self.bufsize = bufsize
self.buffer = [[None]] * bufsize
self.ptr = -1
def push(self, item):
if self.bufsize > 0:
if not self.cumulate:
self.ptr = (self.ptr + 1) % self.bufsize
self.buffer[self.ptr] = item
else:
self.buffer[self.ptr].append(item)
def pop(self):
if self.bufsize > 0:
item = self.buffer[self.ptr]
if item is None:
return None
else:
self.buffer[self.ptr] = [None]
self.ptr = (self.ptr - 1) % self.bufsize
return item
def nr_of_items(self):
return self.bufsize - self.buffer.count([None])
def __repr__(self):
return str(self.buffer) + " " + str(self.ptr)
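# Editorial sketch (not part of the original module): how Tbuffer behaves as
# a ring buffer. Pushing into a full buffer overwrites the oldest slot;
# pop() returns the most recent entry and frees its slot.
def _tbuffer_demo():
    # hypothetical demo helper, illustration only
    buf = Tbuffer(3)
    for entry in ("a", "b", "c", "d"):   # the fourth push overwrites "a"
        buf.push(entry)
    assert buf.pop() == "d"              # most recent entry comes back first
    assert buf.nr_of_items() == 2        # "b" and "c" remain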
class TurtleScreen(TurtleScreenBase):
"""Provides screen oriented methods like setbg etc.
Only relies upon the methods of TurtleScreenBase and NOT
upon components of the underlying graphics toolkit -
which is Tkinter in this case.
"""
_RUNNING = True
def __init__(self, cv, mode=_CFG["mode"],
colormode=_CFG["colormode"], delay=_CFG["delay"]):
self._shapes = {
"arrow" : Shape("polygon", ((-10,0), (10,0), (0,10))),
"turtle" : Shape("polygon", ((0,16), (-2,14), (-1,10), (-4,7),
(-7,9), (-9,8), (-6,5), (-7,1), (-5,-3), (-8,-6),
(-6,-8), (-4,-5), (0,-7), (4,-5), (6,-8), (8,-6),
(5,-3), (7,1), (6,5), (9,8), (7,9), (4,7), (1,10),
(2,14))),
"circle" : Shape("polygon", ((10,0), (9.51,3.09), (8.09,5.88),
(5.88,8.09), (3.09,9.51), (0,10), (-3.09,9.51),
(-5.88,8.09), (-8.09,5.88), (-9.51,3.09), (-10,0),
(-9.51,-3.09), (-8.09,-5.88), (-5.88,-8.09),
(-3.09,-9.51), (-0.00,-10.00), (3.09,-9.51),
(5.88,-8.09), (8.09,-5.88), (9.51,-3.09))),
"square" : Shape("polygon", ((10,-10), (10,10), (-10,10),
(-10,-10))),
"triangle" : Shape("polygon", ((10,-5.77), (0,11.55),
(-10,-5.77))),
"classic": Shape("polygon", ((0,0),(-5,-9),(0,-7),(5,-9))),
"blank" : Shape("image", self._blankimage())
}
self._bgpics = {"nopic" : ""}
TurtleScreenBase.__init__(self, cv)
self._mode = mode
self._delayvalue = delay
self._colormode = _CFG["colormode"]
self._keys = []
self.clear()
if sys.platform == 'darwin':
# Force Turtle window to the front on OS X. This is needed because
# the Turtle window will show behind the Terminal window when you
# start the demo from the command line.
rootwindow = cv.winfo_toplevel()
rootwindow.call('wm', 'attributes', '.', '-topmost', '1')
rootwindow.call('wm', 'attributes', '.', '-topmost', '0')
def clear(self):
"""Delete all drawings and all turtles from the TurtleScreen.
No argument.
Reset empty TurtleScreen to its initial state: white background,
no backgroundimage, no eventbindings and tracing on.
Example (for a TurtleScreen instance named screen):
>>> screen.clear()
Note: this method is not available as function.
"""
self._delayvalue = _CFG["delay"]
self._colormode = _CFG["colormode"]
self._delete("all")
self._bgpic = self._createimage("")
self._bgpicname = "nopic"
self._tracing = 1
self._updatecounter = 0
self._turtles = []
self.bgcolor("white")
for btn in 1, 2, 3:
self.onclick(None, btn)
self.onkeypress(None)
for key in self._keys[:]:
self.onkey(None, key)
self.onkeypress(None, key)
Turtle._pen = None
def mode(self, mode=None):
"""Set turtle-mode ('standard', 'logo' or 'world') and perform reset.
Optional argument:
mode -- one of the strings 'standard', 'logo' or 'world'
Mode 'standard' is compatible with turtle.py.
Mode 'logo' is compatible with most Logo-Turtle-Graphics.
Mode 'world' uses userdefined 'worldcoordinates'. *Attention*: in
this mode angles appear distorted if x/y unit-ratio doesn't equal 1.
If mode is not given, return the current mode.
Mode           Initial turtle heading     positive angles
------------|-------------------------|-------------------
'standard'    to the right (east)       counterclockwise
'logo'        upward (north)            clockwise
Examples:
>>> mode('logo') # resets turtle heading to north
>>> mode()
'logo'
"""
if mode is None:
return self._mode
mode = mode.lower()
if mode not in ["standard", "logo", "world"]:
raise TurtleGraphicsError("No turtle-graphics-mode %s" % mode)
self._mode = mode
if mode in ["standard", "logo"]:
self._setscrollregion(-self.canvwidth//2, -self.canvheight//2,
self.canvwidth//2, self.canvheight//2)
self.xscale = self.yscale = 1.0
self.reset()
def setworldcoordinates(self, llx, lly, urx, ury):
"""Set up a user defined coordinate-system.
Arguments:
llx -- a number, x-coordinate of lower left corner of canvas
lly -- a number, y-coordinate of lower left corner of canvas
urx -- a number, x-coordinate of upper right corner of canvas
ury -- a number, y-coordinate of upper right corner of canvas
Set up the user coordinate system and switch to mode 'world' if necessary.
This performs a screen.reset. If mode 'world' is already active,
all drawings are redrawn according to the new coordinates.
But ATTENTION: in user-defined coordinate systems angles may appear
distorted. (see Screen.mode())
Example (for a TurtleScreen instance named screen):
>>> screen.setworldcoordinates(-10,-0.5,50,1.5)
>>> for _ in range(36):
... left(10)
... forward(0.5)
"""
if self.mode() != "world":
self.mode("world")
xspan = float(urx - llx)
yspan = float(ury - lly)
wx, wy = self._window_size()
self.screensize(wx-20, wy-20)
oldxscale, oldyscale = self.xscale, self.yscale
self.xscale = self.canvwidth / xspan
self.yscale = self.canvheight / yspan
srx1 = llx * self.xscale
sry1 = -ury * self.yscale
srx2 = self.canvwidth + srx1
sry2 = self.canvheight + sry1
self._setscrollregion(srx1, sry1, srx2, sry2)
self._rescale(self.xscale/oldxscale, self.yscale/oldyscale)
self.update()
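# Editorial sketch (not part of the original module): the scale factors
# computed above are canvas pixels per world unit. With a hypothetical
# 620x460 canvas and setworldcoordinates(-10, -0.5, 50, 1.5):
#     xscale = 620 / (50 - (-10))   = 620 / 60 ~ 10.33 px per x-unit
#     yscale = 460 / (1.5 - (-0.5)) = 460 / 2  = 230   px per y-unit
# The unequal scales are exactly why angles appear distorted in mode 'world'.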
def register_shape(self, name, shape=None):
"""Adds a turtle shape to TurtleScreen's shapelist.
Arguments:
(1) name is the name of a gif-file and shape is None.
Installs the corresponding image shape.
!! Image-shapes DO NOT rotate when turning the turtle,
!! so they do not display the heading of the turtle!
(2) name is an arbitrary string and shape is a tuple
of pairs of coordinates. Installs the corresponding
polygon shape
(3) name is an arbitrary string and shape is a
(compound) Shape object. Installs the corresponding
compound shape.
To use a shape, you have to issue the command shape(shapename).
call: register_shape("turtle.gif")
--or: register_shape("tri", ((0,0), (10,10), (-10,10)))
Example (for a TurtleScreen instance named screen):
>>> screen.register_shape("triangle", ((5,-3),(0,5),(-5,-3)))
"""
if shape is None:
# image
if name.lower().endswith(".gif"):
shape = Shape("image", self._image(name))
else:
raise TurtleGraphicsError("Bad arguments for register_shape.\n"
+ "Use help(register_shape)" )
elif isinstance(shape, tuple):
shape = Shape("polygon", shape)
## else shape assumed to be Shape-instance
self._shapes[name] = shape
def _colorstr(self, color):
"""Return color string corresponding to args.
Argument may be a string or a tuple of three
numbers corresponding to actual colormode,
i.e. in the range 0<=n<=colormode.
If the argument doesn't represent a color,
an error is raised.
"""
if len(color) == 1:
color = color[0]
if isinstance(color, str):
if self._iscolorstring(color) or color == "":
return color
else:
raise TurtleGraphicsError("bad color string: %s" % str(color))
try:
r, g, b = color
except (TypeError, ValueError):
raise TurtleGraphicsError("bad color arguments: %s" % str(color))
if self._colormode == 1.0:
r, g, b = [round(255.0*x) for x in (r, g, b)]
if not ((0 <= r <= 255) and (0 <= g <= 255) and (0 <= b <= 255)):
raise TurtleGraphicsError("bad color sequence: %s" % str(color))
return "#%02x%02x%02x" % (r, g, b)
def _color(self, cstr):
if not cstr.startswith("#"):
return cstr
if len(cstr) == 7:
cl = [int(cstr[i:i+2], 16) for i in (1, 3, 5)]
elif len(cstr) == 4:
cl = [16*int(cstr[h], 16) for h in cstr[1:]]
else:
raise TurtleGraphicsError("bad colorstring: %s" % cstr)
return tuple([c * self._colormode/255 for c in cl])
def colormode(self, cmode=None):
"""Return the colormode or set it to 1.0 or 255.
Optional argument:
cmode -- one of the values 1.0 or 255
r, g, b values of colortriples have to be in range 0..cmode.
Example (for a TurtleScreen instance named screen):
>>> screen.colormode()
1.0
>>> screen.colormode(255)
>>> pencolor(240,160,80)
"""
if cmode is None:
return self._colormode
if cmode == 1.0:
self._colormode = float(cmode)
elif cmode == 255:
self._colormode = int(cmode)
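# Editorial sketch (not part of the original module): what _colorstr above
# does with a colortriple under each colormode, as a hypothetical
# standalone function.
def _hex_from_triple(r, g, b, cmode=1.0):
    # illustration only -- mirrors the arithmetic in TurtleScreen._colorstr
    if cmode == 1.0:                     # 0..1 floats are scaled to 0..255
        r, g, b = [round(255.0 * x) for x in (r, g, b)]
    return "#%02x%02x%02x" % (r, g, b)
# _hex_from_triple(0.5, 0, 0.5)        ->  '#800080'
# _hex_from_triple(240, 160, 80, 255)  ->  '#f0a050'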
def reset(self):
"""Reset all Turtles on the Screen to their initial state.
No argument.
Example (for a TurtleScreen instance named screen):
>>> screen.reset()
"""
for turtle in self._turtles:
turtle._setmode(self._mode)
turtle.reset()
def turtles(self):
"""Return the list of turtles on the screen.
Example (for a TurtleScreen instance named screen):
>>> screen.turtles()
[<turtle.Turtle object at 0x00E11FB0>]
"""
return self._turtles
def bgcolor(self, *args):
"""Set or return backgroundcolor of the TurtleScreen.
Arguments (if given): a color string or three numbers
in the range 0..colormode or a 3-tuple of such numbers.
Example (for a TurtleScreen instance named screen):
>>> screen.bgcolor("orange")
>>> screen.bgcolor()
'orange'
>>> screen.bgcolor(0.5,0,0.5)
>>> screen.bgcolor()
'#800080'
"""
if args:
color = self._colorstr(args)
else:
color = None
color = self._bgcolor(color)
if color is not None:
color = self._color(color)
return color
def tracer(self, n=None, delay=None):
"""Turns turtle animation on/off and set delay for update drawings.
Optional arguments:
n -- nonnegative integer
delay -- nonnegative integer
If n is given, only each n-th regular screen update is really performed.
(Can be used to accelerate the drawing of complex graphics.)
Second argument sets delay value (see RawTurtle.delay())
Example (for a TurtleScreen instance named screen):
>>> screen.tracer(8, 25)
>>> dist = 2
>>> for i in range(200):
... fd(dist)
... rt(90)
... dist += 2
"""
if n is None:
return self._tracing
self._tracing = int(n)
self._updatecounter = 0
if delay is not None:
self._delayvalue = int(delay)
if self._tracing:
self.update()
def delay(self, delay=None):
""" Return or set the drawing delay in milliseconds.
Optional argument:
delay -- positive integer
Example (for a TurtleScreen instance named screen):
>>> screen.delay(15)
>>> screen.delay()
15
"""
if delay is None:
return self._delayvalue
self._delayvalue = int(delay)
def _incrementudc(self):
"""Increment update counter."""
if not TurtleScreen._RUNNING:
TurtleScreen._RUNNING = True
raise Terminator
if self._tracing > 0:
self._updatecounter += 1
self._updatecounter %= self._tracing
def update(self):
"""Perform a TurtleScreen update.
"""
tracing = self._tracing
self._tracing = True
for t in self.turtles():
t._update_data()
t._drawturtle()
self._tracing = tracing
self._update()
def window_width(self):
""" Return the width of the turtle window.
Example (for a TurtleScreen instance named screen):
>>> screen.window_width()
640
"""
return self._window_size()[0]
def window_height(self):
""" Return the height of the turtle window.
Example (for a TurtleScreen instance named screen):
>>> screen.window_height()
480
"""
return self._window_size()[1]
def getcanvas(self):
"""Return the Canvas of this TurtleScreen.
No argument.
Example (for a Screen instance named screen):
>>> cv = screen.getcanvas()
>>> cv
<turtle.ScrolledCanvas instance at 0x010742D8>
"""
return self.cv
def getshapes(self):
"""Return a list of names of all currently available turtle shapes.
No argument.
Example (for a TurtleScreen instance named screen):
>>> screen.getshapes()
['arrow', 'blank', 'circle', ... , 'turtle']
"""
return sorted(self._shapes.keys())
def onclick(self, fun, btn=1, add=None):
"""Bind fun to mouse-click event on canvas.
Arguments:
fun -- a function with two arguments, the coordinates of the
clicked point on the canvas.
btn -- the number of the mouse-button, defaults to 1
Example (for a TurtleScreen instance named screen)
>>> screen.onclick(goto)
>>> # Subsequently clicking into the TurtleScreen will
>>> # make the turtle move to the clicked point.
>>> screen.onclick(None)
"""
self._onscreenclick(fun, btn, add)
def onkey(self, fun, key):
"""Bind fun to key-release event of key.
Arguments:
fun -- a function with no arguments
key -- a string: key (e.g. "a") or key-symbol (e.g. "space")
In order to be able to register key-events, TurtleScreen
must have focus. (See method listen.)
Example (for a TurtleScreen instance named screen):
>>> def f():
... fd(50)
... lt(60)
...
>>> screen.onkey(f, "Up")
>>> screen.listen()
Subsequently the turtle can be moved by repeatedly pressing
the up-arrow key, consequently drawing a hexagon
"""
if fun is None:
if key in self._keys:
self._keys.remove(key)
elif key not in self._keys:
self._keys.append(key)
self._onkeyrelease(fun, key)
def onkeypress(self, fun, key=None):
"""Bind fun to key-press event of key if key is given,
or to any key-press-event if no key is given.
Arguments:
fun -- a function with no arguments
key -- a string: key (e.g. "a") or key-symbol (e.g. "space")
In order to be able to register key-events, TurtleScreen
must have focus. (See method listen.)
Example (for a TurtleScreen instance named screen
and a Turtle instance named turtle):
>>> def f():
... fd(50)
... lt(60)
...
>>> screen.onkeypress(f, "Up")
>>> screen.listen()
Subsequently the turtle can be moved by repeatedly pressing
the up-arrow key, or by keeping pressed the up-arrow key.
consequently drawing a hexagon.
"""
if fun is None:
if key in self._keys:
self._keys.remove(key)
elif key is not None and key not in self._keys:
self._keys.append(key)
self._onkeypress(fun, key)
def listen(self, xdummy=None, ydummy=None):
"""Set focus on TurtleScreen (in order to collect key-events)
No arguments.
Dummy arguments are provided in order
to be able to pass listen to the onclick method.
Example (for a TurtleScreen instance named screen):
>>> screen.listen()
"""
self._listen()
def ontimer(self, fun, t=0):
"""Install a timer, which calls fun after t milliseconds.
Arguments:
fun -- a function with no arguments.
t -- a number >= 0
Example (for a TurtleScreen instance named screen):
>>> running = True
>>> def f():
... if running:
... fd(50)
... lt(60)
... screen.ontimer(f, 250)
...
>>> f() # makes the turtle march around
>>> running = False
"""
self._ontimer(fun, t)
def bgpic(self, picname=None):
"""Set background image or return name of current backgroundimage.
Optional argument:
picname -- a string, name of a gif-file or "nopic".
If picname is a filename, set the corresponding image as background.
If picname is "nopic", delete backgroundimage, if present.
If picname is None, return the filename of the current backgroundimage.
Example (for a TurtleScreen instance named screen):
>>> screen.bgpic()
'nopic'
>>> screen.bgpic("landscape.gif")
>>> screen.bgpic()
'landscape.gif'
"""
if picname is None:
return self._bgpicname
if picname not in self._bgpics:
self._bgpics[picname] = self._image(picname)
self._setbgpic(self._bgpic, self._bgpics[picname])
self._bgpicname = picname
def screensize(self, canvwidth=None, canvheight=None, bg=None):
"""Resize the canvas the turtles are drawing on.
Optional arguments:
canvwidth -- positive integer, new width of canvas in pixels
canvheight -- positive integer, new height of canvas in pixels
bg -- colorstring or color-tuple, new backgroundcolor
If no arguments are given, return current (canvaswidth, canvasheight)
Do not alter the drawing window. To observe hidden parts of
the canvas use the scrollbars. (Can make visible those parts
of a drawing, which were outside the canvas before!)
Example (for a Turtle instance named turtle):
>>> turtle.screensize(2000,1500)
>>> # e.g. to search for an erroneously escaped turtle ;-)
"""
return self._resize(canvwidth, canvheight, bg)
onscreenclick = onclick
resetscreen = reset
clearscreen = clear
addshape = register_shape
onkeyrelease = onkey
class TNavigator(object):
"""Navigation part of the RawTurtle.
Implements methods for turtle movement.
"""
START_ORIENTATION = {
"standard": Vec2D(1.0, 0.0),
"world" : Vec2D(1.0, 0.0),
"logo" : Vec2D(0.0, 1.0) }
DEFAULT_MODE = "standard"
DEFAULT_ANGLEOFFSET = 0
DEFAULT_ANGLEORIENT = 1
def __init__(self, mode=DEFAULT_MODE):
self._angleOffset = self.DEFAULT_ANGLEOFFSET
self._angleOrient = self.DEFAULT_ANGLEORIENT
self._mode = mode
self.undobuffer = None
self.degrees()
self._mode = None
self._setmode(mode)
TNavigator.reset(self)
def reset(self):
"""reset turtle to its initial values
Will be overwritten by parent class
"""
self._position = Vec2D(0.0, 0.0)
self._orient = TNavigator.START_ORIENTATION[self._mode]
def _setmode(self, mode=None):
"""Set turtle-mode to 'standard', 'world' or 'logo'.
"""
if mode is None:
return self._mode
if mode not in ["standard", "logo", "world"]:
return
self._mode = mode
if mode in ["standard", "world"]:
self._angleOffset = 0
self._angleOrient = 1
else: # mode == "logo":
self._angleOffset = self._fullcircle/4.
self._angleOrient = -1
def _setDegreesPerAU(self, fullcircle):
"""Helper function for degrees() and radians()"""
self._fullcircle = fullcircle
self._degreesPerAU = 360/fullcircle
if self._mode == "standard":
self._angleOffset = 0
else:
self._angleOffset = fullcircle/4.
def degrees(self, fullcircle=360.0):
""" Set angle measurement units to degrees.
Optional argument:
fullcircle - a number
Set angle measurement units, i. e. set number
of 'degrees' for a full circle. Default value is
360 degrees.
Example (for a Turtle instance named turtle):
>>> turtle.left(90)
>>> turtle.heading()
90
Change angle measurement unit to grad (also known as gon,
grade, or gradian; one grad equals 1/100 of a right angle.)
>>> turtle.degrees(400.0)
>>> turtle.heading()
100
"""
self._setDegreesPerAU(fullcircle)
def radians(self):
""" Set the angle measurement units to radians.
No arguments.
Example (for a Turtle instance named turtle):
>>> turtle.heading()
90
>>> turtle.radians()
>>> turtle.heading()
1.5707963267948966
"""
self._setDegreesPerAU(2*math.pi)
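# Editorial sketch (not part of the original module): _setDegreesPerAU makes
# one "angle unit" equal to 360/fullcircle standard degrees, so degrees,
# grads and radians are all linear rescalings of one another.
def _to_standard_degrees(angle, fullcircle=360.0):
    # hypothetical helper, illustration only
    return angle * (360.0 / fullcircle)
# _to_standard_degrees(100, 400.0)            ->  90.0   (100 grad == 90 deg)
# _to_standard_degrees(math.pi/2, 2*math.pi)  ->  ~90.0  (pi/2 rad == 90 deg)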
def _go(self, distance):
"""move turtle forward by specified distance"""
ende = self._position + self._orient * distance
self._goto(ende)
def _rotate(self, angle):
"""Turn turtle counterclockwise by specified angle if angle > 0."""
angle *= self._degreesPerAU
self._orient = self._orient.rotate(angle)
def _goto(self, end):
"""move turtle to position end."""
self._position = end
def forward(self, distance):
"""Move the turtle forward by the specified distance.
Aliases: forward | fd
Argument:
distance -- a number (integer or float)
Move the turtle forward by the specified distance, in the direction
the turtle is headed.
Example (for a Turtle instance named turtle):
>>> turtle.position()
(0.00, 0.00)
>>> turtle.forward(25)
>>> turtle.position()
(25.00,0.00)
>>> turtle.forward(-75)
>>> turtle.position()
(-50.00,0.00)
"""
self._go(distance)
def back(self, distance):
"""Move the turtle backward by distance.
Aliases: back | backward | bk
Argument:
distance -- a number
Move the turtle backward by distance, opposite to the direction the
turtle is headed. Do not change the turtle's heading.
Example (for a Turtle instance named turtle):
>>> turtle.position()
(0.00, 0.00)
>>> turtle.backward(30)
>>> turtle.position()
(-30.00, 0.00)
"""
self._go(-distance)
def right(self, angle):
"""Turn turtle right by angle units.
Aliases: right | rt
Argument:
angle -- a number (integer or float)
Turn turtle right by angle units. (Units are by default degrees,
but can be set via the degrees() and radians() functions.)
Angle orientation depends on mode. (See mode().)
Example (for a Turtle instance named turtle):
>>> turtle.heading()
22.0
>>> turtle.right(45)
>>> turtle.heading()
337.0
"""
self._rotate(-angle)
def left(self, angle):
"""Turn turtle left by angle units.
Aliases: left | lt
Argument:
angle -- a number (integer or float)
Turn turtle left by angle units. (Units are by default degrees,
but can be set via the degrees() and radians() functions.)
Angle orientation depends on mode. (See mode().)
Example (for a Turtle instance named turtle):
>>> turtle.heading()
22.0
>>> turtle.left(45)
>>> turtle.heading()
67.0
"""
self._rotate(angle)
def pos(self):
"""Return the turtle's current location (x,y), as a Vec2D-vector.
Aliases: pos | position
No arguments.
Example (for a Turtle instance named turtle):
>>> turtle.pos()
(0.00, 240.00)
"""
return self._position
def xcor(self):
""" Return the turtle's x coordinate.
No arguments.
Example (for a Turtle instance named turtle):
>>> reset()
>>> turtle.left(60)
>>> turtle.forward(100)
>>> print(turtle.xcor())
50.0
"""
return self._position[0]
def ycor(self):
""" Return the turtle's y coordinate
No arguments.
Example (for a Turtle instance named turtle):
>>> reset()
>>> turtle.left(60)
>>> turtle.forward(100)
>>> print(turtle.ycor())
86.6025403784
"""
return self._position[1]
def goto(self, x, y=None):
"""Move turtle to an absolute position.
Aliases: setpos | setposition | goto
Arguments:
x -- a number or a pair/vector of numbers
y -- a number or None
call: goto(x, y) # two coordinates
--or: goto((x, y)) # a pair (tuple) of coordinates
--or: goto(vec) # e.g. as returned by pos()
Move turtle to an absolute position. If the pen is down,
a line will be drawn. The turtle's orientation does not change.
Example (for a Turtle instance named turtle):
>>> tp = turtle.pos()
>>> tp
(0.00, 0.00)
>>> turtle.setpos(60,30)
>>> turtle.pos()
(60.00,30.00)
>>> turtle.setpos((20,80))
>>> turtle.pos()
(20.00,80.00)
>>> turtle.setpos(tp)
>>> turtle.pos()
(0.00,0.00)
"""
if y is None:
self._goto(Vec2D(*x))
else:
self._goto(Vec2D(x, y))
def home(self):
"""Move turtle to the origin - coordinates (0,0).
No arguments.
Move turtle to the origin - coordinates (0,0) and set its
heading to its start-orientation (which depends on mode).
Example (for a Turtle instance named turtle):
>>> turtle.home()
"""
self.goto(0, 0)
self.setheading(0)
def setx(self, x):
"""Set the turtle's first coordinate to x
Argument:
x -- a number (integer or float)
Set the turtle's first coordinate to x, leave second coordinate
unchanged.
Example (for a Turtle instance named turtle):
>>> turtle.position()
(0.00, 240.00)
>>> turtle.setx(10)
>>> turtle.position()
(10.00, 240.00)
"""
self._goto(Vec2D(x, self._position[1]))
def sety(self, y):
"""Set the turtle's second coordinate to y
Argument:
y -- a number (integer or float)
Set the turtle's second coordinate to y, first coordinate remains
unchanged.
Example (for a Turtle instance named turtle):
>>> turtle.position()
(0.00, 40.00)
>>> turtle.sety(-10)
>>> turtle.position()
(0.00, -10.00)
"""
self._goto(Vec2D(self._position[0], y))
def distance(self, x, y=None):
"""Return the distance from the turtle to (x,y) in turtle step units.
Arguments:
x -- a number or a pair/vector of numbers or a turtle instance
y -- a number or None
call: distance(x, y) # two coordinates
--or: distance((x, y)) # a pair (tuple) of coordinates
--or: distance(vec) # e.g. as returned by pos()
--or: distance(mypen) # where mypen is another turtle
Example (for a Turtle instance named turtle):
>>> turtle.pos()
(0.00, 0.00)
>>> turtle.distance(30,40)
50.0
>>> pen = Turtle()
>>> pen.forward(77)
>>> turtle.distance(pen)
77.0
"""
if y is not None:
pos = Vec2D(x, y)
if isinstance(x, Vec2D):
pos = x
elif isinstance(x, tuple):
pos = Vec2D(*x)
elif isinstance(x, TNavigator):
pos = x._position
return abs(pos - self._position)
def towards(self, x, y=None):
"""Return the angle of the line from the turtle's position to (x, y).
Arguments:
x -- a number or a pair/vector of numbers or a turtle instance
y -- a number or None
call: towards(x, y) # two coordinates
--or: towards((x, y)) # a pair (tuple) of coordinates
--or: towards(vec) # e.g. as returned by pos()
--or: towards(mypen) # where mypen is another turtle
Return the angle between the line from the turtle's position to the
position specified by x, y and the turtle's start orientation. (Depends
on mode - "standard" or "logo".)
Example (for a Turtle instance named turtle):
>>> turtle.pos()
(10.00, 10.00)
>>> turtle.towards(0,0)
225.0
"""
if y is not None:
pos = Vec2D(x, y)
if isinstance(x, Vec2D):
pos = x
elif isinstance(x, tuple):
pos = Vec2D(*x)
elif isinstance(x, TNavigator):
pos = x._position
x, y = pos - self._position
result = round(math.atan2(y, x)*180.0/math.pi, 10) % 360.0
result /= self._degreesPerAU
return (self._angleOffset + self._angleOrient*result) % self._fullcircle
def heading(self):
""" Return the turtle's current heading.
No arguments.
Example (for a Turtle instance named turtle):
>>> turtle.left(67)
>>> turtle.heading()
67.0
"""
x, y = self._orient
result = round(math.atan2(y, x)*180.0/math.pi, 10) % 360.0
result /= self._degreesPerAU
return (self._angleOffset + self._angleOrient*result) % self._fullcircle
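# Editorial sketch (not part of the original module): the heading arithmetic
# above, extracted into a hypothetical standalone function. The orientation
# vector becomes a standard angle via atan2 and is then re-expressed in the
# current mode's direction and units.
def _vec_heading(x, y, angleOffset=0.0, angleOrient=1, fullcircle=360.0):
    # illustration only
    result = round(math.atan2(y, x) * 180.0 / math.pi, 10) % 360.0
    result /= 360.0 / fullcircle         # standard degrees -> angle units
    return (angleOffset + angleOrient * result) % fullcircle
# standard mode: _vec_heading(0.0, 1.0)            ->  90.0
# logo mode:     _vec_heading(0.0, 1.0, 90.0, -1)  ->   0.0  (north is 0)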
def setheading(self, to_angle):
"""Set the orientation of the turtle to to_angle.
Aliases: setheading | seth
Argument:
to_angle -- a number (integer or float)
Set the orientation of the turtle to to_angle.
Here are some common directions in degrees:
standard mode:           logo mode:
-------------------|--------------------
0 - east                 0 - north
90 - north               90 - east
180 - west               180 - south
270 - south              270 - west
Example (for a Turtle instance named turtle):
>>> turtle.setheading(90)
>>> turtle.heading()
90
"""
angle = (to_angle - self.heading())*self._angleOrient
full = self._fullcircle
angle = (angle+full/2.)%full - full/2.
self._rotate(angle)
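# Editorial sketch (not part of the original module): the folding step in
# setheading maps any requested turn into the shortest equivalent rotation,
# i.e. into the interval [-fullcircle/2, fullcircle/2).
def _shortest_turn(angle, full=360.0):
    # hypothetical helper, illustration only
    return (angle + full / 2.0) % full - full / 2.0
# _shortest_turn(350)   ->  -10.0   (turn 10 deg the other way, not 350)
# _shortest_turn(-270)  ->   90.0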
def circle(self, radius, extent = None, steps = None):
""" Draw a circle with given radius.
Arguments:
radius -- a number
extent (optional) -- a number
steps (optional) -- an integer
Draw a circle with given radius. The center is radius units left
of the turtle; extent - an angle - determines which part of the
circle is drawn. If extent is not given, draw the entire circle.
If extent is not a full circle, one endpoint of the arc is the
current pen position. Draw the arc in counterclockwise direction
if radius is positive, otherwise in clockwise direction. Finally
the direction of the turtle is changed by the amount of extent.
As the circle is approximated by an inscribed regular polygon,
steps determines the number of steps to use. If not given,
it will be calculated automatically. May be used to draw regular
polygons.
call: circle(radius) # full circle
--or: circle(radius, extent) # arc
--or: circle(radius, extent, steps)
--or: circle(radius, steps=6) # 6-sided polygon
Example (for a Turtle instance named turtle):
>>> turtle.circle(50)
>>> turtle.circle(120, 180) # semicircle
"""
if self.undobuffer:
self.undobuffer.push(["seq"])
self.undobuffer.cumulate = True
speed = self.speed()
if extent is None:
extent = self._fullcircle
if steps is None:
frac = abs(extent)/self._fullcircle
steps = 1+int(min(11+abs(radius)/6.0, 59.0)*frac)
w = 1.0 * extent / steps
w2 = 0.5 * w
l = 2.0 * radius * math.sin(w2*math.pi/180.0*self._degreesPerAU)
if radius < 0:
l, w, w2 = -l, -w, -w2
tr = self._tracer()
dl = self._delay()
if speed == 0:
self._tracer(0, 0)
else:
self.speed(0)
self._rotate(w2)
for i in range(steps):
self.speed(speed)
self._go(l)
self.speed(0)
self._rotate(w)
self._rotate(-w2)
if speed == 0:
self._tracer(tr, dl)
self.speed(speed)
if self.undobuffer:
self.undobuffer.cumulate = False
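# Editorial sketch (not part of the original module): the geometry used by
# circle(). The arc is approximated by `steps` chords of an inscribed
# regular polygon; each chord subtends w = extent/steps degrees and has
# length 2*r*sin(w/2).
def _chord_length(radius, extent=360.0, steps=36):
    # hypothetical helper, illustration only
    w = extent / steps
    return 2.0 * radius * math.sin(math.radians(w / 2.0))
# _chord_length(50, 360, 36)  ->  ~8.72; 36 such chords total ~313.8,
# close to the true circumference 2*pi*50 ~ 314.16.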
## three dummy methods to be implemented by child class:
def speed(self, s=0):
"""dummy method - to be overwritten by child class"""
def _tracer(self, a=None, b=None):
"""dummy method - to be overwritten by child class"""
def _delay(self, n=None):
"""dummy method - to be overwritten by child class"""
fd = forward
bk = back
backward = back
rt = right
lt = left
position = pos
setpos = goto
setposition = goto
seth = setheading
class TPen(object):
"""Drawing part of the RawTurtle.
Implements drawing properties.
"""
def __init__(self, resizemode=_CFG["resizemode"]):
self._resizemode = resizemode # or "user" or "noresize"
self.undobuffer = None
TPen._reset(self)
def _reset(self, pencolor=_CFG["pencolor"],
fillcolor=_CFG["fillcolor"]):
self._pensize = 1
self._shown = True
self._pencolor = pencolor
self._fillcolor = fillcolor
self._drawing = True
self._speed = 3
self._stretchfactor = (1., 1.)
self._shearfactor = 0.
self._tilt = 0.
self._shapetrafo = (1., 0., 0., 1.)
self._outlinewidth = 1
def resizemode(self, rmode=None):
"""Set resizemode to one of the values: "auto", "user", "noresize".
(Optional) Argument:
rmode -- one of the strings "auto", "user", "noresize"
Different resizemodes have the following effects:
- "auto" adapts the appearance of the turtle
corresponding to the value of pensize.
- "user" adapts the appearance of the turtle according to the
values of stretchfactor and outlinewidth (outline),
which are set by shapesize()
- "noresize" no adaption of the turtle's appearance takes place.
If no argument is given, return current resizemode.
resizemode("user") is called by a call of shapesize with arguments.
Examples (for a Turtle instance named turtle):
>>> turtle.resizemode("noresize")
>>> turtle.resizemode()
'noresize'
"""
if rmode is None:
return self._resizemode
rmode = rmode.lower()
if rmode in ["auto", "user", "noresize"]:
self.pen(resizemode=rmode)
def pensize(self, width=None):
"""Set or return the line thickness.
Aliases: pensize | width
Argument:
width -- positive number
Set the line thickness to width or return it. If resizemode is set
to "auto" and turtleshape is a polygon, that polygon is drawn with
the same line thickness. If no argument is given, current pensize
is returned.
Example (for a Turtle instance named turtle):
>>> turtle.pensize()
1
>>> turtle.pensize(10) # from here on lines of width 10 are drawn
"""
if width is None:
return self._pensize
self.pen(pensize=width)
def penup(self):
"""Pull the pen up -- no drawing when moving.
Aliases: penup | pu | up
No argument
Example (for a Turtle instance named turtle):
>>> turtle.penup()
"""
if not self._drawing:
return
self.pen(pendown=False)
def pendown(self):
"""Pull the pen down -- drawing when moving.
Aliases: pendown | pd | down
No argument.
Example (for a Turtle instance named turtle):
>>> turtle.pendown()
"""
if self._drawing:
return
self.pen(pendown=True)
def isdown(self):
"""Return True if pen is down, False if it's up.
No argument.
Example (for a Turtle instance named turtle):
>>> turtle.penup()
>>> turtle.isdown()
False
>>> turtle.pendown()
>>> turtle.isdown()
True
"""
return self._drawing
def speed(self, speed=None):
""" Return or set the turtle's speed.
Optional argument:
speed -- an integer in the range 0..10 or a speedstring (see below)
Set the turtle's speed to an integer value in the range 0 .. 10.
If no argument is given: return current speed.
If input is a number greater than 10 or smaller than 0.5,
speed is set to 0.
Speedstrings are mapped to speedvalues in the following way:
'fastest' : 0
'fast' : 10
'normal' : 6
'slow' : 3
'slowest' : 1
speeds from 1 to 10 enforce increasingly faster animation of
line drawing and turtle turning.
Attention:
speed = 0 : *no* animation takes place. forward/back makes turtle jump
and likewise left/right make the turtle turn instantly.
Example (for a Turtle instance named turtle):
>>> turtle.speed(3)
"""
speeds = {'fastest':0, 'fast':10, 'normal':6, 'slow':3, 'slowest':1 }
if speed is None:
return self._speed
if speed in speeds:
speed = speeds[speed]
elif 0.5 < speed < 10.5:
speed = int(round(speed))
else:
speed = 0
self.pen(speed=speed)
def color(self, *args):
"""Return or set the pencolor and fillcolor.
Arguments:
Several input formats are allowed.
They use 0, 1, 2, or 3 arguments as follows:
color()
Return the current pencolor and the current fillcolor
as a pair of color specification strings as are returned
by pencolor and fillcolor.
color(colorstring), color((r,g,b)), color(r,g,b)
inputs as in pencolor, set both fillcolor and pencolor
to the given value.
color(colorstring1, colorstring2),
color((r1,g1,b1), (r2,g2,b2))
equivalent to pencolor(colorstring1) and fillcolor(colorstring2)
and analogously, if the other input format is used.
If turtleshape is a polygon, outline and interior of that polygon
is drawn with the newly set colors.
For more info see: pencolor, fillcolor
Example (for a Turtle instance named turtle):
>>> turtle.color('red', 'green')
>>> turtle.color()
('red', 'green')
>>> colormode(255)
>>> color((40, 80, 120), (160, 200, 240))
>>> color()
('#285078', '#a0c8f0')
"""
if args:
l = len(args)
if l == 1:
pcolor = fcolor = args[0]
elif l == 2:
pcolor, fcolor = args
elif l == 3:
pcolor = fcolor = args
pcolor = self._colorstr(pcolor)
fcolor = self._colorstr(fcolor)
self.pen(pencolor=pcolor, fillcolor=fcolor)
else:
return self._color(self._pencolor), self._color(self._fillcolor)
def pencolor(self, *args):
""" Return or set the pencolor.
Arguments:
Four input formats are allowed:
- pencolor()
Return the current pencolor as color specification string,
possibly in hex-number format (see example).
May be used as input to another color/pencolor/fillcolor call.
- pencolor(colorstring)
colorstring is a Tk color specification string, such as "red" or "yellow"
- pencolor((r, g, b))
*a tuple* of r, g, and b, which represents an RGB color,
and each of r, g, and b are in the range 0..colormode,
where colormode is either 1.0 or 255
- pencolor(r, g, b)
r, g, and b represent an RGB color, and each of r, g, and b
are in the range 0..colormode
If turtleshape is a polygon, the outline of that polygon is drawn
with the newly set pencolor.
Example (for a Turtle instance named turtle):
>>> turtle.pencolor('brown')
>>> tup = (0.2, 0.8, 0.55)
>>> turtle.pencolor(tup)
>>> turtle.pencolor()
'#33cc8c'
"""
if args:
color = self._colorstr(args)
if color == self._pencolor:
return
self.pen(pencolor=color)
else:
return self._color(self._pencolor)
def fillcolor(self, *args):
""" Return or set the fillcolor.
Arguments:
Four input formats are allowed:
- fillcolor()
Return the current fillcolor as color specification string,
possibly in hex-number format (see example).
May be used as input to another color/pencolor/fillcolor call.
- fillcolor(colorstring)
colorstring is a Tk color specification string, such as "red" or "yellow"
- fillcolor((r, g, b))
*a tuple* of r, g, and b, which represents an RGB color,
and each of r, g, and b are in the range 0..colormode,
where colormode is either 1.0 or 255
- fillcolor(r, g, b)
r, g, and b represent an RGB color, and each of r, g, and b
are in the range 0..colormode
If turtleshape is a polygon, the interior of that polygon is drawn
with the newly set fillcolor.
Example (for a Turtle instance named turtle):
>>> turtle.fillcolor('violet')
>>> col = turtle.pencolor()
>>> turtle.fillcolor(col)
>>> turtle.fillcolor(0, .5, 0)
"""
if args:
color = self._colorstr(args)
if color == self._fillcolor:
return
self.pen(fillcolor=color)
else:
return self._color(self._fillcolor)
def showturtle(self):
"""Makes the turtle visible.
Aliases: showturtle | st
No argument.
Example (for a Turtle instance named turtle):
>>> turtle.hideturtle()
>>> turtle.showturtle()
"""
self.pen(shown=True)
def hideturtle(self):
"""Makes the turtle invisible.
Aliases: hideturtle | ht
No argument.
It's a good idea to do this while you're in the
middle of a complicated drawing, because hiding
the turtle speeds up the drawing observably.
Example (for a Turtle instance named turtle):
>>> turtle.hideturtle()
"""
self.pen(shown=False)
def isvisible(self):
"""Return True if the Turtle is shown, False if it's hidden.
No argument.
Example (for a Turtle instance named turtle):
>>> turtle.hideturtle()
>>> print(turtle.isvisible())
False
"""
return self._shown
def pen(self, pen=None, **pendict):
"""Return or set the pen's attributes.
Arguments:
pen -- a dictionary with some or all of the below listed keys.
**pendict -- one or more keyword-arguments with the below
listed keys as keywords.
Return or set the pen's attributes in a 'pen-dictionary'
with the following key/value pairs:
"shown" : True/False
"pendown" : True/False
"pencolor" : color-string or color-tuple
"fillcolor" : color-string or color-tuple
"pensize" : positive number
"speed" : number in range 0..10
"resizemode" : "auto" or "user" or "noresize"
"stretchfactor": (positive number, positive number)
"shearfactor": number
"outline" : positive number
"tilt" : number
This dictionary can be used as argument for a subsequent
pen()-call to restore the former pen-state. Moreover one
or more of these attributes can be provided as keyword-arguments.
This can be used to set several pen attributes in one statement.
Examples (for a Turtle instance named turtle):
>>> turtle.pen(fillcolor="black", pencolor="red", pensize=10)
>>> turtle.pen()
{'pensize': 10, 'shown': True, 'resizemode': 'auto', 'outline': 1,
'pencolor': 'red', 'pendown': True, 'fillcolor': 'black',
'stretchfactor': (1,1), 'speed': 3, 'shearfactor': 0.0}
>>> penstate=turtle.pen()
>>> turtle.color("yellow","")
>>> turtle.penup()
>>> turtle.pen()
{'pensize': 10, 'shown': True, 'resizemode': 'auto', 'outline': 1,
'pencolor': 'yellow', 'pendown': False, 'fillcolor': '',
'stretchfactor': (1,1), 'speed': 3, 'shearfactor': 0.0}
>>> turtle.pen(penstate, fillcolor="green")
>>> turtle.pen()
{'pensize': 10, 'shown': True, 'resizemode': 'auto', 'outline': 1,
'pencolor': 'red', 'pendown': True, 'fillcolor': 'green',
'stretchfactor': (1,1), 'speed': 3, 'shearfactor': 0.0}
"""
_pd = {"shown" : self._shown,
"pendown" : self._drawing,
"pencolor" : self._pencolor,
"fillcolor" : self._fillcolor,
"pensize" : self._pensize,
"speed" : self._speed,
"resizemode" : self._resizemode,
"stretchfactor" : self._stretchfactor,
"shearfactor" : self._shearfactor,
"outline" : self._outlinewidth,
"tilt" : self._tilt
}
if not (pen or pendict):
return _pd
if isinstance(pen, dict):
p = pen
else:
p = {}
p.update(pendict)
_p_buf = {}
for key in p:
_p_buf[key] = _pd[key]
if self.undobuffer:
self.undobuffer.push(("pen", _p_buf))
newLine = False
if "pendown" in p:
if self._drawing != p["pendown"]:
newLine = True
if "pencolor" in p:
if isinstance(p["pencolor"], tuple):
p["pencolor"] = self._colorstr((p["pencolor"],))
if self._pencolor != p["pencolor"]:
newLine = True
if "pensize" in p:
if self._pensize != p["pensize"]:
newLine = True
if newLine:
self._newLine()
if "pendown" in p:
self._drawing = p["pendown"]
if "pencolor" in p:
self._pencolor = p["pencolor"]
if "pensize" in p:
self._pensize = p["pensize"]
if "fillcolor" in p:
if isinstance(p["fillcolor"], tuple):
p["fillcolor"] = self._colorstr((p["fillcolor"],))
self._fillcolor = p["fillcolor"]
if "speed" in p:
self._speed = p["speed"]
if "resizemode" in p:
self._resizemode = p["resizemode"]
if "stretchfactor" in p:
sf = p["stretchfactor"]
if isinstance(sf, (int, float)):
sf = (sf, sf)
self._stretchfactor = sf
if "shearfactor" in p:
self._shearfactor = p["shearfactor"]
if "outline" in p:
self._outlinewidth = p["outline"]
if "shown" in p:
self._shown = p["shown"]
if "tilt" in p:
self._tilt = p["tilt"]
if "stretchfactor" in p or "tilt" in p or "shearfactor" in p:
scx, scy = self._stretchfactor
shf = self._shearfactor
sa, ca = math.sin(self._tilt), math.cos(self._tilt)
self._shapetrafo = ( scx*ca, scy*(shf*ca + sa),
-scx*sa, scy*(ca - shf*sa))
self._update()
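# Editorial sketch (not part of the original module): the 2x2 matrix
# assembled at the end of pen() combines stretch (scx, scy), shear (shf)
# and tilt t:
#     [  scx*cos t    scy*(shf*cos t + sin t) ]
#     [ -scx*sin t    scy*(cos t - shf*sin t) ]
def _build_shapetrafo(scx, scy, shf, tilt):
    # hypothetical helper, illustration only
    sa, ca = math.sin(tilt), math.cos(tilt)
    return (scx * ca, scy * (shf * ca + sa),
            -scx * sa, scy * (ca - shf * sa))
# _build_shapetrafo(1.0, 1.0, 0.0, 0.0)  ->  (1.0, 0.0, -0.0, 1.0)  (identity)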
## three dummy methods to be implemented by child class:
def _newLine(self, usePos = True):
"""dummy method - to be overwritten by child class"""
def _update(self, count=True, forced=False):
"""dummy method - to be overwritten by child class"""
def _color(self, args):
"""dummy method - to be overwritten by child class"""
def _colorstr(self, args):
"""dummy method - to be overwritten by child class"""
width = pensize
up = penup
pu = penup
pd = pendown
down = pendown
st = showturtle
ht = hideturtle
class _TurtleImage(object):
"""Helper class: Datatype to store Turtle attributes
"""
def __init__(self, screen, shapeIndex):
self.screen = screen
self._type = None
self._setshape(shapeIndex)
def _setshape(self, shapeIndex):
screen = self.screen
self.shapeIndex = shapeIndex
if self._type == "polygon" == screen._shapes[shapeIndex]._type:
return
if self._type == "image" == screen._shapes[shapeIndex]._type:
return
if self._type in ["image", "polygon"]:
screen._delete(self._item)
elif self._type == "compound":
for item in self._item:
screen._delete(item)
self._type = screen._shapes[shapeIndex]._type
if self._type == "polygon":
self._item = screen._createpoly()
elif self._type == "image":
self._item = screen._createimage(screen._shapes["blank"]._data)
elif self._type == "compound":
self._item = [screen._createpoly() for item in
screen._shapes[shapeIndex]._data]
class RawTurtle(TPen, TNavigator):
"""Animation part of the RawTurtle.
Puts RawTurtle upon a TurtleScreen and provides tools for
its animation.
"""
screens = []
def __init__(self, canvas=None,
shape=_CFG["shape"],
undobuffersize=_CFG["undobuffersize"],
visible=_CFG["visible"]):
if isinstance(canvas, _Screen):
self.screen = canvas
elif isinstance(canvas, TurtleScreen):
if canvas not in RawTurtle.screens:
RawTurtle.screens.append(canvas)
self.screen = canvas
elif isinstance(canvas, (ScrolledCanvas, Canvas)):
for screen in RawTurtle.screens:
if screen.cv == canvas:
self.screen = screen
break
else:
self.screen = TurtleScreen(canvas)
RawTurtle.screens.append(self.screen)
else:
raise TurtleGraphicsError("bad canvas argument %s" % canvas)
screen = self.screen
TNavigator.__init__(self, screen.mode())
TPen.__init__(self)
screen._turtles.append(self)
self.drawingLineItem = screen._createline()
self.turtle = _TurtleImage(screen, shape)
self._poly = None
self._creatingPoly = False
self._fillitem = self._fillpath = None
self._shown = visible
self._hidden_from_screen = False
self.currentLineItem = screen._createline()
self.currentLine = [self._position]
self.items = [self.currentLineItem]
self.stampItems = []
self._undobuffersize = undobuffersize
self.undobuffer = Tbuffer(undobuffersize)
self._update()
def reset(self):
"""Delete the turtle's drawings and restore its default values.
No argument.
Delete the turtle's drawings from the screen, re-center the turtle
and set variables to the default values.
Example (for a Turtle instance named turtle):
>>> turtle.position()
(0.00,-22.00)
>>> turtle.heading()
100.0
>>> turtle.reset()
>>> turtle.position()
(0.00,0.00)
>>> turtle.heading()
0.0
"""
TNavigator.reset(self)
TPen._reset(self)
self._clear()
self._drawturtle()
self._update()
def setundobuffer(self, size):
"""Set or disable undobuffer.
Argument:
size -- an integer or None
If size is an integer an empty undobuffer of given size is installed.
Size gives the maximum number of turtle-actions that can be undone
by the undo() function.
If size is None, no undobuffer is present.
Example (for a Turtle instance named turtle):
>>> turtle.setundobuffer(42)
"""
if size is None or size <= 0:
self.undobuffer = None
else:
self.undobuffer = Tbuffer(size)
def undobufferentries(self):
"""Return count of entries in the undobuffer.
No argument.
Example (for a Turtle instance named turtle):
>>> while undobufferentries():
... undo()
"""
if self.undobuffer is None:
return 0
return self.undobuffer.nr_of_items()
def _clear(self):
"""Delete all of pen's drawings"""
self._fillitem = self._fillpath = None
for item in self.items:
self.screen._delete(item)
self.currentLineItem = self.screen._createline()
self.currentLine = []
if self._drawing:
self.currentLine.append(self._position)
self.items = [self.currentLineItem]
self.clearstamps()
self.setundobuffer(self._undobuffersize)
def clear(self):
"""Delete the turtle's drawings from the screen. Do not move turtle.
No arguments.
Delete the turtle's drawings from the screen. Do not move turtle.
State and position of the turtle as well as drawings of other
turtles are not affected.
Examples (for a Turtle instance named turtle):
>>> turtle.clear()
"""
self._clear()
self._update()
def _update_data(self):
self.screen._incrementudc()
if self.screen._updatecounter != 0:
return
if len(self.currentLine)>1:
self.screen._drawline(self.currentLineItem, self.currentLine,
self._pencolor, self._pensize)
def _update(self):
"""Perform a Turtle-data update.
"""
screen = self.screen
if screen._tracing == 0:
return
elif screen._tracing == 1:
self._update_data()
self._drawturtle()
screen._update() # TurtleScreenBase
screen._delay(screen._delayvalue) # TurtleScreenBase
else:
self._update_data()
if screen._updatecounter == 0:
for t in screen.turtles():
t._drawturtle()
screen._update()
def _tracer(self, flag=None, delay=None):
"""Turns turtle animation on/off and set delay for update drawings.
Optional arguments:
n -- nonnegative integer
delay -- nonnegative integer
If n is given, only each n-th regular screen update is really performed.
(Can be used to accelerate the drawing of complex graphics.)
Second argument sets delay value (see RawTurtle.delay())
Example (for a Turtle instance named turtle):
>>> turtle.tracer(8, 25)
>>> dist = 2
>>> for i in range(200):
... turtle.fd(dist)
... turtle.rt(90)
... dist += 2
"""
return self.screen.tracer(flag, delay)
def _color(self, args):
return self.screen._color(args)
def _colorstr(self, args):
return self.screen._colorstr(args)
def _cc(self, args):
"""Convert colortriples to hexstrings.
"""
if isinstance(args, str):
return args
try:
r, g, b = args
except (TypeError, ValueError):
raise TurtleGraphicsError("bad color arguments: %s" % str(args))
if self.screen._colormode == 1.0:
r, g, b = [round(255.0*x) for x in (r, g, b)]
if not ((0 <= r <= 255) and (0 <= g <= 255) and (0 <= b <= 255)):
raise TurtleGraphicsError("bad color sequence: %s" % str(args))
return "#%02x%02x%02x" % (r, g, b)
def clone(self):
"""Create and return a clone of the turtle.
No argument.
Create and return a clone of the turtle with same position, heading
and turtle properties.
Example (for a Turtle instance named mick):
mick = Turtle()
joe = mick.clone()
"""
screen = self.screen
self._newLine(self._drawing)
turtle = self.turtle
self.screen = None
self.turtle = None # to make self deepcopy-able
q = deepcopy(self)
self.screen = screen
self.turtle = turtle
q.screen = screen
q.turtle = _TurtleImage(screen, self.turtle.shapeIndex)
screen._turtles.append(q)
ttype = screen._shapes[self.turtle.shapeIndex]._type
if ttype == "polygon":
q.turtle._item = screen._createpoly()
elif ttype == "image":
q.turtle._item = screen._createimage(screen._shapes["blank"]._data)
elif ttype == "compound":
q.turtle._item = [screen._createpoly() for item in
screen._shapes[self.turtle.shapeIndex]._data]
q.currentLineItem = screen._createline()
q._update()
return q
def shape(self, name=None):
"""Set turtle shape to shape with given name / return current shapename.
Optional argument:
name -- a string, which is a valid shapename
Set turtle shape to shape with given name or, if name is not given,
return name of current shape.
Shape with name must exist in the TurtleScreen's shape dictionary.
Initially there are the following polygon shapes:
'arrow', 'turtle', 'circle', 'square', 'triangle', 'classic'.
To learn about how to deal with shapes see Screen-method register_shape.
Example (for a Turtle instance named turtle):
>>> turtle.shape()
'arrow'
>>> turtle.shape("turtle")
>>> turtle.shape()
'turtle'
"""
if name is None:
return self.turtle.shapeIndex
if name not in self.screen.getshapes():
raise TurtleGraphicsError("There is no shape named %s" % name)
self.turtle._setshape(name)
self._update()
def shapesize(self, stretch_wid=None, stretch_len=None, outline=None):
"""Set/return turtle's stretchfactors/outline. Set resizemode to "user".
Optional arguments:
stretch_wid : positive number
stretch_len : positive number
outline : positive number
Return or set the pen's attributes x/y-stretchfactors and/or outline.
Set resizemode to "user".
If and only if resizemode is set to "user", the turtle will be displayed
stretched according to its stretchfactors:
stretch_wid is stretchfactor perpendicular to orientation
stretch_len is stretchfactor in direction of the turtle's orientation.
outline determines the width of the shape's outline.
Examples (for a Turtle instance named turtle):
>>> turtle.resizemode("user")
>>> turtle.shapesize(5, 5, 12)
>>> turtle.shapesize(outline=8)
"""
if stretch_wid is stretch_len is outline is None:
stretch_wid, stretch_len = self._stretchfactor
return stretch_wid, stretch_len, self._outlinewidth
if stretch_wid == 0 or stretch_len == 0:
raise TurtleGraphicsError("stretch_wid/stretch_len must not be zero")
if stretch_wid is not None:
if stretch_len is None:
stretchfactor = stretch_wid, stretch_wid
else:
stretchfactor = stretch_wid, stretch_len
elif stretch_len is not None:
stretchfactor = self._stretchfactor[0], stretch_len
else:
stretchfactor = self._stretchfactor
if outline is None:
outline = self._outlinewidth
self.pen(resizemode="user",
stretchfactor=stretchfactor, outline=outline)
def shearfactor(self, shear=None):
"""Set or return the current shearfactor.
Optional argument: shear -- number, tangent of the shear angle
Shear the turtleshape according to the given shearfactor shear,
which is the tangent of the shear angle. DO NOT change the
turtle's heading (direction of movement).
If shear is not given: return the current shearfactor, i. e. the
tangent of the shear angle, by which lines parallel to the
heading of the turtle are sheared.
Examples (for a Turtle instance named turtle):
>>> turtle.shape("circle")
>>> turtle.shapesize(5,2)
>>> turtle.shearfactor(0.5)
>>> turtle.shearfactor()
0.5
"""
if shear is None:
return self._shearfactor
self.pen(resizemode="user", shearfactor=shear)
def settiltangle(self, angle):
"""Rotate the turtleshape to point in the specified direction
Argument: angle -- number
Rotate the turtleshape to point in the direction specified by angle,
regardless of its current tilt-angle. DO NOT change the turtle's
heading (direction of movement).
Examples (for a Turtle instance named turtle):
>>> turtle.shape("circle")
>>> turtle.shapesize(5,2)
>>> turtle.settiltangle(45)
>>> turtle.stamp()
>>> turtle.fd(50)
>>> turtle.settiltangle(-45)
>>> turtle.stamp()
>>> turtle.fd(50)
"""
tilt = -angle * self._degreesPerAU * self._angleOrient
tilt = (tilt * math.pi / 180.0) % (2*math.pi)
self.pen(resizemode="user", tilt=tilt)
def tiltangle(self, angle=None):
"""Set or return the current tilt-angle.
Optional argument: angle -- number
Rotate the turtleshape to point in the direction specified by angle,
regardless of its current tilt-angle. DO NOT change the turtle's
heading (direction of movement).
If angle is not given: return the current tilt-angle, i. e. the angle
between the orientation of the turtleshape and the heading of the
turtle (its direction of movement).
Deprecated since Python 3.1
Examples (for a Turtle instance named turtle):
>>> turtle.shape("circle")
>>> turtle.shapesize(5,2)
>>> turtle.tilt(45)
>>> turtle.tiltangle()
"""
if angle is None:
tilt = -self._tilt * (180.0/math.pi) * self._angleOrient
return (tilt / self._degreesPerAU) % self._fullcircle
else:
self.settiltangle(angle)
def tilt(self, angle):
"""Rotate the turtleshape by angle.
Argument:
angle - a number
Rotate the turtleshape by angle from its current tilt-angle,
but do NOT change the turtle's heading (direction of movement).
Examples (for a Turtle instance named turtle):
>>> turtle.shape("circle")
>>> turtle.shapesize(5,2)
>>> turtle.tilt(30)
>>> turtle.fd(50)
>>> turtle.tilt(30)
>>> turtle.fd(50)
"""
self.settiltangle(angle + self.tiltangle())
def shapetransform(self, t11=None, t12=None, t21=None, t22=None):
"""Set or return the current transformation matrix of the turtle shape.
Optional arguments: t11, t12, t21, t22 -- numbers.
If none of the matrix elements are given, return the transformation
matrix.
Otherwise set the given elements and transform the turtleshape
according to the matrix consisting of first row t11, t12 and
second row t21, t22.
Modify stretchfactor, shearfactor and tiltangle according to the
given matrix.
Examples (for a Turtle instance named turtle):
>>> turtle.shape("square")
>>> turtle.shapesize(4,2)
>>> turtle.shearfactor(-0.5)
>>> turtle.shapetransform()
(4.0, -1.0, -0.0, 2.0)
"""
if t11 is t12 is t21 is t22 is None:
return self._shapetrafo
m11, m12, m21, m22 = self._shapetrafo
if t11 is not None: m11 = t11
if t12 is not None: m12 = t12
if t21 is not None: m21 = t21
if t22 is not None: m22 = t22
if m11 * m22 - m12 * m21 == 0: # check the merged matrix; given t's may be None
raise TurtleGraphicsError("Bad shape transform matrix: must not be singular")
self._shapetrafo = (m11, m12, m21, m22)
alfa = math.atan2(-m21, m11) % (2 * math.pi)
sa, ca = math.sin(alfa), math.cos(alfa)
a11, a12, a21, a22 = (ca*m11 - sa*m21, ca*m12 - sa*m22,
sa*m11 + ca*m21, sa*m12 + ca*m22)
self._stretchfactor = a11, a22
self._shearfactor = a12/a22
self._tilt = alfa
self.pen(resizemode="user")
def _polytrafo(self, poly):
"""Computes transformed polygon shapes from a shape
according to current position and heading.
"""
screen = self.screen
p0, p1 = self._position
e0, e1 = self._orient
e = Vec2D(e0, e1 * screen.yscale / screen.xscale)
e0, e1 = (1.0 / abs(e)) * e
return [(p0+(e1*x+e0*y)/screen.xscale, p1+(-e0*x+e1*y)/screen.yscale)
for (x, y) in poly]
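# Editorial sketch (not part of the original module): ignoring the screen
# scaling, _polytrafo rotates each shape point into the turtle's heading and
# translates it to the turtle's position.
def _rotate_translate(points, pos, orient):
    # hypothetical helper, illustration only; orient must be a unit vector
    (p0, p1), (e0, e1) = pos, orient
    return [(p0 + e1 * x + e0 * y, p1 - e0 * x + e1 * y) for (x, y) in points]
# With the turtle at (0, 0) heading east (orient (1, 0)), the arrow tip
# (0, 10) maps to (10, 0):
# _rotate_translate([(0, 10)], (0, 0), (1, 0))  ->  [(10, 0)]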
def get_shapepoly(self):
"""Return the current shape polygon as tuple of coordinate pairs.
No argument.
Examples (for a Turtle instance named turtle):
>>> turtle.shape("square")
>>> turtle.shapetransform(4, -1, 0, 2)
>>> turtle.get_shapepoly()
((50, -20), (30, 20), (-50, 20), (-30, -20))
"""
shape = self.screen._shapes[self.turtle.shapeIndex]
if shape._type == "polygon":
return self._getshapepoly(shape._data, shape._type == "compound")
# else return None
def _getshapepoly(self, polygon, compound=False):
"""Calculate transformed shape polygon according to resizemode
and shapetransform.
"""
if self._resizemode == "user" or compound:
t11, t12, t21, t22 = self._shapetrafo
elif self._resizemode == "auto":
l = max(1, self._pensize/5.0)
t11, t12, t21, t22 = l, 0, 0, l
elif self._resizemode == "noresize":
return polygon
return tuple([(t11*x + t12*y, t21*x + t22*y) for (x, y) in polygon])
def _drawturtle(self):
"""Manages the correct rendering of the turtle with respect to
its shape, resizemode, stretch and tilt etc."""
screen = self.screen
shape = screen._shapes[self.turtle.shapeIndex]
ttype = shape._type
titem = self.turtle._item
if self._shown and screen._updatecounter == 0 and screen._tracing > 0:
self._hidden_from_screen = False
tshape = shape._data
if ttype == "polygon":
if self._resizemode == "noresize": w = 1
elif self._resizemode == "auto": w = self._pensize
else: w = self._outlinewidth
shape = self._polytrafo(self._getshapepoly(tshape))
fc, oc = self._fillcolor, self._pencolor
screen._drawpoly(titem, shape, fill=fc, outline=oc,
width=w, top=True)
elif ttype == "image":
screen._drawimage(titem, self._position, tshape)
elif ttype == "compound":
for item, (poly, fc, oc) in zip(titem, tshape):
poly = self._polytrafo(self._getshapepoly(poly, True))
screen._drawpoly(item, poly, fill=self._cc(fc),
outline=self._cc(oc), width=self._outlinewidth, top=True)
else:
if self._hidden_from_screen:
return
if ttype == "polygon":
screen._drawpoly(titem, ((0, 0), (0, 0), (0, 0)), "", "")
elif ttype == "image":
screen._drawimage(titem, self._position,
screen._shapes["blank"]._data)
elif ttype == "compound":
for item in titem:
screen._drawpoly(item, ((0, 0), (0, 0), (0, 0)), "", "")
self._hidden_from_screen = True
############################## stamp stuff ###############################
def stamp(self):
"""Stamp a copy of the turtleshape onto the canvas and return its id.
No argument.
Stamp a copy of the turtle shape onto the canvas at the current
turtle position. Return a stamp_id for that stamp, which can be
used to delete it by calling clearstamp(stamp_id).
Example (for a Turtle instance named turtle):
>>> turtle.color("blue")
>>> turtle.stamp()
13
>>> turtle.fd(50)
"""
screen = self.screen
shape = screen._shapes[self.turtle.shapeIndex]
ttype = shape._type
tshape = shape._data
if ttype == "polygon":
stitem = screen._createpoly()
if self._resizemode == "noresize": w = 1
elif self._resizemode == "auto": w = self._pensize
else: w = self._outlinewidth
shape = self._polytrafo(self._getshapepoly(tshape))
fc, oc = self._fillcolor, self._pencolor
screen._drawpoly(stitem, shape, fill=fc, outline=oc,
width=w, top=True)
elif ttype == "image":
stitem = screen._createimage("")
screen._drawimage(stitem, self._position, tshape)
elif ttype == "compound":
stitem = []
for element in tshape:
item = screen._createpoly()
stitem.append(item)
stitem = tuple(stitem)
for item, (poly, fc, oc) in zip(stitem, tshape):
poly = self._polytrafo(self._getshapepoly(poly, True))
screen._drawpoly(item, poly, fill=self._cc(fc),
outline=self._cc(oc), width=self._outlinewidth, top=True)
self.stampItems.append(stitem)
self.undobuffer.push(("stamp", stitem))
return stitem
def _clearstamp(self, stampid):
"""does the work for clearstamp() and clearstamps()
"""
if stampid in self.stampItems:
if isinstance(stampid, tuple):
for subitem in stampid:
self.screen._delete(subitem)
else:
self.screen._delete(stampid)
self.stampItems.remove(stampid)
# Delete stampitem from undobuffer if necessary
# if clearstamp is called directly.
item = ("stamp", stampid)
buf = self.undobuffer
if item not in buf.buffer:
return
index = buf.buffer.index(item)
buf.buffer.remove(item)
if index <= buf.ptr:
buf.ptr = (buf.ptr - 1) % buf.bufsize
buf.buffer.insert((buf.ptr+1)%buf.bufsize, [None])
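# Note (added for illustration): the undo buffer is a fixed-size ring, so
# the removed "stamp" entry is replaced by a [None] no-op entry right after
# the (adjusted) pointer; this keeps the buffer length and the pointer
# arithmetic of subsequent undo() calls consistent.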
def clearstamp(self, stampid):
"""Delete stamp with given stampid
Argument:
stampid - an integer, must be return value of previous stamp() call.
Example (for a Turtle instance named turtle):
>>> turtle.color("blue")
>>> astamp = turtle.stamp()
>>> turtle.fd(50)
>>> turtle.clearstamp(astamp)
"""
self._clearstamp(stampid)
self._update()
def clearstamps(self, n=None):
"""Delete all or first/last n of turtle's stamps.
Optional argument:
n -- an integer
If n is None, delete all of pen's stamps,
else if n > 0 delete first n stamps
else if n < 0 delete last n stamps.
Example (for a Turtle instance named turtle):
>>> for i in range(8):
... turtle.stamp(); turtle.fd(30)
...
>>> turtle.clearstamps(2)
>>> turtle.clearstamps(-2)
>>> turtle.clearstamps()
"""
if n is None:
toDelete = self.stampItems[:]
elif n >= 0:
toDelete = self.stampItems[:n]
else:
toDelete = self.stampItems[n:]
for item in toDelete:
self._clearstamp(item)
self._update()
def _goto(self, end):
"""Move the pen to the point end, thereby drawing a line
if pen is down. All other methods for turtle movement depend
on this one.
"""
## Version with undo-stuff
go_modes = ( self._drawing,
self._pencolor,
self._pensize,
isinstance(self._fillpath, list))
screen = self.screen
undo_entry = ("go", self._position, end, go_modes,
(self.currentLineItem,
self.currentLine[:],
screen._pointlist(self.currentLineItem),
self.items[:])
)
if self.undobuffer:
self.undobuffer.push(undo_entry)
start = self._position
if self._speed and screen._tracing == 1:
diff = (end-start)
diffsq = (diff[0]*screen.xscale)**2 + (diff[1]*screen.yscale)**2
nhops = 1+int((diffsq**0.5)/(3*(1.1**self._speed)*self._speed))
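# Worked example (illustrative): at speed 3 over a distance of 100 pixels
# (scale 1), nhops = 1 + int(100 / (3 * 1.1**3 * 3))
#                  = 1 + int(100 / 11.979) = 9 animation frames.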
delta = diff * (1.0/nhops)
for n in range(1, nhops):
if n == 1:
top = True
else:
top = False
self._position = start + delta * n
if self._drawing:
screen._drawline(self.drawingLineItem,
(start, self._position),
self._pencolor, self._pensize, top)
self._update()
if self._drawing:
screen._drawline(self.drawingLineItem, ((0, 0), (0, 0)),
fill="", width=self._pensize)
# Turtle now at end,
if self._drawing: # now update currentLine
self.currentLine.append(end)
if isinstance(self._fillpath, list):
self._fillpath.append(end)
###### inheritance!!!
self._position = end
if self._creatingPoly:
self._poly.append(end)
if len(self.currentLine) > 42: # 42! answer to the ultimate question
# of life, the universe and everything
self._newLine()
self._update() #count=True)
def _undogoto(self, entry):
"""Reverse a _goto. Used for undo()
"""
old, new, go_modes, coodata = entry
drawing, pc, ps, filling = go_modes
cLI, cL, pl, items = coodata
screen = self.screen
if abs(self._position - new) > 0.5:
print ("undogoto: HALLO-DA-STIMMT-WAS-NICHT!")
# restore former situation
self.currentLineItem = cLI
self.currentLine = cL
if pl == [(0, 0), (0, 0)]:
usepc = ""
else:
usepc = pc
screen._drawline(cLI, pl, fill=usepc, width=ps)
todelete = [i for i in self.items if (i not in items) and
(screen._type(i) == "line")]
for i in todelete:
screen._delete(i)
self.items.remove(i)
start = old
if self._speed and screen._tracing == 1:
diff = old - new
diffsq = (diff[0]*screen.xscale)**2 + (diff[1]*screen.yscale)**2
nhops = 1+int((diffsq**0.5)/(3*(1.1**self._speed)*self._speed))
delta = diff * (1.0/nhops)
for n in range(1, nhops):
if n == 1:
top = True
else:
top = False
self._position = new + delta * n
if drawing:
screen._drawline(self.drawingLineItem,
(start, self._position),
pc, ps, top)
self._update()
if drawing:
screen._drawline(self.drawingLineItem, ((0, 0), (0, 0)),
fill="", width=ps)
# Turtle now at position old,
self._position = old
## if undo is done during creating a polygon, the last vertex
## will be deleted. if the polygon is entirely deleted,
## creatingPoly will be set to False.
## Polygons created before the last one will not be affected by undo()
if self._creatingPoly:
if len(self._poly) > 0:
self._poly.pop()
if self._poly == []:
self._creatingPoly = False
self._poly = None
if filling:
if self._fillpath == []:
self._fillpath = None
print("Unwahrscheinlich in _undogoto!")
elif self._fillpath is not None:
self._fillpath.pop()
self._update() #count=True)
def _rotate(self, angle):
"""Turns pen clockwise by angle.
"""
if self.undobuffer:
self.undobuffer.push(("rot", angle, self._degreesPerAU))
angle *= self._degreesPerAU
neworient = self._orient.rotate(angle)
tracing = self.screen._tracing
if tracing == 1 and self._speed > 0:
anglevel = 3.0 * self._speed
steps = 1 + int(abs(angle)/anglevel)
delta = 1.0*angle/steps
for _ in range(steps):
self._orient = self._orient.rotate(delta)
self._update()
self._orient = neworient
self._update()
def _newLine(self, usePos=True):
"""Closes current line item and starts a new one.
Remark: if the current line becomes too long, animation
performance (via _drawline) slows down considerably.
"""
if len(self.currentLine) > 1:
self.screen._drawline(self.currentLineItem, self.currentLine,
self._pencolor, self._pensize)
self.currentLineItem = self.screen._createline()
self.items.append(self.currentLineItem)
else:
self.screen._drawline(self.currentLineItem, top=True)
self.currentLine = []
if usePos:
self.currentLine = [self._position]
def filling(self):
"""Return fillstate (True if filling, False else).
No argument.
Example (for a Turtle instance named turtle):
>>> turtle.begin_fill()
>>> if turtle.filling():
... turtle.pensize(5)
... else:
... turtle.pensize(3)
"""
return isinstance(self._fillpath, list)
def begin_fill(self):
"""Called just before drawing a shape to be filled.
No argument.
Example (for a Turtle instance named turtle):
>>> turtle.color("black", "red")
>>> turtle.begin_fill()
>>> turtle.circle(60)
>>> turtle.end_fill()
"""
if not self.filling():
self._fillitem = self.screen._createpoly()
self.items.append(self._fillitem)
self._fillpath = [self._position]
self._newLine()
if self.undobuffer:
self.undobuffer.push(("beginfill", self._fillitem))
self._update()
def end_fill(self):
"""Fill the shape drawn after the call begin_fill().
No argument.
Example (for a Turtle instance named turtle):
>>> turtle.color("black", "red")
>>> turtle.begin_fill()
>>> turtle.circle(60)
>>> turtle.end_fill()
"""
if self.filling():
if len(self._fillpath) > 2:
self.screen._drawpoly(self._fillitem, self._fillpath,
fill=self._fillcolor)
if self.undobuffer:
self.undobuffer.push(("dofill", self._fillitem))
self._fillitem = self._fillpath = None
self._update()
def dot(self, size=None, *color):
"""Draw a dot with diameter size, using color.
Optional arguments:
size -- an integer >= 1 (if given)
color -- a colorstring or a numeric color tuple
Draw a circular dot with diameter size, using color.
If size is not given, the maximum of pensize+4 and 2*pensize is used.
Example (for a Turtle instance named turtle):
>>> turtle.dot()
>>> turtle.fd(50); turtle.dot(20, "blue"); turtle.fd(50)
"""
if not color:
if isinstance(size, (str, tuple)):
color = self._colorstr(size)
size = self._pensize + max(self._pensize, 4)
else:
color = self._pencolor
if not size:
size = self._pensize + max(self._pensize, 4)
else:
if size is None:
size = self._pensize + max(self._pensize, 4)
color = self._colorstr(color)
if hasattr(self.screen, "_dot"):
item = self.screen._dot(self._position, size, color)
self.items.append(item)
if self.undobuffer:
self.undobuffer.push(("dot", item))
else:
pen = self.pen()
if self.undobuffer:
self.undobuffer.push(["seq"])
self.undobuffer.cumulate = True
try:
if self.resizemode() == 'auto':
self.ht()
self.pendown()
self.pensize(size)
self.pencolor(color)
self.forward(0)
finally:
self.pen(pen)
if self.undobuffer:
self.undobuffer.cumulate = False
def _write(self, txt, align, font):
"""Performs the writing for write()
"""
item, end = self.screen._write(self._position, txt, align, font,
self._pencolor)
self.items.append(item)
if self.undobuffer:
self.undobuffer.push(("wri", item))
return end
def write(self, arg, move=False, align="left", font=("Arial", 8, "normal")):
"""Write text at the current turtle position.
Arguments:
arg -- info, which is to be written to the TurtleScreen
move (optional) -- True/False
align (optional) -- one of the strings "left", "center" or "right"
font (optional) -- a triple (fontname, fontsize, fonttype)
Write text - the string representation of arg - at the current
turtle position according to align ("left", "center" or "right")
and with the given font.
If move is True, the pen is moved to the bottom-right corner
of the text. By default, move is False.
Example (for a Turtle instance named turtle):
>>> turtle.write('Home = ', True, align="center")
>>> turtle.write((0,0), True)
"""
if self.undobuffer:
self.undobuffer.push(["seq"])
self.undobuffer.cumulate = True
end = self._write(str(arg), align.lower(), font)
if move:
x, y = self.pos()
self.setpos(end, y)
if self.undobuffer:
self.undobuffer.cumulate = False
def begin_poly(self):
"""Start recording the vertices of a polygon.
No argument.
Start recording the vertices of a polygon. Current turtle position
is first point of polygon.
Example (for a Turtle instance named turtle):
>>> turtle.begin_poly()
"""
self._poly = [self._position]
self._creatingPoly = True
def end_poly(self):
"""Stop recording the vertices of a polygon.
No argument.
Stop recording the vertices of a polygon. Current turtle position is
last point of polygon. This will be connected with the first point.
Example (for a Turtle instance named turtle):
>>> turtle.end_poly()
"""
self._creatingPoly = False
def get_poly(self):
"""Return the lastly recorded polygon.
No argument.
Example (for a Turtle instance named turtle):
>>> p = turtle.get_poly()
>>> turtle.register_shape("myFavouriteShape", p)
"""
## check if there is any poly?
if self._poly is not None:
return tuple(self._poly)
def getscreen(self):
"""Return the TurtleScreen object, the turtle is drawing on.
No argument.
Return the TurtleScreen object, the turtle is drawing on.
So TurtleScreen-methods can be called for that object.
Example (for a Turtle instance named turtle):
>>> ts = turtle.getscreen()
>>> ts
<turtle.TurtleScreen object at 0x0106B770>
>>> ts.bgcolor("pink")
"""
return self.screen
def getturtle(self):
"""Return the Turtleobject itself.
No argument.
Only reasonable use: as a function to return the 'anonymous turtle':
Example:
>>> pet = getturtle()
>>> pet.fd(50)
>>> pet
<turtle.Turtle object at 0x0187D810>
>>> turtles()
[<turtle.Turtle object at 0x0187D810>]
"""
return self
getpen = getturtle
################################################################
### screen oriented methods delegating to methods of TurtleScreen
################################################################
def _delay(self, delay=None):
"""Set delay value which determines speed of turtle animation.
"""
return self.screen.delay(delay)
def onclick(self, fun, btn=1, add=None):
"""Bind fun to mouse-click event on this turtle on canvas.
Arguments:
fun -- a function with two arguments, to which will be assigned
the coordinates of the clicked point on the canvas.
btn -- number of the mouse-button, defaults to 1 (left mouse button).
add -- True or False. If True, a new binding will be added, otherwise
it will replace a former binding.
Example for the anonymous turtle, i.e. the procedural way:
>>> def turn(x, y):
... left(360)
...
>>> onclick(turn) # Now clicking into the turtle will turn it.
>>> onclick(None) # event-binding will be removed
"""
self.screen._onclick(self.turtle._item, fun, btn, add)
self._update()
def onrelease(self, fun, btn=1, add=None):
"""Bind fun to mouse-button-release event on this turtle on canvas.
Arguments:
fun -- a function with two arguments, to which will be assigned
the coordinates of the clicked point on the canvas.
btn -- number of the mouse-button, defaults to 1 (left mouse button).
Example (for a MyTurtle instance named joe):
>>> class MyTurtle(Turtle):
... def glow(self,x,y):
... self.fillcolor("red")
... def unglow(self,x,y):
... self.fillcolor("")
...
>>> joe = MyTurtle()
>>> joe.onclick(joe.glow)
>>> joe.onrelease(joe.unglow)
Clicking on joe turns fillcolor red, unclicking turns it to
transparent.
"""
self.screen._onrelease(self.turtle._item, fun, btn, add)
self._update()
def ondrag(self, fun, btn=1, add=None):
"""Bind fun to mouse-move event on this turtle on canvas.
Arguments:
fun -- a function with two arguments, to which will be assigned
the coordinates of the clicked point on the canvas.
btn -- number of the mouse-button, defaults to 1 (left mouse button).
Every sequence of mouse-move-events on a turtle is preceded by a
mouse-click event on that turtle.
Example (for a Turtle instance named turtle):
>>> turtle.ondrag(turtle.goto)
Subsequently clicking and dragging a Turtle will move it
across the screen thereby producing handdrawings (if pen is
down).
"""
self.screen._ondrag(self.turtle._item, fun, btn, add)
def _undo(self, action, data):
"""Does the main part of the work for undo()
"""
if self.undobuffer is None:
return
if action == "rot":
angle, degPAU = data
self._rotate(-angle*degPAU/self._degreesPerAU)
dummy = self.undobuffer.pop()
elif action == "stamp":
stitem = data[0]
self.clearstamp(stitem)
elif action == "go":
self._undogoto(data)
elif action in ["wri", "dot"]:
item = data[0]
self.screen._delete(item)
self.items.remove(item)
elif action == "dofill":
item = data[0]
self.screen._drawpoly(item, ((0, 0),(0, 0),(0, 0)),
fill="", outline="")
elif action == "beginfill":
item = data[0]
self._fillitem = self._fillpath = None
if item in self.items:
self.screen._delete(item)
self.items.remove(item)
elif action == "pen":
TPen.pen(self, data[0])
self.undobuffer.pop()
def undo(self):
"""undo (repeatedly) the last turtle action.
No argument.
undo (repeatedly) the last turtle action.
Number of available undo actions is determined by the size of
the undobuffer.
Example (for a Turtle instance named turtle):
>>> for i in range(4):
... turtle.fd(50); turtle.lt(80)
...
>>> for i in range(8):
... turtle.undo()
...
"""
if self.undobuffer is None:
return
item = self.undobuffer.pop()
action = item[0]
data = item[1:]
if action == "seq":
while data:
item = data.pop()
self._undo(item[0], item[1:])
else:
self._undo(action, data)
turtlesize = shapesize
RawPen = RawTurtle
### Screen - Singleton ########################
def Screen():
"""Return the singleton screen object.
If none exists at the moment, create a new one and return it,
else return the existing one."""
if Turtle._screen is None:
Turtle._screen = _Screen()
return Turtle._screen
class _Screen(TurtleScreen):
_root = None
_canvas = None
_title = _CFG["title"]
def __init__(self):
# XXX there is no need for this code to be conditional,
# as there will be only a single _Screen instance, anyway
# XXX actually, the turtle demo is injecting root window,
# so perhaps the conditional creation of a root should be
# preserved (perhaps by passing it as an optional parameter)
if _Screen._root is None:
_Screen._root = self._root = _Root()
self._root.title(_Screen._title)
self._root.ondestroy(self._destroy)
if _Screen._canvas is None:
width = _CFG["width"]
height = _CFG["height"]
canvwidth = _CFG["canvwidth"]
canvheight = _CFG["canvheight"]
leftright = _CFG["leftright"]
topbottom = _CFG["topbottom"]
self._root.setupcanvas(width, height, canvwidth, canvheight)
_Screen._canvas = self._root._getcanvas()
TurtleScreen.__init__(self, _Screen._canvas)
self.setup(width, height, leftright, topbottom)
def setup(self, width=_CFG["width"], height=_CFG["height"],
startx=_CFG["leftright"], starty=_CFG["topbottom"]):
""" Set the size and position of the main window.
Arguments:
width: as integer a size in pixels, as float a fraction of the screen.
Default is 50% of screen.
height: as integer the height in pixels, as float a fraction of the
screen. Default is 75% of screen.
startx: if positive, starting position in pixels from the left
edge of the screen, if negative from the right edge
Default, startx=None is to center window horizontally.
starty: if positive, starting position in pixels from the top
edge of the screen, if negative from the bottom edge
Default, starty=None is to center window vertically.
Examples (for a Screen instance named screen):
>>> screen.setup (width=200, height=200, startx=0, starty=0)
sets window to 200x200 pixels, in upper left of screen
>>> screen.setup(width=.75, height=0.5, startx=None, starty=None)
sets window to 75% of screen by 50% of screen and centers
"""
if not hasattr(self._root, "set_geometry"):
return
sw = self._root.win_width()
sh = self._root.win_height()
if isinstance(width, float) and 0 <= width <= 1:
width = sw*width
if startx is None:
startx = (sw - width) / 2
if isinstance(height, float) and 0 <= height <= 1:
height = sh*height
if starty is None:
starty = (sh - height) / 2
self._root.set_geometry(width, height, startx, starty)
self.update()
def title(self, titlestring):
"""Set title of turtle-window
Argument:
titlestring -- a string, to appear in the titlebar of the
turtle graphics window.
This is a method of Screen-class. Not available for TurtleScreen-
objects.
Example (for a Screen instance named screen):
>>> screen.title("Welcome to the turtle-zoo!")
"""
if _Screen._root is not None:
_Screen._root.title(titlestring)
_Screen._title = titlestring
def _destroy(self):
root = self._root
if root is _Screen._root:
Turtle._pen = None
Turtle._screen = None
_Screen._root = None
_Screen._canvas = None
TurtleScreen._RUNNING = False
root.destroy()
def bye(self):
"""Shut the turtlegraphics window.
Example (for a TurtleScreen instance named screen):
>>> screen.bye()
"""
self._destroy()
def exitonclick(self):
"""Go into mainloop until the mouse is clicked.
No arguments.
Bind bye() method to mouseclick on TurtleScreen.
If "using_IDLE" - value in configuration dictionary is False
(default value), enter mainloop.
If IDLE with -n switch (no subprocess) is used, this value should be
set to True in turtle.cfg. In this case IDLE's mainloop
is active also for the client script.
This is a method of the Screen-class and not available for
TurtleScreen instances.
Example (for a Screen instance named screen):
>>> screen.exitonclick()
"""
def exitGracefully(x, y):
"""Screen.bye() with two dummy-parameters"""
self.bye()
self.onclick(exitGracefully)
if _CFG["using_IDLE"]:
return
try:
mainloop()
except AttributeError:
exit(0)
class Turtle(RawTurtle):
"""RawTurtle auto-creating (scrolled) canvas.
When a Turtle object is created or a function derived from some
Turtle method is called, a TurtleScreen object is automatically created.
"""
_pen = None
_screen = None
def __init__(self,
shape=_CFG["shape"],
undobuffersize=_CFG["undobuffersize"],
visible=_CFG["visible"]):
if Turtle._screen is None:
Turtle._screen = Screen()
RawTurtle.__init__(self, Turtle._screen,
shape=shape,
undobuffersize=undobuffersize,
visible=visible)
Pen = Turtle
def write_docstringdict(filename="turtle_docstringdict"):
"""Create and write docstring-dictionary to file.
Optional argument:
filename -- a string, used as filename
default value is turtle_docstringdict
Has to be called explicitly (not used by the turtle-graphics classes).
The docstring dictionary will be written to the Python script <filename>.py
It is intended to serve as a template for translation of the docstrings
into different languages.
"""
docsdict = {}
for methodname in _tg_screen_functions:
key = "_Screen."+methodname
docsdict[key] = eval(key).__doc__
for methodname in _tg_turtle_functions:
key = "Turtle."+methodname
docsdict[key] = eval(key).__doc__
with open("%s.py" % filename,"w") as f:
keys = sorted([x for x in docsdict.keys()
if x.split('.')[1] not in _alias_list])
f.write('docsdict = {\n\n')
for key in keys[:-1]:
f.write('%s :\n' % repr(key))
f.write(' """%s\n""",\n\n' % docsdict[key])
key = keys[-1]
f.write('%s :\n' % repr(key))
f.write(' """%s\n"""\n\n' % docsdict[key])
f.write("}\n")
def read_docstrings(lang):
"""Read in docstrings from lang-specific docstring dictionary.
Transfer docstrings, translated to lang, from a dictionary-file
to the methods of classes Screen and Turtle and - in revised form -
to the corresponding functions.
"""
modname = "turtle_docstringdict_%(language)s" % {'language':lang.lower()}
module = __import__(modname)
docsdict = module.docsdict
for key in docsdict:
try:
# eval(key).im_func.__doc__ = docsdict[key]
eval(key).__doc__ = docsdict[key]
except:
print("Bad docstring-entry: %s" % key)
_LANGUAGE = _CFG["language"]
try:
if _LANGUAGE != "english":
read_docstrings(_LANGUAGE)
except ImportError:
print("Cannot find docsdict for", _LANGUAGE)
except:
print ("Unknown Error when trying to import %s-docstring-dictionary" %
_LANGUAGE)
def getmethparlist(ob):
"""Get strings describing the arguments for the given object
Returns a pair of strings representing function parameter lists
including parenthesis. The first string is suitable for use in
function definition and the second is suitable for use in function
call. The "self" parameter is not included.
"""
defText = callText = ""
# bit of a hack for methods - turn it into a function
# but we drop the "self" param.
# Try and build one for Python defined functions
args, varargs, varkw = inspect.getargs(ob.__code__)
items2 = args[1:]
realArgs = args[1:]
defaults = ob.__defaults__ or []
defaults = ["=%r" % (value,) for value in defaults]
defaults = [""] * (len(realArgs)-len(defaults)) + defaults
items1 = [arg + dflt for arg, dflt in zip(realArgs, defaults)]
if varargs is not None:
items1.append("*" + varargs)
items2.append("*" + varargs)
if varkw is not None:
items1.append("**" + varkw)
items2.append("**" + varkw)
defText = ", ".join(items1)
defText = "(%s)" % defText
callText = ", ".join(items2)
callText = "(%s)" % callText
return defText, callText
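# Illustration (hypothetical method, not part of the original module): for
# a method defined as  def circle(self, radius, extent=None, steps=None)
# this returns ("(radius, extent=None, steps=None)", "(radius, extent, steps)"),
# i.e. a def-style parameter list and a matching call-style argument list.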
def _turtle_docrevise(docstr):
"""To reduce docstrings from RawTurtle class for functions
"""
import re
if docstr is None:
return None
turtlename = _CFG["exampleturtle"]
newdocstr = docstr.replace("%s." % turtlename,"")
parexp = re.compile(r' \(.+ %s\):' % turtlename)
newdocstr = parexp.sub(":", newdocstr)
return newdocstr
def _screen_docrevise(docstr):
"""To reduce docstrings from TurtleScreen class for functions
"""
import re
if docstr is None:
return None
screenname = _CFG["examplescreen"]
newdocstr = docstr.replace("%s." % screenname,"")
parexp = re.compile(r' \(.+ %s\):' % screenname)
newdocstr = parexp.sub(":", newdocstr)
return newdocstr
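# Illustration (assuming the default exampleturtle/examplescreen names):
# _turtle_docrevise turns the docstring line
#     "Example (for a Turtle instance named turtle):"  into  "Example:"
# and strips the "turtle." prefix, so ">>> turtle.fd(50)" becomes ">>> fd(50)".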
## The following mechanism makes all methods of RawTurtle and Turtle available
## as functions. So we can enhance, change, add, or delete methods in these
## classes without needing to change anything here.
__func_body = """\
def {name}{paramslist}:
if {obj} is None:
if not TurtleScreen._RUNNING:
TurtleScreen._RUNNING = True
raise Terminator
{obj} = {init}
try:
return {obj}.{name}{argslist}
except TK.TclError:
if not TurtleScreen._RUNNING:
TurtleScreen._RUNNING = True
raise Terminator
raise
"""
def _make_global_funcs(functions, cls, obj, init, docrevise):
for methodname in functions:
method = getattr(cls, methodname)
pl1, pl2 = getmethparlist(method)
if pl1 == "":
print(">>>>>>", pl1, pl2)
continue
defstr = __func_body.format(obj=obj, init=init, name=methodname,
paramslist=pl1, argslist=pl2)
exec(defstr, globals())
globals()[methodname].__doc__ = docrevise(method.__doc__)
_make_global_funcs(_tg_screen_functions, _Screen,
'Turtle._screen', 'Screen()', _screen_docrevise)
_make_global_funcs(_tg_turtle_functions, Turtle,
'Turtle._pen', 'Turtle()', _turtle_docrevise)
done = mainloop
if __name__ == "__main__":
def switchpen():
if isdown():
pu()
else:
pd()
def demo1():
"""Demo of old turtle.py - module"""
reset()
tracer(True)
up()
backward(100)
down()
# draw 3 squares; the last filled
width(3)
for i in range(3):
if i == 2:
begin_fill()
for _ in range(4):
forward(20)
left(90)
if i == 2:
color("maroon")
end_fill()
up()
forward(30)
down()
width(1)
color("black")
# move out of the way
tracer(False)
up()
right(90)
forward(100)
right(90)
forward(100)
right(180)
down()
# some text
write("startstart", 1)
write("start", 1)
color("red")
# staircase
for i in range(5):
forward(20)
left(90)
forward(20)
right(90)
# filled staircase
tracer(True)
begin_fill()
for i in range(5):
forward(20)
left(90)
forward(20)
right(90)
end_fill()
# more text
write("end")
def demo2():
"""Demo of some new features."""
speed(1)
st()
pensize(3)
setheading(towards(0, 0))
radius = distance(0, 0)/2.0
rt(90)
for _ in range(18):
switchpen()
circle(radius, 10)
write("wait a moment...")
while undobufferentries():
undo()
reset()
lt(90)
colormode(255)
laenge = 10
pencolor("green")
pensize(3)
lt(180)
for i in range(-2, 16):
if i > 0:
begin_fill()
fillcolor(255-15*i, 0, 15*i)
for _ in range(3):
fd(laenge)
lt(120)
end_fill()
laenge += 10
lt(15)
speed((speed()+1)%12)
#end_fill()
lt(120)
pu()
fd(70)
rt(30)
pd()
color("red","yellow")
speed(0)
begin_fill()
for _ in range(4):
circle(50, 90)
rt(90)
fd(30)
rt(90)
end_fill()
lt(90)
pu()
fd(30)
pd()
shape("turtle")
tri = getturtle()
tri.resizemode("auto")
turtle = Turtle()
turtle.resizemode("auto")
turtle.shape("turtle")
turtle.reset()
turtle.left(90)
turtle.speed(0)
turtle.up()
turtle.goto(280, 40)
turtle.lt(30)
turtle.down()
turtle.speed(6)
turtle.color("blue","orange")
turtle.pensize(2)
tri.speed(6)
setheading(towards(turtle))
count = 1
while tri.distance(turtle) > 4:
turtle.fd(3.5)
turtle.lt(0.6)
tri.setheading(tri.towards(turtle))
tri.fd(4)
if count % 20 == 0:
turtle.stamp()
tri.stamp()
switchpen()
count += 1
tri.write("CAUGHT! ", font=("Arial", 16, "bold"), align="right")
tri.pencolor("black")
tri.pencolor("red")
def baba(xdummy, ydummy):
clearscreen()
bye()
time.sleep(2)
while undobufferentries():
tri.undo()
turtle.undo()
tri.fd(50)
tri.write(" Click me!", font = ("Courier", 12, "bold") )
tri.onclick(baba, 1)
demo1()
demo2()
exitonclick()
"""
Define names for built-in types that aren't directly accessible as a builtin.
"""
import sys
# Iterators in Python aren't a matter of type but of protocol. A large
# and changing number of builtin types implement *some* flavor of
# iterator. Don't check the type! Use hasattr to check for both
# "__iter__" and "__next__" attributes instead.
def _f(): pass
FunctionType = type(_f)
LambdaType = type(lambda: None) # Same as FunctionType
CodeType = type(_f.__code__)
MappingProxyType = type(type.__dict__)
SimpleNamespace = type(sys.implementation)
def _g():
yield 1
GeneratorType = type(_g())
class _C:
def _m(self): pass
MethodType = type(_C()._m)
BuiltinFunctionType = type(len)
BuiltinMethodType = type([].append) # Same as BuiltinFunctionType
ModuleType = type(sys)
try:
raise TypeError
except TypeError:
tb = sys.exc_info()[2]
TracebackType = type(tb)
FrameType = type(tb.tb_frame)
tb = None; del tb
# For Jython, the following two types are identical
GetSetDescriptorType = type(FunctionType.__code__)
MemberDescriptorType = type(ModuleType.__dict__["__dict__"]) # ironpython: type(FunctionType.__globals__) is getset_descriptor
del sys, _f, _g, _C, # Not for export
# Provide a PEP 3115 compliant mechanism for class creation
def new_class(name, bases=(), kwds=None, exec_body=None):
"""Create a class object dynamically using the appropriate metaclass."""
meta, ns, kwds = prepare_class(name, bases, kwds)
if exec_body is not None:
exec_body(ns)
return meta(name, bases, ns, **kwds)
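# Usage sketch (hypothetical names, added for illustration): the statement
#     class C(Base, metaclass=Meta):
#         x = 1
# is roughly equivalent to
#     C = new_class("C", (Base,), {"metaclass": Meta},
#                   lambda ns: ns.update(x=1))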
def prepare_class(name, bases=(), kwds=None):
"""Call the __prepare__ method of the appropriate metaclass.
Returns (metaclass, namespace, kwds) as a 3-tuple
*metaclass* is the appropriate metaclass
*namespace* is the prepared class namespace
*kwds* is an updated copy of the passed in kwds argument with any
'metaclass' entry removed. If no kwds argument is passed in, this will
be an empty dict.
"""
if kwds is None:
kwds = {}
else:
kwds = dict(kwds) # Don't alter the provided mapping
if 'metaclass' in kwds:
meta = kwds.pop('metaclass')
else:
if bases:
meta = type(bases[0])
else:
meta = type
if isinstance(meta, type):
# when meta is a type, we first determine the most-derived metaclass
# instead of invoking the initial candidate directly
meta = _calculate_meta(meta, bases)
if hasattr(meta, '__prepare__'):
ns = meta.__prepare__(name, bases, **kwds)
else:
ns = {}
return meta, ns, kwds
def _calculate_meta(meta, bases):
"""Calculate the most derived metaclass."""
winner = meta
for base in bases:
base_meta = type(base)
if issubclass(winner, base_meta):
continue
if issubclass(base_meta, winner):
winner = base_meta
continue
# else:
raise TypeError("metaclass conflict: "
"the metaclass of a derived class "
"must be a (non-strict) subclass "
"of the metaclasses of all its bases")
return winner
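# Illustration (hypothetical classes): if bases == (A, B) with type(A) is
# Meta1 and type(B) is Meta2, where Meta2 is a subclass of Meta1, the winner
# is Meta2; if neither metaclass is a subclass of the other, the TypeError
# above is raised.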
class DynamicClassAttribute:
"""Route attribute access on a class to __getattr__.
This is a descriptor, used to define attributes that act differently when
accessed through an instance and through a class. Instance access remains
normal, but access to an attribute through a class will be routed to the
class's __getattr__ method; this is done by raising AttributeError.
This allows one to have properties active on an instance, and have virtual
attributes on the class with the same name (see Enum for an example).
"""
def __init__(self, fget=None, fset=None, fdel=None, doc=None):
self.fget = fget
self.fset = fset
self.fdel = fdel
# next two lines make DynamicClassAttribute act the same as property
self.__doc__ = doc or fget.__doc__
self.overwrite_doc = doc is None
# support for abstract methods
self.__isabstractmethod__ = bool(getattr(fget, '__isabstractmethod__', False))
def __get__(self, instance, ownerclass=None):
if instance is None:
if self.__isabstractmethod__:
return self
raise AttributeError()
elif self.fget is None:
raise AttributeError("unreadable attribute")
return self.fget(instance)
def __set__(self, instance, value):
if self.fset is None:
raise AttributeError("can't set attribute")
self.fset(instance, value)
def __delete__(self, instance):
if self.fdel is None:
raise AttributeError("can't delete attribute")
self.fdel(instance)
def getter(self, fget):
fdoc = fget.__doc__ if self.overwrite_doc else None
result = type(self)(fget, self.fset, self.fdel, fdoc or self.__doc__)
result.overwrite_doc = self.overwrite_doc
return result
def setter(self, fset):
result = type(self)(self.fget, fset, self.fdel, self.__doc__)
result.overwrite_doc = self.overwrite_doc
return result
def deleter(self, fdel):
result = type(self)(self.fget, self.fset, fdel, self.__doc__)
result.overwrite_doc = self.overwrite_doc
return result
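# Minimal usage sketch (hypothetical class, mirroring how Enum uses this
# descriptor): instance access works like a property, while class access
# raises AttributeError and is therefore routed to the metaclass:
#     class Meta(type):
#         def __getattr__(cls, name):
#             return "virtual %s" % name      # sees 'value' lookups
#     class Member(metaclass=Meta):
#         @DynamicClassAttribute
#         def value(self):
#             return 42
#     Member().value   -> 42
#     Member.value     -> "virtual value"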
__all__ = [n for n in globals() if n[:1] != '_']
import abc
from abc import abstractmethod, abstractproperty
import collections
import contextlib
import functools
import re as stdlib_re # Avoid confusion with the re we export.
import sys
import types
try:
import collections.abc as collections_abc
except ImportError:
import collections as collections_abc # Fallback for PY3.2.
if sys.version_info[:2] >= (3, 6):
import _collections_abc # Needed for private function _check_methods # noqa
try:
from types import WrapperDescriptorType, MethodWrapperType, MethodDescriptorType
except ImportError:
WrapperDescriptorType = type(object.__init__)
MethodWrapperType = type(object().__str__)
MethodDescriptorType = type(str.join)
# Please keep __all__ alphabetized within each category.
__all__ = [
# Super-special typing primitives.
'Any',
'Callable',
'ClassVar',
'Generic',
'Optional',
'Tuple',
'Type',
'TypeVar',
'Union',
# ABCs (from collections.abc).
'AbstractSet', # collections.abc.Set.
'GenericMeta', # subclass of abc.ABCMeta and a metaclass
# for 'Generic' and ABCs below.
'ByteString',
'Container',
'ContextManager',
'Hashable',
'ItemsView',
'Iterable',
'Iterator',
'KeysView',
'Mapping',
'MappingView',
'MutableMapping',
'MutableSequence',
'MutableSet',
'Sequence',
'Sized',
'ValuesView',
# The following are added depending on presence
# of their non-generic counterparts in stdlib:
# Awaitable,
# AsyncIterator,
# AsyncIterable,
# Coroutine,
# Collection,
# AsyncGenerator,
# AsyncContextManager
# Structural checks, a.k.a. protocols.
'Reversible',
'SupportsAbs',
'SupportsBytes',
'SupportsComplex',
'SupportsFloat',
'SupportsIndex',
'SupportsInt',
'SupportsRound',
# Concrete collection types.
'Counter',
'Deque',
'Dict',
'DefaultDict',
'List',
'Set',
'FrozenSet',
'NamedTuple', # Not really a type.
'Generator',
# One-off things.
'AnyStr',
'cast',
'get_type_hints',
'NewType',
'no_type_check',
'no_type_check_decorator',
'NoReturn',
'overload',
'Text',
'TYPE_CHECKING',
]
# The pseudo-submodules 're' and 'io' are part of the public
# namespace, but excluded from __all__ because they might stomp on
# legitimate imports of those modules.
def _qualname(x):
if sys.version_info[:2] >= (3, 3):
return x.__qualname__
else:
# Fall back to just name.
return x.__name__
def _trim_name(nm):
whitelist = ('_TypeAlias', '_ForwardRef', '_TypingBase', '_FinalTypingBase')
if nm.startswith('_') and nm not in whitelist:
nm = nm[1:]
return nm
class TypingMeta(type):
"""Metaclass for most types defined in typing module
(not a part of public API).
This overrides __new__() to require an extra keyword parameter
'_root', which serves as a guard against naive subclassing of the
typing classes. Any legitimate class defined using a metaclass
derived from TypingMeta must pass _root=True.
This also defines a dummy constructor (all the work for most typing
constructs is done in __new__) and a nicer repr().
"""
_is_protocol = False
def __new__(cls, name, bases, namespace, *, _root=False):
if not _root:
raise TypeError("Cannot subclass %s" %
(', '.join(map(_type_repr, bases)) or '()'))
return super().__new__(cls, name, bases, namespace)
def __init__(self, *args, **kwds):
pass
def _eval_type(self, globalns, localns):
"""Override this in subclasses to interpret forward references.
For example, List['C'] is internally stored as
List[_ForwardRef('C')], which should evaluate to List[C],
where C is an object found in globalns or localns (searching
localns first, of course).
"""
return self
def _get_type_vars(self, tvars):
pass
def __repr__(self):
qname = _trim_name(_qualname(self))
return '%s.%s' % (self.__module__, qname)
class _TypingBase(metaclass=TypingMeta, _root=True):
"""Internal indicator of special typing constructs."""
__slots__ = ('__weakref__',)
def __init__(self, *args, **kwds):
pass
def __new__(cls, *args, **kwds):
"""Constructor.
This only exists to give a better error message in case
someone tries to subclass a special typing object (not a good idea).
"""
if (len(args) == 3 and
isinstance(args[0], str) and
isinstance(args[1], tuple)):
# Close enough.
raise TypeError("Cannot subclass %r" % cls)
return super().__new__(cls)
# Things that are not classes also need these.
def _eval_type(self, globalns, localns):
return self
def _get_type_vars(self, tvars):
pass
def __repr__(self):
cls = type(self)
qname = _trim_name(_qualname(cls))
return '%s.%s' % (cls.__module__, qname)
def __call__(self, *args, **kwds):
raise TypeError("Cannot instantiate %r" % type(self))
class _FinalTypingBase(_TypingBase, _root=True):
"""Internal mix-in class to prevent instantiation.
Prevents instantiation unless _root=True is given in class call.
It is used to create pseudo-singleton instances Any, Union, Optional, etc.
"""
__slots__ = ()
def __new__(cls, *args, _root=False, **kwds):
self = super().__new__(cls, *args, **kwds)
if _root is True:
return self
raise TypeError("Cannot instantiate %r" % cls)
def __reduce__(self):
return _trim_name(type(self).__name__)
class _ForwardRef(_TypingBase, _root=True):
"""Internal wrapper to hold a forward reference."""
__slots__ = ('__forward_arg__', '__forward_code__',
'__forward_evaluated__', '__forward_value__')
def __init__(self, arg):
super().__init__(arg)
if not isinstance(arg, str):
raise TypeError('Forward reference must be a string -- got %r' % (arg,))
try:
code = compile(arg, '<string>', 'eval')
except SyntaxError:
raise SyntaxError('Forward reference must be an expression -- got %r' %
(arg,))
self.__forward_arg__ = arg
self.__forward_code__ = code
self.__forward_evaluated__ = False
self.__forward_value__ = None
def _eval_type(self, globalns, localns):
if not self.__forward_evaluated__ or localns is not globalns:
if globalns is None and localns is None:
globalns = localns = {}
elif globalns is None:
globalns = localns
elif localns is None:
localns = globalns
self.__forward_value__ = _type_check(
eval(self.__forward_code__, globalns, localns),
"Forward references must evaluate to types.")
self.__forward_evaluated__ = True
return self.__forward_value__
def __eq__(self, other):
if not isinstance(other, _ForwardRef):
return NotImplemented
if self.__forward_evaluated__ and other.__forward_evaluated__:
return (self.__forward_arg__ == other.__forward_arg__ and
self.__forward_value__ == other.__forward_value__)
return self.__forward_arg__ == other.__forward_arg__
def __hash__(self):
return hash(self.__forward_arg__)
def __instancecheck__(self, obj):
raise TypeError("Forward references cannot be used with isinstance().")
def __subclasscheck__(self, cls):
raise TypeError("Forward references cannot be used with issubclass().")
def __repr__(self):
return '_ForwardRef(%r)' % (self.__forward_arg__,)
class _TypeAlias(_TypingBase, _root=True):
"""Internal helper class for defining generic variants of concrete types.
Note that this is not a type; let's call it a pseudo-type. It cannot
be used in instance and subclass checks in parameterized form, i.e.
``isinstance(42, Match[str])`` raises ``TypeError`` instead of returning
``False``.
"""
__slots__ = ('name', 'type_var', 'impl_type', 'type_checker')
def __init__(self, name, type_var, impl_type, type_checker):
"""Initializer.
Args:
name: The name, e.g. 'Pattern'.
type_var: The type parameter, e.g. AnyStr, or the
specific type, e.g. str.
impl_type: The implementation type.
type_checker: Function that takes an impl_type instance
and returns a value that should be a type_var instance.
"""
assert isinstance(name, str), repr(name)
assert isinstance(impl_type, type), repr(impl_type)
assert not isinstance(impl_type, TypingMeta), repr(impl_type)
assert isinstance(type_var, (type, _TypingBase)), repr(type_var)
self.name = name
self.type_var = type_var
self.impl_type = impl_type
self.type_checker = type_checker
def __repr__(self):
return "%s[%s]" % (self.name, _type_repr(self.type_var))
def __getitem__(self, parameter):
if not isinstance(self.type_var, TypeVar):
raise TypeError("%s cannot be further parameterized." % self)
if self.type_var.__constraints__ and isinstance(parameter, type):
if not issubclass(parameter, self.type_var.__constraints__):
raise TypeError("%s is not a valid substitution for %s." %
(parameter, self.type_var))
if isinstance(parameter, TypeVar) and parameter is not self.type_var:
raise TypeError("%s cannot be re-parameterized." % self)
return self.__class__(self.name, parameter,
self.impl_type, self.type_checker)
def __eq__(self, other):
if not isinstance(other, _TypeAlias):
return NotImplemented
return self.name == other.name and self.type_var == other.type_var
def __hash__(self):
return hash((self.name, self.type_var))
def __instancecheck__(self, obj):
if not isinstance(self.type_var, TypeVar):
raise TypeError("Parameterized type aliases cannot be used "
"with isinstance().")
return isinstance(obj, self.impl_type)
def __subclasscheck__(self, cls):
if not isinstance(self.type_var, TypeVar):
raise TypeError("Parameterized type aliases cannot be used "
"with issubclass().")
return issubclass(cls, self.impl_type)
def _get_type_vars(types, tvars):
for t in types:
if isinstance(t, TypingMeta) or isinstance(t, _TypingBase):
t._get_type_vars(tvars)
def _type_vars(types):
tvars = []
_get_type_vars(types, tvars)
return tuple(tvars)
def _eval_type(t, globalns, localns):
if isinstance(t, TypingMeta) or isinstance(t, _TypingBase):
return t._eval_type(globalns, localns)
return t
def _type_check(arg, msg):
"""Check that the argument is a type, and return it (internal helper).
As a special case, accept None and return type(None) instead.
Also, _TypeAlias instances (e.g. Match, Pattern) are acceptable.
The msg argument is a human-readable error message, e.g.
"Union[arg, ...]: arg should be a type."
We append the repr() of the actual value (truncated to 100 chars).
"""
if arg is None:
return type(None)
if isinstance(arg, str):
arg = _ForwardRef(arg)
if (
isinstance(arg, _TypingBase) and type(arg).__name__ == '_ClassVar' or
not isinstance(arg, (type, _TypingBase)) and not callable(arg)
):
raise TypeError(msg + " Got %.100r." % (arg,))
# Bare Union etc. are not valid as type arguments
if (
type(arg).__name__ in ('_Union', '_Optional') and
not getattr(arg, '__origin__', None) or
isinstance(arg, TypingMeta) and arg._gorg in (Generic, _Protocol)
):
raise TypeError("Plain %s is not valid as type argument" % arg)
return arg
def _type_repr(obj):
"""Return the repr() of an object, special-casing types (internal helper).
If obj is a type, we return a shorter version than the default
type.__repr__, based on the module and qualified name, which is
typically enough to uniquely identify a type. For everything
else, we fall back on repr(obj).
"""
if isinstance(obj, type) and not isinstance(obj, TypingMeta):
if obj.__module__ == 'builtins':
return _qualname(obj)
return '%s.%s' % (obj.__module__, _qualname(obj))
if obj is ...:
return '...'
if isinstance(obj, types.FunctionType):
return obj.__name__
return repr(obj)
class _Any(_FinalTypingBase, _root=True):
"""Special type indicating an unconstrained type.
- Any is compatible with every type.
- Any assumed to have all methods.
- All values assumed to be instances of Any.
Note that all the above statements are true from the point of view of
static type checkers. At runtime, Any should not be used with instance
or class checks.
"""
__slots__ = ()
def __instancecheck__(self, obj):
raise TypeError("Any cannot be used with isinstance().")
def __subclasscheck__(self, cls):
raise TypeError("Any cannot be used with issubclass().")
Any = _Any(_root=True)
class _NoReturn(_FinalTypingBase, _root=True):
"""Special type indicating functions that never return.
Example::
from typing import NoReturn
def stop() -> NoReturn:
raise Exception('no way')
This type is invalid in other positions, e.g., ``List[NoReturn]``
will fail in static type checkers.
"""
__slots__ = ()
def __instancecheck__(self, obj):
raise TypeError("NoReturn cannot be used with isinstance().")
def __subclasscheck__(self, cls):
raise TypeError("NoReturn cannot be used with issubclass().")
NoReturn = _NoReturn(_root=True)
class TypeVar(_TypingBase, _root=True):
"""Type variable.
Usage::
T = TypeVar('T') # Can be anything
A = TypeVar('A', str, bytes) # Must be str or bytes
Type variables exist primarily for the benefit of static type
checkers. They serve as the parameters for generic types as well
as for generic function definitions. See class Generic for more
information on generic types. Generic functions work as follows:
def repeat(x: T, n: int) -> List[T]:
'''Return a list containing n references to x.'''
return [x]*n
def longest(x: A, y: A) -> A:
'''Return the longest of two strings.'''
return x if len(x) >= len(y) else y
The latter example's signature is essentially the overloading
of (str, str) -> str and (bytes, bytes) -> bytes. Also note
that if the arguments are instances of some subclass of str,
the return type is still plain str.
At runtime, isinstance(x, T) and issubclass(C, T) will raise TypeError.
Type variables defined with covariant=True or contravariant=True
can be used to declare covariant or contravariant generic types.
See PEP 484 for more details. By default generic types are invariant
in all type variables.
Type variables can be introspected. e.g.:
T.__name__ == 'T'
T.__constraints__ == ()
T.__covariant__ == False
T.__contravariant__ == False
A.__constraints__ == (str, bytes)
"""
__slots__ = ('__name__', '__bound__', '__constraints__',
'__covariant__', '__contravariant__')
def __init__(self, name, *constraints, bound=None,
covariant=False, contravariant=False):
super().__init__(name, *constraints, bound=bound,
covariant=covariant, contravariant=contravariant)
self.__name__ = name
if covariant and contravariant:
raise ValueError("Bivariant types are not supported.")
self.__covariant__ = bool(covariant)
self.__contravariant__ = bool(contravariant)
if constraints and bound is not None:
raise TypeError("Constraints cannot be combined with bound=...")
if constraints and len(constraints) == 1:
raise TypeError("A single constraint is not allowed")
msg = "TypeVar(name, constraint, ...): constraints must be types."
self.__constraints__ = tuple(_type_check(t, msg) for t in constraints)
if bound:
self.__bound__ = _type_check(bound, "Bound must be a type.")
else:
self.__bound__ = None
def _get_type_vars(self, tvars):
if self not in tvars:
tvars.append(self)
def __repr__(self):
if self.__covariant__:
prefix = '+'
elif self.__contravariant__:
prefix = '-'
else:
prefix = '~'
return prefix + self.__name__
def __instancecheck__(self, instance):
raise TypeError("Type variables cannot be used with isinstance().")
def __subclasscheck__(self, cls):
raise TypeError("Type variables cannot be used with issubclass().")
# Some unconstrained type variables. These are used by the container types.
# (These are not for export.)
T = TypeVar('T') # Any type.
KT = TypeVar('KT') # Key type.
VT = TypeVar('VT') # Value type.
T_co = TypeVar('T_co', covariant=True) # Any type covariant containers.
V_co = TypeVar('V_co', covariant=True) # Any type covariant containers.
VT_co = TypeVar('VT_co', covariant=True) # Value type covariant containers.
T_contra = TypeVar('T_contra', contravariant=True) # Ditto contravariant.
# A useful type variable with constraints. This represents string types.
# (This one *is* for export!)
AnyStr = TypeVar('AnyStr', bytes, str)
def _replace_arg(arg, tvars, args):
"""An internal helper function: replace arg if it is a type variable
found in tvars with corresponding substitution from args or
with corresponding substitution sub-tree if arg is a generic type.
"""
if tvars is None:
tvars = []
if hasattr(arg, '_subs_tree') and isinstance(arg, (GenericMeta, _TypingBase)):
return arg._subs_tree(tvars, args)
if isinstance(arg, TypeVar):
for i, tvar in enumerate(tvars):
if arg == tvar:
return args[i]
return arg
# Special typing constructs Union, Optional, Generic, Callable and Tuple
# use three special attributes for internal bookkeeping of generic types:
# * __parameters__ is a tuple of unique free type parameters of a generic
# type, for example, Dict[T, T].__parameters__ == (T,);
# * __origin__ keeps a reference to a type that was subscripted,
# e.g., Union[T, int].__origin__ == Union;
# * __args__ is a tuple of all arguments used in subscripting,
# e.g., Dict[T, int].__args__ == (T, int).
def _subs_tree(cls, tvars=None, args=None):
"""An internal helper function: calculate substitution tree
for generic cls after replacing its type parameters with
substitutions in tvars -> args (if any).
Repeat the same following __origin__'s.
Return a list of arguments with all possible substitutions
performed. Arguments that are generic classes themselves are represented
as tuples (so that no new classes are created by this function).
For example: _subs_tree(List[Tuple[int, T]][str]) == [(Tuple, int, str)]
"""
if cls.__origin__ is None:
return cls
# Make a chain of origins (i.e. cls -> cls.__origin__)
current = cls.__origin__
orig_chain = []
while current.__origin__ is not None:
orig_chain.append(current)
current = current.__origin__
# Replace type variables in __args__ if asked ...
tree_args = []
for arg in cls.__args__:
tree_args.append(_replace_arg(arg, tvars, args))
# ... then continue replacing down the origin chain.
for ocls in orig_chain:
new_tree_args = []
for arg in ocls.__args__:
new_tree_args.append(_replace_arg(arg, ocls.__parameters__, tree_args))
tree_args = new_tree_args
return tree_args
def _remove_dups_flatten(parameters):
"""An internal helper for Union creation and substitution: flatten Union's
among parameters, then remove duplicates and strict subclasses.
"""
# Flatten out Union[Union[...], ...].
params = []
for p in parameters:
if isinstance(p, _Union) and p.__origin__ is Union:
params.extend(p.__args__)
elif isinstance(p, tuple) and len(p) > 0 and p[0] is Union:
params.extend(p[1:])
else:
params.append(p)
# Weed out strict duplicates, preserving the first of each occurrence.
all_params = set(params)
if len(all_params) < len(params):
new_params = []
for t in params:
if t in all_params:
new_params.append(t)
all_params.remove(t)
params = new_params
assert not all_params, all_params
# Weed out subclasses.
# E.g. Union[int, Employee, Manager] == Union[int, Employee].
# If object is present it will be sole survivor among proper classes.
# Never discard type variables.
# (In particular, Union[str, AnyStr] != AnyStr.)
all_params = set(params)
for t1 in params:
if not isinstance(t1, type):
continue
if any(isinstance(t2, type) and issubclass(t1, t2)
for t2 in all_params - {t1}
if not (isinstance(t2, GenericMeta) and
t2.__origin__ is not None)):
all_params.remove(t1)
return tuple(t for t in params if t in all_params)
def _check_generic(cls, parameters):
# Check correct count for parameters of a generic cls (internal helper).
if not cls.__parameters__:
raise TypeError("%s is not a generic class" % repr(cls))
alen = len(parameters)
elen = len(cls.__parameters__)
if alen != elen:
raise TypeError("Too %s parameters for %s; actual %s, expected %s" %
("many" if alen > elen else "few", repr(cls), alen, elen))
_cleanups = []
def _tp_cache(func):
"""Internal wrapper caching __getitem__ of generic types with a fallback to
original function for non-hashable arguments.
"""
cached = functools.lru_cache()(func)
_cleanups.append(cached.cache_clear)
@functools.wraps(func)
def inner(*args, **kwds):
try:
return cached(*args, **kwds)
except TypeError:
pass # All real errors (not unhashable args) are raised below.
return func(*args, **kwds)
return inner
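# Note (added for illustration): thanks to this cache, repeated subscriptions
# such as Union[int, str] return the same object, while an unhashable
# parameter (e.g. a list) makes lru_cache raise TypeError, which is swallowed
# above so the call falls through to the uncached function.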
class _Union(_FinalTypingBase, _root=True):
"""Union type; Union[X, Y] means either X or Y.
To define a union, use e.g. Union[int, str]. Details:
- The arguments must be types and there must be at least one.
- None as an argument is a special case and is replaced by
type(None).
- Unions of unions are flattened, e.g.::
Union[Union[int, str], float] == Union[int, str, float]
- Unions of a single argument vanish, e.g.::
Union[int] == int # The constructor actually returns int
- Redundant arguments are skipped, e.g.::
Union[int, str, int] == Union[int, str]
- When comparing unions, the argument order is ignored, e.g.::
Union[int, str] == Union[str, int]
- When two arguments have a subclass relationship, the least
derived argument is kept, e.g.::
class Employee: pass
class Manager(Employee): pass
Union[int, Employee, Manager] == Union[int, Employee]
Union[Manager, int, Employee] == Union[int, Employee]
Union[Employee, Manager] == Employee
- Similar for object::
Union[int, object] == object
- You cannot subclass or instantiate a union.
- You can use Optional[X] as a shorthand for Union[X, None].
"""
__slots__ = ('__parameters__', '__args__', '__origin__', '__tree_hash__')
def __new__(cls, parameters=None, origin=None, *args, _root=False):
self = super().__new__(cls, parameters, origin, *args, _root=_root)
if origin is None:
self.__parameters__ = None
self.__args__ = None
self.__origin__ = None
self.__tree_hash__ = hash(frozenset(('Union',)))
return self
if not isinstance(parameters, tuple):
raise TypeError("Expected parameters=<tuple>")
if origin is Union:
parameters = _remove_dups_flatten(parameters)
# It's not a union if there's only one type left.
if len(parameters) == 1:
return parameters[0]
self.__parameters__ = _type_vars(parameters)
self.__args__ = parameters
self.__origin__ = origin
# Pre-calculate the __hash__ on instantiation.
# This improves speed for complex substitutions.
subs_tree = self._subs_tree()
if isinstance(subs_tree, tuple):
self.__tree_hash__ = hash(frozenset(subs_tree))
else:
self.__tree_hash__ = hash(subs_tree)
return self
def _eval_type(self, globalns, localns):
if self.__args__ is None:
return self
ev_args = tuple(_eval_type(t, globalns, localns) for t in self.__args__)
ev_origin = _eval_type(self.__origin__, globalns, localns)
if ev_args == self.__args__ and ev_origin == self.__origin__:
# Everything is already evaluated.
return self
return self.__class__(ev_args, ev_origin, _root=True)
def _get_type_vars(self, tvars):
if self.__origin__ and self.__parameters__:
_get_type_vars(self.__parameters__, tvars)
def __repr__(self):
if self.__origin__ is None:
return super().__repr__()
tree = self._subs_tree()
if not isinstance(tree, tuple):
return repr(tree)
return tree[0]._tree_repr(tree)
def _tree_repr(self, tree):
arg_list = []
for arg in tree[1:]:
if not isinstance(arg, tuple):
arg_list.append(_type_repr(arg))
else:
arg_list.append(arg[0]._tree_repr(arg))
return super().__repr__() + '[%s]' % ', '.join(arg_list)
@_tp_cache
def __getitem__(self, parameters):
if parameters == ():
raise TypeError("Cannot take a Union of no types.")
if not isinstance(parameters, tuple):
parameters = (parameters,)
if self.__origin__ is None:
msg = "Union[arg, ...]: each arg must be a type."
else:
msg = "Parameters to generic types must be types."
parameters = tuple(_type_check(p, msg) for p in parameters)
if self is not Union:
_check_generic(self, parameters)
return self.__class__(parameters, origin=self, _root=True)
def _subs_tree(self, tvars=None, args=None):
if self is Union:
return Union # Nothing to substitute
tree_args = _subs_tree(self, tvars, args)
tree_args = _remove_dups_flatten(tree_args)
if len(tree_args) == 1:
return tree_args[0] # Union of a single type is that type
return (Union,) + tree_args
def __eq__(self, other):
if isinstance(other, _Union):
return self.__tree_hash__ == other.__tree_hash__
elif self is not Union:
return self._subs_tree() == other
else:
return self is other
def __hash__(self):
return self.__tree_hash__
def __instancecheck__(self, obj):
raise TypeError("Unions cannot be used with isinstance().")
def __subclasscheck__(self, cls):
raise TypeError("Unions cannot be used with issubclass().")
Union = _Union(_root=True)
class _Optional(_FinalTypingBase, _root=True):
"""Optional type.
Optional[X] is equivalent to Union[X, None].
"""
__slots__ = ()
@_tp_cache
def __getitem__(self, arg):
arg = _type_check(arg, "Optional[t] requires a single type.")
return Union[arg, type(None)]
Optional = _Optional(_root=True)
def _next_in_mro(cls):
"""Helper for Generic.__new__.
Returns the class after the last occurrence of Generic or
Generic[...] in cls.__mro__.
"""
next_in_mro = object
# Look for the last occurrence of Generic or Generic[...].
for i, c in enumerate(cls.__mro__[:-1]):
if isinstance(c, GenericMeta) and c._gorg is Generic:
next_in_mro = cls.__mro__[i + 1]
return next_in_mro
def _make_subclasshook(cls):
"""Construct a __subclasshook__ callable that incorporates
the associated __extra__ class in subclass checks performed
against cls.
"""
if isinstance(cls.__extra__, abc.ABCMeta):
# The logic mirrors that of ABCMeta.__subclasscheck__.
# Registered classes need not be checked here because
# cls and its extra share the same _abc_registry.
def __extrahook__(subclass):
res = cls.__extra__.__subclasshook__(subclass)
if res is not NotImplemented:
return res
if cls.__extra__ in subclass.__mro__:
return True
for scls in cls.__extra__.__subclasses__():
if isinstance(scls, GenericMeta):
continue
if issubclass(subclass, scls):
return True
return NotImplemented
else:
# For non-ABC extras we'll just call issubclass().
def __extrahook__(subclass):
if cls.__extra__ and issubclass(subclass, cls.__extra__):
return True
return NotImplemented
return __extrahook__
def _no_slots_copy(dct):
"""Internal helper: copy class __dict__ and clean slots class variables.
(They will be re-created if necessary by normal class machinery.)
"""
dict_copy = dict(dct)
if '__slots__' in dict_copy:
for slot in dict_copy['__slots__']:
dict_copy.pop(slot, None)
return dict_copy
class GenericMeta(TypingMeta, abc.ABCMeta):
"""Metaclass for generic types.
This is a metaclass for typing.Generic and generic ABCs defined in
typing module. User defined subclasses of GenericMeta can override
__new__ and invoke super().__new__. Note that GenericMeta.__new__
has strict rules on what is allowed in its bases argument:
* plain Generic is disallowed in bases;
* Generic[...] should appear in bases at most once;
* if Generic[...] is present, then it should list all type variables
that appear in other bases.
In addition, type of all generic bases is erased, e.g., C[int] is
stripped to plain C.
"""
def __new__(cls, name, bases, namespace,
tvars=None, args=None, origin=None, extra=None, orig_bases=None):
"""Create a new generic class. GenericMeta.__new__ accepts
keyword arguments that are used for internal bookkeeping, therefore
an override should pass unused keyword arguments to super().
"""
if tvars is not None:
# Called from __getitem__() below.
assert origin is not None
assert all(isinstance(t, TypeVar) for t in tvars), tvars
else:
# Called from class statement.
assert tvars is None, tvars
assert args is None, args
assert origin is None, origin
# Get the full set of tvars from the bases.
tvars = _type_vars(bases)
# Look for Generic[T1, ..., Tn].
# If found, tvars must be a subset of it.
# If not found, tvars is it.
# Also check for and reject plain Generic,
# and reject multiple Generic[...].
gvars = None
for base in bases:
if base is Generic:
raise TypeError("Cannot inherit from plain Generic")
if (isinstance(base, GenericMeta) and
base.__origin__ is Generic):
if gvars is not None:
raise TypeError(
"Cannot inherit from Generic[...] multiple types.")
gvars = base.__parameters__
if gvars is None:
gvars = tvars
else:
tvarset = set(tvars)
gvarset = set(gvars)
if not tvarset <= gvarset:
raise TypeError(
"Some type variables (%s) "
"are not listed in Generic[%s]" %
(", ".join(str(t) for t in tvars if t not in gvarset),
", ".join(str(g) for g in gvars)))
tvars = gvars
initial_bases = bases
if extra is not None and type(extra) is abc.ABCMeta and extra not in bases:
bases = (extra,) + bases
bases = tuple(b._gorg if isinstance(b, GenericMeta) else b for b in bases)
# remove bare Generic from bases if there are other generic bases
if any(isinstance(b, GenericMeta) and b is not Generic for b in bases):
bases = tuple(b for b in bases if b is not Generic)
namespace.update({'__origin__': origin, '__extra__': extra,
'_gorg': None if not origin else origin._gorg})
self = super().__new__(cls, name, bases, namespace, _root=True)
super(GenericMeta, self).__setattr__('_gorg',
self if not origin else origin._gorg)
self.__parameters__ = tvars
# Be prepared for GenericMeta to be subclassed by TupleMeta
# and CallableMeta; those two allow ..., (), or [] in __args__.
self.__args__ = tuple(... if a is _TypingEllipsis else
() if a is _TypingEmpty else
a for a in args) if args else None
# Speed hack (https://github.com/python/typing/issues/196).
self.__next_in_mro__ = _next_in_mro(self)
# Preserve base classes on subclassing (__bases__ are type erased now).
if orig_bases is None:
self.__orig_bases__ = initial_bases
# This allows unparameterized generic collections to be used
# with issubclass() and isinstance() in the same way as their
# collections.abc counterparts (e.g., isinstance([], Iterable)).
if (
'__subclasshook__' not in namespace and extra or
# allow overriding
getattr(self.__subclasshook__, '__name__', '') == '__extrahook__'
):
self.__subclasshook__ = _make_subclasshook(self)
if isinstance(extra, abc.ABCMeta):
self._abc_registry = extra._abc_registry
self._abc_cache = extra._abc_cache
elif origin is not None:
self._abc_registry = origin._abc_registry
self._abc_cache = origin._abc_cache
if origin and hasattr(origin, '__qualname__'): # Fix for Python 3.2.
self.__qualname__ = origin.__qualname__
self.__tree_hash__ = (hash(self._subs_tree()) if origin else
super(GenericMeta, self).__hash__())
return self
# _abc_negative_cache and _abc_negative_cache_version are
# implemented as descriptors, since GenClass[t1, t2, ...] always
# shares subclass info with GenClass.
# This is an important memory optimization.
@property
def _abc_negative_cache(self):
if isinstance(self.__extra__, abc.ABCMeta):
return self.__extra__._abc_negative_cache
return self._gorg._abc_generic_negative_cache
@_abc_negative_cache.setter
def _abc_negative_cache(self, value):
if self.__origin__ is None:
if isinstance(self.__extra__, abc.ABCMeta):
self.__extra__._abc_negative_cache = value
else:
self._abc_generic_negative_cache = value
@property
def _abc_negative_cache_version(self):
if isinstance(self.__extra__, abc.ABCMeta):
return self.__extra__._abc_negative_cache_version
return self._gorg._abc_generic_negative_cache_version
@_abc_negative_cache_version.setter
def _abc_negative_cache_version(self, value):
if self.__origin__ is None:
if isinstance(self.__extra__, abc.ABCMeta):
self.__extra__._abc_negative_cache_version = value
else:
self._abc_generic_negative_cache_version = value
def _get_type_vars(self, tvars):
if self.__origin__ and self.__parameters__:
_get_type_vars(self.__parameters__, tvars)
def _eval_type(self, globalns, localns):
ev_origin = (self.__origin__._eval_type(globalns, localns)
if self.__origin__ else None)
ev_args = tuple(_eval_type(a, globalns, localns) for a
in self.__args__) if self.__args__ else None
if ev_origin == self.__origin__ and ev_args == self.__args__:
return self
return self.__class__(self.__name__,
self.__bases__,
_no_slots_copy(self.__dict__),
tvars=_type_vars(ev_args) if ev_args else None,
args=ev_args,
origin=ev_origin,
extra=self.__extra__,
orig_bases=self.__orig_bases__)
def __repr__(self):
if self.__origin__ is None:
return super().__repr__()
return self._tree_repr(self._subs_tree())
def _tree_repr(self, tree):
arg_list = []
for arg in tree[1:]:
if arg == ():
arg_list.append('()')
elif not isinstance(arg, tuple):
arg_list.append(_type_repr(arg))
else:
arg_list.append(arg[0]._tree_repr(arg))
return super().__repr__() + '[%s]' % ', '.join(arg_list)
def _subs_tree(self, tvars=None, args=None):
if self.__origin__ is None:
return self
tree_args = _subs_tree(self, tvars, args)
return (self._gorg,) + tuple(tree_args)
def __eq__(self, other):
if not isinstance(other, GenericMeta):
return NotImplemented
if self.__origin__ is None or other.__origin__ is None:
return self is other
return self.__tree_hash__ == other.__tree_hash__
def __hash__(self):
return self.__tree_hash__
@_tp_cache
def __getitem__(self, params):
if not isinstance(params, tuple):
params = (params,)
if not params and self._gorg is not Tuple:
raise TypeError(
"Parameter list to %s[...] cannot be empty" % _qualname(self))
msg = "Parameters to generic types must be types."
params = tuple(_type_check(p, msg) for p in params)
if self is Generic:
# Generic can only be subscripted with unique type variables.
if not all(isinstance(p, TypeVar) for p in params):
raise TypeError(
"Parameters to Generic[...] must all be type variables")
if len(set(params)) != len(params):
raise TypeError(
"Parameters to Generic[...] must all be unique")
tvars = params
args = params
elif self in (Tuple, Callable):
tvars = _type_vars(params)
args = params
elif self is _Protocol:
# _Protocol is internal, don't check anything.
tvars = params
args = params
elif self.__origin__ in (Generic, _Protocol):
# Can't subscript Generic[...] or _Protocol[...].
raise TypeError("Cannot subscript already-subscripted %s" %
repr(self))
else:
# Subscripting a regular Generic subclass.
_check_generic(self, params)
tvars = _type_vars(params)
args = params
prepend = (self,) if self.__origin__ is None else ()
return self.__class__(self.__name__,
prepend + self.__bases__,
_no_slots_copy(self.__dict__),
tvars=tvars,
args=args,
origin=self,
extra=self.__extra__,
orig_bases=self.__orig_bases__)
def __subclasscheck__(self, cls):
if self.__origin__ is not None:
if sys._getframe(1).f_globals['__name__'] not in ['abc', 'functools']:
raise TypeError("Parameterized generics cannot be used with class "
"or instance checks")
return False
if self is Generic:
raise TypeError("Class %r cannot be used with class "
"or instance checks" % self)
return super().__subclasscheck__(cls)
def __instancecheck__(self, instance):
# Since we extend ABC.__subclasscheck__ and
# ABC.__instancecheck__ inlines the cache checking done by the
# latter, we must extend __instancecheck__ too. For simplicity
# we just skip the cache check -- instance checks for generic
# classes are supposed to be rare anyways.
return issubclass(instance.__class__, self)
def __setattr__(self, attr, value):
# We consider all subscripted generics as proxies for the original class
if (
attr.startswith('__') and attr.endswith('__') or
attr.startswith('_abc_') or
self._gorg is None # The class is not fully created, see #typing/506
):
super(GenericMeta, self).__setattr__(attr, value)
else:
super(GenericMeta, self._gorg).__setattr__(attr, value)
# Prevent checks for Generic to crash when defining Generic.
Generic = None
def _generic_new(base_cls, cls, *args, **kwds):
# Ensure the type is erased on instantiation,
# but attempt to store it in __orig_class__
if cls.__origin__ is None:
if (base_cls.__new__ is object.__new__ and
cls.__init__ is not object.__init__):
return base_cls.__new__(cls)
else:
return base_cls.__new__(cls, *args, **kwds)
else:
origin = cls._gorg
if (base_cls.__new__ is object.__new__ and
cls.__init__ is not object.__init__):
obj = base_cls.__new__(origin)
else:
obj = base_cls.__new__(origin, *args, **kwds)
try:
obj.__orig_class__ = cls
except AttributeError:
pass
obj.__init__(*args, **kwds)
return obj
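# Behavior sketch (illustrative only; Box is a hypothetical user class):
#
#     T = TypeVar('T')
#     class Box(Generic[T]):
#         pass
#
#     b = Box[int]()        # instantiating a parameterized alias...
#     type(b) is Box        # True: the type parameters are erased
#     b.__orig_class__      # Box[int], kept for runtime introspection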
class Generic(metaclass=GenericMeta):
"""Abstract base class for generic types.
A generic type is typically declared by inheriting from
this class parameterized with one or more type variables.
For example, a generic mapping type might be defined as::
class Mapping(Generic[KT, VT]):
def __getitem__(self, key: KT) -> VT:
...
# Etc.
This class can then be used as follows::
def lookup_name(mapping: Mapping[KT, VT], key: KT, default: VT) -> VT:
try:
return mapping[key]
except KeyError:
return default
"""
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Generic:
raise TypeError("Type Generic cannot be instantiated; "
"it can be used only as a base class")
return _generic_new(cls.__next_in_mro__, cls, *args, **kwds)
class _TypingEmpty:
"""Internal placeholder for () or []. Used by TupleMeta and CallableMeta
to allow empty list/tuple in specific places, without allowing them
to sneak in where prohibited.
"""
class _TypingEllipsis:
"""Internal placeholder for ... (ellipsis)."""
class TupleMeta(GenericMeta):
"""Metaclass for Tuple (internal)."""
@_tp_cache
def __getitem__(self, parameters):
if self.__origin__ is not None or self._gorg is not Tuple:
# Normal generic rules apply if this is not the first subscription
# or a subscription of a subclass.
return super().__getitem__(parameters)
if parameters == ():
return super().__getitem__((_TypingEmpty,))
if not isinstance(parameters, tuple):
parameters = (parameters,)
if len(parameters) == 2 and parameters[1] is ...:
msg = "Tuple[t, ...]: t must be a type."
p = _type_check(parameters[0], msg)
return super().__getitem__((p, _TypingEllipsis))
msg = "Tuple[t0, t1, ...]: each t must be a type."
parameters = tuple(_type_check(p, msg) for p in parameters)
return super().__getitem__(parameters)
def __instancecheck__(self, obj):
if self.__args__ is None:
return isinstance(obj, tuple)
raise TypeError("Parameterized Tuple cannot be used "
"with isinstance().")
def __subclasscheck__(self, cls):
if self.__args__ is None:
return issubclass(cls, tuple)
raise TypeError("Parameterized Tuple cannot be used "
"with issubclass().")
class Tuple(tuple, extra=tuple, metaclass=TupleMeta):
"""Tuple type; Tuple[X, Y] is the cross-product type of X and Y.
Example: Tuple[T1, T2] is a tuple of two elements corresponding
to type variables T1 and T2. Tuple[int, float, str] is a tuple
of an int, a float and a string.
To specify a variable-length tuple of homogeneous type, use Tuple[T, ...].
"""
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Tuple:
raise TypeError("Type Tuple cannot be instantiated; "
"use tuple() instead")
return _generic_new(tuple, cls, *args, **kwds)
class CallableMeta(GenericMeta):
"""Metaclass for Callable (internal)."""
def __repr__(self):
if self.__origin__ is None:
return super().__repr__()
return self._tree_repr(self._subs_tree())
def _tree_repr(self, tree):
if self._gorg is not Callable:
return super()._tree_repr(tree)
# For actual Callable (not its subclass) we override
# super()._tree_repr() for nice formatting.
arg_list = []
for arg in tree[1:]:
if not isinstance(arg, tuple):
arg_list.append(_type_repr(arg))
else:
arg_list.append(arg[0]._tree_repr(arg))
if arg_list[0] == '...':
return repr(tree[0]) + '[..., %s]' % arg_list[1]
return (repr(tree[0]) +
'[[%s], %s]' % (', '.join(arg_list[:-1]), arg_list[-1]))
def __getitem__(self, parameters):
"""A thin wrapper around __getitem_inner__ to provide the latter
with hashable arguments to improve speed.
"""
if self.__origin__ is not None or self._gorg is not Callable:
return super().__getitem__(parameters)
if not isinstance(parameters, tuple) or len(parameters) != 2:
raise TypeError("Callable must be used as "
"Callable[[arg, ...], result].")
args, result = parameters
if args is Ellipsis:
parameters = (Ellipsis, result)
else:
if not isinstance(args, list):
raise TypeError("Callable[args, result]: args must be a list."
" Got %.100r." % (args,))
parameters = (tuple(args), result)
return self.__getitem_inner__(parameters)
@_tp_cache
def __getitem_inner__(self, parameters):
args, result = parameters
msg = "Callable[args, result]: result must be a type."
result = _type_check(result, msg)
if args is Ellipsis:
return super().__getitem__((_TypingEllipsis, result))
msg = "Callable[[arg, ...], result]: each arg must be a type."
args = tuple(_type_check(arg, msg) for arg in args)
parameters = args + (result,)
return super().__getitem__(parameters)
class Callable(extra=collections_abc.Callable, metaclass=CallableMeta):
"""Callable type; Callable[[int], str] is a function of (int) -> str.
The subscription syntax must always be used with exactly two
values: the argument list and the return type. The argument list
must be a list of types or ellipsis; the return type must be a single type.
There is no syntax to indicate optional or keyword arguments;
such function types are rarely used as callback types.
"""
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Callable:
raise TypeError("Type Callable cannot be instantiated; "
"use a non-abstract subclass instead")
return _generic_new(cls.__next_in_mro__, cls, *args, **kwds)
class _ClassVar(_FinalTypingBase, _root=True):
"""Special type construct to mark class variables.
An annotation wrapped in ClassVar indicates that a given
attribute is intended to be used as a class variable and
should not be set on instances of that class. Usage::
class Starship:
stats: ClassVar[Dict[str, int]] = {} # class variable
damage: int = 10 # instance variable
ClassVar accepts only types and cannot be further subscribed.
Note that ClassVar is not a class itself, and should not
be used with isinstance() or issubclass().
"""
__slots__ = ('__type__',)
def __init__(self, tp=None, **kwds):
self.__type__ = tp
def __getitem__(self, item):
cls = type(self)
if self.__type__ is None:
return cls(_type_check(item,
'{} accepts only a single type.'.format(cls.__name__[1:])),
_root=True)
raise TypeError('{} cannot be further subscripted'
.format(cls.__name__[1:]))
def _eval_type(self, globalns, localns):
new_tp = _eval_type(self.__type__, globalns, localns)
if new_tp == self.__type__:
return self
return type(self)(new_tp, _root=True)
def __repr__(self):
r = super().__repr__()
if self.__type__ is not None:
r += '[{}]'.format(_type_repr(self.__type__))
return r
def __hash__(self):
return hash((type(self).__name__, self.__type__))
def __eq__(self, other):
if not isinstance(other, _ClassVar):
return NotImplemented
if self.__type__ is not None:
return self.__type__ == other.__type__
return self is other
ClassVar = _ClassVar(_root=True)
def cast(typ, val):
"""Cast a value to a type.
This returns the value unchanged. To the type checker this
signals that the return value has the designated type, but at
runtime we intentionally don't check anything (we want this
to be as fast as possible).
"""
return val
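# Usage sketch (illustrative only; fetch_rows is hypothetical): cast()
# only informs the static checker and performs no runtime validation:
#
#     rows = cast(List[int], fetch_rows())   # checker now sees List[int]
#     n = cast(int, "not an int")            # no error at runtime!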
def _get_defaults(func):
"""Internal helper to extract the default arguments, by name."""
try:
code = func.__code__
except AttributeError:
# Some built-in functions don't have __code__, __defaults__, etc.
return {}
pos_count = code.co_argcount
arg_names = code.co_varnames
arg_names = arg_names[:pos_count]
defaults = func.__defaults__ or ()
kwdefaults = func.__kwdefaults__
res = dict(kwdefaults) if kwdefaults else {}
pos_offset = pos_count - len(defaults)
for name, value in zip(arg_names[pos_offset:], defaults):
assert name not in res
res[name] = value
return res
_allowed_types = (types.FunctionType, types.BuiltinFunctionType,
types.MethodType, types.ModuleType,
WrapperDescriptorType, MethodWrapperType, MethodDescriptorType)
def get_type_hints(obj, globalns=None, localns=None):
"""Return type hints for an object.
This is often the same as obj.__annotations__, but it handles
forward references encoded as string literals, and if necessary
adds Optional[t] if a default value equal to None is set.
The argument may be a module, class, method, or function. The annotations
are returned as a dictionary. For classes, annotations include also
inherited members.
TypeError is raised if the argument is not of a type that can contain
annotations, and an empty dictionary is returned if no annotations are
present.
BEWARE -- the behavior of globalns and localns is counterintuitive
(unless you are familiar with how eval() and exec() work). The
search order is locals first, then globals.
- If no dict arguments are passed, an attempt is made to use the
globals from obj (or the respective module's globals for classes),
and these are also used as the locals. If the object does not appear
to have globals, an empty dictionary is used.
- If one dict argument is passed, it is used for both globals and
locals.
- If two dict arguments are passed, they specify globals and
locals, respectively.
"""
if getattr(obj, '__no_type_check__', None):
return {}
# Classes require a special treatment.
if isinstance(obj, type):
hints = {}
for base in reversed(obj.__mro__):
if globalns is None:
base_globals = sys.modules[base.__module__].__dict__
else:
base_globals = globalns
ann = base.__dict__.get('__annotations__', {})
for name, value in ann.items():
if value is None:
value = type(None)
if isinstance(value, str):
value = _ForwardRef(value)
value = _eval_type(value, base_globals, localns)
hints[name] = value
return hints
if globalns is None:
if isinstance(obj, types.ModuleType):
globalns = obj.__dict__
else:
globalns = getattr(obj, '__globals__', {})
if localns is None:
localns = globalns
elif localns is None:
localns = globalns
hints = getattr(obj, '__annotations__', None)
if hints is None:
# Return empty annotations for something that _could_ have them.
if isinstance(obj, _allowed_types):
return {}
else:
raise TypeError('{!r} is not a module, class, method, '
'or function.'.format(obj))
defaults = _get_defaults(obj)
hints = dict(hints)
for name, value in hints.items():
if value is None:
value = type(None)
if isinstance(value, str):
value = _ForwardRef(value)
value = _eval_type(value, globalns, localns)
if name in defaults and defaults[name] is None:
value = Optional[value]
hints[name] = value
return hints
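# Worked sketch (illustrative only): string annotations are resolved and
# a None default upgrades the hint to Optional[...]:
#
#     def greet(name: 'str', prefix: str = None) -> str: ...
#     get_type_hints(greet)
#     # roughly {'name': str, 'prefix': Optional[str], 'return': str}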
def no_type_check(arg):
"""Decorator to indicate that annotations are not type hints.
The argument must be a class or function; if it is a class, it
applies recursively to all methods and classes defined in that class
(but not to methods defined in its superclasses or subclasses).
This mutates the function(s) or class(es) in place.
"""
if isinstance(arg, type):
arg_attrs = arg.__dict__.copy()
for attr, val in arg.__dict__.items():
if val in arg.__bases__ + (arg,):
arg_attrs.pop(attr)
for obj in arg_attrs.values():
if isinstance(obj, types.FunctionType):
obj.__no_type_check__ = True
if isinstance(obj, type):
no_type_check(obj)
try:
arg.__no_type_check__ = True
except TypeError: # built-in classes
pass
return arg
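# Usage sketch (illustrative only): once decorated, the annotations are
# ignored, and get_type_hints() short-circuits on __no_type_check__:
#
#     @no_type_check
#     def legacy(x: "not a real type") -> "whatever":
#         return x
#
#     get_type_hints(legacy)    # {}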
def no_type_check_decorator(decorator):
"""Decorator to give another decorator the @no_type_check effect.
This wraps the decorator with something that wraps the decorated
function in @no_type_check.
"""
@functools.wraps(decorator)
def wrapped_decorator(*args, **kwds):
func = decorator(*args, **kwds)
func = no_type_check(func)
return func
return wrapped_decorator
def _overload_dummy(*args, **kwds):
"""Helper for @overload to raise when called."""
raise NotImplementedError(
"You should not call an overloaded function. "
"A series of @overload-decorated functions "
"outside a stub module should always be followed "
"by an implementation that is not @overload-ed.")
def overload(func):
"""Decorator for overloaded functions/methods.
In a stub file, place two or more stub definitions for the same
function in a row, each decorated with @overload. For example:
@overload
def utf8(value: None) -> None: ...
@overload
def utf8(value: bytes) -> bytes: ...
@overload
def utf8(value: str) -> bytes: ...
In a non-stub file (i.e. a regular .py file), do the same but
follow it with an implementation. The implementation should *not*
be decorated with @overload. For example:
@overload
def utf8(value: None) -> None: ...
@overload
def utf8(value: bytes) -> bytes: ...
@overload
def utf8(value: str) -> bytes: ...
def utf8(value):
# implementation goes here
"""
return _overload_dummy
class _ProtocolMeta(GenericMeta):
"""Internal metaclass for _Protocol.
This exists so _Protocol classes can be generic without deriving
from Generic.
"""
def __instancecheck__(self, obj):
if _Protocol not in self.__bases__:
return super().__instancecheck__(obj)
raise TypeError("Protocols cannot be used with isinstance().")
def __subclasscheck__(self, cls):
if not self._is_protocol:
# No structural checks since this isn't a protocol.
return NotImplemented
if self is _Protocol:
# Every class is a subclass of the empty protocol.
return True
# Find all attributes defined in the protocol.
attrs = self._get_protocol_attrs()
for attr in attrs:
if not any(attr in d.__dict__ for d in cls.__mro__):
return False
return True
def _get_protocol_attrs(self):
# Get all Protocol base classes.
protocol_bases = []
for c in self.__mro__:
if getattr(c, '_is_protocol', False) and c.__name__ != '_Protocol':
protocol_bases.append(c)
# Get attributes included in protocol.
attrs = set()
for base in protocol_bases:
for attr in base.__dict__.keys():
# Include attributes not defined in any non-protocol bases.
for c in self.__mro__:
if (c is not base and attr in c.__dict__ and
not getattr(c, '_is_protocol', False)):
break
else:
if (not attr.startswith('_abc_') and
attr not in {'__abstractmethods__', '__annotations__',
'__weakref__', '_is_protocol', '_gorg',
'__dict__', '__args__', '__slots__',
'_get_protocol_attrs', '__next_in_mro__',
'__parameters__', '__origin__',
'__orig_bases__', '__extra__',
'__tree_hash__', '__module__'}):
attrs.add(attr)
return attrs
class _Protocol(metaclass=_ProtocolMeta):
"""Internal base class for protocol classes.
This implements a simple-minded structural issubclass check,
similar to, but more general than, the one-offs in collections.abc
such as Hashable.
"""
__slots__ = ()
_is_protocol = True
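# Structural-check sketch (illustrative only; SupportsInt is defined
# further below): __subclasscheck__ succeeds when every protocol
# attribute is found somewhere in the candidate's MRO, while
# __instancecheck__ raises by design:
#
#     issubclass(int, SupportsInt)    # True: int defines __int__
#     issubclass(str, SupportsInt)    # False: str has no __int__
#     isinstance(1, SupportsInt)      # TypeError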
# Various ABCs mimicking those in collections.abc.
# A few are simply re-exported for completeness.
Hashable = collections_abc.Hashable # Not generic.
if hasattr(collections_abc, 'Awaitable'):
class Awaitable(Generic[T_co], extra=collections_abc.Awaitable):
__slots__ = ()
__all__.append('Awaitable')
if hasattr(collections_abc, 'Coroutine'):
class Coroutine(Awaitable[V_co], Generic[T_co, T_contra, V_co],
extra=collections_abc.Coroutine):
__slots__ = ()
__all__.append('Coroutine')
if hasattr(collections_abc, 'AsyncIterable'):
class AsyncIterable(Generic[T_co], extra=collections_abc.AsyncIterable):
__slots__ = ()
class AsyncIterator(AsyncIterable[T_co],
extra=collections_abc.AsyncIterator):
__slots__ = ()
__all__.append('AsyncIterable')
__all__.append('AsyncIterator')
class Iterable(Generic[T_co], extra=collections_abc.Iterable):
__slots__ = ()
class Iterator(Iterable[T_co], extra=collections_abc.Iterator):
__slots__ = ()
class SupportsInt(_Protocol):
__slots__ = ()
@abstractmethod
def __int__(self) -> int:
pass
class SupportsFloat(_Protocol):
__slots__ = ()
@abstractmethod
def __float__(self) -> float:
pass
class SupportsComplex(_Protocol):
__slots__ = ()
@abstractmethod
def __complex__(self) -> complex:
pass
class SupportsBytes(_Protocol):
__slots__ = ()
@abstractmethod
def __bytes__(self) -> bytes:
pass
class SupportsIndex(_Protocol):
__slots__ = ()
@abstractmethod
def __index__(self) -> int:
pass
class SupportsAbs(_Protocol[T_co]):
__slots__ = ()
@abstractmethod
def __abs__(self) -> T_co:
pass
class SupportsRound(_Protocol[T_co]):
__slots__ = ()
@abstractmethod
def __round__(self, ndigits: int = 0) -> T_co:
pass
if hasattr(collections_abc, 'Reversible'):
class Reversible(Iterable[T_co], extra=collections_abc.Reversible):
__slots__ = ()
else:
class Reversible(_Protocol[T_co]):
__slots__ = ()
@abstractmethod
def __reversed__(self) -> 'Iterator[T_co]':
pass
Sized = collections_abc.Sized # Not generic.
class Container(Generic[T_co], extra=collections_abc.Container):
__slots__ = ()
if hasattr(collections_abc, 'Collection'):
class Collection(Sized, Iterable[T_co], Container[T_co],
extra=collections_abc.Collection):
__slots__ = ()
__all__.append('Collection')
# Callable was defined earlier.
if hasattr(collections_abc, 'Collection'):
class AbstractSet(Collection[T_co],
extra=collections_abc.Set):
__slots__ = ()
else:
class AbstractSet(Sized, Iterable[T_co], Container[T_co],
extra=collections_abc.Set):
__slots__ = ()
class MutableSet(AbstractSet[T], extra=collections_abc.MutableSet):
__slots__ = ()
# NOTE: It is only covariant in the value type.
if hasattr(collections_abc, 'Collection'):
class Mapping(Collection[KT], Generic[KT, VT_co],
extra=collections_abc.Mapping):
__slots__ = ()
else:
class Mapping(Sized, Iterable[KT], Container[KT], Generic[KT, VT_co],
extra=collections_abc.Mapping):
__slots__ = ()
class MutableMapping(Mapping[KT, VT], extra=collections_abc.MutableMapping):
__slots__ = ()
if hasattr(collections_abc, 'Reversible'):
if hasattr(collections_abc, 'Collection'):
class Sequence(Reversible[T_co], Collection[T_co],
extra=collections_abc.Sequence):
__slots__ = ()
else:
class Sequence(Sized, Reversible[T_co], Container[T_co],
extra=collections_abc.Sequence):
__slots__ = ()
else:
class Sequence(Sized, Iterable[T_co], Container[T_co],
extra=collections_abc.Sequence):
__slots__ = ()
class MutableSequence(Sequence[T], extra=collections_abc.MutableSequence):
__slots__ = ()
class ByteString(Sequence[int], extra=collections_abc.ByteString):
__slots__ = ()
class List(list, MutableSequence[T], extra=list):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is List:
raise TypeError("Type List cannot be instantiated; "
"use list() instead")
return _generic_new(list, cls, *args, **kwds)
class Deque(collections.deque, MutableSequence[T], extra=collections.deque):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Deque:
return collections.deque(*args, **kwds)
return _generic_new(collections.deque, cls, *args, **kwds)
class Set(set, MutableSet[T], extra=set):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Set:
raise TypeError("Type Set cannot be instantiated; "
"use set() instead")
return _generic_new(set, cls, *args, **kwds)
class FrozenSet(frozenset, AbstractSet[T_co], extra=frozenset):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is FrozenSet:
raise TypeError("Type FrozenSet cannot be instantiated; "
"use frozenset() instead")
return _generic_new(frozenset, cls, *args, **kwds)
class MappingView(Sized, Iterable[T_co], extra=collections_abc.MappingView):
__slots__ = ()
class KeysView(MappingView[KT], AbstractSet[KT],
extra=collections_abc.KeysView):
__slots__ = ()
class ItemsView(MappingView[Tuple[KT, VT_co]],
AbstractSet[Tuple[KT, VT_co]],
Generic[KT, VT_co],
extra=collections_abc.ItemsView):
__slots__ = ()
class ValuesView(MappingView[VT_co], extra=collections_abc.ValuesView):
__slots__ = ()
if hasattr(contextlib, 'AbstractContextManager'):
class ContextManager(Generic[T_co], extra=contextlib.AbstractContextManager):
__slots__ = ()
else:
class ContextManager(Generic[T_co]):
__slots__ = ()
def __enter__(self):
return self
@abc.abstractmethod
def __exit__(self, exc_type, exc_value, traceback):
return None
@classmethod
def __subclasshook__(cls, C):
if cls is ContextManager:
# In Python 3.6+, it is possible to set a method to None to
# explicitly indicate that the class does not implement an ABC
# (https://bugs.python.org/issue25958), but we do not support
# that pattern here because this fallback class is only used
# in Python 3.5 and earlier.
if (any("__enter__" in B.__dict__ for B in C.__mro__) and
any("__exit__" in B.__dict__ for B in C.__mro__)):
return True
return NotImplemented
if hasattr(contextlib, 'AbstractAsyncContextManager'):
class AsyncContextManager(Generic[T_co],
extra=contextlib.AbstractAsyncContextManager):
__slots__ = ()
__all__.append('AsyncContextManager')
elif sys.version_info[:2] >= (3, 5):
exec("""
class AsyncContextManager(Generic[T_co]):
__slots__ = ()
async def __aenter__(self):
return self
@abc.abstractmethod
async def __aexit__(self, exc_type, exc_value, traceback):
return None
@classmethod
def __subclasshook__(cls, C):
if cls is AsyncContextManager:
if sys.version_info[:2] >= (3, 6):
return _collections_abc._check_methods(C, "__aenter__", "__aexit__")
if (any("__aenter__" in B.__dict__ for B in C.__mro__) and
any("__aexit__" in B.__dict__ for B in C.__mro__)):
return True
return NotImplemented
__all__.append('AsyncContextManager')
""")
class Dict(dict, MutableMapping[KT, VT], extra=dict):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Dict:
raise TypeError("Type Dict cannot be instantiated; "
"use dict() instead")
return _generic_new(dict, cls, *args, **kwds)
class DefaultDict(collections.defaultdict, MutableMapping[KT, VT],
extra=collections.defaultdict):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is DefaultDict:
return collections.defaultdict(*args, **kwds)
return _generic_new(collections.defaultdict, cls, *args, **kwds)
class Counter(collections.Counter, Dict[T, int], extra=collections.Counter):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Counter:
return collections.Counter(*args, **kwds)
return _generic_new(collections.Counter, cls, *args, **kwds)
if hasattr(collections, 'ChainMap'):
# ChainMap only exists in 3.3+
__all__.append('ChainMap')
class ChainMap(collections.ChainMap, MutableMapping[KT, VT],
extra=collections.ChainMap):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is ChainMap:
return collections.ChainMap(*args, **kwds)
return _generic_new(collections.ChainMap, cls, *args, **kwds)
# Determine what base class to use for Generator.
if hasattr(collections_abc, 'Generator'):
# Sufficiently recent versions of 3.5 have a Generator ABC.
_G_base = collections_abc.Generator
else:
# Fall back on the exact type.
_G_base = types.GeneratorType
class Generator(Iterator[T_co], Generic[T_co, T_contra, V_co],
extra=_G_base):
__slots__ = ()
def __new__(cls, *args, **kwds):
if cls._gorg is Generator:
raise TypeError("Type Generator cannot be instantiated; "
"create a subclass instead")
return _generic_new(_G_base, cls, *args, **kwds)
if hasattr(collections_abc, 'AsyncGenerator'):
class AsyncGenerator(AsyncIterator[T_co], Generic[T_co, T_contra],
extra=collections_abc.AsyncGenerator):
__slots__ = ()
__all__.append('AsyncGenerator')
# Internal type variable used for Type[].
CT_co = TypeVar('CT_co', covariant=True, bound=type)
# This is not a real generic class. Don't use outside annotations.
class Type(Generic[CT_co], extra=type):
"""A special construct usable to annotate class objects.
For example, suppose we have the following classes::
class User: ... # Abstract base for User classes
class BasicUser(User): ...
class ProUser(User): ...
class TeamUser(User): ...
And a function that takes a class argument that's a subclass of
User and returns an instance of the corresponding class::
U = TypeVar('U', bound=User)
def new_user(user_class: Type[U]) -> U:
user = user_class()
# (Here we could write the user object to a database)
return user
joe = new_user(BasicUser)
At this point the type checker knows that joe has type BasicUser.
"""
__slots__ = ()
def _make_nmtuple(name, types):
msg = "NamedTuple('Name', [(f0, t0), (f1, t1), ...]); each t must be a type"
types = [(n, _type_check(t, msg)) for n, t in types]
nm_tpl = collections.namedtuple(name, [n for n, t in types])
# Prior to PEP 526, only the _field_types attribute was assigned.
# Now both __annotations__ and _field_types are set, to maintain compatibility.
nm_tpl.__annotations__ = nm_tpl._field_types = collections.OrderedDict(types)
try:
nm_tpl.__module__ = sys._getframe(2).f_globals.get('__name__', '__main__')
except (AttributeError, ValueError):
pass
return nm_tpl
_PY36 = sys.version_info[:2] >= (3, 6)
# attributes prohibited to set in NamedTuple class syntax
_prohibited = ('__new__', '__init__', '__slots__', '__getnewargs__',
'_fields', '_field_defaults', '_field_types',
'_make', '_replace', '_asdict', '_source')
_special = ('__module__', '__name__', '__qualname__', '__annotations__')
class NamedTupleMeta(type):
def __new__(cls, typename, bases, ns):
if ns.get('_root', False):
return super().__new__(cls, typename, bases, ns)
if not _PY36:
raise TypeError("Class syntax for NamedTuple is only supported"
" in Python 3.6+")
types = ns.get('__annotations__', {})
nm_tpl = _make_nmtuple(typename, types.items())
defaults = []
defaults_dict = {}
for field_name in types:
if field_name in ns:
default_value = ns[field_name]
defaults.append(default_value)
defaults_dict[field_name] = default_value
elif defaults:
raise TypeError("Non-default namedtuple field {field_name} cannot "
"follow default field(s) {default_names}"
.format(field_name=field_name,
default_names=', '.join(defaults_dict.keys())))
nm_tpl.__new__.__annotations__ = collections.OrderedDict(types)
nm_tpl.__new__.__defaults__ = tuple(defaults)
nm_tpl._field_defaults = defaults_dict
# update from user namespace without overriding special namedtuple attributes
for key in ns:
if key in _prohibited:
raise AttributeError("Cannot overwrite NamedTuple attribute " + key)
elif key not in _special and key not in nm_tpl._fields:
setattr(nm_tpl, key, ns[key])
return nm_tpl
class NamedTuple(metaclass=NamedTupleMeta):
"""Typed version of namedtuple.
Usage in Python versions >= 3.6::
class Employee(NamedTuple):
name: str
id: int
This is equivalent to::
Employee = collections.namedtuple('Employee', ['name', 'id'])
The resulting class has extra __annotations__ and _field_types
attributes, giving an ordered dict mapping field names to types.
__annotations__ should be preferred, while _field_types
is kept to maintain pre PEP 526 compatibility. (The field names
are in the _fields attribute, which is part of the namedtuple
API.) Alternative equivalent keyword syntax is also accepted::
Employee = NamedTuple('Employee', name=str, id=int)
In Python versions <= 3.5 use::
Employee = NamedTuple('Employee', [('name', str), ('id', int)])
"""
_root = True
def __new__(*args, **kwargs):
if kwargs and not _PY36:
raise TypeError("Keyword syntax for NamedTuple is only supported"
" in Python 3.6+")
if not args:
raise TypeError('NamedTuple.__new__(): not enough arguments')
_, args = args[0], args[1:] # allow the "cls" keyword to be passed
if args:
typename, args = args[0], args[1:] # allow the "typename" keyword to be passed
elif 'typename' in kwargs:
typename = kwargs.pop('typename')
import warnings
warnings.warn("Passing 'typename' as keyword argument is deprecated",
DeprecationWarning, stacklevel=2)
else:
raise TypeError("NamedTuple.__new__() missing 1 required positional "
"argument: 'typename'")
if args:
try:
fields, = args # allow the "fields" keyword to be passed
except ValueError:
raise TypeError('NamedTuple.__new__() takes from 2 to 3 '
'positional arguments but {} '
'were given'.format(len(args) + 2))
elif 'fields' in kwargs and len(kwargs) == 1:
fields = kwargs.pop('fields')
import warnings
warnings.warn("Passing 'fields' as keyword argument is deprecated",
DeprecationWarning, stacklevel=2)
else:
fields = None
if fields is None:
fields = kwargs.items()
elif kwargs:
raise TypeError("Either list of fields or keywords"
" can be provided to NamedTuple, not both")
return _make_nmtuple(typename, fields)
__new__.__text_signature__ = '($cls, typename, fields=None, /, **kwargs)'
def NewType(name, tp):
"""NewType creates simple unique types with almost zero
runtime overhead. NewType(name, tp) is considered a subtype of tp
by static type checkers. At runtime, NewType(name, tp) returns
a dummy function that simply returns its argument. Usage::
UserId = NewType('UserId', int)
def name_by_id(user_id: UserId) -> str:
...
UserId('user') # Fails type check
name_by_id(42) # Fails type check
name_by_id(UserId(42)) # OK
num = UserId(5) + 1 # type: int
"""
def new_type(x):
return x
new_type.__name__ = name
new_type.__supertype__ = tp
return new_type
# Python-version-specific alias (Python 2: unicode; Python 3: str)
Text = str
# Constant that's True when type checking, but False here.
TYPE_CHECKING = False
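# Typical usage sketch (illustrative only; expensive_module is
# hypothetical): guard imports needed only for annotations so static
# checkers see them while no import cost is paid at runtime:
#
#     if TYPE_CHECKING:
#         import expensive_module
#
#     def handle(obj: 'expensive_module.Thing') -> None:
#         ...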
class IO(Generic[AnyStr]):
"""Generic base class for TextIO and BinaryIO.
This is an abstract, generic version of the return of open().
NOTE: This does not distinguish between the different possible
classes (text vs. binary, read vs. write vs. read/write,
append-only, unbuffered). The TextIO and BinaryIO subclasses
below capture the distinctions between text vs. binary, which is
pervasive in the interface; however we currently do not offer a
way to track the other distinctions in the type system.
"""
__slots__ = ()
@abstractproperty
def mode(self) -> str:
pass
@abstractproperty
def name(self) -> str:
pass
@abstractmethod
def close(self) -> None:
pass
@abstractproperty
def closed(self) -> bool:
pass
@abstractmethod
def fileno(self) -> int:
pass
@abstractmethod
def flush(self) -> None:
pass
@abstractmethod
def isatty(self) -> bool:
pass
@abstractmethod
def read(self, n: int = -1) -> AnyStr:
pass
@abstractmethod
def readable(self) -> bool:
pass
@abstractmethod
def readline(self, limit: int = -1) -> AnyStr:
pass
@abstractmethod
def readlines(self, hint: int = -1) -> List[AnyStr]:
pass
@abstractmethod
def seek(self, offset: int, whence: int = 0) -> int:
pass
@abstractmethod
def seekable(self) -> bool:
pass
@abstractmethod
def tell(self) -> int:
pass
@abstractmethod
def truncate(self, size: int = None) -> int:
pass
@abstractmethod
def writable(self) -> bool:
pass
@abstractmethod
def write(self, s: AnyStr) -> int:
pass
@abstractmethod
def writelines(self, lines: List[AnyStr]) -> None:
pass
@abstractmethod
def __enter__(self) -> 'IO[AnyStr]':
pass
@abstractmethod
def __exit__(self, type, value, traceback) -> None:
pass
class BinaryIO(IO[bytes]):
"""Typed version of the return of open() in binary mode."""
__slots__ = ()
@abstractmethod
def write(self, s: Union[bytes, bytearray]) -> int:
pass
@abstractmethod
def __enter__(self) -> 'BinaryIO':
pass
class TextIO(IO[str]):
"""Typed version of the return of open() in text mode."""
__slots__ = ()
@abstractproperty
def buffer(self) -> BinaryIO:
pass
@abstractproperty
def encoding(self) -> str:
pass
@abstractproperty
def errors(self) -> Optional[str]:
pass
@abstractproperty
def line_buffering(self) -> bool:
pass
@abstractproperty
def newlines(self) -> Any:
pass
@abstractmethod
def __enter__(self) -> 'TextIO':
pass
class io:
"""Wrapper namespace for IO generic classes."""
__all__ = ['IO', 'TextIO', 'BinaryIO']
IO = IO
TextIO = TextIO
BinaryIO = BinaryIO
io.__name__ = __name__ + '.io'
sys.modules[io.__name__] = io
Pattern = _TypeAlias('Pattern', AnyStr, type(stdlib_re.compile('')),
lambda p: p.pattern)
Match = _TypeAlias('Match', AnyStr, type(stdlib_re.match('', '')),
lambda m: m.re.pattern)
class re:
"""Wrapper namespace for re type aliases."""
__all__ = ['Pattern', 'Match']
Pattern = Pattern
Match = Match
re.__name__ = __name__ + '.re'
sys.modules[re.__name__] = re
#! /usr/bin/env python3
# Copyright 1994 by Lance Ellinghouse
# Cathedral City, California Republic, United States of America.
# All Rights Reserved
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Lance Ellinghouse
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# LANCE ELLINGHOUSE DISCLAIMS ALL WARRANTIES WITH REGARD TO
# THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
# FITNESS, IN NO EVENT SHALL LANCE ELLINGHOUSE CENTRUM BE LIABLE
# FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
#
# Modified by Jack Jansen, CWI, July 1995:
# - Use binascii module to do the actual line-by-line conversion
# between ascii and binary. This results in a 1000-fold speedup. The C
# version is still 5 times faster, though.
# - Arguments more compliant with python standard
"""Implementation of the UUencode and UUdecode functions.
encode(in_file, out_file[, name, mode])
decode(in_file[, out_file, mode])
"""
import binascii
import os
import sys
__all__ = ["Error", "encode", "decode"]
class Error(Exception):
pass
def encode(in_file, out_file, name=None, mode=None):
"""Uuencode file"""
#
# If in_file is a pathname open it and change defaults
#
opened_files = []
try:
if in_file == '-':
in_file = sys.stdin.buffer
elif isinstance(in_file, str):
if name is None:
name = os.path.basename(in_file)
if mode is None:
try:
mode = os.stat(in_file).st_mode
except AttributeError:
pass
in_file = open(in_file, 'rb')
opened_files.append(in_file)
#
# Open out_file if it is a pathname
#
if out_file == '-':
out_file = sys.stdout.buffer
elif isinstance(out_file, str):
out_file = open(out_file, 'wb')
opened_files.append(out_file)
#
# Set defaults for name and mode
#
if name is None:
name = '-'
if mode is None:
mode = 0o666
#
# Write the data
#
out_file.write(('begin %o %s\n' % ((mode & 0o777), name)).encode("ascii"))
data = in_file.read(45)
while len(data) > 0:
out_file.write(binascii.b2a_uu(data))
data = in_file.read(45)
out_file.write(b' \nend\n')
finally:
for f in opened_files:
f.close()
def decode(in_file, out_file=None, mode=None, quiet=False):
"""Decode uuencoded file"""
#
# Open the input file, if needed.
#
opened_files = []
if in_file == '-':
in_file = sys.stdin.buffer
elif isinstance(in_file, str):
in_file = open(in_file, 'rb')
opened_files.append(in_file)
try:
#
# Read until a begin is encountered or we've exhausted the file
#
while True:
hdr = in_file.readline()
if not hdr:
raise Error('No valid begin line found in input file')
if not hdr.startswith(b'begin'):
continue
hdrfields = hdr.split(b' ', 2)
if len(hdrfields) == 3 and hdrfields[0] == b'begin':
try:
int(hdrfields[1], 8)
break
except ValueError:
pass
if out_file is None:
# If the filename isn't ASCII, what's up with that?!?
out_file = hdrfields[2].rstrip(b' \t\r\n\f').decode("ascii")
if os.path.exists(out_file):
raise Error('Cannot overwrite existing file: %s' % out_file)
if mode is None:
mode = int(hdrfields[1], 8)
#
# Open the output file
#
if out_file == '-':
out_file = sys.stdout.buffer
elif isinstance(out_file, str):
fp = open(out_file, 'wb')
try:
os.chmod(out_file, mode)  # fixed: os.path has no chmod; the call lives on os
except AttributeError:
pass
out_file = fp
opened_files.append(out_file)
#
# Main decoding loop
#
s = in_file.readline()
while s and s.strip(b' \t\r\n\f') != b'end':
try:
data = binascii.a2b_uu(s)
except binascii.Error as v:
# Workaround for broken uuencoders by /Fredrik Lundh
nbytes = (((s[0]-32) & 63) * 4 + 5) // 3
data = binascii.a2b_uu(s[:nbytes])
if not quiet:
sys.stderr.write("Warning: %s\n" % v)
out_file.write(data)
s = in_file.readline()
if not s:
raise Error('Truncated input file')
finally:
for f in opened_files:
f.close()
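# Round-trip sketch (illustrative only), using in-memory streams instead
# of pathnames:
#
#     import io
#     src, enc = io.BytesIO(b'hello world\n'), io.BytesIO()
#     encode(src, enc, name='hello.txt', mode=0o644)
#     enc.getvalue().splitlines()[0]          # b'begin 644 hello.txt'
#     dec = io.BytesIO()
#     decode(io.BytesIO(enc.getvalue()), dec)
#     dec.getvalue()                          # b'hello world\n'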
def test():
"""uuencode/uudecode main program"""
import optparse
parser = optparse.OptionParser(usage='usage: %prog [-d] [-t] [input [output]]')
parser.add_option('-d', '--decode', dest='decode', help='Decode (instead of encode)?', default=False, action='store_true')
parser.add_option('-t', '--text', dest='text', help='data is text; make encoded output unix-compatible text?', default=False, action='store_true')
(options, args) = parser.parse_args()
if len(args) > 2:
parser.error('incorrect number of arguments')  # error() exits; no fall-through
# Use the binary streams underlying stdin/stdout
input = sys.stdin.buffer
output = sys.stdout.buffer
if len(args) > 0:
input = args[0]
if len(args) > 1:
output = args[1]
if options.decode:
if options.text:
if isinstance(output, str):
output = open(output, 'wb')
else:
print(sys.argv[0], ': cannot do -t to stdout')
sys.exit(1)
decode(input, output)
else:
if options.text:
if isinstance(input, str):
input = open(input, 'rb')
else:
print(sys.argv[0], ': cannot do -t from stdin')
sys.exit(1)
encode(input, output)
if __name__ == '__main__':
test()
r"""UUID objects (universally unique identifiers) according to RFC 4122.
This module provides immutable UUID objects (class UUID) and the functions
uuid1(), uuid3(), uuid4(), uuid5() for generating version 1, 3, 4, and 5
UUIDs as specified in RFC 4122.
If all you want is a unique ID, you should probably call uuid1() or uuid4().
Note that uuid1() may compromise privacy since it creates a UUID containing
the computer's network address. uuid4() creates a random UUID.
Typical usage:
>>> import uuid
# make a UUID based on the host ID and current time
>>> uuid.uuid1() # doctest: +SKIP
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')
# make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')
# make a random UUID
>>> uuid.uuid4() # doctest: +SKIP
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')
# make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
# make a UUID from a string of hex digits (braces and hyphens ignored)
>>> x = uuid.UUID('{00010203-0405-0607-0809-0a0b0c0d0e0f}')
# convert a UUID to a string of hex digits in standard form
>>> str(x)
'00010203-0405-0607-0809-0a0b0c0d0e0f'
# get the raw 16 bytes of the UUID
>>> x.bytes
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
# make a UUID from a 16-byte string
>>> uuid.UUID(bytes=x.bytes)
UUID('00010203-0405-0607-0809-0a0b0c0d0e0f')
"""
__author__ = 'Ka-Ping Yee <[email protected]>'
RESERVED_NCS, RFC_4122, RESERVED_MICROSOFT, RESERVED_FUTURE = [
'reserved for NCS compatibility', 'specified in RFC 4122',
'reserved for Microsoft compatibility', 'reserved for future definition']
int_ = int # The built-in int type
bytes_ = bytes # The built-in bytes type
class UUID(object):
"""Instances of the UUID class represent UUIDs as specified in RFC 4122.
UUID objects are immutable, hashable, and usable as dictionary keys.
Converting a UUID to a string with str() yields something in the form
'12345678-1234-1234-1234-123456789abc'. The UUID constructor accepts
five possible forms: a similar string of hexadecimal digits, or a tuple
of six integer fields (with 32-bit, 16-bit, 16-bit, 8-bit, 8-bit, and
48-bit values respectively) as an argument named 'fields', or a string
of 16 bytes (with all the integer fields in big-endian order) as an
argument named 'bytes', or a string of 16 bytes (with the first three
fields in little-endian order) as an argument named 'bytes_le', or a
single 128-bit integer as an argument named 'int'.
UUIDs have these read-only attributes:
bytes the UUID as a 16-byte string (containing the six
integer fields in big-endian byte order)
bytes_le the UUID as a 16-byte string (with time_low, time_mid,
and time_hi_version in little-endian byte order)
fields a tuple of the six integer fields of the UUID,
which are also available as six individual attributes
and two derived attributes:
time_low the first 32 bits of the UUID
time_mid the next 16 bits of the UUID
time_hi_version the next 16 bits of the UUID
clock_seq_hi_variant the next 8 bits of the UUID
clock_seq_low the next 8 bits of the UUID
node the last 48 bits of the UUID
time the 60-bit timestamp
clock_seq the 14-bit sequence number
hex the UUID as a 32-character hexadecimal string
int the UUID as a 128-bit integer
urn the UUID as a URN as specified in RFC 4122
variant the UUID variant (one of the constants RESERVED_NCS,
RFC_4122, RESERVED_MICROSOFT, or RESERVED_FUTURE)
version the UUID version number (1 through 5, meaningful only
when the variant is RFC_4122)
"""
def __init__(self, hex=None, bytes=None, bytes_le=None, fields=None,
int=None, version=None):
r"""Create a UUID from either a string of 32 hexadecimal digits,
a string of 16 bytes as the 'bytes' argument, a string of 16 bytes
in little-endian order as the 'bytes_le' argument, a tuple of six
integers (32-bit time_low, 16-bit time_mid, 16-bit time_hi_version,
8-bit clock_seq_hi_variant, 8-bit clock_seq_low, 48-bit node) as
the 'fields' argument, or a single 128-bit integer as the 'int'
argument. When a string of hex digits is given, curly braces,
hyphens, and a URN prefix are all optional. For example, these
expressions all yield the same UUID:
UUID('{12345678-1234-5678-1234-567812345678}')
UUID('12345678123456781234567812345678')
UUID('urn:uuid:12345678-1234-5678-1234-567812345678')
UUID(bytes='\x12\x34\x56\x78'*4)
UUID(bytes_le='\x78\x56\x34\x12\x34\x12\x78\x56' +
'\x12\x34\x56\x78\x12\x34\x56\x78')
UUID(fields=(0x12345678, 0x1234, 0x5678, 0x12, 0x34, 0x567812345678))
UUID(int=0x12345678123456781234567812345678)
Exactly one of 'hex', 'bytes', 'bytes_le', 'fields', or 'int' must
be given. The 'version' argument is optional; if given, the resulting
UUID will have its variant and version set according to RFC 4122,
overriding the given 'hex', 'bytes', 'bytes_le', 'fields', or 'int'.
"""
if [hex, bytes, bytes_le, fields, int].count(None) != 4:
raise TypeError('need one of hex, bytes, bytes_le, fields, or int')
if hex is not None:
hex = hex.replace('urn:', '').replace('uuid:', '')
hex = hex.strip('{}').replace('-', '')
if len(hex) != 32:
raise ValueError('badly formed hexadecimal UUID string')
int = int_(hex, 16)
if bytes_le is not None:
if len(bytes_le) != 16:
raise ValueError('bytes_le is not a 16-char string')
bytes = (bytes_(reversed(bytes_le[0:4])) +
bytes_(reversed(bytes_le[4:6])) +
bytes_(reversed(bytes_le[6:8])) +
bytes_le[8:])
if bytes is not None:
if len(bytes) != 16:
raise ValueError('bytes is not a 16-char string')
assert isinstance(bytes, bytes_), repr(bytes)
int = int_.from_bytes(bytes, byteorder='big')
if fields is not None:
if len(fields) != 6:
raise ValueError('fields is not a 6-tuple')
(time_low, time_mid, time_hi_version,
clock_seq_hi_variant, clock_seq_low, node) = fields
if not 0 <= time_low < 1<<32:
raise ValueError('field 1 out of range (need a 32-bit value)')
if not 0 <= time_mid < 1<<16:
raise ValueError('field 2 out of range (need a 16-bit value)')
if not 0 <= time_hi_version < 1<<16:
raise ValueError('field 3 out of range (need a 16-bit value)')
if not 0 <= clock_seq_hi_variant < 1<<8:
raise ValueError('field 4 out of range (need an 8-bit value)')
if not 0 <= clock_seq_low < 1<<8:
raise ValueError('field 5 out of range (need an 8-bit value)')
if not 0 <= node < 1<<48:
raise ValueError('field 6 out of range (need a 48-bit value)')
clock_seq = (clock_seq_hi_variant << 8) | clock_seq_low
int = ((time_low << 96) | (time_mid << 80) |
(time_hi_version << 64) | (clock_seq << 48) | node)
if int is not None:
if not 0 <= int < 1<<128:
raise ValueError('int is out of range (need a 128-bit value)')
if version is not None:
if not 1 <= version <= 5:
raise ValueError('illegal version number')
# Set the variant to RFC 4122.
int &= ~(0xc000 << 48)
int |= 0x8000 << 48
# Set the version number.
int &= ~(0xf000 << 64)
int |= version << 76
self.__dict__['int'] = int
def __eq__(self, other):
if isinstance(other, UUID):
return self.int == other.int
return NotImplemented
def __ne__(self, other):
if isinstance(other, UUID):
return self.int != other.int
return NotImplemented
# Q. What's the value of being able to sort UUIDs?
# A. Use them as keys in a B-Tree or similar mapping.
def __lt__(self, other):
if isinstance(other, UUID):
return self.int < other.int
return NotImplemented
def __gt__(self, other):
if isinstance(other, UUID):
return self.int > other.int
return NotImplemented
def __le__(self, other):
if isinstance(other, UUID):
return self.int <= other.int
return NotImplemented
def __ge__(self, other):
if isinstance(other, UUID):
return self.int >= other.int
return NotImplemented
def __hash__(self):
return hash(self.int)
def __int__(self):
return self.int
def __repr__(self):
return 'UUID(%r)' % str(self)
def __setattr__(self, name, value):
raise TypeError('UUID objects are immutable')
def __str__(self):
hex = '%032x' % self.int
return '%s-%s-%s-%s-%s' % (
hex[:8], hex[8:12], hex[12:16], hex[16:20], hex[20:])
@property
def bytes(self):
bytes = bytearray()
for shift in range(0, 128, 8):
bytes.insert(0, (self.int >> shift) & 0xff)
return bytes_(bytes)
@property
def bytes_le(self):
bytes = self.bytes
return (bytes_(reversed(bytes[0:4])) +
bytes_(reversed(bytes[4:6])) +
bytes_(reversed(bytes[6:8])) +
bytes[8:])
@property
def fields(self):
return (self.time_low, self.time_mid, self.time_hi_version,
self.clock_seq_hi_variant, self.clock_seq_low, self.node)
@property
def time_low(self):
return self.int >> 96
@property
def time_mid(self):
return (self.int >> 80) & 0xffff
@property
def time_hi_version(self):
return (self.int >> 64) & 0xffff
@property
def clock_seq_hi_variant(self):
return (self.int >> 56) & 0xff
@property
def clock_seq_low(self):
return (self.int >> 48) & 0xff
@property
def time(self):
return (((self.time_hi_version & 0x0fff) << 48) |
(self.time_mid << 32) | self.time_low)
@property
def clock_seq(self):
return (((self.clock_seq_hi_variant & 0x3f) << 8) |
self.clock_seq_low)
@property
def node(self):
return self.int & 0xffffffffffff
@property
def hex(self):
return '%032x' % self.int
@property
def urn(self):
return 'urn:uuid:' + str(self)
@property
def variant(self):
if not self.int & (0x8000 << 48):
return RESERVED_NCS
elif not self.int & (0x4000 << 48):
return RFC_4122
elif not self.int & (0x2000 << 48):
return RESERVED_MICROSOFT
else:
return RESERVED_FUTURE
@property
def version(self):
# The version bits are only meaningful for RFC 4122 UUIDs.
if self.variant == RFC_4122:
return int((self.int >> 76) & 0xf)
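# Reading the bits back (illustrative sketch, using the RFC 4122 example
# UUID from the module docstring above):
#
#     u = UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')
#     u.variant    # RFC_4122: top bits of clock_seq_hi_variant are 10x
#     u.version    # 1: the high nibble of time_hi_version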
def _popen(command, args):
import os, shutil
executable = shutil.which(command)
if executable is None:
path = os.pathsep.join(('/sbin', '/usr/sbin'))
executable = shutil.which(command, path=path)
if executable is None:
return None
# LC_ALL to ensure English output, 2>/dev/null to prevent output on
# stderr (Note: we don't have an example where the words we search for
# are actually localized, but in theory some system could do so.)
cmd = 'LC_ALL=C %s %s 2>/dev/null' % (executable, args)
return os.popen(cmd)
def _find_mac(command, args, hw_identifiers, get_index):
try:
pipe = _popen(command, args)
if not pipe:
return
with pipe:
for line in pipe:
words = line.lower().rstrip().split()
for i in range(len(words)):
if words[i] in hw_identifiers:
try:
word = words[get_index(i)]
mac = int(word.replace(':', ''), 16)
if mac:
return mac
except (ValueError, IndexError):
# Virtual interfaces, such as those provided by
# VPNs, do not have a colon-delimited MAC address
# as expected, but a 16-byte HWAddr separated by
# dashes. These should be ignored in favor of a
# real MAC address
pass
except OSError:
pass
def _ifconfig_getnode():
"""Get the hardware address on Unix by running ifconfig."""
# This works on Linux ('' or '-a'), Tru64 ('-av'), but not all Unixes.
for args in ('', '-a', '-av'):
mac = _find_mac('ifconfig', args, ['hwaddr', 'ether'], lambda i: i+1)
if mac:
return mac
def _arp_getnode():
"""Get the hardware address on Unix by running arp."""
import os, socket
try:
ip_addr = socket.gethostbyname(socket.gethostname())
except OSError:
return None
# Try getting the MAC addr from arp based on our IP address (Solaris).
return _find_mac('arp', '-an', [ip_addr], lambda i: -1)
def _lanscan_getnode():
"""Get the hardware address on Unix by running lanscan."""
# This might work on HP-UX.
return _find_mac('lanscan', '-ai', ['lan0'], lambda i: 0)
def _netstat_getnode():
"""Get the hardware address on Unix by running netstat."""
# This might work on AIX, Tru64 UNIX and presumably on IRIX.
try:
pipe = _popen('netstat', '-ia')
if not pipe:
return
with pipe:
words = pipe.readline().rstrip().split()
try:
i = words.index('Address')
except ValueError:
return
for line in pipe:
try:
words = line.rstrip().split()
word = words[i]
if len(word) == 17 and word.count(':') == 5:
mac = int(word.replace(':', ''), 16)
if mac:
return mac
except (ValueError, IndexError):
pass
except OSError:
pass
def _ipconfig_getnode():
"""Get the hardware address on Windows by running ipconfig.exe."""
import os, re
dirs = ['', r'c:\windows\system32', r'c:\winnt\system32']
try:
import ctypes
buffer = ctypes.create_string_buffer(300)
ctypes.windll.kernel32.GetSystemDirectoryA(buffer, 300)
dirs.insert(0, buffer.value.decode('mbcs'))
except:
pass
for dir in dirs:
try:
pipe = os.popen(os.path.join(dir, 'ipconfig') + ' /all')
except OSError:
continue
with pipe:
for line in pipe:
value = line.split(':')[-1].strip().lower()
if re.match('([0-9a-f][0-9a-f]-){5}[0-9a-f][0-9a-f]', value):
return int(value.replace('-', ''), 16)
def _netbios_getnode():
"""Get the hardware address on Windows using NetBIOS calls.
See http://support.microsoft.com/kb/118623 for details."""
import win32wnet, netbios
ncb = netbios.NCB()
ncb.Command = netbios.NCBENUM
ncb.Buffer = adapters = netbios.LANA_ENUM()
adapters._pack()
if win32wnet.Netbios(ncb) != 0:
return
adapters._unpack()
for i in range(adapters.length):
ncb.Reset()
ncb.Command = netbios.NCBRESET
ncb.Lana_num = ord(adapters.lana[i])
if win32wnet.Netbios(ncb) != 0:
continue
ncb.Reset()
ncb.Command = netbios.NCBASTAT
ncb.Lana_num = ord(adapters.lana[i])
ncb.Callname = '*'.ljust(16)
ncb.Buffer = status = netbios.ADAPTER_STATUS()
if win32wnet.Netbios(ncb) != 0:
continue
status._unpack()
bytes = status.adapter_address
return ((bytes[0]<<40) + (bytes[1]<<32) + (bytes[2]<<24) +
(bytes[3]<<16) + (bytes[4]<<8) + bytes[5])
# Thanks to Thomas Heller for ctypes and for his help with its use here.
# If ctypes is available, use it to find system routines for UUID generation.
# XXX This makes the module non-thread-safe!
_uuid_generate_random = _uuid_generate_time = _UuidCreate = None
try:
import ctypes, ctypes.util
# The uuid_generate_* routines are provided by libuuid on at least
# Linux and FreeBSD, and provided by libc on Mac OS X.
for libname in ['uuid', 'c']:
try:
lib = ctypes.CDLL(ctypes.util.find_library(libname))
except:
continue
if hasattr(lib, 'uuid_generate_random'):
_uuid_generate_random = lib.uuid_generate_random
if hasattr(lib, 'uuid_generate_time'):
_uuid_generate_time = lib.uuid_generate_time
if _uuid_generate_random is not None:
break # found everything we were looking for
# The uuid_generate_* functions are broken on Mac OS X 10.5: as noted in
# issue #8621, they generate the same sequence of values in the parent
# process and all children created using fork (unless those children use
# exec as well).
#
# Assume that the uuid_generate functions are broken from 10.5 onward;
# the test can be adjusted when a later version is fixed.
import sys
if sys.platform == 'darwin':
import os
if int(os.uname().release.split('.')[0]) >= 9:
_uuid_generate_random = _uuid_generate_time = None
# On Windows prior to 2000, UuidCreate gives a UUID containing the
# hardware address. On Windows 2000 and later, UuidCreate makes a
# random UUID and UuidCreateSequential gives a UUID containing the
# hardware address. These routines are provided by the RPC runtime.
# NOTE: at least on Tim's WinXP Pro SP2 desktop box, while the last
# 6 bytes returned by UuidCreateSequential are fixed, they don't appear
# to bear any relationship to the MAC address of any network device
# on the box.
try:
lib = ctypes.windll.rpcrt4
except:
lib = None
_UuidCreate = getattr(lib, 'UuidCreateSequential',
getattr(lib, 'UuidCreate', None))
except:
pass
def _unixdll_getnode():
"""Get the hardware address on Unix using ctypes."""
_buffer = ctypes.create_string_buffer(16)
_uuid_generate_time(_buffer)
return UUID(bytes=bytes_(_buffer.raw)).node
def _windll_getnode():
"""Get the hardware address on Windows using ctypes."""
_buffer = ctypes.create_string_buffer(16)
if _UuidCreate(_buffer) == 0:
return UUID(bytes=bytes_(_buffer.raw)).node
def _random_getnode():
"""Get a random node ID, with eighth bit set as suggested by RFC 4122."""
import random
return random.randrange(0, 1<<48) | 0x010000000000
_node = None
def getnode():
"""Get the hardware address as a 48-bit positive integer.
The first time this runs, it may launch a separate program, which could
be quite slow. If all attempts to obtain the hardware address fail, we
choose a random 48-bit number with its eighth bit set to 1 as recommended
in RFC 4122.
"""
global _node
if _node is not None:
return _node
import sys
if sys.platform == 'win32':
getters = [_windll_getnode, _netbios_getnode, _ipconfig_getnode]
else:
getters = [_unixdll_getnode, _ifconfig_getnode, _arp_getnode,
_lanscan_getnode, _netstat_getnode]
for getter in getters + [_random_getnode]:
try:
_node = getter()
except:
continue
if _node is not None:
return _node
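# A small usage sketch, assuming the module is importable as `uuid`: getnode()
# yields a 48-bit integer, so rendering it in the familiar colon-separated MAC
# notation is a matter of slicing out the six octets, high byte first.
import uuid
node = uuid.getnode()
assert 0 <= node < (1 << 48)
mac = ':'.join('%02x' % ((node >> shift) & 0xff)
               for shift in range(40, -1, -8))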
_last_timestamp = None
def uuid1(node=None, clock_seq=None):
"""Generate a UUID from a host ID, sequence number, and the current time.
If 'node' is not given, getnode() is used to obtain the hardware
address. If 'clock_seq' is given, it is used as the sequence number;
otherwise a random 14-bit sequence number is chosen."""
# When the system provides a version-1 UUID generator, use it (but don't
# use UuidCreate here because its UUIDs don't conform to RFC 4122).
if _uuid_generate_time and node is clock_seq is None:
_buffer = ctypes.create_string_buffer(16)
_uuid_generate_time(_buffer)
return UUID(bytes=bytes_(_buffer.raw))
global _last_timestamp
import time
nanoseconds = int(time.time() * 1e9)
# 0x01b21dd213814000 is the number of 100-ns intervals between the
# UUID epoch 1582-10-15 00:00:00 and the Unix epoch 1970-01-01 00:00:00.
timestamp = int(nanoseconds/100) + 0x01b21dd213814000
if _last_timestamp is not None and timestamp <= _last_timestamp:
timestamp = _last_timestamp + 1
_last_timestamp = timestamp
if clock_seq is None:
import random
clock_seq = random.randrange(1<<14) # instead of stable storage
time_low = timestamp & 0xffffffff
time_mid = (timestamp >> 32) & 0xffff
time_hi_version = (timestamp >> 48) & 0x0fff
clock_seq_low = clock_seq & 0xff
clock_seq_hi_variant = (clock_seq >> 8) & 0x3f
if node is None:
node = getnode()
return UUID(fields=(time_low, time_mid, time_hi_version,
clock_seq_hi_variant, clock_seq_low, node), version=1)
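# A sketch of reading uuid1()'s inputs back out of the result, assuming the
# module is importable as `uuid`; .time is the 60-bit count of 100-ns
# intervals since the UUID epoch (1582-10-15), per the offset used above.
import uuid
u1 = uuid.uuid1(node=0x123456789abc, clock_seq=0x1234)
assert u1.node == 0x123456789abc and u1.clock_seq == 0x1234 and u1.version == 1
unix_seconds = (u1.time - 0x01b21dd213814000) / 1e7  # back to the Unix epoch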
def uuid3(namespace, name):
"""Generate a UUID from the MD5 hash of a namespace UUID and a name."""
from hashlib import md5
hash = md5(namespace.bytes + bytes(name, "utf-8")).digest()
return UUID(bytes=hash[:16], version=3)
def uuid4():
"""Generate a random UUID."""
# When the system provides a version-4 UUID generator, use it.
if _uuid_generate_random:
_buffer = ctypes.create_string_buffer(16)
_uuid_generate_random(_buffer)
return UUID(bytes=bytes_(_buffer.raw))
# Otherwise, get randomness from urandom or the 'random' module.
try:
import os
return UUID(bytes=os.urandom(16), version=4)
except:
import random
bytes = bytes_(random.randrange(256) for i in range(16))
return UUID(bytes=bytes, version=4)
def uuid5(namespace, name):
"""Generate a UUID from the SHA-1 hash of a namespace UUID and a name."""
from hashlib import sha1
hash = sha1(namespace.bytes + bytes(name, "utf-8")).digest()
return UUID(bytes=hash[:16], version=5)
# The following standard UUIDs are for use with uuid3() or uuid5().
NAMESPACE_DNS = UUID('6ba7b810-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_URL = UUID('6ba7b811-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_OID = UUID('6ba7b812-9dad-11d1-80b4-00c04fd430c8')
NAMESPACE_X500 = UUID('6ba7b814-9dad-11d1-80b4-00c04fd430c8')
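# uuid3() and uuid5() are deterministic: a given namespace and name always
# hash to the same UUID, which is what makes these constants useful. A quick
# check, assuming the module is importable as `uuid` (the expected value
# below is the well-known uuid5 of 'python.org' under NAMESPACE_DNS):
import uuid
assert uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org') == \
       uuid.UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
assert uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org').version == 3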
"""Python part of the warnings subsystem."""
import sys
__all__ = ["warn", "warn_explicit", "showwarning",
"formatwarning", "filterwarnings", "simplefilter",
"resetwarnings", "catch_warnings"]
def showwarning(message, category, filename, lineno, file=None, line=None):
"""Hook to write a warning to a file; replace if you like."""
if file is None:
file = sys.stderr
if file is None:
# sys.stderr is None when run with pythonw.exe - warnings get lost
return
try:
file.write(formatwarning(message, category, filename, lineno, line))
except OSError:
pass # the file (probably stderr) is invalid - this warning gets lost.
def formatwarning(message, category, filename, lineno, line=None):
"""Function to format a warning the standard way."""
import linecache
s = "%s:%s: %s: %s\n" % (filename, lineno, category.__name__, message)
line = linecache.getline(filename, lineno) if line is None else line
if line:
line = line.strip()
s += " %s\n" % line
return s
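# The format produced above is "filename:lineno: Category: message", plus the
# offending source line when linecache can find one (here it cannot). A quick
# check, assuming the module is importable as `warnings`:
import warnings
s = warnings.formatwarning('boom', UserWarning, 'nonexistent.py', 1)
assert s == 'nonexistent.py:1: UserWarning: boom\n'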
def filterwarnings(action, message="", category=Warning, module="", lineno=0,
append=False):
"""Insert an entry into the list of warnings filters (at the front).
'action' -- one of "error", "ignore", "always", "default", "module",
or "once"
'message' -- a regex that the warning message must match
'category' -- a class that the warning must be a subclass of
'module' -- a regex that the module name must match
'lineno' -- an integer line number, 0 matches all warnings
'append' -- if true, append to the list of filters
"""
import re
assert action in ("error", "ignore", "always", "default", "module",
"once"), "invalid action: %r" % (action,)
assert isinstance(message, str), "message must be a string"
assert isinstance(category, type), "category must be a class"
assert issubclass(category, Warning), "category must be a Warning subclass"
assert isinstance(module, str), "module must be a string"
assert isinstance(lineno, int) and lineno >= 0, \
"lineno must be an int >= 0"
item = (action, re.compile(message, re.I), category,
re.compile(module), lineno)
if append:
filters.append(item)
else:
filters.insert(0, item)
_filters_mutated()
def simplefilter(action, category=Warning, lineno=0, append=False):
"""Insert a simple entry into the list of warnings filters (at the front).
A simple filter matches all modules and messages.
'action' -- one of "error", "ignore", "always", "default", "module",
or "once"
'category' -- a class that the warning must be a subclass of
'lineno' -- an integer line number, 0 matches all warnings
'append' -- if true, append to the list of filters
"""
assert action in ("error", "ignore", "always", "default", "module",
"once"), "invalid action: %r" % (action,)
assert isinstance(lineno, int) and lineno >= 0, \
"lineno must be an int >= 0"
item = (action, None, category, None, lineno)
if append:
filters.append(item)
else:
filters.insert(0, item)
_filters_mutated()
def resetwarnings():
"""Clear the list of warning filters, so that no filters are active."""
filters[:] = []
_filters_mutated()
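# Typical use of the filter API above, sketched against the public module
# (assuming it is importable as `warnings`): escalate DeprecationWarning to
# an error, but keep matching messages silent. catch_warnings, defined later
# in this module, restores the original filter list on exit.
import warnings
with warnings.catch_warnings():
    warnings.simplefilter('error', DeprecationWarning)
    warnings.filterwarnings('ignore', message='legacy',
                            category=DeprecationWarning)
    warnings.warn('legacy interface', DeprecationWarning)  # ignored, no error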
class _OptionError(Exception):
"""Exception used by option processing helpers."""
pass
# Helper to process -W options passed via sys.warnoptions
def _processoptions(args):
for arg in args:
try:
_setoption(arg)
except _OptionError as msg:
print("Invalid -W option ignored:", msg, file=sys.stderr)
# Helper for _processoptions()
def _setoption(arg):
import re
parts = arg.split(':')
if len(parts) > 5:
raise _OptionError("too many fields (max 5): %r" % (arg,))
while len(parts) < 5:
parts.append('')
action, message, category, module, lineno = [s.strip()
for s in parts]
action = _getaction(action)
message = re.escape(message)
category = _getcategory(category)
module = re.escape(module)
if module:
module = module + '$'
if lineno:
try:
lineno = int(lineno)
if lineno < 0:
raise ValueError
except (ValueError, OverflowError):
raise _OptionError("invalid lineno %r" % (lineno,))
else:
lineno = 0
filterwarnings(action, message, category, module, lineno)
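# For reference, the -W option format parsed above is
# action:message:category:module:lineno, with trailing empty fields optional;
# e.g. `python -W error::DeprecationWarning` ends up as the call
# filterwarnings('error', message='', category=DeprecationWarning,
#                module='', lineno=0).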
# Helper for _setoption()
def _getaction(action):
if not action:
return "default"
if action == "all": return "always" # Alias
for a in ('default', 'always', 'ignore', 'module', 'once', 'error'):
if a.startswith(action):
return a
raise _OptionError("invalid action: %r" % (action,))
# Helper for _setoption()
def _getcategory(category):
import re
if not category:
return Warning
if re.match("^[a-zA-Z0-9_]+$", category):
try:
cat = eval(category)
except NameError:
raise _OptionError("unknown warning category: %r" % (category,))
else:
i = category.rfind(".")
module = category[:i]
klass = category[i+1:]
try:
m = __import__(module, None, None, [klass])
except ImportError:
raise _OptionError("invalid module name: %r" % (module,))
try:
cat = getattr(m, klass)
except AttributeError:
raise _OptionError("unknown warning category: %r" % (category,))
if not issubclass(cat, Warning):
raise _OptionError("invalid warning category: %r" % (category,))
return cat
# Code typically replaced by _warnings
def warn(message, category=None, stacklevel=1):
"""Issue a warning, or maybe ignore it or raise an exception."""
# Check if message is already a Warning object
if isinstance(message, Warning):
category = message.__class__
# Check category argument
if category is None:
category = UserWarning
assert issubclass(category, Warning)
# Get context information
try:
caller = sys._getframe(stacklevel)
except ValueError:
globals = sys.__dict__
lineno = 1
else:
globals = caller.f_globals
lineno = caller.f_lineno
if '__name__' in globals:
module = globals['__name__']
else:
module = "<string>"
filename = globals.get('__file__')
if filename:
fnl = filename.lower()
if fnl.endswith((".pyc", ".pyo")):
filename = filename[:-1]
else:
if module == "__main__":
try:
filename = sys.argv[0]
except AttributeError:
# embedded interpreters don't have sys.argv, see bug #839151
filename = '__main__'
if not filename:
filename = module
registry = globals.setdefault("__warningregistry__", {})
warn_explicit(message, category, filename, lineno, module, registry,
globals)
def warn_explicit(message, category, filename, lineno,
module=None, registry=None, module_globals=None):
lineno = int(lineno)
if module is None:
module = filename or "<unknown>"
if module[-3:].lower() == ".py":
module = module[:-3] # XXX What about leading pathname?
if registry is None:
registry = {}
if registry.get('version', 0) != _filters_version:
registry.clear()
registry['version'] = _filters_version
if isinstance(message, Warning):
text = str(message)
category = message.__class__
else:
text = message
message = category(message)
key = (text, category, lineno)
# Quick test for common case
if registry.get(key):
return
# Search the filters
for item in filters:
action, msg, cat, mod, ln = item
if ((msg is None or msg.match(text)) and
issubclass(category, cat) and
(mod is None or mod.match(module)) and
(ln == 0 or lineno == ln)):
break
else:
action = defaultaction
# Early exit actions
if action == "ignore":
registry[key] = 1
return
# Prime the linecache for formatting, in case the
# "file" is actually in a zipfile or something.
import linecache
linecache.getlines(filename, module_globals)
if action == "error":
raise message
# Other actions
if action == "once":
registry[key] = 1
oncekey = (text, category)
if onceregistry.get(oncekey):
return
onceregistry[oncekey] = 1
elif action == "always":
pass
elif action == "module":
registry[key] = 1
altkey = (text, category, 0)
if registry.get(altkey):
return
registry[altkey] = 1
elif action == "default":
registry[key] = 1
else:
# Unrecognized actions are errors
raise RuntimeError(
"Unrecognized action (%r) in warnings.filters:\n %s" %
(action, item))
if not callable(showwarning):
raise TypeError("warnings.showwarning() must be set to a "
"function or method")
# Print message and context
showwarning(message, category, filename, lineno)
class WarningMessage(object):
"""Holds the result of a single showwarning() call."""
_WARNING_DETAILS = ("message", "category", "filename", "lineno", "file",
"line")
def __init__(self, message, category, filename, lineno, file=None,
line=None):
local_values = locals()
for attr in self._WARNING_DETAILS:
setattr(self, attr, local_values[attr])
self._category_name = category.__name__ if category else None
def __str__(self):
return ("{message : %r, category : %r, filename : %r, lineno : %s, "
"line : %r}" % (self.message, self._category_name,
self.filename, self.lineno, self.line))
class catch_warnings(object):
"""A context manager that copies and restores the warnings filter upon
exiting the context.
The 'record' argument specifies whether warnings should be captured by a
custom implementation of warnings.showwarning() and be appended to a list
returned by the context manager. Otherwise None is returned by the context
manager. The objects appended to the list are arguments whose attributes
mirror the arguments to showwarning().
The 'module' argument is to specify an alternative module to the module
named 'warnings' and imported under that name. This argument is only useful
when testing the warnings module itself.
"""
def __init__(self, *, record=False, module=None):
"""Specify whether to record warnings and if an alternative module
should be used other than sys.modules['warnings'].
For compatibility with Python 3.0, please consider all arguments to be
keyword-only.
"""
self._record = record
self._module = sys.modules['warnings'] if module is None else module
self._entered = False
def __repr__(self):
args = []
if self._record:
args.append("record=True")
if self._module is not sys.modules['warnings']:
args.append("module=%r" % self._module)
name = type(self).__name__
return "%s(%s)" % (name, ", ".join(args))
def __enter__(self):
if self._entered:
raise RuntimeError("Cannot enter %r twice" % self)
self._entered = True
self._filters = self._module.filters
self._module.filters = self._filters[:]
self._module._filters_mutated()
self._showwarning = self._module.showwarning
if self._record:
log = []
def showwarning(*args, **kwargs):
log.append(WarningMessage(*args, **kwargs))
self._module.showwarning = showwarning
return log
else:
return None
def __exit__(self, *exc_info):
if not self._entered:
raise RuntimeError("Cannot exit %r without entering first" % self)
self._module.filters = self._filters
self._module._filters_mutated()
self._module.showwarning = self._showwarning
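# A recording sketch, assuming the module is importable as `warnings`: with
# record=True the context manager swaps in a list-appending showwarning and
# hands the list back, so tests can assert on what was emitted.
import warnings
with warnings.catch_warnings(record=True) as log:
    warnings.simplefilter('always')
    warnings.warn('heads up', UserWarning)
assert len(log) == 1 and issubclass(log[0].category, UserWarning)
assert str(log[0].message) == 'heads up'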
# filters contains a sequence of filter 5-tuples
# The components of the 5-tuple are:
# - an action: error, ignore, always, default, module, or once
# - a compiled regex that must match the warning message
# - a class representing the warning category
# - a compiled regex that must match the module that is being warned
# - a line number for the line being warned about, or 0 to mean any line
# If either of the compiled regexes is None, match anything.
_warnings_defaults = False
try:
from _warnings import (filters, _defaultaction, _onceregistry,
warn, warn_explicit, _filters_mutated)
defaultaction = _defaultaction
onceregistry = _onceregistry
_warnings_defaults = True
except ImportError:
filters = []
defaultaction = "default"
onceregistry = {}
_filters_version = 1
def _filters_mutated():
global _filters_version
_filters_version += 1
# Module initialization
_processoptions(sys.warnoptions)
if not _warnings_defaults:
silence = [ImportWarning, PendingDeprecationWarning]
silence.append(DeprecationWarning)
for cls in silence:
simplefilter("ignore", category=cls)
bytes_warning = sys.flags.bytes_warning
if bytes_warning > 1:
bytes_action = "error"
elif bytes_warning:
bytes_action = "default"
else:
bytes_action = "ignore"
simplefilter(bytes_action, category=BytesWarning, append=1)
# resource usage warnings are enabled by default in pydebug mode
if hasattr(sys, 'gettotalrefcount'):
resource_action = "always"
else:
resource_action = "ignore"
simplefilter(resource_action, category=ResourceWarning, append=1)
del _warnings_defaults
"""Stuff to parse WAVE files.
Usage:
Reading WAVE files:
f = wave.open(file, 'r')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods read(), seek(), and close().
When the setpos() and rewind() methods are not used, the seek()
method is not necessary.
This returns an instance of a class with the following public methods:
getnchannels() -- returns number of audio channels (1 for
mono, 2 for stereo)
getsampwidth() -- returns sample width in bytes
getframerate() -- returns sampling frequency
getnframes() -- returns number of audio frames
getcomptype() -- returns compression type ('NONE' for linear samples)
getcompname() -- returns human-readable version of
compression type ('not compressed' for linear samples)
getparams() -- returns a namedtuple consisting of all of the
above in the above order
getmarkers() -- returns None (for compatibility with the
aifc module)
getmark(id) -- raises an error since the mark does not
exist (for compatibility with the aifc module)
readframes(n) -- returns at most n frames of audio
rewind() -- rewind to the beginning of the audio stream
setpos(pos) -- seek to the specified position
tell() -- return the current position
close() -- close the instance (make it unusable)
The position returned by tell() and the position given to setpos()
are compatible and have nothing to do with the actual position in the
file.
The close() method is called automatically when the class instance
is destroyed.
Writing WAVE files:
f = wave.open(file, 'w')
where file is either the name of a file or an open file pointer.
The open file pointer must have methods write(), tell(), seek(), and
close().
This returns an instance of a class with the following public methods:
setnchannels(n) -- set the number of channels
setsampwidth(n) -- set the sample width
setframerate(n) -- set the frame rate
setnframes(n) -- set the number of frames
setcomptype(type, name)
-- set the compression type and the
human-readable compression type
setparams(tuple)
-- set all parameters at once
tell() -- return current position in output file
writeframesraw(data)
-- write audio frames without patching up the
file header
writeframes(data)
-- write audio frames and patch up the file header
close() -- patch up the file header and close the
output file
You should set the parameters before the first writeframesraw or
writeframes. The total number of frames does not need to be set,
but when it is set to the correct value, the header does not have to
be patched up.
It is best to first set all parameters, except possibly the
compression type, and then write audio frames using writeframesraw.
When all frames have been written, either call writeframes(b'') or
close() to patch up the sizes in the header.
The close() method is called automatically when the class instance
is destroyed.
"""
import builtins
__all__ = ["open", "openfp", "Error"]
class Error(Exception):
pass
WAVE_FORMAT_PCM = 0x0001
_array_fmts = None, 'b', 'h', None, 'i'
import audioop
import struct
import sys
from chunk import Chunk
from collections import namedtuple
_wave_params = namedtuple('_wave_params',
'nchannels sampwidth framerate nframes comptype compname')
class Wave_read:
"""Variables used in this class:
These variables are available to the user through appropriate
methods of this class:
_file -- the open file with methods read(), close(), and seek()
set through the __init__() method
_nchannels -- the number of audio channels
available through the getnchannels() method
_nframes -- the number of audio frames
available through the getnframes() method
_sampwidth -- the number of bytes per audio sample
available through the getsampwidth() method
_framerate -- the sampling frequency
available through the getframerate() method
_comptype -- the AIFF-C compression type ('NONE' if AIFF)
available through the getcomptype() method
_compname -- the human-readable AIFF-C compression type
available through the getcomptype() method
_soundpos -- the position in the audio stream
available through the tell() method, set through the
setpos() method
These variables are used internally only:
_fmt_chunk_read -- 1 iff the FMT chunk has been read
_data_seek_needed -- 1 iff positioned correctly in audio
file for readframes()
_data_chunk -- instantiation of a chunk class for the DATA chunk
_framesize -- size of one frame in the file
"""
def initfp(self, file):
self._convert = None
self._soundpos = 0
self._file = Chunk(file, bigendian = 0)
if self._file.getname() != b'RIFF':
raise Error('file does not start with RIFF id')
if self._file.read(4) != b'WAVE':
raise Error('not a WAVE file')
self._fmt_chunk_read = 0
self._data_chunk = None
while 1:
self._data_seek_needed = 1
try:
chunk = Chunk(self._file, bigendian = 0)
except EOFError:
break
chunkname = chunk.getname()
if chunkname == b'fmt ':
self._read_fmt_chunk(chunk)
self._fmt_chunk_read = 1
elif chunkname == b'data':
if not self._fmt_chunk_read:
raise Error('data chunk before fmt chunk')
self._data_chunk = chunk
self._nframes = chunk.chunksize // self._framesize
self._data_seek_needed = 0
break
chunk.skip()
if not self._fmt_chunk_read or not self._data_chunk:
raise Error('fmt chunk and/or data chunk missing')
def __init__(self, f):
self._i_opened_the_file = None
if isinstance(f, str):
f = builtins.open(f, 'rb')
self._i_opened_the_file = f
# else, assume it is an open file object already
try:
self.initfp(f)
except:
if self._i_opened_the_file:
f.close()
raise
def __del__(self):
self.close()
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
#
# User visible methods.
#
def getfp(self):
return self._file
def rewind(self):
self._data_seek_needed = 1
self._soundpos = 0
def close(self):
self._file = None
file = self._i_opened_the_file
if file:
self._i_opened_the_file = None
file.close()
def tell(self):
return self._soundpos
def getnchannels(self):
return self._nchannels
def getnframes(self):
return self._nframes
def getsampwidth(self):
return self._sampwidth
def getframerate(self):
return self._framerate
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
def getparams(self):
return _wave_params(self.getnchannels(), self.getsampwidth(),
self.getframerate(), self.getnframes(),
self.getcomptype(), self.getcompname())
def getmarkers(self):
return None
def getmark(self, id):
raise Error('no marks')
def setpos(self, pos):
if pos < 0 or pos > self._nframes:
raise Error('position not in range')
self._soundpos = pos
self._data_seek_needed = 1
def readframes(self, nframes):
if self._data_seek_needed:
self._data_chunk.seek(0, 0)
pos = self._soundpos * self._framesize
if pos:
self._data_chunk.seek(pos, 0)
self._data_seek_needed = 0
if nframes == 0:
return b''
data = self._data_chunk.read(nframes * self._framesize)
if self._sampwidth != 1 and sys.byteorder == 'big':
data = audioop.byteswap(data, self._sampwidth)
if self._convert and data:
data = self._convert(data)
self._soundpos = self._soundpos + len(data) // (self._nchannels * self._sampwidth)
return data
#
# Internal methods.
#
def _read_fmt_chunk(self, chunk):
wFormatTag, self._nchannels, self._framerate, dwAvgBytesPerSec, wBlockAlign = struct.unpack_from('<HHLLH', chunk.read(14))
if wFormatTag == WAVE_FORMAT_PCM:
sampwidth = struct.unpack_from('<H', chunk.read(2))[0]
self._sampwidth = (sampwidth + 7) // 8
else:
raise Error('unknown format: %r' % (wFormatTag,))
self._framesize = self._nchannels * self._sampwidth
self._comptype = 'NONE'
self._compname = 'not compressed'
class Wave_write:
"""Variables used in this class:
These variables are user settable through appropriate methods
of this class:
_file -- the open file with methods write(), close(), tell(), seek()
set through the __init__() method
_comptype -- the AIFF-C compression type ('NONE' in AIFF)
set through the setcomptype() or setparams() method
_compname -- the human-readable AIFF-C compression type
set through the setcomptype() or setparams() method
_nchannels -- the number of audio channels
set through the setnchannels() or setparams() method
_sampwidth -- the number of bytes per audio sample
set through the setsampwidth() or setparams() method
_framerate -- the sampling frequency
set through the setframerate() or setparams() method
_nframes -- the number of audio frames written to the header
set through the setnframes() or setparams() method
These variables are used internally only:
_datalength -- the size of the audio samples written to the header
_nframeswritten -- the number of frames actually written
_datawritten -- the size of the audio samples actually written
"""
def __init__(self, f):
self._i_opened_the_file = None
if isinstance(f, str):
f = builtins.open(f, 'wb')
self._i_opened_the_file = f
try:
self.initfp(f)
except:
if self._i_opened_the_file:
f.close()
raise
def initfp(self, file):
self._file = file
self._convert = None
self._nchannels = 0
self._sampwidth = 0
self._framerate = 0
self._nframes = 0
self._nframeswritten = 0
self._datawritten = 0
self._datalength = 0
self._headerwritten = False
def __del__(self):
self.close()
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
#
# User visible methods.
#
def setnchannels(self, nchannels):
if self._datawritten:
raise Error('cannot change parameters after starting to write')
if nchannels < 1:
raise Error('bad # of channels')
self._nchannels = nchannels
def getnchannels(self):
if not self._nchannels:
raise Error('number of channels not set')
return self._nchannels
def setsampwidth(self, sampwidth):
if self._datawritten:
raise Error('cannot change parameters after starting to write')
if sampwidth < 1 or sampwidth > 4:
raise Error('bad sample width')
self._sampwidth = sampwidth
def getsampwidth(self):
if not self._sampwidth:
raise Error('sample width not set')
return self._sampwidth
def setframerate(self, framerate):
if self._datawritten:
raise Error('cannot change parameters after starting to write')
if framerate <= 0:
raise Error('bad frame rate')
self._framerate = int(round(framerate))
def getframerate(self):
if not self._framerate:
raise Error('frame rate not set')
return self._framerate
def setnframes(self, nframes):
if self._datawritten:
raise Error('cannot change parameters after starting to write')
self._nframes = nframes
def getnframes(self):
return self._nframeswritten
def setcomptype(self, comptype, compname):
if self._datawritten:
raise Error('cannot change parameters after starting to write')
if comptype not in ('NONE',):
raise Error('unsupported compression type')
self._comptype = comptype
self._compname = compname
def getcomptype(self):
return self._comptype
def getcompname(self):
return self._compname
def setparams(self, params):
nchannels, sampwidth, framerate, nframes, comptype, compname = params
if self._datawritten:
raise Error('cannot change parameters after starting to write')
self.setnchannels(nchannels)
self.setsampwidth(sampwidth)
self.setframerate(framerate)
self.setnframes(nframes)
self.setcomptype(comptype, compname)
def getparams(self):
if not self._nchannels or not self._sampwidth or not self._framerate:
raise Error('not all parameters set')
return _wave_params(self._nchannels, self._sampwidth, self._framerate,
self._nframes, self._comptype, self._compname)
def setmark(self, id, pos, name):
raise Error('setmark() not supported')
def getmark(self, id):
raise Error('no marks')
def getmarkers(self):
return None
def tell(self):
return self._nframeswritten
def writeframesraw(self, data):
if not isinstance(data, (bytes, bytearray)):
data = memoryview(data).cast('B')
self._ensure_header_written(len(data))
nframes = len(data) // (self._sampwidth * self._nchannels)
if self._convert:
data = self._convert(data)
if self._sampwidth != 1 and sys.byteorder == 'big':
data = audioop.byteswap(data, self._sampwidth)
self._file.write(data)
self._datawritten += len(data)
self._nframeswritten = self._nframeswritten + nframes
def writeframes(self, data):
self.writeframesraw(data)
if self._datalength != self._datawritten:
self._patchheader()
def close(self):
try:
if self._file:
self._ensure_header_written(0)
if self._datalength != self._datawritten:
self._patchheader()
self._file.flush()
finally:
self._file = None
file = self._i_opened_the_file
if file:
self._i_opened_the_file = None
file.close()
#
# Internal methods.
#
def _ensure_header_written(self, datasize):
if not self._headerwritten:
if not self._nchannels:
raise Error('# channels not specified')
if not self._sampwidth:
raise Error('sample width not specified')
if not self._framerate:
raise Error('sampling rate not specified')
self._write_header(datasize)
def _write_header(self, initlength):
assert not self._headerwritten
self._file.write(b'RIFF')
if not self._nframes:
self._nframes = initlength // (self._nchannels * self._sampwidth)
self._datalength = self._nframes * self._nchannels * self._sampwidth
try:
self._form_length_pos = self._file.tell()
except (AttributeError, OSError):
self._form_length_pos = None
self._file.write(struct.pack('<L4s4sLHHLLHH4s',
36 + self._datalength, b'WAVE', b'fmt ', 16,
WAVE_FORMAT_PCM, self._nchannels, self._framerate,
self._nchannels * self._framerate * self._sampwidth,
self._nchannels * self._sampwidth,
self._sampwidth * 8, b'data'))
if self._form_length_pos is not None:
self._data_length_pos = self._file.tell()
self._file.write(struct.pack('<L', self._datalength))
self._headerwritten = True
def _patchheader(self):
assert self._headerwritten
if self._datawritten == self._datalength:
return
curpos = self._file.tell()
self._file.seek(self._form_length_pos, 0)
self._file.write(struct.pack('<L', 36 + self._datawritten))
self._file.seek(self._data_length_pos, 0)
self._file.write(struct.pack('<L', self._datawritten))
self._file.seek(curpos, 0)
self._datalength = self._datawritten
def open(f, mode=None):
if mode is None:
if hasattr(f, 'mode'):
mode = f.mode
else:
mode = 'rb'
if mode in ('r', 'rb'):
return Wave_read(f)
elif mode in ('w', 'wb'):
return Wave_write(f)
else:
raise Error("mode must be 'r', 'rb', 'w', or 'wb'")
openfp = open # B/W compatibility
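# A round-trip sketch, assuming the module is importable as `wave`: write one
# second of 16-bit mono silence (this creates demo.wav in the current
# directory), then read the parameters back through the Wave_read API.
import wave
with wave.open('demo.wav', 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)            # bytes per sample
    w.setframerate(8000)
    w.writeframes(b'\x00\x00' * 8000)
with wave.open('demo.wav', 'rb') as r:
    assert r.getparams()[:4] == (1, 2, 8000, 8000)
    frames = r.readframes(8000)  # raw little-endian PCM bytes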
"""Weak reference support for Python.
This module is an implementation of PEP 205:
http://www.python.org/dev/peps/pep-0205/
"""
# Naming convention: Variables named "wr" are weak reference objects;
# they are called this instead of "ref" to avoid name collisions with
# the module-global ref() function imported from _weakref.
from _weakref import (
getweakrefcount,
getweakrefs,
ref,
proxy,
CallableProxyType,
ProxyType,
ReferenceType)
from _weakrefset import WeakSet, _IterationGuard
import collections # Import after _weakref to avoid circular import.
import sys
import itertools
ProxyTypes = (ProxyType, CallableProxyType)
__all__ = ["ref", "proxy", "getweakrefcount", "getweakrefs",
"WeakKeyDictionary", "ReferenceType", "ProxyType",
"CallableProxyType", "ProxyTypes", "WeakValueDictionary",
"WeakSet", "WeakMethod", "finalize"]
class WeakMethod(ref):
"""
A custom `weakref.ref` subclass which simulates a weak reference to
a bound method, working around the lifetime problem of bound methods.
"""
__slots__ = "_func_ref", "_meth_type", "_alive", "__weakref__"
def __new__(cls, meth, callback=None):
try:
obj = meth.__self__
func = meth.__func__
except AttributeError:
raise TypeError("argument should be a bound method, not {}"
.format(type(meth))) from None
def _cb(arg):
# The self-weakref trick is needed to avoid creating a reference
# cycle.
self = self_wr()
if self._alive:
self._alive = False
if callback is not None:
callback(self)
self = ref.__new__(cls, obj, _cb)
self._func_ref = ref(func, _cb)
self._meth_type = type(meth)
self._alive = True
self_wr = ref(self)
return self
def __call__(self):
obj = super().__call__()
func = self._func_ref()
if obj is None or func is None:
return None
return self._meth_type(func, obj)
def __eq__(self, other):
if isinstance(other, WeakMethod):
if not self._alive or not other._alive:
return self is other
return ref.__eq__(self, other) and self._func_ref == other._func_ref
return False
def __ne__(self, other):
if isinstance(other, WeakMethod):
if not self._alive or not other._alive:
return self is not other
return ref.__ne__(self, other) or self._func_ref != other._func_ref
return True
__hash__ = ref.__hash__
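# A sketch of why WeakMethod exists, assuming the module is importable as
# `weakref`: a plain ref to a bound method dies immediately (the bound method
# object is created fresh on each attribute access), while WeakMethod
# re-binds on call and lives as long as the instance does.
import gc
import weakref
class Demo:
    def ping(self):
        return 'pong'
obj = Demo()
wm = weakref.WeakMethod(obj.ping)
assert wm()() == 'pong'  # dereference, then call the re-bound method
del obj
gc.collect()  # prompt collection assumed (immediate on CPython)
assert wm() is None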
class WeakValueDictionary(collections.MutableMapping):
"""Mapping class that references values weakly.
Entries in the dictionary will be discarded when no strong
reference to the value exists anymore
"""
# We inherit the constructor without worrying about the input
# dictionary; since it uses our .update() method, we get the right
# checks (if the other dictionary is a WeakValueDictionary,
# objects are unwrapped on the way out, and we always wrap on the
# way in).
def __init__(*args, **kw):
if not args:
raise TypeError("descriptor '__init__' of 'WeakValueDictionary' "
"object needs an argument")
self, *args = args
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
def remove(wr, selfref=ref(self)):
self = selfref()
if self is not None:
if self._iterating:
self._pending_removals.append(wr.key)
else:
del self.data[wr.key]
self._remove = remove
# A list of keys to be removed
self._pending_removals = []
self._iterating = set()
self.data = d = {}
self.update(*args, **kw)
def _commit_removals(self):
l = self._pending_removals
d = self.data
# We shouldn't encounter any KeyError, because this method should
# always be called *before* mutating the dict.
while l:
del d[l.pop()]
def __getitem__(self, key):
o = self.data[key]()
if o is None:
raise KeyError(key)
else:
return o
def __delitem__(self, key):
if self._pending_removals:
self._commit_removals()
del self.data[key]
def __len__(self):
return len(self.data) - len(self._pending_removals)
def __contains__(self, key):
try:
o = self.data[key]()
except KeyError:
return False
return o is not None
def __repr__(self):
return "<WeakValueDictionary at %s>" % id(self)
def __setitem__(self, key, value):
if self._pending_removals:
self._commit_removals()
self.data[key] = KeyedRef(value, self._remove, key)
def copy(self):
new = WeakValueDictionary()
for key, wr in self.data.items():
o = wr()
if o is not None:
new[key] = o
return new
__copy__ = copy
def __deepcopy__(self, memo):
from copy import deepcopy
new = self.__class__()
for key, wr in self.data.items():
o = wr()
if o is not None:
new[deepcopy(key, memo)] = o
return new
def get(self, key, default=None):
try:
wr = self.data[key]
except KeyError:
return default
else:
o = wr()
if o is None:
# This should only happen if the referent was garbage
# collected between the lookup and the call above.
return default
else:
return o
def items(self):
with _IterationGuard(self):
for k, wr in self.data.items():
v = wr()
if v is not None:
yield k, v
def keys(self):
with _IterationGuard(self):
for k, wr in self.data.items():
if wr() is not None:
yield k
__iter__ = keys
def itervaluerefs(self):
"""Return an iterator that yields the weak references to the values.
The references are not guaranteed to be 'live' at the time
they are used, so the result of calling the references needs
to be checked before being used. This can be used to avoid
creating references that will cause the garbage collector to
keep the values around longer than needed.
"""
with _IterationGuard(self):
yield from self.data.values()
def values(self):
with _IterationGuard(self):
for wr in self.data.values():
obj = wr()
if obj is not None:
yield obj
def popitem(self):
if self._pending_removals:
self._commit_removals()
while True:
key, wr = self.data.popitem()
o = wr()
if o is not None:
return key, o
def pop(self, key, *args):
if self._pending_removals:
self._commit_removals()
try:
o = self.data.pop(key)()
except KeyError:
if args:
return args[0]
raise
if o is None:
raise KeyError(key)
else:
return o
def setdefault(self, key, default=None):
try:
wr = self.data[key]
except KeyError:
if self._pending_removals:
self._commit_removals()
self.data[key] = KeyedRef(default, self._remove, key)
return default
else:
return wr()
def update(*args, **kwargs):
if not args:
raise TypeError("descriptor 'update' of 'WeakValueDictionary' "
"object needs an argument")
self, *args = args
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
dict = args[0] if args else None
if self._pending_removals:
self._commit_removals()
d = self.data
if dict is not None:
if not hasattr(dict, "items"):
dict = type({})(dict)
for key, o in dict.items():
d[key] = KeyedRef(o, self._remove, key)
if len(kwargs):
self.update(kwargs)
def valuerefs(self):
"""Return a list of weak references to the values.
The references are not guaranteed to be 'live' at the time
they are used, so the result of calling the references needs
to be checked before being used. This can be used to avoid
creating references that will cause the garbage collector to
keep the values around longer than needed.
"""
return list(self.data.values())
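# A usage sketch, assuming the module is importable as `weakref`: entries
# vanish once the last strong reference to a value goes away, so the mapping
# works as a cache that never keeps its values alive on its own (prompt
# collection assumed; it is immediate on CPython).
import gc
import weakref
class Value:
    pass
cache = weakref.WeakValueDictionary()
v = Value()
cache['k'] = v
assert cache['k'] is v
del v
gc.collect()
assert 'k' not in cache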
class KeyedRef(ref):
"""Specialized reference that includes a key corresponding to the value.
This is used in the WeakValueDictionary to avoid having to create
a function object for each key stored in the mapping. A shared
callback object can use the 'key' attribute of a KeyedRef instead
of getting a reference to the key from an enclosing scope.
"""
__slots__ = "key",
def __new__(type, ob, callback, key):
self = ref.__new__(type, ob, callback)
self.key = key
return self
def __init__(self, ob, callback, key):
super().__init__(ob, callback)
class WeakKeyDictionary(collections.MutableMapping):
""" Mapping class that references keys weakly.
Entries in the dictionary will be discarded when there is no
longer a strong reference to the key. This can be used to
associate additional data with an object owned by other parts of
an application without adding attributes to those objects. This
can be especially useful with objects that override attribute
accesses.
"""
def __init__(self, dict=None):
self.data = {}
def remove(k, selfref=ref(self)):
self = selfref()
if self is not None:
if self._iterating:
self._pending_removals.append(k)
else:
del self.data[k]
self._remove = remove
# A list of dead weakrefs (keys to be removed)
self._pending_removals = []
self._iterating = set()
self._dirty_len = False
if dict is not None:
self.update(dict)
def _commit_removals(self):
# NOTE: We don't need to call this method before mutating the dict,
# because a dead weakref never compares equal to a live weakref,
# even if they happened to refer to equal objects.
# However, it means keys may already have been removed.
l = self._pending_removals
d = self.data
while l:
try:
del d[l.pop()]
except KeyError:
pass
def _scrub_removals(self):
d = self.data
self._pending_removals = [k for k in self._pending_removals if k in d]
self._dirty_len = False
def __delitem__(self, key):
self._dirty_len = True
del self.data[ref(key)]
def __getitem__(self, key):
return self.data[ref(key)]
def __len__(self):
if self._dirty_len and self._pending_removals:
# self._pending_removals may still contain keys which were
# explicitly removed, we have to scrub them (see issue #21173).
self._scrub_removals()
return len(self.data) - len(self._pending_removals)
def __repr__(self):
return "<WeakKeyDictionary at %s>" % id(self)
def __setitem__(self, key, value):
self.data[ref(key, self._remove)] = value
def copy(self):
new = WeakKeyDictionary()
for key, value in self.data.items():
o = key()
if o is not None:
new[o] = value
return new
__copy__ = copy
def __deepcopy__(self, memo):
from copy import deepcopy
new = self.__class__()
for key, value in self.data.items():
o = key()
if o is not None:
new[o] = deepcopy(value, memo)
return new
def get(self, key, default=None):
return self.data.get(ref(key),default)
def __contains__(self, key):
try:
wr = ref(key)
except TypeError:
return False
return wr in self.data
def items(self):
with _IterationGuard(self):
for wr, value in self.data.items():
key = wr()
if key is not None:
yield key, value
def keys(self):
with _IterationGuard(self):
for wr in self.data:
obj = wr()
if obj is not None:
yield obj
__iter__ = keys
def values(self):
with _IterationGuard(self):
for wr, value in self.data.items():
if wr() is not None:
yield value
def keyrefs(self):
"""Return a list of weak references to the keys.
The references are not guaranteed to be 'live' at the time
they are used, so the result of calling the references needs
to be checked before being used. This can be used to avoid
creating references that will cause the garbage collector to
keep the keys around longer than needed.
"""
return list(self.data)
def popitem(self):
self._dirty_len = True
while True:
key, value = self.data.popitem()
o = key()
if o is not None:
return o, value
def pop(self, key, *args):
self._dirty_len = True
return self.data.pop(ref(key), *args)
def setdefault(self, key, default=None):
return self.data.setdefault(ref(key, self._remove),default)
def update(self, dict=None, **kwargs):
d = self.data
if dict is not None:
if not hasattr(dict, "items"):
dict = type({})(dict)
for key, value in dict.items():
d[ref(key, self._remove)] = value
if len(kwargs):
self.update(kwargs)
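# The mirror image of WeakValueDictionary, sketched under the same
# assumptions: keys are weak, so per-object metadata disappears with the
# object instead of pinning it, the use case the class docstring describes.
import gc
import weakref
class Key:
    pass
meta = weakref.WeakKeyDictionary()
k = Key()
meta[k] = {'color': 'red'}
assert meta[k]['color'] == 'red'
del k
gc.collect()
assert len(meta) == 0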
class finalize:
"""Class for finalization of weakrefable objects
finalize(obj, func, *args, **kwargs) returns a callable finalizer
object which will be called when obj is garbage collected. The
first time the finalizer is called it evaluates func(*arg, **kwargs)
and returns the result. After this the finalizer is dead, and
calling it just returns None.
When the program exits any remaining finalizers for which the
atexit attribute is true will be run in reverse order of creation.
By default atexit is true.
"""
# Finalizer objects don't have any state of their own. They are
# just used as keys to lookup _Info objects in the registry. This
# ensures that they cannot be part of a ref-cycle.
__slots__ = ()
_registry = {}
_shutdown = False
_index_iter = itertools.count()
_dirty = False
_registered_with_atexit = False
class _Info:
__slots__ = ("weakref", "func", "args", "kwargs", "atexit", "index")
def __init__(self, obj, func, *args, **kwargs):
if not self._registered_with_atexit:
# We may register the exit function more than once because
# of a thread race, but that is harmless
import atexit
atexit.register(self._exitfunc)
finalize._registered_with_atexit = True
info = self._Info()
info.weakref = ref(obj, self)
info.func = func
info.args = args
info.kwargs = kwargs or None
info.atexit = True
info.index = next(self._index_iter)
self._registry[self] = info
finalize._dirty = True
def __call__(self, _=None):
"""If alive then mark as dead and return func(*args, **kwargs);
otherwise return None"""
info = self._registry.pop(self, None)
if info and not self._shutdown:
return info.func(*info.args, **(info.kwargs or {}))
def detach(self):
"""If alive then mark as dead and return (obj, func, args, kwargs);
otherwise return None"""
info = self._registry.get(self)
obj = info and info.weakref()
if obj is not None and self._registry.pop(self, None):
return (obj, info.func, info.args, info.kwargs or {})
def peek(self):
"""If alive then return (obj, func, args, kwargs);
otherwise return None"""
info = self._registry.get(self)
obj = info and info.weakref()
if obj is not None:
return (obj, info.func, info.args, info.kwargs or {})
@property
def alive(self):
"""Whether finalizer is alive"""
return self in self._registry
@property
def atexit(self):
"""Whether finalizer should be called at exit"""
info = self._registry.get(self)
return bool(info) and info.atexit
@atexit.setter
def atexit(self, value):
info = self._registry.get(self)
if info:
info.atexit = bool(value)
def __repr__(self):
info = self._registry.get(self)
obj = info and info.weakref()
if obj is None:
return '<%s object at %#x; dead>' % (type(self).__name__, id(self))
else:
return '<%s object at %#x; for %r at %#x>' % \
(type(self).__name__, id(self), type(obj).__name__, id(obj))
@classmethod
def _select_for_exit(cls):
# Return live finalizers marked for exit, oldest first
L = [(f,i) for (f,i) in cls._registry.items() if i.atexit]
L.sort(key=lambda item:item[1].index)
return [f for (f,i) in L]
@classmethod
def _exitfunc(cls):
# At shutdown invoke finalizers for which atexit is true.
# This is called once all other non-daemonic threads have been
# joined.
reenable_gc = False
try:
if cls._registry:
import gc
if gc.isenabled():
reenable_gc = True
gc.disable()
pending = None
while True:
if pending is None or finalize._dirty:
pending = cls._select_for_exit()
finalize._dirty = False
if not pending:
break
f = pending.pop()
try:
# gc is disabled, so (assuming no daemonic
# threads) the following is the only line in
# this function which might trigger creation
# of a new finalizer
f()
except Exception:
sys.excepthook(*sys.exc_info())
assert f not in cls._registry
finally:
# prevent any more finalizers from executing during shutdown
finalize._shutdown = True
if reenable_gc:
gc.enable()
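# A sketch of finalize() as a safer alternative to __del__, assuming the
# module is importable as `weakref`: the callback runs exactly once, when the
# object is collected or at interpreter exit, whichever comes first.
import gc
import weakref
class Resource:
    pass
closed = []
r = Resource()
fin = weakref.finalize(r, closed.append, 'closed')
assert fin.alive
del r
gc.collect()  # prompt collection assumed (immediate on CPython)
assert closed == ['closed'] and not fin.alive
assert fin() is None  # calling a dead finalizer just returns None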
#! /usr/bin/env python3
"""Interfaces for launching and remotely controlling Web browsers."""
# Maintained by Georg Brandl.
import os
import shlex
import shutil
import sys
import subprocess
__all__ = ["Error", "open", "open_new", "open_new_tab", "get", "register"]
class Error(Exception):
pass
_browsers = {} # Dictionary of available browser controllers
_tryorder = [] # Preference order of available browsers
def register(name, klass, instance=None, update_tryorder=1):
"""Register a browser connector and, optionally, connection."""
_browsers[name.lower()] = [klass, instance]
if update_tryorder > 0:
_tryorder.append(name)
elif update_tryorder < 0:
_tryorder.insert(0, name)
def get(using=None):
"""Return a browser launcher instance appropriate for the environment."""
if using is not None:
alternatives = [using]
else:
alternatives = _tryorder
for browser in alternatives:
if '%s' in browser:
# User gave us a command line, split it into name and args
browser = shlex.split(browser)
if browser[-1] == '&':
return BackgroundBrowser(browser[:-1])
else:
return GenericBrowser(browser)
else:
# User gave us a browser name or path.
try:
command = _browsers[browser.lower()]
except KeyError:
command = _synthesize(browser)
if command[1] is not None:
return command[1]
elif command[0] is not None:
return command[0]()
raise Error("could not locate runnable browser")
# Please note: the following definition hides a builtin function.
# It is recommended one does "import webbrowser" and uses webbrowser.open(url)
# instead of "from webbrowser import *".
def open(url, new=0, autoraise=True):
for name in _tryorder:
browser = get(name)
if browser.open(url, new, autoraise):
return True
return False
def open_new(url):
return open(url, 1)
def open_new_tab(url):
return open(url, 2)
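# The three helpers above map directly onto the 'new' parameter; note that
# each call actually launches (or signals) the configured browser.
import webbrowser
webbrowser.open('https://example.com')          # new=0: reuse a window
webbrowser.open_new('https://example.com')      # new=1: ask for a new window
webbrowser.open_new_tab('https://example.com')  # new=2: ask for a new tab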
def _synthesize(browser, update_tryorder=1):
"""Attempt to synthesize a controller base on existing controllers.
This is useful to create a controller when a user specifies a path to
an entry in the BROWSER environment variable -- we can copy a general
controller to operate using a specific installation of the desired
browser in this way.
If we can't create a controller in this way, or if there is no
executable for the requested browser, return [None, None].
"""
cmd = browser.split()[0]
if not shutil.which(cmd):
return [None, None]
name = os.path.basename(cmd)
try:
command = _browsers[name.lower()]
except KeyError:
return [None, None]
# now attempt to clone to fit the new name:
controller = command[1]
if controller and name.lower() == controller.basename:
import copy
controller = copy.copy(controller)
controller.name = browser
controller.basename = os.path.basename(browser)
register(browser, None, controller, update_tryorder)
return [None, controller]
return [None, None]
# General parent classes
class BaseBrowser(object):
"""Parent class for all browsers. Do not use directly."""
args = ['%s']
def __init__(self, name=""):
self.name = name
self.basename = name
def open(self, url, new=0, autoraise=True):
raise NotImplementedError
def open_new(self, url):
return self.open(url, 1)
def open_new_tab(self, url):
return self.open(url, 2)
class GenericBrowser(BaseBrowser):
"""Class for all browsers started with a command
and without remote functionality."""
def __init__(self, name):
if isinstance(name, str):
self.name = name
self.args = ["%s"]
else:
# name should be a list with arguments
self.name = name[0]
self.args = name[1:]
self.basename = os.path.basename(self.name)
def open(self, url, new=0, autoraise=True):
cmdline = [self.name] + [arg.replace("%s", url)
for arg in self.args]
try:
if sys.platform[:3] == 'win':
p = subprocess.Popen(cmdline)
else:
p = subprocess.Popen(cmdline, close_fds=True)
return not p.wait()
except OSError:
return False
class BackgroundBrowser(GenericBrowser):
"""Class for all browsers which are to be started in the
background."""
def open(self, url, new=0, autoraise=True):
cmdline = [self.name] + [arg.replace("%s", url)
for arg in self.args]
try:
if sys.platform[:3] == 'win':
p = subprocess.Popen(cmdline)
else:
p = subprocess.Popen(cmdline, close_fds=True,
start_new_session=True)
return (p.poll() is None)
except OSError:
return False
class UnixBrowser(BaseBrowser):
"""Parent class for all Unix browsers with remote functionality."""
raise_opts = None
background = False
redirect_stdout = True
# In remote_args, %s will be replaced with the requested URL. %action will
# be replaced depending on the value of 'new' passed to open.
# remote_action is used for new=0 (open). If newwin is not None, it is
# used for new=1 (open_new). If newtab is not None, it is used for
# new=2 (open_new_tab). After both substitutions are made, any empty
# strings in the transformed remote_args list will be removed.
remote_args = ['%action', '%s']
remote_action = None
remote_action_newwin = None
remote_action_newtab = None
def _invoke(self, args, remote, autoraise):
raise_opt = []
if remote and self.raise_opts:
# use autoraise argument only for remote invocation
autoraise = int(autoraise)
opt = self.raise_opts[autoraise]
if opt: raise_opt = [opt]
cmdline = [self.name] + raise_opt + args
if remote or self.background:
inout = subprocess.DEVNULL
else:
# for TTY browsers, we need stdin/out
inout = None
p = subprocess.Popen(cmdline, close_fds=True, stdin=inout,
stdout=(self.redirect_stdout and inout or None),
stderr=inout, start_new_session=True)
if remote:
# wait at most five seconds. If the subprocess is not finished, the
# remote invocation has (hopefully) started a new instance.
try:
rc = p.wait(5)
# if remote call failed, open() will try direct invocation
return not rc
except subprocess.TimeoutExpired:
return True
elif self.background:
if p.poll() is None:
return True
else:
return False
else:
return not p.wait()
def open(self, url, new=0, autoraise=True):
if new == 0:
action = self.remote_action
elif new == 1:
action = self.remote_action_newwin
elif new == 2:
if self.remote_action_newtab is None:
action = self.remote_action_newwin
else:
action = self.remote_action_newtab
else:
raise Error("Bad 'new' parameter to open(); " +
"expected 0, 1, or 2, got %s" % new)
args = [arg.replace("%s", url).replace("%action", action)
for arg in self.remote_args]
args = [arg for arg in args if arg]
success = self._invoke(args, True, autoraise)
if not success:
# remote invocation failed, try straight way
args = [arg.replace("%s", url) for arg in self.args]
return self._invoke(args, False, False)
else:
return True
class Mozilla(UnixBrowser):
"""Launcher class for Mozilla/Netscape browsers."""
raise_opts = ["-noraise", "-raise"]
remote_args = ['-remote', 'openURL(%s%action)']
remote_action = ""
remote_action_newwin = ",new-window"
remote_action_newtab = ",new-tab"
background = True
Netscape = Mozilla
class Galeon(UnixBrowser):
"""Launcher class for Galeon/Epiphany browsers."""
raise_opts = ["-noraise", ""]
remote_args = ['%action', '%s']
remote_action = "-n"
remote_action_newwin = "-w"
background = True
class Chrome(UnixBrowser):
"Launcher class for Google Chrome browser."
remote_args = ['%action', '%s']
remote_action = ""
remote_action_newwin = "--new-window"
remote_action_newtab = ""
background = True
Chromium = Chrome
class Opera(UnixBrowser):
"Launcher class for Opera browser."
raise_opts = ["-noraise", ""]
remote_args = ['-remote', 'openURL(%s%action)']
remote_action = ""
remote_action_newwin = ",new-window"
remote_action_newtab = ",new-page"
background = True
class Elinks(UnixBrowser):
"Launcher class for Elinks browsers."
remote_args = ['-remote', 'openURL(%s%action)']
remote_action = ""
remote_action_newwin = ",new-window"
remote_action_newtab = ",new-tab"
background = False
# elinks doesn't like its stdout to be redirected -
# it uses redirected stdout as a signal to do -dump
redirect_stdout = False
class Konqueror(BaseBrowser):
"""Controller for the KDE File Manager (kfm, or Konqueror).
See the output of ``kfmclient --commands``
for more information on the Konqueror remote-control interface.
"""
def open(self, url, new=0, autoraise=True):
        # XXX Currently I know no way to prevent KFM from opening a new window.
if new == 2:
action = "newTab"
else:
action = "openURL"
devnull = subprocess.DEVNULL
try:
p = subprocess.Popen(["kfmclient", action, url],
close_fds=True, stdin=devnull,
stdout=devnull, stderr=devnull)
except OSError:
# fall through to next variant
pass
else:
p.wait()
            # kfmclient's return code unfortunately seems to carry no meaning
return True
try:
p = subprocess.Popen(["konqueror", "--silent", url],
close_fds=True, stdin=devnull,
stdout=devnull, stderr=devnull,
start_new_session=True)
except OSError:
# fall through to next variant
pass
else:
if p.poll() is None:
# Should be running now.
return True
try:
p = subprocess.Popen(["kfm", "-d", url],
close_fds=True, stdin=devnull,
stdout=devnull, stderr=devnull,
start_new_session=True)
except OSError:
return False
else:
return (p.poll() is None)
class Grail(BaseBrowser):
# There should be a way to maintain a connection to Grail, but the
# Grail remote control protocol doesn't really allow that at this
# point. It probably never will!
def _find_grail_rc(self):
import glob
import pwd
import socket
import tempfile
tempdir = os.path.join(tempfile.gettempdir(),
".grail-unix")
user = pwd.getpwuid(os.getuid())[0]
filename = os.path.join(tempdir, user + "-*")
maybes = glob.glob(filename)
if not maybes:
return None
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
for fn in maybes:
# need to PING each one until we find one that's live
try:
s.connect(fn)
except OSError:
# no good; attempt to clean it out, but don't fail:
try:
os.unlink(fn)
except OSError:
pass
else:
return s
def _remote(self, action):
s = self._find_grail_rc()
if not s:
return 0
s.send(action)
s.close()
return 1
def open(self, url, new=0, autoraise=True):
if new:
ok = self._remote("LOADNEW " + url)
else:
ok = self._remote("LOAD " + url)
return ok
#
# Platform support for Unix
#
# These are the right tests because all these Unix browsers require either
# a console terminal or an X display to run.
def register_X_browsers():
# use xdg-open if around
if shutil.which("xdg-open"):
register("xdg-open", None, BackgroundBrowser("xdg-open"))
# The default GNOME3 browser
if "GNOME_DESKTOP_SESSION_ID" in os.environ and shutil.which("gvfs-open"):
register("gvfs-open", None, BackgroundBrowser("gvfs-open"))
# The default GNOME browser
if "GNOME_DESKTOP_SESSION_ID" in os.environ and shutil.which("gnome-open"):
register("gnome-open", None, BackgroundBrowser("gnome-open"))
# The default KDE browser
if "KDE_FULL_SESSION" in os.environ and shutil.which("kfmclient"):
register("kfmclient", Konqueror, Konqueror("kfmclient"))
if shutil.which("x-www-browser"):
register("x-www-browser", None, BackgroundBrowser("x-www-browser"))
# The Mozilla/Netscape browsers
for browser in ("mozilla-firefox", "firefox",
"mozilla-firebird", "firebird",
"iceweasel", "iceape",
"seamonkey", "mozilla", "netscape"):
if shutil.which(browser):
register(browser, None, Mozilla(browser))
# Konqueror/kfm, the KDE browser.
if shutil.which("kfm"):
register("kfm", Konqueror, Konqueror("kfm"))
elif shutil.which("konqueror"):
register("konqueror", Konqueror, Konqueror("konqueror"))
# Gnome's Galeon and Epiphany
for browser in ("galeon", "epiphany"):
if shutil.which(browser):
register(browser, None, Galeon(browser))
# Skipstone, another Gtk/Mozilla based browser
if shutil.which("skipstone"):
register("skipstone", None, BackgroundBrowser("skipstone"))
# Google Chrome/Chromium browsers
for browser in ("google-chrome", "chrome", "chromium", "chromium-browser"):
if shutil.which(browser):
register(browser, None, Chrome(browser))
# Opera, quite popular
if shutil.which("opera"):
register("opera", None, Opera("opera"))
# Next, Mosaic -- old but still in use.
if shutil.which("mosaic"):
register("mosaic", None, BackgroundBrowser("mosaic"))
# Grail, the Python browser. Does anybody still use it?
if shutil.which("grail"):
register("grail", Grail, None)
# Prefer X browsers if present
if os.environ.get("DISPLAY"):
register_X_browsers()
# Also try console browsers
if os.environ.get("TERM"):
if shutil.which("www-browser"):
register("www-browser", None, GenericBrowser("www-browser"))
# The Links/elinks browsers <http://artax.karlin.mff.cuni.cz/~mikulas/links/>
if shutil.which("links"):
register("links", None, GenericBrowser("links"))
if shutil.which("elinks"):
register("elinks", None, Elinks("elinks"))
# The Lynx browser <http://lynx.isc.org/>, <http://lynx.browser.org/>
if shutil.which("lynx"):
register("lynx", None, GenericBrowser("lynx"))
# The w3m browser <http://w3m.sourceforge.net/>
if shutil.which("w3m"):
register("w3m", None, GenericBrowser("w3m"))
#
# Platform support for Windows
#
if sys.platform[:3] == "win":
class WindowsDefault(BaseBrowser):
def open(self, url, new=0, autoraise=True):
try:
os.startfile(url)
except OSError:
# [Error 22] No application is associated with the specified
# file for this operation: '<URL>'
return False
else:
return True
_tryorder = []
_browsers = {}
# First try to use the default Windows browser
register("windows-default", WindowsDefault)
    # Detect some common Windows browsers, fall back to IE
iexplore = os.path.join(os.environ.get("PROGRAMFILES", "C:\\Program Files"),
"Internet Explorer\\IEXPLORE.EXE")
for browser in ("firefox", "firebird", "seamonkey", "mozilla",
"netscape", "opera", iexplore):
if shutil.which(browser):
register(browser, None, BackgroundBrowser(browser))
#
# Platform support for MacOS
#
if sys.platform == 'darwin':
# Adapted from patch submitted to SourceForge by Steven J. Burr
class MacOSX(BaseBrowser):
"""Launcher class for Aqua browsers on Mac OS X
Optionally specify a browser name on instantiation. Note that this
will not work for Aqua browsers if the user has moved the application
package after installation.
If no browser is specified, the default browser, as specified in the
Internet System Preferences panel, will be used.
"""
def __init__(self, name):
self.name = name
def open(self, url, new=0, autoraise=True):
assert "'" not in url
# hack for local urls
        if ':' not in url:
url = 'file:'+url
# new must be 0 or 1
new = int(bool(new))
if self.name == "default":
# User called open, open_new or get without a browser parameter
script = 'open location "%s"' % url.replace('"', '%22') # opens in default browser
else:
# User called get and chose a browser
if self.name == "OmniWeb":
toWindow = ""
else:
# Include toWindow parameter of OpenURL command for browsers
# that support it. 0 == new window; -1 == existing
toWindow = "toWindow %d" % (new - 1)
cmd = 'OpenURL "%s"' % url.replace('"', '%22')
script = '''tell application "%s"
activate
%s %s
end tell''' % (self.name, cmd, toWindow)
# Open pipe to AppleScript through osascript command
osapipe = os.popen("osascript", "w")
if osapipe is None:
return False
# Write script to osascript's stdin
osapipe.write(script)
rc = osapipe.close()
return not rc
class MacOSXOSAScript(BaseBrowser):
def __init__(self, name):
self._name = name
def open(self, url, new=0, autoraise=True):
if self._name == 'default':
script = 'open location "%s"' % url.replace('"', '%22') # opens in default browser
else:
script = '''
tell application "%s"
activate
open location "%s"
end
'''%(self._name, url.replace('"', '%22'))
osapipe = os.popen("osascript", "w")
if osapipe is None:
return False
osapipe.write(script)
rc = osapipe.close()
return not rc
# Don't clear _tryorder or _browsers since OS X can use above Unix support
# (but we prefer using the OS X specific stuff)
register("safari", None, MacOSXOSAScript('safari'), -1)
register("firefox", None, MacOSXOSAScript('firefox'), -1)
register("MacOSX", None, MacOSXOSAScript('default'), -1)
# OK, now that we know what the default preference orders for each
# platform are, allow user to override them with the BROWSER variable.
if "BROWSER" in os.environ:
_userchoices = os.environ["BROWSER"].split(os.pathsep)
_userchoices.reverse()
# Treat choices in same way as if passed into get() but do register
# and prepend to _tryorder
for cmdline in _userchoices:
if cmdline != '':
cmd = _synthesize(cmdline, -1)
if cmd[1] is None:
register(cmdline, None, GenericBrowser(cmdline), -1)
cmdline = None # to make del work if _userchoices was empty
del cmdline
del _userchoices
# what to do if _tryorder is now empty?
def main():
import getopt
usage = """Usage: %s [-n | -t] url
-n: open new window
-t: open new tab""" % sys.argv[0]
try:
opts, args = getopt.getopt(sys.argv[1:], 'ntd')
except getopt.error as msg:
print(msg, file=sys.stderr)
print(usage, file=sys.stderr)
sys.exit(1)
new_win = 0
for o, a in opts:
if o == '-n': new_win = 1
elif o == '-t': new_win = 2
if len(args) != 1:
print(usage, file=sys.stderr)
sys.exit(1)
url = args[0]
open(url, new_win)
print("\a")
if __name__ == "__main__":
main()
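# --- Editor's sketch (not part of the original module): exercising the
# module-level open() defined above. The URL is a placeholder and the
# helper name is hypothetical.
def _demo():
    open("http://example.com")            # new=0: reuse an existing window if possible
    open("http://example.com", new=1)     # new=1: request a new browser window
    open("http://example.com", new=2)     # new=2: request a new tab where supported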
# -*- coding: ISO-8859-1 -*-
def _():
import sys
if sys.implementation.name == 'ironpython':
import clr
try:
clr.AddReference('IronPython.Wpf')
except:
pass
_()
del _
from _wpf import *
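# --- Editor's sketch (not part of this file): minimal IronPython WPF usage
# built on the LoadComponent helper re-exported from _wpf above. The XAML
# file name is a placeholder.
#
#   import wpf
#   from System.Windows import Application, Window
#
#   class MainWindow(Window):
#       def __init__(self):
#           wpf.LoadComponent(self, 'MainWindow.xaml')
#
#   Application().Run(MainWindow())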
"""Implements (a subset of) Sun XDR -- eXternal Data Representation.
See: RFC 1014
"""
import struct
from io import BytesIO
from functools import wraps
__all__ = ["Error", "Packer", "Unpacker", "ConversionError"]
# exceptions
class Error(Exception):
"""Exception class for this module. Use:
except xdrlib.Error as var:
# var has the Error instance for the exception
Public ivars:
msg -- contains the message
"""
def __init__(self, msg):
self.msg = msg
def __repr__(self):
return repr(self.msg)
def __str__(self):
return str(self.msg)
class ConversionError(Error):
pass
def raise_conversion_error(function):
""" Wrap any raised struct.errors in a ConversionError. """
@wraps(function)
def result(self, value):
try:
return function(self, value)
except struct.error as e:
raise ConversionError(e.args[0]) from None
return result
class Packer:
"""Pack various data representations into a buffer."""
def __init__(self):
self.reset()
def reset(self):
self.__buf = BytesIO()
def get_buffer(self):
return self.__buf.getvalue()
# backwards compatibility
get_buf = get_buffer
@raise_conversion_error
def pack_uint(self, x):
self.__buf.write(struct.pack('>L', x))
@raise_conversion_error
def pack_int(self, x):
self.__buf.write(struct.pack('>l', x))
pack_enum = pack_int
def pack_bool(self, x):
if x: self.__buf.write(b'\0\0\0\1')
else: self.__buf.write(b'\0\0\0\0')
def pack_uhyper(self, x):
try:
self.pack_uint(x>>32 & 0xffffffff)
except (TypeError, struct.error) as e:
raise ConversionError(e.args[0]) from None
try:
self.pack_uint(x & 0xffffffff)
except (TypeError, struct.error) as e:
raise ConversionError(e.args[0]) from None
pack_hyper = pack_uhyper
@raise_conversion_error
def pack_float(self, x):
self.__buf.write(struct.pack('>f', x))
@raise_conversion_error
def pack_double(self, x):
self.__buf.write(struct.pack('>d', x))
def pack_fstring(self, n, s):
if n < 0:
raise ValueError('fstring size must be nonnegative')
data = s[:n]
        n = ((n+3)//4)*4   # round up to a multiple of 4 (XDR alignment)
data = data + (n - len(data)) * b'\0'
self.__buf.write(data)
pack_fopaque = pack_fstring
def pack_string(self, s):
n = len(s)
self.pack_uint(n)
self.pack_fstring(n, s)
pack_opaque = pack_string
pack_bytes = pack_string
def pack_list(self, list, pack_item):
for item in list:
self.pack_uint(1)
pack_item(item)
self.pack_uint(0)
def pack_farray(self, n, list, pack_item):
if len(list) != n:
raise ValueError('wrong array size')
for item in list:
pack_item(item)
def pack_array(self, list, pack_item):
n = len(list)
self.pack_uint(n)
self.pack_farray(n, list, pack_item)
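# Editor's illustration (not in the original module) of the XDR wire format
# produced by the Packer above:
#   p = Packer()
#   p.pack_uint(7)          # b'\x00\x00\x00\x07'  (4-byte big-endian)
#   p.pack_string(b'hi')    # b'\x00\x00\x00\x02hi\x00\x00'  (length, data, pad to 4)
#   p.get_buffer()          # b'\x00\x00\x00\x07\x00\x00\x00\x02hi\x00\x00'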
class Unpacker:
"""Unpacks various data representations from the given buffer."""
def __init__(self, data):
self.reset(data)
def reset(self, data):
self.__buf = data
self.__pos = 0
def get_position(self):
return self.__pos
def set_position(self, position):
self.__pos = position
def get_buffer(self):
return self.__buf
def done(self):
if self.__pos < len(self.__buf):
raise Error('unextracted data remains')
def unpack_uint(self):
i = self.__pos
self.__pos = j = i+4
data = self.__buf[i:j]
if len(data) < 4:
raise EOFError
return struct.unpack('>L', data)[0]
def unpack_int(self):
i = self.__pos
self.__pos = j = i+4
data = self.__buf[i:j]
if len(data) < 4:
raise EOFError
return struct.unpack('>l', data)[0]
unpack_enum = unpack_int
def unpack_bool(self):
return bool(self.unpack_int())
def unpack_uhyper(self):
hi = self.unpack_uint()
lo = self.unpack_uint()
return int(hi)<<32 | lo
def unpack_hyper(self):
x = self.unpack_uhyper()
if x >= 0x8000000000000000:
x = x - 0x10000000000000000
return x
def unpack_float(self):
i = self.__pos
self.__pos = j = i+4
data = self.__buf[i:j]
if len(data) < 4:
raise EOFError
return struct.unpack('>f', data)[0]
def unpack_double(self):
i = self.__pos
self.__pos = j = i+8
data = self.__buf[i:j]
if len(data) < 8:
raise EOFError
return struct.unpack('>d', data)[0]
def unpack_fstring(self, n):
if n < 0:
raise ValueError('fstring size must be nonnegative')
i = self.__pos
j = i + (n+3)//4*4
if j > len(self.__buf):
raise EOFError
self.__pos = j
return self.__buf[i:i+n]
unpack_fopaque = unpack_fstring
def unpack_string(self):
n = self.unpack_uint()
return self.unpack_fstring(n)
unpack_opaque = unpack_string
unpack_bytes = unpack_string
def unpack_list(self, unpack_item):
list = []
while 1:
x = self.unpack_uint()
if x == 0: break
if x != 1:
raise ConversionError('0 or 1 expected, got %r' % (x,))
item = unpack_item()
list.append(item)
return list
def unpack_farray(self, n, unpack_item):
list = []
for i in range(n):
list.append(unpack_item())
return list
def unpack_array(self, unpack_item):
n = self.unpack_uint()
return self.unpack_farray(n, unpack_item)
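# --- Editor's sketch (not part of the original module): round-tripping
# values through the Packer/Unpacker pair defined above. The helper name
# is hypothetical.
def _xdr_demo():
    p = Packer()
    p.pack_uint(7)
    p.pack_string(b'hi')
    u = Unpacker(p.get_buffer())
    assert u.unpack_uint() == 7
    assert u.unpack_string() == b'hi'
    u.done()   # raises Error if unextracted data remains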
"""
Read and write ZIP files.
XXX references to utf-8 need further investigation.
"""
import io
import os
import re
import importlib.util
import sys
import time
import stat
import shutil
import struct
import binascii
try:
import zlib # We may need its compression method
crc32 = zlib.crc32
except ImportError:
zlib = None
crc32 = binascii.crc32
try:
import bz2 # We may need its compression method
except ImportError:
bz2 = None
try:
import lzma # We may need its compression method
except ImportError:
lzma = None
__all__ = ["BadZipFile", "BadZipfile", "error",
"ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2", "ZIP_LZMA",
"is_zipfile", "ZipInfo", "ZipFile", "PyZipFile", "LargeZipFile"]
class BadZipFile(Exception):
pass
class LargeZipFile(Exception):
"""
    Raised when a zipfile being written requires ZIP64 extensions
    but those extensions are disabled.
"""
error = BadZipfile = BadZipFile # Pre-3.2 compatibility names
ZIP64_LIMIT = (1 << 31) - 1
ZIP_FILECOUNT_LIMIT = (1 << 16) - 1
ZIP_MAX_COMMENT = (1 << 16) - 1
# constants for Zip file compression methods
ZIP_STORED = 0
ZIP_DEFLATED = 8
ZIP_BZIP2 = 12
ZIP_LZMA = 14
# Other ZIP compression methods not supported
DEFAULT_VERSION = 20
ZIP64_VERSION = 45
BZIP2_VERSION = 46
LZMA_VERSION = 63
# we recognize (but not necessarily support) all features up to that version
MAX_EXTRACT_VERSION = 63
# Below are some formats and associated data for reading/writing headers using
# the struct module. The names and structures of headers/records are those used
# in the PKWARE description of the ZIP file format:
# http://www.pkware.com/documents/casestudies/APPNOTE.TXT
# (URL valid as of January 2008)
# The "end of central directory" structure, magic number, size, and indices
# (section V.I in the format document)
structEndArchive = b"<4s4H2LH"
stringEndArchive = b"PK\005\006"
sizeEndCentDir = struct.calcsize(structEndArchive)
_ECD_SIGNATURE = 0
_ECD_DISK_NUMBER = 1
_ECD_DISK_START = 2
_ECD_ENTRIES_THIS_DISK = 3
_ECD_ENTRIES_TOTAL = 4
_ECD_SIZE = 5
_ECD_OFFSET = 6
_ECD_COMMENT_SIZE = 7
# These last two indices are not part of the structure as defined in the
# spec, but they are used internally by this module as a convenience
_ECD_COMMENT = 8
_ECD_LOCATION = 9
# The "central directory" structure, magic number, size, and indices
# of entries in the structure (section V.F in the format document)
structCentralDir = "<4s4B4HL2L5H2L"
stringCentralDir = b"PK\001\002"
sizeCentralDir = struct.calcsize(structCentralDir)
# indexes of entries in the central directory structure
_CD_SIGNATURE = 0
_CD_CREATE_VERSION = 1
_CD_CREATE_SYSTEM = 2
_CD_EXTRACT_VERSION = 3
_CD_EXTRACT_SYSTEM = 4
_CD_FLAG_BITS = 5
_CD_COMPRESS_TYPE = 6
_CD_TIME = 7
_CD_DATE = 8
_CD_CRC = 9
_CD_COMPRESSED_SIZE = 10
_CD_UNCOMPRESSED_SIZE = 11
_CD_FILENAME_LENGTH = 12
_CD_EXTRA_FIELD_LENGTH = 13
_CD_COMMENT_LENGTH = 14
_CD_DISK_NUMBER_START = 15
_CD_INTERNAL_FILE_ATTRIBUTES = 16
_CD_EXTERNAL_FILE_ATTRIBUTES = 17
_CD_LOCAL_HEADER_OFFSET = 18
# The "local file header" structure, magic number, size, and indices
# (section V.A in the format document)
structFileHeader = "<4s2B4HL2L2H"
stringFileHeader = b"PK\003\004"
sizeFileHeader = struct.calcsize(structFileHeader)
_FH_SIGNATURE = 0
_FH_EXTRACT_VERSION = 1
_FH_EXTRACT_SYSTEM = 2
_FH_GENERAL_PURPOSE_FLAG_BITS = 3
_FH_COMPRESSION_METHOD = 4
_FH_LAST_MOD_TIME = 5
_FH_LAST_MOD_DATE = 6
_FH_CRC = 7
_FH_COMPRESSED_SIZE = 8
_FH_UNCOMPRESSED_SIZE = 9
_FH_FILENAME_LENGTH = 10
_FH_EXTRA_FIELD_LENGTH = 11
# The "Zip64 end of central directory locator" structure, magic number, and size
structEndArchive64Locator = "<4sLQL"
stringEndArchive64Locator = b"PK\x06\x07"
sizeEndCentDir64Locator = struct.calcsize(structEndArchive64Locator)
# The "Zip64 end of central directory" record, magic number, size, and indices
# (section V.G in the format document)
structEndArchive64 = "<4sQ2H2L4Q"
stringEndArchive64 = b"PK\x06\x06"
sizeEndCentDir64 = struct.calcsize(structEndArchive64)
_CD64_SIGNATURE = 0
_CD64_DIRECTORY_RECSIZE = 1
_CD64_CREATE_VERSION = 2
_CD64_EXTRACT_VERSION = 3
_CD64_DISK_NUMBER = 4
_CD64_DISK_NUMBER_START = 5
_CD64_NUMBER_ENTRIES_THIS_DISK = 6
_CD64_NUMBER_ENTRIES_TOTAL = 7
_CD64_DIRECTORY_SIZE = 8
_CD64_OFFSET_START_CENTDIR = 9
def _check_zipfile(fp):
try:
if _EndRecData(fp):
return True # file has correct magic number
except OSError:
pass
return False
def is_zipfile(filename):
"""Quickly see if a file is a ZIP file by checking the magic number.
The filename argument may be a file or file-like object too.
"""
result = False
try:
if hasattr(filename, "read"):
result = _check_zipfile(fp=filename)
else:
with open(filename, "rb") as fp:
result = _check_zipfile(fp)
except OSError:
pass
return result
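# Editor's note (not in the original module): is_zipfile accepts a path or
# a seekable binary file object, e.g. is_zipfile("archive.zip") or
# is_zipfile(open("archive.zip", "rb")); both only check for the
# end-of-central-directory magic number.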
def _EndRecData64(fpin, offset, endrec):
"""
Read the ZIP64 end-of-archive records and use that to update endrec
"""
try:
fpin.seek(offset - sizeEndCentDir64Locator, 2)
except OSError:
# If the seek fails, the file is not large enough to contain a ZIP64
# end-of-archive record, so just return the end record we were given.
return endrec
data = fpin.read(sizeEndCentDir64Locator)
if len(data) != sizeEndCentDir64Locator:
return endrec
sig, diskno, reloff, disks = struct.unpack(structEndArchive64Locator, data)
if sig != stringEndArchive64Locator:
return endrec
if diskno != 0 or disks != 1:
raise BadZipFile("zipfiles that span multiple disks are not supported")
# Assume no 'zip64 extensible data'
fpin.seek(offset - sizeEndCentDir64Locator - sizeEndCentDir64, 2)
data = fpin.read(sizeEndCentDir64)
if len(data) != sizeEndCentDir64:
return endrec
sig, sz, create_version, read_version, disk_num, disk_dir, \
dircount, dircount2, dirsize, diroffset = \
struct.unpack(structEndArchive64, data)
if sig != stringEndArchive64:
return endrec
# Update the original endrec using data from the ZIP64 record
endrec[_ECD_SIGNATURE] = sig
endrec[_ECD_DISK_NUMBER] = disk_num
endrec[_ECD_DISK_START] = disk_dir
endrec[_ECD_ENTRIES_THIS_DISK] = dircount
endrec[_ECD_ENTRIES_TOTAL] = dircount2
endrec[_ECD_SIZE] = dirsize
endrec[_ECD_OFFSET] = diroffset
return endrec
def _EndRecData(fpin):
"""Return data from the "End of Central Directory" record, or None.
The data is a list of the nine items in the ZIP "End of central dir"
record followed by a tenth item, the file seek offset of this record."""
# Determine file size
fpin.seek(0, 2)
filesize = fpin.tell()
    # Check to see if this is a ZIP file with no archive comment (the
# "end of central directory" structure should be the last item in the
# file if this is the case).
try:
fpin.seek(-sizeEndCentDir, 2)
except OSError:
return None
data = fpin.read()
if (len(data) == sizeEndCentDir and
data[0:4] == stringEndArchive and
data[-2:] == b"\000\000"):
# the signature is correct and there's no comment, unpack structure
endrec = struct.unpack(structEndArchive, data)
endrec=list(endrec)
# Append a blank comment and record start offset
endrec.append(b"")
endrec.append(filesize - sizeEndCentDir)
# Try to read the "Zip64 end of central directory" structure
return _EndRecData64(fpin, -sizeEndCentDir, endrec)
# Either this is not a ZIP file, or it is a ZIP file with an archive
# comment. Search the end of the file for the "end of central directory"
# record signature. The comment is the last item in the ZIP file and may be
# up to 64K long. It is assumed that the "end of central directory" magic
# number does not appear in the comment.
maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
fpin.seek(maxCommentStart, 0)
data = fpin.read()
start = data.rfind(stringEndArchive)
if start >= 0:
# found the magic number; attempt to unpack and interpret
recData = data[start:start+sizeEndCentDir]
if len(recData) != sizeEndCentDir:
# Zip file is corrupted.
return None
endrec = list(struct.unpack(structEndArchive, recData))
commentSize = endrec[_ECD_COMMENT_SIZE] #as claimed by the zip file
comment = data[start+sizeEndCentDir:start+sizeEndCentDir+commentSize]
endrec.append(comment)
endrec.append(maxCommentStart + start)
# Try to read the "Zip64 end of central directory" structure
return _EndRecData64(fpin, maxCommentStart + start - filesize,
endrec)
# Unable to find a valid end of central directory structure
return None
class ZipInfo(object):
"""Class with attributes describing each file in the ZIP archive."""
__slots__ = (
'orig_filename',
'filename',
'date_time',
'compress_type',
'comment',
'extra',
'create_system',
'create_version',
'extract_version',
'reserved',
'flag_bits',
'volume',
'internal_attr',
'external_attr',
'header_offset',
'CRC',
'compress_size',
'file_size',
'_raw_time',
)
def __init__(self, filename="NoName", date_time=(1980,1,1,0,0,0)):
self.orig_filename = filename # Original file name in archive
# Terminate the file name at the first null byte. Null bytes in file
# names are used as tricks by viruses in archives.
null_byte = filename.find(chr(0))
if null_byte >= 0:
filename = filename[0:null_byte]
# This is used to ensure paths in generated ZIP files always use
# forward slashes as the directory separator, as required by the
# ZIP format specification.
if os.sep != "/" and os.sep in filename:
filename = filename.replace(os.sep, "/")
self.filename = filename # Normalized file name
self.date_time = date_time # year, month, day, hour, min, sec
if date_time[0] < 1980:
raise ValueError('ZIP does not support timestamps before 1980')
# Standard values:
self.compress_type = ZIP_STORED # Type of compression for the file
self.comment = b"" # Comment for each file
self.extra = b"" # ZIP extra data
if sys.platform == 'win32':
self.create_system = 0 # System which created ZIP archive
else:
# Assume everything else is unix-y
self.create_system = 3 # System which created ZIP archive
self.create_version = DEFAULT_VERSION # Version which created ZIP archive
self.extract_version = DEFAULT_VERSION # Version needed to extract archive
self.reserved = 0 # Must be zero
self.flag_bits = 0 # ZIP flag bits
self.volume = 0 # Volume number of file header
self.internal_attr = 0 # Internal attributes
self.external_attr = 0 # External file attributes
# Other attributes are set by class ZipFile:
# header_offset Byte offset to the file header
# CRC CRC-32 of the uncompressed file
# compress_size Size of the compressed file
# file_size Size of the uncompressed file
def FileHeader(self, zip64=None):
"""Return the per-file header as a string."""
dt = self.date_time
dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
if self.flag_bits & 0x08:
# Set these to zero because we write them after the file data
CRC = compress_size = file_size = 0
else:
CRC = self.CRC
compress_size = self.compress_size
file_size = self.file_size
extra = self.extra
min_version = 0
if zip64 is None:
zip64 = file_size > ZIP64_LIMIT or compress_size > ZIP64_LIMIT
if zip64:
fmt = '<HHQQ'
extra = extra + struct.pack(fmt,
1, struct.calcsize(fmt)-4, file_size, compress_size)
if file_size > ZIP64_LIMIT or compress_size > ZIP64_LIMIT:
if not zip64:
raise LargeZipFile("Filesize would require ZIP64 extensions")
# File is larger than what fits into a 4 byte integer,
# fall back to the ZIP64 extension
file_size = 0xffffffff
compress_size = 0xffffffff
min_version = ZIP64_VERSION
if self.compress_type == ZIP_BZIP2:
min_version = max(BZIP2_VERSION, min_version)
elif self.compress_type == ZIP_LZMA:
min_version = max(LZMA_VERSION, min_version)
self.extract_version = max(min_version, self.extract_version)
self.create_version = max(min_version, self.create_version)
filename, flag_bits = self._encodeFilenameFlags()
header = struct.pack(structFileHeader, stringFileHeader,
self.extract_version, self.reserved, flag_bits,
self.compress_type, dostime, dosdate, CRC,
compress_size, file_size,
len(filename), len(extra))
return header + filename + extra
def _encodeFilenameFlags(self):
try:
return self.filename.encode('ascii'), self.flag_bits
except UnicodeEncodeError:
return self.filename.encode('utf-8'), self.flag_bits | 0x800
def _decodeExtra(self):
# Try to decode the extra field.
extra = self.extra
unpack = struct.unpack
while len(extra) >= 4:
tp, ln = unpack('<HH', extra[:4])
if tp == 1:
if ln >= 24:
counts = unpack('<QQQ', extra[4:28])
elif ln == 16:
counts = unpack('<QQ', extra[4:20])
elif ln == 8:
counts = unpack('<Q', extra[4:12])
elif ln == 0:
counts = ()
else:
raise RuntimeError("Corrupt extra field %s"%(ln,))
idx = 0
# ZIP64 extension (large files and/or large archives)
if self.file_size in (0xffffffffffffffff, 0xffffffff):
self.file_size = counts[idx]
idx += 1
if self.compress_size == 0xFFFFFFFF:
self.compress_size = counts[idx]
idx += 1
if self.header_offset == 0xffffffff:
old = self.header_offset
self.header_offset = counts[idx]
idx+=1
extra = extra[ln+4:]
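    # Layout of the ZIP64 extra block parsed above (all little-endian):
    # 2-byte header id (1 == ZIP64), 2-byte data length, then up to three
    # 8-byte values: uncompressed size, compressed size, local header offset.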
class _ZipDecrypter:
"""Class to handle decryption of files stored within a ZIP archive.
ZIP supports a password-based form of encryption. Even though known
plaintext attacks have been found against it, it is still useful
to be able to get data out of such a file.
Usage:
zd = _ZipDecrypter(mypwd)
plain_char = zd(cypher_char)
plain_text = map(zd, cypher_text)
"""
def _GenerateCRCTable():
"""Generate a CRC-32 table.
ZIP encryption uses the CRC32 one-byte primitive for scrambling some
internal keys. We noticed that a direct implementation is faster than
relying on binascii.crc32().
"""
poly = 0xedb88320
table = [0] * 256
for i in range(256):
crc = i
for j in range(8):
if crc & 1:
crc = ((crc >> 1) & 0x7FFFFFFF) ^ poly
else:
crc = ((crc >> 1) & 0x7FFFFFFF)
table[i] = crc
return table
crctable = None
def _crc32(self, ch, crc):
"""Compute the CRC32 primitive on one byte."""
return ((crc >> 8) & 0xffffff) ^ self.crctable[(crc ^ ch) & 0xff]
def __init__(self, pwd):
if _ZipDecrypter.crctable is None:
_ZipDecrypter.crctable = _ZipDecrypter._GenerateCRCTable()
self.key0 = 305419896
self.key1 = 591751049
self.key2 = 878082192
for p in pwd:
self._UpdateKeys(p)
def _UpdateKeys(self, c):
self.key0 = self._crc32(c, self.key0)
self.key1 = (self.key1 + (self.key0 & 255)) & 4294967295
self.key1 = (self.key1 * 134775813 + 1) & 4294967295
self.key2 = self._crc32((self.key1 >> 24) & 255, self.key2)
def __call__(self, c):
"""Decrypt a single character."""
assert isinstance(c, int)
k = self.key2 | 2
c = c ^ (((k * (k^1)) >> 8) & 255)
self._UpdateKeys(c)
return c
class LZMACompressor:
def __init__(self):
self._comp = None
def _init(self):
props = lzma._encode_filter_properties({'id': lzma.FILTER_LZMA1})
self._comp = lzma.LZMACompressor(lzma.FORMAT_RAW, filters=[
lzma._decode_filter_properties(lzma.FILTER_LZMA1, props)
])
        # ZIP LZMA header: 2-byte version (9, 4), 2-byte property size, then the properties
        return struct.pack('<BBH', 9, 4, len(props)) + props
def compress(self, data):
if self._comp is None:
return self._init() + self._comp.compress(data)
return self._comp.compress(data)
def flush(self):
if self._comp is None:
return self._init() + self._comp.flush()
return self._comp.flush()
class LZMADecompressor:
def __init__(self):
self._decomp = None
self._unconsumed = b''
self.eof = False
def decompress(self, data):
if self._decomp is None:
self._unconsumed += data
if len(self._unconsumed) <= 4:
return b''
psize, = struct.unpack('<H', self._unconsumed[2:4])
if len(self._unconsumed) <= 4 + psize:
return b''
self._decomp = lzma.LZMADecompressor(lzma.FORMAT_RAW, filters=[
lzma._decode_filter_properties(lzma.FILTER_LZMA1,
self._unconsumed[4:4 + psize])
])
data = self._unconsumed[4 + psize:]
del self._unconsumed
result = self._decomp.decompress(data)
self.eof = self._decomp.eof
return result
compressor_names = {
0: 'store',
1: 'shrink',
2: 'reduce',
3: 'reduce',
4: 'reduce',
5: 'reduce',
6: 'implode',
7: 'tokenize',
8: 'deflate',
9: 'deflate64',
10: 'implode',
12: 'bzip2',
14: 'lzma',
18: 'terse',
19: 'lz77',
97: 'wavpack',
98: 'ppmd',
}
def _check_compression(compression):
if compression == ZIP_STORED:
pass
elif compression == ZIP_DEFLATED:
if not zlib:
raise RuntimeError(
"Compression requires the (missing) zlib module")
elif compression == ZIP_BZIP2:
if not bz2:
raise RuntimeError(
"Compression requires the (missing) bz2 module")
elif compression == ZIP_LZMA:
if not lzma:
raise RuntimeError(
"Compression requires the (missing) lzma module")
else:
raise RuntimeError("That compression method is not supported")
def _get_compressor(compress_type):
if compress_type == ZIP_DEFLATED:
return zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
zlib.DEFLATED, -15)
elif compress_type == ZIP_BZIP2:
return bz2.BZ2Compressor()
elif compress_type == ZIP_LZMA:
return LZMACompressor()
else:
return None
def _get_decompressor(compress_type):
if compress_type == ZIP_STORED:
return None
elif compress_type == ZIP_DEFLATED:
return zlib.decompressobj(-15)
elif compress_type == ZIP_BZIP2:
return bz2.BZ2Decompressor()
elif compress_type == ZIP_LZMA:
return LZMADecompressor()
else:
descr = compressor_names.get(compress_type)
if descr:
raise NotImplementedError("compression type %d (%s)" % (compress_type, descr))
else:
raise NotImplementedError("compression type %d" % (compress_type,))
class ZipExtFile(io.BufferedIOBase):
"""File-like object for reading an archive member.
Is returned by ZipFile.open().
"""
# Max size supported by decompressor.
    MAX_N = (1 << 31) - 1  # '-' binds tighter than '<<', so the parentheses matter
# Read from compressed files in 4k blocks.
MIN_READ_SIZE = 4096
# Search for universal newlines or line chunks.
PATTERN = re.compile(br'^(?P<chunk>[^\r\n]+)|(?P<newline>\n|\r\n?)')
def __init__(self, fileobj, mode, zipinfo, decrypter=None,
close_fileobj=False):
self._fileobj = fileobj
self._decrypter = decrypter
self._close_fileobj = close_fileobj
self._compress_type = zipinfo.compress_type
self._compress_left = zipinfo.compress_size
self._left = zipinfo.file_size
self._decompressor = _get_decompressor(self._compress_type)
self._eof = False
self._readbuffer = b''
self._offset = 0
self._universal = 'U' in mode
self.newlines = None
# Adjust read size for encrypted files since the first 12 bytes
# are for the encryption/password information.
if self._decrypter is not None:
self._compress_left -= 12
self.mode = mode
self.name = zipinfo.filename
if hasattr(zipinfo, 'CRC'):
self._expected_crc = zipinfo.CRC
self._running_crc = crc32(b'') & 0xffffffff
else:
self._expected_crc = None
def readline(self, limit=-1):
"""Read and return a line from the stream.
If limit is specified, at most limit bytes will be read.
"""
if not self._universal and limit < 0:
# Shortcut common case - newline found in buffer.
i = self._readbuffer.find(b'\n', self._offset) + 1
if i > 0:
line = self._readbuffer[self._offset: i]
self._offset = i
return line
if not self._universal:
return io.BufferedIOBase.readline(self, limit)
line = b''
while limit < 0 or len(line) < limit:
readahead = self.peek(2)
if readahead == b'':
return line
#
# Search for universal newlines or line chunks.
#
# The pattern returns either a line chunk or a newline, but not
# both. Combined with peek(2), we are assured that the sequence
# '\r\n' is always retrieved completely and never split into
# separate newlines - '\r', '\n' due to coincidental readaheads.
#
match = self.PATTERN.search(readahead)
newline = match.group('newline')
if newline is not None:
if self.newlines is None:
self.newlines = []
if newline not in self.newlines:
self.newlines.append(newline)
self._offset += len(newline)
return line + b'\n'
chunk = match.group('chunk')
if limit >= 0:
chunk = chunk[: limit - len(line)]
self._offset += len(chunk)
line += chunk
return line
def peek(self, n=1):
"""Returns buffered bytes without advancing the position."""
if n > len(self._readbuffer) - self._offset:
chunk = self.read(n)
if len(chunk) > self._offset:
self._readbuffer = chunk + self._readbuffer[self._offset:]
self._offset = 0
else:
self._offset -= len(chunk)
# Return up to 512 bytes to reduce allocation overhead for tight loops.
return self._readbuffer[self._offset: self._offset + 512]
def readable(self):
return True
def read(self, n=-1):
"""Read and return up to n bytes.
        If the argument is omitted, None, or negative, data is read and returned until EOF is reached.
"""
if n is None or n < 0:
buf = self._readbuffer[self._offset:]
self._readbuffer = b''
self._offset = 0
while not self._eof:
buf += self._read1(self.MAX_N)
return buf
end = n + self._offset
if end < len(self._readbuffer):
buf = self._readbuffer[self._offset:end]
self._offset = end
return buf
n = end - len(self._readbuffer)
buf = self._readbuffer[self._offset:]
self._readbuffer = b''
self._offset = 0
while n > 0 and not self._eof:
data = self._read1(n)
if n < len(data):
self._readbuffer = data
self._offset = n
buf += data[:n]
break
buf += data
n -= len(data)
return buf
def _update_crc(self, newdata):
# Update the CRC using the given data.
if self._expected_crc is None:
# No need to compute the CRC if we don't have a reference value
return
self._running_crc = crc32(newdata, self._running_crc) & 0xffffffff
# Check the CRC if we're at the end of the file
if self._eof and self._running_crc != self._expected_crc:
raise BadZipFile("Bad CRC-32 for file %r" % self.name)
def read1(self, n):
"""Read up to n bytes with at most one read() system call."""
if n is None or n < 0:
buf = self._readbuffer[self._offset:]
self._readbuffer = b''
self._offset = 0
while not self._eof:
data = self._read1(self.MAX_N)
if data:
buf += data
break
return buf
end = n + self._offset
if end < len(self._readbuffer):
buf = self._readbuffer[self._offset:end]
self._offset = end
return buf
n = end - len(self._readbuffer)
buf = self._readbuffer[self._offset:]
self._readbuffer = b''
self._offset = 0
if n > 0:
while not self._eof:
data = self._read1(n)
if n < len(data):
self._readbuffer = data
self._offset = n
buf += data[:n]
break
if data:
buf += data
break
return buf
def _read1(self, n):
# Read up to n compressed bytes with at most one read() system call,
# decrypt and decompress them.
if self._eof or n <= 0:
return b''
# Read from file.
if self._compress_type == ZIP_DEFLATED:
            # Handle unconsumed data.
data = self._decompressor.unconsumed_tail
if n > len(data):
data += self._read2(n - len(data))
else:
data = self._read2(n)
if self._compress_type == ZIP_STORED:
self._eof = self._compress_left <= 0
elif self._compress_type == ZIP_DEFLATED:
n = max(n, self.MIN_READ_SIZE)
data = self._decompressor.decompress(data, n)
self._eof = (self._decompressor.eof or
self._compress_left <= 0 and
not self._decompressor.unconsumed_tail)
if self._eof:
data += self._decompressor.flush()
else:
data = self._decompressor.decompress(data)
self._eof = self._decompressor.eof or self._compress_left <= 0
data = data[:self._left]
self._left -= len(data)
if self._left <= 0:
self._eof = True
self._update_crc(data)
return data
def _read2(self, n):
if self._compress_left <= 0:
return b''
n = max(n, self.MIN_READ_SIZE)
n = min(n, self._compress_left)
data = self._fileobj.read(n)
self._compress_left -= len(data)
if not data:
raise EOFError
if self._decrypter is not None:
data = bytes(map(self._decrypter, data))
return data
def close(self):
try:
if self._close_fileobj:
self._fileobj.close()
finally:
super().close()
class ZipFile:
""" Class with methods to open, read, write, close, list zip files.
z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=True)
file: Either the path to the file, or a file-like object.
If it is a path, the file will be opened and closed by ZipFile.
mode: The mode can be either read "r", write "w" or append "a".
compression: ZIP_STORED (no compression), ZIP_DEFLATED (requires zlib),
ZIP_BZIP2 (requires bz2) or ZIP_LZMA (requires lzma).
allowZip64: if True ZipFile will create files with ZIP64 extensions when
needed, otherwise it will raise an exception when this would
be necessary.
"""
fp = None # Set here since __del__ checks it
_windows_illegal_name_trans_table = None
def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=True):
"""Open the ZIP file with mode read "r", write "w" or append "a"."""
if mode not in ("r", "w", "a"):
raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
_check_compression(compression)
self._allowZip64 = allowZip64
self._didModify = False
self.debug = 0 # Level of printing: 0 through 3
self.NameToInfo = {} # Find file info given name
self.filelist = [] # List of ZipInfo instances for archive
self.compression = compression # Method of compression
self.mode = key = mode.replace('b', '')[0]
self.pwd = None
self._comment = b''
# Check if we were passed a file-like object
if isinstance(file, str):
# No, it's a filename
self._filePassed = 0
self.filename = file
modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
try:
self.fp = io.open(file, modeDict[mode])
except OSError:
if mode == 'a':
mode = key = 'w'
self.fp = io.open(file, modeDict[mode])
else:
raise
else:
self._filePassed = 1
self.fp = file
self.filename = getattr(file, 'name', None)
try:
if key == 'r':
self._RealGetContents()
elif key == 'w':
# set the modified flag so central directory gets written
# even if no files are added to the archive
self._didModify = True
elif key == 'a':
try:
# See if file is a zip file
self._RealGetContents()
# seek to start of directory and overwrite
self.fp.seek(self.start_dir, 0)
except BadZipFile:
# file is not a zip file, just append
self.fp.seek(0, 2)
# set the modified flag so central directory gets written
# even if no files are added to the archive
self._didModify = True
else:
raise RuntimeError('Mode must be "r", "w" or "a"')
except:
fp = self.fp
self.fp = None
if not self._filePassed:
fp.close()
raise
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
self.close()
def _RealGetContents(self):
"""Read in the table of contents for the ZIP file."""
fp = self.fp
try:
endrec = _EndRecData(fp)
except OSError:
raise BadZipFile("File is not a zip file")
if not endrec:
raise BadZipFile("File is not a zip file")
if self.debug > 1:
print(endrec)
size_cd = endrec[_ECD_SIZE] # bytes in central directory
offset_cd = endrec[_ECD_OFFSET] # offset of central directory
self._comment = endrec[_ECD_COMMENT] # archive comment
# "concat" is zero, unless zip was concatenated to another file
concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
if endrec[_ECD_SIGNATURE] == stringEndArchive64:
# If Zip64 extension structures are present, account for them
concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)
if self.debug > 2:
inferred = concat + offset_cd
print("given, inferred, offset", offset_cd, inferred, concat)
# self.start_dir: Position of start of central directory
self.start_dir = offset_cd + concat
fp.seek(self.start_dir, 0)
data = fp.read(size_cd)
fp = io.BytesIO(data)
total = 0
while total < size_cd:
centdir = fp.read(sizeCentralDir)
if len(centdir) != sizeCentralDir:
raise BadZipFile("Truncated central directory")
centdir = struct.unpack(structCentralDir, centdir)
if centdir[_CD_SIGNATURE] != stringCentralDir:
raise BadZipFile("Bad magic number for central directory")
if self.debug > 2:
print(centdir)
filename = fp.read(centdir[_CD_FILENAME_LENGTH])
            flags = centdir[_CD_FLAG_BITS]
if flags & 0x800:
# UTF-8 file names extension
filename = filename.decode('utf-8')
else:
# Historical ZIP filename encoding
filename = filename.decode('cp437')
# Create ZipInfo instance to store file information
x = ZipInfo(filename)
x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
(x.create_version, x.create_system, x.extract_version, x.reserved,
x.flag_bits, x.compress_type, t, d,
x.CRC, x.compress_size, x.file_size) = centdir[1:12]
if x.extract_version > MAX_EXTRACT_VERSION:
raise NotImplementedError("zip file version %.1f" %
(x.extract_version / 10))
x.volume, x.internal_attr, x.external_attr = centdir[15:18]
# Convert date/time code to (year, month, day, hour, min, sec)
x._raw_time = t
x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
x._decodeExtra()
x.header_offset = x.header_offset + concat
self.filelist.append(x)
self.NameToInfo[x.filename] = x
# update total bytes read from central directory
total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
+ centdir[_CD_EXTRA_FIELD_LENGTH]
+ centdir[_CD_COMMENT_LENGTH])
if self.debug > 2:
print("total", total)
def namelist(self):
"""Return a list of file names in the archive."""
return [data.filename for data in self.filelist]
def infolist(self):
"""Return a list of class ZipInfo instances for files in the
archive."""
return self.filelist
def printdir(self, file=None):
"""Print a table of contents for the zip file."""
print("%-46s %19s %12s" % ("File Name", "Modified ", "Size"),
file=file)
for zinfo in self.filelist:
date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
print("%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size),
file=file)
def testzip(self):
"""Read all the files and check the CRC."""
chunk_size = 2 ** 20
for zinfo in self.filelist:
try:
# Read by chunks, to avoid an OverflowError or a
# MemoryError with very large embedded files.
with self.open(zinfo.filename, "r") as f:
while f.read(chunk_size): # Check CRC-32
pass
except BadZipFile:
return zinfo.filename
def getinfo(self, name):
"""Return the instance of ZipInfo given 'name'."""
info = self.NameToInfo.get(name)
if info is None:
raise KeyError(
'There is no item named %r in the archive' % name)
return info
def setpassword(self, pwd):
"""Set default password for encrypted files."""
if pwd and not isinstance(pwd, bytes):
raise TypeError("pwd: expected bytes, got %s" % type(pwd))
if pwd:
self.pwd = pwd
else:
self.pwd = None
@property
def comment(self):
"""The comment text associated with the ZIP file."""
return self._comment
@comment.setter
def comment(self, comment):
if not isinstance(comment, bytes):
raise TypeError("comment: expected bytes, got %s" % type(comment))
# check for valid comment length
if len(comment) > ZIP_MAX_COMMENT:
import warnings
warnings.warn('Archive comment is too long; truncating to %d bytes'
% ZIP_MAX_COMMENT, stacklevel=2)
comment = comment[:ZIP_MAX_COMMENT]
self._comment = comment
self._didModify = True
def read(self, name, pwd=None):
"""Return file bytes (as a string) for name."""
with self.open(name, "r", pwd) as fp:
return fp.read()
def open(self, name, mode="r", pwd=None):
"""Return file-like object for 'name'."""
if mode not in ("r", "U", "rU"):
raise RuntimeError('open() requires mode "r", "U", or "rU"')
if 'U' in mode:
import warnings
warnings.warn("'U' mode is deprecated",
DeprecationWarning, 2)
if pwd and not isinstance(pwd, bytes):
raise TypeError("pwd: expected bytes, got %s" % type(pwd))
if not self.fp:
raise RuntimeError(
"Attempt to read ZIP archive that was already closed")
# Only open a new file for instances where we were not
# given a file object in the constructor
if self._filePassed:
zef_file = self.fp
else:
zef_file = io.open(self.filename, 'rb')
try:
# Make sure we have an info object
if isinstance(name, ZipInfo):
# 'name' is already an info object
zinfo = name
else:
# Get info object for name
zinfo = self.getinfo(name)
zef_file.seek(zinfo.header_offset, 0)
# Skip the file header:
fheader = zef_file.read(sizeFileHeader)
if len(fheader) != sizeFileHeader:
raise BadZipFile("Truncated file header")
fheader = struct.unpack(structFileHeader, fheader)
if fheader[_FH_SIGNATURE] != stringFileHeader:
raise BadZipFile("Bad magic number for file header")
fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
if fheader[_FH_EXTRA_FIELD_LENGTH]:
zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])
if zinfo.flag_bits & 0x20:
# Zip 2.7: compressed patched data
raise NotImplementedError("compressed patched data (flag bit 5)")
if zinfo.flag_bits & 0x40:
# strong encryption
raise NotImplementedError("strong encryption (flag bit 6)")
if zinfo.flag_bits & 0x800:
# UTF-8 filename
fname_str = fname.decode("utf-8")
else:
fname_str = fname.decode("cp437")
if fname_str != zinfo.orig_filename:
raise BadZipFile(
'File name in directory %r and header %r differ.'
% (zinfo.orig_filename, fname))
# check for encrypted flag & handle password
is_encrypted = zinfo.flag_bits & 0x1
zd = None
if is_encrypted:
if not pwd:
pwd = self.pwd
if not pwd:
raise RuntimeError("File %s is encrypted, password "
"required for extraction" % name)
zd = _ZipDecrypter(pwd)
                # The first 12 bytes in the cypher stream are an encryption header
# used to strengthen the algorithm. The first 11 bytes are
# completely random, while the 12th contains the MSB of the CRC,
# or the MSB of the file time depending on the header type
# and is used to check the correctness of the password.
header = zef_file.read(12)
h = list(map(zd, header[0:12]))
if zinfo.flag_bits & 0x8:
# compare against the file type from extended local headers
check_byte = (zinfo._raw_time >> 8) & 0xff
else:
# compare against the CRC otherwise
check_byte = (zinfo.CRC >> 24) & 0xff
if h[11] != check_byte:
raise RuntimeError("Bad password for file", name)
return ZipExtFile(zef_file, mode, zinfo, zd,
close_fileobj=not self._filePassed)
except:
if not self._filePassed:
zef_file.close()
raise
def extract(self, member, path=None, pwd=None):
"""Extract a member from the archive to the current working directory,
using its full name. Its file information is extracted as accurately
as possible. `member' may be a filename or a ZipInfo object. You can
specify a different directory using `path'.
"""
if not isinstance(member, ZipInfo):
member = self.getinfo(member)
if path is None:
path = os.getcwd()
return self._extract_member(member, path, pwd)
def extractall(self, path=None, members=None, pwd=None):
"""Extract all members from the archive to the current working
directory. `path' specifies a different directory to extract to.
`members' is optional and must be a subset of the list returned
by namelist().
"""
if members is None:
members = self.namelist()
for zipinfo in members:
self.extract(zipinfo, path, pwd)
@classmethod
def _sanitize_windows_name(cls, arcname, pathsep):
"""Replace bad characters and remove trailing dots from parts."""
table = cls._windows_illegal_name_trans_table
if not table:
illegal = ':<>|"?*'
table = str.maketrans(illegal, '_' * len(illegal))
cls._windows_illegal_name_trans_table = table
arcname = arcname.translate(table)
# remove trailing dots
arcname = (x.rstrip('.') for x in arcname.split(pathsep))
# rejoin, removing empty parts.
arcname = pathsep.join(x for x in arcname if x)
return arcname
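    # Editor's example (not in the original source):
    #   ZipFile._sanitize_windows_name(r'dir:1\file?.txt.', '\\')
    # returns r'dir_1\file_.txt' -- illegal characters become '_' and
    # trailing dots are stripped from each path component.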
def _extract_member(self, member, targetpath, pwd):
"""Extract the ZipInfo object 'member' to a physical
file on the path targetpath.
"""
        # build the destination pathname, replacing
        # forward slashes with platform-specific separators.
arcname = member.filename.replace('/', os.path.sep)
if os.path.altsep:
arcname = arcname.replace(os.path.altsep, os.path.sep)
# interpret absolute pathname as relative, remove drive letter or
# UNC path, redundant separators, "." and ".." components.
arcname = os.path.splitdrive(arcname)[1]
invalid_path_parts = ('', os.path.curdir, os.path.pardir)
arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
if x not in invalid_path_parts)
if os.path.sep == '\\':
# filter illegal characters on Windows
arcname = self._sanitize_windows_name(arcname, os.path.sep)
targetpath = os.path.join(targetpath, arcname)
targetpath = os.path.normpath(targetpath)
# Create all upper directories if necessary.
upperdirs = os.path.dirname(targetpath)
if upperdirs and not os.path.exists(upperdirs):
os.makedirs(upperdirs)
if member.filename[-1] == '/':
if not os.path.isdir(targetpath):
os.mkdir(targetpath)
return targetpath
with self.open(member, pwd=pwd) as source, \
open(targetpath, "wb") as target:
shutil.copyfileobj(source, target)
return targetpath
def _writecheck(self, zinfo):
"""Check for errors before writing a file to the archive."""
if zinfo.filename in self.NameToInfo:
import warnings
warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
if self.mode not in ("w", "a"):
raise RuntimeError('write() requires mode "w" or "a"')
if not self.fp:
raise RuntimeError(
"Attempt to write ZIP archive that was already closed")
_check_compression(zinfo.compress_type)
if not self._allowZip64:
requires_zip64 = None
if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
requires_zip64 = "Files count"
elif zinfo.file_size > ZIP64_LIMIT:
requires_zip64 = "Filesize"
elif zinfo.header_offset > ZIP64_LIMIT:
requires_zip64 = "Zipfile size"
if requires_zip64:
raise LargeZipFile(requires_zip64 +
" would require ZIP64 extensions")
def write(self, filename, arcname=None, compress_type=None):
"""Put the bytes from filename into the archive under the name
arcname."""
if not self.fp:
raise RuntimeError(
"Attempt to write to ZIP archive that was already closed")
st = os.stat(filename)
isdir = stat.S_ISDIR(st.st_mode)
mtime = time.localtime(st.st_mtime)
date_time = mtime[0:6]
# Create ZipInfo instance to store file information
if arcname is None:
arcname = filename
arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
while arcname[0] in (os.sep, os.altsep):
arcname = arcname[1:]
if isdir:
arcname += '/'
zinfo = ZipInfo(arcname, date_time)
zinfo.external_attr = (st[0] & 0xFFFF) << 16 # Unix attributes
if isdir:
zinfo.compress_type = ZIP_STORED
elif compress_type is None:
zinfo.compress_type = self.compression
else:
zinfo.compress_type = compress_type
zinfo.file_size = st.st_size
zinfo.flag_bits = 0x00
zinfo.header_offset = self.fp.tell() # Start of header bytes
if zinfo.compress_type == ZIP_LZMA:
# Compressed data includes an end-of-stream (EOS) marker
zinfo.flag_bits |= 0x02
self._writecheck(zinfo)
self._didModify = True
if isdir:
zinfo.file_size = 0
zinfo.compress_size = 0
zinfo.CRC = 0
zinfo.external_attr |= 0x10 # MS-DOS directory flag
self.filelist.append(zinfo)
self.NameToInfo[zinfo.filename] = zinfo
self.fp.write(zinfo.FileHeader(False))
return
cmpr = _get_compressor(zinfo.compress_type)
with open(filename, "rb") as fp:
# Must overwrite CRC and sizes with correct data later
zinfo.CRC = CRC = 0
zinfo.compress_size = compress_size = 0
# Compressed size can be larger than uncompressed size
zip64 = self._allowZip64 and \
zinfo.file_size * 1.05 > ZIP64_LIMIT
self.fp.write(zinfo.FileHeader(zip64))
file_size = 0
while 1:
buf = fp.read(1024 * 8)
if not buf:
break
file_size = file_size + len(buf)
CRC = crc32(buf, CRC) & 0xffffffff
if cmpr:
buf = cmpr.compress(buf)
compress_size = compress_size + len(buf)
self.fp.write(buf)
if cmpr:
buf = cmpr.flush()
compress_size = compress_size + len(buf)
self.fp.write(buf)
zinfo.compress_size = compress_size
else:
zinfo.compress_size = file_size
zinfo.CRC = CRC
zinfo.file_size = file_size
if not zip64 and self._allowZip64:
                if file_size > ZIP64_LIMIT:
                    raise RuntimeError('File size unexpectedly exceeded ZIP64 limit')
                if compress_size > ZIP64_LIMIT:
                    raise RuntimeError('Compressed size unexpectedly exceeded ZIP64 limit')
# Seek backwards and write file header (which will now include
# correct CRC and file sizes)
position = self.fp.tell() # Preserve current position in file
self.fp.seek(zinfo.header_offset, 0)
self.fp.write(zinfo.FileHeader(zip64))
self.fp.seek(position, 0)
self.filelist.append(zinfo)
self.NameToInfo[zinfo.filename] = zinfo
def writestr(self, zinfo_or_arcname, data, compress_type=None):
"""Write a file into the archive. The contents is 'data', which
may be either a 'str' or a 'bytes' instance; if it is a 'str',
it is encoded as UTF-8 first.
'zinfo_or_arcname' is either a ZipInfo instance or
the name of the file in the archive."""
if isinstance(data, str):
data = data.encode("utf-8")
if not isinstance(zinfo_or_arcname, ZipInfo):
zinfo = ZipInfo(filename=zinfo_or_arcname,
date_time=time.localtime(time.time())[:6])
zinfo.compress_type = self.compression
if zinfo.filename[-1] == '/':
zinfo.external_attr = 0o40775 << 16 # drwxrwxr-x
zinfo.external_attr |= 0x10 # MS-DOS directory flag
else:
zinfo.external_attr = 0o600 << 16 # ?rw-------
else:
zinfo = zinfo_or_arcname
if not self.fp:
raise RuntimeError(
"Attempt to write to ZIP archive that was already closed")
zinfo.file_size = len(data) # Uncompressed size
zinfo.header_offset = self.fp.tell() # Start of header data
if compress_type is not None:
zinfo.compress_type = compress_type
if zinfo.compress_type == ZIP_LZMA:
# Compressed data includes an end-of-stream (EOS) marker
zinfo.flag_bits |= 0x02
self._writecheck(zinfo)
self._didModify = True
zinfo.CRC = crc32(data) & 0xffffffff # CRC-32 checksum
co = _get_compressor(zinfo.compress_type)
if co:
data = co.compress(data) + co.flush()
zinfo.compress_size = len(data) # Compressed size
else:
zinfo.compress_size = zinfo.file_size
zip64 = zinfo.file_size > ZIP64_LIMIT or \
zinfo.compress_size > ZIP64_LIMIT
if zip64 and not self._allowZip64:
raise LargeZipFile("Filesize would require ZIP64 extensions")
self.fp.write(zinfo.FileHeader(zip64))
self.fp.write(data)
if zinfo.flag_bits & 0x08:
# Write CRC and file sizes after the file data
fmt = '<LQQ' if zip64 else '<LLL'
self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
zinfo.file_size))
self.fp.flush()
self.filelist.append(zinfo)
self.NameToInfo[zinfo.filename] = zinfo
def __del__(self):
"""Call the "close()" method in case the user forgot."""
self.close()
def close(self):
"""Close the file, and for mode "w" and "a" write the ending
records."""
if self.fp is None:
return
try:
if self.mode in ("w", "a") and self._didModify: # write ending records
pos1 = self.fp.tell()
for zinfo in self.filelist: # write central directory
dt = zinfo.date_time
dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
extra = []
if zinfo.file_size > ZIP64_LIMIT \
or zinfo.compress_size > ZIP64_LIMIT:
extra.append(zinfo.file_size)
extra.append(zinfo.compress_size)
file_size = 0xffffffff
compress_size = 0xffffffff
else:
file_size = zinfo.file_size
compress_size = zinfo.compress_size
if zinfo.header_offset > ZIP64_LIMIT:
extra.append(zinfo.header_offset)
header_offset = 0xffffffff
else:
header_offset = zinfo.header_offset
extra_data = zinfo.extra
min_version = 0
if extra:
                        # Append a ZIP64 field to the extra data
extra_data = struct.pack(
'<HH' + 'Q'*len(extra),
1, 8*len(extra), *extra) + extra_data
min_version = ZIP64_VERSION
if zinfo.compress_type == ZIP_BZIP2:
min_version = max(BZIP2_VERSION, min_version)
elif zinfo.compress_type == ZIP_LZMA:
min_version = max(LZMA_VERSION, min_version)
extract_version = max(min_version, zinfo.extract_version)
create_version = max(min_version, zinfo.create_version)
try:
filename, flag_bits = zinfo._encodeFilenameFlags()
centdir = struct.pack(structCentralDir,
stringCentralDir, create_version,
zinfo.create_system, extract_version, zinfo.reserved,
flag_bits, zinfo.compress_type, dostime, dosdate,
zinfo.CRC, compress_size, file_size,
len(filename), len(extra_data), len(zinfo.comment),
0, zinfo.internal_attr, zinfo.external_attr,
header_offset)
except DeprecationWarning:
print((structCentralDir, stringCentralDir, create_version,
zinfo.create_system, extract_version, zinfo.reserved,
zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
zinfo.CRC, compress_size, file_size,
len(zinfo.filename), len(extra_data), len(zinfo.comment),
0, zinfo.internal_attr, zinfo.external_attr,
header_offset), file=sys.stderr)
raise
self.fp.write(centdir)
self.fp.write(filename)
self.fp.write(extra_data)
self.fp.write(zinfo.comment)
pos2 = self.fp.tell()
# Write end-of-zip-archive record
centDirCount = len(self.filelist)
centDirSize = pos2 - pos1
centDirOffset = pos1
requires_zip64 = None
if centDirCount > ZIP_FILECOUNT_LIMIT:
requires_zip64 = "Files count"
elif centDirOffset > ZIP64_LIMIT:
requires_zip64 = "Central directory offset"
elif centDirSize > ZIP64_LIMIT:
requires_zip64 = "Central directory size"
if requires_zip64:
# Need to write the ZIP64 end-of-archive records
if not self._allowZip64:
raise LargeZipFile(requires_zip64 +
" would require ZIP64 extensions")
zip64endrec = struct.pack(
structEndArchive64, stringEndArchive64,
44, 45, 45, 0, 0, centDirCount, centDirCount,
centDirSize, centDirOffset)
self.fp.write(zip64endrec)
zip64locrec = struct.pack(
structEndArchive64Locator,
stringEndArchive64Locator, 0, pos2, 1)
self.fp.write(zip64locrec)
centDirCount = min(centDirCount, 0xFFFF)
centDirSize = min(centDirSize, 0xFFFFFFFF)
centDirOffset = min(centDirOffset, 0xFFFFFFFF)
endrec = struct.pack(structEndArchive, stringEndArchive,
0, 0, centDirCount, centDirCount,
centDirSize, centDirOffset, len(self._comment))
self.fp.write(endrec)
self.fp.write(self._comment)
self.fp.flush()
finally:
fp = self.fp
self.fp = None
if not self._filePassed:
fp.close()
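# Illustrative sketch (not part of the original module): a typical round-trip
# through the ZipFile class above. The archive and member names are
# placeholders; the helper is defined here but never called.
def _example_zipfile_roundtrip():
    with ZipFile("example.zip", "w", compression=ZIP_DEFLATED) as zf:
        # writestr() accepts str (encoded as UTF-8) or bytes.
        zf.writestr("hello.txt", "hello, world")
    with ZipFile("example.zip", "r") as zf:
        assert zf.read("hello.txt") == b"hello, world"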
class PyZipFile(ZipFile):
"""Class to create ZIP archives with Python library files and packages."""
def __init__(self, file, mode="r", compression=ZIP_STORED,
allowZip64=True, optimize=-1):
ZipFile.__init__(self, file, mode=mode, compression=compression,
allowZip64=allowZip64)
self._optimize = optimize
def writepy(self, pathname, basename="", filterfunc=None):
"""Add all files from "pathname" to the ZIP archive.
If pathname is a package directory, search the directory and
all package subdirectories recursively for all *.py and enter
the modules into the archive. If pathname is a plain
directory, listdir *.py and enter all modules. Else, pathname
must be a Python *.py file and the module will be put into the
archive. Added modules are always module.pyo or module.pyc.
This method will compile the module.py into module.pyc if
necessary.
If filterfunc(pathname) is given, it is called with every argument.
When it is False, the file or directory is skipped.
"""
if filterfunc and not filterfunc(pathname):
if self.debug:
label = 'path' if os.path.isdir(pathname) else 'file'
print('%s "%s" skipped by filterfunc' % (label, pathname))
return
dir, name = os.path.split(pathname)
if os.path.isdir(pathname):
initname = os.path.join(pathname, "__init__.py")
if os.path.isfile(initname):
# This is a package directory, add it
if basename:
basename = "%s/%s" % (basename, name)
else:
basename = name
if self.debug:
print("Adding package in", pathname, "as", basename)
fname, arcname = self._get_codename(initname[0:-3], basename)
if self.debug:
print("Adding", arcname)
self.write(fname, arcname)
dirlist = os.listdir(pathname)
dirlist.remove("__init__.py")
# Add all *.py files and package subdirectories
for filename in dirlist:
path = os.path.join(pathname, filename)
root, ext = os.path.splitext(filename)
if os.path.isdir(path):
if os.path.isfile(os.path.join(path, "__init__.py")):
# This is a package directory, add it
self.writepy(path, basename,
filterfunc=filterfunc) # Recursive call
elif ext == ".py":
if filterfunc and not filterfunc(path):
if self.debug:
print('file "%s" skipped by filterfunc' % path)
continue
fname, arcname = self._get_codename(path[0:-3],
basename)
if self.debug:
print("Adding", arcname)
self.write(fname, arcname)
else:
# This is NOT a package directory, add its files at top level
if self.debug:
print("Adding files from directory", pathname)
for filename in os.listdir(pathname):
path = os.path.join(pathname, filename)
root, ext = os.path.splitext(filename)
if ext == ".py":
if filterfunc and not filterfunc(path):
if self.debug:
print('file "%s" skipped by filterfunc' % path)
continue
fname, arcname = self._get_codename(path[0:-3],
basename)
if self.debug:
print("Adding", arcname)
self.write(fname, arcname)
else:
if pathname[-3:] != ".py":
raise RuntimeError(
'Files added with writepy() must end with ".py"')
fname, arcname = self._get_codename(pathname[0:-3], basename)
if self.debug:
print("Adding file", arcname)
self.write(fname, arcname)
def _get_codename(self, pathname, basename):
"""Return (filename, archivename) for the path.
Given a module name path, return the correct file path and
archive name, compiling if necessary. For example, given
/python/lib/string, return (/python/lib/string.pyc, string).
"""
def _compile(file, optimize=-1):
import py_compile
if self.debug:
print("Compiling", file)
try:
py_compile.compile(file, doraise=True, optimize=optimize)
except py_compile.PyCompileError as err:
print(err.msg)
return False
return True
file_py = pathname + ".py"
file_pyc = pathname + ".pyc"
file_pyo = pathname + ".pyo"
pycache_pyc = importlib.util.cache_from_source(file_py, True)
pycache_pyo = importlib.util.cache_from_source(file_py, False)
if self._optimize == -1:
# legacy mode: use whatever file is present
if (os.path.isfile(file_pyo) and
os.stat(file_pyo).st_mtime >= os.stat(file_py).st_mtime):
# Use .pyo file.
arcname = fname = file_pyo
elif (os.path.isfile(file_pyc) and
os.stat(file_pyc).st_mtime >= os.stat(file_py).st_mtime):
# Use .pyc file.
arcname = fname = file_pyc
elif (os.path.isfile(pycache_pyc) and
os.stat(pycache_pyc).st_mtime >= os.stat(file_py).st_mtime):
# Use the __pycache__/*.pyc file, but write it to the legacy pyc
# file name in the archive.
fname = pycache_pyc
arcname = file_pyc
elif (os.path.isfile(pycache_pyo) and
os.stat(pycache_pyo).st_mtime >= os.stat(file_py).st_mtime):
# Use the __pycache__/*.pyo file, but write it to the legacy pyo
# file name in the archive.
fname = pycache_pyo
arcname = file_pyo
else:
# Compile py into PEP 3147 pyc file.
if _compile(file_py):
fname = (pycache_pyc if __debug__ else pycache_pyo)
arcname = (file_pyc if __debug__ else file_pyo)
else:
fname = arcname = file_py
else:
# new mode: use given optimization level
if self._optimize == 0:
fname = pycache_pyc
arcname = file_pyc
else:
fname = pycache_pyo
arcname = file_pyo
if not (os.path.isfile(fname) and
os.stat(fname).st_mtime >= os.stat(file_py).st_mtime):
if not _compile(file_py, optimize=self._optimize):
fname = arcname = file_py
archivename = os.path.split(arcname)[1]
if basename:
archivename = "%s/%s" % (basename, archivename)
return (fname, archivename)
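# Illustrative sketch (not part of the original module): bundling a package
# directory with PyZipFile. "app.zip" and "mypkg" are placeholders; with
# optimize=2 the sources are compiled as described in _get_codename() above.
def _example_writepy():
    with PyZipFile("app.zip", "w", optimize=2) as pzf:
        pzf.writepy("mypkg")   # adds compiled modules under "mypkg/"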
def main(args = None):
import textwrap
    USAGE = textwrap.dedent("""\
Usage:
zipfile.py -l zipfile.zip # Show listing of a zipfile
zipfile.py -t zipfile.zip # Test if a zipfile is valid
zipfile.py -e zipfile.zip target # Extract zipfile into target dir
zipfile.py -c zipfile.zip src ... # Create zipfile from sources
""")
if args is None:
args = sys.argv[1:]
if not args or args[0] not in ('-l', '-c', '-e', '-t'):
print(USAGE)
sys.exit(1)
if args[0] == '-l':
if len(args) != 2:
print(USAGE)
sys.exit(1)
with ZipFile(args[1], 'r') as zf:
zf.printdir()
elif args[0] == '-t':
if len(args) != 2:
print(USAGE)
sys.exit(1)
with ZipFile(args[1], 'r') as zf:
badfile = zf.testzip()
if badfile:
print("The following enclosed file is corrupted: {!r}".format(badfile))
print("Done testing")
elif args[0] == '-e':
if len(args) != 3:
print(USAGE)
sys.exit(1)
with ZipFile(args[1], 'r') as zf:
zf.extractall(args[2])
elif args[0] == '-c':
if len(args) < 3:
print(USAGE)
sys.exit(1)
def addToZip(zf, path, zippath):
if os.path.isfile(path):
zf.write(path, zippath, ZIP_DEFLATED)
elif os.path.isdir(path):
if zippath:
zf.write(path, zippath)
for nm in os.listdir(path):
addToZip(zf,
os.path.join(path, nm), os.path.join(zippath, nm))
# else: ignore
with ZipFile(args[1], 'w') as zf:
for path in args[2:]:
zippath = os.path.basename(path)
if not zippath:
zippath = os.path.basename(os.path.dirname(path))
if zippath in ('', os.curdir, os.pardir):
zippath = ''
addToZip(zf, path, zippath)
if __name__ == "__main__":
main()
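# Illustrative sketch (not part of the original module): main() above can
# also be driven programmatically; the archive and directory names below
# are placeholders.
#
#     main(['-c', 'backup.zip', 'src'])   # create backup.zip from ./src
#     main(['-l', 'backup.zip'])          # list its contents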
"""A minimal subset of the locale module used at interpreter startup
(imported by the _io module), in order to reduce startup time.
Don't import directly from third-party code; use the `locale` module instead!
"""
import sys
import _locale
if sys.platform.startswith("win"):
def getpreferredencoding(do_setlocale=True):
if sys.flags.utf8_mode:
return 'UTF-8'
return _locale._getdefaultlocale()[1]
else:
try:
_locale.CODESET
except AttributeError:
def getpreferredencoding(do_setlocale=True):
if sys.flags.utf8_mode:
return 'UTF-8'
            # This path for legacy systems needs the more complex
            # getdefaultlocale() function, so import the full locale module.
import locale
return locale.getpreferredencoding(do_setlocale)
else:
def getpreferredencoding(do_setlocale=True):
assert not do_setlocale
if sys.flags.utf8_mode:
return 'UTF-8'
result = _locale.nl_langinfo(_locale.CODESET)
if not result and sys.platform == 'darwin':
# nl_langinfo can return an empty string
# when the setting has an invalid value.
# Default to UTF-8 in that case because
# UTF-8 is the default charset on OSX and
# returning nothing will crash the
# interpreter.
result = 'UTF-8'
return result
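# Illustrative sketch (not part of the original module): the interpreter's
# startup code uses this module roughly as follows; third-party code should
# import the full "locale" module instead.
#
#     import _bootlocale
#     enc = _bootlocale.getpreferredencoding(False)   # e.g. 'UTF-8'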
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Abstract Base Classes (ABCs) for collections, according to PEP 3119.
Unit tests are in test_collections.
"""
from abc import ABCMeta, abstractmethod
import sys
__all__ = ["Hashable", "Iterable", "Iterator",
"Sized", "Container", "Callable",
"Set", "MutableSet",
"Mapping", "MutableMapping",
"MappingView", "KeysView", "ItemsView", "ValuesView",
"Sequence", "MutableSequence",
"ByteString",
]
# This module has been renamed from collections.abc to _collections_abc to
# speed up interpreter startup. Some of the types such as MutableMapping are
# required early, but the collections module imports a lot of other modules.
# See issue #19218
__name__ = "collections.abc"
# Private list of types that we want to register with the various ABCs
# so that they will pass tests like:
# it = iter(somebytearray)
# assert isinstance(it, Iterable)
# Note: in other implementations, these types may not be distinct,
# and they may have their own implementation-specific types that
# are not included in this list.
bytes_iterator = type(iter(b''))
bytearray_iterator = type(iter(bytearray()))
#callable_iterator = ???
dict_keyiterator = type(iter({}.keys()))
dict_valueiterator = type(iter({}.values()))
dict_itemiterator = type(iter({}.items()))
list_iterator = type(iter([]))
list_reverseiterator = type(iter(reversed([])))
range_iterator = type(iter(range(0)))
set_iterator = type(iter(set()))
str_iterator = type(iter(""))
tuple_iterator = type(iter(()))
zip_iterator = type(iter(zip()))
## views ##
dict_keys = type({}.keys())
dict_values = type({}.values())
dict_items = type({}.items())
## misc ##
mappingproxy = type(type.__dict__)
### ONE-TRICK PONIES ###
class Hashable(metaclass=ABCMeta):
__slots__ = ()
@abstractmethod
def __hash__(self):
return 0
@classmethod
def __subclasshook__(cls, C):
if cls is Hashable:
for B in C.__mro__:
if "__hash__" in B.__dict__:
if B.__dict__["__hash__"]:
return True
break
return NotImplemented
class Iterable(metaclass=ABCMeta):
__slots__ = ()
@abstractmethod
def __iter__(self):
while False:
yield None
@classmethod
def __subclasshook__(cls, C):
if cls is Iterable:
if any("__iter__" in B.__dict__ for B in C.__mro__):
return True
return NotImplemented
class Iterator(Iterable):
__slots__ = ()
@abstractmethod
def __next__(self):
'Return the next item from the iterator. When exhausted, raise StopIteration'
raise StopIteration
def __iter__(self):
return self
@classmethod
def __subclasshook__(cls, C):
if cls is Iterator:
if (any("__next__" in B.__dict__ for B in C.__mro__) and
any("__iter__" in B.__dict__ for B in C.__mro__)):
return True
return NotImplemented
Iterator.register(bytes_iterator)
Iterator.register(bytearray_iterator)
#Iterator.register(callable_iterator)
Iterator.register(dict_keyiterator)
Iterator.register(dict_valueiterator)
Iterator.register(dict_itemiterator)
Iterator.register(list_iterator)
Iterator.register(list_reverseiterator)
Iterator.register(range_iterator)
Iterator.register(set_iterator)
Iterator.register(str_iterator)
Iterator.register(tuple_iterator)
Iterator.register(zip_iterator)
class Sized(metaclass=ABCMeta):
__slots__ = ()
@abstractmethod
def __len__(self):
return 0
@classmethod
def __subclasshook__(cls, C):
if cls is Sized:
if any("__len__" in B.__dict__ for B in C.__mro__):
return True
return NotImplemented
class Container(metaclass=ABCMeta):
__slots__ = ()
@abstractmethod
def __contains__(self, x):
return False
@classmethod
def __subclasshook__(cls, C):
if cls is Container:
if any("__contains__" in B.__dict__ for B in C.__mro__):
return True
return NotImplemented
class Callable(metaclass=ABCMeta):
__slots__ = ()
@abstractmethod
def __call__(self, *args, **kwds):
return False
@classmethod
def __subclasshook__(cls, C):
if cls is Callable:
if any("__call__" in B.__dict__ for B in C.__mro__):
return True
return NotImplemented
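# Illustrative sketch (not part of the original module): the __subclasshook__
# methods above make these ABCs match structurally, with no registration.
#
#     assert isinstance([], Iterable)           # list defines __iter__
#     assert isinstance(len, Callable)          # builtins define __call__
#     assert not isinstance(object(), Sized)    # object has no __len__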
### SETS ###
class Set(Sized, Iterable, Container):
"""A set is a finite, iterable container.
This class provides concrete generic implementations of all
methods except for __contains__, __iter__ and __len__.
To override the comparisons (presumably for speed, as the
semantics are fixed), redefine __le__ and __ge__,
then the other operations will automatically follow suit.
"""
__slots__ = ()
def __le__(self, other):
if not isinstance(other, Set):
return NotImplemented
if len(self) > len(other):
return False
for elem in self:
if elem not in other:
return False
return True
def __lt__(self, other):
if not isinstance(other, Set):
return NotImplemented
return len(self) < len(other) and self.__le__(other)
def __gt__(self, other):
if not isinstance(other, Set):
return NotImplemented
return len(self) > len(other) and self.__ge__(other)
def __ge__(self, other):
if not isinstance(other, Set):
return NotImplemented
if len(self) < len(other):
return False
for elem in other:
if elem not in self:
return False
return True
def __eq__(self, other):
if not isinstance(other, Set):
return NotImplemented
return len(self) == len(other) and self.__le__(other)
@classmethod
def _from_iterable(cls, it):
'''Construct an instance of the class from any iterable input.
        Override this method if the class constructor signature
        does not accept an iterable argument.
'''
return cls(it)
def __and__(self, other):
if not isinstance(other, Iterable):
return NotImplemented
return self._from_iterable(value for value in other if value in self)
__rand__ = __and__
def isdisjoint(self, other):
'Return True if two sets have a null intersection.'
for value in other:
if value in self:
return False
return True
def __or__(self, other):
if not isinstance(other, Iterable):
return NotImplemented
chain = (e for s in (self, other) for e in s)
return self._from_iterable(chain)
__ror__ = __or__
def __sub__(self, other):
if not isinstance(other, Set):
if not isinstance(other, Iterable):
return NotImplemented
other = self._from_iterable(other)
return self._from_iterable(value for value in self
if value not in other)
def __rsub__(self, other):
if not isinstance(other, Set):
if not isinstance(other, Iterable):
return NotImplemented
other = self._from_iterable(other)
return self._from_iterable(value for value in other
if value not in self)
def __xor__(self, other):
if not isinstance(other, Set):
if not isinstance(other, Iterable):
return NotImplemented
other = self._from_iterable(other)
return (self - other) | (other - self)
__rxor__ = __xor__
def _hash(self):
"""Compute the hash value of a set.
Note that we don't define __hash__: not all sets are hashable.
But if you define a hashable set type, its __hash__ should
call this function.
        This must be compatible with __eq__.
All sets ought to compare equal if they contain the same
elements, regardless of how they are implemented, and
regardless of the order of the elements; so there's not much
freedom for __eq__ or __hash__. We match the algorithm used
by the built-in frozenset type.
"""
MAX = sys.maxsize
MASK = 2 * MAX + 1
n = len(self)
h = 1927868237 * (n + 1)
h &= MASK
for x in self:
hx = hash(x)
h ^= (hx ^ (hx << 16) ^ 89869747) * 3644798167
h &= MASK
h = h * 69069 + 907133923
h &= MASK
if h > MAX:
h -= MASK + 1
if h == -1:
h = 590923713
return h
Set.register(frozenset)
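# Illustrative sketch (not part of the original module): a minimal concrete
# Set built on the ABC above. Supplying only __contains__, __iter__ and
# __len__ yields the comparisons and the &, |, -, ^ operators for free.
class _ListBackedSet(Set):
    def __init__(self, iterable=()):
        self._items = []
        for v in iterable:
            if v not in self._items:
                self._items.append(v)
    def __contains__(self, x):
        return x in self._items
    def __iter__(self):
        return iter(self._items)
    def __len__(self):
        return len(self._items)
# e.g. sorted(_ListBackedSet("ab") & _ListBackedSet("bc")) == ["b"]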
class MutableSet(Set):
"""A mutable set is a finite, iterable container.
This class provides concrete generic implementations of all
methods except for __contains__, __iter__, __len__,
add(), and discard().
To override the comparisons (presumably for speed, as the
semantics are fixed), all you have to do is redefine __le__ and
then the other operations will automatically follow suit.
"""
__slots__ = ()
@abstractmethod
def add(self, value):
"""Add an element."""
raise NotImplementedError
@abstractmethod
def discard(self, value):
"""Remove an element. Do not raise an exception if absent."""
raise NotImplementedError
def remove(self, value):
"""Remove an element. If not a member, raise a KeyError."""
if value not in self:
raise KeyError(value)
self.discard(value)
def pop(self):
"""Return the popped value. Raise KeyError if empty."""
it = iter(self)
try:
value = next(it)
except StopIteration:
raise KeyError
self.discard(value)
return value
def clear(self):
"""This is slow (creates N new iterators!) but effective."""
try:
while True:
self.pop()
except KeyError:
pass
def __ior__(self, it):
for value in it:
self.add(value)
return self
def __iand__(self, it):
for value in (self - it):
self.discard(value)
return self
def __ixor__(self, it):
if it is self:
self.clear()
else:
if not isinstance(it, Set):
it = self._from_iterable(it)
for value in it:
if value in self:
self.discard(value)
else:
self.add(value)
return self
def __isub__(self, it):
if it is self:
self.clear()
else:
for value in it:
self.discard(value)
return self
MutableSet.register(set)
### MAPPINGS ###
class Mapping(Sized, Iterable, Container):
__slots__ = ()
"""A Mapping is a generic container for associating key/value
pairs.
This class provides concrete generic implementations of all
methods except for __getitem__, __iter__, and __len__.
"""
@abstractmethod
def __getitem__(self, key):
raise KeyError
def get(self, key, default=None):
'D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.'
try:
return self[key]
except KeyError:
return default
def __contains__(self, key):
try:
self[key]
except KeyError:
return False
else:
return True
def keys(self):
"D.keys() -> a set-like object providing a view on D's keys"
return KeysView(self)
def items(self):
"D.items() -> a set-like object providing a view on D's items"
return ItemsView(self)
def values(self):
"D.values() -> an object providing a view on D's values"
return ValuesView(self)
def __eq__(self, other):
if not isinstance(other, Mapping):
return NotImplemented
return dict(self.items()) == dict(other.items())
Mapping.register(mappingproxy)
class MappingView(Sized):
def __init__(self, mapping):
self._mapping = mapping
def __len__(self):
return len(self._mapping)
def __repr__(self):
return '{0.__class__.__name__}({0._mapping!r})'.format(self)
class KeysView(MappingView, Set):
@classmethod
def _from_iterable(self, it):
return set(it)
def __contains__(self, key):
return key in self._mapping
def __iter__(self):
yield from self._mapping
KeysView.register(dict_keys)
class ItemsView(MappingView, Set):
@classmethod
def _from_iterable(self, it):
return set(it)
def __contains__(self, item):
key, value = item
try:
v = self._mapping[key]
except KeyError:
return False
else:
return v == value
def __iter__(self):
for key in self._mapping:
yield (key, self._mapping[key])
ItemsView.register(dict_items)
class ValuesView(MappingView):
def __contains__(self, value):
for key in self._mapping:
if value == self._mapping[key]:
return True
return False
def __iter__(self):
for key in self._mapping:
yield self._mapping[key]
ValuesView.register(dict_values)
class MutableMapping(Mapping):
__slots__ = ()
"""A MutableMapping is a generic container for associating
key/value pairs.
This class provides concrete generic implementations of all
methods except for __getitem__, __setitem__, __delitem__,
__iter__, and __len__.
"""
@abstractmethod
def __setitem__(self, key, value):
raise KeyError
@abstractmethod
def __delitem__(self, key):
raise KeyError
__marker = object()
def pop(self, key, default=__marker):
'''D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.
'''
try:
value = self[key]
except KeyError:
if default is self.__marker:
raise
return default
else:
del self[key]
return value
def popitem(self):
'''D.popitem() -> (k, v), remove and return some (key, value) pair
as a 2-tuple; but raise KeyError if D is empty.
'''
try:
key = next(iter(self))
except StopIteration:
raise KeyError
value = self[key]
del self[key]
return key, value
def clear(self):
'D.clear() -> None. Remove all items from D.'
try:
while True:
self.popitem()
except KeyError:
pass
def update(*args, **kwds):
''' D.update([E, ]**F) -> None. Update D from mapping/iterable E and F.
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
'''
if not args:
raise TypeError("descriptor 'update' of 'MutableMapping' object "
"needs an argument")
self, *args = args
if len(args) > 1:
raise TypeError('update expected at most 1 arguments, got %d' %
len(args))
if args:
other = args[0]
if isinstance(other, Mapping):
for key in other:
self[key] = other[key]
elif hasattr(other, "keys"):
for key in other.keys():
self[key] = other[key]
else:
for key, value in other:
self[key] = value
for key, value in kwds.items():
self[key] = value
def setdefault(self, key, default=None):
'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
try:
return self[key]
except KeyError:
self[key] = default
return default
MutableMapping.register(dict)
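# Illustrative sketch (not part of the original module): a minimal concrete
# MutableMapping wrapping a plain dict. The five abstract methods below are
# enough to inherit get(), pop(), popitem(), setdefault(), update(), clear()
# and the keys()/items()/values() views defined above.
class _DictBackedMapping(MutableMapping):
    def __init__(self, *args, **kwds):
        self._data = {}
        self.update(*args, **kwds)
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)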
### SEQUENCES ###
class Sequence(Sized, Iterable, Container):
"""All the operations on a read-only sequence.
Concrete subclasses must override __new__ or __init__,
__getitem__, and __len__.
"""
__slots__ = ()
@abstractmethod
def __getitem__(self, index):
raise IndexError
def __iter__(self):
i = 0
try:
while True:
v = self[i]
yield v
i += 1
except IndexError:
return
def __contains__(self, value):
for v in self:
if v == value:
return True
return False
def __reversed__(self):
for i in reversed(range(len(self))):
yield self[i]
def index(self, value):
'''S.index(value) -> integer -- return first index of value.
Raises ValueError if the value is not present.
'''
for i, v in enumerate(self):
if v == value:
return i
raise ValueError
def count(self, value):
'S.count(value) -> integer -- return number of occurrences of value'
return sum(1 for v in self if v == value)
Sequence.register(tuple)
Sequence.register(str)
Sequence.register(range)
Sequence.register(memoryview)
class ByteString(Sequence):
"""This unifies bytes and bytearray.
XXX Should add all their methods.
"""
__slots__ = ()
ByteString.register(bytes)
ByteString.register(bytearray)
class MutableSequence(Sequence):
__slots__ = ()
"""All the operations on a read-write sequence.
Concrete subclasses must provide __new__ or __init__,
__getitem__, __setitem__, __delitem__, __len__, and insert().
"""
@abstractmethod
def __setitem__(self, index, value):
raise IndexError
@abstractmethod
def __delitem__(self, index):
raise IndexError
@abstractmethod
def insert(self, index, value):
'S.insert(index, value) -- insert value before index'
raise IndexError
def append(self, value):
'S.append(value) -- append value to the end of the sequence'
self.insert(len(self), value)
def clear(self):
'S.clear() -> None -- remove all items from S'
try:
while True:
self.pop()
except IndexError:
pass
def reverse(self):
'S.reverse() -- reverse *IN PLACE*'
n = len(self)
for i in range(n//2):
self[i], self[n-i-1] = self[n-i-1], self[i]
def extend(self, values):
'S.extend(iterable) -- extend sequence by appending elements from the iterable'
for v in values:
self.append(v)
def pop(self, index=-1):
'''S.pop([index]) -> item -- remove and return item at index (default last).
Raise IndexError if list is empty or index is out of range.
'''
v = self[index]
del self[index]
return v
def remove(self, value):
'''S.remove(value) -- remove first occurrence of value.
Raise ValueError if the value is not present.
'''
del self[self.index(value)]
def __iadd__(self, values):
self.extend(values)
return self
MutableSequence.register(list)
MutableSequence.register(bytearray) # Multiply inheriting, see ByteString
# This module is used to map the old Python 2 names to the new names used in
# Python 3 for the pickle module. This is needed to make pickle streams
# generated with Python 2 loadable by Python 3.
# This is a copy of lib2to3.fixes.fix_imports.MAPPING. We cannot import
# lib2to3 and use the mapping defined there, because lib2to3 uses pickle.
# Thus, this could cause the module to be imported recursively.
IMPORT_MAPPING = {
'__builtin__' : 'builtins',
'copy_reg': 'copyreg',
'Queue': 'queue',
'SocketServer': 'socketserver',
'ConfigParser': 'configparser',
'repr': 'reprlib',
'tkFileDialog': 'tkinter.filedialog',
'tkSimpleDialog': 'tkinter.simpledialog',
'tkColorChooser': 'tkinter.colorchooser',
'tkCommonDialog': 'tkinter.commondialog',
'Dialog': 'tkinter.dialog',
'Tkdnd': 'tkinter.dnd',
'tkFont': 'tkinter.font',
'tkMessageBox': 'tkinter.messagebox',
'ScrolledText': 'tkinter.scrolledtext',
'Tkconstants': 'tkinter.constants',
'Tix': 'tkinter.tix',
'ttk': 'tkinter.ttk',
'Tkinter': 'tkinter',
'markupbase': '_markupbase',
'_winreg': 'winreg',
'thread': '_thread',
'dummy_thread': '_dummy_thread',
'dbhash': 'dbm.bsd',
'dumbdbm': 'dbm.dumb',
'dbm': 'dbm.ndbm',
'gdbm': 'dbm.gnu',
'xmlrpclib': 'xmlrpc.client',
'SimpleXMLRPCServer': 'xmlrpc.server',
'httplib': 'http.client',
'htmlentitydefs' : 'html.entities',
'HTMLParser' : 'html.parser',
'Cookie': 'http.cookies',
'cookielib': 'http.cookiejar',
'BaseHTTPServer': 'http.server',
'test.test_support': 'test.support',
'commands': 'subprocess',
'urlparse' : 'urllib.parse',
'robotparser' : 'urllib.robotparser',
'urllib2': 'urllib.request',
'anydbm': 'dbm',
'_abcoll' : 'collections.abc',
}
# This contains rename rules that are easy to handle. We ignore the more
# complex stuff (e.g. mapping the names in the urllib and types modules).
# These rules should be run before import names are fixed.
NAME_MAPPING = {
('__builtin__', 'xrange'): ('builtins', 'range'),
('__builtin__', 'reduce'): ('functools', 'reduce'),
('__builtin__', 'intern'): ('sys', 'intern'),
('__builtin__', 'unichr'): ('builtins', 'chr'),
('__builtin__', 'unicode'): ('builtins', 'str'),
('__builtin__', 'long'): ('builtins', 'int'),
('itertools', 'izip'): ('builtins', 'zip'),
('itertools', 'imap'): ('builtins', 'map'),
('itertools', 'ifilter'): ('builtins', 'filter'),
('itertools', 'ifilterfalse'): ('itertools', 'filterfalse'),
('itertools', 'izip_longest'): ('itertools', 'zip_longest'),
('UserDict', 'IterableUserDict'): ('collections', 'UserDict'),
('UserList', 'UserList'): ('collections', 'UserList'),
('UserString', 'UserString'): ('collections', 'UserString'),
('whichdb', 'whichdb'): ('dbm', 'whichdb'),
('_socket', 'fromfd'): ('socket', 'fromfd'),
('_multiprocessing', 'Connection'): ('multiprocessing.connection', 'Connection'),
('multiprocessing.process', 'Process'): ('multiprocessing.context', 'Process'),
('multiprocessing.forking', 'Popen'): ('multiprocessing.popen_fork', 'Popen'),
('urllib', 'ContentTooShortError'): ('urllib.error', 'ContentTooShortError'),
('urllib', 'getproxies'): ('urllib.request', 'getproxies'),
('urllib', 'pathname2url'): ('urllib.request', 'pathname2url'),
('urllib', 'quote_plus'): ('urllib.parse', 'quote_plus'),
('urllib', 'quote'): ('urllib.parse', 'quote'),
('urllib', 'unquote_plus'): ('urllib.parse', 'unquote_plus'),
('urllib', 'unquote'): ('urllib.parse', 'unquote'),
('urllib', 'url2pathname'): ('urllib.request', 'url2pathname'),
('urllib', 'urlcleanup'): ('urllib.request', 'urlcleanup'),
('urllib', 'urlencode'): ('urllib.parse', 'urlencode'),
('urllib', 'urlopen'): ('urllib.request', 'urlopen'),
('urllib', 'urlretrieve'): ('urllib.request', 'urlretrieve'),
('urllib2', 'HTTPError'): ('urllib.error', 'HTTPError'),
('urllib2', 'URLError'): ('urllib.error', 'URLError'),
}
PYTHON2_EXCEPTIONS = (
"ArithmeticError",
"AssertionError",
"AttributeError",
"BaseException",
"BufferError",
"BytesWarning",
"DeprecationWarning",
"EOFError",
"EnvironmentError",
"Exception",
"FloatingPointError",
"FutureWarning",
"GeneratorExit",
"IOError",
"ImportError",
"ImportWarning",
"IndentationError",
"IndexError",
"KeyError",
"KeyboardInterrupt",
"LookupError",
"MemoryError",
"NameError",
"NotImplementedError",
"OSError",
"OverflowError",
"PendingDeprecationWarning",
"ReferenceError",
"RuntimeError",
"RuntimeWarning",
# StandardError is gone in Python 3, so we map it to Exception
"StopIteration",
"SyntaxError",
"SyntaxWarning",
"SystemError",
"SystemExit",
"TabError",
"TypeError",
"UnboundLocalError",
"UnicodeDecodeError",
"UnicodeEncodeError",
"UnicodeError",
"UnicodeTranslateError",
"UnicodeWarning",
"UserWarning",
"ValueError",
"Warning",
"ZeroDivisionError",
)
try:
WindowsError
except NameError:
pass
else:
PYTHON2_EXCEPTIONS += ("WindowsError",)
for excname in PYTHON2_EXCEPTIONS:
NAME_MAPPING[("exceptions", excname)] = ("builtins", excname)
MULTIPROCESSING_EXCEPTIONS = (
'AuthenticationError',
'BufferTooShort',
'ProcessError',
'TimeoutError',
)
for excname in MULTIPROCESSING_EXCEPTIONS:
NAME_MAPPING[("multiprocessing", excname)] = ("multiprocessing.context", excname)
# Same, but for 3.x to 2.x
REVERSE_IMPORT_MAPPING = dict((v, k) for (k, v) in IMPORT_MAPPING.items())
assert len(REVERSE_IMPORT_MAPPING) == len(IMPORT_MAPPING)
REVERSE_NAME_MAPPING = dict((v, k) for (k, v) in NAME_MAPPING.items())
assert len(REVERSE_NAME_MAPPING) == len(NAME_MAPPING)
# Non-mutual mappings.
IMPORT_MAPPING.update({
'cPickle': 'pickle',
'_elementtree': 'xml.etree.ElementTree',
'FileDialog': 'tkinter.filedialog',
'SimpleDialog': 'tkinter.simpledialog',
'DocXMLRPCServer': 'xmlrpc.server',
'SimpleHTTPServer': 'http.server',
'CGIHTTPServer': 'http.server',
})
REVERSE_IMPORT_MAPPING.update({
'_bz2': 'bz2',
'_dbm': 'dbm',
'_functools': 'functools',
'_gdbm': 'gdbm',
'_pickle': 'pickle',
})
NAME_MAPPING.update({
('__builtin__', 'basestring'): ('builtins', 'str'),
('exceptions', 'StandardError'): ('builtins', 'Exception'),
('UserDict', 'UserDict'): ('collections', 'UserDict'),
('socket', '_socketobject'): ('socket', 'SocketType'),
})
REVERSE_NAME_MAPPING.update({
('_functools', 'reduce'): ('__builtin__', 'reduce'),
('tkinter.filedialog', 'FileDialog'): ('FileDialog', 'FileDialog'),
('tkinter.filedialog', 'LoadFileDialog'): ('FileDialog', 'LoadFileDialog'),
('tkinter.filedialog', 'SaveFileDialog'): ('FileDialog', 'SaveFileDialog'),
('tkinter.simpledialog', 'SimpleDialog'): ('SimpleDialog', 'SimpleDialog'),
('xmlrpc.server', 'ServerHTMLDoc'): ('DocXMLRPCServer', 'ServerHTMLDoc'),
('xmlrpc.server', 'XMLRPCDocGenerator'):
('DocXMLRPCServer', 'XMLRPCDocGenerator'),
('xmlrpc.server', 'DocXMLRPCRequestHandler'):
('DocXMLRPCServer', 'DocXMLRPCRequestHandler'),
('xmlrpc.server', 'DocXMLRPCServer'):
('DocXMLRPCServer', 'DocXMLRPCServer'),
('xmlrpc.server', 'DocCGIXMLRPCRequestHandler'):
('DocXMLRPCServer', 'DocCGIXMLRPCRequestHandler'),
('http.server', 'SimpleHTTPRequestHandler'):
('SimpleHTTPServer', 'SimpleHTTPRequestHandler'),
('http.server', 'CGIHTTPRequestHandler'):
('CGIHTTPServer', 'CGIHTTPRequestHandler'),
('_socket', 'socket'): ('socket', '_socketobject'),
})
PYTHON3_OSERROR_EXCEPTIONS = (
'BrokenPipeError',
'ChildProcessError',
'ConnectionAbortedError',
'ConnectionError',
'ConnectionRefusedError',
'ConnectionResetError',
'FileExistsError',
'FileNotFoundError',
'InterruptedError',
'IsADirectoryError',
'NotADirectoryError',
'PermissionError',
'ProcessLookupError',
'TimeoutError',
)
for excname in PYTHON3_OSERROR_EXCEPTIONS:
REVERSE_NAME_MAPPING[('builtins', excname)] = ('exceptions', 'OSError')
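# Illustrative sketch (not part of the original module): how the tables
# above translate names when loading a Python 2 pickle with
# fix_imports=True, and back again when dumping with protocol <= 2.
#
#     IMPORT_MAPPING['__builtin__']               # -> 'builtins'
#     NAME_MAPPING[('__builtin__', 'unicode')]    # -> ('builtins', 'str')
#     REVERSE_NAME_MAPPING[('builtins', 'str')]   # -> ('__builtin__', 'unicode')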
"""Drop-in replacement for the thread module.
Meant to be used as a brain-dead substitute so that threaded code does
not need to be rewritten when the _thread module is not present.
Suggested usage is::
try:
import _thread
except ImportError:
import _dummy_thread as _thread
"""
# Exports only things specified by thread documentation;
# skipping obsolete synonyms allocate(), start_new(), exit_thread().
__all__ = ['error', 'start_new_thread', 'exit', 'get_ident', 'allocate_lock',
'interrupt_main', 'LockType']
# A dummy value
TIMEOUT_MAX = 2**31
# NOTE: this module can be imported early in the extension building process,
# and so top level imports of other modules should be avoided. Instead, all
# imports are done when needed on a function-by-function basis. Since threads
# are disabled, the import lock should not be an issue anyway (??).
error = RuntimeError
def start_new_thread(function, args, kwargs={}):
"""Dummy implementation of _thread.start_new_thread().
Compatibility is maintained by making sure that ``args`` is a
tuple and ``kwargs`` is a dictionary. If an exception is raised
and it is SystemExit (which can be done by _thread.exit()) it is
caught and nothing is done; all other exceptions are printed out
by using traceback.print_exc().
    If the executed function calls interrupt_main, the KeyboardInterrupt will be
raised when the function returns.
"""
if type(args) != type(tuple()):
raise TypeError("2nd arg must be a tuple")
if type(kwargs) != type(dict()):
raise TypeError("3rd arg must be a dict")
global _main
_main = False
try:
function(*args, **kwargs)
except SystemExit:
pass
except:
import traceback
traceback.print_exc()
_main = True
global _interrupt
if _interrupt:
_interrupt = False
raise KeyboardInterrupt
def exit():
"""Dummy implementation of _thread.exit()."""
raise SystemExit
def get_ident():
"""Dummy implementation of _thread.get_ident().
Since this module should only be used when _threadmodule is not
available, it is safe to assume that the current process is the
only thread. Thus a constant can be safely returned.
"""
return -1
def allocate_lock():
"""Dummy implementation of _thread.allocate_lock()."""
return LockType()
def stack_size(size=None):
"""Dummy implementation of _thread.stack_size()."""
if size is not None:
raise error("setting thread stack size not supported")
return 0
def _set_sentinel():
"""Dummy implementation of _thread._set_sentinel()."""
return LockType()
class LockType(object):
"""Class implementing dummy implementation of _thread.LockType.
Compatibility is maintained by maintaining self.locked_status
which is a boolean that stores the state of the lock. Pickling of
the lock, though, should not be done since if the _thread module is
then used with an unpickled ``lock()`` from here problems could
occur from this class not having atomic methods.
"""
def __init__(self):
self.locked_status = False
def acquire(self, waitflag=None, timeout=-1):
"""Dummy implementation of acquire().
For blocking calls, self.locked_status is automatically set to
True and returned appropriately based on value of
``waitflag``. If it is non-blocking, then the value is
actually checked and not set if it is already acquired. This
is all done so that threading.Condition's assert statements
aren't triggered and throw a little fit.
"""
if waitflag is None or waitflag:
self.locked_status = True
return True
else:
if not self.locked_status:
self.locked_status = True
return True
else:
if timeout > 0:
import time
time.sleep(timeout)
return False
__enter__ = acquire
def __exit__(self, typ, val, tb):
self.release()
def release(self):
"""Release the dummy lock."""
# XXX Perhaps shouldn't actually bother to test? Could lead
# to problems for complex, threaded code.
if not self.locked_status:
raise error
self.locked_status = False
return True
def locked(self):
return self.locked_status
# Used to signal that interrupt_main was called in a "thread"
_interrupt = False
# True when not executing in a "thread"
_main = True
def interrupt_main():
"""Set _interrupt flag to True to have start_new_thread raise
KeyboardInterrupt upon exiting."""
if _main:
raise KeyboardInterrupt
else:
global _interrupt
_interrupt = True
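# Illustrative sketch (not part of the original module): the fallback idiom
# from the module docstring, after which start_new_thread() simply runs the
# function synchronously in the calling "thread".
#
#     try:
#         import _thread
#     except ImportError:
#         import _dummy_thread as _thread
#     _thread.start_new_thread(print, ("runs synchronously",))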
"""Shared support for scanning document type declarations in HTML and XHTML.
This module is used as a foundation for the html.parser module. It has no
documented public API and should not be used directly.
"""
import re
_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*').match
_declstringlit_match = re.compile(r'(\'[^\']*\'|"[^"]*")\s*').match
_commentclose = re.compile(r'--\s*>')
_markedsectionclose = re.compile(r']\s*]\s*>')
# An analysis of the MS-Word extensions is available at
# http://www.planetpublish.com/xmlarena/xap/Thursday/WordtoXML.pdf
_msmarkedsectionclose = re.compile(r']\s*>')
del re
class ParserBase:
"""Parser base class which provides some common support methods used
by the SGML/HTML and XHTML parsers."""
def __init__(self):
if self.__class__ is ParserBase:
raise RuntimeError(
"_markupbase.ParserBase must be subclassed")
def error(self, message):
raise NotImplementedError(
"subclasses of ParserBase must override error()")
def reset(self):
self.lineno = 1
self.offset = 0
def getpos(self):
"""Return current line number and offset."""
return self.lineno, self.offset
# Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.
def updatepos(self, i, j):
if i >= j:
return j
rawdata = self.rawdata
nlines = rawdata.count("\n", i, j)
if nlines:
self.lineno = self.lineno + nlines
pos = rawdata.rindex("\n", i, j) # Should not fail
self.offset = j-(pos+1)
else:
self.offset = self.offset + j-i
return j
_decl_otherchars = ''
# Internal -- parse declaration (for use by subclasses).
def parse_declaration(self, i):
# This is some sort of declaration; in "HTML as
# deployed," this should only be the document type
# declaration ("<!DOCTYPE html...>").
# ISO 8879:1986, however, has more complex
# declaration syntax for elements in <!...>, including:
# --comment--
# [marked section]
# name in the following list: ENTITY, DOCTYPE, ELEMENT,
# ATTLIST, NOTATION, SHORTREF, USEMAP,
# LINKTYPE, LINK, IDLINK, USELINK, SYSTEM
rawdata = self.rawdata
j = i + 2
assert rawdata[i:j] == "<!", "unexpected call to parse_declaration"
if rawdata[j:j+1] == ">":
# the empty comment <!>
return j + 1
if rawdata[j:j+1] in ("-", ""):
# Start of comment followed by buffer boundary,
# or just a buffer boundary.
return -1
# A simple, practical version could look like: ((name|stringlit) S*) + '>'
n = len(rawdata)
if rawdata[j:j+2] == '--': #comment
# Locate --.*-- as the body of the comment
return self.parse_comment(i)
elif rawdata[j] == '[': #marked section
# Locate [statusWord [...arbitrary SGML...]] as the body of the marked section
# Where statusWord is one of TEMP, CDATA, IGNORE, INCLUDE, RCDATA
# Note that this is extended by Microsoft Office "Save as Web" function
# to include [if...] and [endif].
return self.parse_marked_section(i)
else: #all other declaration elements
decltype, j = self._scan_name(j, i)
if j < 0:
return j
if decltype == "doctype":
self._decl_otherchars = ''
while j < n:
c = rawdata[j]
if c == ">":
# end of declaration syntax
data = rawdata[i+2:j]
if decltype == "doctype":
self.handle_decl(data)
else:
# According to the HTML5 specs sections "8.2.4.44 Bogus
# comment state" and "8.2.4.45 Markup declaration open
# state", a comment token should be emitted.
# Calling unknown_decl provides more flexibility though.
self.unknown_decl(data)
return j + 1
if c in "\"'":
m = _declstringlit_match(rawdata, j)
if not m:
return -1 # incomplete
j = m.end()
elif c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ":
name, j = self._scan_name(j, i)
elif c in self._decl_otherchars:
j = j + 1
elif c == "[":
# this could be handled in a separate doctype parser
if decltype == "doctype":
j = self._parse_doctype_subset(j + 1, i)
elif decltype in {"attlist", "linktype", "link", "element"}:
# must tolerate []'d groups in a content model in an element declaration
# also in data attribute specifications of attlist declaration
# also link type declaration subsets in linktype declarations
# also link attribute specification lists in link declarations
self.error("unsupported '[' char in %s declaration" % decltype)
else:
self.error("unexpected '[' char in declaration")
else:
self.error(
"unexpected %r char in declaration" % rawdata[j])
if j < 0:
return j
return -1 # incomplete
# Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
def parse_marked_section(self, i, report=1):
        rawdata = self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectName, j = self._scan_name(i+3, i)
if j < 0:
return j
if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
# look for standard ]]> ending
            match = _markedsectionclose.search(rawdata, i+3)
elif sectName in {"if", "else", "endif"}:
# look for MS Office ]> ending
            match = _msmarkedsectionclose.search(rawdata, i+3)
else:
self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
if not match:
return -1
if report:
j = match.start(0)
self.unknown_decl(rawdata[i+3: j])
return match.end(0)
# Internal -- parse comment, return length or -1 if not terminated
def parse_comment(self, i, report=1):
rawdata = self.rawdata
if rawdata[i:i+4] != '<!--':
self.error('unexpected call to parse_comment()')
match = _commentclose.search(rawdata, i+4)
if not match:
return -1
if report:
j = match.start(0)
self.handle_comment(rawdata[i+4: j])
return match.end(0)
# Internal -- scan past the internal subset in a <!DOCTYPE declaration,
# returning the index just past any whitespace following the trailing ']'.
def _parse_doctype_subset(self, i, declstartpos):
rawdata = self.rawdata
n = len(rawdata)
j = i
while j < n:
c = rawdata[j]
if c == "<":
s = rawdata[j:j+2]
if s == "<":
# end of buffer; incomplete
return -1
if s != "<!":
self.updatepos(declstartpos, j + 1)
self.error("unexpected char in internal subset (in %r)" % s)
if (j + 2) == n:
# end of buffer; incomplete
return -1
if (j + 4) > n:
# end of buffer; incomplete
return -1
if rawdata[j:j+4] == "<!--":
j = self.parse_comment(j, report=0)
if j < 0:
return j
continue
name, j = self._scan_name(j + 2, declstartpos)
if j == -1:
return -1
if name not in {"attlist", "element", "entity", "notation"}:
self.updatepos(declstartpos, j + 2)
self.error(
"unknown declaration %r in internal subset" % name)
# handle the individual names
meth = getattr(self, "_parse_doctype_" + name)
j = meth(j, declstartpos)
if j < 0:
return j
elif c == "%":
# parameter entity reference
if (j + 1) == n:
# end of buffer; incomplete
return -1
s, j = self._scan_name(j + 1, declstartpos)
if j < 0:
return j
if rawdata[j] == ";":
j = j + 1
elif c == "]":
j = j + 1
while j < n and rawdata[j].isspace():
j = j + 1
if j < n:
if rawdata[j] == ">":
return j
self.updatepos(declstartpos, j)
self.error("unexpected char after internal subset")
else:
return -1
elif c.isspace():
j = j + 1
else:
self.updatepos(declstartpos, j)
self.error("unexpected char %r in internal subset" % c)
# end of buffer reached
return -1
# Internal -- scan past <!ELEMENT declarations
def _parse_doctype_element(self, i, declstartpos):
name, j = self._scan_name(i, declstartpos)
if j == -1:
return -1
# style content model; just skip until '>'
rawdata = self.rawdata
if '>' in rawdata[j:]:
return rawdata.find(">", j) + 1
return -1
# Internal -- scan past <!ATTLIST declarations
def _parse_doctype_attlist(self, i, declstartpos):
rawdata = self.rawdata
name, j = self._scan_name(i, declstartpos)
c = rawdata[j:j+1]
if c == "":
return -1
if c == ">":
return j + 1
while 1:
# scan a series of attribute descriptions; simplified:
# name type [value] [#constraint]
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
c = rawdata[j:j+1]
if c == "":
return -1
if c == "(":
# an enumerated type; look for ')'
if ")" in rawdata[j:]:
j = rawdata.find(")", j) + 1
else:
return -1
while rawdata[j:j+1].isspace():
j = j + 1
if not rawdata[j:]:
# end of buffer, incomplete
return -1
else:
name, j = self._scan_name(j, declstartpos)
c = rawdata[j:j+1]
if not c:
return -1
if c in "'\"":
m = _declstringlit_match(rawdata, j)
if m:
j = m.end()
else:
return -1
c = rawdata[j:j+1]
if not c:
return -1
if c == "#":
if rawdata[j:] == "#":
# end of buffer
return -1
name, j = self._scan_name(j + 1, declstartpos)
if j < 0:
return j
c = rawdata[j:j+1]
if not c:
return -1
if c == '>':
# all done
return j + 1
# Internal -- scan past <!NOTATION declarations
def _parse_doctype_notation(self, i, declstartpos):
name, j = self._scan_name(i, declstartpos)
if j < 0:
return j
rawdata = self.rawdata
while 1:
c = rawdata[j:j+1]
if not c:
# end of buffer; incomplete
return -1
if c == '>':
return j + 1
if c in "'\"":
m = _declstringlit_match(rawdata, j)
if not m:
return -1
j = m.end()
else:
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
# Internal -- scan past <!ENTITY declarations
def _parse_doctype_entity(self, i, declstartpos):
rawdata = self.rawdata
if rawdata[i:i+1] == "%":
j = i + 1
while 1:
c = rawdata[j:j+1]
if not c:
return -1
if c.isspace():
j = j + 1
else:
break
else:
j = i
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
while 1:
c = self.rawdata[j:j+1]
if not c:
return -1
if c in "'\"":
m = _declstringlit_match(rawdata, j)
if m:
j = m.end()
else:
return -1 # incomplete
elif c == ">":
return j + 1
else:
name, j = self._scan_name(j, declstartpos)
if j < 0:
return j
    # Internal -- scan a name token; return the name (lowercased) and the
    # new position, or (None, -1) if we've reached the end of the buffer.
def _scan_name(self, i, declstartpos):
rawdata = self.rawdata
n = len(rawdata)
if i == n:
return None, -1
m = _declname_match(rawdata, i)
if m:
s = m.group()
name = s.strip()
if (i + len(s)) == n:
return None, -1 # end of buffer
return name.lower(), m.end()
else:
self.updatepos(declstartpos, i)
self.error("expected name token at %r"
% rawdata[declstartpos:declstartpos+20])
# To be overridden -- handlers for unknown objects
def unknown_decl(self, data):
pass
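# Illustrative sketch (not part of the original module): ParserBase is used
# indirectly through html.parser.HTMLParser, which subclasses it and supplies
# rawdata, handle_decl(), handle_comment(), unknown_decl() and so on.
#
#     from html.parser import HTMLParser
#     class _DeclPrinter(HTMLParser):
#         def handle_decl(self, decl):
#             print("declaration:", decl)
#     _DeclPrinter().feed("<!DOCTYPE html><p>hi</p>")   # declaration: DOCTYPE html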
"""Shared OS X support functions."""
import os
import re
import sys
__all__ = [
'compiler_fixup',
'customize_config_vars',
'customize_compiler',
'get_platform_osx',
]
# configuration variables that may contain universal build flags,
# like "-arch" or "-isdkroot", that may need customization for
# the user environment
_UNIVERSAL_CONFIG_VARS = ('CFLAGS', 'LDFLAGS', 'CPPFLAGS', 'BASECFLAGS',
'BLDSHARED', 'LDSHARED', 'CC', 'CXX',
'PY_CFLAGS', 'PY_LDFLAGS', 'PY_CPPFLAGS',
'PY_CORE_CFLAGS')
# configuration variables that may contain compiler calls
_COMPILER_CONFIG_VARS = ('BLDSHARED', 'LDSHARED', 'CC', 'CXX')
# prefix added to original configuration variable names
_INITPRE = '_OSX_SUPPORT_INITIAL_'
def _find_executable(executable, path=None):
"""Tries to find 'executable' in the directories listed in 'path'.
    'path' is a string listing directories separated by 'os.pathsep';
    it defaults to os.environ['PATH']. Returns the complete filename,
    or None if not found.
"""
if path is None:
path = os.environ['PATH']
paths = path.split(os.pathsep)
base, ext = os.path.splitext(executable)
if (sys.platform == 'win32') and (ext != '.exe'):
executable = executable + '.exe'
if not os.path.isfile(executable):
for p in paths:
f = os.path.join(p, executable)
if os.path.isfile(f):
# the file exists, we have a shot at spawn working
return f
return None
else:
return executable
def _read_output(commandstring):
"""Output from successful command execution or None"""
# Similar to os.popen(commandstring, "r").read(),
# but without actually using os.popen because that
# function is not usable during python bootstrap.
# tempfile is also not available then.
import contextlib
try:
import tempfile
fp = tempfile.NamedTemporaryFile()
except ImportError:
fp = open("/tmp/_osx_support.%s"%(
os.getpid(),), "w+b")
with contextlib.closing(fp) as fp:
cmd = "%s 2>/dev/null >'%s'" % (commandstring, fp.name)
return fp.read().decode('utf-8').strip() if not os.system(cmd) else None
def _find_build_tool(toolname):
"""Find a build tool on current path or using xcrun"""
return (_find_executable(toolname)
or _read_output("/usr/bin/xcrun -find %s" % (toolname,))
or ''
)
_SYSTEM_VERSION = None
def _get_system_version():
"""Return the OS X system version as a string"""
# Reading this plist is a documented way to get the system
# version (see the documentation for the Gestalt Manager)
# We avoid using platform.mac_ver to avoid possible bootstrap issues during
# the build of Python itself (distutils is used to build standard library
# extensions).
global _SYSTEM_VERSION
if _SYSTEM_VERSION is None:
_SYSTEM_VERSION = ''
try:
f = open('/System/Library/CoreServices/SystemVersion.plist')
except OSError:
# We're on a plain darwin box, fall back to the default
# behaviour.
pass
else:
try:
m = re.search(r'<key>ProductUserVisibleVersion</key>\s*'
r'<string>(.*?)</string>', f.read())
finally:
f.close()
if m is not None:
_SYSTEM_VERSION = '.'.join(m.group(1).split('.')[:2])
# else: fall back to the default behaviour
return _SYSTEM_VERSION
def _remove_original_values(_config_vars):
"""Remove original unmodified values for testing"""
# This is needed for higher-level cross-platform tests of get_platform.
for k in list(_config_vars):
if k.startswith(_INITPRE):
del _config_vars[k]
def _save_modified_value(_config_vars, cv, newvalue):
"""Save modified and original unmodified value of configuration var"""
oldvalue = _config_vars.get(cv, '')
if (oldvalue != newvalue) and (_INITPRE + cv not in _config_vars):
_config_vars[_INITPRE + cv] = oldvalue
_config_vars[cv] = newvalue
def _supports_universal_builds():
"""Returns True if universal builds are supported on this system"""
# As an approximation, we assume that if we are running on 10.4 or above,
# then we are running with an Xcode environment that supports universal
# builds, in particular -isysroot and -arch arguments to the compiler. This
# is in support of allowing 10.4 universal builds to run on 10.3.x systems.
osx_version = _get_system_version()
if osx_version:
try:
osx_version = tuple(int(i) for i in osx_version.split('.'))
except ValueError:
osx_version = ''
return bool(osx_version >= (10, 4)) if osx_version else False
def _find_appropriate_compiler(_config_vars):
"""Find appropriate C compiler for extension module builds"""
# Issue #13590:
# The OSX location for the compiler varies between OSX
# (or rather Xcode) releases. With older releases (up-to 10.5)
# the compiler is in /usr/bin, with newer releases the compiler
# can only be found inside Xcode.app if the "Command Line Tools"
# are not installed.
#
    # Furthermore, the compiler that can be used varies between
# Xcode releases. Up to Xcode 4 it was possible to use 'gcc-4.2'
# as the compiler, after that 'clang' should be used because
# gcc-4.2 is either not present, or a copy of 'llvm-gcc' that
# miscompiles Python.
    # skip checks if the compiler was overridden with a CC env variable
if 'CC' in os.environ:
return _config_vars
# The CC config var might contain additional arguments.
# Ignore them while searching.
cc = oldcc = _config_vars['CC'].split()[0]
if not _find_executable(cc):
# Compiler is not found on the shell search PATH.
        # Now search for clang, first on PATH (if the Command Line
# Tools have been installed in / or if the user has provided
# another location via CC). If not found, try using xcrun
# to find an uninstalled clang (within a selected Xcode).
# NOTE: Cannot use subprocess here because of bootstrap
# issues when building Python itself (and os.popen is
# implemented on top of subprocess and is therefore not
# usable as well)
cc = _find_build_tool('clang')
elif os.path.basename(cc).startswith('gcc'):
# Compiler is GCC, check if it is LLVM-GCC
data = _read_output("'%s' --version"
% (cc.replace("'", "'\"'\"'"),))
if data and 'llvm-gcc' in data:
# Found LLVM-GCC, fall back to clang
cc = _find_build_tool('clang')
if not cc:
raise SystemError(
"Cannot locate working compiler")
if cc != oldcc:
# Found a replacement compiler.
# Modify config vars using new compiler, if not already explicitly
        # overridden by an env variable, preserving additional arguments.
for cv in _COMPILER_CONFIG_VARS:
if cv in _config_vars and cv not in os.environ:
cv_split = _config_vars[cv].split()
cv_split[0] = cc if cv != 'CXX' else cc + '++'
_save_modified_value(_config_vars, cv, ' '.join(cv_split))
return _config_vars
def _remove_universal_flags(_config_vars):
"""Remove all universal build arguments from config vars"""
for cv in _UNIVERSAL_CONFIG_VARS:
        # Do not alter a config var explicitly overridden by env var
if cv in _config_vars and cv not in os.environ:
flags = _config_vars[cv]
            flags = re.sub(r'-arch\s+\w+\s', ' ', flags, flags=re.ASCII)
            flags = re.sub(r'-isysroot [^ \t]*', ' ', flags)
_save_modified_value(_config_vars, cv, flags)
return _config_vars
def _remove_unsupported_archs(_config_vars):
"""Remove any unsupported archs from config vars"""
# Different Xcode releases support different sets for '-arch'
# flags. In particular, Xcode 4.x no longer supports the
# PPC architectures.
#
# This code automatically removes '-arch ppc' and '-arch ppc64'
# when these are not supported. That makes it possible to
# build extensions on OSX 10.7 and later with the prebuilt
# 32-bit installer on the python.org website.
    # skip checks if the compiler was overridden with a CC env variable
if 'CC' in os.environ:
return _config_vars
    if re.search(r'-arch\s+ppc', _config_vars['CFLAGS']) is not None:
# NOTE: Cannot use subprocess here because of bootstrap
# issues when building Python itself
status = os.system(
"""echo 'int main{};' | """
"""'%s' -c -arch ppc -x c -o /dev/null /dev/null 2>/dev/null"""
%(_config_vars['CC'].replace("'", "'\"'\"'"),))
if status:
# The compile failed for some reason. Because of differences
# across Xcode and compiler versions, there is no reliable way
# to be sure why it failed. Assume here it was due to lack of
# PPC support and remove the related '-arch' flags from each
            # config variable not explicitly overridden by an environment
# variable. If the error was for some other reason, we hope the
# failure will show up again when trying to compile an extension
# module.
for cv in _UNIVERSAL_CONFIG_VARS:
if cv in _config_vars and cv not in os.environ:
flags = _config_vars[cv]
                    flags = re.sub(r'-arch\s+ppc\w*\s', ' ', flags)
_save_modified_value(_config_vars, cv, flags)
return _config_vars
def _override_all_archs(_config_vars):
"""Allow override of all archs with ARCHFLAGS env var"""
# NOTE: This name was introduced by Apple in OSX 10.5 and
# is used by several scripting languages distributed with
# that OS release.
if 'ARCHFLAGS' in os.environ:
arch = os.environ['ARCHFLAGS']
for cv in _UNIVERSAL_CONFIG_VARS:
if cv in _config_vars and '-arch' in _config_vars[cv]:
flags = _config_vars[cv]
                flags = re.sub(r'-arch\s+\w+\s', ' ', flags)
flags = flags + ' ' + arch
_save_modified_value(_config_vars, cv, flags)
return _config_vars
def _check_for_unavailable_sdk(_config_vars):
"""Remove references to any SDKs not available"""
# If we're on OSX 10.5 or later and the user tries to
# compile an extension using an SDK that is not present
# on the current machine it is better to not use an SDK
# than to fail. This is particularly important with
# the standalone Command Line Tools alternative to a
# full-blown Xcode install since the CLT packages do not
# provide SDKs. If the SDK is not present, it is assumed
# that the header files and dev libs have been installed
# to /usr and /System/Library by either a standalone CLT
# package or the CLT component within Xcode.
cflags = _config_vars.get('CFLAGS', '')
m = re.search(r'-isysroot\s+(\S+)', cflags)
if m is not None:
sdk = m.group(1)
if not os.path.exists(sdk):
for cv in _UNIVERSAL_CONFIG_VARS:
# Do not alter a config var explicitly overriden by env var
if cv in _config_vars and cv not in os.environ:
flags = _config_vars[cv]
flags = re.sub(r'-isysroot\s+\S+(?:\s|$)', ' ', flags)
_save_modified_value(_config_vars, cv, flags)
return _config_vars
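# --- Editor's sketch (not part of the original module): extracting an
# -isysroot path with the same regex used above; the SDK path below is
# made up for illustration. ---
def _demo_find_sysroot():
    import re
    cflags = '-g -isysroot /Developer/SDKs/MacOSX10.6.sdk -O2'
    m = re.search(r'-isysroot\s+(\S+)', cflags)
    return m.group(1) if m else None   # '/Developer/SDKs/MacOSX10.6.sdk'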
def compiler_fixup(compiler_so, cc_args):
"""
This function will strip '-isysroot PATH' and '-arch ARCH' from the
    compile flags if the user has specified one of them in extra_compile_flags.
This is needed because '-arch ARCH' adds another architecture to the
build, without a way to remove an architecture. Furthermore GCC will
barf if multiple '-isysroot' arguments are present.
"""
stripArch = stripSysroot = False
compiler_so = list(compiler_so)
if not _supports_universal_builds():
        # Compilers on OSX before 10.4.0 don't support -arch and
        # -isysroot at all.
stripArch = stripSysroot = True
else:
stripArch = '-arch' in cc_args
stripSysroot = '-isysroot' in cc_args
if stripArch or 'ARCHFLAGS' in os.environ:
while True:
try:
index = compiler_so.index('-arch')
# Strip this argument and the next one:
del compiler_so[index:index+2]
except ValueError:
break
if 'ARCHFLAGS' in os.environ and not stripArch:
# User specified different -arch flags in the environ,
# see also distutils.sysconfig
compiler_so = compiler_so + os.environ['ARCHFLAGS'].split()
if stripSysroot:
while True:
try:
index = compiler_so.index('-isysroot')
# Strip this argument and the next one:
del compiler_so[index:index+2]
except ValueError:
break
# Check if the SDK that is used during compilation actually exists,
# the universal build requires the usage of a universal SDK and not all
# users have that installed by default.
sysroot = None
if '-isysroot' in cc_args:
idx = cc_args.index('-isysroot')
sysroot = cc_args[idx+1]
elif '-isysroot' in compiler_so:
idx = compiler_so.index('-isysroot')
sysroot = compiler_so[idx+1]
if sysroot and not os.path.isdir(sysroot):
from distutils import log
log.warn("Compiling with an SDK that doesn't seem to exist: %s",
sysroot)
log.warn("Please check your Xcode installation")
return compiler_so
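# --- Editor's sketch (not part of the original module): the same
# index()/del pattern compiler_fixup() uses to drop a flag together with
# its argument; the sample argv is invented. ---
def _demo_strip_flag_pairs():
    compiler_so = ['cc', '-arch', 'ppc', '-isysroot', '/no/such/sdk', '-O2']
    for flag in ('-arch', '-isysroot'):
        while True:
            try:
                index = compiler_so.index(flag)
                del compiler_so[index:index+2]  # the flag and its argument
            except ValueError:
                break
    return compiler_so                          # ['cc', '-O2']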
def customize_config_vars(_config_vars):
"""Customize Python build configuration variables.
Called internally from sysconfig with a mutable mapping
containing name/value pairs parsed from the configured
makefile used to build this interpreter. Returns
the mapping updated as needed to reflect the environment
in which the interpreter is running; in the case of
a Python from a binary installer, the installed
environment may be very different from the build
environment, i.e. different OS levels, different
    build tools, different available CPU architectures.
This customization is performed whenever
distutils.sysconfig.get_config_vars() is first
called. It may be used in environments where no
compilers are present, i.e. when installing pure
Python dists. Customization of compiler paths
and detection of unavailable archs is deferred
until the first extension module build is
requested (in distutils.sysconfig.customize_compiler).
Currently called from distutils.sysconfig
"""
if not _supports_universal_builds():
# On Mac OS X before 10.4, check if -arch and -isysroot
# are in CFLAGS or LDFLAGS and remove them if they are.
# This is needed when building extensions on a 10.3 system
# using a universal build of python.
_remove_universal_flags(_config_vars)
# Allow user to override all archs with ARCHFLAGS env var
_override_all_archs(_config_vars)
# Remove references to sdks that are not found
_check_for_unavailable_sdk(_config_vars)
return _config_vars
def customize_compiler(_config_vars):
"""Customize compiler path and configuration variables.
This customization is performed when the first
extension module build is requested
    (in distutils.sysconfig.customize_compiler).
"""
# Find a compiler to use for extension module builds
_find_appropriate_compiler(_config_vars)
# Remove ppc arch flags if not supported here
_remove_unsupported_archs(_config_vars)
# Allow user to override all archs with ARCHFLAGS env var
_override_all_archs(_config_vars)
return _config_vars
def get_platform_osx(_config_vars, osname, release, machine):
"""Filter values for get_platform()"""
# called from get_platform() in sysconfig and distutils.util
#
# For our purposes, we'll assume that the system version from
# distutils' perspective is what MACOSX_DEPLOYMENT_TARGET is set
# to. This makes the compatibility story a bit more sane because the
# machine is going to compile and link as if it were
# MACOSX_DEPLOYMENT_TARGET.
macver = _config_vars.get('MACOSX_DEPLOYMENT_TARGET', '')
macrelease = _get_system_version() or macver
macver = macver or macrelease
if macver:
release = macver
osname = "macosx"
# Use the original CFLAGS value, if available, so that we
# return the same machine type for the platform string.
# Otherwise, distutils may consider this a cross-compiling
# case and disallow installs.
cflags = _config_vars.get(_INITPRE+'CFLAGS',
_config_vars.get('CFLAGS', ''))
if macrelease:
try:
macrelease = tuple(int(i) for i in macrelease.split('.')[0:2])
except ValueError:
macrelease = (10, 0)
else:
# assume no universal support
macrelease = (10, 0)
if (macrelease >= (10, 4)) and '-arch' in cflags.strip():
# The universal build will build fat binaries, but not on
# systems before 10.4
machine = 'fat'
        archs = re.findall(r'-arch\s+(\S+)', cflags)
archs = tuple(sorted(set(archs)))
if len(archs) == 1:
machine = archs[0]
elif archs == ('i386', 'ppc'):
machine = 'fat'
elif archs == ('i386', 'x86_64'):
machine = 'intel'
elif archs == ('i386', 'ppc', 'x86_64'):
machine = 'fat3'
elif archs == ('ppc64', 'x86_64'):
machine = 'fat64'
elif archs == ('i386', 'ppc', 'ppc64', 'x86_64'):
machine = 'universal'
else:
raise ValueError(
"Don't know machine value for archs=%r" % (archs,))
elif machine == 'i386':
# On OSX the machine type returned by uname is always the
# 32-bit variant, even if the executable architecture is
# the 64-bit variant
if sys.maxsize >= 2**32:
machine = 'x86_64'
elif machine in ('PowerPC', 'Power_Macintosh'):
# Pick a sane name for the PPC architecture.
# See 'i386' case
if sys.maxsize >= 2**32:
machine = 'ppc64'
else:
machine = 'ppc'
return (osname, release, machine)
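# --- Editor's note (not part of the original module): the arch-tuple to
# machine mapping coded as an elif chain above, restated as a lookup
# table; keys are sorted '-arch' values pulled from CFLAGS. ---
_DEMO_ARCH_TO_MACHINE = {
    ('i386', 'ppc'): 'fat',
    ('i386', 'x86_64'): 'intel',
    ('i386', 'ppc', 'x86_64'): 'fat3',
    ('ppc64', 'x86_64'): 'fat64',
    ('i386', 'ppc', 'ppc64', 'x86_64'): 'universal',
}
def _demo_machine_from_cflags(cflags='-arch x86_64 -arch i386 -O3'):
    import re
    archs = tuple(sorted(set(re.findall(r'-arch\s+(\S+)', cflags))))
    if len(archs) == 1:
        return archs[0]
    return _DEMO_ARCH_TO_MACHINE.get(archs)   # ('i386', 'x86_64') -> 'intel'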
"""
Python implementation of the io module.
"""
import os
import abc
import codecs
import errno
# Import _thread instead of threading to reduce startup cost
try:
from _thread import allocate_lock as Lock
except ImportError:
from _dummy_thread import allocate_lock as Lock
import io
from io import (__all__, SEEK_SET, SEEK_CUR, SEEK_END)
valid_seek_flags = {0, 1, 2} # Hardwired values
if hasattr(os, 'SEEK_HOLE') :
valid_seek_flags.add(os.SEEK_HOLE)
valid_seek_flags.add(os.SEEK_DATA)
# open() uses st_blksize whenever we can
DEFAULT_BUFFER_SIZE = 8 * 1024 # bytes
# NOTE: Base classes defined here are registered with the "official" ABCs
# defined in io.py. We don't use real inheritance though, because we don't want
# to inherit the C implementations.
# Rebind for compatibility
BlockingIOError = BlockingIOError
def open(file, mode="r", buffering=-1, encoding=None, errors=None,
newline=None, closefd=True, opener=None):
r"""Open file and return a stream. Raise OSError upon failure.
file is either a text or byte string giving the name (and the path
if the file isn't in the current working directory) of the file to
be opened or an integer file descriptor of the file to be
wrapped. (If a file descriptor is given, it is closed when the
returned I/O object is closed, unless closefd is set to False.)
mode is an optional string that specifies the mode in which the file is
opened. It defaults to 'r' which means open for reading in text mode. Other
common values are 'w' for writing (truncating the file if it already
exists), 'x' for exclusive creation of a new file, and 'a' for appending
(which on some Unix systems, means that all writes append to the end of the
file regardless of the current seek position). In text mode, if encoding is
not specified the encoding used is platform dependent. (For reading and
writing raw bytes use binary mode and leave encoding unspecified.) The
available modes are:
========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r' open for reading (default)
'w' open for writing, truncating the file first
'x' create a new file and open it for writing
'a' open for writing, appending to the end of the file if it exists
'b' binary mode
't' text mode (default)
'+' open a disk file for updating (reading and writing)
'U' universal newline mode (deprecated)
========= ===============================================================
The default mode is 'rt' (open for reading text). For binary random
access, the mode 'w+b' opens and truncates the file to 0 bytes, while
'r+b' opens the file without truncation. The 'x' mode implies 'w' and
    raises a `FileExistsError` if the file already exists.
Python distinguishes between files opened in binary and text modes,
even when the underlying operating system doesn't. Files opened in
binary mode (appending 'b' to the mode argument) return contents as
bytes objects without any decoding. In text mode (the default, or when
't' is appended to the mode argument), the contents of the file are
returned as strings, the bytes having been first decoded using a
platform-dependent encoding or using the specified encoding if given.
'U' mode is deprecated and will raise an exception in future versions
of Python. It has no effect in Python 3. Use newline to control
universal newlines mode.
buffering is an optional integer used to set the buffering policy.
Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
line buffering (only usable in text mode), and an integer > 1 to indicate
the size of a fixed-size chunk buffer. When no buffering argument is
given, the default buffering policy works as follows:
* Binary files are buffered in fixed-size chunks; the size of the buffer
is chosen using a heuristic trying to determine the underlying device's
"block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
On many systems, the buffer will typically be 4096 or 8192 bytes long.
* "Interactive" text files (files for which isatty() returns True)
use line buffering. Other text files use the policy described above
for binary files.
encoding is the str name of the encoding used to decode or encode the
file. This should only be used in text mode. The default encoding is
platform dependent, but any encoding supported by Python can be
passed. See the codecs module for the list of supported encodings.
errors is an optional string that specifies how encoding errors are to
be handled---this argument should not be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding error
(the default of None has the same effect), or pass 'ignore' to ignore
errors. (Note that ignoring encoding errors can lead to data loss.)
See the documentation for codecs.register for a list of the permitted
encoding error strings.
newline is a string controlling how universal newlines works (it only
applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works
as follows:
* On input, if newline is None, universal newlines mode is
enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
these are translated into '\n' before being returned to the
caller. If it is '', universal newline mode is enabled, but line
endings are returned to the caller untranslated. If it has any of
the other legal values, input lines are only terminated by the given
string, and the line ending is returned to the caller untranslated.
* On output, if newline is None, any '\n' characters written are
translated to the system default line separator, os.linesep. If
newline is '', no translation takes place. If newline is any of the
other legal values, any '\n' characters written are translated to
the given string.
    closefd is a bool. If closefd is False, the underlying file descriptor will
be kept open when the file is closed. This does not work when a file name is
given and must be True in that case.
The newly created file is non-inheritable.
A custom opener can be used by passing a callable as *opener*. The
underlying file descriptor for the file object is then obtained by calling
*opener* with (*file*, *flags*). *opener* must return an open file
descriptor (passing os.open as *opener* results in functionality similar to
passing None).
open() returns a file object whose type depends on the mode, and
through which the standard file operations such as reading and writing
are performed. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
a file in a binary mode, the returned class varies: in read binary
mode, it returns a BufferedReader; in write binary and append binary
modes, it returns a BufferedWriter, and in read/write mode, it returns
a BufferedRandom.
It is also possible to use a string or bytearray as a file for both
reading and writing. For strings StringIO can be used like a file
opened in a text mode, and for bytes a BytesIO can be used like a file
opened in a binary mode.
"""
if not isinstance(file, (str, bytes, int)):
raise TypeError("invalid file: %r" % file)
if not isinstance(mode, str):
raise TypeError("invalid mode: %r" % mode)
if not isinstance(buffering, int):
raise TypeError("invalid buffering: %r" % buffering)
if encoding is not None and not isinstance(encoding, str):
raise TypeError("invalid encoding: %r" % encoding)
if errors is not None and not isinstance(errors, str):
raise TypeError("invalid errors: %r" % errors)
modes = set(mode)
if modes - set("axrwb+tU") or len(mode) > len(modes):
raise ValueError("invalid mode: %r" % mode)
creating = "x" in modes
reading = "r" in modes
writing = "w" in modes
appending = "a" in modes
updating = "+" in modes
text = "t" in modes
binary = "b" in modes
if "U" in modes:
if creating or writing or appending:
raise ValueError("can't use U and writing mode at once")
import warnings
warnings.warn("'U' mode is deprecated",
DeprecationWarning, 2)
reading = True
if text and binary:
raise ValueError("can't have text and binary mode at once")
if creating + reading + writing + appending > 1:
raise ValueError("can't have read/write/append mode at once")
if not (creating or reading or writing or appending):
raise ValueError("must have exactly one of read/write/append mode")
if binary and encoding is not None:
raise ValueError("binary mode doesn't take an encoding argument")
if binary and errors is not None:
raise ValueError("binary mode doesn't take an errors argument")
if binary and newline is not None:
raise ValueError("binary mode doesn't take a newline argument")
raw = FileIO(file,
(creating and "x" or "") +
(reading and "r" or "") +
(writing and "w" or "") +
(appending and "a" or "") +
(updating and "+" or ""),
closefd, opener=opener)
result = raw
try:
line_buffering = False
if buffering == 1 or buffering < 0 and raw.isatty():
buffering = -1
line_buffering = True
if buffering < 0:
buffering = DEFAULT_BUFFER_SIZE
try:
bs = os.fstat(raw.fileno()).st_blksize
except (OSError, AttributeError):
pass
else:
if bs > 1:
buffering = bs
if buffering < 0:
raise ValueError("invalid buffering size")
if buffering == 0:
if binary:
return result
raise ValueError("can't have unbuffered text I/O")
if updating:
buffer = BufferedRandom(raw, buffering)
elif creating or writing or appending:
buffer = BufferedWriter(raw, buffering)
elif reading:
buffer = BufferedReader(raw, buffering)
else:
raise ValueError("unknown mode: %r" % mode)
result = buffer
if binary:
return result
text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
result = text
text.mode = mode
return result
except:
result.close()
raise
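# --- Editor's sketch (hypothetical usage, not part of the original
# module): exercising the open() defined above; 'demo.txt' is an
# invented scratch path. ---
def _demo_open_roundtrip(path='demo.txt'):
    with open(path, 'w', encoding='utf-8', newline='') as f:  # TextIOWrapper
        f.write('spam\n')
    with open(path, 'rb') as f:                               # BufferedReader
        return f.read()                                       # b'spam\n'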
class DocDescriptor:
"""Helper for builtins.open.__doc__
"""
def __get__(self, obj, typ):
return (
"open(file, mode='r', buffering=-1, encoding=None, "
"errors=None, newline=None, closefd=True)\n\n" +
open.__doc__)
class OpenWrapper:
"""Wrapper for builtins.open
Trick so that open won't become a bound method when stored
as a class variable (as dbm.dumb does).
See initstdio() in Python/pythonrun.c.
"""
__doc__ = DocDescriptor()
def __new__(cls, *args, **kwargs):
return open(*args, **kwargs)
# In normal operation, both `UnsupportedOperation`s should be bound to the
# same object.
try:
UnsupportedOperation = io.UnsupportedOperation
except AttributeError:
class UnsupportedOperation(ValueError, OSError):
pass
class IOBase(metaclass=abc.ABCMeta):
"""The abstract base class for all I/O classes, acting on streams of
bytes. There is no public constructor.
This class provides dummy implementations for many methods that
derived classes can override selectively; the default implementations
represent a file that cannot be read, written or seeked.
Even though IOBase does not declare read, readinto, or write because
their signatures will vary, implementations and clients should
consider those methods part of the interface. Also, implementations
may raise UnsupportedOperation when operations they do not support are
called.
The basic type used for binary data read from or written to a file is
bytes. bytearrays are accepted too, and in some cases (such as
readinto) needed. Text I/O classes work with str data.
Note that calling any method (even inquiries) on a closed stream is
undefined. Implementations may raise OSError in this case.
IOBase (and its subclasses) support the iterator protocol, meaning
that an IOBase object can be iterated over yielding the lines in a
stream.
IOBase also supports the :keyword:`with` statement. In this example,
fp is closed after the suite of the with statement is complete:
    with open('spam.txt', 'w') as fp:
fp.write('Spam and eggs!')
"""
### Internal ###
def _unsupported(self, name):
"""Internal: raise an OSError exception for unsupported operations."""
raise UnsupportedOperation("%s.%s() not supported" %
(self.__class__.__name__, name))
### Positioning ###
def seek(self, pos, whence=0):
"""Change stream position.
Change the stream position to byte offset pos. Argument pos is
interpreted relative to the position indicated by whence. Values
for whence are ints:
* 0 -- start of stream (the default); offset should be zero or positive
* 1 -- current stream position; offset may be negative
* 2 -- end of stream; offset is usually negative
Some operating systems / file systems could provide additional values.
Return an int indicating the new absolute position.
"""
self._unsupported("seek")
def tell(self):
"""Return an int indicating the current stream position."""
return self.seek(0, 1)
def truncate(self, pos=None):
"""Truncate file to size bytes.
Size defaults to the current IO position as reported by tell(). Return
the new size.
"""
self._unsupported("truncate")
### Flush and close ###
def flush(self):
"""Flush write buffers, if applicable.
This is not implemented for read-only and non-blocking streams.
"""
self._checkClosed()
# XXX Should this return the number of bytes written???
__closed = False
def close(self):
"""Flush and close the IO object.
This method has no effect if the file is already closed.
"""
if not self.__closed:
try:
self.flush()
finally:
self.__closed = True
def __del__(self):
"""Destructor. Calls close()."""
# The try/except block is in case this is called at program
# exit time, when it's possible that globals have already been
# deleted, and then the close() call might fail. Since
# there's nothing we can do about such failures and they annoy
# the end users, we suppress the traceback.
try:
self.close()
except:
pass
### Inquiries ###
def seekable(self):
"""Return a bool indicating whether object supports random access.
If False, seek(), tell() and truncate() will raise UnsupportedOperation.
This method may need to do a test seek().
"""
return False
def _checkSeekable(self, msg=None):
"""Internal: raise UnsupportedOperation if file is not seekable
"""
if not self.seekable():
raise UnsupportedOperation("File or stream is not seekable."
if msg is None else msg)
def readable(self):
"""Return a bool indicating whether object was opened for reading.
If False, read() will raise UnsupportedOperation.
"""
return False
def _checkReadable(self, msg=None):
"""Internal: raise UnsupportedOperation if file is not readable
"""
if not self.readable():
raise UnsupportedOperation("File or stream is not readable."
if msg is None else msg)
def writable(self):
"""Return a bool indicating whether object was opened for writing.
If False, write() and truncate() will raise UnsupportedOperation.
"""
return False
def _checkWritable(self, msg=None):
"""Internal: raise UnsupportedOperation if file is not writable
"""
if not self.writable():
raise UnsupportedOperation("File or stream is not writable."
if msg is None else msg)
@property
def closed(self):
"""closed: bool. True iff the file has been closed.
For backwards compatibility, this is a property, not a predicate.
"""
return self.__closed
def _checkClosed(self, msg=None):
"""Internal: raise an ValueError if file is closed
"""
if self.closed:
raise ValueError("I/O operation on closed file."
if msg is None else msg)
### Context manager ###
def __enter__(self): # That's a forward reference
"""Context management protocol. Returns self (an instance of IOBase)."""
self._checkClosed()
return self
def __exit__(self, *args):
"""Context management protocol. Calls close()"""
self.close()
### Lower-level APIs ###
# XXX Should these be present even if unimplemented?
def fileno(self):
"""Returns underlying file descriptor (an int) if one exists.
An OSError is raised if the IO object does not use a file descriptor.
"""
self._unsupported("fileno")
def isatty(self):
"""Return a bool indicating whether this is an 'interactive' stream.
Return False if it can't be determined.
"""
self._checkClosed()
return False
### Readline[s] and writelines ###
def readline(self, size=-1):
r"""Read and return a line of bytes from the stream.
If size is specified, at most size bytes will be read.
Size should be an int.
The line terminator is always b'\n' for binary files; for text
files, the newlines argument to open can be used to select the line
terminator(s) recognized.
"""
# For backwards compatibility, a (slowish) readline().
if hasattr(self, "peek"):
def nreadahead():
readahead = self.peek(1)
if not readahead:
return 1
n = (readahead.find(b"\n") + 1) or len(readahead)
if size >= 0:
n = min(n, size)
return n
else:
def nreadahead():
return 1
if size is None:
size = -1
elif not isinstance(size, int):
raise TypeError("size must be an integer")
res = bytearray()
while size < 0 or len(res) < size:
b = self.read(nreadahead())
if not b:
break
res += b
if res.endswith(b"\n"):
break
return bytes(res)
def __iter__(self):
self._checkClosed()
return self
def __next__(self):
line = self.readline()
if not line:
raise StopIteration
return line
def readlines(self, hint=None):
"""Return a list of lines from the stream.
hint can be specified to control the number of lines read: no more
lines will be read if the total size (in bytes/characters) of all
lines so far exceeds hint.
"""
if hint is None or hint <= 0:
return list(self)
n = 0
lines = []
for line in self:
lines.append(line)
n += len(line)
if n >= hint:
break
return lines
def writelines(self, lines):
self._checkClosed()
for line in lines:
self.write(line)
io.IOBase.register(IOBase)
class RawIOBase(IOBase):
"""Base class for raw binary I/O."""
# The read() method is implemented by calling readinto(); derived
# classes that want to support read() only need to implement
# readinto() as a primitive operation. In general, readinto() can be
# more efficient than read().
# (It would be tempting to also provide an implementation of
# readinto() in terms of read(), in case the latter is a more suitable
# primitive operation, but that would lead to nasty recursion in case
# a subclass doesn't implement either.)
def read(self, size=-1):
"""Read and return up to size bytes, where size is an int.
Returns an empty bytes object on EOF, or None if the object is
set not to block and has no data to read.
"""
if size is None:
size = -1
if size < 0:
return self.readall()
b = bytearray(size.__index__())
n = self.readinto(b)
if n is None:
return None
del b[n:]
return bytes(b)
def readall(self):
"""Read until EOF, using multiple read() call."""
res = bytearray()
while True:
data = self.read(DEFAULT_BUFFER_SIZE)
if not data:
break
res += data
if res:
return bytes(res)
else:
# b'' or None
return data
def readinto(self, b):
"""Read up to len(b) bytes into bytearray b.
Returns an int representing the number of bytes read (0 for EOF), or
None if the object is set not to block and has no data to read.
"""
self._unsupported("readinto")
def write(self, b):
"""Write the given buffer to the IO stream.
Returns the number of bytes written, which may be less than len(b).
"""
self._unsupported("write")
io.RawIOBase.register(RawIOBase)
from _io import FileIO
RawIOBase.register(FileIO)
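# --- Editor's sketch (not part of the original module): a minimal
# RawIOBase subclass; only readinto() and readable() are supplied, and
# the read()/readall() defaults above do the rest. ---
class _DemoRawBytes(RawIOBase):
    def __init__(self, data):
        self._data = bytearray(data)
    def readable(self):
        return True
    def readinto(self, b):
        n = min(len(b), len(self._data))  # copy out up to len(b) bytes
        b[:n] = self._data[:n]
        del self._data[:n]
        return n
# _DemoRawBytes(b'spam').read()  ->  b'spam' (via readall() above)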
class BufferedIOBase(IOBase):
"""Base class for buffered IO objects.
The main difference with RawIOBase is that the read() method
supports omitting the size argument, and does not have a default
implementation that defers to readinto().
In addition, read(), readinto() and write() may raise
BlockingIOError if the underlying raw stream is in non-blocking
mode and not ready; unlike their raw counterparts, they will never
return None.
A typical implementation should not inherit from a RawIOBase
implementation, but wrap one.
"""
def read(self, size=None):
"""Read and return up to size bytes, where size is an int.
If the argument is omitted, None, or negative, reads and
returns all data until EOF.
If the argument is positive, and the underlying raw stream is
not 'interactive', multiple raw reads may be issued to satisfy
the byte count (unless EOF is reached first). But for
interactive raw streams (XXX and for pipes?), at most one raw
read will be issued, and a short result does not imply that
EOF is imminent.
Returns an empty bytes array on EOF.
Raises BlockingIOError if the underlying raw stream has no
data at the moment.
"""
self._unsupported("read")
def read1(self, size=None):
"""Read up to size bytes with at most one read() system call,
where size is an int.
"""
self._unsupported("read1")
def readinto(self, b):
"""Read up to len(b) bytes into bytearray b.
Like read(), this may issue multiple reads to the underlying raw
stream, unless the latter is 'interactive'.
Returns an int representing the number of bytes read (0 for EOF).
Raises BlockingIOError if the underlying raw stream has no
data at the moment.
"""
# XXX This ought to work with anything that supports the buffer API
data = self.read(len(b))
n = len(data)
try:
b[:n] = data
except TypeError as err:
import array
if not isinstance(b, array.array):
raise err
b[:n] = array.array('b', data)
return n
def write(self, b):
"""Write the given bytes buffer to the IO stream.
Return the number of bytes written, which is never less than
len(b).
Raises BlockingIOError if the buffer is full and the
underlying raw stream cannot accept more data at the moment.
"""
self._unsupported("write")
def detach(self):
"""
Separate the underlying raw stream from the buffer and return it.
After the raw stream has been detached, the buffer is in an unusable
state.
"""
self._unsupported("detach")
io.BufferedIOBase.register(BufferedIOBase)
class _BufferedIOMixin(BufferedIOBase):
"""A mixin implementation of BufferedIOBase with an underlying raw stream.
This passes most requests on to the underlying raw stream. It
does *not* provide implementations of read(), readinto() or
write().
"""
def __init__(self, raw):
self._raw = raw
### Positioning ###
def seek(self, pos, whence=0):
new_position = self.raw.seek(pos, whence)
if new_position < 0:
raise OSError("seek() returned an invalid position")
return new_position
def tell(self):
pos = self.raw.tell()
if pos < 0:
raise OSError("tell() returned an invalid position")
return pos
def truncate(self, pos=None):
# Flush the stream. We're mixing buffered I/O with lower-level I/O,
# and a flush may be necessary to synch both views of the current
# file state.
self.flush()
if pos is None:
pos = self.tell()
# XXX: Should seek() be used, instead of passing the position
# XXX directly to truncate?
return self.raw.truncate(pos)
### Flush and close ###
def flush(self):
if self.closed:
raise ValueError("flush of closed file")
self.raw.flush()
def close(self):
if self.raw is not None and not self.closed:
try:
# may raise BlockingIOError or BrokenPipeError etc
self.flush()
finally:
self.raw.close()
def detach(self):
if self.raw is None:
raise ValueError("raw stream already detached")
self.flush()
raw = self._raw
self._raw = None
return raw
### Inquiries ###
def seekable(self):
return self.raw.seekable()
def readable(self):
return self.raw.readable()
def writable(self):
return self.raw.writable()
@property
def raw(self):
return self._raw
@property
def closed(self):
return self.raw.closed
@property
def name(self):
return self.raw.name
@property
def mode(self):
return self.raw.mode
def __getstate__(self):
raise TypeError("can not serialize a '{0}' object"
.format(self.__class__.__name__))
def __repr__(self):
clsname = self.__class__.__name__
try:
name = self.name
except Exception:
return "<_pyio.{0}>".format(clsname)
else:
return "<_pyio.{0} name={1!r}>".format(clsname, name)
### Lower-level APIs ###
def fileno(self):
return self.raw.fileno()
def isatty(self):
return self.raw.isatty()
class BytesIO(BufferedIOBase):
"""Buffered I/O implementation using an in-memory bytes buffer."""
def __init__(self, initial_bytes=None):
buf = bytearray()
if initial_bytes is not None:
buf += initial_bytes
self._buffer = buf
self._pos = 0
def __getstate__(self):
if self.closed:
raise ValueError("__getstate__ on closed file")
return self.__dict__.copy()
def getvalue(self):
"""Return the bytes value (contents) of the buffer
"""
if self.closed:
raise ValueError("getvalue on closed file")
return bytes(self._buffer)
def getbuffer(self):
"""Return a readable and writable view of the buffer.
"""
if self.closed:
raise ValueError("getbuffer on closed file")
return memoryview(self._buffer)
def close(self):
self._buffer.clear()
super().close()
def read(self, size=None):
if self.closed:
raise ValueError("read from closed file")
if size is None:
size = -1
if size < 0:
size = len(self._buffer)
if len(self._buffer) <= self._pos:
return b""
newpos = min(len(self._buffer), self._pos + size)
b = self._buffer[self._pos : newpos]
self._pos = newpos
return bytes(b)
def read1(self, size):
"""This is the same as read.
"""
return self.read(size)
def write(self, b):
if self.closed:
raise ValueError("write to closed file")
if isinstance(b, str):
raise TypeError("can't write str to binary stream")
n = len(b)
if n == 0:
return 0
pos = self._pos
if pos > len(self._buffer):
# Inserts null bytes between the current end of the file
# and the new write position.
padding = b'\x00' * (pos - len(self._buffer))
self._buffer += padding
self._buffer[pos:pos + n] = b
self._pos += n
return n
def seek(self, pos, whence=0):
if self.closed:
raise ValueError("seek on closed file")
try:
pos.__index__
except AttributeError as err:
raise TypeError("an integer is required") from err
if whence == 0:
if pos < 0:
raise ValueError("negative seek position %r" % (pos,))
self._pos = pos
elif whence == 1:
self._pos = max(0, self._pos + pos)
elif whence == 2:
self._pos = max(0, len(self._buffer) + pos)
else:
raise ValueError("unsupported whence value")
return self._pos
def tell(self):
if self.closed:
raise ValueError("tell on closed file")
return self._pos
def truncate(self, pos=None):
if self.closed:
raise ValueError("truncate on closed file")
if pos is None:
pos = self._pos
else:
try:
pos.__index__
except AttributeError as err:
raise TypeError("an integer is required") from err
if pos < 0:
raise ValueError("negative truncate position %r" % (pos,))
del self._buffer[pos:]
return pos
def readable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return True
def writable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return True
def seekable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return True
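# --- Editor's sketch (not part of the original module): the zero-fill
# behaviour implemented by BytesIO.write() above when the position has
# been seeked past the end of the buffer. ---
def _demo_bytesio_gap():
    b = BytesIO()
    b.seek(3)            # position beyond the current end
    b.write(b'x')        # write() pads the gap with null bytes
    return b.getvalue()  # b'\x00\x00\x00x'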
class BufferedReader(_BufferedIOMixin):
"""BufferedReader(raw[, buffer_size])
    A buffer for a readable, sequential RawIOBase object.
The constructor creates a BufferedReader for the given readable raw
stream and buffer_size. If buffer_size is omitted, DEFAULT_BUFFER_SIZE
is used.
"""
def __init__(self, raw, buffer_size=DEFAULT_BUFFER_SIZE):
"""Create a new buffered reader using the given readable raw IO object.
"""
if not raw.readable():
raise OSError('"raw" argument must be readable.')
_BufferedIOMixin.__init__(self, raw)
if buffer_size <= 0:
raise ValueError("invalid buffer size")
self.buffer_size = buffer_size
self._reset_read_buf()
self._read_lock = Lock()
def _reset_read_buf(self):
self._read_buf = b""
self._read_pos = 0
def read(self, size=None):
"""Read size bytes.
Returns exactly size bytes of data unless the underlying raw IO
stream reaches EOF or if the call would block in non-blocking
mode. If size is negative, read until EOF or until read() would
block.
"""
if size is not None and size < -1:
raise ValueError("invalid number of bytes to read")
with self._read_lock:
return self._read_unlocked(size)
def _read_unlocked(self, n=None):
nodata_val = b""
empty_values = (b"", None)
buf = self._read_buf
pos = self._read_pos
# Special case for when the number of bytes to read is unspecified.
if n is None or n == -1:
self._reset_read_buf()
if hasattr(self.raw, 'readall'):
chunk = self.raw.readall()
if chunk is None:
return buf[pos:] or None
else:
return buf[pos:] + chunk
chunks = [buf[pos:]] # Strip the consumed bytes.
current_size = 0
while True:
# Read until EOF or until read() would block.
try:
chunk = self.raw.read()
except InterruptedError:
continue
if chunk in empty_values:
nodata_val = chunk
break
current_size += len(chunk)
chunks.append(chunk)
return b"".join(chunks) or nodata_val
# The number of bytes to read is specified, return at most n bytes.
avail = len(buf) - pos # Length of the available buffered data.
if n <= avail:
# Fast path: the data to read is fully buffered.
self._read_pos += n
return buf[pos:pos+n]
# Slow path: read from the stream until enough bytes are read,
# or until an EOF occurs or until read() would block.
chunks = [buf[pos:]]
wanted = max(self.buffer_size, n)
while avail < n:
try:
chunk = self.raw.read(wanted)
except InterruptedError:
continue
if chunk in empty_values:
nodata_val = chunk
break
avail += len(chunk)
chunks.append(chunk)
        # n is more than avail only when an EOF occurred or when
# read() would have blocked.
n = min(n, avail)
out = b"".join(chunks)
self._read_buf = out[n:] # Save the extra data in the buffer.
self._read_pos = 0
return out[:n] if out else nodata_val
def peek(self, size=0):
"""Returns buffered bytes without advancing the position.
The argument indicates a desired minimal number of bytes; we
do at most one raw read to satisfy it. We never return more
than self.buffer_size.
"""
with self._read_lock:
return self._peek_unlocked(size)
def _peek_unlocked(self, n=0):
want = min(n, self.buffer_size)
have = len(self._read_buf) - self._read_pos
if have < want or have <= 0:
to_read = self.buffer_size - have
while True:
try:
current = self.raw.read(to_read)
except InterruptedError:
continue
break
if current:
self._read_buf = self._read_buf[self._read_pos:] + current
self._read_pos = 0
return self._read_buf[self._read_pos:]
def read1(self, size):
"""Reads up to size bytes, with at most one read() system call."""
# Returns up to size bytes. If at least one byte is buffered, we
# only return buffered bytes. Otherwise, we do one raw read.
if size < 0:
raise ValueError("number of bytes to read must be positive")
if size == 0:
return b""
with self._read_lock:
self._peek_unlocked(1)
return self._read_unlocked(
min(size, len(self._read_buf) - self._read_pos))
def tell(self):
return _BufferedIOMixin.tell(self) - len(self._read_buf) + self._read_pos
def seek(self, pos, whence=0):
if whence not in valid_seek_flags:
raise ValueError("invalid whence value")
with self._read_lock:
if whence == 1:
pos -= len(self._read_buf) - self._read_pos
pos = _BufferedIOMixin.seek(self, pos, whence)
self._reset_read_buf()
return pos
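# --- Editor's sketch (not part of the original module): peek() versus
# read1() on the BufferedReader above, backed by an in-memory stream. ---
def _demo_buffered_reader():
    r = BufferedReader(BytesIO(b'hello world'), buffer_size=4)
    head = r.peek(1)    # buffered bytes; the position does not advance
    first = r.read1(8)  # at most one raw read; may return fewer than 8 bytes
    return head, first  # (b'hell', b'hell') with a 4-byte buffer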
class BufferedWriter(_BufferedIOMixin):
"""A buffer for a writeable sequential RawIO object.
The constructor creates a BufferedWriter for the given writeable raw
stream. If the buffer_size is not given, it defaults to
DEFAULT_BUFFER_SIZE.
"""
def __init__(self, raw, buffer_size=DEFAULT_BUFFER_SIZE):
if not raw.writable():
raise OSError('"raw" argument must be writable.')
_BufferedIOMixin.__init__(self, raw)
if buffer_size <= 0:
raise ValueError("invalid buffer size")
self.buffer_size = buffer_size
self._write_buf = bytearray()
self._write_lock = Lock()
def write(self, b):
if self.closed:
raise ValueError("write to closed file")
if isinstance(b, str):
raise TypeError("can't write str to binary stream")
with self._write_lock:
# XXX we can implement some more tricks to try and avoid
# partial writes
if len(self._write_buf) > self.buffer_size:
# We're full, so let's pre-flush the buffer. (This may
# raise BlockingIOError with characters_written == 0.)
self._flush_unlocked()
before = len(self._write_buf)
self._write_buf.extend(b)
written = len(self._write_buf) - before
if len(self._write_buf) > self.buffer_size:
try:
self._flush_unlocked()
except BlockingIOError as e:
if len(self._write_buf) > self.buffer_size:
# We've hit the buffer_size. We have to accept a partial
# write and cut back our buffer.
overage = len(self._write_buf) - self.buffer_size
written -= overage
self._write_buf = self._write_buf[:self.buffer_size]
raise BlockingIOError(e.errno, e.strerror, written)
return written
def truncate(self, pos=None):
with self._write_lock:
self._flush_unlocked()
if pos is None:
pos = self.raw.tell()
return self.raw.truncate(pos)
def flush(self):
with self._write_lock:
self._flush_unlocked()
def _flush_unlocked(self):
if self.closed:
raise ValueError("flush of closed file")
while self._write_buf:
try:
n = self.raw.write(self._write_buf)
except InterruptedError:
continue
except BlockingIOError:
raise RuntimeError("self.raw should implement RawIOBase: it "
"should not raise BlockingIOError")
if n is None:
raise BlockingIOError(
errno.EAGAIN,
"write could not complete without blocking", 0)
if n > len(self._write_buf) or n < 0:
raise OSError("write() returned incorrect number of bytes")
del self._write_buf[:n]
def tell(self):
return _BufferedIOMixin.tell(self) + len(self._write_buf)
def seek(self, pos, whence=0):
if whence not in valid_seek_flags:
raise ValueError("invalid whence value")
with self._write_lock:
self._flush_unlocked()
return _BufferedIOMixin.seek(self, pos, whence)
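# --- Editor's sketch (not part of the original module): writes sit in
# the BufferedWriter's buffer until flush() pushes them to the raw
# stream. ---
def _demo_buffered_writer():
    raw = BytesIO()
    w = BufferedWriter(raw, buffer_size=16)
    w.write(b'abc')                   # buffered; below buffer_size
    unflushed = raw.getvalue()        # still b''
    w.flush()
    return unflushed, raw.getvalue()  # (b'', b'abc')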
class BufferedRWPair(BufferedIOBase):
"""A buffered reader and writer object together.
A buffered reader object and buffered writer object put together to
form a sequential IO object that can read and write. This is typically
used with a socket or two-way pipe.
reader and writer are RawIOBase objects that are readable and
writeable respectively. If the buffer_size is omitted it defaults to
DEFAULT_BUFFER_SIZE.
"""
# XXX The usefulness of this (compared to having two separate IO
# objects) is questionable.
def __init__(self, reader, writer, buffer_size=DEFAULT_BUFFER_SIZE):
"""Constructor.
The arguments are two RawIO instances.
"""
if not reader.readable():
raise OSError('"reader" argument must be readable.')
if not writer.writable():
raise OSError('"writer" argument must be writable.')
self.reader = BufferedReader(reader, buffer_size)
self.writer = BufferedWriter(writer, buffer_size)
def read(self, size=None):
if size is None:
size = -1
return self.reader.read(size)
def readinto(self, b):
return self.reader.readinto(b)
def write(self, b):
return self.writer.write(b)
def peek(self, size=0):
return self.reader.peek(size)
def read1(self, size):
return self.reader.read1(size)
def readable(self):
return self.reader.readable()
def writable(self):
return self.writer.writable()
def flush(self):
return self.writer.flush()
def close(self):
try:
self.writer.close()
finally:
self.reader.close()
def isatty(self):
return self.reader.isatty() or self.writer.isatty()
@property
def closed(self):
return self.writer.closed
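# --- Editor's sketch (not part of the original module): a BufferedRWPair
# over two in-memory streams standing in for the two ends of a pipe. ---
def _demo_rw_pair():
    reader, writer = BytesIO(b'ping'), BytesIO()
    pair = BufferedRWPair(reader, writer)
    received = pair.read(4)             # reads come from `reader`
    pair.write(b'pong')
    pair.flush()                        # writes land in `writer`
    return received, writer.getvalue()  # (b'ping', b'pong')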
class BufferedRandom(BufferedWriter, BufferedReader):
"""A buffered interface to random access streams.
The constructor creates a reader and writer for a seekable stream,
raw, given in the first argument. If the buffer_size is omitted it
defaults to DEFAULT_BUFFER_SIZE.
"""
def __init__(self, raw, buffer_size=DEFAULT_BUFFER_SIZE):
raw._checkSeekable()
BufferedReader.__init__(self, raw, buffer_size)
BufferedWriter.__init__(self, raw, buffer_size)
def seek(self, pos, whence=0):
if whence not in valid_seek_flags:
raise ValueError("invalid whence value")
self.flush()
if self._read_buf:
# Undo read ahead.
with self._read_lock:
self.raw.seek(self._read_pos - len(self._read_buf), 1)
# First do the raw seek, then empty the read buffer, so that
# if the raw seek fails, we don't lose buffered data forever.
pos = self.raw.seek(pos, whence)
with self._read_lock:
self._reset_read_buf()
if pos < 0:
raise OSError("seek() returned invalid position")
return pos
def tell(self):
if self._write_buf:
return BufferedWriter.tell(self)
else:
return BufferedReader.tell(self)
def truncate(self, pos=None):
if pos is None:
pos = self.tell()
# Use seek to flush the read buffer.
return BufferedWriter.truncate(self, pos)
def read(self, size=None):
if size is None:
size = -1
self.flush()
return BufferedReader.read(self, size)
def readinto(self, b):
self.flush()
return BufferedReader.readinto(self, b)
def peek(self, size=0):
self.flush()
return BufferedReader.peek(self, size)
def read1(self, size):
self.flush()
return BufferedReader.read1(self, size)
def write(self, b):
if self._read_buf:
# Undo readahead
with self._read_lock:
self.raw.seek(self._read_pos - len(self._read_buf), 1)
self._reset_read_buf()
return BufferedWriter.write(self, b)
class TextIOBase(IOBase):
"""Base class for text I/O.
This class provides a character and line based interface to stream
I/O. There is no readinto method because Python's character strings
are immutable. There is no public constructor.
"""
def read(self, size=-1):
"""Read at most size characters from stream, where size is an int.
Read from underlying buffer until we have size characters or we hit EOF.
If size is negative or omitted, read until EOF.
Returns a string.
"""
self._unsupported("read")
def write(self, s):
"""Write string s to stream and returning an int."""
self._unsupported("write")
def truncate(self, pos=None):
"""Truncate size to pos, where pos is an int."""
self._unsupported("truncate")
def readline(self):
"""Read until newline or EOF.
Returns an empty string if EOF is hit immediately.
"""
self._unsupported("readline")
def detach(self):
"""
Separate the underlying buffer from the TextIOBase and return it.
After the underlying buffer has been detached, the TextIO is in an
unusable state.
"""
self._unsupported("detach")
@property
def encoding(self):
"""Subclasses should override."""
return None
@property
def newlines(self):
"""Line endings translated so far.
Only line endings translated during reading are considered.
Subclasses should override.
"""
return None
@property
def errors(self):
"""Error setting of the decoder or encoder.
Subclasses should override."""
return None
io.TextIOBase.register(TextIOBase)
class IncrementalNewlineDecoder(codecs.IncrementalDecoder):
r"""Codec used when reading a file in universal newlines mode. It wraps
another incremental decoder, translating \r\n and \r into \n. It also
records the types of newlines encountered. When used with
translate=False, it ensures that the newline sequence is returned in
one piece.
"""
def __init__(self, decoder, translate, errors='strict'):
codecs.IncrementalDecoder.__init__(self, errors=errors)
self.translate = translate
self.decoder = decoder
self.seennl = 0
self.pendingcr = False
def decode(self, input, final=False):
# decode input (with the eventual \r from a previous pass)
if self.decoder is None:
output = input
else:
output = self.decoder.decode(input, final=final)
if self.pendingcr and (output or final):
output = "\r" + output
self.pendingcr = False
# retain last \r even when not translating data:
# then readline() is sure to get \r\n in one pass
if output.endswith("\r") and not final:
output = output[:-1]
self.pendingcr = True
# Record which newlines are read
crlf = output.count('\r\n')
cr = output.count('\r') - crlf
lf = output.count('\n') - crlf
self.seennl |= (lf and self._LF) | (cr and self._CR) \
| (crlf and self._CRLF)
if self.translate:
if crlf:
output = output.replace("\r\n", "\n")
if cr:
output = output.replace("\r", "\n")
return output
def getstate(self):
if self.decoder is None:
buf = b""
flag = 0
else:
buf, flag = self.decoder.getstate()
flag <<= 1
if self.pendingcr:
flag |= 1
return buf, flag
def setstate(self, state):
buf, flag = state
self.pendingcr = bool(flag & 1)
if self.decoder is not None:
self.decoder.setstate((buf, flag >> 1))
def reset(self):
self.seennl = 0
self.pendingcr = False
if self.decoder is not None:
self.decoder.reset()
_LF = 1
_CR = 2
_CRLF = 4
@property
def newlines(self):
return (None,
"\n",
"\r",
("\r", "\n"),
"\r\n",
("\n", "\r\n"),
("\r", "\r\n"),
("\r", "\n", "\r\n")
)[self.seennl]
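# --- Editor's sketch (not part of the original module): the decoder
# above holding back a trailing '\r' so that a '\r\n' split across two
# chunks is still translated as a single newline. ---
def _demo_newline_decoder():
    d = IncrementalNewlineDecoder(None, translate=True)
    out = d.decode('a\r')               # trailing '\r' is held back...
    out += d.decode('\nb', final=True)  # ...and rejoined with '\n' here
    return out, d.newlines              # ('a\nb', '\r\n')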
class TextIOWrapper(TextIOBase):
r"""Character and line based layer over a BufferedIOBase object, buffer.
encoding gives the name of the encoding that the stream will be
decoded or encoded with. It defaults to locale.getpreferredencoding(False).
errors determines the strictness of encoding and decoding (see the
codecs.register) and defaults to "strict".
newline can be None, '', '\n', '\r', or '\r\n'. It controls the
handling of line endings. If it is None, universal newlines is
enabled. With this enabled, on input, the lines endings '\n', '\r',
or '\r\n' are translated to '\n' before being returned to the
caller. Conversely, on output, '\n' is translated to the system
default line separator, os.linesep. If newline is any other of its
legal values, that newline becomes the newline when the file is read
and it is returned untranslated. On output, '\n' is converted to the
newline.
If line_buffering is True, a call to flush is implied when a call to
write contains a newline character.
"""
_CHUNK_SIZE = 2048
# The write_through argument has no effect here since this
# implementation always writes through. The argument is present only
# so that the signature can match the signature of the C version.
def __init__(self, buffer, encoding=None, errors=None, newline=None,
line_buffering=False, write_through=False):
if newline is not None and not isinstance(newline, str):
raise TypeError("illegal newline type: %r" % (type(newline),))
if newline not in (None, "", "\n", "\r", "\r\n"):
raise ValueError("illegal newline value: %r" % (newline,))
if encoding is None:
try:
encoding = os.device_encoding(buffer.fileno())
except (AttributeError, UnsupportedOperation):
pass
if encoding is None:
try:
import locale
except ImportError:
# Importing locale may fail if Python is being built
encoding = "ascii"
else:
encoding = locale.getpreferredencoding(False)
if not isinstance(encoding, str):
raise ValueError("invalid encoding: %r" % encoding)
if not codecs.lookup(encoding)._is_text_encoding:
msg = ("%r is not a text encoding; "
"use codecs.open() to handle arbitrary codecs")
raise LookupError(msg % encoding)
if errors is None:
errors = "strict"
else:
if not isinstance(errors, str):
raise ValueError("invalid errors: %r" % errors)
self._buffer = buffer
self._line_buffering = line_buffering
self._encoding = encoding
self._errors = errors
self._readuniversal = not newline
self._readtranslate = newline is None
self._readnl = newline
self._writetranslate = newline != ''
self._writenl = newline or os.linesep
self._encoder = None
self._decoder = None
self._decoded_chars = '' # buffer for text returned from decoder
self._decoded_chars_used = 0 # offset into _decoded_chars for read()
self._snapshot = None # info for reconstructing decoder state
self._seekable = self._telling = self.buffer.seekable()
self._has_read1 = hasattr(self.buffer, 'read1')
self._b2cratio = 0.0
if self._seekable and self.writable():
position = self.buffer.tell()
if position != 0:
try:
self._get_encoder().setstate(0)
except LookupError:
# Sometimes the encoder doesn't exist
pass
# self._snapshot is either None, or a tuple (dec_flags, next_input)
# where dec_flags is the second (integer) item of the decoder state
# and next_input is the chunk of input bytes that comes next after the
# snapshot point. We use this to reconstruct decoder states in tell().
# Naming convention:
# - "bytes_..." for integer variables that count input bytes
# - "chars_..." for integer variables that count decoded characters
def __repr__(self):
result = "<_pyio.TextIOWrapper"
try:
name = self.name
except Exception:
pass
else:
result += " name={0!r}".format(name)
try:
mode = self.mode
except Exception:
pass
else:
result += " mode={0!r}".format(mode)
return result + " encoding={0!r}>".format(self.encoding)
@property
def encoding(self):
return self._encoding
@property
def errors(self):
return self._errors
@property
def line_buffering(self):
return self._line_buffering
@property
def buffer(self):
return self._buffer
def seekable(self):
if self.closed:
raise ValueError("I/O operation on closed file.")
return self._seekable
def readable(self):
return self.buffer.readable()
def writable(self):
return self.buffer.writable()
def flush(self):
self.buffer.flush()
self._telling = self._seekable
def close(self):
if self.buffer is not None and not self.closed:
try:
self.flush()
finally:
self.buffer.close()
@property
def closed(self):
return self.buffer.closed
@property
def name(self):
return self.buffer.name
def fileno(self):
return self.buffer.fileno()
def isatty(self):
return self.buffer.isatty()
def write(self, s):
'Write data, where s is a str'
if self.closed:
raise ValueError("write to closed file")
if not isinstance(s, str):
raise TypeError("can't write %s to text stream" %
s.__class__.__name__)
length = len(s)
haslf = (self._writetranslate or self._line_buffering) and "\n" in s
if haslf and self._writetranslate and self._writenl != "\n":
s = s.replace("\n", self._writenl)
encoder = self._encoder or self._get_encoder()
# XXX What if we were just reading?
b = encoder.encode(s)
self.buffer.write(b)
if self._line_buffering and (haslf or "\r" in s):
self.flush()
self._snapshot = None
if self._decoder:
self._decoder.reset()
return length
def _get_encoder(self):
make_encoder = codecs.getincrementalencoder(self._encoding)
self._encoder = make_encoder(self._errors)
return self._encoder
def _get_decoder(self):
make_decoder = codecs.getincrementaldecoder(self._encoding)
decoder = make_decoder(self._errors)
if self._readuniversal:
decoder = IncrementalNewlineDecoder(decoder, self._readtranslate)
self._decoder = decoder
return decoder
# The following three methods implement an ADT for _decoded_chars.
# Text returned from the decoder is buffered here until the client
# requests it by calling our read() or readline() method.
def _set_decoded_chars(self, chars):
"""Set the _decoded_chars buffer."""
self._decoded_chars = chars
self._decoded_chars_used = 0
def _get_decoded_chars(self, n=None):
"""Advance into the _decoded_chars buffer."""
offset = self._decoded_chars_used
if n is None:
chars = self._decoded_chars[offset:]
else:
chars = self._decoded_chars[offset:offset + n]
self._decoded_chars_used += len(chars)
return chars
def _rewind_decoded_chars(self, n):
"""Rewind the _decoded_chars buffer."""
if self._decoded_chars_used < n:
raise AssertionError("rewind decoded_chars out of bounds")
self._decoded_chars_used -= n
def _read_chunk(self):
"""
Read and decode the next chunk of data from the BufferedReader.
"""
# The return value is True unless EOF was reached. The decoded
# string is placed in self._decoded_chars (replacing its previous
# value). The entire input chunk is sent to the decoder, though
# some of it may remain buffered in the decoder, yet to be
# converted.
if self._decoder is None:
raise ValueError("no decoder")
if self._telling:
# To prepare for tell(), we need to snapshot a point in the
# file where the decoder's input buffer is empty.
dec_buffer, dec_flags = self._decoder.getstate()
# Given this, we know there was a valid snapshot point
# len(dec_buffer) bytes ago with decoder state (b'', dec_flags).
# Read a chunk, decode it, and put the result in self._decoded_chars.
if self._has_read1:
input_chunk = self.buffer.read1(self._CHUNK_SIZE)
else:
input_chunk = self.buffer.read(self._CHUNK_SIZE)
eof = not input_chunk
decoded_chars = self._decoder.decode(input_chunk, eof)
self._set_decoded_chars(decoded_chars)
if decoded_chars:
self._b2cratio = len(input_chunk) / len(self._decoded_chars)
else:
self._b2cratio = 0.0
if self._telling:
# At the snapshot point, len(dec_buffer) bytes before the read,
# the next input to be decoded is dec_buffer + input_chunk.
self._snapshot = (dec_flags, dec_buffer + input_chunk)
return not eof
def _pack_cookie(self, position, dec_flags=0,
bytes_to_feed=0, need_eof=0, chars_to_skip=0):
# The meaning of a tell() cookie is: seek to position, set the
# decoder flags to dec_flags, read bytes_to_feed bytes, feed them
# into the decoder with need_eof as the EOF flag, then skip
# chars_to_skip characters of the decoded result. For most simple
# decoders, tell() will often just give a byte offset in the file.
return (position | (dec_flags<<64) | (bytes_to_feed<<128) |
(chars_to_skip<<192) | bool(need_eof)<<256)
def _unpack_cookie(self, bigint):
rest, position = divmod(bigint, 1<<64)
rest, dec_flags = divmod(rest, 1<<64)
rest, bytes_to_feed = divmod(rest, 1<<64)
need_eof, chars_to_skip = divmod(rest, 1<<64)
return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip
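    # Editor's worked example (illustrative values, not from the original
    # source): _pack_cookie(position=10, dec_flags=1, bytes_to_feed=2,
    # chars_to_skip=3) returns 10 | (1 << 64) | (2 << 128) | (3 << 192),
    # and _unpack_cookie() recovers (10, 1, 2, 0, 3) by peeling off one
    # 64-bit field per divmod(..., 1 << 64).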
def tell(self):
if not self._seekable:
raise UnsupportedOperation("underlying stream is not seekable")
if not self._telling:
raise OSError("telling position disabled by next() call")
self.flush()
position = self.buffer.tell()
decoder = self._decoder
if decoder is None or self._snapshot is None:
if self._decoded_chars:
# This should never happen.
raise AssertionError("pending decoded text")
return position
# Skip backward to the snapshot point (see _read_chunk).
dec_flags, next_input = self._snapshot
position -= len(next_input)
# How many decoded characters have been used up since the snapshot?
chars_to_skip = self._decoded_chars_used
if chars_to_skip == 0:
# We haven't moved from the snapshot point.
return self._pack_cookie(position, dec_flags)
# Starting from the snapshot position, we will walk the decoder
# forward until it gives us enough decoded characters.
saved_state = decoder.getstate()
try:
# Fast search for an acceptable start point, close to our
# current pos.
# Rationale: calling decoder.decode() has a large overhead
# regardless of chunk size; we want the number of such calls to
# be O(1) in most situations (common decoders, non-crazy input).
# Actually, it will be exactly 1 for fixed-size codecs (all
# 8-bit codecs, also UTF-16 and UTF-32).
skip_bytes = int(self._b2cratio * chars_to_skip)
skip_back = 1
assert skip_bytes <= len(next_input)
while skip_bytes > 0:
decoder.setstate((b'', dec_flags))
                # Decode up to tentative start point
n = len(decoder.decode(next_input[:skip_bytes]))
if n <= chars_to_skip:
b, d = decoder.getstate()
if not b:
# Before pos and no bytes buffered in decoder => OK
dec_flags = d
chars_to_skip -= n
break
# Skip back by buffered amount and reset heuristic
skip_bytes -= len(b)
skip_back = 1
else:
# We're too far ahead, skip back a bit
skip_bytes -= skip_back
skip_back = skip_back * 2
else:
skip_bytes = 0
decoder.setstate((b'', dec_flags))
# Note our initial start point.
start_pos = position + skip_bytes
start_flags = dec_flags
if chars_to_skip == 0:
# We haven't moved from the start point.
return self._pack_cookie(start_pos, start_flags)
# Feed the decoder one byte at a time. As we go, note the
# nearest "safe start point" before the current location
# (a point where the decoder has nothing buffered, so seek()
# can safely start from there and advance to this location).
bytes_fed = 0
need_eof = 0
# Chars decoded since `start_pos`
chars_decoded = 0
for i in range(skip_bytes, len(next_input)):
bytes_fed += 1
chars_decoded += len(decoder.decode(next_input[i:i+1]))
dec_buffer, dec_flags = decoder.getstate()
if not dec_buffer and chars_decoded <= chars_to_skip:
# Decoder buffer is empty, so this is a safe start point.
start_pos += bytes_fed
chars_to_skip -= chars_decoded
start_flags, bytes_fed, chars_decoded = dec_flags, 0, 0
if chars_decoded >= chars_to_skip:
break
else:
# We didn't get enough decoded data; signal EOF to get more.
chars_decoded += len(decoder.decode(b'', final=True))
need_eof = 1
if chars_decoded < chars_to_skip:
raise OSError("can't reconstruct logical file position")
# The returned cookie corresponds to the last safe start point.
return self._pack_cookie(
start_pos, start_flags, bytes_fed, need_eof, chars_to_skip)
finally:
decoder.setstate(saved_state)
def truncate(self, pos=None):
self.flush()
if pos is None:
pos = self.tell()
return self.buffer.truncate(pos)
def detach(self):
if self.buffer is None:
raise ValueError("buffer is already detached")
self.flush()
buffer = self._buffer
self._buffer = None
return buffer
def seek(self, cookie, whence=0):
def _reset_encoder(position):
"""Reset the encoder (merely useful for proper BOM handling)"""
try:
encoder = self._encoder or self._get_encoder()
except LookupError:
# Sometimes the encoder doesn't exist
pass
else:
if position != 0:
encoder.setstate(0)
else:
encoder.reset()
        if self.closed:
            raise ValueError("seek on closed file")
if not self._seekable:
raise UnsupportedOperation("underlying stream is not seekable")
if whence == 1: # seek relative to current position
if cookie != 0:
raise UnsupportedOperation("can't do nonzero cur-relative seeks")
# Seeking to the current position should attempt to
# sync the underlying buffer with the current position.
whence = 0
cookie = self.tell()
if whence == 2: # seek relative to end of file
if cookie != 0:
raise UnsupportedOperation("can't do nonzero end-relative seeks")
self.flush()
position = self.buffer.seek(0, 2)
self._set_decoded_chars('')
self._snapshot = None
if self._decoder:
self._decoder.reset()
_reset_encoder(position)
return position
if whence != 0:
raise ValueError("unsupported whence (%r)" % (whence,))
if cookie < 0:
raise ValueError("negative seek position %r" % (cookie,))
self.flush()
# The strategy of seek() is to go back to the safe start point
# and replay the effect of read(chars_to_skip) from there.
start_pos, dec_flags, bytes_to_feed, need_eof, chars_to_skip = \
self._unpack_cookie(cookie)
# Seek back to the safe start point.
self.buffer.seek(start_pos)
self._set_decoded_chars('')
self._snapshot = None
# Restore the decoder to its state from the safe start point.
if cookie == 0 and self._decoder:
self._decoder.reset()
elif self._decoder or dec_flags or chars_to_skip:
self._decoder = self._decoder or self._get_decoder()
self._decoder.setstate((b'', dec_flags))
self._snapshot = (dec_flags, b'')
if chars_to_skip:
# Just like _read_chunk, feed the decoder and save a snapshot.
input_chunk = self.buffer.read(bytes_to_feed)
self._set_decoded_chars(
self._decoder.decode(input_chunk, need_eof))
self._snapshot = (dec_flags, input_chunk)
# Skip chars_to_skip of the decoded characters.
if len(self._decoded_chars) < chars_to_skip:
raise OSError("can't restore logical file position")
self._decoded_chars_used = chars_to_skip
_reset_encoder(cookie)
return cookie
def read(self, size=None):
self._checkReadable()
if size is None:
size = -1
decoder = self._decoder or self._get_decoder()
try:
size.__index__
except AttributeError as err:
raise TypeError("an integer is required") from err
if size < 0:
# Read everything.
result = (self._get_decoded_chars() +
decoder.decode(self.buffer.read(), final=True))
self._set_decoded_chars('')
self._snapshot = None
return result
else:
# Keep reading chunks until we have size characters to return.
eof = False
result = self._get_decoded_chars(size)
while len(result) < size and not eof:
eof = not self._read_chunk()
result += self._get_decoded_chars(size - len(result))
return result
def __next__(self):
self._telling = False
line = self.readline()
if not line:
self._snapshot = None
self._telling = self._seekable
raise StopIteration
return line
def readline(self, size=None):
if self.closed:
raise ValueError("read from closed file")
if size is None:
size = -1
elif not isinstance(size, int):
raise TypeError("size must be an integer")
# Grab all the decoded text (we will rewind any extra bits later).
line = self._get_decoded_chars()
start = 0
# Make the decoder if it doesn't already exist.
if not self._decoder:
self._get_decoder()
pos = endpos = None
while True:
if self._readtranslate:
# Newlines are already translated, only search for \n
pos = line.find('\n', start)
if pos >= 0:
endpos = pos + 1
break
else:
start = len(line)
elif self._readuniversal:
# Universal newline search. Find any of \r, \r\n, \n
# The decoder ensures that \r\n are not split in two pieces
# In C we'd look for these in parallel of course.
nlpos = line.find("\n", start)
crpos = line.find("\r", start)
if crpos == -1:
if nlpos == -1:
# Nothing found
start = len(line)
else:
# Found \n
endpos = nlpos + 1
break
elif nlpos == -1:
# Found lone \r
endpos = crpos + 1
break
elif nlpos < crpos:
# Found \n
endpos = nlpos + 1
break
elif nlpos == crpos + 1:
# Found \r\n
endpos = crpos + 2
break
else:
# Found \r
endpos = crpos + 1
break
else:
# non-universal
pos = line.find(self._readnl)
if pos >= 0:
endpos = pos + len(self._readnl)
break
if size >= 0 and len(line) >= size:
endpos = size # reached length size
break
            # No line ending seen yet - get more data
while self._read_chunk():
if self._decoded_chars:
break
if self._decoded_chars:
line += self._get_decoded_chars()
else:
# end of file
self._set_decoded_chars('')
self._snapshot = None
return line
if size >= 0 and endpos > size:
endpos = size # don't exceed size
# Rewind _decoded_chars to just after the line ending we found.
self._rewind_decoded_chars(len(line) - endpos)
return line[:endpos]
@property
def newlines(self):
return self._decoder.newlines if self._decoder else None
class StringIO(TextIOWrapper):
"""Text I/O implementation using an in-memory buffer.
    The initial_value argument sets the initial value of the object. The
    newline argument is like the one of TextIOWrapper's constructor.
"""
def __init__(self, initial_value="", newline="\n"):
super(StringIO, self).__init__(BytesIO(),
encoding="utf-8",
errors="surrogatepass",
newline=newline)
# Issue #5645: make universal newlines semantics the same as in the
# C version, even under Windows.
if newline is None:
self._writetranslate = False
if initial_value is not None:
if not isinstance(initial_value, str):
raise TypeError("initial_value must be str or None, not {0}"
.format(type(initial_value).__name__))
self.write(initial_value)
self.seek(0)
def getvalue(self):
self.flush()
decoder = self._decoder or self._get_decoder()
old_state = decoder.getstate()
decoder.reset()
try:
return decoder.decode(self.buffer.getvalue(), final=True)
finally:
decoder.setstate(old_state)
def __repr__(self):
# TextIOWrapper tells the encoding in its repr. In StringIO,
# that's an implementation detail.
return object.__repr__(self)
@property
def errors(self):
return None
@property
def encoding(self):
return None
def detach(self):
# This doesn't make sense on StringIO.
self._unsupported("detach")
"""
The objects used by the site module to add custom builtins.
"""
# Those objects are almost immortal and they keep a reference to their module
# globals. Defining them in the site module would keep too many references
# alive.
# Note this means this module should also avoid keeping things alive in its
# globals.
import sys
class Quitter(object):
def __init__(self, name, eof):
self.name = name
self.eof = eof
def __repr__(self):
return 'Use %s() or %s to exit' % (self.name, self.eof)
def __call__(self, code=None):
        # Shells like IDLE catch the SystemExit, but exit once their
        # stdin wrapper is closed.
try:
sys.stdin.close()
except:
pass
raise SystemExit(code)
class _Printer(object):
"""interactive prompt objects for printing the license text, a list of
contributors and the copyright notice."""
MAXLINES = 23
def __init__(self, name, data, files=(), dirs=()):
import os
self.__name = name
self.__data = data
self.__lines = None
self.__filenames = [os.path.join(dir, filename)
for dir in dirs
for filename in files]
def __setup(self):
if self.__lines:
return
data = None
for filename in self.__filenames:
try:
with open(filename, "r") as fp:
data = fp.read()
break
except OSError:
pass
if not data:
data = self.__data
self.__lines = data.split('\n')
self.__linecnt = len(self.__lines)
def __repr__(self):
self.__setup()
if len(self.__lines) <= self.MAXLINES:
return "\n".join(self.__lines)
else:
return "Type %s() to see the full %s text" % ((self.__name,)*2)
def __call__(self):
self.__setup()
prompt = 'Hit Return for more, or q (and Return) to quit: '
lineno = 0
while 1:
try:
for i in range(lineno, lineno + self.MAXLINES):
print(self.__lines[i])
except IndexError:
break
else:
lineno += self.MAXLINES
key = None
while key is None:
key = input(prompt)
if key not in ('', 'q'):
key = None
if key == 'q':
break
class _Helper(object):
"""Define the builtin 'help'.
This is a wrapper around pydoc.help that provides a helpful message
when 'help' is typed at the Python interactive prompt.
Calling help() at the Python prompt starts an interactive help session.
Calling help(thing) prints help for the python object 'thing'.
"""
def __repr__(self):
return "Type help() for interactive help, " \
"or help(object) for help about object."
def __call__(self, *args, **kwds):
import pydoc
return pydoc.help(*args, **kwds)
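# A hedged sketch of how the site module conventionally wires these objects
# into builtins (the exact names and the EOF key are platform-dependent and
# live in site.py; 'Ctrl-D (i.e. EOF)' is the Unix convention):
#
# >>> import builtins
# >>> builtins.quit = Quitter('quit', 'Ctrl-D (i.e. EOF)')
# >>> builtins.help = _Helper()
# >>> quit
# Use quit() or Ctrl-D (i.e. EOF) to exit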
"""Strptime-related classes and functions.
CLASSES:
LocaleTime -- Discovers and stores locale-specific time information
TimeRE -- Creates regexes for pattern matching a string of text containing
time information
FUNCTIONS:
_getlang -- Figure out what language is being used for the locale
strptime -- Calculates the time struct represented by the passed-in string
"""
import time
import locale
import calendar
from re import compile as re_compile
from re import IGNORECASE
from re import escape as re_escape
from datetime import (date as datetime_date,
timedelta as datetime_timedelta,
timezone as datetime_timezone)
try:
from _thread import allocate_lock as _thread_allocate_lock
except ImportError:
from _dummy_thread import allocate_lock as _thread_allocate_lock
__all__ = []
def _getlang():
# Figure out what the current language is set to.
return locale.getlocale(locale.LC_TIME)
class LocaleTime(object):
"""Stores and handles locale-specific information related to time.
ATTRIBUTES:
f_weekday -- full weekday names (7-item list)
a_weekday -- abbreviated weekday names (7-item list)
f_month -- full month names (13-item list; dummy value in [0], which
is added by code)
a_month -- abbreviated month names (13-item list, dummy value in
[0], which is added by code)
am_pm -- AM/PM representation (2-item list)
LC_date_time -- format string for date/time representation (string)
LC_date -- format string for date representation (string)
LC_time -- format string for time representation (string)
timezone -- daylight- and non-daylight-savings timezone representation
(2-item list of sets)
lang -- Language used by instance (2-item tuple)
"""
def __init__(self):
"""Set all attributes.
Order of methods called matters for dependency reasons.
        The locale language is set at the onset and then checked again before
exiting. This is to make sure that the attributes were not set with a
mix of information from more than one locale. This would most likely
happen when using threads where one thread calls a locale-dependent
function while another thread changes the locale while the function in
the other thread is still running. Proper coding would call for
locks to prevent changing the locale while locale-dependent code is
running. The check here is done in case someone does not think about
doing this.
        The only other possible issue is if someone changed the timezone and
        did not call time.tzset(). That is an issue for the programmer, though,
        since changing the timezone is worthless without that call.
"""
self.lang = _getlang()
self.__calc_weekday()
self.__calc_month()
self.__calc_am_pm()
self.__calc_timezone()
self.__calc_date_time()
if _getlang() != self.lang:
raise ValueError("locale changed during initialization")
if time.tzname != self.tzname or time.daylight != self.daylight:
raise ValueError("timezone changed during initialization")
def __pad(self, seq, front):
        # Add '' to either the front of seq (if front is True) or the back.
seq = list(seq)
if front:
seq.insert(0, '')
else:
seq.append('')
return seq
def __calc_weekday(self):
# Set self.a_weekday and self.f_weekday using the calendar
# module.
a_weekday = [calendar.day_abbr[i].lower() for i in range(7)]
f_weekday = [calendar.day_name[i].lower() for i in range(7)]
self.a_weekday = a_weekday
self.f_weekday = f_weekday
def __calc_month(self):
# Set self.f_month and self.a_month using the calendar module.
a_month = [calendar.month_abbr[i].lower() for i in range(13)]
f_month = [calendar.month_name[i].lower() for i in range(13)]
self.a_month = a_month
self.f_month = f_month
def __calc_am_pm(self):
# Set self.am_pm by using time.strftime().
        # The magic date (1999,3,17,hour,44,55,2,76,0) is not really that
        # magical; it just happened to be used everywhere else a static
        # date was needed.
am_pm = []
for hour in (1, 22):
time_tuple = time.struct_time((1999,3,17,hour,44,55,2,76,0))
am_pm.append(time.strftime("%p", time_tuple).lower())
self.am_pm = am_pm
def __calc_date_time(self):
# Set self.date_time, self.date, & self.time by using
# time.strftime().
# Use (1999,3,17,22,44,55,2,76,0) for magic date because the amount of
        # overloaded numbers is minimized.  The order in which searches for
        # values within the format string occur is very important; it
        # eliminates possible ambiguity about what something represents.
time_tuple = time.struct_time((1999,3,17,22,44,55,2,76,0))
date_time = [None, None, None]
date_time[0] = time.strftime("%c", time_tuple).lower()
date_time[1] = time.strftime("%x", time_tuple).lower()
date_time[2] = time.strftime("%X", time_tuple).lower()
replacement_pairs = [('%', '%%'), (self.f_weekday[2], '%A'),
(self.f_month[3], '%B'), (self.a_weekday[2], '%a'),
(self.a_month[3], '%b'), (self.am_pm[1], '%p'),
('1999', '%Y'), ('99', '%y'), ('22', '%H'),
('44', '%M'), ('55', '%S'), ('76', '%j'),
('17', '%d'), ('03', '%m'), ('3', '%m'),
# '3' needed for when no leading zero.
('2', '%w'), ('10', '%I')]
replacement_pairs.extend([(tz, "%Z") for tz_values in self.timezone
for tz in tz_values])
for offset,directive in ((0,'%c'), (1,'%x'), (2,'%X')):
current_format = date_time[offset]
for old, new in replacement_pairs:
# Must deal with possible lack of locale info
# manifesting itself as the empty string (e.g., Swedish's
# lack of AM/PM info) or a platform returning a tuple of empty
# strings (e.g., MacOS 9 having timezone as ('','')).
if old:
current_format = current_format.replace(old, new)
# If %W is used, then Sunday, 2005-01-03 will fall on week 0 since
# 2005-01-03 occurs before the first Monday of the year. Otherwise
# %U is used.
time_tuple = time.struct_time((1999,1,3,1,1,1,6,3,0))
if '00' in time.strftime(directive, time_tuple):
U_W = '%W'
else:
U_W = '%U'
date_time[offset] = current_format.replace('11', U_W)
self.LC_date_time = date_time[0]
self.LC_date = date_time[1]
self.LC_time = date_time[2]
def __calc_timezone(self):
# Set self.timezone by using time.tzname.
# Do not worry about possibility of time.tzname[0] == time.tzname[1]
# and time.daylight; handle that in strptime.
try:
time.tzset()
except AttributeError:
pass
self.tzname = time.tzname
self.daylight = time.daylight
no_saving = frozenset(["utc", "gmt", self.tzname[0].lower()])
if self.daylight:
has_saving = frozenset([self.tzname[1].lower()])
else:
has_saving = frozenset()
self.timezone = (no_saving, has_saving)
class TimeRE(dict):
"""Handle conversion from format directives to regexes."""
def __init__(self, locale_time=None):
"""Create keys/values.
Order of execution is important for dependency reasons.
"""
if locale_time:
self.locale_time = locale_time
else:
self.locale_time = LocaleTime()
base = super()
base.__init__({
# The " \d" part of the regex is to make %c from ANSI C work
'd': r"(?P<d>3[0-1]|[1-2]\d|0[1-9]|[1-9]| [1-9])",
'f': r"(?P<f>[0-9]{1,6})",
'H': r"(?P<H>2[0-3]|[0-1]\d|\d)",
'I': r"(?P<I>1[0-2]|0[1-9]|[1-9])",
'j': r"(?P<j>36[0-6]|3[0-5]\d|[1-2]\d\d|0[1-9]\d|00[1-9]|[1-9]\d|0[1-9]|[1-9])",
'm': r"(?P<m>1[0-2]|0[1-9]|[1-9])",
'M': r"(?P<M>[0-5]\d|\d)",
'S': r"(?P<S>6[0-1]|[0-5]\d|\d)",
'U': r"(?P<U>5[0-3]|[0-4]\d|\d)",
'w': r"(?P<w>[0-6])",
# W is set below by using 'U'
'y': r"(?P<y>\d\d)",
            # XXX: Does 'Y' need to worry about having fewer or more than
            #      4 digits?
'Y': r"(?P<Y>\d\d\d\d)",
'z': r"(?P<z>[+-]\d\d[0-5]\d)",
'A': self.__seqToRE(self.locale_time.f_weekday, 'A'),
'a': self.__seqToRE(self.locale_time.a_weekday, 'a'),
'B': self.__seqToRE(self.locale_time.f_month[1:], 'B'),
'b': self.__seqToRE(self.locale_time.a_month[1:], 'b'),
'p': self.__seqToRE(self.locale_time.am_pm, 'p'),
'Z': self.__seqToRE((tz for tz_names in self.locale_time.timezone
for tz in tz_names),
'Z'),
'%': '%'})
base.__setitem__('W', base.__getitem__('U').replace('U', 'W'))
base.__setitem__('c', self.pattern(self.locale_time.LC_date_time))
base.__setitem__('x', self.pattern(self.locale_time.LC_date))
base.__setitem__('X', self.pattern(self.locale_time.LC_time))
def __seqToRE(self, to_convert, directive):
"""Convert a list to a regex string for matching a directive.
Want possible matching values to be from longest to shortest. This
        prevents the possibility of a match occurring for a value that is
        also a substring of a larger value that should have matched (e.g.,
        'abc' matching when 'abcdef' should have been the match).
"""
to_convert = sorted(to_convert, key=len, reverse=True)
for value in to_convert:
if value != '':
break
else:
return ''
regex = '|'.join(re_escape(stuff) for stuff in to_convert)
regex = '(?P<%s>%s' % (directive, regex)
return '%s)' % regex
def pattern(self, format):
"""Return regex pattern for the format string.
Need to make sure that any characters that might be interpreted as
regex syntax are escaped.
"""
processed_format = ''
# The sub() call escapes all characters that might be misconstrued
# as regex syntax. Cannot use re.escape since we have to deal with
# format directives (%m, etc.).
regex_chars = re_compile(r"([\\.^$*+?\(\){}\[\]|])")
format = regex_chars.sub(r"\\\1", format)
        whitespace_replacement = re_compile(r'\s+')
        format = whitespace_replacement.sub(r'\\s+', format)
while '%' in format:
directive_index = format.index('%')+1
processed_format = "%s%s%s" % (processed_format,
format[:directive_index-1],
self[format[directive_index]])
format = format[directive_index+1:]
return "%s%s" % (processed_format, format)
def compile(self, format):
"""Return a compiled re object for the format string."""
return re_compile(self.pattern(format), IGNORECASE)
_cache_lock = _thread_allocate_lock()
# DO NOT modify _TimeRE_cache or _regex_cache without acquiring the cache lock
# first!
_TimeRE_cache = TimeRE()
_CACHE_MAX_SIZE = 5 # Max number of regexes stored in _regex_cache
_regex_cache = {}
def _calc_julian_from_U_or_W(year, week_of_year, day_of_week, week_starts_Mon):
"""Calculate the Julian day based on the year, week of the year, and day of
the week, with week_start_day representing whether the week of the year
assumes the week starts on Sunday or Monday (6 or 0)."""
first_weekday = datetime_date(year, 1, 1).weekday()
# If we are dealing with the %U directive (week starts on Sunday), it's
# easier to just shift the view to Sunday being the first day of the
# week.
if not week_starts_Mon:
first_weekday = (first_weekday + 1) % 7
day_of_week = (day_of_week + 1) % 7
# Need to watch out for a week 0 (when the first day of the year is not
# the same as that specified by %U or %W).
week_0_length = (7 - first_weekday) % 7
if week_of_year == 0:
return 1 + day_of_week - first_weekday
else:
days_to_week = week_0_length + (7 * (week_of_year - 1))
return 1 + days_to_week + day_of_week
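# A worked check (illustrative, not part of the original module):
# 1999-01-03 was a Sunday, day 3 of the year, and under %U (weeks starting on
# Sunday; Sunday is day_of_week 6 in the datetime convention) it opens week 1:
#
# >>> _calc_julian_from_U_or_W(1999, 1, 6, False)
# 3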
def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
"""Return a 2-tuple consisting of a time struct and an int containing
the number of microseconds based on the input string and the
format string."""
for index, arg in enumerate([data_string, format]):
if not isinstance(arg, str):
msg = "strptime() argument {} must be str, not {}"
raise TypeError(msg.format(index, type(arg)))
global _TimeRE_cache, _regex_cache
with _cache_lock:
locale_time = _TimeRE_cache.locale_time
if (_getlang() != locale_time.lang or
time.tzname != locale_time.tzname or
time.daylight != locale_time.daylight):
_TimeRE_cache = TimeRE()
_regex_cache.clear()
locale_time = _TimeRE_cache.locale_time
if len(_regex_cache) > _CACHE_MAX_SIZE:
_regex_cache.clear()
format_regex = _regex_cache.get(format)
if not format_regex:
try:
format_regex = _TimeRE_cache.compile(format)
# KeyError raised when a bad format is found; can be specified as
# \\, in which case it was a stray % but with a space after it
except KeyError as err:
bad_directive = err.args[0]
if bad_directive == "\\":
bad_directive = "%"
del err
raise ValueError("'%s' is a bad directive in format '%s'" %
(bad_directive, format)) from None
# IndexError only occurs when the format string is "%"
except IndexError:
raise ValueError("stray %% in format '%s'" % format) from None
_regex_cache[format] = format_regex
found = format_regex.match(data_string)
if not found:
raise ValueError("time data %r does not match format %r" %
(data_string, format))
if len(data_string) != found.end():
raise ValueError("unconverted data remains: %s" %
data_string[found.end():])
year = None
month = day = 1
hour = minute = second = fraction = 0
tz = -1
tzoffset = None
    # Default to -1 to signify that the values are not known; not critical to
    # have, though
week_of_year = -1
week_of_year_start = -1
# weekday and julian defaulted to None so as to signal need to calculate
# values
weekday = julian = None
found_dict = found.groupdict()
for group_key in found_dict.keys():
# Directives not explicitly handled below:
# c, x, X
        #      handled by being built out of other directives
# U, W
# worthless without day of the week
if group_key == 'y':
year = int(found_dict['y'])
            # The Open Group specification for strptime() states that a %y
            # value in the range of [00, 68] is in the century 2000, while
            # [69, 99] is in the century 1900.
if year <= 68:
year += 2000
else:
year += 1900
elif group_key == 'Y':
year = int(found_dict['Y'])
elif group_key == 'm':
month = int(found_dict['m'])
elif group_key == 'B':
month = locale_time.f_month.index(found_dict['B'].lower())
elif group_key == 'b':
month = locale_time.a_month.index(found_dict['b'].lower())
elif group_key == 'd':
day = int(found_dict['d'])
elif group_key == 'H':
hour = int(found_dict['H'])
elif group_key == 'I':
hour = int(found_dict['I'])
ampm = found_dict.get('p', '').lower()
# If there was no AM/PM indicator, we'll treat this like AM
if ampm in ('', locale_time.am_pm[0]):
# We're in AM so the hour is correct unless we're
# looking at 12 midnight.
# 12 midnight == 12 AM == hour 0
if hour == 12:
hour = 0
elif ampm == locale_time.am_pm[1]:
# We're in PM so we need to add 12 to the hour unless
# we're looking at 12 noon.
# 12 noon == 12 PM == hour 12
if hour != 12:
hour += 12
elif group_key == 'M':
minute = int(found_dict['M'])
elif group_key == 'S':
second = int(found_dict['S'])
elif group_key == 'f':
s = found_dict['f']
# Pad to always return microseconds.
s += "0" * (6 - len(s))
fraction = int(s)
elif group_key == 'A':
weekday = locale_time.f_weekday.index(found_dict['A'].lower())
elif group_key == 'a':
weekday = locale_time.a_weekday.index(found_dict['a'].lower())
elif group_key == 'w':
weekday = int(found_dict['w'])
if weekday == 0:
weekday = 6
else:
weekday -= 1
elif group_key == 'j':
julian = int(found_dict['j'])
elif group_key in ('U', 'W'):
week_of_year = int(found_dict[group_key])
if group_key == 'U':
# U starts week on Sunday.
week_of_year_start = 6
else:
# W starts week on Monday.
week_of_year_start = 0
elif group_key == 'z':
z = found_dict['z']
tzoffset = int(z[1:3]) * 60 + int(z[3:5])
if z.startswith("-"):
tzoffset = -tzoffset
elif group_key == 'Z':
            # Since -1 is the default value, we only need to worry about
            # setting tz if it can be something other than -1.
found_zone = found_dict['Z'].lower()
for value, tz_values in enumerate(locale_time.timezone):
if found_zone in tz_values:
# Deal with bad locale setup where timezone names are the
# same and yet time.daylight is true; too ambiguous to
# be able to tell what timezone has daylight savings
if (time.tzname[0] == time.tzname[1] and
time.daylight and found_zone not in ("utc", "gmt")):
break
else:
tz = value
break
leap_year_fix = False
if year is None and month == 2 and day == 29:
year = 1904 # 1904 is first leap year of 20th century
leap_year_fix = True
elif year is None:
year = 1900
# If we know the week of the year and what day of that week, we can figure
# out the Julian day of the year.
if julian is None and week_of_year != -1 and weekday is not None:
        week_starts_Mon = week_of_year_start == 0
julian = _calc_julian_from_U_or_W(year, week_of_year, weekday,
week_starts_Mon)
    # Cannot pre-calculate datetime_date() since the year, month, and day can
    # change in the Julian calculation and thus could yield a different value
    # for the day-of-the-week calculation.
if julian is None:
# Need to add 1 to result since first day of the year is 1, not 0.
julian = datetime_date(year, month, day).toordinal() - \
datetime_date(year, 1, 1).toordinal() + 1
else: # Assume that if they bothered to include Julian day it will
# be accurate.
datetime_result = datetime_date.fromordinal((julian - 1) + datetime_date(year, 1, 1).toordinal())
year = datetime_result.year
month = datetime_result.month
day = datetime_result.day
if weekday is None:
weekday = datetime_date(year, month, day).weekday()
# Add timezone info
tzname = found_dict.get("Z")
if tzoffset is not None:
gmtoff = tzoffset * 60
else:
gmtoff = None
if leap_year_fix:
# the caller didn't supply a year but asked for Feb 29th. We couldn't
# use the default of 1900 for computations. We set it back to ensure
# that February 29th is smaller than March 1st.
year = 1900
return (year, month, day,
hour, minute, second,
weekday, julian, tz, tzname, gmtoff), fraction
def _strptime_time(data_string, format="%a %b %d %H:%M:%S %Y"):
"""Return a time struct based on the input string and the
format string."""
tt = _strptime(data_string, format)[0]
return time.struct_time(tt[:time._STRUCT_TM_ITEMS])
def _strptime_datetime(cls, data_string, format="%a %b %d %H:%M:%S %Y"):
"""Return a class cls instance based on the input string and the
format string."""
tt, fraction = _strptime(data_string, format)
tzname, gmtoff = tt[-2:]
args = tt[:6] + (fraction,)
if gmtoff is not None:
tzdelta = datetime_timedelta(seconds=gmtoff)
if tzname:
tz = datetime_timezone(tzdelta, tzname)
else:
tz = datetime_timezone(tzdelta)
args += (tz,)
return cls(*args)
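# A short usage sketch (illustrative, assuming a C/English locale): the
# public entry points time.strptime() and datetime.datetime.strptime()
# delegate to the helpers above.
#
# >>> _strptime_time("17 Mar 1999", "%d %b %Y")[:3]
# (1999, 3, 17)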
"""Thread-local objects.
(Note that this module provides a Python version of the threading.local
class. Depending on the version of Python you're using, there may be a
faster one available. You should always import the `local` class from
`threading`.)
Thread-local objects support the management of thread-local data.
If you have data that you want to be local to a thread, simply create
a thread-local object and use its attributes:
>>> mydata = local()
>>> mydata.number = 42
>>> mydata.number
42
You can also access the local-object's dictionary:
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]
What's important about thread-local objects is that their data are
local to a thread. If we access the data in a different thread:
>>> log = []
>>> def f():
... items = sorted(mydata.__dict__.items())
... log.append(items)
... mydata.number = 11
... log.append(mydata.number)
>>> import threading
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[], 11]
we get different data. Furthermore, changes made in the other thread
don't affect data seen in this thread:
>>> mydata.number
42
Of course, values you get from a local object, including a __dict__
attribute, are for whatever thread was current at the time the
attribute was read. For that reason, you generally don't want to save
these values across threads, as they apply only to the thread they
came from.
You can create custom local objects by subclassing the local class:
>>> class MyLocal(local):
... number = 2
... initialized = False
... def __init__(self, **kw):
... if self.initialized:
... raise SystemError('__init__ called too many times')
... self.initialized = True
... self.__dict__.update(kw)
... def squared(self):
... return self.number ** 2
This can be useful to support default values, methods and
initialization. Note that if you define an __init__ method, it will be
called each time the local object is used in a separate thread. This
is necessary to initialize each thread's dictionary.
Now if we create a local object:
>>> mydata = MyLocal(color='red')
Now we have a default number:
>>> mydata.number
2
an initial color:
>>> mydata.color
'red'
>>> del mydata.color
And a method that operates on the data:
>>> mydata.squared()
4
As before, we can access the data in a separate thread:
>>> log = []
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
>>> log
[[('color', 'red'), ('initialized', True)], 11]
without affecting this thread's data:
>>> mydata.number
2
>>> mydata.color
Traceback (most recent call last):
...
AttributeError: 'MyLocal' object has no attribute 'color'
Note that subclasses can define slots, but they are not thread
local. They are shared across threads:
>>> class MyLocal(local):
... __slots__ = 'number'
>>> mydata = MyLocal()
>>> mydata.number = 42
>>> mydata.color = 'red'
So, the separate thread:
>>> thread = threading.Thread(target=f)
>>> thread.start()
>>> thread.join()
affects what we see:
>>> mydata.number
11
>>> del mydata
"""
from weakref import ref
from contextlib import contextmanager
__all__ = ["local"]
# We need to use objects from the threading module, but the threading
# module may also want to use our `local` class, if support for locals
# isn't compiled into the `thread` module.  This creates potential problems
# with circular imports. For that reason, we don't import `threading`
# until the bottom of this file (a hack sufficient to worm around the
# potential problems). Note that all platforms on CPython do have support
# for locals in the `thread` module, and there is no circular import problem
# then, so problems introduced by fiddling the order of imports here won't
# manifest.
class _localimpl:
"""A class managing thread-local dicts"""
__slots__ = 'key', 'dicts', 'localargs', 'locallock', '__weakref__'
def __init__(self):
# The key used in the Thread objects' attribute dicts.
# We keep it a string for speed but make it unlikely to clash with
# a "real" attribute.
self.key = '_threading_local._localimpl.' + str(id(self))
# { id(Thread) -> (ref(Thread), thread-local dict) }
self.dicts = {}
def get_dict(self):
"""Return the dict for the current thread. Raises KeyError if none
defined."""
thread = current_thread()
return self.dicts[id(thread)][1]
def create_dict(self):
"""Create a new dict for the current thread, and return it."""
localdict = {}
key = self.key
thread = current_thread()
idt = id(thread)
def local_deleted(_, key=key):
# When the localimpl is deleted, remove the thread attribute.
thread = wrthread()
if thread is not None:
del thread.__dict__[key]
def thread_deleted(_, idt=idt):
# When the thread is deleted, remove the local dict.
# Note that this is suboptimal if the thread object gets
# caught in a reference loop. We would like to be called
# as soon as the OS-level thread ends instead.
local = wrlocal()
if local is not None:
dct = local.dicts.pop(idt)
wrlocal = ref(self, local_deleted)
wrthread = ref(thread, thread_deleted)
thread.__dict__[key] = wrlocal
self.dicts[idt] = wrthread, localdict
return localdict
@contextmanager
def _patch(self):
impl = object.__getattribute__(self, '_local__impl')
try:
dct = impl.get_dict()
except KeyError:
dct = impl.create_dict()
args, kw = impl.localargs
self.__init__(*args, **kw)
with impl.locallock:
object.__setattr__(self, '__dict__', dct)
yield
class local:
__slots__ = '_local__impl', '__dict__'
def __new__(cls, *args, **kw):
if (args or kw) and (cls.__init__ is object.__init__):
raise TypeError("Initialization arguments are not supported")
self = object.__new__(cls)
impl = _localimpl()
impl.localargs = (args, kw)
impl.locallock = RLock()
object.__setattr__(self, '_local__impl', impl)
# We need to create the thread dict in anticipation of
# __init__ being called, to make sure we don't call it
# again ourselves.
impl.create_dict()
return self
def __getattribute__(self, name):
with _patch(self):
return object.__getattribute__(self, name)
def __setattr__(self, name, value):
if name == '__dict__':
raise AttributeError(
"%r object attribute '__dict__' is read-only"
% self.__class__.__name__)
with _patch(self):
return object.__setattr__(self, name, value)
def __delattr__(self, name):
if name == '__dict__':
raise AttributeError(
"%r object attribute '__dict__' is read-only"
% self.__class__.__name__)
with _patch(self):
return object.__delattr__(self, name)
from threading import current_thread, RLock
# Access WeakSet through the weakref module.
# This code is separated-out because it is needed
# by abc.py to load everything else at startup.
from _weakref import ref
__all__ = ['WeakSet']
class _IterationGuard:
# This context manager registers itself in the current iterators of the
# weak container, such as to delay all removals until the context manager
# exits.
# This technique should be relatively thread-safe (since sets are).
def __init__(self, weakcontainer):
# Don't create cycles
self.weakcontainer = ref(weakcontainer)
def __enter__(self):
w = self.weakcontainer()
if w is not None:
w._iterating.add(self)
return self
def __exit__(self, e, t, b):
w = self.weakcontainer()
if w is not None:
s = w._iterating
s.remove(self)
if not s:
w._commit_removals()
class WeakSet:
def __init__(self, data=None):
self.data = set()
def _remove(item, selfref=ref(self)):
self = selfref()
if self is not None:
if self._iterating:
self._pending_removals.append(item)
else:
self.data.discard(item)
self._remove = _remove
# A list of keys to be removed
self._pending_removals = []
self._iterating = set()
if data is not None:
self.update(data)
def _commit_removals(self):
l = self._pending_removals
discard = self.data.discard
while l:
discard(l.pop())
def __iter__(self):
with _IterationGuard(self):
for itemref in self.data:
item = itemref()
if item is not None:
# Caveat: the iterator will keep a strong reference to
# `item` until it is resumed or closed.
yield item
def __len__(self):
return len(self.data) - len(self._pending_removals)
def __contains__(self, item):
try:
wr = ref(item)
except TypeError:
return False
return wr in self.data
def __reduce__(self):
return (self.__class__, (list(self),),
getattr(self, '__dict__', None))
def add(self, item):
if self._pending_removals:
self._commit_removals()
self.data.add(ref(item, self._remove))
def clear(self):
if self._pending_removals:
self._commit_removals()
self.data.clear()
def copy(self):
return self.__class__(self)
def pop(self):
if self._pending_removals:
self._commit_removals()
while True:
try:
itemref = self.data.pop()
except KeyError:
raise KeyError('pop from empty WeakSet')
item = itemref()
if item is not None:
return item
def remove(self, item):
if self._pending_removals:
self._commit_removals()
self.data.remove(ref(item))
def discard(self, item):
if self._pending_removals:
self._commit_removals()
self.data.discard(ref(item))
def update(self, other):
if self._pending_removals:
self._commit_removals()
for element in other:
self.add(element)
def __ior__(self, other):
self.update(other)
return self
def difference(self, other):
newset = self.copy()
newset.difference_update(other)
return newset
__sub__ = difference
def difference_update(self, other):
self.__isub__(other)
def __isub__(self, other):
if self._pending_removals:
self._commit_removals()
if self is other:
self.data.clear()
else:
self.data.difference_update(ref(item) for item in other)
return self
def intersection(self, other):
return self.__class__(item for item in other if item in self)
__and__ = intersection
def intersection_update(self, other):
self.__iand__(other)
def __iand__(self, other):
if self._pending_removals:
self._commit_removals()
self.data.intersection_update(ref(item) for item in other)
return self
def issubset(self, other):
return self.data.issubset(ref(item) for item in other)
__le__ = issubset
def __lt__(self, other):
return self.data < set(ref(item) for item in other)
def issuperset(self, other):
return self.data.issuperset(ref(item) for item in other)
__ge__ = issuperset
def __gt__(self, other):
return self.data > set(ref(item) for item in other)
def __eq__(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
return self.data == set(ref(item) for item in other)
def symmetric_difference(self, other):
newset = self.copy()
newset.symmetric_difference_update(other)
return newset
__xor__ = symmetric_difference
def symmetric_difference_update(self, other):
self.__ixor__(other)
def __ixor__(self, other):
if self._pending_removals:
self._commit_removals()
if self is other:
self.data.clear()
else:
self.data.symmetric_difference_update(ref(item, self._remove) for item in other)
return self
def union(self, other):
return self.__class__(e for s in (self, other) for e in s)
__or__ = union
def isdisjoint(self, other):
return len(self.intersection(other)) == 0
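# A brief usage sketch (illustrative, not part of the module): an entry
# disappears once the last strong reference to the element is dropped.  Under
# CPython's reference counting this happens immediately; on IronPython or
# other collected runtimes the removal waits for the garbage collector.
#
# >>> class C: pass
# >>> obj = C()
# >>> ws = WeakSet([obj])
# >>> len(ws)
# 1
# >>> del obj        # CPython: the weak entry is discarded right away
# >>> len(ws)
# 0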
"""Record of phased-in incompatible language changes.
Each line is of the form:
FeatureName = "_Feature(" OptionalRelease "," MandatoryRelease ","
CompilerFlag ")"
where, normally, OptionalRelease < MandatoryRelease, and both are 5-tuples
of the same form as sys.version_info:
(PY_MAJOR_VERSION, # the 2 in 2.1.0a3; an int
PY_MINOR_VERSION, # the 1; an int
PY_MICRO_VERSION, # the 0; an int
PY_RELEASE_LEVEL, # "alpha", "beta", "candidate" or "final"; string
PY_RELEASE_SERIAL # the 3; an int
)
OptionalRelease records the first release in which
from __future__ import FeatureName
was accepted.
In the case of MandatoryReleases that have not yet occurred,
MandatoryRelease predicts the release in which the feature will become part
of the language.
Else MandatoryRelease records when the feature became part of the language;
in releases at or after that, modules no longer need
from __future__ import FeatureName
to use the feature in question, but may continue to use such imports.
MandatoryRelease may also be None, meaning that a planned feature got
dropped.
Instances of class _Feature have two corresponding methods,
.getOptionalRelease() and .getMandatoryRelease().
CompilerFlag is the (bitfield) flag that should be passed in the fourth
argument to the builtin function compile() to enable the feature in
dynamically compiled code. This flag is stored in the .compiler_flag
attribute on _Feature instances. These values must match the appropriate
#defines of CO_xxx flags in Include/compile.h.
No feature line is ever to be deleted from this file.
"""
all_feature_names = [
"nested_scopes",
"generators",
"division",
"absolute_import",
"with_statement",
"print_function",
"unicode_literals",
"barry_as_FLUFL",
"generator_stop",
]
__all__ = ["all_feature_names"] + all_feature_names
# The CO_xxx symbols are defined here under the same names used by
# compile.h, so that an editor search will find them here. However,
# they're not exported in __all__, because they don't really belong to
# this module.
CO_NESTED = 0x0010 # nested_scopes
CO_GENERATOR_ALLOWED = 0 # generators (obsolete, was 0x1000)
CO_FUTURE_DIVISION = 0x2000 # division
CO_FUTURE_ABSOLUTE_IMPORT = 0x4000 # perform absolute imports by default
CO_FUTURE_WITH_STATEMENT = 0x8000 # with statement
CO_FUTURE_PRINT_FUNCTION = 0x10000 # print function
CO_FUTURE_UNICODE_LITERALS = 0x20000 # unicode string literals
CO_FUTURE_BARRY_AS_BDFL = 0x40000
CO_FUTURE_GENERATOR_STOP = 0x80000 # StopIteration becomes RuntimeError in generators
class _Feature:
def __init__(self, optionalRelease, mandatoryRelease, compiler_flag):
self.optional = optionalRelease
self.mandatory = mandatoryRelease
self.compiler_flag = compiler_flag
def getOptionalRelease(self):
"""Return first release in which this feature was recognized.
This is a 5-tuple, of the same form as sys.version_info.
"""
return self.optional
def getMandatoryRelease(self):
"""Return release in which this feature will become mandatory.
This is a 5-tuple, of the same form as sys.version_info, or, if
the feature was dropped, is None.
"""
return self.mandatory
def __repr__(self):
return "_Feature" + repr((self.optional,
self.mandatory,
self.compiler_flag))
nested_scopes = _Feature((2, 1, 0, "beta", 1),
(2, 2, 0, "alpha", 0),
CO_NESTED)
generators = _Feature((2, 2, 0, "alpha", 1),
(2, 3, 0, "final", 0),
CO_GENERATOR_ALLOWED)
division = _Feature((2, 2, 0, "alpha", 2),
(3, 0, 0, "alpha", 0),
CO_FUTURE_DIVISION)
absolute_import = _Feature((2, 5, 0, "alpha", 1),
(3, 0, 0, "alpha", 0),
CO_FUTURE_ABSOLUTE_IMPORT)
with_statement = _Feature((2, 5, 0, "alpha", 1),
(2, 6, 0, "alpha", 0),
CO_FUTURE_WITH_STATEMENT)
print_function = _Feature((2, 6, 0, "alpha", 2),
(3, 0, 0, "alpha", 0),
CO_FUTURE_PRINT_FUNCTION)
unicode_literals = _Feature((2, 6, 0, "alpha", 2),
(3, 0, 0, "alpha", 0),
CO_FUTURE_UNICODE_LITERALS)
barry_as_FLUFL = _Feature((3, 1, 0, "alpha", 2),
(3, 9, 0, "alpha", 0),
CO_FUTURE_BARRY_AS_BDFL)
generator_stop = _Feature((3, 5, 0, "beta", 1),
(3, 7, 0, "alpha", 0),
CO_FUTURE_GENERATOR_STOP)
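# A short usage sketch (illustrative, not part of the module): each _Feature
# records when the import first worked and when (if ever) the behaviour
# becomes the default.
#
# >>> division.getOptionalRelease()
# (2, 2, 0, 'alpha', 2)
# >>> division.getMandatoryRelease()
# (3, 0, 0, 'alpha', 0)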
# This file exists as a helper for the test.test_frozen module.
"""Base implementation of event loop.
The event loop can be broken up into a multiplexer (the part
responsible for notifying us of I/O events) and the event loop proper,
which wraps a multiplexer with functionality for scheduling callbacks,
immediately or at a given time in the future.
Whenever a public API takes a callback, subsequent positional
arguments will be passed to the callback if/when it is called. This
avoids the proliferation of trivial lambdas implementing closures.
Keyword arguments for the callback are not supported; this is a
conscious design decision, leaving the door open for keyword arguments
to modify the meaning of the API call itself.
"""
import collections
import concurrent.futures
import functools
import heapq
import inspect
import ipaddress
import itertools
import logging
import os
import socket
import subprocess
import threading
import time
import traceback
import sys
import warnings
from . import compat
from . import coroutines
from . import events
from . import futures
from . import tasks
from .coroutines import coroutine
from .log import logger
__all__ = ['BaseEventLoop']
# Argument for default thread pool executor creation.
_MAX_WORKERS = 5
# Minimum number of _scheduled timer handles before cleanup of
# cancelled handles is performed.
_MIN_SCHEDULED_TIMER_HANDLES = 100
# Minimum fraction of _scheduled timer handles that are cancelled
# before cleanup of cancelled handles is performed.
_MIN_CANCELLED_TIMER_HANDLES_FRACTION = 0.5
def _format_handle(handle):
cb = handle._callback
if inspect.ismethod(cb) and isinstance(cb.__self__, tasks.Task):
# format the task
return repr(cb.__self__)
else:
return str(handle)
def _format_pipe(fd):
if fd == subprocess.PIPE:
return '<pipe>'
elif fd == subprocess.STDOUT:
return '<stdout>'
else:
return repr(fd)
# Linux's sock.type is a bitmask that can include extra info about the socket.
_SOCKET_TYPE_MASK = 0
if hasattr(socket, 'SOCK_NONBLOCK'):
_SOCKET_TYPE_MASK |= socket.SOCK_NONBLOCK
if hasattr(socket, 'SOCK_CLOEXEC'):
_SOCKET_TYPE_MASK |= socket.SOCK_CLOEXEC
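# A minimal sketch (illustrative): masking off those flag bits recovers the
# base socket type, e.g. on Linux where SOCK_NONBLOCK is defined:
#
# >>> t = socket.SOCK_STREAM | socket.SOCK_NONBLOCK
# >>> t & ~_SOCKET_TYPE_MASK == socket.SOCK_STREAM
# True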
@functools.lru_cache(maxsize=1024)
def _ipaddr_info(host, port, family, type, proto):
# Try to skip getaddrinfo if "host" is already an IP. Since getaddrinfo
# blocks on an exclusive lock on some platforms, users might handle name
# resolution in their own code and pass in resolved IPs.
if proto not in {0, socket.IPPROTO_TCP, socket.IPPROTO_UDP} or host is None:
return None
type &= ~_SOCKET_TYPE_MASK
if type == socket.SOCK_STREAM:
proto = socket.IPPROTO_TCP
elif type == socket.SOCK_DGRAM:
proto = socket.IPPROTO_UDP
else:
return None
if hasattr(socket, 'inet_pton'):
if family == socket.AF_UNSPEC:
afs = [socket.AF_INET, socket.AF_INET6]
else:
afs = [family]
for af in afs:
# Linux's inet_pton doesn't accept an IPv6 zone index after host,
# like '::1%lo0', so strip it. If we happen to make an invalid
# address look valid, we fail later in sock.connect or sock.bind.
try:
if af == socket.AF_INET6:
socket.inet_pton(af, host.partition('%')[0])
else:
socket.inet_pton(af, host)
return af, type, proto, '', (host, port)
except OSError:
pass
# "host" is not an IP address.
return None
# No inet_pton. (On Windows it's only available since Python 3.4.)
# Even though getaddrinfo with AI_NUMERICHOST would be non-blocking, it
# still requires a lock on some platforms, and waiting for that lock could
# block the event loop. Use ipaddress instead, it's just text parsing.
try:
addr = ipaddress.IPv4Address(host)
except ValueError:
try:
addr = ipaddress.IPv6Address(host.partition('%')[0])
except ValueError:
return None
af = socket.AF_INET if addr.version == 4 else socket.AF_INET6
if family not in (socket.AF_UNSPEC, af):
# "host" is wrong IP version for "family".
return None
return af, type, proto, '', (host, port)
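# A brief sketch (illustrative, assuming a platform where socket.inet_pton is
# available): a literal address short-circuits name resolution, while a
# hostname falls through to None and must go through getaddrinfo().
#
# >>> info = _ipaddr_info('127.0.0.1', 80, socket.AF_UNSPEC, socket.SOCK_STREAM, 0)
# >>> info[4]
# ('127.0.0.1', 80)
# >>> _ipaddr_info('example.com', 80, socket.AF_UNSPEC, socket.SOCK_STREAM, 0) is None
# True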
def _check_resolved_address(sock, address):
# Ensure that the address is already resolved to avoid the trap of hanging
# the entire event loop when the address requires doing a DNS lookup.
if hasattr(socket, 'AF_UNIX') and sock.family == socket.AF_UNIX:
return
host, port = address[:2]
if _ipaddr_info(host, port, sock.family, sock.type, sock.proto) is None:
raise ValueError("address must be resolved (IP address),"
" got host %r" % host)
def _run_until_complete_cb(fut):
exc = fut._exception
if (isinstance(exc, BaseException)
and not isinstance(exc, Exception)):
# Issue #22429: run_forever() already finished, no need to
# stop it.
return
fut._loop.stop()
class Server(events.AbstractServer):
def __init__(self, loop, sockets):
self._loop = loop
self.sockets = sockets
self._active_count = 0
self._waiters = []
def __repr__(self):
return '<%s sockets=%r>' % (self.__class__.__name__, self.sockets)
def _attach(self):
assert self.sockets is not None
self._active_count += 1
def _detach(self):
assert self._active_count > 0
self._active_count -= 1
if self._active_count == 0 and self.sockets is None:
self._wakeup()
def close(self):
sockets = self.sockets
if sockets is None:
return
self.sockets = None
for sock in sockets:
self._loop._stop_serving(sock)
if self._active_count == 0:
self._wakeup()
def _wakeup(self):
waiters = self._waiters
self._waiters = None
for waiter in waiters:
if not waiter.done():
waiter.set_result(waiter)
@coroutine
def wait_closed(self):
if self.sockets is None or self._waiters is None:
return
waiter = futures.Future(loop=self._loop)
self._waiters.append(waiter)
yield from waiter
class BaseEventLoop(events.AbstractEventLoop):
def __init__(self):
self._timer_cancelled_count = 0
self._closed = False
self._stopping = False
self._ready = collections.deque()
self._scheduled = []
self._default_executor = None
self._internal_fds = 0
# Identifier of the thread running the event loop, or None if the
# event loop is not running
self._thread_id = None
self._clock_resolution = time.get_clock_info('monotonic').resolution
self._exception_handler = None
self.set_debug((not sys.flags.ignore_environment
and bool(os.environ.get('PYTHONASYNCIODEBUG'))))
        # In debug mode, if the execution of a callback or a step of a task
        # exceeds this duration in seconds, the slow callback/task is logged.
self.slow_callback_duration = 0.1
self._current_handle = None
self._task_factory = None
self._coroutine_wrapper_set = False
def __repr__(self):
return ('<%s running=%s closed=%s debug=%s>'
% (self.__class__.__name__, self.is_running(),
self.is_closed(), self.get_debug()))
def create_task(self, coro):
"""Schedule a coroutine object.
Return a task object.
"""
self._check_closed()
if self._task_factory is None:
task = tasks.Task(coro, loop=self)
if task._source_traceback:
del task._source_traceback[-1]
else:
task = self._task_factory(self, coro)
return task
def set_task_factory(self, factory):
"""Set a task factory that will be used by loop.create_task().
If factory is None the default task factory will be set.
If factory is a callable, it should have a signature matching
'(loop, coro)', where 'loop' will be a reference to the active
event loop, 'coro' will be a coroutine object. The callable
must return a Future.
"""
if factory is not None and not callable(factory):
raise TypeError('task factory must be a callable or None')
self._task_factory = factory
def get_task_factory(self):
"""Return a task factory, or None if the default one is in use."""
return self._task_factory
def _make_socket_transport(self, sock, protocol, waiter=None, *,
extra=None, server=None):
"""Create socket transport."""
raise NotImplementedError
def _make_ssl_transport(self, rawsock, protocol, sslcontext, waiter=None,
*, server_side=False, server_hostname=None,
extra=None, server=None):
"""Create SSL transport."""
raise NotImplementedError
def _make_datagram_transport(self, sock, protocol,
address=None, waiter=None, extra=None):
"""Create datagram transport."""
raise NotImplementedError
def _make_read_pipe_transport(self, pipe, protocol, waiter=None,
extra=None):
"""Create read pipe transport."""
raise NotImplementedError
def _make_write_pipe_transport(self, pipe, protocol, waiter=None,
extra=None):
"""Create write pipe transport."""
raise NotImplementedError
@coroutine
def _make_subprocess_transport(self, protocol, args, shell,
stdin, stdout, stderr, bufsize,
extra=None, **kwargs):
"""Create subprocess transport."""
raise NotImplementedError
def _write_to_self(self):
"""Write a byte to self-pipe, to wake up the event loop.
This may be called from a different thread.
The subclass is responsible for implementing the self-pipe.
"""
raise NotImplementedError
def _process_events(self, event_list):
"""Process selector events."""
raise NotImplementedError
def _check_closed(self):
if self._closed:
raise RuntimeError('Event loop is closed')
def run_forever(self):
"""Run until stop() is called."""
self._check_closed()
if self.is_running():
raise RuntimeError('Event loop is running.')
self._set_coroutine_wrapper(self._debug)
self._thread_id = threading.get_ident()
try:
while True:
self._run_once()
if self._stopping:
break
finally:
self._stopping = False
self._thread_id = None
self._set_coroutine_wrapper(False)
def run_until_complete(self, future):
"""Run until the Future is done.
If the argument is a coroutine, it is wrapped in a Task.
WARNING: It would be disastrous to call run_until_complete()
with the same coroutine twice -- it would wrap it in two
different Tasks and that can't be good.
Return the Future's result, or raise its exception.
"""
self._check_closed()
new_task = not isinstance(future, futures.Future)
future = tasks.ensure_future(future, loop=self)
if new_task:
# An exception is raised if the future didn't complete, so there
# is no need to log the "destroy pending task" message
future._log_destroy_pending = False
future.add_done_callback(_run_until_complete_cb)
try:
self.run_forever()
except:
if new_task and future.done() and not future.cancelled():
# The coroutine raised a BaseException. Consume the exception
# to not log a warning, the caller doesn't have access to the
# local task.
future.exception()
raise
future.remove_done_callback(_run_until_complete_cb)
if not future.done():
raise RuntimeError('Event loop stopped before Future completed.')
return future.result()
def stop(self):
"""Stop running the event loop.
Every callback already scheduled will still run. This simply informs
run_forever to stop looping after a complete iteration.
"""
self._stopping = True
def close(self):
"""Close the event loop.
This clears the queues and shuts down the executor,
but does not wait for the executor to finish.
The event loop must not be running.
"""
if self.is_running():
raise RuntimeError("Cannot close a running event loop")
if self._closed:
return
if self._debug:
logger.debug("Close %r", self)
self._closed = True
self._ready.clear()
self._scheduled.clear()
executor = self._default_executor
if executor is not None:
self._default_executor = None
executor.shutdown(wait=False)
def is_closed(self):
"""Returns True if the event loop was closed."""
return self._closed
    # On Python 3.3 and older, objects with a destructor that are part of a
    # reference cycle are never destroyed.  This is no longer the case on
    # Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if not self.is_closed():
warnings.warn("unclosed event loop %r" % self, ResourceWarning)
if not self.is_running():
self.close()
def is_running(self):
"""Returns True if the event loop is running."""
return (self._thread_id is not None)
def time(self):
"""Return the time according to the event loop's clock.
This is a float expressed in seconds since an epoch, but the
epoch, precision, accuracy and drift are unspecified and may
differ per event loop.
"""
return time.monotonic()
def call_later(self, delay, callback, *args):
"""Arrange for a callback to be called at a given time.
Return a Handle: an opaque object with a cancel() method that
can be used to cancel the call.
The delay can be an int or float, expressed in seconds. It is
always relative to the current time.
Each callback will be called exactly once. If two callbacks
        are scheduled for exactly the same time, it is undefined which
will be called first.
Any positional arguments after the callback will be passed to
the callback when it is called.
"""
timer = self.call_at(self.time() + delay, callback, *args)
if timer._source_traceback:
del timer._source_traceback[-1]
return timer
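    # A minimal usage sketch (illustrative, assuming asyncio is imported):
    # positional arguments after the callback are passed through, and the
    # returned handle can cancel the scheduled call.
    #
    # >>> loop = asyncio.get_event_loop()
    # >>> handle = loop.call_later(0.1, print, 'hi')
    # >>> handle.cancel()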
def call_at(self, when, callback, *args):
"""Like call_later(), but uses an absolute time.
Absolute time corresponds to the event loop's time() method.
"""
if (coroutines.iscoroutine(callback)
or coroutines.iscoroutinefunction(callback)):
raise TypeError("coroutines cannot be used with call_at()")
self._check_closed()
if self._debug:
self._check_thread()
timer = events.TimerHandle(when, callback, args, self)
if timer._source_traceback:
del timer._source_traceback[-1]
heapq.heappush(self._scheduled, timer)
timer._scheduled = True
return timer
def call_soon(self, callback, *args):
"""Arrange for a callback to be called as soon as possible.
This operates as a FIFO queue: callbacks are called in the
order in which they are registered. Each callback will be
called exactly once.
Any positional arguments after the callback will be passed to
the callback when it is called.
"""
if self._debug:
self._check_thread()
handle = self._call_soon(callback, args)
if handle._source_traceback:
del handle._source_traceback[-1]
return handle
def _call_soon(self, callback, args):
if (coroutines.iscoroutine(callback)
or coroutines.iscoroutinefunction(callback)):
raise TypeError("coroutines cannot be used with call_soon()")
self._check_closed()
handle = events.Handle(callback, args, self)
if handle._source_traceback:
del handle._source_traceback[-1]
self._ready.append(handle)
return handle
def _check_thread(self):
"""Check that the current thread is the thread running the event loop.
Non-thread-safe methods of this class make this assumption and will
likely behave incorrectly when the assumption is violated.
Should only be called when (self._debug == True). The caller is
responsible for checking this condition for performance reasons.
"""
if self._thread_id is None:
return
thread_id = threading.get_ident()
if thread_id != self._thread_id:
raise RuntimeError(
"Non-thread-safe operation invoked on an event loop other "
"than the current one")
def call_soon_threadsafe(self, callback, *args):
"""Like call_soon(), but thread-safe."""
handle = self._call_soon(callback, args)
if handle._source_traceback:
del handle._source_traceback[-1]
self._write_to_self()
return handle
def run_in_executor(self, executor, func, *args):
if (coroutines.iscoroutine(func)
or coroutines.iscoroutinefunction(func)):
raise TypeError("coroutines cannot be used with run_in_executor()")
self._check_closed()
if isinstance(func, events.Handle):
assert not args
assert not isinstance(func, events.TimerHandle)
if func._cancelled:
f = futures.Future(loop=self)
f.set_result(None)
return f
func, args = func._callback, func._args
if executor is None:
executor = self._default_executor
if executor is None:
executor = concurrent.futures.ThreadPoolExecutor(_MAX_WORKERS)
self._default_executor = executor
return futures.wrap_future(executor.submit(func, *args), loop=self)
def set_default_executor(self, executor):
self._default_executor = executor
def _getaddrinfo_debug(self, host, port, family, type, proto, flags):
msg = ["%s:%r" % (host, port)]
if family:
msg.append('family=%r' % family)
if type:
msg.append('type=%r' % type)
if proto:
msg.append('proto=%r' % proto)
if flags:
msg.append('flags=%r' % flags)
msg = ', '.join(msg)
logger.debug('Get address info %s', msg)
t0 = self.time()
addrinfo = socket.getaddrinfo(host, port, family, type, proto, flags)
dt = self.time() - t0
msg = ('Getting address info %s took %.3f ms: %r'
% (msg, dt * 1e3, addrinfo))
if dt >= self.slow_callback_duration:
logger.info(msg)
else:
logger.debug(msg)
return addrinfo
def getaddrinfo(self, host, port, *,
family=0, type=0, proto=0, flags=0):
info = _ipaddr_info(host, port, family, type, proto)
if info is not None:
fut = futures.Future(loop=self)
fut.set_result([info])
return fut
elif self._debug:
return self.run_in_executor(None, self._getaddrinfo_debug,
host, port, family, type, proto, flags)
else:
return self.run_in_executor(None, socket.getaddrinfo,
host, port, family, type, proto, flags)
def getnameinfo(self, sockaddr, flags=0):
return self.run_in_executor(None, socket.getnameinfo, sockaddr, flags)
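# --- Usage sketch (illustrative; not part of this module) ---------------
# getaddrinfo()/getnameinfo() return Futures: numeric addresses are
# resolved inline via _ipaddr_info(), everything else goes through the
# default executor.
import asyncio
import socket

def _demo_resolution():
    loop = asyncio.new_event_loop()
    try:
        infos = loop.run_until_complete(
            loop.getaddrinfo('127.0.0.1', 80, type=socket.SOCK_STREAM))
        family, sock_type, proto, canonname, sockaddr = infos[0]
        assert sockaddr[0] == '127.0.0.1'
    finally:
        loop.close()
# -------------------------------------------------------------------------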
@coroutine
def create_connection(self, protocol_factory, host=None, port=None, *,
ssl=None, family=0, proto=0, flags=0, sock=None,
local_addr=None, server_hostname=None):
"""Connect to a TCP server.
Create a streaming transport connection to a given Internet host and
port: socket family AF_INET or socket.AF_INET6 depending on host (or
family if specified), socket type SOCK_STREAM. protocol_factory must be
a callable returning a protocol instance.
This method is a coroutine which will try to establish the connection
in the background. When successful, the coroutine returns a
(transport, protocol) pair.
"""
if server_hostname is not None and not ssl:
raise ValueError('server_hostname is only meaningful with ssl')
if server_hostname is None and ssl:
# Use host as default for server_hostname. It is an error
# if host is empty or not set, e.g. when an
# already-connected socket was passed or when only a port
# is given. To avoid this error, you can pass
# server_hostname='' -- this will bypass the hostname
# check. (This also means that if host is a numeric
# IP/IPv6 address, we will attempt to verify that exact
# address; this will probably fail, but it is possible to
# create a certificate for a specific IP address, so we
# don't judge it here.)
if not host:
raise ValueError('You must set server_hostname '
'when using ssl without a host')
server_hostname = host
if host is not None or port is not None:
if sock is not None:
raise ValueError(
'host/port and sock can not be specified at the same time')
f1 = self.getaddrinfo(
host, port, family=family,
type=socket.SOCK_STREAM, proto=proto, flags=flags)
fs = [f1]
if local_addr is not None:
f2 = self.getaddrinfo(
*local_addr, family=family,
type=socket.SOCK_STREAM, proto=proto, flags=flags)
fs.append(f2)
else:
f2 = None
yield from tasks.wait(fs, loop=self)
infos = f1.result()
if not infos:
raise OSError('getaddrinfo() returned empty list')
if f2 is not None:
laddr_infos = f2.result()
if not laddr_infos:
raise OSError('getaddrinfo() returned empty list')
exceptions = []
for family, type, proto, cname, address in infos:
try:
sock = socket.socket(family=family, type=type, proto=proto)
sock.setblocking(False)
if f2 is not None:
for _, _, _, _, laddr in laddr_infos:
try:
sock.bind(laddr)
break
except OSError as exc:
exc = OSError(
exc.errno, 'error while '
'attempting to bind on address '
'{!r}: {}'.format(
laddr, exc.strerror.lower()))
exceptions.append(exc)
else:
sock.close()
sock = None
continue
if self._debug:
logger.debug("connect %r to %r", sock, address)
yield from self.sock_connect(sock, address)
except OSError as exc:
if sock is not None:
sock.close()
exceptions.append(exc)
except:
if sock is not None:
sock.close()
raise
else:
break
else:
if len(exceptions) == 1:
raise exceptions[0]
else:
# If they all have the same str(), raise one.
model = str(exceptions[0])
if all(str(exc) == model for exc in exceptions):
raise exceptions[0]
# Raise a combined exception so the user can see all
# the various error messages.
raise OSError('Multiple exceptions: {}'.format(
', '.join(str(exc) for exc in exceptions)))
elif sock is None:
raise ValueError(
'host and port was not specified and no sock specified')
sock.setblocking(False)
transport, protocol = yield from self._create_connection_transport(
sock, protocol_factory, ssl, server_hostname)
if self._debug:
# Get the socket from the transport because SSL transport closes
# the old socket and creates a new SSL socket
sock = transport.get_extra_info('socket')
logger.debug("%r connected to %s:%r: (%r, %r)",
sock, host, port, transport, protocol)
return transport, protocol
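# --- Usage sketch (illustrative; not part of this module) ---------------
# A self-contained create_connection() round trip: an echo server plus a
# client protocol on one loop, bound to an ephemeral localhost port. The
# protocol class names are illustrative only.
import asyncio

class _EchoServer(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
    def data_received(self, data):
        self.transport.write(data)

class _EchoClient(asyncio.Protocol):
    def __init__(self, done):
        self.done = done
    def connection_made(self, transport):
        transport.write(b'ping')
    def data_received(self, data):
        self.done.set_result(data)

def _demo_create_connection():
    loop = asyncio.new_event_loop()
    try:
        server = loop.run_until_complete(
            loop.create_server(_EchoServer, '127.0.0.1', 0))
        port = server.sockets[0].getsockname()[1]
        done = asyncio.Future(loop=loop)
        transport, protocol = loop.run_until_complete(
            loop.create_connection(lambda: _EchoClient(done),
                                   '127.0.0.1', port))
        assert loop.run_until_complete(done) == b'ping'
        transport.close()
        server.close()
        loop.run_until_complete(server.wait_closed())
    finally:
        loop.close()
# -------------------------------------------------------------------------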
@coroutine
def _create_connection_transport(self, sock, protocol_factory, ssl,
server_hostname):
protocol = protocol_factory()
waiter = futures.Future(loop=self)
if ssl:
sslcontext = None if isinstance(ssl, bool) else ssl
transport = self._make_ssl_transport(
sock, protocol, sslcontext, waiter,
server_side=False, server_hostname=server_hostname)
else:
transport = self._make_socket_transport(sock, protocol, waiter)
try:
yield from waiter
except:
transport.close()
raise
return transport, protocol
@coroutine
def create_datagram_endpoint(self, protocol_factory,
local_addr=None, remote_addr=None, *,
family=0, proto=0, flags=0,
reuse_address=None, reuse_port=None,
allow_broadcast=None, sock=None):
"""Create datagram connection."""
if sock is not None:
if (local_addr or remote_addr or
family or proto or flags or
reuse_address or reuse_port or allow_broadcast):
# show the problematic kwargs in exception msg
opts = dict(local_addr=local_addr, remote_addr=remote_addr,
family=family, proto=proto, flags=flags,
reuse_address=reuse_address, reuse_port=reuse_port,
allow_broadcast=allow_broadcast)
problems = ', '.join(
'{}={}'.format(k, v) for k, v in opts.items() if v)
raise ValueError(
'socket modifier keyword arguments can not be used '
'when sock is specified. ({})'.format(problems))
sock.setblocking(False)
r_addr = None
else:
if not (local_addr or remote_addr):
if family == 0:
raise ValueError('unexpected address family')
addr_pairs_info = (((family, proto), (None, None)),)
else:
# join address by (family, protocol)
addr_infos = collections.OrderedDict()
for idx, addr in ((0, local_addr), (1, remote_addr)):
if addr is not None:
assert isinstance(addr, tuple) and len(addr) == 2, (
'2-tuple is expected')
infos = yield from self.getaddrinfo(
*addr, family=family, type=socket.SOCK_DGRAM,
proto=proto, flags=flags)
if not infos:
raise OSError('getaddrinfo() returned empty list')
for fam, _, pro, _, address in infos:
key = (fam, pro)
if key not in addr_infos:
addr_infos[key] = [None, None]
addr_infos[key][idx] = address
# each addr has to have info for each (family, proto) pair
addr_pairs_info = [
(key, addr_pair) for key, addr_pair in addr_infos.items()
if not ((local_addr and addr_pair[0] is None) or
(remote_addr and addr_pair[1] is None))]
if not addr_pairs_info:
raise ValueError('can not get address information')
exceptions = []
if reuse_address is None:
reuse_address = os.name == 'posix' and sys.platform != 'cygwin'
for ((family, proto),
(local_address, remote_address)) in addr_pairs_info:
sock = None
r_addr = None
try:
sock = socket.socket(
family=family, type=socket.SOCK_DGRAM, proto=proto)
if reuse_address:
sock.setsockopt(
socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
if reuse_port:
if not hasattr(socket, 'SO_REUSEPORT'):
raise ValueError(
'reuse_port not supported by socket module')
else:
sock.setsockopt(
socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
if allow_broadcast:
sock.setsockopt(
socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.setblocking(False)
if local_addr:
sock.bind(local_address)
if remote_addr:
yield from self.sock_connect(sock, remote_address)
r_addr = remote_address
except OSError as exc:
if sock is not None:
sock.close()
exceptions.append(exc)
except:
if sock is not None:
sock.close()
raise
else:
break
else:
raise exceptions[0]
protocol = protocol_factory()
waiter = futures.Future(loop=self)
transport = self._make_datagram_transport(
sock, protocol, r_addr, waiter)
if self._debug:
if local_addr:
logger.info("Datagram endpoint local_addr=%r remote_addr=%r "
"created: (%r, %r)",
local_addr, remote_addr, transport, protocol)
else:
logger.debug("Datagram endpoint remote_addr=%r created: "
"(%r, %r)",
remote_addr, transport, protocol)
try:
yield from waiter
except:
transport.close()
raise
return transport, protocol
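# --- Usage sketch (illustrative; not part of this module) ---------------
# create_datagram_endpoint(): one UDP endpoint bound to an ephemeral
# localhost port, and a second one "connected" to it via remote_addr.
# The protocol class name is illustrative only.
import asyncio

class _UDPReceiver(asyncio.DatagramProtocol):
    def __init__(self, done):
        self.done = done
    def datagram_received(self, data, addr):
        self.done.set_result(data)

def _demo_datagram_endpoint():
    loop = asyncio.new_event_loop()
    try:
        done = asyncio.Future(loop=loop)
        recv_tr, _ = loop.run_until_complete(
            loop.create_datagram_endpoint(
                lambda: _UDPReceiver(done),
                local_addr=('127.0.0.1', 0)))
        addr = recv_tr.get_extra_info('sockname')
        send_tr, _ = loop.run_until_complete(
            loop.create_datagram_endpoint(
                asyncio.DatagramProtocol, remote_addr=addr))
        send_tr.sendto(b'hello')
        assert loop.run_until_complete(done) == b'hello'
        send_tr.close()
        recv_tr.close()
    finally:
        loop.close()
# -------------------------------------------------------------------------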
@coroutine
def _create_server_getaddrinfo(self, host, port, family, flags):
infos = yield from self.getaddrinfo(host, port, family=family,
type=socket.SOCK_STREAM,
flags=flags)
if not infos:
raise OSError('getaddrinfo({!r}) returned empty list'.format(host))
return infos
@coroutine
def create_server(self, protocol_factory, host=None, port=None,
*,
family=socket.AF_UNSPEC,
flags=socket.AI_PASSIVE,
sock=None,
backlog=100,
ssl=None,
reuse_address=None,
reuse_port=None):
"""Create a TCP server.
The host parameter can be a string, in which case the TCP server is
bound to host and port.
The host parameter can also be a sequence of strings and in that case
the TCP server is bound to all hosts of the sequence.
Return a Server object which can be used to stop the service.
This method is a coroutine.
"""
if isinstance(ssl, bool):
raise TypeError('ssl argument must be an SSLContext or None')
if host is not None or port is not None:
if sock is not None:
raise ValueError(
'host/port and sock can not be specified at the same time')
AF_INET6 = getattr(socket, 'AF_INET6', 0)
if reuse_address is None:
reuse_address = os.name == 'posix' and sys.platform != 'cygwin'
sockets = []
if host == '':
hosts = [None]
elif (isinstance(host, str) or
not isinstance(host, collections.Iterable)):
hosts = [host]
else:
hosts = host
fs = [self._create_server_getaddrinfo(host, port, family=family,
flags=flags)
for host in hosts]
infos = yield from tasks.gather(*fs, loop=self)
infos = itertools.chain.from_iterable(infos)
completed = False
try:
for res in infos:
af, socktype, proto, canonname, sa = res
try:
sock = socket.socket(af, socktype, proto)
except socket.error:
# Assume it's a bad family/type/protocol combination.
if self._debug:
logger.warning('create_server() failed to create '
'socket.socket(%r, %r, %r)',
af, socktype, proto, exc_info=True)
continue
sockets.append(sock)
if reuse_address:
sock.setsockopt(
socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
if reuse_port:
if not hasattr(socket, 'SO_REUSEPORT'):
raise ValueError(
'reuse_port not supported by socket module')
else:
sock.setsockopt(
socket.SOL_SOCKET, socket.SO_REUSEPORT, True)
# Disable IPv4/IPv6 dual stack support (enabled by
# default on Linux) which makes a single socket
# listen on both address families.
if af == AF_INET6 and hasattr(socket, 'IPPROTO_IPV6'):
sock.setsockopt(socket.IPPROTO_IPV6,
socket.IPV6_V6ONLY,
True)
try:
sock.bind(sa)
except OSError as err:
raise OSError(err.errno, 'error while attempting '
'to bind on address %r: %s'
% (sa, err.strerror.lower()))
completed = True
finally:
if not completed:
for sock in sockets:
sock.close()
else:
if sock is None:
raise ValueError('Neither host/port nor sock were specified')
sockets = [sock]
server = Server(self, sockets)
for sock in sockets:
sock.listen(backlog)
sock.setblocking(False)
self._start_serving(protocol_factory, sock, ssl, server)
if self._debug:
logger.info("%r is serving", server)
return server
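# --- Usage sketch (illustrative; not part of this module) ---------------
# create_server() accepts a single host, a sequence of hosts, or None
# (all interfaces). It returns a Server whose sockets can be inspected
# and which is stopped with close()/wait_closed().
import asyncio

def _demo_server_lifecycle():
    loop = asyncio.new_event_loop()
    try:
        server = loop.run_until_complete(
            loop.create_server(asyncio.Protocol, ['127.0.0.1'], 0))
        print('listening on',
              [s.getsockname() for s in server.sockets])
        server.close()
        loop.run_until_complete(server.wait_closed())
    finally:
        loop.close()
# -------------------------------------------------------------------------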
@coroutine
def connect_read_pipe(self, protocol_factory, pipe):
protocol = protocol_factory()
waiter = futures.Future(loop=self)
transport = self._make_read_pipe_transport(pipe, protocol, waiter)
try:
yield from waiter
except:
transport.close()
raise
if self._debug:
logger.debug('Read pipe %r connected: (%r, %r)',
pipe.fileno(), transport, protocol)
return transport, protocol
@coroutine
def connect_write_pipe(self, protocol_factory, pipe):
protocol = protocol_factory()
waiter = futures.Future(loop=self)
transport = self._make_write_pipe_transport(pipe, protocol, waiter)
try:
yield from waiter
except:
transport.close()
raise
if self._debug:
logger.debug('Write pipe %r connected: (%r, %r)',
pipe.fileno(), transport, protocol)
return transport, protocol
def _log_subprocess(self, msg, stdin, stdout, stderr):
info = [msg]
if stdin is not None:
info.append('stdin=%s' % _format_pipe(stdin))
if stdout is not None and stderr == subprocess.STDOUT:
info.append('stdout=stderr=%s' % _format_pipe(stdout))
else:
if stdout is not None:
info.append('stdout=%s' % _format_pipe(stdout))
if stderr is not None:
info.append('stderr=%s' % _format_pipe(stderr))
logger.debug(' '.join(info))
@coroutine
def subprocess_shell(self, protocol_factory, cmd, *, stdin=subprocess.PIPE,
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
universal_newlines=False, shell=True, bufsize=0,
**kwargs):
if not isinstance(cmd, (bytes, str)):
raise ValueError("cmd must be a string")
if universal_newlines:
raise ValueError("universal_newlines must be False")
if not shell:
raise ValueError("shell must be True")
if bufsize != 0:
raise ValueError("bufsize must be 0")
protocol = protocol_factory()
if self._debug:
# don't log parameters: they may contain sensitive information
# (password) and may be too long
debug_log = 'run shell command %r' % cmd
self._log_subprocess(debug_log, stdin, stdout, stderr)
transport = yield from self._make_subprocess_transport(
protocol, cmd, True, stdin, stdout, stderr, bufsize, **kwargs)
if self._debug:
logger.info('%s: %r' % (debug_log, transport))
return transport, protocol
@coroutine
def subprocess_exec(self, protocol_factory, program, *args,
stdin=subprocess.PIPE, stdout=subprocess.PIPE,
stderr=subprocess.PIPE, universal_newlines=False,
shell=False, bufsize=0, **kwargs):
if universal_newlines:
raise ValueError("universal_newlines must be False")
if shell:
raise ValueError("shell must be False")
if bufsize != 0:
raise ValueError("bufsize must be 0")
popen_args = (program,) + args
for arg in popen_args:
if not isinstance(arg, (str, bytes)):
raise TypeError("program arguments must be "
"a bytes or text string, not %s"
% type(arg).__name__)
protocol = protocol_factory()
if self._debug:
# don't log parameters: they may contain sensitive information
# (password) and may be too long
debug_log = 'execute program %r' % program
self._log_subprocess(debug_log, stdin, stdout, stderr)
transport = yield from self._make_subprocess_transport(
protocol, popen_args, False, stdin, stdout, stderr,
bufsize, **kwargs)
if self._debug:
logger.info('%s: %r' % (debug_log, transport))
return transport, protocol
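# --- Usage sketch (illustrative; not part of this module) ---------------
# subprocess_exec() underlies the higher-level helper
# asyncio.create_subprocess_exec(), used here; sys.executable keeps the
# example portable. (Subprocess support requires a loop that implements
# _make_subprocess_transport, e.g. the default Unix loop.)
import asyncio
import sys

def _demo_subprocess_exec():
    loop = asyncio.new_event_loop()
    try:
        @asyncio.coroutine
        def run():
            proc = yield from asyncio.create_subprocess_exec(
                sys.executable, '-c', 'print("hi")',
                stdout=asyncio.subprocess.PIPE, loop=loop)
            out, _ = yield from proc.communicate()
            return out
        assert loop.run_until_complete(run()).strip() == b'hi'
    finally:
        loop.close()
# -------------------------------------------------------------------------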
def set_exception_handler(self, handler):
"""Set handler as the new event loop exception handler.
If handler is None, the default exception handler will
be set.
If handler is a callable object, it should have a
signature matching '(loop, context)', where 'loop'
will be a reference to the active event loop, 'context'
will be a dict object (see `call_exception_handler()`
documentation for details about context).
"""
if handler is not None and not callable(handler):
raise TypeError('A callable object or None is expected, '
'got {!r}'.format(handler))
self._exception_handler = handler
def default_exception_handler(self, context):
"""Default exception handler.
This is called when an exception occurs and no exception
handler is set, and can be called by a custom exception
handler that wants to defer to the default behavior.
The context parameter has the same meaning as in
`call_exception_handler()`.
"""
message = context.get('message')
if not message:
message = 'Unhandled exception in event loop'
exception = context.get('exception')
if exception is not None:
exc_info = (type(exception), exception, exception.__traceback__)
else:
exc_info = False
if ('source_traceback' not in context
and self._current_handle is not None
and self._current_handle._source_traceback):
context['handle_traceback'] = self._current_handle._source_traceback
log_lines = [message]
for key in sorted(context):
if key in {'message', 'exception'}:
continue
value = context[key]
if key == 'source_traceback':
tb = ''.join(traceback.format_list(value))
value = 'Object created at (most recent call last):\n'
value += tb.rstrip()
elif key == 'handle_traceback':
tb = ''.join(traceback.format_list(value))
value = 'Handle created at (most recent call last):\n'
value += tb.rstrip()
else:
value = repr(value)
log_lines.append('{}: {}'.format(key, value))
logger.error('\n'.join(log_lines), exc_info=exc_info)
def call_exception_handler(self, context):
"""Call the current event loop's exception handler.
The context argument is a dict containing the following keys:
- 'message': Error message;
- 'exception' (optional): Exception object;
- 'future' (optional): Future instance;
- 'handle' (optional): Handle instance;
- 'protocol' (optional): Protocol instance;
- 'transport' (optional): Transport instance;
- 'socket' (optional): Socket instance.
New keys may be introduced in the future.
Note: do not overload this method in an event loop subclass.
For custom exception handling, use the
`set_exception_handler()` method.
"""
if self._exception_handler is None:
try:
self.default_exception_handler(context)
except Exception:
# Second protection layer for unexpected errors
# in the default implementation, as well as for subclassed
# event loops with overloaded "default_exception_handler".
logger.error('Exception in default exception handler',
exc_info=True)
else:
try:
self._exception_handler(self, context)
except Exception as exc:
# Exception in the user set custom exception handler.
try:
# Let's try default handler.
self.default_exception_handler({
'message': 'Unhandled error in exception handler',
'exception': exc,
'context': context,
})
except Exception:
# Guard 'default_exception_handler' in case it is
# overloaded.
logger.error('Exception in default exception handler '
'while handling an unexpected error '
'in custom exception handler',
exc_info=True)
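# --- Usage sketch (illustrative; not part of this module) ---------------
# A custom handler installed with set_exception_handler() receives
# (loop, context); it can record the error and then defer to
# default_exception_handler() for the standard logging.
import asyncio

def _demo_exception_handler():
    seen = []
    def handler(loop, context):
        seen.append(context['message'])
        loop.default_exception_handler(context)
    loop = asyncio.new_event_loop()
    try:
        loop.set_exception_handler(handler)
        loop.call_soon(lambda: 1 / 0)   # raises inside a callback
        loop.call_soon(loop.stop)
        loop.run_forever()
        assert seen and seen[0].startswith('Exception in callback')
    finally:
        loop.close()
# -------------------------------------------------------------------------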
def _add_callback(self, handle):
"""Add a Handle to _scheduled (TimerHandle) or _ready."""
assert isinstance(handle, events.Handle), 'A Handle is required here'
if handle._cancelled:
return
assert not isinstance(handle, events.TimerHandle)
self._ready.append(handle)
def _add_callback_signalsafe(self, handle):
"""Like _add_callback() but called from a signal handler."""
self._add_callback(handle)
self._write_to_self()
def _timer_handle_cancelled(self, handle):
"""Notification that a TimerHandle has been cancelled."""
if handle._scheduled:
self._timer_cancelled_count += 1
def _run_once(self):
"""Run one full iteration of the event loop.
This calls all currently ready callbacks, polls for I/O,
schedules the resulting callbacks, and finally schedules
'call_later' callbacks.
"""
sched_count = len(self._scheduled)
if (sched_count > _MIN_SCHEDULED_TIMER_HANDLES and
self._timer_cancelled_count / sched_count >
_MIN_CANCELLED_TIMER_HANDLES_FRACTION):
# Remove delayed calls that were cancelled if their number
# is too high
new_scheduled = []
for handle in self._scheduled:
if handle._cancelled:
handle._scheduled = False
else:
new_scheduled.append(handle)
heapq.heapify(new_scheduled)
self._scheduled = new_scheduled
self._timer_cancelled_count = 0
else:
# Remove delayed calls that were cancelled from head of queue.
while self._scheduled and self._scheduled[0]._cancelled:
self._timer_cancelled_count -= 1
handle = heapq.heappop(self._scheduled)
handle._scheduled = False
timeout = None
if self._ready or self._stopping:
timeout = 0
elif self._scheduled:
# Compute the desired timeout.
when = self._scheduled[0]._when
timeout = max(0, when - self.time())
if self._debug and timeout != 0:
t0 = self.time()
event_list = self._selector.select(timeout)
dt = self.time() - t0
if dt >= 1.0:
level = logging.INFO
else:
level = logging.DEBUG
nevent = len(event_list)
if timeout is None:
logger.log(level, 'poll took %.3f ms: %s events',
dt * 1e3, nevent)
elif nevent:
logger.log(level,
'poll %.3f ms took %.3f ms: %s events',
timeout * 1e3, dt * 1e3, nevent)
elif dt >= 1.0:
logger.log(level,
'poll %.3f ms took %.3f ms: timeout',
timeout * 1e3, dt * 1e3)
else:
event_list = self._selector.select(timeout)
self._process_events(event_list)
# Handle 'later' callbacks that are ready.
end_time = self.time() + self._clock_resolution
while self._scheduled:
handle = self._scheduled[0]
if handle._when >= end_time:
break
handle = heapq.heappop(self._scheduled)
handle._scheduled = False
self._ready.append(handle)
# This is the only place where callbacks are actually *called*.
# All other places just add them to ready.
# Note: We run all currently scheduled callbacks, but not any
# callbacks scheduled by callbacks run this time around --
# they will be run the next time (after another I/O poll).
# Use an idiom that is thread-safe without using locks.
ntodo = len(self._ready)
for i in range(ntodo):
handle = self._ready.popleft()
if handle._cancelled:
continue
if self._debug:
try:
self._current_handle = handle
t0 = self.time()
handle._run()
dt = self.time() - t0
if dt >= self.slow_callback_duration:
logger.warning('Executing %s took %.3f seconds',
_format_handle(handle), dt)
finally:
self._current_handle = None
else:
handle._run()
handle = None # Needed to break cycles when an exception occurs.
def _set_coroutine_wrapper(self, enabled):
try:
set_wrapper = sys.set_coroutine_wrapper
get_wrapper = sys.get_coroutine_wrapper
except AttributeError:
return
enabled = bool(enabled)
if self._coroutine_wrapper_set == enabled:
return
wrapper = coroutines.debug_wrapper
current_wrapper = get_wrapper()
if enabled:
if current_wrapper not in (None, wrapper):
warnings.warn(
"loop.set_debug(True): cannot set debug coroutine "
"wrapper; another wrapper is already set %r" %
current_wrapper, RuntimeWarning)
else:
set_wrapper(wrapper)
self._coroutine_wrapper_set = True
else:
if current_wrapper not in (None, wrapper):
warnings.warn(
"loop.set_debug(False): cannot unset debug coroutine "
"wrapper; another wrapper was set %r" %
current_wrapper, RuntimeWarning)
else:
set_wrapper(None)
self._coroutine_wrapper_set = False
def get_debug(self):
return self._debug
def set_debug(self, enabled):
self._debug = enabled
if self.is_running():
self._set_coroutine_wrapper(enabled)
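# --- Usage sketch (illustrative; not part of this module) ---------------
# Debug mode can be enabled per loop with set_debug(True) or globally
# with the PYTHONASYNCIODEBUG environment variable; it enables thread
# checks, coroutine wrapping and slow-callback warnings.
import asyncio

def _demo_debug_mode():
    loop = asyncio.new_event_loop()
    try:
        loop.set_debug(True)
        # Callbacks slower than this many seconds are logged as warnings.
        loop.slow_callback_duration = 0.01
        assert loop.get_debug() is True
    finally:
        loop.close()
# -------------------------------------------------------------------------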
import collections
import subprocess
import warnings
from . import compat
from . import futures
from . import protocols
from . import transports
from .coroutines import coroutine
from .log import logger
class BaseSubprocessTransport(transports.SubprocessTransport):
def __init__(self, loop, protocol, args, shell,
stdin, stdout, stderr, bufsize,
waiter=None, extra=None, **kwargs):
super().__init__(extra)
self._closed = False
self._protocol = protocol
self._loop = loop
self._proc = None
self._pid = None
self._returncode = None
self._exit_waiters = []
self._pending_calls = collections.deque()
self._pipes = {}
self._finished = False
if stdin == subprocess.PIPE:
self._pipes[0] = None
if stdout == subprocess.PIPE:
self._pipes[1] = None
if stderr == subprocess.PIPE:
self._pipes[2] = None
# Create the child process: set the _proc attribute
try:
self._start(args=args, shell=shell, stdin=stdin, stdout=stdout,
stderr=stderr, bufsize=bufsize, **kwargs)
except:
self.close()
raise
self._pid = self._proc.pid
self._extra['subprocess'] = self._proc
if self._loop.get_debug():
if isinstance(args, (bytes, str)):
program = args
else:
program = args[0]
logger.debug('process %r created: pid %s',
program, self._pid)
self._loop.create_task(self._connect_pipes(waiter))
def __repr__(self):
info = [self.__class__.__name__]
if self._closed:
info.append('closed')
if self._pid is not None:
info.append('pid=%s' % self._pid)
if self._returncode is not None:
info.append('returncode=%s' % self._returncode)
elif self._pid is not None:
info.append('running')
else:
info.append('not started')
stdin = self._pipes.get(0)
if stdin is not None:
info.append('stdin=%s' % stdin.pipe)
stdout = self._pipes.get(1)
stderr = self._pipes.get(2)
if stdout is not None and stderr is stdout:
info.append('stdout=stderr=%s' % stdout.pipe)
else:
if stdout is not None:
info.append('stdout=%s' % stdout.pipe)
if stderr is not None:
info.append('stderr=%s' % stderr.pipe)
return '<%s>' % ' '.join(info)
def _start(self, args, shell, stdin, stdout, stderr, bufsize, **kwargs):
raise NotImplementedError
def is_closing(self):
return self._closed
def close(self):
if self._closed:
return
self._closed = True
for proto in self._pipes.values():
if proto is None:
continue
proto.pipe.close()
if (self._proc is not None
# the child process finished?
and self._returncode is None
# the child process finished but the transport was not notified yet?
and self._proc.poll() is None
):
if self._loop.get_debug():
logger.warning('Close running child process: kill %r', self)
try:
self._proc.kill()
except ProcessLookupError:
pass
# Don't clear the _proc reference yet: _post_init() may still run
# On Python 3.3 and older, objects with a destructor that are part of a
# reference cycle are never destroyed. That is no longer the case on
# Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if not self._closed:
warnings.warn("unclosed transport %r" % self, ResourceWarning)
self.close()
def get_pid(self):
return self._pid
def get_returncode(self):
return self._returncode
def get_pipe_transport(self, fd):
if fd in self._pipes:
return self._pipes[fd].pipe
else:
return None
def _check_proc(self):
if self._proc is None:
raise ProcessLookupError()
def send_signal(self, signal):
self._check_proc()
self._proc.send_signal(signal)
def terminate(self):
self._check_proc()
self._proc.terminate()
def kill(self):
self._check_proc()
self._proc.kill()
@coroutine
def _connect_pipes(self, waiter):
try:
proc = self._proc
loop = self._loop
if proc.stdin is not None:
_, pipe = yield from loop.connect_write_pipe(
lambda: WriteSubprocessPipeProto(self, 0),
proc.stdin)
self._pipes[0] = pipe
if proc.stdout is not None:
_, pipe = yield from loop.connect_read_pipe(
lambda: ReadSubprocessPipeProto(self, 1),
proc.stdout)
self._pipes[1] = pipe
if proc.stderr is not None:
_, pipe = yield from loop.connect_read_pipe(
lambda: ReadSubprocessPipeProto(self, 2),
proc.stderr)
self._pipes[2] = pipe
assert self._pending_calls is not None
loop.call_soon(self._protocol.connection_made, self)
for callback, data in self._pending_calls:
loop.call_soon(callback, *data)
self._pending_calls = None
except Exception as exc:
if waiter is not None and not waiter.cancelled():
waiter.set_exception(exc)
else:
if waiter is not None and not waiter.cancelled():
waiter.set_result(None)
def _call(self, cb, *data):
if self._pending_calls is not None:
self._pending_calls.append((cb, data))
else:
self._loop.call_soon(cb, *data)
def _pipe_connection_lost(self, fd, exc):
self._call(self._protocol.pipe_connection_lost, fd, exc)
self._try_finish()
def _pipe_data_received(self, fd, data):
self._call(self._protocol.pipe_data_received, fd, data)
def _process_exited(self, returncode):
assert returncode is not None, returncode
assert self._returncode is None, self._returncode
if self._loop.get_debug():
logger.info('%r exited with return code %r',
self, returncode)
self._returncode = returncode
self._call(self._protocol.process_exited)
self._try_finish()
# wake up futures waiting for wait()
for waiter in self._exit_waiters:
if not waiter.cancelled():
waiter.set_result(returncode)
self._exit_waiters = None
@coroutine
def _wait(self):
"""Wait until the process exit and return the process return code.
This method is a coroutine."""
if self._returncode is not None:
return self._returncode
waiter = futures.Future(loop=self._loop)
self._exit_waiters.append(waiter)
return (yield from waiter)
def _try_finish(self):
assert not self._finished
if self._returncode is None:
return
if all(p is not None and p.disconnected
for p in self._pipes.values()):
self._finished = True
self._call(self._call_connection_lost, None)
def _call_connection_lost(self, exc):
try:
self._protocol.connection_lost(exc)
finally:
self._loop = None
self._proc = None
self._protocol = None
class WriteSubprocessPipeProto(protocols.BaseProtocol):
def __init__(self, proc, fd):
self.proc = proc
self.fd = fd
self.pipe = None
self.disconnected = False
def connection_made(self, transport):
self.pipe = transport
def __repr__(self):
return ('<%s fd=%s pipe=%r>'
% (self.__class__.__name__, self.fd, self.pipe))
def connection_lost(self, exc):
self.disconnected = True
self.proc._pipe_connection_lost(self.fd, exc)
self.proc = None
def pause_writing(self):
self.proc._protocol.pause_writing()
def resume_writing(self):
self.proc._protocol.resume_writing()
class ReadSubprocessPipeProto(WriteSubprocessPipeProto,
protocols.Protocol):
def data_received(self, data):
self.proc._pipe_data_received(self.fd, data)
"""Compatibility helpers for the different Python versions."""
import sys
PY34 = sys.version_info >= (3, 4)
PY35 = sys.version_info >= (3, 5)
def flatten_list_bytes(list_of_data):
"""Concatenate a sequence of bytes-like objects."""
if not PY34:
# On Python 3.3 and older, bytes.join() doesn't handle
# memoryview.
list_of_data = (
bytes(data) if isinstance(data, memoryview) else data
for data in list_of_data)
return b''.join(list_of_data)
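# --- Usage sketch (illustrative; not part of this module) ---------------
# flatten_list_bytes() exists because b''.join() rejects memoryview items
# on Python 3.3 and older; the equivalent logic, inlined:
def _demo_flatten_list_bytes():
    parts = [b'abc', memoryview(b'def'), bytearray(b'ghi')]
    joined = b''.join(bytes(p) if isinstance(p, memoryview) else p
                      for p in parts)
    assert joined == b'abcdefghi'
# -------------------------------------------------------------------------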
"""Constants."""
# After the connection is lost, log warnings after this many write()s.
LOG_THRESHOLD_FOR_CONNLOST_WRITES = 5
# Seconds to wait before retrying accept().
ACCEPT_RETRY_DELAY = 1
__all__ = ['coroutine',
'iscoroutinefunction', 'iscoroutine']
import functools
import inspect
import opcode
import os
import sys
import traceback
import types
from . import compat
from . import events
from . import futures
from .log import logger
# Opcode of "yield from" instruction
_YIELD_FROM = opcode.opmap['YIELD_FROM']
# If you set _DEBUG to true, @coroutine will wrap the resulting
# generator objects in a CoroWrapper instance (defined below). That
# instance will log a message when the generator is never iterated
# over, which may happen when you forget to use "yield from" with a
# coroutine call. Note that the value of the _DEBUG flag is taken
# when the decorator is used, so to be of any use it must be set
# before you define your coroutines. A downside of using this feature
# is that tracebacks show entries for the CoroWrapper.__next__ method
# when _DEBUG is true.
_DEBUG = (not sys.flags.ignore_environment and
bool(os.environ.get('PYTHONASYNCIODEBUG')))
try:
_types_coroutine = types.coroutine
except AttributeError:
_types_coroutine = None
try:
_inspect_iscoroutinefunction = inspect.iscoroutinefunction
except AttributeError:
_inspect_iscoroutinefunction = lambda func: False
try:
from collections.abc import Coroutine as _CoroutineABC, \
Awaitable as _AwaitableABC
except ImportError:
_CoroutineABC = _AwaitableABC = None
# Check for CPython issue #21209
def has_yield_from_bug():
class MyGen:
def __init__(self):
self.send_args = None
def __iter__(self):
return self
def __next__(self):
return 42
def send(self, *what):
self.send_args = what
return None
def yield_from_gen(gen):
yield from gen
value = (1, 2, 3)
gen = MyGen()
coro = yield_from_gen(gen)
next(coro)
coro.send(value)
return gen.send_args != (value,)
_YIELD_FROM_BUG = has_yield_from_bug()
del has_yield_from_bug
def debug_wrapper(gen):
# This function is called from 'sys.set_coroutine_wrapper'.
# We only wrap coroutines defined via the 'async def' syntax here;
# generator-based coroutines are wrapped by the @coroutine
# decorator.
return CoroWrapper(gen, None)
class CoroWrapper:
# Wrapper for coroutine object in _DEBUG mode.
def __init__(self, gen, func=None):
assert inspect.isgenerator(gen) or inspect.iscoroutine(gen), gen
self.gen = gen
self.func = func # Used to unwrap @coroutine decorator
self._source_traceback = traceback.extract_stack(sys._getframe(1))
self.__name__ = getattr(gen, '__name__', None)
self.__qualname__ = getattr(gen, '__qualname__', None)
def __repr__(self):
coro_repr = _format_coroutine(self)
if self._source_traceback:
frame = self._source_traceback[-1]
coro_repr += ', created at %s:%s' % (frame[0], frame[1])
return '<%s %s>' % (self.__class__.__name__, coro_repr)
def __iter__(self):
return self
def __next__(self):
return self.gen.send(None)
if _YIELD_FROM_BUG:
# Workaround for CPython issue #21209: using "yield from" with a custom
# generator, generator.send(tuple) unpacks the tuple instead of passing
# it unchanged. Check whether the caller is a generator using "yield
# from" to decide if the parameter should be unpacked or not.
def send(self, *value):
frame = sys._getframe()
caller = frame.f_back
assert caller.f_lasti >= 0
if caller.f_code.co_code[caller.f_lasti] != _YIELD_FROM:
value = value[0]
return self.gen.send(value)
else:
def send(self, value):
return self.gen.send(value)
def throw(self, exc):
return self.gen.throw(exc)
def close(self):
return self.gen.close()
@property
def gi_frame(self):
return self.gen.gi_frame
@property
def gi_running(self):
return self.gen.gi_running
@property
def gi_code(self):
return self.gen.gi_code
if compat.PY35:
def __await__(self):
cr_await = getattr(self.gen, 'cr_await', None)
if cr_await is not None:
raise RuntimeError(
"Cannot await on coroutine {!r} while it's "
"awaiting for {!r}".format(self.gen, cr_await))
return self
@property
def gi_yieldfrom(self):
return self.gen.gi_yieldfrom
@property
def cr_await(self):
return self.gen.cr_await
@property
def cr_running(self):
return self.gen.cr_running
@property
def cr_code(self):
return self.gen.cr_code
@property
def cr_frame(self):
return self.gen.cr_frame
def __del__(self):
# Be careful accessing self.gen.gi_frame -- self.gen might not exist.
gen = getattr(self, 'gen', None)
frame = getattr(gen, 'gi_frame', None)
if frame is None:
frame = getattr(gen, 'cr_frame', None)
if frame is not None and frame.f_lasti == -1:
msg = '%r was never yielded from' % self
tb = getattr(self, '_source_traceback', ())
if tb:
tb = ''.join(traceback.format_list(tb))
msg += ('\nCoroutine object created at '
'(most recent call last):\n')
msg += tb.rstrip()
logger.error(msg)
def coroutine(func):
"""Decorator to mark coroutines.
If the coroutine is not yielded from before it is destroyed,
an error message is logged.
"""
if _inspect_iscoroutinefunction(func):
# In Python 3.5 that's all we need to do for coroutines
# defined with "async def".
# Wrapping in CoroWrapper will happen via
# 'sys.set_coroutine_wrapper' function.
return func
if inspect.isgeneratorfunction(func):
coro = func
else:
@functools.wraps(func)
def coro(*args, **kw):
res = func(*args, **kw)
if isinstance(res, futures.Future) or inspect.isgenerator(res):
res = yield from res
elif _AwaitableABC is not None:
# If 'func' returns an Awaitable (new in 3.5) we
# want to run it.
try:
await_meth = res.__await__
except AttributeError:
pass
else:
if isinstance(res, _AwaitableABC):
res = yield from await_meth()
return res
if not _DEBUG:
if _types_coroutine is None:
wrapper = coro
else:
wrapper = _types_coroutine(coro)
else:
@functools.wraps(func)
def wrapper(*args, **kwds):
w = CoroWrapper(coro(*args, **kwds), func=func)
if w._source_traceback:
del w._source_traceback[-1]
# Python < 3.5 does not implement __qualname__
# on generator objects, so we set it manually.
# We use getattr as some callables (such as
# functools.partial) may lack __qualname__.
w.__name__ = getattr(func, '__name__', None)
w.__qualname__ = getattr(func, '__qualname__', None)
return w
wrapper._is_coroutine = True # For iscoroutinefunction().
return wrapper
def iscoroutinefunction(func):
"""Return True if func is a decorated coroutine function."""
return (getattr(func, '_is_coroutine', False) or
_inspect_iscoroutinefunction(func))
_COROUTINE_TYPES = (types.GeneratorType, CoroWrapper)
if _CoroutineABC is not None:
_COROUTINE_TYPES += (_CoroutineABC,)
def iscoroutine(obj):
"""Return True if obj is a coroutine object."""
return isinstance(obj, _COROUTINE_TYPES)
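# --- Usage sketch (illustrative; not part of this module) ---------------
# The @coroutine decorator and the iscoroutinefunction() helper, in the
# generator-based style of this asyncio vintage (driven by 'yield from';
# asyncio.coroutine was removed in Python 3.11).
import asyncio

@asyncio.coroutine
def _add(a, b):
    yield from asyncio.sleep(0)   # a cooperative yield point
    return a + b

def _demo_coroutine_decorator():
    assert asyncio.iscoroutinefunction(_add)
    loop = asyncio.new_event_loop()
    try:
        assert loop.run_until_complete(_add(1, 2)) == 3
    finally:
        loop.close()
# -------------------------------------------------------------------------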
def _format_coroutine(coro):
assert iscoroutine(coro)
coro_name = None
if isinstance(coro, CoroWrapper):
func = coro.func
coro_name = coro.__qualname__
if coro_name is not None:
coro_name = '{}()'.format(coro_name)
else:
func = coro
if coro_name is None:
coro_name = events._format_callback(func, ())
try:
coro_code = coro.gi_code
except AttributeError:
coro_code = coro.cr_code
try:
coro_frame = coro.gi_frame
except AttributeError:
coro_frame = coro.cr_frame
filename = coro_code.co_filename
lineno = 0
if (isinstance(coro, CoroWrapper) and
not inspect.isgeneratorfunction(coro.func) and
coro.func is not None):
source = events._get_function_source(coro.func)
if source is not None:
filename, lineno = source
if coro_frame is None:
coro_repr = ('%s done, defined at %s:%s'
% (coro_name, filename, lineno))
else:
coro_repr = ('%s running, defined at %s:%s'
% (coro_name, filename, lineno))
elif coro_frame is not None:
lineno = coro_frame.f_lineno
coro_repr = ('%s running at %s:%s'
% (coro_name, filename, lineno))
else:
lineno = coro_code.co_firstlineno
coro_repr = ('%s done, defined at %s:%s'
% (coro_name, filename, lineno))
return coro_repr
"""Event loop and event loop policy."""
__all__ = ['AbstractEventLoopPolicy',
'AbstractEventLoop', 'AbstractServer',
'Handle', 'TimerHandle',
'get_event_loop_policy', 'set_event_loop_policy',
'get_event_loop', 'set_event_loop', 'new_event_loop',
'get_child_watcher', 'set_child_watcher',
]
import functools
import inspect
import reprlib
import socket
import subprocess
import sys
import threading
import traceback
from asyncio import compat
def _get_function_source(func):
if compat.PY34:
func = inspect.unwrap(func)
elif hasattr(func, '__wrapped__'):
func = func.__wrapped__
if inspect.isfunction(func):
code = func.__code__
return (code.co_filename, code.co_firstlineno)
if isinstance(func, functools.partial):
return _get_function_source(func.func)
if compat.PY34 and isinstance(func, functools.partialmethod):
return _get_function_source(func.func)
return None
def _format_args(args):
"""Format function arguments.
Special case for a single parameter: ('hello',) is formatted as ('hello').
"""
# use reprlib to limit the length of the output
args_repr = reprlib.repr(args)
if len(args) == 1 and args_repr.endswith(',)'):
args_repr = args_repr[:-2] + ')'
return args_repr
def _format_callback(func, args, suffix=''):
if isinstance(func, functools.partial):
if args is not None:
suffix = _format_args(args) + suffix
return _format_callback(func.func, func.args, suffix)
if hasattr(func, '__qualname__'):
func_repr = getattr(func, '__qualname__')
elif hasattr(func, '__name__'):
func_repr = getattr(func, '__name__')
else:
func_repr = repr(func)
if args is not None:
func_repr += _format_args(args)
if suffix:
func_repr += suffix
return func_repr
def _format_callback_source(func, args):
func_repr = _format_callback(func, args)
source = _get_function_source(func)
if source:
func_repr += ' at %s:%s' % source
return func_repr
class Handle:
"""Object returned by callback registration methods."""
__slots__ = ('_callback', '_args', '_cancelled', '_loop',
'_source_traceback', '_repr', '__weakref__')
def __init__(self, callback, args, loop):
assert not isinstance(callback, Handle), 'A Handle is not a callback'
self._loop = loop
self._callback = callback
self._args = args
self._cancelled = False
self._repr = None
if self._loop.get_debug():
self._source_traceback = traceback.extract_stack(sys._getframe(1))
else:
self._source_traceback = None
def _repr_info(self):
info = [self.__class__.__name__]
if self._cancelled:
info.append('cancelled')
if self._callback is not None:
info.append(_format_callback_source(self._callback, self._args))
if self._source_traceback:
frame = self._source_traceback[-1]
info.append('created at %s:%s' % (frame[0], frame[1]))
return info
def __repr__(self):
if self._repr is not None:
return self._repr
info = self._repr_info()
return '<%s>' % ' '.join(info)
def cancel(self):
if not self._cancelled:
self._cancelled = True
if self._loop.get_debug():
# Keep a representation in debug mode to keep callback and
# parameters. For example, to log the warning
# "Executing <Handle...> took 2.5 second"
self._repr = repr(self)
self._callback = None
self._args = None
def _run(self):
try:
self._callback(*self._args)
except Exception as exc:
cb = _format_callback_source(self._callback, self._args)
msg = 'Exception in callback {}'.format(cb)
context = {
'message': msg,
'exception': exc,
'handle': self,
}
if self._source_traceback:
context['source_traceback'] = self._source_traceback
self._loop.call_exception_handler(context)
self = None # Needed to break cycles when an exception occurs.
class TimerHandle(Handle):
"""Object returned by timed callback registration methods."""
__slots__ = ['_scheduled', '_when']
def __init__(self, when, callback, args, loop):
assert when is not None
super().__init__(callback, args, loop)
if self._source_traceback:
del self._source_traceback[-1]
self._when = when
self._scheduled = False
def _repr_info(self):
info = super()._repr_info()
pos = 2 if self._cancelled else 1
info.insert(pos, 'when=%s' % self._when)
return info
def __hash__(self):
return hash(self._when)
def __lt__(self, other):
return self._when < other._when
def __le__(self, other):
if self._when < other._when:
return True
return self.__eq__(other)
def __gt__(self, other):
return self._when > other._when
def __ge__(self, other):
if self._when > other._when:
return True
return self.__eq__(other)
def __eq__(self, other):
if isinstance(other, TimerHandle):
return (self._when == other._when and
self._callback == other._callback and
self._args == other._args and
self._cancelled == other._cancelled)
return NotImplemented
def __ne__(self, other):
equal = self.__eq__(other)
return NotImplemented if equal is NotImplemented else not equal
def cancel(self):
if not self._cancelled:
self._loop._timer_handle_cancelled(self)
super().cancel()
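# --- Usage sketch (illustrative; not part of this module) ---------------
# cancel() on a TimerHandle marks it cancelled and notifies the loop via
# _timer_handle_cancelled(); the callback is then never invoked.
import asyncio

def _demo_cancel_timer():
    loop = asyncio.new_event_loop()
    try:
        fired = []
        handle = loop.call_later(0.01, fired.append, 'x')
        handle.cancel()
        loop.call_later(0.05, loop.stop)
        loop.run_forever()
        assert fired == []
    finally:
        loop.close()
# -------------------------------------------------------------------------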
class AbstractServer:
"""Abstract server returned by create_server()."""
def close(self):
"""Stop serving. This leaves existing connections open."""
return NotImplemented
def wait_closed(self):
"""Coroutine to wait until service is closed."""
return NotImplemented
class AbstractEventLoop:
"""Abstract event loop."""
# Running and stopping the event loop.
def run_forever(self):
"""Run the event loop until stop() is called."""
raise NotImplementedError
def run_until_complete(self, future):
"""Run the event loop until a Future is done.
Return the Future's result, or raise its exception.
"""
raise NotImplementedError
def stop(self):
"""Stop the event loop as soon as reasonable.
Exactly how soon that is may depend on the implementation, but
no more I/O callbacks should be scheduled.
"""
raise NotImplementedError
def is_running(self):
"""Return whether the event loop is currently running."""
raise NotImplementedError
def is_closed(self):
"""Returns True if the event loop was closed."""
raise NotImplementedError
def close(self):
"""Close the loop.
The loop should not be running.
This is idempotent and irreversible.
No other methods should be called after this one.
"""
raise NotImplementedError
# Methods scheduling callbacks. All these return Handles.
def _timer_handle_cancelled(self, handle):
"""Notification that a TimerHandle has been cancelled."""
raise NotImplementedError
def call_soon(self, callback, *args):
return self.call_later(0, callback, *args)
def call_later(self, delay, callback, *args):
raise NotImplementedError
def call_at(self, when, callback, *args):
raise NotImplementedError
def time(self):
raise NotImplementedError
# Method scheduling a coroutine object: create a task.
def create_task(self, coro):
raise NotImplementedError
# Methods for interacting with threads.
def call_soon_threadsafe(self, callback, *args):
raise NotImplementedError
def run_in_executor(self, executor, func, *args):
raise NotImplementedError
def set_default_executor(self, executor):
raise NotImplementedError
# Network I/O methods returning Futures.
def getaddrinfo(self, host, port, *, family=0, type=0, proto=0, flags=0):
raise NotImplementedError
def getnameinfo(self, sockaddr, flags=0):
raise NotImplementedError
def create_connection(self, protocol_factory, host=None, port=None, *,
ssl=None, family=0, proto=0, flags=0, sock=None,
local_addr=None, server_hostname=None):
raise NotImplementedError
def create_server(self, protocol_factory, host=None, port=None, *,
family=socket.AF_UNSPEC, flags=socket.AI_PASSIVE,
sock=None, backlog=100, ssl=None, reuse_address=None,
reuse_port=None):
"""A coroutine which creates a TCP server bound to host and port.
The return value is a Server object which can be used to stop
the service.
If host is an empty string or None, all interfaces are assumed
and a list of multiple sockets will be returned (most likely
one for IPv4 and another one for IPv6). The host parameter can also be a
sequence (e.g. list) of hosts to bind to.
family can be set to either AF_INET or AF_INET6 to force the
socket to use IPv4 or IPv6. If not set it will be determined
from host (defaults to AF_UNSPEC).
flags is a bitmask for getaddrinfo().
sock can optionally be specified in order to use a preexisting
socket object.
backlog is the maximum number of queued connections passed to
listen() (defaults to 100).
ssl can be set to an SSLContext to enable SSL over the
accepted connections.
reuse_address tells the kernel to reuse a local socket in
TIME_WAIT state, without waiting for its natural timeout to
expire. If not specified, it will automatically be set to True on
UNIX.
reuse_port tells the kernel to allow this endpoint to be bound to
the same port as other existing endpoints are bound to, so long as
they all set this flag when being created. This option is not
supported on Windows.
"""
raise NotImplementedError
def create_unix_connection(self, protocol_factory, path, *,
ssl=None, sock=None,
server_hostname=None):
raise NotImplementedError
def create_unix_server(self, protocol_factory, path, *,
sock=None, backlog=100, ssl=None):
"""A coroutine which creates a UNIX Domain Socket server.
The return value is a Server object, which can be used to stop
the service.
path is a str, representing a file system path to bind the
server socket to.
sock can optionally be specified in order to use a preexisting
socket object.
backlog is the maximum number of queued connections passed to
listen() (defaults to 100).
ssl can be set to an SSLContext to enable SSL over the
accepted connections.
"""
raise NotImplementedError
def create_datagram_endpoint(self, protocol_factory,
local_addr=None, remote_addr=None, *,
family=0, proto=0, flags=0,
reuse_address=None, reuse_port=None,
allow_broadcast=None, sock=None):
"""A coroutine which creates a datagram endpoint.
This method will try to establish the endpoint in the background.
When successful, the coroutine returns a (transport, protocol) pair.
protocol_factory must be a callable returning a protocol instance.
socket family AF_INET or socket.AF_INET6 depending on host (or
family if specified), socket type SOCK_DGRAM.
reuse_address tells the kernel to reuse a local socket in
TIME_WAIT state, without waiting for its natural timeout to
expire. If not specified it will automatically be set to True on
UNIX.
reuse_port tells the kernel to allow this endpoint to be bound to
the same port as other existing endpoints are bound to, so long as
they all set this flag when being created. This option is not
supported on Windows and some UNIXes. If the
:py:data:`~socket.SO_REUSEPORT` constant is not defined then this
capability is unsupported.
allow_broadcast tells the kernel to allow this endpoint to send
messages to the broadcast address.
sock can optionally be specified in order to use a preexisting
socket object.
"""
raise NotImplementedError
# Pipes and subprocesses.
def connect_read_pipe(self, protocol_factory, pipe):
"""Register read pipe in event loop. Set the pipe to non-blocking mode.
protocol_factory should instantiate an object with the Protocol interface.
pipe is a file-like object.
Return pair (transport, protocol), where transport supports the
ReadTransport interface."""
# The reason to accept a file-like object instead of just a file
# descriptor is that we need to own the pipe and close it when the
# transport is finished. Passing f.fileno(), closing the fd in the pipe
# transport and then closing f (or vice versa) can cause complicated
# errors.
raise NotImplementedError
def connect_write_pipe(self, protocol_factory, pipe):
"""Register write pipe in event loop.
protocol_factory should instantiate object with BaseProtocol interface.
Pipe is file-like object already switched to nonblocking.
Return pair (transport, protocol), where transport support
WriteTransport interface."""
# The reason to accept file-like object instead of just file descriptor
# is: we need to own pipe and close it at transport finishing
# Can got complicated errors if pass f.fileno(),
# close fd in pipe transport then close f and vise versa.
raise NotImplementedError
def subprocess_shell(self, protocol_factory, cmd, *, stdin=subprocess.PIPE,
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
**kwargs):
raise NotImplementedError
def subprocess_exec(self, protocol_factory, *args, stdin=subprocess.PIPE,
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
**kwargs):
raise NotImplementedError
# Ready-based callback registration methods.
# The add_*() methods return None.
# The remove_*() methods return True if something was removed,
# False if there was nothing to delete.
def add_reader(self, fd, callback, *args):
raise NotImplementedError
def remove_reader(self, fd):
raise NotImplementedError
def add_writer(self, fd, callback, *args):
raise NotImplementedError
def remove_writer(self, fd):
raise NotImplementedError
# Completion based I/O methods returning Futures.
def sock_recv(self, sock, nbytes):
raise NotImplementedError
def sock_sendall(self, sock, data):
raise NotImplementedError
def sock_connect(self, sock, address):
raise NotImplementedError
def sock_accept(self, sock):
raise NotImplementedError
# Signal handling.
def add_signal_handler(self, sig, callback, *args):
raise NotImplementedError
def remove_signal_handler(self, sig):
raise NotImplementedError
# Task factory.
def set_task_factory(self, factory):
raise NotImplementedError
def get_task_factory(self):
raise NotImplementedError
# Error handlers.
def set_exception_handler(self, handler):
raise NotImplementedError
def default_exception_handler(self, context):
raise NotImplementedError
def call_exception_handler(self, context):
raise NotImplementedError
# Debug flag management.
def get_debug(self):
raise NotImplementedError
def set_debug(self, enabled):
raise NotImplementedError
class AbstractEventLoopPolicy:
"""Abstract policy for accessing the event loop."""
def get_event_loop(self):
"""Get the event loop for the current context.
Returns an event loop object implementing the BaseEventLoop interface,
or raises an exception in case no event loop has been set for the
current context and the current policy does not specify to create one.
It should never return None."""
raise NotImplementedError
def set_event_loop(self, loop):
"""Set the event loop for the current context to loop."""
raise NotImplementedError
def new_event_loop(self):
"""Create and return a new event loop object according to this
policy's rules. If there's a need to set this loop as the event loop for
the current context, set_event_loop must be called explicitly."""
raise NotImplementedError
# Child processes handling (Unix only).
def get_child_watcher(self):
"Get the watcher for child processes."
raise NotImplementedError
def set_child_watcher(self, watcher):
"""Set the watcher for child processes."""
raise NotImplementedError
class BaseDefaultEventLoopPolicy(AbstractEventLoopPolicy):
"""Default policy implementation for accessing the event loop.
In this policy, each thread has its own event loop. However, we
only automatically create an event loop by default for the main
thread; other threads by default have no event loop.
Other policies may have different rules (e.g. a single global
event loop, or automatically creating an event loop per thread, or
using some other notion of context to which an event loop is
associated).
"""
_loop_factory = None
class _Local(threading.local):
_loop = None
_set_called = False
def __init__(self):
self._local = self._Local()
def get_event_loop(self):
"""Get the event loop.
Returns an instance of EventLoop; one is created on first use for the
main thread. Raises RuntimeError if no event loop is set for the
current thread.
"""
if (self._local._loop is None and
not self._local._set_called and
isinstance(threading.current_thread(), threading._MainThread)):
self.set_event_loop(self.new_event_loop())
if self._local._loop is None:
raise RuntimeError('There is no current event loop in thread %r.'
% threading.current_thread().name)
return self._local._loop
def set_event_loop(self, loop):
"""Set the event loop."""
self._local._set_called = True
assert loop is None or isinstance(loop, AbstractEventLoop)
self._local._loop = loop
def new_event_loop(self):
"""Create a new event loop.
You must call set_event_loop() to make this the current event
loop.
"""
return self._loop_factory()
# Event loop policy. The policy itself is always global, even if the
# policy's rules say that there is an event loop per thread (or other
# notion of context). The default policy is installed by the first
# call to get_event_loop_policy().
_event_loop_policy = None
# Lock for protecting the on-the-fly creation of the event loop policy.
_lock = threading.Lock()
def _init_event_loop_policy():
global _event_loop_policy
with _lock:
if _event_loop_policy is None: # pragma: no branch
from . import DefaultEventLoopPolicy
_event_loop_policy = DefaultEventLoopPolicy()
def get_event_loop_policy():
"""Get the current event loop policy."""
if _event_loop_policy is None:
_init_event_loop_policy()
return _event_loop_policy
def set_event_loop_policy(policy):
"""Set the current event loop policy.
If policy is None, the default policy is restored."""
global _event_loop_policy
assert policy is None or isinstance(policy, AbstractEventLoopPolicy)
_event_loop_policy = policy
def get_event_loop():
"""Equivalent to calling get_event_loop_policy().get_event_loop()."""
return get_event_loop_policy().get_event_loop()
def set_event_loop(loop):
"""Equivalent to calling get_event_loop_policy().set_event_loop(loop)."""
get_event_loop_policy().set_event_loop(loop)
def new_event_loop():
"""Equivalent to calling get_event_loop_policy().new_event_loop()."""
return get_event_loop_policy().new_event_loop()
def get_child_watcher():
"""Equivalent to calling get_event_loop_policy().get_child_watcher()."""
return get_event_loop_policy().get_child_watcher()
def set_child_watcher(watcher):
"""Equivalent to calling
get_event_loop_policy().set_child_watcher(watcher)."""
return get_event_loop_policy().set_child_watcher(watcher)
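# --- Usage sketch (illustrative; not part of this module) ---------------
# A custom policy installed with set_event_loop_policy(); the subclass
# name and counter are illustrative only. Passing None afterwards
# restores the default policy.
import asyncio

class _CountingPolicy(asyncio.DefaultEventLoopPolicy):
    def __init__(self):
        super().__init__()
        self.created = 0
    def new_event_loop(self):
        self.created += 1
        return super().new_event_loop()

def _demo_event_loop_policy():
    policy = _CountingPolicy()
    asyncio.set_event_loop_policy(policy)
    loop = asyncio.new_event_loop()
    assert policy.created == 1
    loop.close()
    asyncio.set_event_loop_policy(None)
# -------------------------------------------------------------------------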
"""A Future class similar to the one in PEP 3148."""
__all__ = ['CancelledError', 'TimeoutError',
'InvalidStateError',
'Future', 'wrap_future',
]
import concurrent.futures._base
import logging
import reprlib
import sys
import traceback
from . import compat
from . import events
# States for Future.
_PENDING = 'PENDING'
_CANCELLED = 'CANCELLED'
_FINISHED = 'FINISHED'
Error = concurrent.futures._base.Error
CancelledError = concurrent.futures.CancelledError
TimeoutError = concurrent.futures.TimeoutError
STACK_DEBUG = logging.DEBUG - 1 # heavy-duty debugging
class InvalidStateError(Error):
"""The operation is not allowed in this state."""
class _TracebackLogger:
"""Helper to log a traceback upon destruction if not cleared.
This solves a nasty problem with Futures and Tasks that have an
exception set: if nobody asks for the exception, the exception is
never logged. This violates the Zen of Python: 'Errors should
never pass silently. Unless explicitly silenced.'
However, we don't want to log the exception as soon as
set_exception() is called: if the calling code is written
properly, it will get the exception and handle it properly. But
we *do* want to log it if result() or exception() was never called
-- otherwise developers waste a lot of time wondering why their
buggy code fails silently.
An earlier attempt added a __del__() method to the Future class
itself, but this backfired because the presence of __del__()
prevents garbage collection from breaking cycles. A way out of
this catch-22 is to avoid having a __del__() method on the Future
class itself, but instead to have a reference to a helper object
with a __del__() method that logs the traceback, where we ensure
that the helper object doesn't participate in cycles, and only the
Future has a reference to it.
The helper object is added when set_exception() is called. When
the Future is collected, and the helper is present, the helper
object is also collected, and its __del__() method will log the
traceback. When the Future's result() or exception() method is
called (and a helper object is present), it removes the helper
object, after calling its clear() method to prevent it from
logging.
One downside is that we do a fair amount of work to extract the
traceback from the exception, even when it is never logged. It
would seem cheaper to just store the exception object, but that
references the traceback, which references stack frames, which may
reference the Future, which references the _TracebackLogger, and
then the _TracebackLogger would be included in a cycle, which is
what we're trying to avoid! As an optimization, we don't
immediately format the exception; we only do the work when
activate() is called, which call is delayed until after all the
Future's callbacks have run. Since a Future usually has at least
one callback (typically set by 'yield from'), and that callback
usually extracts the exception, the need to format the exception
usually never arises.
PS. I don't claim credit for this solution. I first heard of it
in a discussion about closing files when they are collected.
"""
__slots__ = ('loop', 'source_traceback', 'exc', 'tb')
def __init__(self, future, exc):
self.loop = future._loop
self.source_traceback = future._source_traceback
self.exc = exc
self.tb = None
def activate(self):
exc = self.exc
if exc is not None:
self.exc = None
self.tb = traceback.format_exception(exc.__class__, exc,
exc.__traceback__)
def clear(self):
self.exc = None
self.tb = None
def __del__(self):
if self.tb:
msg = 'Future/Task exception was never retrieved\n'
if self.source_traceback:
src = ''.join(traceback.format_list(self.source_traceback))
msg += 'Future/Task created at (most recent call last):\n'
msg += '%s\n' % src.rstrip()
msg += ''.join(self.tb).rstrip()
self.loop.call_exception_handler({'message': msg})
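# --- Illustrative sketch, not part of the original module ---
# The behaviour _TracebackLogger (and _log_traceback on Python 3.4+) exists
# to provide: a Future whose exception is never retrieved is reported via the
# loop's exception handler once the Future is garbage-collected. Assumes the
# standard top-level 'asyncio' package.
import asyncio
import gc

loop = asyncio.new_event_loop()
fut = asyncio.Future(loop=loop)
fut.set_exception(RuntimeError('never retrieved'))
del fut          # drop the last reference without calling result()/exception()
gc.collect()     # on collection, "Future exception was never retrieved" is logged
loop.close()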
class Future:
"""This class is *almost* compatible with concurrent.futures.Future.
Differences:
- result() and exception() do not take a timeout argument and
raise an exception when the future isn't done yet.
- Callbacks registered with add_done_callback() are always called
via the event loop's call_soon_threadsafe().
- This class is not compatible with the wait() and as_completed()
methods in the concurrent.futures package.
(In Python 3.4 or later we may be able to unify the implementations.)
"""
# Class variables serving as defaults for instance variables.
_state = _PENDING
_result = None
_exception = None
_loop = None
_source_traceback = None
_blocking = False # proper use of future (yield vs yield from)
_log_traceback = False # Used for Python 3.4 and later
_tb_logger = None # Used for Python 3.3 only
def __init__(self, *, loop=None):
"""Initialize the future.
        The optional loop argument allows one to explicitly set the event
loop object used by the future. If it's not provided, the future uses
the default event loop.
"""
if loop is None:
self._loop = events.get_event_loop()
else:
self._loop = loop
self._callbacks = []
if self._loop.get_debug():
self._source_traceback = traceback.extract_stack(sys._getframe(1))
def __format_callbacks(self):
cb = self._callbacks
size = len(cb)
if not size:
cb = ''
def format_cb(callback):
return events._format_callback_source(callback, ())
if size == 1:
cb = format_cb(cb[0])
elif size == 2:
cb = '{}, {}'.format(format_cb(cb[0]), format_cb(cb[1]))
elif size > 2:
cb = '{}, <{} more>, {}'.format(format_cb(cb[0]),
size-2,
format_cb(cb[-1]))
return 'cb=[%s]' % cb
def _repr_info(self):
info = [self._state.lower()]
if self._state == _FINISHED:
if self._exception is not None:
info.append('exception={!r}'.format(self._exception))
else:
# use reprlib to limit the length of the output, especially
# for very long strings
result = reprlib.repr(self._result)
info.append('result={}'.format(result))
if self._callbacks:
info.append(self.__format_callbacks())
if self._source_traceback:
frame = self._source_traceback[-1]
info.append('created at %s:%s' % (frame[0], frame[1]))
return info
def __repr__(self):
info = self._repr_info()
return '<%s %s>' % (self.__class__.__name__, ' '.join(info))
    # On Python 3.3 and older, objects with a destructor that are part of a
    # reference cycle are never destroyed. That is no longer the case on
    # Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if not self._log_traceback:
# set_exception() was not called, or result() or exception()
# has consumed the exception
return
exc = self._exception
context = {
'message': ('%s exception was never retrieved'
% self.__class__.__name__),
'exception': exc,
'future': self,
}
if self._source_traceback:
context['source_traceback'] = self._source_traceback
self._loop.call_exception_handler(context)
def cancel(self):
"""Cancel the future and schedule callbacks.
If the future is already done or cancelled, return False. Otherwise,
change the future's state to cancelled, schedule the callbacks and
return True.
"""
if self._state != _PENDING:
return False
self._state = _CANCELLED
self._schedule_callbacks()
return True
def _schedule_callbacks(self):
"""Internal: Ask the event loop to call all callbacks.
The callbacks are scheduled to be called as soon as possible. Also
clears the callback list.
"""
callbacks = self._callbacks[:]
if not callbacks:
return
self._callbacks[:] = []
for callback in callbacks:
self._loop.call_soon(callback, self)
def cancelled(self):
"""Return True if the future was cancelled."""
return self._state == _CANCELLED
# Don't implement running(); see http://bugs.python.org/issue18699
def done(self):
"""Return True if the future is done.
        Done means either that a result or exception is available, or that
        the future was cancelled.
"""
return self._state != _PENDING
def result(self):
"""Return the result this future represents.
If the future has been cancelled, raises CancelledError. If the
future's result isn't yet available, raises InvalidStateError. If
the future is done and has an exception set, this exception is raised.
"""
if self._state == _CANCELLED:
raise CancelledError
if self._state != _FINISHED:
raise InvalidStateError('Result is not ready.')
self._log_traceback = False
if self._tb_logger is not None:
self._tb_logger.clear()
self._tb_logger = None
if self._exception is not None:
raise self._exception
return self._result
def exception(self):
"""Return the exception that was set on this future.
The exception (or None if no exception was set) is returned only if
the future is done. If the future has been cancelled, raises
CancelledError. If the future isn't done yet, raises
InvalidStateError.
"""
if self._state == _CANCELLED:
raise CancelledError
if self._state != _FINISHED:
raise InvalidStateError('Exception is not set.')
self._log_traceback = False
if self._tb_logger is not None:
self._tb_logger.clear()
self._tb_logger = None
return self._exception
def add_done_callback(self, fn):
"""Add a callback to be run when the future becomes done.
The callback is called with a single argument - the future object. If
the future is already done when this is called, the callback is
scheduled with call_soon.
"""
if self._state != _PENDING:
self._loop.call_soon(fn, self)
else:
self._callbacks.append(fn)
# New method not in PEP 3148.
def remove_done_callback(self, fn):
"""Remove all instances of a callback from the "call when done" list.
Returns the number of callbacks removed.
"""
filtered_callbacks = [f for f in self._callbacks if f != fn]
removed_count = len(self._callbacks) - len(filtered_callbacks)
if removed_count:
self._callbacks[:] = filtered_callbacks
return removed_count
# So-called internal methods (note: no set_running_or_notify_cancel()).
def set_result(self, result):
"""Mark the future done and set its result.
If the future is already done when this method is called, raises
InvalidStateError.
"""
if self._state != _PENDING:
raise InvalidStateError('{}: {!r}'.format(self._state, self))
self._result = result
self._state = _FINISHED
self._schedule_callbacks()
def set_exception(self, exception):
"""Mark the future done and set an exception.
If the future is already done when this method is called, raises
InvalidStateError.
"""
if self._state != _PENDING:
raise InvalidStateError('{}: {!r}'.format(self._state, self))
if isinstance(exception, type):
exception = exception()
self._exception = exception
self._state = _FINISHED
self._schedule_callbacks()
if compat.PY34:
self._log_traceback = True
else:
self._tb_logger = _TracebackLogger(self, exception)
# Arrange for the logger to be activated after all callbacks
# have had a chance to call result() or exception().
self._loop.call_soon(self._tb_logger.activate)
def __iter__(self):
if not self.done():
self._blocking = True
yield self # This tells Task to wait for completion.
assert self.done(), "yield from wasn't used with future"
return self.result() # May raise too.
if compat.PY35:
__await__ = __iter__ # make compatible with 'await' expression
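# --- Illustrative sketch, not part of the original module ---
# Minimal use of Future with the pre-async/await API this file targets
# ('@asyncio.coroutine' and 'yield from', since removed from newer Pythons):
# 'yield from fut' suspends the coroutine until set_result() is called.
import asyncio

@asyncio.coroutine
def waiter(fut):
    value = yield from fut                  # suspends until the future is done
    return value

loop = asyncio.new_event_loop()
fut = asyncio.Future(loop=loop)
loop.call_later(0.01, fut.set_result, 'done')   # resolve from a loop callback
assert loop.run_until_complete(waiter(fut)) == 'done'
loop.close()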
def _set_result_unless_cancelled(fut, result):
"""Helper setting the result only if the future was not cancelled."""
if fut.cancelled():
return
fut.set_result(result)
def _set_concurrent_future_state(concurrent, source):
"""Copy state from a future to a concurrent.futures.Future."""
assert source.done()
if source.cancelled():
concurrent.cancel()
if not concurrent.set_running_or_notify_cancel():
return
exception = source.exception()
if exception is not None:
concurrent.set_exception(exception)
else:
result = source.result()
concurrent.set_result(result)
def _copy_future_state(source, dest):
"""Internal helper to copy state from another Future.
The other Future may be a concurrent.futures.Future.
"""
assert source.done()
if dest.cancelled():
return
assert not dest.done()
if source.cancelled():
dest.cancel()
else:
exception = source.exception()
if exception is not None:
dest.set_exception(exception)
else:
result = source.result()
dest.set_result(result)
def _chain_future(source, destination):
"""Chain two futures so that when one completes, so does the other.
The result (or exception) of source will be copied to destination.
If destination is cancelled, source gets cancelled too.
Compatible with both asyncio.Future and concurrent.futures.Future.
"""
if not isinstance(source, (Future, concurrent.futures.Future)):
raise TypeError('A future is required for source argument')
if not isinstance(destination, (Future, concurrent.futures.Future)):
raise TypeError('A future is required for destination argument')
source_loop = source._loop if isinstance(source, Future) else None
dest_loop = destination._loop if isinstance(destination, Future) else None
def _set_state(future, other):
if isinstance(future, Future):
_copy_future_state(other, future)
else:
_set_concurrent_future_state(future, other)
def _call_check_cancel(destination):
if destination.cancelled():
if source_loop is None or source_loop is dest_loop:
source.cancel()
else:
source_loop.call_soon_threadsafe(source.cancel)
def _call_set_state(source):
if dest_loop is None or dest_loop is source_loop:
_set_state(destination, source)
else:
dest_loop.call_soon_threadsafe(_set_state, destination, source)
destination.add_done_callback(_call_check_cancel)
source.add_done_callback(_call_set_state)
def wrap_future(future, *, loop=None):
"""Wrap concurrent.futures.Future object."""
if isinstance(future, Future):
return future
assert isinstance(future, concurrent.futures.Future), \
'concurrent.futures.Future is expected, got {!r}'.format(future)
new_future = Future(loop=loop)
_chain_future(future, new_future)
return new_future
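# --- Illustrative sketch, not part of the original module ---
# wrap_future() in action: a concurrent.futures.Future produced by a thread
# pool is chained to an asyncio.Future that can be awaited on the loop.
import asyncio
import concurrent.futures

def blocking_job():
    return 42

loop = asyncio.new_event_loop()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    cf = pool.submit(blocking_job)              # a concurrent.futures.Future
    af = asyncio.wrap_future(cf, loop=loop)     # chained asyncio.Future
    assert loop.run_until_complete(af) == 42
loop.close()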
"""Synchronization primitives."""
__all__ = ['Lock', 'Event', 'Condition', 'Semaphore', 'BoundedSemaphore']
import collections
from . import compat
from . import events
from . import futures
from .coroutines import coroutine
class _ContextManager:
"""Context manager.
This enables the following idiom for acquiring and releasing a
lock around a block:
with (yield from lock):
<block>
while failing loudly when accidentally using:
with lock:
<block>
"""
def __init__(self, lock):
self._lock = lock
def __enter__(self):
# We have no use for the "as ..." clause in the with
# statement for locks.
return None
def __exit__(self, *args):
try:
self._lock.release()
finally:
self._lock = None # Crudely prevent reuse.
class _ContextManagerMixin:
def __enter__(self):
raise RuntimeError(
'"yield from" should be used as context manager expression')
def __exit__(self, *args):
# This must exist because __enter__ exists, even though that
# always raises; that's how the with-statement works.
pass
@coroutine
def __iter__(self):
# This is not a coroutine. It is meant to enable the idiom:
#
# with (yield from lock):
# <block>
#
# as an alternative to:
#
# yield from lock.acquire()
# try:
# <block>
# finally:
# lock.release()
yield from self.acquire()
return _ContextManager(self)
if compat.PY35:
def __await__(self):
# To make "with await lock" work.
yield from self.acquire()
return _ContextManager(self)
@coroutine
def __aenter__(self):
yield from self.acquire()
# We have no use for the "as ..." clause in the with
# statement for locks.
return None
@coroutine
def __aexit__(self, exc_type, exc, tb):
self.release()
class Lock(_ContextManagerMixin):
"""Primitive lock objects.
A primitive lock is a synchronization primitive that is not owned
by a particular coroutine when locked. A primitive lock is in one
of two states, 'locked' or 'unlocked'.
It is created in the unlocked state. It has two basic methods,
acquire() and release(). When the state is unlocked, acquire()
changes the state to locked and returns immediately. When the
state is locked, acquire() blocks until a call to release() in
another coroutine changes it to unlocked, then the acquire() call
resets it to locked and returns. The release() method should only
be called in the locked state; it changes the state to unlocked
and returns immediately. If an attempt is made to release an
unlocked lock, a RuntimeError will be raised.
When more than one coroutine is blocked in acquire() waiting for
the state to turn to unlocked, only one coroutine proceeds when a
    release() call resets the state to unlocked: the first coroutine
    that blocked in acquire() is woken up.
acquire() is a coroutine and should be called with 'yield from'.
Locks also support the context management protocol. '(yield from lock)'
should be used as context manager expression.
Usage:
lock = Lock()
...
yield from lock
try:
...
finally:
lock.release()
Context manager usage:
lock = Lock()
...
with (yield from lock):
...
Lock objects can be tested for locking state:
if not lock.locked():
yield from lock
else:
# lock is acquired
...
"""
def __init__(self, *, loop=None):
self._waiters = collections.deque()
self._locked = False
if loop is not None:
self._loop = loop
else:
self._loop = events.get_event_loop()
def __repr__(self):
res = super().__repr__()
extra = 'locked' if self._locked else 'unlocked'
if self._waiters:
extra = '{},waiters:{}'.format(extra, len(self._waiters))
return '<{} [{}]>'.format(res[1:-1], extra)
def locked(self):
"""Return True if lock is acquired."""
return self._locked
@coroutine
def acquire(self):
"""Acquire a lock.
This method blocks until the lock is unlocked, then sets it to
locked and returns True.
"""
if not self._waiters and not self._locked:
self._locked = True
return True
fut = futures.Future(loop=self._loop)
self._waiters.append(fut)
try:
yield from fut
self._locked = True
return True
finally:
self._waiters.remove(fut)
def release(self):
"""Release a lock.
When the lock is locked, reset it to unlocked, and return.
If any other coroutines are blocked waiting for the lock to become
unlocked, allow exactly one of them to proceed.
When invoked on an unlocked lock, a RuntimeError is raised.
There is no return value.
"""
if self._locked:
self._locked = False
# Wake up the first waiter who isn't cancelled.
for fut in self._waiters:
if not fut.done():
fut.set_result(True)
break
else:
raise RuntimeError('Lock is not acquired.')
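# --- Illustrative sketch, not part of the original module ---
# The context manager idiom from the Lock docstring above, using the
# pre-async/await API this file targets.
import asyncio

@asyncio.coroutine
def critical_section(lock, results, name):
    with (yield from lock):        # acquire; _ContextManager releases on exit
        results.append(name)

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
lock = asyncio.Lock(loop=loop)
results = []
loop.run_until_complete(asyncio.gather(
    critical_section(lock, results, 'a'),
    critical_section(lock, results, 'b')))
assert sorted(results) == ['a', 'b']
loop.close()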
class Event:
"""Asynchronous equivalent to threading.Event.
Class implementing event objects. An event manages a flag that can be set
to true with the set() method and reset to false with the clear() method.
The wait() method blocks until the flag is true. The flag is initially
false.
"""
def __init__(self, *, loop=None):
self._waiters = collections.deque()
self._value = False
if loop is not None:
self._loop = loop
else:
self._loop = events.get_event_loop()
def __repr__(self):
res = super().__repr__()
extra = 'set' if self._value else 'unset'
if self._waiters:
extra = '{},waiters:{}'.format(extra, len(self._waiters))
return '<{} [{}]>'.format(res[1:-1], extra)
def is_set(self):
"""Return True if and only if the internal flag is true."""
return self._value
def set(self):
"""Set the internal flag to true. All coroutines waiting for it to
        become true are awakened. Coroutines that call wait() once the flag
        is true will not block at all.
"""
if not self._value:
self._value = True
for fut in self._waiters:
if not fut.done():
fut.set_result(True)
def clear(self):
"""Reset the internal flag to false. Subsequently, coroutines calling
wait() will block until set() is called to set the internal flag
to true again."""
self._value = False
@coroutine
def wait(self):
"""Block until the internal flag is true.
If the internal flag is true on entry, return True
immediately. Otherwise, block until another coroutine calls
set() to set the flag to true, then return True.
"""
if self._value:
return True
fut = futures.Future(loop=self._loop)
self._waiters.append(fut)
try:
yield from fut
return True
finally:
self._waiters.remove(fut)
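# --- Illustrative sketch, not part of the original module ---
# Event in action: wait() blocks until set() flips the internal flag.
import asyncio

@asyncio.coroutine
def wait_for_flag(event):
    yield from event.wait()            # blocks until event.set() below
    return 'woken'

loop = asyncio.new_event_loop()
event = asyncio.Event(loop=loop)
loop.call_later(0.01, event.set)       # flip the flag from a loop callback
assert loop.run_until_complete(wait_for_flag(event)) == 'woken'
loop.close()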
class Condition(_ContextManagerMixin):
"""Asynchronous equivalent to threading.Condition.
This class implements condition variable objects. A condition variable
allows one or more coroutines to wait until they are notified by another
coroutine.
A new Lock object is created and used as the underlying lock.
"""
def __init__(self, lock=None, *, loop=None):
if loop is not None:
self._loop = loop
else:
self._loop = events.get_event_loop()
if lock is None:
lock = Lock(loop=self._loop)
elif lock._loop is not self._loop:
raise ValueError("loop argument must agree with lock")
self._lock = lock
# Export the lock's locked(), acquire() and release() methods.
self.locked = lock.locked
self.acquire = lock.acquire
self.release = lock.release
self._waiters = collections.deque()
def __repr__(self):
res = super().__repr__()
extra = 'locked' if self.locked() else 'unlocked'
if self._waiters:
extra = '{},waiters:{}'.format(extra, len(self._waiters))
return '<{} [{}]>'.format(res[1:-1], extra)
@coroutine
def wait(self):
"""Wait until notified.
If the calling coroutine has not acquired the lock when this
method is called, a RuntimeError is raised.
This method releases the underlying lock, and then blocks
until it is awakened by a notify() or notify_all() call for
the same condition variable in another coroutine. Once
awakened, it re-acquires the lock and returns True.
"""
if not self.locked():
raise RuntimeError('cannot wait on un-acquired lock')
self.release()
try:
fut = futures.Future(loop=self._loop)
self._waiters.append(fut)
try:
yield from fut
return True
finally:
self._waiters.remove(fut)
finally:
yield from self.acquire()
@coroutine
def wait_for(self, predicate):
"""Wait until a predicate becomes true.
        The predicate should be a callable whose result will be
interpreted as a boolean value. The final predicate value is
the return value.
"""
result = predicate()
while not result:
yield from self.wait()
result = predicate()
return result
def notify(self, n=1):
"""By default, wake up one coroutine waiting on this condition, if any.
If the calling coroutine has not acquired the lock when this method
is called, a RuntimeError is raised.
This method wakes up at most n of the coroutines waiting for the
condition variable; it is a no-op if no coroutines are waiting.
Note: an awakened coroutine does not actually return from its
wait() call until it can reacquire the lock. Since notify() does
not release the lock, its caller should.
"""
if not self.locked():
raise RuntimeError('cannot notify on un-acquired lock')
idx = 0
for fut in self._waiters:
if idx >= n:
break
if not fut.done():
idx += 1
fut.set_result(False)
def notify_all(self):
"""Wake up all threads waiting on this condition. This method acts
like notify(), but wakes up all waiting threads instead of one. If the
calling thread has not acquired the lock when this method is called,
a RuntimeError is raised.
"""
self.notify(len(self._waiters))
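# --- Illustrative sketch, not part of the original module ---
# Condition in action: the consumer releases the lock inside wait_for() and
# re-acquires it once the producer calls notify().
import asyncio

@asyncio.coroutine
def consumer(cond, box):
    with (yield from cond):                       # acquire the underlying lock
        yield from cond.wait_for(lambda: box)     # release, wait, re-acquire
        return box.pop()

@asyncio.coroutine
def producer(cond, box):
    with (yield from cond):
        box.append('item')
        cond.notify()                             # wake exactly one waiter

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
cond = asyncio.Condition(loop=loop)
box = []
got, _ = loop.run_until_complete(asyncio.gather(
    consumer(cond, box), producer(cond, box)))
assert got == 'item'
loop.close()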
class Semaphore(_ContextManagerMixin):
"""A Semaphore implementation.
A semaphore manages an internal counter which is decremented by each
acquire() call and incremented by each release() call. The counter
    can never go below zero; when acquire() finds that it is zero, it blocks,
    waiting until some other coroutine calls release().
Semaphores also support the context management protocol.
The optional argument gives the initial value for the internal
counter; it defaults to 1. If the value given is less than 0,
ValueError is raised.
"""
def __init__(self, value=1, *, loop=None):
if value < 0:
raise ValueError("Semaphore initial value must be >= 0")
self._value = value
self._waiters = collections.deque()
if loop is not None:
self._loop = loop
else:
self._loop = events.get_event_loop()
def __repr__(self):
res = super().__repr__()
extra = 'locked' if self.locked() else 'unlocked,value:{}'.format(
self._value)
if self._waiters:
extra = '{},waiters:{}'.format(extra, len(self._waiters))
return '<{} [{}]>'.format(res[1:-1], extra)
def _wake_up_next(self):
while self._waiters:
waiter = self._waiters.popleft()
if not waiter.done():
waiter.set_result(None)
return
def locked(self):
"""Returns True if semaphore can not be acquired immediately."""
return self._value == 0
@coroutine
def acquire(self):
"""Acquire a semaphore.
If the internal counter is larger than zero on entry,
decrement it by one and return True immediately. If it is
zero on entry, block, waiting until some other coroutine has
called release() to make it larger than 0, and then return
True.
"""
while self._value <= 0:
fut = futures.Future(loop=self._loop)
self._waiters.append(fut)
try:
yield from fut
except:
# See the similar code in Queue.get.
fut.cancel()
if self._value > 0 and not fut.cancelled():
self._wake_up_next()
raise
self._value -= 1
return True
def release(self):
"""Release a semaphore, incrementing the internal counter by one.
When it was zero on entry and another coroutine is waiting for it to
become larger than zero again, wake up that coroutine.
"""
self._value += 1
self._wake_up_next()
class BoundedSemaphore(Semaphore):
"""A bounded semaphore implementation.
This raises ValueError in release() if it would increase the value
above the initial value.
"""
def __init__(self, value=1, *, loop=None):
self._bound_value = value
super().__init__(value, loop=loop)
def release(self):
if self._value >= self._bound_value:
raise ValueError('BoundedSemaphore released too many times')
super().release()
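# --- Illustrative sketch, not part of the original module ---
# BoundedSemaphore in action: at most two workers hold the semaphore at once,
# and a release() beyond the initial value would raise ValueError.
import asyncio

@asyncio.coroutine
def worker(sem, counters):
    with (yield from sem):
        counters['active'] += 1
        counters['peak'] = max(counters['peak'], counters['active'])
        yield from asyncio.sleep(0)     # yield so other workers can run
        counters['active'] -= 1

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
sem = asyncio.BoundedSemaphore(2, loop=loop)
counters = {'active': 0, 'peak': 0}
loop.run_until_complete(asyncio.gather(
    *[worker(sem, counters) for _ in range(5)]))
assert counters['peak'] <= 2
loop.close()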
"""Logging configuration."""
import logging
# Name the logger after the package.
logger = logging.getLogger(__package__)
"""Event loop using a proactor and related classes.
A proactor is a "notify-on-completion" multiplexer. Currently a
proactor is only implemented on Windows with IOCP.
"""
__all__ = ['BaseProactorEventLoop']
import socket
import warnings
from . import base_events
from . import compat
from . import constants
from . import futures
from . import sslproto
from . import transports
from .log import logger
class _ProactorBasePipeTransport(transports._FlowControlMixin,
transports.BaseTransport):
"""Base class for pipe and socket transports."""
def __init__(self, loop, sock, protocol, waiter=None,
extra=None, server=None):
super().__init__(extra, loop)
self._set_extra(sock)
self._sock = sock
self._protocol = protocol
self._server = server
self._buffer = None # None or bytearray.
self._read_fut = None
self._write_fut = None
self._pending_write = 0
self._conn_lost = 0
self._closing = False # Set when close() called.
self._eof_written = False
if self._server is not None:
self._server._attach()
self._loop.call_soon(self._protocol.connection_made, self)
if waiter is not None:
# only wake up the waiter when connection_made() has been called
self._loop.call_soon(futures._set_result_unless_cancelled,
waiter, None)
def __repr__(self):
info = [self.__class__.__name__]
if self._sock is None:
info.append('closed')
elif self._closing:
info.append('closing')
if self._sock is not None:
info.append('fd=%s' % self._sock.fileno())
if self._read_fut is not None:
info.append('read=%s' % self._read_fut)
if self._write_fut is not None:
info.append("write=%r" % self._write_fut)
if self._buffer:
bufsize = len(self._buffer)
info.append('write_bufsize=%s' % bufsize)
if self._eof_written:
info.append('EOF written')
return '<%s>' % ' '.join(info)
def _set_extra(self, sock):
self._extra['pipe'] = sock
def is_closing(self):
return self._closing
def close(self):
if self._closing:
return
self._closing = True
self._conn_lost += 1
if not self._buffer and self._write_fut is None:
self._loop.call_soon(self._call_connection_lost, None)
if self._read_fut is not None:
self._read_fut.cancel()
self._read_fut = None
    # On Python 3.3 and older, objects with a destructor that are part of a
    # reference cycle are never destroyed. That is no longer the case on
    # Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if self._sock is not None:
warnings.warn("unclosed transport %r" % self, ResourceWarning)
self.close()
def _fatal_error(self, exc, message='Fatal error on pipe transport'):
if isinstance(exc, (BrokenPipeError, ConnectionResetError)):
if self._loop.get_debug():
logger.debug("%r: %s", self, message, exc_info=True)
else:
self._loop.call_exception_handler({
'message': message,
'exception': exc,
'transport': self,
'protocol': self._protocol,
})
self._force_close(exc)
def _force_close(self, exc):
if self._closing:
return
self._closing = True
self._conn_lost += 1
if self._write_fut:
self._write_fut.cancel()
self._write_fut = None
if self._read_fut:
self._read_fut.cancel()
self._read_fut = None
self._pending_write = 0
self._buffer = None
self._loop.call_soon(self._call_connection_lost, exc)
def _call_connection_lost(self, exc):
try:
self._protocol.connection_lost(exc)
finally:
# XXX If there is a pending overlapped read on the other
# end then it may fail with ERROR_NETNAME_DELETED if we
# just close our end. First calling shutdown() seems to
# cure it, but maybe using DisconnectEx() would be better.
if hasattr(self._sock, 'shutdown'):
self._sock.shutdown(socket.SHUT_RDWR)
self._sock.close()
self._sock = None
server = self._server
if server is not None:
server._detach()
self._server = None
def get_write_buffer_size(self):
size = self._pending_write
if self._buffer is not None:
size += len(self._buffer)
return size
class _ProactorReadPipeTransport(_ProactorBasePipeTransport,
transports.ReadTransport):
"""Transport for read pipes."""
def __init__(self, loop, sock, protocol, waiter=None,
extra=None, server=None):
super().__init__(loop, sock, protocol, waiter, extra, server)
self._paused = False
self._loop.call_soon(self._loop_reading)
def pause_reading(self):
if self._closing:
raise RuntimeError('Cannot pause_reading() when closing')
if self._paused:
raise RuntimeError('Already paused')
self._paused = True
if self._loop.get_debug():
logger.debug("%r pauses reading", self)
def resume_reading(self):
if not self._paused:
raise RuntimeError('Not paused')
self._paused = False
if self._closing:
return
self._loop.call_soon(self._loop_reading, self._read_fut)
if self._loop.get_debug():
logger.debug("%r resumes reading", self)
def _loop_reading(self, fut=None):
if self._paused:
return
data = None
try:
if fut is not None:
assert self._read_fut is fut or (self._read_fut is None and
self._closing)
self._read_fut = None
data = fut.result() # deliver data later in "finally" clause
if self._closing:
# since close() has been called we ignore any read data
data = None
return
if data == b'':
# we got end-of-file so no need to reschedule a new read
return
# reschedule a new read
self._read_fut = self._loop._proactor.recv(self._sock, 4096)
except ConnectionAbortedError as exc:
if not self._closing:
self._fatal_error(exc, 'Fatal read error on pipe transport')
elif self._loop.get_debug():
logger.debug("Read error on pipe transport while closing",
exc_info=True)
except ConnectionResetError as exc:
self._force_close(exc)
except OSError as exc:
self._fatal_error(exc, 'Fatal read error on pipe transport')
except futures.CancelledError:
if not self._closing:
raise
else:
self._read_fut.add_done_callback(self._loop_reading)
finally:
if data:
self._protocol.data_received(data)
elif data is not None:
if self._loop.get_debug():
logger.debug("%r received EOF", self)
keep_open = self._protocol.eof_received()
if not keep_open:
self.close()
class _ProactorBaseWritePipeTransport(_ProactorBasePipeTransport,
transports.WriteTransport):
"""Transport for write pipes."""
def write(self, data):
if not isinstance(data, (bytes, bytearray, memoryview)):
            raise TypeError('data argument must be byte-ish (%r)' %
                            type(data))
if self._eof_written:
raise RuntimeError('write_eof() already called')
if not data:
return
if self._conn_lost:
if self._conn_lost >= constants.LOG_THRESHOLD_FOR_CONNLOST_WRITES:
logger.warning('socket.send() raised exception.')
self._conn_lost += 1
return
# Observable states:
# 1. IDLE: _write_fut and _buffer both None
# 2. WRITING: _write_fut set; _buffer None
# 3. BACKED UP: _write_fut set; _buffer a bytearray
# We always copy the data, so the caller can't modify it
# while we're still waiting for the I/O to happen.
if self._write_fut is None: # IDLE -> WRITING
assert self._buffer is None
# Pass a copy, except if it's already immutable.
self._loop_writing(data=bytes(data))
elif not self._buffer: # WRITING -> BACKED UP
# Make a mutable copy which we can extend.
self._buffer = bytearray(data)
self._maybe_pause_protocol()
else: # BACKED UP
# Append to buffer (also copies).
self._buffer.extend(data)
self._maybe_pause_protocol()
def _loop_writing(self, f=None, data=None):
try:
assert f is self._write_fut
self._write_fut = None
self._pending_write = 0
if f:
f.result()
if data is None:
data = self._buffer
self._buffer = None
if not data:
if self._closing:
self._loop.call_soon(self._call_connection_lost, None)
if self._eof_written:
self._sock.shutdown(socket.SHUT_WR)
# Now that we've reduced the buffer size, tell the
# protocol to resume writing if it was paused. Note that
# we do this last since the callback is called immediately
# and it may add more data to the buffer (even causing the
# protocol to be paused again).
self._maybe_resume_protocol()
else:
self._write_fut = self._loop._proactor.send(self._sock, data)
if not self._write_fut.done():
assert self._pending_write == 0
self._pending_write = len(data)
self._write_fut.add_done_callback(self._loop_writing)
self._maybe_pause_protocol()
else:
self._write_fut.add_done_callback(self._loop_writing)
except ConnectionResetError as exc:
self._force_close(exc)
except OSError as exc:
self._fatal_error(exc, 'Fatal write error on pipe transport')
def can_write_eof(self):
return True
def write_eof(self):
self.close()
def abort(self):
self._force_close(None)
class _ProactorWritePipeTransport(_ProactorBaseWritePipeTransport):
def __init__(self, *args, **kw):
super().__init__(*args, **kw)
self._read_fut = self._loop._proactor.recv(self._sock, 16)
self._read_fut.add_done_callback(self._pipe_closed)
def _pipe_closed(self, fut):
if fut.cancelled():
# the transport has been closed
return
assert fut.result() == b''
if self._closing:
assert self._read_fut is None
return
assert fut is self._read_fut, (fut, self._read_fut)
self._read_fut = None
if self._write_fut is not None:
self._force_close(BrokenPipeError())
else:
self.close()
class _ProactorDuplexPipeTransport(_ProactorReadPipeTransport,
_ProactorBaseWritePipeTransport,
transports.Transport):
"""Transport for duplex pipes."""
def can_write_eof(self):
return False
def write_eof(self):
raise NotImplementedError
class _ProactorSocketTransport(_ProactorReadPipeTransport,
_ProactorBaseWritePipeTransport,
transports.Transport):
"""Transport for connected sockets."""
def _set_extra(self, sock):
self._extra['socket'] = sock
try:
self._extra['sockname'] = sock.getsockname()
except (socket.error, AttributeError):
if self._loop.get_debug():
logger.warning("getsockname() failed on %r",
sock, exc_info=True)
if 'peername' not in self._extra:
try:
self._extra['peername'] = sock.getpeername()
except (socket.error, AttributeError):
if self._loop.get_debug():
logger.warning("getpeername() failed on %r",
sock, exc_info=True)
def can_write_eof(self):
return True
def write_eof(self):
if self._closing or self._eof_written:
return
self._eof_written = True
if self._write_fut is None:
self._sock.shutdown(socket.SHUT_WR)
class BaseProactorEventLoop(base_events.BaseEventLoop):
def __init__(self, proactor):
super().__init__()
logger.debug('Using proactor: %s', proactor.__class__.__name__)
self._proactor = proactor
self._selector = proactor # convenient alias
self._self_reading_future = None
self._accept_futures = {} # socket file descriptor => Future
proactor.set_loop(self)
self._make_self_pipe()
def _make_socket_transport(self, sock, protocol, waiter=None,
extra=None, server=None):
return _ProactorSocketTransport(self, sock, protocol, waiter,
extra, server)
def _make_ssl_transport(self, rawsock, protocol, sslcontext, waiter=None,
*, server_side=False, server_hostname=None,
extra=None, server=None):
if not sslproto._is_sslproto_available():
raise NotImplementedError("Proactor event loop requires Python 3.5"
" or newer (ssl.MemoryBIO) to support "
"SSL")
ssl_protocol = sslproto.SSLProtocol(self, protocol, sslcontext, waiter,
server_side, server_hostname)
_ProactorSocketTransport(self, rawsock, ssl_protocol,
extra=extra, server=server)
return ssl_protocol._app_transport
def _make_duplex_pipe_transport(self, sock, protocol, waiter=None,
extra=None):
return _ProactorDuplexPipeTransport(self,
sock, protocol, waiter, extra)
def _make_read_pipe_transport(self, sock, protocol, waiter=None,
extra=None):
return _ProactorReadPipeTransport(self, sock, protocol, waiter, extra)
def _make_write_pipe_transport(self, sock, protocol, waiter=None,
extra=None):
# We want connection_lost() to be called when other end closes
return _ProactorWritePipeTransport(self,
sock, protocol, waiter, extra)
def close(self):
if self.is_running():
raise RuntimeError("Cannot close a running event loop")
if self.is_closed():
return
# Call these methods before closing the event loop (before calling
# BaseEventLoop.close), because they can schedule callbacks with
# call_soon(), which is forbidden when the event loop is closed.
self._stop_accept_futures()
self._close_self_pipe()
self._proactor.close()
self._proactor = None
self._selector = None
# Close the event loop
super().close()
def sock_recv(self, sock, n):
return self._proactor.recv(sock, n)
def sock_sendall(self, sock, data):
return self._proactor.send(sock, data)
def sock_connect(self, sock, address):
try:
base_events._check_resolved_address(sock, address)
except ValueError as err:
fut = futures.Future(loop=self)
fut.set_exception(err)
return fut
else:
return self._proactor.connect(sock, address)
def sock_accept(self, sock):
return self._proactor.accept(sock)
def _socketpair(self):
raise NotImplementedError
def _close_self_pipe(self):
if self._self_reading_future is not None:
self._self_reading_future.cancel()
self._self_reading_future = None
self._ssock.close()
self._ssock = None
self._csock.close()
self._csock = None
self._internal_fds -= 1
def _make_self_pipe(self):
# A self-socket, really. :-)
self._ssock, self._csock = self._socketpair()
self._ssock.setblocking(False)
self._csock.setblocking(False)
self._internal_fds += 1
self.call_soon(self._loop_self_reading)
def _loop_self_reading(self, f=None):
try:
if f is not None:
f.result() # may raise
f = self._proactor.recv(self._ssock, 4096)
except futures.CancelledError:
# _close_self_pipe() has been called, stop waiting for data
return
except Exception as exc:
self.call_exception_handler({
'message': 'Error on reading from the event loop self pipe',
'exception': exc,
'loop': self,
})
else:
self._self_reading_future = f
f.add_done_callback(self._loop_self_reading)
def _write_to_self(self):
self._csock.send(b'\0')
def _start_serving(self, protocol_factory, sock,
sslcontext=None, server=None):
def loop(f=None):
try:
if f is not None:
conn, addr = f.result()
if self._debug:
logger.debug("%r got a new connection from %r: %r",
server, addr, conn)
protocol = protocol_factory()
if sslcontext is not None:
self._make_ssl_transport(
conn, protocol, sslcontext, server_side=True,
extra={'peername': addr}, server=server)
else:
self._make_socket_transport(
conn, protocol,
extra={'peername': addr}, server=server)
if self.is_closed():
return
f = self._proactor.accept(sock)
except OSError as exc:
if sock.fileno() != -1:
self.call_exception_handler({
'message': 'Accept failed on a socket',
'exception': exc,
'socket': sock,
})
sock.close()
elif self._debug:
logger.debug("Accept failed on socket %r",
sock, exc_info=True)
except futures.CancelledError:
sock.close()
else:
self._accept_futures[sock.fileno()] = f
f.add_done_callback(loop)
self.call_soon(loop)
def _process_events(self, event_list):
# Events are processed in the IocpProactor._poll() method
pass
def _stop_accept_futures(self):
for future in self._accept_futures.values():
future.cancel()
self._accept_futures.clear()
def _stop_serving(self, sock):
self._stop_accept_futures()
self._proactor._stop_serving(sock)
sock.close()
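# --- Illustrative sketch, not part of the original module ---
# On Windows the concrete subclass of BaseProactorEventLoop is
# asyncio.ProactorEventLoop, which plugs an IOCP-based proactor into the
# machinery above; it is unavailable elsewhere, hence the platform guard.
import sys
import asyncio

if sys.platform == 'win32':
    loop = asyncio.ProactorEventLoop()      # IOCP "notify-on-completion" loop
    asyncio.set_event_loop(loop)
    loop.run_until_complete(asyncio.sleep(0))
    loop.close()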
"""Abstract Protocol class."""
__all__ = ['BaseProtocol', 'Protocol', 'DatagramProtocol',
'SubprocessProtocol']
class BaseProtocol:
"""Common base class for protocol interfaces.
    Usually the user implements protocols derived from BaseProtocol,
    such as Protocol or SubprocessProtocol.
    The only case where BaseProtocol should be subclassed directly is a
    write-only transport, such as a write pipe.
"""
def connection_made(self, transport):
"""Called when a connection is made.
The argument is the transport representing the pipe connection.
To receive data, wait for data_received() calls.
When the connection is closed, connection_lost() is called.
"""
def connection_lost(self, exc):
"""Called when the connection is lost or closed.
The argument is an exception object or None (the latter
meaning a regular EOF is received or the connection was
aborted or closed).
"""
def pause_writing(self):
"""Called when the transport's buffer goes over the high-water mark.
Pause and resume calls are paired -- pause_writing() is called
once when the buffer goes strictly over the high-water mark
        (even if subsequent writes increase the buffer size even
more), and eventually resume_writing() is called once when the
buffer size reaches the low-water mark.
Note that if the buffer size equals the high-water mark,
pause_writing() is not called -- it must go strictly over.
Conversely, resume_writing() is called when the buffer size is
equal or lower than the low-water mark. These end conditions
are important to ensure that things go as expected when either
mark is zero.
NOTE: This is the only Protocol callback that is not called
through EventLoop.call_soon() -- if it were, it would have no
effect when it's most needed (when the app keeps writing
without yielding until pause_writing() is called).
"""
def resume_writing(self):
"""Called when the transport's buffer drains below the low-water mark.
See pause_writing() for details.
"""
class Protocol(BaseProtocol):
"""Interface for stream protocol.
The user should implement this interface. They can inherit from
this class but don't need to. The implementations here do
nothing (they don't raise exceptions).
    When the user wants to request a transport, they pass a protocol
factory to a utility function (e.g., EventLoop.create_connection()).
When the connection is made successfully, connection_made() is
called with a suitable transport object. Then data_received()
will be called 0 or more times with data (bytes) received from the
transport; finally, connection_lost() will be called exactly once
with either an exception object or None as an argument.
State machine of calls:
start -> CM [-> DR*] [-> ER?] -> CL -> end
* CM: connection_made()
* DR: data_received()
* ER: eof_received()
* CL: connection_lost()
"""
def data_received(self, data):
"""Called when some data is received.
The argument is a bytes object.
"""
def eof_received(self):
"""Called when the other end calls write_eof() or equivalent.
If this returns a false value (including None), the transport
will close itself. If it returns a true value, closing the
transport is up to the protocol.
"""
class DatagramProtocol(BaseProtocol):
"""Interface for datagram protocol."""
def datagram_received(self, data, addr):
"""Called when some datagram is received."""
def error_received(self, exc):
"""Called when a send or receive operation raises an OSError.
(Other than BlockingIOError or InterruptedError.)
"""
class SubprocessProtocol(BaseProtocol):
"""Interface for protocol for subprocess calls."""
def pipe_data_received(self, fd, data):
"""Called when the subprocess writes data into stdout/stderr pipe.
        fd is the integer file descriptor.
        data is a bytes object.
"""
def pipe_connection_lost(self, fd, exc):
"""Called when a file descriptor associated with the child process is
closed.
fd is the int file descriptor that was closed.
"""
def process_exited(self):
"""Called when subprocess has exited."""
"""Queues"""
__all__ = ['Queue', 'PriorityQueue', 'LifoQueue', 'QueueFull', 'QueueEmpty']
import collections
import heapq
from . import compat
from . import events
from . import futures
from . import locks
from .coroutines import coroutine
class QueueEmpty(Exception):
"""Exception raised when Queue.get_nowait() is called on a Queue object
which is empty.
"""
pass
class QueueFull(Exception):
"""Exception raised when the Queue.put_nowait() method is called on a Queue
object which is full.
"""
pass
class Queue:
"""A queue, useful for coordinating producer and consumer coroutines.
If maxsize is less than or equal to zero, the queue size is infinite. If it
is an integer greater than 0, then "yield from put()" will block when the
queue reaches maxsize, until an item is removed by get().
Unlike the standard library Queue, you can reliably know this Queue's size
with qsize(), since your single-threaded asyncio application won't be
interrupted between calling qsize() and doing an operation on the Queue.
"""
def __init__(self, maxsize=0, *, loop=None):
if loop is None:
self._loop = events.get_event_loop()
else:
self._loop = loop
self._maxsize = maxsize
# Futures.
self._getters = collections.deque()
# Futures.
self._putters = collections.deque()
self._unfinished_tasks = 0
self._finished = locks.Event(loop=self._loop)
self._finished.set()
self._init(maxsize)
# These three are overridable in subclasses.
def _init(self, maxsize):
self._queue = collections.deque()
def _get(self):
return self._queue.popleft()
def _put(self, item):
self._queue.append(item)
# End of the overridable methods.
def _wakeup_next(self, waiters):
# Wake up the next waiter (if any) that isn't cancelled.
while waiters:
waiter = waiters.popleft()
if not waiter.done():
waiter.set_result(None)
break
def __repr__(self):
return '<{} at {:#x} {}>'.format(
type(self).__name__, id(self), self._format())
def __str__(self):
return '<{} {}>'.format(type(self).__name__, self._format())
def _format(self):
result = 'maxsize={!r}'.format(self._maxsize)
if getattr(self, '_queue', None):
result += ' _queue={!r}'.format(list(self._queue))
if self._getters:
result += ' _getters[{}]'.format(len(self._getters))
if self._putters:
result += ' _putters[{}]'.format(len(self._putters))
if self._unfinished_tasks:
result += ' tasks={}'.format(self._unfinished_tasks)
return result
def qsize(self):
"""Number of items in the queue."""
return len(self._queue)
@property
def maxsize(self):
"""Number of items allowed in the queue."""
return self._maxsize
def empty(self):
"""Return True if the queue is empty, False otherwise."""
return not self._queue
def full(self):
"""Return True if there are maxsize items in the queue.
Note: if the Queue was initialized with maxsize=0 (the default),
then full() is never True.
"""
if self._maxsize <= 0:
return False
else:
return self.qsize() >= self._maxsize
@coroutine
def put(self, item):
"""Put an item into the queue.
Put an item into the queue. If the queue is full, wait until a free
slot is available before adding item.
This method is a coroutine.
"""
while self.full():
putter = futures.Future(loop=self._loop)
self._putters.append(putter)
try:
yield from putter
except:
putter.cancel() # Just in case putter is not done yet.
if not self.full() and not putter.cancelled():
# We were woken up by get_nowait(), but can't take
# the call. Wake up the next in line.
self._wakeup_next(self._putters)
raise
return self.put_nowait(item)
def put_nowait(self, item):
"""Put an item into the queue without blocking.
If no free slot is immediately available, raise QueueFull.
"""
if self.full():
raise QueueFull
self._put(item)
self._unfinished_tasks += 1
self._finished.clear()
self._wakeup_next(self._getters)
@coroutine
def get(self):
"""Remove and return an item from the queue.
If queue is empty, wait until an item is available.
This method is a coroutine.
"""
while self.empty():
getter = futures.Future(loop=self._loop)
self._getters.append(getter)
try:
yield from getter
except:
getter.cancel() # Just in case getter is not done yet.
if not self.empty() and not getter.cancelled():
# We were woken up by put_nowait(), but can't take
# the call. Wake up the next in line.
self._wakeup_next(self._getters)
raise
return self.get_nowait()
def get_nowait(self):
"""Remove and return an item from the queue.
Return an item if one is immediately available, else raise QueueEmpty.
"""
if self.empty():
raise QueueEmpty
item = self._get()
self._wakeup_next(self._putters)
return item
def task_done(self):
"""Indicate that a formerly enqueued task is complete.
Used by queue consumers. For each get() used to fetch a task,
a subsequent call to task_done() tells the queue that the processing
on the task is complete.
If a join() is currently blocking, it will resume when all items have
been processed (meaning that a task_done() call was received for every
item that had been put() into the queue).
Raises ValueError if called more times than there were items placed in
the queue.
"""
if self._unfinished_tasks <= 0:
raise ValueError('task_done() called too many times')
self._unfinished_tasks -= 1
if self._unfinished_tasks == 0:
self._finished.set()
@coroutine
def join(self):
"""Block until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer calls task_done() to
indicate that the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
"""
if self._unfinished_tasks > 0:
yield from self._finished.wait()
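# --- Illustrative sketch, not part of the original module ---
# A bounded Queue coordinating a producer and a consumer; with maxsize=1 the
# producer blocks in put() until the consumer frees a slot, and join()
# returns once every item has been matched by a task_done() call.
import asyncio

@asyncio.coroutine
def producer(queue):
    for i in range(3):
        yield from queue.put(i)            # blocks while the queue is full

@asyncio.coroutine
def consumer(queue, seen):
    for _ in range(3):
        seen.append((yield from queue.get()))
        queue.task_done()

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
queue = asyncio.Queue(maxsize=1, loop=loop)
seen = []
loop.run_until_complete(asyncio.gather(producer(queue), consumer(queue, seen)))
loop.run_until_complete(queue.join())      # all task_done() calls received
assert seen == [0, 1, 2]
loop.close()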
class PriorityQueue(Queue):
"""A subclass of Queue; retrieves entries in priority order (lowest first).
Entries are typically tuples of the form: (priority number, data).
"""
def _init(self, maxsize):
self._queue = []
def _put(self, item, heappush=heapq.heappush):
heappush(self._queue, item)
def _get(self, heappop=heapq.heappop):
return heappop(self._queue)
class LifoQueue(Queue):
"""A subclass of Queue that retrieves most recently added entries first."""
def _init(self, maxsize):
self._queue = []
def _put(self, item):
self._queue.append(item)
def _get(self):
return self._queue.pop()
if not compat.PY35:
JoinableQueue = Queue
"""Deprecated alias for Queue."""
__all__.append('JoinableQueue')
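# --- Illustrative sketch, not part of the original module ---
# The two Queue subclasses above differ only in their _get()/_put() ordering.
import asyncio

loop = asyncio.new_event_loop()
pq = asyncio.PriorityQueue(loop=loop)
for item in [(3, 'c'), (1, 'a'), (2, 'b')]:
    pq.put_nowait(item)
assert pq.get_nowait() == (1, 'a')         # lowest priority number first

lq = asyncio.LifoQueue(loop=loop)
for ch in 'xyz':
    lq.put_nowait(ch)
assert lq.get_nowait() == 'z'              # most recently added first
loop.close()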
"""Event loop using a selector and related classes.
A selector is a "notify-when-ready" multiplexer. For a subclass which
also includes support for signal handling, see the unix_events sub-module.
"""
__all__ = ['BaseSelectorEventLoop']
import collections
import errno
import functools
import socket
import warnings
try:
import ssl
except ImportError: # pragma: no cover
ssl = None
from . import base_events
from . import compat
from . import constants
from . import events
from . import futures
from . import selectors
from . import transports
from . import sslproto
from .coroutines import coroutine
from .log import logger
def _test_selector_event(selector, fd, event):
# Test if the selector is monitoring 'event' events
# for the file descriptor 'fd'.
try:
key = selector.get_key(fd)
except KeyError:
return False
else:
return bool(key.events & event)
class BaseSelectorEventLoop(base_events.BaseEventLoop):
"""Selector event loop.
See events.EventLoop for API specification.
"""
def __init__(self, selector=None):
super().__init__()
if selector is None:
selector = selectors.DefaultSelector()
logger.debug('Using selector: %s', selector.__class__.__name__)
self._selector = selector
self._make_self_pipe()
def _make_socket_transport(self, sock, protocol, waiter=None, *,
extra=None, server=None):
return _SelectorSocketTransport(self, sock, protocol, waiter,
extra, server)
def _make_ssl_transport(self, rawsock, protocol, sslcontext, waiter=None,
*, server_side=False, server_hostname=None,
extra=None, server=None):
if not sslproto._is_sslproto_available():
return self._make_legacy_ssl_transport(
rawsock, protocol, sslcontext, waiter,
server_side=server_side, server_hostname=server_hostname,
extra=extra, server=server)
ssl_protocol = sslproto.SSLProtocol(self, protocol, sslcontext, waiter,
server_side, server_hostname)
_SelectorSocketTransport(self, rawsock, ssl_protocol,
extra=extra, server=server)
return ssl_protocol._app_transport
def _make_legacy_ssl_transport(self, rawsock, protocol, sslcontext,
waiter, *,
server_side=False, server_hostname=None,
extra=None, server=None):
# Use the legacy API: SSL_write, SSL_read, etc. The legacy API is used
# on Python 3.4 and older, when ssl.MemoryBIO is not available.
return _SelectorSslTransport(
self, rawsock, protocol, sslcontext, waiter,
server_side, server_hostname, extra, server)
def _make_datagram_transport(self, sock, protocol,
address=None, waiter=None, extra=None):
return _SelectorDatagramTransport(self, sock, protocol,
address, waiter, extra)
def close(self):
if self.is_running():
raise RuntimeError("Cannot close a running event loop")
if self.is_closed():
return
self._close_self_pipe()
super().close()
if self._selector is not None:
self._selector.close()
self._selector = None
def _socketpair(self):
raise NotImplementedError
def _close_self_pipe(self):
self.remove_reader(self._ssock.fileno())
self._ssock.close()
self._ssock = None
self._csock.close()
self._csock = None
self._internal_fds -= 1
def _make_self_pipe(self):
# A self-socket, really. :-)
self._ssock, self._csock = self._socketpair()
self._ssock.setblocking(False)
self._csock.setblocking(False)
self._internal_fds += 1
self.add_reader(self._ssock.fileno(), self._read_from_self)
def _process_self_data(self, data):
pass
def _read_from_self(self):
while True:
try:
data = self._ssock.recv(4096)
if not data:
break
self._process_self_data(data)
except InterruptedError:
continue
except BlockingIOError:
break
def _write_to_self(self):
# This may be called from a different thread, possibly after
# _close_self_pipe() has been called or even while it is
# running. Guard for self._csock being None or closed. When
# a socket is closed, send() raises OSError (with errno set to
# EBADF, but let's not rely on the exact error code).
csock = self._csock
if csock is not None:
try:
csock.send(b'\0')
except OSError:
if self._debug:
logger.debug("Fail to write a null byte into the "
"self-pipe socket",
exc_info=True)
def _start_serving(self, protocol_factory, sock,
sslcontext=None, server=None):
self.add_reader(sock.fileno(), self._accept_connection,
protocol_factory, sock, sslcontext, server)
def _accept_connection(self, protocol_factory, sock,
sslcontext=None, server=None):
try:
conn, addr = sock.accept()
if self._debug:
logger.debug("%r got a new connection from %r: %r",
server, addr, conn)
conn.setblocking(False)
except (BlockingIOError, InterruptedError, ConnectionAbortedError):
pass # False alarm.
except OSError as exc:
# There's nowhere to send the error, so just log it.
if exc.errno in (errno.EMFILE, errno.ENFILE,
errno.ENOBUFS, errno.ENOMEM):
                # Some platforms (e.g. Linux) keep reporting the FD as
# ready, so we remove the read handler temporarily.
# We'll try again in a while.
self.call_exception_handler({
'message': 'socket.accept() out of system resource',
'exception': exc,
'socket': sock,
})
self.remove_reader(sock.fileno())
self.call_later(constants.ACCEPT_RETRY_DELAY,
self._start_serving,
protocol_factory, sock, sslcontext, server)
else:
raise # The event loop will catch, log and ignore it.
else:
extra = {'peername': addr}
accept = self._accept_connection2(protocol_factory, conn, extra,
sslcontext, server)
self.create_task(accept)
@coroutine
def _accept_connection2(self, protocol_factory, conn, extra,
sslcontext=None, server=None):
protocol = None
transport = None
try:
protocol = protocol_factory()
waiter = futures.Future(loop=self)
if sslcontext:
transport = self._make_ssl_transport(
conn, protocol, sslcontext, waiter=waiter,
server_side=True, extra=extra, server=server)
else:
transport = self._make_socket_transport(
conn, protocol, waiter=waiter, extra=extra,
server=server)
try:
yield from waiter
except:
transport.close()
raise
# It's now up to the protocol to handle the connection.
except Exception as exc:
if self._debug:
context = {
'message': ('Error on transport creation '
'for incoming connection'),
'exception': exc,
}
if protocol is not None:
context['protocol'] = protocol
if transport is not None:
context['transport'] = transport
self.call_exception_handler(context)
def add_reader(self, fd, callback, *args):
"""Add a reader callback."""
self._check_closed()
handle = events.Handle(callback, args, self)
try:
key = self._selector.get_key(fd)
except KeyError:
self._selector.register(fd, selectors.EVENT_READ,
(handle, None))
else:
mask, (reader, writer) = key.events, key.data
self._selector.modify(fd, mask | selectors.EVENT_READ,
(handle, writer))
if reader is not None:
reader.cancel()
def remove_reader(self, fd):
"""Remove a reader callback."""
if self.is_closed():
return False
try:
key = self._selector.get_key(fd)
except KeyError:
return False
else:
mask, (reader, writer) = key.events, key.data
mask &= ~selectors.EVENT_READ
if not mask:
self._selector.unregister(fd)
else:
self._selector.modify(fd, mask, (None, writer))
if reader is not None:
reader.cancel()
return True
else:
return False
def add_writer(self, fd, callback, *args):
"""Add a writer callback.."""
self._check_closed()
handle = events.Handle(callback, args, self)
try:
key = self._selector.get_key(fd)
except KeyError:
self._selector.register(fd, selectors.EVENT_WRITE,
(None, handle))
else:
mask, (reader, writer) = key.events, key.data
self._selector.modify(fd, mask | selectors.EVENT_WRITE,
(reader, handle))
if writer is not None:
writer.cancel()
def remove_writer(self, fd):
"""Remove a writer callback."""
if self.is_closed():
return False
try:
key = self._selector.get_key(fd)
except KeyError:
return False
else:
mask, (reader, writer) = key.events, key.data
# Remove both writer and connector.
mask &= ~selectors.EVENT_WRITE
if not mask:
self._selector.unregister(fd)
else:
self._selector.modify(fd, mask, (reader, None))
if writer is not None:
writer.cancel()
return True
else:
return False
def sock_recv(self, sock, n):
"""Receive data from the socket.
The return value is a bytes object representing the data received.
        The maximum amount of data to be received at once is specified by
        n.
This method is a coroutine.
"""
if self._debug and sock.gettimeout() != 0:
raise ValueError("the socket must be non-blocking")
fut = futures.Future(loop=self)
self._sock_recv(fut, False, sock, n)
return fut
def _sock_recv(self, fut, registered, sock, n):
# _sock_recv() can add itself as an I/O callback if the operation can't
# be done immediately. Don't use it directly, call sock_recv().
fd = sock.fileno()
if registered:
# Remove the callback early. It should be rare that the
# selector says the fd is ready but the call still returns
# EAGAIN, and I am willing to take a hit in that case in
# order to simplify the common case.
self.remove_reader(fd)
if fut.cancelled():
return
try:
data = sock.recv(n)
except (BlockingIOError, InterruptedError):
self.add_reader(fd, self._sock_recv, fut, True, sock, n)
except Exception as exc:
fut.set_exception(exc)
else:
fut.set_result(data)
def sock_sendall(self, sock, data):
"""Send data to the socket.
        The socket must be connected to a remote socket. This method continues
        to send data until either all of it has been sent or an
error occurs. None is returned on success. On error, an exception is
raised, and there is no way to determine how much data, if any, was
successfully processed by the receiving end of the connection.
This method is a coroutine.
"""
if self._debug and sock.gettimeout() != 0:
raise ValueError("the socket must be non-blocking")
fut = futures.Future(loop=self)
if data:
self._sock_sendall(fut, False, sock, data)
else:
fut.set_result(None)
return fut
def _sock_sendall(self, fut, registered, sock, data):
fd = sock.fileno()
if registered:
self.remove_writer(fd)
if fut.cancelled():
return
try:
n = sock.send(data)
except (BlockingIOError, InterruptedError):
n = 0
except Exception as exc:
fut.set_exception(exc)
return
if n == len(data):
fut.set_result(None)
else:
if n:
data = data[n:]
self.add_writer(fd, self._sock_sendall, fut, True, sock, data)
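# --- Illustrative usage sketch (editorial addition, not part of the module).
# sock_sendall() resolves with None once every byte of the payload has been
# accepted by the kernel, retrying internally on short writes.
import asyncio

@asyncio.coroutine
def _example_sock_sendall(loop, sock, payload=b'hello'):
    yield from loop.sock_sendall(sock, payload)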
def sock_connect(self, sock, address):
"""Connect to a remote socket at address.
The address must be already resolved to avoid the trap of hanging the
entire event loop when the address requires doing a DNS lookup. For
example, it must be an IP address, not a hostname, for AF_INET and
AF_INET6 address families. Use getaddrinfo() to resolve the hostname
asynchronously.
This method is a coroutine.
"""
if self._debug and sock.gettimeout() != 0:
raise ValueError("the socket must be non-blocking")
fut = futures.Future(loop=self)
try:
base_events._check_resolved_address(sock, address)
except ValueError as err:
fut.set_exception(err)
else:
self._sock_connect(fut, sock, address)
return fut
def _sock_connect(self, fut, sock, address):
fd = sock.fileno()
try:
sock.connect(address)
except (BlockingIOError, InterruptedError):
# Issue #23618: When the C function connect() fails with EINTR, the
# connection runs in the background. We have to wait until the socket
# becomes writable to be notified when the connection succeeds or
# fails.
fut.add_done_callback(functools.partial(self._sock_connect_done,
fd))
self.add_writer(fd, self._sock_connect_cb, fut, sock, address)
except Exception as exc:
fut.set_exception(exc)
else:
fut.set_result(None)
def _sock_connect_done(self, fd, fut):
self.remove_writer(fd)
def _sock_connect_cb(self, fut, sock, address):
if fut.cancelled():
return
try:
err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
if err != 0:
# Jump to any except clause below.
raise OSError(err, 'Connect call failed %s' % (address,))
except (BlockingIOError, InterruptedError):
# socket is still registered, the callback will be retried later
pass
except Exception as exc:
fut.set_exception(exc)
else:
fut.set_result(None)
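# --- Illustrative usage sketch (editorial addition, not part of the module).
# Per the docstring above, sock_connect() needs an already-resolved address;
# getaddrinfo() performs the DNS lookup without blocking the loop. The host
# and port below are placeholder examples.
import asyncio
import socket

@asyncio.coroutine
def _example_sock_connect(loop, host='example.com', port=80):
    infos = yield from loop.getaddrinfo(host, port, family=socket.AF_INET,
                                        type=socket.SOCK_STREAM)
    family, socktype, proto, _, address = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.setblocking(False)
    yield from loop.sock_connect(sock, address)
    return sock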
def sock_accept(self, sock):
"""Accept a connection.
The socket must be bound to an address and listening for connections.
The return value is a pair (conn, address) where conn is a new socket
object usable to send and receive data on the connection, and address
is the address bound to the socket on the other end of the connection.
This method is a coroutine.
"""
if self._debug and sock.gettimeout() != 0:
raise ValueError("the socket must be non-blocking")
fut = futures.Future(loop=self)
self._sock_accept(fut, False, sock)
return fut
def _sock_accept(self, fut, registered, sock):
fd = sock.fileno()
if registered:
self.remove_reader(fd)
if fut.cancelled():
return
try:
conn, address = sock.accept()
conn.setblocking(False)
except (BlockingIOError, InterruptedError):
self.add_reader(fd, self._sock_accept, fut, True, sock)
except Exception as exc:
fut.set_exception(exc)
else:
fut.set_result((conn, address))
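# --- Illustrative usage sketch (editorial addition, not part of the module).
# A low-level accept loop built on sock_accept(); `listen_sock` is assumed to
# be bound, listening and non-blocking, and `handler` is a hypothetical
# per-connection coroutine.
import asyncio

@asyncio.coroutine
def _example_accept_loop(loop, listen_sock, handler):
    while True:
        conn, address = yield from loop.sock_accept(listen_sock)
        loop.create_task(handler(conn, address))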
def _process_events(self, event_list):
for key, mask in event_list:
fileobj, (reader, writer) = key.fileobj, key.data
if mask & selectors.EVENT_READ and reader is not None:
if reader._cancelled:
self.remove_reader(fileobj)
else:
self._add_callback(reader)
if mask & selectors.EVENT_WRITE and writer is not None:
if writer._cancelled:
self.remove_writer(fileobj)
else:
self._add_callback(writer)
def _stop_serving(self, sock):
self.remove_reader(sock.fileno())
sock.close()
class _SelectorTransport(transports._FlowControlMixin,
transports.Transport):
max_size = 256 * 1024 # Buffer size passed to recv().
_buffer_factory = bytearray # Constructs initial value for self._buffer.
# Attribute used in the destructor: it must be set even if the constructor
# is not called (see _SelectorSslTransport which may start by raising an
# exception)
_sock = None
def __init__(self, loop, sock, protocol, extra=None, server=None):
super().__init__(extra, loop)
self._extra['socket'] = sock
self._extra['sockname'] = sock.getsockname()
if 'peername' not in self._extra:
try:
self._extra['peername'] = sock.getpeername()
except socket.error:
self._extra['peername'] = None
self._sock = sock
self._sock_fd = sock.fileno()
self._protocol = protocol
self._protocol_connected = True
self._server = server
self._buffer = self._buffer_factory()
self._conn_lost = 0 # Set when call to connection_lost scheduled.
self._closing = False # Set when close() called.
if self._server is not None:
self._server._attach()
def __repr__(self):
info = [self.__class__.__name__]
if self._sock is None:
info.append('closed')
elif self._closing:
info.append('closing')
info.append('fd=%s' % self._sock_fd)
# test if the transport was closed
if self._loop is not None and not self._loop.is_closed():
polling = _test_selector_event(self._loop._selector,
self._sock_fd, selectors.EVENT_READ)
if polling:
info.append('read=polling')
else:
info.append('read=idle')
polling = _test_selector_event(self._loop._selector,
self._sock_fd,
selectors.EVENT_WRITE)
if polling:
state = 'polling'
else:
state = 'idle'
bufsize = self.get_write_buffer_size()
info.append('write=<%s, bufsize=%s>' % (state, bufsize))
return '<%s>' % ' '.join(info)
def abort(self):
self._force_close(None)
def is_closing(self):
return self._closing
def close(self):
if self._closing:
return
self._closing = True
self._loop.remove_reader(self._sock_fd)
if not self._buffer:
self._conn_lost += 1
self._loop.call_soon(self._call_connection_lost, None)
# On Python 3.3 and older, objects with a destructor that are part of a
# reference cycle are never destroyed. That's no longer the case on
# Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if self._sock is not None:
warnings.warn("unclosed transport %r" % self, ResourceWarning)
self._sock.close()
def _fatal_error(self, exc, message='Fatal error on transport'):
# Should be called from exception handler only.
if isinstance(exc, (BrokenPipeError,
ConnectionResetError, ConnectionAbortedError)):
if self._loop.get_debug():
logger.debug("%r: %s", self, message, exc_info=True)
else:
self._loop.call_exception_handler({
'message': message,
'exception': exc,
'transport': self,
'protocol': self._protocol,
})
self._force_close(exc)
def _force_close(self, exc):
if self._conn_lost:
return
if self._buffer:
self._buffer.clear()
self._loop.remove_writer(self._sock_fd)
if not self._closing:
self._closing = True
self._loop.remove_reader(self._sock_fd)
self._conn_lost += 1
self._loop.call_soon(self._call_connection_lost, exc)
def _call_connection_lost(self, exc):
try:
if self._protocol_connected:
self._protocol.connection_lost(exc)
finally:
self._sock.close()
self._sock = None
self._protocol = None
self._loop = None
server = self._server
if server is not None:
server._detach()
self._server = None
def get_write_buffer_size(self):
return len(self._buffer)
class _SelectorSocketTransport(_SelectorTransport):
def __init__(self, loop, sock, protocol, waiter=None,
extra=None, server=None):
super().__init__(loop, sock, protocol, extra, server)
self._eof = False
self._paused = False
self._loop.call_soon(self._protocol.connection_made, self)
# only start reading when connection_made() has been called
self._loop.call_soon(self._loop.add_reader,
self._sock_fd, self._read_ready)
if waiter is not None:
# only wake up the waiter when connection_made() has been called
self._loop.call_soon(futures._set_result_unless_cancelled,
waiter, None)
def pause_reading(self):
if self._closing:
raise RuntimeError('Cannot pause_reading() when closing')
if self._paused:
raise RuntimeError('Already paused')
self._paused = True
self._loop.remove_reader(self._sock_fd)
if self._loop.get_debug():
logger.debug("%r pauses reading", self)
def resume_reading(self):
if not self._paused:
raise RuntimeError('Not paused')
self._paused = False
if self._closing:
return
self._loop.add_reader(self._sock_fd, self._read_ready)
if self._loop.get_debug():
logger.debug("%r resumes reading", self)
def _read_ready(self):
try:
data = self._sock.recv(self.max_size)
except (BlockingIOError, InterruptedError):
pass
except Exception as exc:
self._fatal_error(exc, 'Fatal read error on socket transport')
else:
if data:
self._protocol.data_received(data)
else:
if self._loop.get_debug():
logger.debug("%r received EOF", self)
keep_open = self._protocol.eof_received()
if keep_open:
# We're keeping the connection open so the
# protocol can write more, but we still can't
# receive more, so remove the reader callback.
self._loop.remove_reader(self._sock_fd)
else:
self.close()
def write(self, data):
if not isinstance(data, (bytes, bytearray, memoryview)):
raise TypeError('data argument must be byte-ish (%r)'
% type(data))
if self._eof:
raise RuntimeError('Cannot call write() after write_eof()')
if not data:
return
if self._conn_lost:
if self._conn_lost >= constants.LOG_THRESHOLD_FOR_CONNLOST_WRITES:
logger.warning('socket.send() raised exception.')
self._conn_lost += 1
return
if not self._buffer:
# Optimization: try to send now.
try:
n = self._sock.send(data)
except (BlockingIOError, InterruptedError):
pass
except Exception as exc:
self._fatal_error(exc, 'Fatal write error on socket transport')
return
else:
data = data[n:]
if not data:
return
# Not all was written; register write handler.
self._loop.add_writer(self._sock_fd, self._write_ready)
# Add it to the buffer.
self._buffer.extend(data)
self._maybe_pause_protocol()
def _write_ready(self):
assert self._buffer, 'Data should not be empty'
try:
n = self._sock.send(self._buffer)
except (BlockingIOError, InterruptedError):
pass
except Exception as exc:
self._loop.remove_writer(self._sock_fd)
self._buffer.clear()
self._fatal_error(exc, 'Fatal write error on socket transport')
else:
if n:
del self._buffer[:n]
self._maybe_resume_protocol() # May append to buffer.
if not self._buffer:
self._loop.remove_writer(self._sock_fd)
if self._closing:
self._call_connection_lost(None)
elif self._eof:
self._sock.shutdown(socket.SHUT_WR)
def write_eof(self):
if self._eof:
return
self._eof = True
if not self._buffer:
self._sock.shutdown(socket.SHUT_WR)
def can_write_eof(self):
return True
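# --- Illustrative usage sketch (editorial addition, not part of the module).
# _SelectorSocketTransport is what create_connection()/create_server() hand
# to a user protocol for a plain TCP socket; a hypothetical echo protocol
# exercising its write() and the read callbacks above:
import asyncio

class _ExampleEchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
    def data_received(self, data):
        # write() buffers; _write_ready() flushes when the fd is writable.
        self.transport.write(data)
    def connection_lost(self, exc):
        pass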
class _SelectorSslTransport(_SelectorTransport):
_buffer_factory = bytearray
def __init__(self, loop, rawsock, protocol, sslcontext, waiter=None,
server_side=False, server_hostname=None,
extra=None, server=None):
if ssl is None:
raise RuntimeError('stdlib ssl module not available')
if not sslcontext:
sslcontext = sslproto._create_transport_context(server_side, server_hostname)
wrap_kwargs = {
'server_side': server_side,
'do_handshake_on_connect': False,
}
if server_hostname and not server_side:
wrap_kwargs['server_hostname'] = server_hostname
sslsock = sslcontext.wrap_socket(rawsock, **wrap_kwargs)
super().__init__(loop, sslsock, protocol, extra, server)
# the protocol connection is only made after the SSL handshake
self._protocol_connected = False
self._server_hostname = server_hostname
self._waiter = waiter
self._sslcontext = sslcontext
self._paused = False
# SSL-specific extra info. (peercert is set later)
self._extra.update(sslcontext=sslcontext)
if self._loop.get_debug():
logger.debug("%r starts SSL handshake", self)
start_time = self._loop.time()
else:
start_time = None
self._on_handshake(start_time)
def _wakeup_waiter(self, exc=None):
if self._waiter is None:
return
if not self._waiter.cancelled():
if exc is not None:
self._waiter.set_exception(exc)
else:
self._waiter.set_result(None)
self._waiter = None
def _on_handshake(self, start_time):
try:
self._sock.do_handshake()
except ssl.SSLWantReadError:
self._loop.add_reader(self._sock_fd,
self._on_handshake, start_time)
return
except ssl.SSLWantWriteError:
self._loop.add_writer(self._sock_fd,
self._on_handshake, start_time)
return
except BaseException as exc:
if self._loop.get_debug():
logger.warning("%r: SSL handshake failed",
self, exc_info=True)
self._loop.remove_reader(self._sock_fd)
self._loop.remove_writer(self._sock_fd)
self._sock.close()
self._wakeup_waiter(exc)
if isinstance(exc, Exception):
return
else:
raise
self._loop.remove_reader(self._sock_fd)
self._loop.remove_writer(self._sock_fd)
peercert = self._sock.getpeercert()
if not hasattr(self._sslcontext, 'check_hostname'):
# Verify hostname if requested; Python 3.4+ uses check_hostname
# and verifies the hostname in do_handshake() instead
if (self._server_hostname and
self._sslcontext.verify_mode != ssl.CERT_NONE):
try:
ssl.match_hostname(peercert, self._server_hostname)
except Exception as exc:
if self._loop.get_debug():
logger.warning("%r: SSL handshake failed "
"on matching the hostname",
self, exc_info=True)
self._sock.close()
self._wakeup_waiter(exc)
return
# Add extra info that becomes available after handshake.
self._extra.update(peercert=peercert,
cipher=self._sock.cipher(),
compression=self._sock.compression(),
ssl_object=self._sock,
)
self._read_wants_write = False
self._write_wants_read = False
self._loop.add_reader(self._sock_fd, self._read_ready)
self._protocol_connected = True
self._loop.call_soon(self._protocol.connection_made, self)
# only wake up the waiter when connection_made() has been called
self._loop.call_soon(self._wakeup_waiter)
if self._loop.get_debug():
dt = self._loop.time() - start_time
logger.debug("%r: SSL handshake took %.1f ms", self, dt * 1e3)
def pause_reading(self):
# XXX This is a bit icky, given the comment at the top of
# _read_ready(). Is it possible to evoke a deadlock? I don't
# know, although it doesn't look like it; write() will still
# accept more data for the buffer and eventually the app will
# call resume_reading() again, and things will flow again.
if self._closing:
raise RuntimeError('Cannot pause_reading() when closing')
if self._paused:
raise RuntimeError('Already paused')
self._paused = True
self._loop.remove_reader(self._sock_fd)
if self._loop.get_debug():
logger.debug("%r pauses reading", self)
def resume_reading(self):
if not self._paused:
raise RuntimeError('Not paused')
self._paused = False
if self._closing:
return
self._loop.add_reader(self._sock_fd, self._read_ready)
if self._loop.get_debug():
logger.debug("%r resumes reading", self)
def _read_ready(self):
if self._write_wants_read:
self._write_wants_read = False
self._write_ready()
if self._buffer:
self._loop.add_writer(self._sock_fd, self._write_ready)
try:
data = self._sock.recv(self.max_size)
except (BlockingIOError, InterruptedError, ssl.SSLWantReadError):
pass
except ssl.SSLWantWriteError:
self._read_wants_write = True
self._loop.remove_reader(self._sock_fd)
self._loop.add_writer(self._sock_fd, self._write_ready)
except Exception as exc:
self._fatal_error(exc, 'Fatal read error on SSL transport')
else:
if data:
self._protocol.data_received(data)
else:
try:
if self._loop.get_debug():
logger.debug("%r received EOF", self)
keep_open = self._protocol.eof_received()
if keep_open:
logger.warning('returning true from eof_received() '
'has no effect when using ssl')
finally:
self.close()
def _write_ready(self):
if self._read_wants_write:
self._read_wants_write = False
self._read_ready()
if not (self._paused or self._closing):
self._loop.add_reader(self._sock_fd, self._read_ready)
if self._buffer:
try:
n = self._sock.send(self._buffer)
except (BlockingIOError, InterruptedError, ssl.SSLWantWriteError):
n = 0
except ssl.SSLWantReadError:
n = 0
self._loop.remove_writer(self._sock_fd)
self._write_wants_read = True
except Exception as exc:
self._loop.remove_writer(self._sock_fd)
self._buffer.clear()
self._fatal_error(exc, 'Fatal write error on SSL transport')
return
if n:
del self._buffer[:n]
self._maybe_resume_protocol() # May append to buffer.
if not self._buffer:
self._loop.remove_writer(self._sock_fd)
if self._closing:
self._call_connection_lost(None)
def write(self, data):
if not isinstance(data, (bytes, bytearray, memoryview)):
raise TypeError('data argument must be byte-ish (%r)'
% type(data))
if not data:
return
if self._conn_lost:
if self._conn_lost >= constants.LOG_THRESHOLD_FOR_CONNLOST_WRITES:
logger.warning('socket.send() raised exception.')
self._conn_lost += 1
return
if not self._buffer:
self._loop.add_writer(self._sock_fd, self._write_ready)
# Add it to the buffer.
self._buffer.extend(data)
self._maybe_pause_protocol()
def can_write_eof(self):
return False
class _SelectorDatagramTransport(_SelectorTransport):
_buffer_factory = collections.deque
def __init__(self, loop, sock, protocol, address=None,
waiter=None, extra=None):
super().__init__(loop, sock, protocol, extra)
self._address = address
self._loop.call_soon(self._protocol.connection_made, self)
# only start reading when connection_made() has been called
self._loop.call_soon(self._loop.add_reader,
self._sock_fd, self._read_ready)
if waiter is not None:
# only wake up the waiter when connection_made() has been called
self._loop.call_soon(futures._set_result_unless_cancelled,
waiter, None)
def get_write_buffer_size(self):
return sum(len(data) for data, _ in self._buffer)
def _read_ready(self):
try:
data, addr = self._sock.recvfrom(self.max_size)
except (BlockingIOError, InterruptedError):
pass
except OSError as exc:
self._protocol.error_received(exc)
except Exception as exc:
self._fatal_error(exc, 'Fatal read error on datagram transport')
else:
self._protocol.datagram_received(data, addr)
def sendto(self, data, addr=None):
if not isinstance(data, (bytes, bytearray, memoryview)):
raise TypeError('data argument must be byte-ish (%r)'
% type(data))
if not data:
return
if self._address and addr not in (None, self._address):
raise ValueError('Invalid address: must be None or %s' %
(self._address,))
if self._conn_lost and self._address:
if self._conn_lost >= constants.LOG_THRESHOLD_FOR_CONNLOST_WRITES:
logger.warning('socket.send() raised exception.')
self._conn_lost += 1
return
if not self._buffer:
# Attempt to send it right away first.
try:
if self._address:
self._sock.send(data)
else:
self._sock.sendto(data, addr)
return
except (BlockingIOError, InterruptedError):
self._loop.add_writer(self._sock_fd, self._sendto_ready)
except OSError as exc:
self._protocol.error_received(exc)
return
except Exception as exc:
self._fatal_error(exc,
'Fatal write error on datagram transport')
return
# Ensure that what we buffer is immutable.
self._buffer.append((bytes(data), addr))
self._maybe_pause_protocol()
def _sendto_ready(self):
while self._buffer:
data, addr = self._buffer.popleft()
try:
if self._address:
self._sock.send(data)
else:
self._sock.sendto(data, addr)
except (BlockingIOError, InterruptedError):
self._buffer.appendleft((data, addr)) # Try again later.
break
except OSError as exc:
self._protocol.error_received(exc)
return
except Exception as exc:
self._fatal_error(exc,
'Fatal write error on datagram transport')
return
self._maybe_resume_protocol() # May append to buffer.
if not self._buffer:
self._loop.remove_writer(self._sock_fd)
if self._closing:
self._call_connection_lost(None)
import collections
import warnings
try:
import ssl
except ImportError: # pragma: no cover
ssl = None
from . import compat
from . import protocols
from . import transports
from .log import logger
def _create_transport_context(server_side, server_hostname):
if server_side:
raise ValueError('Server side SSL needs a valid SSLContext')
# Client side may pass ssl=True to use a default
# context; in that case the sslcontext passed is None.
# The default is secure for client connections.
if hasattr(ssl, 'create_default_context'):
# Python 3.4+: use up-to-date strong settings.
sslcontext = ssl.create_default_context()
if not server_hostname:
sslcontext.check_hostname = False
else:
# Fallback for Python 3.3.
sslcontext = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
sslcontext.options |= ssl.OP_NO_SSLv2
sslcontext.options |= ssl.OP_NO_SSLv3
sslcontext.set_default_verify_paths()
sslcontext.verify_mode = ssl.CERT_REQUIRED
return sslcontext
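# --- Illustrative usage sketch (editorial addition, not part of the module).
# _create_transport_context() is only ever used client-side; server-side
# callers must supply their own SSLContext. A hypothetical call site:
def _example_default_client_context(hostname='example.com'):
    # Equivalent to passing ssl=True to create_connection() on Python 3.4+.
    return _create_transport_context(server_side=False,
                                     server_hostname=hostname)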
def _is_sslproto_available():
return hasattr(ssl, "MemoryBIO")
# States of an _SSLPipe.
_UNWRAPPED = "UNWRAPPED"
_DO_HANDSHAKE = "DO_HANDSHAKE"
_WRAPPED = "WRAPPED"
_SHUTDOWN = "SHUTDOWN"
class _SSLPipe(object):
"""An SSL "Pipe".
An SSL pipe allows you to communicate with an SSL/TLS protocol instance
through memory buffers. It can be used to implement a security layer for an
existing connection where you don't have access to the connection's file
descriptor, or for some reason you don't want to use it.
An SSL pipe can be in "wrapped" and "unwrapped" mode. In unwrapped mode,
data is passed through untransformed. In wrapped mode, application level
data is encrypted to SSL record level data and vice versa. The SSL record
level is the lowest level in the SSL protocol suite and is what travels
as-is over the wire.
An _SSLPipe initially is in "unwrapped" mode. To start SSL, call
do_handshake(). To shut SSL down again, call shutdown().
"""
max_size = 256 * 1024 # Buffer size passed to read()
def __init__(self, context, server_side, server_hostname=None):
"""
The *context* argument specifies the ssl.SSLContext to use.
The *server_side* argument indicates whether this is a server side or
client side transport.
The optional *server_hostname* argument can be used to specify the
hostname you are connecting to. You may only specify this parameter if
the _ssl module supports Server Name Indication (SNI).
"""
self._context = context
self._server_side = server_side
self._server_hostname = server_hostname
self._state = _UNWRAPPED
self._incoming = ssl.MemoryBIO()
self._outgoing = ssl.MemoryBIO()
self._sslobj = None
self._need_ssldata = False
self._handshake_cb = None
self._shutdown_cb = None
@property
def context(self):
"""The SSL context passed to the constructor."""
return self._context
@property
def ssl_object(self):
"""The internal ssl.SSLObject instance.
Return None if the pipe is not wrapped.
"""
return self._sslobj
@property
def need_ssldata(self):
"""Whether more record level data is needed to complete a handshake
that is currently in progress."""
return self._need_ssldata
@property
def wrapped(self):
"""
Whether a security layer is currently in effect.
Return False during handshake.
"""
return self._state == _WRAPPED
def do_handshake(self, callback=None):
"""Start the SSL handshake.
Return a list of ssldata; an ssldata element is a list of buffers.
The optional *callback* argument can be used to install a callback that
will be called when the handshake is complete. The callback will be
called with None if successful, else an exception instance.
"""
if self._state != _UNWRAPPED:
raise RuntimeError('handshake in progress or completed')
self._sslobj = self._context.wrap_bio(
self._incoming, self._outgoing,
server_side=self._server_side,
server_hostname=self._server_hostname)
self._state = _DO_HANDSHAKE
self._handshake_cb = callback
ssldata, appdata = self.feed_ssldata(b'', only_handshake=True)
assert len(appdata) == 0
return ssldata
def shutdown(self, callback=None):
"""Start the SSL shutdown sequence.
Return a list of ssldata; an ssldata element is a list of buffers.
The optional *callback* argument can be used to install a callback that
will be called when the shutdown is complete. The callback will be
called without arguments.
"""
if self._state == _UNWRAPPED:
raise RuntimeError('no security layer present')
if self._state == _SHUTDOWN:
raise RuntimeError('shutdown in progress')
assert self._state in (_WRAPPED, _DO_HANDSHAKE)
self._state = _SHUTDOWN
self._shutdown_cb = callback
ssldata, appdata = self.feed_ssldata(b'')
assert appdata == [] or appdata == [b'']
return ssldata
def feed_eof(self):
"""Send a potentially "ragged" EOF.
This method will raise an SSL_ERROR_EOF exception if the EOF is
unexpected.
"""
self._incoming.write_eof()
ssldata, appdata = self.feed_ssldata(b'')
assert appdata == [] or appdata == [b'']
def feed_ssldata(self, data, only_handshake=False):
"""Feed SSL record level data into the pipe.
The data must be a bytes instance. It is OK to send an empty bytes
instance. This can be used to get ssldata for a handshake initiated by
this endpoint.
Return a (ssldata, appdata) tuple. The ssldata element is a list of
buffers containing SSL data that needs to be sent to the remote SSL.
The appdata element is a list of buffers containing plaintext data that
needs to be forwarded to the application. The appdata list may contain
an empty buffer indicating an SSL "close_notify" alert. This alert must
be acknowledged by calling shutdown().
"""
if self._state == _UNWRAPPED:
# If unwrapped, pass plaintext data straight through.
if data:
appdata = [data]
else:
appdata = []
return ([], appdata)
self._need_ssldata = False
if data:
self._incoming.write(data)
ssldata = []
appdata = []
try:
if self._state == _DO_HANDSHAKE:
# Call do_handshake() until it doesn't raise anymore.
self._sslobj.do_handshake()
self._state = _WRAPPED
if self._handshake_cb:
self._handshake_cb(None)
if only_handshake:
return (ssldata, appdata)
# Handshake done: execute the wrapped block
if self._state == _WRAPPED:
# Main state: read data from SSL until close_notify
while True:
chunk = self._sslobj.read(self.max_size)
appdata.append(chunk)
if not chunk: # close_notify
break
elif self._state == _SHUTDOWN:
# Call shutdown() until it doesn't raise anymore.
self._sslobj.unwrap()
self._sslobj = None
self._state = _UNWRAPPED
if self._shutdown_cb:
self._shutdown_cb()
elif self._state == _UNWRAPPED:
# Drain possible plaintext data after close_notify.
appdata.append(self._incoming.read())
except (ssl.SSLError, ssl.CertificateError) as exc:
if getattr(exc, 'errno', None) not in (
ssl.SSL_ERROR_WANT_READ, ssl.SSL_ERROR_WANT_WRITE,
ssl.SSL_ERROR_SYSCALL):
if self._state == _DO_HANDSHAKE and self._handshake_cb:
self._handshake_cb(exc)
raise
self._need_ssldata = (exc.errno == ssl.SSL_ERROR_WANT_READ)
# Check for record level data that needs to be sent back.
# Happens for the initial handshake and renegotiations.
if self._outgoing.pending:
ssldata.append(self._outgoing.read())
return (ssldata, appdata)
def feed_appdata(self, data, offset=0):
"""Feed plaintext data into the pipe.
Return an (ssldata, offset) tuple. The ssldata element is a list of
buffers containing record level data that needs to be sent to the
remote SSL instance. The offset is the number of plaintext bytes that
were processed, which may be less than the length of data.
NOTE: In case of short writes, this call MUST be retried with the SAME
buffer passed into the *data* argument (i.e. the id() must be the
same). This is an OpenSSL requirement. A further particularity is that
a short write will always have offset == 0, because the _ssl module
does not enable partial writes. And even though the offset is zero,
there will still be encrypted data in ssldata.
"""
assert 0 <= offset <= len(data)
if self._state == _UNWRAPPED:
# pass through data in unwrapped mode
if offset < len(data):
ssldata = [data[offset:]]
else:
ssldata = []
return (ssldata, len(data))
ssldata = []
view = memoryview(data)
while True:
self._need_ssldata = False
try:
if offset < len(view):
offset += self._sslobj.write(view[offset:])
except ssl.SSLError as exc:
# It is not allowed to call write() after unwrap() until the
# close_notify is acknowledged. We return the condition to the
# caller as a short write.
if exc.reason == 'PROTOCOL_IS_SHUTDOWN':
exc.errno = ssl.SSL_ERROR_WANT_READ
if exc.errno not in (ssl.SSL_ERROR_WANT_READ,
ssl.SSL_ERROR_WANT_WRITE,
ssl.SSL_ERROR_SYSCALL):
raise
self._need_ssldata = (exc.errno == ssl.SSL_ERROR_WANT_READ)
# See if there's any record level data back for us.
if self._outgoing.pending:
ssldata.append(self._outgoing.read())
if offset == len(view) or self._need_ssldata:
break
return (ssldata, offset)
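# --- Illustrative usage sketch (editorial addition, not part of the module).
# Two _SSLPipe instances can be handshaken entirely in memory by shuttling
# each side's record-level output into the other. Requires ssl.MemoryBIO
# (Python 3.5+); the certificate and key paths are hypothetical placeholders.
def _example_sslpipe_handshake():
    server_ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    server_ctx.load_cert_chain('server.crt', 'server.key')  # hypothetical
    client_ctx = ssl.create_default_context()
    client_ctx.check_hostname = False
    client_ctx.verify_mode = ssl.CERT_NONE  # demo only: no verification
    server = _SSLPipe(server_ctx, server_side=True)
    client = _SSLPipe(client_ctx, server_side=False)
    to_client = server.do_handshake()   # usually empty for the server side
    to_server = client.do_handshake()   # the ClientHello record(s)
    while not (client.wrapped and server.wrapped):
        for chunk in to_server:
            ssldata, _ = server.feed_ssldata(chunk)
            to_client.extend(ssldata)
        to_server = []
        for chunk in to_client:
            ssldata, _ = client.feed_ssldata(chunk)
            to_server.extend(ssldata)
        to_client = []
    return client, server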
class _SSLProtocolTransport(transports._FlowControlMixin,
transports.Transport):
def __init__(self, loop, ssl_protocol, app_protocol):
self._loop = loop
# SSLProtocol instance
self._ssl_protocol = ssl_protocol
self._app_protocol = app_protocol
self._closed = False
def get_extra_info(self, name, default=None):
"""Get optional transport information."""
return self._ssl_protocol._get_extra_info(name, default)
def is_closing(self):
return self._closed
def close(self):
"""Close the transport.
Buffered data will be flushed asynchronously. No more data
will be received. After all buffered data is flushed, the
protocol's connection_lost() method will (eventually) be called
with None as its argument.
"""
self._closed = True
self._ssl_protocol._start_shutdown()
# On Python 3.3 and older, objects with a destructor that are part of a
# reference cycle are never destroyed. That's no longer the case on
# Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if not self._closed:
warnings.warn("unclosed transport %r" % self, ResourceWarning)
self.close()
def pause_reading(self):
"""Pause the receiving end.
No data will be passed to the protocol's data_received()
method until resume_reading() is called.
"""
self._ssl_protocol._transport.pause_reading()
def resume_reading(self):
"""Resume the receiving end.
Data received will once again be passed to the protocol's
data_received() method.
"""
self._ssl_protocol._transport.resume_reading()
def set_write_buffer_limits(self, high=None, low=None):
"""Set the high- and low-water limits for write flow control.
These two values control when to call the protocol's
pause_writing() and resume_writing() methods. If specified,
the low-water limit must be less than or equal to the
high-water limit. Neither value can be negative.
The defaults are implementation-specific. If only the
high-water limit is given, the low-water limit defaults to an
implementation-specific value less than or equal to the
high-water limit. Setting high to zero forces low to zero as
well, and causes pause_writing() to be called whenever the
buffer becomes non-empty. Setting low to zero causes
resume_writing() to be called only once the buffer is empty.
Use of zero for either limit is generally sub-optimal as it
reduces opportunities for doing I/O and computation
concurrently.
"""
self._ssl_protocol._transport.set_write_buffer_limits(high, low)
def get_write_buffer_size(self):
"""Return the current size of the write buffer."""
return self._ssl_protocol._transport.get_write_buffer_size()
def write(self, data):
"""Write some data bytes to the transport.
This does not block; it buffers the data and arranges for it
to be sent out asynchronously.
"""
if not isinstance(data, (bytes, bytearray, memoryview)):
raise TypeError("data: expecting a bytes-like instance, got {!r}"
.format(type(data).__name__))
if not data:
return
self._ssl_protocol._write_appdata(data)
def can_write_eof(self):
"""Return True if this transport supports write_eof(), False if not."""
return False
def abort(self):
"""Close the transport immediately.
Buffered data will be lost. No more data will be received.
The protocol's connection_lost() method will (eventually) be
called with None as its argument.
"""
self._ssl_protocol._abort()
class SSLProtocol(protocols.Protocol):
"""SSL protocol.
Implementation of SSL on top of a socket using incoming and outgoing
buffers which are ssl.MemoryBIO objects.
"""
def __init__(self, loop, app_protocol, sslcontext, waiter,
server_side=False, server_hostname=None):
if ssl is None:
raise RuntimeError('stdlib ssl module not available')
if not sslcontext:
sslcontext = _create_transport_context(server_side, server_hostname)
self._server_side = server_side
if server_hostname and not server_side:
self._server_hostname = server_hostname
else:
self._server_hostname = None
self._sslcontext = sslcontext
# SSL-specific extra info. More info is set when the handshake
# completes.
self._extra = dict(sslcontext=sslcontext)
# App data write buffering
self._write_backlog = collections.deque()
self._write_buffer_size = 0
self._waiter = waiter
self._loop = loop
self._app_protocol = app_protocol
self._app_transport = _SSLProtocolTransport(self._loop,
self, self._app_protocol)
# _SSLPipe instance (None until the connection is made)
self._sslpipe = None
self._session_established = False
self._in_handshake = False
self._in_shutdown = False
# transport, ex: SelectorSocketTransport
self._transport = None
def _wakeup_waiter(self, exc=None):
if self._waiter is None:
return
if not self._waiter.cancelled():
if exc is not None:
self._waiter.set_exception(exc)
else:
self._waiter.set_result(None)
self._waiter = None
def connection_made(self, transport):
"""Called when the low-level connection is made.
Start the SSL handshake.
"""
self._transport = transport
self._sslpipe = _SSLPipe(self._sslcontext,
self._server_side,
self._server_hostname)
self._start_handshake()
def connection_lost(self, exc):
"""Called when the low-level connection is lost or closed.
The argument is an exception object or None (the latter
meaning a regular EOF was received or the connection was
aborted or closed).
"""
if self._session_established:
self._session_established = False
self._loop.call_soon(self._app_protocol.connection_lost, exc)
self._transport = None
self._app_transport = None
def pause_writing(self):
"""Called when the low-level transport's buffer goes over
the high-water mark.
"""
self._app_protocol.pause_writing()
def resume_writing(self):
"""Called when the low-level transport's buffer drains below
the low-water mark.
"""
self._app_protocol.resume_writing()
def data_received(self, data):
"""Called when some SSL data is received.
The argument is a bytes object.
"""
try:
ssldata, appdata = self._sslpipe.feed_ssldata(data)
except ssl.SSLError as e:
if self._loop.get_debug():
logger.warning('%r: SSL error %s (reason %s)',
self, e.errno, e.reason)
self._abort()
return
for chunk in ssldata:
self._transport.write(chunk)
for chunk in appdata:
if chunk:
self._app_protocol.data_received(chunk)
else:
self._start_shutdown()
break
def eof_received(self):
"""Called when the other end of the low-level stream
is half-closed.
If this returns a false value (including None), the transport
will close itself. If it returns a true value, closing the
transport is up to the protocol.
"""
try:
if self._loop.get_debug():
logger.debug("%r received EOF", self)
self._wakeup_waiter(ConnectionResetError)
if not self._in_handshake:
keep_open = self._app_protocol.eof_received()
if keep_open:
logger.warning('returning true from eof_received() '
'has no effect when using ssl')
finally:
self._transport.close()
def _get_extra_info(self, name, default=None):
if name in self._extra:
return self._extra[name]
else:
return self._transport.get_extra_info(name, default)
def _start_shutdown(self):
if self._in_shutdown:
return
self._in_shutdown = True
self._write_appdata(b'')
def _write_appdata(self, data):
self._write_backlog.append((data, 0))
self._write_buffer_size += len(data)
self._process_write_backlog()
def _start_handshake(self):
if self._loop.get_debug():
logger.debug("%r starts SSL handshake", self)
self._handshake_start_time = self._loop.time()
else:
self._handshake_start_time = None
self._in_handshake = True
# (b'', 1) is a special value in _process_write_backlog() to do
# the SSL handshake
self._write_backlog.append((b'', 1))
self._loop.call_soon(self._process_write_backlog)
def _on_handshake_complete(self, handshake_exc):
self._in_handshake = False
sslobj = self._sslpipe.ssl_object
try:
if handshake_exc is not None:
raise handshake_exc
peercert = sslobj.getpeercert()
if not hasattr(self._sslcontext, 'check_hostname'):
# Verify hostname if requested, Python 3.4+ uses check_hostname
# and checks the hostname in do_handshake()
if (self._server_hostname
and self._sslcontext.verify_mode != ssl.CERT_NONE):
ssl.match_hostname(peercert, self._server_hostname)
except BaseException as exc:
if self._loop.get_debug():
if isinstance(exc, ssl.CertificateError):
logger.warning("%r: SSL handshake failed "
"on verifying the certificate",
self, exc_info=True)
else:
logger.warning("%r: SSL handshake failed",
self, exc_info=True)
self._transport.close()
if isinstance(exc, Exception):
self._wakeup_waiter(exc)
return
else:
raise
if self._loop.get_debug():
dt = self._loop.time() - self._handshake_start_time
logger.debug("%r: SSL handshake took %.1f ms", self, dt * 1e3)
# Add extra info that becomes available after handshake.
self._extra.update(peercert=peercert,
cipher=sslobj.cipher(),
compression=sslobj.compression(),
ssl_object=sslobj,
)
self._app_protocol.connection_made(self._app_transport)
self._wakeup_waiter()
self._session_established = True
# In case transport.write() was already called, don't call
# _process_write_backlog() immediately, but schedule it:
# _on_handshake_complete() can be called indirectly from
# _process_write_backlog(), and _process_write_backlog() is not
# reentrant.
self._loop.call_soon(self._process_write_backlog)
def _process_write_backlog(self):
# Try to make progress on the write backlog.
if self._transport is None:
return
try:
for i in range(len(self._write_backlog)):
data, offset = self._write_backlog[0]
if data:
ssldata, offset = self._sslpipe.feed_appdata(data, offset)
elif offset:
ssldata = self._sslpipe.do_handshake(
self._on_handshake_complete)
offset = 1
else:
ssldata = self._sslpipe.shutdown(self._finalize)
offset = 1
for chunk in ssldata:
self._transport.write(chunk)
if offset < len(data):
self._write_backlog[0] = (data, offset)
# A short write means that a write is blocked on a read.
# We need to enable reading if it is paused!
assert self._sslpipe.need_ssldata
if self._transport._paused:
self._transport.resume_reading()
break
# An entire chunk from the backlog was processed. We can
# delete it and reduce the outstanding buffer size.
del self._write_backlog[0]
self._write_buffer_size -= len(data)
except BaseException as exc:
if self._in_handshake:
# BaseExceptions will be re-raised in _on_handshake_complete.
self._on_handshake_complete(exc)
else:
self._fatal_error(exc, 'Fatal error on SSL transport')
if not isinstance(exc, Exception):
# BaseException
raise
def _fatal_error(self, exc, message='Fatal error on transport'):
# Should be called from exception handler only.
if isinstance(exc, (BrokenPipeError, ConnectionResetError)):
if self._loop.get_debug():
logger.debug("%r: %s", self, message, exc_info=True)
else:
self._loop.call_exception_handler({
'message': message,
'exception': exc,
'transport': self._transport,
'protocol': self,
})
if self._transport:
self._transport._force_close(exc)
def _finalize(self):
if self._transport is not None:
self._transport.close()
def _abort(self):
if self._transport is not None:
try:
self._transport.abort()
finally:
self._finalize()
"""Stream-related things."""
__all__ = ['StreamReader', 'StreamWriter', 'StreamReaderProtocol',
'open_connection', 'start_server',
'IncompleteReadError',
'LimitOverrunError',
]
import socket
if hasattr(socket, 'AF_UNIX'):
__all__.extend(['open_unix_connection', 'start_unix_server'])
from . import coroutines
from . import compat
from . import events
from . import futures
from . import protocols
from .coroutines import coroutine
from .log import logger
_DEFAULT_LIMIT = 2**16
class IncompleteReadError(EOFError):
"""
Incomplete read error. Attributes:
- partial: the bytes read before the end of the stream was reached
- expected: total number of expected bytes (or None if unknown)
"""
def __init__(self, partial, expected):
super().__init__("%d bytes read on a total of %r expected bytes"
% (len(partial), expected))
self.partial = partial
self.expected = expected
class LimitOverrunError(Exception):
"""Reached buffer limit while looking for the separator.
Attributes:
- message: error message
- consumed: total number of bytes that should be consumed
"""
def __init__(self, message, consumed):
super().__init__(message)
self.message = message
self.consumed = consumed
@coroutine
def open_connection(host=None, port=None, *,
loop=None, limit=_DEFAULT_LIMIT, **kwds):
"""A wrapper for create_connection() returning a (reader, writer) pair.
The reader returned is a StreamReader instance; the writer is a
StreamWriter instance.
The arguments are all the usual arguments to create_connection()
except protocol_factory; most common are positional host and port,
with various optional keyword arguments following.
Additional optional keyword arguments are loop (to set the event loop
instance to use) and limit (to set the buffer limit passed to the
StreamReader).
(If you want to customize the StreamReader and/or
StreamReaderProtocol classes, just copy the code -- there's
really nothing special here except some convenience.)
"""
if loop is None:
loop = events.get_event_loop()
reader = StreamReader(limit=limit, loop=loop)
protocol = StreamReaderProtocol(reader, loop=loop)
transport, _ = yield from loop.create_connection(
lambda: protocol, host, port, **kwds)
writer = StreamWriter(transport, protocol, reader, loop)
return reader, writer
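# --- Illustrative usage sketch (editorial addition, not part of the module).
# A minimal HTTP/1.0 status-line fetch with open_connection(), in the same
# yield-from coroutine style as this module; host and request are examples.
import asyncio

@asyncio.coroutine
def _example_fetch_status(host='example.com', port=80):
    reader, writer = yield from open_connection(host, port)
    writer.write(b'GET / HTTP/1.0\r\nHost: ' + host.encode() + b'\r\n\r\n')
    status = yield from reader.readline()
    writer.close()
    return status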
@coroutine
def start_server(client_connected_cb, host=None, port=None, *,
loop=None, limit=_DEFAULT_LIMIT, **kwds):
"""Start a socket server, call back for each client connected.
The first parameter, `client_connected_cb`, takes two parameters:
client_reader, client_writer. client_reader is a StreamReader
object, while client_writer is a StreamWriter object. This
parameter can either be a plain callback function or a coroutine;
if it is a coroutine, it will be automatically converted into a
Task.
The rest of the arguments are all the usual arguments to
loop.create_server() except protocol_factory; most common are
positional host and port, with various optional keyword arguments
following. The return value is the same as loop.create_server().
Additional optional keyword arguments are loop (to set the event loop
instance to use) and limit (to set the buffer limit passed to the
StreamReader).
The return value is the same as loop.create_server(), i.e. a
Server object which can be used to stop the service.
"""
if loop is None:
loop = events.get_event_loop()
def factory():
reader = StreamReader(limit=limit, loop=loop)
protocol = StreamReaderProtocol(reader, client_connected_cb,
loop=loop)
return protocol
return (yield from loop.create_server(factory, host, port, **kwds))
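# --- Illustrative usage sketch (editorial addition, not part of the module).
# A line-echo server wired through start_server(); the client callback may be
# a plain function or, as here, a coroutine (it is then wrapped in a Task).
import asyncio

@asyncio.coroutine
def _example_handle_echo(reader, writer):
    line = yield from reader.readline()
    writer.write(line)
    yield from writer.drain()
    writer.close()

# To run (sketch): loop.run_until_complete(
#     start_server(_example_handle_echo, '127.0.0.1', 8888, loop=loop))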
if hasattr(socket, 'AF_UNIX'):
# UNIX Domain Sockets are supported on this platform
@coroutine
def open_unix_connection(path=None, *,
loop=None, limit=_DEFAULT_LIMIT, **kwds):
"""Similar to `open_connection` but works with UNIX Domain Sockets."""
if loop is None:
loop = events.get_event_loop()
reader = StreamReader(limit=limit, loop=loop)
protocol = StreamReaderProtocol(reader, loop=loop)
transport, _ = yield from loop.create_unix_connection(
lambda: protocol, path, **kwds)
writer = StreamWriter(transport, protocol, reader, loop)
return reader, writer
@coroutine
def start_unix_server(client_connected_cb, path=None, *,
loop=None, limit=_DEFAULT_LIMIT, **kwds):
"""Similar to `start_server` but works with UNIX Domain Sockets."""
if loop is None:
loop = events.get_event_loop()
def factory():
reader = StreamReader(limit=limit, loop=loop)
protocol = StreamReaderProtocol(reader, client_connected_cb,
loop=loop)
return protocol
return (yield from loop.create_unix_server(factory, path, **kwds))
class FlowControlMixin(protocols.Protocol):
"""Reusable flow control logic for StreamWriter.drain().
This implements the protocol methods pause_writing(),
resume_writing() and connection_lost(). If the subclass overrides
these it must call the super methods.
StreamWriter.drain() must wait for the _drain_helper() coroutine.
"""
def __init__(self, loop=None):
if loop is None:
self._loop = events.get_event_loop()
else:
self._loop = loop
self._paused = False
self._drain_waiter = None
self._connection_lost = False
def pause_writing(self):
assert not self._paused
self._paused = True
if self._loop.get_debug():
logger.debug("%r pauses writing", self)
def resume_writing(self):
assert self._paused
self._paused = False
if self._loop.get_debug():
logger.debug("%r resumes writing", self)
waiter = self._drain_waiter
if waiter is not None:
self._drain_waiter = None
if not waiter.done():
waiter.set_result(None)
def connection_lost(self, exc):
self._connection_lost = True
# Wake up the writer if currently paused.
if not self._paused:
return
waiter = self._drain_waiter
if waiter is None:
return
self._drain_waiter = None
if waiter.done():
return
if exc is None:
waiter.set_result(None)
else:
waiter.set_exception(exc)
@coroutine
def _drain_helper(self):
if self._connection_lost:
raise ConnectionResetError('Connection lost')
if not self._paused:
return
waiter = self._drain_waiter
assert waiter is None or waiter.cancelled()
waiter = futures.Future(loop=self._loop)
self._drain_waiter = waiter
yield from waiter
class StreamReaderProtocol(FlowControlMixin, protocols.Protocol):
"""Helper class to adapt between Protocol and StreamReader.
(This is a helper class instead of making StreamReader itself a
Protocol subclass, because the StreamReader has other potential
uses, and to prevent the user of the StreamReader from accidentally
calling inappropriate methods of the protocol.)
"""
def __init__(self, stream_reader, client_connected_cb=None, loop=None):
super().__init__(loop=loop)
self._stream_reader = stream_reader
self._stream_writer = None
self._client_connected_cb = client_connected_cb
def connection_made(self, transport):
self._stream_reader.set_transport(transport)
if self._client_connected_cb is not None:
self._stream_writer = StreamWriter(transport, self,
self._stream_reader,
self._loop)
res = self._client_connected_cb(self._stream_reader,
self._stream_writer)
if coroutines.iscoroutine(res):
self._loop.create_task(res)
def connection_lost(self, exc):
if exc is None:
self._stream_reader.feed_eof()
else:
self._stream_reader.set_exception(exc)
super().connection_lost(exc)
def data_received(self, data):
self._stream_reader.feed_data(data)
def eof_received(self):
self._stream_reader.feed_eof()
return True
class StreamWriter:
"""Wraps a Transport.
This exposes write(), writelines(), [can_]write_eof(),
get_extra_info() and close(). It adds drain() which returns an
optional Future on which you can wait for flow control. It also
adds a transport property which references the Transport
directly.
"""
def __init__(self, transport, protocol, reader, loop):
self._transport = transport
self._protocol = protocol
# drain() expects that the reader has an exception() method
assert reader is None or isinstance(reader, StreamReader)
self._reader = reader
self._loop = loop
def __repr__(self):
info = [self.__class__.__name__, 'transport=%r' % self._transport]
if self._reader is not None:
info.append('reader=%r' % self._reader)
return '<%s>' % ' '.join(info)
@property
def transport(self):
return self._transport
def write(self, data):
self._transport.write(data)
def writelines(self, data):
self._transport.writelines(data)
def write_eof(self):
return self._transport.write_eof()
def can_write_eof(self):
return self._transport.can_write_eof()
def close(self):
return self._transport.close()
def get_extra_info(self, name, default=None):
return self._transport.get_extra_info(name, default)
@coroutine
def drain(self):
"""Flush the write buffer.
The intended use is to write
w.write(data)
yield from w.drain()
"""
if self._reader is not None:
exc = self._reader.exception()
if exc is not None:
raise exc
if self._transport is not None:
if self._transport.is_closing():
# Yield to the event loop so connection_lost() may be
# called. Without this, _drain_helper() would return
# immediately, and code that calls
# write(...); yield from drain()
# in a loop would never call connection_lost(), so it
# would not see an error when the socket is closed.
yield
yield from self._protocol._drain_helper()
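# --- Illustrative usage sketch (editorial addition, not part of the module).
# The write()/drain() pairing is how coroutine code gets backpressure:
# drain() only blocks while the transport is over its high-water mark.
import asyncio

@asyncio.coroutine
def _example_write_with_backpressure(writer, chunks):
    for chunk in chunks:
        writer.write(chunk)
        yield from writer.drain()  # waits only while the buffer is too full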
class StreamReader:
def __init__(self, limit=_DEFAULT_LIMIT, loop=None):
# The line length limit is a security feature;
# it also doubles as half the buffer limit.
if limit <= 0:
raise ValueError('Limit cannot be <= 0')
self._limit = limit
if loop is None:
self._loop = events.get_event_loop()
else:
self._loop = loop
self._buffer = bytearray()
self._eof = False # Whether we're done.
self._waiter = None # A future used by _wait_for_data()
self._exception = None
self._transport = None
self._paused = False
def __repr__(self):
info = ['StreamReader']
if self._buffer:
info.append('%d bytes' % len(self._buffer))
if self._eof:
info.append('eof')
if self._limit != _DEFAULT_LIMIT:
info.append('l=%d' % self._limit)
if self._waiter:
info.append('w=%r' % self._waiter)
if self._exception:
info.append('e=%r' % self._exception)
if self._transport:
info.append('t=%r' % self._transport)
if self._paused:
info.append('paused')
return '<%s>' % ' '.join(info)
def exception(self):
return self._exception
def set_exception(self, exc):
self._exception = exc
waiter = self._waiter
if waiter is not None:
self._waiter = None
if not waiter.cancelled():
waiter.set_exception(exc)
def _wakeup_waiter(self):
"""Wakeup read*() functions waiting for data or EOF."""
waiter = self._waiter
if waiter is not None:
self._waiter = None
if not waiter.cancelled():
waiter.set_result(None)
def set_transport(self, transport):
assert self._transport is None, 'Transport already set'
self._transport = transport
def _maybe_resume_transport(self):
if self._paused and len(self._buffer) <= self._limit:
self._paused = False
self._transport.resume_reading()
def feed_eof(self):
self._eof = True
self._wakeup_waiter()
def at_eof(self):
"""Return True if the buffer is empty and 'feed_eof' was called."""
return self._eof and not self._buffer
def feed_data(self, data):
assert not self._eof, 'feed_data after feed_eof'
if not data:
return
self._buffer.extend(data)
self._wakeup_waiter()
if (self._transport is not None and
not self._paused and
len(self._buffer) > 2*self._limit):
try:
self._transport.pause_reading()
except NotImplementedError:
# The transport can't be paused.
# We'll just have to buffer all data.
# Forget the transport so we don't keep trying.
self._transport = None
else:
self._paused = True
@coroutine
def _wait_for_data(self, func_name):
"""Wait until feed_data() or feed_eof() is called.
If stream was paused, automatically resume it.
"""
# StreamReader uses a future to link the protocol feed_data() method
# to a read coroutine. Running two read coroutines at the same time
# would have unexpected behaviour: it would not be possible to know
# which coroutine would get the next data.
if self._waiter is not None:
raise RuntimeError('%s() called while another coroutine is '
'already waiting for incoming data' % func_name)
assert not self._eof, '_wait_for_data after EOF'
# Waiting for data while paused would deadlock, so prevent it.
if self._paused:
self._paused = False
self._transport.resume_reading()
self._waiter = futures.Future(loop=self._loop)
try:
yield from self._waiter
finally:
self._waiter = None
@coroutine
def readline(self):
"""Read chunk of data from the stream until newline (b'\n') is found.
On success, return chunk that ends with newline. If only partial
line can be read due to EOF, return incomplete line without
terminating newline. When EOF was reached while no bytes read, empty
bytes object is returned.
If limit is reached, ValueError will be raised. In that case, if
newline was found, complete line including newline will be removed
from internal buffer. Else, internal buffer will be cleared. Limit is
compared against part of the line without newline.
If stream was paused, this function will automatically resume it if
needed.
"""
sep = b'\n'
seplen = len(sep)
try:
line = yield from self.readuntil(sep)
except IncompleteReadError as e:
return e.partial
except LimitOverrunError as e:
if self._buffer.startswith(sep, e.consumed):
del self._buffer[:e.consumed + seplen]
else:
self._buffer.clear()
self._maybe_resume_transport()
raise ValueError(e.args[0])
return line
@coroutine
def readuntil(self, separator=b'\n'):
"""Read chunk of data from the stream until `separator` is found.
On success, chunk and its separator will be removed from internal buffer
(i.e. consumed). Returned chunk will include separator at the end.
Configured stream limit is used to check result. Limit means maximal
length of chunk that can be returned, not counting the separator.
If EOF occurs and complete separator still not found,
IncompleteReadError(<partial data>, None) will be raised and internal
buffer becomes empty. This partial data may contain a partial separator.
If chunk cannot be read due to overlimit, LimitOverrunError will be raised
and data will be left in internal buffer, so it can be read again, in
some different way.
If stream was paused, this function will automatically resume it if
needed.
"""
seplen = len(separator)
if seplen == 0:
raise ValueError('Separator should be at least a one-byte string')
if self._exception is not None:
raise self._exception
# Consume the whole buffer except the last bytes, whose length is
# one less than seplen. Let's check corner cases with
# separator='SEPARATOR':
# * we have received an almost complete separator (without its last
#   byte), i.e. buffer='some textSEPARATO'. In this case we
#   can safely consume len(separator) - 1 bytes.
# * the last byte of the buffer is the first byte of the separator,
#   i.e. buffer='abcdefghijklmnopqrS'. We could safely consume
#   everything except that last byte, but that would require
#   analyzing the buffer bytes that match a partial separator.
#   This is slow and/or requires an FSM. For this case our
#   implementation is not optimal, since it requires rescanning
#   data that is known not to belong to the separator. In the
#   real world the separator will not be long enough to cause
#   performance problems. Even when reading MIME-encoded
#   messages :)
# `offset` is the number of bytes from the beginning of the buffer
# up to which there is no occurrence of `separator`.
offset = 0
# Loop until we find `separator` in the buffer, exceed the buffer size,
# or an EOF has happened.
while True:
buflen = len(self._buffer)
# Check if we now have enough data in the buffer for `separator` to
# fit.
if buflen - offset >= seplen:
isep = self._buffer.find(separator, offset)
if isep != -1:
# `separator` is in the buffer. `isep` will be used later to
# retrieve the data.
break
# see upper comment for explanation.
offset = buflen + 1 - seplen
if offset > self._limit:
raise LimitOverrunError('Separator is not found, and chunk exceeds the limit', offset)
# A complete message (with the full separator) may be present in the
# buffer even when the EOF flag is set. This may happen when the last
# chunk adds the data that completes the separator. That's why we
# check for EOF *after* inspecting the buffer.
if self._eof:
chunk = bytes(self._buffer)
self._buffer.clear()
raise IncompleteReadError(chunk, None)
# _wait_for_data() will resume reading if stream was paused.
yield from self._wait_for_data('readuntil')
if isep > self._limit:
raise LimitOverrunError('Separator is found, but chunk is longer than the limit', isep)
chunk = self._buffer[:isep + seplen]
del self._buffer[:isep + seplen]
self._maybe_resume_transport()
return bytes(chunk)
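# --- Illustrative usage sketch (editorial addition, not part of the module).
# readuntil() with a custom separator; the returned chunk includes the
# separator, which the caller strips here.
import asyncio

@asyncio.coroutine
def _example_read_crlf_record(reader):
    record = yield from reader.readuntil(b'\r\n')
    return record[:-2]  # drop the trailing CRLF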
@coroutine
def read(self, n=-1):
"""Read up to `n` bytes from the stream.
If n is not provided, or set to -1, read until EOF and return all read
bytes. If EOF was received and the internal buffer is empty, return
an empty bytes object.
If n is zero, return an empty bytes object immediately.
If n is positive, this function tries to read n bytes, and may return
fewer bytes than requested, but at least one byte. If EOF was received
before any byte is read, this function returns an empty bytes object.
The returned value is not limited by the limit configured at stream
creation.
If the stream was paused, this function will automatically resume it if
needed.
"""
if self._exception is not None:
raise self._exception
if n == 0:
return b''
if n < 0:
# This used to just loop creating a new waiter hoping to
# collect everything in self._buffer, but that would
# deadlock if the subprocess sends more than self.limit
# bytes. So just call self.read(self._limit) until EOF.
blocks = []
while True:
block = yield from self.read(self._limit)
if not block:
break
blocks.append(block)
return b''.join(blocks)
if not self._buffer and not self._eof:
yield from self._wait_for_data('read')
# This will work right even if buffer is less than n bytes
data = bytes(self._buffer[:n])
del self._buffer[:n]
self._maybe_resume_transport()
return data
@coroutine
def readexactly(self, n):
"""Read exactly `n` bytes.
Raise an `IncompleteReadError` if EOF is reached before `n` bytes can be
read. The `IncompleteReadError.partial` attribute of the exception will
contain the partial read bytes.
If n is zero, return an empty bytes object.
The returned value is not limited by the limit configured at stream
creation.
If the stream was paused, this function will automatically resume it if
needed.
"""
if n < 0:
raise ValueError('readexactly size can not be less than zero')
if self._exception is not None:
raise self._exception
if n == 0:
return b''
# There used to be "optimized" code here. It created its own
# Future and waited until self._buffer had at least the n
# bytes, then called read(n). Unfortunately, this could pause
# the transport if the argument was larger than the pause
# limit (which is twice self._limit). So now we just read()
# into a local buffer.
blocks = []
while n > 0:
block = yield from self.read(n)
if not block:
partial = b''.join(blocks)
raise IncompleteReadError(partial, len(partial) + n)
blocks.append(block)
n -= len(block)
assert n == 0
return b''.join(blocks)
if compat.PY35:
@coroutine
def __aiter__(self):
return self
@coroutine
def __anext__(self):
val = yield from self.readline()
if val == b'':
raise StopAsyncIteration
return val
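# ---------------------------------------------------------------------------
# Editor's note: an illustrative sketch, not part of the original module.
# It shows how the StreamReader API above is typically driven; the helper
# name `_demo_read_frame` and the length-prefixed framing it parses are
# assumptions made for demonstration only.
@coroutine
def _demo_read_frame(reader):
    # Read up to and including the separator; raises IncompleteReadError on
    # EOF and LimitOverrunError if the line exceeds the configured limit.
    header = yield from reader.readuntil(b'\r\n')
    size = int(header.strip())
    # Read exactly `size` bytes; raises IncompleteReadError on a short read.
    return (yield from reader.readexactly(size))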
__all__ = ['create_subprocess_exec', 'create_subprocess_shell']
import subprocess
from . import events
from . import protocols
from . import streams
from . import tasks
from .coroutines import coroutine
from .log import logger
PIPE = subprocess.PIPE
STDOUT = subprocess.STDOUT
DEVNULL = subprocess.DEVNULL
class SubprocessStreamProtocol(streams.FlowControlMixin,
protocols.SubprocessProtocol):
"""Like StreamReaderProtocol, but for a subprocess."""
def __init__(self, limit, loop):
super().__init__(loop=loop)
self._limit = limit
self.stdin = self.stdout = self.stderr = None
self._transport = None
def __repr__(self):
info = [self.__class__.__name__]
if self.stdin is not None:
info.append('stdin=%r' % self.stdin)
if self.stdout is not None:
info.append('stdout=%r' % self.stdout)
if self.stderr is not None:
info.append('stderr=%r' % self.stderr)
return '<%s>' % ' '.join(info)
def connection_made(self, transport):
self._transport = transport
stdout_transport = transport.get_pipe_transport(1)
if stdout_transport is not None:
self.stdout = streams.StreamReader(limit=self._limit,
loop=self._loop)
self.stdout.set_transport(stdout_transport)
stderr_transport = transport.get_pipe_transport(2)
if stderr_transport is not None:
self.stderr = streams.StreamReader(limit=self._limit,
loop=self._loop)
self.stderr.set_transport(stderr_transport)
stdin_transport = transport.get_pipe_transport(0)
if stdin_transport is not None:
self.stdin = streams.StreamWriter(stdin_transport,
protocol=self,
reader=None,
loop=self._loop)
def pipe_data_received(self, fd, data):
if fd == 1:
reader = self.stdout
elif fd == 2:
reader = self.stderr
else:
reader = None
if reader is not None:
reader.feed_data(data)
def pipe_connection_lost(self, fd, exc):
if fd == 0:
pipe = self.stdin
if pipe is not None:
pipe.close()
self.connection_lost(exc)
return
if fd == 1:
reader = self.stdout
elif fd == 2:
reader = self.stderr
else:
reader = None
        if reader is not None:
if exc is None:
reader.feed_eof()
else:
reader.set_exception(exc)
def process_exited(self):
self._transport.close()
self._transport = None
class Process:
def __init__(self, transport, protocol, loop):
self._transport = transport
self._protocol = protocol
self._loop = loop
self.stdin = protocol.stdin
self.stdout = protocol.stdout
self.stderr = protocol.stderr
self.pid = transport.get_pid()
def __repr__(self):
return '<%s %s>' % (self.__class__.__name__, self.pid)
@property
def returncode(self):
return self._transport.get_returncode()
@coroutine
def wait(self):
"""Wait until the process exit and return the process return code.
This method is a coroutine."""
return (yield from self._transport._wait())
def send_signal(self, signal):
self._transport.send_signal(signal)
def terminate(self):
self._transport.terminate()
def kill(self):
self._transport.kill()
@coroutine
def _feed_stdin(self, input):
debug = self._loop.get_debug()
self.stdin.write(input)
if debug:
logger.debug('%r communicate: feed stdin (%s bytes)',
self, len(input))
try:
yield from self.stdin.drain()
except (BrokenPipeError, ConnectionResetError) as exc:
# communicate() ignores BrokenPipeError and ConnectionResetError
if debug:
logger.debug('%r communicate: stdin got %r', self, exc)
if debug:
logger.debug('%r communicate: close stdin', self)
self.stdin.close()
@coroutine
def _noop(self):
return None
@coroutine
def _read_stream(self, fd):
transport = self._transport.get_pipe_transport(fd)
if fd == 2:
stream = self.stderr
else:
assert fd == 1
stream = self.stdout
if self._loop.get_debug():
name = 'stdout' if fd == 1 else 'stderr'
logger.debug('%r communicate: read %s', self, name)
output = yield from stream.read()
if self._loop.get_debug():
name = 'stdout' if fd == 1 else 'stderr'
logger.debug('%r communicate: close %s', self, name)
transport.close()
return output
@coroutine
def communicate(self, input=None):
if input:
stdin = self._feed_stdin(input)
else:
stdin = self._noop()
if self.stdout is not None:
stdout = self._read_stream(1)
else:
stdout = self._noop()
if self.stderr is not None:
stderr = self._read_stream(2)
else:
stderr = self._noop()
stdin, stdout, stderr = yield from tasks.gather(stdin, stdout, stderr,
loop=self._loop)
yield from self.wait()
return (stdout, stderr)
@coroutine
def create_subprocess_shell(cmd, stdin=None, stdout=None, stderr=None,
loop=None, limit=streams._DEFAULT_LIMIT, **kwds):
if loop is None:
loop = events.get_event_loop()
protocol_factory = lambda: SubprocessStreamProtocol(limit=limit,
loop=loop)
transport, protocol = yield from loop.subprocess_shell(
protocol_factory,
cmd, stdin=stdin, stdout=stdout,
stderr=stderr, **kwds)
return Process(transport, protocol, loop)
@coroutine
def create_subprocess_exec(program, *args, stdin=None, stdout=None,
stderr=None, loop=None,
limit=streams._DEFAULT_LIMIT, **kwds):
if loop is None:
loop = events.get_event_loop()
protocol_factory = lambda: SubprocessStreamProtocol(limit=limit,
loop=loop)
transport, protocol = yield from loop.subprocess_exec(
protocol_factory,
program, *args,
stdin=stdin, stdout=stdout,
stderr=stderr, **kwds)
return Process(transport, protocol, loop)
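# ---------------------------------------------------------------------------
# Editor's note: an illustrative sketch, not part of the original module.
# It shows the high-level helpers above in generator-coroutine style; the
# function name and the 'echo' command line are assumptions for the demo.
@coroutine
def _demo_run_echo(loop=None):
    proc = yield from create_subprocess_exec(
        'echo', 'hello', stdout=PIPE, loop=loop)
    # communicate() feeds stdin (none here), reads stdout/stderr to EOF in
    # parallel via tasks.gather(), then waits for the process to exit.
    stdout, _ = yield from proc.communicate()
    return proc.returncode, stdout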
"""Support for tasks, coroutines and the scheduler."""
__all__ = ['Task',
'FIRST_COMPLETED', 'FIRST_EXCEPTION', 'ALL_COMPLETED',
'wait', 'wait_for', 'as_completed', 'sleep',
'gather', 'shield', 'ensure_future', 'run_coroutine_threadsafe',
'timeout',
]
import concurrent.futures
import functools
import inspect
import linecache
import traceback
import warnings
import weakref
from . import compat
from . import coroutines
from . import events
from . import futures
from .coroutines import coroutine
class Task(futures.Future):
"""A coroutine wrapped in a Future."""
    # An important invariant maintained while a Task is not done:
#
# - Either _fut_waiter is None, and _step() is scheduled;
# - or _fut_waiter is some Future, and _step() is *not* scheduled.
#
# The only transition from the latter to the former is through
# _wakeup(). When _fut_waiter is not None, one of its callbacks
# must be _wakeup().
# Weak set containing all tasks alive.
_all_tasks = weakref.WeakSet()
# Dictionary containing tasks that are currently active in
# all running event loops. {EventLoop: Task}
_current_tasks = {}
    # If False, don't log a message if the task is destroyed while its
    # status is still pending.
_log_destroy_pending = True
@classmethod
def current_task(cls, loop=None):
"""Return the currently running task in an event loop or None.
By default the current task for the current event loop is returned.
        None is returned when not called in the context of a Task.
"""
if loop is None:
loop = events.get_event_loop()
return cls._current_tasks.get(loop)
@classmethod
def all_tasks(cls, loop=None):
"""Return a set of all tasks for an event loop.
By default all tasks for the current event loop are returned.
"""
if loop is None:
loop = events.get_event_loop()
return {t for t in cls._all_tasks if t._loop is loop}
def __init__(self, coro, *, loop=None):
assert coroutines.iscoroutine(coro), repr(coro)
super().__init__(loop=loop)
if self._source_traceback:
del self._source_traceback[-1]
self._coro = coro
self._fut_waiter = None
self._must_cancel = False
self._loop.call_soon(self._step)
self.__class__._all_tasks.add(self)
# On Python 3.3 or older, objects with a destructor that are part of a
    # reference cycle are never destroyed. That's no longer the case on
    # Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if self._state == futures._PENDING and self._log_destroy_pending:
context = {
'task': self,
'message': 'Task was destroyed but it is pending!',
}
if self._source_traceback:
context['source_traceback'] = self._source_traceback
self._loop.call_exception_handler(context)
futures.Future.__del__(self)
def _repr_info(self):
info = super()._repr_info()
if self._must_cancel:
# replace status
info[0] = 'cancelling'
coro = coroutines._format_coroutine(self._coro)
info.insert(1, 'coro=<%s>' % coro)
if self._fut_waiter is not None:
info.insert(2, 'wait_for=%r' % self._fut_waiter)
return info
def get_stack(self, *, limit=None):
"""Return the list of stack frames for this task's coroutine.
If the coroutine is not done, this returns the stack where it is
suspended. If the coroutine has completed successfully or was
cancelled, this returns an empty list. If the coroutine was
terminated by an exception, this returns the list of traceback
frames.
The frames are always ordered from oldest to newest.
The optional limit gives the maximum number of frames to
return; by default all available frames are returned. Its
meaning differs depending on whether a stack or a traceback is
returned: the newest frames of a stack are returned, but the
oldest frames of a traceback are returned. (This matches the
behavior of the traceback module.)
For reasons beyond our control, only one stack frame is
returned for a suspended coroutine.
"""
frames = []
try:
# 'async def' coroutines
f = self._coro.cr_frame
except AttributeError:
f = self._coro.gi_frame
if f is not None:
while f is not None:
if limit is not None:
if limit <= 0:
break
limit -= 1
frames.append(f)
f = f.f_back
frames.reverse()
elif self._exception is not None:
tb = self._exception.__traceback__
while tb is not None:
if limit is not None:
if limit <= 0:
break
limit -= 1
frames.append(tb.tb_frame)
tb = tb.tb_next
return frames
def print_stack(self, *, limit=None, file=None):
"""Print the stack or traceback for this task's coroutine.
This produces output similar to that of the traceback module,
for the frames retrieved by get_stack(). The limit argument
is passed to get_stack(). The file argument is an I/O stream
to which the output is written; by default output is written
to sys.stderr.
"""
extracted_list = []
checked = set()
for f in self.get_stack(limit=limit):
lineno = f.f_lineno
co = f.f_code
filename = co.co_filename
name = co.co_name
if filename not in checked:
checked.add(filename)
linecache.checkcache(filename)
line = linecache.getline(filename, lineno, f.f_globals)
extracted_list.append((filename, lineno, name, line))
exc = self._exception
if not extracted_list:
print('No stack for %r' % self, file=file)
elif exc is not None:
print('Traceback for %r (most recent call last):' % self,
file=file)
else:
print('Stack for %r (most recent call last):' % self,
file=file)
traceback.print_list(extracted_list, file=file)
if exc is not None:
for line in traceback.format_exception_only(exc.__class__, exc):
print(line, file=file, end='')
def cancel(self):
"""Request that this task cancel itself.
This arranges for a CancelledError to be thrown into the
wrapped coroutine on the next cycle through the event loop.
The coroutine then has a chance to clean up or even deny
the request using try/except/finally.
Unlike Future.cancel, this does not guarantee that the
task will be cancelled: the exception might be caught and
acted upon, delaying cancellation of the task or preventing
cancellation completely. The task may also return a value or
raise a different exception.
Immediately after this method is called, Task.cancelled() will
not return True (unless the task was already cancelled). A
task will be marked as cancelled when the wrapped coroutine
terminates with a CancelledError exception (even if cancel()
was not called).
"""
if self.done():
return False
if self._fut_waiter is not None:
if self._fut_waiter.cancel():
# Leave self._fut_waiter; it may be a Task that
# catches and ignores the cancellation so we may have
# to cancel it again later.
return True
# It must be the case that self._step is already scheduled.
self._must_cancel = True
return True
def _step(self, exc=None):
assert not self.done(), \
'_step(): already done: {!r}, {!r}'.format(self, exc)
if self._must_cancel:
if not isinstance(exc, futures.CancelledError):
exc = futures.CancelledError()
self._must_cancel = False
coro = self._coro
self._fut_waiter = None
self.__class__._current_tasks[self._loop] = self
# Call either coro.throw(exc) or coro.send(None).
try:
if exc is None:
# We use the `send` method directly, because coroutines
# don't have `__iter__` and `__next__` methods.
result = coro.send(None)
else:
result = coro.throw(exc)
except StopIteration as exc:
self.set_result(exc.value)
except futures.CancelledError as exc:
super().cancel() # I.e., Future.cancel(self).
except Exception as exc:
self.set_exception(exc)
except BaseException as exc:
self.set_exception(exc)
raise
else:
if isinstance(result, futures.Future):
# Yielded Future must come from Future.__iter__().
if result._loop is not self._loop:
self._loop.call_soon(
self._step,
RuntimeError(
'Task {!r} got Future {!r} attached to a '
'different loop'.format(self, result)))
elif result._blocking:
result._blocking = False
result.add_done_callback(self._wakeup)
self._fut_waiter = result
if self._must_cancel:
if self._fut_waiter.cancel():
self._must_cancel = False
else:
self._loop.call_soon(
self._step,
RuntimeError(
'yield was used instead of yield from '
'in task {!r} with {!r}'.format(self, result)))
elif result is None:
# Bare yield relinquishes control for one event loop iteration.
self._loop.call_soon(self._step)
elif inspect.isgenerator(result):
# Yielding a generator is just wrong.
self._loop.call_soon(
self._step,
RuntimeError(
'yield was used instead of yield from for '
'generator in task {!r} with {}'.format(
self, result)))
else:
# Yielding something else is an error.
self._loop.call_soon(
self._step,
RuntimeError(
'Task got bad yield: {!r}'.format(result)))
finally:
self.__class__._current_tasks.pop(self._loop)
self = None # Needed to break cycles when an exception occurs.
def _wakeup(self, future):
try:
future.result()
except Exception as exc:
# This may also be a cancellation.
self._step(exc)
else:
# Don't pass the value of `future.result()` explicitly,
# as `Future.__iter__` and `Future.__await__` don't need it.
# If we call `_step(value, None)` instead of `_step()`,
# Python eval loop would use `.send(value)` method call,
# instead of `__next__()`, which is slower for futures
# that return non-generator iterators from their `__iter__`.
self._step()
self = None # Needed to break cycles when an exception occurs.
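# Editor's note: a small sketch (editor's addition, not original source) of
# the cancellation semantics documented in Task.cancel() above: cancellation
# is delivered as a CancelledError that the coroutine may catch, so a task
# can clean up, or even refuse the request and finish normally.
@coroutine
def _demo_stubborn():
    try:
        yield from sleep(3600)
    except futures.CancelledError:
        # Swallowing the error means the task is *not* marked cancelled;
        # it completes with this return value instead.
        return 'cleaned up'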
# wait() and as_completed() similar to those in PEP 3148.
FIRST_COMPLETED = concurrent.futures.FIRST_COMPLETED
FIRST_EXCEPTION = concurrent.futures.FIRST_EXCEPTION
ALL_COMPLETED = concurrent.futures.ALL_COMPLETED
@coroutine
def wait(fs, *, loop=None, timeout=None, return_when=ALL_COMPLETED):
"""Wait for the Futures and coroutines given by fs to complete.
The sequence futures must not be empty.
Coroutines will be wrapped in Tasks.
Returns two sets of Future: (done, pending).
Usage:
done, pending = yield from asyncio.wait(fs)
Note: This does not raise TimeoutError! Futures that aren't done
when the timeout occurs are returned in the second set.
"""
if isinstance(fs, futures.Future) or coroutines.iscoroutine(fs):
raise TypeError("expect a list of futures, not %s" % type(fs).__name__)
if not fs:
raise ValueError('Set of coroutines/Futures is empty.')
if return_when not in (FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED):
raise ValueError('Invalid return_when value: {}'.format(return_when))
if loop is None:
loop = events.get_event_loop()
fs = {ensure_future(f, loop=loop) for f in set(fs)}
return (yield from _wait(fs, timeout, return_when, loop))
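# Editor's note: illustrative usage of wait() (editor's addition). Note that
# wait() never raises TimeoutError; futures unfinished at the deadline are
# simply returned in the `pending` set.
@coroutine
def _demo_wait(coros, loop=None):
    done, pending = yield from wait(coros, loop=loop, timeout=1.0)
    for p in pending:
        p.cancel()  # a common pattern: give up on stragglers
    return done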
def _release_waiter(waiter, *args):
if not waiter.done():
waiter.set_result(None)
@coroutine
def wait_for(fut, timeout, *, loop=None):
"""Wait for the single Future or coroutine to complete, with timeout.
Coroutine will be wrapped in Task.
Returns result of the Future or coroutine. When a timeout occurs,
it cancels the task and raises TimeoutError. To avoid the task
cancellation, wrap it in shield().
If the wait is cancelled, the task is also cancelled.
This function is a coroutine.
"""
if loop is None:
loop = events.get_event_loop()
if timeout is None:
return (yield from fut)
waiter = futures.Future(loop=loop)
timeout_handle = loop.call_later(timeout, _release_waiter, waiter)
cb = functools.partial(_release_waiter, waiter)
fut = ensure_future(fut, loop=loop)
fut.add_done_callback(cb)
try:
# wait until the future completes or the timeout
try:
yield from waiter
except futures.CancelledError:
fut.remove_done_callback(cb)
fut.cancel()
raise
if fut.done():
return fut.result()
else:
fut.remove_done_callback(cb)
fut.cancel()
raise futures.TimeoutError()
finally:
timeout_handle.cancel()
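# Editor's note: a sketch of wait_for() semantics (editor's addition): on
# timeout the wrapped task is cancelled and TimeoutError is raised here.
@coroutine
def _demo_wait_for(coro, loop=None):
    try:
        return (yield from wait_for(coro, 0.5, loop=loop))
    except futures.TimeoutError:
        return None  # the inner task was cancelled on our behalf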
@coroutine
def _wait(fs, timeout, return_when, loop):
"""Internal helper for wait() and _wait_for().
The fs argument must be a collection of Futures.
"""
assert fs, 'Set of Futures is empty.'
waiter = futures.Future(loop=loop)
timeout_handle = None
if timeout is not None:
timeout_handle = loop.call_later(timeout, _release_waiter, waiter)
counter = len(fs)
def _on_completion(f):
nonlocal counter
counter -= 1
if (counter <= 0 or
return_when == FIRST_COMPLETED or
return_when == FIRST_EXCEPTION and (not f.cancelled() and
f.exception() is not None)):
if timeout_handle is not None:
timeout_handle.cancel()
if not waiter.done():
waiter.set_result(None)
for f in fs:
f.add_done_callback(_on_completion)
try:
yield from waiter
finally:
if timeout_handle is not None:
timeout_handle.cancel()
done, pending = set(), set()
for f in fs:
f.remove_done_callback(_on_completion)
if f.done():
done.add(f)
else:
pending.add(f)
return done, pending
# This is *not* a @coroutine! It is just an iterator (yielding Futures).
def as_completed(fs, *, loop=None, timeout=None):
"""Return an iterator whose values are coroutines.
When waiting for the yielded coroutines you'll get the results (or
exceptions!) of the original Futures (or coroutines), in the order
in which and as soon as they complete.
This differs from PEP 3148; the proper way to use this is:
for f in as_completed(fs):
result = yield from f # The 'yield from' may raise.
# Use result.
If a timeout is specified, the 'yield from' will raise
TimeoutError when the timeout occurs before all Futures are done.
Note: The futures 'f' are not necessarily members of fs.
"""
if isinstance(fs, futures.Future) or coroutines.iscoroutine(fs):
raise TypeError("expect a list of futures, not %s" % type(fs).__name__)
loop = loop if loop is not None else events.get_event_loop()
todo = {ensure_future(f, loop=loop) for f in set(fs)}
from .queues import Queue # Import here to avoid circular import problem.
done = Queue(loop=loop)
timeout_handle = None
def _on_timeout():
for f in todo:
f.remove_done_callback(_on_completion)
done.put_nowait(None) # Queue a dummy value for _wait_for_one().
todo.clear() # Can't do todo.remove(f) in the loop.
def _on_completion(f):
if not todo:
return # _on_timeout() was here first.
todo.remove(f)
done.put_nowait(f)
if not todo and timeout_handle is not None:
timeout_handle.cancel()
@coroutine
def _wait_for_one():
f = yield from done.get()
if f is None:
# Dummy value from _on_timeout().
raise futures.TimeoutError
return f.result() # May raise f.exception().
for f in todo:
f.add_done_callback(_on_completion)
if todo and timeout is not None:
timeout_handle = loop.call_later(timeout, _on_timeout)
for _ in range(len(todo)):
yield _wait_for_one()
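# Editor's note: illustrative consumption of as_completed() (editor's
# addition). Each yielded object is a coroutine; awaiting it produces the
# next result to arrive, or re-raises that future's exception.
@coroutine
def _demo_as_completed(fs, loop=None):
    results = []
    for f in as_completed(fs, loop=loop, timeout=5):
        results.append((yield from f))  # may raise, including TimeoutError
    return results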
@coroutine
def sleep(delay, result=None, *, loop=None):
"""Coroutine that completes after a given time (in seconds)."""
if delay == 0:
yield
return result
future = futures.Future(loop=loop)
h = future._loop.call_later(delay,
futures._set_result_unless_cancelled,
future, result)
try:
return (yield from future)
finally:
h.cancel()
def ensure_future(coro_or_future, *, loop=None):
"""Wrap a coroutine or an awaitable in a future.
If the argument is a Future, it is returned directly.
"""
if isinstance(coro_or_future, futures.Future):
if loop is not None and loop is not coro_or_future._loop:
raise ValueError('loop argument must agree with Future')
return coro_or_future
elif coroutines.iscoroutine(coro_or_future):
if loop is None:
loop = events.get_event_loop()
task = loop.create_task(coro_or_future)
if task._source_traceback:
del task._source_traceback[-1]
return task
elif compat.PY35 and inspect.isawaitable(coro_or_future):
return ensure_future(_wrap_awaitable(coro_or_future), loop=loop)
else:
raise TypeError('A Future, a coroutine or an awaitable is required')
@coroutine
def _wrap_awaitable(awaitable):
"""Helper for asyncio.ensure_future().
Wraps awaitable (an object with __await__) into a coroutine
that will later be wrapped in a Task by ensure_future().
"""
return (yield from awaitable.__await__())
class _GatheringFuture(futures.Future):
"""Helper for gather().
This overrides cancel() to cancel all the children and act more
like Task.cancel(), which doesn't immediately mark itself as
cancelled.
"""
def __init__(self, children, *, loop=None):
super().__init__(loop=loop)
self._children = children
def cancel(self):
if self.done():
return False
for child in self._children:
child.cancel()
return True
def gather(*coros_or_futures, loop=None, return_exceptions=False):
"""Return a future aggregating results from the given coroutines
or futures.
All futures must share the same event loop. If all the tasks are
done successfully, the returned future's result is the list of
results (in the order of the original sequence, not necessarily
the order of results arrival). If *return_exceptions* is True,
exceptions in the tasks are treated the same as successful
results, and gathered in the result list; otherwise, the first
raised exception will be immediately propagated to the returned
future.
Cancellation: if the outer Future is cancelled, all children (that
have not completed yet) are also cancelled. If any child is
cancelled, this is treated as if it raised CancelledError --
the outer Future is *not* cancelled in this case. (This is to
prevent the cancellation of one child to cause other children to
be cancelled.)
"""
if not coros_or_futures:
outer = futures.Future(loop=loop)
outer.set_result([])
return outer
arg_to_fut = {}
for arg in set(coros_or_futures):
if not isinstance(arg, futures.Future):
fut = ensure_future(arg, loop=loop)
if loop is None:
loop = fut._loop
# The caller cannot control this future, the "destroy pending task"
# warning should not be emitted.
fut._log_destroy_pending = False
else:
fut = arg
if loop is None:
loop = fut._loop
elif fut._loop is not loop:
raise ValueError("futures are tied to different event loops")
arg_to_fut[arg] = fut
children = [arg_to_fut[arg] for arg in coros_or_futures]
nchildren = len(children)
outer = _GatheringFuture(children, loop=loop)
nfinished = 0
results = [None] * nchildren
def _done_callback(i, fut):
nonlocal nfinished
if outer.done():
if not fut.cancelled():
# Mark exception retrieved.
fut.exception()
return
if fut.cancelled():
res = futures.CancelledError()
if not return_exceptions:
outer.set_exception(res)
return
elif fut._exception is not None:
res = fut.exception() # Mark exception retrieved.
if not return_exceptions:
outer.set_exception(res)
return
else:
res = fut._result
results[i] = res
nfinished += 1
if nfinished == nchildren:
outer.set_result(results)
for i, fut in enumerate(children):
fut.add_done_callback(functools.partial(_done_callback, i))
return outer
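# Editor's note: a sketch of gather() with return_exceptions=True (editor's
# addition): failures are delivered in place in the result list, in the
# order of the original arguments, instead of propagating.
@coroutine
def _demo_gather(coro_a, coro_b, loop=None):
    results = yield from gather(coro_a, coro_b,
                                loop=loop, return_exceptions=True)
    # Keep only the successful results; exceptions sit in the list as-is.
    return [r for r in results if not isinstance(r, Exception)]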
def shield(arg, *, loop=None):
"""Wait for a future, shielding it from cancellation.
The statement
res = yield from shield(something())
is exactly equivalent to the statement
res = yield from something()
*except* that if the coroutine containing it is cancelled, the
task running in something() is not cancelled. From the POV of
something(), the cancellation did not happen. But its caller is
still cancelled, so the yield-from expression still raises
CancelledError. Note: If something() is cancelled by other means
this will still cancel shield().
If you want to completely ignore cancellation (not recommended)
you can combine shield() with a try/except clause, as follows:
try:
res = yield from shield(something())
except CancelledError:
res = None
"""
inner = ensure_future(arg, loop=loop)
if inner.done():
# Shortcut.
return inner
loop = inner._loop
outer = futures.Future(loop=loop)
def _done_callback(inner):
if outer.cancelled():
if not inner.cancelled():
# Mark inner's result as retrieved.
inner.exception()
return
if inner.cancelled():
outer.cancel()
else:
exc = inner.exception()
if exc is not None:
outer.set_exception(exc)
else:
outer.set_result(inner.result())
inner.add_done_callback(_done_callback)
return outer
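# Editor's note: the try/except pairing suggested in shield()'s docstring,
# written out (editor's addition; the argument name is hypothetical).
@coroutine
def _demo_shield(something, loop=None):
    try:
        return (yield from shield(something, loop=loop))
    except futures.CancelledError:
        # We were cancelled, but the shielded inner task keeps running.
        return None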
def run_coroutine_threadsafe(coro, loop):
"""Submit a coroutine object to a given event loop.
Return a concurrent.futures.Future to access the result.
"""
if not coroutines.iscoroutine(coro):
raise TypeError('A coroutine object is required')
future = concurrent.futures.Future()
def callback():
try:
futures._chain_future(ensure_future(coro, loop=loop), future)
except Exception as exc:
if future.set_running_or_notify_cancel():
future.set_exception(exc)
raise
loop.call_soon_threadsafe(callback)
return future
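# Editor's note: typical cross-thread use of run_coroutine_threadsafe()
# (editor's addition). It is meant to be called from a thread other than
# the loop's own; the returned concurrent.futures.Future has a blocking
# result() suitable for plain threads.
def _demo_submit_from_thread(coro, loop):
    fut = run_coroutine_threadsafe(coro, loop)
    return fut.result(timeout=5)  # blocks this thread, not the event loop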
def timeout(timeout, *, loop=None):
"""A factory which produce a context manager with timeout.
Useful in cases when you want to apply timeout logic around block
of code or in cases when asyncio.wait_for is not suitable.
For example:
>>> with asyncio.timeout(0.001):
... yield from coro()
timeout: timeout value in seconds
loop: asyncio compatible event loop
"""
if loop is None:
loop = events.get_event_loop()
return _Timeout(timeout, loop=loop)
class _Timeout:
def __init__(self, timeout, *, loop):
self._timeout = timeout
self._loop = loop
self._task = None
self._cancelled = False
self._cancel_handler = None
def __enter__(self):
self._task = Task.current_task(loop=self._loop)
if self._task is None:
raise RuntimeError('Timeout context manager should be used '
'inside a task')
self._cancel_handler = self._loop.call_later(
self._timeout, self._cancel_task)
return self
def __exit__(self, exc_type, exc_val, exc_tb):
if exc_type is futures.CancelledError and self._cancelled:
self._cancel_handler = None
self._task = None
raise futures.TimeoutError
self._cancel_handler.cancel()
self._cancel_handler = None
self._task = None
def _cancel_task(self):
self._cancelled = self._task.cancel()
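# Editor's note: a usage sketch for the timeout() factory above (editor's
# addition). It must run inside a Task, as _Timeout.__enter__ enforces;
# on expiry the task is cancelled and the CancelledError is converted to
# TimeoutError on exit from the with-block.
@coroutine
def _demo_timeout(coro, loop=None):
    try:
        with timeout(0.5, loop=loop):
            return (yield from coro)
    except futures.TimeoutError:
        return None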
"""Utilities shared by tests."""
import collections
import contextlib
import io
import logging
import os
import re
import socket
import socketserver
import sys
import tempfile
import threading
import time
import unittest
from unittest import mock
from http.server import HTTPServer
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer
try:
import ssl
except ImportError: # pragma: no cover
ssl = None
from . import base_events
from . import compat
from . import events
from . import futures
from . import selectors
from . import tasks
from .coroutines import coroutine
from .log import logger
if sys.platform == 'win32': # pragma: no cover
from .windows_utils import socketpair
else:
from socket import socketpair # pragma: no cover
def dummy_ssl_context():
if ssl is None:
return None
else:
return ssl.SSLContext(ssl.PROTOCOL_SSLv23)
def run_briefly(loop):
@coroutine
def once():
pass
gen = once()
t = loop.create_task(gen)
# Don't log a warning if the task is not done after run_until_complete().
# It occurs if the loop is stopped or if a task raises a BaseException.
t._log_destroy_pending = False
try:
loop.run_until_complete(t)
finally:
gen.close()
def run_until(loop, pred, timeout=30):
deadline = time.time() + timeout
while not pred():
if timeout is not None:
timeout = deadline - time.time()
if timeout <= 0:
raise futures.TimeoutError()
loop.run_until_complete(tasks.sleep(0.001, loop=loop))
def run_once(loop):
"""Legacy API to run once through the event loop.
This is the recommended pattern for test code. It will poll the
selector once and run all callbacks scheduled in response to I/O
events.
"""
loop.call_soon(loop.stop)
loop.run_forever()
class SilentWSGIRequestHandler(WSGIRequestHandler):
def get_stderr(self):
return io.StringIO()
def log_message(self, format, *args):
pass
class SilentWSGIServer(WSGIServer):
request_timeout = 2
def get_request(self):
request, client_addr = super().get_request()
request.settimeout(self.request_timeout)
return request, client_addr
def handle_error(self, request, client_address):
pass
class SSLWSGIServerMixin:
def finish_request(self, request, client_address):
# The relative location of our test directory (which
# contains the ssl key and certificate files) differs
# between the stdlib and stand-alone asyncio.
# Prefer our own if we can find it.
here = os.path.join(os.path.dirname(__file__), '..', 'tests')
if not os.path.isdir(here):
here = os.path.join(os.path.dirname(os.__file__),
'test', 'test_asyncio')
keyfile = os.path.join(here, 'ssl_key.pem')
certfile = os.path.join(here, 'ssl_cert.pem')
ssock = ssl.wrap_socket(request,
keyfile=keyfile,
certfile=certfile,
server_side=True)
try:
self.RequestHandlerClass(ssock, client_address, self)
ssock.close()
except OSError:
# maybe socket has been closed by peer
pass
class SSLWSGIServer(SSLWSGIServerMixin, SilentWSGIServer):
pass
def _run_test_server(*, address, use_ssl=False, server_cls, server_ssl_cls):
def app(environ, start_response):
status = '200 OK'
headers = [('Content-type', 'text/plain')]
start_response(status, headers)
return [b'Test message']
# Run the test WSGI server in a separate thread in order not to
# interfere with event handling in the main thread
server_class = server_ssl_cls if use_ssl else server_cls
httpd = server_class(address, SilentWSGIRequestHandler)
httpd.set_app(app)
httpd.address = httpd.server_address
server_thread = threading.Thread(
target=lambda: httpd.serve_forever(poll_interval=0.05))
server_thread.start()
try:
yield httpd
finally:
httpd.shutdown()
httpd.server_close()
server_thread.join()
if hasattr(socket, 'AF_UNIX'):
class UnixHTTPServer(socketserver.UnixStreamServer, HTTPServer):
def server_bind(self):
socketserver.UnixStreamServer.server_bind(self)
self.server_name = '127.0.0.1'
self.server_port = 80
class UnixWSGIServer(UnixHTTPServer, WSGIServer):
request_timeout = 2
def server_bind(self):
UnixHTTPServer.server_bind(self)
self.setup_environ()
def get_request(self):
request, client_addr = super().get_request()
request.settimeout(self.request_timeout)
# Code in the stdlib expects that get_request
# will return a socket and a tuple (host, port).
# However, this isn't true for UNIX sockets,
# as the second return value will be a path;
# hence we return some fake data sufficient
# to get the tests going
return request, ('127.0.0.1', '')
class SilentUnixWSGIServer(UnixWSGIServer):
def handle_error(self, request, client_address):
pass
class UnixSSLWSGIServer(SSLWSGIServerMixin, SilentUnixWSGIServer):
pass
def gen_unix_socket_path():
with tempfile.NamedTemporaryFile() as file:
return file.name
@contextlib.contextmanager
def unix_socket_path():
path = gen_unix_socket_path()
try:
yield path
finally:
try:
os.unlink(path)
except OSError:
pass
@contextlib.contextmanager
def run_test_unix_server(*, use_ssl=False):
with unix_socket_path() as path:
yield from _run_test_server(address=path, use_ssl=use_ssl,
server_cls=SilentUnixWSGIServer,
server_ssl_cls=UnixSSLWSGIServer)
@contextlib.contextmanager
def run_test_server(*, host='127.0.0.1', port=0, use_ssl=False):
yield from _run_test_server(address=(host, port), use_ssl=use_ssl,
server_cls=SilentWSGIServer,
server_ssl_cls=SSLWSGIServer)
def make_test_protocol(base):
dct = {}
for name in dir(base):
if name.startswith('__') and name.endswith('__'):
# skip magic names
continue
dct[name] = MockCallback(return_value=None)
return type('TestProtocol', (base,) + base.__bases__, dct)()
class TestSelector(selectors.BaseSelector):
def __init__(self):
self.keys = {}
def register(self, fileobj, events, data=None):
key = selectors.SelectorKey(fileobj, 0, events, data)
self.keys[fileobj] = key
return key
def unregister(self, fileobj):
return self.keys.pop(fileobj)
def select(self, timeout):
return []
def get_map(self):
return self.keys
class TestLoop(base_events.BaseEventLoop):
"""Loop for unittests.
It manages self time directly.
If something scheduled to be executed later then
on next loop iteration after all ready handlers done
generator passed to __init__ is calling.
Generator should be like this:
def gen():
...
when = yield ...
... = yield time_advance
Value returned by yield is absolute time of next scheduled handler.
Value passed to yield is time advance to move loop's time forward.
"""
def __init__(self, gen=None):
super().__init__()
if gen is None:
def gen():
yield
self._check_on_close = False
else:
self._check_on_close = True
self._gen = gen()
next(self._gen)
self._time = 0
self._clock_resolution = 1e-9
self._timers = []
self._selector = TestSelector()
self.readers = {}
self.writers = {}
self.reset_counters()
def time(self):
return self._time
def advance_time(self, advance):
"""Move test time forward."""
if advance:
self._time += advance
def close(self):
super().close()
if self._check_on_close:
try:
self._gen.send(0)
except StopIteration:
pass
else: # pragma: no cover
raise AssertionError("Time generator is not finished")
def add_reader(self, fd, callback, *args):
self.readers[fd] = events.Handle(callback, args, self)
def remove_reader(self, fd):
self.remove_reader_count[fd] += 1
if fd in self.readers:
del self.readers[fd]
return True
else:
return False
def assert_reader(self, fd, callback, *args):
assert fd in self.readers, 'fd {} is not registered'.format(fd)
handle = self.readers[fd]
assert handle._callback == callback, '{!r} != {!r}'.format(
handle._callback, callback)
assert handle._args == args, '{!r} != {!r}'.format(
handle._args, args)
def add_writer(self, fd, callback, *args):
self.writers[fd] = events.Handle(callback, args, self)
def remove_writer(self, fd):
self.remove_writer_count[fd] += 1
if fd in self.writers:
del self.writers[fd]
return True
else:
return False
def assert_writer(self, fd, callback, *args):
assert fd in self.writers, 'fd {} is not registered'.format(fd)
handle = self.writers[fd]
assert handle._callback == callback, '{!r} != {!r}'.format(
handle._callback, callback)
assert handle._args == args, '{!r} != {!r}'.format(
handle._args, args)
def reset_counters(self):
self.remove_reader_count = collections.defaultdict(int)
self.remove_writer_count = collections.defaultdict(int)
def _run_once(self):
super()._run_once()
for when in self._timers:
advance = self._gen.send(when)
self.advance_time(advance)
self._timers = []
def call_at(self, when, callback, *args):
self._timers.append(when)
return super().call_at(when, callback, *args)
def _process_events(self, event_list):
return
def _write_to_self(self):
pass
def MockCallback(**kwargs):
return mock.Mock(spec=['__call__'], **kwargs)
class MockPattern(str):
"""A regex based str with a fuzzy __eq__.
    Use this helper with 'mock.assert_called_with', or anywhere a regex
    comparison between strings is needed.
For instance:
mock_call.assert_called_with(MockPattern('spam.*ham'))
"""
def __eq__(self, other):
return bool(re.search(str(self), other, re.S))
def get_function_source(func):
source = events._get_function_source(func)
if source is None:
raise ValueError("unable to get the source of %r" % (func,))
return source
class TestCase(unittest.TestCase):
def set_event_loop(self, loop, *, cleanup=True):
assert loop is not None
# ensure that the event loop is passed explicitly in asyncio
events.set_event_loop(None)
if cleanup:
self.addCleanup(loop.close)
def new_test_loop(self, gen=None):
loop = TestLoop(gen)
self.set_event_loop(loop)
return loop
def tearDown(self):
events.set_event_loop(None)
# Detect CPython bug #23353: ensure that yield/yield-from is not used
# in an except block of a generator
self.assertEqual(sys.exc_info(), (None, None, None))
if not compat.PY34:
# Python 3.3 compatibility
def subTest(self, *args, **kwargs):
class EmptyCM:
def __enter__(self):
pass
def __exit__(self, *exc):
pass
return EmptyCM()
@contextlib.contextmanager
def disable_logger():
"""Context manager to disable asyncio logger.
For example, it can be used to ignore warnings in debug mode.
"""
old_level = logger.level
try:
logger.setLevel(logging.CRITICAL+1)
yield
finally:
logger.setLevel(old_level)
def mock_nonblocking_socket(proto=socket.IPPROTO_TCP, type=socket.SOCK_STREAM,
family=socket.AF_INET):
"""Create a mock of a non-blocking socket."""
sock = mock.MagicMock(socket.socket)
sock.proto = proto
sock.type = type
sock.family = family
sock.gettimeout.return_value = 0.0
return sock
def force_legacy_ssl_support():
return mock.patch('asyncio.sslproto._is_sslproto_available',
return_value=False)
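# ---------------------------------------------------------------------------
# Editor's note: an illustrative sketch, not part of the original module,
# of the TestLoop time-generator protocol described in its docstring: the
# generator receives the absolute time of the next scheduled handler and
# yields how far to advance the virtual clock.
def _demo_test_loop():
    def gen():
        when = yield        # absolute time of the first scheduled handler
        yield when          # advance the virtual clock straight to it
    return TestLoop(gen)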
"""Abstract Transport class."""
from asyncio import compat
__all__ = ['BaseTransport', 'ReadTransport', 'WriteTransport',
'Transport', 'DatagramTransport', 'SubprocessTransport',
]
class BaseTransport:
"""Base class for transports."""
def __init__(self, extra=None):
if extra is None:
extra = {}
self._extra = extra
def get_extra_info(self, name, default=None):
"""Get optional transport information."""
return self._extra.get(name, default)
def is_closing(self):
"""Return True if the transport is closing or closed."""
raise NotImplementedError
def close(self):
"""Close the transport.
Buffered data will be flushed asynchronously. No more data
will be received. After all buffered data is flushed, the
        protocol's connection_lost() method will (eventually) be called
with None as its argument.
"""
raise NotImplementedError
class ReadTransport(BaseTransport):
"""Interface for read-only transports."""
def pause_reading(self):
"""Pause the receiving end.
No data will be passed to the protocol's data_received()
method until resume_reading() is called.
"""
raise NotImplementedError
def resume_reading(self):
"""Resume the receiving end.
Data received will once again be passed to the protocol's
data_received() method.
"""
raise NotImplementedError
class WriteTransport(BaseTransport):
"""Interface for write-only transports."""
def set_write_buffer_limits(self, high=None, low=None):
"""Set the high- and low-water limits for write flow control.
These two values control when to call the protocol's
pause_writing() and resume_writing() methods. If specified,
the low-water limit must be less than or equal to the
high-water limit. Neither value can be negative.
The defaults are implementation-specific. If only the
high-water limit is given, the low-water limit defaults to an
implementation-specific value less than or equal to the
high-water limit. Setting high to zero forces low to zero as
well, and causes pause_writing() to be called whenever the
buffer becomes non-empty. Setting low to zero causes
resume_writing() to be called only once the buffer is empty.
Use of zero for either limit is generally sub-optimal as it
reduces opportunities for doing I/O and computation
concurrently.
"""
raise NotImplementedError
def get_write_buffer_size(self):
"""Return the current size of the write buffer."""
raise NotImplementedError
def write(self, data):
"""Write some data bytes to the transport.
This does not block; it buffers the data and arranges for it
to be sent out asynchronously.
"""
raise NotImplementedError
def writelines(self, list_of_data):
"""Write a list (or any iterable) of data bytes to the transport.
The default implementation concatenates the arguments and
calls write() on the result.
"""
data = compat.flatten_list_bytes(list_of_data)
self.write(data)
def write_eof(self):
"""Close the write end after flushing buffered data.
(This is like typing ^D into a UNIX program reading from stdin.)
Data may still be received.
"""
raise NotImplementedError
def can_write_eof(self):
"""Return True if this transport supports write_eof(), False if not."""
raise NotImplementedError
def abort(self):
"""Close the transport immediately.
Buffered data will be lost. No more data will be received.
The protocol's connection_lost() method will (eventually) be
called with None as its argument.
"""
raise NotImplementedError
class Transport(ReadTransport, WriteTransport):
"""Interface representing a bidirectional transport.
There may be several implementations, but typically, the user does
not implement new transports; rather, the platform provides some
useful transports that are implemented using the platform's best
practices.
The user never instantiates a transport directly; they call a
utility function, passing it a protocol factory and other
information necessary to create the transport and protocol. (E.g.
EventLoop.create_connection() or EventLoop.create_server().)
The utility function will asynchronously create a transport and a
protocol and hook them up by calling the protocol's
connection_made() method, passing it the transport.
    The implementation here raises NotImplementedError for every method
except writelines(), which calls write() in a loop.
"""
class DatagramTransport(BaseTransport):
"""Interface for datagram (UDP) transports."""
def sendto(self, data, addr=None):
"""Send data to the transport.
This does not block; it buffers the data and arranges for it
to be sent out asynchronously.
        addr is the target socket address.
        If addr is None, use the target address given at transport creation.
"""
raise NotImplementedError
def abort(self):
"""Close the transport immediately.
Buffered data will be lost. No more data will be received.
The protocol's connection_lost() method will (eventually) be
called with None as its argument.
"""
raise NotImplementedError
class SubprocessTransport(BaseTransport):
def get_pid(self):
"""Get subprocess id."""
raise NotImplementedError
def get_returncode(self):
"""Get subprocess returncode.
See also
http://docs.python.org/3/library/subprocess#subprocess.Popen.returncode
"""
raise NotImplementedError
def get_pipe_transport(self, fd):
"""Get transport for pipe with number fd."""
raise NotImplementedError
def send_signal(self, signal):
"""Send signal to subprocess.
See also:
        http://docs.python.org/3/library/subprocess#subprocess.Popen.send_signal
"""
raise NotImplementedError
def terminate(self):
"""Stop the subprocess.
Alias for close() method.
On Posix OSs the method sends SIGTERM to the subprocess.
On Windows the Win32 API function TerminateProcess()
is called to stop the subprocess.
See also:
http://docs.python.org/3/library/subprocess#subprocess.Popen.terminate
"""
raise NotImplementedError
def kill(self):
"""Kill the subprocess.
On Posix OSs the function sends SIGKILL to the subprocess.
On Windows kill() is an alias for terminate().
See also:
http://docs.python.org/3/library/subprocess#subprocess.Popen.kill
"""
raise NotImplementedError
class _FlowControlMixin(Transport):
"""All the logic for (write) flow control in a mix-in base class.
The subclass must implement get_write_buffer_size(). It must call
_maybe_pause_protocol() whenever the write buffer size increases,
and _maybe_resume_protocol() whenever it decreases. It may also
override set_write_buffer_limits() (e.g. to specify different
defaults).
The subclass constructor must call super().__init__(extra). This
will call set_write_buffer_limits().
The user may call set_write_buffer_limits() and
get_write_buffer_size(), and their protocol's pause_writing() and
resume_writing() may be called.
"""
def __init__(self, extra=None, loop=None):
super().__init__(extra)
assert loop is not None
self._loop = loop
self._protocol_paused = False
self._set_write_buffer_limits()
def _maybe_pause_protocol(self):
size = self.get_write_buffer_size()
if size <= self._high_water:
return
if not self._protocol_paused:
self._protocol_paused = True
try:
self._protocol.pause_writing()
except Exception as exc:
self._loop.call_exception_handler({
'message': 'protocol.pause_writing() failed',
'exception': exc,
'transport': self,
'protocol': self._protocol,
})
def _maybe_resume_protocol(self):
if (self._protocol_paused and
self.get_write_buffer_size() <= self._low_water):
self._protocol_paused = False
try:
self._protocol.resume_writing()
except Exception as exc:
self._loop.call_exception_handler({
'message': 'protocol.resume_writing() failed',
'exception': exc,
'transport': self,
'protocol': self._protocol,
})
def get_write_buffer_limits(self):
return (self._low_water, self._high_water)
def _set_write_buffer_limits(self, high=None, low=None):
if high is None:
if low is None:
high = 64*1024
else:
high = 4*low
if low is None:
low = high // 4
if not high >= low >= 0:
raise ValueError('high (%r) must be >= low (%r) must be >= 0' %
(high, low))
self._high_water = high
self._low_water = low
def set_write_buffer_limits(self, high=None, low=None):
self._set_write_buffer_limits(high=high, low=low)
self._maybe_pause_protocol()
def get_write_buffer_size(self):
raise NotImplementedError
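# ---------------------------------------------------------------------------
# Editor's note: a minimal sketch, not part of the original module, of the
# subclass contract documented on _FlowControlMixin: implement
# get_write_buffer_size(), and call _maybe_pause_protocol() when the buffer
# grows and _maybe_resume_protocol() when it shrinks. The class and its
# buffer are hypothetical.
class _DemoBufferedTransport(_FlowControlMixin):
    def __init__(self, protocol, loop):
        super().__init__(None, loop)   # installs default 64 KiB/16 KiB marks
        self._protocol = protocol
        self._demo_buffer = bytearray()
    def get_write_buffer_size(self):
        return len(self._demo_buffer)
    def write(self, data):
        self._demo_buffer.extend(data)
        self._maybe_pause_protocol()   # pause_writing() above the high mark
    def _demo_drain(self, n):
        del self._demo_buffer[:n]
        self._maybe_resume_protocol()  # resume_writing() at/below the low mark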
"""Selector event loop for Unix with signal handling."""
import errno
import os
import signal
import socket
import stat
import subprocess
import sys
import threading
import warnings
from . import base_events
from . import base_subprocess
from . import compat
from . import constants
from . import coroutines
from . import events
from . import futures
from . import selector_events
from . import selectors
from . import transports
from .coroutines import coroutine
from .log import logger
__all__ = ['SelectorEventLoop',
'AbstractChildWatcher', 'SafeChildWatcher',
'FastChildWatcher', 'DefaultEventLoopPolicy',
]
if sys.platform == 'win32': # pragma: no cover
raise ImportError('Signals are not really supported on Windows')
def _sighandler_noop(signum, frame):
"""Dummy signal handler."""
pass
class _UnixSelectorEventLoop(selector_events.BaseSelectorEventLoop):
"""Unix event loop.
Adds signal handling and UNIX Domain Socket support to SelectorEventLoop.
"""
def __init__(self, selector=None):
super().__init__(selector)
self._signal_handlers = {}
def _socketpair(self):
return socket.socketpair()
def close(self):
super().close()
for sig in list(self._signal_handlers):
self.remove_signal_handler(sig)
def _process_self_data(self, data):
for signum in data:
if not signum:
# ignore null bytes written by _write_to_self()
continue
self._handle_signal(signum)
def add_signal_handler(self, sig, callback, *args):
"""Add a handler for a signal. UNIX only.
Raise ValueError if the signal number is invalid or uncatchable.
Raise RuntimeError if there is a problem setting up the handler.
"""
if (coroutines.iscoroutine(callback)
or coroutines.iscoroutinefunction(callback)):
raise TypeError("coroutines cannot be used "
"with add_signal_handler()")
self._check_signal(sig)
self._check_closed()
try:
# set_wakeup_fd() raises ValueError if this is not the
# main thread. By calling it early we ensure that an
# event loop running in another thread cannot add a signal
# handler.
signal.set_wakeup_fd(self._csock.fileno())
except (ValueError, OSError) as exc:
raise RuntimeError(str(exc))
handle = events.Handle(callback, args, self)
self._signal_handlers[sig] = handle
try:
# Register a dummy signal handler to ask Python to write the signal
            # number in the wakeup file descriptor. _process_self_data() will
# read signal numbers from this file descriptor to handle signals.
signal.signal(sig, _sighandler_noop)
# Set SA_RESTART to limit EINTR occurrences.
signal.siginterrupt(sig, False)
except OSError as exc:
del self._signal_handlers[sig]
if not self._signal_handlers:
try:
signal.set_wakeup_fd(-1)
except (ValueError, OSError) as nexc:
logger.info('set_wakeup_fd(-1) failed: %s', nexc)
if exc.errno == errno.EINVAL:
raise RuntimeError('sig {} cannot be caught'.format(sig))
else:
raise
def _handle_signal(self, sig):
"""Internal helper that is the actual signal handler."""
handle = self._signal_handlers.get(sig)
if handle is None:
return # Assume it's some race condition.
if handle._cancelled:
self.remove_signal_handler(sig) # Remove it properly.
else:
self._add_callback_signalsafe(handle)
def remove_signal_handler(self, sig):
"""Remove a handler for a signal. UNIX only.
Return True if a signal handler was removed, False if not.
"""
self._check_signal(sig)
try:
del self._signal_handlers[sig]
except KeyError:
return False
if sig == signal.SIGINT:
handler = signal.default_int_handler
else:
handler = signal.SIG_DFL
try:
signal.signal(sig, handler)
except OSError as exc:
if exc.errno == errno.EINVAL:
raise RuntimeError('sig {} cannot be caught'.format(sig))
else:
raise
if not self._signal_handlers:
try:
signal.set_wakeup_fd(-1)
except (ValueError, OSError) as exc:
logger.info('set_wakeup_fd(-1) failed: %s', exc)
return True
def _check_signal(self, sig):
"""Internal helper to validate a signal.
Raise ValueError if the signal number is invalid or uncatchable.
Raise RuntimeError if there is a problem setting up the handler.
"""
if not isinstance(sig, int):
raise TypeError('sig must be an int, not {!r}'.format(sig))
if not (1 <= sig < signal.NSIG):
raise ValueError(
'sig {} out of range(1, {})'.format(sig, signal.NSIG))
def _make_read_pipe_transport(self, pipe, protocol, waiter=None,
extra=None):
return _UnixReadPipeTransport(self, pipe, protocol, waiter, extra)
def _make_write_pipe_transport(self, pipe, protocol, waiter=None,
extra=None):
return _UnixWritePipeTransport(self, pipe, protocol, waiter, extra)
@coroutine
def _make_subprocess_transport(self, protocol, args, shell,
stdin, stdout, stderr, bufsize,
extra=None, **kwargs):
with events.get_child_watcher() as watcher:
waiter = futures.Future(loop=self)
transp = _UnixSubprocessTransport(self, protocol, args, shell,
stdin, stdout, stderr, bufsize,
waiter=waiter, extra=extra,
**kwargs)
watcher.add_child_handler(transp.get_pid(),
self._child_watcher_callback, transp)
try:
yield from waiter
except Exception as exc:
# Workaround CPython bug #23353: using yield/yield-from in an
                # except block of a generator doesn't properly clear
                # sys.exc_info()
err = exc
else:
err = None
if err is not None:
transp.close()
yield from transp._wait()
raise err
return transp
def _child_watcher_callback(self, pid, returncode, transp):
self.call_soon_threadsafe(transp._process_exited, returncode)
@coroutine
def create_unix_connection(self, protocol_factory, path, *,
ssl=None, sock=None,
server_hostname=None):
assert server_hostname is None or isinstance(server_hostname, str)
if ssl:
if server_hostname is None:
raise ValueError(
'you have to pass server_hostname when using ssl')
else:
if server_hostname is not None:
raise ValueError('server_hostname is only meaningful with ssl')
if path is not None:
if sock is not None:
raise ValueError(
'path and sock can not be specified at the same time')
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM, 0)
try:
sock.setblocking(False)
yield from self.sock_connect(sock, path)
except:
sock.close()
raise
else:
if sock is None:
raise ValueError('no path and sock were specified')
sock.setblocking(False)
transport, protocol = yield from self._create_connection_transport(
sock, protocol_factory, ssl, server_hostname)
return transport, protocol
@coroutine
def create_unix_server(self, protocol_factory, path=None, *,
sock=None, backlog=100, ssl=None):
if isinstance(ssl, bool):
raise TypeError('ssl argument must be an SSLContext or None')
if path is not None:
if sock is not None:
raise ValueError(
'path and sock can not be specified at the same time')
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
sock.bind(path)
except OSError as exc:
sock.close()
if exc.errno == errno.EADDRINUSE:
                    # Let's improve the error message by adding
                    # the exact address with which it occurs.
msg = 'Address {!r} is already in use'.format(path)
raise OSError(errno.EADDRINUSE, msg) from None
else:
raise
except:
sock.close()
raise
else:
if sock is None:
raise ValueError(
'path was not specified, and no sock specified')
if sock.family != socket.AF_UNIX:
raise ValueError(
'A UNIX Domain Socket was expected, got {!r}'.format(sock))
server = base_events.Server(self, [sock])
sock.listen(backlog)
sock.setblocking(False)
self._start_serving(protocol_factory, sock, ssl, server)
return server
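# ---------------------------------------------------------------------------
# Editor's note: an illustrative sketch, not part of the original module,
# of the UNIX domain socket helpers above. The socket path and the
# protocol_factory argument are assumptions for demonstration.
@coroutine
def _demo_unix_pair(loop, protocol_factory, path='/tmp/demo.sock'):
    server = yield from loop.create_unix_server(protocol_factory, path)
    # create_unix_connection() requires exactly one of `path` or `sock`.
    transport, protocol = yield from loop.create_unix_connection(
        protocol_factory, path)
    return server, transport, protocol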
if hasattr(os, 'set_blocking'):
def _set_nonblocking(fd):
os.set_blocking(fd, False)
else:
import fcntl
def _set_nonblocking(fd):
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
flags = flags | os.O_NONBLOCK
fcntl.fcntl(fd, fcntl.F_SETFL, flags)
class _UnixReadPipeTransport(transports.ReadTransport):
max_size = 256 * 1024 # max bytes we read in one event loop iteration
def __init__(self, loop, pipe, protocol, waiter=None, extra=None):
super().__init__(extra)
self._extra['pipe'] = pipe
self._loop = loop
self._pipe = pipe
self._fileno = pipe.fileno()
mode = os.fstat(self._fileno).st_mode
if not (stat.S_ISFIFO(mode) or
stat.S_ISSOCK(mode) or
stat.S_ISCHR(mode)):
raise ValueError("Pipe transport is for pipes/sockets only.")
_set_nonblocking(self._fileno)
self._protocol = protocol
self._closing = False
self._loop.call_soon(self._protocol.connection_made, self)
# only start reading when connection_made() has been called
self._loop.call_soon(self._loop.add_reader,
self._fileno, self._read_ready)
if waiter is not None:
# only wake up the waiter when connection_made() has been called
self._loop.call_soon(futures._set_result_unless_cancelled,
waiter, None)
def __repr__(self):
info = [self.__class__.__name__]
if self._pipe is None:
info.append('closed')
elif self._closing:
info.append('closing')
info.append('fd=%s' % self._fileno)
if self._pipe is not None:
polling = selector_events._test_selector_event(
self._loop._selector,
self._fileno, selectors.EVENT_READ)
if polling:
info.append('polling')
else:
info.append('idle')
else:
info.append('closed')
return '<%s>' % ' '.join(info)
def _read_ready(self):
try:
data = os.read(self._fileno, self.max_size)
except (BlockingIOError, InterruptedError):
pass
except OSError as exc:
self._fatal_error(exc, 'Fatal read error on pipe transport')
else:
if data:
self._protocol.data_received(data)
else:
if self._loop.get_debug():
logger.info("%r was closed by peer", self)
self._closing = True
self._loop.remove_reader(self._fileno)
self._loop.call_soon(self._protocol.eof_received)
self._loop.call_soon(self._call_connection_lost, None)
def pause_reading(self):
self._loop.remove_reader(self._fileno)
def resume_reading(self):
self._loop.add_reader(self._fileno, self._read_ready)
def is_closing(self):
return self._closing
def close(self):
if not self._closing:
self._close(None)
    # On Python 3.3 and older, objects with a destructor that are part of a
    # reference cycle are never destroyed. That's no longer the case on
    # Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if self._pipe is not None:
warnings.warn("unclosed transport %r" % self, ResourceWarning)
self._pipe.close()
def _fatal_error(self, exc, message='Fatal error on pipe transport'):
# should be called by exception handler only
if (isinstance(exc, OSError) and exc.errno == errno.EIO):
if self._loop.get_debug():
logger.debug("%r: %s", self, message, exc_info=True)
else:
self._loop.call_exception_handler({
'message': message,
'exception': exc,
'transport': self,
'protocol': self._protocol,
})
self._close(exc)
def _close(self, exc):
self._closing = True
self._loop.remove_reader(self._fileno)
self._loop.call_soon(self._call_connection_lost, exc)
def _call_connection_lost(self, exc):
try:
self._protocol.connection_lost(exc)
finally:
self._pipe.close()
self._pipe = None
self._protocol = None
self._loop = None
class _UnixWritePipeTransport(transports._FlowControlMixin,
transports.WriteTransport):
def __init__(self, loop, pipe, protocol, waiter=None, extra=None):
super().__init__(extra, loop)
self._extra['pipe'] = pipe
self._pipe = pipe
self._fileno = pipe.fileno()
mode = os.fstat(self._fileno).st_mode
is_socket = stat.S_ISSOCK(mode)
if not (is_socket or
stat.S_ISFIFO(mode) or
stat.S_ISCHR(mode)):
raise ValueError("Pipe transport is only for "
"pipes, sockets and character devices")
_set_nonblocking(self._fileno)
self._protocol = protocol
self._buffer = []
self._conn_lost = 0
self._closing = False # Set when close() or write_eof() called.
self._loop.call_soon(self._protocol.connection_made, self)
# On AIX, the reader trick (to be notified when the read end of the
# socket is closed) only works for sockets. On other platforms it
# works for pipes and sockets. (Exception: OS X 10.4? Issue #19294.)
if is_socket or not sys.platform.startswith("aix"):
# only start reading when connection_made() has been called
self._loop.call_soon(self._loop.add_reader,
self._fileno, self._read_ready)
if waiter is not None:
# only wake up the waiter when connection_made() has been called
self._loop.call_soon(futures._set_result_unless_cancelled,
waiter, None)
def __repr__(self):
info = [self.__class__.__name__]
if self._pipe is None:
info.append('closed')
elif self._closing:
info.append('closing')
info.append('fd=%s' % self._fileno)
if self._pipe is not None:
polling = selector_events._test_selector_event(
self._loop._selector,
self._fileno, selectors.EVENT_WRITE)
if polling:
info.append('polling')
else:
info.append('idle')
bufsize = self.get_write_buffer_size()
info.append('bufsize=%s' % bufsize)
else:
info.append('closed')
return '<%s>' % ' '.join(info)
def get_write_buffer_size(self):
return sum(len(data) for data in self._buffer)
def _read_ready(self):
# Pipe was closed by peer.
if self._loop.get_debug():
logger.info("%r was closed by peer", self)
if self._buffer:
self._close(BrokenPipeError())
else:
self._close()
def write(self, data):
assert isinstance(data, (bytes, bytearray, memoryview)), repr(data)
if isinstance(data, bytearray):
data = memoryview(data)
if not data:
return
if self._conn_lost or self._closing:
if self._conn_lost >= constants.LOG_THRESHOLD_FOR_CONNLOST_WRITES:
logger.warning('pipe closed by peer or '
'os.write(pipe, data) raised exception.')
self._conn_lost += 1
return
if not self._buffer:
# Attempt to send it right away first.
try:
n = os.write(self._fileno, data)
except (BlockingIOError, InterruptedError):
n = 0
except Exception as exc:
self._conn_lost += 1
self._fatal_error(exc, 'Fatal write error on pipe transport')
return
if n == len(data):
return
elif n > 0:
data = data[n:]
self._loop.add_writer(self._fileno, self._write_ready)
self._buffer.append(data)
self._maybe_pause_protocol()
def _write_ready(self):
data = b''.join(self._buffer)
assert data, 'Data should not be empty'
self._buffer.clear()
try:
n = os.write(self._fileno, data)
except (BlockingIOError, InterruptedError):
self._buffer.append(data)
except Exception as exc:
self._conn_lost += 1
            # Remove the writer here; _fatal_error() doesn't do it
            # because _buffer is empty.
self._loop.remove_writer(self._fileno)
self._fatal_error(exc, 'Fatal write error on pipe transport')
else:
if n == len(data):
self._loop.remove_writer(self._fileno)
self._maybe_resume_protocol() # May append to buffer.
if not self._buffer and self._closing:
self._loop.remove_reader(self._fileno)
self._call_connection_lost(None)
return
elif n > 0:
data = data[n:]
self._buffer.append(data) # Try again later.
def can_write_eof(self):
return True
def write_eof(self):
if self._closing:
return
assert self._pipe
self._closing = True
if not self._buffer:
self._loop.remove_reader(self._fileno)
self._loop.call_soon(self._call_connection_lost, None)
def is_closing(self):
return self._closing
def close(self):
if self._pipe is not None and not self._closing:
            # write_eof() is all we need to close the write pipe
self.write_eof()
    # On Python 3.3 and older, objects with a destructor that are part of a
    # reference cycle are never destroyed. This is no longer the case on
    # Python 3.4, thanks to PEP 442.
if compat.PY34:
def __del__(self):
if self._pipe is not None:
warnings.warn("unclosed transport %r" % self, ResourceWarning)
self._pipe.close()
def abort(self):
self._close(None)
def _fatal_error(self, exc, message='Fatal error on pipe transport'):
# should be called by exception handler only
if isinstance(exc, (BrokenPipeError, ConnectionResetError)):
if self._loop.get_debug():
logger.debug("%r: %s", self, message, exc_info=True)
else:
self._loop.call_exception_handler({
'message': message,
'exception': exc,
'transport': self,
'protocol': self._protocol,
})
self._close(exc)
def _close(self, exc=None):
self._closing = True
if self._buffer:
self._loop.remove_writer(self._fileno)
self._buffer.clear()
self._loop.remove_reader(self._fileno)
self._loop.call_soon(self._call_connection_lost, exc)
def _call_connection_lost(self, exc):
try:
self._protocol.connection_lost(exc)
finally:
self._pipe.close()
self._pipe = None
self._protocol = None
self._loop = None
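# --- Illustrative sketch (not part of asyncio): the write-pipe transport is
# normally obtained through loop.connect_write_pipe(). The _example name is
# hypothetical.
def _example_write_pipe():
    import asyncio, os
    r_fd, w_fd = os.pipe()
    loop = asyncio.get_event_loop()
    transport, protocol = loop.run_until_complete(
        loop.connect_write_pipe(asyncio.Protocol, os.fdopen(w_fd, 'wb')))
    transport.write(b'hello')   # small writes go out via os.write() directly
    loop.run_until_complete(asyncio.sleep(0))
    print(os.read(r_fd, 100))   # b'hello'
    transport.close()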
if hasattr(os, 'set_inheritable'):
# Python 3.4 and newer
_set_inheritable = os.set_inheritable
else:
import fcntl
def _set_inheritable(fd, inheritable):
cloexec_flag = getattr(fcntl, 'FD_CLOEXEC', 1)
old = fcntl.fcntl(fd, fcntl.F_GETFD)
if not inheritable:
fcntl.fcntl(fd, fcntl.F_SETFD, old | cloexec_flag)
else:
fcntl.fcntl(fd, fcntl.F_SETFD, old & ~cloexec_flag)
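# --- Quick check of the fallback above (hedged; the _example name is ours):
# a fd with FD_CLOEXEC set is exactly a fd that is not inheritable.
def _example_set_inheritable():
    import os
    r, w = os.pipe()
    _set_inheritable(r, False)
    if hasattr(os, 'get_inheritable'):    # Python 3.4+
        assert os.get_inheritable(r) is False
    os.close(r)
    os.close(w)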
class _UnixSubprocessTransport(base_subprocess.BaseSubprocessTransport):
def _start(self, args, shell, stdin, stdout, stderr, bufsize, **kwargs):
stdin_w = None
if stdin == subprocess.PIPE:
# Use a socket pair for stdin, since not all platforms
# support selecting read events on the write end of a
# socket (which we use in order to detect closing of the
# other end). Notably this is needed on AIX, and works
# just fine on other platforms.
stdin, stdin_w = self._loop._socketpair()
            # Mark the write end of the stdin pipe as non-inheritable,
            # needed by close_fds=False on Python 3.3 and older
            # (Python 3.4 implements PEP 446: socketpair() returns
            # non-inheritable sockets)
_set_inheritable(stdin_w.fileno(), False)
self._proc = subprocess.Popen(
args, shell=shell, stdin=stdin, stdout=stdout, stderr=stderr,
universal_newlines=False, bufsize=bufsize, **kwargs)
if stdin_w is not None:
stdin.close()
self._proc.stdin = open(stdin_w.detach(), 'wb', buffering=bufsize)
class AbstractChildWatcher:
"""Abstract base class for monitoring child processes.
Objects derived from this class monitor a collection of subprocesses and
report their termination or interruption by a signal.
New callbacks are registered with .add_child_handler(). Starting a new
process must be done within a 'with' block to allow the watcher to suspend
    its activity until the new process is fully registered (this is needed to
prevent a race condition in some implementations).
Example:
with watcher:
proc = subprocess.Popen("sleep 1")
watcher.add_child_handler(proc.pid, callback)
Notes:
Implementations of this class must be thread-safe.
Since child watcher objects may catch the SIGCHLD signal and call
waitpid(-1), there should be only one active object per process.
"""
def add_child_handler(self, pid, callback, *args):
"""Register a new child handler.
Arrange for callback(pid, returncode, *args) to be called when
process 'pid' terminates. Specifying another callback for the same
process replaces the previous handler.
Note: callback() must be thread-safe.
"""
raise NotImplementedError()
def remove_child_handler(self, pid):
"""Removes the handler for process 'pid'.
The function returns True if the handler was successfully removed,
False if there was nothing to remove."""
raise NotImplementedError()
def attach_loop(self, loop):
"""Attach the watcher to an event loop.
If the watcher was previously attached to an event loop, then it is
first detached before attaching to the new loop.
Note: loop may be None.
"""
raise NotImplementedError()
def close(self):
"""Close the watcher.
This must be called to make sure that any underlying resource is freed.
"""
raise NotImplementedError()
def __enter__(self):
"""Enter the watcher's context and allow starting new processes
This function must return self"""
raise NotImplementedError()
def __exit__(self, a, b, c):
"""Exit the watcher's context"""
raise NotImplementedError()
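# --- Minimal usage sketch of the watcher protocol documented above (Unix
# main thread assumed; the on_exit callback and _example name are ours).
def _example_child_watcher():
    import asyncio, subprocess
    loop = asyncio.get_event_loop()
    watcher = asyncio.get_child_watcher()   # SafeChildWatcher by default
    watcher.attach_loop(loop)
    def on_exit(pid, returncode):
        print('pid %d exited with %s' % (pid, returncode))
        loop.stop()
    with watcher:
        proc = subprocess.Popen(['sleep', '1'])
        watcher.add_child_handler(proc.pid, on_exit)
    loop.run_forever()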
class BaseChildWatcher(AbstractChildWatcher):
def __init__(self):
self._loop = None
def close(self):
self.attach_loop(None)
def _do_waitpid(self, expected_pid):
raise NotImplementedError()
def _do_waitpid_all(self):
raise NotImplementedError()
def attach_loop(self, loop):
assert loop is None or isinstance(loop, events.AbstractEventLoop)
if self._loop is not None:
self._loop.remove_signal_handler(signal.SIGCHLD)
self._loop = loop
if loop is not None:
loop.add_signal_handler(signal.SIGCHLD, self._sig_chld)
# Prevent a race condition in case a child terminated
# during the switch.
self._do_waitpid_all()
def _sig_chld(self):
try:
self._do_waitpid_all()
except Exception as exc:
# self._loop should always be available here
# as '_sig_chld' is added as a signal handler
# in 'attach_loop'
self._loop.call_exception_handler({
'message': 'Unknown exception in SIGCHLD handler',
'exception': exc,
})
def _compute_returncode(self, status):
if os.WIFSIGNALED(status):
# The child process died because of a signal.
return -os.WTERMSIG(status)
elif os.WIFEXITED(status):
            # The child process exited (e.g. sys.exit()).
return os.WEXITSTATUS(status)
else:
# The child exited, but we don't understand its status.
# This shouldn't happen, but if it does, let's just
# return that status; perhaps that helps debug it.
return status
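# --- Hedged example (Unix only; the _example name is ours): decoding an
# os.waitpid() status word the same way _compute_returncode() does.
def _example_decode_status():
    import os
    pid = os.fork()
    if pid == 0:
        os._exit(7)                       # child exits with status 7
    _, status = os.waitpid(pid, 0)
    assert os.WIFEXITED(status)
    assert os.WEXITSTATUS(status) == 7    # what _compute_returncode() returns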
class SafeChildWatcher(BaseChildWatcher):
"""'Safe' child watcher implementation.
    This implementation avoids disrupting other code spawning processes by
    explicitly polling each process in the SIGCHLD handler instead of calling
    os.waitpid(-1).
    This is a safe solution, but it has significant overhead when handling a
    large number of children (O(n) each time SIGCHLD is raised).
    """
def __init__(self):
super().__init__()
self._callbacks = {}
def close(self):
self._callbacks.clear()
super().close()
def __enter__(self):
return self
def __exit__(self, a, b, c):
pass
def add_child_handler(self, pid, callback, *args):
self._callbacks[pid] = (callback, args)
# Prevent a race condition in case the child is already terminated.
self._do_waitpid(pid)
def remove_child_handler(self, pid):
try:
del self._callbacks[pid]
return True
except KeyError:
return False
def _do_waitpid_all(self):
for pid in list(self._callbacks):
self._do_waitpid(pid)
def _do_waitpid(self, expected_pid):
assert expected_pid > 0
try:
pid, status = os.waitpid(expected_pid, os.WNOHANG)
except ChildProcessError:
# The child process is already reaped
# (may happen if waitpid() is called elsewhere).
pid = expected_pid
returncode = 255
logger.warning(
"Unknown child process pid %d, will report returncode 255",
pid)
else:
if pid == 0:
# The child process is still alive.
return
returncode = self._compute_returncode(status)
if self._loop.get_debug():
logger.debug('process %s exited with returncode %s',
expected_pid, returncode)
try:
callback, args = self._callbacks.pop(pid)
except KeyError: # pragma: no cover
# May happen if .remove_child_handler() is called
# after os.waitpid() returns.
if self._loop.get_debug():
logger.warning("Child watcher got an unexpected pid: %r",
pid, exc_info=True)
else:
callback(pid, returncode, *args)
class FastChildWatcher(BaseChildWatcher):
"""'Fast' child watcher implementation.
    This implementation reaps all terminated processes by calling
os.waitpid(-1) directly, possibly breaking other code spawning processes
and waiting for their termination.
    There is no noticeable overhead when handling a large number of children
(O(1) each time a child terminates).
"""
def __init__(self):
super().__init__()
self._callbacks = {}
self._lock = threading.Lock()
self._zombies = {}
self._forks = 0
def close(self):
self._callbacks.clear()
self._zombies.clear()
super().close()
def __enter__(self):
with self._lock:
self._forks += 1
return self
def __exit__(self, a, b, c):
with self._lock:
self._forks -= 1
if self._forks or not self._zombies:
return
collateral_victims = str(self._zombies)
self._zombies.clear()
logger.warning(
"Caught subprocesses termination from unknown pids: %s",
collateral_victims)
def add_child_handler(self, pid, callback, *args):
assert self._forks, "Must use the context manager"
with self._lock:
try:
returncode = self._zombies.pop(pid)
except KeyError:
# The child is running.
self._callbacks[pid] = callback, args
return
# The child is dead already. We can fire the callback.
callback(pid, returncode, *args)
def remove_child_handler(self, pid):
try:
del self._callbacks[pid]
return True
except KeyError:
return False
def _do_waitpid_all(self):
# Because of signal coalescing, we must keep calling waitpid() as
# long as we're able to reap a child.
while True:
try:
pid, status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
# No more child processes exist.
return
else:
if pid == 0:
# A child process is still alive.
return
returncode = self._compute_returncode(status)
with self._lock:
try:
callback, args = self._callbacks.pop(pid)
except KeyError:
# unknown child
if self._forks:
# It may not be registered yet.
self._zombies[pid] = returncode
if self._loop.get_debug():
logger.debug('unknown process %s exited '
'with returncode %s',
pid, returncode)
continue
callback = None
else:
if self._loop.get_debug():
logger.debug('process %s exited with returncode %s',
pid, returncode)
if callback is None:
logger.warning(
"Caught subprocess termination from unknown pid: "
"%d -> %d", pid, returncode)
else:
callback(pid, returncode, *args)
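# --- Illustrative sketch: opting in to FastChildWatcher via the policy, as
# its docstring above suggests. Only safe when no other code in the process
# calls os.waitpid(-1). The _example name is hypothetical.
def _example_fast_watcher():
    import asyncio
    asyncio.set_child_watcher(asyncio.FastChildWatcher())
    asyncio.get_child_watcher().attach_loop(asyncio.get_event_loop())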
class _UnixDefaultEventLoopPolicy(events.BaseDefaultEventLoopPolicy):
"""UNIX event loop policy with a watcher for child processes."""
_loop_factory = _UnixSelectorEventLoop
def __init__(self):
super().__init__()
self._watcher = None
def _init_watcher(self):
with events._lock:
if self._watcher is None: # pragma: no branch
self._watcher = SafeChildWatcher()
if isinstance(threading.current_thread(),
threading._MainThread):
self._watcher.attach_loop(self._local._loop)
def set_event_loop(self, loop):
"""Set the event loop.
As a side effect, if a child watcher was set before, then calling
.set_event_loop() from the main thread will call .attach_loop(loop) on
the child watcher.
"""
super().set_event_loop(loop)
if self._watcher is not None and \
isinstance(threading.current_thread(), threading._MainThread):
self._watcher.attach_loop(loop)
def get_child_watcher(self):
"""Get the watcher for child processes.
If not yet set, a SafeChildWatcher object is automatically created.
"""
if self._watcher is None:
self._init_watcher()
return self._watcher
def set_child_watcher(self, watcher):
"""Set the watcher for child processes."""
assert watcher is None or isinstance(watcher, AbstractChildWatcher)
if self._watcher is not None:
self._watcher.close()
self._watcher = watcher
SelectorEventLoop = _UnixSelectorEventLoop
DefaultEventLoopPolicy = _UnixDefaultEventLoopPolicy
"""Selector and proactor event loops for Windows."""
import _winapi
import errno
import math
import socket
import struct
import weakref
from . import events
from . import base_subprocess
from . import futures
from . import proactor_events
from . import selector_events
from . import tasks
from . import windows_utils
from . import _overlapped
from .coroutines import coroutine
from .log import logger
__all__ = ['SelectorEventLoop', 'ProactorEventLoop', 'IocpProactor',
'DefaultEventLoopPolicy',
]
NULL = 0
INFINITE = 0xffffffff
ERROR_CONNECTION_REFUSED = 1225
ERROR_CONNECTION_ABORTED = 1236
# Initial delay in seconds for connect_pipe() before retrying to connect
CONNECT_PIPE_INIT_DELAY = 0.001
# Maximum delay in seconds for connect_pipe() before retrying to connect
CONNECT_PIPE_MAX_DELAY = 0.100
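# --- Sketch of the retry schedule these constants produce in connect_pipe():
# the delay doubles on each ERROR_PIPE_BUSY, capped at CONNECT_PIPE_MAX_DELAY.
# (_example_backoff_schedule is our name, not part of this module.)
def _example_backoff_schedule():
    delay, schedule = CONNECT_PIPE_INIT_DELAY, []
    for _ in range(10):
        schedule.append(delay)
        delay = min(delay * 2, CONNECT_PIPE_MAX_DELAY)
    return schedule   # [0.001, 0.002, 0.004, 0.008, ..., 0.1, 0.1, 0.1]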
class _OverlappedFuture(futures.Future):
"""Subclass of Future which represents an overlapped operation.
Cancelling it will immediately cancel the overlapped operation.
"""
def __init__(self, ov, *, loop=None):
super().__init__(loop=loop)
if self._source_traceback:
del self._source_traceback[-1]
self._ov = ov
def _repr_info(self):
info = super()._repr_info()
if self._ov is not None:
state = 'pending' if self._ov.pending else 'completed'
info.insert(1, 'overlapped=<%s, %#x>' % (state, self._ov.address))
return info
def _cancel_overlapped(self):
if self._ov is None:
return
try:
self._ov.cancel()
except OSError as exc:
context = {
'message': 'Cancelling an overlapped future failed',
'exception': exc,
'future': self,
}
if self._source_traceback:
context['source_traceback'] = self._source_traceback
self._loop.call_exception_handler(context)
self._ov = None
def cancel(self):
self._cancel_overlapped()
return super().cancel()
def set_exception(self, exception):
super().set_exception(exception)
self._cancel_overlapped()
def set_result(self, result):
super().set_result(result)
self._ov = None
class _BaseWaitHandleFuture(futures.Future):
"""Subclass of Future which represents a wait handle."""
def __init__(self, ov, handle, wait_handle, *, loop=None):
super().__init__(loop=loop)
if self._source_traceback:
del self._source_traceback[-1]
# Keep a reference to the Overlapped object to keep it alive until the
# wait is unregistered
self._ov = ov
self._handle = handle
self._wait_handle = wait_handle
# Should we call UnregisterWaitEx() if the wait completes
# or is cancelled?
self._registered = True
def _poll(self):
# non-blocking wait: use a timeout of 0 millisecond
return (_winapi.WaitForSingleObject(self._handle, 0) ==
_winapi.WAIT_OBJECT_0)
def _repr_info(self):
info = super()._repr_info()
info.append('handle=%#x' % self._handle)
if self._handle is not None:
state = 'signaled' if self._poll() else 'waiting'
info.append(state)
if self._wait_handle is not None:
info.append('wait_handle=%#x' % self._wait_handle)
return info
def _unregister_wait_cb(self, fut):
        # The wait was unregistered: it's now safe to destroy the Overlapped
        # object.
self._ov = None
def _unregister_wait(self):
if not self._registered:
return
self._registered = False
wait_handle = self._wait_handle
self._wait_handle = None
try:
_overlapped.UnregisterWait(wait_handle)
except OSError as exc:
if exc.winerror != _overlapped.ERROR_IO_PENDING:
context = {
'message': 'Failed to unregister the wait handle',
'exception': exc,
'future': self,
}
if self._source_traceback:
context['source_traceback'] = self._source_traceback
self._loop.call_exception_handler(context)
return
# ERROR_IO_PENDING means that the unregister is pending
self._unregister_wait_cb(None)
def cancel(self):
self._unregister_wait()
return super().cancel()
def set_exception(self, exception):
self._unregister_wait()
super().set_exception(exception)
def set_result(self, result):
self._unregister_wait()
super().set_result(result)
class _WaitCancelFuture(_BaseWaitHandleFuture):
"""Subclass of Future which represents a wait for the cancellation of a
_WaitHandleFuture using an event.
"""
def __init__(self, ov, event, wait_handle, *, loop=None):
super().__init__(ov, event, wait_handle, loop=loop)
self._done_callback = None
def cancel(self):
raise RuntimeError("_WaitCancelFuture must not be cancelled")
def _schedule_callbacks(self):
super(_WaitCancelFuture, self)._schedule_callbacks()
if self._done_callback is not None:
self._done_callback(self)
class _WaitHandleFuture(_BaseWaitHandleFuture):
def __init__(self, ov, handle, wait_handle, proactor, *, loop=None):
super().__init__(ov, handle, wait_handle, loop=loop)
self._proactor = proactor
self._unregister_proactor = True
self._event = _overlapped.CreateEvent(None, True, False, None)
self._event_fut = None
def _unregister_wait_cb(self, fut):
if self._event is not None:
_winapi.CloseHandle(self._event)
self._event = None
self._event_fut = None
# If the wait was cancelled, the wait may never be signalled, so
# it's required to unregister it. Otherwise, IocpProactor.close() will
# wait forever for an event which will never come.
#
# If the IocpProactor already received the event, it's safe to call
# _unregister() because we kept a reference to the Overlapped object
        # which is used as a unique key.
self._proactor._unregister(self._ov)
self._proactor = None
super()._unregister_wait_cb(fut)
def _unregister_wait(self):
if not self._registered:
return
self._registered = False
wait_handle = self._wait_handle
self._wait_handle = None
try:
_overlapped.UnregisterWaitEx(wait_handle, self._event)
except OSError as exc:
if exc.winerror != _overlapped.ERROR_IO_PENDING:
context = {
'message': 'Failed to unregister the wait handle',
'exception': exc,
'future': self,
}
if self._source_traceback:
context['source_traceback'] = self._source_traceback
self._loop.call_exception_handler(context)
return
# ERROR_IO_PENDING is not an error, the wait was unregistered
self._event_fut = self._proactor._wait_cancel(self._event,
self._unregister_wait_cb)
class PipeServer(object):
"""Class representing a pipe server.
This is much like a bound, listening socket.
"""
def __init__(self, address):
self._address = address
self._free_instances = weakref.WeakSet()
# initialize the pipe attribute before calling _server_pipe_handle()
# because this function can raise an exception and the destructor calls
# the close() method
self._pipe = None
self._accept_pipe_future = None
self._pipe = self._server_pipe_handle(True)
def _get_unconnected_pipe(self):
        # Create a new instance and return the previous one. This ensures
        # that (until the server is closed) there is always at least one
        # pipe handle for the address, so a client attempting to connect
        # will not fail with FileNotFoundError.
tmp, self._pipe = self._pipe, self._server_pipe_handle(False)
return tmp
def _server_pipe_handle(self, first):
# Return a wrapper for a new pipe handle.
if self.closed():
return None
flags = _winapi.PIPE_ACCESS_DUPLEX | _winapi.FILE_FLAG_OVERLAPPED
if first:
flags |= _winapi.FILE_FLAG_FIRST_PIPE_INSTANCE
h = _winapi.CreateNamedPipe(
self._address, flags,
_winapi.PIPE_TYPE_MESSAGE | _winapi.PIPE_READMODE_MESSAGE |
_winapi.PIPE_WAIT,
_winapi.PIPE_UNLIMITED_INSTANCES,
windows_utils.BUFSIZE, windows_utils.BUFSIZE,
_winapi.NMPWAIT_WAIT_FOREVER, _winapi.NULL)
pipe = windows_utils.PipeHandle(h)
self._free_instances.add(pipe)
return pipe
def closed(self):
return (self._address is None)
def close(self):
if self._accept_pipe_future is not None:
self._accept_pipe_future.cancel()
self._accept_pipe_future = None
# Close all instances which have not been connected to by a client.
if self._address is not None:
for pipe in self._free_instances:
pipe.close()
self._pipe = None
self._address = None
self._free_instances.clear()
__del__ = close
class _WindowsSelectorEventLoop(selector_events.BaseSelectorEventLoop):
"""Windows version of selector event loop."""
def _socketpair(self):
return windows_utils.socketpair()
class ProactorEventLoop(proactor_events.BaseProactorEventLoop):
"""Windows version of proactor event loop using IOCP."""
def __init__(self, proactor=None):
if proactor is None:
proactor = IocpProactor()
super().__init__(proactor)
def _socketpair(self):
return windows_utils.socketpair()
@coroutine
def create_pipe_connection(self, protocol_factory, address):
f = self._proactor.connect_pipe(address)
pipe = yield from f
protocol = protocol_factory()
trans = self._make_duplex_pipe_transport(pipe, protocol,
extra={'addr': address})
return trans, protocol
@coroutine
def start_serving_pipe(self, protocol_factory, address):
server = PipeServer(address)
def loop_accept_pipe(f=None):
pipe = None
try:
if f:
pipe = f.result()
server._free_instances.discard(pipe)
if server.closed():
# A client connected before the server was closed:
# drop the client (close the pipe) and exit
pipe.close()
return
protocol = protocol_factory()
self._make_duplex_pipe_transport(
pipe, protocol, extra={'addr': address})
pipe = server._get_unconnected_pipe()
if pipe is None:
return
f = self._proactor.accept_pipe(pipe)
except OSError as exc:
if pipe and pipe.fileno() != -1:
self.call_exception_handler({
'message': 'Pipe accept failed',
'exception': exc,
'pipe': pipe,
})
pipe.close()
elif self._debug:
logger.warning("Accept pipe failed on pipe %r",
pipe, exc_info=True)
except futures.CancelledError:
if pipe:
pipe.close()
else:
server._accept_pipe_future = f
f.add_done_callback(loop_accept_pipe)
self.call_soon(loop_accept_pipe)
return [server]
@coroutine
def _make_subprocess_transport(self, protocol, args, shell,
stdin, stdout, stderr, bufsize,
extra=None, **kwargs):
waiter = futures.Future(loop=self)
transp = _WindowsSubprocessTransport(self, protocol, args, shell,
stdin, stdout, stderr, bufsize,
waiter=waiter, extra=extra,
**kwargs)
try:
yield from waiter
except Exception as exc:
            # Workaround for CPython bug #23353: using yield/yield-from in an
            # except block of a generator doesn't properly clear sys.exc_info()
err = exc
else:
err = None
if err is not None:
transp.close()
yield from transp._wait()
raise err
return transp
class IocpProactor:
"""Proactor implementation using IOCP."""
def __init__(self, concurrency=0xffffffff):
self._loop = None
self._results = []
self._iocp = _overlapped.CreateIoCompletionPort(
_overlapped.INVALID_HANDLE_VALUE, NULL, 0, concurrency)
self._cache = {}
self._registered = weakref.WeakSet()
self._unregistered = []
self._stopped_serving = weakref.WeakSet()
def __repr__(self):
return ('<%s overlapped#=%s result#=%s>'
% (self.__class__.__name__, len(self._cache),
len(self._results)))
def set_loop(self, loop):
self._loop = loop
def select(self, timeout=None):
if not self._results:
self._poll(timeout)
tmp = self._results
self._results = []
return tmp
def _result(self, value):
fut = futures.Future(loop=self._loop)
fut.set_result(value)
return fut
def recv(self, conn, nbytes, flags=0):
self._register_with_iocp(conn)
ov = _overlapped.Overlapped(NULL)
try:
if isinstance(conn, socket.socket):
ov.WSARecv(conn.fileno(), nbytes, flags)
else:
ov.ReadFile(conn.fileno(), nbytes)
except BrokenPipeError:
return self._result(b'')
def finish_recv(trans, key, ov):
try:
return ov.getresult()
except OSError as exc:
if exc.winerror == _overlapped.ERROR_NETNAME_DELETED:
raise ConnectionResetError(*exc.args)
else:
raise
return self._register(ov, conn, finish_recv)
def send(self, conn, buf, flags=0):
self._register_with_iocp(conn)
ov = _overlapped.Overlapped(NULL)
if isinstance(conn, socket.socket):
ov.WSASend(conn.fileno(), buf, flags)
else:
ov.WriteFile(conn.fileno(), buf)
def finish_send(trans, key, ov):
try:
return ov.getresult()
except OSError as exc:
if exc.winerror == _overlapped.ERROR_NETNAME_DELETED:
raise ConnectionResetError(*exc.args)
else:
raise
return self._register(ov, conn, finish_send)
def accept(self, listener):
self._register_with_iocp(listener)
conn = self._get_accept_socket(listener.family)
ov = _overlapped.Overlapped(NULL)
ov.AcceptEx(listener.fileno(), conn.fileno())
def finish_accept(trans, key, ov):
ov.getresult()
# Use SO_UPDATE_ACCEPT_CONTEXT so getsockname() etc work.
buf = struct.pack('@P', listener.fileno())
conn.setsockopt(socket.SOL_SOCKET,
_overlapped.SO_UPDATE_ACCEPT_CONTEXT, buf)
conn.settimeout(listener.gettimeout())
return conn, conn.getpeername()
@coroutine
def accept_coro(future, conn):
# Coroutine closing the accept socket if the future is cancelled
try:
yield from future
except futures.CancelledError:
conn.close()
raise
future = self._register(ov, listener, finish_accept)
coro = accept_coro(future, conn)
tasks.ensure_future(coro, loop=self._loop)
return future
def connect(self, conn, address):
self._register_with_iocp(conn)
# The socket needs to be locally bound before we call ConnectEx().
try:
_overlapped.BindLocal(conn.fileno(), conn.family)
except OSError as e:
if e.winerror != errno.WSAEINVAL:
raise
# Probably already locally bound; check using getsockname().
if conn.getsockname()[1] == 0:
raise
ov = _overlapped.Overlapped(NULL)
ov.ConnectEx(conn.fileno(), address)
def finish_connect(trans, key, ov):
ov.getresult()
# Use SO_UPDATE_CONNECT_CONTEXT so getsockname() etc work.
conn.setsockopt(socket.SOL_SOCKET,
_overlapped.SO_UPDATE_CONNECT_CONTEXT, 0)
return conn
return self._register(ov, conn, finish_connect)
def accept_pipe(self, pipe):
self._register_with_iocp(pipe)
ov = _overlapped.Overlapped(NULL)
connected = ov.ConnectNamedPipe(pipe.fileno())
if connected:
            # ConnectNamedPipe() failed with ERROR_PIPE_CONNECTED, which means
            # the pipe is already connected. There is no need to wait for the
            # completion of the connection.
return self._result(pipe)
def finish_accept_pipe(trans, key, ov):
ov.getresult()
return pipe
return self._register(ov, pipe, finish_accept_pipe)
@coroutine
def connect_pipe(self, address):
delay = CONNECT_PIPE_INIT_DELAY
while True:
# Unfortunately there is no way to do an overlapped connect to a pipe.
# Call CreateFile() in a loop until it doesn't fail with
# ERROR_PIPE_BUSY
try:
handle = _overlapped.ConnectPipe(address)
break
except OSError as exc:
if exc.winerror != _overlapped.ERROR_PIPE_BUSY:
raise
# ConnectPipe() failed with ERROR_PIPE_BUSY: retry later
delay = min(delay * 2, CONNECT_PIPE_MAX_DELAY)
yield from tasks.sleep(delay, loop=self._loop)
return windows_utils.PipeHandle(handle)
def wait_for_handle(self, handle, timeout=None):
"""Wait for a handle.
Return a Future object. The result of the future is True if the wait
completed, or False if the wait did not complete (on timeout).
"""
return self._wait_for_handle(handle, timeout, False)
def _wait_cancel(self, event, done_callback):
fut = self._wait_for_handle(event, None, True)
# add_done_callback() cannot be used because the wait may only complete
# in IocpProactor.close(), while the event loop is not running.
fut._done_callback = done_callback
return fut
def _wait_for_handle(self, handle, timeout, _is_cancel):
if timeout is None:
ms = _winapi.INFINITE
else:
# RegisterWaitForSingleObject() has a resolution of 1 millisecond,
# round away from zero to wait *at least* timeout seconds.
ms = math.ceil(timeout * 1e3)
# We only create ov so we can use ov.address as a key for the cache.
ov = _overlapped.Overlapped(NULL)
wait_handle = _overlapped.RegisterWaitWithQueue(
handle, self._iocp, ov.address, ms)
if _is_cancel:
f = _WaitCancelFuture(ov, handle, wait_handle, loop=self._loop)
else:
f = _WaitHandleFuture(ov, handle, wait_handle, self,
loop=self._loop)
if f._source_traceback:
del f._source_traceback[-1]
def finish_wait_for_handle(trans, key, ov):
# Note that this second wait means that we should only use
            # this with handle types where a successful wait has no
# effect. So events or processes are all right, but locks
# or semaphores are not. Also note if the handle is
# signalled and then quickly reset, then we may return
# False even though we have not timed out.
return f._poll()
self._cache[ov.address] = (f, ov, 0, finish_wait_for_handle)
return f
def _register_with_iocp(self, obj):
        # To get notifications of finished ops on this object sent to the
        # completion port, we must register the handle.
if obj not in self._registered:
self._registered.add(obj)
_overlapped.CreateIoCompletionPort(obj.fileno(), self._iocp, 0, 0)
# XXX We could also use SetFileCompletionNotificationModes()
# to avoid sending notifications to completion port of ops
# that succeed immediately.
def _register(self, ov, obj, callback):
# Return a future which will be set with the result of the
# operation when it completes. The future's value is actually
# the value returned by callback().
f = _OverlappedFuture(ov, loop=self._loop)
if f._source_traceback:
del f._source_traceback[-1]
if not ov.pending:
# The operation has completed, so no need to postpone the
# work. We cannot take this short cut if we need the
# NumberOfBytes, CompletionKey values returned by
# PostQueuedCompletionStatus().
try:
value = callback(None, None, ov)
except OSError as e:
f.set_exception(e)
else:
f.set_result(value)
# Even if GetOverlappedResult() was called, we have to wait for the
# notification of the completion in GetQueuedCompletionStatus().
# Register the overlapped operation to keep a reference to the
# OVERLAPPED object, otherwise the memory is freed and Windows may
# read uninitialized memory.
# Register the overlapped operation for later. Note that
# we only store obj to prevent it from being garbage
# collected too early.
self._cache[ov.address] = (f, ov, obj, callback)
return f
def _unregister(self, ov):
"""Unregister an overlapped object.
Call this method when its future has been cancelled. The event can
already be signalled (pending in the proactor event queue). It is also
safe if the event is never signalled (because it was cancelled).
"""
self._unregistered.append(ov)
def _get_accept_socket(self, family):
s = socket.socket(family)
s.settimeout(0)
return s
def _poll(self, timeout=None):
if timeout is None:
ms = INFINITE
elif timeout < 0:
raise ValueError("negative timeout")
else:
# GetQueuedCompletionStatus() has a resolution of 1 millisecond,
# round away from zero to wait *at least* timeout seconds.
ms = math.ceil(timeout * 1e3)
if ms >= INFINITE:
raise ValueError("timeout too big")
while True:
status = _overlapped.GetQueuedCompletionStatus(self._iocp, ms)
if status is None:
break
ms = 0
err, transferred, key, address = status
try:
f, ov, obj, callback = self._cache.pop(address)
except KeyError:
if self._loop.get_debug():
self._loop.call_exception_handler({
'message': ('GetQueuedCompletionStatus() returned an '
'unexpected event'),
'status': ('err=%s transferred=%s key=%#x address=%#x'
% (err, transferred, key, address)),
})
# key is either zero, or it is used to return a pipe
# handle which should be closed to avoid a leak.
if key not in (0, _overlapped.INVALID_HANDLE_VALUE):
_winapi.CloseHandle(key)
continue
if obj in self._stopped_serving:
f.cancel()
# Don't call the callback if _register() already read the result or
# if the overlapped has been cancelled
elif not f.done():
try:
value = callback(transferred, key, ov)
except OSError as e:
f.set_exception(e)
self._results.append(f)
else:
f.set_result(value)
self._results.append(f)
        # Remove unregistered futures
for ov in self._unregistered:
self._cache.pop(ov.address, None)
self._unregistered.clear()
def _stop_serving(self, obj):
# obj is a socket or pipe handle. It will be closed in
# BaseProactorEventLoop._stop_serving() which will make any
# pending operations fail quickly.
self._stopped_serving.add(obj)
def close(self):
# Cancel remaining registered operations.
for address, (fut, ov, obj, callback) in list(self._cache.items()):
if fut.cancelled():
# Nothing to do with cancelled futures
pass
elif isinstance(fut, _WaitCancelFuture):
# _WaitCancelFuture must not be cancelled
pass
else:
try:
fut.cancel()
except OSError as exc:
if self._loop is not None:
context = {
'message': 'Cancelling a future failed',
'exception': exc,
'future': fut,
}
if fut._source_traceback:
context['source_traceback'] = fut._source_traceback
self._loop.call_exception_handler(context)
while self._cache:
if not self._poll(1):
logger.debug('taking long time to close proactor')
self._results = []
if self._iocp is not None:
_winapi.CloseHandle(self._iocp)
self._iocp = None
def __del__(self):
self.close()
class _WindowsSubprocessTransport(base_subprocess.BaseSubprocessTransport):
def _start(self, args, shell, stdin, stdout, stderr, bufsize, **kwargs):
self._proc = windows_utils.Popen(
args, shell=shell, stdin=stdin, stdout=stdout, stderr=stderr,
bufsize=bufsize, **kwargs)
def callback(f):
returncode = self._proc.poll()
self._process_exited(returncode)
f = self._loop._proactor.wait_for_handle(int(self._proc._handle))
f.add_done_callback(callback)
SelectorEventLoop = _WindowsSelectorEventLoop
class _WindowsDefaultEventLoopPolicy(events.BaseDefaultEventLoopPolicy):
_loop_factory = SelectorEventLoop
DefaultEventLoopPolicy = _WindowsDefaultEventLoopPolicy
"""
Various Windows specific bits and pieces
"""
import sys
if sys.platform != 'win32': # pragma: no cover
raise ImportError('win32 only')
import _winapi
import itertools
import msvcrt
import os
import socket
import subprocess
import tempfile
import warnings
__all__ = ['socketpair', 'pipe', 'Popen', 'PIPE', 'PipeHandle']
# Constants/globals
BUFSIZE = 8192
PIPE = subprocess.PIPE
STDOUT = subprocess.STDOUT
_mmap_counter = itertools.count()
if hasattr(socket, 'socketpair'):
    # Since Python 3.5, socket.socketpair() is also available on Windows
socketpair = socket.socketpair
else:
# Replacement for socket.socketpair()
def socketpair(family=socket.AF_INET, type=socket.SOCK_STREAM, proto=0):
"""A socket pair usable as a self-pipe, for Windows.
Origin: https://gist.github.com/4325783, by Geert Jansen.
Public domain.
"""
if family == socket.AF_INET:
host = '127.0.0.1'
elif family == socket.AF_INET6:
host = '::1'
else:
raise ValueError("Only AF_INET and AF_INET6 socket address "
"families are supported")
if type != socket.SOCK_STREAM:
raise ValueError("Only SOCK_STREAM socket type is supported")
if proto != 0:
raise ValueError("Only protocol zero is supported")
# We create a connected TCP socket. Note the trick with setblocking(0)
# that prevents us from having to create a thread.
lsock = socket.socket(family, type, proto)
try:
lsock.bind((host, 0))
lsock.listen(1)
# On IPv6, ignore flow_info and scope_id
addr, port = lsock.getsockname()[:2]
csock = socket.socket(family, type, proto)
try:
csock.setblocking(False)
try:
csock.connect((addr, port))
except (BlockingIOError, InterruptedError):
pass
csock.setblocking(True)
ssock, _ = lsock.accept()
except:
csock.close()
raise
finally:
lsock.close()
return (ssock, csock)
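# --- Brief usage sketch of the socketpair() fallback above (hypothetical
# function name): a connected pair where bytes written on one end appear on
# the other, usable as a self-pipe.
def _example_socketpair():
    a, b = socketpair()
    a.sendall(b'ping')
    assert b.recv(4) == b'ping'
    a.close()
    b.close()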
# Replacement for os.pipe() using handles instead of fds
def pipe(*, duplex=False, overlapped=(True, True), bufsize=BUFSIZE):
"""Like os.pipe() but with overlapped support and using handles not fds."""
address = tempfile.mktemp(prefix=r'\\.\pipe\python-pipe-%d-%d-' %
(os.getpid(), next(_mmap_counter)))
if duplex:
openmode = _winapi.PIPE_ACCESS_DUPLEX
access = _winapi.GENERIC_READ | _winapi.GENERIC_WRITE
obsize, ibsize = bufsize, bufsize
else:
openmode = _winapi.PIPE_ACCESS_INBOUND
access = _winapi.GENERIC_WRITE
obsize, ibsize = 0, bufsize
openmode |= _winapi.FILE_FLAG_FIRST_PIPE_INSTANCE
if overlapped[0]:
openmode |= _winapi.FILE_FLAG_OVERLAPPED
if overlapped[1]:
flags_and_attribs = _winapi.FILE_FLAG_OVERLAPPED
else:
flags_and_attribs = 0
h1 = h2 = None
try:
h1 = _winapi.CreateNamedPipe(
address, openmode, _winapi.PIPE_WAIT,
1, obsize, ibsize, _winapi.NMPWAIT_WAIT_FOREVER, _winapi.NULL)
h2 = _winapi.CreateFile(
address, access, 0, _winapi.NULL, _winapi.OPEN_EXISTING,
flags_and_attribs, _winapi.NULL)
ov = _winapi.ConnectNamedPipe(h1, overlapped=True)
ov.GetOverlappedResult(True)
return h1, h2
except:
if h1 is not None:
_winapi.CloseHandle(h1)
if h2 is not None:
_winapi.CloseHandle(h2)
raise
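# --- Hedged usage sketch (Windows only; names are ours): wrapping the raw
# handles returned by pipe() in PipeHandle (defined below) so they are closed
# deterministically.
def _example_pipe_handles():
    server_h, client_h = pipe(overlapped=(True, True), duplex=True)
    with PipeHandle(server_h) as sp, PipeHandle(client_h) as cp:
        print(sp, cp)   # reprs show the raw handle values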
# Wrapper for a pipe handle
class PipeHandle:
"""Wrapper for an overlapped pipe handle which is vaguely file-object like.
The IOCP event loop can use these instead of socket objects.
"""
def __init__(self, handle):
self._handle = handle
def __repr__(self):
if self._handle is not None:
handle = 'handle=%r' % self._handle
else:
handle = 'closed'
return '<%s %s>' % (self.__class__.__name__, handle)
@property
def handle(self):
return self._handle
def fileno(self):
if self._handle is None:
raise ValueError("I/O operatioon on closed pipe")
return self._handle
def close(self, *, CloseHandle=_winapi.CloseHandle):
if self._handle is not None:
CloseHandle(self._handle)
self._handle = None
def __del__(self):
if self._handle is not None:
warnings.warn("unclosed %r" % self, ResourceWarning)
self.close()
def __enter__(self):
return self
def __exit__(self, t, v, tb):
self.close()
# Replacement for subprocess.Popen using overlapped pipe handles
class Popen(subprocess.Popen):
"""Replacement for subprocess.Popen using overlapped pipe handles.
The stdin, stdout, stderr are None or instances of PipeHandle.
"""
def __init__(self, args, stdin=None, stdout=None, stderr=None, **kwds):
assert not kwds.get('universal_newlines')
assert kwds.get('bufsize', 0) == 0
stdin_rfd = stdout_wfd = stderr_wfd = None
stdin_wh = stdout_rh = stderr_rh = None
if stdin == PIPE:
stdin_rh, stdin_wh = pipe(overlapped=(False, True), duplex=True)
stdin_rfd = msvcrt.open_osfhandle(stdin_rh, os.O_RDONLY)
else:
stdin_rfd = stdin
if stdout == PIPE:
stdout_rh, stdout_wh = pipe(overlapped=(True, False))
stdout_wfd = msvcrt.open_osfhandle(stdout_wh, 0)
else:
stdout_wfd = stdout
if stderr == PIPE:
stderr_rh, stderr_wh = pipe(overlapped=(True, False))
stderr_wfd = msvcrt.open_osfhandle(stderr_wh, 0)
elif stderr == STDOUT:
stderr_wfd = stdout_wfd
else:
stderr_wfd = stderr
try:
super().__init__(args, stdin=stdin_rfd, stdout=stdout_wfd,
stderr=stderr_wfd, **kwds)
except:
for h in (stdin_wh, stdout_rh, stderr_rh):
if h is not None:
_winapi.CloseHandle(h)
raise
else:
if stdin_wh is not None:
self.stdin = PipeHandle(stdin_wh)
if stdout_rh is not None:
self.stdout = PipeHandle(stdout_rh)
if stderr_rh is not None:
self.stderr = PipeHandle(stderr_rh)
finally:
if stdin == PIPE:
os.close(stdin_rfd)
if stdout == PIPE:
os.close(stdout_wfd)
if stderr == PIPE:
os.close(stderr_wfd)
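# --- Illustrative sketch (Windows only; hypothetical name): this Popen
# variant hands back PipeHandle objects instead of file descriptors for
# PIPE streams, ready for overlapped I/O through the proactor.
def _example_overlapped_popen():
    p = Popen(['cmd', '/c', 'echo hi'], stdout=PIPE)
    assert isinstance(p.stdout, PipeHandle)
    p.wait()
    p.stdout.close()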
"""The asyncio package, tracking PEP 3156."""
import sys
# The selectors module is in the stdlib in Python 3.4 but not in 3.3.
# Do this first, so the other submodules can use "from . import selectors".
# Prefer asyncio/selectors.py over the stdlib one, as ours may be newer.
try:
from . import selectors
except ImportError:
import selectors # Will also be exported.
if sys.platform == 'win32':
# Similar thing for _overlapped.
try:
from . import _overlapped
except ImportError:
import _overlapped # Will also be exported.
# This relies on each of the submodules having an __all__ variable.
from .base_events import *
from .coroutines import *
from .events import *
from .futures import *
from .locks import *
from .protocols import *
from .queues import *
from .streams import *
from .subprocess import *
from .tasks import *
from .transports import *
__all__ = (base_events.__all__ +
coroutines.__all__ +
events.__all__ +
futures.__all__ +
locks.__all__ +
protocols.__all__ +
queues.__all__ +
streams.__all__ +
subprocess.__all__ +
tasks.__all__ +
transports.__all__)
if sys.platform == 'win32': # pragma: no cover
from .windows_events import *
__all__ += windows_events.__all__
else:
from .unix_events import * # pragma: no cover
__all__ += unix_events.__all__
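# --- Illustrative: the star-imports above flatten the public API, so user
# code only needs the top-level package (the _example name is ours).
def _example_flat_api():
    import asyncio
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.sleep(0))
    loop.close()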
from _collections_abc import *
from _collections_abc import __all__
__all__ = ['deque', 'defaultdict', 'namedtuple', 'UserDict', 'UserList',
'UserString', 'Counter', 'OrderedDict', 'ChainMap']
# For backwards compatibility, continue to make the collections ABCs
# available through the collections module.
from _collections_abc import *
import _collections_abc
__all__ += _collections_abc.__all__
from _collections import deque, defaultdict
from operator import itemgetter as _itemgetter, eq as _eq
from keyword import iskeyword as _iskeyword
import sys as _sys
import heapq as _heapq
from _weakref import proxy as _proxy
from itertools import repeat as _repeat, chain as _chain, starmap as _starmap
from reprlib import recursive_repr as _recursive_repr
################################################################################
### OrderedDict
################################################################################
class _Link(object):
__slots__ = 'prev', 'next', 'key', '__weakref__'
class OrderedDict(dict):
'Dictionary that remembers insertion order'
# An inherited dict maps keys to values.
# The inherited dict provides __getitem__, __len__, __contains__, and get.
# The remaining methods are order-aware.
# Big-O running times for all methods are the same as regular dictionaries.
# The internal self.__map dict maps keys to links in a doubly linked list.
# The circular doubly linked list starts and ends with a sentinel element.
# The sentinel element never gets deleted (this simplifies the algorithm).
# The sentinel is in self.__hardroot with a weakref proxy in self.__root.
# The prev links are weakref proxies (to prevent circular references).
# Individual links are kept alive by the hard reference in self.__map.
# Those hard references disappear when a key is deleted from an OrderedDict.
def __init__(*args, **kwds):
'''Initialize an ordered dictionary. The signature is the same as
regular dictionaries, but keyword arguments are not recommended because
their insertion order is arbitrary.
'''
if not args:
raise TypeError("descriptor '__init__' of 'OrderedDict' object "
"needs an argument")
self, *args = args
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
try:
self.__root
except AttributeError:
self.__hardroot = _Link()
self.__root = root = _proxy(self.__hardroot)
root.prev = root.next = root
self.__map = {}
self.__update(*args, **kwds)
def __setitem__(self, key, value,
dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
'od.__setitem__(i, y) <==> od[i]=y'
# Setting a new item creates a new link at the end of the linked list,
# and the inherited dictionary is updated with the new key/value pair.
if key not in self:
self.__map[key] = link = Link()
root = self.__root
last = root.prev
link.prev, link.next, link.key = last, root, key
last.next = link
root.prev = proxy(link)
dict_setitem(self, key, value)
def __delitem__(self, key, dict_delitem=dict.__delitem__):
'od.__delitem__(y) <==> del od[y]'
# Deleting an existing item uses self.__map to find the link which gets
# removed by updating the links in the predecessor and successor nodes.
dict_delitem(self, key)
link = self.__map.pop(key)
link_prev = link.prev
link_next = link.next
link_prev.next = link_next
link_next.prev = link_prev
def __iter__(self):
'od.__iter__() <==> iter(od)'
# Traverse the linked list in order.
root = self.__root
curr = root.next
while curr is not root:
yield curr.key
curr = curr.next
def __reversed__(self):
'od.__reversed__() <==> reversed(od)'
# Traverse the linked list in reverse order.
root = self.__root
curr = root.prev
while curr is not root:
yield curr.key
curr = curr.prev
def clear(self):
'od.clear() -> None. Remove all items from od.'
root = self.__root
root.prev = root.next = root
self.__map.clear()
dict.clear(self)
def popitem(self, last=True):
'''od.popitem() -> (k, v), return and remove a (key, value) pair.
Pairs are returned in LIFO order if last is true or FIFO order if false.
'''
if not self:
raise KeyError('dictionary is empty')
root = self.__root
if last:
link = root.prev
link_prev = link.prev
link_prev.next = root
root.prev = link_prev
else:
link = root.next
link_next = link.next
root.next = link_next
link_next.prev = root
key = link.key
del self.__map[key]
value = dict.pop(self, key)
return key, value
def move_to_end(self, key, last=True):
'''Move an existing element to the end (or beginning if last==False).
Raises KeyError if the element does not exist.
When last=True, acts like a fast version of self[key]=self.pop(key).
'''
link = self.__map[key]
link_prev = link.prev
link_next = link.next
link_prev.next = link_next
link_next.prev = link_prev
root = self.__root
if last:
last = root.prev
link.prev = last
link.next = root
last.next = root.prev = link
else:
first = root.next
link.prev = root
link.next = first
root.next = first.prev = link
def __sizeof__(self):
sizeof = _sys.getsizeof
n = len(self) + 1 # number of links including root
size = sizeof(self.__dict__) # instance dictionary
size += sizeof(self.__map) * 2 # internal dict and inherited dict
size += sizeof(self.__hardroot) * n # link objects
size += sizeof(self.__root) * n # proxy objects
return size
update = __update = MutableMapping.update
keys = MutableMapping.keys
values = MutableMapping.values
items = MutableMapping.items
__ne__ = MutableMapping.__ne__
__marker = object()
def pop(self, key, default=__marker):
'''od.pop(k[,d]) -> v, remove specified key and return the corresponding
value. If key is not found, d is returned if given, otherwise KeyError
is raised.
'''
if key in self:
result = self[key]
del self[key]
return result
if default is self.__marker:
raise KeyError(key)
return default
def setdefault(self, key, default=None):
'od.setdefault(k[,d]) -> od.get(k,d), also set od[k]=d if k not in od'
if key in self:
return self[key]
self[key] = default
return default
@_recursive_repr()
def __repr__(self):
'od.__repr__() <==> repr(od)'
if not self:
return '%s()' % (self.__class__.__name__,)
return '%s(%r)' % (self.__class__.__name__, list(self.items()))
def __reduce__(self):
'Return state information for pickling'
inst_dict = vars(self).copy()
for k in vars(OrderedDict()):
inst_dict.pop(k, None)
return self.__class__, (), inst_dict or None, None, iter(self.items())
def copy(self):
'od.copy() -> a shallow copy of od'
return self.__class__(self)
@classmethod
def fromkeys(cls, iterable, value=None):
'''OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S.
If not specified, the value defaults to None.
'''
self = cls()
for key in iterable:
self[key] = value
return self
def __eq__(self, other):
'''od.__eq__(y) <==> od==y. Comparison to another OD is order-sensitive
while comparison to a regular mapping is order-insensitive.
'''
if isinstance(other, OrderedDict):
return dict.__eq__(self, other) and all(map(_eq, self, other))
return dict.__eq__(self, other)
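# --- Brief usage sketch of the order-aware methods defined above
# (hypothetical helper name).
def _example_ordereddict():
    od = OrderedDict.fromkeys('abc')
    od.move_to_end('a')                            # order is now b, c, a
    assert list(od) == ['b', 'c', 'a']
    assert od.popitem(last=False) == ('b', None)   # FIFO pop from the front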
################################################################################
### namedtuple
################################################################################
_class_template = """\
from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict
class {typename}(tuple):
'{typename}({arg_list})'
__slots__ = ()
_fields = {field_names!r}
def __new__(_cls, {arg_list}):
'Create new instance of {typename}({arg_list})'
return _tuple.__new__(_cls, ({arg_list}))
@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
'Make a new {typename} object from a sequence or iterable'
result = new(cls, iterable)
if len(result) != {num_fields:d}:
raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
return result
def _replace(_self, **kwds):
'Return a new {typename} object replacing specified fields with new values'
result = _self._make(map(kwds.pop, {field_names!r}, _self))
if kwds:
raise ValueError('Got unexpected field names: %r' % list(kwds))
return result
def __repr__(self):
'Return a nicely formatted representation string'
return self.__class__.__name__ + '({repr_fmt})' % self
def _asdict(self):
'Return a new OrderedDict which maps field names to their values.'
return OrderedDict(zip(self._fields, self))
def __getnewargs__(self):
'Return self as a plain tuple. Used by copy and pickle.'
return tuple(self)
{field_defs}
"""
_repr_template = '{name}=%r'
_field_template = '''\
{name} = _property(_itemgetter({index:d}), doc='Alias for field number {index:d}')
'''
def namedtuple(typename, field_names, verbose=False, rename=False):
"""Returns a new subclass of tuple with named fields.
>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point.__doc__ # docstring for the new class
'Point(x, y)'
>>> p = Point(11, y=22) # instantiate with positional args or keywords
>>> p[0] + p[1] # indexable like a plain tuple
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
    >>> p.x + p.y # fields also accessible by name
33
>>> d = p._asdict() # convert to a dictionary
>>> d['x']
11
>>> Point(**d) # convert from a dictionary
Point(x=11, y=22)
>>> p._replace(x=100) # _replace() is like str.replace() but targets named fields
Point(x=100, y=22)
"""
# Validate the field names. At the user's option, either generate an error
# message or automatically replace the field name with a valid name.
if isinstance(field_names, str):
field_names = field_names.replace(',', ' ').split()
field_names = list(map(str, field_names))
typename = str(typename)
if rename:
seen = set()
for index, name in enumerate(field_names):
if (not name.isidentifier()
or _iskeyword(name)
or name.startswith('_')
or name in seen):
field_names[index] = '_%d' % index
seen.add(name)
for name in [typename] + field_names:
if type(name) != str:
raise TypeError('Type names and field names must be strings')
if not name.isidentifier():
raise ValueError('Type names and field names must be valid '
'identifiers: %r' % name)
if _iskeyword(name):
raise ValueError('Type names and field names cannot be a '
'keyword: %r' % name)
seen = set()
for name in field_names:
if name.startswith('_') and not rename:
raise ValueError('Field names cannot start with an underscore: '
'%r' % name)
if name in seen:
raise ValueError('Encountered duplicate field name: %r' % name)
seen.add(name)
    # Fill in the class template
class_definition = _class_template.format(
typename = typename,
field_names = tuple(field_names),
num_fields = len(field_names),
arg_list = repr(tuple(field_names)).replace("'", "")[1:-1],
repr_fmt = ', '.join(_repr_template.format(name=name)
for name in field_names),
field_defs = '\n'.join(_field_template.format(index=index, name=name)
for index, name in enumerate(field_names))
)
# Execute the template string in a temporary namespace and support
# tracing utilities by setting a value for frame.f_globals['__name__']
namespace = dict(__name__='namedtuple_%s' % typename)
exec(class_definition, namespace)
result = namespace[typename]
result._source = class_definition
if verbose:
print(result._source)
# For pickling to work, the __module__ variable needs to be set to the frame
# where the named tuple is created. Bypass this step in environments where
# sys._getframe is not defined (Jython for example) or sys._getframe is not
# defined for arguments greater than 0 (IronPython).
try:
result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
except (AttributeError, ValueError):
pass
return result
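# --- Brief sketch of the rename=True path described above (hypothetical
# helper name): invalid or duplicate field names are replaced positionally.
def _example_namedtuple_rename():
    T = namedtuple('T', ['abc', 'def', 'abc'], rename=True)
    assert T._fields == ('abc', '_1', '_2')   # 'def' is a keyword; 'abc' dup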
########################################################################
### Counter
########################################################################
def _count_elements(mapping, iterable):
'Tally elements from the iterable.'
mapping_get = mapping.get
for elem in iterable:
mapping[elem] = mapping_get(elem, 0) + 1
try: # Load C helper function if available
from _collections import _count_elements
except ImportError:
pass
class Counter(dict):
'''Dict subclass for counting hashable items. Sometimes called a bag
or multiset. Elements are stored as dictionary keys and their counts
are stored as dictionary values.
>>> c = Counter('abcdeabcdabcaba') # count elements from a string
>>> c.most_common(3) # three most common elements
[('a', 5), ('b', 4), ('c', 3)]
>>> sorted(c) # list all unique elements
['a', 'b', 'c', 'd', 'e']
>>> ''.join(sorted(c.elements())) # list elements with repetitions
'aaaaabbbbcccdde'
>>> sum(c.values()) # total of all counts
15
>>> c['a'] # count of letter 'a'
5
>>> for elem in 'shazam': # update counts from an iterable
... c[elem] += 1 # by adding 1 to each element's count
>>> c['a'] # now there are seven 'a'
7
>>> del c['b'] # remove all 'b'
>>> c['b'] # now there are zero 'b'
0
>>> d = Counter('simsalabim') # make another counter
>>> c.update(d) # add in the second counter
>>> c['a'] # now there are nine 'a'
9
>>> c.clear() # empty the counter
>>> c
Counter()
Note: If a count is set to zero or reduced to zero, it will remain
in the counter until the entry is deleted or the counter is cleared:
>>> c = Counter('aaabbc')
>>> c['b'] -= 2 # reduce the count of 'b' by two
>>> c.most_common() # 'b' is still in, but its count is zero
[('a', 3), ('c', 1), ('b', 0)]
'''
# References:
# http://en.wikipedia.org/wiki/Multiset
# http://www.gnu.org/software/smalltalk/manual-base/html_node/Bag.html
# http://www.demo2s.com/Tutorial/Cpp/0380__set-multiset/Catalog0380__set-multiset.htm
# http://code.activestate.com/recipes/259174/
# Knuth, TAOCP Vol. II section 4.6.3
def __init__(*args, **kwds):
'''Create a new, empty Counter object. And if given, count elements
from an input iterable. Or, initialize the count from another mapping
of elements to their counts.
>>> c = Counter() # a new, empty counter
>>> c = Counter('gallahad') # a new counter from an iterable
>>> c = Counter({'a': 4, 'b': 2}) # a new counter from a mapping
>>> c = Counter(a=4, b=2) # a new counter from keyword args
'''
if not args:
raise TypeError("descriptor '__init__' of 'Counter' object "
"needs an argument")
self, *args = args
if len(args) > 1:
raise TypeError('expected at most 1 arguments, got %d' % len(args))
super(Counter, self).__init__()
self.update(*args, **kwds)
def __missing__(self, key):
'The count of elements not in the Counter is zero.'
# Needed so that self[missing_item] does not raise KeyError
return 0
def most_common(self, n=None):
'''List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abcdeabcdabcaba').most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
'''
# Emulate Bag.sortedByCount from Smalltalk
if n is None:
return sorted(self.items(), key=_itemgetter(1), reverse=True)
return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
def elements(self):
'''Iterator over elements repeating each as many times as its count.
>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']
# Knuth's example for prime factors of 1836: 2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements(): # loop over factors
... product *= factor # and multiply them
>>> product
1836
Note, if an element's count has been set to zero or is a negative
number, elements() will ignore it.
'''
# Emulate Bag.do from Smalltalk and Multiset.begin from C++.
return _chain.from_iterable(_starmap(_repeat, self.items()))
# Override dict methods where necessary
@classmethod
def fromkeys(cls, iterable, v=None):
# There is no equivalent method for counters because setting v=1
# means that no element can have a count greater than one.
raise NotImplementedError(
'Counter.fromkeys() is undefined. Use Counter(iterable) instead.')
def update(*args, **kwds):
'''Like dict.update() but add counts instead of replacing them.
Source can be an iterable, a dictionary, or another Counter instance.
>>> c = Counter('which')
>>> c.update('witch') # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d) # add elements from another counter
>>> c['h'] # four 'h' in which, witch, and watch
4
'''
        # The regular dict.update() operation makes no sense here because the
        # replace behavior results in some of the original untouched counts
        # being mixed in with the other counts, a mishmash that doesn't have
        # a straightforward interpretation in most counting contexts.
        # Instead, we implement straight addition. Both the inputs and
        # outputs are allowed to contain zero and negative counts.
if not args:
raise TypeError("descriptor 'update' of 'Counter' object "
"needs an argument")
self, *args = args
if len(args) > 1:
            raise TypeError('expected at most 1 argument, got %d' % len(args))
iterable = args[0] if args else None
if iterable is not None:
if isinstance(iterable, Mapping):
if self:
self_get = self.get
for elem, count in iterable.items():
self[elem] = count + self_get(elem, 0)
else:
super(Counter, self).update(iterable) # fast path when counter is empty
else:
_count_elements(self, iterable)
if kwds:
self.update(kwds)
def subtract(*args, **kwds):
'''Like dict.update() but subtracts counts instead of replacing them.
Counts can be reduced below zero. Both the inputs and outputs are
allowed to contain zero and negative counts.
Source can be an iterable, a dictionary, or another Counter instance.
>>> c = Counter('which')
>>> c.subtract('witch') # subtract elements from another iterable
>>> c.subtract(Counter('watch')) # subtract elements from another counter
>>> c['h'] # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w'] # 1 in which, minus 1 in witch, minus 1 in watch
-1
'''
if not args:
raise TypeError("descriptor 'subtract' of 'Counter' object "
"needs an argument")
self, *args = args
if len(args) > 1:
            raise TypeError('expected at most 1 argument, got %d' % len(args))
iterable = args[0] if args else None
if iterable is not None:
self_get = self.get
if isinstance(iterable, Mapping):
for elem, count in iterable.items():
self[elem] = self_get(elem, 0) - count
else:
for elem in iterable:
self[elem] = self_get(elem, 0) - 1
if kwds:
self.subtract(kwds)
def copy(self):
'Return a shallow copy.'
return self.__class__(self)
def __reduce__(self):
return self.__class__, (dict(self),)
def __delitem__(self, elem):
'Like dict.__delitem__() but does not raise KeyError for missing values.'
if elem in self:
super().__delitem__(elem)
def __repr__(self):
if not self:
return '%s()' % self.__class__.__name__
try:
items = ', '.join(map('%r: %r'.__mod__, self.most_common()))
return '%s({%s})' % (self.__class__.__name__, items)
except TypeError:
# handle case where values are not orderable
return '{0}({1!r})'.format(self.__class__.__name__, dict(self))
# Multiset-style mathematical operations discussed in:
# Knuth TAOCP Volume II section 4.6.3 exercise 19
# and at http://en.wikipedia.org/wiki/Multiset
#
# Outputs guaranteed to only include positive counts.
#
# To strip negative and zero counts, add-in an empty counter:
# c += Counter()
def __add__(self, other):
'''Add counts from two counters.
>>> Counter('abbb') + Counter('bcc')
Counter({'b': 4, 'c': 2, 'a': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
newcount = count + other[elem]
if newcount > 0:
result[elem] = newcount
for elem, count in other.items():
if elem not in self and count > 0:
result[elem] = count
return result
def __sub__(self, other):
        '''Subtract counts, but keep only results with positive counts.
>>> Counter('abbbc') - Counter('bccd')
Counter({'b': 2, 'a': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
newcount = count - other[elem]
if newcount > 0:
result[elem] = newcount
for elem, count in other.items():
if elem not in self and count < 0:
result[elem] = 0 - count
return result
def __or__(self, other):
        '''Union is the maximum of the values in either of the input counters.
>>> Counter('abbb') | Counter('bcc')
Counter({'b': 3, 'c': 2, 'a': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
other_count = other[elem]
newcount = other_count if count < other_count else count
if newcount > 0:
result[elem] = newcount
for elem, count in other.items():
if elem not in self and count > 0:
result[elem] = count
return result
def __and__(self, other):
''' Intersection is the minimum of corresponding counts.
>>> Counter('abbb') & Counter('bcc')
Counter({'b': 1})
'''
if not isinstance(other, Counter):
return NotImplemented
result = Counter()
for elem, count in self.items():
other_count = other[elem]
newcount = count if count < other_count else other_count
if newcount > 0:
result[elem] = newcount
return result
def __pos__(self):
'Adds an empty counter, effectively stripping negative and zero counts'
return self + Counter()
def __neg__(self):
'''Subtracts from an empty counter. Strips positive and zero counts,
and flips the sign on negative counts.
'''
return Counter() - self
def _keep_positive(self):
'''Internal method to strip elements with a negative or zero count'''
nonpositive = [elem for elem, count in self.items() if not count > 0]
for elem in nonpositive:
del self[elem]
return self
def __iadd__(self, other):
'''Inplace add from another counter, keeping only positive counts.
>>> c = Counter('abbb')
>>> c += Counter('bcc')
>>> c
Counter({'b': 4, 'c': 2, 'a': 1})
'''
for elem, count in other.items():
self[elem] += count
return self._keep_positive()
def __isub__(self, other):
'''Inplace subtract counter, but keep only results with positive counts.
>>> c = Counter('abbbc')
>>> c -= Counter('bccd')
>>> c
Counter({'b': 2, 'a': 1})
'''
for elem, count in other.items():
self[elem] -= count
return self._keep_positive()
def __ior__(self, other):
        '''Inplace union is the maximum of the values from either counter.
>>> c = Counter('abbb')
>>> c |= Counter('bcc')
>>> c
Counter({'b': 3, 'c': 2, 'a': 1})
'''
for elem, other_count in other.items():
count = self[elem]
if other_count > count:
self[elem] = other_count
return self._keep_positive()
def __iand__(self, other):
'''Inplace intersection is the minimum of corresponding counts.
>>> c = Counter('abbb')
>>> c &= Counter('bcc')
>>> c
Counter({'b': 1})
'''
for elem, count in self.items():
other_count = other[elem]
if other_count < count:
self[elem] = other_count
return self._keep_positive()
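# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): a quick tour of the
# Counter behaviors documented above. The _demo_counter name is
# hypothetical; everything it calls is defined by the class itself.
def _demo_counter():
    inventory = Counter(apples=3, oranges=1)   # keyword-argument init
    inventory.update(['apples', 'pears'])      # iterables add one per element
    assert inventory['apples'] == 4
    surplus = inventory - Counter(apples=5)    # __sub__ keeps positive counts only
    assert 'apples' not in surplus
    combined = inventory | Counter(apples=10)  # union takes the maximum count
    assert combined['apples'] == 10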
########################################################################
### ChainMap (helper for configparser and string.Template)
########################################################################
class ChainMap(MutableMapping):
''' A ChainMap groups multiple dicts (or other mappings) together
to create a single, updateable view.
    The underlying mappings are stored in a list. That list is public and can
    be accessed or updated using the *maps* attribute. There is no other state.
Lookups search the underlying mappings successively until a key is found.
In contrast, writes, updates, and deletions only operate on the first
mapping.
'''
def __init__(self, *maps):
'''Initialize a ChainMap by setting *maps* to the given mappings.
If no mappings are provided, a single empty dictionary is used.
'''
self.maps = list(maps) or [{}] # always at least one map
def __missing__(self, key):
raise KeyError(key)
def __getitem__(self, key):
for mapping in self.maps:
try:
return mapping[key] # can't use 'key in mapping' with defaultdict
except KeyError:
pass
return self.__missing__(key) # support subclasses that define __missing__
def get(self, key, default=None):
return self[key] if key in self else default
def __len__(self):
return len(set().union(*self.maps)) # reuses stored hash values if possible
def __iter__(self):
return iter(set().union(*self.maps))
def __contains__(self, key):
return any(key in m for m in self.maps)
def __bool__(self):
return any(self.maps)
@_recursive_repr()
def __repr__(self):
return '{0.__class__.__name__}({1})'.format(
self, ', '.join(map(repr, self.maps)))
@classmethod
def fromkeys(cls, iterable, *args):
'Create a ChainMap with a single dict created from the iterable.'
return cls(dict.fromkeys(iterable, *args))
def copy(self):
'New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]'
return self.__class__(self.maps[0].copy(), *self.maps[1:])
__copy__ = copy
def new_child(self, m=None): # like Django's Context.push()
'''New ChainMap with a new map followed by all previous maps.
If no map is provided, an empty dict is used.
'''
if m is None:
m = {}
return self.__class__(m, *self.maps)
@property
def parents(self): # like Django's Context.pop()
'New ChainMap from maps[1:].'
return self.__class__(*self.maps[1:])
def __setitem__(self, key, value):
self.maps[0][key] = value
def __delitem__(self, key):
try:
del self.maps[0][key]
except KeyError:
raise KeyError('Key not found in the first mapping: {!r}'.format(key))
def popitem(self):
        'Remove and return an item pair from maps[0]. Raise KeyError if maps[0] is empty.'
try:
return self.maps[0].popitem()
except KeyError:
raise KeyError('No keys found in the first mapping.')
def pop(self, key, *args):
'Remove *key* from maps[0] and return its value. Raise KeyError if *key* not in maps[0].'
try:
return self.maps[0].pop(key, *args)
except KeyError:
raise KeyError('Key not found in the first mapping: {!r}'.format(key))
def clear(self):
'Clear maps[0], leaving maps[1:] intact.'
self.maps[0].clear()
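# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): demonstrates the
# lookup-versus-write asymmetry described in the class docstring. The
# _demo_chainmap name is hypothetical.
def _demo_chainmap():
    defaults = {'color': 'red', 'user': 'guest'}
    overrides = {'user': 'admin'}
    cm = ChainMap(overrides, defaults)
    assert cm['user'] == 'admin'        # lookups search maps in order
    assert cm['color'] == 'red'         # ...falling through to later maps
    cm['color'] = 'blue'                # writes only touch maps[0]
    assert defaults['color'] == 'red'
    child = cm.new_child()              # push a fresh scope, like Context.push()
    child['user'] = 'temp'
    assert cm['user'] == 'admin'        # the parent chain is unchanged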
################################################################################
### UserDict
################################################################################
class UserDict(MutableMapping):
# Start by filling-out the abstract methods
def __init__(*args, **kwargs):
if not args:
raise TypeError("descriptor '__init__' of 'UserDict' object "
"needs an argument")
self, *args = args
if len(args) > 1:
            raise TypeError('expected at most 1 argument, got %d' % len(args))
if args:
dict = args[0]
elif 'dict' in kwargs:
dict = kwargs.pop('dict')
import warnings
warnings.warn("Passing 'dict' as keyword argument is deprecated",
PendingDeprecationWarning, stacklevel=2)
else:
dict = None
self.data = {}
if dict is not None:
self.update(dict)
if len(kwargs):
self.update(kwargs)
def __len__(self): return len(self.data)
def __getitem__(self, key):
if key in self.data:
return self.data[key]
if hasattr(self.__class__, "__missing__"):
return self.__class__.__missing__(self, key)
raise KeyError(key)
def __setitem__(self, key, item): self.data[key] = item
def __delitem__(self, key): del self.data[key]
def __iter__(self):
return iter(self.data)
# Modify __contains__ to work correctly when __missing__ is present
def __contains__(self, key):
return key in self.data
# Now, add the methods in dicts but not in MutableMapping
def __repr__(self): return repr(self.data)
def copy(self):
if self.__class__ is UserDict:
return UserDict(self.data.copy())
import copy
data = self.data
try:
self.data = {}
c = copy.copy(self)
finally:
self.data = data
c.update(self)
return c
@classmethod
def fromkeys(cls, iterable, value=None):
d = cls()
for key in iterable:
d[key] = value
return d
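# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): the point of
# UserDict is that every access goes through __getitem__/__setitem__, so
# a subclass can intercept them all in one place. _LowerKeyDict is a
# hypothetical example class.
class _LowerKeyDict(UserDict):
    'Dict subclass that normalizes all string keys to lower case.'
    def __setitem__(self, key, item):
        self.data[key.lower()] = item
    def __getitem__(self, key):
        return self.data[key.lower()]

# Even the constructor honors the override, because UserDict.__init__
# funnels its initial data through update() and hence __setitem__:
#     d = _LowerKeyDict({'Name': 'ada'})
#     d['NAME']  # -> 'ada'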
################################################################################
### UserList
################################################################################
class UserList(MutableSequence):
"""A more or less complete user-defined wrapper around list objects."""
def __init__(self, initlist=None):
self.data = []
if initlist is not None:
# XXX should this accept an arbitrary sequence?
if type(initlist) == type(self.data):
self.data[:] = initlist
elif isinstance(initlist, UserList):
self.data[:] = initlist.data[:]
else:
self.data = list(initlist)
def __repr__(self): return repr(self.data)
def __lt__(self, other): return self.data < self.__cast(other)
def __le__(self, other): return self.data <= self.__cast(other)
def __eq__(self, other): return self.data == self.__cast(other)
def __ne__(self, other): return self.data != self.__cast(other)
def __gt__(self, other): return self.data > self.__cast(other)
def __ge__(self, other): return self.data >= self.__cast(other)
def __cast(self, other):
return other.data if isinstance(other, UserList) else other
def __contains__(self, item): return item in self.data
def __len__(self): return len(self.data)
def __getitem__(self, i): return self.data[i]
def __setitem__(self, i, item): self.data[i] = item
def __delitem__(self, i): del self.data[i]
def __add__(self, other):
if isinstance(other, UserList):
return self.__class__(self.data + other.data)
elif isinstance(other, type(self.data)):
return self.__class__(self.data + other)
return self.__class__(self.data + list(other))
def __radd__(self, other):
if isinstance(other, UserList):
return self.__class__(other.data + self.data)
elif isinstance(other, type(self.data)):
return self.__class__(other + self.data)
return self.__class__(list(other) + self.data)
def __iadd__(self, other):
if isinstance(other, UserList):
self.data += other.data
elif isinstance(other, type(self.data)):
self.data += other
else:
self.data += list(other)
return self
def __mul__(self, n):
return self.__class__(self.data*n)
__rmul__ = __mul__
def __imul__(self, n):
self.data *= n
return self
def append(self, item): self.data.append(item)
def insert(self, i, item): self.data.insert(i, item)
def pop(self, i=-1): return self.data.pop(i)
def remove(self, item): self.data.remove(item)
def clear(self): self.data.clear()
def copy(self): return self.__class__(self)
def count(self, item): return self.data.count(item)
def index(self, item, *args): return self.data.index(item, *args)
def reverse(self): self.data.reverse()
def sort(self, *args, **kwds): self.data.sort(*args, **kwds)
def extend(self, other):
if isinstance(other, UserList):
self.data.extend(other.data)
else:
self.data.extend(other)
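# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): UserList subclasses
# change list behavior by working with self.data. _UniqueList is a
# hypothetical example; note that extend() delegates straight to
# self.data, so only append() is intercepted here.
class _UniqueList(UserList):
    'List subclass that silently ignores duplicate appends.'
    def append(self, item):
        if item not in self.data:
            self.data.append(item)

def _demo_userlist():
    ul = _UniqueList([1, 2])
    ul.append(2)                      # ignored: already present
    ul.append(3)
    assert ul.data == [1, 2, 3]
    assert ul + [4] == [1, 2, 3, 4]   # __add__ returns a new _UniqueList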
################################################################################
### UserString
################################################################################
class UserString(Sequence):
def __init__(self, seq):
if isinstance(seq, str):
self.data = seq
elif isinstance(seq, UserString):
self.data = seq.data[:]
else:
self.data = str(seq)
def __str__(self): return str(self.data)
def __repr__(self): return repr(self.data)
def __int__(self): return int(self.data)
def __float__(self): return float(self.data)
def __complex__(self): return complex(self.data)
def __hash__(self): return hash(self.data)
def __eq__(self, string):
if isinstance(string, UserString):
return self.data == string.data
return self.data == string
def __ne__(self, string):
if isinstance(string, UserString):
return self.data != string.data
return self.data != string
def __lt__(self, string):
if isinstance(string, UserString):
return self.data < string.data
return self.data < string
def __le__(self, string):
if isinstance(string, UserString):
return self.data <= string.data
return self.data <= string
def __gt__(self, string):
if isinstance(string, UserString):
return self.data > string.data
return self.data > string
def __ge__(self, string):
if isinstance(string, UserString):
return self.data >= string.data
return self.data >= string
def __contains__(self, char):
if isinstance(char, UserString):
char = char.data
return char in self.data
def __len__(self): return len(self.data)
def __getitem__(self, index): return self.__class__(self.data[index])
def __add__(self, other):
if isinstance(other, UserString):
return self.__class__(self.data + other.data)
elif isinstance(other, str):
return self.__class__(self.data + other)
return self.__class__(self.data + str(other))
def __radd__(self, other):
if isinstance(other, str):
return self.__class__(other + self.data)
return self.__class__(str(other) + self.data)
def __mul__(self, n):
return self.__class__(self.data*n)
__rmul__ = __mul__
def __mod__(self, args):
return self.__class__(self.data % args)
# the following methods are defined in alphabetical order:
def capitalize(self): return self.__class__(self.data.capitalize())
def center(self, width, *args):
return self.__class__(self.data.center(width, *args))
def count(self, sub, start=0, end=_sys.maxsize):
if isinstance(sub, UserString):
sub = sub.data
return self.data.count(sub, start, end)
def encode(self, encoding=None, errors=None): # XXX improve this?
if encoding:
if errors:
return self.__class__(self.data.encode(encoding, errors))
return self.__class__(self.data.encode(encoding))
return self.__class__(self.data.encode())
def endswith(self, suffix, start=0, end=_sys.maxsize):
return self.data.endswith(suffix, start, end)
def expandtabs(self, tabsize=8):
return self.__class__(self.data.expandtabs(tabsize))
def find(self, sub, start=0, end=_sys.maxsize):
if isinstance(sub, UserString):
sub = sub.data
return self.data.find(sub, start, end)
def format(self, *args, **kwds):
return self.data.format(*args, **kwds)
def index(self, sub, start=0, end=_sys.maxsize):
return self.data.index(sub, start, end)
def isalpha(self): return self.data.isalpha()
def isalnum(self): return self.data.isalnum()
def isdecimal(self): return self.data.isdecimal()
def isdigit(self): return self.data.isdigit()
def isidentifier(self): return self.data.isidentifier()
def islower(self): return self.data.islower()
def isnumeric(self): return self.data.isnumeric()
def isspace(self): return self.data.isspace()
def istitle(self): return self.data.istitle()
def isupper(self): return self.data.isupper()
def join(self, seq): return self.data.join(seq)
def ljust(self, width, *args):
return self.__class__(self.data.ljust(width, *args))
def lower(self): return self.__class__(self.data.lower())
def lstrip(self, chars=None): return self.__class__(self.data.lstrip(chars))
def partition(self, sep):
return self.data.partition(sep)
def replace(self, old, new, maxsplit=-1):
if isinstance(old, UserString):
old = old.data
if isinstance(new, UserString):
new = new.data
return self.__class__(self.data.replace(old, new, maxsplit))
def rfind(self, sub, start=0, end=_sys.maxsize):
if isinstance(sub, UserString):
sub = sub.data
return self.data.rfind(sub, start, end)
def rindex(self, sub, start=0, end=_sys.maxsize):
return self.data.rindex(sub, start, end)
def rjust(self, width, *args):
return self.__class__(self.data.rjust(width, *args))
def rpartition(self, sep):
return self.data.rpartition(sep)
def rstrip(self, chars=None):
return self.__class__(self.data.rstrip(chars))
def split(self, sep=None, maxsplit=-1):
return self.data.split(sep, maxsplit)
def rsplit(self, sep=None, maxsplit=-1):
return self.data.rsplit(sep, maxsplit)
def splitlines(self, keepends=False): return self.data.splitlines(keepends)
def startswith(self, prefix, start=0, end=_sys.maxsize):
return self.data.startswith(prefix, start, end)
def strip(self, chars=None): return self.__class__(self.data.strip(chars))
def swapcase(self): return self.__class__(self.data.swapcase())
def title(self): return self.__class__(self.data.title())
def translate(self, *args):
return self.__class__(self.data.translate(*args))
def upper(self): return self.__class__(self.data.upper())
def zfill(self, width): return self.__class__(self.data.zfill(width))
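# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): the value of
# UserString is that the wrapped str methods re-wrap their results, so
# subclass identity survives slicing and transformation. The
# _demo_userstring name is hypothetical.
def _demo_userstring():
    s = UserString('Hello')
    assert isinstance(s.upper(), UserString)   # methods re-wrap results
    assert isinstance(s[1:3], UserString)      # __getitem__ re-wraps slices
    assert s + ', world' == 'Hello, world'     # __eq__ accepts plain str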
################################################################################
### Simple tests
################################################################################
# verify that instances can be pickled
from collections import namedtuple
from pickle import loads, dumps
Point = namedtuple('Point', 'x, y', True)
p = Point(x=10, y=20)
assert p == loads(dumps(p))
# test and demonstrate ability to override methods
class Point(namedtuple('Point', 'x y')):
__slots__ = ()
@property
def hypot(self):
return (self.x ** 2 + self.y ** 2) ** 0.5
def __str__(self):
return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot)
for p in Point(3, 4), Point(14, 5/7.):
    print(p)
class Point(namedtuple('Point', 'x y')):
'Point class with optimized _make() and _replace() without error-checking'
__slots__ = ()
_make = classmethod(tuple.__new__)
def _replace(self, _map=map, **kwds):
return self._make(_map(kwds.get, ('x', 'y'), self))
print(Point(11, 22)._replace(x=100))
Point3D = namedtuple('Point3D', Point._fields + ('z',))
print(Point3D.__doc__)
import doctest, collections
TestResults = namedtuple('TestResults', 'failed attempted')
print(TestResults(*doctest.testmod(collections)))
# This directory is a Python package.
# Copyright 2009 Brian Quinlan. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Implements ProcessPoolExecutor.
The following diagram and text describe the data flow through the system:

|======================= In-process =====================|== Out-of-process ==|

+----------+     +----------+       +--------+     +-----------+    +---------+
|          |  => | Work Ids |    => |        |  => | Call Q    | => |         |
|          |     +----------+       |        |     +-----------+    |         |
|          |     | ...      |       |        |     | ...       |    |         |
|          |     | 6        |       |        |     | 5, call() |    |         |
|          |     | 7        |       |        |     | ...       |    |         |
| Process  |     | ...      |       | Local  |     +-----------+    | Process |
|  Pool    |     +----------+       | Worker |                      |  #1..n  |
| Executor |                        | Thread |                      |         |
|          |     +------------+     |        |     +-----------+    |         |
|          | <=> | Work Items | <=> |        | <=  | Result Q  | <= |         |
|          |     +------------+     |        |     +-----------+    |         |
|          |     | 6: call()  |     |        |     | ...       |    |         |
|          |     |    future  |     |        |     | 4, result |    |         |
|          |     | ...        |     |        |     | 3, except |    |         |
+----------+     +------------+     +--------+     +-----------+    +---------+
Executor.submit() called:
- creates a uniquely numbered _WorkItem and adds it to the "Work Items" dict
- adds the id of the _WorkItem to the "Work Ids" queue
Local worker thread:
- reads work ids from the "Work Ids" queue and looks up the corresponding
WorkItem from the "Work Items" dict: if the work item has been cancelled then
it is simply removed from the dict, otherwise it is repackaged as a
_CallItem and put in the "Call Q". New _CallItems are put in the "Call Q"
until "Call Q" is full. NOTE: the size of the "Call Q" is kept small because
calls placed in the "Call Q" can no longer be cancelled with Future.cancel().
- reads _ResultItems from "Result Q", updates the future stored in the
"Work Items" dict and deletes the dict entry
Process #1..n:
- reads _CallItems from "Call Q", executes the calls, and puts the resulting
_ResultItems in "Result Q"
"""
__author__ = 'Brian Quinlan ([email protected])'
import atexit
import os
from concurrent.futures import _base
import queue
from queue import Full
import multiprocessing
from multiprocessing import SimpleQueue
from multiprocessing.connection import wait
import threading
import weakref
# Workers are created as daemon threads and processes. This is done to allow the
# interpreter to exit when there are still idle processes in a
# ProcessPoolExecutor's process pool (i.e. shutdown() was not called). However,
# allowing workers to die with the interpreter has two undesirable properties:
# - The workers would still be running during interpreter shutdown,
# meaning that they would fail in unpredictable ways.
# - The workers could be killed while evaluating a work item, which could
# be bad if the callable being evaluated has external side-effects e.g.
# writing to a file.
#
# To work around this problem, an exit handler is installed which tells the
# workers to exit when their work queues are empty and then waits until the
# threads/processes finish.
_threads_queues = weakref.WeakKeyDictionary()
_shutdown = False
def _python_exit():
global _shutdown
_shutdown = True
items = list(_threads_queues.items())
for t, q in items:
q.put(None)
for t, q in items:
t.join()
# Controls how many more calls than processes will be queued in the call queue.
# A smaller number will mean that processes spend more time idle waiting for
# work while a larger number will make Future.cancel() succeed less frequently
# (Futures in the call queue cannot be cancelled).
EXTRA_QUEUED_CALLS = 1
class _WorkItem(object):
def __init__(self, future, fn, args, kwargs):
self.future = future
self.fn = fn
self.args = args
self.kwargs = kwargs
class _ResultItem(object):
def __init__(self, work_id, exception=None, result=None):
self.work_id = work_id
self.exception = exception
self.result = result
class _CallItem(object):
def __init__(self, work_id, fn, args, kwargs):
self.work_id = work_id
self.fn = fn
self.args = args
self.kwargs = kwargs
def _process_worker(call_queue, result_queue):
"""Evaluates calls from call_queue and places the results in result_queue.
This worker is run in a separate process.
Args:
call_queue: A multiprocessing.Queue of _CallItems that will be read and
evaluated by the worker.
        result_queue: A multiprocessing.Queue of _ResultItems that will be
            written to by the worker.
"""
while True:
call_item = call_queue.get(block=True)
if call_item is None:
# Wake up queue management thread
result_queue.put(os.getpid())
return
try:
r = call_item.fn(*call_item.args, **call_item.kwargs)
except BaseException as e:
result_queue.put(_ResultItem(call_item.work_id,
exception=e))
else:
result_queue.put(_ResultItem(call_item.work_id,
result=r))
def _add_call_item_to_queue(pending_work_items,
work_ids,
call_queue):
"""Fills call_queue with _WorkItems from pending_work_items.
This function never blocks.
Args:
pending_work_items: A dict mapping work ids to _WorkItems e.g.
{5: <_WorkItem...>, 6: <_WorkItem...>, ...}
work_ids: A queue.Queue of work ids e.g. Queue([5, 6, ...]). Work ids
are consumed and the corresponding _WorkItems from
pending_work_items are transformed into _CallItems and put in
call_queue.
call_queue: A multiprocessing.Queue that will be filled with _CallItems
derived from _WorkItems.
"""
while True:
if call_queue.full():
return
try:
work_id = work_ids.get(block=False)
except queue.Empty:
return
else:
work_item = pending_work_items[work_id]
if work_item.future.set_running_or_notify_cancel():
call_queue.put(_CallItem(work_id,
work_item.fn,
work_item.args,
work_item.kwargs),
block=True)
else:
del pending_work_items[work_id]
continue
def _queue_management_worker(executor_reference,
processes,
pending_work_items,
work_ids_queue,
call_queue,
result_queue):
"""Manages the communication between this process and the worker processes.
This function is run in a local thread.
Args:
executor_reference: A weakref.ref to the ProcessPoolExecutor that owns
this thread. Used to determine if the ProcessPoolExecutor has been
garbage collected and that this function can exit.
        processes: A dict mapping process ids to the multiprocessing.Process
            instances used as workers.
pending_work_items: A dict mapping work ids to _WorkItems e.g.
{5: <_WorkItem...>, 6: <_WorkItem...>, ...}
work_ids_queue: A queue.Queue of work ids e.g. Queue([5, 6, ...]).
call_queue: A multiprocessing.Queue that will be filled with _CallItems
derived from _WorkItems for processing by the process workers.
result_queue: A multiprocessing.Queue of _ResultItems generated by the
process workers.
"""
executor = None
def shutting_down():
return _shutdown or executor is None or executor._shutdown_thread
def shutdown_worker():
# This is an upper bound
nb_children_alive = sum(p.is_alive() for p in processes.values())
for i in range(0, nb_children_alive):
call_queue.put_nowait(None)
# Release the queue's resources as soon as possible.
call_queue.close()
# If .join() is not called on the created processes then
# some multiprocessing.Queue methods may deadlock on Mac OS X.
for p in processes.values():
p.join()
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready:
result_item = reader.recv()
else:
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
pending_work_items.clear()
# Terminate remaining workers forcibly: the queues or their
# locks may be in a dirty state and block forever.
for p in processes.values():
p.terminate()
shutdown_worker()
return
if isinstance(result_item, int):
# Clean shutdown of a worker using its PID
# (avoids marking the executor broken)
assert shutting_down()
p = processes.pop(result_item)
p.join()
if not processes:
shutdown_worker()
return
elif result_item is not None:
work_item = pending_work_items.pop(result_item.work_id, None)
# work_item can be None if another process terminated (see above)
if work_item is not None:
if result_item.exception:
work_item.future.set_exception(result_item.exception)
else:
work_item.future.set_result(result_item.result)
# Delete references to object. See issue16284
del work_item
# Check whether we should start shutting down.
executor = executor_reference()
# No more work items can be added if:
# - The interpreter is shutting down OR
# - The executor that owns this worker has been collected OR
# - The executor that owns this worker has been shutdown.
if shutting_down():
try:
# Since no new work items can be added, it is safe to shutdown
# this thread if there are no pending work items.
if not pending_work_items:
shutdown_worker()
return
except Full:
# This is not a problem: we will eventually be woken up (in
# result_queue.get()) and be able to send a sentinel again.
pass
executor = None
_system_limits_checked = False
_system_limited = None
def _check_system_limits():
global _system_limits_checked, _system_limited
if _system_limits_checked:
if _system_limited:
raise NotImplementedError(_system_limited)
_system_limits_checked = True
try:
nsems_max = os.sysconf("SC_SEM_NSEMS_MAX")
except (AttributeError, ValueError):
# sysconf not available or setting not available
return
if nsems_max == -1:
        # indeterminate limit; assume the limit is determined
# by available memory only
return
if nsems_max >= 256:
# minimum number of semaphores available
# according to POSIX
return
_system_limited = "system provides too few semaphores (%d available, 256 necessary)" % nsems_max
raise NotImplementedError(_system_limited)
class BrokenProcessPool(RuntimeError):
"""
Raised when a process in a ProcessPoolExecutor terminated abruptly
while a future was in the running state.
"""
class ProcessPoolExecutor(_base.Executor):
def __init__(self, max_workers=None):
"""Initializes a new ProcessPoolExecutor instance.
Args:
max_workers: The maximum number of processes that can be used to
execute the given calls. If None or not given then as many
worker processes will be created as the machine has processors.
"""
_check_system_limits()
if max_workers is None:
self._max_workers = os.cpu_count() or 1
else:
self._max_workers = max_workers
# Make the call queue slightly larger than the number of processes to
# prevent the worker processes from idling. But don't make it too big
# because futures in the call queue cannot be cancelled.
self._call_queue = multiprocessing.Queue(self._max_workers +
EXTRA_QUEUED_CALLS)
# Killed worker processes can produce spurious "broken pipe"
# tracebacks in the queue's own worker thread. But we detect killed
# processes anyway, so silence the tracebacks.
self._call_queue._ignore_epipe = True
self._result_queue = SimpleQueue()
self._work_ids = queue.Queue()
self._queue_management_thread = None
# Map of pids to processes
self._processes = {}
# Shutdown is a two-step process.
self._shutdown_thread = False
self._shutdown_lock = threading.Lock()
self._broken = False
self._queue_count = 0
self._pending_work_items = {}
def _start_queue_management_thread(self):
# When the executor gets lost, the weakref callback will wake up
# the queue management thread.
def weakref_cb(_, q=self._result_queue):
q.put(None)
if self._queue_management_thread is None:
# Start the processes so that their sentinels are known.
self._adjust_process_count()
self._queue_management_thread = threading.Thread(
target=_queue_management_worker,
args=(weakref.ref(self, weakref_cb),
self._processes,
self._pending_work_items,
self._work_ids,
self._call_queue,
self._result_queue))
self._queue_management_thread.daemon = True
self._queue_management_thread.start()
_threads_queues[self._queue_management_thread] = self._result_queue
def _adjust_process_count(self):
for _ in range(len(self._processes), self._max_workers):
p = multiprocessing.Process(
target=_process_worker,
args=(self._call_queue,
self._result_queue))
p.start()
self._processes[p.pid] = p
def submit(self, fn, *args, **kwargs):
with self._shutdown_lock:
if self._broken:
raise BrokenProcessPool('A child process terminated '
'abruptly, the process pool is not usable anymore')
if self._shutdown_thread:
raise RuntimeError('cannot schedule new futures after shutdown')
f = _base.Future()
w = _WorkItem(f, fn, args, kwargs)
self._pending_work_items[self._queue_count] = w
self._work_ids.put(self._queue_count)
self._queue_count += 1
# Wake up queue management thread
self._result_queue.put(None)
self._start_queue_management_thread()
return f
submit.__doc__ = _base.Executor.submit.__doc__
def shutdown(self, wait=True):
with self._shutdown_lock:
self._shutdown_thread = True
if self._queue_management_thread:
# Wake up queue management thread
self._result_queue.put(None)
if wait:
self._queue_management_thread.join()
# To reduce the risk of opening too many files, remove references to
# objects that use file descriptors.
self._queue_management_thread = None
self._call_queue = None
self._result_queue = None
self._processes = None
shutdown.__doc__ = _base.Executor.shutdown.__doc__
atexit.register(_python_exit)
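# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): typical use of the
# executor defined above. The __main__ guard matters because worker
# processes may re-import this module on platforms that spawn rather
# than fork.
if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        squares = list(pool.map(pow, range(5), [2] * 5))
        assert squares == [0, 1, 4, 9, 16]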
# Copyright 2009 Brian Quinlan. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Implements ThreadPoolExecutor."""
__author__ = 'Brian Quinlan ([email protected])'
import atexit
from concurrent.futures import _base
import queue
import threading
import weakref
# Workers are created as daemon threads. This is done to allow the interpreter
# to exit when there are still idle threads in a ThreadPoolExecutor's thread
# pool (i.e. shutdown() was not called). However, allowing workers to die with
# the interpreter has two undesirable properties:
# - The workers would still be running during interpreter shutdown,
# meaning that they would fail in unpredictable ways.
# - The workers could be killed while evaluating a work item, which could
# be bad if the callable being evaluated has external side-effects e.g.
# writing to a file.
#
# To work around this problem, an exit handler is installed which tells the
# workers to exit when their work queues are empty and then waits until the
# threads finish.
_threads_queues = weakref.WeakKeyDictionary()
_shutdown = False
def _python_exit():
global _shutdown
_shutdown = True
items = list(_threads_queues.items())
for t, q in items:
q.put(None)
for t, q in items:
t.join()
atexit.register(_python_exit)
class _WorkItem(object):
def __init__(self, future, fn, args, kwargs):
self.future = future
self.fn = fn
self.args = args
self.kwargs = kwargs
def run(self):
if not self.future.set_running_or_notify_cancel():
return
try:
result = self.fn(*self.args, **self.kwargs)
except BaseException as e:
self.future.set_exception(e)
else:
self.future.set_result(result)
def _worker(executor_reference, work_queue):
try:
while True:
work_item = work_queue.get(block=True)
if work_item is not None:
work_item.run()
# Delete references to object. See issue16284
del work_item
continue
executor = executor_reference()
# Exit if:
# - The interpreter is shutting down OR
# - The executor that owns the worker has been collected OR
# - The executor that owns the worker has been shutdown.
if _shutdown or executor is None or executor._shutdown:
                # Notify other workers
work_queue.put(None)
return
del executor
except BaseException:
_base.LOGGER.critical('Exception in worker', exc_info=True)
class ThreadPoolExecutor(_base.Executor):
def __init__(self, max_workers):
"""Initializes a new ThreadPoolExecutor instance.
Args:
max_workers: The maximum number of threads that can be used to
execute the given calls.
"""
self._max_workers = max_workers
self._work_queue = queue.Queue()
self._threads = set()
self._shutdown = False
self._shutdown_lock = threading.Lock()
def submit(self, fn, *args, **kwargs):
with self._shutdown_lock:
if self._shutdown:
raise RuntimeError('cannot schedule new futures after shutdown')
f = _base.Future()
w = _WorkItem(f, fn, args, kwargs)
self._work_queue.put(w)
self._adjust_thread_count()
return f
submit.__doc__ = _base.Executor.submit.__doc__
def _adjust_thread_count(self):
# When the executor gets lost, the weakref callback will wake up
# the worker threads.
def weakref_cb(_, q=self._work_queue):
q.put(None)
# TODO(bquinlan): Should avoid creating new threads if there are more
# idle threads than items in the work queue.
if len(self._threads) < self._max_workers:
t = threading.Thread(target=_worker,
args=(weakref.ref(self, weakref_cb),
self._work_queue))
t.daemon = True
t.start()
self._threads.add(t)
_threads_queues[t] = self._work_queue
def shutdown(self, wait=True):
with self._shutdown_lock:
self._shutdown = True
self._work_queue.put(None)
if wait:
for t in self._threads:
t.join()
shutdown.__doc__ = _base.Executor.shutdown.__doc__
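# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): submit() returns a
# Future whose result() blocks until the call finishes. The
# _demo_thread_pool and fetch names are hypothetical stand-ins.
def _demo_thread_pool():
    def fetch(n):
        return n * n
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fetch, n) for n in range(8)]
        return [f.result() for f in futures]   # -> [0, 1, 4, ..., 49]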
# Copyright 2009 Brian Quinlan. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
__author__ = 'Brian Quinlan ([email protected])'
import collections
import logging
import threading
import time
FIRST_COMPLETED = 'FIRST_COMPLETED'
FIRST_EXCEPTION = 'FIRST_EXCEPTION'
ALL_COMPLETED = 'ALL_COMPLETED'
_AS_COMPLETED = '_AS_COMPLETED'
# Possible future states (for internal use by the futures package).
PENDING = 'PENDING'
RUNNING = 'RUNNING'
# The future was cancelled by the user...
CANCELLED = 'CANCELLED'
# ...and _Waiter.add_cancelled() was called by a worker.
CANCELLED_AND_NOTIFIED = 'CANCELLED_AND_NOTIFIED'
FINISHED = 'FINISHED'
_FUTURE_STATES = [
PENDING,
RUNNING,
CANCELLED,
CANCELLED_AND_NOTIFIED,
FINISHED
]
_STATE_TO_DESCRIPTION_MAP = {
PENDING: "pending",
RUNNING: "running",
CANCELLED: "cancelled",
CANCELLED_AND_NOTIFIED: "cancelled",
FINISHED: "finished"
}
# Logger for internal use by the futures package.
LOGGER = logging.getLogger("concurrent.futures")
class Error(Exception):
"""Base class for all future-related exceptions."""
pass
class CancelledError(Error):
"""The Future was cancelled."""
pass
class TimeoutError(Error):
"""The operation exceeded the given deadline."""
pass
class _Waiter(object):
"""Provides the event that wait() and as_completed() block on."""
def __init__(self):
self.event = threading.Event()
self.finished_futures = []
def add_result(self, future):
self.finished_futures.append(future)
def add_exception(self, future):
self.finished_futures.append(future)
def add_cancelled(self, future):
self.finished_futures.append(future)
class _AsCompletedWaiter(_Waiter):
"""Used by as_completed()."""
def __init__(self):
super(_AsCompletedWaiter, self).__init__()
self.lock = threading.Lock()
def add_result(self, future):
with self.lock:
super(_AsCompletedWaiter, self).add_result(future)
self.event.set()
def add_exception(self, future):
with self.lock:
super(_AsCompletedWaiter, self).add_exception(future)
self.event.set()
def add_cancelled(self, future):
with self.lock:
super(_AsCompletedWaiter, self).add_cancelled(future)
self.event.set()
class _FirstCompletedWaiter(_Waiter):
"""Used by wait(return_when=FIRST_COMPLETED)."""
def add_result(self, future):
super().add_result(future)
self.event.set()
def add_exception(self, future):
super().add_exception(future)
self.event.set()
def add_cancelled(self, future):
super().add_cancelled(future)
self.event.set()
class _AllCompletedWaiter(_Waiter):
"""Used by wait(return_when=FIRST_EXCEPTION and ALL_COMPLETED)."""
def __init__(self, num_pending_calls, stop_on_exception):
self.num_pending_calls = num_pending_calls
self.stop_on_exception = stop_on_exception
self.lock = threading.Lock()
super().__init__()
def _decrement_pending_calls(self):
with self.lock:
self.num_pending_calls -= 1
if not self.num_pending_calls:
self.event.set()
def add_result(self, future):
super().add_result(future)
self._decrement_pending_calls()
def add_exception(self, future):
super().add_exception(future)
if self.stop_on_exception:
self.event.set()
else:
self._decrement_pending_calls()
def add_cancelled(self, future):
super().add_cancelled(future)
self._decrement_pending_calls()
class _AcquireFutures(object):
"""A context manager that does an ordered acquire of Future conditions."""
def __init__(self, futures):
self.futures = sorted(futures, key=id)
def __enter__(self):
for future in self.futures:
future._condition.acquire()
def __exit__(self, *args):
for future in self.futures:
future._condition.release()
def _create_and_install_waiters(fs, return_when):
if return_when == _AS_COMPLETED:
waiter = _AsCompletedWaiter()
elif return_when == FIRST_COMPLETED:
waiter = _FirstCompletedWaiter()
else:
pending_count = sum(
f._state not in [CANCELLED_AND_NOTIFIED, FINISHED] for f in fs)
if return_when == FIRST_EXCEPTION:
waiter = _AllCompletedWaiter(pending_count, stop_on_exception=True)
elif return_when == ALL_COMPLETED:
waiter = _AllCompletedWaiter(pending_count, stop_on_exception=False)
else:
raise ValueError("Invalid return condition: %r" % return_when)
for f in fs:
f._waiters.append(waiter)
return waiter
def as_completed(fs, timeout=None):
"""An iterator over the given futures that yields each as it completes.
Args:
fs: The sequence of Futures (possibly created by different Executors) to
iterate over.
timeout: The maximum number of seconds to wait. If None, then there
is no limit on the wait time.
Returns:
An iterator that yields the given Futures as they complete (finished or
cancelled). If any given Futures are duplicated, they will be returned
once.
Raises:
TimeoutError: If the entire result iterator could not be generated
before the given timeout.
"""
if timeout is not None:
end_time = timeout + time.time()
fs = set(fs)
with _AcquireFutures(fs):
finished = set(
f for f in fs
if f._state in [CANCELLED_AND_NOTIFIED, FINISHED])
pending = fs - finished
waiter = _create_and_install_waiters(fs, _AS_COMPLETED)
try:
yield from finished
while pending:
if timeout is None:
wait_timeout = None
else:
wait_timeout = end_time - time.time()
if wait_timeout < 0:
raise TimeoutError(
'%d (of %d) futures unfinished' % (
len(pending), len(fs)))
waiter.event.wait(wait_timeout)
with waiter.lock:
finished = waiter.finished_futures
waiter.finished_futures = []
waiter.event.clear()
for future in finished:
yield future
pending.remove(future)
finally:
for f in fs:
with f._condition:
f._waiters.remove(waiter)
DoneAndNotDoneFutures = collections.namedtuple(
'DoneAndNotDoneFutures', 'done not_done')
def wait(fs, timeout=None, return_when=ALL_COMPLETED):
"""Wait for the futures in the given sequence to complete.
Args:
fs: The sequence of Futures (possibly created by different Executors) to
wait upon.
timeout: The maximum number of seconds to wait. If None, then there
is no limit on the wait time.
return_when: Indicates when this function should return. The options
are:
FIRST_COMPLETED - Return when any future finishes or is
cancelled.
FIRST_EXCEPTION - Return when any future finishes by raising an
exception. If no future raises an exception
then it is equivalent to ALL_COMPLETED.
ALL_COMPLETED - Return when all futures finish or are cancelled.
Returns:
A named 2-tuple of sets. The first set, named 'done', contains the
        futures that completed (finished or cancelled) before the wait
completed. The second set, named 'not_done', contains uncompleted
futures.
"""
with _AcquireFutures(fs):
done = set(f for f in fs
if f._state in [CANCELLED_AND_NOTIFIED, FINISHED])
not_done = set(fs) - done
if (return_when == FIRST_COMPLETED) and done:
return DoneAndNotDoneFutures(done, not_done)
elif (return_when == FIRST_EXCEPTION) and done:
if any(f for f in done
if not f.cancelled() and f.exception() is not None):
return DoneAndNotDoneFutures(done, not_done)
if len(done) == len(fs):
return DoneAndNotDoneFutures(done, not_done)
waiter = _create_and_install_waiters(fs, return_when)
waiter.event.wait(timeout)
for f in fs:
with f._condition:
f._waiters.remove(waiter)
done.update(waiter.finished_futures)
return DoneAndNotDoneFutures(done, set(fs) - done)
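# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): how the two
# module-level helpers above are typically driven. The executor argument
# is any Executor from this package; time.sleep stands in for real work,
# and _demo_wait is a hypothetical name.
def _demo_wait(executor):
    fs = [executor.submit(time.sleep, 0.01) for _ in range(3)]
    done, not_done = wait(fs, return_when=ALL_COMPLETED)
    assert not not_done                  # everything finished
    for f in as_completed(fs):           # yields each future exactly once
        assert f.done()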
class Future(object):
"""Represents the result of an asynchronous computation."""
def __init__(self):
"""Initializes the future. Should not be called by clients."""
self._condition = threading.Condition()
self._state = PENDING
self._result = None
self._exception = None
self._waiters = []
self._done_callbacks = []
def _invoke_callbacks(self):
for callback in self._done_callbacks:
try:
callback(self)
except Exception:
LOGGER.exception('exception calling callback for %r', self)
def __repr__(self):
with self._condition:
if self._state == FINISHED:
if self._exception:
return '<Future at %s state=%s raised %s>' % (
hex(id(self)),
_STATE_TO_DESCRIPTION_MAP[self._state],
self._exception.__class__.__name__)
else:
return '<Future at %s state=%s returned %s>' % (
hex(id(self)),
_STATE_TO_DESCRIPTION_MAP[self._state],
self._result.__class__.__name__)
return '<Future at %s state=%s>' % (
hex(id(self)),
_STATE_TO_DESCRIPTION_MAP[self._state])
def cancel(self):
"""Cancel the future if possible.
Returns True if the future was cancelled, False otherwise. A future
cannot be cancelled if it is running or has already completed.
"""
with self._condition:
if self._state in [RUNNING, FINISHED]:
return False
if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
return True
self._state = CANCELLED
self._condition.notify_all()
self._invoke_callbacks()
return True
def cancelled(self):
"""Return True if the future was cancelled."""
with self._condition:
return self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]
def running(self):
"""Return True if the future is currently executing."""
with self._condition:
return self._state == RUNNING
def done(self):
"""Return True of the future was cancelled or finished executing."""
with self._condition:
return self._state in [CANCELLED, CANCELLED_AND_NOTIFIED, FINISHED]
def __get_result(self):
if self._exception:
raise self._exception
else:
return self._result
def add_done_callback(self, fn):
"""Attaches a callable that will be called when the future finishes.
Args:
fn: A callable that will be called with this future as its only
argument when the future completes or is cancelled. The callable
will always be called by a thread in the same process in which
it was added. If the future has already completed or been
cancelled then the callable will be called immediately. These
callables are called in the order that they were added.
"""
with self._condition:
if self._state not in [CANCELLED, CANCELLED_AND_NOTIFIED, FINISHED]:
self._done_callbacks.append(fn)
return
fn(self)
def result(self, timeout=None):
"""Return the result of the call that the future represents.
Args:
timeout: The number of seconds to wait for the result if the future
isn't done. If None, then there is no limit on the wait time.
Returns:
The result of the call that the future represents.
Raises:
CancelledError: If the future was cancelled.
TimeoutError: If the future didn't finish executing before the given
timeout.
Exception: If the call raised then that exception will be raised.
"""
with self._condition:
if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
raise CancelledError()
elif self._state == FINISHED:
return self.__get_result()
self._condition.wait(timeout)
if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
raise CancelledError()
elif self._state == FINISHED:
return self.__get_result()
else:
raise TimeoutError()
def exception(self, timeout=None):
"""Return the exception raised by the call that the future represents.
Args:
timeout: The number of seconds to wait for the exception if the
future isn't done. If None, then there is no limit on the wait
time.
Returns:
The exception raised by the call that the future represents or None
if the call completed without raising.
Raises:
CancelledError: If the future was cancelled.
TimeoutError: If the future didn't finish executing before the given
timeout.
"""
with self._condition:
if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
raise CancelledError()
elif self._state == FINISHED:
return self._exception
self._condition.wait(timeout)
if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
raise CancelledError()
elif self._state == FINISHED:
return self._exception
else:
raise TimeoutError()
# The following methods should only be used by Executors and in tests.
def set_running_or_notify_cancel(self):
"""Mark the future as running or process any cancel notifications.
Should only be used by Executor implementations and unit tests.
If the future has been cancelled (cancel() was called and returned
        True) then any threads waiting on the future completing (through calls
        to as_completed() or wait()) are notified and False is returned.
If the future was not cancelled then it is put in the running state
(future calls to running() will return True) and True is returned.
This method should be called by Executor implementations before
executing the work associated with this future. If this method returns
False then the work should not be executed.
Returns:
False if the Future was cancelled, True otherwise.
Raises:
RuntimeError: if this method was already called or if set_result()
or set_exception() was called.
"""
with self._condition:
if self._state == CANCELLED:
self._state = CANCELLED_AND_NOTIFIED
for waiter in self._waiters:
waiter.add_cancelled(self)
# self._condition.notify_all() is not necessary because
# self.cancel() triggers a notification.
return False
elif self._state == PENDING:
self._state = RUNNING
return True
else:
LOGGER.critical('Future %s in unexpected state: %s',
id(self),
self._state)
raise RuntimeError('Future in unexpected state')
def set_result(self, result):
"""Sets the return value of work associated with the future.
Should only be used by Executor implementations and unit tests.
"""
with self._condition:
self._result = result
self._state = FINISHED
for waiter in self._waiters:
waiter.add_result(self)
self._condition.notify_all()
self._invoke_callbacks()
def set_exception(self, exception):
"""Sets the result of the future as being the given exception.
Should only be used by Executor implementations and unit tests.
"""
with self._condition:
self._exception = exception
self._state = FINISHED
for waiter in self._waiters:
waiter.add_exception(self)
self._condition.notify_all()
self._invoke_callbacks()
class Executor(object):
"""This is an abstract base class for concrete asynchronous executors."""
def submit(self, fn, *args, **kwargs):
"""Submits a callable to be executed with the given arguments.
Schedules the callable to be executed as fn(*args, **kwargs) and returns
a Future instance representing the execution of the callable.
Returns:
A Future representing the given call.
"""
raise NotImplementedError()
def map(self, fn, *iterables, timeout=None):
"""Returns an iterator equivalent to map(fn, iter).
Args:
fn: A callable that will take as many arguments as there are
passed iterables.
timeout: The maximum number of seconds to wait. If None, then there
is no limit on the wait time.
Returns:
            An iterator equivalent to: map(fn, *iterables) but the calls may
be evaluated out-of-order.
Raises:
TimeoutError: If the entire result iterator could not be generated
before the given timeout.
Exception: If fn(*args) raises for any values.
"""
if timeout is not None:
end_time = timeout + time.time()
fs = [self.submit(fn, *args) for args in zip(*iterables)]
# Yield must be hidden in closure so that the futures are submitted
# before the first iterator value is required.
def result_iterator():
try:
for future in fs:
if timeout is None:
yield future.result()
else:
yield future.result(end_time - time.time())
finally:
for future in fs:
future.cancel()
return result_iterator()
def shutdown(self, wait=True):
"""Clean-up the resources associated with the Executor.
        It is safe to call this method several times. However, no other
        methods can be called after this one.
Args:
wait: If True then shutdown will not return until all running
futures have finished executing and the resources used by the
executor have been reclaimed.
"""
pass
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.shutdown(wait=True)
return False
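# ----------------------------------------------------------------------
# Editor's sketch (not part of the original module): the minimal
# contract a concrete Executor must honor is to implement submit() and
# resolve the Future it hands out. _InlineExecutor, which runs each call
# synchronously in the caller's thread, is hypothetical.
class _InlineExecutor(Executor):
    'Runs each submitted call immediately, in the calling thread.'
    def submit(self, fn, *args, **kwargs):
        f = Future()
        if f.set_running_or_notify_cancel():   # honor pre-run cancellation
            try:
                f.set_result(fn(*args, **kwargs))
            except BaseException as e:
                f.set_exception(e)
        return f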
# Copyright 2009 Brian Quinlan. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Execute computations asynchronously using threads or processes."""
__author__ = 'Brian Quinlan ([email protected])'
from concurrent.futures._base import (FIRST_COMPLETED,
FIRST_EXCEPTION,
ALL_COMPLETED,
CancelledError,
TimeoutError,
Future,
Executor,
wait,
as_completed)
from concurrent.futures.process import ProcessPoolExecutor
from concurrent.futures.thread import ThreadPoolExecutor
import sys, os
import contextlib
import subprocess
# find_library(name) returns the pathname of a library, or None.
if os.name == "nt":
def _get_build_version():
"""Return the version of MSVC that was used to build Python.
For Python 2.3 and up, the version number is included in
sys.version. For earlier versions, assume the compiler is MSVC 6.
"""
# This function was copied from Lib/distutils/msvccompiler.py
prefix = "MSC v."
i = sys.version.find(prefix)
if i == -1:
return 6
i = i + len(prefix)
s, rest = sys.version[i:].split(" ", 1)
majorVersion = int(s[:-2]) - 6
minorVersion = int(s[2:3]) / 10.0
# I don't think paths are affected by minor version in version 6
if majorVersion == 6:
minorVersion = 0
if majorVersion >= 6:
return majorVersion + minorVersion
# else we don't know what version of the compiler this is
return None
def find_msvcrt():
"""Return the name of the VC runtime dll"""
version = _get_build_version()
if version is None:
# better be safe than sorry
return None
if version <= 6:
clibname = 'msvcrt'
else:
clibname = 'msvcr%d' % (version * 10)
        # If python was built in debug mode
import importlib.machinery
if '_d.pyd' in importlib.machinery.EXTENSION_SUFFIXES:
clibname += 'd'
return clibname+'.dll'
def find_library(name):
if name in ('c', 'm'):
return find_msvcrt()
# See MSDN for the REAL search order.
for directory in os.environ['PATH'].split(os.pathsep):
fname = os.path.join(directory, name)
if os.path.isfile(fname):
return fname
if fname.lower().endswith(".dll"):
continue
fname = fname + ".dll"
if os.path.isfile(fname):
return fname
return None
if os.name == "ce":
# search path according to MSDN:
# - absolute path specified by filename
# - The .exe launch directory
# - the Windows directory
# - ROM dll files (where are they?)
# - OEM specified search path: HKLM\Loader\SystemPath
def find_library(name):
return name
if os.name == "posix" and sys.platform == "darwin":
from ctypes.macholib.dyld import dyld_find as _dyld_find
def find_library(name):
possible = ['lib%s.dylib' % name,
'%s.dylib' % name,
'%s.framework/%s' % (name, name)]
for name in possible:
try:
return _dyld_find(name)
except ValueError:
continue
return None
elif os.name == "posix":
# Andreas Degert's find functions, using gcc, /sbin/ldconfig, objdump
import re, tempfile
def _findLib_gcc(name):
expr = r'[^\(\)\s]*lib%s\.[^\(\)\s]*' % re.escape(name)
fdout, ccout = tempfile.mkstemp()
os.close(fdout)
cmd = 'if type gcc >/dev/null 2>&1; then CC=gcc; elif type cc >/dev/null 2>&1; then CC=cc;else exit 10; fi;' \
'LANG=C LC_ALL=C $CC -Wl,-t -o ' + ccout + ' 2>&1 -l' + name
try:
f = os.popen(cmd)
try:
trace = f.read()
finally:
rv = f.close()
finally:
try:
os.unlink(ccout)
except FileNotFoundError:
pass
if rv == 10:
raise OSError('gcc or cc command not found')
res = re.search(expr, trace)
if not res:
return None
return res.group(0)
if sys.platform == "sunos5":
# use /usr/ccs/bin/dump on solaris
def _get_soname(f):
if not f:
return None
cmd = "/usr/ccs/bin/dump -Lpv 2>/dev/null " + f
with contextlib.closing(os.popen(cmd)) as f:
data = f.read()
res = re.search(r'\[.*\]\sSONAME\s+([^\s]+)', data)
if not res:
return None
return res.group(1)
else:
def _get_soname(f):
# assuming GNU binutils / ELF
if not f:
return None
cmd = 'if ! type objdump >/dev/null 2>&1; then exit 10; fi;' \
"objdump -p -j .dynamic 2>/dev/null " + f
f = os.popen(cmd)
try:
dump = f.read()
finally:
rv = f.close()
if rv == 10:
raise OSError('objdump command not found')
res = re.search(r'\sSONAME\s+([^\s]+)', dump)
if not res:
return None
return res.group(1)
if sys.platform.startswith(("freebsd", "openbsd", "dragonfly")):
def _num_version(libname):
# "libxyz.so.MAJOR.MINOR" => [ MAJOR, MINOR ]
parts = libname.split(".")
nums = []
try:
while parts:
nums.insert(0, int(parts.pop()))
except ValueError:
pass
return nums or [ sys.maxsize ]
def find_library(name):
ename = re.escape(name)
expr = r':-l%s\.\S+ => \S*/(lib%s\.\S+)' % (ename, ename)
with contextlib.closing(os.popen('/sbin/ldconfig -r 2>/dev/null')) as f:
data = f.read()
res = re.findall(expr, data)
if not res:
return _get_soname(_findLib_gcc(name))
res.sort(key=_num_version)
return res[-1]
elif sys.platform == "sunos5":
def _findLib_crle(name, is64):
if not os.path.exists('/usr/bin/crle'):
return None
if is64:
cmd = 'env LC_ALL=C /usr/bin/crle -64 2>/dev/null'
else:
cmd = 'env LC_ALL=C /usr/bin/crle 2>/dev/null'
with contextlib.closing(os.popen(cmd)) as f:
for line in f.readlines():
line = line.strip()
if line.startswith('Default Library Path (ELF):'):
paths = line.split()[4]
if not paths:
return None
for dir in paths.split(":"):
libfile = os.path.join(dir, "lib%s.so" % name)
if os.path.exists(libfile):
return libfile
return None
def find_library(name, is64 = False):
return _get_soname(_findLib_crle(name, is64) or _findLib_gcc(name))
else:
def _findSoname_ldconfig(name):
import struct
if struct.calcsize('l') == 4:
machine = os.uname().machine + '-32'
else:
machine = os.uname().machine + '-64'
mach_map = {
'x86_64-64': 'libc6,x86-64',
'ppc64-64': 'libc6,64bit',
'sparc64-64': 'libc6,64bit',
's390x-64': 'libc6,64bit',
'ia64-64': 'libc6,IA-64',
}
abi_type = mach_map.get(machine, 'libc6')
# XXX assuming GLIBC's ldconfig (with option -p)
regex = os.fsencode(
r'\s+(lib%s\.[^\s]+)\s+\(%s' % (re.escape(name), abi_type))
try:
with subprocess.Popen(['/sbin/ldconfig', '-p'],
stdin=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
stdout=subprocess.PIPE,
env={'LC_ALL': 'C', 'LANG': 'C'}) as p:
res = re.search(regex, p.stdout.read())
if res:
return os.fsdecode(res.group(1))
except OSError:
pass
def find_library(name):
return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
################################################################
# test code
def test():
from ctypes import cdll
if os.name == "nt":
print(cdll.msvcrt)
print(cdll.load("msvcrt"))
print(find_library("msvcrt"))
if os.name == "posix":
# find and load_version
print(find_library("m"))
print(find_library("c"))
print(find_library("bz2"))
# getattr
## print cdll.m
## print cdll.bz2
# load
if sys.platform == "darwin":
print(cdll.LoadLibrary("libm.dylib"))
print(cdll.LoadLibrary("libcrypto.dylib"))
print(cdll.LoadLibrary("libSystem.dylib"))
print(cdll.LoadLibrary("System.framework/System"))
else:
print(cdll.LoadLibrary("libm.so"))
print(cdll.LoadLibrary("libcrypt.so"))
print(find_library("crypt"))
if __name__ == "__main__":
test()
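# Editor's sketch (not part of ctypes/util.py): the usual pairing of
# find_library() with CDLL. Assumes a POSIX host where the C math library
# is present; on Windows, find_library("m") falls back to the VC runtime.
from ctypes import CDLL, c_double
from ctypes.util import find_library
_libm = CDLL(find_library("m"))     # e.g. resolves to 'libm.so.6' on glibc
_libm.sqrt.restype = c_double       # declare the return type ...
_libm.sqrt.argtypes = [c_double]    # ... and the argument types
assert abs(_libm.sqrt(2.0) - 1.4142135623730951) < 1e-12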
# The most useful windows datatypes
import ctypes
BYTE = ctypes.c_byte
WORD = ctypes.c_ushort
DWORD = ctypes.c_ulong
#UCHAR = ctypes.c_uchar
CHAR = ctypes.c_char
WCHAR = ctypes.c_wchar
UINT = ctypes.c_uint
INT = ctypes.c_int
DOUBLE = ctypes.c_double
FLOAT = ctypes.c_float
BOOLEAN = BYTE
BOOL = ctypes.c_long
class VARIANT_BOOL(ctypes._SimpleCData):
_type_ = "v"
def __repr__(self):
return "%s(%r)" % (self.__class__.__name__, self.value)
ULONG = ctypes.c_ulong
LONG = ctypes.c_long
USHORT = ctypes.c_ushort
SHORT = ctypes.c_short
# in the windows header files, these are structures.
_LARGE_INTEGER = LARGE_INTEGER = ctypes.c_longlong
_ULARGE_INTEGER = ULARGE_INTEGER = ctypes.c_ulonglong
LPCOLESTR = LPOLESTR = OLESTR = ctypes.c_wchar_p
LPCWSTR = LPWSTR = ctypes.c_wchar_p
LPCSTR = LPSTR = ctypes.c_char_p
LPCVOID = LPVOID = ctypes.c_void_p
# WPARAM is defined as UINT_PTR (unsigned type)
# LPARAM is defined as LONG_PTR (signed type)
if ctypes.sizeof(ctypes.c_long) == ctypes.sizeof(ctypes.c_void_p):
WPARAM = ctypes.c_ulong
LPARAM = ctypes.c_long
elif ctypes.sizeof(ctypes.c_longlong) == ctypes.sizeof(ctypes.c_void_p):
WPARAM = ctypes.c_ulonglong
LPARAM = ctypes.c_longlong
ATOM = WORD
LANGID = WORD
COLORREF = DWORD
LGRPID = DWORD
LCTYPE = DWORD
LCID = DWORD
################################################################
# HANDLE types
HANDLE = ctypes.c_void_p # in the header files: void *
HACCEL = HANDLE
HBITMAP = HANDLE
HBRUSH = HANDLE
HCOLORSPACE = HANDLE
HDC = HANDLE
HDESK = HANDLE
HDWP = HANDLE
HENHMETAFILE = HANDLE
HFONT = HANDLE
HGDIOBJ = HANDLE
HGLOBAL = HANDLE
HHOOK = HANDLE
HICON = HANDLE
HINSTANCE = HANDLE
HKEY = HANDLE
HKL = HANDLE
HLOCAL = HANDLE
HMENU = HANDLE
HMETAFILE = HANDLE
HMODULE = HANDLE
HMONITOR = HANDLE
HPALETTE = HANDLE
HPEN = HANDLE
HRGN = HANDLE
HRSRC = HANDLE
HSTR = HANDLE
HTASK = HANDLE
HWINSTA = HANDLE
HWND = HANDLE
SC_HANDLE = HANDLE
SERVICE_STATUS_HANDLE = HANDLE
################################################################
# Some important structure definitions
class RECT(ctypes.Structure):
_fields_ = [("left", LONG),
("top", LONG),
("right", LONG),
("bottom", LONG)]
tagRECT = _RECTL = RECTL = RECT
class _SMALL_RECT(ctypes.Structure):
_fields_ = [('Left', SHORT),
('Top', SHORT),
('Right', SHORT),
('Bottom', SHORT)]
SMALL_RECT = _SMALL_RECT
class _COORD(ctypes.Structure):
_fields_ = [('X', SHORT),
('Y', SHORT)]
class POINT(ctypes.Structure):
_fields_ = [("x", LONG),
("y", LONG)]
tagPOINT = _POINTL = POINTL = POINT
class SIZE(ctypes.Structure):
_fields_ = [("cx", LONG),
("cy", LONG)]
tagSIZE = SIZEL = SIZE
def RGB(red, green, blue):
return red + (green << 8) + (blue << 16)
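# Editor's sketch (not part of the original wintypes source): a COLORREF
# packs the channels as 0x00BBGGRR, which RGB() reproduces.
from ctypes.wintypes import RGB
assert RGB(0xFF, 0x00, 0x00) == 0x0000FF    # pure red
assert RGB(0x12, 0x34, 0x56) == 0x563412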
class FILETIME(ctypes.Structure):
_fields_ = [("dwLowDateTime", DWORD),
("dwHighDateTime", DWORD)]
_FILETIME = FILETIME
class MSG(ctypes.Structure):
_fields_ = [("hWnd", HWND),
("message", UINT),
("wParam", WPARAM),
("lParam", LPARAM),
("time", DWORD),
("pt", POINT)]
tagMSG = MSG
MAX_PATH = 260
class WIN32_FIND_DATAA(ctypes.Structure):
_fields_ = [("dwFileAttributes", DWORD),
("ftCreationTime", FILETIME),
("ftLastAccessTime", FILETIME),
("ftLastWriteTime", FILETIME),
("nFileSizeHigh", DWORD),
("nFileSizeLow", DWORD),
("dwReserved0", DWORD),
("dwReserved1", DWORD),
("cFileName", CHAR * MAX_PATH),
("cAlternateFileName", CHAR * 14)]
class WIN32_FIND_DATAW(ctypes.Structure):
_fields_ = [("dwFileAttributes", DWORD),
("ftCreationTime", FILETIME),
("ftLastAccessTime", FILETIME),
("ftLastWriteTime", FILETIME),
("nFileSizeHigh", DWORD),
("nFileSizeLow", DWORD),
("dwReserved0", DWORD),
("dwReserved1", DWORD),
("cFileName", WCHAR * MAX_PATH),
("cAlternateFileName", WCHAR * 14)]
################################################################
# Pointer types
LPBOOL = PBOOL = ctypes.POINTER(BOOL)
PBOOLEAN = ctypes.POINTER(BOOLEAN)
LPBYTE = PBYTE = ctypes.POINTER(BYTE)
PCHAR = ctypes.POINTER(CHAR)
LPCOLORREF = ctypes.POINTER(COLORREF)
LPDWORD = PDWORD = ctypes.POINTER(DWORD)
LPFILETIME = PFILETIME = ctypes.POINTER(FILETIME)
PFLOAT = ctypes.POINTER(FLOAT)
LPHANDLE = PHANDLE = ctypes.POINTER(HANDLE)
PHKEY = ctypes.POINTER(HKEY)
LPHKL = ctypes.POINTER(HKL)
LPINT = PINT = ctypes.POINTER(INT)
PLARGE_INTEGER = ctypes.POINTER(LARGE_INTEGER)
PLCID = ctypes.POINTER(LCID)
LPLONG = PLONG = ctypes.POINTER(LONG)
LPMSG = PMSG = ctypes.POINTER(MSG)
LPPOINT = PPOINT = ctypes.POINTER(POINT)
PPOINTL = ctypes.POINTER(POINTL)
LPRECT = PRECT = ctypes.POINTER(RECT)
LPRECTL = PRECTL = ctypes.POINTER(RECTL)
LPSC_HANDLE = ctypes.POINTER(SC_HANDLE)
PSHORT = ctypes.POINTER(SHORT)
LPSIZE = PSIZE = ctypes.POINTER(SIZE)
LPSIZEL = PSIZEL = ctypes.POINTER(SIZEL)
PSMALL_RECT = ctypes.POINTER(SMALL_RECT)
LPUINT = PUINT = ctypes.POINTER(UINT)
PULARGE_INTEGER = ctypes.POINTER(ULARGE_INTEGER)
PULONG = ctypes.POINTER(ULONG)
PUSHORT = ctypes.POINTER(USHORT)
PWCHAR = ctypes.POINTER(WCHAR)
LPWIN32_FIND_DATAA = PWIN32_FIND_DATAA = ctypes.POINTER(WIN32_FIND_DATAA)
LPWIN32_FIND_DATAW = PWIN32_FIND_DATAW = ctypes.POINTER(WIN32_FIND_DATAW)
LPWORD = PWORD = ctypes.POINTER(WORD)
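# Editor's sketch (not part of the original wintypes source): these aliases
# mirror <windows.h>; a RECT built by field name can be handed via
# ctypes.byref() to user32 calls such as GetWindowRect on Windows.
from ctypes.wintypes import RECT
_r = RECT(left=0, top=0, right=640, bottom=480)
assert (_r.right - _r.left, _r.bottom - _r.top) == (640, 480)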
import sys
from ctypes import *
_array_type = type(Array)
def _other_endian(typ):
"""Return the type with the 'other' byte order. Simple types like
c_int and so on already have __ctype_be__ and __ctype_le__
attributes which contain the types, for more complicated types
arrays and structures are supported.
"""
# check _OTHER_ENDIAN attribute (present if typ is primitive type)
if hasattr(typ, _OTHER_ENDIAN):
return getattr(typ, _OTHER_ENDIAN)
# if typ is array
if isinstance(typ, _array_type):
return _other_endian(typ._type_) * typ._length_
# if typ is structure
if issubclass(typ, Structure):
return typ
raise TypeError("This type does not support other endian: %s" % typ)
class _swapped_meta(type(Structure)):
def __setattr__(self, attrname, value):
if attrname == "_fields_":
fields = []
for desc in value:
name = desc[0]
typ = desc[1]
rest = desc[2:]
fields.append((name, _other_endian(typ)) + rest)
value = fields
super().__setattr__(attrname, value)
################################################################
# Note: The Structure metaclass checks for the *presence* (not the
# value!) of a _swappedbytes_ attribute to determine the bit order in
# structures containing bit fields.
if sys.byteorder == "little":
_OTHER_ENDIAN = "__ctype_be__"
LittleEndianStructure = Structure
class BigEndianStructure(Structure, metaclass=_swapped_meta):
"""Structure with big endian byte order"""
_swappedbytes_ = None
elif sys.byteorder == "big":
_OTHER_ENDIAN = "__ctype_le__"
BigEndianStructure = Structure
class LittleEndianStructure(Structure, metaclass=_swapped_meta):
"""Structure with little endian byte order"""
_swappedbytes_ = None
else:
raise RuntimeError("Invalid byteorder")
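# Editor's sketch (not part of _endian.py): a BigEndianStructure stores its
# fields big-endian on any host, which suits fixed network/file layouts.
from ctypes import BigEndianStructure, c_uint16
class _BEDemo(BigEndianStructure):
    _fields_ = [("value", c_uint16)]
assert bytes(_BEDemo(0x0102)) == b"\x01\x02"   # most significant byte first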
"""create and manipulate C data types in Python"""
import os as _os, sys as _sys
__version__ = "1.1.0"
from _ctypes import Union, Structure, Array
from _ctypes import _Pointer
from _ctypes import CFuncPtr as _CFuncPtr
from _ctypes import __version__ as _ctypes_version
from _ctypes import RTLD_LOCAL, RTLD_GLOBAL
from _ctypes import ArgumentError
from struct import calcsize as _calcsize
if __version__ != _ctypes_version:
raise Exception("Version number mismatch", __version__, _ctypes_version)
if _os.name in ("nt", "ce"):
from _ctypes import FormatError
DEFAULT_MODE = RTLD_LOCAL
if _os.name == "posix" and _sys.platform == "darwin":
# On OS X 10.3, we use RTLD_GLOBAL as default mode
# because RTLD_LOCAL does not work at least on some
# libraries. OS X 10.3 is Darwin 7, so we check for
# that.
if int(_os.uname().release.split('.')[0]) < 8:
DEFAULT_MODE = RTLD_GLOBAL
from _ctypes import FUNCFLAG_CDECL as _FUNCFLAG_CDECL, \
FUNCFLAG_PYTHONAPI as _FUNCFLAG_PYTHONAPI, \
FUNCFLAG_USE_ERRNO as _FUNCFLAG_USE_ERRNO, \
FUNCFLAG_USE_LASTERROR as _FUNCFLAG_USE_LASTERROR
# WINOLEAPI -> HRESULT
# WINOLEAPI_(type)
#
# STDMETHODCALLTYPE
#
# STDMETHOD(name)
# STDMETHOD_(type, name)
#
# STDAPICALLTYPE
def create_string_buffer(init, size=None):
"""create_string_buffer(aBytes) -> character array
create_string_buffer(anInteger) -> character array
create_string_buffer(aBytes, anInteger) -> character array
"""
if isinstance(init, bytes):
if size is None:
size = len(init)+1
buftype = c_char * size
buf = buftype()
buf.value = init
return buf
elif isinstance(init, int):
buftype = c_char * init
buf = buftype()
return buf
raise TypeError(init)
def c_buffer(init, size=None):
## "deprecated, use create_string_buffer instead"
## import warnings
## warnings.warn("c_buffer is deprecated, use create_string_buffer instead",
## DeprecationWarning, stacklevel=2)
return create_string_buffer(init, size)
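# Editor's sketch (not part of ctypes/__init__.py): create_string_buffer()
# yields a mutable, NUL-terminated char array -- the usual way to pass a
# writable buffer to C code.
from ctypes import create_string_buffer
_buf = create_string_buffer(b"abc")      # size defaults to len + 1 for the NUL
assert _buf.raw == b"abc\x00" and _buf.value == b"abc"
_buf2 = create_string_buffer(8)          # integer argument: zero-filled buffer
assert len(_buf2) == 8 and _buf2.raw == b"\x00" * 8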
_c_functype_cache = {}
def CFUNCTYPE(restype, *argtypes, **kw):
"""CFUNCTYPE(restype, *argtypes,
use_errno=False, use_last_error=False) -> function prototype.
restype: the result type
argtypes: a sequence specifying the argument types
The function prototype can be called in different ways to create a
callable object:
prototype(integer address) -> foreign function
prototype(callable) -> create and return a C callable function from callable
prototype(integer index, method name[, paramflags]) -> foreign function calling a COM method
prototype((ordinal number, dll object)[, paramflags]) -> foreign function exported by ordinal
prototype((function name, dll object)[, paramflags]) -> foreign function exported by name
"""
flags = _FUNCFLAG_CDECL
if kw.pop("use_errno", False):
flags |= _FUNCFLAG_USE_ERRNO
if kw.pop("use_last_error", False):
flags |= _FUNCFLAG_USE_LASTERROR
if kw:
raise ValueError("unexpected keyword argument(s) %s" % kw.keys())
try:
return _c_functype_cache[(restype, argtypes, flags)]
except KeyError:
class CFunctionType(_CFuncPtr):
_argtypes_ = argtypes
_restype_ = restype
_flags_ = flags
_c_functype_cache[(restype, argtypes, flags)] = CFunctionType
return CFunctionType
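# Editor's sketch (not part of ctypes/__init__.py): calling a prototype with
# a Python callable yields a C-callable function pointer (e.g. a qsort()
# comparison callback); the resulting object is callable from Python too.
from ctypes import CFUNCTYPE, c_int
_CMPFUNC = CFUNCTYPE(c_int, c_int, c_int)
@_CMPFUNC
def _py_cmp(a, b):      # a and b arrive converted to plain Python ints
    return a - b
assert _py_cmp(3, 1) == 2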
if _os.name in ("nt", "ce"):
from _ctypes import LoadLibrary as _dlopen
from _ctypes import FUNCFLAG_STDCALL as _FUNCFLAG_STDCALL
if _os.name == "ce":
# 'ce' doesn't have the stdcall calling convention
_FUNCFLAG_STDCALL = _FUNCFLAG_CDECL
_win_functype_cache = {}
def WINFUNCTYPE(restype, *argtypes, **kw):
# docstring set later (very similar to CFUNCTYPE.__doc__)
flags = _FUNCFLAG_STDCALL
if kw.pop("use_errno", False):
flags |= _FUNCFLAG_USE_ERRNO
if kw.pop("use_last_error", False):
flags |= _FUNCFLAG_USE_LASTERROR
if kw:
raise ValueError("unexpected keyword argument(s) %s" % kw.keys())
try:
return _win_functype_cache[(restype, argtypes, flags)]
except KeyError:
class WinFunctionType(_CFuncPtr):
_argtypes_ = argtypes
_restype_ = restype
_flags_ = flags
_win_functype_cache[(restype, argtypes, flags)] = WinFunctionType
return WinFunctionType
if WINFUNCTYPE.__doc__:
WINFUNCTYPE.__doc__ = CFUNCTYPE.__doc__.replace("CFUNCTYPE", "WINFUNCTYPE")
elif _os.name == "posix":
from _ctypes import dlopen as _dlopen
from _ctypes import sizeof, byref, addressof, alignment, resize
from _ctypes import get_errno, set_errno
from _ctypes import _SimpleCData
def _check_size(typ, typecode=None):
# Check sizeof(ctypes_type) against struct.calcsize. This
# should protect somewhat against a misconfigured libffi.
from struct import calcsize
if typecode is None:
# Most _type_ codes are the same as used in struct
typecode = typ._type_
actual, required = sizeof(typ), calcsize(typecode)
if actual != required:
raise SystemError("sizeof(%s) wrong: %d instead of %d" % \
(typ, actual, required))
class py_object(_SimpleCData):
_type_ = "O"
def __repr__(self):
try:
return super().__repr__()
except ValueError:
return "%s(<NULL>)" % type(self).__name__
_check_size(py_object, "P")
class c_short(_SimpleCData):
_type_ = "h"
_check_size(c_short)
class c_ushort(_SimpleCData):
_type_ = "H"
_check_size(c_ushort)
class c_long(_SimpleCData):
_type_ = "l"
_check_size(c_long)
class c_ulong(_SimpleCData):
_type_ = "L"
_check_size(c_ulong)
if _calcsize("i") == _calcsize("l"):
# if int and long have the same size, make c_int an alias for c_long
c_int = c_long
c_uint = c_ulong
else:
class c_int(_SimpleCData):
_type_ = "i"
_check_size(c_int)
class c_uint(_SimpleCData):
_type_ = "I"
_check_size(c_uint)
class c_float(_SimpleCData):
_type_ = "f"
_check_size(c_float)
class c_double(_SimpleCData):
_type_ = "d"
_check_size(c_double)
class c_longdouble(_SimpleCData):
_type_ = "g"
if sizeof(c_longdouble) == sizeof(c_double):
c_longdouble = c_double
if _calcsize("l") == _calcsize("q"):
# if long and long long have the same size, make c_longlong an alias for c_long
c_longlong = c_long
c_ulonglong = c_ulong
else:
class c_longlong(_SimpleCData):
_type_ = "q"
_check_size(c_longlong)
class c_ulonglong(_SimpleCData):
_type_ = "Q"
## def from_param(cls, val):
## return ('d', float(val), val)
## from_param = classmethod(from_param)
_check_size(c_ulonglong)
class c_ubyte(_SimpleCData):
_type_ = "B"
c_ubyte.__ctype_le__ = c_ubyte.__ctype_be__ = c_ubyte
# backward compatibility:
##c_uchar = c_ubyte
_check_size(c_ubyte)
class c_byte(_SimpleCData):
_type_ = "b"
c_byte.__ctype_le__ = c_byte.__ctype_be__ = c_byte
_check_size(c_byte)
class c_char(_SimpleCData):
_type_ = "c"
c_char.__ctype_le__ = c_char.__ctype_be__ = c_char
_check_size(c_char)
class c_char_p(_SimpleCData):
_type_ = "z"
if _os.name == "nt":
def __repr__(self):
if not windll.kernel32.IsBadStringPtrA(self, -1):
return "%s(%r)" % (self.__class__.__name__, self.value)
return "%s(%s)" % (self.__class__.__name__, cast(self, c_void_p).value)
else:
def __repr__(self):
return "%s(%s)" % (self.__class__.__name__, cast(self, c_void_p).value)
_check_size(c_char_p, "P")
class c_void_p(_SimpleCData):
_type_ = "P"
c_voidp = c_void_p # backwards compatibility (to a bug)
_check_size(c_void_p)
class c_bool(_SimpleCData):
_type_ = "?"
from _ctypes import POINTER, pointer, _pointer_type_cache
class c_wchar_p(_SimpleCData):
_type_ = "Z"
class c_wchar(_SimpleCData):
_type_ = "u"
def _reset_cache():
_pointer_type_cache.clear()
_c_functype_cache.clear()
if _os.name in ("nt", "ce"):
_win_functype_cache.clear()
# _SimpleCData.c_wchar_p_from_param
POINTER(c_wchar).from_param = c_wchar_p.from_param
# _SimpleCData.c_char_p_from_param
POINTER(c_char).from_param = c_char_p.from_param
_pointer_type_cache[None] = c_void_p
# XXX for whatever reasons, creating the first instance of a callback
# function is needed for the unittests on Win64 to succeed. This MAY
# be a compiler bug, since the problem occurs only when _ctypes is
# compiled with the MS SDK compiler. Or an uninitialized variable?
CFUNCTYPE(c_int)(lambda: None)
def create_unicode_buffer(init, size=None):
"""create_unicode_buffer(aString) -> character array
create_unicode_buffer(anInteger) -> character array
create_unicode_buffer(aString, anInteger) -> character array
"""
if isinstance(init, str):
if size is None:
size = len(init)+1
buftype = c_wchar * size
buf = buftype()
buf.value = init
return buf
elif isinstance(init, int):
buftype = c_wchar * init
buf = buftype()
return buf
raise TypeError(init)
# XXX Deprecated
def SetPointerType(pointer, cls):
if _pointer_type_cache.get(cls, None) is not None:
raise RuntimeError("This type already exists in the cache")
if id(pointer) not in _pointer_type_cache:
raise RuntimeError("What's this???")
pointer.set_type(cls)
_pointer_type_cache[cls] = pointer
del _pointer_type_cache[id(pointer)]
# XXX Deprecated
def ARRAY(typ, len):
return typ * len
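# Editor's sketch (not part of ctypes/__init__.py): ARRAY(t, n) is plain
# sugar for the multiply form t * n shown above.
from ctypes import ARRAY, c_int
_Triple = ARRAY(c_int, 3)       # identical to writing c_int * 3
_arr = _Triple(1, 2, 3)
assert list(_arr) == [1, 2, 3]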
################################################################
class CDLL(object):
"""An instance of this class represents a loaded dll/shared
library, exporting functions using the standard C calling
convention (named 'cdecl' on Windows).
The exported functions can be accessed as attributes, or by
indexing with the function name. Examples:
<obj>.qsort -> callable object
<obj>['qsort'] -> callable object
Calling the functions releases the Python GIL during the call and
reacquires it afterwards.
"""
_func_flags_ = _FUNCFLAG_CDECL
_func_restype_ = c_int
def __init__(self, name, mode=DEFAULT_MODE, handle=None,
use_errno=False,
use_last_error=False):
self._name = name
flags = self._func_flags_
if use_errno:
flags |= _FUNCFLAG_USE_ERRNO
if use_last_error:
flags |= _FUNCFLAG_USE_LASTERROR
class _FuncPtr(_CFuncPtr):
_flags_ = flags
_restype_ = self._func_restype_
self._FuncPtr = _FuncPtr
if handle is None:
self._handle = _dlopen(self._name, mode)
else:
self._handle = handle
def __repr__(self):
return "<%s '%s', handle %x at %x>" % \
(self.__class__.__name__, self._name,
(self._handle & (_sys.maxsize*2 + 1)),
id(self) & (_sys.maxsize*2 + 1))
def __getattr__(self, name):
if name.startswith('__') and name.endswith('__'):
raise AttributeError(name)
func = self.__getitem__(name)
setattr(self, name, func)
return func
def __getitem__(self, name_or_ordinal):
func = self._FuncPtr((name_or_ordinal, self))
if not isinstance(name_or_ordinal, int):
func.__name__ = name_or_ordinal
return func
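# Editor's sketch (not part of ctypes/__init__.py; assumes a POSIX host with
# libc available): attribute access on a CDLL instance resolves the symbol
# once and caches the foreign function object on the instance.
import ctypes, ctypes.util
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.strlen.argtypes = [ctypes.c_char_p]
assert _libc.strlen(b"hello") == 5      # also reachable as _libc['strlen']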
class PyDLL(CDLL):
"""This class represents the Python library itself. It allows to
access Python API functions. The GIL is not released, and
Python exceptions are handled correctly.
"""
_func_flags_ = _FUNCFLAG_CDECL | _FUNCFLAG_PYTHONAPI
if _os.name in ("nt", "ce"):
class WinDLL(CDLL):
"""This class represents a dll exporting functions using the
Windows stdcall calling convention.
"""
_func_flags_ = _FUNCFLAG_STDCALL
# XXX Hm, what about HRESULT as normal parameter?
# Mustn't it derive from c_long then?
from _ctypes import _check_HRESULT, _SimpleCData
class HRESULT(_SimpleCData):
_type_ = "l"
# _check_retval_ is called with the function's result when it
# is used as restype. It checks for the FAILED bit, and
# raises an OSError if it is set.
#
# The _check_retval_ method is implemented in C, so that the
# method definition itself is not included in the traceback
# when it raises an error - that is what we want (and Python
# doesn't have a way to raise an exception in the caller's
# frame).
_check_retval_ = _check_HRESULT
class OleDLL(CDLL):
"""This class represents a dll exporting functions using the
Windows stdcall calling convention, and returning HRESULT.
HRESULT error values are automatically raised as OSError
exceptions.
"""
_func_flags_ = _FUNCFLAG_STDCALL
_func_restype_ = HRESULT
class LibraryLoader(object):
def __init__(self, dlltype):
self._dlltype = dlltype
def __getattr__(self, name):
if name[0] == '_':
raise AttributeError(name)
dll = self._dlltype(name)
setattr(self, name, dll)
return dll
def __getitem__(self, name):
return getattr(self, name)
def LoadLibrary(self, name):
return self._dlltype(name)
cdll = LibraryLoader(CDLL)
pydll = LibraryLoader(PyDLL)
if _os.name in ("nt", "ce"):
pythonapi = PyDLL("python dll", None, _sys.dllhandle)
elif _sys.platform == "cygwin":
pythonapi = PyDLL("libpython%d.%d.dll" % _sys.version_info[:2])
else:
pythonapi = PyDLL(None)
if _os.name in ("nt", "ce"):
windll = LibraryLoader(WinDLL)
oledll = LibraryLoader(OleDLL)
if _os.name == "nt":
GetLastError = windll.kernel32.GetLastError
else:
GetLastError = windll.coredll.GetLastError
from _ctypes import get_last_error, set_last_error
def WinError(code=None, descr=None):
if code is None:
code = GetLastError()
if descr is None:
descr = FormatError(code).strip()
return OSError(None, descr, None, code)
if sizeof(c_uint) == sizeof(c_void_p):
c_size_t = c_uint
c_ssize_t = c_int
elif sizeof(c_ulong) == sizeof(c_void_p):
c_size_t = c_ulong
c_ssize_t = c_long
elif sizeof(c_ulonglong) == sizeof(c_void_p):
c_size_t = c_ulonglong
c_ssize_t = c_longlong
# functions
from _ctypes import _memmove_addr, _memset_addr, _string_at_addr, _cast_addr
## void *memmove(void *, const void *, size_t);
memmove = CFUNCTYPE(c_void_p, c_void_p, c_void_p, c_size_t)(_memmove_addr)
## void *memset(void *, int, size_t)
memset = CFUNCTYPE(c_void_p, c_void_p, c_int, c_size_t)(_memset_addr)
def PYFUNCTYPE(restype, *argtypes):
class CFunctionType(_CFuncPtr):
_argtypes_ = argtypes
_restype_ = restype
_flags_ = _FUNCFLAG_CDECL | _FUNCFLAG_PYTHONAPI
return CFunctionType
_cast = PYFUNCTYPE(py_object, c_void_p, py_object, py_object)(_cast_addr)
def cast(obj, typ):
return _cast(obj, obj, typ)
_string_at = PYFUNCTYPE(py_object, c_void_p, c_int)(_string_at_addr)
def string_at(ptr, size=-1):
"""string_at(addr[, size]) -> string
Return the string at addr."""
return _string_at(ptr, size)
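# Editor's sketch (not part of ctypes/__init__.py): string_at() reads raw
# memory, so the demo points it at a buffer the snippet itself owns.
from ctypes import create_string_buffer, addressof, string_at
_b = create_string_buffer(b"hello")
assert string_at(addressof(_b)) == b"hello"    # size=-1: read up to the NUL
assert string_at(addressof(_b), 2) == b"he"    # explicit size: exact bytes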
try:
from _ctypes import _wstring_at_addr
except ImportError:
pass
else:
_wstring_at = PYFUNCTYPE(py_object, c_void_p, c_int)(_wstring_at_addr)
def wstring_at(ptr, size=-1):
"""wstring_at(addr[, size]) -> string
Return the string at addr."""
return _wstring_at(ptr, size)
if _os.name in ("nt", "ce"): # COM stuff
def DllGetClassObject(rclsid, riid, ppv):
try:
ccom = __import__("comtypes.server.inprocserver", globals(), locals(), ['*'])
except ImportError:
return -2147221231 # CLASS_E_CLASSNOTAVAILABLE
else:
return ccom.DllGetClassObject(rclsid, riid, ppv)
def DllCanUnloadNow():
try:
ccom = __import__("comtypes.server.inprocserver", globals(), locals(), ['*'])
except ImportError:
return 0 # S_OK
return ccom.DllCanUnloadNow()
from ctypes._endian import BigEndianStructure, LittleEndianStructure
# Fill in specifically-sized types
c_int8 = c_byte
c_uint8 = c_ubyte
for kind in [c_short, c_int, c_long, c_longlong]:
if sizeof(kind) == 2: c_int16 = kind
elif sizeof(kind) == 4: c_int32 = kind
elif sizeof(kind) == 8: c_int64 = kind
for kind in [c_ushort, c_uint, c_ulong, c_ulonglong]:
if sizeof(kind) == 2: c_uint16 = kind
elif sizeof(kind) == 4: c_uint32 = kind
elif sizeof(kind) == 8: c_uint64 = kind
del(kind)
_reset_cache()
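# Editor's sketch (not part of ctypes/__init__.py): the fixed-width aliases
# chosen above always match their nominal widths, whatever the platform's
# native int/long sizes happen to be.
from ctypes import sizeof, c_int8, c_int16, c_int32, c_int64
assert [sizeof(t) for t in (c_int8, c_int16, c_int32, c_int64)] == [1, 2, 4, 8]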
"""
dyld emulation
"""
import os
from ctypes.macholib.framework import framework_info
from ctypes.macholib.dylib import dylib_info
from itertools import *
__all__ = [
'dyld_find', 'framework_find',
'framework_info', 'dylib_info',
]
# These are the defaults as per man dyld(1)
#
DEFAULT_FRAMEWORK_FALLBACK = [
os.path.expanduser("~/Library/Frameworks"),
"/Library/Frameworks",
"/Network/Library/Frameworks",
"/System/Library/Frameworks",
]
DEFAULT_LIBRARY_FALLBACK = [
os.path.expanduser("~/lib"),
"/usr/local/lib",
"/lib",
"/usr/lib",
]
def dyld_env(env, var):
if env is None:
env = os.environ
rval = env.get(var)
if rval is None:
return []
return rval.split(':')
def dyld_image_suffix(env=None):
if env is None:
env = os.environ
return env.get('DYLD_IMAGE_SUFFIX')
def dyld_framework_path(env=None):
return dyld_env(env, 'DYLD_FRAMEWORK_PATH')
def dyld_library_path(env=None):
return dyld_env(env, 'DYLD_LIBRARY_PATH')
def dyld_fallback_framework_path(env=None):
return dyld_env(env, 'DYLD_FALLBACK_FRAMEWORK_PATH')
def dyld_fallback_library_path(env=None):
return dyld_env(env, 'DYLD_FALLBACK_LIBRARY_PATH')
def dyld_image_suffix_search(iterator, env=None):
"""For a potential path iterator, add DYLD_IMAGE_SUFFIX semantics"""
suffix = dyld_image_suffix(env)
if suffix is None:
return iterator
def _inject(iterator=iterator, suffix=suffix):
for path in iterator:
if path.endswith('.dylib'):
yield path[:-len('.dylib')] + suffix + '.dylib'
else:
yield path + suffix
yield path
return _inject()
def dyld_override_search(name, env=None):
# If DYLD_FRAMEWORK_PATH is set and this dylib_name is a
# framework name, use the first file that exists in the framework
# path if any. If there is none go on to search the DYLD_LIBRARY_PATH
# if any.
framework = framework_info(name)
if framework is not None:
for path in dyld_framework_path(env):
yield os.path.join(path, framework['name'])
# If DYLD_LIBRARY_PATH is set then use the first file that exists
# in the path. If none use the original name.
for path in dyld_library_path(env):
yield os.path.join(path, os.path.basename(name))
def dyld_executable_path_search(name, executable_path=None):
# If we haven't done any searching and found a library and the
# dylib_name starts with "@executable_path/" then construct the
# library name.
if name.startswith('@executable_path/') and executable_path is not None:
yield os.path.join(executable_path, name[len('@executable_path/'):])
def dyld_default_search(name, env=None):
yield name
framework = framework_info(name)
if framework is not None:
fallback_framework_path = dyld_fallback_framework_path(env)
for path in fallback_framework_path:
yield os.path.join(path, framework['name'])
fallback_library_path = dyld_fallback_library_path(env)
for path in fallback_library_path:
yield os.path.join(path, os.path.basename(name))
if framework is not None and not fallback_framework_path:
for path in DEFAULT_FRAMEWORK_FALLBACK:
yield os.path.join(path, framework['name'])
if not fallback_library_path:
for path in DEFAULT_LIBRARY_FALLBACK:
yield os.path.join(path, os.path.basename(name))
def dyld_find(name, executable_path=None, env=None):
"""
Find a library or framework using dyld semantics
"""
for path in dyld_image_suffix_search(chain(
dyld_override_search(name, env),
dyld_executable_path_search(name, executable_path),
dyld_default_search(name, env),
), env):
if os.path.isfile(path):
return path
raise ValueError("dylib %s could not be found" % (name,))
def framework_find(fn, executable_path=None, env=None):
"""
Find a framework using dyld semantics in a very loose manner.
Will take input such as:
Python
Python.framework
Python.framework/Versions/Current
"""
try:
return dyld_find(fn, executable_path=executable_path, env=env)
except ValueError as e:
pass
fmwk_index = fn.rfind('.framework')
if fmwk_index == -1:
fmwk_index = len(fn)
fn += '.framework'
fn = os.path.join(fn, os.path.basename(fn[:fmwk_index]))
try:
return dyld_find(fn, executable_path=executable_path, env=env)
except ValueError:
raise e
def test_dyld_find():
env = {}
assert dyld_find('libSystem.dylib') == '/usr/lib/libSystem.dylib'
assert dyld_find('System.framework/System') == '/System/Library/Frameworks/System.framework/System'
if __name__ == '__main__':
test_dyld_find()
"""
Generic dylib path manipulation
"""
import re
__all__ = ['dylib_info']
DYLIB_RE = re.compile(r"""(?x)
(?P<location>^.*)(?:^|/)
(?P<name>
(?P<shortname>\w+?)
(?:\.(?P<version>[^._]+))?
(?:_(?P<suffix>[^._]+))?
\.dylib$
)
""")
def dylib_info(filename):
"""
A dylib name can take one of the following four forms:
Location/Name.SomeVersion_Suffix.dylib
Location/Name.SomeVersion.dylib
Location/Name_Suffix.dylib
Location/Name.dylib
returns None if not found or a mapping equivalent to:
dict(
location='Location',
name='Name.SomeVersion_Suffix.dylib',
shortname='Name',
version='SomeVersion',
suffix='Suffix',
)
Note that SomeVersion and Suffix are optional and may be None
if not present.
"""
is_dylib = DYLIB_RE.match(filename)
if not is_dylib:
return None
return is_dylib.groupdict()
def test_dylib_info():
def d(location=None, name=None, shortname=None, version=None, suffix=None):
return dict(
location=location,
name=name,
shortname=shortname,
version=version,
suffix=suffix
)
assert dylib_info('completely/invalid') is None
assert dylib_info('completely/invalide_debug') is None
assert dylib_info('P/Foo.dylib') == d('P', 'Foo.dylib', 'Foo')
assert dylib_info('P/Foo_debug.dylib') == d('P', 'Foo_debug.dylib', 'Foo', suffix='debug')
assert dylib_info('P/Foo.A.dylib') == d('P', 'Foo.A.dylib', 'Foo', 'A')
assert dylib_info('P/Foo_debug.A.dylib') == d('P', 'Foo_debug.A.dylib', 'Foo_debug', 'A')
assert dylib_info('P/Foo.A_debug.dylib') == d('P', 'Foo.A_debug.dylib', 'Foo', 'A', 'debug')
if __name__ == '__main__':
test_dylib_info()
svn export --force http://svn.red-bean.com/bob/macholib/trunk/macholib/ .
"""
Generic framework path manipulation
"""
import re
__all__ = ['framework_info']
STRICT_FRAMEWORK_RE = re.compile(r"""(?x)
(?P<location>^.*)(?:^|/)
(?P<name>
(?P<shortname>\w+).framework/
(?:Versions/(?P<version>[^/]+)/)?
(?P=shortname)
(?:_(?P<suffix>[^_]+))?
)$
""")
def framework_info(filename):
"""
A framework name can take one of the following four forms:
Location/Name.framework/Versions/SomeVersion/Name_Suffix
Location/Name.framework/Versions/SomeVersion/Name
Location/Name.framework/Name_Suffix
Location/Name.framework/Name
returns None if not found, or a mapping equivalent to:
dict(
location='Location',
name='Name.framework/Versions/SomeVersion/Name_Suffix',
shortname='Name',
version='SomeVersion',
suffix='Suffix',
)
Note that SomeVersion and Suffix are optional and may be None
if not present
"""
is_framework = STRICT_FRAMEWORK_RE.match(filename)
if not is_framework:
return None
return is_framework.groupdict()
def test_framework_info():
def d(location=None, name=None, shortname=None, version=None, suffix=None):
return dict(
location=location,
name=name,
shortname=shortname,
version=version,
suffix=suffix
)
assert framework_info('completely/invalid') is None
assert framework_info('completely/invalid/_debug') is None
assert framework_info('P/F.framework') is None
assert framework_info('P/F.framework/_debug') is None
assert framework_info('P/F.framework/F') == d('P', 'F.framework/F', 'F')
assert framework_info('P/F.framework/F_debug') == d('P', 'F.framework/F_debug', 'F', suffix='debug')
assert framework_info('P/F.framework/Versions') is None
assert framework_info('P/F.framework/Versions/A') is None
assert framework_info('P/F.framework/Versions/A/F') == d('P', 'F.framework/Versions/A/F', 'F', 'A')
assert framework_info('P/F.framework/Versions/A/F_debug') == d('P', 'F.framework/Versions/A/F_debug', 'F', 'A', 'debug')
if __name__ == '__main__':
test_framework_info()
"""
Enough Mach-O to make your head spin.
See the relevant header files in /usr/include/mach-o
And also Apple's documentation.
"""
__version__ = '1.0'
"""Constants and membership tests for ASCII characters"""
NUL = 0x00 # ^@
SOH = 0x01 # ^A
STX = 0x02 # ^B
ETX = 0x03 # ^C
EOT = 0x04 # ^D
ENQ = 0x05 # ^E
ACK = 0x06 # ^F
BEL = 0x07 # ^G
BS = 0x08 # ^H
TAB = 0x09 # ^I
HT = 0x09 # ^I
LF = 0x0a # ^J
NL = 0x0a # ^J
VT = 0x0b # ^K
FF = 0x0c # ^L
CR = 0x0d # ^M
SO = 0x0e # ^N
SI = 0x0f # ^O
DLE = 0x10 # ^P
DC1 = 0x11 # ^Q
DC2 = 0x12 # ^R
DC3 = 0x13 # ^S
DC4 = 0x14 # ^T
NAK = 0x15 # ^U
SYN = 0x16 # ^V
ETB = 0x17 # ^W
CAN = 0x18 # ^X
EM = 0x19 # ^Y
SUB = 0x1a # ^Z
ESC = 0x1b # ^[
FS = 0x1c # ^\
GS = 0x1d # ^]
RS = 0x1e # ^^
US = 0x1f # ^_
SP = 0x20 # space
DEL = 0x7f # delete
controlnames = [
"NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
"BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
"DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
"CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
"SP"
]
def _ctoi(c):
if type(c) == type(""):
return ord(c)
else:
return c
def isalnum(c): return isalpha(c) or isdigit(c)
def isalpha(c): return isupper(c) or islower(c)
def isascii(c): return _ctoi(c) <= 127 # ?
def isblank(c): return _ctoi(c) in (9, 32)
def iscntrl(c): return 0 <= _ctoi(c) <= 31 or _ctoi(c) == 127
def isdigit(c): return _ctoi(c) >= 48 and _ctoi(c) <= 57
def isgraph(c): return _ctoi(c) >= 33 and _ctoi(c) <= 126
def islower(c): return _ctoi(c) >= 97 and _ctoi(c) <= 122
def isprint(c): return _ctoi(c) >= 32 and _ctoi(c) <= 126
def ispunct(c): return 33 <= _ctoi(c) <= 126 and not isalnum(c)
def isspace(c): return _ctoi(c) in (9, 10, 11, 12, 13, 32)
def isupper(c): return _ctoi(c) >= 65 and _ctoi(c) <= 90
def isxdigit(c): return isdigit(c) or \
(_ctoi(c) >= 65 and _ctoi(c) <= 70) or (_ctoi(c) >= 97 and _ctoi(c) <= 102)
def isctrl(c): return _ctoi(c) < 32
def ismeta(c): return _ctoi(c) > 127
def ascii(c):
if type(c) == type(""):
return chr(_ctoi(c) & 0x7f)
else:
return _ctoi(c) & 0x7f
def ctrl(c):
if type(c) == type(""):
return chr(_ctoi(c) & 0x1f)
else:
return _ctoi(c) & 0x1f
def alt(c):
if type(c) == type(""):
return chr(_ctoi(c) | 0x80)
else:
return _ctoi(c) | 0x80
def unctrl(c):
bits = _ctoi(c)
if bits == 0x7f:
rep = "^?"
elif isprint(bits & 0x7f):
rep = chr(bits & 0x7f)
else:
rep = "^" + chr(((bits & 0x7f) | 0x20) + 0x20)
if bits & 0x80:
return "!" + rep
return rep
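# Editor's sketch (not part of curses/ascii.py; assumes a platform where the
# curses package imports): ctrl()/unctrl() convert between characters and
# the conventional caret notation used above.
from curses.ascii import ctrl, unctrl
assert ctrl("g") == "\x07"      # ^G is BEL
assert unctrl("\x07") == "^G"
assert unctrl("\x7f") == "^?"   # DEL keeps its historical spelling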
#
# Emulation of has_key() function for platforms that don't use ncurses
#
import _curses
# Table mapping curses keys to the terminfo capability name
_capability_names = {
_curses.KEY_A1: 'ka1',
_curses.KEY_A3: 'ka3',
_curses.KEY_B2: 'kb2',
_curses.KEY_BACKSPACE: 'kbs',
_curses.KEY_BEG: 'kbeg',
_curses.KEY_BTAB: 'kcbt',
_curses.KEY_C1: 'kc1',
_curses.KEY_C3: 'kc3',
_curses.KEY_CANCEL: 'kcan',
_curses.KEY_CATAB: 'ktbc',
_curses.KEY_CLEAR: 'kclr',
_curses.KEY_CLOSE: 'kclo',
_curses.KEY_COMMAND: 'kcmd',
_curses.KEY_COPY: 'kcpy',
_curses.KEY_CREATE: 'kcrt',
_curses.KEY_CTAB: 'kctab',
_curses.KEY_DC: 'kdch1',
_curses.KEY_DL: 'kdl1',
_curses.KEY_DOWN: 'kcud1',
_curses.KEY_EIC: 'krmir',
_curses.KEY_END: 'kend',
_curses.KEY_ENTER: 'kent',
_curses.KEY_EOL: 'kel',
_curses.KEY_EOS: 'ked',
_curses.KEY_EXIT: 'kext',
_curses.KEY_F0: 'kf0',
_curses.KEY_F1: 'kf1',
_curses.KEY_F10: 'kf10',
_curses.KEY_F11: 'kf11',
_curses.KEY_F12: 'kf12',
_curses.KEY_F13: 'kf13',
_curses.KEY_F14: 'kf14',
_curses.KEY_F15: 'kf15',
_curses.KEY_F16: 'kf16',
_curses.KEY_F17: 'kf17',
_curses.KEY_F18: 'kf18',
_curses.KEY_F19: 'kf19',
_curses.KEY_F2: 'kf2',
_curses.KEY_F20: 'kf20',
_curses.KEY_F21: 'kf21',
_curses.KEY_F22: 'kf22',
_curses.KEY_F23: 'kf23',
_curses.KEY_F24: 'kf24',
_curses.KEY_F25: 'kf25',
_curses.KEY_F26: 'kf26',
_curses.KEY_F27: 'kf27',
_curses.KEY_F28: 'kf28',
_curses.KEY_F29: 'kf29',
_curses.KEY_F3: 'kf3',
_curses.KEY_F30: 'kf30',
_curses.KEY_F31: 'kf31',
_curses.KEY_F32: 'kf32',
_curses.KEY_F33: 'kf33',
_curses.KEY_F34: 'kf34',
_curses.KEY_F35: 'kf35',
_curses.KEY_F36: 'kf36',
_curses.KEY_F37: 'kf37',
_curses.KEY_F38: 'kf38',
_curses.KEY_F39: 'kf39',
_curses.KEY_F4: 'kf4',
_curses.KEY_F40: 'kf40',
_curses.KEY_F41: 'kf41',
_curses.KEY_F42: 'kf42',
_curses.KEY_F43: 'kf43',
_curses.KEY_F44: 'kf44',
_curses.KEY_F45: 'kf45',
_curses.KEY_F46: 'kf46',
_curses.KEY_F47: 'kf47',
_curses.KEY_F48: 'kf48',
_curses.KEY_F49: 'kf49',
_curses.KEY_F5: 'kf5',
_curses.KEY_F50: 'kf50',
_curses.KEY_F51: 'kf51',
_curses.KEY_F52: 'kf52',
_curses.KEY_F53: 'kf53',
_curses.KEY_F54: 'kf54',
_curses.KEY_F55: 'kf55',
_curses.KEY_F56: 'kf56',
_curses.KEY_F57: 'kf57',
_curses.KEY_F58: 'kf58',
_curses.KEY_F59: 'kf59',
_curses.KEY_F6: 'kf6',
_curses.KEY_F60: 'kf60',
_curses.KEY_F61: 'kf61',
_curses.KEY_F62: 'kf62',
_curses.KEY_F63: 'kf63',
_curses.KEY_F7: 'kf7',
_curses.KEY_F8: 'kf8',
_curses.KEY_F9: 'kf9',
_curses.KEY_FIND: 'kfnd',
_curses.KEY_HELP: 'khlp',
_curses.KEY_HOME: 'khome',
_curses.KEY_IC: 'kich1',
_curses.KEY_IL: 'kil1',
_curses.KEY_LEFT: 'kcub1',
_curses.KEY_LL: 'kll',
_curses.KEY_MARK: 'kmrk',
_curses.KEY_MESSAGE: 'kmsg',
_curses.KEY_MOVE: 'kmov',
_curses.KEY_NEXT: 'knxt',
_curses.KEY_NPAGE: 'knp',
_curses.KEY_OPEN: 'kopn',
_curses.KEY_OPTIONS: 'kopt',
_curses.KEY_PPAGE: 'kpp',
_curses.KEY_PREVIOUS: 'kprv',
_curses.KEY_PRINT: 'kprt',
_curses.KEY_REDO: 'krdo',
_curses.KEY_REFERENCE: 'kref',
_curses.KEY_REFRESH: 'krfr',
_curses.KEY_REPLACE: 'krpl',
_curses.KEY_RESTART: 'krst',
_curses.KEY_RESUME: 'kres',
_curses.KEY_RIGHT: 'kcuf1',
_curses.KEY_SAVE: 'ksav',
_curses.KEY_SBEG: 'kBEG',
_curses.KEY_SCANCEL: 'kCAN',
_curses.KEY_SCOMMAND: 'kCMD',
_curses.KEY_SCOPY: 'kCPY',
_curses.KEY_SCREATE: 'kCRT',
_curses.KEY_SDC: 'kDC',
_curses.KEY_SDL: 'kDL',
_curses.KEY_SELECT: 'kslt',
_curses.KEY_SEND: 'kEND',
_curses.KEY_SEOL: 'kEOL',
_curses.KEY_SEXIT: 'kEXT',
_curses.KEY_SF: 'kind',
_curses.KEY_SFIND: 'kFND',
_curses.KEY_SHELP: 'kHLP',
_curses.KEY_SHOME: 'kHOM',
_curses.KEY_SIC: 'kIC',
_curses.KEY_SLEFT: 'kLFT',
_curses.KEY_SMESSAGE: 'kMSG',
_curses.KEY_SMOVE: 'kMOV',
_curses.KEY_SNEXT: 'kNXT',
_curses.KEY_SOPTIONS: 'kOPT',
_curses.KEY_SPREVIOUS: 'kPRV',
_curses.KEY_SPRINT: 'kPRT',
_curses.KEY_SR: 'kri',
_curses.KEY_SREDO: 'kRDO',
_curses.KEY_SREPLACE: 'kRPL',
_curses.KEY_SRIGHT: 'kRIT',
_curses.KEY_SRSUME: 'kRES',
_curses.KEY_SSAVE: 'kSAV',
_curses.KEY_SSUSPEND: 'kSPD',
_curses.KEY_STAB: 'khts',
_curses.KEY_SUNDO: 'kUND',
_curses.KEY_SUSPEND: 'kspd',
_curses.KEY_UNDO: 'kund',
_curses.KEY_UP: 'kcuu1'
}
def has_key(ch):
if isinstance(ch, str):
ch = ord(ch)
# Figure out the correct capability name for the keycode.
capability_name = _capability_names.get(ch)
if capability_name is None:
return False
# Check the current terminal description for that capability;
# if present, return true, else return false.
if _curses.tigetstr(capability_name):
return True
else:
return False
if __name__ == '__main__':
# Compare the output of this implementation and the ncurses has_key,
# on platforms where has_key is already available
try:
L = []
_curses.initscr()
for key in _capability_names.keys():
system = _curses.has_key(key)
python = has_key(key)
if system != python:
L.append('Mismatch for key %s, system=%i, Python=%i'
% (_curses.keyname(key), system, python))
finally:
_curses.endwin()
for i in L: print(i)
"""curses.panel
Module for using panels with curses.
"""
from _curses_panel import *
"""Simple textbox editing widget with Emacs-like keybindings."""
import curses
import curses.ascii
def rectangle(win, uly, ulx, lry, lrx):
"""Draw a rectangle with corners at the provided upper-left
and lower-right coordinates.
"""
win.vline(uly+1, ulx, curses.ACS_VLINE, lry - uly - 1)
win.hline(uly, ulx+1, curses.ACS_HLINE, lrx - ulx - 1)
win.hline(lry, ulx+1, curses.ACS_HLINE, lrx - ulx - 1)
win.vline(uly+1, lrx, curses.ACS_VLINE, lry - uly - 1)
win.addch(uly, ulx, curses.ACS_ULCORNER)
win.addch(uly, lrx, curses.ACS_URCORNER)
win.addch(lry, lrx, curses.ACS_LRCORNER)
win.addch(lry, ulx, curses.ACS_LLCORNER)
class Textbox:
"""Editing widget using the interior of a window object.
Supports the following Emacs-like key bindings:
Ctrl-A Go to left edge of window.
Ctrl-B Cursor left, wrapping to previous line if appropriate.
Ctrl-D Delete character under cursor.
Ctrl-E Go to right edge (stripspaces off) or end of line (stripspaces on).
Ctrl-F Cursor right, wrapping to next line when appropriate.
Ctrl-G Terminate, returning the window contents.
Ctrl-H Delete character backward.
Ctrl-J Terminate if the window is 1 line, otherwise insert newline.
Ctrl-K If line is blank, delete it, otherwise clear to end of line.
Ctrl-L Refresh screen.
Ctrl-N Cursor down; move down one line.
Ctrl-O Insert a blank line at cursor location.
Ctrl-P Cursor up; move up one line.
Move operations do nothing if the cursor is at an edge where the movement
is not possible. The following synonyms are supported where possible:
KEY_LEFT = Ctrl-B, KEY_RIGHT = Ctrl-F, KEY_UP = Ctrl-P, KEY_DOWN = Ctrl-N
KEY_BACKSPACE = Ctrl-H
"""
def __init__(self, win, insert_mode=False):
self.win = win
self.insert_mode = insert_mode
(self.maxy, self.maxx) = win.getmaxyx()
self.maxy = self.maxy - 1
self.maxx = self.maxx - 1
self.stripspaces = 1
self.lastcmd = None
win.keypad(1)
def _end_of_line(self, y):
"""Go to the location of the first blank on the given line,
returning the index of the last non-blank character."""
last = self.maxx
while True:
if curses.ascii.ascii(self.win.inch(y, last)) != curses.ascii.SP:
last = min(self.maxx, last+1)
break
elif last == 0:
break
last = last - 1
return last
def _insert_printable_char(self, ch):
(y, x) = self.win.getyx()
if y < self.maxy or x < self.maxx:
if self.insert_mode:
oldch = self.win.inch()
# The try/except ignores the error we trigger from some curses
# versions by trying to write into the lowest-rightmost spot
# in the window.
try:
self.win.addch(ch)
except curses.error:
pass
if self.insert_mode:
(backy, backx) = self.win.getyx()
if curses.ascii.isprint(oldch):
self._insert_printable_char(oldch)
self.win.move(backy, backx)
def do_command(self, ch):
"Process a single editing command."
(y, x) = self.win.getyx()
self.lastcmd = ch
if curses.ascii.isprint(ch):
if y < self.maxy or x < self.maxx:
self._insert_printable_char(ch)
elif ch == curses.ascii.SOH: # ^a
self.win.move(y, 0)
elif ch in (curses.ascii.STX,curses.KEY_LEFT, curses.ascii.BS,curses.KEY_BACKSPACE):
if x > 0:
self.win.move(y, x-1)
elif y == 0:
pass
elif self.stripspaces:
self.win.move(y-1, self._end_of_line(y-1))
else:
self.win.move(y-1, self.maxx)
if ch in (curses.ascii.BS, curses.KEY_BACKSPACE):
self.win.delch()
elif ch == curses.ascii.EOT: # ^d
self.win.delch()
elif ch == curses.ascii.ENQ: # ^e
if self.stripspaces:
self.win.move(y, self._end_of_line(y))
else:
self.win.move(y, self.maxx)
elif ch in (curses.ascii.ACK, curses.KEY_RIGHT): # ^f
if x < self.maxx:
self.win.move(y, x+1)
elif y == self.maxy:
pass
else:
self.win.move(y+1, 0)
elif ch == curses.ascii.BEL: # ^g
return 0
elif ch == curses.ascii.NL: # ^j
if self.maxy == 0:
return 0
elif y < self.maxy:
self.win.move(y+1, 0)
elif ch == curses.ascii.VT: # ^k
if x == 0 and self._end_of_line(y) == 0:
self.win.deleteln()
else:
# first undo the effect of self._end_of_line
self.win.move(y, x)
self.win.clrtoeol()
elif ch == curses.ascii.FF: # ^l
self.win.refresh()
elif ch in (curses.ascii.SO, curses.KEY_DOWN): # ^n
if y < self.maxy:
self.win.move(y+1, x)
if x > self._end_of_line(y+1):
self.win.move(y+1, self._end_of_line(y+1))
elif ch == curses.ascii.SI: # ^o
self.win.insertln()
elif ch in (curses.ascii.DLE, curses.KEY_UP): # ^p
if y > 0:
self.win.move(y-1, x)
if x > self._end_of_line(y-1):
self.win.move(y-1, self._end_of_line(y-1))
return 1
def gather(self):
"Collect and return the contents of the window."
result = ""
for y in range(self.maxy+1):
self.win.move(y, 0)
stop = self._end_of_line(y)
if stop == 0 and self.stripspaces:
continue
for x in range(self.maxx+1):
if self.stripspaces and x > stop:
break
result = result + chr(curses.ascii.ascii(self.win.inch(y, x)))
if self.maxy > 0:
result = result + "\n"
return result
def edit(self, validate=None):
"Edit in the widget window and collect the results."
while 1:
ch = self.win.getch()
if validate:
ch = validate(ch)
if not ch:
continue
if not self.do_command(ch):
break
self.win.refresh()
return self.gather()
if __name__ == '__main__':
def test_editbox(stdscr):
ncols, nlines = 9, 4
uly, ulx = 15, 20
stdscr.addstr(uly-2, ulx, "Use Ctrl-G to end editing.")
win = curses.newwin(nlines, ncols, uly, ulx)
rectangle(stdscr, uly-1, ulx-1, uly + nlines, ulx + ncols)
stdscr.refresh()
return Textbox(win).edit()
str = curses.wrapper(test_editbox)
print('Contents of text box:', repr(str))
"""curses
The main package for curses support for Python. Normally used by importing
the package, and perhaps a particular module inside it.
import curses
from curses import textpad
curses.initscr()
...
"""
from _curses import *
import os as _os
import sys as _sys
# Some constants, most notably the ACS_* ones, are only added to the C
# _curses module's dictionary after initscr() is called. (Some
# versions of SGI's curses don't define values for those constants
# until initscr() has been called.) This wrapper function calls the
# underlying C initscr(), and then copies the constants from the
# _curses module to the curses package's dictionary. Don't do 'from
# curses import *' if you'll be needing the ACS_* constants.
def initscr():
import _curses, curses
# we call setupterm() here because it raises an error
# instead of calling exit() in error cases.
setupterm(term=_os.environ.get("TERM", "unknown"),
fd=_sys.__stdout__.fileno())
stdscr = _curses.initscr()
for key, value in _curses.__dict__.items():
if key[0:4] == 'ACS_' or key in ('LINES', 'COLS'):
setattr(curses, key, value)
return stdscr
# This is a similar wrapper for start_color(), which adds the COLORS and
# COLOR_PAIRS variables which are only available after start_color() is
# called.
def start_color():
import _curses, curses
retval = _curses.start_color()
if hasattr(_curses, 'COLORS'):
curses.COLORS = _curses.COLORS
if hasattr(_curses, 'COLOR_PAIRS'):
curses.COLOR_PAIRS = _curses.COLOR_PAIRS
return retval
# Import Python has_key() implementation if _curses doesn't contain has_key()
try:
has_key
except NameError:
from .has_key import has_key
# Wrapper for the entire curses-based application. Runs a function which
# should be the rest of your curses-based application. If the application
# raises an exception, wrapper() will restore the terminal to a sane state so
# you can read the resulting traceback.
def wrapper(func, *args, **kwds):
"""Wrapper function that initializes curses and calls another function,
restoring normal keyboard/screen behavior on error.
The callable object 'func' is then passed the main window 'stdscr'
as its first argument, followed by any other arguments passed to
wrapper().
"""
try:
# Initialize curses
stdscr = initscr()
# Turn off echoing of keys, and enter cbreak mode,
# where no buffering is performed on keyboard input
noecho()
cbreak()
# In keypad mode, escape sequences for special keys
# (like the cursor keys) will be interpreted and
# a special value like curses.KEY_LEFT will be returned
stdscr.keypad(1)
# Start color, too. Harmless if the terminal doesn't have
# color; user can test with has_colors() later on. The try/except
# works around a minor bit of over-conscientiousness in the curses
# module -- the error return from C start_color() is ignorable.
try:
start_color()
except:
pass
return func(stdscr, *args, **kwds)
finally:
# Set everything back to normal
if 'stdscr' in locals():
stdscr.keypad(0)
echo()
nocbreak()
endwin()
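# Editor's sketch (not part of curses/__init__.py): the canonical entry point
# built on wrapper(); the terminal is restored even if _main() raises. The
# final call is left commented out because it needs a real terminal.
import curses
def _main(stdscr):
    stdscr.addstr(0, 0, "Hello, curses!")
    stdscr.refresh()
    stdscr.getkey()
# curses.wrapper(_main)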
"""A dumb and slow but simple dbm clone.
For database spam, spam.dir contains the index (a text file),
spam.bak *may* contain a backup of the index (also a text file),
while spam.dat contains the data (a binary file).
XXX TO DO:
- seems to contain a bug when updating...
- reclaim free space (currently, space once occupied by deleted or expanded
items is never reused)
- support concurrent access (currently, if two processes take turns making
updates, they can mess up the index)
- support efficient access to large databases (currently, the whole index
is read when the database is opened, and some updates rewrite the whole index)
- support opening for read-only (flag = 'm')
"""
import ast as _ast
import io as _io
import os as _os
import collections
__all__ = ["error", "open"]
_BLOCKSIZE = 512
error = OSError
class _Database(collections.MutableMapping):
# The on-disk directory and data files can remain in mutually
# inconsistent states for an arbitrarily long time (see comments
# at the end of __setitem__). This is only repaired when _commit()
# gets called. One place _commit() gets called is from __del__(),
# and if that occurs at program shutdown time, module globals may
# already have gotten rebound to None. Since it's crucial that
# _commit() finish successfully, we can't ignore shutdown races
# here, and _commit() must not reference any globals.
_os = _os # for _commit()
_io = _io # for _commit()
def __init__(self, filebasename, mode):
self._mode = mode
# The directory file is a text file. Each line looks like
# "%r, (%d, %d)\n" % (key, pos, siz)
# where key is the string key, pos is the offset into the dat
# file of the associated value's first byte, and siz is the number
# of bytes in the associated value.
self._dirfile = filebasename + '.dir'
# The data file is a binary file pointed into by the directory
# file, and holds the values associated with keys. Each value
# begins at a _BLOCKSIZE-aligned byte offset, and is a raw
# binary 8-bit string value.
self._datfile = filebasename + '.dat'
self._bakfile = filebasename + '.bak'
# The index is an in-memory dict, mirroring the directory file.
self._index = None # maps keys to (pos, siz) pairs
# Mod by Jack: create data file if needed
try:
f = _io.open(self._datfile, 'r', encoding="Latin-1")
except OSError:
with _io.open(self._datfile, 'w', encoding="Latin-1") as f:
self._chmod(self._datfile)
else:
f.close()
self._update()
# Read directory file into the in-memory index dict.
def _update(self):
self._index = {}
try:
f = _io.open(self._dirfile, 'r', encoding="Latin-1")
except OSError:
pass
else:
with f:
for line in f:
line = line.rstrip()
key, pos_and_siz_pair = _ast.literal_eval(line)
key = key.encode('Latin-1')
self._index[key] = pos_and_siz_pair
# Write the index dict to the directory file. The original directory
# file (if any) is renamed with a .bak extension first. If a .bak
# file currently exists, it's deleted.
def _commit(self):
# CAUTION: It's vital that _commit() succeed, and _commit() can
# be called from __del__(). Therefore we must never reference a
# global in this routine.
if self._index is None:
return # nothing to do
try:
self._os.unlink(self._bakfile)
except OSError:
pass
try:
self._os.rename(self._dirfile, self._bakfile)
except OSError:
pass
with self._io.open(self._dirfile, 'w', encoding="Latin-1") as f:
self._chmod(self._dirfile)
for key, pos_and_siz_pair in self._index.items():
# Use Latin-1 since it has no qualms with any value in any
# position; UTF-8, though, does care sometimes.
entry = "%r, %r\n" % (key.decode('Latin-1'), pos_and_siz_pair)
f.write(entry)
sync = _commit
def _verify_open(self):
if self._index is None:
raise error('DBM object has already been closed')
def __getitem__(self, key):
if isinstance(key, str):
key = key.encode('utf-8')
self._verify_open()
pos, siz = self._index[key] # may raise KeyError
with _io.open(self._datfile, 'rb') as f:
f.seek(pos)
dat = f.read(siz)
return dat
# Append val to the data file, starting at a _BLOCKSIZE-aligned
# offset. The data file is first padded with NUL bytes (if needed)
# to get to an aligned offset. Return pair
# (starting offset of val, len(val))
def _addval(self, val):
with _io.open(self._datfile, 'rb+') as f:
f.seek(0, 2)
pos = int(f.tell())
npos = ((pos + _BLOCKSIZE - 1) // _BLOCKSIZE) * _BLOCKSIZE
f.write(b'\0'*(npos-pos))
pos = npos
f.write(val)
return (pos, len(val))
# Write val to the data file, starting at offset pos. The caller
# is responsible for ensuring that there's enough room starting at
# pos to hold val, without overwriting some other value. Return
# pair (pos, len(val)).
def _setval(self, pos, val):
with _io.open(self._datfile, 'rb+') as f:
f.seek(pos)
f.write(val)
return (pos, len(val))
# key is a new key whose associated value starts in the data file
# at offset pos and with length siz. Add an index record to
# the in-memory index dict, and append one to the directory file.
def _addkey(self, key, pos_and_siz_pair):
self._index[key] = pos_and_siz_pair
with _io.open(self._dirfile, 'a', encoding="Latin-1") as f:
self._chmod(self._dirfile)
f.write("%r, %r\n" % (key.decode("Latin-1"), pos_and_siz_pair))
def __setitem__(self, key, val):
if isinstance(key, str):
key = key.encode('utf-8')
elif not isinstance(key, (bytes, bytearray)):
raise TypeError("keys must be bytes or strings")
if isinstance(val, str):
val = val.encode('utf-8')
elif not isinstance(val, (bytes, bytearray)):
raise TypeError("values must be bytes or strings")
self._verify_open()
if key not in self._index:
self._addkey(key, self._addval(val))
else:
# See whether the new value is small enough to fit in the
# (padded) space currently occupied by the old value.
pos, siz = self._index[key]
oldblocks = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
newblocks = (len(val) + _BLOCKSIZE - 1) // _BLOCKSIZE
if newblocks <= oldblocks:
self._index[key] = self._setval(pos, val)
else:
# The new value doesn't fit in the (padded) space used
# by the old value. The blocks used by the old value are
# forever lost.
self._index[key] = self._addval(val)
# Note that _index may be out of synch with the directory
# file now: _setval() and _addval() don't update the directory
# file. This also means that the on-disk directory and data
# files are in a mutually inconsistent state, and they'll
# remain that way until _commit() is called. Note that this
# is a disaster (for the database) if the program crashes
# (so that _commit() never gets called).
def __delitem__(self, key):
if isinstance(key, str):
key = key.encode('utf-8')
self._verify_open()
# The blocks used by the associated value are lost.
del self._index[key]
# XXX It's unclear why we do a _commit() here (the code always
# XXX has, so I'm not changing it). __setitem__ doesn't try to
# XXX keep the directory file in synch. Why should we? Or
# XXX why shouldn't __setitem__?
self._commit()
def keys(self):
try:
return list(self._index)
except TypeError:
raise error('DBM object has already been closed') from None
def items(self):
self._verify_open()
return [(key, self[key]) for key in self._index.keys()]
def __contains__(self, key):
if isinstance(key, str):
key = key.encode('utf-8')
try:
return key in self._index
except TypeError:
if self._index is None:
raise error('DBM object has already been closed') from None
else:
raise
def iterkeys(self):
try:
return iter(self._index)
except TypeError:
raise error('DBM object has already been closed') from None
__iter__ = iterkeys
def __len__(self):
try:
return len(self._index)
except TypeError:
raise error('DBM object has already been closed') from None
def close(self):
try:
self._commit()
finally:
self._index = self._datfile = self._dirfile = self._bakfile = None
__del__ = close
def _chmod(self, file):
if hasattr(self._os, 'chmod'):
self._os.chmod(file, self._mode)
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
def open(file, flag=None, mode=0o666):
"""Open the database file, filename, and return corresponding object.
The flag argument, used to control how the database is opened in the
other DBM implementations, is ignored in the dbm.dumb module; the
database is always opened for update, and will be created if it does
not exist.
The optional mode argument is the UNIX mode of the file, used only when
the database has to be created. It defaults to octal code 0o666 (and
will be modified by the prevailing umask).
"""
# flag argument is currently ignored
# Modify mode depending on the umask
try:
um = _os.umask(0)
_os.umask(um)
except AttributeError:
pass
else:
# Turn off any bits that are set in the umask
mode = mode & (~um)
return _Database(file, mode)
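# Editor's sketch (not part of dbm/dumb.py): a round trip through the module.
# str keys and values are encoded to UTF-8 bytes on the way in, and the
# context manager's close() commits the .dir index.
import dbm.dumb, os, tempfile
_base = os.path.join(tempfile.mkdtemp(), "spam")
with dbm.dumb.open(_base, 'c') as _db:
    _db["hello"] = "world"
with dbm.dumb.open(_base) as _db:
    assert _db[b"hello"] == b"world"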
"""Provide the _gdbm module as a dbm submodule."""
from _gdbm import *
"""Provide the _dbm module as a dbm submodule."""
from _dbm import *
"""Generic interface to all dbm clones.
Use
import dbm
d = dbm.open(file, 'w', 0o666)
The returned object is a dbm.gnu, dbm.ndbm or dbm.dumb object, depending on the
type of database being opened (determined by the whichdb function) in the case
of an existing dbm. If the dbm does not exist and the create or new flag ('c'
or 'n') was specified, the dbm type will be determined by the availability of
the modules (tested in the above order).
It has the following interface (key and data are strings):
d[key] = data # store data at key (may override data at
# existing key)
data = d[key] # retrieve data at key (raise KeyError if no
# such key)
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists
list = d.keys() # return a list of all existing keys (slow!)
Future versions may change the order in which implementations are
tested for existence, and add interfaces to other dbm-like
implementations.
"""
__all__ = ['open', 'whichdb', 'error']
import io
import os
import struct
import sys
class error(Exception):
pass
_names = ['dbm.gnu', 'dbm.ndbm', 'dbm.dumb']
_defaultmod = None
_modules = {}
error = (error, OSError)
try:
from dbm import ndbm
except ImportError:
ndbm = None
def open(file, flag='r', mode=0o666):
"""Open or create database at path given by *file*.
Optional argument *flag* can be 'r' (default) for read-only access, 'w'
for read-write access of an existing database, 'c' for read-write access
to a new or existing database, and 'n' for read-write access to a new
database.
Note: 'r' and 'w' fail if the database doesn't exist; 'c' creates it
only if it doesn't exist; and 'n' always creates a new database.
"""
global _defaultmod
if _defaultmod is None:
for name in _names:
try:
mod = __import__(name, fromlist=['open'])
except ImportError:
continue
if not _defaultmod:
_defaultmod = mod
_modules[name] = mod
if not _defaultmod:
raise ImportError("no dbm clone found; tried %s" % _names)
# guess the type of an existing database, if not creating a new one
result = whichdb(file) if 'n' not in flag else None
if result is None:
# db doesn't exist or 'n' flag was specified to create a new db
if 'c' in flag or 'n' in flag:
# file doesn't exist and the new flag was used so use default type
mod = _defaultmod
else:
raise error[0]("need 'c' or 'n' flag to open new db")
elif result == "":
# db type cannot be determined
raise error[0]("db type could not be determined")
elif result not in _modules:
raise error[0]("db type is {0}, but the module is not "
"available".format(result))
else:
mod = _modules[result]
return mod.open(file, flag, mode)
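# Illustrative sketch (editor addition): how the flag values described in
# open()'s docstring play out. 'demo_db' is an example path.
def _demo_dbm_flags():
    db = open('demo_db', 'c')         # 'c': open, creating if necessary
    db[b'key'] = b'value'
    db.close()
    db = open('demo_db', 'r')         # 'r': existing database, read-only
    print(db[b'key'])
    db.close()
    # open('no_such_db', 'w') raises error: 'w' needs an existing database,
    # while 'n' always creates a fresh one regardless.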
def whichdb(filename):
"""Guess which db package to use to open a db file.
Return values:
- None if the database file can't be read;
    - empty string if the file can be read but can't be recognized;
- the name of the dbm submodule (e.g. "ndbm" or "gnu") if recognized.
Importing the given module may still fail, and opening the
database using that module may still fail.
"""
# Check for ndbm first -- this has a .pag and a .dir file
try:
f = io.open(filename + ".pag", "rb")
f.close()
f = io.open(filename + ".dir", "rb")
f.close()
return "dbm.ndbm"
except OSError:
# some dbm emulations based on Berkeley DB generate a .db file
# some do not, but they should be caught by the bsd checks
try:
f = io.open(filename + ".db", "rb")
f.close()
# guarantee we can actually open the file using dbm
# kind of overkill, but since we are dealing with emulations
# it seems like a prudent step
if ndbm is not None:
d = ndbm.open(filename)
d.close()
return "dbm.ndbm"
except OSError:
pass
# Check for dumbdbm next -- this has a .dir and a .dat file
try:
# First check for presence of files
os.stat(filename + ".dat")
size = os.stat(filename + ".dir").st_size
# dumbdbm files with no keys are empty
if size == 0:
return "dbm.dumb"
f = io.open(filename + ".dir", "rb")
try:
if f.read(1) in (b"'", b'"'):
return "dbm.dumb"
finally:
f.close()
except OSError:
pass
# See if the file exists, return None if not
try:
f = io.open(filename, "rb")
except OSError:
return None
# Read the start of the file -- the magic number
s16 = f.read(16)
f.close()
s = s16[0:4]
# Return "" if not at least 4 bytes
if len(s) != 4:
return ""
# Convert to 4-byte int in native byte order -- return "" if impossible
try:
(magic,) = struct.unpack("=l", s)
except struct.error:
return ""
# Check for GNU dbm
if magic in (0x13579ace, 0x13579acd, 0x13579acf):
return "dbm.gnu"
# Later versions of Berkeley db hash file have a 12-byte pad in
# front of the file type
try:
(magic,) = struct.unpack("=l", s16[-4:])
except struct.error:
return ""
# Unknown
return ""
if __name__ == "__main__":
for filename in sys.argv[1:]:
print(whichdb(filename) or "UNKNOWN", filename)
"""distutils.archive_util
Utility functions for creating archive files (tarballs, zip files,
that sort of thing)."""
import os
from warnings import warn
import sys
try:
import zipfile
except ImportError:
zipfile = None
from distutils.errors import DistutilsExecError
from distutils.spawn import spawn
from distutils.dir_util import mkpath
from distutils import log
try:
from pwd import getpwnam
except ImportError:
getpwnam = None
try:
from grp import getgrnam
except ImportError:
getgrnam = None
def _get_gid(name):
"""Returns a gid, given a group name."""
if getgrnam is None or name is None:
return None
try:
result = getgrnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def _get_uid(name):
"""Returns an uid, given a user name."""
if getpwnam is None or name is None:
return None
try:
result = getpwnam(name)
except KeyError:
result = None
if result is not None:
return result[2]
return None
def make_tarball(base_name, base_dir, compress="gzip", verbose=0, dry_run=0,
owner=None, group=None):
"""Create a (possibly compressed) tar file from all the files under
'base_dir'.
'compress' must be "gzip" (the default), "compress", "bzip2", or None.
(compress will be deprecated in Python 3.2)
'owner' and 'group' can be used to define an owner and a group for the
archive that is being built. If not provided, the current owner and group
will be used.
The output tar file will be named 'base_dir' + ".tar", possibly plus
the appropriate compression extension (".gz", ".bz2" or ".Z").
Returns the output filename.
"""
tar_compression = {'gzip': 'gz', 'bzip2': 'bz2', None: '', 'compress': ''}
compress_ext = {'gzip': '.gz', 'bzip2': '.bz2', 'compress': '.Z'}
# flags for compression program, each element of list will be an argument
if compress is not None and compress not in compress_ext.keys():
raise ValueError(
"bad value for 'compress': must be None, 'gzip', 'bzip2' "
"or 'compress'")
archive_name = base_name + '.tar'
if compress != 'compress':
archive_name += compress_ext.get(compress, '')
mkpath(os.path.dirname(archive_name), dry_run=dry_run)
# creating the tarball
import tarfile # late import so Python build itself doesn't break
log.info('Creating tar archive')
uid = _get_uid(owner)
gid = _get_gid(group)
def _set_uid_gid(tarinfo):
if gid is not None:
tarinfo.gid = gid
tarinfo.gname = group
if uid is not None:
tarinfo.uid = uid
tarinfo.uname = owner
return tarinfo
if not dry_run:
tar = tarfile.open(archive_name, 'w|%s' % tar_compression[compress])
try:
tar.add(base_dir, filter=_set_uid_gid)
finally:
tar.close()
# compression using `compress`
if compress == 'compress':
warn("'compress' will be deprecated.", PendingDeprecationWarning)
# the option varies depending on the platform
compressed_name = archive_name + compress_ext[compress]
if sys.platform == 'win32':
cmd = [compress, archive_name, compressed_name]
else:
cmd = [compress, '-f', archive_name]
spawn(cmd, dry_run=dry_run)
return compressed_name
return archive_name
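# Illustrative sketch (editor addition): building a gzip'ed tarball of a
# source tree. 'dist/pkg-1.0' and 'pkg-1.0' are example paths; owner and
# group only take effect where the pwd/grp modules can resolve the names.
def _demo_make_tarball():
    return make_tarball('dist/pkg-1.0', 'pkg-1.0',
                        compress='gzip', owner='root', group='root')
    # -> 'dist/pkg-1.0.tar.gz'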
def make_zipfile(base_name, base_dir, verbose=0, dry_run=0):
"""Create a zip file from all the files under 'base_dir'.
The output zip file will be named 'base_name' + ".zip". Uses either the
"zipfile" Python module (if available) or the InfoZIP "zip" utility
(if installed and found on the default search path). If neither tool is
available, raises DistutilsExecError. Returns the name of the output zip
file.
"""
zip_filename = base_name + ".zip"
mkpath(os.path.dirname(zip_filename), dry_run=dry_run)
# If zipfile module is not available, try spawning an external
# 'zip' command.
if zipfile is None:
if verbose:
zipoptions = "-r"
else:
zipoptions = "-rq"
try:
spawn(["zip", zipoptions, zip_filename, base_dir],
dry_run=dry_run)
except DistutilsExecError:
# XXX really should distinguish between "couldn't find
# external 'zip' command" and "zip failed".
raise DistutilsExecError(("unable to create zip file '%s': "
"could neither import the 'zipfile' module nor "
"find a standalone zip utility") % zip_filename)
else:
log.info("creating '%s' and adding '%s' to it",
zip_filename, base_dir)
if not dry_run:
try:
zip = zipfile.ZipFile(zip_filename, "w",
compression=zipfile.ZIP_DEFLATED)
except RuntimeError:
zip = zipfile.ZipFile(zip_filename, "w",
compression=zipfile.ZIP_STORED)
for dirpath, dirnames, filenames in os.walk(base_dir):
for name in filenames:
path = os.path.normpath(os.path.join(dirpath, name))
if os.path.isfile(path):
zip.write(path, path)
log.info("adding '%s'" % path)
zip.close()
return zip_filename
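# Illustrative sketch (editor addition): the zipfile counterpart; the
# external 'zip' binary is only consulted when the zipfile module is
# missing. Paths are examples.
def _demo_make_zipfile():
    return make_zipfile('dist/pkg-1.0', 'pkg-1.0')
    # -> 'dist/pkg-1.0.zip'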
ARCHIVE_FORMATS = {
'gztar': (make_tarball, [('compress', 'gzip')], "gzip'ed tar-file"),
'bztar': (make_tarball, [('compress', 'bzip2')], "bzip2'ed tar-file"),
'ztar': (make_tarball, [('compress', 'compress')], "compressed tar file"),
'tar': (make_tarball, [('compress', None)], "uncompressed tar file"),
'zip': (make_zipfile, [],"ZIP file")
}
def check_archive_formats(formats):
"""Returns the first format from the 'format' list that is unknown.
If all formats are known, returns None
"""
for format in formats:
if format not in ARCHIVE_FORMATS:
return format
return None
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
dry_run=0, owner=None, group=None):
"""Create an archive file (eg. zip or tar).
'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar",
    "ztar", or "gztar".
'root_dir' is a directory that will be the root directory of the
archive; ie. we typically chdir into 'root_dir' before creating the
archive. 'base_dir' is the directory where we start archiving from;
ie. 'base_dir' will be the common prefix of all files and
directories in the archive. 'root_dir' and 'base_dir' both default
to the current directory. Returns the name of the archive file.
'owner' and 'group' are used when creating a tar archive. By default,
uses the current owner and group.
"""
save_cwd = os.getcwd()
if root_dir is not None:
log.debug("changing into '%s'", root_dir)
base_name = os.path.abspath(base_name)
if not dry_run:
os.chdir(root_dir)
if base_dir is None:
base_dir = os.curdir
kwargs = {'dry_run': dry_run}
try:
format_info = ARCHIVE_FORMATS[format]
except KeyError:
raise ValueError("unknown archive format '%s'" % format)
func = format_info[0]
for arg, val in format_info[1]:
kwargs[arg] = val
if format != 'zip':
kwargs['owner'] = owner
kwargs['group'] = group
try:
filename = func(base_name, base_dir, **kwargs)
finally:
if root_dir is not None:
log.debug("changing back to '%s'", save_cwd)
os.chdir(save_cwd)
return filename
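# Illustrative sketch (editor addition): the root_dir/base_dir split
# described in make_archive()'s docstring -- archive 'src/pkg-1.0' so that
# member paths inside the archive start with 'pkg-1.0/'. Paths are examples.
def _demo_make_archive():
    assert check_archive_formats(['gztar', 'zip']) is None
    return make_archive('dist/pkg-1.0', 'gztar',
                        root_dir='src', base_dir='pkg-1.0')
    # returns the archive path, here ending in 'dist/pkg-1.0.tar.gz'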
"""distutils.bcppcompiler
Contains BorlandCCompiler, an implementation of the abstract CCompiler class
for the Borland C++ compiler.
"""
# This implementation by Lyle Johnson, based on the original msvccompiler.py
# module and using the directions originally published by Gordon Williams.
# XXX looks like there's a LOT of overlap between these two classes:
# someone should sit down and factor out the common code as
# WindowsCCompiler! --GPW
import os
from distutils.errors import \
DistutilsExecError, DistutilsPlatformError, \
CompileError, LibError, LinkError, UnknownFileError
from distutils.ccompiler import \
CCompiler, gen_preprocess_options, gen_lib_options
from distutils.file_util import write_file
from distutils.dep_util import newer
from distutils import log
class BCPPCompiler(CCompiler) :
"""Concrete class that implements an interface to the Borland C/C++
compiler, as defined by the CCompiler abstract class.
"""
compiler_type = 'bcpp'
# Just set this so CCompiler's constructor doesn't barf. We currently
# don't use the 'set_executables()' bureaucracy provided by CCompiler,
# as it really isn't necessary for this sort of single-compiler class.
# Would be nice to have a consistent interface with UnixCCompiler,
# though, so it's worth thinking about.
executables = {}
# Private class data (need to distinguish C from C++ source for compiler)
_c_extensions = ['.c']
_cpp_extensions = ['.cc', '.cpp', '.cxx']
# Needed for the filename generation methods provided by the
# base class, CCompiler.
src_extensions = _c_extensions + _cpp_extensions
obj_extension = '.obj'
static_lib_extension = '.lib'
shared_lib_extension = '.dll'
static_lib_format = shared_lib_format = '%s%s'
exe_extension = '.exe'
def __init__ (self,
verbose=0,
dry_run=0,
force=0):
CCompiler.__init__ (self, verbose, dry_run, force)
# These executables are assumed to all be in the path.
# Borland doesn't seem to use any special registry settings to
# indicate their installation locations.
self.cc = "bcc32.exe"
self.linker = "ilink32.exe"
self.lib = "tlib.exe"
self.preprocess_options = None
self.compile_options = ['/tWM', '/O2', '/q', '/g0']
self.compile_options_debug = ['/tWM', '/Od', '/q', '/g0']
self.ldflags_shared = ['/Tpd', '/Gn', '/q', '/x']
self.ldflags_shared_debug = ['/Tpd', '/Gn', '/q', '/x']
self.ldflags_static = []
self.ldflags_exe = ['/Gn', '/q', '/x']
self.ldflags_exe_debug = ['/Gn', '/q', '/x','/r']
# -- Worker methods ------------------------------------------------
def compile(self, sources,
output_dir=None, macros=None, include_dirs=None, debug=0,
extra_preargs=None, extra_postargs=None, depends=None):
macros, objects, extra_postargs, pp_opts, build = \
self._setup_compile(output_dir, macros, include_dirs, sources,
depends, extra_postargs)
compile_opts = extra_preargs or []
compile_opts.append ('-c')
if debug:
compile_opts.extend (self.compile_options_debug)
else:
compile_opts.extend (self.compile_options)
for obj in objects:
try:
src, ext = build[obj]
except KeyError:
continue
# XXX why do the normpath here?
src = os.path.normpath(src)
obj = os.path.normpath(obj)
# XXX _setup_compile() did a mkpath() too but before the normpath.
# Is it possible to skip the normpath?
self.mkpath(os.path.dirname(obj))
if ext == '.res':
# This is already a binary file -- skip it.
continue # the 'for' loop
if ext == '.rc':
# This needs to be compiled to a .res file -- do it now.
try:
self.spawn (["brcc32", "-fo", obj, src])
except DistutilsExecError as msg:
raise CompileError(msg)
continue # the 'for' loop
# The next two are both for the real compiler.
if ext in self._c_extensions:
input_opt = ""
elif ext in self._cpp_extensions:
input_opt = "-P"
else:
# Unknown file type -- no extra options. The compiler
# will probably fail, but let it just in case this is a
# file the compiler recognizes even if we don't.
input_opt = ""
output_opt = "-o" + obj
# Compiler command line syntax is: "bcc32 [options] file(s)".
# Note that the source file names must appear at the end of
# the command line.
try:
self.spawn ([self.cc] + compile_opts + pp_opts +
[input_opt, output_opt] +
extra_postargs + [src])
except DistutilsExecError as msg:
raise CompileError(msg)
return objects
# compile ()
def create_static_lib (self,
objects,
output_libname,
output_dir=None,
debug=0,
target_lang=None):
(objects, output_dir) = self._fix_object_args (objects, output_dir)
output_filename = \
self.library_filename (output_libname, output_dir=output_dir)
if self._need_link (objects, output_filename):
lib_args = [output_filename, '/u'] + objects
if debug:
pass # XXX what goes here?
try:
self.spawn ([self.lib] + lib_args)
except DistutilsExecError as msg:
raise LibError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
# create_static_lib ()
def link (self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
# XXX this ignores 'build_temp'! should follow the lead of
# msvccompiler.py
(objects, output_dir) = self._fix_object_args (objects, output_dir)
(libraries, library_dirs, runtime_library_dirs) = \
self._fix_lib_args (libraries, library_dirs, runtime_library_dirs)
if runtime_library_dirs:
log.warn("I don't know what to do with 'runtime_library_dirs': %s",
str(runtime_library_dirs))
if output_dir is not None:
output_filename = os.path.join (output_dir, output_filename)
if self._need_link (objects, output_filename):
# Figure out linker args based on type of target.
if target_desc == CCompiler.EXECUTABLE:
startup_obj = 'c0w32'
if debug:
ld_args = self.ldflags_exe_debug[:]
else:
ld_args = self.ldflags_exe[:]
else:
startup_obj = 'c0d32'
if debug:
ld_args = self.ldflags_shared_debug[:]
else:
ld_args = self.ldflags_shared[:]
# Create a temporary exports file for use by the linker
if export_symbols is None:
def_file = ''
else:
head, tail = os.path.split (output_filename)
modname, ext = os.path.splitext (tail)
temp_dir = os.path.dirname(objects[0]) # preserve tree structure
def_file = os.path.join (temp_dir, '%s.def' % modname)
contents = ['EXPORTS']
for sym in (export_symbols or []):
contents.append(' %s=_%s' % (sym, sym))
self.execute(write_file, (def_file, contents),
"writing %s" % def_file)
# Borland C++ has problems with '/' in paths
objects2 = map(os.path.normpath, objects)
# split objects in .obj and .res files
# Borland C++ needs them at different positions in the command line
objects = [startup_obj]
resources = []
for file in objects2:
(base, ext) = os.path.splitext(os.path.normcase(file))
if ext == '.res':
resources.append(file)
else:
objects.append(file)
for l in library_dirs:
ld_args.append("/L%s" % os.path.normpath(l))
ld_args.append("/L.") # we sometimes use relative paths
# list of object files
ld_args.extend(objects)
# XXX the command-line syntax for Borland C++ is a bit wonky;
# certain filenames are jammed together in one big string, but
# comma-delimited. This doesn't mesh too well with the
# Unix-centric attitude (with a DOS/Windows quoting hack) of
# 'spawn()', so constructing the argument list is a bit
# awkward. Note that doing the obvious thing and jamming all
# the filenames and commas into one argument would be wrong,
# because 'spawn()' would quote any filenames with spaces in
            # them. Arghghh! Apparently it works fine as coded...
# name of dll/exe file
ld_args.extend([',',output_filename])
# no map file and start libraries
ld_args.append(',,')
for lib in libraries:
# see if we find it and if there is a bcpp specific lib
# (xxx_bcpp.lib)
libfile = self.find_library_file(library_dirs, lib, debug)
if libfile is None:
ld_args.append(lib)
# probably a BCPP internal library -- don't warn
else:
# full name which prefers bcpp_xxx.lib over xxx.lib
ld_args.append(libfile)
# some default libraries
ld_args.append ('import32')
ld_args.append ('cw32mt')
# def file for export symbols
ld_args.extend([',',def_file])
# add resource files
ld_args.append(',')
ld_args.extend(resources)
if extra_preargs:
ld_args[:0] = extra_preargs
if extra_postargs:
ld_args.extend(extra_postargs)
self.mkpath (os.path.dirname (output_filename))
try:
self.spawn ([self.linker] + ld_args)
except DistutilsExecError as msg:
raise LinkError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
# link ()
# -- Miscellaneous methods -----------------------------------------
def find_library_file (self, dirs, lib, debug=0):
# List of effective library names to try, in order of preference:
# xxx_bcpp.lib is better than xxx.lib
# and xxx_d.lib is better than xxx.lib if debug is set
#
# The "_bcpp" suffix is to handle a Python installation for people
# with multiple compilers (primarily Distutils hackers, I suspect
# ;-). The idea is they'd have one static library for each
# compiler they care about, since (almost?) every Windows compiler
# seems to have a different format for static libraries.
if debug:
dlib = (lib + "_d")
try_names = (dlib + "_bcpp", lib + "_bcpp", dlib, lib)
else:
try_names = (lib + "_bcpp", lib)
for dir in dirs:
for name in try_names:
libfile = os.path.join(dir, self.library_filename(name))
if os.path.exists(libfile):
return libfile
else:
# Oops, didn't find it in *any* of 'dirs'
return None
# overwrite the one from CCompiler to support rc and res-files
def object_filenames (self,
source_filenames,
strip_dir=0,
output_dir=''):
if output_dir is None: output_dir = ''
obj_names = []
for src_name in source_filenames:
# use normcase to make sure '.rc' is really '.rc' and not '.RC'
(base, ext) = os.path.splitext (os.path.normcase(src_name))
if ext not in (self.src_extensions + ['.rc','.res']):
raise UnknownFileError("unknown file type '%s' (from '%s')" % \
(ext, src_name))
if strip_dir:
base = os.path.basename (base)
if ext == '.res':
# these can go unchanged
obj_names.append (os.path.join (output_dir, base + ext))
elif ext == '.rc':
# these need to be compiled to .res-files
obj_names.append (os.path.join (output_dir, base + '.res'))
else:
obj_names.append (os.path.join (output_dir,
base + self.obj_extension))
return obj_names
# object_filenames ()
def preprocess (self,
source,
output_file=None,
macros=None,
include_dirs=None,
extra_preargs=None,
extra_postargs=None):
(_, macros, include_dirs) = \
self._fix_compile_args(None, macros, include_dirs)
pp_opts = gen_preprocess_options(macros, include_dirs)
pp_args = ['cpp32.exe'] + pp_opts
if output_file is not None:
pp_args.append('-o' + output_file)
if extra_preargs:
pp_args[:0] = extra_preargs
if extra_postargs:
pp_args.extend(extra_postargs)
pp_args.append(source)
# We need to preprocess: either we're being forced to, or the
# source file is newer than the target (or the target doesn't
# exist).
if self.force or output_file is None or newer(source, output_file):
if output_file:
self.mkpath(os.path.dirname(output_file))
try:
self.spawn(pp_args)
except DistutilsExecError as msg:
print(msg)
raise CompileError(msg)
# preprocess()
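# Illustrative sketch (editor addition): BCPPCompiler's filename mapping as
# implemented by object_filenames() above -- '.rc' sources compile to
# '.res', '.res' files pass through, everything else gets '.obj'.
def _demo_bcpp_object_filenames():
    bcpp = BCPPCompiler()
    return bcpp.object_filenames(['foo.c', 'ui.rc', 'icons.res'],
                                 output_dir='build')
    # -> ['build/foo.obj', 'build/ui.res', 'build/icons.res'] (POSIX-style
    # separators shown; os.path.join decides the actual form)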
"""distutils.ccompiler
Contains CCompiler, an abstract base class that defines the interface
for the Distutils compiler abstraction model."""
import sys, os, re
from distutils.errors import *
from distutils.spawn import spawn
from distutils.file_util import move_file
from distutils.dir_util import mkpath
from distutils.dep_util import newer_pairwise, newer_group
from distutils.util import split_quoted, execute
from distutils import log
class CCompiler:
"""Abstract base class to define the interface that must be implemented
by real compiler classes. Also has some utility methods used by
several compiler classes.
The basic idea behind a compiler abstraction class is that each
instance can be used for all the compile/link steps in building a
single project. Thus, attributes common to all of those compile and
link steps -- include directories, macros to define, libraries to link
against, etc. -- are attributes of the compiler instance. To allow for
variability in how individual files are treated, most of those
attributes may be varied on a per-compilation or per-link basis.
"""
# 'compiler_type' is a class attribute that identifies this class. It
# keeps code that wants to know what kind of compiler it's dealing with
# from having to import all possible compiler classes just to do an
# 'isinstance'. In concrete CCompiler subclasses, 'compiler_type'
# should really, really be one of the keys of the 'compiler_class'
# dictionary (see below -- used by the 'new_compiler()' factory
# function) -- authors of new compiler interface classes are
# responsible for updating 'compiler_class'!
compiler_type = None
# XXX things not handled by this compiler abstraction model:
# * client can't provide additional options for a compiler,
# e.g. warning, optimization, debugging flags. Perhaps this
# should be the domain of concrete compiler abstraction classes
# (UnixCCompiler, MSVCCompiler, etc.) -- or perhaps the base
# class should have methods for the common ones.
    # * can't completely override the include or library search
# path, ie. no "cc -I -Idir1 -Idir2" or "cc -L -Ldir1 -Ldir2".
# I'm not sure how widely supported this is even by Unix
# compilers, much less on other platforms. And I'm even less
# sure how useful it is; maybe for cross-compiling, but
# support for that is a ways off. (And anyways, cross
# compilers probably have a dedicated binary with the
# right paths compiled in. I hope.)
# * can't do really freaky things with the library list/library
# dirs, e.g. "-Ldir1 -lfoo -Ldir2 -lfoo" to link against
# different versions of libfoo.a in different locations. I
# think this is useless without the ability to null out the
# library search path anyways.
# Subclasses that rely on the standard filename generation methods
# implemented below should override these; see the comment near
    # those methods ('object_filenames()' et al.) for details:
src_extensions = None # list of strings
obj_extension = None # string
static_lib_extension = None
shared_lib_extension = None # string
static_lib_format = None # format string
shared_lib_format = None # prob. same as static_lib_format
exe_extension = None # string
# Default language settings. language_map is used to detect a source
# file or Extension target language, checking source filenames.
# language_order is used to detect the language precedence, when deciding
# what language to use when mixing source types. For example, if some
# extension has two files with ".c" extension, and one with ".cpp", it
# is still linked as c++.
language_map = {".c" : "c",
".cc" : "c++",
".cpp" : "c++",
".cxx" : "c++",
".m" : "objc",
}
language_order = ["c++", "objc", "c"]
def __init__(self, verbose=0, dry_run=0, force=0):
self.dry_run = dry_run
self.force = force
self.verbose = verbose
# 'output_dir': a common output directory for object, library,
# shared object, and shared library files
self.output_dir = None
# 'macros': a list of macro definitions (or undefinitions). A
# macro definition is a 2-tuple (name, value), where the value is
# either a string or None (no explicit value). A macro
# undefinition is a 1-tuple (name,).
self.macros = []
# 'include_dirs': a list of directories to search for include files
self.include_dirs = []
# 'libraries': a list of libraries to include in any link
# (library names, not filenames: eg. "foo" not "libfoo.a")
self.libraries = []
# 'library_dirs': a list of directories to search for libraries
self.library_dirs = []
# 'runtime_library_dirs': a list of directories to search for
# shared libraries/objects at runtime
self.runtime_library_dirs = []
# 'objects': a list of object files (or similar, such as explicitly
# named library files) to include on any link
self.objects = []
for key in self.executables.keys():
self.set_executable(key, self.executables[key])
def set_executables(self, **kwargs):
"""Define the executables (and options for them) that will be run
to perform the various stages of compilation. The exact set of
executables that may be specified here depends on the compiler
class (via the 'executables' class attribute), but most will have:
compiler the C/C++ compiler
linker_so linker used to create shared objects and libraries
linker_exe linker used to create binary executables
archiver static library creator
On platforms with a command-line (Unix, DOS/Windows), each of these
is a string that will be split into executable name and (optional)
list of arguments. (Splitting the string is done similarly to how
Unix shells operate: words are delimited by spaces, but quotes and
backslashes can override this. See
'distutils.util.split_quoted()'.)
"""
# Note that some CCompiler implementation classes will define class
# attributes 'cpp', 'cc', etc. with hard-coded executable names;
# this is appropriate when a compiler class is for exactly one
# compiler/OS combination (eg. MSVCCompiler). Other compiler
# classes (UnixCCompiler, in particular) are driven by information
# discovered at run-time, since there are many different ways to do
# basically the same things with Unix C compilers.
for key in kwargs:
if key not in self.executables:
raise ValueError("unknown executable '%s' for class %s" %
(key, self.__class__.__name__))
self.set_executable(key, kwargs[key])
def set_executable(self, key, value):
if isinstance(value, str):
setattr(self, key, split_quoted(value))
else:
setattr(self, key, value)
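    # Illustrative sketch (editor addition): string values passed to
    # set_executables() are split shell-style by split_quoted(), so on a
    # concrete subclass that declares a 'compiler' executable:
    #
    #     cc.set_executables(compiler='gcc -O2 -Wall')
    #     cc.compiler   # -> ['gcc', '-O2', '-Wall']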
def _find_macro(self, name):
i = 0
for defn in self.macros:
if defn[0] == name:
return i
i += 1
return None
def _check_macro_definitions(self, definitions):
"""Ensures that every element of 'definitions' is a valid macro
definition, ie. either (name,value) 2-tuple or a (name,) tuple. Do
nothing if all definitions are OK, raise TypeError otherwise.
"""
for defn in definitions:
if not (isinstance(defn, tuple) and
(len(defn) in (1, 2) and
(isinstance (defn[1], str) or defn[1] is None)) and
isinstance (defn[0], str)):
raise TypeError(("invalid macro definition '%s': " % defn) + \
"must be tuple (string,), (string, string), or " + \
"(string, None)")
# -- Bookkeeping methods -------------------------------------------
def define_macro(self, name, value=None):
"""Define a preprocessor macro for all compilations driven by this
compiler object. The optional parameter 'value' should be a
string; if it is not supplied, then the macro will be defined
without an explicit value and the exact outcome depends on the
compiler used (XXX true? does ANSI say anything about this?)
"""
# Delete from the list of macro definitions/undefinitions if
# already there (so that this one will take precedence).
i = self._find_macro (name)
if i is not None:
del self.macros[i]
self.macros.append((name, value))
def undefine_macro(self, name):
"""Undefine a preprocessor macro for all compilations driven by
this compiler object. If the same macro is defined by
'define_macro()' and undefined by 'undefine_macro()' the last call
takes precedence (including multiple redefinitions or
undefinitions). If the macro is redefined/undefined on a
per-compilation basis (ie. in the call to 'compile()'), then that
takes precedence.
"""
# Delete from the list of macro definitions/undefinitions if
# already there (so that this one will take precedence).
i = self._find_macro (name)
if i is not None:
del self.macros[i]
undefn = (name,)
self.macros.append(undefn)
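    # Illustrative sketch (editor addition): the bookkeeping above keeps
    # only the latest mention of each macro, e.g.:
    #
    #     cc.define_macro('VERSION', '1.0')   # macros: [('VERSION', '1.0')]
    #     cc.undefine_macro('VERSION')        # macros: [('VERSION',)]
    #     cc.define_macro('NDEBUG')           # appends ('NDEBUG', None)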
def add_include_dir(self, dir):
"""Add 'dir' to the list of directories that will be searched for
header files. The compiler is instructed to search directories in
the order in which they are supplied by successive calls to
'add_include_dir()'.
"""
self.include_dirs.append(dir)
def set_include_dirs(self, dirs):
"""Set the list of directories that will be searched to 'dirs' (a
list of strings). Overrides any preceding calls to
        'add_include_dir()'; subsequent calls to 'add_include_dir()' add
to the list passed to 'set_include_dirs()'. This does not affect
any list of standard include directories that the compiler may
search by default.
"""
self.include_dirs = dirs[:]
def add_library(self, libname):
"""Add 'libname' to the list of libraries that will be included in
all links driven by this compiler object. Note that 'libname'
should *not* be the name of a file containing a library, but the
name of the library itself: the actual filename will be inferred by
the linker, the compiler, or the compiler class (depending on the
platform).
The linker will be instructed to link against libraries in the
order they were supplied to 'add_library()' and/or
'set_libraries()'. It is perfectly valid to duplicate library
names; the linker will be instructed to link against libraries as
many times as they are mentioned.
"""
self.libraries.append(libname)
def set_libraries(self, libnames):
"""Set the list of libraries to be included in all links driven by
this compiler object to 'libnames' (a list of strings). This does
not affect any standard system libraries that the linker may
include by default.
"""
self.libraries = libnames[:]
def add_library_dir(self, dir):
"""Add 'dir' to the list of directories that will be searched for
libraries specified to 'add_library()' and 'set_libraries()'. The
linker will be instructed to search for libraries in the order they
are supplied to 'add_library_dir()' and/or 'set_library_dirs()'.
"""
self.library_dirs.append(dir)
def set_library_dirs(self, dirs):
"""Set the list of library search directories to 'dirs' (a list of
strings). This does not affect any standard library search path
that the linker may search by default.
"""
self.library_dirs = dirs[:]
def add_runtime_library_dir(self, dir):
"""Add 'dir' to the list of directories that will be searched for
shared libraries at runtime.
"""
self.runtime_library_dirs.append(dir)
def set_runtime_library_dirs(self, dirs):
"""Set the list of directories to search for shared libraries at
runtime to 'dirs' (a list of strings). This does not affect any
standard search path that the runtime linker may search by
default.
"""
self.runtime_library_dirs = dirs[:]
def add_link_object(self, object):
"""Add 'object' to the list of object files (or analogues, such as
explicitly named library files or the output of "resource
compilers") to be included in every link driven by this compiler
object.
"""
self.objects.append(object)
def set_link_objects(self, objects):
"""Set the list of object files (or analogues) to be included in
every link to 'objects'. This does not affect any standard object
files that the linker may include by default (such as system
libraries).
"""
self.objects = objects[:]
# -- Private utility methods --------------------------------------
# (here for the convenience of subclasses)
# Helper method to prep compiler in subclass compile() methods
def _setup_compile(self, outdir, macros, incdirs, sources, depends,
extra):
"""Process arguments and decide which source files to compile."""
if outdir is None:
outdir = self.output_dir
elif not isinstance(outdir, str):
raise TypeError("'output_dir' must be a string or None")
if macros is None:
macros = self.macros
elif isinstance(macros, list):
macros = macros + (self.macros or [])
else:
raise TypeError("'macros' (if supplied) must be a list of tuples")
if incdirs is None:
incdirs = self.include_dirs
elif isinstance(incdirs, (list, tuple)):
incdirs = list(incdirs) + (self.include_dirs or [])
else:
raise TypeError(
"'include_dirs' (if supplied) must be a list of strings")
if extra is None:
extra = []
# Get the list of expected output (object) files
objects = self.object_filenames(sources, strip_dir=0,
output_dir=outdir)
assert len(objects) == len(sources)
pp_opts = gen_preprocess_options(macros, incdirs)
build = {}
for i in range(len(sources)):
src = sources[i]
obj = objects[i]
ext = os.path.splitext(src)[1]
self.mkpath(os.path.dirname(obj))
build[obj] = (src, ext)
return macros, objects, extra, pp_opts, build
def _get_cc_args(self, pp_opts, debug, before):
# works for unixccompiler, cygwinccompiler
cc_args = pp_opts + ['-c']
if debug:
cc_args[:0] = ['-g']
if before:
cc_args[:0] = before
return cc_args
def _fix_compile_args(self, output_dir, macros, include_dirs):
"""Typecheck and fix-up some of the arguments to the 'compile()'
method, and return fixed-up values. Specifically: if 'output_dir'
is None, replaces it with 'self.output_dir'; ensures that 'macros'
is a list, and augments it with 'self.macros'; ensures that
'include_dirs' is a list, and augments it with 'self.include_dirs'.
Guarantees that the returned values are of the correct type,
i.e. for 'output_dir' either string or None, and for 'macros' and
'include_dirs' either list or None.
"""
if output_dir is None:
output_dir = self.output_dir
elif not isinstance(output_dir, str):
raise TypeError("'output_dir' must be a string or None")
if macros is None:
macros = self.macros
elif isinstance(macros, list):
macros = macros + (self.macros or [])
else:
raise TypeError("'macros' (if supplied) must be a list of tuples")
if include_dirs is None:
include_dirs = self.include_dirs
elif isinstance(include_dirs, (list, tuple)):
include_dirs = list(include_dirs) + (self.include_dirs or [])
else:
raise TypeError(
"'include_dirs' (if supplied) must be a list of strings")
return output_dir, macros, include_dirs
def _prep_compile(self, sources, output_dir, depends=None):
"""Decide which souce files must be recompiled.
Determine the list of object files corresponding to 'sources',
and figure out which ones really need to be recompiled.
Return a list of all object files and a dictionary telling
which source files can be skipped.
"""
# Get the list of expected output (object) files
objects = self.object_filenames(sources, output_dir=output_dir)
assert len(objects) == len(sources)
# Return an empty dict for the "which source files can be skipped"
# return value to preserve API compatibility.
return objects, {}
def _fix_object_args(self, objects, output_dir):
"""Typecheck and fix up some arguments supplied to various methods.
Specifically: ensure that 'objects' is a list; if output_dir is
None, replace with self.output_dir. Return fixed versions of
'objects' and 'output_dir'.
"""
if not isinstance(objects, (list, tuple)):
raise TypeError("'objects' must be a list or tuple of strings")
objects = list(objects)
if output_dir is None:
output_dir = self.output_dir
elif not isinstance(output_dir, str):
raise TypeError("'output_dir' must be a string or None")
return (objects, output_dir)
def _fix_lib_args(self, libraries, library_dirs, runtime_library_dirs):
"""Typecheck and fix up some of the arguments supplied to the
'link_*' methods. Specifically: ensure that all arguments are
lists, and augment them with their permanent versions
(eg. 'self.libraries' augments 'libraries'). Return a tuple with
fixed versions of all arguments.
"""
if libraries is None:
libraries = self.libraries
elif isinstance(libraries, (list, tuple)):
libraries = list (libraries) + (self.libraries or [])
else:
raise TypeError(
"'libraries' (if supplied) must be a list of strings")
if library_dirs is None:
library_dirs = self.library_dirs
elif isinstance(library_dirs, (list, tuple)):
library_dirs = list (library_dirs) + (self.library_dirs or [])
else:
raise TypeError(
"'library_dirs' (if supplied) must be a list of strings")
if runtime_library_dirs is None:
runtime_library_dirs = self.runtime_library_dirs
elif isinstance(runtime_library_dirs, (list, tuple)):
runtime_library_dirs = (list(runtime_library_dirs) +
(self.runtime_library_dirs or []))
else:
raise TypeError("'runtime_library_dirs' (if supplied) "
"must be a list of strings")
return (libraries, library_dirs, runtime_library_dirs)
def _need_link(self, objects, output_file):
"""Return true if we need to relink the files listed in 'objects'
to recreate 'output_file'.
"""
if self.force:
return True
else:
if self.dry_run:
newer = newer_group (objects, output_file, missing='newer')
else:
newer = newer_group (objects, output_file)
return newer
def detect_language(self, sources):
"""Detect the language of a given file, or list of files. Uses
language_map, and language_order to do the job.
"""
if not isinstance(sources, list):
sources = [sources]
lang = None
index = len(self.language_order)
for source in sources:
base, ext = os.path.splitext(source)
extlang = self.language_map.get(ext)
try:
extindex = self.language_order.index(extlang)
if extindex < index:
lang = extlang
index = extindex
except ValueError:
pass
return lang
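    # Illustrative sketch (editor addition): precedence in detect_language()
    # follows language_order, so mixed C/C++ sources are treated as C++:
    #
    #     cc.detect_language(['a.c', 'b.cpp'])   # -> 'c++'
    #     cc.detect_language('module.m')         # -> 'objc'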
# -- Worker methods ------------------------------------------------
# (must be implemented by subclasses)
def preprocess(self, source, output_file=None, macros=None,
include_dirs=None, extra_preargs=None, extra_postargs=None):
"""Preprocess a single C/C++ source file, named in 'source'.
Output will be written to file named 'output_file', or stdout if
'output_file' not supplied. 'macros' is a list of macro
definitions as for 'compile()', which will augment the macros set
with 'define_macro()' and 'undefine_macro()'. 'include_dirs' is a
list of directory names that will be added to the default list.
Raises PreprocessError on failure.
"""
pass
def compile(self, sources, output_dir=None, macros=None,
include_dirs=None, debug=0, extra_preargs=None,
extra_postargs=None, depends=None):
"""Compile one or more source files.
'sources' must be a list of filenames, most likely C/C++
files, but in reality anything that can be handled by a
particular compiler and compiler class (eg. MSVCCompiler can
handle resource files in 'sources'). Return a list of object
filenames, one per source filename in 'sources'. Depending on
the implementation, not all source files will necessarily be
compiled, but all corresponding object filenames will be
returned.
If 'output_dir' is given, object files will be put under it, while
retaining their original path component. That is, "foo/bar.c"
normally compiles to "foo/bar.o" (for a Unix implementation); if
'output_dir' is "build", then it would compile to
"build/foo/bar.o".
'macros', if given, must be a list of macro definitions. A macro
definition is either a (name, value) 2-tuple or a (name,) 1-tuple.
The former defines a macro; if the value is None, the macro is
defined without an explicit value. The 1-tuple case undefines a
        macro. Later definitions/redefinitions/undefinitions take
precedence.
'include_dirs', if given, must be a list of strings, the
directories to add to the default include file search path for this
compilation only.
'debug' is a boolean; if true, the compiler will be instructed to
output debug symbols in (or alongside) the object file(s).
        'extra_preargs' and 'extra_postargs' are implementation-dependent.
        On platforms that have the notion of a command-line (e.g. Unix,
        DOS/Windows), they are most likely lists of strings: extra
        command-line arguments to prepend/append to the compiler command
line. On other platforms, consult the implementation class
documentation. In any event, they are intended as an escape hatch
for those occasions when the abstract compiler framework doesn't
cut the mustard.
'depends', if given, is a list of filenames that all targets
depend on. If a source file is older than any file in
depends, then the source file will be recompiled. This
supports dependency tracking, but only at a coarse
granularity.
Raises CompileError on failure.
"""
# A concrete compiler class can either override this method
# entirely or implement _compile().
macros, objects, extra_postargs, pp_opts, build = \
self._setup_compile(output_dir, macros, include_dirs, sources,
depends, extra_postargs)
cc_args = self._get_cc_args(pp_opts, debug, extra_preargs)
for obj in objects:
try:
src, ext = build[obj]
except KeyError:
continue
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
# Return *all* object filenames, not just the ones we just built.
return objects
def _compile(self, obj, src, ext, cc_args, extra_postargs, pp_opts):
"""Compile 'src' to product 'obj'."""
# A concrete compiler class that does not override compile()
# should implement _compile().
pass
def create_static_lib(self, objects, output_libname, output_dir=None,
debug=0, target_lang=None):
"""Link a bunch of stuff together to create a static library file.
The "bunch of stuff" consists of the list of object files supplied
as 'objects', the extra object files supplied to
'add_link_object()' and/or 'set_link_objects()', the libraries
supplied to 'add_library()' and/or 'set_libraries()', and the
libraries supplied as 'libraries' (if any).
'output_libname' should be a library name, not a filename; the
filename will be inferred from the library name. 'output_dir' is
the directory where the library file will be put.
'debug' is a boolean; if true, debugging information will be
included in the library (note that on most platforms, it is the
compile step where this matters: the 'debug' flag is included here
just for consistency).
'target_lang' is the target language for which the given objects
are being compiled. This allows specific linkage time treatment of
certain languages.
Raises LibError on failure.
"""
pass
# values for target_desc parameter in link()
SHARED_OBJECT = "shared_object"
SHARED_LIBRARY = "shared_library"
EXECUTABLE = "executable"
def link(self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
"""Link a bunch of stuff together to create an executable or
shared library file.
The "bunch of stuff" consists of the list of object files supplied
as 'objects'. 'output_filename' should be a filename. If
'output_dir' is supplied, 'output_filename' is relative to it
(i.e. 'output_filename' can provide directory components if
needed).
'libraries' is a list of libraries to link against. These are
library names, not filenames, since they're translated into
filenames in a platform-specific way (eg. "foo" becomes "libfoo.a"
on Unix and "foo.lib" on DOS/Windows). However, they can include a
directory component, which means the linker will look in that
specific directory rather than searching all the normal locations.
'library_dirs', if supplied, should be a list of directories to
search for libraries that were specified as bare library names
(ie. no directory component). These are on top of the system
default and those supplied to 'add_library_dir()' and/or
'set_library_dirs()'. 'runtime_library_dirs' is a list of
directories that will be embedded into the shared library and used
to search for other shared libraries that *it* depends on at
run-time. (This may only be relevant on Unix.)
'export_symbols' is a list of symbols that the shared library will
export. (This appears to be relevant only on Windows.)
'debug' is as for 'compile()' and 'create_static_lib()', with the
slight distinction that it actually matters on most platforms (as
opposed to 'create_static_lib()', which includes a 'debug' flag
mostly for form's sake).
'extra_preargs' and 'extra_postargs' are as for 'compile()' (except
of course that they supply command-line arguments for the
particular linker being used).
'target_lang' is the target language for which the given objects
are being compiled. This allows specific linkage time treatment of
certain languages.
Raises LinkError on failure.
"""
raise NotImplementedError
# Old 'link_*()' methods, rewritten to use the new 'link()' method.
def link_shared_lib(self,
objects,
output_libname,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
self.link(CCompiler.SHARED_LIBRARY, objects,
self.library_filename(output_libname, lib_type='shared'),
output_dir,
libraries, library_dirs, runtime_library_dirs,
export_symbols, debug,
extra_preargs, extra_postargs, build_temp, target_lang)
def link_shared_object(self,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
self.link(CCompiler.SHARED_OBJECT, objects,
output_filename, output_dir,
libraries, library_dirs, runtime_library_dirs,
export_symbols, debug,
extra_preargs, extra_postargs, build_temp, target_lang)
def link_executable(self,
objects,
output_progname,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
target_lang=None):
self.link(CCompiler.EXECUTABLE, objects,
self.executable_filename(output_progname), output_dir,
libraries, library_dirs, runtime_library_dirs, None,
debug, extra_preargs, extra_postargs, None, target_lang)
# -- Miscellaneous methods -----------------------------------------
    # These are all used by the 'gen_lib_options()' function; there is
# no appropriate default implementation so subclasses should
# implement all of these.
def library_dir_option(self, dir):
"""Return the compiler option to add 'dir' to the list of
directories searched for libraries.
"""
raise NotImplementedError
def runtime_library_dir_option(self, dir):
"""Return the compiler option to add 'dir' to the list of
directories searched for runtime libraries.
"""
raise NotImplementedError
def library_option(self, lib):
"""Return the compiler option to add 'lib' to the list of libraries
linked into the shared library or executable.
"""
raise NotImplementedError
def has_function(self, funcname, includes=None, include_dirs=None,
libraries=None, library_dirs=None):
"""Return a boolean indicating whether funcname is supported on
the current platform. The optional arguments can be used to
augment the compilation environment.
"""
# this can't be included at module scope because it tries to
# import math which might not be available at that point - maybe
# the necessary logic should just be inlined?
import tempfile
if includes is None:
includes = []
if include_dirs is None:
include_dirs = []
if libraries is None:
libraries = []
if library_dirs is None:
library_dirs = []
fd, fname = tempfile.mkstemp(".c", funcname, text=True)
f = os.fdopen(fd, "w")
try:
for incl in includes:
f.write("""#include "%s"\n""" % incl)
f.write("""\
int main (int argc, char **argv) {
    %s();
    return 0;
}
""" % funcname)
finally:
f.close()
try:
objects = self.compile([fname], include_dirs=include_dirs)
except CompileError:
return False
try:
self.link_executable(objects, "a.out",
libraries=libraries,
library_dirs=library_dirs)
except (LinkError, TypeError):
return False
return True
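    # Illustrative sketch (editor addition): probing for a C function with
    # has_function() on a concrete instance, e.g. one obtained from the
    # new_compiler() factory defined later in this file:
    #
    #     cc = new_compiler()
    #     if cc.has_function('clock_gettime', libraries=['rt']):
    #         ...  # the symbol compiles and links on this platform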
def find_library_file (self, dirs, lib, debug=0):
"""Search the specified list of directories for a static or shared
library file 'lib' and return the full path to that file. If
'debug' true, look for a debugging version (if that makes sense on
the current platform). Return None if 'lib' wasn't found in any of
the specified directories.
"""
raise NotImplementedError
# -- Filename generation methods -----------------------------------
# The default implementation of the filename generating methods are
# prejudiced towards the Unix/DOS/Windows view of the world:
# * object files are named by replacing the source file extension
# (eg. .c/.cpp -> .o/.obj)
# * library files (shared or static) are named by plugging the
# library name and extension into a format string, eg.
# "lib%s.%s" % (lib_name, ".a") for Unix static libraries
# * executables are named by appending an extension (possibly
# empty) to the program name: eg. progname + ".exe" for
# Windows
#
# To reduce redundant code, these methods expect to find
# several attributes in the current object (presumably defined
# as class attributes):
# * src_extensions -
# list of C/C++ source file extensions, eg. ['.c', '.cpp']
# * obj_extension -
# object file extension, eg. '.o' or '.obj'
# * static_lib_extension -
# extension for static library files, eg. '.a' or '.lib'
# * shared_lib_extension -
# extension for shared library/object files, eg. '.so', '.dll'
# * static_lib_format -
# format string for generating static library filenames,
# eg. 'lib%s.%s' or '%s.%s'
    # * shared_lib_format -
# format string for generating shared library filenames
# (probably same as static_lib_format, since the extension
# is one of the intended parameters to the format string)
# * exe_extension -
# extension for executable files, eg. '' or '.exe'
def object_filenames(self, source_filenames, strip_dir=0, output_dir=''):
if output_dir is None:
output_dir = ''
obj_names = []
for src_name in source_filenames:
base, ext = os.path.splitext(src_name)
base = os.path.splitdrive(base)[1] # Chop off the drive
base = base[os.path.isabs(base):] # If abs, chop off leading /
if ext not in self.src_extensions:
raise UnknownFileError(
"unknown file type '%s' (from '%s')" % (ext, src_name))
if strip_dir:
base = os.path.basename(base)
obj_names.append(os.path.join(output_dir,
base + self.obj_extension))
return obj_names
def shared_object_filename(self, basename, strip_dir=0, output_dir=''):
assert output_dir is not None
if strip_dir:
basename = os.path.basename(basename)
return os.path.join(output_dir, basename + self.shared_lib_extension)
def executable_filename(self, basename, strip_dir=0, output_dir=''):
assert output_dir is not None
if strip_dir:
basename = os.path.basename(basename)
return os.path.join(output_dir, basename + (self.exe_extension or ''))
def library_filename(self, libname, lib_type='static', # or 'shared'
strip_dir=0, output_dir=''):
assert output_dir is not None
if lib_type not in ("static", "shared", "dylib"):
raise ValueError(
"'lib_type' must be \"static\", \"shared\" or \"dylib\"")
fmt = getattr(self, lib_type + "_lib_format")
ext = getattr(self, lib_type + "_lib_extension")
dir, base = os.path.split(libname)
filename = fmt % (base, ext)
if strip_dir:
dir = ''
return os.path.join(output_dir, dir, filename)
# -- Utility methods -----------------------------------------------
def announce(self, msg, level=1):
log.debug(msg)
def debug_print(self, msg):
from distutils.debug import DEBUG
if DEBUG:
print(msg)
def warn(self, msg):
sys.stderr.write("warning: %s\n" % msg)
def execute(self, func, args, msg=None, level=1):
execute(func, args, msg, self.dry_run)
def spawn(self, cmd):
spawn(cmd, dry_run=self.dry_run)
def move_file(self, src, dst):
return move_file(src, dst, dry_run=self.dry_run)
def mkpath (self, name, mode=0o777):
mkpath(name, mode, dry_run=self.dry_run)
# Map a sys.platform/os.name ('posix', 'nt') to the default compiler
# type for that platform. Keys are interpreted as re match
# patterns. Order is important; platform mappings are preferred over
# OS names.
_default_compilers = (
# Platform string mappings
# on a cygwin built python we can use gcc like an ordinary UNIXish
# compiler
('cygwin.*', 'unix'),
# OS name mappings
('posix', 'unix'),
('nt', 'msvc'),
)
def get_default_compiler(osname=None, platform=None):
"""Determine the default compiler to use for the given platform.
osname should be one of the standard Python OS names (i.e. the
ones returned by os.name) and platform the common value
returned by sys.platform for the platform in question.
The default values are os.name and sys.platform in case the
parameters are not given.
"""
if osname is None:
osname = os.name
if platform is None:
platform = sys.platform
for pattern, compiler in _default_compilers:
if re.match(pattern, platform) is not None or \
re.match(pattern, osname) is not None:
return compiler
# Default to Unix compiler
return 'unix'
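# Illustrative sketch (editor addition): the patterns above are tried in
# order, so a platform match beats the generic OS-name fallback:
#
#     get_default_compiler('posix', 'cygwin')   # -> 'unix'
#     get_default_compiler('nt', 'win32')       # -> 'msvc'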
# Map compiler types to (module_name, class_name) pairs -- ie. where to
# find the code that implements an interface to this compiler. (The module
# is assumed to be in the 'distutils' package.)
compiler_class = { 'unix': ('unixccompiler', 'UnixCCompiler',
"standard UNIX-style compiler"),
'msvc': ('msvccompiler', 'MSVCCompiler',
"Microsoft Visual C++"),
'cygwin': ('cygwinccompiler', 'CygwinCCompiler',
"Cygwin port of GNU C Compiler for Win32"),
'mingw32': ('cygwinccompiler', 'Mingw32CCompiler',
"Mingw32 port of GNU C Compiler for Win32"),
'bcpp': ('bcppcompiler', 'BCPPCompiler',
"Borland C++ Compiler"),
}
def show_compilers():
"""Print list of available compilers (used by the "--help-compiler"
options to "build", "build_ext", "build_clib").
"""
# XXX this "knows" that the compiler option it's describing is
# "--compiler", which just happens to be the case for the three
# commands that use it.
from distutils.fancy_getopt import FancyGetopt
compilers = []
for compiler in compiler_class.keys():
compilers.append(("compiler="+compiler, None,
compiler_class[compiler][2]))
compilers.sort()
pretty_printer = FancyGetopt(compilers)
pretty_printer.print_help("List of available compilers:")
def new_compiler(plat=None, compiler=None, verbose=0, dry_run=0, force=0):
"""Generate an instance of some CCompiler subclass for the supplied
platform/compiler combination. 'plat' defaults to 'os.name'
(eg. 'posix', 'nt'), and 'compiler' defaults to the default compiler
for that platform. Currently only 'posix' and 'nt' are supported, and
the default compilers are "traditional Unix interface" (UnixCCompiler
class) and Visual C++ (MSVCCompiler class). Note that it's perfectly
possible to ask for a Unix compiler object under Windows, and a
Microsoft compiler object under Unix -- if you supply a value for
'compiler', 'plat' is ignored.
"""
if plat is None:
plat = os.name
try:
if compiler is None:
compiler = get_default_compiler(plat)
(module_name, class_name, long_description) = compiler_class[compiler]
except KeyError:
msg = "don't know how to compile C/C++ code on platform '%s'" % plat
if compiler is not None:
msg = msg + " with '%s' compiler" % compiler
raise DistutilsPlatformError(msg)
try:
module_name = "distutils." + module_name
__import__ (module_name)
module = sys.modules[module_name]
klass = vars(module)[class_name]
except ImportError:
raise DistutilsModuleError(
"can't compile C/C++ code: unable to load module '%s'" % \
module_name)
except KeyError:
raise DistutilsModuleError(
"can't compile C/C++ code: unable to find class '%s' "
"in module '%s'" % (class_name, module_name))
# XXX The None is necessary to preserve backwards compatibility
# with classes that expect verbose to be the first positional
# argument.
return klass(None, dry_run, force)
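# Illustrative sketch (editor addition): typical use of the factory with
# the CCompiler bookkeeping methods. Paths are examples only.
def _demo_new_compiler():
    cc = new_compiler()               # default compiler for this platform
    cc.add_include_dir('include')
    cc.define_macro('NDEBUG')
    objects = cc.compile(['src/foo.c'], output_dir='build')
    cc.link_executable(objects, 'foo', output_dir='build')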
def gen_preprocess_options(macros, include_dirs):
"""Generate C pre-processor options (-D, -U, -I) as used by at least
two types of compilers: the typical Unix compiler and Visual C++.
'macros' is the usual thing, a list of 1- or 2-tuples, where (name,)
means undefine (-U) macro 'name', and (name,value) means define (-D)
macro 'name' to 'value'. 'include_dirs' is just a list of directory
names to be added to the header file search path (-I). Returns a list
of command-line options suitable for either Unix compilers or Visual
C++.
"""
# XXX it would be nice (mainly aesthetic, and so we don't generate
# stupid-looking command lines) to go over 'macros' and eliminate
# redundant definitions/undefinitions (ie. ensure that only the
# latest mention of a particular macro winds up on the command
# line). I don't think it's essential, though, since most (all?)
# Unix C compilers only pay attention to the latest -D or -U
# mention of a macro on their command line. Similar situation for
# 'include_dirs'. I'm punting on both for now. Anyways, weeding out
# redundancies like this should probably be the province of
# CCompiler, since the data structures used are inherited from it
# and therefore common to all CCompiler classes.
pp_opts = []
for macro in macros:
if not (isinstance(macro, tuple) and 1 <= len(macro) <= 2):
raise TypeError(
"bad macro definition '%s': "
"each element of 'macros' list must be a 1- or 2-tuple"
% macro)
if len(macro) == 1: # undefine this macro
pp_opts.append("-U%s" % macro[0])
elif len(macro) == 2:
if macro[1] is None: # define with no explicit value
pp_opts.append("-D%s" % macro[0])
else:
# XXX *don't* need to be clever about quoting the
# macro value here, because we're going to avoid the
# shell at all costs when we spawn the command!
pp_opts.append("-D%s=%s" % macro)
for dir in include_dirs:
pp_opts.append("-I%s" % dir)
return pp_opts
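# A small sketch of the macro/include formats accepted above; the expected
# output is hand-derived from the loop in gen_preprocess_options().
def _example_gen_preprocess_options():
    opts = gen_preprocess_options(
        [('NDEBUG', None), ('VERSION', '"1.0"'), ('TRACE',)],
        ['include'])
    assert opts == ['-DNDEBUG', '-DVERSION="1.0"', '-UTRACE', '-Iinclude']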
def gen_lib_options (compiler, library_dirs, runtime_library_dirs, libraries):
"""Generate linker options for searching library directories and
linking with specific libraries. 'libraries' and 'library_dirs' are,
respectively, lists of library names (not filenames!) and search
directories. Returns a list of command-line options suitable for use
with the given compiler (the options are generated by calling the
compiler object's own methods).
"""
lib_opts = []
for dir in library_dirs:
lib_opts.append(compiler.library_dir_option(dir))
for dir in runtime_library_dirs:
opt = compiler.runtime_library_dir_option(dir)
if isinstance(opt, list):
lib_opts = lib_opts + opt
else:
lib_opts.append(opt)
# XXX it's important that we *not* remove redundant library mentions!
# sometimes you really do have to say "-lfoo -lbar -lfoo" in order to
# resolve all symbols. I just hope we never have to say "-lfoo obj.o
# -lbar" to get things to work -- that's certainly a possibility, but a
# pretty nasty way to arrange your C code.
for lib in libraries:
(lib_dir, lib_name) = os.path.split(lib)
if lib_dir:
lib_file = compiler.find_library_file([lib_dir], lib_name)
if lib_file:
lib_opts.append(lib_file)
else:
compiler.warn("no library file corresponding to "
"'%s' found (skipping)" % lib)
else:
lib_opts.append(compiler.library_option (lib))
return lib_opts
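# A sketch of gen_lib_options() with a Unix-style compiler object; the
# directory and library names are hypothetical. On such a compiler this
# would yield ['-L/opt/lib', '-lm', '-lfoo'].
def _example_gen_lib_options():
    cc = new_compiler()
    return gen_lib_options(cc, ['/opt/lib'], [], ['m', 'foo'])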
"""distutils.cmd
Provides the Command class, the base class for the command classes
in the distutils.command package.
"""
import sys, os, re
from distutils.errors import DistutilsOptionError
from distutils import util, dir_util, file_util, archive_util, dep_util
from distutils import log
class Command:
"""Abstract base class for defining command classes, the "worker bees"
of the Distutils. A useful analogy for command classes is to think of
them as subroutines with local variables called "options". The options
are "declared" in 'initialize_options()' and "defined" (given their
final values, aka "finalized") in 'finalize_options()', both of which
must be defined by every command class. The distinction between the
two is necessary because option values might come from the outside
world (command line, config file, ...), and any options dependent on
other options must be computed *after* these outside influences have
been processed -- hence 'finalize_options()'. The "body" of the
subroutine, where it does all its work based on the values of its
options, is the 'run()' method, which must also be implemented by every
command class.
"""
# 'sub_commands' formalizes the notion of a "family" of commands,
# eg. "install" as the parent with sub-commands "install_lib",
# "install_headers", etc. The parent of a family of commands
# defines 'sub_commands' as a class attribute; it's a list of
# (command_name : string, predicate : unbound_method | string | None)
# tuples, where 'predicate' is a method of the parent command that
# determines whether the corresponding command is applicable in the
# current situation. (Eg. "install_headers" is only applicable if
# we have any C header files to install.) If 'predicate' is None,
# that command is always applicable.
#
# 'sub_commands' is usually defined at the *end* of a class, because
# predicates can be unbound methods, so they must already have been
# defined. The canonical example is the "install" command.
sub_commands = []
# -- Creation/initialization methods -------------------------------
def __init__(self, dist):
"""Create and initialize a new Command object. Most importantly,
invokes the 'initialize_options()' method, which is the real
initializer and depends on the actual command being
instantiated.
"""
# late import because of mutual dependence between these classes
from distutils.dist import Distribution
if not isinstance(dist, Distribution):
raise TypeError("dist must be a Distribution instance")
if self.__class__ is Command:
raise RuntimeError("Command is an abstract class")
self.distribution = dist
self.initialize_options()
# Per-command versions of the global flags, so that the user can
# customize Distutils' behaviour command-by-command and let some
# commands fall back on the Distribution's behaviour. None means
# "not defined, check self.distribution's copy", while 0 or 1 mean
# false and true (duh). Note that this means figuring out the real
# value of each flag is a touch complicated -- hence "self._dry_run"
# will be handled by __getattr__, below.
# XXX This needs to be fixed.
self._dry_run = None
# verbose is largely ignored, but needs to be set for
# backwards compatibility (I think)?
self.verbose = dist.verbose
# Some commands define a 'self.force' option to ignore file
# timestamps, but methods defined *here* assume that
# 'self.force' exists for all commands. So define it here
# just to be safe.
self.force = None
# The 'help' flag is just used for command-line parsing, so
# none of that complicated bureaucracy is needed.
self.help = 0
# 'finalized' records whether or not 'finalize_options()' has been
# called. 'finalize_options()' itself should not pay attention to
# this flag: it is the business of 'ensure_finalized()', which
# always calls 'finalize_options()', to respect/update it.
self.finalized = 0
# XXX A more explicit way to customize dry_run would be better.
def __getattr__(self, attr):
if attr == 'dry_run':
myval = getattr(self, "_" + attr)
if myval is None:
return getattr(self.distribution, attr)
else:
return myval
else:
raise AttributeError(attr)
def ensure_finalized(self):
if not self.finalized:
self.finalize_options()
self.finalized = 1
# Subclasses must define:
# initialize_options()
# provide default values for all options; may be customized by
# setup script, by options from config file(s), or by command-line
# options
# finalize_options()
# decide on the final values for all options; this is called
# after all possible intervention from the outside world
# (command-line, option file, etc.) has been processed
# run()
# run the command: do whatever it is we're here to do,
# controlled by the command's various option values
def initialize_options(self):
"""Set default values for all the options that this command
supports. Note that these defaults may be overridden by other
commands, by the setup script, by config files, or by the
command-line. Thus, this is not the place to code dependencies
between options; generally, 'initialize_options()' implementations
are just a bunch of "self.foo = None" assignments.
This method must be implemented by all command classes.
"""
raise RuntimeError("abstract method -- subclass %s must override"
% self.__class__)
def finalize_options(self):
"""Set final values for all the options that this command supports.
This is always called as late as possible, ie. after any option
assignments from the command-line or from other commands have been
done. Thus, this is the place to code option dependencies: if
'foo' depends on 'bar', then it is safe to set 'foo' from 'bar' as
long as 'foo' still has the same value it was assigned in
'initialize_options()'.
This method must be implemented by all command classes.
"""
raise RuntimeError("abstract method -- subclass %s must override"
% self.__class__)
def dump_options(self, header=None, indent=""):
from distutils.fancy_getopt import longopt_xlate
if header is None:
header = "command options for '%s':" % self.get_command_name()
self.announce(indent + header, level=log.INFO)
indent = indent + " "
for (option, _, _) in self.user_options:
option = option.translate(longopt_xlate)
if option[-1] == "=":
option = option[:-1]
value = getattr(self, option)
self.announce(indent + "%s = %s" % (option, value),
level=log.INFO)
def run(self):
"""A command's raison d'etre: carry out the action it exists to
perform, controlled by the options initialized in
'initialize_options()', customized by other commands, the setup
script, the command-line, and config files, and finalized in
'finalize_options()'. All terminal output and filesystem
interaction should be done by 'run()'.
This method must be implemented by all command classes.
"""
raise RuntimeError("abstract method -- subclass %s must override"
% self.__class__)
def announce(self, msg, level=1):
"""If the current verbosity level is of greater than or equal to
'level' print 'msg' to stdout.
"""
log.log(level, msg)
def debug_print(self, msg):
"""Print 'msg' to stdout if the global DEBUG (taken from the
DISTUTILS_DEBUG environment variable) flag is true.
"""
from distutils.debug import DEBUG
if DEBUG:
print(msg)
sys.stdout.flush()
# -- Option validation methods -------------------------------------
# (these are very handy in writing the 'finalize_options()' method)
#
# NB. the general philosophy here is to ensure that a particular option
# value meets certain type and value constraints. If not, we try to
# force it into conformance (eg. if we expect a list but have a string,
# split the string on comma and/or whitespace). If we can't force the
# option into conformance, raise DistutilsOptionError. Thus, command
# classes need do nothing more than (eg.)
# self.ensure_string_list('foo')
# and they can be guaranteed that thereafter, self.foo will be
# a list of strings.
def _ensure_stringlike(self, option, what, default=None):
val = getattr(self, option)
if val is None:
setattr(self, option, default)
return default
elif not isinstance(val, str):
raise DistutilsOptionError("'%s' must be a %s (got `%s`)"
% (option, what, val))
return val
def ensure_string(self, option, default=None):
"""Ensure that 'option' is a string; if not defined, set it to
'default'.
"""
self._ensure_stringlike(option, "string", default)
def ensure_string_list(self, option):
"""Ensure that 'option' is a list of strings. If 'option' is
currently a string, we split it either on /,\s*/ or /\s+/, so
"foo bar baz", "foo,bar,baz", and "foo, bar baz" all become
["foo", "bar", "baz"].
"""
val = getattr(self, option)
if val is None:
return
elif isinstance(val, str):
setattr(self, option, re.split(r',\s*|\s+', val))
else:
if isinstance(val, list):
ok = all(isinstance(v, str) for v in val)
else:
ok = False
if not ok:
raise DistutilsOptionError(
"'%s' must be a list of strings (got %r)"
% (option, val))
def _ensure_tested_string(self, option, tester, what, error_fmt,
default=None):
val = self._ensure_stringlike(option, what, default)
if val is not None and not tester(val):
raise DistutilsOptionError(("error in '%s' option: " + error_fmt)
% (option, val))
def ensure_filename(self, option):
"""Ensure that 'option' is the name of an existing file."""
self._ensure_tested_string(option, os.path.isfile,
"filename",
"'%s' does not exist or is not a file")
def ensure_dirname(self, option):
self._ensure_tested_string(option, os.path.isdir,
"directory name",
"'%s' does not exist or is not a directory")
# -- Convenience methods for commands ------------------------------
def get_command_name(self):
if hasattr(self, 'command_name'):
return self.command_name
else:
return self.__class__.__name__
def set_undefined_options(self, src_cmd, *option_pairs):
"""Set the values of any "undefined" options from corresponding
option values in some other command object. "Undefined" here means
"is None", which is the convention used to indicate that an option
has not been changed between 'initialize_options()' and
'finalize_options()'. Usually called from 'finalize_options()' for
options that depend on some other command rather than another
option of the same command. 'src_cmd' is the other command from
which option values will be taken (a command object will be created
for it if necessary); the remaining arguments are
'(src_option,dst_option)' tuples which mean "take the value of
'src_option' in the 'src_cmd' command object, and copy it to
'dst_option' in the current command object".
"""
# Option_pairs: list of (src_option, dst_option) tuples
src_cmd_obj = self.distribution.get_command_obj(src_cmd)
src_cmd_obj.ensure_finalized()
for (src_option, dst_option) in option_pairs:
if getattr(self, dst_option) is None:
setattr(self, dst_option, getattr(src_cmd_obj, src_option))
def get_finalized_command(self, command, create=1):
"""Wrapper around Distribution's 'get_command_obj()' method: find
(create if necessary and 'create' is true) the command object for
'command', call its 'ensure_finalized()' method, and return the
finalized command object.
"""
cmd_obj = self.distribution.get_command_obj(command, create)
cmd_obj.ensure_finalized()
return cmd_obj
# XXX rename to 'get_reinitialized_command()'? (should do the
# same in dist.py, if so)
def reinitialize_command(self, command, reinit_subcommands=0):
return self.distribution.reinitialize_command(command,
reinit_subcommands)
def run_command(self, command):
"""Run some other command: uses the 'run_command()' method of
Distribution, which creates and finalizes the command object if
necessary and then invokes its 'run()' method.
"""
self.distribution.run_command(command)
def get_sub_commands(self):
"""Determine the sub-commands that are relevant in the current
distribution (ie., that need to be run). This is based on the
'sub_commands' class attribute: each tuple in that list may include
a method that we call to determine if the subcommand needs to be
run for the current distribution. Return a list of command names.
"""
commands = []
for (cmd_name, method) in self.sub_commands:
if method is None or method(self):
commands.append(cmd_name)
return commands
# -- External world manipulation -----------------------------------
def warn(self, msg):
log.warn("warning: %s: %s\n" %
(self.get_command_name(), msg))
def execute(self, func, args, msg=None, level=1):
util.execute(func, args, msg, dry_run=self.dry_run)
def mkpath(self, name, mode=0o777):
dir_util.mkpath(name, mode, dry_run=self.dry_run)
def copy_file(self, infile, outfile, preserve_mode=1, preserve_times=1,
link=None, level=1):
"""Copy a file respecting verbose, dry-run and force flags. (The
former two default to whatever is in the Distribution object, and
the latter defaults to false for commands that don't define it.)"""
return file_util.copy_file(infile, outfile, preserve_mode,
preserve_times, not self.force, link,
dry_run=self.dry_run)
def copy_tree(self, infile, outfile, preserve_mode=1, preserve_times=1,
preserve_symlinks=0, level=1):
"""Copy an entire directory tree respecting verbose, dry-run,
and force flags.
"""
return dir_util.copy_tree(infile, outfile, preserve_mode,
preserve_times, preserve_symlinks,
not self.force, dry_run=self.dry_run)
def move_file (self, src, dst, level=1):
"""Move a file respecting dry-run flag."""
return file_util.move_file(src, dst, dry_run=self.dry_run)
def spawn(self, cmd, search_path=1, level=1):
"""Spawn an external command respecting dry-run flag."""
from distutils.spawn import spawn
spawn(cmd, search_path, dry_run=self.dry_run)
def make_archive(self, base_name, format, root_dir=None, base_dir=None,
owner=None, group=None):
return archive_util.make_archive(base_name, format, root_dir, base_dir,
dry_run=self.dry_run,
owner=owner, group=group)
def make_file(self, infiles, outfile, func, args,
exec_msg=None, skip_msg=None, level=1):
"""Special case of 'execute()' for operations that process one or
more input files and generate one output file. Works just like
'execute()', except the operation is skipped and a different
message printed if 'outfile' already exists and is newer than all
files listed in 'infiles'. If the command defined 'self.force',
and it is true, then the command is unconditionally run -- does no
timestamp checks.
"""
if skip_msg is None:
skip_msg = "skipping %s (inputs unchanged)" % outfile
# Allow 'infiles' to be a single string
if isinstance(infiles, str):
infiles = (infiles,)
elif not isinstance(infiles, (list, tuple)):
raise TypeError(
"'infiles' must be a string, or a list or tuple of strings")
if exec_msg is None:
exec_msg = "generating %s from %s" % (outfile, ', '.join(infiles))
# If 'outfile' must be regenerated (either because it doesn't
# exist, is out-of-date, or the 'force' flag is true) then
# perform the action that presumably regenerates it
if self.force or dep_util.newer_group(infiles, outfile):
self.execute(func, args, exec_msg, level)
# Otherwise, print the "skip" message
else:
log.debug(skip_msg)
# XXX 'install_misc' class not currently used -- it was the base class for
# both 'install_scripts' and 'install_data', but they outgrew it. It might
# still be useful for 'install_headers', though, so I'm keeping it around
# for the time being.
class install_misc(Command):
"""Common base class for installing some files in a subdirectory.
Currently used by install_data and install_scripts.
"""
user_options = [('install-dir=', 'd', "directory to install the files to")]
def initialize_options (self):
self.install_dir = None
self.outfiles = []
def _install_dir_from(self, dirname):
self.set_undefined_options('install', (dirname, 'install_dir'))
def _copy_files(self, filelist):
self.outfiles = []
if not filelist:
return
self.mkpath(self.install_dir)
for f in filelist:
self.copy_file(f, self.install_dir)
self.outfiles.append(os.path.join(self.install_dir, f))
def get_outputs(self):
return self.outfiles
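# A sketch of the Command protocol described above: declare options in
# 'initialize_options()', resolve them in 'finalize_options()', do the work
# in 'run()'. The command name and its option are hypothetical.
class _example_greet(Command):
    description = "print a greeting (illustrative only)"
    user_options = [('name=', 'n', "who to greet")]
    def initialize_options(self):
        self.name = None                # declare only; no dependencies here
    def finalize_options(self):
        if self.name is None:           # option dependencies resolved here
            self.name = "world"
    def run(self):
        self.announce("hello, %s" % self.name, level=log.INFO)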
"""distutils.pypirc
Provides the PyPIRCCommand class, the base class for the command classes
that uses .pypirc in the distutils.command package.
"""
import os
from configparser import ConfigParser
from distutils.cmd import Command
DEFAULT_PYPIRC = """\
[distutils]
index-servers =
pypi
[pypi]
username:%s
password:%s
"""
class PyPIRCCommand(Command):
"""Base command that knows how to handle the .pypirc file
"""
DEFAULT_REPOSITORY = 'https://upload.pypi.org/legacy/'
DEFAULT_REALM = 'pypi'
repository = None
realm = None
user_options = [
('repository=', 'r',
"url of repository [default: %s]" % \
DEFAULT_REPOSITORY),
('show-response', None,
'display full response text from server')]
boolean_options = ['show-response']
def _get_rc_file(self):
"""Returns rc file path."""
return os.path.join(os.path.expanduser('~'), '.pypirc')
def _store_pypirc(self, username, password):
"""Creates a default .pypirc file."""
rc = self._get_rc_file()
with os.fdopen(os.open(rc, os.O_CREAT | os.O_WRONLY, 0o600), 'w') as f:
f.write(DEFAULT_PYPIRC % (username, password))
def _read_pypirc(self):
"""Reads the .pypirc file."""
rc = self._get_rc_file()
if os.path.exists(rc):
self.announce('Using PyPI login from %s' % rc)
repository = self.repository or self.DEFAULT_REPOSITORY
realm = self.realm or self.DEFAULT_REALM
config = ConfigParser()
config.read(rc)
sections = config.sections()
if 'distutils' in sections:
# let's get the list of servers
index_servers = config.get('distutils', 'index-servers')
_servers = [server.strip() for server in
index_servers.split('\n')
if server.strip() != '']
if _servers == []:
# nothing set, let's try to get the default pypi
if 'pypi' in sections:
_servers = ['pypi']
else:
# the file is not properly defined, returning
# an empty dict
return {}
for server in _servers:
current = {'server': server}
current['username'] = config.get(server, 'username')
# optional params
for key, default in (('repository',
self.DEFAULT_REPOSITORY),
('realm', self.DEFAULT_REALM),
('password', None)):
if config.has_option(server, key):
current[key] = config.get(server, key)
else:
current[key] = default
# work around people having "repository" for the "pypi"
# section of their config set to the HTTP (rather than
# HTTPS) URL
if (server == 'pypi' and
repository in (self.DEFAULT_REPOSITORY, 'pypi')):
current['repository'] = self.DEFAULT_REPOSITORY
return current
if (current['server'] == repository or
current['repository'] == repository):
return current
elif 'server-login' in sections:
# old format
server = 'server-login'
if config.has_option(server, 'repository'):
repository = config.get(server, 'repository')
else:
repository = self.DEFAULT_REPOSITORY
return {'username': config.get(server, 'username'),
'password': config.get(server, 'password'),
'repository': repository,
'server': server,
'realm': self.DEFAULT_REALM}
return {}
def _read_pypi_response(self, response):
"""Read and decode a PyPI HTTP response."""
import cgi
content_type = response.getheader('content-type', 'text/plain')
encoding = cgi.parse_header(content_type)[1].get('charset', 'ascii')
return response.read().decode(encoding)
def initialize_options(self):
"""Initialize options."""
self.repository = None
self.realm = None
self.show_response = 0
def finalize_options(self):
"""Finalizes options."""
if self.repository is None:
self.repository = self.DEFAULT_REPOSITORY
if self.realm is None:
self.realm = self.DEFAULT_REALM
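# A sketch of the dict shape _read_pypirc() returns for a file shaped like
# DEFAULT_PYPIRC above; the credentials are hypothetical.
def _example_pypirc_result():
    return {'server': 'pypi',
            'username': 'alice',
            'password': 'secret',
            'repository': PyPIRCCommand.DEFAULT_REPOSITORY,
            'realm': PyPIRCCommand.DEFAULT_REALM}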
"""distutils.core
The only module that needs to be imported to use the Distutils; provides
the 'setup' function (which is to be called from the setup script). Also
indirectly provides the Distribution and Command classes, although they are
really defined in distutils.dist and distutils.cmd.
"""
import os
import sys
from distutils.debug import DEBUG
from distutils.errors import *
# Mainly import these so setup scripts can "from distutils.core import" them.
from distutils.dist import Distribution
from distutils.cmd import Command
from distutils.config import PyPIRCCommand
from distutils.extension import Extension
# This is a barebones help message generated and displayed when the user
# runs the setup script with no arguments at all. More useful help
# is generated with various --help options: global help, list commands,
# and per-command help.
USAGE = """\
usage: %(script)s [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: %(script)s --help [cmd1 cmd2 ...]
or: %(script)s --help-commands
or: %(script)s cmd --help
"""
def gen_usage (script_name):
script = os.path.basename(script_name)
return USAGE % vars()
# Some mild magic to control the behaviour of 'setup()' from 'run_setup()'.
_setup_stop_after = None
_setup_distribution = None
# Legal keyword arguments for the setup() function
setup_keywords = ('distclass', 'script_name', 'script_args', 'options',
'name', 'version', 'author', 'author_email',
'maintainer', 'maintainer_email', 'url', 'license',
'description', 'long_description', 'keywords',
'platforms', 'classifiers', 'download_url',
'requires', 'provides', 'obsoletes',
)
# Legal keyword arguments for the Extension constructor
extension_keywords = ('name', 'sources', 'include_dirs',
'define_macros', 'undef_macros',
'library_dirs', 'libraries', 'runtime_library_dirs',
'extra_objects', 'extra_compile_args', 'extra_link_args',
'swig_opts', 'export_symbols', 'depends', 'language')
def setup (**attrs):
"""The gateway to the Distutils: do everything your setup script needs
to do, in a highly flexible and user-driven way. Briefly: create a
Distribution instance; find and parse config files; parse the command
line; run each Distutils command found there, customized by the options
supplied to 'setup()' (as keyword arguments), in config files, and on
the command line.
The Distribution instance might be an instance of a class supplied via
the 'distclass' keyword argument to 'setup'; if no such class is
supplied, then the Distribution class (in dist.py) is instantiated.
All other arguments to 'setup' (except for 'cmdclass') are used to set
attributes of the Distribution instance.
The 'cmdclass' argument, if supplied, is a dictionary mapping command
names to command classes. Each command encountered on the command line
will be turned into a command class, which is in turn instantiated; any
class found in 'cmdclass' is used in place of the default, which is
(for command 'foo_bar') class 'foo_bar' in module
'distutils.command.foo_bar'. The command class must provide a
'user_options' attribute which is a list of option specifiers for
'distutils.fancy_getopt'. Any command-line options between the current
and the next command are used to set attributes of the current command
object.
When the entire command-line has been successfully parsed, calls the
'run()' method on each command object in turn. This method will be
driven entirely by the Distribution object (which each command object
has a reference to, thanks to its constructor), and the
command-specific options that became attributes of each command
object.
"""
global _setup_stop_after, _setup_distribution
# Determine the distribution class -- either caller-supplied or
# our Distribution (see below).
klass = attrs.get('distclass')
if klass:
del attrs['distclass']
else:
klass = Distribution
if 'script_name' not in attrs:
attrs['script_name'] = os.path.basename(sys.argv[0])
if 'script_args' not in attrs:
attrs['script_args'] = sys.argv[1:]
# Create the Distribution instance, using the remaining arguments
# (ie. everything except distclass) to initialize it
try:
_setup_distribution = dist = klass(attrs)
except DistutilsSetupError as msg:
if 'name' not in attrs:
raise SystemExit("error in setup command: %s" % msg)
else:
raise SystemExit("error in %s setup command: %s" % \
(attrs['name'], msg))
if _setup_stop_after == "init":
return dist
# Find and parse the config file(s): they will override options from
# the setup script, but be overridden by the command line.
dist.parse_config_files()
if DEBUG:
print("options (after parsing config files):")
dist.dump_option_dicts()
if _setup_stop_after == "config":
return dist
# Parse the command line and override config files; any
# command-line errors are the end user's fault, so turn them into
# SystemExit to suppress tracebacks.
try:
ok = dist.parse_command_line()
except DistutilsArgError as msg:
raise SystemExit(gen_usage(dist.script_name) + "\nerror: %s" % msg)
if DEBUG:
print("options (after parsing command line):")
dist.dump_option_dicts()
if _setup_stop_after == "commandline":
return dist
# And finally, run all the commands found on the command line.
if ok:
try:
dist.run_commands()
except KeyboardInterrupt:
raise SystemExit("interrupted")
except OSError as exc:
if DEBUG:
sys.stderr.write("error: %s\n" % (exc,))
raise
else:
raise SystemExit("error: %s" % (exc,))
except (DistutilsError,
CCompilerError) as msg:
if DEBUG:
raise
else:
raise SystemExit("error: " + str(msg))
return dist
# setup ()
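# A sketch of the canonical minimal setup script built on setup() above;
# 'demo' is a hypothetical module, and 'script_args' is pinned so the
# example does not read sys.argv (--name is a display option, so no
# commands are actually run).
def _example_setup_script():
    return setup(name='demo', version='0.1', py_modules=['demo'],
                 script_args=['--name'])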
def run_setup (script_name, script_args=None, stop_after="run"):
"""Run a setup script in a somewhat controlled environment, and
return the Distribution instance that drives things. This is useful
if you need to find out the distribution meta-data (passed as
keyword args from 'script' to 'setup()'), or the contents of the
config files or command-line.
'script_name' is a file that will be read and run with 'exec()';
'sys.argv[0]' will be replaced with 'script' for the duration of the
call. 'script_args' is a list of strings; if supplied,
'sys.argv[1:]' will be replaced by 'script_args' for the duration of
the call.
'stop_after' tells 'setup()' when to stop processing; possible
values:
init
stop after the Distribution instance has been created and
populated with the keyword arguments to 'setup()'
config
stop after config files have been parsed (and their data
stored in the Distribution instance)
commandline
stop after the command-line ('sys.argv[1:]' or 'script_args')
have been parsed (and the data stored in the Distribution)
run [default]
stop after all commands have been run (the same as if 'setup()'
had been called in the usual way)
Returns the Distribution instance, which provides all information
used to drive the Distutils.
"""
if stop_after not in ('init', 'config', 'commandline', 'run'):
raise ValueError("invalid value for 'stop_after': %r" % (stop_after,))
global _setup_stop_after, _setup_distribution
_setup_stop_after = stop_after
save_argv = sys.argv
g = {'__file__': script_name}
l = {}
try:
try:
sys.argv[0] = script_name
if script_args is not None:
sys.argv[1:] = script_args
with open(script_name, 'rb') as f:
exec(f.read(), g, l)
finally:
sys.argv = save_argv
_setup_stop_after = None
except SystemExit:
# Hmm, should we do something if exiting with a non-zero code
# (ie. error)?
pass
if _setup_distribution is None:
raise RuntimeError(("'distutils.core.setup()' was never called -- "
"perhaps '%s' is not a Distutils setup script?") % \
script_name)
# I wonder if the setup script's namespace -- g and l -- would be of
# any interest to callers?
#print "_setup_distribution:", _setup_distribution
return _setup_distribution
# run_setup ()
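# A sketch of using run_setup() to read a script's metadata without running
# any of its commands; the 'setup.py' path is hypothetical.
def _example_run_setup():
    dist = run_setup('setup.py', script_args=[], stop_after='init')
    return dist.get_name(), dist.get_version()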
"""distutils.cygwinccompiler
Provides the CygwinCCompiler class, a subclass of UnixCCompiler that
handles the Cygwin port of the GNU C compiler to Windows. It also contains
the Mingw32CCompiler class which handles the mingw32 port of GCC (same as
cygwin in no-cygwin mode).
"""
# problems:
#
# * if you use a msvc compiled python version (1.5.2)
# 1. you have to insert a __GNUC__ section in its config.h
# 2. you have to generate an import library for its dll
# - create a def-file for python??.dll
# - create an import library using
# dlltool --dllname python15.dll --def python15.def \
# --output-lib libpython15.a
#
# see also http://starship.python.net/crew/kernr/mingw32/Notes.html
#
# * We put export_symbols in a def-file, and don't use
# --export-all-symbols because it didn't work reliably in some
# tested configurations. And because other Windows compilers also
# need their symbols specified, this is no serious problem.
#
# tested configurations:
#
# * cygwin gcc 2.91.57/ld 2.9.4/dllwrap 0.2.4 works
# (after patching python's config.h and for C++ some other include files)
# see also http://starship.python.net/crew/kernr/mingw32/Notes.html
# * mingw32 gcc 2.95.2/ld 2.9.4/dllwrap 0.2.4 works
# (ld doesn't support -shared, so we use dllwrap)
# * cygwin gcc 2.95.2/ld 2.10.90/dllwrap 2.10.90 works now
# - its dllwrap doesn't work, there is a bug in binutils 2.10.90
# see also http://sources.redhat.com/ml/cygwin/2000-06/msg01274.html
# - using gcc -mdll instead of dllwrap doesn't work without -static because
# it tries to link against dlls instead of their import libraries (if
# it finds the dll first). By specifying -static we force ld to link
# against the import libraries; this is the Windows standard, and the
# dlls normally do not contain the necessary symbols.
# *** only the version of June 2000 shows these problems
# * cygwin gcc 3.2/ld 2.13.90 works
# (ld supports -shared)
# * mingw gcc 3.2/ld 2.13 works
# (ld supports -shared)
import os
import sys
import copy
from subprocess import Popen, PIPE, check_output
import re
from distutils.ccompiler import gen_preprocess_options, gen_lib_options
from distutils.unixccompiler import UnixCCompiler
from distutils.file_util import write_file
from distutils.errors import (DistutilsExecError, CCompilerError,
CompileError, UnknownFileError)
from distutils import log
from distutils.version import LooseVersion
from distutils.spawn import find_executable
def get_msvcr():
"""Include the appropriate MSVC runtime library if Python was built
with MSVC 7.0 or later.
"""
msc_pos = sys.version.find('MSC v.')
if msc_pos != -1:
msc_ver = sys.version[msc_pos+6:msc_pos+10]
if msc_ver == '1300':
# MSVC 7.0
return ['msvcr70']
elif msc_ver == '1310':
# MSVC 7.1
return ['msvcr71']
elif msc_ver == '1400':
# VS2005 / MSVC 8.0
return ['msvcr80']
elif msc_ver == '1500':
# VS2008 / MSVC 9.0
return ['msvcr90']
elif msc_ver == '1600':
# VS2010 / MSVC 10.0
return ['msvcr100']
else:
raise ValueError("Unknown MS Compiler version %s " % msc_ver)
class CygwinCCompiler(UnixCCompiler):
""" Handles the Cygwin port of the GNU C compiler to Windows.
"""
compiler_type = 'cygwin'
obj_extension = ".o"
static_lib_extension = ".a"
shared_lib_extension = ".dll"
static_lib_format = "lib%s%s"
shared_lib_format = "%s%s"
exe_extension = ".exe"
def __init__(self, verbose=0, dry_run=0, force=0):
UnixCCompiler.__init__(self, verbose, dry_run, force)
status, details = check_config_h()
self.debug_print("Python's GCC status: %s (details: %s)" %
(status, details))
if status is not CONFIG_H_OK:
self.warn(
"Python's pyconfig.h doesn't seem to support your compiler. "
"Reason: %s. "
"Compiling may fail because of undefined preprocessor macros."
% details)
self.gcc_version, self.ld_version, self.dllwrap_version = \
get_versions()
self.debug_print(self.compiler_type + ": gcc %s, ld %s, dllwrap %s\n" %
(self.gcc_version,
self.ld_version,
self.dllwrap_version) )
# ld_version >= "2.10.90" and < "2.13" should also be able to use
# gcc -mdll instead of dllwrap
# Older dllwraps had their own version numbers; newer ones use the
# same numbering as the rest of binutils (including ld)
# dllwrap 2.10.90 is buggy
if self.ld_version >= "2.10.90":
self.linker_dll = "gcc"
else:
self.linker_dll = "dllwrap"
# ld_version >= "2.13" support -shared so use it instead of
# -mdll -static
if self.ld_version >= "2.13":
shared_option = "-shared"
else:
shared_option = "-mdll -static"
# Hard-code GCC because that's what this is all about.
# XXX optimization, warnings etc. should be customizable.
self.set_executables(compiler='gcc -mcygwin -O -Wall',
compiler_so='gcc -mcygwin -mdll -O -Wall',
compiler_cxx='g++ -mcygwin -O -Wall',
linker_exe='gcc -mcygwin',
linker_so=('%s -mcygwin %s' %
(self.linker_dll, shared_option)))
# cygwin and mingw32 need different sets of libraries
if self.gcc_version == "2.91.57":
# cygwin shouldn't need msvcrt, but without it the dlls will crash
# (gcc version 2.91.57) -- perhaps something about initialization
self.dll_libraries=["msvcrt"]
self.warn(
"Consider upgrading to a newer version of gcc")
else:
# Include the appropriate MSVC runtime library if Python was built
# with MSVC 7.0 or later.
self.dll_libraries = get_msvcr()
def _compile(self, obj, src, ext, cc_args, extra_postargs, pp_opts):
"""Compiles the source by spawning GCC and windres if needed."""
if ext == '.rc' or ext == '.res':
# gcc needs '.res' and '.rc' compiled to object files !!!
try:
self.spawn(["windres", "-i", src, "-o", obj])
except DistutilsExecError as msg:
raise CompileError(msg)
else: # for other files use the C-compiler
try:
self.spawn(self.compiler_so + cc_args + [src, '-o', obj] +
extra_postargs)
except DistutilsExecError as msg:
raise CompileError(msg)
def link(self, target_desc, objects, output_filename, output_dir=None,
libraries=None, library_dirs=None, runtime_library_dirs=None,
export_symbols=None, debug=0, extra_preargs=None,
extra_postargs=None, build_temp=None, target_lang=None):
"""Link the objects."""
# use separate copies, so we can modify the lists
extra_preargs = copy.copy(extra_preargs or [])
libraries = copy.copy(libraries or [])
objects = copy.copy(objects or [])
# Additional libraries
libraries.extend(self.dll_libraries)
# handle export symbols by creating a def-file
# with executables this only works with gcc/ld as linker
if ((export_symbols is not None) and
(target_desc != self.EXECUTABLE or self.linker_dll == "gcc")):
# (The linker doesn't do anything if output is up-to-date.
# So it would probably be better to check if we really need this,
# but for that we would have to duplicate some unchanged parts of
# UnixCCompiler, and that is not what we want.)
# we want to put some files into the same directory the object
# files are in; build_temp doesn't help much, so find out
# where the object files are
temp_dir = os.path.dirname(objects[0])
# name of the dll, so the helper files get the same base name
(dll_name, dll_extension) = os.path.splitext(
os.path.basename(output_filename))
# generate the filenames for these files
def_file = os.path.join(temp_dir, dll_name + ".def")
lib_file = os.path.join(temp_dir, 'lib' + dll_name + ".a")
# Generate .def file
contents = [
"LIBRARY %s" % os.path.basename(output_filename),
"EXPORTS"]
for sym in export_symbols:
contents.append(sym)
self.execute(write_file, (def_file, contents),
"writing %s" % def_file)
# next, add options for the def-file and for creating import libraries
# dllwrap uses different options than gcc/ld
if self.linker_dll == "dllwrap":
extra_preargs.extend(["--output-lib", lib_file])
# for dllwrap we have to use a special option
extra_preargs.extend(["--def", def_file])
# we use gcc/ld here and can be sure ld is >= 2.9.10
else:
# doesn't work: bfd_close build\...\libfoo.a: Invalid operation
#extra_preargs.extend(["-Wl,--out-implib,%s" % lib_file])
# for gcc/ld the def-file is specified like any other object file
objects.append(def_file)
#end: if ((export_symbols is not None) and
# (target_desc != self.EXECUTABLE or self.linker_dll == "gcc")):
# whoever wants symbols and a many-times-larger output file
# should explicitly switch debug mode on;
# otherwise we let dllwrap/ld strip the output file
# (On my machine: 10KB < stripped_file < ??100KB
# unstripped_file = stripped_file + XXX KB
# ( XXX=254 for a typical python extension))
if not debug:
extra_preargs.append("-s")
UnixCCompiler.link(self, target_desc, objects, output_filename,
output_dir, libraries, library_dirs,
runtime_library_dirs,
None, # export_symbols, we do this in our def-file
debug, extra_preargs, extra_postargs, build_temp,
target_lang)
# -- Miscellaneous methods -----------------------------------------
def object_filenames(self, source_filenames, strip_dir=0, output_dir=''):
"""Adds supports for rc and res files."""
if output_dir is None:
output_dir = ''
obj_names = []
for src_name in source_filenames:
# use normcase to make sure '.rc' is really '.rc' and not '.RC'
base, ext = os.path.splitext(os.path.normcase(src_name))
if ext not in (self.src_extensions + ['.rc','.res']):
raise UnknownFileError("unknown file type '%s' (from '%s')" % \
(ext, src_name))
if strip_dir:
base = os.path.basename (base)
if ext in ('.res', '.rc'):
# these need to be compiled to object files
obj_names.append (os.path.join(output_dir,
base + ext + self.obj_extension))
else:
obj_names.append (os.path.join(output_dir,
base + self.obj_extension))
return obj_names
# the same as cygwin plus some additional parameters
class Mingw32CCompiler(CygwinCCompiler):
""" Handles the Mingw32 port of the GNU C compiler to Windows.
"""
compiler_type = 'mingw32'
def __init__(self, verbose=0, dry_run=0, force=0):
CygwinCCompiler.__init__ (self, verbose, dry_run, force)
# ld_version >= "2.13" support -shared so use it instead of
# -mdll -static
if self.ld_version >= "2.13":
shared_option = "-shared"
else:
shared_option = "-mdll -static"
# A real mingw32 doesn't need to specify a different entry point,
# but cygwin 2.91.57 in no-cygwin-mode needs it.
if self.gcc_version <= "2.91.57":
entry_point = '--entry _DllMain@12'
else:
entry_point = ''
if is_cygwingcc():
raise CCompilerError(
'Cygwin gcc cannot be used with --compiler=mingw32')
self.set_executables(compiler='gcc -O -Wall',
compiler_so='gcc -mdll -O -Wall',
compiler_cxx='g++ -O -Wall',
linker_exe='gcc',
linker_so='%s %s %s'
% (self.linker_dll, shared_option,
entry_point))
# Maybe we should also append -mthreads, but then the finished
# dlls need another dll (mingwm10.dll; see the Mingw32 docs)
# (-mthreads: Support thread-safe exception handling on `Mingw32')
# no additional libraries needed
self.dll_libraries=[]
# Include the appropriate MSVC runtime library if Python was built
# with MSVC 7.0 or later.
self.dll_libraries = get_msvcr()
# Because these compilers aren't configured in Python's pyconfig.h file by
# default, we should at least warn the user if he is using an unmodified
# version.
CONFIG_H_OK = "ok"
CONFIG_H_NOTOK = "not ok"
CONFIG_H_UNCERTAIN = "uncertain"
def check_config_h():
"""Check if the current Python installation appears amenable to building
extensions with GCC.
Returns a tuple (status, details), where 'status' is one of the following
constants:
- CONFIG_H_OK: all is well, go ahead and compile
- CONFIG_H_NOTOK: doesn't look good
- CONFIG_H_UNCERTAIN: not sure -- unable to read pyconfig.h
'details' is a human-readable string explaining the situation.
Note there are two ways to conclude "OK": either 'sys.version' contains
the string "GCC" (implying that this Python was built with GCC), or the
installed "pyconfig.h" contains the string "__GNUC__".
"""
# XXX since this function also checks sys.version, it's not strictly a
# "pyconfig.h" check -- should probably be renamed...
from distutils import sysconfig
# if sys.version contains GCC then python was compiled with GCC, and the
# pyconfig.h file should be OK
if "GCC" in sys.version:
return CONFIG_H_OK, "sys.version mentions 'GCC'"
# let's see if __GNUC__ is mentioned in python.h
fn = sysconfig.get_config_h_filename()
try:
config_h = open(fn)
try:
if "__GNUC__" in config_h.read():
return CONFIG_H_OK, "'%s' mentions '__GNUC__'" % fn
else:
return CONFIG_H_NOTOK, "'%s' does not mention '__GNUC__'" % fn
finally:
config_h.close()
except OSError as exc:
return (CONFIG_H_UNCERTAIN,
"couldn't read '%s': %s" % (fn, exc.strerror))
RE_VERSION = re.compile(rb'(\d+\.\d+(\.\d+)*)')  # raw bytes avoid invalid escape warnings
def _find_exe_version(cmd):
"""Find the version of an executable by running `cmd` in the shell.
If the command is not found, or the output does not match
`RE_VERSION`, returns None.
"""
executable = cmd.split()[0]
if find_executable(executable) is None:
return None
out = Popen(cmd, shell=True, stdout=PIPE).stdout
try:
out_string = out.read()
finally:
out.close()
result = RE_VERSION.search(out_string)
if result is None:
return None
# LooseVersion works with strings
# so we need to decode our bytes
return LooseVersion(result.group(1).decode())
def get_versions():
""" Try to find out the versions of gcc, ld and dllwrap.
Returns None for any tool whose version cannot be determined.
"""
commands = ['gcc -dumpversion', 'ld -v', 'dllwrap --version']
return tuple([_find_exe_version(cmd) for cmd in commands])
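# A sketch of consuming get_versions(): any entry may be None when the tool
# is missing, so guard before comparing LooseVersion values.
def _example_get_versions():
    gcc, ld, dllwrap = get_versions()
    return ld is not None and ld >= LooseVersion("2.13")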
def is_cygwingcc():
'''Try to determine if the gcc that would be used is from cygwin.'''
out_string = check_output(['gcc', '-dumpmachine'])
return out_string.strip().endswith(b'cygwin')
import os
# If DISTUTILS_DEBUG is anything other than the empty string, we run in
# debug mode.
DEBUG = os.environ.get('DISTUTILS_DEBUG')
"""distutils.dep_util
Utility functions for simple, timestamp-based dependency checking of files
and groups of files; also, functions based entirely on such timestamp
dependency analysis."""
import os
from distutils.errors import DistutilsFileError
def newer (source, target):
"""Return true if 'source' exists and is more recently modified than
'target', or if 'source' exists and 'target' doesn't. Return false if
both exist and 'target' is the same age or younger than 'source'.
Raise DistutilsFileError if 'source' does not exist.
"""
if not os.path.exists(source):
raise DistutilsFileError("file '%s' does not exist" %
os.path.abspath(source))
if not os.path.exists(target):
return 1
from stat import ST_MTIME
mtime1 = os.stat(source)[ST_MTIME]
mtime2 = os.stat(target)[ST_MTIME]
return mtime1 > mtime2
# newer ()
def newer_pairwise (sources, targets):
"""Walk two filename lists in parallel, testing if each source is newer
than its corresponding target. Return a pair of lists (sources,
targets) where source is newer than target, according to the semantics
of 'newer()'.
"""
if len(sources) != len(targets):
raise ValueError("'sources' and 'targets' must be same length")
# build a pair of lists (sources, targets) where source is newer
n_sources = []
n_targets = []
for i in range(len(sources)):
if newer(sources[i], targets[i]):
n_sources.append(sources[i])
n_targets.append(targets[i])
return (n_sources, n_targets)
# newer_pairwise ()
def newer_group (sources, target, missing='error'):
"""Return true if 'target' is out-of-date with respect to any file
listed in 'sources'. In other words, if 'target' exists and is newer
than every file in 'sources', return false; otherwise return true.
'missing' controls what we do when a source file is missing; the
default ("error") is to blow up with an OSError from inside 'stat()';
if it is "ignore", we silently drop any missing source files; if it is
"newer", any missing source files make us assume that 'target' is
out-of-date (this is handy in "dry-run" mode: it'll make you pretend to
carry out commands that wouldn't work because inputs are missing, but
that doesn't matter because you're not actually going to run the
commands).
"""
# If the target doesn't even exist, then it's definitely out-of-date.
if not os.path.exists(target):
return 1
# Otherwise we have to find out the hard way: if *any* source file
# is more recent than 'target', then 'target' is out-of-date and
# we can immediately return true. If we fall through to the end
# of the loop, then 'target' is up-to-date and we return false.
from stat import ST_MTIME
target_mtime = os.stat(target)[ST_MTIME]
for source in sources:
if not os.path.exists(source):
if missing == 'error': # blow up when we stat() the file
pass
elif missing == 'ignore': # missing source dropped from
continue # target's dependency list
elif missing == 'newer': # missing source means target is
return 1 # out-of-date
source_mtime = os.stat(source)[ST_MTIME]
if source_mtime > target_mtime:
return 1
else:
return 0
# newer_group ()
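# A sketch of the dep_util idiom (the same one Command.make_file() relies
# on): regenerate the target only when an input changed. File names are
# hypothetical.
def _example_newer_group():
    if newer_group(['main.c', 'util.c'], 'app', missing='newer'):
        pass  # out-of-date: recompile and relink 'app' here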
"""distutils.dir_util
Utility functions for manipulating directories and directory trees."""
import os
import errno
from distutils.errors import DistutilsFileError, DistutilsInternalError
from distutils import log
# cache used by mkpath() -- in addition to cheapening redundant calls,
# eliminates redundant "creating /foo/bar/baz" messages in dry-run mode
_path_created = {}
# I don't use os.makedirs because a) it's new to Python 1.5.2, and
# b) it blows up if the directory already exists (I want to silently
# succeed in that case).
def mkpath(name, mode=0o777, verbose=1, dry_run=0):
"""Create a directory and any missing ancestor directories.
If the directory already exists (or if 'name' is the empty string, which
means the current directory, which of course exists), then do nothing.
Raise DistutilsFileError if unable to create some directory along the way
(eg. some sub-path exists, but is a file rather than a directory).
If 'verbose' is true, print a one-line summary of each mkdir to stdout.
Return the list of directories actually created.
"""
global _path_created
# Detect a common bug -- name is None
if not isinstance(name, str):
raise DistutilsInternalError(
"mkpath: 'name' must be a string (got %r)" % (name,))
# XXX what's the better way to handle verbosity? print as we create
# each directory in the path (the current behaviour), or only announce
# the creation of the whole path? (quite easy to do the latter since
# we're not using a recursive algorithm)
name = os.path.normpath(name)
created_dirs = []
if os.path.isdir(name) or name == '':
return created_dirs
if _path_created.get(os.path.abspath(name)):
return created_dirs
(head, tail) = os.path.split(name)
tails = [tail] # stack of lone dirs to create
while head and tail and not os.path.isdir(head):
(head, tail) = os.path.split(head)
tails.insert(0, tail) # push next higher dir onto stack
# now 'head' contains the deepest directory that already exists
# (that is, the child of 'head' in 'name' is the highest directory
# that does *not* exist)
for d in tails:
#print "head = %s, d = %s: " % (head, d),
head = os.path.join(head, d)
abs_head = os.path.abspath(head)
if _path_created.get(abs_head):
continue
if verbose >= 1:
log.info("creating %s", head)
if not dry_run:
try:
os.mkdir(head, mode)
except OSError as exc:
if not (exc.errno == errno.EEXIST and os.path.isdir(head)):
raise DistutilsFileError(
"could not create '%s': %s" % (head, exc.args[-1]))
created_dirs.append(head)
_path_created[abs_head] = 1
return created_dirs
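# A sketch of mkpath(): it returns only the directories it actually had to
# create, shallowest first. The path is hypothetical.
def _example_mkpath():
    created = mkpath('build/tmp/objs')
    return created  # e.g. ['build', 'build/tmp', 'build/tmp/objs'] first time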
def create_tree(base_dir, files, mode=0o777, verbose=1, dry_run=0):
"""Create all the empty directories under 'base_dir' needed to put 'files'
there.
'base_dir' is just the name of a directory which doesn't necessarily
exist yet; 'files' is a list of filenames to be interpreted relative to
'base_dir'. 'base_dir' + the directory portion of every file in 'files'
will be created if it doesn't already exist. 'mode', 'verbose' and
'dry_run' flags are as for 'mkpath()'.
"""
# First get the list of directories to create
need_dir = set()
for file in files:
need_dir.add(os.path.join(base_dir, os.path.dirname(file)))
# Now create them
for dir in sorted(need_dir):
mkpath(dir, mode, verbose=verbose, dry_run=dry_run)
def copy_tree(src, dst, preserve_mode=1, preserve_times=1,
preserve_symlinks=0, update=0, verbose=1, dry_run=0):
"""Copy an entire directory tree 'src' to a new location 'dst'.
Both 'src' and 'dst' must be directory names. If 'src' is not a
directory, raise DistutilsFileError. If 'dst' does not exist, it is
created with 'mkpath()'. The end result of the copy is that every
file in 'src' is copied to 'dst', and directories under 'src' are
recursively copied to 'dst'. Return the list of files that were
copied or might have been copied, using their output name. The
return value is unaffected by 'update' or 'dry_run': it is simply
the list of all files under 'src', with the names changed to be
under 'dst'.
'preserve_mode' and 'preserve_times' are the same as for
'copy_file'; note that they only apply to regular files, not to
directories. If 'preserve_symlinks' is true, symlinks will be
copied as symlinks (on platforms that support them!); otherwise
(the default), the destination of the symlink will be copied.
'update' and 'verbose' are the same as for 'copy_file'.
"""
from distutils.file_util import copy_file
if not dry_run and not os.path.isdir(src):
raise DistutilsFileError(
"cannot copy tree '%s': not a directory" % src)
try:
names = os.listdir(src)
except OSError as e:
if dry_run:
names = []
else:
raise DistutilsFileError(
"error listing files in '%s': %s" % (src, e.strerror))
if not dry_run:
mkpath(dst, verbose=verbose)
outputs = []
for n in names:
src_name = os.path.join(src, n)
dst_name = os.path.join(dst, n)
if n.startswith('.nfs'):
# skip NFS rename files
continue
if preserve_symlinks and os.path.islink(src_name):
link_dest = os.readlink(src_name)
if verbose >= 1:
log.info("linking %s -> %s", dst_name, link_dest)
if not dry_run:
os.symlink(link_dest, dst_name)
outputs.append(dst_name)
elif os.path.isdir(src_name):
outputs.extend(
copy_tree(src_name, dst_name, preserve_mode,
preserve_times, preserve_symlinks, update,
verbose=verbose, dry_run=dry_run))
else:
copy_file(src_name, dst_name, preserve_mode,
preserve_times, update, verbose=verbose,
dry_run=dry_run)
outputs.append(dst_name)
return outputs
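# A sketch of copy_tree(): mirror a directory, keeping symlinks as symlinks;
# the directory names are hypothetical.
def _example_copy_tree():
    return copy_tree('assets', 'build/assets', preserve_symlinks=1)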
def _build_cmdtuple(path, cmdtuples):
"""Helper for remove_tree()."""
for f in os.listdir(path):
real_f = os.path.join(path,f)
if os.path.isdir(real_f) and not os.path.islink(real_f):
_build_cmdtuple(real_f, cmdtuples)
else:
cmdtuples.append((os.remove, real_f))
cmdtuples.append((os.rmdir, path))
def remove_tree(directory, verbose=1, dry_run=0):
"""Recursively remove an entire directory tree.
Any errors are ignored (apart from being reported to stdout if 'verbose'
is true).
"""
global _path_created
if verbose >= 1:
log.info("removing '%s' (and everything under it)", directory)
if dry_run:
return
cmdtuples = []
_build_cmdtuple(directory, cmdtuples)
for cmd in cmdtuples:
try:
cmd[0](cmd[1])
# remove dir from cache if it's already there
abspath = os.path.abspath(cmd[1])
if abspath in _path_created:
del _path_created[abspath]
except OSError as exc:
log.warn("error removing %s: %s", directory, exc)
def ensure_relative(path):
"""Take the full path 'path', and make it a relative path.
This is useful to make 'path' the second argument to os.path.join().
"""
drive, path = os.path.splitdrive(path)
if path[0:1] == os.sep:
path = drive + path[1:]
return path
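# A sketch of ensure_relative() (POSIX paths assumed): the result is safe as
# the second argument to os.path.join().
def _example_ensure_relative():
    return os.path.join('/prefix', ensure_relative('/usr/lib'))  # '/prefix/usr/lib'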
"""distutils.dist
Provides the Distribution class, which represents the module distribution
being built/installed/distributed.
"""
import sys, os, re
from email import message_from_file
try:
import warnings
except ImportError:
warnings = None
from distutils.errors import *
from distutils.fancy_getopt import FancyGetopt, translate_longopt
from distutils.util import check_environ, strtobool, rfc822_escape
from distutils import log
from distutils.debug import DEBUG
# Regex to define acceptable Distutils command names. This is not *quite*
# the same as a Python NAME -- I don't allow leading underscores. The fact
# that they're very similar is no coincidence; the default naming scheme is
# to look for a Python module named after the command.
command_re = re.compile (r'^[a-zA-Z]([a-zA-Z0-9_]*)$')
class Distribution:
"""The core of the Distutils. Most of the work hiding behind 'setup'
is really done within a Distribution instance, which farms the work out
to the Distutils commands specified on the command line.
Setup scripts will almost never instantiate Distribution directly,
unless the 'setup()' function is totally inadequate to their needs.
However, it is conceivable that a setup script might wish to subclass
Distribution for some specialized purpose, and then pass the subclass
to 'setup()' as the 'distclass' keyword argument. If so, it is
necessary to respect the expectations that 'setup' has of Distribution.
See the code for 'setup()', in core.py, for details.
"""
# 'global_options' describes the command-line options that may be
# supplied to the setup script prior to any actual commands.
# Eg. "./setup.py -n" or "./setup.py --quiet" both take advantage of
# these global options. This list should be kept to a bare minimum,
# since every global option is also valid as a command option -- and we
# don't want to pollute the commands with too many options that they
# have minimal control over.
# The fourth entry for verbose means that it can be repeated.
global_options = [('verbose', 'v', "run verbosely (default)", 1),
('quiet', 'q', "run quietly (turns verbosity off)"),
('dry-run', 'n', "don't actually do anything"),
('help', 'h', "show detailed help message"),
('no-user-cfg', None,
'ignore pydistutils.cfg in your home directory'),
]
# 'common_usage' is a short (2-3 line) string describing the common
# usage of the setup script.
common_usage = """\
Common commands: (see '--help-commands' for more)
setup.py build will build the package underneath 'build/'
setup.py install will install the package
"""
# options that are not propagated to the commands
display_options = [
('help-commands', None,
"list all available commands"),
('name', None,
"print package name"),
('version', 'V',
"print package version"),
('fullname', None,
"print <package name>-<version>"),
('author', None,
"print the author's name"),
('author-email', None,
"print the author's email address"),
('maintainer', None,
"print the maintainer's name"),
('maintainer-email', None,
"print the maintainer's email address"),
('contact', None,
"print the maintainer's name if known, else the author's"),
('contact-email', None,
"print the maintainer's email address if known, else the author's"),
('url', None,
"print the URL for this package"),
('license', None,
"print the license of the package"),
('licence', None,
"alias for --license"),
('description', None,
"print the package description"),
('long-description', None,
"print the long package description"),
('platforms', None,
"print the list of platforms"),
('classifiers', None,
"print the list of classifiers"),
('keywords', None,
"print the list of keywords"),
('provides', None,
"print the list of packages/modules provided"),
('requires', None,
"print the list of packages/modules required"),
('obsoletes', None,
"print the list of packages/modules made obsolete")
]
display_option_names = [translate_longopt(x[0]) for x in display_options]
# negative options are options that exclude other options
negative_opt = {'quiet': 'verbose'}
# -- Creation/initialization methods -------------------------------
def __init__ (self, attrs=None):
"""Construct a new Distribution instance: initialize all the
attributes of a Distribution, and then use 'attrs' (a dictionary
mapping attribute names to values) to assign some of those
attributes their "real" values. (Any attributes not mentioned in
'attrs' will be assigned to some null value: 0, None, an empty list
or dictionary, etc.) Most importantly, initialize the
'command_obj' attribute to the empty dictionary; this will be
filled in with real command objects by 'parse_command_line()'.
"""
# Default values for our command-line options
self.verbose = 1
self.dry_run = 0
self.help = 0
for attr in self.display_option_names:
setattr(self, attr, 0)
# Store the distribution meta-data (name, version, author, and so
# forth) in a separate object -- we're getting to have enough
# information here (and enough command-line options) that it's
# worth it. Also delegate 'get_XXX()' methods to the 'metadata'
# object in a sneaky and underhanded (but efficient!) way.
self.metadata = DistributionMetadata()
for basename in self.metadata._METHOD_BASENAMES:
method_name = "get_" + basename
setattr(self, method_name, getattr(self.metadata, method_name))
# 'cmdclass' maps command names to class objects, so we
# can 1) quickly figure out which class to instantiate when
# we need to create a new command object, and 2) have a way
# for the setup script to override command classes
self.cmdclass = {}
# 'command_packages' is a list of packages in which commands
# are searched for. The factory for command 'foo' is expected
# to be named 'foo' in the module 'foo' in one of the packages
# named here. This list is searched from the left; an error
# is raised if no named package provides the command being
# searched for. (Always access using get_command_packages().)
self.command_packages = None
# 'script_name' and 'script_args' are usually set to sys.argv[0]
# and sys.argv[1:], but they can be overridden when the caller is
# not necessarily a setup script run from the command-line.
self.script_name = None
self.script_args = None
# 'command_options' is where we store command options between
# parsing them (from config files, the command-line, etc.) and when
# they are actually needed -- ie. when the command in question is
# instantiated. It is a dictionary of dictionaries of 2-tuples:
# command_options = { command_name : { option : (source, value) } }
self.command_options = {}
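        # An illustrative shape of this structure once parsing is done
        # (sources and values hypothetical):
        #   {'build': {'build_base': ('setup.cfg', 'out')},
        #    'install': {'prefix': ('command line', '/usr/local')}}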
# 'dist_files' is the list of (command, pyversion, file) that
# have been created by any dist commands run so far. This is
# filled regardless of whether the run is dry or not. pyversion
# gives sysconfig.get_python_version() if the dist file is
# specific to a Python version, 'any' if it is good for all
# Python versions on the target platform, and '' for a source
# file. pyversion should not be used to specify minimum or
# maximum required Python versions; use the metainfo for that
# instead.
self.dist_files = []
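        # An illustrative value after running two dist commands
        # (filenames hypothetical):
        #   [('sdist', '', 'dist/demo-1.0.tar.gz'),
        #    ('bdist_dumb', 'any', 'dist/demo-1.0.linux-x86_64.tar.gz')]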
# These options are really the business of various commands, rather
# than of the Distribution itself. We provide aliases for them in
# Distribution as a convenience to the developer.
self.packages = None
self.package_data = {}
self.package_dir = None
self.py_modules = None
self.libraries = None
self.headers = None
self.ext_modules = None
self.ext_package = None
self.include_dirs = None
self.extra_path = None
self.scripts = None
self.data_files = None
self.password = ''
# And now initialize bookkeeping stuff that can't be supplied by
# the caller at all. 'command_obj' maps command names to
# Command instances -- that's how we enforce that every command
# class is a singleton.
self.command_obj = {}
# 'have_run' maps command names to boolean values; it keeps track
# of whether we have actually run a particular command, to make it
# cheap to "run" a command whenever we think we might need to -- if
# it's already been done, no need for expensive filesystem
# operations, we just check the 'have_run' dictionary and carry on.
# It's only safe to query 'have_run' for a command class that has
# been instantiated -- a false value will be inserted when the
# command object is created, and replaced with a true value when
# the command is successfully run. Thus it's probably best to use
# '.get()' rather than a straight lookup.
self.have_run = {}
# Now we'll use the attrs dictionary (ultimately, keyword args from
# the setup script) to possibly override any or all of these
# distribution options.
if attrs:
# Pull out the set of command options and work on them
# specifically. Note that this order guarantees that aliased
# command options will override any supplied redundantly
# through the general options dictionary.
options = attrs.get('options')
if options is not None:
del attrs['options']
for (command, cmd_options) in options.items():
opt_dict = self.get_option_dict(command)
for (opt, val) in cmd_options.items():
opt_dict[opt] = ("setup script", val)
if 'licence' in attrs:
attrs['license'] = attrs['licence']
del attrs['licence']
msg = "'licence' distribution option is deprecated; use 'license'"
if warnings is not None:
warnings.warn(msg)
else:
sys.stderr.write(msg + "\n")
# Now work on the rest of the attributes. Any attribute that's
# not already defined is invalid!
for (key, val) in attrs.items():
if hasattr(self.metadata, "set_" + key):
getattr(self.metadata, "set_" + key)(val)
elif hasattr(self.metadata, key):
setattr(self.metadata, key, val)
elif hasattr(self, key):
setattr(self, key, val)
else:
msg = "Unknown distribution option: %s" % repr(key)
if warnings is not None:
warnings.warn(msg)
else:
sys.stderr.write(msg + "\n")
        # no-user-cfg is handled before other command-line args
        # because other args override the config files, and this
        # one is needed before we can load the config files.
        # If attrs['script_args'] wasn't passed, assume --no-user-cfg
        # was not given.
        #
        # This also makes sure we only look at the global options.
self.want_user_cfg = True
if self.script_args is not None:
for arg in self.script_args:
if not arg.startswith('-'):
break
if arg == '--no-user-cfg':
self.want_user_cfg = False
break
self.finalize_options()
def get_option_dict(self, command):
"""Get the option dictionary for a given command. If that
command's option dictionary hasn't been created yet, then create it
and return the new dictionary; otherwise, return the existing
option dictionary.
"""
dict = self.command_options.get(command)
if dict is None:
dict = self.command_options[command] = {}
return dict
def dump_option_dicts(self, header=None, commands=None, indent=""):
from pprint import pformat
if commands is None: # dump all command option dicts
commands = sorted(self.command_options.keys())
if header is not None:
self.announce(indent + header)
indent = indent + " "
if not commands:
self.announce(indent + "no commands known yet")
return
for cmd_name in commands:
opt_dict = self.command_options.get(cmd_name)
if opt_dict is None:
self.announce(indent +
"no option dict for '%s' command" % cmd_name)
else:
self.announce(indent +
"option dict for '%s' command:" % cmd_name)
out = pformat(opt_dict)
for line in out.split('\n'):
self.announce(indent + " " + line)
# -- Config file finding/parsing methods ---------------------------
def find_config_files(self):
"""Find as many configuration files as should be processed for this
platform, and return a list of filenames in the order in which they
should be parsed. The filenames returned are guaranteed to exist
(modulo nasty race conditions).
        There are three possible config files: distutils.cfg in the
        Distutils installation directory (ie. where the top-level
        Distutils __init__.py file lives); a file in the user's home
        directory named .pydistutils.cfg on Unix and pydistutils.cfg
        on Windows/Mac; and setup.cfg in the current directory.
The file in the user's home directory can be disabled with the
--no-user-cfg option.
"""
files = []
check_environ()
# Where to look for the system-wide Distutils config file
sys_dir = os.path.dirname(sys.modules['distutils'].__file__)
# Look for the system config file
sys_file = os.path.join(sys_dir, "distutils.cfg")
if os.path.isfile(sys_file):
files.append(sys_file)
# What to call the per-user config file
if os.name == 'posix':
user_filename = ".pydistutils.cfg"
else:
user_filename = "pydistutils.cfg"
# And look for the user config file
if self.want_user_cfg:
user_file = os.path.join(os.path.expanduser('~'), user_filename)
if os.path.isfile(user_file):
files.append(user_file)
# All platforms support local setup.cfg
local_file = "setup.cfg"
if os.path.isfile(local_file):
files.append(local_file)
if DEBUG:
self.announce("using config files: %s" % ', '.join(files))
return files
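    # An illustrative return value on a Unix system where all three files
    # exist (paths hypothetical):
    #   ['/usr/lib/python3.4/distutils/distutils.cfg',
    #    '/home/user/.pydistutils.cfg',
    #    'setup.cfg']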
def parse_config_files(self, filenames=None):
from configparser import ConfigParser
# Ignore install directory options if we have a venv
if sys.prefix != sys.base_prefix:
ignore_options = [
'install-base', 'install-platbase', 'install-lib',
'install-platlib', 'install-purelib', 'install-headers',
'install-scripts', 'install-data', 'prefix', 'exec-prefix',
'home', 'user', 'root']
else:
ignore_options = []
ignore_options = frozenset(ignore_options)
if filenames is None:
filenames = self.find_config_files()
if DEBUG:
self.announce("Distribution.parse_config_files():")
parser = ConfigParser()
for filename in filenames:
if DEBUG:
self.announce(" reading %s" % filename)
parser.read(filename)
for section in parser.sections():
options = parser.options(section)
opt_dict = self.get_option_dict(section)
for opt in options:
if opt != '__name__' and opt not in ignore_options:
val = parser.get(section,opt)
opt = opt.replace('-', '_')
opt_dict[opt] = (filename, val)
# Make the ConfigParser forget everything (so we retain
# the original filenames that options come from)
parser.__init__()
# If there was a "global" section in the config file, use it
# to set Distribution options.
if 'global' in self.command_options:
for (opt, (src, val)) in self.command_options['global'].items():
alias = self.negative_opt.get(opt)
try:
if alias:
setattr(self, alias, not strtobool(val))
elif opt in ('verbose', 'dry_run'): # ugh!
setattr(self, opt, strtobool(val))
else:
setattr(self, opt, val)
except ValueError as msg:
raise DistutilsOptionError(msg)
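    # An illustrative setup.cfg handled by parse_config_files(); apart from
    # [global], the section and option names here are hypothetical:
    #   [global]
    #   verbose = 0
    #   [build]
    #   build_base = out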
# -- Command-line parsing methods ----------------------------------
def parse_command_line(self):
"""Parse the setup script's command line, taken from the
'script_args' instance attribute (which defaults to 'sys.argv[1:]'
-- see 'setup()' in core.py). This list is first processed for
"global options" -- options that set attributes of the Distribution
instance. Then, it is alternately scanned for Distutils commands
and options for that command. Each new command terminates the
options for the previous command. The allowed options for a
command are determined by the 'user_options' attribute of the
command class -- thus, we have to be able to load command classes
in order to parse the command line. Any error in that 'options'
attribute raises DistutilsGetoptError; any error on the
command-line raises DistutilsArgError. If no Distutils commands
were found on the command line, raises DistutilsArgError. Return
true if command-line was successfully parsed and we should carry
on with executing commands; false if no errors but we shouldn't
execute commands (currently, this only happens if user asks for
help).
"""
#
# We now have enough information to show the Macintosh dialog
# that allows the user to interactively specify the "command line".
#
toplevel_options = self._get_toplevel_options()
# We have to parse the command line a bit at a time -- global
# options, then the first command, then its options, and so on --
# because each command will be handled by a different class, and
# the options that are valid for a particular class aren't known
# until we have loaded the command class, which doesn't happen
# until we know what the command is.
self.commands = []
parser = FancyGetopt(toplevel_options + self.display_options)
parser.set_negative_aliases(self.negative_opt)
parser.set_aliases({'licence': 'license'})
args = parser.getopt(args=self.script_args, object=self)
option_order = parser.get_option_order()
log.set_verbosity(self.verbose)
# for display options we return immediately
if self.handle_display_options(option_order):
return
while args:
args = self._parse_command_opts(parser, args)
if args is None: # user asked for help (and got it)
return
# Handle the cases of --help as a "global" option, ie.
# "setup.py --help" and "setup.py --help command ...". For the
# former, we show global options (--verbose, --dry-run, etc.)
# and display-only options (--name, --version, etc.); for the
# latter, we omit the display-only options and show help for
# each command listed on the command line.
if self.help:
self._show_help(parser,
display_options=len(self.commands) == 0,
commands=self.commands)
return
# Oops, no commands found -- an end-user error
if not self.commands:
raise DistutilsArgError("no commands supplied")
# All is well: return true
return True
def _get_toplevel_options(self):
"""Return the non-display options recognized at the top level.
This includes options that are recognized *only* at the top
level as well as options recognized for commands.
"""
return self.global_options + [
("command-packages=", None,
"list of packages that provide distutils commands"),
]
def _parse_command_opts(self, parser, args):
"""Parse the command-line options for a single command.
'parser' must be a FancyGetopt instance; 'args' must be the list
of arguments, starting with the current command (whose options
we are about to parse). Returns a new version of 'args' with
the next command at the front of the list; will be the empty
list if there are no more commands on the command line. Returns
None if the user asked for help on this command.
"""
# late import because of mutual dependence between these modules
from distutils.cmd import Command
# Pull the current command from the head of the command line
command = args[0]
if not command_re.match(command):
raise SystemExit("invalid command name '%s'" % command)
self.commands.append(command)
# Dig up the command class that implements this command, so we
# 1) know that it's a valid command, and 2) know which options
# it takes.
try:
cmd_class = self.get_command_class(command)
except DistutilsModuleError as msg:
raise DistutilsArgError(msg)
# Require that the command class be derived from Command -- want
# to be sure that the basic "command" interface is implemented.
if not issubclass(cmd_class, Command):
raise DistutilsClassError(
"command class %s must subclass Command" % cmd_class)
# Also make sure that the command object provides a list of its
# known options.
if not (hasattr(cmd_class, 'user_options') and
isinstance(cmd_class.user_options, list)):
raise DistutilsClassError(("command class %s must provide " +
"'user_options' attribute (a list of tuples)") % \
cmd_class)
# If the command class has a list of negative alias options,
# merge it in with the global negative aliases.
negative_opt = self.negative_opt
if hasattr(cmd_class, 'negative_opt'):
negative_opt = negative_opt.copy()
negative_opt.update(cmd_class.negative_opt)
# Check for help_options in command class. They have a different
# format (tuple of four) so we need to preprocess them here.
if (hasattr(cmd_class, 'help_options') and
isinstance(cmd_class.help_options, list)):
help_options = fix_help_options(cmd_class.help_options)
else:
help_options = []
# All commands support the global options too, just by adding
# in 'global_options'.
parser.set_option_table(self.global_options +
cmd_class.user_options +
help_options)
parser.set_negative_aliases(negative_opt)
(args, opts) = parser.getopt(args[1:])
if hasattr(opts, 'help') and opts.help:
self._show_help(parser, display_options=0, commands=[cmd_class])
return
if (hasattr(cmd_class, 'help_options') and
isinstance(cmd_class.help_options, list)):
            help_option_found = 0
for (help_option, short, desc, func) in cmd_class.help_options:
if hasattr(opts, parser.get_attr_name(help_option)):
                    help_option_found = 1
if callable(func):
func()
else:
raise DistutilsClassError(
"invalid help function %r for help option '%s': "
"must be a callable object (function, etc.)"
% (func, help_option))
if help_option_found:
return
# Put the options from the command-line into their official
# holding pen, the 'command_options' dictionary.
opt_dict = self.get_option_dict(command)
for (name, value) in vars(opts).items():
opt_dict[name] = ("command line", value)
return args
def finalize_options(self):
"""Set final values for all the options on the Distribution
instance, analogous to the .finalize_options() method of Command
objects.
"""
for attr in ('keywords', 'platforms'):
value = getattr(self.metadata, attr)
if value is None:
continue
if isinstance(value, str):
value = [elm.strip() for elm in value.split(',')]
setattr(self.metadata, attr, value)
def _show_help(self, parser, global_options=1, display_options=1,
commands=[]):
"""Show help for the setup script command-line in the form of
several lists of command-line options. 'parser' should be a
FancyGetopt instance; do not expect it to be returned in the
same state, as its option table will be reset to make it
generate the correct help text.
If 'global_options' is true, lists the global options:
--verbose, --dry-run, etc. If 'display_options' is true, lists
the "display-only" options: --name, --version, etc. Finally,
lists per-command help for every command name or command class
in 'commands'.
"""
# late import because of mutual dependence between these modules
from distutils.core import gen_usage
from distutils.cmd import Command
if global_options:
if display_options:
options = self._get_toplevel_options()
else:
options = self.global_options
parser.set_option_table(options)
parser.print_help(self.common_usage + "\nGlobal options:")
print('')
if display_options:
parser.set_option_table(self.display_options)
parser.print_help(
"Information display options (just display " +
"information, ignore any commands)")
print('')
for command in self.commands:
if isinstance(command, type) and issubclass(command, Command):
klass = command
else:
klass = self.get_command_class(command)
if (hasattr(klass, 'help_options') and
isinstance(klass.help_options, list)):
parser.set_option_table(klass.user_options +
fix_help_options(klass.help_options))
else:
parser.set_option_table(klass.user_options)
parser.print_help("Options for '%s' command:" % klass.__name__)
print('')
print(gen_usage(self.script_name))
def handle_display_options(self, option_order):
"""If there were any non-global "display-only" options
(--help-commands or the metadata display options) on the command
line, display the requested info and return true; else return
false.
"""
from distutils.core import gen_usage
# User just wants a list of commands -- we'll print it out and stop
# processing now (ie. if they ran "setup --help-commands foo bar",
# we ignore "foo bar").
if self.help_commands:
self.print_commands()
print('')
print(gen_usage(self.script_name))
return 1
# If user supplied any of the "display metadata" options, then
# display that metadata in the order in which the user supplied the
# metadata options.
any_display_options = 0
is_display_option = {}
for option in self.display_options:
is_display_option[option[0]] = 1
for (opt, val) in option_order:
if val and is_display_option.get(opt):
opt = translate_longopt(opt)
value = getattr(self.metadata, "get_"+opt)()
if opt in ['keywords', 'platforms']:
print(','.join(value))
elif opt in ('classifiers', 'provides', 'requires',
'obsoletes'):
print('\n'.join(value))
else:
print(value)
any_display_options = 1
return any_display_options
def print_command_list(self, commands, header, max_length):
"""Print a subset of the list of all commands -- used by
'print_commands()'.
"""
print(header + ":")
for cmd in commands:
klass = self.cmdclass.get(cmd)
if not klass:
klass = self.get_command_class(cmd)
try:
description = klass.description
except AttributeError:
description = "(no description available)"
print(" %-*s %s" % (max_length, cmd, description))
def print_commands(self):
"""Print out a help message listing all available commands with a
description of each. The list is divided into "standard commands"
(listed in distutils.command.__all__) and "extra commands"
(mentioned in self.cmdclass, but not a standard command). The
descriptions come from the command class attribute
'description'.
"""
import distutils.command
std_commands = distutils.command.__all__
is_std = {}
for cmd in std_commands:
is_std[cmd] = 1
extra_commands = []
for cmd in self.cmdclass.keys():
if not is_std.get(cmd):
extra_commands.append(cmd)
max_length = 0
for cmd in (std_commands + extra_commands):
if len(cmd) > max_length:
max_length = len(cmd)
self.print_command_list(std_commands,
"Standard commands",
max_length)
if extra_commands:
print()
self.print_command_list(extra_commands,
"Extra commands",
max_length)
def get_command_list(self):
"""Get a list of (command, description) tuples.
The list is divided into "standard commands" (listed in
distutils.command.__all__) and "extra commands" (mentioned in
self.cmdclass, but not a standard command). The descriptions come
from the command class attribute 'description'.
"""
# Currently this is only used on Mac OS, for the Mac-only GUI
# Distutils interface (by Jack Jansen)
import distutils.command
std_commands = distutils.command.__all__
is_std = {}
for cmd in std_commands:
is_std[cmd] = 1
extra_commands = []
for cmd in self.cmdclass.keys():
if not is_std.get(cmd):
extra_commands.append(cmd)
rv = []
for cmd in (std_commands + extra_commands):
klass = self.cmdclass.get(cmd)
if not klass:
klass = self.get_command_class(cmd)
try:
description = klass.description
except AttributeError:
description = "(no description available)"
rv.append((cmd, description))
return rv
# -- Command class/object methods ----------------------------------
def get_command_packages(self):
"""Return a list of packages from which commands are loaded."""
pkgs = self.command_packages
if not isinstance(pkgs, list):
if pkgs is None:
pkgs = ''
pkgs = [pkg.strip() for pkg in pkgs.split(',') if pkg != '']
if "distutils.command" not in pkgs:
pkgs.insert(0, "distutils.command")
self.command_packages = pkgs
return pkgs
def get_command_class(self, command):
"""Return the class that implements the Distutils command named by
'command'. First we check the 'cmdclass' dictionary; if the
command is mentioned there, we fetch the class object from the
dictionary and return it. Otherwise we load the command module
("distutils.command." + command) and fetch the command class from
the module. The loaded class is also stored in 'cmdclass'
to speed future calls to 'get_command_class()'.
Raises DistutilsModuleError if the expected module could not be
found, or if that module does not define the expected class.
"""
klass = self.cmdclass.get(command)
if klass:
return klass
for pkgname in self.get_command_packages():
module_name = "%s.%s" % (pkgname, command)
klass_name = command
try:
                __import__(module_name)
module = sys.modules[module_name]
except ImportError:
continue
try:
klass = getattr(module, klass_name)
except AttributeError:
raise DistutilsModuleError(
"invalid command '%s' (no class '%s' in module '%s')"
% (command, klass_name, module_name))
self.cmdclass[command] = klass
return klass
raise DistutilsModuleError("invalid command '%s'" % command)
def get_command_obj(self, command, create=1):
"""Return the command object for 'command'. Normally this object
is cached on a previous call to 'get_command_obj()'; if no command
object for 'command' is in the cache, then we either create and
return it (if 'create' is true) or return None.
"""
cmd_obj = self.command_obj.get(command)
if not cmd_obj and create:
if DEBUG:
self.announce("Distribution.get_command_obj(): " \
"creating '%s' command object" % command)
klass = self.get_command_class(command)
cmd_obj = self.command_obj[command] = klass(self)
self.have_run[command] = 0
# Set any options that were supplied in config files
# or on the command line. (NB. support for error
# reporting is lame here: any errors aren't reported
# until 'finalize_options()' is called, which means
# we won't report the source of the error.)
options = self.command_options.get(command)
if options:
self._set_command_options(cmd_obj, options)
return cmd_obj
def _set_command_options(self, command_obj, option_dict=None):
"""Set the options for 'command_obj' from 'option_dict'. Basically
this means copying elements of a dictionary ('option_dict') to
attributes of an instance ('command').
'command_obj' must be a Command instance. If 'option_dict' is not
supplied, uses the standard option dictionary for this command
(from 'self.command_options').
"""
command_name = command_obj.get_command_name()
if option_dict is None:
option_dict = self.get_option_dict(command_name)
if DEBUG:
self.announce(" setting options for '%s' command:" % command_name)
for (option, (source, value)) in option_dict.items():
if DEBUG:
self.announce(" %s = %s (from %s)" % (option, value,
source))
try:
bool_opts = [translate_longopt(o)
for o in command_obj.boolean_options]
except AttributeError:
bool_opts = []
try:
neg_opt = command_obj.negative_opt
except AttributeError:
neg_opt = {}
try:
is_string = isinstance(value, str)
if option in neg_opt and is_string:
setattr(command_obj, neg_opt[option], not strtobool(value))
elif option in bool_opts and is_string:
setattr(command_obj, option, strtobool(value))
elif hasattr(command_obj, option):
setattr(command_obj, option, value)
else:
raise DistutilsOptionError(
"error in %s: command '%s' has no such option '%s'"
% (source, command_name, option))
except ValueError as msg:
raise DistutilsOptionError(msg)
def reinitialize_command(self, command, reinit_subcommands=0):
"""Reinitializes a command to the state it was in when first
returned by 'get_command_obj()': ie., initialized but not yet
finalized. This provides the opportunity to sneak option
values in programmatically, overriding or supplementing
user-supplied values from the config files and command line.
You'll have to re-finalize the command object (by calling
'finalize_options()' or 'ensure_finalized()') before using it for
real.
'command' should be a command name (string) or command object. If
'reinit_subcommands' is true, also reinitializes the command's
sub-commands, as declared by the 'sub_commands' class attribute (if
it has one). See the "install" command for an example. Only
reinitializes the sub-commands that actually matter, ie. those
whose test predicates return true.
Returns the reinitialized command object.
"""
from distutils.cmd import Command
if not isinstance(command, Command):
command_name = command
command = self.get_command_obj(command_name)
else:
command_name = command.get_command_name()
if not command.finalized:
return command
command.initialize_options()
command.finalized = 0
self.have_run[command_name] = 0
self._set_command_options(command)
if reinit_subcommands:
for sub in command.get_sub_commands():
self.reinitialize_command(sub, reinit_subcommands)
return command
# -- Methods that operate on the Distribution ----------------------
def announce(self, msg, level=log.INFO):
log.log(level, msg)
def run_commands(self):
"""Run each command that was seen on the setup script command line.
Uses the list of commands found and cache of command objects
created by 'get_command_obj()'.
"""
for cmd in self.commands:
self.run_command(cmd)
# -- Methods that operate on its Commands --------------------------
def run_command(self, command):
"""Do whatever it takes to run a command (including nothing at all,
if the command has already been run). Specifically: if we have
already created and run the command named by 'command', return
silently without doing anything. If the command named by 'command'
doesn't even have a command object yet, create one. Then invoke
'run()' on that command object (or an existing one).
"""
# Already been here, done that? then return silently.
if self.have_run.get(command):
return
log.info("running %s", command)
cmd_obj = self.get_command_obj(command)
cmd_obj.ensure_finalized()
cmd_obj.run()
self.have_run[command] = 1
# -- Distribution query methods ------------------------------------
def has_pure_modules(self):
return len(self.packages or self.py_modules or []) > 0
def has_ext_modules(self):
return self.ext_modules and len(self.ext_modules) > 0
def has_c_libraries(self):
return self.libraries and len(self.libraries) > 0
def has_modules(self):
return self.has_pure_modules() or self.has_ext_modules()
def has_headers(self):
return self.headers and len(self.headers) > 0
def has_scripts(self):
return self.scripts and len(self.scripts) > 0
def has_data_files(self):
return self.data_files and len(self.data_files) > 0
def is_pure(self):
return (self.has_pure_modules() and
not self.has_ext_modules() and
not self.has_c_libraries())
# -- Metadata query methods ----------------------------------------
# If you're looking for 'get_name()', 'get_version()', and so forth,
# they are defined in a sneaky way: the constructor binds self.get_XXX
# to self.metadata.get_XXX. The actual code is in the
# DistributionMetadata class, below.
class DistributionMetadata:
"""Dummy class to hold the distribution meta-data: name, version,
author, and so forth.
"""
_METHOD_BASENAMES = ("name", "version", "author", "author_email",
"maintainer", "maintainer_email", "url",
"license", "description", "long_description",
"keywords", "platforms", "fullname", "contact",
"contact_email", "license", "classifiers",
"download_url",
# PEP 314
"provides", "requires", "obsoletes",
)
    def __init__(self, path=None):
        if path is not None:
            # close the file promptly instead of leaking the handle
            with open(path) as fp:
                self.read_pkg_file(fp)
else:
self.name = None
self.version = None
self.author = None
self.author_email = None
self.maintainer = None
self.maintainer_email = None
self.url = None
self.license = None
self.description = None
self.long_description = None
self.keywords = None
self.platforms = None
self.classifiers = None
self.download_url = None
# PEP 314
self.provides = None
self.requires = None
self.obsoletes = None
def read_pkg_file(self, file):
"""Reads the metadata values from a file object."""
msg = message_from_file(file)
def _read_field(name):
value = msg[name]
if value == 'UNKNOWN':
return None
return value
        def _read_list(name):
            values = msg.get_all(name, None)
            if not values:      # get_all() returns None for a missing header
                return None
            return values
metadata_version = msg['metadata-version']
self.name = _read_field('name')
self.version = _read_field('version')
self.description = _read_field('summary')
        # we are filling the author fields only; maintainer is left unset.
self.author = _read_field('author')
self.maintainer = None
self.author_email = _read_field('author-email')
self.maintainer_email = None
self.url = _read_field('home-page')
self.license = _read_field('license')
if 'download-url' in msg:
self.download_url = _read_field('download-url')
else:
self.download_url = None
        self.long_description = _read_field('description')
if 'keywords' in msg:
self.keywords = _read_field('keywords').split(',')
self.platforms = _read_list('platform')
self.classifiers = _read_list('classifier')
# PEP 314 - these fields only exist in 1.1
if metadata_version == '1.1':
self.requires = _read_list('requires')
self.provides = _read_list('provides')
self.obsoletes = _read_list('obsoletes')
else:
self.requires = None
self.provides = None
self.obsoletes = None
def write_pkg_info(self, base_dir):
"""Write the PKG-INFO file into the release tree.
"""
with open(os.path.join(base_dir, 'PKG-INFO'), 'w',
encoding='UTF-8') as pkg_info:
self.write_pkg_file(pkg_info)
def write_pkg_file(self, file):
"""Write the PKG-INFO format data to a file object.
"""
version = '1.0'
if (self.provides or self.requires or self.obsoletes or
self.classifiers or self.download_url):
version = '1.1'
file.write('Metadata-Version: %s\n' % version)
        file.write('Name: %s\n' % self.get_name())
        file.write('Version: %s\n' % self.get_version())
        file.write('Summary: %s\n' % self.get_description())
        file.write('Home-page: %s\n' % self.get_url())
        file.write('Author: %s\n' % self.get_contact())
        file.write('Author-email: %s\n' % self.get_contact_email())
        file.write('License: %s\n' % self.get_license())
if self.download_url:
file.write('Download-URL: %s\n' % self.download_url)
long_desc = rfc822_escape(self.get_long_description())
file.write('Description: %s\n' % long_desc)
keywords = ','.join(self.get_keywords())
if keywords:
            file.write('Keywords: %s\n' % keywords)
self._write_list(file, 'Platform', self.get_platforms())
self._write_list(file, 'Classifier', self.get_classifiers())
# PEP 314
self._write_list(file, 'Requires', self.get_requires())
self._write_list(file, 'Provides', self.get_provides())
self._write_list(file, 'Obsoletes', self.get_obsoletes())
def _write_list(self, file, name, values):
for value in values:
file.write('%s: %s\n' % (name, value))
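    # An illustrative PKG-INFO produced by write_pkg_file() for a minimal
    # distribution (all field values hypothetical):
    #   Metadata-Version: 1.0
    #   Name: demo
    #   Version: 1.0
    #   Summary: A demonstration package
    #   Home-page: UNKNOWN
    #   Author: Jane Doe
    #   Author-email: jane@example.com
    #   License: MIT
    #   Description: UNKNOWN
    #   Platform: UNKNOWN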
# -- Metadata query methods ----------------------------------------
def get_name(self):
return self.name or "UNKNOWN"
def get_version(self):
return self.version or "0.0.0"
def get_fullname(self):
return "%s-%s" % (self.get_name(), self.get_version())
def get_author(self):
return self.author or "UNKNOWN"
def get_author_email(self):
return self.author_email or "UNKNOWN"
def get_maintainer(self):
return self.maintainer or "UNKNOWN"
def get_maintainer_email(self):
return self.maintainer_email or "UNKNOWN"
def get_contact(self):
return self.maintainer or self.author or "UNKNOWN"
def get_contact_email(self):
return self.maintainer_email or self.author_email or "UNKNOWN"
def get_url(self):
return self.url or "UNKNOWN"
def get_license(self):
return self.license or "UNKNOWN"
get_licence = get_license
def get_description(self):
return self.description or "UNKNOWN"
def get_long_description(self):
return self.long_description or "UNKNOWN"
def get_keywords(self):
return self.keywords or []
def get_platforms(self):
return self.platforms or ["UNKNOWN"]
def get_classifiers(self):
return self.classifiers or []
def get_download_url(self):
return self.download_url or "UNKNOWN"
# PEP 314
def get_requires(self):
return self.requires or []
def set_requires(self, value):
import distutils.versionpredicate
for v in value:
distutils.versionpredicate.VersionPredicate(v)
self.requires = value
def get_provides(self):
return self.provides or []
def set_provides(self, value):
value = [v.strip() for v in value]
for v in value:
import distutils.versionpredicate
distutils.versionpredicate.split_provision(v)
self.provides = value
def get_obsoletes(self):
return self.obsoletes or []
def set_obsoletes(self, value):
import distutils.versionpredicate
for v in value:
distutils.versionpredicate.VersionPredicate(v)
self.obsoletes = value
def fix_help_options(options):
"""Convert a 4-tuple 'help_options' list as found in various command
classes to the 3-tuple form required by FancyGetopt.
"""
new_options = []
for help_tuple in options:
new_options.append(help_tuple[0:3])
return new_options
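# A minimal sketch (editor's illustration, not part of distutils) of the
# 'distclass' hook described in the Distribution docstring above; the
# subclass name and the extra 'coverage' option are hypothetical.
#
#     from distutils.core import setup
#     from distutils.dist import Distribution
#
#     class MyDistribution(Distribution):
#         def __init__(self, attrs=None):
#             # declare the extra attribute first, so that
#             # setup(..., coverage=1) is accepted as a known option
#             self.coverage = 0
#             super().__init__(attrs)
#
#     setup(name="demo", version="1.0",
#           distclass=MyDistribution, coverage=1,
#           options={"build": {"build_base": "out"}})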
"""distutils.errors
Provides exceptions used by the Distutils modules. Note that Distutils
modules may raise standard exceptions; in particular, SystemExit is
usually raised for errors that are obviously the end-user's fault
(eg. bad command-line arguments).
This module is safe to use in "from ... import *" mode; it only exports
symbols whose names start with "Distutils" and end with "Error"."""
class DistutilsError (Exception):
"""The root of all Distutils evil."""
pass
class DistutilsModuleError (DistutilsError):
"""Unable to load an expected module, or to find an expected class
within some module (in particular, command modules and classes)."""
pass
class DistutilsClassError (DistutilsError):
"""Some command class (or possibly distribution class, if anyone
feels a need to subclass Distribution) is found not to be holding
up its end of the bargain, ie. implementing some part of the
"command "interface."""
pass
class DistutilsGetoptError (DistutilsError):
"""The option table provided to 'fancy_getopt()' is bogus."""
pass
class DistutilsArgError (DistutilsError):
"""Raised by fancy_getopt in response to getopt.error -- ie. an
error in the command line usage."""
pass
class DistutilsFileError (DistutilsError):
"""Any problems in the filesystem: expected file not found, etc.
Typically this is for problems that we detect before OSError
could be raised."""
pass
class DistutilsOptionError (DistutilsError):
"""Syntactic/semantic errors in command options, such as use of
mutually conflicting options, or inconsistent options,
badly-spelled values, etc. No distinction is made between option
values originating in the setup script, the command line, config
files, or what-have-you -- but if we *know* something originated in
the setup script, we'll raise DistutilsSetupError instead."""
pass
class DistutilsSetupError (DistutilsError):
"""For errors that can be definitely blamed on the setup script,
such as invalid keyword arguments to 'setup()'."""
pass
class DistutilsPlatformError (DistutilsError):
"""We don't know how to do something on the current platform (but
we do know how to do it on some platform) -- eg. trying to compile
C files on a platform not supported by a CCompiler subclass."""
pass
class DistutilsExecError (DistutilsError):
"""Any problems executing an external program (such as the C
compiler, when compiling C files)."""
pass
class DistutilsInternalError (DistutilsError):
"""Internal inconsistencies or impossibilities (obviously, this
should never be seen if the code is working!)."""
pass
class DistutilsTemplateError (DistutilsError):
"""Syntax error in a file list template."""
class DistutilsByteCompileError(DistutilsError):
"""Byte compile error."""
# Exception classes used by the CCompiler implementation classes
class CCompilerError (Exception):
"""Some compile/link operation failed."""
class PreprocessError (CCompilerError):
"""Failure to preprocess one or more C/C++ files."""
class CompileError (CCompilerError):
"""Failure to compile one or more C/C++ source files."""
class LibError (CCompilerError):
"""Failure to create a static library from one or more C/C++ object
files."""
class LinkError (CCompilerError):
"""Failure to link one or more C/C++ object files into an executable
or shared library file."""
class UnknownFileError (CCompilerError):
"""Attempt to process an unknown file type."""
"""distutils.extension
Provides the Extension class, used to describe C/C++ extension
modules in setup scripts."""
import os
import sys
import warnings
# This class is really only used by the "build_ext" command, so it might
# make sense to put it in distutils.command.build_ext. However, that
# module is already big enough, and I want to make this class a bit more
# complex to simplify some common cases ("foo" module in "foo.c") and do
# better error-checking ("foo.c" actually exists).
#
# Also, putting this in build_ext.py means every setup script would have to
# import that large-ish module (indirectly, through distutils.core) in
# order to do anything.
class Extension:
"""Just a collection of attributes that describes an extension
module and everything needed to build it (hopefully in a portable
way, but there are hooks that let you be as unportable as you need).
Instance attributes:
name : string
the full name of the extension, including any packages -- ie.
*not* a filename or pathname, but Python dotted name
sources : [string]
list of source filenames, relative to the distribution root
(where the setup script lives), in Unix form (slash-separated)
for portability. Source files may be C, C++, SWIG (.i),
platform-specific resource files, or whatever else is recognized
by the "build_ext" command as source for a Python extension.
include_dirs : [string]
list of directories to search for C/C++ header files (in Unix
form for portability)
define_macros : [(name : string, value : string|None)]
list of macros to define; each macro is defined using a 2-tuple,
where 'value' is either the string to define it to or None to
define it without a particular value (equivalent of "#define
FOO" in source or -DFOO on Unix C compiler command line)
undef_macros : [string]
list of macros to undefine explicitly
library_dirs : [string]
list of directories to search for C/C++ libraries at link time
libraries : [string]
list of library names (not filenames or paths) to link against
runtime_library_dirs : [string]
list of directories to search for C/C++ libraries at run time
(for shared extensions, this is when the extension is loaded)
extra_objects : [string]
list of extra files to link with (eg. object files not implied
by 'sources', static library that must be explicitly specified,
binary resource files, etc.)
extra_compile_args : [string]
any extra platform- and compiler-specific information to use
when compiling the source files in 'sources'. For platforms and
compilers where "command line" makes sense, this is typically a
list of command-line arguments, but for other platforms it could
be anything.
extra_link_args : [string]
any extra platform- and compiler-specific information to use
when linking object files together to create the extension (or
to create a new static Python interpreter). Similar
interpretation as for 'extra_compile_args'.
export_symbols : [string]
list of symbols to be exported from a shared extension. Not
used on all platforms, and not generally necessary for Python
extensions, which typically export exactly one symbol: "init" +
extension_name.
swig_opts : [string]
any extra options to pass to SWIG if a source file has the .i
extension.
depends : [string]
list of files that the extension depends on
language : string
extension language (i.e. "c", "c++", "objc"). Will be detected
from the source extensions if not provided.
optional : boolean
specifies that a build failure in the extension should not abort the
build process, but simply not install the failing extension.
"""
# When adding arguments to this constructor, be sure to update
# setup_keywords in core.py.
def __init__(self, name, sources,
include_dirs=None,
define_macros=None,
undef_macros=None,
library_dirs=None,
libraries=None,
runtime_library_dirs=None,
extra_objects=None,
extra_compile_args=None,
extra_link_args=None,
export_symbols=None,
                 swig_opts=None,
depends=None,
language=None,
optional=None,
**kw # To catch unknown keywords
):
if not isinstance(name, str):
raise AssertionError("'name' must be a string")
if not (isinstance(sources, list) and
all(isinstance(v, str) for v in sources)):
raise AssertionError("'sources' must be a list of strings")
self.name = name
self.sources = sources
self.include_dirs = include_dirs or []
self.define_macros = define_macros or []
self.undef_macros = undef_macros or []
self.library_dirs = library_dirs or []
self.libraries = libraries or []
self.runtime_library_dirs = runtime_library_dirs or []
self.extra_objects = extra_objects or []
self.extra_compile_args = extra_compile_args or []
self.extra_link_args = extra_link_args or []
self.export_symbols = export_symbols or []
self.swig_opts = swig_opts or []
self.depends = depends or []
self.language = language
self.optional = optional
# If there are unknown keyword options, warn about them
if len(kw) > 0:
options = [repr(option) for option in kw]
options = ', '.join(sorted(options))
msg = "Unknown Extension options: %s" % options
warnings.warn(msg)
def read_setup_file(filename):
"""Reads a Setup file and returns Extension instances."""
from distutils.sysconfig import (parse_makefile, expand_makefile_vars,
_variable_rx)
from distutils.text_file import TextFile
from distutils.util import split_quoted
# First pass over the file to gather "VAR = VALUE" assignments.
vars = parse_makefile(filename)
# Second pass to gobble up the real content: lines of the form
# <module> ... [<sourcefile> ...] [<cpparg> ...] [<library> ...]
file = TextFile(filename,
strip_comments=1, skip_blanks=1, join_lines=1,
lstrip_ws=1, rstrip_ws=1)
try:
extensions = []
while True:
line = file.readline()
if line is None: # eof
break
if _variable_rx.match(line): # VAR=VALUE, handled in first pass
continue
if line[0] == line[-1] == "*":
file.warn("'%s' lines not handled yet" % line)
continue
line = expand_makefile_vars(line, vars)
words = split_quoted(line)
# NB. this parses a slightly different syntax than the old
# makesetup script: here, there must be exactly one extension per
# line, and it must be the first word of the line. I have no idea
# why the old syntax supported multiple extensions per line, as
# they all wind up being the same.
module = words[0]
ext = Extension(module, [])
append_next_word = None
for word in words[1:]:
if append_next_word is not None:
append_next_word.append(word)
append_next_word = None
continue
suffix = os.path.splitext(word)[1]
                switch = word[0:2]
                value = word[2:]
if suffix in (".c", ".cc", ".cpp", ".cxx", ".c++", ".m", ".mm"):
# hmm, should we do something about C vs. C++ sources?
# or leave it up to the CCompiler implementation to
# worry about?
ext.sources.append(word)
elif switch == "-I":
ext.include_dirs.append(value)
elif switch == "-D":
equals = value.find("=")
if equals == -1: # bare "-DFOO" -- no value
ext.define_macros.append((value, None))
                    else:                   # "-DFOO=blah"
                        # slice at equals+1; the historical equals+2 dropped
                        # the first character of the macro value
                        ext.define_macros.append((value[0:equals],
                                                  value[equals+1:]))
elif switch == "-U":
ext.undef_macros.append(value)
elif switch == "-C": # only here 'cause makesetup has it!
ext.extra_compile_args.append(word)
elif switch == "-l":
ext.libraries.append(value)
elif switch == "-L":
ext.library_dirs.append(value)
elif switch == "-R":
ext.runtime_library_dirs.append(value)
elif word == "-rpath":
append_next_word = ext.runtime_library_dirs
elif word == "-Xlinker":
append_next_word = ext.extra_link_args
elif word == "-Xcompiler":
append_next_word = ext.extra_compile_args
elif switch == "-u":
ext.extra_link_args.append(word)
if not value:
append_next_word = ext.extra_link_args
elif suffix in (".a", ".so", ".sl", ".o", ".dylib"):
# NB. a really faithful emulation of makesetup would
# append a .o file to extra_objects only if it
# had a slash in it; otherwise, it would s/.o/.c/
# and append it to sources. Hmmmm.
ext.extra_objects.append(word)
else:
file.warn("unrecognized argument '%s'" % word)
extensions.append(ext)
finally:
file.close()
return extensions
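# A minimal sketch (editor's illustration) of constructing an Extension as
# described in the class docstring above; the module, paths, and macro names
# are hypothetical. A roughly equivalent Setup-file line for
# read_setup_file() would be: "pkg.foo src/foo.c -Iinclude -DFAST=1 -lm".
#
#     ext = Extension(
#         "pkg.foo",                  # dotted module name, not a filename
#         sources=["src/foo.c"],
#         include_dirs=["include"],
#         define_macros=[("FAST", "1"), ("NDEBUG", None)],
#         libraries=["m"],
#     )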
"""distutils.fancy_getopt
Wrapper around the standard getopt module that provides the following
additional features:
* short and long options are tied together
* options have help strings, so fancy_getopt could potentially
create a complete usage summary
* options set attributes of a passed-in object
"""
import sys, string, re
import getopt
from distutils.errors import *
# Much like command_re in distutils.core, this is close to but not quite
# the same as a Python NAME -- except, in the spirit of most GNU
# utilities, we use '-' in place of '_'. (The spirit of LISP lives on!)
# The similarities to NAME are again not a coincidence...
longopt_pat = r'[a-zA-Z](?:[a-zA-Z0-9-]*)'
longopt_re = re.compile(r'^%s$' % longopt_pat)
# For recognizing "negative alias" options, eg. "quiet=!verbose"
neg_alias_re = re.compile("^(%s)=!(%s)$" % (longopt_pat, longopt_pat))
# This is used to translate long options to legitimate Python identifiers
# (for use as attributes of some object).
longopt_xlate = str.maketrans('-', '_')
class FancyGetopt:
"""Wrapper around the standard 'getopt()' module that provides some
handy extra functionality:
* short and long options are tied together
* options have help strings, and help text can be assembled
from them
* options set attributes of a passed-in object
* boolean options can have "negative aliases" -- eg. if
--quiet is the "negative alias" of --verbose, then "--quiet"
on the command line sets 'verbose' to false
"""
def __init__(self, option_table=None):
# The option table is (currently) a list of tuples. The
        # tuples may have three or four values:
# (long_option, short_option, help_string [, repeatable])
# if an option takes an argument, its long_option should have '='
# appended; short_option should just be a single character, no ':'
# in any case. If a long_option doesn't have a corresponding
# short_option, short_option should be None. All option tuples
# must have long options.
self.option_table = option_table
# 'option_index' maps long option names to entries in the option
# table (ie. those 3-tuples).
self.option_index = {}
if self.option_table:
self._build_index()
# 'alias' records (duh) alias options; {'foo': 'bar'} means
# --foo is an alias for --bar
self.alias = {}
# 'negative_alias' keeps track of options that are the boolean
# opposite of some other option
self.negative_alias = {}
# These keep track of the information in the option table. We
# don't actually populate these structures until we're ready to
# parse the command-line, since the 'option_table' passed in here
# isn't necessarily the final word.
self.short_opts = []
self.long_opts = []
self.short2long = {}
self.attr_name = {}
self.takes_arg = {}
# And 'option_order' is filled up in 'getopt()'; it records the
# original order of options (and their values) on the command-line,
# but expands short options, converts aliases, etc.
self.option_order = []
def _build_index(self):
self.option_index.clear()
for option in self.option_table:
self.option_index[option[0]] = option
def set_option_table(self, option_table):
self.option_table = option_table
self._build_index()
def add_option(self, long_option, short_option=None, help_string=None):
if long_option in self.option_index:
raise DistutilsGetoptError(
"option conflict: already an option '%s'" % long_option)
else:
option = (long_option, short_option, help_string)
self.option_table.append(option)
self.option_index[long_option] = option
def has_option(self, long_option):
"""Return true if the option table for this parser has an
option with long name 'long_option'."""
return long_option in self.option_index
def get_attr_name(self, long_option):
"""Translate long option name 'long_option' to the form it
has as an attribute of some object: ie., translate hyphens
to underscores."""
return long_option.translate(longopt_xlate)
def _check_alias_dict(self, aliases, what):
assert isinstance(aliases, dict)
for (alias, opt) in aliases.items():
if alias not in self.option_index:
raise DistutilsGetoptError(("invalid %s '%s': "
"option '%s' not defined") % (what, alias, alias))
if opt not in self.option_index:
raise DistutilsGetoptError(("invalid %s '%s': "
"aliased option '%s' not defined") % (what, alias, opt))
def set_aliases(self, alias):
"""Set the aliases for this option parser."""
self._check_alias_dict(alias, "alias")
self.alias = alias
def set_negative_aliases(self, negative_alias):
"""Set the negative aliases for this option parser.
'negative_alias' should be a dictionary mapping option names to
option names, both the key and value must already be defined
in the option table."""
self._check_alias_dict(negative_alias, "negative alias")
self.negative_alias = negative_alias
def _grok_option_table(self):
"""Populate the various data structures that keep tabs on the
option table. Called by 'getopt()' before it can do anything
worthwhile.
"""
self.long_opts = []
self.short_opts = []
self.short2long.clear()
self.repeat = {}
for option in self.option_table:
if len(option) == 3:
long, short, help = option
repeat = 0
elif len(option) == 4:
long, short, help, repeat = option
else:
# the option table is part of the code, so simply
# assert that it is correct
raise ValueError("invalid option tuple: %r" % (option,))
# Type- and value-check the option names
if not isinstance(long, str) or len(long) < 2:
raise DistutilsGetoptError(("invalid long option '%s': "
"must be a string of length >= 2") % long)
if (not ((short is None) or
(isinstance(short, str) and len(short) == 1))):
raise DistutilsGetoptError("invalid short option '%s': "
"must a single character or None" % short)
self.repeat[long] = repeat
self.long_opts.append(long)
if long[-1] == '=': # option takes an argument?
if short: short = short + ':'
long = long[0:-1]
self.takes_arg[long] = 1
else:
                # Is this option a "negative alias" for some other option (eg.
# "quiet" == "!verbose")?
alias_to = self.negative_alias.get(long)
if alias_to is not None:
if self.takes_arg[alias_to]:
raise DistutilsGetoptError(
"invalid negative alias '%s': "
"aliased option '%s' takes a value"
% (long, alias_to))
self.long_opts[-1] = long # XXX redundant?!
self.takes_arg[long] = 0
# If this is an alias option, make sure its "takes arg" flag is
# the same as the option it's aliased to.
alias_to = self.alias.get(long)
if alias_to is not None:
if self.takes_arg[long] != self.takes_arg[alias_to]:
raise DistutilsGetoptError(
"invalid alias '%s': inconsistent with "
"aliased option '%s' (one of them takes a value, "
"the other doesn't"
% (long, alias_to))
# Now enforce some bondage on the long option name, so we can
# later translate it to an attribute name on some object. Have
# to do this a bit late to make sure we've removed any trailing
# '='.
if not longopt_re.match(long):
raise DistutilsGetoptError(
"invalid long option name '%s' "
"(must be letters, numbers, hyphens only" % long)
self.attr_name[long] = self.get_attr_name(long)
if short:
self.short_opts.append(short)
self.short2long[short[0]] = long
def getopt(self, args=None, object=None):
"""Parse command-line options in args. Store as attributes on object.
If 'args' is None or not supplied, uses 'sys.argv[1:]'. If
'object' is None or not supplied, creates a new OptionDummy
object, stores option values there, and returns a tuple (args,
object). If 'object' is supplied, it is modified in place and
'getopt()' just returns 'args'; in both cases, the returned
'args' is a modified copy of the passed-in 'args' list, which
is left untouched.
"""
if args is None:
args = sys.argv[1:]
if object is None:
object = OptionDummy()
created_object = True
else:
created_object = False
self._grok_option_table()
short_opts = ' '.join(self.short_opts)
try:
opts, args = getopt.getopt(args, short_opts, self.long_opts)
except getopt.error as msg:
raise DistutilsArgError(msg)
for opt, val in opts:
if len(opt) == 2 and opt[0] == '-': # it's a short option
opt = self.short2long[opt[1]]
else:
assert len(opt) > 2 and opt[:2] == '--'
opt = opt[2:]
alias = self.alias.get(opt)
if alias:
opt = alias
if not self.takes_arg[opt]: # boolean option?
assert val == '', "boolean option can't have value"
alias = self.negative_alias.get(opt)
if alias:
opt = alias
val = 0
else:
val = 1
attr = self.attr_name[opt]
# The only repeating option at the moment is 'verbose'.
# It has a negative option -q quiet, which should set verbose = 0.
if val and self.repeat.get(attr) is not None:
val = getattr(object, attr, 0) + 1
setattr(object, attr, val)
self.option_order.append((opt, val))
# for opts
if created_object:
return args, object
else:
return args
def get_option_order(self):
"""Returns the list of (option, value) tuples processed by the
previous run of 'getopt()'. Raises RuntimeError if
'getopt()' hasn't been called yet.
"""
if self.option_order is None:
raise RuntimeError("'getopt()' hasn't been called yet")
else:
return self.option_order
def generate_help(self, header=None):
"""Generate help text (a list of strings, one per suggested line of
output) from the option table for this FancyGetopt object.
"""
# Blithely assume the option table is good: probably wouldn't call
# 'generate_help()' unless you've already called 'getopt()'.
# First pass: determine maximum length of long option names
max_opt = 0
for option in self.option_table:
long = option[0]
short = option[1]
l = len(long)
if long[-1] == '=':
l = l - 1
if short is not None:
l = l + 5 # " (-x)" where short == 'x'
if l > max_opt:
max_opt = l
opt_width = max_opt + 2 + 2 + 2 # room for indent + dashes + gutter
# Typical help block looks like this:
# --foo controls foonabulation
# Help block for longest option looks like this:
# --flimflam set the flim-flam level
# and with wrapped text:
# --flimflam set the flim-flam level (must be between
# 0 and 100, except on Tuesdays)
# Options with short names will have the short name shown (but
# it doesn't contribute to max_opt):
# --foo (-f) controls foonabulation
# If adding the short option would make the left column too wide,
# we push the explanation off to the next line
# --flimflam (-l)
# set the flim-flam level
# Important parameters:
# - 2 spaces before option block start lines
# - 2 dashes for each long option name
# - min. 2 spaces between option and explanation (gutter)
# - 5 characters (incl. space) for short option name
# Now generate lines of help text. (If 80 columns were good enough
# for Jesus, then 78 columns are good enough for me!)
line_width = 78
text_width = line_width - opt_width
big_indent = ' ' * opt_width
if header:
lines = [header]
else:
lines = ['Option summary:']
for option in self.option_table:
long, short, help = option[:3]
text = wrap_text(help, text_width)
if long[-1] == '=':
long = long[0:-1]
# Case 1: no short option at all (makes life easy)
if short is None:
if text:
lines.append(" --%-*s %s" % (max_opt, long, text[0]))
else:
lines.append(" --%-*s " % (max_opt, long))
# Case 2: we have a short option, so we have to include it
# just after the long option
else:
opt_names = "%s (-%s)" % (long, short)
if text:
lines.append(" --%-*s %s" %
(max_opt, opt_names, text[0]))
else:
lines.append(" --%-*s" % opt_names)
for l in text[1:]:
lines.append(big_indent + l)
return lines
def print_help(self, header=None, file=None):
if file is None:
file = sys.stdout
for line in self.generate_help(header):
file.write(line + "\n")
def fancy_getopt(options, negative_opt, object, args):
parser = FancyGetopt(options)
parser.set_negative_aliases(negative_opt)
return parser.getopt(args, object)
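# Illustrative sketch of the convenience wrapper (hypothetical option table,
# not part of the module); 'quiet' is registered as the negative of 'verbose':
#   >>> options = [('verbose', 'v', "run verbosely"), ('quiet', 'q', "run quietly")]
#   >>> args, obj = fancy_getopt(options, {'quiet': 'verbose'}, None, ['-v', 'spam.py'])
#   >>> obj.verbose, args
#   (1, ['spam.py'])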
WS_TRANS = {ord(_wschar) : ' ' for _wschar in string.whitespace}
def wrap_text(text, width):
"""wrap_text(text : string, width : int) -> [string]
Split 'text' into multiple lines of no more than 'width' characters
each, and return the list of strings that results.
"""
if text is None:
return []
if len(text) <= width:
return [text]
text = text.expandtabs()
text = text.translate(WS_TRANS)
chunks = re.split(r'( +|-+)', text)
chunks = [ch for ch in chunks if ch] # ' - ' results in empty strings
lines = []
while chunks:
cur_line = [] # list of chunks (to-be-joined)
cur_len = 0 # length of current line
while chunks:
l = len(chunks[0])
if cur_len + l <= width: # can squeeze (at least) this chunk in
cur_line.append(chunks[0])
del chunks[0]
cur_len = cur_len + l
else: # this line is full
# drop last chunk if all space
if cur_line and cur_line[-1][0] == ' ':
del cur_line[-1]
break
if chunks: # any chunks left to process?
# if the current line is still empty, then we had a single
# chunk that's too big to fit on a line -- so we break
# down and break it up at the line width
if cur_len == 0:
cur_line.append(chunks[0][0:width])
chunks[0] = chunks[0][width:]
# all-whitespace chunks at the end of a line can be discarded
# (and we know from the re.split above that if a chunk has
# *any* whitespace, it is *all* whitespace)
if chunks[0][0] == ' ':
del chunks[0]
# and store this line in the list-of-all-lines -- as a single
# string, of course!
lines.append(''.join(cur_line))
return lines
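# Behaviour sketch: text is split on runs of spaces and hyphens, and a
# trailing all-whitespace chunk is dropped at each line break, e.g.
#   >>> wrap_text("The quick brown fox", 10)
#   ['The quick', 'brown fox']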
def translate_longopt(opt):
"""Convert a long option name to a valid Python identifier by
changing "-" to "_".
"""
return opt.translate(longopt_xlate)
class OptionDummy:
"""Dummy class just used as a place to hold command-line option
values as instance attributes."""
def __init__(self, options=[]):
"""Create a new OptionDummy instance. The attributes listed in
'options' will be initialized to None."""
for opt in options:
setattr(self, opt, None)
if __name__ == "__main__":
text = """\
Tra-la-la, supercalifragilisticexpialidocious.
How *do* you spell that odd word, anyways?
(Someone ask Mary -- she'll know [or she'll
say, "How should I know?"].)"""
for w in (10, 20, 30, 40):
print("width: %d" % w)
print("\n".join(wrap_text(text, w)))
print()
"""distutils.filelist
Provides the FileList class, used for poking about the filesystem
and building lists of files.
"""
import os, re
import fnmatch
from distutils.util import convert_path
from distutils.errors import DistutilsTemplateError, DistutilsInternalError
from distutils import log
class FileList:
"""A list of files built by on exploring the filesystem and filtered by
applying various patterns to what we find there.
Instance attributes:
dir
directory from which files will be taken -- only used if
'allfiles' not supplied to constructor
files
list of filenames currently being built/filtered/manipulated
allfiles
complete list of files under consideration (ie. without any
filtering applied)
"""
def __init__(self, warn=None, debug_print=None):
# ignore arguments to FileList, but keep them for backwards
# compatibility
self.allfiles = None
self.files = []
def set_allfiles(self, allfiles):
self.allfiles = allfiles
def findall(self, dir=os.curdir):
self.allfiles = findall(dir)
def debug_print(self, msg):
"""Print 'msg' to stdout if the global DEBUG (taken from the
DISTUTILS_DEBUG environment variable) flag is true.
"""
from distutils.debug import DEBUG
if DEBUG:
print(msg)
# -- List-like methods ---------------------------------------------
def append(self, item):
self.files.append(item)
def extend(self, items):
self.files.extend(items)
def sort(self):
# Not a strict lexical sort!
sortable_files = sorted(map(os.path.split, self.files))
self.files = []
for sort_tuple in sortable_files:
self.files.append(os.path.join(*sort_tuple))
# -- Other miscellaneous utility methods ---------------------------
def remove_duplicates(self):
# Assumes list has been sorted!
for i in range(len(self.files) - 1, 0, -1):
if self.files[i] == self.files[i - 1]:
del self.files[i]
# -- "File template" methods ---------------------------------------
def _parse_template_line(self, line):
words = line.split()
action = words[0]
patterns = dir = dir_pattern = None
if action in ('include', 'exclude',
'global-include', 'global-exclude'):
if len(words) < 2:
raise DistutilsTemplateError(
"'%s' expects <pattern1> <pattern2> ..." % action)
patterns = [convert_path(w) for w in words[1:]]
elif action in ('recursive-include', 'recursive-exclude'):
if len(words) < 3:
raise DistutilsTemplateError(
"'%s' expects <dir> <pattern1> <pattern2> ..." % action)
dir = convert_path(words[1])
patterns = [convert_path(w) for w in words[2:]]
elif action in ('graft', 'prune'):
if len(words) != 2:
raise DistutilsTemplateError(
"'%s' expects a single <dir_pattern>" % action)
dir_pattern = convert_path(words[1])
else:
raise DistutilsTemplateError("unknown action '%s'" % action)
return (action, patterns, dir, dir_pattern)
def process_template_line(self, line):
# Parse the line: split it up, make sure the right number of words
# is there, and return the relevant words. 'action' is always
# defined: it's the first word of the line. Which of the other
# three are defined depends on the action; it'll be either
# patterns, (dir and patterns), or (dir_pattern).
(action, patterns, dir, dir_pattern) = self._parse_template_line(line)
# OK, now we know that the action is valid and we have the
# right number of words on the line for that action -- so we
# can proceed with minimal error-checking.
if action == 'include':
self.debug_print("include " + ' '.join(patterns))
for pattern in patterns:
if not self.include_pattern(pattern, anchor=1):
log.warn("warning: no files found matching '%s'",
pattern)
elif action == 'exclude':
self.debug_print("exclude " + ' '.join(patterns))
for pattern in patterns:
if not self.exclude_pattern(pattern, anchor=1):
log.warn(("warning: no previously-included files "
"found matching '%s'"), pattern)
elif action == 'global-include':
self.debug_print("global-include " + ' '.join(patterns))
for pattern in patterns:
if not self.include_pattern(pattern, anchor=0):
log.warn(("warning: no files found matching '%s' "
"anywhere in distribution"), pattern)
elif action == 'global-exclude':
self.debug_print("global-exclude " + ' '.join(patterns))
for pattern in patterns:
if not self.exclude_pattern(pattern, anchor=0):
log.warn(("warning: no previously-included files matching "
"'%s' found anywhere in distribution"),
pattern)
elif action == 'recursive-include':
self.debug_print("recursive-include %s %s" %
(dir, ' '.join(patterns)))
for pattern in patterns:
if not self.include_pattern(pattern, prefix=dir):
log.warn(("warning: no files found matching '%s' "
"under directory '%s'"),
pattern, dir)
elif action == 'recursive-exclude':
self.debug_print("recursive-exclude %s %s" %
(dir, ' '.join(patterns)))
for pattern in patterns:
if not self.exclude_pattern(pattern, prefix=dir):
log.warn(("warning: no previously-included files matching "
"'%s' found under directory '%s'"),
pattern, dir)
elif action == 'graft':
self.debug_print("graft " + dir_pattern)
if not self.include_pattern(None, prefix=dir_pattern):
log.warn("warning: no directories found matching '%s'",
dir_pattern)
elif action == 'prune':
self.debug_print("prune " + dir_pattern)
if not self.exclude_pattern(None, prefix=dir_pattern):
log.warn(("no previously-included directories found "
"matching '%s'"), dir_pattern)
else:
raise DistutilsInternalError(
"this cannot happen: invalid action '%s'" % action)
# -- Filtering/selection methods -----------------------------------
def include_pattern(self, pattern, anchor=1, prefix=None, is_regex=0):
"""Select strings (presumably filenames) from 'self.files' that
match 'pattern', a Unix-style wildcard (glob) pattern. Patterns
are not quite the same as implemented by the 'fnmatch' module: '*'
and '?' match non-special characters, where "special" is platform-
dependent: slash on Unix; colon, slash, and backslash on
DOS/Windows; and colon on Mac OS.
If 'anchor' is true (the default), then the pattern match is more
stringent: "*.py" will match "foo.py" but not "foo/bar.py". If
'anchor' is false, both of these will match.
If 'prefix' is supplied, then only filenames starting with 'prefix'
(itself a pattern) and ending with 'pattern', with anything in between
them, will match. 'anchor' is ignored in this case.
If 'is_regex' is true, 'anchor' and 'prefix' are ignored, and
'pattern' is assumed to be either a string containing a regex or a
regex object -- no translation is done, the regex is just compiled
and used as-is.
Selected strings will be added to self.files.
Return True if files are found, False otherwise.
"""
# XXX docstring lying about what the special chars are?
files_found = False
pattern_re = translate_pattern(pattern, anchor, prefix, is_regex)
self.debug_print("include_pattern: applying regex r'%s'" %
pattern_re.pattern)
# delayed loading of allfiles list
if self.allfiles is None:
self.findall()
for name in self.allfiles:
if pattern_re.search(name):
self.debug_print(" adding " + name)
self.files.append(name)
files_found = True
return files_found
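# Anchoring in practice (illustrative): with anchor=1, '*.py' selects
# 'setup.py' but not 'src/mod.py'; with anchor=0 it selects both; with
# prefix='src' only files somewhere under 'src' are selected.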
def exclude_pattern (self, pattern,
anchor=1, prefix=None, is_regex=0):
"""Remove strings (presumably filenames) from 'files' that match
'pattern'. Other parameters are the same as for
'include_pattern()', above.
The list 'self.files' is modified in place.
Return True if files are found, False otherwise.
"""
files_found = False
pattern_re = translate_pattern(pattern, anchor, prefix, is_regex)
self.debug_print("exclude_pattern: applying regex r'%s'" %
pattern_re.pattern)
for i in range(len(self.files)-1, -1, -1):
if pattern_re.search(self.files[i]):
self.debug_print(" removing " + self.files[i])
del self.files[i]
files_found = True
return files_found
# ----------------------------------------------------------------------
# Utility functions
def findall(dir=os.curdir):
"""Find all files under 'dir' and return the list of full filenames
(relative to 'dir').
"""
from stat import ST_MODE, S_ISREG, S_ISDIR, S_ISLNK
list = []
stack = [dir]
pop = stack.pop
push = stack.append
while stack:
dir = pop()
names = os.listdir(dir)
for name in names:
if dir != os.curdir: # avoid the dreaded "./" syndrome
fullname = os.path.join(dir, name)
else:
fullname = name
# Avoid excess stat calls -- just one will do, thank you!
stat = os.stat(fullname)
mode = stat[ST_MODE]
if S_ISREG(mode):
list.append(fullname)
elif S_ISDIR(mode) and not S_ISLNK(mode):
push(fullname)
return list
def glob_to_re(pattern):
"""Translate a shell-like glob pattern to a regular expression; return
a string containing the regex. Differs from 'fnmatch.translate()' in
that '*' does not match "special characters" (which are
platform-specific).
"""
pattern_re = fnmatch.translate(pattern)
# '?' and '*' in the glob pattern become '.' and '.*' in the RE, which
# IMHO is wrong -- '?' and '*' aren't supposed to match slash in Unix,
# and by extension they shouldn't match such "special characters" under
# any OS. So change all non-escaped dots in the RE to match any
# character except the special characters (currently: just os.sep).
sep = os.sep
if os.sep == '\\':
# we're using a regex to manipulate a regex, so we need
# to escape the backslash twice
sep = r'\\\\'
escaped = r'\1[^%s]' % sep
pattern_re = re.sub(r'((?<!\\)(\\\\)*)\.', escaped, pattern_re)
return pattern_re
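# Illustrative (assuming POSIX, where os.sep == '/'): glob_to_re('*.py')
# rewrites the unescaped '.' in fnmatch's '.*' so that '*' cannot cross a
# path separator -- yielding something like '(?s:[^/]*\.py)\Z', though the
# exact wrapper emitted by fnmatch.translate() varies between Python versions.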
def translate_pattern(pattern, anchor=1, prefix=None, is_regex=0):
"""Translate a shell-like wildcard pattern to a compiled regular
expression. Return the compiled regex. If 'is_regex' true,
then 'pattern' is directly compiled to a regex (if it's a string)
or just returned as-is (assumes it's a regex object).
"""
if is_regex:
if isinstance(pattern, str):
return re.compile(pattern)
else:
return pattern
if pattern:
pattern_re = glob_to_re(pattern)
else:
pattern_re = ''
if prefix is not None:
# ditch end of pattern character
empty_pattern = glob_to_re('')
prefix_re = glob_to_re(prefix)[:-len(empty_pattern)]
sep = os.sep
if os.sep == '\\':
sep = r'\\'
pattern_re = "^" + sep.join((prefix_re, ".*" + pattern_re))
else: # no prefix -- respect anchor flag
if anchor:
pattern_re = "^" + pattern_re
return re.compile(pattern_re)
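# e.g. translate_pattern('*.txt', anchor=1) compiles '^' + glob_to_re('*.txt'),
# so the match is anchored at the start of the name; with anchor=0 the same
# regex may match anywhere in the name (illustrative; prefix handling above).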
"""distutils.file_util
Utility functions for operating on single files.
"""
import os
from distutils.errors import DistutilsFileError
from distutils import log
# for generating verbose output in 'copy_file()'
_copy_action = { None: 'copying',
'hard': 'hard linking',
'sym': 'symbolically linking' }
def _copy_file_contents(src, dst, buffer_size=16*1024):
"""Copy the file 'src' to 'dst'; both must be filenames. Any error
opening either file, reading from 'src', or writing to 'dst', raises
DistutilsFileError. Data is read/written in chunks of 'buffer_size'
bytes (default 16k). No attempt is made to handle anything apart from
regular files.
"""
# Stolen from shutil module in the standard library, but with
# custom error-handling added.
fsrc = None
fdst = None
try:
try:
fsrc = open(src, 'rb')
except OSError as e:
raise DistutilsFileError("could not open '%s': %s" % (src, e.strerror))
if os.path.exists(dst):
try:
os.unlink(dst)
except OSError as e:
raise DistutilsFileError(
"could not delete '%s': %s" % (dst, e.strerror))
try:
fdst = open(dst, 'wb')
except OSError as e:
raise DistutilsFileError(
"could not create '%s': %s" % (dst, e.strerror))
while True:
try:
buf = fsrc.read(buffer_size)
except OSError as e:
raise DistutilsFileError(
"could not read from '%s': %s" % (src, e.strerror))
if not buf:
break
try:
fdst.write(buf)
except OSError as e:
raise DistutilsFileError(
"could not write to '%s': %s" % (dst, e.strerror))
finally:
if fdst:
fdst.close()
if fsrc:
fsrc.close()
def copy_file(src, dst, preserve_mode=1, preserve_times=1, update=0,
link=None, verbose=1, dry_run=0):
"""Copy a file 'src' to 'dst'. If 'dst' is a directory, then 'src' is
copied there with the same name; otherwise, it must be a filename. (If
the file exists, it will be ruthlessly clobbered.) If 'preserve_mode'
is true (the default), the file's mode (type and permission bits, or
whatever is analogous on the current platform) is copied. If
'preserve_times' is true (the default), the last-modified and
last-access times are copied as well. If 'update' is true, 'src' will
only be copied if 'dst' does not exist, or if 'dst' does exist but is
older than 'src'.
'link' allows you to make hard links (os.link) or symbolic links
(os.symlink) instead of copying: set it to "hard" or "sym"; if it is
None (the default), files are copied. Don't set 'link' on systems that
don't support it: 'copy_file()' doesn't check if hard or symbolic
linking is available. If hardlink fails, falls back to
_copy_file_contents().
Under Mac OS, uses the native file copy function in macostools; on
other systems, uses '_copy_file_contents()' to copy file contents.
Return a tuple (dest_name, copied): 'dest_name' is the actual name of
the output file, and 'copied' is true if the file was copied (or would
have been copied, if 'dry_run' true).
"""
# XXX if the destination file already exists, we clobber it if
# copying, but blow up if linking. Hmmm. And I don't know what
# macostools.copyfile() does. Should definitely be consistent, and
# should probably blow up if destination exists and we would be
# changing it (ie. it's not already a hard/soft link to src OR
# (not update) and (src newer than dst).
from distutils.dep_util import newer
from stat import ST_ATIME, ST_MTIME, ST_MODE, S_IMODE
if not os.path.isfile(src):
raise DistutilsFileError(
"can't copy '%s': doesn't exist or not a regular file" % src)
if os.path.isdir(dst):
dir = dst
dst = os.path.join(dst, os.path.basename(src))
else:
dir = os.path.dirname(dst)
if update and not newer(src, dst):
if verbose >= 1:
log.debug("not copying %s (output up-to-date)", src)
return (dst, 0)
try:
action = _copy_action[link]
except KeyError:
raise ValueError("invalid value '%s' for 'link' argument" % link)
if verbose >= 1:
if os.path.basename(dst) == os.path.basename(src):
log.info("%s %s -> %s", action, src, dir)
else:
log.info("%s %s -> %s", action, src, dst)
if dry_run:
return (dst, 1)
# If linking (hard or symbolic), use the appropriate system call
# (Unix only, of course, but that's the caller's responsibility)
elif link == 'hard':
if not (os.path.exists(dst) and os.path.samefile(src, dst)):
try:
os.link(src, dst)
return (dst, 1)
except OSError:
# If hard linking fails, fall back on copying file
# (some special filesystems don't support hard linking
# even under Unix, see issue #8876).
pass
elif link == 'sym':
if not (os.path.exists(dst) and os.path.samefile(src, dst)):
os.symlink(src, dst)
return (dst, 1)
# Otherwise (non-Mac, not linking), copy the file contents and
# (optionally) copy the times and mode.
_copy_file_contents(src, dst)
if preserve_mode or preserve_times:
st = os.stat(src)
# According to David Ascher <[email protected]>, utime() should be done
# before chmod() (at least under NT).
if preserve_times:
os.utime(dst, (st[ST_ATIME], st[ST_MTIME]))
if preserve_mode:
os.chmod(dst, S_IMODE(st[ST_MODE]))
return (dst, 1)
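# Illustrative calls (paths hypothetical, POSIX separators): copying into a
# directory keeps the basename, so
#   copy_file('README.txt', 'build')            # -> ('build/README.txt', 1)
#   copy_file('README.txt', 'build', update=1)  # -> ('build/README.txt', 0) if up-to-date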
# XXX I suspect this is Unix-specific -- need porting help!
def move_file (src, dst,
verbose=1,
dry_run=0):
"""Move a file 'src' to 'dst'. If 'dst' is a directory, the file will
be moved into it with the same name; otherwise, 'src' is just renamed
to 'dst'. Return the new full name of the file.
Handles cross-device moves on Unix using 'copy_file()'. What about
other systems???
"""
from os.path import exists, isfile, isdir, basename, dirname
import errno
if verbose >= 1:
log.info("moving %s -> %s", src, dst)
if dry_run:
return dst
if not isfile(src):
raise DistutilsFileError("can't move '%s': not a regular file" % src)
if isdir(dst):
dst = os.path.join(dst, basename(src))
elif exists(dst):
raise DistutilsFileError(
"can't move '%s': destination '%s' already exists" %
(src, dst))
if not isdir(dirname(dst)):
raise DistutilsFileError(
"can't move '%s': destination '%s' not a valid path" %
(src, dst))
copy_it = False
try:
os.rename(src, dst)
except OSError as e:
(num, msg) = e.args
if num == errno.EXDEV:
copy_it = True
else:
raise DistutilsFileError(
"couldn't move '%s' to '%s': %s" % (src, dst, msg))
if copy_it:
copy_file(src, dst, verbose=verbose)
try:
os.unlink(src)
except OSError as e:
(num, msg) = e.args
try:
os.unlink(dst)
except OSError:
pass
raise DistutilsFileError(
"couldn't move '%s' to '%s' by copy/delete: "
"delete '%s' failed: %s"
% (src, dst, src, msg))
return dst
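# e.g. move_file('spam.o', 'build') renames into the directory (falling back
# to copy-then-delete across filesystems, per the EXDEV handling above) and
# returns 'build/spam.o' (illustrative, POSIX separators).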
def write_file (filename, contents):
"""Create a file with the specified name and write 'contents' (a
sequence of strings without line terminators) to it.
"""
f = open(filename, "w")
try:
for line in contents:
f.write(line + "\n")
finally:
f.close()
"""A simple log mechanism styled after PEP 282."""
# The class here is styled after PEP 282 so that it could later be
# replaced with a standard Python logging implementation.
DEBUG = 1
INFO = 2
WARN = 3
ERROR = 4
FATAL = 5
import sys
class Log:
def __init__(self, threshold=WARN):
self.threshold = threshold
def _log(self, level, msg, args):
if level not in (DEBUG, INFO, WARN, ERROR, FATAL):
raise ValueError('%s wrong log level' % str(level))
if level >= self.threshold:
if args:
msg = msg % args
if level in (WARN, ERROR, FATAL):
stream = sys.stderr
else:
stream = sys.stdout
if stream.errors == 'strict':
# emulate backslashreplace error handler
encoding = stream.encoding
msg = msg.encode(encoding, "backslashreplace").decode(encoding)
stream.write('%s\n' % msg)
stream.flush()
def log(self, level, msg, *args):
self._log(level, msg, args)
def debug(self, msg, *args):
self._log(DEBUG, msg, args)
def info(self, msg, *args):
self._log(INFO, msg, args)
def warn(self, msg, *args):
self._log(WARN, msg, args)
def error(self, msg, *args):
self._log(ERROR, msg, args)
def fatal(self, msg, *args):
self._log(FATAL, msg, args)
_global_log = Log()
log = _global_log.log
debug = _global_log.debug
info = _global_log.info
warn = _global_log.warn
error = _global_log.error
fatal = _global_log.fatal
def set_threshold(level):
# return the old threshold for use from tests
old = _global_log.threshold
_global_log.threshold = level
return old
def set_verbosity(v):
if v <= 0:
set_threshold(WARN)
elif v == 1:
set_threshold(INFO)
elif v >= 2:
set_threshold(DEBUG)
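# Usage sketch for the module-level shortcuts (they all drive the single
# global Log instance defined above):
#   set_verbosity(2)                 # threshold -> DEBUG
#   debug("expanding %s", "macros")  # DEBUG/INFO go to stdout
#   warn("suspicious manifest")      # WARN/ERROR/FATAL go to stderr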
"""distutils.msvc9compiler
Contains MSVCCompiler, an implementation of the abstract CCompiler class
for Microsoft Visual Studio 2008.
The module is compatible with VS 2005 and VS 2008. You can find legacy support
for older versions of VS in distutils.msvccompiler.
"""
# Written by Perry Stoll
# hacked by Robin Becker and Thomas Heller to do a better job of
# finding DevStudio (through the registry)
# ported to VS2005 and VS 2008 by Christian Heimes
import os
import subprocess
import sys
import re
from distutils.errors import DistutilsExecError, DistutilsPlatformError, \
CompileError, LibError, LinkError
from distutils.ccompiler import CCompiler, gen_preprocess_options, \
gen_lib_options
from distutils import log
from distutils.util import get_platform
import winreg
RegOpenKeyEx = winreg.OpenKeyEx
RegEnumKey = winreg.EnumKey
RegEnumValue = winreg.EnumValue
RegError = winreg.error
HKEYS = (winreg.HKEY_USERS,
winreg.HKEY_CURRENT_USER,
winreg.HKEY_LOCAL_MACHINE,
winreg.HKEY_CLASSES_ROOT)
NATIVE_WIN64 = (sys.platform == 'win32' and sys.maxsize > 2**32)
if NATIVE_WIN64:
# Visual C++ is a 32-bit application, so we need to look in
# the corresponding registry branch, if we're running a
# 64-bit Python on Win64
VS_BASE = r"Software\Wow6432Node\Microsoft\VisualStudio\%0.1f"
WINSDK_BASE = r"Software\Wow6432Node\Microsoft\Microsoft SDKs\Windows"
NET_BASE = r"Software\Wow6432Node\Microsoft\.NETFramework"
else:
VS_BASE = r"Software\Microsoft\VisualStudio\%0.1f"
WINSDK_BASE = r"Software\Microsoft\Microsoft SDKs\Windows"
NET_BASE = r"Software\Microsoft\.NETFramework"
# A map from get_platform() return values to values accepted by
# 'vcvarsall.bat'. Note a cross-compile may combine these (eg, 'x86_amd64' is
# the param to cross-compile on x86 targeting amd64.)
PLAT_TO_VCVARS = {
'win32' : 'x86',
'win-amd64' : 'amd64',
'win-ia64' : 'ia64',
}
class Reg:
"""Helper class to read values from the registry
"""
def get_value(cls, path, key):
for base in HKEYS:
d = cls.read_values(base, path)
if d and key in d:
return d[key]
raise KeyError(key)
get_value = classmethod(get_value)
def read_keys(cls, base, key):
"""Return list of registry keys."""
try:
handle = RegOpenKeyEx(base, key)
except RegError:
return None
L = []
i = 0
while True:
try:
k = RegEnumKey(handle, i)
except RegError:
break
L.append(k)
i += 1
return L
read_keys = classmethod(read_keys)
def read_values(cls, base, key):
"""Return dict of registry keys and values.
All names are converted to lowercase.
"""
try:
handle = RegOpenKeyEx(base, key)
except RegError:
return None
d = {}
i = 0
while True:
try:
name, value, type = RegEnumValue(handle, i)
except RegError:
break
name = name.lower()
d[cls.convert_mbcs(name)] = cls.convert_mbcs(value)
i += 1
return d
read_values = classmethod(read_values)
def convert_mbcs(s):
dec = getattr(s, "decode", None)
if dec is not None:
try:
s = dec("mbcs")
except UnicodeError:
pass
return s
convert_mbcs = staticmethod(convert_mbcs)
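# Illustrative lookup (the key must exist under one of the HKEYS roots, so
# this particular path requires a VS 2008 install):
#   productdir = Reg.get_value(r"Software\Microsoft\VisualStudio\9.0\Setup\VC",
#                              "productdir")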
class MacroExpander:
def __init__(self, version):
self.macros = {}
self.vsbase = VS_BASE % version
self.load_macros(version)
def set_macro(self, macro, path, key):
self.macros["$(%s)" % macro] = Reg.get_value(path, key)
def load_macros(self, version):
self.set_macro("VCInstallDir", self.vsbase + r"\Setup\VC", "productdir")
self.set_macro("VSInstallDir", self.vsbase + r"\Setup\VS", "productdir")
self.set_macro("FrameworkDir", NET_BASE, "installroot")
try:
if version >= 8.0:
self.set_macro("FrameworkSDKDir", NET_BASE,
"sdkinstallrootv2.0")
else:
raise KeyError("sdkinstallrootv2.0")
except KeyError:
raise DistutilsPlatformError(
"""Python was built with Visual Studio 2008;
extensions must be built with a compiler that can generate compatible binaries.
Visual Studio 2008 was not found on this system. If you have Cygwin installed,
you can try compiling with MingW32, by passing "-c mingw32" to setup.py.""")
if version >= 9.0:
self.set_macro("FrameworkVersion", self.vsbase, "clr version")
self.set_macro("WindowsSdkDir", WINSDK_BASE, "currentinstallfolder")
else:
p = r"Software\Microsoft\NET Framework Setup\Product"
for base in HKEYS:
try:
h = RegOpenKeyEx(base, p)
except RegError:
continue
key = RegEnumKey(h, 0)
d = Reg.get_value(base, r"%s\%s" % (p, key))
self.macros["$(FrameworkVersion)"] = d["version"]
def sub(self, s):
for k, v in self.macros.items():
s = s.replace(k, v)
return s
def get_build_version():
"""Return the version of MSVC that was used to build Python.
For Python 2.3 and up, the version number is included in
sys.version. For earlier versions, assume the compiler is MSVC 6.
"""
prefix = "MSC v."
i = sys.version.find(prefix)
if i == -1:
return 6
i = i + len(prefix)
s, rest = sys.version[i:].split(" ", 1)
majorVersion = int(s[:-2]) - 6
minorVersion = int(s[2:3]) / 10.0
# I don't think paths are affected by minor version in version 6
if majorVersion == 6:
minorVersion = 0
if majorVersion >= 6:
return majorVersion + minorVersion
# else we don't know what version of the compiler this is
return None
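# e.g. a sys.version containing "MSC v.1500" (the VS 2008 compiler) gives
# majorVersion = 15 - 6 = 9 and minorVersion = 0, i.e. 9.0.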
def normalize_and_reduce_paths(paths):
"""Return a list of normalized paths with duplicates removed.
The current order of paths is maintained.
"""
# Paths are normalized so things like: /a and /a/ aren't both preserved.
reduced_paths = []
for p in paths:
np = os.path.normpath(p)
# XXX(nnorwitz): O(n**2), if reduced_paths gets long perhaps use a set.
if np not in reduced_paths:
reduced_paths.append(np)
return reduced_paths
def removeDuplicates(variable):
"""Remove duplicate values of an environment variable.
"""
oldList = variable.split(os.pathsep)
newList = []
for i in oldList:
if i not in newList:
newList.append(i)
newVariable = os.pathsep.join(newList)
return newVariable
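# e.g. with os.pathsep == ';':
#   removeDuplicates('C:\VC;C:\SDK;C:\VC')  ->  'C:\VC;C:\SDK'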
def find_vcvarsall(version):
"""Find the vcvarsall.bat file
First it tries to find the productdir of VS 2008 in the registry. If
that fails it falls back to the VS90COMNTOOLS env var.
"""
vsbase = VS_BASE % version
try:
productdir = Reg.get_value(r"%s\Setup\VC" % vsbase,
"productdir")
except KeyError:
log.debug("Unable to find productdir in registry")
productdir = None
if not productdir or not os.path.isdir(productdir):
toolskey = "VS%0.f0COMNTOOLS" % version
toolsdir = os.environ.get(toolskey, None)
if toolsdir and os.path.isdir(toolsdir):
productdir = os.path.join(toolsdir, os.pardir, os.pardir, "VC")
productdir = os.path.abspath(productdir)
if not os.path.isdir(productdir):
log.debug("%s is not a valid directory" % productdir)
return None
else:
log.debug("Env var %s is not set or invalid" % toolskey)
if not productdir:
log.debug("No productdir found")
return None
vcvarsall = os.path.join(productdir, "vcvarsall.bat")
if os.path.isfile(vcvarsall):
return vcvarsall
log.debug("Unable to find vcvarsall.bat")
return None
def query_vcvarsall(version, arch="x86"):
"""Launch vcvarsall.bat and read the settings from its environment
"""
vcvarsall = find_vcvarsall(version)
interesting = set(("include", "lib", "libpath", "path"))
result = {}
if vcvarsall is None:
raise DistutilsPlatformError("Unable to find vcvarsall.bat")
log.debug("Calling 'vcvarsall.bat %s' (version=%s)", arch, version)
popen = subprocess.Popen('"%s" %s & set' % (vcvarsall, arch),
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
try:
stdout, stderr = popen.communicate()
if popen.wait() != 0:
raise DistutilsPlatformError(stderr.decode("mbcs"))
stdout = stdout.decode("mbcs")
for line in stdout.split("\n"):
line = Reg.convert_mbcs(line)
if '=' not in line:
continue
line = line.strip()
key, value = line.split('=', 1)
key = key.lower()
if key in interesting:
if value.endswith(os.pathsep):
value = value[:-1]
result[key] = removeDuplicates(value)
finally:
popen.stdout.close()
popen.stderr.close()
if len(result) != len(interesting):
raise ValueError(str(list(result.keys())))
return result
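# Illustrative result shape (requires a VS install; values machine-specific):
#   query_vcvarsall(9.0, 'x86') -> {'include': ..., 'lib': ..., 'libpath': ...,
#                                   'path': ...}  # each os.pathsep-joined
# A missing key raises ValueError; a missing vcvarsall.bat raises
# DistutilsPlatformError.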
# More globals
VERSION = get_build_version()
# MACROS = MacroExpander(VERSION)
class MSVCCompiler(CCompiler) :
"""Concrete class that implements an interface to Microsoft Visual C++,
as defined by the CCompiler abstract class."""
compiler_type = 'msvc'
# Just set this so CCompiler's constructor doesn't barf. We currently
# don't use the 'set_executables()' bureaucracy provided by CCompiler,
# as it really isn't necessary for this sort of single-compiler class.
# Would be nice to have a consistent interface with UnixCCompiler,
# though, so it's worth thinking about.
executables = {}
# Private class data (need to distinguish C from C++ source for compiler)
_c_extensions = ['.c']
_cpp_extensions = ['.cc', '.cpp', '.cxx']
_rc_extensions = ['.rc']
_mc_extensions = ['.mc']
# Needed for the filename generation methods provided by the
# base class, CCompiler.
src_extensions = (_c_extensions + _cpp_extensions +
_rc_extensions + _mc_extensions)
res_extension = '.res'
obj_extension = '.obj'
static_lib_extension = '.lib'
shared_lib_extension = '.dll'
static_lib_format = shared_lib_format = '%s%s'
exe_extension = '.exe'
def __init__(self, verbose=0, dry_run=0, force=0):
CCompiler.__init__ (self, verbose, dry_run, force)
if VERSION < 8.0: # ironpython: only throw if you try to use it
raise DistutilsPlatformError("VC %0.1f is not supported by this module" % VERSION)
self.__version = VERSION
self.__root = r"Software\Microsoft\VisualStudio"
# self.__macros = MACROS
self.__paths = []
# target platform (.plat_name is consistent with 'bdist')
self.plat_name = None
self.__arch = None # deprecated name
self.initialized = False
def initialize(self, plat_name=None):
# multi-init means we would need to check platform same each time...
assert not self.initialized, "don't init multiple times"
if plat_name is None:
plat_name = get_platform()
# sanity check for platforms to prevent obscure errors later.
ok_plats = 'win32', 'win-amd64', 'win-ia64'
if plat_name not in ok_plats:
raise DistutilsPlatformError("--plat-name must be one of %s" %
(ok_plats,))
if "DISTUTILS_USE_SDK" in os.environ and "MSSdk" in os.environ and self.find_exe("cl.exe"):
# Assume that the SDK set up everything alright; don't try to be
# smarter
self.cc = "cl.exe"
self.linker = "link.exe"
self.lib = "lib.exe"
self.rc = "rc.exe"
self.mc = "mc.exe"
else:
# On x86, 'vcvars32.bat amd64' creates an env that doesn't work;
# to cross compile, you use 'x86_amd64'.
# On AMD64, 'vcvars32.bat amd64' is a native build env; to cross
# compile use 'x86' (ie, it runs the x86 compiler directly)
# No idea how itanium handles this, if at all.
if plat_name == get_platform() or plat_name == 'win32':
# native build or cross-compile to win32
plat_spec = PLAT_TO_VCVARS[plat_name]
else:
# cross compile from win32 -> some 64bit
plat_spec = PLAT_TO_VCVARS[get_platform()] + '_' + \
PLAT_TO_VCVARS[plat_name]
vc_env = query_vcvarsall(VERSION, plat_spec)
self.__paths = vc_env['path'].split(os.pathsep)
os.environ['lib'] = vc_env['lib']
os.environ['include'] = vc_env['include']
if len(self.__paths) == 0:
raise DistutilsPlatformError("Python was built with %s, "
"and extensions need to be built with the same "
"version of the compiler, but it isn't installed."
% self.__product)
self.cc = self.find_exe("cl.exe")
self.linker = self.find_exe("link.exe")
self.lib = self.find_exe("lib.exe")
self.rc = self.find_exe("rc.exe") # resource compiler
self.mc = self.find_exe("mc.exe") # message compiler
#self.set_path_env_var('lib')
#self.set_path_env_var('include')
# extend the MSVC path with the current path
try:
for p in os.environ['path'].split(';'):
self.__paths.append(p)
except KeyError:
pass
self.__paths = normalize_and_reduce_paths(self.__paths)
os.environ['path'] = ";".join(self.__paths)
self.preprocess_options = None
if self.__arch == "x86":
self.compile_options = [ '/nologo', '/Ox', '/MD', '/W3',
'/DNDEBUG']
self.compile_options_debug = ['/nologo', '/Od', '/MDd', '/W3',
'/Z7', '/D_DEBUG']
else:
# Win64
self.compile_options = [ '/nologo', '/Ox', '/MD', '/W3', '/GS-' ,
'/DNDEBUG']
self.compile_options_debug = ['/nologo', '/Od', '/MDd', '/W3', '/GS-',
'/Z7', '/D_DEBUG']
self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:NO']
if self.__version >= 7:
self.ldflags_shared_debug = [
'/DLL', '/nologo', '/INCREMENTAL:no', '/DEBUG'
]
self.ldflags_static = [ '/nologo']
self.initialized = True
# -- Worker methods ------------------------------------------------
def object_filenames(self,
source_filenames,
strip_dir=0,
output_dir=''):
# Copied from ccompiler.py, extended to return .res as 'object'-file
# for .rc input file
if output_dir is None: output_dir = ''
obj_names = []
for src_name in source_filenames:
(base, ext) = os.path.splitext (src_name)
base = os.path.splitdrive(base)[1] # Chop off the drive
base = base[os.path.isabs(base):] # If abs, chop off leading /
if ext not in self.src_extensions:
# Better to raise an exception instead of silently continuing
# and later complain about sources and targets having
# different lengths
raise CompileError ("Don't know how to compile %s" % src_name)
if strip_dir:
base = os.path.basename (base)
if ext in self._rc_extensions:
obj_names.append (os.path.join (output_dir,
base + self.res_extension))
elif ext in self._mc_extensions:
obj_names.append (os.path.join (output_dir,
base + self.res_extension))
else:
obj_names.append (os.path.join (output_dir,
base + self.obj_extension))
return obj_names
def compile(self, sources,
output_dir=None, macros=None, include_dirs=None, debug=0,
extra_preargs=None, extra_postargs=None, depends=None):
if not self.initialized:
self.initialize()
compile_info = self._setup_compile(output_dir, macros, include_dirs,
sources, depends, extra_postargs)
macros, objects, extra_postargs, pp_opts, build = compile_info
compile_opts = extra_preargs or []
compile_opts.append ('/c')
if debug:
compile_opts.extend(self.compile_options_debug)
else:
compile_opts.extend(self.compile_options)
for obj in objects:
try:
src, ext = build[obj]
except KeyError:
continue
if debug:
# pass the full pathname to MSVC in debug mode,
# this allows the debugger to find the source file
# without asking the user to browse for it
src = os.path.abspath(src)
if ext in self._c_extensions:
input_opt = "/Tc" + src
elif ext in self._cpp_extensions:
input_opt = "/Tp" + src
elif ext in self._rc_extensions:
# compile .RC to .RES file
input_opt = src
output_opt = "/fo" + obj
try:
self.spawn([self.rc] + pp_opts +
[output_opt] + [input_opt])
except DistutilsExecError as msg:
raise CompileError(msg)
continue
elif ext in self._mc_extensions:
# Compile .MC to .RC file to .RES file.
# * '-h dir' specifies the directory for the
# generated include file
# * '-r dir' specifies the target directory of the
# generated RC file and the binary message resource
# it includes
#
# For now (since there are no options to change this),
# we use the source-directory for the include file and
# the build directory for the RC file and message
# resources. This works at least for win32all.
h_dir = os.path.dirname(src)
rc_dir = os.path.dirname(obj)
try:
# first compile .MC to .RC and .H file
self.spawn([self.mc] +
['-h', h_dir, '-r', rc_dir] + [src])
base, _ = os.path.splitext (os.path.basename (src))
rc_file = os.path.join (rc_dir, base + '.rc')
# then compile .RC to .RES file
self.spawn([self.rc] +
["/fo" + obj] + [rc_file])
except DistutilsExecError as msg:
raise CompileError(msg)
continue
else:
# how to handle this file?
raise CompileError("Don't know how to compile %s to %s"
% (src, obj))
output_opt = "/Fo" + obj
try:
self.spawn([self.cc] + compile_opts + pp_opts +
[input_opt, output_opt] +
extra_postargs)
except DistutilsExecError as msg:
raise CompileError(msg)
return objects
def create_static_lib(self,
objects,
output_libname,
output_dir=None,
debug=0,
target_lang=None):
if not self.initialized:
self.initialize()
(objects, output_dir) = self._fix_object_args(objects, output_dir)
output_filename = self.library_filename(output_libname,
output_dir=output_dir)
if self._need_link(objects, output_filename):
lib_args = objects + ['/OUT:' + output_filename]
if debug:
pass # XXX what goes here?
try:
self.spawn([self.lib] + lib_args)
except DistutilsExecError as msg:
raise LibError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
def link(self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
if not self.initialized:
self.initialize()
(objects, output_dir) = self._fix_object_args(objects, output_dir)
fixed_args = self._fix_lib_args(libraries, library_dirs,
runtime_library_dirs)
(libraries, library_dirs, runtime_library_dirs) = fixed_args
if runtime_library_dirs:
self.warn ("I don't know what to do with 'runtime_library_dirs': "
+ str (runtime_library_dirs))
lib_opts = gen_lib_options(self,
library_dirs, runtime_library_dirs,
libraries)
if output_dir is not None:
output_filename = os.path.join(output_dir, output_filename)
if self._need_link(objects, output_filename):
if target_desc == CCompiler.EXECUTABLE:
if debug:
ldflags = self.ldflags_shared_debug[1:]
else:
ldflags = self.ldflags_shared[1:]
else:
if debug:
ldflags = self.ldflags_shared_debug
else:
ldflags = self.ldflags_shared
export_opts = []
for sym in (export_symbols or []):
export_opts.append("/EXPORT:" + sym)
ld_args = (ldflags + lib_opts + export_opts +
objects + ['/OUT:' + output_filename])
# The MSVC linker generates .lib and .exp files, which cannot be
# suppressed by any linker switches. The .lib files may even be
# needed! Make sure they are generated in the temporary build
# directory. Since they have different names for debug and release
# builds, they can go into the same directory.
build_temp = os.path.dirname(objects[0])
if export_symbols is not None:
(dll_name, dll_ext) = os.path.splitext(
os.path.basename(output_filename))
implib_file = os.path.join(
build_temp,
self.library_filename(dll_name))
ld_args.append ('/IMPLIB:' + implib_file)
self.manifest_setup_ldargs(output_filename, build_temp, ld_args)
if extra_preargs:
ld_args[:0] = extra_preargs
if extra_postargs:
ld_args.extend(extra_postargs)
self.mkpath(os.path.dirname(output_filename))
try:
self.spawn([self.linker] + ld_args)
except DistutilsExecError as msg:
raise LinkError(msg)
# embed the manifest
# XXX - this is somewhat fragile - if mt.exe fails, distutils
# will still consider the DLL up-to-date, but it will not have a
# manifest. Maybe we should link to a temp file? OTOH, that
# implies a build environment error that shouldn't go undetected.
mfinfo = self.manifest_get_embed_info(target_desc, ld_args)
if mfinfo is not None:
mffilename, mfid = mfinfo
out_arg = '-outputresource:%s;%s' % (output_filename, mfid)
try:
self.spawn(['mt.exe', '-nologo', '-manifest',
mffilename, out_arg])
except DistutilsExecError as msg:
raise LinkError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
def manifest_setup_ldargs(self, output_filename, build_temp, ld_args):
# If we need a manifest at all, an embedded manifest is recommended.
# See MSDN article titled
# "How to: Embed a Manifest Inside a C/C++ Application"
# (currently at http://msdn2.microsoft.com/en-us/library/ms235591(VS.80).aspx)
# Ask the linker to generate the manifest in the temp dir, so
# we can check it, and possibly embed it, later.
temp_manifest = os.path.join(
build_temp,
os.path.basename(output_filename) + ".manifest")
ld_args.append('/MANIFESTFILE:' + temp_manifest)
def manifest_get_embed_info(self, target_desc, ld_args):
# If a manifest should be embedded, return a tuple of
# (manifest_filename, resource_id). Returns None if no manifest
# should be embedded. See http://bugs.python.org/issue7833 for why
# we want to avoid any manifest for extension modules if we can)
for arg in ld_args:
if arg.startswith("/MANIFESTFILE:"):
temp_manifest = arg.split(":", 1)[1]
break
else:
# no /MANIFESTFILE so nothing to do.
return None
if target_desc == CCompiler.EXECUTABLE:
# by default, executables always get the manifest with the
# CRT referenced.
mfid = 1
else:
# Extension modules try and avoid any manifest if possible.
mfid = 2
temp_manifest = self._remove_visual_c_ref(temp_manifest)
if temp_manifest is None:
return None
return temp_manifest, mfid
def _remove_visual_c_ref(self, manifest_file):
try:
# Remove references to the Visual C runtime, so they will
# fall through to the Visual C dependency of Python.exe.
# This way, when installed for a restricted user (e.g.
# runtimes are not in WinSxS folder, but in Python's own
# folder), the runtimes do not need to be in every folder
# with .pyd's.
# Returns either the filename of the modified manifest or
# None if no manifest should be embedded.
manifest_f = open(manifest_file)
try:
manifest_buf = manifest_f.read()
finally:
manifest_f.close()
pattern = re.compile(
r"""<assemblyIdentity.*?name=("|')Microsoft\."""\
r"""VC\d{2}\.CRT("|').*?(/>|</assemblyIdentity>)""",
re.DOTALL)
manifest_buf = re.sub(pattern, "", manifest_buf)
pattern = "<dependentAssembly>\s*</dependentAssembly>"
manifest_buf = re.sub(pattern, "", manifest_buf)
# Now see if any other assemblies are referenced - if not, we
# don't want a manifest embedded.
pattern = re.compile(
r"""<assemblyIdentity.*?name=(?:"|')(.+?)(?:"|')"""
r""".*?(?:/>|</assemblyIdentity>)""", re.DOTALL)
if re.search(pattern, manifest_buf) is None:
return None
manifest_f = open(manifest_file, 'w')
try:
manifest_f.write(manifest_buf)
return manifest_file
finally:
manifest_f.close()
except OSError:
pass
# -- Miscellaneous methods -----------------------------------------
# These are all used by the 'gen_lib_options()' function, in
# ccompiler.py.
def library_dir_option(self, dir):
return "/LIBPATH:" + dir
def runtime_library_dir_option(self, dir):
raise DistutilsPlatformError(
"don't know how to set runtime library search path for MSVC++")
def library_option(self, lib):
return self.library_filename(lib)
def find_library_file(self, dirs, lib, debug=0):
# Prefer a debugging library if found (and requested), but deal
# with it if we don't have one.
if debug:
try_names = [lib + "_d", lib]
else:
try_names = [lib]
for dir in dirs:
for name in try_names:
libfile = os.path.join(dir, self.library_filename (name))
if os.path.exists(libfile):
return libfile
else:
# Oops, didn't find it in *any* of 'dirs'
return None
# Helper methods for using the MSVC registry settings
def find_exe(self, exe):
"""Return path to an MSVC executable program.
Tries to find the program in several places: first, one of the
MSVC program search paths from the registry; next, the directories
in the PATH environment variable. If any of those work, return an
absolute path that is known to exist. If none of them work, just
return the original program name, 'exe'.
"""
for p in self.__paths:
fn = os.path.join(os.path.abspath(p), exe)
if os.path.isfile(fn):
return fn
# didn't find it; try existing path
for p in os.environ['Path'].split(';'):
fn = os.path.join(os.path.abspath(p),exe)
if os.path.isfile(fn):
return fn
return exe
"""distutils.msvccompiler
Contains MSVCCompiler, an implementation of the abstract CCompiler class
for Microsoft Visual Studio.
"""
# Written by Perry Stoll
# hacked by Robin Becker and Thomas Heller to do a better job of
# finding DevStudio (through the registry)
import sys, os
from distutils.errors import \
DistutilsExecError, DistutilsPlatformError, \
CompileError, LibError, LinkError
from distutils.ccompiler import \
CCompiler, gen_preprocess_options, gen_lib_options
from distutils import log
_can_read_reg = False
try:
import winreg
_can_read_reg = True
hkey_mod = winreg
RegOpenKeyEx = winreg.OpenKeyEx
RegEnumKey = winreg.EnumKey
RegEnumValue = winreg.EnumValue
RegError = winreg.error
except ImportError:
try:
import win32api
import win32con
_can_read_reg = True
hkey_mod = win32con
RegOpenKeyEx = win32api.RegOpenKeyEx
RegEnumKey = win32api.RegEnumKey
RegEnumValue = win32api.RegEnumValue
RegError = win32api.error
except ImportError:
log.info("Warning: Can't read registry to find the "
"necessary compiler setting\n"
"Make sure that Python modules winreg, "
"win32api or win32con are installed.")
pass
if _can_read_reg:
HKEYS = (hkey_mod.HKEY_USERS,
hkey_mod.HKEY_CURRENT_USER,
hkey_mod.HKEY_LOCAL_MACHINE,
hkey_mod.HKEY_CLASSES_ROOT)
def read_keys(base, key):
"""Return list of registry keys."""
try:
handle = RegOpenKeyEx(base, key)
except RegError:
return None
L = []
i = 0
while True:
try:
k = RegEnumKey(handle, i)
except RegError:
break
L.append(k)
i += 1
return L
def read_values(base, key):
"""Return dict of registry keys and values.
All names are converted to lowercase.
"""
try:
handle = RegOpenKeyEx(base, key)
except RegError:
return None
d = {}
i = 0
while True:
try:
name, value, type = RegEnumValue(handle, i)
except RegError:
break
name = name.lower()
d[convert_mbcs(name)] = convert_mbcs(value)
i += 1
return d
def convert_mbcs(s):
dec = getattr(s, "decode", None)
if dec is not None:
try:
s = dec("mbcs")
except UnicodeError:
pass
return s
class MacroExpander:
def __init__(self, version):
self.macros = {}
self.load_macros(version)
def set_macro(self, macro, path, key):
for base in HKEYS:
d = read_values(base, path)
if d:
self.macros["$(%s)" % macro] = d[key]
break
def load_macros(self, version):
vsbase = r"Software\Microsoft\VisualStudio\%0.1f" % version
self.set_macro("VCInstallDir", vsbase + r"\Setup\VC", "productdir")
self.set_macro("VSInstallDir", vsbase + r"\Setup\VS", "productdir")
net = r"Software\Microsoft\.NETFramework"
self.set_macro("FrameworkDir", net, "installroot")
try:
if version > 7.0:
self.set_macro("FrameworkSDKDir", net, "sdkinstallrootv1.1")
else:
self.set_macro("FrameworkSDKDir", net, "sdkinstallroot")
except KeyError:
raise DistutilsPlatformError(
"""Python was built with Visual Studio 2003;
extensions must be built with a compiler that can generate compatible binaries.
Visual Studio 2003 was not found on this system. If you have Cygwin installed,
you can try compiling with MingW32, by passing "-c mingw32" to setup.py.""")
p = r"Software\Microsoft\NET Framework Setup\Product"
for base in HKEYS:
try:
h = RegOpenKeyEx(base, p)
except RegError:
continue
key = RegEnumKey(h, 0)
d = read_values(base, r"%s\%s" % (p, key))
self.macros["$(FrameworkVersion)"] = d["version"]
def sub(self, s):
for k, v in self.macros.items():
s = s.replace(k, v)
return s
def get_build_version():
"""Return the version of MSVC that was used to build Python.
For Python 2.3 and up, the version number is included in
sys.version. For earlier versions, assume the compiler is MSVC 6.
"""
prefix = "MSC v."
i = sys.version.find(prefix)
if i == -1:
return 6
i = i + len(prefix)
s, rest = sys.version[i:].split(" ", 1)
majorVersion = int(s[:-2]) - 6
minorVersion = int(s[2:3]) / 10.0
# I don't think paths are affected by minor version in version 6
if majorVersion == 6:
minorVersion = 0
if majorVersion >= 6:
return majorVersion + minorVersion
# else we don't know what version of the compiler this is
return None
def get_build_architecture():
"""Return the processor architecture.
Possible results are "Intel", "Itanium", or "AMD64".
"""
prefix = " bit ("
i = sys.version.find(prefix)
if i == -1:
return "Intel"
j = sys.version.find(")", i)
return sys.version[i+len(prefix):j]
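# e.g. a sys.version tail of "[MSC v.1500 64 bit (AMD64)]" -> "AMD64";
# with no " bit (" marker present, the default "Intel" is returned.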
def normalize_and_reduce_paths(paths):
"""Return a list of normalized paths with duplicates removed.
The current order of paths is maintained.
"""
# Paths are normalized so things like: /a and /a/ aren't both preserved.
reduced_paths = []
for p in paths:
np = os.path.normpath(p)
# XXX(nnorwitz): O(n**2), if reduced_paths gets long perhaps use a set.
if np not in reduced_paths:
reduced_paths.append(np)
return reduced_paths
class MSVCCompiler(CCompiler) :
"""Concrete class that implements an interface to Microsoft Visual C++,
as defined by the CCompiler abstract class."""
compiler_type = 'msvc'
# Just set this so CCompiler's constructor doesn't barf. We currently
# don't use the 'set_executables()' bureaucracy provided by CCompiler,
# as it really isn't necessary for this sort of single-compiler class.
# Would be nice to have a consistent interface with UnixCCompiler,
# though, so it's worth thinking about.
executables = {}
# Private class data (need to distinguish C from C++ source for compiler)
_c_extensions = ['.c']
_cpp_extensions = ['.cc', '.cpp', '.cxx']
_rc_extensions = ['.rc']
_mc_extensions = ['.mc']
# Needed for the filename generation methods provided by the
# base class, CCompiler.
src_extensions = (_c_extensions + _cpp_extensions +
_rc_extensions + _mc_extensions)
res_extension = '.res'
obj_extension = '.obj'
static_lib_extension = '.lib'
shared_lib_extension = '.dll'
static_lib_format = shared_lib_format = '%s%s'
exe_extension = '.exe'
def __init__(self, verbose=0, dry_run=0, force=0):
CCompiler.__init__ (self, verbose, dry_run, force)
self.__version = get_build_version()
self.__arch = get_build_architecture()
if self.__arch == "Intel":
# x86
if self.__version >= 7:
self.__root = r"Software\Microsoft\VisualStudio"
self.__macros = MacroExpander(self.__version)
else:
self.__root = r"Software\Microsoft\Devstudio"
self.__product = "Visual Studio version %s" % self.__version
else:
# Win64. Assume this was built with the platform SDK
self.__product = "Microsoft SDK compiler %s" % (self.__version + 6)
self.initialized = False
def initialize(self):
self.__paths = []
if "DISTUTILS_USE_SDK" in os.environ and "MSSdk" in os.environ and self.find_exe("cl.exe"):
# Assume that the SDK set up everything alright; don't try to be
# smarter
self.cc = "cl.exe"
self.linker = "link.exe"
self.lib = "lib.exe"
self.rc = "rc.exe"
self.mc = "mc.exe"
else:
self.__paths = self.get_msvc_paths("path")
if len(self.__paths) == 0:
raise DistutilsPlatformError("Python was built with %s, "
"and extensions need to be built with the same "
"version of the compiler, but it isn't installed."
% self.__product)
self.cc = self.find_exe("cl.exe")
self.linker = self.find_exe("link.exe")
self.lib = self.find_exe("lib.exe")
self.rc = self.find_exe("rc.exe") # resource compiler
self.mc = self.find_exe("mc.exe") # message compiler
self.set_path_env_var('lib')
self.set_path_env_var('include')
# extend the MSVC path with the current path
try:
for p in os.environ['path'].split(';'):
self.__paths.append(p)
except KeyError:
pass
self.__paths = normalize_and_reduce_paths(self.__paths)
os.environ['path'] = ";".join(self.__paths)
self.preprocess_options = None
if self.__arch == "Intel":
self.compile_options = [ '/nologo', '/Ox', '/MD', '/W3', '/GX' ,
'/DNDEBUG']
self.compile_options_debug = ['/nologo', '/Od', '/MDd', '/W3', '/GX',
'/Z7', '/D_DEBUG']
else:
# Win64
self.compile_options = [ '/nologo', '/Ox', '/MD', '/W3', '/GS-' ,
'/DNDEBUG']
self.compile_options_debug = ['/nologo', '/Od', '/MDd', '/W3', '/GS-',
'/Z7', '/D_DEBUG']
self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:NO']
if self.__version >= 7:
self.ldflags_shared_debug = [
'/DLL', '/nologo', '/INCREMENTAL:no', '/DEBUG'
]
else:
self.ldflags_shared_debug = [
'/DLL', '/nologo', '/INCREMENTAL:no', '/pdb:None', '/DEBUG'
]
self.ldflags_static = [ '/nologo']
self.initialized = True
# -- Worker methods ------------------------------------------------
def object_filenames(self,
source_filenames,
strip_dir=0,
output_dir=''):
# Copied from ccompiler.py, extended to return .res as 'object'-file
# for .rc input file
if output_dir is None: output_dir = ''
obj_names = []
for src_name in source_filenames:
(base, ext) = os.path.splitext (src_name)
base = os.path.splitdrive(base)[1] # Chop off the drive
base = base[os.path.isabs(base):] # If abs, chop off leading /
if ext not in self.src_extensions:
# Better to raise an exception instead of silently continuing
# and later complain about sources and targets having
# different lengths
raise CompileError ("Don't know how to compile %s" % src_name)
if strip_dir:
base = os.path.basename (base)
if ext in self._rc_extensions:
obj_names.append (os.path.join (output_dir,
base + self.res_extension))
elif ext in self._mc_extensions:
obj_names.append (os.path.join (output_dir,
base + self.res_extension))
else:
obj_names.append (os.path.join (output_dir,
base + self.obj_extension))
return obj_names
def compile(self, sources,
output_dir=None, macros=None, include_dirs=None, debug=0,
extra_preargs=None, extra_postargs=None, depends=None):
if not self.initialized:
self.initialize()
compile_info = self._setup_compile(output_dir, macros, include_dirs,
sources, depends, extra_postargs)
macros, objects, extra_postargs, pp_opts, build = compile_info
compile_opts = extra_preargs or []
compile_opts.append ('/c')
if debug:
compile_opts.extend(self.compile_options_debug)
else:
compile_opts.extend(self.compile_options)
for obj in objects:
try:
src, ext = build[obj]
except KeyError:
continue
if debug:
# pass the full pathname to MSVC in debug mode,
# this allows the debugger to find the source file
# without asking the user to browse for it
src = os.path.abspath(src)
if ext in self._c_extensions:
input_opt = "/Tc" + src
elif ext in self._cpp_extensions:
input_opt = "/Tp" + src
elif ext in self._rc_extensions:
# compile .RC to .RES file
input_opt = src
output_opt = "/fo" + obj
try:
self.spawn([self.rc] + pp_opts +
[output_opt] + [input_opt])
except DistutilsExecError as msg:
raise CompileError(msg)
continue
elif ext in self._mc_extensions:
# Compile .MC to .RC file to .RES file.
# * '-h dir' specifies the directory for the
# generated include file
# * '-r dir' specifies the target directory of the
# generated RC file and the binary message resource
# it includes
#
# For now (since there are no options to change this),
# we use the source-directory for the include file and
# the build directory for the RC file and message
# resources. This works at least for win32all.
h_dir = os.path.dirname(src)
rc_dir = os.path.dirname(obj)
try:
# first compile .MC to .RC and .H file
self.spawn([self.mc] +
['-h', h_dir, '-r', rc_dir] + [src])
base, _ = os.path.splitext (os.path.basename (src))
rc_file = os.path.join (rc_dir, base + '.rc')
# then compile .RC to .RES file
self.spawn([self.rc] +
["/fo" + obj] + [rc_file])
except DistutilsExecError as msg:
raise CompileError(msg)
continue
else:
# how to handle this file?
raise CompileError("Don't know how to compile %s to %s"
% (src, obj))
output_opt = "/Fo" + obj
try:
self.spawn([self.cc] + compile_opts + pp_opts +
[input_opt, output_opt] +
extra_postargs)
except DistutilsExecError as msg:
raise CompileError(msg)
return objects
def create_static_lib(self,
objects,
output_libname,
output_dir=None,
debug=0,
target_lang=None):
if not self.initialized:
self.initialize()
(objects, output_dir) = self._fix_object_args(objects, output_dir)
output_filename = self.library_filename(output_libname,
output_dir=output_dir)
if self._need_link(objects, output_filename):
lib_args = objects + ['/OUT:' + output_filename]
if debug:
pass # XXX what goes here?
try:
self.spawn([self.lib] + lib_args)
except DistutilsExecError as msg:
raise LibError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
def link(self,
target_desc,
objects,
output_filename,
output_dir=None,
libraries=None,
library_dirs=None,
runtime_library_dirs=None,
export_symbols=None,
debug=0,
extra_preargs=None,
extra_postargs=None,
build_temp=None,
target_lang=None):
if not self.initialized:
self.initialize()
(objects, output_dir) = self._fix_object_args(objects, output_dir)
fixed_args = self._fix_lib_args(libraries, library_dirs,
runtime_library_dirs)
(libraries, library_dirs, runtime_library_dirs) = fixed_args
if runtime_library_dirs:
self.warn ("I don't know what to do with 'runtime_library_dirs': "
+ str (runtime_library_dirs))
lib_opts = gen_lib_options(self,
library_dirs, runtime_library_dirs,
libraries)
if output_dir is not None:
output_filename = os.path.join(output_dir, output_filename)
if self._need_link(objects, output_filename):
if target_desc == CCompiler.EXECUTABLE:
if debug:
ldflags = self.ldflags_shared_debug[1:]
else:
ldflags = self.ldflags_shared[1:]
else:
if debug:
ldflags = self.ldflags_shared_debug
else:
ldflags = self.ldflags_shared
export_opts = []
for sym in (export_symbols or []):
export_opts.append("/EXPORT:" + sym)
ld_args = (ldflags + lib_opts + export_opts +
objects + ['/OUT:' + output_filename])
# The MSVC linker generates .lib and .exp files, which cannot be
# suppressed by any linker switches. The .lib files may even be
# needed! Make sure they are generated in the temporary build
# directory. Since they have different names for debug and release
# builds, they can go into the same directory.
if export_symbols is not None:
(dll_name, dll_ext) = os.path.splitext(
os.path.basename(output_filename))
implib_file = os.path.join(
os.path.dirname(objects[0]),
self.library_filename(dll_name))
                ld_args.append('/IMPLIB:' + implib_file)
if extra_preargs:
ld_args[:0] = extra_preargs
if extra_postargs:
ld_args.extend(extra_postargs)
self.mkpath(os.path.dirname(output_filename))
try:
self.spawn([self.linker] + ld_args)
except DistutilsExecError as msg:
raise LinkError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
# -- Miscellaneous methods -----------------------------------------
    # These are all used by the 'gen_lib_options()' function in
    # ccompiler.py.
def library_dir_option(self, dir):
return "/LIBPATH:" + dir
def runtime_library_dir_option(self, dir):
raise DistutilsPlatformError(
"don't know how to set runtime library search path for MSVC++")
def library_option(self, lib):
return self.library_filename(lib)
def find_library_file(self, dirs, lib, debug=0):
# Prefer a debugging library if found (and requested), but deal
# with it if we don't have one.
if debug:
try_names = [lib + "_d", lib]
else:
try_names = [lib]
for dir in dirs:
for name in try_names:
                libfile = os.path.join(dir, self.library_filename(name))
if os.path.exists(libfile):
return libfile
else:
# Oops, didn't find it in *any* of 'dirs'
return None
# Helper methods for using the MSVC registry settings
def find_exe(self, exe):
"""Return path to an MSVC executable program.
Tries to find the program in several places: first, one of the
MSVC program search paths from the registry; next, the directories
in the PATH environment variable. If any of those work, return an
absolute path that is known to exist. If none of them work, just
return the original program name, 'exe'.
"""
for p in self.__paths:
fn = os.path.join(os.path.abspath(p), exe)
if os.path.isfile(fn):
return fn
# didn't find it; try existing path
for p in os.environ['Path'].split(';'):
fn = os.path.join(os.path.abspath(p),exe)
if os.path.isfile(fn):
return fn
return exe
def get_msvc_paths(self, path, platform='x86'):
"""Get a list of devstudio directories (include, lib or path).
Return a list of strings. The list will be empty if unable to
access the registry or appropriate registry keys not found.
"""
if not _can_read_reg:
return []
path = path + " dirs"
if self.__version >= 7:
key = (r"%s\%0.1f\VC\VC_OBJECTS_PLATFORM_INFO\Win32\Directories"
% (self.__root, self.__version))
else:
key = (r"%s\6.0\Build System\Components\Platforms"
r"\Win32 (%s)\Directories" % (self.__root, platform))
for base in HKEYS:
d = read_values(base, key)
if d:
if self.__version >= 7:
return self.__macros.sub(d[path]).split(";")
else:
return d[path].split(";")
# MSVC 6 seems to create the registry entries we need only when
# the GUI is run.
if self.__version == 6:
for base in HKEYS:
if read_values(base, r"%s\6.0" % self.__root) is not None:
self.warn("It seems you have Visual Studio 6 installed, "
"but the expected registry settings are not present.\n"
"You must at least run the Visual Studio GUI once "
"so that these entries are created.")
break
return []
def set_path_env_var(self, name):
"""Set environment variable 'name' to an MSVC path type value.
This is equivalent to a SET command prior to execution of spawned
commands.
"""
if name == "lib":
p = self.get_msvc_paths("library")
else:
p = self.get_msvc_paths(name)
if p:
os.environ[name] = ';'.join(p)
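
# A minimal usage sketch (illustrative, with assumptions): build code
# normally obtains a compiler through distutils.ccompiler.new_compiler()
# rather than instantiating MSVCCompiler directly. 'hello.c' is a
# hypothetical source file; this only works on Windows with MSVC set up.
def _example_msvc_build():
    from distutils.ccompiler import new_compiler
    cc = new_compiler(compiler='msvc')
    objs = cc.compile(['hello.c'], output_dir='build')
    cc.link_shared_lib(objs, 'hello', output_dir='build')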
if get_build_version() >= 8.0:
log.debug("Importing new compiler from distutils.msvc9compiler")
OldMSVCCompiler = MSVCCompiler
from distutils.msvc9compiler import MSVCCompiler
    # get_build_architecture is not really relevant now that we support cross-compiling
from distutils.msvc9compiler import MacroExpander
"""distutils.spawn
Provides the 'spawn()' function, a front-end to various platform-
specific functions for launching another program in a sub-process.
Also provides the 'find_executable()' to search the path for a given
executable name.
"""
import sys
import os
from distutils.errors import DistutilsPlatformError, DistutilsExecError
from distutils.debug import DEBUG
from distutils import log
def spawn(cmd, search_path=1, verbose=0, dry_run=0):
"""Run another program, specified as a command list 'cmd', in a new process.
'cmd' is just the argument list for the new process, ie.
cmd[0] is the program to run and cmd[1:] are the rest of its arguments.
There is no way to run a program with a name different from that of its
executable.
If 'search_path' is true (the default), the system's executable
search path will be used to find the program; otherwise, cmd[0]
must be the exact path to the executable. If 'dry_run' is true,
the command will not actually be run.
Raise DistutilsExecError if running the program fails in any way; just
return on success.
"""
# cmd is documented as a list, but just in case some code passes a tuple
# in, protect our %-formatting code against horrible death
cmd = list(cmd)
if os.name == 'posix':
_spawn_posix(cmd, search_path, dry_run=dry_run)
elif os.name == 'nt':
_spawn_nt(cmd, search_path, dry_run=dry_run)
else:
raise DistutilsPlatformError(
"don't know how to spawn programs on platform '%s'" % os.name)
def _nt_quote_args(args):
"""Quote command-line arguments for DOS/Windows conventions.
Just wraps every argument which contains blanks in double quotes, and
returns a new argument list.
"""
# XXX this doesn't seem very robust to me -- but if the Windows guys
# say it'll work, I guess I'll have to accept it. (What if an arg
# contains quotes? What other magic characters, other than spaces,
# have to be escaped? Is there an escaping mechanism other than
# quoting?)
for i, arg in enumerate(args):
if ' ' in arg:
args[i] = '"%s"' % arg
return args
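
# Quick illustration of the quoting rule above: only arguments that
# contain spaces gain surrounding double quotes.
def _example_nt_quote_args():
    quoted = _nt_quote_args(['cl', '/Tc my source.c', '/c'])
    assert quoted == ['cl', '"/Tc my source.c"', '/c']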
def _spawn_nt(cmd, search_path=1, verbose=0, dry_run=0):
executable = cmd[0]
cmd = _nt_quote_args(cmd)
if search_path:
# either we find one or it stays the same
executable = find_executable(executable) or executable
log.info(' '.join([executable] + cmd[1:]))
if not dry_run:
# spawn for NT requires a full path to the .exe
try:
rc = os.spawnv(os.P_WAIT, executable, cmd)
except OSError as exc:
# this seems to happen when the command isn't found
if not DEBUG:
cmd = executable
raise DistutilsExecError(
"command %r failed: %s" % (cmd, exc.args[-1]))
if rc != 0:
# and this reflects the command running but failing
if not DEBUG:
cmd = executable
raise DistutilsExecError(
"command %r failed with exit status %d" % (cmd, rc))
if sys.platform == 'darwin':
from distutils import sysconfig
_cfg_target = None
_cfg_target_split = None
def _spawn_posix(cmd, search_path=1, verbose=0, dry_run=0):
log.info(' '.join(cmd))
if dry_run:
return
executable = cmd[0]
exec_fn = search_path and os.execvp or os.execv
env = None
if sys.platform == 'darwin':
global _cfg_target, _cfg_target_split
if _cfg_target is None:
_cfg_target = sysconfig.get_config_var(
'MACOSX_DEPLOYMENT_TARGET') or ''
if _cfg_target:
_cfg_target_split = [int(x) for x in _cfg_target.split('.')]
if _cfg_target:
# ensure that the deployment target of build process is not less
# than that used when the interpreter was built. This ensures
# extension modules are built with correct compatibility values
cur_target = os.environ.get('MACOSX_DEPLOYMENT_TARGET', _cfg_target)
if _cfg_target_split > [int(x) for x in cur_target.split('.')]:
my_msg = ('$MACOSX_DEPLOYMENT_TARGET mismatch: '
'now "%s" but "%s" during configure'
% (cur_target, _cfg_target))
raise DistutilsPlatformError(my_msg)
env = dict(os.environ,
MACOSX_DEPLOYMENT_TARGET=cur_target)
exec_fn = search_path and os.execvpe or os.execve
pid = os.fork()
if pid == 0: # in the child
try:
if env is None:
exec_fn(executable, cmd)
else:
exec_fn(executable, cmd, env)
except OSError as e:
if not DEBUG:
cmd = executable
sys.stderr.write("unable to execute %r: %s\n"
% (cmd, e.strerror))
os._exit(1)
if not DEBUG:
cmd = executable
sys.stderr.write("unable to execute %r for unknown reasons" % cmd)
os._exit(1)
else: # in the parent
# Loop until the child either exits or is terminated by a signal
# (ie. keep waiting if it's merely stopped)
while True:
try:
pid, status = os.waitpid(pid, 0)
except OSError as exc:
import errno
if exc.errno == errno.EINTR:
continue
if not DEBUG:
cmd = executable
raise DistutilsExecError(
"command %r failed: %s" % (cmd, exc.args[-1]))
if os.WIFSIGNALED(status):
if not DEBUG:
cmd = executable
raise DistutilsExecError(
"command %r terminated by signal %d"
% (cmd, os.WTERMSIG(status)))
elif os.WIFEXITED(status):
exit_status = os.WEXITSTATUS(status)
if exit_status == 0:
return # hey, it succeeded!
else:
if not DEBUG:
cmd = executable
raise DistutilsExecError(
"command %r failed with exit status %d"
% (cmd, exit_status))
elif os.WIFSTOPPED(status):
continue
else:
if not DEBUG:
cmd = executable
raise DistutilsExecError(
"unknown error executing %r: termination status %d"
% (cmd, status))
def find_executable(executable, path=None):
"""Tries to find 'executable' in the directories listed in 'path'.
    'path' is a string listing directories separated by 'os.pathsep'; it
    defaults to os.environ['PATH']. Returns the complete filename, or None
    if the executable is not found.
"""
if path is None:
path = os.environ['PATH']
paths = path.split(os.pathsep)
base, ext = os.path.splitext(executable)
if (sys.platform == 'win32') and (ext != '.exe'):
executable = executable + '.exe'
if not os.path.isfile(executable):
for p in paths:
f = os.path.join(p, executable)
if os.path.isfile(f):
# the file exists, we have a shot at spawn working
return f
return None
else:
return executable
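
# A minimal usage sketch (illustrative): locate a tool before spawning it.
# 'gcc' is only an assumed example name; None means it is not on the path.
def _example_find_executable():
    compiler = find_executable('gcc')
    if compiler is not None:
        spawn([compiler, '--version'])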
"""Provide access to Python's configuration information. The specific
configuration variables available depend heavily on the platform and
configuration. The values may be retrieved using
get_config_var(name), and the list of variables is available via
get_config_vars().keys(). Additional convenience functions are also
available.
Written by: Fred L. Drake, Jr.
Email: <[email protected]>
"""
import os
import re
import sys
from .errors import DistutilsPlatformError
# These are needed in a couple of spots, so just compute them once.
PREFIX = os.path.normpath(sys.prefix)
EXEC_PREFIX = os.path.normpath(sys.exec_prefix)
BASE_PREFIX = os.path.normpath(sys.base_prefix)
BASE_EXEC_PREFIX = os.path.normpath(sys.base_exec_prefix)
# Path to the base directory of the project. On Windows the binary may
# live in project/PCBuild9. If we're dealing with an x64 Windows build,
# it'll live in project/PCbuild/amd64.
# set for cross builds
if "_PYTHON_PROJECT_BASE" in os.environ:
project_base = os.path.abspath(os.environ["_PYTHON_PROJECT_BASE"])
else:
project_base = os.path.dirname(os.path.abspath(sys.executable))
if os.name == "nt" and "pcbuild" in project_base[-8:].lower():
project_base = os.path.abspath(os.path.join(project_base, os.path.pardir))
# PC/VS7.1
if os.name == "nt" and "\\pc\\v" in project_base[-10:].lower():
project_base = os.path.abspath(os.path.join(project_base, os.path.pardir,
os.path.pardir))
# PC/AMD64
if os.name == "nt" and "\\pcbuild\\amd64" in project_base[-14:].lower():
project_base = os.path.abspath(os.path.join(project_base, os.path.pardir,
os.path.pardir))
# python_build: (Boolean) if true, we're either building Python or
# building an extension with an un-installed Python, so we use
# different (hard-wired) directories.
# Setup.local is available for Makefile builds including VPATH builds,
# Setup.dist is available on Windows
def _is_python_source_dir(d):
for fn in ("Setup.dist", "Setup.local"):
if os.path.isfile(os.path.join(d, "Modules", fn)):
return True
return False
_sys_home = getattr(sys, '_home', None)
if _sys_home and os.name == 'nt' and \
_sys_home.lower().endswith(('pcbuild', 'pcbuild\\amd64')):
_sys_home = os.path.dirname(_sys_home)
if _sys_home.endswith('pcbuild'): # must be amd64
_sys_home = os.path.dirname(_sys_home)
def _python_build():
if _sys_home:
return _is_python_source_dir(_sys_home)
return _is_python_source_dir(project_base)
python_build = _python_build()
# Calculate the build qualifier flags if they are defined. Adding the flags
# to the include and lib directories only makes sense for an installation, not
# an in-source build.
build_flags = ''
try:
if not python_build:
build_flags = sys.abiflags
except AttributeError:
# It's not a configure-based build, so the sys module doesn't have
# this attribute, which is fine.
pass
def get_python_version():
"""Return a string containing the major and minor Python version,
leaving off the patchlevel. Sample return values could be '1.5'
or '2.2'.
"""
return sys.version[:3]
def get_python_inc(plat_specific=0, prefix=None):
"""Return the directory containing installed Python header files.
If 'plat_specific' is false (the default), this is the path to the
non-platform-specific header files, i.e. Python.h and so on;
otherwise, this is the path to platform-specific header files
(namely pyconfig.h).
If 'prefix' is supplied, use it instead of sys.base_prefix or
sys.base_exec_prefix -- i.e., ignore 'plat_specific'.
"""
if prefix is None:
prefix = plat_specific and BASE_EXEC_PREFIX or BASE_PREFIX
if os.name == "posix":
if python_build:
# Assume the executable is in the build directory. The
# pyconfig.h file should be in the same directory. Since
# the build directory may not be the source directory, we
# must use "srcdir" from the makefile to find the "Include"
# directory.
base = _sys_home or project_base
if plat_specific:
return base
if _sys_home:
incdir = os.path.join(_sys_home, get_config_var('AST_H_DIR'))
else:
incdir = os.path.join(get_config_var('srcdir'), 'Include')
return os.path.normpath(incdir)
python_dir = 'python' + get_python_version() + build_flags
return os.path.join(prefix, "include", python_dir)
elif os.name == "nt":
return os.path.join(prefix, "include")
else:
raise DistutilsPlatformError(
"I don't know where Python installs its C header files "
"on platform '%s'" % os.name)
def get_python_lib(plat_specific=0, standard_lib=0, prefix=None):
"""Return the directory containing the Python library (standard or
site additions).
If 'plat_specific' is true, return the directory containing
platform-specific modules, i.e. any module from a non-pure-Python
module distribution; otherwise, return the platform-shared library
directory. If 'standard_lib' is true, return the directory
containing standard Python library modules; otherwise, return the
directory for site-specific modules.
If 'prefix' is supplied, use it instead of sys.base_prefix or
sys.base_exec_prefix -- i.e., ignore 'plat_specific'.
"""
if prefix is None:
if standard_lib:
prefix = plat_specific and BASE_EXEC_PREFIX or BASE_PREFIX
else:
prefix = plat_specific and EXEC_PREFIX or PREFIX
if os.name == "posix":
libpython = os.path.join(prefix,
"lib", "ironpython" + get_python_version())
if standard_lib:
return libpython
else:
return os.path.join(libpython, "site-packages")
elif os.name == "nt":
if standard_lib:
return os.path.join(prefix, "Lib")
else:
return os.path.join(prefix, "Lib", "site-packages")
else:
raise DistutilsPlatformError(
"I don't know where Python installs its library "
"on platform '%s'" % os.name)
def customize_compiler(compiler):
"""Do any platform-specific customization of a CCompiler instance.
Mainly needed on Unix, so we can plug in the information that
varies across Unices and is stored in Python's Makefile.
"""
if compiler.compiler_type == "unix":
if sys.platform == "darwin":
# Perform first-time customization of compiler-related
# config vars on OS X now that we know we need a compiler.
# This is primarily to support Pythons from binary
# installers. The kind and paths to build tools on
# the user system may vary significantly from the system
# that Python itself was built on. Also the user OS
# version and build tools may not support the same set
# of CPU architectures for universal builds.
global _config_vars
# Use get_config_var() to ensure _config_vars is initialized.
if not get_config_var('CUSTOMIZED_OSX_COMPILER'):
import _osx_support
_osx_support.customize_compiler(_config_vars)
_config_vars['CUSTOMIZED_OSX_COMPILER'] = 'True'
(cc, cxx, opt, cflags, ccshared, ldshared, shlib_suffix, ar, ar_flags) = \
get_config_vars('CC', 'CXX', 'OPT', 'CFLAGS',
'CCSHARED', 'LDSHARED', 'SHLIB_SUFFIX', 'AR', 'ARFLAGS')
if 'CC' in os.environ:
newcc = os.environ['CC']
if (sys.platform == 'darwin'
and 'LDSHARED' not in os.environ
and ldshared.startswith(cc)):
# On OS X, if CC is overridden, use that as the default
# command for LDSHARED as well
ldshared = newcc + ldshared[len(cc):]
cc = newcc
if 'CXX' in os.environ:
cxx = os.environ['CXX']
if 'LDSHARED' in os.environ:
ldshared = os.environ['LDSHARED']
if 'CPP' in os.environ:
cpp = os.environ['CPP']
else:
cpp = cc + " -E" # not always
if 'LDFLAGS' in os.environ:
ldshared = ldshared + ' ' + os.environ['LDFLAGS']
if 'CFLAGS' in os.environ:
cflags = opt + ' ' + os.environ['CFLAGS']
ldshared = ldshared + ' ' + os.environ['CFLAGS']
if 'CPPFLAGS' in os.environ:
cpp = cpp + ' ' + os.environ['CPPFLAGS']
cflags = cflags + ' ' + os.environ['CPPFLAGS']
ldshared = ldshared + ' ' + os.environ['CPPFLAGS']
if 'AR' in os.environ:
ar = os.environ['AR']
if 'ARFLAGS' in os.environ:
archiver = ar + ' ' + os.environ['ARFLAGS']
else:
archiver = ar + ' ' + ar_flags
cc_cmd = cc + ' ' + cflags
compiler.set_executables(
preprocessor=cpp,
compiler=cc_cmd,
compiler_so=cc_cmd + ' ' + ccshared,
compiler_cxx=cxx,
linker_so=ldshared,
linker_exe=cc,
archiver=archiver)
compiler.shared_lib_extension = shlib_suffix
def get_config_h_filename():
"""Return full pathname of installed pyconfig.h file."""
if python_build:
if os.name == "nt":
inc_dir = os.path.join(_sys_home or project_base, "PC")
else:
inc_dir = _sys_home or project_base
else:
inc_dir = get_python_inc(plat_specific=1)
return os.path.join(inc_dir, 'pyconfig.h')
def get_makefile_filename():
"""Return full pathname of installed Makefile from the Python build."""
if python_build:
return os.path.join(_sys_home or project_base, "Makefile")
lib_dir = get_python_lib(plat_specific=0, standard_lib=1)
config_file = 'config-{}{}'.format(get_python_version(), build_flags)
return os.path.join(lib_dir, config_file, 'Makefile')
def parse_config_h(fp, g=None):
"""Parse a config.h-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
if g is None:
g = {}
define_rx = re.compile("#define ([A-Z][A-Za-z0-9_]+) (.*)\n")
undef_rx = re.compile("/[*] #undef ([A-Z][A-Za-z0-9_]+) [*]/\n")
#
while True:
line = fp.readline()
if not line:
break
m = define_rx.match(line)
if m:
n, v = m.group(1, 2)
try: v = int(v)
except ValueError: pass
g[n] = v
else:
m = undef_rx.match(line)
if m:
g[m.group(1)] = 0
return g
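
# A minimal usage sketch: parse_config_h() only needs an object with a
# readline() method, so an in-memory buffer is enough to demonstrate it.
def _example_parse_config_h():
    import io
    fp = io.StringIO('#define HAVE_FOO 1\n/* #undef HAVE_BAR */\n')
    assert parse_config_h(fp) == {'HAVE_FOO': 1, 'HAVE_BAR': 0}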
# Regexes needed for parsing Makefile (and similar syntaxes,
# like old-style Setup files).
_variable_rx = re.compile("([a-zA-Z][a-zA-Z0-9_]+)\s*=\s*(.*)")
_findvar1_rx = re.compile(r"\$\(([A-Za-z][A-Za-z0-9_]*)\)")
_findvar2_rx = re.compile(r"\${([A-Za-z][A-Za-z0-9_]*)}")
def parse_makefile(fn, g=None):
"""Parse a Makefile-style file.
A dictionary containing name/value pairs is returned. If an
optional dictionary is passed in as the second argument, it is
used instead of a new dictionary.
"""
from distutils.text_file import TextFile
fp = TextFile(fn, strip_comments=1, skip_blanks=1, join_lines=1, errors="surrogateescape")
if g is None:
g = {}
done = {}
notdone = {}
while True:
line = fp.readline()
if line is None: # eof
break
m = _variable_rx.match(line)
if m:
n, v = m.group(1, 2)
v = v.strip()
# `$$' is a literal `$' in make
tmpv = v.replace('$$', '')
if "$" in tmpv:
notdone[n] = v
else:
try:
v = int(v)
except ValueError:
# insert literal `$'
done[n] = v.replace('$$', '$')
else:
done[n] = v
# Variables with a 'PY_' prefix in the makefile. These need to
# be made available without that prefix through sysconfig.
# Special care is needed to ensure that variable expansion works, even
# if the expansion uses the name without a prefix.
renamed_variables = ('CFLAGS', 'LDFLAGS', 'CPPFLAGS')
# do variable interpolation here
while notdone:
for name in list(notdone):
value = notdone[name]
m = _findvar1_rx.search(value) or _findvar2_rx.search(value)
if m:
n = m.group(1)
found = True
if n in done:
item = str(done[n])
elif n in notdone:
# get it on a subsequent round
found = False
elif n in os.environ:
# do it like make: fall back to environment
item = os.environ[n]
elif n in renamed_variables:
if name.startswith('PY_') and name[3:] in renamed_variables:
item = ""
elif 'PY_' + n in notdone:
found = False
else:
item = str(done['PY_' + n])
else:
done[n] = item = ""
if found:
after = value[m.end():]
value = value[:m.start()] + item + after
if "$" in after:
notdone[name] = value
else:
try: value = int(value)
except ValueError:
done[name] = value.strip()
else:
done[name] = value
del notdone[name]
if name.startswith('PY_') \
and name[3:] in renamed_variables:
name = name[3:]
if name not in done:
done[name] = value
else:
                # bogus variable reference; just drop it since we can't deal with it
del notdone[name]
fp.close()
# strip spurious spaces
for k, v in done.items():
if isinstance(v, str):
done[k] = v.strip()
# save the results in the global dictionary
g.update(done)
return g
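
# A minimal usage sketch: write a tiny Makefile-style file, read it back,
# and observe variable interpolation. The temporary file is purely an
# artifact of the demonstration.
def _example_parse_makefile():
    import tempfile
    with tempfile.NamedTemporaryFile('w', suffix='.mk', delete=False) as f:
        f.write('CC=gcc\nLDSHARED=$(CC) -shared\n')
        name = f.name
    vars = parse_makefile(name)
    os.remove(name)
    assert vars['LDSHARED'] == 'gcc -shared'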
def expand_makefile_vars(s, vars):
"""Expand Makefile-style variables -- "${foo}" or "$(foo)" -- in
'string' according to 'vars' (a dictionary mapping variable names to
values). Variables not present in 'vars' are silently expanded to the
empty string. The variable values in 'vars' should not contain further
variable expansions; if 'vars' is the output of 'parse_makefile()',
you're fine. Returns a variable-expanded version of 's'.
"""
# This algorithm does multiple expansion, so if vars['foo'] contains
# "${bar}", it will expand ${foo} to ${bar}, and then expand
# ${bar}... and so forth. This is fine as long as 'vars' comes from
# 'parse_makefile()', which takes care of such expansions eagerly,
# according to make's variable expansion semantics.
while True:
m = _findvar1_rx.search(s) or _findvar2_rx.search(s)
if m:
(beg, end) = m.span()
            # default to '' so variables missing from 'vars' expand to the
            # empty string, as documented above
            s = s[0:beg] + vars.get(m.group(1), '') + s[end:]
else:
break
return s
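
# A minimal usage sketch: expansion repeats until no references remain,
# so values in 'vars' may themselves contain references.
def _example_expand_makefile_vars():
    vars = {'prefix': '/usr/local', 'bindir': '$(prefix)/bin'}
    assert expand_makefile_vars('${bindir}/python', vars) == '/usr/local/bin/python'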
_config_vars = None
def _init_posix():
"""Initialize the module as appropriate for POSIX systems."""
# _sysconfigdata is generated at build time, see the sysconfig module
from _sysconfigdata import build_time_vars # ironpython
global _config_vars
_config_vars = {}
_config_vars.update(build_time_vars)
def _init_nt():
"""Initialize the module as appropriate for NT"""
g = {}
# set basic install directories
g['LIBDEST'] = get_python_lib(plat_specific=0, standard_lib=1)
g['BINLIBDEST'] = get_python_lib(plat_specific=1, standard_lib=1)
# XXX hmmm.. a normal install puts include files here
g['INCLUDEPY'] = get_python_inc(plat_specific=0)
g['EXT_SUFFIX'] = '.pyd'
g['EXE'] = ".exe"
g['VERSION'] = get_python_version().replace(".", "")
g['BINDIR'] = os.path.dirname(os.path.abspath(sys.executable))
global _config_vars
_config_vars = g
def get_config_vars(*args):
"""With no arguments, return a dictionary of all configuration
variables relevant for the current platform. Generally this includes
everything needed to build extensions and install both pure modules and
extensions. On Unix, this means every variable defined in Python's
installed Makefile; on Windows it's a much smaller set.
With arguments, return a list of values that result from looking up
each argument in the configuration variable dictionary.
"""
global _config_vars
if _config_vars is None:
func = globals().get("_init_" + os.name)
if func:
func()
else:
_config_vars = {}
# Normalized versions of prefix and exec_prefix are handy to have;
# in fact, these are the standard versions used most places in the
# Distutils.
_config_vars['prefix'] = PREFIX
_config_vars['exec_prefix'] = EXEC_PREFIX
# For backward compatibility, see issue19555
SO = _config_vars.get('EXT_SUFFIX')
if SO is not None:
_config_vars['SO'] = SO
# Always convert srcdir to an absolute path
srcdir = _config_vars.get('srcdir', project_base)
if os.name == 'posix':
if python_build:
# If srcdir is a relative path (typically '.' or '..')
# then it should be interpreted relative to the directory
# containing Makefile.
base = os.path.dirname(get_makefile_filename())
srcdir = os.path.join(base, srcdir)
else:
# srcdir is not meaningful since the installation is
# spread about the filesystem. We choose the
# directory containing the Makefile since we know it
# exists.
srcdir = os.path.dirname(get_makefile_filename())
_config_vars['srcdir'] = os.path.abspath(os.path.normpath(srcdir))
# Convert srcdir into an absolute path if it appears necessary.
# Normally it is relative to the build directory. However, during
# testing, for example, we might be running a non-installed python
# from a different directory.
if python_build and os.name == "posix":
base = project_base
if (not os.path.isabs(_config_vars['srcdir']) and
base != os.getcwd()):
# srcdir is relative and we are not in the same directory
# as the executable. Assume executable is in the build
# directory and make srcdir absolute.
srcdir = os.path.join(base, _config_vars['srcdir'])
_config_vars['srcdir'] = os.path.normpath(srcdir)
# OS X platforms require special customization to handle
# multi-architecture, multi-os-version installers
if sys.platform == 'darwin':
import _osx_support
_osx_support.customize_config_vars(_config_vars)
if args:
vals = []
for name in args:
vals.append(_config_vars.get(name))
return vals
else:
return _config_vars
def get_config_var(name):
"""Return the value of a single variable using the dictionary
returned by 'get_config_vars()'. Equivalent to
get_config_vars().get(name)
"""
if name == 'SO':
import warnings
warnings.warn('SO is deprecated, use EXT_SUFFIX', DeprecationWarning, 2)
return get_config_vars().get(name)
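
# A minimal usage sketch: query several configuration values at once;
# names unknown on the current platform come back as None.
def _example_config_vars():
    prefix, ext_suffix = get_config_vars('prefix', 'EXT_SUFFIX')
    print('prefix=%s, extension suffix=%s' % (prefix, ext_suffix))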
"""text_file
provides the TextFile class, which gives an interface to text files
that (optionally) takes care of stripping comments, ignoring blank
lines, and joining lines with backslashes."""
import sys, os, io
class TextFile:
"""Provides a file-like object that takes care of all the things you
commonly want to do when processing a text file that has some
line-by-line syntax: strip comments (as long as "#" is your
comment character), skip blank lines, join adjacent lines by
escaping the newline (ie. backslash at end of line), strip
leading and/or trailing whitespace. All of these are optional
and independently controllable.
Provides a 'warn()' method so you can generate warning messages that
report physical line number, even if the logical line in question
spans multiple physical lines. Also provides 'unreadline()' for
implementing line-at-a-time lookahead.
Constructor is called as:
TextFile (filename=None, file=None, **options)
It bombs (RuntimeError) if both 'filename' and 'file' are None;
'filename' should be a string, and 'file' a file object (or
something that provides 'readline()' and 'close()' methods). It is
recommended that you supply at least 'filename', so that TextFile
can include it in warning messages. If 'file' is not supplied,
TextFile creates its own using 'io.open()'.
The options are all boolean, and affect the value returned by
'readline()':
strip_comments [default: true]
strip from "#" to end-of-line, as well as any whitespace
leading up to the "#" -- unless it is escaped by a backslash
lstrip_ws [default: false]
strip leading whitespace from each line before returning it
rstrip_ws [default: true]
strip trailing whitespace (including line terminator!) from
each line before returning it
      skip_blanks [default: true]
skip lines that are empty *after* stripping comments and
whitespace. (If both lstrip_ws and rstrip_ws are false,
then some lines may consist of solely whitespace: these will
*not* be skipped, even if 'skip_blanks' is true.)
join_lines [default: false]
if a backslash is the last non-newline character on a line
after stripping comments and whitespace, join the following line
to it to form one "logical line"; if N consecutive lines end
with a backslash, then N+1 physical lines will be joined to
form one logical line.
collapse_join [default: false]
strip leading whitespace from lines that are joined to their
predecessor; only matters if (join_lines and not lstrip_ws)
errors [default: 'strict']
error handler used to decode the file content
Note that since 'rstrip_ws' can strip the trailing newline, the
semantics of 'readline()' must differ from those of the builtin file
object's 'readline()' method! In particular, 'readline()' returns
None for end-of-file: an empty string might just be a blank line (or
an all-whitespace line), if 'rstrip_ws' is true but 'skip_blanks' is
not."""
default_options = { 'strip_comments': 1,
'skip_blanks': 1,
'lstrip_ws': 0,
'rstrip_ws': 1,
'join_lines': 0,
'collapse_join': 0,
'errors': 'strict',
}
def __init__(self, filename=None, file=None, **options):
"""Construct a new TextFile object. At least one of 'filename'
(a string) and 'file' (a file-like object) must be supplied.
        The keyword arguments are described above and affect the
        values returned by 'readline()'."""
if filename is None and file is None:
raise RuntimeError("you must supply either or both of 'filename' and 'file'")
# set values for all options -- either from client option hash
# or fallback to default_options
for opt in self.default_options.keys():
if opt in options:
setattr(self, opt, options[opt])
else:
setattr(self, opt, self.default_options[opt])
# sanity check client option hash
for opt in options.keys():
if opt not in self.default_options:
raise KeyError("invalid TextFile option '%s'" % opt)
if file is None:
self.open(filename)
else:
self.filename = filename
self.file = file
self.current_line = 0 # assuming that file is at BOF!
# 'linebuf' is a stack of lines that will be emptied before we
# actually read from the file; it's only populated by an
# 'unreadline()' operation
self.linebuf = []
def open(self, filename):
"""Open a new file named 'filename'. This overrides both the
'filename' and 'file' arguments to the constructor."""
self.filename = filename
self.file = io.open(self.filename, 'r', errors=self.errors)
self.current_line = 0
def close(self):
"""Close the current file and forget everything we know about it
(filename, current line number)."""
file = self.file
self.file = None
self.filename = None
self.current_line = None
file.close()
def gen_error(self, msg, line=None):
outmsg = []
if line is None:
line = self.current_line
outmsg.append(self.filename + ", ")
if isinstance(line, (list, tuple)):
outmsg.append("lines %d-%d: " % tuple(line))
else:
outmsg.append("line %d: " % line)
outmsg.append(str(msg))
return "".join(outmsg)
def error(self, msg, line=None):
raise ValueError("error: " + self.gen_error(msg, line))
def warn(self, msg, line=None):
"""Print (to stderr) a warning message tied to the current logical
line in the current file. If the current logical line in the
file spans multiple physical lines, the warning refers to the
whole range, eg. "lines 3-5". If 'line' supplied, it overrides
the current line number; it may be a list or tuple to indicate a
range of physical lines, or an integer for a single physical
line."""
sys.stderr.write("warning: " + self.gen_error(msg, line) + "\n")
def readline(self):
"""Read and return a single logical line from the current file (or
from an internal buffer if lines have previously been "unread"
with 'unreadline()'). If the 'join_lines' option is true, this
may involve reading multiple physical lines concatenated into a
single string. Updates the current line number, so calling
'warn()' after 'readline()' emits a warning about the physical
line(s) just read. Returns None on end-of-file, since the empty
        string can occur if 'rstrip_ws' is true but 'skip_blanks' is
not."""
# If any "unread" lines waiting in 'linebuf', return the top
# one. (We don't actually buffer read-ahead data -- lines only
# get put in 'linebuf' if the client explicitly does an
# 'unreadline()'.
if self.linebuf:
line = self.linebuf[-1]
del self.linebuf[-1]
return line
buildup_line = ''
while True:
# read the line, make it None if EOF
line = self.file.readline()
if line == '':
line = None
if self.strip_comments and line:
# Look for the first "#" in the line. If none, never
# mind. If we find one and it's the first character, or
# is not preceded by "\", then it starts a comment --
# strip the comment, strip whitespace before it, and
# carry on. Otherwise, it's just an escaped "#", so
# unescape it (and any other escaped "#"'s that might be
# lurking in there) and otherwise leave the line alone.
pos = line.find("#")
if pos == -1: # no "#" -- no comments
pass
# It's definitely a comment -- either "#" is the first
# character, or it's elsewhere and unescaped.
elif pos == 0 or line[pos-1] != "\\":
# Have to preserve the trailing newline, because it's
# the job of a later step (rstrip_ws) to remove it --
# and if rstrip_ws is false, we'd better preserve it!
# (NB. this means that if the final line is all comment
# and has no trailing newline, we will think that it's
# EOF; I think that's OK.)
eol = (line[-1] == '\n') and '\n' or ''
line = line[0:pos] + eol
# If all that's left is whitespace, then skip line
# *now*, before we try to join it to 'buildup_line' --
# that way constructs like
# hello \\
# # comment that should be ignored
# there
# result in "hello there".
if line.strip() == "":
continue
else: # it's an escaped "#"
line = line.replace("\\#", "#")
# did previous line end with a backslash? then accumulate
if self.join_lines and buildup_line:
# oops: end of file
if line is None:
self.warn("continuation line immediately precedes "
"end-of-file")
return buildup_line
if self.collapse_join:
line = line.lstrip()
line = buildup_line + line
# careful: pay attention to line number when incrementing it
if isinstance(self.current_line, list):
self.current_line[1] = self.current_line[1] + 1
else:
self.current_line = [self.current_line,
self.current_line + 1]
# just an ordinary line, read it as usual
else:
if line is None: # eof
return None
# still have to be careful about incrementing the line number!
if isinstance(self.current_line, list):
self.current_line = self.current_line[1] + 1
else:
self.current_line = self.current_line + 1
# strip whitespace however the client wants (leading and
# trailing, or one or the other, or neither)
if self.lstrip_ws and self.rstrip_ws:
line = line.strip()
elif self.lstrip_ws:
line = line.lstrip()
elif self.rstrip_ws:
line = line.rstrip()
# blank line (whether we rstrip'ed or not)? skip to next line
# if appropriate
if (line == '' or line == '\n') and self.skip_blanks:
continue
if self.join_lines:
if line[-1] == '\\':
buildup_line = line[:-1]
continue
if line[-2:] == '\\\n':
buildup_line = line[0:-2] + '\n'
continue
# well, I guess there's some actual content there: return it
return line
def readlines(self):
"""Read and return the list of all logical lines remaining in the
current file."""
lines = []
while True:
line = self.readline()
if line is None:
return lines
lines.append(line)
def unreadline(self, line):
"""Push 'line' (a string) onto an internal buffer that will be
checked by future 'readline()' calls. Handy for implementing
a parser with line-at-a-time lookahead."""
self.linebuf.append(line)
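
# A minimal usage sketch: TextFile accepts any object with readline() and
# close(), so an in-memory stream is enough to show comment stripping and
# backslash line-joining in action.
def _example_textfile():
    text = 'a = 1 # trailing comment\n\nb = first \\\nsecond\n'
    tf = TextFile(filename='<demo>', file=io.StringIO(text), join_lines=1)
    assert tf.readlines() == ['a = 1', 'b = first second']
    tf.close()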
"""distutils.unixccompiler
Contains the UnixCCompiler class, a subclass of CCompiler that handles
the "typical" Unix-style command-line C compiler:
* macros defined with -Dname[=value]
* macros undefined with -Uname
* include search directories specified with -Idir
  * libraries specified with -llib
* library search directories specified with -Ldir
* compile handled by 'cc' (or similar) executable with -c option:
compiles .c to .o
* link static library handled by 'ar' command (possibly with 'ranlib')
* link shared library handled by 'cc -shared'
"""
import os, sys, re
from distutils import sysconfig
from distutils.dep_util import newer
from distutils.ccompiler import \
CCompiler, gen_preprocess_options, gen_lib_options
from distutils.errors import \
DistutilsExecError, CompileError, LibError, LinkError
from distutils import log
if sys.platform == 'darwin':
import _osx_support
# XXX Things not currently handled:
# * optimization/debug/warning flags; we just use whatever's in Python's
# Makefile and live with it. Is this adequate? If not, we might
# have to have a bunch of subclasses GNUCCompiler, SGICCompiler,
# SunCCompiler, and I suspect down that road lies madness.
# * even if we don't know a warning flag from an optimization flag,
# we need some way for outsiders to feed preprocessor/compiler/linker
# flags in to us -- eg. a sysadmin might want to mandate certain flags
# via a site config file, or a user might want to set something for
# compiling this module distribution only via the setup.py command
# line, whatever. As long as these options come from something on the
# current system, they can be as system-dependent as they like, and we
# should just happily stuff them into the preprocessor/compiler/linker
# options and carry on.
class UnixCCompiler(CCompiler):
compiler_type = 'unix'
# These are used by CCompiler in two places: the constructor sets
# instance attributes 'preprocessor', 'compiler', etc. from them, and
# 'set_executable()' allows any of these to be set. The defaults here
# are pretty generic; they will probably have to be set by an outsider
# (eg. using information discovered by the sysconfig about building
# Python extensions).
executables = {'preprocessor' : None,
'compiler' : ["cc"],
'compiler_so' : ["cc"],
'compiler_cxx' : ["cc"],
'linker_so' : ["cc", "-shared"],
'linker_exe' : ["cc"],
'archiver' : ["ar", "-cr"],
'ranlib' : None,
}
if sys.platform[:6] == "darwin":
executables['ranlib'] = ["ranlib"]
# Needed for the filename generation methods provided by the base
# class, CCompiler. NB. whoever instantiates/uses a particular
# UnixCCompiler instance should set 'shared_lib_ext' -- we set a
# reasonable common default here, but it's not necessarily used on all
# Unices!
src_extensions = [".c",".C",".cc",".cxx",".cpp",".m"]
obj_extension = ".o"
static_lib_extension = ".a"
shared_lib_extension = ".so"
dylib_lib_extension = ".dylib"
static_lib_format = shared_lib_format = dylib_lib_format = "lib%s%s"
if sys.platform == "cygwin":
exe_extension = ".exe"
def preprocess(self, source, output_file=None, macros=None,
include_dirs=None, extra_preargs=None, extra_postargs=None):
fixed_args = self._fix_compile_args(None, macros, include_dirs)
ignore, macros, include_dirs = fixed_args
pp_opts = gen_preprocess_options(macros, include_dirs)
pp_args = self.preprocessor + pp_opts
if output_file:
pp_args.extend(['-o', output_file])
if extra_preargs:
pp_args[:0] = extra_preargs
if extra_postargs:
pp_args.extend(extra_postargs)
pp_args.append(source)
# We need to preprocess: either we're being forced to, or we're
# generating output to stdout, or there's a target output file and
# the source file is newer than the target (or the target doesn't
# exist).
if self.force or output_file is None or newer(source, output_file):
if output_file:
self.mkpath(os.path.dirname(output_file))
try:
self.spawn(pp_args)
except DistutilsExecError as msg:
raise CompileError(msg)
def _compile(self, obj, src, ext, cc_args, extra_postargs, pp_opts):
compiler_so = self.compiler_so
if sys.platform == 'darwin':
compiler_so = _osx_support.compiler_fixup(compiler_so,
cc_args + extra_postargs)
try:
self.spawn(compiler_so + cc_args + [src, '-o', obj] +
extra_postargs)
except DistutilsExecError as msg:
raise CompileError(msg)
def create_static_lib(self, objects, output_libname,
output_dir=None, debug=0, target_lang=None):
objects, output_dir = self._fix_object_args(objects, output_dir)
output_filename = \
self.library_filename(output_libname, output_dir=output_dir)
if self._need_link(objects, output_filename):
self.mkpath(os.path.dirname(output_filename))
self.spawn(self.archiver +
[output_filename] +
objects + self.objects)
            # Not many Unices require ranlib anymore -- SunOS 4.x is, I
            # think, the only major Unix that does. Maybe we need some
# platform intelligence here to skip ranlib if it's not
# needed -- or maybe Python's configure script took care of
# it for us, hence the check for leading colon.
if self.ranlib:
try:
self.spawn(self.ranlib + [output_filename])
except DistutilsExecError as msg:
raise LibError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
def link(self, target_desc, objects,
output_filename, output_dir=None, libraries=None,
library_dirs=None, runtime_library_dirs=None,
export_symbols=None, debug=0, extra_preargs=None,
extra_postargs=None, build_temp=None, target_lang=None):
objects, output_dir = self._fix_object_args(objects, output_dir)
fixed_args = self._fix_lib_args(libraries, library_dirs,
runtime_library_dirs)
libraries, library_dirs, runtime_library_dirs = fixed_args
lib_opts = gen_lib_options(self, library_dirs, runtime_library_dirs,
libraries)
if not isinstance(output_dir, (str, type(None))):
raise TypeError("'output_dir' must be a string or None")
if output_dir is not None:
output_filename = os.path.join(output_dir, output_filename)
if self._need_link(objects, output_filename):
ld_args = (objects + self.objects +
lib_opts + ['-o', output_filename])
if debug:
ld_args[:0] = ['-g']
if extra_preargs:
ld_args[:0] = extra_preargs
if extra_postargs:
ld_args.extend(extra_postargs)
self.mkpath(os.path.dirname(output_filename))
try:
if target_desc == CCompiler.EXECUTABLE:
linker = self.linker_exe[:]
else:
linker = self.linker_so[:]
if target_lang == "c++" and self.compiler_cxx:
# skip over environment variable settings if /usr/bin/env
# is used to set up the linker's environment.
# This is needed on OSX. Note: this assumes that the
# normal and C++ compiler have the same environment
# settings.
i = 0
if os.path.basename(linker[0]) == "env":
i = 1
while '=' in linker[i]:
i += 1
linker[i] = self.compiler_cxx[i]
if sys.platform == 'darwin':
linker = _osx_support.compiler_fixup(linker, ld_args)
self.spawn(linker + ld_args)
except DistutilsExecError as msg:
raise LinkError(msg)
else:
log.debug("skipping %s (up-to-date)", output_filename)
# -- Miscellaneous methods -----------------------------------------
    # These are all used by the 'gen_lib_options()' function in
    # ccompiler.py.
def library_dir_option(self, dir):
return "-L" + dir
def _is_gcc(self, compiler_name):
return "gcc" in compiler_name or "g++" in compiler_name
def runtime_library_dir_option(self, dir):
# XXX Hackish, at the very least. See Python bug #445902:
# http://sourceforge.net/tracker/index.php
# ?func=detail&aid=445902&group_id=5470&atid=105470
# Linkers on different platforms need different options to
# specify that directories need to be added to the list of
# directories searched for dependencies when a dynamic library
# is sought. GCC on GNU systems (Linux, FreeBSD, ...) has to
# be told to pass the -R option through to the linker, whereas
# other compilers and gcc on other systems just know this.
# Other compilers may need something slightly different. At
# this time, there's no way to determine this information from
# the configuration data stored in the Python installation, so
# we use this hack.
compiler = os.path.basename(sysconfig.get_config_var("CC"))
if sys.platform[:6] == "darwin":
# MacOSX's linker doesn't understand the -R flag at all
return "-L" + dir
elif sys.platform[:5] == "hp-ux":
if self._is_gcc(compiler):
return ["-Wl,+s", "-L" + dir]
return ["+s", "-L" + dir]
elif sys.platform[:7] == "irix646" or sys.platform[:6] == "osf1V5":
return ["-rpath", dir]
else:
if self._is_gcc(compiler):
# gcc on non-GNU systems does not need -Wl, but can
# use it anyway. Since distutils has always passed in
# -Wl whenever gcc was used in the past it is probably
# safest to keep doing so.
if sysconfig.get_config_var("GNULD") == "yes":
# GNU ld needs an extra option to get a RUNPATH
# instead of just an RPATH.
return "-Wl,--enable-new-dtags,-R" + dir
else:
return "-Wl,-R" + dir
else:
# No idea how --enable-new-dtags would be passed on to
# ld if this system was using GNU ld. Don't know if a
# system like this even exists.
return "-R" + dir
def library_option(self, lib):
return "-l" + lib
def find_library_file(self, dirs, lib, debug=0):
shared_f = self.library_filename(lib, lib_type='shared')
dylib_f = self.library_filename(lib, lib_type='dylib')
static_f = self.library_filename(lib, lib_type='static')
if sys.platform == 'darwin':
# On OSX users can specify an alternate SDK using
# '-isysroot', calculate the SDK root if it is specified
# (and use it further on)
cflags = sysconfig.get_config_var('CFLAGS')
m = re.search(r'-isysroot\s+(\S+)', cflags)
if m is None:
sysroot = '/'
else:
sysroot = m.group(1)
for dir in dirs:
shared = os.path.join(dir, shared_f)
dylib = os.path.join(dir, dylib_f)
static = os.path.join(dir, static_f)
if sys.platform == 'darwin' and (
dir.startswith('/System/') or (
dir.startswith('/usr/') and not dir.startswith('/usr/local/'))):
shared = os.path.join(sysroot, dir[1:], shared_f)
dylib = os.path.join(sysroot, dir[1:], dylib_f)
static = os.path.join(sysroot, dir[1:], static_f)
# We're second-guessing the linker here, with not much hard
# data to go on: GCC seems to prefer the shared library, so I'm
# assuming that *all* Unix C compilers do. And of course I'm
# ignoring even GCC's "-static" option. So sue me.
if os.path.exists(dylib):
return dylib
elif os.path.exists(shared):
return shared
elif os.path.exists(static):
return static
# Oops, didn't find it in *any* of 'dirs'
return None
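
# A minimal usage sketch (illustrative): resolving '-lm' the way link
# steps do. The result depends entirely on the host system; None means
# the library was not found in the given directories.
def _example_find_library():
    from distutils.ccompiler import new_compiler
    cc = new_compiler(compiler='unix')
    return cc.find_library_file(['/usr/lib', '/usr/local/lib'], 'm')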
"""distutils.util
Miscellaneous utility functions -- anything that doesn't fit into
one of the other *util.py modules.
"""
import os
import re
import importlib.util
import sys
import string
from distutils.errors import DistutilsPlatformError
from distutils.dep_util import newer
from distutils.spawn import spawn
from distutils import log
from distutils.errors import DistutilsByteCompileError
def get_platform ():
"""Return a string that identifies the current platform. This is used
mainly to distinguish platform-specific build directories and
platform-specific built distributions. Typically includes the OS name
and version and the architecture (as supplied by 'os.uname()'),
although the exact information included depends on the OS; eg. for IRIX
the architecture isn't particularly important (IRIX only runs on SGI
hardware), but for Linux the kernel version isn't particularly
important.
Examples of returned values:
linux-i586
linux-alpha (?)
solaris-2.6-sun4u
irix-5.3
irix64-6.2
Windows will return one of:
      win-amd64 (64bit Windows on AMD64, aka x86_64, Intel64, EM64T, etc.)
win-ia64 (64bit Windows on Itanium)
win32 (all others - specifically, sys.platform is returned)
For other non-POSIX platforms, currently just returns 'sys.platform'.
"""
if os.name == 'nt':
# sniff sys.version for architecture.
prefix = " bit ("
i = sys.version.find(prefix)
if i == -1:
return sys.platform
j = sys.version.find(")", i)
look = sys.version[i+len(prefix):j].lower()
if look == 'amd64':
return 'win-amd64'
if look == 'itanium':
return 'win-ia64'
return sys.platform
# Set for cross builds explicitly
if "_PYTHON_HOST_PLATFORM" in os.environ:
return os.environ["_PYTHON_HOST_PLATFORM"]
if os.name != "posix" or not hasattr(os, 'uname'):
# XXX what about the architecture? NT is Intel or Alpha,
# Mac OS is M68k or PPC, etc.
return sys.platform
# Try to distinguish various flavours of Unix
(osname, host, release, version, machine) = os.uname()
# Convert the OS name to lowercase, remove '/' characters
# (to accommodate BSD/OS), and translate spaces (for "Power Macintosh")
osname = osname.lower().replace('/', '')
machine = machine.replace(' ', '_')
machine = machine.replace('/', '-')
if osname[:5] == "linux":
# At least on Linux/Intel, 'machine' is the processor --
# i386, etc.
# XXX what about Alpha, SPARC, etc?
return "%s-%s" % (osname, machine)
elif osname[:5] == "sunos":
if release[0] >= "5": # SunOS 5 == Solaris 2
osname = "solaris"
release = "%d.%s" % (int(release[0]) - 3, release[2:])
        # We can't use "platform.architecture()[0]" because of a
        # bootstrap problem. We use a dict to get an error
        # if something suspicious happens.
bitness = {2147483647:"32bit", 9223372036854775807:"64bit"}
machine += ".%s" % bitness[sys.maxsize]
# fall through to standard osname-release-machine representation
elif osname[:4] == "irix": # could be "irix64"!
return "%s-%s" % (osname, release)
elif osname[:3] == "aix":
return "%s-%s.%s" % (osname, version, release)
elif osname[:6] == "cygwin":
osname = "cygwin"
        rel_re = re.compile(r'[\d.]+', re.ASCII)
m = rel_re.match(release)
if m:
release = m.group()
elif osname[:6] == "darwin":
import _osx_support, distutils.sysconfig
osname, release, machine = _osx_support.get_platform_osx(
distutils.sysconfig.get_config_vars(),
osname, release, machine)
return "%s-%s-%s" % (osname, release, machine)
# get_platform ()
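
# A minimal usage sketch: the platform tag is what build commands embed in
# directory names such as build/lib.<platform>-<version>.
def _example_platform_tag():
    return os.path.join('build', 'lib.%s-%s' % (get_platform(), sys.version[:3]))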
def convert_path (pathname):
"""Return 'pathname' as a name that will work on the native filesystem,
i.e. split it on '/' and put it back together again using the current
directory separator. Needed because filenames in the setup script are
always supplied in Unix style, and have to be converted to the local
convention before we can actually use them in the filesystem. Raises
ValueError on non-Unix-ish systems if 'pathname' either starts or
ends with a slash.
"""
if os.sep == '/':
return pathname
if not pathname:
return pathname
if pathname[0] == '/':
raise ValueError("path '%s' cannot be absolute" % pathname)
if pathname[-1] == '/':
raise ValueError("path '%s' cannot end with '/'" % pathname)
paths = pathname.split('/')
while '.' in paths:
paths.remove('.')
if not paths:
return os.curdir
return os.path.join(*paths)
# convert_path ()
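
# A minimal usage sketch: setup scripts always write '/'-separated paths;
# convert_path() maps them onto the native convention.
def _example_convert_path():
    assert convert_path('docs/api/index.html') == os.path.join('docs', 'api', 'index.html')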
def change_root (new_root, pathname):
"""Return 'pathname' with 'new_root' prepended. If 'pathname' is
relative, this is equivalent to "os.path.join(new_root,pathname)".
Otherwise, it requires making 'pathname' relative and then joining the
two, which is tricky on DOS/Windows and Mac OS.
"""
if os.name == 'posix':
if not os.path.isabs(pathname):
return os.path.join(new_root, pathname)
else:
return os.path.join(new_root, pathname[1:])
elif os.name == 'nt':
(drive, path) = os.path.splitdrive(pathname)
if path[0] == '\\':
path = path[1:]
return os.path.join(new_root, path)
else:
raise DistutilsPlatformError("nothing known about platform '%s'" % os.name)
_environ_checked = 0
def check_environ ():
"""Ensure that 'os.environ' has all the environment variables we
guarantee that users can use in config files, command-line options,
etc. Currently this includes:
HOME - user's home directory (Unix only)
PLAT - description of the current platform, including hardware
and OS (see 'get_platform()')
"""
global _environ_checked
if _environ_checked:
return
if os.name == 'posix' and 'HOME' not in os.environ:
import pwd
os.environ['HOME'] = pwd.getpwuid(os.getuid())[5]
if 'PLAT' not in os.environ:
os.environ['PLAT'] = get_platform()
_environ_checked = 1
def subst_vars (s, local_vars):
"""Perform shell/Perl-style variable substitution on 'string'. Every
occurrence of '$' followed by a name is considered a variable, and
variable is substituted by the value found in the 'local_vars'
dictionary, or in 'os.environ' if it's not in 'local_vars'.
'os.environ' is first checked/augmented to guarantee that it contains
certain values: see 'check_environ()'. Raise ValueError for any
variables not found in either 'local_vars' or 'os.environ'.
"""
check_environ()
def _subst (match, local_vars=local_vars):
var_name = match.group(1)
if var_name in local_vars:
return str(local_vars[var_name])
else:
return os.environ[var_name]
try:
return re.sub(r'\$([a-zA-Z_][a-zA-Z_0-9]*)', _subst, s)
except KeyError as var:
raise ValueError("invalid variable '$%s'" % var)
# subst_vars ()
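
# A minimal usage sketch: '$name' references are resolved from the local
# dictionary first, then from the (checked/augmented) environment; '$PLAT'
# is guaranteed to exist after check_environ() runs.
def _example_subst_vars():
    return subst_vars('--prefix=$prefix --plat=$PLAT', {'prefix': '/usr/local'})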
def grok_environment_error (exc, prefix="error: "):
# Function kept for backward compatibility.
# Used to try clever things with EnvironmentErrors,
# but nowadays str(exception) produces good messages.
return prefix + str(exc)
# Needed by 'split_quoted()'
_wordchars_re = _squote_re = _dquote_re = None
def _init_regex():
global _wordchars_re, _squote_re, _dquote_re
_wordchars_re = re.compile(r'[^\\\'\"%s ]*' % string.whitespace)
_squote_re = re.compile(r"'(?:[^'\\]|\\.)*'")
_dquote_re = re.compile(r'"(?:[^"\\]|\\.)*"')
def split_quoted (s):
"""Split a string up according to Unix shell-like rules for quotes and
backslashes. In short: words are delimited by spaces, as long as those
spaces are not escaped by a backslash, or inside a quoted string.
Single and double quotes are equivalent, and the quote characters can
be backslash-escaped. The backslash is stripped from any two-character
escape sequence, leaving only the escaped character. The quote
characters are stripped from any quoted string. Returns a list of
words.
"""
# This is a nice algorithm for splitting up a single string, since it
# doesn't require character-by-character examination. It was a little
# bit of a brain-bender to get it working right, though...
if _wordchars_re is None: _init_regex()
s = s.strip()
words = []
pos = 0
while s:
m = _wordchars_re.match(s, pos)
end = m.end()
if end == len(s):
words.append(s[:end])
break
if s[end] in string.whitespace: # unescaped, unquoted whitespace: now
words.append(s[:end]) # we definitely have a word delimiter
s = s[end:].lstrip()
pos = 0
elif s[end] == '\\': # preserve whatever is being escaped;
# will become part of the current word
s = s[:end] + s[end+1:]
pos = end+1
else:
if s[end] == "'": # slurp singly-quoted string
m = _squote_re.match(s, end)
elif s[end] == '"': # slurp doubly-quoted string
m = _dquote_re.match(s, end)
else:
raise RuntimeError("this can't happen (bad char '%c')" % s[end])
if m is None:
raise ValueError("bad string (mismatched %s quotes?)" % s[end])
(beg, end) = m.span()
s = s[:beg] + s[beg+1:end-1] + s[end:]
pos = m.end() - 2
if pos >= len(s):
words.append(s)
break
return words
# split_quoted ()
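
# A minimal usage sketch: shell-like word splitting without invoking a
# shell; quoted whitespace survives as part of a single word.
def _example_split_quoted():
    assert split_quoted('gcc -DMSG="hello world" -O2') == \
        ['gcc', '-DMSG=hello world', '-O2']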
def execute (func, args, msg=None, verbose=0, dry_run=0):
"""Perform some action that affects the outside world (eg. by
writing to the filesystem). Such actions are special because they
are disabled by the 'dry_run' flag. This method takes care of all
that bureaucracy for you; all you have to do is supply the
function to call and an argument tuple for it (to embody the
"external action" being performed), and an optional message to
print.
"""
if msg is None:
msg = "%s%r" % (func.__name__, args)
if msg[-2:] == ',)': # correct for singleton tuple
msg = msg[0:-2] + ')'
log.info(msg)
if not dry_run:
func(*args)
def strtobool (val):
"""Convert a string representation of truth to true (1) or false (0).
True values are 'y', 'yes', 't', 'true', 'on', and '1'; false values
are 'n', 'no', 'f', 'false', 'off', and '0'. Raises ValueError if
'val' is anything else.
"""
val = val.lower()
if val in ('y', 'yes', 't', 'true', 'on', '1'):
return 1
elif val in ('n', 'no', 'f', 'false', 'off', '0'):
return 0
else:
raise ValueError("invalid truth value %r" % (val,))
def byte_compile (py_files,
optimize=0, force=0,
prefix=None, base_dir=None,
verbose=1, dry_run=0,
direct=None):
"""Byte-compile a collection of Python source files to either .pyc
or .pyo files in a __pycache__ subdirectory. 'py_files' is a list
of files to compile; any files that don't end in ".py" are silently
skipped. 'optimize' must be one of the following:
0 - don't optimize (generate .pyc)
1 - normal optimization (like "python -O")
2 - extra optimization (like "python -OO")
If 'force' is true, all files are recompiled regardless of
timestamps.
The source filename encoded in each bytecode file defaults to the
filenames listed in 'py_files'; you can modify these with 'prefix' and
'basedir'. 'prefix' is a string that will be stripped off of each
source filename, and 'base_dir' is a directory name that will be
prepended (after 'prefix' is stripped). You can supply either or both
(or neither) of 'prefix' and 'base_dir', as you wish.
If 'dry_run' is true, doesn't actually do anything that would
affect the filesystem.
Byte-compilation is either done directly in this interpreter process
with the standard py_compile module, or indirectly by writing a
temporary script and executing it. Normally, you should let
    'byte_compile()' figure out whether to use direct compilation or not (see
the source for details). The 'direct' flag is used by the script
generated in indirect mode; unless you know what you're doing, leave
it set to None.
"""
# nothing is done if sys.dont_write_bytecode is True
if sys.dont_write_bytecode:
raise DistutilsByteCompileError('byte-compiling is disabled.')
# First, if the caller didn't force us into direct or indirect mode,
# figure out which mode we should be in. We take a conservative
# approach: choose direct mode *only* if the current interpreter is
# in debug mode and optimize is 0. If we're not in debug mode (-O
# or -OO), we don't know which level of optimization this
# interpreter is running with, so we can't do direct
# byte-compilation and be certain that it's the right thing. Thus,
# always compile indirectly if the current interpreter is in either
# optimize mode, or if either optimization level was requested by
# the caller.
if direct is None:
direct = (__debug__ and optimize == 0)
# "Indirect" byte-compilation: write a temporary script and then
# run it with the appropriate flags.
if not direct:
try:
from tempfile import mkstemp
(script_fd, script_name) = mkstemp(".py")
except ImportError:
from tempfile import mktemp
(script_fd, script_name) = None, mktemp(".py")
log.info("writing byte-compilation script '%s'", script_name)
if not dry_run:
if script_fd is not None:
script = os.fdopen(script_fd, "w")
else:
script = open(script_name, "w")
script.write("""\
from distutils.util import byte_compile
files = [
""")
# XXX would be nice to write absolute filenames, just for
# safety's sake (script should be more robust in the face of
# chdir'ing before running it). But this requires abspath'ing
# 'prefix' as well, and that breaks the hack in build_lib's
# 'byte_compile()' method that carefully tacks on a trailing
# slash (os.sep really) to make sure the prefix here is "just
# right". This whole prefix business is rather delicate -- the
# problem is that it's really a directory, but I'm treating it
# as a dumb string, so trailing slashes and so forth matter.
#py_files = map(os.path.abspath, py_files)
#if prefix:
# prefix = os.path.abspath(prefix)
script.write(",\n".join(map(repr, py_files)) + "]\n")
script.write("""
byte_compile(files, optimize=%r, force=%r,
prefix=%r, base_dir=%r,
verbose=%r, dry_run=0,
direct=1)
""" % (optimize, force, prefix, base_dir, verbose))
script.close()
cmd = [sys.executable, script_name]
if optimize == 1:
cmd.insert(1, "-O")
elif optimize == 2:
cmd.insert(1, "-OO")
spawn(cmd, dry_run=dry_run)
execute(os.remove, (script_name,), "removing %s" % script_name,
dry_run=dry_run)
# "Direct" byte-compilation: use the py_compile module to compile
# right here, right now. Note that the script generated in indirect
# mode simply calls 'byte_compile()' in direct mode, a weird sort of
# cross-process recursion. Hey, it works!
else:
from py_compile import compile
for file in py_files:
if file[-3:] != ".py":
# This lets us be lazy and not filter filenames in
# the "install_lib" command.
continue
# Terminology from the py_compile module:
# cfile - byte-compiled file
# dfile - purported source filename (same as 'file' by default)
if optimize >= 0:
cfile = importlib.util.cache_from_source(
file, debug_override=not optimize)
else:
cfile = importlib.util.cache_from_source(file)
dfile = file
if prefix:
if file[:len(prefix)] != prefix:
raise ValueError("invalid prefix: filename %r doesn't start with %r"
% (file, prefix))
dfile = dfile[len(prefix):]
if base_dir:
dfile = os.path.join(base_dir, dfile)
cfile_base = os.path.basename(cfile)
if direct:
if force or newer(file, cfile):
log.info("byte-compiling %s to %s", file, cfile_base)
if not dry_run:
compile(file, cfile, dfile)
else:
log.debug("skipping byte-compilation of %s to %s",
file, cfile_base)
# byte_compile ()
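# Example (editor's sketch): a typical build-step invocation. The paths are
# hypothetical; 'prefix' is stripped from each filename and 'base_dir' is
# prepended, so the bytecode records the eventual install location as the
# source name rather than the build-tree path.
#
#   byte_compile(['build/lib/pkg/mod.py'], optimize=0, force=1,
#                prefix='build/lib/',
#                base_dir='/usr/lib/python3/site-packages')
#   # records '/usr/lib/python3/site-packages/pkg/mod.py' as the source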
def rfc822_escape (header):
"""Return a version of the string escaped for inclusion in an
RFC-822 header, by ensuring there are 8 spaces after each newline.
"""
lines = header.split('\n')
sep = '\n' + 8 * ' '
return sep.join(lines)
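# Example (editor's sketch): continuation lines are indented so a multi-line
# value stays within a single RFC-822 header field.
# >>> rfc822_escape('line one\nline two')
# 'line one\n        line two'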
# 2to3 support
def run_2to3(files, fixer_names=None, options=None, explicit=None):
"""Invoke 2to3 on a list of Python files.
The files should all come from the build area, as the
modification is done in-place. To reduce the build time,
only files modified since the last invocation of this
function should be passed in the files argument."""
if not files:
return
# Make this class local, to delay import of 2to3
from lib2to3.refactor import RefactoringTool, get_fixers_from_package
class DistutilsRefactoringTool(RefactoringTool):
def log_error(self, msg, *args, **kw):
log.error(msg, *args)
def log_message(self, msg, *args):
log.info(msg, *args)
def log_debug(self, msg, *args):
log.debug(msg, *args)
if fixer_names is None:
fixer_names = get_fixers_from_package('lib2to3.fixes')
r = DistutilsRefactoringTool(fixer_names, options=options)
r.refactor(files, write=True)
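# Example (editor's sketch): conversion happens in place, so only pass copies
# under the build directory. The path and fixer name are illustrative.
#
#   run_2to3(['build/lib/legacy_mod.py'])  # apply every fixer in lib2to3.fixes
#   run_2to3(['build/lib/legacy_mod.py'],
#            fixer_names=['lib2to3.fixes.fix_print'])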
def copydir_run_2to3(src, dest, template=None, fixer_names=None,
options=None, explicit=None):
"""Recursively copy a directory, only copying new and changed files,
running run_2to3 over all newly copied Python modules afterward.
If you give a template string, it's parsed like a MANIFEST.in.
"""
from distutils.dir_util import mkpath
from distutils.file_util import copy_file
from distutils.filelist import FileList
filelist = FileList()
curdir = os.getcwd()
os.chdir(src)
try:
filelist.findall()
finally:
os.chdir(curdir)
filelist.files[:] = filelist.allfiles
if template:
for line in template.splitlines():
line = line.strip()
if not line: continue
filelist.process_template_line(line)
copied = []
for filename in filelist.files:
outname = os.path.join(dest, filename)
mkpath(os.path.dirname(outname))
res = copy_file(os.path.join(src, filename), outname, update=1)
if res[1]: copied.append(outname)
run_2to3([fn for fn in copied if fn.lower().endswith('.py')],
fixer_names=fixer_names, options=options, explicit=explicit)
return copied
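# Example (editor's sketch): mirror a source tree into the build area and
# convert only the freshly copied .py files; the paths and template line are
# hypothetical.
#
#   copydir_run_2to3('src', 'build/lib', template='global-include *.py')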
class Mixin2to3:
'''Mixin class for commands that run 2to3.
To configure 2to3, setup scripts may either change
the class variables, or inherit from individual commands
to override how 2to3 is invoked.'''
# provide list of fixers to run;
# defaults to all from lib2to3.fixers
fixer_names = None
# options dictionary
options = None
# list of fixers to invoke even though they are marked as explicit
explicit = None
def run_2to3(self, files):
return run_2to3(files, self.fixer_names, self.options, self.explicit)
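# Example (editor's sketch): a command subclass gains 2to3 support by mixing
# this class in; 'build_py' and the tracked file list are assumptions modeled
# on typical usage, not code from this module.
#
#   from distutils.command.build_py import build_py
#
#   class build_py_2to3(build_py, Mixin2to3):
#       fixer_names = ['lib2to3.fixes.fix_print']  # run only this fixer
#       def run(self):
#           build_py.run(self)
#           self.run_2to3(self.updated_files)      # hypothetical attribute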
#
# distutils/version.py
#
# Implements multiple version numbering conventions for the
# Python Module Distribution Utilities.
#
# $Id$
#
"""Provides classes to represent module version numbers (one class for
each style of version numbering). There are currently two such classes
implemented: StrictVersion and LooseVersion.
Every version number class implements the following interface:
* the 'parse' method takes a string and parses it to some internal
representation; if the string is an invalid version number,
'parse' raises a ValueError exception
* the class constructor takes an optional string argument which,
if supplied, is passed to 'parse'
* __str__ reconstructs the string that was passed to 'parse' (or
an equivalent string -- ie. one that will generate an equivalent
version number instance)
* __repr__ generates Python code to recreate the version number instance
* _cmp compares the current instance with either another instance
of the same class or a string (which will be parsed to an instance
of the same class, thus must follow the same rules)
"""
import re
class Version:
"""Abstract base class for version numbering classes. Just provides
constructor (__init__) and reproducer (__repr__), because those
seem to be the same for all version numbering classes; and routes
rich comparisons to _cmp.
"""
def __init__ (self, vstring=None):
if vstring:
self.parse(vstring)
def __repr__ (self):
return "%s ('%s')" % (self.__class__.__name__, str(self))
def __eq__(self, other):
c = self._cmp(other)
if c is NotImplemented:
return c
return c == 0
def __ne__(self, other):
c = self._cmp(other)
if c is NotImplemented:
return c
return c != 0
def __lt__(self, other):
c = self._cmp(other)
if c is NotImplemented:
return c
return c < 0
def __le__(self, other):
c = self._cmp(other)
if c is NotImplemented:
return c
return c <= 0
def __gt__(self, other):
c = self._cmp(other)
if c is NotImplemented:
return c
return c > 0
def __ge__(self, other):
c = self._cmp(other)
if c is NotImplemented:
return c
return c >= 0
# Interface for version-number classes -- must be implemented
# by the following classes (the concrete ones -- Version should
# be treated as an abstract class).
# __init__ (string) - create and take same action as 'parse'
# (string parameter is optional)
# parse (string) - convert a string representation to whatever
# internal representation is appropriate for
# this style of version numbering
# __str__ (self) - convert back to a string; should be very similar
# (if not identical to) the string supplied to parse
# __repr__ (self) - generate Python code to recreate
# the instance
# _cmp (self, other) - compare two version numbers ('other' may
# be an unparsed version string, or another
# instance of your version class)
class StrictVersion (Version):
"""Version numbering for anal retentives and software idealists.
Implements the standard interface for version number classes as
described above. A version number consists of two or three
dot-separated numeric components, with an optional "pre-release" tag
on the end. The pre-release tag consists of the letter 'a' or 'b'
followed by a number. If the numeric components of two version
numbers are equal, then one with a pre-release tag will always
be deemed earlier (lesser) than one without.
The following are valid version numbers (shown in the order that
would be obtained by sorting according to the supplied cmp function):
0.4 0.4.0 (these two are equivalent)
0.4.1
0.5a1
0.5b3
0.5
0.9.6
1.0
1.0.4a3
1.0.4b1
1.0.4
The following are examples of invalid version numbers:
1
2.7.2.2
1.3.a4
1.3pl1
1.3c4
The rationale for this version numbering system will be explained
in the distutils documentation.
"""
version_re = re.compile(r'^(\d+) \. (\d+) (\. (\d+))? ([ab](\d+))?$',
re.VERBOSE | re.ASCII)
def parse (self, vstring):
match = self.version_re.match(vstring)
if not match:
raise ValueError("invalid version number '%s'" % vstring)
(major, minor, patch, prerelease, prerelease_num) = \
match.group(1, 2, 4, 5, 6)
if patch:
self.version = tuple(map(int, [major, minor, patch]))
else:
self.version = tuple(map(int, [major, minor])) + (0,)
if prerelease:
self.prerelease = (prerelease[0], int(prerelease_num))
else:
self.prerelease = None
def __str__ (self):
if self.version[2] == 0:
vstring = '.'.join(map(str, self.version[0:2]))
else:
vstring = '.'.join(map(str, self.version))
if self.prerelease:
vstring = vstring + self.prerelease[0] + str(self.prerelease[1])
return vstring
def _cmp (self, other):
if isinstance(other, str):
other = StrictVersion(other)
if self.version != other.version:
# numeric versions don't match
# prerelease stuff doesn't matter
if self.version < other.version:
return -1
else:
return 1
# have to compare prerelease
# case 1: neither has prerelease; they're equal
# case 2: self has prerelease, other doesn't; other is greater
# case 3: self doesn't have prerelease, other does: self is greater
# case 4: both have prerelease: must compare them!
if (not self.prerelease and not other.prerelease):
return 0
elif (self.prerelease and not other.prerelease):
return -1
elif (not self.prerelease and other.prerelease):
return 1
elif (self.prerelease and other.prerelease):
if self.prerelease == other.prerelease:
return 0
elif self.prerelease < other.prerelease:
return -1
else:
return 1
else:
assert False, "never get here"
# end class StrictVersion
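# Example (editor's sketch): pre-release tags sort before the final release,
# and anything outside the strict grammar is rejected.
# >>> StrictVersion('1.0.4b1') < StrictVersion('1.0.4')
# True
# >>> StrictVersion('0.4') == StrictVersion('0.4.0')
# True
# >>> StrictVersion('1.3pl1')
# Traceback (most recent call last):
# ...
# ValueError: invalid version number '1.3pl1'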
# The rules according to Greg Stein:
# 1) a version number has 1 or more numbers separated by a period or by
# sequences of letters. If only periods, then these are compared
# left-to-right to determine an ordering.
# 2) sequences of letters are part of the tuple for comparison and are
# compared lexicographically
# 3) recognize that numeric components may have leading zeroes
#
# The LooseVersion class below implements these rules: a version number
# string is split up into a tuple of integer and string components, and
# comparison is a simple tuple comparison. This means that version
# numbers behave in a predictable and obvious way, but a way that might
# not necessarily be how people *want* version numbers to behave. There
# wouldn't be a problem if people could stick to purely numeric version
# numbers: just split on period and compare the numbers as tuples.
# However, people insist on putting letters into their version numbers;
# the most common purpose seems to be:
# - indicating a "pre-release" version
# ('alpha', 'beta', 'a', 'b', 'pre', 'p')
# - indicating a post-release patch ('p', 'pl', 'patch')
# but of course this can't cover all version number schemes, and there's
# no way to know what a programmer means without asking him.
#
# The problem is what to do with letters (and other non-numeric
# characters) in a version number. The current implementation does the
# obvious and predictable thing: keep them as strings and compare
# lexically within a tuple comparison. This has the desired effect if
# an appended letter sequence implies something "post-release":
# eg. "0.99" < "0.99pl14" < "1.0", and "5.001" < "5.001m" < "5.002".
#
# However, if letters in a version number imply a pre-release version,
# the "obvious" thing isn't correct. Eg. you would expect that
# "1.5.1" < "1.5.2a2" < "1.5.2", but under the tuple/lexical comparison
# implemented here, this just isn't so.
#
# Two possible solutions come to mind. The first is to tie the
# comparison algorithm to a particular set of semantic rules, as has
# been done in the StrictVersion class above. This works great as long
# as everyone can go along with bondage and discipline. Hopefully a
# (large) subset of Python module programmers will agree that the
# particular flavour of bondage and discipline provided by StrictVersion
# provides enough benefit to be worth using, and will submit their
# version numbering scheme to its domination. The free-thinking
# anarchists in the lot will never give in, though, and something needs
# to be done to accommodate them.
#
# Perhaps a "moderately strict" version class could be implemented that
# lets almost anything slide (syntactically), and makes some heuristic
# assumptions about non-digits in version number strings. This could
# sink into special-case-hell, though; if I was as talented and
# idiosyncratic as Larry Wall, I'd go ahead and implement a class that
# somehow knows that "1.2.1" < "1.2.2a2" < "1.2.2" < "1.2.2pl3", and is
# just as happy dealing with things like "2g6" and "1.13++". I don't
# think I'm smart enough to do it right though.
#
# In any case, I've coded the test suite for this module (see
# ../test/test_version.py) specifically to fail on things like comparing
# "1.2a2" and "1.2". That's not because the *code* is doing anything
# wrong, it's because the simple, obvious design doesn't match my
# complicated, hairy expectations for real-world version numbers. It
# would be a snap to fix the test suite to say, "Yep, LooseVersion does
# the Right Thing" (ie. the code matches the conception). But I'd rather
# have a conception that matches common notions about version numbers.
class LooseVersion (Version):
"""Version numbering for anarchists and software realists.
Implements the standard interface for version number classes as
described above. A version number consists of a series of numbers,
separated by either periods or strings of letters. When comparing
version numbers, the numeric components will be compared
numerically, and the alphabetic components lexically. The following
are all valid version numbers, in no particular order:
1.5.1
1.5.2b2
161
3.10a
8.02
3.4j
1996.07.12
3.2.pl0
3.1.1.6
2g6
11g
0.960923
2.2beta29
1.13++
5.5.kw
2.0b1pl0
In fact, there is no such thing as an invalid version number under
this scheme; the rules for comparison are simple and predictable,
but may not always give the results you want (for some definition
of "want").
"""
component_re = re.compile(r'(\d+ | [a-z]+ | \.)', re.VERBOSE)
def __init__ (self, vstring=None):
if vstring:
self.parse(vstring)
def parse (self, vstring):
# I've given up on thinking I can reconstruct the version string
# from the parsed tuple -- so I just store the string here for
# use by __str__
self.vstring = vstring
components = [x for x in self.component_re.split(vstring)
if x and x != '.']
for i, obj in enumerate(components):
try:
components[i] = int(obj)
except ValueError:
pass
self.version = components
def __str__ (self):
return self.vstring
def __repr__ (self):
return "LooseVersion ('%s')" % str(self)
def _cmp (self, other):
if isinstance(other, str):
other = LooseVersion(other)
if self.version == other.version:
return 0
if self.version < other.version:
return -1
if self.version > other.version:
return 1
# end class LooseVersion
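# Example (editor's sketch): anything parses, but the lexical handling of
# letters makes pre-release tags compare *after* the bare release, the
# opposite of StrictVersion.
# >>> LooseVersion('1.5.2b2') > LooseVersion('1.5.2')
# True
# >>> LooseVersion('0.99pl14') > LooseVersion('0.99')
# True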
"""Module for parsing and testing package version predicate strings.
"""
import re
import distutils.version
import operator
re_validPackage = re.compile(r"(?i)^\s*([a-z_]\w*(?:\.[a-z_]\w*)*)(.*)",
re.ASCII)
# (package) (rest)
re_paren = re.compile(r"^\s*\((.*)\)\s*$") # (list) inside of parentheses
re_splitComparison = re.compile(r"^\s*(<=|>=|<|>|!=|==)\s*([^\s,]+)\s*$")
# (comp) (version)
def splitUp(pred):
"""Parse a single version comparison.
Return (comparison string, StrictVersion)
"""
res = re_splitComparison.match(pred)
if not res:
raise ValueError("bad package restriction syntax: %r" % pred)
comp, verStr = res.groups()
return (comp, distutils.version.StrictVersion(verStr))
compmap = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
">": operator.gt, ">=": operator.ge, "!=": operator.ne}
class VersionPredicate:
"""Parse and test package version predicates.
>>> v = VersionPredicate('pyepat.abc (>1.0, <3333.3a1, !=1555.1b3)')
The `name` attribute provides the full dotted name that is given::
>>> v.name
'pyepat.abc'
The str() of a `VersionPredicate` provides a normalized
human-readable version of the expression::
>>> print(v)
pyepat.abc (> 1.0, < 3333.3a1, != 1555.1b3)
The `satisfied_by()` method can be used to determine whether a given
version number is included in the set described by the version
restrictions::
>>> v.satisfied_by('1.1')
True
>>> v.satisfied_by('1.4')
True
>>> v.satisfied_by('1.0')
False
>>> v.satisfied_by('4444.4')
False
>>> v.satisfied_by('1555.1b3')
False
`VersionPredicate` is flexible in accepting extra whitespace::
>>> v = VersionPredicate(' pat( == 0.1 ) ')
>>> v.name
'pat'
>>> v.satisfied_by('0.1')
True
>>> v.satisfied_by('0.2')
False
If any version numbers passed in do not conform to the
restrictions of `StrictVersion`, a `ValueError` is raised::
>>> v = VersionPredicate('p1.p2.p3.p4(>=1.0, <=1.3a1, !=1.2zb3)')
Traceback (most recent call last):
...
ValueError: invalid version number '1.2zb3'
If the module or package name given does not conform to what's
allowed as a legal module or package name, `ValueError` is
raised::
>>> v = VersionPredicate('foo-bar')
Traceback (most recent call last):
...
ValueError: expected parenthesized list: '-bar'
>>> v = VersionPredicate('foo bar (12.21)')
Traceback (most recent call last):
...
ValueError: expected parenthesized list: 'bar (12.21)'
"""
def __init__(self, versionPredicateStr):
"""Parse a version predicate string.
"""
# Fields:
# name: package name
# pred: list of (comparison string, StrictVersion)
versionPredicateStr = versionPredicateStr.strip()
if not versionPredicateStr:
raise ValueError("empty package restriction")
match = re_validPackage.match(versionPredicateStr)
if not match:
raise ValueError("bad package name in %r" % versionPredicateStr)
self.name, paren = match.groups()
paren = paren.strip()
if paren:
match = re_paren.match(paren)
if not match:
raise ValueError("expected parenthesized list: %r" % paren)
str = match.groups()[0]
self.pred = [splitUp(aPred) for aPred in str.split(",")]
if not self.pred:
raise ValueError("empty parenthesized list in %r"
% versionPredicateStr)
else:
self.pred = []
def __str__(self):
if self.pred:
seq = [cond + " " + str(ver) for cond, ver in self.pred]
return self.name + " (" + ", ".join(seq) + ")"
else:
return self.name
def satisfied_by(self, version):
"""True if version is compatible with all the predicates in self.
The parameter version must be acceptable to the StrictVersion
constructor. It may be either a string or StrictVersion.
"""
for cond, ver in self.pred:
if not compmap[cond](version, ver):
return False
return True
_provision_rx = None
def split_provision(value):
"""Return the name and optional version number of a provision.
The version number, if given, will be returned as a `StrictVersion`
instance, otherwise it will be `None`.
>>> split_provision('mypkg')
('mypkg', None)
>>> split_provision(' mypkg( 1.2 ) ')
('mypkg', StrictVersion ('1.2'))
"""
global _provision_rx
if _provision_rx is None:
_provision_rx = re.compile(
"([a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*)*)(?:\s*\(\s*([^)\s]+)\s*\))?$",
re.ASCII)
value = value.strip()
m = _provision_rx.match(value)
if not m:
raise ValueError("illegal provides specification: %r" % value)
ver = m.group(2) or None
if ver:
ver = distutils.version.StrictVersion(ver)
return m.group(1), ver
"""distutils
The main package for the Python Module Distribution Utilities. Normally
used from a setup script as
from distutils.core import setup
setup (...)
"""
import sys
__version__ = sys.version[:sys.version.index(' ')]
"""distutils.command.bdist
Implements the Distutils 'bdist' command (create a built [binary]
distribution)."""
import os
from distutils.core import Command
from distutils.errors import *
from distutils.util import get_platform
def show_formats():
"""Print list of available formats (arguments to "--format" option).
"""
from distutils.fancy_getopt import FancyGetopt
formats = []
for format in bdist.format_commands:
formats.append(("formats=" + format, None,
bdist.format_command[format][1]))
pretty_printer = FancyGetopt(formats)
pretty_printer.print_help("List of available distribution formats:")
class bdist(Command):
description = "create a built (binary) distribution"
user_options = [('bdist-base=', 'b',
"temporary directory for creating built distributions"),
('plat-name=', 'p',
"platform name to embed in generated filenames "
"(default: %s)" % get_platform()),
('formats=', None,
"formats for distribution (comma-separated list)"),
('dist-dir=', 'd',
"directory to put final built distributions in "
"[default: dist]"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
('owner=', 'u',
"Owner name used when creating a tar file"
" [default: current user]"),
('group=', 'g',
"Group name used when creating a tar file"
" [default: current group]"),
]
boolean_options = ['skip-build']
help_options = [
('help-formats', None,
"lists available distribution formats", show_formats),
]
# The following commands do not take a format option from bdist
no_format_option = ('bdist_rpm',)
# This won't do in reality: will need to distinguish RPM-ish Linux,
# Debian-ish Linux, Solaris, FreeBSD, ..., Windows, Mac OS.
default_format = {'posix': 'gztar',
'nt': 'zip'}
# Establish the preferred order (for the --help-formats option).
format_commands = ['rpm', 'gztar', 'bztar', 'ztar', 'tar',
'wininst', 'zip', 'msi']
# And the real information.
format_command = {'rpm': ('bdist_rpm', "RPM distribution"),
'gztar': ('bdist_dumb', "gzip'ed tar file"),
'bztar': ('bdist_dumb', "bzip2'ed tar file"),
'ztar': ('bdist_dumb', "compressed tar file"),
'tar': ('bdist_dumb', "tar file"),
'wininst': ('bdist_wininst',
"Windows executable installer"),
'zip': ('bdist_dumb', "ZIP file"),
'msi': ('bdist_msi', "Microsoft Installer")
}
def initialize_options(self):
self.bdist_base = None
self.plat_name = None
self.formats = None
self.dist_dir = None
self.skip_build = 0
self.group = None
self.owner = None
def finalize_options(self):
# have to finalize 'plat_name' before 'bdist_base'
if self.plat_name is None:
if self.skip_build:
self.plat_name = get_platform()
else:
self.plat_name = self.get_finalized_command('build').plat_name
# 'bdist_base' -- parent of per-built-distribution-format
# temporary directories (eg. we'll probably have
# "build/bdist.<plat>/dumb", "build/bdist.<plat>/rpm", etc.)
if self.bdist_base is None:
build_base = self.get_finalized_command('build').build_base
self.bdist_base = os.path.join(build_base,
'bdist.' + self.plat_name)
self.ensure_string_list('formats')
if self.formats is None:
try:
self.formats = [self.default_format[os.name]]
except KeyError:
raise DistutilsPlatformError(
"don't know how to create built distributions "
"on platform %s" % os.name)
if self.dist_dir is None:
self.dist_dir = "dist"
def run(self):
# Figure out which sub-commands we need to run.
commands = []
for format in self.formats:
try:
commands.append(self.format_command[format][0])
except KeyError:
raise DistutilsOptionError("invalid format '%s'" % format)
# Reinitialize and run each command.
for i in range(len(self.formats)):
cmd_name = commands[i]
sub_cmd = self.reinitialize_command(cmd_name)
if cmd_name not in self.no_format_option:
sub_cmd.format = self.formats[i]
# passing the owner and group names for tar archiving
if cmd_name == 'bdist_dumb':
sub_cmd.owner = self.owner
sub_cmd.group = self.group
# If we're going to need to run this command again, tell it to
# keep its temporary files around so subsequent runs go faster.
if cmd_name in commands[i+1:]:
sub_cmd.keep_temp = 1
self.run_command(cmd_name)
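# Example (editor's sketch): one bdist run can fan out to several
# format-specific sub-commands; each format is resolved through the
# format_command table above (here 'gztar' and 'zip' both dispatch to
# bdist_dumb, with sub_cmd.format set accordingly):
#
#   python setup.py bdist --formats=gztar,zip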
"""distutils.command.bdist_dumb
Implements the Distutils 'bdist_dumb' command (create a "dumb" built
distribution -- i.e., just an archive to be unpacked under $prefix or
$exec_prefix)."""
import os
from distutils.core import Command
from distutils.util import get_platform
from distutils.dir_util import remove_tree, ensure_relative
from distutils.errors import *
from distutils.sysconfig import get_python_version
from distutils import log
class bdist_dumb(Command):
description = "create a \"dumb\" built distribution"
user_options = [('bdist-dir=', 'd',
"temporary directory for creating the distribution"),
('plat-name=', 'p',
"platform name to embed in generated filenames "
"(default: %s)" % get_platform()),
('format=', 'f',
"archive format to create (tar, ztar, gztar, zip)"),
('keep-temp', 'k',
"keep the pseudo-installation tree around after " +
"creating the distribution archive"),
('dist-dir=', 'd',
"directory to put final built distributions in"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
('relative', None,
"build the archive using relative paths"
"(default: false)"),
('owner=', 'u',
"Owner name used when creating a tar file"
" [default: current user]"),
('group=', 'g',
"Group name used when creating a tar file"
" [default: current group]"),
]
boolean_options = ['keep-temp', 'skip-build', 'relative']
default_format = { 'posix': 'gztar',
'nt': 'zip' }
def initialize_options(self):
self.bdist_dir = None
self.plat_name = None
self.format = None
self.keep_temp = 0
self.dist_dir = None
self.skip_build = None
self.relative = 0
self.owner = None
self.group = None
def finalize_options(self):
if self.bdist_dir is None:
bdist_base = self.get_finalized_command('bdist').bdist_base
self.bdist_dir = os.path.join(bdist_base, 'dumb')
if self.format is None:
try:
self.format = self.default_format[os.name]
except KeyError:
raise DistutilsPlatformError(
"don't know how to create dumb built distributions "
"on platform %s" % os.name)
self.set_undefined_options('bdist',
('dist_dir', 'dist_dir'),
('plat_name', 'plat_name'),
('skip_build', 'skip_build'))
def run(self):
if not self.skip_build:
self.run_command('build')
install = self.reinitialize_command('install', reinit_subcommands=1)
install.root = self.bdist_dir
install.skip_build = self.skip_build
install.warn_dir = 0
log.info("installing to %s" % self.bdist_dir)
self.run_command('install')
# And make an archive relative to the root of the
# pseudo-installation tree.
archive_basename = "%s.%s" % (self.distribution.get_fullname(),
self.plat_name)
pseudoinstall_root = os.path.join(self.dist_dir, archive_basename)
if not self.relative:
archive_root = self.bdist_dir
else:
if (self.distribution.has_ext_modules() and
(install.install_base != install.install_platbase)):
raise DistutilsPlatformError(
"can't make a dumb built distribution where "
"base and platbase are different (%s, %s)"
% (repr(install.install_base),
repr(install.install_platbase)))
else:
archive_root = os.path.join(self.bdist_dir,
ensure_relative(install.install_base))
# Make the archive
filename = self.make_archive(pseudoinstall_root,
self.format, root_dir=archive_root,
owner=self.owner, group=self.group)
if self.distribution.has_ext_modules():
pyversion = get_python_version()
else:
pyversion = 'any'
self.distribution.dist_files.append(('bdist_dumb', pyversion,
filename))
if not self.keep_temp:
remove_tree(self.bdist_dir, dry_run=self.dry_run)
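# Example (editor's sketch): building a relocatable "dumb" archive. With
# --relative the archived paths are taken relative to the install base
# instead of being anchored at the filesystem root:
#
#   python setup.py bdist_dumb --format=gztar --relative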
# Copyright (C) 2005, 2006 Martin von Löwis
# Licensed to PSF under a Contributor Agreement.
# The bdist_wininst command proper
# based on bdist_wininst
"""
Implements the bdist_msi command.
"""
import sys, os
from distutils.core import Command
from distutils.dir_util import remove_tree
from distutils.sysconfig import get_python_version
from distutils.version import StrictVersion
from distutils.errors import DistutilsOptionError
from distutils.util import get_platform
from distutils import log
import msilib
from msilib import schema, sequence, text
from msilib import Directory, Feature, Dialog, add_data
class PyDialog(Dialog):
"""Dialog class with a fixed layout: controls at the top, then a ruler,
then a list of buttons: back, next, cancel. Optionally a bitmap at the
left."""
def __init__(self, *args, **kw):
"""Dialog(database, name, x, y, w, h, attributes, title, first,
default, cancel, bitmap=true)"""
Dialog.__init__(self, *args)
ruler = self.h - 36
bmwidth = 152*ruler/328
#if kw.get("bitmap", True):
# self.bitmap("Bitmap", 0, 0, bmwidth, ruler, "PythonWin")
self.line("BottomLine", 0, ruler, self.w, 0)
def title(self, title):
"Set the title text of the dialog at the top."
# name, x, y, w, h, flags=Visible|Enabled|Transparent|NoPrefix,
# text, in VerdanaBold10
self.text("Title", 15, 10, 320, 60, 0x30003,
r"{\VerdanaBold10}%s" % title)
def back(self, title, next, name = "Back", active = 1):
"""Add a back button with a given title, the tab-next button,
its name in the Control table, possibly initially disabled.
Return the button, so that events can be associated"""
if active:
flags = 3 # Visible|Enabled
else:
flags = 1 # Visible
return self.pushbutton(name, 180, self.h-27 , 56, 17, flags, title, next)
def cancel(self, title, next, name = "Cancel", active = 1):
"""Add a cancel button with a given title, the tab-next button,
its name in the Control table, possibly initially disabled.
Return the button, so that events can be associated"""
if active:
flags = 3 # Visible|Enabled
else:
flags = 1 # Visible
return self.pushbutton(name, 304, self.h-27, 56, 17, flags, title, next)
def next(self, title, next, name = "Next", active = 1):
"""Add a Next button with a given title, the tab-next button,
its name in the Control table, possibly initially disabled.
Return the button, so that events can be associated"""
if active:
flags = 3 # Visible|Enabled
else:
flags = 1 # Visible
return self.pushbutton(name, 236, self.h-27, 56, 17, flags, title, next)
def xbutton(self, name, title, next, xpos):
"""Add a button with a given title, the tab-next button,
its name in the Control table, giving its x position; the
y-position is aligned with the other buttons.
Return the button, so that events can be associated"""
return self.pushbutton(name, int(self.w*xpos - 28), self.h-27, 56, 17, 3, title, next)
class bdist_msi(Command):
description = "create a Microsoft Installer (.msi) binary distribution"
user_options = [('bdist-dir=', None,
"temporary directory for creating the distribution"),
('plat-name=', 'p',
"platform name to embed in generated filenames "
"(default: %s)" % get_platform()),
('keep-temp', 'k',
"keep the pseudo-installation tree around after " +
"creating the distribution archive"),
('target-version=', None,
"require a specific python version" +
" on the target system"),
('no-target-compile', 'c',
"do not compile .py to .pyc on the target system"),
('no-target-optimize', 'o',
"do not compile .py to .pyo (optimized)"
"on the target system"),
('dist-dir=', 'd',
"directory to put final built distributions in"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
('install-script=', None,
"basename of installation script to be run after"
"installation or before deinstallation"),
('pre-install-script=', None,
"Fully qualified filename of a script to be run before "
"any files are installed. This script need not be in the "
"distribution"),
]
boolean_options = ['keep-temp', 'no-target-compile', 'no-target-optimize',
'skip-build']
all_versions = ['2.0', '2.1', '2.2', '2.3', '2.4',
'2.5', '2.6', '2.7', '2.8', '2.9',
'3.0', '3.1', '3.2', '3.3', '3.4',
'3.5', '3.6', '3.7', '3.8', '3.9']
other_version = 'X'
def initialize_options(self):
self.bdist_dir = None
self.plat_name = None
self.keep_temp = 0
self.no_target_compile = 0
self.no_target_optimize = 0
self.target_version = None
self.dist_dir = None
self.skip_build = None
self.install_script = None
self.pre_install_script = None
self.versions = None
def finalize_options(self):
self.set_undefined_options('bdist', ('skip_build', 'skip_build'))
if self.bdist_dir is None:
bdist_base = self.get_finalized_command('bdist').bdist_base
self.bdist_dir = os.path.join(bdist_base, 'msi')
short_version = get_python_version()
if (not self.target_version) and self.distribution.has_ext_modules():
self.target_version = short_version
if self.target_version:
self.versions = [self.target_version]
if not self.skip_build and self.distribution.has_ext_modules()\
and self.target_version != short_version:
raise DistutilsOptionError(
"target version can only be %s, or the '--skip-build'"
" option must be specified" % (short_version,))
else:
self.versions = list(self.all_versions)
self.set_undefined_options('bdist',
('dist_dir', 'dist_dir'),
('plat_name', 'plat_name'),
)
if self.pre_install_script:
raise DistutilsOptionError(
"the pre-install-script feature is not yet implemented")
if self.install_script:
for script in self.distribution.scripts:
if self.install_script == os.path.basename(script):
break
else:
raise DistutilsOptionError(
"install_script '%s' not found in scripts"
% self.install_script)
self.install_script_key = None
def run(self):
if not self.skip_build:
self.run_command('build')
install = self.reinitialize_command('install', reinit_subcommands=1)
install.prefix = self.bdist_dir
install.skip_build = self.skip_build
install.warn_dir = 0
install_lib = self.reinitialize_command('install_lib')
# we do not want to include pyc or pyo files
install_lib.compile = 0
install_lib.optimize = 0
if self.distribution.has_ext_modules():
# If we are building an installer for a Python version other
# than the one we are currently running, then we need to ensure
# our build_lib reflects the other Python version rather than ours.
# Note that for target_version!=sys.version, we must have skipped the
# build step, so there is no issue with enforcing the build of this
# version.
target_version = self.target_version
if not target_version:
assert self.skip_build, "Should have already checked this"
target_version = sys.version[0:3]
plat_specifier = ".%s-%s" % (self.plat_name, target_version)
build = self.get_finalized_command('build')
build.build_lib = os.path.join(build.build_base,
'lib' + plat_specifier)
log.info("installing to %s", self.bdist_dir)
install.ensure_finalized()
# avoid warning of 'install_lib' about installing
# into a directory not in sys.path
sys.path.insert(0, os.path.join(self.bdist_dir, 'PURELIB'))
install.run()
del sys.path[0]
self.mkpath(self.dist_dir)
fullname = self.distribution.get_fullname()
installer_name = self.get_installer_filename(fullname)
installer_name = os.path.abspath(installer_name)
if os.path.exists(installer_name): os.unlink(installer_name)
metadata = self.distribution.metadata
author = metadata.author
if not author:
author = metadata.maintainer
if not author:
author = "UNKNOWN"
version = metadata.get_version()
# ProductVersion must be strictly numeric
# XXX need to deal with prerelease versions
sversion = "%d.%d.%d" % StrictVersion(version).version
# Prefix ProductName with Python x.y, so that
# it sorts together with the other Python packages
# in Add-Remove-Programs (ARP)
fullname = self.distribution.get_fullname()
if self.target_version:
product_name = "Python %s %s" % (self.target_version, fullname)
else:
product_name = "Python %s" % (fullname)
self.db = msilib.init_database(installer_name, schema,
product_name, msilib.gen_uuid(),
sversion, author)
msilib.add_tables(self.db, sequence)
props = [('DistVersion', version)]
email = metadata.author_email or metadata.maintainer_email
if email:
props.append(("ARPCONTACT", email))
if metadata.url:
props.append(("ARPURLINFOABOUT", metadata.url))
if props:
add_data(self.db, 'Property', props)
self.add_find_python()
self.add_files()
self.add_scripts()
self.add_ui()
self.db.Commit()
if hasattr(self.distribution, 'dist_files'):
tup = 'bdist_msi', self.target_version or 'any', fullname
self.distribution.dist_files.append(tup)
if not self.keep_temp:
remove_tree(self.bdist_dir, dry_run=self.dry_run)
def add_files(self):
db = self.db
cab = msilib.CAB("distfiles")
rootdir = os.path.abspath(self.bdist_dir)
root = Directory(db, cab, None, rootdir, "TARGETDIR", "SourceDir")
f = Feature(db, "Python", "Python", "Everything",
0, 1, directory="TARGETDIR")
items = [(f, root, '')]
for version in self.versions + [self.other_version]:
target = "TARGETDIR" + version
name = default = "Python" + version
desc = "Everything"
if version is self.other_version:
title = "Python from another location"
level = 2
else:
title = "Python %s from registry" % version
level = 1
f = Feature(db, name, title, desc, 1, level, directory=target)
dir = Directory(db, cab, root, rootdir, target, default)
items.append((f, dir, version))
db.Commit()
seen = {}
for feature, dir, version in items:
todo = [dir]
while todo:
dir = todo.pop()
for file in os.listdir(dir.absolute):
afile = os.path.join(dir.absolute, file)
if os.path.isdir(afile):
short = "%s|%s" % (dir.make_short(file), file)
default = file + version
newdir = Directory(db, cab, dir, file, default, short)
todo.append(newdir)
else:
if not dir.component:
dir.start_component(dir.logical, feature, 0)
if afile not in seen:
key = seen[afile] = dir.add_file(file)
if file==self.install_script:
if self.install_script_key:
raise DistutilsOptionError(
"Multiple files with name %s" % file)
self.install_script_key = '[#%s]' % key
else:
key = seen[afile]
add_data(self.db, "DuplicateFile",
[(key + version, dir.component, key, None, dir.logical)])
db.Commit()
cab.commit(db)
def add_find_python(self):
"""Adds code to the installer to compute the location of Python.
Properties PYTHON.MACHINE.X.Y and PYTHON.USER.X.Y will be set from the
registry for each version of Python.
Properties TARGETDIRX.Y will be set from PYTHON.USER.X.Y if defined,
else from PYTHON.MACHINE.X.Y.
Properties PYTHONX.Y will be set to TARGETDIRX.Y\\python.exe"""
start = 402
for ver in self.versions:
install_path = r"SOFTWARE\Python\PythonCore\%s\InstallPath" % ver
machine_reg = "python.machine." + ver
user_reg = "python.user." + ver
machine_prop = "PYTHON.MACHINE." + ver
user_prop = "PYTHON.USER." + ver
machine_action = "PythonFromMachine" + ver
user_action = "PythonFromUser" + ver
exe_action = "PythonExe" + ver
target_dir_prop = "TARGETDIR" + ver
exe_prop = "PYTHON" + ver
if msilib.Win64:
# type: msidbLocatorTypeRawValue + msidbLocatorType64bit
Type = 2+16
else:
Type = 2
add_data(self.db, "RegLocator",
[(machine_reg, 2, install_path, None, Type),
(user_reg, 1, install_path, None, Type)])
add_data(self.db, "AppSearch",
[(machine_prop, machine_reg),
(user_prop, user_reg)])
add_data(self.db, "CustomAction",
[(machine_action, 51+256, target_dir_prop, "[" + machine_prop + "]"),
(user_action, 51+256, target_dir_prop, "[" + user_prop + "]"),
(exe_action, 51+256, exe_prop, "[" + target_dir_prop + "]\\python.exe"),
])
add_data(self.db, "InstallExecuteSequence",
[(machine_action, machine_prop, start),
(user_action, user_prop, start + 1),
(exe_action, None, start + 2),
])
add_data(self.db, "InstallUISequence",
[(machine_action, machine_prop, start),
(user_action, user_prop, start + 1),
(exe_action, None, start + 2),
])
add_data(self.db, "Condition",
[("Python" + ver, 0, "NOT TARGETDIR" + ver)])
start += 4
assert start < 500
def add_scripts(self):
if self.install_script:
start = 6800
for ver in self.versions + [self.other_version]:
install_action = "install_script." + ver
exe_prop = "PYTHON" + ver
add_data(self.db, "CustomAction",
[(install_action, 50, exe_prop, self.install_script_key)])
add_data(self.db, "InstallExecuteSequence",
[(install_action, "&Python%s=3" % ver, start)])
start += 1
# XXX pre-install scripts are currently refused in finalize_options()
# but if this feature is completed, it will also need to add
# entries for each version as the above code does
if self.pre_install_script:
scriptfn = os.path.join(self.bdist_dir, "preinstall.bat")
f = open(scriptfn, "w")
# The batch file will be executed with [PYTHON], so that %1
# is the path to the Python interpreter; %0 will be the path
# of the batch file.
# rem ="""
# %1 %0
# exit
# """
# <actual script>
f.write('rem ="""\n%1 %0\nexit\n"""\n')
f.write(open(self.pre_install_script).read())
f.close()
add_data(self.db, "Binary",
[("PreInstall", msilib.Binary(scriptfn))
])
add_data(self.db, "CustomAction",
[("PreInstall", 2, "PreInstall", None)
])
add_data(self.db, "InstallExecuteSequence",
[("PreInstall", "NOT Installed", 450)])
def add_ui(self):
db = self.db
x = y = 50
w = 370
h = 300
title = "[ProductName] Setup"
# see "Dialog Style Bits"
modal = 3 # visible | modal
modeless = 1 # visible
track_disk_space = 32
# UI customization properties
add_data(db, "Property",
# See "DefaultUIFont Property"
[("DefaultUIFont", "DlgFont8"),
# See "ErrorDialog Style Bit"
("ErrorDialog", "ErrorDlg"),
("Progress1", "Install"), # modified in maintenance type dlg
("Progress2", "installs"),
("MaintenanceForm_Action", "Repair"),
# possible values: ALL, JUSTME
("WhichUsers", "ALL")
])
# Fonts, see "TextStyle Table"
add_data(db, "TextStyle",
[("DlgFont8", "Tahoma", 9, None, 0),
("DlgFontBold8", "Tahoma", 8, None, 1), #bold
("VerdanaBold10", "Verdana", 10, None, 1),
("VerdanaRed9", "Verdana", 9, 255, 0),
])
# UI Sequences, see "InstallUISequence Table", "Using a Sequence Table"
# Numbers indicate sequence; see sequence.py for how these actions integrate
add_data(db, "InstallUISequence",
[("PrepareDlg", "Not Privileged or Windows9x or Installed", 140),
("WhichUsersDlg", "Privileged and not Windows9x and not Installed", 141),
# In the user interface, assume all-users installation if privileged.
("SelectFeaturesDlg", "Not Installed", 1230),
# XXX no support for resume installations yet
#("ResumeDlg", "Installed AND (RESUME OR Preselected)", 1240),
("MaintenanceTypeDlg", "Installed AND NOT RESUME AND NOT Preselected", 1250),
("ProgressDlg", None, 1280)])
add_data(db, 'ActionText', text.ActionText)
add_data(db, 'UIText', text.UIText)
#####################################################################
# Standard dialogs: FatalError, UserExit, ExitDialog
fatal=PyDialog(db, "FatalError", x, y, w, h, modal, title,
"Finish", "Finish", "Finish")
fatal.title("[ProductName] Installer ended prematurely")
fatal.back("< Back", "Finish", active = 0)
fatal.cancel("Cancel", "Back", active = 0)
fatal.text("Description1", 15, 70, 320, 80, 0x30003,
"[ProductName] setup ended prematurely because of an error. Your system has not been modified. To install this program at a later time, please run the installation again.")
fatal.text("Description2", 15, 155, 320, 20, 0x30003,
"Click the Finish button to exit the Installer.")
c=fatal.next("Finish", "Cancel", name="Finish")
c.event("EndDialog", "Exit")
user_exit=PyDialog(db, "UserExit", x, y, w, h, modal, title,
"Finish", "Finish", "Finish")
user_exit.title("[ProductName] Installer was interrupted")
user_exit.back("< Back", "Finish", active = 0)
user_exit.cancel("Cancel", "Back", active = 0)
user_exit.text("Description1", 15, 70, 320, 80, 0x30003,
"[ProductName] setup was interrupted. Your system has not been modified. "
"To install this program at a later time, please run the installation again.")
user_exit.text("Description2", 15, 155, 320, 20, 0x30003,
"Click the Finish button to exit the Installer.")
c = user_exit.next("Finish", "Cancel", name="Finish")
c.event("EndDialog", "Exit")
exit_dialog = PyDialog(db, "ExitDialog", x, y, w, h, modal, title,
"Finish", "Finish", "Finish")
exit_dialog.title("Completing the [ProductName] Installer")
exit_dialog.back("< Back", "Finish", active = 0)
exit_dialog.cancel("Cancel", "Back", active = 0)
exit_dialog.text("Description", 15, 235, 320, 20, 0x30003,
"Click the Finish button to exit the Installer.")
c = exit_dialog.next("Finish", "Cancel", name="Finish")
c.event("EndDialog", "Return")
#####################################################################
# Required dialog: FilesInUse, ErrorDlg
inuse = PyDialog(db, "FilesInUse",
x, y, w, h,
19, # KeepModeless|Modal|Visible
title,
"Retry", "Retry", "Retry", bitmap=False)
inuse.text("Title", 15, 6, 200, 15, 0x30003,
r"{\DlgFontBold8}Files in Use")
inuse.text("Description", 20, 23, 280, 20, 0x30003,
"Some files that need to be updated are currently in use.")
inuse.text("Text", 20, 55, 330, 50, 3,
"The following applications are using files that need to be updated by this setup. Close these applications and then click Retry to continue the installation or Cancel to exit it.")
inuse.control("List", "ListBox", 20, 107, 330, 130, 7, "FileInUseProcess",
None, None, None)
c=inuse.back("Exit", "Ignore", name="Exit")
c.event("EndDialog", "Exit")
c=inuse.next("Ignore", "Retry", name="Ignore")
c.event("EndDialog", "Ignore")
c=inuse.cancel("Retry", "Exit", name="Retry")
c.event("EndDialog","Retry")
# See "Error Dialog". See "ICE20" for the required names of the controls.
error = Dialog(db, "ErrorDlg",
50, 10, 330, 101,
65543, # Error|Minimize|Modal|Visible
title,
"ErrorText", None, None)
error.text("ErrorText", 50,9,280,48,3, "")
#error.control("ErrorIcon", "Icon", 15, 9, 24, 24, 5242881, None, "py.ico", None, None)
error.pushbutton("N",120,72,81,21,3,"No",None).event("EndDialog","ErrorNo")
error.pushbutton("Y",240,72,81,21,3,"Yes",None).event("EndDialog","ErrorYes")
error.pushbutton("A",0,72,81,21,3,"Abort",None).event("EndDialog","ErrorAbort")
error.pushbutton("C",42,72,81,21,3,"Cancel",None).event("EndDialog","ErrorCancel")
error.pushbutton("I",81,72,81,21,3,"Ignore",None).event("EndDialog","ErrorIgnore")
error.pushbutton("O",159,72,81,21,3,"Ok",None).event("EndDialog","ErrorOk")
error.pushbutton("R",198,72,81,21,3,"Retry",None).event("EndDialog","ErrorRetry")
#####################################################################
# Global "Query Cancel" dialog
cancel = Dialog(db, "CancelDlg", 50, 10, 260, 85, 3, title,
"No", "No", "No")
cancel.text("Text", 48, 15, 194, 30, 3,
"Are you sure you want to cancel [ProductName] installation?")
#cancel.control("Icon", "Icon", 15, 15, 24, 24, 5242881, None,
# "py.ico", None, None)
c=cancel.pushbutton("Yes", 72, 57, 56, 17, 3, "Yes", "No")
c.event("EndDialog", "Exit")
c=cancel.pushbutton("No", 132, 57, 56, 17, 3, "No", "Yes")
c.event("EndDialog", "Return")
#####################################################################
# Global "Wait for costing" dialog
costing = Dialog(db, "WaitForCostingDlg", 50, 10, 260, 85, modal, title,
"Return", "Return", "Return")
costing.text("Text", 48, 15, 194, 30, 3,
"Please wait while the installer finishes determining your disk space requirements.")
c = costing.pushbutton("Return", 102, 57, 56, 17, 3, "Return", None)
c.event("EndDialog", "Exit")
#####################################################################
# Preparation dialog: no user input except cancellation
prep = PyDialog(db, "PrepareDlg", x, y, w, h, modeless, title,
"Cancel", "Cancel", "Cancel")
prep.text("Description", 15, 70, 320, 40, 0x30003,
"Please wait while the Installer prepares to guide you through the installation.")
prep.title("Welcome to the [ProductName] Installer")
c=prep.text("ActionText", 15, 110, 320, 20, 0x30003, "Pondering...")
c.mapping("ActionText", "Text")
c=prep.text("ActionData", 15, 135, 320, 30, 0x30003, None)
c.mapping("ActionData", "Text")
prep.back("Back", None, active=0)
prep.next("Next", None, active=0)
c=prep.cancel("Cancel", None)
c.event("SpawnDialog", "CancelDlg")
#####################################################################
# Feature (Python directory) selection
seldlg = PyDialog(db, "SelectFeaturesDlg", x, y, w, h, modal, title,
"Next", "Next", "Cancel")
seldlg.title("Select Python Installations")
seldlg.text("Hint", 15, 30, 300, 20, 3,
"Select the Python locations where %s should be installed."
% self.distribution.get_fullname())
seldlg.back("< Back", None, active=0)
c = seldlg.next("Next >", "Cancel")
order = 1
c.event("[TARGETDIR]", "[SourceDir]", ordering=order)
for version in self.versions + [self.other_version]:
order += 1
c.event("[TARGETDIR]", "[TARGETDIR%s]" % version,
"FEATURE_SELECTED AND &Python%s=3" % version,
ordering=order)
c.event("SpawnWaitDialog", "WaitForCostingDlg", ordering=order + 1)
c.event("EndDialog", "Return", ordering=order + 2)
c = seldlg.cancel("Cancel", "Features")
c.event("SpawnDialog", "CancelDlg")
c = seldlg.control("Features", "SelectionTree", 15, 60, 300, 120, 3,
"FEATURE", None, "PathEdit", None)
c.event("[FEATURE_SELECTED]", "1")
ver = self.other_version
install_other_cond = "FEATURE_SELECTED AND &Python%s=3" % ver
dont_install_other_cond = "FEATURE_SELECTED AND &Python%s<>3" % ver
c = seldlg.text("Other", 15, 200, 300, 15, 3,
"Provide an alternate Python location")
c.condition("Enable", install_other_cond)
c.condition("Show", install_other_cond)
c.condition("Disable", dont_install_other_cond)
c.condition("Hide", dont_install_other_cond)
c = seldlg.control("PathEdit", "PathEdit", 15, 215, 300, 16, 1,
"TARGETDIR" + ver, None, "Next", None)
c.condition("Enable", install_other_cond)
c.condition("Show", install_other_cond)
c.condition("Disable", dont_install_other_cond)
c.condition("Hide", dont_install_other_cond)
#####################################################################
# Disk cost
cost = PyDialog(db, "DiskCostDlg", x, y, w, h, modal, title,
"OK", "OK", "OK", bitmap=False)
cost.text("Title", 15, 6, 200, 15, 0x30003,
"{\DlgFontBold8}Disk Space Requirements")
cost.text("Description", 20, 20, 280, 20, 0x30003,
"The disk space required for the installation of the selected features.")
cost.text("Text", 20, 53, 330, 60, 3,
"The highlighted volumes (if any) do not have enough disk space "
"available for the currently selected features. You can either "
"remove some files from the highlighted volumes, or choose to "
"install less features onto local drive(s), or select different "
"destination drive(s).")
cost.control("VolumeList", "VolumeCostList", 20, 100, 330, 150, 393223,
None, "{120}{70}{70}{70}{70}", None, None)
cost.xbutton("OK", "Ok", None, 0.5).event("EndDialog", "Return")
#####################################################################
# WhichUsers Dialog. Only available on NT, and for privileged users.
# This must be run before FindRelatedProducts, because that will
# take into account whether the previous installation was per-user
# or per-machine. We currently don't support going back to this
# dialog after "Next" was selected; to support this, we would need to
# find how to reset the ALLUSERS property, and how to re-run
# FindRelatedProducts.
# On Windows9x, the ALLUSERS property is ignored on the command line
# and in the Property table, but the installer fails according to the documentation
# if a dialog attempts to set ALLUSERS.
whichusers = PyDialog(db, "WhichUsersDlg", x, y, w, h, modal, title,
"AdminInstall", "Next", "Cancel")
whichusers.title("Select whether to install [ProductName] for all users of this computer.")
# A radio group with two options: allusers, justme
g = whichusers.radiogroup("AdminInstall", 15, 60, 260, 50, 3,
"WhichUsers", "", "Next")
g.add("ALL", 0, 5, 150, 20, "Install for all users")
g.add("JUSTME", 0, 25, 150, 20, "Install just for me")
whichusers.back("Back", None, active=0)
c = whichusers.next("Next >", "Cancel")
c.event("[ALLUSERS]", "1", 'WhichUsers="ALL"', 1)
c.event("EndDialog", "Return", ordering = 2)
c = whichusers.cancel("Cancel", "AdminInstall")
c.event("SpawnDialog", "CancelDlg")
#####################################################################
# Installation Progress dialog (modeless)
progress = PyDialog(db, "ProgressDlg", x, y, w, h, modeless, title,
"Cancel", "Cancel", "Cancel", bitmap=False)
progress.text("Title", 20, 15, 200, 15, 0x30003,
"{\DlgFontBold8}[Progress1] [ProductName]")
progress.text("Text", 35, 65, 300, 30, 3,
"Please wait while the Installer [Progress2] [ProductName]. "
"This may take several minutes.")
progress.text("StatusLabel", 35, 100, 35, 20, 3, "Status:")
c=progress.text("ActionText", 70, 100, w-70, 20, 3, "Pondering...")
c.mapping("ActionText", "Text")
#c=progress.text("ActionData", 35, 140, 300, 20, 3, None)
#c.mapping("ActionData", "Text")
c=progress.control("ProgressBar", "ProgressBar", 35, 120, 300, 10, 65537,
None, "Progress done", None, None)
c.mapping("SetProgress", "Progress")
progress.back("< Back", "Next", active=False)
progress.next("Next >", "Cancel", active=False)
progress.cancel("Cancel", "Back").event("SpawnDialog", "CancelDlg")
###################################################################
# Maintenance type: repair/uninstall
maint = PyDialog(db, "MaintenanceTypeDlg", x, y, w, h, modal, title,
"Next", "Next", "Cancel")
maint.title("Welcome to the [ProductName] Setup Wizard")
maint.text("BodyText", 15, 63, 330, 42, 3,
"Select whether you want to repair or remove [ProductName].")
g=maint.radiogroup("RepairRadioGroup", 15, 108, 330, 60, 3,
"MaintenanceForm_Action", "", "Next")
#g.add("Change", 0, 0, 200, 17, "&Change [ProductName]")
g.add("Repair", 0, 18, 200, 17, "&Repair [ProductName]")
g.add("Remove", 0, 36, 200, 17, "Re&move [ProductName]")
maint.back("< Back", None, active=False)
c=maint.next("Finish", "Cancel")
# Change installation: Change progress dialog to "Change", then ask
# for feature selection
#c.event("[Progress1]", "Change", 'MaintenanceForm_Action="Change"', 1)
#c.event("[Progress2]", "changes", 'MaintenanceForm_Action="Change"', 2)
# Reinstall: Change progress dialog to "Repair", then invoke reinstall
# Also set list of reinstalled features to "ALL"
c.event("[REINSTALL]", "ALL", 'MaintenanceForm_Action="Repair"', 5)
c.event("[Progress1]", "Repairing", 'MaintenanceForm_Action="Repair"', 6)
c.event("[Progress2]", "repairs", 'MaintenanceForm_Action="Repair"', 7)
c.event("Reinstall", "ALL", 'MaintenanceForm_Action="Repair"', 8)
# Uninstall: Change progress to "Remove", then invoke uninstall
# Also set list of removed features to "ALL"
c.event("[REMOVE]", "ALL", 'MaintenanceForm_Action="Remove"', 11)
c.event("[Progress1]", "Removing", 'MaintenanceForm_Action="Remove"', 12)
c.event("[Progress2]", "removes", 'MaintenanceForm_Action="Remove"', 13)
c.event("Remove", "ALL", 'MaintenanceForm_Action="Remove"', 14)
# Close dialog when maintenance action scheduled
c.event("EndDialog", "Return", 'MaintenanceForm_Action<>"Change"', 20)
#c.event("NewDialog", "SelectFeaturesDlg", 'MaintenanceForm_Action="Change"', 21)
maint.cancel("Cancel", "RepairRadioGroup").event("SpawnDialog", "CancelDlg")
def get_installer_filename(self, fullname):
# Factored out to allow overriding in subclasses
if self.target_version:
base_name = "%s.%s-py%s.msi" % (fullname, self.plat_name,
self.target_version)
else:
base_name = "%s.%s.msi" % (fullname, self.plat_name)
installer_name = os.path.join(self.dist_dir, base_name)
return installer_name
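# Example (editor's sketch, hypothetical values): for fullname 'pkg-1.0',
# plat_name 'win-amd64' and target_version '3.4', get_installer_filename()
# returns 'dist/pkg-1.0.win-amd64-py3.4.msi'; without a target version it
# returns 'dist/pkg-1.0.win-amd64.msi'.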
"""distutils.command.bdist_rpm
Implements the Distutils 'bdist_rpm' command (create RPM source and binary
distributions)."""
import subprocess, sys, os
from distutils.core import Command
from distutils.debug import DEBUG
from distutils.util import get_platform
from distutils.file_util import write_file
from distutils.errors import *
from distutils.sysconfig import get_python_version
from distutils import log
class bdist_rpm(Command):
description = "create an RPM distribution"
user_options = [
('bdist-base=', None,
"base directory for creating built distributions"),
('rpm-base=', None,
"base directory for creating RPMs (defaults to \"rpm\" under "
"--bdist-base; must be specified for RPM 2)"),
('dist-dir=', 'd',
"directory to put final RPM files in "
"(and .spec files if --spec-only)"),
('python=', None,
"path to Python interpreter to hard-code in the .spec file "
"(default: \"python\")"),
('fix-python', None,
"hard-code the exact path to the current Python interpreter in "
"the .spec file"),
('spec-only', None,
"only regenerate spec file"),
('source-only', None,
"only generate source RPM"),
('binary-only', None,
"only generate binary RPM"),
('use-bzip2', None,
"use bzip2 instead of gzip to create source distribution"),
# More meta-data: too RPM-specific to put in the setup script,
# but needs to go in the .spec file -- so we make these options
# to "bdist_rpm". The idea is that packagers would put this
# info in setup.cfg, although they are of course free to
# supply it on the command line (see the setup.cfg sketch
# after the option tables below).
('distribution-name=', None,
"name of the (Linux) distribution to which this "
"RPM applies (*not* the name of the module distribution!)"),
('group=', None,
"package classification [default: \"Development/Libraries\"]"),
('release=', None,
"RPM release number"),
('serial=', None,
"RPM serial number"),
('vendor=', None,
"RPM \"vendor\" (eg. \"Joe Blow <[email protected]>\") "
"[default: maintainer or author from setup script]"),
('packager=', None,
"RPM packager (eg. \"Jane Doe <[email protected]>\")"
"[default: vendor]"),
('doc-files=', None,
"list of documentation files (space or comma-separated)"),
('changelog=', None,
"RPM changelog"),
('icon=', None,
"name of icon file"),
('provides=', None,
"capabilities provided by this package"),
('requires=', None,
"capabilities required by this package"),
('conflicts=', None,
"capabilities which conflict with this package"),
('build-requires=', None,
"capabilities required to build this package"),
('obsoletes=', None,
"capabilities made obsolete by this package"),
('no-autoreq', None,
"do not automatically calculate dependencies"),
# Actions to take when building RPM
('keep-temp', 'k',
"don't clean up RPM build directory"),
('no-keep-temp', None,
"clean up RPM build directory [default]"),
('use-rpm-opt-flags', None,
"compile with RPM_OPT_FLAGS when building from source RPM"),
('no-rpm-opt-flags', None,
"do not pass any RPM CFLAGS to compiler"),
('rpm3-mode', None,
"RPM 3 compatibility mode (default)"),
('rpm2-mode', None,
"RPM 2 compatibility mode"),
# Add the hooks necessary for specifying custom scripts
('prep-script=', None,
"Specify a script for the PREP phase of RPM building"),
('build-script=', None,
"Specify a script for the BUILD phase of RPM building"),
('pre-install=', None,
"Specify a script for the pre-INSTALL phase of RPM building"),
('install-script=', None,
"Specify a script for the INSTALL phase of RPM building"),
('post-install=', None,
"Specify a script for the post-INSTALL phase of RPM building"),
('pre-uninstall=', None,
"Specify a script for the pre-UNINSTALL phase of RPM building"),
('post-uninstall=', None,
"Specify a script for the post-UNINSTALL phase of RPM building"),
('clean-script=', None,
"Specify a script for the CLEAN phase of RPM building"),
('verify-script=', None,
"Specify a script for the VERIFY phase of the RPM build"),
# Allow a packager to explicitly force an architecture
('force-arch=', None,
"Force an architecture onto the RPM build process"),
('quiet', 'q',
"Run the INSTALL phase of RPM building in quiet mode"),
]
boolean_options = ['keep-temp', 'use-rpm-opt-flags', 'rpm3-mode',
'no-autoreq', 'quiet']
negative_opt = {'no-keep-temp': 'keep-temp',
'no-rpm-opt-flags': 'use-rpm-opt-flags',
'rpm2-mode': 'rpm3-mode'}
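# Illustrative sketch (a hypothetical project, not from the original
# source): the metadata options above are typically supplied in the
# [bdist_rpm] section of setup.cfg rather than on the command line:
#
#   [bdist_rpm]
#   release = 1
#   group = Development/Libraries
#   vendor = Jane Doe <[email protected]>
#   doc_files = README.txt CHANGES.txt
#   requires = python3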
def initialize_options(self):
self.bdist_base = None
self.rpm_base = None
self.dist_dir = None
self.python = None
self.fix_python = None
self.spec_only = None
self.binary_only = None
self.source_only = None
self.use_bzip2 = None
self.distribution_name = None
self.group = None
self.release = None
self.serial = None
self.vendor = None
self.packager = None
self.doc_files = None
self.changelog = None
self.icon = None
self.prep_script = None
self.build_script = None
self.install_script = None
self.clean_script = None
self.verify_script = None
self.pre_install = None
self.post_install = None
self.pre_uninstall = None
self.post_uninstall = None
self.prep = None
self.provides = None
self.requires = None
self.conflicts = None
self.build_requires = None
self.obsoletes = None
self.keep_temp = 0
self.use_rpm_opt_flags = 1
self.rpm3_mode = 1
self.no_autoreq = 0
self.force_arch = None
self.quiet = 0
def finalize_options(self):
self.set_undefined_options('bdist', ('bdist_base', 'bdist_base'))
if self.rpm_base is None:
if not self.rpm3_mode:
raise DistutilsOptionError(
"you must specify --rpm-base in RPM 2 mode")
self.rpm_base = os.path.join(self.bdist_base, "rpm")
if self.python is None:
if self.fix_python:
self.python = sys.executable
else:
self.python = "python3"
elif self.fix_python:
raise DistutilsOptionError(
"--python and --fix-python are mutually exclusive options")
if os.name != 'posix':
raise DistutilsPlatformError("don't know how to create RPM "
"distributions on platform %s" % os.name)
if self.binary_only and self.source_only:
raise DistutilsOptionError(
"cannot supply both '--source-only' and '--binary-only'")
# don't pass CFLAGS to pure python distributions
if not self.distribution.has_ext_modules():
self.use_rpm_opt_flags = 0
self.set_undefined_options('bdist', ('dist_dir', 'dist_dir'))
self.finalize_package_data()
def finalize_package_data(self):
self.ensure_string('group', "Development/Libraries")
self.ensure_string('vendor',
"%s <%s>" % (self.distribution.get_contact(),
self.distribution.get_contact_email()))
self.ensure_string('packager')
self.ensure_string_list('doc_files')
if isinstance(self.doc_files, list):
for readme in ('README', 'README.txt'):
if os.path.exists(readme) and readme not in self.doc_files:
self.doc_files.append(readme)
self.ensure_string('release', "1")
self.ensure_string('serial') # should it be an int?
self.ensure_string('distribution_name')
self.ensure_string('changelog')
# Format changelog correctly
self.changelog = self._format_changelog(self.changelog)
self.ensure_filename('icon')
self.ensure_filename('prep_script')
self.ensure_filename('build_script')
self.ensure_filename('install_script')
self.ensure_filename('clean_script')
self.ensure_filename('verify_script')
self.ensure_filename('pre_install')
self.ensure_filename('post_install')
self.ensure_filename('pre_uninstall')
self.ensure_filename('post_uninstall')
# XXX don't forget we punted on summaries and descriptions -- they
# should be handled here eventually!
# Now *this* is some meta-data that belongs in the setup script...
self.ensure_string_list('provides')
self.ensure_string_list('requires')
self.ensure_string_list('conflicts')
self.ensure_string_list('build_requires')
self.ensure_string_list('obsoletes')
self.ensure_string('force_arch')
def run(self):
if DEBUG:
print("before _get_package_data():")
print("vendor =", self.vendor)
print("packager =", self.packager)
print("doc_files =", self.doc_files)
print("changelog =", self.changelog)
# make directories
if self.spec_only:
spec_dir = self.dist_dir
self.mkpath(spec_dir)
else:
rpm_dir = {}
for d in ('SOURCES', 'SPECS', 'BUILD', 'RPMS', 'SRPMS'):
rpm_dir[d] = os.path.join(self.rpm_base, d)
self.mkpath(rpm_dir[d])
spec_dir = rpm_dir['SPECS']
# Spec file goes into 'dist_dir' if '--spec-only' is specified,
# build/rpm.<plat> otherwise.
spec_path = os.path.join(spec_dir,
"%s.spec" % self.distribution.get_name())
self.execute(write_file,
(spec_path,
self._make_spec_file()),
"writing '%s'" % spec_path)
if self.spec_only: # stop if requested
return
# Make a source distribution and copy to SOURCES directory with
# optional icon.
saved_dist_files = self.distribution.dist_files[:]
sdist = self.reinitialize_command('sdist')
if self.use_bzip2:
sdist.formats = ['bztar']
else:
sdist.formats = ['gztar']
self.run_command('sdist')
self.distribution.dist_files = saved_dist_files
source = sdist.get_archive_files()[0]
source_dir = rpm_dir['SOURCES']
self.copy_file(source, source_dir)
if self.icon:
if os.path.exists(self.icon):
self.copy_file(self.icon, source_dir)
else:
raise DistutilsFileError(
"icon file '%s' does not exist" % self.icon)
# build package
log.info("building RPMs")
rpm_cmd = ['rpm']
if os.path.exists('/usr/bin/rpmbuild') or \
os.path.exists('/bin/rpmbuild'):
rpm_cmd = ['rpmbuild']
if self.source_only: # what kind of RPMs?
rpm_cmd.append('-bs')
elif self.binary_only:
rpm_cmd.append('-bb')
else:
rpm_cmd.append('-ba')
rpm_cmd.extend(['--define', '__python %s' % self.python])
if self.rpm3_mode:
rpm_cmd.extend(['--define',
'_topdir %s' % os.path.abspath(self.rpm_base)])
if not self.keep_temp:
rpm_cmd.append('--clean')
if self.quiet:
rpm_cmd.append('--quiet')
rpm_cmd.append(spec_path)
# Determine the binary rpm names that should be built out of this spec
# file
# Note that some of these may not be really built (if the file
# list is empty)
nvr_string = "%{name}-%{version}-%{release}"
src_rpm = nvr_string + ".src.rpm"
non_src_rpm = "%{arch}/" + nvr_string + ".%{arch}.rpm"
q_cmd = r"rpm -q --qf '%s %s\n' --specfile '%s'" % (
src_rpm, non_src_rpm, spec_path)
out = os.popen(q_cmd)
try:
binary_rpms = []
source_rpm = None
while True:
line = out.readline()
if not line:
break
l = line.strip().split()
assert(len(l) == 2)
binary_rpms.append(l[1])
# The source rpm is named after the first entry in the spec file
if source_rpm is None:
source_rpm = l[0]
status = out.close()
if status:
raise DistutilsExecError("Failed to execute: %s" % repr(q_cmd))
finally:
out.close()
self.spawn(rpm_cmd)
if not self.dry_run:
if self.distribution.has_ext_modules():
pyversion = get_python_version()
else:
pyversion = 'any'
if not self.binary_only:
srpm = os.path.join(rpm_dir['SRPMS'], source_rpm)
assert(os.path.exists(srpm))
self.move_file(srpm, self.dist_dir)
filename = os.path.join(self.dist_dir, source_rpm)
self.distribution.dist_files.append(
('bdist_rpm', pyversion, filename))
if not self.source_only:
for rpm in binary_rpms:
rpm = os.path.join(rpm_dir['RPMS'], rpm)
if os.path.exists(rpm):
self.move_file(rpm, self.dist_dir)
filename = os.path.join(self.dist_dir,
os.path.basename(rpm))
self.distribution.dist_files.append(
('bdist_rpm', pyversion, filename))
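# Illustrative sketch (hypothetical paths, not from the original
# source): for a default run of this method the assembled command is
# roughly
#   rpmbuild -ba --define '__python python3' \
#       --define '_topdir /abs/path/to/build/rpm' --clean SPECS/spam.spec
# falling back to 'rpm' when no rpmbuild executable is found.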
def _dist_path(self, path):
return os.path.join(self.dist_dir, os.path.basename(path))
def _make_spec_file(self):
"""Generate the text of an RPM spec file and return it as a
list of strings (one per line).
"""
# definitions and headers
spec_file = [
'%define name ' + self.distribution.get_name(),
'%define version ' + self.distribution.get_version().replace('-','_'),
'%define unmangled_version ' + self.distribution.get_version(),
'%define release ' + self.release.replace('-','_'),
'',
'Summary: ' + self.distribution.get_description(),
]
# Workaround for #14443 which affects some RPM based systems such as
# RHEL6 (and probably derivatives)
vendor_hook = subprocess.getoutput('rpm --eval %{__os_install_post}')
# Generate a potential replacement value for __os_install_post (whilst
# normalizing the whitespace to simplify the test for whether the
# invocation of brp-python-bytecompile passes in __python):
vendor_hook = '\n'.join([' %s \\' % line.strip()
for line in vendor_hook.splitlines()])
problem = "brp-python-bytecompile \\\n"
fixed = "brp-python-bytecompile %{__python} \\\n"
fixed_hook = vendor_hook.replace(problem, fixed)
if fixed_hook != vendor_hook:
spec_file.append('# Workaround for http://bugs.python.org/issue14443')
spec_file.append('%define __os_install_post ' + fixed_hook + '\n')
# put locale summaries into spec file
# XXX not supported for now (hard to put a dictionary
# in a config file -- arg!)
#for locale in self.summaries.keys():
# spec_file.append('Summary(%s): %s' % (locale,
# self.summaries[locale]))
spec_file.extend([
'Name: %{name}',
'Version: %{version}',
'Release: %{release}',])
# XXX yuck! this filename is available from the "sdist" command,
# but only after it has run: and we create the spec file before
# running "sdist", in case of --spec-only.
if self.use_bzip2:
spec_file.append('Source0: %{name}-%{unmangled_version}.tar.bz2')
else:
spec_file.append('Source0: %{name}-%{unmangled_version}.tar.gz')
spec_file.extend([
'License: ' + self.distribution.get_license(),
'Group: ' + self.group,
'BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-buildroot',
'Prefix: %{_prefix}', ])
if not self.force_arch:
# noarch if no extension modules
if not self.distribution.has_ext_modules():
spec_file.append('BuildArch: noarch')
else:
spec_file.append('BuildArch: %s' % self.force_arch)
for field in ('Vendor',
'Packager',
'Provides',
'Requires',
'Conflicts',
'Obsoletes',
):
val = getattr(self, field.lower())
if isinstance(val, list):
spec_file.append('%s: %s' % (field, ' '.join(val)))
elif val is not None:
spec_file.append('%s: %s' % (field, val))
if self.distribution.get_url() != 'UNKNOWN':
spec_file.append('Url: ' + self.distribution.get_url())
if self.distribution_name:
spec_file.append('Distribution: ' + self.distribution_name)
if self.build_requires:
spec_file.append('BuildRequires: ' +
' '.join(self.build_requires))
if self.icon:
spec_file.append('Icon: ' + os.path.basename(self.icon))
if self.no_autoreq:
spec_file.append('AutoReq: 0')
spec_file.extend([
'',
'%description',
self.distribution.get_long_description()
])
# put locale descriptions into spec file
# XXX again, suppressed because config file syntax doesn't
# easily support this ;-(
#for locale in self.descriptions.keys():
# spec_file.extend([
# '',
# '%description -l ' + locale,
# self.descriptions[locale],
# ])
# rpm scripts
# figure out default build script
def_setup_call = "%s %s" % (self.python,os.path.basename(sys.argv[0]))
def_build = "%s build" % def_setup_call
if self.use_rpm_opt_flags:
def_build = 'env CFLAGS="$RPM_OPT_FLAGS" ' + def_build
# insert contents of files
# XXX this is kind of misleading: user-supplied options are files
# that we open and interpolate into the spec file, but the defaults
# are just text that we drop in as-is. Hmmm.
install_cmd = ('%s install -O1 --root=$RPM_BUILD_ROOT '
'--record=INSTALLED_FILES') % def_setup_call
script_options = [
('prep', 'prep_script', "%setup -n %{name}-%{unmangled_version}"),
('build', 'build_script', def_build),
('install', 'install_script', install_cmd),
('clean', 'clean_script', "rm -rf $RPM_BUILD_ROOT"),
('verifyscript', 'verify_script', None),
('pre', 'pre_install', None),
('post', 'post_install', None),
('preun', 'pre_uninstall', None),
('postun', 'post_uninstall', None),
]
for (rpm_opt, attr, default) in script_options:
# Insert the contents of the file referred to; if no file is
# referred to, use 'default' as the script contents.
val = getattr(self, attr)
if val or default:
spec_file.extend([
'',
'%' + rpm_opt,])
if val:
with open(val) as script_file:
    spec_file.extend(script_file.read().split('\n'))
else:
spec_file.append(default)
# files section
spec_file.extend([
'',
'%files -f INSTALLED_FILES',
'%defattr(-,root,root)',
])
if self.doc_files:
spec_file.append('%doc ' + ' '.join(self.doc_files))
if self.changelog:
spec_file.extend([
'',
'%changelog',])
spec_file.extend(self.changelog)
return spec_file
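# Illustrative sketch (assumed values, not from the original source):
# for a pure-Python distribution "spam" 1.0 the list built above
# renders roughly as:
#   %define name spam
#   %define version 1.0
#   %define release 1
#   Summary: ...
#   Name: %{name}
#   Version: %{version}
#   Release: %{release}
#   Source0: %{name}-%{unmangled_version}.tar.gz
#   BuildArch: noarch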
def _format_changelog(self, changelog):
"""Format the changelog correctly and convert it to a list of strings
"""
if not changelog:
return changelog
new_changelog = []
for line in changelog.strip().split('\n'):
line = line.strip()
# use startswith() so blank lines inside an entry don't raise IndexError
if line.startswith('*'):
new_changelog.extend(['', line])
elif line.startswith('-'):
new_changelog.append(line)
else:
new_changelog.append(' ' + line)
# strip trailing newline inserted by first changelog entry
if not new_changelog[0]:
del new_changelog[0]
return new_changelog
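# Illustrative usage sketch (a hypothetical invocation, not from the
# original source):
#   python3 setup.py bdist_rpm --release=2 --doc-files=README.txt
# writes the generated .spec file under build/rpm/SPECS and moves the
# finished .rpm and .src.rpm files into dist/.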
"""distutils.command.bdist_wininst
Implements the Distutils 'bdist_wininst' command: create a windows installer
exe-program."""
import sys, os
from distutils.core import Command
from distutils.util import get_platform
from distutils.dir_util import create_tree, remove_tree
from distutils.errors import *
from distutils.sysconfig import get_python_version
from distutils import log
class bdist_wininst(Command):
description = "create an executable installer for MS Windows"
user_options = [('bdist-dir=', None,
"temporary directory for creating the distribution"),
('plat-name=', 'p',
"platform name to embed in generated filenames "
"(default: %s)" % get_platform()),
('keep-temp', 'k',
"keep the pseudo-installation tree around after " +
"creating the distribution archive"),
('target-version=', None,
"require a specific python version" +
" on the target system"),
('no-target-compile', 'c',
"do not compile .py to .pyc on the target system"),
('no-target-optimize', 'o',
"do not compile .py to .pyo (optimized)"
"on the target system"),
('dist-dir=', 'd',
"directory to put final built distributions in"),
('bitmap=', 'b',
"bitmap to use for the installer instead of python-powered logo"),
('title=', 't',
"title to display on the installer background instead of default"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
('install-script=', None,
"basename of installation script to be run after"
"installation or before deinstallation"),
('pre-install-script=', None,
"Fully qualified filename of a script to be run before "
"any files are installed. This script need not be in the "
"distribution"),
('user-access-control=', None,
"specify Vista's UAC handling - 'none'/default=no "
"handling, 'auto'=use UAC if target Python installed for "
"all users, 'force'=always use UAC"),
]
boolean_options = ['keep-temp', 'no-target-compile', 'no-target-optimize',
'skip-build']
def initialize_options(self):
self.bdist_dir = None
self.plat_name = None
self.keep_temp = 0
self.no_target_compile = 0
self.no_target_optimize = 0
self.target_version = None
self.dist_dir = None
self.bitmap = None
self.title = None
self.skip_build = None
self.install_script = None
self.pre_install_script = None
self.user_access_control = None
def finalize_options(self):
self.set_undefined_options('bdist', ('skip_build', 'skip_build'))
if self.bdist_dir is None:
if self.skip_build and self.plat_name:
# If build is skipped and plat_name is overridden, bdist will
# not see the correct 'plat_name' - so set that up manually.
bdist = self.distribution.get_command_obj('bdist')
bdist.plat_name = self.plat_name
# next the command will be initialized using that name
bdist_base = self.get_finalized_command('bdist').bdist_base
self.bdist_dir = os.path.join(bdist_base, 'wininst')
if not self.target_version:
self.target_version = ""
if not self.skip_build and self.distribution.has_ext_modules():
short_version = get_python_version()
if self.target_version and self.target_version != short_version:
raise DistutilsOptionError(
"target version can only be %s, or the '--skip-build'" \
" option must be specified" % (short_version,))
self.target_version = short_version
self.set_undefined_options('bdist',
('dist_dir', 'dist_dir'),
('plat_name', 'plat_name'),
)
if self.install_script:
for script in self.distribution.scripts:
if self.install_script == os.path.basename(script):
break
else:
raise DistutilsOptionError(
"install_script '%s' not found in scripts"
% self.install_script)
def run(self):
if (sys.platform != "win32" and
(self.distribution.has_ext_modules() or
self.distribution.has_c_libraries())):
raise DistutilsPlatformError \
("distribution contains extensions and/or C libraries; "
"must be compiled on a Windows 32 platform")
if not self.skip_build:
self.run_command('build')
install = self.reinitialize_command('install', reinit_subcommands=1)
install.root = self.bdist_dir
install.skip_build = self.skip_build
install.warn_dir = 0
install.plat_name = self.plat_name
install_lib = self.reinitialize_command('install_lib')
# we do not want to include pyc or pyo files
install_lib.compile = 0
install_lib.optimize = 0
if self.distribution.has_ext_modules():
# If we are building an installer for a Python version other
# than the one we are currently running, then we need to ensure
# our build_lib reflects the other Python version rather than ours.
# Note that for target_version!=sys.version, we must have skipped the
# build step, so there is no issue with enforcing the build of this
# version.
target_version = self.target_version
if not target_version:
assert self.skip_build, "Should have already checked this"
target_version = sys.version[0:3]
plat_specifier = ".%s-%s" % (self.plat_name, target_version)
build = self.get_finalized_command('build')
build.build_lib = os.path.join(build.build_base,
'lib' + plat_specifier)
# Use a custom scheme for the zip-file, because we have to decide
# at installation time which scheme to use.
for key in ('purelib', 'platlib', 'headers', 'scripts', 'data'):
value = key.upper()
if key == 'headers':
value = value + '/Include/$dist_name'
setattr(install,
'install_' + key,
value)
log.info("installing to %s", self.bdist_dir)
install.ensure_finalized()
# avoid warning of 'install_lib' about installing
# into a directory not in sys.path
sys.path.insert(0, os.path.join(self.bdist_dir, 'PURELIB'))
install.run()
del sys.path[0]
# And make an archive relative to the root of the
# pseudo-installation tree.
from tempfile import mktemp
archive_basename = mktemp()
fullname = self.distribution.get_fullname()
arcname = self.make_archive(archive_basename, "zip",
root_dir=self.bdist_dir)
# create an exe containing the zip-file
self.create_exe(arcname, fullname, self.bitmap)
if self.distribution.has_ext_modules():
pyversion = get_python_version()
else:
pyversion = 'any'
self.distribution.dist_files.append(('bdist_wininst', pyversion,
self.get_installer_filename(fullname)))
# remove the zip-file again
log.debug("removing temporary file '%s'", arcname)
os.remove(arcname)
if not self.keep_temp:
remove_tree(self.bdist_dir, dry_run=self.dry_run)
def get_inidata(self):
# Return data describing the installation.
lines = []
metadata = self.distribution.metadata
# Write the [metadata] section.
lines.append("[metadata]")
# 'info' will be displayed in the installer's dialog box,
# describing the items to be installed.
info = (metadata.long_description or '') + '\n'
# Escape newline characters
def escape(s):
return s.replace("\n", "\\n")
for name in ["author", "author_email", "description", "maintainer",
"maintainer_email", "name", "url", "version"]:
data = getattr(metadata, name, "")
if data:
info = info + ("\n %s: %s" % \
(name.capitalize(), escape(data)))
lines.append("%s=%s" % (name, escape(data)))
# The [setup] section contains entries controlling
# the installer runtime.
lines.append("\n[Setup]")
if self.install_script:
lines.append("install_script=%s" % self.install_script)
lines.append("info=%s" % escape(info))
lines.append("target_compile=%d" % (not self.no_target_compile))
lines.append("target_optimize=%d" % (not self.no_target_optimize))
if self.target_version:
lines.append("target_version=%s" % self.target_version)
if self.user_access_control:
lines.append("user_access_control=%s" % self.user_access_control)
title = self.title or self.distribution.get_fullname()
lines.append("title=%s" % escape(title))
import time
import distutils
build_info = "Built %s with distutils-%s" % \
(time.ctime(time.time()), distutils.__version__)
lines.append("build_info=%s" % build_info)
return "\n".join(lines)
def create_exe(self, arcname, fullname, bitmap=None):
import struct
self.mkpath(self.dist_dir)
cfgdata = self.get_inidata()
installer_name = self.get_installer_filename(fullname)
self.announce("creating %s" % installer_name)
if bitmap:
bitmapdata = open(bitmap, "rb").read()
bitmaplen = len(bitmapdata)
else:
bitmaplen = 0
file = open(installer_name, "wb")
file.write(self.get_exe_bytes())
if bitmap:
file.write(bitmapdata)
# Encode cfgdata from str to bytes using the mbcs codec
if isinstance(cfgdata, str):
cfgdata = cfgdata.encode("mbcs")
# Append the pre-install script
cfgdata = cfgdata + b"\0"
if self.pre_install_script:
# We need to normalize newlines, so we open in text mode and
# convert back to bytes. "latin-1" simply avoids any possible
# failures.
with open(self.pre_install_script, "r",
encoding="latin-1") as script:
script_data = script.read().encode("latin-1")
cfgdata = cfgdata + script_data + b"\n\0"
else:
# empty pre-install script
cfgdata = cfgdata + b"\0"
file.write(cfgdata)
# The 'magic number' 0x1234567B is used to make sure that the
# binary layout of 'cfgdata' is what the wininst.exe binary
# expects. If the layout changes, increment that number, make
# the corresponding changes to the wininst.exe sources, and
# recompile them.
header = struct.pack("<iii",
0x1234567B, # tag
len(cfgdata), # length
bitmaplen, # number of bytes in bitmap
)
file.write(header)
file.write(open(arcname, "rb").read())
def get_installer_filename(self, fullname):
# Factored out to allow overriding in subclasses
if self.target_version:
# if we create an installer for a specific python version,
# it's better to include this in the name
installer_name = os.path.join(self.dist_dir,
"%s.%s-py%s.exe" %
(fullname, self.plat_name, self.target_version))
else:
installer_name = os.path.join(self.dist_dir,
"%s.%s.exe" % (fullname, self.plat_name))
return installer_name
def get_exe_bytes(self):
from distutils.msvccompiler import get_build_version
# If a target-version other than the current version has been
# specified, then using the MSVC version from *this* build is no good.
# Without actually finding and executing the target version and parsing
# its sys.version, we just hard-code our knowledge of old versions.
# NOTE: Possible alternative is to allow "--target-version" to
# specify a Python executable rather than a simple version string.
# We can then execute this program to obtain any info we need, such
# as the real sys.version string for the build.
cur_version = get_python_version()
if self.target_version and self.target_version != cur_version:
# If the target version is *later* than us, then we assume they
# use what we use
# string compares seem wrong, but are what sysconfig.py itself uses
if self.target_version > cur_version:
bv = get_build_version()
else:
if self.target_version < "2.4":
bv = 6.0
else:
bv = 7.1
else:
# for current version - use authoritative check.
bv = get_build_version()
# wininst-x.y.exe is in the same directory as this file
directory = os.path.dirname(__file__)
# we must use a wininst-x.y.exe built with the same C compiler
# used for python. XXX What about mingw, borland, and so on?
# if plat_name starts with "win" but is not "win32"
# we want to strip "win" and leave the rest (e.g. -amd64)
# for all other cases, we don't want any suffix
if self.plat_name != 'win32' and self.plat_name[:3] == 'win':
sfix = self.plat_name[3:]
else:
sfix = ''
filename = os.path.join(directory, "wininst-%.1f%s.exe" % (bv, sfix))
f = open(filename, "rb")
try:
return f.read()
finally:
f.close()
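# Illustrative usage sketch (hypothetical, not from the original
# source):
#   python setup.py bdist_wininst --title="Spam 1.0"
# produces dist/spam-1.0.win32.exe laid out as: the matching
# wininst-*.exe stub, the optional bitmap, the config data, the
# struct-packed header, and finally the zipped pseudo-install tree.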
"""distutils.command.build
Implements the Distutils 'build' command."""
import sys, os
from distutils.core import Command
from distutils.errors import DistutilsOptionError
from distutils.util import get_platform
def show_compilers():
from distutils.ccompiler import show_compilers
show_compilers()
class build(Command):
description = "build everything needed to install"
user_options = [
('build-base=', 'b',
"base directory for build library"),
('build-purelib=', None,
"build directory for platform-neutral distributions"),
('build-platlib=', None,
"build directory for platform-specific distributions"),
('build-lib=', None,
"build directory for all distribution (defaults to either " +
"build-purelib or build-platlib"),
('build-scripts=', None,
"build directory for scripts"),
('build-temp=', 't',
"temporary build directory"),
('plat-name=', 'p',
"platform name to build for, if supported "
"(default: %s)" % get_platform()),
('compiler=', 'c',
"specify the compiler type"),
('debug', 'g',
"compile extensions and libraries with debugging information"),
('force', 'f',
"forcibly build everything (ignore file timestamps)"),
('executable=', 'e',
"specify final destination interpreter path (build.py)"),
]
boolean_options = ['debug', 'force']
help_options = [
('help-compiler', None,
"list available compilers", show_compilers),
]
def initialize_options(self):
self.build_base = 'build'
# these are decided only after 'build_base' has its final value
# (unless overridden by the user or client)
self.build_purelib = None
self.build_platlib = None
self.build_lib = None
self.build_temp = None
self.build_scripts = None
self.compiler = None
self.plat_name = None
self.debug = None
self.force = 0
self.executable = None
def finalize_options(self):
if self.plat_name is None:
self.plat_name = get_platform()
else:
# plat-name only supported for windows (other platforms are
# supported via ./configure flags, if at all). Avoid misleading
# other platforms.
if os.name != 'nt':
raise DistutilsOptionError(
"--plat-name only supported on Windows (try "
"using './configure --help' on your platform)")
plat_specifier = ".%s-%s" % (self.plat_name, sys.version[0:3])
# Make it so Python builds with and without --with-pydebug don't
# share the same build directories. Doing so confuses the build
# process for C modules
if hasattr(sys, 'gettotalrefcount'):
plat_specifier += '-pydebug'
# 'build_purelib' and 'build_platlib' just default to 'lib' and
# 'lib.<plat>' under the base build directory. We only use one of
# them for a given distribution, though --
if self.build_purelib is None:
self.build_purelib = os.path.join(self.build_base, 'lib')
if self.build_platlib is None:
self.build_platlib = os.path.join(self.build_base,
'lib' + plat_specifier)
# 'build_lib' is the actual directory that we will use for this
# particular module distribution -- if user didn't supply it, pick
# one of 'build_purelib' or 'build_platlib'.
if self.build_lib is None:
if self.distribution.ext_modules:
self.build_lib = self.build_platlib
else:
self.build_lib = self.build_purelib
# 'build_temp' -- temporary directory for compiler turds,
# "build/temp.<plat>"
if self.build_temp is None:
self.build_temp = os.path.join(self.build_base,
'temp' + plat_specifier)
if self.build_scripts is None:
self.build_scripts = os.path.join(self.build_base,
'scripts-' + sys.version[0:3])
if self.executable is None:
self.executable = os.path.normpath(sys.executable)
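# Illustrative sketch (assumed platform values, not from the original
# source): on a win-amd64 Python 3.4 the defaults above work out to
#   build/lib                  (build_purelib)
#   build/lib.win-amd64-3.4    (build_platlib; '-pydebug' is appended
#                               for --with-pydebug interpreters)
#   build/temp.win-amd64-3.4   (build_temp)
#   build/scripts-3.4          (build_scripts)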
def run(self):
# Run all relevant sub-commands. This will be some subset of:
# - build_py - pure Python modules
# - build_clib - standalone C libraries
# - build_ext - Python extensions
# - build_scripts - (Python) scripts
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
# -- Predicates for the sub-command list ---------------------------
def has_pure_modules(self):
return self.distribution.has_pure_modules()
def has_c_libraries(self):
return self.distribution.has_c_libraries()
def has_ext_modules(self):
return self.distribution.has_ext_modules()
def has_scripts(self):
return self.distribution.has_scripts()
sub_commands = [('build_py', has_pure_modules),
('build_clib', has_c_libraries),
('build_ext', has_ext_modules),
('build_scripts', has_scripts),
]
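# Illustrative note (an assumption, not from the original source):
# run() walks sub_commands in order and executes only those whose
# predicate returns true -- a pure-Python distribution with scripts,
# for example, runs just build_py and build_scripts.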
"""distutils.command.build_clib
Implements the Distutils 'build_clib' command, to build a C/C++ library
that is included in the module distribution and needed by an extension
module."""
# XXX this module has *lots* of code ripped-off quite transparently from
# build_ext.py -- not surprisingly really, as the work required to build
# a static library from a collection of C source files is not really all
# that different from what's required to build a shared object file from
# a collection of C source files. Nevertheless, I haven't done the
# necessary refactoring to account for the overlap in code between the
# two modules, mainly because a number of subtle details changed in the
# cut 'n paste. Sigh.
import os
from distutils.core import Command
from distutils.errors import *
from distutils.sysconfig import customize_compiler
from distutils import log
def show_compilers():
from distutils.ccompiler import show_compilers
show_compilers()
class build_clib(Command):
description = "build C/C++ libraries used by Python extensions"
user_options = [
('build-clib=', 'b',
"directory to build C/C++ libraries to"),
('build-temp=', 't',
"directory to put temporary build by-products"),
('debug', 'g',
"compile with debugging information"),
('force', 'f',
"forcibly build everything (ignore file timestamps)"),
('compiler=', 'c',
"specify the compiler type"),
]
boolean_options = ['debug', 'force']
help_options = [
('help-compiler', None,
"list available compilers", show_compilers),
]
def initialize_options(self):
self.build_clib = None
self.build_temp = None
# List of libraries to build
self.libraries = None
# Compilation options for all libraries
self.include_dirs = None
self.define = None
self.undef = None
self.debug = None
self.force = 0
self.compiler = None
def finalize_options(self):
# This might be confusing: both build-clib and build-temp default
# to build-temp as defined by the "build" command. This is because
# I think that C libraries are really just temporary build
# by-products, at least from the point of view of building Python
# extensions -- but I want to keep my options open.
self.set_undefined_options('build',
('build_temp', 'build_clib'),
('build_temp', 'build_temp'),
('compiler', 'compiler'),
('debug', 'debug'),
('force', 'force'))
self.libraries = self.distribution.libraries
if self.libraries:
self.check_library_list(self.libraries)
if self.include_dirs is None:
self.include_dirs = self.distribution.include_dirs or []
if isinstance(self.include_dirs, str):
self.include_dirs = self.include_dirs.split(os.pathsep)
# XXX same as for build_ext -- what about 'self.define' and
# 'self.undef' ?
def run(self):
if not self.libraries:
return
# Yech -- this is cut 'n pasted from build_ext.py!
from distutils.ccompiler import new_compiler
self.compiler = new_compiler(compiler=self.compiler,
dry_run=self.dry_run,
force=self.force)
customize_compiler(self.compiler)
if self.include_dirs is not None:
self.compiler.set_include_dirs(self.include_dirs)
if self.define is not None:
# 'define' option is a list of (name,value) tuples
for (name,value) in self.define:
self.compiler.define_macro(name, value)
if self.undef is not None:
for macro in self.undef:
self.compiler.undefine_macro(macro)
self.build_libraries(self.libraries)
def check_library_list(self, libraries):
"""Ensure that the list of libraries is valid.
`libraries` is presumably provided as the 'libraries' command option.
This method checks that it is a list of 2-tuples, where the tuples
are (library_name, build_info_dict).
Raise DistutilsSetupError if the structure is invalid anywhere;
just returns otherwise.
"""
if not isinstance(libraries, list):
raise DistutilsSetupError(
"'libraries' option must be a list of tuples")
for lib in libraries:
if not isinstance(lib, tuple) or len(lib) != 2:
raise DistutilsSetupError(
"each element of 'libraries' must be a 2-tuple")
name, build_info = lib
if not isinstance(name, str):
raise DistutilsSetupError(
"first element of each tuple in 'libraries' "
"must be a string (the library name)")
if '/' in name or (os.sep != '/' and os.sep in name):
raise DistutilsSetupError("bad library name '%s': "
"may not contain directory separators" % lib[0])
if not isinstance(build_info, dict):
raise DistutilsSetupError(
"second element of each tuple in 'libraries' "
"must be a dictionary (build info)")
def get_library_names(self):
# Assume the library list is valid -- 'check_library_list()' is
# called from 'finalize_options()', so it should be!
if not self.libraries:
return None
lib_names = []
for (lib_name, build_info) in self.libraries:
lib_names.append(lib_name)
return lib_names
def get_source_files(self):
self.check_library_list(self.libraries)
filenames = []
for (lib_name, build_info) in self.libraries:
sources = build_info.get('sources')
if sources is None or not isinstance(sources, (list, tuple)):
raise DistutilsSetupError(
"in 'libraries' option (library '%s'), "
"'sources' must be present and must be "
"a list of source filenames" % lib_name)
filenames.extend(sources)
return filenames
def build_libraries(self, libraries):
for (lib_name, build_info) in libraries:
sources = build_info.get('sources')
if sources is None or not isinstance(sources, (list, tuple)):
raise DistutilsSetupError(
"in 'libraries' option (library '%s'), "
"'sources' must be present and must be "
"a list of source filenames" % lib_name)
sources = list(sources)
log.info("building '%s' library", lib_name)
# First, compile the source code to object files in the library
# directory. (This should probably change to putting object
# files in a temporary build directory.)
macros = build_info.get('macros')
include_dirs = build_info.get('include_dirs')
objects = self.compiler.compile(sources,
output_dir=self.build_temp,
macros=macros,
include_dirs=include_dirs,
debug=self.debug)
# Now "link" the object files together into a static library.
# (On Unix at least, this isn't really linking -- it just
# builds an archive. Whatever.)
self.compiler.create_static_lib(objects, lib_name,
output_dir=self.build_clib,
debug=self.debug)
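# Illustrative sketch (a hypothetical setup script, not from the
# original source): build_clib consumes the 'libraries' option of
# setup(), a list of (name, build_info) 2-tuples, e.g.:
#   setup(...,
#         libraries=[("spamutil", {"sources": ["spamutil.c"],
#                                  "macros": [("NDEBUG", "1")],
#                                  "include_dirs": ["include"]})])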
"""distutils.command.build_ext
Implements the Distutils 'build_ext' command, for building extension
modules (currently limited to C extensions, should accommodate C++
extensions ASAP)."""
import sys, os, re
from distutils.core import Command
from distutils.errors import *
from distutils.sysconfig import customize_compiler, get_python_version
from distutils.sysconfig import get_config_h_filename
from distutils.dep_util import newer_group
from distutils.extension import Extension
from distutils.util import get_platform
from distutils import log
from site import USER_BASE
if os.name == 'nt':
from distutils.msvccompiler import get_build_version
MSVC_VERSION = int(get_build_version())
# An extension name is just a dot-separated list of Python NAMEs (ie.
# the same as a fully-qualified module name).
extension_name_re = re.compile \
(r'^[a-zA-Z_][a-zA-Z_0-9]*(\.[a-zA-Z_][a-zA-Z_0-9]*)*$')
def show_compilers():
from distutils.ccompiler import show_compilers
show_compilers()
class build_ext(Command):
description = "build C/C++ extensions (compile/link to build directory)"
# XXX thoughts on how to deal with complex command-line options like
# these, i.e. how to make it so fancy_getopt can suck them off the
# command line and make it look like setup.py defined the appropriate
# lists of tuples of what-have-you.
# - each command needs a callback to process its command-line options
# - Command.__init__() needs access to its share of the whole
# command line (must ultimately come from
# Distribution.parse_command_line())
# - it then calls the current command class' option-parsing
# callback to deal with weird options like -D, which have to
# parse the option text and churn out some custom data
# structure
# - that data structure (in this case, a list of 2-tuples)
# will then be present in the command object by the time
# we get to finalize_options() (i.e. the constructor
# takes care of both command-line and client options
# in between initialize_options() and finalize_options())
sep_by = " (separated by '%s')" % os.pathsep
user_options = [
('build-lib=', 'b',
"directory for compiled extension modules"),
('build-temp=', 't',
"directory for temporary files (build by-products)"),
('plat-name=', 'p',
"platform name to cross-compile for, if supported "
"(default: %s)" % get_platform()),
('inplace', 'i',
"ignore build-lib and put compiled extensions into the source " +
"directory alongside your pure Python modules"),
('include-dirs=', 'I',
"list of directories to search for header files" + sep_by),
('define=', 'D',
"C preprocessor macros to define"),
('undef=', 'U',
"C preprocessor macros to undefine"),
('libraries=', 'l',
"external C libraries to link with"),
('library-dirs=', 'L',
"directories to search for external C libraries" + sep_by),
('rpath=', 'R',
"directories to search for shared C libraries at runtime"),
('link-objects=', 'O',
"extra explicit link objects to include in the link"),
('debug', 'g',
"compile/link with debugging information"),
('force', 'f',
"forcibly build everything (ignore file timestamps)"),
('compiler=', 'c',
"specify the compiler type"),
('swig-cpp', None,
"make SWIG create C++ files (default is C)"),
('swig-opts=', None,
"list of SWIG command line options"),
('swig=', None,
"path to the SWIG executable"),
('user', None,
"add user include, library and rpath")
]
boolean_options = ['inplace', 'debug', 'force', 'swig-cpp', 'user']
help_options = [
('help-compiler', None,
"list available compilers", show_compilers),
]
def initialize_options(self):
self.extensions = None
self.build_lib = None
self.plat_name = None
self.build_temp = None
self.inplace = 0
self.package = None
self.include_dirs = None
self.define = None
self.undef = None
self.libraries = None
self.library_dirs = None
self.rpath = None
self.link_objects = None
self.debug = None
self.force = None
self.compiler = None
self.swig = None
self.swig_cpp = None
self.swig_opts = None
self.user = None
def finalize_options(self):
from distutils import sysconfig
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('build_temp', 'build_temp'),
('compiler', 'compiler'),
('debug', 'debug'),
('force', 'force'),
('plat_name', 'plat_name'),
)
if self.package is None:
self.package = self.distribution.ext_package
self.extensions = self.distribution.ext_modules
# Make sure Python's include directories (for Python.h, pyconfig.h,
# etc.) are in the include search path.
py_include = sysconfig.get_python_inc()
plat_py_include = sysconfig.get_python_inc(plat_specific=1)
if self.include_dirs is None:
self.include_dirs = self.distribution.include_dirs or []
if isinstance(self.include_dirs, str):
self.include_dirs = self.include_dirs.split(os.pathsep)
# If in a virtualenv, add its include directory
# Issue 16116
if sys.exec_prefix != sys.base_exec_prefix:
self.include_dirs.append(os.path.join(sys.exec_prefix, 'include'))
# Put the Python "system" include dir at the end, so that
# any local include dirs take precedence.
self.include_dirs.append(py_include)
if plat_py_include != py_include:
self.include_dirs.append(plat_py_include)
self.ensure_string_list('libraries')
# Life is easier if we're not forever checking for None, so
# simplify these options to empty lists if unset
if self.libraries is None:
self.libraries = []
if self.library_dirs is None:
self.library_dirs = []
elif isinstance(self.library_dirs, str):
self.library_dirs = self.library_dirs.split(os.pathsep)
if self.rpath is None:
self.rpath = []
elif isinstance(self.rpath, str):
self.rpath = self.rpath.split(os.pathsep)
# for extensions under windows use different directories
# for Release and Debug builds.
# also Python's library directory must be appended to library_dirs
if os.name == 'nt':
# the 'libs' directory is for binary installs - we assume that
# must be the *native* platform. But we don't really support
# cross-compiling via a binary install anyway, so we let it go.
self.library_dirs.append(os.path.join(sys.exec_prefix, 'libs'))
if sys.base_exec_prefix != sys.prefix: # Issue 16116
self.library_dirs.append(os.path.join(sys.base_exec_prefix, 'libs'))
if self.debug:
self.build_temp = os.path.join(self.build_temp, "Debug")
else:
self.build_temp = os.path.join(self.build_temp, "Release")
# Append the source distribution include and library directories,
# this allows distutils on windows to work in the source tree
self.include_dirs.append(os.path.dirname(get_config_h_filename()))
_sys_home = getattr(sys, '_home', None)
if _sys_home:
self.library_dirs.append(_sys_home)
if MSVC_VERSION >= 9:
# Use the .lib files for the correct architecture
if self.plat_name == 'win32':
suffix = ''
else:
# win-amd64 or win-ia64
suffix = self.plat_name[4:]
new_lib = os.path.join(sys.exec_prefix, 'PCbuild')
if suffix:
new_lib = os.path.join(new_lib, suffix)
self.library_dirs.append(new_lib)
elif MSVC_VERSION == 8:
self.library_dirs.append(os.path.join(sys.exec_prefix,
'PC', 'VS8.0'))
elif MSVC_VERSION == 7:
self.library_dirs.append(os.path.join(sys.exec_prefix,
'PC', 'VS7.1'))
else:
self.library_dirs.append(os.path.join(sys.exec_prefix,
'PC', 'VC6'))
# for extensions under Cygwin and AtheOS Python's library directory must be
# appended to library_dirs
if sys.platform[:6] == 'cygwin' or sys.platform[:6] == 'atheos':
if sys.executable.startswith(os.path.join(sys.exec_prefix, "bin")):
# building third party extensions
self.library_dirs.append(os.path.join(sys.prefix, "lib",
"python" + get_python_version(),
"config"))
else:
# building python standard extensions
self.library_dirs.append('.')
# For building extensions with a shared Python library,
# Python's library directory must be appended to library_dirs
# See Issues: #1600860, #4366
if (sysconfig.get_config_var('Py_ENABLE_SHARED')):
if not sysconfig.python_build:
# building third party extensions
self.library_dirs.append(sysconfig.get_config_var('LIBDIR'))
else:
# building python standard extensions
self.library_dirs.append('.')
# The argument parsing will result in self.define being a string, but
# it has to be a list of 2-tuples. All the preprocessor symbols
# specified by the 'define' option will be set to '1'. Multiple
# symbols can be separated with commas.
if self.define:
defines = self.define.split(',')
self.define = [(symbol, '1') for symbol in defines]
# The option for macros to undefine is also a string from the
# option parsing, but has to be a list. Multiple symbols can also
# be separated with commas here.
if self.undef:
self.undef = self.undef.split(',')
if self.swig_opts is None:
self.swig_opts = []
else:
self.swig_opts = self.swig_opts.split(' ')
# Finally add the user include and library directories if requested
if self.user:
user_include = os.path.join(USER_BASE, "include")
user_lib = os.path.join(USER_BASE, "lib")
if os.path.isdir(user_include):
self.include_dirs.append(user_include)
if os.path.isdir(user_lib):
self.library_dirs.append(user_lib)
self.rpath.append(user_lib)
def run(self):
from distutils.ccompiler import new_compiler
# 'self.extensions', as supplied by setup.py, is a list of
# Extension instances. See the documentation for Extension (in
# distutils.extension) for details.
#
# For backwards compatibility with Distutils 0.8.2 and earlier, we
# also allow the 'extensions' list to be a list of tuples:
# (ext_name, build_info)
# where build_info is a dictionary containing everything that
# Extension instances do except the name, with a few things being
# differently named. We convert these 2-tuples to Extension
# instances as needed.
if not self.extensions:
return
# If we were asked to build any C/C++ libraries, make sure that the
# directory where we put them is in the library search path for
# linking extensions.
if self.distribution.has_c_libraries():
build_clib = self.get_finalized_command('build_clib')
self.libraries.extend(build_clib.get_library_names() or [])
self.library_dirs.append(build_clib.build_clib)
# Setup the CCompiler object that we'll use to do all the
# compiling and linking
self.compiler = new_compiler(compiler=self.compiler,
verbose=self.verbose,
dry_run=self.dry_run,
force=self.force)
customize_compiler(self.compiler)
# If we are cross-compiling, init the compiler now (if we are not
# cross-compiling, init would not hurt, but people may rely on
# late initialization of compiler even if they shouldn't...)
if os.name == 'nt' and self.plat_name != get_platform():
self.compiler.initialize(self.plat_name)
# And make sure that any compile/link-related options (which might
# come from the command-line or from the setup script) are set in
# that CCompiler object -- that way, they automatically apply to
# all compiling and linking done here.
if self.include_dirs is not None:
self.compiler.set_include_dirs(self.include_dirs)
if self.define is not None:
# 'define' option is a list of (name,value) tuples
for (name, value) in self.define:
self.compiler.define_macro(name, value)
if self.undef is not None:
for macro in self.undef:
self.compiler.undefine_macro(macro)
if self.libraries is not None:
self.compiler.set_libraries(self.libraries)
if self.library_dirs is not None:
self.compiler.set_library_dirs(self.library_dirs)
if self.rpath is not None:
self.compiler.set_runtime_library_dirs(self.rpath)
if self.link_objects is not None:
self.compiler.set_link_objects(self.link_objects)
# Now actually compile and link everything.
self.build_extensions()
def check_extensions_list(self, extensions):
"""Ensure that the list of extensions (presumably provided as a
command option 'extensions') is valid, i.e. it is a list of
Extension objects. We also support the old-style list of 2-tuples,
where the tuples are (ext_name, build_info), which are converted to
Extension instances here.
Raise DistutilsSetupError if the structure is invalid anywhere;
just returns otherwise.
"""
if not isinstance(extensions, list):
raise DistutilsSetupError(
"'ext_modules' option must be a list of Extension instances")
for i, ext in enumerate(extensions):
if isinstance(ext, Extension):
continue # OK! (assume type-checking done
# by Extension constructor)
if not isinstance(ext, tuple) or len(ext) != 2:
raise DistutilsSetupError(
"each element of 'ext_modules' option must be an "
"Extension instance or 2-tuple")
ext_name, build_info = ext
log.warn(("old-style (ext_name, build_info) tuple found in "
"ext_modules for extension '%s'"
"-- please convert to Extension instance" % ext_name))
if not (isinstance(ext_name, str) and
extension_name_re.match(ext_name)):
raise DistutilsSetupError(
"first element of each tuple in 'ext_modules' "
"must be the extension name (a string)")
if not isinstance(build_info, dict):
raise DistutilsSetupError(
"second element of each tuple in 'ext_modules' "
"must be a dictionary (build info)")
# OK, the (ext_name, build_info) tuple is type-safe: convert it
# to an Extension instance.
ext = Extension(ext_name, build_info['sources'])
# Easy stuff: one-to-one mapping from dict elements to
# instance attributes.
for key in ('include_dirs', 'library_dirs', 'libraries',
'extra_objects', 'extra_compile_args',
'extra_link_args'):
val = build_info.get(key)
if val is not None:
setattr(ext, key, val)
# Medium-easy stuff: same syntax/semantics, different names.
ext.runtime_library_dirs = build_info.get('rpath')
if 'def_file' in build_info:
log.warn("'def_file' element of build info dict "
"no longer supported")
# Non-trivial stuff: 'macros' split into 'define_macros'
# and 'undef_macros'.
macros = build_info.get('macros')
if macros:
ext.define_macros = []
ext.undef_macros = []
for macro in macros:
if not (isinstance(macro, tuple) and len(macro) in (1, 2)):
raise DistutilsSetupError(
"'macros' element of build info dict "
"must be 1- or 2-tuple")
if len(macro) == 1:
ext.undef_macros.append(macro[0])
elif len(macro) == 2:
ext.define_macros.append(macro)
extensions[i] = ext
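# Illustrative sketch (hypothetical names, not from the original
# source): the old-style tuple
#   ("pkg.fast", {"sources": ["fast.c"], "macros": [("X", "1"), ("Y",)]})
# is converted above into roughly
#   Extension("pkg.fast", ["fast.c"], define_macros=[("X", "1")],
#             undef_macros=["Y"])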
def get_source_files(self):
self.check_extensions_list(self.extensions)
filenames = []
# Wouldn't it be neat if we knew the names of header files too...
for ext in self.extensions:
filenames.extend(ext.sources)
return filenames
def get_outputs(self):
# Sanity check the 'extensions' list -- can't assume this is being
# done in the same run as a 'build_extensions()' call (in fact, we
# can probably assume that it *isn't*!).
self.check_extensions_list(self.extensions)
# And build the list of output (built) filenames. Note that this
# ignores the 'inplace' flag, and assumes everything goes in the
# "build" tree.
outputs = []
for ext in self.extensions:
outputs.append(self.get_ext_fullpath(ext.name))
return outputs
def build_extensions(self):
# First, sanity-check the 'extensions' list
self.check_extensions_list(self.extensions)
for ext in self.extensions:
try:
self.build_extension(ext)
except (CCompilerError, DistutilsError, CompileError) as e:
if not ext.optional:
raise
self.warn('building extension "%s" failed: %s' %
(ext.name, e))
def build_extension(self, ext):
sources = ext.sources
if sources is None or not isinstance(sources, (list, tuple)):
raise DistutilsSetupError(
"in 'ext_modules' option (extension '%s'), "
"'sources' must be present and must be "
"a list of source filenames" % ext.name)
sources = list(sources)
ext_path = self.get_ext_fullpath(ext.name)
depends = sources + ext.depends
if not (self.force or newer_group(depends, ext_path, 'newer')):
log.debug("skipping '%s' extension (up-to-date)", ext.name)
return
else:
log.info("building '%s' extension", ext.name)
# First, scan the sources for SWIG definition files (.i), run
# SWIG on 'em to create .c files, and modify the sources list
# accordingly.
sources = self.swig_sources(sources, ext)
# Next, compile the source code to object files.
# XXX not honouring 'define_macros' or 'undef_macros' -- the
# CCompiler API needs to change to accommodate this, and I
# want to do one thing at a time!
# Two possible sources for extra compiler arguments:
# - 'extra_compile_args' in Extension object
# - CFLAGS environment variable (not particularly
# elegant, but people seem to expect it and I
# guess it's useful)
# The environment variable should take precedence, and
# any sensible compiler will give precedence to later
# command line args. Hence we combine them in order:
extra_args = ext.extra_compile_args or []
macros = ext.define_macros[:]
for undef in ext.undef_macros:
macros.append((undef,))
objects = self.compiler.compile(sources,
output_dir=self.build_temp,
macros=macros,
include_dirs=ext.include_dirs,
debug=self.debug,
extra_postargs=extra_args,
depends=ext.depends)
# XXX -- this is a Vile HACK!
#
# The setup.py script for Python on Unix needs to be able to
# get this list so it can perform all the clean up needed to
# avoid keeping object files around when cleaning out a failed
# build of an extension module. Since Distutils does not
# track dependencies, we have to get rid of intermediates to
# ensure all the intermediates will be properly re-built.
#
self._built_objects = objects[:]
# Now link the object files together into a "shared object" --
# of course, first we have to figure out all the other things
# that go into the mix.
if ext.extra_objects:
objects.extend(ext.extra_objects)
extra_args = ext.extra_link_args or []
# Detect target language, if not provided
language = ext.language or self.compiler.detect_language(sources)
self.compiler.link_shared_object(
objects, ext_path,
libraries=self.get_libraries(ext),
library_dirs=ext.library_dirs,
runtime_library_dirs=ext.runtime_library_dirs,
extra_postargs=extra_args,
export_symbols=self.get_export_symbols(ext),
debug=self.debug,
build_temp=self.build_temp,
target_lang=language)
def swig_sources(self, sources, extension):
"""Walk the list of source files in 'sources', looking for SWIG
interface (.i) files. Run SWIG on all that are found, and
return a modified 'sources' list with SWIG source files replaced
by the generated C (or C++) files.
"""
new_sources = []
swig_sources = []
swig_targets = {}
# XXX this drops generated C/C++ files into the source tree, which
# is fine for developers who want to distribute the generated
# source -- but there should be an option to put SWIG output in
# the temp dir.
if self.swig_cpp:
log.warn("--swig-cpp is deprecated - use --swig-opts=-c++")
if self.swig_cpp or ('-c++' in self.swig_opts) or \
('-c++' in extension.swig_opts):
target_ext = '.cpp'
else:
target_ext = '.c'
for source in sources:
(base, ext) = os.path.splitext(source)
if ext == ".i": # SWIG interface file
new_sources.append(base + '_wrap' + target_ext)
swig_sources.append(source)
swig_targets[source] = new_sources[-1]
else:
new_sources.append(source)
if not swig_sources:
return new_sources
swig = self.swig or self.find_swig()
swig_cmd = [swig, "-python"]
swig_cmd.extend(self.swig_opts)
if self.swig_cpp:
swig_cmd.append("-c++")
# Do not override commandline arguments
if not self.swig_opts:
for o in extension.swig_opts:
swig_cmd.append(o)
for source in swig_sources:
target = swig_targets[source]
log.info("swigging %s to %s", source, target)
self.spawn(swig_cmd + ["-o", target, source])
return new_sources
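# Illustrative sketch (hypothetical filenames, not from the original
# source): a source file "spam.i" is rewritten to "spam_wrap.c" (or
# "spam_wrap.cpp" with -c++) and regenerated via
#   swig -python -o spam_wrap.c spam.i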
def find_swig(self):
"""Return the name of the SWIG executable. On Unix, this is
just "swig" -- it should be in the PATH. Tries a bit harder on
Windows.
"""
if os.name == "posix":
return "swig"
elif os.name == "nt":
# Look for SWIG in its standard installation directory on
# Windows (or so I presume!). If we find it there, great;
# if not, act like Unix and assume it's in the PATH.
for vers in ("1.3", "1.2", "1.1"):
fn = os.path.join("c:\\swig%s" % vers, "swig.exe")
if os.path.isfile(fn):
return fn
else:
return "swig.exe"
else:
raise DistutilsPlatformError(
"I don't know how to find (much less run) SWIG "
"on platform '%s'" % os.name)
# -- Name generators -----------------------------------------------
# (extension names, filenames, whatever)
def get_ext_fullpath(self, ext_name):
"""Returns the path of the filename for a given extension.
The file is located in `build_lib` or directly in the package
(inplace option).
"""
fullname = self.get_ext_fullname(ext_name)
modpath = fullname.split('.')
filename = self.get_ext_filename(modpath[-1])
if not self.inplace:
# no further work needed
# returning :
# build_dir/package/path/filename
filename = os.path.join(*modpath[:-1]+[filename])
return os.path.join(self.build_lib, filename)
# the inplace option requires to find the package directory
# using the build_py command for that
package = '.'.join(modpath[0:-1])
build_py = self.get_finalized_command('build_py')
package_dir = os.path.abspath(build_py.get_package_dir(package))
# returning
# package_dir/filename
return os.path.join(package_dir, filename)
def get_ext_fullname(self, ext_name):
"""Returns the fullname of a given extension name.
Adds the `package.` prefix"""
if self.package is None:
return ext_name
else:
return self.package + '.' + ext_name
def get_ext_filename(self, ext_name):
r"""Convert the name of an extension (eg. "foo.bar") into the name
of the file from which it will be loaded (eg. "foo/bar.so", or
"foo\bar.pyd").
"""
from distutils.sysconfig import get_config_var
ext_path = ext_name.split('.')
# extensions in debug_mode are named 'module_d.pyd' under windows
ext_suffix = get_config_var('EXT_SUFFIX')
if os.name == 'nt' and self.debug:
return os.path.join(*ext_path) + '_d' + ext_suffix
return os.path.join(*ext_path) + ext_suffix
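    # A minimal sketch (not part of the module above) of what
    # 'get_ext_filename' computes for a non-debug build: the dotted name
    # becomes a path and the interpreter's EXT_SUFFIX is appended.  The
    # helper and the module name "foo.bar" are hypothetical; the suffix
    # depends on the running interpreter (e.g. ".pyd" on Windows).
    #
    #     import os
    #     from distutils.sysconfig import get_config_var
    #
    #     def preview_ext_filename(ext_name):
    #         return (os.path.join(*ext_name.split('.'))
    #                 + get_config_var('EXT_SUFFIX'))
    #
    #     preview_ext_filename('foo.bar')
    #     # -> e.g. 'foo/bar.cpython-34m.so' on Linux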
def get_export_symbols(self, ext):
"""Return the list of symbols that a shared extension has to
export. This either uses 'ext.export_symbols' or, if it's not
provided, "PyInit_" + module_name. Only relevant on Windows, where
the .pyd file (DLL) must export the module "PyInit_" function.
"""
initfunc_name = "PyInit_" + ext.name.split('.')[-1]
if initfunc_name not in ext.export_symbols:
ext.export_symbols.append(initfunc_name)
return ext.export_symbols
def get_libraries(self, ext):
"""Return the list of libraries to link against when building a
shared extension. On most platforms, this is just 'ext.libraries';
on Windows, we add the Python library (eg. python20.dll).
"""
# The python library is always needed on Windows. For MSVC, this
# is redundant, since the library is mentioned in a pragma in
# pyconfig.h that MSVC groks. The other Windows compilers all seem
# to need it mentioned explicitly, though, so that's what we do.
# Append '_d' to the python import library on debug builds.
if sys.platform == "win32":
from distutils.msvccompiler import MSVCCompiler
if not isinstance(self.compiler, MSVCCompiler):
template = "python%d%d"
if self.debug:
template = template + '_d'
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib]
else:
return ext.libraries
elif sys.platform[:6] == "cygwin":
template = "python%d.%d"
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib]
elif sys.platform[:6] == "atheos":
from distutils import sysconfig
template = "python%d.%d"
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# Get SHLIBS from Makefile
extra = []
for lib in sysconfig.get_config_var('SHLIBS').split():
if lib.startswith('-l'):
extra.append(lib[2:])
else:
extra.append(lib)
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib, "m"] + extra
elif sys.platform == 'darwin':
# Don't use the default code below
return ext.libraries
elif sys.platform[:3] == 'aix':
# Don't use the default code below
return ext.libraries
else:
from distutils import sysconfig
if sysconfig.get_config_var('Py_ENABLE_SHARED'):
pythonlib = 'python{}.{}{}'.format(
sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff,
sys.abiflags)
return ext.libraries + [pythonlib]
else:
return ext.libraries
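# A minimal sketch (not part of the module above) of the "pythonXY"
# import-library name that 'get_libraries' derives from sys.hexversion on
# win32; Python 3.4 yields "python34", with "_d" appended for debug builds.
# The helper name is hypothetical.
import sys

def preview_pythonlib(debug=False):
    template = "python%d%d" + ("_d" if debug else "")
    return template % (sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff)

# preview_pythonlib() -> e.g. 'python34'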
"""distutils.command.build_py
Implements the Distutils 'build_py' command."""
import os
import importlib.util
import sys
from glob import glob
from distutils.core import Command
from distutils.errors import *
from distutils.util import convert_path, Mixin2to3
from distutils import log
class build_py (Command):
description = "\"build\" pure Python modules (copy to build directory)"
user_options = [
('build-lib=', 'd', "directory to \"build\" (copy) to"),
('compile', 'c', "compile .py to .pyc"),
('no-compile', None, "don't compile .py files [default]"),
('optimize=', 'O',
"also compile with optimization: -O1 for \"python -O\", "
"-O2 for \"python -OO\", and -O0 to disable [default: -O0]"),
('force', 'f', "forcibly build everything (ignore file timestamps)"),
]
boolean_options = ['compile', 'force']
negative_opt = {'no-compile' : 'compile'}
def initialize_options(self):
self.build_lib = None
self.py_modules = None
self.package = None
self.package_data = None
self.package_dir = None
self.compile = 0
self.optimize = 0
self.force = None
def finalize_options(self):
self.set_undefined_options('build',
('build_lib', 'build_lib'),
('force', 'force'))
# Get the distribution options that are aliases for build_py
# options -- list of packages and list of modules.
self.packages = self.distribution.packages
self.py_modules = self.distribution.py_modules
self.package_data = self.distribution.package_data
self.package_dir = {}
if self.distribution.package_dir:
for name, path in self.distribution.package_dir.items():
self.package_dir[name] = convert_path(path)
self.data_files = self.get_data_files()
# Ick, copied straight from install_lib.py (fancy_getopt needs a
# type system! Hell, *everything* needs a type system!!!)
if not isinstance(self.optimize, int):
try:
self.optimize = int(self.optimize)
assert 0 <= self.optimize <= 2
except (ValueError, AssertionError):
raise DistutilsOptionError("optimize must be 0, 1, or 2")
def run(self):
# XXX copy_file by default preserves atime and mtime. IMHO this is
# the right thing to do, but perhaps it should be an option -- in
# particular, a site administrator might want installed files to
# reflect the time of installation rather than the last
# modification time before the installed release.
# XXX copy_file by default preserves mode, which appears to be the
# wrong thing to do: if a file is read-only in the working
# directory, we want it to be installed read/write so that the next
# installation of the same module distribution can overwrite it
# without problems. (This might be a Unix-specific issue.) Thus
# we turn off 'preserve_mode' when copying to the build directory,
# since the build directory is supposed to be exactly what the
# installation will look like (ie. we preserve mode when
# installing).
# Two options control which modules will be installed: 'packages'
# and 'py_modules'. The former lets us work with whole packages, not
# specifying individual modules at all; the latter is for
# specifying modules one-at-a-time.
if self.py_modules:
self.build_modules()
if self.packages:
self.build_packages()
self.build_package_data()
self.byte_compile(self.get_outputs(include_bytecode=0))
def get_data_files(self):
"""Generate list of '(package,src_dir,build_dir,filenames)' tuples"""
data = []
if not self.packages:
return data
for package in self.packages:
# Locate package source directory
src_dir = self.get_package_dir(package)
# Compute package build directory
build_dir = os.path.join(*([self.build_lib] + package.split('.')))
# Length of path to strip from found files
plen = 0
if src_dir:
plen = len(src_dir)+1
# Strip directory from globbed filenames
filenames = [
file[plen:] for file in self.find_data_files(package, src_dir)
]
data.append((package, src_dir, build_dir, filenames))
return data
def find_data_files(self, package, src_dir):
"""Return filenames for package's data files in 'src_dir'"""
globs = (self.package_data.get('', [])
+ self.package_data.get(package, []))
files = []
for pattern in globs:
# Each pattern has to be converted to a platform-specific path
filelist = glob(os.path.join(src_dir, convert_path(pattern)))
# Files that match more than one pattern are only added once
files.extend([fn for fn in filelist if fn not in files
and os.path.isfile(fn)])
return files
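    # A minimal sketch (not part of the class above) of how
    # 'find_data_files' resolves 'package_data' globs; the mapping and
    # directory layout are hypothetical.  Patterns under the '' key apply
    # to every package, and duplicate matches are dropped.
    #
    #     import os
    #     from glob import glob
    #
    #     package_data = {'': ['*.txt'], 'spam': ['data/*.json']}
    #     package, src_dir = 'spam', 'spam'
    #     globs = package_data.get('', []) + package_data.get(package, [])
    #     files = []
    #     for pattern in globs:
    #         for fn in glob(os.path.join(src_dir, pattern)):
    #             if fn not in files and os.path.isfile(fn):
    #                 files.append(fn)
    #     # globs -> ['*.txt', 'data/*.json'], matched relative to 'spam/'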
def build_package_data(self):
"""Copy data files into build directory"""
lastdir = None
for package, src_dir, build_dir, filenames in self.data_files:
for filename in filenames:
target = os.path.join(build_dir, filename)
self.mkpath(os.path.dirname(target))
self.copy_file(os.path.join(src_dir, filename), target,
preserve_mode=False)
def get_package_dir(self, package):
"""Return the directory, relative to the top of the source
distribution, where package 'package' should be found
(at least according to the 'package_dir' option, if any)."""
path = package.split('.')
if not self.package_dir:
if path:
return os.path.join(*path)
else:
return ''
else:
tail = []
while path:
try:
pdir = self.package_dir['.'.join(path)]
except KeyError:
tail.insert(0, path[-1])
del path[-1]
else:
tail.insert(0, pdir)
return os.path.join(*tail)
else:
# Oops, got all the way through 'path' without finding a
# match in package_dir. If package_dir defines a directory
# for the root (nameless) package, then fallback on it;
# otherwise, we might as well have not consulted
# package_dir at all, as we just use the directory implied
# by 'tail' (which should be the same as the original value
# of 'path' at this point).
pdir = self.package_dir.get('')
if pdir is not None:
tail.insert(0, pdir)
if tail:
return os.path.join(*tail)
else:
return ''
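    # A minimal sketch (not part of the class above) of the 'package_dir'
    # lookup: the longest matching dotted prefix wins, and any unmatched
    # trailing components are appended as plain directories.  The helper,
    # mapping, and package names below are hypothetical.
    #
    #     import os
    #
    #     def preview_package_dir(package, package_dir):
    #         path = package.split('.') if package else []
    #         tail = []
    #         while path:
    #             try:
    #                 pdir = package_dir['.'.join(path)]
    #             except KeyError:
    #                 tail.insert(0, path[-1])
    #                 del path[-1]
    #             else:
    #                 tail.insert(0, pdir)
    #                 return os.path.join(*tail)
    #         pdir = package_dir.get('')
    #         if pdir is not None:
    #             tail.insert(0, pdir)
    #         return os.path.join(*tail) if tail else ''
    #
    #     preview_package_dir('spam.gui', {'': 'src', 'spam': 'lib/spam'})
    #     # -> 'lib/spam/gui'
    #     preview_package_dir('eggs', {'': 'src', 'spam': 'lib/spam'})
    #     # -> 'src/eggs'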
def check_package(self, package, package_dir):
# Empty dir name means current directory, which we can probably
# assume exists. Also, os.path.exists and isdir don't know about
# my "empty string means current dir" convention, so we have to
# circumvent them.
if package_dir != "":
if not os.path.exists(package_dir):
raise DistutilsFileError(
"package directory '%s' does not exist" % package_dir)
if not os.path.isdir(package_dir):
raise DistutilsFileError(
"supposed package directory '%s' exists, "
"but is not a directory" % package_dir)
# Require __init__.py for all but the "root package"
if package:
init_py = os.path.join(package_dir, "__init__.py")
if os.path.isfile(init_py):
return init_py
else:
log.warn(("package init file '%s' not found " +
"(or not a regular file)"), init_py)
# Either not in a package at all (__init__.py not expected), or
# __init__.py doesn't exist -- so don't return the filename.
return None
def check_module(self, module, module_file):
if not os.path.isfile(module_file):
log.warn("file %s (for module %s) not found", module_file, module)
return False
else:
return True
def find_package_modules(self, package, package_dir):
self.check_package(package, package_dir)
module_files = glob(os.path.join(package_dir, "*.py"))
modules = []
setup_script = os.path.abspath(self.distribution.script_name)
for f in module_files:
abs_f = os.path.abspath(f)
if abs_f != setup_script:
module = os.path.splitext(os.path.basename(f))[0]
modules.append((package, module, f))
else:
self.debug_print("excluding %s" % setup_script)
return modules
def find_modules(self):
"""Finds individually-specified Python modules, ie. those listed by
module name in 'self.py_modules'. Returns a list of tuples (package,
        module_base, filename): 'package' is the dotted name of the
        package containing the module; 'module_base' is the bare (no
packages, no dots) module name, and 'filename' is the path to the
".py" file (relative to the distribution root) that implements the
module.
"""
# Map package names to tuples of useful info about the package:
# (package_dir, checked)
# package_dir - the directory where we'll find source files for
# this package
# checked - true if we have checked that the package directory
# is valid (exists, contains __init__.py, ... ?)
packages = {}
# List of (package, module, filename) tuples to return
modules = []
# We treat modules-in-packages almost the same as toplevel modules,
# just the "package" for a toplevel is empty (either an empty
# string or empty list, depending on context). Differences:
# - don't check for __init__.py in directory for empty package
for module in self.py_modules:
path = module.split('.')
package = '.'.join(path[0:-1])
module_base = path[-1]
try:
(package_dir, checked) = packages[package]
except KeyError:
package_dir = self.get_package_dir(package)
checked = 0
if not checked:
init_py = self.check_package(package, package_dir)
packages[package] = (package_dir, 1)
if init_py:
modules.append((package, "__init__", init_py))
# XXX perhaps we should also check for just .pyc files
# (so greedy closed-source bastards can distribute Python
# modules too)
module_file = os.path.join(package_dir, module_base + ".py")
if not self.check_module(module, module_file):
continue
modules.append((package, module_base, module_file))
return modules
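    # A minimal sketch (not part of the class above) of the decomposition
    # 'find_modules' applies to each dotted name in 'py_modules'; the
    # module name is hypothetical.
    #
    #     module = 'spam.eggs.utils'
    #     path = module.split('.')
    #     package, module_base = '.'.join(path[:-1]), path[-1]
    #     # package == 'spam.eggs', module_base == 'utils',
    #     # and the source is expected at <package_dir>/utils.py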
def find_all_modules(self):
"""Compute the list of all modules that will be built, whether
they are specified one-module-at-a-time ('self.py_modules') or
by whole packages ('self.packages'). Return a list of tuples
(package, module, module_file), just like 'find_modules()' and
'find_package_modules()' do."""
modules = []
if self.py_modules:
modules.extend(self.find_modules())
if self.packages:
for package in self.packages:
package_dir = self.get_package_dir(package)
m = self.find_package_modules(package, package_dir)
modules.extend(m)
return modules
def get_source_files(self):
return [module[-1] for module in self.find_all_modules()]
def get_module_outfile(self, build_dir, package, module):
outfile_path = [build_dir] + list(package) + [module + ".py"]
return os.path.join(*outfile_path)
def get_outputs(self, include_bytecode=1):
modules = self.find_all_modules()
outputs = []
for (package, module, module_file) in modules:
package = package.split('.')
filename = self.get_module_outfile(self.build_lib, package, module)
outputs.append(filename)
if include_bytecode:
if self.compile:
outputs.append(importlib.util.cache_from_source(
filename, debug_override=True))
if self.optimize > 0:
outputs.append(importlib.util.cache_from_source(
filename, debug_override=False))
outputs += [
os.path.join(build_dir, filename)
for package, src_dir, build_dir, filenames in self.data_files
for filename in filenames
]
return outputs
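    # A minimal sketch (not part of the class above) of the bytecode paths
    # 'get_outputs' adds when compile/optimize are enabled, using the same
    # importlib helper; the source path is hypothetical.
    #
    #     import importlib.util
    #
    #     src = 'build/lib/spam/eggs.py'
    #     importlib.util.cache_from_source(src, debug_override=True)
    #     # -> the .pyc path under build/lib/spam/__pycache__/
    #     importlib.util.cache_from_source(src, debug_override=False)
    #     # -> the corresponding .pyo path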
def build_module(self, module, module_file, package):
if isinstance(package, str):
package = package.split('.')
elif not isinstance(package, (list, tuple)):
raise TypeError(
"'package' must be a string (dot-separated), list, or tuple")
# Now put the module source file into the "build" area -- this is
# easy, we just copy it somewhere under self.build_lib (the build
# directory for Python source).
outfile = self.get_module_outfile(self.build_lib, package, module)
dir = os.path.dirname(outfile)
self.mkpath(dir)
return self.copy_file(module_file, outfile, preserve_mode=0)
def build_modules(self):
modules = self.find_modules()
for (package, module, module_file) in modules:
# Now "build" the module -- ie. copy the source file to
# self.build_lib (the build directory for Python source).
# (Actually, it gets copied to the directory for this package
# under self.build_lib.)
self.build_module(module, module_file, package)
def build_packages(self):
for package in self.packages:
# Get list of (package, module, module_file) tuples based on
# scanning the package directory. 'package' is only included
# in the tuple so that 'find_modules()' and
            # 'find_package_modules()' have a consistent interface; it's
# ignored here (apart from a sanity check). Also, 'module' is
# the *unqualified* module name (ie. no dots, no package -- we
# already know its package!), and 'module_file' is the path to
# the .py file, relative to the current directory
# (ie. including 'package_dir').
package_dir = self.get_package_dir(package)
modules = self.find_package_modules(package, package_dir)
# Now loop over the modules we found, "building" each one (just
# copy it to self.build_lib).
for (package_, module, module_file) in modules:
assert package == package_
self.build_module(module, module_file, package)
def byte_compile(self, files):
if sys.dont_write_bytecode:
self.warn('byte-compiling is disabled, skipping.')
return
from distutils.util import byte_compile
prefix = self.build_lib
if prefix[-1] != os.sep:
prefix = prefix + os.sep
# XXX this code is essentially the same as the 'byte_compile()
# method of the "install_lib" command, except for the determination
# of the 'prefix' string. Hmmm.
if self.compile:
byte_compile(files, optimize=0,
force=self.force, prefix=prefix, dry_run=self.dry_run)
if self.optimize > 0:
byte_compile(files, optimize=self.optimize,
force=self.force, prefix=prefix, dry_run=self.dry_run)
class build_py_2to3(build_py, Mixin2to3):
def run(self):
self.updated_files = []
# Base class code
if self.py_modules:
self.build_modules()
if self.packages:
self.build_packages()
self.build_package_data()
# 2to3
self.run_2to3(self.updated_files)
# Remaining base class code
self.byte_compile(self.get_outputs(include_bytecode=0))
def build_module(self, module, module_file, package):
res = build_py.build_module(self, module, module_file, package)
if res[1]:
# file was copied
self.updated_files.append(res[0])
return res
"""distutils.command.build_scripts
Implements the Distutils 'build_scripts' command."""
import os, re
from stat import ST_MODE
from distutils import sysconfig
from distutils.core import Command
from distutils.dep_util import newer
from distutils.util import convert_path, Mixin2to3
from distutils import log
import tokenize
# check if Python is called on the first line with this expression
first_line_re = re.compile(b'^#!.*python[0-9.]*([ \t].*)?$')
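# A quick sketch of what 'first_line_re' (defined just above) accepts; the
# shebang lines are hypothetical.  Anything invoking some python matches
# (and will be rewritten); other interpreters are left alone.
assert first_line_re.match(b'#!/usr/bin/env python3\n')
assert first_line_re.match(b'#!/usr/bin/python -u\n').group(1) == b' -u'
assert first_line_re.match(b'#!/bin/sh\n') is None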
class build_scripts(Command):
description = "\"build\" scripts (copy and fixup #! line)"
user_options = [
('build-dir=', 'd', "directory to \"build\" (copy) to"),
('force', 'f', "forcibly build everything (ignore file timestamps"),
('executable=', 'e', "specify final destination interpreter path"),
]
boolean_options = ['force']
def initialize_options(self):
self.build_dir = None
self.scripts = None
self.force = None
self.executable = None
self.outfiles = None
def finalize_options(self):
self.set_undefined_options('build',
('build_scripts', 'build_dir'),
('force', 'force'),
('executable', 'executable'))
self.scripts = self.distribution.scripts
def get_source_files(self):
return self.scripts
def run(self):
if not self.scripts:
return
self.copy_scripts()
def copy_scripts(self):
"""Copy each script listed in 'self.scripts'; if it's marked as a
Python script in the Unix way (first line matches 'first_line_re',
        ie. starts with "#!" and contains "python"), then adjust the first
line to refer to the current Python interpreter as we copy.
"""
self.mkpath(self.build_dir)
outfiles = []
updated_files = []
for script in self.scripts:
adjust = False
script = convert_path(script)
outfile = os.path.join(self.build_dir, os.path.basename(script))
outfiles.append(outfile)
if not self.force and not newer(script, outfile):
log.debug("not copying %s (up-to-date)", script)
continue
# Always open the file, but ignore failures in dry-run mode --
# that way, we'll get accurate feedback if we can read the
# script.
try:
f = open(script, "rb")
except OSError:
if not self.dry_run:
raise
f = None
else:
encoding, lines = tokenize.detect_encoding(f.readline)
f.seek(0)
first_line = f.readline()
if not first_line:
self.warn("%s is an empty file (skipping)" % script)
continue
match = first_line_re.match(first_line)
if match:
adjust = True
post_interp = match.group(1) or b''
if adjust:
log.info("copying and adjusting %s -> %s", script,
self.build_dir)
updated_files.append(outfile)
if not self.dry_run:
if not sysconfig.python_build:
executable = self.executable
else:
executable = os.path.join(
sysconfig.get_config_var("BINDIR"),
"python%s%s" % (sysconfig.get_config_var("VERSION"),
sysconfig.get_config_var("EXE")))
executable = os.fsencode(executable)
shebang = b"#!" + executable + post_interp + b"\n"
                    # The Python parser starts reading a script as UTF-8 until
                    # it sees a #coding:xxx cookie.  Since the shebang must be
                    # the first line of the file, the cookie cannot come before
                    # it, so the shebang has to be decodable from UTF-8.
try:
shebang.decode('utf-8')
except UnicodeDecodeError:
raise ValueError(
"The shebang ({!r}) is not decodable "
"from utf-8".format(shebang))
# If the script is encoded to a custom encoding (use a
# #coding:xxx cookie), the shebang has to be decodable from
# the script encoding too.
try:
shebang.decode(encoding)
except UnicodeDecodeError:
raise ValueError(
"The shebang ({!r}) is not decodable "
"from the script encoding ({})"
.format(shebang, encoding))
with open(outfile, "wb") as outf:
outf.write(shebang)
outf.writelines(f.readlines())
if f:
f.close()
else:
if f:
f.close()
updated_files.append(outfile)
self.copy_file(script, outfile)
if os.name == 'posix':
for file in outfiles:
if self.dry_run:
log.info("changing mode of %s", file)
else:
oldmode = os.stat(file)[ST_MODE] & 0o7777
newmode = (oldmode | 0o555) & 0o7777
if newmode != oldmode:
log.info("changing mode of %s from %o to %o",
file, oldmode, newmode)
os.chmod(file, newmode)
# XXX should we modify self.outfiles?
return outfiles, updated_files
class build_scripts_2to3(build_scripts, Mixin2to3):
def copy_scripts(self):
outfiles, updated_files = build_scripts.copy_scripts(self)
if not self.dry_run:
self.run_2to3(updated_files)
return outfiles, updated_files
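# A minimal sketch (not part of the module above) of the shebang assembly
# in 'copy_scripts'; the interpreter path and arguments are hypothetical.
# The result must be decodable both as UTF-8 and in the script's own
# encoding.
import os

executable = os.fsencode('/usr/local/bin/python3.4')
post_interp = b' -u'
shebang = b'#!' + executable + post_interp + b'\n'
shebang.decode('utf-8')  # raises UnicodeDecodeError if the path isn't UTF-8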
"""distutils.command.check
Implements the Distutils 'check' command.
"""
from distutils.core import Command
from distutils.errors import DistutilsSetupError
try:
# docutils is installed
from docutils.utils import Reporter
from docutils.parsers.rst import Parser
from docutils import frontend
from docutils import nodes
from io import StringIO
class SilentReporter(Reporter):
def __init__(self, source, report_level, halt_level, stream=None,
debug=0, encoding='ascii', error_handler='replace'):
self.messages = []
Reporter.__init__(self, source, report_level, halt_level, stream,
debug, encoding, error_handler)
def system_message(self, level, message, *children, **kwargs):
self.messages.append((level, message, children, kwargs))
return nodes.system_message(message, level=level,
type=self.levels[level],
*children, **kwargs)
HAS_DOCUTILS = True
except Exception:
# Catch all exceptions because exceptions besides ImportError probably
# indicate that docutils is not ported to Py3k.
HAS_DOCUTILS = False
class check(Command):
"""This command checks the meta-data of the package.
"""
description = ("perform some checks on the package")
user_options = [('metadata', 'm', 'Verify meta-data'),
('restructuredtext', 'r',
                     ('Checks if long string meta-data syntax '
                      'is reStructuredText-compliant')),
('strict', 's',
'Will exit with an error if a check fails')]
boolean_options = ['metadata', 'restructuredtext', 'strict']
def initialize_options(self):
"""Sets default values for options."""
self.restructuredtext = 0
self.metadata = 1
self.strict = 0
self._warnings = 0
def finalize_options(self):
pass
def warn(self, msg):
"""Counts the number of warnings that occurs."""
self._warnings += 1
return Command.warn(self, msg)
def run(self):
"""Runs the command."""
# perform the various tests
if self.metadata:
self.check_metadata()
if self.restructuredtext:
if HAS_DOCUTILS:
self.check_restructuredtext()
elif self.strict:
raise DistutilsSetupError('The docutils package is needed.')
# let's raise an error in strict mode, if we have at least
# one warning
if self.strict and self._warnings > 0:
raise DistutilsSetupError('Please correct your package.')
def check_metadata(self):
"""Ensures that all required elements of meta-data are supplied.
name, version, URL, (author and author_email) or
(maintainer and maintainer_email)).
Warns if any are missing.
"""
metadata = self.distribution.metadata
missing = []
for attr in ('name', 'version', 'url'):
if not (hasattr(metadata, attr) and getattr(metadata, attr)):
missing.append(attr)
if missing:
self.warn("missing required meta-data: %s" % ', '.join(missing))
if metadata.author:
if not metadata.author_email:
self.warn("missing meta-data: if 'author' supplied, " +
"'author_email' must be supplied too")
elif metadata.maintainer:
if not metadata.maintainer_email:
self.warn("missing meta-data: if 'maintainer' supplied, " +
"'maintainer_email' must be supplied too")
else:
self.warn("missing meta-data: either (author and author_email) " +
"or (maintainer and maintainer_email) " +
"must be supplied")
def check_restructuredtext(self):
"""Checks if the long string fields are reST-compliant."""
data = self.distribution.get_long_description()
for warning in self._check_rst_data(data):
line = warning[-1].get('line')
if line is None:
warning = warning[1]
else:
warning = '%s (line %s)' % (warning[1], line)
self.warn(warning)
def _check_rst_data(self, data):
"""Returns warnings when the provided data doesn't compile."""
source_path = StringIO()
parser = Parser()
settings = frontend.OptionParser(components=(Parser,)).get_default_values()
settings.tab_width = 4
settings.pep_references = None
settings.rfc_references = None
reporter = SilentReporter(source_path,
settings.report_level,
settings.halt_level,
stream=settings.warning_stream,
debug=settings.debug,
encoding=settings.error_encoding,
error_handler=settings.error_encoding_error_handler)
document = nodes.document(settings, reporter, source=source_path)
document.note_source(source_path, -1)
try:
parser.parse(data, document)
except AttributeError as e:
reporter.messages.append(
(-1, 'Could not finish the parsing: %s.' % e, '', {}))
return reporter.messages
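# A hypothetical setup.py exercising the 'check' command above; all names
# and the URL are placeholders.  With --strict, any warning from
# check_metadata/check_restructuredtext becomes a hard error:
#
#     python setup.py check --metadata --strict
#
#     from distutils.core import setup
#
#     setup(name='spam', version='1.0', url='http://example.com/spam',
#           author='A. Nonymous', author_email='a@example.com')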
"""distutils.command.clean
Implements the Distutils 'clean' command."""
# contributed by Bastian Kleineidam <[email protected]>, added 2000-03-18
import os
from distutils.core import Command
from distutils.dir_util import remove_tree
from distutils import log
class clean(Command):
description = "clean up temporary files from 'build' command"
user_options = [
('build-base=', 'b',
"base build directory (default: 'build.build-base')"),
('build-lib=', None,
"build directory for all modules (default: 'build.build-lib')"),
('build-temp=', 't',
"temporary build directory (default: 'build.build-temp')"),
('build-scripts=', None,
"build directory for scripts (default: 'build.build-scripts')"),
('bdist-base=', None,
"temporary directory for built distributions"),
('all', 'a',
"remove all build output, not just temporary by-products")
]
boolean_options = ['all']
def initialize_options(self):
self.build_base = None
self.build_lib = None
self.build_temp = None
self.build_scripts = None
self.bdist_base = None
self.all = None
def finalize_options(self):
self.set_undefined_options('build',
('build_base', 'build_base'),
('build_lib', 'build_lib'),
('build_scripts', 'build_scripts'),
('build_temp', 'build_temp'))
self.set_undefined_options('bdist',
('bdist_base', 'bdist_base'))
def run(self):
# remove the build/temp.<plat> directory (unless it's already
# gone)
if os.path.exists(self.build_temp):
remove_tree(self.build_temp, dry_run=self.dry_run)
else:
log.debug("'%s' does not exist -- can't clean it",
self.build_temp)
if self.all:
# remove build directories
for directory in (self.build_lib,
self.bdist_base,
self.build_scripts):
if os.path.exists(directory):
remove_tree(directory, dry_run=self.dry_run)
else:
log.warn("'%s' does not exist -- can't clean it",
directory)
# just for the heck of it, try to remove the base build directory:
# we might have emptied it right now, but if not we don't care
if not self.dry_run:
try:
os.rmdir(self.build_base)
log.info("removing '%s'", self.build_base)
except OSError:
pass
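# Typical invocations of the 'clean' command above, from a source tree
# with a setup.py: the bare command removes only build/temp.<plat>, while
# --all also removes the build_lib, bdist_base, and build_scripts
# directories.
#
#     python setup.py clean
#     python setup.py clean --all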
"""distutils.command.config
Implements the Distutils 'config' command, a (mostly) empty command class
that exists mainly to be sub-classed by specific module distributions and
applications. The idea is that while every "config" command is different,
at least they're all named the same, and users always see "config" in the
list of standard commands. Also, this is a good place to put common
configure-like tasks: "try to compile this C code", or "figure out where
this header file lives".
"""
import sys, os, re
from distutils.core import Command
from distutils.errors import DistutilsExecError
from distutils.sysconfig import customize_compiler
from distutils import log
LANG_EXT = {"c": ".c", "c++": ".cxx"}
class config(Command):
description = "prepare to build"
user_options = [
('compiler=', None,
"specify the compiler type"),
('cc=', None,
"specify the compiler executable"),
('include-dirs=', 'I',
"list of directories to search for header files"),
('define=', 'D',
"C preprocessor macros to define"),
('undef=', 'U',
"C preprocessor macros to undefine"),
('libraries=', 'l',
"external C libraries to link with"),
('library-dirs=', 'L',
"directories to search for external C libraries"),
('noisy', None,
"show every action (compile, link, run, ...) taken"),
('dump-source', None,
"dump generated source files before attempting to compile them"),
]
# The three standard command methods: since the "config" command
# does nothing by default, these are empty.
def initialize_options(self):
self.compiler = None
self.cc = None
self.include_dirs = None
self.libraries = None
self.library_dirs = None
# maximal output for now
self.noisy = 1
self.dump_source = 1
# list of temporary files generated along-the-way that we have
# to clean at some point
self.temp_files = []
def finalize_options(self):
if self.include_dirs is None:
self.include_dirs = self.distribution.include_dirs or []
elif isinstance(self.include_dirs, str):
self.include_dirs = self.include_dirs.split(os.pathsep)
if self.libraries is None:
self.libraries = []
elif isinstance(self.libraries, str):
self.libraries = [self.libraries]
if self.library_dirs is None:
self.library_dirs = []
elif isinstance(self.library_dirs, str):
self.library_dirs = self.library_dirs.split(os.pathsep)
def run(self):
pass
# Utility methods for actual "config" commands. The interfaces are
# loosely based on Autoconf macros of similar names. Sub-classes
# may use these freely.
def _check_compiler(self):
"""Check that 'self.compiler' really is a CCompiler object;
if not, make it one.
"""
# We do this late, and only on-demand, because this is an expensive
# import.
from distutils.ccompiler import CCompiler, new_compiler
if not isinstance(self.compiler, CCompiler):
self.compiler = new_compiler(compiler=self.compiler,
dry_run=self.dry_run, force=1)
customize_compiler(self.compiler)
if self.include_dirs:
self.compiler.set_include_dirs(self.include_dirs)
if self.libraries:
self.compiler.set_libraries(self.libraries)
if self.library_dirs:
self.compiler.set_library_dirs(self.library_dirs)
def _gen_temp_sourcefile(self, body, headers, lang):
filename = "_configtest" + LANG_EXT[lang]
file = open(filename, "w")
if headers:
for header in headers:
file.write("#include <%s>\n" % header)
file.write("\n")
file.write(body)
if body[-1] != "\n":
file.write("\n")
file.close()
return filename
def _preprocess(self, body, headers, include_dirs, lang):
src = self._gen_temp_sourcefile(body, headers, lang)
out = "_configtest.i"
self.temp_files.extend([src, out])
self.compiler.preprocess(src, out, include_dirs=include_dirs)
return (src, out)
def _compile(self, body, headers, include_dirs, lang):
src = self._gen_temp_sourcefile(body, headers, lang)
if self.dump_source:
dump_file(src, "compiling '%s':" % src)
(obj,) = self.compiler.object_filenames([src])
self.temp_files.extend([src, obj])
self.compiler.compile([src], include_dirs=include_dirs)
return (src, obj)
def _link(self, body, headers, include_dirs, libraries, library_dirs,
lang):
(src, obj) = self._compile(body, headers, include_dirs, lang)
prog = os.path.splitext(os.path.basename(src))[0]
self.compiler.link_executable([obj], prog,
libraries=libraries,
library_dirs=library_dirs,
target_lang=lang)
if self.compiler.exe_extension is not None:
prog = prog + self.compiler.exe_extension
self.temp_files.append(prog)
return (src, obj, prog)
def _clean(self, *filenames):
if not filenames:
filenames = self.temp_files
self.temp_files = []
log.info("removing: %s", ' '.join(filenames))
for filename in filenames:
try:
os.remove(filename)
except OSError:
pass
# XXX these ignore the dry-run flag: what to do, what to do? even if
# you want a dry-run build, you still need some sort of configuration
# info. My inclination is to make it up to the real config command to
# consult 'dry_run', and assume a default (minimal) configuration if
# true. The problem with trying to do it here is that you'd have to
# return either true or false from all the 'try' methods, neither of
# which is correct.
# XXX need access to the header search path and maybe default macros.
def try_cpp(self, body=None, headers=None, include_dirs=None, lang="c"):
"""Construct a source file from 'body' (a string containing lines
of C/C++ code) and 'headers' (a list of header files to include)
and run it through the preprocessor. Return true if the
preprocessor succeeded, false if there were any errors.
('body' probably isn't of much use, but what the heck.)
"""
from distutils.ccompiler import CompileError
self._check_compiler()
ok = True
try:
self._preprocess(body, headers, include_dirs, lang)
except CompileError:
ok = False
self._clean()
return ok
def search_cpp(self, pattern, body=None, headers=None, include_dirs=None,
lang="c"):
"""Construct a source file (just like 'try_cpp()'), run it through
the preprocessor, and return true if any line of the output matches
'pattern'. 'pattern' should either be a compiled regex object or a
string containing a regex. If both 'body' and 'headers' are None,
preprocesses an empty file -- which can be useful to determine the
symbols the preprocessor and compiler set by default.
"""
self._check_compiler()
src, out = self._preprocess(body, headers, include_dirs, lang)
if isinstance(pattern, str):
pattern = re.compile(pattern)
file = open(out)
match = False
while True:
line = file.readline()
if line == '':
break
if pattern.search(line):
match = True
break
file.close()
self._clean()
return match
def try_compile(self, body, headers=None, include_dirs=None, lang="c"):
"""Try to compile a source file built from 'body' and 'headers'.
Return true on success, false otherwise.
"""
from distutils.ccompiler import CompileError
self._check_compiler()
try:
self._compile(body, headers, include_dirs, lang)
ok = True
except CompileError:
ok = False
log.info(ok and "success!" or "failure.")
self._clean()
return ok
def try_link(self, body, headers=None, include_dirs=None, libraries=None,
library_dirs=None, lang="c"):
"""Try to compile and link a source file, built from 'body' and
'headers', to executable form. Return true on success, false
otherwise.
"""
from distutils.ccompiler import CompileError, LinkError
self._check_compiler()
try:
self._link(body, headers, include_dirs,
libraries, library_dirs, lang)
ok = True
except (CompileError, LinkError):
ok = False
log.info(ok and "success!" or "failure.")
self._clean()
return ok
def try_run(self, body, headers=None, include_dirs=None, libraries=None,
library_dirs=None, lang="c"):
"""Try to compile, link to an executable, and run a program
built from 'body' and 'headers'. Return true on success, false
otherwise.
"""
from distutils.ccompiler import CompileError, LinkError
self._check_compiler()
try:
src, obj, exe = self._link(body, headers, include_dirs,
libraries, library_dirs, lang)
self.spawn([exe])
ok = True
except (CompileError, LinkError, DistutilsExecError):
ok = False
log.info(ok and "success!" or "failure.")
self._clean()
return ok
# -- High-level methods --------------------------------------------
# (these are the ones that are actually likely to be useful
# when implementing a real-world config command!)
def check_func(self, func, headers=None, include_dirs=None,
libraries=None, library_dirs=None, decl=0, call=0):
"""Determine if function 'func' is available by constructing a
source file that refers to 'func', and compiles and links it.
If everything succeeds, returns true; otherwise returns false.
The constructed source file starts out by including the header
files listed in 'headers'. If 'decl' is true, it then declares
'func' (as "int func()"); you probably shouldn't supply 'headers'
and set 'decl' true in the same call, or you might get errors about
        conflicting declarations for 'func'.  Finally, the constructed
'main()' function either references 'func' or (if 'call' is true)
calls it. 'libraries' and 'library_dirs' are used when
linking.
"""
self._check_compiler()
body = []
if decl:
body.append("int %s ();" % func)
body.append("int main () {")
if call:
body.append(" %s();" % func)
else:
body.append(" %s;" % func)
body.append("}")
body = "\n".join(body) + "\n"
return self.try_link(body, headers, include_dirs,
libraries, library_dirs)
def check_lib(self, library, library_dirs=None, headers=None,
include_dirs=None, other_libraries=[]):
"""Determine if 'library' is available to be linked against,
without actually checking that any particular symbols are provided
by it. 'headers' will be used in constructing the source file to
be compiled, but the only effect of this is to check if all the
header files listed are available. Any libraries listed in
'other_libraries' will be included in the link, in case 'library'
has symbols that depend on other libraries.
"""
self._check_compiler()
return self.try_link("int main (void) { }", headers, include_dirs,
[library] + other_libraries, library_dirs)
def check_header(self, header, include_dirs=None, library_dirs=None,
lang="c"):
"""Determine if the system header file named by 'header_file'
exists and can be found by the preprocessor; return true if so,
false otherwise.
"""
return self.try_cpp(body="/* No body */", headers=[header],
include_dirs=include_dirs)
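    # A minimal sketch (not part of the class above) of a config subclass
    # built on the helpers it provides; the header and function names are
    # hypothetical.  Each try_*/check_* call compiles a throwaway
    # _configtest source and cleans it up afterwards.
    #
    #     from distutils.command.config import config
    #
    #     class my_config(config):
    #         def run(self):
    #             if not self.try_cpp(headers=['stdint.h']):
    #                 self.warn('stdint.h not found')
    #             if not self.check_func('snprintf', headers=['stdio.h'],
    #                                    call=1):
    #                 self.warn('no usable snprintf')
    #
    # Such a subclass would be registered via
    # setup(..., cmdclass={'config': my_config}) and run with
    # "python setup.py config".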
def dump_file(filename, head=None):
"""Dumps a file content into log.info.
If head is not None, will be dumped before the file content.
"""
if head is None:
log.info('%s' % filename)
else:
log.info(head)
file = open(filename)
try:
log.info(file.read())
finally:
file.close()
"""distutils.command.install
Implements the Distutils 'install' command."""
import sys
import os
from distutils import log
from distutils.core import Command
from distutils.debug import DEBUG
from distutils.sysconfig import get_config_vars
from distutils.errors import DistutilsPlatformError
from distutils.file_util import write_file
from distutils.util import convert_path, subst_vars, change_root
from distutils.util import get_platform
from distutils.errors import DistutilsOptionError
from site import USER_BASE
from site import USER_SITE
HAS_USER_SITE = True
WINDOWS_SCHEME = {
'purelib': '$base/Lib/site-packages',
'platlib': '$base/Lib/site-packages',
'headers': '$base/Include/$dist_name',
'scripts': '$base/Scripts',
'data' : '$base',
}
INSTALL_SCHEMES = {
'unix_prefix': {
'purelib': '$base/lib/python$py_version_short/site-packages',
'platlib': '$platbase/lib/python$py_version_short/site-packages',
'headers': '$base/include/python$py_version_short$abiflags/$dist_name',
'scripts': '$base/bin',
'data' : '$base',
},
'unix_home': {
'purelib': '$base/lib/python',
'platlib': '$base/lib/python',
'headers': '$base/include/python/$dist_name',
'scripts': '$base/bin',
'data' : '$base',
},
'nt': WINDOWS_SCHEME,
}
# user site schemes
if HAS_USER_SITE:
INSTALL_SCHEMES['nt_user'] = {
'purelib': '$usersite',
'platlib': '$usersite',
'headers': '$userbase/Python$py_version_nodot/Include/$dist_name',
'scripts': '$userbase/Scripts',
'data' : '$userbase',
}
INSTALL_SCHEMES['unix_user'] = {
'purelib': '$usersite',
'platlib': '$usersite',
'headers':
'$userbase/include/python$py_version_short$abiflags/$dist_name',
'scripts': '$userbase/bin',
'data' : '$userbase',
}
# The keys to an installation scheme; if any new types of files are to be
# installed, be sure to add an entry to every installation scheme above,
# and to SCHEME_KEYS here.
SCHEME_KEYS = ('purelib', 'platlib', 'headers', 'scripts', 'data')
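# A minimal sketch (not part of the module above) of how one scheme entry
# expands; subst_vars is the same helper the install command uses, and the
# config values here are hypothetical.
from distutils.util import subst_vars

config_vars = {'base': '/usr/local', 'py_version_short': '3.4'}
subst_vars('$base/lib/python$py_version_short/site-packages', config_vars)
# -> '/usr/local/lib/python3.4/site-packages'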
class install(Command):
description = "install everything from build directory"
user_options = [
# Select installation scheme and set base director(y|ies)
('prefix=', None,
"installation prefix"),
('exec-prefix=', None,
"(Unix only) prefix for platform-specific files"),
('home=', None,
"(Unix only) home directory to install under"),
# Or, just set the base director(y|ies)
('install-base=', None,
"base installation directory (instead of --prefix or --home)"),
('install-platbase=', None,
"base installation directory for platform-specific files " +
"(instead of --exec-prefix or --home)"),
('root=', None,
"install everything relative to this alternate root directory"),
# Or, explicitly set the installation scheme
('install-purelib=', None,
"installation directory for pure Python module distributions"),
('install-platlib=', None,
"installation directory for non-pure module distributions"),
('install-lib=', None,
"installation directory for all module distributions " +
"(overrides --install-purelib and --install-platlib)"),
('install-headers=', None,
"installation directory for C/C++ headers"),
('install-scripts=', None,
"installation directory for Python scripts"),
('install-data=', None,
"installation directory for data files"),
# Byte-compilation options -- see install_lib.py for details, as
# these are duplicated from there (but only install_lib does
# anything with them).
('compile', 'c', "compile .py to .pyc [default]"),
('no-compile', None, "don't compile .py files"),
('optimize=', 'O',
"also compile with optimization: -O1 for \"python -O\", "
"-O2 for \"python -OO\", and -O0 to disable [default: -O0]"),
# Miscellaneous control options
('force', 'f',
"force installation (overwrite any existing files)"),
('skip-build', None,
"skip rebuilding everything (for testing/debugging)"),
# Where to install documentation (eventually!)
#('doc-format=', None, "format of documentation to generate"),
#('install-man=', None, "directory for Unix man pages"),
#('install-html=', None, "directory for HTML documentation"),
#('install-info=', None, "directory for GNU info files"),
('record=', None,
"filename in which to record list of installed files"),
]
boolean_options = ['compile', 'force', 'skip-build']
if HAS_USER_SITE:
user_options.append(('user', None,
"install in user site-package '%s'" % USER_SITE))
boolean_options.append('user')
negative_opt = {'no-compile' : 'compile'}
def initialize_options(self):
"""Initializes options."""
# High-level options: these select both an installation base
# and scheme.
self.prefix = None
self.exec_prefix = None
self.home = None
self.user = 0
# These select only the installation base; it's up to the user to
# specify the installation scheme (currently, that means supplying
# the --install-{platlib,purelib,scripts,data} options).
self.install_base = None
self.install_platbase = None
self.root = None
# These options are the actual installation directories; if not
# supplied by the user, they are filled in using the installation
# scheme implied by prefix/exec-prefix/home and the contents of
# that installation scheme.
self.install_purelib = None # for pure module distributions
self.install_platlib = None # non-pure (dists w/ extensions)
self.install_headers = None # for C/C++ headers
self.install_lib = None # set to either purelib or platlib
self.install_scripts = None
self.install_data = None
self.install_userbase = USER_BASE
self.install_usersite = USER_SITE
self.compile = None
self.optimize = None
# These two are for putting non-packagized distributions into their
# own directory and creating a .pth file if it makes sense.
# 'extra_path' comes from the setup file; 'install_path_file' can
# be turned off if it makes no sense to install a .pth file. (But
# better to install it uselessly than to guess wrong and not
# install it when it's necessary and would be used!) Currently,
# 'install_path_file' is always true unless some outsider meddles
# with it.
self.extra_path = None
self.install_path_file = 1
# 'force' forces installation, even if target files are not
# out-of-date. 'skip_build' skips running the "build" command,
# handy if you know it's not necessary. 'warn_dir' (which is *not*
# a user option, it's just there so the bdist_* commands can turn
# it off) determines whether we warn about installing to a
# directory not in sys.path.
self.force = 0
self.skip_build = 0
self.warn_dir = 1
# These are only here as a conduit from the 'build' command to the
# 'install_*' commands that do the real work. ('build_base' isn't
# actually used anywhere, but it might be useful in future.) They
# are not user options, because if the user told the install
# command where the build directory is, that wouldn't affect the
# build command.
self.build_base = None
self.build_lib = None
# Not defined yet because we don't know anything about
# documentation yet.
#self.install_man = None
#self.install_html = None
#self.install_info = None
self.record = None
# -- Option finalizing methods -------------------------------------
# (This is rather more involved than for most commands,
# because this is where the policy for installing third-
# party Python modules on various platforms given a wide
# array of user input is decided. Yes, it's quite complex!)
def finalize_options(self):
"""Finalizes options."""
# This method (and its pliant slaves, like 'finalize_unix()',
# 'finalize_other()', and 'select_scheme()') is where the default
# installation directories for modules, extension modules, and
# anything else we care to install from a Python module
        # distribution are determined.  Thus, this code makes a pretty
        # important policy
# statement about how third-party stuff is added to a Python
# installation! Note that the actual work of installation is done
# by the relatively simple 'install_*' commands; they just take
# their orders from the installation directory options determined
# here.
# Check for errors/inconsistencies in the options; first, stuff
# that's wrong on any platform.
if ((self.prefix or self.exec_prefix or self.home) and
(self.install_base or self.install_platbase)):
raise DistutilsOptionError(
"must supply either prefix/exec-prefix/home or " +
"install-base/install-platbase -- not both")
if self.home and (self.prefix or self.exec_prefix):
raise DistutilsOptionError(
"must supply either home or prefix/exec-prefix -- not both")
if self.user and (self.prefix or self.exec_prefix or self.home or
self.install_base or self.install_platbase):
raise DistutilsOptionError("can't combine user with prefix, "
"exec_prefix/home, or install_(plat)base")
# Next, stuff that's wrong (or dubious) only on certain platforms.
if os.name != "posix":
if self.exec_prefix:
self.warn("exec-prefix option ignored on this platform")
self.exec_prefix = None
# Now the interesting logic -- so interesting that we farm it out
# to other methods. The goal of these methods is to set the final
# values for the install_{lib,scripts,data,...} options, using as
# input a heady brew of prefix, exec_prefix, home, install_base,
# install_platbase, user-supplied versions of
# install_{purelib,platlib,lib,scripts,data,...}, and the
# INSTALL_SCHEME dictionary above. Phew!
self.dump_dirs("pre-finalize_{unix,other}")
if os.name == 'posix':
self.finalize_unix()
else:
self.finalize_other()
self.dump_dirs("post-finalize_{unix,other}()")
# Expand configuration variables, tilde, etc. in self.install_base
# and self.install_platbase -- that way, we can use $base or
# $platbase in the other installation directories and not worry
# about needing recursive variable expansion (shudder).
py_version = sys.version.split()[0]
(prefix, exec_prefix) = get_config_vars('prefix', 'exec_prefix')
try:
abiflags = sys.abiflags
except AttributeError:
# sys.abiflags may not be defined on all platforms.
abiflags = ''
self.config_vars = {'dist_name': self.distribution.get_name(),
'dist_version': self.distribution.get_version(),
'dist_fullname': self.distribution.get_fullname(),
'py_version': py_version,
'py_version_short': py_version[0:3],
'py_version_nodot': py_version[0] + py_version[2],
'sys_prefix': prefix,
'prefix': prefix,
'sys_exec_prefix': exec_prefix,
'exec_prefix': exec_prefix,
'abiflags': abiflags,
}
if HAS_USER_SITE:
self.config_vars['userbase'] = self.install_userbase
self.config_vars['usersite'] = self.install_usersite
self.expand_basedirs()
self.dump_dirs("post-expand_basedirs()")
# Now define config vars for the base directories so we can expand
# everything else.
self.config_vars['base'] = self.install_base
self.config_vars['platbase'] = self.install_platbase
if DEBUG:
from pprint import pprint
print("config vars:")
pprint(self.config_vars)
# Expand "~" and configuration variables in the installation
# directories.
self.expand_dirs()
self.dump_dirs("post-expand_dirs()")
# Create directories in the home dir:
if self.user:
self.create_home_path()
# Pick the actual directory to install all modules to: either
# install_purelib or install_platlib, depending on whether this
# module distribution is pure or not. Of course, if the user
# already specified install_lib, use their selection.
if self.install_lib is None:
if self.distribution.ext_modules: # has extensions: non-pure
self.install_lib = self.install_platlib
else:
self.install_lib = self.install_purelib
# Convert directories from Unix /-separated syntax to the local
# convention.
self.convert_paths('lib', 'purelib', 'platlib',
'scripts', 'data', 'headers',
'userbase', 'usersite')
# Well, we're not actually fully completely finalized yet: we still
# have to deal with 'extra_path', which is the hack for allowing
# non-packagized module distributions (hello, Numerical Python!) to
# get their own directories.
self.handle_extra_path()
self.install_libbase = self.install_lib # needed for .pth file
self.install_lib = os.path.join(self.install_lib, self.extra_dirs)
# If a new root directory was supplied, make all the installation
# dirs relative to it.
if self.root is not None:
self.change_roots('libbase', 'lib', 'purelib', 'platlib',
'scripts', 'data', 'headers')
self.dump_dirs("after prepending root")
# Find out the build directories, ie. where to install from.
self.set_undefined_options('build',
('build_base', 'build_base'),
('build_lib', 'build_lib'))
# Punt on doc directories for now -- after all, we're punting on
# documentation completely!
def dump_dirs(self, msg):
"""Dumps the list of user options."""
if not DEBUG:
return
from distutils.fancy_getopt import longopt_xlate
log.debug(msg + ":")
for opt in self.user_options:
opt_name = opt[0]
if opt_name[-1] == "=":
opt_name = opt_name[0:-1]
if opt_name in self.negative_opt:
opt_name = self.negative_opt[opt_name]
opt_name = opt_name.translate(longopt_xlate)
val = not getattr(self, opt_name)
else:
opt_name = opt_name.translate(longopt_xlate)
val = getattr(self, opt_name)
log.debug(" %s: %s" % (opt_name, val))
def finalize_unix(self):
"""Finalizes options for posix platforms."""
if self.install_base is not None or self.install_platbase is not None:
if ((self.install_lib is None and
self.install_purelib is None and
self.install_platlib is None) or
self.install_headers is None or
self.install_scripts is None or
self.install_data is None):
raise DistutilsOptionError(
"install-base or install-platbase supplied, but "
"installation scheme is incomplete")
return
if self.user:
if self.install_userbase is None:
raise DistutilsPlatformError(
"User base directory is not specified")
self.install_base = self.install_platbase = self.install_userbase
self.select_scheme("unix_user")
elif self.home is not None:
self.install_base = self.install_platbase = self.home
self.select_scheme("unix_home")
else:
if self.prefix is None:
if self.exec_prefix is not None:
raise DistutilsOptionError(
"must not supply exec-prefix without prefix")
self.prefix = os.path.normpath(sys.prefix)
self.exec_prefix = os.path.normpath(sys.exec_prefix)
else:
if self.exec_prefix is None:
self.exec_prefix = self.prefix
self.install_base = self.prefix
self.install_platbase = self.exec_prefix
self.select_scheme("unix_prefix")
def finalize_other(self):
"""Finalizes options for non-posix platforms"""
if self.user:
if self.install_userbase is None:
raise DistutilsPlatformError(
"User base directory is not specified")
self.install_base = self.install_platbase = self.install_userbase
self.select_scheme(os.name + "_user")
elif self.home is not None:
self.install_base = self.install_platbase = self.home
self.select_scheme("unix_home")
else:
if self.prefix is None:
self.prefix = os.path.normpath(sys.prefix)
self.install_base = self.install_platbase = self.prefix
try:
self.select_scheme(os.name)
except KeyError:
raise DistutilsPlatformError(
"I don't know how to install stuff on '%s'" % os.name)
def select_scheme(self, name):
"""Sets the install directories by applying the install schemes."""
# it's the caller's problem if they supply a bad name!
scheme = INSTALL_SCHEMES[name]
for key in SCHEME_KEYS:
attrname = 'install_' + key
if getattr(self, attrname) is None:
setattr(self, attrname, scheme[key])
def _expand_attrs(self, attrs):
for attr in attrs:
val = getattr(self, attr)
if val is not None:
if os.name == 'posix' or os.name == 'nt':
val = os.path.expanduser(val)
val = subst_vars(val, self.config_vars)
setattr(self, attr, val)
def expand_basedirs(self):
"""Calls `os.path.expanduser` on install_base, install_platbase and
root."""
self._expand_attrs(['install_base', 'install_platbase', 'root'])
def expand_dirs(self):
"""Calls `os.path.expanduser` on install dirs."""
self._expand_attrs(['install_purelib', 'install_platlib',
'install_lib', 'install_headers',
'install_scripts', 'install_data',])
def convert_paths(self, *names):
"""Call `convert_path` over `names`."""
for name in names:
attr = "install_" + name
setattr(self, attr, convert_path(getattr(self, attr)))
def handle_extra_path(self):
"""Set `path_file` and `extra_dirs` using `extra_path`."""
if self.extra_path is None:
self.extra_path = self.distribution.extra_path
if self.extra_path is not None:
if isinstance(self.extra_path, str):
self.extra_path = self.extra_path.split(',')
if len(self.extra_path) == 1:
path_file = extra_dirs = self.extra_path[0]
elif len(self.extra_path) == 2:
path_file, extra_dirs = self.extra_path
else:
raise DistutilsOptionError(
"'extra_path' option must be a list, tuple, or "
"comma-separated string with 1 or 2 elements")
# convert to local form in case Unix notation used (as it
# should be in setup scripts)
extra_dirs = convert_path(extra_dirs)
else:
path_file = None
extra_dirs = ''
# XXX should we warn if path_file and not extra_dirs? (in which
# case the path file would be harmless but pointless)
self.path_file = path_file
self.extra_dirs = extra_dirs
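    # A minimal sketch (not part of the class above) of the 'extra_path'
    # parsing; the values are hypothetical.  A single element serves as
    # both the .pth file name and the extra directory; two elements split
    # the roles.
    #
    #     extra_path = 'Numeric,Numeric-1'.split(',')
    #     if len(extra_path) == 1:
    #         path_file = extra_dirs = extra_path[0]
    #     else:
    #         path_file, extra_dirs = extra_path
    #     # -> path_file == 'Numeric', extra_dirs == 'Numeric-1'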
def change_roots(self, *names):
"""Change the install directories pointed by name using root."""
for name in names:
attr = "install_" + name
setattr(self, attr, change_root(self.root, getattr(self, attr)))
def create_home_path(self):
"""Create directories under ~."""
if not self.user:
return
home = convert_path(os.path.expanduser("~"))
for name, path in self.config_vars.items():
if path.startswith(home) and not os.path.isdir(path):
self.debug_print("os.makedirs('%s', 0o700)" % path)
os.makedirs(path, 0o700)
# -- Command execution methods -------------------------------------
def run(self):
"""Runs the command."""
# Obviously have to build before we can install
if not self.skip_build:
self.run_command('build')
# If we built for any other platform, we can't install.
build_plat = self.distribution.get_command_obj('build').plat_name
# check warn_dir - it is a clue that the 'install' is happening
# internally, and not to sys.path, so we don't check the platform
# matches what we are running.
if self.warn_dir and build_plat != get_platform():
raise DistutilsPlatformError("Can't install when "
"cross-compiling")
# Run all sub-commands (at least those that need to be run)
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
if self.path_file:
self.create_path_file()
# write list of installed files, if requested.
if self.record:
outputs = self.get_outputs()
if self.root: # strip any package prefix
root_len = len(self.root)
for counter in range(len(outputs)):
outputs[counter] = outputs[counter][root_len:]
self.execute(write_file,
(self.record, outputs),
"writing list of installed files to '%s'" %
self.record)
sys_path = map(os.path.normpath, sys.path)
sys_path = map(os.path.normcase, sys_path)
install_lib = os.path.normcase(os.path.normpath(self.install_lib))
if (self.warn_dir and
not (self.path_file and self.install_path_file) and
install_lib not in sys_path):
log.debug(("modules installed to '%s', which is not in "
"Python's module search path (sys.path) -- "
"you'll have to change the search path yourself"),
self.install_lib)
def create_path_file(self):
"""Creates the .pth file"""
filename = os.path.join(self.install_libbase,
self.path_file + ".pth")
if self.install_path_file:
self.execute(write_file,
(filename, [self.extra_dirs]),
"creating %s" % filename)
else:
self.warn("path file '%s' not created" % filename)
# -- Reporting methods ---------------------------------------------
def get_outputs(self):
"""Assembles the outputs of all the sub-commands."""
outputs = []
for cmd_name in self.get_sub_commands():
cmd = self.get_finalized_command(cmd_name)
# Add the contents of cmd.get_outputs(), ensuring
# that outputs doesn't contain duplicate entries
for filename in cmd.get_outputs():
if filename not in outputs:
outputs.append(filename)
if self.path_file and self.install_path_file:
outputs.append(os.path.join(self.install_libbase,
self.path_file + ".pth"))
return outputs
def get_inputs(self):
"""Returns the inputs of all the sub-commands"""
# XXX gee, this looks familiar ;-(
inputs = []
for cmd_name in self.get_sub_commands():
cmd = self.get_finalized_command(cmd_name)
inputs.extend(cmd.get_inputs())
return inputs
# -- Predicates for sub-command list -------------------------------
def has_lib(self):
"""Returns true if the current distribution has any Python
modules to install."""
return (self.distribution.has_pure_modules() or
self.distribution.has_ext_modules())
def has_headers(self):
"""Returns true if the current distribution has any headers to
install."""
return self.distribution.has_headers()
def has_scripts(self):
"""Returns true if the current distribution has any scripts to.
install."""
return self.distribution.has_scripts()
def has_data(self):
"""Returns true if the current distribution has any data to.
install."""
return self.distribution.has_data_files()
# 'sub_commands': a list of commands this command might have to run to
# get its work done. See cmd.py for more info.
sub_commands = [('install_lib', has_lib),
('install_headers', has_headers),
('install_scripts', has_scripts),
('install_data', has_data),
('install_egg_info', lambda self:True),
]
"""distutils.command.install_data
Implements the Distutils 'install_data' command, for installing
platform-independent data files."""
# contributed by Bastian Kleineidam
import os
from distutils.core import Command
from distutils.util import change_root, convert_path
class install_data(Command):
description = "install data files"
user_options = [
('install-dir=', 'd',
"base directory for installing data files "
"(default: installation base dir)"),
('root=', None,
"install everything relative to this alternate root directory"),
('force', 'f', "force installation (overwrite existing files)"),
]
boolean_options = ['force']
def initialize_options(self):
self.install_dir = None
self.outfiles = []
self.root = None
self.force = 0
self.data_files = self.distribution.data_files
self.warn_dir = 1
def finalize_options(self):
self.set_undefined_options('install',
('install_data', 'install_dir'),
('root', 'root'),
('force', 'force'),
)
def run(self):
self.mkpath(self.install_dir)
for f in self.data_files:
if isinstance(f, str):
# it's a simple file, so copy it
f = convert_path(f)
if self.warn_dir:
self.warn("setup script did not provide a directory for "
"'%s' -- installing right in '%s'" %
(f, self.install_dir))
(out, _) = self.copy_file(f, self.install_dir)
self.outfiles.append(out)
else:
# it's a tuple with path to install to and a list of files
dir = convert_path(f[0])
if not os.path.isabs(dir):
dir = os.path.join(self.install_dir, dir)
elif self.root:
dir = change_root(self.root, dir)
self.mkpath(dir)
if f[1] == []:
# If there are no files listed, the user must be
# trying to create an empty directory, so add the
# directory to the list of output files.
self.outfiles.append(dir)
else:
# Copy files, adding them to the list of output files.
for data in f[1]:
data = convert_path(data)
(out, _) = self.copy_file(data, dir)
self.outfiles.append(out)
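    # Illustrative note (hypothetical values): data_files accepts either bare
    # filenames or (directory, [files]) pairs, e.g.
    #   data_files=['app.cfg', ('share/demo', ['a.dat', 'b.dat'])]
    # A bare filename lands directly in install_dir (with the warning above);
    # a pair is installed into the named directory, created if necessary.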
def get_inputs(self):
return self.data_files or []
def get_outputs(self):
return self.outfiles
"""distutils.command.install_egg_info
Implements the Distutils 'install_egg_info' command, for installing
a package's PKG-INFO metadata."""
from distutils.cmd import Command
from distutils import log, dir_util
import os, sys, re
class install_egg_info(Command):
"""Install an .egg-info file for the package"""
description = "Install package's PKG-INFO metadata as an .egg-info file"
user_options = [
('install-dir=', 'd', "directory to install to"),
]
def initialize_options(self):
self.install_dir = None
def finalize_options(self):
self.set_undefined_options('install_lib',('install_dir','install_dir'))
basename = "%s-%s-py%s.egg-info" % (
to_filename(safe_name(self.distribution.get_name())),
to_filename(safe_version(self.distribution.get_version())),
        "%d.%d" % sys.version_info[:2]  # not sys.version[:3], which would truncate "3.10" to "3.1"
)
self.target = os.path.join(self.install_dir, basename)
self.outputs = [self.target]
def run(self):
target = self.target
if os.path.isdir(target) and not os.path.islink(target):
dir_util.remove_tree(target, dry_run=self.dry_run)
elif os.path.exists(target):
self.execute(os.unlink,(self.target,),"Removing "+target)
elif not os.path.isdir(self.install_dir):
self.execute(os.makedirs, (self.install_dir,),
"Creating "+self.install_dir)
log.info("Writing %s", target)
if not self.dry_run:
with open(target, 'w', encoding='UTF-8') as f:
self.distribution.metadata.write_pkg_file(f)
def get_outputs(self):
return self.outputs
# The following routines are taken from setuptools' pkg_resources module and
# can be replaced by importing them from pkg_resources once it is included
# in the stdlib.
def safe_name(name):
"""Convert an arbitrary string to a standard distribution name
Any runs of non-alphanumeric/. characters are replaced with a single '-'.
"""
return re.sub('[^A-Za-z0-9.]+', '-', name)
def safe_version(version):
"""Convert an arbitrary string to a standard version string
Spaces become dots, and all other non-alphanumeric characters become
dashes, with runs of multiple dashes condensed to a single dash.
"""
version = version.replace(' ','.')
return re.sub('[^A-Za-z0-9.]+', '-', version)
def to_filename(name):
"""Convert a project or version name to its filename-escaped form
Any '-' characters are currently replaced with '_'.
"""
return name.replace('-','_')
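# Illustrative examples (hypothetical project names, for demonstration only):
#   safe_name("demo package")   -> "demo-package"
#   safe_version("1.0 beta")    -> "1.0.beta"
#   to_filename("demo-package") -> "demo_package"
# so a distribution "demo package 1.0 beta" installed under Python 3.4 gets
# an egg-info file named "demo_package-1.0.beta-py3.4.egg-info".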
"""distutils.command.install_headers
Implements the Distutils 'install_headers' command, to install C/C++ header
files to the Python include directory."""
from distutils.core import Command
# XXX force is never used
class install_headers(Command):
description = "install C/C++ header files"
user_options = [('install-dir=', 'd',
"directory to install header files to"),
('force', 'f',
"force installation (overwrite existing files)"),
]
boolean_options = ['force']
def initialize_options(self):
self.install_dir = None
self.force = 0
self.outfiles = []
def finalize_options(self):
self.set_undefined_options('install',
('install_headers', 'install_dir'),
('force', 'force'))
def run(self):
headers = self.distribution.headers
if not headers:
return
self.mkpath(self.install_dir)
for header in headers:
(out, _) = self.copy_file(header, self.install_dir)
self.outfiles.append(out)
def get_inputs(self):
return self.distribution.headers or []
def get_outputs(self):
return self.outfiles
"""distutils.command.install_lib
Implements the Distutils 'install_lib' command
(install all Python modules)."""
import os
import importlib.util
import sys
from distutils.core import Command
from distutils.errors import DistutilsOptionError
# Extension for Python source files.
PYTHON_SOURCE_EXTENSION = ".py"
class install_lib(Command):
description = "install all Python modules (extensions and pure Python)"
# The byte-compilation options are a tad confusing. Here are the
# possible scenarios:
# 1) no compilation at all (--no-compile --no-optimize)
# 2) compile .pyc only (--compile --no-optimize; default)
# 3) compile .pyc and "level 1" .pyo (--compile --optimize)
# 4) compile "level 1" .pyo only (--no-compile --optimize)
# 5) compile .pyc and "level 2" .pyo (--compile --optimize-more)
# 6) compile "level 2" .pyo only (--no-compile --optimize-more)
#
    # The UI for this is two options, 'compile' and 'optimize'.
# 'compile' is strictly boolean, and only decides whether to
# generate .pyc files. 'optimize' is three-way (0, 1, or 2), and
# decides both whether to generate .pyo files and what level of
# optimization to use.
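    # Illustrative command lines (not from the original source) mapping onto
    # the scenarios above:
    #   python setup.py install_lib --compile --optimize=1     # scenario 3
    #   python setup.py install_lib --no-compile --optimize=0  # scenario 1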
user_options = [
('install-dir=', 'd', "directory to install to"),
('build-dir=','b', "build directory (where to install from)"),
('force', 'f', "force installation (overwrite existing files)"),
('compile', 'c', "compile .py to .pyc [default]"),
('no-compile', None, "don't compile .py files"),
('optimize=', 'O',
"also compile with optimization: -O1 for \"python -O\", "
"-O2 for \"python -OO\", and -O0 to disable [default: -O0]"),
('skip-build', None, "skip the build steps"),
]
boolean_options = ['force', 'compile', 'skip-build']
negative_opt = {'no-compile' : 'compile'}
def initialize_options(self):
# let the 'install' command dictate our installation directory
self.install_dir = None
self.build_dir = None
self.force = 0
self.compile = None
self.optimize = None
self.skip_build = None
def finalize_options(self):
# Get all the information we need to install pure Python modules
# from the umbrella 'install' command -- build (source) directory,
# install (target) directory, and whether to compile .py files.
self.set_undefined_options('install',
('build_lib', 'build_dir'),
('install_lib', 'install_dir'),
('force', 'force'),
('compile', 'compile'),
('optimize', 'optimize'),
('skip_build', 'skip_build'),
)
if self.compile is None:
self.compile = True
if self.optimize is None:
self.optimize = False
if not isinstance(self.optimize, int):
try:
self.optimize = int(self.optimize)
if self.optimize not in (0, 1, 2):
raise AssertionError
except (ValueError, AssertionError):
raise DistutilsOptionError("optimize must be 0, 1, or 2")
def run(self):
# Make sure we have built everything we need first
self.build()
# Install everything: simply dump the entire contents of the build
# directory to the installation directory (that's the beauty of
# having a build directory!)
outfiles = self.install()
# (Optionally) compile .py to .pyc
if outfiles is not None and self.distribution.has_pure_modules():
self.byte_compile(outfiles)
# -- Top-level worker functions ------------------------------------
# (called from 'run()')
def build(self):
if not self.skip_build:
if self.distribution.has_pure_modules():
self.run_command('build_py')
if self.distribution.has_ext_modules():
self.run_command('build_ext')
def install(self):
if os.path.isdir(self.build_dir):
outfiles = self.copy_tree(self.build_dir, self.install_dir)
else:
self.warn("'%s' does not exist -- no Python modules to install" %
self.build_dir)
return
return outfiles
def byte_compile(self, files):
if sys.dont_write_bytecode:
self.warn('byte-compiling is disabled, skipping.')
return
from distutils.util import byte_compile
# Get the "--root" directory supplied to the "install" command,
# and use it as a prefix to strip off the purported filename
# encoded in bytecode files. This is far from complete, but it
# should at least generate usable bytecode in RPM distributions.
install_root = self.get_finalized_command('install').root
if self.compile:
byte_compile(files, optimize=0,
force=self.force, prefix=install_root,
dry_run=self.dry_run)
if self.optimize > 0:
byte_compile(files, optimize=self.optimize,
force=self.force, prefix=install_root,
verbose=self.verbose, dry_run=self.dry_run)
# -- Utility methods -----------------------------------------------
def _mutate_outputs(self, has_any, build_cmd, cmd_option, output_dir):
if not has_any:
return []
build_cmd = self.get_finalized_command(build_cmd)
build_files = build_cmd.get_outputs()
build_dir = getattr(build_cmd, cmd_option)
prefix_len = len(build_dir) + len(os.sep)
outputs = []
for file in build_files:
outputs.append(os.path.join(output_dir, file[prefix_len:]))
return outputs
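    # Illustrative example (hypothetical paths, posix separators): with
    # build_dir "build/lib" and output_dir "/usr/lib/python3.4/site-packages",
    # "build/lib/pkg/mod.py" maps to
    # "/usr/lib/python3.4/site-packages/pkg/mod.py".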
def _bytecode_filenames(self, py_filenames):
bytecode_files = []
for py_file in py_filenames:
# Since build_py handles package data installation, the
# list of outputs can contain more than just .py files.
# Make sure we only report bytecode for the .py files.
ext = os.path.splitext(os.path.normcase(py_file))[1]
if ext != PYTHON_SOURCE_EXTENSION:
continue
if self.compile:
bytecode_files.append(importlib.util.cache_from_source(
py_file, debug_override=True))
if self.optimize > 0:
bytecode_files.append(importlib.util.cache_from_source(
py_file, debug_override=False))
return bytecode_files
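    # Illustrative example: on CPython 3.4,
    # importlib.util.cache_from_source("pkg/mod.py", debug_override=True)
    # returns "pkg/__pycache__/mod.cpython-34.pyc", and debug_override=False
    # yields the ".pyo" variant; the cache tag varies by interpreter.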
# -- External interface --------------------------------------------
# (called by outsiders)
def get_outputs(self):
"""Return the list of files that would be installed if this command
were actually run. Not affected by the "dry-run" flag or whether
modules have actually been built yet.
"""
pure_outputs = \
self._mutate_outputs(self.distribution.has_pure_modules(),
'build_py', 'build_lib',
self.install_dir)
if self.compile:
bytecode_outputs = self._bytecode_filenames(pure_outputs)
else:
bytecode_outputs = []
ext_outputs = \
self._mutate_outputs(self.distribution.has_ext_modules(),
'build_ext', 'build_lib',
self.install_dir)
return pure_outputs + bytecode_outputs + ext_outputs
def get_inputs(self):
"""Get the list of files that are input to this command, ie. the
files that get installed as they are named in the build tree.
The files in this list correspond one-to-one to the output
filenames returned by 'get_outputs()'.
"""
inputs = []
if self.distribution.has_pure_modules():
build_py = self.get_finalized_command('build_py')
inputs.extend(build_py.get_outputs())
if self.distribution.has_ext_modules():
build_ext = self.get_finalized_command('build_ext')
inputs.extend(build_ext.get_outputs())
return inputs
"""distutils.command.install_scripts
Implements the Distutils 'install_scripts' command, for installing
Python scripts."""
# contributed by Bastian Kleineidam
import os
from distutils.core import Command
from distutils import log
from stat import ST_MODE
class install_scripts(Command):
description = "install scripts (Python or otherwise)"
user_options = [
('install-dir=', 'd', "directory to install scripts to"),
('build-dir=','b', "build directory (where to install from)"),
('force', 'f', "force installation (overwrite existing files)"),
('skip-build', None, "skip the build steps"),
]
boolean_options = ['force', 'skip-build']
def initialize_options(self):
self.install_dir = None
self.force = 0
self.build_dir = None
self.skip_build = None
def finalize_options(self):
self.set_undefined_options('build', ('build_scripts', 'build_dir'))
self.set_undefined_options('install',
('install_scripts', 'install_dir'),
('force', 'force'),
('skip_build', 'skip_build'),
)
def run(self):
if not self.skip_build:
self.run_command('build_scripts')
self.outfiles = self.copy_tree(self.build_dir, self.install_dir)
if os.name == 'posix':
# Set the executable bits (owner, group, and world) on
# all the scripts we just installed.
for file in self.get_outputs():
if self.dry_run:
log.info("changing mode of %s", file)
else:
mode = ((os.stat(file)[ST_MODE]) | 0o555) & 0o7777
log.info("changing mode of %s to %o", file, mode)
os.chmod(file, mode)
def get_inputs(self):
return self.distribution.scripts or []
def get_outputs(self):
return self.outfiles or []
"""distutils.command.register
Implements the Distutils 'register' command (register with the repository).
"""
# created 2002/10/21, Richard Jones
import os, string, getpass
import io
import urllib.parse, urllib.request
from warnings import warn
from distutils.core import PyPIRCCommand
from distutils.errors import *
from distutils import log
class register(PyPIRCCommand):
description = ("register the distribution with the Python package index")
user_options = PyPIRCCommand.user_options + [
('list-classifiers', None,
'list the valid Trove classifiers'),
        ('strict', None,
         'stop the registration if the metadata is not fully compliant')
]
boolean_options = PyPIRCCommand.boolean_options + [
'verify', 'list-classifiers', 'strict']
sub_commands = [('check', lambda self: True)]
def initialize_options(self):
PyPIRCCommand.initialize_options(self)
self.list_classifiers = 0
self.strict = 0
def finalize_options(self):
PyPIRCCommand.finalize_options(self)
# setting options for the `check` subcommand
check_options = {'strict': ('register', self.strict),
'restructuredtext': ('register', 1)}
self.distribution.command_options['check'] = check_options
def run(self):
self.finalize_options()
self._set_config()
# Run sub commands
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
if self.dry_run:
self.verify_metadata()
elif self.list_classifiers:
self.classifiers()
else:
self.send_metadata()
def check_metadata(self):
"""Deprecated API."""
warn("distutils.command.register.check_metadata is deprecated, \
use the check command instead", PendingDeprecationWarning)
check = self.distribution.get_command_obj('check')
check.ensure_finalized()
check.strict = self.strict
check.restructuredtext = 1
check.run()
def _set_config(self):
        ''' Read the configuration file and set attributes.
'''
config = self._read_pypirc()
if config != {}:
self.username = config['username']
self.password = config['password']
self.repository = config['repository']
self.realm = config['realm']
self.has_config = True
else:
if self.repository not in ('pypi', self.DEFAULT_REPOSITORY):
raise ValueError('%s not found in .pypirc' % self.repository)
if self.repository == 'pypi':
self.repository = self.DEFAULT_REPOSITORY
self.has_config = False
def classifiers(self):
''' Fetch the list of classifiers from the server.
'''
url = self.repository+'?:action=list_classifiers'
response = urllib.request.urlopen(url)
log.info(self._read_pypi_response(response))
def verify_metadata(self):
''' Send the metadata to the package index server to be checked.
'''
# send the info to the server and report the result
(code, result) = self.post_to_server(self.build_post_data('verify'))
log.info('Server response (%s): %s' % (code, result))
def send_metadata(self):
''' Send the metadata to the package index server.
Well, do the following:
1. figure who the user is, and then
2. send the data as a Basic auth'ed POST.
First we try to read the username/password from $HOME/.pypirc,
which is a ConfigParser-formatted file with a section
[distutils] containing username and password entries (both
in clear text). Eg:
[distutils]
index-servers =
pypi
[pypi]
username: fred
password: sekrit
Otherwise, to figure who the user is, we offer the user three
choices:
1. use existing login,
2. register as a new user, or
3. set the password to a random string and email the user.
'''
# see if we can short-cut and get the username/password from the
# config
if self.has_config:
choice = '1'
username = self.username
password = self.password
else:
choice = 'x'
username = password = ''
# get the user's login info
choices = '1 2 3 4'.split()
while choice not in choices:
self.announce('''\
We need to know who you are, so please choose either:
1. use your existing login,
2. register as a new user,
3. have the server generate a new password for you (and email it to you), or
4. quit
Your selection [default 1]: ''', log.INFO)
choice = input()
if not choice:
choice = '1'
elif choice not in choices:
print('Please choose one of the four options!')
if choice == '1':
# get the username and password
while not username:
username = input('Username: ')
while not password:
password = getpass.getpass('Password: ')
# set up the authentication
auth = urllib.request.HTTPPasswordMgr()
host = urllib.parse.urlparse(self.repository)[1]
auth.add_password(self.realm, host, username, password)
# send the info to the server and report the result
code, result = self.post_to_server(self.build_post_data('submit'),
auth)
self.announce('Server response (%s): %s' % (code, result),
log.INFO)
# possibly save the login
if code == 200:
if self.has_config:
# sharing the password in the distribution instance
# so the upload command can reuse it
self.distribution.password = password
else:
self.announce(('I can store your PyPI login so future '
'submissions will be faster.'), log.INFO)
self.announce('(the login will be stored in %s)' % \
self._get_rc_file(), log.INFO)
choice = 'X'
while choice.lower() not in 'yn':
choice = input('Save your login (y/N)?')
if not choice:
choice = 'n'
if choice.lower() == 'y':
self._store_pypirc(username, password)
elif choice == '2':
data = {':action': 'user'}
data['name'] = data['password'] = data['email'] = ''
data['confirm'] = None
while not data['name']:
data['name'] = input('Username: ')
while data['password'] != data['confirm']:
while not data['password']:
data['password'] = getpass.getpass('Password: ')
while not data['confirm']:
data['confirm'] = getpass.getpass(' Confirm: ')
if data['password'] != data['confirm']:
data['password'] = ''
data['confirm'] = None
print("Password and confirm don't match!")
while not data['email']:
data['email'] = input(' EMail: ')
code, result = self.post_to_server(data)
if code != 200:
log.info('Server response (%s): %s' % (code, result))
else:
log.info('You will receive an email shortly.')
log.info(('Follow the instructions in it to '
'complete registration.'))
elif choice == '3':
data = {':action': 'password_reset'}
data['email'] = ''
while not data['email']:
data['email'] = input('Your email address: ')
code, result = self.post_to_server(data)
log.info('Server response (%s): %s' % (code, result))
def build_post_data(self, action):
# figure the data to send - the metadata plus some additional
# information used by the package server
meta = self.distribution.metadata
data = {
':action': action,
'metadata_version' : '1.0',
'name': meta.get_name(),
'version': meta.get_version(),
'summary': meta.get_description(),
'home_page': meta.get_url(),
'author': meta.get_contact(),
'author_email': meta.get_contact_email(),
'license': meta.get_licence(),
'description': meta.get_long_description(),
'keywords': meta.get_keywords(),
'platform': meta.get_platforms(),
'classifiers': meta.get_classifiers(),
'download_url': meta.get_download_url(),
# PEP 314
'provides': meta.get_provides(),
'requires': meta.get_requires(),
'obsoletes': meta.get_obsoletes(),
}
if data['provides'] or data['requires'] or data['obsoletes']:
data['metadata_version'] = '1.1'
return data
def post_to_server(self, data, auth=None):
        ''' Post a query to the server, and return a (code, message) tuple.
'''
if 'name' in data:
self.announce('Registering %s to %s' % (data['name'],
self.repository),
log.INFO)
# Build up the MIME payload for the urllib2 POST data
boundary = '--------------GHSKFJDLGDS7543FJKLFHRE75642756743254'
sep_boundary = '\n--' + boundary
end_boundary = sep_boundary + '--'
body = io.StringIO()
for key, value in data.items():
# handle multiple entries for the same name
            if not isinstance(value, (list, tuple)):
value = [value]
for value in value:
value = str(value)
body.write(sep_boundary)
body.write('\nContent-Disposition: form-data; name="%s"'%key)
body.write("\n\n")
body.write(value)
if value and value[-1] == '\r':
body.write('\n') # write an extra newline (lurve Macs)
body.write(end_boundary)
body.write("\n")
body = body.getvalue().encode("utf-8")
# build the Request
headers = {
'Content-type': 'multipart/form-data; boundary=%s; charset=utf-8'%boundary,
'Content-length': str(len(body))
}
req = urllib.request.Request(self.repository, body, headers)
# handle HTTP and include the Basic Auth handler
opener = urllib.request.build_opener(
urllib.request.HTTPBasicAuthHandler(password_mgr=auth)
)
data = ''
try:
result = opener.open(req)
except urllib.error.HTTPError as e:
if self.show_response:
data = e.fp.read()
result = e.code, e.msg
except urllib.error.URLError as e:
result = 500, str(e)
else:
if self.show_response:
data = result.read()
result = 200, 'OK'
if self.show_response:
dashes = '-' * 75
self.announce('%s%r%s' % (dashes, data, dashes))
return result
"""distutils.command.sdist
Implements the Distutils 'sdist' command (create a source distribution)."""
import os
import string
import sys
from glob import glob
from warnings import warn
from distutils.core import Command
from distutils import dir_util, dep_util, file_util, archive_util
from distutils.text_file import TextFile
from distutils.errors import *
from distutils.filelist import FileList
from distutils import log
from distutils.util import convert_path
def show_formats():
"""Print all possible values for the 'formats' option (used by
the "--help-formats" command-line option).
"""
from distutils.fancy_getopt import FancyGetopt
from distutils.archive_util import ARCHIVE_FORMATS
formats = []
for format in ARCHIVE_FORMATS.keys():
formats.append(("formats=" + format, None,
ARCHIVE_FORMATS[format][2]))
formats.sort()
FancyGetopt(formats).print_help(
"List of available source distribution formats:")
class sdist(Command):
description = "create a source distribution (tarball, zip file, etc.)"
def checking_metadata(self):
"""Callable used for the check sub-command.
Placed here so user_options can view it"""
return self.metadata_check
user_options = [
('template=', 't',
"name of manifest template file [default: MANIFEST.in]"),
('manifest=', 'm',
"name of manifest file [default: MANIFEST]"),
('use-defaults', None,
"include the default file set in the manifest "
"[default; disable with --no-defaults]"),
('no-defaults', None,
"don't include the default file set"),
('prune', None,
"specifically exclude files/directories that should not be "
"distributed (build tree, RCS/CVS dirs, etc.) "
"[default; disable with --no-prune]"),
('no-prune', None,
"don't automatically exclude anything"),
('manifest-only', 'o',
"just regenerate the manifest and then stop "
"(implies --force-manifest)"),
('force-manifest', 'f',
"forcibly regenerate the manifest and carry on as usual. "
"Deprecated: now the manifest is always regenerated."),
('formats=', None,
"formats for source distribution (comma-separated list)"),
('keep-temp', 'k',
"keep the distribution tree around after creating " +
"archive file(s)"),
('dist-dir=', 'd',
"directory to put the source distribution archive(s) in "
"[default: dist]"),
('metadata-check', None,
"Ensure that all required elements of meta-data "
"are supplied. Warn if any missing. [default]"),
('owner=', 'u',
"Owner name used when creating a tar file [default: current user]"),
('group=', 'g',
"Group name used when creating a tar file [default: current group]"),
]
boolean_options = ['use-defaults', 'prune',
'manifest-only', 'force-manifest',
'keep-temp', 'metadata-check']
help_options = [
('help-formats', None,
"list available distribution formats", show_formats),
]
negative_opt = {'no-defaults': 'use-defaults',
'no-prune': 'prune' }
default_format = {'posix': 'gztar',
'nt': 'zip' }
sub_commands = [('check', checking_metadata)]
def initialize_options(self):
# 'template' and 'manifest' are, respectively, the names of
# the manifest template and manifest file.
self.template = None
self.manifest = None
# 'use_defaults': if true, we will include the default file set
# in the manifest
self.use_defaults = 1
self.prune = 1
self.manifest_only = 0
self.force_manifest = 0
self.formats = None
self.keep_temp = 0
self.dist_dir = None
self.archive_files = None
self.metadata_check = 1
self.owner = None
self.group = None
def finalize_options(self):
if self.manifest is None:
self.manifest = "MANIFEST"
if self.template is None:
self.template = "MANIFEST.in"
self.ensure_string_list('formats')
if self.formats is None:
try:
self.formats = [self.default_format[os.name]]
except KeyError:
raise DistutilsPlatformError(
"don't know how to create source distributions "
"on platform %s" % os.name)
bad_format = archive_util.check_archive_formats(self.formats)
if bad_format:
raise DistutilsOptionError(
"unknown archive format '%s'" % bad_format)
if self.dist_dir is None:
self.dist_dir = "dist"
def run(self):
# 'filelist' contains the list of files that will make up the
# manifest
self.filelist = FileList()
# Run sub commands
for cmd_name in self.get_sub_commands():
self.run_command(cmd_name)
# Do whatever it takes to get the list of files to process
# (process the manifest template, read an existing manifest,
# whatever). File list is accumulated in 'self.filelist'.
self.get_file_list()
# If user just wanted us to regenerate the manifest, stop now.
if self.manifest_only:
return
# Otherwise, go ahead and create the source distribution tarball,
# or zipfile, or whatever.
self.make_distribution()
def check_metadata(self):
"""Deprecated API."""
warn("distutils.command.sdist.check_metadata is deprecated, \
use the check command instead", PendingDeprecationWarning)
check = self.distribution.get_command_obj('check')
check.ensure_finalized()
check.run()
def get_file_list(self):
"""Figure out the list of files to include in the source
distribution, and put it in 'self.filelist'. This might involve
reading the manifest template (and writing the manifest), or just
reading the manifest, or just using the default file set -- it all
depends on the user's options.
"""
# new behavior when using a template:
# the file list is recalculated every time because
# even if MANIFEST.in or setup.py are not changed
# the user might have added some files in the tree that
# need to be included.
#
# This makes --force the default and only behavior with templates.
template_exists = os.path.isfile(self.template)
if not template_exists and self._manifest_is_not_generated():
self.read_manifest()
self.filelist.sort()
self.filelist.remove_duplicates()
return
if not template_exists:
self.warn(("manifest template '%s' does not exist " +
"(using default file list)") %
self.template)
self.filelist.findall()
if self.use_defaults:
self.add_defaults()
if template_exists:
self.read_template()
if self.prune:
self.prune_file_list()
self.filelist.sort()
self.filelist.remove_duplicates()
self.write_manifest()
def add_defaults(self):
"""Add all the default files to self.filelist:
- README or README.txt
- setup.py
- test/test*.py
- all pure Python modules mentioned in setup script
- all files pointed by package_data (build_py)
- all files defined in data_files.
- all files defined as scripts.
- all C sources listed as part of extensions or C libraries
in the setup script (doesn't catch C headers!)
Warns if (README or README.txt) or setup.py are missing; everything
else is optional.
"""
standards = [('README', 'README.txt'), self.distribution.script_name]
for fn in standards:
if isinstance(fn, tuple):
alts = fn
got_it = False
for fn in alts:
if os.path.exists(fn):
got_it = True
self.filelist.append(fn)
break
if not got_it:
self.warn("standard file not found: should have one of " +
', '.join(alts))
else:
if os.path.exists(fn):
self.filelist.append(fn)
else:
self.warn("standard file '%s' not found" % fn)
optional = ['test/test*.py', 'setup.cfg']
for pattern in optional:
files = filter(os.path.isfile, glob(pattern))
self.filelist.extend(files)
# build_py is used to get:
# - python modules
# - files defined in package_data
build_py = self.get_finalized_command('build_py')
# getting python files
if self.distribution.has_pure_modules():
self.filelist.extend(build_py.get_source_files())
# getting package_data files
# (computed in build_py.data_files by build_py.finalize_options)
for pkg, src_dir, build_dir, filenames in build_py.data_files:
for filename in filenames:
self.filelist.append(os.path.join(src_dir, filename))
# getting distribution.data_files
if self.distribution.has_data_files():
for item in self.distribution.data_files:
if isinstance(item, str): # plain file
item = convert_path(item)
if os.path.isfile(item):
self.filelist.append(item)
else: # a (dirname, filenames) tuple
dirname, filenames = item
for f in filenames:
f = convert_path(f)
if os.path.isfile(f):
self.filelist.append(f)
if self.distribution.has_ext_modules():
build_ext = self.get_finalized_command('build_ext')
self.filelist.extend(build_ext.get_source_files())
if self.distribution.has_c_libraries():
build_clib = self.get_finalized_command('build_clib')
self.filelist.extend(build_clib.get_source_files())
if self.distribution.has_scripts():
build_scripts = self.get_finalized_command('build_scripts')
self.filelist.extend(build_scripts.get_source_files())
def read_template(self):
"""Read and parse manifest template file named by self.template.
(usually "MANIFEST.in") The parsing and processing is done by
'self.filelist', which updates itself accordingly.
"""
log.info("reading manifest template '%s'", self.template)
template = TextFile(self.template, strip_comments=1, skip_blanks=1,
join_lines=1, lstrip_ws=1, rstrip_ws=1,
collapse_join=1)
try:
while True:
line = template.readline()
if line is None: # end of file
break
try:
self.filelist.process_template_line(line)
# the call above can raise a DistutilsTemplateError for
# malformed lines, or a ValueError from the lower-level
# convert_path function
except (DistutilsTemplateError, ValueError) as msg:
self.warn("%s, line %d: %s" % (template.filename,
template.current_line,
msg))
finally:
template.close()
def prune_file_list(self):
"""Prune off branches that might slip into the file list as created
by 'read_template()', but really don't belong there:
* the build tree (typically "build")
* the release tree itself (only an issue if we ran "sdist"
previously with --keep-temp, or it aborted)
* any RCS, CVS, .svn, .hg, .git, .bzr, _darcs directories
"""
build = self.get_finalized_command('build')
base_dir = self.distribution.get_fullname()
self.filelist.exclude_pattern(None, prefix=build.build_base)
self.filelist.exclude_pattern(None, prefix=base_dir)
if sys.platform == 'win32':
seps = r'/|\\'
else:
seps = '/'
vcs_dirs = ['RCS', 'CVS', r'\.svn', r'\.hg', r'\.git', r'\.bzr',
'_darcs']
vcs_ptrn = r'(^|%s)(%s)(%s).*' % (seps, '|'.join(vcs_dirs), seps)
self.filelist.exclude_pattern(vcs_ptrn, is_regex=1)
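        # On posix the pattern above expands to (for illustration):
        #   (^|/)(RCS|CVS|\.svn|\.hg|\.git|\.bzr|_darcs)(/).*
        # i.e. any path containing a segment that is exactly a VCS control
        # directory is excluded.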
def write_manifest(self):
"""Write the file list in 'self.filelist' (presumably as filled in
by 'add_defaults()' and 'read_template()') to the manifest file
named by 'self.manifest'.
"""
if self._manifest_is_not_generated():
log.info("not writing to manually maintained "
"manifest file '%s'" % self.manifest)
return
content = self.filelist.files[:]
content.insert(0, '# file GENERATED by distutils, do NOT edit')
self.execute(file_util.write_file, (self.manifest, content),
"writing manifest file '%s'" % self.manifest)
def _manifest_is_not_generated(self):
# check for special comment used in 3.1.3 and higher
if not os.path.isfile(self.manifest):
return False
        with open(self.manifest) as fp:
            first_line = fp.readline()
return first_line != '# file GENERATED by distutils, do NOT edit\n'
def read_manifest(self):
"""Read the manifest file (named by 'self.manifest') and use it to
fill in 'self.filelist', the list of files to include in the source
distribution.
"""
log.info("reading manifest file '%s'", self.manifest)
        with open(self.manifest) as manifest:
            for line in manifest:
                # ignore comments and blank lines
                line = line.strip()
                if line.startswith('#') or not line:
                    continue
                self.filelist.append(line)
def make_release_tree(self, base_dir, files):
"""Create the directory tree that will become the source
distribution archive. All directories implied by the filenames in
'files' are created under 'base_dir', and then we hard link or copy
(if hard linking is unavailable) those files into place.
Essentially, this duplicates the developer's source tree, but in a
directory named after the distribution, containing only the files
to be distributed.
"""
# Create all the directories under 'base_dir' necessary to
# put 'files' there; the 'mkpath()' is just so we don't die
# if the manifest happens to be empty.
self.mkpath(base_dir)
dir_util.create_tree(base_dir, files, dry_run=self.dry_run)
# And walk over the list of files, either making a hard link (if
# os.link exists) to each one that doesn't already exist in its
# corresponding location under 'base_dir', or copying each file
# that's out-of-date in 'base_dir'. (Usually, all files will be
# out-of-date, because by default we blow away 'base_dir' when
# we're done making the distribution archives.)
if hasattr(os, 'link'): # can make hard links on this system
link = 'hard'
msg = "making hard links in %s..." % base_dir
else: # nope, have to copy
link = None
msg = "copying files to %s..." % base_dir
if not files:
log.warn("no files to distribute -- empty manifest?")
else:
log.info(msg)
for file in files:
if not os.path.isfile(file):
log.warn("'%s' not a regular file -- skipping" % file)
else:
dest = os.path.join(base_dir, file)
self.copy_file(file, dest, link=link)
self.distribution.metadata.write_pkg_info(base_dir)
def make_distribution(self):
"""Create the source distribution(s). First, we create the release
tree with 'make_release_tree()'; then, we create all required
archive files (according to 'self.formats') from the release tree.
Finally, we clean up by blowing away the release tree (unless
'self.keep_temp' is true). The list of archive files created is
stored so it can be retrieved later by 'get_archive_files()'.
"""
# Don't warn about missing meta-data here -- should be (and is!)
# done elsewhere.
base_dir = self.distribution.get_fullname()
base_name = os.path.join(self.dist_dir, base_dir)
self.make_release_tree(base_dir, self.filelist.files)
archive_files = [] # remember names of files we create
# tar archive must be created last to avoid overwrite and remove
if 'tar' in self.formats:
self.formats.append(self.formats.pop(self.formats.index('tar')))
for fmt in self.formats:
file = self.make_archive(base_name, fmt, base_dir=base_dir,
owner=self.owner, group=self.group)
archive_files.append(file)
self.distribution.dist_files.append(('sdist', '', file))
self.archive_files = archive_files
if not self.keep_temp:
dir_util.remove_tree(base_dir, dry_run=self.dry_run)
def get_archive_files(self):
"""Return the list of archive files created when the command
was run, or None if the command hasn't run yet.
"""
return self.archive_files
"""distutils.command.upload
Implements the Distutils 'upload' subcommand (upload package to PyPI)."""
import sys
import os, io
import socket
import platform
from base64 import standard_b64encode
from urllib.request import urlopen, Request, HTTPError
from urllib.parse import urlparse
from distutils.errors import DistutilsError, DistutilsOptionError
from distutils.core import PyPIRCCommand
from distutils.spawn import spawn
from distutils import log
from hashlib import md5
class upload(PyPIRCCommand):
description = "upload binary package to PyPI"
user_options = PyPIRCCommand.user_options + [
('sign', 's',
'sign files to upload using gpg'),
('identity=', 'i', 'GPG identity used to sign files'),
]
boolean_options = PyPIRCCommand.boolean_options + ['sign']
def initialize_options(self):
PyPIRCCommand.initialize_options(self)
self.username = ''
self.password = ''
self.show_response = 0
self.sign = False
self.identity = None
def finalize_options(self):
PyPIRCCommand.finalize_options(self)
if self.identity and not self.sign:
raise DistutilsOptionError(
"Must use --sign for --identity to have meaning"
)
config = self._read_pypirc()
if config != {}:
self.username = config['username']
self.password = config['password']
self.repository = config['repository']
self.realm = config['realm']
# getting the password from the distribution
# if previously set by the register command
if not self.password and self.distribution.password:
self.password = self.distribution.password
def run(self):
if not self.distribution.dist_files:
raise DistutilsOptionError("No dist file created in earlier command")
for command, pyversion, filename in self.distribution.dist_files:
self.upload_file(command, pyversion, filename)
def upload_file(self, command, pyversion, filename):
# Makes sure the repository URL is compliant
schema, netloc, url, params, query, fragments = \
urlparse(self.repository)
if params or query or fragments:
raise AssertionError("Incompatible url %s" % self.repository)
if schema not in ('http', 'https'):
raise AssertionError("unsupported schema " + schema)
# Sign if requested
if self.sign:
gpg_args = ["gpg", "--detach-sign", "-a", filename]
if self.identity:
gpg_args[2:2] = ["--local-user", self.identity]
spawn(gpg_args,
dry_run=self.dry_run)
# Fill in the data - send all the meta-data in case we need to
# register a new release
        with open(filename, 'rb') as f:
            content = f.read()
meta = self.distribution.metadata
data = {
# action
':action': 'file_upload',
            'protcol_version': '1',  # (sic) historical field name; the legacy PyPI upload API expects this misspelling
# identify release
'name': meta.get_name(),
'version': meta.get_version(),
# file content
'content': (os.path.basename(filename),content),
'filetype': command,
'pyversion': pyversion,
'md5_digest': md5(content).hexdigest(),
# additional meta-data
'metadata_version' : '1.0',
'summary': meta.get_description(),
'home_page': meta.get_url(),
'author': meta.get_contact(),
'author_email': meta.get_contact_email(),
'license': meta.get_licence(),
'description': meta.get_long_description(),
'keywords': meta.get_keywords(),
'platform': meta.get_platforms(),
'classifiers': meta.get_classifiers(),
'download_url': meta.get_download_url(),
# PEP 314
'provides': meta.get_provides(),
'requires': meta.get_requires(),
'obsoletes': meta.get_obsoletes(),
}
comment = ''
if command == 'bdist_rpm':
            dist, version, id = platform.dist()  # note: platform.dist() was removed in Python 3.8
if dist:
comment = 'built for %s %s' % (dist, version)
elif command == 'bdist_dumb':
comment = 'built for %s' % platform.platform(terse=1)
data['comment'] = comment
        if self.sign:
            # read (and close) the detached signature instead of leaking the handle
            with open(filename + ".asc", "rb") as sig:
                data['gpg_signature'] = (os.path.basename(filename) + ".asc",
                                         sig.read())
# set up the authentication
user_pass = (self.username + ":" + self.password).encode('ascii')
        # The exact encoding of the authentication string is debated; in any
        # case PyPI only accepts ASCII for both the username and the password.
auth = "Basic " + standard_b64encode(user_pass).decode('ascii')
# Build up the MIME payload for the POST data
boundary = '--------------GHSKFJDLGDS7543FJKLFHRE75642756743254'
sep_boundary = b'\r\n--' + boundary.encode('ascii')
end_boundary = sep_boundary + b'--\r\n'
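        # Note (added for clarity): unlike register.post_to_server(), this
        # payload uses CRLF ("\r\n") line endings throughout, as multipart
        # bodies sent over HTTP require.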
body = io.BytesIO()
for key, value in data.items():
title = '\r\nContent-Disposition: form-data; name="%s"' % key
# handle multiple entries for the same name
            if not isinstance(value, list):
value = [value]
for value in value:
if type(value) is tuple:
title += '; filename="%s"' % value[0]
value = value[1]
else:
value = str(value).encode('utf-8')
body.write(sep_boundary)
body.write(title.encode('utf-8'))
body.write(b"\r\n\r\n")
body.write(value)
if value and value[-1:] == b'\r':
body.write(b'\n') # write an extra newline (lurve Macs)
body.write(end_boundary)
body = body.getvalue()
self.announce("Submitting %s to %s" % (filename, self.repository), log.INFO)
# build the Request
headers = {'Content-type':
'multipart/form-data; boundary=%s' % boundary,
'Content-length': str(len(body)),
'Authorization': auth}
request = Request(self.repository, data=body,
headers=headers)
# send the data
try:
result = urlopen(request)
status = result.getcode()
reason = result.msg
except OSError as e:
self.announce(str(e), log.ERROR)
raise
except HTTPError as e:
status = e.code
reason = e.msg
if status == 200:
self.announce('Server response (%s): %s' % (status, reason),
log.INFO)
else:
msg = 'Upload failed (%s): %s' % (status, reason)
self.announce(msg, log.ERROR)
raise DistutilsError(msg)
if self.show_response:
text = self._read_pypi_response(result)
msg = '\n'.join(('-' * 75, text, '-' * 75))
self.announce(msg, log.INFO)
"""distutils.command
Package containing implementation of all the standard Distutils
commands."""
__all__ = ['build',
'build_py',
'build_ext',
'build_clib',
'build_scripts',
'clean',
'install',
'install_lib',
'install_headers',
'install_scripts',
'install_data',
'sdist',
'register',
'bdist',
'bdist_dumb',
'bdist_rpm',
'bdist_wininst',
'check',
'upload',
# These two are reserved for future use:
#'bdist_sdux',
#'bdist_pkgtool',
# Note:
# bdist_packager is not included because it only provides
# an abstract base class
]
# Copyright (C) 2002-2007 Python Software Foundation
# Author: Ben Gertzfield
# Contact: [email protected]
"""Base64 content transfer encoding per RFCs 2045-2047.
This module handles the content transfer encoding method defined in RFC 2045
that encodes arbitrary 8-bit data as four 7-bit characters for every three
8-bit bytes, an encoding known as Base64.
It is used in the MIME standards for email to attach images, audio, and text
using 8-bit character sets to messages.
This module provides an interface to encode and decode both headers and bodies
with Base64 encoding.
RFC 2045 defines a method for including character set information in an
`encoded-word' in a header. This method is commonly used for 8-bit real names
in To:, From:, Cc:, etc. fields, as well as Subject: lines.
This module does not do the line wrapping or end-of-line character conversion
necessary for proper internationalized headers; it only does dumb encoding and
decoding. To deal with the various line wrapping issues, use the email.header
module.
"""
__all__ = [
'body_decode',
'body_encode',
'decode',
'decodestring',
'header_encode',
'header_length',
]
from base64 import b64encode
from binascii import b2a_base64, a2b_base64
CRLF = '\r\n'
NL = '\n'
EMPTYSTRING = ''
# See also Charset.py
MISC_LEN = 7
# Helpers
def header_length(bytearray):
    """Return the length of the byte string when it is encoded with base64."""
groups_of_3, leftover = divmod(len(bytearray), 3)
# 4 bytes out for each 3 bytes (or nonzero fraction thereof) in.
n = groups_of_3 * 4
if leftover:
n += 4
return n
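# Worked example: a 10-byte input gives divmod(10, 3) == (3, 1), so
# n = 3 * 4 + 4 == 16, which matches len(b64encode(b"0123456789")) == 16.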
def header_encode(header_bytes, charset='iso-8859-1'):
"""Encode a single header line with Base64 encoding in a given charset.
charset names the character set to use to encode the header. It defaults
to iso-8859-1. Base64 encoding is defined in RFC 2045.
"""
if not header_bytes:
return ""
if isinstance(header_bytes, str):
header_bytes = header_bytes.encode(charset)
encoded = b64encode(header_bytes).decode("ascii")
return '=?%s?b?%s?=' % (charset, encoded)
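# Illustrative call: header_encode(b"hello", charset="utf-8") returns
# '=?utf-8?b?aGVsbG8=?=' (b64encode(b"hello") == b"aGVsbG8=").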
def body_encode(s, maxlinelen=76, eol=NL):
r"""Encode a string with base64.
Each line will be wrapped at, at most, maxlinelen characters (defaults to
76 characters).
Each line of encoded text will end with eol, which defaults to "\n". Set
this to "\r\n" if you will be using the result of this function directly
in an email.
"""
if not s:
return s
encvec = []
max_unencoded = maxlinelen * 3 // 4
for i in range(0, len(s), max_unencoded):
# BAW: should encode() inherit b2a_base64()'s dubious behavior in
# adding a newline to the encoded string?
enc = b2a_base64(s[i:i + max_unencoded]).decode("ascii")
if enc.endswith(NL) and eol != NL:
enc = enc[:-1] + eol
encvec.append(enc)
return EMPTYSTRING.join(encvec)
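# Note the arithmetic: with the default maxlinelen=76, max_unencoded is
# 76 * 3 // 4 == 57, so each 57-byte slice encodes to at most 76 base64
# characters per output line (plus the eol terminator).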
def decode(string):
"""Decode a raw base64 string, returning a bytes object.
This function does not parse a full MIME header value encoded with
base64 (like =?iso-8895-1?b?bmloISBuaWgh?=) -- please use the high
level email.header class for that functionality.
"""
if not string:
return bytes()
elif isinstance(string, str):
return a2b_base64(string.encode('raw-unicode-escape'))
else:
return a2b_base64(string)
# For convenience and backwards compatibility w/ standard base64 module
body_decode = decode
decodestring = decode
# Copyright (C) 2001-2007 Python Software Foundation
# Author: Ben Gertzfield, Barry Warsaw
# Contact: [email protected]
__all__ = [
'Charset',
'add_alias',
'add_charset',
'add_codec',
]
from functools import partial
import email.base64mime
import email.quoprimime
from email import errors
from email.encoders import encode_7or8bit
# Flags for types of header encodings
QP = 1 # Quoted-Printable
BASE64 = 2 # Base64
SHORTEST = 3 # the shorter of QP and base64, but only for headers
# In "=?charset?q?hello_world?=", the =?, ?q?, and ?= add up to 7
RFC2047_CHROME_LEN = 7
DEFAULT_CHARSET = 'us-ascii'
UNKNOWN8BIT = 'unknown-8bit'
EMPTYSTRING = ''
# Defaults
CHARSETS = {
# input header enc body enc output conv
'iso-8859-1': (QP, QP, None),
'iso-8859-2': (QP, QP, None),
'iso-8859-3': (QP, QP, None),
'iso-8859-4': (QP, QP, None),
# iso-8859-5 is Cyrillic, and not especially used
# iso-8859-6 is Arabic, also not particularly used
# iso-8859-7 is Greek, QP will not make it readable
# iso-8859-8 is Hebrew, QP will not make it readable
'iso-8859-9': (QP, QP, None),
'iso-8859-10': (QP, QP, None),
# iso-8859-11 is Thai, QP will not make it readable
'iso-8859-13': (QP, QP, None),
'iso-8859-14': (QP, QP, None),
'iso-8859-15': (QP, QP, None),
'iso-8859-16': (QP, QP, None),
'windows-1252':(QP, QP, None),
'viscii': (QP, QP, None),
'us-ascii': (None, None, None),
'big5': (BASE64, BASE64, None),
'gb2312': (BASE64, BASE64, None),
'euc-jp': (BASE64, None, 'iso-2022-jp'),
'shift_jis': (BASE64, None, 'iso-2022-jp'),
'iso-2022-jp': (BASE64, None, None),
'koi8-r': (BASE64, BASE64, None),
'utf-8': (SHORTEST, BASE64, 'utf-8'),
}
# Aliases for other commonly-used names for character sets. Map
# them to the real ones used in email.
ALIASES = {
'latin_1': 'iso-8859-1',
'latin-1': 'iso-8859-1',
'latin_2': 'iso-8859-2',
'latin-2': 'iso-8859-2',
'latin_3': 'iso-8859-3',
'latin-3': 'iso-8859-3',
'latin_4': 'iso-8859-4',
'latin-4': 'iso-8859-4',
'latin_5': 'iso-8859-9',
'latin-5': 'iso-8859-9',
'latin_6': 'iso-8859-10',
'latin-6': 'iso-8859-10',
'latin_7': 'iso-8859-13',
'latin-7': 'iso-8859-13',
'latin_8': 'iso-8859-14',
'latin-8': 'iso-8859-14',
'latin_9': 'iso-8859-15',
'latin-9': 'iso-8859-15',
'latin_10':'iso-8859-16',
'latin-10':'iso-8859-16',
'cp949': 'ks_c_5601-1987',
'euc_jp': 'euc-jp',
'euc_kr': 'euc-kr',
'ascii': 'us-ascii',
}
# Map charsets to their Unicode codec strings.
CODEC_MAP = {
'gb2312': 'eucgb2312_cn',
'big5': 'big5_tw',
# Hack: We don't want *any* conversion for stuff marked us-ascii, as all
# sorts of garbage might be sent to us in the guise of 7-bit us-ascii.
# Let that stuff pass through without conversion to/from Unicode.
'us-ascii': None,
}
# Convenience functions for extending the above mappings
def add_charset(charset, header_enc=None, body_enc=None, output_charset=None):
"""Add character set properties to the global registry.
charset is the input character set, and must be the canonical name of a
character set.
Optional header_enc and body_enc is either Charset.QP for
quoted-printable, Charset.BASE64 for base64 encoding, Charset.SHORTEST for
the shortest of qp or base64 encoding, or None for no encoding. SHORTEST
is only valid for header_enc. It describes how message headers and
message bodies in the input charset are to be encoded. Default is no
encoding.
Optional output_charset is the character set that the output should be
in. Conversions will proceed from input charset, to Unicode, to the
output charset when the method Charset.convert() is called. The default
is to output in the same character set as the input.
Both input_charset and output_charset must have Unicode codec entries in
the module's charset-to-codec mapping; use add_codec(charset, codecname)
to add codecs the module does not know about. See the codecs module's
documentation for more information.
"""
if body_enc == SHORTEST:
raise ValueError('SHORTEST not allowed for body_enc')
CHARSETS[charset] = (header_enc, body_enc, output_charset)
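# Illustrative usage (hypothetical charset name, for demonstration only):
#   add_charset('x-demo-8bit', QP, BASE64, 'utf-8')
# registers a charset whose headers are quoted-printable encoded, whose
# bodies are base64 encoded, and whose output is converted to utf-8.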
def add_alias(alias, canonical):
"""Add a character set alias.
alias is the alias name, e.g. latin-1
canonical is the character set's canonical name, e.g. iso-8859-1
"""
ALIASES[alias] = canonical
def add_codec(charset, codecname):
"""Add a codec that map characters in the given charset to/from Unicode.
charset is the canonical name of a character set. codecname is the name
of a Python codec, as appropriate for the second argument to the unicode()
built-in, or to the encode() method of a Unicode string.
"""
CODEC_MAP[charset] = codecname
# Convenience function for encoding strings, taking into account
# that they might be unknown-8bit (ie: have surrogate-escaped bytes)
def _encode(string, codec):
if codec == UNKNOWN8BIT:
return string.encode('ascii', 'surrogateescape')
else:
return string.encode(codec)
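# Illustrative calls: _encode('caf\xe9', 'utf-8') returns b'caf\xc3\xa9',
# while a string carrying surrogate-escaped bytes round-trips losslessly
# through the UNKNOWN8BIT branch via the 'surrogateescape' error handler.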
class Charset:
"""Map character sets to their email properties.
This class provides information about the requirements imposed on email
for a specific character set. It also provides convenience routines for
converting between character sets, given the availability of the
applicable codecs. Given a character set, it will do its best to provide
information on how to use that character set in an email in an
RFC-compliant way.
Certain character sets must be encoded with quoted-printable or base64
when used in email headers or bodies. Certain character sets must be
    converted outright, and are not allowed in email. Instances of this
    class expose the following information about a character set:
input_charset: The initial character set specified. Common aliases
are converted to their `official' email names (e.g. latin_1
is converted to iso-8859-1). Defaults to 7-bit us-ascii.
header_encoding: If the character set must be encoded before it can be
used in an email header, this attribute will be set to
Charset.QP (for quoted-printable), Charset.BASE64 (for
base64 encoding), or Charset.SHORTEST for the shortest of
QP or BASE64 encoding. Otherwise, it will be None.
body_encoding: Same as header_encoding, but describes the encoding for the
        mail message's body, which indeed may be different from the
header encoding. Charset.SHORTEST is not allowed for
body_encoding.
output_charset: Some character sets must be converted before they can be
used in email headers or bodies. If the input_charset is
one of them, this attribute will contain the name of the
charset output will be converted to. Otherwise, it will
be None.
input_codec: The name of the Python codec used to convert the
input_charset to Unicode. If no conversion codec is
necessary, this attribute will be None.
output_codec: The name of the Python codec used to convert Unicode
to the output_charset. If no conversion codec is necessary,
this attribute will have the same value as the input_codec.
"""
def __init__(self, input_charset=DEFAULT_CHARSET):
        # RFC 2046, §4.1.2 says charsets are not case sensitive. We coerce to
# unicode because its .lower() is locale insensitive. If the argument
# is already a unicode, we leave it at that, but ensure that the
# charset is ASCII, as the standard (RFC XXX) requires.
try:
if isinstance(input_charset, str):
input_charset.encode('ascii')
else:
input_charset = str(input_charset, 'ascii')
except UnicodeError:
raise errors.CharsetError(input_charset)
input_charset = input_charset.lower()
# Set the input charset after filtering through the aliases
self.input_charset = ALIASES.get(input_charset, input_charset)
# We can try to guess which encoding and conversion to use by the
# charset_map dictionary. Try that first, but let the user override
# it.
henc, benc, conv = CHARSETS.get(self.input_charset,
(SHORTEST, BASE64, None))
if not conv:
conv = self.input_charset
# Set the attributes, allowing the arguments to override the default.
self.header_encoding = henc
self.body_encoding = benc
self.output_charset = ALIASES.get(conv, conv)
# Now set the codecs. If one isn't defined for input_charset,
# guess and try a Unicode codec with the same name as input_codec.
self.input_codec = CODEC_MAP.get(self.input_charset,
self.input_charset)
self.output_codec = CODEC_MAP.get(self.output_charset,
self.output_charset)
def __str__(self):
return self.input_charset.lower()
__repr__ = __str__
def __eq__(self, other):
return str(self) == str(other).lower()
def __ne__(self, other):
return not self.__eq__(other)
def get_body_encoding(self):
"""Return the content-transfer-encoding used for body encoding.
This is either the string `quoted-printable' or `base64' depending on
the encoding used, or it is a function in which case you should call
the function with a single argument, the Message object being
encoded. The function should then set the Content-Transfer-Encoding
header itself to whatever is appropriate.
Returns "quoted-printable" if self.body_encoding is QP.
Returns "base64" if self.body_encoding is BASE64.
Returns conversion function otherwise.
"""
assert self.body_encoding != SHORTEST
if self.body_encoding == QP:
return 'quoted-printable'
elif self.body_encoding == BASE64:
return 'base64'
else:
return encode_7or8bit
def get_output_charset(self):
"""Return the output character set.
This is self.output_charset if that is not None, otherwise it is
self.input_charset.
"""
return self.output_charset or self.input_charset
def header_encode(self, string):
"""Header-encode a string by converting it first to bytes.
The type of encoding (base64 or quoted-printable) will be based on
this charset's `header_encoding`.
:param string: A unicode string for the header. It must be possible
to encode this string to bytes using the character set's
output codec.
:return: The encoded string, with RFC 2047 chrome.
"""
codec = self.output_codec or 'us-ascii'
header_bytes = _encode(string, codec)
# 7bit/8bit encodings return the string unchanged (modulo conversions)
encoder_module = self._get_encoder(header_bytes)
if encoder_module is None:
return string
return encoder_module.header_encode(header_bytes, codec)
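# For example (illustrative, assuming the stock CHARSETS table): utf-8 uses
# SHORTEST header encoding, and base64 is shorter than quoted-printable for
# this input, so:
#
#   >>> Charset('utf-8').header_encode('héllo')
#   '=?utf-8?b?aMOpbGxv?='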
def header_encode_lines(self, string, maxlengths):
"""Header-encode a string by converting it first to bytes.
This is similar to `header_encode()` except that the string is fit
into maximum line lengths as given by the argument.
:param string: A unicode string for the header. It must be possible
to encode this string to bytes using the character set's
output codec.
:param maxlengths: Maximum line length iterator. Each element
returned from this iterator will provide the next maximum line
length. This parameter is used as an argument to built-in next()
and should never be exhausted. The maximum line lengths should
not count the RFC 2047 chrome. These line lengths are only a
hint; the splitter does the best it can.
:return: Lines of encoded strings, each with RFC 2047 chrome.
"""
# See which encoding we should use.
codec = self.output_codec or 'us-ascii'
header_bytes = _encode(string, codec)
encoder_module = self._get_encoder(header_bytes)
encoder = partial(encoder_module.header_encode, charset=codec)
# Calculate the number of characters that the RFC 2047 chrome will
# contribute to each line.
charset = self.get_output_charset()
extra = len(charset) + RFC2047_CHROME_LEN
# Now comes the hard part. We must encode bytes but we can't split on
# bytes because some character sets are variable length and each
# encoded word must stand on its own. So the problem is you have to
# encode to bytes to figure out this word's length, but you must split
# on characters. This causes two problems: first, we don't know how
# many octets a specific substring of unicode characters will get
# encoded to, and second, we don't know how many ASCII characters
# those octets will get encoded to. Unless we try it. Which seems
# inefficient. In the interest of being correct rather than fast (and
# in the hope that there will be few encoded headers in any such
# message), brute force it. :(
lines = []
current_line = []
maxlen = next(maxlengths) - extra
for character in string:
current_line.append(character)
this_line = EMPTYSTRING.join(current_line)
length = encoder_module.header_length(_encode(this_line, charset))
if length > maxlen:
# This last character doesn't fit so pop it off.
current_line.pop()
# Does nothing fit on the first line?
if not lines and not current_line:
lines.append(None)
else:
separator = (' ' if lines else '')
joined_line = EMPTYSTRING.join(current_line)
header_bytes = _encode(joined_line, codec)
lines.append(encoder(header_bytes))
current_line = [character]
maxlen = next(maxlengths) - extra
joined_line = EMPTYSTRING.join(current_line)
header_bytes = _encode(joined_line, codec)
lines.append(encoder(header_bytes))
return lines
def _get_encoder(self, header_bytes):
if self.header_encoding == BASE64:
return email.base64mime
elif self.header_encoding == QP:
return email.quoprimime
elif self.header_encoding == SHORTEST:
len64 = email.base64mime.header_length(header_bytes)
lenqp = email.quoprimime.header_length(header_bytes)
if len64 < lenqp:
return email.base64mime
else:
return email.quoprimime
else:
return None
def body_encode(self, string):
"""Body-encode a string by converting it first to bytes.
The type of encoding (base64 or quoted-printable) will be based on
self.body_encoding. If body_encoding is None, we assume the
output charset is a 7bit encoding, so re-encoding the decoded
string using the ascii codec produces the correct string version
of the content.
"""
if not string:
return string
if self.body_encoding is BASE64:
if isinstance(string, str):
string = string.encode(self.output_charset)
return email.base64mime.body_encode(string)
elif self.body_encoding is QP:
# quoprimime.body_encode takes a string, but operates on it as if
# it were a list of byte codes. For a (minimal) history on why
# this is so, see changeset 0cf700464177. To correctly encode a
# character set, then, we must turn it into pseudo bytes via the
# latin1 charset, which will encode any byte as a single code point
# between 0 and 255, which is what body_encode is expecting.
if isinstance(string, str):
string = string.encode(self.output_charset)
string = string.decode('latin1')
return email.quoprimime.body_encode(string)
else:
if isinstance(string, str):
string = string.encode(self.output_charset).decode('ascii')
return string
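# Illustrative example (not part of the original module): utf-8 bodies are
# base64-encoded, per the BASE64 body_encoding in the CHARSETS table:
#
#   >>> Charset('utf-8').body_encode('héllo')
#   'aMOpbGxv\n'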
import binascii
import email.charset
import email.message
import email.errors
from email import quoprimime
class ContentManager:
def __init__(self):
self.get_handlers = {}
self.set_handlers = {}
def add_get_handler(self, key, handler):
self.get_handlers[key] = handler
def get_content(self, msg, *args, **kw):
content_type = msg.get_content_type()
if content_type in self.get_handlers:
return self.get_handlers[content_type](msg, *args, **kw)
maintype = msg.get_content_maintype()
if maintype in self.get_handlers:
return self.get_handlers[maintype](msg, *args, **kw)
if '' in self.get_handlers:
return self.get_handlers[''](msg, *args, **kw)
raise KeyError(content_type)
def add_set_handler(self, typekey, handler):
self.set_handlers[typekey] = handler
def set_content(self, msg, obj, *args, **kw):
if msg.get_content_maintype() == 'multipart':
# XXX: is this error a good idea or not? We can remove it later,
# but we can't add it later, so do it for now.
raise TypeError("set_content not valid on multipart")
handler = self._find_set_handler(msg, obj)
msg.clear_content()
handler(msg, obj, *args, **kw)
def _find_set_handler(self, msg, obj):
full_path_for_error = None
for typ in type(obj).__mro__:
if typ in self.set_handlers:
return self.set_handlers[typ]
qname = typ.__qualname__
modname = getattr(typ, '__module__', '')
full_path = '.'.join((modname, qname)) if modname else qname
if full_path_for_error is None:
full_path_for_error = full_path
if full_path in self.set_handlers:
return self.set_handlers[full_path]
if qname in self.set_handlers:
return self.set_handlers[qname]
name = typ.__name__
if name in self.set_handlers:
return self.set_handlers[name]
if None in self.set_handlers:
return self.set_handlers[None]
raise KeyError(full_path_for_error)
raw_data_manager = ContentManager()
def get_text_content(msg, errors='replace'):
content = msg.get_payload(decode=True)
charset = msg.get_param('charset', 'ASCII')
return content.decode(charset, errors=errors)
raw_data_manager.add_get_handler('text', get_text_content)
def get_non_text_content(msg):
return msg.get_payload(decode=True)
for maintype in 'audio image video application'.split():
raw_data_manager.add_get_handler(maintype, get_non_text_content)
def get_message_content(msg):
return msg.get_payload(0)
for subtype in 'rfc822 external-body'.split():
raw_data_manager.add_get_handler('message/'+subtype, get_message_content)
def get_and_fixup_unknown_message_content(msg):
# If we don't understand a message subtype, we are supposed to treat it as
# if it were application/octet-stream, per
# tools.ietf.org/html/rfc2046#section-5.2.4. Feedparser doesn't do that,
# so do our best to fix things up. Note that it is *not* appropriate to
# model message/partial content as Message objects, so they are handled
# here as well. (How to reassemble them is out of scope for this comment :)
return bytes(msg.get_payload(0))
raw_data_manager.add_get_handler('message',
get_and_fixup_unknown_message_content)
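# Usage sketch (not part of the original module): with the default policy,
# EmailMessage.get_content() delegates to raw_data_manager, so the handlers
# registered above dispatch on the parsed Content-Type:
#
#   >>> from email import message_from_bytes
#   >>> from email.policy import default
#   >>> src = b'Content-Type: text/plain; charset="utf-8"\r\n\r\nhello\r\n'
#   >>> msg = message_from_bytes(src, policy=default)
#   >>> raw_data_manager.get_content(msg)
#   'hello\r\n'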
def _prepare_set(msg, maintype, subtype, headers):
msg['Content-Type'] = '/'.join((maintype, subtype))
if headers:
if not hasattr(headers[0], 'name'):
mp = msg.policy
headers = [mp.header_factory(*mp.header_source_parse([header]))
for header in headers]
try:
for header in headers:
if header.defects:
raise header.defects[0]
msg[header.name] = header
except email.errors.HeaderDefect as exc:
raise ValueError("Invalid header: {}".format(
header.fold(policy=msg.policy))) from exc
def _finalize_set(msg, disposition, filename, cid, params):
if disposition is None and filename is not None:
disposition = 'attachment'
if disposition is not None:
msg['Content-Disposition'] = disposition
if filename is not None:
msg.set_param('filename',
filename,
header='Content-Disposition',
replace=True)
if cid is not None:
msg['Content-ID'] = cid
if params is not None:
for key, value in params.items():
msg.set_param(key, value)
# XXX: This is a cleaned-up version of base64mime.body_encode. It would
# be nice to drop both this and quoprimime.body_encode in favor of
# enhanced binascii routines that accepted a max_line_length parameter.
def _encode_base64(data, max_line_length):
encoded_lines = []
unencoded_bytes_per_line = max_line_length * 3 // 4
for i in range(0, len(data), unencoded_bytes_per_line):
thisline = data[i:i+unencoded_bytes_per_line]
encoded_lines.append(binascii.b2a_base64(thisline).decode('ascii'))
return ''.join(encoded_lines)
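# Sanity check of the line-length arithmetic (illustrative only): with
# max_line_length=76, unencoded_bytes_per_line = 76 * 3 // 4 = 57, so each
# full 57-byte slice encodes to one 76-character base64 line:
#
#   >>> out = _encode_base64(bytes(60), max_line_length=76)
#   >>> [len(line) for line in out.splitlines()]
#   [76, 4]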
def _encode_text(string, charset, cte, policy):
lines = string.encode(charset).splitlines()
linesep = policy.linesep.encode('ascii')
def embedded_body(lines): return linesep.join(lines) + linesep
def normal_body(lines): return b'\n'.join(lines) + b'\n'
if cte is None:
# Use heuristics to decide on the "best" encoding.
try:
return '7bit', normal_body(lines).decode('ascii')
except UnicodeDecodeError:
pass
if (policy.cte_type == '8bit' and
max(len(x) for x in lines) <= policy.max_line_length):
return '8bit', normal_body(lines).decode('ascii', 'surrogateescape')
sniff = embedded_body(lines[:10])
sniff_qp = quoprimime.body_encode(sniff.decode('latin-1'),
policy.max_line_length)
sniff_base64 = binascii.b2a_base64(sniff)
# This is a little unfair to qp; it includes lineseps, base64 doesn't.
if len(sniff_qp) > len(sniff_base64):
cte = 'base64'
else:
cte = 'quoted-printable'
if len(lines) <= 10:
return cte, sniff_qp
if cte == '7bit':
data = normal_body(lines).decode('ascii')
elif cte == '8bit':
data = normal_body(lines).decode('ascii', 'surrogateescape')
elif cte == 'quoted-printable':
data = quoprimime.body_encode(normal_body(lines).decode('latin-1'),
policy.max_line_length)
elif cte == 'base64':
data = _encode_base64(embedded_body(lines), policy.max_line_length)
else:
raise ValueError("Unknown content transfer encoding {}".format(cte))
return cte, data
def set_text_content(msg, string, subtype="plain", charset='utf-8', cte=None,
disposition=None, filename=None, cid=None,
params=None, headers=None):
_prepare_set(msg, 'text', subtype, headers)
cte, payload = _encode_text(string, charset, cte, msg.policy)
msg.set_payload(payload)
msg.set_param('charset',
email.charset.ALIASES.get(charset, charset),
replace=True)
msg['Content-Transfer-Encoding'] = cte
_finalize_set(msg, disposition, filename, cid, params)
raw_data_manager.add_set_handler(str, set_text_content)
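# Usage sketch (not part of the original module): EmailMessage.set_content()
# routes str payloads here via the handler registered above; the cte
# heuristic in _encode_text picks 7bit for plain ASCII text:
#
#   >>> from email.message import EmailMessage
#   >>> m = EmailMessage()
#   >>> m.set_content('hello world')
#   >>> m.get_content_type()
#   'text/plain'
#   >>> m['Content-Transfer-Encoding']
#   '7bit'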
def set_message_content(msg, message, subtype="rfc822", cte=None,
disposition=None, filename=None, cid=None,
params=None, headers=None):
if subtype == 'partial':
raise ValueError("message/partial is not supported for Message objects")
if subtype == 'rfc822':
if cte not in (None, '7bit', '8bit', 'binary'):
# http://tools.ietf.org/html/rfc2046#section-5.2.1 mandates this.
raise ValueError(
"message/rfc822 parts do not support cte={}".format(cte))
# 8bit will get coerced on serialization if policy.cte_type='7bit'. We
# may end up claiming 8bit when it isn't needed, but the only negative
# result of that should be a gateway that needs to coerce to 7bit
# having to look through the whole embedded message to discover whether
# or not it actually has to do anything.
cte = '8bit' if cte is None else cte
elif subtype == 'external-body':
if cte not in (None, '7bit'):
# http://tools.ietf.org/html/rfc2046#section-5.2.3 mandates this.
raise ValueError(
"message/external-body parts do not support cte={}".format(cte))
cte = '7bit'
elif cte is None:
# http://tools.ietf.org/html/rfc2046#section-5.2.4 says all future
# subtypes should be restricted to 7bit, so assume that.
cte = '7bit'
_prepare_set(msg, 'message', subtype, headers)
msg.set_payload([message])
msg['Content-Transfer-Encoding'] = cte
_finalize_set(msg, disposition, filename, cid, params)
raw_data_manager.add_set_handler(email.message.Message, set_message_content)
def set_bytes_content(msg, data, maintype, subtype, cte='base64',
disposition=None, filename=None, cid=None,
params=None, headers=None):
_prepare_set(msg, maintype, subtype, headers)
if cte == 'base64':
data = _encode_base64(data, max_line_length=msg.policy.max_line_length)
elif cte == 'quoted-printable':
# XXX: quoprimime.body_encode won't encode newline characters in data,
# so we can't use it. This means max_line_length is ignored. Another
# bug to fix later. (Note: encoders.quopri is broken on line ends.)
data = binascii.b2a_qp(data, istext=False, header=False, quotetabs=True)
data = data.decode('ascii')
elif cte == '7bit':
# Make sure it really is only ASCII. The early warning here seems
# worth the overhead...if you care, write your own content manager :).
data.encode('ascii')
elif cte in ('8bit', 'binary'):
data = data.decode('ascii', 'surrogateescape')
msg.set_payload(data)
msg['Content-Transfer-Encoding'] = cte
_finalize_set(msg, disposition, filename, cid, params)
for typ in (bytes, bytearray, memoryview):
raw_data_manager.add_set_handler(typ, set_bytes_content)
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Encodings and related functions."""
__all__ = [
'encode_7or8bit',
'encode_base64',
'encode_noop',
'encode_quopri',
]
from base64 import encodebytes as _bencode
from quopri import encodestring as _encodestring
def _qencode(s):
enc = _encodestring(s, quotetabs=True)
# Must encode spaces, which quopri.encodestring() doesn't do
return enc.replace(b' ', b'=20')
def encode_base64(msg):
"""Encode the message's payload in Base64.
Also, add an appropriate Content-Transfer-Encoding header.
"""
orig = msg.get_payload(decode=True)
encdata = str(_bencode(orig), 'ascii')
msg.set_payload(encdata)
msg['Content-Transfer-Encoding'] = 'base64'
def encode_quopri(msg):
"""Encode the message's payload in quoted-printable.
Also, add an appropriate Content-Transfer-Encoding header.
"""
orig = msg.get_payload(decode=True)
encdata = _qencode(orig)
msg.set_payload(encdata)
msg['Content-Transfer-Encoding'] = 'quoted-printable'
def encode_7or8bit(msg):
"""Set the Content-Transfer-Encoding header to 7bit or 8bit."""
orig = msg.get_payload(decode=True)
if orig is None:
# There's no payload. For backwards compatibility we use 7bit
msg['Content-Transfer-Encoding'] = '7bit'
return
# We play a trick to make this go fast. If decoding from ASCII succeeds,
# we know the data must be 7bit, otherwise treat it as 8bit.
try:
orig.decode('ascii')
except UnicodeError:
msg['Content-Transfer-Encoding'] = '8bit'
else:
msg['Content-Transfer-Encoding'] = '7bit'
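# Illustrative example (not part of the original module): applying
# encode_base64 to a Message with a binary payload sets both the payload
# and the Content-Transfer-Encoding header:
#
#   >>> from email.message import Message
#   >>> m = Message()
#   >>> m.set_payload(b'\xff\xfe')
#   >>> encode_base64(m)
#   >>> m['Content-Transfer-Encoding'], m.get_payload()
#   ('base64', '//4=\n')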
def encode_noop(msg):
"""Do nothing."""
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""email package exception classes."""
class MessageError(Exception):
"""Base class for errors in the email package."""
class MessageParseError(MessageError):
"""Base class for message parsing errors."""
class HeaderParseError(MessageParseError):
"""Error while parsing headers."""
class BoundaryError(MessageParseError):
"""Couldn't find terminating boundary."""
class MultipartConversionError(MessageError, TypeError):
"""Conversion to a multipart is prohibited."""
class CharsetError(MessageError):
"""An illegal charset was given."""
# These are parsing defects which the parser was able to work around.
class MessageDefect(ValueError):
"""Base class for a message defect."""
def __init__(self, line=None):
if line is not None:
super().__init__(line)
self.line = line
class NoBoundaryInMultipartDefect(MessageDefect):
"""A message claimed to be a multipart but had no boundary parameter."""
class StartBoundaryNotFoundDefect(MessageDefect):
"""The claimed start boundary was never found."""
class CloseBoundaryNotFoundDefect(MessageDefect):
"""A start boundary was found, but not the corresponding close boundary."""
class FirstHeaderLineIsContinuationDefect(MessageDefect):
"""A message had a continuation line as its first header line."""
class MisplacedEnvelopeHeaderDefect(MessageDefect):
"""A 'Unix-from' header was found in the middle of a header block."""
class MissingHeaderBodySeparatorDefect(MessageDefect):
"""Found line with no leading whitespace and no colon before blank line."""
# XXX: backward compatibility, just in case (it was never emitted).
MalformedHeaderDefect = MissingHeaderBodySeparatorDefect
class MultipartInvariantViolationDefect(MessageDefect):
"""A message claimed to be a multipart but no subparts were found."""
class InvalidMultipartContentTransferEncodingDefect(MessageDefect):
"""An invalid content transfer encoding was set on the multipart itself."""
class UndecodableBytesDefect(MessageDefect):
"""Header contained bytes that could not be decoded"""
class InvalidBase64PaddingDefect(MessageDefect):
"""base64 encoded sequence had an incorrect length"""
class InvalidBase64CharactersDefect(MessageDefect):
"""base64 encoded sequence had characters not in base64 alphabet"""
# These errors are specific to header parsing.
class HeaderDefect(MessageDefect):
"""Base class for a header defect."""
def __init__(self, *args, **kw):
super().__init__(*args, **kw)
class InvalidHeaderDefect(HeaderDefect):
"""Header is not valid, message gives details."""
class HeaderMissingRequiredValue(HeaderDefect):
"""A header that must have a value had none"""
class NonPrintableDefect(HeaderDefect):
"""ASCII characters outside the ascii-printable range found"""
def __init__(self, non_printables):
super().__init__(non_printables)
self.non_printables = non_printables
def __str__(self):
return ("the following ASCII non-printables found in header: "
"{}".format(self.non_printables))
class ObsoleteHeaderDefect(HeaderDefect):
"""Header uses syntax declared obsolete by RFC 5322"""
class NonASCIILocalPartDefect(HeaderDefect):
"""local_part contains non-ASCII characters"""
# This defect only occurs during unicode parsing, not when
# parsing messages decoded from binary.
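# Illustrative sketch (not part of the original module): defects accumulate
# on the message instead of raising. A multipart with no boundary, for
# example, is recorded via the classes above:
#
#   >>> from email import message_from_string
#   >>> m = message_from_string('Content-Type: multipart/mixed\n\nbody\n')
#   >>> [d.__class__.__name__ for d in m.defects]
#   ['NoBoundaryInMultipartDefect', 'MultipartInvariantViolationDefect']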
# Copyright (C) 2004-2006 Python Software Foundation
# Authors: Baxter, Wouters and Warsaw
# Contact: [email protected]
"""FeedParser - An email feed parser.
The feed parser implements an interface for incrementally parsing an email
message, line by line. This has advantages for certain applications, such as
those reading email messages off a socket.
FeedParser.feed() is the primary interface for pushing new data into the
parser. It returns when there's nothing more it can do with the available
data. When you have no more data to push into the parser, call .close().
This completes the parsing and returns the root message object.
The other advantage of this parser is that it will never raise a parsing
exception. Instead, when it finds something unexpected, it adds a 'defect' to
the current message. Defects are just instances that live on the message
object's .defects attribute.
"""
__all__ = ['FeedParser', 'BytesFeedParser']
import re
from email import errors
from email import message
from email._policybase import compat32
NLCRE = re.compile('\r\n|\r|\n')
NLCRE_bol = re.compile('(\r\n|\r|\n)')
NLCRE_eol = re.compile(r'(\r\n|\r|\n)\Z')
NLCRE_crack = re.compile('(\r\n|\r|\n)')
# RFC 2822 §3.6.8 Optional fields. ftext is %d33-57 / %d59-126, i.e. any
# character except controls, SP, and ":".
headerRE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')
EMPTYSTRING = ''
NL = '\n'
NeedMoreData = object()
class BufferedSubFile(object):
"""A file-ish object that can have new data loaded into it.
You can also push and pop line-matching predicates onto a stack. When the
current predicate matches the current line, a false EOF response
(i.e. empty string) is returned instead. This lets the parser adhere to a
simple abstraction -- it parses until EOF closes the current message.
"""
def __init__(self):
# Chunks of the last partial line pushed into this object.
self._partial = []
# The list of full, pushed lines, in reverse order
self._lines = []
# The stack of false-EOF checking predicates.
self._eofstack = []
# A flag indicating whether the file has been closed or not.
self._closed = False
def push_eof_matcher(self, pred):
self._eofstack.append(pred)
def pop_eof_matcher(self):
return self._eofstack.pop()
def close(self):
# Don't forget any trailing partial line.
self.pushlines(''.join(self._partial).splitlines(True))
self._partial = []
self._closed = True
def readline(self):
if not self._lines:
if self._closed:
return ''
return NeedMoreData
# Pop the line off the stack and see if it matches the current
# false-EOF predicate.
line = self._lines.pop()
# RFC 2046, section 5.1.2 requires us to recognize outer level
# boundaries at any level of inner nesting. Do this, but be sure it's
# in the order of most to least nested.
for ateof in self._eofstack[::-1]:
if ateof(line):
# We're at the false EOF. But push the last line back first.
self._lines.append(line)
return ''
return line
def unreadline(self, line):
# Let the consumer push a line back into the buffer.
assert line is not NeedMoreData
self._lines.append(line)
def push(self, data):
"""Push some new data into this object."""
# Crack into lines, preserving the linesep characters on the end of each line
parts = data.splitlines(True)
if not parts or not parts[0].endswith(('\n', '\r')):
# No new complete lines, so just accumulate partials
self._partial += parts
return
if self._partial:
# If there are previous leftovers, complete them now
self._partial.append(parts[0])
parts[0:1] = ''.join(self._partial).splitlines(True)
del self._partial[:]
# If the last element of the list does not end in a newline, then treat
# it as a partial line. We only check for '\n' here because a line
# ending with '\r' might be a line that was split in the middle of a
# '\r\n' sequence (see bugs 1555570 and 1721862).
if not parts[-1].endswith('\n'):
self._partial = [parts.pop()]
self.pushlines(parts)
def pushlines(self, lines):
# Reverse and insert at the front of the lines.
self._lines[:0] = lines[::-1]
def __iter__(self):
return self
def __next__(self):
line = self.readline()
if line == '':
raise StopIteration
return line
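# Behavior sketch (not part of the original module): partial lines are
# buffered until their terminating newline arrives:
#
#   >>> f = BufferedSubFile()
#   >>> f.push('one\ntw')        # 'tw' has no newline yet
#   >>> f.readline()
#   'one\n'
#   >>> f.readline() is NeedMoreData
#   True
#   >>> f.push('o\n')
#   >>> f.readline()
#   'two\n'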
class FeedParser:
"""A feed-style parser of email."""
def __init__(self, _factory=None, *, policy=compat32):
"""_factory is called with no arguments to create a new message obj
The policy keyword specifies a policy object that controls a number of
aspects of the parser's operation. The default policy maintains
backward compatibility.
"""
self.policy = policy
self._factory_kwds = lambda: {'policy': self.policy}
if _factory is None:
# What this should be:
#self._factory = policy.default_message_factory
# but, because we are post 3.4 feature freeze, fix with temp hack:
if self.policy is compat32:
self._factory = message.Message
else:
self._factory = message.EmailMessage
else:
self._factory = _factory
try:
_factory(policy=self.policy)
except TypeError:
# Assume this is an old-style factory
self._factory_kwds = lambda: {}
self._input = BufferedSubFile()
self._msgstack = []
self._parse = self._parsegen().__next__
self._cur = None
self._last = None
self._headersonly = False
# Non-public interface for supporting Parser's headersonly flag
def _set_headersonly(self):
self._headersonly = True
def feed(self, data):
"""Push more data into the parser."""
self._input.push(data)
self._call_parse()
def _call_parse(self):
try:
self._parse()
except StopIteration:
pass
def close(self):
"""Parse all remaining data and return the root message object."""
self._input.close()
self._call_parse()
root = self._pop_message()
assert not self._msgstack
# Look for final set of defects
if root.get_content_maintype() == 'multipart' \
and not root.is_multipart():
defect = errors.MultipartInvariantViolationDefect()
self.policy.handle_defect(root, defect)
return root
def _new_message(self):
msg = self._factory(**self._factory_kwds())
if self._cur and self._cur.get_content_type() == 'multipart/digest':
msg.set_default_type('message/rfc822')
if self._msgstack:
self._msgstack[-1].attach(msg)
self._msgstack.append(msg)
self._cur = msg
self._last = msg
def _pop_message(self):
retval = self._msgstack.pop()
if self._msgstack:
self._cur = self._msgstack[-1]
else:
self._cur = None
return retval
def _parsegen(self):
# Create a new message and start by parsing headers.
self._new_message()
headers = []
# Collect the headers, searching for a line that doesn't match the RFC
# 2822 header or continuation pattern (including an empty line).
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
if not headerRE.match(line):
# If we saw the RFC defined header/body separator
# (i.e. newline), just throw it away. Otherwise the line is
# part of the body so push it back.
if not NLCRE.match(line):
defect = errors.MissingHeaderBodySeparatorDefect()
self.policy.handle_defect(self._cur, defect)
self._input.unreadline(line)
break
headers.append(line)
# Done with the headers, so parse them and figure out what we're
# supposed to see in the body of the message.
self._parse_headers(headers)
# Headers-only parsing is a backwards compatibility hack, which was
# necessary in the older parser, which could raise errors. All
# remaining lines in the input are thrown into the message body.
if self._headersonly:
lines = []
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
if line == '':
break
lines.append(line)
self._cur.set_payload(EMPTYSTRING.join(lines))
return
if self._cur.get_content_type() == 'message/delivery-status':
# message/delivery-status contains blocks of headers separated by
# a blank line. We'll represent each header block as a separate
# nested message object, but the processing is a bit different
# than standard message/* types because there is no body for the
# nested messages. A blank line separates the subparts.
while True:
self._input.push_eof_matcher(NLCRE.match)
for retval in self._parsegen():
if retval is NeedMoreData:
yield NeedMoreData
continue
break
msg = self._pop_message()
# We need to pop the EOF matcher in order to tell if we're at
# the end of the current file, not the end of the last block
# of message headers.
self._input.pop_eof_matcher()
# The input stream must be sitting at the newline or at the
# EOF. We want to see if we're at the end of this subpart, so
# first consume the blank line, then test the next line to see
# if we're at this subpart's EOF.
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
break
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
break
if line == '':
break
# Not at EOF so this is a line we're going to need.
self._input.unreadline(line)
return
if self._cur.get_content_maintype() == 'message':
# The message claims to be a message/* type, so what follows is
# another RFC 2822 message.
for retval in self._parsegen():
if retval is NeedMoreData:
yield NeedMoreData
continue
break
self._pop_message()
return
if self._cur.get_content_maintype() == 'multipart':
boundary = self._cur.get_boundary()
if boundary is None:
# The message /claims/ to be a multipart but it has not
# defined a boundary. That's a problem which we'll handle by
# reading everything until the EOF and marking the message as
# defective.
defect = errors.NoBoundaryInMultipartDefect()
self.policy.handle_defect(self._cur, defect)
lines = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
lines.append(line)
self._cur.set_payload(EMPTYSTRING.join(lines))
return
# Make sure a valid content transfer encoding was specified per RFC 2045 §6.4.
if (self._cur.get('content-transfer-encoding', '8bit').lower()
not in ('7bit', '8bit', 'binary')):
defect = errors.InvalidMultipartContentTransferEncodingDefect()
self.policy.handle_defect(self._cur, defect)
# Create a line match predicate which matches the inter-part
# boundary as well as the end-of-multipart boundary. Don't push
# this onto the input stream until we've scanned past the
# preamble.
separator = '--' + boundary
boundaryre = re.compile(
'(?P<sep>' + re.escape(separator) +
r')(?P<end>--)?(?P<ws>[ \t]*)(?P<linesep>\r\n|\r|\n)?$')
capturing_preamble = True
preamble = []
linesep = False
close_boundary_seen = False
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
if line == '':
break
mo = boundaryre.match(line)
if mo:
# If we're looking at the end boundary, we're done with
# this multipart. If there was a newline at the end of
# the closing boundary, then we need to initialize the
# epilogue with the empty string (see below).
if mo.group('end'):
close_boundary_seen = True
linesep = mo.group('linesep')
break
# We saw an inter-part boundary. Were we in the preamble?
if capturing_preamble:
if preamble:
# According to RFC 2046, the last newline belongs
# to the boundary.
lastline = preamble[-1]
eolmo = NLCRE_eol.search(lastline)
if eolmo:
preamble[-1] = lastline[:-len(eolmo.group(0))]
self._cur.preamble = EMPTYSTRING.join(preamble)
capturing_preamble = False
self._input.unreadline(line)
continue
# We saw a boundary separating two parts. Consume any
# multiple boundary lines that may be following. Our
# interpretation of RFC 2046 BNF grammar does not produce
# body parts within such double boundaries.
while True:
line = self._input.readline()
if line is NeedMoreData:
yield NeedMoreData
continue
mo = boundaryre.match(line)
if not mo:
self._input.unreadline(line)
break
# Recurse to parse this subpart; the input stream points
# at the subpart's first line.
self._input.push_eof_matcher(boundaryre.match)
for retval in self._parsegen():
if retval is NeedMoreData:
yield NeedMoreData
continue
break
# Because of RFC 2046, the newline preceding the boundary
# separator actually belongs to the boundary, not the
# previous subpart's payload (or epilogue if the previous
# part is a multipart).
if self._last.get_content_maintype() == 'multipart':
epilogue = self._last.epilogue
if epilogue == '':
self._last.epilogue = None
elif epilogue is not None:
mo = NLCRE_eol.search(epilogue)
if mo:
end = len(mo.group(0))
self._last.epilogue = epilogue[:-end]
else:
payload = self._last._payload
if isinstance(payload, str):
mo = NLCRE_eol.search(payload)
if mo:
payload = payload[:-len(mo.group(0))]
self._last._payload = payload
self._input.pop_eof_matcher()
self._pop_message()
# Set the multipart up for newline cleansing, which will
# happen if we're in a nested multipart.
self._last = self._cur
else:
# I think we must be in the preamble
assert capturing_preamble
preamble.append(line)
# We've seen either the EOF or the end boundary. If we're still
# capturing the preamble, we never saw the start boundary. Note
# that as a defect and store the captured text as the payload.
if capturing_preamble:
defect = errors.StartBoundaryNotFoundDefect()
self.policy.handle_defect(self._cur, defect)
self._cur.set_payload(EMPTYSTRING.join(preamble))
epilogue = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
self._cur.epilogue = EMPTYSTRING.join(epilogue)
return
# If we're not processing the preamble, then we might have seen
# EOF without seeing that end boundary...that is also a defect.
if not close_boundary_seen:
defect = errors.CloseBoundaryNotFoundDefect()
self.policy.handle_defect(self._cur, defect)
return
# Everything from here to the EOF is epilogue. If the end boundary
# ended in a newline, we'll need to make sure the epilogue isn't
# None
if linesep:
epilogue = ['']
else:
epilogue = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
epilogue.append(line)
# Any CRLF at the front of the epilogue is not technically part of
# the epilogue. Also, watch out for an empty string epilogue,
# which means a single newline.
if epilogue:
firstline = epilogue[0]
bolmo = NLCRE_bol.match(firstline)
if bolmo:
epilogue[0] = firstline[len(bolmo.group(0)):]
self._cur.epilogue = EMPTYSTRING.join(epilogue)
return
# Otherwise, it's some non-multipart type, so the entire rest of the
# file contents becomes the payload.
lines = []
for line in self._input:
if line is NeedMoreData:
yield NeedMoreData
continue
lines.append(line)
self._cur.set_payload(EMPTYSTRING.join(lines))
def _parse_headers(self, lines):
# Passed a list of lines that make up the headers for the current msg
lastheader = ''
lastvalue = []
for lineno, line in enumerate(lines):
# Check for continuation
if line[0] in ' \t':
if not lastheader:
# The first line of the headers was a continuation. This
# is illegal, so let's note the defect, store the illegal
# line, and ignore it for purposes of headers.
defect = errors.FirstHeaderLineIsContinuationDefect(line)
self.policy.handle_defect(self._cur, defect)
continue
lastvalue.append(line)
continue
if lastheader:
self._cur.set_raw(*self.policy.header_source_parse(lastvalue))
lastheader, lastvalue = '', []
# Check for envelope header, i.e. unix-from
if line.startswith('From '):
if lineno == 0:
# Strip off the trailing newline
mo = NLCRE_eol.search(line)
if mo:
line = line[:-len(mo.group(0))]
self._cur.set_unixfrom(line)
continue
elif lineno == len(lines) - 1:
# Something looking like a unix-from at the end - it's
# probably the first line of the body, so push back the
# line and stop.
self._input.unreadline(line)
return
else:
# Weirdly placed unix-from line. Note this as a defect
# and ignore it.
defect = errors.MisplacedEnvelopeHeaderDefect(line)
self._cur.defects.append(defect)
continue
# Split the line on the colon separating field name from value.
# There will always be a colon, because if there wasn't the part of
# the parser that calls us would have started parsing the body.
i = line.find(':')
# If the colon is on the start of the line the header is clearly
# malformed, but we might be able to salvage the rest of the
# message. Track the error but keep going.
if i == 0:
defect = errors.InvalidHeaderDefect("Missing header name.")
self._cur.defects.append(defect)
continue
assert i>0, "_parse_headers fed line with no : and no leading WS"
lastheader = line[:i]
lastvalue = [line]
# Done with all the lines, so handle the last header.
if lastheader:
self._cur.set_raw(*self.policy.header_source_parse(lastvalue))
class BytesFeedParser(FeedParser):
"""Like FeedParser, but feed accepts bytes."""
def feed(self, data):
super().feed(data.decode('ascii', 'surrogateescape'))
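# Usage sketch (not part of the original module): data can be pushed in
# arbitrary chunks; close() finishes the parse and returns the root message:
#
#   >>> p = FeedParser()
#   >>> p.feed('Subject: hi\r\n')
#   >>> p.feed('\r\nbody line\r\n')
#   >>> m = p.close()
#   >>> m['Subject'], m.get_payload()
#   ('hi', 'body line\r\n')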
# Copyright (C) 2001-2010 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Classes to generate plain text from a message object tree."""
__all__ = ['Generator', 'DecodedGenerator', 'BytesGenerator']
import re
import sys
import time
import random
from copy import deepcopy
from io import StringIO, BytesIO
from email.utils import _has_surrogates
UNDERSCORE = '_'
NL = '\n' # XXX: no longer used by the code below.
fcre = re.compile(r'^From ', re.MULTILINE)
class Generator:
"""Generates output from a Message object tree.
This basic generator writes the message to the given file object as plain
text.
"""
#
# Public interface
#
def __init__(self, outfp, mangle_from_=True, maxheaderlen=None, *,
policy=None):
"""Create the generator for message flattening.
outfp is the output file-like object for writing the message to. It
must have a write() method.
Optional mangle_from_ is a flag that, when True (the default), escapes
From_ lines in the body of the message by putting a `>' in front of
them.
Optional maxheaderlen specifies the longest length for a non-continued
header. When a header line is longer (in characters, with tabs
expanded to 8 spaces) than maxheaderlen, the header will split as
defined in the Header class. Set maxheaderlen to zero to disable
header wrapping. The default is 78, as recommended (but not required)
by RFC 2822.
The policy keyword specifies a policy object that controls a number of
aspects of the generator's operation. If no policy is specified,
the policy associated with the Message object passed to the
flatten method is used.
"""
self._fp = outfp
self._mangle_from_ = mangle_from_
self.maxheaderlen = maxheaderlen
self.policy = policy
def write(self, s):
# Just delegate to the file object
self._fp.write(s)
def flatten(self, msg, unixfrom=False, linesep=None):
r"""Print the message object tree rooted at msg to the output file
specified when the Generator instance was created.
unixfrom is a flag that forces the printing of a Unix From_ delimiter
before the first object in the message tree. If the original message
has no From_ delimiter, a `standard' one is crafted. By default, this
is False to inhibit the printing of any From_ delimiter.
Note that for subobjects, no From_ line is printed.
linesep specifies the characters used to indicate a new line in
the output. The default value is determined by the policy specified
when the Generator instance was created or, if none was specified,
from the policy associated with the msg.
"""
# We use the _XXX constants for operating on data that comes directly
# from the msg, and _encoded_XXX constants for operating on data that
# has already been converted (to bytes in the BytesGenerator) and
# inserted into a temporary buffer.
policy = msg.policy if self.policy is None else self.policy
if linesep is not None:
policy = policy.clone(linesep=linesep)
if self.maxheaderlen is not None:
policy = policy.clone(max_line_length=self.maxheaderlen)
self._NL = policy.linesep
self._encoded_NL = self._encode(self._NL)
self._EMPTY = ''
self._encoded_EMPTY = self._encode('')
# Because we use clone (below) when we recursively process message
# subparts, and because clone uses the computed policy (not None),
# submessages will automatically get set to the computed policy when
# they are processed by this code.
old_gen_policy = self.policy
old_msg_policy = msg.policy
try:
self.policy = policy
msg.policy = policy
if unixfrom:
ufrom = msg.get_unixfrom()
if not ufrom:
ufrom = 'From nobody ' + time.ctime(time.time())
self.write(ufrom + self._NL)
self._write(msg)
finally:
self.policy = old_gen_policy
msg.policy = old_msg_policy
def clone(self, fp):
"""Clone this generator with the exact same options."""
return self.__class__(fp,
self._mangle_from_,
None, # Use policy setting, which we've adjusted
policy=self.policy)
#
# Protected interface - undocumented ;/
#
# Note that we use 'self.write' when what we are writing is coming from
# the source, and self._fp.write when what we are writing is coming from a
# buffer (because the Bytes subclass has already had a chance to transform
# the data in its write method in that case). This is an entirely
# pragmatic split determined by experiment; we could be more general by
# always using write and having the Bytes subclass write method detect when
# it has already transformed the input; but, since this whole thing is a
# hack anyway this seems good enough.
# Similarly, we have _XXX and _encoded_XXX attributes that are used on
# source and buffer data, respectively.
_encoded_EMPTY = ''
def _new_buffer(self):
# BytesGenerator overrides this to return BytesIO.
return StringIO()
def _encode(self, s):
# BytesGenerator overrides this to encode strings to bytes.
return s
def _write_lines(self, lines):
# We have to transform the line endings.
if not lines:
return
lines = lines.splitlines(True)
for line in lines[:-1]:
self.write(line.rstrip('\r\n'))
self.write(self._NL)
laststripped = lines[-1].rstrip('\r\n')
self.write(laststripped)
if len(lines[-1]) != len(laststripped):
self.write(self._NL)
def _write(self, msg):
# We can't write the headers yet because of the following scenario:
# say a multipart message includes the boundary string somewhere in
# its body. We'd have to calculate the new boundary /before/ we write
# the headers so that we can write the correct Content-Type:
# parameter.
#
# The way we do this, so as to make the _handle_*() methods simpler,
is to cache any subpart writes into a buffer. Then we write the
# headers and the buffer contents. That way, subpart handlers can
# Do The Right Thing, and can still modify the Content-Type: header if
# necessary.
oldfp = self._fp
try:
self._munge_cte = None
self._fp = sfp = self._new_buffer()
self._dispatch(msg)
finally:
self._fp = oldfp
munge_cte = self._munge_cte
del self._munge_cte
# If we munged the cte, copy the message again and re-fix the CTE.
if munge_cte:
msg = deepcopy(msg)
msg.replace_header('content-transfer-encoding', munge_cte[0])
msg.replace_header('content-type', munge_cte[1])
# Write the headers. First we see if the message object wants to
# handle that itself. If not, we'll do it generically.
meth = getattr(msg, '_write_headers', None)
if meth is None:
self._write_headers(msg)
else:
meth(self)
self._fp.write(sfp.getvalue())
def _dispatch(self, msg):
# Get the Content-Type: for the message, then try to dispatch to
# self._handle_<maintype>_<subtype>(). If there's no handler for the
# full MIME type, then dispatch to self._handle_<maintype>(). If
# that's missing too, then dispatch to self._writeBody().
main = msg.get_content_maintype()
sub = msg.get_content_subtype()
specific = UNDERSCORE.join((main, sub)).replace('-', '_')
meth = getattr(self, '_handle_' + specific, None)
if meth is None:
generic = main.replace('-', '_')
meth = getattr(self, '_handle_' + generic, None)
if meth is None:
meth = self._writeBody
meth(msg)
#
# Default handlers
#
def _write_headers(self, msg):
for h, v in msg.raw_items():
self.write(self.policy.fold(h, v))
# A blank line always separates headers from body
self.write(self._NL)
#
# Handlers for writing types and subtypes
#
def _handle_text(self, msg):
payload = msg.get_payload()
if payload is None:
return
if not isinstance(payload, str):
raise TypeError('string payload expected: %s' % type(payload))
if _has_surrogates(msg._payload):
charset = msg.get_param('charset')
if charset is not None:
# XXX: This copy stuff is an ugly hack to avoid modifying the
# existing message.
msg = deepcopy(msg)
del msg['content-transfer-encoding']
msg.set_payload(payload, charset)
payload = msg.get_payload()
self._munge_cte = (msg['content-transfer-encoding'],
msg['content-type'])
if self._mangle_from_:
payload = fcre.sub('>From ', payload)
self._write_lines(payload)
# Default body handler
_writeBody = _handle_text
def _handle_multipart(self, msg):
# The trick here is to write out each part separately, merge them all
# together, and then make sure that the boundary we've chosen isn't
# present in the payload.
msgtexts = []
subparts = msg.get_payload()
if subparts is None:
subparts = []
elif isinstance(subparts, str):
# e.g. a non-strict parse of a message with no starting boundary.
self.write(subparts)
return
elif not isinstance(subparts, list):
# Scalar payload
subparts = [subparts]
for part in subparts:
s = self._new_buffer()
g = self.clone(s)
g.flatten(part, unixfrom=False, linesep=self._NL)
msgtexts.append(s.getvalue())
# BAW: What about boundaries that are wrapped in double-quotes?
boundary = msg.get_boundary()
if not boundary:
# Create a boundary that doesn't appear in any of the
# message texts.
alltext = self._encoded_NL.join(msgtexts)
boundary = self._make_boundary(alltext)
msg.set_boundary(boundary)
# If there's a preamble, write it out, with a trailing CRLF
if msg.preamble is not None:
if self._mangle_from_:
preamble = fcre.sub('>From ', msg.preamble)
else:
preamble = msg.preamble
self._write_lines(preamble)
self.write(self._NL)
# dash-boundary transport-padding CRLF
self.write('--' + boundary + self._NL)
# body-part
if msgtexts:
self._fp.write(msgtexts.pop(0))
# *encapsulation
# --> delimiter transport-padding
# --> CRLF body-part
for body_part in msgtexts:
# delimiter transport-padding CRLF
self.write(self._NL + '--' + boundary + self._NL)
# body-part
self._fp.write(body_part)
# close-delimiter transport-padding
self.write(self._NL + '--' + boundary + '--' + self._NL)
if msg.epilogue is not None:
if self._mangle_from_:
epilogue = fcre.sub('>From ', msg.epilogue)
else:
epilogue = msg.epilogue
self._write_lines(epilogue)
def _handle_multipart_signed(self, msg):
# The contents of signed parts has to stay unmodified in order to keep
# the signature intact per RFC1847 2.1, so we disable header wrapping.
# RDM: This isn't enough to completely preserve the part, but it helps.
p = self.policy
self.policy = p.clone(max_line_length=0)
try:
self._handle_multipart(msg)
finally:
self.policy = p
def _handle_message_delivery_status(self, msg):
# We can't just write the headers directly to self's file object
# because this will leave an extra newline between the last header
# block and the boundary. Sigh.
blocks = []
for part in msg.get_payload():
s = self._new_buffer()
g = self.clone(s)
g.flatten(part, unixfrom=False, linesep=self._NL)
text = s.getvalue()
lines = text.split(self._encoded_NL)
# Strip off the unnecessary trailing empty line
if lines and lines[-1] == self._encoded_EMPTY:
blocks.append(self._encoded_NL.join(lines[:-1]))
else:
blocks.append(text)
# Now join all the blocks with an empty line. This has the lovely
# effect of separating each block with an empty line, but not adding
# an extra one after the last one.
self._fp.write(self._encoded_NL.join(blocks))
def _handle_message(self, msg):
s = self._new_buffer()
g = self.clone(s)
# The payload of a message/rfc822 part should be a multipart sequence
# of length 1. The zeroth element of the list should be the Message
# object for the subpart. Extract that object, stringify it, and
# write it out.
# Except, it turns out, when it's a string instead, which happens when
# and only when HeaderParser is used on a message of mime type
# message/rfc822. Such messages are generated by, for example,
# Groupwise when forwarding unadorned messages. (Issue 7970.) So
# in that case we just emit the string body.
payload = msg._payload
if isinstance(payload, list):
g.flatten(msg.get_payload(0), unixfrom=False, linesep=self._NL)
payload = s.getvalue()
else:
payload = self._encode(payload)
self._fp.write(payload)
# This used to be a module level function; we use a classmethod for this
# and _compile_re so we can continue to provide the module level function
# for backward compatibility by doing
# _make_boundary = Generator._make_boundary
# at the end of the module. It *is* internal, so we could drop that...
@classmethod
def _make_boundary(cls, text=None):
# Craft a random boundary. If text is given, ensure that the chosen
# boundary doesn't appear in the text.
token = random.randrange(sys.maxsize)
boundary = ('=' * 15) + (_fmt % token) + '=='
if text is None:
return boundary
b = boundary
counter = 0
while True:
cre = cls._compile_re('^--' + re.escape(b) + '(--)?$', re.MULTILINE)
if not cre.search(text):
break
b = boundary + '.' + str(counter)
counter += 1
return b
@classmethod
def _compile_re(cls, s, flags):
return re.compile(s, flags)
class BytesGenerator(Generator):
"""Generates a bytes version of a Message object tree.
Functionally identical to the base Generator except that the output is
bytes and not string. When surrogates were used in the input to encode
bytes, these are decoded back to bytes for output. If the policy has
cte_type set to 7bit, then the message is transformed such that the
non-ASCII bytes are properly content transfer encoded, using the charset
unknown-8bit.
The outfp object must accept bytes in its write method.
"""
# Bytes version of this constant for use in manipulating data from
# the BytesIO buffer.
_encoded_EMPTY = b''
def write(self, s):
self._fp.write(s.encode('ascii', 'surrogateescape'))
def _new_buffer(self):
return BytesIO()
def _encode(self, s):
return s.encode('ascii')
def _write_headers(self, msg):
# This is almost the same as the string version, except for handling
# strings with 8bit bytes.
for h, v in msg.raw_items():
self._fp.write(self.policy.fold_binary(h, v))
# A blank line always separates headers from body
self.write(self._NL)
def _handle_text(self, msg):
# If the string has surrogates the original source was bytes, so
# just write it back out.
if msg._payload is None:
return
if _has_surrogates(msg._payload) and self.policy.cte_type != '7bit':
if self._mangle_from_:
msg._payload = fcre.sub(">From ", msg._payload)
self._write_lines(msg._payload)
else:
super(BytesGenerator,self)._handle_text(msg)
# Default body handler
_writeBody = _handle_text
@classmethod
def _compile_re(cls, s, flags):
return re.compile(s.encode('ascii'), flags)
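# Usage sketch (not part of the original module): flattening a simple
# message to bytes with the generator defined above:
#
#   >>> from io import BytesIO
#   >>> from email.message import Message
#   >>> m = Message()
#   >>> m['Subject'] = 'hi'
#   >>> m.set_payload('body\n')
#   >>> buf = BytesIO()
#   >>> BytesGenerator(buf, mangle_from_=False).flatten(m)
#   >>> buf.getvalue()
#   b'Subject: hi\n\nbody\n'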
_FMT = '[Non-text (%(type)s) part of message omitted, filename %(filename)s]'
class DecodedGenerator(Generator):
"""Generates a text representation of a message.
Like the Generator base class, except that non-text parts are substituted
with a format string representing the part.
"""
def __init__(self, outfp, mangle_from_=True, maxheaderlen=78, fmt=None):
"""Like Generator.__init__() except that an additional optional
argument is allowed.
Walks through all subparts of a message. If the subpart is of main
type `text', then it prints the decoded payload of the subpart.
Otherwise, fmt is a format string that is used instead of the message
payload. fmt is expanded with the following keywords (in
%(keyword)s format):
type : Full MIME type of the non-text part
maintype : Main MIME type of the non-text part
subtype : Sub-MIME type of the non-text part
filename : Filename of the non-text part
description: Description associated with the non-text part
encoding : Content transfer encoding of the non-text part
The default value for fmt is None, meaning
[Non-text (%(type)s) part of message omitted, filename %(filename)s]
"""
Generator.__init__(self, outfp, mangle_from_, maxheaderlen)
if fmt is None:
self._fmt = _FMT
else:
self._fmt = fmt
def _dispatch(self, msg):
for part in msg.walk():
maintype = part.get_content_maintype()
if maintype == 'text':
print(part.get_payload(decode=False), file=self)
elif maintype == 'multipart':
# Just skip this
pass
else:
print(self._fmt % {
'type' : part.get_content_type(),
'maintype' : part.get_content_maintype(),
'subtype' : part.get_content_subtype(),
'filename' : part.get_filename('[no filename]'),
'description': part.get('Content-Description',
'[no description]'),
'encoding' : part.get('Content-Transfer-Encoding',
'[no encoding]'),
}, file=self)
# Helper used by Generator._make_boundary
_width = len(repr(sys.maxsize-1))
_fmt = '%%0%dd' % _width
# Backward compatibility
_make_boundary = Generator._make_boundary
# Copyright (C) 2002-2007 Python Software Foundation
# Author: Ben Gertzfield, Barry Warsaw
# Contact: [email protected]
"""Header encoding and decoding functionality."""
__all__ = [
'Header',
'decode_header',
'make_header',
]
import re
import binascii
import email.quoprimime
import email.base64mime
from email.errors import HeaderParseError
from email import charset as _charset
Charset = _charset.Charset
NL = '\n'
SPACE = ' '
BSPACE = b' '
SPACE8 = ' ' * 8
EMPTYSTRING = ''
MAXLINELEN = 78
FWS = ' \t'
USASCII = Charset('us-ascii')
UTF8 = Charset('utf-8')
# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
=\? # literal =?
(?P<charset>[^?]*?) # non-greedy up to the next ? is the charset
\? # literal ?
(?P<encoding>[qb]) # either a "q" or a "b", case insensitive
\? # literal ?
(?P<encoded>.*?) # non-greedy up to the next ?= is the encoded string
\?= # literal ?=
''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
# Field name regexp, including trailing colon, but not separating whitespace,
# according to RFC 2822. Character range is from tilde to exclamation mark.
# For use with .match()
fcre = re.compile(r'[\041-\176]+:$')
# Find a header embedded in a putative header value. Used to check for
# header injection attack.
_embeded_header = re.compile(r'\n[^ \t]+:')
# Helpers
_max_append = email.quoprimime._max_append
def decode_header(header):
"""Decode a message header value without converting charset.
Returns a list of (string, charset) pairs containing each of the decoded
parts of the header. Charset is None for non-encoded parts of the header,
otherwise a lower-case string containing the name of the character set
specified in the encoded string.
header may be a string that may or may not contain RFC2047 encoded words,
or it may be a Header object.
An email.errors.HeaderParseError may be raised when certain decoding error
occurs (e.g. a base64 decoding exception).
"""
# If it is a Header object, we can just return the encoded chunks.
if hasattr(header, '_chunks'):
return [(_charset._encode(string, str(charset)), str(charset))
for string, charset in header._chunks]
# If no encoding, just return the header with no charset.
if not ecre.search(header):
return [(header, None)]
# First step is to parse all the encoded parts into triplets of the form
# (encoded_string, encoding, charset). For unencoded strings, the last
# two parts will be None.
words = []
for line in header.splitlines():
parts = ecre.split(line)
first = True
while parts:
unencoded = parts.pop(0)
if first:
unencoded = unencoded.lstrip()
first = False
if unencoded:
words.append((unencoded, None, None))
if parts:
charset = parts.pop(0).lower()
encoding = parts.pop(0).lower()
encoded = parts.pop(0)
words.append((encoded, encoding, charset))
# Now loop over words and remove words that consist of whitespace
# between two encoded strings.
droplist = []
for n, w in enumerate(words):
if n>1 and w[1] and words[n-2][1] and words[n-1][0].isspace():
droplist.append(n-1)
for d in reversed(droplist):
del words[d]
# The next step is to decode each encoded word by applying the reverse
# base64 or quopri transformation. decoded_words is now a list of the
# form (decoded_word, charset).
decoded_words = []
for encoded_string, encoding, charset in words:
if encoding is None:
# This is an unencoded word.
decoded_words.append((encoded_string, charset))
elif encoding == 'q':
word = email.quoprimime.header_decode(encoded_string)
decoded_words.append((word, charset))
elif encoding == 'b':
paderr = len(encoded_string) % 4 # Postel's law: add missing padding
if paderr:
encoded_string += '==='[:4 - paderr]
try:
word = email.base64mime.decode(encoded_string)
except binascii.Error:
raise HeaderParseError('Base64 decoding error')
else:
decoded_words.append((word, charset))
else:
raise AssertionError('Unexpected encoding: ' + encoding)
# Now convert all words to bytes and collapse consecutive runs of
# similarly encoded words.
collapsed = []
last_word = last_charset = None
for word, charset in decoded_words:
if isinstance(word, str):
word = bytes(word, 'raw-unicode-escape')
if last_word is None:
last_word = word
last_charset = charset
elif charset != last_charset:
collapsed.append((last_word, last_charset))
last_word = word
last_charset = charset
elif last_charset is None:
last_word += BSPACE + word
else:
last_word += word
collapsed.append((last_word, last_charset))
return collapsed
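# Illustrative usage sketch (a hedged example, not part of the original
# module): decode_header() returns (bytes, charset) pairs for encoded words,
# and a header with no encoded words comes back as a single (str, None) pair.
#
#   >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
#   [(b'p\xf6stal', 'iso-8859-1')]
#   >>> decode_header('plain text')
#   [('plain text', None)]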
def make_header(decoded_seq, maxlinelen=None, header_name=None,
continuation_ws=' '):
"""Create a Header from a sequence of pairs as returned by decode_header()
decode_header() takes a header value string and returns a sequence of
pairs of the format (decoded_string, charset) where charset is the string
name of the character set.
This function takes one of those sequence of pairs and returns a Header
instance. Optional maxlinelen, header_name, and continuation_ws are as in
the Header constructor.
"""
h = Header(maxlinelen=maxlinelen, header_name=header_name,
continuation_ws=continuation_ws)
for s, charset in decoded_seq:
# None means us-ascii but we can simply pass it on to h.append()
if charset is not None and not isinstance(charset, Charset):
charset = Charset(charset)
h.append(s, charset)
return h
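# Illustrative usage sketch (a hedged example, not part of the original
# module): decode_header() and make_header() round-trip a raw RFC 2047
# value into a Header object.
#
#   >>> str(make_header(decode_header('=?iso-8859-1?q?p=F6stal?=')))
#   'pöstal'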
class Header:
def __init__(self, s=None, charset=None,
maxlinelen=None, header_name=None,
continuation_ws=' ', errors='strict'):
"""Create a MIME-compliant header that can contain many character sets.
Optional s is the initial header value. If None, the initial header
value is not set. You can later append to the header with .append()
method calls. s may be a byte string or a Unicode string, but see the
.append() documentation for semantics.
Optional charset serves two purposes: it has the same meaning as the
charset argument to the .append() method. It also sets the default
character set for all subsequent .append() calls that omit the charset
argument. If charset is not provided in the constructor, the us-ascii
charset is used both as s's initial charset and as the default for
subsequent .append() calls.
The maximum line length can be specified explicitly via maxlinelen. For
splitting the first line to a shorter value (to account for the field
header which isn't included in s, e.g. `Subject'), pass in the name of
the field in header_name. The default maxlinelen is 78 as recommended
by RFC 2822.
continuation_ws must be RFC 2822 compliant folding whitespace (usually
either a space or a hard tab) which will be prepended to continuation
lines.
errors is passed through to the .append() call.
"""
if charset is None:
charset = USASCII
elif not isinstance(charset, Charset):
charset = Charset(charset)
self._charset = charset
self._continuation_ws = continuation_ws
self._chunks = []
if s is not None:
self.append(s, charset, errors)
if maxlinelen is None:
maxlinelen = MAXLINELEN
self._maxlinelen = maxlinelen
if header_name is None:
self._headerlen = 0
else:
# Take the separating colon and space into account.
self._headerlen = len(header_name) + 2
def __str__(self):
"""Return the string value of the header."""
self._normalize()
uchunks = []
lastcs = None
lastspace = None
for string, charset in self._chunks:
# We must preserve spaces between encoded and non-encoded word
# boundaries, which means for us we need to add a space when we go
# from a charset to None/us-ascii, or from None/us-ascii to a
# charset. Only do this for the second and subsequent chunks.
# Don't add a space if the None/us-ascii string already has
# a space (trailing or leading depending on transition)
nextcs = charset
if nextcs == _charset.UNKNOWN8BIT:
original_bytes = string.encode('ascii', 'surrogateescape')
string = original_bytes.decode('ascii', 'replace')
if uchunks:
hasspace = string and self._nonctext(string[0])
if lastcs not in (None, 'us-ascii'):
if nextcs in (None, 'us-ascii') and not hasspace:
uchunks.append(SPACE)
nextcs = None
elif nextcs not in (None, 'us-ascii') and not lastspace:
uchunks.append(SPACE)
lastspace = string and self._nonctext(string[-1])
lastcs = nextcs
uchunks.append(string)
return EMPTYSTRING.join(uchunks)
# Rich comparison operators for equality only. BAW: does it make sense to
# have or explicitly disable <, <=, >, >= operators?
def __eq__(self, other):
# other may be a Header or a string. Both are fine so coerce
# ourselves to a unicode (of the unencoded header value), swap the
# args and do another comparison.
return other == str(self)
def __ne__(self, other):
return not self == other
def append(self, s, charset=None, errors='strict'):
"""Append a string to the MIME header.
Optional charset, if given, should be a Charset instance or the name
of a character set (which will be converted to a Charset instance). A
value of None (the default) means that the charset given in the
constructor is used.
s may be a byte string or a Unicode string. If it is a byte string
(i.e. isinstance(s, str) is false), then charset is the encoding of
that byte string, and a UnicodeError will be raised if the string
cannot be decoded with that charset. If s is a Unicode string, then
charset is a hint specifying the character set of the characters in
the string. In either case, when producing an RFC 2822 compliant
header using RFC 2047 rules, the string will be encoded using the
output codec of the charset. If the string cannot be encoded to the
output codec, a UnicodeError will be raised.
Optional `errors' is passed as the errors argument to the decode
call if s is a byte string.
"""
if charset is None:
charset = self._charset
elif not isinstance(charset, Charset):
charset = Charset(charset)
if not isinstance(s, str):
input_charset = charset.input_codec or 'us-ascii'
if input_charset == _charset.UNKNOWN8BIT:
s = s.decode('us-ascii', 'surrogateescape')
else:
s = s.decode(input_charset, errors)
# Ensure that the bytes we're storing can be decoded to the output
# character set, otherwise an early error is raised.
output_charset = charset.output_codec or 'us-ascii'
if output_charset != _charset.UNKNOWN8BIT:
try:
s.encode(output_charset, errors)
except UnicodeEncodeError:
if output_charset!='us-ascii':
raise
charset = UTF8
self._chunks.append((s, charset))
def _nonctext(self, s):
"""True if string s is not a ctext character of RFC822.
"""
return s.isspace() or s in ('(', ')', '\\')
def encode(self, splitchars=';, \t', maxlinelen=None, linesep='\n'):
r"""Encode a message header into an RFC-compliant format.
There are many issues involved in converting a given string for use in
an email header. Only certain character sets are readable in most
email clients, and as header strings can only contain a subset of
7-bit ASCII, care must be taken to properly convert and encode (with
Base64 or quoted-printable) header strings. In addition, there is a
75-character length limit on any given encoded header field, so
line-wrapping must be performed, even with double-byte character sets.
Optional maxlinelen specifies the maximum length of each generated
line, exclusive of the linesep string. Individual lines may be longer
than maxlinelen if a folding point cannot be found. The first line
will be shorter by the length of the header name plus ": " if a header
name was specified at Header construction time. The default value for
maxlinelen is determined at header construction time.
Optional splitchars is a string containing characters which should be
given extra weight by the splitting algorithm during normal header
wrapping. This is in very rough support of RFC 2822's `higher level
syntactic breaks': split points preceded by a splitchar are preferred
during line splitting, with the characters preferred in the order in
which they appear in the string. Space and tab may be included in the
string to indicate whether preference should be given to one over the
other as a split point when other split chars do not appear in the line
being split. Splitchars does not affect RFC 2047 encoded lines.
Optional linesep is a string to be used to separate the lines of
the value. The default value is the most useful for typical
Python applications, but it can be set to \r\n to produce RFC-compliant
line separators when needed.
"""
self._normalize()
if maxlinelen is None:
maxlinelen = self._maxlinelen
# A maxlinelen of 0 means don't wrap. For all practical purposes,
# choosing a huge number here accomplishes that and makes the
# _ValueFormatter algorithm much simpler.
if maxlinelen == 0:
maxlinelen = 1000000
formatter = _ValueFormatter(self._headerlen, maxlinelen,
self._continuation_ws, splitchars)
lastcs = None
hasspace = lastspace = None
for string, charset in self._chunks:
if hasspace is not None:
hasspace = string and self._nonctext(string[0])
if lastcs not in (None, 'us-ascii'):
if not hasspace or charset not in (None, 'us-ascii'):
formatter.add_transition()
elif charset not in (None, 'us-ascii') and not lastspace:
formatter.add_transition()
lastspace = string and self._nonctext(string[-1])
lastcs = charset
hasspace = False
lines = string.splitlines()
if lines:
formatter.feed('', lines[0], charset)
else:
formatter.feed('', '', charset)
for line in lines[1:]:
formatter.newline()
if charset.header_encoding is not None:
formatter.feed(self._continuation_ws, ' ' + line.lstrip(),
charset)
else:
sline = line.lstrip()
fws = line[:len(line)-len(sline)]
formatter.feed(fws, sline, charset)
if len(lines) > 1:
formatter.newline()
if self._chunks:
formatter.add_transition()
value = formatter._str(linesep)
if _embedded_header.search(value):
raise HeaderParseError("header value appears to contain "
"an embedded header: {!r}".format(value))
return value
def _normalize(self):
# Step 1: Normalize the chunks so that all runs of identical charsets
# get collapsed into a single unicode string.
chunks = []
last_charset = None
last_chunk = []
for string, charset in self._chunks:
if charset == last_charset:
last_chunk.append(string)
else:
if last_charset is not None:
chunks.append((SPACE.join(last_chunk), last_charset))
last_chunk = [string]
last_charset = charset
if last_chunk:
chunks.append((SPACE.join(last_chunk), last_charset))
self._chunks = chunks
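# Illustrative usage sketch (a hedged example, not part of the original
# module): build a mixed-charset header and fold it into RFC 2047 form.
# The expected output assumes the default maxlinelen of 78.
#
#   >>> h = Header('Hello ', charset='us-ascii', header_name='Subject')
#   >>> h.append('pöstal', charset='iso-8859-1')
#   >>> h.encode()
#   'Hello =?iso-8859-1?q?p=F6stal?='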
class _ValueFormatter:
def __init__(self, headerlen, maxlen, continuation_ws, splitchars):
self._maxlen = maxlen
self._continuation_ws = continuation_ws
self._continuation_ws_len = len(continuation_ws)
self._splitchars = splitchars
self._lines = []
self._current_line = _Accumulator(headerlen)
def _str(self, linesep):
self.newline()
return linesep.join(self._lines)
def __str__(self):
return self._str(NL)
def newline(self):
end_of_line = self._current_line.pop()
if end_of_line != (' ', ''):
self._current_line.push(*end_of_line)
if len(self._current_line) > 0:
if self._current_line.is_onlyws():
self._lines[-1] += str(self._current_line)
else:
self._lines.append(str(self._current_line))
self._current_line.reset()
def add_transition(self):
self._current_line.push(' ', '')
def feed(self, fws, string, charset):
# If the charset has no header encoding (i.e. it is an ASCII encoding)
# then we must split the header at the "highest level syntactic break"
# possible. Note that we don't have a lot of smarts about field
# syntax; we just try to break on semi-colons, then commas, then
# whitespace. Eventually, this should be pluggable.
if charset.header_encoding is None:
self._ascii_split(fws, string, self._splitchars)
return
# Otherwise, we're doing either a Base64 or a quoted-printable
# encoding which means we don't need to split the line on syntactic
# breaks. We can basically just find enough characters to fit on the
# current line, minus the RFC 2047 chrome. What makes this trickier
# though is that we have to split at octet boundaries, not character
# boundaries but it's only safe to split at character boundaries so at
# best we can only get close.
encoded_lines = charset.header_encode_lines(string, self._maxlengths())
# The first element extends the current line, but if it's None then
# nothing more fit on the current line so start a new line.
try:
first_line = encoded_lines.pop(0)
except IndexError:
# There are no encoded lines, so we're done.
return
if first_line is not None:
self._append_chunk(fws, first_line)
try:
last_line = encoded_lines.pop()
except IndexError:
# There was only one line.
return
self.newline()
self._current_line.push(self._continuation_ws, last_line)
# Everything else are full lines in themselves.
for line in encoded_lines:
self._lines.append(self._continuation_ws + line)
def _maxlengths(self):
# The first line's length.
yield self._maxlen - len(self._current_line)
while True:
yield self._maxlen - self._continuation_ws_len
def _ascii_split(self, fws, string, splitchars):
# The RFC 2822 header folding algorithm is simple in principle but
# complex in practice. Lines may be folded any place where "folding
# white space" appears by inserting a linesep character in front of the
# FWS. The complication is that not all spaces or tabs qualify as FWS,
# and we are also supposed to prefer to break at "higher level
# syntactic breaks". We can't do either of these without intimate
# knowledge of the structure of structured headers, which we don't have
# here. So the best we can do here is prefer to break at the specified
# splitchars, and hope that we don't choose any spaces or tabs that
# aren't legal FWS. (This is at least better than the old algorithm,
# where we would sometimes *introduce* FWS after a splitchar, or the
# algorithm before that, where we would turn all white space runs into
# single spaces or tabs.)
parts = re.split("(["+FWS+"]+)", fws+string)
if parts[0]:
parts[:0] = ['']
else:
parts.pop(0)
for fws, part in zip(*[iter(parts)]*2):
self._append_chunk(fws, part)
def _append_chunk(self, fws, string):
self._current_line.push(fws, string)
if len(self._current_line) > self._maxlen:
# Find the best split point, working backward from the end.
# There might be none, on a long first line.
for ch in self._splitchars:
for i in range(self._current_line.part_count()-1, 0, -1):
if ch.isspace():
fws = self._current_line[i][0]
if fws and fws[0]==ch:
break
prevpart = self._current_line[i-1][1]
if prevpart and prevpart[-1]==ch:
break
else:
continue
break
else:
fws, part = self._current_line.pop()
if self._current_line._initial_size > 0:
# There will be a header, so leave it on a line by itself.
self.newline()
if not fws:
# We don't use continuation_ws here because the whitespace
# after a header should always be a space.
fws = ' '
self._current_line.push(fws, part)
return
remainder = self._current_line.pop_from(i)
self._lines.append(str(self._current_line))
self._current_line.reset(remainder)
class _Accumulator(list):
def __init__(self, initial_size=0):
self._initial_size = initial_size
super().__init__()
def push(self, fws, string):
self.append((fws, string))
def pop_from(self, i=0):
popped = self[i:]
self[i:] = []
return popped
def pop(self):
if self.part_count()==0:
return ('', '')
return super().pop()
def __len__(self):
return sum((len(fws)+len(part) for fws, part in self),
self._initial_size)
def __str__(self):
return EMPTYSTRING.join((EMPTYSTRING.join((fws, part))
for fws, part in self))
def reset(self, startval=None):
if startval is None:
startval = []
self[:] = startval
self._initial_size = 0
def is_onlyws(self):
return self._initial_size==0 and (not self or str(self).isspace())
def part_count(self):
return super().__len__()
"""Representing and manipulating email headers via custom objects.
This module provides an implementation of the HeaderRegistry API.
The implementation is designed to flexibly follow RFC5322 rules.
Eventually HeaderRegistry will be a public API, but it isn't yet,
and will probably change some before that happens.
"""
from types import MappingProxyType
from email import utils
from email import errors
from email import _header_value_parser as parser
class Address:
def __init__(self, display_name='', username='', domain='', addr_spec=None):
"""Create an object represeting a full email address.
An address can have a 'display_name', a 'username', and a 'domain'. In
addition to specifying the username and domain separately, they may be
specified together by using the addr_spec keyword *instead of* the
username and domain keywords. If an addr_spec string is specified it
must be properly quoted according to RFC 5322 rules; an error will be
raised if it is not.
An Address object has display_name, username, domain, and addr_spec
attributes, all of which are read-only. The addr_spec and the string
value of the object are both quoted according to RFC5322 rules, but
without any Content Transfer Encoding.
"""
# This clause with its potential 'raise' may only happen when an
# application program creates an Address object using an addr_spec
# keyword. The email library code itself must always supply username
# and domain.
if addr_spec is not None:
if username or domain:
raise TypeError("addrspec specified when username and/or "
"domain also specified")
a_s, rest = parser.get_addr_spec(addr_spec)
if rest:
raise ValueError("Invalid addr_spec; only '{}' "
"could be parsed from '{}'".format(
a_s, addr_spec))
if a_s.all_defects:
raise a_s.all_defects[0]
username = a_s.local_part
domain = a_s.domain
self._display_name = display_name
self._username = username
self._domain = domain
@property
def display_name(self):
return self._display_name
@property
def username(self):
return self._username
@property
def domain(self):
return self._domain
@property
def addr_spec(self):
"""The addr_spec (username@domain) portion of the address, quoted
according to RFC 5322 rules, but with no Content Transfer Encoding.
"""
nameset = set(self.username)
if len(nameset) > len(nameset-parser.DOT_ATOM_ENDS):
lp = parser.quote_string(self.username)
else:
lp = self.username
if self.domain:
return lp + '@' + self.domain
if not lp:
return '<>'
return lp
def __repr__(self):
return "Address(display_name={!r}, username={!r}, domain={!r})".format(
self.display_name, self.username, self.domain)
def __str__(self):
nameset = set(self.display_name)
if len(nameset) > len(nameset-parser.SPECIALS):
disp = parser.quote_string(self.display_name)
else:
disp = self.display_name
if disp:
addr_spec = '' if self.addr_spec=='<>' else self.addr_spec
return "{} <{}>".format(disp, addr_spec)
return self.addr_spec
def __eq__(self, other):
if type(other) != type(self):
return False
return (self.display_name == other.display_name and
self.username == other.username and
self.domain == other.domain)
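# Illustrative usage sketch (a hedged example, not part of the original
# module): an addr_spec is parsed into its username and domain parts, and
# str() gives the RFC 5322 form.
#
#   >>> a = Address(display_name='Jane Doe', addr_spec='[email protected]')
#   >>> a.username, a.domain
#   ('jane', 'example.com')
#   >>> str(a)
#   'Jane Doe <[email protected]>'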
class Group:
def __init__(self, display_name=None, addresses=None):
"""Create an object representing an address group.
An address group consists of a display_name followed by a colon and a
list of addresses (see Address) terminated by a semi-colon. The Group
is created by specifying a display_name and a possibly empty list of
Address objects. A Group can also be used to represent a single
address that is not in a group, which is convenient when manipulating
lists that are a combination of Groups and individual Addresses. In
this case the display_name should be set to None. In particular, the
string representation of a Group whose display_name is None is the same
as the Address object, if there is one and only one Address object in
the addresses list.
"""
self._display_name = display_name
self._addresses = tuple(addresses) if addresses else tuple()
@property
def display_name(self):
return self._display_name
@property
def addresses(self):
return self._addresses
def __repr__(self):
return "Group(display_name={!r}, addresses={!r}".format(
self.display_name, self.addresses)
def __str__(self):
if self.display_name is None and len(self.addresses)==1:
return str(self.addresses[0])
disp = self.display_name
if disp is not None:
nameset = set(disp)
if len(nameset) > len(nameset-parser.SPECIALS):
disp = parser.quote_string(disp)
adrstr = ", ".join(str(x) for x in self.addresses)
adrstr = ' ' + adrstr if adrstr else adrstr
return "{}:{};".format(disp, adrstr)
def __eq__(self, other):
if type(other) != type(self):
return False
return (self.display_name == other.display_name and
self.addresses == other.addresses)
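# Illustrative usage sketch (a hedged example, not part of the original
# module): a named Group renders as 'display_name: address, ...;'.
#
#   >>> g = Group('friends', [Address('Jane', 'jane', 'example.com'),
#   ...                       Address('Bob', 'bob', 'example.com')])
#   >>> str(g)
#   'friends: Jane <[email protected]>, Bob <[email protected]>;'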
# Header Classes #
class BaseHeader(str):
"""Base class for message headers.
Implements generic behavior and provides tools for subclasses.
A subclass must define a classmethod named 'parse' that takes an unfolded
value string and a dictionary as its arguments. The dictionary will
contain one key, 'defects', initialized to an empty list. After the call
the dictionary must contain two additional keys: 'parse_tree', set to the
parse tree obtained from parsing the header, and 'decoded', set to the
string value of the idealized representation of the data from the value.
(That is, encoded words are decoded, and values that have canonical
representations are so represented.)
The defects key is intended to collect parsing defects, which the message
parser will subsequently dispose of as appropriate. The parser should not,
insofar as practical, raise any errors. Defects should be added to the
list instead. The standard header parsers register defects for RFC
compliance issues, for obsolete RFC syntax, and for unrecoverable parsing
errors.
The parse method may add additional keys to the dictionary. In this case
the subclass must define an 'init' method, which will be passed the
dictionary as its keyword arguments. The method should use (usually by
setting them as the value of similarly named attributes) and remove all the
extra keys added by its parse method, and then use super to call its parent
class with the remaining arguments and keywords.
The subclass should also make sure that a 'max_count' attribute is defined
that is either None or 1. XXX: need to better define this API.
"""
def __new__(cls, name, value):
kwds = {'defects': []}
cls.parse(value, kwds)
if utils._has_surrogates(kwds['decoded']):
kwds['decoded'] = utils._sanitize(kwds['decoded'])
self = str.__new__(cls, kwds['decoded'])
del kwds['decoded']
self.init(name, **kwds)
return self
def init(self, name, *, parse_tree, defects):
self._name = name
self._parse_tree = parse_tree
self._defects = defects
@property
def name(self):
return self._name
@property
def defects(self):
return tuple(self._defects)
def __reduce__(self):
return (
_reconstruct_header,
(
self.__class__.__name__,
self.__class__.__bases__,
str(self),
),
self.__dict__)
@classmethod
def _reconstruct(cls, value):
return str.__new__(cls, value)
def fold(self, *, policy):
"""Fold header according to policy.
The parsed representation of the header is folded according to
RFC5322 rules, as modified by the policy. If the parse tree
contains surrogateescaped bytes, the bytes are CTE encoded using
the charset 'unknown-8bit'.
Any non-ASCII characters in the parse tree are CTE encoded using
charset utf-8. XXX: make this a policy setting.
The returned value is an ASCII-only string possibly containing linesep
characters, and ending with a linesep character. The string includes
the header name and the ': ' separator.
"""
# At some point we need to only put fws here if it was in the source.
header = parser.Header([
parser.HeaderLabel([
parser.ValueTerminal(self.name, 'header-name'),
parser.ValueTerminal(':', 'header-sep')]),
parser.CFWSList([parser.WhiteSpaceTerminal(' ', 'fws')]),
self._parse_tree])
return header.fold(policy=policy)
def _reconstruct_header(cls_name, bases, value):
return type(cls_name, bases, {})._reconstruct(value)
class UnstructuredHeader:
max_count = None
value_parser = staticmethod(parser.get_unstructured)
@classmethod
def parse(cls, value, kwds):
kwds['parse_tree'] = cls.value_parser(value)
kwds['decoded'] = str(kwds['parse_tree'])
class UniqueUnstructuredHeader(UnstructuredHeader):
max_count = 1
class DateHeader:
"""Header whose value consists of a single timestamp.
Provides an additional attribute, datetime, which is either an aware
datetime using a timezone, or a naive datetime if the timezone
in the input string is -0000. Also accepts a datetime as input.
The 'value' attribute is the normalized form of the timestamp,
which means it is the output of format_datetime on the datetime.
"""
max_count = None
# This is used only for folding, not for creating 'decoded'.
value_parser = staticmethod(parser.get_unstructured)
@classmethod
def parse(cls, value, kwds):
if not value:
kwds['defects'].append(errors.HeaderMissingRequiredValue())
kwds['datetime'] = None
kwds['decoded'] = ''
kwds['parse_tree'] = parser.TokenList()
return
if isinstance(value, str):
value = utils.parsedate_to_datetime(value)
kwds['datetime'] = value
kwds['decoded'] = utils.format_datetime(kwds['datetime'])
kwds['parse_tree'] = cls.value_parser(kwds['decoded'])
def init(self, *args, **kw):
self._datetime = kw.pop('datetime')
super().init(*args, **kw)
@property
def datetime(self):
return self._datetime
class UniqueDateHeader(DateHeader):
max_count = 1
class AddressHeader:
max_count = None
@staticmethod
def value_parser(value):
address_list, value = parser.get_address_list(value)
assert not value, 'this should not happen'
return address_list
@classmethod
def parse(cls, value, kwds):
if isinstance(value, str):
# We are translating here from the RFC language (address/mailbox)
# to our API language (group/address).
kwds['parse_tree'] = address_list = cls.value_parser(value)
groups = []
for addr in address_list.addresses:
groups.append(Group(addr.display_name,
[Address(mb.display_name or '',
mb.local_part or '',
mb.domain or '')
for mb in addr.all_mailboxes]))
defects = list(address_list.all_defects)
else:
# Assume it is Address/Group stuff
if not hasattr(value, '__iter__'):
value = [value]
groups = [Group(None, [item]) if not hasattr(item, 'addresses')
else item
for item in value]
defects = []
kwds['groups'] = groups
kwds['defects'] = defects
kwds['decoded'] = ', '.join([str(item) for item in groups])
if 'parse_tree' not in kwds:
kwds['parse_tree'] = cls.value_parser(kwds['decoded'])
def init(self, *args, **kw):
self._groups = tuple(kw.pop('groups'))
self._addresses = None
super().init(*args, **kw)
@property
def groups(self):
return self._groups
@property
def addresses(self):
if self._addresses is None:
self._addresses = tuple([address for group in self._groups
for address in group.addresses])
return self._addresses
class UniqueAddressHeader(AddressHeader):
max_count = 1
class SingleAddressHeader(AddressHeader):
@property
def address(self):
if len(self.addresses)!=1:
raise ValueError(("value of single address header {} is not "
"a single address").format(self.name))
return self.addresses[0]
class UniqueSingleAddressHeader(SingleAddressHeader):
max_count = 1
class MIMEVersionHeader:
max_count = 1
value_parser = staticmethod(parser.parse_mime_version)
@classmethod
def parse(cls, value, kwds):
kwds['parse_tree'] = parse_tree = cls.value_parser(value)
kwds['decoded'] = str(parse_tree)
kwds['defects'].extend(parse_tree.all_defects)
kwds['major'] = None if parse_tree.minor is None else parse_tree.major
kwds['minor'] = parse_tree.minor
if parse_tree.minor is not None:
kwds['version'] = '{}.{}'.format(kwds['major'], kwds['minor'])
else:
kwds['version'] = None
def init(self, *args, **kw):
self._version = kw.pop('version')
self._major = kw.pop('major')
self._minor = kw.pop('minor')
super().init(*args, **kw)
@property
def major(self):
return self._major
@property
def minor(self):
return self._minor
@property
def version(self):
return self._version
class ParameterizedMIMEHeader:
# Mixin that handles the params dict. Must be subclassed, and a
# value_parser property for the specific header must be provided.
max_count = 1
@classmethod
def parse(cls, value, kwds):
kwds['parse_tree'] = parse_tree = cls.value_parser(value)
kwds['decoded'] = str(parse_tree)
kwds['defects'].extend(parse_tree.all_defects)
if parse_tree.params is None:
kwds['params'] = {}
else:
# The MIME RFCs specify that parameter ordering is arbitrary.
kwds['params'] = {utils._sanitize(name).lower():
utils._sanitize(value)
for name, value in parse_tree.params}
def init(self, *args, **kw):
self._params = kw.pop('params')
super().init(*args, **kw)
@property
def params(self):
return MappingProxyType(self._params)
class ContentTypeHeader(ParameterizedMIMEHeader):
value_parser = staticmethod(parser.parse_content_type_header)
def init(self, *args, **kw):
super().init(*args, **kw)
self._maintype = utils._sanitize(self._parse_tree.maintype)
self._subtype = utils._sanitize(self._parse_tree.subtype)
@property
def maintype(self):
return self._maintype
@property
def subtype(self):
return self._subtype
@property
def content_type(self):
return self.maintype + '/' + self.subtype
class ContentDispositionHeader(ParameterizedMIMEHeader):
value_parser = staticmethod(parser.parse_content_disposition_header)
def init(self, *args, **kw):
super().init(*args, **kw)
cd = self._parse_tree.content_disposition
self._content_disposition = cd if cd is None else utils._sanitize(cd)
@property
def content_disposition(self):
return self._content_disposition
class ContentTransferEncodingHeader:
max_count = 1
value_parser = staticmethod(parser.parse_content_transfer_encoding_header)
@classmethod
def parse(cls, value, kwds):
kwds['parse_tree'] = parse_tree = cls.value_parser(value)
kwds['decoded'] = str(parse_tree)
kwds['defects'].extend(parse_tree.all_defects)
def init(self, *args, **kw):
super().init(*args, **kw)
self._cte = utils._sanitize(self._parse_tree.cte)
@property
def cte(self):
return self._cte
# The header factory #
_default_header_map = {
'subject': UniqueUnstructuredHeader,
'date': UniqueDateHeader,
'resent-date': DateHeader,
'orig-date': UniqueDateHeader,
'sender': UniqueSingleAddressHeader,
'resent-sender': SingleAddressHeader,
'to': UniqueAddressHeader,
'resent-to': AddressHeader,
'cc': UniqueAddressHeader,
'resent-cc': AddressHeader,
'bcc': UniqueAddressHeader,
'resent-bcc': AddressHeader,
'from': UniqueAddressHeader,
'resent-from': AddressHeader,
'reply-to': UniqueAddressHeader,
'mime-version': MIMEVersionHeader,
'content-type': ContentTypeHeader,
'content-disposition': ContentDispositionHeader,
'content-transfer-encoding': ContentTransferEncodingHeader,
}
class HeaderRegistry:
"""A header_factory and header registry."""
def __init__(self, base_class=BaseHeader, default_class=UnstructuredHeader,
use_default_map=True):
"""Create a header_factory that works with the Policy API.
base_class is the class that will be the last class in the created
header class's __bases__ list. default_class is the class that will be
used if "name" (see __call__) does not appear in the registry.
use_default_map controls whether or not the default mapping of names to
specialized classes is copied into the registry when the factory is
created. The default is True.
"""
self.registry = {}
self.base_class = base_class
self.default_class = default_class
if use_default_map:
self.registry.update(_default_header_map)
def map_to_type(self, name, cls):
"""Register cls as the specialized class for handling "name" headers.
"""
self.registry[name.lower()] = cls
def __getitem__(self, name):
cls = self.registry.get(name.lower(), self.default_class)
return type('_'+cls.__name__, (cls, self.base_class), {})
def __call__(self, name, value):
"""Create a header instance for header 'name' from 'value'.
Creates a header instance by creating a specialized class for parsing
and representing the specified header by combining the factory
base_class with a specialized class from the registry or the
default_class, and passing the name and value to the constructed
class's constructor.
"""
return self[name](name, value)
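# Illustrative usage sketch (a hedged example, not part of the original
# module): the registry is a factory that combines base_class with the
# specialized class registered for the given header name.
#
#   >>> registry = HeaderRegistry()
#   >>> h = registry('Subject', 'Hello')
#   >>> type(h).__name__, h.name
#   ('_UniqueUnstructuredHeader', 'Subject')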
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Various types of useful iterators and generators."""
__all__ = [
'body_line_iterator',
'typed_subpart_iterator',
'walk',
# Do not include _structure() since it's part of the debugging API.
]
import sys
from io import StringIO
# This function will become a method of the Message class
def walk(self):
"""Walk over the message tree, yielding each subpart.
The walk is performed in depth-first order. This method is a
generator.
"""
yield self
if self.is_multipart():
for subpart in self.get_payload():
yield from subpart.walk()
# These two functions are imported into the Iterators.py interface module.
def body_line_iterator(msg, decode=False):
"""Iterate over the parts, returning string payloads line-by-line.
Optional decode (default False) is passed through to .get_payload().
"""
for subpart in msg.walk():
payload = subpart.get_payload(decode=decode)
if isinstance(payload, str):
yield from StringIO(payload)
def typed_subpart_iterator(msg, maintype='text', subtype=None):
"""Iterate over the subparts with a given MIME type.
Use `maintype' as the main MIME type to match against; this defaults to
"text". Optional `subtype' is the MIME subtype to match against; if
omitted, only the main type is matched.
"""
for subpart in msg.walk():
if subpart.get_content_maintype() == maintype:
if subtype is None or subpart.get_content_subtype() == subtype:
yield subpart
def _structure(msg, fp=None, level=0, include_default=False):
"""A handy debugging aid"""
if fp is None:
fp = sys.stdout
tab = ' ' * (level * 4)
print(tab + msg.get_content_type(), end='', file=fp)
if include_default:
print(' [%s]' % msg.get_default_type(), file=fp)
else:
print(file=fp)
if msg.is_multipart():
for subpart in msg.get_payload():
_structure(subpart, fp, level+1, include_default)
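# Illustrative usage sketch (a hedged example, not part of the original
# module), assuming the top-level email.message_from_string() helper is
# available:
#
#   >>> import email
#   >>> msg = email.message_from_string('Subject: hi\n\nline one\nline two\n')
#   >>> list(body_line_iterator(msg))
#   ['line one\n', 'line two\n']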
# Copyright (C) 2001-2007 Python Software Foundation
# Author: Barry Warsaw, Thomas Wouters, Anthony Baxter
# Contact: [email protected]
"""A parser of RFC 2822 and MIME email messages."""
__all__ = ['Parser', 'HeaderParser', 'BytesParser', 'BytesHeaderParser',
'FeedParser', 'BytesFeedParser']
from io import StringIO, TextIOWrapper
from email.feedparser import FeedParser, BytesFeedParser
from email._policybase import compat32
class Parser:
def __init__(self, _class=None, *, policy=compat32):
"""Parser of RFC 2822 and MIME email messages.
Creates an in-memory object tree representing the email message, which
can then be manipulated and turned over to a Generator to return the
textual representation of the message.
The string must be formatted as a block of RFC 2822 headers and header
continuation lines, optionally preceded by a `Unix-from' header. The
header block is terminated either by the end of the string or by a
blank line.
_class is the class to instantiate for new message objects when they
must be created. This class must have a constructor that can take
zero arguments. Default is Message.Message.
The policy keyword specifies a policy object that controls a number of
aspects of the parser's operation. The default policy maintains
backward compatibility.
"""
self._class = _class
self.policy = policy
def parse(self, fp, headersonly=False):
"""Create a message structure from the data in a file.
Reads all the data from the file and returns the root of the message
structure. Optional headersonly is a flag specifying whether to stop
parsing after reading the headers or not. The default is False,
meaning it parses the entire contents of the file.
"""
feedparser = FeedParser(self._class, policy=self.policy)
if headersonly:
feedparser._set_headersonly()
while True:
data = fp.read(8192)
if not data:
break
feedparser.feed(data)
return feedparser.close()
def parsestr(self, text, headersonly=False):
"""Create a message structure from a string.
Returns the root of the message structure. Optional headersonly is a
flag specifying whether to stop parsing after reading the headers or
not. The default is False, meaning it parses the entire contents of
the file.
"""
return self.parse(StringIO(text), headersonly=headersonly)
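# Illustrative usage sketch (a hedged example, not part of the original
# module):
#
#   >>> msg = Parser().parsestr('Subject: test\n\nbody\n')
#   >>> msg['Subject']
#   'test'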
class HeaderParser(Parser):
def parse(self, fp, headersonly=True):
return Parser.parse(self, fp, True)
def parsestr(self, text, headersonly=True):
return Parser.parsestr(self, text, True)
class BytesParser:
def __init__(self, *args, **kw):
"""Parser of binary RFC 2822 and MIME email messages.
Creates an in-memory object tree representing the email message, which
can then be manipulated and turned over to a Generator to return the
textual representation of the message.
The input must be formatted as a block of RFC 2822 headers and header
continuation lines, optionally preceded by a `Unix-from' header. The
header block is terminated either by the end of the input or by a
blank line.
_class is the class to instantiate for new message objects when they
must be created. This class must have a constructor that can take
zero arguments. Default is Message.Message.
"""
self.parser = Parser(*args, **kw)
def parse(self, fp, headersonly=False):
"""Create a message structure from the data in a binary file.
Reads all the data from the file and returns the root of the message
structure. Optional headersonly is a flag specifying whether to stop
parsing after reading the headers or not. The default is False,
meaning it parses the entire contents of the file.
"""
fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape')
try:
return self.parser.parse(fp, headersonly)
finally:
fp.detach()
def parsebytes(self, text, headersonly=False):
"""Create a message structure from a byte string.
Returns the root of the message structure. Optional headersonly is a
flag specifying whether to stop parsing after reading the headers or
not. The default is False, meaning it parses the entire contents of
the file.
"""
text = text.decode('ASCII', errors='surrogateescape')
return self.parser.parsestr(text, headersonly)
class BytesHeaderParser(BytesParser):
def parse(self, fp, headersonly=True):
return BytesParser.parse(self, fp, headersonly=True)
def parsebytes(self, text, headersonly=True):
return BytesParser.parsebytes(self, text, headersonly=True)
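# Illustrative usage sketch (a hedged example, not part of the original
# module): BytesParser accepts raw bytes, surrogateescaping anything that
# is not ASCII.
#
#   >>> msg = BytesParser().parsebytes(b'Subject: test\r\n\r\nbody\r\n')
#   >>> msg['Subject']
#   'test'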
"""This will be the home for the policy that hooks in the new
code that adds all the email6 features.
"""
from email._policybase import Policy, Compat32, compat32, _extend_docstrings
from email.utils import _has_surrogates
from email.headerregistry import HeaderRegistry as HeaderRegistry
from email.contentmanager import raw_data_manager
__all__ = [
'Compat32',
'compat32',
'Policy',
'EmailPolicy',
'default',
'strict',
'SMTP',
'HTTP',
]
@_extend_docstrings
class EmailPolicy(Policy):
"""+
PROVISIONAL
The API extensions enabled by this policy are currently provisional.
Refer to the documentation for details.
This policy adds new header parsing and folding algorithms. Instead of
simple strings, headers are custom objects with custom attributes
depending on the type of the field. The folding algorithm fully
implements RFCs 2047 and 5322.
In addition to the settable attributes listed above that apply to
all Policies, this policy adds the following additional attributes:
refold_source -- if the value for a header in the Message object
came from the parsing of some source, this attribute
indicates whether or not a generator should refold
that value when transforming the message back into
stream form. The possible values are:
none -- all source values use original folding
long -- source values that have any line that is
longer than max_line_length will be
refolded
all -- all values are refolded.
The default is 'long'.
header_factory -- a callable that takes two arguments, 'name' and
'value', where 'name' is a header field name and
'value' is an unfolded header field value, and
returns a string-like object that represents that
header. A default header_factory is provided that
understands some of the RFC5322 header field types.
(Currently address fields and date fields have
special treatment, while all other fields are
treated as unstructured. This list will be
completed before the extension is marked stable.)
content_manager -- an object with at least two methods: get_content
and set_content. When the get_content or
set_content method of a Message object is called,
it calls the corresponding method of this object,
passing it the message object as its first argument,
and any arguments or keywords that were passed to
it as additional arguments. The default
content_manager is
:data:`~email.contentmanager.raw_data_manager`.
"""
refold_source = 'long'
header_factory = HeaderRegistry()
content_manager = raw_data_manager
def __init__(self, **kw):
# Ensure that each new instance gets a unique header factory
# (as opposed to clones, which share the factory).
if 'header_factory' not in kw:
object.__setattr__(self, 'header_factory', HeaderRegistry())
super().__init__(**kw)
def header_max_count(self, name):
"""+
The implementation for this class returns the max_count attribute from
the specialized header class that would be used to construct a header
of type 'name'.
"""
return self.header_factory[name].max_count
# The logic of the next three methods is chosen such that it is possible to
# switch a Message object between a Compat32 policy and a policy derived
# from this class and have the results stay consistent. This allows a
# Message object constructed with this policy to be passed to a library
# that only handles Compat32 objects, or to receive such an object and
# convert it to use the newer style by just changing its policy. It is
# also chosen because it postpones the relatively expensive full rfc5322
# parse until as late as possible when parsing from source, since in many
# applications only a few headers will actually be inspected.
def header_source_parse(self, sourcelines):
"""+
The name is parsed as everything up to the ':' and returned unmodified.
The value is determined by stripping leading whitespace off the
remainder of the first line, joining all subsequent lines together, and
stripping any trailing carriage return or linefeed characters. (This
is the same as Compat32).
"""
name, value = sourcelines[0].split(':', 1)
value = value.lstrip(' \t') + ''.join(sourcelines[1:])
return (name, value.rstrip('\r\n'))
def header_store_parse(self, name, value):
"""+
The name is returned unchanged. If the input value has a 'name'
attribute and it matches the name ignoring case, the value is returned
unchanged. Otherwise the name and value are passed to header_factory
method, and the resulting custom header object is returned as the
value. In this case a ValueError is raised if the input value contains
CR or LF characters.
"""
if hasattr(value, 'name') and value.name.lower() == name.lower():
return (name, value)
if isinstance(value, str) and len(value.splitlines())>1:
raise ValueError("Header values may not contain linefeed "
"or carriage return characters")
return (name, self.header_factory(name, value))
def header_fetch_parse(self, name, value):
"""+
If the value has a 'name' attribute, it is returned unmodified.
Otherwise the name and the value with any linesep characters removed
are passed to the header_factory method, and the resulting custom
header object is returned. Any surrogateescaped bytes get turned
into the unicode unknown-character glyph.
"""
if hasattr(value, 'name'):
return value
return self.header_factory(name, ''.join(value.splitlines()))
def fold(self, name, value):
"""+
Header folding is controlled by the refold_source policy setting. A
value is considered to be a 'source value' if and only if it does not
have a 'name' attribute (having a 'name' attribute means it is a header
object of some sort). If a source value needs to be refolded according
to the policy, it is converted into a custom header object by passing
the name and the value with any linesep characters removed to the
header_factory method. Folding of a custom header object is done by
calling its fold method with the current policy.
Source values are split into lines using splitlines. If the value is
not to be refolded, the lines are rejoined using the linesep from the
policy and returned. The exception is lines containing non-ascii
binary data. In that case the value is refolded regardless of the
refold_source setting, which causes the binary data to be CTE encoded
using the unknown-8bit charset.
"""
return self._fold(name, value, refold_binary=True)
def fold_binary(self, name, value):
"""+
The same as fold if cte_type is 7bit, except that the returned value is
bytes.
If cte_type is 8bit, non-ASCII binary data is converted back into
bytes. Headers with binary data are not refolded, regardless of the
refold_source setting, since there is no way to know whether the binary
data consists of single byte characters or multibyte characters.
"""
folded = self._fold(name, value, refold_binary=self.cte_type=='7bit')
return folded.encode('ascii', 'surrogateescape')
def _fold(self, name, value, refold_binary=False):
if hasattr(value, 'name'):
return value.fold(policy=self)
maxlen = self.max_line_length if self.max_line_length else float('inf')
lines = value.splitlines()
refold = (self.refold_source == 'all' or
self.refold_source == 'long' and
(lines and len(lines[0])+len(name)+2 > maxlen or
any(len(x) > maxlen for x in lines[1:])))
if refold or refold_binary and _has_surrogates(value):
return self.header_factory(name, ''.join(lines)).fold(policy=self)
return name + ': ' + self.linesep.join(lines) + self.linesep
default = EmailPolicy()
# Make the default policy use the class default header_factory
del default.header_factory
strict = default.clone(raise_on_defect=True)
SMTP = default.clone(linesep='\r\n')
HTTP = default.clone(linesep='\r\n', max_line_length=None)
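# Illustrative usage sketch (a hedged example, not part of the original
# module): policies are immutable, so variations are made with clone().
#
#   >>> my_policy = default.clone(linesep='\r\n', max_line_length=78)
#   >>> my_policy.linesep
#   '\r\n'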
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Ben Gertzfield
# Contact: [email protected]
"""Quoted-printable content transfer encoding per RFCs 2045-2047.
This module handles the content transfer encoding method defined in RFC 2045
to encode US ASCII-like 8-bit data called `quoted-printable'. It is used to
safely encode text that is in a character set similar to the 7-bit US ASCII
character set, but that includes some 8-bit characters that are normally not
allowed in email bodies or headers.
Quoted-printable is very space-inefficient for encoding binary files; use the
email.base64mime module for that instead.
This module provides an interface to encode and decode both headers and bodies
with quoted-printable encoding.
RFC 2047 defines a method for including character set information in an
`encoded-word' in a header. This method is commonly used for 8-bit real names
in To:/From:/Cc: etc. fields, as well as Subject: lines.
This module does not do the line wrapping or end-of-line character
conversion necessary for proper internationalized headers; it only
does dumb encoding and decoding. To deal with the various line
wrapping issues, use the email.header module.
"""
__all__ = [
'body_decode',
'body_encode',
'body_length',
'decode',
'decodestring',
'header_decode',
'header_encode',
'header_length',
'quote',
'unquote',
]
import re
from string import ascii_letters, digits, hexdigits
CRLF = '\r\n'
NL = '\n'
EMPTYSTRING = ''
# Build a mapping of octets to the expansion of that octet. Since we're only
# going to have 256 of these things, this isn't terribly inefficient
# space-wise. Remember that headers and bodies have different sets of safe
# characters. Initialize both maps with the full expansion, and then override
# the safe bytes with the more compact form.
_QUOPRI_MAP = ['=%02X' % c for c in range(256)]
_QUOPRI_HEADER_MAP = _QUOPRI_MAP[:]
_QUOPRI_BODY_MAP = _QUOPRI_MAP[:]
# Safe header bytes which need no encoding.
for c in b'-!*+/' + ascii_letters.encode('ascii') + digits.encode('ascii'):
_QUOPRI_HEADER_MAP[c] = chr(c)
# Headers have one other special encoding; spaces become underscores.
_QUOPRI_HEADER_MAP[ord(' ')] = '_'
# Safe body bytes which need no encoding.
for c in (b' !"#$%&\'()*+,-./0123456789:;<>'
b'?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`'
b'abcdefghijklmnopqrstuvwxyz{|}~\t'):
_QUOPRI_BODY_MAP[c] = chr(c)
# Helpers
def header_check(octet):
"""Return True if the octet should be escaped with header quopri."""
return chr(octet) != _QUOPRI_HEADER_MAP[octet]
def body_check(octet):
"""Return True if the octet should be escaped with body quopri."""
return chr(octet) != _QUOPRI_BODY_MAP[octet]
def header_length(bytearray):
"""Return a header quoted-printable encoding length.
Note that this does not include any RFC 2047 chrome added by
`header_encode()`.
:param bytearray: An array of bytes (a.k.a. octets).
:return: The length in bytes of the byte array when it is encoded with
quoted-printable for headers.
"""
return sum(len(_QUOPRI_HEADER_MAP[octet]) for octet in bytearray)
def body_length(bytearray):
"""Return a body quoted-printable encoding length.
:param bytearray: An array of bytes (a.k.a. octets).
:return: The length in bytes of the byte array when it is encoded with
quoted-printable for bodies.
"""
return sum(len(_QUOPRI_BODY_MAP[octet]) for octet in bytearray)
def _max_append(L, s, maxlen, extra=''):
if not isinstance(s, str):
s = chr(s)
if not L:
L.append(s.lstrip())
elif len(L[-1]) + len(s) <= maxlen:
L[-1] += extra + s
else:
L.append(s.lstrip())
def unquote(s):
"""Turn a string in the form =AB to the ASCII character with value 0xab"""
return chr(int(s[1:3], 16))
def quote(c):
return _QUOPRI_MAP[ord(c)]
def header_encode(header_bytes, charset='iso-8859-1'):
"""Encode a single header line with quoted-printable (like) encoding.
Defined in RFC 2047, this `Q' encoding is similar to quoted-printable, but
used specifically for email header fields to allow charsets with mostly 7
bit characters (and some 8 bit) to remain more or less readable in non-RFC
2045 aware mail clients.
charset names the character set to use in the RFC 2047 header. It
defaults to iso-8859-1.
"""
# Return empty headers as an empty string.
if not header_bytes:
return ''
# Iterate over every byte, encoding if necessary.
encoded = header_bytes.decode('latin1').translate(_QUOPRI_HEADER_MAP)
# Now add the RFC chrome to each encoded chunk and glue the chunks
# together.
return '=?%s?q?%s?=' % (charset, encoded)
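# Illustrative usage sketch (a hedged example, not part of the original
# module): unsafe octets are escaped and the result is wrapped in the
# RFC 2047 chrome.
#
#   >>> header_encode(b'caf\xe9')
#   '=?iso-8859-1?q?caf=E9?='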
_QUOPRI_BODY_ENCODE_MAP = _QUOPRI_BODY_MAP[:]
for c in b'\r\n':
_QUOPRI_BODY_ENCODE_MAP[c] = chr(c)
def body_encode(body, maxlinelen=76, eol=NL):
"""Encode with quoted-printable, wrapping at maxlinelen characters.
Each line of encoded text will end with eol, which defaults to "\\n". Set
this to "\\r\\n" if you will be using the result of this function directly
in an email.
Each line will be wrapped at, at most, maxlinelen characters before the
eol string (maxlinelen defaults to 76 characters, the maximum value
permitted by RFC 2045). Long lines will have the 'soft line break'
quoted-printable character "=" appended to them, so the decoded text will
be identical to the original text.
The minimum maxlinelen is 4 to have room for a quoted character ("=XX")
followed by a soft line break. Smaller values will generate a
ValueError.
"""
if maxlinelen < 4:
raise ValueError("maxlinelen must be at least 4")
if not body:
return body
# quote special characters
body = body.translate(_QUOPRI_BODY_ENCODE_MAP)
soft_break = '=' + eol
# leave space for the '=' at the end of a line
maxlinelen1 = maxlinelen - 1
encoded_body = []
append = encoded_body.append
for line in body.splitlines():
# break up the line into pieces no longer than maxlinelen - 1
start = 0
laststart = len(line) - 1 - maxlinelen
while start <= laststart:
stop = start + maxlinelen1
# make sure we don't break up an escape sequence
if line[stop - 2] == '=':
append(line[start:stop - 1])
start = stop - 2
elif line[stop - 1] == '=':
append(line[start:stop])
start = stop - 1
else:
append(line[start:stop] + '=')
start = stop
# handle rest of line, special case if line ends in whitespace
if line and line[-1] in ' \t':
room = start - laststart
if room >= 3:
# It's a whitespace character at end-of-line, and we have room
# for the three-character quoted encoding.
q = quote(line[-1])
elif room == 2:
# There's room for the whitespace character and a soft break.
q = line[-1] + soft_break
else:
# There's room only for a soft break. The quoted whitespace
# will be the only content on the subsequent line.
q = soft_break + quote(line[-1])
append(line[start:-1] + q)
else:
append(line[start:])
# add back final newline if present
if body[-1] in CRLF:
append('')
return eol.join(encoded_body)
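# Illustrative usage sketch (a hedged example, not part of the original
# module): '=' itself must be escaped, and a trailing newline in the input
# is preserved.
#
#   >>> body_encode('hello = world\n')
#   'hello =3D world\n'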
# BAW: I'm not sure if the intent was for the signature of this function to be
# the same as base64MIME.decode() or not...
def decode(encoded, eol=NL):
"""Decode a quoted-printable string.
Lines are separated with eol, which defaults to \\n.
"""
if not encoded:
return encoded
# BAW: see comment in encode() above. Again, we're building up the
# decoded string with string concatenation, which could be done much more
# efficiently.
decoded = ''
for line in encoded.splitlines():
line = line.rstrip()
if not line:
decoded += eol
continue
i = 0
n = len(line)
while i < n:
c = line[i]
if c != '=':
decoded += c
i += 1
# Otherwise, c == "=". Are we at the end of the line? If so, add
# a soft line break.
elif i+1 == n:
i += 1
continue
# Decode if in form =AB
elif i+2 < n and line[i+1] in hexdigits and line[i+2] in hexdigits:
decoded += unquote(line[i:i+3])
i += 3
# Otherwise, not in form =AB, pass literally
else:
decoded += c
i += 1
if i == n:
decoded += eol
# Special case if original string did not end with eol
if encoded[-1] not in '\r\n' and decoded.endswith(eol):
decoded = decoded[:-1]
return decoded
# For convenience and backwards compatibility w/ standard base64 module
body_decode = decode
decodestring = decode
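# Illustrative usage sketch (a hedged example, not part of the original
# module): '=XX' escapes are expanded and a trailing '=' is treated as a
# soft line break joining the two lines.
#
#   >>> decode('caf=E9 =\nau lait')
#   'café au lait'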
def _unquote_match(match):
"""Turn a match in the form =AB to the ASCII character with value 0xab"""
s = match.group(0)
return unquote(s)
# Header decoding is done a bit differently
def header_decode(s):
"""Decode a string encoded with RFC 2045 MIME header `Q' encoding.
This function does not parse a full MIME header value encoded with
quoted-printable (like =?iso-8895-1?q?Hello_World?=) -- please use
the high level email.header class for that functionality.
"""
s = s.replace('_', ' ')
return re.sub(r'=[a-fA-F0-9]{2}', _unquote_match, s, flags=re.ASCII)
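# Illustrative usage sketch (a hedged example, not part of the original
# module): underscores become spaces before the '=XX' escapes are expanded.
#
#   >>> header_decode('caf=E9_au_lait')
#   'café au lait'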
# Copyright (C) 2001-2010 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Miscellaneous utilities."""
__all__ = [
'collapse_rfc2231_value',
'decode_params',
'decode_rfc2231',
'encode_rfc2231',
'formataddr',
'formatdate',
'format_datetime',
'getaddresses',
'make_msgid',
'mktime_tz',
'parseaddr',
'parsedate',
'parsedate_tz',
'parsedate_to_datetime',
'unquote',
]
import os
import re
import time
import random
import socket
import datetime
import urllib.parse
from email._parseaddr import quote
from email._parseaddr import AddressList as _AddressList
from email._parseaddr import mktime_tz
from email._parseaddr import parsedate, parsedate_tz, _parsedate_tz
# Intrapackage imports
from email.charset import Charset
COMMASPACE = ', '
EMPTYSTRING = ''
UEMPTYSTRING = ''
CRLF = '\r\n'
TICK = "'"
specialsre = re.compile(r'[][\\()<>@,:;".]')
escapesre = re.compile(r'[\\"]')
def _has_surrogates(s):
"""Return True if s contains surrogate-escaped binary data."""
# This check is based on the fact that unless there are surrogates, utf8
# (Python's default encoding) can encode any string. This is the fastest
# way to check for surrogates, see issue 11454 for timings.
try:
s.encode()
return False
except UnicodeEncodeError:
return True
# How to deal with a string containing bytes before handing it to the
# application through the 'normal' interface.
def _sanitize(string):
# Turn any escaped bytes into unicode 'unknown' char. If the escaped
# bytes happen to be utf-8 they will instead get decoded, even if they
# were invalid in the charset the source was supposed to be in. This
# seems like it is not a bad thing; a defect was still registered.
original_bytes = string.encode('utf-8', 'surrogateescape')
return original_bytes.decode('utf-8', 'replace')
# Helpers
def formataddr(pair, charset='utf-8'):
"""The inverse of parseaddr(), this takes a 2-tuple of the form
(realname, email_address) and returns the string value suitable
for an RFC 2822 From, To or Cc header.
If the first element of pair is false, then the second element is
returned unmodified.
Optional charset if given is the character set that is used to encode
realname in case realname is not ASCII safe. Can be an instance of str or
a Charset-like object which has a header_encode method. Default is
'utf-8'.
"""
name, address = pair
# The address MUST (per RFC) be ascii, so raise a UnicodeError if it isn't.
address.encode('ascii')
if name:
try:
name.encode('ascii')
except UnicodeEncodeError:
if isinstance(charset, str):
charset = Charset(charset)
encoded_name = charset.header_encode(name)
return "%s <%s>" % (encoded_name, address)
else:
quotes = ''
if specialsre.search(name):
quotes = '"'
name = escapesre.sub(r'\\\g<0>', name)
return '%s%s%s <%s>' % (quotes, name, quotes, address)
return address
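# Illustrative usage sketch (a hedged example, not part of the original
# module): real names containing specials are quoted.
#
#   >>> formataddr(('Jane Doe', '[email protected]'))
#   'Jane Doe <[email protected]>'
#   >>> formataddr(('Doe, Jane', '[email protected]'))
#   '"Doe, Jane" <[email protected]>'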
def getaddresses(fieldvalues):
"""Return a list of (REALNAME, EMAIL) for each fieldvalue."""
all = COMMASPACE.join(fieldvalues)
a = _AddressList(all)
return a.addresslist
ecre = re.compile(r'''
=\? # literal =?
(?P<charset>[^?]*?) # non-greedy up to the next ? is the charset
\? # literal ?
(?P<encoding>[qb]) # either a "q" or a "b", case insensitive
\? # literal ?
(?P<atom>.*?) # non-greedy up to the next ?= is the atom
\?= # literal ?=
''', re.VERBOSE | re.IGNORECASE)
def _format_timetuple_and_zone(timetuple, zone):
return '%s, %02d %s %04d %02d:%02d:%02d %s' % (
['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'][timetuple[6]],
timetuple[2],
['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'][timetuple[1] - 1],
timetuple[0], timetuple[3], timetuple[4], timetuple[5],
zone)
def formatdate(timeval=None, localtime=False, usegmt=False):
"""Returns a date string as specified by RFC 2822, e.g.:
Fri, 09 Nov 2001 01:08:47 -0000
Optional timeval if given is a floating point time value as accepted by
gmtime() and localtime(), otherwise the current time is used.
Optional localtime is a flag that when True, interprets timeval, and
returns a date relative to the local timezone instead of UTC, properly
taking daylight savings time into account.
Optional argument usegmt means that the timezone is written out as
an ascii string, not a numeric one (so "GMT" instead of "+0000"). This
is needed for HTTP, and is only used when localtime==False.
"""
# Note: we cannot use strftime() because that honors the locale and RFC
# 2822 requires that day and month names be the English abbreviations.
if timeval is None:
timeval = time.time()
if localtime or usegmt:
dt = datetime.datetime.fromtimestamp(timeval, datetime.timezone.utc)
else:
dt = datetime.datetime.utcfromtimestamp(timeval)
if localtime:
dt = dt.astimezone()
usegmt = False
return format_datetime(dt, usegmt)
def format_datetime(dt, usegmt=False):
"""Turn a datetime into a date string as specified in RFC 2822.
If usegmt is True, dt must be an aware datetime with an offset of zero. In
this case 'GMT' will be rendered instead of the normal +0000 required by
RFC2822. This is to support HTTP headers involving date stamps.
"""
now = dt.timetuple()
if usegmt:
if dt.tzinfo is None or dt.tzinfo != datetime.timezone.utc:
raise ValueError("usegmt option requires a UTC datetime")
zone = 'GMT'
elif dt.tzinfo is None:
zone = '-0000'
else:
zone = dt.strftime("%z")
return _format_timetuple_and_zone(now, zone)
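# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> formatdate(0, usegmt=True)
#     'Thu, 01 Jan 1970 00:00:00 GMT'
#     >>> format_datetime(datetime.datetime(2001, 11, 9, 1, 8, 47))   # naive -> -0000
#     'Fri, 09 Nov 2001 01:08:47 -0000'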
def make_msgid(idstring=None, domain=None):
"""Returns a string suitable for RFC 2822 compliant Message-ID, e.g:
<142480216486.20800.16526388040877946887@nightshade.la.mastaler.com>
Optional idstring if given is a string used to strengthen the
uniqueness of the message id. Optional domain if given provides the
portion of the message id after the '@'. It defaults to the locally
defined hostname.
"""
timeval = int(time.time()*100)
pid = os.getpid()
randint = random.getrandbits(64)
if idstring is None:
idstring = ''
else:
idstring = '.' + idstring
if domain is None:
domain = socket.getfqdn()
msgid = '<%d.%d.%d%s@%s>' % (timeval, pid, randint, idstring, domain)
return msgid
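# Illustrative output shape (editor's sketch; the exact value differs on
# every call because it mixes the clock, the PID, and 64 random bits):
#
#     >>> make_msgid(domain='example.com')   # doctest: +SKIP
#     '<168975721896.764.2238987870925837302@example.com>'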
def parsedate_to_datetime(data):
*dtuple, tz = _parsedate_tz(data)
if tz is None:
return datetime.datetime(*dtuple[:6])
return datetime.datetime(*dtuple[:6],
tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)))
def parseaddr(addr):
addrs = _AddressList(addr).addresslist
if not addrs:
return '', ''
return addrs[0]
# rfc822.unquote() doesn't properly de-backslash-ify in Python pre-2.3.
def unquote(str):
"""Remove quotes from a string."""
if len(str) > 1:
if str.startswith('"') and str.endswith('"'):
return str[1:-1].replace('\\\\', '\\').replace('\\"', '"')
if str.startswith('<') and str.endswith('>'):
return str[1:-1]
return str
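# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> unquote('"hello"')
#     'hello'
#     >>> unquote('<alice@example.com>')
#     'alice@example.com'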
# RFC2231-related functions - parameter encoding and decoding
def decode_rfc2231(s):
"""Decode string according to RFC 2231"""
parts = s.split(TICK, 2)
if len(parts) <= 2:
return None, None, s
return parts
def encode_rfc2231(s, charset=None, language=None):
"""Encode string according to RFC 2231.
If neither charset nor language is given, then s is returned as-is. If
charset is given but not language, the string is encoded using the empty
string for language.
"""
s = urllib.parse.quote(s, safe='', encoding=charset or 'ascii')
if charset is None and language is None:
return s
if language is None:
language = ''
return "%s'%s'%s" % (charset, language, s)
rfc2231_continuation = re.compile(r'^(?P<name>\w+)\*((?P<num>[0-9]+)\*?)?$',
re.ASCII)
def decode_params(params):
"""Decode parameters list according to RFC 2231.
params is a sequence of 2-tuples containing (param name, string value).
"""
# Copy params so we don't mess with the original
params = params[:]
new_params = []
# Map parameter's name to a list of continuations. The values are a
# 3-tuple of the continuation number, the string value, and a flag
# specifying whether a particular segment is %-encoded.
rfc2231_params = {}
name, value = params.pop(0)
new_params.append((name, value))
while params:
name, value = params.pop(0)
if name.endswith('*'):
encoded = True
else:
encoded = False
value = unquote(value)
mo = rfc2231_continuation.match(name)
if mo:
name, num = mo.group('name', 'num')
if num is not None:
num = int(num)
rfc2231_params.setdefault(name, []).append((num, value, encoded))
else:
new_params.append((name, '"%s"' % quote(value)))
if rfc2231_params:
for name, continuations in rfc2231_params.items():
value = []
extended = False
# Sort by number
continuations.sort()
# And now append all values in numerical order, converting
# %-encodings for the encoded segments. If any of the
# continuation names ends in a *, then the entire string, after
# decoding segments and concatenating, must have the charset and
# language specifiers at the beginning of the string.
for num, s, encoded in continuations:
if encoded:
# Decode as "latin-1", so the characters in s directly
# represent the percent-encoded octet values.
# collapse_rfc2231_value treats this as an octet sequence.
s = urllib.parse.unquote(s, encoding="latin-1")
extended = True
value.append(s)
value = quote(EMPTYSTRING.join(value))
if extended:
charset, language, value = decode_rfc2231(value)
new_params.append((name, (charset, language, '"%s"' % value)))
else:
new_params.append((name, '"%s"' % value))
return new_params
def collapse_rfc2231_value(value, errors='replace',
fallback_charset='us-ascii'):
if not isinstance(value, tuple) or len(value) != 3:
return unquote(value)
# While value comes to us as a unicode string, we need it to be a bytes
# object. We do not want bytes() normal utf-8 decoder, we want a straight
# interpretation of the string as character bytes.
charset, language, text = value
if charset is None:
# Issue 17369: if charset/lang is None, decode_rfc2231 couldn't parse
# the value, so use the fallback_charset.
charset = fallback_charset
rawbytes = bytes(text, 'raw-unicode-escape')
try:
return str(rawbytes, charset, errors)
except LookupError:
# charset is not a known codec.
return unquote(text)
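# Illustrative pipeline (editor's sketch, not part of the original module):
# a value split across RFC 2231 continuation segments is reassembled by
# decode_params, then decoded to text by collapse_rfc2231_value:
#
#     >>> params = [('content-disposition', 'attachment'),
#     ...           ('filename*0*', "utf-8''caf%C3%A9"),
#     ...           ('filename*1', '.txt')]
#     >>> name, value = decode_params(params)[1]
#     >>> name, unquote(collapse_rfc2231_value(value))
#     ('filename', 'café.txt')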
#
# datetime doesn't provide a localtime function yet, so provide one. Code
# adapted from the patch in issue 9527. This may not be perfect, but it is
# better than not having it.
#
def localtime(dt=None, isdst=-1):
"""Return local time as an aware datetime object.
If called without arguments, return current time. Otherwise *dt*
argument should be a datetime instance, and it is converted to the
local time zone according to the system time zone database. If *dt* is
naive (that is, dt.tzinfo is None), it is assumed to be in local time.
In this case, a positive or zero value for *isdst* causes localtime to
presume initially that summer time (for example, Daylight Saving Time)
is or is not (respectively) in effect for the specified time. A
negative value for *isdst* causes the localtime() function to attempt
to divine whether summer time is in effect for the specified time.
"""
if dt is None:
return datetime.datetime.now(datetime.timezone.utc).astimezone()
if dt.tzinfo is not None:
return dt.astimezone()
# We have a naive datetime. Convert to a (localtime) timetuple and pass to
# system mktime together with the isdst hint. System mktime will return
# seconds since epoch.
tm = dt.timetuple()[:-1] + (isdst,)
seconds = time.mktime(tm)
localtm = time.localtime(seconds)
try:
delta = datetime.timedelta(seconds=localtm.tm_gmtoff)
tz = datetime.timezone(delta, localtm.tm_zone)
except AttributeError:
# Compute UTC offset and compare with the value implied by tm_isdst.
# If the values match, use the zone name implied by tm_isdst.
delta = dt - datetime.datetime(*time.gmtime(seconds)[:6])
dst = time.daylight and localtm.tm_isdst > 0
gmtoff = -(time.altzone if dst else time.timezone)
if delta == datetime.timedelta(seconds=gmtoff):
tz = datetime.timezone(delta, time.tzname[dst])
else:
tz = datetime.timezone(delta)
return dt.replace(tzinfo=tz)
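# Illustrative usage (editor's sketch, not part of the original module;
# the attached zone depends on the system time zone database):
#
#     >>> localtime().tzinfo is not None        # current time, local zone attached
#     True
#     >>> localtime(datetime.datetime(2021, 6, 1)).tzinfo is not None
#     True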
""" Routines for manipulating RFC2047 encoded words.
This is currently a package-private API, but will be considered for promotion
to a public API if there is demand.
"""
# An encoded word looks like this:
#
# =?charset[*lang]?cte?encoded_string?=
#
# for more information about charset see the charset module. Here it is one
# of the preferred MIME charset names (hopefully; you never know when parsing).
# cte (Content Transfer Encoding) is either 'q' or 'b' (ignoring case). In
# theory other letters could be used for other encodings, but in practice this
# (almost?) never happens. There could be a public API for adding entries
# to the CTE tables, but YAGNI for now. 'q' is Quoted Printable, 'b' is
# Base64. The meaning of encoded_string should be obvious. 'lang' is optional
# as indicated by the brackets (they are not part of the syntax) but is almost
# never encountered in practice.
#
# The general interface for a CTE decoder is that it takes the encoded_string
# as its argument, and returns a tuple (cte_decoded_string, defects). The
# cte_decoded_string is the original binary that was encoded using the
# specified cte. 'defects' is a list of MessageDefect instances indicating any
# problems encountered during conversion. 'charset' and 'lang' are the
# corresponding strings extracted from the EW, case preserved.
#
# The general interface for a CTE encoder is that it takes a binary sequence
# as input and returns the cte_encoded_string, which is an ascii-only string.
#
# Each decoder must also supply a length function that takes the binary
# sequence as its argument and returns the length of the resulting encoded
# string.
#
# The main API functions for the module are decode, which calls the decoder
# referenced by the cte specifier, and encode, which adds the appropriate
# RFC 2047 "chrome" to the encoded string, and can optionally automatically
# select the shortest possible encoding. See their docstrings below for
# details.
import re
import base64
import binascii
import functools
from string import ascii_letters, digits
from email import errors
__all__ = ['decode_q',
'encode_q',
'decode_b',
'encode_b',
'len_q',
'len_b',
'decode',
'encode',
]
#
# Quoted Printable
#
# regex based decoder.
_q_byte_subber = functools.partial(re.compile(br'=([a-fA-F0-9]{2})').sub,
lambda m: bytes([int(m.group(1), 16)]))
def decode_q(encoded):
encoded = encoded.replace(b'_', b' ')
return _q_byte_subber(encoded), []
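# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> decode_q(b'hello_world=21')
#     (b'hello world!', [])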
# dict mapping bytes to their encoded form
class _QByteMap(dict):
safe = b'-!*+/' + ascii_letters.encode('ascii') + digits.encode('ascii')
def __missing__(self, key):
if key in self.safe:
self[key] = chr(key)
else:
self[key] = "={:02X}".format(key)
return self[key]
_q_byte_map = _QByteMap()
# In headers spaces are mapped to '_'.
_q_byte_map[ord(' ')] = '_'
def encode_q(bstring):
return ''.join(_q_byte_map[x] for x in bstring)
def len_q(bstring):
return sum(len(_q_byte_map[x]) for x in bstring)
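# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> encode_q(b'caf\xc3\xa9')
#     'caf=C3=A9'
#     >>> len_q(b'caf\xc3\xa9')
#     9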
#
# Base64
#
def decode_b(encoded):
defects = []
pad_err = len(encoded) % 4
if pad_err:
defects.append(errors.InvalidBase64PaddingDefect())
padded_encoded = encoded + b'==='[:4-pad_err]
else:
padded_encoded = encoded
try:
return base64.b64decode(padded_encoded, validate=True), defects
except binascii.Error:
# Since we had correct padding, this must be an invalid character error.
defects = [errors.InvalidBase64CharactersDefect()]
# The non-alphabet characters are ignored as far as padding
# goes, but we don't know how many there are. So we'll just
# try various padding lengths until something works.
for i in 0, 1, 2, 3:
try:
return base64.b64decode(encoded+b'='*i, validate=False), defects
except binascii.Error:
if i==0:
defects.append(errors.InvalidBase64PaddingDefect())
else:
# This should never happen.
raise AssertionError("unexpected binascii.Error")
def encode_b(bstring):
return base64.b64encode(bstring).decode('ascii')
def len_b(bstring):
groups_of_3, leftover = divmod(len(bstring), 3)
# 4 bytes out for each 3 bytes (or nonzero fraction thereof) in.
return groups_of_3 * 4 + (4 if leftover else 0)
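# Illustrative usage (editor's sketch, not part of the original module).
# A short or mispadded input still decodes, but an
# InvalidBase64PaddingDefect is appended to the returned defects list:
#
#     >>> encode_b(b'hello')
#     'aGVsbG8='
#     >>> decode_b(b'aGVsbG8=')
#     (b'hello', [])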
_cte_decoders = {
'q': decode_q,
'b': decode_b,
}
def decode(ew):
"""Decode encoded word and return (string, charset, lang, defects) tuple.
An RFC 2047/2243 encoded word has the form:
=?charset*lang?cte?encoded_string?=
where '*lang' may be omitted but the other parts may not be.
This function expects exactly such a string (that is, it does not check the
syntax and may raise errors if the string is not well formed), and returns
the encoded_string decoded first from its Content Transfer Encoding and
then from the resulting bytes into unicode using the specified charset. If
the cte-decoded string does not successfully decode using the specified
character set, a defect is added to the defects list and the unknown octets
are replaced by the unicode 'unknown' character \\uFFFD.
The specified charset and language are returned. The default for language,
which is rarely if ever encountered, is the empty string.
"""
_, charset, cte, cte_string, _ = ew.split('?')
charset, _, lang = charset.partition('*')
cte = cte.lower()
# Recover the original bytes and do CTE decoding.
bstring = cte_string.encode('ascii', 'surrogateescape')
bstring, defects = _cte_decoders[cte](bstring)
# Turn the CTE decoded bytes into unicode.
try:
string = bstring.decode(charset)
except UnicodeError:
defects.append(errors.UndecodableBytesDefect("Encoded word "
"contains bytes not decodable using {} charset".format(charset)))
string = bstring.decode(charset, 'surrogateescape')
except LookupError:
string = bstring.decode('ascii', 'surrogateescape')
if charset.lower() != 'unknown-8bit':
defects.append(errors.CharsetError("Unknown charset {} "
"in encoded word; decoded as unknown bytes".format(charset)))
return string, charset, lang, defects
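# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> decode('=?us-ascii?q?Hello_World?=')
#     ('Hello World', 'us-ascii', '', [])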
_cte_encoders = {
'q': encode_q,
'b': encode_b,
}
_cte_encode_length = {
'q': len_q,
'b': len_b,
}
def encode(string, charset='utf-8', encoding=None, lang=''):
"""Encode string using the CTE encoding that produces the shorter result.
Produces an RFC 2047/2243 encoded word of the form:
=?charset*lang?cte?encoded_string?=
where '*lang' is omitted unless the 'lang' parameter is given a value.
Optional argument charset (defaults to utf-8) specifies the charset to use
to encode the string to binary before CTE encoding it. Optional argument
'encoding' is the cte specifier for the encoding that should be used ('q'
or 'b'); if it is None (the default) the encoding which produces the
shortest encoded sequence is used, except that 'q' is preferred if it is up
to five characters longer. Optional argument 'lang' (default '') gives the
RFC 2243 language string to specify in the encoded word.
"""
if charset == 'unknown-8bit':
bstring = string.encode('ascii', 'surrogateescape')
else:
bstring = string.encode(charset)
if encoding is None:
qlen = _cte_encode_length['q'](bstring)
blen = _cte_encode_length['b'](bstring)
# Bias toward q. 5 is arbitrary.
encoding = 'q' if qlen - blen < 5 else 'b'
encoded = _cte_encoders[encoding](bstring)
if lang:
lang = '*' + lang
return "=?{}{}?{}?{}?=".format(charset, lang, encoding, encoded)
"""Header value parser implementing various email-related RFC parsing rules.
The parsing methods defined in this module implement various email related
parsing rules. Principal among them is RFC 5322, which is the follow-on
to RFC 2822 and primarily a clarification of the former. It also implements
RFC 2047 encoded word decoding.
RFC 5322 goes to considerable trouble to maintain backward compatibility with
RFC 822 in the parse phase, while cleaning up the structure on the generation
phase. This parser supports correct RFC 5322 generation by tagging white space
as folding white space only when folding is allowed in the non-obsolete rule
sets. Actually, the parser is even more generous when accepting input than RFC
5322 mandates, following the spirit of Postel's Law, which RFC 5322 encourages.
Where possible deviations from the standard are annotated on the 'defects'
attribute of tokens that deviate.
The general structure of the parser follows RFC 5322, and uses its terminology
where there is a direct correspondence. Where the implementation requires a
somewhat different structure than that used by the formal grammar, new terms
that mimic the closest existing terms are used. Thus, it really helps to have
a copy of RFC 5322 handy when studying this code.
Input to the parser is a string that has already been unfolded according to
RFC 5322 rules. According to the RFC this unfolding is the very first step, and
this parser leaves the unfolding step to a higher level message parser, which
will have already detected the line breaks that need unfolding while
determining the beginning and end of each header.
The output of the parser is a TokenList object, which is a list subclass. A
TokenList is a recursive data structure. The terminal nodes of the structure
are Terminal objects, which are subclasses of str. These do not correspond
directly to terminal objects in the formal grammar, but are instead more
practical higher level combinations of true terminals.
All TokenList and Terminal objects have a 'value' attribute, which produces the
semantically meaningful value of that part of the parse subtree. The value of
all whitespace tokens (no matter how many sub-tokens they may contain) is a
single space, as per the RFC rules. This includes 'CFWS', which is herein
included in the general class of whitespace tokens. There is one exception to
the rule that whitespace tokens are collapsed into single spaces in values: in
the value of a 'bare-quoted-string' (a quoted-string with no leading or
trailing whitespace), any whitespace that appeared between the quotation marks
is preserved in the returned value. Note that in all Terminal strings quoted
pairs are turned into their unquoted values.
All TokenList and Terminal objects also have a string value, which attempts to
be a "canonical" representation of the RFC-compliant form of the substring that
produced the parsed subtree, including minimal use of quoted pair quoting.
Whitespace runs are not collapsed.
Comment tokens also have a 'content' attribute providing the string found
between the parens (including any nested comments) with whitespace preserved.
All TokenList and Terminal objects have a 'defects' attribute which is a
possibly empty list of all of the defects found while creating the token.
Defects may appear on any token in the tree, and a composite list of all
defects in the subtree is available through the 'all_defects' attribute of any
node. (For Terminal nodes x.defects == x.all_defects.)
Each object in a parse tree is called a 'token', and each has a 'token_type'
attribute that gives the name from the RFC 5322 grammar that it represents.
Not all RFC 5322 nodes are produced, and there is one non-RFC 5322 node that
may be produced: 'ptext'. A 'ptext' is a string of printable ascii characters.
It is returned in place of lists of (ctext/quoted-pair) and
(qtext/quoted-pair).
XXX: provide complete list of token types.
"""
import re
import urllib # For urllib.parse.unquote
from string import hexdigits
from collections import OrderedDict
from operator import itemgetter
from email import _encoded_words as _ew
from email import errors
from email import utils
#
# Useful constants and functions
#
WSP = set(' \t')
CFWS_LEADER = WSP | set('(')
SPECIALS = set(r'()<>@,:;.\"[]')
ATOM_ENDS = SPECIALS | WSP
DOT_ATOM_ENDS = ATOM_ENDS - set('.')
# '.', '"', and '(' do not end phrases in order to support obs-phrase
PHRASE_ENDS = SPECIALS - set('."(')
TSPECIALS = (SPECIALS | set('/?=')) - set('.')
TOKEN_ENDS = TSPECIALS | WSP
ASPECIALS = TSPECIALS | set("*'%")
ATTRIBUTE_ENDS = ASPECIALS | WSP
EXTENDED_ATTRIBUTE_ENDS = ATTRIBUTE_ENDS - set('%')
def quote_string(value):
return '"'+str(value).replace('\\', '\\\\').replace('"', r'\"')+'"'
#
# Accumulator for header folding
#
class _Folded:
def __init__(self, maxlen, policy):
self.maxlen = maxlen
self.policy = policy
self.lastlen = 0
self.stickyspace = None
self.firstline = True
self.done = []
self.current = []
def newline(self):
self.done.extend(self.current)
self.done.append(self.policy.linesep)
self.current.clear()
self.lastlen = 0
def finalize(self):
if self.current:
self.newline()
def __str__(self):
return ''.join(self.done)
def append(self, stoken):
self.current.append(stoken)
def append_if_fits(self, token, stoken=None):
if stoken is None:
stoken = str(token)
l = len(stoken)
if self.stickyspace is not None:
stickyspace_len = len(self.stickyspace)
if self.lastlen + stickyspace_len + l <= self.maxlen:
self.current.append(self.stickyspace)
self.lastlen += stickyspace_len
self.current.append(stoken)
self.lastlen += l
self.stickyspace = None
self.firstline = False
return True
if token.has_fws:
ws = token.pop_leading_fws()
if ws is not None:
self.stickyspace += str(ws)
stickyspace_len += len(ws)
token._fold(self)
return True
if stickyspace_len and l + 1 <= self.maxlen:
margin = self.maxlen - l
if 0 < margin < stickyspace_len:
trim = stickyspace_len - margin
self.current.append(self.stickyspace[:trim])
self.stickyspace = self.stickyspace[trim:]
stickyspace_len = trim
self.newline()
self.current.append(self.stickyspace)
self.current.append(stoken)
self.lastlen = l + stickyspace_len
self.stickyspace = None
self.firstline = False
return True
if not self.firstline:
self.newline()
self.current.append(self.stickyspace)
self.current.append(stoken)
self.stickyspace = None
self.firstline = False
return True
if self.lastlen + l <= self.maxlen:
self.current.append(stoken)
self.lastlen += l
return True
if l < self.maxlen:
self.newline()
self.current.append(stoken)
self.lastlen = l
return True
return False
#
# TokenList and its subclasses
#
class TokenList(list):
token_type = None
def __init__(self, *args, **kw):
super().__init__(*args, **kw)
self.defects = []
def __str__(self):
return ''.join(str(x) for x in self)
def __repr__(self):
return '{}({})'.format(self.__class__.__name__,
super().__repr__())
@property
def value(self):
return ''.join(x.value for x in self if x.value)
@property
def all_defects(self):
return sum((x.all_defects for x in self), self.defects)
#
# Folding API
#
# parts():
#
# return a list of objects that constitute the "higher level syntactic
# objects" specified by the RFC as the best places to fold a header line.
# The returned objects must include leading folding white space, even if
# this means mutating the underlying parse tree of the object. Each object
# is only responsible for returning *its* parts, and should not drill down
# to any lower level except as required to meet the leading folding white
# space constraint.
#
# _fold(folded):
#
# folded: the result accumulator. This is an instance of _Folded.
# (XXX: I haven't finished factoring this out yet, the folding code
# pretty much uses this as a state object.) When the folded.current
# contains as much text as will fit, the _fold method should call
# folded.newline.
# folded.lastlen: the current length of the text stored in folded.current.
# folded.maxlen: The maximum number of characters that may appear on a
# folded line. Differs from the policy setting in that "no limit" is
# represented by +inf, which means it can be used in the trivially
# logical fashion in comparisons.
#
# Currently no subclasses implement parts, and I think this will remain
# true. A subclass only needs to implement _fold when the generic version
# isn't sufficient. _fold will need to be implemented primarily when it is
# possible for encoded words to appear in the specialized token-list, since
# there is no generic algorithm that can know where exactly the encoded
# words are allowed. A _fold implementation is responsible for filling
# lines in the same general way that the top level _fold does. It may, and
# should, call the _fold method of sub-objects in a similar fashion to that
# of the top level _fold.
#
# XXX: I'm hoping it will be possible to factor the existing code further
# to reduce redundancy and make the logic clearer.
@property
def parts(self):
klass = self.__class__
this = []
for token in self:
if token.startswith_fws():
if this:
yield this[0] if len(this)==1 else klass(this)
this.clear()
end_ws = token.pop_trailing_ws()
this.append(token)
if end_ws:
yield klass(this)
this = [end_ws]
if this:
yield this[0] if len(this)==1 else klass(this)
def startswith_fws(self):
return self[0].startswith_fws()
def pop_leading_fws(self):
if self[0].token_type == 'fws':
return self.pop(0)
return self[0].pop_leading_fws()
def pop_trailing_ws(self):
if self[-1].token_type == 'cfws':
return self.pop(-1)
return self[-1].pop_trailing_ws()
@property
def has_fws(self):
for part in self:
if part.has_fws:
return True
return False
def has_leading_comment(self):
return self[0].has_leading_comment()
@property
def comments(self):
comments = []
for token in self:
comments.extend(token.comments)
return comments
def fold(self, *, policy):
# max_line_length 0/None means no limit, ie: infinitely long.
maxlen = policy.max_line_length or float("+inf")
folded = _Folded(maxlen, policy)
self._fold(folded)
folded.finalize()
return str(folded)
def as_encoded_word(self, charset):
# This works only for things returned by 'parts', which include
# the leading fws, if any, that should be used.
res = []
ws = self.pop_leading_fws()
if ws:
res.append(ws)
trailer = self.pop(-1) if self[-1].token_type=='fws' else ''
res.append(_ew.encode(str(self), charset))
res.append(trailer)
return ''.join(res)
def cte_encode(self, charset, policy):
res = []
for part in self:
res.append(part.cte_encode(charset, policy))
return ''.join(res)
def _fold(self, folded):
for part in self.parts:
tstr = str(part)
tlen = len(tstr)
try:
str(part).encode('us-ascii')
except UnicodeEncodeError:
if any(isinstance(x, errors.UndecodableBytesDefect)
for x in part.all_defects):
charset = 'unknown-8bit'
else:
# XXX: this should be a policy setting
charset = 'utf-8'
tstr = part.cte_encode(charset, folded.policy)
tlen = len(tstr)
if folded.append_if_fits(part, tstr):
continue
# Peel off the leading whitespace if any and make it sticky, to
# avoid infinite recursion.
ws = part.pop_leading_fws()
if ws is not None:
folded.stickyspace = str(ws)
if folded.append_if_fits(part):
continue
if part.has_fws:
part._fold(folded)
continue
# There are no fold points in this one; it is too long for a single
# line and can't be split...we just have to put it on its own line.
folded.append(tstr)
folded.newline()
def pprint(self, indent=''):
print('\n'.join(self._pp(indent='')))
def ppstr(self, indent=''):
return '\n'.join(self._pp(indent=''))
def _pp(self, indent=''):
yield '{}{}/{}('.format(
indent,
self.__class__.__name__,
self.token_type)
for token in self:
if not hasattr(token, '_pp'):
yield (indent + ' !! invalid element in token '
'list: {!r}'.format(token))
else:
yield from token._pp(indent+' ')
if self.defects:
extra = ' Defects: {}'.format(self.defects)
else:
extra = ''
yield '{}){}'.format(indent, extra)
class WhiteSpaceTokenList(TokenList):
@property
def value(self):
return ' '
@property
def comments(self):
return [x.content for x in self if x.token_type=='comment']
class UnstructuredTokenList(TokenList):
token_type = 'unstructured'
def _fold(self, folded):
last_ew = None
for part in self.parts:
tstr = str(part)
is_ew = False
try:
str(part).encode('us-ascii')
except UnicodeEncodeError:
if any(isinstance(x, errors.UndecodableBytesDefect)
for x in part.all_defects):
charset = 'unknown-8bit'
else:
charset = 'utf-8'
if last_ew is not None:
# We've already done an EW, combine this one with it
# if there's room.
chunk = get_unstructured(
''.join(folded.current[last_ew:]+[tstr])).as_encoded_word(charset)
oldlastlen = sum(len(x) for x in folded.current[:last_ew])
schunk = str(chunk)
lchunk = len(schunk)
if oldlastlen + lchunk <= folded.maxlen:
del folded.current[last_ew:]
folded.append(schunk)
folded.lastlen = oldlastlen + lchunk
continue
tstr = part.as_encoded_word(charset)
is_ew = True
if folded.append_if_fits(part, tstr):
if is_ew:
last_ew = len(folded.current) - 1
continue
if is_ew or last_ew:
# It's too big to fit on the line, but since we've
# got encoded words we can use encoded word folding.
part._fold_as_ew(folded)
continue
# Peel off the leading whitespace if any and make it sticky, to
# avoid infinite recursion.
ws = part.pop_leading_fws()
if ws is not None:
folded.stickyspace = str(ws)
if folded.append_if_fits(part):
continue
if part.has_fws:
part._fold(folded)
continue
# It can't be split...we just have to put it on its own line.
folded.append(tstr)
folded.newline()
last_ew = None
def cte_encode(self, charset, policy):
res = []
last_ew = None
for part in self:
spart = str(part)
try:
spart.encode('us-ascii')
res.append(spart)
except UnicodeEncodeError:
if last_ew is None:
res.append(part.cte_encode(charset, policy))
last_ew = len(res)
else:
tl = get_unstructured(''.join(res[last_ew:] + [spart]))
res.append(tl.as_encoded_word(charset))
return ''.join(res)
class Phrase(TokenList):
token_type = 'phrase'
def _fold(self, folded):
# As with Unstructured, we can have pure ASCII with or without
# surrogateescape encoded bytes, or we could have unicode. But this
# case is more complicated, since we have to deal with the various
# sub-token types and how they can be composed in the face of
# unicode-that-needs-CTE-encoding, and the fact that if a token has a
# comment, that comment becomes a barrier across which we can't compose encoded
# words.
last_ew = None
for part in self.parts:
tstr = str(part)
tlen = len(tstr)
has_ew = False
try:
str(part).encode('us-ascii')
except UnicodeEncodeError:
if any(isinstance(x, errors.UndecodableBytesDefect)
for x in part.all_defects):
charset = 'unknown-8bit'
else:
charset = 'utf-8'
if last_ew is not None and not part.has_leading_comment():
# We've already done an EW, let's see if we can combine
# this one with it. The last_ew logic ensures that all we
# have at this point is atoms, no comments or quoted
# strings. So we can treat the text between the last
# encoded word and the content of this token as
# unstructured text, and things will work correctly. But
# we have to strip off any trailing comment on this token
# first, and if it is a quoted string we have to pull out
# the content (we're encoding it, so it no longer needs to
# be quoted).
if part[-1].token_type == 'cfws' and part.comments:
remainder = part.pop(-1)
else:
remainder = ''
for i, token in enumerate(part):
if token.token_type == 'bare-quoted-string':
part[i] = UnstructuredTokenList(token[:])
chunk = get_unstructured(
''.join(folded.current[last_ew:]+[tstr])).as_encoded_word(charset)
schunk = str(chunk)
lchunk = len(schunk)
if last_ew + lchunk <= folded.maxlen:
del folded.current[last_ew:]
folded.append(schunk)
folded.lastlen = sum(len(x) for x in folded.current)
continue
tstr = part.as_encoded_word(charset)
tlen = len(tstr)
has_ew = True
if folded.append_if_fits(part, tstr):
if has_ew and not part.comments:
last_ew = len(folded.current) - 1
elif part.comments or part.token_type == 'quoted-string':
# If a comment is involved we can't combine EWs. And if a
# quoted string is involved, it's not worth the effort to
# try to combine them.
last_ew = None
continue
part._fold(folded)
def cte_encode(self, charset, policy):
res = []
last_ew = None
is_ew = False
for part in self:
spart = str(part)
try:
spart.encode('us-ascii')
res.append(spart)
except UnicodeEncodeError:
is_ew = True
if last_ew is None:
if not part.comments:
last_ew = len(res)
res.append(part.cte_encode(charset, policy))
elif not part.has_leading_comment():
if part[-1].token_type == 'cfws' and part.comments:
remainder = part.pop(-1)
else:
remainder = ''
for i, token in enumerate(part):
if token.token_type == 'bare-quoted-string':
part[i] = UnstructuredTokenList(token[:])
tl = get_unstructured(''.join(res[last_ew:] + [spart]))
res[last_ew:] = [tl.as_encoded_word(charset)]
if part.comments or (not is_ew and part.token_type == 'quoted-string'):
last_ew = None
return ''.join(res)
class Word(TokenList):
token_type = 'word'
class CFWSList(WhiteSpaceTokenList):
token_type = 'cfws'
def has_leading_comment(self):
return bool(self.comments)
class Atom(TokenList):
token_type = 'atom'
class Token(TokenList):
token_type = 'token'
class EncodedWord(TokenList):
token_type = 'encoded-word'
cte = None
charset = None
lang = None
@property
def encoded(self):
if self.cte is not None:
return self.cte
return _ew.encode(str(self), self.charset)
class QuotedString(TokenList):
token_type = 'quoted-string'
@property
def content(self):
for x in self:
if x.token_type == 'bare-quoted-string':
return x.value
@property
def quoted_value(self):
res = []
for x in self:
if x.token_type == 'bare-quoted-string':
res.append(str(x))
else:
res.append(x.value)
return ''.join(res)
@property
def stripped_value(self):
for token in self:
if token.token_type == 'bare-quoted-string':
return token.value
class BareQuotedString(QuotedString):
token_type = 'bare-quoted-string'
def __str__(self):
return quote_string(''.join(str(x) for x in self))
@property
def value(self):
return ''.join(str(x) for x in self)
class Comment(WhiteSpaceTokenList):
token_type = 'comment'
def __str__(self):
return ''.join(sum([
["("],
[self.quote(x) for x in self],
[")"],
], []))
def quote(self, value):
if value.token_type == 'comment':
return str(value)
return str(value).replace('\\', '\\\\').replace(
'(', '\(').replace(
')', '\)')
@property
def content(self):
return ''.join(str(x) for x in self)
@property
def comments(self):
return [self.content]
class AddressList(TokenList):
token_type = 'address-list'
@property
def addresses(self):
return [x for x in self if x.token_type=='address']
@property
def mailboxes(self):
return sum((x.mailboxes
for x in self if x.token_type=='address'), [])
@property
def all_mailboxes(self):
return sum((x.all_mailboxes
for x in self if x.token_type=='address'), [])
class Address(TokenList):
token_type = 'address'
@property
def display_name(self):
if self[0].token_type == 'group':
return self[0].display_name
@property
def mailboxes(self):
if self[0].token_type == 'mailbox':
return [self[0]]
elif self[0].token_type == 'invalid-mailbox':
return []
return self[0].mailboxes
@property
def all_mailboxes(self):
if self[0].token_type == 'mailbox':
return [self[0]]
elif self[0].token_type == 'invalid-mailbox':
return [self[0]]
return self[0].all_mailboxes
class MailboxList(TokenList):
token_type = 'mailbox-list'
@property
def mailboxes(self):
return [x for x in self if x.token_type=='mailbox']
@property
def all_mailboxes(self):
return [x for x in self
if x.token_type in ('mailbox', 'invalid-mailbox')]
class GroupList(TokenList):
token_type = 'group-list'
@property
def mailboxes(self):
if not self or self[0].token_type != 'mailbox-list':
return []
return self[0].mailboxes
@property
def all_mailboxes(self):
if not self or self[0].token_type != 'mailbox-list':
return []
return self[0].all_mailboxes
class Group(TokenList):
token_type = "group"
@property
def mailboxes(self):
if self[2].token_type != 'group-list':
return []
return self[2].mailboxes
@property
def all_mailboxes(self):
if self[2].token_type != 'group-list':
return []
return self[2].all_mailboxes
@property
def display_name(self):
return self[0].display_name
class NameAddr(TokenList):
token_type = 'name-addr'
@property
def display_name(self):
if len(self) == 1:
return None
return self[0].display_name
@property
def local_part(self):
return self[-1].local_part
@property
def domain(self):
return self[-1].domain
@property
def route(self):
return self[-1].route
@property
def addr_spec(self):
return self[-1].addr_spec
class AngleAddr(TokenList):
token_type = 'angle-addr'
@property
def local_part(self):
for x in self:
if x.token_type == 'addr-spec':
return x.local_part
@property
def domain(self):
for x in self:
if x.token_type == 'addr-spec':
return x.domain
@property
def route(self):
for x in self:
if x.token_type == 'obs-route':
return x.domains
@property
def addr_spec(self):
for x in self:
if x.token_type == 'addr-spec':
return x.addr_spec
else:
return '<>'
class ObsRoute(TokenList):
token_type = 'obs-route'
@property
def domains(self):
return [x.domain for x in self if x.token_type == 'domain']
class Mailbox(TokenList):
token_type = 'mailbox'
@property
def display_name(self):
if self[0].token_type == 'name-addr':
return self[0].display_name
@property
def local_part(self):
return self[0].local_part
@property
def domain(self):
return self[0].domain
@property
def route(self):
if self[0].token_type == 'name-addr':
return self[0].route
@property
def addr_spec(self):
return self[0].addr_spec
class InvalidMailbox(TokenList):
token_type = 'invalid-mailbox'
@property
def display_name(self):
return None
local_part = domain = route = addr_spec = display_name
class Domain(TokenList):
token_type = 'domain'
@property
def domain(self):
return ''.join(super().value.split())
class DotAtom(TokenList):
token_type = 'dot-atom'
class DotAtomText(TokenList):
token_type = 'dot-atom-text'
class AddrSpec(TokenList):
token_type = 'addr-spec'
@property
def local_part(self):
return self[0].local_part
@property
def domain(self):
if len(self) < 3:
return None
return self[-1].domain
@property
def value(self):
if len(self) < 3:
return self[0].value
return self[0].value.rstrip()+self[1].value+self[2].value.lstrip()
@property
def addr_spec(self):
nameset = set(self.local_part)
if len(nameset) > len(nameset-DOT_ATOM_ENDS):
lp = quote_string(self.local_part)
else:
lp = self.local_part
if self.domain is not None:
return lp + '@' + self.domain
return lp
class ObsLocalPart(TokenList):
token_type = 'obs-local-part'
class DisplayName(Phrase):
token_type = 'display-name'
@property
def display_name(self):
res = TokenList(self)
if res[0].token_type == 'cfws':
res.pop(0)
else:
if res[0][0].token_type == 'cfws':
res[0] = TokenList(res[0][1:])
if res[-1].token_type == 'cfws':
res.pop()
else:
if res[-1][-1].token_type == 'cfws':
res[-1] = TokenList(res[-1][:-1])
return res.value
@property
def value(self):
quote = False
if self.defects:
quote = True
else:
for x in self:
if x.token_type == 'quoted-string':
quote = True
if quote:
pre = post = ''
if self[0].token_type=='cfws' or self[0][0].token_type=='cfws':
pre = ' '
if self[-1].token_type=='cfws' or self[-1][-1].token_type=='cfws':
post = ' '
return pre+quote_string(self.display_name)+post
else:
return super().value
class LocalPart(TokenList):
token_type = 'local-part'
@property
def value(self):
if self[0].token_type == "quoted-string":
return self[0].quoted_value
else:
return self[0].value
@property
def local_part(self):
# Strip whitespace from front, back, and around dots.
res = [DOT]
last = DOT
last_is_tl = False
for tok in self[0] + [DOT]:
if tok.token_type == 'cfws':
continue
if (last_is_tl and tok.token_type == 'dot' and
last[-1].token_type == 'cfws'):
res[-1] = TokenList(last[:-1])
is_tl = isinstance(tok, TokenList)
if (is_tl and last.token_type == 'dot' and
tok[0].token_type == 'cfws'):
res.append(TokenList(tok[1:]))
else:
res.append(tok)
last = res[-1]
last_is_tl = is_tl
res = TokenList(res[1:-1])
return res.value
class DomainLiteral(TokenList):
token_type = 'domain-literal'
@property
def domain(self):
return ''.join(super().value.split())
@property
def ip(self):
for x in self:
if x.token_type == 'ptext':
return x.value
class MIMEVersion(TokenList):
token_type = 'mime-version'
major = None
minor = None
class Parameter(TokenList):
token_type = 'parameter'
sectioned = False
extended = False
charset = 'us-ascii'
@property
def section_number(self):
# Because the first token, the attribute (name), eats CFWS, the second
# token is always the section if there is one.
return self[1].number if self.sectioned else 0
@property
def param_value(self):
# This is part of the "handle quoted extended parameters" hack.
for token in self:
if token.token_type == 'value':
return token.stripped_value
if token.token_type == 'quoted-string':
for token in token:
if token.token_type == 'bare-quoted-string':
for token in token:
if token.token_type == 'value':
return token.stripped_value
return ''
class InvalidParameter(Parameter):
token_type = 'invalid-parameter'
class Attribute(TokenList):
token_type = 'attribute'
@property
def stripped_value(self):
for token in self:
if token.token_type.endswith('attrtext'):
return token.value
class Section(TokenList):
token_type = 'section'
number = None
class Value(TokenList):
token_type = 'value'
@property
def stripped_value(self):
token = self[0]
if token.token_type == 'cfws':
token = self[1]
if token.token_type.endswith(
('quoted-string', 'attribute', 'extended-attribute')):
return token.stripped_value
return self.value
class MimeParameters(TokenList):
token_type = 'mime-parameters'
@property
def params(self):
# The RFC specifically states that the ordering of parameters is not
# guaranteed and may be reordered by the transport layer. So we have
# to assume the RFC 2231 pieces can come in any order. However, we
# output them in the order that we first see a given name, which gives
# us a stable __str__.
params = OrderedDict()
for token in self:
if not token.token_type.endswith('parameter'):
continue
if token[0].token_type != 'attribute':
continue
name = token[0].value.strip()
if name not in params:
params[name] = []
params[name].append((token.section_number, token))
for name, parts in params.items():
parts = sorted(parts, key=itemgetter(0))
first_param = parts[0][1]
charset = first_param.charset
# Our arbitrary error recovery is to ignore duplicate parameters,
# to use appearance order if there are duplicate rfc 2231 parts,
# and to ignore gaps. This mimics the error recovery of get_param.
if not first_param.extended and len(parts) > 1:
if parts[1][0] == 0:
parts[1][1].defects.append(errors.InvalidHeaderDefect(
'duplicate parameter name; duplicate(s) ignored'))
parts = parts[:1]
# Else assume the *0* was missing...note that this is different
# from get_param, but we registered a defect for this earlier.
value_parts = []
i = 0
for section_number, param in parts:
if section_number != i:
# We could get fancier here and look for a complete
# duplicate extended parameter and ignore the second one
# seen. But we're not doing that. The old code didn't.
if not param.extended:
param.defects.append(errors.InvalidHeaderDefect(
'duplicate parameter name; duplicate ignored'))
continue
else:
param.defects.append(errors.InvalidHeaderDefect(
"inconsistent RFC2231 parameter numbering"))
i += 1
value = param.param_value
if param.extended:
try:
value = urllib.parse.unquote_to_bytes(value)
except UnicodeEncodeError:
# source had surrogate escaped bytes. What we do now
# is a bit of an open question. I'm not sure this is
# the best choice, but it is what the old algorithm did
value = urllib.parse.unquote(value, encoding='latin-1')
else:
try:
value = value.decode(charset, 'surrogateescape')
except LookupError:
# XXX: there should really be a custom defect for
# unknown character set to make it easy to find,
# because otherwise unknown charset is a silent
# failure.
value = value.decode('us-ascii', 'surrogateescape')
if utils._has_surrogates(value):
param.defects.append(errors.UndecodableBytesDefect())
value_parts.append(value)
value = ''.join(value_parts)
yield name, value
def __str__(self):
params = []
for name, value in self.params:
if value:
params.append('{}={}'.format(name, quote_string(value)))
else:
params.append(name)
params = '; '.join(params)
return ' ' + params if params else ''
class ParameterizedHeaderValue(TokenList):
@property
def params(self):
for token in reversed(self):
if token.token_type == 'mime-parameters':
return token.params
return {}
@property
def parts(self):
if self and self[-1].token_type == 'mime-parameters':
# We don't want to start a new line if all of the params don't fit
# after the value, so unwrap the parameter list.
return TokenList(self[:-1] + self[-1])
return TokenList(self).parts
class ContentType(ParameterizedHeaderValue):
token_type = 'content-type'
maintype = 'text'
subtype = 'plain'
class ContentDisposition(ParameterizedHeaderValue):
token_type = 'content-disposition'
content_disposition = None
class ContentTransferEncoding(TokenList):
token_type = 'content-transfer-encoding'
cte = '7bit'
class HeaderLabel(TokenList):
token_type = 'header-label'
class Header(TokenList):
token_type = 'header'
def _fold(self, folded):
folded.append(str(self.pop(0)))
folded.lastlen = len(folded.current[0])
# The first line of the header is different from all others: we don't
# want to start a new object on a new line if it has any fold points in
# it that would allow part of it to be on the first header line.
# Further, if the first fold point would fit on the new line, we want
# to do that, but if it doesn't we want to put it on the first line.
# Folded supports this via the stickyspace attribute. If this
# attribute is not None, it does the special handling.
folded.stickyspace = str(self.pop(0)) if self[0].token_type == 'cfws' else ''
rest = self.pop(0)
if self:
raise ValueError("Malformed Header token list")
rest._fold(folded)
#
# Terminal classes and instances
#
class Terminal(str):
def __new__(cls, value, token_type):
self = super().__new__(cls, value)
self.token_type = token_type
self.defects = []
return self
def __repr__(self):
return "{}({})".format(self.__class__.__name__, super().__repr__())
@property
def all_defects(self):
return list(self.defects)
def _pp(self, indent=''):
return ["{}{}/{}({}){}".format(
indent,
self.__class__.__name__,
self.token_type,
super().__repr__(),
'' if not self.defects else ' {}'.format(self.defects),
)]
def cte_encode(self, charset, policy):
value = str(self)
try:
value.encode('us-ascii')
return value
except UnicodeEncodeError:
return _ew.encode(value, charset)
def pop_trailing_ws(self):
# This terminates the recursion.
return None
def pop_leading_fws(self):
# This terminates the recursion.
return None
@property
def comments(self):
return []
def has_leading_comment(self):
return False
def __getnewargs__(self):
return(str(self), self.token_type)
class WhiteSpaceTerminal(Terminal):
@property
def value(self):
return ' '
def startswith_fws(self):
return True
has_fws = True
class ValueTerminal(Terminal):
@property
def value(self):
return self
def startswith_fws(self):
return False
has_fws = False
def as_encoded_word(self, charset):
return _ew.encode(str(self), charset)
class EWWhiteSpaceTerminal(WhiteSpaceTerminal):
@property
def value(self):
return ''
@property
def encoded(self):
return self[:]
def __str__(self):
return ''
has_fws = True
# XXX these need to become classes and be used as instances so
# that a program can't change them in a parse tree and screw
# up other parse trees. Maybe should have tests for that, too.
DOT = ValueTerminal('.', 'dot')
ListSeparator = ValueTerminal(',', 'list-separator')
RouteComponentMarker = ValueTerminal('@', 'route-component-marker')
#
# Parser
#
# Parse strings according to RFC822/2047/2822/5322 rules.
#
# This is a stateless parser. Each get_XXX function accepts a string and
# returns either a Terminal or a TokenList representing the RFC object named
# by the method and a string containing the remaining unparsed characters
# from the input. Thus a parser method consumes the next syntactic construct
# of a given type and returns a token representing the construct plus the
# unparsed remainder of the input string.
#
# For example, if the first element of a structured header is a 'phrase',
# then:
#
# phrase, value = get_phrase(value)
#
# returns the complete phrase from the start of the string value, plus any
# characters left in the string after the phrase is removed.
_wsp_splitter = re.compile(r'([{}]+)'.format(''.join(WSP))).split
_non_atom_end_matcher = re.compile(r"[^{}]+".format(
''.join(ATOM_ENDS).replace('\\','\\\\').replace(']','\]'))).match
_non_printable_finder = re.compile(r"[\x00-\x20\x7F]").findall
_non_token_end_matcher = re.compile(r"[^{}]+".format(
''.join(TOKEN_ENDS).replace('\\','\\\\').replace(']','\]'))).match
_non_attribute_end_matcher = re.compile(r"[^{}]+".format(
''.join(ATTRIBUTE_ENDS).replace('\\','\\\\').replace(']','\]'))).match
_non_extended_attribute_end_matcher = re.compile(r"[^{}]+".format(
''.join(EXTENDED_ATTRIBUTE_ENDS).replace(
'\\','\\\\').replace(']','\]'))).match
def _validate_xtext(xtext):
"""If input token contains ASCII non-printables, register a defect."""
non_printables = _non_printable_finder(xtext)
if non_printables:
xtext.defects.append(errors.NonPrintableDefect(non_printables))
if utils._has_surrogates(xtext):
xtext.defects.append(errors.UndecodableBytesDefect(
"Non-ASCII characters found in header token"))
def _get_ptext_to_endchars(value, endchars):
"""Scan printables/quoted-pairs until endchars and return unquoted ptext.
This function turns a run of qcontent, ccontent-without-comments, or
dtext-with-quoted-printables into a single string by unquoting any
quoted printables. It returns the string, the remaining value, and
a flag that is True iff there were any quoted printables decoded.
"""
fragment, *remainder = _wsp_splitter(value, 1)
vchars = []
escape = False
had_qp = False
for pos in range(len(fragment)):
if fragment[pos] == '\\':
if escape:
escape = False
had_qp = True
else:
escape = True
continue
if escape:
escape = False
elif fragment[pos] in endchars:
break
vchars.append(fragment[pos])
else:
pos = pos + 1
return ''.join(vchars), ''.join([fragment[pos:]] + remainder), had_qp
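# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> _get_ptext_to_endchars('ab)cd', ')')
#     ('ab', ')cd', False)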
def get_fws(value):
"""FWS = 1*WSP
This isn't the RFC definition. We're using fws to represent tokens where
folding can be done, but when we are parsing the *un*folding has already
been done so we don't need to watch out for CRLF.
"""
newvalue = value.lstrip()
fws = WhiteSpaceTerminal(value[:len(value)-len(newvalue)], 'fws')
return fws, newvalue
def get_encoded_word(value):
""" encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
"""
ew = EncodedWord()
if not value.startswith('=?'):
raise errors.HeaderParseError(
"expected encoded word but found {}".format(value))
tok, *remainder = value[2:].split('?=', 1)
if tok == value[2:]:
raise errors.HeaderParseError(
"expected encoded word but found {}".format(value))
remstr = ''.join(remainder)
if len(remstr) > 1 and remstr[0] in hexdigits and remstr[1] in hexdigits:
# The ? after the CTE was followed by an encoded word escape (=XX).
rest, *remainder = remstr.split('?=', 1)
tok = tok + '?=' + rest
if len(tok.split()) > 1:
ew.defects.append(errors.InvalidHeaderDefect(
"whitespace inside encoded word"))
ew.cte = value
value = ''.join(remainder)
try:
text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
except ValueError:
raise errors.HeaderParseError(
"encoded word format invalid: '{}'".format(ew.cte))
ew.charset = charset
ew.lang = lang
ew.defects.extend(defects)
while text:
if text[0] in WSP:
token, text = get_fws(text)
ew.append(token)
continue
chars, *remainder = _wsp_splitter(text, 1)
vtext = ValueTerminal(chars, 'vtext')
_validate_xtext(vtext)
ew.append(vtext)
text = ''.join(remainder)
return ew, value
def get_unstructured(value):
"""unstructured = (*([FWS] vchar) *WSP) / obs-unstruct
obs-unstruct = *((*LF *CR *(obs-utext *LF *CR)) / FWS)
obs-utext = %d0 / obs-NO-WS-CTL / vchar
obs-NO-WS-CTL is control characters except WSP/CR/LF.
So, basically, we have printable runs, plus control characters or nulls in
the obsolete syntax, separated by whitespace. Since RFC 2047 uses the
obsolete syntax in its specification, but requires whitespace on either
side of the encoded words, I can see no reason to need to separate the
non-printable-non-whitespace from the printable runs if they occur, so we
parse this into xtext tokens separated by WSP tokens.
Because an 'unstructured' value must by definition constitute the entire
value, this 'get' routine does not return a remaining value, only the
parsed TokenList.
"""
# XXX: but what about bare CR and LF? They might signal the start or
# end of an encoded word. YAGNI for now, since our current parsers
# will never send us strings with bare CR or LF.
unstructured = UnstructuredTokenList()
while value:
if value[0] in WSP:
token, value = get_fws(value)
unstructured.append(token)
continue
if value.startswith('=?'):
try:
token, value = get_encoded_word(value)
except errors.HeaderParseError:
# XXX: Need to figure out how to register defects when
# appropriate here.
pass
else:
have_ws = True
if len(unstructured) > 0:
if unstructured[-1].token_type != 'fws':
unstructured.defects.append(errors.InvalidHeaderDefect(
"missing whitespace before encoded word"))
have_ws = False
if have_ws and len(unstructured) > 1:
if unstructured[-2].token_type == 'encoded-word':
unstructured[-1] = EWWhiteSpaceTerminal(
unstructured[-1], 'fws')
unstructured.append(token)
continue
tok, *remainder = _wsp_splitter(value, 1)
vtext = ValueTerminal(tok, 'vtext')
_validate_xtext(vtext)
unstructured.append(vtext)
value = ''.join(remainder)
return unstructured
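# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> get_unstructured('Hello =?utf-8?q?W=C3=B6rld?=').value
#     'Hello Wörld'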
def get_qp_ctext(value):
"""ctext = <printable ascii except \ ( )>
This is not the RFC ctext, since we are handling nested comments in comment
and unquoting quoted-pairs here. We allow anything except the '()'
characters, but if we find any ASCII other than the RFC defined printable
ASCII, a NonPrintableDefect is added to the token's defects list. Since
quoted pairs are converted to their unquoted values, what is returned is
a 'ptext' token. In this case it is a WhiteSpaceTerminal, so its value
is ' '.
"""
ptext, value, _ = _get_ptext_to_endchars(value, '()')
ptext = WhiteSpaceTerminal(ptext, 'ptext')
_validate_xtext(ptext)
return ptext, value
def get_qcontent(value):
"""qcontent = qtext / quoted-pair
We allow anything except the DQUOTE character, but if we find any ASCII
other than the RFC defined printable ASCII, a NonPrintableDefect is
added to the token's defects list. Any quoted pairs are converted to their
unquoted values, so what is returned is a 'ptext' token. In this case it
is a ValueTerminal.
"""
ptext, value, _ = _get_ptext_to_endchars(value, '"')
ptext = ValueTerminal(ptext, 'ptext')
_validate_xtext(ptext)
return ptext, value
def get_atext(value):
"""atext = <matches _atext_matcher>
We allow any non-ATOM_ENDS in atext, but add an InvalidATextDefect to
the token's defects list if we find non-atext characters.
"""
m = _non_atom_end_matcher(value)
if not m:
raise errors.HeaderParseError(
"expected atext but found '{}'".format(value))
atext = m.group()
value = value[len(atext):]
atext = ValueTerminal(atext, 'atext')
_validate_xtext(atext)
return atext, value
def get_bare_quoted_string(value):
"""bare-quoted-string = DQUOTE *([FWS] qcontent) [FWS] DQUOTE
A quoted-string without the leading or trailing white space. Its
value is the text between the quote marks, with whitespace
preserved and quoted pairs decoded.
"""
if value[0] != '"':
raise errors.HeaderParseError(
"expected '\"' but found '{}'".format(value))
bare_quoted_string = BareQuotedString()
value = value[1:]
while value and value[0] != '"':
if value[0] in WSP:
token, value = get_fws(value)
elif value[:2] == '=?':
try:
token, value = get_encoded_word(value)
bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
"encoded word inside quoted string"))
except errors.HeaderParseError:
token, value = get_qcontent(value)
else:
token, value = get_qcontent(value)
bare_quoted_string.append(token)
if not value:
bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
"end of header inside quoted string"))
return bare_quoted_string, value
return bare_quoted_string, value[1:]
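# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> token, rest = get_bare_quoted_string('"hello world" <x@y>')
#     >>> str(token), token.value, rest
#     ('"hello world"', 'hello world', ' <x@y>')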
def get_comment(value):
"""comment = "(" *([FWS] ccontent) [FWS] ")"
ccontent = ctext / quoted-pair / comment
We handle nested comments here, and quoted-pair in our qp-ctext routine.
"""
if value and value[0] != '(':
raise errors.HeaderParseError(
"expected '(' but found '{}'".format(value))
comment = Comment()
value = value[1:]
while value and value[0] != ")":
if value[0] in WSP:
token, value = get_fws(value)
elif value[0] == '(':
token, value = get_comment(value)
else:
token, value = get_qp_ctext(value)
comment.append(token)
if not value:
comment.defects.append(errors.InvalidHeaderDefect(
"end of header inside comment"))
return comment, value
return comment, value[1:]
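# Illustrative usage (editor's sketch, not part of the original module):
#
#     >>> comment, rest = get_comment('(a (nested) remark) tail')
#     >>> comment.content, rest
#     ('a (nested) remark', ' tail')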
def get_cfws(value):
"""CFWS = (1*([FWS] comment) [FWS]) / FWS
"""
cfws = CFWSList()
while value and value[0] in CFWS_LEADER:
if value[0] in WSP:
token, value = get_fws(value)
else:
token, value = get_comment(value)
cfws.append(token)
return cfws, value
def get_quoted_string(value):
"""quoted-string = [CFWS] <bare-quoted-string> [CFWS]
'bare-quoted-string' is an intermediate class defined by this
parser and not by the RFC grammar. It is the quoted string
without any attached CFWS.
"""
quoted_string = QuotedString()
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
quoted_string.append(token)
token, value = get_bare_quoted_string(value)
quoted_string.append(token)
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
quoted_string.append(token)
return quoted_string, value
def get_atom(value):
"""atom = [CFWS] 1*atext [CFWS]
An atom could be an rfc2047 encoded word.
"""
atom = Atom()
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
atom.append(token)
if value and value[0] in ATOM_ENDS:
raise errors.HeaderParseError(
"expected atom but found '{}'".format(value))
if value.startswith('=?'):
try:
token, value = get_encoded_word(value)
except errors.HeaderParseError:
# XXX: need to figure out how to register defects when
# appropriate here.
token, value = get_atext(value)
else:
token, value = get_atext(value)
atom.append(token)
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
atom.append(token)
return atom, value
def get_dot_atom_text(value):
""" dot-text = 1*atext *("." 1*atext)
"""
dot_atom_text = DotAtomText()
if not value or value[0] in ATOM_ENDS:
raise errors.HeaderParseError("expected atom at a start of "
"dot-atom-text but found '{}'".format(value))
while value and value[0] not in ATOM_ENDS:
token, value = get_atext(value)
dot_atom_text.append(token)
if value and value[0] == '.':
dot_atom_text.append(DOT)
value = value[1:]
if dot_atom_text[-1] is DOT:
raise errors.HeaderParseError("expected atom at end of dot-atom-text "
"but found '{}'".format('.'+value))
return dot_atom_text, value
def get_dot_atom(value):
""" dot-atom = [CFWS] dot-atom-text [CFWS]
Any place we can have a dot atom, we could instead have an rfc2047 encoded
word.
"""
dot_atom = DotAtom()
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
dot_atom.append(token)
if value.startswith('=?'):
try:
token, value = get_encoded_word(value)
except errors.HeaderParseError:
# XXX: need to figure out how to register defects when
# appropriate here.
token, value = get_dot_atom_text(value)
else:
token, value = get_dot_atom_text(value)
dot_atom.append(token)
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
dot_atom.append(token)
return dot_atom, value
def get_word(value):
"""word = atom / quoted-string
Either atom or quoted-string may start with CFWS. We have to peel off this
CFWS first to determine which type of word to parse. Afterward we splice
the leading CFWS, if any, into the parsed sub-token.
If neither an atom or a quoted-string is found before the next special, a
HeaderParseError is raised.
The token returned is either an Atom or a QuotedString, as appropriate.
This means the 'word' level of the formal grammar is not represented in the
parse tree; this is because having that extra layer when manipulating the
parse tree is more confusing than it is helpful.
"""
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
else:
leader = None
if value[0]=='"':
token, value = get_quoted_string(value)
elif value[0] in SPECIALS:
raise errors.HeaderParseError("Expected 'atom' or 'quoted-string' "
"but found '{}'".format(value))
else:
token, value = get_atom(value)
if leader is not None:
token[:0] = [leader]
return token, value
def get_phrase(value):
""" phrase = 1*word / obs-phrase
obs-phrase = word *(word / "." / CFWS)
This means a phrase can be a sequence of words, periods, and CFWS in any
order as long as it starts with at least one word. If anything other than
words is detected, an ObsoleteHeaderDefect is added to the token's defect
list. We also accept a phrase that starts with CFWS followed by a dot;
this is registered as an InvalidHeaderDefect, since it is not supported by
even the obsolete grammar.
"""
phrase = Phrase()
try:
token, value = get_word(value)
phrase.append(token)
except errors.HeaderParseError:
phrase.defects.append(errors.InvalidHeaderDefect(
"phrase does not start with word"))
while value and value[0] not in PHRASE_ENDS:
if value[0]=='.':
phrase.append(DOT)
phrase.defects.append(errors.ObsoleteHeaderDefect(
"period in 'phrase'"))
value = value[1:]
else:
try:
token, value = get_word(value)
except errors.HeaderParseError:
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
phrase.defects.append(errors.ObsoleteHeaderDefect(
"comment found without atom"))
else:
raise
phrase.append(token)
return phrase, value
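# Illustrative sketch (not part of the original module; hypothetical helper):
# a phrase is a run of words, with any CFWS spliced into the word tokens, so
# str() of the token reproduces exactly the input that was consumed.
def _demo_get_phrase():
    phrase, rest = get_phrase('Jane Q Public<rest')
    # str(phrase) == 'Jane Q Public'; rest == '<rest', since '<' is a
    # special that ends the phrase.
    return phrase, rest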
def get_local_part(value):
""" local-part = dot-atom / quoted-string / obs-local-part
"""
local_part = LocalPart()
leader = None
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value:
raise errors.HeaderParseError(
"expected local-part but found '{}'".format(value))
try:
token, value = get_dot_atom(value)
except errors.HeaderParseError:
try:
token, value = get_word(value)
except errors.HeaderParseError:
if value[0] != '\\' and value[0] in PHRASE_ENDS:
raise
token = TokenList()
if leader is not None:
token[:0] = [leader]
local_part.append(token)
if value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
obs_local_part, value = get_obs_local_part(str(local_part) + value)
if obs_local_part.token_type == 'invalid-obs-local-part':
local_part.defects.append(errors.InvalidHeaderDefect(
"local-part is not dot-atom, quoted-string, or obs-local-part"))
else:
local_part.defects.append(errors.ObsoleteHeaderDefect(
"local-part is not a dot-atom (contains CFWS)"))
local_part[0] = obs_local_part
try:
local_part.value.encode('ascii')
except UnicodeEncodeError:
local_part.defects.append(errors.NonASCIILocalPartDefect(
"local-part contains non-ASCII characters)"))
return local_part, value
def get_obs_local_part(value):
""" obs-local-part = word *("." word)
"""
obs_local_part = ObsLocalPart()
last_non_ws_was_dot = False
while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
if value[0] == '.':
if last_non_ws_was_dot:
obs_local_part.defects.append(errors.InvalidHeaderDefect(
"invalid repeated '.'"))
obs_local_part.append(DOT)
last_non_ws_was_dot = True
value = value[1:]
continue
elif value[0]=='\\':
obs_local_part.append(ValueTerminal(value[0],
'misplaced-special'))
value = value[1:]
obs_local_part.defects.append(errors.InvalidHeaderDefect(
"'\\' character outside of quoted-string/ccontent"))
last_non_ws_was_dot = False
continue
if obs_local_part and obs_local_part[-1].token_type != 'dot':
obs_local_part.defects.append(errors.InvalidHeaderDefect(
"missing '.' between words"))
try:
token, value = get_word(value)
last_non_ws_was_dot = False
except errors.HeaderParseError:
if value[0] not in CFWS_LEADER:
raise
token, value = get_cfws(value)
obs_local_part.append(token)
if (obs_local_part[0].token_type == 'dot' or
obs_local_part[0].token_type=='cfws' and
obs_local_part[1].token_type=='dot'):
obs_local_part.defects.append(errors.InvalidHeaderDefect(
"Invalid leading '.' in local part"))
if (obs_local_part[-1].token_type == 'dot' or
obs_local_part[-1].token_type=='cfws' and
obs_local_part[-2].token_type=='dot'):
obs_local_part.defects.append(errors.InvalidHeaderDefect(
"Invalid trailing '.' in local part"))
if obs_local_part.defects:
obs_local_part.token_type = 'invalid-obs-local-part'
return obs_local_part, value
def get_dtext(value):
""" dtext = <printable ascii except \ [ ]> / obs-dtext
obs-dtext = obs-NO-WS-CTL / quoted-pair
We allow anything except the excluded characters, but if we find any
    ASCII other than the RFC defined printable ASCII, a NonPrintableDefect is
added to the token's defects list. Quoted pairs are converted to their
unquoted values, so what is returned is a ptext token, in this case a
ValueTerminal. If there were quoted-printables, an ObsoleteHeaderDefect is
added to the returned token's defect list.
"""
ptext, value, had_qp = _get_ptext_to_endchars(value, '[]')
ptext = ValueTerminal(ptext, 'ptext')
if had_qp:
ptext.defects.append(errors.ObsoleteHeaderDefect(
"quoted printable found in domain-literal"))
_validate_xtext(ptext)
return ptext, value
def _check_for_early_dl_end(value, domain_literal):
if value:
return False
    domain_literal.defects.append(errors.InvalidHeaderDefect(
        "end of input inside domain-literal"))
domain_literal.append(ValueTerminal(']', 'domain-literal-end'))
return True
def get_domain_literal(value):
""" domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
"""
domain_literal = DomainLiteral()
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
domain_literal.append(token)
if not value:
raise errors.HeaderParseError("expected domain-literal")
if value[0] != '[':
raise errors.HeaderParseError("expected '[' at start of domain-literal "
"but found '{}'".format(value))
value = value[1:]
if _check_for_early_dl_end(value, domain_literal):
return domain_literal, value
domain_literal.append(ValueTerminal('[', 'domain-literal-start'))
if value[0] in WSP:
token, value = get_fws(value)
domain_literal.append(token)
token, value = get_dtext(value)
domain_literal.append(token)
if _check_for_early_dl_end(value, domain_literal):
return domain_literal, value
if value[0] in WSP:
token, value = get_fws(value)
domain_literal.append(token)
if _check_for_early_dl_end(value, domain_literal):
return domain_literal, value
if value[0] != ']':
raise errors.HeaderParseError("expected ']' at end of domain-literal "
"but found '{}'".format(value))
domain_literal.append(ValueTerminal(']', 'domain-literal-end'))
value = value[1:]
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
domain_literal.append(token)
return domain_literal, value
def get_domain(value):
""" domain = dot-atom / domain-literal / obs-domain
        obs-domain = atom *("." atom)
"""
domain = Domain()
leader = None
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value:
raise errors.HeaderParseError(
"expected domain but found '{}'".format(value))
if value[0] == '[':
token, value = get_domain_literal(value)
if leader is not None:
token[:0] = [leader]
domain.append(token)
return domain, value
try:
token, value = get_dot_atom(value)
except errors.HeaderParseError:
token, value = get_atom(value)
if leader is not None:
token[:0] = [leader]
domain.append(token)
if value and value[0] == '.':
domain.defects.append(errors.ObsoleteHeaderDefect(
"domain is not a dot-atom (contains CFWS)"))
if domain[0].token_type == 'dot-atom':
domain[:] = domain[0]
while value and value[0] == '.':
domain.append(DOT)
token, value = get_atom(value[1:])
domain.append(token)
return domain, value
def get_addr_spec(value):
""" addr-spec = local-part "@" domain
"""
addr_spec = AddrSpec()
token, value = get_local_part(value)
addr_spec.append(token)
if not value or value[0] != '@':
addr_spec.defects.append(errors.InvalidHeaderDefect(
"add-spec local part with no domain"))
return addr_spec, value
addr_spec.append(ValueTerminal('@', 'address-at-symbol'))
token, value = get_domain(value[1:])
addr_spec.append(token)
return addr_spec, value
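# Illustrative sketch (not part of the original module; hypothetical helper):
# parsing stops at the first character that cannot belong to the addr-spec,
# so trailing list punctuation is left for the caller to dispatch on.
def _demo_get_addr_spec():
    token, rest = get_addr_spec('ann.b@example.com, next')
    # str(token) == 'ann.b@example.com'; rest == ', next'
    return token, rest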
def get_obs_route(value):
""" obs-route = obs-domain-list ":"
obs-domain-list = *(CFWS / ",") "@" domain *("," [CFWS] ["@" domain])
Returns an obs-route token with the appropriate sub-tokens (that is,
there is no obs-domain-list in the parse tree).
"""
obs_route = ObsRoute()
while value and (value[0]==',' or value[0] in CFWS_LEADER):
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
obs_route.append(token)
elif value[0] == ',':
obs_route.append(ListSeparator)
value = value[1:]
if not value or value[0] != '@':
raise errors.HeaderParseError(
"expected obs-route domain but found '{}'".format(value))
obs_route.append(RouteComponentMarker)
token, value = get_domain(value[1:])
obs_route.append(token)
while value and value[0]==',':
obs_route.append(ListSeparator)
value = value[1:]
if not value:
break
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
obs_route.append(token)
if value[0] == '@':
obs_route.append(RouteComponentMarker)
token, value = get_domain(value[1:])
obs_route.append(token)
if not value:
raise errors.HeaderParseError("end of header while parsing obs-route")
if value[0] != ':':
        raise errors.HeaderParseError("expected ':' marking end of "
            "obs-route but found '{}'".format(value))
obs_route.append(ValueTerminal(':', 'end-of-obs-route-marker'))
return obs_route, value[1:]
def get_angle_addr(value):
""" angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
obs-angle-addr = [CFWS] "<" obs-route addr-spec ">" [CFWS]
"""
angle_addr = AngleAddr()
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
angle_addr.append(token)
if not value or value[0] != '<':
raise errors.HeaderParseError(
"expected angle-addr but found '{}'".format(value))
angle_addr.append(ValueTerminal('<', 'angle-addr-start'))
value = value[1:]
# Although it is not legal per RFC5322, SMTP uses '<>' in certain
# circumstances.
if value[0] == '>':
angle_addr.append(ValueTerminal('>', 'angle-addr-end'))
angle_addr.defects.append(errors.InvalidHeaderDefect(
"null addr-spec in angle-addr"))
value = value[1:]
return angle_addr, value
try:
token, value = get_addr_spec(value)
except errors.HeaderParseError:
try:
token, value = get_obs_route(value)
angle_addr.defects.append(errors.ObsoleteHeaderDefect(
"obsolete route specification in angle-addr"))
except errors.HeaderParseError:
raise errors.HeaderParseError(
"expected addr-spec or obs-route but found '{}'".format(value))
angle_addr.append(token)
token, value = get_addr_spec(value)
angle_addr.append(token)
if value and value[0] == '>':
value = value[1:]
else:
angle_addr.defects.append(errors.InvalidHeaderDefect(
"missing trailing '>' on angle-addr"))
angle_addr.append(ValueTerminal('>', 'angle-addr-end'))
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
angle_addr.append(token)
return angle_addr, value
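# Illustrative sketch (not part of the original module; hypothetical helper):
# the angle brackets are kept as explicit ValueTerminals, so the parsed
# token round-trips back to the consumed source text.
def _demo_get_angle_addr():
    token, rest = get_angle_addr('<ann@example.com>, next')
    # str(token) == '<ann@example.com>'; rest == ', next'
    return token, rest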
def get_display_name(value):
""" display-name = phrase
Because this is simply a name-rule, we don't return a display-name
token containing a phrase, but rather a display-name token with
the content of the phrase.
"""
display_name = DisplayName()
token, value = get_phrase(value)
display_name.extend(token[:])
display_name.defects = token.defects[:]
return display_name, value
def get_name_addr(value):
""" name-addr = [display-name] angle-addr
"""
name_addr = NameAddr()
# Both the optional display name and the angle-addr can start with cfws.
leader = None
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value:
raise errors.HeaderParseError(
"expected name-addr but found '{}'".format(leader))
if value[0] != '<':
if value[0] in PHRASE_ENDS:
raise errors.HeaderParseError(
"expected name-addr but found '{}'".format(value))
token, value = get_display_name(value)
if not value:
raise errors.HeaderParseError(
"expected name-addr but found '{}'".format(token))
if leader is not None:
token[0][:0] = [leader]
leader = None
name_addr.append(token)
token, value = get_angle_addr(value)
if leader is not None:
token[:0] = [leader]
name_addr.append(token)
return name_addr, value
def get_mailbox(value):
""" mailbox = name-addr / addr-spec
"""
# The only way to figure out if we are dealing with a name-addr or an
# addr-spec is to try parsing each one.
mailbox = Mailbox()
try:
token, value = get_name_addr(value)
except errors.HeaderParseError:
try:
token, value = get_addr_spec(value)
except errors.HeaderParseError:
raise errors.HeaderParseError(
"expected mailbox but found '{}'".format(value))
if any(isinstance(x, errors.InvalidHeaderDefect)
for x in token.all_defects):
mailbox.token_type = 'invalid-mailbox'
mailbox.append(token)
return mailbox, value
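# Illustrative sketch (not part of the original module; hypothetical helper):
# the Mailbox token exposes the semantic pieces through properties no matter
# which branch (name-addr or addr-spec) matched.
def _demo_get_mailbox():
    mbox, rest = get_mailbox('Ann <ann@example.com>')
    # mbox.display_name == 'Ann'; mbox.addr_spec == 'ann@example.com'
    return mbox, rest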
def get_invalid_mailbox(value, endchars):
""" Read everything up to one of the chars in endchars.
This is outside the formal grammar. The InvalidMailbox TokenList that is
returned acts like a Mailbox, but the data attributes are None.
"""
invalid_mailbox = InvalidMailbox()
while value and value[0] not in endchars:
if value[0] in PHRASE_ENDS:
invalid_mailbox.append(ValueTerminal(value[0],
'misplaced-special'))
value = value[1:]
else:
token, value = get_phrase(value)
invalid_mailbox.append(token)
return invalid_mailbox, value
def get_mailbox_list(value):
""" mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
obs-mbox-list = *([CFWS] ",") mailbox *("," [mailbox / CFWS])
For this routine we go outside the formal grammar in order to improve error
handling. We recognize the end of the mailbox list only at the end of the
value or at a ';' (the group terminator). This is so that we can turn
invalid mailboxes into InvalidMailbox tokens and continue parsing any
remaining valid mailboxes. We also allow all mailbox entries to be null,
and this condition is handled appropriately at a higher level.
"""
mailbox_list = MailboxList()
while value and value[0] != ';':
try:
token, value = get_mailbox(value)
mailbox_list.append(token)
except errors.HeaderParseError:
leader = None
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value or value[0] in ',;':
mailbox_list.append(leader)
mailbox_list.defects.append(errors.ObsoleteHeaderDefect(
"empty element in mailbox-list"))
else:
token, value = get_invalid_mailbox(value, ',;')
if leader is not None:
token[:0] = [leader]
mailbox_list.append(token)
mailbox_list.defects.append(errors.InvalidHeaderDefect(
"invalid mailbox in mailbox-list"))
elif value[0] == ',':
mailbox_list.defects.append(errors.ObsoleteHeaderDefect(
"empty element in mailbox-list"))
else:
token, value = get_invalid_mailbox(value, ',;')
if leader is not None:
token[:0] = [leader]
mailbox_list.append(token)
mailbox_list.defects.append(errors.InvalidHeaderDefect(
"invalid mailbox in mailbox-list"))
if value and value[0] not in ',;':
# Crap after mailbox; treat it as an invalid mailbox.
# The mailbox info will still be available.
mailbox = mailbox_list[-1]
mailbox.token_type = 'invalid-mailbox'
token, value = get_invalid_mailbox(value, ',;')
mailbox.extend(token)
mailbox_list.defects.append(errors.InvalidHeaderDefect(
"invalid mailbox in mailbox-list"))
if value and value[0] == ',':
mailbox_list.append(ListSeparator)
value = value[1:]
return mailbox_list, value
def get_group_list(value):
""" group-list = mailbox-list / CFWS / obs-group-list
obs-group-list = 1*([CFWS] ",") [CFWS]
"""
group_list = GroupList()
if not value:
group_list.defects.append(errors.InvalidHeaderDefect(
"end of header before group-list"))
return group_list, value
leader = None
if value and value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value:
# This should never happen in email parsing, since CFWS-only is a
# legal alternative to group-list in a group, which is the only
# place group-list appears.
group_list.defects.append(errors.InvalidHeaderDefect(
"end of header in group-list"))
group_list.append(leader)
return group_list, value
if value[0] == ';':
group_list.append(leader)
return group_list, value
token, value = get_mailbox_list(value)
if len(token.all_mailboxes)==0:
if leader is not None:
group_list.append(leader)
group_list.extend(token)
group_list.defects.append(errors.ObsoleteHeaderDefect(
"group-list with empty entries"))
return group_list, value
if leader is not None:
token[:0] = [leader]
group_list.append(token)
return group_list, value
def get_group(value):
""" group = display-name ":" [group-list] ";" [CFWS]
"""
group = Group()
token, value = get_display_name(value)
if not value or value[0] != ':':
raise errors.HeaderParseError("expected ':' at end of group "
"display name but found '{}'".format(value))
group.append(token)
group.append(ValueTerminal(':', 'group-display-name-terminator'))
value = value[1:]
if value and value[0] == ';':
group.append(ValueTerminal(';', 'group-terminator'))
return group, value[1:]
token, value = get_group_list(value)
group.append(token)
if not value:
group.defects.append(errors.InvalidHeaderDefect(
"end of header in group"))
if value[0] != ';':
raise errors.HeaderParseError(
"expected ';' at end of group but found {}".format(value))
group.append(ValueTerminal(';', 'group-terminator'))
value = value[1:]
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
group.append(token)
return group, value
def get_address(value):
""" address = mailbox / group
Note that counter-intuitively, an address can be either a single address or
a list of addresses (a group). This is why the returned Address object has
a 'mailboxes' attribute which treats a single address as a list of length
    one. When you need to differentiate between the two cases, extract the single
element, which is either a mailbox or a group token.
"""
# The formal grammar isn't very helpful when parsing an address. mailbox
# and group, especially when allowing for obsolete forms, start off very
# similarly. It is only when you reach one of @, <, or : that you know
# what you've got. So, we try each one in turn, starting with the more
# likely of the two. We could perhaps make this more efficient by looking
# for a phrase and then branching based on the next character, but that
# would be a premature optimization.
address = Address()
try:
token, value = get_group(value)
except errors.HeaderParseError:
try:
token, value = get_mailbox(value)
except errors.HeaderParseError:
raise errors.HeaderParseError(
"expected address but found '{}'".format(value))
address.append(token)
return address, value
def get_address_list(value):
""" address_list = (address *("," address)) / obs-addr-list
obs-addr-list = *([CFWS] ",") address *("," [address / CFWS])
We depart from the formal grammar here by continuing to parse until the end
of the input, assuming the input to be entirely composed of an
address-list. This is always true in email parsing, and allows us
to skip invalid addresses to parse additional valid ones.
"""
address_list = AddressList()
while value:
try:
token, value = get_address(value)
address_list.append(token)
except errors.HeaderParseError as err:
leader = None
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value or value[0] == ',':
address_list.append(leader)
address_list.defects.append(errors.ObsoleteHeaderDefect(
"address-list entry with no content"))
else:
token, value = get_invalid_mailbox(value, ',')
if leader is not None:
token[:0] = [leader]
address_list.append(Address([token]))
address_list.defects.append(errors.InvalidHeaderDefect(
"invalid address in address-list"))
elif value[0] == ',':
address_list.defects.append(errors.ObsoleteHeaderDefect(
"empty element in address-list"))
else:
token, value = get_invalid_mailbox(value, ',')
if leader is not None:
token[:0] = [leader]
address_list.append(Address([token]))
address_list.defects.append(errors.InvalidHeaderDefect(
"invalid address in address-list"))
if value and value[0] != ',':
# Crap after address; treat it as an invalid mailbox.
# The mailbox info will still be available.
mailbox = address_list[-1][0]
mailbox.token_type = 'invalid-mailbox'
token, value = get_invalid_mailbox(value, ',')
mailbox.extend(token)
address_list.defects.append(errors.InvalidHeaderDefect(
"invalid address in address-list"))
if value: # Must be a , at this point.
address_list.append(ValueTerminal(',', 'list-separator'))
value = value[1:]
return address_list, value
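# Illustrative sketch (not part of the original module; hypothetical helper):
# a full header value parses into an AddressList token whose 'mailboxes'
# property flattens groups and single addresses alike.
def _demo_get_address_list():
    address_list, rest = get_address_list(
        'Ann <ann@example.com>, undisclosed-recipients:;')
    # rest == ''; address_list.mailboxes yields only Ann's mailbox, since
    # the group is empty; any out-of-spec input is recorded as defects.
    return address_list, rest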
#
# XXX: As I begin to add additional header parsers, I'm realizing we probably
# have two levels of parser routines: the get_XXX methods that get a token in
# the grammar, and parse_XXX methods that parse an entire field value. So
# get_address_list above should really be a parse_ method, as probably should
# be get_unstructured.
#
def parse_mime_version(value):
""" mime-version = [CFWS] 1*digit [CFWS] "." [CFWS] 1*digit [CFWS]
"""
# The [CFWS] is implicit in the RFC 2045 BNF.
# XXX: This routine is a bit verbose, should factor out a get_int method.
mime_version = MIMEVersion()
if not value:
mime_version.defects.append(errors.HeaderMissingRequiredValue(
"Missing MIME version number (eg: 1.0)"))
return mime_version
if value[0] in CFWS_LEADER:
token, value = get_cfws(value)
mime_version.append(token)
if not value:
mime_version.defects.append(errors.HeaderMissingRequiredValue(
"Expected MIME version number but found only CFWS"))
digits = ''
while value and value[0] != '.' and value[0] not in CFWS_LEADER:
digits += value[0]
value = value[1:]
if not digits.isdigit():
mime_version.defects.append(errors.InvalidHeaderDefect(
"Expected MIME major version number but found {!r}".format(digits)))
mime_version.append(ValueTerminal(digits, 'xtext'))
else:
mime_version.major = int(digits)
mime_version.append(ValueTerminal(digits, 'digits'))
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
mime_version.append(token)
if not value or value[0] != '.':
if mime_version.major is not None:
mime_version.defects.append(errors.InvalidHeaderDefect(
"Incomplete MIME version; found only major number"))
if value:
mime_version.append(ValueTerminal(value, 'xtext'))
return mime_version
mime_version.append(ValueTerminal('.', 'version-separator'))
value = value[1:]
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
mime_version.append(token)
if not value:
if mime_version.major is not None:
mime_version.defects.append(errors.InvalidHeaderDefect(
"Incomplete MIME version; found only major number"))
return mime_version
digits = ''
while value and value[0] not in CFWS_LEADER:
digits += value[0]
value = value[1:]
if not digits.isdigit():
mime_version.defects.append(errors.InvalidHeaderDefect(
"Expected MIME minor version number but found {!r}".format(digits)))
mime_version.append(ValueTerminal(digits, 'xtext'))
else:
mime_version.minor = int(digits)
mime_version.append(ValueTerminal(digits, 'digits'))
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
mime_version.append(token)
if value:
mime_version.defects.append(errors.InvalidHeaderDefect(
"Excess non-CFWS text after MIME version"))
mime_version.append(ValueTerminal(value, 'xtext'))
return mime_version
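# Illustrative sketch (not part of the original module; hypothetical helper):
# parse_ routines consume the entire field value and return only the token.
def _demo_parse_mime_version():
    mv = parse_mime_version('1.0 (produced by MetaSend)')
    # mv.major == 1; mv.minor == 0; the comment is kept as trailing CFWS.
    return mv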
def get_invalid_parameter(value):
""" Read everything up to the next ';'.
This is outside the formal grammar. The InvalidParameter TokenList that is
returned acts like a Parameter, but the data attributes are None.
"""
invalid_parameter = InvalidParameter()
while value and value[0] != ';':
if value[0] in PHRASE_ENDS:
invalid_parameter.append(ValueTerminal(value[0],
'misplaced-special'))
value = value[1:]
else:
token, value = get_phrase(value)
invalid_parameter.append(token)
return invalid_parameter, value
def get_ttext(value):
"""ttext = <matches _ttext_matcher>
We allow any non-TOKEN_ENDS in ttext, but add defects to the token's
defects list if we find non-ttext characters. We also register defects for
*any* non-printables even though the RFC doesn't exclude all of them,
because we follow the spirit of RFC 5322.
"""
m = _non_token_end_matcher(value)
if not m:
raise errors.HeaderParseError(
"expected ttext but found '{}'".format(value))
ttext = m.group()
value = value[len(ttext):]
ttext = ValueTerminal(ttext, 'ttext')
_validate_xtext(ttext)
return ttext, value
def get_token(value):
"""token = [CFWS] 1*ttext [CFWS]
The RFC equivalent of ttext is any US-ASCII chars except space, ctls, or
tspecials. We also exclude tabs even though the RFC doesn't.
The RFC implies the CFWS but is not explicit about it in the BNF.
"""
mtoken = Token()
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
mtoken.append(token)
if value and value[0] in TOKEN_ENDS:
raise errors.HeaderParseError(
"expected token but found '{}'".format(value))
token, value = get_ttext(value)
mtoken.append(token)
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
mtoken.append(token)
return mtoken, value
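# Illustrative sketch (not part of the original module; hypothetical helper):
# '/' is a tspecial, so it terminates the token; this is how the
# content-type parser below splits maintype from subtype.
def _demo_get_token():
    token, rest = get_token('text/plain')
    # str(token) == 'text'; rest == '/plain'
    return token, rest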
def get_attrtext(value):
"""attrtext = 1*(any non-ATTRIBUTE_ENDS character)
We allow any non-ATTRIBUTE_ENDS in attrtext, but add defects to the
token's defects list if we find non-attrtext characters. We also register
defects for *any* non-printables even though the RFC doesn't exclude all of
them, because we follow the spirit of RFC 5322.
"""
m = _non_attribute_end_matcher(value)
if not m:
raise errors.HeaderParseError(
"expected attrtext but found {!r}".format(value))
attrtext = m.group()
value = value[len(attrtext):]
attrtext = ValueTerminal(attrtext, 'attrtext')
_validate_xtext(attrtext)
return attrtext, value
def get_attribute(value):
""" [CFWS] 1*attrtext [CFWS]
This version of the BNF makes the CFWS explicit, and as usual we use a
value terminal for the actual run of characters. The RFC equivalent of
attrtext is the token characters, with the subtraction of '*', "'", and '%'.
We include tab in the excluded set just as we do for token.
"""
attribute = Attribute()
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
attribute.append(token)
if value and value[0] in ATTRIBUTE_ENDS:
raise errors.HeaderParseError(
"expected token but found '{}'".format(value))
token, value = get_attrtext(value)
attribute.append(token)
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
attribute.append(token)
return attribute, value
def get_extended_attrtext(value):
"""attrtext = 1*(any non-ATTRIBUTE_ENDS character plus '%')
This is a special parsing routine so that we get a value that
includes % escapes as a single string (which we decode as a single
string later).
"""
m = _non_extended_attribute_end_matcher(value)
if not m:
raise errors.HeaderParseError(
"expected extended attrtext but found {!r}".format(value))
attrtext = m.group()
value = value[len(attrtext):]
attrtext = ValueTerminal(attrtext, 'extended-attrtext')
_validate_xtext(attrtext)
return attrtext, value
def get_extended_attribute(value):
""" [CFWS] 1*extended_attrtext [CFWS]
This is like the non-extended version except we allow % characters, so that
we can pick up an encoded value as a single string.
"""
# XXX: should we have an ExtendedAttribute TokenList?
attribute = Attribute()
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
attribute.append(token)
if value and value[0] in EXTENDED_ATTRIBUTE_ENDS:
raise errors.HeaderParseError(
"expected token but found '{}'".format(value))
token, value = get_extended_attrtext(value)
attribute.append(token)
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
attribute.append(token)
return attribute, value
def get_section(value):
""" '*' digits
The formal BNF is more complicated because leading 0s are not allowed. We
check for that and add a defect. We also assume no CFWS is allowed between
the '*' and the digits, though the RFC is not crystal clear on that.
The caller should already have dealt with leading CFWS.
"""
section = Section()
if not value or value[0] != '*':
raise errors.HeaderParseError("Expected section but found {}".format(
value))
section.append(ValueTerminal('*', 'section-marker'))
value = value[1:]
if not value or not value[0].isdigit():
raise errors.HeaderParseError("Expected section number but "
"found {}".format(value))
digits = ''
while value and value[0].isdigit():
digits += value[0]
value = value[1:]
if digits[0] == '0' and digits != '0':
        section.defects.append(errors.InvalidHeaderDefect(
            "section number has an invalid leading 0"))
section.number = int(digits)
section.append(ValueTerminal(digits, 'digits'))
return section, value
def get_value(value):
""" quoted-string / attribute
"""
v = Value()
if not value:
raise errors.HeaderParseError("Expected value but found end of string")
leader = None
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value:
raise errors.HeaderParseError("Expected value but found "
"only {}".format(leader))
if value[0] == '"':
token, value = get_quoted_string(value)
else:
token, value = get_extended_attribute(value)
if leader is not None:
token[:0] = [leader]
v.append(token)
return v, value
def get_parameter(value):
""" attribute [section] ["*"] [CFWS] "=" value
The CFWS is implied by the RFC but not made explicit in the BNF. This
simplified form of the BNF from the RFC is made to conform with the RFC BNF
through some extra checks. We do it this way because it makes both error
recovery and working with the resulting parse tree easier.
"""
# It is possible CFWS would also be implicitly allowed between the section
    # and the 'extended-attribute' marker (the '*'), but we've never seen that
# in the wild and we will therefore ignore the possibility.
param = Parameter()
token, value = get_attribute(value)
param.append(token)
if not value or value[0] == ';':
param.defects.append(errors.InvalidHeaderDefect("Parameter contains "
"name ({}) but no value".format(token)))
return param, value
if value[0] == '*':
try:
token, value = get_section(value)
param.sectioned = True
param.append(token)
except errors.HeaderParseError:
pass
if not value:
raise errors.HeaderParseError("Incomplete parameter")
if value[0] == '*':
param.append(ValueTerminal('*', 'extended-parameter-marker'))
value = value[1:]
param.extended = True
if value[0] != '=':
raise errors.HeaderParseError("Parameter not followed by '='")
param.append(ValueTerminal('=', 'parameter-separator'))
value = value[1:]
leader = None
if value and value[0] in CFWS_LEADER:
token, value = get_cfws(value)
param.append(token)
remainder = None
appendto = param
if param.extended and value and value[0] == '"':
# Now for some serious hackery to handle the common invalid case of
# double quotes around an extended value. We also accept (with defect)
# a value marked as encoded that isn't really.
qstring, remainder = get_quoted_string(value)
inner_value = qstring.stripped_value
semi_valid = False
if param.section_number == 0:
if inner_value and inner_value[0] == "'":
semi_valid = True
else:
token, rest = get_attrtext(inner_value)
if rest and rest[0] == "'":
semi_valid = True
else:
try:
token, rest = get_extended_attrtext(inner_value)
            except errors.HeaderParseError:
pass
else:
if not rest:
semi_valid = True
if semi_valid:
param.defects.append(errors.InvalidHeaderDefect(
"Quoted string value for extended parameter is invalid"))
param.append(qstring)
for t in qstring:
if t.token_type == 'bare-quoted-string':
t[:] = []
appendto = t
break
value = inner_value
else:
remainder = None
param.defects.append(errors.InvalidHeaderDefect(
"Parameter marked as extended but appears to have a "
"quoted string value that is non-encoded"))
if value and value[0] == "'":
token = None
else:
token, value = get_value(value)
if not param.extended or param.section_number > 0:
if not value or value[0] != "'":
appendto.append(token)
if remainder is not None:
assert not value, value
value = remainder
return param, value
param.defects.append(errors.InvalidHeaderDefect(
"Apparent initial-extended-value but attribute "
"was not marked as extended or was not initial section"))
if not value:
# Assume the charset/lang is missing and the token is the value.
param.defects.append(errors.InvalidHeaderDefect(
"Missing required charset/lang delimiters"))
appendto.append(token)
if remainder is None:
return param, value
else:
if token is not None:
for t in token:
if t.token_type == 'extended-attrtext':
break
            t.token_type = 'attrtext'
appendto.append(t)
param.charset = t.value
if value[0] != "'":
raise errors.HeaderParseError("Expected RFC2231 char/lang encoding "
"delimiter, but found {!r}".format(value))
appendto.append(ValueTerminal("'", 'RFC2231 delimiter'))
value = value[1:]
if value and value[0] != "'":
token, value = get_attrtext(value)
appendto.append(token)
param.lang = token.value
if not value or value[0] != "'":
raise errors.HeaderParseError("Expected RFC2231 char/lang encoding "
"delimiter, but found {}".format(value))
appendto.append(ValueTerminal("'", 'RFC2231 delimiter'))
value = value[1:]
if remainder is not None:
# Treat the rest of value as bare quoted string content.
v = Value()
while value:
if value[0] in WSP:
token, value = get_fws(value)
else:
token, value = get_qcontent(value)
v.append(token)
token = v
else:
token, value = get_value(value)
appendto.append(token)
if remainder is not None:
assert not value, value
value = remainder
return param, value
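# Illustrative sketch (not part of the original module; hypothetical helper):
# a simple (non-extended, non-sectioned) parameter parses into
# attribute '=' value, and anything after it is left for
# parse_mime_parameters to dispatch on ';'.
def _demo_get_parameter():
    param, rest = get_parameter('charset="utf-8"; format=flowed')
    # str(param) == 'charset="utf-8"'; rest == '; format=flowed'
    return param, rest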
def parse_mime_parameters(value):
""" parameter *( ";" parameter )
That BNF is meant to indicate this routine should only be called after
finding and handling the leading ';'. There is no corresponding rule in
the formal RFC grammar, but it is more convenient for us for the set of
parameters to be treated as its own TokenList.
    This is a 'parse' routine because it consumes the remaining value, but it
would never be called to parse a full header. Instead it is called to
parse everything after the non-parameter value of a specific MIME header.
"""
mime_parameters = MimeParameters()
while value:
try:
token, value = get_parameter(value)
mime_parameters.append(token)
except errors.HeaderParseError as err:
leader = None
if value[0] in CFWS_LEADER:
leader, value = get_cfws(value)
if not value:
mime_parameters.append(leader)
return mime_parameters
if value[0] == ';':
if leader is not None:
mime_parameters.append(leader)
mime_parameters.defects.append(errors.InvalidHeaderDefect(
"parameter entry with no content"))
else:
token, value = get_invalid_parameter(value)
if leader:
token[:0] = [leader]
mime_parameters.append(token)
mime_parameters.defects.append(errors.InvalidHeaderDefect(
"invalid parameter {!r}".format(token)))
if value and value[0] != ';':
# Junk after the otherwise valid parameter. Mark it as
# invalid, but it will have a value.
param = mime_parameters[-1]
param.token_type = 'invalid-parameter'
token, value = get_invalid_parameter(value)
param.extend(token)
mime_parameters.defects.append(errors.InvalidHeaderDefect(
"parameter with invalid trailing text {!r}".format(token)))
if value:
# Must be a ';' at this point.
mime_parameters.append(ValueTerminal(';', 'parameter-separator'))
value = value[1:]
return mime_parameters
def _find_mime_parameters(tokenlist, value):
"""Do our best to find the parameters in an invalid MIME header
"""
while value and value[0] != ';':
if value[0] in PHRASE_ENDS:
tokenlist.append(ValueTerminal(value[0], 'misplaced-special'))
value = value[1:]
else:
token, value = get_phrase(value)
tokenlist.append(token)
if not value:
return
tokenlist.append(ValueTerminal(';', 'parameter-separator'))
tokenlist.append(parse_mime_parameters(value[1:]))
def parse_content_type_header(value):
""" maintype "/" subtype *( ";" parameter )
    The maintype and subtype are tokens. Theoretically they could
be checked against the official IANA list + x-token, but we
don't do that.
"""
ctype = ContentType()
recover = False
if not value:
ctype.defects.append(errors.HeaderMissingRequiredValue(
"Missing content type specification"))
return ctype
try:
token, value = get_token(value)
except errors.HeaderParseError:
ctype.defects.append(errors.InvalidHeaderDefect(
"Expected content maintype but found {!r}".format(value)))
_find_mime_parameters(ctype, value)
return ctype
ctype.append(token)
    # XXX: If we really want to follow the formal grammar we should make
    # maintype and subtype specialized TokenLists here. Probably not worth it.
if not value or value[0] != '/':
ctype.defects.append(errors.InvalidHeaderDefect(
"Invalid content type"))
if value:
_find_mime_parameters(ctype, value)
return ctype
ctype.maintype = token.value.strip().lower()
ctype.append(ValueTerminal('/', 'content-type-separator'))
value = value[1:]
try:
token, value = get_token(value)
except errors.HeaderParseError:
ctype.defects.append(errors.InvalidHeaderDefect(
"Expected content subtype but found {!r}".format(value)))
_find_mime_parameters(ctype, value)
return ctype
ctype.append(token)
ctype.subtype = token.value.strip().lower()
if not value:
return ctype
if value[0] != ';':
ctype.defects.append(errors.InvalidHeaderDefect(
"Only parameters are valid after content type, but "
"found {!r}".format(value)))
# The RFC requires that a syntactically invalid content-type be treated
# as text/plain. Perhaps we should postel this, but we should probably
# only do that if we were checking the subtype value against IANA.
del ctype.maintype, ctype.subtype
_find_mime_parameters(ctype, value)
return ctype
ctype.append(ValueTerminal(';', 'parameter-separator'))
ctype.append(parse_mime_parameters(value[1:]))
return ctype
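# Illustrative sketch (not part of the original module; hypothetical helper):
# the ContentType token exposes maintype/subtype directly, and the
# parameters through the parse tree's 'params' property.
def _demo_parse_content_type_header():
    ctype = parse_content_type_header('text/plain; charset="utf-8"')
    # ctype.maintype == 'text'; ctype.subtype == 'plain';
    # dict(ctype.params) should give {'charset': 'utf-8'}
    return ctype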
def parse_content_disposition_header(value):
""" disposition-type *( ";" parameter )
"""
disp_header = ContentDisposition()
if not value:
disp_header.defects.append(errors.HeaderMissingRequiredValue(
"Missing content disposition"))
return disp_header
try:
token, value = get_token(value)
except errors.HeaderParseError:
disp_header.defects.append(errors.InvalidHeaderDefect(
"Expected content disposition but found {!r}".format(value)))
_find_mime_parameters(disp_header, value)
return disp_header
disp_header.append(token)
disp_header.content_disposition = token.value.strip().lower()
if not value:
return disp_header
if value[0] != ';':
disp_header.defects.append(errors.InvalidHeaderDefect(
"Only parameters are valid after content disposition, but "
"found {!r}".format(value)))
_find_mime_parameters(disp_header, value)
return disp_header
disp_header.append(ValueTerminal(';', 'parameter-separator'))
disp_header.append(parse_mime_parameters(value[1:]))
return disp_header
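# Illustrative sketch (not part of the original module; hypothetical helper):
# the disposition type is normalized to lowercase, and any parameters are
# parsed by the same machinery as for content-type.
def _demo_parse_content_disposition_header():
    disp = parse_content_disposition_header('attachment; filename="x.txt"')
    # disp.content_disposition == 'attachment'
    return disp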
def parse_content_transfer_encoding_header(value):
""" mechanism
"""
# We should probably validate the values, since the list is fixed.
cte_header = ContentTransferEncoding()
if not value:
cte_header.defects.append(errors.HeaderMissingRequiredValue(
"Missing content transfer encoding"))
return cte_header
try:
token, value = get_token(value)
except errors.HeaderParseError:
cte_header.defects.append(errors.InvalidHeaderDefect(
"Expected content transfer encoding but found {!r}".format(value)))
else:
cte_header.append(token)
cte_header.cte = token.value.strip().lower()
if not value:
return cte_header
while value:
cte_header.defects.append(errors.InvalidHeaderDefect(
"Extra text after content transfer encoding"))
if value[0] in PHRASE_ENDS:
cte_header.append(ValueTerminal(value[0], 'misplaced-special'))
value = value[1:]
else:
token, value = get_phrase(value)
cte_header.append(token)
return cte_header
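# Illustrative sketch (not part of the original module; hypothetical helper):
# the mechanism value is normalized to lowercase.
def _demo_parse_cte_header():
    cte = parse_content_transfer_encoding_header('Base64')
    # cte.cte == 'base64'
    return cte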
# Copyright (C) 2002-2007 Python Software Foundation
# Contact: email-sig@python.org
"""Email address parsing code.
Lifted directly from rfc822.py. This should eventually be rewritten.
"""
__all__ = [
'mktime_tz',
'parsedate',
'parsedate_tz',
'quote',
]
import time, calendar
SPACE = ' '
EMPTYSTRING = ''
COMMASPACE = ', '
# Parse a date field
_monthnames = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul',
'aug', 'sep', 'oct', 'nov', 'dec',
'january', 'february', 'march', 'april', 'may', 'june', 'july',
'august', 'september', 'october', 'november', 'december']
_daynames = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
# The timezone table does not include the military time zones defined
# in RFC822, other than Z. According to RFC1123, the description in
# RFC822 gets the signs wrong, so we can't rely on any such time
# zones. RFC1123 recommends that numeric timezone indicators be used
# instead of timezone names.
_timezones = {'UT':0, 'UTC':0, 'GMT':0, 'Z':0,
'AST': -400, 'ADT': -300, # Atlantic (used in Canada)
'EST': -500, 'EDT': -400, # Eastern
'CST': -600, 'CDT': -500, # Central
'MST': -700, 'MDT': -600, # Mountain
'PST': -800, 'PDT': -700 # Pacific
}
def parsedate_tz(data):
"""Convert a date string to a time tuple.
Accounts for military timezones.
"""
res = _parsedate_tz(data)
if not res:
return
if res[9] is None:
res[9] = 0
return tuple(res)
def _parsedate_tz(data):
"""Convert date to extended time tuple.
The last (additional) element is the time zone offset in seconds, except if
the timezone was specified as -0000. In that case the last element is
None. This indicates a UTC timestamp that explicitly declaims knowledge of
the source timezone, as opposed to a +0000 timestamp that indicates the
source timezone really was UTC.
"""
if not data:
return
data = data.split()
# The FWS after the comma after the day-of-week is optional, so search and
# adjust for this.
if data[0].endswith(',') or data[0].lower() in _daynames:
# There's a dayname here. Skip it
del data[0]
else:
i = data[0].rfind(',')
if i >= 0:
data[0] = data[0][i+1:]
if len(data) == 3: # RFC 850 date, deprecated
stuff = data[0].split('-')
if len(stuff) == 3:
data = stuff + data[1:]
if len(data) == 4:
s = data[3]
i = s.find('+')
if i == -1:
i = s.find('-')
if i > 0:
data[3:] = [s[:i], s[i:]]
else:
data.append('') # Dummy tz
if len(data) < 5:
return None
data = data[:5]
[dd, mm, yy, tm, tz] = data
mm = mm.lower()
if mm not in _monthnames:
dd, mm = mm, dd.lower()
if mm not in _monthnames:
return None
mm = _monthnames.index(mm) + 1
if mm > 12:
mm -= 12
if dd[-1] == ',':
dd = dd[:-1]
i = yy.find(':')
if i > 0:
yy, tm = tm, yy
if yy[-1] == ',':
yy = yy[:-1]
if not yy[0].isdigit():
yy, tz = tz, yy
if tm[-1] == ',':
tm = tm[:-1]
tm = tm.split(':')
if len(tm) == 2:
[thh, tmm] = tm
tss = '0'
elif len(tm) == 3:
[thh, tmm, tss] = tm
elif len(tm) == 1 and '.' in tm[0]:
# Some non-compliant MUAs use '.' to separate time elements.
tm = tm[0].split('.')
if len(tm) == 2:
[thh, tmm] = tm
tss = 0
elif len(tm) == 3:
[thh, tmm, tss] = tm
else:
return None
try:
yy = int(yy)
dd = int(dd)
thh = int(thh)
tmm = int(tmm)
tss = int(tss)
except ValueError:
return None
# Check for a yy specified in two-digit format, then convert it to the
# appropriate four-digit format, according to the POSIX standard. RFC 822
# calls for a two-digit yy, but RFC 2822 (which obsoletes RFC 822)
# mandates a 4-digit yy. For more information, see the documentation for
# the time module.
if yy < 100:
# The year is between 1969 and 1999 (inclusive).
if yy > 68:
yy += 1900
# The year is between 2000 and 2068 (inclusive).
else:
yy += 2000
tzoffset = None
tz = tz.upper()
if tz in _timezones:
tzoffset = _timezones[tz]
else:
try:
tzoffset = int(tz)
except ValueError:
pass
if tzoffset==0 and tz.startswith('-'):
tzoffset = None
    # Convert a timezone offset into seconds; -0500 -> -18000
if tzoffset:
if tzoffset < 0:
tzsign = -1
tzoffset = -tzoffset
else:
tzsign = 1
tzoffset = tzsign * ( (tzoffset//100)*3600 + (tzoffset % 100)*60)
# Daylight Saving Time flag is set to -1, since DST is unknown.
return [yy, mm, dd, thh, tmm, tss, 0, 1, -1, tzoffset]
def parsedate(data):
"""Convert a time string to a time tuple."""
t = parsedate_tz(data)
if isinstance(t, tuple):
return t[:9]
else:
return t
def mktime_tz(data):
"""Turn a 10-tuple as returned by parsedate_tz() into a POSIX timestamp."""
if data[9] is None:
        # No zone info, so localtime is a better assumption than GMT
return time.mktime(data[:8] + (-1,))
else:
t = calendar.timegm(data)
return t - data[9]
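# Illustrative sketch (not part of the original module; hypothetical helper):
# round-tripping an RFC 2822 date through parsedate_tz and mktime_tz yields
# a POSIX timestamp in UTC.
def _demo_parsedate():
    t = parsedate_tz('Fri, 09 Nov 2001 01:08:47 -0500')
    # t == (2001, 11, 9, 1, 8, 47, 0, 1, -1, -18000)
    return mktime_tz(t)  # -> 1005286127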
def quote(str):
"""Prepare string to be used in a quoted string.
Turns backslash and double quote characters into quoted pairs. These
are the only characters that need to be quoted inside a quoted string.
Does not add the surrounding double quotes.
"""
return str.replace('\\', '\\\\').replace('"', '\\"')
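# Illustrative sketch (not part of the original module; hypothetical helper):
# only backslash and double quote are escaped, and no surrounding quotes
# are added.
def _demo_quote():
    assert quote(r'back\slash "quoted"') == r'back\\slash \"quoted\"'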
class AddrlistClass:
"""Address parser class by Ben Escoto.
To understand what this class does, it helps to have a copy of RFC 2822 in
front of you.
Note: this class interface is deprecated and may be removed in the future.
Use email.utils.AddressList instead.
"""
def __init__(self, field):
"""Initialize a new instance.
`field' is an unparsed address header field, containing
one or more addresses.
"""
self.specials = '()<>@,:;.\"[]'
self.pos = 0
self.LWS = ' \t'
self.CR = '\r\n'
self.FWS = self.LWS + self.CR
self.atomends = self.specials + self.LWS + self.CR
# Note that RFC 2822 now specifies `.' as obs-phrase, meaning that it
# is obsolete syntax. RFC 2822 requires that we recognize obsolete
# syntax, so allow dots in phrases.
self.phraseends = self.atomends.replace('.', '')
self.field = field
self.commentlist = []
def gotonext(self):
"""Skip white space and extract comments."""
wslist = []
while self.pos < len(self.field):
if self.field[self.pos] in self.LWS + '\n\r':
if self.field[self.pos] not in '\n\r':
wslist.append(self.field[self.pos])
self.pos += 1
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
else:
break
return EMPTYSTRING.join(wslist)
def getaddrlist(self):
"""Parse all addresses.
Returns a list containing all of the addresses.
"""
result = []
while self.pos < len(self.field):
ad = self.getaddress()
if ad:
result += ad
else:
result.append(('', ''))
return result
def getaddress(self):
"""Parse the next address."""
self.commentlist = []
self.gotonext()
oldpos = self.pos
oldcl = self.commentlist
plist = self.getphraselist()
self.gotonext()
returnlist = []
if self.pos >= len(self.field):
# Bad email address technically, no domain.
if plist:
returnlist = [(SPACE.join(self.commentlist), plist[0])]
elif self.field[self.pos] in '.@':
# email address is just an addrspec
# this isn't very efficient since we start over
self.pos = oldpos
self.commentlist = oldcl
addrspec = self.getaddrspec()
returnlist = [(SPACE.join(self.commentlist), addrspec)]
elif self.field[self.pos] == ':':
# address is a group
returnlist = []
fieldlen = len(self.field)
self.pos += 1
while self.pos < len(self.field):
self.gotonext()
if self.pos < fieldlen and self.field[self.pos] == ';':
self.pos += 1
break
returnlist = returnlist + self.getaddress()
elif self.field[self.pos] == '<':
# Address is a phrase then a route addr
routeaddr = self.getrouteaddr()
if self.commentlist:
returnlist = [(SPACE.join(plist) + ' (' +
' '.join(self.commentlist) + ')', routeaddr)]
else:
returnlist = [(SPACE.join(plist), routeaddr)]
else:
if plist:
returnlist = [(SPACE.join(self.commentlist), plist[0])]
elif self.field[self.pos] in self.specials:
self.pos += 1
self.gotonext()
if self.pos < len(self.field) and self.field[self.pos] == ',':
self.pos += 1
return returnlist
def getrouteaddr(self):
"""Parse a route address (Return-path value).
This method just skips all the route stuff and returns the addrspec.
"""
if self.field[self.pos] != '<':
return
expectroute = False
self.pos += 1
self.gotonext()
adlist = ''
while self.pos < len(self.field):
if expectroute:
self.getdomain()
expectroute = False
elif self.field[self.pos] == '>':
self.pos += 1
break
elif self.field[self.pos] == '@':
self.pos += 1
expectroute = True
elif self.field[self.pos] == ':':
self.pos += 1
else:
adlist = self.getaddrspec()
self.pos += 1
break
self.gotonext()
return adlist
def getaddrspec(self):
"""Parse an RFC 2822 addr-spec."""
aslist = []
self.gotonext()
while self.pos < len(self.field):
preserve_ws = True
if self.field[self.pos] == '.':
if aslist and not aslist[-1].strip():
aslist.pop()
aslist.append('.')
self.pos += 1
preserve_ws = False
elif self.field[self.pos] == '"':
aslist.append('"%s"' % quote(self.getquote()))
elif self.field[self.pos] in self.atomends:
if aslist and not aslist[-1].strip():
aslist.pop()
break
else:
aslist.append(self.getatom())
ws = self.gotonext()
if preserve_ws and ws:
aslist.append(ws)
if self.pos >= len(self.field) or self.field[self.pos] != '@':
return EMPTYSTRING.join(aslist)
aslist.append('@')
self.pos += 1
self.gotonext()
return EMPTYSTRING.join(aslist) + self.getdomain()
def getdomain(self):
"""Get the complete domain name from an address."""
sdlist = []
while self.pos < len(self.field):
if self.field[self.pos] in self.LWS:
self.pos += 1
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
elif self.field[self.pos] == '[':
sdlist.append(self.getdomainliteral())
elif self.field[self.pos] == '.':
self.pos += 1
sdlist.append('.')
elif self.field[self.pos] in self.atomends:
break
else:
sdlist.append(self.getatom())
return EMPTYSTRING.join(sdlist)
def getdelimited(self, beginchar, endchars, allowcomments=True):
"""Parse a header fragment delimited by special characters.
`beginchar' is the start character for the fragment.
If self is not looking at an instance of `beginchar' then
getdelimited returns the empty string.
`endchars' is a sequence of allowable end-delimiting characters.
Parsing stops when one of these is encountered.
If `allowcomments' is non-zero, embedded RFC 2822 comments are allowed
within the parsed fragment.
"""
if self.field[self.pos] != beginchar:
return ''
slist = ['']
quote = False
self.pos += 1
while self.pos < len(self.field):
if quote:
slist.append(self.field[self.pos])
quote = False
elif self.field[self.pos] in endchars:
self.pos += 1
break
elif allowcomments and self.field[self.pos] == '(':
slist.append(self.getcomment())
continue # have already advanced pos from getcomment
elif self.field[self.pos] == '\\':
quote = True
else:
slist.append(self.field[self.pos])
self.pos += 1
return EMPTYSTRING.join(slist)
def getquote(self):
"""Get a quote-delimited fragment from self's field."""
return self.getdelimited('"', '"\r', False)
def getcomment(self):
"""Get a parenthesis-delimited fragment from self's field."""
return self.getdelimited('(', ')\r', True)
def getdomainliteral(self):
"""Parse an RFC 2822 domain-literal."""
return '[%s]' % self.getdelimited('[', ']\r', False)
def getatom(self, atomends=None):
"""Parse an RFC 2822 atom.
Optional atomends specifies a different set of end token delimiters
(the default is to use self.atomends). This is used e.g. in
getphraselist() since phrase endings must not include the `.' (which
is legal in phrases)."""
atomlist = ['']
if atomends is None:
atomends = self.atomends
while self.pos < len(self.field):
if self.field[self.pos] in atomends:
break
else:
atomlist.append(self.field[self.pos])
self.pos += 1
return EMPTYSTRING.join(atomlist)
def getphraselist(self):
"""Parse a sequence of RFC 2822 phrases.
A phrase is a sequence of words, which are in turn either RFC 2822
atoms or quoted-strings. Phrases are canonicalized by squeezing all
runs of continuous whitespace into one space.
"""
plist = []
while self.pos < len(self.field):
if self.field[self.pos] in self.FWS:
self.pos += 1
elif self.field[self.pos] == '"':
plist.append(self.getquote())
elif self.field[self.pos] == '(':
self.commentlist.append(self.getcomment())
elif self.field[self.pos] in self.phraseends:
break
else:
plist.append(self.getatom(self.phraseends))
return plist
class AddressList(AddrlistClass):
"""An AddressList encapsulates a list of parsed RFC 2822 addresses."""
def __init__(self, field):
AddrlistClass.__init__(self, field)
if field:
self.addresslist = self.getaddrlist()
else:
self.addresslist = []
def __len__(self):
return len(self.addresslist)
def __add__(self, other):
# Set union
newaddr = AddressList(None)
newaddr.addresslist = self.addresslist[:]
for x in other.addresslist:
if not x in self.addresslist:
newaddr.addresslist.append(x)
return newaddr
def __iadd__(self, other):
# Set union, in-place
for x in other.addresslist:
if not x in self.addresslist:
self.addresslist.append(x)
return self
def __sub__(self, other):
# Set difference
newaddr = AddressList(None)
for x in self.addresslist:
if not x in other.addresslist:
newaddr.addresslist.append(x)
return newaddr
def __isub__(self, other):
# Set difference, in-place
for x in other.addresslist:
if x in self.addresslist:
self.addresslist.remove(x)
return self
def __getitem__(self, index):
# Make indexing, slices, and 'in' work
return self.addresslist[index]
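# Illustrative sketch (not part of the original module; hypothetical helper):
# AddressList parses a whole field into (display-name, addr-spec) pairs,
# and the set-style operators work on those pairs.
def _demo_addresslist():
    al = AddressList('Ann Person <ann@example.com>, bob@example.com (Bob)')
    # al.addresslist == [('Ann Person', 'ann@example.com'),
    #                    ('Bob', 'bob@example.com')]
    return (al - AddressList('bob@example.com (Bob)')).addresslist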
"""Policy framework for the email package.
Allows fine-grained feature control of how the package parses and emits data.
"""
import abc
from email import header
from email import charset as _charset
from email.utils import _has_surrogates
__all__ = [
'Policy',
'Compat32',
'compat32',
]
class _PolicyBase:
"""Policy Object basic framework.
This class is useless unless subclassed. A subclass should define
class attributes with defaults for any values that are to be
managed by the Policy object. The constructor will then allow
non-default values to be set for these attributes at instance
creation time. The instance will be callable, taking these same
    attributes as keyword arguments, and returning a new instance
identical to the called instance except for those values changed
by the keyword arguments. Instances may be added, yielding new
instances with any non-default values from the right hand
operand overriding those in the left hand operand. That is,
A + B == A(<non-default values of B>)
The repr of an instance can be used to reconstruct the object
if and only if the repr of the values can be used to reconstruct
those values.
"""
def __init__(self, **kw):
"""Create new Policy, possibly overriding some defaults.
See class docstring for a list of overridable attributes.
"""
for name, value in kw.items():
if hasattr(self, name):
super(_PolicyBase,self).__setattr__(name, value)
else:
raise TypeError(
"{!r} is an invalid keyword argument for {}".format(
name, self.__class__.__name__))
def __repr__(self):
args = [ "{}={!r}".format(name, value)
for name, value in self.__dict__.items() ]
return "{}({})".format(self.__class__.__name__, ', '.join(args))
def clone(self, **kw):
"""Return a new instance with specified attributes changed.
The new instance has the same attribute values as the current object,
except for the changes passed in as keyword arguments.
"""
newpolicy = self.__class__.__new__(self.__class__)
for attr, value in self.__dict__.items():
object.__setattr__(newpolicy, attr, value)
for attr, value in kw.items():
if not hasattr(self, attr):
raise TypeError(
"{!r} is an invalid keyword argument for {}".format(
attr, self.__class__.__name__))
object.__setattr__(newpolicy, attr, value)
return newpolicy
def __setattr__(self, name, value):
if hasattr(self, name):
msg = "{!r} object attribute {!r} is read-only"
else:
msg = "{!r} object has no attribute {!r}"
raise AttributeError(msg.format(self.__class__.__name__, name))
def __add__(self, other):
"""Non-default values from right operand override those from left.
The object returned is a new instance of the subclass.
"""
return self.clone(**other.__dict__)
def _append_doc(doc, added_doc):
doc = doc.rsplit('\n', 1)[0]
added_doc = added_doc.split('\n', 1)[1]
return doc + '\n' + added_doc
def _extend_docstrings(cls):
if cls.__doc__ and cls.__doc__.startswith('+'):
cls.__doc__ = _append_doc(cls.__bases__[0].__doc__, cls.__doc__)
for name, attr in cls.__dict__.items():
if attr.__doc__ and attr.__doc__.startswith('+'):
for c in (c for base in cls.__bases__ for c in base.mro()):
doc = getattr(getattr(c, name), '__doc__')
if doc:
attr.__doc__ = _append_doc(doc, attr.__doc__)
break
return cls
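# Illustrative sketch (not part of the original module) of the '+'
# convention implemented above: a '+'-prefixed docstring is replaced by
# the parent's docstring with the subclass text appended after it.
def _extend_docstrings_demo():
    class Base:
        """Base summary.
        """
    @_extend_docstrings
    class Child(Base):
        """+
        Child-specific detail."""
    assert Child.__doc__.startswith('Base summary.')
    assert Child.__doc__.rstrip().endswith('Child-specific detail.')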
class Policy(_PolicyBase, metaclass=abc.ABCMeta):
r"""Controls for how messages are interpreted and formatted.
Most of the classes and many of the methods in the email package accept
Policy objects as parameters. A Policy object contains a set of values and
functions that control how input is interpreted and how output is rendered.
For example, the parameter 'raise_on_defect' controls whether an RFC
violation results in an error being raised, while 'max_line_length'
controls the maximum length of output lines when a Message is serialized.
Any valid attribute may be overridden when a Policy is created by passing
it as a keyword argument to the constructor. Policy objects are immutable,
but a new Policy object can be created with only certain values changed by
calling the Policy instance with keyword arguments. Policy objects can
also be added, producing a new Policy object in which the non-default
attributes set in the right hand operand overwrite those specified in the
left operand.
Settable attributes:
raise_on_defect -- If true, then defects should be raised as errors.
Default: False.
linesep -- string containing the value to use as separation
between output lines. Default '\n'.
cte_type -- Type of allowed content transfer encodings
7bit -- ASCII only
8bit -- Content-Transfer-Encoding: 8bit is allowed
Default: 8bit. Also controls the disposition of
(RFC invalid) binary data in headers; see the
documentation of the binary_fold method.
max_line_length -- maximum length of lines, excluding 'linesep',
during serialization. None or 0 means no line
wrapping is done. Default is 78.
"""
raise_on_defect = False
linesep = '\n'
cte_type = '8bit'
max_line_length = 78
def handle_defect(self, obj, defect):
"""Based on policy, either raise defect or call register_defect.
handle_defect(obj, defect)
defect should be a Defect subclass, but in any case must be an
Exception subclass. obj is the object on which the defect should be
registered if it is not raised. If raise_on_defect is True, the
defect is raised as an error, otherwise the object and the defect are
passed to register_defect.
This method is intended to be called by parsers that discover defects.
The email package parsers always call it with Defect instances.
"""
if self.raise_on_defect:
raise defect
self.register_defect(obj, defect)
def register_defect(self, obj, defect):
"""Record 'defect' on 'obj'.
Called by handle_defect if raise_on_defect is False. This method is
part of the Policy API so that Policy subclasses can implement custom
defect handling. The default implementation calls the append method of
the defects attribute of obj. The objects used by the email package by
default that get passed to this method will always have a defects
attribute with an append method.
"""
obj.defects.append(defect)
def header_max_count(self, name):
"""Return the maximum allowed number of headers named 'name'.
Called when a header is added to a Message object. If the returned
value is not 0 or None, and the number of headers already present with
the name 'name' equals the returned value, a ValueError is raised.
Because the default behavior of Message's __setitem__ is to append the
value to the list of headers, it is easy to create duplicate headers
without realizing it. This method allows certain headers to be limited
in the number of instances of that header that may be added to a
Message programmatically. (The limit is not observed by the parser,
which will faithfully produce as many headers as exist in the message
being parsed.)
The default implementation returns None for all header names.
"""
return None
@abc.abstractmethod
def header_source_parse(self, sourcelines):
"""Given a list of linesep terminated strings constituting the lines of
a single header, return the (name, value) tuple that should be stored
in the model. The input lines should retain their terminating linesep
characters. The lines passed in by the email package may contain
surrogateescaped binary data.
"""
raise NotImplementedError
@abc.abstractmethod
def header_store_parse(self, name, value):
"""Given the header name and the value provided by the application
program, return the (name, value) that should be stored in the model.
"""
raise NotImplementedError
@abc.abstractmethod
def header_fetch_parse(self, name, value):
"""Given the header name and the value from the model, return the value
to be returned to the application program that is requesting that
header. The value passed in by the email package may contain
surrogateescaped binary data if the lines were parsed by a BytesParser.
The returned value should not contain any surrogateescaped data.
"""
raise NotImplementedError
@abc.abstractmethod
def fold(self, name, value):
"""Given the header name and the value from the model, return a string
containing linesep characters that implement the folding of the header
according to the policy controls. The value passed in by the email
package may contain surrogateescaped binary data if the lines were
parsed by a BytesParser. The returned value should not contain any
surrogateescaped data.
"""
raise NotImplementedError
@abc.abstractmethod
def fold_binary(self, name, value):
"""Given the header name and the value from the model, return binary
data containing linesep characters that implement the folding of the
header according to the policy controls. The value passed in by the
email package may contain surrogateescaped binary data.
"""
raise NotImplementedError
@_extend_docstrings
class Compat32(Policy):
"""+
This particular policy is the backward compatibility Policy. It
replicates the behavior of the email package version 5.1.
"""
def _sanitize_header(self, name, value):
# If the header value contains surrogates, return a Header using
# the unknown-8bit charset to encode the bytes as encoded words.
if not isinstance(value, str):
# Assume it is already a header object
return value
if _has_surrogates(value):
return header.Header(value, charset=_charset.UNKNOWN8BIT,
header_name=name)
else:
return value
def header_source_parse(self, sourcelines):
"""+
The name is parsed as everything up to the ':' and returned unmodified.
The value is determined by stripping leading whitespace off the
remainder of the first line, joining all subsequent lines together, and
stripping any trailing carriage return or linefeed characters.
"""
name, value = sourcelines[0].split(':', 1)
value = value.lstrip(' \t') + ''.join(sourcelines[1:])
return (name, value.rstrip('\r\n'))
def header_store_parse(self, name, value):
"""+
The name and value are returned unmodified.
"""
return (name, value)
def header_fetch_parse(self, name, value):
"""+
If the value contains binary data, it is converted into a Header object
using the unknown-8bit charset. Otherwise it is returned unmodified.
"""
return self._sanitize_header(name, value)
def fold(self, name, value):
"""+
Headers are folded using the Header folding algorithm, which preserves
existing line breaks in the value, and wraps each resulting line to the
max_line_length. Non-ASCII binary data are CTE encoded using the
unknown-8bit charset.
"""
return self._fold(name, value, sanitize=True)
def fold_binary(self, name, value):
"""+
Headers are folded using the Header folding algorithm, which preserves
existing line breaks in the value, and wraps each resulting line to the
max_line_length. If cte_type is 7bit, non-ascii binary data is CTE
encoded using the unknown-8bit charset. Otherwise the original source
header is used, with its existing line breaks and/or binary data.
"""
folded = self._fold(name, value, sanitize=self.cte_type=='7bit')
return folded.encode('ascii', 'surrogateescape')
def _fold(self, name, value, sanitize):
parts = []
parts.append('%s: ' % name)
if isinstance(value, str):
if _has_surrogates(value):
if sanitize:
h = header.Header(value,
charset=_charset.UNKNOWN8BIT,
header_name=name)
else:
# If we have raw 8bit data in a byte string, we have no idea
# what the encoding is. There is no safe way to split this
# string. If it's ascii-subset, then we could do a normal
# ascii split, but if it's multibyte then we could break the
# string. There's no way to know so the least harm seems to
# be to not split the string and risk it being too long.
parts.append(value)
h = None
else:
h = header.Header(value, header_name=name)
else:
# Assume it is a Header-like object.
h = value
if h is not None:
parts.append(h.encode(linesep=self.linesep,
maxlinelen=self.max_line_length))
parts.append(self.linesep)
return ''.join(parts)
compat32 = Compat32()
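# Illustrative sketch (not part of the original module): policies are
# immutable, so variants come from clone() or from '+' composition.
def _policy_demo():
    strict = compat32.clone(raise_on_defect=True)
    assert compat32.raise_on_defect is False    # original unchanged
    assert strict.raise_on_defect is True
    assert (compat32 + strict).raise_on_defect is True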
# Copyright (C) 2001-2007 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""A package for parsing, handling, and generating email messages."""
__version__ = '5.1.0'
__all__ = [
'base64mime',
'charset',
'encoders',
'errors',
'feedparser',
'generator',
'header',
'iterators',
'message',
'message_from_file',
'message_from_binary_file',
'message_from_string',
'message_from_bytes',
'mime',
'parser',
'quoprimime',
'utils',
]
# Some convenience routines. Don't import Parser and Message as side-effects
# of importing email since those cascadingly import most of the rest of the
# email package.
def message_from_string(s, *args, **kws):
"""Parse a string into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import Parser
return Parser(*args, **kws).parsestr(s)
def message_from_bytes(s, *args, **kws):
"""Parse a bytes string into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import BytesParser
return BytesParser(*args, **kws).parsebytes(s)
def message_from_file(fp, *args, **kws):
"""Read a file and parse its contents into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import Parser
return Parser(*args, **kws).parse(fp)
def message_from_binary_file(fp, *args, **kws):
"""Read a binary file and parse its contents into a Message object model.
Optional _class and strict are passed to the Parser constructor.
"""
from email.parser import BytesParser
return BytesParser(*args, **kws).parse(fp)
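# Illustrative sketch (not part of the original module) of the
# convenience parsers above, using a hypothetical two-line message.
def _message_from_string_demo():
    msg = message_from_string('Subject: hi\n\nbody\n')
    assert msg['Subject'] == 'hi'
    assert msg.get_payload() == 'body\n'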
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Keith Dart
# Contact: [email protected]
"""Class representing application/* type MIME documents."""
__all__ = ["MIMEApplication"]
from email import encoders
from email.mime.nonmultipart import MIMENonMultipart
class MIMEApplication(MIMENonMultipart):
"""Class for generating application/* MIME documents."""
def __init__(self, _data, _subtype='octet-stream',
_encoder=encoders.encode_base64, **_params):
"""Create an application/* type MIME document.
_data is a string containing the raw application data.
_subtype is the MIME content type subtype, defaulting to
'octet-stream'.
_encoder is a function which will perform the actual encoding for
transport of the application data, defaulting to base64 encoding.
Any additional keyword arguments are passed to the base class
constructor, which turns them into parameters on the Content-Type
header.
"""
if _subtype is None:
raise TypeError('Invalid application MIME subtype')
MIMENonMultipart.__init__(self, 'application', _subtype, **_params)
self.set_payload(_data)
_encoder(self)
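# Illustrative sketch (not part of the original module): the default
# base64 encoder also sets the Content-Transfer-Encoding header.
def _mimeapplication_demo():
    part = MIMEApplication(b'\x00\x01\x02')
    assert part.get_content_type() == 'application/octet-stream'
    assert part['Content-Transfer-Encoding'] == 'base64'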
# Copyright (C) 2001-2007 Python Software Foundation
# Author: Anthony Baxter
# Contact: [email protected]
"""Class representing audio/* type MIME documents."""
__all__ = ['MIMEAudio']
import sndhdr
from io import BytesIO
from email import encoders
from email.mime.nonmultipart import MIMENonMultipart
_sndhdr_MIMEmap = {'au' : 'basic',
'wav' :'x-wav',
'aiff':'x-aiff',
'aifc':'x-aiff',
}
# There are others in sndhdr that don't have MIME types. :(
# Additional ones to be added to sndhdr? midi, mp3, realaudio, wma??
def _whatsnd(data):
"""Try to identify a sound file type.
sndhdr.what() has a pretty cruddy interface, unfortunately. This is why
we redo it here. It would be easier to reverse-engineer the Unix 'file'
command and use the standard 'magic' file, as shipped with a modern Unix.
"""
hdr = data[:512]
fakefile = BytesIO(hdr)
for testfn in sndhdr.tests:
res = testfn(hdr, fakefile)
if res is not None:
return _sndhdr_MIMEmap.get(res[0])
return None
class MIMEAudio(MIMENonMultipart):
"""Class for generating audio/* MIME documents."""
def __init__(self, _audiodata, _subtype=None,
_encoder=encoders.encode_base64, **_params):
"""Create an audio/* type MIME document.
_audiodata is a string containing the raw audio data. If this data
can be decoded by the standard Python `sndhdr' module, then the
subtype will be automatically included in the Content-Type header.
Otherwise, you can specify the specific audio subtype via the
_subtype parameter. If _subtype is not given, and no subtype can be
guessed, a TypeError is raised.
_encoder is a function which will perform the actual encoding for
transport of the audio data. It takes one argument, which is this
MIMEAudio instance. It should use get_payload() and set_payload() to
change the payload to the encoded form. It should also add any
Content-Transfer-Encoding or other headers to the message as
necessary. The default encoding is Base64.
Any additional keyword arguments are passed to the base class
constructor, which turns them into parameters on the Content-Type
header.
"""
if _subtype is None:
_subtype = _whatsnd(_audiodata)
if _subtype is None:
raise TypeError('Could not find audio MIME subtype')
MIMENonMultipart.__init__(self, 'audio', _subtype, **_params)
self.set_payload(_audiodata)
_encoder(self)
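# Illustrative sketch (not part of the original module): passing an
# explicit _subtype skips the sndhdr-based detection above; the payload
# bytes here are hypothetical.
def _mimeaudio_demo():
    part = MIMEAudio(b'\xff\xfb\x90\x00', _subtype='mpeg')
    assert part.get_content_type() == 'audio/mpeg'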
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Base class for MIME specializations."""
__all__ = ['MIMEBase']
from email import message
class MIMEBase(message.Message):
"""Base class for MIME specializations."""
def __init__(self, _maintype, _subtype, **_params):
"""This constructor adds a Content-Type: and a MIME-Version: header.
The Content-Type: header is taken from the _maintype and _subtype
arguments. Additional parameters for this header are taken from the
keyword arguments.
"""
message.Message.__init__(self)
ctype = '%s/%s' % (_maintype, _subtype)
self.add_header('Content-Type', ctype, **_params)
self['MIME-Version'] = '1.0'
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Class representing image/* type MIME documents."""
__all__ = ['MIMEImage']
import imghdr
from email import encoders
from email.mime.nonmultipart import MIMENonMultipart
class MIMEImage(MIMENonMultipart):
"""Class for generating image/* type MIME documents."""
def __init__(self, _imagedata, _subtype=None,
_encoder=encoders.encode_base64, **_params):
"""Create an image/* type MIME document.
_imagedata is a string containing the raw image data. If this data
can be decoded by the standard Python `imghdr' module, then the
subtype will be automatically included in the Content-Type header.
Otherwise, you can specify the specific image subtype via the _subtype
parameter.
_encoder is a function which will perform the actual encoding for
transport of the image data. It takes one argument, which is this
MIMEImage instance. It should use get_payload() and set_payload() to
change the payload to the encoded form. It should also add any
Content-Transfer-Encoding or other headers to the message as
necessary. The default encoding is Base64.
Any additional keyword arguments are passed to the base class
constructor, which turns them into parameters on the Content-Type
header.
"""
if _subtype is None:
_subtype = imghdr.what(None, _imagedata)
if _subtype is None:
raise TypeError('Could not guess image MIME subtype')
MIMENonMultipart.__init__(self, 'image', _subtype, **_params)
self.set_payload(_imagedata)
_encoder(self)
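# Illustrative sketch (not part of the original module): an eight-byte
# PNG signature is enough for imghdr.what() to detect the subtype.
def _mimeimage_demo():
    part = MIMEImage(b'\x89PNG\r\n\x1a\n')
    assert part.get_content_type() == 'image/png'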
# Copyright (C) 2002-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Base class for MIME multipart/* type messages."""
__all__ = ['MIMEMultipart']
from email.mime.base import MIMEBase
class MIMEMultipart(MIMEBase):
"""Base class for MIME multipart/* type messages."""
def __init__(self, _subtype='mixed', boundary=None, _subparts=None,
**_params):
"""Creates a multipart/* type message.
By default, creates a multipart/mixed message, with proper
Content-Type and MIME-Version headers.
_subtype is the subtype of the multipart content type, defaulting to
`mixed'.
boundary is the multipart boundary string. By default it is
calculated as needed.
_subparts is a sequence of initial subparts for the payload. It
must be an iterable object, such as a list. You can always
attach new subparts to the message by using the attach() method.
Additional parameters for the Content-Type header are taken from the
keyword arguments (or passed into the _params argument).
"""
MIMEBase.__init__(self, 'multipart', _subtype, **_params)
# Initialise _payload to an empty list as the Message superclass's
# implementation of is_multipart assumes that _payload is a list for
# multipart messages.
self._payload = []
if _subparts:
for p in _subparts:
self.attach(p)
if boundary:
self.set_boundary(boundary)
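# Illustrative sketch (not part of the original module): subparts can be
# supplied up front via _subparts or attached afterwards.
def _mimemultipart_demo():
    from email.mime.text import MIMEText
    outer = MIMEMultipart(_subparts=[MIMEText('hello')])
    outer.attach(MIMEText('world'))
    assert outer.is_multipart()
    assert len(outer.get_payload()) == 2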
# Copyright (C) 2002-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Base class for MIME type messages that are not multipart."""
__all__ = ['MIMENonMultipart']
from email import errors
from email.mime.base import MIMEBase
class MIMENonMultipart(MIMEBase):
"""Base class for MIME non-multipart type messages."""
def attach(self, payload):
# The public API prohibits attaching multiple subparts to MIMEBase
# derived subtypes since none of them are, by definition, of content
# type multipart/*
raise errors.MultipartConversionError(
'Cannot attach additional subparts to non-multipart/*')
# Copyright (C) 2001-2006 Python Software Foundation
# Author: Barry Warsaw
# Contact: [email protected]
"""Class representing text/* type MIME documents."""
__all__ = ['MIMEText']
from email.mime.nonmultipart import MIMENonMultipart
class MIMEText(MIMENonMultipart):
"""Class for generating text/* type MIME documents."""
def __init__(self, _text, _subtype='plain', _charset=None):
"""Create a text/* type MIME document.
_text is the string for this message object.
_subtype is the MIME sub content type, defaulting to "plain".
_charset is the character set parameter added to the Content-Type
header. If not given, it defaults to "us-ascii" when the text is pure
ASCII and to "utf-8" otherwise. Note that as a side-effect, the
Content-Transfer-Encoding header will also be set.
"""
# If no _charset was specified, check to see if there are non-ascii
# characters present. If not, use 'us-ascii', otherwise use utf-8.
# XXX: This can be removed once #7304 is fixed.
if _charset is None:
try:
_text.encode('us-ascii')
_charset = 'us-ascii'
except UnicodeEncodeError:
_charset = 'utf-8'
MIMENonMultipart.__init__(self, 'text', _subtype,
**{'charset': _charset})
self.set_payload(_text, _charset)
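# Illustrative sketch (not part of the original module): the charset
# fallback coded above picks us-ascii for pure-ASCII text and utf-8
# otherwise (Charset objects compare equal to their string names).
def _mimetext_demo():
    assert MIMEText('plain text').get_charset() == 'us-ascii'
    assert MIMEText('caf\xe9').get_charset() == 'utf-8'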
""" Encoding Aliases Support
This module is used by the encodings package search function to
map encodings names to module names.
Note that the search function normalizes the encoding names before
doing the lookup, so the mapping will have to map normalized
encoding names to module names.
Contents:
The following aliases dictionary contains mappings of all IANA
character set names for which the Python core library provides
codecs. In addition to these, a few Python specific codec
aliases have also been added.
"""
aliases = {
# Please keep this list sorted alphabetically by value!
# ascii codec
'646' : 'ascii',
'ansi_x3.4_1968' : 'ascii',
'ansi_x3_4_1968' : 'ascii', # some email headers use this non-standard name
'ansi_x3.4_1986' : 'ascii',
'cp367' : 'ascii',
'csascii' : 'ascii',
'ibm367' : 'ascii',
'iso646_us' : 'ascii',
'iso_646.irv_1991' : 'ascii',
'iso_ir_6' : 'ascii',
'us' : 'ascii',
'us_ascii' : 'ascii',
# base64_codec codec
'base64' : 'base64_codec',
'base_64' : 'base64_codec',
# big5 codec
'big5_tw' : 'big5',
'csbig5' : 'big5',
# big5hkscs codec
'big5_hkscs' : 'big5hkscs',
'hkscs' : 'big5hkscs',
# bz2_codec codec
'bz2' : 'bz2_codec',
# cp037 codec
'037' : 'cp037',
'csibm037' : 'cp037',
'ebcdic_cp_ca' : 'cp037',
'ebcdic_cp_nl' : 'cp037',
'ebcdic_cp_us' : 'cp037',
'ebcdic_cp_wt' : 'cp037',
'ibm037' : 'cp037',
'ibm039' : 'cp037',
# cp1026 codec
'1026' : 'cp1026',
'csibm1026' : 'cp1026',
'ibm1026' : 'cp1026',
# cp1125 codec
'1125' : 'cp1125',
'ibm1125' : 'cp1125',
'cp866u' : 'cp1125',
'ruscii' : 'cp1125',
# cp1140 codec
'1140' : 'cp1140',
'ibm1140' : 'cp1140',
# cp1250 codec
'1250' : 'cp1250',
'windows_1250' : 'cp1250',
# cp1251 codec
'1251' : 'cp1251',
'windows_1251' : 'cp1251',
# cp1252 codec
'1252' : 'cp1252',
'windows_1252' : 'cp1252',
# cp1253 codec
'1253' : 'cp1253',
'windows_1253' : 'cp1253',
# cp1254 codec
'1254' : 'cp1254',
'windows_1254' : 'cp1254',
# cp1255 codec
'1255' : 'cp1255',
'windows_1255' : 'cp1255',
# cp1256 codec
'1256' : 'cp1256',
'windows_1256' : 'cp1256',
# cp1257 codec
'1257' : 'cp1257',
'windows_1257' : 'cp1257',
# cp1258 codec
'1258' : 'cp1258',
'windows_1258' : 'cp1258',
# cp273 codec
'273' : 'cp273',
'ibm273' : 'cp273',
'csibm273' : 'cp273',
# cp424 codec
'424' : 'cp424',
'csibm424' : 'cp424',
'ebcdic_cp_he' : 'cp424',
'ibm424' : 'cp424',
# cp437 codec
'437' : 'cp437',
'cspc8codepage437' : 'cp437',
'ibm437' : 'cp437',
# cp500 codec
'500' : 'cp500',
'csibm500' : 'cp500',
'ebcdic_cp_be' : 'cp500',
'ebcdic_cp_ch' : 'cp500',
'ibm500' : 'cp500',
# cp775 codec
'775' : 'cp775',
'cspc775baltic' : 'cp775',
'ibm775' : 'cp775',
# cp850 codec
'850' : 'cp850',
'cspc850multilingual' : 'cp850',
'ibm850' : 'cp850',
# cp852 codec
'852' : 'cp852',
'cspcp852' : 'cp852',
'ibm852' : 'cp852',
# cp855 codec
'855' : 'cp855',
'csibm855' : 'cp855',
'ibm855' : 'cp855',
# cp857 codec
'857' : 'cp857',
'csibm857' : 'cp857',
'ibm857' : 'cp857',
# cp858 codec
'858' : 'cp858',
'csibm858' : 'cp858',
'ibm858' : 'cp858',
# cp860 codec
'860' : 'cp860',
'csibm860' : 'cp860',
'ibm860' : 'cp860',
# cp861 codec
'861' : 'cp861',
'cp_is' : 'cp861',
'csibm861' : 'cp861',
'ibm861' : 'cp861',
# cp862 codec
'862' : 'cp862',
'cspc862latinhebrew' : 'cp862',
'ibm862' : 'cp862',
# cp863 codec
'863' : 'cp863',
'csibm863' : 'cp863',
'ibm863' : 'cp863',
# cp864 codec
'864' : 'cp864',
'csibm864' : 'cp864',
'ibm864' : 'cp864',
# cp865 codec
'865' : 'cp865',
'csibm865' : 'cp865',
'ibm865' : 'cp865',
# cp866 codec
'866' : 'cp866',
'csibm866' : 'cp866',
'ibm866' : 'cp866',
# cp869 codec
'869' : 'cp869',
'cp_gr' : 'cp869',
'csibm869' : 'cp869',
'ibm869' : 'cp869',
# cp932 codec
'932' : 'cp932',
'ms932' : 'cp932',
'mskanji' : 'cp932',
'ms_kanji' : 'cp932',
# cp949 codec
'949' : 'cp949',
'ms949' : 'cp949',
'uhc' : 'cp949',
# cp950 codec
'950' : 'cp950',
'ms950' : 'cp950',
# euc_jis_2004 codec
'jisx0213' : 'euc_jis_2004',
'eucjis2004' : 'euc_jis_2004',
'euc_jis2004' : 'euc_jis_2004',
# euc_jisx0213 codec
'eucjisx0213' : 'euc_jisx0213',
# euc_jp codec
'eucjp' : 'euc_jp',
'ujis' : 'euc_jp',
'u_jis' : 'euc_jp',
# euc_kr codec
'euckr' : 'euc_kr',
'korean' : 'euc_kr',
'ksc5601' : 'euc_kr',
'ks_c_5601' : 'euc_kr',
'ks_c_5601_1987' : 'euc_kr',
'ksx1001' : 'euc_kr',
'ks_x_1001' : 'euc_kr',
# gb18030 codec
'gb18030_2000' : 'gb18030',
# gb2312 codec
'chinese' : 'gb2312',
'csiso58gb231280' : 'gb2312',
'euc_cn' : 'gb2312',
'euccn' : 'gb2312',
'eucgb2312_cn' : 'gb2312',
'gb2312_1980' : 'gb2312',
'gb2312_80' : 'gb2312',
'iso_ir_58' : 'gb2312',
# gbk codec
'936' : 'gbk',
'cp936' : 'gbk',
'ms936' : 'gbk',
# hex_codec codec
'hex' : 'hex_codec',
# hp_roman8 codec
'roman8' : 'hp_roman8',
'r8' : 'hp_roman8',
'csHPRoman8' : 'hp_roman8',
# hz codec
'hzgb' : 'hz',
'hz_gb' : 'hz',
'hz_gb_2312' : 'hz',
# iso2022_jp codec
'csiso2022jp' : 'iso2022_jp',
'iso2022jp' : 'iso2022_jp',
'iso_2022_jp' : 'iso2022_jp',
# iso2022_jp_1 codec
'iso2022jp_1' : 'iso2022_jp_1',
'iso_2022_jp_1' : 'iso2022_jp_1',
# iso2022_jp_2 codec
'iso2022jp_2' : 'iso2022_jp_2',
'iso_2022_jp_2' : 'iso2022_jp_2',
# iso2022_jp_2004 codec
'iso_2022_jp_2004' : 'iso2022_jp_2004',
'iso2022jp_2004' : 'iso2022_jp_2004',
# iso2022_jp_3 codec
'iso2022jp_3' : 'iso2022_jp_3',
'iso_2022_jp_3' : 'iso2022_jp_3',
# iso2022_jp_ext codec
'iso2022jp_ext' : 'iso2022_jp_ext',
'iso_2022_jp_ext' : 'iso2022_jp_ext',
# iso2022_kr codec
'csiso2022kr' : 'iso2022_kr',
'iso2022kr' : 'iso2022_kr',
'iso_2022_kr' : 'iso2022_kr',
# iso8859_10 codec
'csisolatin6' : 'iso8859_10',
'iso_8859_10' : 'iso8859_10',
'iso_8859_10_1992' : 'iso8859_10',
'iso_ir_157' : 'iso8859_10',
'l6' : 'iso8859_10',
'latin6' : 'iso8859_10',
# iso8859_11 codec
'thai' : 'iso8859_11',
'iso_8859_11' : 'iso8859_11',
'iso_8859_11_2001' : 'iso8859_11',
# iso8859_13 codec
'iso_8859_13' : 'iso8859_13',
'l7' : 'iso8859_13',
'latin7' : 'iso8859_13',
# iso8859_14 codec
'iso_8859_14' : 'iso8859_14',
'iso_8859_14_1998' : 'iso8859_14',
'iso_celtic' : 'iso8859_14',
'iso_ir_199' : 'iso8859_14',
'l8' : 'iso8859_14',
'latin8' : 'iso8859_14',
# iso8859_15 codec
'iso_8859_15' : 'iso8859_15',
'l9' : 'iso8859_15',
'latin9' : 'iso8859_15',
# iso8859_16 codec
'iso_8859_16' : 'iso8859_16',
'iso_8859_16_2001' : 'iso8859_16',
'iso_ir_226' : 'iso8859_16',
'l10' : 'iso8859_16',
'latin10' : 'iso8859_16',
# iso8859_2 codec
'csisolatin2' : 'iso8859_2',
'iso_8859_2' : 'iso8859_2',
'iso_8859_2_1987' : 'iso8859_2',
'iso_ir_101' : 'iso8859_2',
'l2' : 'iso8859_2',
'latin2' : 'iso8859_2',
# iso8859_3 codec
'csisolatin3' : 'iso8859_3',
'iso_8859_3' : 'iso8859_3',
'iso_8859_3_1988' : 'iso8859_3',
'iso_ir_109' : 'iso8859_3',
'l3' : 'iso8859_3',
'latin3' : 'iso8859_3',
# iso8859_4 codec
'csisolatin4' : 'iso8859_4',
'iso_8859_4' : 'iso8859_4',
'iso_8859_4_1988' : 'iso8859_4',
'iso_ir_110' : 'iso8859_4',
'l4' : 'iso8859_4',
'latin4' : 'iso8859_4',
# iso8859_5 codec
'csisolatincyrillic' : 'iso8859_5',
'cyrillic' : 'iso8859_5',
'iso_8859_5' : 'iso8859_5',
'iso_8859_5_1988' : 'iso8859_5',
'iso_ir_144' : 'iso8859_5',
# iso8859_6 codec
'arabic' : 'iso8859_6',
'asmo_708' : 'iso8859_6',
'csisolatinarabic' : 'iso8859_6',
'ecma_114' : 'iso8859_6',
'iso_8859_6' : 'iso8859_6',
'iso_8859_6_1987' : 'iso8859_6',
'iso_ir_127' : 'iso8859_6',
# iso8859_7 codec
'csisolatingreek' : 'iso8859_7',
'ecma_118' : 'iso8859_7',
'elot_928' : 'iso8859_7',
'greek' : 'iso8859_7',
'greek8' : 'iso8859_7',
'iso_8859_7' : 'iso8859_7',
'iso_8859_7_1987' : 'iso8859_7',
'iso_ir_126' : 'iso8859_7',
# iso8859_8 codec
'csisolatinhebrew' : 'iso8859_8',
'hebrew' : 'iso8859_8',
'iso_8859_8' : 'iso8859_8',
'iso_8859_8_1988' : 'iso8859_8',
'iso_ir_138' : 'iso8859_8',
# iso8859_9 codec
'csisolatin5' : 'iso8859_9',
'iso_8859_9' : 'iso8859_9',
'iso_8859_9_1989' : 'iso8859_9',
'iso_ir_148' : 'iso8859_9',
'l5' : 'iso8859_9',
'latin5' : 'iso8859_9',
# johab codec
'cp1361' : 'johab',
'ms1361' : 'johab',
# koi8_r codec
'cskoi8r' : 'koi8_r',
# latin_1 codec
#
# Note that the latin_1 codec is implemented internally in C and a
# lot faster than the charmap codec iso8859_1 which uses the same
# encoding. This is why we discourage the use of the iso8859_1
# codec and alias it to latin_1 instead.
#
'8859' : 'latin_1',
'cp819' : 'latin_1',
'csisolatin1' : 'latin_1',
'ibm819' : 'latin_1',
'iso8859' : 'latin_1',
'iso8859_1' : 'latin_1',
'iso_8859_1' : 'latin_1',
'iso_8859_1_1987' : 'latin_1',
'iso_ir_100' : 'latin_1',
'l1' : 'latin_1',
'latin' : 'latin_1',
'latin1' : 'latin_1',
# mac_cyrillic codec
'maccyrillic' : 'mac_cyrillic',
# mac_greek codec
'macgreek' : 'mac_greek',
# mac_iceland codec
'maciceland' : 'mac_iceland',
# mac_latin2 codec
'maccentraleurope' : 'mac_latin2',
'maclatin2' : 'mac_latin2',
# mac_roman codec
'macintosh' : 'mac_roman',
'macroman' : 'mac_roman',
# mac_turkish codec
'macturkish' : 'mac_turkish',
# mbcs codec
'dbcs' : 'mbcs',
# ptcp154 codec
'csptcp154' : 'ptcp154',
'pt154' : 'ptcp154',
'cp154' : 'ptcp154',
'cyrillic_asian' : 'ptcp154',
# quopri_codec codec
'quopri' : 'quopri_codec',
'quoted_printable' : 'quopri_codec',
'quotedprintable' : 'quopri_codec',
# rot_13 codec
'rot13' : 'rot_13',
# shift_jis codec
'csshiftjis' : 'shift_jis',
'shiftjis' : 'shift_jis',
'sjis' : 'shift_jis',
's_jis' : 'shift_jis',
# shift_jis_2004 codec
'shiftjis2004' : 'shift_jis_2004',
'sjis_2004' : 'shift_jis_2004',
's_jis_2004' : 'shift_jis_2004',
# shift_jisx0213 codec
'shiftjisx0213' : 'shift_jisx0213',
'sjisx0213' : 'shift_jisx0213',
's_jisx0213' : 'shift_jisx0213',
# tactis codec
'tis260' : 'tactis',
# tis_620 codec
'tis620' : 'tis_620',
'tis_620_0' : 'tis_620',
'tis_620_2529_0' : 'tis_620',
'tis_620_2529_1' : 'tis_620',
'iso_ir_166' : 'tis_620',
# utf_16 codec
'u16' : 'utf_16',
'utf16' : 'utf_16',
# utf_16_be codec
'unicodebigunmarked' : 'utf_16_be',
'utf_16be' : 'utf_16_be',
# utf_16_le codec
'unicodelittleunmarked' : 'utf_16_le',
'utf_16le' : 'utf_16_le',
# utf_32 codec
'u32' : 'utf_32',
'utf32' : 'utf_32',
# utf_32_be codec
'utf_32be' : 'utf_32_be',
# utf_32_le codec
'utf_32le' : 'utf_32_le',
# utf_7 codec
'u7' : 'utf_7',
'utf7' : 'utf_7',
'unicode_1_1_utf_7' : 'utf_7',
# utf_8 codec
'u8' : 'utf_8',
'utf' : 'utf_8',
'utf8' : 'utf_8',
'utf8_ucs2' : 'utf_8',
'utf8_ucs4' : 'utf_8',
# uu_codec codec
'uu' : 'uu_codec',
# zlib_codec codec
'zip' : 'zlib_codec',
'zlib' : 'zlib_codec',
# temporary mac CJK aliases, will be replaced by proper codecs in 3.1
'x_mac_japanese' : 'shift_jis',
'x_mac_korean' : 'euc_kr',
'x_mac_simp_chinese' : 'gb2312',
'x_mac_trad_chinese' : 'big5',
}
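# Illustrative sketch (not part of the original module): the search
# function lowercases names and collapses runs of non-alphanumerics to
# '_' before consulting this table, so spellings such as 'ISO-8859-1'
# normalize to the 'iso_8859_1' key above.
def _aliases_demo():
    assert aliases['iso_8859_1'] == 'latin_1'
    assert aliases['u8'] == 'utf_8'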
""" Python 'ascii' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.ascii_encode
decode = codecs.ascii_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.ascii_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.ascii_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
class StreamConverter(StreamWriter,StreamReader):
encode = codecs.ascii_decode
decode = codecs.ascii_encode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='ascii',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
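# Illustrative sketch (not part of the original module): the incremental
# classes above are stateless wrappers around the C implementations.
def _ascii_demo():
    assert IncrementalEncoder().encode('abc') == b'abc'
    assert IncrementalDecoder().decode(b'abc') == 'abc'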
"""Python 'base64_codec' Codec - base64 content transfer encoding.
This codec de/encodes from bytes to bytes.
Written by Marc-Andre Lemburg ([email protected]).
"""
import codecs
import base64
### Codec APIs
def base64_encode(input, errors='strict'):
assert errors == 'strict'
return (base64.encodebytes(input), len(input))
def base64_decode(input, errors='strict'):
assert errors == 'strict'
return (base64.decodebytes(input), len(input))
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
return base64_encode(input, errors)
def decode(self, input, errors='strict'):
return base64_decode(input, errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
assert self.errors == 'strict'
return base64.encodebytes(input)
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
assert self.errors == 'strict'
return base64.decodebytes(input)
class StreamWriter(Codec, codecs.StreamWriter):
charbuffertype = bytes
class StreamReader(Codec, codecs.StreamReader):
charbuffertype = bytes
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='base64',
encode=base64_encode,
decode=base64_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
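# Illustrative sketch (not part of the original module): a bytes-to-bytes
# round trip through the stateless helpers above.
def _base64_codec_demo():
    encoded, consumed = base64_encode(b'hello')
    assert encoded == b'aGVsbG8=\n' and consumed == 5
    assert base64_decode(encoded)[0] == b'hello'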
#
# big5.py: Python Unicode Codec for BIG5
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_tw, codecs
import _multibytecodec as mbc
codec = _codecs_tw.getcodec('big5')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='big5',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
#
# big5hkscs.py: Python Unicode Codec for BIG5HKSCS
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_hk, codecs
import _multibytecodec as mbc
codec = _codecs_hk.getcodec('big5hkscs')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='big5hkscs',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
"""Python 'bz2_codec' Codec - bz2 compression encoding.
This codec de/encodes from bytes to bytes; it is a binary transform
rather than a text encoding (hence _is_text_encoding=False in the
codec registration below).
Adapted by Raymond Hettinger from zlib_codec.py which was written
by Marc-Andre Lemburg ([email protected]).
"""
import codecs
import bz2 # this codec needs the optional bz2 module!
### Codec APIs
def bz2_encode(input, errors='strict'):
assert errors == 'strict'
return (bz2.compress(input), len(input))
def bz2_decode(input, errors='strict'):
assert errors == 'strict'
return (bz2.decompress(input), len(input))
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
return bz2_encode(input, errors)
def decode(self, input, errors='strict'):
return bz2_decode(input, errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.compressobj = bz2.BZ2Compressor()
def encode(self, input, final=False):
if final:
c = self.compressobj.compress(input)
return c + self.compressobj.flush()
else:
return self.compressobj.compress(input)
def reset(self):
self.compressobj = bz2.BZ2Compressor()
class IncrementalDecoder(codecs.IncrementalDecoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.decompressobj = bz2.BZ2Decompressor()
def decode(self, input, final=False):
try:
return self.decompressobj.decompress(input)
except EOFError:
return b''  # a bytes-to-bytes codec must return bytes, not str
def reset(self):
self.decompressobj = bz2.BZ2Decompressor()
class StreamWriter(Codec, codecs.StreamWriter):
charbuffertype = bytes
class StreamReader(Codec, codecs.StreamReader):
charbuffertype = bytes
### encodings module API
def getregentry():
return codecs.CodecInfo(
name="bz2",
encode=bz2_encode,
decode=bz2_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
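# Illustrative sketch (not part of the original module): highly redundant
# input compresses well through the stateless helpers above.
def _bz2_codec_demo():
    payload = b'x' * 1000
    squeezed, consumed = bz2_encode(payload)
    assert consumed == 1000 and len(squeezed) < len(payload)
    assert bz2_decode(squeezed)[0] == payload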
""" Generic Python Character Mapping Codec.
Use this codec directly rather than through the automatic
conversion mechanisms supplied by unicode() and .encode().
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.charmap_encode
decode = codecs.charmap_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict', mapping=None):
codecs.IncrementalEncoder.__init__(self, errors)
self.mapping = mapping
def encode(self, input, final=False):
return codecs.charmap_encode(input, self.errors, self.mapping)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def __init__(self, errors='strict', mapping=None):
codecs.IncrementalDecoder.__init__(self, errors)
self.mapping = mapping
def decode(self, input, final=False):
return codecs.charmap_decode(input, self.errors, self.mapping)[0]
class StreamWriter(Codec,codecs.StreamWriter):
def __init__(self,stream,errors='strict',mapping=None):
codecs.StreamWriter.__init__(self,stream,errors)
self.mapping = mapping
def encode(self,input,errors='strict'):
return Codec.encode(input,errors,self.mapping)
class StreamReader(Codec,codecs.StreamReader):
def __init__(self,stream,errors='strict',mapping=None):
codecs.StreamReader.__init__(self,stream,errors)
self.mapping = mapping
def decode(self,input,errors='strict'):
return Codec.decode(input,errors,self.mapping)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='charmap',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
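# Illustrative sketch (not part of the original module): the generic codec
# takes an explicit mapping, here a string where byte value N decodes to
# its N-th character.
def _charmap_demo():
    text, consumed = codecs.charmap_decode(b'\x00\x02\x01', 'strict', 'abc')
    assert text == 'acb' and consumed == 3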
""" Python Character Mapping Codec cp037 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp037',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x9c' # 0x04 -> CONTROL
'\t' # 0x05 -> HORIZONTAL TABULATION
'\x86' # 0x06 -> CONTROL
'\x7f' # 0x07 -> DELETE
'\x97' # 0x08 -> CONTROL
'\x8d' # 0x09 -> CONTROL
'\x8e' # 0x0A -> CONTROL
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x9d' # 0x14 -> CONTROL
'\x85' # 0x15 -> CONTROL
'\x08' # 0x16 -> BACKSPACE
'\x87' # 0x17 -> CONTROL
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x92' # 0x1A -> CONTROL
'\x8f' # 0x1B -> CONTROL
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
'\x80' # 0x20 -> CONTROL
'\x81' # 0x21 -> CONTROL
'\x82' # 0x22 -> CONTROL
'\x83' # 0x23 -> CONTROL
'\x84' # 0x24 -> CONTROL
'\n' # 0x25 -> LINE FEED
'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
'\x1b' # 0x27 -> ESCAPE
'\x88' # 0x28 -> CONTROL
'\x89' # 0x29 -> CONTROL
'\x8a' # 0x2A -> CONTROL
'\x8b' # 0x2B -> CONTROL
'\x8c' # 0x2C -> CONTROL
'\x05' # 0x2D -> ENQUIRY
'\x06' # 0x2E -> ACKNOWLEDGE
'\x07' # 0x2F -> BELL
'\x90' # 0x30 -> CONTROL
'\x91' # 0x31 -> CONTROL
'\x16' # 0x32 -> SYNCHRONOUS IDLE
'\x93' # 0x33 -> CONTROL
'\x94' # 0x34 -> CONTROL
'\x95' # 0x35 -> CONTROL
'\x96' # 0x36 -> CONTROL
'\x04' # 0x37 -> END OF TRANSMISSION
'\x98' # 0x38 -> CONTROL
'\x99' # 0x39 -> CONTROL
'\x9a' # 0x3A -> CONTROL
'\x9b' # 0x3B -> CONTROL
'\x14' # 0x3C -> DEVICE CONTROL FOUR
'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
'\x9e' # 0x3E -> CONTROL
'\x1a' # 0x3F -> SUBSTITUTE
' ' # 0x40 -> SPACE
'\xa0' # 0x41 -> NO-BREAK SPACE
'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x48 -> LATIN SMALL LETTER C WITH CEDILLA
'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
'\xa2' # 0x4A -> CENT SIGN
'.' # 0x4B -> FULL STOP
'<' # 0x4C -> LESS-THAN SIGN
'(' # 0x4D -> LEFT PARENTHESIS
'+' # 0x4E -> PLUS SIGN
'|' # 0x4F -> VERTICAL LINE
'&' # 0x50 -> AMPERSAND
'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
'!' # 0x5A -> EXCLAMATION MARK
'$' # 0x5B -> DOLLAR SIGN
'*' # 0x5C -> ASTERISK
')' # 0x5D -> RIGHT PARENTHESIS
';' # 0x5E -> SEMICOLON
'\xac' # 0x5F -> NOT SIGN
'-' # 0x60 -> HYPHEN-MINUS
'/' # 0x61 -> SOLIDUS
'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x68 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
'\xa6' # 0x6A -> BROKEN BAR
',' # 0x6B -> COMMA
'%' # 0x6C -> PERCENT SIGN
'_' # 0x6D -> LOW LINE
'>' # 0x6E -> GREATER-THAN SIGN
'?' # 0x6F -> QUESTION MARK
'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
'`' # 0x79 -> GRAVE ACCENT
':' # 0x7A -> COLON
'#' # 0x7B -> NUMBER SIGN
'@' # 0x7C -> COMMERCIAL AT
"'" # 0x7D -> APOSTROPHE
'=' # 0x7E -> EQUALS SIGN
'"' # 0x7F -> QUOTATION MARK
'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
'a' # 0x81 -> LATIN SMALL LETTER A
'b' # 0x82 -> LATIN SMALL LETTER B
'c' # 0x83 -> LATIN SMALL LETTER C
'd' # 0x84 -> LATIN SMALL LETTER D
'e' # 0x85 -> LATIN SMALL LETTER E
'f' # 0x86 -> LATIN SMALL LETTER F
'g' # 0x87 -> LATIN SMALL LETTER G
'h' # 0x88 -> LATIN SMALL LETTER H
'i' # 0x89 -> LATIN SMALL LETTER I
'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xf0' # 0x8C -> LATIN SMALL LETTER ETH (ICELANDIC)
'\xfd' # 0x8D -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0x8E -> LATIN SMALL LETTER THORN (ICELANDIC)
'\xb1' # 0x8F -> PLUS-MINUS SIGN
'\xb0' # 0x90 -> DEGREE SIGN
'j' # 0x91 -> LATIN SMALL LETTER J
'k' # 0x92 -> LATIN SMALL LETTER K
'l' # 0x93 -> LATIN SMALL LETTER L
'm' # 0x94 -> LATIN SMALL LETTER M
'n' # 0x95 -> LATIN SMALL LETTER N
'o' # 0x96 -> LATIN SMALL LETTER O
'p' # 0x97 -> LATIN SMALL LETTER P
'q' # 0x98 -> LATIN SMALL LETTER Q
'r' # 0x99 -> LATIN SMALL LETTER R
'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
'\xb8' # 0x9D -> CEDILLA
'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
'\xa4' # 0x9F -> CURRENCY SIGN
'\xb5' # 0xA0 -> MICRO SIGN
'~' # 0xA1 -> TILDE
's' # 0xA2 -> LATIN SMALL LETTER S
't' # 0xA3 -> LATIN SMALL LETTER T
'u' # 0xA4 -> LATIN SMALL LETTER U
'v' # 0xA5 -> LATIN SMALL LETTER V
'w' # 0xA6 -> LATIN SMALL LETTER W
'x' # 0xA7 -> LATIN SMALL LETTER X
'y' # 0xA8 -> LATIN SMALL LETTER Y
'z' # 0xA9 -> LATIN SMALL LETTER Z
'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
'\xbf' # 0xAB -> INVERTED QUESTION MARK
'\xd0' # 0xAC -> LATIN CAPITAL LETTER ETH (ICELANDIC)
'\xdd' # 0xAD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xAE -> LATIN CAPITAL LETTER THORN (ICELANDIC)
'\xae' # 0xAF -> REGISTERED SIGN
'^' # 0xB0 -> CIRCUMFLEX ACCENT
'\xa3' # 0xB1 -> POUND SIGN
'\xa5' # 0xB2 -> YEN SIGN
'\xb7' # 0xB3 -> MIDDLE DOT
'\xa9' # 0xB4 -> COPYRIGHT SIGN
'\xa7' # 0xB5 -> SECTION SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
'[' # 0xBA -> LEFT SQUARE BRACKET
']' # 0xBB -> RIGHT SQUARE BRACKET
'\xaf' # 0xBC -> MACRON
'\xa8' # 0xBD -> DIAERESIS
'\xb4' # 0xBE -> ACUTE ACCENT
'\xd7' # 0xBF -> MULTIPLICATION SIGN
'{' # 0xC0 -> LEFT CURLY BRACKET
'A' # 0xC1 -> LATIN CAPITAL LETTER A
'B' # 0xC2 -> LATIN CAPITAL LETTER B
'C' # 0xC3 -> LATIN CAPITAL LETTER C
'D' # 0xC4 -> LATIN CAPITAL LETTER D
'E' # 0xC5 -> LATIN CAPITAL LETTER E
'F' # 0xC6 -> LATIN CAPITAL LETTER F
'G' # 0xC7 -> LATIN CAPITAL LETTER G
'H' # 0xC8 -> LATIN CAPITAL LETTER H
'I' # 0xC9 -> LATIN CAPITAL LETTER I
'\xad' # 0xCA -> SOFT HYPHEN
'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0xCC -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
'}' # 0xD0 -> RIGHT CURLY BRACKET
'J' # 0xD1 -> LATIN CAPITAL LETTER J
'K' # 0xD2 -> LATIN CAPITAL LETTER K
'L' # 0xD3 -> LATIN CAPITAL LETTER L
'M' # 0xD4 -> LATIN CAPITAL LETTER M
'N' # 0xD5 -> LATIN CAPITAL LETTER N
'O' # 0xD6 -> LATIN CAPITAL LETTER O
'P' # 0xD7 -> LATIN CAPITAL LETTER P
'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
'R' # 0xD9 -> LATIN CAPITAL LETTER R
'\xb9' # 0xDA -> SUPERSCRIPT ONE
'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xDC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
'\\' # 0xE0 -> REVERSE SOLIDUS
'\xf7' # 0xE1 -> DIVISION SIGN
'S' # 0xE2 -> LATIN CAPITAL LETTER S
'T' # 0xE3 -> LATIN CAPITAL LETTER T
'U' # 0xE4 -> LATIN CAPITAL LETTER U
'V' # 0xE5 -> LATIN CAPITAL LETTER V
'W' # 0xE6 -> LATIN CAPITAL LETTER W
'X' # 0xE7 -> LATIN CAPITAL LETTER X
'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
'\xb2' # 0xEA -> SUPERSCRIPT TWO
'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd6' # 0xEC -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
'0' # 0xF0 -> DIGIT ZERO
'1' # 0xF1 -> DIGIT ONE
'2' # 0xF2 -> DIGIT TWO
'3' # 0xF3 -> DIGIT THREE
'4' # 0xF4 -> DIGIT FOUR
'5' # 0xF5 -> DIGIT FIVE
'6' # 0xF6 -> DIGIT SIX
'7' # 0xF7 -> DIGIT SEVEN
'8' # 0xF8 -> DIGIT EIGHT
'9' # 0xF9 -> DIGIT NINE
'\xb3' # 0xFA -> SUPERSCRIPT THREE
'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xFC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
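# Illustrative sketch (not part of the original module): a round trip
# through the tables built above; EBCDIC byte 0xC1 maps to 'A'.
def _cp037_demo():
    assert codecs.charmap_decode(b'\xc1', 'strict', decoding_table)[0] == 'A'
    assert codecs.charmap_encode('A', 'strict', encoding_table)[0] == b'\xc1'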
""" Python Character Mapping Codec cp1006 generated from 'MAPPINGS/VENDORS/MISC/CP1006.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1006',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u06f0' # 0xA1 -> EXTENDED ARABIC-INDIC DIGIT ZERO
'\u06f1' # 0xA2 -> EXTENDED ARABIC-INDIC DIGIT ONE
'\u06f2' # 0xA3 -> EXTENDED ARABIC-INDIC DIGIT TWO
'\u06f3' # 0xA4 -> EXTENDED ARABIC-INDIC DIGIT THREE
'\u06f4' # 0xA5 -> EXTENDED ARABIC-INDIC DIGIT FOUR
'\u06f5' # 0xA6 -> EXTENDED ARABIC-INDIC DIGIT FIVE
'\u06f6' # 0xA7 -> EXTENDED ARABIC-INDIC DIGIT SIX
'\u06f7' # 0xA8 -> EXTENDED ARABIC-INDIC DIGIT SEVEN
'\u06f8' # 0xA9 -> EXTENDED ARABIC-INDIC DIGIT EIGHT
'\u06f9' # 0xAA -> EXTENDED ARABIC-INDIC DIGIT NINE
'\u060c' # 0xAB -> ARABIC COMMA
'\u061b' # 0xAC -> ARABIC SEMICOLON
'\xad' # 0xAD -> SOFT HYPHEN
'\u061f' # 0xAE -> ARABIC QUESTION MARK
'\ufe81' # 0xAF -> ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
'\ufe8d' # 0xB0 -> ARABIC LETTER ALEF ISOLATED FORM
'\ufe8e' # 0xB1 -> ARABIC LETTER ALEF FINAL FORM
'\ufe8e' # 0xB2 -> ARABIC LETTER ALEF FINAL FORM
'\ufe8f' # 0xB3 -> ARABIC LETTER BEH ISOLATED FORM
'\ufe91' # 0xB4 -> ARABIC LETTER BEH INITIAL FORM
'\ufb56' # 0xB5 -> ARABIC LETTER PEH ISOLATED FORM
'\ufb58' # 0xB6 -> ARABIC LETTER PEH INITIAL FORM
'\ufe93' # 0xB7 -> ARABIC LETTER TEH MARBUTA ISOLATED FORM
'\ufe95' # 0xB8 -> ARABIC LETTER TEH ISOLATED FORM
'\ufe97' # 0xB9 -> ARABIC LETTER TEH INITIAL FORM
'\ufb66' # 0xBA -> ARABIC LETTER TTEH ISOLATED FORM
'\ufb68' # 0xBB -> ARABIC LETTER TTEH INITIAL FORM
'\ufe99' # 0xBC -> ARABIC LETTER THEH ISOLATED FORM
'\ufe9b' # 0xBD -> ARABIC LETTER THEH INITIAL FORM
'\ufe9d' # 0xBE -> ARABIC LETTER JEEM ISOLATED FORM
'\ufe9f' # 0xBF -> ARABIC LETTER JEEM INITIAL FORM
'\ufb7a' # 0xC0 -> ARABIC LETTER TCHEH ISOLATED FORM
'\ufb7c' # 0xC1 -> ARABIC LETTER TCHEH INITIAL FORM
'\ufea1' # 0xC2 -> ARABIC LETTER HAH ISOLATED FORM
'\ufea3' # 0xC3 -> ARABIC LETTER HAH INITIAL FORM
'\ufea5' # 0xC4 -> ARABIC LETTER KHAH ISOLATED FORM
'\ufea7' # 0xC5 -> ARABIC LETTER KHAH INITIAL FORM
'\ufea9' # 0xC6 -> ARABIC LETTER DAL ISOLATED FORM
'\ufb84' # 0xC7 -> ARABIC LETTER DAHAL ISOLATED FORM
'\ufeab' # 0xC8 -> ARABIC LETTER THAL ISOLATED FORM
'\ufead' # 0xC9 -> ARABIC LETTER REH ISOLATED FORM
'\ufb8c' # 0xCA -> ARABIC LETTER RREH ISOLATED FORM
'\ufeaf' # 0xCB -> ARABIC LETTER ZAIN ISOLATED FORM
'\ufb8a' # 0xCC -> ARABIC LETTER JEH ISOLATED FORM
'\ufeb1' # 0xCD -> ARABIC LETTER SEEN ISOLATED FORM
'\ufeb3' # 0xCE -> ARABIC LETTER SEEN INITIAL FORM
'\ufeb5' # 0xCF -> ARABIC LETTER SHEEN ISOLATED FORM
'\ufeb7' # 0xD0 -> ARABIC LETTER SHEEN INITIAL FORM
'\ufeb9' # 0xD1 -> ARABIC LETTER SAD ISOLATED FORM
'\ufebb' # 0xD2 -> ARABIC LETTER SAD INITIAL FORM
'\ufebd' # 0xD3 -> ARABIC LETTER DAD ISOLATED FORM
'\ufebf' # 0xD4 -> ARABIC LETTER DAD INITIAL FORM
'\ufec1' # 0xD5 -> ARABIC LETTER TAH ISOLATED FORM
'\ufec5' # 0xD6 -> ARABIC LETTER ZAH ISOLATED FORM
'\ufec9' # 0xD7 -> ARABIC LETTER AIN ISOLATED FORM
'\ufeca' # 0xD8 -> ARABIC LETTER AIN FINAL FORM
'\ufecb' # 0xD9 -> ARABIC LETTER AIN INITIAL FORM
'\ufecc' # 0xDA -> ARABIC LETTER AIN MEDIAL FORM
'\ufecd' # 0xDB -> ARABIC LETTER GHAIN ISOLATED FORM
'\ufece' # 0xDC -> ARABIC LETTER GHAIN FINAL FORM
'\ufecf' # 0xDD -> ARABIC LETTER GHAIN INITIAL FORM
'\ufed0' # 0xDE -> ARABIC LETTER GHAIN MEDIAL FORM
'\ufed1' # 0xDF -> ARABIC LETTER FEH ISOLATED FORM
'\ufed3' # 0xE0 -> ARABIC LETTER FEH INITIAL FORM
'\ufed5' # 0xE1 -> ARABIC LETTER QAF ISOLATED FORM
'\ufed7' # 0xE2 -> ARABIC LETTER QAF INITIAL FORM
'\ufed9' # 0xE3 -> ARABIC LETTER KAF ISOLATED FORM
'\ufedb' # 0xE4 -> ARABIC LETTER KAF INITIAL FORM
'\ufb92' # 0xE5 -> ARABIC LETTER GAF ISOLATED FORM
'\ufb94' # 0xE6 -> ARABIC LETTER GAF INITIAL FORM
'\ufedd' # 0xE7 -> ARABIC LETTER LAM ISOLATED FORM
'\ufedf' # 0xE8 -> ARABIC LETTER LAM INITIAL FORM
'\ufee0' # 0xE9 -> ARABIC LETTER LAM MEDIAL FORM
'\ufee1' # 0xEA -> ARABIC LETTER MEEM ISOLATED FORM
'\ufee3' # 0xEB -> ARABIC LETTER MEEM INITIAL FORM
'\ufb9e' # 0xEC -> ARABIC LETTER NOON GHUNNA ISOLATED FORM
'\ufee5' # 0xED -> ARABIC LETTER NOON ISOLATED FORM
'\ufee7' # 0xEE -> ARABIC LETTER NOON INITIAL FORM
'\ufe85' # 0xEF -> ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
'\ufeed' # 0xF0 -> ARABIC LETTER WAW ISOLATED FORM
'\ufba6' # 0xF1 -> ARABIC LETTER HEH GOAL ISOLATED FORM
'\ufba8' # 0xF2 -> ARABIC LETTER HEH GOAL INITIAL FORM
'\ufba9' # 0xF3 -> ARABIC LETTER HEH GOAL MEDIAL FORM
'\ufbaa' # 0xF4 -> ARABIC LETTER HEH DOACHASHMEE ISOLATED FORM
'\ufe80' # 0xF5 -> ARABIC LETTER HAMZA ISOLATED FORM
'\ufe89' # 0xF6 -> ARABIC LETTER YEH WITH HAMZA ABOVE ISOLATED FORM
'\ufe8a' # 0xF7 -> ARABIC LETTER YEH WITH HAMZA ABOVE FINAL FORM
'\ufe8b' # 0xF8 -> ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
'\ufef1' # 0xF9 -> ARABIC LETTER YEH ISOLATED FORM
'\ufef2' # 0xFA -> ARABIC LETTER YEH FINAL FORM
'\ufef3' # 0xFB -> ARABIC LETTER YEH INITIAL FORM
'\ufbb0' # 0xFC -> ARABIC LETTER YEH BARREE WITH HAMZA ABOVE ISOLATED FORM
'\ufbae' # 0xFD -> ARABIC LETTER YEH BARREE ISOLATED FORM
'\ufe7c' # 0xFE -> ARABIC SHADDA ISOLATED FORM
'\ufe7d' # 0xFF -> ARABIC SHADDA MEDIAL FORM
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
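# --- Editor's usage sketch (illustrative; not part of the generated file) ---
# The pair of tables above is all the codec needs: charmap_decode maps each
# input byte through the 256-character decoding_table, and charmap_build
# inverts that string into the structure charmap_encode consumes. The bytes
# below are sample picks from this Arabic/Urdu table, assuming the file is
# executed top to bottom so both names are in scope.
import codecs

_raw = bytes([0xB0, 0xA1])  # 0xB0 -> ALEF ISOLATED FORM, 0xA1 -> EXTENDED ARABIC-INDIC DIGIT ZERO
_text, _consumed = codecs.charmap_decode(_raw, 'strict', decoding_table)
assert (_text, _consumed) == ('\ufe8d\u06f0', 2)
_back, _ = codecs.charmap_encode(_text, 'strict', encoding_table)
assert _back == _raw  # the built encoding_table is an exact inverse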
""" Python Character Mapping Codec cp1026 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP1026.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1026',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x9c' # 0x04 -> CONTROL
'\t' # 0x05 -> HORIZONTAL TABULATION
'\x86' # 0x06 -> CONTROL
'\x7f' # 0x07 -> DELETE
'\x97' # 0x08 -> CONTROL
'\x8d' # 0x09 -> CONTROL
'\x8e' # 0x0A -> CONTROL
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x9d' # 0x14 -> CONTROL
'\x85' # 0x15 -> CONTROL
'\x08' # 0x16 -> BACKSPACE
'\x87' # 0x17 -> CONTROL
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x92' # 0x1A -> CONTROL
'\x8f' # 0x1B -> CONTROL
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
'\x80' # 0x20 -> CONTROL
'\x81' # 0x21 -> CONTROL
'\x82' # 0x22 -> CONTROL
'\x83' # 0x23 -> CONTROL
'\x84' # 0x24 -> CONTROL
'\n' # 0x25 -> LINE FEED
'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
'\x1b' # 0x27 -> ESCAPE
'\x88' # 0x28 -> CONTROL
'\x89' # 0x29 -> CONTROL
'\x8a' # 0x2A -> CONTROL
'\x8b' # 0x2B -> CONTROL
'\x8c' # 0x2C -> CONTROL
'\x05' # 0x2D -> ENQUIRY
'\x06' # 0x2E -> ACKNOWLEDGE
'\x07' # 0x2F -> BELL
'\x90' # 0x30 -> CONTROL
'\x91' # 0x31 -> CONTROL
'\x16' # 0x32 -> SYNCHRONOUS IDLE
'\x93' # 0x33 -> CONTROL
'\x94' # 0x34 -> CONTROL
'\x95' # 0x35 -> CONTROL
'\x96' # 0x36 -> CONTROL
'\x04' # 0x37 -> END OF TRANSMISSION
'\x98' # 0x38 -> CONTROL
'\x99' # 0x39 -> CONTROL
'\x9a' # 0x3A -> CONTROL
'\x9b' # 0x3B -> CONTROL
'\x14' # 0x3C -> DEVICE CONTROL FOUR
'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
'\x9e' # 0x3E -> CONTROL
'\x1a' # 0x3F -> SUBSTITUTE
' ' # 0x40 -> SPACE
'\xa0' # 0x41 -> NO-BREAK SPACE
'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
'{' # 0x48 -> LEFT CURLY BRACKET
'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
'\xc7' # 0x4A -> LATIN CAPITAL LETTER C WITH CEDILLA
'.' # 0x4B -> FULL STOP
'<' # 0x4C -> LESS-THAN SIGN
'(' # 0x4D -> LEFT PARENTHESIS
'+' # 0x4E -> PLUS SIGN
'!' # 0x4F -> EXCLAMATION MARK
'&' # 0x50 -> AMPERSAND
'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
'\u011e' # 0x5A -> LATIN CAPITAL LETTER G WITH BREVE
'\u0130' # 0x5B -> LATIN CAPITAL LETTER I WITH DOT ABOVE
'*' # 0x5C -> ASTERISK
')' # 0x5D -> RIGHT PARENTHESIS
';' # 0x5E -> SEMICOLON
'^' # 0x5F -> CIRCUMFLEX ACCENT
'-' # 0x60 -> HYPHEN-MINUS
'/' # 0x61 -> SOLIDUS
'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'[' # 0x68 -> LEFT SQUARE BRACKET
'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
'\u015f' # 0x6A -> LATIN SMALL LETTER S WITH CEDILLA
',' # 0x6B -> COMMA
'%' # 0x6C -> PERCENT SIGN
'_' # 0x6D -> LOW LINE
'>' # 0x6E -> GREATER-THAN SIGN
'?' # 0x6F -> QUESTION MARK
'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
'\u0131' # 0x79 -> LATIN SMALL LETTER DOTLESS I
':' # 0x7A -> COLON
'\xd6' # 0x7B -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\u015e' # 0x7C -> LATIN CAPITAL LETTER S WITH CEDILLA
"'" # 0x7D -> APOSTROPHE
'=' # 0x7E -> EQUALS SIGN
'\xdc' # 0x7F -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
'a' # 0x81 -> LATIN SMALL LETTER A
'b' # 0x82 -> LATIN SMALL LETTER B
'c' # 0x83 -> LATIN SMALL LETTER C
'd' # 0x84 -> LATIN SMALL LETTER D
'e' # 0x85 -> LATIN SMALL LETTER E
'f' # 0x86 -> LATIN SMALL LETTER F
'g' # 0x87 -> LATIN SMALL LETTER G
'h' # 0x88 -> LATIN SMALL LETTER H
'i' # 0x89 -> LATIN SMALL LETTER I
'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'}' # 0x8C -> RIGHT CURLY BRACKET
'`' # 0x8D -> GRAVE ACCENT
'\xa6' # 0x8E -> BROKEN BAR
'\xb1' # 0x8F -> PLUS-MINUS SIGN
'\xb0' # 0x90 -> DEGREE SIGN
'j' # 0x91 -> LATIN SMALL LETTER J
'k' # 0x92 -> LATIN SMALL LETTER K
'l' # 0x93 -> LATIN SMALL LETTER L
'm' # 0x94 -> LATIN SMALL LETTER M
'n' # 0x95 -> LATIN SMALL LETTER N
'o' # 0x96 -> LATIN SMALL LETTER O
'p' # 0x97 -> LATIN SMALL LETTER P
'q' # 0x98 -> LATIN SMALL LETTER Q
'r' # 0x99 -> LATIN SMALL LETTER R
'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
'\xb8' # 0x9D -> CEDILLA
'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
'\xa4' # 0x9F -> CURRENCY SIGN
'\xb5' # 0xA0 -> MICRO SIGN
'\xf6' # 0xA1 -> LATIN SMALL LETTER O WITH DIAERESIS
's' # 0xA2 -> LATIN SMALL LETTER S
't' # 0xA3 -> LATIN SMALL LETTER T
'u' # 0xA4 -> LATIN SMALL LETTER U
'v' # 0xA5 -> LATIN SMALL LETTER V
'w' # 0xA6 -> LATIN SMALL LETTER W
'x' # 0xA7 -> LATIN SMALL LETTER X
'y' # 0xA8 -> LATIN SMALL LETTER Y
'z' # 0xA9 -> LATIN SMALL LETTER Z
'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
'\xbf' # 0xAB -> INVERTED QUESTION MARK
']' # 0xAC -> RIGHT SQUARE BRACKET
'$' # 0xAD -> DOLLAR SIGN
'@' # 0xAE -> COMMERCIAL AT
'\xae' # 0xAF -> REGISTERED SIGN
'\xa2' # 0xB0 -> CENT SIGN
'\xa3' # 0xB1 -> POUND SIGN
'\xa5' # 0xB2 -> YEN SIGN
'\xb7' # 0xB3 -> MIDDLE DOT
'\xa9' # 0xB4 -> COPYRIGHT SIGN
'\xa7' # 0xB5 -> SECTION SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
'\xac' # 0xBA -> NOT SIGN
'|' # 0xBB -> VERTICAL LINE
'\xaf' # 0xBC -> MACRON
'\xa8' # 0xBD -> DIAERESIS
'\xb4' # 0xBE -> ACUTE ACCENT
'\xd7' # 0xBF -> MULTIPLICATION SIGN
'\xe7' # 0xC0 -> LATIN SMALL LETTER C WITH CEDILLA
'A' # 0xC1 -> LATIN CAPITAL LETTER A
'B' # 0xC2 -> LATIN CAPITAL LETTER B
'C' # 0xC3 -> LATIN CAPITAL LETTER C
'D' # 0xC4 -> LATIN CAPITAL LETTER D
'E' # 0xC5 -> LATIN CAPITAL LETTER E
'F' # 0xC6 -> LATIN CAPITAL LETTER F
'G' # 0xC7 -> LATIN CAPITAL LETTER G
'H' # 0xC8 -> LATIN CAPITAL LETTER H
'I' # 0xC9 -> LATIN CAPITAL LETTER I
'\xad' # 0xCA -> SOFT HYPHEN
'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'~' # 0xCC -> TILDE
'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
'\u011f' # 0xD0 -> LATIN SMALL LETTER G WITH BREVE
'J' # 0xD1 -> LATIN CAPITAL LETTER J
'K' # 0xD2 -> LATIN CAPITAL LETTER K
'L' # 0xD3 -> LATIN CAPITAL LETTER L
'M' # 0xD4 -> LATIN CAPITAL LETTER M
'N' # 0xD5 -> LATIN CAPITAL LETTER N
'O' # 0xD6 -> LATIN CAPITAL LETTER O
'P' # 0xD7 -> LATIN CAPITAL LETTER P
'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
'R' # 0xD9 -> LATIN CAPITAL LETTER R
'\xb9' # 0xDA -> SUPERSCRIPT ONE
'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\\' # 0xDC -> REVERSE SOLIDUS
'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xfc' # 0xE0 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xf7' # 0xE1 -> DIVISION SIGN
'S' # 0xE2 -> LATIN CAPITAL LETTER S
'T' # 0xE3 -> LATIN CAPITAL LETTER T
'U' # 0xE4 -> LATIN CAPITAL LETTER U
'V' # 0xE5 -> LATIN CAPITAL LETTER V
'W' # 0xE6 -> LATIN CAPITAL LETTER W
'X' # 0xE7 -> LATIN CAPITAL LETTER X
'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
'\xb2' # 0xEA -> SUPERSCRIPT TWO
'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'#' # 0xEC -> NUMBER SIGN
'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
'0' # 0xF0 -> DIGIT ZERO
'1' # 0xF1 -> DIGIT ONE
'2' # 0xF2 -> DIGIT TWO
'3' # 0xF3 -> DIGIT THREE
'4' # 0xF4 -> DIGIT FOUR
'5' # 0xF5 -> DIGIT FIVE
'6' # 0xF6 -> DIGIT SIX
'7' # 0xF7 -> DIGIT SEVEN
'8' # 0xF8 -> DIGIT EIGHT
'9' # 0xF9 -> DIGIT NINE
'\xb3' # 0xFA -> SUPERSCRIPT THREE
'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'"' # 0xFC -> QUOTATION MARK
'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
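# --- Editor's usage sketch (illustrative; not part of the generated file) ---
# CP1026 is an EBCDIC code page, so the Latin letters sit at EBCDIC positions
# rather than their ASCII ones: 0x81..0x89 decode to 'a'..'i' (see the table
# above), and the built encoding_table round-trips them exactly.
import codecs

_text, _ = codecs.charmap_decode(b'\x81\x82\x83', 'strict', decoding_table)
assert _text == 'abc'
_raw, _ = codecs.charmap_encode('abc', 'strict', encoding_table)
assert _raw == b'\x81\x82\x83'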
""" Python Character Mapping Codec for CP1125
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1125',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
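# --- Editor's note (illustrative): how getregentry() gets consumed ---
# In the stdlib, the encodings package calls getregentry() when a lookup for
# 'cp1125' reaches this module. Outside that machinery, a custom search
# function (the hypothetical _cp1125_search below) can expose the codec the
# same way via codecs.register.
import codecs

def _cp1125_search(name):
    # hypothetical helper: answer only for this codec's canonical name
    return getregentry() if name == 'cp1125' else None

codecs.register(_cp1125_search)
assert codecs.lookup('cp1125').name == 'cp1125'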
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0410, # CYRILLIC CAPITAL LETTER A
0x0081: 0x0411, # CYRILLIC CAPITAL LETTER BE
0x0082: 0x0412, # CYRILLIC CAPITAL LETTER VE
0x0083: 0x0413, # CYRILLIC CAPITAL LETTER GHE
0x0084: 0x0414, # CYRILLIC CAPITAL LETTER DE
0x0085: 0x0415, # CYRILLIC CAPITAL LETTER IE
0x0086: 0x0416, # CYRILLIC CAPITAL LETTER ZHE
0x0087: 0x0417, # CYRILLIC CAPITAL LETTER ZE
0x0088: 0x0418, # CYRILLIC CAPITAL LETTER I
0x0089: 0x0419, # CYRILLIC CAPITAL LETTER SHORT I
0x008a: 0x041a, # CYRILLIC CAPITAL LETTER KA
0x008b: 0x041b, # CYRILLIC CAPITAL LETTER EL
0x008c: 0x041c, # CYRILLIC CAPITAL LETTER EM
0x008d: 0x041d, # CYRILLIC CAPITAL LETTER EN
0x008e: 0x041e, # CYRILLIC CAPITAL LETTER O
0x008f: 0x041f, # CYRILLIC CAPITAL LETTER PE
0x0090: 0x0420, # CYRILLIC CAPITAL LETTER ER
0x0091: 0x0421, # CYRILLIC CAPITAL LETTER ES
0x0092: 0x0422, # CYRILLIC CAPITAL LETTER TE
0x0093: 0x0423, # CYRILLIC CAPITAL LETTER U
0x0094: 0x0424, # CYRILLIC CAPITAL LETTER EF
0x0095: 0x0425, # CYRILLIC CAPITAL LETTER HA
0x0096: 0x0426, # CYRILLIC CAPITAL LETTER TSE
0x0097: 0x0427, # CYRILLIC CAPITAL LETTER CHE
0x0098: 0x0428, # CYRILLIC CAPITAL LETTER SHA
0x0099: 0x0429, # CYRILLIC CAPITAL LETTER SHCHA
0x009a: 0x042a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x009b: 0x042b, # CYRILLIC CAPITAL LETTER YERU
0x009c: 0x042c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x009d: 0x042d, # CYRILLIC CAPITAL LETTER E
0x009e: 0x042e, # CYRILLIC CAPITAL LETTER YU
0x009f: 0x042f, # CYRILLIC CAPITAL LETTER YA
0x00a0: 0x0430, # CYRILLIC SMALL LETTER A
0x00a1: 0x0431, # CYRILLIC SMALL LETTER BE
0x00a2: 0x0432, # CYRILLIC SMALL LETTER VE
0x00a3: 0x0433, # CYRILLIC SMALL LETTER GHE
0x00a4: 0x0434, # CYRILLIC SMALL LETTER DE
0x00a5: 0x0435, # CYRILLIC SMALL LETTER IE
0x00a6: 0x0436, # CYRILLIC SMALL LETTER ZHE
0x00a7: 0x0437, # CYRILLIC SMALL LETTER ZE
0x00a8: 0x0438, # CYRILLIC SMALL LETTER I
0x00a9: 0x0439, # CYRILLIC SMALL LETTER SHORT I
0x00aa: 0x043a, # CYRILLIC SMALL LETTER KA
0x00ab: 0x043b, # CYRILLIC SMALL LETTER EL
0x00ac: 0x043c, # CYRILLIC SMALL LETTER EM
0x00ad: 0x043d, # CYRILLIC SMALL LETTER EN
0x00ae: 0x043e, # CYRILLIC SMALL LETTER O
0x00af: 0x043f, # CYRILLIC SMALL LETTER PE
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x0440, # CYRILLIC SMALL LETTER ER
0x00e1: 0x0441, # CYRILLIC SMALL LETTER ES
0x00e2: 0x0442, # CYRILLIC SMALL LETTER TE
0x00e3: 0x0443, # CYRILLIC SMALL LETTER U
0x00e4: 0x0444, # CYRILLIC SMALL LETTER EF
0x00e5: 0x0445, # CYRILLIC SMALL LETTER HA
0x00e6: 0x0446, # CYRILLIC SMALL LETTER TSE
0x00e7: 0x0447, # CYRILLIC SMALL LETTER CHE
0x00e8: 0x0448, # CYRILLIC SMALL LETTER SHA
0x00e9: 0x0449, # CYRILLIC SMALL LETTER SHCHA
0x00ea: 0x044a, # CYRILLIC SMALL LETTER HARD SIGN
0x00eb: 0x044b, # CYRILLIC SMALL LETTER YERU
0x00ec: 0x044c, # CYRILLIC SMALL LETTER SOFT SIGN
0x00ed: 0x044d, # CYRILLIC SMALL LETTER E
0x00ee: 0x044e, # CYRILLIC SMALL LETTER YU
0x00ef: 0x044f, # CYRILLIC SMALL LETTER YA
0x00f0: 0x0401, # CYRILLIC CAPITAL LETTER IO
0x00f1: 0x0451, # CYRILLIC SMALL LETTER IO
0x00f2: 0x0490, # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0x00f3: 0x0491, # CYRILLIC SMALL LETTER GHE WITH UPTURN
0x00f4: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x00f5: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x00f6: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x00f7: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x00f8: 0x0407, # CYRILLIC CAPITAL LETTER YI
0x00f9: 0x0457, # CYRILLIC SMALL LETTER YI
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x2116, # NUMERO SIGN
0x00fd: 0x00a4, # CURRENCY SIGN
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
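# Editor's note (illustrative): after the update above, decoding_map is the
# identity for 0x00-0x7F and remaps only the high half.
assert decoding_map[0x41] == 0x41      # ASCII 'A' passes through unchanged
assert decoding_map[0x80] == 0x0410    # 0x80 now decodes to CYRILLIC CAPITAL LETTER A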
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\u0410' # 0x0080 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0x0081 -> CYRILLIC CAPITAL LETTER BE
'\u0412' # 0x0082 -> CYRILLIC CAPITAL LETTER VE
'\u0413' # 0x0083 -> CYRILLIC CAPITAL LETTER GHE
'\u0414' # 0x0084 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0x0085 -> CYRILLIC CAPITAL LETTER IE
'\u0416' # 0x0086 -> CYRILLIC CAPITAL LETTER ZHE
'\u0417' # 0x0087 -> CYRILLIC CAPITAL LETTER ZE
'\u0418' # 0x0088 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0x0089 -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0x008a -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0x008b -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0x008c -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0x008d -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0x008e -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0x008f -> CYRILLIC CAPITAL LETTER PE
'\u0420' # 0x0090 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0x0091 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0x0092 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0x0093 -> CYRILLIC CAPITAL LETTER U
'\u0424' # 0x0094 -> CYRILLIC CAPITAL LETTER EF
'\u0425' # 0x0095 -> CYRILLIC CAPITAL LETTER HA
'\u0426' # 0x0096 -> CYRILLIC CAPITAL LETTER TSE
'\u0427' # 0x0097 -> CYRILLIC CAPITAL LETTER CHE
'\u0428' # 0x0098 -> CYRILLIC CAPITAL LETTER SHA
'\u0429' # 0x0099 -> CYRILLIC CAPITAL LETTER SHCHA
'\u042a' # 0x009a -> CYRILLIC CAPITAL LETTER HARD SIGN
'\u042b' # 0x009b -> CYRILLIC CAPITAL LETTER YERU
'\u042c' # 0x009c -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042d' # 0x009d -> CYRILLIC CAPITAL LETTER E
'\u042e' # 0x009e -> CYRILLIC CAPITAL LETTER YU
'\u042f' # 0x009f -> CYRILLIC CAPITAL LETTER YA
'\u0430' # 0x00a0 -> CYRILLIC SMALL LETTER A
'\u0431' # 0x00a1 -> CYRILLIC SMALL LETTER BE
'\u0432' # 0x00a2 -> CYRILLIC SMALL LETTER VE
'\u0433' # 0x00a3 -> CYRILLIC SMALL LETTER GHE
'\u0434' # 0x00a4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0x00a5 -> CYRILLIC SMALL LETTER IE
'\u0436' # 0x00a6 -> CYRILLIC SMALL LETTER ZHE
'\u0437' # 0x00a7 -> CYRILLIC SMALL LETTER ZE
'\u0438' # 0x00a8 -> CYRILLIC SMALL LETTER I
'\u0439' # 0x00a9 -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0x00aa -> CYRILLIC SMALL LETTER KA
'\u043b' # 0x00ab -> CYRILLIC SMALL LETTER EL
'\u043c' # 0x00ac -> CYRILLIC SMALL LETTER EM
'\u043d' # 0x00ad -> CYRILLIC SMALL LETTER EN
'\u043e' # 0x00ae -> CYRILLIC SMALL LETTER O
'\u043f' # 0x00af -> CYRILLIC SMALL LETTER PE
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u0440' # 0x00e0 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0x00e1 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0x00e2 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0x00e3 -> CYRILLIC SMALL LETTER U
'\u0444' # 0x00e4 -> CYRILLIC SMALL LETTER EF
'\u0445' # 0x00e5 -> CYRILLIC SMALL LETTER HA
'\u0446' # 0x00e6 -> CYRILLIC SMALL LETTER TSE
'\u0447' # 0x00e7 -> CYRILLIC SMALL LETTER CHE
'\u0448' # 0x00e8 -> CYRILLIC SMALL LETTER SHA
'\u0449' # 0x00e9 -> CYRILLIC SMALL LETTER SHCHA
'\u044a' # 0x00ea -> CYRILLIC SMALL LETTER HARD SIGN
'\u044b' # 0x00eb -> CYRILLIC SMALL LETTER YERU
'\u044c' # 0x00ec -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044d' # 0x00ed -> CYRILLIC SMALL LETTER E
'\u044e' # 0x00ee -> CYRILLIC SMALL LETTER YU
'\u044f' # 0x00ef -> CYRILLIC SMALL LETTER YA
'\u0401' # 0x00f0 -> CYRILLIC CAPITAL LETTER IO
'\u0451' # 0x00f1 -> CYRILLIC SMALL LETTER IO
'\u0490' # 0x00f2 -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
'\u0491' # 0x00f3 -> CYRILLIC SMALL LETTER GHE WITH UPTURN
'\u0404' # 0x00f4 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
'\u0454' # 0x00f5 -> CYRILLIC SMALL LETTER UKRAINIAN IE
'\u0406' # 0x00f6 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0456' # 0x00f7 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0407' # 0x00f8 -> CYRILLIC CAPITAL LETTER YI
'\u0457' # 0x00f9 -> CYRILLIC SMALL LETTER YI
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u2116' # 0x00fc -> NUMERO SIGN
'\xa4' # 0x00fd -> CURRENCY SIGN
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a4: 0x00fd, # CURRENCY SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x0401: 0x00f0, # CYRILLIC CAPITAL LETTER IO
0x0404: 0x00f4, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x0406: 0x00f6, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x0407: 0x00f8, # CYRILLIC CAPITAL LETTER YI
0x0410: 0x0080, # CYRILLIC CAPITAL LETTER A
0x0411: 0x0081, # CYRILLIC CAPITAL LETTER BE
0x0412: 0x0082, # CYRILLIC CAPITAL LETTER VE
0x0413: 0x0083, # CYRILLIC CAPITAL LETTER GHE
0x0414: 0x0084, # CYRILLIC CAPITAL LETTER DE
0x0415: 0x0085, # CYRILLIC CAPITAL LETTER IE
0x0416: 0x0086, # CYRILLIC CAPITAL LETTER ZHE
0x0417: 0x0087, # CYRILLIC CAPITAL LETTER ZE
0x0418: 0x0088, # CYRILLIC CAPITAL LETTER I
0x0419: 0x0089, # CYRILLIC CAPITAL LETTER SHORT I
0x041a: 0x008a, # CYRILLIC CAPITAL LETTER KA
0x041b: 0x008b, # CYRILLIC CAPITAL LETTER EL
0x041c: 0x008c, # CYRILLIC CAPITAL LETTER EM
0x041d: 0x008d, # CYRILLIC CAPITAL LETTER EN
0x041e: 0x008e, # CYRILLIC CAPITAL LETTER O
0x041f: 0x008f, # CYRILLIC CAPITAL LETTER PE
0x0420: 0x0090, # CYRILLIC CAPITAL LETTER ER
0x0421: 0x0091, # CYRILLIC CAPITAL LETTER ES
0x0422: 0x0092, # CYRILLIC CAPITAL LETTER TE
0x0423: 0x0093, # CYRILLIC CAPITAL LETTER U
0x0424: 0x0094, # CYRILLIC CAPITAL LETTER EF
0x0425: 0x0095, # CYRILLIC CAPITAL LETTER HA
0x0426: 0x0096, # CYRILLIC CAPITAL LETTER TSE
0x0427: 0x0097, # CYRILLIC CAPITAL LETTER CHE
0x0428: 0x0098, # CYRILLIC CAPITAL LETTER SHA
0x0429: 0x0099, # CYRILLIC CAPITAL LETTER SHCHA
0x042a: 0x009a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x042b: 0x009b, # CYRILLIC CAPITAL LETTER YERU
0x042c: 0x009c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x042d: 0x009d, # CYRILLIC CAPITAL LETTER E
0x042e: 0x009e, # CYRILLIC CAPITAL LETTER YU
0x042f: 0x009f, # CYRILLIC CAPITAL LETTER YA
0x0430: 0x00a0, # CYRILLIC SMALL LETTER A
0x0431: 0x00a1, # CYRILLIC SMALL LETTER BE
0x0432: 0x00a2, # CYRILLIC SMALL LETTER VE
0x0433: 0x00a3, # CYRILLIC SMALL LETTER GHE
0x0434: 0x00a4, # CYRILLIC SMALL LETTER DE
0x0435: 0x00a5, # CYRILLIC SMALL LETTER IE
0x0436: 0x00a6, # CYRILLIC SMALL LETTER ZHE
0x0437: 0x00a7, # CYRILLIC SMALL LETTER ZE
0x0438: 0x00a8, # CYRILLIC SMALL LETTER I
0x0439: 0x00a9, # CYRILLIC SMALL LETTER SHORT I
0x043a: 0x00aa, # CYRILLIC SMALL LETTER KA
0x043b: 0x00ab, # CYRILLIC SMALL LETTER EL
0x043c: 0x00ac, # CYRILLIC SMALL LETTER EM
0x043d: 0x00ad, # CYRILLIC SMALL LETTER EN
0x043e: 0x00ae, # CYRILLIC SMALL LETTER O
0x043f: 0x00af, # CYRILLIC SMALL LETTER PE
0x0440: 0x00e0, # CYRILLIC SMALL LETTER ER
0x0441: 0x00e1, # CYRILLIC SMALL LETTER ES
0x0442: 0x00e2, # CYRILLIC SMALL LETTER TE
0x0443: 0x00e3, # CYRILLIC SMALL LETTER U
0x0444: 0x00e4, # CYRILLIC SMALL LETTER EF
0x0445: 0x00e5, # CYRILLIC SMALL LETTER HA
0x0446: 0x00e6, # CYRILLIC SMALL LETTER TSE
0x0447: 0x00e7, # CYRILLIC SMALL LETTER CHE
0x0448: 0x00e8, # CYRILLIC SMALL LETTER SHA
0x0449: 0x00e9, # CYRILLIC SMALL LETTER SHCHA
0x044a: 0x00ea, # CYRILLIC SMALL LETTER HARD SIGN
0x044b: 0x00eb, # CYRILLIC SMALL LETTER YERU
0x044c: 0x00ec, # CYRILLIC SMALL LETTER SOFT SIGN
0x044d: 0x00ed, # CYRILLIC SMALL LETTER E
0x044e: 0x00ee, # CYRILLIC SMALL LETTER YU
0x044f: 0x00ef, # CYRILLIC SMALL LETTER YA
0x0451: 0x00f1, # CYRILLIC SMALL LETTER IO
0x0454: 0x00f5, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x0456: 0x00f7, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x0457: 0x00f9, # CYRILLIC SMALL LETTER YI
0x0490: 0x00f2, # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0x0491: 0x00f3, # CYRILLIC SMALL LETTER GHE WITH UPTURN
0x2116: 0x00fc, # NUMERO SIGN
0x221a: 0x00fb, # SQUARE ROOT
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
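# --- Editor's usage sketch (illustrative; not part of the generated file) ---
# Unlike the table-based codecs above, cp1125 encodes through a plain dict of
# Unicode ordinals to byte values; charmap_encode accepts either form of map.
import codecs

_raw, _ = codecs.charmap_encode('\u0410\u0430', 'strict', encoding_map)  # Cyrillic 'A', 'a'
assert _raw == b'\x80\xa0'
_text, _ = codecs.charmap_decode(_raw, 'strict', decoding_table)
assert _text == '\u0410\u0430'  # dict-based encode and table-based decode agree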
""" Python Character Mapping Codec cp1140 generated from 'python-mappings/CP1140.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1140',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x9c' # 0x04 -> CONTROL
'\t' # 0x05 -> HORIZONTAL TABULATION
'\x86' # 0x06 -> CONTROL
'\x7f' # 0x07 -> DELETE
'\x97' # 0x08 -> CONTROL
'\x8d' # 0x09 -> CONTROL
'\x8e' # 0x0A -> CONTROL
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x9d' # 0x14 -> CONTROL
'\x85' # 0x15 -> CONTROL
'\x08' # 0x16 -> BACKSPACE
'\x87' # 0x17 -> CONTROL
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x92' # 0x1A -> CONTROL
'\x8f' # 0x1B -> CONTROL
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
'\x80' # 0x20 -> CONTROL
'\x81' # 0x21 -> CONTROL
'\x82' # 0x22 -> CONTROL
'\x83' # 0x23 -> CONTROL
'\x84' # 0x24 -> CONTROL
'\n' # 0x25 -> LINE FEED
'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
'\x1b' # 0x27 -> ESCAPE
'\x88' # 0x28 -> CONTROL
'\x89' # 0x29 -> CONTROL
'\x8a' # 0x2A -> CONTROL
'\x8b' # 0x2B -> CONTROL
'\x8c' # 0x2C -> CONTROL
'\x05' # 0x2D -> ENQUIRY
'\x06' # 0x2E -> ACKNOWLEDGE
'\x07' # 0x2F -> BELL
'\x90' # 0x30 -> CONTROL
'\x91' # 0x31 -> CONTROL
'\x16' # 0x32 -> SYNCHRONOUS IDLE
'\x93' # 0x33 -> CONTROL
'\x94' # 0x34 -> CONTROL
'\x95' # 0x35 -> CONTROL
'\x96' # 0x36 -> CONTROL
'\x04' # 0x37 -> END OF TRANSMISSION
'\x98' # 0x38 -> CONTROL
'\x99' # 0x39 -> CONTROL
'\x9a' # 0x3A -> CONTROL
'\x9b' # 0x3B -> CONTROL
'\x14' # 0x3C -> DEVICE CONTROL FOUR
'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
'\x9e' # 0x3E -> CONTROL
'\x1a' # 0x3F -> SUBSTITUTE
' ' # 0x40 -> SPACE
'\xa0' # 0x41 -> NO-BREAK SPACE
'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x48 -> LATIN SMALL LETTER C WITH CEDILLA
'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
'\xa2' # 0x4A -> CENT SIGN
'.' # 0x4B -> FULL STOP
'<' # 0x4C -> LESS-THAN SIGN
'(' # 0x4D -> LEFT PARENTHESIS
'+' # 0x4E -> PLUS SIGN
'|' # 0x4F -> VERTICAL LINE
'&' # 0x50 -> AMPERSAND
'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
'!' # 0x5A -> EXCLAMATION MARK
'$' # 0x5B -> DOLLAR SIGN
'*' # 0x5C -> ASTERISK
')' # 0x5D -> RIGHT PARENTHESIS
';' # 0x5E -> SEMICOLON
'\xac' # 0x5F -> NOT SIGN
'-' # 0x60 -> HYPHEN-MINUS
'/' # 0x61 -> SOLIDUS
'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x68 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
'\xa6' # 0x6A -> BROKEN BAR
',' # 0x6B -> COMMA
'%' # 0x6C -> PERCENT SIGN
'_' # 0x6D -> LOW LINE
'>' # 0x6E -> GREATER-THAN SIGN
'?' # 0x6F -> QUESTION MARK
'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
'`' # 0x79 -> GRAVE ACCENT
':' # 0x7A -> COLON
'#' # 0x7B -> NUMBER SIGN
'@' # 0x7C -> COMMERCIAL AT
"'" # 0x7D -> APOSTROPHE
'=' # 0x7E -> EQUALS SIGN
'"' # 0x7F -> QUOTATION MARK
'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
'a' # 0x81 -> LATIN SMALL LETTER A
'b' # 0x82 -> LATIN SMALL LETTER B
'c' # 0x83 -> LATIN SMALL LETTER C
'd' # 0x84 -> LATIN SMALL LETTER D
'e' # 0x85 -> LATIN SMALL LETTER E
'f' # 0x86 -> LATIN SMALL LETTER F
'g' # 0x87 -> LATIN SMALL LETTER G
'h' # 0x88 -> LATIN SMALL LETTER H
'i' # 0x89 -> LATIN SMALL LETTER I
'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xf0' # 0x8C -> LATIN SMALL LETTER ETH (ICELANDIC)
'\xfd' # 0x8D -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0x8E -> LATIN SMALL LETTER THORN (ICELANDIC)
'\xb1' # 0x8F -> PLUS-MINUS SIGN
'\xb0' # 0x90 -> DEGREE SIGN
'j' # 0x91 -> LATIN SMALL LETTER J
'k' # 0x92 -> LATIN SMALL LETTER K
'l' # 0x93 -> LATIN SMALL LETTER L
'm' # 0x94 -> LATIN SMALL LETTER M
'n' # 0x95 -> LATIN SMALL LETTER N
'o' # 0x96 -> LATIN SMALL LETTER O
'p' # 0x97 -> LATIN SMALL LETTER P
'q' # 0x98 -> LATIN SMALL LETTER Q
'r' # 0x99 -> LATIN SMALL LETTER R
'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
'\xb8' # 0x9D -> CEDILLA
'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
'\u20ac' # 0x9F -> EURO SIGN
'\xb5' # 0xA0 -> MICRO SIGN
'~' # 0xA1 -> TILDE
's' # 0xA2 -> LATIN SMALL LETTER S
't' # 0xA3 -> LATIN SMALL LETTER T
'u' # 0xA4 -> LATIN SMALL LETTER U
'v' # 0xA5 -> LATIN SMALL LETTER V
'w' # 0xA6 -> LATIN SMALL LETTER W
'x' # 0xA7 -> LATIN SMALL LETTER X
'y' # 0xA8 -> LATIN SMALL LETTER Y
'z' # 0xA9 -> LATIN SMALL LETTER Z
'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
'\xbf' # 0xAB -> INVERTED QUESTION MARK
'\xd0' # 0xAC -> LATIN CAPITAL LETTER ETH (ICELANDIC)
'\xdd' # 0xAD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xAE -> LATIN CAPITAL LETTER THORN (ICELANDIC)
'\xae' # 0xAF -> REGISTERED SIGN
'^' # 0xB0 -> CIRCUMFLEX ACCENT
'\xa3' # 0xB1 -> POUND SIGN
'\xa5' # 0xB2 -> YEN SIGN
'\xb7' # 0xB3 -> MIDDLE DOT
'\xa9' # 0xB4 -> COPYRIGHT SIGN
'\xa7' # 0xB5 -> SECTION SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
'[' # 0xBA -> LEFT SQUARE BRACKET
']' # 0xBB -> RIGHT SQUARE BRACKET
'\xaf' # 0xBC -> MACRON
'\xa8' # 0xBD -> DIAERESIS
'\xb4' # 0xBE -> ACUTE ACCENT
'\xd7' # 0xBF -> MULTIPLICATION SIGN
'{' # 0xC0 -> LEFT CURLY BRACKET
'A' # 0xC1 -> LATIN CAPITAL LETTER A
'B' # 0xC2 -> LATIN CAPITAL LETTER B
'C' # 0xC3 -> LATIN CAPITAL LETTER C
'D' # 0xC4 -> LATIN CAPITAL LETTER D
'E' # 0xC5 -> LATIN CAPITAL LETTER E
'F' # 0xC6 -> LATIN CAPITAL LETTER F
'G' # 0xC7 -> LATIN CAPITAL LETTER G
'H' # 0xC8 -> LATIN CAPITAL LETTER H
'I' # 0xC9 -> LATIN CAPITAL LETTER I
'\xad' # 0xCA -> SOFT HYPHEN
'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0xCC -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
'}' # 0xD0 -> RIGHT CURLY BRACKET
'J' # 0xD1 -> LATIN CAPITAL LETTER J
'K' # 0xD2 -> LATIN CAPITAL LETTER K
'L' # 0xD3 -> LATIN CAPITAL LETTER L
'M' # 0xD4 -> LATIN CAPITAL LETTER M
'N' # 0xD5 -> LATIN CAPITAL LETTER N
'O' # 0xD6 -> LATIN CAPITAL LETTER O
'P' # 0xD7 -> LATIN CAPITAL LETTER P
'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
'R' # 0xD9 -> LATIN CAPITAL LETTER R
'\xb9' # 0xDA -> SUPERSCRIPT ONE
'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xDC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
'\\' # 0xE0 -> REVERSE SOLIDUS
'\xf7' # 0xE1 -> DIVISION SIGN
'S' # 0xE2 -> LATIN CAPITAL LETTER S
'T' # 0xE3 -> LATIN CAPITAL LETTER T
'U' # 0xE4 -> LATIN CAPITAL LETTER U
'V' # 0xE5 -> LATIN CAPITAL LETTER V
'W' # 0xE6 -> LATIN CAPITAL LETTER W
'X' # 0xE7 -> LATIN CAPITAL LETTER X
'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
'\xb2' # 0xEA -> SUPERSCRIPT TWO
'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd6' # 0xEC -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
'0' # 0xF0 -> DIGIT ZERO
'1' # 0xF1 -> DIGIT ONE
'2' # 0xF2 -> DIGIT TWO
'3' # 0xF3 -> DIGIT THREE
'4' # 0xF4 -> DIGIT FOUR
'5' # 0xF5 -> DIGIT FIVE
'6' # 0xF6 -> DIGIT SIX
'7' # 0xF7 -> DIGIT SEVEN
'8' # 0xF8 -> DIGIT EIGHT
'9' # 0xF9 -> DIGIT NINE
'\xb3' # 0xFA -> SUPERSCRIPT THREE
'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xFC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
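# --- Editor's usage sketch (illustrative; not part of the generated file) ---
# CP1140 is the euro-enabled variant of EBCDIC CP037: position 0x9F, the
# currency sign in CP037, decodes to the EURO SIGN here (see the table above).
# The incremental classes are stateless wrappers over the same tables, so
# chunked input decodes the same as a single call.
import codecs

_text, _ = codecs.charmap_decode(b'\x9f', 'strict', decoding_table)
assert _text == '\u20ac'
_raw, _ = codecs.charmap_encode('\u20ac', 'strict', encoding_table)
assert _raw == b'\x9f'

_dec = IncrementalDecoder('strict')
assert _dec.decode(b'\xc1') + _dec.decode(b'\x9f') == 'A\u20ac'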
""" Python Character Mapping Codec cp1250 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1250.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1250',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\ufffe' # 0x83 -> UNDEFINED
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\ufffe' # 0x88 -> UNDEFINED
'\u2030' # 0x89 -> PER MILLE SIGN
'\u0160' # 0x8A -> LATIN CAPITAL LETTER S WITH CARON
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u015a' # 0x8C -> LATIN CAPITAL LETTER S WITH ACUTE
'\u0164' # 0x8D -> LATIN CAPITAL LETTER T WITH CARON
'\u017d' # 0x8E -> LATIN CAPITAL LETTER Z WITH CARON
'\u0179' # 0x8F -> LATIN CAPITAL LETTER Z WITH ACUTE
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\ufffe' # 0x98 -> UNDEFINED
'\u2122' # 0x99 -> TRADE MARK SIGN
'\u0161' # 0x9A -> LATIN SMALL LETTER S WITH CARON
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u015b' # 0x9C -> LATIN SMALL LETTER S WITH ACUTE
'\u0165' # 0x9D -> LATIN SMALL LETTER T WITH CARON
'\u017e' # 0x9E -> LATIN SMALL LETTER Z WITH CARON
'\u017a' # 0x9F -> LATIN SMALL LETTER Z WITH ACUTE
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u02c7' # 0xA1 -> CARON
'\u02d8' # 0xA2 -> BREVE
'\u0141' # 0xA3 -> LATIN CAPITAL LETTER L WITH STROKE
'\xa4' # 0xA4 -> CURRENCY SIGN
'\u0104' # 0xA5 -> LATIN CAPITAL LETTER A WITH OGONEK
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u015e' # 0xAA -> LATIN CAPITAL LETTER S WITH CEDILLA
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u02db' # 0xB2 -> OGONEK
'\u0142' # 0xB3 -> LATIN SMALL LETTER L WITH STROKE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\u0105' # 0xB9 -> LATIN SMALL LETTER A WITH OGONEK
'\u015f' # 0xBA -> LATIN SMALL LETTER S WITH CEDILLA
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u013d' # 0xBC -> LATIN CAPITAL LETTER L WITH CARON
'\u02dd' # 0xBD -> DOUBLE ACUTE ACCENT
'\u013e' # 0xBE -> LATIN SMALL LETTER L WITH CARON
'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u0154' # 0xC0 -> LATIN CAPITAL LETTER R WITH ACUTE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\u0139' # 0xC5 -> LATIN CAPITAL LETTER L WITH ACUTE
'\u0106' # 0xC6 -> LATIN CAPITAL LETTER C WITH ACUTE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\u011a' # 0xCC -> LATIN CAPITAL LETTER E WITH CARON
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\u010e' # 0xCF -> LATIN CAPITAL LETTER D WITH CARON
'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
'\u0147' # 0xD2 -> LATIN CAPITAL LETTER N WITH CARON
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u0150' # 0xD5 -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\u0158' # 0xD8 -> LATIN CAPITAL LETTER R WITH CARON
'\u016e' # 0xD9 -> LATIN CAPITAL LETTER U WITH RING ABOVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\u0170' # 0xDB -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\u0162' # 0xDE -> LATIN CAPITAL LETTER T WITH CEDILLA
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\u0155' # 0xE0 -> LATIN SMALL LETTER R WITH ACUTE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\u013a' # 0xE5 -> LATIN SMALL LETTER L WITH ACUTE
'\u0107' # 0xE6 -> LATIN SMALL LETTER C WITH ACUTE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\u011b' # 0xEC -> LATIN SMALL LETTER E WITH CARON
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\u010f' # 0xEF -> LATIN SMALL LETTER D WITH CARON
'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
'\u0148' # 0xF2 -> LATIN SMALL LETTER N WITH CARON
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\u0151' # 0xF5 -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\u0159' # 0xF8 -> LATIN SMALL LETTER R WITH CARON
'\u016f' # 0xF9 -> LATIN SMALL LETTER U WITH RING ABOVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\u0171' # 0xFB -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\u0163' # 0xFE -> LATIN SMALL LETTER T WITH CEDILLA
'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
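The 256-entry decoding_table above maps each byte value to a single Unicode character, and codecs.charmap_build inverts it into the map the encoder consults. A minimal round-trip sketch, using a hypothetical identity table rather than the generated one:

import codecs

# Illustrative only: byte value i decodes to chr(i).
identity_table = ''.join(chr(i) for i in range(256))
identity_map = codecs.charmap_build(identity_table)

text, _ = codecs.charmap_decode(b'\x41\x42', 'strict', identity_table)
data, _ = codecs.charmap_encode(text, 'strict', identity_map)
assert (text, data) == ('AB', b'\x41\x42')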
""" Python Character Mapping Codec cp1251 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp1251',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u0402' # 0x80 -> CYRILLIC CAPITAL LETTER DJE
'\u0403' # 0x81 -> CYRILLIC CAPITAL LETTER GJE
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0453' # 0x83 -> CYRILLIC SMALL LETTER GJE
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\u20ac' # 0x88 -> EURO SIGN
'\u2030' # 0x89 -> PER MILLE SIGN
'\u0409' # 0x8A -> CYRILLIC CAPITAL LETTER LJE
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u040a' # 0x8C -> CYRILLIC CAPITAL LETTER NJE
'\u040c' # 0x8D -> CYRILLIC CAPITAL LETTER KJE
'\u040b' # 0x8E -> CYRILLIC CAPITAL LETTER TSHE
'\u040f' # 0x8F -> CYRILLIC CAPITAL LETTER DZHE
'\u0452' # 0x90 -> CYRILLIC SMALL LETTER DJE
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\ufffe' # 0x98 -> UNDEFINED
'\u2122' # 0x99 -> TRADE MARK SIGN
'\u0459' # 0x9A -> CYRILLIC SMALL LETTER LJE
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u045a' # 0x9C -> CYRILLIC SMALL LETTER NJE
'\u045c' # 0x9D -> CYRILLIC SMALL LETTER KJE
'\u045b' # 0x9E -> CYRILLIC SMALL LETTER TSHE
'\u045f' # 0x9F -> CYRILLIC SMALL LETTER DZHE
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u040e' # 0xA1 -> CYRILLIC CAPITAL LETTER SHORT U
'\u045e' # 0xA2 -> CYRILLIC SMALL LETTER SHORT U
'\u0408' # 0xA3 -> CYRILLIC CAPITAL LETTER JE
'\xa4' # 0xA4 -> CURRENCY SIGN
'\u0490' # 0xA5 -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\u0401' # 0xA8 -> CYRILLIC CAPITAL LETTER IO
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u0404' # 0xAA -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\u0407' # 0xAF -> CYRILLIC CAPITAL LETTER YI
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u0406' # 0xB2 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0456' # 0xB3 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0491' # 0xB4 -> CYRILLIC SMALL LETTER GHE WITH UPTURN
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\u0451' # 0xB8 -> CYRILLIC SMALL LETTER IO
'\u2116' # 0xB9 -> NUMERO SIGN
'\u0454' # 0xBA -> CYRILLIC SMALL LETTER UKRAINIAN IE
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u0458' # 0xBC -> CYRILLIC SMALL LETTER JE
'\u0405' # 0xBD -> CYRILLIC CAPITAL LETTER DZE
'\u0455' # 0xBE -> CYRILLIC SMALL LETTER DZE
'\u0457' # 0xBF -> CYRILLIC SMALL LETTER YI
'\u0410' # 0xC0 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0xC1 -> CYRILLIC CAPITAL LETTER BE
'\u0412' # 0xC2 -> CYRILLIC CAPITAL LETTER VE
'\u0413' # 0xC3 -> CYRILLIC CAPITAL LETTER GHE
'\u0414' # 0xC4 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0xC5 -> CYRILLIC CAPITAL LETTER IE
'\u0416' # 0xC6 -> CYRILLIC CAPITAL LETTER ZHE
'\u0417' # 0xC7 -> CYRILLIC CAPITAL LETTER ZE
'\u0418' # 0xC8 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0xC9 -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0xCA -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0xCB -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0xCC -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0xCD -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0xCE -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0xCF -> CYRILLIC CAPITAL LETTER PE
'\u0420' # 0xD0 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0xD1 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0xD2 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0xD3 -> CYRILLIC CAPITAL LETTER U
'\u0424' # 0xD4 -> CYRILLIC CAPITAL LETTER EF
'\u0425' # 0xD5 -> CYRILLIC CAPITAL LETTER HA
'\u0426' # 0xD6 -> CYRILLIC CAPITAL LETTER TSE
'\u0427' # 0xD7 -> CYRILLIC CAPITAL LETTER CHE
'\u0428' # 0xD8 -> CYRILLIC CAPITAL LETTER SHA
'\u0429' # 0xD9 -> CYRILLIC CAPITAL LETTER SHCHA
'\u042a' # 0xDA -> CYRILLIC CAPITAL LETTER HARD SIGN
'\u042b' # 0xDB -> CYRILLIC CAPITAL LETTER YERU
'\u042c' # 0xDC -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042d' # 0xDD -> CYRILLIC CAPITAL LETTER E
'\u042e' # 0xDE -> CYRILLIC CAPITAL LETTER YU
'\u042f' # 0xDF -> CYRILLIC CAPITAL LETTER YA
'\u0430' # 0xE0 -> CYRILLIC SMALL LETTER A
'\u0431' # 0xE1 -> CYRILLIC SMALL LETTER BE
'\u0432' # 0xE2 -> CYRILLIC SMALL LETTER VE
'\u0433' # 0xE3 -> CYRILLIC SMALL LETTER GHE
'\u0434' # 0xE4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0xE5 -> CYRILLIC SMALL LETTER IE
'\u0436' # 0xE6 -> CYRILLIC SMALL LETTER ZHE
'\u0437' # 0xE7 -> CYRILLIC SMALL LETTER ZE
'\u0438' # 0xE8 -> CYRILLIC SMALL LETTER I
'\u0439' # 0xE9 -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0xEA -> CYRILLIC SMALL LETTER KA
'\u043b' # 0xEB -> CYRILLIC SMALL LETTER EL
'\u043c' # 0xEC -> CYRILLIC SMALL LETTER EM
'\u043d' # 0xED -> CYRILLIC SMALL LETTER EN
'\u043e' # 0xEE -> CYRILLIC SMALL LETTER O
'\u043f' # 0xEF -> CYRILLIC SMALL LETTER PE
'\u0440' # 0xF0 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0xF1 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0xF2 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0xF3 -> CYRILLIC SMALL LETTER U
'\u0444' # 0xF4 -> CYRILLIC SMALL LETTER EF
'\u0445' # 0xF5 -> CYRILLIC SMALL LETTER HA
'\u0446' # 0xF6 -> CYRILLIC SMALL LETTER TSE
'\u0447' # 0xF7 -> CYRILLIC SMALL LETTER CHE
'\u0448' # 0xF8 -> CYRILLIC SMALL LETTER SHA
'\u0449' # 0xF9 -> CYRILLIC SMALL LETTER SHCHA
'\u044a' # 0xFA -> CYRILLIC SMALL LETTER HARD SIGN
'\u044b' # 0xFB -> CYRILLIC SMALL LETTER YERU
'\u044c' # 0xFC -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044d' # 0xFD -> CYRILLIC SMALL LETTER E
'\u044e' # 0xFE -> CYRILLIC SMALL LETTER YU
'\u044f' # 0xFF -> CYRILLIC SMALL LETTER YA
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
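When a generated module like the cp1251 codec above lives in the encodings package, the interpreter's codec machinery calls its getregentry() and registers the returned CodecInfo; that registration is what makes the codec reachable by name. A quick check, assuming a standard install where cp1251 is already registered:

import codecs

info = codecs.lookup('cp1251')
print(info.name)                                     # cp1251
print(b'\xcf\xf0\xe8\xe2\xe5\xf2'.decode('cp1251'))  # Привет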
""" Python Character Mapping Codec cp1252 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp1252',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u2030' # 0x89 -> PER MILLE SIGN
'\u0160' # 0x8A -> LATIN CAPITAL LETTER S WITH CARON
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
'\ufffe' # 0x8D -> UNDEFINED
'\u017d' # 0x8E -> LATIN CAPITAL LETTER Z WITH CARON
'\ufffe' # 0x8F -> UNDEFINED
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\u02dc' # 0x98 -> SMALL TILDE
'\u2122' # 0x99 -> TRADE MARK SIGN
'\u0161' # 0x9A -> LATIN SMALL LETTER S WITH CARON
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
'\ufffe' # 0x9D -> UNDEFINED
'\u017e' # 0x9E -> LATIN SMALL LETTER Z WITH CARON
'\u0178' # 0x9F -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0xFE -> LATIN SMALL LETTER THORN
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
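Slots holding '\ufffe' mark bytes the code page leaves undefined: charmap decoding raises on them under errors='strict' and substitutes or skips them under other handlers. A sketch against the registered cp1252 codec:

raw = b'caf\xe9 \x80\x81'  # 0xE9 -> é, 0x80 -> €, 0x81 is UNDEFINED in cp1252
try:
    raw.decode('cp1252')
except UnicodeDecodeError as exc:
    print(exc.start, hex(raw[exc.start]))      # 6 0x81
print(raw.decode('cp1252', errors='replace'))  # café €\ufffd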
""" Python Character Mapping Codec cp1253 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1253.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp1253',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\ufffe' # 0x88 -> UNDEFINED
'\u2030' # 0x89 -> PER MILLE SIGN
'\ufffe' # 0x8A -> UNDEFINED
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\ufffe' # 0x8C -> UNDEFINED
'\ufffe' # 0x8D -> UNDEFINED
'\ufffe' # 0x8E -> UNDEFINED
'\ufffe' # 0x8F -> UNDEFINED
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\ufffe' # 0x98 -> UNDEFINED
'\u2122' # 0x99 -> TRADE MARK SIGN
'\ufffe' # 0x9A -> UNDEFINED
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\ufffe' # 0x9C -> UNDEFINED
'\ufffe' # 0x9D -> UNDEFINED
'\ufffe' # 0x9E -> UNDEFINED
'\ufffe' # 0x9F -> UNDEFINED
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0385' # 0xA1 -> GREEK DIALYTIKA TONOS
'\u0386' # 0xA2 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\ufffe' # 0xAA -> UNDEFINED
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\u2015' # 0xAF -> HORIZONTAL BAR
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\u0384' # 0xB4 -> GREEK TONOS
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\u0388' # 0xB8 -> GREEK CAPITAL LETTER EPSILON WITH TONOS
'\u0389' # 0xB9 -> GREEK CAPITAL LETTER ETA WITH TONOS
'\u038a' # 0xBA -> GREEK CAPITAL LETTER IOTA WITH TONOS
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u038c' # 0xBC -> GREEK CAPITAL LETTER OMICRON WITH TONOS
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\u038e' # 0xBE -> GREEK CAPITAL LETTER UPSILON WITH TONOS
'\u038f' # 0xBF -> GREEK CAPITAL LETTER OMEGA WITH TONOS
'\u0390' # 0xC0 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
'\u0391' # 0xC1 -> GREEK CAPITAL LETTER ALPHA
'\u0392' # 0xC2 -> GREEK CAPITAL LETTER BETA
'\u0393' # 0xC3 -> GREEK CAPITAL LETTER GAMMA
'\u0394' # 0xC4 -> GREEK CAPITAL LETTER DELTA
'\u0395' # 0xC5 -> GREEK CAPITAL LETTER EPSILON
'\u0396' # 0xC6 -> GREEK CAPITAL LETTER ZETA
'\u0397' # 0xC7 -> GREEK CAPITAL LETTER ETA
'\u0398' # 0xC8 -> GREEK CAPITAL LETTER THETA
'\u0399' # 0xC9 -> GREEK CAPITAL LETTER IOTA
'\u039a' # 0xCA -> GREEK CAPITAL LETTER KAPPA
'\u039b' # 0xCB -> GREEK CAPITAL LETTER LAMDA
'\u039c' # 0xCC -> GREEK CAPITAL LETTER MU
'\u039d' # 0xCD -> GREEK CAPITAL LETTER NU
'\u039e' # 0xCE -> GREEK CAPITAL LETTER XI
'\u039f' # 0xCF -> GREEK CAPITAL LETTER OMICRON
'\u03a0' # 0xD0 -> GREEK CAPITAL LETTER PI
'\u03a1' # 0xD1 -> GREEK CAPITAL LETTER RHO
'\ufffe' # 0xD2 -> UNDEFINED
'\u03a3' # 0xD3 -> GREEK CAPITAL LETTER SIGMA
'\u03a4' # 0xD4 -> GREEK CAPITAL LETTER TAU
'\u03a5' # 0xD5 -> GREEK CAPITAL LETTER UPSILON
'\u03a6' # 0xD6 -> GREEK CAPITAL LETTER PHI
'\u03a7' # 0xD7 -> GREEK CAPITAL LETTER CHI
'\u03a8' # 0xD8 -> GREEK CAPITAL LETTER PSI
'\u03a9' # 0xD9 -> GREEK CAPITAL LETTER OMEGA
'\u03aa' # 0xDA -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
'\u03ab' # 0xDB -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
'\u03ac' # 0xDC -> GREEK SMALL LETTER ALPHA WITH TONOS
'\u03ad' # 0xDD -> GREEK SMALL LETTER EPSILON WITH TONOS
'\u03ae' # 0xDE -> GREEK SMALL LETTER ETA WITH TONOS
'\u03af' # 0xDF -> GREEK SMALL LETTER IOTA WITH TONOS
'\u03b0' # 0xE0 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
'\u03b1' # 0xE1 -> GREEK SMALL LETTER ALPHA
'\u03b2' # 0xE2 -> GREEK SMALL LETTER BETA
'\u03b3' # 0xE3 -> GREEK SMALL LETTER GAMMA
'\u03b4' # 0xE4 -> GREEK SMALL LETTER DELTA
'\u03b5' # 0xE5 -> GREEK SMALL LETTER EPSILON
'\u03b6' # 0xE6 -> GREEK SMALL LETTER ZETA
'\u03b7' # 0xE7 -> GREEK SMALL LETTER ETA
'\u03b8' # 0xE8 -> GREEK SMALL LETTER THETA
'\u03b9' # 0xE9 -> GREEK SMALL LETTER IOTA
'\u03ba' # 0xEA -> GREEK SMALL LETTER KAPPA
'\u03bb' # 0xEB -> GREEK SMALL LETTER LAMDA
'\u03bc' # 0xEC -> GREEK SMALL LETTER MU
'\u03bd' # 0xED -> GREEK SMALL LETTER NU
'\u03be' # 0xEE -> GREEK SMALL LETTER XI
'\u03bf' # 0xEF -> GREEK SMALL LETTER OMICRON
'\u03c0' # 0xF0 -> GREEK SMALL LETTER PI
'\u03c1' # 0xF1 -> GREEK SMALL LETTER RHO
'\u03c2' # 0xF2 -> GREEK SMALL LETTER FINAL SIGMA
'\u03c3' # 0xF3 -> GREEK SMALL LETTER SIGMA
'\u03c4' # 0xF4 -> GREEK SMALL LETTER TAU
'\u03c5' # 0xF5 -> GREEK SMALL LETTER UPSILON
'\u03c6' # 0xF6 -> GREEK SMALL LETTER PHI
'\u03c7' # 0xF7 -> GREEK SMALL LETTER CHI
'\u03c8' # 0xF8 -> GREEK SMALL LETTER PSI
'\u03c9' # 0xF9 -> GREEK SMALL LETTER OMEGA
'\u03ca' # 0xFA -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
'\u03cb' # 0xFB -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
'\u03cc' # 0xFC -> GREEK SMALL LETTER OMICRON WITH TONOS
'\u03cd' # 0xFD -> GREEK SMALL LETTER UPSILON WITH TONOS
'\u03ce' # 0xFE -> GREEK SMALL LETTER OMEGA WITH TONOS
'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
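Encoding walks the inverted table the same way, so characters with no slot in the code page raise UnicodeEncodeError unless an error handler intervenes. A sketch with the Greek (cp1253) table above:

print('αβγ€'.encode('cp1253'))                   # b'\xe1\xe2\xe3\x80'
print('αβγд'.encode('cp1253', errors='ignore'))  # Cyrillic д is dropped: b'\xe1\xe2\xe3'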
""" Python Character Mapping Codec cp1254 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1254.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp1254',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u2030' # 0x89 -> PER MILLE SIGN
'\u0160' # 0x8A -> LATIN CAPITAL LETTER S WITH CARON
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
'\ufffe' # 0x8D -> UNDEFINED
'\ufffe' # 0x8E -> UNDEFINED
'\ufffe' # 0x8F -> UNDEFINED
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\u02dc' # 0x98 -> SMALL TILDE
'\u2122' # 0x99 -> TRADE MARK SIGN
'\u0161' # 0x9A -> LATIN SMALL LETTER S WITH CARON
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
'\ufffe' # 0x9D -> UNDEFINED
'\ufffe' # 0x9E -> UNDEFINED
'\u0178' # 0x9F -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u011e' # 0xD0 -> LATIN CAPITAL LETTER G WITH BREVE
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u0130' # 0xDD -> LATIN CAPITAL LETTER I WITH DOT ABOVE
'\u015e' # 0xDE -> LATIN CAPITAL LETTER S WITH CEDILLA
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\u011f' # 0xF0 -> LATIN SMALL LETTER G WITH BREVE
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u0131' # 0xFD -> LATIN SMALL LETTER DOTLESS I
'\u015f' # 0xFE -> LATIN SMALL LETTER S WITH CEDILLA
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
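In the 0xA0-0xFF range the cp1254 table above matches Latin-1 except for six slots (0xD0, 0xDD, 0xDE, 0xF0, 0xFD, 0xFE) reassigned to the Turkish letters Ğ, İ, Ş, ğ, ı, ş. A one-line sketch:

print('İstanbul ığdır'.encode('cp1254'))  # b'\xddstanbul \xfd\xf0d\xfdr'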
""" Python Character Mapping Codec cp1255 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1255.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp1255',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u2030' # 0x89 -> PER MILLE SIGN
'\ufffe' # 0x8A -> UNDEFINED
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\ufffe' # 0x8C -> UNDEFINED
'\ufffe' # 0x8D -> UNDEFINED
'\ufffe' # 0x8E -> UNDEFINED
'\ufffe' # 0x8F -> UNDEFINED
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\u02dc' # 0x98 -> SMALL TILDE
'\u2122' # 0x99 -> TRADE MARK SIGN
'\ufffe' # 0x9A -> UNDEFINED
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\ufffe' # 0x9C -> UNDEFINED
'\ufffe' # 0x9D -> UNDEFINED
'\ufffe' # 0x9E -> UNDEFINED
'\ufffe' # 0x9F -> UNDEFINED
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\u20aa' # 0xA4 -> NEW SHEQEL SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xd7' # 0xAA -> MULTIPLICATION SIGN
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xf7' # 0xBA -> DIVISION SIGN
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\u05b0' # 0xC0 -> HEBREW POINT SHEVA
'\u05b1' # 0xC1 -> HEBREW POINT HATAF SEGOL
'\u05b2' # 0xC2 -> HEBREW POINT HATAF PATAH
'\u05b3' # 0xC3 -> HEBREW POINT HATAF QAMATS
'\u05b4' # 0xC4 -> HEBREW POINT HIRIQ
'\u05b5' # 0xC5 -> HEBREW POINT TSERE
'\u05b6' # 0xC6 -> HEBREW POINT SEGOL
'\u05b7' # 0xC7 -> HEBREW POINT PATAH
'\u05b8' # 0xC8 -> HEBREW POINT QAMATS
'\u05b9' # 0xC9 -> HEBREW POINT HOLAM
'\ufffe' # 0xCA -> UNDEFINED
'\u05bb' # 0xCB -> HEBREW POINT QUBUTS
'\u05bc' # 0xCC -> HEBREW POINT DAGESH OR MAPIQ
'\u05bd' # 0xCD -> HEBREW POINT METEG
'\u05be' # 0xCE -> HEBREW PUNCTUATION MAQAF
'\u05bf' # 0xCF -> HEBREW POINT RAFE
'\u05c0' # 0xD0 -> HEBREW PUNCTUATION PASEQ
'\u05c1' # 0xD1 -> HEBREW POINT SHIN DOT
'\u05c2' # 0xD2 -> HEBREW POINT SIN DOT
'\u05c3' # 0xD3 -> HEBREW PUNCTUATION SOF PASUQ
'\u05f0' # 0xD4 -> HEBREW LIGATURE YIDDISH DOUBLE VAV
'\u05f1' # 0xD5 -> HEBREW LIGATURE YIDDISH VAV YOD
'\u05f2' # 0xD6 -> HEBREW LIGATURE YIDDISH DOUBLE YOD
'\u05f3' # 0xD7 -> HEBREW PUNCTUATION GERESH
'\u05f4' # 0xD8 -> HEBREW PUNCTUATION GERSHAYIM
'\ufffe' # 0xD9 -> UNDEFINED
'\ufffe' # 0xDA -> UNDEFINED
'\ufffe' # 0xDB -> UNDEFINED
'\ufffe' # 0xDC -> UNDEFINED
'\ufffe' # 0xDD -> UNDEFINED
'\ufffe' # 0xDE -> UNDEFINED
'\ufffe' # 0xDF -> UNDEFINED
'\u05d0' # 0xE0 -> HEBREW LETTER ALEF
'\u05d1' # 0xE1 -> HEBREW LETTER BET
'\u05d2' # 0xE2 -> HEBREW LETTER GIMEL
'\u05d3' # 0xE3 -> HEBREW LETTER DALET
'\u05d4' # 0xE4 -> HEBREW LETTER HE
'\u05d5' # 0xE5 -> HEBREW LETTER VAV
'\u05d6' # 0xE6 -> HEBREW LETTER ZAYIN
'\u05d7' # 0xE7 -> HEBREW LETTER HET
'\u05d8' # 0xE8 -> HEBREW LETTER TET
'\u05d9' # 0xE9 -> HEBREW LETTER YOD
'\u05da' # 0xEA -> HEBREW LETTER FINAL KAF
'\u05db' # 0xEB -> HEBREW LETTER KAF
'\u05dc' # 0xEC -> HEBREW LETTER LAMED
'\u05dd' # 0xED -> HEBREW LETTER FINAL MEM
'\u05de' # 0xEE -> HEBREW LETTER MEM
'\u05df' # 0xEF -> HEBREW LETTER FINAL NUN
'\u05e0' # 0xF0 -> HEBREW LETTER NUN
'\u05e1' # 0xF1 -> HEBREW LETTER SAMEKH
'\u05e2' # 0xF2 -> HEBREW LETTER AYIN
'\u05e3' # 0xF3 -> HEBREW LETTER FINAL PE
'\u05e4' # 0xF4 -> HEBREW LETTER PE
'\u05e5' # 0xF5 -> HEBREW LETTER FINAL TSADI
'\u05e6' # 0xF6 -> HEBREW LETTER TSADI
'\u05e7' # 0xF7 -> HEBREW LETTER QOF
'\u05e8' # 0xF8 -> HEBREW LETTER RESH
'\u05e9' # 0xF9 -> HEBREW LETTER SHIN
'\u05ea' # 0xFA -> HEBREW LETTER TAV
'\ufffe' # 0xFB -> UNDEFINED
'\ufffe' # 0xFC -> UNDEFINED
'\u200e' # 0xFD -> LEFT-TO-RIGHT MARK
'\u200f' # 0xFE -> RIGHT-TO-LEFT MARK
'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
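# ---------------------------------------------------------------------------
# Editor's note: a minimal usage sketch for the cp1255 (Windows Hebrew) codec
# above; it is not part of the gencodec.py output. Because cp1255 ships with
# the standard library, the alias lookup below works in any stock Python 3.
# Bytes whose table entry is '\ufffe' (marked UNDEFINED) have no mapping and
# raise UnicodeDecodeError under the default 'strict' error handler.
if __name__ == '__main__':
    raw = b'\xf9\xec\xe5\xed'                    # shin, lamed, vav, final mem
    text = raw.decode('cp1255')                  # '\u05e9\u05dc\u05d5\u05dd'
    assert text == '\u05e9\u05dc\u05d5\u05dd'    # 'shalom'
    assert text.encode('cp1255') == raw          # table round-trips cleanly
# ---------------------------------------------------------------------------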
""" Python Character Mapping Codec cp1256 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1256.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1256',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\u067e' # 0x81 -> ARABIC LETTER PEH
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u2030' # 0x89 -> PER MILLE SIGN
'\u0679' # 0x8A -> ARABIC LETTER TTEH
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
'\u0686' # 0x8D -> ARABIC LETTER TCHEH
'\u0698' # 0x8E -> ARABIC LETTER JEH
'\u0688' # 0x8F -> ARABIC LETTER DDAL
'\u06af' # 0x90 -> ARABIC LETTER GAF
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\u06a9' # 0x98 -> ARABIC LETTER KEHEH
'\u2122' # 0x99 -> TRADE MARK SIGN
'\u0691' # 0x9A -> ARABIC LETTER RREH
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
'\u200c' # 0x9D -> ZERO WIDTH NON-JOINER
'\u200d' # 0x9E -> ZERO WIDTH JOINER
'\u06ba' # 0x9F -> ARABIC LETTER NOON GHUNNA
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u060c' # 0xA1 -> ARABIC COMMA
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u06be' # 0xAA -> ARABIC LETTER HEH DOACHASHMEE
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\u061b' # 0xBA -> ARABIC SEMICOLON
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\u061f' # 0xBF -> ARABIC QUESTION MARK
'\u06c1' # 0xC0 -> ARABIC LETTER HEH GOAL
'\u0621' # 0xC1 -> ARABIC LETTER HAMZA
'\u0622' # 0xC2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
'\u0623' # 0xC3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
'\u0624' # 0xC4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
'\u0625' # 0xC5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
'\u0626' # 0xC6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
'\u0627' # 0xC7 -> ARABIC LETTER ALEF
'\u0628' # 0xC8 -> ARABIC LETTER BEH
'\u0629' # 0xC9 -> ARABIC LETTER TEH MARBUTA
'\u062a' # 0xCA -> ARABIC LETTER TEH
'\u062b' # 0xCB -> ARABIC LETTER THEH
'\u062c' # 0xCC -> ARABIC LETTER JEEM
'\u062d' # 0xCD -> ARABIC LETTER HAH
'\u062e' # 0xCE -> ARABIC LETTER KHAH
'\u062f' # 0xCF -> ARABIC LETTER DAL
'\u0630' # 0xD0 -> ARABIC LETTER THAL
'\u0631' # 0xD1 -> ARABIC LETTER REH
'\u0632' # 0xD2 -> ARABIC LETTER ZAIN
'\u0633' # 0xD3 -> ARABIC LETTER SEEN
'\u0634' # 0xD4 -> ARABIC LETTER SHEEN
'\u0635' # 0xD5 -> ARABIC LETTER SAD
'\u0636' # 0xD6 -> ARABIC LETTER DAD
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\u0637' # 0xD8 -> ARABIC LETTER TAH
'\u0638' # 0xD9 -> ARABIC LETTER ZAH
'\u0639' # 0xDA -> ARABIC LETTER AIN
'\u063a' # 0xDB -> ARABIC LETTER GHAIN
'\u0640' # 0xDC -> ARABIC TATWEEL
'\u0641' # 0xDD -> ARABIC LETTER FEH
'\u0642' # 0xDE -> ARABIC LETTER QAF
'\u0643' # 0xDF -> ARABIC LETTER KAF
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\u0644' # 0xE1 -> ARABIC LETTER LAM
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\u0645' # 0xE3 -> ARABIC LETTER MEEM
'\u0646' # 0xE4 -> ARABIC LETTER NOON
'\u0647' # 0xE5 -> ARABIC LETTER HEH
'\u0648' # 0xE6 -> ARABIC LETTER WAW
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\u0649' # 0xEC -> ARABIC LETTER ALEF MAKSURA
'\u064a' # 0xED -> ARABIC LETTER YEH
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\u064b' # 0xF0 -> ARABIC FATHATAN
'\u064c' # 0xF1 -> ARABIC DAMMATAN
'\u064d' # 0xF2 -> ARABIC KASRATAN
'\u064e' # 0xF3 -> ARABIC FATHA
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\u064f' # 0xF5 -> ARABIC DAMMA
'\u0650' # 0xF6 -> ARABIC KASRA
'\xf7' # 0xF7 -> DIVISION SIGN
'\u0651' # 0xF8 -> ARABIC SHADDA
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\u0652' # 0xFA -> ARABIC SUKUN
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u200e' # 0xFD -> LEFT-TO-RIGHT MARK
'\u200f' # 0xFE -> RIGHT-TO-LEFT MARK
'\u06d2' # 0xFF -> ARABIC LETTER YEH BARREE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
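# ---------------------------------------------------------------------------
# Editor's note: illustrative round trip for the cp1256 (Windows Arabic)
# codec above; a sketch, not part of the generated module. Per the decoding
# table, 0xC7, 0xE1, 0xD3 and 0xE3 map to ALEF, LAM, SEEN and MEEM, so:
if __name__ == '__main__':
    raw = b'\xc7\xe1\xd3\xe1\xc7\xe3'            # 'as-salam' in CP1256 bytes
    text = raw.decode('cp1256')
    assert text == '\u0627\u0644\u0633\u0644\u0627\u0645'
    assert text.encode('cp1256') == raw          # table round-trips cleanly
# ---------------------------------------------------------------------------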
""" Python Character Mapping Codec cp1257 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1257.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1257',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\ufffe' # 0x83 -> UNDEFINED
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\ufffe' # 0x88 -> UNDEFINED
'\u2030' # 0x89 -> PER MILLE SIGN
'\ufffe' # 0x8A -> UNDEFINED
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\ufffe' # 0x8C -> UNDEFINED
'\xa8' # 0x8D -> DIAERESIS
'\u02c7' # 0x8E -> CARON
'\xb8' # 0x8F -> CEDILLA
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\ufffe' # 0x98 -> UNDEFINED
'\u2122' # 0x99 -> TRADE MARK SIGN
'\ufffe' # 0x9A -> UNDEFINED
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\ufffe' # 0x9C -> UNDEFINED
'\xaf' # 0x9D -> MACRON
'\u02db' # 0x9E -> OGONEK
'\ufffe' # 0x9F -> UNDEFINED
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\ufffe' # 0xA1 -> UNDEFINED
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\ufffe' # 0xA5 -> UNDEFINED
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xd8' # 0xA8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u0156' # 0xAA -> LATIN CAPITAL LETTER R WITH CEDILLA
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xc6' # 0xAF -> LATIN CAPITAL LETTER AE
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xf8' # 0xB8 -> LATIN SMALL LETTER O WITH STROKE
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\u0157' # 0xBA -> LATIN SMALL LETTER R WITH CEDILLA
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xe6' # 0xBF -> LATIN SMALL LETTER AE
'\u0104' # 0xC0 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u012e' # 0xC1 -> LATIN CAPITAL LETTER I WITH OGONEK
'\u0100' # 0xC2 -> LATIN CAPITAL LETTER A WITH MACRON
'\u0106' # 0xC3 -> LATIN CAPITAL LETTER C WITH ACUTE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\u0118' # 0xC6 -> LATIN CAPITAL LETTER E WITH OGONEK
'\u0112' # 0xC7 -> LATIN CAPITAL LETTER E WITH MACRON
'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0179' # 0xCA -> LATIN CAPITAL LETTER Z WITH ACUTE
'\u0116' # 0xCB -> LATIN CAPITAL LETTER E WITH DOT ABOVE
'\u0122' # 0xCC -> LATIN CAPITAL LETTER G WITH CEDILLA
'\u0136' # 0xCD -> LATIN CAPITAL LETTER K WITH CEDILLA
'\u012a' # 0xCE -> LATIN CAPITAL LETTER I WITH MACRON
'\u013b' # 0xCF -> LATIN CAPITAL LETTER L WITH CEDILLA
'\u0160' # 0xD0 -> LATIN CAPITAL LETTER S WITH CARON
'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
'\u0145' # 0xD2 -> LATIN CAPITAL LETTER N WITH CEDILLA
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\u014c' # 0xD4 -> LATIN CAPITAL LETTER O WITH MACRON
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\u0172' # 0xD8 -> LATIN CAPITAL LETTER U WITH OGONEK
'\u0141' # 0xD9 -> LATIN CAPITAL LETTER L WITH STROKE
'\u015a' # 0xDA -> LATIN CAPITAL LETTER S WITH ACUTE
'\u016a' # 0xDB -> LATIN CAPITAL LETTER U WITH MACRON
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u017b' # 0xDD -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\u017d' # 0xDE -> LATIN CAPITAL LETTER Z WITH CARON
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\u0105' # 0xE0 -> LATIN SMALL LETTER A WITH OGONEK
'\u012f' # 0xE1 -> LATIN SMALL LETTER I WITH OGONEK
'\u0101' # 0xE2 -> LATIN SMALL LETTER A WITH MACRON
'\u0107' # 0xE3 -> LATIN SMALL LETTER C WITH ACUTE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\u0119' # 0xE6 -> LATIN SMALL LETTER E WITH OGONEK
'\u0113' # 0xE7 -> LATIN SMALL LETTER E WITH MACRON
'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\u017a' # 0xEA -> LATIN SMALL LETTER Z WITH ACUTE
'\u0117' # 0xEB -> LATIN SMALL LETTER E WITH DOT ABOVE
'\u0123' # 0xEC -> LATIN SMALL LETTER G WITH CEDILLA
'\u0137' # 0xED -> LATIN SMALL LETTER K WITH CEDILLA
'\u012b' # 0xEE -> LATIN SMALL LETTER I WITH MACRON
'\u013c' # 0xEF -> LATIN SMALL LETTER L WITH CEDILLA
'\u0161' # 0xF0 -> LATIN SMALL LETTER S WITH CARON
'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
'\u0146' # 0xF2 -> LATIN SMALL LETTER N WITH CEDILLA
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\u014d' # 0xF4 -> LATIN SMALL LETTER O WITH MACRON
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\u0173' # 0xF8 -> LATIN SMALL LETTER U WITH OGONEK
'\u0142' # 0xF9 -> LATIN SMALL LETTER L WITH STROKE
'\u015b' # 0xFA -> LATIN SMALL LETTER S WITH ACUTE
'\u016b' # 0xFB -> LATIN SMALL LETTER U WITH MACRON
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u017c' # 0xFD -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u017e' # 0xFE -> LATIN SMALL LETTER Z WITH CARON
'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
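# ---------------------------------------------------------------------------
# Editor's note: sketch showing how the '\ufffe' placeholders behave for the
# cp1257 (Windows Baltic) codec above; not part of the generated module.
# 0xA1 is UNDEFINED in the decoding table, so strict decoding fails while
# the 'replace' handler substitutes U+FFFD REPLACEMENT CHARACTER.
if __name__ == '__main__':
    try:
        b'\xa1'.decode('cp1257')
    except UnicodeDecodeError:
        pass                                     # expected: no mapping
    assert b'\xa1'.decode('cp1257', 'replace') == '\ufffd'
    assert b'\xf0'.decode('cp1257') == '\u0161'  # 0xF0 -> s with caron
# ---------------------------------------------------------------------------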
""" Python Character Mapping Codec cp1258 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1258.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp1258',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u2030' # 0x89 -> PER MILLE SIGN
'\ufffe' # 0x8A -> UNDEFINED
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
'\ufffe' # 0x8D -> UNDEFINED
'\ufffe' # 0x8E -> UNDEFINED
'\ufffe' # 0x8F -> UNDEFINED
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\u02dc' # 0x98 -> SMALL TILDE
'\u2122' # 0x99 -> TRADE MARK SIGN
'\ufffe' # 0x9A -> UNDEFINED
'\u203a' # 0x9B -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
'\ufffe' # 0x9D -> UNDEFINED
'\ufffe' # 0x9E -> UNDEFINED
'\u0178' # 0x9F -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\u0300' # 0xCC -> COMBINING GRAVE ACCENT
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\u0309' # 0xD2 -> COMBINING HOOK ABOVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u01a0' # 0xD5 -> LATIN CAPITAL LETTER O WITH HORN
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u01af' # 0xDD -> LATIN CAPITAL LETTER U WITH HORN
'\u0303' # 0xDE -> COMBINING TILDE
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\u0301' # 0xEC -> COMBINING ACUTE ACCENT
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\u0323' # 0xF2 -> COMBINING DOT BELOW
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\u01a1' # 0xF5 -> LATIN SMALL LETTER O WITH HORN
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u01b0' # 0xFD -> LATIN SMALL LETTER U WITH HORN
'\u20ab' # 0xFE -> DONG SIGN
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
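# ---------------------------------------------------------------------------
# Editor's note: sketch, not part of the generated module. cp1258 (Windows
# Vietnamese) encodes some accented letters as base letter plus a combining
# mark; 0xEC above is COMBINING ACUTE ACCENT, so decoded text comes out in
# decomposed form and may need unicodedata.normalize('NFC', ...) before
# comparing against precomposed strings.
if __name__ == '__main__':
    import unicodedata
    text = b'a\xec'.decode('cp1258')             # 'a' followed by U+0301
    assert text == 'a\u0301'
    assert unicodedata.normalize('NFC', text) == '\xe1'  # precomposed a-acute
# ---------------------------------------------------------------------------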
""" Python Character Mapping Codec cp273 generated from 'python-mappings/CP273.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp273',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL (NUL)
'\x01' # 0x01 -> START OF HEADING (SOH)
'\x02' # 0x02 -> START OF TEXT (STX)
'\x03' # 0x03 -> END OF TEXT (ETX)
'\x9c' # 0x04 -> STRING TERMINATOR (ST)
'\t' # 0x05 -> CHARACTER TABULATION (HT)
'\x86' # 0x06 -> START OF SELECTED AREA (SSA)
'\x7f' # 0x07 -> DELETE (DEL)
'\x97' # 0x08 -> END OF GUARDED AREA (EPA)
'\x8d' # 0x09 -> REVERSE LINE FEED (RI)
'\x8e' # 0x0A -> SINGLE-SHIFT TWO (SS2)
'\x0b' # 0x0B -> LINE TABULATION (VT)
'\x0c' # 0x0C -> FORM FEED (FF)
'\r' # 0x0D -> CARRIAGE RETURN (CR)
'\x0e' # 0x0E -> SHIFT OUT (SO)
'\x0f' # 0x0F -> SHIFT IN (SI)
'\x10' # 0x10 -> DATALINK ESCAPE (DLE)
'\x11' # 0x11 -> DEVICE CONTROL ONE (DC1)
'\x12' # 0x12 -> DEVICE CONTROL TWO (DC2)
'\x13' # 0x13 -> DEVICE CONTROL THREE (DC3)
'\x9d' # 0x14 -> OPERATING SYSTEM COMMAND (OSC)
'\x85' # 0x15 -> NEXT LINE (NEL)
'\x08' # 0x16 -> BACKSPACE (BS)
'\x87' # 0x17 -> END OF SELECTED AREA (ESA)
'\x18' # 0x18 -> CANCEL (CAN)
'\x19' # 0x19 -> END OF MEDIUM (EM)
'\x92' # 0x1A -> PRIVATE USE TWO (PU2)
'\x8f' # 0x1B -> SINGLE-SHIFT THREE (SS3)
'\x1c' # 0x1C -> FILE SEPARATOR (IS4)
'\x1d' # 0x1D -> GROUP SEPARATOR (IS3)
'\x1e' # 0x1E -> RECORD SEPARATOR (IS2)
'\x1f' # 0x1F -> UNIT SEPARATOR (IS1)
'\x80' # 0x20 -> PADDING CHARACTER (PAD)
'\x81' # 0x21 -> HIGH OCTET PRESET (HOP)
'\x82' # 0x22 -> BREAK PERMITTED HERE (BPH)
'\x83' # 0x23 -> NO BREAK HERE (NBH)
'\x84' # 0x24 -> INDEX (IND)
'\n' # 0x25 -> LINE FEED (LF)
'\x17' # 0x26 -> END OF TRANSMISSION BLOCK (ETB)
'\x1b' # 0x27 -> ESCAPE (ESC)
'\x88' # 0x28 -> CHARACTER TABULATION SET (HTS)
'\x89' # 0x29 -> CHARACTER TABULATION WITH JUSTIFICATION (HTJ)
'\x8a' # 0x2A -> LINE TABULATION SET (VTS)
'\x8b' # 0x2B -> PARTIAL LINE FORWARD (PLD)
'\x8c' # 0x2C -> PARTIAL LINE BACKWARD (PLU)
'\x05' # 0x2D -> ENQUIRY (ENQ)
'\x06' # 0x2E -> ACKNOWLEDGE (ACK)
'\x07' # 0x2F -> BELL (BEL)
'\x90' # 0x30 -> DEVICE CONTROL STRING (DCS)
'\x91' # 0x31 -> PRIVATE USE ONE (PU1)
'\x16' # 0x32 -> SYNCHRONOUS IDLE (SYN)
'\x93' # 0x33 -> SET TRANSMIT STATE (STS)
'\x94' # 0x34 -> CANCEL CHARACTER (CCH)
'\x95' # 0x35 -> MESSAGE WAITING (MW)
'\x96' # 0x36 -> START OF GUARDED AREA (SPA)
'\x04' # 0x37 -> END OF TRANSMISSION (EOT)
'\x98' # 0x38 -> START OF STRING (SOS)
'\x99' # 0x39 -> SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
'\x9a' # 0x3A -> SINGLE CHARACTER INTRODUCER (SCI)
'\x9b' # 0x3B -> CONTROL SEQUENCE INTRODUCER (CSI)
'\x14' # 0x3C -> DEVICE CONTROL FOUR (DC4)
'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE (NAK)
'\x9e' # 0x3E -> PRIVACY MESSAGE (PM)
'\x1a' # 0x3F -> SUBSTITUTE (SUB)
' ' # 0x40 -> SPACE
'\xa0' # 0x41 -> NO-BREAK SPACE
'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'{' # 0x43 -> LEFT CURLY BRACKET
'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x48 -> LATIN SMALL LETTER C WITH CEDILLA
'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
'\xc4' # 0x4A -> LATIN CAPITAL LETTER A WITH DIAERESIS
'.' # 0x4B -> FULL STOP
'<' # 0x4C -> LESS-THAN SIGN
'(' # 0x4D -> LEFT PARENTHESIS
'+' # 0x4E -> PLUS SIGN
'!' # 0x4F -> EXCLAMATION MARK
'&' # 0x50 -> AMPERSAND
'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
'~' # 0x59 -> TILDE
'\xdc' # 0x5A -> LATIN CAPITAL LETTER U WITH DIAERESIS
'$' # 0x5B -> DOLLAR SIGN
'*' # 0x5C -> ASTERISK
')' # 0x5D -> RIGHT PARENTHESIS
';' # 0x5E -> SEMICOLON
'^' # 0x5F -> CIRCUMFLEX ACCENT
'-' # 0x60 -> HYPHEN-MINUS
'/' # 0x61 -> SOLIDUS
'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'[' # 0x63 -> LEFT SQUARE BRACKET
'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x68 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
'\xf6' # 0x6A -> LATIN SMALL LETTER O WITH DIAERESIS
',' # 0x6B -> COMMA
'%' # 0x6C -> PERCENT SIGN
'_' # 0x6D -> LOW LINE
'>' # 0x6E -> GREATER-THAN SIGN
'?' # 0x6F -> QUESTION MARK
'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
'`' # 0x79 -> GRAVE ACCENT
':' # 0x7A -> COLON
'#' # 0x7B -> NUMBER SIGN
'\xa7' # 0x7C -> SECTION SIGN
"'" # 0x7D -> APOSTROPHE
'=' # 0x7E -> EQUALS SIGN
'"' # 0x7F -> QUOTATION MARK
'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
'a' # 0x81 -> LATIN SMALL LETTER A
'b' # 0x82 -> LATIN SMALL LETTER B
'c' # 0x83 -> LATIN SMALL LETTER C
'd' # 0x84 -> LATIN SMALL LETTER D
'e' # 0x85 -> LATIN SMALL LETTER E
'f' # 0x86 -> LATIN SMALL LETTER F
'g' # 0x87 -> LATIN SMALL LETTER G
'h' # 0x88 -> LATIN SMALL LETTER H
'i' # 0x89 -> LATIN SMALL LETTER I
'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xf0' # 0x8C -> LATIN SMALL LETTER ETH (Icelandic)
'\xfd' # 0x8D -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0x8E -> LATIN SMALL LETTER THORN (Icelandic)
'\xb1' # 0x8F -> PLUS-MINUS SIGN
'\xb0' # 0x90 -> DEGREE SIGN
'j' # 0x91 -> LATIN SMALL LETTER J
'k' # 0x92 -> LATIN SMALL LETTER K
'l' # 0x93 -> LATIN SMALL LETTER L
'm' # 0x94 -> LATIN SMALL LETTER M
'n' # 0x95 -> LATIN SMALL LETTER N
'o' # 0x96 -> LATIN SMALL LETTER O
'p' # 0x97 -> LATIN SMALL LETTER P
'q' # 0x98 -> LATIN SMALL LETTER Q
'r' # 0x99 -> LATIN SMALL LETTER R
'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
'\xe6' # 0x9C -> LATIN SMALL LETTER AE
'\xb8' # 0x9D -> CEDILLA
'\xc6' # 0x9E -> LATIN CAPITAL LETTER AE
'\xa4' # 0x9F -> CURRENCY SIGN
'\xb5' # 0xA0 -> MICRO SIGN
'\xdf' # 0xA1 -> LATIN SMALL LETTER SHARP S (German)
's' # 0xA2 -> LATIN SMALL LETTER S
't' # 0xA3 -> LATIN SMALL LETTER T
'u' # 0xA4 -> LATIN SMALL LETTER U
'v' # 0xA5 -> LATIN SMALL LETTER V
'w' # 0xA6 -> LATIN SMALL LETTER W
'x' # 0xA7 -> LATIN SMALL LETTER X
'y' # 0xA8 -> LATIN SMALL LETTER Y
'z' # 0xA9 -> LATIN SMALL LETTER Z
'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
'\xbf' # 0xAB -> INVERTED QUESTION MARK
'\xd0' # 0xAC -> LATIN CAPITAL LETTER ETH (Icelandic)
'\xdd' # 0xAD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xAE -> LATIN CAPITAL LETTER THORN (Icelandic)
'\xae' # 0xAF -> REGISTERED SIGN
'\xa2' # 0xB0 -> CENT SIGN
'\xa3' # 0xB1 -> POUND SIGN
'\xa5' # 0xB2 -> YEN SIGN
'\xb7' # 0xB3 -> MIDDLE DOT
'\xa9' # 0xB4 -> COPYRIGHT SIGN
'@' # 0xB5 -> COMMERCIAL AT
'\xb6' # 0xB6 -> PILCROW SIGN
'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
'\xac' # 0xBA -> NOT SIGN
'|' # 0xBB -> VERTICAL LINE
'\u203e' # 0xBC -> OVERLINE
'\xa8' # 0xBD -> DIAERESIS
'\xb4' # 0xBE -> ACUTE ACCENT
'\xd7' # 0xBF -> MULTIPLICATION SIGN
'\xe4' # 0xC0 -> LATIN SMALL LETTER A WITH DIAERESIS
'A' # 0xC1 -> LATIN CAPITAL LETTER A
'B' # 0xC2 -> LATIN CAPITAL LETTER B
'C' # 0xC3 -> LATIN CAPITAL LETTER C
'D' # 0xC4 -> LATIN CAPITAL LETTER D
'E' # 0xC5 -> LATIN CAPITAL LETTER E
'F' # 0xC6 -> LATIN CAPITAL LETTER F
'G' # 0xC7 -> LATIN CAPITAL LETTER G
'H' # 0xC8 -> LATIN CAPITAL LETTER H
'I' # 0xC9 -> LATIN CAPITAL LETTER I
'\xad' # 0xCA -> SOFT HYPHEN
'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xa6' # 0xCC -> BROKEN BAR
'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
'\xfc' # 0xD0 -> LATIN SMALL LETTER U WITH DIAERESIS
'J' # 0xD1 -> LATIN CAPITAL LETTER J
'K' # 0xD2 -> LATIN CAPITAL LETTER K
'L' # 0xD3 -> LATIN CAPITAL LETTER L
'M' # 0xD4 -> LATIN CAPITAL LETTER M
'N' # 0xD5 -> LATIN CAPITAL LETTER N
'O' # 0xD6 -> LATIN CAPITAL LETTER O
'P' # 0xD7 -> LATIN CAPITAL LETTER P
'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
'R' # 0xD9 -> LATIN CAPITAL LETTER R
'\xb9' # 0xDA -> SUPERSCRIPT ONE
'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'}' # 0xDC -> RIGHT CURLY BRACKET
'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xd6' # 0xE0 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xf7' # 0xE1 -> DIVISION SIGN
'S' # 0xE2 -> LATIN CAPITAL LETTER S
'T' # 0xE3 -> LATIN CAPITAL LETTER T
'U' # 0xE4 -> LATIN CAPITAL LETTER U
'V' # 0xE5 -> LATIN CAPITAL LETTER V
'W' # 0xE6 -> LATIN CAPITAL LETTER W
'X' # 0xE7 -> LATIN CAPITAL LETTER X
'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
'\xb2' # 0xEA -> SUPERSCRIPT TWO
'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\\' # 0xEC -> REVERSE SOLIDUS
'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
'0' # 0xF0 -> DIGIT ZERO
'1' # 0xF1 -> DIGIT ONE
'2' # 0xF2 -> DIGIT TWO
'3' # 0xF3 -> DIGIT THREE
'4' # 0xF4 -> DIGIT FOUR
'5' # 0xF5 -> DIGIT FIVE
'6' # 0xF6 -> DIGIT SIX
'7' # 0xF7 -> DIGIT SEVEN
'8' # 0xF8 -> DIGIT EIGHT
'9' # 0xF9 -> DIGIT NINE
'\xb3' # 0xFA -> SUPERSCRIPT THREE
'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
']' # 0xFC -> RIGHT SQUARE BRACKET
'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
'\x9f' # 0xFF -> APPLICATION PROGRAM COMMAND (APC)
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
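# ---------------------------------------------------------------------------
# Editor's note: sketch, not part of the generated module. cp273 is the
# EBCDIC code page for Germany/Austria, so letters do not sit at their
# ASCII positions: 'a'..'i' occupy 0x81..0x89 and 0x4A already maps to
# LATIN CAPITAL LETTER A WITH DIAERESIS per the decoding table above.
if __name__ == '__main__':
    assert b'\x81\x82\x83'.decode('cp273') == 'abc'
    assert b'\x4a'.decode('cp273') == '\xc4'     # A with diaeresis
    assert 'abc'.encode('cp273') == b'\x81\x82\x83'
# ---------------------------------------------------------------------------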
""" Python Character Mapping Codec cp424 generated from 'MAPPINGS/VENDORS/MISC/CP424.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp424',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x9c' # 0x04 -> SELECT
'\t' # 0x05 -> HORIZONTAL TABULATION
'\x86' # 0x06 -> REQUIRED NEW LINE
'\x7f' # 0x07 -> DELETE
'\x97' # 0x08 -> GRAPHIC ESCAPE
'\x8d' # 0x09 -> SUPERSCRIPT
'\x8e' # 0x0A -> REPEAT
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x9d' # 0x14 -> RESTORE/ENABLE PRESENTATION
'\x85' # 0x15 -> NEW LINE
'\x08' # 0x16 -> BACKSPACE
'\x87' # 0x17 -> PROGRAM OPERATOR COMMUNICATION
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x92' # 0x1A -> UNIT BACK SPACE
'\x8f' # 0x1B -> CUSTOMER USE ONE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
'\x80' # 0x20 -> DIGIT SELECT
'\x81' # 0x21 -> START OF SIGNIFICANCE
'\x82' # 0x22 -> FIELD SEPARATOR
'\x83' # 0x23 -> WORD UNDERSCORE
'\x84' # 0x24 -> BYPASS OR INHIBIT PRESENTATION
'\n' # 0x25 -> LINE FEED
'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
'\x1b' # 0x27 -> ESCAPE
'\x88' # 0x28 -> SET ATTRIBUTE
'\x89' # 0x29 -> START FIELD EXTENDED
'\x8a' # 0x2A -> SET MODE OR SWITCH
'\x8b' # 0x2B -> CONTROL SEQUENCE PREFIX
'\x8c' # 0x2C -> MODIFY FIELD ATTRIBUTE
'\x05' # 0x2D -> ENQUIRY
'\x06' # 0x2E -> ACKNOWLEDGE
'\x07' # 0x2F -> BELL
'\x90' # 0x30 -> <reserved>
'\x91' # 0x31 -> <reserved>
'\x16' # 0x32 -> SYNCHRONOUS IDLE
'\x93' # 0x33 -> INDEX RETURN
'\x94' # 0x34 -> PRESENTATION POSITION
'\x95' # 0x35 -> TRANSPARENT
'\x96' # 0x36 -> NUMERIC BACKSPACE
'\x04' # 0x37 -> END OF TRANSMISSION
'\x98' # 0x38 -> SUBSCRIPT
'\x99' # 0x39 -> INDENT TABULATION
'\x9a' # 0x3A -> REVERSE FORM FEED
'\x9b' # 0x3B -> CUSTOMER USE THREE
'\x14' # 0x3C -> DEVICE CONTROL FOUR
'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
'\x9e' # 0x3E -> <reserved>
'\x1a' # 0x3F -> SUBSTITUTE
' ' # 0x40 -> SPACE
'\u05d0' # 0x41 -> HEBREW LETTER ALEF
'\u05d1' # 0x42 -> HEBREW LETTER BET
'\u05d2' # 0x43 -> HEBREW LETTER GIMEL
'\u05d3' # 0x44 -> HEBREW LETTER DALET
'\u05d4' # 0x45 -> HEBREW LETTER HE
'\u05d5' # 0x46 -> HEBREW LETTER VAV
'\u05d6' # 0x47 -> HEBREW LETTER ZAYIN
'\u05d7' # 0x48 -> HEBREW LETTER HET
'\u05d8' # 0x49 -> HEBREW LETTER TET
'\xa2' # 0x4A -> CENT SIGN
'.' # 0x4B -> FULL STOP
'<' # 0x4C -> LESS-THAN SIGN
'(' # 0x4D -> LEFT PARENTHESIS
'+' # 0x4E -> PLUS SIGN
'|' # 0x4F -> VERTICAL LINE
'&' # 0x50 -> AMPERSAND
'\u05d9' # 0x51 -> HEBREW LETTER YOD
'\u05da' # 0x52 -> HEBREW LETTER FINAL KAF
'\u05db' # 0x53 -> HEBREW LETTER KAF
'\u05dc' # 0x54 -> HEBREW LETTER LAMED
'\u05dd' # 0x55 -> HEBREW LETTER FINAL MEM
'\u05de' # 0x56 -> HEBREW LETTER MEM
'\u05df' # 0x57 -> HEBREW LETTER FINAL NUN
'\u05e0' # 0x58 -> HEBREW LETTER NUN
'\u05e1' # 0x59 -> HEBREW LETTER SAMEKH
'!' # 0x5A -> EXCLAMATION MARK
'$' # 0x5B -> DOLLAR SIGN
'*' # 0x5C -> ASTERISK
')' # 0x5D -> RIGHT PARENTHESIS
';' # 0x5E -> SEMICOLON
'\xac' # 0x5F -> NOT SIGN
'-' # 0x60 -> HYPHEN-MINUS
'/' # 0x61 -> SOLIDUS
'\u05e2' # 0x62 -> HEBREW LETTER AYIN
'\u05e3' # 0x63 -> HEBREW LETTER FINAL PE
'\u05e4' # 0x64 -> HEBREW LETTER PE
'\u05e5' # 0x65 -> HEBREW LETTER FINAL TSADI
'\u05e6' # 0x66 -> HEBREW LETTER TSADI
'\u05e7' # 0x67 -> HEBREW LETTER QOF
'\u05e8' # 0x68 -> HEBREW LETTER RESH
'\u05e9' # 0x69 -> HEBREW LETTER SHIN
'\xa6' # 0x6A -> BROKEN BAR
',' # 0x6B -> COMMA
'%' # 0x6C -> PERCENT SIGN
'_' # 0x6D -> LOW LINE
'>' # 0x6E -> GREATER-THAN SIGN
'?' # 0x6F -> QUESTION MARK
'\ufffe' # 0x70 -> UNDEFINED
'\u05ea' # 0x71 -> HEBREW LETTER TAV
'\ufffe' # 0x72 -> UNDEFINED
'\ufffe' # 0x73 -> UNDEFINED
'\xa0' # 0x74 -> NO-BREAK SPACE
'\ufffe' # 0x75 -> UNDEFINED
'\ufffe' # 0x76 -> UNDEFINED
'\ufffe' # 0x77 -> UNDEFINED
'\u2017' # 0x78 -> DOUBLE LOW LINE
'`' # 0x79 -> GRAVE ACCENT
':' # 0x7A -> COLON
'#' # 0x7B -> NUMBER SIGN
'@' # 0x7C -> COMMERCIAL AT
"'" # 0x7D -> APOSTROPHE
'=' # 0x7E -> EQUALS SIGN
'"' # 0x7F -> QUOTATION MARK
'\ufffe' # 0x80 -> UNDEFINED
'a' # 0x81 -> LATIN SMALL LETTER A
'b' # 0x82 -> LATIN SMALL LETTER B
'c' # 0x83 -> LATIN SMALL LETTER C
'd' # 0x84 -> LATIN SMALL LETTER D
'e' # 0x85 -> LATIN SMALL LETTER E
'f' # 0x86 -> LATIN SMALL LETTER F
'g' # 0x87 -> LATIN SMALL LETTER G
'h' # 0x88 -> LATIN SMALL LETTER H
'i' # 0x89 -> LATIN SMALL LETTER I
'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\ufffe' # 0x8C -> UNDEFINED
'\ufffe' # 0x8D -> UNDEFINED
'\ufffe' # 0x8E -> UNDEFINED
'\xb1' # 0x8F -> PLUS-MINUS SIGN
'\xb0' # 0x90 -> DEGREE SIGN
'j' # 0x91 -> LATIN SMALL LETTER J
'k' # 0x92 -> LATIN SMALL LETTER K
'l' # 0x93 -> LATIN SMALL LETTER L
'm' # 0x94 -> LATIN SMALL LETTER M
'n' # 0x95 -> LATIN SMALL LETTER N
'o' # 0x96 -> LATIN SMALL LETTER O
'p' # 0x97 -> LATIN SMALL LETTER P
'q' # 0x98 -> LATIN SMALL LETTER Q
'r' # 0x99 -> LATIN SMALL LETTER R
'\ufffe' # 0x9A -> UNDEFINED
'\ufffe' # 0x9B -> UNDEFINED
'\ufffe' # 0x9C -> UNDEFINED
'\xb8' # 0x9D -> CEDILLA
'\ufffe' # 0x9E -> UNDEFINED
'\xa4' # 0x9F -> CURRENCY SIGN
'\xb5' # 0xA0 -> MICRO SIGN
'~' # 0xA1 -> TILDE
's' # 0xA2 -> LATIN SMALL LETTER S
't' # 0xA3 -> LATIN SMALL LETTER T
'u' # 0xA4 -> LATIN SMALL LETTER U
'v' # 0xA5 -> LATIN SMALL LETTER V
'w' # 0xA6 -> LATIN SMALL LETTER W
'x' # 0xA7 -> LATIN SMALL LETTER X
'y' # 0xA8 -> LATIN SMALL LETTER Y
'z' # 0xA9 -> LATIN SMALL LETTER Z
'\ufffe' # 0xAA -> UNDEFINED
'\ufffe' # 0xAB -> UNDEFINED
'\ufffe' # 0xAC -> UNDEFINED
'\ufffe' # 0xAD -> UNDEFINED
'\ufffe' # 0xAE -> UNDEFINED
'\xae' # 0xAF -> REGISTERED SIGN
'^' # 0xB0 -> CIRCUMFLEX ACCENT
'\xa3' # 0xB1 -> POUND SIGN
'\xa5' # 0xB2 -> YEN SIGN
'\xb7' # 0xB3 -> MIDDLE DOT
'\xa9' # 0xB4 -> COPYRIGHT SIGN
'\xa7' # 0xB5 -> SECTION SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
'[' # 0xBA -> LEFT SQUARE BRACKET
']' # 0xBB -> RIGHT SQUARE BRACKET
'\xaf' # 0xBC -> MACRON
'\xa8' # 0xBD -> DIAERESIS
'\xb4' # 0xBE -> ACUTE ACCENT
'\xd7' # 0xBF -> MULTIPLICATION SIGN
'{' # 0xC0 -> LEFT CURLY BRACKET
'A' # 0xC1 -> LATIN CAPITAL LETTER A
'B' # 0xC2 -> LATIN CAPITAL LETTER B
'C' # 0xC3 -> LATIN CAPITAL LETTER C
'D' # 0xC4 -> LATIN CAPITAL LETTER D
'E' # 0xC5 -> LATIN CAPITAL LETTER E
'F' # 0xC6 -> LATIN CAPITAL LETTER F
'G' # 0xC7 -> LATIN CAPITAL LETTER G
'H' # 0xC8 -> LATIN CAPITAL LETTER H
'I' # 0xC9 -> LATIN CAPITAL LETTER I
'\xad' # 0xCA -> SOFT HYPHEN
'\ufffe' # 0xCB -> UNDEFINED
'\ufffe' # 0xCC -> UNDEFINED
'\ufffe' # 0xCD -> UNDEFINED
'\ufffe' # 0xCE -> UNDEFINED
'\ufffe' # 0xCF -> UNDEFINED
'}' # 0xD0 -> RIGHT CURLY BRACKET
'J' # 0xD1 -> LATIN CAPITAL LETTER J
'K' # 0xD2 -> LATIN CAPITAL LETTER K
'L' # 0xD3 -> LATIN CAPITAL LETTER L
'M' # 0xD4 -> LATIN CAPITAL LETTER M
'N' # 0xD5 -> LATIN CAPITAL LETTER N
'O' # 0xD6 -> LATIN CAPITAL LETTER O
'P' # 0xD7 -> LATIN CAPITAL LETTER P
'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
'R' # 0xD9 -> LATIN CAPITAL LETTER R
'\xb9' # 0xDA -> SUPERSCRIPT ONE
'\ufffe' # 0xDB -> UNDEFINED
'\ufffe' # 0xDC -> UNDEFINED
'\ufffe' # 0xDD -> UNDEFINED
'\ufffe' # 0xDE -> UNDEFINED
'\ufffe' # 0xDF -> UNDEFINED
'\\' # 0xE0 -> REVERSE SOLIDUS
'\xf7' # 0xE1 -> DIVISION SIGN
'S' # 0xE2 -> LATIN CAPITAL LETTER S
'T' # 0xE3 -> LATIN CAPITAL LETTER T
'U' # 0xE4 -> LATIN CAPITAL LETTER U
'V' # 0xE5 -> LATIN CAPITAL LETTER V
'W' # 0xE6 -> LATIN CAPITAL LETTER W
'X' # 0xE7 -> LATIN CAPITAL LETTER X
'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
'\xb2' # 0xEA -> SUPERSCRIPT TWO
'\ufffe' # 0xEB -> UNDEFINED
'\ufffe' # 0xEC -> UNDEFINED
'\ufffe' # 0xED -> UNDEFINED
'\ufffe' # 0xEE -> UNDEFINED
'\ufffe' # 0xEF -> UNDEFINED
'0' # 0xF0 -> DIGIT ZERO
'1' # 0xF1 -> DIGIT ONE
'2' # 0xF2 -> DIGIT TWO
'3' # 0xF3 -> DIGIT THREE
'4' # 0xF4 -> DIGIT FOUR
'5' # 0xF5 -> DIGIT FIVE
'6' # 0xF6 -> DIGIT SIX
'7' # 0xF7 -> DIGIT SEVEN
'8' # 0xF8 -> DIGIT EIGHT
'9' # 0xF9 -> DIGIT NINE
'\xb3' # 0xFA -> SUPERSCRIPT THREE
'\ufffe' # 0xFB -> UNDEFINED
'\ufffe' # 0xFC -> UNDEFINED
'\ufffe' # 0xFD -> UNDEFINED
'\ufffe' # 0xFE -> UNDEFINED
'\x9f' # 0xFF -> EIGHT ONES
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
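# ---------------------------------------------------------------------------
# Editor's note: sketch, not part of the generated module. cp424 is an
# EBCDIC Hebrew code page: 0x40 is SPACE and the Hebrew block starts at
# 0x41 with ALEF, so a short round trip looks like this.
if __name__ == '__main__':
    assert b'\x40\x41\x42'.decode('cp424') == ' \u05d0\u05d1'  # space, alef, bet
    assert ' \u05d0\u05d1'.encode('cp424') == b'\x40\x41\x42'
# ---------------------------------------------------------------------------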
""" Python Character Mapping Codec cp437 generated from 'VENDORS/MICSFT/PC/CP437.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp437',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Map
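# Editor's note: unlike the purely table-based codecs above, cp437's Codec
# encodes through a dict (encoding_map) while decoding still uses
# decoding_table; the decoding_map below appears to be retained mainly for
# introspection and backward compatibility. codecs.make_identity_dict(r)
# simply returns {i: i for i in r}, so every byte initially maps to itself
# and the update() that follows re-points only 0x80-0xFF at their CP437
# code points.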
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00a5, # YEN SIGN
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xa2' # 0x009b -> CENT SIGN
'\xa3' # 0x009c -> POUND SIGN
'\xa5' # 0x009d -> YEN SIGN
'\u20a7' # 0x009e -> PESETA SIGN
'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\u2310' # 0x00a9 -> REVERSED NOT SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
'\xb5' # 0x00e6 -> MICRO SIGN
'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
'\u221e' # 0x00ec -> INFINITY
'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
'\u2229' # 0x00ef -> INTERSECTION
'\u2261' # 0x00f0 -> IDENTICAL TO
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u2248' # 0x00f7 -> ALMOST EQUAL TO
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a5: 0x009d, # YEN SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
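# Usage sketch, assuming the codec is reachable under its stdlib name 'cp437':
# bytes 0xB0-0xDF carry the DOS box-drawing set, and every byte 0x00-0xFF
# round-trips because encoding_map is the exact inverse of decoding_table.
assert bytes([0xC9, 0xCD, 0xBB]).decode('cp437') == '\u2554\u2550\u2557'
assert '\u2554\u2550\u2557'.encode('cp437') == bytes([0xC9, 0xCD, 0xBB])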
""" Python Character Mapping Codec cp500 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP500.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp500',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x9c' # 0x04 -> CONTROL
'\t' # 0x05 -> HORIZONTAL TABULATION
'\x86' # 0x06 -> CONTROL
'\x7f' # 0x07 -> DELETE
'\x97' # 0x08 -> CONTROL
'\x8d' # 0x09 -> CONTROL
'\x8e' # 0x0A -> CONTROL
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x9d' # 0x14 -> CONTROL
'\x85' # 0x15 -> CONTROL
'\x08' # 0x16 -> BACKSPACE
'\x87' # 0x17 -> CONTROL
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x92' # 0x1A -> CONTROL
'\x8f' # 0x1B -> CONTROL
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
'\x80' # 0x20 -> CONTROL
'\x81' # 0x21 -> CONTROL
'\x82' # 0x22 -> CONTROL
'\x83' # 0x23 -> CONTROL
'\x84' # 0x24 -> CONTROL
'\n' # 0x25 -> LINE FEED
'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
'\x1b' # 0x27 -> ESCAPE
'\x88' # 0x28 -> CONTROL
'\x89' # 0x29 -> CONTROL
'\x8a' # 0x2A -> CONTROL
'\x8b' # 0x2B -> CONTROL
'\x8c' # 0x2C -> CONTROL
'\x05' # 0x2D -> ENQUIRY
'\x06' # 0x2E -> ACKNOWLEDGE
'\x07' # 0x2F -> BELL
'\x90' # 0x30 -> CONTROL
'\x91' # 0x31 -> CONTROL
'\x16' # 0x32 -> SYNCHRONOUS IDLE
'\x93' # 0x33 -> CONTROL
'\x94' # 0x34 -> CONTROL
'\x95' # 0x35 -> CONTROL
'\x96' # 0x36 -> CONTROL
'\x04' # 0x37 -> END OF TRANSMISSION
'\x98' # 0x38 -> CONTROL
'\x99' # 0x39 -> CONTROL
'\x9a' # 0x3A -> CONTROL
'\x9b' # 0x3B -> CONTROL
'\x14' # 0x3C -> DEVICE CONTROL FOUR
'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
'\x9e' # 0x3E -> CONTROL
'\x1a' # 0x3F -> SUBSTITUTE
' ' # 0x40 -> SPACE
'\xa0' # 0x41 -> NO-BREAK SPACE
'\xe2' # 0x42 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x43 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x44 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0x45 -> LATIN SMALL LETTER A WITH ACUTE
'\xe3' # 0x46 -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x47 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x48 -> LATIN SMALL LETTER C WITH CEDILLA
'\xf1' # 0x49 -> LATIN SMALL LETTER N WITH TILDE
'[' # 0x4A -> LEFT SQUARE BRACKET
'.' # 0x4B -> FULL STOP
'<' # 0x4C -> LESS-THAN SIGN
'(' # 0x4D -> LEFT PARENTHESIS
'+' # 0x4E -> PLUS SIGN
'!' # 0x4F -> EXCLAMATION MARK
'&' # 0x50 -> AMPERSAND
'\xe9' # 0x51 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0x52 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x53 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x54 -> LATIN SMALL LETTER E WITH GRAVE
'\xed' # 0x55 -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0x56 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x57 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xec' # 0x58 -> LATIN SMALL LETTER I WITH GRAVE
'\xdf' # 0x59 -> LATIN SMALL LETTER SHARP S (GERMAN)
']' # 0x5A -> RIGHT SQUARE BRACKET
'$' # 0x5B -> DOLLAR SIGN
'*' # 0x5C -> ASTERISK
')' # 0x5D -> RIGHT PARENTHESIS
';' # 0x5E -> SEMICOLON
'^' # 0x5F -> CIRCUMFLEX ACCENT
'-' # 0x60 -> HYPHEN-MINUS
'/' # 0x61 -> SOLIDUS
'\xc2' # 0x62 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc4' # 0x63 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc0' # 0x64 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0x65 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc3' # 0x66 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc5' # 0x67 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x68 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xd1' # 0x69 -> LATIN CAPITAL LETTER N WITH TILDE
'\xa6' # 0x6A -> BROKEN BAR
',' # 0x6B -> COMMA
'%' # 0x6C -> PERCENT SIGN
'_' # 0x6D -> LOW LINE
'>' # 0x6E -> GREATER-THAN SIGN
'?' # 0x6F -> QUESTION MARK
'\xf8' # 0x70 -> LATIN SMALL LETTER O WITH STROKE
'\xc9' # 0x71 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0x72 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x73 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x74 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0x75 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x76 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x77 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0x78 -> LATIN CAPITAL LETTER I WITH GRAVE
'`' # 0x79 -> GRAVE ACCENT
':' # 0x7A -> COLON
'#' # 0x7B -> NUMBER SIGN
'@' # 0x7C -> COMMERCIAL AT
"'" # 0x7D -> APOSTROPHE
'=' # 0x7E -> EQUALS SIGN
'"' # 0x7F -> QUOTATION MARK
'\xd8' # 0x80 -> LATIN CAPITAL LETTER O WITH STROKE
'a' # 0x81 -> LATIN SMALL LETTER A
'b' # 0x82 -> LATIN SMALL LETTER B
'c' # 0x83 -> LATIN SMALL LETTER C
'd' # 0x84 -> LATIN SMALL LETTER D
'e' # 0x85 -> LATIN SMALL LETTER E
'f' # 0x86 -> LATIN SMALL LETTER F
'g' # 0x87 -> LATIN SMALL LETTER G
'h' # 0x88 -> LATIN SMALL LETTER H
'i' # 0x89 -> LATIN SMALL LETTER I
'\xab' # 0x8A -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x8B -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xf0' # 0x8C -> LATIN SMALL LETTER ETH (ICELANDIC)
'\xfd' # 0x8D -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0x8E -> LATIN SMALL LETTER THORN (ICELANDIC)
'\xb1' # 0x8F -> PLUS-MINUS SIGN
'\xb0' # 0x90 -> DEGREE SIGN
'j' # 0x91 -> LATIN SMALL LETTER J
'k' # 0x92 -> LATIN SMALL LETTER K
'l' # 0x93 -> LATIN SMALL LETTER L
'm' # 0x94 -> LATIN SMALL LETTER M
'n' # 0x95 -> LATIN SMALL LETTER N
'o' # 0x96 -> LATIN SMALL LETTER O
'p' # 0x97 -> LATIN SMALL LETTER P
'q' # 0x98 -> LATIN SMALL LETTER Q
'r' # 0x99 -> LATIN SMALL LETTER R
'\xaa' # 0x9A -> FEMININE ORDINAL INDICATOR
'\xba' # 0x9B -> MASCULINE ORDINAL INDICATOR
'\xe6' # 0x9C -> LATIN SMALL LIGATURE AE
'\xb8' # 0x9D -> CEDILLA
'\xc6' # 0x9E -> LATIN CAPITAL LIGATURE AE
'\xa4' # 0x9F -> CURRENCY SIGN
'\xb5' # 0xA0 -> MICRO SIGN
'~' # 0xA1 -> TILDE
's' # 0xA2 -> LATIN SMALL LETTER S
't' # 0xA3 -> LATIN SMALL LETTER T
'u' # 0xA4 -> LATIN SMALL LETTER U
'v' # 0xA5 -> LATIN SMALL LETTER V
'w' # 0xA6 -> LATIN SMALL LETTER W
'x' # 0xA7 -> LATIN SMALL LETTER X
'y' # 0xA8 -> LATIN SMALL LETTER Y
'z' # 0xA9 -> LATIN SMALL LETTER Z
'\xa1' # 0xAA -> INVERTED EXCLAMATION MARK
'\xbf' # 0xAB -> INVERTED QUESTION MARK
'\xd0' # 0xAC -> LATIN CAPITAL LETTER ETH (ICELANDIC)
'\xdd' # 0xAD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xAE -> LATIN CAPITAL LETTER THORN (ICELANDIC)
'\xae' # 0xAF -> REGISTERED SIGN
'\xa2' # 0xB0 -> CENT SIGN
'\xa3' # 0xB1 -> POUND SIGN
'\xa5' # 0xB2 -> YEN SIGN
'\xb7' # 0xB3 -> MIDDLE DOT
'\xa9' # 0xB4 -> COPYRIGHT SIGN
'\xa7' # 0xB5 -> SECTION SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xbc' # 0xB7 -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xB8 -> VULGAR FRACTION ONE HALF
'\xbe' # 0xB9 -> VULGAR FRACTION THREE QUARTERS
'\xac' # 0xBA -> NOT SIGN
'|' # 0xBB -> VERTICAL LINE
'\xaf' # 0xBC -> MACRON
'\xa8' # 0xBD -> DIAERESIS
'\xb4' # 0xBE -> ACUTE ACCENT
'\xd7' # 0xBF -> MULTIPLICATION SIGN
'{' # 0xC0 -> LEFT CURLY BRACKET
'A' # 0xC1 -> LATIN CAPITAL LETTER A
'B' # 0xC2 -> LATIN CAPITAL LETTER B
'C' # 0xC3 -> LATIN CAPITAL LETTER C
'D' # 0xC4 -> LATIN CAPITAL LETTER D
'E' # 0xC5 -> LATIN CAPITAL LETTER E
'F' # 0xC6 -> LATIN CAPITAL LETTER F
'G' # 0xC7 -> LATIN CAPITAL LETTER G
'H' # 0xC8 -> LATIN CAPITAL LETTER H
'I' # 0xC9 -> LATIN CAPITAL LETTER I
'\xad' # 0xCA -> SOFT HYPHEN
'\xf4' # 0xCB -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0xCC -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0xCD -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xCE -> LATIN SMALL LETTER O WITH ACUTE
'\xf5' # 0xCF -> LATIN SMALL LETTER O WITH TILDE
'}' # 0xD0 -> RIGHT CURLY BRACKET
'J' # 0xD1 -> LATIN CAPITAL LETTER J
'K' # 0xD2 -> LATIN CAPITAL LETTER K
'L' # 0xD3 -> LATIN CAPITAL LETTER L
'M' # 0xD4 -> LATIN CAPITAL LETTER M
'N' # 0xD5 -> LATIN CAPITAL LETTER N
'O' # 0xD6 -> LATIN CAPITAL LETTER O
'P' # 0xD7 -> LATIN CAPITAL LETTER P
'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
'R' # 0xD9 -> LATIN CAPITAL LETTER R
'\xb9' # 0xDA -> SUPERSCRIPT ONE
'\xfb' # 0xDB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xDC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xf9' # 0xDD -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xDE -> LATIN SMALL LETTER U WITH ACUTE
'\xff' # 0xDF -> LATIN SMALL LETTER Y WITH DIAERESIS
'\\' # 0xE0 -> REVERSE SOLIDUS
'\xf7' # 0xE1 -> DIVISION SIGN
'S' # 0xE2 -> LATIN CAPITAL LETTER S
'T' # 0xE3 -> LATIN CAPITAL LETTER T
'U' # 0xE4 -> LATIN CAPITAL LETTER U
'V' # 0xE5 -> LATIN CAPITAL LETTER V
'W' # 0xE6 -> LATIN CAPITAL LETTER W
'X' # 0xE7 -> LATIN CAPITAL LETTER X
'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
'\xb2' # 0xEA -> SUPERSCRIPT TWO
'\xd4' # 0xEB -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd6' # 0xEC -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd2' # 0xED -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd5' # 0xEF -> LATIN CAPITAL LETTER O WITH TILDE
'0' # 0xF0 -> DIGIT ZERO
'1' # 0xF1 -> DIGIT ONE
'2' # 0xF2 -> DIGIT TWO
'3' # 0xF3 -> DIGIT THREE
'4' # 0xF4 -> DIGIT FOUR
'5' # 0xF5 -> DIGIT FIVE
'6' # 0xF6 -> DIGIT SIX
'7' # 0xF7 -> DIGIT SEVEN
'8' # 0xF8 -> DIGIT EIGHT
'9' # 0xF9 -> DIGIT NINE
'\xb3' # 0xFA -> SUPERSCRIPT THREE
'\xdb' # 0xFB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xFC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xd9' # 0xFD -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xFE -> LATIN CAPITAL LETTER U WITH ACUTE
'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
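# Usage sketch for cp500 (EBCDIC International), assuming the stdlib codec
# name: letters do not sit at their ASCII positions, e.g. 'A' lives at 0xC1.
# Both assertions can be read directly off the decoding table above.
assert 'A'.encode('cp500') == b'\xc1'
assert b'\xc8\x85\x93\x93\x96'.decode('cp500') == 'Hello'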
"""
Code page 65001: Windows UTF-8 (CP_UTF8).
"""
import codecs
import functools
if not hasattr(codecs, 'code_page_encode'):
    raise LookupError("cp65001 encoding is only available on Windows")

### Codec APIs

encode = functools.partial(codecs.code_page_encode, 65001)
decode = functools.partial(codecs.code_page_decode, 65001)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return encode(input, self.errors)[0]

class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    _buffer_decode = decode

class StreamWriter(codecs.StreamWriter):
    encode = encode

class StreamReader(codecs.StreamReader):
    decode = decode

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp65001',
        encode=encode,
        decode=decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
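# Sketch of why this module differs from the table-based ones above: there is
# no charmap at all; encode/decode are thin partials over the Windows-only
# codecs.code_page_* primitives, hence the LookupError guard at import time.
# The guarded assertion below is illustrative and only runs on Windows.
import sys
if sys.platform == 'win32':
    assert encode('abc')[0] == b'abc'   # delegates to code_page_encode(65001, ...)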
"""Python Character Mapping Codec cp720 generated on Windows:
Vista 6.0.6002 SP2 Multiprocessor Free with the command:
python Tools/unicode/genwincodec.py 720
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp720',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\x80'
'\x81'
'\xe9' # 0x82 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x83 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\x84'
'\xe0' # 0x85 -> LATIN SMALL LETTER A WITH GRAVE
'\x86'
'\xe7' # 0x87 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x88 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x89 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x8A -> LATIN SMALL LETTER E WITH GRAVE
'\xef' # 0x8B -> LATIN SMALL LETTER I WITH DIAERESIS
'\xee' # 0x8C -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\x8d'
'\x8e'
'\x8f'
'\x90'
'\u0651' # 0x91 -> ARABIC SHADDA
'\u0652' # 0x92 -> ARABIC SUKUN
'\xf4' # 0x93 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xa4' # 0x94 -> CURRENCY SIGN
'\u0640' # 0x95 -> ARABIC TATWEEL
'\xfb' # 0x96 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xf9' # 0x97 -> LATIN SMALL LETTER U WITH GRAVE
'\u0621' # 0x98 -> ARABIC LETTER HAMZA
'\u0622' # 0x99 -> ARABIC LETTER ALEF WITH MADDA ABOVE
'\u0623' # 0x9A -> ARABIC LETTER ALEF WITH HAMZA ABOVE
'\u0624' # 0x9B -> ARABIC LETTER WAW WITH HAMZA ABOVE
'\xa3' # 0x9C -> POUND SIGN
'\u0625' # 0x9D -> ARABIC LETTER ALEF WITH HAMZA BELOW
'\u0626' # 0x9E -> ARABIC LETTER YEH WITH HAMZA ABOVE
'\u0627' # 0x9F -> ARABIC LETTER ALEF
'\u0628' # 0xA0 -> ARABIC LETTER BEH
'\u0629' # 0xA1 -> ARABIC LETTER TEH MARBUTA
'\u062a' # 0xA2 -> ARABIC LETTER TEH
'\u062b' # 0xA3 -> ARABIC LETTER THEH
'\u062c' # 0xA4 -> ARABIC LETTER JEEM
'\u062d' # 0xA5 -> ARABIC LETTER HAH
'\u062e' # 0xA6 -> ARABIC LETTER KHAH
'\u062f' # 0xA7 -> ARABIC LETTER DAL
'\u0630' # 0xA8 -> ARABIC LETTER THAL
'\u0631' # 0xA9 -> ARABIC LETTER REH
'\u0632' # 0xAA -> ARABIC LETTER ZAIN
'\u0633' # 0xAB -> ARABIC LETTER SEEN
'\u0634' # 0xAC -> ARABIC LETTER SHEEN
'\u0635' # 0xAD -> ARABIC LETTER SAD
'\xab' # 0xAE -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xAF -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0xB0 -> LIGHT SHADE
'\u2592' # 0xB1 -> MEDIUM SHADE
'\u2593' # 0xB2 -> DARK SHADE
'\u2502' # 0xB3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0xB4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0xB5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0xB6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0xB7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0xB8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0xB9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0xBA -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0xBB -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0xBC -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0xBD -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0xBE -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0xBF -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0xC0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0xC1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0xC2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0xC3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0xC4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0xC5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0xC6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0xC7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0xC8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0xC9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0xCA -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0xCB -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0xCC -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0xCD -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0xCE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0xCF -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0xD0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0xD1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0xD2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0xD3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0xD4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0xD5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0xD6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0xD7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0xD8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0xD9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0xDA -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0xDB -> FULL BLOCK
'\u2584' # 0xDC -> LOWER HALF BLOCK
'\u258c' # 0xDD -> LEFT HALF BLOCK
'\u2590' # 0xDE -> RIGHT HALF BLOCK
'\u2580' # 0xDF -> UPPER HALF BLOCK
'\u0636' # 0xE0 -> ARABIC LETTER DAD
'\u0637' # 0xE1 -> ARABIC LETTER TAH
'\u0638' # 0xE2 -> ARABIC LETTER ZAH
'\u0639' # 0xE3 -> ARABIC LETTER AIN
'\u063a' # 0xE4 -> ARABIC LETTER GHAIN
'\u0641' # 0xE5 -> ARABIC LETTER FEH
'\xb5' # 0xE6 -> MICRO SIGN
'\u0642' # 0xE7 -> ARABIC LETTER QAF
'\u0643' # 0xE8 -> ARABIC LETTER KAF
'\u0644' # 0xE9 -> ARABIC LETTER LAM
'\u0645' # 0xEA -> ARABIC LETTER MEEM
'\u0646' # 0xEB -> ARABIC LETTER NOON
'\u0647' # 0xEC -> ARABIC LETTER HEH
'\u0648' # 0xED -> ARABIC LETTER WAW
'\u0649' # 0xEE -> ARABIC LETTER ALEF MAKSURA
'\u064a' # 0xEF -> ARABIC LETTER YEH
'\u2261' # 0xF0 -> IDENTICAL TO
'\u064b' # 0xF1 -> ARABIC FATHATAN
'\u064c' # 0xF2 -> ARABIC DAMMATAN
'\u064d' # 0xF3 -> ARABIC KASRATAN
'\u064e' # 0xF4 -> ARABIC FATHA
'\u064f' # 0xF5 -> ARABIC DAMMA
'\u0650' # 0xF6 -> ARABIC KASRA
'\u2248' # 0xF7 -> ALMOST EQUAL TO
'\xb0' # 0xF8 -> DEGREE SIGN
'\u2219' # 0xF9 -> BULLET OPERATOR
'\xb7' # 0xFA -> MIDDLE DOT
'\u221a' # 0xFB -> SQUARE ROOT
'\u207f' # 0xFC -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0xFD -> SUPERSCRIPT TWO
'\u25a0' # 0xFE -> BLACK SQUARE
'\xa0' # 0xFF -> NO-BREAK SPACE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
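# Usage sketch for cp720 (DOS Arabic), assuming the stdlib codec name; both
# values below can be read directly off the decoding table above.
assert b'\x98'.decode('cp720') == '\u0621'   # ARABIC LETTER HAMZA
assert '\u0640'.encode('cp720') == b'\x95'   # ARABIC TATWEEL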
""" Python Character Mapping Codec cp737 generated from 'VENDORS/MICSFT/PC/CP737.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp737',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
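# Hedged sketch of the incremental API defined above: IncrementalDecoder feeds
# each chunk through charmap_decode as it arrives, which is always safe for
# single-byte code pages because no character ever spans a chunk boundary.
# Shown as comments since decoding_table is only defined further below:
#     dec = IncrementalDecoder()
#     dec.decode(b'\x80') + dec.decode(b'\x98')   # -> '\u0391\u03b1'
# (0x80 -> GREEK CAPITAL ALPHA, 0x98 -> GREEK SMALL ALPHA per the map below)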
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0391, # GREEK CAPITAL LETTER ALPHA
0x0081: 0x0392, # GREEK CAPITAL LETTER BETA
0x0082: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x0083: 0x0394, # GREEK CAPITAL LETTER DELTA
0x0084: 0x0395, # GREEK CAPITAL LETTER EPSILON
0x0085: 0x0396, # GREEK CAPITAL LETTER ZETA
0x0086: 0x0397, # GREEK CAPITAL LETTER ETA
0x0087: 0x0398, # GREEK CAPITAL LETTER THETA
0x0088: 0x0399, # GREEK CAPITAL LETTER IOTA
0x0089: 0x039a, # GREEK CAPITAL LETTER KAPPA
0x008a: 0x039b, # GREEK CAPITAL LETTER LAMDA
0x008b: 0x039c, # GREEK CAPITAL LETTER MU
0x008c: 0x039d, # GREEK CAPITAL LETTER NU
0x008d: 0x039e, # GREEK CAPITAL LETTER XI
0x008e: 0x039f, # GREEK CAPITAL LETTER OMICRON
0x008f: 0x03a0, # GREEK CAPITAL LETTER PI
0x0090: 0x03a1, # GREEK CAPITAL LETTER RHO
0x0091: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x0092: 0x03a4, # GREEK CAPITAL LETTER TAU
0x0093: 0x03a5, # GREEK CAPITAL LETTER UPSILON
0x0094: 0x03a6, # GREEK CAPITAL LETTER PHI
0x0095: 0x03a7, # GREEK CAPITAL LETTER CHI
0x0096: 0x03a8, # GREEK CAPITAL LETTER PSI
0x0097: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x0098: 0x03b1, # GREEK SMALL LETTER ALPHA
0x0099: 0x03b2, # GREEK SMALL LETTER BETA
0x009a: 0x03b3, # GREEK SMALL LETTER GAMMA
0x009b: 0x03b4, # GREEK SMALL LETTER DELTA
0x009c: 0x03b5, # GREEK SMALL LETTER EPSILON
0x009d: 0x03b6, # GREEK SMALL LETTER ZETA
0x009e: 0x03b7, # GREEK SMALL LETTER ETA
0x009f: 0x03b8, # GREEK SMALL LETTER THETA
0x00a0: 0x03b9, # GREEK SMALL LETTER IOTA
0x00a1: 0x03ba, # GREEK SMALL LETTER KAPPA
0x00a2: 0x03bb, # GREEK SMALL LETTER LAMDA
0x00a3: 0x03bc, # GREEK SMALL LETTER MU
0x00a4: 0x03bd, # GREEK SMALL LETTER NU
0x00a5: 0x03be, # GREEK SMALL LETTER XI
0x00a6: 0x03bf, # GREEK SMALL LETTER OMICRON
0x00a7: 0x03c0, # GREEK SMALL LETTER PI
0x00a8: 0x03c1, # GREEK SMALL LETTER RHO
0x00a9: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00aa: 0x03c2, # GREEK SMALL LETTER FINAL SIGMA
0x00ab: 0x03c4, # GREEK SMALL LETTER TAU
0x00ac: 0x03c5, # GREEK SMALL LETTER UPSILON
0x00ad: 0x03c6, # GREEK SMALL LETTER PHI
0x00ae: 0x03c7, # GREEK SMALL LETTER CHI
0x00af: 0x03c8, # GREEK SMALL LETTER PSI
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03c9, # GREEK SMALL LETTER OMEGA
0x00e1: 0x03ac, # GREEK SMALL LETTER ALPHA WITH TONOS
0x00e2: 0x03ad, # GREEK SMALL LETTER EPSILON WITH TONOS
0x00e3: 0x03ae, # GREEK SMALL LETTER ETA WITH TONOS
0x00e4: 0x03ca, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x00e5: 0x03af, # GREEK SMALL LETTER IOTA WITH TONOS
0x00e6: 0x03cc, # GREEK SMALL LETTER OMICRON WITH TONOS
0x00e7: 0x03cd, # GREEK SMALL LETTER UPSILON WITH TONOS
0x00e8: 0x03cb, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x00e9: 0x03ce, # GREEK SMALL LETTER OMEGA WITH TONOS
0x00ea: 0x0386, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x00eb: 0x0388, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x00ec: 0x0389, # GREEK CAPITAL LETTER ETA WITH TONOS
0x00ed: 0x038a, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x00ee: 0x038c, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x00ef: 0x038e, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x00f0: 0x038f, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x03aa, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x00f5: 0x03ab, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\u0391' # 0x0080 -> GREEK CAPITAL LETTER ALPHA
'\u0392' # 0x0081 -> GREEK CAPITAL LETTER BETA
'\u0393' # 0x0082 -> GREEK CAPITAL LETTER GAMMA
'\u0394' # 0x0083 -> GREEK CAPITAL LETTER DELTA
'\u0395' # 0x0084 -> GREEK CAPITAL LETTER EPSILON
'\u0396' # 0x0085 -> GREEK CAPITAL LETTER ZETA
'\u0397' # 0x0086 -> GREEK CAPITAL LETTER ETA
'\u0398' # 0x0087 -> GREEK CAPITAL LETTER THETA
'\u0399' # 0x0088 -> GREEK CAPITAL LETTER IOTA
'\u039a' # 0x0089 -> GREEK CAPITAL LETTER KAPPA
'\u039b' # 0x008a -> GREEK CAPITAL LETTER LAMDA
'\u039c' # 0x008b -> GREEK CAPITAL LETTER MU
'\u039d' # 0x008c -> GREEK CAPITAL LETTER NU
'\u039e' # 0x008d -> GREEK CAPITAL LETTER XI
'\u039f' # 0x008e -> GREEK CAPITAL LETTER OMICRON
'\u03a0' # 0x008f -> GREEK CAPITAL LETTER PI
'\u03a1' # 0x0090 -> GREEK CAPITAL LETTER RHO
'\u03a3' # 0x0091 -> GREEK CAPITAL LETTER SIGMA
'\u03a4' # 0x0092 -> GREEK CAPITAL LETTER TAU
'\u03a5' # 0x0093 -> GREEK CAPITAL LETTER UPSILON
'\u03a6' # 0x0094 -> GREEK CAPITAL LETTER PHI
'\u03a7' # 0x0095 -> GREEK CAPITAL LETTER CHI
'\u03a8' # 0x0096 -> GREEK CAPITAL LETTER PSI
'\u03a9' # 0x0097 -> GREEK CAPITAL LETTER OMEGA
'\u03b1' # 0x0098 -> GREEK SMALL LETTER ALPHA
'\u03b2' # 0x0099 -> GREEK SMALL LETTER BETA
'\u03b3' # 0x009a -> GREEK SMALL LETTER GAMMA
'\u03b4' # 0x009b -> GREEK SMALL LETTER DELTA
'\u03b5' # 0x009c -> GREEK SMALL LETTER EPSILON
'\u03b6' # 0x009d -> GREEK SMALL LETTER ZETA
'\u03b7' # 0x009e -> GREEK SMALL LETTER ETA
'\u03b8' # 0x009f -> GREEK SMALL LETTER THETA
'\u03b9' # 0x00a0 -> GREEK SMALL LETTER IOTA
'\u03ba' # 0x00a1 -> GREEK SMALL LETTER KAPPA
'\u03bb' # 0x00a2 -> GREEK SMALL LETTER LAMDA
'\u03bc' # 0x00a3 -> GREEK SMALL LETTER MU
'\u03bd' # 0x00a4 -> GREEK SMALL LETTER NU
'\u03be' # 0x00a5 -> GREEK SMALL LETTER XI
'\u03bf' # 0x00a6 -> GREEK SMALL LETTER OMICRON
'\u03c0' # 0x00a7 -> GREEK SMALL LETTER PI
'\u03c1' # 0x00a8 -> GREEK SMALL LETTER RHO
'\u03c3' # 0x00a9 -> GREEK SMALL LETTER SIGMA
'\u03c2' # 0x00aa -> GREEK SMALL LETTER FINAL SIGMA
'\u03c4' # 0x00ab -> GREEK SMALL LETTER TAU
'\u03c5' # 0x00ac -> GREEK SMALL LETTER UPSILON
'\u03c6' # 0x00ad -> GREEK SMALL LETTER PHI
'\u03c7' # 0x00ae -> GREEK SMALL LETTER CHI
'\u03c8' # 0x00af -> GREEK SMALL LETTER PSI
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03c9' # 0x00e0 -> GREEK SMALL LETTER OMEGA
'\u03ac' # 0x00e1 -> GREEK SMALL LETTER ALPHA WITH TONOS
'\u03ad' # 0x00e2 -> GREEK SMALL LETTER EPSILON WITH TONOS
'\u03ae' # 0x00e3 -> GREEK SMALL LETTER ETA WITH TONOS
'\u03ca' # 0x00e4 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
'\u03af' # 0x00e5 -> GREEK SMALL LETTER IOTA WITH TONOS
'\u03cc' # 0x00e6 -> GREEK SMALL LETTER OMICRON WITH TONOS
'\u03cd' # 0x00e7 -> GREEK SMALL LETTER UPSILON WITH TONOS
'\u03cb' # 0x00e8 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
'\u03ce' # 0x00e9 -> GREEK SMALL LETTER OMEGA WITH TONOS
'\u0386' # 0x00ea -> GREEK CAPITAL LETTER ALPHA WITH TONOS
'\u0388' # 0x00eb -> GREEK CAPITAL LETTER EPSILON WITH TONOS
'\u0389' # 0x00ec -> GREEK CAPITAL LETTER ETA WITH TONOS
'\u038a' # 0x00ed -> GREEK CAPITAL LETTER IOTA WITH TONOS
'\u038c' # 0x00ee -> GREEK CAPITAL LETTER OMICRON WITH TONOS
'\u038e' # 0x00ef -> GREEK CAPITAL LETTER UPSILON WITH TONOS
'\u038f' # 0x00f0 -> GREEK CAPITAL LETTER OMEGA WITH TONOS
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
'\u03aa' # 0x00f4 -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
'\u03ab' # 0x00f5 -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u2248' # 0x00f7 -> ALMOST EQUAL TO
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
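
### Illustrative sketch (assumption: every byte position is mapped, as it is
### here): the 256-character string above is derivable from the dict-based
### decoding_map; gencodec.py marks unmapped positions with U+FFFE.
def _table_from_map(m):
    # Build a charmap_decode-compatible string table from a byte -> code point dict.
    return ''.join(chr(m.get(i, 0xfffe)) for i in range(256))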
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b7: 0x00fa, # MIDDLE DOT
0x00f7: 0x00f6, # DIVISION SIGN
0x0386: 0x00ea, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x0388: 0x00eb, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x0389: 0x00ec, # GREEK CAPITAL LETTER ETA WITH TONOS
0x038a: 0x00ed, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x038c: 0x00ee, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x038e: 0x00ef, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x038f: 0x00f0, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x0391: 0x0080, # GREEK CAPITAL LETTER ALPHA
0x0392: 0x0081, # GREEK CAPITAL LETTER BETA
0x0393: 0x0082, # GREEK CAPITAL LETTER GAMMA
0x0394: 0x0083, # GREEK CAPITAL LETTER DELTA
0x0395: 0x0084, # GREEK CAPITAL LETTER EPSILON
0x0396: 0x0085, # GREEK CAPITAL LETTER ZETA
0x0397: 0x0086, # GREEK CAPITAL LETTER ETA
0x0398: 0x0087, # GREEK CAPITAL LETTER THETA
0x0399: 0x0088, # GREEK CAPITAL LETTER IOTA
0x039a: 0x0089, # GREEK CAPITAL LETTER KAPPA
0x039b: 0x008a, # GREEK CAPITAL LETTER LAMDA
0x039c: 0x008b, # GREEK CAPITAL LETTER MU
0x039d: 0x008c, # GREEK CAPITAL LETTER NU
0x039e: 0x008d, # GREEK CAPITAL LETTER XI
0x039f: 0x008e, # GREEK CAPITAL LETTER OMICRON
0x03a0: 0x008f, # GREEK CAPITAL LETTER PI
0x03a1: 0x0090, # GREEK CAPITAL LETTER RHO
0x03a3: 0x0091, # GREEK CAPITAL LETTER SIGMA
0x03a4: 0x0092, # GREEK CAPITAL LETTER TAU
0x03a5: 0x0093, # GREEK CAPITAL LETTER UPSILON
0x03a6: 0x0094, # GREEK CAPITAL LETTER PHI
0x03a7: 0x0095, # GREEK CAPITAL LETTER CHI
0x03a8: 0x0096, # GREEK CAPITAL LETTER PSI
0x03a9: 0x0097, # GREEK CAPITAL LETTER OMEGA
0x03aa: 0x00f4, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x03ab: 0x00f5, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x03ac: 0x00e1, # GREEK SMALL LETTER ALPHA WITH TONOS
0x03ad: 0x00e2, # GREEK SMALL LETTER EPSILON WITH TONOS
0x03ae: 0x00e3, # GREEK SMALL LETTER ETA WITH TONOS
0x03af: 0x00e5, # GREEK SMALL LETTER IOTA WITH TONOS
0x03b1: 0x0098, # GREEK SMALL LETTER ALPHA
0x03b2: 0x0099, # GREEK SMALL LETTER BETA
0x03b3: 0x009a, # GREEK SMALL LETTER GAMMA
0x03b4: 0x009b, # GREEK SMALL LETTER DELTA
0x03b5: 0x009c, # GREEK SMALL LETTER EPSILON
0x03b6: 0x009d, # GREEK SMALL LETTER ZETA
0x03b7: 0x009e, # GREEK SMALL LETTER ETA
0x03b8: 0x009f, # GREEK SMALL LETTER THETA
0x03b9: 0x00a0, # GREEK SMALL LETTER IOTA
0x03ba: 0x00a1, # GREEK SMALL LETTER KAPPA
0x03bb: 0x00a2, # GREEK SMALL LETTER LAMDA
0x03bc: 0x00a3, # GREEK SMALL LETTER MU
0x03bd: 0x00a4, # GREEK SMALL LETTER NU
0x03be: 0x00a5, # GREEK SMALL LETTER XI
0x03bf: 0x00a6, # GREEK SMALL LETTER OMICRON
0x03c0: 0x00a7, # GREEK SMALL LETTER PI
0x03c1: 0x00a8, # GREEK SMALL LETTER RHO
0x03c2: 0x00aa, # GREEK SMALL LETTER FINAL SIGMA
0x03c3: 0x00a9, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00ab, # GREEK SMALL LETTER TAU
0x03c5: 0x00ac, # GREEK SMALL LETTER UPSILON
0x03c6: 0x00ad, # GREEK SMALL LETTER PHI
0x03c7: 0x00ae, # GREEK SMALL LETTER CHI
0x03c8: 0x00af, # GREEK SMALL LETTER PSI
0x03c9: 0x00e0, # GREEK SMALL LETTER OMEGA
0x03ca: 0x00e4, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x03cb: 0x00e8, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x03cc: 0x00e6, # GREEK SMALL LETTER OMICRON WITH TONOS
0x03cd: 0x00e7, # GREEK SMALL LETTER UPSILON WITH TONOS
0x03ce: 0x00e9, # GREEK SMALL LETTER OMEGA WITH TONOS
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
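
### Note (illustrative addition): the encoding map above is simply the inverse
### of the decoding map. codecs.make_encoding_map rebuilds it, mapping any code
### point reachable from more than one byte to None.
# encoding_map = codecs.make_encoding_map(decoding_map)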
""" Python Character Mapping Codec cp775 generated from 'VENDORS/MICSFT/PC/CP775.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp775',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
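
### Usage sketch (illustrative addition, not emitted by gencodec.py): the
### stdlib encodings package locates this module by codec name and calls
### getregentry(); a manual registration would look like the hypothetical
### search function below.
def _cp775_search(name):
    # Hand back this module's CodecInfo when the registry asks for 'cp775'.
    return getregentry() if name == 'cp775' else None
# codecs.register(_cp775_search)  # the encodings package supplies the real search hook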
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0106, # LATIN CAPITAL LETTER C WITH ACUTE
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x0101, # LATIN SMALL LETTER A WITH MACRON
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x0123, # LATIN SMALL LETTER G WITH CEDILLA
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x0107, # LATIN SMALL LETTER C WITH ACUTE
0x0088: 0x0142, # LATIN SMALL LETTER L WITH STROKE
0x0089: 0x0113, # LATIN SMALL LETTER E WITH MACRON
0x008a: 0x0156, # LATIN CAPITAL LETTER R WITH CEDILLA
0x008b: 0x0157, # LATIN SMALL LETTER R WITH CEDILLA
0x008c: 0x012b, # LATIN SMALL LETTER I WITH MACRON
0x008d: 0x0179, # LATIN CAPITAL LETTER Z WITH ACUTE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x014d, # LATIN SMALL LETTER O WITH MACRON
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x0122, # LATIN CAPITAL LETTER G WITH CEDILLA
0x0096: 0x00a2, # CENT SIGN
0x0097: 0x015a, # LATIN CAPITAL LETTER S WITH ACUTE
0x0098: 0x015b, # LATIN SMALL LETTER S WITH ACUTE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x00a4, # CURRENCY SIGN
0x00a0: 0x0100, # LATIN CAPITAL LETTER A WITH MACRON
0x00a1: 0x012a, # LATIN CAPITAL LETTER I WITH MACRON
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x017b, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x00a4: 0x017c, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x00a5: 0x017a, # LATIN SMALL LETTER Z WITH ACUTE
0x00a6: 0x201d, # RIGHT DOUBLE QUOTATION MARK
0x00a7: 0x00a6, # BROKEN BAR
0x00a8: 0x00a9, # COPYRIGHT SIGN
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x0141, # LATIN CAPITAL LETTER L WITH STROKE
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x0104, # LATIN CAPITAL LETTER A WITH OGONEK
0x00b6: 0x010c, # LATIN CAPITAL LETTER C WITH CARON
0x00b7: 0x0118, # LATIN CAPITAL LETTER E WITH OGONEK
0x00b8: 0x0116, # LATIN CAPITAL LETTER E WITH DOT ABOVE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x012e, # LATIN CAPITAL LETTER I WITH OGONEK
0x00be: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x0172, # LATIN CAPITAL LETTER U WITH OGONEK
0x00c7: 0x016a, # LATIN CAPITAL LETTER U WITH MACRON
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x017d, # LATIN CAPITAL LETTER Z WITH CARON
0x00d0: 0x0105, # LATIN SMALL LETTER A WITH OGONEK
0x00d1: 0x010d, # LATIN SMALL LETTER C WITH CARON
0x00d2: 0x0119, # LATIN SMALL LETTER E WITH OGONEK
0x00d3: 0x0117, # LATIN SMALL LETTER E WITH DOT ABOVE
0x00d4: 0x012f, # LATIN SMALL LETTER I WITH OGONEK
0x00d5: 0x0161, # LATIN SMALL LETTER S WITH CARON
0x00d6: 0x0173, # LATIN SMALL LETTER U WITH OGONEK
0x00d7: 0x016b, # LATIN SMALL LETTER U WITH MACRON
0x00d8: 0x017e, # LATIN SMALL LETTER Z WITH CARON
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e2: 0x014c, # LATIN CAPITAL LETTER O WITH MACRON
0x00e3: 0x0143, # LATIN CAPITAL LETTER N WITH ACUTE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x0144, # LATIN SMALL LETTER N WITH ACUTE
0x00e8: 0x0136, # LATIN CAPITAL LETTER K WITH CEDILLA
0x00e9: 0x0137, # LATIN SMALL LETTER K WITH CEDILLA
0x00ea: 0x013b, # LATIN CAPITAL LETTER L WITH CEDILLA
0x00eb: 0x013c, # LATIN SMALL LETTER L WITH CEDILLA
0x00ec: 0x0146, # LATIN SMALL LETTER N WITH CEDILLA
0x00ed: 0x0112, # LATIN CAPITAL LETTER E WITH MACRON
0x00ee: 0x0145, # LATIN CAPITAL LETTER N WITH CEDILLA
0x00ef: 0x2019, # RIGHT SINGLE QUOTATION MARK
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x201c, # LEFT DOUBLE QUOTATION MARK
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x201e, # DOUBLE LOW-9 QUOTATION MARK
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\u0106' # 0x0080 -> LATIN CAPITAL LETTER C WITH ACUTE
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\u0101' # 0x0083 -> LATIN SMALL LETTER A WITH MACRON
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\u0123' # 0x0085 -> LATIN SMALL LETTER G WITH CEDILLA
'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
'\u0107' # 0x0087 -> LATIN SMALL LETTER C WITH ACUTE
'\u0142' # 0x0088 -> LATIN SMALL LETTER L WITH STROKE
'\u0113' # 0x0089 -> LATIN SMALL LETTER E WITH MACRON
'\u0156' # 0x008a -> LATIN CAPITAL LETTER R WITH CEDILLA
'\u0157' # 0x008b -> LATIN SMALL LETTER R WITH CEDILLA
'\u012b' # 0x008c -> LATIN SMALL LETTER I WITH MACRON
'\u0179' # 0x008d -> LATIN CAPITAL LETTER Z WITH ACUTE
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
'\u014d' # 0x0093 -> LATIN SMALL LETTER O WITH MACRON
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\u0122' # 0x0095 -> LATIN CAPITAL LETTER G WITH CEDILLA
'\xa2' # 0x0096 -> CENT SIGN
'\u015a' # 0x0097 -> LATIN CAPITAL LETTER S WITH ACUTE
'\u015b' # 0x0098 -> LATIN SMALL LETTER S WITH ACUTE
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
'\xa3' # 0x009c -> POUND SIGN
'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
'\xd7' # 0x009e -> MULTIPLICATION SIGN
'\xa4' # 0x009f -> CURRENCY SIGN
'\u0100' # 0x00a0 -> LATIN CAPITAL LETTER A WITH MACRON
'\u012a' # 0x00a1 -> LATIN CAPITAL LETTER I WITH MACRON
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\u017b' # 0x00a3 -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\u017c' # 0x00a4 -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u017a' # 0x00a5 -> LATIN SMALL LETTER Z WITH ACUTE
'\u201d' # 0x00a6 -> RIGHT DOUBLE QUOTATION MARK
'\xa6' # 0x00a7 -> BROKEN BAR
'\xa9' # 0x00a8 -> COPYRIGHT SIGN
'\xae' # 0x00a9 -> REGISTERED SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\u0141' # 0x00ad -> LATIN CAPITAL LETTER L WITH STROKE
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u0104' # 0x00b5 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u010c' # 0x00b6 -> LATIN CAPITAL LETTER C WITH CARON
'\u0118' # 0x00b7 -> LATIN CAPITAL LETTER E WITH OGONEK
'\u0116' # 0x00b8 -> LATIN CAPITAL LETTER E WITH DOT ABOVE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u012e' # 0x00bd -> LATIN CAPITAL LETTER I WITH OGONEK
'\u0160' # 0x00be -> LATIN CAPITAL LETTER S WITH CARON
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u0172' # 0x00c6 -> LATIN CAPITAL LETTER U WITH OGONEK
'\u016a' # 0x00c7 -> LATIN CAPITAL LETTER U WITH MACRON
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u017d' # 0x00cf -> LATIN CAPITAL LETTER Z WITH CARON
'\u0105' # 0x00d0 -> LATIN SMALL LETTER A WITH OGONEK
'\u010d' # 0x00d1 -> LATIN SMALL LETTER C WITH CARON
'\u0119' # 0x00d2 -> LATIN SMALL LETTER E WITH OGONEK
'\u0117' # 0x00d3 -> LATIN SMALL LETTER E WITH DOT ABOVE
'\u012f' # 0x00d4 -> LATIN SMALL LETTER I WITH OGONEK
'\u0161' # 0x00d5 -> LATIN SMALL LETTER S WITH CARON
'\u0173' # 0x00d6 -> LATIN SMALL LETTER U WITH OGONEK
'\u016b' # 0x00d7 -> LATIN SMALL LETTER U WITH MACRON
'\u017e' # 0x00d8 -> LATIN SMALL LETTER Z WITH CARON
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S (GERMAN)
'\u014c' # 0x00e2 -> LATIN CAPITAL LETTER O WITH MACRON
'\u0143' # 0x00e3 -> LATIN CAPITAL LETTER N WITH ACUTE
'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xb5' # 0x00e6 -> MICRO SIGN
'\u0144' # 0x00e7 -> LATIN SMALL LETTER N WITH ACUTE
'\u0136' # 0x00e8 -> LATIN CAPITAL LETTER K WITH CEDILLA
'\u0137' # 0x00e9 -> LATIN SMALL LETTER K WITH CEDILLA
'\u013b' # 0x00ea -> LATIN CAPITAL LETTER L WITH CEDILLA
'\u013c' # 0x00eb -> LATIN SMALL LETTER L WITH CEDILLA
'\u0146' # 0x00ec -> LATIN SMALL LETTER N WITH CEDILLA
'\u0112' # 0x00ed -> LATIN CAPITAL LETTER E WITH MACRON
'\u0145' # 0x00ee -> LATIN CAPITAL LETTER N WITH CEDILLA
'\u2019' # 0x00ef -> RIGHT SINGLE QUOTATION MARK
'\xad' # 0x00f0 -> SOFT HYPHEN
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u201c' # 0x00f2 -> LEFT DOUBLE QUOTATION MARK
'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
'\xb6' # 0x00f4 -> PILCROW SIGN
'\xa7' # 0x00f5 -> SECTION SIGN
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u201e' # 0x00f7 -> DOUBLE LOW-9 QUOTATION MARK
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\xb9' # 0x00fb -> SUPERSCRIPT ONE
'\xb3' # 0x00fc -> SUPERSCRIPT THREE
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a2: 0x0096, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x009f, # CURRENCY SIGN
0x00a6: 0x00a7, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a9: 0x00a8, # COPYRIGHT SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x0100: 0x00a0, # LATIN CAPITAL LETTER A WITH MACRON
0x0101: 0x0083, # LATIN SMALL LETTER A WITH MACRON
0x0104: 0x00b5, # LATIN CAPITAL LETTER A WITH OGONEK
0x0105: 0x00d0, # LATIN SMALL LETTER A WITH OGONEK
0x0106: 0x0080, # LATIN CAPITAL LETTER C WITH ACUTE
0x0107: 0x0087, # LATIN SMALL LETTER C WITH ACUTE
0x010c: 0x00b6, # LATIN CAPITAL LETTER C WITH CARON
0x010d: 0x00d1, # LATIN SMALL LETTER C WITH CARON
0x0112: 0x00ed, # LATIN CAPITAL LETTER E WITH MACRON
0x0113: 0x0089, # LATIN SMALL LETTER E WITH MACRON
0x0116: 0x00b8, # LATIN CAPITAL LETTER E WITH DOT ABOVE
0x0117: 0x00d3, # LATIN SMALL LETTER E WITH DOT ABOVE
0x0118: 0x00b7, # LATIN CAPITAL LETTER E WITH OGONEK
0x0119: 0x00d2, # LATIN SMALL LETTER E WITH OGONEK
0x0122: 0x0095, # LATIN CAPITAL LETTER G WITH CEDILLA
0x0123: 0x0085, # LATIN SMALL LETTER G WITH CEDILLA
0x012a: 0x00a1, # LATIN CAPITAL LETTER I WITH MACRON
0x012b: 0x008c, # LATIN SMALL LETTER I WITH MACRON
0x012e: 0x00bd, # LATIN CAPITAL LETTER I WITH OGONEK
0x012f: 0x00d4, # LATIN SMALL LETTER I WITH OGONEK
0x0136: 0x00e8, # LATIN CAPITAL LETTER K WITH CEDILLA
0x0137: 0x00e9, # LATIN SMALL LETTER K WITH CEDILLA
0x013b: 0x00ea, # LATIN CAPITAL LETTER L WITH CEDILLA
0x013c: 0x00eb, # LATIN SMALL LETTER L WITH CEDILLA
0x0141: 0x00ad, # LATIN CAPITAL LETTER L WITH STROKE
0x0142: 0x0088, # LATIN SMALL LETTER L WITH STROKE
0x0143: 0x00e3, # LATIN CAPITAL LETTER N WITH ACUTE
0x0144: 0x00e7, # LATIN SMALL LETTER N WITH ACUTE
0x0145: 0x00ee, # LATIN CAPITAL LETTER N WITH CEDILLA
0x0146: 0x00ec, # LATIN SMALL LETTER N WITH CEDILLA
0x014c: 0x00e2, # LATIN CAPITAL LETTER O WITH MACRON
0x014d: 0x0093, # LATIN SMALL LETTER O WITH MACRON
0x0156: 0x008a, # LATIN CAPITAL LETTER R WITH CEDILLA
0x0157: 0x008b, # LATIN SMALL LETTER R WITH CEDILLA
0x015a: 0x0097, # LATIN CAPITAL LETTER S WITH ACUTE
0x015b: 0x0098, # LATIN SMALL LETTER S WITH ACUTE
0x0160: 0x00be, # LATIN CAPITAL LETTER S WITH CARON
0x0161: 0x00d5, # LATIN SMALL LETTER S WITH CARON
0x016a: 0x00c7, # LATIN CAPITAL LETTER U WITH MACRON
0x016b: 0x00d7, # LATIN SMALL LETTER U WITH MACRON
0x0172: 0x00c6, # LATIN CAPITAL LETTER U WITH OGONEK
0x0173: 0x00d6, # LATIN SMALL LETTER U WITH OGONEK
0x0179: 0x008d, # LATIN CAPITAL LETTER Z WITH ACUTE
0x017a: 0x00a5, # LATIN SMALL LETTER Z WITH ACUTE
0x017b: 0x00a3, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x017c: 0x00a4, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x017d: 0x00cf, # LATIN CAPITAL LETTER Z WITH CARON
0x017e: 0x00d8, # LATIN SMALL LETTER Z WITH CARON
0x2019: 0x00ef, # RIGHT SINGLE QUOTATION MARK
0x201c: 0x00f2, # LEFT DOUBLE QUOTATION MARK
0x201d: 0x00a6, # RIGHT DOUBLE QUOTATION MARK
0x201e: 0x00f7, # DOUBLE LOW-9 QUOTATION MARK
0x2219: 0x00f9, # BULLET OPERATOR
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
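
### Self-test sketch (illustrative addition): round-trip a few CP775 bytes
### through the tables defined above; equivalent to bytes.decode('cp775') and
### str.encode('cp775') once the codec is registered.
def _cp775_roundtrip():
    sample = b'\x80\x96\xd0'  # 0x80 -> U+0106, 0x96 -> U+00A2, 0xd0 -> U+0105
    text = codecs.charmap_decode(sample, 'strict', decoding_table)[0]
    assert text == '\u0106\xa2\u0105'
    assert codecs.charmap_encode(text, 'strict', encoding_map)[0] == sample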
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP850.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp850',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
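
### Usage sketch (illustrative addition): single-byte charmaps carry no state
### between calls, so the incremental classes can decode a stream chunk by
### chunk once the module's tables below are defined.
# dec = IncrementalDecoder('strict')
# parts = [dec.decode(chunk) for chunk in (b'\x80', b'\x87')]
# assert ''.join(parts) == '\xc7\xe7'  # 'Ç' + 'ç' in cp850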
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x00b8: 0x00a9, # COPYRIGHT SIGN
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x00a2, # CENT SIGN
0x00be: 0x00a5, # YEN SIGN
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x00c7: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x00f0, # LATIN SMALL LETTER ETH
0x00d1: 0x00d0, # LATIN CAPITAL LETTER ETH
0x00d2: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x00d5: 0x0131, # LATIN SMALL LETTER DOTLESS I
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x00a6, # BROKEN BAR
0x00de: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x00fe, # LATIN SMALL LETTER THORN
0x00e8: 0x00de, # LATIN CAPITAL LETTER THORN
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00eb: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x00ec: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00ed: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00ee: 0x00af, # MACRON
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2017, # DOUBLE LOW LINE
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
'\xa3' # 0x009c -> POUND SIGN
'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
'\xd7' # 0x009e -> MULTIPLICATION SIGN
'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\xae' # 0x00a9 -> REGISTERED SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc0' # 0x00b7 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xa9' # 0x00b8 -> COPYRIGHT SIGN
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\xa2' # 0x00bd -> CENT SIGN
'\xa5' # 0x00be -> YEN SIGN
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\xe3' # 0x00c6 -> LATIN SMALL LETTER A WITH TILDE
'\xc3' # 0x00c7 -> LATIN CAPITAL LETTER A WITH TILDE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa4' # 0x00cf -> CURRENCY SIGN
'\xf0' # 0x00d0 -> LATIN SMALL LETTER ETH
'\xd0' # 0x00d1 -> LATIN CAPITAL LETTER ETH
'\xca' # 0x00d2 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x00d4 -> LATIN CAPITAL LETTER E WITH GRAVE
'\u0131' # 0x00d5 -> LATIN SMALL LETTER DOTLESS I
'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x00d8 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\xa6' # 0x00dd -> BROKEN BAR
'\xcc' # 0x00de -> LATIN CAPITAL LETTER I WITH GRAVE
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd2' # 0x00e3 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xb5' # 0x00e6 -> MICRO SIGN
'\xfe' # 0x00e7 -> LATIN SMALL LETTER THORN
'\xde' # 0x00e8 -> LATIN CAPITAL LETTER THORN
'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0x00ea -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0x00eb -> LATIN CAPITAL LETTER U WITH GRAVE
'\xfd' # 0x00ec -> LATIN SMALL LETTER Y WITH ACUTE
'\xdd' # 0x00ed -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xaf' # 0x00ee -> MACRON
'\xb4' # 0x00ef -> ACUTE ACCENT
'\xad' # 0x00f0 -> SOFT HYPHEN
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2017' # 0x00f2 -> DOUBLE LOW LINE
'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
'\xb6' # 0x00f4 -> PILCROW SIGN
'\xa7' # 0x00f5 -> SECTION SIGN
'\xf7' # 0x00f6 -> DIVISION SIGN
'\xb8' # 0x00f7 -> CEDILLA
'\xb0' # 0x00f8 -> DEGREE SIGN
'\xa8' # 0x00f9 -> DIAERESIS
'\xb7' # 0x00fa -> MIDDLE DOT
'\xb9' # 0x00fb -> SUPERSCRIPT ONE
'\xb3' # 0x00fc -> SUPERSCRIPT THREE
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x00bd, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a5: 0x00be, # YEN SIGN
0x00a6: 0x00dd, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x00b8, # COPYRIGHT SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00af: 0x00ee, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00f7, # CEDILLA
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x00b7, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x00c7, # LATIN CAPITAL LETTER A WITH TILDE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x00d4, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x00d2, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cc: 0x00de, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x00d8, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d0: 0x00d1, # LATIN CAPITAL LETTER ETH
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00e3, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00d9: 0x00eb, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00db: 0x00ea, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x00ed, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00de: 0x00e8, # LATIN CAPITAL LETTER THORN
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x00c6, # LATIN SMALL LETTER A WITH TILDE
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f0: 0x00d0, # LATIN SMALL LETTER ETH
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x00ec, # LATIN SMALL LETTER Y WITH ACUTE
0x00fe: 0x00e7, # LATIN SMALL LETTER THORN
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0131: 0x00d5, # LATIN SMALL LETTER DOTLESS I
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x2017: 0x00f2, # DOUBLE LOW LINE
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
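# A minimal round-trip sketch; the tables above match the stdlib cp850 codec,
# and this assumes 'cp850' is registered (as it is in CPython). It shows that
# encoding_map is the inverse of the decoding side: decode a byte through the
# decoding table, then encode the character back through the encoding map.
if __name__ == '__main__':
    raw = bytes([0x86])                    # 0x86 -> U+00E5 per the decoding table
    text = raw.decode('cp850')
    assert text == '\xe5'                  # LATIN SMALL LETTER A WITH RING ABOVE
    assert text.encode('cp850') == raw     # encoding_map undoes the decoding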
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP852.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp852',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
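# A minimal lookup sketch, assuming the stdlib encodings package imports this
# module as encodings.cp852: codecs.lookup() resolves the name to the
# CodecInfo returned by getregentry() above.
if __name__ == '__main__':
    import codecs
    assert codecs.lookup('cp852').name == 'cp852'
    assert b'\x88'.decode('cp852') == '\u0142'   # 0x88 -> LATIN SMALL LETTER L WITH STROKE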
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x016f, # LATIN SMALL LETTER U WITH RING ABOVE
0x0086: 0x0107, # LATIN SMALL LETTER C WITH ACUTE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x0142, # LATIN SMALL LETTER L WITH STROKE
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x0150, # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0x008b: 0x0151, # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x0179, # LATIN CAPITAL LETTER Z WITH ACUTE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x0106, # LATIN CAPITAL LETTER C WITH ACUTE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x0139, # LATIN CAPITAL LETTER L WITH ACUTE
0x0092: 0x013a, # LATIN SMALL LETTER L WITH ACUTE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x013d, # LATIN CAPITAL LETTER L WITH CARON
0x0096: 0x013e, # LATIN SMALL LETTER L WITH CARON
0x0097: 0x015a, # LATIN CAPITAL LETTER S WITH ACUTE
0x0098: 0x015b, # LATIN SMALL LETTER S WITH ACUTE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x0164, # LATIN CAPITAL LETTER T WITH CARON
0x009c: 0x0165, # LATIN SMALL LETTER T WITH CARON
0x009d: 0x0141, # LATIN CAPITAL LETTER L WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x010d, # LATIN SMALL LETTER C WITH CARON
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x0104, # LATIN CAPITAL LETTER A WITH OGONEK
0x00a5: 0x0105, # LATIN SMALL LETTER A WITH OGONEK
0x00a6: 0x017d, # LATIN CAPITAL LETTER Z WITH CARON
0x00a7: 0x017e, # LATIN SMALL LETTER Z WITH CARON
0x00a8: 0x0118, # LATIN CAPITAL LETTER E WITH OGONEK
0x00a9: 0x0119, # LATIN SMALL LETTER E WITH OGONEK
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x017a, # LATIN SMALL LETTER Z WITH ACUTE
0x00ac: 0x010c, # LATIN CAPITAL LETTER C WITH CARON
0x00ad: 0x015f, # LATIN SMALL LETTER S WITH CEDILLA
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x011a, # LATIN CAPITAL LETTER E WITH CARON
0x00b8: 0x015e, # LATIN CAPITAL LETTER S WITH CEDILLA
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x017b, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x00be: 0x017c, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x0102, # LATIN CAPITAL LETTER A WITH BREVE
0x00c7: 0x0103, # LATIN SMALL LETTER A WITH BREVE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x0111, # LATIN SMALL LETTER D WITH STROKE
0x00d1: 0x0110, # LATIN CAPITAL LETTER D WITH STROKE
0x00d2: 0x010e, # LATIN CAPITAL LETTER D WITH CARON
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x010f, # LATIN SMALL LETTER D WITH CARON
0x00d5: 0x0147, # LATIN CAPITAL LETTER N WITH CARON
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x011b, # LATIN SMALL LETTER E WITH CARON
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x0162, # LATIN CAPITAL LETTER T WITH CEDILLA
0x00de: 0x016e, # LATIN CAPITAL LETTER U WITH RING ABOVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x0143, # LATIN CAPITAL LETTER N WITH ACUTE
0x00e4: 0x0144, # LATIN SMALL LETTER N WITH ACUTE
0x00e5: 0x0148, # LATIN SMALL LETTER N WITH CARON
0x00e6: 0x0160, # LATIN CAPITAL LETTER S WITH CARON
0x00e7: 0x0161, # LATIN SMALL LETTER S WITH CARON
0x00e8: 0x0154, # LATIN CAPITAL LETTER R WITH ACUTE
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x0155, # LATIN SMALL LETTER R WITH ACUTE
0x00eb: 0x0170, # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0x00ec: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00ed: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00ee: 0x0163, # LATIN SMALL LETTER T WITH CEDILLA
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x02dd, # DOUBLE ACUTE ACCENT
0x00f2: 0x02db, # OGONEK
0x00f3: 0x02c7, # CARON
0x00f4: 0x02d8, # BREVE
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x02d9, # DOT ABOVE
0x00fb: 0x0171, # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0x00fc: 0x0158, # LATIN CAPITAL LETTER R WITH CARON
0x00fd: 0x0159, # LATIN SMALL LETTER R WITH CARON
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\u016f' # 0x0085 -> LATIN SMALL LETTER U WITH RING ABOVE
'\u0107' # 0x0086 -> LATIN SMALL LETTER C WITH ACUTE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\u0142' # 0x0088 -> LATIN SMALL LETTER L WITH STROKE
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\u0150' # 0x008a -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
'\u0151' # 0x008b -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\u0179' # 0x008d -> LATIN CAPITAL LETTER Z WITH ACUTE
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\u0106' # 0x008f -> LATIN CAPITAL LETTER C WITH ACUTE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0139' # 0x0091 -> LATIN CAPITAL LETTER L WITH ACUTE
'\u013a' # 0x0092 -> LATIN SMALL LETTER L WITH ACUTE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\u013d' # 0x0095 -> LATIN CAPITAL LETTER L WITH CARON
'\u013e' # 0x0096 -> LATIN SMALL LETTER L WITH CARON
'\u015a' # 0x0097 -> LATIN CAPITAL LETTER S WITH ACUTE
'\u015b' # 0x0098 -> LATIN SMALL LETTER S WITH ACUTE
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u0164' # 0x009b -> LATIN CAPITAL LETTER T WITH CARON
'\u0165' # 0x009c -> LATIN SMALL LETTER T WITH CARON
'\u0141' # 0x009d -> LATIN CAPITAL LETTER L WITH STROKE
'\xd7' # 0x009e -> MULTIPLICATION SIGN
'\u010d' # 0x009f -> LATIN SMALL LETTER C WITH CARON
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\u0104' # 0x00a4 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u0105' # 0x00a5 -> LATIN SMALL LETTER A WITH OGONEK
'\u017d' # 0x00a6 -> LATIN CAPITAL LETTER Z WITH CARON
'\u017e' # 0x00a7 -> LATIN SMALL LETTER Z WITH CARON
'\u0118' # 0x00a8 -> LATIN CAPITAL LETTER E WITH OGONEK
'\u0119' # 0x00a9 -> LATIN SMALL LETTER E WITH OGONEK
'\xac' # 0x00aa -> NOT SIGN
'\u017a' # 0x00ab -> LATIN SMALL LETTER Z WITH ACUTE
'\u010c' # 0x00ac -> LATIN CAPITAL LETTER C WITH CARON
'\u015f' # 0x00ad -> LATIN SMALL LETTER S WITH CEDILLA
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\u011a' # 0x00b7 -> LATIN CAPITAL LETTER E WITH CARON
'\u015e' # 0x00b8 -> LATIN CAPITAL LETTER S WITH CEDILLA
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u017b' # 0x00bd -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\u017c' # 0x00be -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u0102' # 0x00c6 -> LATIN CAPITAL LETTER A WITH BREVE
'\u0103' # 0x00c7 -> LATIN SMALL LETTER A WITH BREVE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa4' # 0x00cf -> CURRENCY SIGN
'\u0111' # 0x00d0 -> LATIN SMALL LETTER D WITH STROKE
'\u0110' # 0x00d1 -> LATIN CAPITAL LETTER D WITH STROKE
'\u010e' # 0x00d2 -> LATIN CAPITAL LETTER D WITH CARON
'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\u010f' # 0x00d4 -> LATIN SMALL LETTER D WITH CARON
'\u0147' # 0x00d5 -> LATIN CAPITAL LETTER N WITH CARON
'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\u011b' # 0x00d8 -> LATIN SMALL LETTER E WITH CARON
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u0162' # 0x00dd -> LATIN CAPITAL LETTER T WITH CEDILLA
'\u016e' # 0x00de -> LATIN CAPITAL LETTER U WITH RING ABOVE
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u0143' # 0x00e3 -> LATIN CAPITAL LETTER N WITH ACUTE
'\u0144' # 0x00e4 -> LATIN SMALL LETTER N WITH ACUTE
'\u0148' # 0x00e5 -> LATIN SMALL LETTER N WITH CARON
'\u0160' # 0x00e6 -> LATIN CAPITAL LETTER S WITH CARON
'\u0161' # 0x00e7 -> LATIN SMALL LETTER S WITH CARON
'\u0154' # 0x00e8 -> LATIN CAPITAL LETTER R WITH ACUTE
'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
'\u0155' # 0x00ea -> LATIN SMALL LETTER R WITH ACUTE
'\u0170' # 0x00eb -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
'\xfd' # 0x00ec -> LATIN SMALL LETTER Y WITH ACUTE
'\xdd' # 0x00ed -> LATIN CAPITAL LETTER Y WITH ACUTE
'\u0163' # 0x00ee -> LATIN SMALL LETTER T WITH CEDILLA
'\xb4' # 0x00ef -> ACUTE ACCENT
'\xad' # 0x00f0 -> SOFT HYPHEN
'\u02dd' # 0x00f1 -> DOUBLE ACUTE ACCENT
'\u02db' # 0x00f2 -> OGONEK
'\u02c7' # 0x00f3 -> CARON
'\u02d8' # 0x00f4 -> BREVE
'\xa7' # 0x00f5 -> SECTION SIGN
'\xf7' # 0x00f6 -> DIVISION SIGN
'\xb8' # 0x00f7 -> CEDILLA
'\xb0' # 0x00f8 -> DEGREE SIGN
'\xa8' # 0x00f9 -> DIAERESIS
'\u02d9' # 0x00fa -> DOT ABOVE
'\u0171' # 0x00fb -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
'\u0158' # 0x00fc -> LATIN CAPITAL LETTER R WITH CARON
'\u0159' # 0x00fd -> LATIN SMALL LETTER R WITH CARON
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
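# A short sketch of how the table is consumed: codecs.charmap_decode (used by
# Codec.decode above) indexes decoding_table by byte value, one character per
# byte.
if __name__ == '__main__':
    import codecs
    text, consumed = codecs.charmap_decode(b'\xa4\xa5', 'strict', decoding_table)
    assert (text, consumed) == ('\u0104\u0105', 2)   # A/a WITH OGONEK per the table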
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b8: 0x00f7, # CEDILLA
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x00ed, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x00ec, # LATIN SMALL LETTER Y WITH ACUTE
0x0102: 0x00c6, # LATIN CAPITAL LETTER A WITH BREVE
0x0103: 0x00c7, # LATIN SMALL LETTER A WITH BREVE
0x0104: 0x00a4, # LATIN CAPITAL LETTER A WITH OGONEK
0x0105: 0x00a5, # LATIN SMALL LETTER A WITH OGONEK
0x0106: 0x008f, # LATIN CAPITAL LETTER C WITH ACUTE
0x0107: 0x0086, # LATIN SMALL LETTER C WITH ACUTE
0x010c: 0x00ac, # LATIN CAPITAL LETTER C WITH CARON
0x010d: 0x009f, # LATIN SMALL LETTER C WITH CARON
0x010e: 0x00d2, # LATIN CAPITAL LETTER D WITH CARON
0x010f: 0x00d4, # LATIN SMALL LETTER D WITH CARON
0x0110: 0x00d1, # LATIN CAPITAL LETTER D WITH STROKE
0x0111: 0x00d0, # LATIN SMALL LETTER D WITH STROKE
0x0118: 0x00a8, # LATIN CAPITAL LETTER E WITH OGONEK
0x0119: 0x00a9, # LATIN SMALL LETTER E WITH OGONEK
0x011a: 0x00b7, # LATIN CAPITAL LETTER E WITH CARON
0x011b: 0x00d8, # LATIN SMALL LETTER E WITH CARON
0x0139: 0x0091, # LATIN CAPITAL LETTER L WITH ACUTE
0x013a: 0x0092, # LATIN SMALL LETTER L WITH ACUTE
0x013d: 0x0095, # LATIN CAPITAL LETTER L WITH CARON
0x013e: 0x0096, # LATIN SMALL LETTER L WITH CARON
0x0141: 0x009d, # LATIN CAPITAL LETTER L WITH STROKE
0x0142: 0x0088, # LATIN SMALL LETTER L WITH STROKE
0x0143: 0x00e3, # LATIN CAPITAL LETTER N WITH ACUTE
0x0144: 0x00e4, # LATIN SMALL LETTER N WITH ACUTE
0x0147: 0x00d5, # LATIN CAPITAL LETTER N WITH CARON
0x0148: 0x00e5, # LATIN SMALL LETTER N WITH CARON
0x0150: 0x008a, # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0x0151: 0x008b, # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0x0154: 0x00e8, # LATIN CAPITAL LETTER R WITH ACUTE
0x0155: 0x00ea, # LATIN SMALL LETTER R WITH ACUTE
0x0158: 0x00fc, # LATIN CAPITAL LETTER R WITH CARON
0x0159: 0x00fd, # LATIN SMALL LETTER R WITH CARON
0x015a: 0x0097, # LATIN CAPITAL LETTER S WITH ACUTE
0x015b: 0x0098, # LATIN SMALL LETTER S WITH ACUTE
0x015e: 0x00b8, # LATIN CAPITAL LETTER S WITH CEDILLA
0x015f: 0x00ad, # LATIN SMALL LETTER S WITH CEDILLA
0x0160: 0x00e6, # LATIN CAPITAL LETTER S WITH CARON
0x0161: 0x00e7, # LATIN SMALL LETTER S WITH CARON
0x0162: 0x00dd, # LATIN CAPITAL LETTER T WITH CEDILLA
0x0163: 0x00ee, # LATIN SMALL LETTER T WITH CEDILLA
0x0164: 0x009b, # LATIN CAPITAL LETTER T WITH CARON
0x0165: 0x009c, # LATIN SMALL LETTER T WITH CARON
0x016e: 0x00de, # LATIN CAPITAL LETTER U WITH RING ABOVE
0x016f: 0x0085, # LATIN SMALL LETTER U WITH RING ABOVE
0x0170: 0x00eb, # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0x0171: 0x00fb, # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0x0179: 0x008d, # LATIN CAPITAL LETTER Z WITH ACUTE
0x017a: 0x00ab, # LATIN SMALL LETTER Z WITH ACUTE
0x017b: 0x00bd, # LATIN CAPITAL LETTER Z WITH DOT ABOVE
0x017c: 0x00be, # LATIN SMALL LETTER Z WITH DOT ABOVE
0x017d: 0x00a6, # LATIN CAPITAL LETTER Z WITH CARON
0x017e: 0x00a7, # LATIN SMALL LETTER Z WITH CARON
0x02c7: 0x00f3, # CARON
0x02d8: 0x00f4, # BREVE
0x02d9: 0x00fa, # DOT ABOVE
0x02db: 0x00f2, # OGONEK
0x02dd: 0x00f1, # DOUBLE ACUTE ACCENT
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
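# A sketch of the asymmetry worth noting: encoding_map only lists code points
# that have a CP852 byte, so any other character raises UnicodeEncodeError
# under 'strict'; the usual error handlers still apply.
if __name__ == '__main__':
    import codecs
    try:
        codecs.charmap_encode('\u20ac', 'strict', encoding_map)   # EURO SIGN: no CP852 byte
    except UnicodeEncodeError:
        pass
    assert codecs.charmap_encode('\u20ac', 'replace', encoding_map)[0] == b'?'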
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP855.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp855',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
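# A minimal streaming sketch, assuming 'cp855' is registered as in CPython:
# the Incremental* classes above decode a byte stream chunk by chunk, which
# is how io.TextIOWrapper consumes codecs like this one.
if __name__ == '__main__':
    import codecs
    dec = codecs.getincrementaldecoder('cp855')()
    chunks = (b'\xa1', b'\xa3', b'\xe5')
    assert ''.join(dec.decode(c) for c in chunks) == '\u0410\u0411\u0442'  # 'АБт'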
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0452, # CYRILLIC SMALL LETTER DJE
0x0081: 0x0402, # CYRILLIC CAPITAL LETTER DJE
0x0082: 0x0453, # CYRILLIC SMALL LETTER GJE
0x0083: 0x0403, # CYRILLIC CAPITAL LETTER GJE
0x0084: 0x0451, # CYRILLIC SMALL LETTER IO
0x0085: 0x0401, # CYRILLIC CAPITAL LETTER IO
0x0086: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x0087: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x0088: 0x0455, # CYRILLIC SMALL LETTER DZE
0x0089: 0x0405, # CYRILLIC CAPITAL LETTER DZE
0x008a: 0x0456, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x008b: 0x0406, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x008c: 0x0457, # CYRILLIC SMALL LETTER YI
0x008d: 0x0407, # CYRILLIC CAPITAL LETTER YI
0x008e: 0x0458, # CYRILLIC SMALL LETTER JE
0x008f: 0x0408, # CYRILLIC CAPITAL LETTER JE
0x0090: 0x0459, # CYRILLIC SMALL LETTER LJE
0x0091: 0x0409, # CYRILLIC CAPITAL LETTER LJE
0x0092: 0x045a, # CYRILLIC SMALL LETTER NJE
0x0093: 0x040a, # CYRILLIC CAPITAL LETTER NJE
0x0094: 0x045b, # CYRILLIC SMALL LETTER TSHE
0x0095: 0x040b, # CYRILLIC CAPITAL LETTER TSHE
0x0096: 0x045c, # CYRILLIC SMALL LETTER KJE
0x0097: 0x040c, # CYRILLIC CAPITAL LETTER KJE
0x0098: 0x045e, # CYRILLIC SMALL LETTER SHORT U
0x0099: 0x040e, # CYRILLIC CAPITAL LETTER SHORT U
0x009a: 0x045f, # CYRILLIC SMALL LETTER DZHE
0x009b: 0x040f, # CYRILLIC CAPITAL LETTER DZHE
0x009c: 0x044e, # CYRILLIC SMALL LETTER YU
0x009d: 0x042e, # CYRILLIC CAPITAL LETTER YU
0x009e: 0x044a, # CYRILLIC SMALL LETTER HARD SIGN
0x009f: 0x042a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x00a0: 0x0430, # CYRILLIC SMALL LETTER A
0x00a1: 0x0410, # CYRILLIC CAPITAL LETTER A
0x00a2: 0x0431, # CYRILLIC SMALL LETTER BE
0x00a3: 0x0411, # CYRILLIC CAPITAL LETTER BE
0x00a4: 0x0446, # CYRILLIC SMALL LETTER TSE
0x00a5: 0x0426, # CYRILLIC CAPITAL LETTER TSE
0x00a6: 0x0434, # CYRILLIC SMALL LETTER DE
0x00a7: 0x0414, # CYRILLIC CAPITAL LETTER DE
0x00a8: 0x0435, # CYRILLIC SMALL LETTER IE
0x00a9: 0x0415, # CYRILLIC CAPITAL LETTER IE
0x00aa: 0x0444, # CYRILLIC SMALL LETTER EF
0x00ab: 0x0424, # CYRILLIC CAPITAL LETTER EF
0x00ac: 0x0433, # CYRILLIC SMALL LETTER GHE
0x00ad: 0x0413, # CYRILLIC CAPITAL LETTER GHE
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x0445, # CYRILLIC SMALL LETTER HA
0x00b6: 0x0425, # CYRILLIC CAPITAL LETTER HA
0x00b7: 0x0438, # CYRILLIC SMALL LETTER I
0x00b8: 0x0418, # CYRILLIC CAPITAL LETTER I
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x0439, # CYRILLIC SMALL LETTER SHORT I
0x00be: 0x0419, # CYRILLIC CAPITAL LETTER SHORT I
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x043a, # CYRILLIC SMALL LETTER KA
0x00c7: 0x041a, # CYRILLIC CAPITAL LETTER KA
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x043b, # CYRILLIC SMALL LETTER EL
0x00d1: 0x041b, # CYRILLIC CAPITAL LETTER EL
0x00d2: 0x043c, # CYRILLIC SMALL LETTER EM
0x00d3: 0x041c, # CYRILLIC CAPITAL LETTER EM
0x00d4: 0x043d, # CYRILLIC SMALL LETTER EN
0x00d5: 0x041d, # CYRILLIC CAPITAL LETTER EN
0x00d6: 0x043e, # CYRILLIC SMALL LETTER O
0x00d7: 0x041e, # CYRILLIC CAPITAL LETTER O
0x00d8: 0x043f, # CYRILLIC SMALL LETTER PE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x041f, # CYRILLIC CAPITAL LETTER PE
0x00de: 0x044f, # CYRILLIC SMALL LETTER YA
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x042f, # CYRILLIC CAPITAL LETTER YA
0x00e1: 0x0440, # CYRILLIC SMALL LETTER ER
0x00e2: 0x0420, # CYRILLIC CAPITAL LETTER ER
0x00e3: 0x0441, # CYRILLIC SMALL LETTER ES
0x00e4: 0x0421, # CYRILLIC CAPITAL LETTER ES
0x00e5: 0x0442, # CYRILLIC SMALL LETTER TE
0x00e6: 0x0422, # CYRILLIC CAPITAL LETTER TE
0x00e7: 0x0443, # CYRILLIC SMALL LETTER U
0x00e8: 0x0423, # CYRILLIC CAPITAL LETTER U
0x00e9: 0x0436, # CYRILLIC SMALL LETTER ZHE
0x00ea: 0x0416, # CYRILLIC CAPITAL LETTER ZHE
0x00eb: 0x0432, # CYRILLIC SMALL LETTER VE
0x00ec: 0x0412, # CYRILLIC CAPITAL LETTER VE
0x00ed: 0x044c, # CYRILLIC SMALL LETTER SOFT SIGN
0x00ee: 0x042c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x00ef: 0x2116, # NUMERO SIGN
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x044b, # CYRILLIC SMALL LETTER YERU
0x00f2: 0x042b, # CYRILLIC CAPITAL LETTER YERU
0x00f3: 0x0437, # CYRILLIC SMALL LETTER ZE
0x00f4: 0x0417, # CYRILLIC CAPITAL LETTER ZE
0x00f5: 0x0448, # CYRILLIC SMALL LETTER SHA
0x00f6: 0x0428, # CYRILLIC CAPITAL LETTER SHA
0x00f7: 0x044d, # CYRILLIC SMALL LETTER E
0x00f8: 0x042d, # CYRILLIC CAPITAL LETTER E
0x00f9: 0x0449, # CYRILLIC SMALL LETTER SHCHA
0x00fa: 0x0429, # CYRILLIC CAPITAL LETTER SHCHA
0x00fb: 0x0447, # CYRILLIC SMALL LETTER CHE
0x00fc: 0x0427, # CYRILLIC CAPITAL LETTER CHE
0x00fd: 0x00a7, # SECTION SIGN
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\u0452' # 0x0080 -> CYRILLIC SMALL LETTER DJE
'\u0402' # 0x0081 -> CYRILLIC CAPITAL LETTER DJE
'\u0453' # 0x0082 -> CYRILLIC SMALL LETTER GJE
'\u0403' # 0x0083 -> CYRILLIC CAPITAL LETTER GJE
'\u0451' # 0x0084 -> CYRILLIC SMALL LETTER IO
'\u0401' # 0x0085 -> CYRILLIC CAPITAL LETTER IO
'\u0454' # 0x0086 -> CYRILLIC SMALL LETTER UKRAINIAN IE
'\u0404' # 0x0087 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
'\u0455' # 0x0088 -> CYRILLIC SMALL LETTER DZE
'\u0405' # 0x0089 -> CYRILLIC CAPITAL LETTER DZE
'\u0456' # 0x008a -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0406' # 0x008b -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0457' # 0x008c -> CYRILLIC SMALL LETTER YI
'\u0407' # 0x008d -> CYRILLIC CAPITAL LETTER YI
'\u0458' # 0x008e -> CYRILLIC SMALL LETTER JE
'\u0408' # 0x008f -> CYRILLIC CAPITAL LETTER JE
'\u0459' # 0x0090 -> CYRILLIC SMALL LETTER LJE
'\u0409' # 0x0091 -> CYRILLIC CAPITAL LETTER LJE
'\u045a' # 0x0092 -> CYRILLIC SMALL LETTER NJE
'\u040a' # 0x0093 -> CYRILLIC CAPITAL LETTER NJE
'\u045b' # 0x0094 -> CYRILLIC SMALL LETTER TSHE
'\u040b' # 0x0095 -> CYRILLIC CAPITAL LETTER TSHE
'\u045c' # 0x0096 -> CYRILLIC SMALL LETTER KJE
'\u040c' # 0x0097 -> CYRILLIC CAPITAL LETTER KJE
'\u045e' # 0x0098 -> CYRILLIC SMALL LETTER SHORT U
'\u040e' # 0x0099 -> CYRILLIC CAPITAL LETTER SHORT U
'\u045f' # 0x009a -> CYRILLIC SMALL LETTER DZHE
'\u040f' # 0x009b -> CYRILLIC CAPITAL LETTER DZHE
'\u044e' # 0x009c -> CYRILLIC SMALL LETTER YU
'\u042e' # 0x009d -> CYRILLIC CAPITAL LETTER YU
'\u044a' # 0x009e -> CYRILLIC SMALL LETTER HARD SIGN
'\u042a' # 0x009f -> CYRILLIC CAPITAL LETTER HARD SIGN
'\u0430' # 0x00a0 -> CYRILLIC SMALL LETTER A
'\u0410' # 0x00a1 -> CYRILLIC CAPITAL LETTER A
'\u0431' # 0x00a2 -> CYRILLIC SMALL LETTER BE
'\u0411' # 0x00a3 -> CYRILLIC CAPITAL LETTER BE
'\u0446' # 0x00a4 -> CYRILLIC SMALL LETTER TSE
'\u0426' # 0x00a5 -> CYRILLIC CAPITAL LETTER TSE
'\u0434' # 0x00a6 -> CYRILLIC SMALL LETTER DE
'\u0414' # 0x00a7 -> CYRILLIC CAPITAL LETTER DE
'\u0435' # 0x00a8 -> CYRILLIC SMALL LETTER IE
'\u0415' # 0x00a9 -> CYRILLIC CAPITAL LETTER IE
'\u0444' # 0x00aa -> CYRILLIC SMALL LETTER EF
'\u0424' # 0x00ab -> CYRILLIC CAPITAL LETTER EF
'\u0433' # 0x00ac -> CYRILLIC SMALL LETTER GHE
'\u0413' # 0x00ad -> CYRILLIC CAPITAL LETTER GHE
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u0445' # 0x00b5 -> CYRILLIC SMALL LETTER HA
'\u0425' # 0x00b6 -> CYRILLIC CAPITAL LETTER HA
'\u0438' # 0x00b7 -> CYRILLIC SMALL LETTER I
'\u0418' # 0x00b8 -> CYRILLIC CAPITAL LETTER I
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u0439' # 0x00bd -> CYRILLIC SMALL LETTER SHORT I
'\u0419' # 0x00be -> CYRILLIC CAPITAL LETTER SHORT I
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u043a' # 0x00c6 -> CYRILLIC SMALL LETTER KA
'\u041a' # 0x00c7 -> CYRILLIC CAPITAL LETTER KA
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa4' # 0x00cf -> CURRENCY SIGN
'\u043b' # 0x00d0 -> CYRILLIC SMALL LETTER EL
'\u041b' # 0x00d1 -> CYRILLIC CAPITAL LETTER EL
'\u043c' # 0x00d2 -> CYRILLIC SMALL LETTER EM
'\u041c' # 0x00d3 -> CYRILLIC CAPITAL LETTER EM
'\u043d' # 0x00d4 -> CYRILLIC SMALL LETTER EN
'\u041d' # 0x00d5 -> CYRILLIC CAPITAL LETTER EN
'\u043e' # 0x00d6 -> CYRILLIC SMALL LETTER O
'\u041e' # 0x00d7 -> CYRILLIC CAPITAL LETTER O
'\u043f' # 0x00d8 -> CYRILLIC SMALL LETTER PE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u041f' # 0x00dd -> CYRILLIC CAPITAL LETTER PE
'\u044f' # 0x00de -> CYRILLIC SMALL LETTER YA
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u042f' # 0x00e0 -> CYRILLIC CAPITAL LETTER YA
'\u0440' # 0x00e1 -> CYRILLIC SMALL LETTER ER
'\u0420' # 0x00e2 -> CYRILLIC CAPITAL LETTER ER
'\u0441' # 0x00e3 -> CYRILLIC SMALL LETTER ES
'\u0421' # 0x00e4 -> CYRILLIC CAPITAL LETTER ES
'\u0442' # 0x00e5 -> CYRILLIC SMALL LETTER TE
'\u0422' # 0x00e6 -> CYRILLIC CAPITAL LETTER TE
'\u0443' # 0x00e7 -> CYRILLIC SMALL LETTER U
'\u0423' # 0x00e8 -> CYRILLIC CAPITAL LETTER U
'\u0436' # 0x00e9 -> CYRILLIC SMALL LETTER ZHE
'\u0416' # 0x00ea -> CYRILLIC CAPITAL LETTER ZHE
'\u0432' # 0x00eb -> CYRILLIC SMALL LETTER VE
'\u0412' # 0x00ec -> CYRILLIC CAPITAL LETTER VE
'\u044c' # 0x00ed -> CYRILLIC SMALL LETTER SOFT SIGN
'\u042c' # 0x00ee -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u2116' # 0x00ef -> NUMERO SIGN
'\xad' # 0x00f0 -> SOFT HYPHEN
'\u044b' # 0x00f1 -> CYRILLIC SMALL LETTER YERU
'\u042b' # 0x00f2 -> CYRILLIC CAPITAL LETTER YERU
'\u0437' # 0x00f3 -> CYRILLIC SMALL LETTER ZE
'\u0417' # 0x00f4 -> CYRILLIC CAPITAL LETTER ZE
'\u0448' # 0x00f5 -> CYRILLIC SMALL LETTER SHA
'\u0428' # 0x00f6 -> CYRILLIC CAPITAL LETTER SHA
'\u044d' # 0x00f7 -> CYRILLIC SMALL LETTER E
'\u042d' # 0x00f8 -> CYRILLIC CAPITAL LETTER E
'\u0449' # 0x00f9 -> CYRILLIC SMALL LETTER SHCHA
'\u0429' # 0x00fa -> CYRILLIC CAPITAL LETTER SHCHA
'\u0447' # 0x00fb -> CYRILLIC SMALL LETTER CHE
'\u0427' # 0x00fc -> CYRILLIC CAPITAL LETTER CHE
'\xa7' # 0x00fd -> SECTION SIGN
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a7: 0x00fd, # SECTION SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ad: 0x00f0, # SOFT HYPHEN
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x0401: 0x0085, # CYRILLIC CAPITAL LETTER IO
0x0402: 0x0081, # CYRILLIC CAPITAL LETTER DJE
0x0403: 0x0083, # CYRILLIC CAPITAL LETTER GJE
0x0404: 0x0087, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x0405: 0x0089, # CYRILLIC CAPITAL LETTER DZE
0x0406: 0x008b, # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
0x0407: 0x008d, # CYRILLIC CAPITAL LETTER YI
0x0408: 0x008f, # CYRILLIC CAPITAL LETTER JE
0x0409: 0x0091, # CYRILLIC CAPITAL LETTER LJE
0x040a: 0x0093, # CYRILLIC CAPITAL LETTER NJE
0x040b: 0x0095, # CYRILLIC CAPITAL LETTER TSHE
0x040c: 0x0097, # CYRILLIC CAPITAL LETTER KJE
0x040e: 0x0099, # CYRILLIC CAPITAL LETTER SHORT U
0x040f: 0x009b, # CYRILLIC CAPITAL LETTER DZHE
0x0410: 0x00a1, # CYRILLIC CAPITAL LETTER A
0x0411: 0x00a3, # CYRILLIC CAPITAL LETTER BE
0x0412: 0x00ec, # CYRILLIC CAPITAL LETTER VE
0x0413: 0x00ad, # CYRILLIC CAPITAL LETTER GHE
0x0414: 0x00a7, # CYRILLIC CAPITAL LETTER DE
0x0415: 0x00a9, # CYRILLIC CAPITAL LETTER IE
0x0416: 0x00ea, # CYRILLIC CAPITAL LETTER ZHE
0x0417: 0x00f4, # CYRILLIC CAPITAL LETTER ZE
0x0418: 0x00b8, # CYRILLIC CAPITAL LETTER I
0x0419: 0x00be, # CYRILLIC CAPITAL LETTER SHORT I
0x041a: 0x00c7, # CYRILLIC CAPITAL LETTER KA
0x041b: 0x00d1, # CYRILLIC CAPITAL LETTER EL
0x041c: 0x00d3, # CYRILLIC CAPITAL LETTER EM
0x041d: 0x00d5, # CYRILLIC CAPITAL LETTER EN
0x041e: 0x00d7, # CYRILLIC CAPITAL LETTER O
0x041f: 0x00dd, # CYRILLIC CAPITAL LETTER PE
0x0420: 0x00e2, # CYRILLIC CAPITAL LETTER ER
0x0421: 0x00e4, # CYRILLIC CAPITAL LETTER ES
0x0422: 0x00e6, # CYRILLIC CAPITAL LETTER TE
0x0423: 0x00e8, # CYRILLIC CAPITAL LETTER U
0x0424: 0x00ab, # CYRILLIC CAPITAL LETTER EF
0x0425: 0x00b6, # CYRILLIC CAPITAL LETTER HA
0x0426: 0x00a5, # CYRILLIC CAPITAL LETTER TSE
0x0427: 0x00fc, # CYRILLIC CAPITAL LETTER CHE
0x0428: 0x00f6, # CYRILLIC CAPITAL LETTER SHA
0x0429: 0x00fa, # CYRILLIC CAPITAL LETTER SHCHA
0x042a: 0x009f, # CYRILLIC CAPITAL LETTER HARD SIGN
0x042b: 0x00f2, # CYRILLIC CAPITAL LETTER YERU
0x042c: 0x00ee, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x042d: 0x00f8, # CYRILLIC CAPITAL LETTER E
0x042e: 0x009d, # CYRILLIC CAPITAL LETTER YU
0x042f: 0x00e0, # CYRILLIC CAPITAL LETTER YA
0x0430: 0x00a0, # CYRILLIC SMALL LETTER A
0x0431: 0x00a2, # CYRILLIC SMALL LETTER BE
0x0432: 0x00eb, # CYRILLIC SMALL LETTER VE
0x0433: 0x00ac, # CYRILLIC SMALL LETTER GHE
0x0434: 0x00a6, # CYRILLIC SMALL LETTER DE
0x0435: 0x00a8, # CYRILLIC SMALL LETTER IE
0x0436: 0x00e9, # CYRILLIC SMALL LETTER ZHE
0x0437: 0x00f3, # CYRILLIC SMALL LETTER ZE
0x0438: 0x00b7, # CYRILLIC SMALL LETTER I
0x0439: 0x00bd, # CYRILLIC SMALL LETTER SHORT I
0x043a: 0x00c6, # CYRILLIC SMALL LETTER KA
0x043b: 0x00d0, # CYRILLIC SMALL LETTER EL
0x043c: 0x00d2, # CYRILLIC SMALL LETTER EM
0x043d: 0x00d4, # CYRILLIC SMALL LETTER EN
0x043e: 0x00d6, # CYRILLIC SMALL LETTER O
0x043f: 0x00d8, # CYRILLIC SMALL LETTER PE
0x0440: 0x00e1, # CYRILLIC SMALL LETTER ER
0x0441: 0x00e3, # CYRILLIC SMALL LETTER ES
0x0442: 0x00e5, # CYRILLIC SMALL LETTER TE
0x0443: 0x00e7, # CYRILLIC SMALL LETTER U
0x0444: 0x00aa, # CYRILLIC SMALL LETTER EF
0x0445: 0x00b5, # CYRILLIC SMALL LETTER HA
0x0446: 0x00a4, # CYRILLIC SMALL LETTER TSE
0x0447: 0x00fb, # CYRILLIC SMALL LETTER CHE
0x0448: 0x00f5, # CYRILLIC SMALL LETTER SHA
0x0449: 0x00f9, # CYRILLIC SMALL LETTER SHCHA
0x044a: 0x009e, # CYRILLIC SMALL LETTER HARD SIGN
0x044b: 0x00f1, # CYRILLIC SMALL LETTER YERU
0x044c: 0x00ed, # CYRILLIC SMALL LETTER SOFT SIGN
0x044d: 0x00f7, # CYRILLIC SMALL LETTER E
0x044e: 0x009c, # CYRILLIC SMALL LETTER YU
0x044f: 0x00de, # CYRILLIC SMALL LETTER YA
0x0451: 0x0084, # CYRILLIC SMALL LETTER IO
0x0452: 0x0080, # CYRILLIC SMALL LETTER DJE
0x0453: 0x0082, # CYRILLIC SMALL LETTER GJE
0x0454: 0x0086, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x0455: 0x0088, # CYRILLIC SMALL LETTER DZE
0x0456: 0x008a, # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0x0457: 0x008c, # CYRILLIC SMALL LETTER YI
0x0458: 0x008e, # CYRILLIC SMALL LETTER JE
0x0459: 0x0090, # CYRILLIC SMALL LETTER LJE
0x045a: 0x0092, # CYRILLIC SMALL LETTER NJE
0x045b: 0x0094, # CYRILLIC SMALL LETTER TSHE
0x045c: 0x0096, # CYRILLIC SMALL LETTER KJE
0x045e: 0x0098, # CYRILLIC SMALL LETTER SHORT U
0x045f: 0x009a, # CYRILLIC SMALL LETTER DZHE
0x2116: 0x00ef, # NUMERO SIGN
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
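The dict-style tables above feed codecs.charmap_encode and codecs.charmap_decode, which the generated Codec classes call. A minimal sketch of that round trip — not part of the package source, and assuming a standard CPython install where cp855 is registered:
import codecs

# High-level round trip through the registered codec.
raw = 'Чита'.encode('cp855')
assert raw.decode('cp855') == 'Чита'

# The same lookup done directly: a dict like decoding_map above maps
# byte value -> Unicode ordinal (or None for undefined bytes).
demo_map = codecs.make_identity_dict(range(128))
text, consumed = codecs.charmap_decode(b'abc', 'strict', demo_map)
assert text == 'abc' and consumed == 3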
""" Python Character Mapping Codec cp856 generated from 'MAPPINGS/VENDORS/MISC/CP856.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp856',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
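A minimal sketch of how the encodings machinery consumes getregentry — not part of the package source: on first use of the name, the registry imports the module, calls getregentry(), and caches the returned CodecInfo. Assumes a standard CPython install where cp856 ships.
import codecs

info = codecs.lookup('cp856')                 # imports encodings.cp856, calls getregentry()
assert info.name == 'cp856'
assert info.decode(b'\x80')[0] == '\u05d0'    # 0x80 -> HEBREW LETTER ALEF, per the table below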
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u05d0' # 0x80 -> HEBREW LETTER ALEF
'\u05d1' # 0x81 -> HEBREW LETTER BET
'\u05d2' # 0x82 -> HEBREW LETTER GIMEL
'\u05d3' # 0x83 -> HEBREW LETTER DALET
'\u05d4' # 0x84 -> HEBREW LETTER HE
'\u05d5' # 0x85 -> HEBREW LETTER VAV
'\u05d6' # 0x86 -> HEBREW LETTER ZAYIN
'\u05d7' # 0x87 -> HEBREW LETTER HET
'\u05d8' # 0x88 -> HEBREW LETTER TET
'\u05d9' # 0x89 -> HEBREW LETTER YOD
'\u05da' # 0x8A -> HEBREW LETTER FINAL KAF
'\u05db' # 0x8B -> HEBREW LETTER KAF
'\u05dc' # 0x8C -> HEBREW LETTER LAMED
'\u05dd' # 0x8D -> HEBREW LETTER FINAL MEM
'\u05de' # 0x8E -> HEBREW LETTER MEM
'\u05df' # 0x8F -> HEBREW LETTER FINAL NUN
'\u05e0' # 0x90 -> HEBREW LETTER NUN
'\u05e1' # 0x91 -> HEBREW LETTER SAMEKH
'\u05e2' # 0x92 -> HEBREW LETTER AYIN
'\u05e3' # 0x93 -> HEBREW LETTER FINAL PE
'\u05e4' # 0x94 -> HEBREW LETTER PE
'\u05e5' # 0x95 -> HEBREW LETTER FINAL TSADI
'\u05e6' # 0x96 -> HEBREW LETTER TSADI
'\u05e7' # 0x97 -> HEBREW LETTER QOF
'\u05e8' # 0x98 -> HEBREW LETTER RESH
'\u05e9' # 0x99 -> HEBREW LETTER SHIN
'\u05ea' # 0x9A -> HEBREW LETTER TAV
'\ufffe' # 0x9B -> UNDEFINED
'\xa3' # 0x9C -> POUND SIGN
'\ufffe' # 0x9D -> UNDEFINED
'\xd7' # 0x9E -> MULTIPLICATION SIGN
'\ufffe' # 0x9F -> UNDEFINED
'\ufffe' # 0xA0 -> UNDEFINED
'\ufffe' # 0xA1 -> UNDEFINED
'\ufffe' # 0xA2 -> UNDEFINED
'\ufffe' # 0xA3 -> UNDEFINED
'\ufffe' # 0xA4 -> UNDEFINED
'\ufffe' # 0xA5 -> UNDEFINED
'\ufffe' # 0xA6 -> UNDEFINED
'\ufffe' # 0xA7 -> UNDEFINED
'\ufffe' # 0xA8 -> UNDEFINED
'\xae' # 0xA9 -> REGISTERED SIGN
'\xac' # 0xAA -> NOT SIGN
'\xbd' # 0xAB -> VULGAR FRACTION ONE HALF
'\xbc' # 0xAC -> VULGAR FRACTION ONE QUARTER
'\ufffe' # 0xAD -> UNDEFINED
'\xab' # 0xAE -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xAF -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0xB0 -> LIGHT SHADE
'\u2592' # 0xB1 -> MEDIUM SHADE
'\u2593' # 0xB2 -> DARK SHADE
'\u2502' # 0xB3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0xB4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\ufffe' # 0xB5 -> UNDEFINED
'\ufffe' # 0xB6 -> UNDEFINED
'\ufffe' # 0xB7 -> UNDEFINED
'\xa9' # 0xB8 -> COPYRIGHT SIGN
'\u2563' # 0xB9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0xBA -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0xBB -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0xBC -> BOX DRAWINGS DOUBLE UP AND LEFT
'\xa2' # 0xBD -> CENT SIGN
'\xa5' # 0xBE -> YEN SIGN
'\u2510' # 0xBF -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0xC0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0xC1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0xC2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0xC3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0xC4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0xC5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\ufffe' # 0xC6 -> UNDEFINED
'\ufffe' # 0xC7 -> UNDEFINED
'\u255a' # 0xC8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0xC9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0xCA -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0xCB -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0xCC -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0xCD -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0xCE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa4' # 0xCF -> CURRENCY SIGN
'\ufffe' # 0xD0 -> UNDEFINED
'\ufffe' # 0xD1 -> UNDEFINED
'\ufffe' # 0xD2 -> UNDEFINED
'\ufffe' # 0xD3 -> UNDEFINED
'\ufffe' # 0xD4 -> UNDEFINED
'\ufffe' # 0xD5 -> UNDEFINED
'\ufffe' # 0xD6 -> UNDEFINED
'\ufffe' # 0xD7 -> UNDEFINED
'\ufffe' # 0xD8 -> UNDEFINED
'\u2518' # 0xD9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0xDA -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0xDB -> FULL BLOCK
'\u2584' # 0xDC -> LOWER HALF BLOCK
'\xa6' # 0xDD -> BROKEN BAR
'\ufffe' # 0xDE -> UNDEFINED
'\u2580' # 0xDF -> UPPER HALF BLOCK
'\ufffe' # 0xE0 -> UNDEFINED
'\ufffe' # 0xE1 -> UNDEFINED
'\ufffe' # 0xE2 -> UNDEFINED
'\ufffe' # 0xE3 -> UNDEFINED
'\ufffe' # 0xE4 -> UNDEFINED
'\ufffe' # 0xE5 -> UNDEFINED
'\xb5' # 0xE6 -> MICRO SIGN
'\ufffe' # 0xE7 -> UNDEFINED
'\ufffe' # 0xE8 -> UNDEFINED
'\ufffe' # 0xE9 -> UNDEFINED
'\ufffe' # 0xEA -> UNDEFINED
'\ufffe' # 0xEB -> UNDEFINED
'\ufffe' # 0xEC -> UNDEFINED
'\ufffe' # 0xED -> UNDEFINED
'\xaf' # 0xEE -> MACRON
'\xb4' # 0xEF -> ACUTE ACCENT
'\xad' # 0xF0 -> SOFT HYPHEN
'\xb1' # 0xF1 -> PLUS-MINUS SIGN
'\u2017' # 0xF2 -> DOUBLE LOW LINE
'\xbe' # 0xF3 -> VULGAR FRACTION THREE QUARTERS
'\xb6' # 0xF4 -> PILCROW SIGN
'\xa7' # 0xF5 -> SECTION SIGN
'\xf7' # 0xF6 -> DIVISION SIGN
'\xb8' # 0xF7 -> CEDILLA
'\xb0' # 0xF8 -> DEGREE SIGN
'\xa8' # 0xF9 -> DIAERESIS
'\xb7' # 0xFA -> MIDDLE DOT
'\xb9' # 0xFB -> SUPERSCRIPT ONE
'\xb3' # 0xFC -> SUPERSCRIPT THREE
'\xb2' # 0xFD -> SUPERSCRIPT TWO
'\u25a0' # 0xFE -> BLACK SQUARE
'\xa0' # 0xFF -> NO-BREAK SPACE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
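charmap_build above compiles the 256-character decoding table into a fast EncodingMap for the reverse direction. A minimal sketch of that relationship — not part of the package source — using a toy three-entry table; positions holding U+FFFE are undefined in both directions:
import codecs

demo_table = 'A' + '\ufffe' + '\u05d0'        # 0x00 -> 'A', 0x01 undefined, 0x02 -> alef
demo_map = codecs.charmap_build(demo_table)

assert codecs.charmap_encode('A\u05d0', 'strict', demo_map)[0] == b'\x00\x02'
try:
    codecs.charmap_decode(b'\x01', 'strict', demo_table)
except UnicodeDecodeError:
    pass                                      # 'strict' rejects undefined bytes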
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP857.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp857',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x0131, # LATIN SMALL LETTER DOTLESS I
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x0130, # LATIN CAPITAL LETTER I WITH DOT ABOVE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x015e, # LATIN CAPITAL LETTER S WITH CEDILLA
0x009f: 0x015f, # LATIN SMALL LETTER S WITH CEDILLA
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x011e, # LATIN CAPITAL LETTER G WITH BREVE
0x00a7: 0x011f, # LATIN SMALL LETTER G WITH BREVE
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x00b8: 0x00a9, # COPYRIGHT SIGN
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x00a2, # CENT SIGN
0x00be: 0x00a5, # YEN SIGN
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x00c7: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00d1: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00d2: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x00d5: None, # UNDEFINED
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x00a6, # BROKEN BAR
0x00de: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: None, # UNDEFINED
0x00e8: 0x00d7, # MULTIPLICATION SIGN
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00eb: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x00ed: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x00ee: 0x00af, # MACRON
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: None, # UNDEFINED
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\u0131' # 0x008d -> LATIN SMALL LETTER DOTLESS I
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
'\u0130' # 0x0098 -> LATIN CAPITAL LETTER I WITH DOT ABOVE
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
'\xa3' # 0x009c -> POUND SIGN
'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
'\u015e' # 0x009e -> LATIN CAPITAL LETTER S WITH CEDILLA
'\u015f' # 0x009f -> LATIN SMALL LETTER S WITH CEDILLA
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
'\u011e' # 0x00a6 -> LATIN CAPITAL LETTER G WITH BREVE
'\u011f' # 0x00a7 -> LATIN SMALL LETTER G WITH BREVE
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\xae' # 0x00a9 -> REGISTERED SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc0' # 0x00b7 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xa9' # 0x00b8 -> COPYRIGHT SIGN
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\xa2' # 0x00bd -> CENT SIGN
'\xa5' # 0x00be -> YEN SIGN
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\xe3' # 0x00c6 -> LATIN SMALL LETTER A WITH TILDE
'\xc3' # 0x00c7 -> LATIN CAPITAL LETTER A WITH TILDE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa4' # 0x00cf -> CURRENCY SIGN
'\xba' # 0x00d0 -> MASCULINE ORDINAL INDICATOR
'\xaa' # 0x00d1 -> FEMININE ORDINAL INDICATOR
'\xca' # 0x00d2 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x00d4 -> LATIN CAPITAL LETTER E WITH GRAVE
'\ufffe' # 0x00d5 -> UNDEFINED
'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x00d8 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\xa6' # 0x00dd -> BROKEN BAR
'\xcc' # 0x00de -> LATIN CAPITAL LETTER I WITH GRAVE
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd2' # 0x00e3 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xb5' # 0x00e6 -> MICRO SIGN
'\ufffe' # 0x00e7 -> UNDEFINED
'\xd7' # 0x00e8 -> MULTIPLICATION SIGN
'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0x00ea -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0x00eb -> LATIN CAPITAL LETTER U WITH GRAVE
'\xec' # 0x00ec -> LATIN SMALL LETTER I WITH GRAVE
'\xff' # 0x00ed -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xaf' # 0x00ee -> MACRON
'\xb4' # 0x00ef -> ACUTE ACCENT
'\xad' # 0x00f0 -> SOFT HYPHEN
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\ufffe' # 0x00f2 -> UNDEFINED
'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
'\xb6' # 0x00f4 -> PILCROW SIGN
'\xa7' # 0x00f5 -> SECTION SIGN
'\xf7' # 0x00f6 -> DIVISION SIGN
'\xb8' # 0x00f7 -> CEDILLA
'\xb0' # 0x00f8 -> DEGREE SIGN
'\xa8' # 0x00f9 -> DIAERESIS
'\xb7' # 0x00fa -> MIDDLE DOT
'\xb9' # 0x00fb -> SUPERSCRIPT ONE
'\xb3' # 0x00fc -> SUPERSCRIPT THREE
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x00bd, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a5: 0x00be, # YEN SIGN
0x00a6: 0x00dd, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x00b8, # COPYRIGHT SIGN
0x00aa: 0x00d1, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00af: 0x00ee, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00f7, # CEDILLA
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00ba: 0x00d0, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x00b7, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x00c7, # LATIN CAPITAL LETTER A WITH TILDE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x00d4, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x00d2, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cc: 0x00de, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x00d8, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00e3, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x00e8, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00d9: 0x00eb, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00db: 0x00ea, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x00c6, # LATIN SMALL LETTER A WITH TILDE
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00ff: 0x00ed, # LATIN SMALL LETTER Y WITH DIAERESIS
0x011e: 0x00a6, # LATIN CAPITAL LETTER G WITH BREVE
0x011f: 0x00a7, # LATIN SMALL LETTER G WITH BREVE
0x0130: 0x0098, # LATIN CAPITAL LETTER I WITH DOT ABOVE
0x0131: 0x008d, # LATIN SMALL LETTER DOTLESS I
0x015e: 0x009e, # LATIN CAPITAL LETTER S WITH CEDILLA
0x015f: 0x009f, # LATIN SMALL LETTER S WITH CEDILLA
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
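# Illustrative note (an editorial comment, not part of the generated file):
# the map above appears to be the tail of the cp857 (MS-DOS Turkish)
# encoding map, whose Turkish-specific letters sit in the 0x80-0xAF range:
#
#     assert encoding_map[0x011f] == 0x00a7  # LATIN SMALL LETTER G WITH BREVE
#     assert encoding_map[0x0130] == 0x0098  # LATIN CAPITAL LETTER I WITH DOT ABOVE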
""" Python Character Mapping Codec for CP858, modified from cp850.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp858',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
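# Usage sketch (an editorial comment, not part of the generated file): the
# standard encodings package imports this module by codec name and calls
# getregentry() once, so user code only ever touches the codecs API:
#
#     >>> "\u20ac".encode("cp858")
#     b'\xd5'
#     >>> b"\xd5".decode("cp858")
#     '€'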
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x00d7, # MULTIPLICATION SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00ae, # REGISTERED SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00b6: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00b7: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x00b8: 0x00a9, # COPYRIGHT SIGN
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x00a2, # CENT SIGN
0x00be: 0x00a5, # YEN SIGN
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x00c7: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x00a4, # CURRENCY SIGN
0x00d0: 0x00f0, # LATIN SMALL LETTER ETH
0x00d1: 0x00d0, # LATIN CAPITAL LETTER ETH
0x00d2: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00d3: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00d4: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x00d5: 0x20ac, # EURO SIGN
0x00d6: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d7: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00d8: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x00a6, # BROKEN BAR
0x00de: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00e3: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00e4: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x00e5: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x00fe, # LATIN SMALL LETTER THORN
0x00e8: 0x00de, # LATIN CAPITAL LETTER THORN
0x00e9: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00ea: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00eb: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x00ec: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x00ed: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00ee: 0x00af, # MACRON
0x00ef: 0x00b4, # ACUTE ACCENT
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2017, # DOUBLE LOW LINE
0x00f3: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00f4: 0x00b6, # PILCROW SIGN
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x00b8, # CEDILLA
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x00b9, # SUPERSCRIPT ONE
0x00fc: 0x00b3, # SUPERSCRIPT THREE
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
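# Note on the construction above (editorial comment): make_identity_dict()
# maps every byte 0x00-0xFF to itself, and update() then remaps only the
# high half, so the ASCII range always decodes unchanged:
#
#     assert decoding_map[0x41] == 0x41    # 'A' passes through
#     assert decoding_map[0x80] == 0x00c7  # remapped by the update above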
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
'\xa3' # 0x009c -> POUND SIGN
'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
'\xd7' # 0x009e -> MULTIPLICATION SIGN
'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\xae' # 0x00a9 -> REGISTERED SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\xc1' # 0x00b5 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0x00b6 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc0' # 0x00b7 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xa9' # 0x00b8 -> COPYRIGHT SIGN
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\xa2' # 0x00bd -> CENT SIGN
'\xa5' # 0x00be -> YEN SIGN
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\xe3' # 0x00c6 -> LATIN SMALL LETTER A WITH TILDE
'\xc3' # 0x00c7 -> LATIN CAPITAL LETTER A WITH TILDE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa4' # 0x00cf -> CURRENCY SIGN
'\xf0' # 0x00d0 -> LATIN SMALL LETTER ETH
'\xd0' # 0x00d1 -> LATIN CAPITAL LETTER ETH
'\xca' # 0x00d2 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0x00d3 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0x00d4 -> LATIN CAPITAL LETTER E WITH GRAVE
'\u20ac' # 0x00d5 -> EURO SIGN
'\xcd' # 0x00d6 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0x00d7 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0x00d8 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\xa6' # 0x00dd -> BROKEN BAR
'\xcc' # 0x00de -> LATIN CAPITAL LETTER I WITH GRAVE
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\xd3' # 0x00e0 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\xd4' # 0x00e2 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd2' # 0x00e3 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xf5' # 0x00e4 -> LATIN SMALL LETTER O WITH TILDE
'\xd5' # 0x00e5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xb5' # 0x00e6 -> MICRO SIGN
'\xfe' # 0x00e7 -> LATIN SMALL LETTER THORN
'\xde' # 0x00e8 -> LATIN CAPITAL LETTER THORN
'\xda' # 0x00e9 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0x00ea -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0x00eb -> LATIN CAPITAL LETTER U WITH GRAVE
'\xfd' # 0x00ec -> LATIN SMALL LETTER Y WITH ACUTE
'\xdd' # 0x00ed -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xaf' # 0x00ee -> MACRON
'\xb4' # 0x00ef -> ACUTE ACCENT
'\xad' # 0x00f0 -> SOFT HYPHEN
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2017' # 0x00f2 -> DOUBLE LOW LINE
'\xbe' # 0x00f3 -> VULGAR FRACTION THREE QUARTERS
'\xb6' # 0x00f4 -> PILCROW SIGN
'\xa7' # 0x00f5 -> SECTION SIGN
'\xf7' # 0x00f6 -> DIVISION SIGN
'\xb8' # 0x00f7 -> CEDILLA
'\xb0' # 0x00f8 -> DEGREE SIGN
'\xa8' # 0x00f9 -> DIAERESIS
'\xb7' # 0x00fa -> MIDDLE DOT
'\xb9' # 0x00fb -> SUPERSCRIPT ONE
'\xb3' # 0x00fc -> SUPERSCRIPT THREE
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
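# Note (editorial comment): decoding_table is the string form of the same
# mapping -- 256 characters indexed directly by byte value, which is the
# shape codecs.charmap_decode consumes:
#
#     assert len(decoding_table) == 256
#     assert decoding_table[0xd5] == "\u20ac"  # EURO SIGN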
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x00bd, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00cf, # CURRENCY SIGN
0x00a5: 0x00be, # YEN SIGN
0x00a6: 0x00dd, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x00b8, # COPYRIGHT SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00ae: 0x00a9, # REGISTERED SIGN
0x00af: 0x00ee, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00fc, # SUPERSCRIPT THREE
0x00b4: 0x00ef, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x00f4, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00f7, # CEDILLA
0x00b9: 0x00fb, # SUPERSCRIPT ONE
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00f3, # VULGAR FRACTION THREE QUARTERS
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x00b7, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x00b5, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x00b6, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x00c7, # LATIN CAPITAL LETTER A WITH TILDE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x00d4, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x00d2, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x00d3, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00cc: 0x00de, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x00d6, # LATIN CAPITAL LETTER I WITH ACUTE
0x00ce: 0x00d7, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x00d8, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d0: 0x00d1, # LATIN CAPITAL LETTER ETH
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00e3, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x00e0, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x00e2, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x00e5, # LATIN CAPITAL LETTER O WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d7: 0x009e, # MULTIPLICATION SIGN
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00d9: 0x00eb, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x00e9, # LATIN CAPITAL LETTER U WITH ACUTE
0x00db: 0x00ea, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x00ed, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00de: 0x00e8, # LATIN CAPITAL LETTER THORN
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x00c6, # LATIN SMALL LETTER A WITH TILDE
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f0: 0x00d0, # LATIN SMALL LETTER ETH
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x00e4, # LATIN SMALL LETTER O WITH TILDE
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x00ec, # LATIN SMALL LETTER Y WITH ACUTE
0x00fe: 0x00e7, # LATIN SMALL LETTER THORN
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x2017: 0x00f2, # DOUBLE LOW LINE
0x20ac: 0x00d5, # EURO SIGN
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
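if __name__ == "__main__":
    # Editorial self-test sketch, not part of the gencodec output: cp858 is
    # cp850 with the euro sign at 0xD5 (replacing the dotless i), and the
    # two dict maps above should be exact inverses of each other.
    assert codecs.charmap_encode("\u20ac", "strict", encoding_map)[0] == b"\xd5"
    assert codecs.charmap_decode(b"\xd5", "strict", decoding_table)[0] == "\u20ac"
    assert all(encoding_map[v] == k for k, v in decoding_map.items())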
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP860.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='cp860',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
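# Usage sketch (editorial comment): the Incremental* classes above are what
# codecs.iterencode()/iterdecode() drive chunk by chunk, e.g.
#
#     >>> import codecs
#     >>> list(codecs.iterencode(iter(["n\u00e3o", " sim"]), "cp860"))
#     [b'n\x84o', b' sim']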
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e3, # LATIN SMALL LETTER A WITH TILDE
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x008c: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c3, # LATIN CAPITAL LETTER A WITH TILDE
0x008f: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x0092: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f5, # LATIN SMALL LETTER O WITH TILDE
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00cc, # LATIN CAPITAL LETTER I WITH GRAVE
0x0099: 0x00d5, # LATIN CAPITAL LETTER O WITH TILDE
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x00d2, # LATIN CAPITAL LETTER O WITH GRAVE
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0x0084 -> LATIN SMALL LETTER A WITH TILDE
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xc1' # 0x0086 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xca' # 0x0089 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xcd' # 0x008b -> LATIN CAPITAL LETTER I WITH ACUTE
'\xd4' # 0x008c -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
'\xc3' # 0x008e -> LATIN CAPITAL LETTER A WITH TILDE
'\xc2' # 0x008f -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xc0' # 0x0091 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc8' # 0x0092 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0x0094 -> LATIN SMALL LETTER O WITH TILDE
'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
'\xda' # 0x0096 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
'\xcc' # 0x0098 -> LATIN CAPITAL LETTER I WITH GRAVE
'\xd5' # 0x0099 -> LATIN CAPITAL LETTER O WITH TILDE
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xa2' # 0x009b -> CENT SIGN
'\xa3' # 0x009c -> POUND SIGN
'\xd9' # 0x009d -> LATIN CAPITAL LETTER U WITH GRAVE
'\u20a7' # 0x009e -> PESETA SIGN
'\xd3' # 0x009f -> LATIN CAPITAL LETTER O WITH ACUTE
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\xd2' # 0x00a9 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
'\xb5' # 0x00e6 -> MICRO SIGN
'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
'\u221e' # 0x00ec -> INFINITY
'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
'\u2229' # 0x00ef -> INTERSECTION
'\u2261' # 0x00f0 -> IDENTICAL TO
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u2248' # 0x00f7 -> ALMOST EQUAL TO
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
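# Note (editorial comment): positions 0xE0-0xFF above reproduce cp437's
# Greek-letter and math-symbol block unchanged; cp860 diverges from cp437
# mainly in the 0x80-0xA9 accented-letter region:
#
#     assert decoding_table[0xe0] == "\u03b1"  # GREEK SMALL LETTER ALPHA
#     assert decoding_table[0xec] == "\u221e"  # INFINITY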
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c0: 0x0091, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c1: 0x0086, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c2: 0x008f, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c3: 0x008e, # LATIN CAPITAL LETTER A WITH TILDE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x0092, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x0089, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cc: 0x0098, # LATIN CAPITAL LETTER I WITH GRAVE
0x00cd: 0x008b, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d2: 0x00a9, # LATIN CAPITAL LETTER O WITH GRAVE
0x00d3: 0x009f, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d4: 0x008c, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d5: 0x0099, # LATIN CAPITAL LETTER O WITH TILDE
0x00d9: 0x009d, # LATIN CAPITAL LETTER U WITH GRAVE
0x00da: 0x0096, # LATIN CAPITAL LETTER U WITH ACUTE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e3: 0x0084, # LATIN SMALL LETTER A WITH TILDE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f5: 0x0094, # LATIN SMALL LETTER O WITH TILDE
0x00f7: 0x00f6, # DIVISION SIGN
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
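# End of the encoding map for the preceding code page module. As with all of
# the gencodec.py-generated modules in this listing, codecs.charmap_encode()
# raises UnicodeEncodeError under errors='strict' for any code point missing
# from encoding_map, while errors='replace' substitutes b'?'. A minimal
# sketch (expected output follows from the identity entries above; shown as
# an expectation, not a captured run):
#
#     >>> import codecs
#     >>> codecs.charmap_encode('abc', 'strict', encoding_map)
#     (b'abc', 3)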
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP861.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp861',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
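# The stdlib encodings package imports this module by name and calls
# getregentry() to obtain the CodecInfo it registers. A hedged sketch of a
# manual lookup (assumes the module is importable, e.g. as encodings.cp861):
#
#     >>> info = getregentry()
#     >>> info.name
#     'cp861'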
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
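# Start from an identity mapping for all 256 byte values, then override the
# upper half (0x80-0xff) with the cp861-specific assignments below.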
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00d0, # LATIN CAPITAL LETTER ETH
0x008c: 0x00f0, # LATIN SMALL LETTER ETH
0x008d: 0x00de, # LATIN CAPITAL LETTER THORN
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00fe, # LATIN SMALL LETTER THORN
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00dd, # LATIN CAPITAL LETTER Y WITH ACUTE
0x0098: 0x00fd, # LATIN SMALL LETTER Y WITH ACUTE
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00c1, # LATIN CAPITAL LETTER A WITH ACUTE
0x00a5: 0x00cd, # LATIN CAPITAL LETTER I WITH ACUTE
0x00a6: 0x00d3, # LATIN CAPITAL LETTER O WITH ACUTE
0x00a7: 0x00da, # LATIN CAPITAL LETTER U WITH ACUTE
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xd0' # 0x008b -> LATIN CAPITAL LETTER ETH
'\xf0' # 0x008c -> LATIN SMALL LETTER ETH
'\xde' # 0x008d -> LATIN CAPITAL LETTER THORN
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xfe' # 0x0095 -> LATIN SMALL LETTER THORN
'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xdd' # 0x0097 -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xfd' # 0x0098 -> LATIN SMALL LETTER Y WITH ACUTE
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
'\xa3' # 0x009c -> POUND SIGN
'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
'\u20a7' # 0x009e -> PESETA SIGN
'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xc1' # 0x00a4 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xcd' # 0x00a5 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xd3' # 0x00a6 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xda' # 0x00a7 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\u2310' # 0x00a9 -> REVERSED NOT SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
'\xb5' # 0x00e6 -> MICRO SIGN
'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
'\u221e' # 0x00ec -> INFINITY
'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
'\u2229' # 0x00ef -> INTERSECTION
'\u2261' # 0x00f0 -> IDENTICAL TO
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u2248' # 0x00f7 -> ALMOST EQUAL TO
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a3: 0x009c, # POUND SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c1: 0x00a4, # LATIN CAPITAL LETTER A WITH ACUTE
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00cd: 0x00a5, # LATIN CAPITAL LETTER I WITH ACUTE
0x00d0: 0x008b, # LATIN CAPITAL LETTER ETH
0x00d3: 0x00a6, # LATIN CAPITAL LETTER O WITH ACUTE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00da: 0x00a7, # LATIN CAPITAL LETTER U WITH ACUTE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00dd: 0x0097, # LATIN CAPITAL LETTER Y WITH ACUTE
0x00de: 0x008d, # LATIN CAPITAL LETTER THORN
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00f0: 0x008c, # LATIN SMALL LETTER ETH
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00fd: 0x0098, # LATIN SMALL LETTER Y WITH ACUTE
0x00fe: 0x0095, # LATIN SMALL LETTER THORN
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
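# A round-trip sketch using the table-driven Codec defined in this cp861
# module (byte values follow encoding_map above; shown as an expectation,
# not a captured run):
#
#     >>> data, length = Codec().encode('þórr')
#     >>> data
#     b'\x95\xa2rr'
#     >>> Codec().decode(data)[0]
#     'þórr'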
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP862.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_map)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='cp862',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
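# cp862 is the DOS Hebrew code page: bytes 0x80-0x9a decode to the Hebrew
# letters U+05D0-U+05EA. A minimal decode sketch against this module's
# decoding_table (expected output per the table below):
#
#     >>> import codecs
#     >>> codecs.charmap_decode(b'\x80', 'strict', decoding_table)[0]
#     'א'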
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x05d0, # HEBREW LETTER ALEF
0x0081: 0x05d1, # HEBREW LETTER BET
0x0082: 0x05d2, # HEBREW LETTER GIMEL
0x0083: 0x05d3, # HEBREW LETTER DALET
0x0084: 0x05d4, # HEBREW LETTER HE
0x0085: 0x05d5, # HEBREW LETTER VAV
0x0086: 0x05d6, # HEBREW LETTER ZAYIN
0x0087: 0x05d7, # HEBREW LETTER HET
0x0088: 0x05d8, # HEBREW LETTER TET
0x0089: 0x05d9, # HEBREW LETTER YOD
0x008a: 0x05da, # HEBREW LETTER FINAL KAF
0x008b: 0x05db, # HEBREW LETTER KAF
0x008c: 0x05dc, # HEBREW LETTER LAMED
0x008d: 0x05dd, # HEBREW LETTER FINAL MEM
0x008e: 0x05de, # HEBREW LETTER MEM
0x008f: 0x05df, # HEBREW LETTER FINAL NUN
0x0090: 0x05e0, # HEBREW LETTER NUN
0x0091: 0x05e1, # HEBREW LETTER SAMEKH
0x0092: 0x05e2, # HEBREW LETTER AYIN
0x0093: 0x05e3, # HEBREW LETTER FINAL PE
0x0094: 0x05e4, # HEBREW LETTER PE
0x0095: 0x05e5, # HEBREW LETTER FINAL TSADI
0x0096: 0x05e6, # HEBREW LETTER TSADI
0x0097: 0x05e7, # HEBREW LETTER QOF
0x0098: 0x05e8, # HEBREW LETTER RESH
0x0099: 0x05e9, # HEBREW LETTER SHIN
0x009a: 0x05ea, # HEBREW LETTER TAV
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00a5, # YEN SIGN
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\u05d0' # 0x0080 -> HEBREW LETTER ALEF
'\u05d1' # 0x0081 -> HEBREW LETTER BET
'\u05d2' # 0x0082 -> HEBREW LETTER GIMEL
'\u05d3' # 0x0083 -> HEBREW LETTER DALET
'\u05d4' # 0x0084 -> HEBREW LETTER HE
'\u05d5' # 0x0085 -> HEBREW LETTER VAV
'\u05d6' # 0x0086 -> HEBREW LETTER ZAYIN
'\u05d7' # 0x0087 -> HEBREW LETTER HET
'\u05d8' # 0x0088 -> HEBREW LETTER TET
'\u05d9' # 0x0089 -> HEBREW LETTER YOD
'\u05da' # 0x008a -> HEBREW LETTER FINAL KAF
'\u05db' # 0x008b -> HEBREW LETTER KAF
'\u05dc' # 0x008c -> HEBREW LETTER LAMED
'\u05dd' # 0x008d -> HEBREW LETTER FINAL MEM
'\u05de' # 0x008e -> HEBREW LETTER MEM
'\u05df' # 0x008f -> HEBREW LETTER FINAL NUN
'\u05e0' # 0x0090 -> HEBREW LETTER NUN
'\u05e1' # 0x0091 -> HEBREW LETTER SAMEKH
'\u05e2' # 0x0092 -> HEBREW LETTER AYIN
'\u05e3' # 0x0093 -> HEBREW LETTER FINAL PE
'\u05e4' # 0x0094 -> HEBREW LETTER PE
'\u05e5' # 0x0095 -> HEBREW LETTER FINAL TSADI
'\u05e6' # 0x0096 -> HEBREW LETTER TSADI
'\u05e7' # 0x0097 -> HEBREW LETTER QOF
'\u05e8' # 0x0098 -> HEBREW LETTER RESH
'\u05e9' # 0x0099 -> HEBREW LETTER SHIN
'\u05ea' # 0x009a -> HEBREW LETTER TAV
'\xa2' # 0x009b -> CENT SIGN
'\xa3' # 0x009c -> POUND SIGN
'\xa5' # 0x009d -> YEN SIGN
'\u20a7' # 0x009e -> PESETA SIGN
'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\u2310' # 0x00a9 -> REVERSED NOT SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S (GERMAN)
'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
'\xb5' # 0x00e6 -> MICRO SIGN
'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
'\u221e' # 0x00ec -> INFINITY
'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
'\u2229' # 0x00ef -> INTERSECTION
'\u2261' # 0x00f0 -> IDENTICAL TO
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u2248' # 0x00f7 -> ALMOST EQUAL TO
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a5: 0x009d, # YEN SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S (GERMAN)
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f7: 0x00f6, # DIVISION SIGN
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x05d0: 0x0080, # HEBREW LETTER ALEF
0x05d1: 0x0081, # HEBREW LETTER BET
0x05d2: 0x0082, # HEBREW LETTER GIMEL
0x05d3: 0x0083, # HEBREW LETTER DALET
0x05d4: 0x0084, # HEBREW LETTER HE
0x05d5: 0x0085, # HEBREW LETTER VAV
0x05d6: 0x0086, # HEBREW LETTER ZAYIN
0x05d7: 0x0087, # HEBREW LETTER HET
0x05d8: 0x0088, # HEBREW LETTER TET
0x05d9: 0x0089, # HEBREW LETTER YOD
0x05da: 0x008a, # HEBREW LETTER FINAL KAF
0x05db: 0x008b, # HEBREW LETTER KAF
0x05dc: 0x008c, # HEBREW LETTER LAMED
0x05dd: 0x008d, # HEBREW LETTER FINAL MEM
0x05de: 0x008e, # HEBREW LETTER MEM
0x05df: 0x008f, # HEBREW LETTER FINAL NUN
0x05e0: 0x0090, # HEBREW LETTER NUN
0x05e1: 0x0091, # HEBREW LETTER SAMEKH
0x05e2: 0x0092, # HEBREW LETTER AYIN
0x05e3: 0x0093, # HEBREW LETTER FINAL PE
0x05e4: 0x0094, # HEBREW LETTER PE
0x05e5: 0x0095, # HEBREW LETTER FINAL TSADI
0x05e6: 0x0096, # HEBREW LETTER TSADI
0x05e7: 0x0097, # HEBREW LETTER QOF
0x05e8: 0x0098, # HEBREW LETTER RESH
0x05e9: 0x0099, # HEBREW LETTER SHIN
0x05ea: 0x009a, # HEBREW LETTER TAV
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
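# The Incremental* classes above wrap the same cp862 tables for chunked
# input; a single-byte charmap codec carries no state between calls, so each
# chunk decodes independently. Sketch (expected output per the table above):
#
#     >>> dec = IncrementalDecoder('strict')
#     >>> dec.decode(b'\x99') + dec.decode(b'\x9a')
#     'שת'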
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP863.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):
    # Stateless encode/decode built on the generated charmap tables below.
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    # Entry point the stdlib `encodings` package calls to register this codec.
    return codecs.CodecInfo(
        name='cp863',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
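# --- Usage sketch (editorial addition, not part of the generated file) ---
# A hedged example of wiring this module up outside the stdlib `encodings`
# package; the module name `cp863_custom`, the search function, and the
# alias 'cp863_demo' are all assumptions chosen for illustration.
#
#     import codecs
#     import cp863_custom                      # this file, saved standalone
#
#     def _search(name):
#         return cp863_custom.getregentry() if name == 'cp863_demo' else None
#
#     codecs.register(_search)
#     assert 'Québec'.encode('cp863_demo') == b'Qu\x82bec'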
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00c2, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00b6, # PILCROW SIGN
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x2017, # DOUBLE LOW LINE
0x008e: 0x00c0, # LATIN CAPITAL LETTER A WITH GRAVE
0x008f: 0x00a7, # SECTION SIGN
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00c8, # LATIN CAPITAL LETTER E WITH GRAVE
0x0092: 0x00ca, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00cb, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x0095: 0x00cf, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00a4, # CURRENCY SIGN
0x0099: 0x00d4, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00a2, # CENT SIGN
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d9, # LATIN CAPITAL LETTER U WITH GRAVE
0x009e: 0x00db, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00a6, # BROKEN BAR
0x00a1: 0x00b4, # ACUTE ACCENT
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00a8, # DIAERESIS
0x00a5: 0x00b8, # CEDILLA
0x00a6: 0x00b3, # SUPERSCRIPT THREE
0x00a7: 0x00af, # MACRON
0x00a8: 0x00ce, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00be, # VULGAR FRACTION THREE QUARTERS
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
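# Editorial note: `decoding_map` is the older dict-based interface that
# gencodec.py keeps for backward compatibility; the codec's decode() above
# actually consults the string-based `decoding_table` built next, which
# charmap_decode can index much faster than a dict. Both encode the same
# facts, e.g.:
#
#     decoding_map[0x82] == 0x00e9    # é, the same mapping the table carries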
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xc2' # 0x0084 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xb6' # 0x0086 -> PILCROW SIGN
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\u2017' # 0x008d -> DOUBLE LOW LINE
'\xc0' # 0x008e -> LATIN CAPITAL LETTER A WITH GRAVE
'\xa7' # 0x008f -> SECTION SIGN
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xc8' # 0x0091 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xca' # 0x0092 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xcb' # 0x0094 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcf' # 0x0095 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
'\xa4' # 0x0098 -> CURRENCY SIGN
'\xd4' # 0x0099 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xa2' # 0x009b -> CENT SIGN
'\xa3' # 0x009c -> POUND SIGN
'\xd9' # 0x009d -> LATIN CAPITAL LETTER U WITH GRAVE
'\xdb' # 0x009e -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
'\xa6' # 0x00a0 -> BROKEN BAR
'\xb4' # 0x00a1 -> ACUTE ACCENT
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xa8' # 0x00a4 -> DIAERESIS
'\xb8' # 0x00a5 -> CEDILLA
'\xb3' # 0x00a6 -> SUPERSCRIPT THREE
'\xaf' # 0x00a7 -> MACRON
'\xce' # 0x00a8 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\u2310' # 0x00a9 -> REVERSED NOT SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xbe' # 0x00ad -> VULGAR FRACTION THREE QUARTERS
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
'\xb5' # 0x00e6 -> MICRO SIGN
'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
'\u221e' # 0x00ec -> INFINITY
'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
'\u2229' # 0x00ef -> INTERSECTION
'\u2261' # 0x00f0 -> IDENTICAL TO
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u2248' # 0x00f7 -> ALMOST EQUAL TO
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
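# --- Decoding sketch (editorial addition) ---
# A minimal check that the table above behaves as documented: 0x82 -> é and
# 0xf1 -> ±. The sample bytes are an assumption chosen for illustration.
if __name__ == '__main__':
    text, consumed = codecs.charmap_decode(b'Qu\x82bec \xf1', 'strict', decoding_table)
    assert (text, consumed) == ('Québec ±', 8)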
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a2: 0x009b, # CENT SIGN
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x0098, # CURRENCY SIGN
0x00a6: 0x00a0, # BROKEN BAR
0x00a7: 0x008f, # SECTION SIGN
0x00a8: 0x00a4, # DIAERESIS
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00af: 0x00a7, # MACRON
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b3: 0x00a6, # SUPERSCRIPT THREE
0x00b4: 0x00a1, # ACUTE ACCENT
0x00b5: 0x00e6, # MICRO SIGN
0x00b6: 0x0086, # PILCROW SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00b8: 0x00a5, # CEDILLA
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00be: 0x00ad, # VULGAR FRACTION THREE QUARTERS
0x00c0: 0x008e, # LATIN CAPITAL LETTER A WITH GRAVE
0x00c2: 0x0084, # LATIN CAPITAL LETTER A WITH CIRCUMFLEX
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c8: 0x0091, # LATIN CAPITAL LETTER E WITH GRAVE
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00ca: 0x0092, # LATIN CAPITAL LETTER E WITH CIRCUMFLEX
0x00cb: 0x0094, # LATIN CAPITAL LETTER E WITH DIAERESIS
0x00ce: 0x00a8, # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
0x00cf: 0x0095, # LATIN CAPITAL LETTER I WITH DIAERESIS
0x00d4: 0x0099, # LATIN CAPITAL LETTER O WITH CIRCUMFLEX
0x00d9: 0x009d, # LATIN CAPITAL LETTER U WITH GRAVE
0x00db: 0x009e, # LATIN CAPITAL LETTER U WITH CIRCUMFLEX
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f7: 0x00f6, # DIVISION SIGN
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x2017: 0x008d, # DOUBLE LOW LINE
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
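# --- Round-trip sketch (editorial addition, not upstream code) ---
# Encode through encoding_map and decode back through decoding_table; the
# sample string is an assumption that exercises CP863's French repertoire.
if __name__ == '__main__':
    sample = 'Québec ±'
    raw, _ = codecs.charmap_encode(sample, 'strict', encoding_map)
    assert raw == b'Qu\x82bec \xf1'
    assert codecs.charmap_decode(raw, 'strict', decoding_table)[0] == sample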
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP864.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):
    # Stateless codec driven by the charmap tables defined further down.
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    # Registration hook consumed by the stdlib `encodings` package.
    return codecs.CodecInfo(
        name='cp864',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
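# --- Error-handling sketch (editorial addition) ---
# CP864 leaves several byte positions undefined and repurposes 0x25 for the
# Arabic percent sign (see the map below), so the `errors` argument matters
# in practice; a hedged example, assuming the codec is registered under
# 'cp864' as in a stock stdlib:
#
#     '%'.encode('cp864', 'strict')    # raises UnicodeEncodeError: U+0025
#                                      # has no slot in encoding_map
#     '%'.encode('cp864', 'replace')   # b'?'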
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0025: 0x066a, # ARABIC PERCENT SIGN
0x0080: 0x00b0, # DEGREE SIGN
0x0081: 0x00b7, # MIDDLE DOT
0x0082: 0x2219, # BULLET OPERATOR
0x0083: 0x221a, # SQUARE ROOT
0x0084: 0x2592, # MEDIUM SHADE
0x0085: 0x2500, # FORMS LIGHT HORIZONTAL
0x0086: 0x2502, # FORMS LIGHT VERTICAL
0x0087: 0x253c, # FORMS LIGHT VERTICAL AND HORIZONTAL
0x0088: 0x2524, # FORMS LIGHT VERTICAL AND LEFT
0x0089: 0x252c, # FORMS LIGHT DOWN AND HORIZONTAL
0x008a: 0x251c, # FORMS LIGHT VERTICAL AND RIGHT
0x008b: 0x2534, # FORMS LIGHT UP AND HORIZONTAL
0x008c: 0x2510, # FORMS LIGHT DOWN AND LEFT
0x008d: 0x250c, # FORMS LIGHT DOWN AND RIGHT
0x008e: 0x2514, # FORMS LIGHT UP AND RIGHT
0x008f: 0x2518, # FORMS LIGHT UP AND LEFT
0x0090: 0x03b2, # GREEK SMALL BETA
0x0091: 0x221e, # INFINITY
0x0092: 0x03c6, # GREEK SMALL PHI
0x0093: 0x00b1, # PLUS-OR-MINUS SIGN
0x0094: 0x00bd, # FRACTION 1/2
0x0095: 0x00bc, # FRACTION 1/4
0x0096: 0x2248, # ALMOST EQUAL TO
0x0097: 0x00ab, # LEFT POINTING GUILLEMET
0x0098: 0x00bb, # RIGHT POINTING GUILLEMET
0x0099: 0xfef7, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
0x009a: 0xfef8, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM
0x009b: None, # UNDEFINED
0x009c: None, # UNDEFINED
0x009d: 0xfefb, # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
0x009e: 0xfefc, # ARABIC LIGATURE LAM WITH ALEF FINAL FORM
0x009f: None, # UNDEFINED
0x00a1: 0x00ad, # SOFT HYPHEN
0x00a2: 0xfe82, # ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM
0x00a5: 0xfe84, # ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM
0x00a6: None, # UNDEFINED
0x00a7: None, # UNDEFINED
0x00a8: 0xfe8e, # ARABIC LETTER ALEF FINAL FORM
0x00a9: 0xfe8f, # ARABIC LETTER BEH ISOLATED FORM
0x00aa: 0xfe95, # ARABIC LETTER TEH ISOLATED FORM
0x00ab: 0xfe99, # ARABIC LETTER THEH ISOLATED FORM
0x00ac: 0x060c, # ARABIC COMMA
0x00ad: 0xfe9d, # ARABIC LETTER JEEM ISOLATED FORM
0x00ae: 0xfea1, # ARABIC LETTER HAH ISOLATED FORM
0x00af: 0xfea5, # ARABIC LETTER KHAH ISOLATED FORM
0x00b0: 0x0660, # ARABIC-INDIC DIGIT ZERO
0x00b1: 0x0661, # ARABIC-INDIC DIGIT ONE
0x00b2: 0x0662, # ARABIC-INDIC DIGIT TWO
0x00b3: 0x0663, # ARABIC-INDIC DIGIT THREE
0x00b4: 0x0664, # ARABIC-INDIC DIGIT FOUR
0x00b5: 0x0665, # ARABIC-INDIC DIGIT FIVE
0x00b6: 0x0666, # ARABIC-INDIC DIGIT SIX
0x00b7: 0x0667, # ARABIC-INDIC DIGIT SEVEN
0x00b8: 0x0668, # ARABIC-INDIC DIGIT EIGHT
0x00b9: 0x0669, # ARABIC-INDIC DIGIT NINE
0x00ba: 0xfed1, # ARABIC LETTER FEH ISOLATED FORM
0x00bb: 0x061b, # ARABIC SEMICOLON
0x00bc: 0xfeb1, # ARABIC LETTER SEEN ISOLATED FORM
0x00bd: 0xfeb5, # ARABIC LETTER SHEEN ISOLATED FORM
0x00be: 0xfeb9, # ARABIC LETTER SAD ISOLATED FORM
0x00bf: 0x061f, # ARABIC QUESTION MARK
0x00c0: 0x00a2, # CENT SIGN
0x00c1: 0xfe80, # ARABIC LETTER HAMZA ISOLATED FORM
0x00c2: 0xfe81, # ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
0x00c3: 0xfe83, # ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
0x00c4: 0xfe85, # ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
0x00c5: 0xfeca, # ARABIC LETTER AIN FINAL FORM
0x00c6: 0xfe8b, # ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
0x00c7: 0xfe8d, # ARABIC LETTER ALEF ISOLATED FORM
0x00c8: 0xfe91, # ARABIC LETTER BEH INITIAL FORM
0x00c9: 0xfe93, # ARABIC LETTER TEH MARBUTA ISOLATED FORM
0x00ca: 0xfe97, # ARABIC LETTER TEH INITIAL FORM
0x00cb: 0xfe9b, # ARABIC LETTER THEH INITIAL FORM
0x00cc: 0xfe9f, # ARABIC LETTER JEEM INITIAL FORM
0x00cd: 0xfea3, # ARABIC LETTER HAH INITIAL FORM
0x00ce: 0xfea7, # ARABIC LETTER KHAH INITIAL FORM
0x00cf: 0xfea9, # ARABIC LETTER DAL ISOLATED FORM
0x00d0: 0xfeab, # ARABIC LETTER THAL ISOLATED FORM
0x00d1: 0xfead, # ARABIC LETTER REH ISOLATED FORM
0x00d2: 0xfeaf, # ARABIC LETTER ZAIN ISOLATED FORM
0x00d3: 0xfeb3, # ARABIC LETTER SEEN INITIAL FORM
0x00d4: 0xfeb7, # ARABIC LETTER SHEEN INITIAL FORM
0x00d5: 0xfebb, # ARABIC LETTER SAD INITIAL FORM
0x00d6: 0xfebf, # ARABIC LETTER DAD INITIAL FORM
0x00d7: 0xfec1, # ARABIC LETTER TAH ISOLATED FORM
0x00d8: 0xfec5, # ARABIC LETTER ZAH ISOLATED FORM
0x00d9: 0xfecb, # ARABIC LETTER AIN INITIAL FORM
0x00da: 0xfecf, # ARABIC LETTER GHAIN INITIAL FORM
0x00db: 0x00a6, # BROKEN VERTICAL BAR
0x00dc: 0x00ac, # NOT SIGN
0x00dd: 0x00f7, # DIVISION SIGN
0x00de: 0x00d7, # MULTIPLICATION SIGN
0x00df: 0xfec9, # ARABIC LETTER AIN ISOLATED FORM
0x00e0: 0x0640, # ARABIC TATWEEL
0x00e1: 0xfed3, # ARABIC LETTER FEH INITIAL FORM
0x00e2: 0xfed7, # ARABIC LETTER QAF INITIAL FORM
0x00e3: 0xfedb, # ARABIC LETTER KAF INITIAL FORM
0x00e4: 0xfedf, # ARABIC LETTER LAM INITIAL FORM
0x00e5: 0xfee3, # ARABIC LETTER MEEM INITIAL FORM
0x00e6: 0xfee7, # ARABIC LETTER NOON INITIAL FORM
0x00e7: 0xfeeb, # ARABIC LETTER HEH INITIAL FORM
0x00e8: 0xfeed, # ARABIC LETTER WAW ISOLATED FORM
0x00e9: 0xfeef, # ARABIC LETTER ALEF MAKSURA ISOLATED FORM
0x00ea: 0xfef3, # ARABIC LETTER YEH INITIAL FORM
0x00eb: 0xfebd, # ARABIC LETTER DAD ISOLATED FORM
0x00ec: 0xfecc, # ARABIC LETTER AIN MEDIAL FORM
0x00ed: 0xfece, # ARABIC LETTER GHAIN FINAL FORM
0x00ee: 0xfecd, # ARABIC LETTER GHAIN ISOLATED FORM
0x00ef: 0xfee1, # ARABIC LETTER MEEM ISOLATED FORM
0x00f0: 0xfe7d, # ARABIC SHADDA MEDIAL FORM
0x00f1: 0x0651, # ARABIC SHADDAH
0x00f2: 0xfee5, # ARABIC LETTER NOON ISOLATED FORM
0x00f3: 0xfee9, # ARABIC LETTER HEH ISOLATED FORM
0x00f4: 0xfeec, # ARABIC LETTER HEH MEDIAL FORM
0x00f5: 0xfef0, # ARABIC LETTER ALEF MAKSURA FINAL FORM
0x00f6: 0xfef2, # ARABIC LETTER YEH FINAL FORM
0x00f7: 0xfed0, # ARABIC LETTER GHAIN MEDIAL FORM
0x00f8: 0xfed5, # ARABIC LETTER QAF ISOLATED FORM
0x00f9: 0xfef5, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM
0x00fa: 0xfef6, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE FINAL FORM
0x00fb: 0xfedd, # ARABIC LETTER LAM ISOLATED FORM
0x00fc: 0xfed9, # ARABIC LETTER KAF ISOLATED FORM
0x00fd: 0xfef1, # ARABIC LETTER YEH ISOLATED FORM
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: None, # UNDEFINED
})
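# Editorial note: `None` marks byte values this codepage leaves undefined
# (0x9b, 0x9c, 0x9f, 0xa6, 0xa7, 0xff); in the string table below the same
# positions carry the '\ufffe' sentinel, and charmap_decode treats either
# form as an unmapped byte under 'strict'.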
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'\u066a' # 0x0025 -> ARABIC PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xb0' # 0x0080 -> DEGREE SIGN
'\xb7' # 0x0081 -> MIDDLE DOT
'\u2219' # 0x0082 -> BULLET OPERATOR
'\u221a' # 0x0083 -> SQUARE ROOT
'\u2592' # 0x0084 -> MEDIUM SHADE
'\u2500' # 0x0085 -> FORMS LIGHT HORIZONTAL
'\u2502' # 0x0086 -> FORMS LIGHT VERTICAL
'\u253c' # 0x0087 -> FORMS LIGHT VERTICAL AND HORIZONTAL
'\u2524' # 0x0088 -> FORMS LIGHT VERTICAL AND LEFT
'\u252c' # 0x0089 -> FORMS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x008a -> FORMS LIGHT VERTICAL AND RIGHT
'\u2534' # 0x008b -> FORMS LIGHT UP AND HORIZONTAL
'\u2510' # 0x008c -> FORMS LIGHT DOWN AND LEFT
'\u250c' # 0x008d -> FORMS LIGHT DOWN AND RIGHT
'\u2514' # 0x008e -> FORMS LIGHT UP AND RIGHT
'\u2518' # 0x008f -> FORMS LIGHT UP AND LEFT
'\u03b2' # 0x0090 -> GREEK SMALL BETA
'\u221e' # 0x0091 -> INFINITY
'\u03c6' # 0x0092 -> GREEK SMALL PHI
'\xb1' # 0x0093 -> PLUS-OR-MINUS SIGN
'\xbd' # 0x0094 -> FRACTION 1/2
'\xbc' # 0x0095 -> FRACTION 1/4
'\u2248' # 0x0096 -> ALMOST EQUAL TO
'\xab' # 0x0097 -> LEFT POINTING GUILLEMET
'\xbb' # 0x0098 -> RIGHT POINTING GUILLEMET
'\ufef7' # 0x0099 -> ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
'\ufef8' # 0x009a -> ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM
'\ufffe' # 0x009b -> UNDEFINED
'\ufffe' # 0x009c -> UNDEFINED
'\ufefb' # 0x009d -> ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
'\ufefc' # 0x009e -> ARABIC LIGATURE LAM WITH ALEF FINAL FORM
'\ufffe' # 0x009f -> UNDEFINED
'\xa0' # 0x00a0 -> NON-BREAKING SPACE
'\xad' # 0x00a1 -> SOFT HYPHEN
'\ufe82' # 0x00a2 -> ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM
'\xa3' # 0x00a3 -> POUND SIGN
'\xa4' # 0x00a4 -> CURRENCY SIGN
'\ufe84' # 0x00a5 -> ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM
'\ufffe' # 0x00a6 -> UNDEFINED
'\ufffe' # 0x00a7 -> UNDEFINED
'\ufe8e' # 0x00a8 -> ARABIC LETTER ALEF FINAL FORM
'\ufe8f' # 0x00a9 -> ARABIC LETTER BEH ISOLATED FORM
'\ufe95' # 0x00aa -> ARABIC LETTER TEH ISOLATED FORM
'\ufe99' # 0x00ab -> ARABIC LETTER THEH ISOLATED FORM
'\u060c' # 0x00ac -> ARABIC COMMA
'\ufe9d' # 0x00ad -> ARABIC LETTER JEEM ISOLATED FORM
'\ufea1' # 0x00ae -> ARABIC LETTER HAH ISOLATED FORM
'\ufea5' # 0x00af -> ARABIC LETTER KHAH ISOLATED FORM
'\u0660' # 0x00b0 -> ARABIC-INDIC DIGIT ZERO
'\u0661' # 0x00b1 -> ARABIC-INDIC DIGIT ONE
'\u0662' # 0x00b2 -> ARABIC-INDIC DIGIT TWO
'\u0663' # 0x00b3 -> ARABIC-INDIC DIGIT THREE
'\u0664' # 0x00b4 -> ARABIC-INDIC DIGIT FOUR
'\u0665' # 0x00b5 -> ARABIC-INDIC DIGIT FIVE
'\u0666' # 0x00b6 -> ARABIC-INDIC DIGIT SIX
'\u0667' # 0x00b7 -> ARABIC-INDIC DIGIT SEVEN
'\u0668' # 0x00b8 -> ARABIC-INDIC DIGIT EIGHT
'\u0669' # 0x00b9 -> ARABIC-INDIC DIGIT NINE
'\ufed1' # 0x00ba -> ARABIC LETTER FEH ISOLATED FORM
'\u061b' # 0x00bb -> ARABIC SEMICOLON
'\ufeb1' # 0x00bc -> ARABIC LETTER SEEN ISOLATED FORM
'\ufeb5' # 0x00bd -> ARABIC LETTER SHEEN ISOLATED FORM
'\ufeb9' # 0x00be -> ARABIC LETTER SAD ISOLATED FORM
'\u061f' # 0x00bf -> ARABIC QUESTION MARK
'\xa2' # 0x00c0 -> CENT SIGN
'\ufe80' # 0x00c1 -> ARABIC LETTER HAMZA ISOLATED FORM
'\ufe81' # 0x00c2 -> ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
'\ufe83' # 0x00c3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
'\ufe85' # 0x00c4 -> ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
'\ufeca' # 0x00c5 -> ARABIC LETTER AIN FINAL FORM
'\ufe8b' # 0x00c6 -> ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
'\ufe8d' # 0x00c7 -> ARABIC LETTER ALEF ISOLATED FORM
'\ufe91' # 0x00c8 -> ARABIC LETTER BEH INITIAL FORM
'\ufe93' # 0x00c9 -> ARABIC LETTER TEH MARBUTA ISOLATED FORM
'\ufe97' # 0x00ca -> ARABIC LETTER TEH INITIAL FORM
'\ufe9b' # 0x00cb -> ARABIC LETTER THEH INITIAL FORM
'\ufe9f' # 0x00cc -> ARABIC LETTER JEEM INITIAL FORM
'\ufea3' # 0x00cd -> ARABIC LETTER HAH INITIAL FORM
'\ufea7' # 0x00ce -> ARABIC LETTER KHAH INITIAL FORM
'\ufea9' # 0x00cf -> ARABIC LETTER DAL ISOLATED FORM
'\ufeab' # 0x00d0 -> ARABIC LETTER THAL ISOLATED FORM
'\ufead' # 0x00d1 -> ARABIC LETTER REH ISOLATED FORM
'\ufeaf' # 0x00d2 -> ARABIC LETTER ZAIN ISOLATED FORM
'\ufeb3' # 0x00d3 -> ARABIC LETTER SEEN INITIAL FORM
'\ufeb7' # 0x00d4 -> ARABIC LETTER SHEEN INITIAL FORM
'\ufebb' # 0x00d5 -> ARABIC LETTER SAD INITIAL FORM
'\ufebf' # 0x00d6 -> ARABIC LETTER DAD INITIAL FORM
'\ufec1' # 0x00d7 -> ARABIC LETTER TAH ISOLATED FORM
'\ufec5' # 0x00d8 -> ARABIC LETTER ZAH ISOLATED FORM
'\ufecb' # 0x00d9 -> ARABIC LETTER AIN INITIAL FORM
'\ufecf' # 0x00da -> ARABIC LETTER GHAIN INITIAL FORM
'\xa6' # 0x00db -> BROKEN VERTICAL BAR
'\xac' # 0x00dc -> NOT SIGN
'\xf7' # 0x00dd -> DIVISION SIGN
'\xd7' # 0x00de -> MULTIPLICATION SIGN
'\ufec9' # 0x00df -> ARABIC LETTER AIN ISOLATED FORM
'\u0640' # 0x00e0 -> ARABIC TATWEEL
'\ufed3' # 0x00e1 -> ARABIC LETTER FEH INITIAL FORM
'\ufed7' # 0x00e2 -> ARABIC LETTER QAF INITIAL FORM
'\ufedb' # 0x00e3 -> ARABIC LETTER KAF INITIAL FORM
'\ufedf' # 0x00e4 -> ARABIC LETTER LAM INITIAL FORM
'\ufee3' # 0x00e5 -> ARABIC LETTER MEEM INITIAL FORM
'\ufee7' # 0x00e6 -> ARABIC LETTER NOON INITIAL FORM
'\ufeeb' # 0x00e7 -> ARABIC LETTER HEH INITIAL FORM
'\ufeed' # 0x00e8 -> ARABIC LETTER WAW ISOLATED FORM
'\ufeef' # 0x00e9 -> ARABIC LETTER ALEF MAKSURA ISOLATED FORM
'\ufef3' # 0x00ea -> ARABIC LETTER YEH INITIAL FORM
'\ufebd' # 0x00eb -> ARABIC LETTER DAD ISOLATED FORM
'\ufecc' # 0x00ec -> ARABIC LETTER AIN MEDIAL FORM
'\ufece' # 0x00ed -> ARABIC LETTER GHAIN FINAL FORM
'\ufecd' # 0x00ee -> ARABIC LETTER GHAIN ISOLATED FORM
'\ufee1' # 0x00ef -> ARABIC LETTER MEEM ISOLATED FORM
'\ufe7d' # 0x00f0 -> ARABIC SHADDA MEDIAL FORM
'\u0651' # 0x00f1 -> ARABIC SHADDAH
'\ufee5' # 0x00f2 -> ARABIC LETTER NOON ISOLATED FORM
'\ufee9' # 0x00f3 -> ARABIC LETTER HEH ISOLATED FORM
'\ufeec' # 0x00f4 -> ARABIC LETTER HEH MEDIAL FORM
'\ufef0' # 0x00f5 -> ARABIC LETTER ALEF MAKSURA FINAL FORM
'\ufef2' # 0x00f6 -> ARABIC LETTER YEH FINAL FORM
'\ufed0' # 0x00f7 -> ARABIC LETTER GHAIN MEDIAL FORM
'\ufed5' # 0x00f8 -> ARABIC LETTER QAF ISOLATED FORM
'\ufef5' # 0x00f9 -> ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM
'\ufef6' # 0x00fa -> ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE FINAL FORM
'\ufedd' # 0x00fb -> ARABIC LETTER LAM ISOLATED FORM
'\ufed9' # 0x00fc -> ARABIC LETTER KAF ISOLATED FORM
'\ufef1' # 0x00fd -> ARABIC LETTER YEH ISOLATED FORM
'\u25a0' # 0x00fe -> BLACK SQUARE
'\ufffe' # 0x00ff -> UNDEFINED
)
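# --- Undefined-byte sketch (editorial addition) ---
# 0x9b carries the '\ufffe' sentinel in the table above, so strict decoding
# must fail while 'replace' substitutes U+FFFD.
if __name__ == '__main__':
    try:
        codecs.charmap_decode(b'\x9b', 'strict', decoding_table)
    except UnicodeDecodeError:
        pass
    else:
        raise AssertionError('expected UnicodeDecodeError for undefined 0x9b')
    assert codecs.charmap_decode(b'\x9b', 'replace', decoding_table)[0] == '\ufffd'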
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00a0, # NON-BREAKING SPACE
0x00a2: 0x00c0, # CENT SIGN
0x00a3: 0x00a3, # POUND SIGN
0x00a4: 0x00a4, # CURRENCY SIGN
0x00a6: 0x00db, # BROKEN VERTICAL BAR
0x00ab: 0x0097, # LEFT POINTING GUILLEMET
0x00ac: 0x00dc, # NOT SIGN
0x00ad: 0x00a1, # SOFT HYPHEN
0x00b0: 0x0080, # DEGREE SIGN
0x00b1: 0x0093, # PLUS-OR-MINUS SIGN
0x00b7: 0x0081, # MIDDLE DOT
0x00bb: 0x0098, # RIGHT POINTING GUILLEMET
0x00bc: 0x0095, # FRACTION 1/4
0x00bd: 0x0094, # FRACTION 1/2
0x00d7: 0x00de, # MULTIPLICATION SIGN
0x00f7: 0x00dd, # DIVISION SIGN
0x03b2: 0x0090, # GREEK SMALL BETA
0x03c6: 0x0092, # GREEK SMALL PHI
0x060c: 0x00ac, # ARABIC COMMA
0x061b: 0x00bb, # ARABIC SEMICOLON
0x061f: 0x00bf, # ARABIC QUESTION MARK
0x0640: 0x00e0, # ARABIC TATWEEL
0x0651: 0x00f1, # ARABIC SHADDAH
0x0660: 0x00b0, # ARABIC-INDIC DIGIT ZERO
0x0661: 0x00b1, # ARABIC-INDIC DIGIT ONE
0x0662: 0x00b2, # ARABIC-INDIC DIGIT TWO
0x0663: 0x00b3, # ARABIC-INDIC DIGIT THREE
0x0664: 0x00b4, # ARABIC-INDIC DIGIT FOUR
0x0665: 0x00b5, # ARABIC-INDIC DIGIT FIVE
0x0666: 0x00b6, # ARABIC-INDIC DIGIT SIX
0x0667: 0x00b7, # ARABIC-INDIC DIGIT SEVEN
0x0668: 0x00b8, # ARABIC-INDIC DIGIT EIGHT
0x0669: 0x00b9, # ARABIC-INDIC DIGIT NINE
0x066a: 0x0025, # ARABIC PERCENT SIGN
0x2219: 0x0082, # BULLET OPERATOR
0x221a: 0x0083, # SQUARE ROOT
0x221e: 0x0091, # INFINITY
0x2248: 0x0096, # ALMOST EQUAL TO
0x2500: 0x0085, # FORMS LIGHT HORIZONTAL
0x2502: 0x0086, # FORMS LIGHT VERTICAL
0x250c: 0x008d, # FORMS LIGHT DOWN AND RIGHT
0x2510: 0x008c, # FORMS LIGHT DOWN AND LEFT
0x2514: 0x008e, # FORMS LIGHT UP AND RIGHT
0x2518: 0x008f, # FORMS LIGHT UP AND LEFT
0x251c: 0x008a, # FORMS LIGHT VERTICAL AND RIGHT
0x2524: 0x0088, # FORMS LIGHT VERTICAL AND LEFT
0x252c: 0x0089, # FORMS LIGHT DOWN AND HORIZONTAL
0x2534: 0x008b, # FORMS LIGHT UP AND HORIZONTAL
0x253c: 0x0087, # FORMS LIGHT VERTICAL AND HORIZONTAL
0x2592: 0x0084, # MEDIUM SHADE
0x25a0: 0x00fe, # BLACK SQUARE
0xfe7d: 0x00f0, # ARABIC SHADDA MEDIAL FORM
0xfe80: 0x00c1, # ARABIC LETTER HAMZA ISOLATED FORM
0xfe81: 0x00c2, # ARABIC LETTER ALEF WITH MADDA ABOVE ISOLATED FORM
0xfe82: 0x00a2, # ARABIC LETTER ALEF WITH MADDA ABOVE FINAL FORM
0xfe83: 0x00c3, # ARABIC LETTER ALEF WITH HAMZA ABOVE ISOLATED FORM
0xfe84: 0x00a5, # ARABIC LETTER ALEF WITH HAMZA ABOVE FINAL FORM
0xfe85: 0x00c4, # ARABIC LETTER WAW WITH HAMZA ABOVE ISOLATED FORM
0xfe8b: 0x00c6, # ARABIC LETTER YEH WITH HAMZA ABOVE INITIAL FORM
0xfe8d: 0x00c7, # ARABIC LETTER ALEF ISOLATED FORM
0xfe8e: 0x00a8, # ARABIC LETTER ALEF FINAL FORM
0xfe8f: 0x00a9, # ARABIC LETTER BEH ISOLATED FORM
0xfe91: 0x00c8, # ARABIC LETTER BEH INITIAL FORM
0xfe93: 0x00c9, # ARABIC LETTER TEH MARBUTA ISOLATED FORM
0xfe95: 0x00aa, # ARABIC LETTER TEH ISOLATED FORM
0xfe97: 0x00ca, # ARABIC LETTER TEH INITIAL FORM
0xfe99: 0x00ab, # ARABIC LETTER THEH ISOLATED FORM
0xfe9b: 0x00cb, # ARABIC LETTER THEH INITIAL FORM
0xfe9d: 0x00ad, # ARABIC LETTER JEEM ISOLATED FORM
0xfe9f: 0x00cc, # ARABIC LETTER JEEM INITIAL FORM
0xfea1: 0x00ae, # ARABIC LETTER HAH ISOLATED FORM
0xfea3: 0x00cd, # ARABIC LETTER HAH INITIAL FORM
0xfea5: 0x00af, # ARABIC LETTER KHAH ISOLATED FORM
0xfea7: 0x00ce, # ARABIC LETTER KHAH INITIAL FORM
0xfea9: 0x00cf, # ARABIC LETTER DAL ISOLATED FORM
0xfeab: 0x00d0, # ARABIC LETTER THAL ISOLATED FORM
0xfead: 0x00d1, # ARABIC LETTER REH ISOLATED FORM
0xfeaf: 0x00d2, # ARABIC LETTER ZAIN ISOLATED FORM
0xfeb1: 0x00bc, # ARABIC LETTER SEEN ISOLATED FORM
0xfeb3: 0x00d3, # ARABIC LETTER SEEN INITIAL FORM
0xfeb5: 0x00bd, # ARABIC LETTER SHEEN ISOLATED FORM
0xfeb7: 0x00d4, # ARABIC LETTER SHEEN INITIAL FORM
0xfeb9: 0x00be, # ARABIC LETTER SAD ISOLATED FORM
0xfebb: 0x00d5, # ARABIC LETTER SAD INITIAL FORM
0xfebd: 0x00eb, # ARABIC LETTER DAD ISOLATED FORM
0xfebf: 0x00d6, # ARABIC LETTER DAD INITIAL FORM
0xfec1: 0x00d7, # ARABIC LETTER TAH ISOLATED FORM
0xfec5: 0x00d8, # ARABIC LETTER ZAH ISOLATED FORM
0xfec9: 0x00df, # ARABIC LETTER AIN ISOLATED FORM
0xfeca: 0x00c5, # ARABIC LETTER AIN FINAL FORM
0xfecb: 0x00d9, # ARABIC LETTER AIN INITIAL FORM
0xfecc: 0x00ec, # ARABIC LETTER AIN MEDIAL FORM
0xfecd: 0x00ee, # ARABIC LETTER GHAIN ISOLATED FORM
0xfece: 0x00ed, # ARABIC LETTER GHAIN FINAL FORM
0xfecf: 0x00da, # ARABIC LETTER GHAIN INITIAL FORM
0xfed0: 0x00f7, # ARABIC LETTER GHAIN MEDIAL FORM
0xfed1: 0x00ba, # ARABIC LETTER FEH ISOLATED FORM
0xfed3: 0x00e1, # ARABIC LETTER FEH INITIAL FORM
0xfed5: 0x00f8, # ARABIC LETTER QAF ISOLATED FORM
0xfed7: 0x00e2, # ARABIC LETTER QAF INITIAL FORM
0xfed9: 0x00fc, # ARABIC LETTER KAF ISOLATED FORM
0xfedb: 0x00e3, # ARABIC LETTER KAF INITIAL FORM
0xfedd: 0x00fb, # ARABIC LETTER LAM ISOLATED FORM
0xfedf: 0x00e4, # ARABIC LETTER LAM INITIAL FORM
0xfee1: 0x00ef, # ARABIC LETTER MEEM ISOLATED FORM
0xfee3: 0x00e5, # ARABIC LETTER MEEM INITIAL FORM
0xfee5: 0x00f2, # ARABIC LETTER NOON ISOLATED FORM
0xfee7: 0x00e6, # ARABIC LETTER NOON INITIAL FORM
0xfee9: 0x00f3, # ARABIC LETTER HEH ISOLATED FORM
0xfeeb: 0x00e7, # ARABIC LETTER HEH INITIAL FORM
0xfeec: 0x00f4, # ARABIC LETTER HEH MEDIAL FORM
0xfeed: 0x00e8, # ARABIC LETTER WAW ISOLATED FORM
0xfeef: 0x00e9, # ARABIC LETTER ALEF MAKSURA ISOLATED FORM
0xfef0: 0x00f5, # ARABIC LETTER ALEF MAKSURA FINAL FORM
0xfef1: 0x00fd, # ARABIC LETTER YEH ISOLATED FORM
0xfef2: 0x00f6, # ARABIC LETTER YEH FINAL FORM
0xfef3: 0x00ea, # ARABIC LETTER YEH INITIAL FORM
0xfef5: 0x00f9, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM
0xfef6: 0x00fa, # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE FINAL FORM
0xfef7: 0x0099, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE ISOLATED FORM
0xfef8: 0x009a, # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE FINAL FORM
0xfefb: 0x009d, # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
0xfefc: 0x009e, # ARABIC LIGATURE LAM WITH ALEF FINAL FORM
}
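# --- Round-trip sketch (editorial addition, not upstream code) ---
# The Arabic-Indic digits map to 0xb0..0xb9 above; the three-digit sample is
# an assumption chosen for illustration.
if __name__ == '__main__':
    digits = '\u0660\u0661\u0662'
    raw, _ = codecs.charmap_encode(digits, 'strict', encoding_map)
    assert raw == b'\xb0\xb1\xb2'
    assert codecs.charmap_decode(raw, 'strict', decoding_table)[0] == digits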
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP865.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):
    # Stateless codec over the generated charmap tables that follow.
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    # Registration hook consumed by the stdlib `encodings` package.
    return codecs.CodecInfo(
        name='cp865',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
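# --- Incremental-use sketch (editorial addition) ---
# The incremental classes let callers feed text in chunks and still get one
# coherent byte stream; a hedged example, assuming this file's tables are in
# scope (the sample word is an assumption exercising CP865's Nordic letters):
#
#     enc = IncrementalEncoder()
#     raw = enc.encode('Sm') + enc.encode('ørgås')
#     # -> b'Sm\x9brg\x86s': ø -> 0x9b and å -> 0x86 per the map below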
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0081: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x0082: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x0083: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x0084: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x0085: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0086: 0x00e5, # LATIN SMALL LETTER A WITH RING ABOVE
0x0087: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x0088: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0089: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x008a: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x008b: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x008c: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x008d: 0x00ec, # LATIN SMALL LETTER I WITH GRAVE
0x008e: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x008f: 0x00c5, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x0090: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0091: 0x00e6, # LATIN SMALL LIGATURE AE
0x0092: 0x00c6, # LATIN CAPITAL LIGATURE AE
0x0093: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x0094: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x0095: 0x00f2, # LATIN SMALL LETTER O WITH GRAVE
0x0096: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x0097: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x0098: 0x00ff, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0099: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x009a: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x009b: 0x00f8, # LATIN SMALL LETTER O WITH STROKE
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x00d8, # LATIN CAPITAL LETTER O WITH STROKE
0x009e: 0x20a7, # PESETA SIGN
0x009f: 0x0192, # LATIN SMALL LETTER F WITH HOOK
0x00a0: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x00a1: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x00a2: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x00a3: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x00a4: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x00a5: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x00a6: 0x00aa, # FEMININE ORDINAL INDICATOR
0x00a7: 0x00ba, # MASCULINE ORDINAL INDICATOR
0x00a8: 0x00bf, # INVERTED QUESTION MARK
0x00a9: 0x2310, # REVERSED NOT SIGN
0x00aa: 0x00ac, # NOT SIGN
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x00bc, # VULGAR FRACTION ONE QUARTER
0x00ad: 0x00a1, # INVERTED EXCLAMATION MARK
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00a4, # CURRENCY SIGN
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00e1: 0x00df, # LATIN SMALL LETTER SHARP S
0x00e2: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00e3: 0x03c0, # GREEK SMALL LETTER PI
0x00e4: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00e5: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00e6: 0x00b5, # MICRO SIGN
0x00e7: 0x03c4, # GREEK SMALL LETTER TAU
0x00e8: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00e9: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ea: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00eb: 0x03b4, # GREEK SMALL LETTER DELTA
0x00ec: 0x221e, # INFINITY
0x00ed: 0x03c6, # GREEK SMALL LETTER PHI
0x00ee: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00ef: 0x2229, # INTERSECTION
0x00f0: 0x2261, # IDENTICAL TO
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x2265, # GREATER-THAN OR EQUAL TO
0x00f3: 0x2264, # LESS-THAN OR EQUAL TO
0x00f4: 0x2320, # TOP HALF INTEGRAL
0x00f5: 0x2321, # BOTTOM HALF INTEGRAL
0x00f6: 0x00f7, # DIVISION SIGN
0x00f7: 0x2248, # ALMOST EQUAL TO
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x207f, # SUPERSCRIPT LATIN SMALL LETTER N
0x00fd: 0x00b2, # SUPERSCRIPT TWO
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
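# Illustrative sketch (not part of the generated file): the dict-based
# decoding_map above can be fed straight to codecs.charmap_decode; each entry
# maps a byte value to a Unicode ordinal. Per the 0x009e entry above, byte
# 0x9E decodes to the peseta sign.
if __name__ == '__main__':
    assert codecs.charmap_decode(b'\x9e', 'strict', decoding_map)[0] == '\u20a7'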
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\xc7' # 0x0080 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xfc' # 0x0081 -> LATIN SMALL LETTER U WITH DIAERESIS
'\xe9' # 0x0082 -> LATIN SMALL LETTER E WITH ACUTE
'\xe2' # 0x0083 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x0084 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe0' # 0x0085 -> LATIN SMALL LETTER A WITH GRAVE
'\xe5' # 0x0086 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x0087 -> LATIN SMALL LETTER C WITH CEDILLA
'\xea' # 0x0088 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0089 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xe8' # 0x008a -> LATIN SMALL LETTER E WITH GRAVE
'\xef' # 0x008b -> LATIN SMALL LETTER I WITH DIAERESIS
'\xee' # 0x008c -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xec' # 0x008d -> LATIN SMALL LETTER I WITH GRAVE
'\xc4' # 0x008e -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x008f -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc9' # 0x0090 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xe6' # 0x0091 -> LATIN SMALL LIGATURE AE
'\xc6' # 0x0092 -> LATIN CAPITAL LIGATURE AE
'\xf4' # 0x0093 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x0094 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf2' # 0x0095 -> LATIN SMALL LETTER O WITH GRAVE
'\xfb' # 0x0096 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xf9' # 0x0097 -> LATIN SMALL LETTER U WITH GRAVE
'\xff' # 0x0098 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xd6' # 0x0099 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x009a -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xf8' # 0x009b -> LATIN SMALL LETTER O WITH STROKE
'\xa3' # 0x009c -> POUND SIGN
'\xd8' # 0x009d -> LATIN CAPITAL LETTER O WITH STROKE
'\u20a7' # 0x009e -> PESETA SIGN
'\u0192' # 0x009f -> LATIN SMALL LETTER F WITH HOOK
'\xe1' # 0x00a0 -> LATIN SMALL LETTER A WITH ACUTE
'\xed' # 0x00a1 -> LATIN SMALL LETTER I WITH ACUTE
'\xf3' # 0x00a2 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0x00a3 -> LATIN SMALL LETTER U WITH ACUTE
'\xf1' # 0x00a4 -> LATIN SMALL LETTER N WITH TILDE
'\xd1' # 0x00a5 -> LATIN CAPITAL LETTER N WITH TILDE
'\xaa' # 0x00a6 -> FEMININE ORDINAL INDICATOR
'\xba' # 0x00a7 -> MASCULINE ORDINAL INDICATOR
'\xbf' # 0x00a8 -> INVERTED QUESTION MARK
'\u2310' # 0x00a9 -> REVERSED NOT SIGN
'\xac' # 0x00aa -> NOT SIGN
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\xbc' # 0x00ac -> VULGAR FRACTION ONE QUARTER
'\xa1' # 0x00ad -> INVERTED EXCLAMATION MARK
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xa4' # 0x00af -> CURRENCY SIGN
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03b1' # 0x00e0 -> GREEK SMALL LETTER ALPHA
'\xdf' # 0x00e1 -> LATIN SMALL LETTER SHARP S
'\u0393' # 0x00e2 -> GREEK CAPITAL LETTER GAMMA
'\u03c0' # 0x00e3 -> GREEK SMALL LETTER PI
'\u03a3' # 0x00e4 -> GREEK CAPITAL LETTER SIGMA
'\u03c3' # 0x00e5 -> GREEK SMALL LETTER SIGMA
'\xb5' # 0x00e6 -> MICRO SIGN
'\u03c4' # 0x00e7 -> GREEK SMALL LETTER TAU
'\u03a6' # 0x00e8 -> GREEK CAPITAL LETTER PHI
'\u0398' # 0x00e9 -> GREEK CAPITAL LETTER THETA
'\u03a9' # 0x00ea -> GREEK CAPITAL LETTER OMEGA
'\u03b4' # 0x00eb -> GREEK SMALL LETTER DELTA
'\u221e' # 0x00ec -> INFINITY
'\u03c6' # 0x00ed -> GREEK SMALL LETTER PHI
'\u03b5' # 0x00ee -> GREEK SMALL LETTER EPSILON
'\u2229' # 0x00ef -> INTERSECTION
'\u2261' # 0x00f0 -> IDENTICAL TO
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u2265' # 0x00f2 -> GREATER-THAN OR EQUAL TO
'\u2264' # 0x00f3 -> LESS-THAN OR EQUAL TO
'\u2320' # 0x00f4 -> TOP HALF INTEGRAL
'\u2321' # 0x00f5 -> BOTTOM HALF INTEGRAL
'\xf7' # 0x00f6 -> DIVISION SIGN
'\u2248' # 0x00f7 -> ALMOST EQUAL TO
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u207f' # 0x00fc -> SUPERSCRIPT LATIN SMALL LETTER N
'\xb2' # 0x00fd -> SUPERSCRIPT TWO
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
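# Illustrative sketch (not part of the generated file): decoding_table is a
# 256-character string indexed by byte value; it is the faster mapping form
# that Codec.decode above actually uses.
if __name__ == '__main__':
    text, consumed = codecs.charmap_decode(b'\x9b\x9d', 'strict', decoding_table)
    assert text == '\xf8\xd8' and consumed == 2   # CP865 0x9B/0x9D -> o/O with stroke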
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a1: 0x00ad, # INVERTED EXCLAMATION MARK
0x00a3: 0x009c, # POUND SIGN
0x00a4: 0x00af, # CURRENCY SIGN
0x00aa: 0x00a6, # FEMININE ORDINAL INDICATOR
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x00aa, # NOT SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x00fd, # SUPERSCRIPT TWO
0x00b5: 0x00e6, # MICRO SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x00ba: 0x00a7, # MASCULINE ORDINAL INDICATOR
0x00bc: 0x00ac, # VULGAR FRACTION ONE QUARTER
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x00bf: 0x00a8, # INVERTED QUESTION MARK
0x00c4: 0x008e, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c5: 0x008f, # LATIN CAPITAL LETTER A WITH RING ABOVE
0x00c6: 0x0092, # LATIN CAPITAL LIGATURE AE
0x00c7: 0x0080, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0090, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d1: 0x00a5, # LATIN CAPITAL LETTER N WITH TILDE
0x00d6: 0x0099, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00d8: 0x009d, # LATIN CAPITAL LETTER O WITH STROKE
0x00dc: 0x009a, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00df: 0x00e1, # LATIN SMALL LETTER SHARP S
0x00e0: 0x0085, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x00a0, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0083, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x0084, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e5: 0x0086, # LATIN SMALL LETTER A WITH RING ABOVE
0x00e6: 0x0091, # LATIN SMALL LIGATURE AE
0x00e7: 0x0087, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008a, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x0082, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0088, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0089, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ec: 0x008d, # LATIN SMALL LETTER I WITH GRAVE
0x00ed: 0x00a1, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x008c, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x008b, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x00a4, # LATIN SMALL LETTER N WITH TILDE
0x00f2: 0x0095, # LATIN SMALL LETTER O WITH GRAVE
0x00f3: 0x00a2, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0093, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x0094, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x00f6, # DIVISION SIGN
0x00f8: 0x009b, # LATIN SMALL LETTER O WITH STROKE
0x00f9: 0x0097, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x00a3, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x0096, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x0081, # LATIN SMALL LETTER U WITH DIAERESIS
0x00ff: 0x0098, # LATIN SMALL LETTER Y WITH DIAERESIS
0x0192: 0x009f, # LATIN SMALL LETTER F WITH HOOK
0x0393: 0x00e2, # GREEK CAPITAL LETTER GAMMA
0x0398: 0x00e9, # GREEK CAPITAL LETTER THETA
0x03a3: 0x00e4, # GREEK CAPITAL LETTER SIGMA
0x03a6: 0x00e8, # GREEK CAPITAL LETTER PHI
0x03a9: 0x00ea, # GREEK CAPITAL LETTER OMEGA
0x03b1: 0x00e0, # GREEK SMALL LETTER ALPHA
0x03b4: 0x00eb, # GREEK SMALL LETTER DELTA
0x03b5: 0x00ee, # GREEK SMALL LETTER EPSILON
0x03c0: 0x00e3, # GREEK SMALL LETTER PI
0x03c3: 0x00e5, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00e7, # GREEK SMALL LETTER TAU
0x03c6: 0x00ed, # GREEK SMALL LETTER PHI
0x207f: 0x00fc, # SUPERSCRIPT LATIN SMALL LETTER N
0x20a7: 0x009e, # PESETA SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x221e: 0x00ec, # INFINITY
0x2229: 0x00ef, # INTERSECTION
0x2248: 0x00f7, # ALMOST EQUAL TO
0x2261: 0x00f0, # IDENTICAL TO
0x2264: 0x00f3, # LESS-THAN OR EQUAL TO
0x2265: 0x00f2, # GREATER-THAN OR EQUAL TO
0x2310: 0x00a9, # REVERSED NOT SIGN
0x2320: 0x00f4, # TOP HALF INTEGRAL
0x2321: 0x00f5, # BOTTOM HALF INTEGRAL
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
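# Illustrative sketch (not part of the generated file): how the encodings
# package consumes getregentry(). A search function returning the CodecInfo
# is registered with codecs.register; the alias 'demo_cp865' is hypothetical.
if __name__ == '__main__':
    codecs.register(lambda name: getregentry() if name == 'demo_cp865' else None)
    assert '\xd8l'.encode('demo_cp865') == b'\x9dl'   # O WITH STROKE -> 0x9D
    assert b'\x9dl'.decode('demo_cp865') == '\xd8l'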
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP866.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)
    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp866',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x0410, # CYRILLIC CAPITAL LETTER A
0x0081: 0x0411, # CYRILLIC CAPITAL LETTER BE
0x0082: 0x0412, # CYRILLIC CAPITAL LETTER VE
0x0083: 0x0413, # CYRILLIC CAPITAL LETTER GHE
0x0084: 0x0414, # CYRILLIC CAPITAL LETTER DE
0x0085: 0x0415, # CYRILLIC CAPITAL LETTER IE
0x0086: 0x0416, # CYRILLIC CAPITAL LETTER ZHE
0x0087: 0x0417, # CYRILLIC CAPITAL LETTER ZE
0x0088: 0x0418, # CYRILLIC CAPITAL LETTER I
0x0089: 0x0419, # CYRILLIC CAPITAL LETTER SHORT I
0x008a: 0x041a, # CYRILLIC CAPITAL LETTER KA
0x008b: 0x041b, # CYRILLIC CAPITAL LETTER EL
0x008c: 0x041c, # CYRILLIC CAPITAL LETTER EM
0x008d: 0x041d, # CYRILLIC CAPITAL LETTER EN
0x008e: 0x041e, # CYRILLIC CAPITAL LETTER O
0x008f: 0x041f, # CYRILLIC CAPITAL LETTER PE
0x0090: 0x0420, # CYRILLIC CAPITAL LETTER ER
0x0091: 0x0421, # CYRILLIC CAPITAL LETTER ES
0x0092: 0x0422, # CYRILLIC CAPITAL LETTER TE
0x0093: 0x0423, # CYRILLIC CAPITAL LETTER U
0x0094: 0x0424, # CYRILLIC CAPITAL LETTER EF
0x0095: 0x0425, # CYRILLIC CAPITAL LETTER HA
0x0096: 0x0426, # CYRILLIC CAPITAL LETTER TSE
0x0097: 0x0427, # CYRILLIC CAPITAL LETTER CHE
0x0098: 0x0428, # CYRILLIC CAPITAL LETTER SHA
0x0099: 0x0429, # CYRILLIC CAPITAL LETTER SHCHA
0x009a: 0x042a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x009b: 0x042b, # CYRILLIC CAPITAL LETTER YERU
0x009c: 0x042c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x009d: 0x042d, # CYRILLIC CAPITAL LETTER E
0x009e: 0x042e, # CYRILLIC CAPITAL LETTER YU
0x009f: 0x042f, # CYRILLIC CAPITAL LETTER YA
0x00a0: 0x0430, # CYRILLIC SMALL LETTER A
0x00a1: 0x0431, # CYRILLIC SMALL LETTER BE
0x00a2: 0x0432, # CYRILLIC SMALL LETTER VE
0x00a3: 0x0433, # CYRILLIC SMALL LETTER GHE
0x00a4: 0x0434, # CYRILLIC SMALL LETTER DE
0x00a5: 0x0435, # CYRILLIC SMALL LETTER IE
0x00a6: 0x0436, # CYRILLIC SMALL LETTER ZHE
0x00a7: 0x0437, # CYRILLIC SMALL LETTER ZE
0x00a8: 0x0438, # CYRILLIC SMALL LETTER I
0x00a9: 0x0439, # CYRILLIC SMALL LETTER SHORT I
0x00aa: 0x043a, # CYRILLIC SMALL LETTER KA
0x00ab: 0x043b, # CYRILLIC SMALL LETTER EL
0x00ac: 0x043c, # CYRILLIC SMALL LETTER EM
0x00ad: 0x043d, # CYRILLIC SMALL LETTER EN
0x00ae: 0x043e, # CYRILLIC SMALL LETTER O
0x00af: 0x043f, # CYRILLIC SMALL LETTER PE
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x2561, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x00b6: 0x2562, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x00b7: 0x2556, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x00b8: 0x2555, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x255c, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x00be: 0x255b, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x255e, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x00c7: 0x255f, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x2567, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x00d0: 0x2568, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x00d1: 0x2564, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x00d2: 0x2565, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x00d3: 0x2559, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x00d4: 0x2558, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x00d5: 0x2552, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x00d6: 0x2553, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x00d7: 0x256b, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x00d8: 0x256a, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x258c, # LEFT HALF BLOCK
0x00de: 0x2590, # RIGHT HALF BLOCK
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x0440, # CYRILLIC SMALL LETTER ER
0x00e1: 0x0441, # CYRILLIC SMALL LETTER ES
0x00e2: 0x0442, # CYRILLIC SMALL LETTER TE
0x00e3: 0x0443, # CYRILLIC SMALL LETTER U
0x00e4: 0x0444, # CYRILLIC SMALL LETTER EF
0x00e5: 0x0445, # CYRILLIC SMALL LETTER HA
0x00e6: 0x0446, # CYRILLIC SMALL LETTER TSE
0x00e7: 0x0447, # CYRILLIC SMALL LETTER CHE
0x00e8: 0x0448, # CYRILLIC SMALL LETTER SHA
0x00e9: 0x0449, # CYRILLIC SMALL LETTER SHCHA
0x00ea: 0x044a, # CYRILLIC SMALL LETTER HARD SIGN
0x00eb: 0x044b, # CYRILLIC SMALL LETTER YERU
0x00ec: 0x044c, # CYRILLIC SMALL LETTER SOFT SIGN
0x00ed: 0x044d, # CYRILLIC SMALL LETTER E
0x00ee: 0x044e, # CYRILLIC SMALL LETTER YU
0x00ef: 0x044f, # CYRILLIC SMALL LETTER YA
0x00f0: 0x0401, # CYRILLIC CAPITAL LETTER IO
0x00f1: 0x0451, # CYRILLIC SMALL LETTER IO
0x00f2: 0x0404, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x00f3: 0x0454, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x00f4: 0x0407, # CYRILLIC CAPITAL LETTER YI
0x00f5: 0x0457, # CYRILLIC SMALL LETTER YI
0x00f6: 0x040e, # CYRILLIC CAPITAL LETTER SHORT U
0x00f7: 0x045e, # CYRILLIC SMALL LETTER SHORT U
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x2219, # BULLET OPERATOR
0x00fa: 0x00b7, # MIDDLE DOT
0x00fb: 0x221a, # SQUARE ROOT
0x00fc: 0x2116, # NUMERO SIGN
0x00fd: 0x00a4, # CURRENCY SIGN
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\u0410' # 0x0080 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0x0081 -> CYRILLIC CAPITAL LETTER BE
'\u0412' # 0x0082 -> CYRILLIC CAPITAL LETTER VE
'\u0413' # 0x0083 -> CYRILLIC CAPITAL LETTER GHE
'\u0414' # 0x0084 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0x0085 -> CYRILLIC CAPITAL LETTER IE
'\u0416' # 0x0086 -> CYRILLIC CAPITAL LETTER ZHE
'\u0417' # 0x0087 -> CYRILLIC CAPITAL LETTER ZE
'\u0418' # 0x0088 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0x0089 -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0x008a -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0x008b -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0x008c -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0x008d -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0x008e -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0x008f -> CYRILLIC CAPITAL LETTER PE
'\u0420' # 0x0090 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0x0091 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0x0092 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0x0093 -> CYRILLIC CAPITAL LETTER U
'\u0424' # 0x0094 -> CYRILLIC CAPITAL LETTER EF
'\u0425' # 0x0095 -> CYRILLIC CAPITAL LETTER HA
'\u0426' # 0x0096 -> CYRILLIC CAPITAL LETTER TSE
'\u0427' # 0x0097 -> CYRILLIC CAPITAL LETTER CHE
'\u0428' # 0x0098 -> CYRILLIC CAPITAL LETTER SHA
'\u0429' # 0x0099 -> CYRILLIC CAPITAL LETTER SHCHA
'\u042a' # 0x009a -> CYRILLIC CAPITAL LETTER HARD SIGN
'\u042b' # 0x009b -> CYRILLIC CAPITAL LETTER YERU
'\u042c' # 0x009c -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042d' # 0x009d -> CYRILLIC CAPITAL LETTER E
'\u042e' # 0x009e -> CYRILLIC CAPITAL LETTER YU
'\u042f' # 0x009f -> CYRILLIC CAPITAL LETTER YA
'\u0430' # 0x00a0 -> CYRILLIC SMALL LETTER A
'\u0431' # 0x00a1 -> CYRILLIC SMALL LETTER BE
'\u0432' # 0x00a2 -> CYRILLIC SMALL LETTER VE
'\u0433' # 0x00a3 -> CYRILLIC SMALL LETTER GHE
'\u0434' # 0x00a4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0x00a5 -> CYRILLIC SMALL LETTER IE
'\u0436' # 0x00a6 -> CYRILLIC SMALL LETTER ZHE
'\u0437' # 0x00a7 -> CYRILLIC SMALL LETTER ZE
'\u0438' # 0x00a8 -> CYRILLIC SMALL LETTER I
'\u0439' # 0x00a9 -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0x00aa -> CYRILLIC SMALL LETTER KA
'\u043b' # 0x00ab -> CYRILLIC SMALL LETTER EL
'\u043c' # 0x00ac -> CYRILLIC SMALL LETTER EM
'\u043d' # 0x00ad -> CYRILLIC SMALL LETTER EN
'\u043e' # 0x00ae -> CYRILLIC SMALL LETTER O
'\u043f' # 0x00af -> CYRILLIC SMALL LETTER PE
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u2561' # 0x00b5 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u2562' # 0x00b6 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2556' # 0x00b7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2555' # 0x00b8 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255c' # 0x00bd -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255b' # 0x00be -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u255e' # 0x00c6 -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0x00c7 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u2567' # 0x00cf -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0x00d0 -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2564' # 0x00d1 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0x00d2 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2559' # 0x00d3 -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u2558' # 0x00d4 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2552' # 0x00d5 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u2553' # 0x00d6 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u256b' # 0x00d7 -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256a' # 0x00d8 -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u258c' # 0x00dd -> LEFT HALF BLOCK
'\u2590' # 0x00de -> RIGHT HALF BLOCK
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u0440' # 0x00e0 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0x00e1 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0x00e2 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0x00e3 -> CYRILLIC SMALL LETTER U
'\u0444' # 0x00e4 -> CYRILLIC SMALL LETTER EF
'\u0445' # 0x00e5 -> CYRILLIC SMALL LETTER HA
'\u0446' # 0x00e6 -> CYRILLIC SMALL LETTER TSE
'\u0447' # 0x00e7 -> CYRILLIC SMALL LETTER CHE
'\u0448' # 0x00e8 -> CYRILLIC SMALL LETTER SHA
'\u0449' # 0x00e9 -> CYRILLIC SMALL LETTER SHCHA
'\u044a' # 0x00ea -> CYRILLIC SMALL LETTER HARD SIGN
'\u044b' # 0x00eb -> CYRILLIC SMALL LETTER YERU
'\u044c' # 0x00ec -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044d' # 0x00ed -> CYRILLIC SMALL LETTER E
'\u044e' # 0x00ee -> CYRILLIC SMALL LETTER YU
'\u044f' # 0x00ef -> CYRILLIC SMALL LETTER YA
'\u0401' # 0x00f0 -> CYRILLIC CAPITAL LETTER IO
'\u0451' # 0x00f1 -> CYRILLIC SMALL LETTER IO
'\u0404' # 0x00f2 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
'\u0454' # 0x00f3 -> CYRILLIC SMALL LETTER UKRAINIAN IE
'\u0407' # 0x00f4 -> CYRILLIC CAPITAL LETTER YI
'\u0457' # 0x00f5 -> CYRILLIC SMALL LETTER YI
'\u040e' # 0x00f6 -> CYRILLIC CAPITAL LETTER SHORT U
'\u045e' # 0x00f7 -> CYRILLIC SMALL LETTER SHORT U
'\xb0' # 0x00f8 -> DEGREE SIGN
'\u2219' # 0x00f9 -> BULLET OPERATOR
'\xb7' # 0x00fa -> MIDDLE DOT
'\u221a' # 0x00fb -> SQUARE ROOT
'\u2116' # 0x00fc -> NUMERO SIGN
'\xa4' # 0x00fd -> CURRENCY SIGN
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
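# Illustrative sketch (not part of the generated file): the incremental
# decoder defined above can consume a byte stream chunk by chunk; charmap
# codecs are stateless, so each chunk decodes independently.
if __name__ == '__main__':
    dec = IncrementalDecoder('strict')
    assert dec.decode(b'\x80') + dec.decode(b'\xaf', final=True) == '\u0410\u043f'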
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a4: 0x00fd, # CURRENCY SIGN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b7: 0x00fa, # MIDDLE DOT
0x0401: 0x00f0, # CYRILLIC CAPITAL LETTER IO
0x0404: 0x00f2, # CYRILLIC CAPITAL LETTER UKRAINIAN IE
0x0407: 0x00f4, # CYRILLIC CAPITAL LETTER YI
0x040e: 0x00f6, # CYRILLIC CAPITAL LETTER SHORT U
0x0410: 0x0080, # CYRILLIC CAPITAL LETTER A
0x0411: 0x0081, # CYRILLIC CAPITAL LETTER BE
0x0412: 0x0082, # CYRILLIC CAPITAL LETTER VE
0x0413: 0x0083, # CYRILLIC CAPITAL LETTER GHE
0x0414: 0x0084, # CYRILLIC CAPITAL LETTER DE
0x0415: 0x0085, # CYRILLIC CAPITAL LETTER IE
0x0416: 0x0086, # CYRILLIC CAPITAL LETTER ZHE
0x0417: 0x0087, # CYRILLIC CAPITAL LETTER ZE
0x0418: 0x0088, # CYRILLIC CAPITAL LETTER I
0x0419: 0x0089, # CYRILLIC CAPITAL LETTER SHORT I
0x041a: 0x008a, # CYRILLIC CAPITAL LETTER KA
0x041b: 0x008b, # CYRILLIC CAPITAL LETTER EL
0x041c: 0x008c, # CYRILLIC CAPITAL LETTER EM
0x041d: 0x008d, # CYRILLIC CAPITAL LETTER EN
0x041e: 0x008e, # CYRILLIC CAPITAL LETTER O
0x041f: 0x008f, # CYRILLIC CAPITAL LETTER PE
0x0420: 0x0090, # CYRILLIC CAPITAL LETTER ER
0x0421: 0x0091, # CYRILLIC CAPITAL LETTER ES
0x0422: 0x0092, # CYRILLIC CAPITAL LETTER TE
0x0423: 0x0093, # CYRILLIC CAPITAL LETTER U
0x0424: 0x0094, # CYRILLIC CAPITAL LETTER EF
0x0425: 0x0095, # CYRILLIC CAPITAL LETTER HA
0x0426: 0x0096, # CYRILLIC CAPITAL LETTER TSE
0x0427: 0x0097, # CYRILLIC CAPITAL LETTER CHE
0x0428: 0x0098, # CYRILLIC CAPITAL LETTER SHA
0x0429: 0x0099, # CYRILLIC CAPITAL LETTER SHCHA
0x042a: 0x009a, # CYRILLIC CAPITAL LETTER HARD SIGN
0x042b: 0x009b, # CYRILLIC CAPITAL LETTER YERU
0x042c: 0x009c, # CYRILLIC CAPITAL LETTER SOFT SIGN
0x042d: 0x009d, # CYRILLIC CAPITAL LETTER E
0x042e: 0x009e, # CYRILLIC CAPITAL LETTER YU
0x042f: 0x009f, # CYRILLIC CAPITAL LETTER YA
0x0430: 0x00a0, # CYRILLIC SMALL LETTER A
0x0431: 0x00a1, # CYRILLIC SMALL LETTER BE
0x0432: 0x00a2, # CYRILLIC SMALL LETTER VE
0x0433: 0x00a3, # CYRILLIC SMALL LETTER GHE
0x0434: 0x00a4, # CYRILLIC SMALL LETTER DE
0x0435: 0x00a5, # CYRILLIC SMALL LETTER IE
0x0436: 0x00a6, # CYRILLIC SMALL LETTER ZHE
0x0437: 0x00a7, # CYRILLIC SMALL LETTER ZE
0x0438: 0x00a8, # CYRILLIC SMALL LETTER I
0x0439: 0x00a9, # CYRILLIC SMALL LETTER SHORT I
0x043a: 0x00aa, # CYRILLIC SMALL LETTER KA
0x043b: 0x00ab, # CYRILLIC SMALL LETTER EL
0x043c: 0x00ac, # CYRILLIC SMALL LETTER EM
0x043d: 0x00ad, # CYRILLIC SMALL LETTER EN
0x043e: 0x00ae, # CYRILLIC SMALL LETTER O
0x043f: 0x00af, # CYRILLIC SMALL LETTER PE
0x0440: 0x00e0, # CYRILLIC SMALL LETTER ER
0x0441: 0x00e1, # CYRILLIC SMALL LETTER ES
0x0442: 0x00e2, # CYRILLIC SMALL LETTER TE
0x0443: 0x00e3, # CYRILLIC SMALL LETTER U
0x0444: 0x00e4, # CYRILLIC SMALL LETTER EF
0x0445: 0x00e5, # CYRILLIC SMALL LETTER HA
0x0446: 0x00e6, # CYRILLIC SMALL LETTER TSE
0x0447: 0x00e7, # CYRILLIC SMALL LETTER CHE
0x0448: 0x00e8, # CYRILLIC SMALL LETTER SHA
0x0449: 0x00e9, # CYRILLIC SMALL LETTER SHCHA
0x044a: 0x00ea, # CYRILLIC SMALL LETTER HARD SIGN
0x044b: 0x00eb, # CYRILLIC SMALL LETTER YERU
0x044c: 0x00ec, # CYRILLIC SMALL LETTER SOFT SIGN
0x044d: 0x00ed, # CYRILLIC SMALL LETTER E
0x044e: 0x00ee, # CYRILLIC SMALL LETTER YU
0x044f: 0x00ef, # CYRILLIC SMALL LETTER YA
0x0451: 0x00f1, # CYRILLIC SMALL LETTER IO
0x0454: 0x00f3, # CYRILLIC SMALL LETTER UKRAINIAN IE
0x0457: 0x00f5, # CYRILLIC SMALL LETTER YI
0x045e: 0x00f7, # CYRILLIC SMALL LETTER SHORT U
0x2116: 0x00fc, # NUMERO SIGN
0x2219: 0x00f9, # BULLET OPERATOR
0x221a: 0x00fb, # SQUARE ROOT
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2552: 0x00d5, # BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
0x2553: 0x00d6, # BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2555: 0x00b8, # BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
0x2556: 0x00b7, # BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x2558: 0x00d4, # BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
0x2559: 0x00d3, # BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255b: 0x00be, # BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
0x255c: 0x00bd, # BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x255e: 0x00c6, # BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
0x255f: 0x00c7, # BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2561: 0x00b5, # BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
0x2562: 0x00b6, # BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2564: 0x00d1, # BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
0x2565: 0x00d2, # BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2567: 0x00cf, # BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
0x2568: 0x00d0, # BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256a: 0x00d8, # BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
0x256b: 0x00d7, # BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x258c: 0x00dd, # LEFT HALF BLOCK
0x2590: 0x00de, # RIGHT HALF BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
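# Illustrative sketch (not part of the generated file): round-trip through
# the stateless Codec, plus 'replace' error handling for a character that
# has no CP866 mapping (it is substituted with '?').
if __name__ == '__main__':
    data, _ = Codec().encode('\u041f\u0440\u0438\u0432\u0435\u0442')
    assert Codec().decode(data)[0] == '\u041f\u0440\u0438\u0432\u0435\u0442'
    assert Codec().encode('\xe9', 'replace')[0] == b'?'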
""" Python Character Mapping Codec generated from 'VENDORS/MICSFT/PC/CP869.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_map)
    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp869',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
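# Illustrative sketch (not part of the generated file): in the CP869 decoding
# map below, bytes with no assigned character map to None; under 'strict'
# error handling those positions raise UnicodeDecodeError. The two-entry map
# here is hypothetical and only mirrors that convention.
if __name__ == '__main__':
    _demo_map = {0x80: None, 0x86: 0x0386}   # hypothetical subset of the map below
    assert codecs.charmap_decode(b'\x86', 'strict', _demo_map)[0] == '\u0386'
    try:
        codecs.charmap_decode(b'\x80', 'strict', _demo_map)
    except UnicodeDecodeError:
        pass   # 0x80 is undefined, as in the full map below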
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: None, # UNDEFINED
0x0081: None, # UNDEFINED
0x0082: None, # UNDEFINED
0x0083: None, # UNDEFINED
0x0084: None, # UNDEFINED
0x0085: None, # UNDEFINED
0x0086: 0x0386, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x0087: None, # UNDEFINED
0x0088: 0x00b7, # MIDDLE DOT
0x0089: 0x00ac, # NOT SIGN
0x008a: 0x00a6, # BROKEN BAR
0x008b: 0x2018, # LEFT SINGLE QUOTATION MARK
0x008c: 0x2019, # RIGHT SINGLE QUOTATION MARK
0x008d: 0x0388, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x008e: 0x2015, # HORIZONTAL BAR
0x008f: 0x0389, # GREEK CAPITAL LETTER ETA WITH TONOS
0x0090: 0x038a, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x0091: 0x03aa, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x0092: 0x038c, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x0093: None, # UNDEFINED
0x0094: None, # UNDEFINED
0x0095: 0x038e, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x0096: 0x03ab, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x0097: 0x00a9, # COPYRIGHT SIGN
0x0098: 0x038f, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x0099: 0x00b2, # SUPERSCRIPT TWO
0x009a: 0x00b3, # SUPERSCRIPT THREE
0x009b: 0x03ac, # GREEK SMALL LETTER ALPHA WITH TONOS
0x009c: 0x00a3, # POUND SIGN
0x009d: 0x03ad, # GREEK SMALL LETTER EPSILON WITH TONOS
0x009e: 0x03ae, # GREEK SMALL LETTER ETA WITH TONOS
0x009f: 0x03af, # GREEK SMALL LETTER IOTA WITH TONOS
0x00a0: 0x03ca, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x00a1: 0x0390, # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0x00a2: 0x03cc, # GREEK SMALL LETTER OMICRON WITH TONOS
0x00a3: 0x03cd, # GREEK SMALL LETTER UPSILON WITH TONOS
0x00a4: 0x0391, # GREEK CAPITAL LETTER ALPHA
0x00a5: 0x0392, # GREEK CAPITAL LETTER BETA
0x00a6: 0x0393, # GREEK CAPITAL LETTER GAMMA
0x00a7: 0x0394, # GREEK CAPITAL LETTER DELTA
0x00a8: 0x0395, # GREEK CAPITAL LETTER EPSILON
0x00a9: 0x0396, # GREEK CAPITAL LETTER ZETA
0x00aa: 0x0397, # GREEK CAPITAL LETTER ETA
0x00ab: 0x00bd, # VULGAR FRACTION ONE HALF
0x00ac: 0x0398, # GREEK CAPITAL LETTER THETA
0x00ad: 0x0399, # GREEK CAPITAL LETTER IOTA
0x00ae: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00af: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00b0: 0x2591, # LIGHT SHADE
0x00b1: 0x2592, # MEDIUM SHADE
0x00b2: 0x2593, # DARK SHADE
0x00b3: 0x2502, # BOX DRAWINGS LIGHT VERTICAL
0x00b4: 0x2524, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x00b5: 0x039a, # GREEK CAPITAL LETTER KAPPA
0x00b6: 0x039b, # GREEK CAPITAL LETTER LAMDA
0x00b7: 0x039c, # GREEK CAPITAL LETTER MU
0x00b8: 0x039d, # GREEK CAPITAL LETTER NU
0x00b9: 0x2563, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x00ba: 0x2551, # BOX DRAWINGS DOUBLE VERTICAL
0x00bb: 0x2557, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x00bc: 0x255d, # BOX DRAWINGS DOUBLE UP AND LEFT
0x00bd: 0x039e, # GREEK CAPITAL LETTER XI
0x00be: 0x039f, # GREEK CAPITAL LETTER OMICRON
0x00bf: 0x2510, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x00c0: 0x2514, # BOX DRAWINGS LIGHT UP AND RIGHT
0x00c1: 0x2534, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x00c2: 0x252c, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x00c3: 0x251c, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x00c4: 0x2500, # BOX DRAWINGS LIGHT HORIZONTAL
0x00c5: 0x253c, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x00c6: 0x03a0, # GREEK CAPITAL LETTER PI
0x00c7: 0x03a1, # GREEK CAPITAL LETTER RHO
0x00c8: 0x255a, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x00c9: 0x2554, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x00ca: 0x2569, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x00cb: 0x2566, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x00cc: 0x2560, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x00cd: 0x2550, # BOX DRAWINGS DOUBLE HORIZONTAL
0x00ce: 0x256c, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x00cf: 0x03a3, # GREEK CAPITAL LETTER SIGMA
0x00d0: 0x03a4, # GREEK CAPITAL LETTER TAU
0x00d1: 0x03a5, # GREEK CAPITAL LETTER UPSILON
0x00d2: 0x03a6, # GREEK CAPITAL LETTER PHI
0x00d3: 0x03a7, # GREEK CAPITAL LETTER CHI
0x00d4: 0x03a8, # GREEK CAPITAL LETTER PSI
0x00d5: 0x03a9, # GREEK CAPITAL LETTER OMEGA
0x00d6: 0x03b1, # GREEK SMALL LETTER ALPHA
0x00d7: 0x03b2, # GREEK SMALL LETTER BETA
0x00d8: 0x03b3, # GREEK SMALL LETTER GAMMA
0x00d9: 0x2518, # BOX DRAWINGS LIGHT UP AND LEFT
0x00da: 0x250c, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x00db: 0x2588, # FULL BLOCK
0x00dc: 0x2584, # LOWER HALF BLOCK
0x00dd: 0x03b4, # GREEK SMALL LETTER DELTA
0x00de: 0x03b5, # GREEK SMALL LETTER EPSILON
0x00df: 0x2580, # UPPER HALF BLOCK
0x00e0: 0x03b6, # GREEK SMALL LETTER ZETA
0x00e1: 0x03b7, # GREEK SMALL LETTER ETA
0x00e2: 0x03b8, # GREEK SMALL LETTER THETA
0x00e3: 0x03b9, # GREEK SMALL LETTER IOTA
0x00e4: 0x03ba, # GREEK SMALL LETTER KAPPA
0x00e5: 0x03bb, # GREEK SMALL LETTER LAMDA
0x00e6: 0x03bc, # GREEK SMALL LETTER MU
0x00e7: 0x03bd, # GREEK SMALL LETTER NU
0x00e8: 0x03be, # GREEK SMALL LETTER XI
0x00e9: 0x03bf, # GREEK SMALL LETTER OMICRON
0x00ea: 0x03c0, # GREEK SMALL LETTER PI
0x00eb: 0x03c1, # GREEK SMALL LETTER RHO
0x00ec: 0x03c3, # GREEK SMALL LETTER SIGMA
0x00ed: 0x03c2, # GREEK SMALL LETTER FINAL SIGMA
0x00ee: 0x03c4, # GREEK SMALL LETTER TAU
0x00ef: 0x0384, # GREEK TONOS
0x00f0: 0x00ad, # SOFT HYPHEN
0x00f1: 0x00b1, # PLUS-MINUS SIGN
0x00f2: 0x03c5, # GREEK SMALL LETTER UPSILON
0x00f3: 0x03c6, # GREEK SMALL LETTER PHI
0x00f4: 0x03c7, # GREEK SMALL LETTER CHI
0x00f5: 0x00a7, # SECTION SIGN
0x00f6: 0x03c8, # GREEK SMALL LETTER PSI
0x00f7: 0x0385, # GREEK DIALYTIKA TONOS
0x00f8: 0x00b0, # DEGREE SIGN
0x00f9: 0x00a8, # DIAERESIS
0x00fa: 0x03c9, # GREEK SMALL LETTER OMEGA
0x00fb: 0x03cb, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x00fc: 0x03b0, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0x00fd: 0x03ce, # GREEK SMALL LETTER OMEGA WITH TONOS
0x00fe: 0x25a0, # BLACK SQUARE
0x00ff: 0x00a0, # NO-BREAK SPACE
})
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> NULL
'\x01' # 0x0001 -> START OF HEADING
'\x02' # 0x0002 -> START OF TEXT
'\x03' # 0x0003 -> END OF TEXT
'\x04' # 0x0004 -> END OF TRANSMISSION
'\x05' # 0x0005 -> ENQUIRY
'\x06' # 0x0006 -> ACKNOWLEDGE
'\x07' # 0x0007 -> BELL
'\x08' # 0x0008 -> BACKSPACE
'\t' # 0x0009 -> HORIZONTAL TABULATION
'\n' # 0x000a -> LINE FEED
'\x0b' # 0x000b -> VERTICAL TABULATION
'\x0c' # 0x000c -> FORM FEED
'\r' # 0x000d -> CARRIAGE RETURN
'\x0e' # 0x000e -> SHIFT OUT
'\x0f' # 0x000f -> SHIFT IN
'\x10' # 0x0010 -> DATA LINK ESCAPE
'\x11' # 0x0011 -> DEVICE CONTROL ONE
'\x12' # 0x0012 -> DEVICE CONTROL TWO
'\x13' # 0x0013 -> DEVICE CONTROL THREE
'\x14' # 0x0014 -> DEVICE CONTROL FOUR
'\x15' # 0x0015 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x0016 -> SYNCHRONOUS IDLE
'\x17' # 0x0017 -> END OF TRANSMISSION BLOCK
'\x18' # 0x0018 -> CANCEL
'\x19' # 0x0019 -> END OF MEDIUM
'\x1a' # 0x001a -> SUBSTITUTE
'\x1b' # 0x001b -> ESCAPE
'\x1c' # 0x001c -> FILE SEPARATOR
'\x1d' # 0x001d -> GROUP SEPARATOR
'\x1e' # 0x001e -> RECORD SEPARATOR
'\x1f' # 0x001f -> UNIT SEPARATOR
' ' # 0x0020 -> SPACE
'!' # 0x0021 -> EXCLAMATION MARK
'"' # 0x0022 -> QUOTATION MARK
'#' # 0x0023 -> NUMBER SIGN
'$' # 0x0024 -> DOLLAR SIGN
'%' # 0x0025 -> PERCENT SIGN
'&' # 0x0026 -> AMPERSAND
"'" # 0x0027 -> APOSTROPHE
'(' # 0x0028 -> LEFT PARENTHESIS
')' # 0x0029 -> RIGHT PARENTHESIS
'*' # 0x002a -> ASTERISK
'+' # 0x002b -> PLUS SIGN
',' # 0x002c -> COMMA
'-' # 0x002d -> HYPHEN-MINUS
'.' # 0x002e -> FULL STOP
'/' # 0x002f -> SOLIDUS
'0' # 0x0030 -> DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE
'2' # 0x0032 -> DIGIT TWO
'3' # 0x0033 -> DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE
':' # 0x003a -> COLON
';' # 0x003b -> SEMICOLON
'<' # 0x003c -> LESS-THAN SIGN
'=' # 0x003d -> EQUALS SIGN
'>' # 0x003e -> GREATER-THAN SIGN
'?' # 0x003f -> QUESTION MARK
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET
'\\' # 0x005c -> REVERSE SOLIDUS
']' # 0x005d -> RIGHT SQUARE BRACKET
'^' # 0x005e -> CIRCUMFLEX ACCENT
'_' # 0x005f -> LOW LINE
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET
'|' # 0x007c -> VERTICAL LINE
'}' # 0x007d -> RIGHT CURLY BRACKET
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> DELETE
'\ufffe' # 0x0080 -> UNDEFINED
'\ufffe' # 0x0081 -> UNDEFINED
'\ufffe' # 0x0082 -> UNDEFINED
'\ufffe' # 0x0083 -> UNDEFINED
'\ufffe' # 0x0084 -> UNDEFINED
'\ufffe' # 0x0085 -> UNDEFINED
'\u0386' # 0x0086 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
'\ufffe' # 0x0087 -> UNDEFINED
'\xb7' # 0x0088 -> MIDDLE DOT
'\xac' # 0x0089 -> NOT SIGN
'\xa6' # 0x008a -> BROKEN BAR
'\u2018' # 0x008b -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x008c -> RIGHT SINGLE QUOTATION MARK
'\u0388' # 0x008d -> GREEK CAPITAL LETTER EPSILON WITH TONOS
'\u2015' # 0x008e -> HORIZONTAL BAR
'\u0389' # 0x008f -> GREEK CAPITAL LETTER ETA WITH TONOS
'\u038a' # 0x0090 -> GREEK CAPITAL LETTER IOTA WITH TONOS
'\u03aa' # 0x0091 -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
'\u038c' # 0x0092 -> GREEK CAPITAL LETTER OMICRON WITH TONOS
'\ufffe' # 0x0093 -> UNDEFINED
'\ufffe' # 0x0094 -> UNDEFINED
'\u038e' # 0x0095 -> GREEK CAPITAL LETTER UPSILON WITH TONOS
'\u03ab' # 0x0096 -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
'\xa9' # 0x0097 -> COPYRIGHT SIGN
'\u038f' # 0x0098 -> GREEK CAPITAL LETTER OMEGA WITH TONOS
'\xb2' # 0x0099 -> SUPERSCRIPT TWO
'\xb3' # 0x009a -> SUPERSCRIPT THREE
'\u03ac' # 0x009b -> GREEK SMALL LETTER ALPHA WITH TONOS
'\xa3' # 0x009c -> POUND SIGN
'\u03ad' # 0x009d -> GREEK SMALL LETTER EPSILON WITH TONOS
'\u03ae' # 0x009e -> GREEK SMALL LETTER ETA WITH TONOS
'\u03af' # 0x009f -> GREEK SMALL LETTER IOTA WITH TONOS
'\u03ca' # 0x00a0 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
'\u0390' # 0x00a1 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
'\u03cc' # 0x00a2 -> GREEK SMALL LETTER OMICRON WITH TONOS
'\u03cd' # 0x00a3 -> GREEK SMALL LETTER UPSILON WITH TONOS
'\u0391' # 0x00a4 -> GREEK CAPITAL LETTER ALPHA
'\u0392' # 0x00a5 -> GREEK CAPITAL LETTER BETA
'\u0393' # 0x00a6 -> GREEK CAPITAL LETTER GAMMA
'\u0394' # 0x00a7 -> GREEK CAPITAL LETTER DELTA
'\u0395' # 0x00a8 -> GREEK CAPITAL LETTER EPSILON
'\u0396' # 0x00a9 -> GREEK CAPITAL LETTER ZETA
'\u0397' # 0x00aa -> GREEK CAPITAL LETTER ETA
'\xbd' # 0x00ab -> VULGAR FRACTION ONE HALF
'\u0398' # 0x00ac -> GREEK CAPITAL LETTER THETA
'\u0399' # 0x00ad -> GREEK CAPITAL LETTER IOTA
'\xab' # 0x00ae -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0x00af -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2591' # 0x00b0 -> LIGHT SHADE
'\u2592' # 0x00b1 -> MEDIUM SHADE
'\u2593' # 0x00b2 -> DARK SHADE
'\u2502' # 0x00b3 -> BOX DRAWINGS LIGHT VERTICAL
'\u2524' # 0x00b4 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u039a' # 0x00b5 -> GREEK CAPITAL LETTER KAPPA
'\u039b' # 0x00b6 -> GREEK CAPITAL LETTER LAMDA
'\u039c' # 0x00b7 -> GREEK CAPITAL LETTER MU
'\u039d' # 0x00b8 -> GREEK CAPITAL LETTER NU
'\u2563' # 0x00b9 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2551' # 0x00ba -> BOX DRAWINGS DOUBLE VERTICAL
'\u2557' # 0x00bb -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u255d' # 0x00bc -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u039e' # 0x00bd -> GREEK CAPITAL LETTER XI
'\u039f' # 0x00be -> GREEK CAPITAL LETTER OMICRON
'\u2510' # 0x00bf -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x00c0 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2534' # 0x00c1 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u252c' # 0x00c2 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u251c' # 0x00c3 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2500' # 0x00c4 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u253c' # 0x00c5 -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u03a0' # 0x00c6 -> GREEK CAPITAL LETTER PI
'\u03a1' # 0x00c7 -> GREEK CAPITAL LETTER RHO
'\u255a' # 0x00c8 -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u2554' # 0x00c9 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2569' # 0x00ca -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u2566' # 0x00cb -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2560' # 0x00cc -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2550' # 0x00cd -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u256c' # 0x00ce -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\u03a3' # 0x00cf -> GREEK CAPITAL LETTER SIGMA
'\u03a4' # 0x00d0 -> GREEK CAPITAL LETTER TAU
'\u03a5' # 0x00d1 -> GREEK CAPITAL LETTER UPSILON
'\u03a6' # 0x00d2 -> GREEK CAPITAL LETTER PHI
'\u03a7' # 0x00d3 -> GREEK CAPITAL LETTER CHI
'\u03a8' # 0x00d4 -> GREEK CAPITAL LETTER PSI
'\u03a9' # 0x00d5 -> GREEK CAPITAL LETTER OMEGA
'\u03b1' # 0x00d6 -> GREEK SMALL LETTER ALPHA
'\u03b2' # 0x00d7 -> GREEK SMALL LETTER BETA
'\u03b3' # 0x00d8 -> GREEK SMALL LETTER GAMMA
'\u2518' # 0x00d9 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u250c' # 0x00da -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2588' # 0x00db -> FULL BLOCK
'\u2584' # 0x00dc -> LOWER HALF BLOCK
'\u03b4' # 0x00dd -> GREEK SMALL LETTER DELTA
'\u03b5' # 0x00de -> GREEK SMALL LETTER EPSILON
'\u2580' # 0x00df -> UPPER HALF BLOCK
'\u03b6' # 0x00e0 -> GREEK SMALL LETTER ZETA
'\u03b7' # 0x00e1 -> GREEK SMALL LETTER ETA
'\u03b8' # 0x00e2 -> GREEK SMALL LETTER THETA
'\u03b9' # 0x00e3 -> GREEK SMALL LETTER IOTA
'\u03ba' # 0x00e4 -> GREEK SMALL LETTER KAPPA
'\u03bb' # 0x00e5 -> GREEK SMALL LETTER LAMDA
'\u03bc' # 0x00e6 -> GREEK SMALL LETTER MU
'\u03bd' # 0x00e7 -> GREEK SMALL LETTER NU
'\u03be' # 0x00e8 -> GREEK SMALL LETTER XI
'\u03bf' # 0x00e9 -> GREEK SMALL LETTER OMICRON
'\u03c0' # 0x00ea -> GREEK SMALL LETTER PI
'\u03c1' # 0x00eb -> GREEK SMALL LETTER RHO
'\u03c3' # 0x00ec -> GREEK SMALL LETTER SIGMA
'\u03c2' # 0x00ed -> GREEK SMALL LETTER FINAL SIGMA
'\u03c4' # 0x00ee -> GREEK SMALL LETTER TAU
'\u0384' # 0x00ef -> GREEK TONOS
'\xad' # 0x00f0 -> SOFT HYPHEN
'\xb1' # 0x00f1 -> PLUS-MINUS SIGN
'\u03c5' # 0x00f2 -> GREEK SMALL LETTER UPSILON
'\u03c6' # 0x00f3 -> GREEK SMALL LETTER PHI
'\u03c7' # 0x00f4 -> GREEK SMALL LETTER CHI
'\xa7' # 0x00f5 -> SECTION SIGN
'\u03c8' # 0x00f6 -> GREEK SMALL LETTER PSI
'\u0385' # 0x00f7 -> GREEK DIALYTIKA TONOS
'\xb0' # 0x00f8 -> DEGREE SIGN
'\xa8' # 0x00f9 -> DIAERESIS
'\u03c9' # 0x00fa -> GREEK SMALL LETTER OMEGA
'\u03cb' # 0x00fb -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
'\u03b0' # 0x00fc -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
'\u03ce' # 0x00fd -> GREEK SMALL LETTER OMEGA WITH TONOS
'\u25a0' # 0x00fe -> BLACK SQUARE
'\xa0' # 0x00ff -> NO-BREAK SPACE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # NULL
0x0001: 0x0001, # START OF HEADING
0x0002: 0x0002, # START OF TEXT
0x0003: 0x0003, # END OF TEXT
0x0004: 0x0004, # END OF TRANSMISSION
0x0005: 0x0005, # ENQUIRY
0x0006: 0x0006, # ACKNOWLEDGE
0x0007: 0x0007, # BELL
0x0008: 0x0008, # BACKSPACE
0x0009: 0x0009, # HORIZONTAL TABULATION
0x000a: 0x000a, # LINE FEED
0x000b: 0x000b, # VERTICAL TABULATION
0x000c: 0x000c, # FORM FEED
0x000d: 0x000d, # CARRIAGE RETURN
0x000e: 0x000e, # SHIFT OUT
0x000f: 0x000f, # SHIFT IN
0x0010: 0x0010, # DATA LINK ESCAPE
0x0011: 0x0011, # DEVICE CONTROL ONE
0x0012: 0x0012, # DEVICE CONTROL TWO
0x0013: 0x0013, # DEVICE CONTROL THREE
0x0014: 0x0014, # DEVICE CONTROL FOUR
0x0015: 0x0015, # NEGATIVE ACKNOWLEDGE
0x0016: 0x0016, # SYNCHRONOUS IDLE
0x0017: 0x0017, # END OF TRANSMISSION BLOCK
0x0018: 0x0018, # CANCEL
0x0019: 0x0019, # END OF MEDIUM
0x001a: 0x001a, # SUBSTITUTE
0x001b: 0x001b, # ESCAPE
0x001c: 0x001c, # FILE SEPARATOR
0x001d: 0x001d, # GROUP SEPARATOR
0x001e: 0x001e, # RECORD SEPARATOR
0x001f: 0x001f, # UNIT SEPARATOR
0x0020: 0x0020, # SPACE
0x0021: 0x0021, # EXCLAMATION MARK
0x0022: 0x0022, # QUOTATION MARK
0x0023: 0x0023, # NUMBER SIGN
0x0024: 0x0024, # DOLLAR SIGN
0x0025: 0x0025, # PERCENT SIGN
0x0026: 0x0026, # AMPERSAND
0x0027: 0x0027, # APOSTROPHE
0x0028: 0x0028, # LEFT PARENTHESIS
0x0029: 0x0029, # RIGHT PARENTHESIS
0x002a: 0x002a, # ASTERISK
0x002b: 0x002b, # PLUS SIGN
0x002c: 0x002c, # COMMA
0x002d: 0x002d, # HYPHEN-MINUS
0x002e: 0x002e, # FULL STOP
0x002f: 0x002f, # SOLIDUS
0x0030: 0x0030, # DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE
0x0032: 0x0032, # DIGIT TWO
0x0033: 0x0033, # DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE
0x003a: 0x003a, # COLON
0x003b: 0x003b, # SEMICOLON
0x003c: 0x003c, # LESS-THAN SIGN
0x003d: 0x003d, # EQUALS SIGN
0x003e: 0x003e, # GREATER-THAN SIGN
0x003f: 0x003f, # QUESTION MARK
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET
0x005c: 0x005c, # REVERSE SOLIDUS
0x005d: 0x005d, # RIGHT SQUARE BRACKET
0x005e: 0x005e, # CIRCUMFLEX ACCENT
0x005f: 0x005f, # LOW LINE
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET
0x007c: 0x007c, # VERTICAL LINE
0x007d: 0x007d, # RIGHT CURLY BRACKET
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # DELETE
0x00a0: 0x00ff, # NO-BREAK SPACE
0x00a3: 0x009c, # POUND SIGN
0x00a6: 0x008a, # BROKEN BAR
0x00a7: 0x00f5, # SECTION SIGN
0x00a8: 0x00f9, # DIAERESIS
0x00a9: 0x0097, # COPYRIGHT SIGN
0x00ab: 0x00ae, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00ac: 0x0089, # NOT SIGN
0x00ad: 0x00f0, # SOFT HYPHEN
0x00b0: 0x00f8, # DEGREE SIGN
0x00b1: 0x00f1, # PLUS-MINUS SIGN
0x00b2: 0x0099, # SUPERSCRIPT TWO
0x00b3: 0x009a, # SUPERSCRIPT THREE
0x00b7: 0x0088, # MIDDLE DOT
0x00bb: 0x00af, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00bd: 0x00ab, # VULGAR FRACTION ONE HALF
0x0384: 0x00ef, # GREEK TONOS
0x0385: 0x00f7, # GREEK DIALYTIKA TONOS
0x0386: 0x0086, # GREEK CAPITAL LETTER ALPHA WITH TONOS
0x0388: 0x008d, # GREEK CAPITAL LETTER EPSILON WITH TONOS
0x0389: 0x008f, # GREEK CAPITAL LETTER ETA WITH TONOS
0x038a: 0x0090, # GREEK CAPITAL LETTER IOTA WITH TONOS
0x038c: 0x0092, # GREEK CAPITAL LETTER OMICRON WITH TONOS
0x038e: 0x0095, # GREEK CAPITAL LETTER UPSILON WITH TONOS
0x038f: 0x0098, # GREEK CAPITAL LETTER OMEGA WITH TONOS
0x0390: 0x00a1, # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
0x0391: 0x00a4, # GREEK CAPITAL LETTER ALPHA
0x0392: 0x00a5, # GREEK CAPITAL LETTER BETA
0x0393: 0x00a6, # GREEK CAPITAL LETTER GAMMA
0x0394: 0x00a7, # GREEK CAPITAL LETTER DELTA
0x0395: 0x00a8, # GREEK CAPITAL LETTER EPSILON
0x0396: 0x00a9, # GREEK CAPITAL LETTER ZETA
0x0397: 0x00aa, # GREEK CAPITAL LETTER ETA
0x0398: 0x00ac, # GREEK CAPITAL LETTER THETA
0x0399: 0x00ad, # GREEK CAPITAL LETTER IOTA
0x039a: 0x00b5, # GREEK CAPITAL LETTER KAPPA
0x039b: 0x00b6, # GREEK CAPITAL LETTER LAMDA
0x039c: 0x00b7, # GREEK CAPITAL LETTER MU
0x039d: 0x00b8, # GREEK CAPITAL LETTER NU
0x039e: 0x00bd, # GREEK CAPITAL LETTER XI
0x039f: 0x00be, # GREEK CAPITAL LETTER OMICRON
0x03a0: 0x00c6, # GREEK CAPITAL LETTER PI
0x03a1: 0x00c7, # GREEK CAPITAL LETTER RHO
0x03a3: 0x00cf, # GREEK CAPITAL LETTER SIGMA
0x03a4: 0x00d0, # GREEK CAPITAL LETTER TAU
0x03a5: 0x00d1, # GREEK CAPITAL LETTER UPSILON
0x03a6: 0x00d2, # GREEK CAPITAL LETTER PHI
0x03a7: 0x00d3, # GREEK CAPITAL LETTER CHI
0x03a8: 0x00d4, # GREEK CAPITAL LETTER PSI
0x03a9: 0x00d5, # GREEK CAPITAL LETTER OMEGA
0x03aa: 0x0091, # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
0x03ab: 0x0096, # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
0x03ac: 0x009b, # GREEK SMALL LETTER ALPHA WITH TONOS
0x03ad: 0x009d, # GREEK SMALL LETTER EPSILON WITH TONOS
0x03ae: 0x009e, # GREEK SMALL LETTER ETA WITH TONOS
0x03af: 0x009f, # GREEK SMALL LETTER IOTA WITH TONOS
0x03b0: 0x00fc, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
0x03b1: 0x00d6, # GREEK SMALL LETTER ALPHA
0x03b2: 0x00d7, # GREEK SMALL LETTER BETA
0x03b3: 0x00d8, # GREEK SMALL LETTER GAMMA
0x03b4: 0x00dd, # GREEK SMALL LETTER DELTA
0x03b5: 0x00de, # GREEK SMALL LETTER EPSILON
0x03b6: 0x00e0, # GREEK SMALL LETTER ZETA
0x03b7: 0x00e1, # GREEK SMALL LETTER ETA
0x03b8: 0x00e2, # GREEK SMALL LETTER THETA
0x03b9: 0x00e3, # GREEK SMALL LETTER IOTA
0x03ba: 0x00e4, # GREEK SMALL LETTER KAPPA
0x03bb: 0x00e5, # GREEK SMALL LETTER LAMDA
0x03bc: 0x00e6, # GREEK SMALL LETTER MU
0x03bd: 0x00e7, # GREEK SMALL LETTER NU
0x03be: 0x00e8, # GREEK SMALL LETTER XI
0x03bf: 0x00e9, # GREEK SMALL LETTER OMICRON
0x03c0: 0x00ea, # GREEK SMALL LETTER PI
0x03c1: 0x00eb, # GREEK SMALL LETTER RHO
0x03c2: 0x00ed, # GREEK SMALL LETTER FINAL SIGMA
0x03c3: 0x00ec, # GREEK SMALL LETTER SIGMA
0x03c4: 0x00ee, # GREEK SMALL LETTER TAU
0x03c5: 0x00f2, # GREEK SMALL LETTER UPSILON
0x03c6: 0x00f3, # GREEK SMALL LETTER PHI
0x03c7: 0x00f4, # GREEK SMALL LETTER CHI
0x03c8: 0x00f6, # GREEK SMALL LETTER PSI
0x03c9: 0x00fa, # GREEK SMALL LETTER OMEGA
0x03ca: 0x00a0, # GREEK SMALL LETTER IOTA WITH DIALYTIKA
0x03cb: 0x00fb, # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
0x03cc: 0x00a2, # GREEK SMALL LETTER OMICRON WITH TONOS
0x03cd: 0x00a3, # GREEK SMALL LETTER UPSILON WITH TONOS
0x03ce: 0x00fd, # GREEK SMALL LETTER OMEGA WITH TONOS
0x2015: 0x008e, # HORIZONTAL BAR
0x2018: 0x008b, # LEFT SINGLE QUOTATION MARK
0x2019: 0x008c, # RIGHT SINGLE QUOTATION MARK
0x2500: 0x00c4, # BOX DRAWINGS LIGHT HORIZONTAL
0x2502: 0x00b3, # BOX DRAWINGS LIGHT VERTICAL
0x250c: 0x00da, # BOX DRAWINGS LIGHT DOWN AND RIGHT
0x2510: 0x00bf, # BOX DRAWINGS LIGHT DOWN AND LEFT
0x2514: 0x00c0, # BOX DRAWINGS LIGHT UP AND RIGHT
0x2518: 0x00d9, # BOX DRAWINGS LIGHT UP AND LEFT
0x251c: 0x00c3, # BOX DRAWINGS LIGHT VERTICAL AND RIGHT
0x2524: 0x00b4, # BOX DRAWINGS LIGHT VERTICAL AND LEFT
0x252c: 0x00c2, # BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
0x2534: 0x00c1, # BOX DRAWINGS LIGHT UP AND HORIZONTAL
0x253c: 0x00c5, # BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
0x2550: 0x00cd, # BOX DRAWINGS DOUBLE HORIZONTAL
0x2551: 0x00ba, # BOX DRAWINGS DOUBLE VERTICAL
0x2554: 0x00c9, # BOX DRAWINGS DOUBLE DOWN AND RIGHT
0x2557: 0x00bb, # BOX DRAWINGS DOUBLE DOWN AND LEFT
0x255a: 0x00c8, # BOX DRAWINGS DOUBLE UP AND RIGHT
0x255d: 0x00bc, # BOX DRAWINGS DOUBLE UP AND LEFT
0x2560: 0x00cc, # BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
0x2563: 0x00b9, # BOX DRAWINGS DOUBLE VERTICAL AND LEFT
0x2566: 0x00cb, # BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
0x2569: 0x00ca, # BOX DRAWINGS DOUBLE UP AND HORIZONTAL
0x256c: 0x00ce, # BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
0x2580: 0x00df, # UPPER HALF BLOCK
0x2584: 0x00dc, # LOWER HALF BLOCK
0x2588: 0x00db, # FULL BLOCK
0x2591: 0x00b0, # LIGHT SHADE
0x2592: 0x00b1, # MEDIUM SHADE
0x2593: 0x00b2, # DARK SHADE
0x25a0: 0x00fe, # BLACK SQUARE
}
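# --- Editor's usage sketch (illustrative only, not part of the cp869 module
# above; it assumes a standard CPython/IronPython runtime where 'cp869' is
# registered through the encodings package). Encoding walks encoding_map;
# decoding walks decoding_table; the incremental classes reuse both.
import codecs

_data = 'Καλημέρα'.encode('cp869')            # Greek text -> one byte per char
assert _data.decode('cp869') == 'Καλημέρα'    # round trip through the tables

_dec = codecs.getincrementaldecoder('cp869')()  # single-byte codec: any split works
assert ''.join(_dec.decode(bytes([b])) for b in _data) == 'Καλημέρα'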
""" Python Character Mapping Codec cp874 generated from 'MAPPINGS/VENDORS/MICSFT/WINDOWS/CP874.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp874',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\ufffe' # 0x81 -> UNDEFINED
'\ufffe' # 0x82 -> UNDEFINED
'\ufffe' # 0x83 -> UNDEFINED
'\ufffe' # 0x84 -> UNDEFINED
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\ufffe' # 0x86 -> UNDEFINED
'\ufffe' # 0x87 -> UNDEFINED
'\ufffe' # 0x88 -> UNDEFINED
'\ufffe' # 0x89 -> UNDEFINED
'\ufffe' # 0x8A -> UNDEFINED
'\ufffe' # 0x8B -> UNDEFINED
'\ufffe' # 0x8C -> UNDEFINED
'\ufffe' # 0x8D -> UNDEFINED
'\ufffe' # 0x8E -> UNDEFINED
'\ufffe' # 0x8F -> UNDEFINED
'\ufffe' # 0x90 -> UNDEFINED
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\ufffe' # 0x98 -> UNDEFINED
'\ufffe' # 0x99 -> UNDEFINED
'\ufffe' # 0x9A -> UNDEFINED
'\ufffe' # 0x9B -> UNDEFINED
'\ufffe' # 0x9C -> UNDEFINED
'\ufffe' # 0x9D -> UNDEFINED
'\ufffe' # 0x9E -> UNDEFINED
'\ufffe' # 0x9F -> UNDEFINED
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0e01' # 0xA1 -> THAI CHARACTER KO KAI
'\u0e02' # 0xA2 -> THAI CHARACTER KHO KHAI
'\u0e03' # 0xA3 -> THAI CHARACTER KHO KHUAT
'\u0e04' # 0xA4 -> THAI CHARACTER KHO KHWAI
'\u0e05' # 0xA5 -> THAI CHARACTER KHO KHON
'\u0e06' # 0xA6 -> THAI CHARACTER KHO RAKHANG
'\u0e07' # 0xA7 -> THAI CHARACTER NGO NGU
'\u0e08' # 0xA8 -> THAI CHARACTER CHO CHAN
'\u0e09' # 0xA9 -> THAI CHARACTER CHO CHING
'\u0e0a' # 0xAA -> THAI CHARACTER CHO CHANG
'\u0e0b' # 0xAB -> THAI CHARACTER SO SO
'\u0e0c' # 0xAC -> THAI CHARACTER CHO CHOE
'\u0e0d' # 0xAD -> THAI CHARACTER YO YING
'\u0e0e' # 0xAE -> THAI CHARACTER DO CHADA
'\u0e0f' # 0xAF -> THAI CHARACTER TO PATAK
'\u0e10' # 0xB0 -> THAI CHARACTER THO THAN
'\u0e11' # 0xB1 -> THAI CHARACTER THO NANGMONTHO
'\u0e12' # 0xB2 -> THAI CHARACTER THO PHUTHAO
'\u0e13' # 0xB3 -> THAI CHARACTER NO NEN
'\u0e14' # 0xB4 -> THAI CHARACTER DO DEK
'\u0e15' # 0xB5 -> THAI CHARACTER TO TAO
'\u0e16' # 0xB6 -> THAI CHARACTER THO THUNG
'\u0e17' # 0xB7 -> THAI CHARACTER THO THAHAN
'\u0e18' # 0xB8 -> THAI CHARACTER THO THONG
'\u0e19' # 0xB9 -> THAI CHARACTER NO NU
'\u0e1a' # 0xBA -> THAI CHARACTER BO BAIMAI
'\u0e1b' # 0xBB -> THAI CHARACTER PO PLA
'\u0e1c' # 0xBC -> THAI CHARACTER PHO PHUNG
'\u0e1d' # 0xBD -> THAI CHARACTER FO FA
'\u0e1e' # 0xBE -> THAI CHARACTER PHO PHAN
'\u0e1f' # 0xBF -> THAI CHARACTER FO FAN
'\u0e20' # 0xC0 -> THAI CHARACTER PHO SAMPHAO
'\u0e21' # 0xC1 -> THAI CHARACTER MO MA
'\u0e22' # 0xC2 -> THAI CHARACTER YO YAK
'\u0e23' # 0xC3 -> THAI CHARACTER RO RUA
'\u0e24' # 0xC4 -> THAI CHARACTER RU
'\u0e25' # 0xC5 -> THAI CHARACTER LO LING
'\u0e26' # 0xC6 -> THAI CHARACTER LU
'\u0e27' # 0xC7 -> THAI CHARACTER WO WAEN
'\u0e28' # 0xC8 -> THAI CHARACTER SO SALA
'\u0e29' # 0xC9 -> THAI CHARACTER SO RUSI
'\u0e2a' # 0xCA -> THAI CHARACTER SO SUA
'\u0e2b' # 0xCB -> THAI CHARACTER HO HIP
'\u0e2c' # 0xCC -> THAI CHARACTER LO CHULA
'\u0e2d' # 0xCD -> THAI CHARACTER O ANG
'\u0e2e' # 0xCE -> THAI CHARACTER HO NOKHUK
'\u0e2f' # 0xCF -> THAI CHARACTER PAIYANNOI
'\u0e30' # 0xD0 -> THAI CHARACTER SARA A
'\u0e31' # 0xD1 -> THAI CHARACTER MAI HAN-AKAT
'\u0e32' # 0xD2 -> THAI CHARACTER SARA AA
'\u0e33' # 0xD3 -> THAI CHARACTER SARA AM
'\u0e34' # 0xD4 -> THAI CHARACTER SARA I
'\u0e35' # 0xD5 -> THAI CHARACTER SARA II
'\u0e36' # 0xD6 -> THAI CHARACTER SARA UE
'\u0e37' # 0xD7 -> THAI CHARACTER SARA UEE
'\u0e38' # 0xD8 -> THAI CHARACTER SARA U
'\u0e39' # 0xD9 -> THAI CHARACTER SARA UU
'\u0e3a' # 0xDA -> THAI CHARACTER PHINTHU
'\ufffe' # 0xDB -> UNDEFINED
'\ufffe' # 0xDC -> UNDEFINED
'\ufffe' # 0xDD -> UNDEFINED
'\ufffe' # 0xDE -> UNDEFINED
'\u0e3f' # 0xDF -> THAI CURRENCY SYMBOL BAHT
'\u0e40' # 0xE0 -> THAI CHARACTER SARA E
'\u0e41' # 0xE1 -> THAI CHARACTER SARA AE
'\u0e42' # 0xE2 -> THAI CHARACTER SARA O
'\u0e43' # 0xE3 -> THAI CHARACTER SARA AI MAIMUAN
'\u0e44' # 0xE4 -> THAI CHARACTER SARA AI MAIMALAI
'\u0e45' # 0xE5 -> THAI CHARACTER LAKKHANGYAO
'\u0e46' # 0xE6 -> THAI CHARACTER MAIYAMOK
'\u0e47' # 0xE7 -> THAI CHARACTER MAITAIKHU
'\u0e48' # 0xE8 -> THAI CHARACTER MAI EK
'\u0e49' # 0xE9 -> THAI CHARACTER MAI THO
'\u0e4a' # 0xEA -> THAI CHARACTER MAI TRI
'\u0e4b' # 0xEB -> THAI CHARACTER MAI CHATTAWA
'\u0e4c' # 0xEC -> THAI CHARACTER THANTHAKHAT
'\u0e4d' # 0xED -> THAI CHARACTER NIKHAHIT
'\u0e4e' # 0xEE -> THAI CHARACTER YAMAKKAN
'\u0e4f' # 0xEF -> THAI CHARACTER FONGMAN
'\u0e50' # 0xF0 -> THAI DIGIT ZERO
'\u0e51' # 0xF1 -> THAI DIGIT ONE
'\u0e52' # 0xF2 -> THAI DIGIT TWO
'\u0e53' # 0xF3 -> THAI DIGIT THREE
'\u0e54' # 0xF4 -> THAI DIGIT FOUR
'\u0e55' # 0xF5 -> THAI DIGIT FIVE
'\u0e56' # 0xF6 -> THAI DIGIT SIX
'\u0e57' # 0xF7 -> THAI DIGIT SEVEN
'\u0e58' # 0xF8 -> THAI DIGIT EIGHT
'\u0e59' # 0xF9 -> THAI DIGIT NINE
'\u0e5a' # 0xFA -> THAI CHARACTER ANGKHANKHU
'\u0e5b' # 0xFB -> THAI CHARACTER KHOMUT
'\ufffe' # 0xFC -> UNDEFINED
'\ufffe' # 0xFD -> UNDEFINED
'\ufffe' # 0xFE -> UNDEFINED
'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
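# --- Editor's note (illustrative, not from the original sources): unlike the
# cp869 module above, cp874 carries no literal encoding_map. Instead,
# codecs.charmap_build inverts the 256-entry decoding_table into the fast
# lookup object that codecs.charmap_encode consumes. A minimal sketch of the
# same mechanism on an identity table:
import codecs

_identity = ''.join(chr(i) for i in range(256))   # decoding table: byte i -> chr(i)
_enc = codecs.charmap_build(_identity)            # inverse mapping for encoding
assert codecs.charmap_encode('ABC', 'strict', _enc)[0] == b'ABC'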
""" Python Character Mapping Codec cp875 generated from 'MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT' with gencodec.py.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='cp875',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x9c' # 0x04 -> CONTROL
'\t' # 0x05 -> HORIZONTAL TABULATION
'\x86' # 0x06 -> CONTROL
'\x7f' # 0x07 -> DELETE
'\x97' # 0x08 -> CONTROL
'\x8d' # 0x09 -> CONTROL
'\x8e' # 0x0A -> CONTROL
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x9d' # 0x14 -> CONTROL
'\x85' # 0x15 -> CONTROL
'\x08' # 0x16 -> BACKSPACE
'\x87' # 0x17 -> CONTROL
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x92' # 0x1A -> CONTROL
'\x8f' # 0x1B -> CONTROL
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
'\x80' # 0x20 -> CONTROL
'\x81' # 0x21 -> CONTROL
'\x82' # 0x22 -> CONTROL
'\x83' # 0x23 -> CONTROL
'\x84' # 0x24 -> CONTROL
'\n' # 0x25 -> LINE FEED
'\x17' # 0x26 -> END OF TRANSMISSION BLOCK
'\x1b' # 0x27 -> ESCAPE
'\x88' # 0x28 -> CONTROL
'\x89' # 0x29 -> CONTROL
'\x8a' # 0x2A -> CONTROL
'\x8b' # 0x2B -> CONTROL
'\x8c' # 0x2C -> CONTROL
'\x05' # 0x2D -> ENQUIRY
'\x06' # 0x2E -> ACKNOWLEDGE
'\x07' # 0x2F -> BELL
'\x90' # 0x30 -> CONTROL
'\x91' # 0x31 -> CONTROL
'\x16' # 0x32 -> SYNCHRONOUS IDLE
'\x93' # 0x33 -> CONTROL
'\x94' # 0x34 -> CONTROL
'\x95' # 0x35 -> CONTROL
'\x96' # 0x36 -> CONTROL
'\x04' # 0x37 -> END OF TRANSMISSION
'\x98' # 0x38 -> CONTROL
'\x99' # 0x39 -> CONTROL
'\x9a' # 0x3A -> CONTROL
'\x9b' # 0x3B -> CONTROL
'\x14' # 0x3C -> DEVICE CONTROL FOUR
'\x15' # 0x3D -> NEGATIVE ACKNOWLEDGE
'\x9e' # 0x3E -> CONTROL
'\x1a' # 0x3F -> SUBSTITUTE
' ' # 0x40 -> SPACE
'\u0391' # 0x41 -> GREEK CAPITAL LETTER ALPHA
'\u0392' # 0x42 -> GREEK CAPITAL LETTER BETA
'\u0393' # 0x43 -> GREEK CAPITAL LETTER GAMMA
'\u0394' # 0x44 -> GREEK CAPITAL LETTER DELTA
'\u0395' # 0x45 -> GREEK CAPITAL LETTER EPSILON
'\u0396' # 0x46 -> GREEK CAPITAL LETTER ZETA
'\u0397' # 0x47 -> GREEK CAPITAL LETTER ETA
'\u0398' # 0x48 -> GREEK CAPITAL LETTER THETA
'\u0399' # 0x49 -> GREEK CAPITAL LETTER IOTA
'[' # 0x4A -> LEFT SQUARE BRACKET
'.' # 0x4B -> FULL STOP
'<' # 0x4C -> LESS-THAN SIGN
'(' # 0x4D -> LEFT PARENTHESIS
'+' # 0x4E -> PLUS SIGN
'!' # 0x4F -> EXCLAMATION MARK
'&' # 0x50 -> AMPERSAND
'\u039a' # 0x51 -> GREEK CAPITAL LETTER KAPPA
'\u039b' # 0x52 -> GREEK CAPITAL LETTER LAMDA
'\u039c' # 0x53 -> GREEK CAPITAL LETTER MU
'\u039d' # 0x54 -> GREEK CAPITAL LETTER NU
'\u039e' # 0x55 -> GREEK CAPITAL LETTER XI
'\u039f' # 0x56 -> GREEK CAPITAL LETTER OMICRON
'\u03a0' # 0x57 -> GREEK CAPITAL LETTER PI
'\u03a1' # 0x58 -> GREEK CAPITAL LETTER RHO
'\u03a3' # 0x59 -> GREEK CAPITAL LETTER SIGMA
']' # 0x5A -> RIGHT SQUARE BRACKET
'$' # 0x5B -> DOLLAR SIGN
'*' # 0x5C -> ASTERISK
')' # 0x5D -> RIGHT PARENTHESIS
';' # 0x5E -> SEMICOLON
'^' # 0x5F -> CIRCUMFLEX ACCENT
'-' # 0x60 -> HYPHEN-MINUS
'/' # 0x61 -> SOLIDUS
'\u03a4' # 0x62 -> GREEK CAPITAL LETTER TAU
'\u03a5' # 0x63 -> GREEK CAPITAL LETTER UPSILON
'\u03a6' # 0x64 -> GREEK CAPITAL LETTER PHI
'\u03a7' # 0x65 -> GREEK CAPITAL LETTER CHI
'\u03a8' # 0x66 -> GREEK CAPITAL LETTER PSI
'\u03a9' # 0x67 -> GREEK CAPITAL LETTER OMEGA
'\u03aa' # 0x68 -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
'\u03ab' # 0x69 -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
'|' # 0x6A -> VERTICAL LINE
',' # 0x6B -> COMMA
'%' # 0x6C -> PERCENT SIGN
'_' # 0x6D -> LOW LINE
'>' # 0x6E -> GREATER-THAN SIGN
'?' # 0x6F -> QUESTION MARK
'\xa8' # 0x70 -> DIAERESIS
'\u0386' # 0x71 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
'\u0388' # 0x72 -> GREEK CAPITAL LETTER EPSILON WITH TONOS
'\u0389' # 0x73 -> GREEK CAPITAL LETTER ETA WITH TONOS
'\xa0' # 0x74 -> NO-BREAK SPACE
'\u038a' # 0x75 -> GREEK CAPITAL LETTER IOTA WITH TONOS
'\u038c' # 0x76 -> GREEK CAPITAL LETTER OMICRON WITH TONOS
'\u038e' # 0x77 -> GREEK CAPITAL LETTER UPSILON WITH TONOS
'\u038f' # 0x78 -> GREEK CAPITAL LETTER OMEGA WITH TONOS
'`' # 0x79 -> GRAVE ACCENT
':' # 0x7A -> COLON
'#' # 0x7B -> NUMBER SIGN
'@' # 0x7C -> COMMERCIAL AT
"'" # 0x7D -> APOSTROPHE
'=' # 0x7E -> EQUALS SIGN
'"' # 0x7F -> QUOTATION MARK
'\u0385' # 0x80 -> GREEK DIALYTIKA TONOS
'a' # 0x81 -> LATIN SMALL LETTER A
'b' # 0x82 -> LATIN SMALL LETTER B
'c' # 0x83 -> LATIN SMALL LETTER C
'd' # 0x84 -> LATIN SMALL LETTER D
'e' # 0x85 -> LATIN SMALL LETTER E
'f' # 0x86 -> LATIN SMALL LETTER F
'g' # 0x87 -> LATIN SMALL LETTER G
'h' # 0x88 -> LATIN SMALL LETTER H
'i' # 0x89 -> LATIN SMALL LETTER I
'\u03b1' # 0x8A -> GREEK SMALL LETTER ALPHA
'\u03b2' # 0x8B -> GREEK SMALL LETTER BETA
'\u03b3' # 0x8C -> GREEK SMALL LETTER GAMMA
'\u03b4' # 0x8D -> GREEK SMALL LETTER DELTA
'\u03b5' # 0x8E -> GREEK SMALL LETTER EPSILON
'\u03b6' # 0x8F -> GREEK SMALL LETTER ZETA
'\xb0' # 0x90 -> DEGREE SIGN
'j' # 0x91 -> LATIN SMALL LETTER J
'k' # 0x92 -> LATIN SMALL LETTER K
'l' # 0x93 -> LATIN SMALL LETTER L
'm' # 0x94 -> LATIN SMALL LETTER M
'n' # 0x95 -> LATIN SMALL LETTER N
'o' # 0x96 -> LATIN SMALL LETTER O
'p' # 0x97 -> LATIN SMALL LETTER P
'q' # 0x98 -> LATIN SMALL LETTER Q
'r' # 0x99 -> LATIN SMALL LETTER R
'\u03b7' # 0x9A -> GREEK SMALL LETTER ETA
'\u03b8' # 0x9B -> GREEK SMALL LETTER THETA
'\u03b9' # 0x9C -> GREEK SMALL LETTER IOTA
'\u03ba' # 0x9D -> GREEK SMALL LETTER KAPPA
'\u03bb' # 0x9E -> GREEK SMALL LETTER LAMDA
'\u03bc' # 0x9F -> GREEK SMALL LETTER MU
'\xb4' # 0xA0 -> ACUTE ACCENT
'~' # 0xA1 -> TILDE
's' # 0xA2 -> LATIN SMALL LETTER S
't' # 0xA3 -> LATIN SMALL LETTER T
'u' # 0xA4 -> LATIN SMALL LETTER U
'v' # 0xA5 -> LATIN SMALL LETTER V
'w' # 0xA6 -> LATIN SMALL LETTER W
'x' # 0xA7 -> LATIN SMALL LETTER X
'y' # 0xA8 -> LATIN SMALL LETTER Y
'z' # 0xA9 -> LATIN SMALL LETTER Z
'\u03bd' # 0xAA -> GREEK SMALL LETTER NU
'\u03be' # 0xAB -> GREEK SMALL LETTER XI
'\u03bf' # 0xAC -> GREEK SMALL LETTER OMICRON
'\u03c0' # 0xAD -> GREEK SMALL LETTER PI
'\u03c1' # 0xAE -> GREEK SMALL LETTER RHO
'\u03c3' # 0xAF -> GREEK SMALL LETTER SIGMA
'\xa3' # 0xB0 -> POUND SIGN
'\u03ac' # 0xB1 -> GREEK SMALL LETTER ALPHA WITH TONOS
'\u03ad' # 0xB2 -> GREEK SMALL LETTER EPSILON WITH TONOS
'\u03ae' # 0xB3 -> GREEK SMALL LETTER ETA WITH TONOS
'\u03ca' # 0xB4 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
'\u03af' # 0xB5 -> GREEK SMALL LETTER IOTA WITH TONOS
'\u03cc' # 0xB6 -> GREEK SMALL LETTER OMICRON WITH TONOS
'\u03cd' # 0xB7 -> GREEK SMALL LETTER UPSILON WITH TONOS
'\u03cb' # 0xB8 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
'\u03ce' # 0xB9 -> GREEK SMALL LETTER OMEGA WITH TONOS
'\u03c2' # 0xBA -> GREEK SMALL LETTER FINAL SIGMA
'\u03c4' # 0xBB -> GREEK SMALL LETTER TAU
'\u03c5' # 0xBC -> GREEK SMALL LETTER UPSILON
'\u03c6' # 0xBD -> GREEK SMALL LETTER PHI
'\u03c7' # 0xBE -> GREEK SMALL LETTER CHI
'\u03c8' # 0xBF -> GREEK SMALL LETTER PSI
'{' # 0xC0 -> LEFT CURLY BRACKET
'A' # 0xC1 -> LATIN CAPITAL LETTER A
'B' # 0xC2 -> LATIN CAPITAL LETTER B
'C' # 0xC3 -> LATIN CAPITAL LETTER C
'D' # 0xC4 -> LATIN CAPITAL LETTER D
'E' # 0xC5 -> LATIN CAPITAL LETTER E
'F' # 0xC6 -> LATIN CAPITAL LETTER F
'G' # 0xC7 -> LATIN CAPITAL LETTER G
'H' # 0xC8 -> LATIN CAPITAL LETTER H
'I' # 0xC9 -> LATIN CAPITAL LETTER I
'\xad' # 0xCA -> SOFT HYPHEN
'\u03c9' # 0xCB -> GREEK SMALL LETTER OMEGA
'\u0390' # 0xCC -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
'\u03b0' # 0xCD -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
'\u2018' # 0xCE -> LEFT SINGLE QUOTATION MARK
'\u2015' # 0xCF -> HORIZONTAL BAR
'}' # 0xD0 -> RIGHT CURLY BRACKET
'J' # 0xD1 -> LATIN CAPITAL LETTER J
'K' # 0xD2 -> LATIN CAPITAL LETTER K
'L' # 0xD3 -> LATIN CAPITAL LETTER L
'M' # 0xD4 -> LATIN CAPITAL LETTER M
'N' # 0xD5 -> LATIN CAPITAL LETTER N
'O' # 0xD6 -> LATIN CAPITAL LETTER O
'P' # 0xD7 -> LATIN CAPITAL LETTER P
'Q' # 0xD8 -> LATIN CAPITAL LETTER Q
'R' # 0xD9 -> LATIN CAPITAL LETTER R
'\xb1' # 0xDA -> PLUS-MINUS SIGN
'\xbd' # 0xDB -> VULGAR FRACTION ONE HALF
'\x1a' # 0xDC -> SUBSTITUTE
'\u0387' # 0xDD -> GREEK ANO TELEIA
'\u2019' # 0xDE -> RIGHT SINGLE QUOTATION MARK
'\xa6' # 0xDF -> BROKEN BAR
'\\' # 0xE0 -> REVERSE SOLIDUS
'\x1a' # 0xE1 -> SUBSTITUTE
'S' # 0xE2 -> LATIN CAPITAL LETTER S
'T' # 0xE3 -> LATIN CAPITAL LETTER T
'U' # 0xE4 -> LATIN CAPITAL LETTER U
'V' # 0xE5 -> LATIN CAPITAL LETTER V
'W' # 0xE6 -> LATIN CAPITAL LETTER W
'X' # 0xE7 -> LATIN CAPITAL LETTER X
'Y' # 0xE8 -> LATIN CAPITAL LETTER Y
'Z' # 0xE9 -> LATIN CAPITAL LETTER Z
'\xb2' # 0xEA -> SUPERSCRIPT TWO
'\xa7' # 0xEB -> SECTION SIGN
'\x1a' # 0xEC -> SUBSTITUTE
'\x1a' # 0xED -> SUBSTITUTE
'\xab' # 0xEE -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xEF -> NOT SIGN
'0' # 0xF0 -> DIGIT ZERO
'1' # 0xF1 -> DIGIT ONE
'2' # 0xF2 -> DIGIT TWO
'3' # 0xF3 -> DIGIT THREE
'4' # 0xF4 -> DIGIT FOUR
'5' # 0xF5 -> DIGIT FIVE
'6' # 0xF6 -> DIGIT SIX
'7' # 0xF7 -> DIGIT SEVEN
'8' # 0xF8 -> DIGIT EIGHT
'9' # 0xF9 -> DIGIT NINE
'\xb3' # 0xFA -> SUPERSCRIPT THREE
'\xa9' # 0xFB -> COPYRIGHT SIGN
'\x1a' # 0xFC -> SUBSTITUTE
'\x1a' # 0xFD -> SUBSTITUTE
'\xbb' # 0xFE -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\x9f' # 0xFF -> CONTROL
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
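# --- Editor's sketch (illustrative; assumes 'cp875' is registered in the
# runtime): cp875 is an EBCDIC layout, so ASCII characters do not keep their
# ASCII byte values -- per the decoding table above, 'A' sits at 0xC1 and
# '0' at 0xF0.
assert 'A'.encode('cp875') == b'\xc1'
assert '0'.encode('cp875') == b'\xf0'
assert b'\xc1\xf0'.decode('cp875') == 'A0'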
#
# cp932.py: Python Unicode Codec for CP932
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_jp, codecs
import _multibytecodec as mbc

codec = _codecs_jp.getcodec('cp932')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='cp932',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
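# --- Editor's sketch (illustrative, not part of the module): the multibyte
# modules below all share this shape -- a C codec (_codecs_jp here) does the
# work, and the classes merely bind it to the codecs-module interfaces. The
# incremental decoder buffers bytes that end mid-sequence, so a double-byte
# character split across feeds still decodes correctly.
import codecs

_raw = '日本語'.encode('cp932')                 # three double-byte characters
_dec = codecs.getincrementaldecoder('cp932')()
_out = _dec.decode(_raw[:3]) + _dec.decode(_raw[3:], final=True)
assert _out == '日本語'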
#
# cp949.py: Python Unicode Codec for CP949
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_kr, codecs
import _multibytecodec as mbc

codec = _codecs_kr.getcodec('cp949')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='cp949',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# cp950.py: Python Unicode Codec for CP950
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_tw, codecs
import _multibytecodec as mbc

codec = _codecs_tw.getcodec('cp950')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='cp950',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# euc_jisx0213.py: Python Unicode Codec for EUC_JISX0213
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_jp, codecs
import _multibytecodec as mbc

codec = _codecs_jp.getcodec('euc_jisx0213')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='euc_jisx0213',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# euc_jis_2004.py: Python Unicode Codec for EUC_JIS_2004
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_jp, codecs
import _multibytecodec as mbc

codec = _codecs_jp.getcodec('euc_jis_2004')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='euc_jis_2004',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# euc_jp.py: Python Unicode Codec for EUC_JP
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_jp, codecs
import _multibytecodec as mbc

codec = _codecs_jp.getcodec('euc_jp')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='euc_jp',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# euc_kr.py: Python Unicode Codec for EUC_KR
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_kr, codecs
import _multibytecodec as mbc

codec = _codecs_kr.getcodec('euc_kr')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='euc_kr',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# gb18030.py: Python Unicode Codec for GB18030
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_cn, codecs
import _multibytecodec as mbc

codec = _codecs_cn.getcodec('gb18030')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='gb18030',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# gb2312.py: Python Unicode Codec for GB2312
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_cn, codecs
import _multibytecodec as mbc

codec = _codecs_cn.getcodec('gb2312')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='gb2312',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# gbk.py: Python Unicode Codec for GBK
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_cn, codecs
import _multibytecodec as mbc

codec = _codecs_cn.getcodec('gbk')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='gbk',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
"""Python 'hex_codec' Codec - 2-digit hex content transfer encoding.
This codec de/encodes from bytes to bytes.
Written by Marc-Andre Lemburg ([email protected]).
"""
import codecs
import binascii

### Codec APIs

def hex_encode(input, errors='strict'):
    assert errors == 'strict'
    return (binascii.b2a_hex(input), len(input))

def hex_decode(input, errors='strict'):
    assert errors == 'strict'
    return (binascii.a2b_hex(input), len(input))

class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return hex_encode(input, errors)
    def decode(self, input, errors='strict'):
        return hex_decode(input, errors)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        assert self.errors == 'strict'
        return binascii.b2a_hex(input)

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        assert self.errors == 'strict'
        return binascii.a2b_hex(input)

class StreamWriter(Codec, codecs.StreamWriter):
    charbuffertype = bytes

class StreamReader(Codec, codecs.StreamReader):
    charbuffertype = bytes

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='hex',
        encode=hex_encode,
        decode=hex_decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamwriter=StreamWriter,
        streamreader=StreamReader,
        _is_text_encoding=False,
    )
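# --- Editor's sketch (illustrative): 'hex' is a bytes-to-bytes transform
# (note _is_text_encoding=False above), so it is driven through
# codecs.encode / codecs.decode rather than str.encode.
import codecs

assert codecs.encode(b'\x01\xff', 'hex') == b'01ff'
assert codecs.decode(b'01ff', 'hex') == b'\x01\xff'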
""" Python Character Mapping Codec generated from 'hp_roman8.txt' with gencodec.py.
Based on data from ftp://dkuug.dk/i18n/charmaps/HP-ROMAN8 (Keld Simonsen)
Original source: LaserJet IIP Printer User's Manual HP part no
33471-90901, Hewlett-Packard, June 1989.
"""#"
import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='hp-roman8',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamwriter=StreamWriter,
        streamreader=StreamReader,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xc0' # 0xA1 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc2' # 0xA2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc8' # 0xA3 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xca' # 0xA4 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xA5 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xce' # 0xA6 -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xA7 -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xb4' # 0xA8 -> ACUTE ACCENT
'\u02cb' # 0xA9 -> MODIFIER LETTER GRAVE ACCENT (MANDARIN CHINESE FOURTH TONE)
'\u02c6' # 0xAA -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\xa8' # 0xAB -> DIAERESIS
'\u02dc' # 0xAC -> SMALL TILDE
'\xd9' # 0xAD -> LATIN CAPITAL LETTER U WITH GRAVE
'\xdb' # 0xAE -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\u20a4' # 0xAF -> LIRA SIGN
'\xaf' # 0xB0 -> MACRON
'\xdd' # 0xB1 -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xfd' # 0xB2 -> LATIN SMALL LETTER Y WITH ACUTE
'\xb0' # 0xB3 -> DEGREE SIGN
'\xc7' # 0xB4 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xe7' # 0xB5 -> LATIN SMALL LETTER C WITH CEDILLA
'\xd1' # 0xB6 -> LATIN CAPITAL LETTER N WITH TILDE
'\xf1' # 0xB7 -> LATIN SMALL LETTER N WITH TILDE
'\xa1' # 0xB8 -> INVERTED EXCLAMATION MARK
'\xbf' # 0xB9 -> INVERTED QUESTION MARK
'\xa4' # 0xBA -> CURRENCY SIGN
'\xa3' # 0xBB -> POUND SIGN
'\xa5' # 0xBC -> YEN SIGN
'\xa7' # 0xBD -> SECTION SIGN
'\u0192' # 0xBE -> LATIN SMALL LETTER F WITH HOOK
'\xa2' # 0xBF -> CENT SIGN
'\xe2' # 0xC0 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xea' # 0xC1 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xf4' # 0xC2 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xfb' # 0xC3 -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xe1' # 0xC4 -> LATIN SMALL LETTER A WITH ACUTE
'\xe9' # 0xC5 -> LATIN SMALL LETTER E WITH ACUTE
'\xf3' # 0xC6 -> LATIN SMALL LETTER O WITH ACUTE
'\xfa' # 0xC7 -> LATIN SMALL LETTER U WITH ACUTE
'\xe0' # 0xC8 -> LATIN SMALL LETTER A WITH GRAVE
'\xe8' # 0xC9 -> LATIN SMALL LETTER E WITH GRAVE
'\xf2' # 0xCA -> LATIN SMALL LETTER O WITH GRAVE
'\xf9' # 0xCB -> LATIN SMALL LETTER U WITH GRAVE
'\xe4' # 0xCC -> LATIN SMALL LETTER A WITH DIAERESIS
'\xeb' # 0xCD -> LATIN SMALL LETTER E WITH DIAERESIS
'\xf6' # 0xCE -> LATIN SMALL LETTER O WITH DIAERESIS
'\xfc' # 0xCF -> LATIN SMALL LETTER U WITH DIAERESIS
'\xc5' # 0xD0 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xee' # 0xD1 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xd8' # 0xD2 -> LATIN CAPITAL LETTER O WITH STROKE
'\xc6' # 0xD3 -> LATIN CAPITAL LETTER AE
'\xe5' # 0xD4 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xed' # 0xD5 -> LATIN SMALL LETTER I WITH ACUTE
'\xf8' # 0xD6 -> LATIN SMALL LETTER O WITH STROKE
'\xe6' # 0xD7 -> LATIN SMALL LETTER AE
'\xc4' # 0xD8 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xec' # 0xD9 -> LATIN SMALL LETTER I WITH GRAVE
'\xd6' # 0xDA -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0xDB -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xc9' # 0xDC -> LATIN CAPITAL LETTER E WITH ACUTE
'\xef' # 0xDD -> LATIN SMALL LETTER I WITH DIAERESIS
'\xdf' # 0xDE -> LATIN SMALL LETTER SHARP S (GERMAN)
'\xd4' # 0xDF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xc1' # 0xE0 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc3' # 0xE1 -> LATIN CAPITAL LETTER A WITH TILDE
'\xe3' # 0xE2 -> LATIN SMALL LETTER A WITH TILDE
'\xd0' # 0xE3 -> LATIN CAPITAL LETTER ETH (ICELANDIC)
'\xf0' # 0xE4 -> LATIN SMALL LETTER ETH (ICELANDIC)
'\xcd' # 0xE5 -> LATIN CAPITAL LETTER I WITH ACUTE
'\xcc' # 0xE6 -> LATIN CAPITAL LETTER I WITH GRAVE
'\xd3' # 0xE7 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd2' # 0xE8 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd5' # 0xE9 -> LATIN CAPITAL LETTER O WITH TILDE
'\xf5' # 0xEA -> LATIN SMALL LETTER O WITH TILDE
'\u0160' # 0xEB -> LATIN CAPITAL LETTER S WITH CARON
'\u0161' # 0xEC -> LATIN SMALL LETTER S WITH CARON
'\xda' # 0xED -> LATIN CAPITAL LETTER U WITH ACUTE
'\u0178' # 0xEE -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\xff' # 0xEF -> LATIN SMALL LETTER Y WITH DIAERESIS
'\xde' # 0xF0 -> LATIN CAPITAL LETTER THORN (ICELANDIC)
'\xfe' # 0xF1 -> LATIN SMALL LETTER THORN (ICELANDIC)
'\xb7' # 0xF2 -> MIDDLE DOT
'\xb5' # 0xF3 -> MICRO SIGN
'\xb6' # 0xF4 -> PILCROW SIGN
'\xbe' # 0xF5 -> VULGAR FRACTION THREE QUARTERS
'\u2014' # 0xF6 -> EM DASH
'\xbc' # 0xF7 -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xF8 -> VULGAR FRACTION ONE HALF
'\xaa' # 0xF9 -> FEMININE ORDINAL INDICATOR
'\xba' # 0xFA -> MASCULINE ORDINAL INDICATOR
'\xab' # 0xFB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u25a0' # 0xFC -> BLACK SQUARE
'\xbb' # 0xFD -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xb1' # 0xFE -> PLUS-MINUS SIGN
'\ufffe' # 0xFF -> <undefined>
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
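# A quick check of the charmap pattern: decoding_table[byte] yields the
# character, and charmap_build inverts the table for encoding. Per the
# table above, 0xCC maps to U+00E4 (ä) and the mapping round-trips.
assert b'\xcc'.decode('hp-roman8') == '\xe4'
assert '\xe4'.encode('hp-roman8') == b'\xcc'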
#
# hz.py: Python Unicode Codec for HZ
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_cn, codecs
import _multibytecodec as mbc
codec = _codecs_cn.getcodec('hz')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='hz',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
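# HZ (RFC 1843) is a 7-bit encoding that brackets GB2312 runs with the
# escape pairs '~{' and '~}'. A round trip shows the framing (the exact
# byte layout is implementation-defined, but the shifts should be visible):
s = '\u4f60\u597d'
enc = s.encode('hz')
assert enc.startswith(b'~{') and enc.endswith(b'~}')
assert enc.decode('hz') == s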
# This module implements the RFCs 3490 (IDNA) and 3491 (Nameprep)
import stringprep, re, codecs
from unicodedata import ucd_3_2_0 as unicodedata
# IDNA section 3.1
dots = re.compile("[\u002E\u3002\uFF0E\uFF61]")
# IDNA section 5
ace_prefix = b"xn--"
sace_prefix = "xn--"
# This assumes query strings, so AllowUnassigned is true
def nameprep(label):
# Map
newlabel = []
for c in label:
if stringprep.in_table_b1(c):
# Map to nothing
continue
newlabel.append(stringprep.map_table_b2(c))
label = "".join(newlabel)
# Normalize
label = unicodedata.normalize("NFKC", label)
# Prohibit
for c in label:
if stringprep.in_table_c12(c) or \
stringprep.in_table_c22(c) or \
stringprep.in_table_c3(c) or \
stringprep.in_table_c4(c) or \
stringprep.in_table_c5(c) or \
stringprep.in_table_c6(c) or \
stringprep.in_table_c7(c) or \
stringprep.in_table_c8(c) or \
stringprep.in_table_c9(c):
raise UnicodeError("Invalid character %r" % c)
# Check bidi
RandAL = [stringprep.in_table_d1(x) for x in label]
for c in RandAL:
if c:
# There is a RandAL char in the string. Must perform further
# tests:
# 1) The characters in section 5.8 MUST be prohibited.
# This is table C.8, which was already checked
# 2) If a string contains any RandALCat character, the string
# MUST NOT contain any LCat character.
if any(stringprep.in_table_d2(x) for x in label):
raise UnicodeError("Violation of BIDI requirement 2")
# 3) If a string contains any RandALCat character, a
# RandALCat character MUST be the first character of the
# string, and a RandALCat character MUST be the last
# character of the string.
if not RandAL[0] or not RandAL[-1]:
raise UnicodeError("Violation of BIDI requirement 3")
return label
def ToASCII(label):
try:
# Step 1: try ASCII
label = label.encode("ascii")
except UnicodeError:
pass
else:
# Skip to step 3: UseSTD3ASCIIRules is false, so
# Skip to step 8.
if 0 < len(label) < 64:
return label
raise UnicodeError("label empty or too long")
# Step 2: nameprep
label = nameprep(label)
# Step 3: UseSTD3ASCIIRules is false
# Step 4: try ASCII
try:
label = label.encode("ascii")
except UnicodeError:
pass
else:
# Skip to step 8.
if 0 < len(label) < 64:
return label
raise UnicodeError("label empty or too long")
# Step 5: Check ACE prefix
if label.startswith(sace_prefix):
raise UnicodeError("Label starts with ACE prefix")
# Step 6: Encode with PUNYCODE
label = label.encode("punycode")
# Step 7: Prepend ACE prefix
label = ace_prefix + label
# Step 8: Check size
if 0 < len(label) < 64:
return label
raise UnicodeError("label empty or too long")
def ToUnicode(label):
# Step 1: Check for ASCII
if isinstance(label, bytes):
pure_ascii = True
else:
try:
label = label.encode("ascii")
pure_ascii = True
except UnicodeError:
pure_ascii = False
if not pure_ascii:
# Step 2: Perform nameprep
label = nameprep(label)
# It doesn't say this, but apparently, it should be ASCII now
try:
label = label.encode("ascii")
except UnicodeError:
raise UnicodeError("Invalid character in IDN label")
# Step 3: Check for ACE prefix
if not label.startswith(ace_prefix):
return str(label, "ascii")
# Step 4: Remove ACE prefix
label1 = label[len(ace_prefix):]
# Step 5: Decode using PUNYCODE
result = label1.decode("punycode")
# Step 6: Apply ToASCII
label2 = ToASCII(result)
# Step 7: Compare the result of step 6 with the one of step 3
# label2 will already be in lower case.
if str(label, "ascii").lower() != str(label2, "ascii"):
raise UnicodeError("IDNA does not round-trip", label, label2)
# Step 8: return the result of step 5
return result
### Codec APIs
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
if errors != 'strict':
# IDNA is quite clear that implementations must be strict
raise UnicodeError("unsupported error handling "+errors)
if not input:
return b'', 0
try:
result = input.encode('ascii')
except UnicodeEncodeError:
pass
else:
# ASCII name: fast path
labels = result.split(b'.')
for label in labels[:-1]:
if not (0 < len(label) < 64):
raise UnicodeError("label empty or too long")
if len(labels[-1]) >= 64:
raise UnicodeError("label too long")
return result, len(input)
result = bytearray()
labels = dots.split(input)
if labels and not labels[-1]:
trailing_dot = b'.'
del labels[-1]
else:
trailing_dot = b''
for label in labels:
if result:
# Join with U+002E
result.extend(b'.')
result.extend(ToASCII(label))
return bytes(result+trailing_dot), len(input)
def decode(self, input, errors='strict'):
if errors != 'strict':
raise UnicodeError("Unsupported error handling "+errors)
if not input:
return "", 0
# IDNA allows decoding to operate on Unicode strings, too.
if not isinstance(input, bytes):
# XXX obviously wrong, see #3232
input = bytes(input)
if ace_prefix not in input:
# Fast path
try:
return input.decode('ascii'), len(input)
except UnicodeDecodeError:
pass
labels = input.split(b".")
if labels and len(labels[-1]) == 0:
trailing_dot = '.'
del labels[-1]
else:
trailing_dot = ''
result = []
for label in labels:
result.append(ToUnicode(label))
return ".".join(result)+trailing_dot, len(input)
class IncrementalEncoder(codecs.BufferedIncrementalEncoder):
def _buffer_encode(self, input, errors, final):
if errors != 'strict':
# IDNA is quite clear that implementations must be strict
raise UnicodeError("unsupported error handling "+errors)
if not input:
return (b'', 0)
labels = dots.split(input)
trailing_dot = b''
if labels:
if not labels[-1]:
trailing_dot = b'.'
del labels[-1]
elif not final:
# Keep potentially unfinished label until the next call
del labels[-1]
if labels:
trailing_dot = b'.'
result = bytearray()
size = 0
for label in labels:
if size:
# Join with U+002E
result.extend(b'.')
size += 1
result.extend(ToASCII(label))
size += len(label)
result += trailing_dot
size += len(trailing_dot)
return (bytes(result), size)
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def _buffer_decode(self, input, errors, final):
if errors != 'strict':
raise UnicodeError("Unsupported error handling "+errors)
if not input:
return ("", 0)
# IDNA allows decoding to operate on Unicode strings, too.
if isinstance(input, str):
labels = dots.split(input)
else:
# Must be ASCII string
input = str(input, "ascii")
labels = input.split(".")
trailing_dot = ''
if labels:
if not labels[-1]:
trailing_dot = '.'
del labels[-1]
elif not final:
# Keep potentially unfinished label until the next call
del labels[-1]
if labels:
trailing_dot = '.'
result = []
size = 0
for label in labels:
result.append(ToUnicode(label))
if size:
size += 1
size += len(label)
result = ".".join(result) + trailing_dot
size += len(trailing_dot)
return (result, size)
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='idna',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
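# End-to-end check of ToASCII/ToUnicode through the registered codec,
# using the well-known Punycode example: 'bücher' becomes the ACE label
# 'xn--bcher-kva', while the all-ASCII label takes the fast path.
assert 'b\xfccher.example'.encode('idna') == b'xn--bcher-kva.example'
assert b'xn--bcher-kva.example'.decode('idna') == 'b\xfccher.example'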
#
# iso2022_jp.py: Python Unicode Codec for ISO2022_JP
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_iso2022, codecs
import _multibytecodec as mbc
codec = _codecs_iso2022.getcodec('iso2022_jp')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='iso2022_jp',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
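# ISO-2022-JP is a stateful 7-bit encoding: the encoder switches into
# JIS X 0208 with ESC $ B and, in this implementation, resets to ASCII
# with ESC ( B at the end of the output.
enc = '\u65e5\u672c'.encode('iso2022_jp')   # 日本
assert enc.startswith(b'\x1b$B') and enc.endswith(b'\x1b(B')
assert enc.decode('iso2022_jp') == '\u65e5\u672c'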
#
# iso2022_jp_1.py: Python Unicode Codec for ISO2022_JP_1
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_iso2022, codecs
import _multibytecodec as mbc
codec = _codecs_iso2022.getcodec('iso2022_jp_1')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='iso2022_jp_1',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
#
# iso2022_jp_2.py: Python Unicode Codec for ISO2022_JP_2
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_iso2022, codecs
import _multibytecodec as mbc
codec = _codecs_iso2022.getcodec('iso2022_jp_2')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='iso2022_jp_2',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
#
# iso2022_jp_2004.py: Python Unicode Codec for ISO2022_JP_2004
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_iso2022, codecs
import _multibytecodec as mbc
codec = _codecs_iso2022.getcodec('iso2022_jp_2004')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='iso2022_jp_2004',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
#
# iso2022_jp_3.py: Python Unicode Codec for ISO2022_JP_3
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_iso2022, codecs
import _multibytecodec as mbc
codec = _codecs_iso2022.getcodec('iso2022_jp_3')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='iso2022_jp_3',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
#
# iso2022_jp_ext.py: Python Unicode Codec for ISO2022_JP_EXT
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_iso2022, codecs
import _multibytecodec as mbc
codec = _codecs_iso2022.getcodec('iso2022_jp_ext')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='iso2022_jp_ext',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
#
# iso2022_kr.py: Python Unicode Codec for ISO2022_KR
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_iso2022, codecs
import _multibytecodec as mbc
codec = _codecs_iso2022.getcodec('iso2022_kr')
class Codec(codecs.Codec):
encode = codec.encode
decode = codec.decode
class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
codecs.IncrementalEncoder):
codec = codec
class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
codecs.IncrementalDecoder):
codec = codec
class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
codec = codec
class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
codec = codec
def getregentry():
return codecs.CodecInfo(
name='iso2022_kr',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
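# ISO-2022-KR (RFC 1557) announces KS X 1001 with ESC $ ) C and then
# shifts with SO/SI; the announcement should appear at the start of the
# encoded output (byte-level details may vary between implementations).
enc = '\ud55c\uae00'.encode('iso2022_kr')   # 한글
assert enc.startswith(b'\x1b$)C')
assert enc.decode('iso2022_kr') == '\ud55c\uae00'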
""" Python Character Mapping Codec iso8859_1 generated from 'MAPPINGS/ISO8859/8859-1.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-1',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH (Icelandic)
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN (Icelandic)
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S (German)
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH (Icelandic)
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0xFE -> LATIN SMALL LETTER THORN (Icelandic)
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
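# ISO 8859-1 maps every byte 0x00-0xFF to the code point with the same
# value, as the table above spells out entry by entry, so any byte string
# decodes losslessly and round-trips.
raw = bytes(range(256))
assert raw.decode('iso8859-1') == ''.join(map(chr, range(256)))
assert raw.decode('iso8859-1').encode('iso8859-1') == raw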
""" Python Character Mapping Codec iso8859_10 generated from 'MAPPINGS/ISO8859/8859-10.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-10',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u0112' # 0xA2 -> LATIN CAPITAL LETTER E WITH MACRON
'\u0122' # 0xA3 -> LATIN CAPITAL LETTER G WITH CEDILLA
'\u012a' # 0xA4 -> LATIN CAPITAL LETTER I WITH MACRON
'\u0128' # 0xA5 -> LATIN CAPITAL LETTER I WITH TILDE
'\u0136' # 0xA6 -> LATIN CAPITAL LETTER K WITH CEDILLA
'\xa7' # 0xA7 -> SECTION SIGN
'\u013b' # 0xA8 -> LATIN CAPITAL LETTER L WITH CEDILLA
'\u0110' # 0xA9 -> LATIN CAPITAL LETTER D WITH STROKE
'\u0160' # 0xAA -> LATIN CAPITAL LETTER S WITH CARON
'\u0166' # 0xAB -> LATIN CAPITAL LETTER T WITH STROKE
'\u017d' # 0xAC -> LATIN CAPITAL LETTER Z WITH CARON
'\xad' # 0xAD -> SOFT HYPHEN
'\u016a' # 0xAE -> LATIN CAPITAL LETTER U WITH MACRON
'\u014a' # 0xAF -> LATIN CAPITAL LETTER ENG
'\xb0' # 0xB0 -> DEGREE SIGN
'\u0105' # 0xB1 -> LATIN SMALL LETTER A WITH OGONEK
'\u0113' # 0xB2 -> LATIN SMALL LETTER E WITH MACRON
'\u0123' # 0xB3 -> LATIN SMALL LETTER G WITH CEDILLA
'\u012b' # 0xB4 -> LATIN SMALL LETTER I WITH MACRON
'\u0129' # 0xB5 -> LATIN SMALL LETTER I WITH TILDE
'\u0137' # 0xB6 -> LATIN SMALL LETTER K WITH CEDILLA
'\xb7' # 0xB7 -> MIDDLE DOT
'\u013c' # 0xB8 -> LATIN SMALL LETTER L WITH CEDILLA
'\u0111' # 0xB9 -> LATIN SMALL LETTER D WITH STROKE
'\u0161' # 0xBA -> LATIN SMALL LETTER S WITH CARON
'\u0167' # 0xBB -> LATIN SMALL LETTER T WITH STROKE
'\u017e' # 0xBC -> LATIN SMALL LETTER Z WITH CARON
'\u2015' # 0xBD -> HORIZONTAL BAR
'\u016b' # 0xBE -> LATIN SMALL LETTER U WITH MACRON
'\u014b' # 0xBF -> LATIN SMALL LETTER ENG
'\u0100' # 0xC0 -> LATIN CAPITAL LETTER A WITH MACRON
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\u012e' # 0xC7 -> LATIN CAPITAL LETTER I WITH OGONEK
'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\u0116' # 0xCC -> LATIN CAPITAL LETTER E WITH DOT ABOVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH (Icelandic)
'\u0145' # 0xD1 -> LATIN CAPITAL LETTER N WITH CEDILLA
'\u014c' # 0xD2 -> LATIN CAPITAL LETTER O WITH MACRON
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\u0168' # 0xD7 -> LATIN CAPITAL LETTER U WITH TILDE
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\u0172' # 0xD9 -> LATIN CAPITAL LETTER U WITH OGONEK
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN (Icelandic)
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S (German)
'\u0101' # 0xE0 -> LATIN SMALL LETTER A WITH MACRON
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\u012f' # 0xE7 -> LATIN SMALL LETTER I WITH OGONEK
'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\u0117' # 0xEC -> LATIN SMALL LETTER E WITH DOT ABOVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH (Icelandic)
'\u0146' # 0xF1 -> LATIN SMALL LETTER N WITH CEDILLA
'\u014d' # 0xF2 -> LATIN SMALL LETTER O WITH MACRON
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\u0169' # 0xF7 -> LATIN SMALL LETTER U WITH TILDE
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\u0173' # 0xF9 -> LATIN SMALL LETTER U WITH OGONEK
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0xFE -> LATIN SMALL LETTER THORN (Icelandic)
'\u0138' # 0xFF -> LATIN SMALL LETTER KRA
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
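# Characters missing from the 256-entry table fail under the default
# 'strict' handler; 'replace' substitutes '?'. Per the table above,
# U+014B (eng) sits at 0xBF, while the euro sign has no mapping here.
assert '\u014b'.encode('iso8859-10') == b'\xbf'
assert '\u20ac'.encode('iso8859-10', errors='replace') == b'?'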
""" Python Character Mapping Codec iso8859_11 generated from 'MAPPINGS/ISO8859/8859-11.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-11',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0e01' # 0xA1 -> THAI CHARACTER KO KAI
'\u0e02' # 0xA2 -> THAI CHARACTER KHO KHAI
'\u0e03' # 0xA3 -> THAI CHARACTER KHO KHUAT
'\u0e04' # 0xA4 -> THAI CHARACTER KHO KHWAI
'\u0e05' # 0xA5 -> THAI CHARACTER KHO KHON
'\u0e06' # 0xA6 -> THAI CHARACTER KHO RAKHANG
'\u0e07' # 0xA7 -> THAI CHARACTER NGO NGU
'\u0e08' # 0xA8 -> THAI CHARACTER CHO CHAN
'\u0e09' # 0xA9 -> THAI CHARACTER CHO CHING
'\u0e0a' # 0xAA -> THAI CHARACTER CHO CHANG
'\u0e0b' # 0xAB -> THAI CHARACTER SO SO
'\u0e0c' # 0xAC -> THAI CHARACTER CHO CHOE
'\u0e0d' # 0xAD -> THAI CHARACTER YO YING
'\u0e0e' # 0xAE -> THAI CHARACTER DO CHADA
'\u0e0f' # 0xAF -> THAI CHARACTER TO PATAK
'\u0e10' # 0xB0 -> THAI CHARACTER THO THAN
'\u0e11' # 0xB1 -> THAI CHARACTER THO NANGMONTHO
'\u0e12' # 0xB2 -> THAI CHARACTER THO PHUTHAO
'\u0e13' # 0xB3 -> THAI CHARACTER NO NEN
'\u0e14' # 0xB4 -> THAI CHARACTER DO DEK
'\u0e15' # 0xB5 -> THAI CHARACTER TO TAO
'\u0e16' # 0xB6 -> THAI CHARACTER THO THUNG
'\u0e17' # 0xB7 -> THAI CHARACTER THO THAHAN
'\u0e18' # 0xB8 -> THAI CHARACTER THO THONG
'\u0e19' # 0xB9 -> THAI CHARACTER NO NU
'\u0e1a' # 0xBA -> THAI CHARACTER BO BAIMAI
'\u0e1b' # 0xBB -> THAI CHARACTER PO PLA
'\u0e1c' # 0xBC -> THAI CHARACTER PHO PHUNG
'\u0e1d' # 0xBD -> THAI CHARACTER FO FA
'\u0e1e' # 0xBE -> THAI CHARACTER PHO PHAN
'\u0e1f' # 0xBF -> THAI CHARACTER FO FAN
'\u0e20' # 0xC0 -> THAI CHARACTER PHO SAMPHAO
'\u0e21' # 0xC1 -> THAI CHARACTER MO MA
'\u0e22' # 0xC2 -> THAI CHARACTER YO YAK
'\u0e23' # 0xC3 -> THAI CHARACTER RO RUA
'\u0e24' # 0xC4 -> THAI CHARACTER RU
'\u0e25' # 0xC5 -> THAI CHARACTER LO LING
'\u0e26' # 0xC6 -> THAI CHARACTER LU
'\u0e27' # 0xC7 -> THAI CHARACTER WO WAEN
'\u0e28' # 0xC8 -> THAI CHARACTER SO SALA
'\u0e29' # 0xC9 -> THAI CHARACTER SO RUSI
'\u0e2a' # 0xCA -> THAI CHARACTER SO SUA
'\u0e2b' # 0xCB -> THAI CHARACTER HO HIP
'\u0e2c' # 0xCC -> THAI CHARACTER LO CHULA
'\u0e2d' # 0xCD -> THAI CHARACTER O ANG
'\u0e2e' # 0xCE -> THAI CHARACTER HO NOKHUK
'\u0e2f' # 0xCF -> THAI CHARACTER PAIYANNOI
'\u0e30' # 0xD0 -> THAI CHARACTER SARA A
'\u0e31' # 0xD1 -> THAI CHARACTER MAI HAN-AKAT
'\u0e32' # 0xD2 -> THAI CHARACTER SARA AA
'\u0e33' # 0xD3 -> THAI CHARACTER SARA AM
'\u0e34' # 0xD4 -> THAI CHARACTER SARA I
'\u0e35' # 0xD5 -> THAI CHARACTER SARA II
'\u0e36' # 0xD6 -> THAI CHARACTER SARA UE
'\u0e37' # 0xD7 -> THAI CHARACTER SARA UEE
'\u0e38' # 0xD8 -> THAI CHARACTER SARA U
'\u0e39' # 0xD9 -> THAI CHARACTER SARA UU
'\u0e3a' # 0xDA -> THAI CHARACTER PHINTHU
'\ufffe' # 0xDB -> <undefined>
'\ufffe' # 0xDC -> <undefined>
'\ufffe' # 0xDD -> <undefined>
'\ufffe' # 0xDE -> <undefined>
'\u0e3f' # 0xDF -> THAI CURRENCY SYMBOL BAHT
'\u0e40' # 0xE0 -> THAI CHARACTER SARA E
'\u0e41' # 0xE1 -> THAI CHARACTER SARA AE
'\u0e42' # 0xE2 -> THAI CHARACTER SARA O
'\u0e43' # 0xE3 -> THAI CHARACTER SARA AI MAIMUAN
'\u0e44' # 0xE4 -> THAI CHARACTER SARA AI MAIMALAI
'\u0e45' # 0xE5 -> THAI CHARACTER LAKKHANGYAO
'\u0e46' # 0xE6 -> THAI CHARACTER MAIYAMOK
'\u0e47' # 0xE7 -> THAI CHARACTER MAITAIKHU
'\u0e48' # 0xE8 -> THAI CHARACTER MAI EK
'\u0e49' # 0xE9 -> THAI CHARACTER MAI THO
'\u0e4a' # 0xEA -> THAI CHARACTER MAI TRI
'\u0e4b' # 0xEB -> THAI CHARACTER MAI CHATTAWA
'\u0e4c' # 0xEC -> THAI CHARACTER THANTHAKHAT
'\u0e4d' # 0xED -> THAI CHARACTER NIKHAHIT
'\u0e4e' # 0xEE -> THAI CHARACTER YAMAKKAN
'\u0e4f' # 0xEF -> THAI CHARACTER FONGMAN
'\u0e50' # 0xF0 -> THAI DIGIT ZERO
'\u0e51' # 0xF1 -> THAI DIGIT ONE
'\u0e52' # 0xF2 -> THAI DIGIT TWO
'\u0e53' # 0xF3 -> THAI DIGIT THREE
'\u0e54' # 0xF4 -> THAI DIGIT FOUR
'\u0e55' # 0xF5 -> THAI DIGIT FIVE
'\u0e56' # 0xF6 -> THAI DIGIT SIX
'\u0e57' # 0xF7 -> THAI DIGIT SEVEN
'\u0e58' # 0xF8 -> THAI DIGIT EIGHT
'\u0e59' # 0xF9 -> THAI DIGIT NINE
'\u0e5a' # 0xFA -> THAI CHARACTER ANGKHANKHU
'\u0e5b' # 0xFB -> THAI CHARACTER KHOMUT
'\ufffe' # 0xFC -> <undefined>
'\ufffe' # 0xFD -> <undefined>
'\ufffe' # 0xFE -> <undefined>
'\ufffe' # 0xFF -> <undefined>
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
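# The '\ufffe' entries above mark unassigned bytes (0xDB-0xDE and
# 0xFC-0xFF), so those bytes fail to decode even though the table has
# 256 entries; assigned bytes map directly, e.g. 0xA1 -> U+0E01.
assert b'\xa1'.decode('iso8859-11') == '\u0e01'     # THAI CHARACTER KO KAI
try:
    b'\xdb'.decode('iso8859-11')
except UnicodeDecodeError:
    pass                                            # unassigned byte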
""" Python Character Mapping Codec iso8859_13 generated from 'MAPPINGS/ISO8859/8859-13.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-13',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u201d' # 0xA1 -> RIGHT DOUBLE QUOTATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\u201e' # 0xA5 -> DOUBLE LOW-9 QUOTATION MARK
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xd8' # 0xA8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u0156' # 0xAA -> LATIN CAPITAL LETTER R WITH CEDILLA
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xc6' # 0xAF -> LATIN CAPITAL LETTER AE
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\u201c' # 0xB4 -> LEFT DOUBLE QUOTATION MARK
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xf8' # 0xB8 -> LATIN SMALL LETTER O WITH STROKE
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\u0157' # 0xBA -> LATIN SMALL LETTER R WITH CEDILLA
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xe6' # 0xBF -> LATIN SMALL LETTER AE
'\u0104' # 0xC0 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u012e' # 0xC1 -> LATIN CAPITAL LETTER I WITH OGONEK
'\u0100' # 0xC2 -> LATIN CAPITAL LETTER A WITH MACRON
'\u0106' # 0xC3 -> LATIN CAPITAL LETTER C WITH ACUTE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\u0118' # 0xC6 -> LATIN CAPITAL LETTER E WITH OGONEK
'\u0112' # 0xC7 -> LATIN CAPITAL LETTER E WITH MACRON
'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0179' # 0xCA -> LATIN CAPITAL LETTER Z WITH ACUTE
'\u0116' # 0xCB -> LATIN CAPITAL LETTER E WITH DOT ABOVE
'\u0122' # 0xCC -> LATIN CAPITAL LETTER G WITH CEDILLA
'\u0136' # 0xCD -> LATIN CAPITAL LETTER K WITH CEDILLA
'\u012a' # 0xCE -> LATIN CAPITAL LETTER I WITH MACRON
'\u013b' # 0xCF -> LATIN CAPITAL LETTER L WITH CEDILLA
'\u0160' # 0xD0 -> LATIN CAPITAL LETTER S WITH CARON
'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
'\u0145' # 0xD2 -> LATIN CAPITAL LETTER N WITH CEDILLA
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\u014c' # 0xD4 -> LATIN CAPITAL LETTER O WITH MACRON
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\u0172' # 0xD8 -> LATIN CAPITAL LETTER U WITH OGONEK
'\u0141' # 0xD9 -> LATIN CAPITAL LETTER L WITH STROKE
'\u015a' # 0xDA -> LATIN CAPITAL LETTER S WITH ACUTE
'\u016a' # 0xDB -> LATIN CAPITAL LETTER U WITH MACRON
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u017b' # 0xDD -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\u017d' # 0xDE -> LATIN CAPITAL LETTER Z WITH CARON
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S (German)
'\u0105' # 0xE0 -> LATIN SMALL LETTER A WITH OGONEK
'\u012f' # 0xE1 -> LATIN SMALL LETTER I WITH OGONEK
'\u0101' # 0xE2 -> LATIN SMALL LETTER A WITH MACRON
'\u0107' # 0xE3 -> LATIN SMALL LETTER C WITH ACUTE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\u0119' # 0xE6 -> LATIN SMALL LETTER E WITH OGONEK
'\u0113' # 0xE7 -> LATIN SMALL LETTER E WITH MACRON
'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\u017a' # 0xEA -> LATIN SMALL LETTER Z WITH ACUTE
'\u0117' # 0xEB -> LATIN SMALL LETTER E WITH DOT ABOVE
'\u0123' # 0xEC -> LATIN SMALL LETTER G WITH CEDILLA
'\u0137' # 0xED -> LATIN SMALL LETTER K WITH CEDILLA
'\u012b' # 0xEE -> LATIN SMALL LETTER I WITH MACRON
'\u013c' # 0xEF -> LATIN SMALL LETTER L WITH CEDILLA
'\u0161' # 0xF0 -> LATIN SMALL LETTER S WITH CARON
'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
'\u0146' # 0xF2 -> LATIN SMALL LETTER N WITH CEDILLA
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\u014d' # 0xF4 -> LATIN SMALL LETTER O WITH MACRON
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\u0173' # 0xF8 -> LATIN SMALL LETTER U WITH OGONEK
'\u0142' # 0xF9 -> LATIN SMALL LETTER L WITH STROKE
'\u015b' # 0xFA -> LATIN SMALL LETTER S WITH ACUTE
'\u016b' # 0xFB -> LATIN SMALL LETTER U WITH MACRON
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u017c' # 0xFD -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u017e' # 0xFE -> LATIN SMALL LETTER Z WITH CARON
'\u2019' # 0xFF -> RIGHT SINGLE QUOTATION MARK
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
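# --- Usage sketch (not part of the generated codec file) ---------------------
# Codec.encode wraps codecs.charmap_encode and returns a (bytes, consumed)
# pair; encoding_table is the inverse mapping charmap_build derived from
# decoding_table above.
data, consumed = Codec().encode('\u0173')  # LATIN SMALL LETTER U WITH OGONEK
assert data == b'\xf8' and consumed == 1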
""" Python Character Mapping Codec iso8859_14 generated from 'MAPPINGS/ISO8859/8859-14.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-14',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u1e02' # 0xA1 -> LATIN CAPITAL LETTER B WITH DOT ABOVE
'\u1e03' # 0xA2 -> LATIN SMALL LETTER B WITH DOT ABOVE
'\xa3' # 0xA3 -> POUND SIGN
'\u010a' # 0xA4 -> LATIN CAPITAL LETTER C WITH DOT ABOVE
'\u010b' # 0xA5 -> LATIN SMALL LETTER C WITH DOT ABOVE
'\u1e0a' # 0xA6 -> LATIN CAPITAL LETTER D WITH DOT ABOVE
'\xa7' # 0xA7 -> SECTION SIGN
'\u1e80' # 0xA8 -> LATIN CAPITAL LETTER W WITH GRAVE
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u1e82' # 0xAA -> LATIN CAPITAL LETTER W WITH ACUTE
'\u1e0b' # 0xAB -> LATIN SMALL LETTER D WITH DOT ABOVE
'\u1ef2' # 0xAC -> LATIN CAPITAL LETTER Y WITH GRAVE
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\u0178' # 0xAF -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\u1e1e' # 0xB0 -> LATIN CAPITAL LETTER F WITH DOT ABOVE
'\u1e1f' # 0xB1 -> LATIN SMALL LETTER F WITH DOT ABOVE
'\u0120' # 0xB2 -> LATIN CAPITAL LETTER G WITH DOT ABOVE
'\u0121' # 0xB3 -> LATIN SMALL LETTER G WITH DOT ABOVE
'\u1e40' # 0xB4 -> LATIN CAPITAL LETTER M WITH DOT ABOVE
'\u1e41' # 0xB5 -> LATIN SMALL LETTER M WITH DOT ABOVE
'\xb6' # 0xB6 -> PILCROW SIGN
'\u1e56' # 0xB7 -> LATIN CAPITAL LETTER P WITH DOT ABOVE
'\u1e81' # 0xB8 -> LATIN SMALL LETTER W WITH GRAVE
'\u1e57' # 0xB9 -> LATIN SMALL LETTER P WITH DOT ABOVE
'\u1e83' # 0xBA -> LATIN SMALL LETTER W WITH ACUTE
'\u1e60' # 0xBB -> LATIN CAPITAL LETTER S WITH DOT ABOVE
'\u1ef3' # 0xBC -> LATIN SMALL LETTER Y WITH GRAVE
'\u1e84' # 0xBD -> LATIN CAPITAL LETTER W WITH DIAERESIS
'\u1e85' # 0xBE -> LATIN SMALL LETTER W WITH DIAERESIS
'\u1e61' # 0xBF -> LATIN SMALL LETTER S WITH DOT ABOVE
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u0174' # 0xD0 -> LATIN CAPITAL LETTER W WITH CIRCUMFLEX
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\u1e6a' # 0xD7 -> LATIN CAPITAL LETTER T WITH DOT ABOVE
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\u0176' # 0xDE -> LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\u0175' # 0xF0 -> LATIN SMALL LETTER W WITH CIRCUMFLEX
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\u1e6b' # 0xF7 -> LATIN SMALL LETTER T WITH DOT ABOVE
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\u0177' # 0xFE -> LATIN SMALL LETTER Y WITH CIRCUMFLEX
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
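# --- Usage sketch (not part of the generated codec file) ---------------------
# A charmap codec is stateless (one byte <-> one character), so the
# IncrementalDecoder can be fed arbitrary chunks with no buffering between
# calls.
_dec = IncrementalDecoder('strict')
assert _dec.decode(b'\xd0') + _dec.decode(b'\xf0') == '\u0174\u0175'  # W/w WITH CIRCUMFLEX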
""" Python Character Mapping Codec iso8859_15 generated from 'MAPPINGS/ISO8859/8859-15.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-15',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\u20ac' # 0xA4 -> EURO SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\u0160' # 0xA6 -> LATIN CAPITAL LETTER S WITH CARON
'\xa7' # 0xA7 -> SECTION SIGN
'\u0161' # 0xA8 -> LATIN SMALL LETTER S WITH CARON
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\u017d' # 0xB4 -> LATIN CAPITAL LETTER Z WITH CARON
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\u017e' # 0xB8 -> LATIN SMALL LETTER Z WITH CARON
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u0152' # 0xBC -> LATIN CAPITAL LIGATURE OE
'\u0153' # 0xBD -> LATIN SMALL LIGATURE OE
'\u0178' # 0xBE -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0xFE -> LATIN SMALL LETTER THORN
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
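# --- Usage sketch (not part of the generated codec file) ---------------------
# getregentry() is what the encodings package hands to codecs.register at
# import time, making the codec reachable by name. ISO 8859-15's headline
# change over Latin-1 is the EURO SIGN at 0xA4, exercised directly here.
assert codecs.charmap_encode('\u20ac', 'strict', encoding_table)[0] == b'\xa4'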
""" Python Character Mapping Codec iso8859_16 generated from 'MAPPINGS/ISO8859/8859-16.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-16',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u0105' # 0xA2 -> LATIN SMALL LETTER A WITH OGONEK
'\u0141' # 0xA3 -> LATIN CAPITAL LETTER L WITH STROKE
'\u20ac' # 0xA4 -> EURO SIGN
'\u201e' # 0xA5 -> DOUBLE LOW-9 QUOTATION MARK
'\u0160' # 0xA6 -> LATIN CAPITAL LETTER S WITH CARON
'\xa7' # 0xA7 -> SECTION SIGN
'\u0161' # 0xA8 -> LATIN SMALL LETTER S WITH CARON
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u0218' # 0xAA -> LATIN CAPITAL LETTER S WITH COMMA BELOW
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u0179' # 0xAC -> LATIN CAPITAL LETTER Z WITH ACUTE
'\xad' # 0xAD -> SOFT HYPHEN
'\u017a' # 0xAE -> LATIN SMALL LETTER Z WITH ACUTE
'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u010c' # 0xB2 -> LATIN CAPITAL LETTER C WITH CARON
'\u0142' # 0xB3 -> LATIN SMALL LETTER L WITH STROKE
'\u017d' # 0xB4 -> LATIN CAPITAL LETTER Z WITH CARON
'\u201d' # 0xB5 -> RIGHT DOUBLE QUOTATION MARK
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\u017e' # 0xB8 -> LATIN SMALL LETTER Z WITH CARON
'\u010d' # 0xB9 -> LATIN SMALL LETTER C WITH CARON
'\u0219' # 0xBA -> LATIN SMALL LETTER S WITH COMMA BELOW
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u0152' # 0xBC -> LATIN CAPITAL LIGATURE OE
'\u0153' # 0xBD -> LATIN SMALL LIGATURE OE
'\u0178' # 0xBE -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\u0106' # 0xC5 -> LATIN CAPITAL LETTER C WITH ACUTE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u0150' # 0xD5 -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\u015a' # 0xD7 -> LATIN CAPITAL LETTER S WITH ACUTE
'\u0170' # 0xD8 -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u0118' # 0xDD -> LATIN CAPITAL LETTER E WITH OGONEK
'\u021a' # 0xDE -> LATIN CAPITAL LETTER T WITH COMMA BELOW
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\u0107' # 0xE5 -> LATIN SMALL LETTER C WITH ACUTE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\u0151' # 0xF5 -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\u015b' # 0xF7 -> LATIN SMALL LETTER S WITH ACUTE
'\u0171' # 0xF8 -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u0119' # 0xFD -> LATIN SMALL LETTER E WITH OGONEK
'\u021b' # 0xFE -> LATIN SMALL LETTER T WITH COMMA BELOW
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
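# --- Usage sketch (not part of the generated codec file) ---------------------
# errors='replace' substitutes b'?' for characters missing from the table.
# ISO 8859-16 reassigns 0xA4 to the EURO SIGN, so CURRENCY SIGN (U+00A4)
# itself is unencodable in this charset.
assert codecs.charmap_encode('\xa4', 'replace', encoding_table)[0] == b'?'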
""" Python Character Mapping Codec iso8859_2 generated from 'MAPPINGS/ISO8859/8859-2.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-2',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u02d8' # 0xA2 -> BREVE
'\u0141' # 0xA3 -> LATIN CAPITAL LETTER L WITH STROKE
'\xa4' # 0xA4 -> CURRENCY SIGN
'\u013d' # 0xA5 -> LATIN CAPITAL LETTER L WITH CARON
'\u015a' # 0xA6 -> LATIN CAPITAL LETTER S WITH ACUTE
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\u0160' # 0xA9 -> LATIN CAPITAL LETTER S WITH CARON
'\u015e' # 0xAA -> LATIN CAPITAL LETTER S WITH CEDILLA
'\u0164' # 0xAB -> LATIN CAPITAL LETTER T WITH CARON
'\u0179' # 0xAC -> LATIN CAPITAL LETTER Z WITH ACUTE
'\xad' # 0xAD -> SOFT HYPHEN
'\u017d' # 0xAE -> LATIN CAPITAL LETTER Z WITH CARON
'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\xb0' # 0xB0 -> DEGREE SIGN
'\u0105' # 0xB1 -> LATIN SMALL LETTER A WITH OGONEK
'\u02db' # 0xB2 -> OGONEK
'\u0142' # 0xB3 -> LATIN SMALL LETTER L WITH STROKE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\u013e' # 0xB5 -> LATIN SMALL LETTER L WITH CARON
'\u015b' # 0xB6 -> LATIN SMALL LETTER S WITH ACUTE
'\u02c7' # 0xB7 -> CARON
'\xb8' # 0xB8 -> CEDILLA
'\u0161' # 0xB9 -> LATIN SMALL LETTER S WITH CARON
'\u015f' # 0xBA -> LATIN SMALL LETTER S WITH CEDILLA
'\u0165' # 0xBB -> LATIN SMALL LETTER T WITH CARON
'\u017a' # 0xBC -> LATIN SMALL LETTER Z WITH ACUTE
'\u02dd' # 0xBD -> DOUBLE ACUTE ACCENT
'\u017e' # 0xBE -> LATIN SMALL LETTER Z WITH CARON
'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u0154' # 0xC0 -> LATIN CAPITAL LETTER R WITH ACUTE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\u0102' # 0xC3 -> LATIN CAPITAL LETTER A WITH BREVE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\u0139' # 0xC5 -> LATIN CAPITAL LETTER L WITH ACUTE
'\u0106' # 0xC6 -> LATIN CAPITAL LETTER C WITH ACUTE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\u011a' # 0xCC -> LATIN CAPITAL LETTER E WITH CARON
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\u010e' # 0xCF -> LATIN CAPITAL LETTER D WITH CARON
'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
'\u0143' # 0xD1 -> LATIN CAPITAL LETTER N WITH ACUTE
'\u0147' # 0xD2 -> LATIN CAPITAL LETTER N WITH CARON
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u0150' # 0xD5 -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\u0158' # 0xD8 -> LATIN CAPITAL LETTER R WITH CARON
'\u016e' # 0xD9 -> LATIN CAPITAL LETTER U WITH RING ABOVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\u0170' # 0xDB -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\u0162' # 0xDE -> LATIN CAPITAL LETTER T WITH CEDILLA
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\u0155' # 0xE0 -> LATIN SMALL LETTER R WITH ACUTE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\u0103' # 0xE3 -> LATIN SMALL LETTER A WITH BREVE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\u013a' # 0xE5 -> LATIN SMALL LETTER L WITH ACUTE
'\u0107' # 0xE6 -> LATIN SMALL LETTER C WITH ACUTE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\u011b' # 0xEC -> LATIN SMALL LETTER E WITH CARON
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\u010f' # 0xEF -> LATIN SMALL LETTER D WITH CARON
'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
'\u0144' # 0xF1 -> LATIN SMALL LETTER N WITH ACUTE
'\u0148' # 0xF2 -> LATIN SMALL LETTER N WITH CARON
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\u0151' # 0xF5 -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\u0159' # 0xF8 -> LATIN SMALL LETTER R WITH CARON
'\u016f' # 0xF9 -> LATIN SMALL LETTER U WITH RING ABOVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\u0171' # 0xFB -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\u0163' # 0xFE -> LATIN SMALL LETTER T WITH CEDILLA
'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
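# --- Usage sketch (not part of the generated codec file) ---------------------
# StreamWriter/StreamReader just mix Codec into codecs' stream wrappers, so
# any binary file-like object works; io.BytesIO stands in for a real file.
import io
_buf = io.BytesIO()
StreamWriter(_buf).write('\u0159')  # LATIN SMALL LETTER R WITH CARON
assert _buf.getvalue() == b'\xf8'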
""" Python Character Mapping Codec iso8859_3 generated from 'MAPPINGS/ISO8859/8859-3.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-3',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0126' # 0xA1 -> LATIN CAPITAL LETTER H WITH STROKE
'\u02d8' # 0xA2 -> BREVE
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\ufffe' # 0xA5 -> UNDEFINED
'\u0124' # 0xA6 -> LATIN CAPITAL LETTER H WITH CIRCUMFLEX
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\u0130' # 0xA9 -> LATIN CAPITAL LETTER I WITH DOT ABOVE
'\u015e' # 0xAA -> LATIN CAPITAL LETTER S WITH CEDILLA
'\u011e' # 0xAB -> LATIN CAPITAL LETTER G WITH BREVE
'\u0134' # 0xAC -> LATIN CAPITAL LETTER J WITH CIRCUMFLEX
'\xad' # 0xAD -> SOFT HYPHEN
'\ufffe' # 0xAE -> UNDEFINED
'\u017b' # 0xAF -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\xb0' # 0xB0 -> DEGREE SIGN
'\u0127' # 0xB1 -> LATIN SMALL LETTER H WITH STROKE
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\u0125' # 0xB6 -> LATIN SMALL LETTER H WITH CIRCUMFLEX
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\u0131' # 0xB9 -> LATIN SMALL LETTER DOTLESS I
'\u015f' # 0xBA -> LATIN SMALL LETTER S WITH CEDILLA
'\u011f' # 0xBB -> LATIN SMALL LETTER G WITH BREVE
'\u0135' # 0xBC -> LATIN SMALL LETTER J WITH CIRCUMFLEX
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\ufffe' # 0xBE -> UNDEFINED
'\u017c' # 0xBF -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\ufffe' # 0xC3 -> UNDEFINED
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\u010a' # 0xC5 -> LATIN CAPITAL LETTER C WITH DOT ABOVE
'\u0108' # 0xC6 -> LATIN CAPITAL LETTER C WITH CIRCUMFLEX
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\ufffe' # 0xD0 -> UNDEFINED
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u0120' # 0xD5 -> LATIN CAPITAL LETTER G WITH DOT ABOVE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\u011c' # 0xD8 -> LATIN CAPITAL LETTER G WITH CIRCUMFLEX
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u016c' # 0xDD -> LATIN CAPITAL LETTER U WITH BREVE
'\u015c' # 0xDE -> LATIN CAPITAL LETTER S WITH CIRCUMFLEX
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\ufffe' # 0xE3 -> UNDEFINED
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\u010b' # 0xE5 -> LATIN SMALL LETTER C WITH DOT ABOVE
'\u0109' # 0xE6 -> LATIN SMALL LETTER C WITH CIRCUMFLEX
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\ufffe' # 0xF0 -> UNDEFINED
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\u0121' # 0xF5 -> LATIN SMALL LETTER G WITH DOT ABOVE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\u011d' # 0xF8 -> LATIN SMALL LETTER G WITH CIRCUMFLEX
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u016d' # 0xFD -> LATIN SMALL LETTER U WITH BREVE
'\u015d' # 0xFE -> LATIN SMALL LETTER S WITH CIRCUMFLEX
'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
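### Usage sketch (not part of the generated file; a minimal example that
### assumes the iso8859-3 tables above are in scope). charmap_decode and
### charmap_encode return (result, length) pairs; table entries of
### '\ufffe' mark undefined bytes, which raise UnicodeDecodeError under
### errors='strict'.
text, _ = codecs.charmap_decode(b'\xa1', 'strict', decoding_table)
assert text == '\u0126'   # 0xA1 -> LATIN CAPITAL LETTER H WITH STROKE
data, _ = codecs.charmap_encode(text, 'strict', encoding_table)
assert data == b'\xa1'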
""" Python Character Mapping Codec iso8859_4 generated from 'MAPPINGS/ISO8859/8859-4.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-4',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0104' # 0xA1 -> LATIN CAPITAL LETTER A WITH OGONEK
'\u0138' # 0xA2 -> LATIN SMALL LETTER KRA
'\u0156' # 0xA3 -> LATIN CAPITAL LETTER R WITH CEDILLA
'\xa4' # 0xA4 -> CURRENCY SIGN
'\u0128' # 0xA5 -> LATIN CAPITAL LETTER I WITH TILDE
'\u013b' # 0xA6 -> LATIN CAPITAL LETTER L WITH CEDILLA
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\u0160' # 0xA9 -> LATIN CAPITAL LETTER S WITH CARON
'\u0112' # 0xAA -> LATIN CAPITAL LETTER E WITH MACRON
'\u0122' # 0xAB -> LATIN CAPITAL LETTER G WITH CEDILLA
'\u0166' # 0xAC -> LATIN CAPITAL LETTER T WITH STROKE
'\xad' # 0xAD -> SOFT HYPHEN
'\u017d' # 0xAE -> LATIN CAPITAL LETTER Z WITH CARON
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\u0105' # 0xB1 -> LATIN SMALL LETTER A WITH OGONEK
'\u02db' # 0xB2 -> OGONEK
'\u0157' # 0xB3 -> LATIN SMALL LETTER R WITH CEDILLA
'\xb4' # 0xB4 -> ACUTE ACCENT
'\u0129' # 0xB5 -> LATIN SMALL LETTER I WITH TILDE
'\u013c' # 0xB6 -> LATIN SMALL LETTER L WITH CEDILLA
'\u02c7' # 0xB7 -> CARON
'\xb8' # 0xB8 -> CEDILLA
'\u0161' # 0xB9 -> LATIN SMALL LETTER S WITH CARON
'\u0113' # 0xBA -> LATIN SMALL LETTER E WITH MACRON
'\u0123' # 0xBB -> LATIN SMALL LETTER G WITH CEDILLA
'\u0167' # 0xBC -> LATIN SMALL LETTER T WITH STROKE
'\u014a' # 0xBD -> LATIN CAPITAL LETTER ENG
'\u017e' # 0xBE -> LATIN SMALL LETTER Z WITH CARON
'\u014b' # 0xBF -> LATIN SMALL LETTER ENG
'\u0100' # 0xC0 -> LATIN CAPITAL LETTER A WITH MACRON
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\u012e' # 0xC7 -> LATIN CAPITAL LETTER I WITH OGONEK
'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0118' # 0xCA -> LATIN CAPITAL LETTER E WITH OGONEK
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\u0116' # 0xCC -> LATIN CAPITAL LETTER E WITH DOT ABOVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\u012a' # 0xCF -> LATIN CAPITAL LETTER I WITH MACRON
'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
'\u0145' # 0xD1 -> LATIN CAPITAL LETTER N WITH CEDILLA
'\u014c' # 0xD2 -> LATIN CAPITAL LETTER O WITH MACRON
'\u0136' # 0xD3 -> LATIN CAPITAL LETTER K WITH CEDILLA
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\u0172' # 0xD9 -> LATIN CAPITAL LETTER U WITH OGONEK
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u0168' # 0xDD -> LATIN CAPITAL LETTER U WITH TILDE
'\u016a' # 0xDE -> LATIN CAPITAL LETTER U WITH MACRON
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\u0101' # 0xE0 -> LATIN SMALL LETTER A WITH MACRON
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\u012f' # 0xE7 -> LATIN SMALL LETTER I WITH OGONEK
'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\u0119' # 0xEA -> LATIN SMALL LETTER E WITH OGONEK
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\u0117' # 0xEC -> LATIN SMALL LETTER E WITH DOT ABOVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\u012b' # 0xEF -> LATIN SMALL LETTER I WITH MACRON
'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
'\u0146' # 0xF1 -> LATIN SMALL LETTER N WITH CEDILLA
'\u014d' # 0xF2 -> LATIN SMALL LETTER O WITH MACRON
'\u0137' # 0xF3 -> LATIN SMALL LETTER K WITH CEDILLA
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\u0173' # 0xF9 -> LATIN SMALL LETTER U WITH OGONEK
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u0169' # 0xFD -> LATIN SMALL LETTER U WITH TILDE
'\u016b' # 0xFE -> LATIN SMALL LETTER U WITH MACRON
'\u02d9' # 0xFF -> DOT ABOVE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
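### Usage sketch (not part of the generated file): how the encodings
### package consumes getregentry(). codecs.lookup() imports the module,
### calls getregentry() once, and caches the returned CodecInfo; the
### bound methods below use the iso8859-4 tables defined above.
info = getregentry()
assert info.name == 'iso8859-4'
text, consumed = info.decode(b'\xe0')
assert (text, consumed) == ('\u0101', 1)   # 0xE0 -> LATIN SMALL LETTER A WITH MACRON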
""" Python Character Mapping Codec iso8859_5 generated from 'MAPPINGS/ISO8859/8859-5.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-5',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u0401' # 0xA1 -> CYRILLIC CAPITAL LETTER IO
'\u0402' # 0xA2 -> CYRILLIC CAPITAL LETTER DJE
'\u0403' # 0xA3 -> CYRILLIC CAPITAL LETTER GJE
'\u0404' # 0xA4 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
'\u0405' # 0xA5 -> CYRILLIC CAPITAL LETTER DZE
'\u0406' # 0xA6 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0407' # 0xA7 -> CYRILLIC CAPITAL LETTER YI
'\u0408' # 0xA8 -> CYRILLIC CAPITAL LETTER JE
'\u0409' # 0xA9 -> CYRILLIC CAPITAL LETTER LJE
'\u040a' # 0xAA -> CYRILLIC CAPITAL LETTER NJE
'\u040b' # 0xAB -> CYRILLIC CAPITAL LETTER TSHE
'\u040c' # 0xAC -> CYRILLIC CAPITAL LETTER KJE
'\xad' # 0xAD -> SOFT HYPHEN
'\u040e' # 0xAE -> CYRILLIC CAPITAL LETTER SHORT U
'\u040f' # 0xAF -> CYRILLIC CAPITAL LETTER DZHE
'\u0410' # 0xB0 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0xB1 -> CYRILLIC CAPITAL LETTER BE
'\u0412' # 0xB2 -> CYRILLIC CAPITAL LETTER VE
'\u0413' # 0xB3 -> CYRILLIC CAPITAL LETTER GHE
'\u0414' # 0xB4 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0xB5 -> CYRILLIC CAPITAL LETTER IE
'\u0416' # 0xB6 -> CYRILLIC CAPITAL LETTER ZHE
'\u0417' # 0xB7 -> CYRILLIC CAPITAL LETTER ZE
'\u0418' # 0xB8 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0xB9 -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0xBA -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0xBB -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0xBC -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0xBD -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0xBE -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0xBF -> CYRILLIC CAPITAL LETTER PE
'\u0420' # 0xC0 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0xC1 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0xC2 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0xC3 -> CYRILLIC CAPITAL LETTER U
'\u0424' # 0xC4 -> CYRILLIC CAPITAL LETTER EF
'\u0425' # 0xC5 -> CYRILLIC CAPITAL LETTER HA
'\u0426' # 0xC6 -> CYRILLIC CAPITAL LETTER TSE
'\u0427' # 0xC7 -> CYRILLIC CAPITAL LETTER CHE
'\u0428' # 0xC8 -> CYRILLIC CAPITAL LETTER SHA
'\u0429' # 0xC9 -> CYRILLIC CAPITAL LETTER SHCHA
'\u042a' # 0xCA -> CYRILLIC CAPITAL LETTER HARD SIGN
'\u042b' # 0xCB -> CYRILLIC CAPITAL LETTER YERU
'\u042c' # 0xCC -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042d' # 0xCD -> CYRILLIC CAPITAL LETTER E
'\u042e' # 0xCE -> CYRILLIC CAPITAL LETTER YU
'\u042f' # 0xCF -> CYRILLIC CAPITAL LETTER YA
'\u0430' # 0xD0 -> CYRILLIC SMALL LETTER A
'\u0431' # 0xD1 -> CYRILLIC SMALL LETTER BE
'\u0432' # 0xD2 -> CYRILLIC SMALL LETTER VE
'\u0433' # 0xD3 -> CYRILLIC SMALL LETTER GHE
'\u0434' # 0xD4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0xD5 -> CYRILLIC SMALL LETTER IE
'\u0436' # 0xD6 -> CYRILLIC SMALL LETTER ZHE
'\u0437' # 0xD7 -> CYRILLIC SMALL LETTER ZE
'\u0438' # 0xD8 -> CYRILLIC SMALL LETTER I
'\u0439' # 0xD9 -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0xDA -> CYRILLIC SMALL LETTER KA
'\u043b' # 0xDB -> CYRILLIC SMALL LETTER EL
'\u043c' # 0xDC -> CYRILLIC SMALL LETTER EM
'\u043d' # 0xDD -> CYRILLIC SMALL LETTER EN
'\u043e' # 0xDE -> CYRILLIC SMALL LETTER O
'\u043f' # 0xDF -> CYRILLIC SMALL LETTER PE
'\u0440' # 0xE0 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0xE1 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0xE2 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0xE3 -> CYRILLIC SMALL LETTER U
'\u0444' # 0xE4 -> CYRILLIC SMALL LETTER EF
'\u0445' # 0xE5 -> CYRILLIC SMALL LETTER HA
'\u0446' # 0xE6 -> CYRILLIC SMALL LETTER TSE
'\u0447' # 0xE7 -> CYRILLIC SMALL LETTER CHE
'\u0448' # 0xE8 -> CYRILLIC SMALL LETTER SHA
'\u0449' # 0xE9 -> CYRILLIC SMALL LETTER SHCHA
'\u044a' # 0xEA -> CYRILLIC SMALL LETTER HARD SIGN
'\u044b' # 0xEB -> CYRILLIC SMALL LETTER YERU
'\u044c' # 0xEC -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044d' # 0xED -> CYRILLIC SMALL LETTER E
'\u044e' # 0xEE -> CYRILLIC SMALL LETTER YU
'\u044f' # 0xEF -> CYRILLIC SMALL LETTER YA
'\u2116' # 0xF0 -> NUMERO SIGN
'\u0451' # 0xF1 -> CYRILLIC SMALL LETTER IO
'\u0452' # 0xF2 -> CYRILLIC SMALL LETTER DJE
'\u0453' # 0xF3 -> CYRILLIC SMALL LETTER GJE
'\u0454' # 0xF4 -> CYRILLIC SMALL LETTER UKRAINIAN IE
'\u0455' # 0xF5 -> CYRILLIC SMALL LETTER DZE
'\u0456' # 0xF6 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0457' # 0xF7 -> CYRILLIC SMALL LETTER YI
'\u0458' # 0xF8 -> CYRILLIC SMALL LETTER JE
'\u0459' # 0xF9 -> CYRILLIC SMALL LETTER LJE
'\u045a' # 0xFA -> CYRILLIC SMALL LETTER NJE
'\u045b' # 0xFB -> CYRILLIC SMALL LETTER TSHE
'\u045c' # 0xFC -> CYRILLIC SMALL LETTER KJE
'\xa7' # 0xFD -> SECTION SIGN
'\u045e' # 0xFE -> CYRILLIC SMALL LETTER SHORT U
'\u045f' # 0xFF -> CYRILLIC SMALL LETTER DZHE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
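### Usage sketch (not part of the generated file): feeding bytes one at a
### time to the IncrementalDecoder defined above. Charmap codecs map one
### byte to one character and keep no state, so partial input needs no
### buffering.
dec = IncrementalDecoder()
chars = ''.join(dec.decode(bytes([b])) for b in b'\xbc\xb8\xc0')
assert chars == '\u041c\u0418\u0420'   # 'МИР'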
""" Python Character Mapping Codec iso8859_6 generated from 'MAPPINGS/ISO8859/8859-6.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-6',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\ufffe' # 0xA1 -> UNDEFINED
'\ufffe' # 0xA2 -> UNDEFINED
'\ufffe' # 0xA3 -> UNDEFINED
'\xa4' # 0xA4 -> CURRENCY SIGN
'\ufffe' # 0xA5 -> UNDEFINED
'\ufffe' # 0xA6 -> UNDEFINED
'\ufffe' # 0xA7 -> UNDEFINED
'\ufffe' # 0xA8 -> UNDEFINED
'\ufffe' # 0xA9 -> UNDEFINED
'\ufffe' # 0xAA -> UNDEFINED
'\ufffe' # 0xAB -> UNDEFINED
'\u060c' # 0xAC -> ARABIC COMMA
'\xad' # 0xAD -> SOFT HYPHEN
'\ufffe' # 0xAE -> UNDEFINED
'\ufffe' # 0xAF -> UNDEFINED
'\ufffe' # 0xB0 -> UNDEFINED
'\ufffe' # 0xB1 -> UNDEFINED
'\ufffe' # 0xB2 -> UNDEFINED
'\ufffe' # 0xB3 -> UNDEFINED
'\ufffe' # 0xB4 -> UNDEFINED
'\ufffe' # 0xB5 -> UNDEFINED
'\ufffe' # 0xB6 -> UNDEFINED
'\ufffe' # 0xB7 -> UNDEFINED
'\ufffe' # 0xB8 -> UNDEFINED
'\ufffe' # 0xB9 -> UNDEFINED
'\ufffe' # 0xBA -> UNDEFINED
'\u061b' # 0xBB -> ARABIC SEMICOLON
'\ufffe' # 0xBC -> UNDEFINED
'\ufffe' # 0xBD -> UNDEFINED
'\ufffe' # 0xBE -> UNDEFINED
'\u061f' # 0xBF -> ARABIC QUESTION MARK
'\ufffe' # 0xC0 -> UNDEFINED
'\u0621' # 0xC1 -> ARABIC LETTER HAMZA
'\u0622' # 0xC2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
'\u0623' # 0xC3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
'\u0624' # 0xC4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
'\u0625' # 0xC5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
'\u0626' # 0xC6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
'\u0627' # 0xC7 -> ARABIC LETTER ALEF
'\u0628' # 0xC8 -> ARABIC LETTER BEH
'\u0629' # 0xC9 -> ARABIC LETTER TEH MARBUTA
'\u062a' # 0xCA -> ARABIC LETTER TEH
'\u062b' # 0xCB -> ARABIC LETTER THEH
'\u062c' # 0xCC -> ARABIC LETTER JEEM
'\u062d' # 0xCD -> ARABIC LETTER HAH
'\u062e' # 0xCE -> ARABIC LETTER KHAH
'\u062f' # 0xCF -> ARABIC LETTER DAL
'\u0630' # 0xD0 -> ARABIC LETTER THAL
'\u0631' # 0xD1 -> ARABIC LETTER REH
'\u0632' # 0xD2 -> ARABIC LETTER ZAIN
'\u0633' # 0xD3 -> ARABIC LETTER SEEN
'\u0634' # 0xD4 -> ARABIC LETTER SHEEN
'\u0635' # 0xD5 -> ARABIC LETTER SAD
'\u0636' # 0xD6 -> ARABIC LETTER DAD
'\u0637' # 0xD7 -> ARABIC LETTER TAH
'\u0638' # 0xD8 -> ARABIC LETTER ZAH
'\u0639' # 0xD9 -> ARABIC LETTER AIN
'\u063a' # 0xDA -> ARABIC LETTER GHAIN
'\ufffe' # 0xDB -> UNDEFINED
'\ufffe' # 0xDC -> UNDEFINED
'\ufffe' # 0xDD -> UNDEFINED
'\ufffe' # 0xDE -> UNDEFINED
'\ufffe' # 0xDF -> UNDEFINED
'\u0640' # 0xE0 -> ARABIC TATWEEL
'\u0641' # 0xE1 -> ARABIC LETTER FEH
'\u0642' # 0xE2 -> ARABIC LETTER QAF
'\u0643' # 0xE3 -> ARABIC LETTER KAF
'\u0644' # 0xE4 -> ARABIC LETTER LAM
'\u0645' # 0xE5 -> ARABIC LETTER MEEM
'\u0646' # 0xE6 -> ARABIC LETTER NOON
'\u0647' # 0xE7 -> ARABIC LETTER HEH
'\u0648' # 0xE8 -> ARABIC LETTER WAW
'\u0649' # 0xE9 -> ARABIC LETTER ALEF MAKSURA
'\u064a' # 0xEA -> ARABIC LETTER YEH
'\u064b' # 0xEB -> ARABIC FATHATAN
'\u064c' # 0xEC -> ARABIC DAMMATAN
'\u064d' # 0xED -> ARABIC KASRATAN
'\u064e' # 0xEE -> ARABIC FATHA
'\u064f' # 0xEF -> ARABIC DAMMA
'\u0650' # 0xF0 -> ARABIC KASRA
'\u0651' # 0xF1 -> ARABIC SHADDA
'\u0652' # 0xF2 -> ARABIC SUKUN
'\ufffe' # 0xF3 -> UNDEFINED
'\ufffe' # 0xF4 -> UNDEFINED
'\ufffe' # 0xF5 -> UNDEFINED
'\ufffe' # 0xF6 -> UNDEFINED
'\ufffe' # 0xF7 -> UNDEFINED
'\ufffe' # 0xF8 -> UNDEFINED
'\ufffe' # 0xF9 -> UNDEFINED
'\ufffe' # 0xFA -> UNDEFINED
'\ufffe' # 0xFB -> UNDEFINED
'\ufffe' # 0xFC -> UNDEFINED
'\ufffe' # 0xFD -> UNDEFINED
'\ufffe' # 0xFE -> UNDEFINED
'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
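### Usage sketch (not part of the generated file): ISO 8859-6 leaves most
### of the 0xA0-0xFF range unassigned ('\ufffe' above), so those bytes
### decode only under a lenient error handler such as 'replace'.
assert codecs.charmap_decode(b'\xc7', 'strict', decoding_table)[0] == '\u0627'   # ARABIC LETTER ALEF
assert codecs.charmap_decode(b'\xa1', 'replace', decoding_table)[0] == '\ufffd'  # unassigned byte -> U+FFFD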
""" Python Character Mapping Codec iso8859_7 generated from 'MAPPINGS/ISO8859/8859-7.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-7',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u2018' # 0xA1 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xA2 -> RIGHT SINGLE QUOTATION MARK
'\xa3' # 0xA3 -> POUND SIGN
'\u20ac' # 0xA4 -> EURO SIGN
'\u20af' # 0xA5 -> DRACHMA SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u037a' # 0xAA -> GREEK YPOGEGRAMMENI
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\ufffe' # 0xAE -> UNDEFINED
'\u2015' # 0xAF -> HORIZONTAL BAR
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\u0384' # 0xB4 -> GREEK TONOS
'\u0385' # 0xB5 -> GREEK DIALYTIKA TONOS
'\u0386' # 0xB6 -> GREEK CAPITAL LETTER ALPHA WITH TONOS
'\xb7' # 0xB7 -> MIDDLE DOT
'\u0388' # 0xB8 -> GREEK CAPITAL LETTER EPSILON WITH TONOS
'\u0389' # 0xB9 -> GREEK CAPITAL LETTER ETA WITH TONOS
'\u038a' # 0xBA -> GREEK CAPITAL LETTER IOTA WITH TONOS
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u038c' # 0xBC -> GREEK CAPITAL LETTER OMICRON WITH TONOS
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\u038e' # 0xBE -> GREEK CAPITAL LETTER UPSILON WITH TONOS
'\u038f' # 0xBF -> GREEK CAPITAL LETTER OMEGA WITH TONOS
'\u0390' # 0xC0 -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
'\u0391' # 0xC1 -> GREEK CAPITAL LETTER ALPHA
'\u0392' # 0xC2 -> GREEK CAPITAL LETTER BETA
'\u0393' # 0xC3 -> GREEK CAPITAL LETTER GAMMA
'\u0394' # 0xC4 -> GREEK CAPITAL LETTER DELTA
'\u0395' # 0xC5 -> GREEK CAPITAL LETTER EPSILON
'\u0396' # 0xC6 -> GREEK CAPITAL LETTER ZETA
'\u0397' # 0xC7 -> GREEK CAPITAL LETTER ETA
'\u0398' # 0xC8 -> GREEK CAPITAL LETTER THETA
'\u0399' # 0xC9 -> GREEK CAPITAL LETTER IOTA
'\u039a' # 0xCA -> GREEK CAPITAL LETTER KAPPA
'\u039b' # 0xCB -> GREEK CAPITAL LETTER LAMDA
'\u039c' # 0xCC -> GREEK CAPITAL LETTER MU
'\u039d' # 0xCD -> GREEK CAPITAL LETTER NU
'\u039e' # 0xCE -> GREEK CAPITAL LETTER XI
'\u039f' # 0xCF -> GREEK CAPITAL LETTER OMICRON
'\u03a0' # 0xD0 -> GREEK CAPITAL LETTER PI
'\u03a1' # 0xD1 -> GREEK CAPITAL LETTER RHO
'\ufffe' # 0xD2 -> UNDEFINED
'\u03a3' # 0xD3 -> GREEK CAPITAL LETTER SIGMA
'\u03a4' # 0xD4 -> GREEK CAPITAL LETTER TAU
'\u03a5' # 0xD5 -> GREEK CAPITAL LETTER UPSILON
'\u03a6' # 0xD6 -> GREEK CAPITAL LETTER PHI
'\u03a7' # 0xD7 -> GREEK CAPITAL LETTER CHI
'\u03a8' # 0xD8 -> GREEK CAPITAL LETTER PSI
'\u03a9' # 0xD9 -> GREEK CAPITAL LETTER OMEGA
'\u03aa' # 0xDA -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
'\u03ab' # 0xDB -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
'\u03ac' # 0xDC -> GREEK SMALL LETTER ALPHA WITH TONOS
'\u03ad' # 0xDD -> GREEK SMALL LETTER EPSILON WITH TONOS
'\u03ae' # 0xDE -> GREEK SMALL LETTER ETA WITH TONOS
'\u03af' # 0xDF -> GREEK SMALL LETTER IOTA WITH TONOS
'\u03b0' # 0xE0 -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
'\u03b1' # 0xE1 -> GREEK SMALL LETTER ALPHA
'\u03b2' # 0xE2 -> GREEK SMALL LETTER BETA
'\u03b3' # 0xE3 -> GREEK SMALL LETTER GAMMA
'\u03b4' # 0xE4 -> GREEK SMALL LETTER DELTA
'\u03b5' # 0xE5 -> GREEK SMALL LETTER EPSILON
'\u03b6' # 0xE6 -> GREEK SMALL LETTER ZETA
'\u03b7' # 0xE7 -> GREEK SMALL LETTER ETA
'\u03b8' # 0xE8 -> GREEK SMALL LETTER THETA
'\u03b9' # 0xE9 -> GREEK SMALL LETTER IOTA
'\u03ba' # 0xEA -> GREEK SMALL LETTER KAPPA
'\u03bb' # 0xEB -> GREEK SMALL LETTER LAMDA
'\u03bc' # 0xEC -> GREEK SMALL LETTER MU
'\u03bd' # 0xED -> GREEK SMALL LETTER NU
'\u03be' # 0xEE -> GREEK SMALL LETTER XI
'\u03bf' # 0xEF -> GREEK SMALL LETTER OMICRON
'\u03c0' # 0xF0 -> GREEK SMALL LETTER PI
'\u03c1' # 0xF1 -> GREEK SMALL LETTER RHO
'\u03c2' # 0xF2 -> GREEK SMALL LETTER FINAL SIGMA
'\u03c3' # 0xF3 -> GREEK SMALL LETTER SIGMA
'\u03c4' # 0xF4 -> GREEK SMALL LETTER TAU
'\u03c5' # 0xF5 -> GREEK SMALL LETTER UPSILON
'\u03c6' # 0xF6 -> GREEK SMALL LETTER PHI
'\u03c7' # 0xF7 -> GREEK SMALL LETTER CHI
'\u03c8' # 0xF8 -> GREEK SMALL LETTER PSI
'\u03c9' # 0xF9 -> GREEK SMALL LETTER OMEGA
'\u03ca' # 0xFA -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
'\u03cb' # 0xFB -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
'\u03cc' # 0xFC -> GREEK SMALL LETTER OMICRON WITH TONOS
'\u03cd' # 0xFD -> GREEK SMALL LETTER UPSILON WITH TONOS
'\u03ce' # 0xFE -> GREEK SMALL LETTER OMEGA WITH TONOS
'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
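### Usage sketch (not part of the generated file): encoding Greek text
### with the table above. This table follows the 2003 revision of
### ISO 8859-7, which assigns the euro sign to 0xA4.
data, _ = codecs.charmap_encode('\u03b1\u03b2\u20ac', 'strict', encoding_table)
assert data == b'\xe1\xe2\xa4'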
""" Python Character Mapping Codec iso8859_8 generated from 'MAPPINGS/ISO8859/8859-8.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-8',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\ufffe' # 0xA1 -> UNDEFINED
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xd7' # 0xAA -> MULTIPLICATION SIGN
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xf7' # 0xBA -> DIVISION SIGN
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\ufffe' # 0xBF -> UNDEFINED
'\ufffe' # 0xC0 -> UNDEFINED
'\ufffe' # 0xC1 -> UNDEFINED
'\ufffe' # 0xC2 -> UNDEFINED
'\ufffe' # 0xC3 -> UNDEFINED
'\ufffe' # 0xC4 -> UNDEFINED
'\ufffe' # 0xC5 -> UNDEFINED
'\ufffe' # 0xC6 -> UNDEFINED
'\ufffe' # 0xC7 -> UNDEFINED
'\ufffe' # 0xC8 -> UNDEFINED
'\ufffe' # 0xC9 -> UNDEFINED
'\ufffe' # 0xCA -> UNDEFINED
'\ufffe' # 0xCB -> UNDEFINED
'\ufffe' # 0xCC -> UNDEFINED
'\ufffe' # 0xCD -> UNDEFINED
'\ufffe' # 0xCE -> UNDEFINED
'\ufffe' # 0xCF -> UNDEFINED
'\ufffe' # 0xD0 -> UNDEFINED
'\ufffe' # 0xD1 -> UNDEFINED
'\ufffe' # 0xD2 -> UNDEFINED
'\ufffe' # 0xD3 -> UNDEFINED
'\ufffe' # 0xD4 -> UNDEFINED
'\ufffe' # 0xD5 -> UNDEFINED
'\ufffe' # 0xD6 -> UNDEFINED
'\ufffe' # 0xD7 -> UNDEFINED
'\ufffe' # 0xD8 -> UNDEFINED
'\ufffe' # 0xD9 -> UNDEFINED
'\ufffe' # 0xDA -> UNDEFINED
'\ufffe' # 0xDB -> UNDEFINED
'\ufffe' # 0xDC -> UNDEFINED
'\ufffe' # 0xDD -> UNDEFINED
'\ufffe' # 0xDE -> UNDEFINED
'\u2017' # 0xDF -> DOUBLE LOW LINE
'\u05d0' # 0xE0 -> HEBREW LETTER ALEF
'\u05d1' # 0xE1 -> HEBREW LETTER BET
'\u05d2' # 0xE2 -> HEBREW LETTER GIMEL
'\u05d3' # 0xE3 -> HEBREW LETTER DALET
'\u05d4' # 0xE4 -> HEBREW LETTER HE
'\u05d5' # 0xE5 -> HEBREW LETTER VAV
'\u05d6' # 0xE6 -> HEBREW LETTER ZAYIN
'\u05d7' # 0xE7 -> HEBREW LETTER HET
'\u05d8' # 0xE8 -> HEBREW LETTER TET
'\u05d9' # 0xE9 -> HEBREW LETTER YOD
'\u05da' # 0xEA -> HEBREW LETTER FINAL KAF
'\u05db' # 0xEB -> HEBREW LETTER KAF
'\u05dc' # 0xEC -> HEBREW LETTER LAMED
'\u05dd' # 0xED -> HEBREW LETTER FINAL MEM
'\u05de' # 0xEE -> HEBREW LETTER MEM
'\u05df' # 0xEF -> HEBREW LETTER FINAL NUN
'\u05e0' # 0xF0 -> HEBREW LETTER NUN
'\u05e1' # 0xF1 -> HEBREW LETTER SAMEKH
'\u05e2' # 0xF2 -> HEBREW LETTER AYIN
'\u05e3' # 0xF3 -> HEBREW LETTER FINAL PE
'\u05e4' # 0xF4 -> HEBREW LETTER PE
'\u05e5' # 0xF5 -> HEBREW LETTER FINAL TSADI
'\u05e6' # 0xF6 -> HEBREW LETTER TSADI
'\u05e7' # 0xF7 -> HEBREW LETTER QOF
'\u05e8' # 0xF8 -> HEBREW LETTER RESH
'\u05e9' # 0xF9 -> HEBREW LETTER SHIN
'\u05ea' # 0xFA -> HEBREW LETTER TAV
'\ufffe' # 0xFB -> UNDEFINED
'\ufffe' # 0xFC -> UNDEFINED
'\u200e' # 0xFD -> LEFT-TO-RIGHT MARK
'\u200f' # 0xFE -> RIGHT-TO-LEFT MARK
'\ufffe' # 0xFF -> UNDEFINED
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
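### Usage sketch (not part of the generated file): the Stream* classes
### above simply mix Codec's encode/decode into codecs' stream machinery.
### io.BytesIO stands in for a real file here, purely for demonstration.
import io
_buf = io.BytesIO()
StreamWriter(_buf).write('\u05e9\u05dc\u05d5\u05dd')   # 'שלום'
assert _buf.getvalue() == b'\xf9\xec\xe5\xed'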
""" Python Character Mapping Codec iso8859_9 generated from 'MAPPINGS/ISO8859/8859-9.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='iso8859-9',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\u011e' # 0xD0 -> LATIN CAPITAL LETTER G WITH BREVE
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u0130' # 0xDD -> LATIN CAPITAL LETTER I WITH DOT ABOVE
'\u015e' # 0xDE -> LATIN CAPITAL LETTER S WITH CEDILLA
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\u011f' # 0xF0 -> LATIN SMALL LETTER G WITH BREVE
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\u0131' # 0xFD -> LATIN SMALL LETTER DOTLESS I
'\u015f' # 0xFE -> LATIN SMALL LETTER S WITH CEDILLA
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
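# A hedged sanity check for the Turkish-specific slots above (0xD0, 0xDD, 0xF0
# and 0xFD replace the Latin-1 letters with Ğ, İ, ğ, ı). Assumes the module is
# registered under 'iso8859-9' by the standard encodings search function:
#
#     >>> 'Ğİğı'.encode('iso8859-9')
#     b'\xd0\xdd\xf0\xfd'
#     >>> b'\xd0\xdd\xf0\xfd'.decode('iso8859-9')
#     'Ğİğı'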
#
# johab.py: Python Unicode Codec for JOHAB
#
# Written by Hye-Shik Chang <[email protected]>
#
import _codecs_kr, codecs
import _multibytecodec as mbc
codec = _codecs_kr.getcodec('johab')
class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec
def getregentry():
    return codecs.CodecInfo(
        name='johab',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
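# Unlike the charmap codecs elsewhere in this package, johab delegates to the
# C-level multibyte machinery, so the incremental decoder can buffer a partial
# two-byte syllable across calls. A hedged sketch (assumes a CPython-compatible
# _codecs_kr; 0x8861 is the Johab code for U+AC00 '가'):
#
#     >>> dec = IncrementalDecoder()
#     >>> dec.decode(b'\x88')        # first byte only: buffered, nothing out
#     ''
#     >>> dec.decode(b'\x61', final=True)
#     '가'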
""" Python Character Mapping Codec koi8_r generated from 'MAPPINGS/VENDORS/MISC/KOI8-R.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='koi8-r',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u2500' # 0x80 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u2502' # 0x81 -> BOX DRAWINGS LIGHT VERTICAL
'\u250c' # 0x82 -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2510' # 0x83 -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x84 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2518' # 0x85 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u251c' # 0x86 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2524' # 0x87 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u252c' # 0x88 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u2534' # 0x89 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u253c' # 0x8A -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u2580' # 0x8B -> UPPER HALF BLOCK
'\u2584' # 0x8C -> LOWER HALF BLOCK
'\u2588' # 0x8D -> FULL BLOCK
'\u258c' # 0x8E -> LEFT HALF BLOCK
'\u2590' # 0x8F -> RIGHT HALF BLOCK
'\u2591' # 0x90 -> LIGHT SHADE
'\u2592' # 0x91 -> MEDIUM SHADE
'\u2593' # 0x92 -> DARK SHADE
'\u2320' # 0x93 -> TOP HALF INTEGRAL
'\u25a0' # 0x94 -> BLACK SQUARE
'\u2219' # 0x95 -> BULLET OPERATOR
'\u221a' # 0x96 -> SQUARE ROOT
'\u2248' # 0x97 -> ALMOST EQUAL TO
'\u2264' # 0x98 -> LESS-THAN OR EQUAL TO
'\u2265' # 0x99 -> GREATER-THAN OR EQUAL TO
'\xa0' # 0x9A -> NO-BREAK SPACE
'\u2321' # 0x9B -> BOTTOM HALF INTEGRAL
'\xb0' # 0x9C -> DEGREE SIGN
'\xb2' # 0x9D -> SUPERSCRIPT TWO
'\xb7' # 0x9E -> MIDDLE DOT
'\xf7' # 0x9F -> DIVISION SIGN
'\u2550' # 0xA0 -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u2551' # 0xA1 -> BOX DRAWINGS DOUBLE VERTICAL
'\u2552' # 0xA2 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u0451' # 0xA3 -> CYRILLIC SMALL LETTER IO
'\u2553' # 0xA4 -> BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
'\u2554' # 0xA5 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u2555' # 0xA6 -> BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
'\u2556' # 0xA7 -> BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
'\u2557' # 0xA8 -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u2558' # 0xA9 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2559' # 0xAA -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u255a' # 0xAB -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u255b' # 0xAC -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u255c' # 0xAD -> BOX DRAWINGS UP DOUBLE AND LEFT SINGLE
'\u255d' # 0xAE -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255e' # 0xAF -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0xB0 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u2560' # 0xB1 -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2561' # 0xB2 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u0401' # 0xB3 -> CYRILLIC CAPITAL LETTER IO
'\u2562' # 0xB4 -> BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
'\u2563' # 0xB5 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u2564' # 0xB6 -> BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
'\u2565' # 0xB7 -> BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
'\u2566' # 0xB8 -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2567' # 0xB9 -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0xBA -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2569' # 0xBB -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u256a' # 0xBC -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u256b' # 0xBD -> BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
'\u256c' # 0xBE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa9' # 0xBF -> COPYRIGHT SIGN
'\u044e' # 0xC0 -> CYRILLIC SMALL LETTER YU
'\u0430' # 0xC1 -> CYRILLIC SMALL LETTER A
'\u0431' # 0xC2 -> CYRILLIC SMALL LETTER BE
'\u0446' # 0xC3 -> CYRILLIC SMALL LETTER TSE
'\u0434' # 0xC4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0xC5 -> CYRILLIC SMALL LETTER IE
'\u0444' # 0xC6 -> CYRILLIC SMALL LETTER EF
'\u0433' # 0xC7 -> CYRILLIC SMALL LETTER GHE
'\u0445' # 0xC8 -> CYRILLIC SMALL LETTER HA
'\u0438' # 0xC9 -> CYRILLIC SMALL LETTER I
'\u0439' # 0xCA -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0xCB -> CYRILLIC SMALL LETTER KA
'\u043b' # 0xCC -> CYRILLIC SMALL LETTER EL
'\u043c' # 0xCD -> CYRILLIC SMALL LETTER EM
'\u043d' # 0xCE -> CYRILLIC SMALL LETTER EN
'\u043e' # 0xCF -> CYRILLIC SMALL LETTER O
'\u043f' # 0xD0 -> CYRILLIC SMALL LETTER PE
'\u044f' # 0xD1 -> CYRILLIC SMALL LETTER YA
'\u0440' # 0xD2 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0xD3 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0xD4 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0xD5 -> CYRILLIC SMALL LETTER U
'\u0436' # 0xD6 -> CYRILLIC SMALL LETTER ZHE
'\u0432' # 0xD7 -> CYRILLIC SMALL LETTER VE
'\u044c' # 0xD8 -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044b' # 0xD9 -> CYRILLIC SMALL LETTER YERU
'\u0437' # 0xDA -> CYRILLIC SMALL LETTER ZE
'\u0448' # 0xDB -> CYRILLIC SMALL LETTER SHA
'\u044d' # 0xDC -> CYRILLIC SMALL LETTER E
'\u0449' # 0xDD -> CYRILLIC SMALL LETTER SHCHA
'\u0447' # 0xDE -> CYRILLIC SMALL LETTER CHE
'\u044a' # 0xDF -> CYRILLIC SMALL LETTER HARD SIGN
'\u042e' # 0xE0 -> CYRILLIC CAPITAL LETTER YU
'\u0410' # 0xE1 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0xE2 -> CYRILLIC CAPITAL LETTER BE
'\u0426' # 0xE3 -> CYRILLIC CAPITAL LETTER TSE
'\u0414' # 0xE4 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0xE5 -> CYRILLIC CAPITAL LETTER IE
'\u0424' # 0xE6 -> CYRILLIC CAPITAL LETTER EF
'\u0413' # 0xE7 -> CYRILLIC CAPITAL LETTER GHE
'\u0425' # 0xE8 -> CYRILLIC CAPITAL LETTER HA
'\u0418' # 0xE9 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0xEA -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0xEB -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0xEC -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0xED -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0xEE -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0xEF -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0xF0 -> CYRILLIC CAPITAL LETTER PE
'\u042f' # 0xF1 -> CYRILLIC CAPITAL LETTER YA
'\u0420' # 0xF2 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0xF3 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0xF4 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0xF5 -> CYRILLIC CAPITAL LETTER U
'\u0416' # 0xF6 -> CYRILLIC CAPITAL LETTER ZHE
'\u0412' # 0xF7 -> CYRILLIC CAPITAL LETTER VE
'\u042c' # 0xF8 -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042b' # 0xF9 -> CYRILLIC CAPITAL LETTER YERU
'\u0417' # 0xFA -> CYRILLIC CAPITAL LETTER ZE
'\u0428' # 0xFB -> CYRILLIC CAPITAL LETTER SHA
'\u042d' # 0xFC -> CYRILLIC CAPITAL LETTER E
'\u0429' # 0xFD -> CYRILLIC CAPITAL LETTER SHCHA
'\u0427' # 0xFE -> CYRILLIC CAPITAL LETTER CHE
'\u042a' # 0xFF -> CYRILLIC CAPITAL LETTER HARD SIGN
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
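# KOI8-R orders the Cyrillic letters so that clearing the top bit of each byte
# yields a rough Latin transliteration (0xD2 'р' -> 0x52 'R'), which kept text
# legible over 7-bit links. A hedged check against the table above:
#
#     >>> import codecs
#     >>> codecs.charmap_decode(bytes([0xD2]), 'strict', decoding_table)[0]
#     'р'
#     >>> chr(0xD2 & 0x7F)
#     'R'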
""" Python Character Mapping Codec koi8_u generated from 'python-mappings/KOI8-U.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='koi8-u',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u2500' # 0x80 -> BOX DRAWINGS LIGHT HORIZONTAL
'\u2502' # 0x81 -> BOX DRAWINGS LIGHT VERTICAL
'\u250c' # 0x82 -> BOX DRAWINGS LIGHT DOWN AND RIGHT
'\u2510' # 0x83 -> BOX DRAWINGS LIGHT DOWN AND LEFT
'\u2514' # 0x84 -> BOX DRAWINGS LIGHT UP AND RIGHT
'\u2518' # 0x85 -> BOX DRAWINGS LIGHT UP AND LEFT
'\u251c' # 0x86 -> BOX DRAWINGS LIGHT VERTICAL AND RIGHT
'\u2524' # 0x87 -> BOX DRAWINGS LIGHT VERTICAL AND LEFT
'\u252c' # 0x88 -> BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
'\u2534' # 0x89 -> BOX DRAWINGS LIGHT UP AND HORIZONTAL
'\u253c' # 0x8A -> BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
'\u2580' # 0x8B -> UPPER HALF BLOCK
'\u2584' # 0x8C -> LOWER HALF BLOCK
'\u2588' # 0x8D -> FULL BLOCK
'\u258c' # 0x8E -> LEFT HALF BLOCK
'\u2590' # 0x8F -> RIGHT HALF BLOCK
'\u2591' # 0x90 -> LIGHT SHADE
'\u2592' # 0x91 -> MEDIUM SHADE
'\u2593' # 0x92 -> DARK SHADE
'\u2320' # 0x93 -> TOP HALF INTEGRAL
'\u25a0' # 0x94 -> BLACK SQUARE
'\u2219' # 0x95 -> BULLET OPERATOR
'\u221a' # 0x96 -> SQUARE ROOT
'\u2248' # 0x97 -> ALMOST EQUAL TO
'\u2264' # 0x98 -> LESS-THAN OR EQUAL TO
'\u2265' # 0x99 -> GREATER-THAN OR EQUAL TO
'\xa0' # 0x9A -> NO-BREAK SPACE
'\u2321' # 0x9B -> BOTTOM HALF INTEGRAL
'\xb0' # 0x9C -> DEGREE SIGN
'\xb2' # 0x9D -> SUPERSCRIPT TWO
'\xb7' # 0x9E -> MIDDLE DOT
'\xf7' # 0x9F -> DIVISION SIGN
'\u2550' # 0xA0 -> BOX DRAWINGS DOUBLE HORIZONTAL
'\u2551' # 0xA1 -> BOX DRAWINGS DOUBLE VERTICAL
'\u2552' # 0xA2 -> BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
'\u0451' # 0xA3 -> CYRILLIC SMALL LETTER IO
'\u0454' # 0xA4 -> CYRILLIC SMALL LETTER UKRAINIAN IE
'\u2554' # 0xA5 -> BOX DRAWINGS DOUBLE DOWN AND RIGHT
'\u0456' # 0xA6 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0457' # 0xA7 -> CYRILLIC SMALL LETTER YI (UKRAINIAN)
'\u2557' # 0xA8 -> BOX DRAWINGS DOUBLE DOWN AND LEFT
'\u2558' # 0xA9 -> BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
'\u2559' # 0xAA -> BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
'\u255a' # 0xAB -> BOX DRAWINGS DOUBLE UP AND RIGHT
'\u255b' # 0xAC -> BOX DRAWINGS UP SINGLE AND LEFT DOUBLE
'\u0491' # 0xAD -> CYRILLIC SMALL LETTER UKRAINIAN GHE WITH UPTURN
'\u255d' # 0xAE -> BOX DRAWINGS DOUBLE UP AND LEFT
'\u255e' # 0xAF -> BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
'\u255f' # 0xB0 -> BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
'\u2560' # 0xB1 -> BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
'\u2561' # 0xB2 -> BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
'\u0401' # 0xB3 -> CYRILLIC CAPITAL LETTER IO
'\u0404' # 0xB4 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
'\u2563' # 0xB5 -> BOX DRAWINGS DOUBLE VERTICAL AND LEFT
'\u0406' # 0xB6 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0407' # 0xB7 -> CYRILLIC CAPITAL LETTER YI (UKRAINIAN)
'\u2566' # 0xB8 -> BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
'\u2567' # 0xB9 -> BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
'\u2568' # 0xBA -> BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
'\u2569' # 0xBB -> BOX DRAWINGS DOUBLE UP AND HORIZONTAL
'\u256a' # 0xBC -> BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
'\u0490' # 0xBD -> CYRILLIC CAPITAL LETTER UKRAINIAN GHE WITH UPTURN
'\u256c' # 0xBE -> BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
'\xa9' # 0xBF -> COPYRIGHT SIGN
'\u044e' # 0xC0 -> CYRILLIC SMALL LETTER YU
'\u0430' # 0xC1 -> CYRILLIC SMALL LETTER A
'\u0431' # 0xC2 -> CYRILLIC SMALL LETTER BE
'\u0446' # 0xC3 -> CYRILLIC SMALL LETTER TSE
'\u0434' # 0xC4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0xC5 -> CYRILLIC SMALL LETTER IE
'\u0444' # 0xC6 -> CYRILLIC SMALL LETTER EF
'\u0433' # 0xC7 -> CYRILLIC SMALL LETTER GHE
'\u0445' # 0xC8 -> CYRILLIC SMALL LETTER HA
'\u0438' # 0xC9 -> CYRILLIC SMALL LETTER I
'\u0439' # 0xCA -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0xCB -> CYRILLIC SMALL LETTER KA
'\u043b' # 0xCC -> CYRILLIC SMALL LETTER EL
'\u043c' # 0xCD -> CYRILLIC SMALL LETTER EM
'\u043d' # 0xCE -> CYRILLIC SMALL LETTER EN
'\u043e' # 0xCF -> CYRILLIC SMALL LETTER O
'\u043f' # 0xD0 -> CYRILLIC SMALL LETTER PE
'\u044f' # 0xD1 -> CYRILLIC SMALL LETTER YA
'\u0440' # 0xD2 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0xD3 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0xD4 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0xD5 -> CYRILLIC SMALL LETTER U
'\u0436' # 0xD6 -> CYRILLIC SMALL LETTER ZHE
'\u0432' # 0xD7 -> CYRILLIC SMALL LETTER VE
'\u044c' # 0xD8 -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044b' # 0xD9 -> CYRILLIC SMALL LETTER YERU
'\u0437' # 0xDA -> CYRILLIC SMALL LETTER ZE
'\u0448' # 0xDB -> CYRILLIC SMALL LETTER SHA
'\u044d' # 0xDC -> CYRILLIC SMALL LETTER E
'\u0449' # 0xDD -> CYRILLIC SMALL LETTER SHCHA
'\u0447' # 0xDE -> CYRILLIC SMALL LETTER CHE
'\u044a' # 0xDF -> CYRILLIC SMALL LETTER HARD SIGN
'\u042e' # 0xE0 -> CYRILLIC CAPITAL LETTER YU
'\u0410' # 0xE1 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0xE2 -> CYRILLIC CAPITAL LETTER BE
'\u0426' # 0xE3 -> CYRILLIC CAPITAL LETTER TSE
'\u0414' # 0xE4 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0xE5 -> CYRILLIC CAPITAL LETTER IE
'\u0424' # 0xE6 -> CYRILLIC CAPITAL LETTER EF
'\u0413' # 0xE7 -> CYRILLIC CAPITAL LETTER GHE
'\u0425' # 0xE8 -> CYRILLIC CAPITAL LETTER HA
'\u0418' # 0xE9 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0xEA -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0xEB -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0xEC -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0xED -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0xEE -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0xEF -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0xF0 -> CYRILLIC CAPITAL LETTER PE
'\u042f' # 0xF1 -> CYRILLIC CAPITAL LETTER YA
'\u0420' # 0xF2 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0xF3 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0xF4 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0xF5 -> CYRILLIC CAPITAL LETTER U
'\u0416' # 0xF6 -> CYRILLIC CAPITAL LETTER ZHE
'\u0412' # 0xF7 -> CYRILLIC CAPITAL LETTER VE
'\u042c' # 0xF8 -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042b' # 0xF9 -> CYRILLIC CAPITAL LETTER YERU
'\u0417' # 0xFA -> CYRILLIC CAPITAL LETTER ZE
'\u0428' # 0xFB -> CYRILLIC CAPITAL LETTER SHA
'\u042d' # 0xFC -> CYRILLIC CAPITAL LETTER E
'\u0429' # 0xFD -> CYRILLIC CAPITAL LETTER SHCHA
'\u0427' # 0xFE -> CYRILLIC CAPITAL LETTER CHE
'\u042a' # 0xFF -> CYRILLIC CAPITAL LETTER HARD SIGN
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
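# KOI8-U is KOI8-R with eight box-drawing slots reassigned to the Ukrainian
# letters (0xA4/0xA6/0xA7/0xAD, and their capitals at 0xB4/0xB6/0xB7/0xBD).
# A hedged check against the table above:
#
#     >>> import codecs
#     >>> codecs.charmap_decode(b'\xa4\xa6\xa7\xad', 'strict', decoding_table)[0]
#     'єіїґ'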
""" Python 'latin-1' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
    # Note: Binding these as C functions will result in the class not
    # converting them to methods. This is intended.
    encode = codecs.latin_1_encode
    decode = codecs.latin_1_decode

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.latin_1_encode(input,self.errors)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.latin_1_decode(input,self.errors)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

class StreamConverter(StreamWriter,StreamReader):
    # encode/decode are deliberately swapped here: the converter translates
    # in the opposite direction (decoding on write, encoding on read).
    encode = codecs.latin_1_decode
    decode = codecs.latin_1_encode
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='iso8859-1',
        encode=Codec.encode,
        decode=Codec.decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
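# Latin-1 needs no mapping table: bytes 0x00-0xFF map straight onto code points
# U+0000-U+00FF, so the codec is essentially the identity at the C level. A
# hedged round-trip sketch:
#
#     >>> import codecs
#     >>> codecs.latin_1_encode('é')[0]
#     b'\xe9'
#     >>> codecs.latin_1_decode(b'\xe9')[0]
#     'é'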
""" Python Character Mapping Codec generated from 'VENDORS/APPLE/ARABIC.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='mac-arabic',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
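# A hedged decode sketch for the tables that follow (0xC7 is ARABIC LETTER
# ALEF and 0xE4 is LAM, so b'\xc7\xe4' yields the Arabic definite article in
# logical order):
#
#     >>> import codecs
#     >>> codecs.charmap_decode(b'\xc7\xe4', 'strict', decoding_table)[0]
#     'ال'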
### Decoding Map
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
0x0080: 0x00c4, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x0081: 0x00a0, # NO-BREAK SPACE, right-left
0x0082: 0x00c7, # LATIN CAPITAL LETTER C WITH CEDILLA
0x0083: 0x00c9, # LATIN CAPITAL LETTER E WITH ACUTE
0x0084: 0x00d1, # LATIN CAPITAL LETTER N WITH TILDE
0x0085: 0x00d6, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x0086: 0x00dc, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x0087: 0x00e1, # LATIN SMALL LETTER A WITH ACUTE
0x0088: 0x00e0, # LATIN SMALL LETTER A WITH GRAVE
0x0089: 0x00e2, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x008a: 0x00e4, # LATIN SMALL LETTER A WITH DIAERESIS
0x008b: 0x06ba, # ARABIC LETTER NOON GHUNNA
0x008c: 0x00ab, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x008d: 0x00e7, # LATIN SMALL LETTER C WITH CEDILLA
0x008e: 0x00e9, # LATIN SMALL LETTER E WITH ACUTE
0x008f: 0x00e8, # LATIN SMALL LETTER E WITH GRAVE
0x0090: 0x00ea, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x0091: 0x00eb, # LATIN SMALL LETTER E WITH DIAERESIS
0x0092: 0x00ed, # LATIN SMALL LETTER I WITH ACUTE
0x0093: 0x2026, # HORIZONTAL ELLIPSIS, right-left
0x0094: 0x00ee, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x0095: 0x00ef, # LATIN SMALL LETTER I WITH DIAERESIS
0x0096: 0x00f1, # LATIN SMALL LETTER N WITH TILDE
0x0097: 0x00f3, # LATIN SMALL LETTER O WITH ACUTE
0x0098: 0x00bb, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x0099: 0x00f4, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x009a: 0x00f6, # LATIN SMALL LETTER O WITH DIAERESIS
0x009b: 0x00f7, # DIVISION SIGN, right-left
0x009c: 0x00fa, # LATIN SMALL LETTER U WITH ACUTE
0x009d: 0x00f9, # LATIN SMALL LETTER U WITH GRAVE
0x009e: 0x00fb, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x009f: 0x00fc, # LATIN SMALL LETTER U WITH DIAERESIS
0x00a0: 0x0020, # SPACE, right-left
0x00a1: 0x0021, # EXCLAMATION MARK, right-left
0x00a2: 0x0022, # QUOTATION MARK, right-left
0x00a3: 0x0023, # NUMBER SIGN, right-left
0x00a4: 0x0024, # DOLLAR SIGN, right-left
0x00a5: 0x066a, # ARABIC PERCENT SIGN
0x00a6: 0x0026, # AMPERSAND, right-left
0x00a7: 0x0027, # APOSTROPHE, right-left
0x00a8: 0x0028, # LEFT PARENTHESIS, right-left
0x00a9: 0x0029, # RIGHT PARENTHESIS, right-left
0x00aa: 0x002a, # ASTERISK, right-left
0x00ab: 0x002b, # PLUS SIGN, right-left
0x00ac: 0x060c, # ARABIC COMMA
0x00ad: 0x002d, # HYPHEN-MINUS, right-left
0x00ae: 0x002e, # FULL STOP, right-left
0x00af: 0x002f, # SOLIDUS, right-left
0x00b0: 0x0660, # ARABIC-INDIC DIGIT ZERO, right-left (need override)
0x00b1: 0x0661, # ARABIC-INDIC DIGIT ONE, right-left (need override)
0x00b2: 0x0662, # ARABIC-INDIC DIGIT TWO, right-left (need override)
0x00b3: 0x0663, # ARABIC-INDIC DIGIT THREE, right-left (need override)
0x00b4: 0x0664, # ARABIC-INDIC DIGIT FOUR, right-left (need override)
0x00b5: 0x0665, # ARABIC-INDIC DIGIT FIVE, right-left (need override)
0x00b6: 0x0666, # ARABIC-INDIC DIGIT SIX, right-left (need override)
0x00b7: 0x0667, # ARABIC-INDIC DIGIT SEVEN, right-left (need override)
0x00b8: 0x0668, # ARABIC-INDIC DIGIT EIGHT, right-left (need override)
0x00b9: 0x0669, # ARABIC-INDIC DIGIT NINE, right-left (need override)
0x00ba: 0x003a, # COLON, right-left
0x00bb: 0x061b, # ARABIC SEMICOLON
0x00bc: 0x003c, # LESS-THAN SIGN, right-left
0x00bd: 0x003d, # EQUALS SIGN, right-left
0x00be: 0x003e, # GREATER-THAN SIGN, right-left
0x00bf: 0x061f, # ARABIC QUESTION MARK
0x00c0: 0x274a, # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
0x00c1: 0x0621, # ARABIC LETTER HAMZA
0x00c2: 0x0622, # ARABIC LETTER ALEF WITH MADDA ABOVE
0x00c3: 0x0623, # ARABIC LETTER ALEF WITH HAMZA ABOVE
0x00c4: 0x0624, # ARABIC LETTER WAW WITH HAMZA ABOVE
0x00c5: 0x0625, # ARABIC LETTER ALEF WITH HAMZA BELOW
0x00c6: 0x0626, # ARABIC LETTER YEH WITH HAMZA ABOVE
0x00c7: 0x0627, # ARABIC LETTER ALEF
0x00c8: 0x0628, # ARABIC LETTER BEH
0x00c9: 0x0629, # ARABIC LETTER TEH MARBUTA
0x00ca: 0x062a, # ARABIC LETTER TEH
0x00cb: 0x062b, # ARABIC LETTER THEH
0x00cc: 0x062c, # ARABIC LETTER JEEM
0x00cd: 0x062d, # ARABIC LETTER HAH
0x00ce: 0x062e, # ARABIC LETTER KHAH
0x00cf: 0x062f, # ARABIC LETTER DAL
0x00d0: 0x0630, # ARABIC LETTER THAL
0x00d1: 0x0631, # ARABIC LETTER REH
0x00d2: 0x0632, # ARABIC LETTER ZAIN
0x00d3: 0x0633, # ARABIC LETTER SEEN
0x00d4: 0x0634, # ARABIC LETTER SHEEN
0x00d5: 0x0635, # ARABIC LETTER SAD
0x00d6: 0x0636, # ARABIC LETTER DAD
0x00d7: 0x0637, # ARABIC LETTER TAH
0x00d8: 0x0638, # ARABIC LETTER ZAH
0x00d9: 0x0639, # ARABIC LETTER AIN
0x00da: 0x063a, # ARABIC LETTER GHAIN
0x00db: 0x005b, # LEFT SQUARE BRACKET, right-left
0x00dc: 0x005c, # REVERSE SOLIDUS, right-left
0x00dd: 0x005d, # RIGHT SQUARE BRACKET, right-left
0x00de: 0x005e, # CIRCUMFLEX ACCENT, right-left
0x00df: 0x005f, # LOW LINE, right-left
0x00e0: 0x0640, # ARABIC TATWEEL
0x00e1: 0x0641, # ARABIC LETTER FEH
0x00e2: 0x0642, # ARABIC LETTER QAF
0x00e3: 0x0643, # ARABIC LETTER KAF
0x00e4: 0x0644, # ARABIC LETTER LAM
0x00e5: 0x0645, # ARABIC LETTER MEEM
0x00e6: 0x0646, # ARABIC LETTER NOON
0x00e7: 0x0647, # ARABIC LETTER HEH
0x00e8: 0x0648, # ARABIC LETTER WAW
0x00e9: 0x0649, # ARABIC LETTER ALEF MAKSURA
0x00ea: 0x064a, # ARABIC LETTER YEH
0x00eb: 0x064b, # ARABIC FATHATAN
0x00ec: 0x064c, # ARABIC DAMMATAN
0x00ed: 0x064d, # ARABIC KASRATAN
0x00ee: 0x064e, # ARABIC FATHA
0x00ef: 0x064f, # ARABIC DAMMA
0x00f0: 0x0650, # ARABIC KASRA
0x00f1: 0x0651, # ARABIC SHADDA
0x00f2: 0x0652, # ARABIC SUKUN
0x00f3: 0x067e, # ARABIC LETTER PEH
0x00f4: 0x0679, # ARABIC LETTER TTEH
0x00f5: 0x0686, # ARABIC LETTER TCHEH
0x00f6: 0x06d5, # ARABIC LETTER AE
0x00f7: 0x06a4, # ARABIC LETTER VEH
0x00f8: 0x06af, # ARABIC LETTER GAF
0x00f9: 0x0688, # ARABIC LETTER DDAL
0x00fa: 0x0691, # ARABIC LETTER RREH
0x00fb: 0x007b, # LEFT CURLY BRACKET, right-left
0x00fc: 0x007c, # VERTICAL LINE, right-left
0x00fd: 0x007d, # RIGHT CURLY BRACKET, right-left
0x00fe: 0x0698, # ARABIC LETTER JEH
0x00ff: 0x06d2, # ARABIC LETTER YEH BARREE
})
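# Note: the Codec class above never consults decoding_map; decoding goes
# through decoding_table and encoding through encoding_map (both below). The
# map is presumably retained only for callers that import it directly.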
### Decoding Table
decoding_table = (
'\x00' # 0x0000 -> CONTROL CHARACTER
'\x01' # 0x0001 -> CONTROL CHARACTER
'\x02' # 0x0002 -> CONTROL CHARACTER
'\x03' # 0x0003 -> CONTROL CHARACTER
'\x04' # 0x0004 -> CONTROL CHARACTER
'\x05' # 0x0005 -> CONTROL CHARACTER
'\x06' # 0x0006 -> CONTROL CHARACTER
'\x07' # 0x0007 -> CONTROL CHARACTER
'\x08' # 0x0008 -> CONTROL CHARACTER
'\t' # 0x0009 -> CONTROL CHARACTER
'\n' # 0x000a -> CONTROL CHARACTER
'\x0b' # 0x000b -> CONTROL CHARACTER
'\x0c' # 0x000c -> CONTROL CHARACTER
'\r' # 0x000d -> CONTROL CHARACTER
'\x0e' # 0x000e -> CONTROL CHARACTER
'\x0f' # 0x000f -> CONTROL CHARACTER
'\x10' # 0x0010 -> CONTROL CHARACTER
'\x11' # 0x0011 -> CONTROL CHARACTER
'\x12' # 0x0012 -> CONTROL CHARACTER
'\x13' # 0x0013 -> CONTROL CHARACTER
'\x14' # 0x0014 -> CONTROL CHARACTER
'\x15' # 0x0015 -> CONTROL CHARACTER
'\x16' # 0x0016 -> CONTROL CHARACTER
'\x17' # 0x0017 -> CONTROL CHARACTER
'\x18' # 0x0018 -> CONTROL CHARACTER
'\x19' # 0x0019 -> CONTROL CHARACTER
'\x1a' # 0x001a -> CONTROL CHARACTER
'\x1b' # 0x001b -> CONTROL CHARACTER
'\x1c' # 0x001c -> CONTROL CHARACTER
'\x1d' # 0x001d -> CONTROL CHARACTER
'\x1e' # 0x001e -> CONTROL CHARACTER
'\x1f' # 0x001f -> CONTROL CHARACTER
' ' # 0x0020 -> SPACE, left-right
'!' # 0x0021 -> EXCLAMATION MARK, left-right
'"' # 0x0022 -> QUOTATION MARK, left-right
'#' # 0x0023 -> NUMBER SIGN, left-right
'$' # 0x0024 -> DOLLAR SIGN, left-right
'%' # 0x0025 -> PERCENT SIGN, left-right
'&' # 0x0026 -> AMPERSAND, left-right
"'" # 0x0027 -> APOSTROPHE, left-right
'(' # 0x0028 -> LEFT PARENTHESIS, left-right
')' # 0x0029 -> RIGHT PARENTHESIS, left-right
'*' # 0x002a -> ASTERISK, left-right
'+' # 0x002b -> PLUS SIGN, left-right
',' # 0x002c -> COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
'-' # 0x002d -> HYPHEN-MINUS, left-right
'.' # 0x002e -> FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
'/' # 0x002f -> SOLIDUS, left-right
'0' # 0x0030 -> DIGIT ZERO; in Arabic-script context, displayed as 0x0660 ARABIC-INDIC DIGIT ZERO
'1' # 0x0031 -> DIGIT ONE; in Arabic-script context, displayed as 0x0661 ARABIC-INDIC DIGIT ONE
'2' # 0x0032 -> DIGIT TWO; in Arabic-script context, displayed as 0x0662 ARABIC-INDIC DIGIT TWO
'3' # 0x0033 -> DIGIT THREE; in Arabic-script context, displayed as 0x0663 ARABIC-INDIC DIGIT THREE
'4' # 0x0034 -> DIGIT FOUR; in Arabic-script context, displayed as 0x0664 ARABIC-INDIC DIGIT FOUR
'5' # 0x0035 -> DIGIT FIVE; in Arabic-script context, displayed as 0x0665 ARABIC-INDIC DIGIT FIVE
'6' # 0x0036 -> DIGIT SIX; in Arabic-script context, displayed as 0x0666 ARABIC-INDIC DIGIT SIX
'7' # 0x0037 -> DIGIT SEVEN; in Arabic-script context, displayed as 0x0667 ARABIC-INDIC DIGIT SEVEN
'8' # 0x0038 -> DIGIT EIGHT; in Arabic-script context, displayed as 0x0668 ARABIC-INDIC DIGIT EIGHT
'9' # 0x0039 -> DIGIT NINE; in Arabic-script context, displayed as 0x0669 ARABIC-INDIC DIGIT NINE
':' # 0x003a -> COLON, left-right
';' # 0x003b -> SEMICOLON, left-right
'<' # 0x003c -> LESS-THAN SIGN, left-right
'=' # 0x003d -> EQUALS SIGN, left-right
'>' # 0x003e -> GREATER-THAN SIGN, left-right
'?' # 0x003f -> QUESTION MARK, left-right
'@' # 0x0040 -> COMMERCIAL AT
'A' # 0x0041 -> LATIN CAPITAL LETTER A
'B' # 0x0042 -> LATIN CAPITAL LETTER B
'C' # 0x0043 -> LATIN CAPITAL LETTER C
'D' # 0x0044 -> LATIN CAPITAL LETTER D
'E' # 0x0045 -> LATIN CAPITAL LETTER E
'F' # 0x0046 -> LATIN CAPITAL LETTER F
'G' # 0x0047 -> LATIN CAPITAL LETTER G
'H' # 0x0048 -> LATIN CAPITAL LETTER H
'I' # 0x0049 -> LATIN CAPITAL LETTER I
'J' # 0x004a -> LATIN CAPITAL LETTER J
'K' # 0x004b -> LATIN CAPITAL LETTER K
'L' # 0x004c -> LATIN CAPITAL LETTER L
'M' # 0x004d -> LATIN CAPITAL LETTER M
'N' # 0x004e -> LATIN CAPITAL LETTER N
'O' # 0x004f -> LATIN CAPITAL LETTER O
'P' # 0x0050 -> LATIN CAPITAL LETTER P
'Q' # 0x0051 -> LATIN CAPITAL LETTER Q
'R' # 0x0052 -> LATIN CAPITAL LETTER R
'S' # 0x0053 -> LATIN CAPITAL LETTER S
'T' # 0x0054 -> LATIN CAPITAL LETTER T
'U' # 0x0055 -> LATIN CAPITAL LETTER U
'V' # 0x0056 -> LATIN CAPITAL LETTER V
'W' # 0x0057 -> LATIN CAPITAL LETTER W
'X' # 0x0058 -> LATIN CAPITAL LETTER X
'Y' # 0x0059 -> LATIN CAPITAL LETTER Y
'Z' # 0x005a -> LATIN CAPITAL LETTER Z
'[' # 0x005b -> LEFT SQUARE BRACKET, left-right
'\\' # 0x005c -> REVERSE SOLIDUS, left-right
']' # 0x005d -> RIGHT SQUARE BRACKET, left-right
'^' # 0x005e -> CIRCUMFLEX ACCENT, left-right
'_' # 0x005f -> LOW LINE, left-right
'`' # 0x0060 -> GRAVE ACCENT
'a' # 0x0061 -> LATIN SMALL LETTER A
'b' # 0x0062 -> LATIN SMALL LETTER B
'c' # 0x0063 -> LATIN SMALL LETTER C
'd' # 0x0064 -> LATIN SMALL LETTER D
'e' # 0x0065 -> LATIN SMALL LETTER E
'f' # 0x0066 -> LATIN SMALL LETTER F
'g' # 0x0067 -> LATIN SMALL LETTER G
'h' # 0x0068 -> LATIN SMALL LETTER H
'i' # 0x0069 -> LATIN SMALL LETTER I
'j' # 0x006a -> LATIN SMALL LETTER J
'k' # 0x006b -> LATIN SMALL LETTER K
'l' # 0x006c -> LATIN SMALL LETTER L
'm' # 0x006d -> LATIN SMALL LETTER M
'n' # 0x006e -> LATIN SMALL LETTER N
'o' # 0x006f -> LATIN SMALL LETTER O
'p' # 0x0070 -> LATIN SMALL LETTER P
'q' # 0x0071 -> LATIN SMALL LETTER Q
'r' # 0x0072 -> LATIN SMALL LETTER R
's' # 0x0073 -> LATIN SMALL LETTER S
't' # 0x0074 -> LATIN SMALL LETTER T
'u' # 0x0075 -> LATIN SMALL LETTER U
'v' # 0x0076 -> LATIN SMALL LETTER V
'w' # 0x0077 -> LATIN SMALL LETTER W
'x' # 0x0078 -> LATIN SMALL LETTER X
'y' # 0x0079 -> LATIN SMALL LETTER Y
'z' # 0x007a -> LATIN SMALL LETTER Z
'{' # 0x007b -> LEFT CURLY BRACKET, left-right
'|' # 0x007c -> VERTICAL LINE, left-right
'}' # 0x007d -> RIGHT CURLY BRACKET, left-right
'~' # 0x007e -> TILDE
'\x7f' # 0x007f -> CONTROL CHARACTER
'\xc4' # 0x0080 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xa0' # 0x0081 -> NO-BREAK SPACE, right-left
'\xc7' # 0x0082 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc9' # 0x0083 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xd1' # 0x0084 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd6' # 0x0085 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x0086 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x0087 -> LATIN SMALL LETTER A WITH ACUTE
'\xe0' # 0x0088 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x0089 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x008a -> LATIN SMALL LETTER A WITH DIAERESIS
'\u06ba' # 0x008b -> ARABIC LETTER NOON GHUNNA
'\xab' # 0x008c -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
'\xe7' # 0x008d -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x008e -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x008f -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x0090 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x0091 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xed' # 0x0092 -> LATIN SMALL LETTER I WITH ACUTE
'\u2026' # 0x0093 -> HORIZONTAL ELLIPSIS, right-left
'\xee' # 0x0094 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x0095 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf1' # 0x0096 -> LATIN SMALL LETTER N WITH TILDE
'\xf3' # 0x0097 -> LATIN SMALL LETTER O WITH ACUTE
'\xbb' # 0x0098 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
'\xf4' # 0x0099 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x009a -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0x009b -> DIVISION SIGN, right-left
'\xfa' # 0x009c -> LATIN SMALL LETTER U WITH ACUTE
'\xf9' # 0x009d -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x009e -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x009f -> LATIN SMALL LETTER U WITH DIAERESIS
' ' # 0x00a0 -> SPACE, right-left
'!' # 0x00a1 -> EXCLAMATION MARK, right-left
'"' # 0x00a2 -> QUOTATION MARK, right-left
'#' # 0x00a3 -> NUMBER SIGN, right-left
'$' # 0x00a4 -> DOLLAR SIGN, right-left
'\u066a' # 0x00a5 -> ARABIC PERCENT SIGN
'&' # 0x00a6 -> AMPERSAND, right-left
"'" # 0x00a7 -> APOSTROPHE, right-left
'(' # 0x00a8 -> LEFT PARENTHESIS, right-left
')' # 0x00a9 -> RIGHT PARENTHESIS, right-left
'*' # 0x00aa -> ASTERISK, right-left
'+' # 0x00ab -> PLUS SIGN, right-left
'\u060c' # 0x00ac -> ARABIC COMMA
'-' # 0x00ad -> HYPHEN-MINUS, right-left
'.' # 0x00ae -> FULL STOP, right-left
'/' # 0x00af -> SOLIDUS, right-left
'\u0660' # 0x00b0 -> ARABIC-INDIC DIGIT ZERO, right-left (need override)
'\u0661' # 0x00b1 -> ARABIC-INDIC DIGIT ONE, right-left (need override)
'\u0662' # 0x00b2 -> ARABIC-INDIC DIGIT TWO, right-left (need override)
'\u0663' # 0x00b3 -> ARABIC-INDIC DIGIT THREE, right-left (need override)
'\u0664' # 0x00b4 -> ARABIC-INDIC DIGIT FOUR, right-left (need override)
'\u0665' # 0x00b5 -> ARABIC-INDIC DIGIT FIVE, right-left (need override)
'\u0666' # 0x00b6 -> ARABIC-INDIC DIGIT SIX, right-left (need override)
'\u0667' # 0x00b7 -> ARABIC-INDIC DIGIT SEVEN, right-left (need override)
'\u0668' # 0x00b8 -> ARABIC-INDIC DIGIT EIGHT, right-left (need override)
'\u0669' # 0x00b9 -> ARABIC-INDIC DIGIT NINE, right-left (need override)
':' # 0x00ba -> COLON, right-left
'\u061b' # 0x00bb -> ARABIC SEMICOLON
'<' # 0x00bc -> LESS-THAN SIGN, right-left
'=' # 0x00bd -> EQUALS SIGN, right-left
'>' # 0x00be -> GREATER-THAN SIGN, right-left
'\u061f' # 0x00bf -> ARABIC QUESTION MARK
'\u274a' # 0x00c0 -> EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
'\u0621' # 0x00c1 -> ARABIC LETTER HAMZA
'\u0622' # 0x00c2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
'\u0623' # 0x00c3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
'\u0624' # 0x00c4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
'\u0625' # 0x00c5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
'\u0626' # 0x00c6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
'\u0627' # 0x00c7 -> ARABIC LETTER ALEF
'\u0628' # 0x00c8 -> ARABIC LETTER BEH
'\u0629' # 0x00c9 -> ARABIC LETTER TEH MARBUTA
'\u062a' # 0x00ca -> ARABIC LETTER TEH
'\u062b' # 0x00cb -> ARABIC LETTER THEH
'\u062c' # 0x00cc -> ARABIC LETTER JEEM
'\u062d' # 0x00cd -> ARABIC LETTER HAH
'\u062e' # 0x00ce -> ARABIC LETTER KHAH
'\u062f' # 0x00cf -> ARABIC LETTER DAL
'\u0630' # 0x00d0 -> ARABIC LETTER THAL
'\u0631' # 0x00d1 -> ARABIC LETTER REH
'\u0632' # 0x00d2 -> ARABIC LETTER ZAIN
'\u0633' # 0x00d3 -> ARABIC LETTER SEEN
'\u0634' # 0x00d4 -> ARABIC LETTER SHEEN
'\u0635' # 0x00d5 -> ARABIC LETTER SAD
'\u0636' # 0x00d6 -> ARABIC LETTER DAD
'\u0637' # 0x00d7 -> ARABIC LETTER TAH
'\u0638' # 0x00d8 -> ARABIC LETTER ZAH
'\u0639' # 0x00d9 -> ARABIC LETTER AIN
'\u063a' # 0x00da -> ARABIC LETTER GHAIN
'[' # 0x00db -> LEFT SQUARE BRACKET, right-left
'\\' # 0x00dc -> REVERSE SOLIDUS, right-left
']' # 0x00dd -> RIGHT SQUARE BRACKET, right-left
'^' # 0x00de -> CIRCUMFLEX ACCENT, right-left
'_' # 0x00df -> LOW LINE, right-left
'\u0640' # 0x00e0 -> ARABIC TATWEEL
'\u0641' # 0x00e1 -> ARABIC LETTER FEH
'\u0642' # 0x00e2 -> ARABIC LETTER QAF
'\u0643' # 0x00e3 -> ARABIC LETTER KAF
'\u0644' # 0x00e4 -> ARABIC LETTER LAM
'\u0645' # 0x00e5 -> ARABIC LETTER MEEM
'\u0646' # 0x00e6 -> ARABIC LETTER NOON
'\u0647' # 0x00e7 -> ARABIC LETTER HEH
'\u0648' # 0x00e8 -> ARABIC LETTER WAW
'\u0649' # 0x00e9 -> ARABIC LETTER ALEF MAKSURA
'\u064a' # 0x00ea -> ARABIC LETTER YEH
'\u064b' # 0x00eb -> ARABIC FATHATAN
'\u064c' # 0x00ec -> ARABIC DAMMATAN
'\u064d' # 0x00ed -> ARABIC KASRATAN
'\u064e' # 0x00ee -> ARABIC FATHA
'\u064f' # 0x00ef -> ARABIC DAMMA
'\u0650' # 0x00f0 -> ARABIC KASRA
'\u0651' # 0x00f1 -> ARABIC SHADDA
'\u0652' # 0x00f2 -> ARABIC SUKUN
'\u067e' # 0x00f3 -> ARABIC LETTER PEH
'\u0679' # 0x00f4 -> ARABIC LETTER TTEH
'\u0686' # 0x00f5 -> ARABIC LETTER TCHEH
'\u06d5' # 0x00f6 -> ARABIC LETTER AE
'\u06a4' # 0x00f7 -> ARABIC LETTER VEH
'\u06af' # 0x00f8 -> ARABIC LETTER GAF
'\u0688' # 0x00f9 -> ARABIC LETTER DDAL
'\u0691' # 0x00fa -> ARABIC LETTER RREH
'{' # 0x00fb -> LEFT CURLY BRACKET, right-left
'|' # 0x00fc -> VERTICAL LINE, right-left
'}' # 0x00fd -> RIGHT CURLY BRACKET, right-left
'\u0698' # 0x00fe -> ARABIC LETTER JEH
'\u06d2' # 0x00ff -> ARABIC LETTER YEH BARREE
)
### Encoding Map
encoding_map = {
0x0000: 0x0000, # CONTROL CHARACTER
0x0001: 0x0001, # CONTROL CHARACTER
0x0002: 0x0002, # CONTROL CHARACTER
0x0003: 0x0003, # CONTROL CHARACTER
0x0004: 0x0004, # CONTROL CHARACTER
0x0005: 0x0005, # CONTROL CHARACTER
0x0006: 0x0006, # CONTROL CHARACTER
0x0007: 0x0007, # CONTROL CHARACTER
0x0008: 0x0008, # CONTROL CHARACTER
0x0009: 0x0009, # CONTROL CHARACTER
0x000a: 0x000a, # CONTROL CHARACTER
0x000b: 0x000b, # CONTROL CHARACTER
0x000c: 0x000c, # CONTROL CHARACTER
0x000d: 0x000d, # CONTROL CHARACTER
0x000e: 0x000e, # CONTROL CHARACTER
0x000f: 0x000f, # CONTROL CHARACTER
0x0010: 0x0010, # CONTROL CHARACTER
0x0011: 0x0011, # CONTROL CHARACTER
0x0012: 0x0012, # CONTROL CHARACTER
0x0013: 0x0013, # CONTROL CHARACTER
0x0014: 0x0014, # CONTROL CHARACTER
0x0015: 0x0015, # CONTROL CHARACTER
0x0016: 0x0016, # CONTROL CHARACTER
0x0017: 0x0017, # CONTROL CHARACTER
0x0018: 0x0018, # CONTROL CHARACTER
0x0019: 0x0019, # CONTROL CHARACTER
0x001a: 0x001a, # CONTROL CHARACTER
0x001b: 0x001b, # CONTROL CHARACTER
0x001c: 0x001c, # CONTROL CHARACTER
0x001d: 0x001d, # CONTROL CHARACTER
0x001e: 0x001e, # CONTROL CHARACTER
0x001f: 0x001f, # CONTROL CHARACTER
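# Note: many entries below repeat the same Unicode key with both a left-right
# and a right-left byte value (e.g. 0x0020 -> 0x0020, then 0x0020 -> 0x00a0).
# In a Python dict literal the last duplicate wins, so the right-left byte is
# the one this map actually encodes to.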
0x0020: 0x0020, # SPACE, left-right
0x0020: 0x00a0, # SPACE, right-left
0x0021: 0x0021, # EXCLAMATION MARK, left-right
0x0021: 0x00a1, # EXCLAMATION MARK, right-left
0x0022: 0x0022, # QUOTATION MARK, left-right
0x0022: 0x00a2, # QUOTATION MARK, right-left
0x0023: 0x0023, # NUMBER SIGN, left-right
0x0023: 0x00a3, # NUMBER SIGN, right-left
0x0024: 0x0024, # DOLLAR SIGN, left-right
0x0024: 0x00a4, # DOLLAR SIGN, right-left
0x0025: 0x0025, # PERCENT SIGN, left-right
0x0026: 0x0026, # AMPERSAND, left-right
0x0026: 0x00a6, # AMPERSAND, right-left
0x0027: 0x0027, # APOSTROPHE, left-right
0x0027: 0x00a7, # APOSTROPHE, right-left
0x0028: 0x0028, # LEFT PARENTHESIS, left-right
0x0028: 0x00a8, # LEFT PARENTHESIS, right-left
0x0029: 0x0029, # RIGHT PARENTHESIS, left-right
0x0029: 0x00a9, # RIGHT PARENTHESIS, right-left
0x002a: 0x002a, # ASTERISK, left-right
0x002a: 0x00aa, # ASTERISK, right-left
0x002b: 0x002b, # PLUS SIGN, left-right
0x002b: 0x00ab, # PLUS SIGN, right-left
0x002c: 0x002c, # COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
0x002d: 0x002d, # HYPHEN-MINUS, left-right
0x002d: 0x00ad, # HYPHEN-MINUS, right-left
0x002e: 0x002e, # FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
0x002e: 0x00ae, # FULL STOP, right-left
0x002f: 0x002f, # SOLIDUS, left-right
0x002f: 0x00af, # SOLIDUS, right-left
0x0030: 0x0030, # DIGIT ZERO; in Arabic-script context, displayed as 0x0660 ARABIC-INDIC DIGIT ZERO
0x0031: 0x0031, # DIGIT ONE; in Arabic-script context, displayed as 0x0661 ARABIC-INDIC DIGIT ONE
0x0032: 0x0032, # DIGIT TWO; in Arabic-script context, displayed as 0x0662 ARABIC-INDIC DIGIT TWO
0x0033: 0x0033, # DIGIT THREE; in Arabic-script context, displayed as 0x0663 ARABIC-INDIC DIGIT THREE
0x0034: 0x0034, # DIGIT FOUR; in Arabic-script context, displayed as 0x0664 ARABIC-INDIC DIGIT FOUR
0x0035: 0x0035, # DIGIT FIVE; in Arabic-script context, displayed as 0x0665 ARABIC-INDIC DIGIT FIVE
0x0036: 0x0036, # DIGIT SIX; in Arabic-script context, displayed as 0x0666 ARABIC-INDIC DIGIT SIX
0x0037: 0x0037, # DIGIT SEVEN; in Arabic-script context, displayed as 0x0667 ARABIC-INDIC DIGIT SEVEN
0x0038: 0x0038, # DIGIT EIGHT; in Arabic-script context, displayed as 0x0668 ARABIC-INDIC DIGIT EIGHT
0x0039: 0x0039, # DIGIT NINE; in Arabic-script context, displayed as 0x0669 ARABIC-INDIC DIGIT NINE
0x003a: 0x003a, # COLON, left-right
0x003a: 0x00ba, # COLON, right-left
0x003b: 0x003b, # SEMICOLON, left-right
0x003c: 0x003c, # LESS-THAN SIGN, left-right
0x003c: 0x00bc, # LESS-THAN SIGN, right-left
0x003d: 0x003d, # EQUALS SIGN, left-right
0x003d: 0x00bd, # EQUALS SIGN, right-left
0x003e: 0x003e, # GREATER-THAN SIGN, left-right
0x003e: 0x00be, # GREATER-THAN SIGN, right-left
0x003f: 0x003f, # QUESTION MARK, left-right
0x0040: 0x0040, # COMMERCIAL AT
0x0041: 0x0041, # LATIN CAPITAL LETTER A
0x0042: 0x0042, # LATIN CAPITAL LETTER B
0x0043: 0x0043, # LATIN CAPITAL LETTER C
0x0044: 0x0044, # LATIN CAPITAL LETTER D
0x0045: 0x0045, # LATIN CAPITAL LETTER E
0x0046: 0x0046, # LATIN CAPITAL LETTER F
0x0047: 0x0047, # LATIN CAPITAL LETTER G
0x0048: 0x0048, # LATIN CAPITAL LETTER H
0x0049: 0x0049, # LATIN CAPITAL LETTER I
0x004a: 0x004a, # LATIN CAPITAL LETTER J
0x004b: 0x004b, # LATIN CAPITAL LETTER K
0x004c: 0x004c, # LATIN CAPITAL LETTER L
0x004d: 0x004d, # LATIN CAPITAL LETTER M
0x004e: 0x004e, # LATIN CAPITAL LETTER N
0x004f: 0x004f, # LATIN CAPITAL LETTER O
0x0050: 0x0050, # LATIN CAPITAL LETTER P
0x0051: 0x0051, # LATIN CAPITAL LETTER Q
0x0052: 0x0052, # LATIN CAPITAL LETTER R
0x0053: 0x0053, # LATIN CAPITAL LETTER S
0x0054: 0x0054, # LATIN CAPITAL LETTER T
0x0055: 0x0055, # LATIN CAPITAL LETTER U
0x0056: 0x0056, # LATIN CAPITAL LETTER V
0x0057: 0x0057, # LATIN CAPITAL LETTER W
0x0058: 0x0058, # LATIN CAPITAL LETTER X
0x0059: 0x0059, # LATIN CAPITAL LETTER Y
0x005a: 0x005a, # LATIN CAPITAL LETTER Z
0x005b: 0x005b, # LEFT SQUARE BRACKET, left-right
0x005b: 0x00db, # LEFT SQUARE BRACKET, right-left
0x005c: 0x005c, # REVERSE SOLIDUS, left-right
0x005c: 0x00dc, # REVERSE SOLIDUS, right-left
0x005d: 0x005d, # RIGHT SQUARE BRACKET, left-right
0x005d: 0x00dd, # RIGHT SQUARE BRACKET, right-left
0x005e: 0x005e, # CIRCUMFLEX ACCENT, left-right
0x005e: 0x00de, # CIRCUMFLEX ACCENT, right-left
0x005f: 0x005f, # LOW LINE, left-right
0x005f: 0x00df, # LOW LINE, right-left
0x0060: 0x0060, # GRAVE ACCENT
0x0061: 0x0061, # LATIN SMALL LETTER A
0x0062: 0x0062, # LATIN SMALL LETTER B
0x0063: 0x0063, # LATIN SMALL LETTER C
0x0064: 0x0064, # LATIN SMALL LETTER D
0x0065: 0x0065, # LATIN SMALL LETTER E
0x0066: 0x0066, # LATIN SMALL LETTER F
0x0067: 0x0067, # LATIN SMALL LETTER G
0x0068: 0x0068, # LATIN SMALL LETTER H
0x0069: 0x0069, # LATIN SMALL LETTER I
0x006a: 0x006a, # LATIN SMALL LETTER J
0x006b: 0x006b, # LATIN SMALL LETTER K
0x006c: 0x006c, # LATIN SMALL LETTER L
0x006d: 0x006d, # LATIN SMALL LETTER M
0x006e: 0x006e, # LATIN SMALL LETTER N
0x006f: 0x006f, # LATIN SMALL LETTER O
0x0070: 0x0070, # LATIN SMALL LETTER P
0x0071: 0x0071, # LATIN SMALL LETTER Q
0x0072: 0x0072, # LATIN SMALL LETTER R
0x0073: 0x0073, # LATIN SMALL LETTER S
0x0074: 0x0074, # LATIN SMALL LETTER T
0x0075: 0x0075, # LATIN SMALL LETTER U
0x0076: 0x0076, # LATIN SMALL LETTER V
0x0077: 0x0077, # LATIN SMALL LETTER W
0x0078: 0x0078, # LATIN SMALL LETTER X
0x0079: 0x0079, # LATIN SMALL LETTER Y
0x007a: 0x007a, # LATIN SMALL LETTER Z
0x007b: 0x007b, # LEFT CURLY BRACKET, left-right
0x007b: 0x00fb, # LEFT CURLY BRACKET, right-left
0x007c: 0x007c, # VERTICAL LINE, left-right
0x007c: 0x00fc, # VERTICAL LINE, right-left
0x007d: 0x007d, # RIGHT CURLY BRACKET, left-right
0x007d: 0x00fd, # RIGHT CURLY BRACKET, right-left
0x007e: 0x007e, # TILDE
0x007f: 0x007f, # CONTROL CHARACTER
0x00a0: 0x0081, # NO-BREAK SPACE, right-left
0x00ab: 0x008c, # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x00bb: 0x0098, # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
0x00c4: 0x0080, # LATIN CAPITAL LETTER A WITH DIAERESIS
0x00c7: 0x0082, # LATIN CAPITAL LETTER C WITH CEDILLA
0x00c9: 0x0083, # LATIN CAPITAL LETTER E WITH ACUTE
0x00d1: 0x0084, # LATIN CAPITAL LETTER N WITH TILDE
0x00d6: 0x0085, # LATIN CAPITAL LETTER O WITH DIAERESIS
0x00dc: 0x0086, # LATIN CAPITAL LETTER U WITH DIAERESIS
0x00e0: 0x0088, # LATIN SMALL LETTER A WITH GRAVE
0x00e1: 0x0087, # LATIN SMALL LETTER A WITH ACUTE
0x00e2: 0x0089, # LATIN SMALL LETTER A WITH CIRCUMFLEX
0x00e4: 0x008a, # LATIN SMALL LETTER A WITH DIAERESIS
0x00e7: 0x008d, # LATIN SMALL LETTER C WITH CEDILLA
0x00e8: 0x008f, # LATIN SMALL LETTER E WITH GRAVE
0x00e9: 0x008e, # LATIN SMALL LETTER E WITH ACUTE
0x00ea: 0x0090, # LATIN SMALL LETTER E WITH CIRCUMFLEX
0x00eb: 0x0091, # LATIN SMALL LETTER E WITH DIAERESIS
0x00ed: 0x0092, # LATIN SMALL LETTER I WITH ACUTE
0x00ee: 0x0094, # LATIN SMALL LETTER I WITH CIRCUMFLEX
0x00ef: 0x0095, # LATIN SMALL LETTER I WITH DIAERESIS
0x00f1: 0x0096, # LATIN SMALL LETTER N WITH TILDE
0x00f3: 0x0097, # LATIN SMALL LETTER O WITH ACUTE
0x00f4: 0x0099, # LATIN SMALL LETTER O WITH CIRCUMFLEX
0x00f6: 0x009a, # LATIN SMALL LETTER O WITH DIAERESIS
0x00f7: 0x009b, # DIVISION SIGN, right-left
0x00f9: 0x009d, # LATIN SMALL LETTER U WITH GRAVE
0x00fa: 0x009c, # LATIN SMALL LETTER U WITH ACUTE
0x00fb: 0x009e, # LATIN SMALL LETTER U WITH CIRCUMFLEX
0x00fc: 0x009f, # LATIN SMALL LETTER U WITH DIAERESIS
0x060c: 0x00ac, # ARABIC COMMA
0x061b: 0x00bb, # ARABIC SEMICOLON
0x061f: 0x00bf, # ARABIC QUESTION MARK
0x0621: 0x00c1, # ARABIC LETTER HAMZA
0x0622: 0x00c2, # ARABIC LETTER ALEF WITH MADDA ABOVE
0x0623: 0x00c3, # ARABIC LETTER ALEF WITH HAMZA ABOVE
0x0624: 0x00c4, # ARABIC LETTER WAW WITH HAMZA ABOVE
0x0625: 0x00c5, # ARABIC LETTER ALEF WITH HAMZA BELOW
0x0626: 0x00c6, # ARABIC LETTER YEH WITH HAMZA ABOVE
0x0627: 0x00c7, # ARABIC LETTER ALEF
0x0628: 0x00c8, # ARABIC LETTER BEH
0x0629: 0x00c9, # ARABIC LETTER TEH MARBUTA
0x062a: 0x00ca, # ARABIC LETTER TEH
0x062b: 0x00cb, # ARABIC LETTER THEH
0x062c: 0x00cc, # ARABIC LETTER JEEM
0x062d: 0x00cd, # ARABIC LETTER HAH
0x062e: 0x00ce, # ARABIC LETTER KHAH
0x062f: 0x00cf, # ARABIC LETTER DAL
0x0630: 0x00d0, # ARABIC LETTER THAL
0x0631: 0x00d1, # ARABIC LETTER REH
0x0632: 0x00d2, # ARABIC LETTER ZAIN
0x0633: 0x00d3, # ARABIC LETTER SEEN
0x0634: 0x00d4, # ARABIC LETTER SHEEN
0x0635: 0x00d5, # ARABIC LETTER SAD
0x0636: 0x00d6, # ARABIC LETTER DAD
0x0637: 0x00d7, # ARABIC LETTER TAH
0x0638: 0x00d8, # ARABIC LETTER ZAH
0x0639: 0x00d9, # ARABIC LETTER AIN
0x063a: 0x00da, # ARABIC LETTER GHAIN
0x0640: 0x00e0, # ARABIC TATWEEL
0x0641: 0x00e1, # ARABIC LETTER FEH
0x0642: 0x00e2, # ARABIC LETTER QAF
0x0643: 0x00e3, # ARABIC LETTER KAF
0x0644: 0x00e4, # ARABIC LETTER LAM
0x0645: 0x00e5, # ARABIC LETTER MEEM
0x0646: 0x00e6, # ARABIC LETTER NOON
0x0647: 0x00e7, # ARABIC LETTER HEH
0x0648: 0x00e8, # ARABIC LETTER WAW
0x0649: 0x00e9, # ARABIC LETTER ALEF MAKSURA
0x064a: 0x00ea, # ARABIC LETTER YEH
0x064b: 0x00eb, # ARABIC FATHATAN
0x064c: 0x00ec, # ARABIC DAMMATAN
0x064d: 0x00ed, # ARABIC KASRATAN
0x064e: 0x00ee, # ARABIC FATHA
0x064f: 0x00ef, # ARABIC DAMMA
0x0650: 0x00f0, # ARABIC KASRA
0x0651: 0x00f1, # ARABIC SHADDA
0x0652: 0x00f2, # ARABIC SUKUN
0x0660: 0x00b0, # ARABIC-INDIC DIGIT ZERO, right-left (need override)
0x0661: 0x00b1, # ARABIC-INDIC DIGIT ONE, right-left (need override)
0x0662: 0x00b2, # ARABIC-INDIC DIGIT TWO, right-left (need override)
0x0663: 0x00b3, # ARABIC-INDIC DIGIT THREE, right-left (need override)
0x0664: 0x00b4, # ARABIC-INDIC DIGIT FOUR, right-left (need override)
0x0665: 0x00b5, # ARABIC-INDIC DIGIT FIVE, right-left (need override)
0x0666: 0x00b6, # ARABIC-INDIC DIGIT SIX, right-left (need override)
0x0667: 0x00b7, # ARABIC-INDIC DIGIT SEVEN, right-left (need override)
0x0668: 0x00b8, # ARABIC-INDIC DIGIT EIGHT, right-left (need override)
0x0669: 0x00b9, # ARABIC-INDIC DIGIT NINE, right-left (need override)
0x066a: 0x00a5, # ARABIC PERCENT SIGN
0x0679: 0x00f4, # ARABIC LETTER TTEH
0x067e: 0x00f3, # ARABIC LETTER PEH
0x0686: 0x00f5, # ARABIC LETTER TCHEH
0x0688: 0x00f9, # ARABIC LETTER DDAL
0x0691: 0x00fa, # ARABIC LETTER RREH
0x0698: 0x00fe, # ARABIC LETTER JEH
0x06a4: 0x00f7, # ARABIC LETTER VEH
0x06af: 0x00f8, # ARABIC LETTER GAF
0x06ba: 0x008b, # ARABIC LETTER NOON GHUNNA
0x06d2: 0x00ff, # ARABIC LETTER YEH BARREE
0x06d5: 0x00f6, # ARABIC LETTER AE
0x2026: 0x0093, # HORIZONTAL ELLIPSIS, right-left
0x274a: 0x00c0, # EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
}
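# Note (added for clarity; not part of the generated table): a Python dict
# literal keeps only the last value bound to a duplicated key, so for the
# code points listed twice above (left-right and right-left variants) the
# right-left byte is what the charmap encoder actually emits. A minimal,
# self-contained illustration:
_demo = {0x0029: 0x0029,  # RIGHT PARENTHESIS, left-right (overwritten)
         0x0029: 0x00a9}  # RIGHT PARENTHESIS, right-left (kept)
assert _demo[0x0029] == 0x00a9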
""" Python Character Mapping Codec mac_centeuro generated from 'MAPPINGS/VENDORS/APPLE/CENTEURO.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='mac-centeuro',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
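# Note (added for clarity; not part of the generated module): the encodings
# package's search function imports this module and calls getregentry() to
# obtain the CodecInfo. codecs.charmap_encode/_decode return a
# (result, length) pair; the stateless Codec methods return that pair,
# while the incremental classes above index [0] because the incremental
# API must return only the converted data. For example:
_pair = codecs.charmap_decode(b'\x41', 'strict', '\x00' * 0x41 + 'A')
assert _pair == ('A', 1)  # (decoded text, number of bytes consumed)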
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\u0100' # 0x81 -> LATIN CAPITAL LETTER A WITH MACRON
'\u0101' # 0x82 -> LATIN SMALL LETTER A WITH MACRON
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0104' # 0x84 -> LATIN CAPITAL LETTER A WITH OGONEK
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\u0105' # 0x88 -> LATIN SMALL LETTER A WITH OGONEK
'\u010c' # 0x89 -> LATIN CAPITAL LETTER C WITH CARON
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\u010d' # 0x8B -> LATIN SMALL LETTER C WITH CARON
'\u0106' # 0x8C -> LATIN CAPITAL LETTER C WITH ACUTE
'\u0107' # 0x8D -> LATIN SMALL LETTER C WITH ACUTE
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\u0179' # 0x8F -> LATIN CAPITAL LETTER Z WITH ACUTE
'\u017a' # 0x90 -> LATIN SMALL LETTER Z WITH ACUTE
'\u010e' # 0x91 -> LATIN CAPITAL LETTER D WITH CARON
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\u010f' # 0x93 -> LATIN SMALL LETTER D WITH CARON
'\u0112' # 0x94 -> LATIN CAPITAL LETTER E WITH MACRON
'\u0113' # 0x95 -> LATIN SMALL LETTER E WITH MACRON
'\u0116' # 0x96 -> LATIN CAPITAL LETTER E WITH DOT ABOVE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\u0117' # 0x98 -> LATIN SMALL LETTER E WITH DOT ABOVE
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\u011a' # 0x9D -> LATIN CAPITAL LETTER E WITH CARON
'\u011b' # 0x9E -> LATIN SMALL LETTER E WITH CARON
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\u2020' # 0xA0 -> DAGGER
'\xb0' # 0xA1 -> DEGREE SIGN
'\u0118' # 0xA2 -> LATIN CAPITAL LETTER E WITH OGONEK
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u2122' # 0xAA -> TRADE MARK SIGN
'\u0119' # 0xAB -> LATIN SMALL LETTER E WITH OGONEK
'\xa8' # 0xAC -> DIAERESIS
'\u2260' # 0xAD -> NOT EQUAL TO
'\u0123' # 0xAE -> LATIN SMALL LETTER G WITH CEDILLA
'\u012e' # 0xAF -> LATIN CAPITAL LETTER I WITH OGONEK
'\u012f' # 0xB0 -> LATIN SMALL LETTER I WITH OGONEK
'\u012a' # 0xB1 -> LATIN CAPITAL LETTER I WITH MACRON
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\u012b' # 0xB4 -> LATIN SMALL LETTER I WITH MACRON
'\u0136' # 0xB5 -> LATIN CAPITAL LETTER K WITH CEDILLA
'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
'\u2211' # 0xB7 -> N-ARY SUMMATION
'\u0142' # 0xB8 -> LATIN SMALL LETTER L WITH STROKE
'\u013b' # 0xB9 -> LATIN CAPITAL LETTER L WITH CEDILLA
'\u013c' # 0xBA -> LATIN SMALL LETTER L WITH CEDILLA
'\u013d' # 0xBB -> LATIN CAPITAL LETTER L WITH CARON
'\u013e' # 0xBC -> LATIN SMALL LETTER L WITH CARON
'\u0139' # 0xBD -> LATIN CAPITAL LETTER L WITH ACUTE
'\u013a' # 0xBE -> LATIN SMALL LETTER L WITH ACUTE
'\u0145' # 0xBF -> LATIN CAPITAL LETTER N WITH CEDILLA
'\u0146' # 0xC0 -> LATIN SMALL LETTER N WITH CEDILLA
'\u0143' # 0xC1 -> LATIN CAPITAL LETTER N WITH ACUTE
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0144' # 0xC4 -> LATIN SMALL LETTER N WITH ACUTE
'\u0147' # 0xC5 -> LATIN CAPITAL LETTER N WITH CARON
'\u2206' # 0xC6 -> INCREMENT
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\u0148' # 0xCB -> LATIN SMALL LETTER N WITH CARON
'\u0150' # 0xCC -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
'\u0151' # 0xCE -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
'\u014c' # 0xCF -> LATIN CAPITAL LETTER O WITH MACRON
'\u2013' # 0xD0 -> EN DASH
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u25ca' # 0xD7 -> LOZENGE
'\u014d' # 0xD8 -> LATIN SMALL LETTER O WITH MACRON
'\u0154' # 0xD9 -> LATIN CAPITAL LETTER R WITH ACUTE
'\u0155' # 0xDA -> LATIN SMALL LETTER R WITH ACUTE
'\u0158' # 0xDB -> LATIN CAPITAL LETTER R WITH CARON
'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u0159' # 0xDE -> LATIN SMALL LETTER R WITH CARON
'\u0156' # 0xDF -> LATIN CAPITAL LETTER R WITH CEDILLA
'\u0157' # 0xE0 -> LATIN SMALL LETTER R WITH CEDILLA
'\u0160' # 0xE1 -> LATIN CAPITAL LETTER S WITH CARON
'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
'\u0161' # 0xE4 -> LATIN SMALL LETTER S WITH CARON
'\u015a' # 0xE5 -> LATIN CAPITAL LETTER S WITH ACUTE
'\u015b' # 0xE6 -> LATIN SMALL LETTER S WITH ACUTE
'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
'\u0164' # 0xE8 -> LATIN CAPITAL LETTER T WITH CARON
'\u0165' # 0xE9 -> LATIN SMALL LETTER T WITH CARON
'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
'\u017d' # 0xEB -> LATIN CAPITAL LETTER Z WITH CARON
'\u017e' # 0xEC -> LATIN SMALL LETTER Z WITH CARON
'\u016a' # 0xED -> LATIN CAPITAL LETTER U WITH MACRON
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u016b' # 0xF0 -> LATIN SMALL LETTER U WITH MACRON
'\u016e' # 0xF1 -> LATIN CAPITAL LETTER U WITH RING ABOVE
'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
'\u016f' # 0xF3 -> LATIN SMALL LETTER U WITH RING ABOVE
'\u0170' # 0xF4 -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
'\u0171' # 0xF5 -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
'\u0172' # 0xF6 -> LATIN CAPITAL LETTER U WITH OGONEK
'\u0173' # 0xF7 -> LATIN SMALL LETTER U WITH OGONEK
'\xdd' # 0xF8 -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xfd' # 0xF9 -> LATIN SMALL LETTER Y WITH ACUTE
'\u0137' # 0xFA -> LATIN SMALL LETTER K WITH CEDILLA
'\u017b' # 0xFB -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\u0141' # 0xFC -> LATIN CAPITAL LETTER L WITH STROKE
'\u017c' # 0xFD -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u0122' # 0xFE -> LATIN CAPITAL LETTER G WITH CEDILLA
'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
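# A minimal round-trip sketch (assumption: this module is installed as
# encodings.mac_centeuro so the codec registry resolves the name below).
# Per the decoding table above, U+0141 <-> 0xFC, U+00F3 <-> 0x97 and
# U+017A <-> 0x90:
assert 'Łódź'.encode('mac_centeuro') == b'\xfc\x97d\x90'
assert b'\xfc\x97d\x90'.decode('mac_centeuro') == 'Łódź'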
""" Python Character Mapping Codec mac_croatian generated from 'MAPPINGS/VENDORS/APPLE/CROATIAN.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='mac-croatian',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\u2020' # 0xA0 -> DAGGER
'\xb0' # 0xA1 -> DEGREE SIGN
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\u0160' # 0xA9 -> LATIN CAPITAL LETTER S WITH CARON
'\u2122' # 0xAA -> TRADE MARK SIGN
'\xb4' # 0xAB -> ACUTE ACCENT
'\xa8' # 0xAC -> DIAERESIS
'\u2260' # 0xAD -> NOT EQUAL TO
'\u017d' # 0xAE -> LATIN CAPITAL LETTER Z WITH CARON
'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
'\u221e' # 0xB0 -> INFINITY
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\u2206' # 0xB4 -> INCREMENT
'\xb5' # 0xB5 -> MICRO SIGN
'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
'\u2211' # 0xB7 -> N-ARY SUMMATION
'\u220f' # 0xB8 -> N-ARY PRODUCT
'\u0161' # 0xB9 -> LATIN SMALL LETTER S WITH CARON
'\u222b' # 0xBA -> INTEGRAL
'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
'\u017e' # 0xBE -> LATIN SMALL LETTER Z WITH CARON
'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
'\xbf' # 0xC0 -> INVERTED QUESTION MARK
'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
'\u2248' # 0xC5 -> ALMOST EQUAL TO
'\u0106' # 0xC6 -> LATIN CAPITAL LETTER C WITH ACUTE
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u010c' # 0xC8 -> LATIN CAPITAL LETTER C WITH CARON
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
'\u0110' # 0xD0 -> LATIN CAPITAL LETTER D WITH STROKE
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u25ca' # 0xD7 -> LOZENGE
'\uf8ff' # 0xD8 -> Apple logo
'\xa9' # 0xD9 -> COPYRIGHT SIGN
'\u2044' # 0xDA -> FRACTION SLASH
'\u20ac' # 0xDB -> EURO SIGN
'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\xc6' # 0xDE -> LATIN CAPITAL LETTER AE
'\xbb' # 0xDF -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2013' # 0xE0 -> EN DASH
'\xb7' # 0xE1 -> MIDDLE DOT
'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
'\u2030' # 0xE4 -> PER MILLE SIGN
'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\u0107' # 0xE6 -> LATIN SMALL LETTER C WITH ACUTE
'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
'\u010d' # 0xE8 -> LATIN SMALL LETTER C WITH CARON
'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u0111' # 0xF0 -> LATIN SMALL LETTER D WITH STROKE
'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u02dc' # 0xF7 -> SMALL TILDE
'\xaf' # 0xF8 -> MACRON
'\u03c0' # 0xF9 -> GREEK SMALL LETTER PI
'\xcb' # 0xFA -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\u02da' # 0xFB -> RING ABOVE
'\xb8' # 0xFC -> CEDILLA
'\xca' # 0xFD -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xe6' # 0xFE -> LATIN SMALL LETTER AE
'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
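# A round-trip sketch (assumption: this module is installed as
# encodings.mac_croatian). Per the table above, U+0160 -> 0xA9,
# U+0111 -> 0xF0 and U+017E -> 0xBE:
assert 'Šđž'.encode('mac_croatian') == b'\xa9\xf0\xbe'
assert b'\xa9\xf0\xbe'.decode('mac_croatian') == 'Šđž'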
""" Python Character Mapping Codec mac_cyrillic generated from 'MAPPINGS/VENDORS/APPLE/CYRILLIC.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='mac-cyrillic',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\u0410' # 0x80 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0x81 -> CYRILLIC CAPITAL LETTER BE
'\u0412' # 0x82 -> CYRILLIC CAPITAL LETTER VE
'\u0413' # 0x83 -> CYRILLIC CAPITAL LETTER GHE
'\u0414' # 0x84 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0x85 -> CYRILLIC CAPITAL LETTER IE
'\u0416' # 0x86 -> CYRILLIC CAPITAL LETTER ZHE
'\u0417' # 0x87 -> CYRILLIC CAPITAL LETTER ZE
'\u0418' # 0x88 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0x89 -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0x8A -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0x8B -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0x8C -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0x8D -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0x8E -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0x8F -> CYRILLIC CAPITAL LETTER PE
'\u0420' # 0x90 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0x91 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0x92 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0x93 -> CYRILLIC CAPITAL LETTER U
'\u0424' # 0x94 -> CYRILLIC CAPITAL LETTER EF
'\u0425' # 0x95 -> CYRILLIC CAPITAL LETTER HA
'\u0426' # 0x96 -> CYRILLIC CAPITAL LETTER TSE
'\u0427' # 0x97 -> CYRILLIC CAPITAL LETTER CHE
'\u0428' # 0x98 -> CYRILLIC CAPITAL LETTER SHA
'\u0429' # 0x99 -> CYRILLIC CAPITAL LETTER SHCHA
'\u042a' # 0x9A -> CYRILLIC CAPITAL LETTER HARD SIGN
'\u042b' # 0x9B -> CYRILLIC CAPITAL LETTER YERU
'\u042c' # 0x9C -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042d' # 0x9D -> CYRILLIC CAPITAL LETTER E
'\u042e' # 0x9E -> CYRILLIC CAPITAL LETTER YU
'\u042f' # 0x9F -> CYRILLIC CAPITAL LETTER YA
'\u2020' # 0xA0 -> DAGGER
'\xb0' # 0xA1 -> DEGREE SIGN
'\u0490' # 0xA2 -> CYRILLIC CAPITAL LETTER GHE WITH UPTURN
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\u0406' # 0xA7 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u2122' # 0xAA -> TRADE MARK SIGN
'\u0402' # 0xAB -> CYRILLIC CAPITAL LETTER DJE
'\u0452' # 0xAC -> CYRILLIC SMALL LETTER DJE
'\u2260' # 0xAD -> NOT EQUAL TO
'\u0403' # 0xAE -> CYRILLIC CAPITAL LETTER GJE
'\u0453' # 0xAF -> CYRILLIC SMALL LETTER GJE
'\u221e' # 0xB0 -> INFINITY
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\u0456' # 0xB4 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
'\xb5' # 0xB5 -> MICRO SIGN
'\u0491' # 0xB6 -> CYRILLIC SMALL LETTER GHE WITH UPTURN
'\u0408' # 0xB7 -> CYRILLIC CAPITAL LETTER JE
'\u0404' # 0xB8 -> CYRILLIC CAPITAL LETTER UKRAINIAN IE
'\u0454' # 0xB9 -> CYRILLIC SMALL LETTER UKRAINIAN IE
'\u0407' # 0xBA -> CYRILLIC CAPITAL LETTER YI
'\u0457' # 0xBB -> CYRILLIC SMALL LETTER YI
'\u0409' # 0xBC -> CYRILLIC CAPITAL LETTER LJE
'\u0459' # 0xBD -> CYRILLIC SMALL LETTER LJE
'\u040a' # 0xBE -> CYRILLIC CAPITAL LETTER NJE
'\u045a' # 0xBF -> CYRILLIC SMALL LETTER NJE
'\u0458' # 0xC0 -> CYRILLIC SMALL LETTER JE
'\u0405' # 0xC1 -> CYRILLIC CAPITAL LETTER DZE
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
'\u2248' # 0xC5 -> ALMOST EQUAL TO
'\u2206' # 0xC6 -> INCREMENT
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\u040b' # 0xCB -> CYRILLIC CAPITAL LETTER TSHE
'\u045b' # 0xCC -> CYRILLIC SMALL LETTER TSHE
'\u040c' # 0xCD -> CYRILLIC CAPITAL LETTER KJE
'\u045c' # 0xCE -> CYRILLIC SMALL LETTER KJE
'\u0455' # 0xCF -> CYRILLIC SMALL LETTER DZE
'\u2013' # 0xD0 -> EN DASH
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u201e' # 0xD7 -> DOUBLE LOW-9 QUOTATION MARK
'\u040e' # 0xD8 -> CYRILLIC CAPITAL LETTER SHORT U
'\u045e' # 0xD9 -> CYRILLIC SMALL LETTER SHORT U
'\u040f' # 0xDA -> CYRILLIC CAPITAL LETTER DZHE
'\u045f' # 0xDB -> CYRILLIC SMALL LETTER DZHE
'\u2116' # 0xDC -> NUMERO SIGN
'\u0401' # 0xDD -> CYRILLIC CAPITAL LETTER IO
'\u0451' # 0xDE -> CYRILLIC SMALL LETTER IO
'\u044f' # 0xDF -> CYRILLIC SMALL LETTER YA
'\u0430' # 0xE0 -> CYRILLIC SMALL LETTER A
'\u0431' # 0xE1 -> CYRILLIC SMALL LETTER BE
'\u0432' # 0xE2 -> CYRILLIC SMALL LETTER VE
'\u0433' # 0xE3 -> CYRILLIC SMALL LETTER GHE
'\u0434' # 0xE4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0xE5 -> CYRILLIC SMALL LETTER IE
'\u0436' # 0xE6 -> CYRILLIC SMALL LETTER ZHE
'\u0437' # 0xE7 -> CYRILLIC SMALL LETTER ZE
'\u0438' # 0xE8 -> CYRILLIC SMALL LETTER I
'\u0439' # 0xE9 -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0xEA -> CYRILLIC SMALL LETTER KA
'\u043b' # 0xEB -> CYRILLIC SMALL LETTER EL
'\u043c' # 0xEC -> CYRILLIC SMALL LETTER EM
'\u043d' # 0xED -> CYRILLIC SMALL LETTER EN
'\u043e' # 0xEE -> CYRILLIC SMALL LETTER O
'\u043f' # 0xEF -> CYRILLIC SMALL LETTER PE
'\u0440' # 0xF0 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0xF1 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0xF2 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0xF3 -> CYRILLIC SMALL LETTER U
'\u0444' # 0xF4 -> CYRILLIC SMALL LETTER EF
'\u0445' # 0xF5 -> CYRILLIC SMALL LETTER HA
'\u0446' # 0xF6 -> CYRILLIC SMALL LETTER TSE
'\u0447' # 0xF7 -> CYRILLIC SMALL LETTER CHE
'\u0448' # 0xF8 -> CYRILLIC SMALL LETTER SHA
'\u0449' # 0xF9 -> CYRILLIC SMALL LETTER SHCHA
'\u044a' # 0xFA -> CYRILLIC SMALL LETTER HARD SIGN
'\u044b' # 0xFB -> CYRILLIC SMALL LETTER YERU
'\u044c' # 0xFC -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044d' # 0xFD -> CYRILLIC SMALL LETTER E
'\u044e' # 0xFE -> CYRILLIC SMALL LETTER YU
'\u20ac' # 0xFF -> EURO SIGN
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
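# A round-trip sketch (assumption: this module is installed as
# encodings.mac_cyrillic). Per the table above, the Russian capitals
# occupy 0x80-0x9F and the lowercase letters 0xDF-0xFE, e.g.:
assert 'Привет'.encode('mac_cyrillic') == b'\x8f\xf0\xe8\xe2\xe5\xf2'
assert b'\x8f\xf0\xe8\xe2\xe5\xf2'.decode('mac_cyrillic') == 'Привет'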
""" Python Character Mapping Codec mac_farsi generated from 'MAPPINGS/VENDORS/APPLE/FARSI.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='mac-farsi',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE, left-right
'!' # 0x21 -> EXCLAMATION MARK, left-right
'"' # 0x22 -> QUOTATION MARK, left-right
'#' # 0x23 -> NUMBER SIGN, left-right
'$' # 0x24 -> DOLLAR SIGN, left-right
'%' # 0x25 -> PERCENT SIGN, left-right
'&' # 0x26 -> AMPERSAND, left-right
"'" # 0x27 -> APOSTROPHE, left-right
'(' # 0x28 -> LEFT PARENTHESIS, left-right
')' # 0x29 -> RIGHT PARENTHESIS, left-right
'*' # 0x2A -> ASTERISK, left-right
'+' # 0x2B -> PLUS SIGN, left-right
',' # 0x2C -> COMMA, left-right; in Arabic-script context, displayed as 0x066C ARABIC THOUSANDS SEPARATOR
'-' # 0x2D -> HYPHEN-MINUS, left-right
'.' # 0x2E -> FULL STOP, left-right; in Arabic-script context, displayed as 0x066B ARABIC DECIMAL SEPARATOR
'/' # 0x2F -> SOLIDUS, left-right
'0' # 0x30 -> DIGIT ZERO; in Arabic-script context, displayed as 0x06F0 EXTENDED ARABIC-INDIC DIGIT ZERO
'1' # 0x31 -> DIGIT ONE; in Arabic-script context, displayed as 0x06F1 EXTENDED ARABIC-INDIC DIGIT ONE
'2' # 0x32 -> DIGIT TWO; in Arabic-script context, displayed as 0x06F2 EXTENDED ARABIC-INDIC DIGIT TWO
'3' # 0x33 -> DIGIT THREE; in Arabic-script context, displayed as 0x06F3 EXTENDED ARABIC-INDIC DIGIT THREE
'4' # 0x34 -> DIGIT FOUR; in Arabic-script context, displayed as 0x06F4 EXTENDED ARABIC-INDIC DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE; in Arabic-script context, displayed as 0x06F5 EXTENDED ARABIC-INDIC DIGIT FIVE
'6' # 0x36 -> DIGIT SIX; in Arabic-script context, displayed as 0x06F6 EXTENDED ARABIC-INDIC DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN; in Arabic-script context, displayed as 0x06F7 EXTENDED ARABIC-INDIC DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT; in Arabic-script context, displayed as 0x06F8 EXTENDED ARABIC-INDIC DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE; in Arabic-script context, displayed as 0x06F9 EXTENDED ARABIC-INDIC DIGIT NINE
':' # 0x3A -> COLON, left-right
';' # 0x3B -> SEMICOLON, left-right
'<' # 0x3C -> LESS-THAN SIGN, left-right
'=' # 0x3D -> EQUALS SIGN, left-right
'>' # 0x3E -> GREATER-THAN SIGN, left-right
'?' # 0x3F -> QUESTION MARK, left-right
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET, left-right
'\\' # 0x5C -> REVERSE SOLIDUS, left-right
']' # 0x5D -> RIGHT SQUARE BRACKET, left-right
'^' # 0x5E -> CIRCUMFLEX ACCENT, left-right
'_' # 0x5F -> LOW LINE, left-right
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET, left-right
'|' # 0x7C -> VERTICAL LINE, left-right
'}' # 0x7D -> RIGHT CURLY BRACKET, left-right
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xa0' # 0x81 -> NO-BREAK SPACE, right-left
'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\u06ba' # 0x8B -> ARABIC LETTER NOON GHUNNA
'\xab' # 0x8C -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\u2026' # 0x93 -> HORIZONTAL ELLIPSIS, right-left
'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\xbb' # 0x98 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, right-left
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0x9B -> DIVISION SIGN, right-left
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
' ' # 0xA0 -> SPACE, right-left
'!' # 0xA1 -> EXCLAMATION MARK, right-left
'"' # 0xA2 -> QUOTATION MARK, right-left
'#' # 0xA3 -> NUMBER SIGN, right-left
'$' # 0xA4 -> DOLLAR SIGN, right-left
'\u066a' # 0xA5 -> ARABIC PERCENT SIGN
'&' # 0xA6 -> AMPERSAND, right-left
"'" # 0xA7 -> APOSTROPHE, right-left
'(' # 0xA8 -> LEFT PARENTHESIS, right-left
')' # 0xA9 -> RIGHT PARENTHESIS, right-left
'*' # 0xAA -> ASTERISK, right-left
'+' # 0xAB -> PLUS SIGN, right-left
'\u060c' # 0xAC -> ARABIC COMMA
'-' # 0xAD -> HYPHEN-MINUS, right-left
'.' # 0xAE -> FULL STOP, right-left
'/' # 0xAF -> SOLIDUS, right-left
'\u06f0' # 0xB0 -> EXTENDED ARABIC-INDIC DIGIT ZERO, right-left (need override)
'\u06f1' # 0xB1 -> EXTENDED ARABIC-INDIC DIGIT ONE, right-left (need override)
'\u06f2' # 0xB2 -> EXTENDED ARABIC-INDIC DIGIT TWO, right-left (need override)
'\u06f3' # 0xB3 -> EXTENDED ARABIC-INDIC DIGIT THREE, right-left (need override)
'\u06f4' # 0xB4 -> EXTENDED ARABIC-INDIC DIGIT FOUR, right-left (need override)
'\u06f5' # 0xB5 -> EXTENDED ARABIC-INDIC DIGIT FIVE, right-left (need override)
'\u06f6' # 0xB6 -> EXTENDED ARABIC-INDIC DIGIT SIX, right-left (need override)
'\u06f7' # 0xB7 -> EXTENDED ARABIC-INDIC DIGIT SEVEN, right-left (need override)
'\u06f8' # 0xB8 -> EXTENDED ARABIC-INDIC DIGIT EIGHT, right-left (need override)
'\u06f9' # 0xB9 -> EXTENDED ARABIC-INDIC DIGIT NINE, right-left (need override)
':' # 0xBA -> COLON, right-left
'\u061b' # 0xBB -> ARABIC SEMICOLON
'<' # 0xBC -> LESS-THAN SIGN, right-left
'=' # 0xBD -> EQUALS SIGN, right-left
'>' # 0xBE -> GREATER-THAN SIGN, right-left
'\u061f' # 0xBF -> ARABIC QUESTION MARK
'\u274a' # 0xC0 -> EIGHT TEARDROP-SPOKED PROPELLER ASTERISK, right-left
'\u0621' # 0xC1 -> ARABIC LETTER HAMZA
'\u0622' # 0xC2 -> ARABIC LETTER ALEF WITH MADDA ABOVE
'\u0623' # 0xC3 -> ARABIC LETTER ALEF WITH HAMZA ABOVE
'\u0624' # 0xC4 -> ARABIC LETTER WAW WITH HAMZA ABOVE
'\u0625' # 0xC5 -> ARABIC LETTER ALEF WITH HAMZA BELOW
'\u0626' # 0xC6 -> ARABIC LETTER YEH WITH HAMZA ABOVE
'\u0627' # 0xC7 -> ARABIC LETTER ALEF
'\u0628' # 0xC8 -> ARABIC LETTER BEH
'\u0629' # 0xC9 -> ARABIC LETTER TEH MARBUTA
'\u062a' # 0xCA -> ARABIC LETTER TEH
'\u062b' # 0xCB -> ARABIC LETTER THEH
'\u062c' # 0xCC -> ARABIC LETTER JEEM
'\u062d' # 0xCD -> ARABIC LETTER HAH
'\u062e' # 0xCE -> ARABIC LETTER KHAH
'\u062f' # 0xCF -> ARABIC LETTER DAL
'\u0630' # 0xD0 -> ARABIC LETTER THAL
'\u0631' # 0xD1 -> ARABIC LETTER REH
'\u0632' # 0xD2 -> ARABIC LETTER ZAIN
'\u0633' # 0xD3 -> ARABIC LETTER SEEN
'\u0634' # 0xD4 -> ARABIC LETTER SHEEN
'\u0635' # 0xD5 -> ARABIC LETTER SAD
'\u0636' # 0xD6 -> ARABIC LETTER DAD
'\u0637' # 0xD7 -> ARABIC LETTER TAH
'\u0638' # 0xD8 -> ARABIC LETTER ZAH
'\u0639' # 0xD9 -> ARABIC LETTER AIN
'\u063a' # 0xDA -> ARABIC LETTER GHAIN
'[' # 0xDB -> LEFT SQUARE BRACKET, right-left
'\\' # 0xDC -> REVERSE SOLIDUS, right-left
']' # 0xDD -> RIGHT SQUARE BRACKET, right-left
'^' # 0xDE -> CIRCUMFLEX ACCENT, right-left
'_' # 0xDF -> LOW LINE, right-left
'\u0640' # 0xE0 -> ARABIC TATWEEL
'\u0641' # 0xE1 -> ARABIC LETTER FEH
'\u0642' # 0xE2 -> ARABIC LETTER QAF
'\u0643' # 0xE3 -> ARABIC LETTER KAF
'\u0644' # 0xE4 -> ARABIC LETTER LAM
'\u0645' # 0xE5 -> ARABIC LETTER MEEM
'\u0646' # 0xE6 -> ARABIC LETTER NOON
'\u0647' # 0xE7 -> ARABIC LETTER HEH
'\u0648' # 0xE8 -> ARABIC LETTER WAW
'\u0649' # 0xE9 -> ARABIC LETTER ALEF MAKSURA
'\u064a' # 0xEA -> ARABIC LETTER YEH
'\u064b' # 0xEB -> ARABIC FATHATAN
'\u064c' # 0xEC -> ARABIC DAMMATAN
'\u064d' # 0xED -> ARABIC KASRATAN
'\u064e' # 0xEE -> ARABIC FATHA
'\u064f' # 0xEF -> ARABIC DAMMA
'\u0650' # 0xF0 -> ARABIC KASRA
'\u0651' # 0xF1 -> ARABIC SHADDA
'\u0652' # 0xF2 -> ARABIC SUKUN
'\u067e' # 0xF3 -> ARABIC LETTER PEH
'\u0679' # 0xF4 -> ARABIC LETTER TTEH
'\u0686' # 0xF5 -> ARABIC LETTER TCHEH
'\u06d5' # 0xF6 -> ARABIC LETTER AE
'\u06a4' # 0xF7 -> ARABIC LETTER VEH
'\u06af' # 0xF8 -> ARABIC LETTER GAF
'\u0688' # 0xF9 -> ARABIC LETTER DDAL
'\u0691' # 0xFA -> ARABIC LETTER RREH
'{' # 0xFB -> LEFT CURLY BRACKET, right-left
'|' # 0xFC -> VERTICAL LINE, right-left
'}' # 0xFD -> RIGHT CURLY BRACKET, right-left
'\u0698' # 0xFE -> ARABIC LETTER JEH
'\u06d2' # 0xFF -> ARABIC LETTER YEH BARREE
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
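# A round-trip sketch (assumption: this module is installed as
# encodings.mac_farsi). Per the table above, U+0633 -> 0xD3,
# U+0644 -> 0xE4, U+0627 -> 0xC7 and U+0645 -> 0xE5, so the word
# 'سلام' (seen, lam, alef, meem in logical order) maps as:
assert 'سلام'.encode('mac_farsi') == b'\xd3\xe4\xc7\xe5'
assert b'\xd3\xe4\xc7\xe5'.decode('mac_farsi') == 'سلام'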
""" Python Character Mapping Codec mac_greek generated from 'MAPPINGS/VENDORS/APPLE/GREEK.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)
    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
    pass
class StreamReader(Codec,codecs.StreamReader):
    pass
### encodings module API
def getregentry():
    return codecs.CodecInfo(
        name='mac-greek',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xb9' # 0x81 -> SUPERSCRIPT ONE
'\xb2' # 0x82 -> SUPERSCRIPT TWO
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xb3' # 0x84 -> SUPERSCRIPT THREE
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\u0385' # 0x87 -> GREEK DIALYTIKA TONOS
'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\u0384' # 0x8B -> GREEK TONOS
'\xa8' # 0x8C -> DIAERESIS
'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xa3' # 0x92 -> POUND SIGN
'\u2122' # 0x93 -> TRADE MARK SIGN
'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
'\u2022' # 0x96 -> BULLET
'\xbd' # 0x97 -> VULGAR FRACTION ONE HALF
'\u2030' # 0x98 -> PER MILLE SIGN
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xa6' # 0x9B -> BROKEN BAR
'\u20ac' # 0x9C -> EURO SIGN # before Mac OS 9.2.2, was SOFT HYPHEN
'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\u2020' # 0xA0 -> DAGGER
'\u0393' # 0xA1 -> GREEK CAPITAL LETTER GAMMA
'\u0394' # 0xA2 -> GREEK CAPITAL LETTER DELTA
'\u0398' # 0xA3 -> GREEK CAPITAL LETTER THETA
'\u039b' # 0xA4 -> GREEK CAPITAL LETTER LAMDA
'\u039e' # 0xA5 -> GREEK CAPITAL LETTER XI
'\u03a0' # 0xA6 -> GREEK CAPITAL LETTER PI
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u03a3' # 0xAA -> GREEK CAPITAL LETTER SIGMA
'\u03aa' # 0xAB -> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
'\xa7' # 0xAC -> SECTION SIGN
'\u2260' # 0xAD -> NOT EQUAL TO
'\xb0' # 0xAE -> DEGREE SIGN
'\xb7' # 0xAF -> MIDDLE DOT
'\u0391' # 0xB0 -> GREEK CAPITAL LETTER ALPHA
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\xa5' # 0xB4 -> YEN SIGN
'\u0392' # 0xB5 -> GREEK CAPITAL LETTER BETA
'\u0395' # 0xB6 -> GREEK CAPITAL LETTER EPSILON
'\u0396' # 0xB7 -> GREEK CAPITAL LETTER ZETA
'\u0397' # 0xB8 -> GREEK CAPITAL LETTER ETA
'\u0399' # 0xB9 -> GREEK CAPITAL LETTER IOTA
'\u039a' # 0xBA -> GREEK CAPITAL LETTER KAPPA
'\u039c' # 0xBB -> GREEK CAPITAL LETTER MU
'\u03a6' # 0xBC -> GREEK CAPITAL LETTER PHI
'\u03ab' # 0xBD -> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
'\u03a8' # 0xBE -> GREEK CAPITAL LETTER PSI
'\u03a9' # 0xBF -> GREEK CAPITAL LETTER OMEGA
'\u03ac' # 0xC0 -> GREEK SMALL LETTER ALPHA WITH TONOS
'\u039d' # 0xC1 -> GREEK CAPITAL LETTER NU
'\xac' # 0xC2 -> NOT SIGN
'\u039f' # 0xC3 -> GREEK CAPITAL LETTER OMICRON
'\u03a1' # 0xC4 -> GREEK CAPITAL LETTER RHO
'\u2248' # 0xC5 -> ALMOST EQUAL TO
'\u03a4' # 0xC6 -> GREEK CAPITAL LETTER TAU
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\u03a5' # 0xCB -> GREEK CAPITAL LETTER UPSILON
'\u03a7' # 0xCC -> GREEK CAPITAL LETTER CHI
'\u0386' # 0xCD -> GREEK CAPITAL LETTER ALPHA WITH TONOS
'\u0388' # 0xCE -> GREEK CAPITAL LETTER EPSILON WITH TONOS
'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
'\u2013' # 0xD0 -> EN DASH
'\u2015' # 0xD1 -> HORIZONTAL BAR
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u0389' # 0xD7 -> GREEK CAPITAL LETTER ETA WITH TONOS
'\u038a' # 0xD8 -> GREEK CAPITAL LETTER IOTA WITH TONOS
'\u038c' # 0xD9 -> GREEK CAPITAL LETTER OMICRON WITH TONOS
'\u038e' # 0xDA -> GREEK CAPITAL LETTER UPSILON WITH TONOS
'\u03ad' # 0xDB -> GREEK SMALL LETTER EPSILON WITH TONOS
'\u03ae' # 0xDC -> GREEK SMALL LETTER ETA WITH TONOS
'\u03af' # 0xDD -> GREEK SMALL LETTER IOTA WITH TONOS
'\u03cc' # 0xDE -> GREEK SMALL LETTER OMICRON WITH TONOS
'\u038f' # 0xDF -> GREEK CAPITAL LETTER OMEGA WITH TONOS
'\u03cd' # 0xE0 -> GREEK SMALL LETTER UPSILON WITH TONOS
'\u03b1' # 0xE1 -> GREEK SMALL LETTER ALPHA
'\u03b2' # 0xE2 -> GREEK SMALL LETTER BETA
'\u03c8' # 0xE3 -> GREEK SMALL LETTER PSI
'\u03b4' # 0xE4 -> GREEK SMALL LETTER DELTA
'\u03b5' # 0xE5 -> GREEK SMALL LETTER EPSILON
'\u03c6' # 0xE6 -> GREEK SMALL LETTER PHI
'\u03b3' # 0xE7 -> GREEK SMALL LETTER GAMMA
'\u03b7' # 0xE8 -> GREEK SMALL LETTER ETA
'\u03b9' # 0xE9 -> GREEK SMALL LETTER IOTA
'\u03be' # 0xEA -> GREEK SMALL LETTER XI
'\u03ba' # 0xEB -> GREEK SMALL LETTER KAPPA
'\u03bb' # 0xEC -> GREEK SMALL LETTER LAMDA
'\u03bc' # 0xED -> GREEK SMALL LETTER MU
'\u03bd' # 0xEE -> GREEK SMALL LETTER NU
'\u03bf' # 0xEF -> GREEK SMALL LETTER OMICRON
'\u03c0' # 0xF0 -> GREEK SMALL LETTER PI
'\u03ce' # 0xF1 -> GREEK SMALL LETTER OMEGA WITH TONOS
'\u03c1' # 0xF2 -> GREEK SMALL LETTER RHO
'\u03c3' # 0xF3 -> GREEK SMALL LETTER SIGMA
'\u03c4' # 0xF4 -> GREEK SMALL LETTER TAU
'\u03b8' # 0xF5 -> GREEK SMALL LETTER THETA
'\u03c9' # 0xF6 -> GREEK SMALL LETTER OMEGA
'\u03c2' # 0xF7 -> GREEK SMALL LETTER FINAL SIGMA
'\u03c7' # 0xF8 -> GREEK SMALL LETTER CHI
'\u03c5' # 0xF9 -> GREEK SMALL LETTER UPSILON
'\u03b6' # 0xFA -> GREEK SMALL LETTER ZETA
'\u03ca' # 0xFB -> GREEK SMALL LETTER IOTA WITH DIALYTIKA
'\u03cb' # 0xFC -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
'\u0390' # 0xFD -> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
'\u03b0' # 0xFE -> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
'\xad' # 0xFF -> SOFT HYPHEN # before Mac OS 9.2.2, was undefined
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
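The encoding table is derived mechanically from the decoding table: codecs.charmap_build inverts the 256-character decoding string into a byte lookup map, so every byte -> character table automatically yields its character -> byte inverse. A minimal sketch of that relationship, with a hypothetical 3-entry table standing in for the full 256-entry string:

import codecs

# Hypothetical toy table: byte 0x00 -> 'a', 0x01 -> 'b', 0x02 -> 'c'.
demo_decoding_table = 'abc'
demo_encoding_table = codecs.charmap_build(demo_decoding_table)

# Byte 0x01 decodes to 'b'; 'b' encodes back to byte 0x01.
assert codecs.charmap_decode(b'\x01', 'strict', demo_decoding_table)[0] == 'b'
assert codecs.charmap_encode('b', 'strict', demo_encoding_table)[0] == b'\x01'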
""" Python Character Mapping Codec mac_iceland generated from 'MAPPINGS/VENDORS/APPLE/ICELAND.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-iceland',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\xdd' # 0xA0 -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xb0' # 0xA1 -> DEGREE SIGN
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u2122' # 0xAA -> TRADE MARK SIGN
'\xb4' # 0xAB -> ACUTE ACCENT
'\xa8' # 0xAC -> DIAERESIS
'\u2260' # 0xAD -> NOT EQUAL TO
'\xc6' # 0xAE -> LATIN CAPITAL LETTER AE
'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
'\u221e' # 0xB0 -> INFINITY
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\xa5' # 0xB4 -> YEN SIGN
'\xb5' # 0xB5 -> MICRO SIGN
'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
'\u2211' # 0xB7 -> N-ARY SUMMATION
'\u220f' # 0xB8 -> N-ARY PRODUCT
'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
'\u222b' # 0xBA -> INTEGRAL
'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
'\xe6' # 0xBE -> LATIN SMALL LETTER AE
'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
'\xbf' # 0xC0 -> INVERTED QUESTION MARK
'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
'\u2248' # 0xC5 -> ALMOST EQUAL TO
'\u2206' # 0xC6 -> INCREMENT
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
'\u2013' # 0xD0 -> EN DASH
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u25ca' # 0xD7 -> LOZENGE
'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\u2044' # 0xDA -> FRACTION SLASH
'\u20ac' # 0xDB -> EURO SIGN
'\xd0' # 0xDC -> LATIN CAPITAL LETTER ETH
'\xf0' # 0xDD -> LATIN SMALL LETTER ETH
'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN
'\xfe' # 0xDF -> LATIN SMALL LETTER THORN
'\xfd' # 0xE0 -> LATIN SMALL LETTER Y WITH ACUTE
'\xb7' # 0xE1 -> MIDDLE DOT
'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
'\u2030' # 0xE4 -> PER MILLE SIGN
'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\uf8ff' # 0xF0 -> Apple logo
'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u02dc' # 0xF7 -> SMALL TILDE
'\xaf' # 0xF8 -> MACRON
'\u02d8' # 0xF9 -> BREVE
'\u02d9' # 0xFA -> DOT ABOVE
'\u02da' # 0xFB -> RING ABOVE
'\xb8' # 0xFC -> CEDILLA
'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
'\u02db' # 0xFE -> OGONEK
'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
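Because this module ships inside the encodings package, the codec is reachable through the ordinary str/bytes interface without touching the classes above. A quick round-trip sketch; the Icelandic letters in 'Þórður' are covered by the 0x97 and 0xDC-0xDF entries of the table above:

# str -> bytes via encoding_table, bytes -> str via decoding_table
text = 'Þórður'
raw = text.encode('mac_iceland')
assert raw.decode('mac_iceland') == text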
""" Python Character Mapping Codec mac_latin2 generated from 'MAPPINGS/VENDORS/MICSFT/MAC/LATIN2.TXT' with gencodec.py.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
(c) Copyright 2000 Guido van Rossum.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-latin2',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\u0100' # 0x81 -> LATIN CAPITAL LETTER A WITH MACRON
'\u0101' # 0x82 -> LATIN SMALL LETTER A WITH MACRON
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\u0104' # 0x84 -> LATIN CAPITAL LETTER A WITH OGONEK
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\u0105' # 0x88 -> LATIN SMALL LETTER A WITH OGONEK
'\u010c' # 0x89 -> LATIN CAPITAL LETTER C WITH CARON
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\u010d' # 0x8B -> LATIN SMALL LETTER C WITH CARON
'\u0106' # 0x8C -> LATIN CAPITAL LETTER C WITH ACUTE
'\u0107' # 0x8D -> LATIN SMALL LETTER C WITH ACUTE
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\u0179' # 0x8F -> LATIN CAPITAL LETTER Z WITH ACUTE
'\u017a' # 0x90 -> LATIN SMALL LETTER Z WITH ACUTE
'\u010e' # 0x91 -> LATIN CAPITAL LETTER D WITH CARON
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\u010f' # 0x93 -> LATIN SMALL LETTER D WITH CARON
'\u0112' # 0x94 -> LATIN CAPITAL LETTER E WITH MACRON
'\u0113' # 0x95 -> LATIN SMALL LETTER E WITH MACRON
'\u0116' # 0x96 -> LATIN CAPITAL LETTER E WITH DOT ABOVE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\u0117' # 0x98 -> LATIN SMALL LETTER E WITH DOT ABOVE
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\u011a' # 0x9D -> LATIN CAPITAL LETTER E WITH CARON
'\u011b' # 0x9E -> LATIN SMALL LETTER E WITH CARON
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\u2020' # 0xA0 -> DAGGER
'\xb0' # 0xA1 -> DEGREE SIGN
'\u0118' # 0xA2 -> LATIN CAPITAL LETTER E WITH OGONEK
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u2122' # 0xAA -> TRADE MARK SIGN
'\u0119' # 0xAB -> LATIN SMALL LETTER E WITH OGONEK
'\xa8' # 0xAC -> DIAERESIS
'\u2260' # 0xAD -> NOT EQUAL TO
'\u0123' # 0xAE -> LATIN SMALL LETTER G WITH CEDILLA
'\u012e' # 0xAF -> LATIN CAPITAL LETTER I WITH OGONEK
'\u012f' # 0xB0 -> LATIN SMALL LETTER I WITH OGONEK
'\u012a' # 0xB1 -> LATIN CAPITAL LETTER I WITH MACRON
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\u012b' # 0xB4 -> LATIN SMALL LETTER I WITH MACRON
'\u0136' # 0xB5 -> LATIN CAPITAL LETTER K WITH CEDILLA
'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
'\u2211' # 0xB7 -> N-ARY SUMMATION
'\u0142' # 0xB8 -> LATIN SMALL LETTER L WITH STROKE
'\u013b' # 0xB9 -> LATIN CAPITAL LETTER L WITH CEDILLA
'\u013c' # 0xBA -> LATIN SMALL LETTER L WITH CEDILLA
'\u013d' # 0xBB -> LATIN CAPITAL LETTER L WITH CARON
'\u013e' # 0xBC -> LATIN SMALL LETTER L WITH CARON
'\u0139' # 0xBD -> LATIN CAPITAL LETTER L WITH ACUTE
'\u013a' # 0xBE -> LATIN SMALL LETTER L WITH ACUTE
'\u0145' # 0xBF -> LATIN CAPITAL LETTER N WITH CEDILLA
'\u0146' # 0xC0 -> LATIN SMALL LETTER N WITH CEDILLA
'\u0143' # 0xC1 -> LATIN CAPITAL LETTER N WITH ACUTE
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0144' # 0xC4 -> LATIN SMALL LETTER N WITH ACUTE
'\u0147' # 0xC5 -> LATIN CAPITAL LETTER N WITH CARON
'\u2206' # 0xC6 -> INCREMENT
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\u0148' # 0xCB -> LATIN SMALL LETTER N WITH CARON
'\u0150' # 0xCC -> LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
'\u0151' # 0xCE -> LATIN SMALL LETTER O WITH DOUBLE ACUTE
'\u014c' # 0xCF -> LATIN CAPITAL LETTER O WITH MACRON
'\u2013' # 0xD0 -> EN DASH
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u25ca' # 0xD7 -> LOZENGE
'\u014d' # 0xD8 -> LATIN SMALL LETTER O WITH MACRON
'\u0154' # 0xD9 -> LATIN CAPITAL LETTER R WITH ACUTE
'\u0155' # 0xDA -> LATIN SMALL LETTER R WITH ACUTE
'\u0158' # 0xDB -> LATIN CAPITAL LETTER R WITH CARON
'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u0159' # 0xDE -> LATIN SMALL LETTER R WITH CARON
'\u0156' # 0xDF -> LATIN CAPITAL LETTER R WITH CEDILLA
'\u0157' # 0xE0 -> LATIN SMALL LETTER R WITH CEDILLA
'\u0160' # 0xE1 -> LATIN CAPITAL LETTER S WITH CARON
'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
'\u0161' # 0xE4 -> LATIN SMALL LETTER S WITH CARON
'\u015a' # 0xE5 -> LATIN CAPITAL LETTER S WITH ACUTE
'\u015b' # 0xE6 -> LATIN SMALL LETTER S WITH ACUTE
'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
'\u0164' # 0xE8 -> LATIN CAPITAL LETTER T WITH CARON
'\u0165' # 0xE9 -> LATIN SMALL LETTER T WITH CARON
'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
'\u017d' # 0xEB -> LATIN CAPITAL LETTER Z WITH CARON
'\u017e' # 0xEC -> LATIN SMALL LETTER Z WITH CARON
'\u016a' # 0xED -> LATIN CAPITAL LETTER U WITH MACRON
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\u016b' # 0xF0 -> LATIN SMALL LETTER U WITH MACRON
'\u016e' # 0xF1 -> LATIN CAPITAL LETTER U WITH RING ABOVE
'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
'\u016f' # 0xF3 -> LATIN SMALL LETTER U WITH RING ABOVE
'\u0170' # 0xF4 -> LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
'\u0171' # 0xF5 -> LATIN SMALL LETTER U WITH DOUBLE ACUTE
'\u0172' # 0xF6 -> LATIN CAPITAL LETTER U WITH OGONEK
'\u0173' # 0xF7 -> LATIN SMALL LETTER U WITH OGONEK
'\xdd' # 0xF8 -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xfd' # 0xF9 -> LATIN SMALL LETTER Y WITH ACUTE
'\u0137' # 0xFA -> LATIN SMALL LETTER K WITH CEDILLA
'\u017b' # 0xFB -> LATIN CAPITAL LETTER Z WITH DOT ABOVE
'\u0141' # 0xFC -> LATIN CAPITAL LETTER L WITH STROKE
'\u017c' # 0xFD -> LATIN SMALL LETTER Z WITH DOT ABOVE
'\u0122' # 0xFE -> LATIN CAPITAL LETTER G WITH CEDILLA
'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
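The errors argument threaded through every encode/decode call above selects one of Python's standard error handlers. A small sketch with the built-in mac_latin2 name; its table (above) contains no EURO SIGN entry, so 'strict' raises while 'replace' degrades to '?':

try:
    '€'.encode('mac_latin2')                     # 'strict' is the default
except UnicodeEncodeError:
    pass                                         # expected: no table entry
assert '€'.encode('mac_latin2', 'replace') == b'?'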
""" Python Character Mapping Codec mac_roman generated from 'MAPPINGS/VENDORS/APPLE/ROMAN.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-roman',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\u2020' # 0xA0 -> DAGGER
'\xb0' # 0xA1 -> DEGREE SIGN
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u2122' # 0xAA -> TRADE MARK SIGN
'\xb4' # 0xAB -> ACUTE ACCENT
'\xa8' # 0xAC -> DIAERESIS
'\u2260' # 0xAD -> NOT EQUAL TO
'\xc6' # 0xAE -> LATIN CAPITAL LETTER AE
'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
'\u221e' # 0xB0 -> INFINITY
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\xa5' # 0xB4 -> YEN SIGN
'\xb5' # 0xB5 -> MICRO SIGN
'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
'\u2211' # 0xB7 -> N-ARY SUMMATION
'\u220f' # 0xB8 -> N-ARY PRODUCT
'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
'\u222b' # 0xBA -> INTEGRAL
'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
'\xe6' # 0xBE -> LATIN SMALL LETTER AE
'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
'\xbf' # 0xC0 -> INVERTED QUESTION MARK
'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
'\u2248' # 0xC5 -> ALMOST EQUAL TO
'\u2206' # 0xC6 -> INCREMENT
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
'\u2013' # 0xD0 -> EN DASH
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u25ca' # 0xD7 -> LOZENGE
'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\u2044' # 0xDA -> FRACTION SLASH
'\u20ac' # 0xDB -> EURO SIGN
'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\ufb01' # 0xDE -> LATIN SMALL LIGATURE FI
'\ufb02' # 0xDF -> LATIN SMALL LIGATURE FL
'\u2021' # 0xE0 -> DOUBLE DAGGER
'\xb7' # 0xE1 -> MIDDLE DOT
'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
'\u2030' # 0xE4 -> PER MILLE SIGN
'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\uf8ff' # 0xF0 -> Apple logo
'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u02dc' # 0xF7 -> SMALL TILDE
'\xaf' # 0xF8 -> MACRON
'\u02d8' # 0xF9 -> BREVE
'\u02d9' # 0xFA -> DOT ABOVE
'\u02da' # 0xFB -> RING ABOVE
'\xb8' # 0xFC -> CEDILLA
'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
'\u02db' # 0xFE -> OGONEK
'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
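The IncrementalDecoder defined above is what codecs.getincrementaldecoder returns for this codec, and since a charmap codec maps one byte at a time, feeding the stream in arbitrary chunks is safe. A brief sketch using the built-in mac_roman name:

import codecs

dec = codecs.getincrementaldecoder('mac_roman')()
out = dec.decode(b'\xa5 bul') + dec.decode(b'let', final=True)
assert out == '\u2022 bullet'    # 0xA5 -> BULLET in the table above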
""" Python Character Mapping Codec mac_romanian generated from 'MAPPINGS/VENDORS/APPLE/ROMANIAN.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-romanian',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\u2020' # 0xA0 -> DAGGER
'\xb0' # 0xA1 -> DEGREE SIGN
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u2122' # 0xAA -> TRADE MARK SIGN
'\xb4' # 0xAB -> ACUTE ACCENT
'\xa8' # 0xAC -> DIAERESIS
'\u2260' # 0xAD -> NOT EQUAL TO
'\u0102' # 0xAE -> LATIN CAPITAL LETTER A WITH BREVE
'\u0218' # 0xAF -> LATIN CAPITAL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
'\u221e' # 0xB0 -> INFINITY
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\xa5' # 0xB4 -> YEN SIGN
'\xb5' # 0xB5 -> MICRO SIGN
'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
'\u2211' # 0xB7 -> N-ARY SUMMATION
'\u220f' # 0xB8 -> N-ARY PRODUCT
'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
'\u222b' # 0xBA -> INTEGRAL
'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
'\u0103' # 0xBE -> LATIN SMALL LETTER A WITH BREVE
'\u0219' # 0xBF -> LATIN SMALL LETTER S WITH COMMA BELOW # for Unicode 3.0 and later
'\xbf' # 0xC0 -> INVERTED QUESTION MARK
'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
'\u2248' # 0xC5 -> ALMOST EQUAL TO
'\u2206' # 0xC6 -> INCREMENT
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
'\u2013' # 0xD0 -> EN DASH
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u25ca' # 0xD7 -> LOZENGE
'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\u2044' # 0xDA -> FRACTION SLASH
'\u20ac' # 0xDB -> EURO SIGN
'\u2039' # 0xDC -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u203a' # 0xDD -> SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
'\u021a' # 0xDE -> LATIN CAPITAL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
'\u021b' # 0xDF -> LATIN SMALL LETTER T WITH COMMA BELOW # for Unicode 3.0 and later
'\u2021' # 0xE0 -> DOUBLE DAGGER
'\xb7' # 0xE1 -> MIDDLE DOT
'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
'\u2030' # 0xE4 -> PER MILLE SIGN
'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\uf8ff' # 0xF0 -> Apple logo
'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
'\u0131' # 0xF5 -> LATIN SMALL LETTER DOTLESS I
'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u02dc' # 0xF7 -> SMALL TILDE
'\xaf' # 0xF8 -> MACRON
'\u02d8' # 0xF9 -> BREVE
'\u02d9' # 0xFA -> DOT ABOVE
'\u02da' # 0xFB -> RING ABOVE
'\xb8' # 0xFC -> CEDILLA
'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
'\u02db' # 0xFE -> OGONEK
'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
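getregentry() is the hook the encodings package calls when it first imports a codec module. The same CodecInfo can also be registered by hand under an extra name through a codecs search function; the alias 'x-mac-romanian-demo' below is made up purely for illustration:

import codecs
import encodings.mac_romanian as mac_romanian

def _search(name):
    # Hypothetical alias; return None so other codec lookups pass through.
    return mac_romanian.getregentry() if name == 'x-mac-romanian-demo' else None

codecs.register(_search)
assert 'Știință'.encode('x-mac-romanian-demo').decode('mac_romanian') == 'Știință'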
""" Python Character Mapping Codec mac_turkish generated from 'MAPPINGS/VENDORS/APPLE/TURKISH.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_table)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_table)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='mac-turkish',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
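The StreamWriter and StreamReader classes above are what codecs.getwriter and codecs.getreader hand back for wrapping file-like objects. A minimal sketch with the built-in mac_turkish name and an in-memory buffer; 'İ' is the 0xDC entry in the decoding table that follows:

import codecs, io

buf = io.BytesIO()
codecs.getwriter('mac_turkish')(buf).write('İstanbul')
buf.seek(0)
assert codecs.getreader('mac_turkish')(buf).read() == 'İstanbul'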
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> CONTROL CHARACTER
'\x01' # 0x01 -> CONTROL CHARACTER
'\x02' # 0x02 -> CONTROL CHARACTER
'\x03' # 0x03 -> CONTROL CHARACTER
'\x04' # 0x04 -> CONTROL CHARACTER
'\x05' # 0x05 -> CONTROL CHARACTER
'\x06' # 0x06 -> CONTROL CHARACTER
'\x07' # 0x07 -> CONTROL CHARACTER
'\x08' # 0x08 -> CONTROL CHARACTER
'\t' # 0x09 -> CONTROL CHARACTER
'\n' # 0x0A -> CONTROL CHARACTER
'\x0b' # 0x0B -> CONTROL CHARACTER
'\x0c' # 0x0C -> CONTROL CHARACTER
'\r' # 0x0D -> CONTROL CHARACTER
'\x0e' # 0x0E -> CONTROL CHARACTER
'\x0f' # 0x0F -> CONTROL CHARACTER
'\x10' # 0x10 -> CONTROL CHARACTER
'\x11' # 0x11 -> CONTROL CHARACTER
'\x12' # 0x12 -> CONTROL CHARACTER
'\x13' # 0x13 -> CONTROL CHARACTER
'\x14' # 0x14 -> CONTROL CHARACTER
'\x15' # 0x15 -> CONTROL CHARACTER
'\x16' # 0x16 -> CONTROL CHARACTER
'\x17' # 0x17 -> CONTROL CHARACTER
'\x18' # 0x18 -> CONTROL CHARACTER
'\x19' # 0x19 -> CONTROL CHARACTER
'\x1a' # 0x1A -> CONTROL CHARACTER
'\x1b' # 0x1B -> CONTROL CHARACTER
'\x1c' # 0x1C -> CONTROL CHARACTER
'\x1d' # 0x1D -> CONTROL CHARACTER
'\x1e' # 0x1E -> CONTROL CHARACTER
'\x1f' # 0x1F -> CONTROL CHARACTER
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> CONTROL CHARACTER
'\xc4' # 0x80 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0x81 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc7' # 0x82 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc9' # 0x83 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xd1' # 0x84 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd6' # 0x85 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xdc' # 0x86 -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xe1' # 0x87 -> LATIN SMALL LETTER A WITH ACUTE
'\xe0' # 0x88 -> LATIN SMALL LETTER A WITH GRAVE
'\xe2' # 0x89 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe4' # 0x8A -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe3' # 0x8B -> LATIN SMALL LETTER A WITH TILDE
'\xe5' # 0x8C -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe7' # 0x8D -> LATIN SMALL LETTER C WITH CEDILLA
'\xe9' # 0x8E -> LATIN SMALL LETTER E WITH ACUTE
'\xe8' # 0x8F -> LATIN SMALL LETTER E WITH GRAVE
'\xea' # 0x90 -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0x91 -> LATIN SMALL LETTER E WITH DIAERESIS
'\xed' # 0x92 -> LATIN SMALL LETTER I WITH ACUTE
'\xec' # 0x93 -> LATIN SMALL LETTER I WITH GRAVE
'\xee' # 0x94 -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0x95 -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf1' # 0x96 -> LATIN SMALL LETTER N WITH TILDE
'\xf3' # 0x97 -> LATIN SMALL LETTER O WITH ACUTE
'\xf2' # 0x98 -> LATIN SMALL LETTER O WITH GRAVE
'\xf4' # 0x99 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf6' # 0x9A -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf5' # 0x9B -> LATIN SMALL LETTER O WITH TILDE
'\xfa' # 0x9C -> LATIN SMALL LETTER U WITH ACUTE
'\xf9' # 0x9D -> LATIN SMALL LETTER U WITH GRAVE
'\xfb' # 0x9E -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0x9F -> LATIN SMALL LETTER U WITH DIAERESIS
'\u2020' # 0xA0 -> DAGGER
'\xb0' # 0xA1 -> DEGREE SIGN
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa7' # 0xA4 -> SECTION SIGN
'\u2022' # 0xA5 -> BULLET
'\xb6' # 0xA6 -> PILCROW SIGN
'\xdf' # 0xA7 -> LATIN SMALL LETTER SHARP S
'\xae' # 0xA8 -> REGISTERED SIGN
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u2122' # 0xAA -> TRADE MARK SIGN
'\xb4' # 0xAB -> ACUTE ACCENT
'\xa8' # 0xAC -> DIAERESIS
'\u2260' # 0xAD -> NOT EQUAL TO
'\xc6' # 0xAE -> LATIN CAPITAL LETTER AE
'\xd8' # 0xAF -> LATIN CAPITAL LETTER O WITH STROKE
'\u221e' # 0xB0 -> INFINITY
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\u2264' # 0xB2 -> LESS-THAN OR EQUAL TO
'\u2265' # 0xB3 -> GREATER-THAN OR EQUAL TO
'\xa5' # 0xB4 -> YEN SIGN
'\xb5' # 0xB5 -> MICRO SIGN
'\u2202' # 0xB6 -> PARTIAL DIFFERENTIAL
'\u2211' # 0xB7 -> N-ARY SUMMATION
'\u220f' # 0xB8 -> N-ARY PRODUCT
'\u03c0' # 0xB9 -> GREEK SMALL LETTER PI
'\u222b' # 0xBA -> INTEGRAL
'\xaa' # 0xBB -> FEMININE ORDINAL INDICATOR
'\xba' # 0xBC -> MASCULINE ORDINAL INDICATOR
'\u03a9' # 0xBD -> GREEK CAPITAL LETTER OMEGA
'\xe6' # 0xBE -> LATIN SMALL LETTER AE
'\xf8' # 0xBF -> LATIN SMALL LETTER O WITH STROKE
'\xbf' # 0xC0 -> INVERTED QUESTION MARK
'\xa1' # 0xC1 -> INVERTED EXCLAMATION MARK
'\xac' # 0xC2 -> NOT SIGN
'\u221a' # 0xC3 -> SQUARE ROOT
'\u0192' # 0xC4 -> LATIN SMALL LETTER F WITH HOOK
'\u2248' # 0xC5 -> ALMOST EQUAL TO
'\u2206' # 0xC6 -> INCREMENT
'\xab' # 0xC7 -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbb' # 0xC8 -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u2026' # 0xC9 -> HORIZONTAL ELLIPSIS
'\xa0' # 0xCA -> NO-BREAK SPACE
'\xc0' # 0xCB -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc3' # 0xCC -> LATIN CAPITAL LETTER A WITH TILDE
'\xd5' # 0xCD -> LATIN CAPITAL LETTER O WITH TILDE
'\u0152' # 0xCE -> LATIN CAPITAL LIGATURE OE
'\u0153' # 0xCF -> LATIN SMALL LIGATURE OE
'\u2013' # 0xD0 -> EN DASH
'\u2014' # 0xD1 -> EM DASH
'\u201c' # 0xD2 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0xD3 -> RIGHT DOUBLE QUOTATION MARK
'\u2018' # 0xD4 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0xD5 -> RIGHT SINGLE QUOTATION MARK
'\xf7' # 0xD6 -> DIVISION SIGN
'\u25ca' # 0xD7 -> LOZENGE
'\xff' # 0xD8 -> LATIN SMALL LETTER Y WITH DIAERESIS
'\u0178' # 0xD9 -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\u011e' # 0xDA -> LATIN CAPITAL LETTER G WITH BREVE
'\u011f' # 0xDB -> LATIN SMALL LETTER G WITH BREVE
'\u0130' # 0xDC -> LATIN CAPITAL LETTER I WITH DOT ABOVE
'\u0131' # 0xDD -> LATIN SMALL LETTER DOTLESS I
'\u015e' # 0xDE -> LATIN CAPITAL LETTER S WITH CEDILLA
'\u015f' # 0xDF -> LATIN SMALL LETTER S WITH CEDILLA
'\u2021' # 0xE0 -> DOUBLE DAGGER
'\xb7' # 0xE1 -> MIDDLE DOT
'\u201a' # 0xE2 -> SINGLE LOW-9 QUOTATION MARK
'\u201e' # 0xE3 -> DOUBLE LOW-9 QUOTATION MARK
'\u2030' # 0xE4 -> PER MILLE SIGN
'\xc2' # 0xE5 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xca' # 0xE6 -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xc1' # 0xE7 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xcb' # 0xE8 -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xc8' # 0xE9 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xcd' # 0xEA -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xEB -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xEC -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xcc' # 0xED -> LATIN CAPITAL LETTER I WITH GRAVE
'\xd3' # 0xEE -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xEF -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\uf8ff' # 0xF0 -> Apple logo
'\xd2' # 0xF1 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xda' # 0xF2 -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xF3 -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xd9' # 0xF4 -> LATIN CAPITAL LETTER U WITH GRAVE
'\uf8a0' # 0xF5 -> undefined1
'\u02c6' # 0xF6 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u02dc' # 0xF7 -> SMALL TILDE
'\xaf' # 0xF8 -> MACRON
'\u02d8' # 0xF9 -> BREVE
'\u02d9' # 0xFA -> DOT ABOVE
'\u02da' # 0xFB -> RING ABOVE
'\xb8' # 0xFC -> CEDILLA
'\u02dd' # 0xFD -> DOUBLE ACUTE ACCENT
'\u02db' # 0xFE -> OGONEK
'\u02c7' # 0xFF -> CARON
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
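A minimal sketch of how a charmap table pair like the one above is consumed; the byte value and code point are taken from the 0xDA entry of the table (this sketch is illustrative and not part of the original module):

import codecs

# Decode one byte through the table above, then encode it back.
text, _ = codecs.charmap_decode(b'\xda', 'strict', decoding_table)
assert text == '\u011e'  # 0xDA -> LATIN CAPITAL LETTER G WITH BREVE
raw, _ = codecs.charmap_encode(text, 'strict', encoding_table)
assert raw == b'\xda'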
""" Python 'mbcs' Codec for Windows
Cloned by Mark Hammond ([email protected]) from ascii.py,
which was written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
# Import them explicitly to cause an ImportError
# on non-Windows systems
from codecs import mbcs_encode, mbcs_decode
# for IncrementalDecoder, IncrementalEncoder, ...
import codecs
### Codec APIs
encode = mbcs_encode
def decode(input, errors='strict'):
return mbcs_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return mbcs_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = mbcs_decode
class StreamWriter(codecs.StreamWriter):
encode = mbcs_encode
class StreamReader(codecs.StreamReader):
decode = mbcs_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='mbcs',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
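The 'mbcs' codec exists only on Windows, where it maps to the machine's active ANSI code page; the failing import above is the deliberate guard. A hedged sketch (the exact bytes depend on the local code page):

import sys

if sys.platform == 'win32':
    # Under code page 1252 this would give b'caf\xe9'; other code
    # pages give different bytes, which is the point of 'mbcs'.
    print('café'.encode('mbcs', errors='replace'))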
""" Python Character Mapping Codec for PalmOS 3.5.
Written by Sjoerd Mullender ([email protected]); based on iso8859_15.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='palmos',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\u20ac' # 0x80 -> EURO SIGN
'\x81' # 0x81 -> <control>
'\u201a' # 0x82 -> SINGLE LOW-9 QUOTATION MARK
'\u0192' # 0x83 -> LATIN SMALL LETTER F WITH HOOK
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u2020' # 0x86 -> DAGGER
'\u2021' # 0x87 -> DOUBLE DAGGER
'\u02c6' # 0x88 -> MODIFIER LETTER CIRCUMFLEX ACCENT
'\u2030' # 0x89 -> PER MILLE SIGN
'\u0160' # 0x8A -> LATIN CAPITAL LETTER S WITH CARON
'\u2039' # 0x8B -> SINGLE LEFT-POINTING ANGLE QUOTATION MARK
'\u0152' # 0x8C -> LATIN CAPITAL LIGATURE OE
'\u2666' # 0x8D -> BLACK DIAMOND SUIT
'\u2663' # 0x8E -> BLACK CLUB SUIT
'\u2665' # 0x8F -> BLACK HEART SUIT
'\u2660' # 0x90 -> BLACK SPADE SUIT
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\u02dc' # 0x98 -> SMALL TILDE
'\u2122' # 0x99 -> TRADE MARK SIGN
'\u0161' # 0x9A -> LATIN SMALL LETTER S WITH CARON
'\x9b' # 0x9B -> <control>
'\u0153' # 0x9C -> LATIN SMALL LIGATURE OE
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\u0178' # 0x9F -> LATIN CAPITAL LETTER Y WITH DIAERESIS
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\xa1' # 0xA1 -> INVERTED EXCLAMATION MARK
'\xa2' # 0xA2 -> CENT SIGN
'\xa3' # 0xA3 -> POUND SIGN
'\xa4' # 0xA4 -> CURRENCY SIGN
'\xa5' # 0xA5 -> YEN SIGN
'\xa6' # 0xA6 -> BROKEN BAR
'\xa7' # 0xA7 -> SECTION SIGN
'\xa8' # 0xA8 -> DIAERESIS
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\xaa' # 0xAA -> FEMININE ORDINAL INDICATOR
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\xad' # 0xAD -> SOFT HYPHEN
'\xae' # 0xAE -> REGISTERED SIGN
'\xaf' # 0xAF -> MACRON
'\xb0' # 0xB0 -> DEGREE SIGN
'\xb1' # 0xB1 -> PLUS-MINUS SIGN
'\xb2' # 0xB2 -> SUPERSCRIPT TWO
'\xb3' # 0xB3 -> SUPERSCRIPT THREE
'\xb4' # 0xB4 -> ACUTE ACCENT
'\xb5' # 0xB5 -> MICRO SIGN
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\xb8' # 0xB8 -> CEDILLA
'\xb9' # 0xB9 -> SUPERSCRIPT ONE
'\xba' # 0xBA -> MASCULINE ORDINAL INDICATOR
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xbc' # 0xBC -> VULGAR FRACTION ONE QUARTER
'\xbd' # 0xBD -> VULGAR FRACTION ONE HALF
'\xbe' # 0xBE -> VULGAR FRACTION THREE QUARTERS
'\xbf' # 0xBF -> INVERTED QUESTION MARK
'\xc0' # 0xC0 -> LATIN CAPITAL LETTER A WITH GRAVE
'\xc1' # 0xC1 -> LATIN CAPITAL LETTER A WITH ACUTE
'\xc2' # 0xC2 -> LATIN CAPITAL LETTER A WITH CIRCUMFLEX
'\xc3' # 0xC3 -> LATIN CAPITAL LETTER A WITH TILDE
'\xc4' # 0xC4 -> LATIN CAPITAL LETTER A WITH DIAERESIS
'\xc5' # 0xC5 -> LATIN CAPITAL LETTER A WITH RING ABOVE
'\xc6' # 0xC6 -> LATIN CAPITAL LETTER AE
'\xc7' # 0xC7 -> LATIN CAPITAL LETTER C WITH CEDILLA
'\xc8' # 0xC8 -> LATIN CAPITAL LETTER E WITH GRAVE
'\xc9' # 0xC9 -> LATIN CAPITAL LETTER E WITH ACUTE
'\xca' # 0xCA -> LATIN CAPITAL LETTER E WITH CIRCUMFLEX
'\xcb' # 0xCB -> LATIN CAPITAL LETTER E WITH DIAERESIS
'\xcc' # 0xCC -> LATIN CAPITAL LETTER I WITH GRAVE
'\xcd' # 0xCD -> LATIN CAPITAL LETTER I WITH ACUTE
'\xce' # 0xCE -> LATIN CAPITAL LETTER I WITH CIRCUMFLEX
'\xcf' # 0xCF -> LATIN CAPITAL LETTER I WITH DIAERESIS
'\xd0' # 0xD0 -> LATIN CAPITAL LETTER ETH (Icelandic)
'\xd1' # 0xD1 -> LATIN CAPITAL LETTER N WITH TILDE
'\xd2' # 0xD2 -> LATIN CAPITAL LETTER O WITH GRAVE
'\xd3' # 0xD3 -> LATIN CAPITAL LETTER O WITH ACUTE
'\xd4' # 0xD4 -> LATIN CAPITAL LETTER O WITH CIRCUMFLEX
'\xd5' # 0xD5 -> LATIN CAPITAL LETTER O WITH TILDE
'\xd6' # 0xD6 -> LATIN CAPITAL LETTER O WITH DIAERESIS
'\xd7' # 0xD7 -> MULTIPLICATION SIGN
'\xd8' # 0xD8 -> LATIN CAPITAL LETTER O WITH STROKE
'\xd9' # 0xD9 -> LATIN CAPITAL LETTER U WITH GRAVE
'\xda' # 0xDA -> LATIN CAPITAL LETTER U WITH ACUTE
'\xdb' # 0xDB -> LATIN CAPITAL LETTER U WITH CIRCUMFLEX
'\xdc' # 0xDC -> LATIN CAPITAL LETTER U WITH DIAERESIS
'\xdd' # 0xDD -> LATIN CAPITAL LETTER Y WITH ACUTE
'\xde' # 0xDE -> LATIN CAPITAL LETTER THORN (Icelandic)
'\xdf' # 0xDF -> LATIN SMALL LETTER SHARP S (German)
'\xe0' # 0xE0 -> LATIN SMALL LETTER A WITH GRAVE
'\xe1' # 0xE1 -> LATIN SMALL LETTER A WITH ACUTE
'\xe2' # 0xE2 -> LATIN SMALL LETTER A WITH CIRCUMFLEX
'\xe3' # 0xE3 -> LATIN SMALL LETTER A WITH TILDE
'\xe4' # 0xE4 -> LATIN SMALL LETTER A WITH DIAERESIS
'\xe5' # 0xE5 -> LATIN SMALL LETTER A WITH RING ABOVE
'\xe6' # 0xE6 -> LATIN SMALL LETTER AE
'\xe7' # 0xE7 -> LATIN SMALL LETTER C WITH CEDILLA
'\xe8' # 0xE8 -> LATIN SMALL LETTER E WITH GRAVE
'\xe9' # 0xE9 -> LATIN SMALL LETTER E WITH ACUTE
'\xea' # 0xEA -> LATIN SMALL LETTER E WITH CIRCUMFLEX
'\xeb' # 0xEB -> LATIN SMALL LETTER E WITH DIAERESIS
'\xec' # 0xEC -> LATIN SMALL LETTER I WITH GRAVE
'\xed' # 0xED -> LATIN SMALL LETTER I WITH ACUTE
'\xee' # 0xEE -> LATIN SMALL LETTER I WITH CIRCUMFLEX
'\xef' # 0xEF -> LATIN SMALL LETTER I WITH DIAERESIS
'\xf0' # 0xF0 -> LATIN SMALL LETTER ETH (Icelandic)
'\xf1' # 0xF1 -> LATIN SMALL LETTER N WITH TILDE
'\xf2' # 0xF2 -> LATIN SMALL LETTER O WITH GRAVE
'\xf3' # 0xF3 -> LATIN SMALL LETTER O WITH ACUTE
'\xf4' # 0xF4 -> LATIN SMALL LETTER O WITH CIRCUMFLEX
'\xf5' # 0xF5 -> LATIN SMALL LETTER O WITH TILDE
'\xf6' # 0xF6 -> LATIN SMALL LETTER O WITH DIAERESIS
'\xf7' # 0xF7 -> DIVISION SIGN
'\xf8' # 0xF8 -> LATIN SMALL LETTER O WITH STROKE
'\xf9' # 0xF9 -> LATIN SMALL LETTER U WITH GRAVE
'\xfa' # 0xFA -> LATIN SMALL LETTER U WITH ACUTE
'\xfb' # 0xFB -> LATIN SMALL LETTER U WITH CIRCUMFLEX
'\xfc' # 0xFC -> LATIN SMALL LETTER U WITH DIAERESIS
'\xfd' # 0xFD -> LATIN SMALL LETTER Y WITH ACUTE
'\xfe' # 0xFE -> LATIN SMALL LETTER THORN (Icelandic)
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
""" Python Character Mapping Codec generated from 'PTCP154.txt' with gencodec.py.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
(c) Copyright 2000 Guido van Rossum.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='ptcp154',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE (DEL)
'\u0496' # 0x80 -> CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
'\u0492' # 0x81 -> CYRILLIC CAPITAL LETTER GHE WITH STROKE
'\u04ee' # 0x82 -> CYRILLIC CAPITAL LETTER U WITH MACRON
'\u0493' # 0x83 -> CYRILLIC SMALL LETTER GHE WITH STROKE
'\u201e' # 0x84 -> DOUBLE LOW-9 QUOTATION MARK
'\u2026' # 0x85 -> HORIZONTAL ELLIPSIS
'\u04b6' # 0x86 -> CYRILLIC CAPITAL LETTER CHE WITH DESCENDER
'\u04ae' # 0x87 -> CYRILLIC CAPITAL LETTER STRAIGHT U
'\u04b2' # 0x88 -> CYRILLIC CAPITAL LETTER HA WITH DESCENDER
'\u04af' # 0x89 -> CYRILLIC SMALL LETTER STRAIGHT U
'\u04a0' # 0x8A -> CYRILLIC CAPITAL LETTER BASHKIR KA
'\u04e2' # 0x8B -> CYRILLIC CAPITAL LETTER I WITH MACRON
'\u04a2' # 0x8C -> CYRILLIC CAPITAL LETTER EN WITH DESCENDER
'\u049a' # 0x8D -> CYRILLIC CAPITAL LETTER KA WITH DESCENDER
'\u04ba' # 0x8E -> CYRILLIC CAPITAL LETTER SHHA
'\u04b8' # 0x8F -> CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE
'\u0497' # 0x90 -> CYRILLIC SMALL LETTER ZHE WITH DESCENDER
'\u2018' # 0x91 -> LEFT SINGLE QUOTATION MARK
'\u2019' # 0x92 -> RIGHT SINGLE QUOTATION MARK
'\u201c' # 0x93 -> LEFT DOUBLE QUOTATION MARK
'\u201d' # 0x94 -> RIGHT DOUBLE QUOTATION MARK
'\u2022' # 0x95 -> BULLET
'\u2013' # 0x96 -> EN DASH
'\u2014' # 0x97 -> EM DASH
'\u04b3' # 0x98 -> CYRILLIC SMALL LETTER HA WITH DESCENDER
'\u04b7' # 0x99 -> CYRILLIC SMALL LETTER CHE WITH DESCENDER
'\u04a1' # 0x9A -> CYRILLIC SMALL LETTER BASHKIR KA
'\u04e3' # 0x9B -> CYRILLIC SMALL LETTER I WITH MACRON
'\u04a3' # 0x9C -> CYRILLIC SMALL LETTER EN WITH DESCENDER
'\u049b' # 0x9D -> CYRILLIC SMALL LETTER KA WITH DESCENDER
'\u04bb' # 0x9E -> CYRILLIC SMALL LETTER SHHA
'\u04b9' # 0x9F -> CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE
'\xa0' # 0xA0 -> NO-BREAK SPACE
'\u040e' # 0xA1 -> CYRILLIC CAPITAL LETTER SHORT U (Byelorussian)
'\u045e' # 0xA2 -> CYRILLIC SMALL LETTER SHORT U (Byelorussian)
'\u0408' # 0xA3 -> CYRILLIC CAPITAL LETTER JE
'\u04e8' # 0xA4 -> CYRILLIC CAPITAL LETTER BARRED O
'\u0498' # 0xA5 -> CYRILLIC CAPITAL LETTER ZE WITH DESCENDER
'\u04b0' # 0xA6 -> CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE
'\xa7' # 0xA7 -> SECTION SIGN
'\u0401' # 0xA8 -> CYRILLIC CAPITAL LETTER IO
'\xa9' # 0xA9 -> COPYRIGHT SIGN
'\u04d8' # 0xAA -> CYRILLIC CAPITAL LETTER SCHWA
'\xab' # 0xAB -> LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xac' # 0xAC -> NOT SIGN
'\u04ef' # 0xAD -> CYRILLIC SMALL LETTER U WITH MACRON
'\xae' # 0xAE -> REGISTERED SIGN
'\u049c' # 0xAF -> CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE
'\xb0' # 0xB0 -> DEGREE SIGN
'\u04b1' # 0xB1 -> CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE
'\u0406' # 0xB2 -> CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0456' # 0xB3 -> CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
'\u0499' # 0xB4 -> CYRILLIC SMALL LETTER ZE WITH DESCENDER
'\u04e9' # 0xB5 -> CYRILLIC SMALL LETTER BARRED O
'\xb6' # 0xB6 -> PILCROW SIGN
'\xb7' # 0xB7 -> MIDDLE DOT
'\u0451' # 0xB8 -> CYRILLIC SMALL LETTER IO
'\u2116' # 0xB9 -> NUMERO SIGN
'\u04d9' # 0xBA -> CYRILLIC SMALL LETTER SCHWA
'\xbb' # 0xBB -> RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\u0458' # 0xBC -> CYRILLIC SMALL LETTER JE
'\u04aa' # 0xBD -> CYRILLIC CAPITAL LETTER ES WITH DESCENDER
'\u04ab' # 0xBE -> CYRILLIC SMALL LETTER ES WITH DESCENDER
'\u049d' # 0xBF -> CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE
'\u0410' # 0xC0 -> CYRILLIC CAPITAL LETTER A
'\u0411' # 0xC1 -> CYRILLIC CAPITAL LETTER BE
'\u0412' # 0xC2 -> CYRILLIC CAPITAL LETTER VE
'\u0413' # 0xC3 -> CYRILLIC CAPITAL LETTER GHE
'\u0414' # 0xC4 -> CYRILLIC CAPITAL LETTER DE
'\u0415' # 0xC5 -> CYRILLIC CAPITAL LETTER IE
'\u0416' # 0xC6 -> CYRILLIC CAPITAL LETTER ZHE
'\u0417' # 0xC7 -> CYRILLIC CAPITAL LETTER ZE
'\u0418' # 0xC8 -> CYRILLIC CAPITAL LETTER I
'\u0419' # 0xC9 -> CYRILLIC CAPITAL LETTER SHORT I
'\u041a' # 0xCA -> CYRILLIC CAPITAL LETTER KA
'\u041b' # 0xCB -> CYRILLIC CAPITAL LETTER EL
'\u041c' # 0xCC -> CYRILLIC CAPITAL LETTER EM
'\u041d' # 0xCD -> CYRILLIC CAPITAL LETTER EN
'\u041e' # 0xCE -> CYRILLIC CAPITAL LETTER O
'\u041f' # 0xCF -> CYRILLIC CAPITAL LETTER PE
'\u0420' # 0xD0 -> CYRILLIC CAPITAL LETTER ER
'\u0421' # 0xD1 -> CYRILLIC CAPITAL LETTER ES
'\u0422' # 0xD2 -> CYRILLIC CAPITAL LETTER TE
'\u0423' # 0xD3 -> CYRILLIC CAPITAL LETTER U
'\u0424' # 0xD4 -> CYRILLIC CAPITAL LETTER EF
'\u0425' # 0xD5 -> CYRILLIC CAPITAL LETTER HA
'\u0426' # 0xD6 -> CYRILLIC CAPITAL LETTER TSE
'\u0427' # 0xD7 -> CYRILLIC CAPITAL LETTER CHE
'\u0428' # 0xD8 -> CYRILLIC CAPITAL LETTER SHA
'\u0429' # 0xD9 -> CYRILLIC CAPITAL LETTER SHCHA
'\u042a' # 0xDA -> CYRILLIC CAPITAL LETTER HARD SIGN
'\u042b' # 0xDB -> CYRILLIC CAPITAL LETTER YERU
'\u042c' # 0xDC -> CYRILLIC CAPITAL LETTER SOFT SIGN
'\u042d' # 0xDD -> CYRILLIC CAPITAL LETTER E
'\u042e' # 0xDE -> CYRILLIC CAPITAL LETTER YU
'\u042f' # 0xDF -> CYRILLIC CAPITAL LETTER YA
'\u0430' # 0xE0 -> CYRILLIC SMALL LETTER A
'\u0431' # 0xE1 -> CYRILLIC SMALL LETTER BE
'\u0432' # 0xE2 -> CYRILLIC SMALL LETTER VE
'\u0433' # 0xE3 -> CYRILLIC SMALL LETTER GHE
'\u0434' # 0xE4 -> CYRILLIC SMALL LETTER DE
'\u0435' # 0xE5 -> CYRILLIC SMALL LETTER IE
'\u0436' # 0xE6 -> CYRILLIC SMALL LETTER ZHE
'\u0437' # 0xE7 -> CYRILLIC SMALL LETTER ZE
'\u0438' # 0xE8 -> CYRILLIC SMALL LETTER I
'\u0439' # 0xE9 -> CYRILLIC SMALL LETTER SHORT I
'\u043a' # 0xEA -> CYRILLIC SMALL LETTER KA
'\u043b' # 0xEB -> CYRILLIC SMALL LETTER EL
'\u043c' # 0xEC -> CYRILLIC SMALL LETTER EM
'\u043d' # 0xED -> CYRILLIC SMALL LETTER EN
'\u043e' # 0xEE -> CYRILLIC SMALL LETTER O
'\u043f' # 0xEF -> CYRILLIC SMALL LETTER PE
'\u0440' # 0xF0 -> CYRILLIC SMALL LETTER ER
'\u0441' # 0xF1 -> CYRILLIC SMALL LETTER ES
'\u0442' # 0xF2 -> CYRILLIC SMALL LETTER TE
'\u0443' # 0xF3 -> CYRILLIC SMALL LETTER U
'\u0444' # 0xF4 -> CYRILLIC SMALL LETTER EF
'\u0445' # 0xF5 -> CYRILLIC SMALL LETTER HA
'\u0446' # 0xF6 -> CYRILLIC SMALL LETTER TSE
'\u0447' # 0xF7 -> CYRILLIC SMALL LETTER CHE
'\u0448' # 0xF8 -> CYRILLIC SMALL LETTER SHA
'\u0449' # 0xF9 -> CYRILLIC SMALL LETTER SHCHA
'\u044a' # 0xFA -> CYRILLIC SMALL LETTER HARD SIGN
'\u044b' # 0xFB -> CYRILLIC SMALL LETTER YERU
'\u044c' # 0xFC -> CYRILLIC SMALL LETTER SOFT SIGN
'\u044d' # 0xFD -> CYRILLIC SMALL LETTER E
'\u044e' # 0xFE -> CYRILLIC SMALL LETTER YU
'\u044f' # 0xFF -> CYRILLIC SMALL LETTER YA
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
""" Codec for the Punicode encoding, as specified in RFC 3492
Written by Martin v. Löwis.
"""
import codecs
##################### Encoding #####################################
def segregate(str):
"""3.1 Basic code point segregation"""
base = bytearray()
extended = set()
for c in str:
if ord(c) < 128:
base.append(ord(c))
else:
extended.add(c)
extended = sorted(extended)
return bytes(base), extended
def selective_len(str, max):
"""Return the length of str, considering only characters below max."""
res = 0
for c in str:
if ord(c) < max:
res += 1
return res
def selective_find(str, char, index, pos):
"""Return a pair (index, pos), indicating the next occurrence of
char in str. index is the position of the character considering
only ordinals up to and including char, and pos is the position in
the full string. index/pos is the starting position in the full
string."""
l = len(str)
while 1:
pos += 1
if pos == l:
return (-1, -1)
c = str[pos]
if c == char:
return index+1, pos
elif c < char:
index += 1
def insertion_unsort(str, extended):
"""3.2 Insertion unsort coding"""
oldchar = 0x80
result = []
oldindex = -1
for c in extended:
index = pos = -1
char = ord(c)
curlen = selective_len(str, char)
delta = (curlen+1) * (char - oldchar)
while 1:
index,pos = selective_find(str,c,index,pos)
if index == -1:
break
delta += index - oldindex
result.append(delta-1)
oldindex = index
delta = 0
oldchar = char
return result
def T(j, bias):
# Punycode parameters: tmin = 1, tmax = 26, base = 36
res = 36 * (j + 1) - bias
if res < 1: return 1
if res > 26: return 26
return res
digits = b"abcdefghijklmnopqrstuvwxyz0123456789"
def generate_generalized_integer(N, bias):
"""3.3 Generalized variable-length integers"""
result = bytearray()
j = 0
while 1:
t = T(j, bias)
if N < t:
result.append(digits[N])
return bytes(result)
result.append(digits[t + ((N - t) % (36 - t))])
N = (N - t) // (36 - t)
j += 1
def adapt(delta, first, numchars):
if first:
delta //= 700
else:
delta //= 2
delta += delta // numchars
# ((base - tmin) * tmax) // 2 == 455
divisions = 0
while delta > 455:
delta = delta // 35 # base - tmin
divisions += 36
bias = divisions + (36 * delta // (delta + 38))
return bias
def generate_integers(baselen, deltas):
"""3.4 Bias adaptation"""
# Punycode parameters: initial bias = 72, damp = 700, skew = 38
result = bytearray()
bias = 72
for points, delta in enumerate(deltas):
s = generate_generalized_integer(delta, bias)
result.extend(s)
bias = adapt(delta, points==0, baselen+points+1)
return bytes(result)
def punycode_encode(text):
base, extended = segregate(text)
deltas = insertion_unsort(text, extended)
extended = generate_integers(len(base), deltas)
if base:
return base + b"-" + extended
return extended
##################### Decoding #####################################
def decode_generalized_number(extended, extpos, bias, errors):
"""3.3 Generalized variable-length integers"""
result = 0
w = 1
j = 0
while 1:
try:
char = ord(extended[extpos])
except IndexError:
if errors == "strict":
raise UnicodeError("incomplete punicode string")
return extpos + 1, None
extpos += 1
if 0x41 <= char <= 0x5A: # A-Z
digit = char - 0x41
elif 0x30 <= char <= 0x39:
digit = char - 22 # 0x30-26
elif errors == "strict":
raise UnicodeError("Invalid extended code point '%s'"
% extended[extpos])
else:
return extpos, None
t = T(j, bias)
result += digit * w
if digit < t:
return extpos, result
w = w * (36 - t)
j += 1
def insertion_sort(base, extended, errors):
"""3.2 Insertion unsort coding"""
char = 0x80
pos = -1
bias = 72
extpos = 0
while extpos < len(extended):
newpos, delta = decode_generalized_number(extended, extpos,
bias, errors)
if delta is None:
# There was an error in decoding. We can't continue because
# synchronization is lost.
return base
pos += delta+1
char += pos // (len(base) + 1)
if char > 0x10FFFF:
if errors == "strict":
raise UnicodeError("Invalid character U+%x" % char)
char = ord('?')
pos = pos % (len(base) + 1)
base = base[:pos] + chr(char) + base[pos:]
bias = adapt(delta, (extpos == 0), len(base))
extpos = newpos
return base
def punycode_decode(text, errors):
if isinstance(text, str):
text = text.encode("ascii")
if isinstance(text, memoryview):
text = bytes(text)
pos = text.rfind(b"-")
if pos == -1:
base = ""
extended = str(text, "ascii").upper()
else:
base = str(text[:pos], "ascii", errors)
extended = str(text[pos+1:], "ascii").upper()
return insertion_sort(base, extended, errors)
### Codec APIs
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
res = punycode_encode(input)
return res, len(input)
def decode(self, input, errors='strict'):
if errors not in ('strict', 'replace', 'ignore'):
raise UnicodeError("Unsupported error handling "+errors)
res = punycode_decode(input, errors)
return res, len(input)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return punycode_encode(input)
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
if self.errors not in ('strict', 'replace', 'ignore'):
raise UnicodeError("Unsupported error handling "+self.errors)
return punycode_decode(input, self.errors)
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='punycode',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
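A round-trip sketch using the registered codec; 'bücher' is the standard RFC 3492/IDNA example (the ASCII base 'bcher' is kept and the 'ü' is re-inserted via the delta encoding):

import codecs

assert 'bücher'.encode('punycode') == b'bcher-kva'
assert codecs.decode(b'bcher-kva', 'punycode') == 'bücher'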
"""Codec for quoted-printable encoding.
This codec de/encodes from bytes to bytes.
"""
import codecs
import quopri
from io import BytesIO
def quopri_encode(input, errors='strict'):
assert errors == 'strict'
f = BytesIO(input)
g = BytesIO()
quopri.encode(f, g, quotetabs=True)
return (g.getvalue(), len(input))
def quopri_decode(input, errors='strict'):
assert errors == 'strict'
f = BytesIO(input)
g = BytesIO()
quopri.decode(f, g)
return (g.getvalue(), len(input))
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
return quopri_encode(input, errors)
def decode(self, input, errors='strict'):
return quopri_decode(input, errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return quopri_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return quopri_decode(input, self.errors)[0]
class StreamWriter(Codec, codecs.StreamWriter):
charbuffertype = bytes
class StreamReader(Codec, codecs.StreamReader):
charbuffertype = bytes
# encodings module API
def getregentry():
return codecs.CodecInfo(
name='quopri',
encode=quopri_encode,
decode=quopri_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
_is_text_encoding=False,
)
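Because _is_text_encoding is false, quopri is reached through codecs.encode/codecs.decode on bytes rather than str.encode. A small sketch:

import codecs

encoded = codecs.encode(b'caf\xe9', 'quopri_codec')
assert encoded == b'caf=E9'
assert codecs.decode(encoded, 'quopri_codec') == b'caf\xe9'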
""" Python 'raw-unicode-escape' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.raw_unicode_escape_encode
decode = codecs.raw_unicode_escape_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.raw_unicode_escape_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.raw_unicode_escape_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='raw-unicode-escape',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
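raw-unicode-escape passes code points below U+0100 through as Latin-1 bytes and escapes only higher code points, which is the behaviour pickle protocol 0 relies on. A sketch:

assert 'a\xe9\u20ac'.encode('raw_unicode_escape') == b'a\xe9\\u20ac'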
#!/usr/bin/env python
""" Python Character Mapping Codec for ROT13.

This codec de/encodes from str to str.

Written by Marc-Andre Lemburg ([email protected]).
"""
import codecs

### Codec APIs

class Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return (input.translate(rot13_map), len(input))

    def decode(self, input, errors='strict'):
        return (input.translate(rot13_map), len(input))

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return input.translate(rot13_map)

class IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return input.translate(rot13_map)

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return codecs.CodecInfo(
        name='rot-13',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamwriter=StreamWriter,
        streamreader=StreamReader,
        _is_text_encoding=False,
    )

### Map

rot13_map = codecs.make_identity_dict(range(256))
rot13_map.update({
0x0041: 0x004e,
0x0042: 0x004f,
0x0043: 0x0050,
0x0044: 0x0051,
0x0045: 0x0052,
0x0046: 0x0053,
0x0047: 0x0054,
0x0048: 0x0055,
0x0049: 0x0056,
0x004a: 0x0057,
0x004b: 0x0058,
0x004c: 0x0059,
0x004d: 0x005a,
0x004e: 0x0041,
0x004f: 0x0042,
0x0050: 0x0043,
0x0051: 0x0044,
0x0052: 0x0045,
0x0053: 0x0046,
0x0054: 0x0047,
0x0055: 0x0048,
0x0056: 0x0049,
0x0057: 0x004a,
0x0058: 0x004b,
0x0059: 0x004c,
0x005a: 0x004d,
0x0061: 0x006e,
0x0062: 0x006f,
0x0063: 0x0070,
0x0064: 0x0071,
0x0065: 0x0072,
0x0066: 0x0073,
0x0067: 0x0074,
0x0068: 0x0075,
0x0069: 0x0076,
0x006a: 0x0077,
0x006b: 0x0078,
0x006c: 0x0079,
0x006d: 0x007a,
0x006e: 0x0061,
0x006f: 0x0062,
0x0070: 0x0063,
0x0071: 0x0064,
0x0072: 0x0065,
0x0073: 0x0066,
0x0074: 0x0067,
0x0075: 0x0068,
0x0076: 0x0069,
0x0077: 0x006a,
0x0078: 0x006b,
0x0079: 0x006c,
0x007a: 0x006d,
})

### Filter API

def rot13(infile, outfile):
    outfile.write(codecs.encode(infile.read(), 'rot-13'))

if __name__ == '__main__':
    import sys
    rot13(sys.stdin, sys.stdout)
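rot-13 is a str-to-str transform and, like quopri, is excluded from str.encode, so it must go through codecs.encode/codecs.decode:

import codecs

assert codecs.encode('IronPython', 'rot-13') == 'VebaClguba'
assert codecs.decode('VebaClguba', 'rot-13') == 'IronPython'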
#
# shift_jis.py: Python Unicode Codec for SHIFT_JIS
#
# Written by Hye-Shik Chang <[email protected]>
#

import _codecs_jp, codecs
import _multibytecodec as mbc

codec = _codecs_jp.getcodec('shift_jis')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='shift_jis',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
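This module and the two Shift-JIS variants that follow share the same shape: the mapping work happens in the _codecs_jp extension, and the Python classes are registration glue. A round-trip sketch:

data = '日本'.encode('shift_jis')
assert data == b'\x93\xfa\x96\x7b'
assert data.decode('shift_jis') == '日本'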
#
# shift_jisx0213.py: Python Unicode Codec for SHIFT_JISX0213
#
# Written by Hye-Shik Chang <[email protected]>
#

import _codecs_jp, codecs
import _multibytecodec as mbc

codec = _codecs_jp.getcodec('shift_jisx0213')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='shift_jisx0213',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
#
# shift_jis_2004.py: Python Unicode Codec for SHIFT_JIS_2004
#
# Written by Hye-Shik Chang <[email protected]>
#

import _codecs_jp, codecs
import _multibytecodec as mbc

codec = _codecs_jp.getcodec('shift_jis_2004')

class Codec(codecs.Codec):
    encode = codec.encode
    decode = codec.decode

class IncrementalEncoder(mbc.MultibyteIncrementalEncoder,
                         codecs.IncrementalEncoder):
    codec = codec

class IncrementalDecoder(mbc.MultibyteIncrementalDecoder,
                         codecs.IncrementalDecoder):
    codec = codec

class StreamReader(Codec, mbc.MultibyteStreamReader, codecs.StreamReader):
    codec = codec

class StreamWriter(Codec, mbc.MultibyteStreamWriter, codecs.StreamWriter):
    codec = codec

def getregentry():
    return codecs.CodecInfo(
        name='shift_jis_2004',
        encode=Codec().encode,
        decode=Codec().decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )
""" Python Character Mapping Codec tis_620 generated from 'python-mappings/TIS-620.TXT' with gencodec.py.
"""#"
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
return codecs.charmap_encode(input,errors,encoding_table)
def decode(self,input,errors='strict'):
return codecs.charmap_decode(input,errors,decoding_table)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='tis-620',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
### Decoding Table
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DEVICE CONTROL ONE
'\x12' # 0x12 -> DEVICE CONTROL TWO
'\x13' # 0x13 -> DEVICE CONTROL THREE
'\x14' # 0x14 -> DEVICE CONTROL FOUR
'\x15' # 0x15 -> NEGATIVE ACKNOWLEDGE
'\x16' # 0x16 -> SYNCHRONOUS IDLE
'\x17' # 0x17 -> END OF TRANSMISSION BLOCK
'\x18' # 0x18 -> CANCEL
'\x19' # 0x19 -> END OF MEDIUM
'\x1a' # 0x1A -> SUBSTITUTE
'\x1b' # 0x1B -> ESCAPE
'\x1c' # 0x1C -> FILE SEPARATOR
'\x1d' # 0x1D -> GROUP SEPARATOR
'\x1e' # 0x1E -> RECORD SEPARATOR
'\x1f' # 0x1F -> UNIT SEPARATOR
' ' # 0x20 -> SPACE
'!' # 0x21 -> EXCLAMATION MARK
'"' # 0x22 -> QUOTATION MARK
'#' # 0x23 -> NUMBER SIGN
'$' # 0x24 -> DOLLAR SIGN
'%' # 0x25 -> PERCENT SIGN
'&' # 0x26 -> AMPERSAND
"'" # 0x27 -> APOSTROPHE
'(' # 0x28 -> LEFT PARENTHESIS
')' # 0x29 -> RIGHT PARENTHESIS
'*' # 0x2A -> ASTERISK
'+' # 0x2B -> PLUS SIGN
',' # 0x2C -> COMMA
'-' # 0x2D -> HYPHEN-MINUS
'.' # 0x2E -> FULL STOP
'/' # 0x2F -> SOLIDUS
'0' # 0x30 -> DIGIT ZERO
'1' # 0x31 -> DIGIT ONE
'2' # 0x32 -> DIGIT TWO
'3' # 0x33 -> DIGIT THREE
'4' # 0x34 -> DIGIT FOUR
'5' # 0x35 -> DIGIT FIVE
'6' # 0x36 -> DIGIT SIX
'7' # 0x37 -> DIGIT SEVEN
'8' # 0x38 -> DIGIT EIGHT
'9' # 0x39 -> DIGIT NINE
':' # 0x3A -> COLON
';' # 0x3B -> SEMICOLON
'<' # 0x3C -> LESS-THAN SIGN
'=' # 0x3D -> EQUALS SIGN
'>' # 0x3E -> GREATER-THAN SIGN
'?' # 0x3F -> QUESTION MARK
'@' # 0x40 -> COMMERCIAL AT
'A' # 0x41 -> LATIN CAPITAL LETTER A
'B' # 0x42 -> LATIN CAPITAL LETTER B
'C' # 0x43 -> LATIN CAPITAL LETTER C
'D' # 0x44 -> LATIN CAPITAL LETTER D
'E' # 0x45 -> LATIN CAPITAL LETTER E
'F' # 0x46 -> LATIN CAPITAL LETTER F
'G' # 0x47 -> LATIN CAPITAL LETTER G
'H' # 0x48 -> LATIN CAPITAL LETTER H
'I' # 0x49 -> LATIN CAPITAL LETTER I
'J' # 0x4A -> LATIN CAPITAL LETTER J
'K' # 0x4B -> LATIN CAPITAL LETTER K
'L' # 0x4C -> LATIN CAPITAL LETTER L
'M' # 0x4D -> LATIN CAPITAL LETTER M
'N' # 0x4E -> LATIN CAPITAL LETTER N
'O' # 0x4F -> LATIN CAPITAL LETTER O
'P' # 0x50 -> LATIN CAPITAL LETTER P
'Q' # 0x51 -> LATIN CAPITAL LETTER Q
'R' # 0x52 -> LATIN CAPITAL LETTER R
'S' # 0x53 -> LATIN CAPITAL LETTER S
'T' # 0x54 -> LATIN CAPITAL LETTER T
'U' # 0x55 -> LATIN CAPITAL LETTER U
'V' # 0x56 -> LATIN CAPITAL LETTER V
'W' # 0x57 -> LATIN CAPITAL LETTER W
'X' # 0x58 -> LATIN CAPITAL LETTER X
'Y' # 0x59 -> LATIN CAPITAL LETTER Y
'Z' # 0x5A -> LATIN CAPITAL LETTER Z
'[' # 0x5B -> LEFT SQUARE BRACKET
'\\' # 0x5C -> REVERSE SOLIDUS
']' # 0x5D -> RIGHT SQUARE BRACKET
'^' # 0x5E -> CIRCUMFLEX ACCENT
'_' # 0x5F -> LOW LINE
'`' # 0x60 -> GRAVE ACCENT
'a' # 0x61 -> LATIN SMALL LETTER A
'b' # 0x62 -> LATIN SMALL LETTER B
'c' # 0x63 -> LATIN SMALL LETTER C
'd' # 0x64 -> LATIN SMALL LETTER D
'e' # 0x65 -> LATIN SMALL LETTER E
'f' # 0x66 -> LATIN SMALL LETTER F
'g' # 0x67 -> LATIN SMALL LETTER G
'h' # 0x68 -> LATIN SMALL LETTER H
'i' # 0x69 -> LATIN SMALL LETTER I
'j' # 0x6A -> LATIN SMALL LETTER J
'k' # 0x6B -> LATIN SMALL LETTER K
'l' # 0x6C -> LATIN SMALL LETTER L
'm' # 0x6D -> LATIN SMALL LETTER M
'n' # 0x6E -> LATIN SMALL LETTER N
'o' # 0x6F -> LATIN SMALL LETTER O
'p' # 0x70 -> LATIN SMALL LETTER P
'q' # 0x71 -> LATIN SMALL LETTER Q
'r' # 0x72 -> LATIN SMALL LETTER R
's' # 0x73 -> LATIN SMALL LETTER S
't' # 0x74 -> LATIN SMALL LETTER T
'u' # 0x75 -> LATIN SMALL LETTER U
'v' # 0x76 -> LATIN SMALL LETTER V
'w' # 0x77 -> LATIN SMALL LETTER W
'x' # 0x78 -> LATIN SMALL LETTER X
'y' # 0x79 -> LATIN SMALL LETTER Y
'z' # 0x7A -> LATIN SMALL LETTER Z
'{' # 0x7B -> LEFT CURLY BRACKET
'|' # 0x7C -> VERTICAL LINE
'}' # 0x7D -> RIGHT CURLY BRACKET
'~' # 0x7E -> TILDE
'\x7f' # 0x7F -> DELETE
'\x80' # 0x80 -> <control>
'\x81' # 0x81 -> <control>
'\x82' # 0x82 -> <control>
'\x83' # 0x83 -> <control>
'\x84' # 0x84 -> <control>
'\x85' # 0x85 -> <control>
'\x86' # 0x86 -> <control>
'\x87' # 0x87 -> <control>
'\x88' # 0x88 -> <control>
'\x89' # 0x89 -> <control>
'\x8a' # 0x8A -> <control>
'\x8b' # 0x8B -> <control>
'\x8c' # 0x8C -> <control>
'\x8d' # 0x8D -> <control>
'\x8e' # 0x8E -> <control>
'\x8f' # 0x8F -> <control>
'\x90' # 0x90 -> <control>
'\x91' # 0x91 -> <control>
'\x92' # 0x92 -> <control>
'\x93' # 0x93 -> <control>
'\x94' # 0x94 -> <control>
'\x95' # 0x95 -> <control>
'\x96' # 0x96 -> <control>
'\x97' # 0x97 -> <control>
'\x98' # 0x98 -> <control>
'\x99' # 0x99 -> <control>
'\x9a' # 0x9A -> <control>
'\x9b' # 0x9B -> <control>
'\x9c' # 0x9C -> <control>
'\x9d' # 0x9D -> <control>
'\x9e' # 0x9E -> <control>
'\x9f' # 0x9F -> <control>
'\ufffe'
'\u0e01' # 0xA1 -> THAI CHARACTER KO KAI
'\u0e02' # 0xA2 -> THAI CHARACTER KHO KHAI
'\u0e03' # 0xA3 -> THAI CHARACTER KHO KHUAT
'\u0e04' # 0xA4 -> THAI CHARACTER KHO KHWAI
'\u0e05' # 0xA5 -> THAI CHARACTER KHO KHON
'\u0e06' # 0xA6 -> THAI CHARACTER KHO RAKHANG
'\u0e07' # 0xA7 -> THAI CHARACTER NGO NGU
'\u0e08' # 0xA8 -> THAI CHARACTER CHO CHAN
'\u0e09' # 0xA9 -> THAI CHARACTER CHO CHING
'\u0e0a' # 0xAA -> THAI CHARACTER CHO CHANG
'\u0e0b' # 0xAB -> THAI CHARACTER SO SO
'\u0e0c' # 0xAC -> THAI CHARACTER CHO CHOE
'\u0e0d' # 0xAD -> THAI CHARACTER YO YING
'\u0e0e' # 0xAE -> THAI CHARACTER DO CHADA
'\u0e0f' # 0xAF -> THAI CHARACTER TO PATAK
'\u0e10' # 0xB0 -> THAI CHARACTER THO THAN
'\u0e11' # 0xB1 -> THAI CHARACTER THO NANGMONTHO
'\u0e12' # 0xB2 -> THAI CHARACTER THO PHUTHAO
'\u0e13' # 0xB3 -> THAI CHARACTER NO NEN
'\u0e14' # 0xB4 -> THAI CHARACTER DO DEK
'\u0e15' # 0xB5 -> THAI CHARACTER TO TAO
'\u0e16' # 0xB6 -> THAI CHARACTER THO THUNG
'\u0e17' # 0xB7 -> THAI CHARACTER THO THAHAN
'\u0e18' # 0xB8 -> THAI CHARACTER THO THONG
'\u0e19' # 0xB9 -> THAI CHARACTER NO NU
'\u0e1a' # 0xBA -> THAI CHARACTER BO BAIMAI
'\u0e1b' # 0xBB -> THAI CHARACTER PO PLA
'\u0e1c' # 0xBC -> THAI CHARACTER PHO PHUNG
'\u0e1d' # 0xBD -> THAI CHARACTER FO FA
'\u0e1e' # 0xBE -> THAI CHARACTER PHO PHAN
'\u0e1f' # 0xBF -> THAI CHARACTER FO FAN
'\u0e20' # 0xC0 -> THAI CHARACTER PHO SAMPHAO
'\u0e21' # 0xC1 -> THAI CHARACTER MO MA
'\u0e22' # 0xC2 -> THAI CHARACTER YO YAK
'\u0e23' # 0xC3 -> THAI CHARACTER RO RUA
'\u0e24' # 0xC4 -> THAI CHARACTER RU
'\u0e25' # 0xC5 -> THAI CHARACTER LO LING
'\u0e26' # 0xC6 -> THAI CHARACTER LU
'\u0e27' # 0xC7 -> THAI CHARACTER WO WAEN
'\u0e28' # 0xC8 -> THAI CHARACTER SO SALA
'\u0e29' # 0xC9 -> THAI CHARACTER SO RUSI
'\u0e2a' # 0xCA -> THAI CHARACTER SO SUA
'\u0e2b' # 0xCB -> THAI CHARACTER HO HIP
'\u0e2c' # 0xCC -> THAI CHARACTER LO CHULA
'\u0e2d' # 0xCD -> THAI CHARACTER O ANG
'\u0e2e' # 0xCE -> THAI CHARACTER HO NOKHUK
'\u0e2f' # 0xCF -> THAI CHARACTER PAIYANNOI
'\u0e30' # 0xD0 -> THAI CHARACTER SARA A
'\u0e31' # 0xD1 -> THAI CHARACTER MAI HAN-AKAT
'\u0e32' # 0xD2 -> THAI CHARACTER SARA AA
'\u0e33' # 0xD3 -> THAI CHARACTER SARA AM
'\u0e34' # 0xD4 -> THAI CHARACTER SARA I
'\u0e35' # 0xD5 -> THAI CHARACTER SARA II
'\u0e36' # 0xD6 -> THAI CHARACTER SARA UE
'\u0e37' # 0xD7 -> THAI CHARACTER SARA UEE
'\u0e38' # 0xD8 -> THAI CHARACTER SARA U
'\u0e39' # 0xD9 -> THAI CHARACTER SARA UU
'\u0e3a' # 0xDA -> THAI CHARACTER PHINTHU
'\ufffe'
'\ufffe'
'\ufffe'
'\ufffe'
'\u0e3f' # 0xDF -> THAI CURRENCY SYMBOL BAHT
'\u0e40' # 0xE0 -> THAI CHARACTER SARA E
'\u0e41' # 0xE1 -> THAI CHARACTER SARA AE
'\u0e42' # 0xE2 -> THAI CHARACTER SARA O
'\u0e43' # 0xE3 -> THAI CHARACTER SARA AI MAIMUAN
'\u0e44' # 0xE4 -> THAI CHARACTER SARA AI MAIMALAI
'\u0e45' # 0xE5 -> THAI CHARACTER LAKKHANGYAO
'\u0e46' # 0xE6 -> THAI CHARACTER MAIYAMOK
'\u0e47' # 0xE7 -> THAI CHARACTER MAITAIKHU
'\u0e48' # 0xE8 -> THAI CHARACTER MAI EK
'\u0e49' # 0xE9 -> THAI CHARACTER MAI THO
'\u0e4a' # 0xEA -> THAI CHARACTER MAI TRI
'\u0e4b' # 0xEB -> THAI CHARACTER MAI CHATTAWA
'\u0e4c' # 0xEC -> THAI CHARACTER THANTHAKHAT
'\u0e4d' # 0xED -> THAI CHARACTER NIKHAHIT
'\u0e4e' # 0xEE -> THAI CHARACTER YAMAKKAN
'\u0e4f' # 0xEF -> THAI CHARACTER FONGMAN
'\u0e50' # 0xF0 -> THAI DIGIT ZERO
'\u0e51' # 0xF1 -> THAI DIGIT ONE
'\u0e52' # 0xF2 -> THAI DIGIT TWO
'\u0e53' # 0xF3 -> THAI DIGIT THREE
'\u0e54' # 0xF4 -> THAI DIGIT FOUR
'\u0e55' # 0xF5 -> THAI DIGIT FIVE
'\u0e56' # 0xF6 -> THAI DIGIT SIX
'\u0e57' # 0xF7 -> THAI DIGIT SEVEN
'\u0e58' # 0xF8 -> THAI DIGIT EIGHT
'\u0e59' # 0xF9 -> THAI DIGIT NINE
'\u0e5a' # 0xFA -> THAI CHARACTER ANGKHANKHU
'\u0e5b' # 0xFB -> THAI CHARACTER KHOMUT
'\ufffe'
'\ufffe'
'\ufffe'
'\ufffe'
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
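TIS-620 maps 0xA1-0xFB onto the Thai block; the bare '\ufffe' entries mark unassigned cells, which charmap_build drops from the encoding table. A one-line sketch using the registered name:

assert b'\xa1'.decode('tis_620') == '\u0e01'  # THAI CHARACTER KO KAI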
""" Python 'undefined' Codec
This codec will always raise a ValueError exception when being
used. It is intended for use by the site.py file to switch off
automatic string to Unicode coercion.
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
def encode(self,input,errors='strict'):
raise UnicodeError("undefined encoding")
def decode(self,input,errors='strict'):
raise UnicodeError("undefined encoding")
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
raise UnicodeError("undefined encoding")
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
raise UnicodeError("undefined encoding")
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='undefined',
encode=Codec().encode,
decode=Codec().decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
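The 'undefined' codec exists purely to fail; every entry point raises (UnicodeError is a ValueError subclass, matching the docstring):

import codecs

try:
    codecs.encode('x', 'undefined')
except UnicodeError as exc:
    print(exc)  # undefined encoding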
""" Python 'unicode-escape' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.unicode_escape_encode
decode = codecs.unicode_escape_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.unicode_escape_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.unicode_escape_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='unicode-escape',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
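unicode-escape produces the spelling a Python string literal would use, and decodes such escapes back. A sketch:

assert 'tab\tµ'.encode('unicode_escape') == b'tab\\t\\xb5'
assert b'tab\\t\\xb5'.decode('unicode_escape') == 'tab\tµ'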
""" Python 'unicode-internal' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
class Codec(codecs.Codec):
# Note: Binding these as C functions will result in the class not
# converting them to methods. This is intended.
encode = codecs.unicode_internal_encode
decode = codecs.unicode_internal_decode
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.unicode_internal_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return codecs.unicode_internal_decode(input, self.errors)[0]
class StreamWriter(Codec,codecs.StreamWriter):
pass
class StreamReader(Codec,codecs.StreamReader):
pass
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='unicode-internal',
encode=Codec.encode,
decode=Codec.decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamwriter=StreamWriter,
streamreader=StreamReader,
)
""" Python 'utf-16' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs, sys
### Codec APIs
encode = codecs.utf_16_encode
def decode(input, errors='strict'):
return codecs.utf_16_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
codecs.IncrementalEncoder.__init__(self, errors)
self.encoder = None
def encode(self, input, final=False):
if self.encoder is None:
result = codecs.utf_16_encode(input, self.errors)[0]
if sys.byteorder == 'little':
self.encoder = codecs.utf_16_le_encode
else:
self.encoder = codecs.utf_16_be_encode
return result
return self.encoder(input, self.errors)[0]
def reset(self):
codecs.IncrementalEncoder.reset(self)
self.encoder = None
def getstate(self):
# state info we return to the caller:
# 0: stream is in natural order for this platform
# 2: endianness hasn't been determined yet
# (we're never writing in unnatural order)
return (2 if self.encoder is None else 0)
def setstate(self, state):
if state:
self.encoder = None
else:
if sys.byteorder == 'little':
self.encoder = codecs.utf_16_le_encode
else:
self.encoder = codecs.utf_16_be_encode
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def __init__(self, errors='strict'):
codecs.BufferedIncrementalDecoder.__init__(self, errors)
self.decoder = None
def _buffer_decode(self, input, errors, final):
if self.decoder is None:
(output, consumed, byteorder) = \
codecs.utf_16_ex_decode(input, errors, 0, final)
if byteorder == -1:
self.decoder = codecs.utf_16_le_decode
elif byteorder == 1:
self.decoder = codecs.utf_16_be_decode
elif consumed >= 2:
raise UnicodeError("UTF-16 stream does not start with BOM")
return (output, consumed)
return self.decoder(input, self.errors, final)
def reset(self):
codecs.BufferedIncrementalDecoder.reset(self)
self.decoder = None
def getstate(self):
# additonal state info from the base class must be None here,
# as it isn't passed along to the caller
state = codecs.BufferedIncrementalDecoder.getstate(self)[0]
# additional state info we pass to the caller:
# 0: stream is in natural order for this platform
# 1: stream is in unnatural order
# 2: endianness hasn't been determined yet
if self.decoder is None:
return (state, 2)
addstate = int((sys.byteorder == "big") !=
(self.decoder is codecs.utf_16_be_decode))
return (state, addstate)
def setstate(self, state):
# state[1] will be ignored by BufferedIncrementalDecoder.setstate()
codecs.BufferedIncrementalDecoder.setstate(self, state)
state = state[1]
if state == 0:
self.decoder = (codecs.utf_16_be_decode
if sys.byteorder == "big"
else codecs.utf_16_le_decode)
elif state == 1:
self.decoder = (codecs.utf_16_le_decode
if sys.byteorder == "big"
else codecs.utf_16_be_decode)
else:
self.decoder = None
class StreamWriter(codecs.StreamWriter):
def __init__(self, stream, errors='strict'):
codecs.StreamWriter.__init__(self, stream, errors)
self.encoder = None
def reset(self):
codecs.StreamWriter.reset(self)
self.encoder = None
def encode(self, input, errors='strict'):
if self.encoder is None:
result = codecs.utf_16_encode(input, errors)
if sys.byteorder == 'little':
self.encoder = codecs.utf_16_le_encode
else:
self.encoder = codecs.utf_16_be_encode
return result
else:
return self.encoder(input, errors)
class StreamReader(codecs.StreamReader):
def reset(self):
codecs.StreamReader.reset(self)
try:
del self.decode
except AttributeError:
pass
def decode(self, input, errors='strict'):
(object, consumed, byteorder) = \
codecs.utf_16_ex_decode(input, errors, 0, False)
if byteorder == -1:
self.decode = codecs.utf_16_le_decode
elif byteorder == 1:
self.decode = codecs.utf_16_be_decode
elif consumed>=2:
raise UnicodeError("UTF-16 stream does not start with BOM")
return (object, consumed)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-16',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
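# A minimal sketch (not part of the module above) of how the utf-16
# incremental decoder defers endianness detection until a complete BOM
# has been seen; the byte values below describe a little-endian stream,
# so the result is the same on any platform.
import codecs
_dec = codecs.getincrementaldecoder('utf-16')()
assert _dec.decode(b'\xff') == ''      # half a BOM: nothing to report yet
assert _dec.decode(b'\xfe') == ''      # BOM complete; endianness now fixed
assert _dec.decode(b'A\x00') == 'A'    # later bytes decode as UTF-16-LE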
""" Python 'utf-16-be' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
encode = codecs.utf_16_be_encode
def decode(input, errors='strict'):
return codecs.utf_16_be_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_16_be_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_16_be_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_16_be_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_16_be_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-16-be',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
""" Python 'utf-16-le' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
encode = codecs.utf_16_le_encode
def decode(input, errors='strict'):
return codecs.utf_16_le_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_16_le_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_16_le_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_16_le_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_16_le_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-16-le',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
"""
Python 'utf-32' Codec
"""
import codecs, sys
### Codec APIs
encode = codecs.utf_32_encode
def decode(input, errors='strict'):
return codecs.utf_32_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
codecs.IncrementalEncoder.__init__(self, errors)
self.encoder = None
def encode(self, input, final=False):
if self.encoder is None:
result = codecs.utf_32_encode(input, self.errors)[0]
if sys.byteorder == 'little':
self.encoder = codecs.utf_32_le_encode
else:
self.encoder = codecs.utf_32_be_encode
return result
return self.encoder(input, self.errors)[0]
def reset(self):
codecs.IncrementalEncoder.reset(self)
self.encoder = None
def getstate(self):
# state info we return to the caller:
# 0: stream is in natural order for this platform
# 2: endianness hasn't been determined yet
# (we're never writing in unnatural order)
return (2 if self.encoder is None else 0)
def setstate(self, state):
if state:
self.encoder = None
else:
if sys.byteorder == 'little':
self.encoder = codecs.utf_32_le_encode
else:
self.encoder = codecs.utf_32_be_encode
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def __init__(self, errors='strict'):
codecs.BufferedIncrementalDecoder.__init__(self, errors)
self.decoder = None
def _buffer_decode(self, input, errors, final):
if self.decoder is None:
(output, consumed, byteorder) = \
codecs.utf_32_ex_decode(input, errors, 0, final)
if byteorder == -1:
self.decoder = codecs.utf_32_le_decode
elif byteorder == 1:
self.decoder = codecs.utf_32_be_decode
elif consumed >= 4:
raise UnicodeError("UTF-32 stream does not start with BOM")
return (output, consumed)
return self.decoder(input, self.errors, final)
def reset(self):
codecs.BufferedIncrementalDecoder.reset(self)
self.decoder = None
def getstate(self):
# additional state info from the base class must be None here,
# as it isn't passed along to the caller
state = codecs.BufferedIncrementalDecoder.getstate(self)[0]
# additional state info we pass to the caller:
# 0: stream is in natural order for this platform
# 1: stream is in unnatural order
# 2: endianness hasn't been determined yet
if self.decoder is None:
return (state, 2)
addstate = int((sys.byteorder == "big") !=
(self.decoder is codecs.utf_32_be_decode))
return (state, addstate)
def setstate(self, state):
# state[1] will be ignored by BufferedIncrementalDecoder.setstate()
codecs.BufferedIncrementalDecoder.setstate(self, state)
state = state[1]
if state == 0:
self.decoder = (codecs.utf_32_be_decode
if sys.byteorder == "big"
else codecs.utf_32_le_decode)
elif state == 1:
self.decoder = (codecs.utf_32_le_decode
if sys.byteorder == "big"
else codecs.utf_32_be_decode)
else:
self.decoder = None
class StreamWriter(codecs.StreamWriter):
def __init__(self, stream, errors='strict'):
self.encoder = None
codecs.StreamWriter.__init__(self, stream, errors)
def reset(self):
codecs.StreamWriter.reset(self)
self.encoder = None
def encode(self, input, errors='strict'):
if self.encoder is None:
result = codecs.utf_32_encode(input, errors)
if sys.byteorder == 'little':
self.encoder = codecs.utf_32_le_encode
else:
self.encoder = codecs.utf_32_be_encode
return result
else:
return self.encoder(input, errors)
class StreamReader(codecs.StreamReader):
def reset(self):
codecs.StreamReader.reset(self)
try:
del self.decode
except AttributeError:
pass
def decode(self, input, errors='strict'):
(object, consumed, byteorder) = \
codecs.utf_32_ex_decode(input, errors, 0, False)
if byteorder == -1:
self.decode = codecs.utf_32_le_decode
elif byteorder == 1:
self.decode = codecs.utf_32_be_decode
elif consumed>=4:
raise UnicodeError("UTF-32 stream does not start with BOM")
return (object, consumed)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-32',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
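# A small illustration (assumed usage, mirroring the classes above): the
# utf-32 incremental encoder writes the 4-byte BOM only on its first call,
# so the lengths below hold regardless of the platform's byte order.
import codecs
_enc = codecs.getincrementalencoder('utf-32')()
assert len(_enc.encode('A')) == 8      # BOM (4 bytes) + one code point
assert len(_enc.encode('B')) == 4      # no further BOM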
"""
Python 'utf-32-be' Codec
"""
import codecs
### Codec APIs
encode = codecs.utf_32_be_encode
def decode(input, errors='strict'):
return codecs.utf_32_be_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_32_be_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_32_be_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_32_be_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_32_be_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-32-be',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
"""
Python 'utf-32-le' Codec
"""
import codecs
### Codec APIs
encode = codecs.utf_32_le_encode
def decode(input, errors='strict'):
return codecs.utf_32_le_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_32_le_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_32_le_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_32_le_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_32_le_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-32-le',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
""" Python 'utf-7' Codec
Written by Brian Quinlan ([email protected]).
"""
import codecs
### Codec APIs
encode = codecs.utf_7_encode
def decode(input, errors='strict'):
return codecs.utf_7_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_7_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_7_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_7_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_7_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-7',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
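# Round-trip sketch (an assumed quick check): UTF-7 shifts non-ASCII
# characters into a modified base64 section introduced by '+' and
# terminated by '-', and decoding reverses it exactly.
import codecs
assert codecs.decode(codecs.encode('£1', 'utf-7'), 'utf-7') == '£1'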
""" Python 'utf-8' Codec
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""
import codecs
### Codec APIs
encode = codecs.utf_8_encode
def decode(input, errors='strict'):
return codecs.utf_8_decode(input, errors, True)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return codecs.utf_8_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
_buffer_decode = codecs.utf_8_decode
class StreamWriter(codecs.StreamWriter):
encode = codecs.utf_8_encode
class StreamReader(codecs.StreamReader):
decode = codecs.utf_8_decode
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-8',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
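# Quick check (an assumed illustration, not part of the module):
# 'Ā'.encode('utf-8') == b'\xc4\x80' and b'\xc4\x80'.decode('utf-8') == 'Ā'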
""" Python 'utf-8-sig' Codec
This works like UTF-8 with the following changes:
* On encoding/writing, a UTF-8 encoded BOM will be prepended/written as the
first three bytes.
* On decoding/reading, if the first three bytes are a UTF-8 encoded BOM, these
bytes will be skipped.
"""
import codecs
### Codec APIs
def encode(input, errors='strict'):
return (codecs.BOM_UTF8 + codecs.utf_8_encode(input, errors)[0],
len(input))
def decode(input, errors='strict'):
prefix = 0
if input[:3] == codecs.BOM_UTF8:
input = input[3:]
prefix = 3
(output, consumed) = codecs.utf_8_decode(input, errors, True)
return (output, consumed+prefix)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
codecs.IncrementalEncoder.__init__(self, errors)
self.first = 1
def encode(self, input, final=False):
if self.first:
self.first = 0
return codecs.BOM_UTF8 + \
codecs.utf_8_encode(input, self.errors)[0]
else:
return codecs.utf_8_encode(input, self.errors)[0]
def reset(self):
codecs.IncrementalEncoder.reset(self)
self.first = 1
def getstate(self):
return self.first
def setstate(self, state):
self.first = state
class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
def __init__(self, errors='strict'):
codecs.BufferedIncrementalDecoder.__init__(self, errors)
self.first = 1
def _buffer_decode(self, input, errors, final):
if self.first:
if len(input) < 3:
if codecs.BOM_UTF8.startswith(input):
# not enough data to decide if this really is a BOM
# => try again on the next call
return ("", 0)
else:
self.first = 0
else:
self.first = 0
if input[:3] == codecs.BOM_UTF8:
(output, consumed) = \
codecs.utf_8_decode(input[3:], errors, final)
return (output, consumed+3)
return codecs.utf_8_decode(input, errors, final)
def reset(self):
codecs.BufferedIncrementalDecoder.reset(self)
self.first = 1
def getstate(self):
state = codecs.BufferedIncrementalDecoder.getstate(self)
# state[1] must be 0 here, as it isn't passed along to the caller
return (state[0], self.first)
def setstate(self, state):
# state[1] will be ignored by BufferedIncrementalDecoder.setstate()
codecs.BufferedIncrementalDecoder.setstate(self, state)
self.first = state[1]
class StreamWriter(codecs.StreamWriter):
def reset(self):
codecs.StreamWriter.reset(self)
try:
del self.encode
except AttributeError:
pass
def encode(self, input, errors='strict'):
self.encode = codecs.utf_8_encode
return encode(input, errors)
class StreamReader(codecs.StreamReader):
def reset(self):
codecs.StreamReader.reset(self)
try:
del self.decode
except AttributeError:
pass
def decode(self, input, errors='strict'):
if len(input) < 3:
if codecs.BOM_UTF8.startswith(input):
# not enough data to decide if this is a BOM
# => try again on the next call
return ("", 0)
elif input[:3] == codecs.BOM_UTF8:
self.decode = codecs.utf_8_decode
(output, consumed) = codecs.utf_8_decode(input[3:],errors)
return (output, consumed+3)
# (else) no BOM present
self.decode = codecs.utf_8_decode
return codecs.utf_8_decode(input, errors)
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='utf-8-sig',
encode=encode,
decode=decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
)
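# Behaviour sketch (assumed usage of the codec defined above):
import codecs
_data = 'hi'.encode('utf-8-sig')
assert _data.startswith(codecs.BOM_UTF8)         # BOM prepended on encode
assert _data.decode('utf-8-sig') == 'hi'         # BOM stripped on decode
assert b'hi'.decode('utf-8-sig') == 'hi'         # a missing BOM is tolerated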
"""Python 'uu_codec' Codec - UU content transfer encoding.
This codec de/encodes from bytes to bytes.
Written by Marc-Andre Lemburg ([email protected]). Some details were
adapted from uu.py which was written by Lance Ellinghouse and
modified by Jack Jansen and Fredrik Lundh.
"""
import codecs
import binascii
from io import BytesIO
### Codec APIs
def uu_encode(input, errors='strict', filename='<data>', mode=0o666):
assert errors == 'strict'
infile = BytesIO(input)
outfile = BytesIO()
read = infile.read
write = outfile.write
# Encode
write(('begin %o %s\n' % (mode & 0o777, filename)).encode('ascii'))
chunk = read(45)
while chunk:
write(binascii.b2a_uu(chunk))
chunk = read(45)
write(b' \nend\n')
return (outfile.getvalue(), len(input))
def uu_decode(input, errors='strict'):
assert errors == 'strict'
infile = BytesIO(input)
outfile = BytesIO()
readline = infile.readline
write = outfile.write
# Find start of encoded data
while True:
s = readline()
if not s:
raise ValueError('Missing "begin" line in input data')
if s[:5] == b'begin':
break
# Decode
while True:
s = readline()
if not s or s == b'end\n':
break
try:
data = binascii.a2b_uu(s)
except binascii.Error as v:
# Workaround for broken uuencoders, by Fredrik Lundh
nbytes = (((s[0]-32) & 63) * 4 + 5) // 3
data = binascii.a2b_uu(s[:nbytes])
#sys.stderr.write("Warning: %s\n" % str(v))
write(data)
if not s:
raise ValueError('Truncated input data')
return (outfile.getvalue(), len(input))
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
return uu_encode(input, errors)
def decode(self, input, errors='strict'):
return uu_decode(input, errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def encode(self, input, final=False):
return uu_encode(input, self.errors)[0]
class IncrementalDecoder(codecs.IncrementalDecoder):
def decode(self, input, final=False):
return uu_decode(input, self.errors)[0]
class StreamWriter(Codec, codecs.StreamWriter):
charbuffertype = bytes
class StreamReader(Codec, codecs.StreamReader):
charbuffertype = bytes
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='uu',
encode=uu_encode,
decode=uu_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
_is_text_encoding=False,
)
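# Since _is_text_encoding is False, this codec is reached through
# codecs.encode/codecs.decode rather than str.encode. A minimal sketch:
import codecs
_wire = codecs.encode(b'hello', 'uu')
assert _wire.startswith(b'begin 666 <data>')     # default mode and filename
assert codecs.decode(_wire, 'uu') == b'hello'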
"""Python 'zlib_codec' Codec - zlib compression encoding.
This codec de/encodes from bytes to bytes.
Written by Marc-Andre Lemburg ([email protected]).
"""
import codecs
import zlib # this codec needs the optional zlib module !
### Codec APIs
def zlib_encode(input, errors='strict'):
assert errors == 'strict'
return (zlib.compress(input), len(input))
def zlib_decode(input, errors='strict'):
assert errors == 'strict'
return (zlib.decompress(input), len(input))
class Codec(codecs.Codec):
def encode(self, input, errors='strict'):
return zlib_encode(input, errors)
def decode(self, input, errors='strict'):
return zlib_decode(input, errors)
class IncrementalEncoder(codecs.IncrementalEncoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.compressobj = zlib.compressobj()
def encode(self, input, final=False):
if final:
c = self.compressobj.compress(input)
return c + self.compressobj.flush()
else:
return self.compressobj.compress(input)
def reset(self):
self.compressobj = zlib.compressobj()
class IncrementalDecoder(codecs.IncrementalDecoder):
def __init__(self, errors='strict'):
assert errors == 'strict'
self.errors = errors
self.decompressobj = zlib.decompressobj()
def decode(self, input, final=False):
if final:
c = self.decompressobj.decompress(input)
return c + self.decompressobj.flush()
else:
return self.decompressobj.decompress(input)
def reset(self):
self.decompressobj = zlib.decompressobj()
class StreamWriter(Codec, codecs.StreamWriter):
charbuffertype = bytes
class StreamReader(Codec, codecs.StreamReader):
charbuffertype = bytes
### encodings module API
def getregentry():
return codecs.CodecInfo(
name='zlib',
encode=zlib_encode,
decode=zlib_decode,
incrementalencoder=IncrementalEncoder,
incrementaldecoder=IncrementalDecoder,
streamreader=StreamReader,
streamwriter=StreamWriter,
_is_text_encoding=False,
)
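# Usage sketch (assumed): like 'uu', this bytes-to-bytes codec is invoked
# via codecs.encode/codecs.decode.
import codecs
_blob = codecs.encode(b'data' * 100, 'zlib')
assert codecs.decode(_blob, 'zlib') == b'data' * 100
assert len(_blob) < len(b'data' * 100)           # repetitive input compresses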
""" Standard "encodings" Package
Standard Python encoding modules are stored in this package
directory.
Codec modules must have names corresponding to normalized encoding
names as defined in the normalize_encoding() function below, e.g.
'utf-8' must be implemented by the module 'utf_8.py'.
Each codec module must export the following interface:
* getregentry() -> codecs.CodecInfo object
The getregentry() API must return a CodecInfo object with encoder, decoder,
incrementalencoder, incrementaldecoder, streamwriter and streamreader
attributes which adhere to the Python Codec Interface Standard.
In addition, a module may optionally also define the following
APIs which are then used by the package's codec search function:
* getaliases() -> sequence of encoding name strings to use as aliases
Alias names returned by getaliases() must be normalized encoding
names as defined by normalize_encoding().
Written by Marc-Andre Lemburg ([email protected]).
(c) Copyright CNRI, All Rights Reserved. NO WARRANTY.
"""#"
import codecs
from . import aliases
_cache = {}
_unknown = '--unknown--'
_import_tail = ['*']
_aliases = aliases.aliases
class CodecRegistryError(LookupError, SystemError):
pass
def normalize_encoding(encoding):
""" Normalize an encoding name.
Normalization works as follows: all non-alphanumeric
characters except the dot used for Python package names are
collapsed and replaced with a single underscore, e.g. ' -;#'
becomes '_'. Leading and trailing underscores are removed.
Note that encoding names should be ASCII only; if they do use
non-ASCII characters, these must be Latin-1 compatible.
"""
if isinstance(encoding, bytes):
encoding = str(encoding, "ascii")
chars = []
punct = False
for c in encoding:
if c.isalnum() or c == '.':
if punct and chars:
chars.append('_')
chars.append(c)
punct = False
else:
punct = True
return ''.join(chars)
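# Illustrative values (assumed quick checks, not part of the module):
# normalize_encoding('utf-8') -> 'utf_8'
# normalize_encoding('latin 1') -> 'latin_1'
# Case is preserved here; lowercasing is done by codecs.lookup() before
# the search function below is ever called.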
def search_function(encoding):
# Cache lookup
entry = _cache.get(encoding, _unknown)
if entry is not _unknown:
return entry
# Import the module:
#
# First try to find an alias for the normalized encoding
# name and lookup the module using the aliased name, then try to
# lookup the module using the standard import scheme, i.e. first
# try in the encodings package, then at top-level.
#
norm_encoding = normalize_encoding(encoding)
aliased_encoding = _aliases.get(norm_encoding) or \
_aliases.get(norm_encoding.replace('.', '_'))
if aliased_encoding is not None:
modnames = [aliased_encoding,
norm_encoding]
else:
modnames = [norm_encoding]
for modname in modnames:
if not modname or '.' in modname:
continue
try:
# Import is absolute to prevent the possibly malicious import of a
# module with side-effects that is not in the 'encodings' package.
mod = __import__('encodings.' + modname, fromlist=_import_tail,
level=0)
except ImportError:
pass
else:
break
else:
mod = None
try:
getregentry = mod.getregentry
except AttributeError:
# Not a codec module
mod = None
if mod is None:
# Cache misses
_cache[encoding] = None
return None
# Now ask the module for the registry entry
entry = getregentry()
if not isinstance(entry, codecs.CodecInfo):
if not 4 <= len(entry) <= 7:
raise CodecRegistryError('module "%s" (%s) failed to register'
% (mod.__name__, mod.__file__))
if not callable(entry[0]) or not callable(entry[1]) or \
(entry[2] is not None and not callable(entry[2])) or \
(entry[3] is not None and not callable(entry[3])) or \
(len(entry) > 4 and entry[4] is not None and not callable(entry[4])) or \
(len(entry) > 5 and entry[5] is not None and not callable(entry[5])):
raise CodecRegistryError('incompatible codecs in module "%s" (%s)'
% (mod.__name__, mod.__file__))
if len(entry)<7 or entry[6] is None:
entry += (None,)*(6-len(entry)) + (mod.__name__.split(".", 1)[1],)
entry = codecs.CodecInfo(*entry)
# Cache the codec registry entry
_cache[encoding] = entry
# Register its aliases (without overwriting previously registered
# aliases)
try:
codecaliases = mod.getaliases()
except AttributeError:
pass
else:
for alias in codecaliases:
if alias not in _aliases:
_aliases[alias] = modname
# Return the registry entry
return entry
# Register the search_function in the Python codec registry
codecs.register(search_function)
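# End-to-end sketch of the search path registered above: an alias is
# normalized, mapped through the aliases table, then imported from the
# 'encodings' package and cached.
import codecs
assert codecs.lookup('UTF8').name == 'utf-8'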
"""Basic pip uninstallation support, helper for the Windows uninstaller"""
import argparse
import ensurepip
import sys
def _main(argv=None):
parser = argparse.ArgumentParser(prog="python -m ensurepip._uninstall")
parser.add_argument(
"--version",
action="version",
version="pip {}".format(ensurepip.version()),
help="Show the version of pip this will attempt to uninstall.",
)
parser.add_argument(
"-v", "--verbose",
action="count",
default=0,
dest="verbosity",
help=("Give more output. Option is additive, and can be used up to 3 "
"times."),
)
args = parser.parse_args(argv)
return ensurepip._uninstall_helper(verbosity=args.verbosity)
if __name__ == "__main__":
sys.exit(_main())
import os
import os.path
import pkgutil
import sys
import tempfile
__all__ = ["version", "bootstrap"]
_SETUPTOOLS_VERSION = "40.6.2"
_PIP_VERSION = "18.1"
_PROJECTS = [
("setuptools", _SETUPTOOLS_VERSION),
("pip", _PIP_VERSION),
]
def _run_pip(args, additional_paths=None):
# Add our bundled software to the sys.path so we can import it
if additional_paths is not None:
sys.path = additional_paths + sys.path
# Install the bundled software
import pip._internal
return pip._internal.main(args)
def version():
"""
Returns a string specifying the bundled version of pip.
"""
return _PIP_VERSION
def _disable_pip_configuration_settings():
# We deliberately ignore all pip environment variables
# when invoking pip
# See http://bugs.python.org/issue19734 for details
keys_to_remove = [k for k in os.environ if k.startswith("PIP_")]
for k in keys_to_remove:
del os.environ[k]
# We also ignore the settings in the default pip configuration file
# See http://bugs.python.org/issue20053 for details
os.environ['PIP_CONFIG_FILE'] = os.devnull
def bootstrap(*, root=None, upgrade=False, user=False,
altinstall=False, default_pip=False,
verbosity=0):
"""
Bootstrap pip into the current Python installation (or the given root
directory).
Note that calling this function will alter both sys.path and os.environ.
"""
# Discard the return value
_bootstrap(root=root, upgrade=upgrade, user=user,
altinstall=altinstall, default_pip=default_pip,
verbosity=verbosity)
def _bootstrap(*, root=None, upgrade=False, user=False,
altinstall=False, default_pip=False,
verbosity=0):
"""
Bootstrap pip into the current Python installation (or the given root
directory). Returns pip command status code.
Note that calling this function will alter both sys.path and os.environ.
"""
if altinstall and default_pip:
raise ValueError("Cannot use altinstall and default_pip together")
_disable_pip_configuration_settings()
# By default, installing pip and setuptools installs all of the
# following scripts (X.Y == running Python version):
#
# pip, pipX, pipX.Y, easy_install, easy_install-X.Y
#
# pip 1.5+ allows ensurepip to request that some of those be left out
if altinstall:
# omit pip, pipX and easy_install
os.environ["ENSUREPIP_OPTIONS"] = "altinstall"
elif not default_pip:
# omit pip and easy_install
os.environ["ENSUREPIP_OPTIONS"] = "install"
with tempfile.TemporaryDirectory() as tmpdir:
# Put our bundled wheels into a temporary directory and construct the
# additional paths that need to be added to sys.path
additional_paths = []
for project, version in _PROJECTS:
wheel_name = "{}-{}-py2.py3-none-any.whl".format(project, version)
whl = pkgutil.get_data(
"ensurepip",
"_bundled/{}".format(wheel_name),
)
with open(os.path.join(tmpdir, wheel_name), "wb") as fp:
fp.write(whl)
additional_paths.append(os.path.join(tmpdir, wheel_name))
# Construct the arguments to be passed to the pip command
args = ["install", "--no-index", "--find-links", tmpdir]
if sys.implementation.name == "ironpython":
args += ["--no-compile"]
if root:
args += ["--root", root]
if upgrade:
args += ["--upgrade"]
if user:
args += ["--user"]
if verbosity:
args += ["-" + "v" * verbosity]
return _run_pip(args + [p[0] for p in _PROJECTS], additional_paths)
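# Typical programmatic use (an assumed sketch of the public entry point):
# import ensurepip
# ensurepip.bootstrap(upgrade=True, verbosity=1)  # install/upgrade bundled pip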
def _uninstall_helper(*, verbosity=0):
"""Helper to support a clean default uninstall process on Windows
Note that calling this function may alter os.environ.
"""
# Nothing to do if pip was never installed, or has been removed
try:
import pip
except ImportError:
return
# If the pip version doesn't match the bundled one, leave it alone
if pip.__version__ != _PIP_VERSION:
msg = ("ensurepip will only uninstall a matching version "
"({!r} installed, {!r} bundled)")
print(msg.format(pip.__version__, _PIP_VERSION), file=sys.stderr)
return
_disable_pip_configuration_settings()
# Construct the arguments to be passed to the pip command
args = ["uninstall", "-y", "--disable-pip-version-check"]
if verbosity:
args += ["-" + "v" * verbosity]
return _run_pip(args + [p[0] for p in reversed(_PROJECTS)])
def _main(argv=None):
import argparse
parser = argparse.ArgumentParser(prog="python -m ensurepip")
parser.add_argument(
"--version",
action="version",
version="pip {}".format(version()),
help="Show the version of pip that is bundled with this Python.",
)
parser.add_argument(
"-v", "--verbose",
action="count",
default=0,
dest="verbosity",
help=("Give more output. Option is additive, and can be used up to 3 "
"times."),
)
parser.add_argument(
"-U", "--upgrade",
action="store_true",
default=False,
help="Upgrade pip and dependencies, even if already installed.",
)
parser.add_argument(
"--user",
action="store_true",
default=False,
help="Install using the user scheme.",
)
parser.add_argument(
"--root",
default=None,
help="Install everything relative to this alternate root directory.",
)
parser.add_argument(
"--altinstall",
action="store_true",
default=False,
help=("Make an alternate install, installing only the X.Y versioned "
"scripts (Default: pipX, pipX.Y, easy_install-X.Y)."),
)
parser.add_argument(
"--default-pip",
action="store_true",
default=False,
help=("Make a default pip install, installing the unqualified pip "
"and easy_install in addition to the versioned scripts."),
)
args = parser.parse_args(argv)
return _bootstrap(
root=args.root,
upgrade=args.upgrade,
user=args.user,
verbosity=args.verbosity,
altinstall=args.altinstall,
default_pip=args.default_pip,
)
import ensurepip
import sys
if __name__ == "__main__":
sys.exit(ensurepip._main())
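# Command-line equivalents (assumed invocations of the modules above):
#   python -m ensurepip              # install the bundled pip/setuptools
#   python -m ensurepip --upgrade    # upgrade an existing installation
#   python -m ensurepip._uninstall   # remove the matching bundled pip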
"""HTML character entity references."""
# maps the HTML entity name to the Unicode code point
name2codepoint = {
'AElig': 0x00c6, # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1
'Aacute': 0x00c1, # latin capital letter A with acute, U+00C1 ISOlat1
'Acirc': 0x00c2, # latin capital letter A with circumflex, U+00C2 ISOlat1
'Agrave': 0x00c0, # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1
'Alpha': 0x0391, # greek capital letter alpha, U+0391
'Aring': 0x00c5, # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1
'Atilde': 0x00c3, # latin capital letter A with tilde, U+00C3 ISOlat1
'Auml': 0x00c4, # latin capital letter A with diaeresis, U+00C4 ISOlat1
'Beta': 0x0392, # greek capital letter beta, U+0392
'Ccedil': 0x00c7, # latin capital letter C with cedilla, U+00C7 ISOlat1
'Chi': 0x03a7, # greek capital letter chi, U+03A7
'Dagger': 0x2021, # double dagger, U+2021 ISOpub
'Delta': 0x0394, # greek capital letter delta, U+0394 ISOgrk3
'ETH': 0x00d0, # latin capital letter ETH, U+00D0 ISOlat1
'Eacute': 0x00c9, # latin capital letter E with acute, U+00C9 ISOlat1
'Ecirc': 0x00ca, # latin capital letter E with circumflex, U+00CA ISOlat1
'Egrave': 0x00c8, # latin capital letter E with grave, U+00C8 ISOlat1
'Epsilon': 0x0395, # greek capital letter epsilon, U+0395
'Eta': 0x0397, # greek capital letter eta, U+0397
'Euml': 0x00cb, # latin capital letter E with diaeresis, U+00CB ISOlat1
'Gamma': 0x0393, # greek capital letter gamma, U+0393 ISOgrk3
'Iacute': 0x00cd, # latin capital letter I with acute, U+00CD ISOlat1
'Icirc': 0x00ce, # latin capital letter I with circumflex, U+00CE ISOlat1
'Igrave': 0x00cc, # latin capital letter I with grave, U+00CC ISOlat1
'Iota': 0x0399, # greek capital letter iota, U+0399
'Iuml': 0x00cf, # latin capital letter I with diaeresis, U+00CF ISOlat1
'Kappa': 0x039a, # greek capital letter kappa, U+039A
'Lambda': 0x039b, # greek capital letter lambda, U+039B ISOgrk3
'Mu': 0x039c, # greek capital letter mu, U+039C
'Ntilde': 0x00d1, # latin capital letter N with tilde, U+00D1 ISOlat1
'Nu': 0x039d, # greek capital letter nu, U+039D
'OElig': 0x0152, # latin capital ligature OE, U+0152 ISOlat2
'Oacute': 0x00d3, # latin capital letter O with acute, U+00D3 ISOlat1
'Ocirc': 0x00d4, # latin capital letter O with circumflex, U+00D4 ISOlat1
'Ograve': 0x00d2, # latin capital letter O with grave, U+00D2 ISOlat1
'Omega': 0x03a9, # greek capital letter omega, U+03A9 ISOgrk3
'Omicron': 0x039f, # greek capital letter omicron, U+039F
'Oslash': 0x00d8, # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1
'Otilde': 0x00d5, # latin capital letter O with tilde, U+00D5 ISOlat1
'Ouml': 0x00d6, # latin capital letter O with diaeresis, U+00D6 ISOlat1
'Phi': 0x03a6, # greek capital letter phi, U+03A6 ISOgrk3
'Pi': 0x03a0, # greek capital letter pi, U+03A0 ISOgrk3
'Prime': 0x2033, # double prime = seconds = inches, U+2033 ISOtech
'Psi': 0x03a8, # greek capital letter psi, U+03A8 ISOgrk3
'Rho': 0x03a1, # greek capital letter rho, U+03A1
'Scaron': 0x0160, # latin capital letter S with caron, U+0160 ISOlat2
'Sigma': 0x03a3, # greek capital letter sigma, U+03A3 ISOgrk3
'THORN': 0x00de, # latin capital letter THORN, U+00DE ISOlat1
'Tau': 0x03a4, # greek capital letter tau, U+03A4
'Theta': 0x0398, # greek capital letter theta, U+0398 ISOgrk3
'Uacute': 0x00da, # latin capital letter U with acute, U+00DA ISOlat1
'Ucirc': 0x00db, # latin capital letter U with circumflex, U+00DB ISOlat1
'Ugrave': 0x00d9, # latin capital letter U with grave, U+00D9 ISOlat1
'Upsilon': 0x03a5, # greek capital letter upsilon, U+03A5 ISOgrk3
'Uuml': 0x00dc, # latin capital letter U with diaeresis, U+00DC ISOlat1
'Xi': 0x039e, # greek capital letter xi, U+039E ISOgrk3
'Yacute': 0x00dd, # latin capital letter Y with acute, U+00DD ISOlat1
'Yuml': 0x0178, # latin capital letter Y with diaeresis, U+0178 ISOlat2
'Zeta': 0x0396, # greek capital letter zeta, U+0396
'aacute': 0x00e1, # latin small letter a with acute, U+00E1 ISOlat1
'acirc': 0x00e2, # latin small letter a with circumflex, U+00E2 ISOlat1
'acute': 0x00b4, # acute accent = spacing acute, U+00B4 ISOdia
'aelig': 0x00e6, # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1
'agrave': 0x00e0, # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1
'alefsym': 0x2135, # alef symbol = first transfinite cardinal, U+2135 NEW
'alpha': 0x03b1, # greek small letter alpha, U+03B1 ISOgrk3
'amp': 0x0026, # ampersand, U+0026 ISOnum
'and': 0x2227, # logical and = wedge, U+2227 ISOtech
'ang': 0x2220, # angle, U+2220 ISOamso
'aring': 0x00e5, # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1
'asymp': 0x2248, # almost equal to = asymptotic to, U+2248 ISOamsr
'atilde': 0x00e3, # latin small letter a with tilde, U+00E3 ISOlat1
'auml': 0x00e4, # latin small letter a with diaeresis, U+00E4 ISOlat1
'bdquo': 0x201e, # double low-9 quotation mark, U+201E NEW
'beta': 0x03b2, # greek small letter beta, U+03B2 ISOgrk3
'brvbar': 0x00a6, # broken bar = broken vertical bar, U+00A6 ISOnum
'bull': 0x2022, # bullet = black small circle, U+2022 ISOpub
'cap': 0x2229, # intersection = cap, U+2229 ISOtech
'ccedil': 0x00e7, # latin small letter c with cedilla, U+00E7 ISOlat1
'cedil': 0x00b8, # cedilla = spacing cedilla, U+00B8 ISOdia
'cent': 0x00a2, # cent sign, U+00A2 ISOnum
'chi': 0x03c7, # greek small letter chi, U+03C7 ISOgrk3
'circ': 0x02c6, # modifier letter circumflex accent, U+02C6 ISOpub
'clubs': 0x2663, # black club suit = shamrock, U+2663 ISOpub
'cong': 0x2245, # approximately equal to, U+2245 ISOtech
'copy': 0x00a9, # copyright sign, U+00A9 ISOnum
'crarr': 0x21b5, # downwards arrow with corner leftwards = carriage return, U+21B5 NEW
'cup': 0x222a, # union = cup, U+222A ISOtech
'curren': 0x00a4, # currency sign, U+00A4 ISOnum
'dArr': 0x21d3, # downwards double arrow, U+21D3 ISOamsa
'dagger': 0x2020, # dagger, U+2020 ISOpub
'darr': 0x2193, # downwards arrow, U+2193 ISOnum
'deg': 0x00b0, # degree sign, U+00B0 ISOnum
'delta': 0x03b4, # greek small letter delta, U+03B4 ISOgrk3
'diams': 0x2666, # black diamond suit, U+2666 ISOpub
'divide': 0x00f7, # division sign, U+00F7 ISOnum
'eacute': 0x00e9, # latin small letter e with acute, U+00E9 ISOlat1
'ecirc': 0x00ea, # latin small letter e with circumflex, U+00EA ISOlat1
'egrave': 0x00e8, # latin small letter e with grave, U+00E8 ISOlat1
'empty': 0x2205, # empty set = null set = diameter, U+2205 ISOamso
'emsp': 0x2003, # em space, U+2003 ISOpub
'ensp': 0x2002, # en space, U+2002 ISOpub
'epsilon': 0x03b5, # greek small letter epsilon, U+03B5 ISOgrk3
'equiv': 0x2261, # identical to, U+2261 ISOtech
'eta': 0x03b7, # greek small letter eta, U+03B7 ISOgrk3
'eth': 0x00f0, # latin small letter eth, U+00F0 ISOlat1
'euml': 0x00eb, # latin small letter e with diaeresis, U+00EB ISOlat1
'euro': 0x20ac, # euro sign, U+20AC NEW
'exist': 0x2203, # there exists, U+2203 ISOtech
'fnof': 0x0192, # latin small f with hook = function = florin, U+0192 ISOtech
'forall': 0x2200, # for all, U+2200 ISOtech
'frac12': 0x00bd, # vulgar fraction one half = fraction one half, U+00BD ISOnum
'frac14': 0x00bc, # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum
'frac34': 0x00be, # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum
'frasl': 0x2044, # fraction slash, U+2044 NEW
'gamma': 0x03b3, # greek small letter gamma, U+03B3 ISOgrk3
'ge': 0x2265, # greater-than or equal to, U+2265 ISOtech
'gt': 0x003e, # greater-than sign, U+003E ISOnum
'hArr': 0x21d4, # left right double arrow, U+21D4 ISOamsa
'harr': 0x2194, # left right arrow, U+2194 ISOamsa
'hearts': 0x2665, # black heart suit = valentine, U+2665 ISOpub
'hellip': 0x2026, # horizontal ellipsis = three dot leader, U+2026 ISOpub
'iacute': 0x00ed, # latin small letter i with acute, U+00ED ISOlat1
'icirc': 0x00ee, # latin small letter i with circumflex, U+00EE ISOlat1
'iexcl': 0x00a1, # inverted exclamation mark, U+00A1 ISOnum
'igrave': 0x00ec, # latin small letter i with grave, U+00EC ISOlat1
'image': 0x2111, # blackletter capital I = imaginary part, U+2111 ISOamso
'infin': 0x221e, # infinity, U+221E ISOtech
'int': 0x222b, # integral, U+222B ISOtech
'iota': 0x03b9, # greek small letter iota, U+03B9 ISOgrk3
'iquest': 0x00bf, # inverted question mark = turned question mark, U+00BF ISOnum
'isin': 0x2208, # element of, U+2208 ISOtech
'iuml': 0x00ef, # latin small letter i with diaeresis, U+00EF ISOlat1
'kappa': 0x03ba, # greek small letter kappa, U+03BA ISOgrk3
'lArr': 0x21d0, # leftwards double arrow, U+21D0 ISOtech
'lambda': 0x03bb, # greek small letter lambda, U+03BB ISOgrk3
'lang': 0x2329, # left-pointing angle bracket = bra, U+2329 ISOtech
'laquo': 0x00ab, # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum
'larr': 0x2190, # leftwards arrow, U+2190 ISOnum
'lceil': 0x2308, # left ceiling = apl upstile, U+2308 ISOamsc
'ldquo': 0x201c, # left double quotation mark, U+201C ISOnum
'le': 0x2264, # less-than or equal to, U+2264 ISOtech
'lfloor': 0x230a, # left floor = apl downstile, U+230A ISOamsc
'lowast': 0x2217, # asterisk operator, U+2217 ISOtech
'loz': 0x25ca, # lozenge, U+25CA ISOpub
'lrm': 0x200e, # left-to-right mark, U+200E NEW RFC 2070
'lsaquo': 0x2039, # single left-pointing angle quotation mark, U+2039 ISO proposed
'lsquo': 0x2018, # left single quotation mark, U+2018 ISOnum
'lt': 0x003c, # less-than sign, U+003C ISOnum
'macr': 0x00af, # macron = spacing macron = overline = APL overbar, U+00AF ISOdia
'mdash': 0x2014, # em dash, U+2014 ISOpub
'micro': 0x00b5, # micro sign, U+00B5 ISOnum
'middot': 0x00b7, # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum
'minus': 0x2212, # minus sign, U+2212 ISOtech
'mu': 0x03bc, # greek small letter mu, U+03BC ISOgrk3
'nabla': 0x2207, # nabla = backward difference, U+2207 ISOtech
'nbsp': 0x00a0, # no-break space = non-breaking space, U+00A0 ISOnum
'ndash': 0x2013, # en dash, U+2013 ISOpub
'ne': 0x2260, # not equal to, U+2260 ISOtech
'ni': 0x220b, # contains as member, U+220B ISOtech
'not': 0x00ac, # not sign, U+00AC ISOnum
'notin': 0x2209, # not an element of, U+2209 ISOtech
'nsub': 0x2284, # not a subset of, U+2284 ISOamsn
'ntilde': 0x00f1, # latin small letter n with tilde, U+00F1 ISOlat1
'nu': 0x03bd, # greek small letter nu, U+03BD ISOgrk3
'oacute': 0x00f3, # latin small letter o with acute, U+00F3 ISOlat1
'ocirc': 0x00f4, # latin small letter o with circumflex, U+00F4 ISOlat1
'oelig': 0x0153, # latin small ligature oe, U+0153 ISOlat2
'ograve': 0x00f2, # latin small letter o with grave, U+00F2 ISOlat1
'oline': 0x203e, # overline = spacing overscore, U+203E NEW
'omega': 0x03c9, # greek small letter omega, U+03C9 ISOgrk3
'omicron': 0x03bf, # greek small letter omicron, U+03BF NEW
'oplus': 0x2295, # circled plus = direct sum, U+2295 ISOamsb
'or': 0x2228, # logical or = vee, U+2228 ISOtech
'ordf': 0x00aa, # feminine ordinal indicator, U+00AA ISOnum
'ordm': 0x00ba, # masculine ordinal indicator, U+00BA ISOnum
'oslash': 0x00f8, # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1
'otilde': 0x00f5, # latin small letter o with tilde, U+00F5 ISOlat1
'otimes': 0x2297, # circled times = vector product, U+2297 ISOamsb
'ouml': 0x00f6, # latin small letter o with diaeresis, U+00F6 ISOlat1
'para': 0x00b6, # pilcrow sign = paragraph sign, U+00B6 ISOnum
'part': 0x2202, # partial differential, U+2202 ISOtech
'permil': 0x2030, # per mille sign, U+2030 ISOtech
'perp': 0x22a5, # up tack = orthogonal to = perpendicular, U+22A5 ISOtech
'phi': 0x03c6, # greek small letter phi, U+03C6 ISOgrk3
'pi': 0x03c0, # greek small letter pi, U+03C0 ISOgrk3
'piv': 0x03d6, # greek pi symbol, U+03D6 ISOgrk3
'plusmn': 0x00b1, # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum
'pound': 0x00a3, # pound sign, U+00A3 ISOnum
'prime': 0x2032, # prime = minutes = feet, U+2032 ISOtech
'prod': 0x220f, # n-ary product = product sign, U+220F ISOamsb
'prop': 0x221d, # proportional to, U+221D ISOtech
'psi': 0x03c8, # greek small letter psi, U+03C8 ISOgrk3
'quot': 0x0022, # quotation mark = APL quote, U+0022 ISOnum
'rArr': 0x21d2, # rightwards double arrow, U+21D2 ISOtech
'radic': 0x221a, # square root = radical sign, U+221A ISOtech
'rang': 0x232a, # right-pointing angle bracket = ket, U+232A ISOtech
'raquo': 0x00bb, # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum
'rarr': 0x2192, # rightwards arrow, U+2192 ISOnum
'rceil': 0x2309, # right ceiling, U+2309 ISOamsc
'rdquo': 0x201d, # right double quotation mark, U+201D ISOnum
'real': 0x211c, # blackletter capital R = real part symbol, U+211C ISOamso
'reg': 0x00ae, # registered sign = registered trade mark sign, U+00AE ISOnum
'rfloor': 0x230b, # right floor, U+230B ISOamsc
'rho': 0x03c1, # greek small letter rho, U+03C1 ISOgrk3
'rlm': 0x200f, # right-to-left mark, U+200F NEW RFC 2070
'rsaquo': 0x203a, # single right-pointing angle quotation mark, U+203A ISO proposed
'rsquo': 0x2019, # right single quotation mark, U+2019 ISOnum
'sbquo': 0x201a, # single low-9 quotation mark, U+201A NEW
'scaron': 0x0161, # latin small letter s with caron, U+0161 ISOlat2
'sdot': 0x22c5, # dot operator, U+22C5 ISOamsb
'sect': 0x00a7, # section sign, U+00A7 ISOnum
'shy': 0x00ad, # soft hyphen = discretionary hyphen, U+00AD ISOnum
'sigma': 0x03c3, # greek small letter sigma, U+03C3 ISOgrk3
'sigmaf': 0x03c2, # greek small letter final sigma, U+03C2 ISOgrk3
'sim': 0x223c, # tilde operator = varies with = similar to, U+223C ISOtech
'spades': 0x2660, # black spade suit, U+2660 ISOpub
'sub': 0x2282, # subset of, U+2282 ISOtech
'sube': 0x2286, # subset of or equal to, U+2286 ISOtech
'sum': 0x2211, # n-ary summation, U+2211 ISOamsb
'sup': 0x2283, # superset of, U+2283 ISOtech
'sup1': 0x00b9, # superscript one = superscript digit one, U+00B9 ISOnum
'sup2': 0x00b2, # superscript two = superscript digit two = squared, U+00B2 ISOnum
'sup3': 0x00b3, # superscript three = superscript digit three = cubed, U+00B3 ISOnum
'supe': 0x2287, # superset of or equal to, U+2287 ISOtech
'szlig': 0x00df, # latin small letter sharp s = ess-zed, U+00DF ISOlat1
'tau': 0x03c4, # greek small letter tau, U+03C4 ISOgrk3
'there4': 0x2234, # therefore, U+2234 ISOtech
'theta': 0x03b8, # greek small letter theta, U+03B8 ISOgrk3
'thetasym': 0x03d1, # greek small letter theta symbol, U+03D1 NEW
'thinsp': 0x2009, # thin space, U+2009 ISOpub
'thorn': 0x00fe, # latin small letter thorn, U+00FE ISOlat1
'tilde': 0x02dc, # small tilde, U+02DC ISOdia
'times': 0x00d7, # multiplication sign, U+00D7 ISOnum
'trade': 0x2122, # trade mark sign, U+2122 ISOnum
'uArr': 0x21d1, # upwards double arrow, U+21D1 ISOamsa
'uacute': 0x00fa, # latin small letter u with acute, U+00FA ISOlat1
'uarr': 0x2191, # upwards arrow, U+2191 ISOnum
'ucirc': 0x00fb, # latin small letter u with circumflex, U+00FB ISOlat1
'ugrave': 0x00f9, # latin small letter u with grave, U+00F9 ISOlat1
'uml': 0x00a8, # diaeresis = spacing diaeresis, U+00A8 ISOdia
'upsih': 0x03d2, # greek upsilon with hook symbol, U+03D2 NEW
'upsilon': 0x03c5, # greek small letter upsilon, U+03C5 ISOgrk3
'uuml': 0x00fc, # latin small letter u with diaeresis, U+00FC ISOlat1
'weierp': 0x2118, # script capital P = power set = Weierstrass p, U+2118 ISOamso
'xi': 0x03be, # greek small letter xi, U+03BE ISOgrk3
'yacute': 0x00fd, # latin small letter y with acute, U+00FD ISOlat1
'yen': 0x00a5, # yen sign = yuan sign, U+00A5 ISOnum
'yuml': 0x00ff, # latin small letter y with diaeresis, U+00FF ISOlat1
'zeta': 0x03b6, # greek small letter zeta, U+03B6 ISOgrk3
'zwj': 0x200d, # zero width joiner, U+200D NEW RFC 2070
'zwnj': 0x200c, # zero width non-joiner, U+200C NEW RFC 2070
}
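# Lookup sketch (assumed usage of the table above):
# name2codepoint['amp'] == 0x26 and chr(name2codepoint['eacute']) == '\xe9'
# The module also derives a reverse mapping, codepoint2name, by inverting
# this table.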
# maps the HTML5 named character references to the equivalent Unicode character(s)
html5 = {
'Aacute': '\xc1',
'aacute': '\xe1',
'Aacute;': '\xc1',
'aacute;': '\xe1',
'Abreve;': '\u0102',
'abreve;': '\u0103',
'ac;': '\u223e',
'acd;': '\u223f',
'acE;': '\u223e\u0333',
'Acirc': '\xc2',
'acirc': '\xe2',
'Acirc;': '\xc2',
'acirc;': '\xe2',
'acute': '\xb4',
'acute;': '\xb4',
'Acy;': '\u0410',
'acy;': '\u0430',
'AElig': '\xc6',
'aelig': '\xe6',
'AElig;': '\xc6',
'aelig;': '\xe6',
'af;': '\u2061',
'Afr;': '\U0001d504',
'afr;': '\U0001d51e',
'Agrave': '\xc0',
'agrave': '\xe0',
'Agrave;': '\xc0',
'agrave;': '\xe0',
'alefsym;': '\u2135',
'aleph;': '\u2135',
'Alpha;': '\u0391',
'alpha;': '\u03b1',
'Amacr;': '\u0100',
'amacr;': '\u0101',
'amalg;': '\u2a3f',
'AMP': '&',
'amp': '&',
'AMP;': '&',
'amp;': '&',
'And;': '\u2a53',
'and;': '\u2227',
'andand;': '\u2a55',
'andd;': '\u2a5c',
'andslope;': '\u2a58',
'andv;': '\u2a5a',
'ang;': '\u2220',
'ange;': '\u29a4',
'angle;': '\u2220',
'angmsd;': '\u2221',
'angmsdaa;': '\u29a8',
'angmsdab;': '\u29a9',
'angmsdac;': '\u29aa',
'angmsdad;': '\u29ab',
'angmsdae;': '\u29ac',
'angmsdaf;': '\u29ad',
'angmsdag;': '\u29ae',
'angmsdah;': '\u29af',
'angrt;': '\u221f',
'angrtvb;': '\u22be',
'angrtvbd;': '\u299d',
'angsph;': '\u2222',
'angst;': '\xc5',
'angzarr;': '\u237c',
'Aogon;': '\u0104',
'aogon;': '\u0105',
'Aopf;': '\U0001d538',
'aopf;': '\U0001d552',
'ap;': '\u2248',
'apacir;': '\u2a6f',
'apE;': '\u2a70',
'ape;': '\u224a',
'apid;': '\u224b',
'apos;': "'",
'ApplyFunction;': '\u2061',
'approx;': '\u2248',
'approxeq;': '\u224a',
'Aring': '\xc5',
'aring': '\xe5',
'Aring;': '\xc5',
'aring;': '\xe5',
'Ascr;': '\U0001d49c',
'ascr;': '\U0001d4b6',
'Assign;': '\u2254',
'ast;': '*',
'asymp;': '\u2248',
'asympeq;': '\u224d',
'Atilde': '\xc3',
'atilde': '\xe3',
'Atilde;': '\xc3',
'atilde;': '\xe3',
'Auml': '\xc4',
'auml': '\xe4',
'Auml;': '\xc4',
'auml;': '\xe4',
'awconint;': '\u2233',
'awint;': '\u2a11',
'backcong;': '\u224c',
'backepsilon;': '\u03f6',
'backprime;': '\u2035',
'backsim;': '\u223d',
'backsimeq;': '\u22cd',
'Backslash;': '\u2216',
'Barv;': '\u2ae7',
'barvee;': '\u22bd',
'Barwed;': '\u2306',
'barwed;': '\u2305',
'barwedge;': '\u2305',
'bbrk;': '\u23b5',
'bbrktbrk;': '\u23b6',
'bcong;': '\u224c',
'Bcy;': '\u0411',
'bcy;': '\u0431',
'bdquo;': '\u201e',
'becaus;': '\u2235',
'Because;': '\u2235',
'because;': '\u2235',
'bemptyv;': '\u29b0',
'bepsi;': '\u03f6',
'bernou;': '\u212c',
'Bernoullis;': '\u212c',
'Beta;': '\u0392',
'beta;': '\u03b2',
'beth;': '\u2136',
'between;': '\u226c',
'Bfr;': '\U0001d505',
'bfr;': '\U0001d51f',
'bigcap;': '\u22c2',
'bigcirc;': '\u25ef',
'bigcup;': '\u22c3',
'bigodot;': '\u2a00',
'bigoplus;': '\u2a01',
'bigotimes;': '\u2a02',
'bigsqcup;': '\u2a06',
'bigstar;': '\u2605',
'bigtriangledown;': '\u25bd',
'bigtriangleup;': '\u25b3',
'biguplus;': '\u2a04',
'bigvee;': '\u22c1',
'bigwedge;': '\u22c0',
'bkarow;': '\u290d',
'blacklozenge;': '\u29eb',
'blacksquare;': '\u25aa',
'blacktriangle;': '\u25b4',
'blacktriangledown;': '\u25be',
'blacktriangleleft;': '\u25c2',
'blacktriangleright;': '\u25b8',
'blank;': '\u2423',
'blk12;': '\u2592',
'blk14;': '\u2591',
'blk34;': '\u2593',
'block;': '\u2588',
'bne;': '=\u20e5',
'bnequiv;': '\u2261\u20e5',
'bNot;': '\u2aed',
'bnot;': '\u2310',
'Bopf;': '\U0001d539',
'bopf;': '\U0001d553',
'bot;': '\u22a5',
'bottom;': '\u22a5',
'bowtie;': '\u22c8',
'boxbox;': '\u29c9',
'boxDL;': '\u2557',
'boxDl;': '\u2556',
'boxdL;': '\u2555',
'boxdl;': '\u2510',
'boxDR;': '\u2554',
'boxDr;': '\u2553',
'boxdR;': '\u2552',
'boxdr;': '\u250c',
'boxH;': '\u2550',
'boxh;': '\u2500',
'boxHD;': '\u2566',
'boxHd;': '\u2564',
'boxhD;': '\u2565',
'boxhd;': '\u252c',
'boxHU;': '\u2569',
'boxHu;': '\u2567',
'boxhU;': '\u2568',
'boxhu;': '\u2534',
'boxminus;': '\u229f',
'boxplus;': '\u229e',
'boxtimes;': '\u22a0',
'boxUL;': '\u255d',
'boxUl;': '\u255c',
'boxuL;': '\u255b',
'boxul;': '\u2518',
'boxUR;': '\u255a',
'boxUr;': '\u2559',
'boxuR;': '\u2558',
'boxur;': '\u2514',
'boxV;': '\u2551',
'boxv;': '\u2502',
'boxVH;': '\u256c',
'boxVh;': '\u256b',
'boxvH;': '\u256a',
'boxvh;': '\u253c',
'boxVL;': '\u2563',
'boxVl;': '\u2562',
'boxvL;': '\u2561',
'boxvl;': '\u2524',
'boxVR;': '\u2560',
'boxVr;': '\u255f',
'boxvR;': '\u255e',
'boxvr;': '\u251c',
'bprime;': '\u2035',
'Breve;': '\u02d8',
'breve;': '\u02d8',
'brvbar': '\xa6',
'brvbar;': '\xa6',
'Bscr;': '\u212c',
'bscr;': '\U0001d4b7',
'bsemi;': '\u204f',
'bsim;': '\u223d',
'bsime;': '\u22cd',
'bsol;': '\\',
'bsolb;': '\u29c5',
'bsolhsub;': '\u27c8',
'bull;': '\u2022',
'bullet;': '\u2022',
'bump;': '\u224e',
'bumpE;': '\u2aae',
'bumpe;': '\u224f',
'Bumpeq;': '\u224e',
'bumpeq;': '\u224f',
'Cacute;': '\u0106',
'cacute;': '\u0107',
'Cap;': '\u22d2',
'cap;': '\u2229',
'capand;': '\u2a44',
'capbrcup;': '\u2a49',
'capcap;': '\u2a4b',
'capcup;': '\u2a47',
'capdot;': '\u2a40',
'CapitalDifferentialD;': '\u2145',
'caps;': '\u2229\ufe00',
'caret;': '\u2041',
'caron;': '\u02c7',
'Cayleys;': '\u212d',
'ccaps;': '\u2a4d',
'Ccaron;': '\u010c',
'ccaron;': '\u010d',
'Ccedil': '\xc7',
'ccedil': '\xe7',
'Ccedil;': '\xc7',
'ccedil;': '\xe7',
'Ccirc;': '\u0108',
'ccirc;': '\u0109',
'Cconint;': '\u2230',
'ccups;': '\u2a4c',
'ccupssm;': '\u2a50',
'Cdot;': '\u010a',
'cdot;': '\u010b',
'cedil': '\xb8',
'cedil;': '\xb8',
'Cedilla;': '\xb8',
'cemptyv;': '\u29b2',
'cent': '\xa2',
'cent;': '\xa2',
'CenterDot;': '\xb7',
'centerdot;': '\xb7',
'Cfr;': '\u212d',
'cfr;': '\U0001d520',
'CHcy;': '\u0427',
'chcy;': '\u0447',
'check;': '\u2713',
'checkmark;': '\u2713',
'Chi;': '\u03a7',
'chi;': '\u03c7',
'cir;': '\u25cb',
'circ;': '\u02c6',
'circeq;': '\u2257',
'circlearrowleft;': '\u21ba',
'circlearrowright;': '\u21bb',
'circledast;': '\u229b',
'circledcirc;': '\u229a',
'circleddash;': '\u229d',
'CircleDot;': '\u2299',
'circledR;': '\xae',
'circledS;': '\u24c8',
'CircleMinus;': '\u2296',
'CirclePlus;': '\u2295',
'CircleTimes;': '\u2297',
'cirE;': '\u29c3',
'cire;': '\u2257',
'cirfnint;': '\u2a10',
'cirmid;': '\u2aef',
'cirscir;': '\u29c2',
'ClockwiseContourIntegral;': '\u2232',
'CloseCurlyDoubleQuote;': '\u201d',
'CloseCurlyQuote;': '\u2019',
'clubs;': '\u2663',
'clubsuit;': '\u2663',
'Colon;': '\u2237',
'colon;': ':',
'Colone;': '\u2a74',
'colone;': '\u2254',
'coloneq;': '\u2254',
'comma;': ',',
'commat;': '@',
'comp;': '\u2201',
'compfn;': '\u2218',
'complement;': '\u2201',
'complexes;': '\u2102',
'cong;': '\u2245',
'congdot;': '\u2a6d',
'Congruent;': '\u2261',
'Conint;': '\u222f',
'conint;': '\u222e',
'ContourIntegral;': '\u222e',
'Copf;': '\u2102',
'copf;': '\U0001d554',
'coprod;': '\u2210',
'Coproduct;': '\u2210',
'COPY': '\xa9',
'copy': '\xa9',
'COPY;': '\xa9',
'copy;': '\xa9',
'copysr;': '\u2117',
'CounterClockwiseContourIntegral;': '\u2233',
'crarr;': '\u21b5',
'Cross;': '\u2a2f',
'cross;': '\u2717',
'Cscr;': '\U0001d49e',
'cscr;': '\U0001d4b8',
'csub;': '\u2acf',
'csube;': '\u2ad1',
'csup;': '\u2ad0',
'csupe;': '\u2ad2',
'ctdot;': '\u22ef',
'cudarrl;': '\u2938',
'cudarrr;': '\u2935',
'cuepr;': '\u22de',
'cuesc;': '\u22df',
'cularr;': '\u21b6',
'cularrp;': '\u293d',
'Cup;': '\u22d3',
'cup;': '\u222a',
'cupbrcap;': '\u2a48',
'CupCap;': '\u224d',
'cupcap;': '\u2a46',
'cupcup;': '\u2a4a',
'cupdot;': '\u228d',
'cupor;': '\u2a45',
'cups;': '\u222a\ufe00',
'curarr;': '\u21b7',
'curarrm;': '\u293c',
'curlyeqprec;': '\u22de',
'curlyeqsucc;': '\u22df',
'curlyvee;': '\u22ce',
'curlywedge;': '\u22cf',
'curren': '\xa4',
'curren;': '\xa4',
'curvearrowleft;': '\u21b6',
'curvearrowright;': '\u21b7',
'cuvee;': '\u22ce',
'cuwed;': '\u22cf',
'cwconint;': '\u2232',
'cwint;': '\u2231',
'cylcty;': '\u232d',
'Dagger;': '\u2021',
'dagger;': '\u2020',
'daleth;': '\u2138',
'Darr;': '\u21a1',
'dArr;': '\u21d3',
'darr;': '\u2193',
'dash;': '\u2010',
'Dashv;': '\u2ae4',
'dashv;': '\u22a3',
'dbkarow;': '\u290f',
'dblac;': '\u02dd',
'Dcaron;': '\u010e',
'dcaron;': '\u010f',
'Dcy;': '\u0414',
'dcy;': '\u0434',
'DD;': '\u2145',
'dd;': '\u2146',
'ddagger;': '\u2021',
'ddarr;': '\u21ca',
'DDotrahd;': '\u2911',
'ddotseq;': '\u2a77',
'deg': '\xb0',
'deg;': '\xb0',
'Del;': '\u2207',
'Delta;': '\u0394',
'delta;': '\u03b4',
'demptyv;': '\u29b1',
'dfisht;': '\u297f',
'Dfr;': '\U0001d507',
'dfr;': '\U0001d521',
'dHar;': '\u2965',
'dharl;': '\u21c3',
'dharr;': '\u21c2',
'DiacriticalAcute;': '\xb4',
'DiacriticalDot;': '\u02d9',
'DiacriticalDoubleAcute;': '\u02dd',
'DiacriticalGrave;': '`',
'DiacriticalTilde;': '\u02dc',
'diam;': '\u22c4',
'Diamond;': '\u22c4',
'diamond;': '\u22c4',
'diamondsuit;': '\u2666',
'diams;': '\u2666',
'die;': '\xa8',
'DifferentialD;': '\u2146',
'digamma;': '\u03dd',
'disin;': '\u22f2',
'div;': '\xf7',
'divide': '\xf7',
'divide;': '\xf7',
'divideontimes;': '\u22c7',
'divonx;': '\u22c7',
'DJcy;': '\u0402',
'djcy;': '\u0452',
'dlcorn;': '\u231e',
'dlcrop;': '\u230d',
'dollar;': '$',
'Dopf;': '\U0001d53b',
'dopf;': '\U0001d555',
'Dot;': '\xa8',
'dot;': '\u02d9',
'DotDot;': '\u20dc',
'doteq;': '\u2250',
'doteqdot;': '\u2251',
'DotEqual;': '\u2250',
'dotminus;': '\u2238',
'dotplus;': '\u2214',
'dotsquare;': '\u22a1',
'doublebarwedge;': '\u2306',
'DoubleContourIntegral;': '\u222f',
'DoubleDot;': '\xa8',
'DoubleDownArrow;': '\u21d3',
'DoubleLeftArrow;': '\u21d0',
'DoubleLeftRightArrow;': '\u21d4',
'DoubleLeftTee;': '\u2ae4',
'DoubleLongLeftArrow;': '\u27f8',
'DoubleLongLeftRightArrow;': '\u27fa',
'DoubleLongRightArrow;': '\u27f9',
'DoubleRightArrow;': '\u21d2',
'DoubleRightTee;': '\u22a8',
'DoubleUpArrow;': '\u21d1',
'DoubleUpDownArrow;': '\u21d5',
'DoubleVerticalBar;': '\u2225',
'DownArrow;': '\u2193',
'Downarrow;': '\u21d3',
'downarrow;': '\u2193',
'DownArrowBar;': '\u2913',
'DownArrowUpArrow;': '\u21f5',
'DownBreve;': '\u0311',
'downdownarrows;': '\u21ca',
'downharpoonleft;': '\u21c3',
'downharpoonright;': '\u21c2',
'DownLeftRightVector;': '\u2950',
'DownLeftTeeVector;': '\u295e',
'DownLeftVector;': '\u21bd',
'DownLeftVectorBar;': '\u2956',
'DownRightTeeVector;': '\u295f',
'DownRightVector;': '\u21c1',
'DownRightVectorBar;': '\u2957',
'DownTee;': '\u22a4',
'DownTeeArrow;': '\u21a7',
'drbkarow;': '\u2910',
'drcorn;': '\u231f',
'drcrop;': '\u230c',
'Dscr;': '\U0001d49f',
'dscr;': '\U0001d4b9',
'DScy;': '\u0405',
'dscy;': '\u0455',
'dsol;': '\u29f6',
'Dstrok;': '\u0110',
'dstrok;': '\u0111',
'dtdot;': '\u22f1',
'dtri;': '\u25bf',
'dtrif;': '\u25be',
'duarr;': '\u21f5',
'duhar;': '\u296f',
'dwangle;': '\u29a6',
'DZcy;': '\u040f',
'dzcy;': '\u045f',
'dzigrarr;': '\u27ff',
'Eacute': '\xc9',
'eacute': '\xe9',
'Eacute;': '\xc9',
'eacute;': '\xe9',
'easter;': '\u2a6e',
'Ecaron;': '\u011a',
'ecaron;': '\u011b',
'ecir;': '\u2256',
'Ecirc': '\xca',
'ecirc': '\xea',
'Ecirc;': '\xca',
'ecirc;': '\xea',
'ecolon;': '\u2255',
'Ecy;': '\u042d',
'ecy;': '\u044d',
'eDDot;': '\u2a77',
'Edot;': '\u0116',
'eDot;': '\u2251',
'edot;': '\u0117',
'ee;': '\u2147',
'efDot;': '\u2252',
'Efr;': '\U0001d508',
'efr;': '\U0001d522',
'eg;': '\u2a9a',
'Egrave': '\xc8',
'egrave': '\xe8',
'Egrave;': '\xc8',
'egrave;': '\xe8',
'egs;': '\u2a96',
'egsdot;': '\u2a98',
'el;': '\u2a99',
'Element;': '\u2208',
'elinters;': '\u23e7',
'ell;': '\u2113',
'els;': '\u2a95',
'elsdot;': '\u2a97',
'Emacr;': '\u0112',
'emacr;': '\u0113',
'empty;': '\u2205',
'emptyset;': '\u2205',
'EmptySmallSquare;': '\u25fb',
'emptyv;': '\u2205',
'EmptyVerySmallSquare;': '\u25ab',
'emsp13;': '\u2004',
'emsp14;': '\u2005',
'emsp;': '\u2003',
'ENG;': '\u014a',
'eng;': '\u014b',
'ensp;': '\u2002',
'Eogon;': '\u0118',
'eogon;': '\u0119',
'Eopf;': '\U0001d53c',
'eopf;': '\U0001d556',
'epar;': '\u22d5',
'eparsl;': '\u29e3',
'eplus;': '\u2a71',
'epsi;': '\u03b5',
'Epsilon;': '\u0395',
'epsilon;': '\u03b5',
'epsiv;': '\u03f5',
'eqcirc;': '\u2256',
'eqcolon;': '\u2255',
'eqsim;': '\u2242',
'eqslantgtr;': '\u2a96',
'eqslantless;': '\u2a95',
'Equal;': '\u2a75',
'equals;': '=',
'EqualTilde;': '\u2242',
'equest;': '\u225f',
'Equilibrium;': '\u21cc',
'equiv;': '\u2261',
'equivDD;': '\u2a78',
'eqvparsl;': '\u29e5',
'erarr;': '\u2971',
'erDot;': '\u2253',
'Escr;': '\u2130',
'escr;': '\u212f',
'esdot;': '\u2250',
'Esim;': '\u2a73',
'esim;': '\u2242',
'Eta;': '\u0397',
'eta;': '\u03b7',
'ETH': '\xd0',
'eth': '\xf0',
'ETH;': '\xd0',
'eth;': '\xf0',
'Euml': '\xcb',
'euml': '\xeb',
'Euml;': '\xcb',
'euml;': '\xeb',
'euro;': '\u20ac',
'excl;': '!',
'exist;': '\u2203',
'Exists;': '\u2203',
'expectation;': '\u2130',
'ExponentialE;': '\u2147',
'exponentiale;': '\u2147',
'fallingdotseq;': '\u2252',
'Fcy;': '\u0424',
'fcy;': '\u0444',
'female;': '\u2640',
'ffilig;': '\ufb03',
'fflig;': '\ufb00',
'ffllig;': '\ufb04',
'Ffr;': '\U0001d509',
'ffr;': '\U0001d523',
'filig;': '\ufb01',
'FilledSmallSquare;': '\u25fc',
'FilledVerySmallSquare;': '\u25aa',
'fjlig;': 'fj',
'flat;': '\u266d',
'fllig;': '\ufb02',
'fltns;': '\u25b1',
'fnof;': '\u0192',
'Fopf;': '\U0001d53d',
'fopf;': '\U0001d557',
'ForAll;': '\u2200',
'forall;': '\u2200',
'fork;': '\u22d4',
'forkv;': '\u2ad9',
'Fouriertrf;': '\u2131',
'fpartint;': '\u2a0d',
'frac12': '\xbd',
'frac12;': '\xbd',
'frac13;': '\u2153',
'frac14': '\xbc',
'frac14;': '\xbc',
'frac15;': '\u2155',
'frac16;': '\u2159',
'frac18;': '\u215b',
'frac23;': '\u2154',
'frac25;': '\u2156',
'frac34': '\xbe',
'frac34;': '\xbe',
'frac35;': '\u2157',
'frac38;': '\u215c',
'frac45;': '\u2158',
'frac56;': '\u215a',
'frac58;': '\u215d',
'frac78;': '\u215e',
'frasl;': '\u2044',
'frown;': '\u2322',
'Fscr;': '\u2131',
'fscr;': '\U0001d4bb',
'gacute;': '\u01f5',
'Gamma;': '\u0393',
'gamma;': '\u03b3',
'Gammad;': '\u03dc',
'gammad;': '\u03dd',
'gap;': '\u2a86',
'Gbreve;': '\u011e',
'gbreve;': '\u011f',
'Gcedil;': '\u0122',
'Gcirc;': '\u011c',
'gcirc;': '\u011d',
'Gcy;': '\u0413',
'gcy;': '\u0433',
'Gdot;': '\u0120',
'gdot;': '\u0121',
'gE;': '\u2267',
'ge;': '\u2265',
'gEl;': '\u2a8c',
'gel;': '\u22db',
'geq;': '\u2265',
'geqq;': '\u2267',
'geqslant;': '\u2a7e',
'ges;': '\u2a7e',
'gescc;': '\u2aa9',
'gesdot;': '\u2a80',
'gesdoto;': '\u2a82',
'gesdotol;': '\u2a84',
'gesl;': '\u22db\ufe00',
'gesles;': '\u2a94',
'Gfr;': '\U0001d50a',
'gfr;': '\U0001d524',
'Gg;': '\u22d9',
'gg;': '\u226b',
'ggg;': '\u22d9',
'gimel;': '\u2137',
'GJcy;': '\u0403',
'gjcy;': '\u0453',
'gl;': '\u2277',
'gla;': '\u2aa5',
'glE;': '\u2a92',
'glj;': '\u2aa4',
'gnap;': '\u2a8a',
'gnapprox;': '\u2a8a',
'gnE;': '\u2269',
'gne;': '\u2a88',
'gneq;': '\u2a88',
'gneqq;': '\u2269',
'gnsim;': '\u22e7',
'Gopf;': '\U0001d53e',
'gopf;': '\U0001d558',
'grave;': '`',
'GreaterEqual;': '\u2265',
'GreaterEqualLess;': '\u22db',
'GreaterFullEqual;': '\u2267',
'GreaterGreater;': '\u2aa2',
'GreaterLess;': '\u2277',
'GreaterSlantEqual;': '\u2a7e',
'GreaterTilde;': '\u2273',
'Gscr;': '\U0001d4a2',
'gscr;': '\u210a',
'gsim;': '\u2273',
'gsime;': '\u2a8e',
'gsiml;': '\u2a90',
'GT': '>',
'gt': '>',
'GT;': '>',
'Gt;': '\u226b',
'gt;': '>',
'gtcc;': '\u2aa7',
'gtcir;': '\u2a7a',
'gtdot;': '\u22d7',
'gtlPar;': '\u2995',
'gtquest;': '\u2a7c',
'gtrapprox;': '\u2a86',
'gtrarr;': '\u2978',
'gtrdot;': '\u22d7',
'gtreqless;': '\u22db',
'gtreqqless;': '\u2a8c',
'gtrless;': '\u2277',
'gtrsim;': '\u2273',
'gvertneqq;': '\u2269\ufe00',
'gvnE;': '\u2269\ufe00',
'Hacek;': '\u02c7',
'hairsp;': '\u200a',
'half;': '\xbd',
'hamilt;': '\u210b',
'HARDcy;': '\u042a',
'hardcy;': '\u044a',
'hArr;': '\u21d4',
'harr;': '\u2194',
'harrcir;': '\u2948',
'harrw;': '\u21ad',
'Hat;': '^',
'hbar;': '\u210f',
'Hcirc;': '\u0124',
'hcirc;': '\u0125',
'hearts;': '\u2665',
'heartsuit;': '\u2665',
'hellip;': '\u2026',
'hercon;': '\u22b9',
'Hfr;': '\u210c',
'hfr;': '\U0001d525',
'HilbertSpace;': '\u210b',
'hksearow;': '\u2925',
'hkswarow;': '\u2926',
'hoarr;': '\u21ff',
'homtht;': '\u223b',
'hookleftarrow;': '\u21a9',
'hookrightarrow;': '\u21aa',
'Hopf;': '\u210d',
'hopf;': '\U0001d559',
'horbar;': '\u2015',
'HorizontalLine;': '\u2500',
'Hscr;': '\u210b',
'hscr;': '\U0001d4bd',
'hslash;': '\u210f',
'Hstrok;': '\u0126',
'hstrok;': '\u0127',
'HumpDownHump;': '\u224e',
'HumpEqual;': '\u224f',
'hybull;': '\u2043',
'hyphen;': '\u2010',
'Iacute': '\xcd',
'iacute': '\xed',
'Iacute;': '\xcd',
'iacute;': '\xed',
'ic;': '\u2063',
'Icirc': '\xce',
'icirc': '\xee',
'Icirc;': '\xce',
'icirc;': '\xee',
'Icy;': '\u0418',
'icy;': '\u0438',
'Idot;': '\u0130',
'IEcy;': '\u0415',
'iecy;': '\u0435',
'iexcl': '\xa1',
'iexcl;': '\xa1',
'iff;': '\u21d4',
'Ifr;': '\u2111',
'ifr;': '\U0001d526',
'Igrave': '\xcc',
'igrave': '\xec',
'Igrave;': '\xcc',
'igrave;': '\xec',
'ii;': '\u2148',
'iiiint;': '\u2a0c',
'iiint;': '\u222d',
'iinfin;': '\u29dc',
'iiota;': '\u2129',
'IJlig;': '\u0132',
'ijlig;': '\u0133',
'Im;': '\u2111',
'Imacr;': '\u012a',
'imacr;': '\u012b',
'image;': '\u2111',
'ImaginaryI;': '\u2148',
'imagline;': '\u2110',
'imagpart;': '\u2111',
'imath;': '\u0131',
'imof;': '\u22b7',
'imped;': '\u01b5',
'Implies;': '\u21d2',
'in;': '\u2208',
'incare;': '\u2105',
'infin;': '\u221e',
'infintie;': '\u29dd',
'inodot;': '\u0131',
'Int;': '\u222c',
'int;': '\u222b',
'intcal;': '\u22ba',
'integers;': '\u2124',
'Integral;': '\u222b',
'intercal;': '\u22ba',
'Intersection;': '\u22c2',
'intlarhk;': '\u2a17',
'intprod;': '\u2a3c',
'InvisibleComma;': '\u2063',
'InvisibleTimes;': '\u2062',
'IOcy;': '\u0401',
'iocy;': '\u0451',
'Iogon;': '\u012e',
'iogon;': '\u012f',
'Iopf;': '\U0001d540',
'iopf;': '\U0001d55a',
'Iota;': '\u0399',
'iota;': '\u03b9',
'iprod;': '\u2a3c',
'iquest': '\xbf',
'iquest;': '\xbf',
'Iscr;': '\u2110',
'iscr;': '\U0001d4be',
'isin;': '\u2208',
'isindot;': '\u22f5',
'isinE;': '\u22f9',
'isins;': '\u22f4',
'isinsv;': '\u22f3',
'isinv;': '\u2208',
'it;': '\u2062',
'Itilde;': '\u0128',
'itilde;': '\u0129',
'Iukcy;': '\u0406',
'iukcy;': '\u0456',
'Iuml': '\xcf',
'iuml': '\xef',
'Iuml;': '\xcf',
'iuml;': '\xef',
'Jcirc;': '\u0134',
'jcirc;': '\u0135',
'Jcy;': '\u0419',
'jcy;': '\u0439',
'Jfr;': '\U0001d50d',
'jfr;': '\U0001d527',
'jmath;': '\u0237',
'Jopf;': '\U0001d541',
'jopf;': '\U0001d55b',
'Jscr;': '\U0001d4a5',
'jscr;': '\U0001d4bf',
'Jsercy;': '\u0408',
'jsercy;': '\u0458',
'Jukcy;': '\u0404',
'jukcy;': '\u0454',
'Kappa;': '\u039a',
'kappa;': '\u03ba',
'kappav;': '\u03f0',
'Kcedil;': '\u0136',
'kcedil;': '\u0137',
'Kcy;': '\u041a',
'kcy;': '\u043a',
'Kfr;': '\U0001d50e',
'kfr;': '\U0001d528',
'kgreen;': '\u0138',
'KHcy;': '\u0425',
'khcy;': '\u0445',
'KJcy;': '\u040c',
'kjcy;': '\u045c',
'Kopf;': '\U0001d542',
'kopf;': '\U0001d55c',
'Kscr;': '\U0001d4a6',
'kscr;': '\U0001d4c0',
'lAarr;': '\u21da',
'Lacute;': '\u0139',
'lacute;': '\u013a',
'laemptyv;': '\u29b4',
'lagran;': '\u2112',
'Lambda;': '\u039b',
'lambda;': '\u03bb',
'Lang;': '\u27ea',
'lang;': '\u27e8',
'langd;': '\u2991',
'langle;': '\u27e8',
'lap;': '\u2a85',
'Laplacetrf;': '\u2112',
'laquo': '\xab',
'laquo;': '\xab',
'Larr;': '\u219e',
'lArr;': '\u21d0',
'larr;': '\u2190',
'larrb;': '\u21e4',
'larrbfs;': '\u291f',
'larrfs;': '\u291d',
'larrhk;': '\u21a9',
'larrlp;': '\u21ab',
'larrpl;': '\u2939',
'larrsim;': '\u2973',
'larrtl;': '\u21a2',
'lat;': '\u2aab',
'lAtail;': '\u291b',
'latail;': '\u2919',
'late;': '\u2aad',
'lates;': '\u2aad\ufe00',
'lBarr;': '\u290e',
'lbarr;': '\u290c',
'lbbrk;': '\u2772',
'lbrace;': '{',
'lbrack;': '[',
'lbrke;': '\u298b',
'lbrksld;': '\u298f',
'lbrkslu;': '\u298d',
'Lcaron;': '\u013d',
'lcaron;': '\u013e',
'Lcedil;': '\u013b',
'lcedil;': '\u013c',
'lceil;': '\u2308',
'lcub;': '{',
'Lcy;': '\u041b',
'lcy;': '\u043b',
'ldca;': '\u2936',
'ldquo;': '\u201c',
'ldquor;': '\u201e',
'ldrdhar;': '\u2967',
'ldrushar;': '\u294b',
'ldsh;': '\u21b2',
'lE;': '\u2266',
'le;': '\u2264',
'LeftAngleBracket;': '\u27e8',
'LeftArrow;': '\u2190',
'Leftarrow;': '\u21d0',
'leftarrow;': '\u2190',
'LeftArrowBar;': '\u21e4',
'LeftArrowRightArrow;': '\u21c6',
'leftarrowtail;': '\u21a2',
'LeftCeiling;': '\u2308',
'LeftDoubleBracket;': '\u27e6',
'LeftDownTeeVector;': '\u2961',
'LeftDownVector;': '\u21c3',
'LeftDownVectorBar;': '\u2959',
'LeftFloor;': '\u230a',
'leftharpoondown;': '\u21bd',
'leftharpoonup;': '\u21bc',
'leftleftarrows;': '\u21c7',
'LeftRightArrow;': '\u2194',
'Leftrightarrow;': '\u21d4',
'leftrightarrow;': '\u2194',
'leftrightarrows;': '\u21c6',
'leftrightharpoons;': '\u21cb',
'leftrightsquigarrow;': '\u21ad',
'LeftRightVector;': '\u294e',
'LeftTee;': '\u22a3',
'LeftTeeArrow;': '\u21a4',
'LeftTeeVector;': '\u295a',
'leftthreetimes;': '\u22cb',
'LeftTriangle;': '\u22b2',
'LeftTriangleBar;': '\u29cf',
'LeftTriangleEqual;': '\u22b4',
'LeftUpDownVector;': '\u2951',
'LeftUpTeeVector;': '\u2960',
'LeftUpVector;': '\u21bf',
'LeftUpVectorBar;': '\u2958',
'LeftVector;': '\u21bc',
'LeftVectorBar;': '\u2952',
'lEg;': '\u2a8b',
'leg;': '\u22da',
'leq;': '\u2264',
'leqq;': '\u2266',
'leqslant;': '\u2a7d',
'les;': '\u2a7d',
'lescc;': '\u2aa8',
'lesdot;': '\u2a7f',
'lesdoto;': '\u2a81',
'lesdotor;': '\u2a83',
'lesg;': '\u22da\ufe00',
'lesges;': '\u2a93',
'lessapprox;': '\u2a85',
'lessdot;': '\u22d6',
'lesseqgtr;': '\u22da',
'lesseqqgtr;': '\u2a8b',
'LessEqualGreater;': '\u22da',
'LessFullEqual;': '\u2266',
'LessGreater;': '\u2276',
'lessgtr;': '\u2276',
'LessLess;': '\u2aa1',
'lesssim;': '\u2272',
'LessSlantEqual;': '\u2a7d',
'LessTilde;': '\u2272',
'lfisht;': '\u297c',
'lfloor;': '\u230a',
'Lfr;': '\U0001d50f',
'lfr;': '\U0001d529',
'lg;': '\u2276',
'lgE;': '\u2a91',
'lHar;': '\u2962',
'lhard;': '\u21bd',
'lharu;': '\u21bc',
'lharul;': '\u296a',
'lhblk;': '\u2584',
'LJcy;': '\u0409',
'ljcy;': '\u0459',
'Ll;': '\u22d8',
'll;': '\u226a',
'llarr;': '\u21c7',
'llcorner;': '\u231e',
'Lleftarrow;': '\u21da',
'llhard;': '\u296b',
'lltri;': '\u25fa',
'Lmidot;': '\u013f',
'lmidot;': '\u0140',
'lmoust;': '\u23b0',
'lmoustache;': '\u23b0',
'lnap;': '\u2a89',
'lnapprox;': '\u2a89',
'lnE;': '\u2268',
'lne;': '\u2a87',
'lneq;': '\u2a87',
'lneqq;': '\u2268',
'lnsim;': '\u22e6',
'loang;': '\u27ec',
'loarr;': '\u21fd',
'lobrk;': '\u27e6',
'LongLeftArrow;': '\u27f5',
'Longleftarrow;': '\u27f8',
'longleftarrow;': '\u27f5',
'LongLeftRightArrow;': '\u27f7',
'Longleftrightarrow;': '\u27fa',
'longleftrightarrow;': '\u27f7',
'longmapsto;': '\u27fc',
'LongRightArrow;': '\u27f6',
'Longrightarrow;': '\u27f9',
'longrightarrow;': '\u27f6',
'looparrowleft;': '\u21ab',
'looparrowright;': '\u21ac',
'lopar;': '\u2985',
'Lopf;': '\U0001d543',
'lopf;': '\U0001d55d',
'loplus;': '\u2a2d',
'lotimes;': '\u2a34',
'lowast;': '\u2217',
'lowbar;': '_',
'LowerLeftArrow;': '\u2199',
'LowerRightArrow;': '\u2198',
'loz;': '\u25ca',
'lozenge;': '\u25ca',
'lozf;': '\u29eb',
'lpar;': '(',
'lparlt;': '\u2993',
'lrarr;': '\u21c6',
'lrcorner;': '\u231f',
'lrhar;': '\u21cb',
'lrhard;': '\u296d',
'lrm;': '\u200e',
'lrtri;': '\u22bf',
'lsaquo;': '\u2039',
'Lscr;': '\u2112',
'lscr;': '\U0001d4c1',
'Lsh;': '\u21b0',
'lsh;': '\u21b0',
'lsim;': '\u2272',
'lsime;': '\u2a8d',
'lsimg;': '\u2a8f',
'lsqb;': '[',
'lsquo;': '\u2018',
'lsquor;': '\u201a',
'Lstrok;': '\u0141',
'lstrok;': '\u0142',
'LT': '<',
'lt': '<',
'LT;': '<',
'Lt;': '\u226a',
'lt;': '<',
'ltcc;': '\u2aa6',
'ltcir;': '\u2a79',
'ltdot;': '\u22d6',
'lthree;': '\u22cb',
'ltimes;': '\u22c9',
'ltlarr;': '\u2976',
'ltquest;': '\u2a7b',
'ltri;': '\u25c3',
'ltrie;': '\u22b4',
'ltrif;': '\u25c2',
'ltrPar;': '\u2996',
'lurdshar;': '\u294a',
'luruhar;': '\u2966',
'lvertneqq;': '\u2268\ufe00',
'lvnE;': '\u2268\ufe00',
'macr': '\xaf',
'macr;': '\xaf',
'male;': '\u2642',
'malt;': '\u2720',
'maltese;': '\u2720',
'Map;': '\u2905',
'map;': '\u21a6',
'mapsto;': '\u21a6',
'mapstodown;': '\u21a7',
'mapstoleft;': '\u21a4',
'mapstoup;': '\u21a5',
'marker;': '\u25ae',
'mcomma;': '\u2a29',
'Mcy;': '\u041c',
'mcy;': '\u043c',
'mdash;': '\u2014',
'mDDot;': '\u223a',
'measuredangle;': '\u2221',
'MediumSpace;': '\u205f',
'Mellintrf;': '\u2133',
'Mfr;': '\U0001d510',
'mfr;': '\U0001d52a',
'mho;': '\u2127',
'micro': '\xb5',
'micro;': '\xb5',
'mid;': '\u2223',
'midast;': '*',
'midcir;': '\u2af0',
'middot': '\xb7',
'middot;': '\xb7',
'minus;': '\u2212',
'minusb;': '\u229f',
'minusd;': '\u2238',
'minusdu;': '\u2a2a',
'MinusPlus;': '\u2213',
'mlcp;': '\u2adb',
'mldr;': '\u2026',
'mnplus;': '\u2213',
'models;': '\u22a7',
'Mopf;': '\U0001d544',
'mopf;': '\U0001d55e',
'mp;': '\u2213',
'Mscr;': '\u2133',
'mscr;': '\U0001d4c2',
'mstpos;': '\u223e',
'Mu;': '\u039c',
'mu;': '\u03bc',
'multimap;': '\u22b8',
'mumap;': '\u22b8',
'nabla;': '\u2207',
'Nacute;': '\u0143',
'nacute;': '\u0144',
'nang;': '\u2220\u20d2',
'nap;': '\u2249',
'napE;': '\u2a70\u0338',
'napid;': '\u224b\u0338',
'napos;': '\u0149',
'napprox;': '\u2249',
'natur;': '\u266e',
'natural;': '\u266e',
'naturals;': '\u2115',
'nbsp': '\xa0',
'nbsp;': '\xa0',
'nbump;': '\u224e\u0338',
'nbumpe;': '\u224f\u0338',
'ncap;': '\u2a43',
'Ncaron;': '\u0147',
'ncaron;': '\u0148',
'Ncedil;': '\u0145',
'ncedil;': '\u0146',
'ncong;': '\u2247',
'ncongdot;': '\u2a6d\u0338',
'ncup;': '\u2a42',
'Ncy;': '\u041d',
'ncy;': '\u043d',
'ndash;': '\u2013',
'ne;': '\u2260',
'nearhk;': '\u2924',
'neArr;': '\u21d7',
'nearr;': '\u2197',
'nearrow;': '\u2197',
'nedot;': '\u2250\u0338',
'NegativeMediumSpace;': '\u200b',
'NegativeThickSpace;': '\u200b',
'NegativeThinSpace;': '\u200b',
'NegativeVeryThinSpace;': '\u200b',
'nequiv;': '\u2262',
'nesear;': '\u2928',
'nesim;': '\u2242\u0338',
'NestedGreaterGreater;': '\u226b',
'NestedLessLess;': '\u226a',
'NewLine;': '\n',
'nexist;': '\u2204',
'nexists;': '\u2204',
'Nfr;': '\U0001d511',
'nfr;': '\U0001d52b',
'ngE;': '\u2267\u0338',
'nge;': '\u2271',
'ngeq;': '\u2271',
'ngeqq;': '\u2267\u0338',
'ngeqslant;': '\u2a7e\u0338',
'nges;': '\u2a7e\u0338',
'nGg;': '\u22d9\u0338',
'ngsim;': '\u2275',
'nGt;': '\u226b\u20d2',
'ngt;': '\u226f',
'ngtr;': '\u226f',
'nGtv;': '\u226b\u0338',
'nhArr;': '\u21ce',
'nharr;': '\u21ae',
'nhpar;': '\u2af2',
'ni;': '\u220b',
'nis;': '\u22fc',
'nisd;': '\u22fa',
'niv;': '\u220b',
'NJcy;': '\u040a',
'njcy;': '\u045a',
'nlArr;': '\u21cd',
'nlarr;': '\u219a',
'nldr;': '\u2025',
'nlE;': '\u2266\u0338',
'nle;': '\u2270',
'nLeftarrow;': '\u21cd',
'nleftarrow;': '\u219a',
'nLeftrightarrow;': '\u21ce',
'nleftrightarrow;': '\u21ae',
'nleq;': '\u2270',
'nleqq;': '\u2266\u0338',
'nleqslant;': '\u2a7d\u0338',
'nles;': '\u2a7d\u0338',
'nless;': '\u226e',
'nLl;': '\u22d8\u0338',
'nlsim;': '\u2274',
'nLt;': '\u226a\u20d2',
'nlt;': '\u226e',
'nltri;': '\u22ea',
'nltrie;': '\u22ec',
'nLtv;': '\u226a\u0338',
'nmid;': '\u2224',
'NoBreak;': '\u2060',
'NonBreakingSpace;': '\xa0',
'Nopf;': '\u2115',
'nopf;': '\U0001d55f',
'not': '\xac',
'Not;': '\u2aec',
'not;': '\xac',
'NotCongruent;': '\u2262',
'NotCupCap;': '\u226d',
'NotDoubleVerticalBar;': '\u2226',
'NotElement;': '\u2209',
'NotEqual;': '\u2260',
'NotEqualTilde;': '\u2242\u0338',
'NotExists;': '\u2204',
'NotGreater;': '\u226f',
'NotGreaterEqual;': '\u2271',
'NotGreaterFullEqual;': '\u2267\u0338',
'NotGreaterGreater;': '\u226b\u0338',
'NotGreaterLess;': '\u2279',
'NotGreaterSlantEqual;': '\u2a7e\u0338',
'NotGreaterTilde;': '\u2275',
'NotHumpDownHump;': '\u224e\u0338',
'NotHumpEqual;': '\u224f\u0338',
'notin;': '\u2209',
'notindot;': '\u22f5\u0338',
'notinE;': '\u22f9\u0338',
'notinva;': '\u2209',
'notinvb;': '\u22f7',
'notinvc;': '\u22f6',
'NotLeftTriangle;': '\u22ea',
'NotLeftTriangleBar;': '\u29cf\u0338',
'NotLeftTriangleEqual;': '\u22ec',
'NotLess;': '\u226e',
'NotLessEqual;': '\u2270',
'NotLessGreater;': '\u2278',
'NotLessLess;': '\u226a\u0338',
'NotLessSlantEqual;': '\u2a7d\u0338',
'NotLessTilde;': '\u2274',
'NotNestedGreaterGreater;': '\u2aa2\u0338',
'NotNestedLessLess;': '\u2aa1\u0338',
'notni;': '\u220c',
'notniva;': '\u220c',
'notnivb;': '\u22fe',
'notnivc;': '\u22fd',
'NotPrecedes;': '\u2280',
'NotPrecedesEqual;': '\u2aaf\u0338',
'NotPrecedesSlantEqual;': '\u22e0',
'NotReverseElement;': '\u220c',
'NotRightTriangle;': '\u22eb',
'NotRightTriangleBar;': '\u29d0\u0338',
'NotRightTriangleEqual;': '\u22ed',
'NotSquareSubset;': '\u228f\u0338',
'NotSquareSubsetEqual;': '\u22e2',
'NotSquareSuperset;': '\u2290\u0338',
'NotSquareSupersetEqual;': '\u22e3',
'NotSubset;': '\u2282\u20d2',
'NotSubsetEqual;': '\u2288',
'NotSucceeds;': '\u2281',
'NotSucceedsEqual;': '\u2ab0\u0338',
'NotSucceedsSlantEqual;': '\u22e1',
'NotSucceedsTilde;': '\u227f\u0338',
'NotSuperset;': '\u2283\u20d2',
'NotSupersetEqual;': '\u2289',
'NotTilde;': '\u2241',
'NotTildeEqual;': '\u2244',
'NotTildeFullEqual;': '\u2247',
'NotTildeTilde;': '\u2249',
'NotVerticalBar;': '\u2224',
'npar;': '\u2226',
'nparallel;': '\u2226',
'nparsl;': '\u2afd\u20e5',
'npart;': '\u2202\u0338',
'npolint;': '\u2a14',
'npr;': '\u2280',
'nprcue;': '\u22e0',
'npre;': '\u2aaf\u0338',
'nprec;': '\u2280',
'npreceq;': '\u2aaf\u0338',
'nrArr;': '\u21cf',
'nrarr;': '\u219b',
'nrarrc;': '\u2933\u0338',
'nrarrw;': '\u219d\u0338',
'nRightarrow;': '\u21cf',
'nrightarrow;': '\u219b',
'nrtri;': '\u22eb',
'nrtrie;': '\u22ed',
'nsc;': '\u2281',
'nsccue;': '\u22e1',
'nsce;': '\u2ab0\u0338',
'Nscr;': '\U0001d4a9',
'nscr;': '\U0001d4c3',
'nshortmid;': '\u2224',
'nshortparallel;': '\u2226',
'nsim;': '\u2241',
'nsime;': '\u2244',
'nsimeq;': '\u2244',
'nsmid;': '\u2224',
'nspar;': '\u2226',
'nsqsube;': '\u22e2',
'nsqsupe;': '\u22e3',
'nsub;': '\u2284',
'nsubE;': '\u2ac5\u0338',
'nsube;': '\u2288',
'nsubset;': '\u2282\u20d2',
'nsubseteq;': '\u2288',
'nsubseteqq;': '\u2ac5\u0338',
'nsucc;': '\u2281',
'nsucceq;': '\u2ab0\u0338',
'nsup;': '\u2285',
'nsupE;': '\u2ac6\u0338',
'nsupe;': '\u2289',
'nsupset;': '\u2283\u20d2',
'nsupseteq;': '\u2289',
'nsupseteqq;': '\u2ac6\u0338',
'ntgl;': '\u2279',
'Ntilde': '\xd1',
'ntilde': '\xf1',
'Ntilde;': '\xd1',
'ntilde;': '\xf1',
'ntlg;': '\u2278',
'ntriangleleft;': '\u22ea',
'ntrianglelefteq;': '\u22ec',
'ntriangleright;': '\u22eb',
'ntrianglerighteq;': '\u22ed',
'Nu;': '\u039d',
'nu;': '\u03bd',
'num;': '#',
'numero;': '\u2116',
'numsp;': '\u2007',
'nvap;': '\u224d\u20d2',
'nVDash;': '\u22af',
'nVdash;': '\u22ae',
'nvDash;': '\u22ad',
'nvdash;': '\u22ac',
'nvge;': '\u2265\u20d2',
'nvgt;': '>\u20d2',
'nvHarr;': '\u2904',
'nvinfin;': '\u29de',
'nvlArr;': '\u2902',
'nvle;': '\u2264\u20d2',
'nvlt;': '<\u20d2',
'nvltrie;': '\u22b4\u20d2',
'nvrArr;': '\u2903',
'nvrtrie;': '\u22b5\u20d2',
'nvsim;': '\u223c\u20d2',
'nwarhk;': '\u2923',
'nwArr;': '\u21d6',
'nwarr;': '\u2196',
'nwarrow;': '\u2196',
'nwnear;': '\u2927',
'Oacute': '\xd3',
'oacute': '\xf3',
'Oacute;': '\xd3',
'oacute;': '\xf3',
'oast;': '\u229b',
'ocir;': '\u229a',
'Ocirc': '\xd4',
'ocirc': '\xf4',
'Ocirc;': '\xd4',
'ocirc;': '\xf4',
'Ocy;': '\u041e',
'ocy;': '\u043e',
'odash;': '\u229d',
'Odblac;': '\u0150',
'odblac;': '\u0151',
'odiv;': '\u2a38',
'odot;': '\u2299',
'odsold;': '\u29bc',
'OElig;': '\u0152',
'oelig;': '\u0153',
'ofcir;': '\u29bf',
'Ofr;': '\U0001d512',
'ofr;': '\U0001d52c',
'ogon;': '\u02db',
'Ograve': '\xd2',
'ograve': '\xf2',
'Ograve;': '\xd2',
'ograve;': '\xf2',
'ogt;': '\u29c1',
'ohbar;': '\u29b5',
'ohm;': '\u03a9',
'oint;': '\u222e',
'olarr;': '\u21ba',
'olcir;': '\u29be',
'olcross;': '\u29bb',
'oline;': '\u203e',
'olt;': '\u29c0',
'Omacr;': '\u014c',
'omacr;': '\u014d',
'Omega;': '\u03a9',
'omega;': '\u03c9',
'Omicron;': '\u039f',
'omicron;': '\u03bf',
'omid;': '\u29b6',
'ominus;': '\u2296',
'Oopf;': '\U0001d546',
'oopf;': '\U0001d560',
'opar;': '\u29b7',
'OpenCurlyDoubleQuote;': '\u201c',
'OpenCurlyQuote;': '\u2018',
'operp;': '\u29b9',
'oplus;': '\u2295',
'Or;': '\u2a54',
'or;': '\u2228',
'orarr;': '\u21bb',
'ord;': '\u2a5d',
'order;': '\u2134',
'orderof;': '\u2134',
'ordf': '\xaa',
'ordf;': '\xaa',
'ordm': '\xba',
'ordm;': '\xba',
'origof;': '\u22b6',
'oror;': '\u2a56',
'orslope;': '\u2a57',
'orv;': '\u2a5b',
'oS;': '\u24c8',
'Oscr;': '\U0001d4aa',
'oscr;': '\u2134',
'Oslash': '\xd8',
'oslash': '\xf8',
'Oslash;': '\xd8',
'oslash;': '\xf8',
'osol;': '\u2298',
'Otilde': '\xd5',
'otilde': '\xf5',
'Otilde;': '\xd5',
'otilde;': '\xf5',
'Otimes;': '\u2a37',
'otimes;': '\u2297',
'otimesas;': '\u2a36',
'Ouml': '\xd6',
'ouml': '\xf6',
'Ouml;': '\xd6',
'ouml;': '\xf6',
'ovbar;': '\u233d',
'OverBar;': '\u203e',
'OverBrace;': '\u23de',
'OverBracket;': '\u23b4',
'OverParenthesis;': '\u23dc',
'par;': '\u2225',
'para': '\xb6',
'para;': '\xb6',
'parallel;': '\u2225',
'parsim;': '\u2af3',
'parsl;': '\u2afd',
'part;': '\u2202',
'PartialD;': '\u2202',
'Pcy;': '\u041f',
'pcy;': '\u043f',
'percnt;': '%',
'period;': '.',
'permil;': '\u2030',
'perp;': '\u22a5',
'pertenk;': '\u2031',
'Pfr;': '\U0001d513',
'pfr;': '\U0001d52d',
'Phi;': '\u03a6',
'phi;': '\u03c6',
'phiv;': '\u03d5',
'phmmat;': '\u2133',
'phone;': '\u260e',
'Pi;': '\u03a0',
'pi;': '\u03c0',
'pitchfork;': '\u22d4',
'piv;': '\u03d6',
'planck;': '\u210f',
'planckh;': '\u210e',
'plankv;': '\u210f',
'plus;': '+',
'plusacir;': '\u2a23',
'plusb;': '\u229e',
'pluscir;': '\u2a22',
'plusdo;': '\u2214',
'plusdu;': '\u2a25',
'pluse;': '\u2a72',
'PlusMinus;': '\xb1',
'plusmn': '\xb1',
'plusmn;': '\xb1',
'plussim;': '\u2a26',
'plustwo;': '\u2a27',
'pm;': '\xb1',
'Poincareplane;': '\u210c',
'pointint;': '\u2a15',
'Popf;': '\u2119',
'popf;': '\U0001d561',
'pound': '\xa3',
'pound;': '\xa3',
'Pr;': '\u2abb',
'pr;': '\u227a',
'prap;': '\u2ab7',
'prcue;': '\u227c',
'prE;': '\u2ab3',
'pre;': '\u2aaf',
'prec;': '\u227a',
'precapprox;': '\u2ab7',
'preccurlyeq;': '\u227c',
'Precedes;': '\u227a',
'PrecedesEqual;': '\u2aaf',
'PrecedesSlantEqual;': '\u227c',
'PrecedesTilde;': '\u227e',
'preceq;': '\u2aaf',
'precnapprox;': '\u2ab9',
'precneqq;': '\u2ab5',
'precnsim;': '\u22e8',
'precsim;': '\u227e',
'Prime;': '\u2033',
'prime;': '\u2032',
'primes;': '\u2119',
'prnap;': '\u2ab9',
'prnE;': '\u2ab5',
'prnsim;': '\u22e8',
'prod;': '\u220f',
'Product;': '\u220f',
'profalar;': '\u232e',
'profline;': '\u2312',
'profsurf;': '\u2313',
'prop;': '\u221d',
'Proportion;': '\u2237',
'Proportional;': '\u221d',
'propto;': '\u221d',
'prsim;': '\u227e',
'prurel;': '\u22b0',
'Pscr;': '\U0001d4ab',
'pscr;': '\U0001d4c5',
'Psi;': '\u03a8',
'psi;': '\u03c8',
'puncsp;': '\u2008',
'Qfr;': '\U0001d514',
'qfr;': '\U0001d52e',
'qint;': '\u2a0c',
'Qopf;': '\u211a',
'qopf;': '\U0001d562',
'qprime;': '\u2057',
'Qscr;': '\U0001d4ac',
'qscr;': '\U0001d4c6',
'quaternions;': '\u210d',
'quatint;': '\u2a16',
'quest;': '?',
'questeq;': '\u225f',
'QUOT': '"',
'quot': '"',
'QUOT;': '"',
'quot;': '"',
'rAarr;': '\u21db',
'race;': '\u223d\u0331',
'Racute;': '\u0154',
'racute;': '\u0155',
'radic;': '\u221a',
'raemptyv;': '\u29b3',
'Rang;': '\u27eb',
'rang;': '\u27e9',
'rangd;': '\u2992',
'range;': '\u29a5',
'rangle;': '\u27e9',
'raquo': '\xbb',
'raquo;': '\xbb',
'Rarr;': '\u21a0',
'rArr;': '\u21d2',
'rarr;': '\u2192',
'rarrap;': '\u2975',
'rarrb;': '\u21e5',
'rarrbfs;': '\u2920',
'rarrc;': '\u2933',
'rarrfs;': '\u291e',
'rarrhk;': '\u21aa',
'rarrlp;': '\u21ac',
'rarrpl;': '\u2945',
'rarrsim;': '\u2974',
'Rarrtl;': '\u2916',
'rarrtl;': '\u21a3',
'rarrw;': '\u219d',
'rAtail;': '\u291c',
'ratail;': '\u291a',
'ratio;': '\u2236',
'rationals;': '\u211a',
'RBarr;': '\u2910',
'rBarr;': '\u290f',
'rbarr;': '\u290d',
'rbbrk;': '\u2773',
'rbrace;': '}',
'rbrack;': ']',
'rbrke;': '\u298c',
'rbrksld;': '\u298e',
'rbrkslu;': '\u2990',
'Rcaron;': '\u0158',
'rcaron;': '\u0159',
'Rcedil;': '\u0156',
'rcedil;': '\u0157',
'rceil;': '\u2309',
'rcub;': '}',
'Rcy;': '\u0420',
'rcy;': '\u0440',
'rdca;': '\u2937',
'rdldhar;': '\u2969',
'rdquo;': '\u201d',
'rdquor;': '\u201d',
'rdsh;': '\u21b3',
'Re;': '\u211c',
'real;': '\u211c',
'realine;': '\u211b',
'realpart;': '\u211c',
'reals;': '\u211d',
'rect;': '\u25ad',
'REG': '\xae',
'reg': '\xae',
'REG;': '\xae',
'reg;': '\xae',
'ReverseElement;': '\u220b',
'ReverseEquilibrium;': '\u21cb',
'ReverseUpEquilibrium;': '\u296f',
'rfisht;': '\u297d',
'rfloor;': '\u230b',
'Rfr;': '\u211c',
'rfr;': '\U0001d52f',
'rHar;': '\u2964',
'rhard;': '\u21c1',
'rharu;': '\u21c0',
'rharul;': '\u296c',
'Rho;': '\u03a1',
'rho;': '\u03c1',
'rhov;': '\u03f1',
'RightAngleBracket;': '\u27e9',
'RightArrow;': '\u2192',
'Rightarrow;': '\u21d2',
'rightarrow;': '\u2192',
'RightArrowBar;': '\u21e5',
'RightArrowLeftArrow;': '\u21c4',
'rightarrowtail;': '\u21a3',
'RightCeiling;': '\u2309',
'RightDoubleBracket;': '\u27e7',
'RightDownTeeVector;': '\u295d',
'RightDownVector;': '\u21c2',
'RightDownVectorBar;': '\u2955',
'RightFloor;': '\u230b',
'rightharpoondown;': '\u21c1',
'rightharpoonup;': '\u21c0',
'rightleftarrows;': '\u21c4',
'rightleftharpoons;': '\u21cc',
'rightrightarrows;': '\u21c9',
'rightsquigarrow;': '\u219d',
'RightTee;': '\u22a2',
'RightTeeArrow;': '\u21a6',
'RightTeeVector;': '\u295b',
'rightthreetimes;': '\u22cc',
'RightTriangle;': '\u22b3',
'RightTriangleBar;': '\u29d0',
'RightTriangleEqual;': '\u22b5',
'RightUpDownVector;': '\u294f',
'RightUpTeeVector;': '\u295c',
'RightUpVector;': '\u21be',
'RightUpVectorBar;': '\u2954',
'RightVector;': '\u21c0',
'RightVectorBar;': '\u2953',
'ring;': '\u02da',
'risingdotseq;': '\u2253',
'rlarr;': '\u21c4',
'rlhar;': '\u21cc',
'rlm;': '\u200f',
'rmoust;': '\u23b1',
'rmoustache;': '\u23b1',
'rnmid;': '\u2aee',
'roang;': '\u27ed',
'roarr;': '\u21fe',
'robrk;': '\u27e7',
'ropar;': '\u2986',
'Ropf;': '\u211d',
'ropf;': '\U0001d563',
'roplus;': '\u2a2e',
'rotimes;': '\u2a35',
'RoundImplies;': '\u2970',
'rpar;': ')',
'rpargt;': '\u2994',
'rppolint;': '\u2a12',
'rrarr;': '\u21c9',
'Rrightarrow;': '\u21db',
'rsaquo;': '\u203a',
'Rscr;': '\u211b',
'rscr;': '\U0001d4c7',
'Rsh;': '\u21b1',
'rsh;': '\u21b1',
'rsqb;': ']',
'rsquo;': '\u2019',
'rsquor;': '\u2019',
'rthree;': '\u22cc',
'rtimes;': '\u22ca',
'rtri;': '\u25b9',
'rtrie;': '\u22b5',
'rtrif;': '\u25b8',
'rtriltri;': '\u29ce',
'RuleDelayed;': '\u29f4',
'ruluhar;': '\u2968',
'rx;': '\u211e',
'Sacute;': '\u015a',
'sacute;': '\u015b',
'sbquo;': '\u201a',
'Sc;': '\u2abc',
'sc;': '\u227b',
'scap;': '\u2ab8',
'Scaron;': '\u0160',
'scaron;': '\u0161',
'sccue;': '\u227d',
'scE;': '\u2ab4',
'sce;': '\u2ab0',
'Scedil;': '\u015e',
'scedil;': '\u015f',
'Scirc;': '\u015c',
'scirc;': '\u015d',
'scnap;': '\u2aba',
'scnE;': '\u2ab6',
'scnsim;': '\u22e9',
'scpolint;': '\u2a13',
'scsim;': '\u227f',
'Scy;': '\u0421',
'scy;': '\u0441',
'sdot;': '\u22c5',
'sdotb;': '\u22a1',
'sdote;': '\u2a66',
'searhk;': '\u2925',
'seArr;': '\u21d8',
'searr;': '\u2198',
'searrow;': '\u2198',
'sect': '\xa7',
'sect;': '\xa7',
'semi;': ';',
'seswar;': '\u2929',
'setminus;': '\u2216',
'setmn;': '\u2216',
'sext;': '\u2736',
'Sfr;': '\U0001d516',
'sfr;': '\U0001d530',
'sfrown;': '\u2322',
'sharp;': '\u266f',
'SHCHcy;': '\u0429',
'shchcy;': '\u0449',
'SHcy;': '\u0428',
'shcy;': '\u0448',
'ShortDownArrow;': '\u2193',
'ShortLeftArrow;': '\u2190',
'shortmid;': '\u2223',
'shortparallel;': '\u2225',
'ShortRightArrow;': '\u2192',
'ShortUpArrow;': '\u2191',
'shy': '\xad',
'shy;': '\xad',
'Sigma;': '\u03a3',
'sigma;': '\u03c3',
'sigmaf;': '\u03c2',
'sigmav;': '\u03c2',
'sim;': '\u223c',
'simdot;': '\u2a6a',
'sime;': '\u2243',
'simeq;': '\u2243',
'simg;': '\u2a9e',
'simgE;': '\u2aa0',
'siml;': '\u2a9d',
'simlE;': '\u2a9f',
'simne;': '\u2246',
'simplus;': '\u2a24',
'simrarr;': '\u2972',
'slarr;': '\u2190',
'SmallCircle;': '\u2218',
'smallsetminus;': '\u2216',
'smashp;': '\u2a33',
'smeparsl;': '\u29e4',
'smid;': '\u2223',
'smile;': '\u2323',
'smt;': '\u2aaa',
'smte;': '\u2aac',
'smtes;': '\u2aac\ufe00',
'SOFTcy;': '\u042c',
'softcy;': '\u044c',
'sol;': '/',
'solb;': '\u29c4',
'solbar;': '\u233f',
'Sopf;': '\U0001d54a',
'sopf;': '\U0001d564',
'spades;': '\u2660',
'spadesuit;': '\u2660',
'spar;': '\u2225',
'sqcap;': '\u2293',
'sqcaps;': '\u2293\ufe00',
'sqcup;': '\u2294',
'sqcups;': '\u2294\ufe00',
'Sqrt;': '\u221a',
'sqsub;': '\u228f',
'sqsube;': '\u2291',
'sqsubset;': '\u228f',
'sqsubseteq;': '\u2291',
'sqsup;': '\u2290',
'sqsupe;': '\u2292',
'sqsupset;': '\u2290',
'sqsupseteq;': '\u2292',
'squ;': '\u25a1',
'Square;': '\u25a1',
'square;': '\u25a1',
'SquareIntersection;': '\u2293',
'SquareSubset;': '\u228f',
'SquareSubsetEqual;': '\u2291',
'SquareSuperset;': '\u2290',
'SquareSupersetEqual;': '\u2292',
'SquareUnion;': '\u2294',
'squarf;': '\u25aa',
'squf;': '\u25aa',
'srarr;': '\u2192',
'Sscr;': '\U0001d4ae',
'sscr;': '\U0001d4c8',
'ssetmn;': '\u2216',
'ssmile;': '\u2323',
'sstarf;': '\u22c6',
'Star;': '\u22c6',
'star;': '\u2606',
'starf;': '\u2605',
'straightepsilon;': '\u03f5',
'straightphi;': '\u03d5',
'strns;': '\xaf',
'Sub;': '\u22d0',
'sub;': '\u2282',
'subdot;': '\u2abd',
'subE;': '\u2ac5',
'sube;': '\u2286',
'subedot;': '\u2ac3',
'submult;': '\u2ac1',
'subnE;': '\u2acb',
'subne;': '\u228a',
'subplus;': '\u2abf',
'subrarr;': '\u2979',
'Subset;': '\u22d0',
'subset;': '\u2282',
'subseteq;': '\u2286',
'subseteqq;': '\u2ac5',
'SubsetEqual;': '\u2286',
'subsetneq;': '\u228a',
'subsetneqq;': '\u2acb',
'subsim;': '\u2ac7',
'subsub;': '\u2ad5',
'subsup;': '\u2ad3',
'succ;': '\u227b',
'succapprox;': '\u2ab8',
'succcurlyeq;': '\u227d',
'Succeeds;': '\u227b',
'SucceedsEqual;': '\u2ab0',
'SucceedsSlantEqual;': '\u227d',
'SucceedsTilde;': '\u227f',
'succeq;': '\u2ab0',
'succnapprox;': '\u2aba',
'succneqq;': '\u2ab6',
'succnsim;': '\u22e9',
'succsim;': '\u227f',
'SuchThat;': '\u220b',
'Sum;': '\u2211',
'sum;': '\u2211',
'sung;': '\u266a',
'sup1': '\xb9',
'sup1;': '\xb9',
'sup2': '\xb2',
'sup2;': '\xb2',
'sup3': '\xb3',
'sup3;': '\xb3',
'Sup;': '\u22d1',
'sup;': '\u2283',
'supdot;': '\u2abe',
'supdsub;': '\u2ad8',
'supE;': '\u2ac6',
'supe;': '\u2287',
'supedot;': '\u2ac4',
'Superset;': '\u2283',
'SupersetEqual;': '\u2287',
'suphsol;': '\u27c9',
'suphsub;': '\u2ad7',
'suplarr;': '\u297b',
'supmult;': '\u2ac2',
'supnE;': '\u2acc',
'supne;': '\u228b',
'supplus;': '\u2ac0',
'Supset;': '\u22d1',
'supset;': '\u2283',
'supseteq;': '\u2287',
'supseteqq;': '\u2ac6',
'supsetneq;': '\u228b',
'supsetneqq;': '\u2acc',
'supsim;': '\u2ac8',
'supsub;': '\u2ad4',
'supsup;': '\u2ad6',
'swarhk;': '\u2926',
'swArr;': '\u21d9',
'swarr;': '\u2199',
'swarrow;': '\u2199',
'swnwar;': '\u292a',
'szlig': '\xdf',
'szlig;': '\xdf',
'Tab;': '\t',
'target;': '\u2316',
'Tau;': '\u03a4',
'tau;': '\u03c4',
'tbrk;': '\u23b4',
'Tcaron;': '\u0164',
'tcaron;': '\u0165',
'Tcedil;': '\u0162',
'tcedil;': '\u0163',
'Tcy;': '\u0422',
'tcy;': '\u0442',
'tdot;': '\u20db',
'telrec;': '\u2315',
'Tfr;': '\U0001d517',
'tfr;': '\U0001d531',
'there4;': '\u2234',
'Therefore;': '\u2234',
'therefore;': '\u2234',
'Theta;': '\u0398',
'theta;': '\u03b8',
'thetasym;': '\u03d1',
'thetav;': '\u03d1',
'thickapprox;': '\u2248',
'thicksim;': '\u223c',
'ThickSpace;': '\u205f\u200a',
'thinsp;': '\u2009',
'ThinSpace;': '\u2009',
'thkap;': '\u2248',
'thksim;': '\u223c',
'THORN': '\xde',
'thorn': '\xfe',
'THORN;': '\xde',
'thorn;': '\xfe',
'Tilde;': '\u223c',
'tilde;': '\u02dc',
'TildeEqual;': '\u2243',
'TildeFullEqual;': '\u2245',
'TildeTilde;': '\u2248',
'times': '\xd7',
'times;': '\xd7',
'timesb;': '\u22a0',
'timesbar;': '\u2a31',
'timesd;': '\u2a30',
'tint;': '\u222d',
'toea;': '\u2928',
'top;': '\u22a4',
'topbot;': '\u2336',
'topcir;': '\u2af1',
'Topf;': '\U0001d54b',
'topf;': '\U0001d565',
'topfork;': '\u2ada',
'tosa;': '\u2929',
'tprime;': '\u2034',
'TRADE;': '\u2122',
'trade;': '\u2122',
'triangle;': '\u25b5',
'triangledown;': '\u25bf',
'triangleleft;': '\u25c3',
'trianglelefteq;': '\u22b4',
'triangleq;': '\u225c',
'triangleright;': '\u25b9',
'trianglerighteq;': '\u22b5',
'tridot;': '\u25ec',
'trie;': '\u225c',
'triminus;': '\u2a3a',
'TripleDot;': '\u20db',
'triplus;': '\u2a39',
'trisb;': '\u29cd',
'tritime;': '\u2a3b',
'trpezium;': '\u23e2',
'Tscr;': '\U0001d4af',
'tscr;': '\U0001d4c9',
'TScy;': '\u0426',
'tscy;': '\u0446',
'TSHcy;': '\u040b',
'tshcy;': '\u045b',
'Tstrok;': '\u0166',
'tstrok;': '\u0167',
'twixt;': '\u226c',
'twoheadleftarrow;': '\u219e',
'twoheadrightarrow;': '\u21a0',
'Uacute': '\xda',
'uacute': '\xfa',
'Uacute;': '\xda',
'uacute;': '\xfa',
'Uarr;': '\u219f',
'uArr;': '\u21d1',
'uarr;': '\u2191',
'Uarrocir;': '\u2949',
'Ubrcy;': '\u040e',
'ubrcy;': '\u045e',
'Ubreve;': '\u016c',
'ubreve;': '\u016d',
'Ucirc': '\xdb',
'ucirc': '\xfb',
'Ucirc;': '\xdb',
'ucirc;': '\xfb',
'Ucy;': '\u0423',
'ucy;': '\u0443',
'udarr;': '\u21c5',
'Udblac;': '\u0170',
'udblac;': '\u0171',
'udhar;': '\u296e',
'ufisht;': '\u297e',
'Ufr;': '\U0001d518',
'ufr;': '\U0001d532',
'Ugrave': '\xd9',
'ugrave': '\xf9',
'Ugrave;': '\xd9',
'ugrave;': '\xf9',
'uHar;': '\u2963',
'uharl;': '\u21bf',
'uharr;': '\u21be',
'uhblk;': '\u2580',
'ulcorn;': '\u231c',
'ulcorner;': '\u231c',
'ulcrop;': '\u230f',
'ultri;': '\u25f8',
'Umacr;': '\u016a',
'umacr;': '\u016b',
'uml': '\xa8',
'uml;': '\xa8',
'UnderBar;': '_',
'UnderBrace;': '\u23df',
'UnderBracket;': '\u23b5',
'UnderParenthesis;': '\u23dd',
'Union;': '\u22c3',
'UnionPlus;': '\u228e',
'Uogon;': '\u0172',
'uogon;': '\u0173',
'Uopf;': '\U0001d54c',
'uopf;': '\U0001d566',
'UpArrow;': '\u2191',
'Uparrow;': '\u21d1',
'uparrow;': '\u2191',
'UpArrowBar;': '\u2912',
'UpArrowDownArrow;': '\u21c5',
'UpDownArrow;': '\u2195',
'Updownarrow;': '\u21d5',
'updownarrow;': '\u2195',
'UpEquilibrium;': '\u296e',
'upharpoonleft;': '\u21bf',
'upharpoonright;': '\u21be',
'uplus;': '\u228e',
'UpperLeftArrow;': '\u2196',
'UpperRightArrow;': '\u2197',
'Upsi;': '\u03d2',
'upsi;': '\u03c5',
'upsih;': '\u03d2',
'Upsilon;': '\u03a5',
'upsilon;': '\u03c5',
'UpTee;': '\u22a5',
'UpTeeArrow;': '\u21a5',
'upuparrows;': '\u21c8',
'urcorn;': '\u231d',
'urcorner;': '\u231d',
'urcrop;': '\u230e',
'Uring;': '\u016e',
'uring;': '\u016f',
'urtri;': '\u25f9',
'Uscr;': '\U0001d4b0',
'uscr;': '\U0001d4ca',
'utdot;': '\u22f0',
'Utilde;': '\u0168',
'utilde;': '\u0169',
'utri;': '\u25b5',
'utrif;': '\u25b4',
'uuarr;': '\u21c8',
'Uuml': '\xdc',
'uuml': '\xfc',
'Uuml;': '\xdc',
'uuml;': '\xfc',
'uwangle;': '\u29a7',
'vangrt;': '\u299c',
'varepsilon;': '\u03f5',
'varkappa;': '\u03f0',
'varnothing;': '\u2205',
'varphi;': '\u03d5',
'varpi;': '\u03d6',
'varpropto;': '\u221d',
'vArr;': '\u21d5',
'varr;': '\u2195',
'varrho;': '\u03f1',
'varsigma;': '\u03c2',
'varsubsetneq;': '\u228a\ufe00',
'varsubsetneqq;': '\u2acb\ufe00',
'varsupsetneq;': '\u228b\ufe00',
'varsupsetneqq;': '\u2acc\ufe00',
'vartheta;': '\u03d1',
'vartriangleleft;': '\u22b2',
'vartriangleright;': '\u22b3',
'Vbar;': '\u2aeb',
'vBar;': '\u2ae8',
'vBarv;': '\u2ae9',
'Vcy;': '\u0412',
'vcy;': '\u0432',
'VDash;': '\u22ab',
'Vdash;': '\u22a9',
'vDash;': '\u22a8',
'vdash;': '\u22a2',
'Vdashl;': '\u2ae6',
'Vee;': '\u22c1',
'vee;': '\u2228',
'veebar;': '\u22bb',
'veeeq;': '\u225a',
'vellip;': '\u22ee',
'Verbar;': '\u2016',
'verbar;': '|',
'Vert;': '\u2016',
'vert;': '|',
'VerticalBar;': '\u2223',
'VerticalLine;': '|',
'VerticalSeparator;': '\u2758',
'VerticalTilde;': '\u2240',
'VeryThinSpace;': '\u200a',
'Vfr;': '\U0001d519',
'vfr;': '\U0001d533',
'vltri;': '\u22b2',
'vnsub;': '\u2282\u20d2',
'vnsup;': '\u2283\u20d2',
'Vopf;': '\U0001d54d',
'vopf;': '\U0001d567',
'vprop;': '\u221d',
'vrtri;': '\u22b3',
'Vscr;': '\U0001d4b1',
'vscr;': '\U0001d4cb',
'vsubnE;': '\u2acb\ufe00',
'vsubne;': '\u228a\ufe00',
'vsupnE;': '\u2acc\ufe00',
'vsupne;': '\u228b\ufe00',
'Vvdash;': '\u22aa',
'vzigzag;': '\u299a',
'Wcirc;': '\u0174',
'wcirc;': '\u0175',
'wedbar;': '\u2a5f',
'Wedge;': '\u22c0',
'wedge;': '\u2227',
'wedgeq;': '\u2259',
'weierp;': '\u2118',
'Wfr;': '\U0001d51a',
'wfr;': '\U0001d534',
'Wopf;': '\U0001d54e',
'wopf;': '\U0001d568',
'wp;': '\u2118',
'wr;': '\u2240',
'wreath;': '\u2240',
'Wscr;': '\U0001d4b2',
'wscr;': '\U0001d4cc',
'xcap;': '\u22c2',
'xcirc;': '\u25ef',
'xcup;': '\u22c3',
'xdtri;': '\u25bd',
'Xfr;': '\U0001d51b',
'xfr;': '\U0001d535',
'xhArr;': '\u27fa',
'xharr;': '\u27f7',
'Xi;': '\u039e',
'xi;': '\u03be',
'xlArr;': '\u27f8',
'xlarr;': '\u27f5',
'xmap;': '\u27fc',
'xnis;': '\u22fb',
'xodot;': '\u2a00',
'Xopf;': '\U0001d54f',
'xopf;': '\U0001d569',
'xoplus;': '\u2a01',
'xotime;': '\u2a02',
'xrArr;': '\u27f9',
'xrarr;': '\u27f6',
'Xscr;': '\U0001d4b3',
'xscr;': '\U0001d4cd',
'xsqcup;': '\u2a06',
'xuplus;': '\u2a04',
'xutri;': '\u25b3',
'xvee;': '\u22c1',
'xwedge;': '\u22c0',
'Yacute': '\xdd',
'yacute': '\xfd',
'Yacute;': '\xdd',
'yacute;': '\xfd',
'YAcy;': '\u042f',
'yacy;': '\u044f',
'Ycirc;': '\u0176',
'ycirc;': '\u0177',
'Ycy;': '\u042b',
'ycy;': '\u044b',
'yen': '\xa5',
'yen;': '\xa5',
'Yfr;': '\U0001d51c',
'yfr;': '\U0001d536',
'YIcy;': '\u0407',
'yicy;': '\u0457',
'Yopf;': '\U0001d550',
'yopf;': '\U0001d56a',
'Yscr;': '\U0001d4b4',
'yscr;': '\U0001d4ce',
'YUcy;': '\u042e',
'yucy;': '\u044e',
'yuml': '\xff',
'Yuml;': '\u0178',
'yuml;': '\xff',
'Zacute;': '\u0179',
'zacute;': '\u017a',
'Zcaron;': '\u017d',
'zcaron;': '\u017e',
'Zcy;': '\u0417',
'zcy;': '\u0437',
'Zdot;': '\u017b',
'zdot;': '\u017c',
'zeetrf;': '\u2128',
'ZeroWidthSpace;': '\u200b',
'Zeta;': '\u0396',
'zeta;': '\u03b6',
'Zfr;': '\u2128',
'zfr;': '\U0001d537',
'ZHcy;': '\u0416',
'zhcy;': '\u0436',
'zigrarr;': '\u21dd',
'Zopf;': '\u2124',
'zopf;': '\U0001d56b',
'Zscr;': '\U0001d4b5',
'zscr;': '\U0001d4cf',
'zwj;': '\u200d',
'zwnj;': '\u200c',
}
# maps the Unicode code point to the HTML entity name
codepoint2name = {}
# maps the HTML entity name to the character; note that the loop below
# stores the character itself even outside the Latin-1 range
entitydefs = {}
for (name, codepoint) in name2codepoint.items():
codepoint2name[codepoint] = name
entitydefs[name] = chr(codepoint)
del name, codepoint
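# A short usage sketch of the three tables built above; this assumes the
# surrounding code is importable as the standard html.entities module.
from html.entities import name2codepoint, codepoint2name, entitydefs, html5

print(name2codepoint['eacute'])   # 233
print(codepoint2name[233])        # 'eacute'
print(entitydefs['eacute'])       # 'é'
print(html5['eacute;'])           # 'é' -- html5 keys keep any trailing ';'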
"""A parser for HTML and XHTML."""
# This file is based on sgmllib.py, but the API is slightly different.
# XXX There should be a way to distinguish between PCDATA (parsed
# character data -- the normal case), RCDATA (replaceable character
# data -- only char and entity references and end tags are special)
# and CDATA (character data -- only end tags are special).
import re
import warnings
import _markupbase
from html import unescape
__all__ = ['HTMLParser']
# Regular expressions used for parsing
interesting_normal = re.compile('[&<]')
incomplete = re.compile('&[a-zA-Z#]')
entityref = re.compile('&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]')
charref = re.compile('&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]')
starttagopen = re.compile('<[a-zA-Z]')
piclose = re.compile('>')
commentclose = re.compile(r'--\s*>')
# Note:
# 1) the strict attrfind isn't really strict, but we can't make it
# correctly strict without breaking backward compatibility;
# 2) if you change tagfind/attrfind remember to update locatestarttagend too;
# 3) if you change tagfind/attrfind and/or locatestarttagend the parser will
# explode, so don't do it.
tagfind = re.compile(r'([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*')
# see http://www.w3.org/TR/html5/tokenization.html#tag-open-state
# and http://www.w3.org/TR/html5/tokenization.html#tag-name-state
tagfind_tolerant = re.compile(r'([a-zA-Z][^\t\n\r\f />\x00]*)(?:\s|/(?!>))*')
attrfind = re.compile(
r'\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*'
r'(\'[^\']*\'|"[^"]*"|[^\s"\'=<>`]*))?')
attrfind_tolerant = re.compile(
r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*')
locatestarttagend = re.compile(r"""
<[a-zA-Z][-.a-zA-Z0-9:_]* # tag name
(?:\s+ # whitespace before attribute name
(?:[a-zA-Z_][-.:a-zA-Z0-9_]* # attribute name
(?:\s*=\s* # value indicator
(?:'[^']*' # LITA-enclosed value
|\"[^\"]*\" # LIT-enclosed value
|[^'\">\s]+ # bare value
)
)?
)
)*
\s* # trailing whitespace
""", re.VERBOSE)
locatestarttagend_tolerant = re.compile(r"""
<[a-zA-Z][^\t\n\r\f />\x00]* # tag name
(?:[\s/]* # optional whitespace before attribute name
(?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name
(?:\s*=+\s* # value indicator
(?:'[^']*' # LITA-enclosed value
|"[^"]*" # LIT-enclosed value
|(?!['"])[^>\s]* # bare value
)
(?:\s*,)* # possibly followed by a comma
)?(?:\s|/(?!>))*
)*
)?
\s* # trailing whitespace
""", re.VERBOSE)
endendtag = re.compile('>')
# the HTML 5 spec, section 8.1.2.2, doesn't allow spaces between
# </ and the tag name, so maybe this should be fixed
endtagfind = re.compile(r'</\s*([a-zA-Z][-.a-zA-Z0-9:_]*)\s*>')
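# Illustration of what a few of the patterns above recognize; the regexes
# are private module details, so this is only a sketch of their intent.
import re

entityref = re.compile('&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]')
charref = re.compile('&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]')
starttagopen = re.compile('<[a-zA-Z]')

print(entityref.match('&amp; more').group(1))  # 'amp'
print(charref.match('&#62; more').group())     # '&#62;'
print(bool(starttagopen.match('<p>')))         # True
print(bool(starttagopen.match('< p>')))        # False -- no letter after '<'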
class HTMLParseError(Exception):
"""Exception raised for all parse errors."""
def __init__(self, msg, position=(None, None)):
assert msg
self.msg = msg
self.lineno = position[0]
self.offset = position[1]
def __str__(self):
result = self.msg
if self.lineno is not None:
result = result + ", at line %d" % self.lineno
if self.offset is not None:
result = result + ", column %d" % (self.offset + 1)
return result
_default_sentinel = object()
class HTMLParser(_markupbase.ParserBase):
"""Find tags and other markup and call handler functions.
Usage:
p = HTMLParser()
p.feed(data)
...
p.close()
Start tags are handled by calling self.handle_starttag() or
self.handle_startendtag(); end tags by self.handle_endtag(). The
data between tags is passed from the parser to the derived class
by calling self.handle_data() with the data as argument (the data
may be split up in arbitrary chunks). If convert_charrefs is
True the character references are converted automatically to the
corresponding Unicode character (and self.handle_data() is no
longer split in chunks), otherwise they are passed by calling
self.handle_entityref() or self.handle_charref() with the string
containing respectively the named or numeric reference as the
argument.
"""
CDATA_CONTENT_ELEMENTS = ("script", "style")
def __init__(self, strict=_default_sentinel, *,
convert_charrefs=_default_sentinel):
"""Initialize and reset this instance.
If convert_charrefs is True (default: False), all character references
are automatically converted to the corresponding Unicode characters.
If strict is set to False (the default) the parser will parse invalid
markup, otherwise it will raise an error. Note that the strict mode
and argument are deprecated.
"""
if strict is not _default_sentinel:
warnings.warn("The strict argument and mode are deprecated.",
DeprecationWarning, stacklevel=2)
else:
strict = False # default
self.strict = strict
if convert_charrefs is _default_sentinel:
convert_charrefs = False # default
warnings.warn("The value of convert_charrefs will become True in "
"3.5. You are encouraged to set the value explicitly.",
DeprecationWarning, stacklevel=2)
self.convert_charrefs = convert_charrefs
self.reset()
def reset(self):
"""Reset this instance. Loses all unprocessed data."""
self.rawdata = ''
self.lasttag = '???'
self.interesting = interesting_normal
self.cdata_elem = None
_markupbase.ParserBase.reset(self)
def feed(self, data):
r"""Feed data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n').
"""
self.rawdata = self.rawdata + data
self.goahead(0)
def close(self):
"""Handle any buffered data."""
self.goahead(1)
def error(self, message):
warnings.warn("The 'error' method is deprecated.",
DeprecationWarning, stacklevel=2)
raise HTMLParseError(message, self.getpos())
__starttag_text = None
def get_starttag_text(self):
"""Return full source of start tag: '<...>'."""
return self.__starttag_text
def set_cdata_mode(self, elem):
self.cdata_elem = elem.lower()
self.interesting = re.compile(r'</\s*%s\s*>' % self.cdata_elem, re.I)
def clear_cdata_mode(self):
self.interesting = interesting_normal
self.cdata_elem = None
# Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
def goahead(self, end):
rawdata = self.rawdata
i = 0
n = len(rawdata)
while i < n:
if self.convert_charrefs and not self.cdata_elem:
j = rawdata.find('<', i)
if j < 0:
# if we can't find the next <, either we are at the end
# or there's more text incoming.  If the latter is the case,
# we can't pass the text to handle_data() in case we have
# a charref cut in half at the end.  Try to determine if
# this is the case before proceeding by looking for an
# & near the end and see if it's followed by a space or ;.
amppos = rawdata.rfind('&', max(i, n-34))
if (amppos >= 0 and
not re.compile(r'[\s;]').search(rawdata, amppos)):
break # wait till we get all the text
j = n
else:
match = self.interesting.search(rawdata, i) # < or &
if match:
j = match.start()
else:
if self.cdata_elem:
break
j = n
if i < j:
if self.convert_charrefs and not self.cdata_elem:
self.handle_data(unescape(rawdata[i:j]))
else:
self.handle_data(rawdata[i:j])
i = self.updatepos(i, j)
if i == n: break
startswith = rawdata.startswith
if startswith('<', i):
if starttagopen.match(rawdata, i): # < + letter
k = self.parse_starttag(i)
elif startswith("</", i):
k = self.parse_endtag(i)
elif startswith("<!--", i):
k = self.parse_comment(i)
elif startswith("<?", i):
k = self.parse_pi(i)
elif startswith("<!", i):
if self.strict:
k = self.parse_declaration(i)
else:
k = self.parse_html_declaration(i)
elif (i + 1) < n:
self.handle_data("<")
k = i + 1
else:
break
if k < 0:
if not end:
break
if self.strict:
self.error("EOF in middle of construct")
k = rawdata.find('>', i + 1)
if k < 0:
k = rawdata.find('<', i + 1)
if k < 0:
k = i + 1
else:
k += 1
if self.convert_charrefs and not self.cdata_elem:
self.handle_data(unescape(rawdata[i:k]))
else:
self.handle_data(rawdata[i:k])
i = self.updatepos(i, k)
elif startswith("&#", i):
match = charref.match(rawdata, i)
if match:
name = match.group()[2:-1]
self.handle_charref(name)
k = match.end()
if not startswith(';', k-1):
k = k - 1
i = self.updatepos(i, k)
continue
else:
if ";" in rawdata[i:]: # bail by consuming &#
self.handle_data(rawdata[i:i+2])
i = self.updatepos(i, i+2)
break
elif startswith('&', i):
match = entityref.match(rawdata, i)
if match:
name = match.group(1)
self.handle_entityref(name)
k = match.end()
if not startswith(';', k-1):
k = k - 1
i = self.updatepos(i, k)
continue
match = incomplete.match(rawdata, i)
if match:
# match.group() will contain at least 2 chars
if end and match.group() == rawdata[i:]:
if self.strict:
self.error("EOF in middle of entity or char ref")
else:
k = match.end()
if k <= i:
k = n
i = self.updatepos(i, i + 1)
# incomplete
break
elif (i + 1) < n:
# not the end of the buffer, and can't be confused
# with some other construct
self.handle_data("&")
i = self.updatepos(i, i + 1)
else:
break
else:
assert 0, "interesting.search() lied"
# end while
if end and i < n and not self.cdata_elem:
if self.convert_charrefs and not self.cdata_elem:
self.handle_data(unescape(rawdata[i:n]))
else:
self.handle_data(rawdata[i:n])
i = self.updatepos(i, n)
self.rawdata = rawdata[i:]
# Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
def parse_html_declaration(self, i):
rawdata = self.rawdata
assert rawdata[i:i+2] == '<!', ('unexpected call to '
'parse_html_declaration()')
if rawdata[i:i+4] == '<!--':
# this case is actually already handled in goahead()
return self.parse_comment(i)
elif rawdata[i:i+3] == '<![':
return self.parse_marked_section(i)
elif rawdata[i:i+9].lower() == '<!doctype':
# find the closing >
gtpos = rawdata.find('>', i+9)
if gtpos == -1:
return -1
self.handle_decl(rawdata[i+2:gtpos])
return gtpos+1
else:
return self.parse_bogus_comment(i)
# Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
def parse_bogus_comment(self, i, report=1):
rawdata = self.rawdata
assert rawdata[i:i+2] in ('<!', '</'), ('unexpected call to '
'parse_comment()')
pos = rawdata.find('>', i+2)
if pos == -1:
return -1
if report:
self.handle_comment(rawdata[i+2:pos])
return pos + 1
# Internal -- parse processing instr, return end or -1 if not terminated
def parse_pi(self, i):
rawdata = self.rawdata
assert rawdata[i:i+2] == '<?', 'unexpected call to parse_pi()'
match = piclose.search(rawdata, i+2) # >
if not match:
return -1
j = match.start()
self.handle_pi(rawdata[i+2: j])
j = match.end()
return j
# Internal -- handle starttag, return end or -1 if not terminated
def parse_starttag(self, i):
self.__starttag_text = None
endpos = self.check_for_whole_start_tag(i)
if endpos < 0:
return endpos
rawdata = self.rawdata
self.__starttag_text = rawdata[i:endpos]
# Now parse the data between i+1 and j into a tag and attrs
attrs = []
if self.strict:
match = tagfind.match(rawdata, i+1)
else:
match = tagfind_tolerant.match(rawdata, i+1)
assert match, 'unexpected call to parse_starttag()'
k = match.end()
self.lasttag = tag = match.group(1).lower()
while k < endpos:
if self.strict:
m = attrfind.match(rawdata, k)
else:
m = attrfind_tolerant.match(rawdata, k)
if not m:
break
attrname, rest, attrvalue = m.group(1, 2, 3)
if not rest:
attrvalue = None
elif attrvalue[:1] == '\'' == attrvalue[-1:] or \
attrvalue[:1] == '"' == attrvalue[-1:]:
attrvalue = attrvalue[1:-1]
if attrvalue:
attrvalue = unescape(attrvalue)
attrs.append((attrname.lower(), attrvalue))
k = m.end()
end = rawdata[k:endpos].strip()
if end not in (">", "/>"):
lineno, offset = self.getpos()
if "\n" in self.__starttag_text:
lineno = lineno + self.__starttag_text.count("\n")
offset = len(self.__starttag_text) \
- self.__starttag_text.rfind("\n")
else:
offset = offset + len(self.__starttag_text)
if self.strict:
self.error("junk characters in start tag: %r"
% (rawdata[k:endpos][:20],))
self.handle_data(rawdata[i:endpos])
return endpos
if end.endswith('/>'):
# XHTML-style empty tag: <span attr="value" />
self.handle_startendtag(tag, attrs)
else:
self.handle_starttag(tag, attrs)
if tag in self.CDATA_CONTENT_ELEMENTS:
self.set_cdata_mode(tag)
return endpos
# Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
def check_for_whole_start_tag(self, i):
rawdata = self.rawdata
if self.strict:
m = locatestarttagend.match(rawdata, i)
else:
m = locatestarttagend_tolerant.match(rawdata, i)
if m:
j = m.end()
next = rawdata[j:j+1]
if next == ">":
return j + 1
if next == "/":
if rawdata.startswith("/>", j):
return j + 2
if rawdata.startswith("/", j):
# buffer boundary
return -1
# else bogus input
if self.strict:
self.updatepos(i, j + 1)
self.error("malformed empty start tag")
if j > i:
return j
else:
return i + 1
if next == "":
# end of input
return -1
if next in ("abcdefghijklmnopqrstuvwxyz=/"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
# end of input in or before attribute value, or we have the
# '/' from a '/>' ending
return -1
if self.strict:
self.updatepos(i, j)
self.error("malformed start tag")
if j > i:
return j
else:
return i + 1
raise AssertionError("we should not get here!")
# Internal -- parse endtag, return end or -1 if incomplete
def parse_endtag(self, i):
rawdata = self.rawdata
assert rawdata[i:i+2] == "</", "unexpected call to parse_endtag"
match = endendtag.search(rawdata, i+1) # >
if not match:
return -1
gtpos = match.end()
match = endtagfind.match(rawdata, i) # </ + tag + >
if not match:
if self.cdata_elem is not None:
self.handle_data(rawdata[i:gtpos])
return gtpos
if self.strict:
self.error("bad end tag: %r" % (rawdata[i:gtpos],))
# find the name: w3.org/TR/html5/tokenization.html#tag-name-state
namematch = tagfind_tolerant.match(rawdata, i+2)
if not namematch:
# w3.org/TR/html5/tokenization.html#end-tag-open-state
if rawdata[i:i+3] == '</>':
return i+3
else:
return self.parse_bogus_comment(i)
tagname = namematch.group(1).lower()
# consume and ignore other stuff between the name and the >
# Note: this is not 100% correct, since we might have things like
# </tag attr=">">, but looking for > after the name should cover
# most of the cases and is much simpler
gtpos = rawdata.find('>', namematch.end())
self.handle_endtag(tagname)
return gtpos+1
elem = match.group(1).lower() # script or style
if self.cdata_elem is not None:
if elem != self.cdata_elem:
self.handle_data(rawdata[i:gtpos])
return gtpos
self.handle_endtag(elem.lower())
self.clear_cdata_mode()
return gtpos
# Overridable -- finish processing of start+end tag: <tag.../>
def handle_startendtag(self, tag, attrs):
self.handle_starttag(tag, attrs)
self.handle_endtag(tag)
# Overridable -- handle start tag
def handle_starttag(self, tag, attrs):
pass
# Overridable -- handle end tag
def handle_endtag(self, tag):
pass
# Overridable -- handle character reference
def handle_charref(self, name):
pass
# Overridable -- handle entity reference
def handle_entityref(self, name):
pass
# Overridable -- handle data
def handle_data(self, data):
pass
# Overridable -- handle comment
def handle_comment(self, data):
pass
# Overridable -- handle declaration
def handle_decl(self, decl):
pass
# Overridable -- handle processing instruction
def handle_pi(self, data):
pass
def unknown_decl(self, data):
if self.strict:
self.error("unknown declaration: %r" % (data,))
# Internal -- helper to remove special character quoting
def unescape(self, s):
warnings.warn('The unescape method is deprecated and will be removed '
'in 3.5, use html.unescape() instead.',
DeprecationWarning, stacklevel=2)
return unescape(s)
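# A minimal subclass sketch showing the overridable handlers above in use,
# assuming this file is importable as the standard html.parser module.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags and print any text."""
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. for us
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.links.extend(v for k, v in attrs if k == 'href')

    def handle_data(self, data):
        if data.strip():
            print('data:', data.strip())

p = LinkCollector()
p.feed('<p>See <a href="https://example.com">the &quot;docs&quot;</a>.</p>')
p.close()
print(p.links)  # ['https://example.com']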
"""
General functions for HTML manipulation.
"""
import re as _re
from html.entities import html5 as _html5
__all__ = ['escape', 'unescape']
def escape(s, quote=True):
"""
Replace special characters "&", "<" and ">" with HTML-safe sequences.
If the optional flag quote is true (the default), the quotation mark
characters, both double quote (") and single quote (') characters are also
translated.
"""
s = s.replace("&", "&") # Must be done first!
s = s.replace("<", "<")
s = s.replace(">", ">")
if quote:
s = s.replace('"', """)
s = s.replace('\'', "'")
return s
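# Example output of escape(), assuming it is the html.escape shown above.
from html import escape

print(escape('<a href="x">&</a>'))
# -> &lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;
print(escape('<a href="x">&</a>', quote=False))
# -> &lt;a href="x"&gt;&amp;&lt;/a&gt;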
# see http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references
_invalid_charrefs = {
0x00: '\ufffd', # REPLACEMENT CHARACTER
0x0d: '\r', # CARRIAGE RETURN
0x80: '\u20ac', # EURO SIGN
0x81: '\x81', # <control>
0x82: '\u201a', # SINGLE LOW-9 QUOTATION MARK
0x83: '\u0192', # LATIN SMALL LETTER F WITH HOOK
0x84: '\u201e', # DOUBLE LOW-9 QUOTATION MARK
0x85: '\u2026', # HORIZONTAL ELLIPSIS
0x86: '\u2020', # DAGGER
0x87: '\u2021', # DOUBLE DAGGER
0x88: '\u02c6', # MODIFIER LETTER CIRCUMFLEX ACCENT
0x89: '\u2030', # PER MILLE SIGN
0x8a: '\u0160', # LATIN CAPITAL LETTER S WITH CARON
0x8b: '\u2039', # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8c: '\u0152', # LATIN CAPITAL LIGATURE OE
0x8d: '\x8d', # <control>
0x8e: '\u017d', # LATIN CAPITAL LETTER Z WITH CARON
0x8f: '\x8f', # <control>
0x90: '\x90', # <control>
0x91: '\u2018', # LEFT SINGLE QUOTATION MARK
0x92: '\u2019', # RIGHT SINGLE QUOTATION MARK
0x93: '\u201c', # LEFT DOUBLE QUOTATION MARK
0x94: '\u201d', # RIGHT DOUBLE QUOTATION MARK
0x95: '\u2022', # BULLET
0x96: '\u2013', # EN DASH
0x97: '\u2014', # EM DASH
0x98: '\u02dc', # SMALL TILDE
0x99: '\u2122', # TRADE MARK SIGN
0x9a: '\u0161', # LATIN SMALL LETTER S WITH CARON
0x9b: '\u203a', # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9c: '\u0153', # LATIN SMALL LIGATURE OE
0x9d: '\x9d', # <control>
0x9e: '\u017e', # LATIN SMALL LETTER Z WITH CARON
0x9f: '\u0178', # LATIN CAPITAL LETTER Y WITH DIAERESIS
}
_invalid_codepoints = {
# 0x0001 to 0x0008
0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
# 0x000E to 0x001F
0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19,
0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
# 0x007F to 0x009F
0x7f, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a,
0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96,
0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
# 0xFDD0 to 0xFDEF
0xfdd0, 0xfdd1, 0xfdd2, 0xfdd3, 0xfdd4, 0xfdd5, 0xfdd6, 0xfdd7, 0xfdd8,
0xfdd9, 0xfdda, 0xfddb, 0xfddc, 0xfddd, 0xfdde, 0xfddf, 0xfde0, 0xfde1,
0xfde2, 0xfde3, 0xfde4, 0xfde5, 0xfde6, 0xfde7, 0xfde8, 0xfde9, 0xfdea,
0xfdeb, 0xfdec, 0xfded, 0xfdee, 0xfdef,
# others
0xb, 0xfffe, 0xffff, 0x1fffe, 0x1ffff, 0x2fffe, 0x2ffff, 0x3fffe, 0x3ffff,
0x4fffe, 0x4ffff, 0x5fffe, 0x5ffff, 0x6fffe, 0x6ffff, 0x7fffe, 0x7ffff,
0x8fffe, 0x8ffff, 0x9fffe, 0x9ffff, 0xafffe, 0xaffff, 0xbfffe, 0xbffff,
0xcfffe, 0xcffff, 0xdfffe, 0xdffff, 0xefffe, 0xeffff, 0xffffe, 0xfffff,
0x10fffe, 0x10ffff
}
def _replace_charref(s):
s = s.group(1)
if s[0] == '#':
# numeric charref
if s[1] in 'xX':
num = int(s[2:].rstrip(';'), 16)
else:
num = int(s[1:].rstrip(';'))
if num in _invalid_charrefs:
return _invalid_charrefs[num]
if 0xD800 <= num <= 0xDFFF or num > 0x10FFFF:
return '\uFFFD'
if num in _invalid_codepoints:
return ''
return chr(num)
else:
# named charref
if s in _html5:
return _html5[s]
# find the longest matching name (as defined by the standard)
for x in range(len(s)-1, 1, -1):
if s[:x] in _html5:
return _html5[s[:x]] + s[x:]
else:
return '&' + s
_charref = _re.compile(r'&(#[0-9]+;?'
r'|#[xX][0-9a-fA-F]+;?'
r'|[^\t\n\f <&#;]{1,32};?)')
def unescape(s):
"""
Convert all named and numeric character references (e.g. &gt;, &#62;,
&#x3e;) in the string s to the corresponding unicode characters.
This function uses the rules defined by the HTML 5 standard
for both valid and invalid character references, and the list of
HTML 5 named character references defined in html.entities.html5.
"""
if '&' not in s:
return s
return _charref.sub(_replace_charref, s)
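# A few unescape() behaviors implied by the rules above (a sketch, assuming
# the function is the html.unescape shown here):
from html import unescape

print(unescape('&gt; &#62; &#x3e;'))  # '> > >'
print(unescape('&notit;'))            # '¬it;' -- longest matching name is 'not'
print(unescape('&#147;hi&#148;'))     # '\u201chi\u201d' via _invalid_charrefs
print(unescape('no references'))      # returned unchanged (fast path)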
"""HTTP/1.1 client library
<intro stuff goes here>
<other stuff, too>
HTTPConnection goes through a number of "states", which define when a client
may legally make another request or fetch the response for a particular
request. This diagram details these state transitions:
(null)
|
| HTTPConnection()
v
Idle
|
| putrequest()
v
Request-started
|
| ( putheader() )* endheaders()
v
Request-sent
|
| response = getresponse()
v
Unread-response [Response-headers-read]
|\____________________
| |
| response.read() | putrequest()
v v
Idle Req-started-unread-response
______/|
/ |
response.read() | | ( putheader() )* endheaders()
v v
Request-started Req-sent-unread-response
|
| response.read()
v
Request-sent
This diagram presents the following rules:
-- a second request may not be started until {response-headers-read}
-- a response [object] cannot be retrieved until {request-sent}
-- there is no differentiation between an unread response body and a
partially read response body
Note: this enforcement is applied by the HTTPConnection class. The
HTTPResponse class does not enforce this state machine, which
implies sophisticated clients may accelerate the request/response
pipeline. Caution should be taken, though: accelerating the states
beyond the above pattern may imply knowledge of the server's
connection-close behavior for certain requests. For example, it
is impossible to tell whether the server will close the connection
UNTIL the response headers have been read; this means that further
requests cannot be placed into the pipeline until it is known that
the server will NOT be closing the connection.
Logical State __state __response
------------- ------- ----------
Idle _CS_IDLE None
Request-started _CS_REQ_STARTED None
Request-sent _CS_REQ_SENT None
Unread-response _CS_IDLE <response_class>
Req-started-unread-response _CS_REQ_STARTED <response_class>
Req-sent-unread-response _CS_REQ_SENT <response_class>
"""
import email.parser
import email.message
import io
import os
import re
import socket
import collections
from urllib.parse import urlsplit
# HTTPMessage, parse_headers(), and the HTTP status code constants are
# intentionally omitted for simplicity
__all__ = ["HTTPResponse", "HTTPConnection",
"HTTPException", "NotConnected", "UnknownProtocol",
"UnknownTransferEncoding", "UnimplementedFileMode",
"IncompleteRead", "InvalidURL", "ImproperConnectionState",
"CannotSendRequest", "CannotSendHeader", "ResponseNotReady",
"BadStatusLine", "LineTooLong", "error", "responses"]
HTTP_PORT = 80
HTTPS_PORT = 443
_UNKNOWN = 'UNKNOWN'
# connection states
_CS_IDLE = 'Idle'
_CS_REQ_STARTED = 'Request-started'
_CS_REQ_SENT = 'Request-sent'
# status codes
# informational
CONTINUE = 100
SWITCHING_PROTOCOLS = 101
PROCESSING = 102
# successful
OK = 200
CREATED = 201
ACCEPTED = 202
NON_AUTHORITATIVE_INFORMATION = 203
NO_CONTENT = 204
RESET_CONTENT = 205
PARTIAL_CONTENT = 206
MULTI_STATUS = 207
IM_USED = 226
# redirection
MULTIPLE_CHOICES = 300
MOVED_PERMANENTLY = 301
FOUND = 302
SEE_OTHER = 303
NOT_MODIFIED = 304
USE_PROXY = 305
TEMPORARY_REDIRECT = 307
# client error
BAD_REQUEST = 400
UNAUTHORIZED = 401
PAYMENT_REQUIRED = 402
FORBIDDEN = 403
NOT_FOUND = 404
METHOD_NOT_ALLOWED = 405
NOT_ACCEPTABLE = 406
PROXY_AUTHENTICATION_REQUIRED = 407
REQUEST_TIMEOUT = 408
CONFLICT = 409
GONE = 410
LENGTH_REQUIRED = 411
PRECONDITION_FAILED = 412
REQUEST_ENTITY_TOO_LARGE = 413
REQUEST_URI_TOO_LONG = 414
UNSUPPORTED_MEDIA_TYPE = 415
REQUESTED_RANGE_NOT_SATISFIABLE = 416
EXPECTATION_FAILED = 417
UNPROCESSABLE_ENTITY = 422
LOCKED = 423
FAILED_DEPENDENCY = 424
UPGRADE_REQUIRED = 426
PRECONDITION_REQUIRED = 428
TOO_MANY_REQUESTS = 429
REQUEST_HEADER_FIELDS_TOO_LARGE = 431
# server error
INTERNAL_SERVER_ERROR = 500
NOT_IMPLEMENTED = 501
BAD_GATEWAY = 502
SERVICE_UNAVAILABLE = 503
GATEWAY_TIMEOUT = 504
HTTP_VERSION_NOT_SUPPORTED = 505
INSUFFICIENT_STORAGE = 507
NOT_EXTENDED = 510
NETWORK_AUTHENTICATION_REQUIRED = 511
# Mapping status codes to official W3C names
responses = {
100: 'Continue',
101: 'Switching Protocols',
200: 'OK',
201: 'Created',
202: 'Accepted',
203: 'Non-Authoritative Information',
204: 'No Content',
205: 'Reset Content',
206: 'Partial Content',
300: 'Multiple Choices',
301: 'Moved Permanently',
302: 'Found',
303: 'See Other',
304: 'Not Modified',
305: 'Use Proxy',
306: '(Unused)',
307: 'Temporary Redirect',
400: 'Bad Request',
401: 'Unauthorized',
402: 'Payment Required',
403: 'Forbidden',
404: 'Not Found',
405: 'Method Not Allowed',
406: 'Not Acceptable',
407: 'Proxy Authentication Required',
408: 'Request Timeout',
409: 'Conflict',
410: 'Gone',
411: 'Length Required',
412: 'Precondition Failed',
413: 'Request Entity Too Large',
414: 'Request-URI Too Long',
415: 'Unsupported Media Type',
416: 'Requested Range Not Satisfiable',
417: 'Expectation Failed',
428: 'Precondition Required',
429: 'Too Many Requests',
431: 'Request Header Fields Too Large',
500: 'Internal Server Error',
501: 'Not Implemented',
502: 'Bad Gateway',
503: 'Service Unavailable',
504: 'Gateway Timeout',
505: 'HTTP Version Not Supported',
511: 'Network Authentication Required',
}
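# For example, responses[404] is 'Not Found'; this table supplies default
# reason phrases when building or interpreting status lines:
#
#   >>> responses[404]
#   'Not Found'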
# maximal amount of data to read at one time in _safe_read
MAXAMOUNT = 1048576
# maximal line length when calling readline().
_MAXLINE = 65536
_MAXHEADERS = 100
# Header name/value ABNF (http://tools.ietf.org/html/rfc7230#section-3.2)
#
# VCHAR = %x21-7E
# obs-text = %x80-FF
# header-field = field-name ":" OWS field-value OWS
# field-name = token
# field-value = *( field-content / obs-fold )
# field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
# field-vchar = VCHAR / obs-text
#
# obs-fold = CRLF 1*( SP / HTAB )
# ; obsolete line folding
# ; see Section 3.2.4
# token = 1*tchar
#
# tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
# / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
# / DIGIT / ALPHA
# ; any VCHAR, except delimiters
#
# VCHAR defined in http://tools.ietf.org/html/rfc5234#appendix-B.1
# the patterns for both name and value are more lenient than RFC
# definitions to allow for backwards compatibility
_is_legal_header_name = re.compile(rb'[^:\s][^:\r\n]*').fullmatch
_is_illegal_header_value = re.compile(rb'\n(?![ \t])|\r(?![ \t\n])').search
# We always set the Content-Length header for these methods because some
# servers will otherwise respond with a 411
_METHODS_EXPECTING_BODY = {'PATCH', 'POST', 'PUT'}
class HTTPMessage(email.message.Message):
# XXX The only usage of this method is in
# http.server.CGIHTTPRequestHandler. Maybe move the code there so
# that it doesn't need to be part of the public API. The API has
# never been defined so this could cause backwards compatibility
# issues.
def getallmatchingheaders(self, name):
"""Find all header lines matching a given header name.
Look through the list of headers and find all lines matching a given
header name (and their continuation lines). A list of the lines is
returned, without interpretation. If the header does not occur, an
empty list is returned. If the header occurs multiple times, all
occurrences are returned. Case is not important in the header name.
"""
name = name.lower() + ':'
n = len(name)
lst = []
hit = 0
for line in self.keys():
if line[:n].lower() == name:
hit = 1
elif not line[:1].isspace():
hit = 0
if hit:
lst.append(line)
return lst
def parse_headers(fp, _class=HTTPMessage):
"""Parses only RFC2822 headers from a file pointer.
email Parser wants to see strings rather than bytes.
But a TextIOWrapper around self.rfile would buffer too many bytes
from the stream, bytes which we later need to read as bytes.
So we read the correct bytes here, as bytes, for email Parser
to parse.
"""
headers = []
while True:
line = fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("header line")
headers.append(line)
if len(headers) > _MAXHEADERS:
raise HTTPException("got more than %d headers" % _MAXHEADERS)
if line in (b'\r\n', b'\n', b''):
break
hstring = b''.join(headers).decode('iso-8859-1')
return email.parser.Parser(_class=_class).parsestr(hstring)
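# A minimal sketch of parse_headers() on an in-memory stream (assumes only
# the imports above):
#
#   >>> import io
#   >>> fp = io.BytesIO(b"Host: example.com\r\nAccept: */*\r\n\r\n")
#   >>> msg = parse_headers(fp)
#   >>> msg["Host"]
#   'example.com'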
class HTTPResponse(io.RawIOBase):
# See RFC 2616 sec 19.6 and RFC 1945 sec 6 for details.
# The bytes from the socket object are iso-8859-1 strings.
# See RFC 2616 sec 2.2 which notes an exception for MIME-encoded
# text following RFC 2047. The basic status line parsing only
# accepts iso-8859-1.
def __init__(self, sock, debuglevel=0, method=None, url=None):
# If the response includes a content-length header, we need to
# make sure that the client doesn't read more than the
# specified number of bytes. If it does, it will block until
# the server times out and closes the connection. This will
# happen if a self.fp.read() is done (without a size) whether
# self.fp is buffered or not. So, no self.fp.read() by
# clients unless they know what they are doing.
self.fp = sock.makefile("rb")
self.debuglevel = debuglevel
self._method = method
# The HTTPResponse object is returned via urllib. The clients
# of http and urllib expect different attributes for the
# headers. headers is used here and supports urllib. msg is
# provided as a backwards compatibility layer for http
# clients.
self.headers = self.msg = None
# from the Status-Line of the response
self.version = _UNKNOWN # HTTP-Version
self.status = _UNKNOWN # Status-Code
self.reason = _UNKNOWN # Reason-Phrase
self.chunked = _UNKNOWN # is "chunked" being used?
self.chunk_left = _UNKNOWN # bytes left to read in current chunk
self.length = _UNKNOWN # number of bytes left in response
self.will_close = _UNKNOWN # conn will close at end of response
def _read_status(self):
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
if len(line) > _MAXLINE:
raise LineTooLong("status line")
if self.debuglevel > 0:
print("reply:", repr(line))
if not line:
# Presumably, the server closed the connection before
# sending a valid response.
raise BadStatusLine(line)
try:
version, status, reason = line.split(None, 2)
except ValueError:
try:
version, status = line.split(None, 1)
reason = ""
except ValueError:
# empty version will cause next test to fail.
version = ""
if not version.startswith("HTTP/"):
self._close_conn()
raise BadStatusLine(line)
# The status code is a three-digit number
try:
status = int(status)
if status < 100 or status > 999:
raise BadStatusLine(line)
except ValueError:
raise BadStatusLine(line)
return version, status, reason
def begin(self):
if self.headers is not None:
# we've already started reading the response
return
# read until we get a non-100 response
while True:
version, status, reason = self._read_status()
if status != CONTINUE:
break
# skip the header from the 100 response
while True:
skip = self.fp.readline(_MAXLINE + 1)
if len(skip) > _MAXLINE:
raise LineTooLong("header line")
skip = skip.strip()
if not skip:
break
if self.debuglevel > 0:
print("header:", skip)
self.code = self.status = status
self.reason = reason.strip()
if version in ("HTTP/1.0", "HTTP/0.9"):
# Some servers might still return "0.9", treat it as 1.0 anyway
self.version = 10
elif version.startswith("HTTP/1."):
self.version = 11 # use HTTP/1.1 code for HTTP/1.x where x>=1
else:
raise UnknownProtocol(version)
self.headers = self.msg = parse_headers(self.fp)
if self.debuglevel > 0:
for hdr in self.headers:
print("header:", hdr, end=" ")
# are we using the chunked-style of transfer encoding?
tr_enc = self.headers.get("transfer-encoding")
if tr_enc and tr_enc.lower() == "chunked":
self.chunked = True
self.chunk_left = None
else:
self.chunked = False
# will the connection close at the end of the response?
self.will_close = self._check_close()
# do we have a Content-Length?
# NOTE: RFC 2616, S4.4, #3 says we ignore this if tr_enc is "chunked"
self.length = None
length = self.headers.get("content-length")
if length and not self.chunked:
try:
self.length = int(length)
except ValueError:
self.length = None
else:
if self.length < 0: # ignore nonsensical negative lengths
self.length = None
else:
self.length = None
# does the body have a fixed length? (of zero)
if (status == NO_CONTENT or status == NOT_MODIFIED or
100 <= status < 200 or # 1xx codes
self._method == "HEAD"):
self.length = 0
# if the connection remains open, and we aren't using chunked, and
# a content-length was not provided, then assume that the connection
# WILL close.
if (not self.will_close and
not self.chunked and
self.length is None):
self.will_close = True
def _check_close(self):
conn = self.headers.get("connection")
if self.version == 11:
# An HTTP/1.1 proxy is assumed to stay open unless
# explicitly closed.
if conn and "close" in conn.lower():
return True
return False
# Some HTTP/1.0 implementations have support for persistent
# connections, using rules different than HTTP/1.1.
# For older HTTP, Keep-Alive indicates persistent connection.
if self.headers.get("keep-alive"):
return False
# At least Akamai returns a "Connection: Keep-Alive" header,
# which was supposed to be sent by the client.
if conn and "keep-alive" in conn.lower():
return False
# Proxy-Connection is a netscape hack.
pconn = self.headers.get("proxy-connection")
if pconn and "keep-alive" in pconn.lower():
return False
# otherwise, assume it will close
return True
def _close_conn(self):
fp = self.fp
self.fp = None
fp.close()
def close(self):
try:
super().close() # set "closed" flag
finally:
if self.fp:
self._close_conn()
# These implementations are for the benefit of io.BufferedReader.
# XXX This class should probably be revised to act more like
# the "raw stream" that BufferedReader expects.
def flush(self):
super().flush()
if self.fp:
self.fp.flush()
def readable(self):
return True
# End of "raw stream" methods
def isclosed(self):
"""True if the connection is closed."""
# NOTE: it is possible that we will not ever call self.close(). This
# case occurs when will_close is TRUE, length is None, and we
# read up to the last byte, but NOT past it.
#
# IMPLIES: if will_close is FALSE, then self.close() will ALWAYS be
# called, meaning self.isclosed() is meaningful.
return self.fp is None
def read(self, amt=None):
if self.fp is None:
return b""
if self._method == "HEAD":
self._close_conn()
return b""
if amt is not None:
# Amount is given, so call base class version
# (which is implemented in terms of self.readinto)
return super(HTTPResponse, self).read(amt)
else:
# Amount is not given (unbounded read) so we must check self.length
# and self.chunked
if self.chunked:
return self._readall_chunked()
if self.length is None:
s = self.fp.read()
else:
try:
s = self._safe_read(self.length)
except IncompleteRead:
self._close_conn()
raise
self.length = 0
self._close_conn() # we read everything
return s
def readinto(self, b):
if self.fp is None:
return 0
if self._method == "HEAD":
self._close_conn()
return 0
if self.chunked:
return self._readinto_chunked(b)
if self.length is not None:
if len(b) > self.length:
# clip the read to the "end of response"
b = memoryview(b)[0:self.length]
# we do not use _safe_read() here because this may be a .will_close
# connection, and the user is reading more bytes than will be provided
# (for example, reading in 1k chunks)
n = self.fp.readinto(b)
if not n and b:
# Ideally, we would raise IncompleteRead if the content-length
# wasn't satisfied, but it might break compatibility.
self._close_conn()
elif self.length is not None:
self.length -= n
if not self.length:
self._close_conn()
return n
def _read_next_chunk_size(self):
# Read the next chunk size from the file
line = self.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("chunk size")
i = line.find(b";")
if i >= 0:
line = line[:i] # strip chunk-extensions
try:
return int(line, 16)
except ValueError:
# close the connection as protocol synchronisation is
# probably lost
self._close_conn()
raise
def _read_and_discard_trailer(self):
# read and discard trailer up to the CRLF terminator
### note: we shouldn't have any trailers!
while True:
line = self.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("trailer line")
if not line:
# a vanishingly small number of sites EOF without
# sending the trailer
break
if line in (b'\r\n', b'\n', b''):
break
def _readall_chunked(self):
assert self.chunked != _UNKNOWN
chunk_left = self.chunk_left
value = []
while True:
if chunk_left is None:
try:
chunk_left = self._read_next_chunk_size()
if chunk_left == 0:
break
except ValueError:
raise IncompleteRead(b''.join(value))
value.append(self._safe_read(chunk_left))
# we read the whole chunk, get another
self._safe_read(2) # toss the CRLF at the end of the chunk
chunk_left = None
self._read_and_discard_trailer()
# we read everything; close the "file"
self._close_conn()
return b''.join(value)
def _readinto_chunked(self, b):
assert self.chunked != _UNKNOWN
chunk_left = self.chunk_left
total_bytes = 0
mvb = memoryview(b)
while True:
if chunk_left is None:
try:
chunk_left = self._read_next_chunk_size()
if chunk_left == 0:
break
except ValueError:
raise IncompleteRead(bytes(b[0:total_bytes]))
if len(mvb) < chunk_left:
n = self._safe_readinto(mvb)
self.chunk_left = chunk_left - n
return total_bytes + n
elif len(mvb) == chunk_left:
n = self._safe_readinto(mvb)
self._safe_read(2) # toss the CRLF at the end of the chunk
self.chunk_left = None
return total_bytes + n
else:
temp_mvb = mvb[0:chunk_left]
n = self._safe_readinto(temp_mvb)
mvb = mvb[n:]
total_bytes += n
# we read the whole chunk, get another
self._safe_read(2) # toss the CRLF at the end of the chunk
chunk_left = None
self._read_and_discard_trailer()
# we read everything; close the "file"
self._close_conn()
return total_bytes
def _safe_read(self, amt):
"""Read the number of bytes requested, compensating for partial reads.
Normally, we have a blocking socket, but a read() can be interrupted
by a signal (resulting in a partial read).
Note that we cannot distinguish between EOF and an interrupt when zero
bytes have been read. IncompleteRead() will be raised in this
situation.
This function should be used when <amt> bytes "should" be present for
reading. If the bytes are truly not available (due to EOF), then the
IncompleteRead exception can be used to detect the problem.
"""
s = []
while amt > 0:
chunk = self.fp.read(min(amt, MAXAMOUNT))
if not chunk:
raise IncompleteRead(b''.join(s), amt)
s.append(chunk)
amt -= len(chunk)
return b"".join(s)
def _safe_readinto(self, b):
"""Same as _safe_read, but for reading into a buffer."""
total_bytes = 0
mvb = memoryview(b)
while total_bytes < len(b):
if MAXAMOUNT < len(mvb):
temp_mvb = mvb[0:MAXAMOUNT]
n = self.fp.readinto(temp_mvb)
else:
n = self.fp.readinto(mvb)
if not n:
raise IncompleteRead(bytes(mvb[0:total_bytes]), len(b))
mvb = mvb[n:]
total_bytes += n
return total_bytes
def fileno(self):
return self.fp.fileno()
def getheader(self, name, default=None):
if self.headers is None:
raise ResponseNotReady()
headers = self.headers.get_all(name) or default
if isinstance(headers, str) or not hasattr(headers, '__iter__'):
return headers
else:
return ', '.join(headers)
def getheaders(self):
"""Return list of (header, value) tuples."""
if self.headers is None:
raise ResponseNotReady()
return list(self.headers.items())
# We override IOBase.__iter__ so that it doesn't check for closed-ness
def __iter__(self):
return self
# For compatibility with old-style urllib responses.
def info(self):
return self.headers
def geturl(self):
return self.url
def getcode(self):
return self.status
class HTTPConnection:
_http_vsn = 11
_http_vsn_str = 'HTTP/1.1'
response_class = HTTPResponse
default_port = HTTP_PORT
auto_open = 1
debuglevel = 0
# TCP Maximum Segment Size (MSS) is determined by the TCP stack on
# a per-connection basis. There is no simple and efficient
# platform independent mechanism for determining the MSS, so
# instead a reasonable estimate is chosen. The getsockopt()
# interface using the TCP_MAXSEG parameter may be a suitable
# approach on some operating systems. A value of 16KiB is chosen
# as a reasonable estimate of the maximum MSS.
mss = 16384
def __init__(self, host, port=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None):
self.timeout = timeout
self.source_address = source_address
self.sock = None
self._buffer = []
self.__response = None
self.__state = _CS_IDLE
self._method = None
self._tunnel_host = None
self._tunnel_port = None
self._tunnel_headers = {}
(self.host, self.port) = self._get_hostport(host, port)
# This is stored as an instance variable to allow unit
# tests to replace it with a suitable mockup
self._create_connection = socket.create_connection
def set_tunnel(self, host, port=None, headers=None):
"""Set up host and port for HTTP CONNECT tunnelling.
In a connection that uses HTTP CONNECT tunneling, the host passed to the
constructor is used as a proxy server that relays all communication to
the endpoint passed to `set_tunnel`. This is done by sending an HTTP
CONNECT request to the proxy server when the connection is established.
This method must be called before the HTTP connection has been
established.
The headers argument should be a mapping of extra HTTP headers to send
with the CONNECT request.
"""
if self.sock:
raise RuntimeError("Can't set up tunnel for established connection")
self._tunnel_host, self._tunnel_port = self._get_hostport(host, port)
if headers:
self._tunnel_headers = headers
else:
self._tunnel_headers.clear()
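# An illustrative CONNECT-tunnel setup (proxy host and endpoint are
# hypothetical): the constructor names the proxy, set_tunnel() names the
# real destination.
#
#   conn = HTTPConnection("proxy.example.com", 8080)
#   conn.set_tunnel("www.example.com", 80)
#   conn.request("GET", "/")   # the CONNECT request is sent first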
def _get_hostport(self, host, port):
if port is None:
i = host.rfind(':')
j = host.rfind(']') # ipv6 addresses have [...]
if i > j:
try:
port = int(host[i+1:])
except ValueError:
if host[i+1:] == "": # http://foo.com:/ == http://foo.com/
port = self.default_port
else:
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
host = host[:i]
else:
port = self.default_port
if host and host[0] == '[' and host[-1] == ']':
host = host[1:-1]
return (host, port)
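# Behavior sketch for _get_hostport(), given an HTTPConnection instance `c`
# (illustrative values):
#
#   c._get_hostport("www.example.com:8080", None)  ->  ("www.example.com", 8080)
#   c._get_hostport("[::1]:8080", None)            ->  ("::1", 8080)
#   c._get_hostport("www.example.com", None)       ->  ("www.example.com", 80)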
def set_debuglevel(self, level):
self.debuglevel = level
def _tunnel(self):
connect_str = "CONNECT %s:%d HTTP/1.0\r\n" % (self._tunnel_host,
self._tunnel_port)
connect_bytes = connect_str.encode("ascii")
self.send(connect_bytes)
for header, value in self._tunnel_headers.items():
header_str = "%s: %s\r\n" % (header, value)
header_bytes = header_str.encode("latin-1")
self.send(header_bytes)
self.send(b'\r\n')
response = self.response_class(self.sock, method=self._method)
(version, code, message) = response._read_status()
if code != 200:
self.close()
raise OSError("Tunnel connection failed: %d %s" % (code,
message.strip()))
while True:
line = response.fp.readline(_MAXLINE + 1)
if len(line) > _MAXLINE:
raise LineTooLong("header line")
if not line:
# for sites which EOF without sending a trailer
break
if line in (b'\r\n', b'\n', b''):
break
def connect(self):
"""Connect to the host and port specified in __init__."""
self.sock = self._create_connection((self.host,self.port),
self.timeout, self.source_address)
if self._tunnel_host:
self._tunnel()
def close(self):
"""Close the connection to the HTTP server."""
self.__state = _CS_IDLE
try:
sock = self.sock
if sock:
self.sock = None
sock.close() # close it manually... there may be other refs
finally:
response = self.__response
if response:
self.__response = None
response.close()
def send(self, data):
"""Send `data' to the server.
``data`` can be a string object, a bytes object, an array object, a
file-like object that supports a .read() method, or an iterable object.
"""
if self.sock is None:
if self.auto_open:
self.connect()
else:
raise NotConnected()
if self.debuglevel > 0:
print("send:", repr(data))
blocksize = 8192
if hasattr(data, "read") :
if self.debuglevel > 0:
print("sendIng a read()able")
encode = False
try:
mode = data.mode
except AttributeError:
# io.BytesIO and other file-like objects don't have a `mode`
# attribute.
pass
else:
if "b" not in mode:
encode = True
if self.debuglevel > 0:
print("encoding file using iso-8859-1")
while 1:
datablock = data.read(blocksize)
if not datablock:
break
if encode:
datablock = datablock.encode("iso-8859-1")
self.sock.sendall(datablock)
return
try:
self.sock.sendall(data)
except TypeError:
if isinstance(data, collections.Iterable):
for d in data:
self.sock.sendall(d)
else:
raise TypeError("data should be a bytes-like object "
"or an iterable, got %r" % type(data))
def _output(self, s):
"""Add a line of output to the current request buffer.
Assumes that the line does *not* end with \\r\\n.
"""
self._buffer.append(s)
def _send_output(self, message_body=None):
"""Send the currently buffered request and clear the buffer.
Appends an extra \\r\\n to the buffer.
A message_body may be specified, to be appended to the request.
"""
self._buffer.extend((b"", b""))
msg = b"\r\n".join(self._buffer)
del self._buffer[:]
# If msg and message_body are sent in a single send() call,
# it will avoid performance problems caused by the interaction
# between delayed ack and the Nagle algorithm. However,
# there is no performance gain if the message is larger
# than MSS (and there is a memory penalty for the message
# copy).
if isinstance(message_body, bytes) and len(message_body) < self.mss:
msg += message_body
message_body = None
self.send(msg)
if message_body is not None:
# message_body was not a string (i.e. it is a file), and
# we must run the risk of Nagle.
self.send(message_body)
def putrequest(self, method, url, skip_host=0, skip_accept_encoding=0):
"""Send a request to the server.
`method' specifies an HTTP request method, e.g. 'GET'.
`url' specifies the object being requested, e.g. '/index.html'.
`skip_host' if True does not automatically add a 'Host:' header
`skip_accept_encoding' if True does not automatically add an
'Accept-Encoding:' header
"""
# if a prior response has been completed, then forget about it.
if self.__response and self.__response.isclosed():
self.__response = None
# in certain cases, we cannot issue another request on this connection.
# this occurs when:
# 1) we are in the process of sending a request. (_CS_REQ_STARTED)
# 2) a response to a previous request has signalled that it is going
# to close the connection upon completion.
# 3) the headers for the previous response have not been read, thus
# we cannot determine whether point (2) is true. (_CS_REQ_SENT)
#
# if there is no prior response, then we can request at will.
#
# if point (2) is true, then we will have passed the socket to the
# response (effectively meaning, "there is no prior response"), and
# will open a new one when a new request is made.
#
# Note: if a prior response exists, then we *can* start a new request.
# We are not allowed to begin fetching the response to this new
# request, however, until that prior response is complete.
#
if self.__state == _CS_IDLE:
self.__state = _CS_REQ_STARTED
else:
raise CannotSendRequest(self.__state)
# Save the method we use, we need it later in the response phase
self._method = method
if not url:
url = '/'
request = '%s %s %s' % (method, url, self._http_vsn_str)
# Non-ASCII characters should have been eliminated earlier
self._output(request.encode('ascii'))
if self._http_vsn == 11:
# Issue some standard headers for better HTTP/1.1 compliance
if not skip_host:
# this header is issued *only* for HTTP/1.1
# connections. more specifically, this means it is
# only issued when the client uses the new
# HTTPConnection() class. backwards-compat clients
# will be using HTTP/1.0 and those clients may be
# issuing this header themselves. we should NOT issue
# it twice; some web servers (such as Apache) barf
# when they see two Host: headers
# If we need a non-standard port, include it in the
# header. If the request is going through a proxy, use
# the host of the actual URL, not the host of the
# proxy.
netloc = ''
if url.startswith('http'):
nil, netloc, nil, nil, nil = urlsplit(url)
if netloc:
try:
netloc_enc = netloc.encode("ascii")
except UnicodeEncodeError:
netloc_enc = netloc.encode("idna")
self.putheader('Host', netloc_enc)
else:
if self._tunnel_host:
host = self._tunnel_host
port = self._tunnel_port
else:
host = self.host
port = self.port
try:
host_enc = host.encode("ascii")
except UnicodeEncodeError:
host_enc = host.encode("idna")
# As per RFC 2732, an IPv6 address should be wrapped with []
# when used as Host header
if host.find(':') >= 0:
host_enc = b'[' + host_enc + b']'
if port == self.default_port:
self.putheader('Host', host_enc)
else:
host_enc = host_enc.decode("ascii")
self.putheader('Host', "%s:%s" % (host_enc, port))
# note: we are assuming that clients will not attempt to set these
# headers since *this* library must deal with the
# consequences. this also means that when the supporting
# libraries are updated to recognize other forms, then this
# code should be changed (removed or updated).
# we only want a Content-Encoding of "identity" since we don't
# support encodings such as x-gzip or x-deflate.
if not skip_accept_encoding:
self.putheader('Accept-Encoding', 'identity')
# we can accept "chunked" Transfer-Encodings, but no others
# NOTE: no TE header implies *only* "chunked"
#self.putheader('TE', 'chunked')
# if TE is supplied in the header, then it must appear in a
# Connection header.
#self.putheader('Connection', 'TE')
else:
# For HTTP/1.0, the server will assume "not chunked"
pass
def putheader(self, header, *values):
"""Send a request header line to the server.
For example: h.putheader('Accept', 'text/html')
"""
if self.__state != _CS_REQ_STARTED:
raise CannotSendHeader()
if hasattr(header, 'encode'):
header = header.encode('ascii')
if not _is_legal_header_name(header):
raise ValueError('Invalid header name %r' % (header,))
values = list(values)
for i, one_value in enumerate(values):
if hasattr(one_value, 'encode'):
values[i] = one_value.encode('latin-1')
elif isinstance(one_value, int):
values[i] = str(one_value).encode('ascii')
if _is_illegal_header_value(values[i]):
raise ValueError('Invalid header value %r' % (values[i],))
value = b'\r\n\t'.join(values)
header = header + b': ' + value
self._output(header)
def endheaders(self, message_body=None):
"""Indicate that the last header line has been sent to the server.
This method sends the request to the server. The optional message_body
argument can be used to pass a message body associated with the
request. The message body will be sent in the same packet as the
message headers if it is a string, otherwise it is sent as a separate
packet.
"""
if self.__state == _CS_REQ_STARTED:
self.__state = _CS_REQ_SENT
else:
raise CannotSendHeader()
self._send_output(message_body)
def request(self, method, url, body=None, headers={}):
"""Send a complete request to the server."""
self._send_request(method, url, body, headers)
def _set_content_length(self, body, method):
# Set the content-length based on the body. If the body is "empty", we
# set Content-Length: 0 for methods that expect a body (RFC 7230,
# Section 3.3.2). If the body is set for other methods, we set the
# header provided we can figure out what the length is.
thelen = None
method_expects_body = method.upper() in _METHODS_EXPECTING_BODY
if body is None and method_expects_body:
thelen = '0'
elif body is not None:
try:
thelen = str(len(body))
except TypeError:
# If this is a file-like object, try to
# fstat its file descriptor
try:
thelen = str(os.fstat(body.fileno()).st_size)
except (AttributeError, OSError):
# Don't send a length if this failed
if self.debuglevel > 0: print("Cannot stat!!")
if thelen is not None:
self.putheader('Content-Length', thelen)
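# Sketch of the resulting header behavior (hypothetical calls):
#
#   self._set_content_length(None, "POST")   # emits Content-Length: 0
#   self._set_content_length(None, "GET")    # emits no Content-Length
#   self._set_content_length(b"abc", "PUT")  # emits Content-Length: 3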
def _send_request(self, method, url, body, headers):
# Honor explicitly requested Host: and Accept-Encoding: headers.
header_names = dict.fromkeys([k.lower() for k in headers])
skips = {}
if 'host' in header_names:
skips['skip_host'] = 1
if 'accept-encoding' in header_names:
skips['skip_accept_encoding'] = 1
self.putrequest(method, url, **skips)
if 'content-length' not in header_names:
self._set_content_length(body, method)
for hdr, value in headers.items():
self.putheader(hdr, value)
if isinstance(body, str):
# RFC 2616 Section 3.7.1 says that text default has a
# default charset of iso-8859-1.
body = body.encode('iso-8859-1')
self.endheaders(body)
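# A one-call sketch using request() (hypothetical endpoint); supplying Host,
# Accept-Encoding, or Content-Length explicitly suppresses the corresponding
# automatic header above:
#
#   conn = HTTPConnection("www.example.com")
#   conn.request("POST", "/submit", body='{"x": 1}',
#                headers={"Content-Type": "application/json"})
#   resp = conn.getresponse()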
def getresponse(self):
"""Get the response from the server.
If the HTTPConnection is in the correct state, returns an
instance of HTTPResponse or of whatever object is named by
the response_class class variable.
If a request has not been sent or if a previous response has
not been handled, ResponseNotReady is raised. If the HTTP
response indicates that the connection should be closed, then
it will be closed before the response is returned. When the
connection is closed, the underlying socket is closed.
"""
# if a prior response has been completed, then forget about it.
if self.__response and self.__response.isclosed():
self.__response = None
# if a prior response exists, then it must be completed (otherwise, we
# cannot read this response's header to determine the connection-close
# behavior)
#
# note: if a prior response existed, but was connection-close, then the
# socket and response were made independent of this HTTPConnection
# object since a new request requires that we open a whole new
# connection
#
# this means the prior response had one of two states:
# 1) will_close: this connection was reset and the prior socket and
# response operate independently
# 2) persistent: the response was retained and we await its
# isclosed() status to become true.
#
if self.__state != _CS_REQ_SENT or self.__response:
raise ResponseNotReady(self.__state)
if self.debuglevel > 0:
response = self.response_class(self.sock, self.debuglevel,
method=self._method)
else:
response = self.response_class(self.sock, method=self._method)
try:
response.begin()
assert response.will_close != _UNKNOWN
self.__state = _CS_IDLE
if response.will_close:
# this effectively passes the connection to the response
self.close()
else:
# remember this, so we can tell when it is complete
self.__response = response
return response
except:
response.close()
raise
try:
import ssl
except ImportError:
pass
else:
class HTTPSConnection(HTTPConnection):
"This class allows communication via SSL."
default_port = HTTPS_PORT
# XXX Should key_file and cert_file be deprecated in favour of context?
def __init__(self, host, port=None, key_file=None, cert_file=None,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None, *, context=None,
check_hostname=None):
super(HTTPSConnection, self).__init__(host, port, timeout,
source_address)
self.key_file = key_file
self.cert_file = cert_file
if context is None:
context = ssl._create_default_https_context()
will_verify = context.verify_mode != ssl.CERT_NONE
if check_hostname is None:
check_hostname = context.check_hostname
if check_hostname and not will_verify:
raise ValueError("check_hostname needs a SSL context with "
"either CERT_OPTIONAL or CERT_REQUIRED")
if key_file or cert_file:
context.load_cert_chain(cert_file, key_file)
self._context = context
self._check_hostname = check_hostname
def connect(self):
"Connect to a host on a given (SSL) port."
super().connect()
if self._tunnel_host:
server_hostname = self._tunnel_host
else:
server_hostname = self.host
self.sock = self._context.wrap_socket(self.sock,
server_hostname=server_hostname)
if not self._context.check_hostname and self._check_hostname:
try:
ssl.match_hostname(self.sock.getpeercert(), server_hostname)
except Exception:
self.sock.shutdown(socket.SHUT_RDWR)
self.sock.close()
raise
__all__.append("HTTPSConnection")
class HTTPException(Exception):
# Subclasses that define an __init__ must call Exception.__init__
# or define self.args. Otherwise, str() will fail.
pass
class NotConnected(HTTPException):
pass
class InvalidURL(HTTPException):
pass
class UnknownProtocol(HTTPException):
def __init__(self, version):
self.args = version,
self.version = version
class UnknownTransferEncoding(HTTPException):
pass
class UnimplementedFileMode(HTTPException):
pass
class IncompleteRead(HTTPException):
def __init__(self, partial, expected=None):
self.args = partial,
self.partial = partial
self.expected = expected
def __repr__(self):
if self.expected is not None:
e = ', %i more expected' % self.expected
else:
e = ''
return 'IncompleteRead(%i bytes read%s)' % (len(self.partial), e)
def __str__(self):
return repr(self)
class ImproperConnectionState(HTTPException):
pass
class CannotSendRequest(ImproperConnectionState):
pass
class CannotSendHeader(ImproperConnectionState):
pass
class ResponseNotReady(ImproperConnectionState):
pass
class BadStatusLine(HTTPException):
def __init__(self, line):
if not line:
line = repr(line)
self.args = line,
self.line = line
class LineTooLong(HTTPException):
def __init__(self, line_type):
HTTPException.__init__(self, "got more than %d bytes when reading %s"
% (_MAXLINE, line_type))
# for backwards compatibility
error = HTTPException
"""HTTP server classes.
Note: BaseHTTPRequestHandler doesn't implement any HTTP request; see
SimpleHTTPRequestHandler for simple implementations of GET and HEAD,
and CGIHTTPRequestHandler for CGI scripts.
It does, however, optionally implement HTTP/1.1 persistent connections,
as of version 0.3.
Notes on CGIHTTPRequestHandler
------------------------------
This class implements GET and POST requests to cgi-bin scripts.
If the os.fork() function is not present (e.g. on Windows),
subprocess.Popen() is used as a fallback, with slightly altered semantics.
In all cases, the implementation is intentionally naive -- all
requests are executed synchronously.
SECURITY WARNING: DON'T USE THIS CODE UNLESS YOU ARE INSIDE A FIREWALL
-- it may execute arbitrary Python code or external programs.
Note that status code 200 is sent prior to execution of a CGI script, so
scripts cannot send other status codes such as 302 (redirect).
XXX To do:
- log requests even later (to capture byte count)
- log user-agent header and other interesting goodies
- send error log to separate file
"""
# See also:
#
# HTTP Working Group T. Berners-Lee
# INTERNET-DRAFT R. T. Fielding
# <draft-ietf-http-v10-spec-00.txt> H. Frystyk Nielsen
# Expires September 8, 1995 March 8, 1995
#
# URL: http://www.ics.uci.edu/pub/ietf/http/draft-ietf-http-v10-spec-00.txt
#
# and
#
# Network Working Group R. Fielding
# Request for Comments: 2616 et al
# Obsoletes: 2068 June 1999
# Category: Standards Track
#
# URL: http://www.faqs.org/rfcs/rfc2616.html
# Log files
# ---------
#
# Here's a quote from the NCSA httpd docs about log file format.
#
# | The logfile format is as follows. Each line consists of:
# |
# | host rfc931 authuser [DD/Mon/YYYY:hh:mm:ss] "request" ddd bbbb
# |
# | host: Either the DNS name or the IP number of the remote client
# | rfc931: Any information returned by identd for this person,
# | - otherwise.
# | authuser: If user sent a userid for authentication, the user name,
# | - otherwise.
# | DD: Day
# | Mon: Month (calendar name)
# | YYYY: Year
# | hh: hour (24-hour format, the machine's timezone)
# | mm: minutes
# | ss: seconds
# | request: The first line of the HTTP request as sent by the client.
# | ddd: the status code returned by the server, - if not available.
# | bbbb: the total number of bytes sent,
# | *not including the HTTP/1.0 header*, - if not available
# |
# | You can determine the name of the file accessed through request.
#
# (Actually, the latter is only true if you know the server configuration
# at the time the request was made!)
__version__ = "0.6"
__all__ = [
"HTTPServer", "BaseHTTPRequestHandler",
"SimpleHTTPRequestHandler", "CGIHTTPRequestHandler",
]
import html
import http.client
import io
import mimetypes
import os
import posixpath
import select
import shutil
import socket # For gethostbyaddr()
import socketserver
import sys
import time
import urllib.parse
import copy
import argparse
# Default error message template
DEFAULT_ERROR_MESSAGE = """\
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>Error response</title>
</head>
<body>
<h1>Error response</h1>
<p>Error code: %(code)d</p>
<p>Message: %(message)s.</p>
<p>Error code explanation: %(code)s - %(explain)s.</p>
</body>
</html>
"""
DEFAULT_ERROR_CONTENT_TYPE = "text/html;charset=utf-8"
def _quote_html(html):
return html.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
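# For example, _quote_html('<script>') returns '&lt;script&gt;', which is why
# send_error() below routes the message and explain text through it.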
class HTTPServer(socketserver.TCPServer):
allow_reuse_address = 1 # Seems to make sense in testing environment
def server_bind(self):
"""Override server_bind to store the server name."""
socketserver.TCPServer.server_bind(self)
host, port = self.socket.getsockname()[:2]
self.server_name = socket.getfqdn(host)
self.server_port = port
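# A minimal serving sketch (illustrative; binds a hypothetical port 8000):
#
#   httpd = HTTPServer(("", 8000), SimpleHTTPRequestHandler)
#   httpd.serve_forever()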
class BaseHTTPRequestHandler(socketserver.StreamRequestHandler):
"""HTTP request handler base class.
The following explanation of HTTP serves to guide you through the
code as well as to expose any misunderstandings I may have about
HTTP (so you don't need to read the code to figure out I'm wrong
:-).
HTTP (HyperText Transfer Protocol) is an extensible protocol on
top of a reliable stream transport (e.g. TCP/IP). The protocol
recognizes three parts to a request:
1. One line identifying the request type and path
2. An optional set of RFC-822-style headers
3. An optional data part
The headers and data are separated by a blank line.
The first line of the request has the form
<command> <path> <version>
where <command> is a (case-sensitive) keyword such as GET or POST,
<path> is a string containing path information for the request,
and <version> should be the string "HTTP/1.0" or "HTTP/1.1".
<path> is encoded using the URL encoding scheme (using %xx to signify
the ASCII character with hex code xx).
The specification specifies that lines are separated by CRLF but
for compatibility with the widest range of clients recommends
servers also handle LF. Similarly, whitespace in the request line
is treated sensibly (allowing multiple spaces between components
and allowing trailing whitespace).
Similarly, for output, lines ought to be separated by CRLF pairs
but most clients grok LF characters just fine.
If the first line of the request has the form
<command> <path>
(i.e. <version> is left out) then this is assumed to be an HTTP
0.9 request; this form has no optional headers and data part and
the reply consists of just the data.
The reply form of the HTTP 1.x protocol again has three parts:
1. One line giving the response code
2. An optional set of RFC-822-style headers
3. The data
Again, the headers and data are separated by a blank line.
The response code line has the form
<version> <responsecode> <responsestring>
where <version> is the protocol version ("HTTP/1.0" or "HTTP/1.1"),
<responsecode> is a 3-digit response code indicating success or
failure of the request, and <responsestring> is an optional
human-readable string explaining what the response code means.
This server parses the request and the headers, and then calls a
function specific to the request type (<command>). Specifically,
a request SPAM will be handled by a method do_SPAM(). If no
such method exists the server sends an error response to the
client. If it exists, it is called with no arguments:
do_SPAM()
Note that the request name is case sensitive (i.e. SPAM and spam
are different requests).
The various request details are stored in instance variables:
- client_address is the client IP address in the form (host,
port);
- command, path and version are the broken-down request line;
- headers is an instance of email.message.Message (or a derived
class) containing the header information;
- rfile is a file object open for reading positioned at the
start of the optional input data part;
- wfile is a file object open for writing.
IT IS IMPORTANT TO ADHERE TO THE PROTOCOL FOR WRITING!
The first thing to be written must be the response line. Then
follow 0 or more header lines, then a blank line, and then the
actual data (if any). The meaning of the header lines depends on
the command executed by the server; in most cases, when data is
returned, there should be at least one header line of the form
Content-type: <type>/<subtype>
where <type> and <subtype> should be registered MIME types,
e.g. "text/html" or "text/plain".
"""
# The Python system version, truncated to its first component.
sys_version = "Python/" + sys.version.split()[0]
# The server software version. You may want to override this.
# The format is multiple whitespace-separated strings,
# where each string is of the form name[/version].
server_version = "BaseHTTP/" + __version__
error_message_format = DEFAULT_ERROR_MESSAGE
error_content_type = DEFAULT_ERROR_CONTENT_TYPE
# The default request version. This only affects responses up until
# the point where the request line is parsed, so it mainly decides what
# the client gets back when sending a malformed request line.
# Most web servers default to HTTP 0.9, i.e. don't send a status line.
default_request_version = "HTTP/0.9"
def parse_request(self):
"""Parse a request (internal).
The request should be stored in self.raw_requestline; the results
are in self.command, self.path, self.request_version and
self.headers.
Return True for success, False for failure; on failure, an
error is sent back.
"""
self.command = None # set in case of error on the first line
self.request_version = version = self.default_request_version
self.close_connection = True
requestline = str(self.raw_requestline, 'iso-8859-1')
requestline = requestline.rstrip('\r\n')
self.requestline = requestline
words = requestline.split()
if len(words) == 3:
command, path, version = words
if version[:5] != 'HTTP/':
self.send_error(400, "Bad request version (%r)" % version)
return False
try:
base_version_number = version.split('/', 1)[1]
version_number = base_version_number.split(".")
# RFC 2145 section 3.1 says there can be only one "." and
# - major and minor numbers MUST be treated as
# separate integers;
# - HTTP/2.4 is a lower version than HTTP/2.13, which in
# turn is lower than HTTP/12.3;
# - Leading zeros MUST be ignored by recipients.
if len(version_number) != 2:
raise ValueError
version_number = int(version_number[0]), int(version_number[1])
except (ValueError, IndexError):
self.send_error(400, "Bad request version (%r)" % version)
return False
if version_number >= (1, 1) and self.protocol_version >= "HTTP/1.1":
self.close_connection = False
if version_number >= (2, 0):
self.send_error(505,
"Invalid HTTP Version (%s)" % base_version_number)
return False
elif len(words) == 2:
command, path = words
self.close_connection = True
if command != 'GET':
self.send_error(400,
"Bad HTTP/0.9 request type (%r)" % command)
return False
elif not words:
return False
else:
self.send_error(400, "Bad request syntax (%r)" % requestline)
return False
self.command, self.path, self.request_version = command, path, version
# Examine the headers and look for a Connection directive.
try:
self.headers = http.client.parse_headers(self.rfile,
_class=self.MessageClass)
except http.client.LineTooLong:
self.send_error(400, "Line too long")
return False
conntype = self.headers.get('Connection', "")
if conntype.lower() == 'close':
self.close_connection = True
elif (conntype.lower() == 'keep-alive' and
self.protocol_version >= "HTTP/1.1"):
self.close_connection = False
# Examine the headers and look for an Expect directive
expect = self.headers.get('Expect', "")
if (expect.lower() == "100-continue" and
self.protocol_version >= "HTTP/1.1" and
self.request_version >= "HTTP/1.1"):
if not self.handle_expect_100():
return False
return True
def handle_expect_100(self):
"""Decide what to do with an "Expect: 100-continue" header.
If the client is expecting a 100 Continue response, we must
respond with either a 100 Continue or a final response before
waiting for the request body. The default is to always respond
with a 100 Continue. You can behave differently (for example,
reject unauthorized requests) by overriding this method.
This method should either return True (possibly after sending
a 100 Continue response) or send an error response and return
False.
"""
self.send_response_only(100)
self.end_headers()
return True
def handle_one_request(self):
"""Handle a single HTTP request.
You normally don't need to override this method; see the class
__doc__ string for information on how to handle specific HTTP
commands such as GET and POST.
"""
try:
self.raw_requestline = self.rfile.readline(65537)
if len(self.raw_requestline) > 65536:
self.requestline = ''
self.request_version = ''
self.command = ''
self.send_error(414)
return
if not self.raw_requestline:
self.close_connection = True
return
if not self.parse_request():
# An error code has been sent, just exit
return
mname = 'do_' + self.command
if not hasattr(self, mname):
self.send_error(501, "Unsupported method (%r)" % self.command)
return
method = getattr(self, mname)
method()
self.wfile.flush() #actually send the response if not already done.
except socket.timeout as e:
#a read or a write timed out. Discard this connection
self.log_error("Request timed out: %r", e)
self.close_connection = True
return
def handle(self):
"""Handle multiple requests if necessary."""
self.close_connection = True
self.handle_one_request()
while not self.close_connection:
self.handle_one_request()
def send_error(self, code, message=None, explain=None):
"""Send and log an error reply.
Arguments are
* code: an HTTP error code
3 digits
* message: a simple optional 1 line reason phrase.
*( HTAB / SP / VCHAR / %x80-FF )
defaults to short entry matching the response code
* explain: a detailed message defaults to the long entry
matching the response code.
This sends an error response (so it must be called before any
output has been generated), logs the error, and finally sends
a piece of HTML explaining the error to the user.
"""
try:
shortmsg, longmsg = self.responses[code]
except KeyError:
shortmsg, longmsg = '???', '???'
if message is None:
message = shortmsg
if explain is None:
explain = longmsg
self.log_error("code %d, message %s", code, message)
# using _quote_html to prevent Cross Site Scripting attacks (see bug #1100201)
content = (self.error_message_format %
{'code': code, 'message': _quote_html(message), 'explain': _quote_html(explain)})
body = content.encode('UTF-8', 'replace')
self.send_response(code, message)
self.send_header("Content-Type", self.error_content_type)
self.send_header('Connection', 'close')
self.send_header('Content-Length', int(len(body)))
self.end_headers()
if self.command != 'HEAD' and code >= 200 and code not in (204, 304):
self.wfile.write(body)
def send_response(self, code, message=None):
"""Add the response header to the headers buffer and log the
response code.
Also send two standard headers with the server software
version and the current date.
"""
self.log_request(code)
self.send_response_only(code, message)
self.send_header('Server', self.version_string())
self.send_header('Date', self.date_time_string())
def send_response_only(self, code, message=None):
"""Send the response header only."""
if message is None:
if code in self.responses:
message = self.responses[code][0]
else:
message = ''
if self.request_version != 'HTTP/0.9':
if not hasattr(self, '_headers_buffer'):
self._headers_buffer = []
self._headers_buffer.append(("%s %d %s\r\n" %
(self.protocol_version, code, message)).encode(
'latin-1', 'strict'))
def send_header(self, keyword, value):
"""Send a MIME header to the headers buffer."""
if self.request_version != 'HTTP/0.9':
if not hasattr(self, '_headers_buffer'):
self._headers_buffer = []
self._headers_buffer.append(
("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict'))
if keyword.lower() == 'connection':
if value.lower() == 'close':
self.close_connection = True
elif value.lower() == 'keep-alive':
self.close_connection = False
def end_headers(self):
"""Send the blank line ending the MIME headers."""
if self.request_version != 'HTTP/0.9':
self._headers_buffer.append(b"\r\n")
self.flush_headers()
def flush_headers(self):
if hasattr(self, '_headers_buffer'):
self.wfile.write(b"".join(self._headers_buffer))
self._headers_buffer = []
def log_request(self, code='-', size='-'):
"""Log an accepted request.
This is called by send_response().
"""
self.log_message('"%s" %s %s',
self.requestline, str(code), str(size))
def log_error(self, format, *args):
"""Log an error.
This is called when a request cannot be fulfilled. By
default it passes the message on to log_message().
Arguments are the same as for log_message().
XXX This should go to the separate error log.
"""
self.log_message(format, *args)
def log_message(self, format, *args):
"""Log an arbitrary message.
This is used by all other logging functions. Override
it if you have specific logging wishes.
The first argument, FORMAT, is a format string for the
message to be logged. If the format string contains
any % escapes requiring parameters, they should be
specified as subsequent arguments (it's just like
printf!).
The client ip and current date/time are prefixed to
every message.
"""
sys.stderr.write("%s - - [%s] %s\n" %
(self.address_string(),
self.log_date_time_string(),
format%args))
def version_string(self):
"""Return the server software version string."""
return self.server_version + ' ' + self.sys_version
def date_time_string(self, timestamp=None):
"""Return the current date and time formatted for a message header."""
if timestamp is None:
timestamp = time.time()
year, month, day, hh, mm, ss, wd, y, z = time.gmtime(timestamp)
s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (
self.weekdayname[wd],
day, self.monthname[month], year,
hh, mm, ss)
return s
def log_date_time_string(self):
"""Return the current time formatted for logging."""
now = time.time()
year, month, day, hh, mm, ss, x, y, z = time.localtime(now)
s = "%02d/%3s/%04d %02d:%02d:%02d" % (
day, self.monthname[month], year, hh, mm, ss)
return s
weekdayname = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
monthname = [None,
'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
def address_string(self):
"""Return the client address."""
return self.client_address[0]
# Essentially static class variables
# The version of the HTTP protocol we support.
# Set this to HTTP/1.1 to enable automatic keepalive
protocol_version = "HTTP/1.0"
# MessageClass used to parse headers
MessageClass = http.client.HTTPMessage
# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
# See RFC 2616 and 6585.
responses = {
100: ('Continue', 'Request received, please continue'),
101: ('Switching Protocols',
'Switching to new protocol; obey Upgrade header'),
200: ('OK', 'Request fulfilled, document follows'),
201: ('Created', 'Document created, URL follows'),
202: ('Accepted',
'Request accepted, processing continues off-line'),
203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
204: ('No Content', 'Request fulfilled, nothing follows'),
205: ('Reset Content', 'Clear input form for further input.'),
206: ('Partial Content', 'Partial content follows.'),
300: ('Multiple Choices',
'Object has several resources -- see URI list'),
301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
302: ('Found', 'Object moved temporarily -- see URI list'),
303: ('See Other', 'Object moved -- see Method and URL list'),
304: ('Not Modified',
'Document has not changed since given time'),
305: ('Use Proxy',
'You must use proxy specified in Location to access this '
'resource.'),
307: ('Temporary Redirect',
'Object moved temporarily -- see URI list'),
400: ('Bad Request',
'Bad request syntax or unsupported method'),
401: ('Unauthorized',
'No permission -- see authorization schemes'),
402: ('Payment Required',
'No payment -- see charging schemes'),
403: ('Forbidden',
'Request forbidden -- authorization will not help'),
404: ('Not Found', 'Nothing matches the given URI'),
405: ('Method Not Allowed',
'Specified method is invalid for this resource.'),
406: ('Not Acceptable', 'URI not available in preferred format.'),
407: ('Proxy Authentication Required', 'You must authenticate with '
'this proxy before proceeding.'),
408: ('Request Timeout', 'Request timed out; try again later.'),
409: ('Conflict', 'Request conflict.'),
410: ('Gone',
'URI no longer exists and has been permanently removed.'),
411: ('Length Required', 'Client must specify Content-Length.'),
412: ('Precondition Failed', 'Precondition in headers is false.'),
413: ('Request Entity Too Large', 'Entity is too large.'),
414: ('Request-URI Too Long', 'URI is too long.'),
415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
416: ('Requested Range Not Satisfiable',
'Cannot satisfy request range.'),
417: ('Expectation Failed',
'Expect condition could not be satisfied.'),
428: ('Precondition Required',
'The origin server requires the request to be conditional.'),
429: ('Too Many Requests', 'The user has sent too many requests '
'in a given amount of time ("rate limiting").'),
431: ('Request Header Fields Too Large', 'The server is unwilling to '
'process the request because its header fields are too large.'),
500: ('Internal Server Error', 'Server got itself in trouble'),
501: ('Not Implemented',
'Server does not support this operation'),
502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
503: ('Service Unavailable',
'The server cannot process the request due to a high load'),
504: ('Gateway Timeout',
'The gateway server did not receive a timely response'),
505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
511: ('Network Authentication Required',
'The client needs to authenticate to gain network access.'),
}
class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
"""Simple HTTP request handler with GET and HEAD commands.
This serves files from the current directory and any of its
subdirectories. The MIME type for files is determined by
calling the .guess_type() method.
The GET and HEAD requests are identical except that the HEAD
request omits the actual contents of the file.
"""
server_version = "SimpleHTTP/" + __version__
def do_GET(self):
"""Serve a GET request."""
f = self.send_head()
if f:
try:
self.copyfile(f, self.wfile)
finally:
f.close()
def do_HEAD(self):
"""Serve a HEAD request."""
f = self.send_head()
if f:
f.close()
def send_head(self):
"""Common code for GET and HEAD commands.
This sends the response code and MIME headers.
Return value is either a file object (which has to be copied
to the outputfile by the caller unless the command was HEAD,
and must be closed by the caller under all circumstances), or
None, in which case the caller has nothing further to do.
"""
path = self.translate_path(self.path)
f = None
if os.path.isdir(path):
parts = urllib.parse.urlsplit(self.path)
if not parts.path.endswith('/'):
# redirect browser - doing basically what apache does
self.send_response(301)
new_parts = (parts[0], parts[1], parts[2] + '/',
parts[3], parts[4])
new_url = urllib.parse.urlunsplit(new_parts)
self.send_header("Location", new_url)
self.end_headers()
return None
for index in "index.html", "index.htm":
index = os.path.join(path, index)
if os.path.exists(index):
path = index
break
else:
return self.list_directory(path)
ctype = self.guess_type(path)
try:
f = open(path, 'rb')
except OSError:
self.send_error(404, "File not found")
return None
try:
self.send_response(200)
self.send_header("Content-type", ctype)
fs = os.fstat(f.fileno())
self.send_header("Content-Length", str(fs[6]))
self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
self.end_headers()
return f
except:
f.close()
raise
def list_directory(self, path):
"""Helper to produce a directory listing (absent index.html).
Return value is either a file object, or None (indicating an
error). In either case, the headers are sent, making the
interface the same as for send_head().
"""
try:
list = os.listdir(path)
except OSError:
self.send_error(404, "No permission to list directory")
return None
list.sort(key=lambda a: a.lower())
r = []
try:
displaypath = urllib.parse.unquote(self.path,
errors='surrogatepass')
except UnicodeDecodeError:
displaypath = urllib.parse.unquote(path)
displaypath = html.escape(displaypath)
enc = sys.getfilesystemencoding()
title = 'Directory listing for %s' % displaypath
r.append('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
'"http://www.w3.org/TR/html4/strict.dtd">')
r.append('<html>\n<head>')
r.append('<meta http-equiv="Content-Type" '
'content="text/html; charset=%s">' % enc)
r.append('<title>%s</title>\n</head>' % title)
r.append('<body>\n<h1>%s</h1>' % title)
r.append('<hr>\n<ul>')
for name in list:
fullname = os.path.join(path, name)
displayname = linkname = name
# Append / for directories or @ for symbolic links
if os.path.isdir(fullname):
displayname = name + "/"
linkname = name + "/"
if os.path.islink(fullname):
displayname = name + "@"
# Note: a link to a directory displays with @ and links with /
r.append('<li><a href="%s">%s</a></li>'
% (urllib.parse.quote(linkname,
errors='surrogatepass'),
html.escape(displayname)))
r.append('</ul>\n<hr>\n</body>\n</html>\n')
encoded = '\n'.join(r).encode(enc, 'surrogateescape')
f = io.BytesIO()
f.write(encoded)
f.seek(0)
self.send_response(200)
self.send_header("Content-type", "text/html; charset=%s" % enc)
self.send_header("Content-Length", str(len(encoded)))
self.end_headers()
return f
def translate_path(self, path):
"""Translate a /-separated PATH to the local filename syntax.
Components that mean special things to the local file system
(e.g. drive or directory names) are ignored. (XXX They should
probably be diagnosed.)
"""
# abandon query parameters
path = path.split('?',1)[0]
path = path.split('#',1)[0]
# Don't forget explicit trailing slash when normalizing. Issue17324
trailing_slash = path.rstrip().endswith('/')
try:
path = urllib.parse.unquote(path, errors='surrogatepass')
except UnicodeDecodeError:
path = urllib.parse.unquote(path)
path = posixpath.normpath(path)
words = path.split('/')
words = filter(None, words)
path = os.getcwd()
for word in words:
if os.path.dirname(word) or word in (os.curdir, os.pardir):
# Ignore components that are not a simple file/directory name
continue
path = os.path.join(path, word)
if trailing_slash:
path += '/'
return path
def copyfile(self, source, outputfile):
"""Copy all data between two file objects.
The SOURCE argument is a file object open for reading
(or anything with a read() method) and the DESTINATION
argument is a file object open for writing (or
anything with a write() method).
The only reason for overriding this would be to change
the block size or perhaps to replace newlines by CRLF
-- note, however, that the default server uses this
to copy binary data as well.
"""
shutil.copyfileobj(source, outputfile)
def guess_type(self, path):
"""Guess the type of a file.
Argument is a PATH (a filename).
Return value is a string of the form type/subtype,
usable for a MIME Content-type header.
The default implementation looks the file's extension
up in the table self.extensions_map, using application/octet-stream
as a default; however it would be permissible (if
slow) to look inside the data to make a better guess.
"""
base, ext = posixpath.splitext(path)
if ext in self.extensions_map:
return self.extensions_map[ext]
ext = ext.lower()
if ext in self.extensions_map:
return self.extensions_map[ext]
else:
return self.extensions_map['']
if not mimetypes.inited:
mimetypes.init() # try to read system mime.types
extensions_map = mimetypes.types_map.copy()
extensions_map.update({
'': 'application/octet-stream', # Default
'.py': 'text/plain',
'.c': 'text/plain',
'.h': 'text/plain',
})
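# Sketch (assumed subclass, not part of the original module): because
# guess_type() resolves through the class-level extensions_map, a subclass
# can add MIME types without touching the lookup logic. The '.md' entry is
# just an example.
class _MarkdownAwareHandler(SimpleHTTPRequestHandler):
    extensions_map = SimpleHTTPRequestHandler.extensions_map.copy()
    extensions_map.update({'.md': 'text/markdown'})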
# Utilities for CGIHTTPRequestHandler
def _url_collapse_path(path):
"""
Given a URL path, remove extra '/'s and '.' path elements, collapse
any '..' references, and return the collapsed path.
Implements something akin to RFC-2396 5.2 step 6 to parse relative paths.
The utility of this function is limited to the is_cgi method and helps
prevent some security attacks.
Returns: The reconstituted URL, which will always start with a '/'.
Raises: IndexError if too many '..' occur within the path.
"""
# Query component should not be involved.
path, _, query = path.partition('?')
path = urllib.parse.unquote(path)
# Similar to os.path.split(os.path.normpath(path)) but specific to URL
# path semantics rather than local operating system semantics.
path_parts = path.split('/')
head_parts = []
for part in path_parts[:-1]:
if part == '..':
head_parts.pop() # IndexError if more '..' than prior parts
elif part and part != '.':
head_parts.append(part)
if path_parts:
tail_part = path_parts.pop()
if tail_part:
if tail_part == '..':
head_parts.pop()
tail_part = ''
elif tail_part == '.':
tail_part = ''
else:
tail_part = ''
if query:
tail_part = '?'.join((tail_part, query))
splitpath = ('/' + '/'.join(head_parts), tail_part)
collapsed_path = "/".join(splitpath)
return collapsed_path
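# Illustrative expectations for _url_collapse_path() (paths invented):
#   _url_collapse_path('/cgi-bin//foo/./bar') -> '/cgi-bin/foo/bar'
#   _url_collapse_path('/cgi-bin/foo/..')     -> '/cgi-bin/'
#   _url_collapse_path('/a/../..')            raises IndexError (cannot
#       climb above the root, which is the security property is_cgi needs)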
nobody = None
def nobody_uid():
"""Internal routine to get nobody's uid"""
global nobody
if nobody:
return nobody
try:
import pwd
except ImportError:
return -1
try:
nobody = pwd.getpwnam('nobody')[2]
except KeyError:
nobody = 1 + max(x[2] for x in pwd.getpwall())
return nobody
def executable(path):
"""Test for executable file."""
return os.access(path, os.X_OK)
class CGIHTTPRequestHandler(SimpleHTTPRequestHandler):
"""Complete HTTP server with GET, HEAD and POST commands.
GET and HEAD also support running CGI scripts.
The POST command is *only* implemented for CGI scripts.
"""
# Determine platform specifics
have_fork = hasattr(os, 'fork')
# Make rfile unbuffered -- we need to read one line and then pass
# the rest to a subprocess, so we can't use buffered input.
rbufsize = 0
def do_POST(self):
"""Serve a POST request.
This is only implemented for CGI scripts.
"""
if self.is_cgi():
self.run_cgi()
else:
self.send_error(501, "Can only POST to CGI scripts")
def send_head(self):
"""Version of send_head that support CGI scripts"""
if self.is_cgi():
return self.run_cgi()
else:
return SimpleHTTPRequestHandler.send_head(self)
def is_cgi(self):
"""Test whether self.path corresponds to a CGI script.
Returns True and updates the cgi_info attribute to the tuple
(dir, rest) if self.path requires running a CGI script.
Returns False otherwise.
If any exception is raised, the caller should assume that
self.path was rejected as invalid and act accordingly.
The default implementation tests whether the normalized URL
path begins with one of the strings in self.cgi_directories
(and the next character is a '/' or the end of the string).
"""
collapsed_path = _url_collapse_path(self.path)
dir_sep = collapsed_path.find('/', 1)
head, tail = collapsed_path[:dir_sep], collapsed_path[dir_sep+1:]
if head in self.cgi_directories:
self.cgi_info = head, tail
return True
return False
cgi_directories = ['/cgi-bin', '/htbin']
def is_executable(self, path):
"""Test whether argument path is an executable file."""
return executable(path)
def is_python(self, path):
"""Test whether argument path is a Python script."""
head, tail = os.path.splitext(path)
return tail.lower() in (".py", ".pyw")
def run_cgi(self):
"""Execute a CGI script."""
dir, rest = self.cgi_info
path = dir + '/' + rest
i = path.find('/', len(dir)+1)
while i >= 0:
nextdir = path[:i]
nextrest = path[i+1:]
scriptdir = self.translate_path(nextdir)
if os.path.isdir(scriptdir):
dir, rest = nextdir, nextrest
i = path.find('/', len(dir)+1)
else:
break
# find an explicit query string, if present.
rest, _, query = rest.partition('?')
# dissect the part after the directory name into a script name &
# a possible additional path, to be stored in PATH_INFO.
i = rest.find('/')
if i >= 0:
script, rest = rest[:i], rest[i:]
else:
script, rest = rest, ''
scriptname = dir + '/' + script
scriptfile = self.translate_path(scriptname)
if not os.path.exists(scriptfile):
self.send_error(404, "No such CGI script (%r)" % scriptname)
return
if not os.path.isfile(scriptfile):
self.send_error(403, "CGI script is not a plain file (%r)" %
scriptname)
return
ispy = self.is_python(scriptname)
if self.have_fork or not ispy:
if not self.is_executable(scriptfile):
self.send_error(403, "CGI script is not executable (%r)" %
scriptname)
return
# Reference: http://hoohoo.ncsa.uiuc.edu/cgi/env.html
# XXX Much of the following could be prepared ahead of time!
env = copy.deepcopy(os.environ)
env['SERVER_SOFTWARE'] = self.version_string()
env['SERVER_NAME'] = self.server.server_name
env['GATEWAY_INTERFACE'] = 'CGI/1.1'
env['SERVER_PROTOCOL'] = self.protocol_version
env['SERVER_PORT'] = str(self.server.server_port)
env['REQUEST_METHOD'] = self.command
uqrest = urllib.parse.unquote(rest)
env['PATH_INFO'] = uqrest
env['PATH_TRANSLATED'] = self.translate_path(uqrest)
env['SCRIPT_NAME'] = scriptname
if query:
env['QUERY_STRING'] = query
env['REMOTE_ADDR'] = self.client_address[0]
authorization = self.headers.get("authorization")
if authorization:
authorization = authorization.split()
if len(authorization) == 2:
import base64, binascii
env['AUTH_TYPE'] = authorization[0]
if authorization[0].lower() == "basic":
try:
authorization = authorization[1].encode('ascii')
authorization = base64.decodebytes(authorization).\
decode('ascii')
except (binascii.Error, UnicodeError):
pass
else:
authorization = authorization.split(':')
if len(authorization) == 2:
env['REMOTE_USER'] = authorization[0]
# XXX REMOTE_IDENT
if self.headers.get('content-type') is None:
env['CONTENT_TYPE'] = self.headers.get_content_type()
else:
env['CONTENT_TYPE'] = self.headers['content-type']
length = self.headers.get('content-length')
if length:
env['CONTENT_LENGTH'] = length
referer = self.headers.get('referer')
if referer:
env['HTTP_REFERER'] = referer
accept = []
for line in self.headers.getallmatchingheaders('accept'):
if line[:1] in "\t\n\r ":
accept.append(line.strip())
else:
accept = accept + line[7:].split(',')
env['HTTP_ACCEPT'] = ','.join(accept)
ua = self.headers.get('user-agent')
if ua:
env['HTTP_USER_AGENT'] = ua
co = filter(None, self.headers.get_all('cookie', []))
cookie_str = ', '.join(co)
if cookie_str:
env['HTTP_COOKIE'] = cookie_str
# XXX Other HTTP_* headers
# Since we're setting the env in the parent, provide empty
# values to override previously set values
for k in ('QUERY_STRING', 'REMOTE_HOST', 'CONTENT_LENGTH',
'HTTP_USER_AGENT', 'HTTP_COOKIE', 'HTTP_REFERER'):
env.setdefault(k, "")
self.send_response(200, "Script output follows")
self.flush_headers()
decoded_query = query.replace('+', ' ')
if self.have_fork:
# Unix -- fork as we should
args = [script]
if '=' not in decoded_query:
args.append(decoded_query)
nobody = nobody_uid()
self.wfile.flush() # Always flush before forking
pid = os.fork()
if pid != 0:
# Parent
pid, sts = os.waitpid(pid, 0)
# throw away additional data [see bug #427345]
while select.select([self.rfile], [], [], 0)[0]:
if not self.rfile.read(1):
break
if sts:
self.log_error("CGI script exit status %#x", sts)
return
# Child
try:
try:
os.setuid(nobody)
except OSError:
pass
os.dup2(self.rfile.fileno(), 0)
os.dup2(self.wfile.fileno(), 1)
os.execve(scriptfile, args, env)
except:
self.server.handle_error(self.request, self.client_address)
os._exit(127)
else:
# Non-Unix -- use subprocess
import subprocess
cmdline = [scriptfile]
if self.is_python(scriptfile):
interp = sys.executable
if interp.lower().endswith("w.exe"):
# On Windows, use python.exe, not pythonw.exe
interp = interp[:-5] + interp[-4:]
cmdline = [interp, '-u'] + cmdline
if '=' not in query:
cmdline.append(query)
self.log_message("command: %s", subprocess.list2cmdline(cmdline))
try:
nbytes = int(length)
except (TypeError, ValueError):
nbytes = 0
p = subprocess.Popen(cmdline,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
env=env
)
if self.command.lower() == "post" and nbytes > 0:
data = self.rfile.read(nbytes)
else:
data = None
# throw away additional data [see bug #427345]
while select.select([self.rfile._sock], [], [], 0)[0]:
if not self.rfile._sock.recv(1):
break
stdout, stderr = p.communicate(data)
self.wfile.write(stdout)
if stderr:
self.log_error('%s', stderr)
p.stderr.close()
p.stdout.close()
status = p.returncode
if status:
self.log_error("CGI script exit status %#x", status)
else:
self.log_message("CGI script exited OK")
def test(HandlerClass=BaseHTTPRequestHandler,
ServerClass=HTTPServer, protocol="HTTP/1.0", port=8000, bind=""):
"""Test the HTTP request handler class.
This runs an HTTP server on port 8000 (or the port argument).
"""
server_address = (bind, port)
HandlerClass.protocol_version = protocol
httpd = ServerClass(server_address, HandlerClass)
sa = httpd.socket.getsockname()
print("Serving HTTP on", sa[0], "port", sa[1], "...")
try:
httpd.serve_forever()
except KeyboardInterrupt:
print("\nKeyboard interrupt received, exiting.")
httpd.server_close()
sys.exit(0)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--cgi', action='store_true',
help='Run as CGI Server')
parser.add_argument('--bind', '-b', default='', metavar='ADDRESS',
help='Specify alternate bind address '
'[default: all interfaces]')
parser.add_argument('port', action='store',
default=8000, type=int,
nargs='?',
help='Specify alternate port [default: 8000]')
args = parser.parse_args()
if args.cgi:
handler_class = CGIHTTPRequestHandler
else:
handler_class = SimpleHTTPRequestHandler
test(HandlerClass=handler_class, port=args.port, bind=args.bind)
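# Sketch (not part of the original module): the programmatic equivalent of
# running this file with no arguments. HTTPServer and
# SimpleHTTPRequestHandler are defined above; the port is arbitrary.
def _serve_cwd(port=8000):
    httpd = HTTPServer(('', port), SimpleHTTPRequestHandler)
    httpd.serve_forever()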
# This directory is a Python package.
"""Abstract base classes related to import."""
from . import _bootstrap
from . import machinery
try:
import _frozen_importlib
except ImportError as exc:
if exc.name != '_frozen_importlib':
raise
_frozen_importlib = None
import abc
def _register(abstract_cls, *classes):
for cls in classes:
abstract_cls.register(cls)
if _frozen_importlib is not None:
frozen_cls = getattr(_frozen_importlib, cls.__name__)
abstract_cls.register(frozen_cls)
class Finder(metaclass=abc.ABCMeta):
"""Legacy abstract base class for import finders.
It may be subclassed for compatibility with legacy third party
reimplementations of the import system. Otherwise, finder
implementations should derive from the more specific MetaPathFinder
or PathEntryFinder ABCs.
"""
@abc.abstractmethod
def find_module(self, fullname, path=None):
"""An abstract method that should find a module.
The fullname is a str and the optional path is a str or None.
Returns a Loader object or None.
"""
class MetaPathFinder(Finder):
"""Abstract base class for import finders on sys.meta_path."""
# We don't define find_spec() here since that would break
# hasattr checks we do to support backward compatibility.
def find_module(self, fullname, path):
"""Return a loader for the module.
If no module is found, return None. The fullname is a str and
the path is a list of strings or None.
This method is deprecated in favor of finder.find_spec(). If find_spec()
exists then backwards-compatible functionality is provided for this
method.
"""
if not hasattr(self, 'find_spec'):
return None
found = self.find_spec(fullname, path)
return found.loader if found is not None else None
def invalidate_caches(self):
"""An optional method for clearing the finder's cache, if any.
This method is used by importlib.invalidate_caches().
"""
_register(MetaPathFinder, machinery.BuiltinImporter, machinery.FrozenImporter,
machinery.PathFinder, machinery.WindowsRegistryFinder)
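# Sketch (assumed example, not part of the original module): a minimal
# conforming meta path finder. It implements find_spec() and inherits the
# deprecated find_module() shim above; 'virtual_mod' is an invented name.
class _OneModuleFinder(MetaPathFinder):
    def __init__(self, loader):
        self._loader = loader
    def find_spec(self, fullname, path, target=None):
        if fullname == 'virtual_mod':
            return machinery.ModuleSpec(fullname, self._loader)
        return None  # defer to the remaining meta path finders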
class PathEntryFinder(Finder):
"""Abstract base class for path entry finders used by PathFinder."""
# We don't define find_spec() here since that would break
# hasattr checks we do to support backward compatibility.
def find_loader(self, fullname):
"""Return (loader, namespace portion) for the path entry.
The fullname is a str. The namespace portion is a sequence of
path entries contributing to part of a namespace package. The
sequence may be empty. If loader is not None, the portion will
be ignored.
The portion will be discarded if another path entry finder
locates the module as a normal module or package.
This method is deprecated in favor of finder.find_spec(). If find_spec()
is provided, then backwards-compatible functionality is provided.
"""
if not hasattr(self, 'find_spec'):
return None, []
found = self.find_spec(fullname)
if found is not None:
if not found.submodule_search_locations:
portions = []
else:
portions = found.submodule_search_locations
return found.loader, portions
else:
return None, []
find_module = _bootstrap._find_module_shim
def invalidate_caches(self):
"""An optional method for clearing the finder's cache, if any.
This method is used by PathFinder.invalidate_caches().
"""
_register(PathEntryFinder, machinery.FileFinder)
class Loader(metaclass=abc.ABCMeta):
"""Abstract base class for import loaders."""
def create_module(self, spec):
"""Return a module to initialize and into which to load.
This method should raise ImportError if anything prevents it
from creating a new module. It may return None to indicate
that the spec should create the new module.
create_module() is optional.
"""
# By default, defer to _SpecMethods.create() for the new module.
return None
# We don't define exec_module() here since that would break
# hasattr checks we do to support backward compatibility.
def load_module(self, fullname):
"""Return the loaded module.
The module must be added to sys.modules and have import-related
attributes set properly. The fullname is a str.
ImportError is raised on failure.
This method is deprecated in favor of loader.exec_module(). If
exec_module() exists then it is used to provide a backwards-compatible
functionality for this method.
"""
if not hasattr(self, 'exec_module'):
raise ImportError
return _bootstrap._load_module_shim(self, fullname)
def module_repr(self, module):
"""Return a module's repr.
Used by the module type when the method does not raise
NotImplementedError.
This method is deprecated.
"""
# The exception will cause ModuleType.__repr__ to ignore this method.
raise NotImplementedError
class ResourceLoader(Loader):
"""Abstract base class for loaders which can return data from their
back-end storage.
This ABC represents one of the optional protocols specified by PEP 302.
"""
@abc.abstractmethod
def get_data(self, path):
"""Abstract method which when implemented should return the bytes for
the specified path. The path must be a str."""
raise IOError
class InspectLoader(Loader):
"""Abstract base class for loaders which support inspection about the
modules they can load.
This ABC represents one of the optional protocols specified by PEP 302.
"""
def is_package(self, fullname):
"""Optional method which when implemented should return whether the
module is a package. The fullname is a str. Returns a bool.
Raises ImportError if the module cannot be found.
"""
raise ImportError
def get_code(self, fullname):
"""Method which returns the code object for the module.
The fullname is a str. Returns a types.CodeType if possible, else
returns None if a code object does not make sense
(e.g. built-in module). Raises ImportError if the module cannot be
found.
"""
source = self.get_source(fullname)
if source is None:
return None
return self.source_to_code(source)
@abc.abstractmethod
def get_source(self, fullname):
"""Abstract method which should return the source code for the
module. The fullname is a str. Returns a str.
Raises ImportError if the module cannot be found.
"""
raise ImportError
def source_to_code(self, data, path='<string>'):
"""Compile 'data' into a code object.
The 'data' argument can be anything that compile() can handle. The 'path'
argument should be where the data was retrieved (when applicable)."""
return compile(data, path, 'exec', dont_inherit=True)
exec_module = _bootstrap._LoaderBasics.exec_module
load_module = _bootstrap._LoaderBasics.load_module
_register(InspectLoader, machinery.BuiltinImporter, machinery.FrozenImporter)
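# Sketch (assumed example, not part of the original module): the smallest
# useful InspectLoader. Only the abstract get_source() must be supplied;
# get_code() then works via source_to_code() as defined above.
class _StringLoader(InspectLoader):
    def __init__(self, source):
        self._source = source
    def get_source(self, fullname):
        return self._source
# e.g. _StringLoader('ANSWER = 42').get_code('demo') compiles the string
# into a code object usable by exec_module().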
class ExecutionLoader(InspectLoader):
"""Abstract base class for loaders that wish to support the execution of
modules as scripts.
This ABC represents one of the optional protocols specified in PEP 302.
"""
@abc.abstractmethod
def get_filename(self, fullname):
"""Abstract method which should return the value that __file__ is to be
set to.
Raises ImportError if the module cannot be found.
"""
raise ImportError
def get_code(self, fullname):
"""Method to return the code object for fullname.
Should return None if not applicable (e.g. built-in module).
Raise ImportError if the module cannot be found.
"""
source = self.get_source(fullname)
if source is None:
return None
try:
path = self.get_filename(fullname)
except ImportError:
return self.source_to_code(source)
else:
return self.source_to_code(source, path)
_register(ExecutionLoader, machinery.ExtensionFileLoader)
class FileLoader(_bootstrap.FileLoader, ResourceLoader, ExecutionLoader):
"""Abstract base class partially implementing the ResourceLoader and
ExecutionLoader ABCs."""
_register(FileLoader, machinery.SourceFileLoader,
machinery.SourcelessFileLoader)
class SourceLoader(_bootstrap.SourceLoader, ResourceLoader, ExecutionLoader):
"""Abstract base class for loading source code (and optionally any
corresponding bytecode).
To support loading from source code, the abstractmethods inherited from
ResourceLoader and ExecutionLoader need to be implemented. To also support
loading from bytecode, the optional methods specified directly by this ABC
are required.
Inherited abstractmethods not implemented in this ABC:
* ResourceLoader.get_data
* ExecutionLoader.get_filename
"""
def path_mtime(self, path):
"""Return the (int) modification time for the path (str)."""
if self.path_stats.__func__ is SourceLoader.path_stats:
raise IOError
return int(self.path_stats(path)['mtime'])
def path_stats(self, path):
"""Return a metadata dict for the source pointed to by the path (str).
Possible keys:
- 'mtime' (mandatory) is the numeric timestamp of last source
code modification;
- 'size' (optional) is the size in bytes of the source code.
"""
if self.path_mtime.__func__ is SourceLoader.path_mtime:
raise IOError
return {'mtime': self.path_mtime(path)}
def set_data(self, path, data):
"""Write the bytes to the path (if possible).
Accepts a str path and data as bytes.
Any needed intermediary directories are to be created. If for some
reason the file cannot be written because of permissions, fail
silently.
"""
_register(SourceLoader, machinery.SourceFileLoader)
"""The machinery of importlib: finders, loaders, hooks, etc."""
import _imp
from ._bootstrap import (SOURCE_SUFFIXES, DEBUG_BYTECODE_SUFFIXES,
OPTIMIZED_BYTECODE_SUFFIXES, BYTECODE_SUFFIXES,
EXTENSION_SUFFIXES)
from ._bootstrap import ModuleSpec
from ._bootstrap import BuiltinImporter
from ._bootstrap import FrozenImporter
from ._bootstrap import WindowsRegistryFinder
from ._bootstrap import PathFinder
from ._bootstrap import FileFinder
from ._bootstrap import SourceFileLoader
from ._bootstrap import SourcelessFileLoader
from ._bootstrap import ExtensionFileLoader
def all_suffixes():
"""Returns a list of all recognized module suffixes for this process"""
return SOURCE_SUFFIXES + BYTECODE_SUFFIXES + EXTENSION_SUFFIXES
"""Utility code for constructing importers, etc."""
from ._bootstrap import MAGIC_NUMBER
from ._bootstrap import cache_from_source
from ._bootstrap import decode_source
from ._bootstrap import source_from_cache
from ._bootstrap import spec_from_loader
from ._bootstrap import spec_from_file_location
from ._bootstrap import _resolve_name
from ._bootstrap import _find_spec
from contextlib import contextmanager
import functools
import sys
import warnings
def resolve_name(name, package):
"""Resolve a relative module name to an absolute one."""
if not name.startswith('.'):
return name
elif not package:
raise ValueError('no package specified for {!r} '
'(required for relative module names)'.format(name))
level = 0
for character in name:
if character != '.':
break
level += 1
return _resolve_name(name[level:], package, level)
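# Illustrative expectations for resolve_name() (package names invented):
#   resolve_name('sys', 'pkg')       -> 'sys'      (absolute names pass through)
#   resolve_name('.mod', 'pkg')      -> 'pkg.mod'  (one dot: current package)
#   resolve_name('..mod', 'pkg.sub') -> 'pkg.mod'  (two dots: parent package)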
def _find_spec_from_path(name, path=None):
"""Return the spec for the specified module.
First, sys.modules is checked to see if the module was already imported. If
so, then sys.modules[name].__spec__ is returned. If that happens to be
set to None, then ValueError is raised. If the module is not in
sys.modules, then sys.meta_path is searched for a suitable spec with the
value of 'path' given to the finders. None is returned if no spec could
be found.
Dotted names do not have their parent packages implicitly imported. You will
most likely need to explicitly import all parent packages in the proper
order for a submodule to get the correct spec.
"""
if name not in sys.modules:
return _find_spec(name, path)
else:
module = sys.modules[name]
if module is None:
return None
try:
spec = module.__spec__
except AttributeError:
raise ValueError('{}.__spec__ is not set'.format(name))
else:
if spec is None:
raise ValueError('{}.__spec__ is None'.format(name))
return spec
def find_spec(name, package=None):
"""Return the spec for the specified module.
First, sys.modules is checked to see if the module was already imported. If
so, then sys.modules[name].__spec__ is returned. If that happens to be
set to None, then ValueError is raised. If the module is not in
sys.modules, then sys.meta_path is searched for a suitable spec with the
value of 'path' given to the finders. None is returned if no spec could
be found.
If the name is for submodule (contains a dot), the parent module is
automatically imported.
The name and package arguments work the same as importlib.import_module().
In other words, relative module names (with leading dots) work.
"""
fullname = resolve_name(name, package) if name.startswith('.') else name
if fullname not in sys.modules:
parent_name = fullname.rpartition('.')[0]
if parent_name:
# Use builtins.__import__() in case someone replaced it.
parent = __import__(parent_name, fromlist=['__path__'])
return _find_spec(fullname, parent.__path__)
else:
return _find_spec(fullname, None)
else:
module = sys.modules[fullname]
if module is None:
return None
try:
spec = module.__spec__
except AttributeError:
raise ValueError('{}.__spec__ is not set'.format(name))
else:
if spec is None:
raise ValueError('{}.__spec__ is None'.format(name))
return spec
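# Typical use, via the public importlib.util.find_spec() (names illustrative):
#   spec = find_spec('json.decoder')   # imports the parent 'json' first
#   if spec is not None:
#       print(spec.name, spec.origin)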
@contextmanager
def _module_to_load(name):
is_reload = name in sys.modules
module = sys.modules.get(name)
if not is_reload:
# This must be done before open() is called as the 'io' module
# implicitly imports 'locale' and would otherwise trigger an
# infinite loop.
module = type(sys)(name)
# This must be done before putting the module in sys.modules
# (otherwise an optimization shortcut in import.c becomes wrong)
module.__initializing__ = True
sys.modules[name] = module
try:
yield module
except Exception:
if not is_reload:
try:
del sys.modules[name]
except KeyError:
pass
finally:
module.__initializing__ = False
def set_package(fxn):
"""Set __package__ on the returned module.
This function is deprecated.
"""
@functools.wraps(fxn)
def set_package_wrapper(*args, **kwargs):
warnings.warn('The import system now takes care of this automatically.',
DeprecationWarning, stacklevel=2)
module = fxn(*args, **kwargs)
if getattr(module, '__package__', None) is None:
module.__package__ = module.__name__
if not hasattr(module, '__path__'):
module.__package__ = module.__package__.rpartition('.')[0]
return module
return set_package_wrapper
def set_loader(fxn):
"""Set __loader__ on the returned module.
This function is deprecated.
"""
@functools.wraps(fxn)
def set_loader_wrapper(self, *args, **kwargs):
warnings.warn('The import system now takes care of this automatically.',
DeprecationWarning, stacklevel=2)
module = fxn(self, *args, **kwargs)
if getattr(module, '__loader__', None) is None:
module.__loader__ = self
return module
return set_loader_wrapper
def module_for_loader(fxn):
"""Decorator to handle selecting the proper module for loaders.
The decorated function is passed the module to use instead of the module
name. The module passed in to the function is either from sys.modules if
it already exists or is a new module. If the module is new, then __name__
is set to the first argument to the method, __loader__ is set to self, and
__package__ is set accordingly (if self.is_package() is defined) before the
module is passed to the decorated function (if self.is_package() does not
work for the module, __package__ will be set post-load).
If an exception is raised and the decorator created the module it is
subsequently removed from sys.modules.
The decorator assumes that the decorated function takes the module name as
the second argument.
"""
warnings.warn('The import system now takes care of this automatically.',
DeprecationWarning, stacklevel=2)
@functools.wraps(fxn)
def module_for_loader_wrapper(self, fullname, *args, **kwargs):
with _module_to_load(fullname) as module:
module.__loader__ = self
try:
is_package = self.is_package(fullname)
except (ImportError, AttributeError):
pass
else:
if is_package:
module.__package__ = fullname
else:
module.__package__ = fullname.rpartition('.')[0]
# If __package__ was not set above, __import__() will do it later.
return fxn(self, module, *args, **kwargs)
return module_for_loader_wrapper
"""Core implementation of import.
This module is NOT meant to be directly imported! It has been designed such
that it can be bootstrapped into Python as the implementation of import. As
such it requires the injection of specific modules and attributes in order to
work. One should use importlib as the public-facing version of this module.
"""
#
# IMPORTANT: Whenever making changes to this module, be sure to run
# a top-level make in order to get the frozen version of the module
# updated. Not doing so will cause the Makefile to fail for
# all others who don't have a ./python around to freeze the module
# in the early stages of compilation.
#
# See importlib._setup() for what is injected into the global namespace.
# When editing this code be aware that code executed at import time CANNOT
# reference any injected objects! This includes not only global code but also
# anything specified at the class level.
# Bootstrap-related code ######################################################
_CASE_INSENSITIVE_PLATFORMS = 'win', 'cygwin', 'darwin'
_unspecified = object() # ironpython: default value for dict.get
def _make_relax_case():
if sys.platform.startswith(_CASE_INSENSITIVE_PLATFORMS):
def _relax_case():
"""True if filenames must be checked case-insensitively."""
return b'PYTHONCASEOK' in _os.environ
else:
def _relax_case():
"""True if filenames must be checked case-insensitively."""
return False
return _relax_case
def _w_long(x):
"""Convert a 32-bit integer to little-endian."""
return (int(x) & 0xFFFFFFFF).to_bytes(4, 'little')
def _r_long(int_bytes):
"""Convert 4 bytes in little-endian to an integer."""
return int.from_bytes(int_bytes, 'little')
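# The two helpers are exact inverses for values that fit in 32 bits, which
# the .pyc header code below relies on:
#   _w_long(1)                    -> b'\x01\x00\x00\x00'
#   _r_long(b'\x01\x00\x00\x00')  -> 1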
def _path_join(*path_parts):
"""Replacement for os.path.join()."""
return path_sep.join([part.rstrip(path_separators)
for part in path_parts if part])
def _path_split(path):
"""Replacement for os.path.split()."""
if len(path_separators) == 1:
front, _, tail = path.rpartition(path_sep)
return front, tail
for x in reversed(path):
if x in path_separators:
front, tail = path.rsplit(x, maxsplit=1)
return front, tail
return '', path
def _path_stat(path):
"""Stat the path.
Made a separate function to make it easier to override in experiments
(e.g. cache stat results).
"""
return _os.stat(path)
def _path_is_mode_type(path, mode):
"""Test whether the path is the specified mode type."""
try:
stat_info = _path_stat(path)
except OSError:
return False
return (stat_info.st_mode & 0o170000) == mode
def _path_isfile(path):
"""Replacement for os.path.isfile."""
return _path_is_mode_type(path, 0o100000)
def _path_isdir(path):
"""Replacement for os.path.isdir."""
if not path:
path = _os.getcwd()
return _path_is_mode_type(path, 0o040000)
def _write_atomic(path, data, mode=0o666):
"""Best-effort function to write data to a path atomically.
Be prepared to handle a FileExistsError if concurrent writing of the
temporary file is attempted."""
# id() is used to generate a pseudo-random filename.
path_tmp = '{}.{}'.format(path, id(path))
fd = _os.open(path_tmp,
_os.O_EXCL | _os.O_CREAT | _os.O_WRONLY, mode & 0o666)
try:
# We first write data to a temporary file, and then use os.replace() to
# perform an atomic rename.
with _io.FileIO(fd, 'wb') as file:
file.write(data)
_os.replace(path_tmp, path)
except OSError:
try:
_os.unlink(path_tmp)
except OSError:
pass
raise
def _wrap(new, old):
"""Simple substitute for functools.update_wrapper."""
for replace in ['__module__', '__name__', '__qualname__', '__doc__']:
if hasattr(old, replace):
setattr(new, replace, getattr(old, replace))
new.__dict__.update(old.__dict__)
def _new_module(name):
return type(sys)(name)
_code_type = type(_wrap.__code__)
class _ManageReload:
"""Manages the possible clean-up of sys.modules for load_module()."""
def __init__(self, name):
self._name = name
def __enter__(self):
self._is_reload = self._name in sys.modules
def __exit__(self, *args):
if any(arg is not None for arg in args) and not self._is_reload:
try:
del sys.modules[self._name]
except KeyError:
pass
# Module-level locking ########################################################
# A dict mapping module names to weakrefs of _ModuleLock instances
_module_locks = {}
# A dict mapping thread ids to _ModuleLock instances
_blocking_on = {}
class _DeadlockError(RuntimeError):
pass
class _ModuleLock:
"""A recursive lock implementation which is able to detect deadlocks
(e.g. thread 1 trying to take locks A then B, and thread 2 trying to
take locks B then A).
"""
def __init__(self, name):
self.lock = _thread.allocate_lock()
self.wakeup = _thread.allocate_lock()
self.name = name
self.owner = None
self.count = 0
self.waiters = 0
def has_deadlock(self):
# Deadlock avoidance for concurrent circular imports.
me = _thread.get_ident()
tid = self.owner
while True:
lock = _blocking_on.get(tid)
if lock is None:
return False
tid = lock.owner
if tid == me:
return True
def acquire(self):
"""
Acquire the module lock. If a potential deadlock is detected,
a _DeadlockError is raised.
Otherwise, the lock is always acquired and True is returned.
"""
tid = _thread.get_ident()
_blocking_on[tid] = self
try:
while True:
with self.lock:
if self.count == 0 or self.owner == tid:
self.owner = tid
self.count += 1
return True
if self.has_deadlock():
raise _DeadlockError('deadlock detected by %r' % self)
if self.wakeup.acquire(False):
self.waiters += 1
# Wait for a release() call
self.wakeup.acquire()
self.wakeup.release()
finally:
del _blocking_on[tid]
def release(self):
tid = _thread.get_ident()
with self.lock:
if self.owner != tid:
raise RuntimeError('cannot release un-acquired lock')
assert self.count > 0
self.count -= 1
if self.count == 0:
self.owner = None
if self.waiters:
self.waiters -= 1
self.wakeup.release()
def __repr__(self):
return '_ModuleLock({!r}) at {}'.format(self.name, id(self))
class _DummyModuleLock:
"""A simple _ModuleLock equivalent for Python builds without
multi-threading support."""
def __init__(self, name):
self.name = name
self.count = 0
def acquire(self):
self.count += 1
return True
def release(self):
if self.count == 0:
raise RuntimeError('cannot release un-acquired lock')
self.count -= 1
def __repr__(self):
return '_DummyModuleLock({!r}) at {}'.format(self.name, id(self))
class _ModuleLockManager:
def __init__(self, name):
self._name = name
self._lock = None
def __enter__(self):
try:
self._lock = _get_module_lock(self._name)
finally:
_imp.release_lock()
self._lock.acquire()
def __exit__(self, *args, **kwargs):
self._lock.release()
# The following two functions are for consumption by Python/import.c.
def _get_module_lock(name):
"""Get or create the module lock for a given module name.
Should only be called with the import lock taken."""
lock = None
# ironpython: optimization to avoid KeyError exception
lock_fn = _module_locks.get(name, _unspecified)
if lock_fn is not _unspecified:
lock = lock_fn()
if lock is None:
if _thread is None:
lock = _DummyModuleLock(name)
else:
lock = _ModuleLock(name)
def cb(_):
del _module_locks[name]
_module_locks[name] = _weakref.ref(lock, cb)
return lock
def _lock_unlock_module(name):
"""Release the global import lock, and acquires then release the
module lock for a given module name.
This is used to ensure a module is completely initialized, in the
event it is being imported by another thread.
Should only be called with the import lock taken."""
lock = _get_module_lock(name)
_imp.release_lock()
try:
lock.acquire()
except _DeadlockError:
# Concurrent circular import, we'll accept a partially initialized
# module object.
pass
else:
lock.release()
# Frame stripping magic ###############################################
def _call_with_frames_removed(f, *args, **kwds):
"""remove_importlib_frames in import.c will always remove sequences
of importlib frames that end with a call to this function.
Use it instead of a normal call in places where including the importlib
frames introduces unwanted noise into the traceback (e.g. when executing
module code)
"""
return f(*args, **kwds)
# Finder/loader utility code ###############################################
# Magic word to reject .pyc files generated by other Python versions.
# It should change for each incompatible change to the bytecode.
#
# The value of CR and LF is incorporated so if you ever read or write
# a .pyc file in text mode the magic number will be wrong; also, the
# Apple MPW compiler swaps their values, botching string constants.
#
# The magic numbers must be spaced at least 2 values apart, as the
# -U interpreter flag will cause MAGIC+1 to be used. They have been
# odd numbers for some time now.
#
# There were a variety of old schemes for setting the magic number.
# The current working scheme is to increment the previous value by
# 10.
#
# Starting with the adoption of PEP 3147 in Python 3.2, every bump in magic
# number also includes a new "magic tag", i.e. a human readable string used
# to represent the magic number in __pycache__ directories. When you change
# the magic number, you must also set a new unique magic tag. Generally this
# can be named after the Python major version of the magic number bump, but
# it can really be anything, as long as it's different than anything else
# that's come before. The tags are included in the following table, starting
# with Python 3.2a0.
#
# Known values:
# Python 1.5: 20121
# Python 1.5.1: 20121
# Python 1.5.2: 20121
# Python 1.6: 50428
# Python 2.0: 50823
# Python 2.0.1: 50823
# Python 2.1: 60202
# Python 2.1.1: 60202
# Python 2.1.2: 60202
# Python 2.2: 60717
# Python 2.3a0: 62011
# Python 2.3a0: 62021
# Python 2.3a0: 62011 (!)
# Python 2.4a0: 62041
# Python 2.4a3: 62051
# Python 2.4b1: 62061
# Python 2.5a0: 62071
# Python 2.5a0: 62081 (ast-branch)
# Python 2.5a0: 62091 (with)
# Python 2.5a0: 62092 (changed WITH_CLEANUP opcode)
# Python 2.5b3: 62101 (fix wrong code: for x, in ...)
# Python 2.5b3: 62111 (fix wrong code: x += yield)
# Python 2.5c1: 62121 (fix wrong lnotab with for loops and
# storing constants that should have been removed)
# Python 2.5c2: 62131 (fix wrong code: for x, in ... in listcomp/genexp)
# Python 2.6a0: 62151 (peephole optimizations and STORE_MAP opcode)
# Python 2.6a1: 62161 (WITH_CLEANUP optimization)
# Python 2.7a0: 62171 (optimize list comprehensions/change LIST_APPEND)
# Python 2.7a0: 62181 (optimize conditional branches:
# introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
# Python 2.7a0 62191 (introduce SETUP_WITH)
# Python 2.7a0 62201 (introduce BUILD_SET)
# Python 2.7a0 62211 (introduce MAP_ADD and SET_ADD)
# Python 3000: 3000
# 3010 (removed UNARY_CONVERT)
# 3020 (added BUILD_SET)
# 3030 (added keyword-only parameters)
# 3040 (added signature annotations)
# 3050 (print becomes a function)
# 3060 (PEP 3115 metaclass syntax)
# 3061 (string literals become unicode)
# 3071 (PEP 3109 raise changes)
# 3081 (PEP 3137 make __file__ and __name__ unicode)
# 3091 (kill str8 interning)
# 3101 (merge from 2.6a0, see 62151)
# 3103 (__file__ points to source file)
# Python 3.0a4: 3111 (WITH_CLEANUP optimization).
# Python 3.0a5: 3131 (lexical exception stacking, including POP_EXCEPT)
# Python 3.1a0: 3141 (optimize list, set and dict comprehensions:
# change LIST_APPEND and SET_ADD, add MAP_ADD)
# Python 3.1a0: 3151 (optimize conditional branches:
# introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
# Python 3.2a0: 3160 (add SETUP_WITH)
# tag: cpython-32
# Python 3.2a1: 3170 (add DUP_TOP_TWO, remove DUP_TOPX and ROT_FOUR)
# tag: cpython-32
# Python 3.2a2 3180 (add DELETE_DEREF)
# Python 3.3a0 3190 __class__ super closure changed
# Python 3.3a0 3200 (__qualname__ added)
# 3210 (added size modulo 2**32 to the pyc header)
# Python 3.3a1 3220 (changed PEP 380 implementation)
# Python 3.3a4 3230 (revert changes to implicit __class__ closure)
# Python 3.4a1 3250 (evaluate positional default arguments before
# keyword-only defaults)
# Python 3.4a1 3260 (add LOAD_CLASSDEREF; allow locals of class to override
# free vars)
# Python 3.4a1 3270 (various tweaks to the __class__ closure)
# Python 3.4a1 3280 (remove implicit class argument)
# Python 3.4a4 3290 (changes to __qualname__ computation)
# Python 3.4a4 3300 (more changes to __qualname__ computation)
# Python 3.4rc2 3310 (alter __qualname__ computation)
#
# MAGIC must change whenever the bytecode emitted by the compiler may no
# longer be understood by older implementations of the eval loop (usually
# due to the addition of new opcodes).
MAGIC_NUMBER = (3310).to_bytes(2, 'little') + b'\r\n'
_RAW_MAGIC_NUMBER = int.from_bytes(MAGIC_NUMBER, 'little') # For import.c
_PYCACHE = '__pycache__'
SOURCE_SUFFIXES = ['.py'] # _setup() adds .pyw as needed.
DEBUG_BYTECODE_SUFFIXES = ['.pyc']
OPTIMIZED_BYTECODE_SUFFIXES = ['.pyo']
def cache_from_source(path, debug_override=None):
"""Given the path to a .py file, return the path to its .pyc/.pyo file.
The .py file does not need to exist; this simply returns the path to the
.pyc/.pyo file calculated as if the .py file were imported. The extension
will be .pyc unless sys.flags.optimize is non-zero, then it will be .pyo.
If debug_override is not None, then it must be a boolean and is used in
place of sys.flags.optimize.
If sys.implementation.cache_tag is None then NotImplementedError is raised.
"""
debug = not sys.flags.optimize if debug_override is None else debug_override
if debug:
suffixes = DEBUG_BYTECODE_SUFFIXES
else:
suffixes = OPTIMIZED_BYTECODE_SUFFIXES
head, tail = _path_split(path)
base, sep, rest = tail.rpartition('.')
tag = sys.implementation.cache_tag
if tag is None:
raise NotImplementedError('sys.implementation.cache_tag is None')
filename = ''.join([(base if base else rest), sep, tag, suffixes[0]])
return _path_join(head, _PYCACHE, filename)
def source_from_cache(path):
"""Given the path to a .pyc./.pyo file, return the path to its .py file.
The .pyc/.pyo file does not need to exist; this simply returns the path to
the .py file calculated to correspond to the .pyc/.pyo file. If path does
not conform to PEP 3147 format, ValueError will be raised. If
sys.implementation.cache_tag is None then NotImplementedError is raised.
"""
if sys.implementation.cache_tag is None:
raise NotImplementedError('sys.implementation.cache_tag is None')
head, pycache_filename = _path_split(path)
head, pycache = _path_split(head)
if pycache != _PYCACHE:
raise ValueError('{} not bottom-level directory in '
'{!r}'.format(_PYCACHE, path))
if pycache_filename.count('.') != 2:
raise ValueError('expected only 2 dots in '
'{!r}'.format(pycache_filename))
base_filename = pycache_filename.partition('.')[0]
return _path_join(head, base_filename + SOURCE_SUFFIXES[0])
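# PEP 3147 round trip (the 'cpython-34' tag is illustrative; the real value
# comes from sys.implementation.cache_tag):
#   cache_from_source('/pkg/mod.py')
#       -> '/pkg/__pycache__/mod.cpython-34.pyc'
#   source_from_cache('/pkg/__pycache__/mod.cpython-34.pyc')
#       -> '/pkg/mod.py'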
def _get_sourcefile(bytecode_path):
"""Convert a bytecode file path to a source path (if possible).
This function exists purely for backwards-compatibility for
PyImport_ExecCodeModuleWithFilenames() in the C API.
"""
if len(bytecode_path) == 0:
return None
rest, _, extension = bytecode_path.rpartition('.')
if not rest or extension.lower()[-3:-1] != 'py':
return bytecode_path
try:
source_path = source_from_cache(bytecode_path)
except (NotImplementedError, ValueError):
source_path = bytecode_path[:-1]
return source_path if _path_isfile(source_path) else bytecode_path
def _calc_mode(path):
"""Calculate the mode permissions for a bytecode file."""
try:
mode = _path_stat(path).st_mode
except OSError:
mode = 0o666
# We always ensure write access so we can update cached files
# later even when the source files are read-only on Windows (#6074)
mode |= 0o200
return mode
def _verbose_message(message, *args, verbosity=1):
"""Print the message to stderr if -v/PYTHONVERBOSE is turned on."""
if sys.flags.verbose >= verbosity:
if not message.startswith(('#', 'import ')):
message = '# ' + message
print(message.format(*args), file=sys.stderr)
def _check_name(method):
"""Decorator to verify that the module being requested matches the one the
loader can handle.
The first argument (self) must define _name which the second argument is
compared against. If the comparison fails then ImportError is raised.
"""
def _check_name_wrapper(self, name=None, *args, **kwargs):
if name is None:
name = self.name
elif self.name != name:
raise ImportError('loader cannot handle %s' % name, name=name)
return method(self, name, *args, **kwargs)
_wrap(_check_name_wrapper, method)
return _check_name_wrapper
def _requires_builtin(fxn):
"""Decorator to verify the named module is built-in."""
def _requires_builtin_wrapper(self, fullname):
if fullname not in sys.builtin_module_names:
raise ImportError('{!r} is not a built-in module'.format(fullname),
name=fullname)
return fxn(self, fullname)
_wrap(_requires_builtin_wrapper, fxn)
return _requires_builtin_wrapper
def _requires_frozen(fxn):
"""Decorator to verify the named module is frozen."""
def _requires_frozen_wrapper(self, fullname):
if not _imp.is_frozen(fullname):
raise ImportError('{!r} is not a frozen module'.format(fullname),
name=fullname)
return fxn(self, fullname)
_wrap(_requires_frozen_wrapper, fxn)
return _requires_frozen_wrapper
def _find_module_shim(self, fullname):
"""Try to find a loader for the specified module by delegating to
self.find_loader().
This method is deprecated in favor of finder.find_spec().
"""
# Call find_loader(). If it returns a string (indicating this
# is a namespace package portion), generate a warning and
# return None.
loader, portions = self.find_loader(fullname)
if loader is None and len(portions):
msg = 'Not importing directory {}: missing __init__'
_warnings.warn(msg.format(portions[0]), ImportWarning)
return loader
def _load_module_shim(self, fullname):
"""Load the specified module into sys.modules and return it.
This method is deprecated. Use loader.exec_module instead.
"""
spec = spec_from_loader(fullname, self)
methods = _SpecMethods(spec)
if fullname in sys.modules:
module = sys.modules[fullname]
methods.exec(module)
return sys.modules[fullname]
else:
return methods.load()
def _validate_bytecode_header(data, source_stats=None, name=None, path=None):
"""Validate the header of the passed-in bytecode against source_stats (if
given) and return the bytecode that can be compiled by compile().
All other arguments are used to enhance error reporting.
ImportError is raised when the magic number is incorrect or the bytecode is
found to be stale. EOFError is raised when the data is found to be
truncated.
"""
exc_details = {}
if name is not None:
exc_details['name'] = name
else:
# To prevent having to make all messages have a conditional name.
name = '<bytecode>'
if path is not None:
exc_details['path'] = path
magic = data[:4]
raw_timestamp = data[4:8]
raw_size = data[8:12]
if magic != MAGIC_NUMBER:
message = 'bad magic number in {!r}: {!r}'.format(name, magic)
_verbose_message('{}', message)
raise ImportError(message, **exc_details)
elif len(raw_timestamp) != 4:
message = 'reached EOF while reading timestamp in {!r}'.format(name)
_verbose_message('{}', message)
raise EOFError(message)
elif len(raw_size) != 4:
message = 'reached EOF while reading size of source in {!r}'.format(name)
_verbose_message('{}', message)
raise EOFError(message)
if source_stats is not None:
try:
source_mtime = int(source_stats['mtime'])
except KeyError:
pass
else:
if _r_long(raw_timestamp) != source_mtime:
message = 'bytecode is stale for {!r}'.format(name)
_verbose_message('{}', message)
raise ImportError(message, **exc_details)
try:
source_size = source_stats['size'] & 0xFFFFFFFF
except KeyError:
pass
else:
if _r_long(raw_size) != source_size:
raise ImportError('bytecode is stale for {!r}'.format(name),
**exc_details)
return data[12:]
def _compile_bytecode(data, name=None, bytecode_path=None, source_path=None):
"""Compile bytecode as returned by _validate_bytecode_header()."""
code = marshal.loads(data)
if isinstance(code, _code_type):
_verbose_message('code object from {!r}', bytecode_path)
if source_path is not None:
_imp._fix_co_filename(code, source_path)
return code
else:
raise ImportError('Non-code object in {!r}'.format(bytecode_path),
name=name, path=bytecode_path)
def _code_to_bytecode(code, mtime=0, source_size=0):
"""Compile a code object into bytecode for writing out to a byte-compiled
file."""
data = bytearray(MAGIC_NUMBER)
data.extend(_w_long(mtime))
data.extend(_w_long(source_size))
data.extend(marshal.dumps(code))
return data
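# Resulting 3.4-era .pyc layout, matching what _validate_bytecode_header()
# checks above (byte offsets):
#   [0:4]   MAGIC_NUMBER
#   [4:8]   _w_long(mtime)        little-endian source mtime
#   [8:12]  _w_long(source_size)  little-endian size modulo 2**32
#   [12:]   marshal.dumps(code)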
def decode_source(source_bytes):
"""Decode bytes representing source code and return the string.
Universal newline support is used in the decoding.
"""
import tokenize # To avoid bootstrap issues.
source_bytes_readline = _io.BytesIO(source_bytes).readline
encoding = tokenize.detect_encoding(source_bytes_readline)
newline_decoder = _io.IncrementalNewlineDecoder(None, True)
return newline_decoder.decode(source_bytes.decode(encoding[0]))
# Module specifications #######################################################
def _module_repr(module):
# The implementation of ModuleType__repr__().
loader = getattr(module, '__loader__', None)
if hasattr(loader, 'module_repr'):
# As soon as BuiltinImporter, FrozenImporter, and NamespaceLoader
# drop their implementations for module_repr, we can add a
# deprecation warning here.
try:
return loader.module_repr(module)
except Exception:
pass
try:
spec = module.__spec__
except AttributeError:
pass
else:
if spec is not None:
return _SpecMethods(spec).module_repr()
# We could use module.__class__.__name__ instead of 'module' in the
# various repr permutations.
try:
name = module.__name__
except AttributeError:
name = '?'
try:
filename = module.__file__
except AttributeError:
if loader is None:
return '<module {!r}>'.format(name)
else:
return '<module {!r} ({!r})>'.format(name, loader)
else:
return '<module {!r} from {!r}>'.format(name, filename)
class _installed_safely:
def __init__(self, module):
self._module = module
self._spec = module.__spec__
def __enter__(self):
# This must be done before putting the module in sys.modules
# (otherwise an optimization shortcut in import.c becomes
# wrong)
self._spec._initializing = True
sys.modules[self._spec.name] = self._module
def __exit__(self, *args):
try:
spec = self._spec
if any(arg is not None for arg in args):
try:
del sys.modules[spec.name]
except KeyError:
pass
else:
_verbose_message('import {!r} # {!r}', spec.name, spec.loader)
finally:
self._spec._initializing = False
class ModuleSpec:
"""The specification for a module, used for loading.
A module's spec is the source for information about the module. For
data associated with the module, including source, use the spec's
loader.
`name` is the absolute name of the module. `loader` is the loader
to use when loading the module. `parent` is the name of the
package the module is in. The parent is derived from the name.
`is_package` determines if the module is considered a package or
not. On modules this is reflected by the `__path__` attribute.
`origin` is the specific location used by the loader from which to
load the module, if that information is available. When filename is
set, origin will match.
`has_location` indicates that a spec's "origin" reflects a location.
When this is True, the __file__ attribute of the module is set.
`cached` is the location of the cached bytecode file, if any. It
corresponds to the `__cached__` attribute.
`submodule_search_locations` is the sequence of path entries to
search when importing submodules. If set, is_package should be
True--and False otherwise.
Packages are simply modules that (may) have submodules. If a spec
has a non-None value in `submodule_search_locations`, the import
system will consider modules loaded from the spec as packages.
Only finders (see importlib.abc.MetaPathFinder and
importlib.abc.PathEntryFinder) should modify ModuleSpec instances.
"""
def __init__(self, name, loader, *, origin=None, loader_state=None,
is_package=None):
self.name = name
self.loader = loader
self.origin = origin
self.loader_state = loader_state
self.submodule_search_locations = [] if is_package else None
# file-location attributes
self._set_fileattr = False
self._cached = None
def __repr__(self):
args = ['name={!r}'.format(self.name),
'loader={!r}'.format(self.loader)]
if self.origin is not None:
args.append('origin={!r}'.format(self.origin))
if self.submodule_search_locations is not None:
args.append('submodule_search_locations={}'
.format(self.submodule_search_locations))
return '{}({})'.format(self.__class__.__name__, ', '.join(args))
def __eq__(self, other):
smsl = self.submodule_search_locations
try:
return (self.name == other.name and
self.loader == other.loader and
self.origin == other.origin and
smsl == other.submodule_search_locations and
self.cached == other.cached and
self.has_location == other.has_location)
except AttributeError:
return False
@property
def cached(self):
return self._cached
@cached.setter
def cached(self, cached):
self._cached = cached
@property
def parent(self):
"""The name of the module's parent."""
if self.submodule_search_locations is None:
return self.name.rpartition('.')[0]
else:
return self.name
@property
def has_location(self):
return self._set_fileattr
@has_location.setter
def has_location(self, value):
self._set_fileattr = bool(value)
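# Illustrative ModuleSpec attribute behaviour (names invented):
#   ModuleSpec('pkg.mod', loader=None).parent              -> 'pkg'
#   ModuleSpec('pkg', loader=None, is_package=True).parent -> 'pkg'
#   (is_package=True also gives submodule_search_locations == [])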
def spec_from_loader(name, loader, *, origin=None, is_package=None):
"""Return a module spec based on various loader methods."""
if hasattr(loader, 'get_filename'):
if is_package is None:
return spec_from_file_location(name, loader=loader)
search = [] if is_package else None
return spec_from_file_location(name, loader=loader,
submodule_search_locations=search)
if is_package is None:
if hasattr(loader, 'is_package'):
try:
is_package = loader.is_package(name)
except ImportError:
is_package = None # aka, undefined
else:
# the default
is_package = False
return ModuleSpec(name, loader, origin=origin, is_package=is_package)
_POPULATE = object()
def spec_from_file_location(name, location=None, *, loader=None,
submodule_search_locations=_POPULATE):
"""Return a module spec based on a file location.
To indicate that the module is a package, set
submodule_search_locations to a list of directory paths. An
empty list is sufficient, though it's not otherwise useful to the
import system.
The loader must take a spec as its only __init__() arg.
"""
if location is None:
# The caller may simply want a partially populated location-
# oriented spec. So we set the location to a bogus value and
# fill in as much as we can.
location = '<unknown>'
if hasattr(loader, 'get_filename'):
# ExecutionLoader
try:
location = loader.get_filename(name)
except ImportError:
pass
# If the location is on the filesystem, but doesn't actually exist,
# we could return None here, indicating that the location is not
# valid. However, we don't have a good way of testing since an
# indirect location (e.g. a zip file or URL) will look like a
# non-existent file relative to the filesystem.
spec = ModuleSpec(name, loader, origin=location)
spec._set_fileattr = True
# Pick a loader if one wasn't provided.
if loader is None:
for loader_class, suffixes in _get_supported_file_loaders():
if location.endswith(tuple(suffixes)):
loader = loader_class(name, location)
spec.loader = loader
break
else:
return None
# Set submodule_search_locations appropriately.
if submodule_search_locations is _POPULATE:
# Check the loader.
if hasattr(loader, 'is_package'):
try:
is_package = loader.is_package(name)
except ImportError:
pass
else:
if is_package:
spec.submodule_search_locations = []
else:
spec.submodule_search_locations = submodule_search_locations
if spec.submodule_search_locations == []:
if location:
dirname = _path_split(location)[0]
spec.submodule_search_locations.append(dirname)
return spec
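# Illustrative sketch, not part of the original module: a file-based spec for a
# hypothetical single-file module. With a '.py' suffix, a SourceFileLoader is
# picked automatically; the path is an assumption.
def _example_spec_from_file_location():
    spec = spec_from_file_location('demo', '/example/demo.py')
    if spec is not None:  # None when no registered loader handles the suffix
        assert spec.has_location                 # __file__ will come from origin
        assert spec.origin == '/example/demo.py'
    return spec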
def _spec_from_module(module, loader=None, origin=None):
# This function is meant for use in _setup().
try:
spec = module.__spec__
except AttributeError:
pass
else:
if spec is not None:
return spec
name = module.__name__
if loader is None:
try:
loader = module.__loader__
except AttributeError:
# loader will stay None.
pass
try:
location = module.__file__
except AttributeError:
location = None
if origin is None:
if location is None:
origin = getattr(loader, '_ORIGIN', None) # ironpython: optimization to avoid KeyError exception
else:
origin = location
try:
cached = module.__cached__
except AttributeError:
cached = None
try:
submodule_search_locations = list(module.__path__)
except AttributeError:
submodule_search_locations = None
spec = ModuleSpec(name, loader, origin=origin)
spec._set_fileattr = False if location is None else True
spec.cached = cached
spec.submodule_search_locations = submodule_search_locations
return spec
class _SpecMethods:
"""Convenience wrapper around spec objects to provide spec-specific
methods."""
# The various spec_from_* functions could be made factory methods here.
def __init__(self, spec):
self.spec = spec
def module_repr(self):
"""Return the repr to use for the module."""
# We mostly replicate _module_repr() using the spec attributes.
spec = self.spec
name = '?' if spec.name is None else spec.name
if spec.origin is None:
if spec.loader is None:
return '<module {!r}>'.format(name)
else:
return '<module {!r} ({!r})>'.format(name, spec.loader)
else:
if spec.has_location:
return '<module {!r} from {!r}>'.format(name, spec.origin)
else:
return '<module {!r} ({})>'.format(spec.name, spec.origin)
def init_module_attrs(self, module, *, _override=False, _force_name=True):
"""Set the module's attributes.
All missing import-related module attributes will be set. Here
is how the spec attributes map onto the module:
spec.name -> module.__name__
spec.loader -> module.__loader__
spec.parent -> module.__package__
spec -> module.__spec__
Optional:
spec.origin -> module.__file__ (if spec.set_fileattr is true)
spec.cached -> module.__cached__ (if __file__ also set)
spec.submodule_search_locations -> module.__path__ (if set)
"""
spec = self.spec
# The passed-in module may not support attribute assignment,
# in which case we simply don't set the attributes.
# __name__
if (_override or _force_name or
getattr(module, '__name__', None) is None):
try:
module.__name__ = spec.name
except AttributeError:
pass
# __loader__
if _override or getattr(module, '__loader__', None) is None:
loader = spec.loader
if loader is None:
# A backward compatibility hack.
if spec.submodule_search_locations is not None:
loader = _NamespaceLoader.__new__(_NamespaceLoader)
loader._path = spec.submodule_search_locations
try:
module.__loader__ = loader
except AttributeError:
pass
# __package__
if _override or getattr(module, '__package__', None) is None:
try:
module.__package__ = spec.parent
except AttributeError:
pass
# __spec__
try:
module.__spec__ = spec
except AttributeError:
pass
# __path__
if _override or getattr(module, '__path__', None) is None:
if spec.submodule_search_locations is not None:
try:
module.__path__ = spec.submodule_search_locations
except AttributeError:
pass
if spec.has_location:
# __file__
if _override or getattr(module, '__file__', None) is None:
try:
module.__file__ = spec.origin
except AttributeError:
pass
# __cached__
if _override or getattr(module, '__cached__', None) is None:
if spec.cached is not None:
try:
module.__cached__ = spec.cached
except AttributeError:
pass
def create(self):
"""Return a new module to be loaded.
The import-related module attributes are also set with the
appropriate values from the spec.
"""
spec = self.spec
# Typically loaders will not implement create_module().
if hasattr(spec.loader, 'create_module'):
# If create_module() returns `None` it means the default
# module creation should be used.
module = spec.loader.create_module(spec)
else:
module = None
if module is None:
# This must be done before open() is ever called as the 'io'
# module implicitly imports 'locale' and would otherwise
# trigger an infinite loop.
module = _new_module(spec.name)
self.init_module_attrs(module)
return module
def _exec(self, module):
"""Do everything necessary to execute the module.
The namespace of `module` is used as the target of execution.
This method uses the loader's `exec_module()` method.
"""
self.spec.loader.exec_module(module)
# Used by importlib.reload() and _load_module_shim().
def exec(self, module):
"""Execute the spec in an existing module's namespace."""
name = self.spec.name
_imp.acquire_lock()
with _ModuleLockManager(name):
if sys.modules.get(name) is not module:
msg = 'module {!r} not in sys.modules'.format(name)
raise ImportError(msg, name=name)
if self.spec.loader is None:
if self.spec.submodule_search_locations is None:
raise ImportError('missing loader', name=self.spec.name)
# namespace package
self.init_module_attrs(module, _override=True)
return module
self.init_module_attrs(module, _override=True)
if not hasattr(self.spec.loader, 'exec_module'):
# (issue19713) Once BuiltinImporter and ExtensionFileLoader
# have exec_module() implemented, we can add a deprecation
# warning here.
self.spec.loader.load_module(name)
else:
self._exec(module)
return sys.modules[name]
def _load_backward_compatible(self):
# (issue19713) Once BuiltinImporter and ExtensionFileLoader
# have exec_module() implemented, we can add a deprecation
# warning here.
spec = self.spec
spec.loader.load_module(spec.name)
# The module must be in sys.modules at this point!
module = sys.modules[spec.name]
if getattr(module, '__loader__', None) is None:
try:
module.__loader__ = spec.loader
except AttributeError:
pass
if getattr(module, '__package__', None) is None:
try:
# Since module.__path__ may not line up with
# spec.submodule_search_locations, we can't necessarily rely
# on spec.parent here.
module.__package__ = module.__name__
if not hasattr(module, '__path__'):
module.__package__ = spec.name.rpartition('.')[0]
except AttributeError:
pass
if getattr(module, '__spec__', None) is None:
try:
module.__spec__ = spec
except AttributeError:
pass
return module
def _load_unlocked(self):
# A helper for direct use by the import system.
if self.spec.loader is not None:
# not a namespace package
if not hasattr(self.spec.loader, 'exec_module'):
return self._load_backward_compatible()
module = self.create()
with _installed_safely(module):
if self.spec.loader is None:
if self.spec.submodule_search_locations is None:
raise ImportError('missing loader', name=self.spec.name)
# A namespace package so do nothing.
else:
self._exec(module)
# We don't ensure that the import-related module attributes get
# set in the sys.modules replacement case. Such modules are on
# their own.
return sys.modules[self.spec.name]
# A method used during testing of _load_unlocked() and by
# _load_module_shim().
def load(self):
"""Return a new module object, loaded by the spec's loader.
The module is not added to its parent.
If a module is already in sys.modules, that existing module gets
clobbered.
"""
_imp.acquire_lock()
with _ModuleLockManager(self.spec.name):
return self._load_unlocked()
def _fix_up_module(ns, name, pathname, cpathname=None):
# This function is used by PyImport_ExecCodeModuleObject().
loader = ns.get('__loader__')
spec = ns.get('__spec__')
if not loader:
if spec:
loader = spec.loader
elif pathname == cpathname:
loader = SourcelessFileLoader(name, pathname)
else:
loader = SourceFileLoader(name, pathname)
if not spec:
spec = spec_from_file_location(name, pathname, loader=loader)
try:
ns['__spec__'] = spec
ns['__loader__'] = loader
ns['__file__'] = pathname
ns['__cached__'] = cpathname
except Exception:
# Not important enough to report.
pass
# Loaders #####################################################################
class BuiltinImporter:
"""Meta path import for built-in modules.
All methods are either class or static methods to avoid the need to
instantiate the class.
"""
@staticmethod
def module_repr(module):
"""Return repr for the module.
The method is deprecated. The import machinery does the job itself.
"""
return '<module {!r} (built-in)>'.format(module.__name__)
@classmethod
def find_spec(cls, fullname, path=None, target=None):
if path is not None:
return None
if _imp.is_builtin(fullname):
return spec_from_loader(fullname, cls, origin='built-in')
else:
return None
@classmethod
def find_module(cls, fullname, path=None):
"""Find the built-in module.
If 'path' is ever specified then the search is considered a failure.
This method is deprecated. Use find_spec() instead.
"""
spec = cls.find_spec(fullname, path)
return spec.loader if spec is not None else None
@classmethod
@_requires_builtin
def load_module(cls, fullname):
"""Load a built-in module."""
# Once an exec_module() implementation is added we can also
# add a deprecation warning here.
with _ManageReload(fullname):
module = _call_with_frames_removed(_imp.init_builtin, fullname)
module.__loader__ = cls
module.__package__ = ''
return module
@classmethod
@_requires_builtin
def get_code(cls, fullname):
"""Return None as built-in modules do not have code objects."""
return None
@classmethod
@_requires_builtin
def get_source(cls, fullname):
"""Return None as built-in modules do not have source code."""
return None
@classmethod
@_requires_builtin
def is_package(cls, fullname):
"""Return False as built-in modules are never packages."""
return False
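# Illustrative sketch, not part of the original module: the finder answers only
# for top-level built-in names and declines when a 'path' argument is given.
# 'sys' is used here as a module that is typically built in.
def _example_builtin_find_spec():
    spec = BuiltinImporter.find_spec('sys')
    if spec is not None:
        assert spec.origin == 'built-in'
        assert spec.loader is BuiltinImporter
    assert BuiltinImporter.find_spec('sys', path=['x']) is None  # path given -> no match
    return spec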
class FrozenImporter:
"""Meta path import for frozen modules.
All methods are either class or static methods to avoid the need to
instantiate the class.
"""
@staticmethod
def module_repr(m):
"""Return repr for the module.
The method is deprecated. The import machinery does the job itself.
"""
return '<module {!r} (frozen)>'.format(m.__name__)
@classmethod
def find_spec(cls, fullname, path=None, target=None):
if _imp.is_frozen(fullname):
return spec_from_loader(fullname, cls, origin='frozen')
else:
return None
@classmethod
def find_module(cls, fullname, path=None):
"""Find a frozen module.
This method is deprecated. Use find_spec() instead.
"""
return cls if _imp.is_frozen(fullname) else None
@staticmethod
def exec_module(module):
name = module.__spec__.name
if not _imp.is_frozen(name):
raise ImportError('{!r} is not a frozen module'.format(name),
name=name)
code = _call_with_frames_removed(_imp.get_frozen_object, name)
exec(code, module.__dict__)
@classmethod
def load_module(cls, fullname):
"""Load a frozen module.
This method is deprecated. Use exec_module() instead.
"""
return _load_module_shim(cls, fullname)
@classmethod
@_requires_frozen
def get_code(cls, fullname):
"""Return the code object for the frozen module."""
return _imp.get_frozen_object(fullname)
@classmethod
@_requires_frozen
def get_source(cls, fullname):
"""Return None as frozen modules do not have source code."""
return None
@classmethod
@_requires_frozen
def is_package(cls, fullname):
"""Return True if the frozen module is a package."""
return _imp.is_frozen_package(fullname)
class WindowsRegistryFinder:
"""Meta path finder for modules declared in the Windows registry."""
REGISTRY_KEY = (
'Software\\Python\\PythonCore\\{sys_version}'
'\\Modules\\{fullname}')
REGISTRY_KEY_DEBUG = (
'Software\\Python\\PythonCore\\{sys_version}'
'\\Modules\\{fullname}\\Debug')
DEBUG_BUILD = False # Changed in _setup()
@classmethod
def _open_registry(cls, key):
try:
return _winreg.OpenKey(_winreg.HKEY_CURRENT_USER, key)
except OSError:
return _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, key)
@classmethod
def _search_registry(cls, fullname):
if cls.DEBUG_BUILD:
registry_key = cls.REGISTRY_KEY_DEBUG
else:
registry_key = cls.REGISTRY_KEY
key = registry_key.format(fullname=fullname,
sys_version=sys.version[:3])
try:
with cls._open_registry(key) as hkey:
filepath = _winreg.QueryValue(hkey, '')
except OSError:
return None
return filepath
@classmethod
def find_spec(cls, fullname, path=None, target=None):
filepath = cls._search_registry(fullname)
if filepath is None:
return None
try:
_path_stat(filepath)
except OSError:
return None
for loader, suffixes in _get_supported_file_loaders():
if filepath.endswith(tuple(suffixes)):
spec = spec_from_loader(fullname, loader(fullname, filepath),
origin=filepath)
return spec
@classmethod
def find_module(cls, fullname, path=None):
"""Find module named in the registry.
This method is deprecated. Use find_spec() instead.
"""
spec = cls.find_spec(fullname, path)
if spec is not None:
return spec.loader
else:
return None
class _LoaderBasics:
"""Base class of common code needed by both SourceLoader and
SourcelessFileLoader."""
def is_package(self, fullname):
"""Concrete implementation of InspectLoader.is_package by checking if
the path returned by get_filename has a filename of '__init__.py'."""
filename = _path_split(self.get_filename(fullname))[1]
filename_base = filename.rsplit('.', 1)[0]
tail_name = fullname.rpartition('.')[2]
return filename_base == '__init__' and tail_name != '__init__'
def exec_module(self, module):
"""Execute the module."""
code = self.get_code(module.__name__)
if code is None:
raise ImportError('cannot load module {!r} when get_code() '
'returns None'.format(module.__name__))
_call_with_frames_removed(exec, code, module.__dict__)
load_module = _load_module_shim
class SourceLoader(_LoaderBasics):
def path_mtime(self, path):
"""Optional method that returns the modification time (an int) for the
specified path, where path is a str.
Raises IOError when the path cannot be handled.
"""
raise IOError
def path_stats(self, path):
"""Optional method returning a metadata dict for the specified path
to by the path (str).
Possible keys:
- 'mtime' (mandatory) is the numeric timestamp of last source
code modification;
- 'size' (optional) is the size in bytes of the source code.
Implementing this method allows the loader to read bytecode files.
Raises IOError when the path cannot be handled.
"""
return {'mtime': self.path_mtime(path)}
def _cache_bytecode(self, source_path, cache_path, data):
"""Optional method which writes data (bytes) to a file path (a str).
Implementing this method allows for the writing of bytecode files.
The source path is needed in order to correctly transfer permissions.
"""
# For backwards compatibility, we delegate to set_data()
return self.set_data(cache_path, data)
def set_data(self, path, data):
"""Optional method which writes data (bytes) to a file path (a str).
Implementing this method allows for the writing of bytecode files.
"""
def get_source(self, fullname):
"""Concrete implementation of InspectLoader.get_source."""
path = self.get_filename(fullname)
try:
source_bytes = self.get_data(path)
except OSError as exc:
raise ImportError('source not available through get_data()',
name=fullname) from exc
return decode_source(source_bytes)
def source_to_code(self, data, path, *, _optimize=-1):
"""Return the code object compiled from source.
The 'data' argument can be any object type that compile() supports.
"""
return _call_with_frames_removed(compile, data, path, 'exec',
dont_inherit=True, optimize=_optimize)
def get_code(self, fullname):
"""Concrete implementation of InspectLoader.get_code.
Reading of bytecode requires path_stats to be implemented. To write
bytecode, set_data must also be implemented.
"""
source_path = self.get_filename(fullname)
source_mtime = None
source_bytes = self.get_data(source_path)
code_object = self.source_to_code(source_bytes, source_path)
_verbose_message('code object from {}', source_path)
return code_object
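# Illustrative sketch, not part of the original module: the smallest useful
# SourceLoader subclass keeps its source in memory; only get_data() and
# get_filename() are needed for get_code() to work. All names are hypothetical.
class _ExampleStringLoader(SourceLoader):
    def __init__(self, source):
        self._source = source
    def get_filename(self, fullname):
        return '<string:{}>'.format(fullname)  # pseudo-path used in tracebacks
    def get_data(self, path):
        return self._source.encode('utf-8')
def _example_load_from_string():
    loader = _ExampleStringLoader('ANSWER = 42')
    code = loader.get_code('demo')   # compiles the in-memory source
    namespace = {}
    exec(code, namespace)
    assert namespace['ANSWER'] == 42
    return namespace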
class FileLoader:
"""Base file loader class which implements the loader protocol methods that
require file system usage."""
def __init__(self, fullname, path):
"""Cache the module name and the path to the file found by the
finder."""
self.name = fullname
self.path = path
def __eq__(self, other):
return (self.__class__ == other.__class__ and
self.__dict__ == other.__dict__)
def __hash__(self):
return hash(self.name) ^ hash(self.path)
@_check_name
def load_module(self, fullname):
"""Load a module from a file.
This method is deprecated. Use exec_module() instead.
"""
# The only reason for this method is for the name check.
# Issue #14857: Avoid the zero-argument form of super so the implementation
# of that form can be updated without breaking the frozen module
return super(FileLoader, self).load_module(fullname)
@_check_name
def get_filename(self, fullname):
"""Return the path to the source file as found by the finder."""
return self.path
def get_data(self, path):
"""Return the data from path as raw bytes."""
with _io.FileIO(path, 'r') as file:
return file.read()
class SourceFileLoader(FileLoader, SourceLoader):
"""Concrete implementation of SourceLoader using the file system."""
def path_stats(self, path):
"""Return the metadata for the path."""
st = _path_stat(path)
return {'mtime': st.st_mtime, 'size': st.st_size}
def _cache_bytecode(self, source_path, bytecode_path, data):
# Adapt between the two APIs
mode = _calc_mode(source_path)
return self.set_data(bytecode_path, data, _mode=mode)
def set_data(self, path, data, *, _mode=0o666):
"""Write bytes data to a file."""
parent, filename = _path_split(path)
path_parts = []
# Figure out what directories are missing.
while parent and not _path_isdir(parent):
parent, part = _path_split(parent)
path_parts.append(part)
# Create needed directories.
for part in reversed(path_parts):
parent = _path_join(parent, part)
try:
_os.mkdir(parent)
except FileExistsError:
# Probably another Python process already created the dir.
continue
except OSError as exc:
# Could be a permission error, read-only filesystem: just forget
# about writing the data.
_verbose_message('could not create {!r}: {!r}', parent, exc)
return
try:
_write_atomic(path, data, _mode)
_verbose_message('created {!r}', path)
except OSError as exc:
# Same as above: just don't write the bytecode.
_verbose_message('could not create {!r}: {!r}', path, exc)
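# Illustrative sketch, not part of the original module: executing a module
# straight from a source file with SourceFileLoader. The path and name are
# assumptions; the file must exist when the function is called.
def _example_source_file_loader():
    loader = SourceFileLoader('demo', '/example/demo.py')
    module = _new_module('demo')
    loader.exec_module(module)  # compiles and runs /example/demo.py
    return module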
class SourcelessFileLoader(FileLoader, _LoaderBasics):
"""Loader which handles sourceless file imports."""
def get_code(self, fullname):
path = self.get_filename(fullname)
data = self.get_data(path)
bytes_data = _validate_bytecode_header(data, name=fullname, path=path)
return _compile_bytecode(bytes_data, name=fullname, bytecode_path=path)
def get_source(self, fullname):
"""Return None as there is no source code."""
return None
# Filled in by _setup().
EXTENSION_SUFFIXES = []
class ExtensionFileLoader:
"""Loader for extension modules.
The constructor is designed to work with FileFinder.
"""
def __init__(self, name, path):
self.name = name
self.path = path
def __eq__(self, other):
return (self.__class__ == other.__class__ and
self.__dict__ == other.__dict__)
def __hash__(self):
return hash(self.name) ^ hash(self.path)
@_check_name
def load_module(self, fullname):
"""Load an extension module."""
# Once an exec_module() implementation is added we can also
# add a deprecation warning here.
with _ManageReload(fullname):
module = _call_with_frames_removed(_imp.load_dynamic,
fullname, self.path)
_verbose_message('extension module loaded from {!r}', self.path)
is_package = self.is_package(fullname)
if is_package and not hasattr(module, '__path__'):
module.__path__ = [_path_split(self.path)[0]]
module.__loader__ = self
module.__package__ = module.__name__
if not is_package:
module.__package__ = module.__package__.rpartition('.')[0]
return module
def is_package(self, fullname):
"""Return True if the extension module is a package."""
file_name = _path_split(self.path)[1]
return any(file_name == '__init__' + suffix
for suffix in EXTENSION_SUFFIXES)
def get_code(self, fullname):
"""Return None as an extension module cannot create a code object."""
return None
def get_source(self, fullname):
"""Return None as extension modules have no source code."""
return None
@_check_name
def get_filename(self, fullname):
"""Return the path to the source file as found by the finder."""
return self.path
class _NamespacePath:
"""Represents a namespace package's path. It uses the module name
to find its parent module, and from there it looks up the parent's
__path__. When this changes, the module's own path is recomputed,
using path_finder. For top-level modules, the parent module's path
is sys.path."""
def __init__(self, name, path, path_finder):
self._name = name
self._path = path
self._last_parent_path = tuple(self._get_parent_path())
self._path_finder = path_finder
def _find_parent_path_names(self):
"""Returns a tuple of (parent-module-name, parent-path-attr-name)"""
parent, dot, me = self._name.rpartition('.')
if dot == '':
# This is a top-level module. sys.path contains the parent path.
return 'sys', 'path'
# Not a top-level module. parent-module.__path__ contains the
# parent path.
return parent, '__path__'
def _get_parent_path(self):
parent_module_name, path_attr_name = self._find_parent_path_names()
return getattr(sys.modules[parent_module_name], path_attr_name)
def _recalculate(self):
# If the parent's path has changed, recalculate _path
parent_path = tuple(self._get_parent_path()) # Make a copy
if parent_path != self._last_parent_path:
spec = self._path_finder(self._name, parent_path)
# Note that no changes are made if a loader is returned, but we
# do remember the new parent path
if spec is not None and spec.loader is None:
if spec.submodule_search_locations:
self._path = spec.submodule_search_locations
self._last_parent_path = parent_path # Save the copy
return self._path
def __iter__(self):
return iter(self._recalculate())
def __len__(self):
return len(self._recalculate())
def __repr__(self):
return '_NamespacePath({!r})'.format(self._path)
def __contains__(self, item):
return item in self._recalculate()
def append(self, item):
self._path.append(item)
# We use this exclusively in init_module_attrs() for backward-compatibility.
class _NamespaceLoader:
def __init__(self, name, path, path_finder):
self._path = _NamespacePath(name, path, path_finder)
@classmethod
def module_repr(cls, module):
"""Return repr for the module.
The method is deprecated. The import machinery does the job itself.
"""
return '<module {!r} (namespace)>'.format(module.__name__)
def is_package(self, fullname):
return True
def get_source(self, fullname):
return ''
def get_code(self, fullname):
return compile('', '<string>', 'exec', dont_inherit=True)
def exec_module(self, module):
pass
def load_module(self, fullname):
"""Load a namespace module.
This method is deprecated. Use exec_module() instead.
"""
# The import system never calls this method.
_verbose_message('namespace module loaded with path {!r}', self._path)
return _load_module_shim(self, fullname)
# Finders #####################################################################
class PathFinder:
"""Meta path finder for sys.path and package __path__ attributes."""
@classmethod
def invalidate_caches(cls):
"""Call the invalidate_caches() method on all path entry finders
stored in sys.path_importer_caches (where implemented)."""
for finder in sys.path_importer_cache.values():
if hasattr(finder, 'invalidate_caches'):
finder.invalidate_caches()
@classmethod
def _path_hooks(cls, path):
"""Search sequence of hooks for a finder for 'path'.
If 'hooks' is false then use sys.path_hooks.
"""
if not sys.path_hooks:
_warnings.warn('sys.path_hooks is empty', ImportWarning)
for hook in sys.path_hooks:
try:
return hook(path)
except ImportError:
continue
else:
return None
@classmethod
def _path_importer_cache(cls, path):
"""Get the finder for the path entry from sys.path_importer_cache.
If the path entry is not in the cache, find the appropriate finder
and cache it. If no finder is available, store None.
"""
if path == '':
path = _os.getcwd()
# ironpython: optimization to avoid KeyError exception
finder = sys.path_importer_cache.get(path, _unspecified)
if finder is _unspecified:
finder = cls._path_hooks(path)
sys.path_importer_cache[path] = finder
return finder
@classmethod
def _legacy_get_spec(cls, fullname, finder):
# This would be a good place for a DeprecationWarning if
# we ended up going that route.
if hasattr(finder, 'find_loader'):
loader, portions = finder.find_loader(fullname)
else:
loader = finder.find_module(fullname)
portions = []
if loader is not None:
return spec_from_loader(fullname, loader)
spec = ModuleSpec(fullname, None)
spec.submodule_search_locations = portions
return spec
@classmethod
def _get_spec(cls, fullname, path, target=None):
"""Find the loader or namespace_path for this module/package name."""
# If this ends up being a namespace package, namespace_path is
# the list of paths that will become its __path__
namespace_path = []
for entry in path:
if not isinstance(entry, (str, bytes)):
continue
finder = cls._path_importer_cache(entry)
if finder is not None:
if hasattr(finder, 'find_spec'):
spec = finder.find_spec(fullname, target)
else:
spec = cls._legacy_get_spec(fullname, finder)
if spec is None:
continue
if spec.loader is not None:
return spec
portions = spec.submodule_search_locations
if portions is None:
raise ImportError('spec missing loader')
# This is possibly part of a namespace package.
# Remember these path entries (if any) for when we
# create a namespace package, and continue iterating
# on path.
namespace_path.extend(portions)
else:
spec = ModuleSpec(fullname, None)
spec.submodule_search_locations = namespace_path
return spec
@classmethod
def find_spec(cls, fullname, path=None, target=None):
"""find the module on sys.path or 'path' based on sys.path_hooks and
sys.path_importer_cache."""
if path is None:
path = sys.path
spec = cls._get_spec(fullname, path, target)
if spec is None:
return None
elif spec.loader is None:
namespace_path = spec.submodule_search_locations
if namespace_path:
# We found at least one namespace path. Return a
# spec which can create the namespace package.
spec.origin = 'namespace'
spec.submodule_search_locations = _NamespacePath(fullname, namespace_path, cls._get_spec)
return spec
else:
return None
else:
return spec
@classmethod
def find_module(cls, fullname, path=None):
"""find the module on sys.path or 'path' based on sys.path_hooks and
sys.path_importer_cache.
This method is deprecated. Use find_spec() instead.
"""
spec = cls.find_spec(fullname, path)
if spec is None:
return None
return spec.loader
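# Illustrative sketch, not part of the original module: PathFinder consults
# sys.path_hooks for each sys.path entry. A stdlib name is used purely as an
# example; any importable top-level module would do.
def _example_path_finder():
    spec = PathFinder.find_spec('json')  # searches sys.path
    if spec is not None:
        return spec.loader, spec.origin
    return None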
class FileFinder:
"""File-based finder.
Interactions with the file system are cached for performance, being
refreshed when the directory the finder is handling has been modified.
"""
def __init__(self, path, *loader_details):
"""Initialize with the path to search on and a variable number of
2-tuples containing the loader and the file suffixes the loader
recognizes."""
loaders = []
for loader, suffixes in loader_details:
loaders.extend((suffix, loader) for suffix in suffixes)
self._loaders = loaders
# Base (directory) path
self.path = path or '.'
self._path_mtime = -1
self._path_cache = set()
self._relaxed_path_cache = set()
def invalidate_caches(self):
"""Invalidate the directory mtime."""
self._path_mtime = -1
find_module = _find_module_shim
def find_loader(self, fullname):
"""Try to find a loader for the specified module, or the namespace
package portions. Returns (loader, list-of-portions).
This method is deprecated. Use find_spec() instead.
"""
spec = self.find_spec(fullname)
if spec is None:
return None, []
return spec.loader, spec.submodule_search_locations or []
def _get_spec(self, loader_class, fullname, path, smsl, target):
loader = loader_class(fullname, path)
return spec_from_file_location(fullname, path, loader=loader,
submodule_search_locations=smsl)
def find_spec(self, fullname, target=None):
"""Try to find a loader for the specified module, or the namespace
package portions. Returns (loader, list-of-portions)."""
is_namespace = False
tail_module = fullname.rpartition('.')[2]
try:
mtime = _path_stat(self.path or _os.getcwd()).st_mtime
except OSError:
mtime = -1
if mtime != self._path_mtime:
self._fill_cache()
self._path_mtime = mtime
# tail_module keeps the original casing, for __file__ and friends
if _relax_case():
cache = self._relaxed_path_cache
cache_module = tail_module.lower()
else:
cache = self._path_cache
cache_module = tail_module
# Check if the module is the name of a directory (and thus a package).
if cache_module in cache:
base_path = _path_join(self.path, tail_module)
for suffix, loader_class in self._loaders:
init_filename = '__init__' + suffix
full_path = _path_join(base_path, init_filename)
if _path_isfile(full_path):
return self._get_spec(loader_class, fullname, full_path, [base_path], target)
else:
# If a namespace package, return the path if we don't
# find a module in the next section.
is_namespace = _path_isdir(base_path)
# Check whether a file with a proper suffix exists.
for suffix, loader_class in self._loaders:
full_path = _path_join(self.path, tail_module + suffix)
_verbose_message('trying {}'.format(full_path), verbosity=2)
if cache_module + suffix in cache:
if _path_isfile(full_path):
return self._get_spec(loader_class, fullname, full_path, None, target)
if is_namespace:
_verbose_message('possible namespace for {}'.format(base_path))
spec = ModuleSpec(fullname, None)
spec.submodule_search_locations = [base_path]
return spec
return None
def _fill_cache(self):
"""Fill the cache of potential modules and packages for this directory."""
path = self.path
try:
contents = _os.listdir(path or _os.getcwd())
except (FileNotFoundError, PermissionError, NotADirectoryError):
# Directory has either been removed, turned into a file, or made
# unreadable.
contents = []
# We store two cached versions, to handle runtime changes of the
# PYTHONCASEOK environment variable.
if not sys.platform.startswith('win'):
self._path_cache = set(contents)
else:
# Windows users can import modules with case-insensitive file
# suffixes (for legacy reasons). Make the suffix lowercase here
# so it's done once instead of for every import. This is safe as
# the specified suffixes to check against are always specified in a
# case-sensitive manner.
lower_suffix_contents = set()
for item in contents:
name, dot, suffix = item.partition('.')
if dot:
new_name = '{}.{}'.format(name, suffix.lower())
else:
new_name = name
lower_suffix_contents.add(new_name)
self._path_cache = lower_suffix_contents
if sys.platform.startswith(_CASE_INSENSITIVE_PLATFORMS):
self._relaxed_path_cache = {fn.lower() for fn in contents}
@classmethod
def path_hook(cls, *loader_details):
"""A class method which returns a closure to use on sys.path_hook
which will return an instance using the specified loaders and the path
called on the closure.
If the path called on the closure is not a directory, ImportError is
raised.
"""
def path_hook_for_FileFinder(path):
"""Path hook for importlib.machinery.FileFinder."""
if not _path_isdir(path):
raise ImportError('only directories are supported', path=path)
return cls(path, *loader_details)
return path_hook_for_FileFinder
def __repr__(self):
return 'FileFinder({!r})'.format(self.path)
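# Illustrative sketch, not part of the original module: building a finder for a
# single directory. The directory is an assumption; with the loader detail
# below, only '.py' files and packages with an __init__.py are recognized.
def _example_file_finder():
    hook = FileFinder.path_hook((SourceFileLoader, SOURCE_SUFFIXES))
    finder = hook('/example/dir')    # raises ImportError for non-directories
    return finder.find_spec('demo')  # looks for demo.py or demo/__init__.py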
# Import itself ###############################################################
class _ImportLockContext:
"""Context manager for the import lock."""
def __enter__(self):
"""Acquire the import lock."""
_imp.acquire_lock()
def __exit__(self, exc_type, exc_value, exc_traceback):
"""Release the import lock regardless of any raised exceptions."""
_imp.release_lock()
def _resolve_name(name, package, level):
"""Resolve a relative module name to an absolute one."""
bits = package.rsplit('.', level - 1)
if len(bits) < level:
raise ValueError('attempted relative import beyond top-level package')
base = bits[0]
return '{}.{}'.format(base, name) if name else base
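# Illustrative sketch, not part of the original module: how relative names
# resolve against an anchor package. 'pkg.sub' and 'mod' are example values.
def _example_resolve_name():
    # "from ..mod import x" inside pkg.sub: level=2 strips one trailing part.
    assert _resolve_name('mod', 'pkg.sub', 2) == 'pkg.mod'
    # "from . import x" inside pkg.sub: an empty name returns the base package.
    assert _resolve_name('', 'pkg.sub', 1) == 'pkg.sub'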
def _find_spec_legacy(finder, name, path):
# This would be a good place for a DeprecationWarning if
# we ended up going that route.
loader = finder.find_module(name, path)
if loader is None:
return None
return spec_from_loader(name, loader)
def _find_spec(name, path, target=None):
"""Find a module's loader."""
if not sys.meta_path:
_warnings.warn('sys.meta_path is empty', ImportWarning)
# We check sys.modules here for the reload case. While a passed-in
# target will usually indicate a reload there is no guarantee, whereas
# sys.modules provides one.
is_reload = name in sys.modules
for finder in sys.meta_path:
with _ImportLockContext():
try:
find_spec = finder.find_spec
except AttributeError:
spec = _find_spec_legacy(finder, name, path)
if spec is None:
continue
else:
spec = find_spec(name, path, target)
if spec is not None:
# The parent import may have already imported this module.
if not is_reload and name in sys.modules:
module = sys.modules[name]
try:
__spec__ = module.__spec__
except AttributeError:
# We use the found spec since that is the one that
# we would have used if the parent module hadn't
# beaten us to the punch.
return spec
else:
if __spec__ is None:
return spec
else:
return __spec__
else:
return spec
else:
return None
def _sanity_check(name, package, level):
"""Verify arguments are "sane"."""
if not isinstance(name, str):
raise TypeError('module name must be str, not {}'.format(type(name)))
if level < 0:
raise ValueError('level must be >= 0')
if package:
if not isinstance(package, str):
raise TypeError('__package__ not set to a string')
elif package not in sys.modules:
msg = ('Parent module {!r} not loaded, cannot perform relative '
'import')
raise SystemError(msg.format(package))
if not name and level == 0:
raise ValueError('Empty module name')
_ERR_MSG_PREFIX = 'No module named '
_ERR_MSG = _ERR_MSG_PREFIX + '{!r}'
def _find_and_load_unlocked(name, import_):
path = None
parent = name.rpartition('.')[0]
if parent:
if parent not in sys.modules:
_call_with_frames_removed(import_, parent)
# Crazy side-effects!
if name in sys.modules:
return sys.modules[name]
parent_module = sys.modules[parent]
try:
path = parent_module.__path__
except AttributeError:
msg = (_ERR_MSG + '; {!r} is not a package').format(name, parent)
raise ImportError(msg, name=name)
spec = _find_spec(name, path)
if spec is None:
raise ImportError(_ERR_MSG.format(name), name=name)
else:
module = _SpecMethods(spec)._load_unlocked()
if parent:
# Set the module as an attribute on its parent.
parent_module = sys.modules[parent]
setattr(parent_module, name.rpartition('.')[2], module)
return module
def _find_and_load(name, import_):
"""Find and load the module, and release the import lock."""
with _ModuleLockManager(name):
return _find_and_load_unlocked(name, import_)
def _gcd_import(name, package=None, level=0):
"""Import and return the module based on its name, the package the call is
being made from, and the level adjustment.
This function represents the greatest common denominator of functionality
between import_module and __import__. This includes setting __package__ if
the loader did not.
"""
_sanity_check(name, package, level)
if level > 0:
name = _resolve_name(name, package, level)
_imp.acquire_lock()
if name not in sys.modules:
return _find_and_load(name, _gcd_import)
module = sys.modules[name]
if module is None:
_imp.release_lock()
message = ('import of {} halted; '
'None in sys.modules'.format(name))
raise ImportError(message, name=name)
_lock_unlock_module(name)
return module
def _handle_fromlist(module, fromlist, import_):
"""Figure out what __import__ should return.
The import_ parameter is a callable which takes the name of module to
import. It is required to decouple the function from assuming importlib's
import implementation is desired.
"""
# The hell that is fromlist ...
# If a package was imported, try to import stuff from fromlist.
if hasattr(module, '__path__'):
if '*' in fromlist:
fromlist = list(fromlist)
fromlist.remove('*')
if hasattr(module, '__all__'):
fromlist.extend(module.__all__)
for x in fromlist:
if not hasattr(module, x):
from_name = '{}.{}'.format(module.__name__, x)
try:
_call_with_frames_removed(import_, from_name)
except ImportError as exc:
# Backwards-compatibility dictates we ignore failed
# imports triggered by fromlist for modules that don't
# exist.
if str(exc).startswith(_ERR_MSG_PREFIX):
if exc.name == from_name:
continue
raise
return module
def _calc___package__(globals):
"""Calculate what __package__ should be.
__package__ is not guaranteed to be defined or could be set to None
to represent that its proper value is unknown.
"""
package = globals.get('__package__')
if package is None:
package = globals['__name__']
if '__path__' not in globals:
package = package.rpartition('.')[0]
return package
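# Illustrative sketch, not part of the original module: the computed package
# for a plain module, a package (__path__ present), and an explicit value.
# The globals dicts below are synthetic.
def _example_calc_package():
    assert _calc___package__({'__name__': 'pkg.mod'}) == 'pkg'
    assert _calc___package__({'__name__': 'pkg', '__path__': []}) == 'pkg'
    assert _calc___package__({'__package__': 'given'}) == 'given'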
def _get_supported_file_loaders():
"""Returns a list of file-based module loaders.
Each item is a tuple (loader, suffixes).
"""
source = SourceFileLoader, SOURCE_SUFFIXES
return [source]
def __import__(name, globals=None, locals=None, fromlist=(), level=0):
"""Import a module.
The 'globals' argument is used to infer where the import is occurring from
to handle relative imports. The 'locals' argument is ignored. The
'fromlist' argument specifies what should exist as attributes on the module
being imported (e.g. ``from module import <fromlist>``). The 'level'
argument represents the package location to import from in a relative
import (e.g. ``from ..pkg import mod`` would have a 'level' of 2).
"""
if level == 0:
module = _gcd_import(name)
else:
globals_ = globals if globals is not None else {}
package = _calc___package__(globals_)
module = _gcd_import(name, package, level)
if not fromlist:
# Return up to the first dot in 'name'. This is complicated by the fact
# that 'name' may be relative.
if level == 0:
return _gcd_import(name.partition('.')[0])
elif not name:
return module
else:
# Figure out where to slice the module's name up to the first dot
# in 'name'.
cut_off = len(name) - len(name.partition('.')[0])
# Slice end needs to be positive to alleviate need to special-case
# when ``'.' not in name``.
return sys.modules[module.__name__[:len(module.__name__)-cut_off]]
else:
return _handle_fromlist(module, fromlist, _gcd_import)
def _builtin_from_name(name):
spec = BuiltinImporter.find_spec(name)
if spec is None:
raise ImportError('no built-in module named ' + name)
methods = _SpecMethods(spec)
return methods._load_unlocked()
def _setup(sys_module, _imp_module):
"""Setup importlib by importing needed built-in modules and injecting them
into the global namespace.
As sys is needed for sys.modules access and _imp is needed to load built-in
modules, those two modules must be explicitly passed in.
"""
global _imp, sys, BYTECODE_SUFFIXES
_imp = _imp_module
sys = sys_module
if sys.flags.optimize:
BYTECODE_SUFFIXES = OPTIMIZED_BYTECODE_SUFFIXES
else:
BYTECODE_SUFFIXES = DEBUG_BYTECODE_SUFFIXES
# Set up the spec for existing builtin/frozen modules.
module_type = type(sys)
for name, module in sys.modules.items():
if isinstance(module, module_type):
if name in sys.builtin_module_names:
loader = BuiltinImporter
elif _imp.is_frozen(name):
loader = FrozenImporter
else:
continue
spec = _spec_from_module(module, loader)
methods = _SpecMethods(spec)
methods.init_module_attrs(module)
# Directly load built-in modules needed during bootstrap.
self_module = sys.modules[__name__]
for builtin_name in ('_io', '_warnings', 'builtins', 'marshal'):
if builtin_name not in sys.modules:
builtin_module = _builtin_from_name(builtin_name)
else:
builtin_module = sys.modules[builtin_name]
setattr(self_module, builtin_name, builtin_module)
# Directly load the os module (needed during bootstrap).
os_details = ('posix', ['/']), ('nt', ['\\', '/'])
if sys.platform == 'win32': os_details = reversed(os_details) # ironpython: optimization to avoid ImportError exception
for builtin_os, path_separators in os_details:
# Assumption made in _path_join()
assert all(len(sep) == 1 for sep in path_separators)
path_sep = path_separators[0]
if builtin_os in sys.modules:
os_module = sys.modules[builtin_os]
break
else:
try:
os_module = _builtin_from_name(builtin_os)
break
except ImportError:
continue
else:
raise ImportError('importlib requires posix or nt')
setattr(self_module, '_os', os_module)
setattr(self_module, 'path_sep', path_sep)
setattr(self_module, 'path_separators', ''.join(path_separators))
# Directly load the _thread module (needed during bootstrap).
try:
thread_module = _builtin_from_name('_thread')
except ImportError:
# Python was built without threads
thread_module = None
setattr(self_module, '_thread', thread_module)
# Directly load the _weakref module (needed during bootstrap).
weakref_module = _builtin_from_name('_weakref')
setattr(self_module, '_weakref', weakref_module)
# Constants
setattr(self_module, '_relax_case', _make_relax_case())
EXTENSION_SUFFIXES.extend(_imp.extension_suffixes())
def _install(sys_module, _imp_module):
"""Install importlib as the implementation of import."""
_setup(sys_module, _imp_module)
supported_loaders = _get_supported_file_loaders()
sys.path_hooks.extend([FileFinder.path_hook(*supported_loaders)])
sys.meta_path.append(BuiltinImporter)
sys.meta_path.append(FrozenImporter)
sys.meta_path.append(PathFinder)
"""A pure Python implementation of import."""
__all__ = ['__import__', 'import_module', 'invalidate_caches', 'reload']
# Bootstrap help #####################################################
# Until bootstrapping is complete, DO NOT import any modules that attempt
# to import importlib._bootstrap (directly or indirectly). Since this
# partially initialised package would be present in sys.modules, those
# modules would get an uninitialised copy of the source version, instead
# of a fully initialised version (either the frozen one or the one
# initialised below if the frozen one is not available).
import _imp # Just the builtin component, NOT the full Python module
import sys
try:
import _frozen_importlib as _bootstrap
except ImportError:
from . import _bootstrap
_bootstrap._setup(sys, _imp)
else:
# importlib._bootstrap is the built-in import, ensure we don't create
# a second copy of the module.
_bootstrap.__name__ = 'importlib._bootstrap'
_bootstrap.__package__ = 'importlib'
try:
_bootstrap.__file__ = __file__.replace('__init__.py', '_bootstrap.py')
except NameError:
# __file__ is not guaranteed to be defined, e.g. if this code gets
# frozen by a tool like cx_Freeze.
pass
sys.modules['importlib._bootstrap'] = _bootstrap
# To simplify imports in test code
_w_long = _bootstrap._w_long
_r_long = _bootstrap._r_long
# Fully bootstrapped at this point, import whatever you like, circular
# dependencies and startup overhead minimisation permitting :)
import types
import warnings
# Public API #########################################################
from ._bootstrap import __import__
def invalidate_caches():
"""Call the invalidate_caches() method on all meta path finders stored in
sys.meta_path (where implemented)."""
for finder in sys.meta_path:
if hasattr(finder, 'invalidate_caches'):
finder.invalidate_caches()
def find_loader(name, path=None):
"""Return the loader for the specified module.
This is a backward-compatible wrapper around find_spec().
This function is deprecated in favor of importlib.util.find_spec().
"""
warnings.warn('Use importlib.util.find_spec() instead.',
DeprecationWarning, stacklevel=2)
try:
loader = sys.modules[name].__loader__
if loader is None:
raise ValueError('{}.__loader__ is None'.format(name))
else:
return loader
except KeyError:
pass
except AttributeError:
raise ValueError('{}.__loader__ is not set'.format(name))
spec = _bootstrap._find_spec(name, path)
# We won't worry about malformed specs (missing attributes).
if spec is None:
return None
if spec.loader is None:
if spec.submodule_search_locations is None:
raise ImportError('spec for {} missing loader'.format(name),
name=name)
raise ImportError('namespace packages do not have loaders',
name=name)
return spec.loader
def import_module(name, package=None):
"""Import a module.
The 'package' argument is required when performing a relative import. It
specifies the package to use as the anchor point from which to resolve the
relative import to an absolute import.
"""
level = 0
if name.startswith('.'):
if not package:
msg = ("the 'package' argument is required to perform a relative "
"import for {!r}")
raise TypeError(msg.format(name))
for character in name:
if character != '.':
break
level += 1
return _bootstrap._gcd_import(name[level:], package, level)
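# Illustrative sketch, not part of the original module: absolute versus
# relative use of import_module(). 'json' is used as a readily available
# stdlib package.
def _example_import_module():
    json_mod = import_module('json')                     # absolute import
    decoder = import_module('.decoder', package='json')  # relative, anchored
    assert decoder is json_mod.decoder
    return decoder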
_RELOADING = {}
def reload(module):
"""Reload the module and return it.
The module must have been successfully imported before.
"""
if not module or not isinstance(module, types.ModuleType):
raise TypeError("reload() argument must be module")
try:
name = module.__spec__.name
except AttributeError:
name = module.__name__
if sys.modules.get(name) is not module:
msg = "module {} not in sys.modules"
raise ImportError(msg.format(name), name=name)
if name in _RELOADING:
return _RELOADING[name]
_RELOADING[name] = module
try:
parent_name = name.rpartition('.')[0]
if parent_name:
try:
parent = sys.modules[parent_name]
except KeyError:
msg = "parent {!r} not in sys.modules"
raise ImportError(msg.format(parent_name), name=parent_name)
else:
pkgpath = parent.__path__
else:
pkgpath = None
target = module
spec = module.__spec__ = _bootstrap._find_spec(name, pkgpath, target)
methods = _bootstrap._SpecMethods(spec)
methods.exec(module)
# The module may have replaced itself in sys.modules!
return sys.modules[name]
finally:
try:
del _RELOADING[name]
except KeyError:
pass
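# Illustrative sketch, not part of the original module: a reload round-trip.
# Editing the module's source between the two calls is implied, not shown.
def _example_reload():
    mod = import_module('json')
    mod = reload(mod)  # re-executes the module's code in the same namespace
    return mod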
"""Implementation of JSONDecoder
"""
import re
from json import scanner
try:
from _json import scanstring as c_scanstring
except ImportError:
c_scanstring = None
__all__ = ['JSONDecoder']
FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
NaN = float('nan')
PosInf = float('inf')
NegInf = float('-inf')
def linecol(doc, pos):
if isinstance(doc, bytes):
newline = b'\n'
else:
newline = '\n'
lineno = doc.count(newline, 0, pos) + 1
if lineno == 1:
colno = pos + 1
else:
colno = pos - doc.rindex(newline, 0, pos)
return lineno, colno
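# Illustrative sketch, not part of the original module: positions passed in are
# 0-based indexes, while the reported line and column are 1-based.
def _example_linecol():
    doc = '{"a":\n 1}'
    assert linecol(doc, 0) == (1, 1)  # the opening brace
    assert linecol(doc, 7) == (2, 2)  # the digit on the second line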
def errmsg(msg, doc, pos, end=None):
# Note that this function is called from _json
lineno, colno = linecol(doc, pos)
if end is None:
fmt = '{0}: line {1} column {2} (char {3})'
return fmt.format(msg, lineno, colno, pos)
#fmt = '%s: line %d column %d (char %d)'
#return fmt % (msg, lineno, colno, pos)
endlineno, endcolno = linecol(doc, end)
fmt = '{0}: line {1} column {2} - line {3} column {4} (char {5} - {6})'
return fmt.format(msg, lineno, colno, endlineno, endcolno, pos, end)
#fmt = '%s: line %d column %d - line %d column %d (char %d - %d)'
#return fmt % (msg, lineno, colno, endlineno, endcolno, pos, end)
_CONSTANTS = {
'-Infinity': NegInf,
'Infinity': PosInf,
'NaN': NaN,
}
STRINGCHUNK = re.compile(r'(.*?)(["\\\x00-\x1f])', FLAGS)
BACKSLASH = {
'"': '"', '\\': '\\', '/': '/',
'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t',
}
def _decode_uXXXX(s, pos):
esc = s[pos + 1:pos + 5]
if len(esc) == 4 and esc[1] not in 'xX':
try:
return int(esc, 16)
except ValueError:
pass
msg = "Invalid \\uXXXX escape"
raise ValueError(errmsg(msg, s, pos))
def py_scanstring(s, end, strict=True,
_b=BACKSLASH, _m=STRINGCHUNK.match):
"""Scan the string s for a JSON string. End is the index of the
character in s after the quote that started the JSON string.
Unescapes all valid JSON string escape sequences and raises ValueError
on attempt to decode an invalid string. If strict is False then literal
control characters are allowed in the string.
Returns a tuple of the decoded string and the index of the character in s
after the end quote."""
chunks = []
_append = chunks.append
begin = end - 1
while 1:
chunk = _m(s, end)
if chunk is None:
raise ValueError(
errmsg("Unterminated string starting at", s, begin))
end = chunk.end()
content, terminator = chunk.groups()
# Content contains zero or more unescaped string characters
if content:
_append(content)
# Terminator is the end of string, a literal control character,
# or a backslash denoting that an escape sequence follows
if terminator == '"':
break
elif terminator != '\\':
if strict:
#msg = "Invalid control character %r at" % (terminator,)
msg = "Invalid control character {0!r} at".format(terminator)
raise ValueError(errmsg(msg, s, end))
else:
_append(terminator)
continue
try:
esc = s[end]
except IndexError:
raise ValueError(
errmsg("Unterminated string starting at", s, begin))
# If not a unicode escape sequence, must be in the lookup table
if esc != 'u':
try:
char = _b[esc]
except KeyError:
msg = "Invalid \\escape: {0!r}".format(esc)
raise ValueError(errmsg(msg, s, end))
end += 1
else:
uni = _decode_uXXXX(s, end)
end += 5
if 0xd800 <= uni <= 0xdbff and s[end:end + 2] == '\\u':
uni2 = _decode_uXXXX(s, end + 1)
if 0xdc00 <= uni2 <= 0xdfff:
uni = 0x10000 + (((uni - 0xd800) << 10) | (uni2 - 0xdc00))
end += 6
char = chr(uni)
_append(char)
return ''.join(chunks), end
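# Illustrative sketch, not part of the original module: scanning one JSON
# string literal. 'end' is the index just past the opening quote, and the
# returned index points just past the closing quote.
def _example_scanstring():
    s = '"a\\nb" tail'
    value, end = py_scanstring(s, 1)
    assert value == 'a\nb'     # the \n escape was decoded
    assert s[end:] == ' tail'  # scanning stopped at the closing quote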
# Use speedup if available
scanstring = c_scanstring or py_scanstring
WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)
WHITESPACE_STR = ' \t\n\r'
def JSONObject(s_and_end, strict, scan_once, object_hook, object_pairs_hook,
memo=None, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
s, end = s_and_end
pairs = []
pairs_append = pairs.append
# Backwards compatibility
if memo is None:
memo = {}
memo_get = memo.setdefault
# Use a slice to prevent IndexError from being raised; the following
# check will raise a more specific ValueError if the string is empty.
nextchar = s[end:end + 1]
# Normally we expect nextchar == '"'
if nextchar != '"':
if nextchar in _ws:
end = _w(s, end).end()
nextchar = s[end:end + 1]
# Trivial empty object
if nextchar == '}':
if object_pairs_hook is not None:
result = object_pairs_hook(pairs)
return result, end + 1
pairs = {}
if object_hook is not None:
pairs = object_hook(pairs)
return pairs, end + 1
elif nextchar != '"':
raise ValueError(errmsg(
"Expecting property name enclosed in double quotes", s, end))
end += 1
while True:
key, end = scanstring(s, end, strict)
key = memo_get(key, key)
# To skip some function call overhead we optimize the fast paths where
# the JSON key separator is ": " or just ":".
if s[end:end + 1] != ':':
end = _w(s, end).end()
if s[end:end + 1] != ':':
raise ValueError(errmsg("Expecting ':' delimiter", s, end))
end += 1
try:
if s[end] in _ws:
end += 1
if s[end] in _ws:
end = _w(s, end + 1).end()
except IndexError:
pass
try:
value, end = scan_once(s, end)
except StopIteration as err:
raise ValueError(errmsg("Expecting value", s, err.value)) from None
pairs_append((key, value))
try:
nextchar = s[end]
if nextchar in _ws:
end = _w(s, end + 1).end()
nextchar = s[end]
except IndexError:
nextchar = ''
end += 1
if nextchar == '}':
break
elif nextchar != ',':
raise ValueError(errmsg("Expecting ',' delimiter", s, end - 1))
end = _w(s, end).end()
nextchar = s[end:end + 1]
end += 1
if nextchar != '"':
raise ValueError(errmsg(
"Expecting property name enclosed in double quotes", s, end - 1))
if object_pairs_hook is not None:
result = object_pairs_hook(pairs)
return result, end
pairs = dict(pairs)
if object_hook is not None:
pairs = object_hook(pairs)
return pairs, end
def JSONArray(s_and_end, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
s, end = s_and_end
values = []
nextchar = s[end:end + 1]
if nextchar in _ws:
end = _w(s, end + 1).end()
nextchar = s[end:end + 1]
# Look-ahead for trivial empty array
if nextchar == ']':
return values, end + 1
_append = values.append
while True:
try:
value, end = scan_once(s, end)
except StopIteration as err:
raise ValueError(errmsg("Expecting value", s, err.value)) from None
_append(value)
nextchar = s[end:end + 1]
if nextchar in _ws:
end = _w(s, end + 1).end()
nextchar = s[end:end + 1]
end += 1
if nextchar == ']':
break
elif nextchar != ',':
raise ValueError(errmsg("Expecting ',' delimiter", s, end - 1))
try:
if s[end] in _ws:
end += 1
if s[end] in _ws:
end = _w(s, end + 1).end()
except IndexError:
pass
return values, end
class JSONDecoder(object):
"""Simple JSON <http://json.org> decoder
Performs the following translations in decoding by default:
+---------------+-------------------+
| JSON | Python |
+===============+===================+
| object | dict |
+---------------+-------------------+
| array | list |
+---------------+-------------------+
| string | str |
+---------------+-------------------+
| number (int) | int |
+---------------+-------------------+
| number (real) | float |
+---------------+-------------------+
| true | True |
+---------------+-------------------+
| false | False |
+---------------+-------------------+
| null | None |
+---------------+-------------------+
It also understands ``NaN``, ``Infinity``, and ``-Infinity`` as
their corresponding ``float`` values, which is outside the JSON spec.
"""
def __init__(self, object_hook=None, parse_float=None,
parse_int=None, parse_constant=None, strict=True,
object_pairs_hook=None):
"""``object_hook``, if specified, will be called with the result
of every JSON object decoded and its return value will be used in
place of the given ``dict``. This can be used to provide custom
deserializations (e.g. to support JSON-RPC class hinting).
``object_pairs_hook``, if specified, will be called with the result of
every JSON object decoded with an ordered list of pairs. The return
value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes
priority.
``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).
``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).
``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.
If ``strict`` is false (true is the default), then control
characters will be allowed inside strings. Control characters in
this context are those with character codes in the 0-31 range,
including ``'\\t'`` (tab), ``'\\n'``, ``'\\r'`` and ``'\\0'``.
"""
self.object_hook = object_hook
self.parse_float = parse_float or float
self.parse_int = parse_int or int
self.parse_constant = parse_constant or _CONSTANTS.__getitem__
self.strict = strict
self.object_pairs_hook = object_pairs_hook
self.parse_object = JSONObject
self.parse_array = JSONArray
self.parse_string = scanstring
self.memo = {}
self.scan_once = scanner.make_scanner(self)
def decode(self, s, _w=WHITESPACE.match):
"""Return the Python representation of ``s`` (a ``str`` instance
containing a JSON document).
"""
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
end = _w(s, end).end()
if end != len(s):
raise ValueError(errmsg("Extra data", s, end, len(s)))
return obj
def raw_decode(self, s, idx=0):
"""Decode a JSON document from ``s`` (a ``str`` beginning with
a JSON document) and return a 2-tuple of the Python
representation and the index in ``s`` where the document ended.
This can be used to decode a JSON document from a string that may
have extraneous data at the end.
"""
try:
obj, end = self.scan_once(s, idx)
except StopIteration as err:
raise ValueError(errmsg("Expecting value", s, err.value)) from None
return obj, end
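# Illustrative sketch, not part of the original module: decoding with
# object_pairs_hook, and using raw_decode() to tolerate trailing data.
# The document text below is synthetic.
def _example_decoder():
    dec = JSONDecoder(object_pairs_hook=list)
    obj, end = dec.raw_decode('{"a": 1, "b": 2} trailing')
    assert obj == [('a', 1), ('b', 2)]  # key order preserved as pairs
    return obj, end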
"""Implementation of JSONEncoder
"""
import re
try:
from _json import encode_basestring_ascii as c_encode_basestring_ascii
except ImportError:
c_encode_basestring_ascii = None
try:
from _json import make_encoder as c_make_encoder
except ImportError:
c_make_encoder = None
ESCAPE = re.compile(r'[\x00-\x1f\\"\b\f\n\r\t]')
ESCAPE_ASCII = re.compile(r'([\\"]|[^\ -~])')
HAS_UTF8 = re.compile(b'[\x80-\xff]')
ESCAPE_DCT = {
'\\': '\\\\',
'"': '\\"',
'\b': '\\b',
'\f': '\\f',
'\n': '\\n',
'\r': '\\r',
'\t': '\\t',
}
for i in range(0x20):
ESCAPE_DCT.setdefault(chr(i), '\\u{0:04x}'.format(i))
#ESCAPE_DCT.setdefault(chr(i), '\\u%04x' % (i,))
INFINITY = float('inf')
FLOAT_REPR = repr
def encode_basestring(s):
"""Return a JSON representation of a Python string
"""
def replace(match):
return ESCAPE_DCT[match.group(0)]
return '"' + ESCAPE.sub(replace, s) + '"'
def py_encode_basestring_ascii(s):
"""Return an ASCII-only JSON representation of a Python string
"""
def replace(match):
s = match.group(0)
try:
return ESCAPE_DCT[s]
except KeyError:
n = ord(s)
if n < 0x10000:
return '\\u{0:04x}'.format(n)
#return '\\u%04x' % (n,)
else:
# surrogate pair
n -= 0x10000
s1 = 0xd800 | ((n >> 10) & 0x3ff)
s2 = 0xdc00 | (n & 0x3ff)
return '\\u{0:04x}\\u{1:04x}'.format(s1, s2)
return '"' + ESCAPE_ASCII.sub(replace, s) + '"'
encode_basestring_ascii = (
c_encode_basestring_ascii or py_encode_basestring_ascii)
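# A usage sketch, not part of the module above: the ASCII encoder escapes
# non-BMP characters as UTF-16 surrogate pairs (the arithmetic in
# py_encode_basestring_ascii), while encode_basestring leaves non-ASCII alone.
from json.encoder import encode_basestring, py_encode_basestring_ascii
assert py_encode_basestring_ascii('\U0001F600') == '"\\ud83d\\ude00"'
assert encode_basestring('snowman: \u2603') == '"snowman: \u2603"'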
class JSONEncoder(object):
"""Extensible JSON <http://json.org> encoder for Python data structures.
Supports the following objects and types by default:
+-------------------+---------------+
| Python | JSON |
+===================+===============+
| dict | object |
+-------------------+---------------+
| list, tuple | array |
+-------------------+---------------+
| str | string |
+-------------------+---------------+
| int, float | number |
+-------------------+---------------+
| True | true |
+-------------------+---------------+
| False | false |
+-------------------+---------------+
| None | null |
+-------------------+---------------+
    To extend this to recognize other objects, subclass and implement a
    ``.default()`` method that returns a serializable object for ``o`` if
    possible; otherwise it should call the superclass implementation (to
    raise ``TypeError``).
"""
item_separator = ', '
key_separator = ': '
def __init__(self, skipkeys=False, ensure_ascii=True,
check_circular=True, allow_nan=True, sort_keys=False,
indent=None, separators=None, default=None):
"""Constructor for JSONEncoder, with sensible defaults.
        If skipkeys is false, then it is a TypeError to attempt
        encoding of keys that are not str, int, float, bool or None. If
        skipkeys is true, such items are simply skipped.
If ensure_ascii is true, the output is guaranteed to be str
objects with all incoming non-ASCII characters escaped. If
ensure_ascii is false, the output can contain non-ASCII characters.
If check_circular is true, then lists, dicts, and custom encoded
objects will be checked for circular references during encoding to
prevent an infinite recursion (which would cause an OverflowError).
Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be
encoded as such. This behavior is not JSON specification compliant,
but is consistent with most JavaScript based encoders and decoders.
Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be
sorted by key; this is useful for regression tests to ensure
that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array
elements and object members will be pretty-printed with that
indent level. An indent level of 0 will only insert newlines.
None is the most compact representation.
If specified, separators should be an (item_separator, key_separator)
tuple. The default is (', ', ': ') if *indent* is ``None`` and
(',', ': ') otherwise. To get the most compact JSON representation,
you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects
that can't otherwise be serialized. It should return a JSON encodable
version of the object or raise a ``TypeError``.
"""
self.skipkeys = skipkeys
self.ensure_ascii = ensure_ascii
self.check_circular = check_circular
self.allow_nan = allow_nan
self.sort_keys = sort_keys
self.indent = indent
if separators is not None:
self.item_separator, self.key_separator = separators
elif indent is not None:
self.item_separator = ','
if default is not None:
self.default = default
def default(self, o):
"""Implement this method in a subclass such that it returns
a serializable object for ``o``, or calls the base implementation
(to raise a ``TypeError``).
For example, to support arbitrary iterators, you could
implement default like this::
def default(self, o):
try:
iterable = iter(o)
except TypeError:
pass
else:
return list(iterable)
# Let the base class default method raise the TypeError
return JSONEncoder.default(self, o)
"""
raise TypeError(repr(o) + " is not JSON serializable")
def encode(self, o):
"""Return a JSON string representation of a Python data structure.
>>> from json.encoder import JSONEncoder
>>> JSONEncoder().encode({"foo": ["bar", "baz"]})
'{"foo": ["bar", "baz"]}'
"""
# This is for extremely simple cases and benchmarks.
if isinstance(o, str):
if self.ensure_ascii:
return encode_basestring_ascii(o)
else:
return encode_basestring(o)
# This doesn't pass the iterator directly to ''.join() because the
# exceptions aren't as detailed. The list call should be roughly
# equivalent to the PySequence_Fast that ''.join() would do.
chunks = self.iterencode(o, _one_shot=True)
if not isinstance(chunks, (list, tuple)):
chunks = list(chunks)
return ''.join(chunks)
def iterencode(self, o, _one_shot=False):
"""Encode the given object and yield each string
representation as available.
For example::
for chunk in JSONEncoder().iterencode(bigobject):
mysocket.write(chunk)
"""
if self.check_circular:
markers = {}
else:
markers = None
if self.ensure_ascii:
_encoder = encode_basestring_ascii
else:
_encoder = encode_basestring
def floatstr(o, allow_nan=self.allow_nan,
_repr=FLOAT_REPR, _inf=INFINITY, _neginf=-INFINITY):
# Check for specials. Note that this type of test is processor
# and/or platform-specific, so do tests which don't depend on the
# internals.
if o != o:
text = 'NaN'
elif o == _inf:
text = 'Infinity'
elif o == _neginf:
text = '-Infinity'
else:
return _repr(o)
if not allow_nan:
raise ValueError(
"Out of range float values are not JSON compliant: " +
repr(o))
return text
if (_one_shot and c_make_encoder is not None
and self.indent is None):
_iterencode = c_make_encoder(
markers, self.default, _encoder, self.indent,
self.key_separator, self.item_separator, self.sort_keys,
self.skipkeys, self.allow_nan)
else:
_iterencode = _make_iterencode(
markers, self.default, _encoder, self.indent, floatstr,
self.key_separator, self.item_separator, self.sort_keys,
self.skipkeys, _one_shot)
return _iterencode(o, 0)
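# A usage sketch, not part of the module above (JSONEncoder below is the
# stdlib equivalent of the class defined here): iterencode streams chunks,
# and a ``default`` hook rescues an otherwise unserializable type.
import io
from json import JSONEncoder
def _default(o):
    if isinstance(o, set):
        return sorted(o)                # encode sets as sorted lists
    raise TypeError(repr(o) + " is not JSON serializable")
_buf = io.StringIO()
for _chunk in JSONEncoder(default=_default).iterencode({"ids": {3, 1, 2}}):
    _buf.write(_chunk)                  # chunks arrive incrementally
assert _buf.getvalue() == '{"ids": [1, 2, 3]}'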
def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
_key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
## HACK: hand-optimized bytecode; turn globals into locals
ValueError=ValueError,
dict=dict,
float=float,
id=id,
int=int,
isinstance=isinstance,
list=list,
str=str,
tuple=tuple,
):
if _indent is not None and not isinstance(_indent, str):
_indent = ' ' * _indent
def _iterencode_list(lst, _current_indent_level):
if not lst:
yield '[]'
return
if markers is not None:
markerid = id(lst)
if markerid in markers:
raise ValueError("Circular reference detected")
markers[markerid] = lst
buf = '['
if _indent is not None:
_current_indent_level += 1
newline_indent = '\n' + _indent * _current_indent_level
separator = _item_separator + newline_indent
buf += newline_indent
else:
newline_indent = None
separator = _item_separator
first = True
for value in lst:
if first:
first = False
else:
buf = separator
if isinstance(value, str):
yield buf + _encoder(value)
elif value is None:
yield buf + 'null'
elif value is True:
yield buf + 'true'
elif value is False:
yield buf + 'false'
elif isinstance(value, int):
# Subclasses of int/float may override __str__, but we still
# want to encode them as integers/floats in JSON. One example
# within the standard library is IntEnum.
yield buf + str(int(value))
elif isinstance(value, float):
# see comment above for int
yield buf + _floatstr(float(value))
else:
yield buf
if isinstance(value, (list, tuple)):
chunks = _iterencode_list(value, _current_indent_level)
elif isinstance(value, dict):
chunks = _iterencode_dict(value, _current_indent_level)
else:
chunks = _iterencode(value, _current_indent_level)
yield from chunks
if newline_indent is not None:
_current_indent_level -= 1
yield '\n' + _indent * _current_indent_level
yield ']'
if markers is not None:
del markers[markerid]
def _iterencode_dict(dct, _current_indent_level):
if not dct:
yield '{}'
return
if markers is not None:
markerid = id(dct)
if markerid in markers:
raise ValueError("Circular reference detected")
markers[markerid] = dct
yield '{'
if _indent is not None:
_current_indent_level += 1
newline_indent = '\n' + _indent * _current_indent_level
item_separator = _item_separator + newline_indent
yield newline_indent
else:
newline_indent = None
item_separator = _item_separator
first = True
if _sort_keys:
items = sorted(dct.items(), key=lambda kv: kv[0])
else:
items = dct.items()
for key, value in items:
if isinstance(key, str):
pass
# JavaScript is weakly typed for these, so it makes sense to
# also allow them. Many encoders seem to do something like this.
elif isinstance(key, float):
# see comment for int/float in _make_iterencode
key = _floatstr(float(key))
elif key is True:
key = 'true'
elif key is False:
key = 'false'
elif key is None:
key = 'null'
elif isinstance(key, int):
# see comment for int/float in _make_iterencode
key = str(int(key))
elif _skipkeys:
continue
else:
raise TypeError("key " + repr(key) + " is not a string")
if first:
first = False
else:
yield item_separator
yield _encoder(key)
yield _key_separator
if isinstance(value, str):
yield _encoder(value)
elif value is None:
yield 'null'
elif value is True:
yield 'true'
elif value is False:
yield 'false'
elif isinstance(value, int):
# see comment for int/float in _make_iterencode
yield str(int(value))
elif isinstance(value, float):
# see comment for int/float in _make_iterencode
yield _floatstr(float(value))
else:
if isinstance(value, (list, tuple)):
chunks = _iterencode_list(value, _current_indent_level)
elif isinstance(value, dict):
chunks = _iterencode_dict(value, _current_indent_level)
else:
chunks = _iterencode(value, _current_indent_level)
yield from chunks
if newline_indent is not None:
_current_indent_level -= 1
yield '\n' + _indent * _current_indent_level
yield '}'
if markers is not None:
del markers[markerid]
def _iterencode(o, _current_indent_level):
if isinstance(o, str):
yield _encoder(o)
elif o is None:
yield 'null'
elif o is True:
yield 'true'
elif o is False:
yield 'false'
elif isinstance(o, int):
# see comment for int/float in _make_iterencode
yield str(int(o))
elif isinstance(o, float):
# see comment for int/float in _make_iterencode
yield _floatstr(float(o))
elif isinstance(o, (list, tuple)):
yield from _iterencode_list(o, _current_indent_level)
elif isinstance(o, dict):
yield from _iterencode_dict(o, _current_indent_level)
else:
if markers is not None:
markerid = id(o)
if markerid in markers:
raise ValueError("Circular reference detected")
markers[markerid] = o
o = _default(o)
yield from _iterencode(o, _current_indent_level)
if markers is not None:
del markers[markerid]
return _iterencode
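# A usage sketch, not part of the module above: how ``indent`` flows through
# _make_iterencode. indent=0 inserts newlines only; a positive int indents
# by that many spaces per level.
from json import JSONEncoder
assert JSONEncoder(indent=0).encode([1, {"a": 2}]) == '[\n1,\n{\n"a": 2\n}\n]'
assert (JSONEncoder(indent=2).encode({"a": [1, 2]})
        == '{\n  "a": [\n    1,\n    2\n  ]\n}')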
"""JSON token scanner
"""
import re
try:
from _json import make_scanner as c_make_scanner
except ImportError:
c_make_scanner = None
__all__ = ['make_scanner']
NUMBER_RE = re.compile(
r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
(re.VERBOSE | re.MULTILINE | re.DOTALL))
def py_make_scanner(context):
parse_object = context.parse_object
parse_array = context.parse_array
parse_string = context.parse_string
match_number = NUMBER_RE.match
strict = context.strict
parse_float = context.parse_float
parse_int = context.parse_int
parse_constant = context.parse_constant
object_hook = context.object_hook
object_pairs_hook = context.object_pairs_hook
memo = context.memo
def _scan_once(string, idx):
try:
nextchar = string[idx]
except IndexError:
raise StopIteration(idx)
if nextchar == '"':
return parse_string(string, idx + 1, strict)
elif nextchar == '{':
return parse_object((string, idx + 1), strict,
_scan_once, object_hook, object_pairs_hook, memo)
elif nextchar == '[':
return parse_array((string, idx + 1), _scan_once)
elif nextchar == 'n' and string[idx:idx + 4] == 'null':
return None, idx + 4
elif nextchar == 't' and string[idx:idx + 4] == 'true':
return True, idx + 4
elif nextchar == 'f' and string[idx:idx + 5] == 'false':
return False, idx + 5
m = match_number(string, idx)
if m is not None:
integer, frac, exp = m.groups()
if frac or exp:
res = parse_float(integer + (frac or '') + (exp or ''))
else:
res = parse_int(integer)
return res, m.end()
elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
return parse_constant('NaN'), idx + 3
elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
return parse_constant('Infinity'), idx + 8
elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
return parse_constant('-Infinity'), idx + 9
else:
raise StopIteration(idx)
def scan_once(string, idx):
try:
return _scan_once(string, idx)
finally:
memo.clear()
    return scan_once
make_scanner = c_make_scanner or py_make_scanner
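# A usage sketch, not part of the module above: the scanner is parameterized
# by a decoder "context" that supplies the parse_* callables; a stock
# JSONDecoder works as one.
from json.decoder import JSONDecoder
from json.scanner import py_make_scanner
_scan = py_make_scanner(JSONDecoder())   # force the pure-Python scanner
assert _scan('[true, 1, 2.5e0]', 0) == ([True, 1, 2.5], 16)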
r"""Command-line tool to validate and pretty-print JSON
Usage::
$ echo '{"json":"obj"}' | python -m json.tool
{
"json": "obj"
}
$ echo '{ 1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
"""
import sys
import json
def main():
if len(sys.argv) == 1:
infile = sys.stdin
outfile = sys.stdout
elif len(sys.argv) == 2:
infile = open(sys.argv[1], 'r')
outfile = sys.stdout
elif len(sys.argv) == 3:
infile = open(sys.argv[1], 'r')
outfile = open(sys.argv[2], 'w')
else:
raise SystemExit(sys.argv[0] + " [infile [outfile]]")
with infile:
try:
obj = json.load(infile)
except ValueError as e:
raise SystemExit(e)
with outfile:
json.dump(obj, outfile, sort_keys=True, indent=4)
outfile.write('\n')
if __name__ == '__main__':
main()
r"""JSON (JavaScript Object Notation) <http://json.org> is a subset of
JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
interchange format.
:mod:`json` exposes an API familiar to users of the standard library
:mod:`marshal` and :mod:`pickle` modules. It is derived from a
version of the externally maintained simplejson library.
Encoding basic Python object hierarchies::
>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> print(json.dumps("\"foo\bar"))
"\"foo\bar"
>>> print(json.dumps('\u1234'))
"\u1234"
>>> print(json.dumps('\\'))
"\\"
>>> print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
{"a": 0, "b": 0, "c": 0}
>>> from io import StringIO
>>> io = StringIO()
>>> json.dump(['streaming API'], io)
>>> io.getvalue()
'["streaming API"]'
Compact encoding::
>>> import json
>>> from collections import OrderedDict
>>> mydict = OrderedDict([('4', 5), ('6', 7)])
>>> json.dumps([1,2,3,mydict], separators=(',', ':'))
'[1,2,3,{"4":5,"6":7}]'
Pretty printing::
>>> import json
>>> print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4))
{
"4": 5,
"6": 7
}
Decoding JSON::
>>> import json
>>> obj = ['foo', {'bar': ['baz', None, 1.0, 2]}]
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]') == obj
True
>>> json.loads('"\\"foo\\bar"') == '"foo\x08ar'
True
>>> from io import StringIO
>>> io = StringIO('["streaming API"]')
>>> json.load(io)[0] == 'streaming API'
True
Specializing JSON object decoding::
>>> import json
>>> def as_complex(dct):
... if '__complex__' in dct:
... return complex(dct['real'], dct['imag'])
... return dct
...
>>> json.loads('{"__complex__": true, "real": 1, "imag": 2}',
... object_hook=as_complex)
(1+2j)
>>> from decimal import Decimal
>>> json.loads('1.1', parse_float=Decimal) == Decimal('1.1')
True
Specializing JSON object encoding::
>>> import json
>>> def encode_complex(obj):
... if isinstance(obj, complex):
... return [obj.real, obj.imag]
    ...     raise TypeError(repr(obj) + " is not JSON serializable")
...
>>> json.dumps(2 + 1j, default=encode_complex)
'[2.0, 1.0]'
>>> json.JSONEncoder(default=encode_complex).encode(2 + 1j)
'[2.0, 1.0]'
>>> ''.join(json.JSONEncoder(default=encode_complex).iterencode(2 + 1j))
'[2.0, 1.0]'
Using json.tool from the shell to validate and pretty-print::
$ echo '{"json":"obj"}' | python -m json.tool
{
"json": "obj"
}
$ echo '{ 1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
"""
__version__ = '2.0.9'
__all__ = [
'dump', 'dumps', 'load', 'loads',
'JSONDecoder', 'JSONEncoder',
]
__author__ = 'Bob Ippolito <[email protected]>'
from .decoder import JSONDecoder
from .encoder import JSONEncoder
_default_encoder = JSONEncoder(
skipkeys=False,
ensure_ascii=True,
check_circular=True,
allow_nan=True,
indent=None,
separators=None,
default=None,
)
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
allow_nan=True, cls=None, indent=None, separators=None,
default=None, sort_keys=False, **kw):
"""Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
``.write()``-supporting file-like object).
If ``skipkeys`` is true then ``dict`` keys that are not basic types
(``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
instead of raising a ``TypeError``.
If ``ensure_ascii`` is false, then the strings written to ``fp`` can
contain non-ASCII characters if they appear in strings contained in
``obj``. Otherwise, all such characters are escaped in JSON strings.
If ``check_circular`` is false, then the circular reference check
for container types will be skipped and a circular reference will
result in an ``OverflowError`` (or worse).
If ``allow_nan`` is false, then it will be a ``ValueError`` to
serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
    in strict compliance with the JSON specification, instead of using the
JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).
If ``indent`` is a non-negative integer, then JSON array elements and
object members will be pretty-printed with that indent level. An indent
level of 0 will only insert newlines. ``None`` is the most compact
representation.
If specified, ``separators`` should be an ``(item_separator, key_separator)``
tuple. The default is ``(', ', ': ')`` if *indent* is ``None`` and
``(',', ': ')`` otherwise. To get the most compact JSON representation,
you should specify ``(',', ':')`` to eliminate whitespace.
``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.
If *sort_keys* is ``True`` (default: ``False``), then the output of
dictionaries will be sorted by key.
To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
``.default()`` method to serialize additional types), specify it with
the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.
"""
# cached encoder
if (not skipkeys and ensure_ascii and
check_circular and allow_nan and
cls is None and indent is None and separators is None and
default is None and not sort_keys and not kw):
iterable = _default_encoder.iterencode(obj)
else:
if cls is None:
cls = JSONEncoder
iterable = cls(skipkeys=skipkeys, ensure_ascii=ensure_ascii,
check_circular=check_circular, allow_nan=allow_nan, indent=indent,
separators=separators,
default=default, sort_keys=sort_keys, **kw).iterencode(obj)
# could accelerate with writelines in some versions of Python, at
# a debuggability cost
for chunk in iterable:
fp.write(chunk)
def dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True,
allow_nan=True, cls=None, indent=None, separators=None,
default=None, sort_keys=False, **kw):
"""Serialize ``obj`` to a JSON formatted ``str``.
If ``skipkeys`` is true then ``dict`` keys that are not basic types
(``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
instead of raising a ``TypeError``.
If ``ensure_ascii`` is false, then the return value can contain non-ASCII
characters if they appear in strings contained in ``obj``. Otherwise, all
such characters are escaped in JSON strings.
If ``check_circular`` is false, then the circular reference check
for container types will be skipped and a circular reference will
result in an ``OverflowError`` (or worse).
If ``allow_nan`` is false, then it will be a ``ValueError`` to
serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``) in
    strict compliance with the JSON specification, instead of using the
JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).
If ``indent`` is a non-negative integer, then JSON array elements and
object members will be pretty-printed with that indent level. An indent
level of 0 will only insert newlines. ``None`` is the most compact
representation.
If specified, ``separators`` should be an ``(item_separator, key_separator)``
tuple. The default is ``(', ', ': ')`` if *indent* is ``None`` and
``(',', ': ')`` otherwise. To get the most compact JSON representation,
you should specify ``(',', ':')`` to eliminate whitespace.
``default(obj)`` is a function that should return a serializable version
of obj or raise TypeError. The default simply raises TypeError.
If *sort_keys* is ``True`` (default: ``False``), then the output of
dictionaries will be sorted by key.
To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
``.default()`` method to serialize additional types), specify it with
the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.
"""
# cached encoder
if (not skipkeys and ensure_ascii and
check_circular and allow_nan and
cls is None and indent is None and separators is None and
default is None and not sort_keys and not kw):
return _default_encoder.encode(obj)
if cls is None:
cls = JSONEncoder
return cls(
skipkeys=skipkeys, ensure_ascii=ensure_ascii,
check_circular=check_circular, allow_nan=allow_nan, indent=indent,
separators=separators, default=default, sort_keys=sort_keys,
**kw).encode(obj)
_default_decoder = JSONDecoder(object_hook=None, object_pairs_hook=None)
def load(fp, cls=None, object_hook=None, parse_float=None,
parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
"""Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
a JSON document) to a Python object.
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).
``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.
To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
"""
return loads(fp.read(),
cls=cls, object_hook=object_hook,
parse_float=parse_float, parse_int=parse_int,
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
def loads(s, encoding=None, cls=None, object_hook=None, parse_float=None,
parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
"""Deserialize ``s`` (a ``str`` instance containing a JSON
document) to a Python object.
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).
``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders that rely on the
order that the key and value pairs are decoded (for example,
collections.OrderedDict will remember the order of insertion). If
``object_hook`` is also defined, the ``object_pairs_hook`` takes priority.
    ``parse_float``, if specified, will be called with the string
    of every JSON float to be decoded. By default this is equivalent to
    float(num_str). This can be used to parse JSON floats with another
    datatype or parser (e.g. decimal.Decimal).
    ``parse_int``, if specified, will be called with the string
    of every JSON int to be decoded. By default this is equivalent to
    int(num_str). This can be used to parse JSON integers with another
    datatype or parser (e.g. float).
``parse_constant``, if specified, will be called with one of the
    following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.
To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
The ``encoding`` argument is ignored and deprecated.
"""
if not isinstance(s, str):
raise TypeError('the JSON object must be str, not {!r}'.format(
s.__class__.__name__))
if s.startswith(u'\ufeff'):
raise ValueError("Unexpected UTF-8 BOM (decode using utf-8-sig)")
if (cls is None and object_hook is None and
parse_int is None and parse_float is None and
parse_constant is None and object_pairs_hook is None and not kw):
return _default_decoder.decode(s)
if cls is None:
cls = JSONDecoder
if object_hook is not None:
kw['object_hook'] = object_hook
if object_pairs_hook is not None:
kw['object_pairs_hook'] = object_pairs_hook
if parse_float is not None:
kw['parse_float'] = parse_float
if parse_int is not None:
kw['parse_int'] = parse_int
if parse_constant is not None:
kw['parse_constant'] = parse_constant
return cls(**kw).decode(s)
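# A usage sketch, not part of the module above (stdlib only): the hooks
# documented in loads(), using decimal.Decimal for exact floats and an
# object_pairs_hook that preserves key order.
import json
from decimal import Decimal
from collections import OrderedDict
_doc = '{"b": 1.10, "a": 2}'
_obj = json.loads(_doc, parse_float=Decimal, object_pairs_hook=OrderedDict)
assert list(_obj) == ['b', 'a']              # insertion order kept
assert _obj['b'] == Decimal('1.10')          # no binary-float rounding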
"""A bottom-up tree matching algorithm implementation meant to speed
up 2to3's matching process. After the tree patterns are reduced to
their rarest linear path, a linear Aho-Corasick automaton is
created. The linear automaton traverses the linear paths from the
leaves to the root of the AST and returns a set of nodes for further
matching. This significantly reduces the number of candidate nodes."""
__author__ = "George Boutsioukis <[email protected]>"
import logging
import itertools
from collections import defaultdict
from . import pytree
from .btm_utils import reduce_tree
class BMNode(object):
"""Class for a node of the Aho-Corasick automaton used in matching"""
count = itertools.count()
def __init__(self):
self.transition_table = {}
self.fixers = []
self.id = next(BMNode.count)
self.content = ''
class BottomMatcher(object):
"""The main matcher class. After instantiating the patterns should
be added using the add_fixer method"""
def __init__(self):
self.match = set()
self.root = BMNode()
self.nodes = [self.root]
self.fixers = []
self.logger = logging.getLogger("RefactoringTool")
def add_fixer(self, fixer):
"""Reduces a fixer's pattern tree to a linear path and adds it
to the matcher(a common Aho-Corasick automaton). The fixer is
appended on the matching states and called when they are
reached"""
self.fixers.append(fixer)
tree = reduce_tree(fixer.pattern_tree)
linear = tree.get_linear_subpattern()
match_nodes = self.add(linear, start=self.root)
for match_node in match_nodes:
match_node.fixers.append(fixer)
def add(self, pattern, start):
"Recursively adds a linear pattern to the AC automaton"
#print("adding pattern", pattern, "to", start)
if not pattern:
#print("empty pattern")
return [start]
if isinstance(pattern[0], tuple):
#alternatives
#print("alternatives")
match_nodes = []
for alternative in pattern[0]:
#add all alternatives, and add the rest of the pattern
#to each end node
end_nodes = self.add(alternative, start=start)
for end in end_nodes:
match_nodes.extend(self.add(pattern[1:], end))
return match_nodes
else:
#single token
#not last
if pattern[0] not in start.transition_table:
#transition did not exist, create new
next_node = BMNode()
start.transition_table[pattern[0]] = next_node
else:
#transition exists already, follow
next_node = start.transition_table[pattern[0]]
if pattern[1:]:
end_nodes = self.add(pattern[1:], start=next_node)
else:
end_nodes = [next_node]
return end_nodes
def run(self, leaves):
"""The main interface with the bottom matcher. The tree is
traversed from the bottom using the constructed
automaton. Nodes are only checked once as the tree is
retraversed. When the automaton fails, we give it one more
shot(in case the above tree matches as a whole with the
rejected leaf), then we break for the next leaf. There is the
special case of multiple arguments(see code comments) where we
recheck the nodes
Args:
            The leaves of the AST to be matched
Returns:
A dictionary of node matches with fixers as the keys
"""
current_ac_node = self.root
results = defaultdict(list)
for leaf in leaves:
current_ast_node = leaf
while current_ast_node:
current_ast_node.was_checked = True
for child in current_ast_node.children:
# multiple statements, recheck
if isinstance(child, pytree.Leaf) and child.value == ";":
current_ast_node.was_checked = False
break
if current_ast_node.type == 1:
#name
node_token = current_ast_node.value
else:
node_token = current_ast_node.type
if node_token in current_ac_node.transition_table:
#token matches
current_ac_node = current_ac_node.transition_table[node_token]
for fixer in current_ac_node.fixers:
                        if fixer not in results:
results[fixer] = []
results[fixer].append(current_ast_node)
else:
#matching failed, reset automaton
current_ac_node = self.root
if (current_ast_node.parent is not None
and current_ast_node.parent.was_checked):
#the rest of the tree upwards has been checked, next leaf
break
#recheck the rejected node once from the root
if node_token in current_ac_node.transition_table:
#token matches
current_ac_node = current_ac_node.transition_table[node_token]
for fixer in current_ac_node.fixers:
                            if fixer not in results:
results[fixer] = []
results[fixer].append(current_ast_node)
current_ast_node = current_ast_node.parent
return results
def print_ac(self):
"Prints a graphviz diagram of the BM automaton(for debugging)"
print("digraph g{")
def print_node(node):
for subnode_key in node.transition_table.keys():
subnode = node.transition_table[subnode_key]
print("%d -> %d [label=%s] //%s" %
(node.id, subnode.id, type_repr(subnode_key), str(subnode.fixers)))
if subnode_key == 1:
print(subnode.content)
print_node(subnode)
print_node(self.root)
print("}")
# taken from pytree.py for debugging; only used by print_ac
_type_reprs = {}
def type_repr(type_num):
global _type_reprs
if not _type_reprs:
from .pygram import python_symbols
# printing tokens is possible but not as useful
# from .pgen2 import token // token.__dict__.items():
for name, val in python_symbols.__dict__.items():
if type(val) == int: _type_reprs[val] = name
return _type_reprs.setdefault(type_num, type_num)
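# A usage sketch, not part of the module above (assumes lib2to3 is
# importable): the automaton gains one transition per token of each reduced
# linear pattern, and tuple elements fan out into alternative branches.
# The integer tokens below are arbitrary stand-ins.
from lib2to3.btm_matcher import BottomMatcher
_bm = BottomMatcher()
_ends = _bm.add([1, ([2], [3]), 4], start=_bm.root)  # 1, then (2|3), then 4
assert len(_ends) == 2                               # one end state per branch
assert 1 in _bm.root.transition_table                # shared first transition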
"Utility functions used by the btm_matcher module"
from . import pytree
from .pgen2 import grammar, token
from .pygram import pattern_symbols, python_symbols
syms = pattern_symbols
pysyms = python_symbols
tokens = grammar.opmap
token_labels = token
TYPE_ANY = -1
TYPE_ALTERNATIVES = -2
TYPE_GROUP = -3
class MinNode(object):
"""This class serves as an intermediate representation of the
pattern tree during the conversion to sets of leaf-to-root
subpatterns"""
def __init__(self, type=None, name=None):
self.type = type
self.name = name
self.children = []
self.leaf = False
self.parent = None
self.alternatives = []
self.group = []
def __repr__(self):
return str(self.type) + ' ' + str(self.name)
def leaf_to_root(self):
"""Internal method. Returns a characteristic path of the
pattern tree. This method must be run for all leaves until the
linear subpatterns are merged into a single"""
node = self
subp = []
while node:
if node.type == TYPE_ALTERNATIVES:
node.alternatives.append(subp)
if len(node.alternatives) == len(node.children):
#last alternative
subp = [tuple(node.alternatives)]
node.alternatives = []
node = node.parent
continue
else:
node = node.parent
subp = None
break
if node.type == TYPE_GROUP:
node.group.append(subp)
#probably should check the number of leaves
if len(node.group) == len(node.children):
subp = get_characteristic_subpattern(node.group)
node.group = []
node = node.parent
continue
else:
node = node.parent
subp = None
break
if node.type == token_labels.NAME and node.name:
#in case of type=name, use the name instead
subp.append(node.name)
else:
subp.append(node.type)
node = node.parent
return subp
def get_linear_subpattern(self):
"""Drives the leaf_to_root method. The reason that
leaf_to_root must be run multiple times is because we need to
reject 'group' matches; for example the alternative form
(a | b c) creates a group [b c] that needs to be matched. Since
matching multiple linear patterns overcomes the automaton's
capabilities, leaf_to_root merges each group into a single
choice based on 'characteristic'ity,
i.e. (a|b c) -> (a|b) if b more characteristic than c
Returns: The most 'characteristic'(as defined by
get_characteristic_subpattern) path for the compiled pattern
tree.
"""
for l in self.leaves():
subp = l.leaf_to_root()
if subp:
return subp
def leaves(self):
"Generator that returns the leaves of the tree"
for child in self.children:
yield from child.leaves()
if not self.children:
yield self
def reduce_tree(node, parent=None):
"""
Internal function. Reduces a compiled pattern tree to an
intermediate representation suitable for feeding the
    automaton. This also trims off any optional pattern elements (like
[a], a*).
"""
new_node = None
#switch on the node type
if node.type == syms.Matcher:
#skip
node = node.children[0]
    if node.type == syms.Alternatives:
#2 cases
if len(node.children) <= 2:
#just a single 'Alternative', skip this node
new_node = reduce_tree(node.children[0], parent)
else:
#real alternatives
new_node = MinNode(type=TYPE_ALTERNATIVES)
            #skip odd children ('|' tokens)
for child in node.children:
if node.children.index(child)%2:
continue
reduced = reduce_tree(child, new_node)
if reduced is not None:
new_node.children.append(reduced)
elif node.type == syms.Alternative:
if len(node.children) > 1:
new_node = MinNode(type=TYPE_GROUP)
for child in node.children:
reduced = reduce_tree(child, new_node)
if reduced:
new_node.children.append(reduced)
if not new_node.children:
# delete the group if all of the children were reduced to None
new_node = None
else:
new_node = reduce_tree(node.children[0], parent)
elif node.type == syms.Unit:
if (isinstance(node.children[0], pytree.Leaf) and
node.children[0].value == '('):
#skip parentheses
return reduce_tree(node.children[1], parent)
if ((isinstance(node.children[0], pytree.Leaf) and
node.children[0].value == '[')
or
(len(node.children)>1 and
hasattr(node.children[1], "value") and
node.children[1].value == '[')):
            #skip the whole unit if it's optional
return None
leaf = True
details_node = None
alternatives_node = None
has_repeater = False
repeater_node = None
has_variable_name = False
for child in node.children:
if child.type == syms.Details:
leaf = False
details_node = child
elif child.type == syms.Repeater:
has_repeater = True
repeater_node = child
elif child.type == syms.Alternatives:
alternatives_node = child
if hasattr(child, 'value') and child.value == '=': # variable name
has_variable_name = True
#skip variable name
if has_variable_name:
#skip variable name, '='
name_leaf = node.children[2]
if hasattr(name_leaf, 'value') and name_leaf.value == '(':
# skip parenthesis
name_leaf = node.children[3]
else:
name_leaf = node.children[0]
#set node type
if name_leaf.type == token_labels.NAME:
#(python) non-name or wildcard
if name_leaf.value == 'any':
new_node = MinNode(type=TYPE_ANY)
else:
if hasattr(token_labels, name_leaf.value):
new_node = MinNode(type=getattr(token_labels, name_leaf.value))
else:
new_node = MinNode(type=getattr(pysyms, name_leaf.value))
elif name_leaf.type == token_labels.STRING:
#(python) name or character; remove the apostrophes from
#the string value
name = name_leaf.value.strip("'")
if name in tokens:
new_node = MinNode(type=tokens[name])
else:
new_node = MinNode(type=token_labels.NAME, name=name)
elif name_leaf.type == syms.Alternatives:
new_node = reduce_tree(alternatives_node, parent)
#handle repeaters
if has_repeater:
if repeater_node.children[0].value == '*':
#reduce to None
new_node = None
elif repeater_node.children[0].value == '+':
                #reduce to a single occurrence, i.e. do nothing
pass
else:
#TODO: handle {min, max} repeaters
                raise NotImplementedError
#add children
if details_node and new_node is not None:
for child in details_node.children[1:-1]:
#skip '<', '>' markers
reduced = reduce_tree(child, new_node)
if reduced is not None:
new_node.children.append(reduced)
if new_node:
new_node.parent = parent
return new_node
def get_characteristic_subpattern(subpatterns):
"""Picks the most characteristic from a list of linear patterns
Current order used is:
names > common_names > common_chars
"""
if not isinstance(subpatterns, list):
return subpatterns
    if len(subpatterns) == 1:
return subpatterns[0]
# first pick out the ones containing variable names
subpatterns_with_names = []
subpatterns_with_common_names = []
    common_names = ['in', 'for', 'if', 'not', 'None']
subpatterns_with_common_chars = []
common_chars = "[]().,:"
for subpattern in subpatterns:
if any(rec_test(subpattern, lambda x: type(x) is str)):
if any(rec_test(subpattern,
lambda x: isinstance(x, str) and x in common_chars)):
subpatterns_with_common_chars.append(subpattern)
elif any(rec_test(subpattern,
lambda x: isinstance(x, str) and x in common_names)):
subpatterns_with_common_names.append(subpattern)
else:
subpatterns_with_names.append(subpattern)
if subpatterns_with_names:
subpatterns = subpatterns_with_names
elif subpatterns_with_common_names:
subpatterns = subpatterns_with_common_names
elif subpatterns_with_common_chars:
subpatterns = subpatterns_with_common_chars
# of the remaining subpatterns pick out the longest one
return max(subpatterns, key=len)
def rec_test(sequence, test_func):
"""Tests test_func on all items of sequence and items of included
sub-iterables"""
for x in sequence:
if isinstance(x, (list, tuple)):
yield from rec_test(x, test_func)
else:
yield test_func(x)
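# A usage sketch, not part of the module above: the ranking that
# get_characteristic_subpattern applies -- variable names beat common names,
# which beat common punctuation characters.
from lib2to3.btm_utils import get_characteristic_subpattern
assert get_characteristic_subpattern(
    [[1, 'if', 2], [1, 'my_name'], [1, '(', ')']]) == [1, 'my_name']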
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Base class for fixers (optional, but recommended)."""
# Python imports
import logging
import itertools
# Local imports
from .patcomp import PatternCompiler
from . import pygram
from .fixer_util import does_tree_import
class BaseFix(object):
"""Optional base class for fixers.
The subclass name must be FixFooBar where FooBar is the result of
removing underscores and capitalizing the words of the fix name.
For example, the class name for a fixer named 'has_key' should be
FixHasKey.
"""
PATTERN = None # Most subclasses should override with a string literal
pattern = None # Compiled pattern, set by compile_pattern()
pattern_tree = None # Tree representation of the pattern
options = None # Options object passed to initializer
filename = None # The filename (set by set_filename)
numbers = itertools.count(1) # For new_name()
used_names = set() # A set of all used NAMEs
order = "post" # Does the fixer prefer pre- or post-order traversal
explicit = False # Is this ignored by refactor.py -f all?
run_order = 5 # Fixers will be sorted by run order before execution
# Lower numbers will be run first.
_accept_type = None # [Advanced and not public] This tells RefactoringTool
# which node type to accept when there's not a pattern.
keep_line_order = False # For the bottom matcher: match with the
# original line order
BM_compatible = False # Compatibility with the bottom matching
# module; every fixer should set this
# manually
# Shortcut for access to Python grammar symbols
syms = pygram.python_symbols
def __init__(self, options, log):
"""Initializer. Subclass may override.
Args:
            options: a dict containing the options passed to RefactoringTool
            that can be used to customize the fixer through the command line.
log: a list to append warnings and other messages to.
"""
self.options = options
self.log = log
self.compile_pattern()
def compile_pattern(self):
"""Compiles self.PATTERN into self.pattern.
Subclass may override if it doesn't want to use
self.{pattern,PATTERN} in .match().
"""
if self.PATTERN is not None:
PC = PatternCompiler()
self.pattern, self.pattern_tree = PC.compile_pattern(self.PATTERN,
with_tree=True)
def set_filename(self, filename):
"""Set the filename.
The main refactoring tool should call this.
"""
self.filename = filename
def match(self, node):
"""Returns match for a given parse tree node.
Should return a true or false object (not necessarily a bool).
It may return a non-empty dict of matching sub-nodes as
returned by a matching pattern.
Subclass may override.
"""
results = {"node": node}
return self.pattern.match(node, results) and results
def transform(self, node, results):
"""Returns the transformation for a given parse tree node.
Args:
node: the root of the parse tree that matched the fixer.
results: a dict mapping symbolic names to part of the match.
Returns:
None, or a node that is a modified copy of the
argument node. The node argument may also be modified in-place to
effect the same change.
Subclass *must* override.
"""
raise NotImplementedError()
def new_name(self, template="xxx_todo_changeme"):
"""Return a string suitable for use as an identifier
The new name is guaranteed not to conflict with other identifiers.
"""
name = template
while name in self.used_names:
name = template + str(next(self.numbers))
self.used_names.add(name)
return name
def log_message(self, message):
if self.first_log:
self.first_log = False
self.log.append("### In file %s ###" % self.filename)
self.log.append(message)
def cannot_convert(self, node, reason=None):
"""Warn the user that a given chunk of code is not valid Python 3,
but that it cannot be converted automatically.
First argument is the top-level node for the code in question.
Optional second argument is why it can't be converted.
"""
lineno = node.get_lineno()
for_output = node.clone()
for_output.prefix = ""
msg = "Line %d: could not convert: %s"
self.log_message(msg % (lineno, for_output))
if reason:
self.log_message(reason)
def warning(self, node, reason):
"""Used for warning the user about possible uncertainty in the
translation.
First argument is the top-level node for the code in question.
Optional second argument is why it can't be converted.
"""
lineno = node.get_lineno()
self.log_message("Line %d: %s" % (lineno, reason))
def start_tree(self, tree, filename):
"""Some fixers need to maintain tree-wide state.
This method is called once, at the start of tree fix-up.
tree - the root node of the tree to be processed.
filename - the name of the file the tree came from.
"""
self.used_names = tree.used_names
self.set_filename(filename)
self.numbers = itertools.count(1)
self.first_log = True
def finish_tree(self, tree, filename):
"""Some fixers need to maintain tree-wide state.
This method is called once, at the conclusion of tree fix-up.
tree - the root node of the tree to be processed.
filename - the name of the file the tree came from.
"""
pass
class ConditionalFix(BaseFix):
""" Base class for fixers which not execute if an import is found. """
# This is the name of the import which, if found, will cause the test to be skipped
skip_on = None
def start_tree(self, *args):
super(ConditionalFix, self).start_tree(*args)
self._should_skip = None
def should_skip(self, node):
if self._should_skip is not None:
return self._should_skip
pkg = self.skip_on.split(".")
name = pkg[-1]
pkg = ".".join(pkg[:-1])
self._should_skip = does_tree_import(pkg, name, node)
return self._should_skip
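# A usage sketch, not part of the module above (a hypothetical fixer, not one
# shipped with lib2to3): the smallest useful BaseFix subclass. PATTERN is a
# pattern-grammar STRING unit matching any leaf spelled 'spam'.
from lib2to3.fixer_base import BaseFix
class FixSpam(BaseFix):
    BM_compatible = True
    PATTERN = "'spam'"
    def transform(self, node, results):
        node.value = 'ham'              # modify the matched leaf in place
        node.changed()                  # mark the tree as modified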
"""Utility functions, node construction macros, etc."""
# Author: Collin Winter
from itertools import islice
# Local imports
from .pgen2 import token
from .pytree import Leaf, Node
from .pygram import python_symbols as syms
from . import patcomp
###########################################################
### Common node-construction "macros"
###########################################################
def KeywordArg(keyword, value):
return Node(syms.argument,
[keyword, Leaf(token.EQUAL, "="), value])
def LParen():
return Leaf(token.LPAR, "(")
def RParen():
return Leaf(token.RPAR, ")")
def Assign(target, source):
"""Build an assignment statement"""
if not isinstance(target, list):
target = [target]
if not isinstance(source, list):
source.prefix = " "
source = [source]
return Node(syms.atom,
target + [Leaf(token.EQUAL, "=", prefix=" ")] + source)
def Name(name, prefix=None):
"""Return a NAME leaf"""
return Leaf(token.NAME, name, prefix=prefix)
def Attr(obj, attr):
"""A node tuple for obj.attr"""
return [obj, Node(syms.trailer, [Dot(), attr])]
def Comma():
"""A comma leaf"""
return Leaf(token.COMMA, ",")
def Dot():
"""A period (.) leaf"""
return Leaf(token.DOT, ".")
def ArgList(args, lparen=LParen(), rparen=RParen()):
"""A parenthesised argument list, used by Call()"""
node = Node(syms.trailer, [lparen.clone(), rparen.clone()])
if args:
node.insert_child(1, Node(syms.arglist, args))
return node
def Call(func_name, args=None, prefix=None):
"""A function call"""
node = Node(syms.power, [func_name, ArgList(args)])
if prefix is not None:
node.prefix = prefix
return node
def Newline():
"""A newline literal"""
return Leaf(token.NEWLINE, "\n")
def BlankLine():
"""A blank line"""
return Leaf(token.NEWLINE, "")
def Number(n, prefix=None):
return Leaf(token.NUMBER, n, prefix=prefix)
def Subscript(index_node):
"""A numeric or string subscript"""
return Node(syms.trailer, [Leaf(token.LBRACE, "["),
index_node,
Leaf(token.RBRACE, "]")])
def String(string, prefix=None):
"""A string leaf"""
return Leaf(token.STRING, string, prefix=prefix)
def ListComp(xp, fp, it, test=None):
"""A list comprehension of the form [xp for fp in it if test].
If test is None, the "if test" part is omitted.
"""
xp.prefix = ""
fp.prefix = " "
it.prefix = " "
for_leaf = Leaf(token.NAME, "for")
for_leaf.prefix = " "
in_leaf = Leaf(token.NAME, "in")
in_leaf.prefix = " "
inner_args = [for_leaf, fp, in_leaf, it]
if test:
test.prefix = " "
if_leaf = Leaf(token.NAME, "if")
if_leaf.prefix = " "
inner_args.append(Node(syms.comp_if, [if_leaf, test]))
inner = Node(syms.listmaker, [xp, Node(syms.comp_for, inner_args)])
return Node(syms.atom,
[Leaf(token.LBRACE, "["),
inner,
Leaf(token.RBRACE, "]")])
def FromImport(package_name, name_leafs):
""" Return an import statement in the form:
from package import name_leafs"""
# XXX: May not handle dotted imports properly (eg, package_name='foo.bar')
#assert package_name == '.' or '.' not in package_name, "FromImport has "\
# "not been tested with dotted package names -- use at your own "\
# "peril!"
for leaf in name_leafs:
# Pull the leaves out of their old tree
leaf.remove()
children = [Leaf(token.NAME, "from"),
Leaf(token.NAME, package_name, prefix=" "),
Leaf(token.NAME, "import", prefix=" "),
Node(syms.import_as_names, name_leafs)]
imp = Node(syms.import_from, children)
return imp
def ImportAndCall(node, results, names):
"""Returns an import statement and calls a method
of the module:
import module
module.name()"""
obj = results["obj"].clone()
if obj.type == syms.arglist:
newarglist = obj.clone()
else:
newarglist = Node(syms.arglist, [obj.clone()])
after = results["after"]
if after:
after = [n.clone() for n in after]
new = Node(syms.power,
Attr(Name(names[0]), Name(names[1])) +
[Node(syms.trailer,
[results["lpar"].clone(),
newarglist,
results["rpar"].clone()])] + after)
new.prefix = node.prefix
return new
###########################################################
### Determine whether a node represents a given literal
###########################################################
def is_tuple(node):
"""Does the node represent a tuple literal?"""
if isinstance(node, Node) and node.children == [LParen(), RParen()]:
return True
return (isinstance(node, Node)
and len(node.children) == 3
and isinstance(node.children[0], Leaf)
and isinstance(node.children[1], Node)
and isinstance(node.children[2], Leaf)
and node.children[0].value == "("
and node.children[2].value == ")")
def is_list(node):
"""Does the node represent a list literal?"""
return (isinstance(node, Node)
and len(node.children) > 1
and isinstance(node.children[0], Leaf)
and isinstance(node.children[-1], Leaf)
and node.children[0].value == "["
and node.children[-1].value == "]")
###########################################################
### Misc
###########################################################
def parenthesize(node):
return Node(syms.atom, [LParen(), node, RParen()])
consuming_calls = {"sorted", "list", "set", "any", "all", "tuple", "sum",
"min", "max", "enumerate"}
def attr_chain(obj, attr):
"""Follow an attribute chain.
    If you have a chain of objects where a.foo -> b, b.foo -> c, etc.,
    use this to iterate over all objects in the chain. Iteration
    terminates when getattr(x, attr) is None.
Args:
obj: the starting object
attr: the name of the chaining attribute
Yields:
Each successive object in the chain.
"""
next = getattr(obj, attr)
while next:
yield next
next = getattr(next, attr)
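# A usage sketch, not part of the module above: attr_chain on plain objects
# (the _N class is a stand-in for parse-tree nodes with a .parent attribute).
from lib2to3.fixer_util import attr_chain
class _N(object):
    def __init__(self, parent=None):
        self.parent = parent
_root = _N(); _mid = _N(_root); _leaf = _N(_mid)
assert list(attr_chain(_leaf, "parent")) == [_mid, _root]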
p0 = """for_stmt< 'for' any 'in' node=any ':' any* >
| comp_for< 'for' any 'in' node=any any* >
"""
p1 = """
power<
( 'iter' | 'list' | 'tuple' | 'sorted' | 'set' | 'sum' |
'any' | 'all' | 'enumerate' | (any* trailer< '.' 'join' >) )
trailer< '(' node=any ')' >
any*
>
"""
p2 = """
power<
( 'sorted' | 'enumerate' )
trailer< '(' arglist<node=any any*> ')' >
any*
>
"""
pats_built = False
def in_special_context(node):
""" Returns true if node is in an environment where all that is required
        of it is being iterable (i.e., it doesn't matter whether it returns a list
or an iterator).
See test_map_nochange in test_fixers.py for some examples and tests.
"""
global p0, p1, p2, pats_built
if not pats_built:
p0 = patcomp.compile_pattern(p0)
p1 = patcomp.compile_pattern(p1)
p2 = patcomp.compile_pattern(p2)
pats_built = True
patterns = [p0, p1, p2]
for pattern, parent in zip(patterns, attr_chain(node, "parent")):
results = {}
if pattern.match(parent, results) and results["node"] is node:
return True
return False
def is_probably_builtin(node):
"""
Check that something isn't an attribute or function name etc.
"""
prev = node.prev_sibling
if prev is not None and prev.type == token.DOT:
# Attribute lookup.
return False
parent = node.parent
if parent.type in (syms.funcdef, syms.classdef):
return False
if parent.type == syms.expr_stmt and parent.children[0] is node:
# Assignment.
return False
if parent.type == syms.parameters or \
(parent.type == syms.typedargslist and (
(prev is not None and prev.type == token.COMMA) or
parent.children[0] is node
)):
# The name of an argument.
return False
return True
def find_indentation(node):
"""Find the indentation of *node*."""
while node is not None:
if node.type == syms.suite and len(node.children) > 2:
indent = node.children[1]
if indent.type == token.INDENT:
return indent.value
node = node.parent
return ""
###########################################################
### The following functions are to find bindings in a suite
###########################################################
def make_suite(node):
if node.type == syms.suite:
return node
node = node.clone()
parent, node.parent = node.parent, None
suite = Node(syms.suite, [node])
suite.parent = parent
return suite
def find_root(node):
"""Find the top level namespace."""
# Scamper up to the top level namespace
while node.type != syms.file_input:
node = node.parent
if not node:
raise ValueError("root found before file_input node was found.")
return node
def does_tree_import(package, name, node):
""" Returns true if name is imported from package at the
        top level of the tree to which node belongs.
To cover the case of an import like 'import foo', use
None for the package and 'foo' for the name. """
binding = find_binding(name, find_root(node), package)
return bool(binding)
def is_import(node):
"""Returns true if the node is an import statement."""
return node.type in (syms.import_name, syms.import_from)
def touch_import(package, name, node):
""" Works like `does_tree_import` but adds an import statement
        if the name was not already imported. """
def is_import_stmt(node):
return (node.type == syms.simple_stmt and node.children and
is_import(node.children[0]))
root = find_root(node)
if does_tree_import(package, name, root):
return
# figure out where to insert the new import. First try to find
# the first import and then skip to the last one.
insert_pos = offset = 0
for idx, node in enumerate(root.children):
if not is_import_stmt(node):
continue
for offset, node2 in enumerate(root.children[idx:]):
if not is_import_stmt(node2):
break
insert_pos = idx + offset
break
# if there are no imports where we can insert, find the docstring.
# if that also fails, we stick to the beginning of the file
if insert_pos == 0:
for idx, node in enumerate(root.children):
if (node.type == syms.simple_stmt and node.children and
node.children[0].type == token.STRING):
insert_pos = idx + 1
break
if package is None:
import_ = Node(syms.import_name, [
Leaf(token.NAME, "import"),
Leaf(token.NAME, name, prefix=" ")
])
else:
import_ = FromImport(package, [Leaf(token.NAME, name, prefix=" ")])
children = [import_, Newline()]
root.insert_child(insert_pos, Node(syms.simple_stmt, children))
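# A usage sketch, not part of the module above (assumes lib2to3's bundled
# grammar can be loaded): touch_import prepends a "from ... import ..."
# statement to a freshly parsed tree.
from lib2to3 import pygram, pytree
from lib2to3.pgen2 import driver
from lib2to3.fixer_util import touch_import
_drv = driver.Driver(pygram.python_grammar, convert=pytree.convert)
_tree = _drv.parse_string("x = 1\n")
touch_import("os", "path", _tree)        # no-op if the import already exists
assert str(_tree) == "from os import path\nx = 1\n"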
_def_syms = {syms.classdef, syms.funcdef}
def find_binding(name, node, package=None):
""" Returns the node which binds variable name, otherwise None.
If optional argument package is supplied, only imports will
be returned.
See test cases for examples."""
for child in node.children:
ret = None
if child.type == syms.for_stmt:
if _find(name, child.children[1]):
return child
n = find_binding(name, make_suite(child.children[-1]), package)
if n: ret = n
elif child.type in (syms.if_stmt, syms.while_stmt):
n = find_binding(name, make_suite(child.children[-1]), package)
if n: ret = n
elif child.type == syms.try_stmt:
n = find_binding(name, make_suite(child.children[2]), package)
if n:
ret = n
else:
for i, kid in enumerate(child.children[3:]):
if kid.type == token.COLON and kid.value == ":":
# i+3 is the colon, i+4 is the suite
n = find_binding(name, make_suite(child.children[i+4]), package)
if n: ret = n
elif child.type in _def_syms and child.children[1].value == name:
ret = child
elif _is_import_binding(child, name, package):
ret = child
elif child.type == syms.simple_stmt:
ret = find_binding(name, child, package)
elif child.type == syms.expr_stmt:
if _find(name, child.children[0]):
ret = child
if ret:
if not package:
return ret
if is_import(ret):
return ret
return None
_block_syms = {syms.funcdef, syms.classdef, syms.trailer}
def _find(name, node):
nodes = [node]
while nodes:
node = nodes.pop()
if node.type > 256 and node.type not in _block_syms:
nodes.extend(node.children)
elif node.type == token.NAME and node.value == name:
return node
return None
def _is_import_binding(node, name, package=None):
""" Will reuturn node if node will import name, or node
will import * from package. None is returned otherwise.
See test cases for examples. """
if node.type == syms.import_name and not package:
imp = node.children[1]
if imp.type == syms.dotted_as_names:
for child in imp.children:
if child.type == syms.dotted_as_name:
if child.children[2].value == name:
return node
elif child.type == token.NAME and child.value == name:
return node
elif imp.type == syms.dotted_as_name:
last = imp.children[-1]
if last.type == token.NAME and last.value == name:
return node
elif imp.type == token.NAME and imp.value == name:
return node
elif node.type == syms.import_from:
# str(...) is used to make life easier here, because
# from a.b import parses to ['import', ['a', '.', 'b'], ...]
if package and str(node.children[1]).strip() != package:
return None
n = node.children[3]
if package and _find("as", n):
# See test_from_import_as for explanation
return None
elif n.type == syms.import_as_names and _find(name, n):
return node
elif n.type == syms.import_as_name:
child = n.children[2]
if child.type == token.NAME and child.value == name:
return node
elif n.type == token.NAME and n.value == name:
return node
elif package and n.type == token.STAR:
return node
return None
# Grammar for 2to3. This grammar supports Python 2.x and 3.x.
# Note: Changing the grammar specified in this file will most likely
# require corresponding changes in the parser module
# (../Modules/parsermodule.c). If you can't make the changes to
# that module yourself, please co-ordinate the required changes
# with someone who can; ask around on python-dev for help. Fred
# Drake <[email protected]> will probably be listening there.
# NOTE WELL: You should also follow all the steps listed in PEP 306,
# "How to Change Python's Grammar"
# Commands for Kees Blom's railroad program
#diagram:token NAME
#diagram:token NUMBER
#diagram:token STRING
#diagram:token NEWLINE
#diagram:token ENDMARKER
#diagram:token INDENT
#diagram:output\input python.bla
#diagram:token DEDENT
#diagram:output\textwidth 20.04cm\oddsidemargin 0.0cm\evensidemargin 0.0cm
#diagram:rules
# Start symbols for the grammar:
# file_input is a module or sequence of commands read from an input file;
# single_input is a single interactive statement;
# eval_input is the input for the eval() and input() functions.
# NB: compound_stmt in single_input is followed by extra NEWLINE!
file_input: (NEWLINE | stmt)* ENDMARKER
single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
eval_input: testlist NEWLINE* ENDMARKER
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
decorators: decorator+
decorated: decorators (classdef | funcdef)
funcdef: 'def' NAME parameters ['->' test] ':' suite
parameters: '(' [typedargslist] ')'
typedargslist: ((tfpdef ['=' test] ',')*
('*' [tname] (',' tname ['=' test])* [',' '**' tname] | '**' tname)
| tfpdef ['=' test] (',' tfpdef ['=' test])* [','])
tname: NAME [':' test]
tfpdef: tname | '(' tfplist ')'
tfplist: tfpdef (',' tfpdef)* [',']
varargslist: ((vfpdef ['=' test] ',')*
('*' [vname] (',' vname ['=' test])* [',' '**' vname] | '**' vname)
| vfpdef ['=' test] (',' vfpdef ['=' test])* [','])
vname: NAME
vfpdef: vname | '(' vfplist ')'
vfplist: vfpdef (',' vfpdef)* [',']
stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
small_stmt: (expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt |
import_stmt | global_stmt | exec_stmt | assert_stmt)
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
('=' (yield_expr|testlist_star_expr))*)
testlist_star_expr: (test|star_expr) (',' (test|star_expr))* [',']
augassign: ('+=' | '-=' | '*=' | '@=' | '/=' | '%=' | '&=' | '|=' | '^=' |
'<<=' | '>>=' | '**=' | '//=')
# For normal assignments, additional restrictions enforced by the interpreter
print_stmt: 'print' ( [ test (',' test)* [','] ] |
'>>' test [ (',' test)+ [','] ] )
del_stmt: 'del' exprlist
pass_stmt: 'pass'
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
break_stmt: 'break'
continue_stmt: 'continue'
return_stmt: 'return' [testlist]
yield_stmt: yield_expr
raise_stmt: 'raise' [test ['from' test | ',' test [',' test]]]
import_stmt: import_name | import_from
import_name: 'import' dotted_as_names
import_from: ('from' ('.'* dotted_name | '.'+)
'import' ('*' | '(' import_as_names ')' | import_as_names))
import_as_name: NAME ['as' NAME]
dotted_as_name: dotted_name ['as' NAME]
import_as_names: import_as_name (',' import_as_name)* [',']
dotted_as_names: dotted_as_name (',' dotted_as_name)*
dotted_name: NAME ('.' NAME)*
global_stmt: ('global' | 'nonlocal') NAME (',' NAME)*
exec_stmt: 'exec' expr ['in' test [',' test]]
assert_stmt: 'assert' test [',' test]
compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
try_stmt: ('try' ':' suite
((except_clause ':' suite)+
['else' ':' suite]
['finally' ':' suite] |
'finally' ':' suite))
with_stmt: 'with' with_item (',' with_item)* ':' suite
with_item: test ['as' expr]
with_var: 'as' expr
# NB compile.c makes sure that the default except clause is last
except_clause: 'except' [test [(',' | 'as') test]]
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
# Backward compatibility cruft to support:
# [ x for x in lambda: True, lambda: False if x() ]
# even while also allowing:
# lambda x: 5 if x else 2
# (But not a mix of the two)
testlist_safe: old_test [(',' old_test)+ [',']]
old_test: or_test | old_lambdef
old_lambdef: 'lambda' [varargslist] ':' old_test
test: or_test ['if' or_test 'else' test] | lambdef
or_test: and_test ('or' and_test)*
and_test: not_test ('and' not_test)*
not_test: 'not' not_test | comparison
comparison: expr (comp_op expr)*
comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
star_expr: '*' expr
expr: xor_expr ('|' xor_expr)*
xor_expr: and_expr ('^' and_expr)*
and_expr: shift_expr ('&' shift_expr)*
shift_expr: arith_expr (('<<'|'>>') arith_expr)*
arith_expr: term (('+'|'-') term)*
term: factor (('*'|'@'|'/'|'%'|'//') factor)*
factor: ('+'|'-'|'~') factor | power
power: atom trailer* ['**' factor]
atom: ('(' [yield_expr|testlist_gexp] ')' |
'[' [listmaker] ']' |
'{' [dictsetmaker] '}' |
'`' testlist1 '`' |
NAME | NUMBER | STRING+ | '.' '.' '.')
listmaker: (test|star_expr) ( comp_for | (',' (test|star_expr))* [','] )
testlist_gexp: (test|star_expr) ( comp_for | (',' (test|star_expr))* [','] )
lambdef: 'lambda' [varargslist] ':' test
trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
subscriptlist: subscript (',' subscript)* [',']
subscript: test | [test] ':' [test] [sliceop]
sliceop: ':' [test]
exprlist: (expr|star_expr) (',' (expr|star_expr))* [',']
testlist: test (',' test)* [',']
dictsetmaker: ( (test ':' test (comp_for | (',' test ':' test)* [','])) |
(test (comp_for | (',' test)* [','])) )
classdef: 'class' NAME ['(' [arglist] ')'] ':' suite
arglist: (argument ',')* (argument [',']
|'*' test (',' argument)* [',' '**' test]
|'**' test)
argument: test [comp_for] | test '=' test # Really [keyword '='] test
comp_iter: comp_for | comp_if
comp_for: 'for' exprlist 'in' testlist_safe [comp_iter]
comp_if: 'if' old_test [comp_iter]
testlist1: test (',' test)*
# not used in grammar, but may appear in "node" passed from Parser to Compiler
encoding_decl: NAME
yield_expr: 'yield' [yield_arg]
yield_arg: 'from' test | testlist
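# --- Editor's sketch (not part of Grammar.txt): this grammar is compiled by
# pgen2 and exposed as lib2to3.pygram.python_grammar; a minimal parse,
# assuming the standard lib2to3 layout:
#
#     from lib2to3 import pygram, pytree
#     from lib2to3.pgen2 import driver
#
#     d = driver.Driver(pygram.python_grammar, convert=pytree.convert)
#     tree = d.parse_string("print 'hello'\n")  # the 2.x print statement parses
#     assert str(tree) == "print 'hello'\n"     # the tree round-trips exactly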
"""
Main program for 2to3.
"""
from __future__ import with_statement, print_function
import sys
import os
import difflib
import logging
import shutil
import optparse
from . import refactor
def diff_texts(a, b, filename):
"""Return a unified diff of two strings."""
a = a.splitlines()
b = b.splitlines()
return difflib.unified_diff(a, b, filename, filename,
"(original)", "(refactored)",
lineterm="")
class StdoutRefactoringTool(refactor.MultiprocessRefactoringTool):
"""
A refactoring tool that can avoid overwriting its input files.
Prints output to stdout.
    Output files can optionally be written to a different directory and/or
have an extra file suffix appended to their name for use in situations
where you do not want to replace the input files.
"""
def __init__(self, fixers, options, explicit, nobackups, show_diffs,
input_base_dir='', output_dir='', append_suffix=''):
"""
Args:
fixers: A list of fixers to import.
options: A dict with RefactoringTool configuration.
          explicit: A list of fixers to run even if they are marked explicit
            (i.e. optional fixers that are off by default).
nobackups: If true no backup '.bak' files will be created for those
files that are being refactored.
show_diffs: Should diffs of the refactoring be printed to stdout?
input_base_dir: The base directory for all input files. This class
will strip this path prefix off of filenames before substituting
it with output_dir. Only meaningful if output_dir is supplied.
All files processed by refactor() must start with this path.
output_dir: If supplied, all converted files will be written into
this directory tree instead of input_base_dir.
append_suffix: If supplied, all files output by this tool will have
this appended to their filename. Useful for changing .py to
.py3 for example by passing append_suffix='3'.
"""
self.nobackups = nobackups
self.show_diffs = show_diffs
if input_base_dir and not input_base_dir.endswith(os.sep):
input_base_dir += os.sep
self._input_base_dir = input_base_dir
self._output_dir = output_dir
self._append_suffix = append_suffix
super(StdoutRefactoringTool, self).__init__(fixers, options, explicit)
def log_error(self, msg, *args, **kwargs):
self.errors.append((msg, args, kwargs))
self.logger.error(msg, *args, **kwargs)
def write_file(self, new_text, filename, old_text, encoding):
orig_filename = filename
if self._output_dir:
if filename.startswith(self._input_base_dir):
filename = os.path.join(self._output_dir,
filename[len(self._input_base_dir):])
else:
raise ValueError('filename %s does not start with the '
'input_base_dir %s' % (
filename, self._input_base_dir))
if self._append_suffix:
filename += self._append_suffix
if orig_filename != filename:
output_dir = os.path.dirname(filename)
if not os.path.isdir(output_dir):
os.makedirs(output_dir)
self.log_message('Writing converted %s to %s.', orig_filename,
filename)
if not self.nobackups:
# Make backup
backup = filename + ".bak"
if os.path.lexists(backup):
try:
os.remove(backup)
except OSError as err:
self.log_message("Can't remove backup %s", backup)
try:
os.rename(filename, backup)
except OSError as err:
self.log_message("Can't rename %s to %s", filename, backup)
# Actually write the new file
write = super(StdoutRefactoringTool, self).write_file
write(new_text, filename, old_text, encoding)
if not self.nobackups:
shutil.copymode(backup, filename)
if orig_filename != filename:
# Preserve the file mode in the new output directory.
shutil.copymode(orig_filename, filename)
def print_output(self, old, new, filename, equal):
if equal:
self.log_message("No changes to %s", filename)
else:
self.log_message("Refactored %s", filename)
if self.show_diffs:
diff_lines = diff_texts(old, new, filename)
try:
if self.output_lock is not None:
with self.output_lock:
for line in diff_lines:
print(line)
sys.stdout.flush()
else:
for line in diff_lines:
print(line)
except UnicodeEncodeError:
warn("couldn't encode %s's diff for your terminal" %
(filename,))
return
def warn(msg):
print("WARNING: %s" % (msg,), file=sys.stderr)
def main(fixer_pkg, args=None):
"""Main program.
Args:
fixer_pkg: the name of a package where the fixers are located.
args: optional; a list of command line arguments. If omitted,
sys.argv[1:] is used.
Returns a suggested exit status (0, 1, 2).
"""
# Set up option parser
parser = optparse.OptionParser(usage="2to3 [options] file|dir ...")
parser.add_option("-d", "--doctests_only", action="store_true",
help="Fix up doctests only")
parser.add_option("-f", "--fix", action="append", default=[],
help="Each FIX specifies a transformation; default: all")
parser.add_option("-j", "--processes", action="store", default=1,
type="int", help="Run 2to3 concurrently")
parser.add_option("-x", "--nofix", action="append", default=[],
help="Prevent a transformation from being run")
parser.add_option("-l", "--list-fixes", action="store_true",
help="List available transformations")
parser.add_option("-p", "--print-function", action="store_true",
help="Modify the grammar so that print() is a function")
parser.add_option("-v", "--verbose", action="store_true",
help="More verbose logging")
parser.add_option("--no-diffs", action="store_true",
help="Don't show diffs of the refactoring")
parser.add_option("-w", "--write", action="store_true",
help="Write back modified files")
parser.add_option("-n", "--nobackups", action="store_true", default=False,
help="Don't write backups for modified files")
parser.add_option("-o", "--output-dir", action="store", type="str",
default="", help="Put output files in this directory "
"instead of overwriting the input files. Requires -n.")
parser.add_option("-W", "--write-unchanged-files", action="store_true",
help="Also write files even if no changes were required"
" (useful with --output-dir); implies -w.")
parser.add_option("--add-suffix", action="store", type="str", default="",
help="Append this string to all output filenames."
" Requires -n if non-empty. "
"ex: --add-suffix='3' will generate .py3 files.")
# Parse command line arguments
refactor_stdin = False
flags = {}
options, args = parser.parse_args(args)
if options.write_unchanged_files:
flags["write_unchanged_files"] = True
if not options.write:
warn("--write-unchanged-files/-W implies -w.")
options.write = True
# If we allowed these, the original files would be renamed to backup names
# but not replaced.
if options.output_dir and not options.nobackups:
parser.error("Can't use --output-dir/-o without -n.")
if options.add_suffix and not options.nobackups:
parser.error("Can't use --add-suffix without -n.")
if not options.write and options.no_diffs:
warn("not writing files and not printing diffs; that's not very useful")
if not options.write and options.nobackups:
parser.error("Can't use -n without -w")
if options.list_fixes:
print("Available transformations for the -f/--fix option:")
for fixname in refactor.get_all_fix_names(fixer_pkg):
print(fixname)
if not args:
return 0
if not args:
print("At least one file or directory argument required.", file=sys.stderr)
print("Use --help to show usage.", file=sys.stderr)
return 2
if "-" in args:
refactor_stdin = True
if options.write:
print("Can't write to stdin.", file=sys.stderr)
return 2
if options.print_function:
flags["print_function"] = True
# Set up logging handler
level = logging.DEBUG if options.verbose else logging.INFO
logging.basicConfig(format='%(name)s: %(message)s', level=level)
logger = logging.getLogger('lib2to3.main')
# Initialize the refactoring tool
avail_fixes = set(refactor.get_fixers_from_package(fixer_pkg))
unwanted_fixes = set(fixer_pkg + ".fix_" + fix for fix in options.nofix)
explicit = set()
if options.fix:
all_present = False
for fix in options.fix:
if fix == "all":
all_present = True
else:
explicit.add(fixer_pkg + ".fix_" + fix)
requested = avail_fixes.union(explicit) if all_present else explicit
else:
requested = avail_fixes.union(explicit)
fixer_names = requested.difference(unwanted_fixes)
input_base_dir = os.path.commonprefix(args)
if (input_base_dir and not input_base_dir.endswith(os.sep)
and not os.path.isdir(input_base_dir)):
    # One or more similar names were passed; their directory is the base.
    # os.path.commonprefix() is ignorant of path elements; this corrects
    # for that weird API.
input_base_dir = os.path.dirname(input_base_dir)
if options.output_dir:
input_base_dir = input_base_dir.rstrip(os.sep)
logger.info('Output in %r will mirror the input directory %r layout.',
options.output_dir, input_base_dir)
rt = StdoutRefactoringTool(
sorted(fixer_names), flags, sorted(explicit),
options.nobackups, not options.no_diffs,
input_base_dir=input_base_dir,
output_dir=options.output_dir,
append_suffix=options.add_suffix)
# Refactor all files and directories passed as arguments
if not rt.errors:
if refactor_stdin:
rt.refactor_stdin()
else:
try:
rt.refactor(args, options.write, options.doctests_only,
options.processes)
except refactor.MultiprocessingUnsupported:
assert options.processes > 1
print("Sorry, -j isn't supported on this platform.",
file=sys.stderr)
return 1
rt.summarize()
# Return error status (0 if rt.errors is zero)
return int(bool(rt.errors))
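# --- Editor's note: illustrative sketch, not part of the original module.
# The stock 2to3 console script boils down to this call; "lib2to3.fixes" is
# the package holding the standard fixers.
if __name__ == "__main__":
    sys.exit(main("lib2to3.fixes", ["--list-fixes"]))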
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Pattern compiler.
The grammar is taken from PatternGrammar.txt.
The compiler compiles a pattern to a pytree.*Pattern instance.
"""
__author__ = "Guido van Rossum <[email protected]>"
# Python imports
import io
import os
# Fairly local imports
from .pgen2 import driver, literals, token, tokenize, parse, grammar
# Really local imports
from . import pytree
from . import pygram
# The pattern grammar file
_PATTERN_GRAMMAR_FILE = os.path.join(os.path.dirname(__file__),
"PatternGrammar.txt")
class PatternSyntaxError(Exception):
pass
def tokenize_wrapper(input):
"""Tokenizes a string suppressing significant whitespace."""
skip = {token.NEWLINE, token.INDENT, token.DEDENT}
tokens = tokenize.generate_tokens(io.StringIO(input).readline)
for quintuple in tokens:
type, value, start, end, line_text = quintuple
if type not in skip:
yield quintuple
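# --- Editor's note: illustrative sketch, not part of the original module.
# The wrapper drops NEWLINE/INDENT/DEDENT, so pattern authors never have to
# think about layout tokens.
if __name__ == "__main__":
    for _type, _value, _start, _end, _line in tokenize_wrapper("power< 'print' >"):
        print(_type, repr(_value))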
class PatternCompiler(object):
def __init__(self, grammar_file=_PATTERN_GRAMMAR_FILE):
"""Initializer.
Takes an optional alternative filename for the pattern grammar.
"""
self.grammar = driver.load_grammar(grammar_file)
self.syms = pygram.Symbols(self.grammar)
self.pygrammar = pygram.python_grammar
self.pysyms = pygram.python_symbols
self.driver = driver.Driver(self.grammar, convert=pattern_convert)
def compile_pattern(self, input, debug=False, with_tree=False):
"""Compiles a pattern string to a nested pytree.*Pattern object."""
tokens = tokenize_wrapper(input)
try:
root = self.driver.parse_tokens(tokens, debug=debug)
except parse.ParseError as e:
raise PatternSyntaxError(str(e))
if with_tree:
return self.compile_node(root), root
else:
return self.compile_node(root)
def compile_node(self, node):
"""Compiles a node, recursively.
This is one big switch on the node type.
"""
# XXX Optimize certain Wildcard-containing-Wildcard patterns
# that can be merged
if node.type == self.syms.Matcher:
node = node.children[0] # Avoid unneeded recursion
if node.type == self.syms.Alternatives:
# Skip the odd children since they are just '|' tokens
alts = [self.compile_node(ch) for ch in node.children[::2]]
if len(alts) == 1:
return alts[0]
p = pytree.WildcardPattern([[a] for a in alts], min=1, max=1)
return p.optimize()
if node.type == self.syms.Alternative:
units = [self.compile_node(ch) for ch in node.children]
if len(units) == 1:
return units[0]
p = pytree.WildcardPattern([units], min=1, max=1)
return p.optimize()
if node.type == self.syms.NegatedUnit:
pattern = self.compile_basic(node.children[1:])
p = pytree.NegatedPattern(pattern)
return p.optimize()
assert node.type == self.syms.Unit
name = None
nodes = node.children
if len(nodes) >= 3 and nodes[1].type == token.EQUAL:
name = nodes[0].value
nodes = nodes[2:]
repeat = None
if len(nodes) >= 2 and nodes[-1].type == self.syms.Repeater:
repeat = nodes[-1]
nodes = nodes[:-1]
# Now we've reduced it to: STRING | NAME [Details] | (...) | [...]
pattern = self.compile_basic(nodes, repeat)
if repeat is not None:
assert repeat.type == self.syms.Repeater
children = repeat.children
child = children[0]
if child.type == token.STAR:
min = 0
max = pytree.HUGE
elif child.type == token.PLUS:
min = 1
max = pytree.HUGE
elif child.type == token.LBRACE:
assert children[-1].type == token.RBRACE
assert len(children) in (3, 5)
min = max = self.get_int(children[1])
if len(children) == 5:
max = self.get_int(children[3])
else:
assert False
if min != 1 or max != 1:
pattern = pattern.optimize()
pattern = pytree.WildcardPattern([[pattern]], min=min, max=max)
if name is not None:
pattern.name = name
return pattern.optimize()
def compile_basic(self, nodes, repeat=None):
# Compile STRING | NAME [Details] | (...) | [...]
assert len(nodes) >= 1
node = nodes[0]
if node.type == token.STRING:
value = str(literals.evalString(node.value))
return pytree.LeafPattern(_type_of_literal(value), value)
elif node.type == token.NAME:
value = node.value
if value.isupper():
if value not in TOKEN_MAP:
raise PatternSyntaxError("Invalid token: %r" % value)
if nodes[1:]:
raise PatternSyntaxError("Can't have details for token")
return pytree.LeafPattern(TOKEN_MAP[value])
else:
if value == "any":
type = None
elif not value.startswith("_"):
type = getattr(self.pysyms, value, None)
if type is None:
raise PatternSyntaxError("Invalid symbol: %r" % value)
if nodes[1:]: # Details present
content = [self.compile_node(nodes[1].children[1])]
else:
content = None
return pytree.NodePattern(type, content)
elif node.value == "(":
return self.compile_node(nodes[1])
elif node.value == "[":
assert repeat is None
subpattern = self.compile_node(nodes[1])
return pytree.WildcardPattern([[subpattern]], min=0, max=1)
assert False, node
def get_int(self, node):
assert node.type == token.NUMBER
return int(node.value)
# Map named tokens to the type value for a LeafPattern
TOKEN_MAP = {"NAME": token.NAME,
"STRING": token.STRING,
"NUMBER": token.NUMBER,
"TOKEN": None}
def _type_of_literal(value):
if value[0].isalpha():
return token.NAME
elif value in grammar.opmap:
return grammar.opmap[value]
else:
return None
def pattern_convert(grammar, raw_node_info):
"""Converts raw node information to a Node or Leaf instance."""
type, value, context, children = raw_node_info
if children or type in grammar.number2symbol:
return pytree.Node(type, children, context=context)
else:
return pytree.Leaf(type, value, context=context)
def compile_pattern(pattern):
return PatternCompiler().compile_pattern(pattern)
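# --- Editor's note: illustrative sketch, not part of the original module.
# Compile a pattern and match it against a parsed call; the grammar without
# the print statement is used so that print('hi') parses as a function call.
# driver, pygram and pytree are already imported at the top of this module.
if __name__ == "__main__":
    _pat = compile_pattern("power< 'print' trailer< '(' args=any* ')' > >")
    _d = driver.Driver(pygram.python_grammar_no_print_statement,
                       convert=pytree.convert)
    _call = _d.parse_string("print('hi')\n").children[0].children[0]
    _results = {}
    print(_pat.match(_call, _results))         # True
    print([str(n) for n in _results["args"]])  # ["'hi'"]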
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# A grammar to describe tree matching patterns.
# Not shown here:
# - 'TOKEN' stands for any token (leaf node)
# - 'any' stands for any node (leaf or interior)
# With 'any' we can still specify the sub-structure.
# The start symbol is 'Matcher'.
Matcher: Alternatives ENDMARKER
Alternatives: Alternative ('|' Alternative)*
Alternative: (Unit | NegatedUnit)+
Unit: [NAME '='] ( STRING [Repeater]
| NAME [Details] [Repeater]
| '(' Alternatives ')' [Repeater]
| '[' Alternatives ']'
)
NegatedUnit: 'not' (STRING | NAME [Details] | '(' Alternatives ')')
Repeater: '*' | '+' | '{' NUMBER [',' NUMBER] '}'
Details: '<' Alternatives '>'
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Export the Python grammar and symbols."""
# Python imports
import os
# Local imports
from .pgen2 import token
from .pgen2 import driver
from . import pytree
# The grammar file
_GRAMMAR_FILE = os.path.join(os.path.dirname(__file__), "Grammar.txt")
_PATTERN_GRAMMAR_FILE = os.path.join(os.path.dirname(__file__),
"PatternGrammar.txt")
class Symbols(object):
def __init__(self, grammar):
"""Initializer.
Creates an attribute for each grammar symbol (nonterminal),
whose value is the symbol's type (an int >= 256).
"""
for name, symbol in grammar.symbol2number.items():
setattr(self, name, symbol)
python_grammar = driver.load_grammar(_GRAMMAR_FILE)
python_symbols = Symbols(python_grammar)
python_grammar_no_print_statement = python_grammar.copy()
del python_grammar_no_print_statement.keywords["print"]
pattern_grammar = driver.load_grammar(_PATTERN_GRAMMAR_FILE)
pattern_symbols = Symbols(pattern_grammar)
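# --- Editor's note: illustrative sketch, not part of the original module.
# Each nonterminal in Grammar.txt becomes an int attribute on python_symbols;
# token numbers stay below 256, symbol numbers start at 256.
if __name__ == "__main__":
    print(python_symbols.funcdef >= 256)                          # True
    print(token.NAME < 256)                                       # True
    print("print" in python_grammar_no_print_statement.keywords)  # False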
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""
Python parse tree definitions.
This is a very concrete parse tree; we need to keep every token and
even the comments and whitespace between tokens.
There's also a pattern matching implementation here.
"""
__author__ = "Guido van Rossum <[email protected]>"
import sys
import warnings
from io import StringIO
HUGE = 0x7FFFFFFF # maximum repeat count, default max
_type_reprs = {}
def type_repr(type_num):
global _type_reprs
if not _type_reprs:
from .pygram import python_symbols
# printing tokens is possible but not as useful
# from .pgen2 import token // token.__dict__.items():
for name, val in python_symbols.__dict__.items():
if type(val) == int: _type_reprs[val] = name
return _type_reprs.setdefault(type_num, type_num)
class Base(object):
"""
Abstract base class for Node and Leaf.
This provides some default functionality and boilerplate using the
template pattern.
A node may be a subnode of at most one parent.
"""
# Default values for instance variables
type = None # int: token number (< 256) or symbol number (>= 256)
parent = None # Parent node pointer, or None
children = () # Tuple of subnodes
was_changed = False
was_checked = False
def __new__(cls, *args, **kwds):
"""Constructor that prevents Base from being instantiated."""
assert cls is not Base, "Cannot instantiate Base"
return object.__new__(cls)
def __eq__(self, other):
"""
Compare two nodes for equality.
This calls the method _eq().
"""
if self.__class__ is not other.__class__:
return NotImplemented
return self._eq(other)
__hash__ = None # For Py3 compatibility.
def _eq(self, other):
"""
Compare two nodes for equality.
This is called by __eq__ and __ne__. It is only called if the two nodes
have the same type. This must be implemented by the concrete subclass.
Nodes should be considered equal if they have the same structure,
ignoring the prefix string and other context information.
"""
raise NotImplementedError
def clone(self):
"""
Return a cloned (deep) copy of self.
This must be implemented by the concrete subclass.
"""
raise NotImplementedError
def post_order(self):
"""
Return a post-order iterator for the tree.
This must be implemented by the concrete subclass.
"""
raise NotImplementedError
def pre_order(self):
"""
Return a pre-order iterator for the tree.
This must be implemented by the concrete subclass.
"""
raise NotImplementedError
def replace(self, new):
"""Replace this node with a new one in the parent."""
assert self.parent is not None, str(self)
assert new is not None
if not isinstance(new, list):
new = [new]
l_children = []
found = False
for ch in self.parent.children:
if ch is self:
assert not found, (self.parent.children, self, new)
if new is not None:
l_children.extend(new)
found = True
else:
l_children.append(ch)
assert found, (self.children, self, new)
self.parent.changed()
self.parent.children = l_children
for x in new:
x.parent = self.parent
self.parent = None
def get_lineno(self):
"""Return the line number which generated the invocant node."""
node = self
while not isinstance(node, Leaf):
if not node.children:
return
node = node.children[0]
return node.lineno
def changed(self):
if self.parent:
self.parent.changed()
self.was_changed = True
def remove(self):
"""
Remove the node from the tree. Returns the position of the node in its
parent's children before it was removed.
"""
if self.parent:
for i, node in enumerate(self.parent.children):
if node is self:
self.parent.changed()
del self.parent.children[i]
self.parent = None
return i
@property
def next_sibling(self):
"""
The node immediately following the invocant in their parent's children
        list. If the invocant does not have a next sibling, it is None.
"""
if self.parent is None:
return None
# Can't use index(); we need to test by identity
for i, child in enumerate(self.parent.children):
if child is self:
try:
return self.parent.children[i+1]
except IndexError:
return None
@property
def prev_sibling(self):
"""
The node immediately preceding the invocant in their parent's children
list. If the invocant does not have a previous sibling, it is None.
"""
if self.parent is None:
return None
# Can't use index(); we need to test by identity
for i, child in enumerate(self.parent.children):
if child is self:
if i == 0:
return None
return self.parent.children[i-1]
def leaves(self):
for child in self.children:
yield from child.leaves()
def depth(self):
if self.parent is None:
return 0
return 1 + self.parent.depth()
def get_suffix(self):
"""
Return the string immediately following the invocant node. This is
effectively equivalent to node.next_sibling.prefix
"""
next_sib = self.next_sibling
if next_sib is None:
return ""
return next_sib.prefix
if sys.version_info < (3, 0):
def __str__(self):
return str(self).encode("ascii")
class Node(Base):
"""Concrete implementation for interior nodes."""
    def __init__(self, type, children,
context=None,
prefix=None,
fixers_applied=None):
"""
Initializer.
Takes a type constant (a symbol number >= 256), a sequence of
child nodes, and an optional context keyword argument.
As a side effect, the parent pointers of the children are updated.
"""
assert type >= 256, type
self.type = type
self.children = list(children)
for ch in self.children:
assert ch.parent is None, repr(ch)
ch.parent = self
if prefix is not None:
self.prefix = prefix
if fixers_applied:
self.fixers_applied = fixers_applied[:]
else:
self.fixers_applied = None
def __repr__(self):
"""Return a canonical string representation."""
return "%s(%s, %r)" % (self.__class__.__name__,
type_repr(self.type),
self.children)
def __unicode__(self):
"""
Return a pretty string representation.
This reproduces the input source exactly.
"""
return "".join(map(str, self.children))
if sys.version_info > (3, 0):
__str__ = __unicode__
def _eq(self, other):
"""Compare two nodes for equality."""
return (self.type, self.children) == (other.type, other.children)
def clone(self):
"""Return a cloned (deep) copy of self."""
return Node(self.type, [ch.clone() for ch in self.children],
fixers_applied=self.fixers_applied)
def post_order(self):
"""Return a post-order iterator for the tree."""
for child in self.children:
yield from child.post_order()
yield self
def pre_order(self):
"""Return a pre-order iterator for the tree."""
yield self
for child in self.children:
yield from child.pre_order()
def _prefix_getter(self):
"""
The whitespace and comments preceding this node in the input.
"""
if not self.children:
return ""
return self.children[0].prefix
def _prefix_setter(self, prefix):
if self.children:
self.children[0].prefix = prefix
prefix = property(_prefix_getter, _prefix_setter)
def set_child(self, i, child):
"""
Equivalent to 'node.children[i] = child'. This method also sets the
child's parent attribute appropriately.
"""
child.parent = self
self.children[i].parent = None
self.children[i] = child
self.changed()
def insert_child(self, i, child):
"""
Equivalent to 'node.children.insert(i, child)'. This method also sets
the child's parent attribute appropriately.
"""
child.parent = self
self.children.insert(i, child)
self.changed()
def append_child(self, child):
"""
Equivalent to 'node.children.append(child)'. This method also sets the
child's parent attribute appropriately.
"""
child.parent = self
self.children.append(child)
self.changed()
class Leaf(Base):
"""Concrete implementation for leaf nodes."""
# Default values for instance variables
_prefix = "" # Whitespace and comments preceding this token in the input
lineno = 0 # Line where this token starts in the input
    column = 0   # Column where this token starts in the input
def __init__(self, type, value,
context=None,
prefix=None,
fixers_applied=[]):
"""
Initializer.
Takes a type constant (a token number < 256), a string value, and an
optional context keyword argument.
"""
assert 0 <= type < 256, type
if context is not None:
self._prefix, (self.lineno, self.column) = context
self.type = type
self.value = value
if prefix is not None:
self._prefix = prefix
self.fixers_applied = fixers_applied[:]
def __repr__(self):
"""Return a canonical string representation."""
return "%s(%r, %r)" % (self.__class__.__name__,
self.type,
self.value)
def __unicode__(self):
"""
Return a pretty string representation.
This reproduces the input source exactly.
"""
return self.prefix + str(self.value)
if sys.version_info > (3, 0):
__str__ = __unicode__
def _eq(self, other):
"""Compare two nodes for equality."""
return (self.type, self.value) == (other.type, other.value)
def clone(self):
"""Return a cloned (deep) copy of self."""
return Leaf(self.type, self.value,
(self.prefix, (self.lineno, self.column)),
fixers_applied=self.fixers_applied)
def leaves(self):
yield self
def post_order(self):
"""Return a post-order iterator for the tree."""
yield self
def pre_order(self):
"""Return a pre-order iterator for the tree."""
yield self
def _prefix_getter(self):
"""
The whitespace and comments preceding this token in the input.
"""
return self._prefix
def _prefix_setter(self, prefix):
self.changed()
self._prefix = prefix
prefix = property(_prefix_getter, _prefix_setter)
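# --- Editor's note: illustrative sketch, not part of the original module.
# Building a tiny tree by hand; prefixes carry the whitespace, so str()
# reproduces the source exactly.
if __name__ == "__main__":
    from lib2to3.pgen2 import token as _token
    from lib2to3.pygram import python_symbols as _syms

    _stmt = Node(_syms.expr_stmt, [Leaf(_token.NAME, "x"),
                                   Leaf(_token.EQUAL, "=", prefix=" "),
                                   Leaf(_token.NUMBER, "1", prefix=" ")])
    print(str(_stmt))                         # x = 1
    print(_stmt.children[0].parent is _stmt)  # True: parent pointers are maintained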
def convert(gr, raw_node):
"""
Convert raw node information to a Node or Leaf instance.
This is passed to the parser driver which calls it whenever a reduction of a
    grammar rule produces a new complete node, so that the tree is built
strictly bottom-up.
"""
type, value, context, children = raw_node
if children or type in gr.number2symbol:
# If there's exactly one child, return that child instead of
# creating a new node.
if len(children) == 1:
return children[0]
return Node(type, children, context=context)
else:
return Leaf(type, value, context=context)
class BasePattern(object):
"""
A pattern is a tree matching pattern.
It looks for a specific node type (token or symbol), and
optionally for a specific content.
This is an abstract base class. There are three concrete
subclasses:
- LeafPattern matches a single leaf node;
- NodePattern matches a single node (usually non-leaf);
- WildcardPattern matches a sequence of nodes of variable length.
"""
# Defaults for instance variables
type = None # Node type (token if < 256, symbol if >= 256)
content = None # Optional content matching pattern
name = None # Optional name used to store match in results dict
def __new__(cls, *args, **kwds):
"""Constructor that prevents BasePattern from being instantiated."""
assert cls is not BasePattern, "Cannot instantiate BasePattern"
return object.__new__(cls)
def __repr__(self):
args = [type_repr(self.type), self.content, self.name]
while args and args[-1] is None:
del args[-1]
return "%s(%s)" % (self.__class__.__name__, ", ".join(map(repr, args)))
def optimize(self):
"""
A subclass can define this as a hook for optimizations.
Returns either self or another node with the same effect.
"""
return self
def match(self, node, results=None):
"""
Does this pattern exactly match a node?
Returns True if it matches, False if not.
If results is not None, it must be a dict which will be
updated with the nodes matching named subpatterns.
Default implementation for non-wildcard patterns.
"""
if self.type is not None and node.type != self.type:
return False
if self.content is not None:
r = None
if results is not None:
r = {}
if not self._submatch(node, r):
return False
if r:
results.update(r)
if results is not None and self.name:
results[self.name] = node
return True
def match_seq(self, nodes, results=None):
"""
Does this pattern exactly match a sequence of nodes?
Default implementation for non-wildcard patterns.
"""
if len(nodes) != 1:
return False
return self.match(nodes[0], results)
def generate_matches(self, nodes):
"""
Generator yielding all matches for this pattern.
Default implementation for non-wildcard patterns.
"""
r = {}
if nodes and self.match(nodes[0], r):
yield 1, r
class LeafPattern(BasePattern):
def __init__(self, type=None, content=None, name=None):
"""
Initializer. Takes optional type, content, and name.
        The type, if given, must be a token type (< 256). If not given,
this matches any *leaf* node; the content may still be required.
The content, if given, must be a string.
If a name is given, the matching node is stored in the results
dict under that key.
"""
if type is not None:
assert 0 <= type < 256, type
if content is not None:
assert isinstance(content, str), repr(content)
self.type = type
self.content = content
self.name = name
def match(self, node, results=None):
"""Override match() to insist on a leaf node."""
if not isinstance(node, Leaf):
return False
return BasePattern.match(self, node, results)
def _submatch(self, node, results=None):
"""
Match the pattern's content to the node's children.
This assumes the node type matches and self.content is not None.
Returns True if it matches, False if not.
If results is not None, it must be a dict which will be
updated with the nodes matching named subpatterns.
When returning False, the results dict may still be updated.
"""
return self.content == node.value
class NodePattern(BasePattern):
wildcards = False
def __init__(self, type=None, content=None, name=None):
"""
Initializer. Takes optional type, content, and name.
The type, if given, must be a symbol type (>= 256). If the
type is None this matches *any* single node (leaf or not),
        except if content is not None, in which case it only matches
non-leaf nodes that also match the content pattern.
The content, if not None, must be a sequence of Patterns that
must match the node's children exactly. If the content is
given, the type must not be None.
If a name is given, the matching node is stored in the results
dict under that key.
"""
if type is not None:
assert type >= 256, type
if content is not None:
assert not isinstance(content, str), repr(content)
content = list(content)
for i, item in enumerate(content):
assert isinstance(item, BasePattern), (i, item)
if isinstance(item, WildcardPattern):
self.wildcards = True
self.type = type
self.content = content
self.name = name
def _submatch(self, node, results=None):
"""
Match the pattern's content to the node's children.
This assumes the node type matches and self.content is not None.
Returns True if it matches, False if not.
If results is not None, it must be a dict which will be
updated with the nodes matching named subpatterns.
When returning False, the results dict may still be updated.
"""
if self.wildcards:
for c, r in generate_matches(self.content, node.children):
if c == len(node.children):
if results is not None:
results.update(r)
return True
return False
if len(self.content) != len(node.children):
return False
for subpattern, child in zip(self.content, node.children):
if not subpattern.match(child, results):
return False
return True
class WildcardPattern(BasePattern):
"""
A wildcard pattern can match zero or more nodes.
This has all the flexibility needed to implement patterns like:
.* .+ .? .{m,n}
(a b c | d e | f)
(...)* (...)+ (...)? (...){m,n}
except it always uses non-greedy matching.
"""
def __init__(self, content=None, min=0, max=HUGE, name=None):
"""
Initializer.
Args:
content: optional sequence of subsequences of patterns;
if absent, matches one node;
if present, each subsequence is an alternative [*]
min: optional minimum number of times to match, default 0
max: optional maximum number of times to match, default HUGE
name: optional name assigned to this match
[*] Thus, if content is [[a, b, c], [d, e], [f, g, h]] this is
equivalent to (a b c | d e | f g h); if content is None,
this is equivalent to '.' in regular expression terms.
The min and max parameters work as follows:
min=0, max=maxint: .*
min=1, max=maxint: .+
min=0, max=1: .?
min=1, max=1: .
If content is not None, replace the dot with the parenthesized
list of alternatives, e.g. (a b c | d e | f g h)*
"""
assert 0 <= min <= max <= HUGE, (min, max)
if content is not None:
content = tuple(map(tuple, content)) # Protect against alterations
# Check sanity of alternatives
assert len(content), repr(content) # Can't have zero alternatives
for alt in content:
                assert len(alt), repr(alt) # Can't have empty alternatives
self.content = content
self.min = min
self.max = max
self.name = name
def optimize(self):
"""Optimize certain stacked wildcard patterns."""
subpattern = None
if (self.content is not None and
len(self.content) == 1 and len(self.content[0]) == 1):
subpattern = self.content[0][0]
if self.min == 1 and self.max == 1:
if self.content is None:
return NodePattern(name=self.name)
if subpattern is not None and self.name == subpattern.name:
return subpattern.optimize()
if (self.min <= 1 and isinstance(subpattern, WildcardPattern) and
subpattern.min <= 1 and self.name == subpattern.name):
return WildcardPattern(subpattern.content,
self.min*subpattern.min,
self.max*subpattern.max,
subpattern.name)
return self
def match(self, node, results=None):
"""Does this pattern exactly match a node?"""
return self.match_seq([node], results)
def match_seq(self, nodes, results=None):
"""Does this pattern exactly match a sequence of nodes?"""
for c, r in self.generate_matches(nodes):
if c == len(nodes):
if results is not None:
results.update(r)
if self.name:
results[self.name] = list(nodes)
return True
return False
def generate_matches(self, nodes):
"""
Generator yielding matches for a sequence of nodes.
Args:
nodes: sequence of nodes
Yields:
(count, results) tuples where:
count: the match comprises nodes[:count];
results: dict containing named submatches.
"""
if self.content is None:
# Shortcut for special case (see __init__.__doc__)
for count in range(self.min, 1 + min(len(nodes), self.max)):
r = {}
if self.name:
r[self.name] = nodes[:count]
yield count, r
elif self.name == "bare_name":
yield self._bare_name_matches(nodes)
else:
# The reason for this is that hitting the recursion limit usually
# results in some ugly messages about how RuntimeErrors are being
# ignored. We only have to do this on CPython, though, because other
# implementations don't have this nasty bug in the first place.
if hasattr(sys, "getrefcount"):
save_stderr = sys.stderr
sys.stderr = StringIO()
try:
for count, r in self._recursive_matches(nodes, 0):
if self.name:
r[self.name] = nodes[:count]
yield count, r
except RuntimeError:
# We fall back to the iterative pattern matching scheme if the recursive
# scheme hits the recursion limit.
for count, r in self._iterative_matches(nodes):
if self.name:
r[self.name] = nodes[:count]
yield count, r
finally:
if hasattr(sys, "getrefcount"):
sys.stderr = save_stderr
def _iterative_matches(self, nodes):
"""Helper to iteratively yield the matches."""
nodelen = len(nodes)
if 0 >= self.min:
yield 0, {}
results = []
# generate matches that use just one alt from self.content
for alt in self.content:
for c, r in generate_matches(alt, nodes):
yield c, r
results.append((c, r))
# for each match, iterate down the nodes
while results:
new_results = []
for c0, r0 in results:
# stop if the entire set of nodes has been matched
if c0 < nodelen and c0 <= self.max:
for alt in self.content:
for c1, r1 in generate_matches(alt, nodes[c0:]):
if c1 > 0:
r = {}
r.update(r0)
r.update(r1)
yield c0 + c1, r
new_results.append((c0 + c1, r))
results = new_results
def _bare_name_matches(self, nodes):
"""Special optimized matcher for bare_name."""
count = 0
r = {}
done = False
max = len(nodes)
while not done and count < max:
done = True
for leaf in self.content:
if leaf[0].match(nodes[count], r):
count += 1
done = False
break
r[self.name] = nodes[:count]
return count, r
def _recursive_matches(self, nodes, count):
"""Helper to recursively yield the matches."""
assert self.content is not None
if count >= self.min:
yield 0, {}
if count < self.max:
for alt in self.content:
for c0, r0 in generate_matches(alt, nodes):
for c1, r1 in self._recursive_matches(nodes[c0:], count+1):
r = {}
r.update(r0)
r.update(r1)
yield c0 + c1, r
class NegatedPattern(BasePattern):
def __init__(self, content=None):
"""
Initializer.
The argument is either a pattern or None. If it is None, this
only matches an empty sequence (effectively '$' in regex
lingo). If it is not None, this matches whenever the argument
pattern doesn't have any matches.
"""
if content is not None:
assert isinstance(content, BasePattern), repr(content)
self.content = content
def match(self, node):
# We never match a node in its entirety
return False
def match_seq(self, nodes):
# We only match an empty sequence of nodes in its entirety
return len(nodes) == 0
def generate_matches(self, nodes):
if self.content is None:
# Return a match if there is an empty sequence
if len(nodes) == 0:
yield 0, {}
else:
# Return a match if the argument pattern has no matches
for c, r in self.content.generate_matches(nodes):
return
yield 0, {}
def generate_matches(patterns, nodes):
"""
Generator yielding matches for a sequence of patterns and nodes.
Args:
patterns: a sequence of patterns
nodes: a sequence of nodes
Yields:
(count, results) tuples where:
count: the entire sequence of patterns matches nodes[:count];
results: dict containing named submatches.
"""
if not patterns:
yield 0, {}
else:
p, rest = patterns[0], patterns[1:]
for c0, r0 in p.generate_matches(nodes):
if not rest:
yield c0, r0
else:
for c1, r1 in generate_matches(rest, nodes[c0:]):
r = {}
r.update(r0)
r.update(r1)
yield c0 + c1, r
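# --- Editor's note: illustrative sketch, not part of the original module.
# A WildcardPattern consuming a whole sequence of NAME leaves and recording
# it under a name in the results dict.
if __name__ == "__main__":
    from lib2to3.pgen2 import token as _token

    _leaves = [Leaf(_token.NAME, "a"), Leaf(_token.NAME, "b")]
    _pat = WildcardPattern([[LeafPattern(_token.NAME)]],
                           min=1, max=2, name="names")
    _r = {}
    print(_pat.match_seq(_leaves, _r))    # True
    print([str(n) for n in _r["names"]])  # ['a', 'b']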
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Refactoring framework.
Used as a main program, this can refactor any number of files and/or
recursively descend down directories. Imported as a module, this
provides infrastructure to write your own refactoring tool.
"""
from __future__ import with_statement
__author__ = "Guido van Rossum <[email protected]>"
# Python imports
import os
import sys
import logging
import operator
import collections
import io
from itertools import chain
# Local imports
from .pgen2 import driver, tokenize, token
from .fixer_util import find_root
from . import pytree, pygram
from . import btm_utils as bu
from . import btm_matcher as bm
def get_all_fix_names(fixer_pkg, remove_prefix=True):
"""Return a sorted list of all available fix names in the given package."""
pkg = __import__(fixer_pkg, [], [], ["*"])
fixer_dir = os.path.dirname(pkg.__file__)
fix_names = []
for name in sorted(os.listdir(fixer_dir)):
if name.startswith("fix_") and name.endswith(".py"):
if remove_prefix:
name = name[4:]
fix_names.append(name[:-3])
return fix_names
class _EveryNode(Exception):
pass
def _get_head_types(pat):
""" Accepts a pytree Pattern Node and returns a set
of the pattern types which will match first. """
if isinstance(pat, (pytree.NodePattern, pytree.LeafPattern)):
        # NodePatterns must either have no type and no content
        # or a type and content -- so they don't get any farther
        # Always return leaves
if pat.type is None:
raise _EveryNode
return {pat.type}
if isinstance(pat, pytree.NegatedPattern):
if pat.content:
return _get_head_types(pat.content)
raise _EveryNode # Negated Patterns don't have a type
if isinstance(pat, pytree.WildcardPattern):
# Recurse on each node in content
r = set()
for p in pat.content:
for x in p:
r.update(_get_head_types(x))
return r
raise Exception("Oh no! I don't understand pattern %s" %(pat))
def _get_headnode_dict(fixer_list):
""" Accepts a list of fixers and returns a dictionary
of head node type --> fixer list. """
head_nodes = collections.defaultdict(list)
every = []
for fixer in fixer_list:
if fixer.pattern:
try:
heads = _get_head_types(fixer.pattern)
except _EveryNode:
every.append(fixer)
else:
for node_type in heads:
head_nodes[node_type].append(fixer)
else:
if fixer._accept_type is not None:
head_nodes[fixer._accept_type].append(fixer)
else:
every.append(fixer)
for node_type in chain(pygram.python_grammar.symbol2number.values(),
pygram.python_grammar.tokens):
head_nodes[node_type].extend(every)
return dict(head_nodes)
def get_fixers_from_package(pkg_name):
"""
Return the fully qualified names for fixers in the package pkg_name.
"""
return [pkg_name + "." + fix_name
for fix_name in get_all_fix_names(pkg_name, False)]
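# --- Editor's note: illustrative sketch, not part of the original module.
# "lib2to3.fixes" is the stock fixer package shipped with the library.
if __name__ == "__main__":
    for _name in get_fixers_from_package("lib2to3.fixes")[:3]:
        print(_name)  # e.g. lib2to3.fixes.fix_apply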
def _identity(obj):
return obj
if sys.version_info < (3, 0):
import codecs
_open_with_encoding = codecs.open
# codecs.open doesn't translate newlines sadly.
def _from_system_newlines(input):
return input.replace("\r\n", "\n")
def _to_system_newlines(input):
if os.linesep != "\n":
return input.replace("\n", os.linesep)
else:
return input
else:
_open_with_encoding = open
_from_system_newlines = _identity
_to_system_newlines = _identity
def _detect_future_features(source):
have_docstring = False
gen = tokenize.generate_tokens(io.StringIO(source).readline)
def advance():
tok = next(gen)
return tok[0], tok[1]
ignore = frozenset({token.NEWLINE, tokenize.NL, token.COMMENT})
features = set()
try:
while True:
tp, value = advance()
if tp in ignore:
continue
elif tp == token.STRING:
if have_docstring:
break
have_docstring = True
elif tp == token.NAME and value == "from":
tp, value = advance()
if tp != token.NAME or value != "__future__":
break
tp, value = advance()
if tp != token.NAME or value != "import":
break
tp, value = advance()
if tp == token.OP and value == "(":
tp, value = advance()
while tp == token.NAME:
features.add(value)
tp, value = advance()
if tp != token.OP or value != ",":
break
tp, value = advance()
else:
break
except StopIteration:
pass
return frozenset(features)
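# --- Editor's note: illustrative sketch, not part of the original module.
# refactor_string() below consults this helper to decide whether to parse
# with the print-statement-free grammar.
if __name__ == "__main__":
    _src = "from __future__ import print_function, division\nx = 1\n"
    print(sorted(_detect_future_features(_src)))  # ['division', 'print_function']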
class FixerError(Exception):
"""A fixer could not be loaded."""
class RefactoringTool(object):
_default_options = {"print_function" : False,
"write_unchanged_files" : False}
CLASS_PREFIX = "Fix" # The prefix for fixer classes
FILE_PREFIX = "fix_" # The prefix for modules with a fixer within
def __init__(self, fixer_names, options=None, explicit=None):
"""Initializer.
Args:
fixer_names: a list of fixers to import
            options: a dict with configuration.
            explicit: a list of fixers to run even if they are marked explicit.
"""
self.fixers = fixer_names
self.explicit = explicit or []
self.options = self._default_options.copy()
if options is not None:
self.options.update(options)
if self.options["print_function"]:
self.grammar = pygram.python_grammar_no_print_statement
else:
self.grammar = pygram.python_grammar
# When this is True, the refactor*() methods will call write_file() for
# files processed even if they were not changed during refactoring. If
# and only if the refactor method's write parameter was True.
self.write_unchanged_files = self.options.get("write_unchanged_files")
self.errors = []
self.logger = logging.getLogger("RefactoringTool")
self.fixer_log = []
self.wrote = False
self.driver = driver.Driver(self.grammar,
convert=pytree.convert,
logger=self.logger)
self.pre_order, self.post_order = self.get_fixers()
self.files = [] # List of files that were or should be modified
self.BM = bm.BottomMatcher()
self.bmi_pre_order = [] # Bottom Matcher incompatible fixers
self.bmi_post_order = []
for fixer in chain(self.post_order, self.pre_order):
if fixer.BM_compatible:
self.BM.add_fixer(fixer)
# remove fixers that will be handled by the bottom-up
# matcher
elif fixer in self.pre_order:
self.bmi_pre_order.append(fixer)
elif fixer in self.post_order:
self.bmi_post_order.append(fixer)
self.bmi_pre_order_heads = _get_headnode_dict(self.bmi_pre_order)
self.bmi_post_order_heads = _get_headnode_dict(self.bmi_post_order)
def get_fixers(self):
"""Inspects the options to load the requested patterns and handlers.
Returns:
(pre_order, post_order), where pre_order is the list of fixers that
want a pre-order AST traversal, and post_order is the list that want
post-order traversal.
"""
pre_order_fixers = []
post_order_fixers = []
for fix_mod_path in self.fixers:
mod = __import__(fix_mod_path, {}, {}, ["*"])
fix_name = fix_mod_path.rsplit(".", 1)[-1]
if fix_name.startswith(self.FILE_PREFIX):
fix_name = fix_name[len(self.FILE_PREFIX):]
parts = fix_name.split("_")
class_name = self.CLASS_PREFIX + "".join([p.title() for p in parts])
try:
fix_class = getattr(mod, class_name)
except AttributeError:
raise FixerError("Can't find %s.%s" % (fix_name, class_name))
fixer = fix_class(self.options, self.fixer_log)
if fixer.explicit and self.explicit is not True and \
fix_mod_path not in self.explicit:
self.log_message("Skipping optional fixer: %s", fix_name)
continue
self.log_debug("Adding transformation: %s", fix_name)
if fixer.order == "pre":
pre_order_fixers.append(fixer)
elif fixer.order == "post":
post_order_fixers.append(fixer)
else:
raise FixerError("Illegal fixer order: %r" % fixer.order)
key_func = operator.attrgetter("run_order")
pre_order_fixers.sort(key=key_func)
post_order_fixers.sort(key=key_func)
return (pre_order_fixers, post_order_fixers)
def log_error(self, msg, *args, **kwds):
"""Called when an error occurs."""
raise
def log_message(self, msg, *args):
"""Hook to log a message."""
if args:
msg = msg % args
self.logger.info(msg)
def log_debug(self, msg, *args):
if args:
msg = msg % args
self.logger.debug(msg)
def print_output(self, old_text, new_text, filename, equal):
"""Called with the old version, new version, and filename of a
refactored file."""
pass
def refactor(self, items, write=False, doctests_only=False):
"""Refactor a list of files and directories."""
for dir_or_file in items:
if os.path.isdir(dir_or_file):
self.refactor_dir(dir_or_file, write, doctests_only)
else:
self.refactor_file(dir_or_file, write, doctests_only)
def refactor_dir(self, dir_name, write=False, doctests_only=False):
"""Descends down a directory and refactor every Python file found.
Python files are assumed to have a .py extension.
Files and subdirectories starting with '.' are skipped.
"""
py_ext = os.extsep + "py"
for dirpath, dirnames, filenames in os.walk(dir_name):
self.log_debug("Descending into %s", dirpath)
dirnames.sort()
filenames.sort()
for name in filenames:
if (not name.startswith(".") and
os.path.splitext(name)[1] == py_ext):
fullname = os.path.join(dirpath, name)
self.refactor_file(fullname, write, doctests_only)
# Modify dirnames in-place to remove subdirs with leading dots
dirnames[:] = [dn for dn in dirnames if not dn.startswith(".")]
def _read_python_source(self, filename):
"""
Do our best to decode a Python source file correctly.
"""
try:
f = open(filename, "rb")
except OSError as err:
self.log_error("Can't open %s: %s", filename, err)
return None, None
try:
encoding = tokenize.detect_encoding(f.readline)[0]
finally:
f.close()
with _open_with_encoding(filename, "r", encoding=encoding) as f:
return _from_system_newlines(f.read()), encoding
def refactor_file(self, filename, write=False, doctests_only=False):
"""Refactors a file."""
input, encoding = self._read_python_source(filename)
if input is None:
# Reading the file failed.
return
input += "\n" # Silence certain parse errors
if doctests_only:
self.log_debug("Refactoring doctests in %s", filename)
output = self.refactor_docstring(input, filename)
if self.write_unchanged_files or output != input:
self.processed_file(output, filename, input, write, encoding)
else:
self.log_debug("No doctest changes in %s", filename)
else:
tree = self.refactor_string(input, filename)
if self.write_unchanged_files or (tree and tree.was_changed):
# The [:-1] is to take off the \n we added earlier
self.processed_file(str(tree)[:-1], filename,
write=write, encoding=encoding)
else:
self.log_debug("No changes in %s", filename)
def refactor_string(self, data, name):
"""Refactor a given input string.
Args:
data: a string holding the code to be refactored.
name: a human-readable name for use in error/log messages.
Returns:
An AST corresponding to the refactored input stream; None if
there were errors during the parse.
"""
features = _detect_future_features(data)
if "print_function" in features:
self.driver.grammar = pygram.python_grammar_no_print_statement
try:
tree = self.driver.parse_string(data)
except Exception as err:
self.log_error("Can't parse %s: %s: %s",
name, err.__class__.__name__, err)
return
finally:
self.driver.grammar = self.grammar
tree.future_features = features
self.log_debug("Refactoring %s", name)
self.refactor_tree(tree, name)
return tree
def refactor_stdin(self, doctests_only=False):
input = sys.stdin.read()
if doctests_only:
self.log_debug("Refactoring doctests in stdin")
output = self.refactor_docstring(input, "<stdin>")
if self.write_unchanged_files or output != input:
self.processed_file(output, "<stdin>", input)
else:
self.log_debug("No doctest changes in stdin")
else:
tree = self.refactor_string(input, "<stdin>")
if self.write_unchanged_files or (tree and tree.was_changed):
self.processed_file(str(tree), "<stdin>", input)
else:
self.log_debug("No changes in stdin")
def refactor_tree(self, tree, name):
"""Refactors a parse tree (modifying the tree in place).
For compatible patterns the bottom matcher module is
used. Otherwise the tree is traversed node-to-node for
matches.
Args:
tree: a pytree.Node instance representing the root of the tree
to be refactored.
name: a human-readable name for this tree.
Returns:
True if the tree was modified, False otherwise.
"""
for fixer in chain(self.pre_order, self.post_order):
fixer.start_tree(tree, name)
#use traditional matching for the incompatible fixers
self.traverse_by(self.bmi_pre_order_heads, tree.pre_order())
self.traverse_by(self.bmi_post_order_heads, tree.post_order())
# obtain a set of candidate nodes
match_set = self.BM.run(tree.leaves())
while any(match_set.values()):
for fixer in self.BM.fixers:
if fixer in match_set and match_set[fixer]:
                    # sort by depth; apply fixers from the bottom of the AST to the top
match_set[fixer].sort(key=pytree.Base.depth, reverse=True)
if fixer.keep_line_order:
                        # some fixers (e.g. fix_imports) must be applied
                        # with the original file's line order
match_set[fixer].sort(key=pytree.Base.get_lineno)
for node in list(match_set[fixer]):
if node in match_set[fixer]:
match_set[fixer].remove(node)
try:
find_root(node)
except ValueError:
# this node has been cut off from a
                            # previous transformation; skip
continue
if node.fixers_applied and fixer in node.fixers_applied:
# do not apply the same fixer again
continue
results = fixer.match(node)
if results:
new = fixer.transform(node, results)
if new is not None:
node.replace(new)
#new.fixers_applied.append(fixer)
for node in new.post_order():
# do not apply the fixer again to
# this or any subnode
if not node.fixers_applied:
node.fixers_applied = []
node.fixers_applied.append(fixer)
# update the original match set for
# the added code
new_matches = self.BM.run(new.leaves())
for fxr in new_matches:
                                        if fxr not in match_set:
                                            match_set[fxr] = []
match_set[fxr].extend(new_matches[fxr])
for fixer in chain(self.pre_order, self.post_order):
fixer.finish_tree(tree, name)
return tree.was_changed
def traverse_by(self, fixers, traversal):
"""Traverse an AST, applying a set of fixers to each node.
This is a helper method for refactor_tree().
Args:
fixers: a list of fixer instances.
traversal: a generator that yields AST nodes.
Returns:
None
"""
if not fixers:
return
for node in traversal:
for fixer in fixers[node.type]:
results = fixer.match(node)
if results:
new = fixer.transform(node, results)
if new is not None:
node.replace(new)
node = new
def processed_file(self, new_text, filename, old_text=None, write=False,
encoding=None):
"""
Called when a file has been refactored and there may be changes.
"""
self.files.append(filename)
if old_text is None:
old_text = self._read_python_source(filename)[0]
if old_text is None:
return
equal = old_text == new_text
self.print_output(old_text, new_text, filename, equal)
if equal:
self.log_debug("No changes to %s", filename)
if not self.write_unchanged_files:
return
if write:
self.write_file(new_text, filename, old_text, encoding)
else:
self.log_debug("Not writing changes to %s", filename)
def write_file(self, new_text, filename, old_text, encoding=None):
"""Writes a string to a file.
It first shows a unified diff between the old text and the new text, and
then rewrites the file; the latter is only done if the write option is
set.
"""
try:
f = _open_with_encoding(filename, "w", encoding=encoding)
except OSError as err:
self.log_error("Can't create %s: %s", filename, err)
return
try:
f.write(_to_system_newlines(new_text))
except OSError as err:
self.log_error("Can't write %s: %s", filename, err)
finally:
f.close()
self.log_debug("Wrote changes to %s", filename)
self.wrote = True
PS1 = ">>> "
PS2 = "... "
def refactor_docstring(self, input, filename):
"""Refactors a docstring, looking for doctests.
This returns a modified version of the input string. It looks
for doctests, which start with a ">>>" prompt, and may be
continued with "..." prompts, as long as the "..." is indented
the same as the ">>>".
(Unfortunately we can't use the doctest module's parser,
since, like most parsers, it is not geared towards preserving
the original source.)
"""
result = []
block = None
block_lineno = None
indent = None
lineno = 0
for line in input.splitlines(keepends=True):
lineno += 1
if line.lstrip().startswith(self.PS1):
if block is not None:
result.extend(self.refactor_doctest(block, block_lineno,
indent, filename))
block_lineno = lineno
block = [line]
i = line.find(self.PS1)
indent = line[:i]
elif (indent is not None and
(line.startswith(indent + self.PS2) or
line == indent + self.PS2.rstrip() + "\n")):
block.append(line)
else:
if block is not None:
result.extend(self.refactor_doctest(block, block_lineno,
indent, filename))
block = None
indent = None
result.append(line)
if block is not None:
result.extend(self.refactor_doctest(block, block_lineno,
indent, filename))
return "".join(result)
def refactor_doctest(self, block, lineno, indent, filename):
"""Refactors one doctest.
A doctest is given as a block of lines, the first of which starts
with ">>>" (possibly indented), while the remaining lines start
with "..." (identically indented).
"""
try:
tree = self.parse_block(block, lineno, indent)
except Exception as err:
if self.logger.isEnabledFor(logging.DEBUG):
for line in block:
self.log_debug("Source: %s", line.rstrip("\n"))
self.log_error("Can't parse docstring in %s line %s: %s: %s",
filename, lineno, err.__class__.__name__, err)
return block
if self.refactor_tree(tree, filename):
new = str(tree).splitlines(keepends=True)
# Undo the adjustment of the line numbers in wrap_toks() below.
clipped, new = new[:lineno-1], new[lineno-1:]
assert clipped == ["\n"] * (lineno-1), clipped
if not new[-1].endswith("\n"):
new[-1] += "\n"
block = [indent + self.PS1 + new.pop(0)]
if new:
block += [indent + self.PS2 + line for line in new]
return block
def summarize(self):
if self.wrote:
were = "were"
else:
were = "need to be"
if not self.files:
self.log_message("No files %s modified.", were)
else:
self.log_message("Files that %s modified:", were)
for file in self.files:
self.log_message(file)
if self.fixer_log:
self.log_message("Warnings/messages while refactoring:")
for message in self.fixer_log:
self.log_message(message)
if self.errors:
if len(self.errors) == 1:
self.log_message("There was 1 error:")
else:
self.log_message("There were %d errors:", len(self.errors))
for msg, args, kwds in self.errors:
self.log_message(msg, *args, **kwds)
def parse_block(self, block, lineno, indent):
"""Parses a block into a tree.
This is necessary to get correct line number / offset information
in the parser diagnostics and embedded into the parse tree.
"""
tree = self.driver.parse_tokens(self.wrap_toks(block, lineno, indent))
tree.future_features = frozenset()
return tree
def wrap_toks(self, block, lineno, indent):
"""Wraps a tokenize stream to systematically modify start/end."""
tokens = tokenize.generate_tokens(self.gen_lines(block, indent).__next__)
for type, value, (line0, col0), (line1, col1), line_text in tokens:
line0 += lineno - 1
line1 += lineno - 1
# Don't bother updating the columns; this is too complicated
# since line_text would also have to be updated and it would
# still break for tokens spanning lines. Let the user guess
# that the column numbers for doctests are relative to the
# end of the prompt string (PS1 or PS2).
yield type, value, (line0, col0), (line1, col1), line_text
def gen_lines(self, block, indent):
"""Generates lines as expected by tokenize from a list of lines.
This strips the first len(indent + self.PS1) characters off each line.
"""
prefix1 = indent + self.PS1
prefix2 = indent + self.PS2
prefix = prefix1
for line in block:
if line.startswith(prefix):
yield line[len(prefix):]
elif line == prefix.rstrip() + "\n":
yield "\n"
else:
raise AssertionError("line=%r, prefix=%r" % (line, prefix))
prefix = prefix2
while True:
yield ""
class MultiprocessingUnsupported(Exception):
pass
class MultiprocessRefactoringTool(RefactoringTool):
def __init__(self, *args, **kwargs):
super(MultiprocessRefactoringTool, self).__init__(*args, **kwargs)
self.queue = None
self.output_lock = None
def refactor(self, items, write=False, doctests_only=False,
num_processes=1):
if num_processes == 1:
return super(MultiprocessRefactoringTool, self).refactor(
items, write, doctests_only)
try:
import multiprocessing
except ImportError:
raise MultiprocessingUnsupported
if self.queue is not None:
raise RuntimeError("already doing multiple processes")
self.queue = multiprocessing.JoinableQueue()
self.output_lock = multiprocessing.Lock()
processes = [multiprocessing.Process(target=self._child)
for i in range(num_processes)]
try:
for p in processes:
p.start()
super(MultiprocessRefactoringTool, self).refactor(items, write,
doctests_only)
finally:
self.queue.join()
for i in range(num_processes):
self.queue.put(None)
for p in processes:
if p.is_alive():
p.join()
self.queue = None
def _child(self):
task = self.queue.get()
while task is not None:
args, kwargs = task
try:
super(MultiprocessRefactoringTool, self).refactor_file(
*args, **kwargs)
finally:
self.queue.task_done()
task = self.queue.get()
def refactor_file(self, *args, **kwargs):
if self.queue is not None:
self.queue.put((args, kwargs))
else:
return super(MultiprocessRefactoringTool, self).refactor_file(
*args, **kwargs)
#empty
import sys
from .main import main
sys.exit(main("lib2to3.fixes"))
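# The three lines above are the package's command-line entry point; a typical
# (illustrative) invocation is:
#
#     python -m lib2to3 -w example.py
#
# which applies the default fixers to the hypothetical example.py and, with
# -w, writes the transformed source back to disk.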
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for apply().
This converts apply(func, v, k) into (func)(*v, **k)."""
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Call, Comma, parenthesize
class FixApply(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'apply'
trailer<
'('
arglist<
(not argument<NAME '=' any>) func=any ','
(not argument<NAME '=' any>) args=any [','
(not argument<NAME '=' any>) kwds=any] [',']
>
')'
>
>
"""
def transform(self, node, results):
syms = self.syms
assert results
func = results["func"]
args = results["args"]
kwds = results.get("kwds")
prefix = node.prefix
func = func.clone()
if (func.type not in (token.NAME, syms.atom) and
(func.type != syms.power or
func.children[-2].type == token.DOUBLESTAR)):
# Need to parenthesize
func = parenthesize(func)
func.prefix = ""
args = args.clone()
args.prefix = ""
if kwds is not None:
kwds = kwds.clone()
kwds.prefix = ""
l_newargs = [pytree.Leaf(token.STAR, "*"), args]
if kwds is not None:
l_newargs.extend([Comma(),
pytree.Leaf(token.DOUBLESTAR, "**"),
kwds])
l_newargs[-2].prefix = " " # that's the ** token
# XXX Sometimes we could be cleverer, e.g. apply(f, (x, y) + t)
# can be translated into f(x, y, *t) instead of f(*(x, y) + t)
#new = pytree.Node(syms.power, (func, ArgList(l_newargs)))
return Call(func, l_newargs, prefix=prefix)
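# Illustrative before/after for FixApply (hypothetical input):
#
#     apply(f, args)          ->  f(*args)
#     apply(f, args, kwargs)  ->  f(*args, **kwargs)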
"""Fixer that replaces deprecated unittest method names."""
# Author: Ezio Melotti
from ..fixer_base import BaseFix
from ..fixer_util import Name
NAMES = dict(
assert_="assertTrue",
assertEquals="assertEqual",
assertNotEquals="assertNotEqual",
assertAlmostEquals="assertAlmostEqual",
assertNotAlmostEquals="assertNotAlmostEqual",
assertRegexpMatches="assertRegex",
assertRaisesRegexp="assertRaisesRegex",
failUnlessEqual="assertEqual",
failIfEqual="assertNotEqual",
failUnlessAlmostEqual="assertAlmostEqual",
failIfAlmostEqual="assertNotAlmostEqual",
failUnless="assertTrue",
failUnlessRaises="assertRaises",
failIf="assertFalse",
)
class FixAsserts(BaseFix):
PATTERN = """
power< any+ trailer< '.' meth=(%s)> any* >
""" % '|'.join(map(repr, NAMES))
def transform(self, node, results):
name = results["meth"][0]
name.replace(Name(NAMES[str(name)], prefix=name.prefix))
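# Illustrative before/after for FixAsserts (hypothetical input, following the
# NAMES mapping above):
#
#     self.assertEquals(a, b)         ->  self.assertEqual(a, b)
#     self.failUnless(x)              ->  self.assertTrue(x)
#     self.assertRegexpMatches(s, r)  ->  self.assertRegex(s, r)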
"""Fixer for basestring -> str."""
# Author: Christian Heimes
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixBasestring(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "'basestring'"
def transform(self, node, results):
return Name("str", prefix=node.prefix)
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes buffer(...) into memoryview(...)."""
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixBuffer(fixer_base.BaseFix):
BM_compatible = True
explicit = True # The user must ask for this fixer
PATTERN = """
power< name='buffer' trailer< '(' [any] ')' > any* >
"""
def transform(self, node, results):
name = results["name"]
name.replace(Name("memoryview", prefix=name.prefix))
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for callable().
This converts callable(obj) into isinstance(obj, collections.Callable), adding a
collections import if needed."""
# Local imports
from lib2to3 import fixer_base
from lib2to3.fixer_util import Call, Name, String, Attr, touch_import
class FixCallable(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
# Ignore callable(*args) or use of keywords.
# Either could be a hint that the builtin callable() is not being used.
PATTERN = """
power< 'callable'
trailer< lpar='('
( not(arglist | argument<any '=' any>) func=any
| func=arglist<(not argument<any '=' any>) any ','> )
rpar=')' >
after=any*
>
"""
def transform(self, node, results):
func = results['func']
touch_import(None, 'collections', node=node)
args = [func.clone(), String(', ')]
args.extend(Attr(Name('collections'), Name('Callable')))
return Call(Name('isinstance'), args, prefix=node.prefix)
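# Illustrative before/after for FixCallable (hypothetical input; touch_import()
# adds 'import collections' if it is missing):
#
#     callable(obj)  ->  isinstance(obj, collections.Callable)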
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for dict methods.
d.keys() -> list(d.keys())
d.items() -> list(d.items())
d.values() -> list(d.values())
d.iterkeys() -> iter(d.keys())
d.iteritems() -> iter(d.items())
d.itervalues() -> iter(d.values())
d.viewkeys() -> d.keys()
d.viewitems() -> d.items()
d.viewvalues() -> d.values()
Except in certain very specific contexts: the iter() can be dropped
when the context is list(), sorted(), iter() or for...in; the list()
can be dropped when the context is list() or sorted() (but not iter()
or for...in!). Special contexts that apply to both: list(), sorted(), tuple(),
set(), any(), all(), sum().
Note: iter(d.keys()) could be written as iter(d) but since the
original d.iterkeys() was also redundant we don't fix this. And there
are (rare) contexts where it makes a difference (e.g. when passing it
as an argument to a function that introspects the argument).
"""
# Local imports
from .. import pytree
from .. import patcomp
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, LParen, RParen, ArgList, Dot
from .. import fixer_util
iter_exempt = fixer_util.consuming_calls | {"iter"}
class FixDict(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< head=any+
trailer< '.' method=('keys'|'items'|'values'|
'iterkeys'|'iteritems'|'itervalues'|
'viewkeys'|'viewitems'|'viewvalues') >
parens=trailer< '(' ')' >
tail=any*
>
"""
def transform(self, node, results):
head = results["head"]
method = results["method"][0] # Extract node for method name
tail = results["tail"]
syms = self.syms
method_name = method.value
isiter = method_name.startswith("iter")
isview = method_name.startswith("view")
if isiter or isview:
method_name = method_name[4:]
assert method_name in ("keys", "items", "values"), repr(method)
head = [n.clone() for n in head]
tail = [n.clone() for n in tail]
special = not tail and self.in_special_context(node, isiter)
args = head + [pytree.Node(syms.trailer,
[Dot(),
Name(method_name,
prefix=method.prefix)]),
results["parens"].clone()]
new = pytree.Node(syms.power, args)
if not (special or isview):
new.prefix = ""
new = Call(Name("iter" if isiter else "list"), [new])
if tail:
new = pytree.Node(syms.power, [new] + tail)
new.prefix = node.prefix
return new
P1 = "power< func=NAME trailer< '(' node=any ')' > any* >"
p1 = patcomp.compile_pattern(P1)
P2 = """for_stmt< 'for' any 'in' node=any ':' any* >
| comp_for< 'for' any 'in' node=any any* >
"""
p2 = patcomp.compile_pattern(P2)
def in_special_context(self, node, isiter):
if node.parent is None:
return False
results = {}
if (node.parent.parent is not None and
self.p1.match(node.parent.parent, results) and
results["node"] is node):
if isiter:
# iter(d.iterkeys()) -> iter(d.keys()), etc.
return results["func"].value in iter_exempt
else:
# list(d.keys()) -> list(d.keys()), etc.
return results["func"].value in fixer_util.consuming_calls
if not isiter:
return False
# for ... in d.iterkeys() -> for ... in d.keys(), etc.
return self.p2.match(node.parent, results) and results["node"] is node
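# Illustrative before/after for FixDict (hypothetical input):
#
#     d.keys()                    ->  list(d.keys())
#     d.iteritems()               ->  iter(d.items())
#     for k in d.iterkeys(): ...  ->  for k in d.keys(): ...
#     (the last is a special context, so the iter() wrapper is dropped)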
"""Fixer for except statements with named exceptions.
The following cases will be converted:
- "except E, T:" where T is a name:
except E as T:
- "except E, T:" where T is not a name, tuple or list:
except E as t:
T = t
This is done because the target of an "except" clause must be a
name.
- "except E, T:" where T is a tuple or list literal:
except E as t:
T = t.args
"""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Assign, Attr, Name, is_tuple, is_list, syms
def find_excepts(nodes):
for i, n in enumerate(nodes):
if n.type == syms.except_clause:
if n.children[0].value == 'except':
yield (n, nodes[i+2])
class FixExcept(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
try_stmt< 'try' ':' (simple_stmt | suite)
cleanup=(except_clause ':' (simple_stmt | suite))+
tail=(['except' ':' (simple_stmt | suite)]
['else' ':' (simple_stmt | suite)]
['finally' ':' (simple_stmt | suite)]) >
"""
def transform(self, node, results):
syms = self.syms
tail = [n.clone() for n in results["tail"]]
try_cleanup = [ch.clone() for ch in results["cleanup"]]
for except_clause, e_suite in find_excepts(try_cleanup):
if len(except_clause.children) == 4:
(E, comma, N) = except_clause.children[1:4]
comma.replace(Name("as", prefix=" "))
if N.type != token.NAME:
# Generate a new N for the except clause
new_N = Name(self.new_name(), prefix=" ")
target = N.clone()
target.prefix = ""
N.replace(new_N)
new_N = new_N.clone()
# Insert "old_N = new_N" as the first statement in
# the except body. This loop skips leading whitespace
# and indents
#TODO(cwinter) suite-cleanup
suite_stmts = e_suite.children
for i, stmt in enumerate(suite_stmts):
if isinstance(stmt, pytree.Node):
break
# The assignment is different if old_N is a tuple or list
# In that case, the assignment is old_N = new_N.args
if is_tuple(N) or is_list(N):
assign = Assign(target, Attr(new_N, Name('args')))
else:
assign = Assign(target, new_N)
#TODO(cwinter) stopgap until children becomes a smart list
for child in reversed(suite_stmts[:i]):
e_suite.insert_child(0, child)
e_suite.insert_child(i, assign)
elif N.prefix == "":
# No space after a comma is legal; no space after "as",
# not so much.
N.prefix = " "
#TODO(cwinter) fix this when children becomes a smart list
children = [c.clone() for c in node.children[:3]] + try_cleanup + tail
return pytree.Node(node.type, children)
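# Illustrative before/after for FixExcept (hypothetical input; 't' stands for
# the fresh name produced by self.new_name()):
#
#     except OSError, e:             ->  except OSError as e:
#     except OSError, (errno, msg):  ->  except OSError as t:
#                                            (errno, msg) = t.args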
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for exec.
This converts usages of the exec statement into calls to a built-in
exec() function.
exec code in ns1, ns2 -> exec(code, ns1, ns2)
"""
# Local imports
from .. import pytree
from .. import fixer_base
from ..fixer_util import Comma, Name, Call
class FixExec(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
exec_stmt< 'exec' a=any 'in' b=any [',' c=any] >
|
exec_stmt< 'exec' (not atom<'(' [any] ')'>) a=any >
"""
def transform(self, node, results):
assert results
syms = self.syms
a = results["a"]
b = results.get("b")
c = results.get("c")
args = [a.clone()]
args[0].prefix = ""
if b is not None:
args.extend([Comma(), b.clone()])
if c is not None:
args.extend([Comma(), c.clone()])
return Call(Name("exec"), args, prefix=node.prefix)
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for execfile.
This converts usages of the execfile function into calls to the built-in
exec() function.
"""
from .. import fixer_base
from ..fixer_util import (Comma, Name, Call, LParen, RParen, Dot, Node,
ArgList, String, syms)
class FixExecfile(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'execfile' trailer< '(' arglist< filename=any [',' globals=any [',' locals=any ] ] > ')' > >
|
power< 'execfile' trailer< '(' filename=any ')' > >
"""
def transform(self, node, results):
assert results
filename = results["filename"]
globals = results.get("globals")
locals = results.get("locals")
        # Copy over the prefix from the right parenthesis at the end of the
        # execfile call.
execfile_paren = node.children[-1].children[-1].clone()
# Construct open().read().
open_args = ArgList([filename.clone()], rparen=execfile_paren)
open_call = Node(syms.power, [Name("open"), open_args])
read = [Node(syms.trailer, [Dot(), Name('read')]),
Node(syms.trailer, [LParen(), RParen()])]
open_expr = [open_call] + read
# Wrap the open call in a compile call. This is so the filename will be
# preserved in the execed code.
filename_arg = filename.clone()
filename_arg.prefix = " "
exec_str = String("'exec'", " ")
compile_args = open_expr + [Comma(), filename_arg, Comma(), exec_str]
compile_call = Call(Name("compile"), compile_args, "")
# Finally, replace the execfile call with an exec call.
args = [compile_call]
if globals is not None:
args.extend([Comma(), globals.clone()])
if locals is not None:
args.extend([Comma(), locals.clone()])
return Call(Name("exec"), args, prefix=node.prefix)
"""
Convert use of sys.exitfunc to use the atexit module.
"""
# Author: Benjamin Peterson
from lib2to3 import pytree, fixer_base
from lib2to3.fixer_util import Name, Attr, Call, Comma, Newline, syms
class FixExitfunc(fixer_base.BaseFix):
keep_line_order = True
BM_compatible = True
PATTERN = """
(
sys_import=import_name<'import'
('sys'
|
dotted_as_names< (any ',')* 'sys' (',' any)* >
)
>
|
expr_stmt<
power< 'sys' trailer< '.' 'exitfunc' > >
'=' func=any >
)
"""
def __init__(self, *args):
super(FixExitfunc, self).__init__(*args)
def start_tree(self, tree, filename):
super(FixExitfunc, self).start_tree(tree, filename)
self.sys_import = None
def transform(self, node, results):
# First, find the sys import. We'll just hope it's global scope.
if "sys_import" in results:
if self.sys_import is None:
self.sys_import = results["sys_import"]
return
func = results["func"].clone()
func.prefix = ""
register = pytree.Node(syms.power,
Attr(Name("atexit"), Name("register"))
)
call = Call(register, [func], node.prefix)
node.replace(call)
if self.sys_import is None:
# That's interesting.
self.warning(node, "Can't find sys import; Please add an atexit "
"import at the top of your file.")
return
# Now add an atexit import after the sys import.
names = self.sys_import.children[1]
if names.type == syms.dotted_as_names:
names.append_child(Comma())
names.append_child(Name("atexit", " "))
else:
containing_stmt = self.sys_import.parent
position = containing_stmt.children.index(self.sys_import)
stmt_container = containing_stmt.parent
new_import = pytree.Node(syms.import_name,
[Name("import"), Name("atexit", " ")]
)
new = pytree.Node(syms.simple_stmt, [new_import])
containing_stmt.insert_child(position + 1, Newline())
containing_stmt.insert_child(position + 2, new)
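# Illustrative before/after for FixExitfunc (hypothetical input):
#
#     import sys                        import sys
#     sys.exitfunc = cleanup      ->    import atexit
#                                       atexit.register(cleanup)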
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes filter(F, X) into list(filter(F, X)).
We avoid the transformation if the filter() call is directly contained
in iter(<>), list(<>), tuple(<>), sorted(<>), ...join(<>), or
for V in <>:.
NOTE: This is still not correct if the original code was depending on
filter(F, X) to return a string if X is a string and a tuple if X is a
tuple. That would require type inference, which we don't do. Let
Python 2.6 figure it out.
"""
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, ListComp, in_special_context
class FixFilter(fixer_base.ConditionalFix):
BM_compatible = True
PATTERN = """
filter_lambda=power<
'filter'
trailer<
'('
arglist<
lambdef< 'lambda'
(fp=NAME | vfpdef< '(' fp=NAME ')'> ) ':' xp=any
>
','
it=any
>
')'
>
>
|
power<
'filter'
trailer< '(' arglist< none='None' ',' seq=any > ')' >
>
|
power<
'filter'
args=trailer< '(' [any] ')' >
>
"""
skip_on = "future_builtins.filter"
def transform(self, node, results):
if self.should_skip(node):
return
if "filter_lambda" in results:
new = ListComp(results.get("fp").clone(),
results.get("fp").clone(),
results.get("it").clone(),
results.get("xp").clone())
elif "none" in results:
new = ListComp(Name("_f"),
Name("_f"),
results["seq"].clone(),
Name("_f"))
else:
if in_special_context(node):
return None
new = node.clone()
new.prefix = ""
new = Call(Name("list"), [new])
new.prefix = node.prefix
return new
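# Illustrative before/after for FixFilter (hypothetical input):
#
#     filter(lambda x: x > 0, seq)  ->  [x for x in seq if x > 0]
#     filter(None, seq)             ->  [_f for _f in seq if _f]
#     filter(f, seq)                ->  list(filter(f, seq))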
"""Fix function attribute names (f.func_x -> f.__x__)."""
# Author: Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixFuncattrs(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< any+ trailer< '.' attr=('func_closure' | 'func_doc' | 'func_globals'
| 'func_name' | 'func_defaults' | 'func_code'
| 'func_dict') > any* >
"""
def transform(self, node, results):
attr = results["attr"][0]
attr.replace(Name(("__%s__" % attr.value[5:]),
prefix=attr.prefix))
"""Remove __future__ imports
from __future__ import foo is replaced with an empty line.
"""
# Author: Christian Heimes
# Local imports
from .. import fixer_base
from ..fixer_util import BlankLine
class FixFuture(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """import_from< 'from' module_name="__future__" 'import' any >"""
# This should be run last -- some things check for the import
run_order = 10
def transform(self, node, results):
new = BlankLine()
new.prefix = node.prefix
return new
"""
Fixer that changes os.getcwdu() to os.getcwd().
"""
# Author: Victor Stinner
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixGetcwdu(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'os' trailer< dot='.' name='getcwdu' > any* >
"""
def transform(self, node, results):
name = results["name"]
name.replace(Name("getcwd", prefix=name.prefix))
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for has_key().
Calls to .has_key() methods are expressed in terms of the 'in'
operator:
d.has_key(k) -> k in d
CAVEATS:
1) While the primary target of this fixer is dict.has_key(), the
fixer will change any has_key() method call, regardless of its
class.
2) Cases like this will not be converted:
m = d.has_key
if m(k):
...
Only *calls* to has_key() are converted. While it is possible to
convert the above to something like
m = d.__contains__
if m(k):
...
this is currently not done.
"""
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, parenthesize
class FixHasKey(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
anchor=power<
before=any+
trailer< '.' 'has_key' >
trailer<
'('
( not(arglist | argument<any '=' any>) arg=any
| arglist<(not argument<any '=' any>) arg=any ','>
)
')'
>
after=any*
>
|
negation=not_test<
'not'
anchor=power<
before=any+
trailer< '.' 'has_key' >
trailer<
'('
( not(arglist | argument<any '=' any>) arg=any
| arglist<(not argument<any '=' any>) arg=any ','>
)
')'
>
>
>
"""
def transform(self, node, results):
assert results
syms = self.syms
if (node.parent.type == syms.not_test and
self.pattern.match(node.parent)):
# Don't transform a node matching the first alternative of the
# pattern when its parent matches the second alternative
return None
negation = results.get("negation")
anchor = results["anchor"]
prefix = node.prefix
before = [n.clone() for n in results["before"]]
arg = results["arg"].clone()
after = results.get("after")
if after:
after = [n.clone() for n in after]
if arg.type in (syms.comparison, syms.not_test, syms.and_test,
syms.or_test, syms.test, syms.lambdef, syms.argument):
arg = parenthesize(arg)
if len(before) == 1:
before = before[0]
else:
before = pytree.Node(syms.power, before)
before.prefix = " "
n_op = Name("in", prefix=" ")
if negation:
n_not = Name("not", prefix=" ")
n_op = pytree.Node(syms.comp_op, (n_not, n_op))
new = pytree.Node(syms.comparison, (arg, n_op, before))
if after:
new = parenthesize(new)
new = pytree.Node(syms.power, (new,) + tuple(after))
if node.parent.type in (syms.comparison, syms.expr, syms.xor_expr,
syms.and_expr, syms.shift_expr,
syms.arith_expr, syms.term,
syms.factor, syms.power):
new = parenthesize(new)
new.prefix = prefix
return new
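# Illustrative before/after for FixHasKey (hypothetical input):
#
#     d.has_key(k)      ->  k in d
#     not d.has_key(k)  ->  k not in d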
"""Adjust some old Python 2 idioms to their modern counterparts.
* Change some type comparisons to isinstance() calls:
type(x) == T -> isinstance(x, T)
type(x) is T -> isinstance(x, T)
type(x) != T -> not isinstance(x, T)
type(x) is not T -> not isinstance(x, T)
* Change "while 1:" into "while True:".
* Change both
v = list(EXPR)
v.sort()
foo(v)
and the more general
v = EXPR
v.sort()
foo(v)
into
v = sorted(EXPR)
foo(v)
"""
# Author: Jacques Frechet, Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Call, Comma, Name, Node, BlankLine, syms
CMP = "(n='!=' | '==' | 'is' | n=comp_op< 'is' 'not' >)"
TYPE = "power< 'type' trailer< '(' x=any ')' > >"
class FixIdioms(fixer_base.BaseFix):
explicit = True # The user must ask for this fixer
PATTERN = r"""
isinstance=comparison< %s %s T=any >
|
isinstance=comparison< T=any %s %s >
|
while_stmt< 'while' while='1' ':' any+ >
|
sorted=any<
any*
simple_stmt<
expr_stmt< id1=any '='
power< list='list' trailer< '(' (not arglist<any+>) any ')' > >
>
'\n'
>
sort=
simple_stmt<
power< id2=any
trailer< '.' 'sort' > trailer< '(' ')' >
>
'\n'
>
next=any*
>
|
sorted=any<
any*
simple_stmt< expr_stmt< id1=any '=' expr=any > '\n' >
sort=
simple_stmt<
power< id2=any
trailer< '.' 'sort' > trailer< '(' ')' >
>
'\n'
>
next=any*
>
""" % (TYPE, CMP, CMP, TYPE)
def match(self, node):
r = super(FixIdioms, self).match(node)
# If we've matched one of the sort/sorted subpatterns above, we
# want to reject matches where the initial assignment and the
# subsequent .sort() call involve different identifiers.
if r and "sorted" in r:
if r["id1"] == r["id2"]:
return r
return None
return r
def transform(self, node, results):
if "isinstance" in results:
return self.transform_isinstance(node, results)
elif "while" in results:
return self.transform_while(node, results)
elif "sorted" in results:
return self.transform_sort(node, results)
else:
raise RuntimeError("Invalid match")
def transform_isinstance(self, node, results):
x = results["x"].clone() # The thing inside of type()
T = results["T"].clone() # The type being compared against
x.prefix = ""
T.prefix = " "
test = Call(Name("isinstance"), [x, Comma(), T])
if "n" in results:
test.prefix = " "
test = Node(syms.not_test, [Name("not"), test])
test.prefix = node.prefix
return test
def transform_while(self, node, results):
one = results["while"]
one.replace(Name("True", prefix=one.prefix))
def transform_sort(self, node, results):
sort_stmt = results["sort"]
next_stmt = results["next"]
list_call = results.get("list")
simple_expr = results.get("expr")
if list_call:
list_call.replace(Name("sorted", prefix=list_call.prefix))
elif simple_expr:
new = simple_expr.clone()
new.prefix = ""
simple_expr.replace(Call(Name("sorted"), [new],
prefix=simple_expr.prefix))
else:
raise RuntimeError("should not have reached here")
sort_stmt.remove()
btwn = sort_stmt.prefix
# Keep any prefix lines between the sort_stmt and the list_call and
# shove them right after the sorted() call.
if "\n" in btwn:
if next_stmt:
# The new prefix should be everything from the sort_stmt's
# prefix up to the last newline, then the old prefix after a new
# line.
prefix_lines = (btwn.rpartition("\n")[0], next_stmt[0].prefix)
next_stmt[0].prefix = "\n".join(prefix_lines)
else:
assert list_call.parent
assert list_call.next_sibling is None
# Put a blank line after list_call and set its prefix.
end_line = BlankLine()
list_call.parent.append_child(end_line)
assert list_call.next_sibling is end_line
# The new prefix should be everything up to the first new line
# of sort_stmt's prefix.
end_line.prefix = btwn.rpartition("\n")[0]
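# Illustrative before/after for FixIdioms (hypothetical input; note that this
# fixer is explicit and only runs when requested):
#
#     type(x) == SomeClass  ->  isinstance(x, SomeClass)
#     while 1:              ->  while True:
#     v = list(seq)
#     v.sort()              ->  v = sorted(seq)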
"""Fixer for import statements.
If spam is being imported from the local directory, this import:
from spam import eggs
Becomes:
from .spam import eggs
And this import:
import spam
Becomes:
from . import spam
"""
# Local imports
from .. import fixer_base
from os.path import dirname, join, exists, sep
from ..fixer_util import FromImport, syms, token
def traverse_imports(names):
"""
Walks over all the names imported in a dotted_as_names node.
"""
pending = [names]
while pending:
node = pending.pop()
if node.type == token.NAME:
yield node.value
elif node.type == syms.dotted_name:
yield "".join([ch.value for ch in node.children])
elif node.type == syms.dotted_as_name:
pending.append(node.children[0])
elif node.type == syms.dotted_as_names:
pending.extend(node.children[::-2])
else:
raise AssertionError("unknown node type")
class FixImport(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
import_from< 'from' imp=any 'import' ['('] any [')'] >
|
import_name< 'import' imp=any >
"""
def start_tree(self, tree, name):
super(FixImport, self).start_tree(tree, name)
self.skip = "absolute_import" in tree.future_features
def transform(self, node, results):
if self.skip:
return
imp = results['imp']
if node.type == syms.import_from:
            # Some imps are top-level (e.g. 'import ham')
            # some are first level (e.g. 'import ham.eggs')
            # some are third level (e.g. 'import ham.eggs as spam')
            # Hence, the loop
while not hasattr(imp, 'value'):
imp = imp.children[0]
if self.probably_a_local_import(imp.value):
imp.value = "." + imp.value
imp.changed()
else:
have_local = False
have_absolute = False
for mod_name in traverse_imports(imp):
if self.probably_a_local_import(mod_name):
have_local = True
else:
have_absolute = True
if have_absolute:
if have_local:
# We won't handle both sibling and absolute imports in the
# same statement at the moment.
self.warning(node, "absolute and local imports together")
return
new = FromImport(".", [imp])
new.prefix = node.prefix
return new
def probably_a_local_import(self, imp_name):
if imp_name.startswith("."):
# Relative imports are certainly not local imports.
return False
imp_name = imp_name.split(".", 1)[0]
base_path = dirname(self.filename)
base_path = join(base_path, imp_name)
        # If there is no __init__.py next to the file, it's not in a package,
        # so it can't be a relative import.
if not exists(join(dirname(base_path), "__init__.py")):
return False
for ext in [".py", sep, ".pyc", ".so", ".sl", ".pyd"]:
if exists(base_path + ext):
return True
return False
"""Fix incompatible imports and module references."""
# Authors: Collin Winter, Nick Edds
# Local imports
from .. import fixer_base
from ..fixer_util import Name, attr_chain
MAPPING = {'StringIO': 'io',
'cStringIO': 'io',
'cPickle': 'pickle',
'__builtin__' : 'builtins',
'copy_reg': 'copyreg',
'Queue': 'queue',
'SocketServer': 'socketserver',
'ConfigParser': 'configparser',
'repr': 'reprlib',
'FileDialog': 'tkinter.filedialog',
'tkFileDialog': 'tkinter.filedialog',
'SimpleDialog': 'tkinter.simpledialog',
'tkSimpleDialog': 'tkinter.simpledialog',
'tkColorChooser': 'tkinter.colorchooser',
'tkCommonDialog': 'tkinter.commondialog',
'Dialog': 'tkinter.dialog',
'Tkdnd': 'tkinter.dnd',
'tkFont': 'tkinter.font',
'tkMessageBox': 'tkinter.messagebox',
'ScrolledText': 'tkinter.scrolledtext',
'Tkconstants': 'tkinter.constants',
'Tix': 'tkinter.tix',
'ttk': 'tkinter.ttk',
'Tkinter': 'tkinter',
'markupbase': '_markupbase',
'_winreg': 'winreg',
'thread': '_thread',
'dummy_thread': '_dummy_thread',
# anydbm and whichdb are handled by fix_imports2
'dbhash': 'dbm.bsd',
'dumbdbm': 'dbm.dumb',
'dbm': 'dbm.ndbm',
'gdbm': 'dbm.gnu',
'xmlrpclib': 'xmlrpc.client',
'DocXMLRPCServer': 'xmlrpc.server',
'SimpleXMLRPCServer': 'xmlrpc.server',
'httplib': 'http.client',
'htmlentitydefs' : 'html.entities',
'HTMLParser' : 'html.parser',
'Cookie': 'http.cookies',
'cookielib': 'http.cookiejar',
'BaseHTTPServer': 'http.server',
'SimpleHTTPServer': 'http.server',
'CGIHTTPServer': 'http.server',
#'test.test_support': 'test.support',
'commands': 'subprocess',
'UserString' : 'collections',
'UserList' : 'collections',
'urlparse' : 'urllib.parse',
'robotparser' : 'urllib.robotparser',
}
def alternates(members):
return "(" + "|".join(map(repr, members)) + ")"
def build_pattern(mapping=MAPPING):
mod_list = ' | '.join(["module_name='%s'" % key for key in mapping])
bare_names = alternates(mapping.keys())
yield """name_import=import_name< 'import' ((%s) |
multiple_imports=dotted_as_names< any* (%s) any* >) >
""" % (mod_list, mod_list)
yield """import_from< 'from' (%s) 'import' ['(']
( any | import_as_name< any 'as' any > |
import_as_names< any* >) [')'] >
""" % mod_list
yield """import_name< 'import' (dotted_as_name< (%s) 'as' any > |
multiple_imports=dotted_as_names<
any* dotted_as_name< (%s) 'as' any > any* >) >
""" % (mod_list, mod_list)
# Find usages of module members in code e.g. thread.foo(bar)
yield "power< bare_with_attr=(%s) trailer<'.' any > any* >" % bare_names
class FixImports(fixer_base.BaseFix):
BM_compatible = True
keep_line_order = True
# This is overridden in fix_imports2.
mapping = MAPPING
# We want to run this fixer late, so fix_import doesn't try to make stdlib
# renames into relative imports.
run_order = 6
def build_pattern(self):
return "|".join(build_pattern(self.mapping))
def compile_pattern(self):
        # We override this, so MAPPING can be programmatically altered and the
        # changes will be reflected in PATTERN.
self.PATTERN = self.build_pattern()
super(FixImports, self).compile_pattern()
# Don't match the node if it's within another match.
def match(self, node):
match = super(FixImports, self).match
results = match(node)
if results:
# Module usage could be in the trailer of an attribute lookup, so we
# might have nested matches when "bare_with_attr" is present.
if "bare_with_attr" not in results and \
any(match(obj) for obj in attr_chain(node, "parent")):
return False
return results
return False
def start_tree(self, tree, filename):
super(FixImports, self).start_tree(tree, filename)
self.replace = {}
def transform(self, node, results):
import_mod = results.get("module_name")
if import_mod:
mod_name = import_mod.value
new_name = self.mapping[mod_name]
import_mod.replace(Name(new_name, prefix=import_mod.prefix))
if "name_import" in results:
# If it's not a "from x import x, y" or "import x as y" import,
                # mark its usage to be replaced.
self.replace[mod_name] = new_name
if "multiple_imports" in results:
# This is a nasty hack to fix multiple imports on a line (e.g.,
# "import StringIO, urlparse"). The problem is that I can't
# figure out an easy way to make a pattern recognize the keys of
# MAPPING randomly sprinkled in an import statement.
results = self.match(node)
if results:
self.transform(node, results)
else:
# Replace usage of the module.
bare_name = results["bare_with_attr"][0]
new_name = self.replace.get(bare_name.value)
if new_name:
bare_name.replace(Name(new_name, prefix=bare_name.prefix))
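# Illustrative before/after for FixImports (hypothetical input, following the
# MAPPING table above):
#
#     import cPickle           ->  import pickle
#     cPickle.dumps(obj)       ->  pickle.dumps(obj)
#     from Queue import Empty  ->  from queue import Empty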
"""Fix incompatible imports and module references that must be fixed after
fix_imports."""
from . import fix_imports
MAPPING = {
'whichdb': 'dbm',
'anydbm': 'dbm',
}
class FixImports2(fix_imports.FixImports):
run_order = 7
mapping = MAPPING
"""Fixer that changes input(...) into eval(input(...))."""
# Author: Andre Roberge
# Local imports
from .. import fixer_base
from ..fixer_util import Call, Name
from .. import patcomp
context = patcomp.compile_pattern("power< 'eval' trailer< '(' any ')' > >")
class FixInput(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< 'input' args=trailer< '(' [any] ')' > >
"""
def transform(self, node, results):
# If we're already wrapped in an eval() call, we're done.
if context.match(node.parent.parent):
return
new = node.clone()
new.prefix = ""
return Call(Name("eval"), [new], prefix=node.prefix)
# Copyright 2006 Georg Brandl.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for intern().
intern(s) -> sys.intern(s)"""
# Local imports
from .. import fixer_base
from ..fixer_util import ImportAndCall, touch_import
class FixIntern(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
PATTERN = """
power< 'intern'
trailer< lpar='('
( not(arglist | argument<any '=' any>) obj=any
| obj=arglist<(not argument<any '=' any>) any ','> )
rpar=')' >
after=any*
>
"""
def transform(self, node, results):
names = ('sys', 'intern')
new = ImportAndCall(node, results, names)
touch_import(None, 'sys', node)
return new
# Copyright 2008 Armin Ronacher.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that cleans up a tuple argument to isinstance after the tokens
in it were fixed. This is mainly used to remove double occurrences of
tokens as a leftover of the long -> int / unicode -> str conversion.
e.g. isinstance(x, (int, long)) -> isinstance(x, (int, int))
-> isinstance(x, int)
"""
from .. import fixer_base
from ..fixer_util import token
class FixIsinstance(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power<
'isinstance'
trailer< '(' arglist< any ',' atom< '('
args=testlist_gexp< any+ >
')' > > ')' >
>
"""
run_order = 6
def transform(self, node, results):
names_inserted = set()
testlist = results["args"]
args = testlist.children
new_args = []
iterator = enumerate(args)
for idx, arg in iterator:
if arg.type == token.NAME and arg.value in names_inserted:
if idx < len(args) - 1 and args[idx + 1].type == token.COMMA:
next(iterator)
continue
else:
new_args.append(arg)
if arg.type == token.NAME:
names_inserted.add(arg.value)
if new_args and new_args[-1].type == token.COMMA:
del new_args[-1]
if len(new_args) == 1:
atom = testlist.parent
new_args[0].prefix = atom.prefix
atom.replace(new_args[0])
else:
args[:] = new_args
node.changed()
""" Fixer for itertools.(imap|ifilter|izip) --> (map|filter|zip) and
itertools.ifilterfalse --> itertools.filterfalse (bugs 2360-2363)
imports from itertools are fixed in fix_itertools_import.py
If itertools is imported as something else (e.g. import itertools as it;
it.izip(spam, eggs)), method calls will not get fixed.
"""
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixItertools(fixer_base.BaseFix):
BM_compatible = True
it_funcs = "('imap'|'ifilter'|'izip'|'izip_longest'|'ifilterfalse')"
PATTERN = """
power< it='itertools'
trailer<
dot='.' func=%(it_funcs)s > trailer< '(' [any] ')' > >
|
power< func=%(it_funcs)s trailer< '(' [any] ')' > >
""" %(locals())
# Needs to be run after fix_(map|zip|filter)
run_order = 6
def transform(self, node, results):
prefix = None
func = results['func'][0]
if ('it' in results and
func.value not in ('ifilterfalse', 'izip_longest')):
dot, it = (results['dot'], results['it'])
# Remove the 'itertools'
prefix = it.prefix
it.remove()
# Replace the node which contains ('.', 'function') with the
# function (to be consistent with the second part of the pattern)
dot.remove()
func.parent.replace(func)
prefix = prefix or func.prefix
func.replace(Name(func.value[1:], prefix=prefix))
""" Fixer for imports of itertools.(imap|ifilter|izip|ifilterfalse) """
# Local imports
from lib2to3 import fixer_base
from lib2to3.fixer_util import BlankLine, syms, token
class FixItertoolsImports(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
import_from< 'from' 'itertools' 'import' imports=any >
""" %(locals())
def transform(self, node, results):
imports = results['imports']
if imports.type == syms.import_as_name or not imports.children:
children = [imports]
else:
children = imports.children
for child in children[::2]:
if child.type == token.NAME:
member = child.value
name_node = child
elif child.type == token.STAR:
# Just leave the import as is.
return
else:
assert child.type == syms.import_as_name
name_node = child.children[0]
member_name = name_node.value
if member_name in ('imap', 'izip', 'ifilter'):
child.value = None
child.remove()
elif member_name in ('ifilterfalse', 'izip_longest'):
node.changed()
name_node.value = ('filterfalse' if member_name[1] == 'f'
else 'zip_longest')
# Make sure the import statement is still sane
children = imports.children[:] or [imports]
remove_comma = True
for child in children:
if remove_comma and child.type == token.COMMA:
child.remove()
else:
remove_comma ^= True
while children and children[-1].type == token.COMMA:
children.pop().remove()
# If there are no imports left, just get rid of the entire statement
if (not (imports.children or getattr(imports, 'value', None)) or
imports.parent is None):
p = node.prefix
node = BlankLine()
node.prefix = p
return node
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that turns 'long' into 'int' everywhere.
"""
# Local imports
from lib2to3 import fixer_base
from lib2to3.fixer_util import is_probably_builtin
class FixLong(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "'long'"
def transform(self, node, results):
if is_probably_builtin(node):
node.value = "int"
node.changed()
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes map(F, ...) into list(map(F, ...)) unless there
exists a 'from future_builtins import map' statement in the top-level
namespace.
As a special case, map(None, X) is changed into list(X). (This is
necessary because the semantics are changed in this case -- the new
map(None, X) is equivalent to [(x,) for x in X].)
We avoid the transformation (except for the special case mentioned
above) if the map() call is directly contained in iter(<>), list(<>),
tuple(<>), sorted(<>), ...join(<>), or for V in <>:.
NOTE: This is still not correct if the original code was depending on
map(F, X, Y, ...) to go on until the longest argument is exhausted,
substituting None for missing values -- like zip(), it now stops as
soon as the shortest argument is exhausted.
"""
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, ListComp, in_special_context
from ..pygram import python_symbols as syms
class FixMap(fixer_base.ConditionalFix):
BM_compatible = True
PATTERN = """
map_none=power<
'map'
trailer< '(' arglist< 'None' ',' arg=any [','] > ')' >
>
|
map_lambda=power<
'map'
trailer<
'('
arglist<
lambdef< 'lambda'
(fp=NAME | vfpdef< '(' fp=NAME ')'> ) ':' xp=any
>
','
it=any
>
')'
>
>
|
power<
'map' trailer< '(' [arglist=any] ')' >
>
"""
skip_on = 'future_builtins.map'
def transform(self, node, results):
if self.should_skip(node):
return
if node.parent.type == syms.simple_stmt:
self.warning(node, "You should use a for loop here")
new = node.clone()
new.prefix = ""
new = Call(Name("list"), [new])
elif "map_lambda" in results:
new = ListComp(results["xp"].clone(),
results["fp"].clone(),
results["it"].clone())
else:
if "map_none" in results:
new = results["arg"].clone()
else:
if "arglist" in results:
args = results["arglist"]
if args.type == syms.arglist and \
args.children[0].type == token.NAME and \
args.children[0].value == "None":
self.warning(node, "cannot convert map(None, ...) "
"with multiple arguments because map() "
"now truncates to the shortest sequence")
return
if in_special_context(node):
return None
new = node.clone()
new.prefix = ""
new = Call(Name("list"), [new])
new.prefix = node.prefix
return new
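# Illustrative before/after for FixMap (hypothetical input):
#
#     map(None, seq)             ->  list(seq)
#     map(lambda x: x + 1, seq)  ->  [x + 1 for x in seq]
#     map(f, seq)                ->  list(map(f, seq))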
"""Fix bound method attributes (method.im_? -> method.__?__).
"""
# Author: Christian Heimes
# Local imports
from .. import fixer_base
from ..fixer_util import Name
MAP = {
"im_func" : "__func__",
"im_self" : "__self__",
"im_class" : "__self__.__class__"
}
class FixMethodattrs(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< any+ trailer< '.' attr=('im_func' | 'im_self' | 'im_class') > any* >
"""
def transform(self, node, results):
attr = results["attr"][0]
new = MAP[attr.value]
attr.replace(Name(new, prefix=attr.prefix))
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that turns <> into !=."""
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
class FixNe(fixer_base.BaseFix):
# This is so simple that we don't need the pattern compiler.
_accept_type = token.NOTEQUAL
def match(self, node):
# Override
return node.value == "<>"
def transform(self, node, results):
new = pytree.Leaf(token.NOTEQUAL, "!=", prefix=node.prefix)
return new
"""Fixer for __nonzero__ -> __bool__ methods."""
# Author: Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Name, syms
class FixNonzero(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
classdef< 'class' any+ ':'
suite< any*
funcdef< 'def' name='__nonzero__'
parameters< '(' NAME ')' > any+ >
any* > >
"""
def transform(self, node, results):
name = results["name"]
new = Name("__bool__", prefix=name.prefix)
name.replace(new)
"""Fixer that turns 1L into 1, 0755 into 0o755.
"""
# Copyright 2007 Georg Brandl.
# Licensed to PSF under a Contributor Agreement.
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Number
class FixNumliterals(fixer_base.BaseFix):
# This is so simple that we don't need the pattern compiler.
_accept_type = token.NUMBER
def match(self, node):
# Override
return (node.value.startswith("0") or node.value[-1] in "Ll")
def transform(self, node, results):
val = node.value
if val[-1] in 'Ll':
val = val[:-1]
elif val.startswith('0') and val.isdigit() and len(set(val)) > 1:
val = "0o" + val[1:]
return Number(val, prefix=node.prefix)
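# Illustrative before/after for FixNumliterals (hypothetical input):
#
#     1L    ->  1
#     0755  ->  0o755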
"""Fixer for operator functions.
operator.isCallable(obj) -> hasattr(obj, '__call__')
operator.sequenceIncludes(obj) -> operator.contains(obj)
operator.isSequenceType(obj) -> isinstance(obj, collections.Sequence)
operator.isMappingType(obj) -> isinstance(obj, collections.Mapping)
operator.isNumberType(obj) -> isinstance(obj, numbers.Number)
operator.repeat(obj, n) -> operator.mul(obj, n)
operator.irepeat(obj, n) -> operator.imul(obj, n)
"""
import collections
# Local imports
from lib2to3 import fixer_base
from lib2to3.fixer_util import Call, Name, String, touch_import
def invocation(s):
def dec(f):
f.invocation = s
return f
return dec
class FixOperator(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
methods = """
method=('isCallable'|'sequenceIncludes'
|'isSequenceType'|'isMappingType'|'isNumberType'
|'repeat'|'irepeat')
"""
obj = "'(' obj=any ')'"
PATTERN = """
power< module='operator'
trailer< '.' %(methods)s > trailer< %(obj)s > >
|
power< %(methods)s trailer< %(obj)s > >
""" % dict(methods=methods, obj=obj)
def transform(self, node, results):
method = self._check_method(node, results)
if method is not None:
return method(node, results)
@invocation("operator.contains(%s)")
def _sequenceIncludes(self, node, results):
return self._handle_rename(node, results, "contains")
@invocation("hasattr(%s, '__call__')")
def _isCallable(self, node, results):
obj = results["obj"]
args = [obj.clone(), String(", "), String("'__call__'")]
return Call(Name("hasattr"), args, prefix=node.prefix)
@invocation("operator.mul(%s)")
def _repeat(self, node, results):
return self._handle_rename(node, results, "mul")
@invocation("operator.imul(%s)")
def _irepeat(self, node, results):
return self._handle_rename(node, results, "imul")
@invocation("isinstance(%s, collections.Sequence)")
def _isSequenceType(self, node, results):
return self._handle_type2abc(node, results, "collections", "Sequence")
@invocation("isinstance(%s, collections.Mapping)")
def _isMappingType(self, node, results):
return self._handle_type2abc(node, results, "collections", "Mapping")
@invocation("isinstance(%s, numbers.Number)")
def _isNumberType(self, node, results):
return self._handle_type2abc(node, results, "numbers", "Number")
def _handle_rename(self, node, results, name):
method = results["method"][0]
method.value = name
method.changed()
def _handle_type2abc(self, node, results, module, abc):
touch_import(None, module, node)
obj = results["obj"]
args = [obj.clone(), String(", " + ".".join([module, abc]))]
return Call(Name("isinstance"), args, prefix=node.prefix)
def _check_method(self, node, results):
method = getattr(self, "_" + results["method"][0].value)
if isinstance(method, collections.Callable):
if "module" in results:
return method
else:
sub = (str(results["obj"]),)
invocation_str = method.invocation % sub
self.warning(node, "You should use '%s' here." % invocation_str)
return None
"""Fixer that addes parentheses where they are required
This converts ``[x for x in 1, 2]`` to ``[x for x in (1, 2)]``."""
# By Taek Joo Kim and Benjamin Peterson
# Local imports
from .. import fixer_base
from ..fixer_util import LParen, RParen
# XXX This doesn't support nested for loops like [x for x in 1, 2 for x in 1, 2]
class FixParen(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
atom< ('[' | '(')
(listmaker< any
comp_for<
'for' NAME 'in'
target=testlist_safe< any (',' any)+ [',']
>
[any]
>
>
|
testlist_gexp< any
comp_for<
'for' NAME 'in'
target=testlist_safe< any (',' any)+ [',']
>
[any]
>
>)
(']' | ')') >
"""
def transform(self, node, results):
target = results["target"]
lparen = LParen()
lparen.prefix = target.prefix
target.prefix = "" # Make it hug the parentheses
target.insert_child(0, lparen)
target.append_child(RParen())
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for print.
Change:
'print' into 'print()'
'print ...' into 'print(...)'
'print ... ,' into 'print(..., end=" ")'
'print >>x, ...' into 'print(..., file=x)'
No changes are applied if print_function is imported from __future__
"""
# Local imports
from .. import patcomp
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, Comma, String, is_tuple
parend_expr = patcomp.compile_pattern(
"""atom< '(' [atom|STRING|NAME] ')' >"""
)
class FixPrint(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
simple_stmt< any* bare='print' any* > | print_stmt
"""
def transform(self, node, results):
assert results
bare_print = results.get("bare")
if bare_print:
# Special-case print all by itself
bare_print.replace(Call(Name("print"), [],
prefix=bare_print.prefix))
return
assert node.children[0] == Name("print")
args = node.children[1:]
if len(args) == 1 and parend_expr.match(args[0]):
# We don't want to keep sticking parens around an
# already-parenthesised expression.
return
sep = end = file = None
if args and args[-1] == Comma():
args = args[:-1]
end = " "
if args and args[0] == pytree.Leaf(token.RIGHTSHIFT, ">>"):
assert len(args) >= 2
file = args[1].clone()
args = args[3:] # Strip a possible comma after the file expression
# Now synthesize a print(args, sep=..., end=..., file=...) node.
l_args = [arg.clone() for arg in args]
if l_args:
l_args[0].prefix = ""
if sep is not None or end is not None or file is not None:
if sep is not None:
self.add_kwarg(l_args, "sep", String(repr(sep)))
if end is not None:
self.add_kwarg(l_args, "end", String(repr(end)))
if file is not None:
self.add_kwarg(l_args, "file", file)
n_stmt = Call(Name("print"), l_args)
n_stmt.prefix = node.prefix
return n_stmt
def add_kwarg(self, l_nodes, s_kwd, n_expr):
# XXX All this prefix-setting may lose comments (though rarely)
n_expr.prefix = ""
n_argument = pytree.Node(self.syms.argument,
(Name(s_kwd),
pytree.Leaf(token.EQUAL, "="),
n_expr))
if l_nodes:
l_nodes.append(Comma())
n_argument.prefix = " "
l_nodes.append(n_argument)
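# Illustrative before/after for FixPrint (hypothetical input):
#
#     print                    ->  print()
#     print x,                 ->  print(x, end=" ")
#     print >>sys.stderr, msg  ->  print(msg, file=sys.stderr)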
"""Fixer for 'raise E, V, T'
raise -> raise
raise E -> raise E
raise E, V -> raise E(V)
raise E, V, T -> raise E(V).with_traceback(T)
raise E, None, T -> raise E.with_traceback(T)
raise (((E, E'), E''), E'''), V -> raise E(V)
raise "foo", V, T -> warns about string exceptions
CAVEATS:
1) "raise E, V" will be incorrectly translated if V is an exception
instance. The correct Python 3 idiom is
raise E from V
but since we can't detect instance-hood by syntax alone and since
any client code would have to be changed as well, we don't automate
this.
"""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, Attr, ArgList, is_tuple
class FixRaise(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
raise_stmt< 'raise' exc=any [',' val=any [',' tb=any]] >
"""
def transform(self, node, results):
syms = self.syms
exc = results["exc"].clone()
if exc.type == token.STRING:
msg = "Python 3 does not support string exceptions"
self.cannot_convert(node, msg)
return
# Python 2 supports
# raise ((((E1, E2), E3), E4), E5), V
# as a synonym for
# raise E1, V
# Since Python 3 will not support this, we recurse down any tuple
# literals, always taking the first element.
if is_tuple(exc):
while is_tuple(exc):
# exc.children[1:-1] is the unparenthesized tuple
# exc.children[1].children[0] is the first element of the tuple
exc = exc.children[1].children[0].clone()
exc.prefix = " "
if "val" not in results:
# One-argument raise
new = pytree.Node(syms.raise_stmt, [Name("raise"), exc])
new.prefix = node.prefix
return new
val = results["val"].clone()
if is_tuple(val):
args = [c.clone() for c in val.children[1:-1]]
else:
val.prefix = ""
args = [val]
if "tb" in results:
tb = results["tb"].clone()
tb.prefix = ""
e = exc
# If there's a traceback and None is passed as the value, then don't
# add a call, since the user probably just wants to add a
# traceback. See issue #9661.
if val.type != token.NAME or val.value != "None":
e = Call(exc, args)
with_tb = Attr(e, Name('with_traceback')) + [ArgList([tb])]
new = pytree.Node(syms.simple_stmt, [Name("raise")] + with_tb)
new.prefix = node.prefix
return new
else:
return pytree.Node(syms.raise_stmt,
[Name("raise"), Call(exc, args)],
prefix=node.prefix)
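# Illustrative before/after for FixRaise (hypothetical input):
#
#     raise E, V        ->  raise E(V)
#     raise E, V, T     ->  raise E(V).with_traceback(T)
#     raise E, None, T  ->  raise E.with_traceback(T)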
"""Fixer that changes raw_input(...) into input(...)."""
# Author: Andre Roberge
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixRawInput(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< name='raw_input' trailer< '(' [any] ')' > any* >
"""
def transform(self, node, results):
name = results["name"]
name.replace(Name("input", prefix=name.prefix))
# Copyright 2008 Armin Ronacher.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for reduce().
Makes sure reduce() is imported from the functools module if reduce is
used in that module.
"""
from lib2to3 import fixer_base
from lib2to3.fixer_util import touch_import
class FixReduce(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
PATTERN = """
power< 'reduce'
trailer< '('
arglist< (
(not(argument<any '=' any>) any ','
not(argument<any '=' any>) any) |
(not(argument<any '=' any>) any ','
not(argument<any '=' any>) any ','
not(argument<any '=' any>) any)
) >
')' >
>
"""
def transform(self, node, results):
touch_import('functools', 'reduce', node)
"""Fixer for reload().
reload(s) -> imp.reload(s)"""
# Local imports
from .. import fixer_base
from ..fixer_util import ImportAndCall, touch_import
class FixReload(fixer_base.BaseFix):
BM_compatible = True
order = "pre"
PATTERN = """
power< 'reload'
trailer< lpar='('
( not(arglist | argument<any '=' any>) obj=any
| obj=arglist<(not argument<any '=' any>) any ','> )
rpar=')' >
after=any*
>
"""
def transform(self, node, results):
names = ('imp', 'reload')
new = ImportAndCall(node, results, names)
touch_import(None, 'imp', node)
return new
"""Fix incompatible renames
Fixes:
* sys.maxint -> sys.maxsize
"""
# Author: Christian Heimes
# based on Collin Winter's fix_import
# Local imports
from .. import fixer_base
from ..fixer_util import Name, attr_chain
MAPPING = {"sys": {"maxint" : "maxsize"},
}
LOOKUP = {}
def alternates(members):
return "(" + "|".join(map(repr, members)) + ")"
def build_pattern():
#bare = set()
for module, replace in list(MAPPING.items()):
for old_attr, new_attr in list(replace.items()):
LOOKUP[(module, old_attr)] = new_attr
#bare.add(module)
#bare.add(old_attr)
#yield """
# import_name< 'import' (module=%r
# | dotted_as_names< any* module=%r any* >) >
# """ % (module, module)
yield """
import_from< 'from' module_name=%r 'import'
( attr_name=%r | import_as_name< attr_name=%r 'as' any >) >
""" % (module, old_attr, old_attr)
yield """
power< module_name=%r trailer< '.' attr_name=%r > any* >
""" % (module, old_attr)
#yield """bare_name=%s""" % alternates(bare)
class FixRenames(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "|".join(build_pattern())
order = "pre" # Pre-order tree traversal
# Don't match the node if it's within another match
def match(self, node):
match = super(FixRenames, self).match
results = match(node)
if results:
if any(match(obj) for obj in attr_chain(node, "parent")):
return False
return results
return False
#def start_tree(self, tree, filename):
# super(FixRenames, self).start_tree(tree, filename)
# self.replace = {}
def transform(self, node, results):
mod_name = results.get("module_name")
attr_name = results.get("attr_name")
#bare_name = results.get("bare_name")
#import_mod = results.get("module")
if mod_name and attr_name:
new_attr = LOOKUP[(mod_name.value, attr_name.value)]
attr_name.replace(Name(new_attr, prefix=attr_name.prefix))
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that transforms `xyzzy` into repr(xyzzy)."""
# Local imports
from .. import fixer_base
from ..fixer_util import Call, Name, parenthesize
class FixRepr(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
atom < '`' expr=any '`' >
"""
def transform(self, node, results):
expr = results["expr"].clone()
if expr.type == self.syms.testlist1:
expr = parenthesize(expr)
return Call(Name("repr"), [expr], prefix=node.prefix)
"""
Optional fixer to transform set() calls to set literals.
"""
# Author: Benjamin Peterson
from lib2to3 import fixer_base, pytree
from lib2to3.fixer_util import token, syms
class FixSetLiteral(fixer_base.BaseFix):
BM_compatible = True
explicit = True
PATTERN = """power< 'set' trailer< '('
(atom=atom< '[' (items=listmaker< any ((',' any)* [',']) >
|
single=any) ']' >
|
atom< '(' items=testlist_gexp< any ((',' any)* [',']) > ')' >
)
')' > >
"""
def transform(self, node, results):
single = results.get("single")
if single:
# Make a fake listmaker
fake = pytree.Node(syms.listmaker, [single.clone()])
single.replace(fake)
items = fake
else:
items = results["items"]
# Build the contents of the literal
literal = [pytree.Leaf(token.LBRACE, "{")]
literal.extend(n.clone() for n in items.children)
literal.append(pytree.Leaf(token.RBRACE, "}"))
# Set the prefix of the right brace to that of the ')' or ']'
literal[-1].prefix = items.next_sibling.prefix
maker = pytree.Node(syms.dictsetmaker, literal)
maker.prefix = node.prefix
# If the original was a one tuple, we need to remove the extra comma.
if len(maker.children) == 4:
n = maker.children[2]
n.remove()
maker.children[-1].prefix = n.prefix
# Finally, replace the set call with our shiny new literal.
return maker
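# Because explicit = True, 2to3 runs this fixer only when it is requested by
# name.  A minimal sketch of driving it directly (run with
# "python -m lib2to3.fixes.fix_set_literal"):
if __name__ == "__main__":
    from lib2to3.refactor import RefactoringTool

    fixer = "lib2to3.fixes.fix_set_literal"
    rt = RefactoringTool([fixer], explicit=[fixer])
    print(str(rt.refactor_string("s = set([1, 2, 3])\n", "<demo>")), end="")
    # -> s = {1, 2, 3}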
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for StandardError -> Exception."""
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixStandarderror(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
'StandardError'
"""
def transform(self, node, results):
return Name("Exception", prefix=node.prefix)
"""Fixer for sys.exc_{type, value, traceback}
sys.exc_type -> sys.exc_info()[0]
sys.exc_value -> sys.exc_info()[1]
sys.exc_traceback -> sys.exc_info()[2]
"""
# By Jeff Balogh and Benjamin Peterson
# Local imports
from .. import fixer_base
from ..fixer_util import Attr, Call, Name, Number, Subscript, Node, syms
class FixSysExc(fixer_base.BaseFix):
# This order matches the ordering of sys.exc_info().
exc_info = ["exc_type", "exc_value", "exc_traceback"]
BM_compatible = True
PATTERN = """
power< 'sys' trailer< dot='.' attribute=(%s) > >
""" % '|'.join("'%s'" % e for e in exc_info)
def transform(self, node, results):
sys_attr = results["attribute"][0]
index = Number(self.exc_info.index(sys_attr.value))
call = Call(Name("exc_info"), prefix=sys_attr.prefix)
attr = Attr(Name("sys"), call)
attr[1].children[0].prefix = results["dot"].prefix
attr.append(Subscript(index))
return Node(syms.power, attr, prefix=node.prefix)
"""Fixer for generator.throw(E, V, T).
g.throw(E) -> g.throw(E)
g.throw(E, V) -> g.throw(E(V))
g.throw(E, V, T) -> g.throw(E(V).with_traceback(T))
g.throw("foo"[, V[, T]]) will warn about string exceptions."""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name, Call, ArgList, Attr, is_tuple
class FixThrow(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< any trailer< '.' 'throw' >
trailer< '(' args=arglist< exc=any ',' val=any [',' tb=any] > ')' >
>
|
power< any trailer< '.' 'throw' > trailer< '(' exc=any ')' > >
"""
def transform(self, node, results):
syms = self.syms
exc = results["exc"].clone()
if exc.type is token.STRING:
self.cannot_convert(node, "Python 3 does not support string exceptions")
return
# Leave "g.throw(E)" alone
val = results.get("val")
if val is None:
return
val = val.clone()
if is_tuple(val):
args = [c.clone() for c in val.children[1:-1]]
else:
val.prefix = ""
args = [val]
throw_args = results["args"]
if "tb" in results:
tb = results["tb"].clone()
tb.prefix = ""
e = Call(exc, args)
with_tb = Attr(e, Name('with_traceback')) + [ArgList([tb])]
throw_args.replace(pytree.Node(syms.power, with_tb))
else:
throw_args.replace(Call(exc, args))
"""Fixer for function definitions with tuple parameters.
def func(((a, b), c), d):
...
->
def func(x, d):
((a, b), c) = x
...
It will also support lambdas:
lambda (x, y): x + y -> lambda t: t[0] + t[1]
# The parens are a syntax error in Python 3
lambda (x): x + y -> lambda x: x + y
"""
# Author: Collin Winter
# Local imports
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Assign, Name, Newline, Number, Subscript, syms
def is_docstring(stmt):
return isinstance(stmt, pytree.Node) and \
stmt.children[0].type == token.STRING
class FixTupleParams(fixer_base.BaseFix):
run_order = 4 #use a lower order since lambda is part of other
#patterns
BM_compatible = True
PATTERN = """
funcdef< 'def' any parameters< '(' args=any ')' >
['->' any] ':' suite=any+ >
|
lambda=
lambdef< 'lambda' args=vfpdef< '(' inner=any ')' >
':' body=any
>
"""
def transform(self, node, results):
if "lambda" in results:
return self.transform_lambda(node, results)
new_lines = []
suite = results["suite"]
args = results["args"]
# This crap is so "def foo(...): x = 5; y = 7" is handled correctly.
# TODO(cwinter): suite-cleanup
if suite[0].children[1].type == token.INDENT:
start = 2
indent = suite[0].children[1].value
end = Newline()
else:
start = 0
indent = "; "
end = pytree.Leaf(token.INDENT, "")
# We need access to self for new_name(), and making this a method
# doesn't feel right. Closing over self and new_lines makes the
# code below cleaner.
def handle_tuple(tuple_arg, add_prefix=False):
n = Name(self.new_name())
arg = tuple_arg.clone()
arg.prefix = ""
stmt = Assign(arg, n.clone())
if add_prefix:
n.prefix = " "
tuple_arg.replace(n)
new_lines.append(pytree.Node(syms.simple_stmt,
[stmt, end.clone()]))
if args.type == syms.tfpdef:
handle_tuple(args)
elif args.type == syms.typedargslist:
for i, arg in enumerate(args.children):
if arg.type == syms.tfpdef:
# Without add_prefix, the emitted code is correct,
# just ugly.
handle_tuple(arg, add_prefix=(i > 0))
if not new_lines:
return
# This isn't strictly necessary, but it plays nicely with other fixers.
# TODO(cwinter) get rid of this when children becomes a smart list
for line in new_lines:
line.parent = suite[0]
# TODO(cwinter) suite-cleanup
after = start
if start == 0:
new_lines[0].prefix = " "
elif is_docstring(suite[0].children[start]):
new_lines[0].prefix = indent
after = start + 1
for line in new_lines:
line.parent = suite[0]
suite[0].children[after:after] = new_lines
for i in range(after+1, after+len(new_lines)+1):
suite[0].children[i].prefix = indent
suite[0].changed()
def transform_lambda(self, node, results):
args = results["args"]
body = results["body"]
inner = simplify_args(results["inner"])
# Replace lambda ((((x)))): x with lambda x: x
if inner.type == token.NAME:
inner = inner.clone()
inner.prefix = " "
args.replace(inner)
return
params = find_params(args)
to_index = map_to_index(params)
tup_name = self.new_name(tuple_name(params))
new_param = Name(tup_name, prefix=" ")
args.replace(new_param.clone())
for n in body.post_order():
if n.type == token.NAME and n.value in to_index:
subscripts = [c.clone() for c in to_index[n.value]]
new = pytree.Node(syms.power,
[new_param.clone()] + subscripts)
new.prefix = n.prefix
n.replace(new)
### Helper functions for transform_lambda()
def simplify_args(node):
if node.type in (syms.vfplist, token.NAME):
return node
elif node.type == syms.vfpdef:
# These look like vfpdef< '(' x ')' > where x is NAME
# or another vfpdef instance (leading to recursion).
while node.type == syms.vfpdef:
node = node.children[1]
return node
raise RuntimeError("Received unexpected node %s" % node)
def find_params(node):
if node.type == syms.vfpdef:
return find_params(node.children[1])
elif node.type == token.NAME:
return node.value
return [find_params(c) for c in node.children if c.type != token.COMMA]
def map_to_index(param_list, prefix=[], d=None):
if d is None:
d = {}
for i, obj in enumerate(param_list):
trailer = [Subscript(Number(str(i)))]
if isinstance(obj, list):
map_to_index(obj, trailer, d=d)
else:
d[obj] = prefix + trailer
return d
def tuple_name(param_list):
l = []
for obj in param_list:
if isinstance(obj, list):
l.append(tuple_name(obj))
else:
l.append(obj)
return "_".join(l)
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer for removing uses of the types module.
These work only for the known names in the types module. The deprecated
forms may be written with or without the "types." prefix; i.e., it is
assumed the module is imported either as:
import types
from types import ... # either * or specific types
The import statements are not modified.
There should be another fixer that handles at least the following constants:
type([]) -> list
type(()) -> tuple
type('') -> str
"""
# Local imports
from ..pgen2 import token
from .. import fixer_base
from ..fixer_util import Name
_TYPE_MAPPING = {
'BooleanType' : 'bool',
'BufferType' : 'memoryview',
'ClassType' : 'type',
'ComplexType' : 'complex',
'DictType': 'dict',
'DictionaryType' : 'dict',
'EllipsisType' : 'type(Ellipsis)',
#'FileType' : 'io.IOBase',
'FloatType': 'float',
'IntType': 'int',
'ListType': 'list',
'LongType': 'int',
'ObjectType' : 'object',
'NoneType': 'type(None)',
'NotImplementedType' : 'type(NotImplemented)',
'SliceType' : 'slice',
'StringType': 'bytes', # XXX ?
'StringTypes' : 'str', # XXX ?
'TupleType': 'tuple',
'TypeType' : 'type',
'UnicodeType': 'str',
'XRangeType' : 'range',
}
_pats = ["power< 'types' trailer< '.' name='%s' > >" % t for t in _TYPE_MAPPING]
class FixTypes(fixer_base.BaseFix):
BM_compatible = True
PATTERN = '|'.join(_pats)
def transform(self, node, results):
new_value = _TYPE_MAPPING.get(results["name"].value)
if new_value:
return Name(new_value, prefix=node.prefix)
return None
r"""Fixer for unicode.
* Changes unicode to str and unichr to chr.
* If "...\u..." is not unicode literal change it into "...\\u...".
* Change u"..." into "...".
"""
from ..pgen2 import token
from .. import fixer_base
_mapping = {"unichr" : "chr", "unicode" : "str"}
class FixUnicode(fixer_base.BaseFix):
BM_compatible = True
PATTERN = "STRING | 'unicode' | 'unichr'"
def start_tree(self, tree, filename):
super(FixUnicode, self).start_tree(tree, filename)
self.unicode_literals = 'unicode_literals' in tree.future_features
def transform(self, node, results):
if node.type == token.NAME:
new = node.clone()
new.value = _mapping[node.value]
return new
elif node.type == token.STRING:
val = node.value
if not self.unicode_literals and val[0] in '\'"' and '\\' in val:
val = r'\\'.join([
v.replace('\\u', r'\\u').replace('\\U', r'\\U')
for v in val.split(r'\\')
])
if val[0] in 'uU':
val = val[1:]
if val == node.value:
return node
new = node.clone()
new.value = val
return new
"""Fix changes imports of urllib which are now incompatible.
This is rather similar to fix_imports, but because of the more
complex nature of the fixing for urllib, it has its own fixer.
"""
# Author: Nick Edds
# Local imports
from lib2to3.fixes.fix_imports import alternates, FixImports
from lib2to3 import fixer_base
from lib2to3.fixer_util import (Name, Comma, FromImport, Newline,
find_indentation, Node, syms)
MAPPING = {"urllib": [
("urllib.request",
["URLopener", "FancyURLopener", "urlretrieve",
"_urlopener", "urlopen", "urlcleanup",
"pathname2url", "url2pathname"]),
("urllib.parse",
["quote", "quote_plus", "unquote", "unquote_plus",
"urlencode", "splitattr", "splithost", "splitnport",
"splitpasswd", "splitport", "splitquery", "splittag",
"splittype", "splituser", "splitvalue", ]),
("urllib.error",
["ContentTooShortError"])],
"urllib2" : [
("urllib.request",
["urlopen", "install_opener", "build_opener",
"Request", "OpenerDirector", "BaseHandler",
"HTTPDefaultErrorHandler", "HTTPRedirectHandler",
"HTTPCookieProcessor", "ProxyHandler",
"HTTPPasswordMgr",
"HTTPPasswordMgrWithDefaultRealm",
"AbstractBasicAuthHandler",
"HTTPBasicAuthHandler", "ProxyBasicAuthHandler",
"AbstractDigestAuthHandler",
"HTTPDigestAuthHandler", "ProxyDigestAuthHandler",
"HTTPHandler", "HTTPSHandler", "FileHandler",
"FTPHandler", "CacheFTPHandler",
"UnknownHandler"]),
("urllib.error",
["URLError", "HTTPError"]),
]
}
# Duplicate the url parsing functions for urllib2.
MAPPING["urllib2"].append(MAPPING["urllib"][1])
def build_pattern():
bare = set()
for old_module, changes in MAPPING.items():
for change in changes:
new_module, members = change
members = alternates(members)
yield """import_name< 'import' (module=%r
| dotted_as_names< any* module=%r any* >) >
""" % (old_module, old_module)
yield """import_from< 'from' mod_member=%r 'import'
( member=%s | import_as_name< member=%s 'as' any > |
import_as_names< members=any* >) >
""" % (old_module, members, members)
yield """import_from< 'from' module_star=%r 'import' star='*' >
""" % old_module
yield """import_name< 'import'
dotted_as_name< module_as=%r 'as' any > >
""" % old_module
# bare_with_attr has a special significance for FixImports.match().
yield """power< bare_with_attr=%r trailer< '.' member=%s > any* >
""" % (old_module, members)
class FixUrllib(FixImports):
def build_pattern(self):
return "|".join(build_pattern())
def transform_import(self, node, results):
"""Transform for the basic import case. Replaces the old
import name with a comma separated list of its
replacements.
"""
import_mod = results.get("module")
pref = import_mod.prefix
names = []
# create a Node list of the replacement modules
for name in MAPPING[import_mod.value][:-1]:
names.extend([Name(name[0], prefix=pref), Comma()])
names.append(Name(MAPPING[import_mod.value][-1][0], prefix=pref))
import_mod.replace(names)
def transform_member(self, node, results):
"""Transform for imports of specific module elements. Replaces
the module to be imported from with the appropriate new
module.
"""
mod_member = results.get("mod_member")
pref = mod_member.prefix
member = results.get("member")
# Simple case with only a single member being imported
if member:
# this may be a list of length one, or just a node
if isinstance(member, list):
member = member[0]
new_name = None
for change in MAPPING[mod_member.value]:
if member.value in change[1]:
new_name = change[0]
break
if new_name:
mod_member.replace(Name(new_name, prefix=pref))
else:
self.cannot_convert(node, "This is an invalid module element")
# Multiple members being imported
else:
# a dictionary for replacements, order matters
modules = []
mod_dict = {}
members = results["members"]
for member in members:
# we only care about the actual members
if member.type == syms.import_as_name:
as_name = member.children[2].value
member_name = member.children[0].value
else:
member_name = member.value
as_name = None
if member_name != ",":
for change in MAPPING[mod_member.value]:
if member_name in change[1]:
if change[0] not in mod_dict:
modules.append(change[0])
mod_dict.setdefault(change[0], []).append(member)
new_nodes = []
indentation = find_indentation(node)
first = True
def handle_name(name, prefix):
if name.type == syms.import_as_name:
kids = [Name(name.children[0].value, prefix=prefix),
name.children[1].clone(),
name.children[2].clone()]
return [Node(syms.import_as_name, kids)]
return [Name(name.value, prefix=prefix)]
for module in modules:
elts = mod_dict[module]
names = []
for elt in elts[:-1]:
names.extend(handle_name(elt, pref))
names.append(Comma())
names.extend(handle_name(elts[-1], pref))
new = FromImport(module, names)
if not first or node.parent.prefix.endswith(indentation):
new.prefix = indentation
new_nodes.append(new)
first = False
if new_nodes:
nodes = []
for new_node in new_nodes[:-1]:
nodes.extend([new_node, Newline()])
nodes.append(new_nodes[-1])
node.replace(nodes)
else:
self.cannot_convert(node, "All module elements are invalid")
def transform_dot(self, node, results):
"""Transform for calls to module members in code."""
module_dot = results.get("bare_with_attr")
member = results.get("member")
new_name = None
if isinstance(member, list):
member = member[0]
for change in MAPPING[module_dot.value]:
if member.value in change[1]:
new_name = change[0]
break
if new_name:
module_dot.replace(Name(new_name,
prefix=module_dot.prefix))
else:
self.cannot_convert(node, "This is an invalid module element")
def transform(self, node, results):
if results.get("module"):
self.transform_import(node, results)
elif results.get("mod_member"):
self.transform_member(node, results)
elif results.get("bare_with_attr"):
self.transform_dot(node, results)
# Renaming and star imports are not supported for these modules.
elif results.get("module_star"):
self.cannot_convert(node, "Cannot handle star imports.")
elif results.get("module_as"):
self.cannot_convert(node, "This module is now multiple modules")
"""Fixer that changes 'a ,b' into 'a, b'.
This also changes '{a :b}' into '{a: b}', but does not touch other
uses of colons. It does not touch other uses of whitespace.
"""
from .. import pytree
from ..pgen2 import token
from .. import fixer_base
class FixWsComma(fixer_base.BaseFix):
explicit = True # The user must ask for this fixer
PATTERN = """
any<(not(',') any)+ ',' ((not(',') any)+ ',')* [not(',') any]>
"""
COMMA = pytree.Leaf(token.COMMA, ",")
COLON = pytree.Leaf(token.COLON, ":")
SEPS = (COMMA, COLON)
def transform(self, node, results):
new = node.clone()
comma = False
for child in new.children:
if child in self.SEPS:
prefix = child.prefix
if prefix.isspace() and "\n" not in prefix:
child.prefix = ""
comma = True
else:
if comma:
prefix = child.prefix
if not prefix:
child.prefix = " "
comma = False
return new
# Copyright 2007 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Fixer that changes xrange(...) into range(...)."""
# Local imports
from .. import fixer_base
from ..fixer_util import Name, Call, consuming_calls
from .. import patcomp
class FixXrange(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power<
(name='range'|name='xrange') trailer< '(' args=any ')' >
rest=any* >
"""
def start_tree(self, tree, filename):
super(FixXrange, self).start_tree(tree, filename)
self.transformed_xranges = set()
def finish_tree(self, tree, filename):
self.transformed_xranges = None
def transform(self, node, results):
name = results["name"]
if name.value == "xrange":
return self.transform_xrange(node, results)
elif name.value == "range":
return self.transform_range(node, results)
else:
raise ValueError(repr(name))
def transform_xrange(self, node, results):
name = results["name"]
name.replace(Name("range", prefix=name.prefix))
# This prevents the new range call from being wrapped in a list later.
self.transformed_xranges.add(id(node))
def transform_range(self, node, results):
if (id(node) not in self.transformed_xranges and
not self.in_special_context(node)):
range_call = Call(Name("range"), [results["args"].clone()])
# Encase the range call in list().
list_call = Call(Name("list"), [range_call],
prefix=node.prefix)
# Put things that were after the range() call after the list call.
for n in results["rest"]:
list_call.append_child(n)
return list_call
P1 = "power< func=NAME trailer< '(' node=any ')' > any* >"
p1 = patcomp.compile_pattern(P1)
P2 = """for_stmt< 'for' any 'in' node=any ':' any* >
| comp_for< 'for' any 'in' node=any any* >
| comparison< any 'in' node=any any*>
"""
p2 = patcomp.compile_pattern(P2)
def in_special_context(self, node):
if node.parent is None:
return False
results = {}
if (node.parent.parent is not None and
self.p1.match(node.parent.parent, results) and
results["node"] is node):
# list(d.keys()) -> list(d.keys()), etc.
return results["func"].value in consuming_calls
# for ... in d.iterkeys() -> for ... in d.keys(), etc.
return self.p2.match(node.parent, results) and results["node"] is node
"""Fix "for x in f.xreadlines()" -> "for x in f".
This fixer will also convert g(f.xreadlines) into g(f.__iter__)."""
# Author: Collin Winter
# Local imports
from .. import fixer_base
from ..fixer_util import Name
class FixXreadlines(fixer_base.BaseFix):
BM_compatible = True
PATTERN = """
power< call=any+ trailer< '.' 'xreadlines' > trailer< '(' ')' > >
|
power< any+ trailer< '.' no_call='xreadlines' > >
"""
def transform(self, node, results):
no_call = results.get("no_call")
if no_call:
no_call.replace(Name("__iter__", prefix=no_call.prefix))
else:
node.replace([x.clone() for x in results["call"]])
"""
Fixer that changes zip(seq0, seq1, ...) into list(zip(seq0, seq1, ...))
unless there exists a 'from future_builtins import zip' statement in the
top-level namespace.
We avoid the transformation if the zip() call is directly contained in
iter(<>), list(<>), tuple(<>), sorted(<>), ...join(<>), or for V in <>:.
"""
# Local imports
from .. import fixer_base
from ..fixer_util import Name, Call, in_special_context
class FixZip(fixer_base.ConditionalFix):
BM_compatible = True
PATTERN = """
power< 'zip' args=trailer< '(' [any] ')' >
>
"""
skip_on = "future_builtins.zip"
def transform(self, node, results):
if self.should_skip(node):
return
if in_special_context(node):
return None
new = node.clone()
new.prefix = ""
new = Call(Name("list"), [new])
new.prefix = node.prefix
return new
# Dummy file to make this directory a package.
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Convert graminit.[ch] spit out by pgen to Python code.
Pgen is the Python parser generator. It is useful to quickly create a
parser from a grammar file in Python's grammar notation. But I don't
want my parsers to be written in C (yet), so I'm translating the
parsing tables to Python data structures and writing a Python parse
engine.
Note that the token numbers are constants determined by the standard
Python tokenizer. The standard token module defines these numbers and
their names (the names are not used much). The token numbers are
hardcoded into the Python tokenizer and into pgen. A Python
implementation of the Python tokenizer is also available, in the
standard tokenize module.
On the other hand, symbol numbers (representing the grammar's
non-terminals) are assigned by pgen based on the actual grammar
input.
Note: this module is pretty much obsolete; the pgen module generates
equivalent grammar tables directly from the Grammar.txt input file
without having to invoke the Python pgen C program.
"""
# Python imports
import re
# Local imports
from pgen2 import grammar, token
class Converter(grammar.Grammar):
"""Grammar subclass that reads classic pgen output files.
The run() method reads the tables as produced by the pgen parser
generator, typically contained in two C files, graminit.h and
graminit.c. The other methods are for internal use only.
See the base class for more documentation.
"""
def run(self, graminit_h, graminit_c):
"""Load the grammar tables from the text files written by pgen."""
self.parse_graminit_h(graminit_h)
self.parse_graminit_c(graminit_c)
self.finish_off()
def parse_graminit_h(self, filename):
"""Parse the .h file written by pgen. (Internal)
This file is a sequence of #define statements defining the
nonterminals of the grammar as numbers. We build two tables
mapping the numbers to names and back.
"""
try:
f = open(filename)
except OSError as err:
print("Can't open %s: %s" % (filename, err))
return False
self.symbol2number = {}
self.number2symbol = {}
lineno = 0
for line in f:
lineno += 1
mo = re.match(r"^#define\s+(\w+)\s+(\d+)$", line)
if not mo and line.strip():
print("%s(%s): can't parse %s" % (filename, lineno,
line.strip()))
else:
symbol, number = mo.groups()
number = int(number)
assert symbol not in self.symbol2number
assert number not in self.number2symbol
self.symbol2number[symbol] = number
self.number2symbol[number] = symbol
return True
def parse_graminit_c(self, filename):
"""Parse the .c file written by pgen. (Internal)
The file looks as follows. The first two lines are always this:
#include "pgenheaders.h"
#include "grammar.h"
After that come four blocks:
1) one or more state definitions
2) a table defining dfas
3) a table defining labels
4) a struct defining the grammar
A state definition has the following form:
- one or more arc arrays, each of the form:
static arc arcs_<n>_<m>[<k>] = {
{<i>, <j>},
...
};
- followed by a state array, of the form:
static state states_<s>[<t>] = {
{<k>, arcs_<n>_<m>},
...
};
"""
try:
f = open(filename)
except OSError as err:
print("Can't open %s: %s" % (filename, err))
return False
# The code below essentially uses f's iterator-ness!
lineno = 0
# Expect the two #include lines
lineno, line = lineno+1, next(f)
assert line == '#include "pgenheaders.h"\n', (lineno, line)
lineno, line = lineno+1, next(f)
assert line == '#include "grammar.h"\n', (lineno, line)
# Parse the state definitions
lineno, line = lineno+1, next(f)
allarcs = {}
states = []
while line.startswith("static arc "):
while line.startswith("static arc "):
mo = re.match(r"static arc arcs_(\d+)_(\d+)\[(\d+)\] = {$",
line)
assert mo, (lineno, line)
n, m, k = list(map(int, mo.groups()))
arcs = []
for _ in range(k):
lineno, line = lineno+1, next(f)
mo = re.match(r"\s+{(\d+), (\d+)},$", line)
assert mo, (lineno, line)
i, j = list(map(int, mo.groups()))
arcs.append((i, j))
lineno, line = lineno+1, next(f)
assert line == "};\n", (lineno, line)
allarcs[(n, m)] = arcs
lineno, line = lineno+1, next(f)
mo = re.match(r"static state states_(\d+)\[(\d+)\] = {$", line)
assert mo, (lineno, line)
s, t = list(map(int, mo.groups()))
assert s == len(states), (lineno, line)
state = []
for _ in range(t):
lineno, line = lineno+1, next(f)
mo = re.match(r"\s+{(\d+), arcs_(\d+)_(\d+)},$", line)
assert mo, (lineno, line)
k, n, m = list(map(int, mo.groups()))
arcs = allarcs[n, m]
assert k == len(arcs), (lineno, line)
state.append(arcs)
states.append(state)
lineno, line = lineno+1, next(f)
assert line == "};\n", (lineno, line)
lineno, line = lineno+1, next(f)
self.states = states
# Parse the dfas
dfas = {}
mo = re.match(r"static dfa dfas\[(\d+)\] = {$", line)
assert mo, (lineno, line)
ndfas = int(mo.group(1))
for i in range(ndfas):
lineno, line = lineno+1, next(f)
mo = re.match(r'\s+{(\d+), "(\w+)", (\d+), (\d+), states_(\d+),$',
line)
assert mo, (lineno, line)
symbol = mo.group(2)
number, x, y, z = list(map(int, mo.group(1, 3, 4, 5)))
assert self.symbol2number[symbol] == number, (lineno, line)
assert self.number2symbol[number] == symbol, (lineno, line)
assert x == 0, (lineno, line)
state = states[z]
assert y == len(state), (lineno, line)
lineno, line = lineno+1, next(f)
mo = re.match(r'\s+("(?:\\\d\d\d)*")},$', line)
assert mo, (lineno, line)
first = {}
rawbitset = eval(mo.group(1))
for i, c in enumerate(rawbitset):
byte = ord(c)
for j in range(8):
if byte & (1<<j):
first[i*8 + j] = 1
dfas[number] = (state, first)
lineno, line = lineno+1, next(f)
assert line == "};\n", (lineno, line)
self.dfas = dfas
# Parse the labels
labels = []
lineno, line = lineno+1, next(f)
mo = re.match(r"static label labels\[(\d+)\] = {$", line)
assert mo, (lineno, line)
nlabels = int(mo.group(1))
for i in range(nlabels):
lineno, line = lineno+1, next(f)
mo = re.match(r'\s+{(\d+), (0|"\w+")},$', line)
assert mo, (lineno, line)
x, y = mo.groups()
x = int(x)
if y == "0":
y = None
else:
y = eval(y)
labels.append((x, y))
lineno, line = lineno+1, next(f)
assert line == "};\n", (lineno, line)
self.labels = labels
# Parse the grammar struct
lineno, line = lineno+1, next(f)
assert line == "grammar _PyParser_Grammar = {\n", (lineno, line)
lineno, line = lineno+1, next(f)
mo = re.match(r"\s+(\d+),$", line)
assert mo, (lineno, line)
ndfas = int(mo.group(1))
assert ndfas == len(self.dfas)
lineno, line = lineno+1, next(f)
assert line == "\tdfas,\n", (lineno, line)
lineno, line = lineno+1, next(f)
mo = re.match(r"\s+{(\d+), labels},$", line)
assert mo, (lineno, line)
nlabels = int(mo.group(1))
assert nlabels == len(self.labels), (lineno, line)
lineno, line = lineno+1, next(f)
mo = re.match(r"\s+(\d+)$", line)
assert mo, (lineno, line)
start = int(mo.group(1))
assert start in self.number2symbol, (lineno, line)
self.start = start
lineno, line = lineno+1, next(f)
assert line == "};\n", (lineno, line)
try:
lineno, line = lineno+1, next(f)
except StopIteration:
pass
else:
assert 0, (lineno, line)
def finish_off(self):
"""Create additional useful structures. (Internal)."""
self.keywords = {} # map from keyword strings to arc labels
self.tokens = {} # map from numeric token values to arc labels
for ilabel, (type, value) in enumerate(self.labels):
if type == token.NAME and value is not None:
self.keywords[value] = ilabel
elif value is None:
self.tokens[type] = ilabel
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# Modifications:
# Copyright 2006 Google, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Parser driver.
This provides a high-level interface to parse a file into a syntax tree.
"""
__author__ = "Guido van Rossum <[email protected]>"
__all__ = ["Driver", "load_grammar"]
# Python imports
import codecs
import io
import os
import logging
import sys
# Pgen imports
from . import grammar, parse, token, tokenize, pgen
class Driver(object):
def __init__(self, grammar, convert=None, logger=None):
self.grammar = grammar
if logger is None:
logger = logging.getLogger()
self.logger = logger
self.convert = convert
def parse_tokens(self, tokens, debug=False):
"""Parse a series of tokens and return the syntax tree."""
# XXX Move the prefix computation into a wrapper around tokenize.
p = parse.Parser(self.grammar, self.convert)
p.setup()
lineno = 1
column = 0
type = value = start = end = line_text = None
prefix = ""
for quintuple in tokens:
type, value, start, end, line_text = quintuple
if start != (lineno, column):
assert (lineno, column) <= start, ((lineno, column), start)
s_lineno, s_column = start
if lineno < s_lineno:
prefix += "\n" * (s_lineno - lineno)
lineno = s_lineno
column = 0
if column < s_column:
prefix += line_text[column:s_column]
column = s_column
if type in (tokenize.COMMENT, tokenize.NL):
prefix += value
lineno, column = end
if value.endswith("\n"):
lineno += 1
column = 0
continue
if type == token.OP:
type = grammar.opmap[value]
if debug:
self.logger.debug("%s %r (prefix=%r)",
token.tok_name[type], value, prefix)
if p.addtoken(type, value, (prefix, start)):
if debug:
self.logger.debug("Stop.")
break
prefix = ""
lineno, column = end
if value.endswith("\n"):
lineno += 1
column = 0
else:
# We never broke out -- EOF is too soon (how can this happen???)
raise parse.ParseError("incomplete input",
type, value, (prefix, start))
return p.rootnode
def parse_stream_raw(self, stream, debug=False):
"""Parse a stream and return the syntax tree."""
tokens = tokenize.generate_tokens(stream.readline)
return self.parse_tokens(tokens, debug)
def parse_stream(self, stream, debug=False):
"""Parse a stream and return the syntax tree."""
return self.parse_stream_raw(stream, debug)
def parse_file(self, filename, encoding=None, debug=False):
"""Parse a file and return the syntax tree."""
stream = codecs.open(filename, "r", encoding)
try:
return self.parse_stream(stream, debug)
finally:
stream.close()
def parse_string(self, text, debug=False):
"""Parse a string and return the syntax tree."""
tokens = tokenize.generate_tokens(io.StringIO(text).readline)
return self.parse_tokens(tokens, debug)
def load_grammar(gt="Grammar.txt", gp=None,
save=True, force=False, logger=None):
"""Load the grammar (maybe from a pickle)."""
if logger is None:
logger = logging.getLogger()
if gp is None:
head, tail = os.path.splitext(gt)
if tail == ".txt":
tail = ""
gp = head + tail + ".".join(map(str, sys.version_info)) + ".pickle"
if force or not _newer(gp, gt):
logger.info("Generating grammar tables from %s", gt)
g = pgen.generate_grammar(gt)
if save:
logger.info("Writing grammar tables to %s", gp)
try:
g.dump(gp)
except OSError as e:
logger.info("Writing failed:"+str(e))
else:
g = grammar.Grammar()
g.load(gp)
return g
def _newer(a, b):
"""Inquire whether file a was written since file b."""
if not os.path.exists(a):
return False
if not os.path.exists(b):
return True
return os.path.getmtime(a) >= os.path.getmtime(b)
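# For parsing (as opposed to producing pickles) a Driver is normally built
# around the grammar bundled with lib2to3.  The helper below is an
# illustrative sketch, not part of the public API; it assumes the enclosing
# lib2to3 package is importable.
def _demo():
    from lib2to3 import pygram, pytree
    d = Driver(pygram.python_grammar, convert=pytree.convert)
    tree = d.parse_string("x = 1\n")
    print(repr(str(tree)))  # the tree round-trips to the original source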
def main(*args):
"""Main program, when run as a script: produce grammar pickle files.
Calls load_grammar for each argument, a path to a grammar text file.
"""
if not args:
args = sys.argv[1:]
logging.basicConfig(level=logging.INFO, stream=sys.stdout,
format='%(message)s')
for gt in args:
load_grammar(gt, save=True, force=True)
return True
if __name__ == "__main__":
sys.exit(int(not main()))
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""This module defines the data structures used to represent a grammar.
These are a bit arcane because they are derived from the data
structures used by Python's 'pgen' parser generator.
There's also a table here mapping operators to their names in the
token module; the Python tokenize module reports all operators as the
fallback token code OP, but the parser needs the actual token code.
"""
# Python imports
import pickle
# Local imports
from . import token, tokenize
class Grammar(object):
"""Pgen parsing tables conversion class.
Once initialized, this class supplies the grammar tables for the
parsing engine implemented by parse.py. The parsing engine
accesses the instance variables directly. The class here does not
provide initialization of the tables; several subclasses exist to
do this (see the conv and pgen modules).
The load() method reads the tables from a pickle file, which is
much faster than the other ways offered by subclasses. The pickle
file is written by calling dump() (after loading the grammar
tables using a subclass). The report() method prints a readable
representation of the tables to stdout, for debugging.
The instance variables are as follows:
symbol2number -- a dict mapping symbol names to numbers. Symbol
numbers are always 256 or higher, to distinguish
them from token numbers, which are between 0 and
255 (inclusive).
number2symbol -- a dict mapping numbers to symbol names;
these two are each other's inverse.
states -- a list of DFAs, where each DFA is a list of
states, each state is a list of arcs, and each
arc is a (i, j) pair where i is a label and j is
a state number. The DFA number is the index into
this list. (This name is slightly confusing.)
Final states are represented by a special arc of
the form (0, j) where j is its own state number.
dfas -- a dict mapping symbol numbers to (DFA, first)
pairs, where DFA is an item from the states list
above, and first is a set of tokens that can
begin this grammar rule (represented by a dict
whose values are always 1).
labels -- a list of (x, y) pairs where x is either a token
number or a symbol number, and y is either None
or a string; the strings are keywords. The label
number is the index in this list; label numbers
are used to mark state transitions (arcs) in the
DFAs.
start -- the number of the grammar's start symbol.
keywords -- a dict mapping keyword strings to arc labels.
tokens -- a dict mapping token numbers to arc labels.
"""
def __init__(self):
self.symbol2number = {}
self.number2symbol = {}
self.states = []
self.dfas = {}
self.labels = [(0, "EMPTY")]
self.keywords = {}
self.tokens = {}
self.symbol2label = {}
self.start = 256
def dump(self, filename):
"""Dump the grammar tables to a pickle file."""
with open(filename, "wb") as f:
pickle.dump(self.__dict__, f, 2)
def load(self, filename):
"""Load the grammar tables from a pickle file."""
with open(filename, "rb") as f:
d = pickle.load(f)
self.__dict__.update(d)
def copy(self):
"""
Copy the grammar.
"""
new = self.__class__()
for dict_attr in ("symbol2number", "number2symbol", "dfas", "keywords",
"tokens", "symbol2label"):
setattr(new, dict_attr, getattr(self, dict_attr).copy())
new.labels = self.labels[:]
new.states = self.states[:]
new.start = self.start
return new
def report(self):
"""Dump the grammar tables to standard output, for debugging."""
from pprint import pprint
print("s2n")
pprint(self.symbol2number)
print("n2s")
pprint(self.number2symbol)
print("states")
pprint(self.states)
print("dfas")
pprint(self.dfas)
print("labels")
pprint(self.labels)
print("start", self.start)
# Map from operator to number (since tokenize doesn't do this)
opmap_raw = """
( LPAR
) RPAR
[ LSQB
] RSQB
: COLON
, COMMA
; SEMI
+ PLUS
- MINUS
* STAR
/ SLASH
| VBAR
& AMPER
< LESS
> GREATER
= EQUAL
. DOT
% PERCENT
` BACKQUOTE
{ LBRACE
} RBRACE
@ AT
@= ATEQUAL
== EQEQUAL
!= NOTEQUAL
<> NOTEQUAL
<= LESSEQUAL
>= GREATEREQUAL
~ TILDE
^ CIRCUMFLEX
<< LEFTSHIFT
>> RIGHTSHIFT
** DOUBLESTAR
+= PLUSEQUAL
-= MINEQUAL
*= STAREQUAL
/= SLASHEQUAL
%= PERCENTEQUAL
&= AMPEREQUAL
|= VBAREQUAL
^= CIRCUMFLEXEQUAL
<<= LEFTSHIFTEQUAL
>>= RIGHTSHIFTEQUAL
**= DOUBLESTAREQUAL
// DOUBLESLASH
//= DOUBLESLASHEQUAL
-> RARROW
"""
opmap = {}
for line in opmap_raw.splitlines():
if line:
op, name = line.split()
opmap[op] = getattr(token, name)
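# A quick sanity check of the table built above; runs only when this module
# is executed directly (e.g. "python -m lib2to3.pgen2.grammar"):
if __name__ == "__main__":
    assert opmap["=="] == token.EQEQUAL
    assert opmap["->"] == token.RARROW
    print("%d operators mapped" % len(opmap))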
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Safely evaluate Python string literals without using eval()."""
import re
simple_escapes = {"a": "\a",
"b": "\b",
"f": "\f",
"n": "\n",
"r": "\r",
"t": "\t",
"v": "\v",
"'": "'",
'"': '"',
"\\": "\\"}
def escape(m):
all, tail = m.group(0, 1)
assert all.startswith("\\")
esc = simple_escapes.get(tail)
if esc is not None:
return esc
if tail.startswith("x"):
hexes = tail[1:]
if len(hexes) < 2:
raise ValueError("invalid hex string escape ('\\%s')" % tail)
try:
i = int(hexes, 16)
except ValueError:
raise ValueError("invalid hex string escape ('\\%s')" % tail)
else:
try:
i = int(tail, 8)
except ValueError:
raise ValueError("invalid octal string escape ('\\%s')" % tail)
return chr(i)
def evalString(s):
assert s.startswith("'") or s.startswith('"'), repr(s[:1])
q = s[0]
if s[:3] == q*3:
q = q*3
assert s.endswith(q), repr(s[-len(q):])
assert len(s) >= 2*len(q)
s = s[len(q):-len(q)]
return re.sub(r"\\(\'|\"|\\|[abfnrtv]|x.{0,2}|[0-7]{1,3})", escape, s)
def test():
for i in range(256):
c = chr(i)
s = repr(c)
e = evalString(s)
if e != c:
print(i, c, s, e)
if __name__ == "__main__":
test()
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""Parser engine for the grammar tables generated by pgen.
The grammar table must be loaded first.
See Parser/parser.c in the Python distribution for additional info on
how this parsing engine works.
"""
# Local imports
from . import token
class ParseError(Exception):
"""Exception to signal the parser is stuck."""
def __init__(self, msg, type, value, context):
Exception.__init__(self, "%s: type=%r, value=%r, context=%r" %
(msg, type, value, context))
self.msg = msg
self.type = type
self.value = value
self.context = context
class Parser(object):
"""Parser engine.
The proper usage sequence is:
p = Parser(grammar, [converter]) # create instance
p.setup([start]) # prepare for parsing
<for each input token>:
if p.addtoken(...): # parse a token; may raise ParseError
break
root = p.rootnode # root of abstract syntax tree
A Parser instance may be reused by calling setup() repeatedly.
A Parser instance contains state pertaining to the current token
sequence, and should not be used concurrently by different threads
to parse separate token sequences.
See driver.py for how to get input tokens by tokenizing a file or
string.
Parsing is complete when addtoken() returns True; the root of the
abstract syntax tree can then be retrieved from the rootnode
instance variable. When a syntax error occurs, addtoken() raises
the ParseError exception. There is no error recovery; the parser
cannot be used after a syntax error was reported (but it can be
reinitialized by calling setup()).
"""
def __init__(self, grammar, convert=None):
"""Constructor.
The grammar argument is a grammar.Grammar instance; see the
grammar module for more information.
The parser is not ready yet for parsing; you must call the
setup() method to get it started.
The optional convert argument is a function mapping concrete
syntax tree nodes to abstract syntax tree nodes. If not
given, no conversion is done and the syntax tree produced is
the concrete syntax tree. If given, it must be a function of
two arguments, the first being the grammar (a grammar.Grammar
instance), and the second being the concrete syntax tree node
to be converted. The syntax tree is converted from the bottom
up.
A concrete syntax tree node is a (type, value, context, nodes)
tuple, where type is the node type (a token or symbol number),
value is None for symbols and a string for tokens, context is
None or an opaque value used for error reporting (typically a
(lineno, offset) pair), and nodes is a list of children for
symbols, and None for tokens.
An abstract syntax tree node may be anything; this is entirely
up to the converter function.
"""
self.grammar = grammar
self.convert = convert or (lambda grammar, node: node)
def setup(self, start=None):
"""Prepare for parsing.
This *must* be called before starting to parse.
The optional argument is an alternative start symbol; it
defaults to the grammar's start symbol.
You can use a Parser instance to parse any number of programs;
each time you call setup() the parser is reset to an initial
state determined by the (implicit or explicit) start symbol.
"""
if start is None:
start = self.grammar.start
# Each stack entry is a tuple: (dfa, state, node).
# A node is a tuple: (type, value, context, children),
# where children is a list of nodes or None, and context may be None.
newnode = (start, None, None, [])
stackentry = (self.grammar.dfas[start], 0, newnode)
self.stack = [stackentry]
self.rootnode = None
self.used_names = set() # Aliased to self.rootnode.used_names in pop()
def addtoken(self, type, value, context):
"""Add a token; return True iff this is the end of the program."""
# Map from token to label
ilabel = self.classify(type, value, context)
# Loop until the token is shifted; may raise exceptions
while True:
dfa, state, node = self.stack[-1]
states, first = dfa
arcs = states[state]
# Look for a state with this label
for i, newstate in arcs:
t, v = self.grammar.labels[i]
if ilabel == i:
# Look it up in the list of labels
assert t < 256
# Shift a token; we're done with it
self.shift(type, value, newstate, context)
# Pop while we are in an accept-only state
state = newstate
while states[state] == [(0, state)]:
self.pop()
if not self.stack:
# Done parsing!
return True
dfa, state, node = self.stack[-1]
states, first = dfa
# Done with this token
return False
elif t >= 256:
# See if it's a symbol and if we're in its first set
itsdfa = self.grammar.dfas[t]
itsstates, itsfirst = itsdfa
if ilabel in itsfirst:
# Push a symbol
self.push(t, self.grammar.dfas[t], newstate, context)
break # To continue the outer while loop
else:
if (0, state) in arcs:
# An accepting state, pop it and try something else
self.pop()
if not self.stack:
# Done parsing, but another token is input
raise ParseError("too much input",
type, value, context)
else:
# No success finding a transition
raise ParseError("bad input", type, value, context)
def classify(self, type, value, context):
"""Turn a token into a label. (Internal)"""
if type == token.NAME:
# Keep a listing of all used names
self.used_names.add(value)
# Check for reserved words
ilabel = self.grammar.keywords.get(value)
if ilabel is not None:
return ilabel
ilabel = self.grammar.tokens.get(type)
if ilabel is None:
raise ParseError("bad token", type, value, context)
return ilabel
def shift(self, type, value, newstate, context):
"""Shift a token. (Internal)"""
dfa, state, node = self.stack[-1]
newnode = (type, value, context, None)
newnode = self.convert(self.grammar, newnode)
if newnode is not None:
node[-1].append(newnode)
self.stack[-1] = (dfa, newstate, node)
def push(self, type, newdfa, newstate, context):
"""Push a nonterminal. (Internal)"""
dfa, state, node = self.stack[-1]
newnode = (type, None, context, [])
self.stack[-1] = (dfa, newstate, node)
self.stack.append((newdfa, 0, newnode))
def pop(self):
"""Pop a nonterminal. (Internal)"""
popdfa, popstate, popnode = self.stack.pop()
newnode = self.convert(self.grammar, popnode)
if newnode is not None:
if self.stack:
dfa, state, node = self.stack[-1]
node[-1].append(newnode)
else:
self.rootnode = newnode
self.rootnode.used_names = self.used_names
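# The usage sequence from the class docstring, made concrete.  An
# illustrative sketch only; run with "python -m lib2to3.pgen2.parse" so the
# relative imports above resolve:
if __name__ == "__main__":
    import io
    from lib2to3 import pygram
    from lib2to3.pgen2 import tokenize as _tokenize, grammar as _grammar

    p = Parser(pygram.python_grammar)
    p.setup()
    for t, v, start, end, line in _tokenize.generate_tokens(
            io.StringIO("x = 1\n").readline):
        if t == token.OP:
            t = _grammar.opmap[v]  # the parser needs specific token codes
        if p.addtoken(t, v, ("", start)):
            break
    # With no converter, the root is a nested (type, value, context, nodes)
    # tuple -- the concrete syntax tree described above.
    print(p.rootnode)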
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
# Pgen imports
from . import grammar, token, tokenize
class PgenGrammar(grammar.Grammar):
pass
class ParserGenerator(object):
def __init__(self, filename, stream=None):
close_stream = None
if stream is None:
stream = open(filename)
close_stream = stream.close
self.filename = filename
self.stream = stream
self.generator = tokenize.generate_tokens(stream.readline)
self.gettoken() # Initialize lookahead
self.dfas, self.startsymbol = self.parse()
if close_stream is not None:
close_stream()
self.first = {} # map from symbol name to set of tokens
self.addfirstsets()
def make_grammar(self):
c = PgenGrammar()
names = list(self.dfas.keys())
names.sort()
names.remove(self.startsymbol)
names.insert(0, self.startsymbol)
for name in names:
i = 256 + len(c.symbol2number)
c.symbol2number[name] = i
c.number2symbol[i] = name
for name in names:
dfa = self.dfas[name]
states = []
for state in dfa:
arcs = []
for label, next in state.arcs.items():
arcs.append((self.make_label(c, label), dfa.index(next)))
if state.isfinal:
arcs.append((0, dfa.index(state)))
states.append(arcs)
c.states.append(states)
c.dfas[c.symbol2number[name]] = (states, self.make_first(c, name))
c.start = c.symbol2number[self.startsymbol]
return c
def make_first(self, c, name):
rawfirst = self.first[name]
first = {}
for label in rawfirst:
ilabel = self.make_label(c, label)
##assert ilabel not in first # XXX failed on <> ... !=
first[ilabel] = 1
return first
def make_label(self, c, label):
# XXX Maybe this should be a method on a subclass of converter?
ilabel = len(c.labels)
if label[0].isalpha():
# Either a symbol name or a named token
if label in c.symbol2number:
# A symbol name (a non-terminal)
if label in c.symbol2label:
return c.symbol2label[label]
else:
c.labels.append((c.symbol2number[label], None))
c.symbol2label[label] = ilabel
return ilabel
else:
# A named token (NAME, NUMBER, STRING)
itoken = getattr(token, label, None)
assert isinstance(itoken, int), label
assert itoken in token.tok_name, label
if itoken in c.tokens:
return c.tokens[itoken]
else:
c.labels.append((itoken, None))
c.tokens[itoken] = ilabel
return ilabel
else:
# Either a keyword or an operator
assert label[0] in ('"', "'"), label
value = eval(label)
if value[0].isalpha():
# A keyword
if value in c.keywords:
return c.keywords[value]
else:
c.labels.append((token.NAME, value))
c.keywords[value] = ilabel
return ilabel
else:
# An operator (any non-numeric token)
itoken = grammar.opmap[value] # Fails if unknown token
if itoken in c.tokens:
return c.tokens[itoken]
else:
c.labels.append((itoken, None))
c.tokens[itoken] = ilabel
return ilabel
def addfirstsets(self):
names = list(self.dfas.keys())
names.sort()
for name in names:
if name not in self.first:
self.calcfirst(name)
#print name, self.first[name].keys()
def calcfirst(self, name):
dfa = self.dfas[name]
self.first[name] = None # dummy to detect left recursion
state = dfa[0]
totalset = {}
overlapcheck = {}
for label, next in state.arcs.items():
if label in self.dfas:
if label in self.first:
fset = self.first[label]
if fset is None:
raise ValueError("recursion for rule %r" % name)
else:
self.calcfirst(label)
fset = self.first[label]
totalset.update(fset)
overlapcheck[label] = fset
else:
totalset[label] = 1
overlapcheck[label] = {label: 1}
inverse = {}
for label, itsfirst in overlapcheck.items():
for symbol in itsfirst:
if symbol in inverse:
raise ValueError("rule %s is ambiguous; %s is in the"
" first sets of %s as well as %s" %
(name, symbol, label, inverse[symbol]))
inverse[symbol] = label
self.first[name] = totalset
def parse(self):
dfas = {}
startsymbol = None
# MSTART: (NEWLINE | RULE)* ENDMARKER
while self.type != token.ENDMARKER:
while self.type == token.NEWLINE:
self.gettoken()
# RULE: NAME ':' RHS NEWLINE
name = self.expect(token.NAME)
self.expect(token.OP, ":")
a, z = self.parse_rhs()
self.expect(token.NEWLINE)
#self.dump_nfa(name, a, z)
dfa = self.make_dfa(a, z)
#self.dump_dfa(name, dfa)
oldlen = len(dfa)
self.simplify_dfa(dfa)
newlen = len(dfa)
dfas[name] = dfa
#print name, oldlen, newlen
if startsymbol is None:
startsymbol = name
return dfas, startsymbol
def make_dfa(self, start, finish):
# To turn an NFA into a DFA, we define the states of the DFA
# to correspond to *sets* of states of the NFA. Then do some
# state reduction. Let's represent sets as dicts with 1 for
# values.
assert isinstance(start, NFAState)
assert isinstance(finish, NFAState)
def closure(state):
base = {}
addclosure(state, base)
return base
def addclosure(state, base):
assert isinstance(state, NFAState)
if state in base:
return
base[state] = 1
for label, next in state.arcs:
if label is None:
addclosure(next, base)
states = [DFAState(closure(start), finish)]
for state in states: # NB states grows while we're iterating
arcs = {}
for nfastate in state.nfaset:
for label, next in nfastate.arcs:
if label is not None:
addclosure(next, arcs.setdefault(label, {}))
for label, nfaset in arcs.items():
for st in states:
if st.nfaset == nfaset:
break
else:
st = DFAState(nfaset, finish)
states.append(st)
state.addarc(st, label)
return states # List of DFAState instances; first one is start
def dump_nfa(self, name, start, finish):
print("Dump of NFA for", name)
todo = [start]
for i, state in enumerate(todo):
print(" State", i, state is finish and "(final)" or "")
for label, next in state.arcs:
if next in todo:
j = todo.index(next)
else:
j = len(todo)
todo.append(next)
if label is None:
print(" -> %d" % j)
else:
print(" %s -> %d" % (label, j))
def dump_dfa(self, name, dfa):
print("Dump of DFA for", name)
for i, state in enumerate(dfa):
print(" State", i, state.isfinal and "(final)" or "")
for label, next in state.arcs.items():
print(" %s -> %d" % (label, dfa.index(next)))
def simplify_dfa(self, dfa):
# This is not theoretically optimal, but works well enough.
# Algorithm: repeatedly look for two states that have the same
# set of arcs (same labels pointing to the same nodes) and
# unify them, until things stop changing.
# dfa is a list of DFAState instances
changes = True
while changes:
changes = False
for i, state_i in enumerate(dfa):
for j in range(i+1, len(dfa)):
state_j = dfa[j]
if state_i == state_j:
#print " unify", i, j
del dfa[j]
for state in dfa:
state.unifystate(state_j, state_i)
changes = True
break
def parse_rhs(self):
# RHS: ALT ('|' ALT)*
a, z = self.parse_alt()
if self.value != "|":
return a, z
else:
aa = NFAState()
zz = NFAState()
aa.addarc(a)
z.addarc(zz)
while self.value == "|":
self.gettoken()
a, z = self.parse_alt()
aa.addarc(a)
z.addarc(zz)
return aa, zz
def parse_alt(self):
# ALT: ITEM+
a, b = self.parse_item()
while (self.value in ("(", "[") or
self.type in (token.NAME, token.STRING)):
c, d = self.parse_item()
b.addarc(c)
b = d
return a, b
def parse_item(self):
# ITEM: '[' RHS ']' | ATOM ['+' | '*']
if self.value == "[":
self.gettoken()
a, z = self.parse_rhs()
self.expect(token.OP, "]")
a.addarc(z)
return a, z
else:
a, z = self.parse_atom()
value = self.value
if value not in ("+", "*"):
return a, z
self.gettoken()
z.addarc(a)
if value == "+":
return a, z
else:
return a, a
def parse_atom(self):
# ATOM: '(' RHS ')' | NAME | STRING
if self.value == "(":
self.gettoken()
a, z = self.parse_rhs()
self.expect(token.OP, ")")
return a, z
elif self.type in (token.NAME, token.STRING):
a = NFAState()
z = NFAState()
a.addarc(z, self.value)
self.gettoken()
return a, z
else:
self.raise_error("expected (...) or NAME or STRING, got %s/%s",
self.type, self.value)
def expect(self, type, value=None):
if self.type != type or (value is not None and self.value != value):
self.raise_error("expected %s/%s, got %s/%s",
type, value, self.type, self.value)
value = self.value
self.gettoken()
return value
def gettoken(self):
tup = next(self.generator)
while tup[0] in (tokenize.COMMENT, tokenize.NL):
tup = next(self.generator)
self.type, self.value, self.begin, self.end, self.line = tup
#print token.tok_name[self.type], repr(self.value)
def raise_error(self, msg, *args):
if args:
try:
msg = msg % args
except:
msg = " ".join([msg] + list(map(str, args)))
raise SyntaxError(msg, (self.filename, self.end[0],
self.end[1], self.line))
class NFAState(object):
def __init__(self):
self.arcs = [] # list of (label, NFAState) pairs
def addarc(self, next, label=None):
assert label is None or isinstance(label, str)
assert isinstance(next, NFAState)
self.arcs.append((label, next))
class DFAState(object):
def __init__(self, nfaset, final):
assert isinstance(nfaset, dict)
assert isinstance(next(iter(nfaset)), NFAState)
assert isinstance(final, NFAState)
self.nfaset = nfaset
self.isfinal = final in nfaset
self.arcs = {} # map from label to DFAState
def addarc(self, next, label):
assert isinstance(label, str)
assert label not in self.arcs
assert isinstance(next, DFAState)
self.arcs[label] = next
def unifystate(self, old, new):
for label, next in self.arcs.items():
if next is old:
self.arcs[label] = new
def __eq__(self, other):
# Equality test -- ignore the nfaset instance variable
assert isinstance(other, DFAState)
if self.isfinal != other.isfinal:
return False
# Can't just return self.arcs == other.arcs, because that
# would invoke this method recursively, with cycles...
if len(self.arcs) != len(other.arcs):
return False
for label, next in self.arcs.items():
if next is not other.arcs.get(label):
return False
return True
__hash__ = None # For Py3 compatibility.
def generate_grammar(filename="Grammar.txt"):
p = ParserGenerator(filename)
return p.make_grammar()
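# Example: regenerating the tables and summarizing them.  A sketch that
# assumes a Grammar.txt in the current directory (the default argument
# above); run with "python -m lib2to3.pgen2.pgen":
if __name__ == "__main__":
    g = generate_grammar()
    print("%d dfas, %d labels, start symbol %d"
          % (len(g.dfas), len(g.labels), g.start))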
#! /usr/bin/env python3
"""Token constants (from "token.h")."""
# Taken from Python (r53757) and modified to include some tokens
# originally monkeypatched in by pgen2.tokenize
#--start constants--
ENDMARKER = 0
NAME = 1
NUMBER = 2
STRING = 3
NEWLINE = 4
INDENT = 5
DEDENT = 6
LPAR = 7
RPAR = 8
LSQB = 9
RSQB = 10
COLON = 11
COMMA = 12
SEMI = 13
PLUS = 14
MINUS = 15
STAR = 16
SLASH = 17
VBAR = 18
AMPER = 19
LESS = 20
GREATER = 21
EQUAL = 22
DOT = 23
PERCENT = 24
BACKQUOTE = 25
LBRACE = 26
RBRACE = 27
EQEQUAL = 28
NOTEQUAL = 29
LESSEQUAL = 30
GREATEREQUAL = 31
TILDE = 32
CIRCUMFLEX = 33
LEFTSHIFT = 34
RIGHTSHIFT = 35
DOUBLESTAR = 36
PLUSEQUAL = 37
MINEQUAL = 38
STAREQUAL = 39
SLASHEQUAL = 40
PERCENTEQUAL = 41
AMPEREQUAL = 42
VBAREQUAL = 43
CIRCUMFLEXEQUAL = 44
LEFTSHIFTEQUAL = 45
RIGHTSHIFTEQUAL = 46
DOUBLESTAREQUAL = 47
DOUBLESLASH = 48
DOUBLESLASHEQUAL = 49
AT = 50
ATEQUAL = 51
OP = 52
COMMENT = 53
NL = 54
RARROW = 55
ERRORTOKEN = 56
N_TOKENS = 57
NT_OFFSET = 256
#--end constants--
tok_name = {}
for _name, _value in list(globals().items()):
if isinstance(_value, int):
tok_name[_value] = _name
def ISTERMINAL(x):
return x < NT_OFFSET
def ISNONTERMINAL(x):
return x >= NT_OFFSET
def ISEOF(x):
return x == ENDMARKER
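# Illustrative usage sketch (not part of the original module): mapping
# numeric token types back to names and classifying them with the helpers
# defined above.
#
#     >>> tok_name[NAME]
#     'NAME'
#     >>> ISTERMINAL(NAME), ISNONTERMINAL(NT_OFFSET + 1), ISEOF(ENDMARKER)
#     (True, True, True)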
# Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation.
# All rights reserved.
"""Tokenization help for Python programs.
generate_tokens(readline) is a generator that breaks a stream of
text into Python tokens. It accepts a readline-like method which is called
repeatedly to get the next line of input (or "" for EOF). It generates
5-tuples with these members:
the token type (see token.py)
the token (a string)
the starting (row, column) indices of the token (a 2-tuple of ints)
the ending (row, column) indices of the token (a 2-tuple of ints)
the original line (string)
It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators.
Older entry points
tokenize_loop(readline, tokeneater)
tokenize(readline, tokeneater=printtoken)
are the same, except instead of generating tokens, tokeneater is a callback
function to which the 5 fields described above are passed as 5 arguments,
each time a new token is found."""
__author__ = 'Ka-Ping Yee <[email protected]>'
__credits__ = \
'GvR, ESR, Tim Peters, Thomas Wouters, Fred Drake, Skip Montanaro'
import string, re
from codecs import BOM_UTF8, lookup
from lib2to3.pgen2.token import *
from . import token
__all__ = [x for x in dir(token) if x[0] != '_'] + ["tokenize",
"generate_tokens", "untokenize"]
del token
try:
bytes
except NameError:
# Support bytes type in Python <= 2.5, so 2to3 turns itself into
# valid Python 3 code.
bytes = str
def group(*choices): return '(' + '|'.join(choices) + ')'
def any(*choices): return group(*choices) + '*'
def maybe(*choices): return group(*choices) + '?'
Whitespace = r'[ \f\t]*'
Comment = r'#[^\r\n]*'
Ignore = Whitespace + any(r'\\\r?\n' + Whitespace) + maybe(Comment)
Name = r'[a-zA-Z_]\w*'
Binnumber = r'0[bB][01]*'
Hexnumber = r'0[xX][\da-fA-F]*[lL]?'
Octnumber = r'0[oO]?[0-7]*[lL]?'
Decnumber = r'[1-9]\d*[lL]?'
Intnumber = group(Binnumber, Hexnumber, Octnumber, Decnumber)
Exponent = r'[eE][-+]?\d+'
Pointfloat = group(r'\d+\.\d*', r'\.\d+') + maybe(Exponent)
Expfloat = r'\d+' + Exponent
Floatnumber = group(Pointfloat, Expfloat)
Imagnumber = group(r'\d+[jJ]', Floatnumber + r'[jJ]')
Number = group(Imagnumber, Floatnumber, Intnumber)
# Tail end of ' string.
Single = r"[^'\\]*(?:\\.[^'\\]*)*'"
# Tail end of " string.
Double = r'[^"\\]*(?:\\.[^"\\]*)*"'
# Tail end of ''' string.
Single3 = r"[^'\\]*(?:(?:\\.|'(?!''))[^'\\]*)*'''"
# Tail end of """ string.
Double3 = r'[^"\\]*(?:(?:\\.|"(?!""))[^"\\]*)*"""'
Triple = group("[ubUB]?[rR]?'''", '[ubUB]?[rR]?"""')
# Single-line ' or " string.
String = group(r"[uU]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*'",
r'[uU]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*"')
# Because of leftmost-then-longest match semantics, be sure to put the
# longest operators first (e.g., if = came before ==, == would get
# recognized as two instances of =).
Operator = group(r"\*\*=?", r">>=?", r"<<=?", r"<>", r"!=",
r"//=?", r"->",
r"[+\-*/%&@|^=<>]=?",
r"~")
Bracket = '[][(){}]'
Special = group(r'\r?\n', r'[:;.,`@]')
Funny = group(Operator, Bracket, Special)
PlainToken = group(Number, Funny, String, Name)
Token = Ignore + PlainToken
# First (or only) line of ' or " string.
ContStr = group(r"[uUbB]?[rR]?'[^\n'\\]*(?:\\.[^\n'\\]*)*" +
group("'", r'\\\r?\n'),
r'[uUbB]?[rR]?"[^\n"\\]*(?:\\.[^\n"\\]*)*' +
group('"', r'\\\r?\n'))
PseudoExtras = group(r'\\\r?\n', Comment, Triple)
PseudoToken = Whitespace + group(PseudoExtras, Number, Funny, ContStr, Name)
tokenprog, pseudoprog, single3prog, double3prog = list(map(
re.compile, (Token, PseudoToken, Single3, Double3)))
endprogs = {"'": re.compile(Single), '"': re.compile(Double),
"'''": single3prog, '"""': double3prog,
"r'''": single3prog, 'r"""': double3prog,
"u'''": single3prog, 'u"""': double3prog,
"b'''": single3prog, 'b"""': double3prog,
"ur'''": single3prog, 'ur"""': double3prog,
"br'''": single3prog, 'br"""': double3prog,
"R'''": single3prog, 'R"""': double3prog,
"U'''": single3prog, 'U"""': double3prog,
"B'''": single3prog, 'B"""': double3prog,
"uR'''": single3prog, 'uR"""': double3prog,
"Ur'''": single3prog, 'Ur"""': double3prog,
"UR'''": single3prog, 'UR"""': double3prog,
"bR'''": single3prog, 'bR"""': double3prog,
"Br'''": single3prog, 'Br"""': double3prog,
"BR'''": single3prog, 'BR"""': double3prog,
'r': None, 'R': None,
'u': None, 'U': None,
'b': None, 'B': None}
triple_quoted = {}
for t in ("'''", '"""',
"r'''", 'r"""', "R'''", 'R"""',
"u'''", 'u"""', "U'''", 'U"""',
"b'''", 'b"""', "B'''", 'B"""',
"ur'''", 'ur"""', "Ur'''", 'Ur"""',
"uR'''", 'uR"""', "UR'''", 'UR"""',
"br'''", 'br"""', "Br'''", 'Br"""',
"bR'''", 'bR"""', "BR'''", 'BR"""',):
triple_quoted[t] = t
single_quoted = {}
for t in ("'", '"',
"r'", 'r"', "R'", 'R"',
"u'", 'u"', "U'", 'U"',
"b'", 'b"', "B'", 'B"',
"ur'", 'ur"', "Ur'", 'Ur"',
"uR'", 'uR"', "UR'", 'UR"',
"br'", 'br"', "Br'", 'Br"',
"bR'", 'bR"', "BR'", 'BR"', ):
single_quoted[t] = t
tabsize = 8
class TokenError(Exception): pass
class StopTokenizing(Exception): pass
def printtoken(type, token, start, end, line): # for testing
(srow, scol) = start
(erow, ecol) = end
print("%d,%d-%d,%d:\t%s\t%s" %
(srow, scol, erow, ecol, tok_name[type], repr(token)))
def tokenize(readline, tokeneater=printtoken):
"""
The tokenize() function accepts two parameters: one representing the
input stream, and one providing an output mechanism for tokenize().
The first parameter, readline, must be a callable object which provides
the same interface as the readline() method of built-in file objects.
Each call to the function should return one line of input as a string.
The second parameter, tokeneater, must also be a callable object. It is
called once for each token, with five arguments, corresponding to the
tuples generated by generate_tokens().
"""
try:
tokenize_loop(readline, tokeneater)
except StopTokenizing:
pass
# backwards compatible interface
def tokenize_loop(readline, tokeneater):
for token_info in generate_tokens(readline):
tokeneater(*token_info)
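# Illustrative usage sketch (not part of the original module): tokenize()
# only needs a readline callable, so an in-memory string works:
#
#     import io
#     tokenize(io.StringIO("x = 1 + 2\n").readline)
#     # 1,0-1,1:  NAME    'x'
#     # 1,2-1,3:  OP      '='
#     # ... and so on, printed by the default printtoken callback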
class Untokenizer:
def __init__(self):
self.tokens = []
self.prev_row = 1
self.prev_col = 0
def add_whitespace(self, start):
row, col = start
assert row <= self.prev_row
col_offset = col - self.prev_col
if col_offset:
self.tokens.append(" " * col_offset)
def untokenize(self, iterable):
for t in iterable:
if len(t) == 2:
self.compat(t, iterable)
break
tok_type, token, start, end, line = t
self.add_whitespace(start)
self.tokens.append(token)
self.prev_row, self.prev_col = end
if tok_type in (NEWLINE, NL):
self.prev_row += 1
self.prev_col = 0
return "".join(self.tokens)
def compat(self, token, iterable):
startline = False
indents = []
toks_append = self.tokens.append
toknum, tokval = token
if toknum in (NAME, NUMBER):
tokval += ' '
if toknum in (NEWLINE, NL):
startline = True
for tok in iterable:
toknum, tokval = tok[:2]
if toknum in (NAME, NUMBER):
tokval += ' '
if toknum == INDENT:
indents.append(tokval)
continue
elif toknum == DEDENT:
indents.pop()
continue
elif toknum in (NEWLINE, NL):
startline = True
elif startline and indents:
toks_append(indents[-1])
startline = False
toks_append(tokval)
cookie_re = re.compile(r'^[ \t\f]*#.*coding[:=][ \t]*([-\w.]+)', re.ASCII)
blank_re = re.compile(br'^[ \t\f]*(?:[#\r\n]|$)', re.ASCII)
def _get_normal_name(orig_enc):
"""Imitates get_normal_name in tokenizer.c."""
# Only care about the first 12 characters.
enc = orig_enc[:12].lower().replace("_", "-")
if enc == "utf-8" or enc.startswith("utf-8-"):
return "utf-8"
if enc in ("latin-1", "iso-8859-1", "iso-latin-1") or \
enc.startswith(("latin-1-", "iso-8859-1-", "iso-latin-1-")):
return "iso-8859-1"
return orig_enc
def detect_encoding(readline):
"""
The detect_encoding() function is used to detect the encoding that should
be used to decode a Python source file. It requires one argument, readline,
in the same way as the tokenize() generator.
It will call readline a maximum of twice, and return the encoding used
(as a string) and a list of any lines (left as bytes) it has read
in.
It detects the encoding from the presence of a UTF-8 BOM or an encoding
cookie as specified in PEP 263. If both a BOM and a cookie are present but
disagree, a SyntaxError will be raised. If the encoding cookie is an invalid
charset, a SyntaxError is raised. Note that if a UTF-8 BOM is found,
'utf-8-sig' is returned.
If no encoding is specified, then the default of 'utf-8' will be returned.
"""
bom_found = False
encoding = None
default = 'utf-8'
def read_or_stop():
try:
return readline()
except StopIteration:
return bytes()
def find_cookie(line):
try:
line_string = line.decode('ascii')
except UnicodeDecodeError:
return None
match = cookie_re.match(line_string)
if not match:
return None
encoding = _get_normal_name(match.group(1))
try:
codec = lookup(encoding)
except LookupError:
# This behaviour mimics the Python interpreter
raise SyntaxError("unknown encoding: " + encoding)
if bom_found:
if codec.name != 'utf-8':
# This behaviour mimics the Python interpreter
raise SyntaxError('encoding problem: utf-8')
encoding += '-sig'
return encoding
first = read_or_stop()
if first.startswith(BOM_UTF8):
bom_found = True
first = first[3:]
default = 'utf-8-sig'
if not first:
return default, []
encoding = find_cookie(first)
if encoding:
return encoding, [first]
if not blank_re.match(first):
return default, [first]
second = read_or_stop()
if not second:
return default, [first]
encoding = find_cookie(second)
if encoding:
return encoding, [first, second]
return default, [first, second]
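# Illustrative usage sketch (not part of the original module):
# detect_encoding() on a byte stream carrying a PEP 263 coding cookie.
#
#     import io
#     buf = io.BytesIO(b"# -*- coding: latin-1 -*-\nx = 1\n")
#     detect_encoding(buf.readline)
#     # -> ('iso-8859-1', [b'# -*- coding: latin-1 -*-\n'])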
def untokenize(iterable):
"""Transform tokens back into Python source code.
Each element returned by the iterable must be a token sequence
with at least two elements, a token number and token value. If
only two-element tuples are passed, the resulting output is poor,
since whitespace can only be approximated.
Round-trip invariant for full input:
Untokenized source will match input source exactly
Round-trip invariant for limited input:
# Output text will tokenize back to the input
t1 = [tok[:2] for tok in generate_tokens(f.readline)]
newcode = untokenize(t1)
readline = iter(newcode.splitlines(1)).__next__
t2 = [tok[:2] for tok in generate_tokens(readline)]
assert t1 == t2
"""
ut = Untokenizer()
return ut.untokenize(iterable)
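# Illustrative round-trip sketch (not part of the original module),
# following the limited-input invariant from the docstring:
#
#     import io
#     src = "a = b + 1\n"
#     t1 = [tok[:2] for tok in generate_tokens(io.StringIO(src).readline)]
#     newcode = untokenize(t1)
#     t2 = [tok[:2] for tok in generate_tokens(io.StringIO(newcode).readline)]
#     assert t1 == t2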
def generate_tokens(readline):
"""
The generate_tokens() generator requires one argument, readline, which
must be a callable object which provides the same interface as the
readline() method of built-in file objects. Each call to the function
should return one line of input as a string. Alternatively, readline
can be a callable function terminating with StopIteration:
readline = open(myfile).__next__ # Example of alternate readline
The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow, scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow, ecol) of
ints specifying the row and column where the token ends in the source;
and the line on which the token was found. The line passed is the
logical line; continuation lines are included.
"""
lnum = parenlev = continued = 0
namechars, numchars = string.ascii_letters + '_', '0123456789'
contstr, needcont = '', 0
contline = None
indents = [0]
while 1: # loop over lines in stream
try:
line = readline()
except StopIteration:
line = ''
lnum = lnum + 1
pos, max = 0, len(line)
if contstr: # continued string
if not line:
raise TokenError("EOF in multi-line string", strstart)
endmatch = endprog.match(line)
if endmatch:
pos = end = endmatch.end(0)
yield (STRING, contstr + line[:end],
strstart, (lnum, end), contline + line)
contstr, needcont = '', 0
contline = None
elif needcont and line[-2:] != '\\\n' and line[-3:] != '\\\r\n':
yield (ERRORTOKEN, contstr + line,
strstart, (lnum, len(line)), contline)
contstr = ''
contline = None
continue
else:
contstr = contstr + line
contline = contline + line
continue
elif parenlev == 0 and not continued: # new statement
if not line: break
column = 0
while pos < max: # measure leading whitespace
if line[pos] == ' ': column = column + 1
elif line[pos] == '\t': column = (column//tabsize + 1)*tabsize
elif line[pos] == '\f': column = 0
else: break
pos = pos + 1
if pos == max: break
if line[pos] in '#\r\n': # skip comments or blank lines
if line[pos] == '#':
comment_token = line[pos:].rstrip('\r\n')
nl_pos = pos + len(comment_token)
yield (COMMENT, comment_token,
(lnum, pos), (lnum, pos + len(comment_token)), line)
yield (NL, line[nl_pos:],
(lnum, nl_pos), (lnum, len(line)), line)
else:
yield ((NL, COMMENT)[line[pos] == '#'], line[pos:],
(lnum, pos), (lnum, len(line)), line)
continue
if column > indents[-1]: # count indents or dedents
indents.append(column)
yield (INDENT, line[:pos], (lnum, 0), (lnum, pos), line)
while column < indents[-1]:
if column not in indents:
raise IndentationError(
"unindent does not match any outer indentation level",
("<tokenize>", lnum, pos, line))
indents = indents[:-1]
yield (DEDENT, '', (lnum, pos), (lnum, pos), line)
else: # continued statement
if not line:
raise TokenError("EOF in multi-line statement", (lnum, 0))
continued = 0
while pos < max:
pseudomatch = pseudoprog.match(line, pos)
if pseudomatch: # scan for tokens
start, end = pseudomatch.span(1)
spos, epos, pos = (lnum, start), (lnum, end), end
token, initial = line[start:end], line[start]
if initial in numchars or \
(initial == '.' and token != '.'): # ordinary number
yield (NUMBER, token, spos, epos, line)
elif initial in '\r\n':
newline = NEWLINE
if parenlev > 0:
newline = NL
yield (newline, token, spos, epos, line)
elif initial == '#':
assert not token.endswith("\n")
yield (COMMENT, token, spos, epos, line)
elif token in triple_quoted:
endprog = endprogs[token]
endmatch = endprog.match(line, pos)
if endmatch: # all on one line
pos = endmatch.end(0)
token = line[start:pos]
yield (STRING, token, spos, (lnum, pos), line)
else:
strstart = (lnum, start) # multiple lines
contstr = line[start:]
contline = line
break
elif initial in single_quoted or \
token[:2] in single_quoted or \
token[:3] in single_quoted:
if token[-1] == '\n': # continued string
strstart = (lnum, start)
endprog = (endprogs[initial] or endprogs[token[1]] or
endprogs[token[2]])
contstr, needcont = line[start:], 1
contline = line
break
else: # ordinary string
yield (STRING, token, spos, epos, line)
elif initial in namechars: # ordinary name
yield (NAME, token, spos, epos, line)
elif initial == '\\': # continued stmt
# This yield is new; needed for better idempotency:
yield (NL, token, spos, (lnum, pos), line)
continued = 1
else:
if initial in '([{': parenlev = parenlev + 1
elif initial in ')]}': parenlev = parenlev - 1
yield (OP, token, spos, epos, line)
else:
yield (ERRORTOKEN, line[pos],
(lnum, pos), (lnum, pos+1), line)
pos = pos + 1
for indent in indents[1:]: # pop remaining indent levels
yield (DEDENT, '', (lnum, 0), (lnum, 0), '')
yield (ENDMARKER, '', (lnum, 0), (lnum, 0), '')
if __name__ == '__main__': # testing
import sys
if len(sys.argv) > 1: tokenize(open(sys.argv[1]).readline)
else: tokenize(sys.stdin.readline)
# Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
# Licensed to PSF under a Contributor Agreement.
"""The pgen2 package."""
# Copyright 2001-2014 by Vinay Sajip. All Rights Reserved.
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Vinay Sajip
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
"""
Configuration functions for the logging package for Python. The core package
is based on PEP 282 and comments thereto in comp.lang.python, and influenced
by Apache's log4j system.
Copyright (C) 2001-2014 Vinay Sajip. All Rights Reserved.
To use, simply 'import logging' and log away!
"""
import errno
import io
import logging
import logging.handlers
import re
import struct
import sys
import traceback
try:
import _thread as thread
import threading
except ImportError: #pragma: no cover
thread = None
from socketserver import ThreadingTCPServer, StreamRequestHandler
DEFAULT_LOGGING_CONFIG_PORT = 9030
RESET_ERROR = errno.ECONNRESET
#
# The following code implements a socket listener for on-the-fly
# reconfiguration of logging.
#
# _listener holds the server object doing the listening
_listener = None
def fileConfig(fname, defaults=None, disable_existing_loggers=True):
"""
Read the logging configuration from a ConfigParser-format file.
This can be called several times from an application, allowing an end user
the ability to select from various pre-canned configurations (if the
developer provides a mechanism to present the choices and load the chosen
configuration).
"""
import configparser
if isinstance(fname, configparser.RawConfigParser):
cp = fname
else:
cp = configparser.ConfigParser(defaults)
if hasattr(fname, 'readline'):
cp.read_file(fname)
else:
cp.read(fname)
formatters = _create_formatters(cp)
# critical section
logging._acquireLock()
try:
logging._handlers.clear()
del logging._handlerList[:]
# Handlers add themselves to logging._handlers
handlers = _install_handlers(cp, formatters)
_install_loggers(cp, handlers, disable_existing_loggers)
finally:
logging._releaseLock()
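# Illustrative usage sketch (not part of the original module): a minimal
# ConfigParser-format configuration accepted by fileConfig(). The section
# and key naming matches what _create_formatters(), _install_handlers() and
# _install_loggers() below expect:
#
#     import io, textwrap
#     fileConfig(io.StringIO(textwrap.dedent("""\
#         [loggers]
#         keys=root
#
#         [handlers]
#         keys=console
#
#         [formatters]
#         keys=plain
#
#         [logger_root]
#         level=INFO
#         handlers=console
#
#         [handler_console]
#         class=StreamHandler
#         formatter=plain
#         args=(sys.stderr,)
#
#         [formatter_plain]
#         format=%(levelname)s:%(name)s:%(message)s
#         """)))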
def _resolve(name):
"""Resolve a dotted name to a global object."""
name = name.split('.')
used = name.pop(0)
found = __import__(used)
for n in name:
used = used + '.' + n
try:
found = getattr(found, n)
except AttributeError:
__import__(used)
found = getattr(found, n)
return found
def _strip_spaces(alist):
return map(lambda x: x.strip(), alist)
def _create_formatters(cp):
"""Create and return formatters"""
flist = cp["formatters"]["keys"]
if not len(flist):
return {}
flist = flist.split(",")
flist = _strip_spaces(flist)
formatters = {}
for form in flist:
sectname = "formatter_%s" % form
fs = cp.get(sectname, "format", raw=True, fallback=None)
dfs = cp.get(sectname, "datefmt", raw=True, fallback=None)
c = logging.Formatter
class_name = cp[sectname].get("class")
if class_name:
c = _resolve(class_name)
f = c(fs, dfs)
formatters[form] = f
return formatters
def _install_handlers(cp, formatters):
"""Install and return handlers"""
hlist = cp["handlers"]["keys"]
if not len(hlist):
return {}
hlist = hlist.split(",")
hlist = _strip_spaces(hlist)
handlers = {}
fixups = [] #for inter-handler references
for hand in hlist:
section = cp["handler_%s" % hand]
klass = section["class"]
fmt = section.get("formatter", "")
try:
klass = eval(klass, vars(logging))
except (AttributeError, NameError):
klass = _resolve(klass)
args = section["args"]
args = eval(args, vars(logging))
h = klass(*args)
if "level" in section:
level = section["level"]
h.setLevel(level)
if len(fmt):
h.setFormatter(formatters[fmt])
if issubclass(klass, logging.handlers.MemoryHandler):
target = section.get("target", "")
if len(target): #the target handler may not be loaded yet, so keep for later...
fixups.append((h, target))
handlers[hand] = h
#now all handlers are loaded, fixup inter-handler references...
for h, t in fixups:
h.setTarget(handlers[t])
return handlers
def _handle_existing_loggers(existing, child_loggers, disable_existing):
"""
When (re)configuring logging, handle loggers which were in the previous
configuration but are not in the new configuration. There's no point
deleting them as other threads may continue to hold references to them;
and by disabling them, you stop them doing any logging.
However, don't disable children of named loggers, as that's probably not
what was intended by the user. Also, allow existing loggers to NOT be
disabled if disable_existing is false.
"""
root = logging.root
for log in existing:
logger = root.manager.loggerDict[log]
if log in child_loggers:
logger.level = logging.NOTSET
logger.handlers = []
logger.propagate = True
else:
logger.disabled = disable_existing
def _install_loggers(cp, handlers, disable_existing):
"""Create and install loggers"""
# configure the root first
llist = cp["loggers"]["keys"]
llist = llist.split(",")
llist = list(map(lambda x: x.strip(), llist))
llist.remove("root")
section = cp["logger_root"]
root = logging.root
log = root
if "level" in section:
level = section["level"]
log.setLevel(level)
for h in root.handlers[:]:
root.removeHandler(h)
hlist = section["handlers"]
if len(hlist):
hlist = hlist.split(",")
hlist = _strip_spaces(hlist)
for hand in hlist:
log.addHandler(handlers[hand])
#and now the others...
#we don't want to lose the existing loggers,
#since other threads may have pointers to them.
#existing is set to contain all existing loggers,
#and as we go through the new configuration we
#remove any which are configured. At the end,
#what's left in existing is the set of loggers
#which were in the previous configuration but
#which are not in the new configuration.
existing = list(root.manager.loggerDict.keys())
#The list needs to be sorted so that we can
#avoid disabling child loggers of explicitly
#named loggers. With a sorted list it is easier
#to find the child loggers.
existing.sort()
#We'll keep the list of existing loggers
#which are children of named loggers here...
child_loggers = []
#now set up the new ones...
for log in llist:
section = cp["logger_%s" % log]
qn = section["qualname"]
propagate = section.getint("propagate", fallback=1)
logger = logging.getLogger(qn)
if qn in existing:
i = existing.index(qn) + 1 # start with the entry after qn
prefixed = qn + "."
pflen = len(prefixed)
num_existing = len(existing)
while i < num_existing:
if existing[i][:pflen] == prefixed:
child_loggers.append(existing[i])
i += 1
existing.remove(qn)
if "level" in section:
level = section["level"]
logger.setLevel(level)
for h in logger.handlers[:]:
logger.removeHandler(h)
logger.propagate = propagate
logger.disabled = 0
hlist = section["handlers"]
if len(hlist):
hlist = hlist.split(",")
hlist = _strip_spaces(hlist)
for hand in hlist:
logger.addHandler(handlers[hand])
#Disable any old loggers. There's no point deleting
#them as other threads may continue to hold references
#and by disabling them, you stop them doing any logging.
#However, don't disable children of named loggers, as that's
#probably not what was intended by the user.
#for log in existing:
# logger = root.manager.loggerDict[log]
# if log in child_loggers:
# logger.level = logging.NOTSET
# logger.handlers = []
# logger.propagate = 1
# elif disable_existing_loggers:
# logger.disabled = 1
_handle_existing_loggers(existing, child_loggers, disable_existing)
IDENTIFIER = re.compile('^[a-z_][a-z0-9_]*$', re.I)
def valid_ident(s):
m = IDENTIFIER.match(s)
if not m:
raise ValueError('Not a valid Python identifier: %r' % s)
return True
class ConvertingMixin(object):
"""For ConvertingXXX's, this mixin class provides common functions"""
def convert_with_key(self, key, value, replace=True):
result = self.configurator.convert(value)
#If the converted value is different, save for next time
if value is not result:
if replace:
self[key] = result
if type(result) in (ConvertingDict, ConvertingList,
ConvertingTuple):
result.parent = self
result.key = key
return result
def convert(self, value):
result = self.configurator.convert(value)
if value is not result:
if type(result) in (ConvertingDict, ConvertingList,
ConvertingTuple):
result.parent = self
return result
# The ConvertingXXX classes are wrappers around standard Python containers,
# and they serve to convert any suitable values in the container. The
# conversion converts base dicts, lists and tuples to their wrapped
# equivalents, whereas strings which match a conversion format are converted
# appropriately.
#
# Each wrapper should have a configurator attribute holding the actual
# configurator to use for conversion.
class ConvertingDict(dict, ConvertingMixin):
"""A converting dictionary wrapper."""
def __getitem__(self, key):
value = dict.__getitem__(self, key)
return self.convert_with_key(key, value)
def get(self, key, default=None):
value = dict.get(self, key, default)
return self.convert_with_key(key, value)
def pop(self, key, default=None):
value = dict.pop(self, key, default)
return self.convert_with_key(key, value, replace=False)
class ConvertingList(list, ConvertingMixin):
"""A converting list wrapper."""
def __getitem__(self, key):
value = list.__getitem__(self, key)
return self.convert_with_key(key, value)
def pop(self, idx=-1):
value = list.pop(self, idx)
return self.convert(value)
class ConvertingTuple(tuple, ConvertingMixin):
"""A converting tuple wrapper."""
def __getitem__(self, key):
value = tuple.__getitem__(self, key)
# Can't replace a tuple entry.
return self.convert_with_key(key, value, replace=False)
class BaseConfigurator(object):
"""
The configurator base class which defines some useful defaults.
"""
CONVERT_PATTERN = re.compile(r'^(?P<prefix>[a-z]+)://(?P<suffix>.*)$')
WORD_PATTERN = re.compile(r'^\s*(\w+)\s*')
DOT_PATTERN = re.compile(r'^\.\s*(\w+)\s*')
INDEX_PATTERN = re.compile(r'^\[\s*(\w+)\s*\]\s*')
DIGIT_PATTERN = re.compile(r'^\d+$')
value_converters = {
'ext' : 'ext_convert',
'cfg' : 'cfg_convert',
}
# We might want to use a different one, e.g. importlib
importer = staticmethod(__import__)
def __init__(self, config):
self.config = ConvertingDict(config)
self.config.configurator = self
def resolve(self, s):
"""
Resolve strings to objects using standard import and attribute
syntax.
"""
name = s.split('.')
used = name.pop(0)
try:
found = self.importer(used)
for frag in name:
used += '.' + frag
try:
found = getattr(found, frag)
except AttributeError:
self.importer(used)
found = getattr(found, frag)
return found
except ImportError:
e, tb = sys.exc_info()[1:]
v = ValueError('Cannot resolve %r: %s' % (s, e))
v.__cause__, v.__traceback__ = e, tb
raise v
def ext_convert(self, value):
"""Default converter for the ext:// protocol."""
return self.resolve(value)
def cfg_convert(self, value):
"""Default converter for the cfg:// protocol."""
rest = value
m = self.WORD_PATTERN.match(rest)
if m is None:
raise ValueError("Unable to convert %r" % value)
else:
rest = rest[m.end():]
d = self.config[m.groups()[0]]
#print d, rest
while rest:
m = self.DOT_PATTERN.match(rest)
if m:
d = d[m.groups()[0]]
else:
m = self.INDEX_PATTERN.match(rest)
if m:
idx = m.groups()[0]
if not self.DIGIT_PATTERN.match(idx):
d = d[idx]
else:
try:
n = int(idx) # try as number first (most likely)
d = d[n]
except TypeError:
d = d[idx]
if m:
rest = rest[m.end():]
else:
raise ValueError('Unable to convert '
'%r at %r' % (value, rest))
#rest should be empty
return d
def convert(self, value):
"""
Convert values to an appropriate type. dicts, lists and tuples are
replaced by their converting alternatives. Strings are checked to
see if they have a conversion format and are converted if they do.
"""
if not isinstance(value, ConvertingDict) and isinstance(value, dict):
value = ConvertingDict(value)
value.configurator = self
elif not isinstance(value, ConvertingList) and isinstance(value, list):
value = ConvertingList(value)
value.configurator = self
elif not isinstance(value, ConvertingTuple) and\
isinstance(value, tuple):
value = ConvertingTuple(value)
value.configurator = self
elif isinstance(value, str): # str for py3k
m = self.CONVERT_PATTERN.match(value)
if m:
d = m.groupdict()
prefix = d['prefix']
converter = self.value_converters.get(prefix, None)
if converter:
suffix = d['suffix']
converter = getattr(self, converter)
value = converter(suffix)
return value
def configure_custom(self, config):
"""Configure an object with a user-supplied factory."""
c = config.pop('()')
if not callable(c):
c = self.resolve(c)
props = config.pop('.', None)
# Check for valid identifiers
kwargs = dict([(k, config[k]) for k in config if valid_ident(k)])
result = c(**kwargs)
if props:
for name, value in props.items():
setattr(result, name, value)
return result
def as_tuple(self, value):
"""Utility function which converts lists to tuples."""
if isinstance(value, list):
value = tuple(value)
return value
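# Illustrative sketch (not part of the original module): the two conversion
# protocols understood by convert(). ext:// resolves a dotted import path;
# cfg:// walks the configuration dictionary itself:
#
#     c = BaseConfigurator({'handlers': {'h1': {'class': 'logging.StreamHandler'}}})
#     c.convert('ext://sys.stderr')          # -> the sys.stderr object
#     c.convert('cfg://handlers.h1.class')   # -> 'logging.StreamHandler'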
class DictConfigurator(BaseConfigurator):
"""
Configure logging using a dictionary-like object to describe the
configuration.
"""
def configure(self):
"""Do the configuration."""
config = self.config
if 'version' not in config:
raise ValueError("dictionary doesn't specify a version")
if config['version'] != 1:
raise ValueError("Unsupported version: %s" % config['version'])
incremental = config.pop('incremental', False)
EMPTY_DICT = {}
logging._acquireLock()
try:
if incremental:
handlers = config.get('handlers', EMPTY_DICT)
for name in handlers:
if name not in logging._handlers:
raise ValueError('No handler found with '
'name %r' % name)
else:
try:
handler = logging._handlers[name]
handler_config = handlers[name]
level = handler_config.get('level', None)
if level:
handler.setLevel(logging._checkLevel(level))
except Exception as e:
raise ValueError('Unable to configure handler '
'%r: %s' % (name, e))
loggers = config.get('loggers', EMPTY_DICT)
for name in loggers:
try:
self.configure_logger(name, loggers[name], True)
except Exception as e:
raise ValueError('Unable to configure logger '
'%r: %s' % (name, e))
root = config.get('root', None)
if root:
try:
self.configure_root(root, True)
except Exception as e:
raise ValueError('Unable to configure root '
'logger: %s' % e)
else:
disable_existing = config.pop('disable_existing_loggers', True)
logging._handlers.clear()
del logging._handlerList[:]
# Do formatters first - they don't refer to anything else
formatters = config.get('formatters', EMPTY_DICT)
for name in formatters:
try:
formatters[name] = self.configure_formatter(
formatters[name])
except Exception as e:
raise ValueError('Unable to configure '
'formatter %r: %s' % (name, e))
# Next, do filters - they don't refer to anything else, either
filters = config.get('filters', EMPTY_DICT)
for name in filters:
try:
filters[name] = self.configure_filter(filters[name])
except Exception as e:
raise ValueError('Unable to configure '
'filter %r: %s' % (name, e))
# Next, do handlers - they refer to formatters and filters
# As handlers can refer to other handlers, sort the keys
# to allow a deterministic order of configuration
handlers = config.get('handlers', EMPTY_DICT)
deferred = []
for name in sorted(handlers):
try:
handler = self.configure_handler(handlers[name])
handler.name = name
handlers[name] = handler
except Exception as e:
if 'target not configured yet' in str(e):
deferred.append(name)
else:
raise ValueError('Unable to configure handler '
'%r: %s' % (name, e))
# Now do any that were deferred
for name in deferred:
try:
handler = self.configure_handler(handlers[name])
handler.name = name
handlers[name] = handler
except Exception as e:
raise ValueError('Unable to configure handler '
'%r: %s' % (name, e))
# Next, do loggers - they refer to handlers and filters
#we don't want to lose the existing loggers,
#since other threads may have pointers to them.
#existing is set to contain all existing loggers,
#and as we go through the new configuration we
#remove any which are configured. At the end,
#what's left in existing is the set of loggers
#which were in the previous configuration but
#which are not in the new configuration.
root = logging.root
existing = list(root.manager.loggerDict.keys())
#The list needs to be sorted so that we can
#avoid disabling child loggers of explicitly
#named loggers. With a sorted list it is easier
#to find the child loggers.
existing.sort()
#We'll keep the list of existing loggers
#which are children of named loggers here...
child_loggers = []
#now set up the new ones...
loggers = config.get('loggers', EMPTY_DICT)
for name in loggers:
if name in existing:
i = existing.index(name) + 1 # look after name
prefixed = name + "."
pflen = len(prefixed)
num_existing = len(existing)
while i < num_existing:
if existing[i][:pflen] == prefixed:
child_loggers.append(existing[i])
i += 1
existing.remove(name)
try:
self.configure_logger(name, loggers[name])
except Exception as e:
raise ValueError('Unable to configure logger '
'%r: %s' % (name, e))
#Disable any old loggers. There's no point deleting
#them as other threads may continue to hold references
#and by disabling them, you stop them doing any logging.
#However, don't disable children of named loggers, as that's
#probably not what was intended by the user.
#for log in existing:
# logger = root.manager.loggerDict[log]
# if log in child_loggers:
# logger.level = logging.NOTSET
# logger.handlers = []
# logger.propagate = True
# elif disable_existing:
# logger.disabled = True
_handle_existing_loggers(existing, child_loggers,
disable_existing)
# And finally, do the root logger
root = config.get('root', None)
if root:
try:
self.configure_root(root)
except Exception as e:
raise ValueError('Unable to configure root '
'logger: %s' % e)
finally:
logging._releaseLock()
def configure_formatter(self, config):
"""Configure a formatter from a dictionary."""
if '()' in config:
factory = config['()'] # for use in exception handler
try:
result = self.configure_custom(config)
except TypeError as te:
if "'format'" not in str(te):
raise
#Name of parameter changed from fmt to format.
#Retry with old name.
#This is so that code can be used with older Python versions
#(e.g. by Django)
config['fmt'] = config.pop('format')
config['()'] = factory
result = self.configure_custom(config)
else:
fmt = config.get('format', None)
dfmt = config.get('datefmt', None)
style = config.get('style', '%')
result = logging.Formatter(fmt, dfmt, style)
return result
def configure_filter(self, config):
"""Configure a filter from a dictionary."""
if '()' in config:
result = self.configure_custom(config)
else:
name = config.get('name', '')
result = logging.Filter(name)
return result
def add_filters(self, filterer, filters):
"""Add filters to a filterer from a list of names."""
for f in filters:
try:
filterer.addFilter(self.config['filters'][f])
except Exception as e:
raise ValueError('Unable to add filter %r: %s' % (f, e))
def configure_handler(self, config):
"""Configure a handler from a dictionary."""
config_copy = dict(config) # for restoring in case of error
formatter = config.pop('formatter', None)
if formatter:
try:
formatter = self.config['formatters'][formatter]
except Exception as e:
raise ValueError('Unable to set formatter '
'%r: %s' % (formatter, e))
level = config.pop('level', None)
filters = config.pop('filters', None)
if '()' in config:
c = config.pop('()')
if not callable(c):
c = self.resolve(c)
factory = c
else:
cname = config.pop('class')
klass = self.resolve(cname)
#Special case for handler which refers to another handler
if issubclass(klass, logging.handlers.MemoryHandler) and\
'target' in config:
try:
th = self.config['handlers'][config['target']]
if not isinstance(th, logging.Handler):
config.update(config_copy) # restore for deferred cfg
raise TypeError('target not configured yet')
config['target'] = th
except Exception as e:
raise ValueError('Unable to set target handler '
'%r: %s' % (config['target'], e))
elif issubclass(klass, logging.handlers.SMTPHandler) and\
'mailhost' in config:
config['mailhost'] = self.as_tuple(config['mailhost'])
elif issubclass(klass, logging.handlers.SysLogHandler) and\
'address' in config:
config['address'] = self.as_tuple(config['address'])
factory = klass
props = config.pop('.', None)
kwargs = dict([(k, config[k]) for k in config if valid_ident(k)])
try:
result = factory(**kwargs)
except TypeError as te:
if "'stream'" not in str(te):
raise
#The argument name changed from strm to stream
#Retry with old name.
#This is so that code can be used with older Python versions
#(e.g. by Django)
kwargs['strm'] = kwargs.pop('stream')
result = factory(**kwargs)
if formatter:
result.setFormatter(formatter)
if level is not None:
result.setLevel(logging._checkLevel(level))
if filters:
self.add_filters(result, filters)
if props:
for name, value in props.items():
setattr(result, name, value)
return result
def add_handlers(self, logger, handlers):
"""Add handlers to a logger from a list of names."""
for h in handlers:
try:
logger.addHandler(self.config['handlers'][h])
except Exception as e:
raise ValueError('Unable to add handler %r: %s' % (h, e))
def common_logger_config(self, logger, config, incremental=False):
"""
Perform configuration which is common to root and non-root loggers.
"""
level = config.get('level', None)
if level is not None:
logger.setLevel(logging._checkLevel(level))
if not incremental:
#Remove any existing handlers
for h in logger.handlers[:]:
logger.removeHandler(h)
handlers = config.get('handlers', None)
if handlers:
self.add_handlers(logger, handlers)
filters = config.get('filters', None)
if filters:
self.add_filters(logger, filters)
def configure_logger(self, name, config, incremental=False):
"""Configure a non-root logger from a dictionary."""
logger = logging.getLogger(name)
self.common_logger_config(logger, config, incremental)
propagate = config.get('propagate', None)
if propagate is not None:
logger.propagate = propagate
def configure_root(self, config, incremental=False):
"""Configure a root logger from a dictionary."""
root = logging.getLogger()
self.common_logger_config(root, config, incremental)
dictConfigClass = DictConfigurator
def dictConfig(config):
"""Configure logging using a dictionary."""
dictConfigClass(config).configure()
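# Illustrative usage sketch (not part of the original module): a minimal
# version-1 schema accepted by dictConfig():
#
#     dictConfig({
#         'version': 1,
#         'formatters': {'plain': {'format': '%(levelname)s:%(message)s'}},
#         'handlers': {'console': {'class': 'logging.StreamHandler',
#                                  'formatter': 'plain'}},
#         'root': {'level': 'INFO', 'handlers': ['console']},
#     })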
def listen(port=DEFAULT_LOGGING_CONFIG_PORT, verify=None):
"""
Start up a socket server on the specified port, and listen for new
configurations.
These will be sent as a file suitable for processing by fileConfig().
Returns a Thread object on which you can call start() to start the server,
and which you can join() when appropriate. To stop the server, call
stopListening().
Use the ``verify`` argument to verify any bytes received across the wire
from a client. If specified, it should be a callable which receives a
single argument - the bytes of configuration data received across the
network - and it should return either ``None``, to indicate that the
passed in bytes could not be verified and should be discarded, or a
byte string which is then passed to the configuration machinery as
normal. Note that you can return transformed bytes, e.g. by decrypting
the bytes passed in.
"""
if not thread: #pragma: no cover
raise NotImplementedError("listen() needs threading to work")
class ConfigStreamHandler(StreamRequestHandler):
"""
Handler for a logging configuration request.
It expects a completely new logging configuration and uses fileConfig
to install it.
"""
def handle(self):
"""
Handle a request.
Each request is expected to be a 4-byte length, packed using
struct.pack(">L", n), followed by the config file.
Uses fileConfig() to do the grunt work.
"""
try:
conn = self.connection
chunk = conn.recv(4)
if len(chunk) == 4:
slen = struct.unpack(">L", chunk)[0]
chunk = self.connection.recv(slen)
while len(chunk) < slen:
chunk = chunk + conn.recv(slen - len(chunk))
if self.server.verify is not None:
chunk = self.server.verify(chunk)
if chunk is not None: # verified, can process
chunk = chunk.decode("utf-8")
try:
import json
d = json.loads(chunk)
assert isinstance(d, dict)
dictConfig(d)
except Exception:
#Apply new configuration.
file = io.StringIO(chunk)
try:
fileConfig(file)
except Exception:
traceback.print_exc()
if self.server.ready:
self.server.ready.set()
except OSError as e:
if e.errno != RESET_ERROR:
raise
class ConfigSocketReceiver(ThreadingTCPServer):
"""
A simple TCP socket-based logging config receiver.
"""
allow_reuse_address = 1
def __init__(self, host='localhost', port=DEFAULT_LOGGING_CONFIG_PORT,
handler=None, ready=None, verify=None):
ThreadingTCPServer.__init__(self, (host, port), handler)
logging._acquireLock()
self.abort = 0
logging._releaseLock()
self.timeout = 1
self.ready = ready
self.verify = verify
def serve_until_stopped(self):
import select
abort = 0
while not abort:
rd, wr, ex = select.select([self.socket.fileno()],
[], [],
self.timeout)
if rd:
self.handle_request()
logging._acquireLock()
abort = self.abort
logging._releaseLock()
self.socket.close()
class Server(threading.Thread):
def __init__(self, rcvr, hdlr, port, verify):
super(Server, self).__init__()
self.rcvr = rcvr
self.hdlr = hdlr
self.port = port
self.verify = verify
self.ready = threading.Event()
def run(self):
server = self.rcvr(port=self.port, handler=self.hdlr,
ready=self.ready,
verify=self.verify)
if self.port == 0:
self.port = server.server_address[1]
self.ready.set()
global _listener
logging._acquireLock()
_listener = server
logging._releaseLock()
server.serve_until_stopped()
return Server(ConfigSocketReceiver, ConfigStreamHandler, port, verify)
def stopListening():
"""
Stop the listening server which was created with a call to listen().
"""
global _listener
logging._acquireLock()
try:
if _listener:
_listener.abort = 1
_listener = None
finally:
logging._releaseLock()
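# Illustrative sketch (not part of the original module): pushing a new
# configuration to a running listen() server. The wire format is a 4-byte
# big-endian length, struct.pack(">L", n), followed by the config bytes, as
# expected by ConfigStreamHandler.handle() above. config_text is a
# placeholder for an INI- or JSON-format configuration string.
#
#     import socket, struct
#     t = listen(port=DEFAULT_LOGGING_CONFIG_PORT)
#     t.start()
#     payload = config_text.encode('utf-8')
#     with socket.create_connection(('localhost', DEFAULT_LOGGING_CONFIG_PORT)) as s:
#         s.sendall(struct.pack('>L', len(payload)) + payload)
#     stopListening()
#     t.join()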
# Copyright 2001-2015 by Vinay Sajip. All Rights Reserved.
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Vinay Sajip
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
"""
Additional handlers for the logging package for Python. The core package is
based on PEP 282 and comments thereto in comp.lang.python.
Copyright (C) 2001-2015 Vinay Sajip. All Rights Reserved.
To use, simply 'import logging.handlers' and log away!
"""
import logging, socket, os, pickle, struct, time, re
from stat import ST_DEV, ST_INO, ST_MTIME
import queue
try:
import threading
except ImportError: #pragma: no cover
threading = None
#
# Some constants...
#
DEFAULT_TCP_LOGGING_PORT = 9020
DEFAULT_UDP_LOGGING_PORT = 9021
DEFAULT_HTTP_LOGGING_PORT = 9022
DEFAULT_SOAP_LOGGING_PORT = 9023
SYSLOG_UDP_PORT = 514
SYSLOG_TCP_PORT = 514
_MIDNIGHT = 24 * 60 * 60 # number of seconds in a day
class BaseRotatingHandler(logging.FileHandler):
"""
Base class for handlers that rotate log files at a certain point.
Not meant to be instantiated directly. Instead, use RotatingFileHandler
or TimedRotatingFileHandler.
"""
def __init__(self, filename, mode, encoding=None, delay=False):
"""
Use the specified filename for streamed logging
"""
logging.FileHandler.__init__(self, filename, mode, encoding, delay)
self.mode = mode
self.encoding = encoding
self.namer = None
self.rotator = None
def emit(self, record):
"""
Emit a record.
Output the record to the file, catering for rollover as described
in doRollover().
"""
try:
if self.shouldRollover(record):
self.doRollover()
logging.FileHandler.emit(self, record)
except Exception:
self.handleError(record)
def rotation_filename(self, default_name):
"""
Modify the filename of a log file when rotating.
This is provided so that a custom filename can be provided.
The default implementation calls the 'namer' attribute of the
handler, if it's callable, passing the default name to
it. If the attribute isn't callable (the default is None), the name
is returned unchanged.
:param default_name: The default name for the log file.
"""
if not callable(self.namer):
result = default_name
else:
result = self.namer(default_name)
return result
def rotate(self, source, dest):
"""
When rotating, rotate the current log.
The default implementation calls the 'rotator' attribute of the
handler, if it's callable, passing the source and dest arguments to
it. If the attribute isn't callable (the default is None), the source
is simply renamed to the destination.
:param source: The source filename. This is normally the base
filename, e.g. 'test.log'
:param dest: The destination filename. This is normally
what the source is rotated to, e.g. 'test.log.1'.
"""
if not callable(self.rotator):
# Issue 18940: A file may not have been created if delay is True.
if os.path.exists(source):
os.rename(source, dest)
else:
self.rotator(source, dest)
class RotatingFileHandler(BaseRotatingHandler):
"""
Handler for logging to a set of files, which switches from one file
to the next when the current file reaches a certain size.
"""
def __init__(self, filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False):
"""
Open the specified file and use it as the stream for logging.
By default, the file grows indefinitely. You can specify particular
values of maxBytes and backupCount to allow the file to rollover at
a predetermined size.
Rollover occurs whenever the current log file is nearly maxBytes in
length. If backupCount is >= 1, the system will successively create
new files with the same pathname as the base file, but with extensions
".1", ".2" etc. appended to it. For example, with a backupCount of 5
and a base file name of "app.log", you would get "app.log",
"app.log.1", "app.log.2", ... through to "app.log.5". The file being
written to is always "app.log" - when it gets filled up, it is closed
and renamed to "app.log.1", and if files "app.log.1", "app.log.2" etc.
exist, then they are renamed to "app.log.2", "app.log.3" etc.
respectively.
If maxBytes is zero, rollover never occurs.
"""
# If rotation/rollover is wanted, it doesn't make sense to use another
# mode. If for example 'w' were specified, then if there were multiple
# runs of the calling application, the logs from previous runs would be
# lost if the 'w' is respected, because the log file would be truncated
# on each run.
if maxBytes > 0:
mode = 'a'
BaseRotatingHandler.__init__(self, filename, mode, encoding, delay)
self.maxBytes = maxBytes
self.backupCount = backupCount
def doRollover(self):
"""
Do a rollover, as described in __init__().
"""
if self.stream:
self.stream.close()
self.stream = None
if self.backupCount > 0:
for i in range(self.backupCount - 1, 0, -1):
sfn = self.rotation_filename("%s.%d" % (self.baseFilename, i))
dfn = self.rotation_filename("%s.%d" % (self.baseFilename,
i + 1))
if os.path.exists(sfn):
if os.path.exists(dfn):
os.remove(dfn)
os.rename(sfn, dfn)
dfn = self.rotation_filename(self.baseFilename + ".1")
if os.path.exists(dfn):
os.remove(dfn)
self.rotate(self.baseFilename, dfn)
if not self.delay:
self.stream = self._open()
def shouldRollover(self, record):
"""
Determine if rollover should occur.
Basically, see if the supplied record would cause the file to exceed
the size limit we have.
"""
if self.stream is None: # delay was set...
self.stream = self._open()
if self.maxBytes > 0: # are we rolling over?
msg = "%s\n" % self.format(record)
self.stream.seek(0, 2) #due to non-posix-compliant Windows feature
if self.stream.tell() + len(msg) >= self.maxBytes:
return 1
return 0
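# Illustrative usage sketch (not part of the original module): cap the log
# at ~1 MB, keeping app.log.1 ... app.log.5 as rotated backups:
#
#     import logging
#     handler = RotatingFileHandler('app.log', maxBytes=1048576, backupCount=5)
#     logging.getLogger('demo').addHandler(handler)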
class TimedRotatingFileHandler(BaseRotatingHandler):
"""
Handler for logging to a file, rotating the log file at certain timed
intervals.
If backupCount is > 0, when rollover is done, no more than backupCount
files are kept - the oldest ones are deleted.
"""
def __init__(self, filename, when='h', interval=1, backupCount=0, encoding=None, delay=False, utc=False, atTime=None):
BaseRotatingHandler.__init__(self, filename, 'a', encoding, delay)
self.when = when.upper()
self.backupCount = backupCount
self.utc = utc
self.atTime = atTime
# Calculate the real rollover interval, which is just the number of
# seconds between rollovers. Also set the filename suffix used when
# a rollover occurs. Current 'when' events supported:
# S - Seconds
# M - Minutes
# H - Hours
# D - Days
# midnight - roll over at midnight
# W{0-6} - roll over on a certain day; 0 - Monday
#
# Case of the 'when' specifier is not important; lower or upper case
# will work.
if self.when == 'S':
self.interval = 1 # one second
self.suffix = "%Y-%m-%d_%H-%M-%S"
self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}(\.\w+)?$"
elif self.when == 'M':
self.interval = 60 # one minute
self.suffix = "%Y-%m-%d_%H-%M"
self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}(\.\w+)?$"
elif self.when == 'H':
self.interval = 60 * 60 # one hour
self.suffix = "%Y-%m-%d_%H"
self.extMatch = r"^\d{4}-\d{2}-\d{2}_\d{2}(\.\w+)?$"
elif self.when == 'D' or self.when == 'MIDNIGHT':
self.interval = 60 * 60 * 24 # one day
self.suffix = "%Y-%m-%d"
self.extMatch = r"^\d{4}-\d{2}-\d{2}(\.\w+)?$"
elif self.when.startswith('W'):
self.interval = 60 * 60 * 24 * 7 # one week
if len(self.when) != 2:
raise ValueError("You must specify a day for weekly rollover from 0 to 6 (0 is Monday): %s" % self.when)
if self.when[1] < '0' or self.when[1] > '6':
raise ValueError("Invalid day specified for weekly rollover: %s" % self.when)
self.dayOfWeek = int(self.when[1])
self.suffix = "%Y-%m-%d"
self.extMatch = r"^\d{4}-\d{2}-\d{2}(\.\w+)?$"
else:
raise ValueError("Invalid rollover interval specified: %s" % self.when)
self.extMatch = re.compile(self.extMatch, re.ASCII)
self.interval = self.interval * interval # multiply by units requested
if os.path.exists(filename):
t = os.stat(filename)[ST_MTIME]
else:
t = int(time.time())
self.rolloverAt = self.computeRollover(t)
def computeRollover(self, currentTime):
"""
Work out the rollover time based on the specified time.
"""
result = currentTime + self.interval
# If we are rolling over at midnight or weekly, then the interval is already known.
# What we need to figure out is WHEN the next interval is. In other words,
# if you are rolling over at midnight, then your base interval is 1 day,
# but you want to start that one day clock at midnight, not now. So, we
# have to fudge the rolloverAt value in order to trigger the first rollover
# at the right time. After that, the regular interval will take care of
# the rest. Note that this code doesn't care about leap seconds. :)
if self.when == 'MIDNIGHT' or self.when.startswith('W'):
# This could be done with less code, but I wanted it to be clear
if self.utc:
t = time.gmtime(currentTime)
else:
t = time.localtime(currentTime)
currentHour = t[3]
currentMinute = t[4]
currentSecond = t[5]
currentDay = t[6]
# r is the number of seconds left between now and the next rotation
if self.atTime is None:
rotate_ts = _MIDNIGHT
else:
rotate_ts = ((self.atTime.hour * 60 + self.atTime.minute)*60 +
self.atTime.second)
r = rotate_ts - ((currentHour * 60 + currentMinute) * 60 +
currentSecond)
if r < 0:
# Rotate time is before the current time (for example when
# self.atTime is 13:45 and it is now 14:15), so rotation is
# tomorrow.
r += _MIDNIGHT
currentDay = (currentDay + 1) % 7
result = currentTime + r
# If we are rolling over on a certain day, add in the number of days until
# the next rollover, but offset by 1 since we just calculated the time
# until the next day starts. There are three cases:
# Case 1) The day to rollover is today; in this case, do nothing
# Case 2) The day to rollover is further in the interval (i.e., today is
# day 2 (Wednesday) and rollover is on day 6 (Sunday). Days to
# next rollover is simply 6 - 2 - 1, or 3.
# Case 3) The day to rollover is behind us in the interval (i.e., today
# is day 5 (Saturday) and rollover is on day 3 (Thursday).
# Days to rollover is 6 - 5 + 3, or 4. In this case, it's the
# number of days left in the current week (1) plus the number
# of days in the next week until the rollover day (3).
# The calculations described in 2) and 3) above need to have a day added.
# This is because the above time calculation takes us to midnight on this
# day, i.e. the start of the next day.
if self.when.startswith('W'):
day = currentDay # 0 is Monday
if day != self.dayOfWeek:
if day < self.dayOfWeek:
daysToWait = self.dayOfWeek - day
else:
daysToWait = 6 - day + self.dayOfWeek + 1
newRolloverAt = result + (daysToWait * (60 * 60 * 24))
if not self.utc:
dstNow = t[-1]
dstAtRollover = time.localtime(newRolloverAt)[-1]
if dstNow != dstAtRollover:
if not dstNow: # DST kicks in before next rollover, so we need to deduct an hour
addend = -3600
else: # DST bows out before next rollover, so we need to add an hour
addend = 3600
newRolloverAt += addend
result = newRolloverAt
return result
def shouldRollover(self, record):
"""
Determine if rollover should occur.
record is not used, as we are just comparing times, but it is needed so
the method signatures are the same
"""
t = int(time.time())
if t >= self.rolloverAt:
return 1
return 0
def getFilesToDelete(self):
"""
Determine the files to delete when rolling over.
More specific than the earlier method, which just used glob.glob().
"""
dirName, baseName = os.path.split(self.baseFilename)
fileNames = os.listdir(dirName)
result = []
prefix = baseName + "."
plen = len(prefix)
for fileName in fileNames:
if fileName[:plen] == prefix:
suffix = fileName[plen:]
if self.extMatch.match(suffix):
result.append(os.path.join(dirName, fileName))
result.sort()
if len(result) < self.backupCount:
result = []
else:
result = result[:len(result) - self.backupCount]
return result
def doRollover(self):
"""
do a rollover; in this case, a date/time stamp is appended to the filename
when the rollover happens. However, you want the file to be named for the
start of the interval, not the current time. If there is a backup count,
then we have to get a list of matching filenames, sort them and remove
the one with the oldest suffix.
"""
if self.stream:
self.stream.close()
self.stream = None
# get the time that this sequence started at and make it a TimeTuple
currentTime = int(time.time())
dstNow = time.localtime(currentTime)[-1]
t = self.rolloverAt - self.interval
if self.utc:
timeTuple = time.gmtime(t)
else:
timeTuple = time.localtime(t)
dstThen = timeTuple[-1]
if dstNow != dstThen:
if dstNow:
addend = 3600
else:
addend = -3600
timeTuple = time.localtime(t + addend)
dfn = self.rotation_filename(self.baseFilename + "." +
time.strftime(self.suffix, timeTuple))
if os.path.exists(dfn):
os.remove(dfn)
self.rotate(self.baseFilename, dfn)
if self.backupCount > 0:
for s in self.getFilesToDelete():
os.remove(s)
if not self.delay:
self.stream = self._open()
newRolloverAt = self.computeRollover(currentTime)
while newRolloverAt <= currentTime:
newRolloverAt = newRolloverAt + self.interval
#If DST changes and midnight or weekly rollover, adjust for this.
if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
dstAtRollover = time.localtime(newRolloverAt)[-1]
if dstNow != dstAtRollover:
if not dstNow: # DST kicks in before next rollover, so we need to deduct an hour
addend = -3600
else: # DST bows out before next rollover, so we need to add an hour
addend = 3600
newRolloverAt += addend
self.rolloverAt = newRolloverAt
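# Illustrative usage sketch (not part of the original module): rotate at
# midnight local time and keep a week of dated backups; old files are
# pruned via getFilesToDelete() once more than backupCount exist:
#
#     handler = TimedRotatingFileHandler('app.log', when='midnight',
#                                        backupCount=7)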
class WatchedFileHandler(logging.FileHandler):
"""
A handler for logging to a file, which watches the file
to see if it has changed while in use. This can happen because of
usage of programs such as newsyslog and logrotate which perform
log file rotation. This handler, intended for use under Unix,
watches the file to see if it has changed since the last emit.
(A file has changed if its device or inode have changed.)
If it has changed, the old file stream is closed, and the file
opened to get a new stream.
This handler is not appropriate for use under Windows, because
under Windows open files cannot be moved or renamed - logging
opens the files with exclusive locks - and so there is no need
for such a handler. Furthermore, ST_INO is not supported under
Windows; stat always returns zero for this value.
This handler is based on a suggestion and patch by Chad J.
Schroeder.
"""
def __init__(self, filename, mode='a', encoding=None, delay=False):
logging.FileHandler.__init__(self, filename, mode, encoding, delay)
self.dev, self.ino = -1, -1
self._statstream()
def _statstream(self):
if self.stream:
sres = os.fstat(self.stream.fileno())
self.dev, self.ino = sres[ST_DEV], sres[ST_INO]
def emit(self, record):
"""
Emit a record.
First check if the underlying file has changed, and if it
has, close the old stream and reopen the file to get the
current stream.
"""
# Reduce the chance of race conditions by stat'ing by path only
# once and then fstat'ing our new fd if we opened a new log stream.
# See issue #14632: Thanks to John Mulligan for the problem report
# and patch.
try:
# stat the file by path, checking for existence
sres = os.stat(self.baseFilename)
except FileNotFoundError:
sres = None
# compare file system stat with that of our stream file handle
if not sres or sres[ST_DEV] != self.dev or sres[ST_INO] != self.ino:
if self.stream is not None:
# we have an open file handle, clean it up
self.stream.flush()
self.stream.close()
                self.stream = None  # See Issue #21742: _open() might fail.
# open a new file handle and get new stat info from that fd
self.stream = self._open()
self._statstream()
logging.FileHandler.emit(self, record)
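# Usage sketch (illustrative; assumes a Unix system where an external tool
# such as logrotate may rename the log file):
#
#   import logging
#   import logging.handlers
#
#   handler = logging.handlers.WatchedFileHandler('/var/log/app.log')
#   log = logging.getLogger('app')
#   log.addHandler(handler)
#   # if logrotate moves app.log aside, the next emit() reopens the new file
#   log.warning('always written to the current /var/log/app.log')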
class SocketHandler(logging.Handler):
"""
A handler class which writes logging records, in pickle format, to
a streaming socket. The socket is kept open across logging calls.
If the peer resets it, an attempt is made to reconnect on the next call.
The pickle which is sent is that of the LogRecord's attribute dictionary
(__dict__), so that the receiver does not need to have the logging module
installed in order to process the logging event.
To unpickle the record at the receiving end into a LogRecord, use the
makeLogRecord function.
"""
def __init__(self, host, port):
"""
Initializes the handler with a specific host address and port.
        If the attribute *closeOnError* is set to True and a socket error
        occurs, the socket is silently closed and then reopened on the next
        logging call.
"""
logging.Handler.__init__(self)
self.host = host
self.port = port
if port is None:
self.address = host
else:
self.address = (host, port)
self.sock = None
self.closeOnError = False
self.retryTime = None
#
# Exponential backoff parameters.
#
self.retryStart = 1.0
self.retryMax = 30.0
self.retryFactor = 2.0
def makeSocket(self, timeout=1):
"""
A factory method which allows subclasses to define the precise
type of socket they want.
"""
if self.port is not None:
result = socket.create_connection(self.address, timeout=timeout)
else:
result = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
result.settimeout(timeout)
try:
result.connect(self.address)
except OSError:
result.close() # Issue 19182
raise
return result
def createSocket(self):
"""
Try to create a socket, using an exponential backoff with
a max retry time. Thanks to Robert Olson for the original patch
(SF #815911) which has been slightly refactored.
"""
now = time.time()
# Either retryTime is None, in which case this
# is the first time back after a disconnect, or
# we've waited long enough.
if self.retryTime is None:
attempt = True
else:
attempt = (now >= self.retryTime)
if attempt:
try:
self.sock = self.makeSocket()
self.retryTime = None # next time, no delay before trying
except OSError:
#Creation failed, so set the retry time and return.
if self.retryTime is None:
self.retryPeriod = self.retryStart
else:
self.retryPeriod = self.retryPeriod * self.retryFactor
if self.retryPeriod > self.retryMax:
self.retryPeriod = self.retryMax
self.retryTime = now + self.retryPeriod
def send(self, s):
"""
Send a pickled string to the socket.
This function allows for partial sends which can happen when the
network is busy.
"""
if self.sock is None:
self.createSocket()
#self.sock can be None either because we haven't reached the retry
#time yet, or because we have reached the retry time and retried,
#but are still unable to connect.
if self.sock:
try:
self.sock.sendall(s)
except OSError: #pragma: no cover
self.sock.close()
self.sock = None # so we can call createSocket next time
def makePickle(self, record):
"""
Pickles the record in binary format with a length prefix, and
returns it ready for transmission across the socket.
"""
ei = record.exc_info
if ei:
# just to get traceback text into record.exc_text ...
dummy = self.format(record)
# See issue #14436: If msg or args are objects, they may not be
# available on the receiving end. So we convert the msg % args
# to a string, save it as msg and zap the args.
d = dict(record.__dict__)
d['msg'] = record.getMessage()
d['args'] = None
d['exc_info'] = None
s = pickle.dumps(d, 1)
slen = struct.pack(">L", len(s))
return slen + s
def handleError(self, record):
"""
Handle an error during logging.
An error has occurred during logging. Most likely cause -
connection lost. Close the socket so that we can retry on the
next event.
"""
if self.closeOnError and self.sock:
self.sock.close()
self.sock = None #try to reconnect next time
else:
logging.Handler.handleError(self, record)
def emit(self, record):
"""
Emit a record.
Pickles the record and writes it to the socket in binary format.
If there is an error with the socket, silently drop the packet.
If there was a problem with the socket, re-establishes the
socket.
"""
try:
s = self.makePickle(record)
self.send(s)
except Exception:
self.handleError(record)
def close(self):
"""
Closes the socket.
"""
self.acquire()
try:
sock = self.sock
if sock:
self.sock = None
sock.close()
logging.Handler.close(self)
finally:
self.release()
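# Usage sketch (illustrative; assumes a TCP receiver is listening on the
# default logging port):
#
#   import logging
#   import logging.handlers
#
#   handler = logging.handlers.SocketHandler(
#       'localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
#   log = logging.getLogger('app')
#   log.addHandler(handler)
#   log.warning('pickled and sent; the handler reconnects with backoff')
#
# A receiver would read the 4-byte big-endian length prefix written by
# makePickle(), unpickle the dict that follows, and rebuild the event with
# logging.makeLogRecord().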
class DatagramHandler(SocketHandler):
"""
A handler class which writes logging records, in pickle format, to
a datagram socket. The pickle which is sent is that of the LogRecord's
attribute dictionary (__dict__), so that the receiver does not need to
have the logging module installed in order to process the logging event.
To unpickle the record at the receiving end into a LogRecord, use the
makeLogRecord function.
"""
def __init__(self, host, port):
"""
Initializes the handler with a specific host address and port.
"""
SocketHandler.__init__(self, host, port)
self.closeOnError = False
def makeSocket(self):
"""
The factory method of SocketHandler is here overridden to create
a UDP socket (SOCK_DGRAM).
"""
if self.port is None:
family = socket.AF_UNIX
else:
family = socket.AF_INET
s = socket.socket(family, socket.SOCK_DGRAM)
return s
def send(self, s):
"""
Send a pickled string to a socket.
This function no longer allows for partial sends which can happen
when the network is busy - UDP does not guarantee delivery and
can deliver packets out of sequence.
"""
if self.sock is None:
self.createSocket()
self.sock.sendto(s, self.address)
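# Usage sketch (illustrative; as above, but each record travels as one UDP
# datagram, with no delivery guarantee):
#
#   import logging
#   import logging.handlers
#
#   handler = logging.handlers.DatagramHandler(
#       'localhost', logging.handlers.DEFAULT_UDP_LOGGING_PORT)
#   logging.getLogger('app').addHandler(handler)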
class SysLogHandler(logging.Handler):
"""
A handler class which sends formatted logging records to a syslog
server. Based on Sam Rushing's syslog module:
http://www.nightmare.com/squirl/python-ext/misc/syslog.py
Contributed by Nicolas Untz (after which minor refactoring changes
have been made).
"""
# from <linux/sys/syslog.h>:
# ======================================================================
# priorities/facilities are encoded into a single 32-bit quantity, where
# the bottom 3 bits are the priority (0-7) and the top 28 bits are the
# facility (0-big number). Both the priorities and the facilities map
# roughly one-to-one to strings in the syslogd(8) source code. This
# mapping is included in this file.
#
# priorities (these are ordered)
LOG_EMERG = 0 # system is unusable
LOG_ALERT = 1 # action must be taken immediately
LOG_CRIT = 2 # critical conditions
LOG_ERR = 3 # error conditions
LOG_WARNING = 4 # warning conditions
LOG_NOTICE = 5 # normal but significant condition
LOG_INFO = 6 # informational
LOG_DEBUG = 7 # debug-level messages
# facility codes
LOG_KERN = 0 # kernel messages
LOG_USER = 1 # random user-level messages
LOG_MAIL = 2 # mail system
LOG_DAEMON = 3 # system daemons
LOG_AUTH = 4 # security/authorization messages
LOG_SYSLOG = 5 # messages generated internally by syslogd
LOG_LPR = 6 # line printer subsystem
LOG_NEWS = 7 # network news subsystem
LOG_UUCP = 8 # UUCP subsystem
LOG_CRON = 9 # clock daemon
LOG_AUTHPRIV = 10 # security/authorization messages (private)
LOG_FTP = 11 # FTP daemon
# other codes through 15 reserved for system use
LOG_LOCAL0 = 16 # reserved for local use
LOG_LOCAL1 = 17 # reserved for local use
LOG_LOCAL2 = 18 # reserved for local use
LOG_LOCAL3 = 19 # reserved for local use
LOG_LOCAL4 = 20 # reserved for local use
LOG_LOCAL5 = 21 # reserved for local use
LOG_LOCAL6 = 22 # reserved for local use
LOG_LOCAL7 = 23 # reserved for local use
priority_names = {
"alert": LOG_ALERT,
"crit": LOG_CRIT,
"critical": LOG_CRIT,
"debug": LOG_DEBUG,
"emerg": LOG_EMERG,
"err": LOG_ERR,
"error": LOG_ERR, # DEPRECATED
"info": LOG_INFO,
"notice": LOG_NOTICE,
"panic": LOG_EMERG, # DEPRECATED
"warn": LOG_WARNING, # DEPRECATED
"warning": LOG_WARNING,
}
facility_names = {
"auth": LOG_AUTH,
"authpriv": LOG_AUTHPRIV,
"cron": LOG_CRON,
"daemon": LOG_DAEMON,
"ftp": LOG_FTP,
"kern": LOG_KERN,
"lpr": LOG_LPR,
"mail": LOG_MAIL,
"news": LOG_NEWS,
"security": LOG_AUTH, # DEPRECATED
"syslog": LOG_SYSLOG,
"user": LOG_USER,
"uucp": LOG_UUCP,
"local0": LOG_LOCAL0,
"local1": LOG_LOCAL1,
"local2": LOG_LOCAL2,
"local3": LOG_LOCAL3,
"local4": LOG_LOCAL4,
"local5": LOG_LOCAL5,
"local6": LOG_LOCAL6,
"local7": LOG_LOCAL7,
}
#The map below appears to be trivially lowercasing the key. However,
#there's more to it than meets the eye - in some locales, lowercasing
#gives unexpected results. See SF #1524081: in the Turkish locale,
#"INFO".lower() != "info"
priority_map = {
"DEBUG" : "debug",
"INFO" : "info",
"WARNING" : "warning",
"ERROR" : "error",
"CRITICAL" : "critical"
}
def __init__(self, address=('localhost', SYSLOG_UDP_PORT),
facility=LOG_USER, socktype=None):
"""
Initialize a handler.
        If address is specified as a string, a UNIX socket is used. To log to a
        local syslogd, 'SysLogHandler(address="/dev/log")' can be used.
If facility is not specified, LOG_USER is used. If socktype is
specified as socket.SOCK_DGRAM or socket.SOCK_STREAM, that specific
socket type will be used. For Unix sockets, you can also specify a
socktype of None, in which case socket.SOCK_DGRAM will be used, falling
back to socket.SOCK_STREAM.
"""
logging.Handler.__init__(self)
self.address = address
self.facility = facility
self.socktype = socktype
if isinstance(address, str):
self.unixsocket = True
self._connect_unixsocket(address)
else:
self.unixsocket = False
if socktype is None:
socktype = socket.SOCK_DGRAM
self.socket = socket.socket(socket.AF_INET, socktype)
if socktype == socket.SOCK_STREAM:
self.socket.connect(address)
self.socktype = socktype
self.formatter = None
def _connect_unixsocket(self, address):
use_socktype = self.socktype
if use_socktype is None:
use_socktype = socket.SOCK_DGRAM
self.socket = socket.socket(socket.AF_UNIX, use_socktype)
try:
self.socket.connect(address)
# it worked, so set self.socktype to the used type
self.socktype = use_socktype
except OSError:
self.socket.close()
if self.socktype is not None:
# user didn't specify falling back, so fail
raise
use_socktype = socket.SOCK_STREAM
self.socket = socket.socket(socket.AF_UNIX, use_socktype)
try:
self.socket.connect(address)
# it worked, so set self.socktype to the used type
self.socktype = use_socktype
except OSError:
self.socket.close()
raise
def encodePriority(self, facility, priority):
"""
Encode the facility and priority. You can pass in strings or
integers - if strings are passed, the facility_names and
priority_names mapping dictionaries are used to convert them to
integers.
"""
if isinstance(facility, str):
facility = self.facility_names[facility]
if isinstance(priority, str):
priority = self.priority_names[priority]
return (facility << 3) | priority
    def close(self):
"""
Closes the socket.
"""
self.acquire()
try:
self.socket.close()
logging.Handler.close(self)
finally:
self.release()
def mapPriority(self, levelName):
"""
Map a logging level name to a key in the priority_names map.
This is useful in two scenarios: when custom levels are being
used, and in the case where you can't do a straightforward
mapping by lowercasing the logging level name because of locale-
specific issues (see SF #1524081).
"""
return self.priority_map.get(levelName, "warning")
ident = '' # prepended to all messages
append_nul = True # some old syslog daemons expect a NUL terminator
def emit(self, record):
"""
Emit a record.
The record is formatted, and then sent to the syslog server. If
exception information is present, it is NOT sent to the server.
"""
try:
msg = self.format(record)
if self.ident:
msg = self.ident + msg
if self.append_nul:
msg += '\000'
# We need to convert record level to lowercase, maybe this will
# change in the future.
prio = '<%d>' % self.encodePriority(self.facility,
self.mapPriority(record.levelname))
prio = prio.encode('utf-8')
# Message is a string. Convert to bytes as required by RFC 5424
msg = msg.encode('utf-8')
msg = prio + msg
if self.unixsocket:
try:
self.socket.send(msg)
except OSError:
self.socket.close()
self._connect_unixsocket(self.address)
self.socket.send(msg)
elif self.socktype == socket.SOCK_DGRAM:
self.socket.sendto(msg, self.address)
else:
self.socket.sendall(msg)
except Exception:
self.handleError(record)
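# Usage sketch (illustrative; '/dev/log' is the usual Unix-socket path on
# Linux - other platforms or remote daemons need a different address):
#
#   import logging
#   import logging.handlers
#
#   handler = logging.handlers.SysLogHandler(
#       address='/dev/log',
#       facility=logging.handlers.SysLogHandler.LOG_LOCAL0)
#   log = logging.getLogger('app')
#   log.addHandler(handler)
#   log.warning('delivered to the local syslog daemon as LOG_LOCAL0')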
class SMTPHandler(logging.Handler):
"""
A handler class which sends an SMTP email for each logging event.
"""
def __init__(self, mailhost, fromaddr, toaddrs, subject,
credentials=None, secure=None, timeout=5.0):
"""
Initialize the handler.
Initialize the instance with the from and to addresses and subject
line of the email. To specify a non-standard SMTP port, use the
(host, port) tuple format for the mailhost argument. To specify
authentication credentials, supply a (username, password) tuple
for the credentials argument. To specify the use of a secure
protocol (TLS), pass in a tuple for the secure argument. This will
only be used when authentication credentials are supplied. The tuple
will be either an empty tuple, or a single-value tuple with the name
of a keyfile, or a 2-value tuple with the names of the keyfile and
certificate file. (This tuple is passed to the `starttls` method).
        A timeout in seconds can be specified for the SMTP connection (the
        default is 5 seconds).
"""
logging.Handler.__init__(self)
if isinstance(mailhost, (list, tuple)):
self.mailhost, self.mailport = mailhost
else:
self.mailhost, self.mailport = mailhost, None
if isinstance(credentials, (list, tuple)):
self.username, self.password = credentials
else:
self.username = None
self.fromaddr = fromaddr
if isinstance(toaddrs, str):
toaddrs = [toaddrs]
self.toaddrs = toaddrs
self.subject = subject
self.secure = secure
self.timeout = timeout
def getSubject(self, record):
"""
Determine the subject for the email.
If you want to specify a subject line which is record-dependent,
override this method.
"""
return self.subject
def emit(self, record):
"""
Emit a record.
Format the record and send it to the specified addressees.
"""
try:
import smtplib
from email.message import EmailMessage
import email.utils
port = self.mailport
if not port:
port = smtplib.SMTP_PORT
smtp = smtplib.SMTP(self.mailhost, port, timeout=self.timeout)
msg = EmailMessage()
msg['From'] = self.fromaddr
msg['To'] = ','.join(self.toaddrs)
msg['Subject'] = self.getSubject(record)
msg['Date'] = email.utils.localtime()
msg.set_content(self.format(record))
if self.username:
if self.secure is not None:
smtp.ehlo()
smtp.starttls(*self.secure)
smtp.ehlo()
smtp.login(self.username, self.password)
smtp.send_message(msg)
smtp.quit()
except Exception:
self.handleError(record)
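# Usage sketch (illustrative; the host, addresses and credentials are
# placeholders - note that secure=() only takes effect when credentials
# are also supplied):
#
#   import logging
#   import logging.handlers
#
#   mail = logging.handlers.SMTPHandler(
#       mailhost=('smtp.example.com', 587),
#       fromaddr='[email protected]',
#       toaddrs=['[email protected]'],
#       subject='Application error',
#       credentials=('user', 'password'),
#       secure=())  # empty tuple: STARTTLS without key/certificate files
#   mail.setLevel(logging.ERROR)
#   logging.getLogger('app').addHandler(mail)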
class NTEventLogHandler(logging.Handler):
"""
A handler class which sends events to the NT Event Log. Adds a
registry entry for the specified application name. If no dllname is
provided, win32service.pyd (which contains some basic message
placeholders) is used. Note that use of these placeholders will make
your event logs big, as the entire message source is held in the log.
If you want slimmer logs, you have to pass in the name of your own DLL
which contains the message definitions you want to use in the event log.
"""
def __init__(self, appname, dllname=None, logtype="Application"):
logging.Handler.__init__(self)
try:
import win32evtlogutil, win32evtlog
self.appname = appname
self._welu = win32evtlogutil
if not dllname:
dllname = os.path.split(self._welu.__file__)
dllname = os.path.split(dllname[0])
dllname = os.path.join(dllname[0], r'win32service.pyd')
self.dllname = dllname
self.logtype = logtype
self._welu.AddSourceToRegistry(appname, dllname, logtype)
self.deftype = win32evtlog.EVENTLOG_ERROR_TYPE
self.typemap = {
logging.DEBUG : win32evtlog.EVENTLOG_INFORMATION_TYPE,
logging.INFO : win32evtlog.EVENTLOG_INFORMATION_TYPE,
logging.WARNING : win32evtlog.EVENTLOG_WARNING_TYPE,
logging.ERROR : win32evtlog.EVENTLOG_ERROR_TYPE,
logging.CRITICAL: win32evtlog.EVENTLOG_ERROR_TYPE,
}
except ImportError:
print("The Python Win32 extensions for NT (service, event "\
"logging) appear not to be available.")
self._welu = None
def getMessageID(self, record):
"""
Return the message ID for the event record. If you are using your
own messages, you could do this by having the msg passed to the
logger being an ID rather than a formatting string. Then, in here,
you could use a dictionary lookup to get the message ID. This
version returns 1, which is the base message ID in win32service.pyd.
"""
return 1
def getEventCategory(self, record):
"""
Return the event category for the record.
Override this if you want to specify your own categories. This version
returns 0.
"""
return 0
def getEventType(self, record):
"""
Return the event type for the record.
Override this if you want to specify your own types. This version does
a mapping using the handler's typemap attribute, which is set up in
__init__() to a dictionary which contains mappings for DEBUG, INFO,
WARNING, ERROR and CRITICAL. If you are using your own levels you will
either need to override this method or place a suitable dictionary in
the handler's typemap attribute.
"""
return self.typemap.get(record.levelno, self.deftype)
def emit(self, record):
"""
Emit a record.
Determine the message ID, event category and event type. Then
log the message in the NT event log.
"""
if self._welu:
try:
id = self.getMessageID(record)
cat = self.getEventCategory(record)
type = self.getEventType(record)
msg = self.format(record)
self._welu.ReportEvent(self.appname, id, cat, type, [msg])
except Exception:
self.handleError(record)
def close(self):
"""
Clean up this handler.
You can remove the application name from the registry as a
source of event log entries. However, if you do this, you will
not be able to see the events as you intended in the Event Log
Viewer - it needs to be able to access the registry to get the
DLL name.
"""
#self._welu.RemoveSourceFromRegistry(self.appname, self.logtype)
logging.Handler.close(self)
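# Usage sketch (illustrative; Windows only, and requires the pywin32
# extensions mentioned above):
#
#   import logging
#   import logging.handlers
#
#   handler = logging.handlers.NTEventLogHandler('MyApp')
#   log = logging.getLogger('app')
#   log.addHandler(handler)
#   log.error('recorded in the Application event log under source MyApp')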
class HTTPHandler(logging.Handler):
"""
A class which sends records to a Web server, using either GET or
POST semantics.
"""
def __init__(self, host, url, method="GET", secure=False, credentials=None,
context=None):
"""
Initialize the instance with the host, the request URL, and the method
("GET" or "POST")
"""
logging.Handler.__init__(self)
method = method.upper()
if method not in ["GET", "POST"]:
raise ValueError("method must be GET or POST")
if not secure and context is not None:
raise ValueError("context parameter only makes sense "
"with secure=True")
self.host = host
self.url = url
self.method = method
self.secure = secure
self.credentials = credentials
self.context = context
def mapLogRecord(self, record):
"""
Default implementation of mapping the log record into a dict
that is sent as the CGI data. Overwrite in your class.
Contributed by Franz Glasner.
"""
return record.__dict__
def emit(self, record):
"""
Emit a record.
Send the record to the Web server as a percent-encoded dictionary
"""
try:
import http.client, urllib.parse
host = self.host
if self.secure:
h = http.client.HTTPSConnection(host, context=self.context)
else:
h = http.client.HTTPConnection(host)
url = self.url
data = urllib.parse.urlencode(self.mapLogRecord(record))
if self.method == "GET":
if (url.find('?') >= 0):
sep = '&'
else:
sep = '?'
url = url + "%c%s" % (sep, data)
h.putrequest(self.method, url)
# support multiple hosts on one IP address...
# need to strip optional :port from host, if present
i = host.find(":")
if i >= 0:
host = host[:i]
h.putheader("Host", host)
if self.method == "POST":
h.putheader("Content-type",
"application/x-www-form-urlencoded")
h.putheader("Content-length", str(len(data)))
if self.credentials:
import base64
                # b64encode returns bytes; decode so the header value is a str
                s = ('%s:%s' % self.credentials).encode('utf-8')
                s = 'Basic ' + base64.b64encode(s).strip().decode('ascii')
                h.putheader('Authorization', s)
h.endheaders()
if self.method == "POST":
h.send(data.encode('utf-8'))
h.getresponse() #can't do anything with the result
except Exception:
self.handleError(record)
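# Usage sketch (illustrative; assumes a web application accepts the
# record's attributes as form-encoded POST data at /log):
#
#   import logging
#   import logging.handlers
#
#   handler = logging.handlers.HTTPHandler('localhost:8080', '/log',
#                                          method='POST')
#   logging.getLogger('app').addHandler(handler)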
class BufferingHandler(logging.Handler):
"""
    A handler class which buffers logging records in memory. Whenever a
record is added to the buffer, a check is made to see if the buffer should
be flushed. If it should, then flush() is expected to do what's needed.
"""
def __init__(self, capacity):
"""
Initialize the handler with the buffer size.
"""
logging.Handler.__init__(self)
self.capacity = capacity
self.buffer = []
def shouldFlush(self, record):
"""
Should the handler flush its buffer?
Returns true if the buffer is up to capacity. This method can be
overridden to implement custom flushing strategies.
"""
return (len(self.buffer) >= self.capacity)
def emit(self, record):
"""
Emit a record.
Append the record. If shouldFlush() tells us to, call flush() to process
the buffer.
"""
self.buffer.append(record)
if self.shouldFlush(record):
self.flush()
def flush(self):
"""
Override to implement custom flushing behaviour.
This version just zaps the buffer to empty.
"""
self.acquire()
try:
self.buffer = []
finally:
self.release()
def close(self):
"""
Close the handler.
This version just flushes and chains to the parent class' close().
"""
try:
self.flush()
finally:
logging.Handler.close(self)
class MemoryHandler(BufferingHandler):
"""
A handler class which buffers logging records in memory, periodically
flushing them to a target handler. Flushing occurs whenever the buffer
is full, or when an event of a certain severity or greater is seen.
"""
def __init__(self, capacity, flushLevel=logging.ERROR, target=None):
"""
Initialize the handler with the buffer size, the level at which
flushing should occur and an optional target.
Note that without a target being set either here or via setTarget(),
a MemoryHandler is no use to anyone!
"""
BufferingHandler.__init__(self, capacity)
self.flushLevel = flushLevel
self.target = target
def shouldFlush(self, record):
"""
Check for buffer full or a record at the flushLevel or higher.
"""
return (len(self.buffer) >= self.capacity) or \
(record.levelno >= self.flushLevel)
def setTarget(self, target):
"""
Set the target handler for this handler.
"""
self.target = target
def flush(self):
"""
For a MemoryHandler, flushing means just sending the buffered
records to the target, if there is one. Override if you want
different behaviour.
The record buffer is also cleared by this operation.
"""
self.acquire()
try:
if self.target:
for record in self.buffer:
self.target.handle(record)
self.buffer = []
finally:
self.release()
def close(self):
"""
Flush, set the target to None and lose the buffer.
"""
try:
self.flush()
finally:
self.acquire()
try:
self.target = None
BufferingHandler.close(self)
finally:
self.release()
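# Usage sketch (illustrative): buffer low-severity records and emit them
# all once something at ERROR or above arrives, giving the target handler
# the context that preceded the error.
#
#   import logging
#   import logging.handlers
#
#   mem = logging.handlers.MemoryHandler(capacity=100,
#                                        flushLevel=logging.ERROR,
#                                        target=logging.StreamHandler())
#   log = logging.getLogger('app')
#   log.setLevel(logging.INFO)
#   log.addHandler(mem)
#   log.info('buffered in memory')
#   log.error('at flushLevel: the whole buffer is sent to the target')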
class QueueHandler(logging.Handler):
"""
This handler sends events to a queue. Typically, it would be used together
with a multiprocessing Queue to centralise logging to file in one process
(in a multi-process application), so as to avoid file write contention
between processes.
    This code is new in Python 3.2, but this class can be copy-pasted into
    user code for use with earlier Python versions.
"""
def __init__(self, queue):
"""
Initialise an instance, using the passed queue.
"""
logging.Handler.__init__(self)
self.queue = queue
def enqueue(self, record):
"""
Enqueue a record.
The base implementation uses put_nowait. You may want to override
this method if you want to use blocking, timeouts or custom queue
implementations.
"""
self.queue.put_nowait(record)
def prepare(self, record):
"""
Prepares a record for queuing. The object returned by this method is
enqueued.
The base implementation formats the record to merge the message
and arguments, and removes unpickleable items from the record
in-place.
You might want to override this method if you want to convert
the record to a dict or JSON string, or send a modified copy
of the record while leaving the original intact.
"""
# The format operation gets traceback text into record.exc_text
# (if there's exception data), and also puts the message into
# record.message. We can then use this to replace the original
# msg + args, as these might be unpickleable. We also zap the
# exc_info attribute, as it's no longer needed and, if not None,
# will typically not be pickleable.
self.format(record)
record.msg = record.message
record.args = None
record.exc_info = None
return record
def emit(self, record):
"""
Emit a record.
Writes the LogRecord to the queue, preparing it for pickling first.
"""
try:
self.enqueue(self.prepare(record))
except Exception:
self.handleError(record)
if threading:
class QueueListener(object):
"""
This class implements an internal threaded listener which watches for
LogRecords being added to a queue, removes them and passes them to a
list of handlers for processing.
"""
_sentinel = None
def __init__(self, queue, *handlers):
"""
Initialise an instance with the specified queue and
handlers.
"""
self.queue = queue
self.handlers = handlers
self._stop = threading.Event()
self._thread = None
def dequeue(self, block):
"""
Dequeue a record and return it, optionally blocking.
The base implementation uses get. You may want to override this method
if you want to use timeouts or work with custom queue implementations.
"""
return self.queue.get(block)
def start(self):
"""
Start the listener.
This starts up a background thread to monitor the queue for
LogRecords to process.
"""
self._thread = t = threading.Thread(target=self._monitor)
            t.daemon = True
t.start()
        def prepare(self, record):
"""
Prepare a record for handling.
This method just returns the passed-in record. You may want to
override this method if you need to do any custom marshalling or
manipulation of the record before passing it to the handlers.
"""
return record
def handle(self, record):
"""
Handle a record.
This just loops through the handlers offering them the record
to handle.
"""
record = self.prepare(record)
for handler in self.handlers:
handler.handle(record)
def _monitor(self):
"""
Monitor the queue for records, and ask the handler
to deal with them.
This method runs on a separate, internal thread.
The thread will terminate if it sees a sentinel object in the queue.
"""
q = self.queue
has_task_done = hasattr(q, 'task_done')
            while not self._stop.is_set():
try:
record = self.dequeue(True)
if record is self._sentinel:
break
self.handle(record)
if has_task_done:
q.task_done()
except queue.Empty:
pass
# There might still be records in the queue.
while True:
try:
record = self.dequeue(False)
if record is self._sentinel:
break
self.handle(record)
if has_task_done:
q.task_done()
except queue.Empty:
break
def enqueue_sentinel(self):
"""
This is used to enqueue the sentinel record.
The base implementation uses put_nowait. You may want to override this
method if you want to use timeouts or work with custom queue
implementations.
"""
self.queue.put_nowait(self._sentinel)
def stop(self):
"""
Stop the listener.
This asks the thread to terminate, and then waits for it to do so.
Note that if you don't call this before your application exits, there
may be some records still left on the queue, which won't be processed.
"""
self._stop.set()
self.enqueue_sentinel()
self._thread.join()
self._thread = None
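# Usage sketch (illustrative): the producer side only enqueues records, so
# logging calls stay cheap; the listener thread does the actual I/O.
#
#   import logging
#   import logging.handlers
#   import queue
#
#   q = queue.Queue(-1)  # unbounded
#   logging.getLogger().addHandler(logging.handlers.QueueHandler(q))
#   listener = logging.handlers.QueueListener(q, logging.StreamHandler())
#   listener.start()
#   logging.getLogger().warning('handled on the listener thread')
#   listener.stop()  # drains remaining records, then joins the thread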
# Copyright 2001-2014 by Vinay Sajip. All Rights Reserved.
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for any purpose and without fee is hereby granted,
# provided that the above copyright notice appear in all copies and that
# both that copyright notice and this permission notice appear in
# supporting documentation, and that the name of Vinay Sajip
# not be used in advertising or publicity pertaining to distribution
# of the software without specific, written prior permission.
# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
"""
Logging package for Python. Based on PEP 282 and comments thereto in
comp.lang.python.
Copyright (C) 2001-2014 Vinay Sajip. All Rights Reserved.
To use, simply 'import logging' and log away!
"""
import sys, os, time, io, traceback, warnings, weakref, collections
from string import Template
__all__ = ['BASIC_FORMAT', 'BufferingFormatter', 'CRITICAL', 'DEBUG', 'ERROR',
'FATAL', 'FileHandler', 'Filter', 'Formatter', 'Handler', 'INFO',
'LogRecord', 'Logger', 'LoggerAdapter', 'NOTSET', 'NullHandler',
'StreamHandler', 'WARN', 'WARNING', 'addLevelName', 'basicConfig',
'captureWarnings', 'critical', 'debug', 'disable', 'error',
'exception', 'fatal', 'getLevelName', 'getLogger', 'getLoggerClass',
'info', 'log', 'makeLogRecord', 'setLoggerClass', 'warn', 'warning',
'getLogRecordFactory', 'setLogRecordFactory', 'lastResort']
try:
import threading
except ImportError: #pragma: no cover
threading = None
__author__ = "Vinay Sajip <[email protected]>"
__status__ = "production"
# The following module attributes are no longer updated.
__version__ = "0.5.1.2"
__date__ = "07 February 2010"
#---------------------------------------------------------------------------
# Miscellaneous module data
#---------------------------------------------------------------------------
#
#_startTime is used as the base when calculating the relative time of events
#
_startTime = time.time()
#
#raiseExceptions is used to see if exceptions during handling should be
#propagated
#
raiseExceptions = True
#
# If you don't want threading information in the log, set this to zero
#
logThreads = True
#
# If you don't want multiprocessing information in the log, set this to zero
#
logMultiprocessing = True
#
# If you don't want process information in the log, set this to zero
#
logProcesses = True
#---------------------------------------------------------------------------
# Level related stuff
#---------------------------------------------------------------------------
#
# Default levels and level names, these can be replaced with any positive set
# of values having corresponding names. There is a pseudo-level, NOTSET, which
# is only really there as a lower limit for user-defined levels. Handlers and
# loggers are initialized with NOTSET so that they will log all messages, even
# at user-defined levels.
#
CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0
_levelToName = {
CRITICAL: 'CRITICAL',
ERROR: 'ERROR',
WARNING: 'WARNING',
INFO: 'INFO',
DEBUG: 'DEBUG',
NOTSET: 'NOTSET',
}
_nameToLevel = {
'CRITICAL': CRITICAL,
'ERROR': ERROR,
'WARN': WARNING,
'WARNING': WARNING,
'INFO': INFO,
'DEBUG': DEBUG,
'NOTSET': NOTSET,
}
def getLevelName(level):
"""
Return the textual representation of logging level 'level'.
If the level is one of the predefined levels (CRITICAL, ERROR, WARNING,
INFO, DEBUG) then you get the corresponding string. If you have
associated levels with names using addLevelName then the name you have
associated with 'level' is returned.
If a numeric value corresponding to one of the defined levels is passed
in, the corresponding string representation is returned.
Otherwise, the string "Level %s" % level is returned.
"""
# See Issue #22386 for the reason for this convoluted expression
return _levelToName.get(level, _nameToLevel.get(level, ("Level %s" % level)))
def addLevelName(level, levelName):
"""
Associate 'levelName' with 'level'.
This is used when converting levels to text during message formatting.
"""
_acquireLock()
try: #unlikely to cause an exception, but you never know...
_levelToName[level] = levelName
_nameToLevel[levelName] = level
finally:
_releaseLock()
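# Usage sketch (illustrative): registering a custom level name. TRACE and
# its value are placeholders, not defined by this module.
#
#   import logging
#
#   TRACE = 5  # below DEBUG (10)
#   logging.addLevelName(TRACE, 'TRACE')
#   logging.getLevelName(TRACE)   # -> 'TRACE'
#   logging.basicConfig(level=TRACE)
#   logging.log(TRACE, 'very detailed')  # prints 'TRACE:root:very detailed'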
if hasattr(sys, '_getframe'):
currentframe = lambda: sys._getframe(3)
else: #pragma: no cover
def currentframe():
"""Return the frame object for the caller's stack frame."""
try:
raise Exception
except Exception:
return sys.exc_info()[2].tb_frame.f_back
#
# _srcfile is used when walking the stack to check when we've got the first
# caller stack frame, by skipping frames whose filename is that of this
# module's source. It therefore should contain the filename of this module's
# source file.
#
# Ordinarily we would use __file__ for this, but frozen modules don't always
# have __file__ set, for some reason (see Issue #21736). Thus, we get the
# filename from a handy code object from a function defined in this module.
# (There's no particular reason for picking addLevelName.)
#
_srcfile = os.path.normcase(addLevelName.__code__.co_filename)
# _srcfile is only used in conjunction with sys._getframe().
# To provide compatibility with older versions of Python, set _srcfile
# to None if _getframe() is not available; this value will prevent
# findCaller() from being called. You can also do this if you want to avoid
# the overhead of fetching caller information, even when _getframe() is
# available.
#if not hasattr(sys, '_getframe'):
# _srcfile = None
def _checkLevel(level):
if isinstance(level, int):
rv = level
elif str(level) == level:
if level not in _nameToLevel:
raise ValueError("Unknown level: %r" % level)
rv = _nameToLevel[level]
else:
raise TypeError("Level not an integer or a valid string: %r" % level)
return rv
#---------------------------------------------------------------------------
# Thread-related stuff
#---------------------------------------------------------------------------
#
#_lock is used to serialize access to shared data structures in this module.
#This needs to be an RLock because fileConfig() creates and configures
#Handlers, and so might arbitrary user threads. Since Handler code updates the
#shared dictionary _handlers, it needs to acquire the lock. But if configuring,
#the lock would already have been acquired - so we need an RLock.
#The same argument applies to Loggers and Manager.loggerDict.
#
if threading:
_lock = threading.RLock()
else: #pragma: no cover
_lock = None
def _acquireLock():
"""
Acquire the module-level lock for serializing access to shared data.
This should be released with _releaseLock().
"""
if _lock:
_lock.acquire()
def _releaseLock():
"""
Release the module-level lock acquired by calling _acquireLock().
"""
if _lock:
_lock.release()
#---------------------------------------------------------------------------
# The logging record
#---------------------------------------------------------------------------
class LogRecord(object):
"""
A LogRecord instance represents an event being logged.
LogRecord instances are created every time something is logged. They
contain all the information pertinent to the event being logged. The
main information passed in is in msg and args, which are combined
using str(msg) % args to create the message field of the record. The
record also includes information such as when the record was created,
the source line where the logging call was made, and any exception
information to be logged.
"""
def __init__(self, name, level, pathname, lineno,
msg, args, exc_info, func=None, sinfo=None, **kwargs):
"""
Initialize a logging record with interesting information.
"""
ct = time.time()
self.name = name
self.msg = msg
#
# The following statement allows passing of a dictionary as a sole
# argument, so that you can do something like
# logging.debug("a %(a)d b %(b)s", {'a':1, 'b':2})
# Suggested by Stefan Behnel.
# Note that without the test for args[0], we get a problem because
# during formatting, we test to see if the arg is present using
# 'if self.args:'. If the event being logged is e.g. 'Value is %d'
# and if the passed arg fails 'if self.args:' then no formatting
# is done. For example, logger.warning('Value is %d', 0) would log
# 'Value is %d' instead of 'Value is 0'.
# For the use case of passing a dictionary, this should not be a
# problem.
# Issue #21172: a request was made to relax the isinstance check
# to hasattr(args[0], '__getitem__'). However, the docs on string
# formatting still seem to suggest a mapping object is required.
# Thus, while not removing the isinstance check, it does now look
# for collections.Mapping rather than, as before, dict.
if (args and len(args) == 1 and isinstance(args[0], collections.Mapping)
and args[0]):
args = args[0]
self.args = args
self.levelname = getLevelName(level)
self.levelno = level
self.pathname = pathname
try:
self.filename = os.path.basename(pathname)
self.module = os.path.splitext(self.filename)[0]
except (TypeError, ValueError, AttributeError):
self.filename = pathname
self.module = "Unknown module"
self.exc_info = exc_info
self.exc_text = None # used to cache the traceback text
self.stack_info = sinfo
self.lineno = lineno
self.funcName = func
self.created = ct
self.msecs = (ct - int(ct)) * 1000
self.relativeCreated = (self.created - _startTime) * 1000
if logThreads and threading:
self.thread = threading.get_ident()
self.threadName = threading.current_thread().name
else: # pragma: no cover
self.thread = None
self.threadName = None
if not logMultiprocessing: # pragma: no cover
self.processName = None
else:
self.processName = 'MainProcess'
mp = sys.modules.get('multiprocessing')
if mp is not None:
# Errors may occur if multiprocessing has not finished loading
# yet - e.g. if a custom import hook causes third-party code
# to run when multiprocessing calls import. See issue 8200
# for an example
try:
self.processName = mp.current_process().name
except Exception: #pragma: no cover
pass
if logProcesses and hasattr(os, 'getpid'):
self.process = os.getpid()
else:
self.process = None
def __str__(self):
        return '<LogRecord: %s, %s, %s, %s, "%s">' % (self.name, self.levelno,
            self.pathname, self.lineno, self.msg)
def getMessage(self):
"""
Return the message for this LogRecord.
Return the message for this LogRecord after merging any user-supplied
arguments with the message.
"""
msg = str(self.msg)
if self.args:
msg = msg % self.args
return msg
#
# Determine which class to use when instantiating log records.
#
_logRecordFactory = LogRecord
def setLogRecordFactory(factory):
"""
Set the factory to be used when instantiating a log record.
:param factory: A callable which will be called to instantiate
a log record.
"""
global _logRecordFactory
_logRecordFactory = factory
def getLogRecordFactory():
"""
Return the factory to be used when instantiating a log record.
"""
return _logRecordFactory
def makeLogRecord(dict):
"""
Make a LogRecord whose attributes are defined by the specified dictionary,
This function is useful for converting a logging event received over
a socket connection (which is sent as a dictionary) into a LogRecord
instance.
"""
rv = _logRecordFactory(None, None, "", 0, "", (), None, None)
rv.__dict__.update(dict)
return rv
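# Usage sketch (illustrative): rebuilding a record from a plain dict, e.g.
# one received from SocketHandler and unpickled on the receiving side.
#
#   import logging
#
#   d = {'name': 'app', 'msg': 'hello from afar',
#        'levelno': logging.INFO, 'levelname': 'INFO'}
#   rec = logging.makeLogRecord(d)
#   rec.getMessage()   # -> 'hello from afar'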
#---------------------------------------------------------------------------
# Formatter classes and functions
#---------------------------------------------------------------------------
class PercentStyle(object):
default_format = '%(message)s'
asctime_format = '%(asctime)s'
asctime_search = '%(asctime)'
def __init__(self, fmt):
self._fmt = fmt or self.default_format
def usesTime(self):
return self._fmt.find(self.asctime_search) >= 0
def format(self, record):
return self._fmt % record.__dict__
class StrFormatStyle(PercentStyle):
default_format = '{message}'
asctime_format = '{asctime}'
asctime_search = '{asctime'
def format(self, record):
return self._fmt.format(**record.__dict__)
class StringTemplateStyle(PercentStyle):
default_format = '${message}'
asctime_format = '${asctime}'
asctime_search = '${asctime}'
def __init__(self, fmt):
self._fmt = fmt or self.default_format
self._tpl = Template(self._fmt)
def usesTime(self):
fmt = self._fmt
return fmt.find('$asctime') >= 0 or fmt.find(self.asctime_format) >= 0
def format(self, record):
return self._tpl.substitute(**record.__dict__)
BASIC_FORMAT = "%(levelname)s:%(name)s:%(message)s"
_STYLES = {
'%': (PercentStyle, BASIC_FORMAT),
'{': (StrFormatStyle, '{levelname}:{name}:{message}'),
'$': (StringTemplateStyle, '${levelname}:${name}:${message}'),
}
class Formatter(object):
"""
Formatter instances are used to convert a LogRecord to text.
Formatters need to know how a LogRecord is constructed. They are
responsible for converting a LogRecord to (usually) a string which can
be interpreted by either a human or an external system. The base Formatter
allows a formatting string to be specified. If none is supplied, the
    default value of "%(message)s" is used.
The Formatter can be initialized with a format string which makes use of
knowledge of the LogRecord attributes - e.g. the default value mentioned
above makes use of the fact that the user's message and arguments are pre-
formatted into a LogRecord's message attribute. Currently, the useful
attributes in a LogRecord are described by:
%(name)s Name of the logger (logging channel)
%(levelno)s Numeric logging level for the message (DEBUG, INFO,
WARNING, ERROR, CRITICAL)
%(levelname)s Text logging level for the message ("DEBUG", "INFO",
"WARNING", "ERROR", "CRITICAL")
%(pathname)s Full pathname of the source file where the logging
call was issued (if available)
%(filename)s Filename portion of pathname
%(module)s Module (name portion of filename)
%(lineno)d Source line number where the logging call was issued
(if available)
%(funcName)s Function name
%(created)f Time when the LogRecord was created (time.time()
return value)
%(asctime)s Textual time when the LogRecord was created
%(msecs)d Millisecond portion of the creation time
%(relativeCreated)d Time in milliseconds when the LogRecord was created,
relative to the time the logging module was loaded
(typically at application startup time)
%(thread)d Thread ID (if available)
%(threadName)s Thread name (if available)
%(process)d Process ID (if available)
%(message)s The result of record.getMessage(), computed just as
the record is emitted
"""
converter = time.localtime
def __init__(self, fmt=None, datefmt=None, style='%'):
"""
Initialize the formatter with specified format strings.
Initialize the formatter either with the specified format string, or a
default as described above. Allow for specialized date formatting with
the optional datefmt argument (if omitted, you get the ISO8601 format).
Use a style parameter of '%', '{' or '$' to specify that you want to
use one of %-formatting, :meth:`str.format` (``{}``) formatting or
:class:`string.Template` formatting in your format string.
.. versionchanged: 3.2
Added the ``style`` parameter.
"""
if style not in _STYLES:
raise ValueError('Style must be one of: %s' % ','.join(
_STYLES.keys()))
self._style = _STYLES[style][0](fmt)
self._fmt = self._style._fmt
self.datefmt = datefmt
default_time_format = '%Y-%m-%d %H:%M:%S'
default_msec_format = '%s,%03d'
def formatTime(self, record, datefmt=None):
"""
Return the creation time of the specified LogRecord as formatted text.
This method should be called from format() by a formatter which
wants to make use of a formatted time. This method can be overridden
in formatters to provide for any specific requirement, but the
basic behaviour is as follows: if datefmt (a string) is specified,
it is used with time.strftime() to format the creation time of the
record. Otherwise, the ISO8601 format is used. The resulting
string is returned. This function uses a user-configurable function
to convert the creation time to a tuple. By default, time.localtime()
is used; to change this for a particular formatter instance, set the
'converter' attribute to a function with the same signature as
time.localtime() or time.gmtime(). To change it for all formatters,
for example if you want all logging times to be shown in GMT,
set the 'converter' attribute in the Formatter class.
"""
ct = self.converter(record.created)
if datefmt:
s = time.strftime(datefmt, ct)
else:
t = time.strftime(self.default_time_format, ct)
s = self.default_msec_format % (t, record.msecs)
return s
def formatException(self, ei):
"""
Format and return the specified exception information as a string.
This default implementation just uses
traceback.print_exception()
"""
sio = io.StringIO()
tb = ei[2]
# See issues #9427, #1553375. Commented out for now.
#if getattr(self, 'fullstack', False):
# traceback.print_stack(tb.tb_frame.f_back, file=sio)
traceback.print_exception(ei[0], ei[1], tb, None, sio)
s = sio.getvalue()
sio.close()
if s[-1:] == "\n":
s = s[:-1]
return s
def usesTime(self):
"""
Check if the format uses the creation time of the record.
"""
return self._style.usesTime()
def formatMessage(self, record):
return self._style.format(record)
def formatStack(self, stack_info):
"""
This method is provided as an extension point for specialized
formatting of stack information.
The input data is a string as returned from a call to
:func:`traceback.print_stack`, but with the last trailing newline
removed.
The base implementation just returns the value passed in.
"""
return stack_info
def format(self, record):
"""
Format the specified record as text.
The record's attribute dictionary is used as the operand to a
string formatting operation which yields the returned string.
Before formatting the dictionary, a couple of preparatory steps
are carried out. The message attribute of the record is computed
using LogRecord.getMessage(). If the formatting string uses the
time (as determined by a call to usesTime(), formatTime() is
called to format the event time. If there is exception information,
it is formatted using formatException() and appended to the message.
"""
record.message = record.getMessage()
if self.usesTime():
record.asctime = self.formatTime(record, self.datefmt)
s = self.formatMessage(record)
if record.exc_info:
# Cache the traceback text to avoid converting it multiple times
# (it's constant anyway)
if not record.exc_text:
record.exc_text = self.formatException(record.exc_info)
if record.exc_text:
if s[-1:] != "\n":
s = s + "\n"
s = s + record.exc_text
if record.stack_info:
if s[-1:] != "\n":
s = s + "\n"
s = s + self.formatStack(record.stack_info)
return s
#
# The default formatter to use when no other is specified
#
_defaultFormatter = Formatter()
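# Usage sketch (illustrative): the same layout expressed in each of the
# three supported styles.
#
#   import logging
#
#   f1 = logging.Formatter('%(levelname)s: %(message)s')          # '%' style
#   f2 = logging.Formatter('{levelname}: {message}', style='{')
#   f3 = logging.Formatter('${levelname}: ${message}', style='$')
#   rec = logging.makeLogRecord({'msg': 'hello', 'levelname': 'INFO'})
#   f2.format(rec)   # -> 'INFO: hello'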
class BufferingFormatter(object):
"""
A formatter suitable for formatting a number of records.
"""
def __init__(self, linefmt=None):
"""
Optionally specify a formatter which will be used to format each
individual record.
"""
if linefmt:
self.linefmt = linefmt
else:
self.linefmt = _defaultFormatter
def formatHeader(self, records):
"""
Return the header string for the specified records.
"""
return ""
def formatFooter(self, records):
"""
Return the footer string for the specified records.
"""
return ""
def format(self, records):
"""
Format the specified records and return the result as a string.
"""
rv = ""
if len(records) > 0:
rv = rv + self.formatHeader(records)
for record in records:
rv = rv + self.linefmt.format(record)
rv = rv + self.formatFooter(records)
return rv
#---------------------------------------------------------------------------
# Filter classes and functions
#---------------------------------------------------------------------------
class Filter(object):
"""
Filter instances are used to perform arbitrary filtering of LogRecords.
Loggers and Handlers can optionally use Filter instances to filter
records as desired. The base filter class only allows events which are
below a certain point in the logger hierarchy. For example, a filter
initialized with "A.B" will allow events logged by loggers "A.B",
"A.B.C", "A.B.C.D", "A.B.D" etc. but not "A.BB", "B.A.B" etc. If
initialized with the empty string, all events are passed.
"""
def __init__(self, name=''):
"""
Initialize a filter.
Initialize with the name of the logger which, together with its
children, will have its events allowed through the filter. If no
name is specified, allow every event.
"""
self.name = name
self.nlen = len(name)
def filter(self, record):
"""
Determine if the specified record is to be logged.
Is the specified record to be logged? Returns 0 for no, nonzero for
yes. If deemed appropriate, the record may be modified in-place.
"""
if self.nlen == 0:
return True
elif self.name == record.name:
return True
elif record.name.find(self.name, 0, self.nlen) != 0:
return False
return (record.name[self.nlen] == ".")
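# Usage sketch (illustrative): only events from the 'A.B' subtree of
# loggers pass the handler's filter.
#
#   import logging
#
#   handler = logging.StreamHandler()
#   handler.addFilter(logging.Filter('A.B'))
#   logging.getLogger().addHandler(handler)
#   logging.getLogger('A.B.C').warning('allowed: inside the A.B subtree')
#   logging.getLogger('A.BB').warning('dropped: A.BB does not match')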
class Filterer(object):
"""
A base class for loggers and handlers which allows them to share
common code.
"""
def __init__(self):
"""
Initialize the list of filters to be an empty list.
"""
self.filters = []
def addFilter(self, filter):
"""
Add the specified filter to this handler.
"""
        if filter not in self.filters:
self.filters.append(filter)
def removeFilter(self, filter):
"""
Remove the specified filter from this handler.
"""
if filter in self.filters:
self.filters.remove(filter)
def filter(self, record):
"""
Determine if a record is loggable by consulting all the filters.
The default is to allow the record to be logged; any filter can veto
this and the record is then dropped. Returns a zero value if a record
is to be dropped, else non-zero.
.. versionchanged: 3.2
Allow filters to be just callables.
"""
rv = True
for f in self.filters:
if hasattr(f, 'filter'):
result = f.filter(record)
else:
result = f(record) # assume callable - will raise if not
if not result:
rv = False
break
return rv
#---------------------------------------------------------------------------
# Handler classes and functions
#---------------------------------------------------------------------------
_handlers = weakref.WeakValueDictionary() #map of handler names to handlers
_handlerList = [] # added to allow handlers to be removed in reverse of order initialized
def _removeHandlerRef(wr):
"""
Remove a handler reference from the internal cleanup list.
"""
# This function can be called during module teardown, when globals are
# set to None. It can also be called from another thread. So we need to
# pre-emptively grab the necessary globals and check if they're None,
# to prevent race conditions and failures during interpreter shutdown.
acquire, release, handlers = _acquireLock, _releaseLock, _handlerList
if acquire and release and handlers:
acquire()
try:
if wr in handlers:
handlers.remove(wr)
finally:
release()
def _addHandlerRef(handler):
"""
Add a handler to the internal cleanup list using a weak reference.
"""
_acquireLock()
try:
_handlerList.append(weakref.ref(handler, _removeHandlerRef))
finally:
_releaseLock()
class Handler(Filterer):
"""
Handler instances dispatch logging events to specific destinations.
The base handler class. Acts as a placeholder which defines the Handler
interface. Handlers can optionally use Formatter instances to format
records as desired. By default, no formatter is specified; in this case,
the 'raw' message as determined by record.message is logged.
"""
def __init__(self, level=NOTSET):
"""
Initializes the instance - basically setting the formatter to None
and the filter list to empty.
"""
Filterer.__init__(self)
self._name = None
self.level = _checkLevel(level)
self.formatter = None
# Add the handler to the global _handlerList (for cleanup on shutdown)
_addHandlerRef(self)
self.createLock()
def get_name(self):
return self._name
def set_name(self, name):
_acquireLock()
try:
if self._name in _handlers:
del _handlers[self._name]
self._name = name
if name:
_handlers[name] = self
finally:
_releaseLock()
name = property(get_name, set_name)
def createLock(self):
"""
Acquire a thread lock for serializing access to the underlying I/O.
"""
if threading:
self.lock = threading.RLock()
else: #pragma: no cover
self.lock = None
def acquire(self):
"""
Acquire the I/O thread lock.
"""
if self.lock:
self.lock.acquire()
def release(self):
"""
Release the I/O thread lock.
"""
if self.lock:
self.lock.release()
def setLevel(self, level):
"""
Set the logging level of this handler. level must be an int or a str.
"""
self.level = _checkLevel(level)
def format(self, record):
"""
Format the specified record.
If a formatter is set, use it. Otherwise, use the default formatter
for the module.
"""
if self.formatter:
fmt = self.formatter
else:
fmt = _defaultFormatter
return fmt.format(record)
def emit(self, record):
"""
Do whatever it takes to actually log the specified logging record.
This version is intended to be implemented by subclasses and so
raises a NotImplementedError.
"""
raise NotImplementedError('emit must be implemented '
'by Handler subclasses')
def handle(self, record):
"""
Conditionally emit the specified logging record.
Emission depends on filters which may have been added to the handler.
Wrap the actual emission of the record with acquisition/release of
the I/O thread lock. Returns whether the filter passed the record for
emission.
"""
rv = self.filter(record)
if rv:
self.acquire()
try:
self.emit(record)
finally:
self.release()
return rv
def setFormatter(self, fmt):
"""
Set the formatter for this handler.
"""
self.formatter = fmt
def flush(self):
"""
Ensure all logging output has been flushed.
This version does nothing and is intended to be implemented by
subclasses.
"""
pass
def close(self):
"""
Tidy up any resources used by the handler.
This version removes the handler from an internal map of handlers,
_handlers, which is used for handler lookup by name. Subclasses
should ensure that this gets called from overridden close()
methods.
"""
#get the module data lock, as we're updating a shared structure.
_acquireLock()
try: #unlikely to raise an exception, but you never know...
if self._name and self._name in _handlers:
del _handlers[self._name]
finally:
_releaseLock()
def handleError(self, record):
"""
Handle errors which occur during an emit() call.
This method should be called from handlers when an exception is
encountered during an emit() call. If raiseExceptions is false,
exceptions get silently ignored. This is what is mostly wanted
for a logging system - most users will not care about errors in
the logging system, they are more interested in application errors.
You could, however, replace this with a custom handler if you wish.
The record which was being processed is passed in to this method.
"""
if raiseExceptions and sys.stderr: # see issue 13807
t, v, tb = sys.exc_info()
try:
sys.stderr.write('--- Logging error ---\n')
traceback.print_exception(t, v, tb, None, sys.stderr)
sys.stderr.write('Call stack:\n')
# Walk the stack frame up until we're out of logging,
# so as to print the calling context.
frame = tb.tb_frame
while (frame and os.path.dirname(frame.f_code.co_filename) ==
__path__[0]):
frame = frame.f_back
if frame:
traceback.print_stack(frame, file=sys.stderr)
else:
# couldn't find the right stack frame, for some reason
sys.stderr.write('Logged from file %s, line %s\n' % (
record.filename, record.lineno))
# Issue 18671: output logging message and arguments
try:
sys.stderr.write('Message: %r\n'
'Arguments: %s\n' % (record.msg,
record.args))
except Exception:
sys.stderr.write('Unable to print the message and arguments'
' - possible formatting error.\nUse the'
' traceback above to help find the error.\n'
)
except OSError: #pragma: no cover
pass # see issue 5971
finally:
del t, v, tb
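# Subclassing sketch (illustrative; ListHandler is a hypothetical example,
# not part of this module): a minimal Handler that collects formatted
# records in a list, e.g. for tests. Only emit() needs to be supplied.
#
#   import logging
#
#   class ListHandler(logging.Handler):
#       def __init__(self, level=logging.NOTSET):
#           logging.Handler.__init__(self, level)
#           self.records = []
#       def emit(self, record):
#           try:
#               self.records.append(self.format(record))
#           except Exception:
#               self.handleError(record)
#
#   h = ListHandler()
#   logging.getLogger('app').addHandler(h)
#   logging.getLogger('app').warning('captured')
#   # h.records == ['captured'] with the default '%(message)s' formatter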
class StreamHandler(Handler):
"""
A handler class which writes logging records, appropriately formatted,
to a stream. Note that this class does not close the stream, as
sys.stdout or sys.stderr may be used.
"""
terminator = '\n'
def __init__(self, stream=None):
"""
Initialize the handler.
If stream is not specified, sys.stderr is used.
"""
Handler.__init__(self)
if stream is None:
stream = sys.stderr
self.stream = stream
def flush(self):
"""
Flushes the stream.
"""
self.acquire()
try:
if self.stream and hasattr(self.stream, "flush"):
self.stream.flush()
finally:
self.release()
def emit(self, record):
"""
Emit a record.
If a formatter is specified, it is used to format the record.
The record is then written to the stream with a trailing newline. If
exception information is present, it is formatted using
traceback.print_exception and appended to the stream. If the stream
has an 'encoding' attribute, it is used to determine how to do the
output to the stream.
"""
try:
msg = self.format(record)
stream = self.stream
stream.write(msg)
stream.write(self.terminator)
self.flush()
except Exception:
self.handleError(record)
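# Illustrative usage, not part of the module: client code typically creates
# a StreamHandler, gives it a formatter and attaches it to a logger.
#
#     import logging, sys
#     h = logging.StreamHandler(sys.stdout)  # default stream is sys.stderr
#     h.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
#     logging.getLogger('example').addHandler(h)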
class FileHandler(StreamHandler):
"""
A handler class which writes formatted logging records to disk files.
"""
def __init__(self, filename, mode='a', encoding=None, delay=False):
"""
Open the specified file and use it as the stream for logging.
"""
#keep the absolute path, otherwise derived classes which use this
#may break when the current directory changes
self.baseFilename = os.path.abspath(filename)
self.mode = mode
self.encoding = encoding
self.delay = delay
if delay:
#We don't open the stream, but we still need to call the
#Handler constructor to set level, formatter, lock etc.
Handler.__init__(self)
self.stream = None
else:
StreamHandler.__init__(self, self._open())
def close(self):
"""
Closes the stream.
"""
self.acquire()
try:
try:
if self.stream:
try:
self.flush()
finally:
stream = self.stream
self.stream = None
if hasattr(stream, "close"):
stream.close()
finally:
# Issue #19523: call unconditionally to
# prevent a handler leak when delay is set
StreamHandler.close(self)
finally:
self.release()
def _open(self):
"""
Open the current base file with the (original) mode and encoding.
Return the resulting stream.
"""
return open(self.baseFilename, self.mode, encoding=self.encoding)
def emit(self, record):
"""
Emit a record.
If the stream was not opened because 'delay' was specified in the
constructor, open it before calling the superclass's emit.
"""
if self.stream is None:
self.stream = self._open()
StreamHandler.emit(self, record)
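# Illustrative usage, not part of the module: with delay=True the file is
# opened only when the first record is emitted, so quiet runs never create
# an empty log file. 'app.log' is an example path.
#
#     import logging
#     fh = logging.FileHandler('app.log', delay=True)
#     logging.getLogger('example').addHandler(fh)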
class _StderrHandler(StreamHandler):
"""
This class is like a StreamHandler using sys.stderr, but always uses
whatever sys.stderr is currently set to rather than the value of
sys.stderr at handler construction time.
"""
def __init__(self, level=NOTSET):
"""
Initialize the handler.
"""
Handler.__init__(self, level)
@property
def stream(self):
return sys.stderr
_defaultLastResort = _StderrHandler(WARNING)
lastResort = _defaultLastResort
#---------------------------------------------------------------------------
# Manager classes and functions
#---------------------------------------------------------------------------
class PlaceHolder(object):
"""
PlaceHolder instances are used in the Manager logger hierarchy to take
the place of nodes for which no loggers have been defined. This class is
intended for internal use only and not as part of the public API.
"""
def __init__(self, alogger):
"""
Initialize the placeholder with the specified logger as its first child.
"""
self.loggerMap = { alogger : None }
def append(self, alogger):
"""
Add the specified logger as a child of this placeholder.
"""
if alogger not in self.loggerMap:
self.loggerMap[alogger] = None
#
# Determine which class to use when instantiating loggers.
#
_loggerClass = None
def setLoggerClass(klass):
"""
Set the class to be used when instantiating a logger. The class should
define __init__() such that only a name argument is required, and the
__init__() should call Logger.__init__()
"""
if klass != Logger:
if not issubclass(klass, Logger):
raise TypeError("logger not derived from logging.Logger: "
+ klass.__name__)
global _loggerClass
_loggerClass = klass
def getLoggerClass():
"""
Return the class to be used when instantiating a logger.
"""
return _loggerClass
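# Illustrative sketch, not part of the module: a replacement logger class
# must derive from Logger and be constructible from just a name, per the
# setLoggerClass() docstring. 'MyLogger' is a hypothetical name.
#
#     import logging
#     class MyLogger(logging.Logger):
#         def __init__(self, name):
#             logging.Logger.__init__(self, name)
#
#     logging.setLoggerClass(MyLogger)
#     logger = logging.getLogger('uses.mylogger')  # now a MyLogger instance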
class Manager(object):
"""
There is [under normal circumstances] just one Manager instance, which
holds the hierarchy of loggers.
"""
def __init__(self, rootnode):
"""
Initialize the manager with the root node of the logger hierarchy.
"""
self.root = rootnode
self.disable = 0
self.emittedNoHandlerWarning = False
self.loggerDict = {}
self.loggerClass = None
self.logRecordFactory = None
def getLogger(self, name):
"""
Get a logger with the specified name (channel name), creating it
if it doesn't yet exist. This name is a dot-separated hierarchical
name, such as "a", "a.b", "a.b.c" or similar.
If a PlaceHolder existed for the specified name [i.e. the logger
didn't exist but a child of it did], replace it with the created
logger and fix up the parent/child references which pointed to the
placeholder to now point to the logger.
"""
rv = None
if not isinstance(name, str):
raise TypeError('A logger name must be a string')
_acquireLock()
try:
if name in self.loggerDict:
rv = self.loggerDict[name]
if isinstance(rv, PlaceHolder):
ph = rv
rv = (self.loggerClass or _loggerClass)(name)
rv.manager = self
self.loggerDict[name] = rv
self._fixupChildren(ph, rv)
self._fixupParents(rv)
else:
rv = (self.loggerClass or _loggerClass)(name)
rv.manager = self
self.loggerDict[name] = rv
self._fixupParents(rv)
finally:
_releaseLock()
return rv
def setLoggerClass(self, klass):
"""
Set the class to be used when instantiating a logger with this Manager.
"""
if klass != Logger:
if not issubclass(klass, Logger):
raise TypeError("logger not derived from logging.Logger: "
+ klass.__name__)
self.loggerClass = klass
def setLogRecordFactory(self, factory):
"""
Set the factory to be used when instantiating a log record with this
Manager.
"""
self.logRecordFactory = factory
def _fixupParents(self, alogger):
"""
Ensure that there are either loggers or placeholders all the way
from the specified logger to the root of the logger hierarchy.
"""
name = alogger.name
i = name.rfind(".")
rv = None
while (i > 0) and not rv:
substr = name[:i]
if substr not in self.loggerDict:
self.loggerDict[substr] = PlaceHolder(alogger)
else:
obj = self.loggerDict[substr]
if isinstance(obj, Logger):
rv = obj
else:
assert isinstance(obj, PlaceHolder)
obj.append(alogger)
i = name.rfind(".", 0, i - 1)
if not rv:
rv = self.root
alogger.parent = rv
def _fixupChildren(self, ph, alogger):
"""
Ensure that children of the placeholder ph are connected to the
specified logger.
"""
name = alogger.name
namelen = len(name)
for c in ph.loggerMap.keys():
#The condition is equivalent to: if not c.parent.name.startswith(name)
if c.parent.name[:namelen] != name:
alogger.parent = c.parent
c.parent = alogger
#---------------------------------------------------------------------------
# Logger classes and functions
#---------------------------------------------------------------------------
class Logger(Filterer):
"""
Instances of the Logger class represent a single logging channel. A
"logging channel" indicates an area of an application. Exactly how an
"area" is defined is up to the application developer. Since an
application can have any number of areas, logging channels are identified
by a unique string. Application areas can be nested (e.g. an area
of "input processing" might include sub-areas "read CSV files", "read
XLS files" and "read Gnumeric files"). To cater for this natural nesting,
channel names are organized into a namespace hierarchy where levels are
separated by periods, much like the Java or Python package namespace. So
in the example given above, channel names might be "input" for the upper
level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels.
There is no arbitrary limit to the depth of nesting.
"""
def __init__(self, name, level=NOTSET):
"""
Initialize the logger with a name and an optional level.
"""
Filterer.__init__(self)
self.name = name
self.level = _checkLevel(level)
self.parent = None
self.propagate = True
self.handlers = []
self.disabled = False
def setLevel(self, level):
"""
Set the logging level of this logger. level must be an int or a str.
"""
self.level = _checkLevel(level)
def debug(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'DEBUG'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.debug("Houston, we have a %s", "thorny problem", exc_info=1)
"""
if self.isEnabledFor(DEBUG):
self._log(DEBUG, msg, args, **kwargs)
def info(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'INFO'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.info("Houston, we have a %s", "interesting problem", exc_info=1)
"""
if self.isEnabledFor(INFO):
self._log(INFO, msg, args, **kwargs)
def warning(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'WARNING'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.warning("Houston, we have a %s", "bit of a problem", exc_info=1)
"""
if self.isEnabledFor(WARNING):
self._log(WARNING, msg, args, **kwargs)
def warn(self, msg, *args, **kwargs):
warnings.warn("The 'warn' method is deprecated, "
"use 'warning' instead", DeprecationWarning, 2)
self.warning(msg, *args, **kwargs)
def error(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'ERROR'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.error("Houston, we have a %s", "major problem", exc_info=1)
"""
if self.isEnabledFor(ERROR):
self._log(ERROR, msg, args, **kwargs)
def exception(self, msg, *args, **kwargs):
"""
Convenience method for logging an ERROR with exception information.
"""
kwargs['exc_info'] = True
self.error(msg, *args, **kwargs)
def critical(self, msg, *args, **kwargs):
"""
Log 'msg % args' with severity 'CRITICAL'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.critical("Houston, we have a %s", "major disaster", exc_info=1)
"""
if self.isEnabledFor(CRITICAL):
self._log(CRITICAL, msg, args, **kwargs)
fatal = critical
def log(self, level, msg, *args, **kwargs):
"""
Log 'msg % args' with the integer severity 'level'.
To pass exception information, use the keyword argument exc_info with
a true value, e.g.
logger.log(level, "We have a %s", "mysterious problem", exc_info=1)
"""
if not isinstance(level, int):
if raiseExceptions:
raise TypeError("level must be an integer")
else:
return
if self.isEnabledFor(level):
self._log(level, msg, args, **kwargs)
def findCaller(self, stack_info=False):
"""
Find the stack frame of the caller so that we can note the source
file name, line number and function name.
"""
f = currentframe()
#On some versions of IronPython, currentframe() returns None if
#IronPython isn't run with -X:Frames.
if f is not None:
f = f.f_back
rv = "(unknown file)", 0, "(unknown function)", None
while hasattr(f, "f_code"):
co = f.f_code
filename = os.path.normcase(co.co_filename)
if filename == _srcfile:
f = f.f_back
continue
sinfo = None
if stack_info:
sio = io.StringIO()
sio.write('Stack (most recent call last):\n')
traceback.print_stack(f, file=sio)
sinfo = sio.getvalue()
if sinfo[-1] == '\n':
sinfo = sinfo[:-1]
sio.close()
rv = (co.co_filename, f.f_lineno, co.co_name, sinfo)
break
return rv
def makeRecord(self, name, level, fn, lno, msg, args, exc_info,
func=None, extra=None, sinfo=None):
"""
A factory method which can be overridden in subclasses to create
specialized LogRecords.
"""
rv = _logRecordFactory(name, level, fn, lno, msg, args, exc_info, func,
sinfo)
if extra is not None:
for key in extra:
if (key in ["message", "asctime"]) or (key in rv.__dict__):
raise KeyError("Attempt to overwrite %r in LogRecord" % key)
rv.__dict__[key] = extra[key]
return rv
def _log(self, level, msg, args, exc_info=None, extra=None, stack_info=False):
"""
Low-level logging routine which creates a LogRecord and then calls
all the handlers of this logger to handle the record.
"""
sinfo = None
if _srcfile:
#IronPython doesn't track Python frames, so findCaller raises an
#exception on some versions of IronPython. We trap it here so that
#IronPython can use logging.
try:
fn, lno, func, sinfo = self.findCaller(stack_info)
except ValueError: # pragma: no cover
fn, lno, func = "(unknown file)", 0, "(unknown function)"
else: # pragma: no cover
fn, lno, func = "(unknown file)", 0, "(unknown function)"
if exc_info:
if not isinstance(exc_info, tuple):
exc_info = sys.exc_info()
record = self.makeRecord(self.name, level, fn, lno, msg, args,
exc_info, func, extra, sinfo)
self.handle(record)
def handle(self, record):
"""
Call the handlers for the specified record.
This method is used for unpickled records received from a socket, as
well as those created locally. Logger-level filtering is applied.
"""
if (not self.disabled) and self.filter(record):
self.callHandlers(record)
def addHandler(self, hdlr):
"""
Add the specified handler to this logger.
"""
_acquireLock()
try:
if not (hdlr in self.handlers):
self.handlers.append(hdlr)
finally:
_releaseLock()
def removeHandler(self, hdlr):
"""
Remove the specified handler from this logger.
"""
_acquireLock()
try:
if hdlr in self.handlers:
self.handlers.remove(hdlr)
finally:
_releaseLock()
def hasHandlers(self):
"""
See if this logger has any handlers configured.
Loop through all handlers for this logger and its parents in the
logger hierarchy. Return True if a handler was found, else False.
Stop searching up the hierarchy whenever a logger with the "propagate"
attribute set to zero is found - that will be the last logger which
is checked for the existence of handlers.
"""
c = self
rv = False
while c:
if c.handlers:
rv = True
break
if not c.propagate:
break
else:
c = c.parent
return rv
def callHandlers(self, record):
"""
Pass a record to all relevant handlers.
Loop through all handlers for this logger and its parents in the
logger hierarchy. If no handler was found, output a one-off error
message to sys.stderr. Stop searching up the hierarchy whenever a
logger with the "propagate" attribute set to zero is found - that
will be the last logger whose handlers are called.
"""
c = self
found = 0
while c:
for hdlr in c.handlers:
found = found + 1
if record.levelno >= hdlr.level:
hdlr.handle(record)
if not c.propagate:
c = None #break out
else:
c = c.parent
if (found == 0):
if lastResort:
if record.levelno >= lastResort.level:
lastResort.handle(record)
elif raiseExceptions and not self.manager.emittedNoHandlerWarning:
sys.stderr.write("No handlers could be found for logger"
" \"%s\"\n" % self.name)
self.manager.emittedNoHandlerWarning = True
def getEffectiveLevel(self):
"""
Get the effective level for this logger.
Loop through this logger and its parents in the logger hierarchy,
looking for a non-zero logging level. Return the first one found.
"""
logger = self
while logger:
if logger.level:
return logger.level
logger = logger.parent
return NOTSET
def isEnabledFor(self, level):
"""
Is this logger enabled for level 'level'?
"""
if self.manager.disable >= level:
return False
return level >= self.getEffectiveLevel()
def getChild(self, suffix):
"""
Get a logger which is a descendant to this one.
This is a convenience method, such that
logging.getLogger('abc').getChild('def.ghi')
is the same as
logging.getLogger('abc.def.ghi')
It's useful, for example, when the parent logger is named using
__name__ rather than a literal string.
"""
if self.root is not self:
suffix = '.'.join((self.name, suffix))
return self.manager.getLogger(suffix)
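# Illustrative usage, not part of the module: the two lookups below return
# the very same logger object, as described in the getChild() docstring.
#
#     import logging
#     a = logging.getLogger('abc').getChild('def.ghi')
#     b = logging.getLogger('abc.def.ghi')
#     assert a is b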
class RootLogger(Logger):
"""
A root logger is not that different from any other logger, except that
it must have a logging level and there is only one instance of it in
the hierarchy.
"""
def __init__(self, level):
"""
Initialize the logger with the name "root".
"""
Logger.__init__(self, "root", level)
_loggerClass = Logger
class LoggerAdapter(object):
"""
An adapter for loggers which makes it easier to specify contextual
information in logging output.
"""
def __init__(self, logger, extra):
"""
Initialize the adapter with a logger and a dict-like object which
provides contextual information. This constructor signature allows
easy stacking of LoggerAdapters, if so desired.
You can effectively pass keyword arguments as shown in the
following example:
adapter = LoggerAdapter(someLogger, dict(p1=v1, p2="v2"))
"""
self.logger = logger
self.extra = extra
def process(self, msg, kwargs):
"""
Process the logging message and keyword arguments passed in to
a logging call to insert contextual information. You can either
manipulate the message itself, the keyword args or both. Return
the message and kwargs modified (or not) to suit your needs.
Normally, you'll only need to override this one method in a
LoggerAdapter subclass for your specific needs.
"""
kwargs["extra"] = self.extra
return msg, kwargs
#
# Boilerplate convenience methods
#
def debug(self, msg, *args, **kwargs):
"""
Delegate a debug call to the underlying logger.
"""
self.log(DEBUG, msg, *args, **kwargs)
def info(self, msg, *args, **kwargs):
"""
Delegate an info call to the underlying logger.
"""
self.log(INFO, msg, *args, **kwargs)
def warning(self, msg, *args, **kwargs):
"""
Delegate a warning call to the underlying logger.
"""
self.log(WARNING, msg, *args, **kwargs)
def warn(self, msg, *args, **kwargs):
warnings.warn("The 'warn' method is deprecated, "
"use 'warning' instead", DeprecationWarning, 2)
self.warning(msg, *args, **kwargs)
def error(self, msg, *args, **kwargs):
"""
Delegate an error call to the underlying logger.
"""
self.log(ERROR, msg, *args, **kwargs)
def exception(self, msg, *args, **kwargs):
"""
Delegate an exception call to the underlying logger.
"""
kwargs["exc_info"] = True
self.log(ERROR, msg, *args, **kwargs)
def critical(self, msg, *args, **kwargs):
"""
Delegate a critical call to the underlying logger.
"""
self.log(CRITICAL, msg, *args, **kwargs)
def log(self, level, msg, *args, **kwargs):
"""
Delegate a log call to the underlying logger, after adding
contextual information from this adapter instance.
"""
if self.isEnabledFor(level):
msg, kwargs = self.process(msg, kwargs)
self.logger._log(level, msg, args, **kwargs)
def isEnabledFor(self, level):
"""
Is this logger enabled for level 'level'?
"""
if self.logger.manager.disable >= level:
return False
return level >= self.getEffectiveLevel()
def setLevel(self, level):
"""
Set the specified level on the underlying logger.
"""
self.logger.setLevel(level)
def getEffectiveLevel(self):
"""
Get the effective level for the underlying logger.
"""
return self.logger.getEffectiveLevel()
def hasHandlers(self):
"""
See if the underlying logger has any handlers.
"""
return self.logger.hasHandlers()
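# Illustrative sketch, not part of the module: overriding process() is the
# usual way to customise a LoggerAdapter. 'PrefixAdapter' and the 'tag' key
# are hypothetical names used only for this example.
#
#     import logging
#     class PrefixAdapter(logging.LoggerAdapter):
#         def process(self, msg, kwargs):
#             kwargs["extra"] = self.extra
#             return '[%s] %s' % (self.extra['tag'], msg), kwargs
#
#     log = PrefixAdapter(logging.getLogger('example'), {'tag': 'worker-1'})
#     log.warning('queue is %d%% full', 90)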
root = RootLogger(WARNING)
Logger.root = root
Logger.manager = Manager(Logger.root)
#---------------------------------------------------------------------------
# Configuration classes and functions
#---------------------------------------------------------------------------
def basicConfig(**kwargs):
"""
Do basic configuration for the logging system.
This function does nothing if the root logger already has handlers
configured. It is a convenience method intended for use by simple scripts
to do one-shot configuration of the logging package.
The default behaviour is to create a StreamHandler which writes to
sys.stderr, set a formatter using the BASIC_FORMAT format string, and
add the handler to the root logger.
A number of optional keyword arguments may be specified, which can alter
the default behaviour.
filename Specifies that a FileHandler be created, using the specified
filename, rather than a StreamHandler.
filemode Specifies the mode to open the file, if filename is specified
(if filemode is unspecified, it defaults to 'a').
format Use the specified format string for the handler.
datefmt Use the specified date/time format.
style If a format string is specified, use this to specify the
type of format string (possible values '%', '{', '$', for
%-formatting, :meth:`str.format` and :class:`string.Template`
- defaults to '%').
level Set the root logger level to the specified level.
stream Use the specified stream to initialize the StreamHandler. Note
that this argument is incompatible with 'filename' - if both
are present, 'stream' is ignored.
handlers If specified, this should be an iterable of already created
handlers, which will be added to the root logger. Any handler
in the list which does not have a formatter assigned will be
assigned the formatter created in this function.
Note that you could specify a stream created using open(filename, mode)
rather than passing the filename and mode in. However, it should be
remembered that StreamHandler does not close its stream (since it may be
using sys.stdout or sys.stderr), whereas FileHandler closes its stream
when the handler is closed.
.. versionchanged:: 3.2
Added the ``style`` parameter.
.. versionchanged:: 3.3
Added the ``handlers`` parameter. A ``ValueError`` is now thrown for
incompatible arguments (e.g. ``handlers`` specified together with
``filename``/``filemode``, or ``filename``/``filemode`` specified
together with ``stream``, or ``handlers`` specified together with
``stream``).
"""
# Add thread safety in case someone mistakenly calls
# basicConfig() from multiple threads
_acquireLock()
try:
if len(root.handlers) == 0:
handlers = kwargs.pop("handlers", None)
if handlers is None:
if "stream" in kwargs and "filename" in kwargs:
raise ValueError("'stream' and 'filename' should not be "
"specified together")
else:
if "stream" in kwargs or "filename" in kwargs:
raise ValueError("'stream' or 'filename' should not be "
"specified together with 'handlers'")
if handlers is None:
filename = kwargs.pop("filename", None)
mode = kwargs.pop("filemode", 'a')
if filename:
h = FileHandler(filename, mode)
else:
stream = kwargs.pop("stream", None)
h = StreamHandler(stream)
handlers = [h]
dfs = kwargs.pop("datefmt", None)
style = kwargs.pop("style", '%')
if style not in _STYLES:
raise ValueError('Style must be one of: %s' % ','.join(
_STYLES.keys()))
fs = kwargs.pop("format", _STYLES[style][1])
fmt = Formatter(fs, dfs, style)
for h in handlers:
if h.formatter is None:
h.setFormatter(fmt)
root.addHandler(h)
level = kwargs.pop("level", None)
if level is not None:
root.setLevel(level)
if kwargs:
keys = ', '.join(kwargs.keys())
raise ValueError('Unrecognised argument(s): %s' % keys)
finally:
_releaseLock()
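# Illustrative usage, not part of the module: one-shot configuration from a
# simple script, exercising a few of the keyword arguments documented above.
# 'run.log' is an example path.
#
#     import logging
#     logging.basicConfig(filename='run.log', filemode='w',
#                         level=logging.DEBUG,
#                         format='%(asctime)s %(levelname)s %(message)s')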
#---------------------------------------------------------------------------
# Utility functions at module level.
# Basically delegate everything to the root logger.
#---------------------------------------------------------------------------
def getLogger(name=None):
"""
Return a logger with the specified name, creating it if necessary.
If no name is specified, return the root logger.
"""
if name:
return Logger.manager.getLogger(name)
else:
return root
def critical(msg, *args, **kwargs):
"""
Log a message with severity 'CRITICAL' on the root logger. If the logger
has no handlers, call basicConfig() to add a console handler with a
pre-defined format.
"""
if len(root.handlers) == 0:
basicConfig()
root.critical(msg, *args, **kwargs)
fatal = critical
def error(msg, *args, **kwargs):
"""
Log a message with severity 'ERROR' on the root logger. If the logger has
no handlers, call basicConfig() to add a console handler with a pre-defined
format.
"""
if len(root.handlers) == 0:
basicConfig()
root.error(msg, *args, **kwargs)
def exception(msg, *args, **kwargs):
"""
Log a message with severity 'ERROR' on the root logger, with exception
information. If the logger has no handlers, basicConfig() is called to add
a console handler with a pre-defined format.
"""
kwargs['exc_info'] = True
error(msg, *args, **kwargs)
def warning(msg, *args, **kwargs):
"""
Log a message with severity 'WARNING' on the root logger. If the logger has
no handlers, call basicConfig() to add a console handler with a pre-defined
format.
"""
if len(root.handlers) == 0:
basicConfig()
root.warning(msg, *args, **kwargs)
def warn(msg, *args, **kwargs):
warnings.warn("The 'warn' function is deprecated, "
"use 'warning' instead", DeprecationWarning, 2)
warning(msg, *args, **kwargs)
def info(msg, *args, **kwargs):
"""
Log a message with severity 'INFO' on the root logger. If the logger has
no handlers, call basicConfig() to add a console handler with a pre-defined
format.
"""
if len(root.handlers) == 0:
basicConfig()
root.info(msg, *args, **kwargs)
def debug(msg, *args, **kwargs):
"""
Log a message with severity 'DEBUG' on the root logger. If the logger has
no handlers, call basicConfig() to add a console handler with a pre-defined
format.
"""
if len(root.handlers) == 0:
basicConfig()
root.debug(msg, *args, **kwargs)
def log(level, msg, *args, **kwargs):
"""
Log 'msg % args' with the integer severity 'level' on the root logger. If
the logger has no handlers, call basicConfig() to add a console handler
with a pre-defined format.
"""
if len(root.handlers) == 0:
basicConfig()
root.log(level, msg, *args, **kwargs)
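# Illustrative usage, not part of the module: the module-level convenience
# functions above log on the root logger and call basicConfig() on first
# use if no handler has been configured.
#
#     import logging
#     logging.warning('disk at %d%% capacity', 95)  # auto-configures, then logs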
def disable(level):
"""
Disable all logging calls of severity 'level' and below.
"""
root.manager.disable = level
def shutdown(handlerList=_handlerList):
"""
Perform any cleanup actions in the logging system (e.g. flushing
buffers).
Should be called at application exit.
"""
for wr in reversed(handlerList[:]):
#errors might occur, for example, if files are locked
#we just ignore them if raiseExceptions is not set
try:
h = wr()
if h:
try:
h.acquire()
h.flush()
h.close()
except (OSError, ValueError):
# Ignore errors which might be caused
# because handlers have been closed but
# references to them are still around at
# application exit.
pass
finally:
h.release()
except: # ignore everything, as we're shutting down
if raiseExceptions:
raise
#else, swallow
#Let's try to shut down automatically on application exit...
import atexit
atexit.register(shutdown)
# Null handler
class NullHandler(Handler):
"""
This handler does nothing. It's intended to be used to avoid the
"No handlers could be found for logger XXX" one-off warning. This is
important for library code, which may contain code to log events. If a user
of the library does not configure logging, the one-off warning might be
produced; to avoid this, the library developer simply needs to instantiate
a NullHandler and add it to the top-level logger of the library module or
package.
"""
def handle(self, record):
"""Stub."""
def emit(self, record):
"""Stub."""
def createLock(self):
self.lock = None
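# Illustrative usage, not part of the module: the pattern recommended in the
# NullHandler docstring for library authors. 'mylib' is a hypothetical
# package name.
#
#     import logging
#     logging.getLogger('mylib').addHandler(logging.NullHandler())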
# Warnings integration
_warnings_showwarning = None
def _showwarning(message, category, filename, lineno, file=None, line=None):
"""
Implementation of showwarning() which redirects to logging. It will first
check to see if the file parameter is None. If a file is specified, it will
delegate to the original warnings implementation of showwarning. Otherwise,
it will call warnings.formatwarning and will log the resulting string to a
warnings logger named "py.warnings" with level logging.WARNING.
"""
if file is not None:
if _warnings_showwarning is not None:
_warnings_showwarning(message, category, filename, lineno, file, line)
else:
s = warnings.formatwarning(message, category, filename, lineno, line)
logger = getLogger("py.warnings")
if not logger.handlers:
logger.addHandler(NullHandler())
logger.warning("%s", s)
def captureWarnings(capture):
"""
If capture is true, redirect all warnings to the logging package.
If capture is False, ensure that warnings are not redirected to logging
but to their original destinations.
"""
global _warnings_showwarning
if capture:
if _warnings_showwarning is None:
_warnings_showwarning = warnings.showwarning
warnings.showwarning = _showwarning
else:
if _warnings_showwarning is not None:
warnings.showwarning = _warnings_showwarning
_warnings_showwarning = None
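# Illustrative usage, not part of the module: once capture is enabled, output
# from warnings.warn() is logged by the 'py.warnings' logger instead of being
# written to sys.stderr.
#
#     import logging, warnings
#     logging.captureWarnings(True)
#     warnings.warn('deprecated setting')  # logged at WARNING level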
#---------------------------------------------------------------------------
# MSI table schema definitions
#---------------------------------------------------------------------------
from . import Table
_Validation = Table('_Validation')
_Validation.add_field(1,'Table',11552)
_Validation.add_field(2,'Column',11552)
_Validation.add_field(3,'Nullable',3332)
_Validation.add_field(4,'MinValue',4356)
_Validation.add_field(5,'MaxValue',4356)
_Validation.add_field(6,'KeyTable',7679)
_Validation.add_field(7,'KeyColumn',5378)
_Validation.add_field(8,'Category',7456)
_Validation.add_field(9,'Set',7679)
_Validation.add_field(10,'Description',7679)
ActionText = Table('ActionText')
ActionText.add_field(1,'Action',11592)
ActionText.add_field(2,'Description',7936)
ActionText.add_field(3,'Template',7936)
AdminExecuteSequence = Table('AdminExecuteSequence')
AdminExecuteSequence.add_field(1,'Action',11592)
AdminExecuteSequence.add_field(2,'Condition',7679)
AdminExecuteSequence.add_field(3,'Sequence',5378)
Condition = Table('Condition')
Condition.add_field(1,'Feature_',11558)
Condition.add_field(2,'Level',9474)
Condition.add_field(3,'Condition',7679)
AdminUISequence = Table('AdminUISequence')
AdminUISequence.add_field(1,'Action',11592)
AdminUISequence.add_field(2,'Condition',7679)
AdminUISequence.add_field(3,'Sequence',5378)
AdvtExecuteSequence = Table('AdvtExecuteSequence')
AdvtExecuteSequence.add_field(1,'Action',11592)
AdvtExecuteSequence.add_field(2,'Condition',7679)
AdvtExecuteSequence.add_field(3,'Sequence',5378)
AdvtUISequence = Table('AdvtUISequence')
AdvtUISequence.add_field(1,'Action',11592)
AdvtUISequence.add_field(2,'Condition',7679)
AdvtUISequence.add_field(3,'Sequence',5378)
AppId = Table('AppId')
AppId.add_field(1,'AppId',11558)
AppId.add_field(2,'RemoteServerName',7679)
AppId.add_field(3,'LocalService',7679)
AppId.add_field(4,'ServiceParameters',7679)
AppId.add_field(5,'DllSurrogate',7679)
AppId.add_field(6,'ActivateAtStorage',5378)
AppId.add_field(7,'RunAsInteractiveUser',5378)
AppSearch = Table('AppSearch')
AppSearch.add_field(1,'Property',11592)
AppSearch.add_field(2,'Signature_',11592)
Property = Table('Property')
Property.add_field(1,'Property',11592)
Property.add_field(2,'Value',3840)
BBControl = Table('BBControl')
BBControl.add_field(1,'Billboard_',11570)
BBControl.add_field(2,'BBControl',11570)
BBControl.add_field(3,'Type',3378)
BBControl.add_field(4,'X',1282)
BBControl.add_field(5,'Y',1282)
BBControl.add_field(6,'Width',1282)
BBControl.add_field(7,'Height',1282)
BBControl.add_field(8,'Attributes',4356)
BBControl.add_field(9,'Text',7986)
Billboard = Table('Billboard')
Billboard.add_field(1,'Billboard',11570)
Billboard.add_field(2,'Feature_',3366)
Billboard.add_field(3,'Action',7474)
Billboard.add_field(4,'Ordering',5378)
Feature = Table('Feature')
Feature.add_field(1,'Feature',11558)
Feature.add_field(2,'Feature_Parent',7462)
Feature.add_field(3,'Title',8000)
Feature.add_field(4,'Description',8191)
Feature.add_field(5,'Display',5378)
Feature.add_field(6,'Level',1282)
Feature.add_field(7,'Directory_',7496)
Feature.add_field(8,'Attributes',1282)
Binary = Table('Binary')
Binary.add_field(1,'Name',11592)
Binary.add_field(2,'Data',2304)
BindImage = Table('BindImage')
BindImage.add_field(1,'File_',11592)
BindImage.add_field(2,'Path',7679)
File = Table('File')
File.add_field(1,'File',11592)
File.add_field(2,'Component_',3400)
File.add_field(3,'FileName',4095)
File.add_field(4,'FileSize',260)
File.add_field(5,'Version',7496)
File.add_field(6,'Language',7444)
File.add_field(7,'Attributes',5378)
File.add_field(8,'Sequence',1282)
CCPSearch = Table('CCPSearch')
CCPSearch.add_field(1,'Signature_',11592)
CheckBox = Table('CheckBox')
CheckBox.add_field(1,'Property',11592)
CheckBox.add_field(2,'Value',7488)
Class = Table('Class')
Class.add_field(1,'CLSID',11558)
Class.add_field(2,'Context',11552)
Class.add_field(3,'Component_',11592)
Class.add_field(4,'ProgId_Default',7679)
Class.add_field(5,'Description',8191)
Class.add_field(6,'AppId_',7462)
Class.add_field(7,'FileTypeMask',7679)
Class.add_field(8,'Icon_',7496)
Class.add_field(9,'IconIndex',5378)
Class.add_field(10,'DefInprocHandler',7456)
Class.add_field(11,'Argument',7679)
Class.add_field(12,'Feature_',3366)
Class.add_field(13,'Attributes',5378)
Component = Table('Component')
Component.add_field(1,'Component',11592)
Component.add_field(2,'ComponentId',7462)
Component.add_field(3,'Directory_',3400)
Component.add_field(4,'Attributes',1282)
Component.add_field(5,'Condition',7679)
Component.add_field(6,'KeyPath',7496)
Icon = Table('Icon')
Icon.add_field(1,'Name',11592)
Icon.add_field(2,'Data',2304)
ProgId = Table('ProgId')
ProgId.add_field(1,'ProgId',11775)
ProgId.add_field(2,'ProgId_Parent',7679)
ProgId.add_field(3,'Class_',7462)
ProgId.add_field(4,'Description',8191)
ProgId.add_field(5,'Icon_',7496)
ProgId.add_field(6,'IconIndex',5378)
ComboBox = Table('ComboBox')
ComboBox.add_field(1,'Property',11592)
ComboBox.add_field(2,'Order',9474)
ComboBox.add_field(3,'Value',3392)
ComboBox.add_field(4,'Text',8000)
CompLocator = Table('CompLocator')
CompLocator.add_field(1,'Signature_',11592)
CompLocator.add_field(2,'ComponentId',3366)
CompLocator.add_field(3,'Type',5378)
Complus = Table('Complus')
Complus.add_field(1,'Component_',11592)
Complus.add_field(2,'ExpType',13570)
Directory = Table('Directory')
Directory.add_field(1,'Directory',11592)
Directory.add_field(2,'Directory_Parent',7496)
Directory.add_field(3,'DefaultDir',4095)
Control = Table('Control')
Control.add_field(1,'Dialog_',11592)
Control.add_field(2,'Control',11570)
Control.add_field(3,'Type',3348)
Control.add_field(4,'X',1282)
Control.add_field(5,'Y',1282)
Control.add_field(6,'Width',1282)
Control.add_field(7,'Height',1282)
Control.add_field(8,'Attributes',4356)
Control.add_field(9,'Property',7474)
Control.add_field(10,'Text',7936)
Control.add_field(11,'Control_Next',7474)
Control.add_field(12,'Help',7986)
Dialog = Table('Dialog')
Dialog.add_field(1,'Dialog',11592)
Dialog.add_field(2,'HCentering',1282)
Dialog.add_field(3,'VCentering',1282)
Dialog.add_field(4,'Width',1282)
Dialog.add_field(5,'Height',1282)
Dialog.add_field(6,'Attributes',4356)
Dialog.add_field(7,'Title',8064)
Dialog.add_field(8,'Control_First',3378)
Dialog.add_field(9,'Control_Default',7474)
Dialog.add_field(10,'Control_Cancel',7474)
ControlCondition = Table('ControlCondition')
ControlCondition.add_field(1,'Dialog_',11592)
ControlCondition.add_field(2,'Control_',11570)
ControlCondition.add_field(3,'Action',11570)
ControlCondition.add_field(4,'Condition',11775)
ControlEvent = Table('ControlEvent')
ControlEvent.add_field(1,'Dialog_',11592)
ControlEvent.add_field(2,'Control_',11570)
ControlEvent.add_field(3,'Event',11570)
ControlEvent.add_field(4,'Argument',11775)
ControlEvent.add_field(5,'Condition',15871)
ControlEvent.add_field(6,'Ordering',5378)
CreateFolder = Table('CreateFolder')
CreateFolder.add_field(1,'Directory_',11592)
CreateFolder.add_field(2,'Component_',11592)
CustomAction = Table('CustomAction')
CustomAction.add_field(1,'Action',11592)
CustomAction.add_field(2,'Type',1282)
CustomAction.add_field(3,'Source',7496)
CustomAction.add_field(4,'Target',7679)
DrLocator = Table('DrLocator')
DrLocator.add_field(1,'Signature_',11592)
DrLocator.add_field(2,'Parent',15688)
DrLocator.add_field(3,'Path',15871)
DrLocator.add_field(4,'Depth',5378)
DuplicateFile = Table('DuplicateFile')
DuplicateFile.add_field(1,'FileKey',11592)
DuplicateFile.add_field(2,'Component_',3400)
DuplicateFile.add_field(3,'File_',3400)
DuplicateFile.add_field(4,'DestName',8191)
DuplicateFile.add_field(5,'DestFolder',7496)
Environment = Table('Environment')
Environment.add_field(1,'Environment',11592)
Environment.add_field(2,'Name',4095)
Environment.add_field(3,'Value',8191)
Environment.add_field(4,'Component_',3400)
Error = Table('Error')
Error.add_field(1,'Error',9474)
Error.add_field(2,'Message',7936)
EventMapping = Table('EventMapping')
EventMapping.add_field(1,'Dialog_',11592)
EventMapping.add_field(2,'Control_',11570)
EventMapping.add_field(3,'Event',11570)
EventMapping.add_field(4,'Attribute',3378)
Extension = Table('Extension')
Extension.add_field(1,'Extension',11775)
Extension.add_field(2,'Component_',11592)
Extension.add_field(3,'ProgId_',7679)
Extension.add_field(4,'MIME_',7488)
Extension.add_field(5,'Feature_',3366)
MIME = Table('MIME')
MIME.add_field(1,'ContentType',11584)
MIME.add_field(2,'Extension_',3583)
MIME.add_field(3,'CLSID',7462)
FeatureComponents = Table('FeatureComponents')
FeatureComponents.add_field(1,'Feature_',11558)
FeatureComponents.add_field(2,'Component_',11592)
FileSFPCatalog = Table('FileSFPCatalog')
FileSFPCatalog.add_field(1,'File_',11592)
FileSFPCatalog.add_field(2,'SFPCatalog_',11775)
SFPCatalog = Table('SFPCatalog')
SFPCatalog.add_field(1,'SFPCatalog',11775)
SFPCatalog.add_field(2,'Catalog',2304)
SFPCatalog.add_field(3,'Dependency',7424)
Font = Table('Font')
Font.add_field(1,'File_',11592)
Font.add_field(2,'FontTitle',7552)
IniFile = Table('IniFile')
IniFile.add_field(1,'IniFile',11592)
IniFile.add_field(2,'FileName',4095)
IniFile.add_field(3,'DirProperty',7496)
IniFile.add_field(4,'Section',3936)
IniFile.add_field(5,'Key',3968)
IniFile.add_field(6,'Value',4095)
IniFile.add_field(7,'Action',1282)
IniFile.add_field(8,'Component_',3400)
IniLocator = Table('IniLocator')
IniLocator.add_field(1,'Signature_',11592)
IniLocator.add_field(2,'FileName',3583)
IniLocator.add_field(3,'Section',3424)
IniLocator.add_field(4,'Key',3456)
IniLocator.add_field(5,'Field',5378)
IniLocator.add_field(6,'Type',5378)
InstallExecuteSequence = Table('InstallExecuteSequence')
InstallExecuteSequence.add_field(1,'Action',11592)
InstallExecuteSequence.add_field(2,'Condition',7679)
InstallExecuteSequence.add_field(3,'Sequence',5378)
InstallUISequence = Table('InstallUISequence')
InstallUISequence.add_field(1,'Action',11592)
InstallUISequence.add_field(2,'Condition',7679)
InstallUISequence.add_field(3,'Sequence',5378)
IsolatedComponent = Table('IsolatedComponent')
IsolatedComponent.add_field(1,'Component_Shared',11592)
IsolatedComponent.add_field(2,'Component_Application',11592)
LaunchCondition = Table('LaunchCondition')
LaunchCondition.add_field(1,'Condition',11775)
LaunchCondition.add_field(2,'Description',4095)
ListBox = Table('ListBox')
ListBox.add_field(1,'Property',11592)
ListBox.add_field(2,'Order',9474)
ListBox.add_field(3,'Value',3392)
ListBox.add_field(4,'Text',8000)
ListView = Table('ListView')
ListView.add_field(1,'Property',11592)
ListView.add_field(2,'Order',9474)
ListView.add_field(3,'Value',3392)
ListView.add_field(4,'Text',8000)
ListView.add_field(5,'Binary_',7496)
LockPermissions = Table('LockPermissions')
LockPermissions.add_field(1,'LockObject',11592)
LockPermissions.add_field(2,'Table',11552)
LockPermissions.add_field(3,'Domain',15871)
LockPermissions.add_field(4,'User',11775)
LockPermissions.add_field(5,'Permission',4356)
Media = Table('Media')
Media.add_field(1,'DiskId',9474)
Media.add_field(2,'LastSequence',1282)
Media.add_field(3,'DiskPrompt',8000)
Media.add_field(4,'Cabinet',7679)
Media.add_field(5,'VolumeLabel',7456)
Media.add_field(6,'Source',7496)
MoveFile = Table('MoveFile')
MoveFile.add_field(1,'FileKey',11592)
MoveFile.add_field(2,'Component_',3400)
MoveFile.add_field(3,'SourceName',8191)
MoveFile.add_field(4,'DestName',8191)
MoveFile.add_field(5,'SourceFolder',7496)
MoveFile.add_field(6,'DestFolder',3400)
MoveFile.add_field(7,'Options',1282)
MsiAssembly = Table('MsiAssembly')
MsiAssembly.add_field(1,'Component_',11592)
MsiAssembly.add_field(2,'Feature_',3366)
MsiAssembly.add_field(3,'File_Manifest',7496)
MsiAssembly.add_field(4,'File_Application',7496)
MsiAssembly.add_field(5,'Attributes',5378)
MsiAssemblyName = Table('MsiAssemblyName')
MsiAssemblyName.add_field(1,'Component_',11592)
MsiAssemblyName.add_field(2,'Name',11775)
MsiAssemblyName.add_field(3,'Value',3583)
MsiDigitalCertificate = Table('MsiDigitalCertificate')
MsiDigitalCertificate.add_field(1,'DigitalCertificate',11592)
MsiDigitalCertificate.add_field(2,'CertData',2304)
MsiDigitalSignature = Table('MsiDigitalSignature')
MsiDigitalSignature.add_field(1,'Table',11552)
MsiDigitalSignature.add_field(2,'SignObject',11592)
MsiDigitalSignature.add_field(3,'DigitalCertificate_',3400)
MsiDigitalSignature.add_field(4,'Hash',6400)
MsiFileHash = Table('MsiFileHash')
MsiFileHash.add_field(1,'File_',11592)
MsiFileHash.add_field(2,'Options',1282)
MsiFileHash.add_field(3,'HashPart1',260)
MsiFileHash.add_field(4,'HashPart2',260)
MsiFileHash.add_field(5,'HashPart3',260)
MsiFileHash.add_field(6,'HashPart4',260)
MsiPatchHeaders = Table('MsiPatchHeaders')
MsiPatchHeaders.add_field(1,'StreamRef',11558)
MsiPatchHeaders.add_field(2,'Header',2304)
ODBCAttribute = Table('ODBCAttribute')
ODBCAttribute.add_field(1,'Driver_',11592)
ODBCAttribute.add_field(2,'Attribute',11560)
ODBCAttribute.add_field(3,'Value',8191)
ODBCDriver = Table('ODBCDriver')
ODBCDriver.add_field(1,'Driver',11592)
ODBCDriver.add_field(2,'Component_',3400)
ODBCDriver.add_field(3,'Description',3583)
ODBCDriver.add_field(4,'File_',3400)
ODBCDriver.add_field(5,'File_Setup',7496)
ODBCDataSource = Table('ODBCDataSource')
ODBCDataSource.add_field(1,'DataSource',11592)
ODBCDataSource.add_field(2,'Component_',3400)
ODBCDataSource.add_field(3,'Description',3583)
ODBCDataSource.add_field(4,'DriverDescription',3583)
ODBCDataSource.add_field(5,'Registration',1282)
ODBCSourceAttribute = Table('ODBCSourceAttribute')
ODBCSourceAttribute.add_field(1,'DataSource_',11592)
ODBCSourceAttribute.add_field(2,'Attribute',11552)
ODBCSourceAttribute.add_field(3,'Value',8191)
ODBCTranslator = Table('ODBCTranslator')
ODBCTranslator.add_field(1,'Translator',11592)
ODBCTranslator.add_field(2,'Component_',3400)
ODBCTranslator.add_field(3,'Description',3583)
ODBCTranslator.add_field(4,'File_',3400)
ODBCTranslator.add_field(5,'File_Setup',7496)
Patch = Table('Patch')
Patch.add_field(1,'File_',11592)
Patch.add_field(2,'Sequence',9474)
Patch.add_field(3,'PatchSize',260)
Patch.add_field(4,'Attributes',1282)
Patch.add_field(5,'Header',6400)
Patch.add_field(6,'StreamRef_',7462)
PatchPackage = Table('PatchPackage')
PatchPackage.add_field(1,'PatchId',11558)
PatchPackage.add_field(2,'Media_',1282)
PublishComponent = Table('PublishComponent')
PublishComponent.add_field(1,'ComponentId',11558)
PublishComponent.add_field(2,'Qualifier',11775)
PublishComponent.add_field(3,'Component_',11592)
PublishComponent.add_field(4,'AppData',8191)
PublishComponent.add_field(5,'Feature_',3366)
RadioButton = Table('RadioButton')
RadioButton.add_field(1,'Property',11592)
RadioButton.add_field(2,'Order',9474)
RadioButton.add_field(3,'Value',3392)
RadioButton.add_field(4,'X',1282)
RadioButton.add_field(5,'Y',1282)
RadioButton.add_field(6,'Width',1282)
RadioButton.add_field(7,'Height',1282)
RadioButton.add_field(8,'Text',8000)
RadioButton.add_field(9,'Help',7986)
Registry = Table('Registry')
Registry.add_field(1,'Registry',11592)
Registry.add_field(2,'Root',1282)
Registry.add_field(3,'Key',4095)
Registry.add_field(4,'Name',8191)
Registry.add_field(5,'Value',7936)
Registry.add_field(6,'Component_',3400)
RegLocator = Table('RegLocator')
RegLocator.add_field(1,'Signature_',11592)
RegLocator.add_field(2,'Root',1282)
RegLocator.add_field(3,'Key',3583)
RegLocator.add_field(4,'Name',7679)
RegLocator.add_field(5,'Type',5378)
RemoveFile = Table('RemoveFile')
RemoveFile.add_field(1,'FileKey',11592)
RemoveFile.add_field(2,'Component_',3400)
RemoveFile.add_field(3,'FileName',8191)
RemoveFile.add_field(4,'DirProperty',3400)
RemoveFile.add_field(5,'InstallMode',1282)
RemoveIniFile = Table('RemoveIniFile')
RemoveIniFile.add_field(1,'RemoveIniFile',11592)
RemoveIniFile.add_field(2,'FileName',4095)
RemoveIniFile.add_field(3,'DirProperty',7496)
RemoveIniFile.add_field(4,'Section',3936)
RemoveIniFile.add_field(5,'Key',3968)
RemoveIniFile.add_field(6,'Value',8191)
RemoveIniFile.add_field(7,'Action',1282)
RemoveIniFile.add_field(8,'Component_',3400)
RemoveRegistry = Table('RemoveRegistry')
RemoveRegistry.add_field(1,'RemoveRegistry',11592)
RemoveRegistry.add_field(2,'Root',1282)
RemoveRegistry.add_field(3,'Key',4095)
RemoveRegistry.add_field(4,'Name',8191)
RemoveRegistry.add_field(5,'Component_',3400)
ReserveCost = Table('ReserveCost')
ReserveCost.add_field(1,'ReserveKey',11592)
ReserveCost.add_field(2,'Component_',3400)
ReserveCost.add_field(3,'ReserveFolder',7496)
ReserveCost.add_field(4,'ReserveLocal',260)
ReserveCost.add_field(5,'ReserveSource',260)
SelfReg = Table('SelfReg')
SelfReg.add_field(1,'File_',11592)
SelfReg.add_field(2,'Cost',5378)
ServiceControl = Table('ServiceControl')
ServiceControl.add_field(1,'ServiceControl',11592)
ServiceControl.add_field(2,'Name',4095)
ServiceControl.add_field(3,'Event',1282)
ServiceControl.add_field(4,'Arguments',8191)
ServiceControl.add_field(5,'Wait',5378)
ServiceControl.add_field(6,'Component_',3400)
ServiceInstall = Table('ServiceInstall')
ServiceInstall.add_field(1,'ServiceInstall',11592)
ServiceInstall.add_field(2,'Name',3583)
ServiceInstall.add_field(3,'DisplayName',8191)
ServiceInstall.add_field(4,'ServiceType',260)
ServiceInstall.add_field(5,'StartType',260)
ServiceInstall.add_field(6,'ErrorControl',260)
ServiceInstall.add_field(7,'LoadOrderGroup',7679)
ServiceInstall.add_field(8,'Dependencies',7679)
ServiceInstall.add_field(9,'StartName',7679)
ServiceInstall.add_field(10,'Password',7679)
ServiceInstall.add_field(11,'Arguments',7679)
ServiceInstall.add_field(12,'Component_',3400)
ServiceInstall.add_field(13,'Description',8191)
Shortcut = Table('Shortcut')
Shortcut.add_field(1,'Shortcut',11592)
Shortcut.add_field(2,'Directory_',3400)
Shortcut.add_field(3,'Name',3968)
Shortcut.add_field(4,'Component_',3400)
Shortcut.add_field(5,'Target',3400)
Shortcut.add_field(6,'Arguments',7679)
Shortcut.add_field(7,'Description',8191)
Shortcut.add_field(8,'Hotkey',5378)
Shortcut.add_field(9,'Icon_',7496)
Shortcut.add_field(10,'IconIndex',5378)
Shortcut.add_field(11,'ShowCmd',5378)
Shortcut.add_field(12,'WkDir',7496)
Signature = Table('Signature')
Signature.add_field(1,'Signature',11592)
Signature.add_field(2,'FileName',3583)
Signature.add_field(3,'MinVersion',7444)
Signature.add_field(4,'MaxVersion',7444)
Signature.add_field(5,'MinSize',4356)
Signature.add_field(6,'MaxSize',4356)
Signature.add_field(7,'MinDate',4356)
Signature.add_field(8,'MaxDate',4356)
Signature.add_field(9,'Languages',7679)
TextStyle = Table('TextStyle')
TextStyle.add_field(1,'TextStyle',11592)
TextStyle.add_field(2,'FaceName',3360)
TextStyle.add_field(3,'Size',1282)
TextStyle.add_field(4,'Color',4356)
TextStyle.add_field(5,'StyleBits',5378)
TypeLib = Table('TypeLib')
TypeLib.add_field(1,'LibID',11558)
TypeLib.add_field(2,'Language',9474)
TypeLib.add_field(3,'Component_',11592)
TypeLib.add_field(4,'Version',4356)
TypeLib.add_field(5,'Description',8064)
TypeLib.add_field(6,'Directory_',7496)
TypeLib.add_field(7,'Feature_',3366)
TypeLib.add_field(8,'Cost',4356)
UIText = Table('UIText')
UIText.add_field(1,'Key',11592)
UIText.add_field(2,'Text',8191)
Upgrade = Table('Upgrade')
Upgrade.add_field(1,'UpgradeCode',11558)
Upgrade.add_field(2,'VersionMin',15636)
Upgrade.add_field(3,'VersionMax',15636)
Upgrade.add_field(4,'Language',15871)
Upgrade.add_field(5,'Attributes',8452)
Upgrade.add_field(6,'Remove',7679)
Upgrade.add_field(7,'ActionProperty',3400)
Verb = Table('Verb')
Verb.add_field(1,'Extension_',11775)
Verb.add_field(2,'Verb',11552)
Verb.add_field(3,'Sequence',5378)
Verb.add_field(4,'Command',8191)
Verb.add_field(5,'Argument',8191)
tables=[_Validation, ActionText, AdminExecuteSequence, Condition, AdminUISequence, AdvtExecuteSequence, AdvtUISequence, AppId, AppSearch, Property, BBControl, Billboard, Feature, Binary, BindImage, File, CCPSearch, CheckBox, Class, Component, Icon, ProgId, ComboBox, CompLocator, Complus, Directory, Control, Dialog, ControlCondition, ControlEvent, CreateFolder, CustomAction, DrLocator, DuplicateFile, Environment, Error, EventMapping, Extension, MIME, FeatureComponents, FileSFPCatalog, SFPCatalog, Font, IniFile, IniLocator, InstallExecuteSequence, InstallUISequence, IsolatedComponent, LaunchCondition, ListBox, ListView, LockPermissions, Media, MoveFile, MsiAssembly, MsiAssemblyName, MsiDigitalCertificate, MsiDigitalSignature, MsiFileHash, MsiPatchHeaders, ODBCAttribute, ODBCDriver, ODBCDataSource, ODBCSourceAttribute, ODBCTranslator, Patch, PatchPackage, PublishComponent, RadioButton, Registry, RegLocator, RemoveFile, RemoveIniFile, RemoveRegistry, ReserveCost, SelfReg, ServiceControl, ServiceInstall, Shortcut, Signature, TextStyle, TypeLib, UIText, Upgrade, Verb]
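# Illustrative usage, not part of the module: a schema module such as this
# one is what msilib consumes when creating a fresh MSI database (Windows
# only). The product name, code, version and manufacturer are example data.
#
#     import msilib
#     from msilib import schema
#     db = msilib.init_database('example.msi', schema, 'ExampleProduct',
#                               msilib.gen_uuid(), '1.0.0', 'Example Corp')
#     db.Commit()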
_Validation_records = [
('_Validation','Table','N',None, None, None, None, 'Identifier',None, 'Name of table',),
('_Validation','Column','N',None, None, None, None, 'Identifier',None, 'Name of column',),
('_Validation','Description','Y',None, None, None, None, 'Text',None, 'Description of column',),
('_Validation','Set','Y',None, None, None, None, 'Text',None, 'Set of values that are permitted',),
('_Validation','Category','Y',None, None, None, None, None, 'Text;Formatted;Template;Condition;Guid;Path;Version;Language;Identifier;Binary;UpperCase;LowerCase;Filename;Paths;AnyPath;WildCardFilename;RegPath;KeyFormatted;CustomSource;Property;Cabinet;Shortcut;URL','String category',),
('_Validation','KeyColumn','Y',1,32,None, None, None, None, 'Column to which foreign key connects',),
('_Validation','KeyTable','Y',None, None, None, None, 'Identifier',None, 'For foreign key, Name of table to which data must link',),
('_Validation','MaxValue','Y',-2147483647,2147483647,None, None, None, None, 'Maximum value allowed',),
('_Validation','MinValue','Y',-2147483647,2147483647,None, None, None, None, 'Minimum value allowed',),
('_Validation','Nullable','N',None, None, None, None, None, 'Y;N;@','Whether the column is nullable',),
('ActionText','Description','Y',None, None, None, None, 'Text',None, 'Localized description displayed in progress dialog and log when action is executing.',),
('ActionText','Action','N',None, None, None, None, 'Identifier',None, 'Name of action to be described.',),
('ActionText','Template','Y',None, None, None, None, 'Template',None, 'Optional localized format template used to format action data records for display during action execution.',),
('AdminExecuteSequence','Action','N',None, None, None, None, 'Identifier',None, 'Name of action to invoke, either in the engine or the handler DLL.',),
('AdminExecuteSequence','Condition','Y',None, None, None, None, 'Condition',None, 'Optional expression which skips the action if it evaluates to expFalse. If the expression syntax is invalid, the engine will terminate, returning iesBadActionData.',),
('AdminExecuteSequence','Sequence','Y',-4,32767,None, None, None, None, 'Number that determines the sort order in which the actions are to be executed. Leave blank to suppress action.',),
('Condition','Condition','Y',None, None, None, None, 'Condition',None, 'Expression evaluated to determine if Level in the Feature table is to change.',),
('Condition','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'Reference to a Feature entry in Feature table.',),
('Condition','Level','N',0,32767,None, None, None, None, 'New selection Level to set in Feature table if Condition evaluates to TRUE.',),
('AdminUISequence','Action','N',None, None, None, None, 'Identifier',None, 'Name of action to invoke, either in the engine or the handler DLL.',),
('AdminUISequence','Condition','Y',None, None, None, None, 'Condition',None, 'Optional expression which skips the action if it evaluates to expFalse. If the expression syntax is invalid, the engine will terminate, returning iesBadActionData.',),
('AdminUISequence','Sequence','Y',-4,32767,None, None, None, None, 'Number that determines the sort order in which the actions are to be executed. Leave blank to suppress action.',),
('AdvtExecuteSequence','Action','N',None, None, None, None, 'Identifier',None, 'Name of action to invoke, either in the engine or the handler DLL.',),
('AdvtExecuteSequence','Condition','Y',None, None, None, None, 'Condition',None, 'Optional expression which skips the action if it evaluates to expFalse. If the expression syntax is invalid, the engine will terminate, returning iesBadActionData.',),
('AdvtExecuteSequence','Sequence','Y',-4,32767,None, None, None, None, 'Number that determines the sort order in which the actions are to be executed. Leave blank to suppress action.',),
('AdvtUISequence','Action','N',None, None, None, None, 'Identifier',None, 'Name of action to invoke, either in the engine or the handler DLL.',),
('AdvtUISequence','Condition','Y',None, None, None, None, 'Condition',None, 'Optional expression which skips the action if it evaluates to expFalse. If the expression syntax is invalid, the engine will terminate, returning iesBadActionData.',),
('AdvtUISequence','Sequence','Y',-4,32767,None, None, None, None, 'Number that determines the sort order in which the actions are to be executed. Leave blank to suppress action.',),
('AppId','AppId','N',None, None, None, None, 'Guid',None, None, ),
('AppId','ActivateAtStorage','Y',0,1,None, None, None, None, None, ),
('AppId','DllSurrogate','Y',None, None, None, None, 'Text',None, None, ),
('AppId','LocalService','Y',None, None, None, None, 'Text',None, None, ),
('AppId','RemoteServerName','Y',None, None, None, None, 'Formatted',None, None, ),
('AppId','RunAsInteractiveUser','Y',0,1,None, None, None, None, None, ),
('AppId','ServiceParameters','Y',None, None, None, None, 'Text',None, None, ),
('AppSearch','Property','N',None, None, None, None, 'Identifier',None, 'The property associated with a Signature',),
('AppSearch','Signature_','N',None, None, 'Signature;RegLocator;IniLocator;DrLocator;CompLocator',1,'Identifier',None, 'The Signature_ represents a unique file signature and is also the foreign key in the Signature, RegLocator, IniLocator, CompLocator and the DrLocator tables.',),
('Property','Property','N',None, None, None, None, 'Identifier',None, 'Name of property, uppercase if settable by launcher or loader.',),
('Property','Value','N',None, None, None, None, 'Text',None, 'String value for property. Never null or empty.',),
('BBControl','Type','N',None, None, None, None, 'Identifier',None, 'The type of the control.',),
('BBControl','Y','N',0,32767,None, None, None, None, 'Vertical coordinate of the upper left corner of the bounding rectangle of the control.',),
('BBControl','Text','Y',None, None, None, None, 'Text',None, 'A string used to set the initial text contained within a control (if appropriate).',),
('BBControl','BBControl','N',None, None, None, None, 'Identifier',None, 'Name of the control. This name must be unique within a billboard, but can repeat on different billboard.',),
('BBControl','Attributes','Y',0,2147483647,None, None, None, None, 'A 32-bit word that specifies the attribute flags to be applied to this control.',),
('BBControl','Billboard_','N',None, None, 'Billboard',1,'Identifier',None, 'External key to the Billboard table, name of the billboard.',),
('BBControl','Height','N',0,32767,None, None, None, None, 'Height of the bounding rectangle of the control.',),
('BBControl','Width','N',0,32767,None, None, None, None, 'Width of the bounding rectangle of the control.',),
('BBControl','X','N',0,32767,None, None, None, None, 'Horizontal coordinate of the upper left corner of the bounding rectangle of the control.',),
('Billboard','Action','Y',None, None, None, None, 'Identifier',None, 'The name of an action. The billboard is displayed during the progress messages received from this action.',),
('Billboard','Billboard','N',None, None, None, None, 'Identifier',None, 'Name of the billboard.',),
('Billboard','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'An external key to the Feature Table. The billboard is shown only if this feature is being installed.',),
('Billboard','Ordering','Y',0,32767,None, None, None, None, 'A positive integer. If there is more than one billboard corresponding to an action they will be shown in the order defined by this column.',),
('Feature','Description','Y',None, None, None, None, 'Text',None, 'Longer descriptive text describing a visible feature item.',),
('Feature','Attributes','N',None, None, None, None, None, '0;1;2;4;5;6;8;9;10;16;17;18;20;21;22;24;25;26;32;33;34;36;37;38;48;49;50;52;53;54','Feature attributes',),
('Feature','Feature','N',None, None, None, None, 'Identifier',None, 'Primary key used to identify a particular feature record.',),
('Feature','Directory_','Y',None, None, 'Directory',1,'UpperCase',None, 'The name of the Directory that can be configured by the UI. A non-null value will enable the browse button.',),
('Feature','Level','N',0,32767,None, None, None, None, 'The install level at which record will be initially selected. An install level of 0 will disable an item and prevent its display.',),
('Feature','Title','Y',None, None, None, None, 'Text',None, 'Short text identifying a visible feature item.',),
('Feature','Display','Y',0,32767,None, None, None, None, 'Numeric sort order, used to force a specific display ordering.',),
('Feature','Feature_Parent','Y',None, None, 'Feature',1,'Identifier',None, 'Optional key of a parent record in the same table. If the parent is not selected, then the record will not be installed. Null indicates a root item.',),
('Binary','Name','N',None, None, None, None, 'Identifier',None, 'Unique key identifying the binary data.',),
('Binary','Data','N',None, None, None, None, 'Binary',None, 'The unformatted binary data.',),
('BindImage','File_','N',None, None, 'File',1,'Identifier',None, 'The index into the File table. This must be an executable file.',),
('BindImage','Path','Y',None, None, None, None, 'Paths',None, 'A list of ; delimited paths that represent the paths to be searched for the import DLLS. The list is usually a list of properties each enclosed within square brackets [] .',),
('File','Sequence','N',1,32767,None, None, None, None, 'Sequence with respect to the media images; order must track cabinet order.',),
('File','Attributes','Y',0,32767,None, None, None, None, 'Integer containing bit flags representing file attributes (with the decimal value of each bit position in parentheses)',),
('File','File','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token, must match identifier in cabinet. For uncompressed files, this field is ignored.',),
('File','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key referencing Component that controls the file.',),
('File','FileName','N',None, None, None, None, 'Filename',None, 'File name used for installation, may be localized. This may contain a "short name|long name" pair.',),
('File','FileSize','N',0,2147483647,None, None, None, None, 'Size of file in bytes (integer).',),
('File','Language','Y',None, None, None, None, 'Language',None, 'List of decimal language Ids, comma-separated if more than one.',),
('File','Version','Y',None, None, 'File',1,'Version',None, 'Version string for versioned files; Blank for unversioned files.',),
('CCPSearch','Signature_','N',None, None, 'Signature;RegLocator;IniLocator;DrLocator;CompLocator',1,'Identifier',None, 'The Signature_ represents a unique file signature and is also the foreign key in the Signature, RegLocator, IniLocator, CompLocator and the DrLocator tables.',),
('CheckBox','Property','N',None, None, None, None, 'Identifier',None, 'A named property to be tied to the item.',),
('CheckBox','Value','Y',None, None, None, None, 'Formatted',None, 'The value string associated with the item.',),
('Class','Description','Y',None, None, None, None, 'Text',None, 'Localized description for the Class.',),
('Class','Attributes','Y',None, 32767,None, None, None, None, 'Class registration attributes.',),
('Class','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'Required foreign key into the Feature Table, specifying the feature to validate or install in order for the CLSID factory to be operational.',),
('Class','AppId_','Y',None, None, 'AppId',1,'Guid',None, 'Optional AppID containing DCOM information for associated application (string GUID).',),
('Class','Argument','Y',None, None, None, None, 'Formatted',None, 'Optional argument for LocalServers.',),
('Class','CLSID','N',None, None, None, None, 'Guid',None, 'The CLSID of an OLE factory.',),
('Class','Component_','N',None, None, 'Component',1,'Identifier',None, 'Required foreign key into the Component Table, specifying the component for which to return a path when called through LocateComponent.',),
('Class','Context','N',None, None, None, None, 'Identifier',None, 'The numeric server context for this server. CLSCTX_xxxx',),
('Class','DefInprocHandler','Y',None, None, None, None, 'Filename','1;2;3','Optional default inproc handler. Provided only if Context=CLSCTX_LOCAL_SERVER. Typically "ole32.dll" or "mapi32.dll".',),
('Class','FileTypeMask','Y',None, None, None, None, 'Text',None, 'Optional string containing information for the HKCR\\FileType\\(this CLSID) key. If multiple patterns exist, they must be delimited by a semicolon, and numeric subkeys will be generated: 0,1,2...',),
('Class','Icon_','Y',None, None, 'Icon',1,'Identifier',None, 'Optional foreign key into the Icon Table, specifying the icon file associated with this CLSID. Will be written under the DefaultIcon key.',),
('Class','IconIndex','Y',-32767,32767,None, None, None, None, 'Optional icon index.',),
('Class','ProgId_Default','Y',None, None, 'ProgId',1,'Text',None, 'Optional ProgId associated with this CLSID.',),
('Component','Condition','Y',None, None, None, None, 'Condition',None, "A conditional statement that will disable this component if the specified condition evaluates to the 'True' state. If a component is disabled, it will not be installed, regardless of the 'Action' state associated with the component.",),
('Component','Attributes','N',None, None, None, None, None, None, 'Remote execution option, one of irsEnum',),
('Component','Component','N',None, None, None, None, 'Identifier',None, 'Primary key used to identify a particular component record.',),
('Component','ComponentId','Y',None, None, None, None, 'Guid',None, 'A string GUID unique to this component, version, and language.',),
('Component','Directory_','N',None, None, 'Directory',1,'Identifier',None, 'Required key of a Directory table record. This is actually a property name whose value contains the actual path, set either by the AppSearch action or with the default setting obtained from the Directory table.',),
('Component','KeyPath','Y',None, None, 'File;Registry;ODBCDataSource',1,'Identifier',None, 'Either the primary key into the File table, Registry table, or ODBCDataSource table. This exact path is stored when the component is installed, and is used to detect the presence of the component and to return the path to it.',),
('Icon','Name','N',None, None, None, None, 'Identifier',None, 'Primary key. Name of the icon file.',),
('Icon','Data','N',None, None, None, None, 'Binary',None, 'Binary stream. The binary icon data in PE (.DLL or .EXE) or icon (.ICO) format.',),
('ProgId','Description','Y',None, None, None, None, 'Text',None, 'Localized description for the Program identifier.',),
('ProgId','Icon_','Y',None, None, 'Icon',1,'Identifier',None, 'Optional foreign key into the Icon Table, specifying the icon file associated with this ProgId. Will be written under the DefaultIcon key.',),
('ProgId','IconIndex','Y',-32767,32767,None, None, None, None, 'Optional icon index.',),
('ProgId','ProgId','N',None, None, None, None, 'Text',None, 'The Program Identifier. Primary key.',),
('ProgId','Class_','Y',None, None, 'Class',1,'Guid',None, 'The CLSID of an OLE factory corresponding to the ProgId.',),
('ProgId','ProgId_Parent','Y',None, None, 'ProgId',1,'Text',None, 'The Parent Program Identifier. If specified, the ProgId column becomes a version independent prog id.',),
('ComboBox','Text','Y',None, None, None, None, 'Formatted',None, 'The visible text to be assigned to the item. Optional. If this entry or the entire column is missing, the text is the same as the value.',),
('ComboBox','Property','N',None, None, None, None, 'Identifier',None, 'A named property to be tied to this item. All the items tied to the same property become part of the same combobox.',),
('ComboBox','Value','N',None, None, None, None, 'Formatted',None, 'The value string associated with this item. Selecting the line will set the associated property to this value.',),
('ComboBox','Order','N',1,32767,None, None, None, None, 'A positive integer used to determine the ordering of the items within one list. The integers do not have to be consecutive.',),
('CompLocator','Type','Y',0,1,None, None, None, None, 'A boolean value that determines if the registry value is a filename or a directory location.',),
('CompLocator','Signature_','N',None, None, None, None, 'Identifier',None, 'The table key. The Signature_ represents a unique file signature and is also the foreign key in the Signature table.',),
('CompLocator','ComponentId','N',None, None, None, None, 'Guid',None, 'A string GUID unique to this component, version, and language.',),
('Complus','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key referencing Component that controls the ComPlus component.',),
('Complus','ExpType','Y',0,32767,None, None, None, None, 'ComPlus component attributes.',),
('Directory','Directory','N',None, None, None, None, 'Identifier',None, 'Unique identifier for directory entry, primary key. If a property by this name is defined, it contains the full path to the directory.',),
('Directory','DefaultDir','N',None, None, None, None, 'DefaultDir',None, "The default sub-path under parent's path.",),
('Directory','Directory_Parent','Y',None, None, 'Directory',1,'Identifier',None, 'Reference to the entry in this table specifying the default parent directory. A record parented to itself or with a Null parent represents a root of the install tree.',),
('Control','Type','N',None, None, None, None, 'Identifier',None, 'The type of the control.',),
('Control','Y','N',0,32767,None, None, None, None, 'Vertical coordinate of the upper left corner of the bounding rectangle of the control.',),
('Control','Text','Y',None, None, None, None, 'Formatted',None, 'A string used to set the initial text contained within a control (if appropriate).',),
('Control','Property','Y',None, None, None, None, 'Identifier',None, 'The name of a defined property to be linked to this control. ',),
('Control','Attributes','Y',0,2147483647,None, None, None, None, 'A 32-bit word that specifies the attribute flags to be applied to this control.',),
('Control','Height','N',0,32767,None, None, None, None, 'Height of the bounding rectangle of the control.',),
('Control','Width','N',0,32767,None, None, None, None, 'Width of the bounding rectangle of the control.',),
('Control','X','N',0,32767,None, None, None, None, 'Horizontal coordinate of the upper left corner of the bounding rectangle of the control.',),
('Control','Control','N',None, None, None, None, 'Identifier',None, 'Name of the control. This name must be unique within a dialog, but can repeat on different dialogs. ',),
('Control','Control_Next','Y',None, None, 'Control',2,'Identifier',None, 'The name of another control on the same dialog. This link defines the tab order of the controls. The links have to form one or more cycles!',),
('Control','Dialog_','N',None, None, 'Dialog',1,'Identifier',None, 'External key to the Dialog table, name of the dialog.',),
('Control','Help','Y',None, None, None, None, 'Text',None, 'The help strings used with the button. The text is optional. ',),
('Dialog','Attributes','Y',0,2147483647,None, None, None, None, 'A 32-bit word that specifies the attribute flags to be applied to this dialog.',),
('Dialog','Height','N',0,32767,None, None, None, None, 'Height of the bounding rectangle of the dialog.',),
('Dialog','Width','N',0,32767,None, None, None, None, 'Width of the bounding rectangle of the dialog.',),
('Dialog','Dialog','N',None, None, None, None, 'Identifier',None, 'Name of the dialog.',),
('Dialog','Control_Cancel','Y',None, None, 'Control',2,'Identifier',None, 'Defines the cancel control. Hitting escape or clicking on the close icon on the dialog is equivalent to pushing this button.',),
('Dialog','Control_Default','Y',None, None, 'Control',2,'Identifier',None, 'Defines the default control. Hitting return is equivalent to pushing this button.',),
('Dialog','Control_First','N',None, None, 'Control',2,'Identifier',None, 'Defines the control that has the focus when the dialog is created.',),
('Dialog','HCentering','N',0,100,None, None, None, None, 'Horizontal position of the dialog on a 0-100 scale. 0 means left end, 100 means right end of the screen, 50 center.',),
('Dialog','Title','Y',None, None, None, None, 'Formatted',None, "A text string specifying the title to be displayed in the title bar of the dialog's window.",),
('Dialog','VCentering','N',0,100,None, None, None, None, 'Vertical position of the dialog on a 0-100 scale. 0 means top end, 100 means bottom end of the screen, 50 center.',),
('ControlCondition','Action','N',None, None, None, None, None, 'Default;Disable;Enable;Hide;Show','The desired action to be taken on the specified control.',),
('ControlCondition','Condition','N',None, None, None, None, 'Condition',None, 'A standard conditional statement that specifies under which conditions the action should be triggered.',),
('ControlCondition','Dialog_','N',None, None, 'Dialog',1,'Identifier',None, 'A foreign key to the Dialog table, name of the dialog.',),
('ControlCondition','Control_','N',None, None, 'Control',2,'Identifier',None, 'A foreign key to the Control table, name of the control.',),
('ControlEvent','Condition','Y',None, None, None, None, 'Condition',None, 'A standard conditional statement that specifies under which conditions an event should be triggered.',),
('ControlEvent','Ordering','Y',0,2147483647,None, None, None, None, 'An integer used to order several events tied to the same control. Can be left blank.',),
('ControlEvent','Argument','N',None, None, None, None, 'Formatted',None, 'A value to be used as a modifier when triggering a particular event.',),
('ControlEvent','Dialog_','N',None, None, 'Dialog',1,'Identifier',None, 'A foreign key to the Dialog table, name of the dialog.',),
('ControlEvent','Control_','N',None, None, 'Control',2,'Identifier',None, 'A foreign key to the Control table, name of the control',),
('ControlEvent','Event','N',None, None, None, None, 'Formatted',None, 'An identifier that specifies the type of the event that should take place when the user interacts with the control specified by the first two entries.',),
('CreateFolder','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table.',),
('CreateFolder','Directory_','N',None, None, 'Directory',1,'Identifier',None, 'Primary key, could be foreign key into the Directory table.',),
('CustomAction','Type','N',1,16383,None, None, None, None, 'The numeric custom action type, consisting of source location, code type, entry, option flags.',),
('CustomAction','Action','N',None, None, None, None, 'Identifier',None, 'Primary key, name of action, normally appears in sequence table unless private use.',),
('CustomAction','Source','Y',None, None, None, None, 'CustomSource',None, 'The table reference of the source of the code.',),
('CustomAction','Target','Y',None, None, None, None, 'Formatted',None, 'Execution parameter, depends on the type of custom action.',),
('DrLocator','Signature_','N',None, None, None, None, 'Identifier',None, 'The Signature_ represents a unique file signature and is also the foreign key in the Signature table.',),
('DrLocator','Path','Y',None, None, None, None, 'AnyPath',None, 'The path on the user system. This is either a subpath below the value of the Parent or a full path. The path may contain properties enclosed within [ ] that will be expanded.',),
('DrLocator','Depth','Y',0,32767,None, None, None, None, 'The depth below the path to which the Signature_ is recursively searched. If absent, the depth is assumed to be 0.',),
('DrLocator','Parent','Y',None, None, None, None, 'Identifier',None, 'The parent file signature. It is also a foreign key in the Signature table. If null and the Path column does not expand to a full path, then all the fixed drives of the user system are searched using the Path.',),
('DuplicateFile','File_','N',None, None, 'File',1,'Identifier',None, 'Foreign key referencing the source file to be duplicated.',),
('DuplicateFile','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key referencing Component that controls the duplicate file.',),
('DuplicateFile','DestFolder','Y',None, None, None, None, 'Identifier',None, 'Name of a property whose value is assumed to resolve to the full pathname to a destination folder.',),
('DuplicateFile','DestName','Y',None, None, None, None, 'Filename',None, 'Filename to be given to the duplicate file.',),
('DuplicateFile','FileKey','N',None, None, None, None, 'Identifier',None, 'Primary key used to identify a particular file entry',),
('Environment','Name','N',None, None, None, None, 'Text',None, 'The name of the environmental value.',),
('Environment','Value','Y',None, None, None, None, 'Formatted',None, 'The value to set in the environmental settings.',),
('Environment','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table referencing component that controls the installing of the environmental value.',),
('Environment','Environment','N',None, None, None, None, 'Identifier',None, 'Unique identifier for the environmental variable setting',),
('Error','Error','N',0,32767,None, None, None, None, 'Integer error number, obtained from header file IError(...) macros.',),
('Error','Message','Y',None, None, None, None, 'Template',None, 'Error formatting template, obtained from user ed. or localizers.',),
('EventMapping','Dialog_','N',None, None, 'Dialog',1,'Identifier',None, 'A foreign key to the Dialog table, name of the Dialog.',),
('EventMapping','Control_','N',None, None, 'Control',2,'Identifier',None, 'A foreign key to the Control table, name of the control.',),
('EventMapping','Event','N',None, None, None, None, 'Identifier',None, 'An identifier that specifies the type of the event that the control subscribes to.',),
('EventMapping','Attribute','N',None, None, None, None, 'Identifier',None, 'The name of the control attribute, that is set when this event is received.',),
('Extension','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'Required foreign key into the Feature Table, specifying the feature to validate or install in order for the CLSID factory to be operational.',),
('Extension','Component_','N',None, None, 'Component',1,'Identifier',None, 'Required foreign key into the Component Table, specifying the component for which to return a path when called through LocateComponent.',),
('Extension','Extension','N',None, None, None, None, 'Text',None, 'The extension associated with the table row.',),
('Extension','MIME_','Y',None, None, 'MIME',1,'Text',None, 'Optional Context identifier, typically "type/format" associated with the extension',),
('Extension','ProgId_','Y',None, None, 'ProgId',1,'Text',None, 'Optional ProgId associated with this extension.',),
('MIME','CLSID','Y',None, None, None, None, 'Guid',None, 'Optional associated CLSID.',),
('MIME','ContentType','N',None, None, None, None, 'Text',None, 'Primary key. Context identifier, typically "type/format".',),
('MIME','Extension_','N',None, None, 'Extension',1,'Text',None, 'Optional associated extension (without dot)',),
('FeatureComponents','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'Foreign key into Feature table.',),
('FeatureComponents','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into Component table.',),
('FileSFPCatalog','File_','N',None, None, 'File',1,'Identifier',None, 'File associated with the catalog',),
('FileSFPCatalog','SFPCatalog_','N',None, None, 'SFPCatalog',1,'Filename',None, 'Catalog associated with the file',),
('SFPCatalog','SFPCatalog','N',None, None, None, None, 'Filename',None, 'File name for the catalog.',),
('SFPCatalog','Catalog','N',None, None, None, None, 'Binary',None, 'SFP Catalog',),
('SFPCatalog','Dependency','Y',None, None, None, None, 'Formatted',None, 'Parent catalog - only used by SFP',),
('Font','File_','N',None, None, 'File',1,'Identifier',None, 'Primary key, foreign key into File table referencing font file.',),
('Font','FontTitle','Y',None, None, None, None, 'Text',None, 'Font name.',),
('IniFile','Action','N',None, None, None, None, None, '0;1;3','The type of modification to be made, one of iifEnum',),
('IniFile','Value','N',None, None, None, None, 'Formatted',None, 'The value to be written.',),
('IniFile','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table referencing component that controls the installing of the .INI value.',),
('IniFile','FileName','N',None, None, None, None, 'Filename',None, 'The .INI file name in which to write the information',),
('IniFile','IniFile','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token.',),
('IniFile','DirProperty','Y',None, None, None, None, 'Identifier',None, 'Foreign key into the Directory table denoting the directory where the .INI file is.',),
('IniFile','Key','N',None, None, None, None, 'Formatted',None, 'The .INI file key below Section.',),
('IniFile','Section','N',None, None, None, None, 'Formatted',None, 'The .INI file Section.',),
('IniLocator','Type','Y',0,2,None, None, None, None, 'An integer value that determines whether the .INI value read is a filename, a directory location, or a value to be used as is without interpretation.',),
('IniLocator','Signature_','N',None, None, None, None, 'Identifier',None, 'The table key. The Signature_ represents a unique file signature and is also the foreign key in the Signature table.',),
('IniLocator','FileName','N',None, None, None, None, 'Filename',None, 'The .INI file name.',),
('IniLocator','Key','N',None, None, None, None, 'Text',None, 'Key value (followed by an equals sign in INI file).',),
('IniLocator','Section','N',None, None, None, None, 'Text',None, 'Section name within the .INI file (within square brackets in the INI file).',),
('IniLocator','Field','Y',0,32767,None, None, None, None, 'The field in the .INI line. If Field is null or 0 the entire line is read.',),
('InstallExecuteSequence','Action','N',None, None, None, None, 'Identifier',None, 'Name of action to invoke, either in the engine or the handler DLL.',),
('InstallExecuteSequence','Condition','Y',None, None, None, None, 'Condition',None, 'Optional expression which skips the action if it evaluates to expFalse. If the expression syntax is invalid, the engine will terminate, returning iesBadActionData.',),
('InstallExecuteSequence','Sequence','Y',-4,32767,None, None, None, None, 'Number that determines the sort order in which the actions are to be executed. Leave blank to suppress action.',),
('InstallUISequence','Action','N',None, None, None, None, 'Identifier',None, 'Name of action to invoke, either in the engine or the handler DLL.',),
('InstallUISequence','Condition','Y',None, None, None, None, 'Condition',None, 'Optional expression which skips the action if it evaluates to expFalse. If the expression syntax is invalid, the engine will terminate, returning iesBadActionData.',),
('InstallUISequence','Sequence','Y',-4,32767,None, None, None, None, 'Number that determines the sort order in which the actions are to be executed. Leave blank to suppress action.',),
('IsolatedComponent','Component_Application','N',None, None, 'Component',1,'Identifier',None, 'Key to Component table item for application',),
('IsolatedComponent','Component_Shared','N',None, None, 'Component',1,'Identifier',None, 'Key to Component table item to be isolated',),
('LaunchCondition','Description','N',None, None, None, None, 'Formatted',None, 'Localizable text to display when condition fails and install must abort.',),
('LaunchCondition','Condition','N',None, None, None, None, 'Condition',None, 'Expression which must evaluate to TRUE in order for install to commence.',),
('ListBox','Text','Y',None, None, None, None, 'Text',None, 'The visible text to be assigned to the item. Optional. If this entry or the entire column is missing, the text is the same as the value.',),
('ListBox','Property','N',None, None, None, None, 'Identifier',None, 'A named property to be tied to this item. All the items tied to the same property become part of the same listbox.',),
('ListBox','Value','N',None, None, None, None, 'Formatted',None, 'The value string associated with this item. Selecting the line will set the associated property to this value.',),
('ListBox','Order','N',1,32767,None, None, None, None, 'A positive integer used to determine the ordering of the items within one list. The integers do not have to be consecutive.',),
('ListView','Text','Y',None, None, None, None, 'Text',None, 'The visible text to be assigned to the item. Optional. If this entry or the entire column is missing, the text is the same as the value.',),
('ListView','Property','N',None, None, None, None, 'Identifier',None, 'A named property to be tied to this item. All the items tied to the same property become part of the same listview.',),
('ListView','Value','N',None, None, None, None, 'Identifier',None, 'The value string associated with this item. Selecting the line will set the associated property to this value.',),
('ListView','Order','N',1,32767,None, None, None, None, 'A positive integer used to determine the ordering of the items within one list. The integers do not have to be consecutive.',),
('ListView','Binary_','Y',None, None, 'Binary',1,'Identifier',None, 'The name of the icon to be displayed with the item. The binary information is looked up from the Binary Table.',),
('LockPermissions','Table','N',None, None, None, None, 'Identifier','Directory;File;Registry','Reference to another table name',),
('LockPermissions','Domain','Y',None, None, None, None, 'Formatted',None, 'Domain name for user whose permissions are being set. (usually a property)',),
('LockPermissions','LockObject','N',None, None, None, None, 'Identifier',None, 'Foreign key into Registry or File table',),
('LockPermissions','Permission','Y',-2147483647,2147483647,None, None, None, None, 'Permission Access mask. Full Control = 268435456 (GENERIC_ALL = 0x10000000)',),
('LockPermissions','User','N',None, None, None, None, 'Formatted',None, 'User for permissions to be set. (usually a property)',),
('Media','Source','Y',None, None, None, None, 'Property',None, 'The property defining the location of the cabinet file.',),
('Media','Cabinet','Y',None, None, None, None, 'Cabinet',None, 'If some or all of the files stored on the media are compressed in a cabinet, the name of that cabinet.',),
('Media','DiskId','N',1,32767,None, None, None, None, 'Primary key, integer to determine sort order for table.',),
('Media','DiskPrompt','Y',None, None, None, None, 'Text',None, 'Disk name: the visible text actually printed on the disk. This will be used to prompt the user when this disk needs to be inserted.',),
('Media','LastSequence','N',0,32767,None, None, None, None, 'File sequence number for the last file for this media.',),
('Media','VolumeLabel','Y',None, None, None, None, 'Text',None, 'The label attributed to the volume.',),
('ModuleComponents','Component','N',None, None, 'Component',1,'Identifier',None, 'Component contained in the module.',),
('ModuleComponents','Language','N',None, None, 'ModuleSignature',2,None, None, 'Default language ID for module (may be changed by transform).',),
('ModuleComponents','ModuleID','N',None, None, 'ModuleSignature',1,'Identifier',None, 'Module containing the component.',),
('ModuleSignature','Language','N',None, None, None, None, None, None, 'Default decimal language of module.',),
('ModuleSignature','Version','N',None, None, None, None, 'Version',None, 'Version of the module.',),
('ModuleSignature','ModuleID','N',None, None, None, None, 'Identifier',None, 'Module identifier (String.GUID).',),
('ModuleDependency','ModuleID','N',None, None, 'ModuleSignature',1,'Identifier',None, 'Module requiring the dependency.',),
('ModuleDependency','ModuleLanguage','N',None, None, 'ModuleSignature',2,None, None, 'Language of module requiring the dependency.',),
('ModuleDependency','RequiredID','N',None, None, None, None, None, None, 'String.GUID of required module.',),
('ModuleDependency','RequiredLanguage','N',None, None, None, None, None, None, 'LanguageID of the required module.',),
('ModuleDependency','RequiredVersion','Y',None, None, None, None, 'Version',None, 'Version of the required module.',),
('ModuleExclusion','ModuleID','N',None, None, 'ModuleSignature',1,'Identifier',None, 'String.GUID of module with exclusion requirement.',),
('ModuleExclusion','ModuleLanguage','N',None, None, 'ModuleSignature',2,None, None, 'LanguageID of module with exclusion requirement.',),
('ModuleExclusion','ExcludedID','N',None, None, None, None, None, None, 'String.GUID of excluded module.',),
('ModuleExclusion','ExcludedLanguage','N',None, None, None, None, None, None, 'Language of excluded module.',),
('ModuleExclusion','ExcludedMaxVersion','Y',None, None, None, None, 'Version',None, 'Maximum version of excluded module.',),
('ModuleExclusion','ExcludedMinVersion','Y',None, None, None, None, 'Version',None, 'Minimum version of excluded module.',),
('MoveFile','Component_','N',None, None, 'Component',1,'Identifier',None, 'If this component is not "selected" for installation or removal, no action will be taken on the associated MoveFile entry',),
('MoveFile','DestFolder','N',None, None, None, None, 'Identifier',None, 'Name of a property whose value is assumed to resolve to the full path to the destination directory',),
('MoveFile','DestName','Y',None, None, None, None, 'Filename',None, 'Name to be given to the original file after it is moved or copied. If blank, the destination file will be given the same name as the source file',),
('MoveFile','FileKey','N',None, None, None, None, 'Identifier',None, 'Primary key that uniquely identifies a particular MoveFile record',),
('MoveFile','Options','N',0,1,None, None, None, None, 'Integer value specifying the MoveFile operating mode, one of imfoEnum',),
('MoveFile','SourceFolder','Y',None, None, None, None, 'Identifier',None, 'Name of a property whose value is assumed to resolve to the full path to the source directory',),
('MoveFile','SourceName','Y',None, None, None, None, 'Text',None, "Name of the source file(s) to be moved or copied. Can contain the '*' or '?' wildcards.",),
('MsiAssembly','Attributes','Y',None, None, None, None, None, None, 'Assembly attributes',),
('MsiAssembly','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'Foreign key into Feature table.',),
('MsiAssembly','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into Component table.',),
('MsiAssembly','File_Application','Y',None, None, 'File',1,'Identifier',None, 'Foreign key into File table, denoting the application context for private assemblies. Null for global assemblies.',),
('MsiAssembly','File_Manifest','Y',None, None, 'File',1,'Identifier',None, 'Foreign key into the File table denoting the manifest file for the assembly.',),
('MsiAssemblyName','Name','N',None, None, None, None, 'Text',None, 'The name part of the name-value pairs for the assembly name.',),
('MsiAssemblyName','Value','N',None, None, None, None, 'Text',None, 'The value part of the name-value pairs for the assembly name.',),
('MsiAssemblyName','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into Component table.',),
('MsiDigitalCertificate','CertData','N',None, None, None, None, 'Binary',None, 'A certificate context blob for a signer certificate',),
('MsiDigitalCertificate','DigitalCertificate','N',None, None, None, None, 'Identifier',None, 'A unique identifier for the row',),
('MsiDigitalSignature','Table','N',None, None, None, None, None, 'Media','Reference to another table name (only Media table is supported)',),
('MsiDigitalSignature','DigitalCertificate_','N',None, None, 'MsiDigitalCertificate',1,'Identifier',None, 'Foreign key to MsiDigitalCertificate table identifying the signer certificate',),
('MsiDigitalSignature','Hash','Y',None, None, None, None, 'Binary',None, 'The encoded hash blob from the digital signature',),
('MsiDigitalSignature','SignObject','N',None, None, None, None, 'Text',None, 'Foreign key to Media table',),
('MsiFileHash','File_','N',None, None, 'File',1,'Identifier',None, 'Primary key, foreign key into File table referencing file with this hash',),
('MsiFileHash','Options','N',0,32767,None, None, None, None, 'Various options and attributes for this hash.',),
('MsiFileHash','HashPart1','N',None, None, None, None, None, None, 'First 32 bits of the file hash (integer).',),
('MsiFileHash','HashPart2','N',None, None, None, None, None, None, 'Second 32 bits of the file hash (integer).',),
('MsiFileHash','HashPart3','N',None, None, None, None, None, None, 'Third 32 bits of the file hash (integer).',),
('MsiFileHash','HashPart4','N',None, None, None, None, None, None, 'Fourth 32 bits of the file hash (integer).',),
('MsiPatchHeaders','StreamRef','N',None, None, None, None, 'Identifier',None, 'Primary key. A unique identifier for the row.',),
('MsiPatchHeaders','Header','N',None, None, None, None, 'Binary',None, 'Binary stream. The patch header, used for patch validation.',),
('ODBCAttribute','Value','Y',None, None, None, None, 'Text',None, 'Value for ODBC driver attribute',),
('ODBCAttribute','Attribute','N',None, None, None, None, 'Text',None, 'Name of ODBC driver attribute',),
('ODBCAttribute','Driver_','N',None, None, 'ODBCDriver',1,'Identifier',None, 'Reference to ODBC driver in ODBCDriver table',),
('ODBCDriver','Description','N',None, None, None, None, 'Text',None, 'Text used as registered name for driver, non-localized',),
('ODBCDriver','File_','N',None, None, 'File',1,'Identifier',None, 'Reference to key driver file',),
('ODBCDriver','Component_','N',None, None, 'Component',1,'Identifier',None, 'Reference to associated component',),
('ODBCDriver','Driver','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized internal token for driver.',),
('ODBCDriver','File_Setup','Y',None, None, 'File',1,'Identifier',None, 'Optional reference to key driver setup DLL',),
('ODBCDataSource','Description','N',None, None, None, None, 'Text',None, 'Text used as registered name for data source',),
('ODBCDataSource','Component_','N',None, None, 'Component',1,'Identifier',None, 'Reference to associated component',),
('ODBCDataSource','DataSource','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized internal token for data source.',),
('ODBCDataSource','DriverDescription','N',None, None, None, None, 'Text',None, 'Reference to driver description, may be existing driver',),
('ODBCDataSource','Registration','N',0,1,None, None, None, None, 'Registration option: 0=machine, 1=user, others t.b.d.',),
('ODBCSourceAttribute','Value','Y',None, None, None, None, 'Text',None, 'Value for ODBC data source attribute',),
('ODBCSourceAttribute','Attribute','N',None, None, None, None, 'Text',None, 'Name of ODBC data source attribute',),
('ODBCSourceAttribute','DataSource_','N',None, None, 'ODBCDataSource',1,'Identifier',None, 'Reference to ODBC data source in ODBCDataSource table',),
('ODBCTranslator','Description','N',None, None, None, None, 'Text',None, 'Text used as registered name for translator',),
('ODBCTranslator','File_','N',None, None, 'File',1,'Identifier',None, 'Reference to key translator file',),
('ODBCTranslator','Component_','N',None, None, 'Component',1,'Identifier',None, 'Reference to associated component',),
('ODBCTranslator','File_Setup','Y',None, None, 'File',1,'Identifier',None, 'Optional reference to key translator setup DLL',),
('ODBCTranslator','Translator','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized internal token for translator.',),
('Patch','Sequence','N',0,32767,None, None, None, None, 'Primary key, sequence with respect to the media images; order must track cabinet order.',),
('Patch','Attributes','N',0,32767,None, None, None, None, 'Integer containing bit flags representing patch attributes',),
('Patch','File_','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token, foreign key to File table, must match identifier in cabinet.',),
('Patch','Header','Y',None, None, None, None, 'Binary',None, 'Binary stream. The patch header, used for patch validation.',),
('Patch','PatchSize','N',0,2147483647,None, None, None, None, 'Size of patch in bytes (integer).',),
('Patch','StreamRef_','Y',None, None, None, None, 'Identifier',None, 'Identifier. Foreign key to the StreamRef column of the MsiPatchHeaders table.',),
('PatchPackage','Media_','N',0,32767,None, None, None, None, 'Foreign key to DiskId column of Media table. Indicates the disk containing the patch package.',),
('PatchPackage','PatchId','N',None, None, None, None, 'Guid',None, 'A unique string GUID representing this patch.',),
('PublishComponent','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'Foreign key into the Feature table.',),
('PublishComponent','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table.',),
('PublishComponent','ComponentId','N',None, None, None, None, 'Guid',None, 'A string GUID that represents the component id that will be requested by the alien product.',),
('PublishComponent','AppData','Y',None, None, None, None, 'Text',None, 'This is localizable, application-specific data that can be associated with a Qualified Component.',),
('PublishComponent','Qualifier','N',None, None, None, None, 'Text',None, 'This is defined only when the ComponentId column is a Qualified Component Id. This is the Qualifier for ProvideComponentIndirect.',),
('RadioButton','Y','N',0,32767,None, None, None, None, 'The vertical coordinate of the upper left corner of the bounding rectangle of the radio button.',),
('RadioButton','Text','Y',None, None, None, None, 'Text',None, 'The visible title to be assigned to the radio button.',),
('RadioButton','Property','N',None, None, None, None, 'Identifier',None, 'A named property to be tied to this radio button. All the buttons tied to the same property become part of the same group.',),
('RadioButton','Height','N',0,32767,None, None, None, None, 'The height of the button.',),
('RadioButton','Width','N',0,32767,None, None, None, None, 'The width of the button.',),
('RadioButton','X','N',0,32767,None, None, None, None, 'The horizontal coordinate of the upper left corner of the bounding rectangle of the radio button.',),
('RadioButton','Value','N',None, None, None, None, 'Formatted',None, 'The value string associated with this button. Selecting the button will set the associated property to this value.',),
('RadioButton','Order','N',1,32767,None, None, None, None, 'A positive integer used to determine the ordering of the items within one list. The integers do not have to be consecutive.',),
('RadioButton','Help','Y',None, None, None, None, 'Text',None, 'The help strings used with the button. The text is optional.',),
('Registry','Name','Y',None, None, None, None, 'Formatted',None, 'The registry value name.',),
('Registry','Value','Y',None, None, None, None, 'Formatted',None, 'The registry value.',),
('Registry','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table referencing component that controls the installing of the registry value.',),
('Registry','Key','N',None, None, None, None, 'RegPath',None, 'The key for the registry value.',),
('Registry','Registry','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token.',),
('Registry','Root','N',-1,3,None, None, None, None, 'The predefined root key for the registry value, one of rrkEnum.',),
('RegLocator','Name','Y',None, None, None, None, 'Formatted',None, 'The registry value name.',),
('RegLocator','Type','Y',0,18,None, None, None, None, 'An integer value that determines whether the registry value is a filename, a directory location, or a value to be used as is without interpretation.',),
('RegLocator','Signature_','N',None, None, None, None, 'Identifier',None, 'The table key. The Signature_ represents a unique file signature and is also the foreign key in the Signature table. If the type is 0, the registry value refers to a directory, and Signature_ is not a foreign key.',),
('RegLocator','Key','N',None, None, None, None, 'RegPath',None, 'The key for the registry value.',),
('RegLocator','Root','N',0,3,None, None, None, None, 'The predefined root key for the registry value, one of rrkEnum.',),
('RemoveFile','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key referencing Component that controls the file to be removed.',),
('RemoveFile','FileKey','N',None, None, None, None, 'Identifier',None, 'Primary key used to identify a particular file entry',),
('RemoveFile','FileName','Y',None, None, None, None, 'WildCardFilename',None, 'Name of the file to be removed.',),
('RemoveFile','DirProperty','N',None, None, None, None, 'Identifier',None, 'Name of a property whose value is assumed to resolve to the full pathname to the folder of the file to be removed.',),
('RemoveFile','InstallMode','N',None, None, None, None, None, '1;2;3','Installation option, one of iimEnum.',),
('RemoveIniFile','Action','N',None, None, None, None, None, '2;4','The type of modification to be made, one of iifEnum.',),
('RemoveIniFile','Value','Y',None, None, None, None, 'Formatted',None, 'The value to be deleted. The value is required when Action is iifIniRemoveTag',),
('RemoveIniFile','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table referencing component that controls the deletion of the .INI value.',),
('RemoveIniFile','FileName','N',None, None, None, None, 'Filename',None, 'The .INI file name in which to delete the information',),
('RemoveIniFile','DirProperty','Y',None, None, None, None, 'Identifier',None, 'Foreign key into the Directory table denoting the directory where the .INI file is.',),
('RemoveIniFile','Key','N',None, None, None, None, 'Formatted',None, 'The .INI file key below Section.',),
('RemoveIniFile','Section','N',None, None, None, None, 'Formatted',None, 'The .INI file Section.',),
('RemoveIniFile','RemoveIniFile','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token.',),
('RemoveRegistry','Name','Y',None, None, None, None, 'Formatted',None, 'The registry value name.',),
('RemoveRegistry','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table referencing component that controls the deletion of the registry value.',),
('RemoveRegistry','Key','N',None, None, None, None, 'RegPath',None, 'The key for the registry value.',),
('RemoveRegistry','Root','N',-1,3,None, None, None, None, 'The predefined root key for the registry value, one of rrkEnum',),
('RemoveRegistry','RemoveRegistry','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token.',),
('ReserveCost','Component_','N',None, None, 'Component',1,'Identifier',None, 'Reserve a specified amount of space if this component is to be installed.',),
('ReserveCost','ReserveFolder','Y',None, None, None, None, 'Identifier',None, 'Name of a property whose value is assumed to resolve to the full path to the destination directory',),
('ReserveCost','ReserveKey','N',None, None, None, None, 'Identifier',None, 'Primary key that uniquely identifies a particular ReserveCost record',),
('ReserveCost','ReserveLocal','N',0,2147483647,None, None, None, None, 'Disk space to reserve if linked component is installed locally.',),
('ReserveCost','ReserveSource','N',0,2147483647,None, None, None, None, 'Disk space to reserve if linked component is installed to run from the source location.',),
('SelfReg','File_','N',None, None, 'File',1,'Identifier',None, 'Foreign key into the File table denoting the module that needs to be registered.',),
('SelfReg','Cost','Y',0,32767,None, None, None, None, 'The cost of registering the module.',),
('ServiceControl','Name','N',None, None, None, None, 'Formatted',None, 'Name of a service. /, \\, comma and space are invalid',),
('ServiceControl','Component_','N',None, None, 'Component',1,'Identifier',None, 'Required foreign key into the Component Table that controls the startup of the service',),
('ServiceControl','Event','N',0,187,None, None, None, None, 'Bit field: Install: 0x1 = Start, 0x2 = Stop, 0x8 = Delete, Uninstall: 0x10 = Start, 0x20 = Stop, 0x80 = Delete',),
('ServiceControl','ServiceControl','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token.',),
('ServiceControl','Arguments','Y',None, None, None, None, 'Formatted',None, 'Arguments for the service. Separate by [~].',),
('ServiceControl','Wait','Y',0,1,None, None, None, None, 'Boolean for whether to wait for the service to fully start',),
('ServiceInstall','Name','N',None, None, None, None, 'Formatted',None, 'Internal Name of the Service',),
('ServiceInstall','Description','Y',None, None, None, None, 'Text',None, 'Description of service.',),
('ServiceInstall','Component_','N',None, None, 'Component',1,'Identifier',None, 'Required foreign key into the Component Table that controls the startup of the service',),
('ServiceInstall','Arguments','Y',None, None, None, None, 'Formatted',None, 'Arguments to include in every start of the service, passed to WinMain',),
('ServiceInstall','ServiceInstall','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token.',),
('ServiceInstall','Dependencies','Y',None, None, None, None, 'Formatted',None, 'Other services this depends on to start. Separate by [~], and end with [~][~]',),
('ServiceInstall','DisplayName','Y',None, None, None, None, 'Formatted',None, 'External Name of the Service',),
('ServiceInstall','ErrorControl','N',-2147483647,2147483647,None, None, None, None, 'Severity of error if service fails to start',),
('ServiceInstall','LoadOrderGroup','Y',None, None, None, None, 'Formatted',None, 'LoadOrderGroup',),
('ServiceInstall','Password','Y',None, None, None, None, 'Formatted',None, 'Password to run the service with (used with StartName).',),
('ServiceInstall','ServiceType','N',-2147483647,2147483647,None, None, None, None, 'Type of the service',),
('ServiceInstall','StartName','Y',None, None, None, None, 'Formatted',None, 'User or object name to run service as',),
('ServiceInstall','StartType','N',0,4,None, None, None, None, 'Start type of the service.',),
('Shortcut','Name','N',None, None, None, None, 'Filename',None, 'The name of the shortcut to be created.',),
('Shortcut','Description','Y',None, None, None, None, 'Text',None, 'The description for the shortcut.',),
('Shortcut','Component_','N',None, None, 'Component',1,'Identifier',None, 'Foreign key into the Component table denoting the component whose selection gates the shortcut creation/deletion.',),
('Shortcut','Icon_','Y',None, None, 'Icon',1,'Identifier',None, 'Foreign key into the Icon table denoting the external icon file for the shortcut.',),
('Shortcut','IconIndex','Y',-32767,32767,None, None, None, None, 'The icon index for the shortcut.',),
('Shortcut','Directory_','N',None, None, 'Directory',1,'Identifier',None, 'Foreign key into the Directory table denoting the directory where the shortcut file is created.',),
('Shortcut','Target','N',None, None, None, None, 'Shortcut',None, 'The shortcut target. This is usually a property that is expanded to a file or a folder that the shortcut points to.',),
('Shortcut','Arguments','Y',None, None, None, None, 'Formatted',None, 'The command-line arguments for the shortcut.',),
('Shortcut','Shortcut','N',None, None, None, None, 'Identifier',None, 'Primary key, non-localized token.',),
('Shortcut','Hotkey','Y',0,32767,None, None, None, None, 'The hotkey for the shortcut. It has the virtual-key code for the key in the low-order byte, and the modifier flags in the high-order byte. ',),
('Shortcut','ShowCmd','Y',None, None, None, None, None, '1;3;7','The show command for the application window. The following values may be used.',),
('Shortcut','WkDir','Y',None, None, None, None, 'Identifier',None, 'Name of property defining location of working directory.',),
('Signature','FileName','N',None, None, None, None, 'Filename',None, 'The name of the file. This may contain a "short name|long name" pair.',),
('Signature','Signature','N',None, None, None, None, 'Identifier',None, 'The table key. The Signature represents a unique file signature.',),
('Signature','Languages','Y',None, None, None, None, 'Language',None, 'The languages supported by the file.',),
('Signature','MaxDate','Y',0,2147483647,None, None, None, None, 'The maximum creation date of the file.',),
('Signature','MaxSize','Y',0,2147483647,None, None, None, None, 'The maximum size of the file. ',),
('Signature','MaxVersion','Y',None, None, None, None, 'Text',None, 'The maximum version of the file.',),
('Signature','MinDate','Y',0,2147483647,None, None, None, None, 'The minimum creation date of the file.',),
('Signature','MinSize','Y',0,2147483647,None, None, None, None, 'The minimum size of the file.',),
('Signature','MinVersion','Y',None, None, None, None, 'Text',None, 'The minimum version of the file.',),
('TextStyle','TextStyle','N',None, None, None, None, 'Identifier',None, 'Name of the style. The primary key of this table. This name is embedded in the texts to indicate a style change.',),
('TextStyle','Color','Y',0,16777215,None, None, None, None, 'An integer indicating the color of the string in the RGB format (Red, Green, Blue each 0-255, RGB = R + 256*G + 256^2*B).',),
('TextStyle','FaceName','N',None, None, None, None, 'Text',None, 'A string indicating the name of the font used. Required. The string must be at most 31 characters long.',),
('TextStyle','Size','N',0,32767,None, None, None, None, 'The size of the font used. This size is given in our units (1/12 of the system font height). Assuming that the system font is set to 12 point size, this is equivalent to the point size.',),
('TextStyle','StyleBits','Y',0,15,None, None, None, None, 'A combination of style bits.',),
('TypeLib','Description','Y',None, None, None, None, 'Text',None, None, ),
('TypeLib','Feature_','N',None, None, 'Feature',1,'Identifier',None, 'Required foreign key into the Feature Table, specifying the feature to validate or install in order for the type library to be operational.',),
('TypeLib','Component_','N',None, None, 'Component',1,'Identifier',None, 'Required foreign key into the Component Table, specifying the component for which to return a path when called through LocateComponent.',),
('TypeLib','Directory_','Y',None, None, 'Directory',1,'Identifier',None, 'Optional. The foreign key into the Directory table denoting the path to the help file for the type library.',),
('TypeLib','Language','N',0,32767,None, None, None, None, 'The language of the library.',),
('TypeLib','Version','Y',0,16777215,None, None, None, None, 'The version of the library. The minor version is in the lower 8 bits of the integer. The major version is in the next 16 bits. ',),
('TypeLib','Cost','Y',0,2147483647,None, None, None, None, 'The cost associated with the registration of the typelib. This column is currently optional.',),
('TypeLib','LibID','N',None, None, None, None, 'Guid',None, 'The GUID that represents the library.',),
('UIText','Text','Y',None, None, None, None, 'Text',None, 'The localized version of the string.',),
('UIText','Key','N',None, None, None, None, 'Identifier',None, 'A unique key that identifies the particular string.',),
('Upgrade','Attributes','N',0,2147483647,None, None, None, None, 'The attributes of this product set.',),
('Upgrade','Language','Y',None, None, None, None, 'Language',None, 'A comma-separated list of languages for either products in this set or products not in this set.',),
('Upgrade','ActionProperty','N',None, None, None, None, 'UpperCase',None, 'The property to set when a product in this set is found.',),
('Upgrade','Remove','Y',None, None, None, None, 'Formatted',None, 'The list of features to remove when uninstalling a product from this set. The default is "ALL".',),
('Upgrade','UpgradeCode','N',None, None, None, None, 'Guid',None, 'The UpgradeCode GUID belonging to the products in this set.',),
('Upgrade','VersionMax','Y',None, None, None, None, 'Text',None, 'The maximum ProductVersion of the products in this set. The set may or may not include products with this particular version.',),
('Upgrade','VersionMin','Y',None, None, None, None, 'Text',None, 'The minimum ProductVersion of the products in this set. The set may or may not include products with this particular version.',),
('Verb','Sequence','Y',0,32767,None, None, None, None, 'Order within the verbs for a particular extension. Also used simply to specify the default verb.',),
('Verb','Argument','Y',None, None, None, None, 'Formatted',None, 'Optional value for the command arguments.',),
('Verb','Extension_','N',None, None, 'Extension',1,'Text',None, 'The extension associated with the table row.',),
('Verb','Verb','N',None, None, None, None, 'Text',None, 'The verb for the command.',),
('Verb','Command','Y',None, None, None, None, 'Formatted',None, 'The command text.',),
]
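# The rows above follow msilib's _Validation layout: each tuple is
# (table, column, nullable, min, max, key table, key column, category,
# allowed set, description). Below is a minimal sketch of how rows shaped
# like these reach an actual .msi database, assuming a Windows Python that
# still ships the standard-library msilib module; "example.msi" and the
# product metadata are illustrative placeholders, not part of this package.
if __name__ == "__main__":
    import msilib
    from msilib import schema

    # init_database creates the .msi file, installs the table definitions
    # from msilib.schema, and populates _Validation from tuples like these.
    db = msilib.init_database("example.msi", schema,
                              "Example Product",   # ProductName
                              msilib.gen_uuid(),   # ProductCode (random GUID)
                              "1.0.0",             # ProductVersion
                              "Example Corp")      # Manufacturer
    db.Commit()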
AdminExecuteSequence = [
('InstallInitialize', None, 1500),
('InstallFinalize', None, 6600),
('InstallFiles', None, 4000),
('InstallAdminPackage', None, 3900),
('FileCost', None, 900),
('CostInitialize', None, 800),
('CostFinalize', None, 1000),
('InstallValidate', None, 1400),
]
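# Each sequence row is (action, condition, sequence number): actions run in
# ascending sequence order, a None condition means the action always runs,
# and negative numbers mark terminal dialogs (-1 success, -2 user exit,
# -3 fatal error). An illustrative helper, not part of the original file,
# that prints a sequence table in execution order:
def _dump_sequence(seq):
    for action, condition, number in sorted(seq, key=lambda row: row[2]):
        print("%6d  %-28s  %s" % (number, action, condition or ""))

if __name__ == "__main__":
    _dump_sequence(AdminExecuteSequence)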
AdminUISequence = [
('FileCost', None, 900),
('CostInitialize', None, 800),
('CostFinalize', None, 1000),
('ExecuteAction', None, 1300),
('ExitDialog', None, -1),
('FatalError', None, -3),
('UserExit', None, -2),
]
AdvtExecuteSequence = [
('InstallInitialize', None, 1500),
('InstallFinalize', None, 6600),
('CostInitialize', None, 800),
('CostFinalize', None, 1000),
('InstallValidate', None, 1400),
('CreateShortcuts', None, 4500),
('MsiPublishAssemblies', None, 6250),
('PublishComponents', None, 6200),
('PublishFeatures', None, 6300),
('PublishProduct', None, 6400),
('RegisterClassInfo', None, 4600),
('RegisterExtensionInfo', None, 4700),
('RegisterMIMEInfo', None, 4900),
('RegisterProgIdInfo', None, 4800),
]
InstallExecuteSequence = [
('InstallInitialize', None, 1500),
('InstallFinalize', None, 6600),
('InstallFiles', None, 4000),
('FileCost', None, 900),
('CostInitialize', None, 800),
('CostFinalize', None, 1000),
('InstallValidate', None, 1400),
('CreateShortcuts', None, 4500),
('MsiPublishAssemblies', None, 6250),
('PublishComponents', None, 6200),
('PublishFeatures', None, 6300),
('PublishProduct', None, 6400),
('RegisterClassInfo', None, 4600),
('RegisterExtensionInfo', None, 4700),
('RegisterMIMEInfo', None, 4900),
('RegisterProgIdInfo', None, 4800),
('AllocateRegistrySpace', 'NOT Installed', 1550),
('AppSearch', None, 400),
('BindImage', None, 4300),
('CCPSearch', 'NOT Installed', 500),
('CreateFolders', None, 3700),
('DeleteServices', 'VersionNT', 2000),
('DuplicateFiles', None, 4210),
('FindRelatedProducts', None, 200),
('InstallODBC', None, 5400),
('InstallServices', 'VersionNT', 5800),
('IsolateComponents', None, 950),
('LaunchConditions', None, 100),
('MigrateFeatureStates', None, 1200),
('MoveFiles', None, 3800),
('PatchFiles', None, 4090),
('ProcessComponents', None, 1600),
('RegisterComPlus', None, 5700),
('RegisterFonts', None, 5300),
('RegisterProduct', None, 6100),
('RegisterTypeLibraries', None, 5500),
('RegisterUser', None, 6000),
('RemoveDuplicateFiles', None, 3400),
('RemoveEnvironmentStrings', None, 3300),
('RemoveExistingProducts', None, 6700),
('RemoveFiles', None, 3500),
('RemoveFolders', None, 3600),
('RemoveIniValues', None, 3100),
('RemoveODBC', None, 2400),
('RemoveRegistryValues', None, 2600),
('RemoveShortcuts', None, 3200),
('RMCCPSearch', 'NOT Installed', 600),
('SelfRegModules', None, 5600),
('SelfUnregModules', None, 2200),
('SetODBCFolders', None, 1100),
('StartServices', 'VersionNT', 5900),
('StopServices', 'VersionNT', 1900),
('MsiUnpublishAssemblies', None, 1750),
('UnpublishComponents', None, 1700),
('UnpublishFeatures', None, 1800),
('UnregisterClassInfo', None, 2700),
('UnregisterComPlus', None, 2100),
('UnregisterExtensionInfo', None, 2800),
('UnregisterFonts', None, 2500),
('UnregisterMIMEInfo', None, 3000),
('UnregisterProgIdInfo', None, 2900),
('UnregisterTypeLibraries', None, 2300),
('ValidateProductID', None, 700),
('WriteEnvironmentStrings', None, 5200),
('WriteIniValues', None, 5100),
('WriteRegistryValues', None, 5000),
]
InstallUISequence = [
('FileCost', None, 900),
('CostInitialize', None, 800),
('CostFinalize', None, 1000),
('ExecuteAction', None, 1300),
('ExitDialog', None, -1),
('FatalError', None, -3),
('UserExit', None, -2),
('AppSearch', None, 400),
('CCPSearch', 'NOT Installed', 500),
('FindRelatedProducts', None, 200),
('IsolateComponents', None, 950),
('LaunchConditions', None, 100),
('MigrateFeatureStates', None, 1200),
('RMCCPSearch', 'NOT Installed', 600),
('ValidateProductID', None, 700),
]
tables = ['AdminExecuteSequence', 'AdminUISequence', 'AdvtExecuteSequence', 'InstallExecuteSequence', 'InstallUISequence']

import msilib
import os

dirname = os.path.dirname(__file__)
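# A sketch of how the tables list above is typically consumed, assuming the
# standard-library msilib is available: add_tables walks module.tables and
# copies each named list into the database, and the ActionText rows below
# supply the progress strings shown during installation. "example.msi" is a
# placeholder path, not a file shipped by this package.
if __name__ == "__main__":
    from msilib import sequence, text

    db = msilib.OpenDatabase("example.msi", msilib.MSIDBOPEN_TRANSACT)
    msilib.add_tables(db, sequence)                     # five sequence tables
    msilib.add_data(db, "ActionText", text.ActionText)  # progress messages
    db.Commit()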
ActionText = [
('InstallValidate', 'Validating install', None),
('InstallFiles', 'Copying new files', 'File: [1], Directory: [9], Size: [6]'),
('InstallAdminPackage', 'Copying network install files', 'File: [1], Directory: [9], Size: [6]'),
('FileCost', 'Computing space requirements', None),
('CostInitialize', 'Computing space requirements', None),
('CostFinalize', 'Computing space requirements', None),
('CreateShortcuts', 'Creating shortcuts', 'Shortcut: [1]'),
('PublishComponents', 'Publishing Qualified Components', 'Component ID: [1], Qualifier: [2]'),
('PublishFeatures', 'Publishing Product Features', 'Feature: [1]'),
('PublishProduct', 'Publishing product information', None),
('RegisterClassInfo', 'Registering Class servers', 'Class Id: [1]'),
('RegisterExtensionInfo', 'Registering extension servers', 'Extension: [1]'),
('RegisterMIMEInfo', 'Registering MIME info', 'MIME Content Type: [1], Extension: [2]'),
('RegisterProgIdInfo', 'Registering program identifiers', 'ProgId: [1]'),
('AllocateRegistrySpace', 'Allocating registry space', 'Free space: [1]'),
('AppSearch', 'Searching for installed applications', 'Property: [1], Signature: [2]'),
('BindImage', 'Binding executables', 'File: [1]'),
('CCPSearch', 'Searching for qualifying products', None),
('CreateFolders', 'Creating folders', 'Folder: [1]'),
('DeleteServices', 'Deleting services', 'Service: [1]'),
('DuplicateFiles', 'Creating duplicate files', 'File: [1], Directory: [9], Size: [6]'),
('FindRelatedProducts', 'Searching for related applications', 'Found application: [1]'),
('InstallODBC', 'Installing ODBC components', None),
('InstallServices', 'Installing new services', 'Service: [2]'),
('LaunchConditions', 'Evaluating launch conditions', None),
('MigrateFeatureStates', 'Migrating feature states from related applications', 'Application: [1]'),
('MoveFiles', 'Moving files', 'File: [1], Directory: [9], Size: [6]'),
('PatchFiles', 'Patching files', 'File: [1], Directory: [2], Size: [3]'),
('ProcessComponents', 'Updating component registration', None),
('RegisterComPlus', 'Registering COM+ Applications and Components', 'AppId: [1]{{, AppType: [2], Users: [3], RSN: [4]}}'),
('RegisterFonts', 'Registering fonts', 'Font: [1]'),
('RegisterProduct', 'Registering product', '[1]'),
('RegisterTypeLibraries', 'Registering type libraries', 'LibID: [1]'),
('RegisterUser', 'Registering user', '[1]'),
('RemoveDuplicateFiles', 'Removing duplicated files', 'File: [1], Directory: [9]'),
('RemoveEnvironmentStrings', 'Updating environment strings', 'Name: [1], Value: [2], Action [3]'),
('RemoveExistingProducts', 'Removing applications', 'Application: [1], Command line: [2]'),
('RemoveFiles', 'Removing files', 'File: [1], Directory: [9]'),
('RemoveFolders', 'Removing folders', 'Folder: [1]'),
('RemoveIniValues', 'Removing INI files entries', 'File: [1], Section: [2], Key: [3], Value: [4]'),
('RemoveODBC', 'Removing ODBC components', None),
('RemoveRegistryValues', 'Removing system registry values', 'Key: [1], Name: [2]'),
('RemoveShortcuts', 'Removing shortcuts', 'Shortcut: [1]'),
('RMCCPSearch', 'Searching for qualifying products', None),
('SelfRegModules', 'Registering modules', 'File: [1], Folder: [2]'),
('SelfUnregModules', 'Unregistering modules', 'File: [1], Folder: [2]'),
('SetODBCFolders', 'Initializing ODBC directories', None),
('StartServices', 'Starting services', 'Service: [1]'),
('StopServices', 'Stopping services', 'Service: [1]'),
('UnpublishComponents', 'Unpublishing Qualified Components', 'Component ID: [1], Qualifier: [2]'),
('UnpublishFeatures', 'Unpublishing Product Features', 'Feature: [1]'),
('UnregisterClassInfo', 'Unregister Class servers', 'Class Id: [1]'),
('UnregisterComPlus', 'Unregistering COM+ Applications and Components', 'AppId: [1]{{, AppType: [2]}}'),
('UnregisterExtensionInfo', 'Unregistering extension servers', 'Extension: [1]'),
('UnregisterFonts', 'Unregistering fonts', 'Font: [1]'),
('UnregisterMIMEInfo', 'Unregistering MIME info', 'MIME Content Type: [1], Extension: [2]'),
('UnregisterProgIdInfo', 'Unregistering program identifiers', 'ProgId: [1]'),
('UnregisterTypeLibraries', 'Unregistering type libraries', 'LibID: [1]'),
('WriteEnvironmentStrings', 'Updating environment strings', 'Name: [1], Value: [2], Action [3]'),
('WriteIniValues', 'Writing INI files values', 'File: [1], Section: [2], Key: [3], Value: [4]'),
('WriteRegistryValues', 'Writing system registry values', 'Key: [1], Name: [2], Value: [3]'),
('Advertise', 'Advertising application', None),
('GenerateScript', 'Generating script operations for action:', '[1]'),
('InstallSFPCatalogFile', 'Installing system catalog', 'File: [1], Dependencies: [2]'),
('MsiPublishAssemblies', 'Publishing assembly information', 'Application Context:[1], Assembly Name:[2]'),
('MsiUnpublishAssemblies', 'Unpublishing assembly information', 'Application Context:[1], Assembly Name:[2]'),
('Rollback', 'Rolling back action:', '[1]'),
('RollbackCleanup', 'Removing backup files', 'File: [1]'),
('UnmoveFiles', 'Removing moved files', 'File: [1], Directory: [9]'),
('UnpublishProduct', 'Unpublishing product information', None),
]
UIText = [
('AbsentPath', None),
('bytes', 'bytes'),
('GB', 'GB'),
('KB', 'KB'),
('MB', 'MB'),
('MenuAbsent', 'Entire feature will be unavailable'),
('MenuAdvertise', 'Feature will be installed when required'),
('MenuAllCD', 'Entire feature will be installed to run from CD'),
('MenuAllLocal', 'Entire feature will be installed on local hard drive'),
('MenuAllNetwork', 'Entire feature will be installed to run from network'),
('MenuCD', 'Will be installed to run from CD'),
('MenuLocal', 'Will be installed on local hard drive'),
('MenuNetwork', 'Will be installed to run from network'),
('ScriptInProgress', 'Gathering required information...'),
('SelAbsentAbsent', 'This feature will remain uninstalled'),
('SelAbsentAdvertise', 'This feature will be set to be installed when required'),
('SelAbsentCD', 'This feature will be installed to run from CD'),
('SelAbsentLocal', 'This feature will be installed on the local hard drive'),
('SelAbsentNetwork', 'This feature will be installed to run from the network'),
('SelAdvertiseAbsent', 'This feature will become unavailable'),
('SelAdvertiseAdvertise', 'Will be installed when required'),
('SelAdvertiseCD', 'This feature will be available to run from CD'),
('SelAdvertiseLocal', 'This feature will be installed on your local hard drive'),
('SelAdvertiseNetwork', 'This feature will be available to run from the network'),
('SelCDAbsent', "This feature will be uninstalled completely, you won't be able to run it from CD"),
('SelCDAdvertise', 'This feature will change from run from CD state to set to be installed when required'),
('SelCDCD', 'This feature will remain to be run from CD'),
('SelCDLocal', 'This feature will change from run from CD state to be installed on the local hard drive'),
('SelChildCostNeg', 'This feature frees up [1] on your hard drive.'),
('SelChildCostPos', 'This feature requires [1] on your hard drive.'),
('SelCostPending', 'Compiling cost for this feature...'),
('SelLocalAbsent', 'This feature will be completely removed'),
('SelLocalAdvertise', 'This feature will be removed from your local hard drive, but will be set to be installed when required'),
('SelLocalCD', 'This feature will be removed from your local hard drive, but will be still available to run from CD'),
('SelLocalLocal', 'This feature will remain on your local hard drive'),
('SelLocalNetwork', 'This feature will be removed from your local hard drive, but will be still available to run from the network'),
('SelNetworkAbsent', "This feature will be uninstalled completely, you won't be able to run it from the network"),
('SelNetworkAdvertise', 'This feature will change from run from network state to set to be installed when required'),
('SelNetworkLocal', 'This feature will change from run from network state to be installed on the local hard drive'),
('SelNetworkNetwork', 'This feature will remain to be run from the network'),
('SelParentCostNegNeg', 'This feature frees up [1] on your hard drive. It has [2] of [3] subfeatures selected. The subfeatures free up [4] on your hard drive.'),
('SelParentCostNegPos', 'This feature frees up [1] on your hard drive. It has [2] of [3] subfeatures selected. The subfeatures require [4] on your hard drive.'),
('SelParentCostPosNeg', 'This feature requires [1] on your hard drive. It has [2] of [3] subfeatures selected. The subfeatures free up [4] on your hard drive.'),
('SelParentCostPosPos', 'This feature requires [1] on your hard drive. It has [2] of [3] subfeatures selected. The subfeatures require [4] on your hard drive.'),
('TimeRemaining', 'Time remaining: {[1] minutes }{[2] seconds}'),
('VolumeCostAvailable', 'Available'),
('VolumeCostDifference', 'Difference'),
('VolumeCostRequired', 'Required'),
('VolumeCostSize', 'Disk Size'),
('VolumeCostVolume', 'Volume'),
]
tables=['ActionText', 'UIText']
# Copyright (C) 2005 Martin v. Löwis
# Licensed to PSF under a Contributor Agreement.
from _msi import *
import glob, os, string, re, sys
AMD64 = "AMD64" in sys.version
Itanium = "Itanium" in sys.version
Win64 = AMD64 or Itanium
# Partially taken from Wine
datasizemask= 0x00ff
type_valid= 0x0100
type_localizable= 0x0200
typemask= 0x0c00
type_long= 0x0000
type_short= 0x0400
type_string= 0x0c00
type_binary= 0x0800
type_nullable= 0x1000
type_key= 0x2000
# XXX temporary, localizable?
knownbits = datasizemask | type_valid | type_localizable | \
typemask | type_nullable | type_key
class Table:
def __init__(self, name):
self.name = name
self.fields = []
def add_field(self, index, name, type):
self.fields.append((index,name,type))
def sql(self):
fields = []
keys = []
self.fields.sort()
fields = [None]*len(self.fields)
for index, name, type in self.fields:
index -= 1
unk = type & ~knownbits
if unk:
print("%s.%s unknown bits %x" % (self.name, name, unk))
size = type & datasizemask
dtype = type & typemask
if dtype == type_string:
if size:
tname="CHAR(%d)" % size
else:
tname="CHAR"
elif dtype == type_short:
assert size==2
tname = "SHORT"
elif dtype == type_long:
assert size==4
tname="LONG"
elif dtype == type_binary:
assert size==0
tname="OBJECT"
else:
tname="unknown"
print("%s.%sunknown integer type %d" % (self.name, name, size))
if type & type_nullable:
flags = ""
else:
flags = " NOT NULL"
if type & type_localizable:
flags += " LOCALIZABLE"
fields[index] = "`%s` %s%s" % (name, tname, flags)
if type & type_key:
keys.append("`%s`" % name)
fields = ", ".join(fields)
keys = ", ".join(keys)
return "CREATE TABLE %s (%s PRIMARY KEY %s)" % (self.name, fields, keys)
def create(self, db):
v = db.OpenView(self.sql())
v.Execute(None)
v.Close()
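# A minimal sketch of how Table and the Wine-derived type bits above fit
# together: a hypothetical two-column table with a key string column and a
# nullable LONG.  (The table name and column sizes are invented.)
def _demo_table_sql():
    t = Table('Demo')
    t.add_field(1, 'Name', type_valid | type_key | type_string | 72)
    t.add_field(2, 'Value', type_valid | type_nullable | type_long | 4)
    assert t.sql() == ("CREATE TABLE Demo (`Name` CHAR(72) NOT NULL, "
                       "`Value` LONG PRIMARY KEY `Name`)")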
class _Unspecified:
    pass
def change_sequence(seq, action, seqno=_Unspecified, cond = _Unspecified):
"Change the sequence number of an action in a sequence list"
for i in range(len(seq)):
if seq[i][0] == action:
if cond is _Unspecified:
cond = seq[i][1]
if seqno is _Unspecified:
seqno = seq[i][2]
seq[i] = (action, cond, seqno)
return
raise ValueError("Action not found in sequence")
def add_data(db, table, values):
v = db.OpenView("SELECT * FROM `%s`" % table)
count = v.GetColumnInfo(MSICOLINFO_NAMES).GetFieldCount()
r = CreateRecord(count)
for value in values:
assert len(value) == count, value
for i in range(count):
field = value[i]
if isinstance(field, int):
r.SetInteger(i+1,field)
elif isinstance(field, str):
r.SetString(i+1,field)
elif field is None:
pass
elif isinstance(field, Binary):
r.SetStream(i+1, field.name)
else:
raise TypeError("Unsupported type %s" % field.__class__.__name__)
try:
v.Modify(MSIMODIFY_INSERT, r)
except Exception as e:
raise MSIError("Could not insert " + repr(values) + " into " + table) from e
r.ClearData()
v.Close()
def add_stream(db, name, path):
v = db.OpenView("INSERT INTO _Streams (Name, Data) VALUES ('%s', ?)" % name)
r = CreateRecord(1)
r.SetStream(1, path)
v.Execute(r)
v.Close()
def init_database(name, schema,
ProductName, ProductCode, ProductVersion,
Manufacturer):
try:
os.unlink(name)
except OSError:
pass
ProductCode = ProductCode.upper()
# Create the database
db = OpenDatabase(name, MSIDBOPEN_CREATE)
# Create the tables
for t in schema.tables:
t.create(db)
# Fill the validation table
add_data(db, "_Validation", schema._Validation_records)
# Initialize the summary information, allowing at most 20 properties
si = db.GetSummaryInformation(20)
si.SetProperty(PID_TITLE, "Installation Database")
si.SetProperty(PID_SUBJECT, ProductName)
si.SetProperty(PID_AUTHOR, Manufacturer)
if Itanium:
si.SetProperty(PID_TEMPLATE, "Intel64;1033")
elif AMD64:
si.SetProperty(PID_TEMPLATE, "x64;1033")
else:
si.SetProperty(PID_TEMPLATE, "Intel;1033")
si.SetProperty(PID_REVNUMBER, gen_uuid())
si.SetProperty(PID_WORDCOUNT, 2) # long file names, compressed, original media
si.SetProperty(PID_PAGECOUNT, 200)
si.SetProperty(PID_APPNAME, "Python MSI Library")
# XXX more properties
si.Persist()
add_data(db, "Property", [
("ProductName", ProductName),
("ProductCode", ProductCode),
("ProductVersion", ProductVersion),
("Manufacturer", Manufacturer),
("ProductLanguage", "1033")])
db.Commit()
return db
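# A hedged usage sketch for init_database(): Windows-only, since it needs the
# _msi extension; msilib.schema supplies the table definitions and
# _Validation records.  The product details here are invented.
def _demo_init_database():
    from msilib import schema
    db = init_database("demo.msi", schema, "DemoProduct",
                       gen_uuid(), "1.0.0", "Example Corp")
    db.Commit()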
def add_tables(db, module):
for table in module.tables:
add_data(db, table, getattr(module, table))
def make_id(str):
identifier_chars = string.ascii_letters + string.digits + "._"
str = "".join([c if c in identifier_chars else "_" for c in str])
if str[0] in (string.digits + "."):
str = "_" + str
assert re.match("^[A-Za-z_][A-Za-z0-9_.]*$", str), "FILE"+str
return str
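# make_id() maps arbitrary file names onto legal MSI identifiers: anything
# outside [A-Za-z0-9._] becomes '_', and a leading digit or dot gains a '_'
# prefix.  Two hypothetical inputs:
def _demo_make_id():
    assert make_id("python-3.4.py") == "python_3.4.py"
    assert make_id("2to3") == "_2to3"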
def gen_uuid():
return "{"+UuidCreate().upper()+"}"
class CAB:
def __init__(self, name):
self.name = name
self.files = []
self.filenames = set()
self.index = 0
def gen_id(self, file):
logical = _logical = make_id(file)
pos = 1
while logical in self.filenames:
logical = "%s.%d" % (_logical, pos)
pos += 1
self.filenames.add(logical)
return logical
def append(self, full, file, logical):
if os.path.isdir(full):
return
if not logical:
logical = self.gen_id(file)
self.index += 1
self.files.append((full, logical))
return self.index, logical
def commit(self, db):
from tempfile import mktemp
filename = mktemp()
FCICreate(filename, self.files)
add_data(db, "Media",
[(1, self.index, None, "#"+self.name, None, None)])
add_stream(db, self.name, filename)
os.unlink(filename)
db.Commit()
_directories = set()
class Directory:
def __init__(self, db, cab, basedir, physical, _logical, default, componentflags=None):
"""Create a new directory in the Directory table. There is a current component
at each point in time for the directory, which is either explicitly created
through start_component, or implicitly when files are added for the first
time. Files are added into the current component, and into the cab file.
To create a directory, a base directory object needs to be specified (can be
None), the path to the physical directory, and a logical directory name.
Default specifies the DefaultDir slot in the directory table. componentflags
specifies the default flags that new components get."""
index = 1
_logical = make_id(_logical)
logical = _logical
while logical in _directories:
logical = "%s%d" % (_logical, index)
index += 1
_directories.add(logical)
self.db = db
self.cab = cab
self.basedir = basedir
self.physical = physical
self.logical = logical
self.component = None
self.short_names = set()
self.ids = set()
self.keyfiles = {}
self.componentflags = componentflags
if basedir:
self.absolute = os.path.join(basedir.absolute, physical)
blogical = basedir.logical
else:
self.absolute = physical
blogical = None
add_data(db, "Directory", [(logical, blogical, default)])
def start_component(self, component = None, feature = None, flags = None, keyfile = None, uuid=None):
"""Add an entry to the Component table, and make this component the current for this
directory. If no component name is given, the directory name is used. If no feature
is given, the current feature is used. If no flags are given, the directory's default
flags are used. If no keyfile is given, the KeyPath is left null in the Component
table."""
if flags is None:
flags = self.componentflags
if uuid is None:
uuid = gen_uuid()
else:
uuid = uuid.upper()
if component is None:
component = self.logical
self.component = component
if Win64:
flags |= 256
if keyfile:
keyid = self.cab.gen_id(keyfile)
self.keyfiles[keyfile] = keyid
else:
keyid = None
add_data(self.db, "Component",
[(component, uuid, self.logical, flags, None, keyid)])
if feature is None:
feature = current_feature
add_data(self.db, "FeatureComponents",
[(feature.id, component)])
def make_short(self, file):
oldfile = file
file = file.replace('+', '_')
file = ''.join(c for c in file if not c in ' "/\\[]:;=,')
parts = file.split(".")
if len(parts) > 1:
prefix = "".join(parts[:-1]).upper()
suffix = parts[-1].upper()
if not prefix:
prefix = suffix
suffix = None
else:
prefix = file.upper()
suffix = None
if len(parts) < 3 and len(prefix) <= 8 and file == oldfile and (
not suffix or len(suffix) <= 3):
if suffix:
file = prefix+"."+suffix
else:
file = prefix
else:
file = None
if file is None or file in self.short_names:
prefix = prefix[:6]
if suffix:
suffix = suffix[:3]
pos = 1
while 1:
if suffix:
file = "%s~%d.%s" % (prefix, pos, suffix)
else:
file = "%s~%d" % (prefix, pos)
if file not in self.short_names: break
pos += 1
assert pos < 10000
if pos in (10, 100, 1000):
prefix = prefix[:-1]
self.short_names.add(file)
assert not re.search(r'[\?|><:/*"+,;=\[\]]', file) # restrictions on short names
return file
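# make_short() implements the classic 8.3 scheme: names that already fit
# (prefix <= 8 characters, extension <= 3, no forbidden characters) pass
# through upper-cased; everything else is truncated to PREFIX~N.EXT with N
# picked to dodge collisions within this directory's short_names set.
# Hypothetical inputs and the names they would produce:
#     make_short("readme.txt")        -> "README.TXT"
#     make_short("longfilename.txt")  -> "LONGFI~1.TXT"
#     make_short("longfilename2.txt") -> "LONGFI~2.TXT"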
def add_file(self, file, src=None, version=None, language=None):
"""Add a file to the current component of the directory, starting a new one
if there is no current component. By default, the file name in the source
and the file table will be identical. If the src file is specified, it is
interpreted relative to the current directory. Optionally, a version and a
language can be specified for the entry in the File table."""
if not self.component:
self.start_component(self.logical, current_feature, 0)
if not src:
# Allow relative paths for file if src is not specified
src = file
file = os.path.basename(file)
absolute = os.path.join(self.absolute, src)
assert not re.search(r'[\?|><:/*"]', file) # restrictions on long names
if file in self.keyfiles:
logical = self.keyfiles[file]
else:
logical = None
sequence, logical = self.cab.append(absolute, file, logical)
assert logical not in self.ids
self.ids.add(logical)
short = self.make_short(file)
full = "%s|%s" % (short, file)
filesize = os.stat(absolute).st_size
# constants.msidbFileAttributesVital
# Compressed omitted, since it is the database default
# could add r/o, system, hidden
attributes = 512
add_data(self.db, "File",
[(logical, self.component, full, filesize, version,
language, attributes, sequence)])
#if not version:
# # Add hash if the file is not versioned
# filehash = FileHash(absolute, 0)
# add_data(self.db, "MsiFileHash",
# [(logical, 0, filehash.IntegerData(1),
# filehash.IntegerData(2), filehash.IntegerData(3),
# filehash.IntegerData(4))])
# Automatically remove .pyc/.pyo files on uninstall (2)
# XXX: adding so many RemoveFile entries makes installer unbelievably
# slow. So instead, we have to use wildcard remove entries
if file.endswith(".py"):
add_data(self.db, "RemoveFile",
[(logical+"c", self.component, "%sC|%sc" % (short, file),
self.logical, 2),
(logical+"o", self.component, "%sO|%so" % (short, file),
self.logical, 2)])
return logical
def glob(self, pattern, exclude = None):
"""Add a list of files to the current component as specified in the
glob pattern. Individual files can be excluded in the exclude list."""
files = glob.glob1(self.absolute, pattern)
for f in files:
if exclude and f in exclude: continue
self.add_file(f)
return files
def remove_pyc(self):
"Remove .pyc/.pyo files on uninstall"
add_data(self.db, "RemoveFile",
[(self.component+"c", self.component, "*.pyc", self.logical, 2),
(self.component+"o", self.component, "*.pyo", self.logical, 2)])
class Binary:
def __init__(self, fname):
self.name = fname
def __repr__(self):
return 'msilib.Binary(os.path.join(dirname,"%s"))' % self.name
class Feature:
def __init__(self, db, id, title, desc, display, level = 1,
parent=None, directory = None, attributes=0):
self.id = id
if parent:
parent = parent.id
add_data(db, "Feature",
[(id, parent, title, desc, display,
level, directory, attributes)])
def set_current(self):
global current_feature
current_feature = self
class Control:
def __init__(self, dlg, name):
self.dlg = dlg
self.name = name
def event(self, event, argument, condition = "1", ordering = None):
add_data(self.dlg.db, "ControlEvent",
[(self.dlg.name, self.name, event, argument,
condition, ordering)])
def mapping(self, event, attribute):
add_data(self.dlg.db, "EventMapping",
[(self.dlg.name, self.name, event, attribute)])
def condition(self, action, condition):
add_data(self.dlg.db, "ControlCondition",
[(self.dlg.name, self.name, action, condition)])
class RadioButtonGroup(Control):
def __init__(self, dlg, name, property):
self.dlg = dlg
self.name = name
self.property = property
self.index = 1
def add(self, name, x, y, w, h, text, value = None):
if value is None:
value = name
add_data(self.dlg.db, "RadioButton",
[(self.property, self.index, value,
x, y, w, h, text, None)])
self.index += 1
class Dialog:
def __init__(self, db, name, x, y, w, h, attr, title, first, default, cancel):
self.db = db
self.name = name
self.x, self.y, self.w, self.h = x,y,w,h
add_data(db, "Dialog", [(name, x,y,w,h,attr,title,first,default,cancel)])
def control(self, name, type, x, y, w, h, attr, prop, text, next, help):
add_data(self.db, "Control",
[(self.name, name, type, x, y, w, h, attr, prop, text, next, help)])
return Control(self, name)
def text(self, name, x, y, w, h, attr, text):
return self.control(name, "Text", x, y, w, h, attr, None,
text, None, None)
def bitmap(self, name, x, y, w, h, text):
return self.control(name, "Bitmap", x, y, w, h, 1, None, text, None, None)
def line(self, name, x, y, w, h):
return self.control(name, "Line", x, y, w, h, 1, None, None, None, None)
def pushbutton(self, name, x, y, w, h, attr, text, next):
return self.control(name, "PushButton", x, y, w, h, attr, None, text, next, None)
def radiogroup(self, name, x, y, w, h, attr, prop, text, next):
add_data(self.db, "Control",
[(self.name, name, "RadioButtonGroup",
x, y, w, h, attr, prop, text, next, None)])
return RadioButtonGroup(self, name, prop)
def checkbox(self, name, x, y, w, h, attr, prop, text, next):
return self.control(name, "CheckBox", x, y, w, h, attr, prop, text, next, None)
#
# A higher level module for using sockets (or Windows named pipes)
#
# multiprocessing/connection.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = [ 'Client', 'Listener', 'Pipe', 'wait' ]
import io
import os
import sys
import socket
import struct
import time
import tempfile
import itertools
import _multiprocessing
from . import reduction
from . import util
from . import AuthenticationError, BufferTooShort
from .reduction import ForkingPickler
try:
import _winapi
from _winapi import WAIT_OBJECT_0, WAIT_ABANDONED_0, WAIT_TIMEOUT, INFINITE
except ImportError:
if sys.platform == 'win32':
raise
_winapi = None
#
#
#
BUFSIZE = 8192
# A very generous timeout when it comes to local connections...
CONNECTION_TIMEOUT = 20.
_mmap_counter = itertools.count()
default_family = 'AF_INET'
families = ['AF_INET']
if hasattr(socket, 'AF_UNIX'):
default_family = 'AF_UNIX'
families += ['AF_UNIX']
if sys.platform == 'win32':
default_family = 'AF_PIPE'
families += ['AF_PIPE']
def _init_timeout(timeout=CONNECTION_TIMEOUT):
return time.time() + timeout
def _check_timeout(t):
return time.time() > t
#
#
#
def arbitrary_address(family):
'''
Return an arbitrary free address for the given family
'''
if family == 'AF_INET':
return ('localhost', 0)
elif family == 'AF_UNIX':
return tempfile.mktemp(prefix='listener-', dir=util.get_temp_dir())
elif family == 'AF_PIPE':
return tempfile.mktemp(prefix=r'\\.\pipe\pyc-%d-%d-' %
(os.getpid(), next(_mmap_counter)), dir="")
else:
raise ValueError('unrecognized family')
def _validate_family(family):
'''
Checks if the family is valid for the current environment.
'''
if sys.platform != 'win32' and family == 'AF_PIPE':
raise ValueError('Family %s is not recognized.' % family)
if sys.platform == 'win32' and family == 'AF_UNIX':
# double check
if not hasattr(socket, family):
raise ValueError('Family %s is not recognized.' % family)
def address_type(address):
'''
Return the type of the address
This can be 'AF_INET', 'AF_UNIX', or 'AF_PIPE'
'''
if type(address) == tuple:
return 'AF_INET'
elif type(address) is str and address.startswith('\\\\'):
return 'AF_PIPE'
elif type(address) is str:
return 'AF_UNIX'
else:
raise ValueError('address type of %r unrecognized' % address)
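# A quick sketch of the three address forms accepted by Listener and Client:
def _demo_address_type():
    assert address_type(('localhost', 0)) == 'AF_INET'
    assert address_type(r'\\.\pipe\demo') == 'AF_PIPE'
    assert address_type('/tmp/listener-demo') == 'AF_UNIX'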
#
# Connection classes
#
class _ConnectionBase:
_handle = None
def __init__(self, handle, readable=True, writable=True):
handle = handle.__index__()
if handle < 0:
raise ValueError("invalid handle")
if not readable and not writable:
raise ValueError(
"at least one of `readable` and `writable` must be True")
self._handle = handle
self._readable = readable
self._writable = writable
# XXX should we use util.Finalize instead of a __del__?
def __del__(self):
if self._handle is not None:
self._close()
def _check_closed(self):
if self._handle is None:
raise OSError("handle is closed")
def _check_readable(self):
if not self._readable:
raise OSError("connection is write-only")
def _check_writable(self):
if not self._writable:
raise OSError("connection is read-only")
def _bad_message_length(self):
if self._writable:
self._readable = False
else:
self.close()
raise OSError("bad message length")
@property
def closed(self):
"""True if the connection is closed"""
return self._handle is None
@property
def readable(self):
"""True if the connection is readable"""
return self._readable
@property
def writable(self):
"""True if the connection is writable"""
return self._writable
def fileno(self):
"""File descriptor or handle of the connection"""
self._check_closed()
return self._handle
def close(self):
"""Close the connection"""
if self._handle is not None:
try:
self._close()
finally:
self._handle = None
def send_bytes(self, buf, offset=0, size=None):
"""Send the bytes data from a bytes-like object"""
self._check_closed()
self._check_writable()
m = memoryview(buf)
# HACK for byte-indexing of non-bytewise buffers (e.g. array.array)
if m.itemsize > 1:
m = memoryview(bytes(m))
n = len(m)
if offset < 0:
raise ValueError("offset is negative")
if n < offset:
raise ValueError("buffer length < offset")
if size is None:
size = n - offset
elif size < 0:
raise ValueError("size is negative")
elif offset + size > n:
raise ValueError("buffer length < offset + size")
self._send_bytes(m[offset:offset + size])
def send(self, obj):
"""Send a (picklable) object"""
self._check_closed()
self._check_writable()
self._send_bytes(ForkingPickler.dumps(obj))
def recv_bytes(self, maxlength=None):
"""
Receive bytes data as a bytes object.
"""
self._check_closed()
self._check_readable()
if maxlength is not None and maxlength < 0:
raise ValueError("negative maxlength")
buf = self._recv_bytes(maxlength)
if buf is None:
self._bad_message_length()
return buf.getvalue()
def recv_bytes_into(self, buf, offset=0):
"""
Receive bytes data into a writeable bytes-like object.
Return the number of bytes read.
"""
self._check_closed()
self._check_readable()
with memoryview(buf) as m:
# Get bytesize of arbitrary buffer
itemsize = m.itemsize
bytesize = itemsize * len(m)
if offset < 0:
raise ValueError("negative offset")
elif offset > bytesize:
raise ValueError("offset too large")
result = self._recv_bytes()
size = result.tell()
if bytesize < offset + size:
raise BufferTooShort(result.getvalue())
# Message can fit in dest
result.seek(0)
result.readinto(m[offset // itemsize :
(offset + size) // itemsize])
return size
def recv(self):
"""Receive a (picklable) object"""
self._check_closed()
self._check_readable()
buf = self._recv_bytes()
return ForkingPickler.loads(buf.getbuffer())
def poll(self, timeout=0.0):
"""Whether there is any input available to be read"""
self._check_closed()
self._check_readable()
return self._poll(timeout)
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, exc_tb):
self.close()
if _winapi:
class PipeConnection(_ConnectionBase):
"""
Connection class based on a Windows named pipe.
Overlapped I/O is used, so the handles must have been created
with FILE_FLAG_OVERLAPPED.
"""
_got_empty_message = False
def _close(self, _CloseHandle=_winapi.CloseHandle):
_CloseHandle(self._handle)
def _send_bytes(self, buf):
ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
try:
if err == _winapi.ERROR_IO_PENDING:
waitres = _winapi.WaitForMultipleObjects(
[ov.event], False, INFINITE)
assert waitres == WAIT_OBJECT_0
except:
ov.cancel()
raise
finally:
nwritten, err = ov.GetOverlappedResult(True)
assert err == 0
assert nwritten == len(buf)
def _recv_bytes(self, maxsize=None):
if self._got_empty_message:
self._got_empty_message = False
return io.BytesIO()
else:
bsize = 128 if maxsize is None else min(maxsize, 128)
try:
ov, err = _winapi.ReadFile(self._handle, bsize,
overlapped=True)
try:
if err == _winapi.ERROR_IO_PENDING:
waitres = _winapi.WaitForMultipleObjects(
[ov.event], False, INFINITE)
assert waitres == WAIT_OBJECT_0
except:
ov.cancel()
raise
finally:
nread, err = ov.GetOverlappedResult(True)
if err == 0:
f = io.BytesIO()
f.write(ov.getbuffer())
return f
elif err == _winapi.ERROR_MORE_DATA:
return self._get_more_data(ov, maxsize)
except OSError as e:
if e.winerror == _winapi.ERROR_BROKEN_PIPE:
raise EOFError
else:
raise
raise RuntimeError("shouldn't get here; expected KeyboardInterrupt")
def _poll(self, timeout):
if (self._got_empty_message or
_winapi.PeekNamedPipe(self._handle)[0] != 0):
return True
return bool(wait([self], timeout))
def _get_more_data(self, ov, maxsize):
buf = ov.getbuffer()
f = io.BytesIO()
f.write(buf)
left = _winapi.PeekNamedPipe(self._handle)[1]
assert left > 0
if maxsize is not None and len(buf) + left > maxsize:
self._bad_message_length()
ov, err = _winapi.ReadFile(self._handle, left, overlapped=True)
rbytes, err = ov.GetOverlappedResult(True)
assert err == 0
assert rbytes == left
f.write(ov.getbuffer())
return f
class Connection(_ConnectionBase):
"""
Connection class based on an arbitrary file descriptor (Unix only), or
a socket handle (Windows).
"""
if _winapi:
def _close(self, _close=_multiprocessing.closesocket):
_close(self._handle)
_write = _multiprocessing.send
_read = _multiprocessing.recv
else:
def _close(self, _close=os.close):
_close(self._handle)
_write = os.write
_read = os.read
def _send(self, buf, write=_write):
remaining = len(buf)
while True:
try:
n = write(self._handle, buf)
except InterruptedError:
continue
remaining -= n
if remaining == 0:
break
buf = buf[n:]
def _recv(self, size, read=_read):
buf = io.BytesIO()
handle = self._handle
remaining = size
while remaining > 0:
try:
chunk = read(handle, remaining)
except InterruptedError:
continue
n = len(chunk)
if n == 0:
if remaining == size:
raise EOFError
else:
raise OSError("got end of file during message")
buf.write(chunk)
remaining -= n
return buf
def _send_bytes(self, buf):
n = len(buf)
# For wire compatibility with 3.2 and lower
header = struct.pack("!i", n)
if n > 16384:
# The payload is large so Nagle's algorithm won't be triggered
# and we'd better avoid the cost of concatenation.
chunks = [header, buf]
elif n > 0:
# Issue # 20540: concatenate before sending, to avoid delays due
# to Nagle's algorithm on a TCP socket.
chunks = [header + buf]
else:
# This code path is necessary to avoid "broken pipe" errors
# when sending a 0-length buffer if the other end closed the pipe.
chunks = [header]
for chunk in chunks:
self._send(chunk)
def _recv_bytes(self, maxsize=None):
buf = self._recv(4)
size, = struct.unpack("!i", buf.getvalue())
if maxsize is not None and size > maxsize:
return None
return self._recv(size)
def _poll(self, timeout):
r = wait([self], timeout)
return bool(r)
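# Connection frames every message with a 4-byte big-endian signed length
# header ("!i"), as noted above for wire compatibility with 3.2 and lower.
# A sketch of the framing for a hypothetical 5-byte payload:
def _demo_wire_format():
    payload = b"hello"
    frame = struct.pack("!i", len(payload)) + payload
    assert frame == b"\x00\x00\x00\x05hello"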
#
# Public functions
#
class Listener(object):
'''
Returns a listener object.
This is a wrapper for a bound socket which is 'listening' for
connections, or for a Windows named pipe.
'''
def __init__(self, address=None, family=None, backlog=1, authkey=None):
family = family or (address and address_type(address)) \
or default_family
address = address or arbitrary_address(family)
_validate_family(family)
if family == 'AF_PIPE':
self._listener = PipeListener(address, backlog)
else:
self._listener = SocketListener(address, family, backlog)
if authkey is not None and not isinstance(authkey, bytes):
raise TypeError('authkey should be a byte string')
self._authkey = authkey
def accept(self):
'''
Accept a connection on the bound socket or named pipe of `self`.
Returns a `Connection` object.
'''
if self._listener is None:
raise OSError('listener is closed')
c = self._listener.accept()
if self._authkey:
deliver_challenge(c, self._authkey)
answer_challenge(c, self._authkey)
return c
def close(self):
'''
Close the bound socket or named pipe of `self`.
'''
listener = self._listener
if listener is not None:
self._listener = None
listener.close()
address = property(lambda self: self._listener._address)
last_accepted = property(lambda self: self._listener._last_accepted)
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, exc_tb):
self.close()
def Client(address, family=None, authkey=None):
'''
Returns a connection to the address of a `Listener`
'''
family = family or address_type(address)
_validate_family(family)
if family == 'AF_PIPE':
c = PipeClient(address)
else:
c = SocketClient(address)
if authkey is not None and not isinstance(authkey, bytes):
raise TypeError('authkey should be a byte string')
if authkey is not None:
answer_challenge(c, authkey)
deliver_challenge(c, authkey)
return c
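# A minimal round-trip sketch with the public API: a helper thread serves one
# connection while the main thread connects, authenticates with the shared
# key and exchanges a picklable object.
def _demo_listener_client():
    import threading
    with Listener(authkey=b'secret') as listener:
        def _serve():
            with listener.accept() as conn:
                conn.send(conn.recv() * 2)
        t = threading.Thread(target=_serve)
        t.start()
        with Client(listener.address, authkey=b'secret') as conn:
            conn.send(21)
            assert conn.recv() == 42
        t.join()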
if sys.platform != 'win32':
def Pipe(duplex=True):
'''
Returns pair of connection objects at either end of a pipe
'''
if duplex:
s1, s2 = socket.socketpair()
s1.setblocking(True)
s2.setblocking(True)
c1 = Connection(s1.detach())
c2 = Connection(s2.detach())
else:
fd1, fd2 = os.pipe()
c1 = Connection(fd1, writable=False)
c2 = Connection(fd2, readable=False)
return c1, c2
else:
def Pipe(duplex=True):
'''
Returns pair of connection objects at either end of a pipe
'''
address = arbitrary_address('AF_PIPE')
if duplex:
openmode = _winapi.PIPE_ACCESS_DUPLEX
access = _winapi.GENERIC_READ | _winapi.GENERIC_WRITE
obsize, ibsize = BUFSIZE, BUFSIZE
else:
openmode = _winapi.PIPE_ACCESS_INBOUND
access = _winapi.GENERIC_WRITE
obsize, ibsize = 0, BUFSIZE
h1 = _winapi.CreateNamedPipe(
address, openmode | _winapi.FILE_FLAG_OVERLAPPED |
_winapi.FILE_FLAG_FIRST_PIPE_INSTANCE,
_winapi.PIPE_TYPE_MESSAGE | _winapi.PIPE_READMODE_MESSAGE |
_winapi.PIPE_WAIT,
1, obsize, ibsize, _winapi.NMPWAIT_WAIT_FOREVER,
# default security descriptor: the handle cannot be inherited
_winapi.NULL
)
h2 = _winapi.CreateFile(
address, access, 0, _winapi.NULL, _winapi.OPEN_EXISTING,
_winapi.FILE_FLAG_OVERLAPPED, _winapi.NULL
)
_winapi.SetNamedPipeHandleState(
h2, _winapi.PIPE_READMODE_MESSAGE, None, None
)
overlapped = _winapi.ConnectNamedPipe(h1, overlapped=True)
_, err = overlapped.GetOverlappedResult(True)
assert err == 0
c1 = PipeConnection(h1, writable=duplex)
c2 = PipeConnection(h2, readable=duplex)
return c1, c2
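# Pipe() usage sketch: with duplex=True (the default) both ends can send and
# receive.
def _demo_pipe():
    c1, c2 = Pipe()
    c1.send({'answer': 42})
    assert c2.recv() == {'answer': 42}
    c1.close()
    c2.close()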
#
# Definitions for connections based on sockets
#
class SocketListener(object):
'''
Representation of a socket which is bound to an address and listening
'''
def __init__(self, address, family, backlog=1):
self._socket = socket.socket(getattr(socket, family))
try:
# SO_REUSEADDR has different semantics on Windows (issue #2550).
if os.name == 'posix':
self._socket.setsockopt(socket.SOL_SOCKET,
socket.SO_REUSEADDR, 1)
self._socket.setblocking(True)
self._socket.bind(address)
self._socket.listen(backlog)
self._address = self._socket.getsockname()
except OSError:
self._socket.close()
raise
self._family = family
self._last_accepted = None
if family == 'AF_UNIX':
self._unlink = util.Finalize(
self, os.unlink, args=(address,), exitpriority=0
)
else:
self._unlink = None
def accept(self):
while True:
try:
s, self._last_accepted = self._socket.accept()
except InterruptedError:
pass
else:
break
s.setblocking(True)
return Connection(s.detach())
def close(self):
try:
self._socket.close()
finally:
unlink = self._unlink
if unlink is not None:
self._unlink = None
unlink()
def SocketClient(address):
'''
Return a connection object connected to the socket given by `address`
'''
family = address_type(address)
with socket.socket( getattr(socket, family) ) as s:
s.setblocking(True)
s.connect(address)
return Connection(s.detach())
#
# Definitions for connections based on named pipes
#
if sys.platform == 'win32':
class PipeListener(object):
'''
Representation of a named pipe
'''
def __init__(self, address, backlog=None):
self._address = address
self._handle_queue = [self._new_handle(first=True)]
self._last_accepted = None
util.sub_debug('listener created with address=%r', self._address)
self.close = util.Finalize(
self, PipeListener._finalize_pipe_listener,
args=(self._handle_queue, self._address), exitpriority=0
)
def _new_handle(self, first=False):
flags = _winapi.PIPE_ACCESS_DUPLEX | _winapi.FILE_FLAG_OVERLAPPED
if first:
flags |= _winapi.FILE_FLAG_FIRST_PIPE_INSTANCE
return _winapi.CreateNamedPipe(
self._address, flags,
_winapi.PIPE_TYPE_MESSAGE | _winapi.PIPE_READMODE_MESSAGE |
_winapi.PIPE_WAIT,
_winapi.PIPE_UNLIMITED_INSTANCES, BUFSIZE, BUFSIZE,
_winapi.NMPWAIT_WAIT_FOREVER, _winapi.NULL
)
def accept(self):
self._handle_queue.append(self._new_handle())
handle = self._handle_queue.pop(0)
try:
ov = _winapi.ConnectNamedPipe(handle, overlapped=True)
except OSError as e:
if e.winerror != _winapi.ERROR_NO_DATA:
raise
# ERROR_NO_DATA can occur if a client has already connected,
# written data and then disconnected -- see Issue 14725.
else:
try:
res = _winapi.WaitForMultipleObjects(
[ov.event], False, INFINITE)
except:
ov.cancel()
_winapi.CloseHandle(handle)
raise
finally:
_, err = ov.GetOverlappedResult(True)
assert err == 0
return PipeConnection(handle)
@staticmethod
def _finalize_pipe_listener(queue, address):
util.sub_debug('closing listener with address=%r', address)
for handle in queue:
_winapi.CloseHandle(handle)
def PipeClient(address):
'''
Return a connection object connected to the pipe given by `address`
'''
t = _init_timeout()
while 1:
try:
_winapi.WaitNamedPipe(address, 1000)
h = _winapi.CreateFile(
address, _winapi.GENERIC_READ | _winapi.GENERIC_WRITE,
0, _winapi.NULL, _winapi.OPEN_EXISTING,
_winapi.FILE_FLAG_OVERLAPPED, _winapi.NULL
)
except OSError as e:
if e.winerror not in (_winapi.ERROR_SEM_TIMEOUT,
_winapi.ERROR_PIPE_BUSY) or _check_timeout(t):
raise
else:
break
else:
raise
_winapi.SetNamedPipeHandleState(
h, _winapi.PIPE_READMODE_MESSAGE, None, None
)
return PipeConnection(h)
#
# Authentication stuff
#
MESSAGE_LENGTH = 20
CHALLENGE = b'#CHALLENGE#'
WELCOME = b'#WELCOME#'
FAILURE = b'#FAILURE#'
def deliver_challenge(connection, authkey):
import hmac
assert isinstance(authkey, bytes)
message = os.urandom(MESSAGE_LENGTH)
connection.send_bytes(CHALLENGE + message)
digest = hmac.new(authkey, message, 'md5').digest()
response = connection.recv_bytes(256) # reject large message
if response == digest:
connection.send_bytes(WELCOME)
else:
connection.send_bytes(FAILURE)
raise AuthenticationError('digest received was wrong')
def answer_challenge(connection, authkey):
import hmac
assert isinstance(authkey, bytes)
message = connection.recv_bytes(256) # reject large message
assert message[:len(CHALLENGE)] == CHALLENGE, 'message = %r' % message
message = message[len(CHALLENGE):]
digest = hmac.new(authkey, message, 'md5').digest()
connection.send_bytes(digest)
response = connection.recv_bytes(256) # reject large message
if response != WELCOME:
raise AuthenticationError('digest sent was rejected')
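# The handshake above is a plain HMAC challenge/response: the server sends
# CHALLENGE plus a random nonce, the client answers with hmac(authkey, nonce),
# and the server replies WELCOME or FAILURE.  A sketch running both halves
# over a local Pipe with a shared key:
def _demo_authentication():
    import threading
    c1, c2 = Pipe()
    t = threading.Thread(target=deliver_challenge, args=(c1, b'secret'))
    t.start()
    answer_challenge(c2, b'secret')
    t.join()
    c1.close()
    c2.close()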
#
# Support for using xmlrpclib for serialization
#
class ConnectionWrapper(object):
def __init__(self, conn, dumps, loads):
self._conn = conn
self._dumps = dumps
self._loads = loads
for attr in ('fileno', 'close', 'poll', 'recv_bytes', 'send_bytes'):
obj = getattr(conn, attr)
setattr(self, attr, obj)
def send(self, obj):
s = self._dumps(obj)
self._conn.send_bytes(s)
def recv(self):
s = self._conn.recv_bytes()
return self._loads(s)
def _xml_dumps(obj):
return xmlrpclib.dumps((obj,), None, None, None, 1).encode('utf-8')
def _xml_loads(s):
(obj,), method = xmlrpclib.loads(s.decode('utf-8'))
return obj
class XmlListener(Listener):
def accept(self):
global xmlrpclib
import xmlrpc.client as xmlrpclib
obj = Listener.accept(self)
return ConnectionWrapper(obj, _xml_dumps, _xml_loads)
def XmlClient(*args, **kwds):
global xmlrpclib
import xmlrpc.client as xmlrpclib
return ConnectionWrapper(Client(*args, **kwds), _xml_dumps, _xml_loads)
#
# Wait
#
if sys.platform == 'win32':
def _exhaustive_wait(handles, timeout):
# Return ALL handles which are currently signalled. (Only
# returning the first signalled might create starvation issues.)
L = list(handles)
ready = []
while L:
res = _winapi.WaitForMultipleObjects(L, False, timeout)
if res == WAIT_TIMEOUT:
break
elif WAIT_OBJECT_0 <= res < WAIT_OBJECT_0 + len(L):
res -= WAIT_OBJECT_0
elif WAIT_ABANDONED_0 <= res < WAIT_ABANDONED_0 + len(L):
res -= WAIT_ABANDONED_0
else:
raise RuntimeError('Should not get here')
ready.append(L[res])
L = L[res+1:]
timeout = 0
return ready
_ready_errors = {_winapi.ERROR_BROKEN_PIPE, _winapi.ERROR_NETNAME_DELETED}
def wait(object_list, timeout=None):
'''
Wait till an object in object_list is ready/readable.
Returns list of those objects in object_list which are ready/readable.
'''
if timeout is None:
timeout = INFINITE
elif timeout < 0:
timeout = 0
else:
timeout = int(timeout * 1000 + 0.5)
object_list = list(object_list)
waithandle_to_obj = {}
ov_list = []
ready_objects = set()
ready_handles = set()
try:
for o in object_list:
try:
fileno = getattr(o, 'fileno')
except AttributeError:
waithandle_to_obj[o.__index__()] = o
else:
# start an overlapped read of length zero
try:
ov, err = _winapi.ReadFile(fileno(), 0, True)
except OSError as e:
ov, err = None, e.winerror
if err not in _ready_errors:
raise
if err == _winapi.ERROR_IO_PENDING:
ov_list.append(ov)
waithandle_to_obj[ov.event] = o
else:
# If o.fileno() is an overlapped pipe handle and
# err == 0 then there is a zero length message
# in the pipe, but it HAS NOT been consumed...
if ov and sys.getwindowsversion()[:2] >= (6, 2):
# ... except on Windows 8 and later, where
# the message HAS been consumed.
try:
_, err = ov.GetOverlappedResult(False)
except OSError as e:
err = e.winerror
if not err and hasattr(o, '_got_empty_message'):
o._got_empty_message = True
ready_objects.add(o)
timeout = 0
ready_handles = _exhaustive_wait(waithandle_to_obj.keys(), timeout)
finally:
# request that overlapped reads stop
for ov in ov_list:
ov.cancel()
# wait for all overlapped reads to stop
for ov in ov_list:
try:
_, err = ov.GetOverlappedResult(True)
except OSError as e:
err = e.winerror
if err not in _ready_errors:
raise
if err != _winapi.ERROR_OPERATION_ABORTED:
o = waithandle_to_obj[ov.event]
ready_objects.add(o)
if err == 0:
# If o.fileno() is an overlapped pipe handle then
# a zero length message HAS been consumed.
if hasattr(o, '_got_empty_message'):
o._got_empty_message = True
ready_objects.update(waithandle_to_obj[h] for h in ready_handles)
return [o for o in object_list if o in ready_objects]
else:
import selectors
# poll/select have the advantage of not requiring any extra file
# descriptor, unlike epoll/kqueue (also, they require a single
# syscall).
if hasattr(selectors, 'PollSelector'):
_WaitSelector = selectors.PollSelector
else:
_WaitSelector = selectors.SelectSelector
def wait(object_list, timeout=None):
'''
Wait till an object in object_list is ready/readable.
Returns list of those objects in object_list which are ready/readable.
'''
with _WaitSelector() as selector:
for obj in object_list:
selector.register(obj, selectors.EVENT_READ)
if timeout is not None:
deadline = time.time() + timeout
while True:
ready = selector.select(timeout)
if ready:
return [key.fileobj for (key, events) in ready]
else:
if timeout is not None:
timeout = deadline - time.time()
if timeout < 0:
return ready
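# wait() usage sketch: the readable end of a Pipe becomes ready once the
# other end has written.
def _demo_wait():
    c1, c2 = Pipe()
    assert wait([c2], timeout=0) == []
    c1.send('ping')
    assert wait([c2], timeout=1.0) == [c2]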
#
# Make connection and socket objects sharable if possible
#
if sys.platform == 'win32':
def reduce_connection(conn):
handle = conn.fileno()
with socket.fromfd(handle, socket.AF_INET, socket.SOCK_STREAM) as s:
from . import resource_sharer
ds = resource_sharer.DupSocket(s)
return rebuild_connection, (ds, conn.readable, conn.writable)
def rebuild_connection(ds, readable, writable):
sock = ds.detach()
return Connection(sock.detach(), readable, writable)
reduction.register(Connection, reduce_connection)
def reduce_pipe_connection(conn):
access = ((_winapi.FILE_GENERIC_READ if conn.readable else 0) |
(_winapi.FILE_GENERIC_WRITE if conn.writable else 0))
dh = reduction.DupHandle(conn.fileno(), access)
return rebuild_pipe_connection, (dh, conn.readable, conn.writable)
def rebuild_pipe_connection(dh, readable, writable):
handle = dh.detach()
return PipeConnection(handle, readable, writable)
reduction.register(PipeConnection, reduce_pipe_connection)
else:
def reduce_connection(conn):
df = reduction.DupFd(conn.fileno())
return rebuild_connection, (df, conn.readable, conn.writable)
def rebuild_connection(df, readable, writable):
fd = df.detach()
return Connection(fd, readable, writable)
reduction.register(Connection, reduce_connection)
import os
import sys
import threading
from . import process
__all__ = [] # things are copied from here to __init__.py
#
# Exceptions
#
class ProcessError(Exception):
pass
class BufferTooShort(ProcessError):
pass
class TimeoutError(ProcessError):
pass
class AuthenticationError(ProcessError):
pass
#
# Base type for contexts
#
class BaseContext(object):
ProcessError = ProcessError
BufferTooShort = BufferTooShort
TimeoutError = TimeoutError
AuthenticationError = AuthenticationError
current_process = staticmethod(process.current_process)
active_children = staticmethod(process.active_children)
def cpu_count(self):
'''Returns the number of CPUs in the system'''
num = os.cpu_count()
if num is None:
raise NotImplementedError('cannot determine number of cpus')
else:
return num
def Manager(self):
'''Returns a manager associated with a running server process
The manager's methods such as `Lock()`, `Condition()` and `Queue()`
can be used to create shared objects.
'''
from .managers import SyncManager
m = SyncManager(ctx=self.get_context())
m.start()
return m
def Pipe(self, duplex=True):
'''Returns two connection objects connected by a pipe'''
from .connection import Pipe
return Pipe(duplex)
def Lock(self):
'''Returns a non-recursive lock object'''
from .synchronize import Lock
return Lock(ctx=self.get_context())
def RLock(self):
'''Returns a recursive lock object'''
from .synchronize import RLock
return RLock(ctx=self.get_context())
def Condition(self, lock=None):
'''Returns a condition object'''
from .synchronize import Condition
return Condition(lock, ctx=self.get_context())
def Semaphore(self, value=1):
'''Returns a semaphore object'''
from .synchronize import Semaphore
return Semaphore(value, ctx=self.get_context())
def BoundedSemaphore(self, value=1):
'''Returns a bounded semaphore object'''
from .synchronize import BoundedSemaphore
return BoundedSemaphore(value, ctx=self.get_context())
def Event(self):
'''Returns an event object'''
from .synchronize import Event
return Event(ctx=self.get_context())
def Barrier(self, parties, action=None, timeout=None):
'''Returns a barrier object'''
from .synchronize import Barrier
return Barrier(parties, action, timeout, ctx=self.get_context())
def Queue(self, maxsize=0):
'''Returns a queue object'''
from .queues import Queue
return Queue(maxsize, ctx=self.get_context())
def JoinableQueue(self, maxsize=0):
'''Returns a queue object'''
from .queues import JoinableQueue
return JoinableQueue(maxsize, ctx=self.get_context())
def SimpleQueue(self):
'''Returns a queue object'''
from .queues import SimpleQueue
return SimpleQueue(ctx=self.get_context())
def Pool(self, processes=None, initializer=None, initargs=(),
maxtasksperchild=None):
'''Returns a process pool object'''
from .pool import Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
context=self.get_context())
def RawValue(self, typecode_or_type, *args):
'''Returns a shared object'''
from .sharedctypes import RawValue
return RawValue(typecode_or_type, *args)
def RawArray(self, typecode_or_type, size_or_initializer):
'''Returns a shared array'''
from .sharedctypes import RawArray
return RawArray(typecode_or_type, size_or_initializer)
def Value(self, typecode_or_type, *args, lock=True):
'''Returns a synchronized shared object'''
from .sharedctypes import Value
return Value(typecode_or_type, *args, lock=lock,
ctx=self.get_context())
def Array(self, typecode_or_type, size_or_initializer, *, lock=True):
'''Returns a synchronized shared array'''
from .sharedctypes import Array
return Array(typecode_or_type, size_or_initializer, lock=lock,
ctx=self.get_context())
def freeze_support(self):
'''Check whether this is a fake forked process in a frozen executable.
If so then run code specified by commandline and exit.
'''
if sys.platform == 'win32' and getattr(sys, 'frozen', False):
from .spawn import freeze_support
freeze_support()
def get_logger(self):
'''Return package logger -- if it does not already exist then
it is created.
'''
from .util import get_logger
return get_logger()
def log_to_stderr(self, level=None):
'''Turn on logging and add a handler which prints to stderr'''
from .util import log_to_stderr
return log_to_stderr(level)
def allow_connection_pickling(self):
'''Install support for sending connections and sockets
between processes
'''
# This is undocumented. In previous versions of multiprocessing
# its only effect was to make socket objects inheritable on Windows.
from . import connection
def set_executable(self, executable):
'''Sets the path to a python.exe or pythonw.exe binary used to run
child processes instead of sys.executable when using the 'spawn'
start method. Useful for people embedding Python.
'''
from .spawn import set_executable
set_executable(executable)
def set_forkserver_preload(self, module_names):
'''Set list of module names to try to load in forkserver process.
This is really just a hint.
'''
from .forkserver import set_forkserver_preload
set_forkserver_preload(module_names)
def get_context(self, method=None):
if method is None:
return self
try:
ctx = _concrete_contexts[method]
except KeyError:
raise ValueError('cannot find context for %r' % method)
ctx._check_available()
return ctx
def get_start_method(self, allow_none=False):
return self._name
def set_start_method(self, method=None):
raise ValueError('cannot set start method of concrete context')
def _check_available(self):
pass
#
# Type of default context -- underlying context can be set at most once
#
class Process(process.BaseProcess):
_start_method = None
@staticmethod
def _Popen(process_obj):
return _default_context.get_context().Process._Popen(process_obj)
class DefaultContext(BaseContext):
Process = Process
def __init__(self, context):
self._default_context = context
self._actual_context = None
def get_context(self, method=None):
if method is None:
if self._actual_context is None:
self._actual_context = self._default_context
return self._actual_context
else:
return super().get_context(method)
def set_start_method(self, method, force=False):
if self._actual_context is not None and not force:
raise RuntimeError('context has already been set')
if method is None and force:
self._actual_context = None
return
self._actual_context = self.get_context(method)
def get_start_method(self, allow_none=False):
if self._actual_context is None:
if allow_none:
return None
self._actual_context = self._default_context
return self._actual_context._name
def get_all_start_methods(self):
if sys.platform == 'win32':
return ['spawn']
else:
from . import reduction
if reduction.HAVE_SEND_HANDLE:
return ['fork', 'spawn', 'forkserver']
else:
return ['fork', 'spawn']
DefaultContext.__all__ = list(x for x in dir(DefaultContext) if x[0] != '_')
#
# Context types for fixed start method
#
if sys.platform != 'win32':
class ForkProcess(process.BaseProcess):
_start_method = 'fork'
@staticmethod
def _Popen(process_obj):
from .popen_fork import Popen
return Popen(process_obj)
class SpawnProcess(process.BaseProcess):
_start_method = 'spawn'
@staticmethod
def _Popen(process_obj):
from .popen_spawn_posix import Popen
return Popen(process_obj)
class ForkServerProcess(process.BaseProcess):
_start_method = 'forkserver'
@staticmethod
def _Popen(process_obj):
from .popen_forkserver import Popen
return Popen(process_obj)
class ForkContext(BaseContext):
_name = 'fork'
Process = ForkProcess
class SpawnContext(BaseContext):
_name = 'spawn'
Process = SpawnProcess
class ForkServerContext(BaseContext):
_name = 'forkserver'
Process = ForkServerProcess
def _check_available(self):
from . import reduction
if not reduction.HAVE_SEND_HANDLE:
raise ValueError('forkserver start method not available')
_concrete_contexts = {
'fork': ForkContext(),
'spawn': SpawnContext(),
'forkserver': ForkServerContext(),
}
_default_context = DefaultContext(_concrete_contexts['fork'])
else:
class SpawnProcess(process.BaseProcess):
_start_method = 'spawn'
@staticmethod
def _Popen(process_obj):
from .popen_spawn_win32 import Popen
return Popen(process_obj)
class SpawnContext(BaseContext):
_name = 'spawn'
Process = SpawnProcess
_concrete_contexts = {
'spawn': SpawnContext(),
}
_default_context = DefaultContext(_concrete_contexts['spawn'])
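# get_context() resolves a start-method name to its concrete context; 'spawn'
# exists on every platform, so this sketch is portable.
def _demo_get_context():
    ctx = _default_context.get_context('spawn')
    assert ctx.get_start_method() == 'spawn'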
#
# Force the start method
#
def _force_start_method(method):
_default_context._actual_context = _concrete_contexts[method]
#
# Check that the current thread is spawning a child process
#
_tls = threading.local()
def get_spawning_popen():
return getattr(_tls, 'spawning_popen', None)
def set_spawning_popen(popen):
_tls.spawning_popen = popen
def assert_spawning(obj):
if get_spawning_popen() is None:
raise RuntimeError(
'%s objects should only be shared between processes'
' through inheritance' % type(obj).__name__
)
import errno
import os
import selectors
import signal
import socket
import struct
import sys
import threading
from . import connection
from . import process
from . import reduction
from . import semaphore_tracker
from . import spawn
from . import util
__all__ = ['ensure_running', 'get_inherited_fds', 'connect_to_new_process',
'set_forkserver_preload']
#
#
#
MAXFDS_TO_SEND = 256
UNSIGNED_STRUCT = struct.Struct('Q') # large enough for pid_t
#
# Forkserver class
#
class ForkServer(object):
def __init__(self):
self._forkserver_address = None
self._forkserver_alive_fd = None
self._inherited_fds = None
self._lock = threading.Lock()
self._preload_modules = ['__main__']
def set_forkserver_preload(self, modules_names):
'''Set list of module names to try to load in forkserver process.'''
if not all(type(mod) is str for mod in modules_names):
raise TypeError('module_names must be a list of strings')
self._preload_modules = modules_names
def get_inherited_fds(self):
'''Return list of fds inherited from parent process.
This returns None if the current process was not started by fork
server.
'''
return self._inherited_fds
def connect_to_new_process(self, fds):
'''Request forkserver to create a child process.
Returns a pair of fds (status_r, data_w). The calling process can read
the child process's pid and (eventually) its returncode from status_r.
The calling process should write to data_w the pickled preparation and
process data.
'''
self.ensure_running()
if len(fds) + 4 >= MAXFDS_TO_SEND:
raise ValueError('too many fds')
with socket.socket(socket.AF_UNIX) as client:
client.connect(self._forkserver_address)
parent_r, child_w = os.pipe()
child_r, parent_w = os.pipe()
allfds = [child_r, child_w, self._forkserver_alive_fd,
semaphore_tracker.getfd()]
allfds += fds
try:
reduction.sendfds(client, allfds)
return parent_r, parent_w
except:
os.close(parent_r)
os.close(parent_w)
raise
finally:
os.close(child_r)
os.close(child_w)
def ensure_running(self):
'''Make sure that a fork server is running.
This can be called from any process. Note that usually a child
process will just reuse the forkserver started by its parent, so
ensure_running() will do nothing.
'''
with self._lock:
semaphore_tracker.ensure_running()
if self._forkserver_alive_fd is not None:
return
cmd = ('from multiprocessing.forkserver import main; ' +
'main(%d, %d, %r, **%r)')
if self._preload_modules:
desired_keys = {'main_path', 'sys_path'}
data = spawn.get_preparation_data('ignore')
data = dict((x,y) for (x,y) in data.items()
if x in desired_keys)
else:
data = {}
with socket.socket(socket.AF_UNIX) as listener:
address = connection.arbitrary_address('AF_UNIX')
listener.bind(address)
os.chmod(address, 0o600)
listener.listen(100)
# all client processes own the write end of the "alive" pipe;
# when they all terminate the read end becomes ready.
alive_r, alive_w = os.pipe()
try:
fds_to_pass = [listener.fileno(), alive_r]
cmd %= (listener.fileno(), alive_r, self._preload_modules,
data)
exe = spawn.get_executable()
args = [exe] + util._args_from_interpreter_flags()
args += ['-c', cmd]
pid = util.spawnv_passfds(exe, args, fds_to_pass)
except:
os.close(alive_w)
raise
finally:
os.close(alive_r)
self._forkserver_address = address
self._forkserver_alive_fd = alive_w
#
#
#
def main(listener_fd, alive_r, preload, main_path=None, sys_path=None):
'''Run forkserver.'''
if preload:
if '__main__' in preload and main_path is not None:
process.current_process()._inheriting = True
try:
spawn.import_main_path(main_path)
finally:
del process.current_process()._inheriting
for modname in preload:
try:
__import__(modname)
except ImportError:
pass
# close sys.stdin
if sys.stdin is not None:
try:
sys.stdin.close()
sys.stdin = open(os.devnull)
except (OSError, ValueError):
pass
# ignoring SIGCHLD means no need to reap zombie processes
handler = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
with socket.socket(socket.AF_UNIX, fileno=listener_fd) as listener, \
selectors.DefaultSelector() as selector:
_forkserver._forkserver_address = listener.getsockname()
selector.register(listener, selectors.EVENT_READ)
selector.register(alive_r, selectors.EVENT_READ)
while True:
try:
while True:
rfds = [key.fileobj for (key, events) in selector.select()]
if rfds:
break
if alive_r in rfds:
# EOF because no more client processes left
assert os.read(alive_r, 1) == b''
raise SystemExit
assert listener in rfds
with listener.accept()[0] as s:
code = 1
if os.fork() == 0:
try:
_serve_one(s, listener, alive_r, handler)
except Exception:
sys.excepthook(*sys.exc_info())
sys.stderr.flush()
finally:
os._exit(code)
except InterruptedError:
pass
except OSError as e:
if e.errno != errno.ECONNABORTED:
raise
def _serve_one(s, listener, alive_r, handler):
# close unnecessary stuff and reset SIGCHLD handler
listener.close()
os.close(alive_r)
signal.signal(signal.SIGCHLD, handler)
# receive fds from parent process
fds = reduction.recvfds(s, MAXFDS_TO_SEND + 1)
s.close()
assert len(fds) <= MAXFDS_TO_SEND
(child_r, child_w, _forkserver._forkserver_alive_fd,
stfd, *_forkserver._inherited_fds) = fds
semaphore_tracker._semaphore_tracker._fd = stfd
# send pid to client processes
write_unsigned(child_w, os.getpid())
# reseed random number generator
if 'random' in sys.modules:
import random
random.seed()
# run process object received over pipe
code = spawn._main(child_r)
# write the exit code to the pipe
write_unsigned(child_w, code)
#
# Read and write unsigned numbers
#
def read_unsigned(fd):
data = b''
length = UNSIGNED_STRUCT.size
while len(data) < length:
while True:
try:
s = os.read(fd, length - len(data))
except InterruptedError:
pass
else:
break
if not s:
raise EOFError('unexpected EOF')
data += s
return UNSIGNED_STRUCT.unpack(data)[0]
def write_unsigned(fd, n):
msg = UNSIGNED_STRUCT.pack(n)
while msg:
while True:
try:
nbytes = os.write(fd, msg)
except InterruptedError:
pass
else:
break
if nbytes == 0:
raise RuntimeError('should not get here')
msg = msg[nbytes:]
#
#
#
_forkserver = ForkServer()
ensure_running = _forkserver.ensure_running
get_inherited_fds = _forkserver.get_inherited_fds
connect_to_new_process = _forkserver.connect_to_new_process
set_forkserver_preload = _forkserver.set_forkserver_preload
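#
# A minimal usage sketch (an addition, not part of the original module):
# pids and exit codes travel over raw fds as fixed-width unsigned integers,
# via read_unsigned()/write_unsigned() above. Assuming a POSIX os.pipe():
#
if __name__ == '__main__':
    _r, _w = os.pipe()
    write_unsigned(_w, os.getpid())   # packed with struct.Struct('Q')
    assert read_unsigned(_r) == os.getpid()
    os.close(_r)
    os.close(_w)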
#
# Module which supports allocation of memory from an mmap
#
# multiprocessing/heap.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
import bisect
import mmap
import os
import sys
import tempfile
import threading
from . import context
from . import reduction
from . import util
__all__ = ['BufferWrapper']
#
# Inheritable class which wraps an mmap, and from which blocks can be allocated
#
if sys.platform == 'win32':
import _winapi
class Arena(object):
_rand = tempfile._RandomNameSequence()
def __init__(self, size):
self.size = size
for i in range(100):
name = 'pym-%d-%s' % (os.getpid(), next(self._rand))
buf = mmap.mmap(-1, size, tagname=name)
if _winapi.GetLastError() == 0:
break
# We have reopened a preexisting mmap.
buf.close()
else:
raise FileExistsError('Cannot find name for new mmap')
self.name = name
self.buffer = buf
self._state = (self.size, self.name)
def __getstate__(self):
context.assert_spawning(self)
return self._state
def __setstate__(self, state):
self.size, self.name = self._state = state
self.buffer = mmap.mmap(-1, self.size, tagname=self.name)
assert _winapi.GetLastError() == _winapi.ERROR_ALREADY_EXISTS
else:
class Arena(object):
def __init__(self, size, fd=-1):
self.size = size
self.fd = fd
if fd == -1:
self.fd, name = tempfile.mkstemp(
prefix='pym-%d-'%os.getpid(), dir=util.get_temp_dir())
os.unlink(name)
util.Finalize(self, os.close, (self.fd,))
with open(self.fd, 'wb', closefd=False) as f:
f.write(b'\0'*size)
self.buffer = mmap.mmap(self.fd, self.size)
def reduce_arena(a):
if a.fd == -1:
raise ValueError('Arena is unpicklable because '
'forking was enabled when it was created')
return rebuild_arena, (a.size, reduction.DupFd(a.fd))
def rebuild_arena(size, dupfd):
return Arena(size, dupfd.detach())
reduction.register(Arena, reduce_arena)
#
# Class allowing allocation of chunks of memory from arenas
#
class Heap(object):
_alignment = 8
def __init__(self, size=mmap.PAGESIZE):
self._lastpid = os.getpid()
self._lock = threading.Lock()
self._size = size
self._lengths = []
self._len_to_seq = {}
self._start_to_block = {}
self._stop_to_block = {}
self._allocated_blocks = set()
self._arenas = []
# list of pending blocks to free - see free() comment below
self._pending_free_blocks = []
@staticmethod
def _roundup(n, alignment):
# alignment must be a power of 2
mask = alignment - 1
return (n + mask) & ~mask
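        # e.g. _roundup(13, 8) == 16 and _roundup(16, 8) == 16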
def _malloc(self, size):
# returns a large enough block -- it might be much larger
i = bisect.bisect_left(self._lengths, size)
if i == len(self._lengths):
length = self._roundup(max(self._size, size), mmap.PAGESIZE)
self._size *= 2
util.info('allocating a new mmap of length %d', length)
arena = Arena(length)
self._arenas.append(arena)
return (arena, 0, length)
else:
length = self._lengths[i]
seq = self._len_to_seq[length]
block = seq.pop()
if not seq:
del self._len_to_seq[length], self._lengths[i]
(arena, start, stop) = block
del self._start_to_block[(arena, start)]
del self._stop_to_block[(arena, stop)]
return block
def _free(self, block):
# free location and try to merge with neighbours
(arena, start, stop) = block
try:
prev_block = self._stop_to_block[(arena, start)]
except KeyError:
pass
else:
start, _ = self._absorb(prev_block)
try:
next_block = self._start_to_block[(arena, stop)]
except KeyError:
pass
else:
_, stop = self._absorb(next_block)
block = (arena, start, stop)
length = stop - start
try:
self._len_to_seq[length].append(block)
except KeyError:
self._len_to_seq[length] = [block]
bisect.insort(self._lengths, length)
self._start_to_block[(arena, start)] = block
self._stop_to_block[(arena, stop)] = block
def _absorb(self, block):
# deregister this block so it can be merged with a neighbour
(arena, start, stop) = block
del self._start_to_block[(arena, start)]
del self._stop_to_block[(arena, stop)]
length = stop - start
seq = self._len_to_seq[length]
seq.remove(block)
if not seq:
del self._len_to_seq[length]
self._lengths.remove(length)
return start, stop
def _free_pending_blocks(self):
# Free all the blocks in the pending list - called with the lock held.
while True:
try:
block = self._pending_free_blocks.pop()
except IndexError:
break
self._allocated_blocks.remove(block)
self._free(block)
def free(self, block):
# free a block returned by malloc()
# Since free() can be called asynchronously by the GC, it could happen
# that it's called while self._lock is held: in that case,
# self._lock.acquire() would deadlock (issue #12352). To avoid that, a
# trylock is used instead, and if the lock can't be acquired
        # immediately, the block is added to a list of blocks to be freed
        # synchronously some time later from malloc() or free(), by calling
        # _free_pending_blocks() (appending to and popping from a list is not
        # strictly thread-safe, but under CPython it is atomic thanks to the GIL).
assert os.getpid() == self._lastpid
if not self._lock.acquire(False):
# can't acquire the lock right now, add the block to the list of
# pending blocks to free
self._pending_free_blocks.append(block)
else:
# we hold the lock
try:
self._free_pending_blocks()
self._allocated_blocks.remove(block)
self._free(block)
finally:
self._lock.release()
def malloc(self, size):
# return a block of right size (possibly rounded up)
assert 0 <= size < sys.maxsize
if os.getpid() != self._lastpid:
self.__init__() # reinitialize after fork
self._lock.acquire()
self._free_pending_blocks()
try:
size = self._roundup(max(size,1), self._alignment)
(arena, start, stop) = self._malloc(size)
new_stop = start + size
if new_stop < stop:
self._free((arena, new_stop, stop))
block = (arena, start, new_stop)
self._allocated_blocks.add(block)
return block
finally:
self._lock.release()
#
# Class representing a chunk of an mmap -- can be inherited by child process
#
class BufferWrapper(object):
_heap = Heap()
def __init__(self, size):
assert 0 <= size < sys.maxsize
block = BufferWrapper._heap.malloc(size)
self._state = (block, size)
util.Finalize(self, BufferWrapper._heap.free, args=(block,))
def create_memoryview(self):
(arena, start, stop), size = self._state
return memoryview(arena.buffer)[start:start+size]
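#
# A minimal usage sketch (an addition, not part of the original module):
# BufferWrapper hands out a slice of an mmap-backed Arena as a writable
# memoryview. Guarded so it only runs when executed directly:
#
if __name__ == '__main__':
    _bw = BufferWrapper(16)
    _mv = _bw.create_memoryview()
    _mv[:5] = b'hello'                # writes land in the shared mmap
    assert bytes(_mv[:5]) == b'hello'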
#
# Module providing the `SyncManager` class for dealing
# with shared objects
#
# multiprocessing/managers.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = [ 'BaseManager', 'SyncManager', 'BaseProxy', 'Token' ]
#
# Imports
#
import sys
import threading
import array
import queue
from time import time as _time
from traceback import format_exc
from . import connection
from . import context
from . import pool
from . import process
from . import reduction
from . import util
from . import get_context
#
# Register some things for pickling
#
def reduce_array(a):
return array.array, (a.typecode, a.tobytes())
reduction.register(array.array, reduce_array)
view_types = [type(getattr({}, name)()) for name in ('items','keys','values')]
if view_types[0] is not list: # only needed in Py3.0
def rebuild_as_list(obj):
return list, (list(obj),)
for view_type in view_types:
reduction.register(view_type, rebuild_as_list)
#
# Type for identifying shared objects
#
class Token(object):
'''
Type to uniquely identify a shared object
'''
__slots__ = ('typeid', 'address', 'id')
def __init__(self, typeid, address, id):
(self.typeid, self.address, self.id) = (typeid, address, id)
def __getstate__(self):
return (self.typeid, self.address, self.id)
def __setstate__(self, state):
(self.typeid, self.address, self.id) = state
def __repr__(self):
return 'Token(typeid=%r, address=%r, id=%r)' % \
(self.typeid, self.address, self.id)
#
# Function for communication with a manager's server process
#
def dispatch(c, id, methodname, args=(), kwds={}):
'''
Send a message to manager using connection `c` and return response
'''
c.send((id, methodname, args, kwds))
kind, result = c.recv()
if kind == '#RETURN':
return result
raise convert_to_error(kind, result)
def convert_to_error(kind, result):
if kind == '#ERROR':
return result
elif kind == '#TRACEBACK':
assert type(result) is str
return RemoteError(result)
elif kind == '#UNSERIALIZABLE':
assert type(result) is str
return RemoteError('Unserializable message: %s\n' % result)
else:
return ValueError('Unrecognized message type')
class RemoteError(Exception):
def __str__(self):
return ('\n' + '-'*75 + '\n' + str(self.args[0]) + '-'*75)
#
# Functions for finding the method names of an object
#
def all_methods(obj):
'''
Return a list of names of methods of `obj`
'''
temp = []
for name in dir(obj):
func = getattr(obj, name)
if callable(func):
temp.append(name)
return temp
def public_methods(obj):
'''
Return a list of names of methods of `obj` which do not start with '_'
'''
return [name for name in all_methods(obj) if name[0] != '_']
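# e.g. public_methods([]) includes 'append' and 'sort' but not '__len__'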
#
# Server which is run in a process controlled by a manager
#
class Server(object):
'''
Server class which runs in a process controlled by a manager object
'''
public = ['shutdown', 'create', 'accept_connection', 'get_methods',
'debug_info', 'number_of_objects', 'dummy', 'incref', 'decref']
def __init__(self, registry, address, authkey, serializer):
assert isinstance(authkey, bytes)
self.registry = registry
self.authkey = process.AuthenticationString(authkey)
Listener, Client = listener_client[serializer]
# do authentication later
self.listener = Listener(address=address, backlog=16)
self.address = self.listener.address
self.id_to_obj = {'0': (None, ())}
self.id_to_refcount = {}
self.mutex = threading.RLock()
def serve_forever(self):
'''
Run the server forever
'''
self.stop_event = threading.Event()
process.current_process()._manager_server = self
try:
accepter = threading.Thread(target=self.accepter)
accepter.daemon = True
accepter.start()
try:
while not self.stop_event.is_set():
self.stop_event.wait(1)
except (KeyboardInterrupt, SystemExit):
pass
finally:
if sys.stdout != sys.__stdout__:
util.debug('resetting stdout, stderr')
sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
sys.exit(0)
def accepter(self):
while True:
try:
c = self.listener.accept()
except OSError:
continue
t = threading.Thread(target=self.handle_request, args=(c,))
t.daemon = True
t.start()
def handle_request(self, c):
'''
Handle a new connection
'''
funcname = result = request = None
try:
connection.deliver_challenge(c, self.authkey)
connection.answer_challenge(c, self.authkey)
request = c.recv()
ignore, funcname, args, kwds = request
assert funcname in self.public, '%r unrecognized' % funcname
func = getattr(self, funcname)
except Exception:
msg = ('#TRACEBACK', format_exc())
else:
try:
result = func(c, *args, **kwds)
except Exception:
msg = ('#TRACEBACK', format_exc())
else:
msg = ('#RETURN', result)
try:
c.send(msg)
except Exception as e:
try:
c.send(('#TRACEBACK', format_exc()))
except Exception:
pass
util.info('Failure to send message: %r', msg)
util.info(' ... request was %r', request)
util.info(' ... exception was %r', e)
c.close()
def serve_client(self, conn):
'''
Handle requests from the proxies in a particular process/thread
'''
util.debug('starting server thread to service %r',
threading.current_thread().name)
recv = conn.recv
send = conn.send
id_to_obj = self.id_to_obj
while not self.stop_event.is_set():
try:
methodname = obj = None
request = recv()
ident, methodname, args, kwds = request
obj, exposed, gettypeid = id_to_obj[ident]
if methodname not in exposed:
raise AttributeError(
'method %r of %r object is not in exposed=%r' %
(methodname, type(obj), exposed)
)
function = getattr(obj, methodname)
try:
res = function(*args, **kwds)
except Exception as e:
msg = ('#ERROR', e)
else:
typeid = gettypeid and gettypeid.get(methodname, None)
if typeid:
rident, rexposed = self.create(conn, typeid, res)
token = Token(typeid, self.address, rident)
msg = ('#PROXY', (rexposed, token))
else:
msg = ('#RETURN', res)
except AttributeError:
if methodname is None:
msg = ('#TRACEBACK', format_exc())
else:
try:
fallback_func = self.fallback_mapping[methodname]
result = fallback_func(
self, conn, ident, obj, *args, **kwds
)
msg = ('#RETURN', result)
except Exception:
msg = ('#TRACEBACK', format_exc())
except EOFError:
util.debug('got EOF -- exiting thread serving %r',
threading.current_thread().name)
sys.exit(0)
except Exception:
msg = ('#TRACEBACK', format_exc())
try:
try:
send(msg)
except Exception as e:
send(('#UNSERIALIZABLE', repr(msg)))
except Exception as e:
util.info('exception in thread serving %r',
threading.current_thread().name)
util.info(' ... message was %r', msg)
util.info(' ... exception was %r', e)
conn.close()
sys.exit(1)
def fallback_getvalue(self, conn, ident, obj):
return obj
def fallback_str(self, conn, ident, obj):
return str(obj)
def fallback_repr(self, conn, ident, obj):
return repr(obj)
fallback_mapping = {
'__str__':fallback_str,
'__repr__':fallback_repr,
'#GETVALUE':fallback_getvalue
}
def dummy(self, c):
pass
def debug_info(self, c):
'''
        Return some info -- useful to spot problems with refcounting
'''
self.mutex.acquire()
try:
result = []
keys = list(self.id_to_obj.keys())
keys.sort()
for ident in keys:
if ident != '0':
result.append(' %s: refcount=%s\n %s' %
(ident, self.id_to_refcount[ident],
str(self.id_to_obj[ident][0])[:75]))
return '\n'.join(result)
finally:
self.mutex.release()
def number_of_objects(self, c):
'''
Number of shared objects
'''
return len(self.id_to_obj) - 1 # don't count ident='0'
def shutdown(self, c):
'''
        Shut down this process
'''
try:
util.debug('manager received shutdown message')
c.send(('#RETURN', None))
except:
import traceback
traceback.print_exc()
finally:
self.stop_event.set()
def create(self, c, typeid, *args, **kwds):
'''
Create a new shared object and return its id
'''
self.mutex.acquire()
try:
callable, exposed, method_to_typeid, proxytype = \
self.registry[typeid]
if callable is None:
assert len(args) == 1 and not kwds
obj = args[0]
else:
obj = callable(*args, **kwds)
if exposed is None:
exposed = public_methods(obj)
if method_to_typeid is not None:
assert type(method_to_typeid) is dict
exposed = list(exposed) + list(method_to_typeid)
ident = '%x' % id(obj) # convert to string because xmlrpclib
# only has 32 bit signed integers
util.debug('%r callable returned object with id %r', typeid, ident)
self.id_to_obj[ident] = (obj, set(exposed), method_to_typeid)
if ident not in self.id_to_refcount:
self.id_to_refcount[ident] = 0
# increment the reference count immediately, to avoid
# this object being garbage collected before a Proxy
# object for it can be created. The caller of create()
# is responsible for doing a decref once the Proxy object
# has been created.
self.incref(c, ident)
return ident, tuple(exposed)
finally:
self.mutex.release()
def get_methods(self, c, token):
'''
Return the methods of the shared object indicated by token
'''
return tuple(self.id_to_obj[token.id][1])
def accept_connection(self, c, name):
'''
Spawn a new thread to serve this connection
'''
threading.current_thread().name = name
c.send(('#RETURN', None))
self.serve_client(c)
def incref(self, c, ident):
self.mutex.acquire()
try:
self.id_to_refcount[ident] += 1
finally:
self.mutex.release()
def decref(self, c, ident):
self.mutex.acquire()
try:
assert self.id_to_refcount[ident] >= 1
self.id_to_refcount[ident] -= 1
if self.id_to_refcount[ident] == 0:
del self.id_to_obj[ident], self.id_to_refcount[ident]
util.debug('disposing of obj with id %r', ident)
finally:
self.mutex.release()
#
# Class to represent state of a manager
#
class State(object):
__slots__ = ['value']
INITIAL = 0
STARTED = 1
SHUTDOWN = 2
#
# Mapping from serializer name to Listener and Client types
#
listener_client = {
'pickle' : (connection.Listener, connection.Client),
'xmlrpclib' : (connection.XmlListener, connection.XmlClient)
}
#
# Definition of BaseManager
#
class BaseManager(object):
'''
Base class for managers
'''
_registry = {}
_Server = Server
def __init__(self, address=None, authkey=None, serializer='pickle',
ctx=None):
if authkey is None:
authkey = process.current_process().authkey
        self._address = address     # XXX not final address if e.g. ('', 0)
self._authkey = process.AuthenticationString(authkey)
self._state = State()
self._state.value = State.INITIAL
self._serializer = serializer
self._Listener, self._Client = listener_client[serializer]
self._ctx = ctx or get_context()
def get_server(self):
'''
Return server object with serve_forever() method and address attribute
'''
assert self._state.value == State.INITIAL
return Server(self._registry, self._address,
self._authkey, self._serializer)
def connect(self):
'''
Connect manager object to the server process
'''
Listener, Client = listener_client[self._serializer]
conn = Client(self._address, authkey=self._authkey)
dispatch(conn, None, 'dummy')
self._state.value = State.STARTED
def start(self, initializer=None, initargs=()):
'''
Spawn a server process for this manager object
'''
assert self._state.value == State.INITIAL
if initializer is not None and not callable(initializer):
raise TypeError('initializer must be a callable')
# pipe over which we will retrieve address of server
reader, writer = connection.Pipe(duplex=False)
# spawn process which runs a server
self._process = self._ctx.Process(
target=type(self)._run_server,
args=(self._registry, self._address, self._authkey,
self._serializer, writer, initializer, initargs),
)
ident = ':'.join(str(i) for i in self._process._identity)
self._process.name = type(self).__name__ + '-' + ident
self._process.start()
# get address of server
writer.close()
self._address = reader.recv()
reader.close()
# register a finalizer
self._state.value = State.STARTED
self.shutdown = util.Finalize(
self, type(self)._finalize_manager,
args=(self._process, self._address, self._authkey,
self._state, self._Client),
exitpriority=0
)
@classmethod
def _run_server(cls, registry, address, authkey, serializer, writer,
initializer=None, initargs=()):
'''
Create a server, report its address and run it
'''
if initializer is not None:
initializer(*initargs)
# create server
server = cls._Server(registry, address, authkey, serializer)
# inform parent process of the server's address
writer.send(server.address)
writer.close()
# run the manager
util.info('manager serving at %r', server.address)
server.serve_forever()
def _create(self, typeid, *args, **kwds):
'''
Create a new shared object; return the token and exposed tuple
'''
assert self._state.value == State.STARTED, 'server not yet started'
conn = self._Client(self._address, authkey=self._authkey)
try:
id, exposed = dispatch(conn, None, 'create', (typeid,)+args, kwds)
finally:
conn.close()
return Token(typeid, self._address, id), exposed
def join(self, timeout=None):
'''
Join the manager process (if it has been spawned)
'''
if self._process is not None:
self._process.join(timeout)
if not self._process.is_alive():
self._process = None
def _debug_info(self):
'''
        Return some info about the server's shared objects and connections
'''
conn = self._Client(self._address, authkey=self._authkey)
try:
return dispatch(conn, None, 'debug_info')
finally:
conn.close()
def _number_of_objects(self):
'''
Return the number of shared objects
'''
conn = self._Client(self._address, authkey=self._authkey)
try:
return dispatch(conn, None, 'number_of_objects')
finally:
conn.close()
def __enter__(self):
if self._state.value == State.INITIAL:
self.start()
assert self._state.value == State.STARTED
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.shutdown()
@staticmethod
def _finalize_manager(process, address, authkey, state, _Client):
'''
        Shut down the manager process; will be registered as a finalizer
'''
if process.is_alive():
util.info('sending shutdown message to manager')
try:
conn = _Client(address, authkey=authkey)
try:
dispatch(conn, None, 'shutdown')
finally:
conn.close()
except Exception:
pass
process.join(timeout=1.0)
if process.is_alive():
util.info('manager still alive')
if hasattr(process, 'terminate'):
util.info('trying to `terminate()` manager process')
process.terminate()
process.join(timeout=0.1)
if process.is_alive():
util.info('manager still alive after terminate')
state.value = State.SHUTDOWN
try:
del BaseProxy._address_to_local[address]
except KeyError:
pass
address = property(lambda self: self._address)
@classmethod
def register(cls, typeid, callable=None, proxytype=None, exposed=None,
method_to_typeid=None, create_method=True):
'''
Register a typeid with the manager type
'''
if '_registry' not in cls.__dict__:
cls._registry = cls._registry.copy()
if proxytype is None:
proxytype = AutoProxy
exposed = exposed or getattr(proxytype, '_exposed_', None)
method_to_typeid = method_to_typeid or \
getattr(proxytype, '_method_to_typeid_', None)
if method_to_typeid:
for key, value in list(method_to_typeid.items()):
assert type(key) is str, '%r is not a string' % key
assert type(value) is str, '%r is not a string' % value
cls._registry[typeid] = (
callable, exposed, method_to_typeid, proxytype
)
if create_method:
def temp(self, *args, **kwds):
util.debug('requesting creation of a shared %r object', typeid)
token, exp = self._create(typeid, *args, **kwds)
proxy = proxytype(
token, self._serializer, manager=self,
authkey=self._authkey, exposed=exp
)
conn = self._Client(token.address, authkey=self._authkey)
dispatch(conn, None, 'decref', (token.id,))
return proxy
temp.__name__ = typeid
setattr(cls, typeid, temp)
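#
# A minimal registration sketch (an addition, not part of the original
# module); MyManager and CounterClass are hypothetical names:
#
#     class MyManager(BaseManager):
#         pass
#     MyManager.register('Counter', callable=CounterClass,
#                        exposed=['increment', 'value'])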
#
# Subclass of set which gets cleared after a fork
#
class ProcessLocalSet(set):
def __init__(self):
util.register_after_fork(self, lambda obj: obj.clear())
def __reduce__(self):
return type(self), ()
#
# Definition of BaseProxy
#
class BaseProxy(object):
'''
A base for proxies of shared objects
'''
_address_to_local = {}
_mutex = util.ForkAwareThreadLock()
def __init__(self, token, serializer, manager=None,
authkey=None, exposed=None, incref=True):
BaseProxy._mutex.acquire()
try:
tls_idset = BaseProxy._address_to_local.get(token.address, None)
if tls_idset is None:
tls_idset = util.ForkAwareLocal(), ProcessLocalSet()
BaseProxy._address_to_local[token.address] = tls_idset
finally:
BaseProxy._mutex.release()
# self._tls is used to record the connection used by this
# thread to communicate with the manager at token.address
self._tls = tls_idset[0]
# self._idset is used to record the identities of all shared
# objects for which the current process owns references and
# which are in the manager at token.address
self._idset = tls_idset[1]
self._token = token
self._id = self._token.id
self._manager = manager
self._serializer = serializer
self._Client = listener_client[serializer][1]
if authkey is not None:
self._authkey = process.AuthenticationString(authkey)
elif self._manager is not None:
self._authkey = self._manager._authkey
else:
self._authkey = process.current_process().authkey
if incref:
self._incref()
util.register_after_fork(self, BaseProxy._after_fork)
def _connect(self):
util.debug('making connection to manager')
name = process.current_process().name
if threading.current_thread().name != 'MainThread':
name += '|' + threading.current_thread().name
conn = self._Client(self._token.address, authkey=self._authkey)
dispatch(conn, None, 'accept_connection', (name,))
self._tls.connection = conn
def _callmethod(self, methodname, args=(), kwds={}):
'''
        Try to call a method of the referent and return a copy of the result
'''
try:
conn = self._tls.connection
except AttributeError:
util.debug('thread %r does not own a connection',
threading.current_thread().name)
self._connect()
conn = self._tls.connection
conn.send((self._id, methodname, args, kwds))
kind, result = conn.recv()
if kind == '#RETURN':
return result
elif kind == '#PROXY':
exposed, token = result
proxytype = self._manager._registry[token.typeid][-1]
token.address = self._token.address
proxy = proxytype(
token, self._serializer, manager=self._manager,
authkey=self._authkey, exposed=exposed
)
conn = self._Client(token.address, authkey=self._authkey)
dispatch(conn, None, 'decref', (token.id,))
return proxy
raise convert_to_error(kind, result)
def _getvalue(self):
'''
Get a copy of the value of the referent
'''
return self._callmethod('#GETVALUE')
def _incref(self):
conn = self._Client(self._token.address, authkey=self._authkey)
dispatch(conn, None, 'incref', (self._id,))
util.debug('INCREF %r', self._token.id)
self._idset.add(self._id)
state = self._manager and self._manager._state
self._close = util.Finalize(
self, BaseProxy._decref,
args=(self._token, self._authkey, state,
self._tls, self._idset, self._Client),
exitpriority=10
)
@staticmethod
def _decref(token, authkey, state, tls, idset, _Client):
idset.discard(token.id)
# check whether manager is still alive
if state is None or state.value == State.STARTED:
# tell manager this process no longer cares about referent
try:
util.debug('DECREF %r', token.id)
conn = _Client(token.address, authkey=authkey)
dispatch(conn, None, 'decref', (token.id,))
except Exception as e:
util.debug('... decref failed %s', e)
else:
util.debug('DECREF %r -- manager already shutdown', token.id)
# check whether we can close this thread's connection because
# the process owns no more references to objects for this manager
if not idset and hasattr(tls, 'connection'):
util.debug('thread %r has no more proxies so closing conn',
threading.current_thread().name)
tls.connection.close()
del tls.connection
def _after_fork(self):
self._manager = None
try:
self._incref()
except Exception as e:
# the proxy may just be for a manager which has shutdown
util.info('incref failed: %s' % e)
def __reduce__(self):
kwds = {}
if context.get_spawning_popen() is not None:
kwds['authkey'] = self._authkey
if getattr(self, '_isauto', False):
kwds['exposed'] = self._exposed_
return (RebuildProxy,
(AutoProxy, self._token, self._serializer, kwds))
else:
return (RebuildProxy,
(type(self), self._token, self._serializer, kwds))
def __deepcopy__(self, memo):
return self._getvalue()
def __repr__(self):
return '<%s object, typeid %r at %s>' % \
(type(self).__name__, self._token.typeid, '0x%x' % id(self))
def __str__(self):
'''
Return representation of the referent (or a fall-back if that fails)
'''
try:
return self._callmethod('__repr__')
except Exception:
return repr(self)[:-1] + "; '__str__()' failed>"
#
# Function used for unpickling
#
def RebuildProxy(func, token, serializer, kwds):
'''
Function used for unpickling proxy objects.
If possible the shared object is returned, or otherwise a proxy for it.
'''
server = getattr(process.current_process(), '_manager_server', None)
if server and server.address == token.address:
return server.id_to_obj[token.id][0]
else:
incref = (
kwds.pop('incref', True) and
not getattr(process.current_process(), '_inheriting', False)
)
return func(token, serializer, incref=incref, **kwds)
#
# Functions to create proxies and proxy types
#
def MakeProxyType(name, exposed, _cache={}):
'''
    Return a proxy type whose methods are given by `exposed`
'''
exposed = tuple(exposed)
try:
return _cache[(name, exposed)]
except KeyError:
pass
dic = {}
for meth in exposed:
exec('''def %s(self, *args, **kwds):
return self._callmethod(%r, args, kwds)''' % (meth, meth), dic)
ProxyType = type(name, (BaseProxy,), dic)
ProxyType._exposed_ = exposed
_cache[(name, exposed)] = ProxyType
return ProxyType
def AutoProxy(token, serializer, manager=None, authkey=None,
exposed=None, incref=True):
'''
Return an auto-proxy for `token`
'''
_Client = listener_client[serializer][1]
if exposed is None:
conn = _Client(token.address, authkey=authkey)
try:
exposed = dispatch(conn, None, 'get_methods', (token,))
finally:
conn.close()
if authkey is None and manager is not None:
authkey = manager._authkey
if authkey is None:
authkey = process.current_process().authkey
ProxyType = MakeProxyType('AutoProxy[%s]' % token.typeid, exposed)
proxy = ProxyType(token, serializer, manager=manager, authkey=authkey,
incref=incref)
proxy._isauto = True
return proxy
#
# Types/callables which we will register with SyncManager
#
class Namespace(object):
def __init__(self, **kwds):
self.__dict__.update(kwds)
def __repr__(self):
items = list(self.__dict__.items())
temp = []
for name, value in items:
if not name.startswith('_'):
temp.append('%s=%r' % (name, value))
temp.sort()
return 'Namespace(%s)' % str.join(', ', temp)
class Value(object):
def __init__(self, typecode, value, lock=True):
self._typecode = typecode
self._value = value
def get(self):
return self._value
def set(self, value):
self._value = value
def __repr__(self):
return '%s(%r, %r)'%(type(self).__name__, self._typecode, self._value)
value = property(get, set)
def Array(typecode, sequence, lock=True):
return array.array(typecode, sequence)
#
# Proxy types used by SyncManager
#
class IteratorProxy(BaseProxy):
_exposed_ = ('__next__', 'send', 'throw', 'close')
def __iter__(self):
return self
def __next__(self, *args):
return self._callmethod('__next__', args)
def send(self, *args):
return self._callmethod('send', args)
def throw(self, *args):
return self._callmethod('throw', args)
def close(self, *args):
return self._callmethod('close', args)
class AcquirerProxy(BaseProxy):
_exposed_ = ('acquire', 'release')
def acquire(self, blocking=True, timeout=None):
args = (blocking,) if timeout is None else (blocking, timeout)
return self._callmethod('acquire', args)
def release(self):
return self._callmethod('release')
def __enter__(self):
return self._callmethod('acquire')
def __exit__(self, exc_type, exc_val, exc_tb):
return self._callmethod('release')
class ConditionProxy(AcquirerProxy):
_exposed_ = ('acquire', 'release', 'wait', 'notify', 'notify_all')
def wait(self, timeout=None):
return self._callmethod('wait', (timeout,))
def notify(self):
return self._callmethod('notify')
def notify_all(self):
return self._callmethod('notify_all')
def wait_for(self, predicate, timeout=None):
result = predicate()
if result:
return result
if timeout is not None:
endtime = _time() + timeout
else:
endtime = None
waittime = None
while not result:
if endtime is not None:
waittime = endtime - _time()
if waittime <= 0:
break
self.wait(waittime)
result = predicate()
return result
class EventProxy(BaseProxy):
_exposed_ = ('is_set', 'set', 'clear', 'wait')
def is_set(self):
return self._callmethod('is_set')
def set(self):
return self._callmethod('set')
def clear(self):
return self._callmethod('clear')
def wait(self, timeout=None):
return self._callmethod('wait', (timeout,))
class BarrierProxy(BaseProxy):
_exposed_ = ('__getattribute__', 'wait', 'abort', 'reset')
def wait(self, timeout=None):
return self._callmethod('wait', (timeout,))
def abort(self):
return self._callmethod('abort')
def reset(self):
return self._callmethod('reset')
@property
def parties(self):
return self._callmethod('__getattribute__', ('parties',))
@property
def n_waiting(self):
return self._callmethod('__getattribute__', ('n_waiting',))
@property
def broken(self):
return self._callmethod('__getattribute__', ('broken',))
class NamespaceProxy(BaseProxy):
_exposed_ = ('__getattribute__', '__setattr__', '__delattr__')
def __getattr__(self, key):
if key[0] == '_':
return object.__getattribute__(self, key)
callmethod = object.__getattribute__(self, '_callmethod')
return callmethod('__getattribute__', (key,))
def __setattr__(self, key, value):
if key[0] == '_':
return object.__setattr__(self, key, value)
callmethod = object.__getattribute__(self, '_callmethod')
return callmethod('__setattr__', (key, value))
def __delattr__(self, key):
if key[0] == '_':
return object.__delattr__(self, key)
callmethod = object.__getattribute__(self, '_callmethod')
return callmethod('__delattr__', (key,))
class ValueProxy(BaseProxy):
_exposed_ = ('get', 'set')
def get(self):
return self._callmethod('get')
def set(self, value):
return self._callmethod('set', (value,))
value = property(get, set)
BaseListProxy = MakeProxyType('BaseListProxy', (
'__add__', '__contains__', '__delitem__', '__getitem__', '__len__',
'__mul__', '__reversed__', '__rmul__', '__setitem__',
'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort', '__imul__'
))
class ListProxy(BaseListProxy):
def __iadd__(self, value):
self._callmethod('extend', (value,))
return self
def __imul__(self, value):
self._callmethod('__imul__', (value,))
return self
DictProxy = MakeProxyType('DictProxy', (
'__contains__', '__delitem__', '__getitem__', '__len__',
'__setitem__', 'clear', 'copy', 'get', 'has_key', 'items',
'keys', 'pop', 'popitem', 'setdefault', 'update', 'values'
))
ArrayProxy = MakeProxyType('ArrayProxy', (
'__len__', '__getitem__', '__setitem__'
))
BasePoolProxy = MakeProxyType('PoolProxy', (
'apply', 'apply_async', 'close', 'imap', 'imap_unordered', 'join',
'map', 'map_async', 'starmap', 'starmap_async', 'terminate',
))
BasePoolProxy._method_to_typeid_ = {
'apply_async': 'AsyncResult',
'map_async': 'AsyncResult',
'starmap_async': 'AsyncResult',
'imap': 'Iterator',
'imap_unordered': 'Iterator'
}
class PoolProxy(BasePoolProxy):
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.terminate()
#
# Definition of SyncManager
#
class SyncManager(BaseManager):
'''
Subclass of `BaseManager` which supports a number of shared object types.
The types registered are those intended for the synchronization
of threads, plus `dict`, `list` and `Namespace`.
The `multiprocessing.Manager()` function creates started instances of
this class.
'''
SyncManager.register('Queue', queue.Queue)
SyncManager.register('JoinableQueue', queue.Queue)
SyncManager.register('Event', threading.Event, EventProxy)
SyncManager.register('Lock', threading.Lock, AcquirerProxy)
SyncManager.register('RLock', threading.RLock, AcquirerProxy)
SyncManager.register('Semaphore', threading.Semaphore, AcquirerProxy)
SyncManager.register('BoundedSemaphore', threading.BoundedSemaphore,
AcquirerProxy)
SyncManager.register('Condition', threading.Condition, ConditionProxy)
SyncManager.register('Barrier', threading.Barrier, BarrierProxy)
SyncManager.register('Pool', pool.Pool, PoolProxy)
SyncManager.register('list', list, ListProxy)
SyncManager.register('dict', dict, DictProxy)
SyncManager.register('Value', Value, ValueProxy)
SyncManager.register('Array', Array, ArrayProxy)
SyncManager.register('Namespace', Namespace, NamespaceProxy)
# types returned by methods of PoolProxy
SyncManager.register('Iterator', proxytype=IteratorProxy, create_method=False)
SyncManager.register('AsyncResult', create_method=False)
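#
# A minimal usage sketch (an addition, not part of the original module):
# SyncManager serves the objects registered above from a server process and
# returns proxies that forward calls via _callmethod(). Guarded so it only
# runs when executed directly:
#
if __name__ == '__main__':
    with SyncManager() as _m:
        _d = _m.dict()                # returns a DictProxy
        _d['answer'] = 42
        assert _d['answer'] == 42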
#
# Module providing the `Pool` class for managing a process pool
#
# multiprocessing/pool.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = ['Pool', 'ThreadPool']
#
# Imports
#
import threading
import queue
import itertools
import collections
import os
import time
import traceback
# If threading is available then ThreadPool should be provided. Therefore
# we avoid top-level imports which are liable to fail on some systems.
from . import util
from . import get_context, TimeoutError
#
# Constants representing the state of a pool
#
RUN = 0
CLOSE = 1
TERMINATE = 2
#
# Miscellaneous
#
job_counter = itertools.count()
def mapstar(args):
return list(map(*args))
def starmapstar(args):
return list(itertools.starmap(args[0], args[1]))
#
# Hack to embed stringification of remote traceback in local traceback
#
class RemoteTraceback(Exception):
def __init__(self, tb):
self.tb = tb
def __str__(self):
return self.tb
class ExceptionWithTraceback:
def __init__(self, exc, tb):
tb = traceback.format_exception(type(exc), exc, tb)
tb = ''.join(tb)
self.exc = exc
self.tb = '\n"""\n%s"""' % tb
def __reduce__(self):
return rebuild_exc, (self.exc, self.tb)
def rebuild_exc(exc, tb):
exc.__cause__ = RemoteTraceback(tb)
return exc
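#
# Illustration (an addition, not part of the original module): pickling an
# ExceptionWithTraceback reduces to rebuild_exc(), so the exception
# re-raised in the parent carries the worker's formatted traceback as its
# __cause__:
#
if __name__ == '__main__':
    try:
        raise ValueError('boom')
    except ValueError as _e:
        _wrapped = ExceptionWithTraceback(_e, _e.__traceback__)
    _func, _args = _wrapped.__reduce__()
    _exc = _func(*_args)              # what unpickling would produce
    assert isinstance(_exc.__cause__, RemoteTraceback)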
#
# Code run by worker processes
#
class MaybeEncodingError(Exception):
"""Wraps possible unpickleable errors, so they can be
safely sent through the socket."""
def __init__(self, exc, value):
self.exc = repr(exc)
self.value = repr(value)
super(MaybeEncodingError, self).__init__(self.exc, self.value)
def __str__(self):
return "Error sending result: '%s'. Reason: '%s'" % (self.value,
self.exc)
def __repr__(self):
return "<MaybeEncodingError: %s>" % str(self)
def worker(inqueue, outqueue, initializer=None, initargs=(), maxtasks=None,
wrap_exception=False):
assert maxtasks is None or (type(maxtasks) == int and maxtasks > 0)
put = outqueue.put
get = inqueue.get
if hasattr(inqueue, '_writer'):
inqueue._writer.close()
outqueue._reader.close()
if initializer is not None:
initializer(*initargs)
completed = 0
while maxtasks is None or (maxtasks and completed < maxtasks):
try:
task = get()
except (EOFError, OSError):
util.debug('worker got EOFError or OSError -- exiting')
break
if task is None:
util.debug('worker got sentinel -- exiting')
break
job, i, func, args, kwds = task
try:
result = (True, func(*args, **kwds))
except Exception as e:
if wrap_exception:
e = ExceptionWithTraceback(e, e.__traceback__)
result = (False, e)
try:
put((job, i, result))
except Exception as e:
wrapped = MaybeEncodingError(e, result[1])
util.debug("Possible encoding error while sending result: %s" % (
wrapped))
put((job, i, (False, wrapped)))
completed += 1
util.debug('worker exiting after %d tasks' % completed)
#
# Class representing a process pool
#
class Pool(object):
'''
Class which supports an async version of applying functions to arguments.
'''
_wrap_exception = True
def Process(self, *args, **kwds):
return self._ctx.Process(*args, **kwds)
def __init__(self, processes=None, initializer=None, initargs=(),
maxtasksperchild=None, context=None):
self._ctx = context or get_context()
self._setup_queues()
self._taskqueue = queue.Queue()
self._cache = {}
self._state = RUN
self._maxtasksperchild = maxtasksperchild
self._initializer = initializer
self._initargs = initargs
if processes is None:
processes = os.cpu_count() or 1
if processes < 1:
raise ValueError("Number of processes must be at least 1")
if initializer is not None and not callable(initializer):
raise TypeError('initializer must be a callable')
self._processes = processes
self._pool = []
self._repopulate_pool()
self._worker_handler = threading.Thread(
target=Pool._handle_workers,
args=(self, )
)
self._worker_handler.daemon = True
self._worker_handler._state = RUN
self._worker_handler.start()
self._task_handler = threading.Thread(
target=Pool._handle_tasks,
args=(self._taskqueue, self._quick_put, self._outqueue,
self._pool, self._cache)
)
self._task_handler.daemon = True
self._task_handler._state = RUN
self._task_handler.start()
self._result_handler = threading.Thread(
target=Pool._handle_results,
args=(self._outqueue, self._quick_get, self._cache)
)
self._result_handler.daemon = True
self._result_handler._state = RUN
self._result_handler.start()
self._terminate = util.Finalize(
self, self._terminate_pool,
args=(self._taskqueue, self._inqueue, self._outqueue, self._pool,
self._worker_handler, self._task_handler,
self._result_handler, self._cache),
exitpriority=15
)
    def _join_exited_workers(self):
        """Clean up after any worker processes which have exited due to
        reaching their specified lifetime. Returns True if any workers were
        cleaned up.
"""
cleaned = False
for i in reversed(range(len(self._pool))):
worker = self._pool[i]
if worker.exitcode is not None:
# worker exited
util.debug('cleaning up worker %d' % i)
worker.join()
cleaned = True
del self._pool[i]
return cleaned
def _repopulate_pool(self):
"""Bring the number of pool processes up to the specified number,
for use after reaping workers which have exited.
"""
for i in range(self._processes - len(self._pool)):
w = self.Process(target=worker,
args=(self._inqueue, self._outqueue,
self._initializer,
self._initargs, self._maxtasksperchild,
self._wrap_exception)
)
self._pool.append(w)
w.name = w.name.replace('Process', 'PoolWorker')
w.daemon = True
w.start()
util.debug('added worker')
def _maintain_pool(self):
"""Clean up any exited workers and start replacements for them.
"""
if self._join_exited_workers():
self._repopulate_pool()
def _setup_queues(self):
self._inqueue = self._ctx.SimpleQueue()
self._outqueue = self._ctx.SimpleQueue()
self._quick_put = self._inqueue._writer.send
self._quick_get = self._outqueue._reader.recv
def apply(self, func, args=(), kwds={}):
'''
Equivalent of `func(*args, **kwds)`.
'''
assert self._state == RUN
return self.apply_async(func, args, kwds).get()
def map(self, func, iterable, chunksize=None):
'''
Apply `func` to each element in `iterable`, collecting the results
in a list that is returned.
'''
return self._map_async(func, iterable, mapstar, chunksize).get()
def starmap(self, func, iterable, chunksize=None):
'''
        Like the `map()` method, but the elements of `iterable` are expected
        to be iterables themselves and will be unpacked as arguments. Hence
        `func` and (a, b) become func(a, b).
'''
return self._map_async(func, iterable, starmapstar, chunksize).get()
def starmap_async(self, func, iterable, chunksize=None, callback=None,
error_callback=None):
'''
Asynchronous version of `starmap()` method.
'''
return self._map_async(func, iterable, starmapstar, chunksize,
callback, error_callback)
def imap(self, func, iterable, chunksize=1):
'''
Equivalent of `map()` -- can be MUCH slower than `Pool.map()`.
'''
if self._state != RUN:
raise ValueError("Pool not running")
if chunksize == 1:
result = IMapIterator(self._cache)
self._taskqueue.put((((result._job, i, func, (x,), {})
for i, x in enumerate(iterable)), result._set_length))
return result
else:
assert chunksize > 1
task_batches = Pool._get_tasks(func, iterable, chunksize)
result = IMapIterator(self._cache)
self._taskqueue.put((((result._job, i, mapstar, (x,), {})
for i, x in enumerate(task_batches)), result._set_length))
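            # each result item is a whole chunk; flatten back to single items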
return (item for chunk in result for item in chunk)
def imap_unordered(self, func, iterable, chunksize=1):
'''
        Like the `imap()` method, but the ordering of results is arbitrary.
'''
if self._state != RUN:
raise ValueError("Pool not running")
if chunksize == 1:
result = IMapUnorderedIterator(self._cache)
self._taskqueue.put((((result._job, i, func, (x,), {})
for i, x in enumerate(iterable)), result._set_length))
return result
else:
assert chunksize > 1
task_batches = Pool._get_tasks(func, iterable, chunksize)
result = IMapUnorderedIterator(self._cache)
self._taskqueue.put((((result._job, i, mapstar, (x,), {})
for i, x in enumerate(task_batches)), result._set_length))
return (item for chunk in result for item in chunk)
def apply_async(self, func, args=(), kwds={}, callback=None,
error_callback=None):
'''
Asynchronous version of `apply()` method.
'''
if self._state != RUN:
raise ValueError("Pool not running")
result = ApplyResult(self._cache, callback, error_callback)
self._taskqueue.put(([(result._job, None, func, args, kwds)], None))
return result
def map_async(self, func, iterable, chunksize=None, callback=None,
error_callback=None):
'''
Asynchronous version of `map()` method.
'''
return self._map_async(func, iterable, mapstar, chunksize, callback,
error_callback)
def _map_async(self, func, iterable, mapper, chunksize=None, callback=None,
error_callback=None):
'''
Helper function to implement map, starmap and their async counterparts.
'''
if self._state != RUN:
raise ValueError("Pool not running")
if not hasattr(iterable, '__len__'):
iterable = list(iterable)
if chunksize is None:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
chunksize += 1
if len(iterable) == 0:
chunksize = 0
task_batches = Pool._get_tasks(func, iterable, chunksize)
result = MapResult(self._cache, chunksize, len(iterable), callback,
error_callback=error_callback)
self._taskqueue.put((((result._job, i, mapper, (x,), {})
for i, x in enumerate(task_batches)), None))
return result
@staticmethod
def _handle_workers(pool):
thread = threading.current_thread()
# Keep maintaining workers until the cache gets drained, unless the pool
# is terminated.
while thread._state == RUN or (pool._cache and thread._state != TERMINATE):
pool._maintain_pool()
time.sleep(0.1)
# send sentinel to stop workers
pool._taskqueue.put(None)
util.debug('worker handler exiting')
@staticmethod
def _handle_tasks(taskqueue, put, outqueue, pool, cache):
thread = threading.current_thread()
for taskseq, set_length in iter(taskqueue.get, None):
task = None
i = -1
try:
for i, task in enumerate(taskseq):
if thread._state:
util.debug('task handler found thread._state != RUN')
break
try:
put(task)
except Exception as e:
job, ind = task[:2]
try:
cache[job]._set(ind, (False, e))
except KeyError:
pass
else:
if set_length:
util.debug('doing set_length()')
set_length(i+1)
continue
break
except Exception as ex:
job, ind = task[:2] if task else (0, 0)
if job in cache:
cache[job]._set(ind + 1, (False, ex))
if set_length:
util.debug('doing set_length()')
set_length(i+1)
else:
util.debug('task handler got sentinel')
try:
# tell result handler to finish when cache is empty
util.debug('task handler sending sentinel to result handler')
outqueue.put(None)
# tell workers there is no more work
util.debug('task handler sending sentinel to workers')
for p in pool:
put(None)
except OSError:
util.debug('task handler got OSError when sending sentinels')
util.debug('task handler exiting')
@staticmethod
def _handle_results(outqueue, get, cache):
thread = threading.current_thread()
while 1:
try:
task = get()
except (OSError, EOFError):
util.debug('result handler got EOFError/OSError -- exiting')
return
if thread._state:
assert thread._state == TERMINATE
util.debug('result handler found thread._state=TERMINATE')
break
if task is None:
util.debug('result handler got sentinel')
break
job, i, obj = task
try:
cache[job]._set(i, obj)
except KeyError:
pass
while cache and thread._state != TERMINATE:
try:
task = get()
except (OSError, EOFError):
util.debug('result handler got EOFError/OSError -- exiting')
return
if task is None:
util.debug('result handler ignoring extra sentinel')
continue
job, i, obj = task
try:
cache[job]._set(i, obj)
except KeyError:
pass
if hasattr(outqueue, '_reader'):
util.debug('ensuring that outqueue is not full')
# If we don't make room available in outqueue then
# attempts to add the sentinel (None) to outqueue may
# block. There is guaranteed to be no more than 2 sentinels.
try:
for i in range(10):
if not outqueue._reader.poll():
break
get()
except (OSError, EOFError):
pass
util.debug('result handler exiting: len(cache)=%s, thread._state=%s',
len(cache), thread._state)
@staticmethod
def _get_tasks(func, it, size):
it = iter(it)
while 1:
x = tuple(itertools.islice(it, size))
if not x:
return
yield (func, x)
def __reduce__(self):
raise NotImplementedError(
'pool objects cannot be passed between processes or pickled'
)
def close(self):
util.debug('closing pool')
if self._state == RUN:
self._state = CLOSE
self._worker_handler._state = CLOSE
def terminate(self):
util.debug('terminating pool')
self._state = TERMINATE
self._worker_handler._state = TERMINATE
self._terminate()
def join(self):
util.debug('joining pool')
assert self._state in (CLOSE, TERMINATE)
self._worker_handler.join()
self._task_handler.join()
self._result_handler.join()
for p in self._pool:
p.join()
@staticmethod
def _help_stuff_finish(inqueue, task_handler, size):
# task_handler may be blocked trying to put items on inqueue
util.debug('removing tasks from inqueue until task handler finished')
inqueue._rlock.acquire()
while task_handler.is_alive() and inqueue._reader.poll():
inqueue._reader.recv()
time.sleep(0)
@classmethod
def _terminate_pool(cls, taskqueue, inqueue, outqueue, pool,
worker_handler, task_handler, result_handler, cache):
# this is guaranteed to only be called once
util.debug('finalizing pool')
worker_handler._state = TERMINATE
task_handler._state = TERMINATE
util.debug('helping task handler/workers to finish')
cls._help_stuff_finish(inqueue, task_handler, len(pool))
assert result_handler.is_alive() or len(cache) == 0
result_handler._state = TERMINATE
outqueue.put(None) # sentinel
# We must wait for the worker handler to exit before terminating
# workers because we don't want workers to be restarted behind our back.
util.debug('joining worker handler')
if threading.current_thread() is not worker_handler:
worker_handler.join()
# Terminate workers which haven't already finished.
if pool and hasattr(pool[0], 'terminate'):
util.debug('terminating workers')
for p in pool:
if p.exitcode is None:
p.terminate()
util.debug('joining task handler')
if threading.current_thread() is not task_handler:
task_handler.join()
util.debug('joining result handler')
if threading.current_thread() is not result_handler:
result_handler.join()
if pool and hasattr(pool[0], 'terminate'):
util.debug('joining pool workers')
for p in pool:
if p.is_alive():
# worker has not yet exited
util.debug('cleaning up worker %d' % p.pid)
p.join()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.terminate()
#
# Class whose instances are returned by `Pool.apply_async()`
#
class ApplyResult(object):
def __init__(self, cache, callback, error_callback):
self._event = threading.Event()
self._job = next(job_counter)
self._cache = cache
self._callback = callback
self._error_callback = error_callback
cache[self._job] = self
def ready(self):
return self._event.is_set()
def successful(self):
assert self.ready()
return self._success
def wait(self, timeout=None):
self._event.wait(timeout)
def get(self, timeout=None):
self.wait(timeout)
if not self.ready():
raise TimeoutError
if self._success:
return self._value
else:
raise self._value
def _set(self, i, obj):
self._success, self._value = obj
if self._callback and self._success:
self._callback(self._value)
if self._error_callback and not self._success:
self._error_callback(self._value)
self._event.set()
del self._cache[self._job]
AsyncResult = ApplyResult # create alias -- see #17805
#
# Class whose instances are returned by `Pool.map_async()`
#
class MapResult(ApplyResult):
def __init__(self, cache, chunksize, length, callback, error_callback):
ApplyResult.__init__(self, cache, callback,
error_callback=error_callback)
self._success = True
self._value = [None] * length
self._chunksize = chunksize
if chunksize <= 0:
self._number_left = 0
self._event.set()
del cache[self._job]
else:
self._number_left = length//chunksize + bool(length % chunksize)
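            # e.g. length=10, chunksize=3 -> 4 chunks (3 + 3 + 3 + 1)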
def _set(self, i, success_result):
success, result = success_result
if success:
self._value[i*self._chunksize:(i+1)*self._chunksize] = result
self._number_left -= 1
if self._number_left == 0:
if self._callback:
self._callback(self._value)
del self._cache[self._job]
self._event.set()
else:
self._success = False
self._value = result
if self._error_callback:
self._error_callback(self._value)
del self._cache[self._job]
self._event.set()
#
# Class whose instances are returned by `Pool.imap()`
#
class IMapIterator(object):
def __init__(self, cache):
self._cond = threading.Condition(threading.Lock())
self._job = next(job_counter)
self._cache = cache
self._items = collections.deque()
self._index = 0
self._length = None
self._unsorted = {}
cache[self._job] = self
def __iter__(self):
return self
def next(self, timeout=None):
self._cond.acquire()
try:
try:
item = self._items.popleft()
except IndexError:
if self._index == self._length:
raise StopIteration
self._cond.wait(timeout)
try:
item = self._items.popleft()
except IndexError:
if self._index == self._length:
raise StopIteration
raise TimeoutError
finally:
self._cond.release()
success, value = item
if success:
return value
raise value
__next__ = next # XXX
def _set(self, i, obj):
self._cond.acquire()
try:
if self._index == i:
self._items.append(obj)
self._index += 1
while self._index in self._unsorted:
obj = self._unsorted.pop(self._index)
self._items.append(obj)
self._index += 1
self._cond.notify()
else:
self._unsorted[i] = obj
if self._index == self._length:
del self._cache[self._job]
finally:
self._cond.release()
def _set_length(self, length):
self._cond.acquire()
try:
self._length = length
if self._index == self._length:
self._cond.notify()
del self._cache[self._job]
finally:
self._cond.release()
#
# Class whose instances are returned by `Pool.imap_unordered()`
#
class IMapUnorderedIterator(IMapIterator):
def _set(self, i, obj):
self._cond.acquire()
try:
self._items.append(obj)
self._index += 1
self._cond.notify()
if self._index == self._length:
del self._cache[self._job]
finally:
self._cond.release()
#
#
#
class ThreadPool(Pool):
_wrap_exception = False
@staticmethod
def Process(*args, **kwds):
from .dummy import Process
return Process(*args, **kwds)
def __init__(self, processes=None, initializer=None, initargs=()):
Pool.__init__(self, processes, initializer, initargs)
def _setup_queues(self):
self._inqueue = queue.Queue()
self._outqueue = queue.Queue()
self._quick_put = self._inqueue.put
self._quick_get = self._outqueue.get
@staticmethod
def _help_stuff_finish(inqueue, task_handler, size):
# put sentinels at head of inqueue to make workers finish
inqueue.not_empty.acquire()
try:
inqueue.queue.clear()
inqueue.queue.extend([None] * size)
inqueue.not_empty.notify_all()
finally:
inqueue.not_empty.release()
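#
# A minimal usage sketch (an addition, not part of the original module):
# ThreadPool reuses Pool's task machinery with worker threads, so it can be
# exercised without forking. Guarded so it only runs when executed directly:
#
if __name__ == '__main__':
    with ThreadPool(2) as _p:
        assert _p.map(abs, [-1, -2, 3]) == [1, 2, 3]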
import os
import sys
import signal
import errno
from . import util
__all__ = ['Popen']
#
# Start child process using fork
#
class Popen(object):
method = 'fork'
def __init__(self, process_obj):
sys.stdout.flush()
sys.stderr.flush()
self.returncode = None
self._launch(process_obj)
def duplicate_for_child(self, fd):
return fd
def poll(self, flag=os.WNOHANG):
if self.returncode is None:
while True:
try:
pid, sts = os.waitpid(self.pid, flag)
except OSError as e:
if e.errno == errno.EINTR:
continue
# Child process not yet created. See #1731717
# e.errno == errno.ECHILD == 10
return None
else:
break
if pid == self.pid:
if os.WIFSIGNALED(sts):
self.returncode = -os.WTERMSIG(sts)
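                    # e.g. a child killed by SIGKILL reports returncode -9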
else:
assert os.WIFEXITED(sts)
self.returncode = os.WEXITSTATUS(sts)
return self.returncode
def wait(self, timeout=None):
if self.returncode is None:
if timeout is not None:
from multiprocessing.connection import wait
if not wait([self.sentinel], timeout):
return None
# This shouldn't block if wait() returned successfully.
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
return self.returncode
def terminate(self):
if self.returncode is None:
try:
os.kill(self.pid, signal.SIGTERM)
except ProcessLookupError:
pass
except OSError:
if self.wait(timeout=0.1) is None:
raise
def _launch(self, process_obj):
code = 1
parent_r, child_w = os.pipe()
self.pid = os.fork()
if self.pid == 0:
try:
os.close(parent_r)
if 'random' in sys.modules:
import random
random.seed()
code = process_obj._bootstrap()
finally:
os._exit(code)
else:
os.close(child_w)
util.Finalize(self, os.close, (parent_r,))
self.sentinel = parent_r
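# Example (illustrative, Unix only): the sentinel trick used by _launch()
# above, in isolation. The parent keeps the read end of a pipe whose write
# end only the child holds, so the descriptor signals EOF exactly when the
# child exits and multiprocessing.connection.wait() can select on it.
import os
from multiprocessing.connection import wait

_r, _w = os.pipe()
_pid = os.fork()
if _pid == 0:
    os.close(_r)            # child keeps only the write end ...
    os._exit(0)             # ... which is closed when it exits
os.close(_w)
wait([_r])                  # returns once the child is gone
os.waitpid(_pid, 0)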
import io
import os
from . import reduction
if not reduction.HAVE_SEND_HANDLE:
raise ImportError('No support for sending fds between processes')
from . import context
from . import forkserver
from . import popen_fork
from . import spawn
from . import util
__all__ = ['Popen']
#
# Wrapper for an fd used while launching a process
#
class _DupFd(object):
def __init__(self, ind):
self.ind = ind
def detach(self):
return forkserver.get_inherited_fds()[self.ind]
#
# Start child process using a server process
#
class Popen(popen_fork.Popen):
method = 'forkserver'
DupFd = _DupFd
def __init__(self, process_obj):
self._fds = []
super().__init__(process_obj)
def duplicate_for_child(self, fd):
self._fds.append(fd)
return len(self._fds) - 1
def _launch(self, process_obj):
prep_data = spawn.get_preparation_data(process_obj._name)
buf = io.BytesIO()
context.set_spawning_popen(self)
try:
reduction.dump(prep_data, buf)
reduction.dump(process_obj, buf)
finally:
context.set_spawning_popen(None)
self.sentinel, w = forkserver.connect_to_new_process(self._fds)
util.Finalize(self, os.close, (self.sentinel,))
with open(w, 'wb', closefd=True) as f:
f.write(buf.getbuffer())
self.pid = forkserver.read_unsigned(self.sentinel)
def poll(self, flag=os.WNOHANG):
if self.returncode is None:
from multiprocessing.connection import wait
timeout = 0 if flag == os.WNOHANG else None
if not wait([self.sentinel], timeout):
return None
try:
self.returncode = forkserver.read_unsigned(self.sentinel)
except (OSError, EOFError):
# The process ended abnormally perhaps because of a signal
self.returncode = 255
return self.returncode
import io
import os
from . import context
from . import popen_fork
from . import reduction
from . import spawn
from . import util
__all__ = ['Popen']
#
# Wrapper for an fd used while launching a process
#
class _DupFd(object):
def __init__(self, fd):
self.fd = fd
def detach(self):
return self.fd
#
# Start child process using a fresh interpreter
#
class Popen(popen_fork.Popen):
method = 'spawn'
DupFd = _DupFd
def __init__(self, process_obj):
self._fds = []
super().__init__(process_obj)
def duplicate_for_child(self, fd):
self._fds.append(fd)
return fd
def _launch(self, process_obj):
from . import semaphore_tracker
tracker_fd = semaphore_tracker.getfd()
self._fds.append(tracker_fd)
prep_data = spawn.get_preparation_data(process_obj._name)
fp = io.BytesIO()
context.set_spawning_popen(self)
try:
reduction.dump(prep_data, fp)
reduction.dump(process_obj, fp)
finally:
context.set_spawning_popen(None)
parent_r = child_w = child_r = parent_w = None
try:
parent_r, child_w = os.pipe()
child_r, parent_w = os.pipe()
cmd = spawn.get_command_line(tracker_fd=tracker_fd,
pipe_handle=child_r)
self._fds.extend([child_r, child_w])
self.pid = util.spawnv_passfds(spawn.get_executable(),
cmd, self._fds)
self.sentinel = parent_r
with open(parent_w, 'wb', closefd=False) as f:
f.write(fp.getbuffer())
finally:
if parent_r is not None:
util.Finalize(self, os.close, (parent_r,))
for fd in (child_r, child_w, parent_w):
if fd is not None:
os.close(fd)
import os
import msvcrt
import signal
import sys
import _winapi
from . import context
from . import spawn
from . import reduction
from . import util
__all__ = ['Popen']
#
#
#
TERMINATE = 0x10000
WINEXE = (sys.platform == 'win32' and getattr(sys, 'frozen', False))
WINSERVICE = sys.executable.lower().endswith("pythonservice.exe")
#
# We define a Popen class similar to the one from subprocess, but
# whose constructor takes a process object as its argument.
#
class Popen(object):
'''
Start a subprocess to run the code of a process object
'''
method = 'spawn'
def __init__(self, process_obj):
prep_data = spawn.get_preparation_data(process_obj._name)
# read end of pipe will be "stolen" by the child process
# -- see spawn_main() in spawn.py.
rhandle, whandle = _winapi.CreatePipe(None, 0)
wfd = msvcrt.open_osfhandle(whandle, 0)
cmd = spawn.get_command_line(parent_pid=os.getpid(),
pipe_handle=rhandle)
cmd = ' '.join('"%s"' % x for x in cmd)
with open(wfd, 'wb', closefd=True) as to_child:
# start process
try:
hp, ht, pid, tid = _winapi.CreateProcess(
spawn.get_executable(), cmd,
None, None, False, 0, None, None, None)
_winapi.CloseHandle(ht)
except:
_winapi.CloseHandle(rhandle)
raise
# set attributes of self
self.pid = pid
self.returncode = None
self._handle = hp
self.sentinel = int(hp)
util.Finalize(self, _winapi.CloseHandle, (self.sentinel,))
# send information to child
context.set_spawning_popen(self)
try:
reduction.dump(prep_data, to_child)
reduction.dump(process_obj, to_child)
finally:
context.set_spawning_popen(None)
def duplicate_for_child(self, handle):
assert self is context.get_spawning_popen()
return reduction.duplicate(handle, self.sentinel)
def wait(self, timeout=None):
if self.returncode is None:
if timeout is None:
msecs = _winapi.INFINITE
else:
msecs = max(0, int(timeout * 1000 + 0.5))
res = _winapi.WaitForSingleObject(int(self._handle), msecs)
if res == _winapi.WAIT_OBJECT_0:
code = _winapi.GetExitCodeProcess(self._handle)
if code == TERMINATE:
code = -signal.SIGTERM
self.returncode = code
return self.returncode
def poll(self):
return self.wait(timeout=0)
def terminate(self):
if self.returncode is None:
try:
_winapi.TerminateProcess(int(self._handle), TERMINATE)
except OSError:
if self.wait(timeout=1.0) is None:
raise
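# Example (illustrative): selecting the 'spawn' start method that this
# Popen class implements, via the public API (works on all platforms).
import multiprocessing as mp

def _child():
    print('spawned child')

if __name__ == '__main__':
    _ctx = mp.get_context('spawn')
    _p = _ctx.Process(target=_child)
    _p.start()
    _p.join()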
#
# Module providing the `Process` class which emulates `threading.Thread`
#
# multiprocessing/process.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = ['BaseProcess', 'current_process', 'active_children']
#
# Imports
#
import os
import sys
import signal
import itertools
from _weakrefset import WeakSet
#
#
#
try:
ORIGINAL_DIR = os.path.abspath(os.getcwd())
except OSError:
ORIGINAL_DIR = None
#
# Public functions
#
def current_process():
'''
Return process object representing the current process
'''
return _current_process
def active_children():
'''
Return list of process objects corresponding to live child processes
'''
_cleanup()
return list(_children)
#
#
#
def _cleanup():
# check for processes which have finished
for p in list(_children):
if p._popen.poll() is not None:
_children.discard(p)
#
# The `Process` class
#
class BaseProcess(object):
'''
Process objects represent activity that is run in a separate process
The class is analogous to `threading.Thread`
'''
def _Popen(self):
raise NotImplementedError
def __init__(self, group=None, target=None, name=None, args=(), kwargs={},
*, daemon=None):
assert group is None, 'group argument must be None for now'
count = next(_process_counter)
self._identity = _current_process._identity + (count,)
self._config = _current_process._config.copy()
self._parent_pid = os.getpid()
self._popen = None
self._target = target
self._args = tuple(args)
self._kwargs = dict(kwargs)
self._name = name or type(self).__name__ + '-' + \
':'.join(str(i) for i in self._identity)
if daemon is not None:
self.daemon = daemon
_dangling.add(self)
def run(self):
'''
Method to be run in sub-process; can be overridden in sub-class
'''
if self._target:
self._target(*self._args, **self._kwargs)
def start(self):
'''
Start child process
'''
assert self._popen is None, 'cannot start a process twice'
assert self._parent_pid == os.getpid(), \
'can only start a process object created by current process'
assert not _current_process._config.get('daemon'), \
'daemonic processes are not allowed to have children'
_cleanup()
self._popen = self._Popen(self)
self._sentinel = self._popen.sentinel
_children.add(self)
def terminate(self):
'''
Terminate process; sends SIGTERM signal or uses TerminateProcess()
'''
self._popen.terminate()
def join(self, timeout=None):
'''
Wait until child process terminates
'''
assert self._parent_pid == os.getpid(), 'can only join a child process'
assert self._popen is not None, 'can only join a started process'
res = self._popen.wait(timeout)
if res is not None:
_children.discard(self)
def is_alive(self):
'''
Return whether process is alive
'''
if self is _current_process:
return True
assert self._parent_pid == os.getpid(), 'can only test a child process'
if self._popen is None:
return False
self._popen.poll()
return self._popen.returncode is None
@property
def name(self):
return self._name
@name.setter
def name(self, name):
assert isinstance(name, str), 'name must be a string'
self._name = name
@property
def daemon(self):
'''
Return whether process is a daemon
'''
return self._config.get('daemon', False)
@daemon.setter
def daemon(self, daemonic):
'''
Set whether process is a daemon
'''
assert self._popen is None, 'process has already started'
self._config['daemon'] = daemonic
@property
def authkey(self):
return self._config['authkey']
@authkey.setter
def authkey(self, authkey):
'''
Set authorization key of process
'''
self._config['authkey'] = AuthenticationString(authkey)
@property
def exitcode(self):
'''
Return exit code of process or `None` if it has yet to stop
'''
if self._popen is None:
return self._popen
return self._popen.poll()
@property
def ident(self):
'''
Return identifier (PID) of process or `None` if it has yet to start
'''
if self is _current_process:
return os.getpid()
else:
return self._popen and self._popen.pid
pid = ident
@property
def sentinel(self):
'''
Return a file descriptor (Unix) or handle (Windows) suitable for
waiting for process termination.
'''
try:
return self._sentinel
except AttributeError:
raise ValueError("process not started")
def __repr__(self):
if self is _current_process:
status = 'started'
elif self._parent_pid != os.getpid():
status = 'unknown'
elif self._popen is None:
status = 'initial'
else:
if self._popen.poll() is not None:
status = self.exitcode
else:
status = 'started'
if type(status) is int:
if status == 0:
status = 'stopped'
else:
status = 'stopped[%s]' % _exitcode_to_name.get(status, status)
return '<%s(%s, %s%s)>' % (type(self).__name__, self._name,
status, self.daemon and ' daemon' or '')
##
def _bootstrap(self):
from . import util, context
global _current_process, _process_counter, _children
try:
if self._start_method is not None:
context._force_start_method(self._start_method)
_process_counter = itertools.count(1)
_children = set()
if sys.stdin is not None:
try:
sys.stdin.close()
sys.stdin = open(os.devnull)
except (OSError, ValueError):
pass
old_process = _current_process
_current_process = self
try:
util._finalizer_registry.clear()
util._run_after_forkers()
finally:
# delay finalization of the old process object until after
# _run_after_forkers() is executed
del old_process
util.info('child process calling self.run()')
try:
self.run()
exitcode = 0
finally:
util._exit_function()
except SystemExit as e:
if not e.args:
exitcode = 1
elif isinstance(e.args[0], int):
exitcode = e.args[0]
else:
sys.stderr.write(str(e.args[0]) + '\n')
exitcode = 1
except:
exitcode = 1
import traceback
sys.stderr.write('Process %s:\n' % self.name)
traceback.print_exc()
finally:
util.info('process exiting with exitcode %d' % exitcode)
sys.stdout.flush()
sys.stderr.flush()
return exitcode
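# Example (illustrative): typical use of the BaseProcess API above through
# the public multiprocessing entry points.
import multiprocessing as mp

def _greet(name):
    print('hello from', name, 'pid', mp.current_process().pid)

if __name__ == '__main__':
    _p = mp.Process(target=_greet, args=('worker',), name='Greeter-1')
    _p.start()
    _p.join()
    print(_p.name, 'exit code:', _p.exitcode)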
#
# We subclass bytes to avoid accidental transmission of auth keys over the network
#
class AuthenticationString(bytes):
def __reduce__(self):
from .context import get_spawning_popen
if get_spawning_popen() is None:
raise TypeError(
'Pickling an AuthenticationString object is '
'disallowed for security reasons'
)
return AuthenticationString, (bytes(self),)
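# Example (illustrative): the guard in __reduce__ above in action --
# pickling an AuthenticationString outside of a spawning context raises
# TypeError, so an auth key cannot leak through an ordinary pickle.
import pickle
from multiprocessing.process import AuthenticationString

try:
    pickle.dumps(AuthenticationString(b'secret'))
except TypeError as _exc:
    print('refused to pickle auth key:', _exc)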
#
# Create object representing the main process
#
class _MainProcess(BaseProcess):
def __init__(self):
self._identity = ()
self._name = 'MainProcess'
self._parent_pid = None
self._popen = None
self._config = {'authkey': AuthenticationString(os.urandom(32)),
'semprefix': '/mp'}
# Note that some versions of FreeBSD only allow named
# semaphores to have names of up to 14 characters. Therefore
# we choose a short prefix.
#
# On MacOSX in a sandbox it may be necessary to use a
# different prefix -- see #19478.
#
# Everything in self._config will be inherited by descendant
# processes.
_current_process = _MainProcess()
_process_counter = itertools.count(1)
_children = set()
del _MainProcess
#
# Give names to some return codes
#
_exitcode_to_name = {}
for name, signum in list(signal.__dict__.items()):
if name[:3]=='SIG' and '_' not in name:
_exitcode_to_name[-signum] = name
# For debug and leak testing
_dangling = WeakSet()
#
# Module implementing queues
#
# multiprocessing/queues.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = ['Queue', 'SimpleQueue', 'JoinableQueue']
import sys
import os
import threading
import collections
import time
import weakref
import errno
from queue import Empty, Full
import _multiprocessing
from . import connection
from . import context
from .util import debug, info, Finalize, register_after_fork, is_exiting
from .reduction import ForkingPickler
#
# Queue type using a pipe, buffer and thread
#
class Queue(object):
def __init__(self, maxsize=0, *, ctx):
if maxsize <= 0:
# Can raise ImportError (see issues #3770 and #23400)
from .synchronize import SEM_VALUE_MAX as maxsize
self._maxsize = maxsize
self._reader, self._writer = connection.Pipe(duplex=False)
self._rlock = ctx.Lock()
self._opid = os.getpid()
if sys.platform == 'win32':
self._wlock = None
else:
self._wlock = ctx.Lock()
self._sem = ctx.BoundedSemaphore(maxsize)
# For use by concurrent.futures
self._ignore_epipe = False
self._after_fork()
if sys.platform != 'win32':
register_after_fork(self, Queue._after_fork)
def __getstate__(self):
context.assert_spawning(self)
return (self._ignore_epipe, self._maxsize, self._reader, self._writer,
self._rlock, self._wlock, self._sem, self._opid)
def __setstate__(self, state):
(self._ignore_epipe, self._maxsize, self._reader, self._writer,
self._rlock, self._wlock, self._sem, self._opid) = state
self._after_fork()
def _after_fork(self):
debug('Queue._after_fork()')
self._notempty = threading.Condition(threading.Lock())
self._buffer = collections.deque()
self._thread = None
self._jointhread = None
self._joincancelled = False
self._closed = False
self._close = None
self._send_bytes = self._writer.send_bytes
self._recv_bytes = self._reader.recv_bytes
self._poll = self._reader.poll
def put(self, obj, block=True, timeout=None):
assert not self._closed
if not self._sem.acquire(block, timeout):
raise Full
self._notempty.acquire()
try:
if self._thread is None:
self._start_thread()
self._buffer.append(obj)
self._notempty.notify()
finally:
self._notempty.release()
def get(self, block=True, timeout=None):
if block and timeout is None:
with self._rlock:
res = self._recv_bytes()
self._sem.release()
else:
if block:
deadline = time.time() + timeout
if not self._rlock.acquire(block, timeout):
raise Empty
try:
if block:
timeout = deadline - time.time()
if timeout < 0 or not self._poll(timeout):
raise Empty
elif not self._poll():
raise Empty
res = self._recv_bytes()
self._sem.release()
finally:
self._rlock.release()
# deserialize the data after having released the lock
return ForkingPickler.loads(res)
def qsize(self):
# Raises NotImplementedError on Mac OSX because of broken sem_getvalue()
return self._maxsize - self._sem._semlock._get_value()
def empty(self):
return not self._poll()
def full(self):
return self._sem._semlock._is_zero()
def get_nowait(self):
return self.get(False)
def put_nowait(self, obj):
return self.put(obj, False)
def close(self):
self._closed = True
try:
self._reader.close()
finally:
close = self._close
if close:
self._close = None
close()
def join_thread(self):
debug('Queue.join_thread()')
assert self._closed
if self._jointhread:
self._jointhread()
def cancel_join_thread(self):
debug('Queue.cancel_join_thread()')
self._joincancelled = True
try:
self._jointhread.cancel()
except AttributeError:
pass
def _start_thread(self):
debug('Queue._start_thread()')
# Start thread which transfers data from buffer to pipe
self._buffer.clear()
self._thread = threading.Thread(
target=Queue._feed,
args=(self._buffer, self._notempty, self._send_bytes,
self._wlock, self._writer.close, self._ignore_epipe),
name='QueueFeederThread'
)
self._thread.daemon = True
debug('doing self._thread.start()')
self._thread.start()
debug('... done self._thread.start()')
# On process exit we will wait for data to be flushed to the pipe.
#
# However, if this process created the queue then all
# processes which use the queue will be descendants of this
# process. Therefore waiting for the queue to be flushed
# is pointless once all the child processes have been joined.
created_by_this_process = (self._opid == os.getpid())
if not self._joincancelled and not created_by_this_process:
self._jointhread = Finalize(
self._thread, Queue._finalize_join,
[weakref.ref(self._thread)],
exitpriority=-5
)
# Send sentinel to the thread queue object when garbage collected
self._close = Finalize(
self, Queue._finalize_close,
[self._buffer, self._notempty],
exitpriority=10
)
@staticmethod
def _finalize_join(twr):
debug('joining queue thread')
thread = twr()
if thread is not None:
thread.join()
debug('... queue thread joined')
else:
debug('... queue thread already dead')
@staticmethod
def _finalize_close(buffer, notempty):
debug('telling queue thread to quit')
notempty.acquire()
try:
buffer.append(_sentinel)
notempty.notify()
finally:
notempty.release()
@staticmethod
def _feed(buffer, notempty, send_bytes, writelock, close, ignore_epipe):
debug('starting thread to feed data to pipe')
nacquire = notempty.acquire
nrelease = notempty.release
nwait = notempty.wait
bpopleft = buffer.popleft
sentinel = _sentinel
if sys.platform != 'win32':
wacquire = writelock.acquire
wrelease = writelock.release
else:
wacquire = None
try:
while 1:
nacquire()
try:
if not buffer:
nwait()
finally:
nrelease()
try:
while 1:
obj = bpopleft()
if obj is sentinel:
debug('feeder thread got sentinel -- exiting')
close()
return
# serialize the data before acquiring the lock
obj = ForkingPickler.dumps(obj)
if wacquire is None:
send_bytes(obj)
else:
wacquire()
try:
send_bytes(obj)
finally:
wrelease()
except IndexError:
pass
except Exception as e:
if ignore_epipe and getattr(e, 'errno', 0) == errno.EPIPE:
return
# Since this runs in a daemon thread the resources it uses
# may become unusable while the process is cleaning up.
# We ignore errors which happen after the process has
# started to clean up.
try:
if is_exiting():
info('error in queue thread: %s', e)
else:
import traceback
traceback.print_exc()
except Exception:
pass
_sentinel = object()
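# Example (illustrative): basic use of the Queue above via the public API.
# Note that put() only appends to an in-process buffer which the feeder
# thread defined in _feed() drains, so data may still be in flight when
# put() returns.
import multiprocessing as mp

def _echo(q_in, q_out):
    q_out.put(q_in.get() * 2)

if __name__ == '__main__':
    _q_in, _q_out = mp.Queue(), mp.Queue()
    _p = mp.Process(target=_echo, args=(_q_in, _q_out))
    _p.start()
    _q_in.put(21)
    print(_q_out.get())   # 42
    _p.join()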
#
# A queue type which also supports join() and task_done() methods
#
# Note that if you do not call task_done() for each finished task then
# eventually the counter's semaphore may overflow causing Bad Things
# to happen.
#
class JoinableQueue(Queue):
def __init__(self, maxsize=0, *, ctx):
Queue.__init__(self, maxsize, ctx=ctx)
self._unfinished_tasks = ctx.Semaphore(0)
self._cond = ctx.Condition()
def __getstate__(self):
return Queue.__getstate__(self) + (self._cond, self._unfinished_tasks)
def __setstate__(self, state):
Queue.__setstate__(self, state[:-2])
self._cond, self._unfinished_tasks = state[-2:]
def put(self, obj, block=True, timeout=None):
assert not self._closed
if not self._sem.acquire(block, timeout):
raise Full
self._notempty.acquire()
self._cond.acquire()
try:
if self._thread is None:
self._start_thread()
self._buffer.append(obj)
self._unfinished_tasks.release()
self._notempty.notify()
finally:
self._cond.release()
self._notempty.release()
def task_done(self):
self._cond.acquire()
try:
if not self._unfinished_tasks.acquire(False):
raise ValueError('task_done() called too many times')
if self._unfinished_tasks._semlock._is_zero():
self._cond.notify_all()
finally:
self._cond.release()
def join(self):
self._cond.acquire()
try:
if not self._unfinished_tasks._semlock._is_zero():
self._cond.wait()
finally:
self._cond.release()
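# Example (illustrative): the task_done()/join() protocol described in the
# note above -- every successful get() must be matched by exactly one
# task_done(), or join() will never return.
import multiprocessing as mp

def _worker(q):
    while True:
        item = q.get()
        q.task_done()          # matched 1:1 with get()
        if item is None:       # sentinel: stop consuming
            break

if __name__ == '__main__':
    _q = mp.JoinableQueue()
    _p = mp.Process(target=_worker, args=(_q,))
    _p.start()
    for _i in range(3):
        _q.put(_i)
    _q.put(None)
    _q.join()                  # blocks until all four puts are acknowledged
    _p.join()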
#
# Simplified Queue type -- really just a locked pipe
#
class SimpleQueue(object):
def __init__(self, *, ctx):
self._reader, self._writer = connection.Pipe(duplex=False)
self._rlock = ctx.Lock()
self._poll = self._reader.poll
if sys.platform == 'win32':
self._wlock = None
else:
self._wlock = ctx.Lock()
def empty(self):
return not self._poll()
def __getstate__(self):
context.assert_spawning(self)
return (self._reader, self._writer, self._rlock, self._wlock)
def __setstate__(self, state):
(self._reader, self._writer, self._rlock, self._wlock) = state
def get(self):
with self._rlock:
res = self._reader.recv_bytes()
# deserialize the data after having released the lock
return ForkingPickler.loads(res)
def put(self, obj):
# serialize the data before acquiring the lock
obj = ForkingPickler.dumps(obj)
if self._wlock is None:
# writes to a message oriented win32 pipe are atomic
self._writer.send_bytes(obj)
else:
with self._wlock:
self._writer.send_bytes(obj)
#
# Module which deals with pickling of objects.
#
# multiprocessing/reduction.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
import copyreg
import functools
import io
import os
import pickle
import socket
import sys
from . import context
__all__ = ['send_handle', 'recv_handle', 'ForkingPickler', 'register', 'dump']
HAVE_SEND_HANDLE = (sys.platform == 'win32' or
(hasattr(socket, 'CMSG_LEN') and
hasattr(socket, 'SCM_RIGHTS') and
hasattr(socket.socket, 'sendmsg')))
#
# Pickler subclass
#
class ForkingPickler(pickle.Pickler):
'''Pickler subclass used by multiprocessing.'''
_extra_reducers = {}
_copyreg_dispatch_table = copyreg.dispatch_table
def __init__(self, *args):
super().__init__(*args)
self.dispatch_table = self._copyreg_dispatch_table.copy()
self.dispatch_table.update(self._extra_reducers)
@classmethod
def register(cls, type, reduce):
'''Register a reduce function for a type.'''
cls._extra_reducers[type] = reduce
@classmethod
def dumps(cls, obj, protocol=None):
buf = io.BytesIO()
cls(buf, protocol).dump(obj)
return buf.getbuffer()
loads = pickle.loads
register = ForkingPickler.register
def dump(obj, file, protocol=None):
'''Replacement for pickle.dump() using ForkingPickler.'''
ForkingPickler(file, protocol).dump(obj)
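# Example (illustrative): teaching ForkingPickler a custom reduction via
# register() above, without touching the global copyreg dispatch table.
import pickle
from multiprocessing.reduction import ForkingPickler

class _Token:
    def __init__(self, value):
        self.value = value

def _reduce_token(tok):
    return _Token, (tok.value,)

ForkingPickler.register(_Token, _reduce_token)
_data = ForkingPickler.dumps(_Token(7))
print(pickle.loads(_data).value)   # 7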
#
# Platform specific definitions
#
if sys.platform == 'win32':
# Windows
__all__ += ['DupHandle', 'duplicate', 'steal_handle']
import _winapi
def duplicate(handle, target_process=None, inheritable=False):
'''Duplicate a handle. (target_process is a handle not a pid!)'''
if target_process is None:
target_process = _winapi.GetCurrentProcess()
return _winapi.DuplicateHandle(
_winapi.GetCurrentProcess(), handle, target_process,
0, inheritable, _winapi.DUPLICATE_SAME_ACCESS)
def steal_handle(source_pid, handle):
'''Steal a handle from process identified by source_pid.'''
source_process_handle = _winapi.OpenProcess(
_winapi.PROCESS_DUP_HANDLE, False, source_pid)
try:
return _winapi.DuplicateHandle(
source_process_handle, handle,
_winapi.GetCurrentProcess(), 0, False,
_winapi.DUPLICATE_SAME_ACCESS | _winapi.DUPLICATE_CLOSE_SOURCE)
finally:
_winapi.CloseHandle(source_process_handle)
def send_handle(conn, handle, destination_pid):
'''Send a handle over a local connection.'''
dh = DupHandle(handle, _winapi.DUPLICATE_SAME_ACCESS, destination_pid)
conn.send(dh)
def recv_handle(conn):
'''Receive a handle over a local connection.'''
return conn.recv().detach()
class DupHandle(object):
'''Picklable wrapper for a handle.'''
def __init__(self, handle, access, pid=None):
if pid is None:
# We just duplicate the handle in the current process and
# let the receiving process steal the handle.
pid = os.getpid()
proc = _winapi.OpenProcess(_winapi.PROCESS_DUP_HANDLE, False, pid)
try:
self._handle = _winapi.DuplicateHandle(
_winapi.GetCurrentProcess(),
handle, proc, access, False, 0)
finally:
_winapi.CloseHandle(proc)
self._access = access
self._pid = pid
def detach(self):
'''Get the handle. This should only be called once.'''
# retrieve handle from process which currently owns it
if self._pid == os.getpid():
# The handle has already been duplicated for this process.
return self._handle
# We must steal the handle from the process whose pid is self._pid.
proc = _winapi.OpenProcess(_winapi.PROCESS_DUP_HANDLE, False,
self._pid)
try:
return _winapi.DuplicateHandle(
proc, self._handle, _winapi.GetCurrentProcess(),
self._access, False, _winapi.DUPLICATE_CLOSE_SOURCE)
finally:
_winapi.CloseHandle(proc)
else:
# Unix
__all__ += ['DupFd', 'sendfds', 'recvfds']
import array
# On MacOSX we should acknowledge receipt of fds -- see Issue14669
ACKNOWLEDGE = sys.platform == 'darwin'
def sendfds(sock, fds):
'''Send an array of fds over an AF_UNIX socket.'''
fds = array.array('i', fds)
msg = bytes([len(fds) % 256])
sock.sendmsg([msg], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])
if ACKNOWLEDGE and sock.recv(1) != b'A':
raise RuntimeError('did not receive acknowledgement of fd')
def recvfds(sock, size):
'''Receive an array of fds over an AF_UNIX socket.'''
a = array.array('i')
bytes_size = a.itemsize * size
msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(bytes_size))
if not msg and not ancdata:
raise EOFError
try:
if ACKNOWLEDGE:
sock.send(b'A')
if len(ancdata) != 1:
raise RuntimeError('received %d items of ancdata' %
len(ancdata))
cmsg_level, cmsg_type, cmsg_data = ancdata[0]
if (cmsg_level == socket.SOL_SOCKET and
cmsg_type == socket.SCM_RIGHTS):
if len(cmsg_data) % a.itemsize != 0:
raise ValueError
a.frombytes(cmsg_data)
assert len(a) % 256 == msg[0]
return list(a)
except (ValueError, IndexError):
pass
raise RuntimeError('Invalid data received')
def send_handle(conn, handle, destination_pid):
'''Send a handle over a local connection.'''
with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
sendfds(s, [handle])
def recv_handle(conn):
'''Receive a handle over a local connection.'''
with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
return recvfds(s, 1)[0]
def DupFd(fd):
'''Return a wrapper for an fd.'''
popen_obj = context.get_spawning_popen()
if popen_obj is not None:
return popen_obj.DupFd(popen_obj.duplicate_for_child(fd))
elif HAVE_SEND_HANDLE:
from . import resource_sharer
return resource_sharer.DupFd(fd)
else:
raise ValueError('SCM_RIGHTS appears not to be available')
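# Example (illustrative, Linux): passing a descriptor through an AF_UNIX
# socketpair with sendfds()/recvfds() above. On MacOSX the acknowledge
# handshake means the two ends must run concurrently (e.g. in separate
# processes), so this single-threaded sketch assumes ACKNOWLEDGE is False.
import os
import socket
from multiprocessing.reduction import sendfds, recvfds

_a, _b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
_r, _w = os.pipe()
sendfds(_a, [_r])
_r2 = recvfds(_b, 1)[0]      # a duplicate of _r in this process
os.write(_w, b'ok')
print(os.read(_r2, 2))       # b'ok'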
#
# Try making some callable types picklable
#
def _reduce_method(m):
if m.__self__ is None:
return getattr, (m.__class__, m.__func__.__name__)
else:
return getattr, (m.__self__, m.__func__.__name__)
class _C:
def f(self):
pass
register(type(_C().f), _reduce_method)
def _reduce_method_descriptor(m):
return getattr, (m.__objclass__, m.__name__)
register(type(list.append), _reduce_method_descriptor)
register(type(int.__add__), _reduce_method_descriptor)
def _reduce_partial(p):
return _rebuild_partial, (p.func, p.args, p.keywords or {})
def _rebuild_partial(func, args, keywords):
return functools.partial(func, *args, **keywords)
register(functools.partial, _reduce_partial)
#
# Make sockets picklable
#
if sys.platform == 'win32':
def _reduce_socket(s):
from .resource_sharer import DupSocket
return _rebuild_socket, (DupSocket(s),)
def _rebuild_socket(ds):
return ds.detach()
register(socket.socket, _reduce_socket)
else:
def _reduce_socket(s):
df = DupFd(s.fileno())
return _rebuild_socket, (df, s.family, s.type, s.proto)
def _rebuild_socket(df, family, type, proto):
fd = df.detach()
return socket.socket(family, type, proto, fileno=fd)
register(socket.socket, _reduce_socket)
#
# We use a background thread for sharing fds on Unix, and for sharing sockets on
# Windows.
#
# A client which wants to pickle a resource registers it with the resource
# sharer and gets an identifier in return. The unpickling process will connect
# to the resource sharer, send the identifier and its pid, and then receive
# the resource.
#
import os
import signal
import socket
import sys
import threading
from . import process
from . import reduction
from . import util
__all__ = ['stop']
if sys.platform == 'win32':
__all__ += ['DupSocket']
class DupSocket(object):
'''Picklable wrapper for a socket.'''
def __init__(self, sock):
new_sock = sock.dup()
def send(conn, pid):
share = new_sock.share(pid)
conn.send_bytes(share)
self._id = _resource_sharer.register(send, new_sock.close)
def detach(self):
'''Get the socket. This should only be called once.'''
with _resource_sharer.get_connection(self._id) as conn:
share = conn.recv_bytes()
return socket.fromshare(share)
else:
__all__ += ['DupFd']
class DupFd(object):
'''Wrapper for fd which can be used at any time.'''
def __init__(self, fd):
new_fd = os.dup(fd)
def send(conn, pid):
reduction.send_handle(conn, new_fd, pid)
def close():
os.close(new_fd)
self._id = _resource_sharer.register(send, close)
def detach(self):
'''Get the fd. This should only be called once.'''
with _resource_sharer.get_connection(self._id) as conn:
return reduction.recv_handle(conn)
class _ResourceSharer(object):
'''Manager for resources using a background thread.'''
def __init__(self):
self._key = 0
self._cache = {}
self._old_locks = []
self._lock = threading.Lock()
self._listener = None
self._address = None
self._thread = None
util.register_after_fork(self, _ResourceSharer._afterfork)
def register(self, send, close):
'''Register resource, returning an identifier.'''
with self._lock:
if self._address is None:
self._start()
self._key += 1
self._cache[self._key] = (send, close)
return (self._address, self._key)
@staticmethod
def get_connection(ident):
'''Return connection from which to receive identified resource.'''
from .connection import Client
address, key = ident
c = Client(address, authkey=process.current_process().authkey)
c.send((key, os.getpid()))
return c
def stop(self, timeout=None):
'''Stop the background thread and clear registered resources.'''
from .connection import Client
with self._lock:
if self._address is not None:
c = Client(self._address,
authkey=process.current_process().authkey)
c.send(None)
c.close()
self._thread.join(timeout)
if self._thread.is_alive():
util.sub_warning('_ResourceSharer thread did '
'not stop when asked')
self._listener.close()
self._thread = None
self._address = None
self._listener = None
for key, (send, close) in self._cache.items():
close()
self._cache.clear()
def _afterfork(self):
for key, (send, close) in self._cache.items():
close()
self._cache.clear()
# If self._lock was locked at the time of the fork, it may be broken
# -- see issue 6721. Replace it without letting it be gc'ed.
self._old_locks.append(self._lock)
self._lock = threading.Lock()
if self._listener is not None:
self._listener.close()
self._listener = None
self._address = None
self._thread = None
def _start(self):
from .connection import Listener
assert self._listener is None
util.debug('starting listener and thread for sending handles')
self._listener = Listener(authkey=process.current_process().authkey)
self._address = self._listener.address
t = threading.Thread(target=self._serve)
t.daemon = True
t.start()
self._thread = t
def _serve(self):
if hasattr(signal, 'pthread_sigmask'):
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
while 1:
try:
with self._listener.accept() as conn:
msg = conn.recv()
if msg is None:
break
key, destination_pid = msg
send, close = self._cache.pop(key)
try:
send(conn, destination_pid)
finally:
close()
except:
if not util.is_exiting():
sys.excepthook(*sys.exc_info())
_resource_sharer = _ResourceSharer()
stop = _resource_sharer.stop
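# Example (illustrative, Unix only): the register/connect protocol from
# the header comment above, exercised through DupFd. This pokes at module
# internals, so treat it as a demonstration rather than a supported API.
import os
from multiprocessing.resource_sharer import DupFd

_r, _w = os.pipe()
_dup = DupFd(_r)             # register the fd with the background server
_fd = _dup.detach()          # connect back and receive a duplicate
os.write(_w, b'hi')
print(os.read(_fd, 2))       # b'hi'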
#
# On Unix we run a server process which keeps track of unlinked
# semaphores. The server ignores SIGINT and SIGTERM and reads from a
# pipe. Every other process of the program has a copy of the writable
# end of the pipe, so we get EOF when all other processes have exited.
# Then the server process unlinks any remaining semaphore names.
#
# This is important because the system only supports a limited number
# of named semaphores, and they will not be automatically removed till
# the next reboot. Without this semaphore tracker process, "killall
# python" would probably leave semaphores that never get unlinked.
#
import os
import signal
import sys
import threading
import warnings
import _multiprocessing
from . import spawn
from . import util
__all__ = ['ensure_running', 'register', 'unregister']
class SemaphoreTracker(object):
def __init__(self):
self._lock = threading.Lock()
self._fd = None
def getfd(self):
self.ensure_running()
return self._fd
def ensure_running(self):
'''Make sure that semaphore tracker process is running.
This can be run from any process. Usually a child process will use
the semaphore created by its parent.'''
with self._lock:
if self._fd is not None:
return
fds_to_pass = []
try:
fds_to_pass.append(sys.stderr.fileno())
except Exception:
pass
cmd = 'from multiprocessing.semaphore_tracker import main;main(%d)'
r, w = os.pipe()
try:
fds_to_pass.append(r)
# process will outlive us, so no need to wait on pid
exe = spawn.get_executable()
args = [exe] + util._args_from_interpreter_flags()
args += ['-c', cmd % r]
util.spawnv_passfds(exe, args, fds_to_pass)
except:
os.close(w)
raise
else:
self._fd = w
finally:
os.close(r)
def register(self, name):
'''Register name of semaphore with semaphore tracker.'''
self._send('REGISTER', name)
def unregister(self, name):
'''Unregister name of semaphore with semaphore tracker.'''
self._send('UNREGISTER', name)
def _send(self, cmd, name):
self.ensure_running()
msg = '{0}:{1}\n'.format(cmd, name).encode('ascii')
if len(name) > 512:
# posix guarantees that writes to a pipe of less than PIPE_BUF
# bytes are atomic, and that PIPE_BUF >= 512
raise ValueError('name too long')
nbytes = os.write(self._fd, msg)
assert nbytes == len(msg)
_semaphore_tracker = SemaphoreTracker()
ensure_running = _semaphore_tracker.ensure_running
register = _semaphore_tracker.register
unregister = _semaphore_tracker.unregister
getfd = _semaphore_tracker.getfd
def main(fd):
'''Run semaphore tracker.'''
# protect the process from ^C and "killall python" etc
signal.signal(signal.SIGINT, signal.SIG_IGN)
signal.signal(signal.SIGTERM, signal.SIG_IGN)
for f in (sys.stdin, sys.stdout):
try:
f.close()
except Exception:
pass
cache = set()
try:
# keep track of registered/unregistered semaphores
with open(fd, 'rb') as f:
for line in f:
try:
cmd, name = line.strip().split(b':')
if cmd == b'REGISTER':
cache.add(name)
elif cmd == b'UNREGISTER':
cache.remove(name)
else:
raise RuntimeError('unrecognized command %r' % cmd)
except Exception:
try:
sys.excepthook(*sys.exc_info())
except:
pass
finally:
# all processes have terminated; cleanup any remaining semaphores
if cache:
try:
warnings.warn('semaphore_tracker: There appear to be %d '
'leaked semaphores to clean up at shutdown' %
len(cache))
except Exception:
pass
for name in cache:
# For some reason the process which created and registered this
# semaphore has failed to unregister it. Presumably it has died.
# We therefore unlink it.
try:
name = name.decode('ascii')
try:
_multiprocessing.sem_unlink(name)
except Exception as e:
warnings.warn('semaphore_tracker: %r: %s' % (name, e))
finally:
pass
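# Example (illustrative, Unix only): the pipe/EOF trick from this module's
# header comment, in miniature. The watcher's read() returns EOF only once
# every copy of the write end has been closed, i.e. once all the other
# processes holding it have exited.
import os

_r, _w = os.pipe()
_pid = os.fork()
if _pid == 0:               # pretend this is the tracker process
    os.close(_w)            # keep only the read end
    os.read(_r, 1)          # blocks until all write ends are closed
    os._exit(0)             # ... then clean up and exit
os.close(_r)
os.close(_w)                # closing the last write end releases the child
os.waitpid(_pid, 0)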
#
# Module which supports allocation of ctypes objects from shared memory
#
# multiprocessing/sharedctypes.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
import ctypes
import weakref
from . import heap
from . import get_context
from .context import assert_spawning
from .reduction import ForkingPickler
__all__ = ['RawValue', 'RawArray', 'Value', 'Array', 'copy', 'synchronized']
#
#
#
typecode_to_type = {
'c': ctypes.c_char, 'u': ctypes.c_wchar,
'b': ctypes.c_byte, 'B': ctypes.c_ubyte,
'h': ctypes.c_short, 'H': ctypes.c_ushort,
'i': ctypes.c_int, 'I': ctypes.c_uint,
'l': ctypes.c_long, 'L': ctypes.c_ulong,
'f': ctypes.c_float, 'd': ctypes.c_double
}
#
#
#
def _new_value(type_):
size = ctypes.sizeof(type_)
wrapper = heap.BufferWrapper(size)
return rebuild_ctype(type_, wrapper, None)
def RawValue(typecode_or_type, *args):
'''
Returns a ctypes object allocated from shared memory
'''
type_ = typecode_to_type.get(typecode_or_type, typecode_or_type)
obj = _new_value(type_)
ctypes.memset(ctypes.addressof(obj), 0, ctypes.sizeof(obj))
obj.__init__(*args)
return obj
def RawArray(typecode_or_type, size_or_initializer):
'''
Returns a ctypes array allocated from shared memory
'''
type_ = typecode_to_type.get(typecode_or_type, typecode_or_type)
if isinstance(size_or_initializer, int):
type_ = type_ * size_or_initializer
obj = _new_value(type_)
ctypes.memset(ctypes.addressof(obj), 0, ctypes.sizeof(obj))
return obj
else:
type_ = type_ * len(size_or_initializer)
result = _new_value(type_)
result.__init__(*size_or_initializer)
return result
def Value(typecode_or_type, *args, lock=True, ctx=None):
'''
Return a synchronization wrapper for a Value
'''
obj = RawValue(typecode_or_type, *args)
if lock is False:
return obj
if lock in (True, None):
ctx = ctx or get_context()
lock = ctx.RLock()
if not hasattr(lock, 'acquire'):
raise AttributeError("'%r' has no method 'acquire'" % lock)
return synchronized(obj, lock, ctx=ctx)
def Array(typecode_or_type, size_or_initializer, *, lock=True, ctx=None):
'''
Return a synchronization wrapper for a RawArray
'''
obj = RawArray(typecode_or_type, size_or_initializer)
if lock is False:
return obj
if lock in (True, None):
ctx = ctx or get_context()
lock = ctx.RLock()
if not hasattr(lock, 'acquire'):
raise AttributeError("'%r' has no method 'acquire'" % lock)
return synchronized(obj, lock, ctx=ctx)
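# Example (illustrative): the synchronized wrappers returned by Value()
# and Array() above, used through the public multiprocessing API.
import multiprocessing as mp
from multiprocessing.sharedctypes import Value, Array

def _bump(counter, arr):
    with counter.get_lock():
        counter.value += 1
    arr[0] = 99

if __name__ == '__main__':
    _n = Value('i', 0)
    _a = Array('i', [1, 2, 3])
    _p = mp.Process(target=_bump, args=(_n, _a))
    _p.start()
    _p.join()
    print(_n.value, _a[:])   # 1 [99, 2, 3]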
def copy(obj):
new_obj = _new_value(type(obj))
ctypes.pointer(new_obj)[0] = obj
return new_obj
def synchronized(obj, lock=None, ctx=None):
assert not isinstance(obj, SynchronizedBase), 'object already synchronized'
ctx = ctx or get_context()
if isinstance(obj, ctypes._SimpleCData):
return Synchronized(obj, lock, ctx)
elif isinstance(obj, ctypes.Array):
if obj._type_ is ctypes.c_char:
return SynchronizedString(obj, lock, ctx)
return SynchronizedArray(obj, lock, ctx)
else:
cls = type(obj)
try:
scls = class_cache[cls]
except KeyError:
names = [field[0] for field in cls._fields_]
d = dict((name, make_property(name)) for name in names)
classname = 'Synchronized' + cls.__name__
scls = class_cache[cls] = type(classname, (SynchronizedBase,), d)
return scls(obj, lock, ctx)
#
# Functions for pickling/unpickling
#
def reduce_ctype(obj):
assert_spawning(obj)
if isinstance(obj, ctypes.Array):
return rebuild_ctype, (obj._type_, obj._wrapper, obj._length_)
else:
return rebuild_ctype, (type(obj), obj._wrapper, None)
def rebuild_ctype(type_, wrapper, length):
if length is not None:
type_ = type_ * length
ForkingPickler.register(type_, reduce_ctype)
buf = wrapper.create_memoryview()
obj = type_.from_buffer(buf)
obj._wrapper = wrapper
return obj
#
# Function to create properties
#
def make_property(name):
try:
return prop_cache[name]
except KeyError:
d = {}
exec(template % ((name,)*7), d)
prop_cache[name] = d[name]
return d[name]
template = '''
def get%s(self):
self.acquire()
try:
return self._obj.%s
finally:
self.release()
def set%s(self, value):
self.acquire()
try:
self._obj.%s = value
finally:
self.release()
%s = property(get%s, set%s)
'''
prop_cache = {}
class_cache = weakref.WeakKeyDictionary()
#
# Synchronized wrappers
#
class SynchronizedBase(object):
def __init__(self, obj, lock=None, ctx=None):
self._obj = obj
if lock:
self._lock = lock
else:
ctx = ctx or get_context(force=True)
self._lock = ctx.RLock()
self.acquire = self._lock.acquire
self.release = self._lock.release
def __reduce__(self):
assert_spawning(self)
return synchronized, (self._obj, self._lock)
def get_obj(self):
return self._obj
def get_lock(self):
return self._lock
def __repr__(self):
return '<%s wrapper for %s>' % (type(self).__name__, self._obj)
class Synchronized(SynchronizedBase):
value = make_property('value')
class SynchronizedArray(SynchronizedBase):
def __len__(self):
return len(self._obj)
def __getitem__(self, i):
self.acquire()
try:
return self._obj[i]
finally:
self.release()
def __setitem__(self, i, value):
self.acquire()
try:
self._obj[i] = value
finally:
self.release()
def __getslice__(self, start, stop):
self.acquire()
try:
return self._obj[start:stop]
finally:
self.release()
def __setslice__(self, start, stop, values):
self.acquire()
try:
self._obj[start:stop] = values
finally:
self.release()
class SynchronizedString(SynchronizedArray):
value = make_property('value')
raw = make_property('raw')
#
# Code used to start processes when using the spawn or forkserver
# start methods.
#
# multiprocessing/spawn.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
import os
import pickle
import sys
import runpy
import types
from . import get_start_method, set_start_method
from . import process
from . import util
__all__ = ['_main', 'freeze_support', 'set_executable', 'get_executable',
'get_preparation_data', 'get_command_line', 'import_main_path']
#
# _python_exe is the assumed path to the python executable.
# People embedding Python want to modify it.
#
if sys.platform != 'win32':
WINEXE = False
WINSERVICE = False
else:
WINEXE = (sys.platform == 'win32' and getattr(sys, 'frozen', False))
WINSERVICE = sys.executable.lower().endswith("pythonservice.exe")
if WINSERVICE:
_python_exe = os.path.join(sys.exec_prefix, 'python.exe')
else:
_python_exe = sys.executable
def set_executable(exe):
global _python_exe
_python_exe = exe
def get_executable():
return _python_exe
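# Example (illustrative): the embedding hook described above -- point
# spawned children at a specific interpreter. The path is hypothetical.
from multiprocessing import spawn as _spawn

_spawn.set_executable('/opt/myapp/bin/python3')
assert _spawn.get_executable() == '/opt/myapp/bin/python3'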
#
#
#
def is_forking(argv):
'''
Return whether the command line indicates we are forking
'''
if len(argv) >= 2 and argv[1] == '--multiprocessing-fork':
return True
else:
return False
def freeze_support():
'''
Run code for process object if this is not the main process
'''
if is_forking(sys.argv):
kwds = {}
for arg in sys.argv[2:]:
name, value = arg.split('=')
if value == 'None':
kwds[name] = None
else:
kwds[name] = int(value)
spawn_main(**kwds)
sys.exit()
def get_command_line(**kwds):
'''
Returns prefix of command line used for spawning a child process
'''
if getattr(sys, 'frozen', False):
return ([sys.executable, '--multiprocessing-fork'] +
['%s=%r' % item for item in kwds.items()])
else:
prog = 'from multiprocessing.spawn import spawn_main; spawn_main(%s)'
prog %= ', '.join('%s=%r' % item for item in kwds.items())
opts = util._args_from_interpreter_flags()
return [_python_exe] + opts + ['-c', prog, '--multiprocessing-fork']
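# Example (illustrative): what get_command_line() above produces on a
# normal (non-frozen) interpreter; the exact executable path and flags
# depend on the running Python.
from multiprocessing import spawn as _spawn

print(_spawn.get_command_line(pipe_handle=7))
# e.g. ['/usr/bin/python3', '-c',
#       'from multiprocessing.spawn import spawn_main; spawn_main(pipe_handle=7)',
#       '--multiprocessing-fork']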
def spawn_main(pipe_handle, parent_pid=None, tracker_fd=None):
'''
Run code specified by data received over pipe
'''
assert is_forking(sys.argv)
if sys.platform == 'win32':
import msvcrt
from .reduction import steal_handle
new_handle = steal_handle(parent_pid, pipe_handle)
fd = msvcrt.open_osfhandle(new_handle, os.O_RDONLY)
else:
from . import semaphore_tracker
semaphore_tracker._semaphore_tracker._fd = tracker_fd
fd = pipe_handle
exitcode = _main(fd)
sys.exit(exitcode)
def _main(fd):
with os.fdopen(fd, 'rb', closefd=True) as from_parent:
process.current_process()._inheriting = True
try:
preparation_data = pickle.load(from_parent)
prepare(preparation_data)
self = pickle.load(from_parent)
finally:
del process.current_process()._inheriting
return self._bootstrap()
def _check_not_importing_main():
if getattr(process.current_process(), '_inheriting', False):
raise RuntimeError('''
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.''')
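# Example (illustrative): the idiom the error message above asks for --
# the canonical main-module guard for the spawn/forkserver start methods.
import multiprocessing as mp

def _work(x):
    return x * x

if __name__ == '__main__':
    mp.freeze_support()          # harmless when the program is not frozen
    with mp.Pool(2) as _pool:
        print(_pool.map(_work, range(4)))   # [0, 1, 4, 9]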
def get_preparation_data(name):
'''
Return info about parent needed by child to unpickle process object
'''
_check_not_importing_main()
d = dict(
log_to_stderr=util._log_to_stderr,
authkey=process.current_process().authkey,
)
if util._logger is not None:
d['log_level'] = util._logger.getEffectiveLevel()
sys_path = sys.path.copy()
try:
i = sys_path.index('')
except ValueError:
pass
else:
sys_path[i] = process.ORIGINAL_DIR
d.update(
name=name,
sys_path=sys_path,
sys_argv=sys.argv,
orig_dir=process.ORIGINAL_DIR,
dir=os.getcwd(),
start_method=get_start_method(),
)
# Figure out whether to initialise main in the subprocess as a module
# or through direct execution (or to leave it alone entirely)
main_module = sys.modules['__main__']
main_mod_name = getattr(main_module.__spec__, "name", None)
if main_mod_name is not None:
d['init_main_from_name'] = main_mod_name
elif sys.platform != 'win32' or (not WINEXE and not WINSERVICE):
main_path = getattr(main_module, '__file__', None)
if main_path is not None:
if (not os.path.isabs(main_path) and
process.ORIGINAL_DIR is not None):
main_path = os.path.join(process.ORIGINAL_DIR, main_path)
d['init_main_from_path'] = os.path.normpath(main_path)
return d
#
# Prepare current process
#
old_main_modules = []
def prepare(data):
'''
Try to get current process ready to unpickle process object
'''
if 'name' in data:
process.current_process().name = data['name']
if 'authkey' in data:
process.current_process().authkey = data['authkey']
if 'log_to_stderr' in data and data['log_to_stderr']:
util.log_to_stderr()
if 'log_level' in data:
util.get_logger().setLevel(data['log_level'])
if 'sys_path' in data:
sys.path = data['sys_path']
if 'sys_argv' in data:
sys.argv = data['sys_argv']
if 'dir' in data:
os.chdir(data['dir'])
if 'orig_dir' in data:
process.ORIGINAL_DIR = data['orig_dir']
if 'start_method' in data:
set_start_method(data['start_method'])
if 'init_main_from_name' in data:
_fixup_main_from_name(data['init_main_from_name'])
elif 'init_main_from_path' in data:
_fixup_main_from_path(data['init_main_from_path'])
# Multiprocessing module helpers to fix up the main module in
# spawned subprocesses
def _fixup_main_from_name(mod_name):
# __main__.py files for packages, directories, zip archives, etc, run
# their "main only" code unconditionally, so we don't even try to
# populate anything in __main__, nor do we make any changes to
# __main__ attributes
current_main = sys.modules['__main__']
if mod_name == "__main__" or mod_name.endswith(".__main__"):
return
# If this process was forked, __main__ may already be populated
if getattr(current_main.__spec__, "name", None) == mod_name:
return
# Otherwise, __main__ may contain some non-main code where we need to
# support unpickling it properly. We rerun it as __mp_main__ and make
# the normal __main__ an alias to that
old_main_modules.append(current_main)
main_module = types.ModuleType("__mp_main__")
main_content = runpy.run_module(mod_name,
run_name="__mp_main__",
alter_sys=True)
main_module.__dict__.update(main_content)
sys.modules['__main__'] = sys.modules['__mp_main__'] = main_module
def _fixup_main_from_path(main_path):
# If this process was forked, __main__ may already be populated
current_main = sys.modules['__main__']
# Unfortunately, the main ipython launch script historically had no
# "if __name__ == '__main__'" guard, so we work around that
# by treating it like a __main__.py file
# See https://github.com/ipython/ipython/issues/4698
main_name = os.path.splitext(os.path.basename(main_path))[0]
if main_name == 'ipython':
return
# Otherwise, if __file__ already has the setting we expect,
# there's nothing more to do
if getattr(current_main, '__file__', None) == main_path:
return
# If the parent process has sent a path through rather than a module
# name we assume it is an executable script that may contain
# non-main code that needs to be executed
old_main_modules.append(current_main)
main_module = types.ModuleType("__mp_main__")
main_content = runpy.run_path(main_path,
run_name="__mp_main__")
main_module.__dict__.update(main_content)
sys.modules['__main__'] = sys.modules['__mp_main__'] = main_module
def import_main_path(main_path):
'''
Set sys.modules['__main__'] to module at main_path
'''
_fixup_main_from_path(main_path)
#
# Module implementing synchronization primitives
#
# multiprocessing/synchronize.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = [
'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Condition', 'Event'
]
import threading
import sys
import tempfile
import _multiprocessing
from time import time as _time
from . import context
from . import process
from . import util
# Try to import the mp.synchronize module cleanly; if that fails,
# raise ImportError for platforms lacking a working sem_open implementation.
# See issue 3770
try:
from _multiprocessing import SemLock, sem_unlink
except ImportError:
raise ImportError("This platform lacks a functioning sem_open" +
" implementation, therefore, the required" +
" synchronization primitives needed will not" +
" function, see issue 3770.")
#
# Constants
#
RECURSIVE_MUTEX, SEMAPHORE = list(range(2))
SEM_VALUE_MAX = _multiprocessing.SemLock.SEM_VALUE_MAX
#
# Base class for semaphores and mutexes; wraps `_multiprocessing.SemLock`
#
class SemLock(object):
_rand = tempfile._RandomNameSequence()
def __init__(self, kind, value, maxvalue, *, ctx):
if ctx is None:
ctx = context._default_context.get_context()
name = ctx.get_start_method()
unlink_now = sys.platform == 'win32' or name == 'fork'
for i in range(100):
try:
sl = self._semlock = _multiprocessing.SemLock(
kind, value, maxvalue, self._make_name(),
unlink_now)
except FileExistsError:
pass
else:
break
else:
raise FileExistsError('cannot find name for semaphore')
util.debug('created semlock with handle %s' % sl.handle)
self._make_methods()
if sys.platform != 'win32':
def _after_fork(obj):
obj._semlock._after_fork()
util.register_after_fork(self, _after_fork)
if self._semlock.name is not None:
# We only get here if we are on Unix with forking
# disabled. When the object is garbage collected or the
# process shuts down we unlink the semaphore name
from .semaphore_tracker import register
register(self._semlock.name)
util.Finalize(self, SemLock._cleanup, (self._semlock.name,),
exitpriority=0)
@staticmethod
def _cleanup(name):
from .semaphore_tracker import unregister
sem_unlink(name)
unregister(name)
def _make_methods(self):
self.acquire = self._semlock.acquire
self.release = self._semlock.release
def __enter__(self):
return self._semlock.__enter__()
def __exit__(self, *args):
return self._semlock.__exit__(*args)
def __getstate__(self):
context.assert_spawning(self)
sl = self._semlock
if sys.platform == 'win32':
h = context.get_spawning_popen().duplicate_for_child(sl.handle)
else:
h = sl.handle
return (h, sl.kind, sl.maxvalue, sl.name)
def __setstate__(self, state):
self._semlock = _multiprocessing.SemLock._rebuild(*state)
util.debug('recreated blocker with handle %r' % state[0])
self._make_methods()
@staticmethod
def _make_name():
return '%s-%s' % (process.current_process()._config['semprefix'],
next(SemLock._rand))
#
# Semaphore
#
class Semaphore(SemLock):
def __init__(self, value=1, *, ctx):
SemLock.__init__(self, SEMAPHORE, value, SEM_VALUE_MAX, ctx=ctx)
def get_value(self):
return self._semlock._get_value()
def __repr__(self):
try:
value = self._semlock._get_value()
except Exception:
value = 'unknown'
return '<Semaphore(value=%s)>' % value
#
# Bounded semaphore
#
class BoundedSemaphore(Semaphore):
def __init__(self, value=1, *, ctx):
SemLock.__init__(self, SEMAPHORE, value, value, ctx=ctx)
def __repr__(self):
try:
value = self._semlock._get_value()
except Exception:
value = 'unknown'
return '<BoundedSemaphore(value=%s, maxvalue=%s)>' % \
(value, self._semlock.maxvalue)
#
# Non-recursive lock
#
class Lock(SemLock):
def __init__(self, *, ctx):
SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
def __repr__(self):
try:
if self._semlock._is_mine():
name = process.current_process().name
if threading.current_thread().name != 'MainThread':
name += '|' + threading.current_thread().name
elif self._semlock._get_value() == 1:
name = 'None'
elif self._semlock._count() > 0:
name = 'SomeOtherThread'
else:
name = 'SomeOtherProcess'
except Exception:
name = 'unknown'
return '<Lock(owner=%s)>' % name
#
# Recursive lock
#
class RLock(SemLock):
def __init__(self, *, ctx):
SemLock.__init__(self, RECURSIVE_MUTEX, 1, 1, ctx=ctx)
def __repr__(self):
try:
if self._semlock._is_mine():
name = process.current_process().name
if threading.current_thread().name != 'MainThread':
name += '|' + threading.current_thread().name
count = self._semlock._count()
elif self._semlock._get_value() == 1:
name, count = 'None', 0
elif self._semlock._count() > 0:
name, count = 'SomeOtherThread', 'nonzero'
else:
name, count = 'SomeOtherProcess', 'nonzero'
except Exception:
name, count = 'unknown', 'unknown'
return '<RLock(%s, %s)>' % (name, count)
#
# Condition variable
#
class Condition(object):
def __init__(self, lock=None, *, ctx):
self._lock = lock or ctx.RLock()
self._sleeping_count = ctx.Semaphore(0)
self._woken_count = ctx.Semaphore(0)
self._wait_semaphore = ctx.Semaphore(0)
self._make_methods()
def __getstate__(self):
context.assert_spawning(self)
return (self._lock, self._sleeping_count,
self._woken_count, self._wait_semaphore)
def __setstate__(self, state):
(self._lock, self._sleeping_count,
self._woken_count, self._wait_semaphore) = state
self._make_methods()
def __enter__(self):
return self._lock.__enter__()
def __exit__(self, *args):
return self._lock.__exit__(*args)
def _make_methods(self):
self.acquire = self._lock.acquire
self.release = self._lock.release
def __repr__(self):
try:
num_waiters = (self._sleeping_count._semlock._get_value() -
self._woken_count._semlock._get_value())
except Exception:
num_waiters = 'unknown'
return '<Condition(%s, %s)>' % (self._lock, num_waiters)
def wait(self, timeout=None):
assert self._lock._semlock._is_mine(), \
'must acquire() condition before using wait()'
# indicate that this thread is going to sleep
self._sleeping_count.release()
# release lock
count = self._lock._semlock._count()
for i in range(count):
self._lock.release()
try:
# wait for notification or timeout
return self._wait_semaphore.acquire(True, timeout)
finally:
# indicate that this thread has woken
self._woken_count.release()
# reacquire lock
for i in range(count):
self._lock.acquire()
def notify(self):
assert self._lock._semlock._is_mine(), 'lock is not owned'
assert not self._wait_semaphore.acquire(False)
# to take account of timeouts since last notify() we subtract
# woken_count from sleeping_count and rezero woken_count
while self._woken_count.acquire(False):
res = self._sleeping_count.acquire(False)
assert res
if self._sleeping_count.acquire(False): # try grabbing a sleeper
self._wait_semaphore.release() # wake up one sleeper
self._woken_count.acquire() # wait for the sleeper to wake
# rezero _wait_semaphore in case a timeout just happened
self._wait_semaphore.acquire(False)
def notify_all(self):
assert self._lock._semlock._is_mine(), 'lock is not owned'
assert not self._wait_semaphore.acquire(False)
# to take account of timeouts since last notify*() we subtract
# woken_count from sleeping_count and rezero woken_count
while self._woken_count.acquire(False):
res = self._sleeping_count.acquire(False)
assert res
sleepers = 0
while self._sleeping_count.acquire(False):
self._wait_semaphore.release() # wake up one sleeper
sleepers += 1
if sleepers:
for i in range(sleepers):
self._woken_count.acquire() # wait for a sleeper to wake
# rezero wait_semaphore in case some timeouts just happened
while self._wait_semaphore.acquire(False):
pass
def wait_for(self, predicate, timeout=None):
result = predicate()
if result:
return result
if timeout is not None:
endtime = _time() + timeout
else:
endtime = None
waittime = None
while not result:
if endtime is not None:
waittime = endtime - _time()
if waittime <= 0:
break
self.wait(waittime)
result = predicate()
return result
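# Example (illustrative): the Condition above via the public API; wait_for()
# re-checks the predicate around each wait, so a notification that arrives
# before the waiter enters wait() is not lost.
import multiprocessing as mp

def _producer(cond, flag):
    with cond:
        flag.value = 1
        cond.notify_all()

if __name__ == '__main__':
    _cond = mp.Condition()
    _flag = mp.Value('i', 0)
    _p = mp.Process(target=_producer, args=(_cond, _flag))
    _p.start()
    with _cond:
        _cond.wait_for(lambda: _flag.value == 1, timeout=5)
    _p.join()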
#
# Event
#
class Event(object):
def __init__(self, *, ctx):
self._cond = ctx.Condition(ctx.Lock())
self._flag = ctx.Semaphore(0)
def is_set(self):
self._cond.acquire()
try:
if self._flag.acquire(False):
self._flag.release()
return True
return False
finally:
self._cond.release()
def set(self):
self._cond.acquire()
try:
self._flag.acquire(False)
self._flag.release()
self._cond.notify_all()
finally:
self._cond.release()
def clear(self):
self._cond.acquire()
try:
self._flag.acquire(False)
finally:
self._cond.release()
def wait(self, timeout=None):
self._cond.acquire()
try:
if self._flag.acquire(False):
self._flag.release()
else:
self._cond.wait(timeout)
if self._flag.acquire(False):
self._flag.release()
return True
return False
finally:
self._cond.release()
#
# Barrier
#
class Barrier(threading.Barrier):
def __init__(self, parties, action=None, timeout=None, *, ctx):
import struct
from .heap import BufferWrapper
wrapper = BufferWrapper(struct.calcsize('i') * 2)
cond = ctx.Condition()
self.__setstate__((parties, action, timeout, cond, wrapper))
self._state = 0
self._count = 0
def __setstate__(self, state):
(self._parties, self._action, self._timeout,
self._cond, self._wrapper) = state
self._array = self._wrapper.create_memoryview().cast('i')
def __getstate__(self):
return (self._parties, self._action, self._timeout,
self._cond, self._wrapper)
@property
def _state(self):
return self._array[0]
@_state.setter
def _state(self, value):
self._array[0] = value
@property
def _count(self):
return self._array[1]
@_count.setter
def _count(self, value):
self._array[1] = value
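#
# Example (editor's sketch, not part of the original module): the Barrier
# above, via the public API; the helper names are illustrative assumptions.
#
def _barrier_party(b):
    b.wait()                      # blocks until all parties have arrived

def _barrier_example():
    import multiprocessing
    b = multiprocessing.Barrier(2)    # two parties: the parent and one child
    p = multiprocessing.Process(target=_barrier_party, args=(b,))
    p.start()
    b.wait()                      # rendezvous with the child here
    p.join()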
#
# Module providing various facilities to other parts of the package
#
# multiprocessing/util.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
import os
import itertools
import weakref
import atexit
import threading # we want threading to install its
# cleanup function before multiprocessing does
from subprocess import _args_from_interpreter_flags
from . import process
__all__ = [
'sub_debug', 'debug', 'info', 'sub_warning', 'get_logger',
'log_to_stderr', 'get_temp_dir', 'register_after_fork',
'is_exiting', 'Finalize', 'ForkAwareThreadLock', 'ForkAwareLocal',
'close_all_fds_except', 'SUBDEBUG', 'SUBWARNING',
]
#
# Logging
#
NOTSET = 0
SUBDEBUG = 5
DEBUG = 10
INFO = 20
SUBWARNING = 25
LOGGER_NAME = 'multiprocessing'
DEFAULT_LOGGING_FORMAT = '[%(levelname)s/%(processName)s] %(message)s'
_logger = None
_log_to_stderr = False
def sub_debug(msg, *args):
if _logger:
_logger.log(SUBDEBUG, msg, *args)
def debug(msg, *args):
if _logger:
_logger.log(DEBUG, msg, *args)
def info(msg, *args):
if _logger:
_logger.log(INFO, msg, *args)
def sub_warning(msg, *args):
if _logger:
_logger.log(SUBWARNING, msg, *args)
def get_logger():
'''
Returns logger used by multiprocessing
'''
global _logger
import logging
logging._acquireLock()
try:
if not _logger:
_logger = logging.getLogger(LOGGER_NAME)
_logger.propagate = 0
# XXX multiprocessing should cleanup before logging
if hasattr(atexit, 'unregister'):
atexit.unregister(_exit_function)
atexit.register(_exit_function)
else:
atexit._exithandlers.remove((_exit_function, (), {}))
atexit._exithandlers.append((_exit_function, (), {}))
finally:
logging._releaseLock()
return _logger
def log_to_stderr(level=None):
'''
Turn on logging and add a handler which prints to stderr
'''
global _log_to_stderr
import logging
logger = get_logger()
formatter = logging.Formatter(DEFAULT_LOGGING_FORMAT)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
if level:
logger.setLevel(level)
_log_to_stderr = True
return _logger
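#
# Example (editor's sketch): turning on the package logger defined above.
# The message text is illustrative.
#
def _logging_example():
    import logging
    import multiprocessing
    logger = multiprocessing.log_to_stderr(logging.INFO)
    logger.info('hello')          # printed to stderr as "[INFO/MainProcess] hello"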
#
# Function returning a temp directory which will be removed on exit
#
def get_temp_dir():
# get name of a temp directory which will be automatically cleaned up
tempdir = process.current_process()._config.get('tempdir')
if tempdir is None:
import shutil, tempfile
tempdir = tempfile.mkdtemp(prefix='pymp-')
info('created temp directory %s', tempdir)
Finalize(None, shutil.rmtree, args=[tempdir], exitpriority=-100)
process.current_process()._config['tempdir'] = tempdir
return tempdir
#
# Support for reinitialization of objects when bootstrapping a child process
#
_afterfork_registry = weakref.WeakValueDictionary()
_afterfork_counter = itertools.count()
def _run_after_forkers():
items = list(_afterfork_registry.items())
items.sort()
for (index, ident, func), obj in items:
try:
func(obj)
except Exception as e:
info('after forker raised exception %s', e)
def register_after_fork(obj, func):
_afterfork_registry[(next(_afterfork_counter), id(obj), func)] = obj
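#
# Example (editor's sketch): register_after_fork as defined above. The Cache
# class is an illustrative assumption; on fork-based starts the registered
# callback runs in the child with the registered object as its argument.
#
def _after_fork_example():
    from multiprocessing import util
    class Cache(object):
        def __init__(self):
            self.data = {}
            util.register_after_fork(self, Cache._clear)
        def _clear(self):
            self.data.clear()     # forked children start with an empty cache
    return Cache()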
#
# Finalization using weakrefs
#
_finalizer_registry = {}
_finalizer_counter = itertools.count()
class Finalize(object):
'''
Class which supports object finalization using weakrefs
'''
def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
assert exitpriority is None or type(exitpriority) is int
if obj is not None:
self._weakref = weakref.ref(obj, self)
else:
assert exitpriority is not None
self._callback = callback
self._args = args
self._kwargs = kwargs or {}
self._key = (exitpriority, next(_finalizer_counter))
self._pid = os.getpid()
_finalizer_registry[self._key] = self
def __call__(self, wr=None,
# Need to bind these locally because the globals can have
# been cleared at shutdown
_finalizer_registry=_finalizer_registry,
sub_debug=sub_debug, getpid=os.getpid):
'''
Run the callback unless it has already been called or cancelled
'''
try:
del _finalizer_registry[self._key]
except KeyError:
sub_debug('finalizer no longer registered')
else:
if self._pid != getpid():
sub_debug('finalizer ignored because different process')
res = None
else:
sub_debug('finalizer calling %s with args %s and kwargs %s',
self._callback, self._args, self._kwargs)
res = self._callback(*self._args, **self._kwargs)
self._weakref = self._callback = self._args = \
self._kwargs = self._key = None
return res
def cancel(self):
'''
Cancel finalization of the object
'''
try:
del _finalizer_registry[self._key]
except KeyError:
pass
else:
self._weakref = self._callback = self._args = \
self._kwargs = self._key = None
def still_active(self):
'''
Return whether this finalizer is still waiting to invoke callback
'''
return self._key in _finalizer_registry
def __repr__(self):
try:
obj = self._weakref()
except (AttributeError, TypeError):
obj = None
if obj is None:
return '<Finalize object, dead>'
x = '<Finalize object, callback=%s' % \
getattr(self._callback, '__name__', self._callback)
if self._args:
x += ', args=' + str(self._args)
if self._kwargs:
x += ', kwargs=' + str(self._kwargs)
if self._key[0] is not None:
x += ', exitpriority=' + str(self._key[0])
return x + '>'
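#
# Example (editor's sketch): Finalize as used by get_temp_dir() above; the
# Holder class is an illustrative assumption.
#
def _finalize_example():
    import shutil, tempfile
    class Holder(object):
        pass
    h = Holder()
    tmp = tempfile.mkdtemp()
    # remove the directory when h is garbage collected, or at exit at the latest
    Finalize(h, shutil.rmtree, args=[tmp], exitpriority=0)
    return h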
def _run_finalizers(minpriority=None):
'''
Run all finalizers whose exit priority is not None and at least minpriority
Finalizers with highest priority are called first; finalizers with
the same priority will be called in reverse order of creation.
'''
if _finalizer_registry is None:
# This function may be called after this module's globals are
# destroyed. See the _exit_function function in this module for more
# notes.
return
if minpriority is None:
f = lambda p : p[0][0] is not None
else:
f = lambda p : p[0][0] is not None and p[0][0] >= minpriority
items = [x for x in list(_finalizer_registry.items()) if f(x)]
items.sort(reverse=True)
for key, finalizer in items:
sub_debug('calling %s', finalizer)
try:
finalizer()
except Exception:
import traceback
traceback.print_exc()
if minpriority is None:
_finalizer_registry.clear()
#
# Clean up on exit
#
def is_exiting():
'''
Returns true if the process is shutting down
'''
return _exiting or _exiting is None
_exiting = False
def _exit_function(info=info, debug=debug, _run_finalizers=_run_finalizers,
active_children=process.active_children,
current_process=process.current_process):
# We hold on to references to functions in the arglist due to the
# situation described below, where this function is called after this
# module's globals are destroyed.
global _exiting
if not _exiting:
_exiting = True
info('process shutting down')
debug('running all "atexit" finalizers with priority >= 0')
_run_finalizers(0)
if current_process() is not None:
# We check if the current process is None here because if
# it's None, any call to ``active_children()`` will raise
# an AttributeError (active_children winds up trying to
# get attributes from util._current_process). One
# situation where this can happen is if someone has
# manipulated sys.modules, causing this module to be
# garbage collected. The destructor for the module type
# then replaces all values in the module dict with None.
# For instance, after setuptools runs a test it replaces
# sys.modules with a copy created earlier. See issues
# #9775 and #15881. Also related: #4106, #9205, and
# #9207.
for p in active_children():
if p.daemon:
info('calling terminate() for daemon %s', p.name)
p._popen.terminate()
for p in active_children():
info('calling join() for process %s', p.name)
p.join()
debug('running the remaining "atexit" finalizers')
_run_finalizers()
atexit.register(_exit_function)
#
# Some fork aware types
#
class ForkAwareThreadLock(object):
def __init__(self):
self._reset()
register_after_fork(self, ForkAwareThreadLock._reset)
def _reset(self):
self._lock = threading.Lock()
self.acquire = self._lock.acquire
self.release = self._lock.release
class ForkAwareLocal(threading.local):
def __init__(self):
register_after_fork(self, lambda obj : obj.__dict__.clear())
def __reduce__(self):
return type(self), ()
#
# Close fds except those specified
#
try:
MAXFD = os.sysconf("SC_OPEN_MAX")
except Exception:
MAXFD = 256
def close_all_fds_except(fds):
fds = list(fds) + [-1, MAXFD]
fds.sort()
assert fds[-1] == MAXFD, 'fd too large'
for i in range(len(fds) - 1):
os.closerange(fds[i]+1, fds[i+1])
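#
# Example (editor's sketch): the fencepost logic above, shown as a pure
# computation. _closerange_spans is an illustrative helper, not part of this
# module; os.closerange(low, high) closes the half-open range [low, high).
#
def _closerange_spans(fds, maxfd=MAXFD):
    fds = sorted(list(fds) + [-1, maxfd])
    return [(fds[i] + 1, fds[i + 1]) for i in range(len(fds) - 1)]
# _closerange_spans([0, 1, 2], maxfd=256) == [(0, 0), (1, 1), (2, 2), (3, 256)],
# i.e. every descriptor except 0, 1 and 2 gets closed.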
#
# Start a program with only specified fds kept open
#
def spawnv_passfds(path, args, passfds):
import _posixsubprocess
passfds = sorted(passfds)
errpipe_read, errpipe_write = os.pipe()
try:
return _posixsubprocess.fork_exec(
args, [os.fsencode(path)], True, passfds, None, None,
-1, -1, -1, -1, -1, -1, errpipe_read, errpipe_write,
False, False, None)
finally:
os.close(errpipe_read)
os.close(errpipe_write)
#
# Package analogous to 'threading.py' but using processes
#
# multiprocessing/__init__.py
#
# This package is intended to duplicate the functionality (and much of
# the API) of threading.py but uses processes instead of threads. A
# subpackage 'multiprocessing.dummy' has the same API but is a simple
# wrapper for 'threading'.
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
import sys
from . import context
#
# Copy stuff from default context
#
globals().update((name, getattr(context._default_context, name))
for name in context._default_context.__all__)
__all__ = context._default_context.__all__
#
# XXX These should not really be documented or public.
#
SUBDEBUG = 5
SUBWARNING = 25
#
# Alias for main module -- will be reset by bootstrapping child processes
#
if '__main__' in sys.modules:
sys.modules['__mp_main__'] = sys.modules['__main__']
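#
# Example (editor's sketch): one observable effect of the globals().update()
# above is that the usual top-level names resolve to bound methods of the
# default context.
#
def _context_example():
    import multiprocessing
    from multiprocessing import context
    assert multiprocessing.Lock == context._default_context.Lock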
#
# Analogue of `multiprocessing.connection` which uses queues instead of sockets
#
# multiprocessing/dummy/connection.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = [ 'Client', 'Listener', 'Pipe' ]
from queue import Queue
families = [None]
class Listener(object):
def __init__(self, address=None, family=None, backlog=1):
self._backlog_queue = Queue(backlog)
def accept(self):
return Connection(*self._backlog_queue.get())
def close(self):
self._backlog_queue = None
address = property(lambda self: self._backlog_queue)
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, exc_tb):
self.close()
def Client(address):
_in, _out = Queue(), Queue()
address.put((_out, _in))
return Connection(_in, _out)
def Pipe(duplex=True):
a, b = Queue(), Queue()
return Connection(a, b), Connection(b, a)
class Connection(object):
def __init__(self, _in, _out):
self._out = _out
self._in = _in
self.send = self.send_bytes = _out.put
self.recv = self.recv_bytes = _in.get
def poll(self, timeout=0.0):
if self._in.qsize() > 0:
return True
if timeout <= 0.0:
return False
self._in.not_empty.acquire()
self._in.not_empty.wait(timeout)
self._in.not_empty.release()
return self._in.qsize() > 0
def close(self):
pass
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, exc_tb):
self.close()
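#
# Example (editor's sketch): the queue-backed Pipe defined above; the payload
# string is illustrative.
#
def _dummy_pipe_example():
    from multiprocessing.dummy.connection import Pipe
    a, b = Pipe()
    a.send('ping')                # put on the peer's input queue
    assert b.poll(1.0)            # data is already waiting
    assert b.recv() == 'ping'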
#
# Support for the API of the multiprocessing package using threads
#
# multiprocessing/dummy/__init__.py
#
# Copyright (c) 2006-2008, R Oudkerk
# Licensed to PSF under a Contributor Agreement.
#
__all__ = [
'Process', 'current_process', 'active_children', 'freeze_support',
'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Condition',
'Event', 'Barrier', 'Queue', 'Manager', 'Pipe', 'Pool', 'JoinableQueue'
]
#
# Imports
#
import threading
import sys
import weakref
import array
from .connection import Pipe
from threading import Lock, RLock, Semaphore, BoundedSemaphore
from threading import Event, Condition, Barrier
from queue import Queue
#
#
#
class DummyProcess(threading.Thread):
def __init__(self, group=None, target=None, name=None, args=(), kwargs={}):
threading.Thread.__init__(self, group, target, name, args, kwargs)
self._pid = None
self._children = weakref.WeakKeyDictionary()
self._start_called = False
self._parent = current_process()
def start(self):
assert self._parent is current_process()
self._start_called = True
if hasattr(self._parent, '_children'):
self._parent._children[self] = None
threading.Thread.start(self)
@property
def exitcode(self):
if self._start_called and not self.is_alive():
return 0
else:
return None
#
#
#
Process = DummyProcess
current_process = threading.current_thread
current_process()._children = weakref.WeakKeyDictionary()
def active_children():
children = current_process()._children
for p in list(children):
if not p.is_alive():
children.pop(p, None)
return list(children)
def freeze_support():
pass
#
#
#
class Namespace(object):
def __init__(self, **kwds):
self.__dict__.update(kwds)
def __repr__(self):
items = list(self.__dict__.items())
temp = []
for name, value in items:
if not name.startswith('_'):
temp.append('%s=%r' % (name, value))
temp.sort()
return 'Namespace(%s)' % str.join(', ', temp)
dict = dict
list = list
def Array(typecode, sequence, lock=True):
return array.array(typecode, sequence)
class Value(object):
def __init__(self, typecode, value, lock=True):
self._typecode = typecode
self._value = value
def _get(self):
return self._value
def _set(self, value):
self._value = value
value = property(_get, _set)
def __repr__(self):
return '<%s(%r, %r)>'%(type(self).__name__,self._typecode,self._value)
def Manager():
return sys.modules[__name__]
def shutdown():
pass
def Pool(processes=None, initializer=None, initargs=()):
from ..pool import ThreadPool
return ThreadPool(processes, initializer, initargs)
JoinableQueue = Queue
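#
# Example (editor's sketch): Pool above returns a ThreadPool, so the
# multiprocessing.dummy API runs thread-based maps; the lambda is illustrative.
#
def _dummy_pool_example():
    import multiprocessing.dummy
    with multiprocessing.dummy.Pool(4) as pool:
        return pool.map(lambda x: x * x, range(10))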
# -*- coding: utf-8 -*-
# Autogenerated by Sphinx on Mon Mar 18 09:21:03 2019
topics = {'assert': '\n'
'The "assert" statement\n'
'**********************\n'
'\n'
'Assert statements are a convenient way to insert debugging '
'assertions\n'
'into a program:\n'
'\n'
' assert_stmt ::= "assert" expression ["," expression]\n'
'\n'
'The simple form, "assert expression", is equivalent to\n'
'\n'
' if __debug__:\n'
' if not expression: raise AssertionError\n'
'\n'
'The extended form, "assert expression1, expression2", is '
'equivalent to\n'
'\n'
' if __debug__:\n'
' if not expression1: raise AssertionError(expression2)\n'
'\n'
'These equivalences assume that "__debug__" and "AssertionError" '
'refer\n'
'to the built-in variables with those names. In the current\n'
'implementation, the built-in variable "__debug__" is "True" under\n'
'normal circumstances, "False" when optimization is requested '
'(command\n'
'line option -O). The current code generator emits no code for an\n'
'assert statement when optimization is requested at compile time. '
'Note\n'
'that it is unnecessary to include the source code for the '
'expression\n'
'that failed in the error message; it will be displayed as part of '
'the\n'
'stack trace.\n'
'\n'
'Assignments to "__debug__" are illegal. The value for the '
'built-in\n'
'variable is determined when the interpreter starts.\n',
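# Editor's note (sketch, illustrative values): the equivalence described in
# the entry above, e.g.
#     assert x > 0, 'x must be positive'
# behaves like:
#     if __debug__:
#         if not x > 0: raise AssertionError('x must be positive')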
'assignment': '\n'
'Assignment statements\n'
'*********************\n'
'\n'
'Assignment statements are used to (re)bind names to values and '
'to\n'
'modify attributes or items of mutable objects:\n'
'\n'
' assignment_stmt ::= (target_list "=")+ (expression_list | '
'yield_expression)\n'
' target_list ::= target ("," target)* [","]\n'
' target ::= identifier\n'
' | "(" target_list ")"\n'
' | "[" target_list "]"\n'
' | attributeref\n'
' | subscription\n'
' | slicing\n'
' | "*" target\n'
'\n'
'(See section *Primaries* for the syntax definitions for\n'
'*attributeref*, *subscription*, and *slicing*.)\n'
'\n'
'An assignment statement evaluates the expression list '
'(remember that\n'
'this can be a single expression or a comma-separated list, the '
'latter\n'
'yielding a tuple) and assigns the single resulting object to '
'each of\n'
'the target lists, from left to right.\n'
'\n'
'Assignment is defined recursively depending on the form of the '
'target\n'
'(list). When a target is part of a mutable object (an '
'attribute\n'
'reference, subscription or slicing), the mutable object must\n'
'ultimately perform the assignment and decide about its '
'validity, and\n'
'may raise an exception if the assignment is unacceptable. The '
'rules\n'
'observed by various types and the exceptions raised are given '
'with the\n'
'definition of the object types (see section *The standard '
'type\n'
'hierarchy*).\n'
'\n'
'Assignment of an object to a target list, optionally enclosed '
'in\n'
'parentheses or square brackets, is recursively defined as '
'follows.\n'
'\n'
'* If the target list is a single target: The object is '
'assigned to\n'
' that target.\n'
'\n'
'* If the target list is a comma-separated list of targets: '
'The\n'
' object must be an iterable with the same number of items as '
'there\n'
' are targets in the target list, and the items are assigned, '
'from\n'
' left to right, to the corresponding targets.\n'
'\n'
' * If the target list contains one target prefixed with an\n'
' asterisk, called a "starred" target: The object must be a '
'sequence\n'
' with at least as many items as there are targets in the '
'target\n'
' list, minus one. The first items of the sequence are '
'assigned,\n'
' from left to right, to the targets before the starred '
'target. The\n'
' final items of the sequence are assigned to the targets '
'after the\n'
' starred target. A list of the remaining items in the '
'sequence is\n'
' then assigned to the starred target (the list can be '
'empty).\n'
'\n'
' * Else: The object must be a sequence with the same number '
'of\n'
' items as there are targets in the target list, and the '
'items are\n'
' assigned, from left to right, to the corresponding '
'targets.\n'
'\n'
'Assignment of an object to a single target is recursively '
'defined as\n'
'follows.\n'
'\n'
'* If the target is an identifier (name):\n'
'\n'
' * If the name does not occur in a "global" or "nonlocal" '
'statement\n'
' in the current code block: the name is bound to the object '
'in the\n'
' current local namespace.\n'
'\n'
' * Otherwise: the name is bound to the object in the global\n'
' namespace or the outer namespace determined by '
'"nonlocal",\n'
' respectively.\n'
'\n'
' The name is rebound if it was already bound. This may cause '
'the\n'
' reference count for the object previously bound to the name '
'to reach\n'
' zero, causing the object to be deallocated and its '
'destructor (if it\n'
' has one) to be called.\n'
'\n'
'* If the target is a target list enclosed in parentheses or '
'in\n'
' square brackets: The object must be an iterable with the '
'same number\n'
' of items as there are targets in the target list, and its '
'items are\n'
' assigned, from left to right, to the corresponding targets.\n'
'\n'
'* If the target is an attribute reference: The primary '
'expression in\n'
' the reference is evaluated. It should yield an object with\n'
' assignable attributes; if this is not the case, "TypeError" '
'is\n'
' raised. That object is then asked to assign the assigned '
'object to\n'
' the given attribute; if it cannot perform the assignment, it '
'raises\n'
' an exception (usually but not necessarily '
'"AttributeError").\n'
'\n'
' Note: If the object is a class instance and the attribute '
'reference\n'
' occurs on both sides of the assignment operator, the RHS '
'expression,\n'
' "a.x" can access either an instance attribute or (if no '
'instance\n'
' attribute exists) a class attribute. The LHS target "a.x" '
'is always\n'
' set as an instance attribute, creating it if necessary. '
'Thus, the\n'
' two occurrences of "a.x" do not necessarily refer to the '
'same\n'
' attribute: if the RHS expression refers to a class '
'attribute, the\n'
' LHS creates a new instance attribute as the target of the\n'
' assignment:\n'
'\n'
' class Cls:\n'
' x = 3 # class variable\n'
' inst = Cls()\n'
' inst.x = inst.x + 1 # writes inst.x as 4 leaving Cls.x '
'as 3\n'
'\n'
' This description does not necessarily apply to descriptor\n'
' attributes, such as properties created with "property()".\n'
'\n'
'* If the target is a subscription: The primary expression in '
'the\n'
' reference is evaluated. It should yield either a mutable '
'sequence\n'
' object (such as a list) or a mapping object (such as a '
'dictionary).\n'
' Next, the subscript expression is evaluated.\n'
'\n'
' If the primary is a mutable sequence object (such as a '
'list), the\n'
' subscript must yield an integer. If it is negative, the '
"sequence's\n"
' length is added to it. The resulting value must be a '
'nonnegative\n'
" integer less than the sequence's length, and the sequence is "
'asked\n'
' to assign the assigned object to its item with that index. '
'If the\n'
' index is out of range, "IndexError" is raised (assignment to '
'a\n'
' subscripted sequence cannot add new items to a list).\n'
'\n'
' If the primary is a mapping object (such as a dictionary), '
'the\n'
" subscript must have a type compatible with the mapping's key "
'type,\n'
' and the mapping is then asked to create a key/datum pair '
'which maps\n'
' the subscript to the assigned object. This can either '
'replace an\n'
' existing key/value pair with the same key value, or insert a '
'new\n'
' key/value pair (if no key with the same value existed).\n'
'\n'
' For user-defined objects, the "__setitem__()" method is '
'called with\n'
' appropriate arguments.\n'
'\n'
'* If the target is a slicing: The primary expression in the\n'
' reference is evaluated. It should yield a mutable sequence '
'object\n'
' (such as a list). The assigned object should be a sequence '
'object\n'
' of the same type. Next, the lower and upper bound '
'expressions are\n'
' evaluated, insofar they are present; defaults are zero and '
'the\n'
" sequence's length. The bounds should evaluate to integers. "
'If\n'
" either bound is negative, the sequence's length is added to "
'it. The\n'
' resulting bounds are clipped to lie between zero and the '
"sequence's\n"
' length, inclusive. Finally, the sequence object is asked to '
'replace\n'
' the slice with the items of the assigned sequence. The '
'length of\n'
' the slice may be different from the length of the assigned '
'sequence,\n'
' thus changing the length of the target sequence, if the '
'target\n'
' sequence allows it.\n'
'\n'
'**CPython implementation detail:** In the current '
'implementation, the\n'
'syntax for targets is taken to be the same as for expressions, '
'and\n'
'invalid syntax is rejected during the code generation phase, '
'causing\n'
'less detailed error messages.\n'
'\n'
'Although the definition of assignment implies that overlaps '
'between\n'
"the left-hand side and the right-hand side are 'simultanenous' "
'(for\n'
'example "a, b = b, a" swaps two variables), overlaps *within* '
'the\n'
'collection of assigned-to variables occur left-to-right, '
'sometimes\n'
'resulting in confusion. For instance, the following program '
'prints\n'
'"[0, 2]":\n'
'\n'
' x = [0, 1]\n'
' i = 0\n'
' i, x[i] = 1, 2 # i is updated, then x[i] is '
'updated\n'
' print(x)\n'
'\n'
'See also: **PEP 3132** - Extended Iterable Unpacking\n'
'\n'
' The specification for the "*target" feature.\n'
'\n'
'\n'
'Augmented assignment statements\n'
'===============================\n'
'\n'
'Augmented assignment is the combination, in a single '
'statement, of a\n'
'binary operation and an assignment statement:\n'
'\n'
' augmented_assignment_stmt ::= augtarget augop '
'(expression_list | yield_expression)\n'
' augtarget ::= identifier | attributeref | '
'subscription | slicing\n'
' augop ::= "+=" | "-=" | "*=" | "/=" | '
'"//=" | "%=" | "**="\n'
' | ">>=" | "<<=" | "&=" | "^=" | "|="\n'
'\n'
'(See section *Primaries* for the syntax definitions of the '
'last three\n'
'symbols.)\n'
'\n'
'An augmented assignment evaluates the target (which, unlike '
'normal\n'
'assignment statements, cannot be an unpacking) and the '
'expression\n'
'list, performs the binary operation specific to the type of '
'assignment\n'
'on the two operands, and assigns the result to the original '
'target.\n'
'The target is only evaluated once.\n'
'\n'
'An augmented assignment expression like "x += 1" can be '
'rewritten as\n'
'"x = x + 1" to achieve a similar, but not exactly equal '
'effect. In the\n'
'augmented version, "x" is only evaluated once. Also, when '
'possible,\n'
'the actual operation is performed *in-place*, meaning that '
'rather than\n'
'creating a new object and assigning that to the target, the '
'old object\n'
'is modified instead.\n'
'\n'
'Unlike normal assignments, augmented assignments evaluate the '
'left-\n'
'hand side *before* evaluating the right-hand side. For '
'example, "a[i]\n'
'+= f(x)" first looks-up "a[i]", then it evaluates "f(x)" and '
'performs\n'
'the addition, and lastly, it writes the result back to '
'"a[i]".\n'
'\n'
'With the exception of assigning to tuples and multiple targets '
'in a\n'
'single statement, the assignment done by augmented assignment\n'
'statements is handled the same way as normal assignments. '
'Similarly,\n'
'with the exception of the possible *in-place* behavior, the '
'binary\n'
'operation performed by augmented assignment is the same as the '
'normal\n'
'binary operations.\n'
'\n'
'For targets which are attribute references, the same *caveat '
'about\n'
'class and instance attributes* applies as for regular '
'assignments.\n',
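# Editor's note (sketch, illustrative values): a starred-target unpacking as
# described in the entry above:
#     first, *rest = [1, 2, 3]      # first == 1, rest == [2, 3]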
'atom-identifiers': '\n'
'Identifiers (Names)\n'
'*******************\n'
'\n'
'An identifier occurring as an atom is a name. See '
'section\n'
'*Identifiers and keywords* for lexical definition and '
'section *Naming\n'
'and binding* for documentation of naming and binding.\n'
'\n'
'When the name is bound to an object, evaluation of the '
'atom yields\n'
'that object. When a name is not bound, an attempt to '
'evaluate it\n'
'raises a "NameError" exception.\n'
'\n'
'**Private name mangling:** When an identifier that '
'textually occurs in\n'
'a class definition begins with two or more underscore '
'characters and\n'
'does not end in two or more underscores, it is '
'considered a *private\n'
'name* of that class. Private names are transformed to a '
'longer form\n'
'before code is generated for them. The transformation '
'inserts the\n'
'class name, with leading underscores removed and a '
'single underscore\n'
'inserted, in front of the name. For example, the '
'identifier "__spam"\n'
'occurring in a class named "Ham" will be transformed to '
'"_Ham__spam".\n'
'This transformation is independent of the syntactical '
'context in which\n'
'the identifier is used. If the transformed name is '
'extremely long\n'
'(longer than 255 characters), implementation defined '
'truncation may\n'
'happen. If the class name consists only of underscores, '
'no\n'
'transformation is done.\n',
'atom-literals': '\n'
'Literals\n'
'********\n'
'\n'
'Python supports string and bytes literals and various '
'numeric\n'
'literals:\n'
'\n'
' literal ::= stringliteral | bytesliteral\n'
' | integer | floatnumber | imagnumber\n'
'\n'
'Evaluation of a literal yields an object of the given type '
'(string,\n'
'bytes, integer, floating point number, complex number) with '
'the given\n'
'value. The value may be approximated in the case of '
'floating point\n'
'and imaginary (complex) literals. See section *Literals* '
'for details.\n'
'\n'
'All literals correspond to immutable data types, and hence '
'the\n'
"object's identity is less important than its value. "
'Multiple\n'
'evaluations of literals with the same value (either the '
'same\n'
'occurrence in the program text or a different occurrence) '
'may obtain\n'
'the same object or a different object with the same '
'value.\n',
'attribute-access': '\n'
'Customizing attribute access\n'
'****************************\n'
'\n'
'The following methods can be defined to customize the '
'meaning of\n'
'attribute access (use of, assignment to, or deletion of '
'"x.name") for\n'
'class instances.\n'
'\n'
'object.__getattr__(self, name)\n'
'\n'
' Called when an attribute lookup has not found the '
'attribute in the\n'
' usual places (i.e. it is not an instance attribute '
'nor is it found\n'
' in the class tree for "self"). "name" is the '
'attribute name. This\n'
' method should return the (computed) attribute value '
'or raise an\n'
' "AttributeError" exception.\n'
'\n'
' Note that if the attribute is found through the '
'normal mechanism,\n'
' "__getattr__()" is not called. (This is an '
'intentional asymmetry\n'
' between "__getattr__()" and "__setattr__()".) This is '
'done both for\n'
' efficiency reasons and because otherwise '
'"__getattr__()" would have\n'
' no way to access other attributes of the instance. '
'Note that at\n'
' least for instance variables, you can fake total '
'control by not\n'
' inserting any values in the instance attribute '
'dictionary (but\n'
' instead inserting them in another object). See the\n'
' "__getattribute__()" method below for a way to '
'actually get total\n'
' control over attribute access.\n'
'\n'
'object.__getattribute__(self, name)\n'
'\n'
' Called unconditionally to implement attribute '
'accesses for\n'
' instances of the class. If the class also defines '
'"__getattr__()",\n'
' the latter will not be called unless '
'"__getattribute__()" either\n'
' calls it explicitly or raises an "AttributeError". '
'This method\n'
' should return the (computed) attribute value or raise '
'an\n'
' "AttributeError" exception. In order to avoid '
'infinite recursion in\n'
' this method, its implementation should always call '
'the base class\n'
' method with the same name to access any attributes it '
'needs, for\n'
' example, "object.__getattribute__(self, name)".\n'
'\n'
' Note: This method may still be bypassed when looking '
'up special\n'
' methods as the result of implicit invocation via '
'language syntax\n'
' or built-in functions. See *Special method '
'lookup*.\n'
'\n'
'object.__setattr__(self, name, value)\n'
'\n'
' Called when an attribute assignment is attempted. '
'This is called\n'
' instead of the normal mechanism (i.e. store the value '
'in the\n'
' instance dictionary). *name* is the attribute name, '
'*value* is the\n'
' value to be assigned to it.\n'
'\n'
' If "__setattr__()" wants to assign to an instance '
'attribute, it\n'
' should call the base class method with the same name, '
'for example,\n'
' "object.__setattr__(self, name, value)".\n'
'\n'
'object.__delattr__(self, name)\n'
'\n'
' Like "__setattr__()" but for attribute deletion '
'instead of\n'
' assignment. This should only be implemented if "del '
'obj.name" is\n'
' meaningful for the object.\n'
'\n'
'object.__dir__(self)\n'
'\n'
' Called when "dir()" is called on the object. A '
'sequence must be\n'
' returned. "dir()" converts the returned sequence to a '
'list and\n'
' sorts it.\n'
'\n'
'\n'
'Implementing Descriptors\n'
'========================\n'
'\n'
'The following methods only apply when an instance of the '
'class\n'
'containing the method (a so-called *descriptor* class) '
'appears in an\n'
'*owner* class (the descriptor must be in either the '
"owner's class\n"
'dictionary or in the class dictionary for one of its '
'parents). In the\n'
'examples below, "the attribute" refers to the attribute '
'whose name is\n'
"the key of the property in the owner class' "
'"__dict__".\n'
'\n'
'object.__get__(self, instance, owner)\n'
'\n'
' Called to get the attribute of the owner class (class '
'attribute\n'
' access) or of an instance of that class (instance '
'attribute\n'
' access). *owner* is always the owner class, while '
'*instance* is the\n'
' instance that the attribute was accessed through, or '
'"None" when\n'
' the attribute is accessed through the *owner*. This '
'method should\n'
' return the (computed) attribute value or raise an '
'"AttributeError"\n'
' exception.\n'
'\n'
'object.__set__(self, instance, value)\n'
'\n'
' Called to set the attribute on an instance *instance* '
'of the owner\n'
' class to a new value, *value*.\n'
'\n'
'object.__delete__(self, instance)\n'
'\n'
' Called to delete the attribute on an instance '
'*instance* of the\n'
' owner class.\n'
'\n'
'The attribute "__objclass__" is interpreted by the '
'"inspect" module as\n'
'specifying the class where this object was defined '
'(setting this\n'
'appropriately can assist in runtime introspection of '
'dynamic class\n'
'attributes). For callables, it may indicate that an '
'instance of the\n'
'given type (or a subclass) is expected or required as '
'the first\n'
'positional argument (for example, CPython sets this '
'attribute for\n'
'unbound methods that are implemented in C).\n'
'\n'
'\n'
'Invoking Descriptors\n'
'====================\n'
'\n'
'In general, a descriptor is an object attribute with '
'"binding\n'
'behavior", one whose attribute access has been '
'overridden by methods\n'
'in the descriptor protocol: "__get__()", "__set__()", '
'and\n'
'"__delete__()". If any of those methods are defined for '
'an object, it\n'
'is said to be a descriptor.\n'
'\n'
'The default behavior for attribute access is to get, '
'set, or delete\n'
"the attribute from an object's dictionary. For instance, "
'"a.x" has a\n'
'lookup chain starting with "a.__dict__[\'x\']", then\n'
'"type(a).__dict__[\'x\']", and continuing through the '
'base classes of\n'
'"type(a)" excluding metaclasses.\n'
'\n'
'However, if the looked-up value is an object defining '
'one of the\n'
'descriptor methods, then Python may override the default '
'behavior and\n'
'invoke the descriptor method instead. Where this occurs '
'in the\n'
'precedence chain depends on which descriptor methods '
'were defined and\n'
'how they were called.\n'
'\n'
'The starting point for descriptor invocation is a '
'binding, "a.x". How\n'
'the arguments are assembled depends on "a":\n'
'\n'
'Direct Call\n'
' The simplest and least common call is when user code '
'directly\n'
' invokes a descriptor method: "x.__get__(a)".\n'
'\n'
'Instance Binding\n'
' If binding to an object instance, "a.x" is '
'transformed into the\n'
' call: "type(a).__dict__[\'x\'].__get__(a, type(a))".\n'
'\n'
'Class Binding\n'
' If binding to a class, "A.x" is transformed into the '
'call:\n'
' "A.__dict__[\'x\'].__get__(None, A)".\n'
'\n'
'Super Binding\n'
' If "a" is an instance of "super", then the binding '
'"super(B,\n'
' obj).m()" searches "obj.__class__.__mro__" for the '
'base class "A"\n'
' immediately preceding "B" and then invokes the '
'descriptor with the\n'
' call: "A.__dict__[\'m\'].__get__(obj, '
'obj.__class__)".\n'
'\n'
'For instance bindings, the precedence of descriptor '
'invocation depends\n'
'on which descriptor methods are defined. A '
'descriptor can define\n'
'any combination of "__get__()", "__set__()" and '
'"__delete__()". If it\n'
'does not define "__get__()", then accessing the '
'attribute will return\n'
'the descriptor object itself unless there is a value in '
"the object's\n"
'instance dictionary. If the descriptor defines '
'"__set__()" and/or\n'
'"__delete__()", it is a data descriptor; if it defines '
'neither, it is\n'
'a non-data descriptor. Normally, data descriptors '
'define both\n'
'"__get__()" and "__set__()", while non-data descriptors '
'have just the\n'
'"__get__()" method. Data descriptors with "__set__()" '
'and "__get__()"\n'
'defined always override a redefinition in an instance '
'dictionary. In\n'
'contrast, non-data descriptors can be overridden by '
'instances.\n'
'\n'
'Python methods (including "staticmethod()" and '
'"classmethod()") are\n'
'implemented as non-data descriptors. Accordingly, '
'instances can\n'
'redefine and override methods. This allows individual '
'instances to\n'
'acquire behaviors that differ from other instances of '
'the same class.\n'
'\n'
'The "property()" function is implemented as a data '
'descriptor.\n'
'Accordingly, instances cannot override the behavior of a '
'property.\n'
'\n'
'\n'
'__slots__\n'
'=========\n'
'\n'
'By default, instances of classes have a dictionary for '
'attribute\n'
'storage. This wastes space for objects having very few '
'instance\n'
'variables. The space consumption can become acute when '
'creating large\n'
'numbers of instances.\n'
'\n'
'The default can be overridden by defining *__slots__* in '
'a class\n'
'definition. The *__slots__* declaration takes a sequence '
'of instance\n'
'variables and reserves just enough space in each '
'instance to hold a\n'
'value for each variable. Space is saved because '
'*__dict__* is not\n'
'created for each instance.\n'
'\n'
'object.__slots__\n'
'\n'
' This class variable can be assigned a string, '
'iterable, or sequence\n'
' of strings with variable names used by instances. '
'*__slots__*\n'
' reserves space for the declared variables and '
'prevents the\n'
' automatic creation of *__dict__* and *__weakref__* '
'for each\n'
' instance.\n'
'\n'
'\n'
'Notes on using *__slots__*\n'
'--------------------------\n'
'\n'
'* When inheriting from a class without *__slots__*, the '
'*__dict__*\n'
' attribute of that class will always be accessible, so '
'a *__slots__*\n'
' definition in the subclass is meaningless.\n'
'\n'
'* Without a *__dict__* variable, instances cannot be '
'assigned new\n'
' variables not listed in the *__slots__* definition. '
'Attempts to\n'
' assign to an unlisted variable name raise '
'"AttributeError". If\n'
' dynamic assignment of new variables is desired, then '
'add\n'
' "\'__dict__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
'* Without a *__weakref__* variable for each instance, '
'classes\n'
' defining *__slots__* do not support weak references to '
'its\n'
' instances. If weak reference support is needed, then '
'add\n'
' "\'__weakref__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
'* *__slots__* are implemented at the class level by '
'creating\n'
' descriptors (*Implementing Descriptors*) for each '
'variable name. As\n'
' a result, class attributes cannot be used to set '
'default values for\n'
' instance variables defined by *__slots__*; otherwise, '
'the class\n'
' attribute would overwrite the descriptor assignment.\n'
'\n'
'* The action of a *__slots__* declaration is limited to '
'the class\n'
' where it is defined. As a result, subclasses will '
'have a *__dict__*\n'
' unless they also define *__slots__* (which must only '
'contain names\n'
' of any *additional* slots).\n'
'\n'
'* If a class defines a slot also defined in a base '
'class, the\n'
' instance variable defined by the base class slot is '
'inaccessible\n'
' (except by retrieving its descriptor directly from the '
'base class).\n'
' This renders the meaning of the program undefined. In '
'the future, a\n'
' check may be added to prevent this.\n'
'\n'
'* Nonempty *__slots__* does not work for classes derived '
'from\n'
' "variable-length" built-in types such as "int", '
'"bytes" and "tuple".\n'
'\n'
'* Any non-string iterable may be assigned to '
'*__slots__*. Mappings\n'
' may also be used; however, in the future, special '
'meaning may be\n'
' assigned to the values corresponding to each key.\n'
'\n'
'* *__class__* assignment works only if both classes have '
'the same\n'
' *__slots__*.\n',
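# Editor's note (sketch, illustrative names): a minimal __slots__ class as
# described in the entry above:
#     class Point:
#         __slots__ = ('x', 'y')    # no per-instance __dict__ is created
#     p = Point(); p.x = 1          # ok
#     p.z = 1                       # raises AttributeError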
'attribute-references': '\n'
'Attribute references\n'
'********************\n'
'\n'
'An attribute reference is a primary followed by a '
'period and a name:\n'
'\n'
' attributeref ::= primary "." identifier\n'
'\n'
'The primary must evaluate to an object of a type '
'that supports\n'
'attribute references, which most objects do. This '
'object is then\n'
'asked to produce the attribute whose name is the '
'identifier. This\n'
'production can be customized by overriding the '
'"__getattr__()" method.\n'
'If this attribute is not available, the exception '
'"AttributeError" is\n'
'raised. Otherwise, the type and value of the object '
'produced is\n'
'determined by the object. Multiple evaluations of '
'the same attribute\n'
'reference may yield different objects.\n',
'augassign': '\n'
'Augmented assignment statements\n'
'*******************************\n'
'\n'
'Augmented assignment is the combination, in a single statement, '
'of a\n'
'binary operation and an assignment statement:\n'
'\n'
' augmented_assignment_stmt ::= augtarget augop '
'(expression_list | yield_expression)\n'
' augtarget ::= identifier | attributeref | '
'subscription | slicing\n'
' augop ::= "+=" | "-=" | "*=" | "/=" | '
'"//=" | "%=" | "**="\n'
' | ">>=" | "<<=" | "&=" | "^=" | "|="\n'
'\n'
'(See section *Primaries* for the syntax definitions of the last '
'three\n'
'symbols.)\n'
'\n'
'An augmented assignment evaluates the target (which, unlike '
'normal\n'
'assignment statements, cannot be an unpacking) and the '
'expression\n'
'list, performs the binary operation specific to the type of '
'assignment\n'
'on the two operands, and assigns the result to the original '
'target.\n'
'The target is only evaluated once.\n'
'\n'
'An augmented assignment expression like "x += 1" can be '
'rewritten as\n'
'"x = x + 1" to achieve a similar, but not exactly equal effect. '
'In the\n'
'augmented version, "x" is only evaluated once. Also, when '
'possible,\n'
'the actual operation is performed *in-place*, meaning that '
'rather than\n'
'creating a new object and assigning that to the target, the old '
'object\n'
'is modified instead.\n'
'\n'
'Unlike normal assignments, augmented assignments evaluate the '
'left-\n'
'hand side *before* evaluating the right-hand side. For '
'example, "a[i]\n'
'+= f(x)" first looks-up "a[i]", then it evaluates "f(x)" and '
'performs\n'
'the addition, and lastly, it writes the result back to "a[i]".\n'
'\n'
'With the exception of assigning to tuples and multiple targets '
'in a\n'
'single statement, the assignment done by augmented assignment\n'
'statements is handled the same way as normal assignments. '
'Similarly,\n'
'with the exception of the possible *in-place* behavior, the '
'binary\n'
'operation performed by augmented assignment is the same as the '
'normal\n'
'binary operations.\n'
'\n'
'For targets which are attribute references, the same *caveat '
'about\n'
'class and instance attributes* applies as for regular '
'assignments.\n',
'binary': '\n'
'Binary arithmetic operations\n'
'****************************\n'
'\n'
'The binary arithmetic operations have the conventional priority\n'
'levels. Note that some of these operations also apply to certain '
'non-\n'
'numeric types. Apart from the power operator, there are only two\n'
'levels, one for multiplicative operators and one for additive\n'
'operators:\n'
'\n'
' m_expr ::= u_expr | m_expr "*" u_expr | m_expr "//" u_expr | '
'm_expr "/" u_expr\n'
' | m_expr "%" u_expr\n'
' a_expr ::= m_expr | a_expr "+" m_expr | a_expr "-" m_expr\n'
'\n'
'The "*" (multiplication) operator yields the product of its '
'arguments.\n'
'The arguments must either both be numbers, or one argument must be '
'an\n'
'integer and the other must be a sequence. In the former case, the\n'
'numbers are converted to a common type and then multiplied '
'together.\n'
'In the latter case, sequence repetition is performed; a negative\n'
'repetition factor yields an empty sequence.\n'
'\n'
'The "/" (division) and "//" (floor division) operators yield the\n'
'quotient of their arguments. The numeric arguments are first\n'
'converted to a common type. Division of integers yields a float, '
'while\n'
'floor division of integers results in an integer; the result is '
'that\n'
"of mathematical division with the 'floor' function applied to the\n"
'result. Division by zero raises the "ZeroDivisionError" '
'exception.\n'
'\n'
'The "%" (modulo) operator yields the remainder from the division '
'of\n'
'the first argument by the second. The numeric arguments are '
'first\n'
'converted to a common type. A zero right argument raises the\n'
'"ZeroDivisionError" exception. The arguments may be floating '
'point\n'
'numbers, e.g., "3.14%0.7" equals "0.34" (since "3.14" equals '
'"4*0.7 +\n'
'0.34".) The modulo operator always yields a result with the same '
'sign\n'
'as its second operand (or zero); the absolute value of the result '
'is\n'
'strictly smaller than the absolute value of the second operand '
'[1].\n'
'\n'
'The floor division and modulo operators are connected by the '
'following\n'
'identity: "x == (x//y)*y + (x%y)". Floor division and modulo are '
'also\n'
'connected with the built-in function "divmod()": "divmod(x, y) ==\n'
'(x//y, x%y)". [2].\n'
'\n'
'In addition to performing the modulo operation on numbers, the '
'"%"\n'
'operator is also overloaded by string objects to perform '
'old-style\n'
'string formatting (also known as interpolation). The syntax for\n'
'string formatting is described in the Python Library Reference,\n'
'section *printf-style String Formatting*.\n'
'\n'
'The floor division operator, the modulo operator, and the '
'"divmod()"\n'
'function are not defined for complex numbers. Instead, convert to '
'a\n'
'floating point number using the "abs()" function if appropriate.\n'
'\n'
'The "+" (addition) operator yields the sum of its arguments. The\n'
'arguments must either both be numbers or both be sequences of the '
'same\n'
'type. In the former case, the numbers are converted to a common '
'type\n'
'and then added together. In the latter case, the sequences are\n'
'concatenated.\n'
'\n'
'The "-" (subtraction) operator yields the difference of its '
'arguments.\n'
'The numeric arguments are first converted to a common type.\n',
'bitwise': '\n'
'Binary bitwise operations\n'
'*************************\n'
'\n'
'Each of the three bitwise operations has a different priority '
'level:\n'
'\n'
' and_expr ::= shift_expr | and_expr "&" shift_expr\n'
' xor_expr ::= and_expr | xor_expr "^" and_expr\n'
' or_expr ::= xor_expr | or_expr "|" xor_expr\n'
'\n'
'The "&" operator yields the bitwise AND of its arguments, which '
'must\n'
'be integers.\n'
'\n'
'The "^" operator yields the bitwise XOR (exclusive OR) of its\n'
'arguments, which must be integers.\n'
'\n'
'The "|" operator yields the bitwise (inclusive) OR of its '
'arguments,\n'
'which must be integers.\n',
'bltin-code-objects': '\n'
'Code Objects\n'
'************\n'
'\n'
'Code objects are used by the implementation to '
'represent "pseudo-\n'
'compiled" executable Python code such as a function '
'body. They differ\n'
"from function objects because they don't contain a "
'reference to their\n'
'global execution environment. Code objects are '
'returned by the built-\n'
'in "compile()" function and can be extracted from '
'function objects\n'
'through their "__code__" attribute. See also the '
'"code" module.\n'
'\n'
'A code object can be executed or evaluated by passing '
'it (instead of a\n'
'source string) to the "exec()" or "eval()" built-in '
'functions.\n'
'\n'
'See *The standard type hierarchy* for more '
'information.\n',
'bltin-ellipsis-object': '\n'
'The Ellipsis Object\n'
'*******************\n'
'\n'
'This object is commonly used by slicing (see '
'*Slicings*). It supports\n'
'no special operations. There is exactly one '
'ellipsis object, named\n'
'"Ellipsis" (a built-in name). "type(Ellipsis)()" '
'produces the\n'
'"Ellipsis" singleton.\n'
'\n'
'It is written as "Ellipsis" or "...".\n',
'bltin-null-object': '\n'
'The Null Object\n'
'***************\n'
'\n'
"This object is returned by functions that don't "
'explicitly return a\n'
'value. It supports no special operations. There is '
'exactly one null\n'
'object, named "None" (a built-in name). "type(None)()" '
'produces the\n'
'same singleton.\n'
'\n'
'It is written as "None".\n',
'bltin-type-objects': '\n'
'Type Objects\n'
'************\n'
'\n'
'Type objects represent the various object types. An '
"object's type is\n"
'accessed by the built-in function "type()". There are '
'no special\n'
'operations on types. The standard module "types" '
'defines names for\n'
'all standard built-in types.\n'
'\n'
'Types are written like this: "<class \'int\'>".\n',
'booleans': '\n'
'Boolean operations\n'
'******************\n'
'\n'
' or_test ::= and_test | or_test "or" and_test\n'
' and_test ::= not_test | and_test "and" not_test\n'
' not_test ::= comparison | "not" not_test\n'
'\n'
'In the context of Boolean operations, and also when expressions '
'are\n'
'used by control flow statements, the following values are '
'interpreted\n'
'as false: "False", "None", numeric zero of all types, and empty\n'
'strings and containers (including strings, tuples, lists,\n'
'dictionaries, sets and frozensets). All other values are '
'interpreted\n'
'as true. User-defined objects can customize their truth value '
'by\n'
'providing a "__bool__()" method.\n'
'\n'
'The operator "not" yields "True" if its argument is false, '
'"False"\n'
'otherwise.\n'
'\n'
'The expression "x and y" first evaluates *x*; if *x* is false, '
'its\n'
'value is returned; otherwise, *y* is evaluated and the resulting '
'value\n'
'is returned.\n'
'\n'
'The expression "x or y" first evaluates *x*; if *x* is true, its '
'value\n'
'is returned; otherwise, *y* is evaluated and the resulting value '
'is\n'
'returned.\n'
'\n'
'(Note that neither "and" nor "or" restrict the value and type '
'they\n'
'return to "False" and "True", but rather return the last '
'evaluated\n'
'argument. This is sometimes useful, e.g., if "s" is a string '
'that\n'
'should be replaced by a default value if it is empty, the '
'expression\n'
'"s or \'foo\'" yields the desired value. Because "not" has to '
'create a\n'
'new value, it returns a boolean value regardless of the type of '
'its\n'
'argument (for example, "not \'foo\'" produces "False" rather '
'than "\'\'".)\n',
'break': '\n'
'The "break" statement\n'
'*********************\n'
'\n'
' break_stmt ::= "break"\n'
'\n'
'"break" may only occur syntactically nested in a "for" or "while"\n'
'loop, but not nested in a function or class definition within that\n'
'loop.\n'
'\n'
'It terminates the nearest enclosing loop, skipping the optional '
'"else"\n'
'clause if the loop has one.\n'
'\n'
'If a "for" loop is terminated by "break", the loop control target\n'
'keeps its current value.\n'
'\n'
'When "break" passes control out of a "try" statement with a '
'"finally"\n'
'clause, that "finally" clause is executed before really leaving '
'the\n'
'loop.\n',
'callable-types': '\n'
'Emulating callable objects\n'
'**************************\n'
'\n'
'object.__call__(self[, args...])\n'
'\n'
' Called when the instance is "called" as a function; if '
'this method\n'
' is defined, "x(arg1, arg2, ...)" is a shorthand for\n'
' "x.__call__(arg1, arg2, ...)".\n',
'calls': '\n'
'Calls\n'
'*****\n'
'\n'
'A call calls a callable object (e.g., a *function*) with a '
'possibly\n'
'empty series of *arguments*:\n'
'\n'
' call ::= primary "(" [argument_list [","] | '
'comprehension] ")"\n'
' argument_list ::= positional_arguments ["," '
'keyword_arguments]\n'
' ["," "*" expression] ["," '
'keyword_arguments]\n'
' ["," "**" expression]\n'
' | keyword_arguments ["," "*" expression]\n'
' ["," keyword_arguments] ["," "**" '
'expression]\n'
' | "*" expression ["," keyword_arguments] ["," '
'"**" expression]\n'
' | "**" expression\n'
' positional_arguments ::= expression ("," expression)*\n'
' keyword_arguments ::= keyword_item ("," keyword_item)*\n'
' keyword_item ::= identifier "=" expression\n'
'\n'
'An optional trailing comma may be present after the positional and\n'
'keyword arguments but does not affect the semantics.\n'
'\n'
'The primary must evaluate to a callable object (user-defined\n'
'functions, built-in functions, methods of built-in objects, class\n'
'objects, methods of class instances, and all objects having a\n'
'"__call__()" method are callable). All argument expressions are\n'
'evaluated before the call is attempted. Please refer to section\n'
'*Function definitions* for the syntax of formal *parameter* lists.\n'
'\n'
'If keyword arguments are present, they are first converted to\n'
'positional arguments, as follows. First, a list of unfilled slots '
'is\n'
'created for the formal parameters. If there are N positional\n'
'arguments, they are placed in the first N slots. Next, for each\n'
'keyword argument, the identifier is used to determine the\n'
'corresponding slot (if the identifier is the same as the first '
'formal\n'
'parameter name, the first slot is used, and so on). If the slot '
'is\n'
'already filled, a "TypeError" exception is raised. Otherwise, the\n'
'value of the argument is placed in the slot, filling it (even if '
'the\n'
'expression is "None", it fills the slot). When all arguments have\n'
'been processed, the slots that are still unfilled are filled with '
'the\n'
'corresponding default value from the function definition. '
'(Default\n'
'values are calculated, once, when the function is defined; thus, a\n'
'mutable object such as a list or dictionary used as default value '
'will\n'
"be shared by all calls that don't specify an argument value for "
'the\n'
'corresponding slot; this should usually be avoided.) If there are '
'any\n'
'unfilled slots for which no default value is specified, a '
'"TypeError"\n'
'exception is raised. Otherwise, the list of filled slots is used '
'as\n'
'the argument list for the call.\n'
'\n'
'**CPython implementation detail:** An implementation may provide\n'
'built-in functions whose positional parameters do not have names, '
'even\n'
"if they are 'named' for the purpose of documentation, and which\n"
'therefore cannot be supplied by keyword. In CPython, this is the '
'case\n'
'for functions implemented in C that use "PyArg_ParseTuple()" to '
'parse\n'
'their arguments.\n'
'\n'
'If there are more positional arguments than there are formal '
'parameter\n'
'slots, a "TypeError" exception is raised, unless a formal '
'parameter\n'
'using the syntax "*identifier" is present; in this case, that '
'formal\n'
'parameter receives a tuple containing the excess positional '
'arguments\n'
'(or an empty tuple if there were no excess positional arguments).\n'
'\n'
'If any keyword argument does not correspond to a formal parameter\n'
'name, a "TypeError" exception is raised, unless a formal parameter\n'
'using the syntax "**identifier" is present; in this case, that '
'formal\n'
'parameter receives a dictionary containing the excess keyword\n'
'arguments (using the keywords as keys and the argument values as\n'
'corresponding values), or a (new) empty dictionary if there were '
'no\n'
'excess keyword arguments.\n'
'\n'
'If the syntax "*expression" appears in the function call, '
'"expression"\n'
'must evaluate to an iterable. Elements from this iterable are '
'treated\n'
'as if they were additional positional arguments; if there are\n'
'positional arguments *x1*, ..., *xN*, and "expression" evaluates to '
'a\n'
'sequence *y1*, ..., *yM*, this is equivalent to a call with M+N\n'
'positional arguments *x1*, ..., *xN*, *y1*, ..., *yM*.\n'
'\n'
'A consequence of this is that although the "*expression" syntax '
'may\n'
'appear *after* some keyword arguments, it is processed *before* '
'the\n'
'keyword arguments (and the "**expression" argument, if any -- see\n'
'below). So:\n'
'\n'
' >>> def f(a, b):\n'
' ... print(a, b)\n'
' ...\n'
' >>> f(b=1, *(2,))\n'
' 2 1\n'
' >>> f(a=1, *(2,))\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in ?\n'
" TypeError: f() got multiple values for keyword argument 'a'\n"
' >>> f(1, *(2,))\n'
' 1 2\n'
'\n'
'It is unusual for both keyword arguments and the "*expression" '
'syntax\n'
'to be used in the same call, so in practice this confusion does '
'not\n'
'arise.\n'
'\n'
'If the syntax "**expression" appears in the function call,\n'
'"expression" must evaluate to a mapping, the contents of which are\n'
'treated as additional keyword arguments. In the case of a keyword\n'
'appearing in both "expression" and as an explicit keyword argument, '
'a\n'
'"TypeError" exception is raised.\n'
'\n'
'Formal parameters using the syntax "*identifier" or "**identifier"\n'
'cannot be used as positional argument slots or as keyword argument\n'
'names.\n'
'\n'
'A call always returns some value, possibly "None", unless it raises '
'an\n'
'exception. How this value is computed depends on the type of the\n'
'callable object.\n'
'\n'
'If it is---\n'
'\n'
'a user-defined function:\n'
' The code block for the function is executed, passing it the\n'
' argument list. The first thing the code block will do is bind '
'the\n'
' formal parameters to the arguments; this is described in '
'section\n'
' *Function definitions*. When the code block executes a '
'"return"\n'
' statement, this specifies the return value of the function '
'call.\n'
'\n'
'a built-in function or method:\n'
' The result is up to the interpreter; see *Built-in Functions* '
'for\n'
' the descriptions of built-in functions and methods.\n'
'\n'
'a class object:\n'
' A new instance of that class is returned.\n'
'\n'
'a class instance method:\n'
' The corresponding user-defined function is called, with an '
'argument\n'
' list that is one longer than the argument list of the call: the\n'
' instance becomes the first argument.\n'
'\n'
'a class instance:\n'
' The class must define a "__call__()" method; the effect is then '
'the\n'
' same as if that method was called.\n',
'class': '\n'
'Class definitions\n'
'*****************\n'
'\n'
'A class definition defines a class object (see section *The '
'standard\n'
'type hierarchy*):\n'
'\n'
' classdef ::= [decorators] "class" classname [inheritance] ":" '
'suite\n'
' inheritance ::= "(" [parameter_list] ")"\n'
' classname ::= identifier\n'
'\n'
'A class definition is an executable statement. The inheritance '
'list\n'
'usually gives a list of base classes (see *Customizing class '
'creation*\n'
'for more advanced uses), so each item in the list should evaluate '
'to a\n'
'class object which allows subclassing. Classes without an '
'inheritance\n'
'list inherit, by default, from the base class "object"; hence,\n'
'\n'
' class Foo:\n'
' pass\n'
'\n'
'is equivalent to\n'
'\n'
' class Foo(object):\n'
' pass\n'
'\n'
"The class's suite is then executed in a new execution frame (see\n"
'*Naming and binding*), using a newly created local namespace and '
'the\n'
'original global namespace. (Usually, the suite contains mostly\n'
"function definitions.) When the class's suite finishes execution, "
'its\n'
'execution frame is discarded but its local namespace is saved. [4] '
'A\n'
'class object is then created using the inheritance list for the '
'base\n'
'classes and the saved local namespace for the attribute '
'dictionary.\n'
'The class name is bound to this class object in the original local\n'
'namespace.\n'
'\n'
'Class creation can be customized heavily using *metaclasses*.\n'
'\n'
'Classes can also be decorated: just like when decorating '
'functions,\n'
'\n'
' @f1(arg)\n'
' @f2\n'
' class Foo: pass\n'
'\n'
'is equivalent to\n'
'\n'
' class Foo: pass\n'
' Foo = f1(arg)(f2(Foo))\n'
'\n'
'The evaluation rules for the decorator expressions are the same as '
'for\n'
'function decorators. The result must be a class object, which is '
'then\n'
'bound to the class name.\n'
'\n'
"**Programmer's note:** Variables defined in the class definition "
'are\n'
'class attributes; they are shared by instances. Instance '
'attributes\n'
'can be set in a method with "self.name = value". Both class and\n'
'instance attributes are accessible through the notation '
'""self.name"",\n'
'and an instance attribute hides a class attribute with the same '
'name\n'
'when accessed in this way. Class attributes can be used as '
'defaults\n'
'for instance attributes, but using mutable values there can lead '
'to\n'
'unexpected results. *Descriptors* can be used to create instance\n'
'variables with different implementation details.\n'
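'\n'
'For example, a mutable class attribute shared by all instances (the\n'
'class "Dog" is purely illustrative):\n'
'\n'
'   >>> class Dog:\n'
'   ...     tricks = []                  # class attribute, shared\n'
'   ...     def add_trick(self, trick):\n'
'   ...         self.tricks.append(trick)\n'
'   ...\n'
'   >>> a = Dog()\n'
'   >>> b = Dog()\n'
"   >>> a.add_trick('roll over')\n"
'   >>> b.tricks                         # the shared list was modified\n'
"   ['roll over']\n"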
'\n'
'See also: **PEP 3115** - Metaclasses in Python 3 **PEP 3129** -\n'
' Class Decorators\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] The exception is propagated to the invocation stack unless\n'
' there is a "finally" clause which happens to raise another\n'
' exception. That new exception causes the old one to be lost.\n'
'\n'
'[2] Currently, control "flows off the end" except in the case of\n'
' an exception or the execution of a "return", "continue", or\n'
' "break" statement.\n'
'\n'
'[3] A string literal appearing as the first statement in the\n'
' function body is transformed into the function\'s "__doc__"\n'
" attribute and therefore the function's *docstring*.\n"
'\n'
'[4] A string literal appearing as the first statement in the class\n'
' body is transformed into the namespace\'s "__doc__" item and\n'
" therefore the class's *docstring*.\n",
'comparisons': '\n'
'Comparisons\n'
'***********\n'
'\n'
'Unlike C, all comparison operations in Python have the same '
'priority,\n'
'which is lower than that of any arithmetic, shifting or '
'bitwise\n'
'operation. Also unlike C, expressions like "a < b < c" have '
'the\n'
'interpretation that is conventional in mathematics:\n'
'\n'
' comparison ::= or_expr ( comp_operator or_expr )*\n'
' comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "!="\n'
' | "is" ["not"] | ["not"] "in"\n'
'\n'
'Comparisons yield boolean values: "True" or "False".\n'
'\n'
'Comparisons can be chained arbitrarily, e.g., "x < y <= z" '
'is\n'
'equivalent to "x < y and y <= z", except that "y" is '
'evaluated only\n'
'once (but in both cases "z" is not evaluated at all when "x < '
'y" is\n'
'found to be false).\n'
'\n'
'Formally, if *a*, *b*, *c*, ..., *y*, *z* are expressions and '
'*op1*,\n'
'*op2*, ..., *opN* are comparison operators, then "a op1 b op2 '
'c ... y\n'
'opN z" is equivalent to "a op1 b and b op2 c and ... y opN '
'z", except\n'
'that each expression is evaluated at most once.\n'
'\n'
'Note that "a op1 b op2 c" doesn\'t imply any kind of '
'comparison between\n'
'*a* and *c*, so that, e.g., "x < y > z" is perfectly legal '
'(though\n'
'perhaps not pretty).\n'
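'\n'
'For example (a minimal sketch):\n'
'\n'
'   >>> x = 2\n'
'   >>> 1 < x < 3        # same as: 1 < x and x < 3\n'
'   True\n'
'   >>> 1 < x > 0        # legal, though perhaps not pretty\n'
'   True\n'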
'\n'
'\n'
'Value comparisons\n'
'=================\n'
'\n'
'The operators "<", ">", "==", ">=", "<=", and "!=" compare '
'the values\n'
'of two objects. The objects do not need to have the same '
'type.\n'
'\n'
'Chapter *Objects, values and types* states that objects have '
'a value\n'
'(in addition to type and identity). The value of an object '
'is a\n'
'rather abstract notion in Python: For example, there is no '
'canonical\n'
"access method for an object's value. Also, there is no "
'requirement\n'
'that the value of an object should be constructed in a '
'particular way,\n'
'e.g. comprised of all its data attributes. Comparison '
'operators\n'
'implement a particular notion of what the value of an object '
'is. One\n'
'can think of them as defining the value of an object '
'indirectly, by\n'
'means of their comparison implementation.\n'
'\n'
'Because all types are (direct or indirect) subtypes of '
'"object", they\n'
'inherit the default comparison behavior from "object". Types '
'can\n'
'customize their comparison behavior by implementing *rich '
'comparison\n'
'methods* like "__lt__()", described in *Basic '
'customization*.\n'
'\n'
'The default behavior for equality comparison ("==" and "!=") '
'is based\n'
'on the identity of the objects. Hence, equality comparison '
'of\n'
'instances with the same identity results in equality, and '
'equality\n'
'comparison of instances with different identities results in\n'
'inequality. A motivation for this default behavior is the '
'desire that\n'
'all objects should be reflexive (i.e. "x is y" implies "x == '
'y").\n'
'\n'
'A default order comparison ("<", ">", "<=", and ">=") is not '
'provided;\n'
'an attempt raises "TypeError". A motivation for this default '
'behavior\n'
'is the lack of a similar invariant as for equality.\n'
'\n'
'The behavior of the default equality comparison, that '
'instances with\n'
'different identities are always unequal, may be in contrast '
'to what\n'
'types will need that have a sensible definition of object '
'value and\n'
'value-based equality. Such types will need to customize '
'their\n'
'comparison behavior, and in fact, a number of built-in types '
'have done\n'
'that.\n'
'\n'
'The following list describes the comparison behavior of the '
'most\n'
'important built-in types.\n'
'\n'
'* Numbers of built-in numeric types (*Numeric Types --- int, '
'float,\n'
' complex*) and of the standard library types '
'"fractions.Fraction" and\n'
' "decimal.Decimal" can be compared within and across their '
'types,\n'
' with the restriction that complex numbers do not support '
'order\n'
' comparison. Within the limits of the types involved, they '
'compare\n'
' mathematically (algorithmically) correct without loss of '
'precision.\n'
'\n'
' The not-a-number values "float(\'NaN\')" and '
'"Decimal(\'NaN\')" are\n'
' special. They are identical to themselves ("x is x" is '
'true) but\n'
' are not equal to themselves ("x == x" is false). '
'Additionally,\n'
' comparing any number to a not-a-number value will return '
'"False".\n'
' For example, both "3 < float(\'NaN\')" and "float(\'NaN\') '
'< 3" will\n'
' return "False".\n'
'\n'
'* Binary sequences (instances of "bytes" or "bytearray") can '
'be\n'
' compared within and across their types. They compare\n'
' lexicographically using the numeric values of their '
'elements.\n'
'\n'
'* Strings (instances of "str") compare lexicographically '
'using the\n'
' numerical Unicode code points (the result of the built-in '
'function\n'
' "ord()") of their characters. [3]\n'
'\n'
' Strings and binary sequences cannot be directly compared.\n'
'\n'
'* Sequences (instances of "tuple", "list", or "range") can '
'be\n'
' compared only within each of their types, with the '
'restriction that\n'
' ranges do not support order comparison. Equality '
'comparison across\n'
' these types results in inequality, and ordering comparison '
'across\n'
' these types raises "TypeError".\n'
'\n'
' Sequences compare lexicographically using comparison of\n'
' corresponding elements, whereby reflexivity of the elements '
'is\n'
' enforced.\n'
'\n'
' In enforcing reflexivity of elements, the comparison of '
'collections\n'
' assumes that for a collection element "x", "x == x" is '
'always true.\n'
' Based on that assumption, element identity is compared '
'first, and\n'
' element comparison is performed only for distinct '
'elements. This\n'
' approach yields the same result as a strict element '
'comparison\n'
' would, if the compared elements are reflexive. For '
'non-reflexive\n'
' elements, the result is different than for strict element\n'
' comparison, and may be surprising: The non-reflexive '
'not-a-number\n'
' values for example result in the following comparison '
'behavior when\n'
' used in a list:\n'
'\n'
" >>> nan = float('NaN')\n"
' >>> nan is nan\n'
' True\n'
' >>> nan == nan\n'
' False <-- the defined non-reflexive '
'behavior of NaN\n'
' >>> [nan] == [nan]\n'
' True <-- list enforces reflexivity and '
'tests identity first\n'
'\n'
' Lexicographical comparison between built-in collections '
'works as\n'
' follows:\n'
'\n'
' * For two collections to compare equal, they must be of the '
'same\n'
' type, have the same length, and each pair of '
'corresponding\n'
' elements must compare equal (for example, "[1,2] == '
'(1,2)" is\n'
' false because the type is not the same).\n'
'\n'
' * Collections that support order comparison are ordered the '
'same\n'
' as their first unequal elements (for example, "[1,2,x] <= '
'[1,2,y]"\n'
' has the same value as "x <= y"). If a corresponding '
'element does\n'
' not exist, the shorter collection is ordered first (for '
'example,\n'
' "[1,2] < [1,2,3]" is true).\n'
'\n'
'* Mappings (instances of "dict") compare equal if and only if '
'they\n'
' have equal *(key, value)* pairs. Equality comparison of the '
'keys and\n'
' elements enforces reflexivity.\n'
'\n'
' Order comparisons ("<", ">", "<=", and ">=") raise '
'"TypeError".\n'
'\n'
'* Sets (instances of "set" or "frozenset") can be compared '
'within\n'
' and across their types.\n'
'\n'
' They define order comparison operators to mean subset and '
'superset\n'
' tests. Those relations do not define total orderings (for '
'example,\n'
' the two sets "{1,2}" and "{2,3}" are not equal, nor subsets '
'of one\n'
' another, nor supersets of one another). Accordingly, sets '
'are not\n'
' appropriate arguments for functions which depend on total '
'ordering\n'
' (for example, "min()", "max()", and "sorted()" produce '
'undefined\n'
' results given a list of sets as inputs).\n'
'\n'
' Comparison of sets enforces reflexivity of its elements.\n'
'\n'
'* Most other built-in types have no comparison methods '
'implemented,\n'
' so they inherit the default comparison behavior.\n'
'\n'
'User-defined classes that customize their comparison behavior '
'should\n'
'follow some consistency rules, if possible:\n'
'\n'
'* Equality comparison should be reflexive. In other words, '
'identical\n'
' objects should compare equal:\n'
'\n'
' "x is y" implies "x == y"\n'
'\n'
'* Comparison should be symmetric. In other words, the '
'following\n'
' expressions should have the same result:\n'
'\n'
' "x == y" and "y == x"\n'
'\n'
' "x != y" and "y != x"\n'
'\n'
' "x < y" and "y > x"\n'
'\n'
' "x <= y" and "y >= x"\n'
'\n'
'* Comparison should be transitive. The following '
'(non-exhaustive)\n'
' examples illustrate that:\n'
'\n'
' "x > y and y > z" implies "x > z"\n'
'\n'
' "x < y and y <= z" implies "x < z"\n'
'\n'
'* Inverse comparison should result in the boolean negation. '
'In other\n'
' words, the following expressions should have the same '
'result:\n'
'\n'
' "x == y" and "not x != y"\n'
'\n'
' "x < y" and "not x >= y" (for total ordering)\n'
'\n'
' "x > y" and "not x <= y" (for total ordering)\n'
'\n'
' The last two expressions apply to totally ordered '
'collections (e.g.\n'
' to sequences, but not to sets or mappings). See also the\n'
' "total_ordering()" decorator.\n'
'\n'
'Python does not enforce these consistency rules. In fact, '
'the\n'
'not-a-number values are an example of not following these '
'rules.\n'
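'\n'
'For example, the "total_ordering()" decorator mentioned above can\n'
'derive the remaining operators from "__eq__()" and "__lt__()" (the\n'
'class "Version" is an illustrative sketch):\n'
'\n'
'   >>> from functools import total_ordering\n'
'   >>> @total_ordering\n'
'   ... class Version:\n'
'   ...     def __init__(self, n):\n'
'   ...         self.n = n\n'
'   ...     def __eq__(self, other):\n'
'   ...         return self.n == other.n\n'
'   ...     def __lt__(self, other):\n'
'   ...         return self.n < other.n\n'
'   ...\n'
'   >>> Version(1) <= Version(2)    # derived from __lt__ and __eq__\n'
'   True\n'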
'\n'
'\n'
'Membership test operations\n'
'==========================\n'
'\n'
'The operators "in" and "not in" test for membership. "x in '
's"\n'
'evaluates to true if *x* is a member of *s*, and false '
'otherwise. "x\n'
'not in s" returns the negation of "x in s". All built-in '
'sequences\n'
'and set types support this, as do dictionaries, for which '
'"in" tests\n'
'whether the dictionary has a given key. For container types '
'such as\n'
'list, tuple, set, frozenset, dict, or collections.deque, the\n'
'expression "x in y" is equivalent to "any(x is e or x == e '
'for e in\n'
'y)".\n'
'\n'
'For the string and bytes types, "x in y" is true if and only '
'if *x* is\n'
'a substring of *y*. An equivalent test is "y.find(x) != '
'-1". Empty\n'
'strings are always considered to be a substring of any other '
'string,\n'
'so """ in "abc"" will return "True".\n'
'\n'
'For user-defined classes which define the "__contains__()" '
'method, "x\n'
'in y" is true if and only if "y.__contains__(x)" is true.\n'
'\n'
'For user-defined classes which do not define "__contains__()" '
'but do\n'
'define "__iter__()", "x in y" is true if some value "z" with '
'"x == z"\n'
'is produced while iterating over "y". If an exception is '
'raised\n'
'during the iteration, it is as if "in" raised that '
'exception.\n'
'\n'
'Lastly, the old-style iteration protocol is tried: if a class '
'defines\n'
'"__getitem__()", "x in y" is true if and only if there is a '
'non-\n'
'negative integer index *i* such that "x == y[i]", and all '
'lower\n'
'integer indices do not raise "IndexError" exception. (If any '
'other\n'
'exception is raised, it is as if "in" raised that '
'exception).\n'
'\n'
'The operator "not in" is defined to have the inverse true '
'value of\n'
'"in".\n'
'\n'
'\n'
'Identity comparisons\n'
'====================\n'
'\n'
'The operators "is" and "is not" test for object identity: "x '
'is y" is\n'
'true if and only if *x* and *y* are the same object. "x is '
'not y"\n'
'yields the inverse truth value. [4]\n',
'compound': '\n'
'Compound statements\n'
'*******************\n'
'\n'
'Compound statements contain (groups of) other statements; they '
'affect\n'
'or control the execution of those other statements in some way. '
'In\n'
'general, compound statements span multiple lines, although in '
'simple\n'
'incarnations a whole compound statement may be contained in one '
'line.\n'
'\n'
'The "if", "while" and "for" statements implement traditional '
'control\n'
'flow constructs. "try" specifies exception handlers and/or '
'cleanup\n'
'code for a group of statements, while the "with" statement '
'allows the\n'
'execution of initialization and finalization code around a block '
'of\n'
'code. Function and class definitions are also syntactically '
'compound\n'
'statements.\n'
'\n'
"A compound statement consists of one or more 'clauses.' A "
'clause\n'
"consists of a header and a 'suite.' The clause headers of a\n"
'particular compound statement are all at the same indentation '
'level.\n'
'Each clause header begins with a uniquely identifying keyword '
'and ends\n'
'with a colon. A suite is a group of statements controlled by a\n'
'clause. A suite can be one or more semicolon-separated simple\n'
'statements on the same line as the header, following the '
"header's\n"
'colon, or it can be one or more indented statements on '
'subsequent\n'
'lines. Only the latter form of a suite can contain nested '
'compound\n'
"statements; the following is illegal, mostly because it wouldn't "
'be\n'
'clear to which "if" clause a following "else" clause would '
'belong:\n'
'\n'
' if test1: if test2: print(x)\n'
'\n'
'Also note that the semicolon binds tighter than the colon in '
'this\n'
'context, so that in the following example, either all or none of '
'the\n'
'"print()" calls are executed:\n'
'\n'
' if x < y < z: print(x); print(y); print(z)\n'
'\n'
'Summarizing:\n'
'\n'
' compound_stmt ::= if_stmt\n'
' | while_stmt\n'
' | for_stmt\n'
' | try_stmt\n'
' | with_stmt\n'
' | funcdef\n'
' | classdef\n'
' suite ::= stmt_list NEWLINE | NEWLINE INDENT '
'statement+ DEDENT\n'
' statement ::= stmt_list NEWLINE | compound_stmt\n'
' stmt_list ::= simple_stmt (";" simple_stmt)* [";"]\n'
'\n'
'Note that statements always end in a "NEWLINE" possibly followed '
'by a\n'
'"DEDENT". Also note that optional continuation clauses always '
'begin\n'
'with a keyword that cannot start a statement, thus there are no\n'
'ambiguities (the \'dangling "else"\' problem is solved in Python '
'by\n'
'requiring nested "if" statements to be indented).\n'
'\n'
'The formatting of the grammar rules in the following sections '
'places\n'
'each clause on a separate line for clarity.\n'
'\n'
'\n'
'The "if" statement\n'
'==================\n'
'\n'
'The "if" statement is used for conditional execution:\n'
'\n'
' if_stmt ::= "if" expression ":" suite\n'
' ( "elif" expression ":" suite )*\n'
' ["else" ":" suite]\n'
'\n'
'It selects exactly one of the suites by evaluating the '
'expressions one\n'
'by one until one is found to be true (see section *Boolean '
'operations*\n'
'for the definition of true and false); then that suite is '
'executed\n'
'(and no other part of the "if" statement is executed or '
'evaluated).\n'
'If all expressions are false, the suite of the "else" clause, '
'if\n'
'present, is executed.\n'
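'\n'
'For example (a minimal sketch):\n'
'\n'
'   >>> x = 0\n'
'   >>> if x < 0:\n'
"   ...     sign = 'negative'\n"
'   ... elif x == 0:\n'
"   ...     sign = 'zero'\n"
'   ... else:\n'
"   ...     sign = 'positive'\n"
'   ...\n'
'   >>> sign\n'
"   'zero'\n"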
'\n'
'\n'
'The "while" statement\n'
'=====================\n'
'\n'
'The "while" statement is used for repeated execution as long as '
'an\n'
'expression is true:\n'
'\n'
' while_stmt ::= "while" expression ":" suite\n'
' ["else" ":" suite]\n'
'\n'
'This repeatedly tests the expression and, if it is true, '
'executes the\n'
'first suite; if the expression is false (which may be the first '
'time\n'
'it is tested) the suite of the "else" clause, if present, is '
'executed\n'
'and the loop terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the '
'loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and goes '
'back\n'
'to testing the expression.\n'
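'\n'
'For example, the "else" clause runs when the loop ends without a\n'
'"break" (a minimal sketch):\n'
'\n'
'   >>> n = 3\n'
'   >>> while n > 0:\n'
'   ...     n = n - 1\n'
'   ... else:\n'
"   ...     print('loop finished, n =', n)\n"
'   ...\n'
'   loop finished, n = 0\n'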
'\n'
'\n'
'The "for" statement\n'
'===================\n'
'\n'
'The "for" statement is used to iterate over the elements of a '
'sequence\n'
'(such as a string, tuple or list) or other iterable object:\n'
'\n'
' for_stmt ::= "for" target_list "in" expression_list ":" '
'suite\n'
' ["else" ":" suite]\n'
'\n'
'The expression list is evaluated once; it should yield an '
'iterable\n'
'object. An iterator is created for the result of the\n'
'"expression_list". The suite is then executed once for each '
'item\n'
'provided by the iterator, in the order returned by the '
'iterator. Each\n'
'item in turn is assigned to the target list using the standard '
'rules\n'
'for assignments (see *Assignment statements*), and then the '
'suite is\n'
'executed. When the items are exhausted (which is immediately '
'when the\n'
'sequence is empty or an iterator raises a "StopIteration" '
'exception),\n'
'the suite in the "else" clause, if present, is executed, and the '
'loop\n'
'terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the '
'loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and '
'continues\n'
'with the next item, or with the "else" clause if there is no '
'next\n'
'item.\n'
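'\n'
'For example, combining "break" with the "else" clause (a minimal\n'
'sketch):\n'
'\n'
'   >>> for n in [3, 5, 4]:\n'
'   ...     if n % 2 == 0:\n'
"   ...         print('found an even number:', n)\n"
'   ...         break\n'
'   ... else:\n'
"   ...     print('no even number found')\n"
'   ...\n'
'   found an even number: 4\n'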
'\n'
'The for-loop makes assignments to the variable(s) in the target '
'list.\n'
'This overwrites all previous assignments to those variables '
'including\n'
'those made in the suite of the for-loop:\n'
'\n'
' for i in range(10):\n'
' print(i)\n'
' i = 5 # this will not affect the for-loop\n'
' # because i will be overwritten with '
'the next\n'
' # index in the range\n'
'\n'
'Names in the target list are not deleted when the loop is '
'finished,\n'
'but if the sequence is empty, they will not have been assigned '
'to at\n'
'all by the loop. Hint: the built-in function "range()" returns '
'an\n'
"iterator of integers suitable to emulate the effect of Pascal's "
'"for i\n'
':= a to b do"; e.g., "list(range(3))" returns the list "[0, 1, '
'2]".\n'
'\n'
'Note: There is a subtlety when the sequence is being modified by '
'the\n'
' loop (this can only occur for mutable sequences, i.e. lists). '
'An\n'
' internal counter is used to keep track of which item is used '
'next,\n'
' and this is incremented on each iteration. When this counter '
'has\n'
' reached the length of the sequence the loop terminates. This '
'means\n'
' that if the suite deletes the current (or a previous) item '
'from the\n'
' sequence, the next item will be skipped (since it gets the '
'index of\n'
' the current item which has already been treated). Likewise, '
'if the\n'
' suite inserts an item in the sequence before the current item, '
'the\n'
' current item will be treated again the next time through the '
'loop.\n'
' This can lead to nasty bugs that can be avoided by making a\n'
' temporary copy using a slice of the whole sequence, e.g.,\n'
'\n'
' for x in a[:]:\n'
' if x < 0: a.remove(x)\n'
'\n'
'\n'
'The "try" statement\n'
'===================\n'
'\n'
'The "try" statement specifies exception handlers and/or cleanup '
'code\n'
'for a group of statements:\n'
'\n'
' try_stmt ::= try1_stmt | try2_stmt\n'
' try1_stmt ::= "try" ":" suite\n'
' ("except" [expression ["as" identifier]] ":" '
'suite)+\n'
' ["else" ":" suite]\n'
' ["finally" ":" suite]\n'
' try2_stmt ::= "try" ":" suite\n'
' "finally" ":" suite\n'
'\n'
'The "except" clause(s) specify one or more exception handlers. '
'When no\n'
'exception occurs in the "try" clause, no exception handler is\n'
'executed. When an exception occurs in the "try" suite, a search '
'for an\n'
'exception handler is started. This search inspects the except '
'clauses\n'
'in turn until one is found that matches the exception. An '
'expression-\n'
'less except clause, if present, must be last; it matches any\n'
'exception. For an except clause with an expression, that '
'expression\n'
'is evaluated, and the clause matches the exception if the '
'resulting\n'
'object is "compatible" with the exception. An object is '
'compatible\n'
'with an exception if it is the class or a base class of the '
'exception\n'
'object or a tuple containing an item compatible with the '
'exception.\n'
'\n'
'If no except clause matches the exception, the search for an '
'exception\n'
'handler continues in the surrounding code and on the invocation '
'stack.\n'
'[1]\n'
'\n'
'If the evaluation of an expression in the header of an except '
'clause\n'
'raises an exception, the original search for a handler is '
'canceled and\n'
'a search starts for the new exception in the surrounding code '
'and on\n'
'the call stack (it is treated as if the entire "try" statement '
'raised\n'
'the exception).\n'
'\n'
'When a matching except clause is found, the exception is '
'assigned to\n'
'the target specified after the "as" keyword in that except '
'clause, if\n'
"present, and the except clause's suite is executed. All except\n"
'clauses must have an executable block. When the end of this '
'block is\n'
'reached, execution continues normally after the entire try '
'statement.\n'
'(This means that if two nested handlers exist for the same '
'exception,\n'
'and the exception occurs in the try clause of the inner handler, '
'the\n'
'outer handler will not handle the exception.)\n'
'\n'
'When an exception has been assigned using "as target", it is '
'cleared\n'
'at the end of the except clause. This is as if\n'
'\n'
' except E as N:\n'
' foo\n'
'\n'
'was translated to\n'
'\n'
' except E as N:\n'
' try:\n'
' foo\n'
' finally:\n'
' del N\n'
'\n'
'This means the exception must be assigned to a different name to '
'be\n'
'able to refer to it after the except clause. Exceptions are '
'cleared\n'
'because with the traceback attached to them, they form a '
'reference\n'
'cycle with the stack frame, keeping all locals in that frame '
'alive\n'
'until the next garbage collection occurs.\n'
'\n'
"Before an except clause's suite is executed, details about the\n"
'exception are stored in the "sys" module and can be accessed '
'via\n'
'"sys.exc_info()". "sys.exc_info()" returns a 3-tuple consisting '
'of the\n'
'exception class, the exception instance and a traceback object '
'(see\n'
'section *The standard type hierarchy*) identifying the point in '
'the\n'
'program where the exception occurred. "sys.exc_info()" values '
'are\n'
'restored to their previous values (before the call) when '
'returning\n'
'from a function that handled an exception.\n'
'\n'
'The optional "else" clause is executed if and when control flows '
'off\n'
'the end of the "try" clause. [2] Exceptions in the "else" clause '
'are\n'
'not handled by the preceding "except" clauses.\n'
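'\n'
'For example (an illustrative sketch):\n'
'\n'
'   >>> try:\n'
'   ...     result = 10 // 2\n'
'   ... except ZeroDivisionError as exc:\n'
"   ...     print('caught:', exc)\n"
'   ... else:\n'
"   ...     print('no exception, result =', result)\n"
'   ...\n'
'   no exception, result = 5\n'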
'\n'
'If "finally" is present, it specifies a \'cleanup\' handler. '
'The "try"\n'
'clause is executed, including any "except" and "else" clauses. '
'If an\n'
'exception occurs in any of the clauses and is not handled, the\n'
'exception is temporarily saved. The "finally" clause is '
'executed. If\n'
'there is a saved exception it is re-raised at the end of the '
'"finally"\n'
'clause. If the "finally" clause raises another exception, the '
'saved\n'
'exception is set as the context of the new exception. If the '
'"finally"\n'
'clause executes a "return" or "break" statement, the saved '
'exception\n'
'is discarded:\n'
'\n'
' >>> def f():\n'
' ... try:\n'
' ... 1/0\n'
' ... finally:\n'
' ... return 42\n'
' ...\n'
' >>> f()\n'
' 42\n'
'\n'
'The exception information is not available to the program '
'during\n'
'execution of the "finally" clause.\n'
'\n'
'When a "return", "break" or "continue" statement is executed in '
'the\n'
'"try" suite of a "try"..."finally" statement, the "finally" '
'clause is\n'
'also executed \'on the way out.\' A "continue" statement is '
'illegal in\n'
'the "finally" clause. (The reason is a problem with the current\n'
'implementation --- this restriction may be lifted in the '
'future).\n'
'\n'
'The return value of a function is determined by the last '
'"return"\n'
'statement executed. Since the "finally" clause always executes, '
'a\n'
'"return" statement executed in the "finally" clause will always '
'be the\n'
'last one executed:\n'
'\n'
' >>> def foo():\n'
' ... try:\n'
" ... return 'try'\n"
' ... finally:\n'
" ... return 'finally'\n"
' ...\n'
' >>> foo()\n'
" 'finally'\n"
'\n'
'Additional information on exceptions can be found in section\n'
'*Exceptions*, and information on using the "raise" statement to\n'
'generate exceptions may be found in section *The raise '
'statement*.\n'
'\n'
'\n'
'The "with" statement\n'
'====================\n'
'\n'
'The "with" statement is used to wrap the execution of a block '
'with\n'
'methods defined by a context manager (see section *With '
'Statement\n'
'Context Managers*). This allows common '
'"try"..."except"..."finally"\n'
'usage patterns to be encapsulated for convenient reuse.\n'
'\n'
' with_stmt ::= "with" with_item ("," with_item)* ":" suite\n'
' with_item ::= expression ["as" target]\n'
'\n'
'The execution of the "with" statement with one "item" proceeds '
'as\n'
'follows:\n'
'\n'
'1. The context expression (the expression given in the '
'"with_item")\n'
' is evaluated to obtain a context manager.\n'
'\n'
'2. The context manager\'s "__exit__()" is loaded for later use.\n'
'\n'
'3. The context manager\'s "__enter__()" method is invoked.\n'
'\n'
'4. If a target was included in the "with" statement, the return\n'
' value from "__enter__()" is assigned to it.\n'
'\n'
' Note: The "with" statement guarantees that if the '
'"__enter__()"\n'
' method returns without an error, then "__exit__()" will '
'always be\n'
' called. Thus, if an error occurs during the assignment to '
'the\n'
' target list, it will be treated the same as an error '
'occurring\n'
' within the suite would be. See step 6 below.\n'
'\n'
'5. The suite is executed.\n'
'\n'
'6. The context manager\'s "__exit__()" method is invoked. If '
'an\n'
' exception caused the suite to be exited, its type, value, '
'and\n'
' traceback are passed as arguments to "__exit__()". Otherwise, '
'three\n'
' "None" arguments are supplied.\n'
'\n'
' If the suite was exited due to an exception, and the return '
'value\n'
' from the "__exit__()" method was false, the exception is '
'reraised.\n'
' If the return value was true, the exception is suppressed, '
'and\n'
' execution continues with the statement following the "with"\n'
' statement.\n'
'\n'
' If the suite was exited for any reason other than an '
'exception, the\n'
' return value from "__exit__()" is ignored, and execution '
'proceeds\n'
' at the normal location for the kind of exit that was taken.\n'
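'\n'
'For example, a minimal context manager written as a class (the name\n'
'"Managed" is illustrative; see also section *With Statement Context\n'
'Managers*):\n'
'\n'
'   >>> class Managed:\n'
'   ...     def __enter__(self):\n'
"   ...         print('entering')\n"
'   ...         return self\n'
'   ...     def __exit__(self, exc_type, exc_value, traceback):\n'
"   ...         print('exiting')\n"
'   ...         return False    # do not suppress exceptions\n'
'   ...\n'
'   >>> with Managed() as m:\n'
"   ...     print('inside the suite')\n"
'   ...\n'
'   entering\n'
'   inside the suite\n'
'   exiting\n'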
'\n'
'With more than one item, the context managers are processed as '
'if\n'
'multiple "with" statements were nested:\n'
'\n'
' with A() as a, B() as b:\n'
' suite\n'
'\n'
'is equivalent to\n'
'\n'
' with A() as a:\n'
' with B() as b:\n'
' suite\n'
'\n'
'Changed in version 3.1: Support for multiple context '
'expressions.\n'
'\n'
'See also: **PEP 0343** - The "with" statement\n'
'\n'
' The specification, background, and examples for the Python '
'"with"\n'
' statement.\n'
'\n'
'\n'
'Function definitions\n'
'====================\n'
'\n'
'A function definition defines a user-defined function object '
'(see\n'
'section *The standard type hierarchy*):\n'
'\n'
' funcdef ::= [decorators] "def" funcname "(" '
'[parameter_list] ")" ["->" expression] ":" suite\n'
' decorators ::= decorator+\n'
' decorator ::= "@" dotted_name ["(" [parameter_list '
'[","]] ")"] NEWLINE\n'
' dotted_name ::= identifier ("." identifier)*\n'
' parameter_list ::= (defparameter ",")*\n'
' | "*" [parameter] ("," defparameter)* ["," '
'"**" parameter]\n'
' | "**" parameter\n'
' | defparameter [","] )\n'
' parameter ::= identifier [":" expression]\n'
' defparameter ::= parameter ["=" expression]\n'
' funcname ::= identifier\n'
'\n'
'A function definition is an executable statement. Its execution '
'binds\n'
'the function name in the current local namespace to a function '
'object\n'
'(a wrapper around the executable code for the function). This\n'
'function object contains a reference to the current global '
'namespace\n'
'as the global namespace to be used when the function is called.\n'
'\n'
'The function definition does not execute the function body; this '
'gets\n'
'executed only when the function is called. [3]\n'
'\n'
'A function definition may be wrapped by one or more *decorator*\n'
'expressions. Decorator expressions are evaluated when the '
'function is\n'
'defined, in the scope that contains the function definition. '
'The\n'
'result must be a callable, which is invoked with the function '
'object\n'
'as the only argument. The returned value is bound to the '
'function name\n'
'instead of the function object. Multiple decorators are applied '
'in\n'
'nested fashion. For example, the following code\n'
'\n'
' @f1(arg)\n'
' @f2\n'
' def func(): pass\n'
'\n'
'is equivalent to\n'
'\n'
' def func(): pass\n'
' func = f1(arg)(f2(func))\n'
'\n'
'When one or more *parameters* have the form *parameter* "="\n'
'*expression*, the function is said to have "default parameter '
'values."\n'
'For a parameter with a default value, the corresponding '
'*argument* may\n'
"be omitted from a call, in which case the parameter's default "
'value is\n'
'substituted. If a parameter has a default value, all following\n'
'parameters up until the ""*"" must also have a default value --- '
'this\n'
'is a syntactic restriction that is not expressed by the '
'grammar.\n'
'\n'
'**Default parameter values are evaluated from left to right when '
'the\n'
'function definition is executed.** This means that the '
'expression is\n'
'evaluated once, when the function is defined, and that the same '
'"pre-\n'
'computed" value is used for each call. This is especially '
'important\n'
'to understand when a default parameter is a mutable object, such '
'as a\n'
'list or a dictionary: if the function modifies the object (e.g. '
'by\n'
'appending an item to a list), the default value is in effect '
'modified.\n'
'This is generally not what was intended. A way around this is '
'to use\n'
'"None" as the default, and explicitly test for it in the body of '
'the\n'
'function, e.g.:\n'
'\n'
' def whats_on_the_telly(penguin=None):\n'
' if penguin is None:\n'
' penguin = []\n'
' penguin.append("property of the zoo")\n'
' return penguin\n'
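'\n'
'For comparison, a sketch of the pitfall itself (the function\n'
'"append_to" is illustrative):\n'
'\n'
'   >>> def append_to(item, bucket=[]):   # one list, created at def time\n'
'   ...     bucket.append(item)\n'
'   ...     return bucket\n'
'   ...\n'
'   >>> append_to(1)\n'
'   [1]\n'
'   >>> append_to(2)                      # same list object as before\n'
'   [1, 2]\n'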
'\n'
'Function call semantics are described in more detail in section\n'
'*Calls*. A function call always assigns values to all '
'parameters\n'
'mentioned in the parameter list, either from positional arguments, '
'from\n'
'keyword arguments, or from default values. If the form\n'
'""*identifier"" is present, it is initialized to a tuple '
'receiving any\n'
'excess positional parameters, defaulting to the empty tuple. If '
'the\n'
'form ""**identifier"" is present, it is initialized to a new\n'
'dictionary receiving any excess keyword arguments, defaulting to '
'a new\n'
'empty dictionary. Parameters after ""*"" or ""*identifier"" are\n'
'keyword-only parameters and may only be passed using keyword '
'arguments.\n'
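'\n'
'For example, a keyword-only parameter (a minimal sketch):\n'
'\n'
'   >>> def f(a, *, b):\n'
'   ...     return a + b\n'
'   ...\n'
'   >>> f(1, b=2)\n'
'   3\n'
'   >>> f(1, 2)\n'
'   Traceback (most recent call last):\n'
'     File "<stdin>", line 1, in ?\n'
'   TypeError: f() takes 1 positional argument but 2 were given\n'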
'\n'
'Parameters may have annotations of the form "": expression"" '
'following\n'
'the parameter name. Any parameter may have an annotation, even '
'those\n'
'of the form "*identifier" or "**identifier". Functions may have '
'a\n'
'"return" annotation of the form ""-> expression"" after the '
'parameter\n'
'list. These annotations can be any valid Python expression and '
'are\n'
'evaluated when the function definition is executed. Annotations '
'may\n'
'be evaluated in a different order than they appear in the source '
'code.\n'
'The presence of annotations does not change the semantics of a\n'
'function. The annotation values are available as values of a\n'
"dictionary keyed by the parameters' names in the "
'"__annotations__"\n'
'attribute of the function object.\n'
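'\n'
'For example (an illustrative sketch; the annotations shown carry no\n'
'special meaning):\n'
'\n'
'   >>> def scale(x: int, factor: float = 2.0) -> float:\n'
'   ...     return x * factor\n'
'   ...\n'
'   >>> sorted(scale.__annotations__)\n'
"   ['factor', 'return', 'x']\n"
"   >>> scale.__annotations__['return']\n"
"   <class 'float'>\n"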
'\n'
'It is also possible to create anonymous functions (functions not '
'bound\n'
'to a name), for immediate use in expressions. This uses lambda\n'
'expressions, described in section *Lambdas*. Note that the '
'lambda\n'
'expression is merely a shorthand for a simplified function '
'definition;\n'
'a function defined in a ""def"" statement can be passed around '
'or\n'
'assigned to another name just like a function defined by a '
'lambda\n'
'expression. The ""def"" form is actually more powerful since '
'it\n'
'allows the execution of multiple statements and annotations.\n'
'\n'
"**Programmer's note:** Functions are first-class objects. A "
'""def""\n'
'statement executed inside a function definition defines a local\n'
'function that can be returned or passed around. Free variables '
'used\n'
'in the nested function can access the local variables of the '
'function\n'
'containing the def. See section *Naming and binding* for '
'details.\n'
'\n'
'See also: **PEP 3107** - Function Annotations\n'
'\n'
' The original specification for function annotations.\n'
'\n'
'\n'
'Class definitions\n'
'=================\n'
'\n'
'A class definition defines a class object (see section *The '
'standard\n'
'type hierarchy*):\n'
'\n'
' classdef ::= [decorators] "class" classname [inheritance] '
'":" suite\n'
' inheritance ::= "(" [parameter_list] ")"\n'
' classname ::= identifier\n'
'\n'
'A class definition is an executable statement. The inheritance '
'list\n'
'usually gives a list of base classes (see *Customizing class '
'creation*\n'
'for more advanced uses), so each item in the list should '
'evaluate to a\n'
'class object which allows subclassing. Classes without an '
'inheritance\n'
'list inherit, by default, from the base class "object"; hence,\n'
'\n'
' class Foo:\n'
' pass\n'
'\n'
'is equivalent to\n'
'\n'
' class Foo(object):\n'
' pass\n'
'\n'
"The class's suite is then executed in a new execution frame "
'(see\n'
'*Naming and binding*), using a newly created local namespace and '
'the\n'
'original global namespace. (Usually, the suite contains mostly\n'
"function definitions.) When the class's suite finishes "
'execution, its\n'
'execution frame is discarded but its local namespace is saved. '
'[4] A\n'
'class object is then created using the inheritance list for the '
'base\n'
'classes and the saved local namespace for the attribute '
'dictionary.\n'
'The class name is bound to this class object in the original '
'local\n'
'namespace.\n'
'\n'
'Class creation can be customized heavily using *metaclasses*.\n'
'\n'
'Classes can also be decorated: just like when decorating '
'functions,\n'
'\n'
' @f1(arg)\n'
' @f2\n'
' class Foo: pass\n'
'\n'
'is equivalent to\n'
'\n'
' class Foo: pass\n'
' Foo = f1(arg)(f2(Foo))\n'
'\n'
'The evaluation rules for the decorator expressions are the same '
'as for\n'
'function decorators. The result must be a class object, which '
'is then\n'
'bound to the class name.\n'
'\n'
"**Programmer's note:** Variables defined in the class definition "
'are\n'
'class attributes; they are shared by instances. Instance '
'attributes\n'
'can be set in a method with "self.name = value". Both class '
'and\n'
'instance attributes are accessible through the notation '
'""self.name"",\n'
'and an instance attribute hides a class attribute with the same '
'name\n'
'when accessed in this way. Class attributes can be used as '
'defaults\n'
'for instance attributes, but using mutable values there can lead '
'to\n'
'unexpected results. *Descriptors* can be used to create '
'instance\n'
'variables with different implementation details.\n'
'\n'
'See also: **PEP 3115** - Metaclasses in Python 3 **PEP 3129** -\n'
' Class Decorators\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] The exception is propagated to the invocation stack unless\n'
' there is a "finally" clause which happens to raise another\n'
' exception. That new exception causes the old one to be '
'lost.\n'
'\n'
'[2] Currently, control "flows off the end" except in the case '
'of\n'
' an exception or the execution of a "return", "continue", or\n'
' "break" statement.\n'
'\n'
'[3] A string literal appearing as the first statement in the\n'
' function body is transformed into the function\'s "__doc__"\n'
" attribute and therefore the function's *docstring*.\n"
'\n'
'[4] A string literal appearing as the first statement in the '
'class\n'
' body is transformed into the namespace\'s "__doc__" item '
'and\n'
" therefore the class's *docstring*.\n",
'context-managers': '\n'
'With Statement Context Managers\n'
'*******************************\n'
'\n'
'A *context manager* is an object that defines the '
'runtime context to\n'
'be established when executing a "with" statement. The '
'context manager\n'
'handles the entry into, and the exit from, the desired '
'runtime context\n'
'for the execution of the block of code. Context '
'managers are normally\n'
'invoked using the "with" statement (described in section '
'*The with\n'
'statement*), but can also be used by directly invoking '
'their methods.\n'
'\n'
'Typical uses of context managers include saving and '
'restoring various\n'
'kinds of global state, locking and unlocking resources, '
'closing opened\n'
'files, etc.\n'
'\n'
'For more information on context managers, see *Context '
'Manager Types*.\n'
'\n'
'object.__enter__(self)\n'
'\n'
' Enter the runtime context related to this object. The '
'"with"\n'
" statement will bind this method's return value to the "
'target(s)\n'
' specified in the "as" clause of the statement, if '
'any.\n'
'\n'
'object.__exit__(self, exc_type, exc_value, traceback)\n'
'\n'
' Exit the runtime context related to this object. The '
'parameters\n'
' describe the exception that caused the context to be '
'exited. If the\n'
' context was exited without an exception, all three '
'arguments will\n'
' be "None".\n'
'\n'
' If an exception is supplied, and the method wishes to '
'suppress the\n'
' exception (i.e., prevent it from being propagated), '
'it should\n'
' return a true value. Otherwise, the exception will be '
'processed\n'
' normally upon exit from this method.\n'
'\n'
' Note that "__exit__()" methods should not reraise the '
'passed-in\n'
" exception; this is the caller's responsibility.\n"
'\n'
'See also: **PEP 0343** - The "with" statement\n'
'\n'
' The specification, background, and examples for the '
'Python "with"\n'
' statement.\n',
'continue': '\n'
'The "continue" statement\n'
'************************\n'
'\n'
' continue_stmt ::= "continue"\n'
'\n'
'"continue" may only occur syntactically nested in a "for" or '
'"while"\n'
'loop, but not nested in a function or class definition or '
'"finally"\n'
'clause within that loop. It continues with the next cycle of '
'the\n'
'nearest enclosing loop.\n'
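'\n'
'For example (a minimal sketch):\n'
'\n'
'   >>> for n in range(4):\n'
'   ...     if n % 2:\n'
'   ...         continue       # skip odd numbers\n'
'   ...     print(n)\n'
'   ...\n'
'   0\n'
'   2\n'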
'\n'
'When "continue" passes control out of a "try" statement with a\n'
'"finally" clause, that "finally" clause is executed before '
'really\n'
'starting the next loop cycle.\n',
'conversions': '\n'
'Arithmetic conversions\n'
'**********************\n'
'\n'
'When a description of an arithmetic operator below uses the '
'phrase\n'
'"the numeric arguments are converted to a common type," this '
'means\n'
'that the operator implementation for built-in types works as '
'follows:\n'
'\n'
'* If either argument is a complex number, the other is '
'converted to\n'
' complex;\n'
'\n'
'* otherwise, if either argument is a floating point number, '
'the\n'
' other is converted to floating point;\n'
'\n'
'* otherwise, both must be integers and no conversion is '
'necessary.\n'
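'\n'
'For example (a minimal sketch):\n'
'\n'
'   >>> 1 + 2.0      # the int is converted to float\n'
'   3.0\n'
'   >>> 2.0 + 1j     # the float is converted to complex\n'
'   (2+1j)\n'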
'\n'
'Some additional rules apply for certain operators (e.g., a '
'string as a\n'
"left argument to the '%' operator). Extensions must define "
'their own\n'
'conversion behavior.\n',
'customization': '\n'
'Basic customization\n'
'*******************\n'
'\n'
'object.__new__(cls[, ...])\n'
'\n'
' Called to create a new instance of class *cls*. '
'"__new__()" is a\n'
' static method (special-cased so you need not declare it '
'as such)\n'
' that takes the class of which an instance was requested '
'as its\n'
' first argument. The remaining arguments are those '
'passed to the\n'
' object constructor expression (the call to the class). '
'The return\n'
' value of "__new__()" should be the new object instance '
'(usually an\n'
' instance of *cls*).\n'
'\n'
' Typical implementations create a new instance of the '
'class by\n'
' invoking the superclass\'s "__new__()" method using\n'
' "super(currentclass, cls).__new__(cls[, ...])" with '
'appropriate\n'
' arguments and then modifying the newly-created instance '
'as\n'
' necessary before returning it.\n'
'\n'
' If "__new__()" returns an instance of *cls*, then the '
'new\n'
' instance\'s "__init__()" method will be invoked like\n'
' "__init__(self[, ...])", where *self* is the new '
'instance and the\n'
' remaining arguments are the same as were passed to '
'"__new__()".\n'
'\n'
' If "__new__()" does not return an instance of *cls*, '
'then the new\n'
' instance\'s "__init__()" method will not be invoked.\n'
'\n'
' "__new__()" is intended mainly to allow subclasses of '
'immutable\n'
' types (like int, str, or tuple) to customize instance '
'creation. It\n'
' is also commonly overridden in custom metaclasses in '
'order to\n'
' customize class creation.\n'
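'\n'
' For example, customizing creation of an immutable subclass (the\n'
' class "UpperStr" is an illustrative sketch):\n'
'\n'
'   >>> class UpperStr(str):\n'
'   ...     def __new__(cls, value):\n'
'   ...         return super().__new__(cls, value.upper())\n'
'   ...\n'
"   >>> UpperStr('spam')\n"
"   'SPAM'\n"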
'\n'
'object.__init__(self[, ...])\n'
'\n'
' Called after the instance has been created (by '
'"__new__()"), but\n'
' before it is returned to the caller. The arguments are '
'those\n'
' passed to the class constructor expression. If a base '
'class has an\n'
' "__init__()" method, the derived class\'s "__init__()" '
'method, if\n'
' any, must explicitly call it to ensure proper '
'initialization of the\n'
' base class part of the instance; for example:\n'
' "BaseClass.__init__(self, [args...])".\n'
'\n'
' Because "__new__()" and "__init__()" work together in '
'constructing\n'
' objects ("__new__()" to create it, and "__init__()" to '
'customise\n'
' it), no non-"None" value may be returned by '
'"__init__()"; doing so\n'
' will cause a "TypeError" to be raised at runtime.\n'
'\n'
'object.__del__(self)\n'
'\n'
' Called when the instance is about to be destroyed. This '
'is also\n'
' called a destructor. If a base class has a "__del__()" '
'method, the\n'
' derived class\'s "__del__()" method, if any, must '
'explicitly call it\n'
' to ensure proper deletion of the base class part of the '
'instance.\n'
' Note that it is possible (though not recommended!) for '
'the\n'
' "__del__()" method to postpone destruction of the '
'instance by\n'
' creating a new reference to it. It may then be called '
'at a later\n'
' time when this new reference is deleted. It is not '
'guaranteed that\n'
' "__del__()" methods are called for objects that still '
'exist when\n'
' the interpreter exits.\n'
'\n'
' Note: "del x" doesn\'t directly call "x.__del__()" --- '
'the former\n'
' decrements the reference count for "x" by one, and the '
'latter is\n'
' only called when "x"\'s reference count reaches zero. '
'Some common\n'
' situations that may prevent the reference count of an '
'object from\n'
' going to zero include: circular references between '
'objects (e.g.,\n'
' a doubly-linked list or a tree data structure with '
'parent and\n'
' child pointers); a reference to the object on the '
'stack frame of\n'
' a function that caught an exception (the traceback '
'stored in\n'
' "sys.exc_info()[2]" keeps the stack frame alive); or a '
'reference\n'
' to the object on the stack frame that raised an '
'unhandled\n'
' exception in interactive mode (the traceback stored '
'in\n'
' "sys.last_traceback" keeps the stack frame alive). '
'The first\n'
' situation can only be remedied by explicitly breaking '
'the cycles;\n'
' the second can be resolved by freeing the reference to '
'the\n'
' traceback object when it is no longer useful, and the '
'third can\n'
' be resolved by storing "None" in "sys.last_traceback". '
'Circular\n'
' references which are garbage are detected and cleaned '
'up when the\n'
" cyclic garbage collector is enabled (it's on by "
'default). Refer\n'
' to the documentation for the "gc" module for more '
'information\n'
' about this topic.\n'
'\n'
' Warning: Due to the precarious circumstances under '
'which\n'
' "__del__()" methods are invoked, exceptions that occur '
'during\n'
' their execution are ignored, and a warning is printed '
'to\n'
' "sys.stderr" instead. Also, when "__del__()" is '
'invoked in\n'
' response to a module being deleted (e.g., when '
'execution of the\n'
' program is done), other globals referenced by the '
'"__del__()"\n'
' method may already have been deleted or in the process '
'of being\n'
' torn down (e.g. the import machinery shutting down). '
'For this\n'
' reason, "__del__()" methods should do the absolute '
'minimum needed\n'
' to maintain external invariants. Starting with '
'version 1.5,\n'
' Python guarantees that globals whose name begins with '
'a single\n'
' underscore are deleted from their module before other '
'globals are\n'
' deleted; if no other references to such globals exist, '
'this may\n'
' help in assuring that imported modules are still '
'available at the\n'
' time when the "__del__()" method is called.\n'
'\n'
'object.__repr__(self)\n'
'\n'
' Called by the "repr()" built-in function to compute the '
'"official"\n'
' string representation of an object. If at all possible, '
'this\n'
' should look like a valid Python expression that could be '
'used to\n'
' recreate an object with the same value (given an '
'appropriate\n'
' environment). If this is not possible, a string of the '
'form\n'
' "<...some useful description...>" should be returned. '
'The return\n'
' value must be a string object. If a class defines '
'"__repr__()" but\n'
' not "__str__()", then "__repr__()" is also used when an '
'"informal"\n'
' string representation of instances of that class is '
'required.\n'
'\n'
' This is typically used for debugging, so it is important '
'that the\n'
' representation is information-rich and unambiguous.\n'
'\n'
'object.__str__(self)\n'
'\n'
' Called by "str(object)" and the built-in functions '
'"format()" and\n'
' "print()" to compute the "informal" or nicely printable '
'string\n'
' representation of an object. The return value must be a '
'*string*\n'
' object.\n'
'\n'
' This method differs from "object.__repr__()" in that '
'there is no\n'
' expectation that "__str__()" return a valid Python '
'expression: a\n'
' more convenient or concise representation can be used.\n'
'\n'
' The default implementation defined by the built-in type '
'"object"\n'
' calls "object.__repr__()".\n'
'\n'
'object.__bytes__(self)\n'
'\n'
' Called by "bytes()" to compute a byte-string '
'representation of an\n'
' object. This should return a "bytes" object.\n'
'\n'
'object.__format__(self, format_spec)\n'
'\n'
' Called by the "format()" built-in function (and by '
'extension, the\n'
' "str.format()" method of class "str") to produce a '
'"formatted"\n'
' string representation of an object. The "format_spec" '
'argument is a\n'
' string that contains a description of the formatting '
'options\n'
' desired. The interpretation of the "format_spec" '
'argument is up to\n'
' the type implementing "__format__()", however most '
'classes will\n'
' either delegate formatting to one of the built-in types, '
'or use a\n'
' similar formatting option syntax.\n'
'\n'
' See *Format Specification Mini-Language* for a '
'description of the\n'
' standard formatting syntax.\n'
'\n'
' The return value must be a string object.\n'
'\n'
' Changed in version 3.4: The __format__ method of '
'"object" itself\n'
' raises a "TypeError" if passed any non-empty string.\n'
'\n'
'object.__lt__(self, other)\n'
'object.__le__(self, other)\n'
'object.__eq__(self, other)\n'
'object.__ne__(self, other)\n'
'object.__gt__(self, other)\n'
'object.__ge__(self, other)\n'
'\n'
' These are the so-called "rich comparison" methods. The\n'
' correspondence between operator symbols and method names '
'is as\n'
' follows: "x<y" calls "x.__lt__(y)", "x<=y" calls '
'"x.__le__(y)",\n'
' "x==y" calls "x.__eq__(y)", "x!=y" calls "x.__ne__(y)", '
'"x>y" calls\n'
' "x.__gt__(y)", and "x>=y" calls "x.__ge__(y)".\n'
'\n'
' A rich comparison method may return the singleton '
'"NotImplemented"\n'
' if it does not implement the operation for a given pair '
'of\n'
' arguments. By convention, "False" and "True" are '
'returned for a\n'
' successful comparison. However, these methods can return '
'any value,\n'
' so if the comparison operator is used in a Boolean '
'context (e.g.,\n'
' in the condition of an "if" statement), Python will call '
'"bool()"\n'
' on the value to determine if the result is true or '
'false.\n'
'\n'
' By default, "__ne__()" delegates to "__eq__()" and '
'inverts the\n'
' result unless it is "NotImplemented". There are no '
'other implied\n'
' relationships among the comparison operators, for '
'example, the\n'
' truth of "(x<y or x==y)" does not imply "x<=y". To '
'automatically\n'
' generate ordering operations from a single root '
'operation, see\n'
' "functools.total_ordering()".\n'
'\n'
' See the paragraph on "__hash__()" for some important '
'notes on\n'
' creating *hashable* objects which support custom '
'comparison\n'
' operations and are usable as dictionary keys.\n'
'\n'
' There are no swapped-argument versions of these methods '
'(to be used\n'
' when the left argument does not support the operation '
'but the right\n'
' argument does); rather, "__lt__()" and "__gt__()" are '
"each other's\n"
' reflection, "__le__()" and "__ge__()" are each other\'s '
'reflection,\n'
' and "__eq__()" and "__ne__()" are their own reflection. '
'If the\n'
" operands are of different types, and right operand's "
'type is a\n'
" direct or indirect subclass of the left operand's type, "
'the\n'
' reflected method of the right operand has priority, '
'otherwise the\n'
" left operand's method has priority. Virtual subclassing "
'is not\n'
' considered.\n'
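'\n'
' For example, returning "NotImplemented" for unsupported operand\n'
' types (the class "Money" is an illustrative sketch):\n'
'\n'
'   >>> class Money:\n'
'   ...     def __init__(self, cents):\n'
'   ...         self.cents = cents\n'
'   ...     def __eq__(self, other):\n'
'   ...         if not isinstance(other, Money):\n'
'   ...             return NotImplemented\n'
'   ...         return self.cents == other.cents\n'
'   ...\n'
'   >>> Money(100) == Money(100)\n'
'   True\n'
'   >>> Money(100) == 100    # falls back to identity comparison\n'
'   False\n'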
'\n'
'object.__hash__(self)\n'
'\n'
' Called by built-in function "hash()" and for operations '
'on members\n'
' of hashed collections including "set", "frozenset", and '
'"dict".\n'
' "__hash__()" should return an integer. The only '
'required property\n'
' is that objects which compare equal have the same hash '
'value; it is\n'
' advised to somehow mix together (e.g. using exclusive '
'or) the hash\n'
' values for the components of the object that also play a '
'part in\n'
' comparison of objects.\n'
'\n'
' Note: "hash()" truncates the value returned from an '
"object's\n"
' custom "__hash__()" method to the size of a '
'"Py_ssize_t". This\n'
' is typically 8 bytes on 64-bit builds and 4 bytes on '
'32-bit\n'
' builds. If an object\'s "__hash__()" must '
'interoperate on builds\n'
' of different bit sizes, be sure to check the width on '
'all\n'
' supported builds. An easy way to do this is with '
'"python -c\n'
' "import sys; print(sys.hash_info.width)"".\n'
'\n'
' If a class does not define an "__eq__()" method it '
'should not\n'
' define a "__hash__()" operation either; if it defines '
'"__eq__()"\n'
' but not "__hash__()", its instances will not be usable '
'as items in\n'
' hashable collections. If a class defines mutable '
'objects and\n'
' implements an "__eq__()" method, it should not '
'implement\n'
' "__hash__()", since the implementation of hashable '
'collections\n'
" requires that a key's hash value is immutable (if the "
"object's hash\n"
' value changes, it will be in the wrong hash bucket).\n'
'\n'
' User-defined classes have "__eq__()" and "__hash__()" '
'methods by\n'
' default; with them, all objects compare unequal (except '
'with\n'
' themselves) and "x.__hash__()" returns an appropriate '
'value such\n'
' that "x == y" implies both that "x is y" and "hash(x) == '
'hash(y)".\n'
'\n'
' A class that overrides "__eq__()" and does not define '
'"__hash__()"\n'
' will have its "__hash__()" implicitly set to "None". '
'When the\n'
' "__hash__()" method of a class is "None", instances of '
'the class\n'
' will raise an appropriate "TypeError" when a program '
'attempts to\n'
' retrieve their hash value, and will also be correctly '
'identified as\n'
' unhashable when checking "isinstance(obj, '
'collections.Hashable)".\n'
'\n'
' If a class that overrides "__eq__()" needs to retain '
'the\n'
' implementation of "__hash__()" from a parent class, the '
'interpreter\n'
' must be told this explicitly by setting "__hash__ =\n'
' <ParentClass>.__hash__".\n'
'\n'
' If a class that does not override "__eq__()" wishes to '
'suppress\n'
' hash support, it should include "__hash__ = None" in the '
'class\n'
' definition. A class which defines its own "__hash__()" '
'that\n'
' explicitly raises a "TypeError" would be incorrectly '
'identified as\n'
' hashable by an "isinstance(obj, collections.Hashable)" '
'call.\n'
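'\n'
' For example, a class defining "__eq__()" together with a matching\n'
' "__hash__()" (the class "Color" is an illustrative sketch):\n'
'\n'
'   >>> class Color:\n'
'   ...     def __init__(self, name):\n'
'   ...         self.name = name\n'
'   ...     def __eq__(self, other):\n'
'   ...         return isinstance(other, Color) and self.name == other.name\n'
'   ...     def __hash__(self):\n'
'   ...         return hash(self.name)\n'
'   ...\n'
"   >>> d = {Color('red'): 1}\n"
"   >>> d[Color('red')]      # equal objects hash equal\n"
'   1\n'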
'\n'
' Note: By default, the "__hash__()" values of str, bytes '
'and\n'
' datetime objects are "salted" with an unpredictable '
'random value.\n'
' Although they remain constant within an individual '
'Python\n'
' process, they are not predictable between repeated '
'invocations of\n'
' Python. This is intended to provide protection against '
'a denial-\n'
' of-service caused by carefully-chosen inputs that '
'exploit the\n'
' worst case performance of a dict insertion, O(n^2) '
'complexity.\n'
' See '
'http://www.ocert.org/advisories/ocert-2011-003.html for\n'
' details. Changing hash values affects the iteration '
'order of\n'
' dicts, sets and other mappings. Python has never made '
'guarantees\n'
' about this ordering (and it typically varies between '
'32-bit and\n'
' 64-bit builds). See also "PYTHONHASHSEED".\n'
'\n'
' Changed in version 3.3: Hash randomization is enabled by '
'default.\n'
'\n'
'object.__bool__(self)\n'
'\n'
' Called to implement truth value testing and the built-in '
'operation\n'
' "bool()"; should return "False" or "True". When this '
'method is not\n'
' defined, "__len__()" is called, if it is defined, and '
'the object is\n'
' considered true if its result is nonzero. If a class '
'defines\n'
' neither "__len__()" nor "__bool__()", all its instances '
'are\n'
' considered true.\n',
'debugger': '\n'
'"pdb" --- The Python Debugger\n'
'*****************************\n'
'\n'
'**Source code:** Lib/pdb.py\n'
'\n'
'======================================================================\n'
'\n'
'The module "pdb" defines an interactive source code debugger '
'for\n'
'Python programs. It supports setting (conditional) breakpoints '
'and\n'
'single stepping at the source line level, inspection of stack '
'frames,\n'
'source code listing, and evaluation of arbitrary Python code in '
'the\n'
'context of any stack frame. It also supports post-mortem '
'debugging\n'
'and can be called under program control.\n'
'\n'
'The debugger is extensible -- it is actually defined as the '
'class\n'
'"Pdb". This is currently undocumented but easily understood by '
'reading\n'
'the source. The extension interface uses the modules "bdb" and '
'"cmd".\n'
'\n'
'The debugger\'s prompt is "(Pdb)". Typical usage to run a '
'program under\n'
'control of the debugger is:\n'
'\n'
' >>> import pdb\n'
' >>> import mymodule\n'
" >>> pdb.run('mymodule.test()')\n"
' > <string>(0)?()\n'
' (Pdb) continue\n'
' > <string>(1)?()\n'
' (Pdb) continue\n'
" NameError: 'spam'\n"
' > <string>(1)?()\n'
' (Pdb)\n'
'\n'
'Changed in version 3.3: Tab-completion via the "readline" module '
'is\n'
'available for commands and command arguments, e.g. the current '
'global\n'
'and local names are offered as arguments of the "p" command.\n'
'\n'
'"pdb.py" can also be invoked as a script to debug other '
'scripts. For\n'
'example:\n'
'\n'
' python3 -m pdb myscript.py\n'
'\n'
'When invoked as a script, pdb will automatically enter '
'post-mortem\n'
'debugging if the program being debugged exits abnormally. After '
'post-\n'
'mortem debugging (or after normal exit of the program), pdb '
'will\n'
"restart the program. Automatic restarting preserves pdb's state "
'(such\n'
'as breakpoints) and in most cases is more useful than quitting '
'the\n'
"debugger upon program's exit.\n"
'\n'
'New in version 3.2: "pdb.py" now accepts a "-c" option that '
'executes\n'
'commands as if given in a ".pdbrc" file, see *Debugger '
'Commands*.\n'
'\n'
'The typical usage to break into the debugger from a running '
'program is\n'
'to insert\n'
'\n'
' import pdb; pdb.set_trace()\n'
'\n'
'at the location you want to break into the debugger. You can '
'then\n'
'step through the code following this statement, and continue '
'running\n'
'without the debugger using the "continue" command.\n'
'\n'
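'For example, a minimal sketch (the "divide()" function is\n'
'hypothetical):\n'
'\n'
'   def divide(a, b):\n'
'       import pdb; pdb.set_trace()  # execution pauses here\n'
'       return a / b\n'
'\n'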
'The typical usage to inspect a crashed program is:\n'
'\n'
' >>> import pdb\n'
' >>> import mymodule\n'
' >>> mymodule.test()\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in ?\n'
' File "./mymodule.py", line 4, in test\n'
' test2()\n'
' File "./mymodule.py", line 3, in test2\n'
' print(spam)\n'
' NameError: spam\n'
' >>> pdb.pm()\n'
' > ./mymodule.py(3)test2()\n'
' -> print(spam)\n'
' (Pdb)\n'
'\n'
'The module defines the following functions; each enters the '
'debugger\n'
'in a slightly different way:\n'
'\n'
'pdb.run(statement, globals=None, locals=None)\n'
'\n'
' Execute the *statement* (given as a string or a code object) '
'under\n'
' debugger control. The debugger prompt appears before any '
'code is\n'
' executed; you can set breakpoints and type "continue", or you '
'can\n'
' step through the statement using "step" or "next" (all these\n'
' commands are explained below). The optional *globals* and '
'*locals*\n'
' arguments specify the environment in which the code is '
'executed; by\n'
' default the dictionary of the module "__main__" is used. '
'(See the\n'
' explanation of the built-in "exec()" or "eval()" functions.)\n'
'\n'
'pdb.runeval(expression, globals=None, locals=None)\n'
'\n'
' Evaluate the *expression* (given as a string or a code '
'object)\n'
' under debugger control. When "runeval()" returns, it returns '
'the\n'
' value of the expression. Otherwise this function is similar '
'to\n'
' "run()".\n'
'\n'
'pdb.runcall(function, *args, **kwds)\n'
'\n'
' Call the *function* (a function or method object, not a '
'string)\n'
' with the given arguments. When "runcall()" returns, it '
'returns\n'
' whatever the function call returned. The debugger prompt '
'appears\n'
' as soon as the function is entered.\n'
'\n'
'pdb.set_trace()\n'
'\n'
' Enter the debugger at the calling stack frame. This is '
'useful to\n'
' hard-code a breakpoint at a given point in a program, even if '
'the\n'
' code is not otherwise being debugged (e.g. when an assertion\n'
' fails).\n'
'\n'
'pdb.post_mortem(traceback=None)\n'
'\n'
' Enter post-mortem debugging of the given *traceback* object. '
'If no\n'
' *traceback* is given, it uses the one of the exception that '
'is\n'
' currently being handled (an exception must be being handled '
'if the\n'
' default is to be used).\n'
'\n'
'pdb.pm()\n'
'\n'
' Enter post-mortem debugging of the traceback found in\n'
' "sys.last_traceback".\n'
'\n'
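'For example, a brief sketch of "runcall()" (the "add()" function is\n'
'hypothetical):\n'
'\n'
'   >>> import pdb\n'
'   >>> def add(a, b):\n'
'   ...     return a + b\n'
'   >>> pdb.runcall(add, 1, 2)  # the (Pdb) prompt appears inside add()\n'
'\n'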
'The "run*" functions and "set_trace()" are aliases for '
'instantiating\n'
'the "Pdb" class and calling the method of the same name. If you '
'want\n'
'to access further features, you have to do this yourself:\n'
'\n'
"class pdb.Pdb(completekey='tab', stdin=None, stdout=None, "
'skip=None, nosigint=False)\n'
'\n'
' "Pdb" is the debugger class.\n'
'\n'
' The *completekey*, *stdin* and *stdout* arguments are passed '
'to the\n'
' underlying "cmd.Cmd" class; see the description there.\n'
'\n'
' The *skip* argument, if given, must be an iterable of '
'glob-style\n'
' module name patterns. The debugger will not step into frames '
'that\n'
' originate in a module that matches one of these patterns. '
'[1]\n'
'\n'
' By default, Pdb sets a handler for the SIGINT signal (which '
'is sent\n'
' when the user presses "Ctrl-C" on the console) when you give '
'a\n'
' "continue" command. This allows you to break into the '
'debugger\n'
' again by pressing "Ctrl-C". If you want Pdb not to touch '
'the\n'
'   SIGINT handler, set *nosigint* to true.\n'
'\n'
' Example call to enable tracing with *skip*:\n'
'\n'
" import pdb; pdb.Pdb(skip=['django.*']).set_trace()\n"
'\n'
' New in version 3.1: The *skip* argument.\n'
'\n'
' New in version 3.2: The *nosigint* argument. Previously, a '
'SIGINT\n'
' handler was never set by Pdb.\n'
'\n'
' run(statement, globals=None, locals=None)\n'
' runeval(expression, globals=None, locals=None)\n'
' runcall(function, *args, **kwds)\n'
' set_trace()\n'
'\n'
' See the documentation for the functions explained above.\n'
'\n'
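'   For instance, a minimal sketch of instantiating "Pdb" directly\n'
'   (the skip pattern is an arbitrary example):\n'
'\n'
'      import pdb\n'
'      debugger = pdb.Pdb(skip=["importlib*"])\n'
'      debugger.set_trace()  # enter the debugger at this frame\n'
'\n'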
'\n'
'Debugger Commands\n'
'=================\n'
'\n'
'The commands recognized by the debugger are listed below. Most\n'
'commands can be abbreviated to one or two letters as indicated; '
'e.g.\n'
'"h(elp)" means that either "h" or "help" can be used to enter '
'the help\n'
'command (but not "he" or "hel", nor "H" or "Help" or "HELP").\n'
'Arguments to commands must be separated by whitespace (spaces '
'or\n'
'tabs). Optional arguments are enclosed in square brackets '
'("[]") in\n'
'the command syntax; the square brackets must not be typed.\n'
'Alternatives in the command syntax are separated by a vertical '
'bar\n'
'("|").\n'
'\n'
'Entering a blank line repeats the last command entered. '
'Exception: if\n'
'the last command was a "list" command, the next 11 lines are '
'listed.\n'
'\n'
"Commands that the debugger doesn't recognize are assumed to be "
'Python\n'
'statements and are executed in the context of the program being\n'
'debugged. Python statements can also be prefixed with an '
'exclamation\n'
'point ("!"). This is a powerful way to inspect the program '
'being\n'
'debugged; it is even possible to change a variable or call a '
'function.\n'
'When an exception occurs in such a statement, the exception name '
'is\n'
"printed but the debugger's state is not changed.\n"
'\n'
'The debugger supports *aliases*. Aliases can have parameters,\n'
'which allows a certain level of adaptability to the context under\n'
'examination.\n'
'\n'
'Multiple commands may be entered on a single line, separated by '
'";;".\n'
'(A single ";" is not used as it is the separator for multiple '
'commands\n'
'in a line that is passed to the Python parser.) No intelligence '
'is\n'
'applied to separating the commands; the input is split at the '
'first\n'
'";;" pair, even if it is in the middle of a quoted string.\n'
'\n'
'If a file ".pdbrc" exists in the user\'s home directory or in '
'the\n'
'current directory, it is read in and executed as if it had been '
'typed\n'
'at the debugger prompt. This is particularly useful for '
'aliases. If\n'
'both files exist, the one in the home directory is read first '
'and\n'
'aliases defined there can be overridden by the local file.\n'
'\n'
'Changed in version 3.2: ".pdbrc" can now contain commands that\n'
'continue debugging, such as "continue" or "next". Previously, '
'these\n'
'commands had no effect.\n'
'\n'
'h(elp) [command]\n'
'\n'
' Without argument, print the list of available commands. With '
'a\n'
' *command* as argument, print help about that command. "help '
'pdb"\n'
' displays the full documentation (the docstring of the "pdb"\n'
' module). Since the *command* argument must be an identifier, '
'"help\n'
' exec" must be entered to get help on the "!" command.\n'
'\n'
'w(here)\n'
'\n'
' Print a stack trace, with the most recent frame at the '
'bottom. An\n'
' arrow indicates the current frame, which determines the '
'context of\n'
' most commands.\n'
'\n'
'd(own) [count]\n'
'\n'
' Move the current frame *count* (default one) levels down in '
'the\n'
' stack trace (to a newer frame).\n'
'\n'
'u(p) [count]\n'
'\n'
' Move the current frame *count* (default one) levels up in the '
'stack\n'
' trace (to an older frame).\n'
'\n'
'b(reak) [([filename:]lineno | function) [, condition]]\n'
'\n'
' With a *lineno* argument, set a break there in the current '
'file.\n'
' With a *function* argument, set a break at the first '
'executable\n'
' statement within that function. The line number may be '
'prefixed\n'
' with a filename and a colon, to specify a breakpoint in '
'another\n'
" file (probably one that hasn't been loaded yet). The file "
'is\n'
' searched on "sys.path". Note that each breakpoint is '
'assigned a\n'
' number to which all the other breakpoint commands refer.\n'
'\n'
' If a second argument is present, it is an expression which '
'must\n'
' evaluate to true before the breakpoint is honored.\n'
'\n'
' Without argument, list all breaks, including for each '
'breakpoint,\n'
' the number of times that breakpoint has been hit, the '
'current\n'
' ignore count, and the associated condition if any.\n'
'\n'
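'   A hypothetical session illustrating these forms (the file and\n'
'   function names are placeholders):\n'
'\n'
'      (Pdb) break 14\n'
'      (Pdb) break mymodule.py:3\n'
'      (Pdb) break test2, spam is None\n'
'      (Pdb) break\n'
'\n'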
'tbreak [([filename:]lineno | function) [, condition]]\n'
'\n'
' Temporary breakpoint, which is removed automatically when it '
'is\n'
' first hit. The arguments are the same as for "break".\n'
'\n'
'cl(ear) [filename:lineno | bpnumber [bpnumber ...]]\n'
'\n'
' With a *filename:lineno* argument, clear all the breakpoints '
'at\n'
' this line. With a space separated list of breakpoint numbers, '
'clear\n'
' those breakpoints. Without argument, clear all breaks (but '
'first\n'
' ask confirmation).\n'
'\n'
'disable [bpnumber [bpnumber ...]]\n'
'\n'
' Disable the breakpoints given as a space separated list of\n'
' breakpoint numbers. Disabling a breakpoint means it cannot '
'cause\n'
' the program to stop execution, but unlike clearing a '
'breakpoint, it\n'
' remains in the list of breakpoints and can be (re-)enabled.\n'
'\n'
'enable [bpnumber [bpnumber ...]]\n'
'\n'
' Enable the breakpoints specified.\n'
'\n'
'ignore bpnumber [count]\n'
'\n'
' Set the ignore count for the given breakpoint number. If '
'count is\n'
' omitted, the ignore count is set to 0. A breakpoint becomes '
'active\n'
' when the ignore count is zero. When non-zero, the count is\n'
' decremented each time the breakpoint is reached and the '
'breakpoint\n'
' is not disabled and any associated condition evaluates to '
'true.\n'
'\n'
'condition bpnumber [condition]\n'
'\n'
' Set a new *condition* for the breakpoint, an expression which '
'must\n'
' evaluate to true before the breakpoint is honored. If '
'*condition*\n'
' is absent, any existing condition is removed; i.e., the '
'breakpoint\n'
' is made unconditional.\n'
'\n'
'commands [bpnumber]\n'
'\n'
' Specify a list of commands for breakpoint number *bpnumber*. '
'The\n'
' commands themselves appear on the following lines. Type a '
'line\n'
' containing just "end" to terminate the commands. An example:\n'
'\n'
' (Pdb) commands 1\n'
' (com) p some_variable\n'
' (com) end\n'
' (Pdb)\n'
'\n'
' To remove all commands from a breakpoint, type commands and '
'follow\n'
' it immediately with "end"; that is, give no commands.\n'
'\n'
' With no *bpnumber* argument, commands refers to the last '
'breakpoint\n'
' set.\n'
'\n'
' You can use breakpoint commands to start your program up '
'again.\n'
' Simply use the continue command, or step, or any other '
'command that\n'
' resumes execution.\n'
'\n'
' Specifying any command resuming execution (currently '
'continue,\n'
' step, next, return, jump, quit and their abbreviations) '
'terminates\n'
' the command list (as if that command was immediately followed '
'by\n'
' end). This is because any time you resume execution (even '
'with a\n'
' simple next or step), you may encounter another '
'breakpoint--which\n'
' could have its own command list, leading to ambiguities about '
'which\n'
' list to execute.\n'
'\n'
" If you use the 'silent' command in the command list, the "
'usual\n'
' message about stopping at a breakpoint is not printed. This '
'may be\n'
' desirable for breakpoints that are to print a specific '
'message and\n'
' then continue. If none of the other commands print anything, '
'you\n'
' see no sign that the breakpoint was reached.\n'
'\n'
's(tep)\n'
'\n'
' Execute the current line, stop at the first possible '
'occasion\n'
' (either in a function that is called or on the next line in '
'the\n'
' current function).\n'
'\n'
'n(ext)\n'
'\n'
' Continue execution until the next line in the current '
'function is\n'
' reached or it returns. (The difference between "next" and '
'"step"\n'
' is that "step" stops inside a called function, while "next"\n'
' executes called functions at (nearly) full speed, only '
'stopping at\n'
' the next line in the current function.)\n'
'\n'
'unt(il) [lineno]\n'
'\n'
' Without argument, continue execution until the line with a '
'number\n'
' greater than the current one is reached.\n'
'\n'
' With a line number, continue execution until a line with a '
'number\n'
' greater or equal to that is reached. In both cases, also '
'stop when\n'
' the current frame returns.\n'
'\n'
' Changed in version 3.2: Allow giving an explicit line '
'number.\n'
'\n'
'r(eturn)\n'
'\n'
' Continue execution until the current function returns.\n'
'\n'
'c(ont(inue))\n'
'\n'
' Continue execution, only stop when a breakpoint is '
'encountered.\n'
'\n'
'j(ump) lineno\n'
'\n'
' Set the next line that will be executed. Only available in '
'the\n'
' bottom-most frame. This lets you jump back and execute code '
'again,\n'
" or jump forward to skip code that you don't want to run.\n"
'\n'
' It should be noted that not all jumps are allowed -- for '
'instance\n'
' it is not possible to jump into the middle of a "for" loop or '
'out\n'
' of a "finally" clause.\n'
'\n'
'l(ist) [first[, last]]\n'
'\n'
' List source code for the current file. Without arguments, '
'list 11\n'
' lines around the current line or continue the previous '
'listing.\n'
' With "." as argument, list 11 lines around the current line. '
'With\n'
'   one argument, list 11 lines around that line. With two\n'
' arguments, list the given range; if the second argument is '
'less\n'
' than the first, it is interpreted as a count.\n'
'\n'
' The current line in the current frame is indicated by "->". '
'If an\n'
' exception is being debugged, the line where the exception '
'was\n'
' originally raised or propagated is indicated by ">>", if it '
'differs\n'
' from the current line.\n'
'\n'
' New in version 3.2: The ">>" marker.\n'
'\n'
'll | longlist\n'
'\n'
' List all source code for the current function or frame.\n'
' Interesting lines are marked as for "list".\n'
'\n'
' New in version 3.2.\n'
'\n'
'a(rgs)\n'
'\n'
' Print the argument list of the current function.\n'
'\n'
'p expression\n'
'\n'
' Evaluate the *expression* in the current context and print '
'its\n'
' value.\n'
'\n'
' Note: "print()" can also be used, but is not a debugger '
'command\n'
' --- this executes the Python "print()" function.\n'
'\n'
'pp expression\n'
'\n'
' Like the "p" command, except the value of the expression is '
'pretty-\n'
' printed using the "pprint" module.\n'
'\n'
'whatis expression\n'
'\n'
' Print the type of the *expression*.\n'
'\n'
'source expression\n'
'\n'
' Try to get source code for the given object and display it.\n'
'\n'
' New in version 3.2.\n'
'\n'
'display [expression]\n'
'\n'
' Display the value of the expression if it changed, each time\n'
' execution stops in the current frame.\n'
'\n'
' Without expression, list all display expressions for the '
'current\n'
' frame.\n'
'\n'
' New in version 3.2.\n'
'\n'
'undisplay [expression]\n'
'\n'
' Do not display the expression any more in the current frame.\n'
' Without expression, clear all display expressions for the '
'current\n'
' frame.\n'
'\n'
' New in version 3.2.\n'
'\n'
'interact\n'
'\n'
'   Start an interactive interpreter (using the "code" module) '
'whose\n'
' global namespace contains all the (global and local) names '
'found in\n'
' the current scope.\n'
'\n'
' New in version 3.2.\n'
'\n'
'alias [name [command]]\n'
'\n'
' Create an alias called *name* that executes *command*. The '
'command\n'
' must *not* be enclosed in quotes. Replaceable parameters can '
'be\n'
' indicated by "%1", "%2", and so on, while "%*" is replaced by '
'all\n'
' the parameters. If no command is given, the current alias '
'for\n'
' *name* is shown. If no arguments are given, all aliases are '
'listed.\n'
'\n'
' Aliases may be nested and can contain anything that can be '
'legally\n'
' typed at the pdb prompt. Note that internal pdb commands '
'*can* be\n'
' overridden by aliases. Such a command is then hidden until '
'the\n'
' alias is removed. Aliasing is recursively applied to the '
'first\n'
' word of the command line; all other words in the line are '
'left\n'
' alone.\n'
'\n'
' As an example, here are two useful aliases (especially when '
'placed\n'
' in the ".pdbrc" file):\n'
'\n'
' # Print instance variables (usage "pi classInst")\n'
' alias pi for k in %1.__dict__.keys(): '
'print("%1.",k,"=",%1.__dict__[k])\n'
' # Print instance variables in self\n'
' alias ps pi self\n'
'\n'
'unalias name\n'
'\n'
' Delete the specified alias.\n'
'\n'
'! statement\n'
'\n'
' Execute the (one-line) *statement* in the context of the '
'current\n'
' stack frame. The exclamation point can be omitted unless the '
'first\n'
' word of the statement resembles a debugger command. To set '
'a\n'
' global variable, you can prefix the assignment command with '
'a\n'
' "global" statement on the same line, e.g.:\n'
'\n'
" (Pdb) global list_options; list_options = ['-l']\n"
' (Pdb)\n'
'\n'
'run [args ...]\n'
'restart [args ...]\n'
'\n'
' Restart the debugged Python program. If an argument is '
'supplied,\n'
' it is split with "shlex" and the result is used as the new\n'
' "sys.argv". History, breakpoints, actions and debugger '
'options are\n'
' preserved. "restart" is an alias for "run".\n'
'\n'
'q(uit)\n'
'\n'
' Quit from the debugger. The program being executed is '
'aborted.\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] Whether a frame is considered to originate in a certain '
'module\n'
' is determined by the "__name__" in the frame globals.\n',
'del': '\n'
'The "del" statement\n'
'*******************\n'
'\n'
' del_stmt ::= "del" target_list\n'
'\n'
'Deletion is recursively defined very similarly to the way assignment '
'is\n'
'defined. Rather than spelling it out in full detail, here are some\n'
'hints.\n'
'\n'
'Deletion of a target list recursively deletes each target, from left\n'
'to right.\n'
'\n'
'Deletion of a name removes the binding of that name from the local '
'or\n'
'global namespace, depending on whether the name occurs in a "global"\n'
'statement in the same code block. If the name is unbound, a\n'
'"NameError" exception will be raised.\n'
'\n'
'Deletion of attribute references, subscriptions and slicings is '
'passed\n'
'to the primary object involved; deletion of a slicing is in general\n'
'equivalent to assignment of an empty slice of the right type (but '
'even\n'
'this is determined by the sliced object).\n'
'\n'
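'For illustration, a brief interactive sketch:\n'
'\n'
'   >>> x = [1, 2, 3]\n'
'   >>> del x[0]        # deletion of a subscription\n'
'   >>> x\n'
'   [2, 3]\n'
'   >>> del x           # deletion of a name\n'
'   >>> x\n'
'   Traceback (most recent call last):\n'
'     ...\n'
"   NameError: name 'x' is not defined\n"
'\n'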
'Changed in version 3.2: Previously it was illegal to delete a name\n'
'from the local namespace if it occurs as a free variable in a nested\n'
'block.\n',
'dict': '\n'
'Dictionary displays\n'
'*******************\n'
'\n'
'A dictionary display is a possibly empty series of key/datum pairs\n'
'enclosed in curly braces:\n'
'\n'
' dict_display ::= "{" [key_datum_list | dict_comprehension] '
'"}"\n'
' key_datum_list ::= key_datum ("," key_datum)* [","]\n'
' key_datum ::= expression ":" expression\n'
' dict_comprehension ::= expression ":" expression comp_for\n'
'\n'
'A dictionary display yields a new dictionary object.\n'
'\n'
'If a comma-separated sequence of key/datum pairs is given, they are\n'
'evaluated from left to right to define the entries of the '
'dictionary:\n'
'each key object is used as a key into the dictionary to store the\n'
'corresponding datum. This means that you can specify the same key\n'
"multiple times in the key/datum list, and the final dictionary's "
'value\n'
'for that key will be the last one given.\n'
'\n'
'A dict comprehension, in contrast to list and set comprehensions,\n'
'needs two expressions separated with a colon followed by the usual\n'
'"for" and "if" clauses. When the comprehension is run, the '
'resulting\n'
'key and value elements are inserted in the new dictionary in the '
'order\n'
'they are produced.\n'
'\n'
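'For example (the comprehension result is sorted here only to make the\n'
'output deterministic, since dictionary order may vary):\n'
'\n'
"   >>> {'a': 1, 'a': 2}   # the last datum for a duplicate key wins\n"
"   {'a': 2}\n"
'   >>> sorted({x: x ** 2 for x in range(4)}.items())\n'
'   [(0, 0), (1, 1), (2, 4), (3, 9)]\n'
'\n'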
'Restrictions on the types of the key values are listed earlier in\n'
'section *The standard type hierarchy*. (To summarize, the key type\n'
'should be *hashable*, which excludes all mutable objects.) Clashes\n'
'between duplicate keys are not detected; the last datum (textually\n'
'rightmost in the display) stored for a given key value prevails.\n',
'dynamic-features': '\n'
'Interaction with dynamic features\n'
'*********************************\n'
'\n'
'Name resolution of free variables occurs at runtime, not '
'at compile\n'
'time. This means that the following code will print 42:\n'
'\n'
' i = 10\n'
' def f():\n'
' print(i)\n'
' i = 42\n'
' f()\n'
'\n'
'There are several cases where Python statements are '
'illegal when used\n'
'in conjunction with nested scopes that contain free '
'variables.\n'
'\n'
'If a variable is referenced in an enclosing scope, it is '
'illegal to\n'
'delete the name. An error will be reported at compile '
'time.\n'
'\n'
'The "eval()" and "exec()" functions do not have access '
'to the full\n'
'environment for resolving names. Names may be resolved '
'in the local\n'
'and global namespaces of the caller. Free variables are '
'not resolved\n'
'in the nearest enclosing namespace, but in the global '
'namespace. [1]\n'
'The "exec()" and "eval()" functions have optional '
'arguments to\n'
'override the global and local namespace. If only one '
'namespace is\n'
'specified, it is used for both.\n',
'else': '\n'
'The "if" statement\n'
'******************\n'
'\n'
'The "if" statement is used for conditional execution:\n'
'\n'
' if_stmt ::= "if" expression ":" suite\n'
' ( "elif" expression ":" suite )*\n'
' ["else" ":" suite]\n'
'\n'
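'For example, a minimal sketch of the selection behaviour described\n'
'below:\n'
'\n'
'   >>> x = 0\n'
'   >>> if x > 0:\n'
"   ...     print('positive')\n"
'   ... elif x == 0:\n'
"   ...     print('zero')\n"
'   ... else:\n'
"   ...     print('negative')\n"
'   zero\n'
'\n'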
'It selects exactly one of the suites by evaluating the expressions '
'one\n'
'by one until one is found to be true (see section *Boolean '
'operations*\n'
'for the definition of true and false); then that suite is executed\n'
'(and no other part of the "if" statement is executed or evaluated).\n'
'If all expressions are false, the suite of the "else" clause, if\n'
'present, is executed.\n',
'exceptions': '\n'
'Exceptions\n'
'**********\n'
'\n'
'Exceptions are a means of breaking out of the normal flow of '
'control\n'
'of a code block in order to handle errors or other '
'exceptional\n'
'conditions. An exception is *raised* at the point where the '
'error is\n'
'detected; it may be *handled* by the surrounding code block or '
'by any\n'
'code block that directly or indirectly invoked the code block '
'where\n'
'the error occurred.\n'
'\n'
'The Python interpreter raises an exception when it detects a '
'run-time\n'
'error (such as division by zero). A Python program can also\n'
'explicitly raise an exception with the "raise" statement. '
'Exception\n'
'handlers are specified with the "try" ... "except" statement. '
'The\n'
'"finally" clause of such a statement can be used to specify '
'cleanup\n'
'code which does not handle the exception, but is executed '
'whether an\n'
'exception occurred or not in the preceding code.\n'
'\n'
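'For illustration, a minimal sketch of raising, handling and cleanup:\n'
'\n'
'   >>> try:\n'
'   ...     1 / 0\n'
'   ... except ZeroDivisionError as exc:\n'
"   ...     print('handled:', exc)\n"
'   ... finally:\n'
"   ...     print('cleanup runs either way')\n"
'   handled: division by zero\n'
'   cleanup runs either way\n'
'\n'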
'Python uses the "termination" model of error handling: an '
'exception\n'
'handler can find out what happened and continue execution at '
'an outer\n'
'level, but it cannot repair the cause of the error and retry '
'the\n'
'failing operation (except by re-entering the offending piece '
'of code\n'
'from the top).\n'
'\n'
'When an exception is not handled at all, the interpreter '
'terminates\n'
'execution of the program, or returns to its interactive main '
'loop. In\n'
'either case, it prints a stack backtrace, except when the '
'exception is\n'
'"SystemExit".\n'
'\n'
'Exceptions are identified by class instances. The "except" '
'clause is\n'
'selected depending on the class of the instance: it must '
'reference the\n'
'class of the instance or a base class thereof. The instance '
'can be\n'
'received by the handler and can carry additional information '
'about the\n'
'exceptional condition.\n'
'\n'
'Note: Exception messages are not part of the Python API. '
'Their\n'
' contents may change from one version of Python to the next '
'without\n'
' warning and should not be relied on by code which will run '
'under\n'
' multiple versions of the interpreter.\n'
'\n'
'See also the description of the "try" statement in section '
'*The try\n'
'statement* and "raise" statement in section *The raise '
'statement*.\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] This limitation occurs because the code that is executed '
'by\n'
' these operations is not available at the time the module '
'is\n'
' compiled.\n',
'execmodel': '\n'
'Execution model\n'
'***************\n'
'\n'
'\n'
'Structure of a program\n'
'======================\n'
'\n'
'A Python program is constructed from code blocks. A *block* is '
'a piece\n'
'of Python program text that is executed as a unit. The '
'following are\n'
'blocks: a module, a function body, and a class definition. '
'Each\n'
'command typed interactively is a block. A script file (a file '
'given\n'
'as standard input to the interpreter or specified as a command '
'line\n'
'argument to the interpreter) is a code block. A script command '
'(a\n'
'command specified on the interpreter command line with the '
'"-c"\n'
'option) is a code block. The string argument passed to the '
'built-in\n'
'functions "eval()" and "exec()" is a code block.\n'
'\n'
'A code block is executed in an *execution frame*. A frame '
'contains\n'
'some administrative information (used for debugging) and '
'determines\n'
"where and how execution continues after the code block's "
'execution has\n'
'completed.\n'
'\n'
'\n'
'Naming and binding\n'
'==================\n'
'\n'
'\n'
'Binding of names\n'
'----------------\n'
'\n'
'*Names* refer to objects. Names are introduced by name '
'binding\n'
'operations.\n'
'\n'
'The following constructs bind names: formal parameters to '
'functions,\n'
'"import" statements, class and function definitions (these bind '
'the\n'
'class or function name in the defining block), and targets that '
'are\n'
'identifiers if occurring in an assignment, "for" loop header, '
'or after\n'
'"as" in a "with" statement or "except" clause. The "import" '
'statement\n'
'of the form "from ... import *" binds all names defined in the\n'
'imported module, except those beginning with an underscore. '
'This form\n'
'may only be used at the module level.\n'
'\n'
'A target occurring in a "del" statement is also considered '
'bound for\n'
'this purpose (though the actual semantics are to unbind the '
'name).\n'
'\n'
'Each assignment or import statement occurs within a block '
'defined by a\n'
'class or function definition or at the module level (the '
'top-level\n'
'code block).\n'
'\n'
'If a name is bound in a block, it is a local variable of that '
'block,\n'
'unless declared as "nonlocal" or "global". If a name is bound '
'at the\n'
'module level, it is a global variable. (The variables of the '
'module\n'
'code block are local and global.) If a variable is used in a '
'code\n'
'block but not defined there, it is a *free variable*.\n'
'\n'
'Each occurrence of a name in the program text refers to the '
'*binding*\n'
'of that name established by the following name resolution '
'rules.\n'
'\n'
'\n'
'Resolution of names\n'
'-------------------\n'
'\n'
'A *scope* defines the visibility of a name within a block. If '
'a local\n'
'variable is defined in a block, its scope includes that block. '
'If the\n'
'definition occurs in a function block, the scope extends to any '
'blocks\n'
'contained within the defining one, unless a contained block '
'introduces\n'
'a different binding for the name.\n'
'\n'
'When a name is used in a code block, it is resolved using the '
'nearest\n'
'enclosing scope. The set of all such scopes visible to a code '
'block\n'
"is called the block's *environment*.\n"
'\n'
'When a name is not found at all, a "NameError" exception is '
'raised. If\n'
'the current scope is a function scope, and the name refers to a '
'local\n'
'variable that has not yet been bound to a value at the point '
'where the\n'
'name is used, an "UnboundLocalError" exception is raised.\n'
'"UnboundLocalError" is a subclass of "NameError".\n'
'\n'
'If a name binding operation occurs anywhere within a code '
'block, all\n'
'uses of the name within the block are treated as references to '
'the\n'
'current block. This can lead to errors when a name is used '
'within a\n'
'block before it is bound. This rule is subtle. Python lacks\n'
'declarations and allows name binding operations to occur '
'anywhere\n'
'within a code block. The local variables of a code block can '
'be\n'
'determined by scanning the entire text of the block for name '
'binding\n'
'operations.\n'
'\n'
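'For example, a minimal sketch of this rule:\n'
'\n'
'   >>> x = 10\n'
'   >>> def f():\n'
'   ...     print(x)  # local because of the assignment below\n'
'   ...     x = 20\n'
'   >>> f()\n'
'   Traceback (most recent call last):\n'
'     ...\n'
"   UnboundLocalError: local variable 'x' referenced before assignment\n"
'\n'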
'If the "global" statement occurs within a block, all uses of '
'the name\n'
'specified in the statement refer to the binding of that name in '
'the\n'
'top-level namespace. Names are resolved in the top-level '
'namespace by\n'
'searching the global namespace, i.e. the namespace of the '
'module\n'
'containing the code block, and the builtins namespace, the '
'namespace\n'
'of the module "builtins". The global namespace is searched '
'first. If\n'
'the name is not found there, the builtins namespace is '
'searched. The\n'
'"global" statement must precede all uses of the name.\n'
'\n'
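'For example, a minimal sketch (the "counter" name is arbitrary):\n'
'\n'
'   >>> counter = 0\n'
'   >>> def bump():\n'
'   ...     global counter\n'
'   ...     counter = counter + 1\n'
'   >>> bump()\n'
'   >>> counter\n'
'   1\n'
'\n'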
'The "global" statement has the same scope as a name binding '
'operation\n'
'in the same block. If the nearest enclosing scope for a free '
'variable\n'
'contains a global statement, the free variable is treated as a '
'global.\n'
'\n'
'The "nonlocal" statement causes corresponding names to refer '
'to\n'
'previously bound variables in the nearest enclosing function '
'scope.\n'
'"SyntaxError" is raised at compile time if the given name does '
'not\n'
'exist in any enclosing function scope.\n'
'\n'
'The namespace for a module is automatically created the first '
'time a\n'
'module is imported. The main module for a script is always '
'called\n'
'"__main__".\n'
'\n'
'Class definition blocks and arguments to "exec()" and "eval()" '
'are\n'
'special in the context of name resolution. A class definition '
'is an\n'
'executable statement that may use and define names. These '
'references\n'
'follow the normal rules for name resolution with an exception '
'that\n'
'unbound local variables are looked up in the global namespace. '
'The\n'
'namespace of the class definition becomes the attribute '
'dictionary of\n'
'the class. The scope of names defined in a class block is '
'limited to\n'
'the class block; it does not extend to the code blocks of '
'methods --\n'
'this includes comprehensions and generator expressions since '
'they are\n'
'implemented using a function scope. This means that the '
'following\n'
'will fail:\n'
'\n'
' class A:\n'
' a = 42\n'
' b = list(a + i for i in range(10))\n'
'\n'
'\n'
'Builtins and restricted execution\n'
'---------------------------------\n'
'\n'
'The builtins namespace associated with the execution of a code '
'block\n'
'is actually found by looking up the name "__builtins__" in its '
'global\n'
'namespace; this should be a dictionary or a module (in the '
'latter case\n'
"the module's dictionary is used). By default, when in the "
'"__main__"\n'
'module, "__builtins__" is the built-in module "builtins"; when '
'in any\n'
'other module, "__builtins__" is an alias for the dictionary of '
'the\n'
'"builtins" module itself. "__builtins__" can be set to a '
'user-created\n'
'dictionary to create a weak form of restricted execution.\n'
'\n'
'**CPython implementation detail:** Users should not touch\n'
'"__builtins__"; it is strictly an implementation detail. '
'Users\n'
'wanting to override values in the builtins namespace should '
'"import"\n'
'the "builtins" module and modify its attributes appropriately.\n'
'\n'
'\n'
'Interaction with dynamic features\n'
'---------------------------------\n'
'\n'
'Name resolution of free variables occurs at runtime, not at '
'compile\n'
'time. This means that the following code will print 42:\n'
'\n'
' i = 10\n'
' def f():\n'
' print(i)\n'
' i = 42\n'
' f()\n'
'\n'
'There are several cases where Python statements are illegal '
'when used\n'
'in conjunction with nested scopes that contain free variables.\n'
'\n'
'If a variable is referenced in an enclosing scope, it is '
'illegal to\n'
'delete the name. An error will be reported at compile time.\n'
'\n'
'The "eval()" and "exec()" functions do not have access to the '
'full\n'
'environment for resolving names. Names may be resolved in the '
'local\n'
'and global namespaces of the caller. Free variables are not '
'resolved\n'
'in the nearest enclosing namespace, but in the global '
'namespace. [1]\n'
'The "exec()" and "eval()" functions have optional arguments to\n'
'override the global and local namespace. If only one namespace '
'is\n'
'specified, it is used for both.\n'
'\n'
'\n'
'Exceptions\n'
'==========\n'
'\n'
'Exceptions are a means of breaking out of the normal flow of '
'control\n'
'of a code block in order to handle errors or other exceptional\n'
'conditions. An exception is *raised* at the point where the '
'error is\n'
'detected; it may be *handled* by the surrounding code block or '
'by any\n'
'code block that directly or indirectly invoked the code block '
'where\n'
'the error occurred.\n'
'\n'
'The Python interpreter raises an exception when it detects a '
'run-time\n'
'error (such as division by zero). A Python program can also\n'
'explicitly raise an exception with the "raise" statement. '
'Exception\n'
'handlers are specified with the "try" ... "except" statement. '
'The\n'
'"finally" clause of such a statement can be used to specify '
'cleanup\n'
'code which does not handle the exception, but is executed '
'whether an\n'
'exception occurred or not in the preceding code.\n'
'\n'
'Python uses the "termination" model of error handling: an '
'exception\n'
'handler can find out what happened and continue execution at an '
'outer\n'
'level, but it cannot repair the cause of the error and retry '
'the\n'
'failing operation (except by re-entering the offending piece of '
'code\n'
'from the top).\n'
'\n'
'When an exception is not handled at all, the interpreter '
'terminates\n'
'execution of the program, or returns to its interactive main '
'loop. In\n'
'either case, it prints a stack backtrace, except when the '
'exception is\n'
'"SystemExit".\n'
'\n'
'Exceptions are identified by class instances. The "except" '
'clause is\n'
'selected depending on the class of the instance: it must '
'reference the\n'
'class of the instance or a base class thereof. The instance '
'can be\n'
'received by the handler and can carry additional information '
'about the\n'
'exceptional condition.\n'
'\n'
'Note: Exception messages are not part of the Python API. '
'Their\n'
' contents may change from one version of Python to the next '
'without\n'
' warning and should not be relied on by code which will run '
'under\n'
' multiple versions of the interpreter.\n'
'\n'
'See also the description of the "try" statement in section *The '
'try\n'
'statement* and "raise" statement in section *The raise '
'statement*.\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] This limitation occurs because the code that is executed '
'by\n'
' these operations is not available at the time the module '
'is\n'
' compiled.\n',
'exprlists': '\n'
'Expression lists\n'
'****************\n'
'\n'
' expression_list ::= expression ( "," expression )* [","]\n'
'\n'
'An expression list containing at least one comma yields a '
'tuple. The\n'
'length of the tuple is the number of expressions in the list. '
'The\n'
'expressions are evaluated from left to right.\n'
'\n'
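'For example, as the next paragraph details, the comma is what makes\n'
'the tuple:\n'
'\n'
'   >>> 1, 2, 3   # a tuple of length three\n'
'   (1, 2, 3)\n'
'   >>> 1,        # a singleton tuple\n'
'   (1,)\n'
'   >>> (1)       # no comma: just a parenthesized expression\n'
'   1\n'
'\n'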
'The trailing comma is required only to create a single tuple '
'(a.k.a. a\n'
'*singleton*); it is optional in all other cases. A single '
'expression\n'
"without a trailing comma doesn't create a tuple, but rather "
'yields the\n'
'value of that expression. (To create an empty tuple, use an '
'empty pair\n'
'of parentheses: "()".)\n',
'floating': '\n'
'Floating point literals\n'
'***********************\n'
'\n'
'Floating point literals are described by the following lexical\n'
'definitions:\n'
'\n'
' floatnumber ::= pointfloat | exponentfloat\n'
' pointfloat ::= [intpart] fraction | intpart "."\n'
' exponentfloat ::= (intpart | pointfloat) exponent\n'
' intpart ::= digit+\n'
' fraction ::= "." digit+\n'
' exponent ::= ("e" | "E") ["+" | "-"] digit+\n'
'\n'
'Note that the integer and exponent parts are always interpreted '
'using\n'
'radix 10. For example, "077e010" is legal, and denotes the same '
'number\n'
'as "77e10". The allowed range of floating point literals is\n'
'implementation-dependent. Some examples of floating point '
'literals:\n'
'\n'
' 3.14 10. .001 1e100 3.14e-10 0e0\n'
'\n'
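'For instance, in an interactive session:\n'
'\n'
'   >>> 077e010   # integer and exponent parts are read in base 10\n'
'   770000000000.0\n'
'   >>> .001 == 1e-3\n'
'   True\n'
'\n'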
'Note that numeric literals do not include a sign; a phrase like '
'"-1"\n'
'is actually an expression composed of the unary operator "-" and '
'the\n'
'literal "1".\n',
'for': '\n'
'The "for" statement\n'
'*******************\n'
'\n'
'The "for" statement is used to iterate over the elements of a '
'sequence\n'
'(such as a string, tuple or list) or other iterable object:\n'
'\n'
' for_stmt ::= "for" target_list "in" expression_list ":" suite\n'
' ["else" ":" suite]\n'
'\n'
'The expression list is evaluated once; it should yield an iterable\n'
'object. An iterator is created for the result of the\n'
'"expression_list". The suite is then executed once for each item\n'
'provided by the iterator, in the order returned by the iterator. '
'Each\n'
'item in turn is assigned to the target list using the standard rules\n'
'for assignments (see *Assignment statements*), and then the suite is\n'
'executed. When the items are exhausted (which is immediately when '
'the\n'
'sequence is empty or an iterator raises a "StopIteration" '
'exception),\n'
'the suite in the "else" clause, if present, is executed, and the '
'loop\n'
'terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and '
'continues\n'
'with the next item, or with the "else" clause if there is no next\n'
'item.\n'
'\n'
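'For example, a minimal sketch of the "else" clause and "break":\n'
'\n'
'   >>> for n in [3, 5, 8]:\n'
'   ...     if n % 2 == 0:\n'
"   ...         print('found even:', n)\n"
'   ...         break\n'
'   ... else:\n'
"   ...     print('no even numbers')\n"
'   found even: 8\n'
'\n'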
'The for-loop makes assignments to the variable(s) in the target '
'list.\n'
'This overwrites all previous assignments to those variables '
'including\n'
'those made in the suite of the for-loop:\n'
'\n'
' for i in range(10):\n'
' print(i)\n'
' i = 5 # this will not affect the for-loop\n'
' # because i will be overwritten with the '
'next\n'
' # index in the range\n'
'\n'
'Names in the target list are not deleted when the loop is finished,\n'
'but if the sequence is empty, they will not have been assigned to at\n'
'all by the loop. Hint: the built-in function "range()" returns an\n'
'iterator of integers suitable to emulate the effect of Pascal\'s "for '
'i\n'
':= a to b do"; e.g., "list(range(3))" returns the list "[0, 1, 2]".\n'
'\n'
'Note: There is a subtlety when the sequence is being modified by the\n'
' loop (this can only occur for mutable sequences, i.e. lists). An\n'
' internal counter is used to keep track of which item is used next,\n'
' and this is incremented on each iteration. When this counter has\n'
' reached the length of the sequence the loop terminates. This '
'means\n'
' that if the suite deletes the current (or a previous) item from '
'the\n'
' sequence, the next item will be skipped (since it gets the index '
'of\n'
' the current item which has already been treated). Likewise, if '
'the\n'
' suite inserts an item in the sequence before the current item, the\n'
' current item will be treated again the next time through the loop.\n'
' This can lead to nasty bugs that can be avoided by making a\n'
' temporary copy using a slice of the whole sequence, e.g.,\n'
'\n'
' for x in a[:]:\n'
' if x < 0: a.remove(x)\n',
'formatstrings': '\n'
'Format String Syntax\n'
'********************\n'
'\n'
'The "str.format()" method and the "Formatter" class share '
'the same\n'
'syntax for format strings (although in the case of '
'"Formatter",\n'
'subclasses can define their own format string syntax).\n'
'\n'
'Format strings contain "replacement fields" surrounded by '
'curly braces\n'
'"{}". Anything that is not contained in braces is '
'considered literal\n'
'text, which is copied unchanged to the output. If you need '
'to include\n'
'a brace character in the literal text, it can be escaped by '
'doubling:\n'
'"{{" and "}}".\n'
'\n'
'The grammar for a replacement field is as follows:\n'
'\n'
' replacement_field ::= "{" [field_name] ["!" '
'conversion] [":" format_spec] "}"\n'
' field_name ::= arg_name ("." attribute_name | '
'"[" element_index "]")*\n'
' arg_name ::= [identifier | integer]\n'
' attribute_name ::= identifier\n'
' element_index ::= integer | index_string\n'
' index_string ::= <any source character except '
'"]"> +\n'
' conversion ::= "r" | "s" | "a"\n'
' format_spec ::= <described in the next '
'section>\n'
'\n'
'In less formal terms, the replacement field can start with '
'a\n'
'*field_name* that specifies the object whose value is to be '
'formatted\n'
'and inserted into the output instead of the replacement '
'field. The\n'
'*field_name* is optionally followed by a *conversion* '
'field, which is\n'
'preceded by an exclamation point "\'!\'", and a '
'*format_spec*, which is\n'
'preceded by a colon "\':\'". These specify a non-default '
'format for the\n'
'replacement value.\n'
'\n'
'See also the *Format Specification Mini-Language* section.\n'
'\n'
'The *field_name* itself begins with an *arg_name* that is '
'either a\n'
"number or a keyword. If it's a number, it refers to a "
'positional\n'
"argument, and if it's a keyword, it refers to a named "
'keyword\n'
'argument. If the numerical arg_names in a format string '
'are 0, 1, 2,\n'
'... in sequence, they can all be omitted (not just some) '
'and the\n'
'numbers 0, 1, 2, ... will be automatically inserted in that '
'order.\n'
'Because *arg_name* is not quote-delimited, it is not '
'possible to\n'
'specify arbitrary dictionary keys (e.g., the strings '
'"\'10\'" or\n'
'"\':-]\'") within a format string. The *arg_name* can be '
'followed by any\n'
'number of index or attribute expressions. An expression of '
'the form\n'
'"\'.name\'" selects the named attribute using "getattr()", '
'while an\n'
'expression of the form "\'[index]\'" does an index lookup '
'using\n'
'"__getitem__()".\n'
'\n'
'Changed in version 3.1: The positional argument specifiers '
'can be\n'
'omitted, so "\'{} {}\'" is equivalent to "\'{0} {1}\'".\n'
'\n'
'Some simple format string examples:\n'
'\n'
' "First, thou shalt count to {0}" # References first '
'positional argument\n'
' "Bring me a {}" # Implicitly references '
'the first positional argument\n'
' "From {} to {}" # Same as "From {0} to '
'{1}"\n'
' "My quest is {name}" # References keyword '
"argument 'name'\n"
' "Weight in tons {0.weight}" # \'weight\' attribute '
'of first positional arg\n'
' "Units destroyed: {players[0]}" # First element of '
"keyword argument 'players'.\n"
'\n'
'The *conversion* field causes a type coercion before '
'formatting.\n'
'Normally, the job of formatting a value is done by the '
'"__format__()"\n'
'method of the value itself. However, in some cases it is '
'desirable to\n'
'force a type to be formatted as a string, overriding its '
'own\n'
'definition of formatting. By converting the value to a '
'string before\n'
'calling "__format__()", the normal formatting logic is '
'bypassed.\n'
'\n'
'Three conversion flags are currently supported: "\'!s\'" '
'which calls\n'
'"str()" on the value, "\'!r\'" which calls "repr()" and '
'"\'!a\'" which\n'
'calls "ascii()".\n'
'\n'
'Some examples:\n'
'\n'
' "Harold\'s a clever {0!s}" # Calls str() on the '
'argument first\n'
' "Bring out the holy {name!r}" # Calls repr() on the '
'argument first\n'
' "More {!a}" # Calls ascii() on the '
'argument first\n'
'\n'
'The *format_spec* field contains a specification of how the '
'value\n'
'should be presented, including such details as field width, '
'alignment,\n'
'padding, decimal precision and so on. Each value type can '
'define its\n'
'own "formatting mini-language" or interpretation of the '
'*format_spec*.\n'
'\n'
'Most built-in types support a common formatting '
'mini-language, which\n'
'is described in the next section.\n'
'\n'
'A *format_spec* field can also include nested replacement '
'fields\n'
'within it. These nested replacement fields can contain only '
'a field\n'
'name; conversion flags and format specifications are not '
'allowed. The\n'
'replacement fields within the format_spec are substituted '
'before the\n'
'*format_spec* string is interpreted. This allows the '
'formatting of a\n'
'value to be dynamically specified.\n'
'\n'
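'For instance:\n'
'\n'
"   >>> '{:{width}.{prec}f}'.format(3.14159, width=10, prec=3)\n"
"   '     3.142'\n"
'\n'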
'See the *Format examples* section for some examples.\n'
'\n'
'\n'
'Format Specification Mini-Language\n'
'==================================\n'
'\n'
'"Format specifications" are used within replacement fields '
'contained\n'
'within a format string to define how individual values are '
'presented\n'
'(see *Format String Syntax*). They can also be passed '
'directly to the\n'
'built-in "format()" function. Each formattable type may '
'define how\n'
'the format specification is to be interpreted.\n'
'\n'
'Most built-in types implement the following options for '
'format\n'
'specifications, although some of the formatting options are '
'only\n'
'supported by the numeric types.\n'
'\n'
'A general convention is that an empty format string ("""") '
'produces\n'
'the same result as if you had called "str()" on the value. '
'A non-empty\n'
'format string typically modifies the result.\n'
'\n'
'The general form of a *standard format specifier* is:\n'
'\n'
' format_spec ::= '
'[[fill]align][sign][#][0][width][,][.precision][type]\n'
' fill ::= <any character>\n'
' align ::= "<" | ">" | "=" | "^"\n'
' sign ::= "+" | "-" | " "\n'
' width ::= integer\n'
' precision ::= integer\n'
' type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" '
'| "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"\n'
'\n'
'If a valid *align* value is specified, it can be preceded '
'by a *fill*\n'
'character that can be any character and defaults to a space '
'if\n'
'omitted. Note that it is not possible to use "{" and "}" as '
'*fill*\n'
'char while using the "str.format()" method; this limitation '
'however\n'
'doesn\'t affect the "format()" function.\n'
'\n'
'The meaning of the various alignment options is as '
'follows:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Option | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'<\'" | Forces the field to be left-aligned '
'within the available |\n'
' | | space (this is the default for most '
'objects). |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'>\'" | Forces the field to be right-aligned '
'within the available |\n'
' | | space (this is the default for '
'numbers). |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'=\'" | Forces the padding to be placed after '
'the sign (if any) |\n'
' | | but before the digits. This is used for '
'printing fields |\n'
" | | in the form '+000000120'. This alignment "
'option is only |\n'
' | | valid for numeric '
'types. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'^\'" | Forces the field to be centered within '
'the available |\n'
' | | '
'space. '
'|\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'Note that unless a minimum field width is defined, the '
'field width\n'
'will always be the same size as the data to fill it, so '
'that the\n'
'alignment option has no meaning in this case.\n'
'\n'
'The *sign* option is only valid for number types, and can '
'be one of\n'
'the following:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Option | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'+\'" | indicates that a sign should be used for '
'both positive as |\n'
' | | well as negative '
'numbers. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'-\'" | indicates that a sign should be used '
'only for negative |\n'
' | | numbers (this is the default '
'behavior). |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | space | indicates that a leading space should be '
'used on positive |\n'
' | | numbers, and a minus sign on negative '
'numbers. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
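'For example:\n'
'\n'
"   >>> '{:+f}; {:+f}'.format(3.14, -3.14)  # show it always\n"
"   '+3.140000; -3.140000'\n"
"   >>> '{: f}; {: f}'.format(3.14, -3.14)  # space for positive numbers\n"
"   ' 3.140000; -3.140000'\n"
'\n'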
'The "\'#\'" option causes the "alternate form" to be used '
'for the\n'
'conversion. The alternate form is defined differently for '
'different\n'
'types. This option is only valid for integer, float, '
'complex and\n'
'Decimal types. For integers, when binary, octal, or '
'hexadecimal output\n'
'is used, this option adds the prefix respective "\'0b\'", '
'"\'0o\'", or\n'
'"\'0x\'" to the output value. For floats, complex and '
'Decimal the\n'
'alternate form causes the result of the conversion to '
'always contain a\n'
'decimal-point character, even if no digits follow it. '
'Normally, a\n'
'decimal-point character appears in the result of these '
'conversions\n'
'only if a digit follows it. In addition, for "\'g\'" and '
'"\'G\'"\n'
'conversions, trailing zeros are not removed from the '
'result.\n'
'\n'
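'For example:\n'
'\n'
"   >>> format(255, '#x')\n"
"   '0xff'\n"
"   >>> format(1.0, '#.0f')  # alternate form keeps the decimal point\n"
"   '1.'\n"
'\n'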
'The "\',\'" option signals the use of a comma for a '
'thousands separator.\n'
'For a locale aware separator, use the "\'n\'" integer '
'presentation type\n'
'instead.\n'
'\n'
'Changed in version 3.1: Added the "\',\'" option (see also '
'**PEP 378**).\n'
'\n'
'*width* is a decimal integer defining the minimum field '
'width. If not\n'
'specified, then the field width will be determined by the '
'content.\n'
'\n'
'Preceding the *width* field by a zero ("\'0\'") character '
'enables sign-\n'
'aware zero-padding for numeric types. This is equivalent '
'to a *fill*\n'
'character of "\'0\'" with an *alignment* type of "\'=\'".\n'
'\n'
'The *precision* is a decimal number indicating how many '
'digits should\n'
'be displayed after the decimal point for a floating point '
'value\n'
'formatted with "\'f\'" and "\'F\'", or before and after the '
'decimal point\n'
'for a floating point value formatted with "\'g\'" or '
'"\'G\'". For non-\n'
'number types the field indicates the maximum field size - '
'in other\n'
'words, how many characters will be used from the field '
'content. The\n'
'*precision* is not allowed for integer values.\n'
'\n'
'Finally, the *type* determines how the data should be '
'presented.\n'
'\n'
'The available string presentation types are:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Type | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'s\'" | String format. This is the default type '
'for strings and |\n'
' | | may be '
'omitted. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | None | The same as '
'"\'s\'". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'The available integer presentation types are:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Type | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'b\'" | Binary format. Outputs the number in '
'base 2. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'c\'" | Character. Converts the integer to the '
'corresponding |\n'
' | | unicode character before '
'printing. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'d\'" | Decimal Integer. Outputs the number in '
'base 10. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'o\'" | Octal format. Outputs the number in base '
'8. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'x\'" | Hex format. Outputs the number in base '
'16, using lower- |\n'
' | | case letters for the digits above '
'9. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'X\'" | Hex format. Outputs the number in base '
'16, using upper- |\n'
' | | case letters for the digits above '
'9. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'n\'" | Number. This is the same as "\'d\'", '
'except that it uses the |\n'
' | | current locale setting to insert the '
'appropriate number |\n'
' | | separator '
'characters. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | None | The same as '
'"\'d\'". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'In addition to the above presentation types, integers can '
'be formatted\n'
'with the floating point presentation types listed below '
'(except "\'n\'"\n'
'and None). When doing so, "float()" is used to convert the '
'integer to\n'
'a floating point number before formatting.\n'
'\n'
'The available presentation types for floating point and '
'decimal values\n'
'are:\n'
'\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | Type | '
'Meaning '
'|\n'
' '
'+===========+============================================================+\n'
' | "\'e\'" | Exponent notation. Prints the number in '
'scientific |\n'
" | | notation using the letter 'e' to indicate "
'the exponent. |\n'
' | | The default precision is '
'"6". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'E\'" | Exponent notation. Same as "\'e\'" '
'except it uses an upper |\n'
" | | case 'E' as the separator "
'character. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'f\'" | Fixed point. Displays the number as a '
'fixed-point number. |\n'
' | | The default precision is '
'"6". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'F\'" | Fixed point. Same as "\'f\'", but '
'converts "nan" to "NAN" |\n'
' | | and "inf" to '
'"INF". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'g\'" | General format. For a given precision '
'"p >= 1", this |\n'
' | | rounds the number to "p" significant '
'digits and then |\n'
' | | formats the result in either fixed-point '
'format or in |\n'
' | | scientific notation, depending on its '
'magnitude. The |\n'
' | | precise rules are as follows: suppose that '
'the result |\n'
' | | formatted with presentation type "\'e\'" '
'and precision "p-1" |\n'
' | | would have exponent "exp". Then if "-4 <= '
'exp < p", the |\n'
' | | number is formatted with presentation type '
'"\'f\'" and |\n'
' | | precision "p-1-exp". Otherwise, the '
'number is formatted |\n'
' | | with presentation type "\'e\'" and '
'precision "p-1". In both |\n'
' | | cases insignificant trailing zeros are '
'removed from the |\n'
' | | significand, and the decimal point is also '
'removed if |\n'
' | | there are no remaining digits following '
'it. Positive and |\n'
' | | negative infinity, positive and negative '
'zero, and nans, |\n'
' | | are formatted as "inf", "-inf", "0", "-0" '
'and "nan" |\n'
' | | respectively, regardless of the '
'precision. A precision of |\n'
' | | "0" is treated as equivalent to a '
'precision of "1". The |\n'
' | | default precision is '
'"6". |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'G\'" | General format. Same as "\'g\'" except '
'switches to "\'E\'" if |\n'
' | | the number gets too large. The '
'representations of infinity |\n'
' | | and NaN are uppercased, '
'too. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'n\'" | Number. This is the same as "\'g\'", '
'except that it uses the |\n'
' | | current locale setting to insert the '
'appropriate number |\n'
' | | separator '
'characters. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | "\'%\'" | Percentage. Multiplies the number by 100 '
'and displays in |\n'
' | | fixed ("\'f\'") format, followed by a '
'percent sign. |\n'
' '
'+-----------+------------------------------------------------------------+\n'
' | None | Similar to "\'g\'", except that '
'fixed-point notation, when |\n'
' | | used, has at least one digit past the '
'decimal point. The |\n'
' | | default precision is as high as needed to '
'represent the |\n'
' | | particular value. The overall effect is to '
'match the |\n'
' | | output of "str()" as altered by the other '
'format |\n'
' | | '
'modifiers. '
'|\n'
' '
'+-----------+------------------------------------------------------------+\n'
'\n'
'\n'
'Format examples\n'
'===============\n'
'\n'
'This section contains examples of the new format syntax and '
'comparison\n'
'with the old "%"-formatting.\n'
'\n'
'In most of the cases the syntax is similar to the old '
'"%"-formatting,\n'
'with the addition of the "{}" and with ":" used instead of '
'"%". For\n'
'example, "\'%03.2f\'" can be translated to "\'{:03.2f}\'".\n'
'\n'
'The new format syntax also supports new and different '
'options, shown\n'
'in the following examples.\n'
'\n'
'Accessing arguments by position:\n'
'\n'
" >>> '{0}, {1}, {2}'.format('a', 'b', 'c')\n"
" 'a, b, c'\n"
" >>> '{}, {}, {}'.format('a', 'b', 'c') # 3.1+ only\n"
" 'a, b, c'\n"
" >>> '{2}, {1}, {0}'.format('a', 'b', 'c')\n"
" 'c, b, a'\n"
" >>> '{2}, {1}, {0}'.format(*'abc') # unpacking "
'argument sequence\n'
" 'c, b, a'\n"
" >>> '{0}{1}{0}'.format('abra', 'cad') # arguments' "
'indices can be repeated\n'
" 'abracadabra'\n"
'\n'
'Accessing arguments by name:\n'
'\n'
" >>> 'Coordinates: {latitude}, "
"{longitude}'.format(latitude='37.24N', "
"longitude='-115.81W')\n"
" 'Coordinates: 37.24N, -115.81W'\n"
" >>> coord = {'latitude': '37.24N', 'longitude': "
"'-115.81W'}\n"
" >>> 'Coordinates: {latitude}, "
"{longitude}'.format(**coord)\n"
" 'Coordinates: 37.24N, -115.81W'\n"
'\n'
"Accessing arguments' attributes:\n"
'\n'
' >>> c = 3-5j\n'
" >>> ('The complex number {0} is formed from the real "
"part {0.real} '\n"
" ... 'and the imaginary part {0.imag}.').format(c)\n"
" 'The complex number (3-5j) is formed from the real part "
"3.0 and the imaginary part -5.0.'\n"
' >>> class Point:\n'
' ... def __init__(self, x, y):\n'
' ... self.x, self.y = x, y\n'
' ... def __str__(self):\n'
" ... return 'Point({self.x}, "
"{self.y})'.format(self=self)\n"
' ...\n'
' >>> str(Point(4, 2))\n'
" 'Point(4, 2)'\n"
'\n'
"Accessing arguments' items:\n"
'\n'
' >>> coord = (3, 5)\n'
" >>> 'X: {0[0]}; Y: {0[1]}'.format(coord)\n"
" 'X: 3; Y: 5'\n"
'\n'
'Replacing "%s" and "%r":\n'
'\n'
' >>> "repr() shows quotes: {!r}; str() doesn\'t: '
'{!s}".format(\'test1\', \'test2\')\n'
' "repr() shows quotes: \'test1\'; str() doesn\'t: test2"\n'
'\n'
'Aligning the text and specifying a width:\n'
'\n'
" >>> '{:<30}'.format('left aligned')\n"
" 'left aligned '\n"
" >>> '{:>30}'.format('right aligned')\n"
" ' right aligned'\n"
" >>> '{:^30}'.format('centered')\n"
" ' centered '\n"
" >>> '{:*^30}'.format('centered') # use '*' as a fill "
'char\n'
" '***********centered***********'\n"
'\n'
'Replacing "%+f", "%-f", and "% f" and specifying a sign:\n'
'\n'
" >>> '{:+f}; {:+f}'.format(3.14, -3.14) # show it "
'always\n'
" '+3.140000; -3.140000'\n"
" >>> '{: f}; {: f}'.format(3.14, -3.14) # show a space "
'for positive numbers\n'
" ' 3.140000; -3.140000'\n"
" >>> '{:-f}; {:-f}'.format(3.14, -3.14) # show only the "
"minus -- same as '{:f}; {:f}'\n"
" '3.140000; -3.140000'\n"
'\n'
'Replacing "%x" and "%o" and converting the value to '
'different bases:\n'
'\n'
' >>> # format also supports binary numbers\n'
' >>> "int: {0:d}; hex: {0:x}; oct: {0:o}; bin: '
'{0:b}".format(42)\n'
" 'int: 42; hex: 2a; oct: 52; bin: 101010'\n"
' >>> # with 0x, 0o, or 0b as prefix:\n'
' >>> "int: {0:d}; hex: {0:#x}; oct: {0:#o}; bin: '
'{0:#b}".format(42)\n'
" 'int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010'\n"
'\n'
'Using the comma as a thousands separator:\n'
'\n'
" >>> '{:,}'.format(1234567890)\n"
" '1,234,567,890'\n"
'\n'
'Expressing a percentage:\n'
'\n'
' >>> points = 19\n'
' >>> total = 22\n'
" >>> 'Correct answers: {:.2%}'.format(points/total)\n"
" 'Correct answers: 86.36%'\n"
'\n'
'Using type-specific formatting:\n'
'\n'
' >>> import datetime\n'
' >>> d = datetime.datetime(2010, 7, 4, 12, 15, 58)\n'
" >>> '{:%Y-%m-%d %H:%M:%S}'.format(d)\n"
" '2010-07-04 12:15:58'\n"
'\n'
'Nesting arguments and more complex examples:\n'
'\n'
" >>> for align, text in zip('<^>', ['left', 'center', "
"'right']):\n"
" ... '{0:{fill}{align}16}'.format(text, fill=align, "
'align=align)\n'
' ...\n'
" 'left<<<<<<<<<<<<'\n"
" '^^^^^center^^^^^'\n"
" '>>>>>>>>>>>right'\n"
' >>>\n'
' >>> octets = [192, 168, 0, 1]\n'
" >>> '{:02X}{:02X}{:02X}{:02X}'.format(*octets)\n"
" 'C0A80001'\n"
' >>> int(_, 16)\n'
' 3232235521\n'
' >>>\n'
' >>> width = 5\n'
' >>> for num in range(5,12): #doctest: '
'+NORMALIZE_WHITESPACE\n'
" ... for base in 'dXob':\n"
" ... print('{0:{width}{base}}'.format(num, "
"base=base, width=width), end=' ')\n"
' ... print()\n'
' ...\n'
' 5 5 5 101\n'
' 6 6 6 110\n'
' 7 7 7 111\n'
' 8 8 10 1000\n'
' 9 9 11 1001\n'
' 10 A 12 1010\n'
' 11 B 13 1011\n',
'function': '\n'
'Function definitions\n'
'********************\n'
'\n'
'A function definition defines a user-defined function object '
'(see\n'
'section *The standard type hierarchy*):\n'
'\n'
' funcdef ::= [decorators] "def" funcname "(" '
'[parameter_list] ")" ["->" expression] ":" suite\n'
' decorators ::= decorator+\n'
' decorator ::= "@" dotted_name ["(" [parameter_list '
'[","]] ")"] NEWLINE\n'
' dotted_name ::= identifier ("." identifier)*\n'
' parameter_list ::= (defparameter ",")*\n'
'                      ( "*" [parameter] ("," defparameter)* ["," '
'"**" parameter]\n'
'                      | "**" parameter\n'
'                      | defparameter [","] )\n'
' parameter ::= identifier [":" expression]\n'
' defparameter ::= parameter ["=" expression]\n'
' funcname ::= identifier\n'
'\n'
'A function definition is an executable statement. Its execution '
'binds\n'
'the function name in the current local namespace to a function '
'object\n'
'(a wrapper around the executable code for the function). This\n'
'function object contains a reference to the current global '
'namespace\n'
'as the global namespace to be used when the function is called.\n'
'\n'
'The function definition does not execute the function body; this '
'gets\n'
'executed only when the function is called. [3]\n'
'\n'
'A function definition may be wrapped by one or more *decorator*\n'
'expressions. Decorator expressions are evaluated when the '
'function is\n'
'defined, in the scope that contains the function definition. '
'The\n'
'result must be a callable, which is invoked with the function '
'object\n'
'as the only argument. The returned value is bound to the '
'function name\n'
'instead of the function object. Multiple decorators are applied '
'in\n'
'nested fashion. For example, the following code\n'
'\n'
' @f1(arg)\n'
' @f2\n'
' def func(): pass\n'
'\n'
'is equivalent to\n'
'\n'
' def func(): pass\n'
' func = f1(arg)(f2(func))\n'
'\n'
'When one or more *parameters* have the form *parameter* "="\n'
'*expression*, the function is said to have "default parameter '
'values."\n'
'For a parameter with a default value, the corresponding '
'*argument* may\n'
"be omitted from a call, in which case the parameter's default "
'value is\n'
'substituted. If a parameter has a default value, all following\n'
'parameters up until the ""*"" must also have a default value --- '
'this\n'
'is a syntactic restriction that is not expressed by the '
'grammar.\n'
'\n'
'**Default parameter values are evaluated from left to right when '
'the\n'
'function definition is executed.** This means that the '
'expression is\n'
'evaluated once, when the function is defined, and that the same '
'"pre-\n'
'computed" value is used for each call. This is especially '
'important\n'
'to understand when a default parameter is a mutable object, such '
'as a\n'
'list or a dictionary: if the function modifies the object (e.g. '
'by\n'
'appending an item to a list), the default value is in effect '
'modified.\n'
'This is generally not what was intended. A way around this is '
'to use\n'
'"None" as the default, and explicitly test for it in the body of '
'the\n'
'function, e.g.:\n'
'\n'
' def whats_on_the_telly(penguin=None):\n'
' if penguin is None:\n'
' penguin = []\n'
' penguin.append("property of the zoo")\n'
' return penguin\n'
'\n'
'Function call semantics are described in more detail in section\n'
'*Calls*. A function call always assigns values to all '
'parameters\n'
'mentioned in the parameter list, either from positional arguments, '
'from\n'
'keyword arguments, or from default values. If the form\n'
'""*identifier"" is present, it is initialized to a tuple '
'receiving any\n'
'excess positional parameters, defaulting to the empty tuple. If '
'the\n'
'form ""**identifier"" is present, it is initialized to a new\n'
'dictionary receiving any excess keyword arguments, defaulting to '
'a new\n'
'empty dictionary. Parameters after ""*"" or ""*identifier"" are\n'
'keyword-only parameters and may only be passed using keyword '
'arguments.\n'
'\n'
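'A minimal illustration (the function name is arbitrary):\n'
'\n'
'   >>> def f(a, *, b):\n'
'   ...     return (a, b)\n'
'   ...\n'
'   >>> f(1, b=2)\n'
'   (1, 2)\n'
'\n'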
'Parameters may have annotations of the form "": expression"" '
'following\n'
'the parameter name. Any parameter may have an annotation, even '
'those\n'
'of the form "*identifier" or "**identifier". Functions may '
'have a\n'
'"return" annotation of the form ""-> expression"" after the '
'parameter\n'
'list. These annotations can be any valid Python expression and '
'are\n'
'evaluated when the function definition is executed. Annotations '
'may\n'
'be evaluated in a different order than they appear in the source '
'code.\n'
'The presence of annotations does not change the semantics of a\n'
'function. The annotation values are available as values of a\n'
"dictionary keyed by the parameters' names in the "
'"__annotations__"\n'
'attribute of the function object.\n'
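'\n'
'For example (an illustrative sketch; the names are arbitrary):\n'
'\n'
"   >>> def greet(name: str, punct: str = '!') -> str:\n"
"   ...     return 'Hello, ' + name + punct\n"
'   ...\n'
'   >>> sorted(greet.__annotations__)\n'
"   ['name', 'punct', 'return']\n"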
'\n'
'It is also possible to create anonymous functions (functions not '
'bound\n'
'to a name), for immediate use in expressions. This uses lambda\n'
'expressions, described in section *Lambdas*. Note that the '
'lambda\n'
'expression is merely a shorthand for a simplified function '
'definition;\n'
'a function defined in a ""def"" statement can be passed around '
'or\n'
'assigned to another name just like a function defined by a '
'lambda\n'
'expression. The ""def"" form is actually more powerful since '
'it\n'
'allows the execution of multiple statements and annotations.\n'
'\n'
"**Programmer's note:** Functions are first-class objects. A "
'""def""\n'
'statement executed inside a function definition defines a local\n'
'function that can be returned or passed around. Free variables '
'used\n'
'in the nested function can access the local variables of the '
'function\n'
'containing the def. See section *Naming and binding* for '
'details.\n'
'\n'
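'A short illustrative example of such a nested function (the names\n'
'are arbitrary):\n'
'\n'
'   >>> def make_adder(n):\n'
'   ...     def add(x):\n'
'   ...         return x + n\n'
'   ...     return add\n'
'   ...\n'
'   >>> make_adder(3)(4)\n'
'   7\n'
'\n'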
'See also: **PEP 3107** - Function Annotations\n'
'\n'
' The original specification for function annotations.\n',
'global': '\n'
'The "global" statement\n'
'**********************\n'
'\n'
' global_stmt ::= "global" identifier ("," identifier)*\n'
'\n'
'The "global" statement is a declaration which holds for the '
'entire\n'
'current code block. It means that the listed identifiers are to '
'be\n'
'interpreted as globals. It would be impossible to assign to a '
'global\n'
'variable without "global", although free variables may refer to\n'
'globals without being declared global.\n'
'\n'
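'A minimal illustration (the variable name is arbitrary):\n'
'\n'
'   >>> counter = 0\n'
'   >>> def bump():\n'
'   ...     global counter\n'
'   ...     counter = counter + 1\n'
'   ...\n'
'   >>> bump()\n'
'   >>> counter\n'
'   1\n'
'\n'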
'Names listed in a "global" statement must not be used in the same '
'code\n'
'block textually preceding that "global" statement.\n'
'\n'
'Names listed in a "global" statement must not be defined as '
'formal\n'
'parameters or in a "for" loop control target, "class" definition,\n'
'function definition, or "import" statement.\n'
'\n'
'**CPython implementation detail:** The current implementation does '
'not\n'
'enforce the two restrictions, but programs should not abuse this\n'
'freedom, as future implementations may enforce them or silently '
'change\n'
'the meaning of the program.\n'
'\n'
'**Programmer\'s note:** the "global" is a directive to the '
'parser. It\n'
'applies only to code parsed at the same time as the "global"\n'
'statement. In particular, a "global" statement contained in a '
'string\n'
'or code object supplied to the built-in "exec()" function does '
'not\n'
'affect the code block *containing* the function call, and code\n'
'contained in such a string is unaffected by "global" statements in '
'the\n'
'code containing the function call. The same applies to the '
'"eval()"\n'
'and "compile()" functions.\n',
'id-classes': '\n'
'Reserved classes of identifiers\n'
'*******************************\n'
'\n'
'Certain classes of identifiers (besides keywords) have '
'special\n'
'meanings. These classes are identified by the patterns of '
'leading and\n'
'trailing underscore characters:\n'
'\n'
'"_*"\n'
' Not imported by "from module import *". The special '
'identifier "_"\n'
' is used in the interactive interpreter to store the result '
'of the\n'
' last evaluation; it is stored in the "builtins" module. '
'When not\n'
' in interactive mode, "_" has no special meaning and is not '
'defined.\n'
' See section *The import statement*.\n'
'\n'
' Note: The name "_" is often used in conjunction with\n'
' internationalization; refer to the documentation for the\n'
' "gettext" module for more information on this '
'convention.\n'
'\n'
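'   For example, in an interactive session:\n'
'\n'
'      >>> 1 + 1\n'
'      2\n'
'      >>> _ * 3\n'
'      6\n'
'\n'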
'"__*__"\n'
' System-defined names. These names are defined by the '
'interpreter\n'
' and its implementation (including the standard library). '
'Current\n'
' system names are discussed in the *Special method names* '
'section\n'
' and elsewhere. More will likely be defined in future '
'versions of\n'
' Python. *Any* use of "__*__" names, in any context, that '
'does not\n'
' follow explicitly documented use, is subject to breakage '
'without\n'
' warning.\n'
'\n'
'"__*"\n'
' Class-private names. Names in this category, when used '
'within the\n'
' context of a class definition, are re-written to use a '
'mangled form\n'
' to help avoid name clashes between "private" attributes of '
'base and\n'
' derived classes. See section *Identifiers (Names)*.\n',
'identifiers': '\n'
'Identifiers and keywords\n'
'************************\n'
'\n'
'Identifiers (also referred to as *names*) are described by '
'the\n'
'following lexical definitions.\n'
'\n'
'The syntax of identifiers in Python is based on the Unicode '
'standard\n'
'annex UAX-31, with elaboration and changes as defined below; '
'see also\n'
'**PEP 3131** for further details.\n'
'\n'
'Within the ASCII range (U+0001..U+007F), the valid characters '
'for\n'
'identifiers are the same as in Python 2.x: the uppercase and '
'lowercase\n'
'letters "A" through "Z", the underscore "_" and, except for '
'the first\n'
'character, the digits "0" through "9".\n'
'\n'
'Python 3.0 introduces additional characters from outside the '
'ASCII\n'
'range (see **PEP 3131**). For these characters, the '
'classification\n'
'uses the version of the Unicode Character Database as '
'included in the\n'
'"unicodedata" module.\n'
'\n'
'Identifiers are unlimited in length. Case is significant.\n'
'\n'
' identifier ::= xid_start xid_continue*\n'
' id_start ::= <all characters in general categories Lu, '
'Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the '
'Other_ID_Start property>\n'
' id_continue ::= <all characters in id_start, plus '
'characters in the categories Mn, Mc, Nd, Pc and others with '
'the Other_ID_Continue property>\n'
' xid_start ::= <all characters in id_start whose NFKC '
'normalization is in "id_start xid_continue*">\n'
' xid_continue ::= <all characters in id_continue whose NFKC '
'normalization is in "id_continue*">\n'
'\n'
'The Unicode category codes mentioned above stand for:\n'
'\n'
'* *Lu* - uppercase letters\n'
'\n'
'* *Ll* - lowercase letters\n'
'\n'
'* *Lt* - titlecase letters\n'
'\n'
'* *Lm* - modifier letters\n'
'\n'
'* *Lo* - other letters\n'
'\n'
'* *Nl* - letter numbers\n'
'\n'
'* *Mn* - nonspacing marks\n'
'\n'
'* *Mc* - spacing combining marks\n'
'\n'
'* *Nd* - decimal numbers\n'
'\n'
'* *Pc* - connector punctuations\n'
'\n'
'* *Other_ID_Start* - explicit list of characters in '
'PropList.txt to\n'
' support backwards compatibility\n'
'\n'
'* *Other_ID_Continue* - likewise\n'
'\n'
'All identifiers are converted into the normal form NFKC while '
'parsing;\n'
'comparison of identifiers is based on NFKC.\n'
'\n'
'A non-normative HTML file listing all valid identifier '
'characters for\n'
'Unicode 4.1 can be found at http://www.dcl.hpi.uni-\n'
'potsdam.de/home/loewis/table-3131.html.\n'
'\n'
'\n'
'Keywords\n'
'========\n'
'\n'
'The following identifiers are used as reserved words, or '
'*keywords* of\n'
'the language, and cannot be used as ordinary identifiers. '
'They must\n'
'be spelled exactly as written here:\n'
'\n'
' False class finally is return\n'
' None continue for lambda try\n'
' True def from nonlocal while\n'
' and del global not with\n'
' as elif if or yield\n'
' assert else import pass\n'
' break except in raise\n'
'\n'
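'The "keyword" module can be used to test for these reserved words\n'
'programmatically (a small illustrative check):\n'
'\n'
'   >>> import keyword\n'
"   >>> keyword.iskeyword('pass')\n"
'   True\n'
'\n'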
'\n'
'Reserved classes of identifiers\n'
'===============================\n'
'\n'
'Certain classes of identifiers (besides keywords) have '
'special\n'
'meanings. These classes are identified by the patterns of '
'leading and\n'
'trailing underscore characters:\n'
'\n'
'"_*"\n'
' Not imported by "from module import *". The special '
'identifier "_"\n'
' is used in the interactive interpreter to store the result '
'of the\n'
' last evaluation; it is stored in the "builtins" module. '
'When not\n'
' in interactive mode, "_" has no special meaning and is not '
'defined.\n'
' See section *The import statement*.\n'
'\n'
' Note: The name "_" is often used in conjunction with\n'
' internationalization; refer to the documentation for '
'the\n'
' "gettext" module for more information on this '
'convention.\n'
'\n'
'"__*__"\n'
' System-defined names. These names are defined by the '
'interpreter\n'
' and its implementation (including the standard library). '
'Current\n'
' system names are discussed in the *Special method names* '
'section\n'
' and elsewhere. More will likely be defined in future '
'versions of\n'
' Python. *Any* use of "__*__" names, in any context, that '
'does not\n'
' follow explicitly documented use, is subject to breakage '
'without\n'
' warning.\n'
'\n'
'"__*"\n'
' Class-private names. Names in this category, when used '
'within the\n'
' context of a class definition, are re-written to use a '
'mangled form\n'
' to help avoid name clashes between "private" attributes of '
'base and\n'
' derived classes. See section *Identifiers (Names)*.\n',
'if': '\n'
'The "if" statement\n'
'******************\n'
'\n'
'The "if" statement is used for conditional execution:\n'
'\n'
' if_stmt ::= "if" expression ":" suite\n'
' ( "elif" expression ":" suite )*\n'
' ["else" ":" suite]\n'
'\n'
'It selects exactly one of the suites by evaluating the expressions '
'one\n'
'by one until one is found to be true (see section *Boolean '
'operations*\n'
'for the definition of true and false); then that suite is executed\n'
'(and no other part of the "if" statement is executed or evaluated).\n'
'If all expressions are false, the suite of the "else" clause, if\n'
'present, is executed.\n',
'imaginary': '\n'
'Imaginary literals\n'
'******************\n'
'\n'
'Imaginary literals are described by the following lexical '
'definitions:\n'
'\n'
' imagnumber ::= (floatnumber | intpart) ("j" | "J")\n'
'\n'
'An imaginary literal yields a complex number with a real part '
'of 0.0.\n'
'Complex numbers are represented as a pair of floating point '
'numbers\n'
'and have the same restrictions on their range. To create a '
'complex\n'
'number with a nonzero real part, add a floating point number to '
'it,\n'
'e.g., "(3+4j)". Some examples of imaginary literals:\n'
'\n'
' 3.14j 10.j 10j .001j 1e100j 3.14e-10j\n',
'import': '\n'
'The "import" statement\n'
'**********************\n'
'\n'
' import_stmt ::= "import" module ["as" name] ( "," module '
'["as" name] )*\n'
' | "from" relative_module "import" identifier '
'["as" name]\n'
' ( "," identifier ["as" name] )*\n'
' | "from" relative_module "import" "(" '
'identifier ["as" name]\n'
' ( "," identifier ["as" name] )* [","] ")"\n'
' | "from" module "import" "*"\n'
' module ::= (identifier ".")* identifier\n'
' relative_module ::= "."* module | "."+\n'
' name ::= identifier\n'
'\n'
'The basic import statement (no "from" clause) is executed in two\n'
'steps:\n'
'\n'
'1. find a module, loading and initializing it if necessary\n'
'\n'
'2. define a name or names in the local namespace for the scope\n'
' where the "import" statement occurs.\n'
'\n'
'When the statement contains multiple clauses (separated by commas) '
'the\n'
'two steps are carried out separately for each clause, just as '
'though\n'
'the clauses had been separated out into individual import '
'statements.\n'
'\n'
'The details of the first step, finding and loading modules are\n'
'described in greater detail in the section on the *import '
'system*,\n'
'which also describes the various types of packages and modules '
'that\n'
'can be imported, as well as all the hooks that can be used to\n'
'customize the import system. Note that failures in this step may\n'
'indicate either that the module could not be located, *or* that '
'an\n'
'error occurred while initializing the module, which includes '
'execution\n'
"of the module's code.\n"
'\n'
'If the requested module is retrieved successfully, it will be '
'made\n'
'available in the local namespace in one of three ways:\n'
'\n'
'* If the module name is followed by "as", then the name following\n'
' "as" is bound directly to the imported module.\n'
'\n'
'* If no other name is specified, and the module being imported is '
'a\n'
" top level module, the module's name is bound in the local "
'namespace\n'
'  as a reference to the imported module.\n'
'\n'
'* If the module being imported is *not* a top level module, then '
'the\n'
' name of the top level package that contains the module is bound '
'in\n'
' the local namespace as a reference to the top level package. '
'The\n'
'  imported module must be accessed using its fully qualified name\n'
'  rather than directly.\n'
'\n'
'The "from" form uses a slightly more complex process:\n'
'\n'
'1. find the module specified in the "from" clause, loading and\n'
' initializing it if necessary;\n'
'\n'
'2. for each of the identifiers specified in the "import" clauses:\n'
'\n'
' 1. check if the imported module has an attribute by that name\n'
'\n'
' 2. if not, attempt to import a submodule with that name and '
'then\n'
' check the imported module again for that attribute\n'
'\n'
' 3. if the attribute is not found, "ImportError" is raised.\n'
'\n'
' 4. otherwise, a reference to that value is stored in the local\n'
' namespace, using the name in the "as" clause if it is '
'present,\n'
' otherwise using the attribute name\n'
'\n'
'Examples:\n'
'\n'
' import foo # foo imported and bound locally\n'
' import foo.bar.baz # foo.bar.baz imported, foo bound '
'locally\n'
' import foo.bar.baz as fbb # foo.bar.baz imported and bound as '
'fbb\n'
' from foo.bar import baz # foo.bar.baz imported and bound as '
'baz\n'
' from foo import attr # foo imported and foo.attr bound as '
'attr\n'
'\n'
'If the list of identifiers is replaced by a star ("\'*\'"), all '
'public\n'
'names defined in the module are bound in the local namespace for '
'the\n'
'scope where the "import" statement occurs.\n'
'\n'
'The *public names* defined by a module are determined by checking '
'the\n'
'module\'s namespace for a variable named "__all__"; if defined, it '
'must\n'
'be a sequence of strings which are names defined or imported by '
'that\n'
'module. The names given in "__all__" are all considered public '
'and\n'
'are required to exist. If "__all__" is not defined, the set of '
'public\n'
"names includes all names found in the module's namespace which do "
'not\n'
'begin with an underscore character ("\'_\'"). "__all__" should '
'contain\n'
'the entire public API. It is intended to avoid accidentally '
'exporting\n'
'items that are not part of the API (such as library modules which '
'were\n'
'imported and used within the module).\n'
'\n'
'The wild card form of import --- "from module import *" --- is '
'only\n'
'allowed at the module level. Attempting to use it in class or\n'
'function definitions will raise a "SyntaxError".\n'
'\n'
'When specifying what module to import you do not have to specify '
'the\n'
'absolute name of the module. When a module or package is '
'contained\n'
'within another package it is possible to make a relative import '
'within\n'
'the same top package without having to mention the package name. '
'By\n'
'using leading dots in the specified module or package after "from" '
'you\n'
'can specify how high to traverse up the current package hierarchy\n'
'without specifying exact names. One leading dot means the current\n'
'package where the module making the import exists. Two dots means '
'up\n'
'one package level. Three dots means up two levels, etc. So if you '
'execute\n'
'"from . import mod" from a module in the "pkg" package then you '
'will\n'
'end up importing "pkg.mod". If you execute "from ..subpkg2 import '
'mod"\n'
'from within "pkg.subpkg1" you will import "pkg.subpkg2.mod". The\n'
'specification for relative imports is contained within **PEP '
'328**.\n'
'\n'
'"importlib.import_module()" is provided to support applications '
'that\n'
'determine dynamically the modules to be loaded.\n'
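'\n'
'For example (an illustrative sketch):\n'
'\n'
'   >>> import importlib\n'
"   >>> math = importlib.import_module('math')\n"
'   >>> math.sqrt(4)\n'
'   2.0\n'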
'\n'
'\n'
'Future statements\n'
'=================\n'
'\n'
'A *future statement* is a directive to the compiler that a '
'particular\n'
'module should be compiled using syntax or semantics that will be\n'
'available in a specified future release of Python where the '
'feature\n'
'becomes standard.\n'
'\n'
'The future statement is intended to ease migration to future '
'versions\n'
'of Python that introduce incompatible changes to the language. '
'It\n'
'allows use of the new features on a per-module basis before the\n'
'release in which the feature becomes standard.\n'
'\n'
' future_statement ::= "from" "__future__" "import" feature ["as" '
'name]\n'
' ("," feature ["as" name])*\n'
' | "from" "__future__" "import" "(" feature '
'["as" name]\n'
' ("," feature ["as" name])* [","] ")"\n'
' feature ::= identifier\n'
' name ::= identifier\n'
'\n'
'A future statement must appear near the top of the module. The '
'only\n'
'lines that can appear before a future statement are:\n'
'\n'
'* the module docstring (if any),\n'
'\n'
'* comments,\n'
'\n'
'* blank lines, and\n'
'\n'
'* other future statements.\n'
'\n'
'The features recognized by Python 3.0 are "absolute_import",\n'
'"division", "generators", "unicode_literals", "print_function",\n'
'"nested_scopes" and "with_statement". They are all redundant '
'because\n'
'they are always enabled, and only kept for backwards '
'compatibility.\n'
'\n'
'A future statement is recognized and treated specially at compile\n'
'time: Changes to the semantics of core constructs are often\n'
'implemented by generating different code. It may even be the '
'case\n'
'that a new feature introduces new incompatible syntax (such as a '
'new\n'
'reserved word), in which case the compiler may need to parse the\n'
'module differently. Such decisions cannot be pushed off until\n'
'runtime.\n'
'\n'
'For any given release, the compiler knows which feature names '
'have\n'
'been defined, and raises a compile-time error if a future '
'statement\n'
'contains a feature not known to it.\n'
'\n'
'The direct runtime semantics are the same as for any import '
'statement:\n'
'there is a standard module "__future__", described later, and it '
'will\n'
'be imported in the usual way at the time the future statement is\n'
'executed.\n'
'\n'
'The interesting runtime semantics depend on the specific feature\n'
'enabled by the future statement.\n'
'\n'
'Note that there is nothing special about the statement:\n'
'\n'
' import __future__ [as name]\n'
'\n'
"That is not a future statement; it's an ordinary import statement "
'with\n'
'no special semantics or syntax restrictions.\n'
'\n'
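'For instance (illustrative; "division" became optional in Python\n'
'2.2):\n'
'\n'
'   >>> import __future__\n'
'   >>> __future__.division.getOptionalRelease()\n'
"   (2, 2, 0, 'alpha', 2)\n"
'\n'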
'Code compiled by calls to the built-in functions "exec()" and\n'
'"compile()" that occur in a module "M" containing a future '
'statement\n'
'will, by default, use the new syntax or semantics associated with '
'the\n'
'future statement. This can be controlled by optional arguments '
'to\n'
'"compile()" --- see the documentation of that function for '
'details.\n'
'\n'
'A future statement typed at an interactive interpreter prompt '
'will\n'
'take effect for the rest of the interpreter session. If an\n'
'interpreter is started with the *-i* option, is passed a script '
'name\n'
'to execute, and the script includes a future statement, it will be '
'in\n'
'effect in the interactive session started after the script is\n'
'executed.\n'
'\n'
'See also: **PEP 236** - Back to the __future__\n'
'\n'
' The original proposal for the __future__ mechanism.\n',
'in': '\n'
'Membership test operations\n'
'**************************\n'
'\n'
'The operators "in" and "not in" test for membership. "x in s"\n'
'evaluates to true if *x* is a member of *s*, and false otherwise. "x\n'
'not in s" returns the negation of "x in s". All built-in sequences\n'
'and set types support this, as do dictionaries, for which "in" '
'tests\n'
'whether the dictionary has a given key. For container types such as\n'
'list, tuple, set, frozenset, dict, or collections.deque, the\n'
'expression "x in y" is equivalent to "any(x is e or x == e for e in\n'
'y)".\n'
'\n'
'For the string and bytes types, "x in y" is true if and only if *x* '
'is\n'
'a substring of *y*. An equivalent test is "y.find(x) != -1". Empty\n'
'strings are always considered to be a substring of any other string,\n'
'so """ in "abc"" will return "True".\n'
'\n'
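'Some simple illustrations:\n'
'\n'
'   >>> 3 in [1, 2, 3]\n'
'   True\n'
"   >>> 'bc' in 'abc'\n"
'   True\n'
"   >>> '' in 'abc'\n"
'   True\n'
'\n'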
'For user-defined classes which define the "__contains__()" method, "x\n'
'in y" is true if and only if "y.__contains__(x)" is true.\n'
'\n'
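'A minimal sketch of such a class (the name is arbitrary):\n'
'\n'
'   >>> class Evens:\n'
'   ...     def __contains__(self, x):\n'
'   ...         return x % 2 == 0\n'
'   ...\n'
'   >>> 4 in Evens()\n'
'   True\n'
'\n'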
'For user-defined classes which do not define "__contains__()" but do\n'
'define "__iter__()", "x in y" is true if some value "z" with "x == z"\n'
'is produced while iterating over "y". If an exception is raised\n'
'during the iteration, it is as if "in" raised that exception.\n'
'\n'
'Lastly, the old-style iteration protocol is tried: if a class defines\n'
'"__getitem__()", "x in y" is true if and only if there is a non-\n'
'negative integer index *i* such that "x == y[i]", and all lower\n'
'integer indices do not raise an "IndexError" exception. (If any other\n'
'exception is raised, it is as if "in" raised that exception).\n'
'\n'
'The operator "not in" is defined to have the inverse true value of\n'
'"in".\n',
'integers': '\n'
'Integer literals\n'
'****************\n'
'\n'
'Integer literals are described by the following lexical '
'definitions:\n'
'\n'
' integer ::= decimalinteger | octinteger | hexinteger | '
'bininteger\n'
' decimalinteger ::= nonzerodigit digit* | "0"+\n'
' nonzerodigit ::= "1"..."9"\n'
' digit ::= "0"..."9"\n'
' octinteger ::= "0" ("o" | "O") octdigit+\n'
' hexinteger ::= "0" ("x" | "X") hexdigit+\n'
' bininteger ::= "0" ("b" | "B") bindigit+\n'
' octdigit ::= "0"..."7"\n'
' hexdigit ::= digit | "a"..."f" | "A"..."F"\n'
' bindigit ::= "0" | "1"\n'
'\n'
'There is no limit for the length of integer literals apart from '
'what\n'
'can be stored in available memory.\n'
'\n'
'Note that leading zeros in a non-zero decimal number are not '
'allowed.\n'
'This is for disambiguation with C-style octal literals, which '
'Python\n'
'used before version 3.0.\n'
'\n'
'Some examples of integer literals:\n'
'\n'
' 7 2147483647 0o177 0b100110111\n'
' 3 79228162514264337593543950336 0o377 0xdeadbeef\n',
'lambda': '\n'
'Lambdas\n'
'*******\n'
'\n'
' lambda_expr ::= "lambda" [parameter_list]: expression\n'
' lambda_expr_nocond ::= "lambda" [parameter_list]: '
'expression_nocond\n'
'\n'
'Lambda expressions (sometimes called lambda forms) are used to '
'create\n'
'anonymous functions. The expression "lambda arguments: '
'expression"\n'
'yields a function object. The unnamed object behaves like a '
'function\n'
'object defined with\n'
'\n'
' def <lambda>(arguments):\n'
' return expression\n'
'\n'
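'For example (illustrative):\n'
'\n'
'   >>> add = lambda x, y: x + y\n'
'   >>> add(2, 3)\n'
'   5\n'
'\n'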
'See section *Function definitions* for the syntax of parameter '
'lists.\n'
'Note that functions created with lambda expressions cannot '
'contain\n'
'statements or annotations.\n',
'lists': '\n'
'List displays\n'
'*************\n'
'\n'
'A list display is a possibly empty series of expressions enclosed '
'in\n'
'square brackets:\n'
'\n'
' list_display ::= "[" [expression_list | comprehension] "]"\n'
'\n'
'A list display yields a new list object, the contents being '
'specified\n'
'by either a list of expressions or a comprehension. When a comma-\n'
'separated list of expressions is supplied, its elements are '
'evaluated\n'
'from left to right and placed into the list object in that order.\n'
'When a comprehension is supplied, the list is constructed from the\n'
'elements resulting from the comprehension.\n',
'naming': '\n'
'Naming and binding\n'
'******************\n'
'\n'
'\n'
'Binding of names\n'
'================\n'
'\n'
'*Names* refer to objects. Names are introduced by name binding\n'
'operations.\n'
'\n'
'The following constructs bind names: formal parameters to '
'functions,\n'
'"import" statements, class and function definitions (these bind '
'the\n'
'class or function name in the defining block), and targets that '
'are\n'
'identifiers if occurring in an assignment, "for" loop header, or '
'after\n'
'"as" in a "with" statement or "except" clause. The "import" '
'statement\n'
'of the form "from ... import *" binds all names defined in the\n'
'imported module, except those beginning with an underscore. This '
'form\n'
'may only be used at the module level.\n'
'\n'
'A target occurring in a "del" statement is also considered bound '
'for\n'
'this purpose (though the actual semantics are to unbind the '
'name).\n'
'\n'
'Each assignment or import statement occurs within a block defined '
'by a\n'
'class or function definition or at the module level (the '
'top-level\n'
'code block).\n'
'\n'
'If a name is bound in a block, it is a local variable of that '
'block,\n'
'unless declared as "nonlocal" or "global". If a name is bound at '
'the\n'
'module level, it is a global variable. (The variables of the '
'module\n'
'code block are local and global.) If a variable is used in a '
'code\n'
'block but not defined there, it is a *free variable*.\n'
'\n'
'Each occurrence of a name in the program text refers to the '
'*binding*\n'
'of that name established by the following name resolution rules.\n'
'\n'
'\n'
'Resolution of names\n'
'===================\n'
'\n'
'A *scope* defines the visibility of a name within a block. If a '
'local\n'
'variable is defined in a block, its scope includes that block. If '
'the\n'
'definition occurs in a function block, the scope extends to any '
'blocks\n'
'contained within the defining one, unless a contained block '
'introduces\n'
'a different binding for the name.\n'
'\n'
'When a name is used in a code block, it is resolved using the '
'nearest\n'
'enclosing scope. The set of all such scopes visible to a code '
'block\n'
"is called the block's *environment*.\n"
'\n'
'When a name is not found at all, a "NameError" exception is '
'raised. If\n'
'the current scope is a function scope, and the name refers to a '
'local\n'
'variable that has not yet been bound to a value at the point where '
'the\n'
'name is used, an "UnboundLocalError" exception is raised.\n'
'"UnboundLocalError" is a subclass of "NameError".\n'
'\n'
'If a name binding operation occurs anywhere within a code block, '
'all\n'
'uses of the name within the block are treated as references to '
'the\n'
'current block. This can lead to errors when a name is used within '
'a\n'
'block before it is bound. This rule is subtle. Python lacks\n'
'declarations and allows name binding operations to occur anywhere\n'
'within a code block. The local variables of a code block can be\n'
'determined by scanning the entire text of the block for name '
'binding\n'
'operations.\n'
'\n'
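'For example (illustrative; the exact message text may vary '
'between\n'
'versions):\n'
'\n'
'   >>> x = 1\n'
'   >>> def f():\n'
'   ...     print(x)\n'
'   ...     x = 2\n'
'   ...\n'
'   >>> f()\n'
'   Traceback (most recent call last):\n'
'     ...\n'
"   UnboundLocalError: local variable 'x' referenced before assignment\n"
'\n'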
'If the "global" statement occurs within a block, all uses of the '
'name\n'
'specified in the statement refer to the binding of that name in '
'the\n'
'top-level namespace. Names are resolved in the top-level '
'namespace by\n'
'searching the global namespace, i.e. the namespace of the module\n'
'containing the code block, and the builtins namespace, the '
'namespace\n'
'of the module "builtins". The global namespace is searched '
'first. If\n'
'the name is not found there, the builtins namespace is searched. '
'The\n'
'"global" statement must precede all uses of the name.\n'
'\n'
'The "global" statement has the same scope as a name binding '
'operation\n'
'in the same block. If the nearest enclosing scope for a free '
'variable\n'
'contains a global statement, the free variable is treated as a '
'global.\n'
'\n'
'The "nonlocal" statement causes corresponding names to refer to\n'
'previously bound variables in the nearest enclosing function '
'scope.\n'
'"SyntaxError" is raised at compile time if the given name does '
'not\n'
'exist in any enclosing function scope.\n'
'\n'
'The namespace for a module is automatically created the first time '
'a\n'
'module is imported. The main module for a script is always '
'called\n'
'"__main__".\n'
'\n'
'Class definition blocks and arguments to "exec()" and "eval()" '
'are\n'
'special in the context of name resolution. A class definition is '
'an\n'
'executable statement that may use and define names. These '
'references\n'
'follow the normal rules for name resolution with an exception '
'that\n'
'unbound local variables are looked up in the global namespace. '
'The\n'
'namespace of the class definition becomes the attribute dictionary '
'of\n'
'the class. The scope of names defined in a class block is limited '
'to\n'
'the class block; it does not extend to the code blocks of methods '
'--\n'
'this includes comprehensions and generator expressions since they '
'are\n'
'implemented using a function scope. This means that the '
'following\n'
'will fail:\n'
'\n'
' class A:\n'
' a = 42\n'
' b = list(a + i for i in range(10))\n'
'\n'
'\n'
'Builtins and restricted execution\n'
'=================================\n'
'\n'
'The builtins namespace associated with the execution of a code '
'block\n'
'is actually found by looking up the name "__builtins__" in its '
'global\n'
'namespace; this should be a dictionary or a module (in the latter '
'case\n'
"the module's dictionary is used). By default, when in the "
'"__main__"\n'
'module, "__builtins__" is the built-in module "builtins"; when in '
'any\n'
'other module, "__builtins__" is an alias for the dictionary of '
'the\n'
'"builtins" module itself. "__builtins__" can be set to a '
'user-created\n'
'dictionary to create a weak form of restricted execution.\n'
'\n'
'**CPython implementation detail:** Users should not touch\n'
'"__builtins__"; it is strictly an implementation detail. Users\n'
'wanting to override values in the builtins namespace should '
'"import"\n'
'the "builtins" module and modify its attributes appropriately.\n'
'\n'
'\n'
'Interaction with dynamic features\n'
'=================================\n'
'\n'
'Name resolution of free variables occurs at runtime, not at '
'compile\n'
'time. This means that the following code will print 42:\n'
'\n'
' i = 10\n'
' def f():\n'
' print(i)\n'
' i = 42\n'
' f()\n'
'\n'
'There are several cases where Python statements are illegal when '
'used\n'
'in conjunction with nested scopes that contain free variables.\n'
'\n'
'If a variable is referenced in an enclosing scope, it is illegal '
'to\n'
'delete the name. An error will be reported at compile time.\n'
'\n'
'The "eval()" and "exec()" functions do not have access to the '
'full\n'
'environment for resolving names. Names may be resolved in the '
'local\n'
'and global namespaces of the caller. Free variables are not '
'resolved\n'
'in the nearest enclosing namespace, but in the global namespace. '
'[1]\n'
'The "exec()" and "eval()" functions have optional arguments to\n'
'override the global and local namespace. If only one namespace '
'is\n'
'specified, it is used for both.\n',
'nonlocal': '\n'
'The "nonlocal" statement\n'
'************************\n'
'\n'
' nonlocal_stmt ::= "nonlocal" identifier ("," identifier)*\n'
'\n'
'The "nonlocal" statement causes the listed identifiers to refer '
'to\n'
'previously bound variables in the nearest enclosing scope '
'excluding\n'
'globals. This is important because the default behavior for '
'binding is\n'
'to search the local namespace first. The statement allows\n'
'encapsulated code to rebind variables outside of the local '
'scope\n'
'besides the global (module) scope.\n'
'\n'
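'A minimal illustration (the names are arbitrary):\n'
'\n'
'   >>> def counter():\n'
'   ...     n = 0\n'
'   ...     def bump():\n'
'   ...         nonlocal n\n'
'   ...         n += 1\n'
'   ...         return n\n'
'   ...     return bump\n'
'   ...\n'
'   >>> c = counter()\n'
'   >>> c(), c()\n'
'   (1, 2)\n'
'\n'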
'Names listed in a "nonlocal" statement, unlike those listed in '
'a\n'
'"global" statement, must refer to pre-existing bindings in an\n'
'enclosing scope (the scope in which a new binding should be '
'created\n'
'cannot be determined unambiguously).\n'
'\n'
'Names listed in a "nonlocal" statement must not collide with '
'pre-\n'
'existing bindings in the local scope.\n'
'\n'
'See also: **PEP 3104** - Access to Names in Outer Scopes\n'
'\n'
' The specification for the "nonlocal" statement.\n',
'numbers': '\n'
'Numeric literals\n'
'****************\n'
'\n'
'There are three types of numeric literals: integers, floating '
'point\n'
'numbers, and imaginary numbers. There are no complex literals\n'
'(complex numbers can be formed by adding a real number and an\n'
'imaginary number).\n'
'\n'
'Note that numeric literals do not include a sign; a phrase like '
'"-1"\n'
'is actually an expression composed of the unary operator \'"-"\' '
'and the\n'
'literal "1".\n',
'numeric-types': '\n'
'Emulating numeric types\n'
'***********************\n'
'\n'
'The following methods can be defined to emulate numeric '
'objects.\n'
'Methods corresponding to operations that are not supported '
'by the\n'
'particular kind of number implemented (e.g., bitwise '
'operations for\n'
'non-integral numbers) should be left undefined.\n'
'\n'
'object.__add__(self, other)\n'
'object.__sub__(self, other)\n'
'object.__mul__(self, other)\n'
'object.__truediv__(self, other)\n'
'object.__floordiv__(self, other)\n'
'object.__mod__(self, other)\n'
'object.__divmod__(self, other)\n'
'object.__pow__(self, other[, modulo])\n'
'object.__lshift__(self, other)\n'
'object.__rshift__(self, other)\n'
'object.__and__(self, other)\n'
'object.__xor__(self, other)\n'
'object.__or__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "/", "//", "%", "divmod()", '
'"pow()",\n'
' "**", "<<", ">>", "&", "^", "|"). For instance, to '
'evaluate the\n'
' expression "x + y", where *x* is an instance of a class '
'that has an\n'
' "__add__()" method, "x.__add__(y)" is called. The '
'"__divmod__()"\n'
' method should be the equivalent to using '
'"__floordiv__()" and\n'
' "__mod__()"; it should not be related to '
'"__truediv__()". Note\n'
' that "__pow__()" should be defined to accept an optional '
'third\n'
' argument if the ternary version of the built-in "pow()" '
'function is\n'
' to be supported.\n'
'\n'
' If one of those methods does not support the operation '
'with the\n'
' supplied arguments, it should return "NotImplemented".\n'
'\n'
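'   A minimal sketch (the class and attribute names are '
'arbitrary):\n'
'\n'
'      >>> class Metres:\n'
'      ...     def __init__(self, v):\n'
'      ...         self.v = v\n'
'      ...     def __add__(self, other):\n'
'      ...         if not isinstance(other, Metres):\n'
'      ...             return NotImplemented\n'
'      ...         return Metres(self.v + other.v)\n'
'      ...\n'
'      >>> (Metres(1) + Metres(2)).v\n'
'      3\n'
'\n'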
'object.__radd__(self, other)\n'
'object.__rsub__(self, other)\n'
'object.__rmul__(self, other)\n'
'object.__rtruediv__(self, other)\n'
'object.__rfloordiv__(self, other)\n'
'object.__rmod__(self, other)\n'
'object.__rdivmod__(self, other)\n'
'object.__rpow__(self, other)\n'
'object.__rlshift__(self, other)\n'
'object.__rrshift__(self, other)\n'
'object.__rand__(self, other)\n'
'object.__rxor__(self, other)\n'
'object.__ror__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "/", "//", "%", "divmod()", '
'"pow()",\n'
' "**", "<<", ">>", "&", "^", "|") with reflected '
'(swapped) operands.\n'
' These functions are only called if the left operand does '
'not\n'
' support the corresponding operation and the operands are '
'of\n'
' different types. [2] For instance, to evaluate the '
'expression "x -\n'
' y", where *y* is an instance of a class that has an '
'"__rsub__()"\n'
' method, "y.__rsub__(x)" is called if "x.__sub__(y)" '
'returns\n'
' *NotImplemented*.\n'
'\n'
' Note that ternary "pow()" will not try calling '
'"__rpow__()" (the\n'
' coercion rules would become too complicated).\n'
'\n'
" Note: If the right operand's type is a subclass of the "
'left\n'
" operand's type and that subclass provides the "
'reflected method\n'
' for the operation, this method will be called before '
'the left\n'
" operand's non-reflected method. This behavior allows "
'subclasses\n'
" to override their ancestors' operations.\n"
'\n'
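'   For instance (an illustrative sketch; the class name is '
'arbitrary):\n'
'\n'
'      >>> class R:\n'
'      ...     def __radd__(self, other):\n'
"      ...         return 'radd called'\n"
'      ...\n'
'      >>> 1 + R()\n'
"      'radd called'\n"
'\n'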
'object.__iadd__(self, other)\n'
'object.__isub__(self, other)\n'
'object.__imul__(self, other)\n'
'object.__itruediv__(self, other)\n'
'object.__ifloordiv__(self, other)\n'
'object.__imod__(self, other)\n'
'object.__ipow__(self, other[, modulo])\n'
'object.__ilshift__(self, other)\n'
'object.__irshift__(self, other)\n'
'object.__iand__(self, other)\n'
'object.__ixor__(self, other)\n'
'object.__ior__(self, other)\n'
'\n'
' These methods are called to implement the augmented '
'arithmetic\n'
' assignments ("+=", "-=", "*=", "/=", "//=", "%=", "**=", '
'"<<=",\n'
' ">>=", "&=", "^=", "|="). These methods should attempt '
'to do the\n'
' operation in-place (modifying *self*) and return the '
'result (which\n'
' could be, but does not have to be, *self*). If a '
'specific method\n'
' is not defined, the augmented assignment falls back to '
'the normal\n'
' methods. For instance, if *x* is an instance of a class '
'with an\n'
' "__iadd__()" method, "x += y" is equivalent to "x = '
'x.__iadd__(y)"\n'
' . Otherwise, "x.__add__(y)" and "y.__radd__(x)" are '
'considered, as\n'
' with the evaluation of "x + y". In certain situations, '
'augmented\n'
' assignment can result in unexpected errors (see *Why '
'does\n'
" a_tuple[i] += ['item'] raise an exception when the "
'addition\n'
' works?*), but this behavior is in fact part of the data '
'model.\n'
'\n'
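'   For example, "list.__iadd__()" mutates the list in place:\n'
'\n'
'      >>> a = [1]\n'
'      >>> b = a\n'
'      >>> a += [2]\n'
'      >>> b\n'
'      [1, 2]\n'
'\n'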
'object.__neg__(self)\n'
'object.__pos__(self)\n'
'object.__abs__(self)\n'
'object.__invert__(self)\n'
'\n'
' Called to implement the unary arithmetic operations '
'("-", "+",\n'
' "abs()" and "~").\n'
'\n'
'object.__complex__(self)\n'
'object.__int__(self)\n'
'object.__float__(self)\n'
'object.__round__(self[, n])\n'
'\n'
' Called to implement the built-in functions "complex()", '
'"int()",\n'
' "float()" and "round()". Should return a value of the '
'appropriate\n'
' type.\n'
'\n'
'object.__index__(self)\n'
'\n'
' Called to implement "operator.index()", and whenever '
'Python needs\n'
' to losslessly convert the numeric object to an integer '
'object (such\n'
' as in slicing, or in the built-in "bin()", "hex()" and '
'"oct()"\n'
' functions). Presence of this method indicates that the '
'numeric\n'
' object is an integer type. Must return an integer.\n'
'\n'
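'   An illustrative sketch (the class name is arbitrary):\n'
'\n'
'      >>> class Idx:\n'
'      ...     def __index__(self):\n'
'      ...         return 3\n'
'      ...\n'
"      >>> 'abcdef'[Idx()]\n"
"      'd'\n"
'\n'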
' Note: In order to have a coherent integer type class, '
'when\n'
' "__index__()" is defined "__int__()" should also be '
'defined, and\n'
' both should return the same value.\n',
'objects': '\n'
'Objects, values and types\n'
'*************************\n'
'\n'
"*Objects* are Python's abstraction for data. All data in a "
'Python\n'
'program is represented by objects or by relations between '
'objects. (In\n'
'a sense, and in conformance to Von Neumann\'s model of a "stored\n'
'program computer," code is also represented by objects.)\n'
'\n'
"Every object has an identity, a type and a value. An object's\n"
'*identity* never changes once it has been created; you may think '
'of it\n'
'as the object\'s address in memory. The \'"is"\' operator '
'compares the\n'
'identity of two objects; the "id()" function returns an integer\n'
'representing its identity.\n'
'\n'
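'For example (illustrative):\n'
'\n'
'   >>> a = [1]\n'
'   >>> b = a\n'
'   >>> a is b\n'
'   True\n'
'   >>> id(a) == id(b)\n'
'   True\n'
'\n'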
'**CPython implementation detail:** For CPython, "id(x)" is the '
'memory\n'
'address where "x" is stored.\n'
'\n'
"An object's type determines the operations that the object "
'supports\n'
'(e.g., "does it have a length?") and also defines the possible '
'values\n'
'for objects of that type. The "type()" function returns an '
"object's\n"
'type (which is an object itself). Like its identity, an '
"object's\n"
'*type* is also unchangeable. [1]\n'
'\n'
'The *value* of some objects can change. Objects whose value can\n'
'change are said to be *mutable*; objects whose value is '
'unchangeable\n'
'once they are created are called *immutable*. (The value of an\n'
'immutable container object that contains a reference to a '
'mutable\n'
"object can change when the latter's value is changed; however "
'the\n'
'container is still considered immutable, because the collection '
'of\n'
'objects it contains cannot be changed. So, immutability is not\n'
'strictly the same as having an unchangeable value, it is more '
'subtle.)\n'
"An object's mutability is determined by its type; for instance,\n"
'numbers, strings and tuples are immutable, while dictionaries '
'and\n'
'lists are mutable.\n'
'\n'
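'For example, a tuple is immutable even though a list it contains '
'can\n'
'be changed (illustrative):\n'
'\n'
'   >>> t = (1, [])\n'
"   >>> t[1].append('x')\n"
'   >>> t\n'
"   (1, ['x'])\n"
'\n'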
'Objects are never explicitly destroyed; however, when they '
'become\n'
'unreachable they may be garbage-collected. An implementation is\n'
'allowed to postpone garbage collection or omit it altogether --- '
'it is\n'
'a matter of implementation quality how garbage collection is\n'
'implemented, as long as no objects are collected that are still\n'
'reachable.\n'
'\n'
'**CPython implementation detail:** CPython currently uses a '
'reference-\n'
'counting scheme with (optional) delayed detection of cyclically '
'linked\n'
'garbage, which collects most objects as soon as they become\n'
'unreachable, but is not guaranteed to collect garbage containing\n'
'circular references. See the documentation of the "gc" module '
'for\n'
'information on controlling the collection of cyclic garbage. '
'Other\n'
'implementations act differently and CPython may change. Do not '
'depend\n'
'on immediate finalization of objects when they become unreachable '
'(so\n'
'you should always close files explicitly).\n'
'\n'
"Note that the use of the implementation's tracing or debugging\n"
'facilities may keep objects alive that would normally be '
'collectable.\n'
'Also note that catching an exception with a \'"try"..."except"\'\n'
'statement may keep objects alive.\n'
'\n'
'Some objects contain references to "external" resources such as '
'open\n'
'files or windows. It is understood that these resources are '
'freed\n'
'when the object is garbage-collected, but since garbage '
'collection is\n'
'not guaranteed to happen, such objects also provide an explicit '
'way to\n'
'release the external resource, usually a "close()" method. '
'Programs\n'
'are strongly recommended to explicitly close such objects. The\n'
'\'"try"..."finally"\' statement and the \'"with"\' statement '
'provide\n'
'convenient ways to do this.\n'
'\n'
'Some objects contain references to other objects; these are '
'called\n'
'*containers*. Examples of containers are tuples, lists and\n'
"dictionaries. The references are part of a container's value. "
'In\n'
'most cases, when we talk about the value of a container, we imply '
'the\n'
'values, not the identities of the contained objects; however, '
'when we\n'
'talk about the mutability of a container, only the identities of '
'the\n'
'immediately contained objects are implied. So, if an immutable\n'
'container (like a tuple) contains a reference to a mutable '
'object, its\n'
'value changes if that mutable object is changed.\n'
'\n'
'Types affect almost all aspects of object behavior. Even the\n'
'importance of object identity is affected in some sense: for '
'immutable\n'
'types, operations that compute new values may actually return a\n'
'reference to any existing object with the same type and value, '
'while\n'
'for mutable objects this is not allowed. E.g., after "a = 1; b = '
'1",\n'
'"a" and "b" may or may not refer to the same object with the '
'value\n'
'one, depending on the implementation, but after "c = []; d = []", '
'"c"\n'
'and "d" are guaranteed to refer to two different, unique, newly\n'
'created empty lists. (Note that "c = d = []" assigns the same '
'object\n'
'to both "c" and "d".)\n',
'operator-summary': '\n'
'Operator precedence\n'
'*******************\n'
'\n'
'The following table summarizes the operator precedence '
'in Python, from\n'
'lowest precedence (least binding) to highest precedence '
'(most\n'
'binding). Operators in the same box have the same '
'precedence. Unless\n'
'the syntax is explicitly given, operators are binary. '
'Operators in\n'
'the same box group left to right (except for '
'exponentiation, which\n'
'groups from right to left).\n'
'\n'
'Note that comparisons, membership tests, and identity '
'tests, all have\n'
'the same precedence and have a left-to-right chaining '
'feature as\n'
'described in the *Comparisons* section.\n'
'\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| Operator | '
'Description |\n'
'+=================================================+=======================================+\n'
'| "lambda" | '
'Lambda expression |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "if" -- "else" | '
'Conditional expression |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "or" | '
'Boolean OR |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "and" | '
'Boolean AND |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "not" "x" | '
'Boolean NOT |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "in", "not in", "is", "is not", "<", "<=", ">", | '
'Comparisons, including membership |\n'
'| ">=", "!=", "==" | '
'tests and identity tests |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "|" | '
'Bitwise OR |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "^" | '
'Bitwise XOR |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "&" | '
'Bitwise AND |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "<<", ">>" | '
'Shifts |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "+", "-" | '
'Addition and subtraction |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "*", "/", "//", "%" | '
'Multiplication, division, remainder |\n'
'| | '
'[5] |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "+x", "-x", "~x" | '
'Positive, negative, bitwise NOT |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "**" | '
'Exponentiation [6] |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "x[index]", "x[index:index]", | '
'Subscription, slicing, call, |\n'
'| "x(arguments...)", "x.attribute" | '
'attribute reference |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'| "(expressions...)", "[expressions...]", "{key: | '
'Binding or tuple display, list |\n'
'| value...}", "{expressions...}" | '
'display, dictionary display, set |\n'
'| | '
'display |\n'
'+-------------------------------------------------+---------------------------------------+\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] While "abs(x%y) < abs(y)" is true mathematically, '
'for floats\n'
' it may not be true numerically due to roundoff. For '
'example, and\n'
' assuming a platform on which a Python float is an '
'IEEE 754 double-\n'
' precision number, in order that "-1e-100 % 1e100" '
'have the same\n'
' sign as "1e100", the computed result is "-1e-100 + '
'1e100", which\n'
' is numerically exactly equal to "1e100". The '
'function\n'
' "math.fmod()" returns a result whose sign matches '
'the sign of the\n'
' first argument instead, and so returns "-1e-100" in '
'this case.\n'
' Which approach is more appropriate depends on the '
'application.\n'
'\n'
'[2] If x is very close to an exact integer multiple of '
"y, it's\n"
' possible for "x//y" to be one larger than '
'"(x-x%y)//y" due to\n'
' rounding. In such cases, Python returns the latter '
'result, in\n'
' order to preserve that "divmod(x,y)[0] * y + x % y" '
'be very close\n'
' to "x".\n'
'\n'
'[3] The Unicode standard distinguishes between *code '
'points* (e.g.\n'
' U+0041) and *abstract characters* (e.g. "LATIN '
'CAPITAL LETTER A").\n'
' While most abstract characters in Unicode are only '
'represented\n'
'    using one code point, there are a number of abstract '
'characters\n'
' that can in addition be represented using a sequence '
'of more than\n'
' one code point. For example, the abstract character '
'"LATIN\n'
' CAPITAL LETTER C WITH CEDILLA" can be represented as '
'a single\n'
' *precomposed character* at code position U+00C7, or '
'as a sequence\n'
' of a *base character* at code position U+0043 (LATIN '
'CAPITAL\n'
' LETTER C), followed by a *combining character* at '
'code position\n'
' U+0327 (COMBINING CEDILLA).\n'
'\n'
' The comparison operators on strings compare at the '
'level of\n'
' Unicode code points. This may be counter-intuitive '
'to humans. For\n'
' example, ""\\u00C7" == "\\u0043\\u0327"" is "False", '
'even though both\n'
' strings represent the same abstract character "LATIN '
'CAPITAL\n'
' LETTER C WITH CEDILLA".\n'
'\n'
' To compare strings at the level of abstract '
'characters (that is,\n'
' in a way intuitive to humans), use '
'"unicodedata.normalize()".\n'
'\n'
'[4] Due to automatic garbage-collection, free lists, and '
'the\n'
' dynamic nature of descriptors, you may notice '
'seemingly unusual\n'
' behaviour in certain uses of the "is" operator, like '
'those\n'
' involving comparisons between instance methods, or '
'constants.\n'
' Check their documentation for more info.\n'
'\n'
'[5] The "%" operator is also used for string formatting; '
'the same\n'
' precedence applies.\n'
'\n'
'[6] The power operator "**" binds less tightly than an '
'arithmetic\n'
' or bitwise unary operator on its right, that is, '
'"2**-1" is "0.5".\n',
'pass': '\n'
'The "pass" statement\n'
'********************\n'
'\n'
' pass_stmt ::= "pass"\n'
'\n'
'"pass" is a null operation --- when it is executed, nothing '
'happens.\n'
'It is useful as a placeholder when a statement is required\n'
'syntactically, but no code needs to be executed, for example:\n'
'\n'
' def f(arg): pass # a function that does nothing (yet)\n'
'\n'
' class C: pass # a class with no methods (yet)\n',
'power': '\n'
'The power operator\n'
'******************\n'
'\n'
'The power operator binds more tightly than unary operators on its\n'
'left; it binds less tightly than unary operators on its right. '
'The\n'
'syntax is:\n'
'\n'
' power ::= primary ["**" u_expr]\n'
'\n'
'Thus, in an unparenthesized sequence of power and unary operators, '
'the\n'
'operators are evaluated from right to left (this does not '
'constrain\n'
'the evaluation order for the operands): "-1**2" results in "-1".\n'
'\n'
'The power operator has the same semantics as the built-in "pow()"\n'
'function, when called with two arguments: it yields its left '
'argument\n'
'raised to the power of its right argument. The numeric arguments '
'are\n'
'first converted to a common type, and the result is of that type.\n'
'\n'
'For int operands, the result has the same type as the operands '
'unless\n'
'the second argument is negative; in that case, all arguments are\n'
'converted to float and a float result is delivered. For example,\n'
'"10**2" returns "100", but "10**-2" returns "0.01".\n'
'\n'
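'For example (an illustrative interactive sketch):\n'
'\n'
'   >>> 2 ** 3 ** 2      # grouped right to left: 2 ** (3 ** 2)\n'
'   512\n'
'   >>> -1 ** 2          # the unary minus applies afterwards\n'
'   -1\n'
'   >>> pow(10, -2)      # same semantics as "10 ** -2"\n'
'   0.01\n'
'\n'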
'Raising "0.0" to a negative power results in a '
'"ZeroDivisionError".\n'
'Raising a negative number to a fractional power results in a '
'"complex"\n'
'number. (In earlier versions it raised a "ValueError".)\n',
'raise': '\n'
'The "raise" statement\n'
'*********************\n'
'\n'
' raise_stmt ::= "raise" [expression ["from" expression]]\n'
'\n'
'If no expressions are present, "raise" re-raises the last '
'exception\n'
'that was active in the current scope. If no exception is active '
'in\n'
'the current scope, a "RuntimeError" exception is raised indicating\n'
'that this is an error.\n'
'\n'
'Otherwise, "raise" evaluates the first expression as the exception\n'
'object. It must be either a subclass or an instance of\n'
'"BaseException". If it is a class, the exception instance will be\n'
'obtained when needed by instantiating the class with no arguments.\n'
'\n'
"The *type* of the exception is the exception instance's class, the\n"
'*value* is the instance itself.\n'
'\n'
'A traceback object is normally created automatically when an '
'exception\n'
'is raised and attached to it as the "__traceback__" attribute, '
'which\n'
'is writable. You can create an exception and set your own traceback '
'in\n'
'one step using the "with_traceback()" exception method (which '
'returns\n'
'the same exception instance, with its traceback set to its '
'argument),\n'
'like so:\n'
'\n'
' raise Exception("foo occurred").with_traceback(tracebackobj)\n'
'\n'
'The "from" clause is used for exception chaining: if given, the '
'second\n'
'*expression* must be another exception class or instance, which '
'will\n'
'then be attached to the raised exception as the "__cause__" '
'attribute\n'
'(which is writable). If the raised exception is not handled, both\n'
'exceptions will be printed:\n'
'\n'
' >>> try:\n'
' ... print(1 / 0)\n'
' ... except Exception as exc:\n'
' ... raise RuntimeError("Something bad happened") from exc\n'
' ...\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 2, in <module>\n'
' ZeroDivisionError: int division or modulo by zero\n'
'\n'
' The above exception was the direct cause of the following '
'exception:\n'
'\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 4, in <module>\n'
' RuntimeError: Something bad happened\n'
'\n'
'A similar mechanism works implicitly if an exception is raised '
'inside\n'
'an exception handler or a "finally" clause: the previous exception '
'is\n'
'then attached as the new exception\'s "__context__" attribute:\n'
'\n'
' >>> try:\n'
' ... print(1 / 0)\n'
' ... except:\n'
' ... raise RuntimeError("Something bad happened")\n'
' ...\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 2, in <module>\n'
' ZeroDivisionError: int division or modulo by zero\n'
'\n'
' During handling of the above exception, another exception '
'occurred:\n'
'\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 4, in <module>\n'
' RuntimeError: Something bad happened\n'
'\n'
'Additional information on exceptions can be found in section\n'
'*Exceptions*, and information about handling exceptions is in '
'section\n'
'*The try statement*.\n',
'return': '\n'
'The "return" statement\n'
'**********************\n'
'\n'
' return_stmt ::= "return" [expression_list]\n'
'\n'
'"return" may only occur syntactically nested in a function '
'definition,\n'
'not within a nested class definition.\n'
'\n'
'If an expression list is present, it is evaluated, else "None" is\n'
'substituted.\n'
'\n'
'"return" leaves the current function call with the expression list '
'(or\n'
'"None") as return value.\n'
'\n'
'When "return" passes control out of a "try" statement with a '
'"finally"\n'
'clause, that "finally" clause is executed before really leaving '
'the\n'
'function.\n'
'\n'
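'For example, in this illustrative sketch (the function name "f"\n'
'is arbitrary) the "finally" clause runs before the function\n'
'actually returns:\n'
'\n'
'   def f():\n'
'       try:\n'
'           return "from try"\n'
'       finally:\n'
'           print("finally runs first")\n'
'\n'
'   >>> f()\n'
'   finally runs first\n'
"   'from try'\n"
'\n'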
'In a generator function, the "return" statement indicates that '
'the\n'
'generator is done and will cause "StopIteration" to be raised. '
'The\n'
'returned value (if any) is used as an argument to construct\n'
'"StopIteration" and becomes the "StopIteration.value" attribute.\n',
'sequence-types': '\n'
'Emulating container types\n'
'*************************\n'
'\n'
'The following methods can be defined to implement '
'container objects.\n'
'Containers usually are sequences (such as lists or tuples) '
'or mappings\n'
'(like dictionaries), but can represent other containers as '
'well. The\n'
'first set of methods is used either to emulate a sequence '
'or to\n'
'emulate a mapping; the difference is that for a sequence, '
'the\n'
'allowable keys should be the integers *k* for which "0 <= '
'k < N" where\n'
'*N* is the length of the sequence, or slice objects, which '
'define a\n'
'range of items. It is also recommended that mappings '
'provide the\n'
'methods "keys()", "values()", "items()", "get()", '
'"clear()",\n'
'"setdefault()", "pop()", "popitem()", "copy()", and '
'"update()"\n'
"behaving similar to those for Python's standard dictionary "
'objects.\n'
'The "collections" module provides a "MutableMapping" '
'abstract base\n'
'class to help create those methods from a base set of '
'"__getitem__()",\n'
'"__setitem__()", "__delitem__()", and "keys()". Mutable '
'sequences\n'
'should provide methods "append()", "count()", "index()", '
'"extend()",\n'
'"insert()", "pop()", "remove()", "reverse()" and "sort()", '
'like Python\n'
'standard list objects. Finally, sequence types should '
'implement\n'
'addition (meaning concatenation) and multiplication '
'(meaning\n'
'repetition) by defining the methods "__add__()", '
'"__radd__()",\n'
'"__iadd__()", "__mul__()", "__rmul__()" and "__imul__()" '
'described\n'
'below; they should not define other numerical operators. '
'It is\n'
'recommended that both mappings and sequences implement '
'the\n'
'"__contains__()" method to allow efficient use of the "in" '
'operator;\n'
'for mappings, "in" should search the mapping\'s keys; for '
'sequences, it\n'
'should search through the values. It is further '
'recommended that both\n'
'mappings and sequences implement the "__iter__()" method '
'to allow\n'
'efficient iteration through the container; for mappings, '
'"__iter__()"\n'
'should be the same as "keys()"; for sequences, it should '
'iterate\n'
'through the values.\n'
'\n'
'object.__len__(self)\n'
'\n'
' Called to implement the built-in function "len()". '
'Should return\n'
' the length of the object, an integer ">=" 0. Also, an '
'object that\n'
' doesn\'t define a "__bool__()" method and whose '
'"__len__()" method\n'
' returns zero is considered to be false in a Boolean '
'context.\n'
'\n'
'object.__length_hint__(self)\n'
'\n'
' Called to implement "operator.length_hint()". Should '
'return an\n'
' estimated length for the object (which may be greater '
'or less than\n'
' the actual length). The length must be an integer ">=" '
'0. This\n'
' method is purely an optimization and is never required '
'for\n'
' correctness.\n'
'\n'
' New in version 3.4.\n'
'\n'
'Note: Slicing is done exclusively with the following three '
'methods.\n'
' A call like\n'
'\n'
' a[1:2] = b\n'
'\n'
' is translated to\n'
'\n'
' a[slice(1, 2, None)] = b\n'
'\n'
' and so forth. Missing slice items are always filled in '
'with "None".\n'
'\n'
'object.__getitem__(self, key)\n'
'\n'
' Called to implement evaluation of "self[key]". For '
'sequence types,\n'
' the accepted keys should be integers and slice '
'objects. Note that\n'
' the special interpretation of negative indexes (if the '
'class wishes\n'
' to emulate a sequence type) is up to the '
'"__getitem__()" method. If\n'
' *key* is of an inappropriate type, "TypeError" may be '
'raised; if of\n'
' a value outside the set of indexes for the sequence '
'(after any\n'
' special interpretation of negative values), '
'"IndexError" should be\n'
' raised. For mapping types, if *key* is missing (not in '
'the\n'
' container), "KeyError" should be raised.\n'
'\n'
' Note: "for" loops expect that an "IndexError" will be '
'raised for\n'
' illegal indexes to allow proper detection of the end '
'of the\n'
' sequence.\n'
'\n'
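'   For example, a minimal illustrative sequence (the class name\n'
'   "Squares" is hypothetical) that supports "len()", indexing and\n'
'   iteration via the old "__getitem__()" protocol:\n'
'\n'
'      class Squares:\n'
'          def __init__(self, n):\n'
'              self._n = n\n'
'          def __len__(self):\n'
'              return self._n\n'
'          def __getitem__(self, i):\n'
'              # raising IndexError lets "for" detect the end\n'
'              if not 0 <= i < self._n:\n'
'                  raise IndexError(i)\n'
'              return i * i\n'
'\n'
'      >>> list(Squares(4))\n'
'      [0, 1, 4, 9]\n'
'\n'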
'object.__missing__(self, key)\n'
'\n'
'   Called by "dict.__getitem__()" to implement '
'"self[key]" for dict\n'
' subclasses when key is not in the dictionary.\n'
'\n'
'object.__setitem__(self, key, value)\n'
'\n'
' Called to implement assignment to "self[key]". Same '
'note as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support changes to the values for keys, or '
'if new keys\n'
' can be added, or for sequences if elements can be '
'replaced. The\n'
' same exceptions should be raised for improper *key* '
'values as for\n'
' the "__getitem__()" method.\n'
'\n'
'object.__delitem__(self, key)\n'
'\n'
' Called to implement deletion of "self[key]". Same note '
'as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support removal of keys, or for sequences '
'if elements\n'
' can be removed from the sequence. The same exceptions '
'should be\n'
' raised for improper *key* values as for the '
'"__getitem__()" method.\n'
'\n'
'object.__iter__(self)\n'
'\n'
' This method is called when an iterator is required for '
'a container.\n'
' This method should return a new iterator object that '
'can iterate\n'
' over all the objects in the container. For mappings, '
'it should\n'
' iterate over the keys of the container.\n'
'\n'
' Iterator objects also need to implement this method; '
'they are\n'
' required to return themselves. For more information on '
'iterator\n'
' objects, see *Iterator Types*.\n'
'\n'
'object.__reversed__(self)\n'
'\n'
' Called (if present) by the "reversed()" built-in to '
'implement\n'
' reverse iteration. It should return a new iterator '
'object that\n'
' iterates over all the objects in the container in '
'reverse order.\n'
'\n'
' If the "__reversed__()" method is not provided, the '
'"reversed()"\n'
' built-in will fall back to using the sequence protocol '
'("__len__()"\n'
' and "__getitem__()"). Objects that support the '
'sequence protocol\n'
' should only provide "__reversed__()" if they can '
'provide an\n'
' implementation that is more efficient than the one '
'provided by\n'
' "reversed()".\n'
'\n'
'The membership test operators ("in" and "not in") are '
'normally\n'
'implemented as an iteration through a sequence. However, '
'container\n'
'objects can supply the following special method with a '
'more efficient\n'
'implementation, which also does not require the object be '
'a sequence.\n'
'\n'
'object.__contains__(self, item)\n'
'\n'
' Called to implement membership test operators. Should '
'return true\n'
' if *item* is in *self*, false otherwise. For mapping '
'objects, this\n'
' should consider the keys of the mapping rather than the '
'values or\n'
' the key-item pairs.\n'
'\n'
' For objects that don\'t define "__contains__()", the '
'membership test\n'
' first tries iteration via "__iter__()", then the old '
'sequence\n'
' iteration protocol via "__getitem__()", see *this '
'section in the\n'
' language reference*.\n',
'shifting': '\n'
'Shifting operations\n'
'*******************\n'
'\n'
'The shifting operations have lower priority than the arithmetic\n'
'operations:\n'
'\n'
' shift_expr ::= a_expr | shift_expr ( "<<" | ">>" ) a_expr\n'
'\n'
'These operators accept integers as arguments. They shift the '
'first\n'
'argument to the left or right by the number of bits given by '
'the\n'
'second argument.\n'
'\n'
'A right shift by *n* bits is defined as floor division by '
'"pow(2,n)".\n'
'A left shift by *n* bits is defined as multiplication with '
'"pow(2,n)".\n'
'\n'
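'For example (illustrative):\n'
'\n'
'   >>> 5 << 2           # 5 * pow(2, 2)\n'
'   20\n'
'   >>> -7 >> 1          # floor division by pow(2, 1)\n'
'   -4\n'
'\n'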
'Note: In the current implementation, the right-hand operand is\n'
' required to be at most "sys.maxsize". If the right-hand '
'operand is\n'
' larger than "sys.maxsize" an "OverflowError" exception is '
'raised.\n',
'slicings': '\n'
'Slicings\n'
'********\n'
'\n'
'A slicing selects a range of items in a sequence object (e.g., '
'a\n'
'string, tuple or list). Slicings may be used as expressions or '
'as\n'
'targets in assignment or "del" statements. The syntax for a '
'slicing:\n'
'\n'
' slicing ::= primary "[" slice_list "]"\n'
' slice_list ::= slice_item ("," slice_item)* [","]\n'
' slice_item ::= expression | proper_slice\n'
' proper_slice ::= [lower_bound] ":" [upper_bound] [ ":" '
'[stride] ]\n'
' lower_bound ::= expression\n'
' upper_bound ::= expression\n'
' stride ::= expression\n'
'\n'
'There is ambiguity in the formal syntax here: anything that '
'looks like\n'
'an expression list also looks like a slice list, so any '
'subscription\n'
'can be interpreted as a slicing. Rather than further '
'complicating the\n'
'syntax, this is disambiguated by defining that in this case the\n'
'interpretation as a subscription takes priority over the\n'
'interpretation as a slicing (this is the case if the slice list\n'
'contains no proper slice).\n'
'\n'
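'For example (an illustrative sketch of the disambiguation):\n'
'\n'
'   >>> "abcde"[1:3]          # contains a proper slice: a slicing\n'
"   'bc'\n"
'   >>> {(1, 2): "x"}[1, 2]   # no proper slice: a subscription\n'
"   'x'\n"
'\n'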
'The semantics for a slicing are as follows. The primary is '
'indexed\n'
'(using the same "__getitem__()" method as normal subscription) '
'with a\n'
'key that is constructed from the slice list, as follows. If the '
'slice\n'
'list contains at least one comma, the key is a tuple containing '
'the\n'
'conversion of the slice items; otherwise, the conversion of the '
'lone\n'
'slice item is the key. The conversion of a slice item that is '
'an\n'
'expression is that expression. The conversion of a proper slice '
'is a\n'
'slice object (see section *The standard type hierarchy*) whose\n'
'"start", "stop" and "step" attributes are the values of the\n'
'expressions given as lower bound, upper bound and stride,\n'
'respectively, substituting "None" for missing expressions.\n',
'specialattrs': '\n'
'Special Attributes\n'
'******************\n'
'\n'
'The implementation adds a few special read-only attributes '
'to several\n'
'object types, where they are relevant. Some of these are '
'not reported\n'
'by the "dir()" built-in function.\n'
'\n'
'object.__dict__\n'
'\n'
' A dictionary or other mapping object used to store an '
"object's\n"
' (writable) attributes.\n'
'\n'
'instance.__class__\n'
'\n'
' The class to which a class instance belongs.\n'
'\n'
'class.__bases__\n'
'\n'
' The tuple of base classes of a class object.\n'
'\n'
'class.__name__\n'
'\n'
' The name of the class or type.\n'
'\n'
'class.__qualname__\n'
'\n'
' The *qualified name* of the class or type.\n'
'\n'
' New in version 3.3.\n'
'\n'
'class.__mro__\n'
'\n'
' This attribute is a tuple of classes that are considered '
'when\n'
' looking for base classes during method resolution.\n'
'\n'
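'   For example:\n'
'\n'
'      >>> bool.__mro__\n'
"      (<class 'bool'>, <class 'int'>, <class 'object'>)\n"
'\n'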
'class.mro()\n'
'\n'
' This method can be overridden by a metaclass to customize '
'the\n'
' method resolution order for its instances. It is called '
'at class\n'
' instantiation, and its result is stored in "__mro__".\n'
'\n'
'class.__subclasses__()\n'
'\n'
' Each class keeps a list of weak references to its '
'immediate\n'
' subclasses. This method returns a list of all those '
'references\n'
' still alive. Example:\n'
'\n'
' >>> int.__subclasses__()\n'
" [<class 'bool'>]\n"
'\n'
'-[ Footnotes ]-\n'
'\n'
'[1] Additional information on these special methods may be '
'found\n'
' in the Python Reference Manual (*Basic customization*).\n'
'\n'
'[2] As a consequence, the list "[1, 2]" is considered equal '
'to\n'
' "[1.0, 2.0]", and similarly for tuples.\n'
'\n'
"[3] They must have since the parser can't tell the type of "
'the\n'
' operands.\n'
'\n'
'[4] Cased characters are those with general category '
'property\n'
' being one of "Lu" (Letter, uppercase), "Ll" (Letter, '
'lowercase),\n'
' or "Lt" (Letter, titlecase).\n'
'\n'
'[5] To format only a tuple you should therefore provide a\n'
' singleton tuple whose only element is the tuple to be '
'formatted.\n',
'specialnames': '\n'
'Special method names\n'
'********************\n'
'\n'
'A class can implement certain operations that are invoked by '
'special\n'
'syntax (such as arithmetic operations or subscripting and '
'slicing) by\n'
"defining methods with special names. This is Python's "
'approach to\n'
'*operator overloading*, allowing classes to define their own '
'behavior\n'
'with respect to language operators. For instance, if a '
'class defines\n'
'a method named "__getitem__()", and "x" is an instance of '
'this class,\n'
'then "x[i]" is roughly equivalent to "type(x).__getitem__(x, '
'i)".\n'
'Except where mentioned, attempts to execute an operation '
'raise an\n'
'exception when no appropriate method is defined (typically\n'
'"AttributeError" or "TypeError").\n'
'\n'
'When implementing a class that emulates any built-in type, '
'it is\n'
'important that the emulation only be implemented to the '
'degree that it\n'
'makes sense for the object being modelled. For example, '
'some\n'
'sequences may work well with retrieval of individual '
'elements, but\n'
'extracting a slice may not make sense. (One example of this '
'is the\n'
'"NodeList" interface in the W3C\'s Document Object Model.)\n'
'\n'
'\n'
'Basic customization\n'
'===================\n'
'\n'
'object.__new__(cls[, ...])\n'
'\n'
' Called to create a new instance of class *cls*. '
'"__new__()" is a\n'
' static method (special-cased so you need not declare it '
'as such)\n'
' that takes the class of which an instance was requested '
'as its\n'
' first argument. The remaining arguments are those passed '
'to the\n'
' object constructor expression (the call to the class). '
'The return\n'
' value of "__new__()" should be the new object instance '
'(usually an\n'
' instance of *cls*).\n'
'\n'
' Typical implementations create a new instance of the '
'class by\n'
' invoking the superclass\'s "__new__()" method using\n'
' "super(currentclass, cls).__new__(cls[, ...])" with '
'appropriate\n'
' arguments and then modifying the newly-created instance '
'as\n'
' necessary before returning it.\n'
'\n'
' If "__new__()" returns an instance of *cls*, then the '
'new\n'
' instance\'s "__init__()" method will be invoked like\n'
' "__init__(self[, ...])", where *self* is the new instance '
'and the\n'
' remaining arguments are the same as were passed to '
'"__new__()".\n'
'\n'
' If "__new__()" does not return an instance of *cls*, then '
'the new\n'
' instance\'s "__init__()" method will not be invoked.\n'
'\n'
' "__new__()" is intended mainly to allow subclasses of '
'immutable\n'
' types (like int, str, or tuple) to customize instance '
'creation. It\n'
' is also commonly overridden in custom metaclasses in '
'order to\n'
' customize class creation.\n'
'\n'
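'   For example, a minimal illustrative subclass of an immutable\n'
'   type (the class name "UpperStr" is hypothetical) that\n'
'   normalizes its value at creation time:\n'
'\n'
'      class UpperStr(str):\n'
'          def __new__(cls, value=""):\n'
'              # customise creation, since str is immutable\n'
'              return super().__new__(cls, str(value).upper())\n'
'\n'
'      >>> UpperStr("abc")\n'
"      'ABC'\n"
'\n'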
'object.__init__(self[, ...])\n'
'\n'
' Called after the instance has been created (by '
'"__new__()"), but\n'
' before it is returned to the caller. The arguments are '
'those\n'
' passed to the class constructor expression. If a base '
'class has an\n'
' "__init__()" method, the derived class\'s "__init__()" '
'method, if\n'
' any, must explicitly call it to ensure proper '
'initialization of the\n'
' base class part of the instance; for example:\n'
' "BaseClass.__init__(self, [args...])".\n'
'\n'
' Because "__new__()" and "__init__()" work together in '
'constructing\n'
' objects ("__new__()" to create it, and "__init__()" to '
'customise\n'
' it), no non-"None" value may be returned by "__init__()"; '
'doing so\n'
' will cause a "TypeError" to be raised at runtime.\n'
'\n'
'object.__del__(self)\n'
'\n'
' Called when the instance is about to be destroyed. This '
'is also\n'
' called a destructor. If a base class has a "__del__()" '
'method, the\n'
' derived class\'s "__del__()" method, if any, must '
'explicitly call it\n'
' to ensure proper deletion of the base class part of the '
'instance.\n'
' Note that it is possible (though not recommended!) for '
'the\n'
' "__del__()" method to postpone destruction of the '
'instance by\n'
' creating a new reference to it. It may then be called at '
'a later\n'
' time when this new reference is deleted. It is not '
'guaranteed that\n'
' "__del__()" methods are called for objects that still '
'exist when\n'
' the interpreter exits.\n'
'\n'
' Note: "del x" doesn\'t directly call "x.__del__()" --- '
'the former\n'
' decrements the reference count for "x" by one, and the '
'latter is\n'
' only called when "x"\'s reference count reaches zero. '
'Some common\n'
' situations that may prevent the reference count of an '
'object from\n'
' going to zero include: circular references between '
'objects (e.g.,\n'
' a doubly-linked list or a tree data structure with '
'parent and\n'
' child pointers); a reference to the object on the stack '
'frame of\n'
' a function that caught an exception (the traceback '
'stored in\n'
' "sys.exc_info()[2]" keeps the stack frame alive); or a '
'reference\n'
' to the object on the stack frame that raised an '
'unhandled\n'
' exception in interactive mode (the traceback stored in\n'
' "sys.last_traceback" keeps the stack frame alive). The '
'first\n'
' situation can only be remedied by explicitly breaking '
'the cycles;\n'
' the second can be resolved by freeing the reference to '
'the\n'
' traceback object when it is no longer useful, and the '
'third can\n'
' be resolved by storing "None" in "sys.last_traceback". '
'Circular\n'
' references which are garbage are detected and cleaned '
'up when the\n'
" cyclic garbage collector is enabled (it's on by "
'default). Refer\n'
' to the documentation for the "gc" module for more '
'information\n'
' about this topic.\n'
'\n'
' Warning: Due to the precarious circumstances under which\n'
' "__del__()" methods are invoked, exceptions that occur '
'during\n'
' their execution are ignored, and a warning is printed '
'to\n'
' "sys.stderr" instead. Also, when "__del__()" is invoked '
'in\n'
' response to a module being deleted (e.g., when '
'execution of the\n'
' program is done), other globals referenced by the '
'"__del__()"\n'
' method may already have been deleted or in the process '
'of being\n'
' torn down (e.g. the import machinery shutting down). '
'For this\n'
' reason, "__del__()" methods should do the absolute '
'minimum needed\n'
' to maintain external invariants. Starting with version '
'1.5,\n'
' Python guarantees that globals whose name begins with a '
'single\n'
' underscore are deleted from their module before other '
'globals are\n'
' deleted; if no other references to such globals exist, '
'this may\n'
' help in assuring that imported modules are still '
'available at the\n'
' time when the "__del__()" method is called.\n'
'\n'
'object.__repr__(self)\n'
'\n'
' Called by the "repr()" built-in function to compute the '
'"official"\n'
' string representation of an object. If at all possible, '
'this\n'
' should look like a valid Python expression that could be '
'used to\n'
' recreate an object with the same value (given an '
'appropriate\n'
' environment). If this is not possible, a string of the '
'form\n'
' "<...some useful description...>" should be returned. The '
'return\n'
' value must be a string object. If a class defines '
'"__repr__()" but\n'
' not "__str__()", then "__repr__()" is also used when an '
'"informal"\n'
' string representation of instances of that class is '
'required.\n'
'\n'
' This is typically used for debugging, so it is important '
'that the\n'
' representation is information-rich and unambiguous.\n'
'\n'
'object.__str__(self)\n'
'\n'
' Called by "str(object)" and the built-in functions '
'"format()" and\n'
' "print()" to compute the "informal" or nicely printable '
'string\n'
' representation of an object. The return value must be a '
'*string*\n'
' object.\n'
'\n'
' This method differs from "object.__repr__()" in that '
'there is no\n'
' expectation that "__str__()" return a valid Python '
'expression: a\n'
' more convenient or concise representation can be used.\n'
'\n'
' The default implementation defined by the built-in type '
'"object"\n'
' calls "object.__repr__()".\n'
'\n'
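'   For example, an illustrative class (the name "Point" is\n'
'   hypothetical) defining both representations:\n'
'\n'
'      class Point:\n'
'          def __init__(self, x, y):\n'
'              self.x, self.y = x, y\n'
'          def __repr__(self):\n'
'              # unambiguous and expression-like\n'
'              return "Point({!r}, {!r})".format(self.x, self.y)\n'
'          def __str__(self):\n'
'              # concise, for display\n'
'              return "({}, {})".format(self.x, self.y)\n'
'\n'
'      >>> p = Point(1, 2)\n'
'      >>> [repr(p), str(p)]\n'
"      ['Point(1, 2)', '(1, 2)']\n"
'\n'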
'object.__bytes__(self)\n'
'\n'
' Called by "bytes()" to compute a byte-string '
'representation of an\n'
' object. This should return a "bytes" object.\n'
'\n'
'object.__format__(self, format_spec)\n'
'\n'
' Called by the "format()" built-in function (and by '
'extension, the\n'
' "str.format()" method of class "str") to produce a '
'"formatted"\n'
' string representation of an object. The "format_spec" '
'argument is a\n'
' string that contains a description of the formatting '
'options\n'
' desired. The interpretation of the "format_spec" argument '
'is up to\n'
' the type implementing "__format__()", however most '
'classes will\n'
' either delegate formatting to one of the built-in types, '
'or use a\n'
' similar formatting option syntax.\n'
'\n'
' See *Format Specification Mini-Language* for a '
'description of the\n'
' standard formatting syntax.\n'
'\n'
' The return value must be a string object.\n'
'\n'
' Changed in version 3.4: The __format__ method of "object" '
'itself\n'
' raises a "TypeError" if passed any non-empty string.\n'
'\n'
'object.__lt__(self, other)\n'
'object.__le__(self, other)\n'
'object.__eq__(self, other)\n'
'object.__ne__(self, other)\n'
'object.__gt__(self, other)\n'
'object.__ge__(self, other)\n'
'\n'
' These are the so-called "rich comparison" methods. The\n'
' correspondence between operator symbols and method names '
'is as\n'
' follows: "x<y" calls "x.__lt__(y)", "x<=y" calls '
'"x.__le__(y)",\n'
' "x==y" calls "x.__eq__(y)", "x!=y" calls "x.__ne__(y)", '
'"x>y" calls\n'
' "x.__gt__(y)", and "x>=y" calls "x.__ge__(y)".\n'
'\n'
' A rich comparison method may return the singleton '
'"NotImplemented"\n'
' if it does not implement the operation for a given pair '
'of\n'
' arguments. By convention, "False" and "True" are returned '
'for a\n'
' successful comparison. However, these methods can return '
'any value,\n'
' so if the comparison operator is used in a Boolean '
'context (e.g.,\n'
' in the condition of an "if" statement), Python will call '
'"bool()"\n'
' on the value to determine if the result is true or '
'false.\n'
'\n'
' By default, "__ne__()" delegates to "__eq__()" and '
'inverts the\n'
' result unless it is "NotImplemented". There are no other '
'implied\n'
' relationships among the comparison operators, for '
'example, the\n'
' truth of "(x<y or x==y)" does not imply "x<=y". To '
'automatically\n'
' generate ordering operations from a single root '
'operation, see\n'
' "functools.total_ordering()".\n'
'\n'
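'   For example, an illustrative sketch using\n'
'   "functools.total_ordering()" (the class name "Version" is\n'
'   hypothetical):\n'
'\n'
'      import functools\n'
'\n'
'      @functools.total_ordering\n'
'      class Version:\n'
'          def __init__(self, parts):\n'
'              self.parts = tuple(parts)\n'
'          def __eq__(self, other):\n'
'              if not isinstance(other, Version):\n'
'                  return NotImplemented\n'
'              return self.parts == other.parts\n'
'          def __lt__(self, other):\n'
'              if not isinstance(other, Version):\n'
'                  return NotImplemented\n'
'              return self.parts < other.parts\n'
'\n'
'      >>> Version((1, 2)) <= Version((1, 3))\n'
'      True\n'
'\n'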
' See the paragraph on "__hash__()" for some important '
'notes on\n'
' creating *hashable* objects which support custom '
'comparison\n'
' operations and are usable as dictionary keys.\n'
'\n'
' There are no swapped-argument versions of these methods '
'(to be used\n'
' when the left argument does not support the operation but '
'the right\n'
' argument does); rather, "__lt__()" and "__gt__()" are '
"each other's\n"
' reflection, "__le__()" and "__ge__()" are each other\'s '
'reflection,\n'
' and "__eq__()" and "__ne__()" are their own reflection. '
'If the\n'
" operands are of different types, and right operand's type "
'is a\n'
" direct or indirect subclass of the left operand's type, "
'the\n'
' reflected method of the right operand has priority, '
'otherwise the\n'
" left operand's method has priority. Virtual subclassing "
'is not\n'
' considered.\n'
'\n'
'object.__hash__(self)\n'
'\n'
' Called by built-in function "hash()" and for operations '
'on members\n'
' of hashed collections including "set", "frozenset", and '
'"dict".\n'
' "__hash__()" should return an integer. The only required '
'property\n'
' is that objects which compare equal have the same hash '
'value; it is\n'
' advised to somehow mix together (e.g. using exclusive or) '
'the hash\n'
' values for the components of the object that also play a '
'part in\n'
' comparison of objects.\n'
'\n'
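'   For example, an illustrative hashable class (the name "Point"\n'
'   is hypothetical) that mixes the components used in comparison:\n'
'\n'
'      class Point:\n'
'          def __init__(self, x, y):\n'
'              self.x, self.y = x, y\n'
'          def __eq__(self, other):\n'
'              if not isinstance(other, Point):\n'
'                  return NotImplemented\n'
'              return (self.x, self.y) == (other.x, other.y)\n'
'          def __hash__(self):\n'
'              # equal objects must hash equal\n'
'              return hash((self.x, self.y))\n'
'\n'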
' Note: "hash()" truncates the value returned from an '
"object's\n"
' custom "__hash__()" method to the size of a '
'"Py_ssize_t". This\n'
' is typically 8 bytes on 64-bit builds and 4 bytes on '
'32-bit\n'
' builds. If an object\'s "__hash__()" must '
'interoperate on builds\n'
' of different bit sizes, be sure to check the width on '
'all\n'
' supported builds. An easy way to do this is with '
'"python -c\n'
' "import sys; print(sys.hash_info.width)"".\n'
'\n'
' If a class does not define an "__eq__()" method it should '
'not\n'
' define a "__hash__()" operation either; if it defines '
'"__eq__()"\n'
' but not "__hash__()", its instances will not be usable as '
'items in\n'
' hashable collections. If a class defines mutable objects '
'and\n'
' implements an "__eq__()" method, it should not implement\n'
' "__hash__()", since the implementation of hashable '
'collections\n'
" requires that a key's hash value is immutable (if the "
"object's hash\n"
' value changes, it will be in the wrong hash bucket).\n'
'\n'
' User-defined classes have "__eq__()" and "__hash__()" '
'methods by\n'
' default; with them, all objects compare unequal (except '
'with\n'
' themselves) and "x.__hash__()" returns an appropriate '
'value such\n'
' that "x == y" implies both that "x is y" and "hash(x) == '
'hash(y)".\n'
'\n'
' A class that overrides "__eq__()" and does not define '
'"__hash__()"\n'
' will have its "__hash__()" implicitly set to "None". '
'When the\n'
' "__hash__()" method of a class is "None", instances of '
'the class\n'
' will raise an appropriate "TypeError" when a program '
'attempts to\n'
' retrieve their hash value, and will also be correctly '
'identified as\n'
' unhashable when checking "isinstance(obj, '
'collections.Hashable)".\n'
'\n'
' If a class that overrides "__eq__()" needs to retain the\n'
' implementation of "__hash__()" from a parent class, the '
'interpreter\n'
' must be told this explicitly by setting "__hash__ =\n'
' <ParentClass>.__hash__".\n'
'\n'
' If a class that does not override "__eq__()" wishes to '
'suppress\n'
' hash support, it should include "__hash__ = None" in the '
'class\n'
' definition. A class which defines its own "__hash__()" '
'that\n'
' explicitly raises a "TypeError" would be incorrectly '
'identified as\n'
' hashable by an "isinstance(obj, collections.Hashable)" '
'call.\n'
'\n'
' Note: By default, the "__hash__()" values of str, bytes '
'and\n'
' datetime objects are "salted" with an unpredictable '
'random value.\n'
' Although they remain constant within an individual '
'Python\n'
' process, they are not predictable between repeated '
'invocations of\n'
'     Python. This is intended to provide protection against a '
'denial-\n'
' of-service caused by carefully-chosen inputs that '
'exploit the\n'
' worst case performance of a dict insertion, O(n^2) '
'complexity.\n'
' See http://www.ocert.org/advisories/ocert-2011-003.html '
'for\n'
'     details. Changing hash values affects the iteration '
'order of\n'
' dicts, sets and other mappings. Python has never made '
'guarantees\n'
' about this ordering (and it typically varies between '
'32-bit and\n'
'     64-bit builds). See also "PYTHONHASHSEED".\n'
'\n'
' Changed in version 3.3: Hash randomization is enabled by '
'default.\n'
'\n'
'object.__bool__(self)\n'
'\n'
' Called to implement truth value testing and the built-in '
'operation\n'
' "bool()"; should return "False" or "True". When this '
'method is not\n'
' defined, "__len__()" is called, if it is defined, and the '
'object is\n'
' considered true if its result is nonzero. If a class '
'defines\n'
' neither "__len__()" nor "__bool__()", all its instances '
'are\n'
' considered true.\n'
'\n'
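'   For example (illustrative; the class name "Box" is\n'
'   hypothetical):\n'
'\n'
'      class Box:\n'
'          def __init__(self, items):\n'
'              self.items = list(items)\n'
'          def __len__(self):\n'
'              return len(self.items)\n'
'\n'
'      >>> [bool(Box([])), bool(Box([1]))]\n'
'      [False, True]\n'
'\n'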
'\n'
'Customizing attribute access\n'
'============================\n'
'\n'
'The following methods can be defined to customize the '
'meaning of\n'
'attribute access (use of, assignment to, or deletion of '
'"x.name") for\n'
'class instances.\n'
'\n'
'object.__getattr__(self, name)\n'
'\n'
' Called when an attribute lookup has not found the '
'attribute in the\n'
' usual places (i.e. it is not an instance attribute nor is '
'it found\n'
' in the class tree for "self"). "name" is the attribute '
'name. This\n'
' method should return the (computed) attribute value or '
'raise an\n'
' "AttributeError" exception.\n'
'\n'
' Note that if the attribute is found through the normal '
'mechanism,\n'
' "__getattr__()" is not called. (This is an intentional '
'asymmetry\n'
' between "__getattr__()" and "__setattr__()".) This is '
'done both for\n'
' efficiency reasons and because otherwise "__getattr__()" '
'would have\n'
' no way to access other attributes of the instance. Note '
'that at\n'
' least for instance variables, you can fake total control '
'by not\n'
' inserting any values in the instance attribute dictionary '
'(but\n'
' instead inserting them in another object). See the\n'
' "__getattribute__()" method below for a way to actually '
'get total\n'
' control over attribute access.\n'
'\n'
'object.__getattribute__(self, name)\n'
'\n'
' Called unconditionally to implement attribute accesses '
'for\n'
' instances of the class. If the class also defines '
'"__getattr__()",\n'
' the latter will not be called unless "__getattribute__()" '
'either\n'
' calls it explicitly or raises an "AttributeError". This '
'method\n'
' should return the (computed) attribute value or raise an\n'
' "AttributeError" exception. In order to avoid infinite '
'recursion in\n'
' this method, its implementation should always call the '
'base class\n'
' method with the same name to access any attributes it '
'needs, for\n'
' example, "object.__getattribute__(self, name)".\n'
'\n'
' Note: This method may still be bypassed when looking up '
'special\n'
' methods as the result of implicit invocation via '
'language syntax\n'
' or built-in functions. See *Special method lookup*.\n'
'\n'
'object.__setattr__(self, name, value)\n'
'\n'
' Called when an attribute assignment is attempted. This '
'is called\n'
' instead of the normal mechanism (i.e. store the value in '
'the\n'
' instance dictionary). *name* is the attribute name, '
'*value* is the\n'
' value to be assigned to it.\n'
'\n'
' If "__setattr__()" wants to assign to an instance '
'attribute, it\n'
' should call the base class method with the same name, for '
'example,\n'
' "object.__setattr__(self, name, value)".\n'
'\n'
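'   For example, an illustrative sketch (the class name "Logged"\n'
'   is hypothetical) that logs assignments while delegating the\n'
'   actual storage to the base class:\n'
'\n'
'      class Logged:\n'
'          def __setattr__(self, name, value):\n'
'              print("setting", name)\n'
'              object.__setattr__(self, name, value)\n'
'\n'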
'object.__delattr__(self, name)\n'
'\n'
' Like "__setattr__()" but for attribute deletion instead '
'of\n'
' assignment. This should only be implemented if "del '
'obj.name" is\n'
' meaningful for the object.\n'
'\n'
'object.__dir__(self)\n'
'\n'
' Called when "dir()" is called on the object. A sequence '
'must be\n'
' returned. "dir()" converts the returned sequence to a '
'list and\n'
' sorts it.\n'
'\n'
'\n'
'Implementing Descriptors\n'
'------------------------\n'
'\n'
'The following methods only apply when an instance of the '
'class\n'
'containing the method (a so-called *descriptor* class) '
'appears in an\n'
"*owner* class (the descriptor must be in either the owner's "
'class\n'
'dictionary or in the class dictionary for one of its '
'parents). In the\n'
'examples below, "the attribute" refers to the attribute '
'whose name is\n'
'the key of the property in the owner class\' "__dict__".\n'
'\n'
'object.__get__(self, instance, owner)\n'
'\n'
' Called to get the attribute of the owner class (class '
'attribute\n'
' access) or of an instance of that class (instance '
'attribute\n'
' access). *owner* is always the owner class, while '
'*instance* is the\n'
' instance that the attribute was accessed through, or '
'"None" when\n'
' the attribute is accessed through the *owner*. This '
'method should\n'
' return the (computed) attribute value or raise an '
'"AttributeError"\n'
' exception.\n'
'\n'
'object.__set__(self, instance, value)\n'
'\n'
' Called to set the attribute on an instance *instance* of '
'the owner\n'
' class to a new value, *value*.\n'
'\n'
'object.__delete__(self, instance)\n'
'\n'
' Called to delete the attribute on an instance *instance* '
'of the\n'
' owner class.\n'
'\n'
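'For example, a minimal illustrative data descriptor (the names\n'
'"Positive" and "Account" are hypothetical) that validates values\n'
'assigned through it:\n'
'\n'
'   class Positive:\n'
'       def __get__(self, instance, owner):\n'
'           if instance is None:\n'
'               return self      # accessed through the owner class\n'
'           return instance._value\n'
'       def __set__(self, instance, value):\n'
'           if value <= 0:\n'
'               raise ValueError("must be positive")\n'
'           instance._value = value\n'
'\n'
'   class Account:\n'
'       balance = Positive()\n'
'\n'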
'The attribute "__objclass__" is interpreted by the "inspect" '
'module as\n'
'specifying the class where this object was defined (setting '
'this\n'
'appropriately can assist in runtime introspection of dynamic '
'class\n'
'attributes). For callables, it may indicate that an instance '
'of the\n'
'given type (or a subclass) is expected or required as the '
'first\n'
'positional argument (for example, CPython sets this '
'attribute for\n'
'unbound methods that are implemented in C).\n'
'\n'
'\n'
'Invoking Descriptors\n'
'--------------------\n'
'\n'
'In general, a descriptor is an object attribute with '
'"binding\n'
'behavior", one whose attribute access has been overridden by '
'methods\n'
'in the descriptor protocol: "__get__()", "__set__()", and\n'
'"__delete__()". If any of those methods are defined for an '
'object, it\n'
'is said to be a descriptor.\n'
'\n'
'The default behavior for attribute access is to get, set, or '
'delete\n'
"the attribute from an object's dictionary. For instance, "
'"a.x" has a\n'
'lookup chain starting with "a.__dict__[\'x\']", then\n'
'"type(a).__dict__[\'x\']", and continuing through the base '
'classes of\n'
'"type(a)" excluding metaclasses.\n'
'\n'
'However, if the looked-up value is an object defining one of '
'the\n'
'descriptor methods, then Python may override the default '
'behavior and\n'
'invoke the descriptor method instead. Where this occurs in '
'the\n'
'precedence chain depends on which descriptor methods were '
'defined and\n'
'how they were called.\n'
'\n'
'The starting point for descriptor invocation is a binding, '
'"a.x". How\n'
'the arguments are assembled depends on "a":\n'
'\n'
'Direct Call\n'
' The simplest and least common call is when user code '
'directly\n'
' invokes a descriptor method: "x.__get__(a)".\n'
'\n'
'Instance Binding\n'
' If binding to an object instance, "a.x" is transformed '
'into the\n'
' call: "type(a).__dict__[\'x\'].__get__(a, type(a))".\n'
'\n'
'Class Binding\n'
' If binding to a class, "A.x" is transformed into the '
'call:\n'
' "A.__dict__[\'x\'].__get__(None, A)".\n'
'\n'
'Super Binding\n'
' If "a" is an instance of "super", then the binding '
'"super(B,\n'
' obj).m()" searches "obj.__class__.__mro__" for the base '
'class "A"\n'
' immediately preceding "B" and then invokes the descriptor '
'with the\n'
' call: "A.__dict__[\'m\'].__get__(obj, obj.__class__)".\n'
'\n'
'For instance bindings, the precedence of descriptor '
'invocation depends\n'
'on which descriptor methods are defined. A descriptor '
'can define\n'
'any combination of "__get__()", "__set__()" and '
'"__delete__()". If it\n'
'does not define "__get__()", then accessing the attribute '
'will return\n'
'the descriptor object itself unless there is a value in the '
"object's\n"
'instance dictionary. If the descriptor defines "__set__()" '
'and/or\n'
'"__delete__()", it is a data descriptor; if it defines '
'neither, it is\n'
'a non-data descriptor. Normally, data descriptors define '
'both\n'
'"__get__()" and "__set__()", while non-data descriptors have '
'just the\n'
'"__get__()" method. Data descriptors with "__set__()" and '
'"__get__()"\n'
'defined always override a redefinition in an instance '
'dictionary. In\n'
'contrast, non-data descriptors can be overridden by '
'instances.\n'
'\n'
'Python methods (including "staticmethod()" and '
'"classmethod()") are\n'
'implemented as non-data descriptors. Accordingly, instances '
'can\n'
'redefine and override methods. This allows individual '
'instances to\n'
'acquire behaviors that differ from other instances of the '
'same class.\n'
'\n'
'The "property()" function is implemented as a data '
'descriptor.\n'
'Accordingly, instances cannot override the behavior of a '
'property.\n'
'\n'
'\n'
'__slots__\n'
'---------\n'
'\n'
'By default, instances of classes have a dictionary for '
'attribute\n'
'storage. This wastes space for objects having very few '
'instance\n'
'variables. The space consumption can become acute when '
'creating large\n'
'numbers of instances.\n'
'\n'
'The default can be overridden by defining *__slots__* in a '
'class\n'
'definition. The *__slots__* declaration takes a sequence of '
'instance\n'
'variables and reserves just enough space in each instance to '
'hold a\n'
'value for each variable. Space is saved because *__dict__* '
'is not\n'
'created for each instance.\n'
'\n'
'object.__slots__\n'
'\n'
' This class variable can be assigned a string, iterable, '
'or sequence\n'
' of strings with variable names used by instances. '
'*__slots__*\n'
' reserves space for the declared variables and prevents '
'the\n'
' automatic creation of *__dict__* and *__weakref__* for '
'each\n'
' instance.\n'
'\n'
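'   For example (illustrative; the class name "Point" is\n'
'   hypothetical):\n'
'\n'
'      class Point:\n'
'          __slots__ = ("x", "y")\n'
'          def __init__(self, x, y):\n'
'              self.x, self.y = x, y\n'
'\n'
'      >>> Point(1, 2).z = 3\n'
'      Traceback (most recent call last):\n'
'        ...\n'
"      AttributeError: 'Point' object has no attribute 'z'\n"
'\n'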
'\n'
'Notes on using *__slots__*\n'
'~~~~~~~~~~~~~~~~~~~~~~~~~~\n'
'\n'
'* When inheriting from a class without *__slots__*, the '
'*__dict__*\n'
' attribute of that class will always be accessible, so a '
'*__slots__*\n'
' definition in the subclass is meaningless.\n'
'\n'
'* Without a *__dict__* variable, instances cannot be '
'assigned new\n'
' variables not listed in the *__slots__* definition. '
'Attempts to\n'
'  assign to an unlisted variable name raise '
'"AttributeError". If\n'
' dynamic assignment of new variables is desired, then add\n'
' "\'__dict__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
'* Without a *__weakref__* variable for each instance, '
'classes\n'
' defining *__slots__* do not support weak references to '
'their\n'
' instances. If weak reference support is needed, then add\n'
' "\'__weakref__\'" to the sequence of strings in the '
'*__slots__*\n'
' declaration.\n'
'\n'
'* *__slots__* are implemented at the class level by '
'creating\n'
' descriptors (*Implementing Descriptors*) for each variable '
'name. As\n'
' a result, class attributes cannot be used to set default '
'values for\n'
' instance variables defined by *__slots__*; otherwise, the '
'class\n'
' attribute would overwrite the descriptor assignment.\n'
'\n'
'* The action of a *__slots__* declaration is limited to the '
'class\n'
' where it is defined. As a result, subclasses will have a '
'*__dict__*\n'
' unless they also define *__slots__* (which must only '
'contain names\n'
' of any *additional* slots).\n'
'\n'
'* If a class defines a slot also defined in a base class, '
'the\n'
' instance variable defined by the base class slot is '
'inaccessible\n'
' (except by retrieving its descriptor directly from the '
'base class).\n'
' This renders the meaning of the program undefined. In the '
'future, a\n'
' check may be added to prevent this.\n'
'\n'
'* Nonempty *__slots__* does not work for classes derived '
'from\n'
' "variable-length" built-in types such as "int", "bytes" '
'and "tuple".\n'
'\n'
'* Any non-string iterable may be assigned to *__slots__*. '
'Mappings\n'
' may also be used; however, in the future, special meaning '
'may be\n'
' assigned to the values corresponding to each key.\n'
'\n'
'* *__class__* assignment works only if both classes have the '
'same\n'
' *__slots__*.\n'
'\n'
'\n'
'Customizing class creation\n'
'==========================\n'
'\n'
'By default, classes are constructed using "type()". The '
'class body is\n'
'executed in a new namespace and the class name is bound '
'locally to the\n'
'result of "type(name, bases, namespace)".\n'
'\n'
'The class creation process can be customised by passing the\n'
'"metaclass" keyword argument in the class definition line, '
'or by\n'
'inheriting from an existing class that included such an '
'argument. In\n'
'the following example, both "MyClass" and "MySubclass" are '
'instances\n'
'of "Meta":\n'
'\n'
' class Meta(type):\n'
' pass\n'
'\n'
' class MyClass(metaclass=Meta):\n'
' pass\n'
'\n'
' class MySubclass(MyClass):\n'
' pass\n'
'\n'
'Any other keyword arguments that are specified in the class '
'definition\n'
'are passed through to all metaclass operations described '
'below.\n'
'\n'
'When a class definition is executed, the following steps '
'occur:\n'
'\n'
'* the appropriate metaclass is determined\n'
'\n'
'* the class namespace is prepared\n'
'\n'
'* the class body is executed\n'
'\n'
'* the class object is created\n'
'\n'
'\n'
'Determining the appropriate metaclass\n'
'-------------------------------------\n'
'\n'
'The appropriate metaclass for a class definition is '
'determined as\n'
'follows:\n'
'\n'
'* if no bases and no explicit metaclass are given, then '
'"type()" is\n'
' used\n'
'\n'
'* if an explicit metaclass is given and it is *not* an '
'instance of\n'
' "type()", then it is used directly as the metaclass\n'
'\n'
'* if an instance of "type()" is given as the explicit '
'metaclass, or\n'
' bases are defined, then the most derived metaclass is '
'used\n'
'\n'
'The most derived metaclass is selected from the explicitly '
'specified\n'
'metaclass (if any) and the metaclasses (i.e. "type(cls)") of '
'all\n'
'specified base classes. The most derived metaclass is one '
'which is a\n'
'subtype of *all* of these candidate metaclasses. If none of '
'the\n'
'candidate metaclasses meets that criterion, then the class '
'definition\n'
'will fail with "TypeError".\n'
'\n'
'\n'
'Preparing the class namespace\n'
'-----------------------------\n'
'\n'
'Once the appropriate metaclass has been identified, then the '
'class\n'
'namespace is prepared. If the metaclass has a "__prepare__" '
'attribute,\n'
'it is called as "namespace = metaclass.__prepare__(name, '
'bases,\n'
'**kwds)" (where the additional keyword arguments, if any, '
'come from\n'
'the class definition).\n'
'\n'
'If the metaclass has no "__prepare__" attribute, then the '
'class\n'
'namespace is initialised as an empty "dict()" instance.\n'
'\n'
'See also: **PEP 3115** - Metaclasses in Python 3000\n'
'\n'
' Introduced the "__prepare__" namespace hook\n'
'\n'
'\n'
'Executing the class body\n'
'------------------------\n'
'\n'
'The class body is executed (approximately) as "exec(body, '
'globals(),\n'
'namespace)". The key difference from a normal call to '
'"exec()" is that\n'
'lexical scoping allows the class body (including any '
'methods) to\n'
'reference names from the current and outer scopes when the '
'class\n'
'definition occurs inside a function.\n'
'\n'
'However, even when the class definition occurs inside the '
'function,\n'
'methods defined inside the class still cannot see names '
'defined at the\n'
'class scope. Class variables must be accessed through the '
'first\n'
'parameter of instance or class methods, and cannot be '
'accessed at all\n'
'from static methods.\n'
'\n'
'\n'
'Creating the class object\n'
'-------------------------\n'
'\n'
'Once the class namespace has been populated by executing the '
'class\n'
'body, the class object is created by calling '
'"metaclass(name, bases,\n'
'namespace, **kwds)" (the additional keywords passed here are '
'the same\n'
'as those passed to "__prepare__").\n'
'\n'
'This class object is the one that will be referenced by the '
'zero-\n'
'argument form of "super()". "__class__" is an implicit '
'closure\n'
'reference created by the compiler if any methods in a class '
'body refer\n'
'to either "__class__" or "super". This allows the zero '
'argument form\n'
'of "super()" to correctly identify the class being defined '
'based on\n'
'lexical scoping, while the class or instance that was used '
'to make the\n'
'current call is identified based on the first argument '
'passed to the\n'
'method.\n'
'\n'
'After the class object is created, it is passed to the '
'class\n'
'decorators included in the class definition (if any) and the '
'resulting\n'
'object is bound in the local namespace as the defined '
'class.\n'
'\n'
'See also: **PEP 3135** - New super\n'
'\n'
' Describes the implicit "__class__" closure reference\n'
'\n'
'\n'
'Metaclass example\n'
'-----------------\n'
'\n'
'The potential uses for metaclasses are boundless. Some ideas '
'that have\n'
'been explored include logging, interface checking, '
'automatic\n'
'delegation, automatic property creation, proxies, '
'frameworks, and\n'
'automatic resource locking/synchronization.\n'
'\n'
'Here is an example of a metaclass that uses an\n'
'"collections.OrderedDict" to remember the order that class '
'variables\n'
'are defined:\n'
'\n'
' class OrderedClass(type):\n'
'\n'
' @classmethod\n'
' def __prepare__(metacls, name, bases, **kwds):\n'
' return collections.OrderedDict()\n'
'\n'
' def __new__(cls, name, bases, namespace, **kwds):\n'
' result = type.__new__(cls, name, bases, '
'dict(namespace))\n'
' result.members = tuple(namespace)\n'
' return result\n'
'\n'
' class A(metaclass=OrderedClass):\n'
' def one(self): pass\n'
' def two(self): pass\n'
' def three(self): pass\n'
' def four(self): pass\n'
'\n'
' >>> A.members\n'
" ('__module__', 'one', 'two', 'three', 'four')\n"
'\n'
'When the class definition for *A* gets executed, the process '
'begins\n'
'with calling the metaclass\'s "__prepare__()" method which '
'returns an\n'
'empty "collections.OrderedDict". That mapping records the '
'methods and\n'
'attributes of *A* as they are defined within the body of the '
'class\n'
'statement. Once those definitions are executed, the ordered '
'dictionary\n'
'is fully populated and the metaclass\'s "__new__()" method '
'gets\n'
'invoked. That method builds the new type and it saves the '
'ordered\n'
'dictionary keys in an attribute called "members".\n'
'\n'
'\n'
'Customizing instance and subclass checks\n'
'========================================\n'
'\n'
'The following methods are used to override the default '
'behavior of the\n'
'"isinstance()" and "issubclass()" built-in functions.\n'
'\n'
'In particular, the metaclass "abc.ABCMeta" implements these '
'methods in\n'
'order to allow the addition of Abstract Base Classes (ABCs) '
'as\n'
'"virtual base classes" to any class or type (including '
'built-in\n'
'types), including other ABCs.\n'
'\n'
'class.__instancecheck__(self, instance)\n'
'\n'
' Return true if *instance* should be considered a (direct '
'or\n'
' indirect) instance of *class*. If defined, called to '
'implement\n'
' "isinstance(instance, class)".\n'
'\n'
'class.__subclasscheck__(self, subclass)\n'
'\n'
' Return true if *subclass* should be considered a (direct '
'or\n'
' indirect) subclass of *class*. If defined, called to '
'implement\n'
' "issubclass(subclass, class)".\n'
'\n'
'Note that these methods are looked up on the type '
'(metaclass) of a\n'
'class. They cannot be defined as class methods in the '
'actual class.\n'
'This is consistent with the lookup of special methods that '
'are called\n'
'on instances, only in this case the instance is itself a '
'class.\n'
'\n'
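'For example, an illustrative sketch (the names "MetaEven" and\n'
'"Even" are hypothetical) of a metaclass-level check:\n'
'\n'
'   class MetaEven(type):\n'
'       def __instancecheck__(cls, instance):\n'
'           return isinstance(instance, int) and instance % 2 == 0\n'
'\n'
'   class Even(metaclass=MetaEven):\n'
'       pass\n'
'\n'
'   >>> [isinstance(4, Even), isinstance(3, Even)]\n'
'   [True, False]\n'
'\n'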
'See also: **PEP 3119** - Introducing Abstract Base Classes\n'
'\n'
' Includes the specification for customizing '
'"isinstance()" and\n'
' "issubclass()" behavior through "__instancecheck__()" '
'and\n'
' "__subclasscheck__()", with motivation for this '
'functionality in\n'
' the context of adding Abstract Base Classes (see the '
'"abc"\n'
' module) to the language.\n'
'\n'
'\n'
'Emulating callable objects\n'
'==========================\n'
'\n'
'object.__call__(self[, args...])\n'
'\n'
' Called when the instance is "called" as a function; if '
'this method\n'
' is defined, "x(arg1, arg2, ...)" is a shorthand for\n'
' "x.__call__(arg1, arg2, ...)".\n'
'\n'
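'   For example (illustrative; the class name "Adder" is\n'
'   hypothetical):\n'
'\n'
'      class Adder:\n'
'          def __init__(self, n):\n'
'              self.n = n\n'
'          def __call__(self, x):\n'
'              return self.n + x\n'
'\n'
'      >>> add5 = Adder(5)\n'
'      >>> add5(3)\n'
'      8\n'
'\n'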
'\n'
'Emulating container types\n'
'=========================\n'
'\n'
'The following methods can be defined to implement container '
'objects.\n'
'Containers usually are sequences (such as lists or tuples) '
'or mappings\n'
'(like dictionaries), but can represent other containers as '
'well. The\n'
'first set of methods is used either to emulate a sequence or '
'to\n'
'emulate a mapping; the difference is that for a sequence, '
'the\n'
'allowable keys should be the integers *k* for which "0 <= k '
'< N" where\n'
'*N* is the length of the sequence, or slice objects, which '
'define a\n'
'range of items. It is also recommended that mappings '
'provide the\n'
'methods "keys()", "values()", "items()", "get()", '
'"clear()",\n'
'"setdefault()", "pop()", "popitem()", "copy()", and '
'"update()"\n'
"behaving similar to those for Python's standard dictionary "
'objects.\n'
'The "collections" module provides a "MutableMapping" '
'abstract base\n'
'class to help create those methods from a base set of '
'"__getitem__()",\n'
'"__setitem__()", "__delitem__()", and "keys()". Mutable '
'sequences\n'
'should provide methods "append()", "count()", "index()", '
'"extend()",\n'
'"insert()", "pop()", "remove()", "reverse()" and "sort()", '
'like Python\n'
'standard list objects. Finally, sequence types should '
'implement\n'
'addition (meaning concatenation) and multiplication '
'(meaning\n'
'repetition) by defining the methods "__add__()", '
'"__radd__()",\n'
'"__iadd__()", "__mul__()", "__rmul__()" and "__imul__()" '
'described\n'
'below; they should not define other numerical operators. It '
'is\n'
'recommended that both mappings and sequences implement the\n'
'"__contains__()" method to allow efficient use of the "in" '
'operator;\n'
'for mappings, "in" should search the mapping\'s keys; for '
'sequences, it\n'
'should search through the values. It is further recommended '
'that both\n'
'mappings and sequences implement the "__iter__()" method to '
'allow\n'
'efficient iteration through the container; for mappings, '
'"__iter__()"\n'
'should be the same as "keys()"; for sequences, it should '
'iterate\n'
'through the values.\n'
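'\n'
'As an illustrative sketch (the class "Squares" is hypothetical), a\n'
'minimal immutable sequence needs only "__len__()" and\n'
'"__getitem__()"; iteration and membership tests then fall back to\n'
'the sequence protocol described below:\n'
'\n'
'   class Squares:\n'
'       def __init__(self, n):\n'
'           self.n = n\n'
'\n'
'       def __len__(self):\n'
'           return self.n\n'
'\n'
'       def __getitem__(self, i):\n'
'           if not 0 <= i < self.n:\n'
'               raise IndexError(i)\n'
'           return i * i\n'
'\n'
'   >>> s = Squares(4)\n'
'   >>> len(s)\n'
'   4\n'
'   >>> list(s)   # iteration falls back to "__getitem__()"\n'
'   [0, 1, 4, 9]\n'
'   >>> 9 in s    # membership falls back to iteration\n'
'   True\n'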
'\n'
'object.__len__(self)\n'
'\n'
' Called to implement the built-in function "len()". '
'Should return\n'
' the length of the object, an integer ">=" 0. Also, an '
'object that\n'
' doesn\'t define a "__bool__()" method and whose '
'"__len__()" method\n'
' returns zero is considered to be false in a Boolean '
'context.\n'
'\n'
'object.__length_hint__(self)\n'
'\n'
' Called to implement "operator.length_hint()". Should '
'return an\n'
' estimated length for the object (which may be greater or '
'less than\n'
' the actual length). The length must be an integer ">=" 0. '
'This\n'
' method is purely an optimization and is never required '
'for\n'
' correctness.\n'
'\n'
' New in version 3.4.\n'
'\n'
'Note: Slicing is done exclusively with the following three '
'methods.\n'
' A call like\n'
'\n'
' a[1:2] = b\n'
'\n'
' is translated to\n'
'\n'
' a[slice(1, 2, None)] = b\n'
'\n'
' and so forth. Missing slice items are always filled in '
'with "None".\n'
'\n'
'object.__getitem__(self, key)\n'
'\n'
' Called to implement evaluation of "self[key]". For '
'sequence types,\n'
' the accepted keys should be integers and slice objects. '
'Note that\n'
' the special interpretation of negative indexes (if the '
'class wishes\n'
' to emulate a sequence type) is up to the "__getitem__()" '
'method. If\n'
' *key* is of an inappropriate type, "TypeError" may be '
'raised; if of\n'
' a value outside the set of indexes for the sequence '
'(after any\n'
' special interpretation of negative values), "IndexError" '
'should be\n'
' raised. For mapping types, if *key* is missing (not in '
'the\n'
' container), "KeyError" should be raised.\n'
'\n'
' Note: "for" loops expect that an "IndexError" will be '
'raised for\n'
' illegal indexes to allow proper detection of the end of '
'the\n'
' sequence.\n'
'\n'
'object.__missing__(self, key)\n'
'\n'
' Called by "dict"."__getitem__()" to implement "self[key]" '
'for dict\n'
' subclasses when key is not in the dictionary.\n'
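'\n'
'   For example, a hypothetical "dict" subclass that reports a\n'
'   default count of zero for absent keys:\n'
'\n'
'      class Counts(dict):\n'
'          def __missing__(self, key):\n'
'              return 0\n'
'\n'
'      >>> c = Counts(a=1)\n'
"      >>> c['a'], c['b']\n"
'      (1, 0)\n'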
'\n'
'object.__setitem__(self, key, value)\n'
'\n'
' Called to implement assignment to "self[key]". Same note '
'as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support changes to the values for keys, or if '
'new keys\n'
' can be added, or for sequences if elements can be '
'replaced. The\n'
' same exceptions should be raised for improper *key* '
'values as for\n'
' the "__getitem__()" method.\n'
'\n'
'object.__delitem__(self, key)\n'
'\n'
' Called to implement deletion of "self[key]". Same note '
'as for\n'
' "__getitem__()". This should only be implemented for '
'mappings if\n'
' the objects support removal of keys, or for sequences if '
'elements\n'
' can be removed from the sequence. The same exceptions '
'should be\n'
' raised for improper *key* values as for the '
'"__getitem__()" method.\n'
'\n'
'object.__iter__(self)\n'
'\n'
' This method is called when an iterator is required for a '
'container.\n'
' This method should return a new iterator object that can '
'iterate\n'
' over all the objects in the container. For mappings, it '
'should\n'
' iterate over the keys of the container.\n'
'\n'
' Iterator objects also need to implement this method; they '
'are\n'
' required to return themselves. For more information on '
'iterator\n'
' objects, see *Iterator Types*.\n'
'\n'
'object.__reversed__(self)\n'
'\n'
' Called (if present) by the "reversed()" built-in to '
'implement\n'
' reverse iteration. It should return a new iterator '
'object that\n'
' iterates over all the objects in the container in reverse '
'order.\n'
'\n'
' If the "__reversed__()" method is not provided, the '
'"reversed()"\n'
' built-in will fall back to using the sequence protocol '
'("__len__()"\n'
' and "__getitem__()"). Objects that support the sequence '
'protocol\n'
' should only provide "__reversed__()" if they can provide '
'an\n'
' implementation that is more efficient than the one '
'provided by\n'
' "reversed()".\n'
'\n'
'The membership test operators ("in" and "not in") are '
'normally\n'
'implemented as an iteration through a sequence. However, '
'container\n'
'objects can supply the following special method with a more '
'efficient\n'
'implementation, which also does not require the object be a '
'sequence.\n'
'\n'
'object.__contains__(self, item)\n'
'\n'
' Called to implement membership test operators. Should '
'return true\n'
' if *item* is in *self*, false otherwise. For mapping '
'objects, this\n'
' should consider the keys of the mapping rather than the '
'values or\n'
' the key-item pairs.\n'
'\n'
' For objects that don\'t define "__contains__()", the '
'membership test\n'
' first tries iteration via "__iter__()", then the old '
'sequence\n'
' iteration protocol via "__getitem__()", see *this section '
'in the\n'
' language reference*.\n'
'\n'
'\n'
'Emulating numeric types\n'
'=======================\n'
'\n'
'The following methods can be defined to emulate numeric '
'objects.\n'
'Methods corresponding to operations that are not supported '
'by the\n'
'particular kind of number implemented (e.g., bitwise '
'operations for\n'
'non-integral numbers) should be left undefined.\n'
'\n'
'object.__add__(self, other)\n'
'object.__sub__(self, other)\n'
'object.__mul__(self, other)\n'
'object.__truediv__(self, other)\n'
'object.__floordiv__(self, other)\n'
'object.__mod__(self, other)\n'
'object.__divmod__(self, other)\n'
'object.__pow__(self, other[, modulo])\n'
'object.__lshift__(self, other)\n'
'object.__rshift__(self, other)\n'
'object.__and__(self, other)\n'
'object.__xor__(self, other)\n'
'object.__or__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "/", "//", "%", "divmod()", '
'"pow()",\n'
' "**", "<<", ">>", "&", "^", "|"). For instance, to '
'evaluate the\n'
' expression "x + y", where *x* is an instance of a class '
'that has an\n'
' "__add__()" method, "x.__add__(y)" is called. The '
'"__divmod__()"\n'
' method should be the equivalent to using "__floordiv__()" '
'and\n'
' "__mod__()"; it should not be related to '
'"__truediv__()". Note\n'
' that "__pow__()" should be defined to accept an optional '
'third\n'
' argument if the ternary version of the built-in "pow()" '
'function is\n'
' to be supported.\n'
'\n'
' If one of those methods does not support the operation '
'with the\n'
' supplied arguments, it should return "NotImplemented".\n'
'\n'
'object.__radd__(self, other)\n'
'object.__rsub__(self, other)\n'
'object.__rmul__(self, other)\n'
'object.__rtruediv__(self, other)\n'
'object.__rfloordiv__(self, other)\n'
'object.__rmod__(self, other)\n'
'object.__rdivmod__(self, other)\n'
'object.__rpow__(self, other)\n'
'object.__rlshift__(self, other)\n'
'object.__rrshift__(self, other)\n'
'object.__rand__(self, other)\n'
'object.__rxor__(self, other)\n'
'object.__ror__(self, other)\n'
'\n'
' These methods are called to implement the binary '
'arithmetic\n'
' operations ("+", "-", "*", "/", "//", "%", "divmod()", '
'"pow()",\n'
' "**", "<<", ">>", "&", "^", "|") with reflected (swapped) '
'operands.\n'
' These functions are only called if the left operand does '
'not\n'
' support the corresponding operation and the operands are '
'of\n'
' different types. [2] For instance, to evaluate the '
'expression "x -\n'
' y", where *y* is an instance of a class that has an '
'"__rsub__()"\n'
' method, "y.__rsub__(x)" is called if "x.__sub__(y)" '
'returns\n'
' *NotImplemented*.\n'
'\n'
' Note that ternary "pow()" will not try calling '
'"__rpow__()" (the\n'
' coercion rules would become too complicated).\n'
'\n'
" Note: If the right operand's type is a subclass of the "
'left\n'
" operand's type and that subclass provides the reflected "
'method\n'
' for the operation, this method will be called before '
'the left\n'
" operand's non-reflected method. This behavior allows "
'subclasses\n'
" to override their ancestors' operations.\n"
'\n'
'object.__iadd__(self, other)\n'
'object.__isub__(self, other)\n'
'object.__imul__(self, other)\n'
'object.__itruediv__(self, other)\n'
'object.__ifloordiv__(self, other)\n'
'object.__imod__(self, other)\n'
'object.__ipow__(self, other[, modulo])\n'
'object.__ilshift__(self, other)\n'
'object.__irshift__(self, other)\n'
'object.__iand__(self, other)\n'
'object.__ixor__(self, other)\n'
'object.__ior__(self, other)\n'
'\n'
' These methods are called to implement the augmented '
'arithmetic\n'
' assignments ("+=", "-=", "*=", "/=", "//=", "%=", "**=", '
'"<<=",\n'
' ">>=", "&=", "^=", "|="). These methods should attempt '
'to do the\n'
' operation in-place (modifying *self*) and return the '
'result (which\n'
' could be, but does not have to be, *self*). If a '
'specific method\n'
' is not defined, the augmented assignment falls back to '
'the normal\n'
' methods. For instance, if *x* is an instance of a class '
'with an\n'
' "__iadd__()" method, "x += y" is equivalent to "x = '
'x.__iadd__(y)"\n'
' . Otherwise, "x.__add__(y)" and "y.__radd__(x)" are '
'considered, as\n'
' with the evaluation of "x + y". In certain situations, '
'augmented\n'
' assignment can result in unexpected errors (see *Why '
'does\n'
" a_tuple[i] += ['item'] raise an exception when the "
'addition\n'
' works?*), but this behavior is in fact part of the data '
'model.\n'
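'\n'
'   For example, a hypothetical mutable accumulator that implements\n'
'   "+=" in-place:\n'
'\n'
'      class Accumulator:\n'
'          def __init__(self):\n'
'              self.items = []\n'
'\n'
'          def __iadd__(self, other):\n'
'              self.items.append(other)   # modify self in place\n'
'              return self                # and return it\n'
'\n'
'      >>> a = Accumulator()\n'
'      >>> a += 1\n'
'      >>> a += 2\n'
'      >>> a.items\n'
'      [1, 2]\n'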
'\n'
'object.__neg__(self)\n'
'object.__pos__(self)\n'
'object.__abs__(self)\n'
'object.__invert__(self)\n'
'\n'
' Called to implement the unary arithmetic operations ("-", '
'"+",\n'
' "abs()" and "~").\n'
'\n'
'object.__complex__(self)\n'
'object.__int__(self)\n'
'object.__float__(self)\n'
'object.__round__(self[, n])\n'
'\n'
' Called to implement the built-in functions "complex()", '
'"int()",\n'
' "float()" and "round()". Should return a value of the '
'appropriate\n'
' type.\n'
'\n'
'object.__index__(self)\n'
'\n'
' Called to implement "operator.index()", and whenever '
'Python needs\n'
' to losslessly convert the numeric object to an integer '
'object (such\n'
' as in slicing, or in the built-in "bin()", "hex()" and '
'"oct()"\n'
' functions). Presence of this method indicates that the '
'numeric\n'
' object is an integer type. Must return an integer.\n'
'\n'
' Note: In order to have a coherent integer type class, '
'when\n'
' "__index__()" is defined "__int__()" should also be '
'defined, and\n'
' both should return the same value.\n'
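'\n'
'   For example (the class "Seven" is hypothetical):\n'
'\n'
'      class Seven:\n'
'          def __index__(self):\n'
'              return 7\n'
'\n'
'          def __int__(self):\n'
'              return 7\n'
'\n'
'      >>> bin(Seven())\n'
"      '0b111'\n"
'      >>> list(range(10))[Seven()]\n'
'      7\n'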
'\n'
'\n'
'With Statement Context Managers\n'
'===============================\n'
'\n'
'A *context manager* is an object that defines the runtime '
'context to\n'
'be established when executing a "with" statement. The '
'context manager\n'
'handles the entry into, and the exit from, the desired '
'runtime context\n'
'for the execution of the block of code. Context managers '
'are normally\n'
'invoked using the "with" statement (described in section '
'*The with\n'
'statement*), but can also be used by directly invoking their '
'methods.\n'
'\n'
'Typical uses of context managers include saving and '
'restoring various\n'
'kinds of global state, locking and unlocking resources, '
'closing opened\n'
'files, etc.\n'
'\n'
'For more information on context managers, see *Context '
'Manager Types*.\n'
'\n'
'object.__enter__(self)\n'
'\n'
' Enter the runtime context related to this object. The '
'"with"\n'
" statement will bind this method's return value to the "
'target(s)\n'
' specified in the "as" clause of the statement, if any.\n'
'\n'
'object.__exit__(self, exc_type, exc_value, traceback)\n'
'\n'
' Exit the runtime context related to this object. The '
'parameters\n'
' describe the exception that caused the context to be '
'exited. If the\n'
' context was exited without an exception, all three '
'arguments will\n'
' be "None".\n'
'\n'
' If an exception is supplied, and the method wishes to '
'suppress the\n'
' exception (i.e., prevent it from being propagated), it '
'should\n'
' return a true value. Otherwise, the exception will be '
'processed\n'
' normally upon exit from this method.\n'
'\n'
' Note that "__exit__()" methods should not reraise the '
'passed-in\n'
" exception; this is the caller's responsibility.\n"
'\n'
'See also: **PEP 343** - The "with" statement\n'
'\n'
' The specification, background, and examples for the '
'Python "with"\n'
' statement.\n'
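'\n'
'For illustration, a minimal sketch of a context manager class (the\n'
'name "Managed" is hypothetical):\n'
'\n'
'   class Managed:\n'
'       def __enter__(self):\n'
'           print("entering")\n'
'           return self            # bound to the "as" target\n'
'\n'
'       def __exit__(self, exc_type, exc_value, traceback):\n'
'           print("exiting")\n'
'           return False           # do not suppress exceptions\n'
'\n'
'   >>> with Managed():\n'
'   ...     print("inside")\n'
'   ...\n'
'   entering\n'
'   inside\n'
'   exiting\n'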
'\n'
'\n'
'Special method lookup\n'
'=====================\n'
'\n'
'For custom classes, implicit invocations of special methods '
'are only\n'
"guaranteed to work correctly if defined on an object's type, "
'not in\n'
"the object's instance dictionary. That behaviour is the "
'reason why\n'
'the following code raises an exception:\n'
'\n'
' >>> class C:\n'
' ... pass\n'
' ...\n'
' >>> c = C()\n'
' >>> c.__len__ = lambda: 5\n'
' >>> len(c)\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
" TypeError: object of type 'C' has no len()\n"
'\n'
'The rationale behind this behaviour lies with a number of '
'special\n'
'methods such as "__hash__()" and "__repr__()" that are '
'implemented by\n'
'all objects, including type objects. If the implicit lookup '
'of these\n'
'methods used the conventional lookup process, they would '
'fail when\n'
'invoked on the type object itself:\n'
'\n'
' >>> 1 .__hash__() == hash(1)\n'
' True\n'
' >>> int.__hash__() == hash(int)\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
" TypeError: descriptor '__hash__' of 'int' object needs an "
'argument\n'
'\n'
'Incorrectly attempting to invoke an unbound method of a '
'class in this\n'
"way is sometimes referred to as 'metaclass confusion', and "
'is avoided\n'
'by bypassing the instance when looking up special methods:\n'
'\n'
' >>> type(1).__hash__(1) == hash(1)\n'
' True\n'
' >>> type(int).__hash__(int) == hash(int)\n'
' True\n'
'\n'
'In addition to bypassing any instance attributes in the '
'interest of\n'
'correctness, implicit special method lookup generally also '
'bypasses\n'
'the "__getattribute__()" method even of the object\'s '
'metaclass:\n'
'\n'
' >>> class Meta(type):\n'
' ... def __getattribute__(*args):\n'
' ... print("Metaclass getattribute invoked")\n'
' ... return type.__getattribute__(*args)\n'
' ...\n'
' >>> class C(object, metaclass=Meta):\n'
' ... def __len__(self):\n'
' ... return 10\n'
' ... def __getattribute__(*args):\n'
' ... print("Class getattribute invoked")\n'
' ... return object.__getattribute__(*args)\n'
' ...\n'
' >>> c = C()\n'
' >>> c.__len__() # Explicit lookup via '
'instance\n'
' Class getattribute invoked\n'
' 10\n'
' >>> type(c).__len__(c) # Explicit lookup via '
'type\n'
' Metaclass getattribute invoked\n'
' 10\n'
' >>> len(c) # Implicit lookup\n'
' 10\n'
'\n'
'Bypassing the "__getattribute__()" machinery in this fashion '
'provides\n'
'significant scope for speed optimisations within the '
'interpreter, at\n'
'the cost of some flexibility in the handling of special '
'methods (the\n'
'special method *must* be set on the class object itself in '
'order to be\n'
'consistently invoked by the interpreter).\n'
'\n'
'-[ Footnotes ]-\n'
'\n'
"[1] It *is* possible in some cases to change an object's "
'type,\n'
" under certain controlled conditions. It generally isn't "
'a good\n'
' idea though, since it can lead to some very strange '
'behaviour if\n'
' it is handled incorrectly.\n'
'\n'
'[2] For operands of the same type, it is assumed that if the '
'non-\n'
' reflected method (such as "__add__()") fails the '
'operation is not\n'
' supported, which is why the reflected method is not '
'called.\n',
'string-methods': '\n'
'String Methods\n'
'**************\n'
'\n'
'Strings implement all of the *common* sequence operations, '
'along with\n'
'the additional methods described below.\n'
'\n'
'Strings also support two styles of string formatting, one '
'providing a\n'
'large degree of flexibility and customization (see '
'"str.format()",\n'
'*Format String Syntax* and *String Formatting*) and the '
'other based on\n'
'C "printf" style formatting that handles a narrower range '
'of types and\n'
'is slightly harder to use correctly, but is often faster '
'for the cases\n'
'it can handle (*printf-style String Formatting*).\n'
'\n'
'The *Text Processing Services* section of the standard '
'library covers\n'
'a number of other modules that provide various text '
'related utilities\n'
'(including regular expression support in the "re" '
'module).\n'
'\n'
'str.capitalize()\n'
'\n'
' Return a copy of the string with its first character '
'capitalized\n'
' and the rest lowercased.\n'
'\n'
'str.casefold()\n'
'\n'
' Return a casefolded copy of the string. Casefolded '
'strings may be\n'
' used for caseless matching.\n'
'\n'
' Casefolding is similar to lowercasing but more '
'aggressive because\n'
' it is intended to remove all case distinctions in a '
'string. For\n'
' example, the German lowercase letter "\'ß\'" is '
'equivalent to ""ss"".\n'
' Since it is already lowercase, "lower()" would do '
'nothing to "\'ß\'";\n'
' "casefold()" converts it to ""ss"".\n'
'\n'
' The casefolding algorithm is described in section 3.13 '
'of the\n'
' Unicode Standard.\n'
'\n'
' New in version 3.3.\n'
'\n'
'str.center(width[, fillchar])\n'
'\n'
' Return the string centered in a string of length *width*. Padding '
'is done\n'
' using the specified *fillchar* (default is an ASCII '
'space). The\n'
' original string is returned if *width* is less than or '
'equal to\n'
' "len(s)".\n'
'\n'
'str.count(sub[, start[, end]])\n'
'\n'
' Return the number of non-overlapping occurrences of '
'substring *sub*\n'
' in the range [*start*, *end*]. Optional arguments '
'*start* and\n'
' *end* are interpreted as in slice notation.\n'
'\n'
'str.encode(encoding="utf-8", errors="strict")\n'
'\n'
' Return an encoded version of the string as a bytes '
'object. Default\n'
' encoding is "\'utf-8\'". *errors* may be given to set a '
'different\n'
' error handling scheme. The default for *errors* is '
'"\'strict\'",\n'
' meaning that encoding errors raise a "UnicodeError". '
'Other possible\n'
' values are "\'ignore\'", "\'replace\'", '
'"\'xmlcharrefreplace\'",\n'
' "\'backslashreplace\'" and any other name registered '
'via\n'
' "codecs.register_error()", see section *Error '
'Handlers*. For a list\n'
' of possible encodings, see section *Standard '
'Encodings*.\n'
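'\n'
'   For example:\n'
'\n'
"      >>> 'Straße'.encode('utf-8')\n"
"      b'Stra\\xc3\\x9fe'\n"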
'\n'
' Changed in version 3.1: Support for keyword arguments '
'added.\n'
'\n'
'str.endswith(suffix[, start[, end]])\n'
'\n'
' Return "True" if the string ends with the specified '
'*suffix*,\n'
' otherwise return "False". *suffix* can also be a tuple '
'of suffixes\n'
' to look for. With optional *start*, test beginning at '
'that\n'
' position. With optional *end*, stop comparing at that '
'position.\n'
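'\n'
'   For example:\n'
'\n'
"      >>> 'example.py'.endswith(('.py', '.txt'))\n"
'      True\n'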
'\n'
'str.expandtabs(tabsize=8)\n'
'\n'
' Return a copy of the string where all tab characters '
'are replaced\n'
' by one or more spaces, depending on the current column '
'and the\n'
' given tab size. Tab positions occur every *tabsize* '
'characters\n'
' (default is 8, giving tab positions at columns 0, 8, 16 '
'and so on).\n'
' To expand the string, the current column is set to zero '
'and the\n'
' string is examined character by character. If the '
'character is a\n'
' tab ("\\t"), one or more space characters are inserted '
'in the result\n'
' until the current column is equal to the next tab '
'position. (The\n'
' tab character itself is not copied.) If the character '
'is a newline\n'
' ("\\n") or return ("\\r"), it is copied and the current '
'column is\n'
' reset to zero. Any other character is copied unchanged '
'and the\n'
' current column is incremented by one regardless of how '
'the\n'
' character is represented when printed.\n'
'\n'
" >>> '01\\t012\\t0123\\t01234'.expandtabs()\n"
" '01 012 0123 01234'\n"
" >>> '01\\t012\\t0123\\t01234'.expandtabs(4)\n"
" '01 012 0123 01234'\n"
'\n'
'str.find(sub[, start[, end]])\n'
'\n'
' Return the lowest index in the string where substring '
'*sub* is\n'
' found within the slice "s[start:end]". Optional '
'arguments *start*\n'
' and *end* are interpreted as in slice notation. Return '
'"-1" if\n'
' *sub* is not found.\n'
'\n'
' Note: The "find()" method should be used only if you '
'need to know\n'
' the position of *sub*. To check if *sub* is a '
'substring or not,\n'
' use the "in" operator:\n'
'\n'
" >>> 'Py' in 'Python'\n"
' True\n'
'\n'
'str.format(*args, **kwargs)\n'
'\n'
' Perform a string formatting operation. The string on '
'which this\n'
' method is called can contain literal text or '
'replacement fields\n'
' delimited by braces "{}". Each replacement field '
'contains either\n'
' the numeric index of a positional argument, or the name '
'of a\n'
' keyword argument. Returns a copy of the string where '
'each\n'
' replacement field is replaced with the string value of '
'the\n'
' corresponding argument.\n'
'\n'
' >>> "The sum of 1 + 2 is {0}".format(1+2)\n'
" 'The sum of 1 + 2 is 3'\n"
'\n'
' See *Format String Syntax* for a description of the '
'various\n'
' formatting options that can be specified in format '
'strings.\n'
'\n'
'str.format_map(mapping)\n'
'\n'
' Similar to "str.format(**mapping)", except that '
'"mapping" is used\n'
' directly and not copied to a "dict". This is useful if '
'for example\n'
' "mapping" is a dict subclass:\n'
'\n'
' >>> class Default(dict):\n'
' ... def __missing__(self, key):\n'
' ... return key\n'
' ...\n'
" >>> '{name} was born in "
"{country}'.format_map(Default(name='Guido'))\n"
" 'Guido was born in country'\n"
'\n'
' New in version 3.2.\n'
'\n'
'str.index(sub[, start[, end]])\n'
'\n'
' Like "find()", but raise "ValueError" when the '
'substring is not\n'
' found.\n'
'\n'
'str.isalnum()\n'
'\n'
' Return true if all characters in the string are '
'alphanumeric and\n'
' there is at least one character, false otherwise. A '
'character "c"\n'
' is alphanumeric if one of the following returns '
'"True":\n'
' "c.isalpha()", "c.isdecimal()", "c.isdigit()", or '
'"c.isnumeric()".\n'
'\n'
'str.isalpha()\n'
'\n'
' Return true if all characters in the string are '
'alphabetic and\n'
' there is at least one character, false otherwise. '
'Alphabetic\n'
' characters are those characters defined in the Unicode '
'character\n'
' database as "Letter", i.e., those with general category '
'property\n'
' being one of "Lm", "Lt", "Lu", "Ll", or "Lo". Note '
'that this is\n'
' different from the "Alphabetic" property defined in the '
'Unicode\n'
' Standard.\n'
'\n'
'str.isdecimal()\n'
'\n'
' Return true if all characters in the string are decimal '
'characters\n'
' and there is at least one character, false otherwise. '
'Decimal\n'
' characters are those from general category "Nd". This '
'category\n'
' includes digit characters, and all characters that can '
'be used to\n'
' form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC '
'DIGIT ZERO.\n'
'\n'
'str.isdigit()\n'
'\n'
' Return true if all characters in the string are digits '
'and there is\n'
' at least one character, false otherwise. Digits '
'include decimal\n'
' characters and digits that need special handling, such '
'as the\n'
' compatibility superscript digits. Formally, a digit is '
'a character\n'
' that has the property value Numeric_Type=Digit or\n'
' Numeric_Type=Decimal.\n'
'\n'
'str.isidentifier()\n'
'\n'
' Return true if the string is a valid identifier '
'according to the\n'
' language definition, section *Identifiers and '
'keywords*.\n'
'\n'
' Use "keyword.iskeyword()" to test for reserved '
'identifiers such as\n'
' "def" and "class".\n'
'\n'
'str.islower()\n'
'\n'
' Return true if all cased characters [4] in the string '
'are lowercase\n'
' and there is at least one cased character, false '
'otherwise.\n'
'\n'
'str.isnumeric()\n'
'\n'
' Return true if all characters in the string are numeric '
'characters,\n'
' and there is at least one character, false otherwise. '
'Numeric\n'
' characters include digit characters, and all characters '
'that have\n'
' the Unicode numeric value property, e.g. U+2155, VULGAR '
'FRACTION\n'
' ONE FIFTH. Formally, numeric characters are those with '
'the\n'
' property value Numeric_Type=Digit, Numeric_Type=Decimal '
'or\n'
' Numeric_Type=Numeric.\n'
'\n'
'str.isprintable()\n'
'\n'
' Return true if all characters in the string are '
'printable or the\n'
' string is empty, false otherwise. Nonprintable '
'characters are\n'
' those characters defined in the Unicode character '
'database as\n'
' "Other" or "Separator", excepting the ASCII space '
'(0x20) which is\n'
' considered printable. (Note that printable characters '
'in this\n'
' context are those which should not be escaped when '
'"repr()" is\n'
' invoked on a string. It has no bearing on the handling '
'of strings\n'
' written to "sys.stdout" or "sys.stderr".)\n'
'\n'
'str.isspace()\n'
'\n'
' Return true if there are only whitespace characters in '
'the string\n'
' and there is at least one character, false otherwise. '
'Whitespace\n'
' characters are those characters defined in the Unicode '
'character\n'
' database as "Other" or "Separator" and those with '
'bidirectional\n'
' property being one of "WS", "B", or "S".\n'
'\n'
'str.istitle()\n'
'\n'
' Return true if the string is a titlecased string and '
'there is at\n'
' least one character, for example uppercase characters '
'may only\n'
' follow uncased characters and lowercase characters only '
'cased ones.\n'
' Return false otherwise.\n'
'\n'
'str.isupper()\n'
'\n'
' Return true if all cased characters [4] in the string '
'are uppercase\n'
' and there is at least one cased character, false '
'otherwise.\n'
'\n'
'str.join(iterable)\n'
'\n'
' Return a string which is the concatenation of the '
'strings in\n'
' *iterable*. A "TypeError" will be raised if '
'there are\n'
' any non-string values in *iterable*, including "bytes" '
'objects.\n'
' The separator between elements is the string providing '
'this method.\n'
'\n'
'str.ljust(width[, fillchar])\n'
'\n'
' Return the string left justified in a string of length '
'*width*.\n'
' Padding is done using the specified *fillchar* (default '
'is an ASCII\n'
' space). The original string is returned if *width* is '
'less than or\n'
' equal to "len(s)".\n'
'\n'
'str.lower()\n'
'\n'
' Return a copy of the string with all the cased '
'characters [4]\n'
' converted to lowercase.\n'
'\n'
' The lowercasing algorithm used is described in section '
'3.13 of the\n'
' Unicode Standard.\n'
'\n'
'str.lstrip([chars])\n'
'\n'
' Return a copy of the string with leading characters '
'removed. The\n'
' *chars* argument is a string specifying the set of '
'characters to be\n'
' removed. If omitted or "None", the *chars* argument '
'defaults to\n'
' removing whitespace. The *chars* argument is not a '
'prefix; rather,\n'
' all combinations of its values are stripped:\n'
'\n'
" >>> ' spacious '.lstrip()\n"
" 'spacious '\n"
" >>> 'www.example.com'.lstrip('cmowz.')\n"
" 'example.com'\n"
'\n'
'static str.maketrans(x[, y[, z]])\n'
'\n'
' This static method returns a translation table usable '
'for\n'
' "str.translate()".\n'
'\n'
' If there is only one argument, it must be a dictionary '
'mapping\n'
' Unicode ordinals (integers) or characters (strings of '
'length 1) to\n'
' Unicode ordinals, strings (of arbitrary lengths) or '
'None.\n'
' Character keys will then be converted to ordinals.\n'
'\n'
' If there are two arguments, they must be strings of '
'equal length,\n'
' and in the resulting dictionary, each character in x '
'will be mapped\n'
' to the character at the same position in y. If there '
'is a third\n'
' argument, it must be a string, whose characters will be '
'mapped to\n'
' None in the result.\n'
'\n'
'str.partition(sep)\n'
'\n'
' Split the string at the first occurrence of *sep*, and '
'return a\n'
' 3-tuple containing the part before the separator, the '
'separator\n'
' itself, and the part after the separator. If the '
'separator is not\n'
' found, return a 3-tuple containing the string itself, '
'followed by\n'
' two empty strings.\n'
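'\n'
'   For example:\n'
'\n'
"      >>> 'key=value'.partition('=')\n"
"      ('key', '=', 'value')\n"
"      >>> 'no separator here'.partition('=')\n"
"      ('no separator here', '', '')\n"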
'\n'
'str.replace(old, new[, count])\n'
'\n'
' Return a copy of the string with all occurrences of '
'substring *old*\n'
' replaced by *new*. If the optional argument *count* is '
'given, only\n'
' the first *count* occurrences are replaced.\n'
'\n'
'str.rfind(sub[, start[, end]])\n'
'\n'
' Return the highest index in the string where substring '
'*sub* is\n'
' found, such that *sub* is contained within '
'"s[start:end]".\n'
' Optional arguments *start* and *end* are interpreted as '
'in slice\n'
' notation. Return "-1" on failure.\n'
'\n'
'str.rindex(sub[, start[, end]])\n'
'\n'
' Like "rfind()" but raises "ValueError" when the '
'substring *sub* is\n'
' not found.\n'
'\n'
'str.rjust(width[, fillchar])\n'
'\n'
' Return the string right justified in a string of length '
'*width*.\n'
' Padding is done using the specified *fillchar* (default '
'is an ASCII\n'
' space). The original string is returned if *width* is '
'less than or\n'
' equal to "len(s)".\n'
'\n'
'str.rpartition(sep)\n'
'\n'
' Split the string at the last occurrence of *sep*, and '
'return a\n'
' 3-tuple containing the part before the separator, the '
'separator\n'
' itself, and the part after the separator. If the '
'separator is not\n'
' found, return a 3-tuple containing two empty strings, '
'followed by\n'
' the string itself.\n'
'\n'
'str.rsplit(sep=None, maxsplit=-1)\n'
'\n'
' Return a list of the words in the string, using *sep* '
'as the\n'
' delimiter string. If *maxsplit* is given, at most '
'*maxsplit* splits\n'
' are done, the *rightmost* ones. If *sep* is not '
'specified or\n'
' "None", any whitespace string is a separator. Except '
'for splitting\n'
' from the right, "rsplit()" behaves like "split()" which '
'is\n'
' described in detail below.\n'
'\n'
'str.rstrip([chars])\n'
'\n'
' Return a copy of the string with trailing characters '
'removed. The\n'
' *chars* argument is a string specifying the set of '
'characters to be\n'
' removed. If omitted or "None", the *chars* argument '
'defaults to\n'
' removing whitespace. The *chars* argument is not a '
'suffix; rather,\n'
' all combinations of its values are stripped:\n'
'\n'
" >>> ' spacious '.rstrip()\n"
" ' spacious'\n"
" >>> 'mississippi'.rstrip('ipz')\n"
" 'mississ'\n"
'\n'
'str.split(sep=None, maxsplit=-1)\n'
'\n'
' Return a list of the words in the string, using *sep* '
'as the\n'
' delimiter string. If *maxsplit* is given, at most '
'*maxsplit*\n'
' splits are done (thus, the list will have at most '
'"maxsplit+1"\n'
' elements). If *maxsplit* is not specified or "-1", '
'then there is\n'
' no limit on the number of splits (all possible splits '
'are made).\n'
'\n'
' If *sep* is given, consecutive delimiters are not '
'grouped together\n'
' and are deemed to delimit empty strings (for example,\n'
' "\'1,,2\'.split(\',\')" returns "[\'1\', \'\', '
'\'2\']"). The *sep* argument\n'
' may consist of multiple characters (for example,\n'
' "\'1<>2<>3\'.split(\'<>\')" returns "[\'1\', \'2\', '
'\'3\']"). Splitting an\n'
' empty string with a specified separator returns '
'"[\'\']".\n'
'\n'
' For example:\n'
'\n'
" >>> '1,2,3'.split(',')\n"
" ['1', '2', '3']\n"
" >>> '1,2,3'.split(',', maxsplit=1)\n"
" ['1', '2,3']\n"
" >>> '1,2,,3,'.split(',')\n"
" ['1', '2', '', '3', '']\n"
'\n'
' If *sep* is not specified or is "None", a different '
'splitting\n'
' algorithm is applied: runs of consecutive whitespace '
'are regarded\n'
' as a single separator, and the result will contain no '
'empty strings\n'
' at the start or end if the string has leading or '
'trailing\n'
' whitespace. Consequently, splitting an empty string or '
'a string\n'
' consisting of just whitespace with a "None" separator '
'returns "[]".\n'
'\n'
' For example:\n'
'\n'
" >>> '1 2 3'.split()\n"
" ['1', '2', '3']\n"
" >>> '1 2 3'.split(maxsplit=1)\n"
" ['1', '2 3']\n"
" >>> ' 1 2 3 '.split()\n"
" ['1', '2', '3']\n"
'\n'
'str.splitlines([keepends])\n'
'\n'
' Return a list of the lines in the string, breaking at '
'line\n'
' boundaries. Line breaks are not included in the '
'resulting list\n'
' unless *keepends* is given and true.\n'
'\n'
' This method splits on the following line boundaries. '
'In\n'
' particular, the boundaries are a superset of *universal '
'newlines*.\n'
'\n'
' '
'+-------------------------+-------------------------------+\n'
' | Representation | '
'Description |\n'
' '
'+=========================+===============================+\n'
' | "\\n" | Line '
'Feed |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\r" | Carriage '
'Return |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\r\\n" | Carriage Return + Line '
'Feed |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\v" or "\\x0b" | Line '
'Tabulation |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\f" or "\\x0c" | Form '
'Feed |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\x1c" | File '
'Separator |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\x1d" | Group '
'Separator |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\x1e" | Record '
'Separator |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\x85" | Next Line (C1 Control '
'Code) |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\u2028" | Line '
'Separator |\n'
' '
'+-------------------------+-------------------------------+\n'
' | "\\u2029" | Paragraph '
'Separator |\n'
' '
'+-------------------------+-------------------------------+\n'
'\n'
' Changed in version 3.2: "\\v" and "\\f" added to list '
'of line\n'
' boundaries.\n'
'\n'
' For example:\n'
'\n'
" >>> 'ab c\\n\\nde fg\\rkl\\r\\n'.splitlines()\n"
" ['ab c', '', 'de fg', 'kl']\n"
" >>> 'ab c\\n\\nde "
"fg\\rkl\\r\\n'.splitlines(keepends=True)\n"
" ['ab c\\n', '\\n', 'de fg\\r', 'kl\\r\\n']\n"
'\n'
' Unlike "split()" when a delimiter string *sep* is '
'given, this\n'
' method returns an empty list for the empty string, and '
'a terminal\n'
' line break does not result in an extra line:\n'
'\n'
' >>> "".splitlines()\n'
' []\n'
' >>> "One line\\n".splitlines()\n'
" ['One line']\n"
'\n'
' For comparison, "split(\'\\n\')" gives:\n'
'\n'
" >>> ''.split('\\n')\n"
" ['']\n"
" >>> 'Two lines\\n'.split('\\n')\n"
" ['Two lines', '']\n"
'\n'
'str.startswith(prefix[, start[, end]])\n'
'\n'
' Return "True" if string starts with the *prefix*, '
'otherwise return\n'
' "False". *prefix* can also be a tuple of prefixes to '
'look for.\n'
' With optional *start*, test string beginning at that '
'position.\n'
' With optional *end*, stop comparing string at that '
'position.\n'
'\n'
'str.strip([chars])\n'
'\n'
' Return a copy of the string with the leading and '
'trailing\n'
' characters removed. The *chars* argument is a string '
'specifying the\n'
' set of characters to be removed. If omitted or "None", '
'the *chars*\n'
' argument defaults to removing whitespace. The *chars* '
'argument is\n'
' not a prefix or suffix; rather, all combinations of its '
'values are\n'
' stripped:\n'
'\n'
" >>> ' spacious '.strip()\n"
" 'spacious'\n"
" >>> 'www.example.com'.strip('cmowz.')\n"
" 'example'\n"
'\n'
'str.swapcase()\n'
'\n'
' Return a copy of the string with uppercase characters '
'converted to\n'
' lowercase and vice versa. Note that it is not '
'necessarily true that\n'
' "s.swapcase().swapcase() == s".\n'
'\n'
'str.title()\n'
'\n'
' Return a titlecased version of the string where words '
'start with an\n'
' uppercase character and the remaining characters are '
'lowercase.\n'
'\n'
' For example:\n'
'\n'
" >>> 'Hello world'.title()\n"
" 'Hello World'\n"
'\n'
' The algorithm uses a simple language-independent '
'definition of a\n'
' word as groups of consecutive letters. The definition '
'works in\n'
' many contexts but it means that apostrophes in '
'contractions and\n'
' possessives form word boundaries, which may not be the '
'desired\n'
' result:\n'
'\n'
' >>> "they\'re bill\'s friends from the UK".title()\n'
' "They\'Re Bill\'S Friends From The Uk"\n'
'\n'
' A workaround for apostrophes can be constructed using '
'regular\n'
' expressions:\n'
'\n'
' >>> import re\n'
' >>> def titlecase(s):\n'
' ... return re.sub(r"[A-Za-z]+(\'[A-Za-z]+)?",\n'
' ... lambda mo: '
'mo.group(0)[0].upper() +\n'
' ... '
'mo.group(0)[1:].lower(),\n'
' ... s)\n'
' ...\n'
' >>> titlecase("they\'re bill\'s friends.")\n'
' "They\'re Bill\'s Friends."\n'
'\n'
'str.translate(table)\n'
'\n'
' Return a copy of the string in which each character has '
'been mapped\n'
' through the given translation table. The table must be '
'an object\n'
' that implements indexing via "__getitem__()", typically '
'a *mapping*\n'
' or *sequence*. When indexed by a Unicode ordinal (an '
'integer), the\n'
' table object can do any of the following: return a '
'Unicode ordinal\n'
' or a string, to map the character to one or more other '
'characters;\n'
' return "None", to delete the character from the return '
'string; or\n'
' raise a "LookupError" exception, to map the character '
'to itself.\n'
'\n'
' You can use "str.maketrans()" to create a translation '
'map from\n'
' character-to-character mappings in different formats.\n'
'\n'
' See also the "codecs" module for a more flexible '
'approach to custom\n'
' character mappings.\n'
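'\n'
'   For example:\n'
'\n'
"      >>> table = str.maketrans('ab', 'xy', 'z')\n"
"      >>> 'abz-ba'.translate(table)\n"
"      'xy-yx'\n"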
'\n'
'str.upper()\n'
'\n'
' Return a copy of the string with all the cased '
'characters [4]\n'
' converted to uppercase. Note that '
'"str.upper().isupper()" might be\n'
' "False" if "s" contains uncased characters or if the '
'Unicode\n'
' category of the resulting character(s) is not "Lu" '
'(Letter,\n'
' uppercase), but e.g. "Lt" (Letter, titlecase).\n'
'\n'
' The uppercasing algorithm used is described in section '
'3.13 of the\n'
' Unicode Standard.\n'
'\n'
'str.zfill(width)\n'
'\n'
' Return a copy of the string left filled with ASCII '
'"\'0\'" digits to\n'
' make a string of length *width*. A leading sign prefix\n'
' ("\'+\'"/"\'-\'") is handled by inserting the padding '
'*after* the sign\n'
' character rather than before. The original string is '
'returned if\n'
' *width* is less than or equal to "len(s)".\n'
'\n'
' For example:\n'
'\n'
' >>> "42".zfill(5)\n'
" '00042'\n"
' >>> "-42".zfill(5)\n'
" '-0042'\n",
'strings': '\n'
'String and Bytes literals\n'
'*************************\n'
'\n'
'String literals are described by the following lexical '
'definitions:\n'
'\n'
' stringliteral ::= [stringprefix](shortstring | longstring)\n'
' stringprefix ::= "r" | "u" | "R" | "U"\n'
' shortstring ::= "\'" shortstringitem* "\'" | \'"\' '
'shortstringitem* \'"\'\n'
' longstring ::= "\'\'\'" longstringitem* "\'\'\'" | '
'\'"""\' longstringitem* \'"""\'\n'
' shortstringitem ::= shortstringchar | stringescapeseq\n'
' longstringitem ::= longstringchar | stringescapeseq\n'
' shortstringchar ::= <any source character except "\\" or '
'newline or the quote>\n'
' longstringchar ::= <any source character except "\\">\n'
' stringescapeseq ::= "\\" <any source character>\n'
'\n'
' bytesliteral ::= bytesprefix(shortbytes | longbytes)\n'
' bytesprefix ::= "b" | "B" | "br" | "Br" | "bR" | "BR" | '
'"rb" | "rB" | "Rb" | "RB"\n'
' shortbytes ::= "\'" shortbytesitem* "\'" | \'"\' '
'shortbytesitem* \'"\'\n'
' longbytes ::= "\'\'\'" longbytesitem* "\'\'\'" | \'"""\' '
'longbytesitem* \'"""\'\n'
' shortbytesitem ::= shortbyteschar | bytesescapeseq\n'
' longbytesitem ::= longbyteschar | bytesescapeseq\n'
' shortbyteschar ::= <any ASCII character except "\\" or newline '
'or the quote>\n'
' longbyteschar ::= <any ASCII character except "\\">\n'
' bytesescapeseq ::= "\\" <any ASCII character>\n'
'\n'
'One syntactic restriction not indicated by these productions is '
'that\n'
'whitespace is not allowed between the "stringprefix" or '
'"bytesprefix"\n'
'and the rest of the literal. The source character set is defined '
'by\n'
'the encoding declaration; it is UTF-8 if no encoding declaration '
'is\n'
'given in the source file; see section *Encoding declarations*.\n'
'\n'
'In plain English: Both types of literals can be enclosed in '
'matching\n'
'single quotes ("\'") or double quotes ("""). They can also be '
'enclosed\n'
'in matching groups of three single or double quotes (these are\n'
'generally referred to as *triple-quoted strings*). The '
'backslash\n'
'("\\") character is used to escape characters that otherwise have '
'a\n'
'special meaning, such as newline, backslash itself, or the quote\n'
'character.\n'
'\n'
'Bytes literals are always prefixed with "\'b\'" or "\'B\'"; they '
'produce\n'
'an instance of the "bytes" type instead of the "str" type. They '
'may\n'
'only contain ASCII characters; bytes with a numeric value of 128 '
'or\n'
'greater must be expressed with escapes.\n'
'\n'
'As of Python 3.3 it is possible again to prefix string literals '
'with a\n'
'"u" prefix to simplify maintenance of dual 2.x and 3.x '
'codebases.\n'
'\n'
'Both string and bytes literals may optionally be prefixed with a\n'
'letter "\'r\'" or "\'R\'"; such strings are called *raw strings* '
'and treat\n'
'backslashes as literal characters. As a result, in string '
'literals,\n'
'"\'\\U\'" and "\'\\u\'" escapes in raw strings are not treated '
'specially.\n'
"Given that Python 2.x's raw unicode literals behave differently "
'than\n'
'Python 3.x\'s, the "\'ur\'" syntax is not supported.\n'
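'\n'
'For example:\n'
'\n'
"   >>> len('\\n'), len(r'\\n')   # escape sequence vs. raw text\n"
'   (1, 2)\n'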
'\n'
'New in version 3.3: The "\'rb\'" prefix of raw bytes literals has '
'been\n'
'added as a synonym of "\'br\'".\n'
'\n'
'New in version 3.3: Support for the unicode legacy literal\n'
'("u\'value\'") was reintroduced to simplify the maintenance of '
'dual\n'
'Python 2.x and 3.x codebases. See **PEP 414** for more '
'information.\n'
'\n'
'In triple-quoted literals, unescaped newlines and quotes are '
'allowed\n'
'(and are retained), except that three unescaped quotes in a row\n'
'terminate the literal. (A "quote" is the character used to open '
'the\n'
'literal, i.e. either "\'" or """.)\n'
'\n'
'Unless an "\'r\'" or "\'R\'" prefix is present, escape sequences '
'in string\n'
'and bytes literals are interpreted according to rules similar to '
'those\n'
'used by Standard C. The recognized escape sequences are:\n'
'\n'
'+-------------------+-----------------------------------+---------+\n'
'| Escape Sequence | Meaning | Notes '
'|\n'
'+===================+===================================+=========+\n'
'| "\\newline" | Backslash and newline ignored '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\\\" | Backslash ("\\") '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\\'" | Single quote ("\'") '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\"" | Double quote (""") '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\a" | ASCII Bell (BEL) '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\b" | ASCII Backspace (BS) '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\f" | ASCII Formfeed (FF) '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\n" | ASCII Linefeed (LF) '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\r" | ASCII Carriage Return (CR) '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\t" | ASCII Horizontal Tab (TAB) '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\v" | ASCII Vertical Tab (VT) '
'| |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\ooo" | Character with octal value *ooo* | '
'(1,3) |\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\xhh" | Character with hex value *hh* | '
'(2,3) |\n'
'+-------------------+-----------------------------------+---------+\n'
'\n'
'Escape sequences only recognized in string literals are:\n'
'\n'
'+-------------------+-----------------------------------+---------+\n'
'| Escape Sequence | Meaning | Notes '
'|\n'
'+===================+===================================+=========+\n'
'| "\\N{name}" | Character named *name* in the | '
'(4) |\n'
'| | Unicode database | '
'|\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\uxxxx" | Character with 16-bit hex value | '
'(5) |\n'
'| | *xxxx* | '
'|\n'
'+-------------------+-----------------------------------+---------+\n'
'| "\\Uxxxxxxxx" | Character with 32-bit hex value | '
'(6) |\n'
'| | *xxxxxxxx* | '
'|\n'
'+-------------------+-----------------------------------+---------+\n'
'\n'
'Notes:\n'
'\n'
'1. As in Standard C, up to three octal digits are accepted.\n'
'\n'
'2. Unlike in Standard C, exactly two hex digits are required.\n'
'\n'
'3. In a bytes literal, hexadecimal and octal escapes denote the\n'
' byte with the given value. In a string literal, these escapes\n'
' denote a Unicode character with the given value.\n'
'\n'
'4. Changed in version 3.3: Support for name aliases [1] has been\n'
' added.\n'
'\n'
'5. Individual code units which form parts of a surrogate pair '
'can\n'
' be encoded using this escape sequence. Exactly four hex '
'digits are\n'
' required.\n'
'\n'
'6. Any Unicode character can be encoded this way. Exactly eight\n'
' hex digits are required.\n'
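'\n'
'For example:\n'
'\n'
'   >>> "\\x41\\u0042\\N{LATIN SMALL LETTER C}"\n'
"   'ABc'\n"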
'\n'
'Unlike Standard C, all unrecognized escape sequences are left in '
'the\n'
'string unchanged, i.e., *the backslash is left in the result*. '
'(This\n'
'behavior is useful when debugging: if an escape sequence is '
'mistyped,\n'
'the resulting output is more easily recognized as broken.) It is '
'also\n'
'important to note that the escape sequences only recognized in '
'string\n'
'literals fall into the category of unrecognized escapes for '
'bytes\n'
'literals.\n'
'\n'
'Even in a raw literal, quotes can be escaped with a backslash, '
'but the\n'
'backslash remains in the result; for example, "r"\\""" is a '
'valid\n'
'string literal consisting of two characters: a backslash and a '
'double\n'
'quote; "r"\\"" is not a valid string literal (even a raw string '
'cannot\n'
'end in an odd number of backslashes). Specifically, *a raw '
'literal\n'
'cannot end in a single backslash* (since the backslash would '
'escape\n'
'the following quote character). Note also that a single '
'backslash\n'
'followed by a newline is interpreted as those two characters as '
'part\n'
'of the literal, *not* as a line continuation.\n',
'subscriptions': '\n'
'Subscriptions\n'
'*************\n'
'\n'
'A subscription selects an item of a sequence (string, tuple '
'or list)\n'
'or mapping (dictionary) object:\n'
'\n'
' subscription ::= primary "[" expression_list "]"\n'
'\n'
'The primary must evaluate to an object that supports '
'subscription\n'
'(lists or dictionaries for example). User-defined objects '
'can support\n'
'subscription by defining a "__getitem__()" method.\n'
'\n'
'For built-in objects, there are two types of objects that '
'support\n'
'subscription:\n'
'\n'
'If the primary is a mapping, the expression list must '
'evaluate to an\n'
'object whose value is one of the keys of the mapping, and '
'the\n'
'subscription selects the value in the mapping that '
'corresponds to that\n'
'key. (The expression list is a tuple except if it has '
'exactly one\n'
'item.)\n'
'\n'
'If the primary is a sequence, the expression (list) must '
'evaluate to\n'
'an integer or a slice (as discussed in the following '
'section).\n'
'\n'
'The formal syntax makes no special provision for negative '
'indices in\n'
'sequences; however, built-in sequences all provide a '
'"__getitem__()"\n'
'method that interprets negative indices by adding the '
'length of the\n'
'sequence to the index (so that "x[-1]" selects the last '
'item of "x").\n'
'The resulting value must be a nonnegative integer less than '
'the number\n'
'of items in the sequence, and the subscription selects the '
'item whose\n'
'index is that value (counting from zero). Since the support '
'for\n'
"negative indices and slicing occurs in the object's "
'"__getitem__()"\n'
'method, subclasses overriding this method will need to '
'explicitly add\n'
'that support.\n'
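'\n'
'For example:\n'
'\n'
"   >>> {'one': 1}['one']             # mapping subscription by key\n"
'   1\n'
"   >>> 'Python'[0], 'Python'[-1]     # sequence subscription by index\n"
"   ('P', 'n')\n"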
'\n'
"A string's items are characters. A character is not a "
'separate data\n'
'type but a string of exactly one character.\n',
'truth': '\n'
'Truth Value Testing\n'
'*******************\n'
'\n'
'Any object can be tested for truth value, for use in an "if" or\n'
'"while" condition or as operand of the Boolean operations below. '
'The\n'
'following values are considered false:\n'
'\n'
'* "None"\n'
'\n'
'* "False"\n'
'\n'
'* zero of any numeric type, for example, "0", "0.0", "0j".\n'
'\n'
'* any empty sequence, for example, "\'\'", "()", "[]".\n'
'\n'
'* any empty mapping, for example, "{}".\n'
'\n'
'* instances of user-defined classes, if the class defines a\n'
' "__bool__()" or "__len__()" method, when that method returns the\n'
' integer zero or "bool" value "False". [1]\n'
'\n'
'All other values are considered true --- so objects of many types '
'are\n'
'always true.\n'
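'\n'
'For example:\n'
'\n'
'   >>> bool([]), bool([0])\n'
'   (False, True)\n'
'   >>> class AlwaysFalse:\n'
'   ...     def __bool__(self):\n'
'   ...         return False\n'
'   ...\n'
'   >>> bool(AlwaysFalse())\n'
'   False\n'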
'\n'
'Operations and built-in functions that have a Boolean result '
'always\n'
'return "0" or "False" for false and "1" or "True" for true, unless\n'
'otherwise stated. (Important exception: the Boolean operations '
'"or"\n'
'and "and" always return one of their operands.)\n',
'try': '\n'
'The "try" statement\n'
'*******************\n'
'\n'
'The "try" statement specifies exception handlers and/or cleanup code\n'
'for a group of statements:\n'
'\n'
' try_stmt ::= try1_stmt | try2_stmt\n'
' try1_stmt ::= "try" ":" suite\n'
' ("except" [expression ["as" identifier]] ":" '
'suite)+\n'
' ["else" ":" suite]\n'
' ["finally" ":" suite]\n'
' try2_stmt ::= "try" ":" suite\n'
' "finally" ":" suite\n'
'\n'
'The "except" clause(s) specify one or more exception handlers. When '
'no\n'
'exception occurs in the "try" clause, no exception handler is\n'
'executed. When an exception occurs in the "try" suite, a search for '
'an\n'
'exception handler is started. This search inspects the except '
'clauses\n'
'in turn until one is found that matches the exception. An '
'expression-\n'
'less except clause, if present, must be last; it matches any\n'
'exception. For an except clause with an expression, that expression\n'
'is evaluated, and the clause matches the exception if the resulting\n'
'object is "compatible" with the exception. An object is compatible\n'
'with an exception if it is the class or a base class of the '
'exception\n'
'object or a tuple containing an item compatible with the exception.\n'
'\n'
'If no except clause matches the exception, the search for an '
'exception\n'
'handler continues in the surrounding code and on the invocation '
'stack.\n'
'[1]\n'
'\n'
'If the evaluation of an expression in the header of an except clause\n'
'raises an exception, the original search for a handler is canceled '
'and\n'
'a search starts for the new exception in the surrounding code and on\n'
'the call stack (it is treated as if the entire "try" statement '
'raised\n'
'the exception).\n'
'\n'
'When a matching except clause is found, the exception is assigned to\n'
'the target specified after the "as" keyword in that except clause, '
'if\n'
"present, and the except clause's suite is executed. All except\n"
'clauses must have an executable block. When the end of this block '
'is\n'
'reached, execution continues normally after the entire try '
'statement.\n'
'(This means that if two nested handlers exist for the same '
'exception,\n'
'and the exception occurs in the try clause of the inner handler, the\n'
'outer handler will not handle the exception.)\n'
'\n'
'When an exception has been assigned using "as target", it is cleared\n'
'at the end of the except clause. This is as if\n'
'\n'
' except E as N:\n'
' foo\n'
'\n'
'was translated to\n'
'\n'
' except E as N:\n'
' try:\n'
' foo\n'
' finally:\n'
' del N\n'
'\n'
'This means the exception must be assigned to a different name to be\n'
'able to refer to it after the except clause. Exceptions are cleared\n'
'because with the traceback attached to them, they form a reference\n'
'cycle with the stack frame, keeping all locals in that frame alive\n'
'until the next garbage collection occurs.\n'
'\n'
"Before an except clause's suite is executed, details about the\n"
'exception are stored in the "sys" module and can be accessed via\n'
'"sys.exc_info()". "sys.exc_info()" returns a 3-tuple consisting of '
'the\n'
'exception class, the exception instance and a traceback object (see\n'
'section *The standard type hierarchy*) identifying the point in the\n'
'program where the exception occurred. "sys.exc_info()" values are\n'
'restored to their previous values (before the call) when returning\n'
'from a function that handled an exception.\n'
'\n'
'The optional "else" clause is executed if and when control flows off\n'
'the end of the "try" clause. [2] Exceptions in the "else" clause are\n'
'not handled by the preceding "except" clauses.\n'
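'\n'
'For example:\n'
'\n'
'   >>> try:\n'
"   ...     value = int('10')\n"
'   ... except ValueError:\n'
"   ...     print('not a number')\n"
'   ... else:\n'
"   ...     print('parsed', value)\n"
'   ...\n'
'   parsed 10\n'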
'\n'
'If "finally" is present, it specifies a \'cleanup\' handler. The '
'"try"\n'
'clause is executed, including any "except" and "else" clauses. If '
'an\n'
'exception occurs in any of the clauses and is not handled, the\n'
'exception is temporarily saved. The "finally" clause is executed. '
'If\n'
'there is a saved exception it is re-raised at the end of the '
'"finally"\n'
'clause. If the "finally" clause raises another exception, the saved\n'
'exception is set as the context of the new exception. If the '
'"finally"\n'
'clause executes a "return" or "break" statement, the saved exception\n'
'is discarded:\n'
'\n'
' >>> def f():\n'
' ... try:\n'
' ... 1/0\n'
' ... finally:\n'
' ... return 42\n'
' ...\n'
' >>> f()\n'
' 42\n'
'\n'
'The exception information is not available to the program during\n'
'execution of the "finally" clause.\n'
'\n'
'When a "return", "break" or "continue" statement is executed in the\n'
'"try" suite of a "try"..."finally" statement, the "finally" clause '
'is\n'
'also executed \'on the way out.\' A "continue" statement is illegal '
'in\n'
'the "finally" clause. (The reason is a problem with the current\n'
'implementation --- this restriction may be lifted in the future).\n'
'\n'
'The return value of a function is determined by the last "return"\n'
'statement executed. Since the "finally" clause always executes, a\n'
'"return" statement executed in the "finally" clause will always be '
'the\n'
'last one executed:\n'
'\n'
' >>> def foo():\n'
' ... try:\n'
" ... return 'try'\n"
' ... finally:\n'
" ... return 'finally'\n"
' ...\n'
' >>> foo()\n'
" 'finally'\n"
'\n'
'Additional information on exceptions can be found in section\n'
'*Exceptions*, and information on using the "raise" statement to\n'
'generate exceptions may be found in section *The raise statement*.\n',
'types': '\n'
'The standard type hierarchy\n'
'***************************\n'
'\n'
'Below is a list of the types that are built into Python. '
'Extension\n'
'modules (written in C, Java, or other languages, depending on the\n'
'implementation) can define additional types. Future versions of\n'
'Python may add types to the type hierarchy (e.g., rational '
'numbers,\n'
'efficiently stored arrays of integers, etc.), although such '
'additions\n'
'will often be provided via the standard library instead.\n'
'\n'
'Some of the type descriptions below contain a paragraph listing\n'
"'special attributes.' These are attributes that provide access to "
'the\n'
'implementation and are not intended for general use. Their '
'definition\n'
'may change in the future.\n'
'\n'
'None\n'
' This type has a single value. There is a single object with '
'this\n'
' value. This object is accessed through the built-in name "None". '
'It\n'
' is used to signify the absence of a value in many situations, '
'e.g.,\n'
" it is returned from functions that don't explicitly return\n"
' anything. Its truth value is false.\n'
'\n'
'NotImplemented\n'
' This type has a single value. There is a single object with '
'this\n'
' value. This object is accessed through the built-in name\n'
' "NotImplemented". Numeric methods and rich comparison methods\n'
' should return this value if they do not implement the operation '
'for\n'
' the operands provided. (The interpreter will then try the\n'
' reflected operation, or some other fallback, depending on the\n'
' operator.) Its truth value is true.\n'
'\n'
' See *Implementing the arithmetic operations* for more details.\n'
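'\n'
'   A minimal sketch (the class "Meters" here is hypothetical):\n'
'\n'
'      >>> class Meters:\n'
'      ...     def __init__(self, n):\n'
'      ...         self.n = n\n'
'      ...     def __eq__(self, other):\n'
'      ...         if not isinstance(other, Meters):\n'
'      ...             return NotImplemented\n'
'      ...         return self.n == other.n\n'
'      >>> Meters(1) == 1   # both sides return NotImplemented\n'
'      False\n'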
'\n'
'Ellipsis\n'
' This type has a single value. There is a single object with '
'this\n'
' value. This object is accessed through the literal "..." or the\n'
' built-in name "Ellipsis". Its truth value is true.\n'
'\n'
'"numbers.Number"\n'
' These are created by numeric literals and returned as results '
'by\n'
' arithmetic operators and arithmetic built-in functions. '
'Numeric\n'
' objects are immutable; once created their value never changes.\n'
' Python numbers are of course strongly related to mathematical\n'
' numbers, but subject to the limitations of numerical '
'representation\n'
' in computers.\n'
'\n'
' Python distinguishes between integers, floating point numbers, '
'and\n'
' complex numbers:\n'
'\n'
' "numbers.Integral"\n'
' These represent elements from the mathematical set of '
'integers\n'
' (positive and negative).\n'
'\n'
' There are two types of integers:\n'
'\n'
' Integers ("int")\n'
'\n'
' These represent numbers in an unlimited range, subject to\n'
' available (virtual) memory only. For the purpose of '
'shift\n'
' and mask operations, a binary representation is assumed, '
'and\n'
" negative numbers are represented in a variant of 2's\n"
' complement which gives the illusion of an infinite string '
'of\n'
' sign bits extending to the left.\n'
'\n'
' Booleans ("bool")\n'
' These represent the truth values False and True. The two\n'
' objects representing the values "False" and "True" are '
'the\n'
' only Boolean objects. The Boolean type is a subtype of '
'the\n'
' integer type, and Boolean values behave like the values 0 '
'and\n'
' 1, respectively, in almost all contexts, the exception '
'being\n'
' that when converted to a string, the strings ""False"" or\n'
' ""True"" are returned, respectively.\n'
'\n'
' The rules for integer representation are intended to give '
'the\n'
' most meaningful interpretation of shift and mask operations\n'
' involving negative integers.\n'
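'\n'
'      A short illustrative sketch:\n'
'\n'
'         >>> True + True      # "bool" is a subtype of "int"\n'
'         2\n'
'         >>> str(True)\n'
"         'True'\n"
'         >>> -5 >> 1          # behaves like 2\'s complement\n'
'         -3\n'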
'\n'
' "numbers.Real" ("float")\n'
' These represent machine-level double precision floating '
'point\n'
' numbers. You are at the mercy of the underlying machine\n'
' architecture (and C or Java implementation) for the accepted\n'
' range and handling of overflow. Python does not support '
'single-\n'
' precision floating point numbers; the savings in processor '
'and\n'
' memory usage that are usually the reason for using these are\n'
' dwarfed by the overhead of using objects in Python, so there '
'is\n'
' no reason to complicate the language with two kinds of '
'floating\n'
' point numbers.\n'
'\n'
' "numbers.Complex" ("complex")\n'
' These represent complex numbers as a pair of machine-level\n'
' double precision floating point numbers. The same caveats '
'apply\n'
' as for floating point numbers. The real and imaginary parts '
'of a\n'
' complex number "z" can be retrieved through the read-only\n'
' attributes "z.real" and "z.imag".\n'
'\n'
'Sequences\n'
' These represent finite ordered sets indexed by non-negative\n'
' numbers. The built-in function "len()" returns the number of '
'items\n'
' of a sequence. When the length of a sequence is *n*, the index '
'set\n'
' contains the numbers 0, 1, ..., *n*-1. Item *i* of sequence *a* '
'is\n'
' selected by "a[i]".\n'
'\n'
' Sequences also support slicing: "a[i:j]" selects all items with\n'
' index *k* such that *i* "<=" *k* "<" *j*. When used as an\n'
' expression, a slice is a sequence of the same type. This '
'implies\n'
' that the index set is renumbered so that it starts at 0.\n'
'\n'
' Some sequences also support "extended slicing" with a third '
'"step"\n'
' parameter: "a[i:j:k]" selects all items of *a* with index *x* '
'where\n'
' "x = i + n*k", *n* ">=" "0" and *i* "<=" *x* "<" *j*.\n'
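'\n'
'   For example (an illustrative sketch):\n'
'\n'
'      >>> a = [0, 1, 2, 3, 4, 5]\n'
'      >>> a[1:4]\n'
'      [1, 2, 3]\n'
'      >>> a[::2]      # every second item\n'
'      [0, 2, 4]\n'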
'\n'
' Sequences are distinguished according to their mutability:\n'
'\n'
' Immutable sequences\n'
' An object of an immutable sequence type cannot change once it '
'is\n'
' created. (If the object contains references to other '
'objects,\n'
' these other objects may be mutable and may be changed; '
'however,\n'
' the collection of objects directly referenced by an '
'immutable\n'
' object cannot change.)\n'
'\n'
' The following types are immutable sequences:\n'
'\n'
' Strings\n'
' A string is a sequence of values that represent Unicode '
'code\n'
' points. All the code points in the range "U+0000 - '
'U+10FFFF"\n'
" can be represented in a string. Python doesn't have a "
'"char"\n'
' type; instead, every code point in the string is '
'represented\n'
' as a string object with length "1". The built-in '
'function\n'
' "ord()" converts a code point from its string form to an\n'
' integer in the range "0 - 10FFFF"; "chr()" converts an\n'
' integer in the range "0 - 10FFFF" to the corresponding '
'length\n'
' "1" string object. "str.encode()" can be used to convert '
'a\n'
' "str" to "bytes" using the given text encoding, and\n'
' "bytes.decode()" can be used to achieve the opposite.\n'
'\n'
' Tuples\n'
' The items of a tuple are arbitrary Python objects. Tuples '
'of\n'
' two or more items are formed by comma-separated lists of\n'
" expressions. A tuple of one item (a 'singleton') can be\n"
' formed by affixing a comma to an expression (an expression '
'by\n'
' itself does not create a tuple, since parentheses must be\n'
' usable for grouping of expressions). An empty tuple can '
'be\n'
' formed by an empty pair of parentheses.\n'
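'\n'
'            An illustrative sketch:\n'
'\n'
'               >>> (1,)     # a singleton\n'
'               (1,)\n'
'               >>> (1)      # merely a parenthesized expression\n'
'               1\n'
'               >>> ()       # an empty tuple\n'
'               ()\n'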
'\n'
' Bytes\n'
' A bytes object is an immutable array. The items are '
'8-bit\n'
' bytes, represented by integers in the range 0 <= x < 256.\n'
' Bytes literals (like "b\'abc\'") and the built-in '
'function\n'
' "bytes()" can be used to construct bytes objects. Also,\n'
' bytes objects can be decoded to strings via the '
'"decode()"\n'
' method.\n'
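'\n'
'            For example (a brief sketch):\n'
'\n'
"               >>> b'abc'[0]\n"
'               97\n'
"               >>> b'abc'.decode('ascii')\n"
"               'abc'\n"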
'\n'
' Mutable sequences\n'
' Mutable sequences can be changed after they are created. '
'The\n'
' subscription and slicing notations can be used as the target '
'of\n'
' assignment and "del" (delete) statements.\n'
'\n'
' There are currently two intrinsic mutable sequence types:\n'
'\n'
' Lists\n'
' The items of a list are arbitrary Python objects. Lists '
'are\n'
' formed by placing a comma-separated list of expressions '
'in\n'
' square brackets. (Note that there are no special cases '
'needed\n'
' to form lists of length 0 or 1.)\n'
'\n'
' Byte Arrays\n'
' A bytearray object is a mutable array. They are created '
'by\n'
' the built-in "bytearray()" constructor. Aside from being\n'
' mutable (and hence unhashable), byte arrays otherwise '
'provide\n'
' the same interface and functionality as immutable bytes\n'
' objects.\n'
'\n'
' The extension module "array" provides an additional example '
'of a\n'
' mutable sequence type, as does the "collections" module.\n'
'\n'
'Set types\n'
' These represent unordered, finite sets of unique, immutable\n'
' objects. As such, they cannot be indexed by any subscript. '
'However,\n'
' they can be iterated over, and the built-in function "len()"\n'
' returns the number of items in a set. Common uses for sets are '
'fast\n'
' membership testing, removing duplicates from a sequence, and\n'
' computing mathematical operations such as intersection, union,\n'
' difference, and symmetric difference.\n'
'\n'
' For set elements, the same immutability rules apply as for\n'
' dictionary keys. Note that numeric types obey the normal rules '
'for\n'
' numeric comparison: if two numbers compare equal (e.g., "1" and\n'
' "1.0"), only one of them can be contained in a set.\n'
'\n'
' There are currently two intrinsic set types:\n'
'\n'
' Sets\n'
' These represent a mutable set. They are created by the '
'built-in\n'
' "set()" constructor and can be modified afterwards by '
'several\n'
' methods, such as "add()".\n'
'\n'
' Frozen sets\n'
' These represent an immutable set. They are created by the\n'
' built-in "frozenset()" constructor. As a frozenset is '
'immutable\n'
' and *hashable*, it can be used again as an element of '
'another\n'
' set, or as a dictionary key.\n'
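'\n'
'      A brief illustrative sketch:\n'
'\n'
'         >>> inner = frozenset({1, 2})\n'
"         >>> {inner: 'pair'}     # usable as a dictionary key\n"
"         {frozenset({1, 2}): 'pair'}\n"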
'\n'
'Mappings\n'
' These represent finite sets of objects indexed by arbitrary '
'index\n'
' sets. The subscript notation "a[k]" selects the item indexed by '
'"k"\n'
' from the mapping "a"; this can be used in expressions and as '
'the\n'
' target of assignments or "del" statements. The built-in '
'function\n'
' "len()" returns the number of items in a mapping.\n'
'\n'
' There is currently a single intrinsic mapping type:\n'
'\n'
' Dictionaries\n'
' These represent finite sets of objects indexed by nearly\n'
' arbitrary values. The only types of values not acceptable '
'as\n'
' keys are values containing lists or dictionaries or other\n'
' mutable types that are compared by value rather than by '
'object\n'
' identity, the reason being that the efficient implementation '
'of\n'
" dictionaries requires a key's hash value to remain constant.\n"
' Numeric types used for keys obey the normal rules for '
'numeric\n'
' comparison: if two numbers compare equal (e.g., "1" and '
'"1.0")\n'
' then they can be used interchangeably to index the same\n'
' dictionary entry.\n'
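'\n'
'      For example (an illustrative sketch):\n'
'\n'
"         >>> d = {1: 'int'}\n"
"         >>> d[1.0] = 'float'   # 1 and 1.0 compare equal\n"
'         >>> d\n'
"         {1: 'float'}\n"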
'\n'
' Dictionaries are mutable; they can be created by the "{...}"\n'
' notation (see section *Dictionary displays*).\n'
'\n'
' The extension modules "dbm.ndbm" and "dbm.gnu" provide\n'
' additional examples of mapping types, as does the '
'"collections"\n'
' module.\n'
'\n'
'Callable types\n'
' These are the types to which the function call operation (see\n'
' section *Calls*) can be applied:\n'
'\n'
' User-defined functions\n'
' A user-defined function object is created by a function\n'
' definition (see section *Function definitions*). It should '
'be\n'
' called with an argument list containing the same number of '
'items\n'
" as the function's formal parameter list.\n"
'\n'
' Special attributes:\n'
'\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | Attribute | Meaning '
'| |\n'
' '
'+===========================+=================================+=============+\n'
' | "__doc__" | The function\'s '
'documentation | Writable |\n'
' | | string, or "None" if '
'| |\n'
' | | unavailable; not inherited by '
'| |\n'
' | | subclasses '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__name__" | The function\'s '
'name | Writable |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__qualname__" | The function\'s *qualified '
'name* | Writable |\n'
' | | New in version 3.3. '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__module__" | The name of the module the '
'| Writable |\n'
' | | function was defined in, or '
'| |\n'
' | | "None" if unavailable. '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__defaults__" | A tuple containing default '
'| Writable |\n'
' | | argument values for those '
'| |\n'
' | | arguments that have defaults, '
'| |\n'
' | | or "None" if no arguments have '
'| |\n'
' | | a default value '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__code__" | The code object representing '
'| Writable |\n'
' | | the compiled function body. '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__globals__" | A reference to the dictionary '
'| Read-only |\n'
" | | that holds the function's "
'| |\n'
' | | global variables --- the global '
'| |\n'
' | | namespace of the module in '
'| |\n'
' | | which the function was defined. '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__dict__" | The namespace supporting '
'| Writable |\n'
' | | arbitrary function attributes. '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__closure__" | "None" or a tuple of cells that '
'| Read-only |\n'
' | | contain bindings for the '
'| |\n'
" | | function's free variables. "
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__annotations__" | A dict containing annotations '
'| Writable |\n'
' | | of parameters. The keys of the '
'| |\n'
' | | dict are the parameter names, '
'| |\n'
' | | and "\'return\'" for the '
'return | |\n'
' | | annotation, if provided. '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
' | "__kwdefaults__" | A dict containing defaults for '
'| Writable |\n'
' | | keyword-only parameters. '
'| |\n'
' '
'+---------------------------+---------------------------------+-------------+\n'
'\n'
' Most of the attributes labelled "Writable" check the type of '
'the\n'
' assigned value.\n'
'\n'
' Function objects also support getting and setting arbitrary\n'
' attributes, which can be used, for example, to attach '
'metadata\n'
' to functions. Regular attribute dot-notation is used to get '
'and\n'
' set such attributes. *Note that the current implementation '
'only\n'
' supports function attributes on user-defined functions. '
'Function\n'
' attributes on built-in functions may be supported in the\n'
' future.*\n'
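'\n'
'      An illustrative sketch of attaching metadata (the "author"\n'
'      attribute is hypothetical):\n'
'\n'
'         >>> def f(): pass\n'
'         ...\n'
"         >>> f.author = 'me'    # stored in f.__dict__\n"
'         >>> f.author\n'
"         'me'\n"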
'\n'
" Additional information about a function's definition can be\n"
' retrieved from its code object; see the description of '
'internal\n'
' types below.\n'
'\n'
' Instance methods\n'
' An instance method object combines a class, a class instance '
'and\n'
' any callable object (normally a user-defined function).\n'
'\n'
' Special read-only attributes: "__self__" is the class '
'instance\n'
' object, "__func__" is the function object; "__doc__" is the\n'
' method\'s documentation (same as "__func__.__doc__"); '
'"__name__"\n'
' is the method name (same as "__func__.__name__"); '
'"__module__"\n'
' is the name of the module the method was defined in, or '
'"None"\n'
' if unavailable.\n'
'\n'
' Methods also support accessing (but not setting) the '
'arbitrary\n'
' function attributes on the underlying function object.\n'
'\n'
' User-defined method objects may be created when getting an\n'
' attribute of a class (perhaps via an instance of that class), '
'if\n'
' that attribute is a user-defined function object or a class\n'
' method object.\n'
'\n'
' When an instance method object is created by retrieving a '
'user-\n'
' defined function object from a class via one of its '
'instances,\n'
' its "__self__" attribute is the instance, and the method '
'object\n'
' is said to be bound. The new method\'s "__func__" attribute '
'is\n'
' the original function object.\n'
'\n'
' When a user-defined method object is created by retrieving\n'
' another method object from a class or instance, the behaviour '
'is\n'
' the same as for a function object, except that the '
'"__func__"\n'
' attribute of the new instance is not the original method '
'object\n'
' but its "__func__" attribute.\n'
'\n'
' When an instance method object is created by retrieving a '
'class\n'
' method object from a class or instance, its "__self__" '
'attribute\n'
' is the class itself, and its "__func__" attribute is the\n'
' function object underlying the class method.\n'
'\n'
' When an instance method object is called, the underlying\n'
' function ("__func__") is called, inserting the class '
'instance\n'
' ("__self__") in front of the argument list. For instance, '
'when\n'
' "C" is a class which contains a definition for a function '
'"f()",\n'
' and "x" is an instance of "C", calling "x.f(1)" is equivalent '
'to\n'
' calling "C.f(x, 1)".\n'
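'\n'
'      A minimal sketch:\n'
'\n'
'         >>> class C:\n'
'         ...     def f(self, n):\n'
'         ...         return n + 1\n'
'         >>> x = C()\n'
'         >>> x.f(1) == C.f(x, 1)\n'
'         True\n'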
'\n'
' When an instance method object is derived from a class '
'method\n'
' object, the "class instance" stored in "__self__" will '
'actually\n'
' be the class itself, so that calling either "x.f(1)" or '
'"C.f(1)"\n'
' is equivalent to calling "f(C,1)" where "f" is the '
'underlying\n'
' function.\n'
'\n'
' Note that the transformation from function object to '
'instance\n'
' method object happens each time the attribute is retrieved '
'from\n'
' the instance. In some cases, a fruitful optimization is to\n'
' assign the attribute to a local variable and call that local\n'
' variable. Also notice that this transformation only happens '
'for\n'
' user-defined functions; other callable objects (and all non-\n'
' callable objects) are retrieved without transformation. It '
'is\n'
' also important to note that user-defined functions which are\n'
' attributes of a class instance are not converted to bound\n'
' methods; this *only* happens when the function is an '
'attribute\n'
' of the class.\n'
'\n'
' Generator functions\n'
' A function or method which uses the "yield" statement (see\n'
' section *The yield statement*) is called a *generator '
'function*.\n'
' Such a function, when called, always returns an iterator '
'object\n'
' which can be used to execute the body of the function: '
'calling\n'
' the iterator\'s "iterator.__next__()" method will cause the\n'
' function to execute until it provides a value using the '
'"yield"\n'
' statement. When the function executes a "return" statement '
'or\n'
' falls off the end, a "StopIteration" exception is raised and '
'the\n'
' iterator will have reached the end of the set of values to '
'be\n'
' returned.\n'
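'\n'
'      For example (a brief sketch):\n'
'\n'
'         >>> def gen():\n'
'         ...     yield 1\n'
'         ...     yield 2\n'
'         >>> it = gen()\n'
'         >>> next(it), next(it)\n'
'         (1, 2)\n'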
'\n'
' Built-in functions\n'
' A built-in function object is a wrapper around a C function.\n'
' Examples of built-in functions are "len()" and "math.sin()"\n'
' ("math" is a standard built-in module). The number and type '
'of\n'
' the arguments are determined by the C function. Special '
'read-\n'
' only attributes: "__doc__" is the function\'s documentation\n'
' string, or "None" if unavailable; "__name__" is the '
"function's\n"
' name; "__self__" is set to "None" (but see the next item);\n'
' "__module__" is the name of the module the function was '
'defined\n'
' in or "None" if unavailable.\n'
'\n'
' Built-in methods\n'
' This is really a different disguise of a built-in function, '
'this\n'
' time containing an object passed to the C function as an\n'
' implicit extra argument. An example of a built-in method is\n'
' "alist.append()", assuming *alist* is a list object. In this\n'
' case, the special read-only attribute "__self__" is set to '
'the\n'
' object denoted by *alist*.\n'
'\n'
' Classes\n'
' Classes are callable. These objects normally act as '
'factories\n'
' for new instances of themselves, but variations are possible '
'for\n'
' class types that override "__new__()". The arguments of the\n'
' call are passed to "__new__()" and, in the typical case, to\n'
' "__init__()" to initialize the new instance.\n'
'\n'
' Class Instances\n'
' Instances of arbitrary classes can be made callable by '
'defining\n'
' a "__call__()" method in their class.\n'
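'\n'
'      An illustrative sketch (the class "Adder" is hypothetical):\n'
'\n'
'         >>> class Adder:\n'
'         ...     def __init__(self, n):\n'
'         ...         self.n = n\n'
'         ...     def __call__(self, x):\n'
'         ...         return self.n + x\n'
'         >>> Adder(3)(4)\n'
'         7\n'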
'\n'
'Modules\n'
' Modules are a basic organizational unit of Python code, and are\n'
' created by the *import system* as invoked either by the '
'"import"\n'
' statement (see "import"), or by calling functions such as\n'
' "importlib.import_module()" and built-in "__import__()". A '
'module\n'
' object has a namespace implemented by a dictionary object (this '
'is\n'
' the dictionary referenced by the "__globals__" attribute of\n'
' functions defined in the module). Attribute references are\n'
' translated to lookups in this dictionary, e.g., "m.x" is '
'equivalent\n'
' to "m.__dict__["x"]". A module object does not contain the code\n'
" object used to initialize the module (since it isn't needed "
'once\n'
' the initialization is done).\n'
'\n'
" Attribute assignment updates the module's namespace dictionary,\n"
' e.g., "m.x = 1" is equivalent to "m.__dict__["x"] = 1".\n'
'\n'
' Special read-only attribute: "__dict__" is the module\'s '
'namespace\n'
' as a dictionary object.\n'
'\n'
' **CPython implementation detail:** Because of the way CPython\n'
' clears module dictionaries, the module dictionary will be '
'cleared\n'
' when the module falls out of scope even if the dictionary still '
'has\n'
' live references. To avoid this, copy the dictionary or keep '
'the\n'
' module around while using its dictionary directly.\n'
'\n'
' Predefined (writable) attributes: "__name__" is the module\'s '
'name;\n'
' "__doc__" is the module\'s documentation string, or "None" if\n'
' unavailable; "__file__" is the pathname of the file from which '
'the\n'
' module was loaded, if it was loaded from a file. The "__file__"\n'
' attribute may be missing for certain types of modules, such as '
'C\n'
' modules that are statically linked into the interpreter; for\n'
' extension modules loaded dynamically from a shared library, it '
'is\n'
' the pathname of the shared library file.\n'
'\n'
'Custom classes\n'
' Custom class types are typically created by class definitions '
'(see\n'
' section *Class definitions*). A class has a namespace '
'implemented\n'
' by a dictionary object. Class attribute references are '
'translated\n'
' to lookups in this dictionary, e.g., "C.x" is translated to\n'
' "C.__dict__["x"]" (although there are a number of hooks which '
'allow\n'
' for other means of locating attributes). When the attribute name '
'is\n'
' not found there, the attribute search continues in the base\n'
' classes. This search of the base classes uses the C3 method\n'
' resolution order which behaves correctly even in the presence '
'of\n'
" 'diamond' inheritance structures where there are multiple\n"
' inheritance paths leading back to a common ancestor. Additional\n'
' details on the C3 MRO used by Python can be found in the\n'
' documentation accompanying the 2.3 release at\n'
' https://www.python.org/download/releases/2.3/mro/.\n'
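'\n'
'   A short sketch showing the resulting order via "__mro__":\n'
'\n'
'      >>> class Base: pass\n'
'      ...\n'
'      >>> class Left(Base): pass\n'
'      ...\n'
'      >>> class Right(Base): pass\n'
'      ...\n'
'      >>> class Leaf(Left, Right): pass\n'
'      ...\n'
'      >>> [cls.__name__ for cls in Leaf.__mro__]\n'
"      ['Leaf', 'Left', 'Right', 'Base', 'object']\n"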
'\n'
' When a class attribute reference (for class "C", say) would '
'yield a\n'
' class method object, it is transformed into an instance method\n'
'   object whose "__self__" attribute is "C". When it would yield '
'a\n'
' static method object, it is transformed into the object wrapped '
'by\n'
' the static method object. See section *Implementing '
'Descriptors*\n'
' for another way in which attributes retrieved from a class may\n'
' differ from those actually contained in its "__dict__".\n'
'\n'
" Class attribute assignments update the class's dictionary, "
'never\n'
' the dictionary of a base class.\n'
'\n'
' A class object can be called (see above) to yield a class '
'instance\n'
' (see below).\n'
'\n'
' Special attributes: "__name__" is the class name; "__module__" '
'is\n'
' the module name in which the class was defined; "__dict__" is '
'the\n'
' dictionary containing the class\'s namespace; "__bases__" is a '
'tuple\n'
' (possibly empty or a singleton) containing the base classes, in '
'the\n'
' order of their occurrence in the base class list; "__doc__" is '
'the\n'
" class's documentation string, or None if undefined.\n"
'\n'
'Class instances\n'
' A class instance is created by calling a class object (see '
'above).\n'
' A class instance has a namespace implemented as a dictionary '
'which\n'
' is the first place in which attribute references are searched.\n'
" When an attribute is not found there, and the instance's class "
'has\n'
' an attribute by that name, the search continues with the class\n'
' attributes. If a class attribute is found that is a '
'user-defined\n'
' function object, it is transformed into an instance method '
'object\n'
' whose "__self__" attribute is the instance. Static method and\n'
' class method objects are also transformed; see above under\n'
' "Classes". See section *Implementing Descriptors* for another '
'way\n'
' in which attributes of a class retrieved via its instances may\n'
" differ from the objects actually stored in the class's "
'"__dict__".\n'
" If no class attribute is found, and the object's class has a\n"
' "__getattr__()" method, that is called to satisfy the lookup.\n'
'\n'
" Attribute assignments and deletions update the instance's\n"
" dictionary, never a class's dictionary. If the class has a\n"
' "__setattr__()" or "__delattr__()" method, this is called '
'instead\n'
' of updating the instance dictionary directly.\n'
'\n'
' Class instances can pretend to be numbers, sequences, or '
'mappings\n'
' if they have methods with certain special names. See section\n'
' *Special method names*.\n'
'\n'
' Special attributes: "__dict__" is the attribute dictionary;\n'
' "__class__" is the instance\'s class.\n'
'\n'
'I/O objects (also known as file objects)\n'
' A *file object* represents an open file. Various shortcuts are\n'
' available to create file objects: the "open()" built-in '
'function,\n'
' and also "os.popen()", "os.fdopen()", and the "makefile()" '
'method\n'
' of socket objects (and perhaps by other functions or methods\n'
' provided by extension modules).\n'
'\n'
' The objects "sys.stdin", "sys.stdout" and "sys.stderr" are\n'
" initialized to file objects corresponding to the interpreter's\n"
' standard input, output and error streams; they are all open in '
'text\n'
' mode and therefore follow the interface defined by the\n'
' "io.TextIOBase" abstract class.\n'
'\n'
'Internal types\n'
' A few types used internally by the interpreter are exposed to '
'the\n'
' user. Their definitions may change with future versions of the\n'
' interpreter, but they are mentioned here for completeness.\n'
'\n'
' Code objects\n'
' Code objects represent *byte-compiled* executable Python '
'code,\n'
' or *bytecode*. The difference between a code object and a\n'
' function object is that the function object contains an '
'explicit\n'
" reference to the function's globals (the module in which it "
'was\n'
' defined), while a code object contains no context; also the\n'
' default argument values are stored in the function object, '
'not\n'
' in the code object (because they represent values calculated '
'at\n'
' run-time). Unlike function objects, code objects are '
'immutable\n'
' and contain no references (directly or indirectly) to '
'mutable\n'
' objects.\n'
'\n'
' Special read-only attributes: "co_name" gives the function '
'name;\n'
' "co_argcount" is the number of positional arguments '
'(including\n'
' arguments with default values); "co_nlocals" is the number '
'of\n'
' local variables used by the function (including arguments);\n'
' "co_varnames" is a tuple containing the names of the local\n'
' variables (starting with the argument names); "co_cellvars" '
'is a\n'
' tuple containing the names of local variables that are\n'
' referenced by nested functions; "co_freevars" is a tuple\n'
' containing the names of free variables; "co_code" is a '
'string\n'
' representing the sequence of bytecode instructions; '
'"co_consts"\n'
' is a tuple containing the literals used by the bytecode;\n'
' "co_names" is a tuple containing the names used by the '
'bytecode;\n'
' "co_filename" is the filename from which the code was '
'compiled;\n'
' "co_firstlineno" is the first line number of the function;\n'
' "co_lnotab" is a string encoding the mapping from bytecode\n'
' offsets to line numbers (for details see the source code of '
'the\n'
' interpreter); "co_stacksize" is the required stack size\n'
' (including local variables); "co_flags" is an integer '
'encoding a\n'
' number of flags for the interpreter.\n'
'\n'
' The following flag bits are defined for "co_flags": bit '
'"0x04"\n'
' is set if the function uses the "*arguments" syntax to accept '
'an\n'
' arbitrary number of positional arguments; bit "0x08" is set '
'if\n'
' the function uses the "**keywords" syntax to accept '
'arbitrary\n'
' keyword arguments; bit "0x20" is set if the function is a\n'
' generator.\n'
'\n'
' Future feature declarations ("from __future__ import '
'division")\n'
' also use bits in "co_flags" to indicate whether a code '
'object\n'
' was compiled with a particular feature enabled: bit "0x2000" '
'is\n'
' set if the function was compiled with future division '
'enabled;\n'
' bits "0x10" and "0x1000" were used in earlier versions of\n'
' Python.\n'
'\n'
' Other bits in "co_flags" are reserved for internal use.\n'
'\n'
' If a code object represents a function, the first item in\n'
' "co_consts" is the documentation string of the function, or\n'
' "None" if undefined.\n'
'\n'
' Frame objects\n'
' Frame objects represent execution frames. They may occur in\n'
' traceback objects (see below).\n'
'\n'
' Special read-only attributes: "f_back" is to the previous '
'stack\n'
' frame (towards the caller), or "None" if this is the bottom\n'
' stack frame; "f_code" is the code object being executed in '
'this\n'
' frame; "f_locals" is the dictionary used to look up local\n'
' variables; "f_globals" is used for global variables;\n'
' "f_builtins" is used for built-in (intrinsic) names; '
'"f_lasti"\n'
' gives the precise instruction (this is an index into the\n'
' bytecode string of the code object).\n'
'\n'
' Special writable attributes: "f_trace", if not "None", is a\n'
' function called at the start of each source code line (this '
'is\n'
' used by the debugger); "f_lineno" is the current line number '
'of\n'
' the frame --- writing to this from within a trace function '
'jumps\n'
' to the given line (only for the bottom-most frame). A '
'debugger\n'
' can implement a Jump command (aka Set Next Statement) by '
'writing\n'
' to f_lineno.\n'
'\n'
' Frame objects support one method:\n'
'\n'
' frame.clear()\n'
'\n'
' This method clears all references to local variables held '
'by\n'
' the frame. Also, if the frame belonged to a generator, '
'the\n'
' generator is finalized. This helps break reference '
'cycles\n'
' involving frame objects (for example when catching an\n'
' exception and storing its traceback for later use).\n'
'\n'
' "RuntimeError" is raised if the frame is currently '
'executing.\n'
'\n'
' New in version 3.4.\n'
'\n'
' Traceback objects\n'
' Traceback objects represent a stack trace of an exception. '
'A\n'
' traceback object is created when an exception occurs. When '
'the\n'
' search for an exception handler unwinds the execution stack, '
'at\n'
' each unwound level a traceback object is inserted in front '
'of\n'
' the current traceback. When an exception handler is '
'entered,\n'
' the stack trace is made available to the program. (See '
'section\n'
' *The try statement*.) It is accessible as the third item of '
'the\n'
' tuple returned by "sys.exc_info()". When the program contains '
'no\n'
' suitable handler, the stack trace is written (nicely '
'formatted)\n'
' to the standard error stream; if the interpreter is '
'interactive,\n'
' it is also made available to the user as '
'"sys.last_traceback".\n'
'\n'
' Special read-only attributes: "tb_next" is the next level in '
'the\n'
' stack trace (towards the frame where the exception occurred), '
'or\n'
' "None" if there is no next level; "tb_frame" points to the\n'
' execution frame of the current level; "tb_lineno" gives the '
'line\n'
' number where the exception occurred; "tb_lasti" indicates '
'the\n'
' precise instruction. The line number and last instruction '
'in\n'
' the traceback may differ from the line number of its frame\n'
' object if the exception occurred in a "try" statement with '
'no\n'
' matching except clause or with a finally clause.\n'
'\n'
' Slice objects\n'
' Slice objects are used to represent slices for '
'"__getitem__()"\n'
' methods. They are also created by the built-in "slice()"\n'
' function.\n'
'\n'
' Special read-only attributes: "start" is the lower bound; '
'"stop"\n'
' is the upper bound; "step" is the step value; each is "None" '
'if\n'
' omitted. These attributes can have any type.\n'
'\n'
' Slice objects support one method:\n'
'\n'
' slice.indices(self, length)\n'
'\n'
' This method takes a single integer argument *length* and\n'
' computes information about the slice that the slice '
'object\n'
' would describe if applied to a sequence of *length* '
'items.\n'
' It returns a tuple of three integers; respectively these '
'are\n'
' the *start* and *stop* indices and the *step* or stride\n'
' length of the slice. Missing or out-of-bounds indices are\n'
' handled in a manner consistent with regular slices.\n'
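'\n'
'         For example (illustrative):\n'
'\n'
'            >>> slice(None, None, 2).indices(5)\n'
'            (0, 5, 2)\n'
'            >>> slice(1, 10, 2).indices(5)\n'
'            (1, 5, 2)\n'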
'\n'
' Static method objects\n'
' Static method objects provide a way of defeating the\n'
' transformation of function objects to method objects '
'described\n'
' above. A static method object is a wrapper around any other\n'
' object, usually a user-defined method object. When a static\n'
' method object is retrieved from a class or a class instance, '
'the\n'
' object actually returned is the wrapped object, which is not\n'
' subject to any further transformation. Static method objects '
'are\n'
' not themselves callable, although the objects they wrap '
'usually\n'
' are. Static method objects are created by the built-in\n'
' "staticmethod()" constructor.\n'
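'\n'
'      A brief sketch (the method name "double" is hypothetical):\n'
'\n'
'         >>> class C:\n'
'         ...     @staticmethod\n'
'         ...     def double(n):\n'
'         ...         return n * 2\n'
'         >>> C.double(2), C().double(2)   # no instance is inserted\n'
'         (4, 4)\n'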
'\n'
' Class method objects\n'
' A class method object, like a static method object, is a '
'wrapper\n'
' around another object that alters the way in which that '
'object\n'
' is retrieved from classes and class instances. The behaviour '
'of\n'
' class method objects upon such retrieval is described above,\n'
' under "User-defined methods". Class method objects are '
'created\n'
' by the built-in "classmethod()" constructor.\n',
'typesfunctions': '\n'
'Functions\n'
'*********\n'
'\n'
'Function objects are created by function definitions. The '
'only\n'
'operation on a function object is to call it: '
'"func(argument-list)".\n'
'\n'
'There are really two flavors of function objects: built-in '
'functions\n'
'and user-defined functions. Both support the same '
'operation (to call\n'
'the function), but the implementation is different, hence '
'the\n'
'different object types.\n'
'\n'
'See *Function definitions* for more information.\n',
'typesmapping': '\n'
'Mapping Types --- "dict"\n'
'************************\n'
'\n'
'A *mapping* object maps *hashable* values to arbitrary '
'objects.\n'
'Mappings are mutable objects. There is currently only one '
'standard\n'
'mapping type, the *dictionary*. (For other containers see '
'the built-\n'
'in "list", "set", and "tuple" classes, and the "collections" '
'module.)\n'
'\n'
"A dictionary's keys are *almost* arbitrary values. Values "
'that are\n'
'not *hashable*, that is, values containing lists, '
'dictionaries or\n'
'other mutable types (that are compared by value rather than '
'by object\n'
'identity) may not be used as keys. Numeric types used for '
'keys obey\n'
'the normal rules for numeric comparison: if two numbers '
'compare equal\n'
'(such as "1" and "1.0") then they can be used '
'interchangeably to index\n'
'the same dictionary entry. (Note however, that since '
'computers store\n'
'floating-point numbers as approximations it is usually '
'unwise to use\n'
'them as dictionary keys.)\n'
'\n'
'Dictionaries can be created by placing a comma-separated '
'list of "key:\n'
'value" pairs within braces, for example: "{\'jack\': 4098, '
"'sjoerd':\n"
'4127}" or "{4098: \'jack\', 4127: \'sjoerd\'}", or by the '
'"dict"\n'
'constructor.\n'
'\n'
'class dict(**kwarg)\n'
'class dict(mapping, **kwarg)\n'
'class dict(iterable, **kwarg)\n'
'\n'
' Return a new dictionary initialized from an optional '
'positional\n'
' argument and a possibly empty set of keyword arguments.\n'
'\n'
' If no positional argument is given, an empty dictionary '
'is created.\n'
' If a positional argument is given and it is a mapping '
'object, a\n'
' dictionary is created with the same key-value pairs as '
'the mapping\n'
' object. Otherwise, the positional argument must be an '
'*iterable*\n'
' object. Each item in the iterable must itself be an '
'iterable with\n'
' exactly two objects. The first object of each item '
'becomes a key\n'
' in the new dictionary, and the second object the '
'corresponding\n'
' value. If a key occurs more than once, the last value '
'for that key\n'
' becomes the corresponding value in the new dictionary.\n'
'\n'
' If keyword arguments are given, the keyword arguments and '
'their\n'
' values are added to the dictionary created from the '
'positional\n'
' argument. If a key being added is already present, the '
'value from\n'
' the keyword argument replaces the value from the '
'positional\n'
' argument.\n'
'\n'
' To illustrate, the following examples all return a '
'dictionary equal\n'
' to "{"one": 1, "two": 2, "three": 3}":\n'
'\n'
' >>> a = dict(one=1, two=2, three=3)\n'
" >>> b = {'one': 1, 'two': 2, 'three': 3}\n"
" >>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))\n"
" >>> d = dict([('two', 2), ('one', 1), ('three', 3)])\n"
" >>> e = dict({'three': 3, 'one': 1, 'two': 2})\n"
' >>> a == b == c == d == e\n'
' True\n'
'\n'
' Providing keyword arguments as in the first example only '
'works for\n'
' keys that are valid Python identifiers. Otherwise, any '
'valid keys\n'
' can be used.\n'
'\n'
' These are the operations that dictionaries support (and '
'therefore,\n'
' custom mapping types should support too):\n'
'\n'
' len(d)\n'
'\n'
' Return the number of items in the dictionary *d*.\n'
'\n'
' d[key]\n'
'\n'
' Return the item of *d* with key *key*. Raises a '
'"KeyError" if\n'
' *key* is not in the map.\n'
'\n'
' If a subclass of dict defines a method "__missing__()" '
'and *key*\n'
' is not present, the "d[key]" operation calls that '
'method with\n'
' the key *key* as argument. The "d[key]" operation '
'then returns\n'
' or raises whatever is returned or raised by the\n'
' "__missing__(key)" call. No other operations or '
'methods invoke\n'
' "__missing__()". If "__missing__()" is not defined, '
'"KeyError"\n'
' is raised. "__missing__()" must be a method; it cannot '
'be an\n'
' instance variable:\n'
'\n'
' >>> class Counter(dict):\n'
' ... def __missing__(self, key):\n'
' ... return 0\n'
' >>> c = Counter()\n'
" >>> c['red']\n"
' 0\n'
" >>> c['red'] += 1\n"
" >>> c['red']\n"
' 1\n'
'\n'
' The example above shows part of the implementation of\n'
' "collections.Counter". A different "__missing__" '
'method is used\n'
' by "collections.defaultdict".\n'
'\n'
' d[key] = value\n'
'\n'
' Set "d[key]" to *value*.\n'
'\n'
' del d[key]\n'
'\n'
' Remove "d[key]" from *d*. Raises a "KeyError" if '
'*key* is not\n'
' in the map.\n'
'\n'
' key in d\n'
'\n'
' Return "True" if *d* has a key *key*, else "False".\n'
'\n'
' key not in d\n'
'\n'
' Equivalent to "not key in d".\n'
'\n'
' iter(d)\n'
'\n'
' Return an iterator over the keys of the dictionary. '
'This is a\n'
' shortcut for "iter(d.keys())".\n'
'\n'
' clear()\n'
'\n'
' Remove all items from the dictionary.\n'
'\n'
' copy()\n'
'\n'
' Return a shallow copy of the dictionary.\n'
'\n'
' classmethod fromkeys(seq[, value])\n'
'\n'
' Create a new dictionary with keys from *seq* and '
'values set to\n'
' *value*.\n'
'\n'
' "fromkeys()" is a class method that returns a new '
'dictionary.\n'
' *value* defaults to "None".\n'
'\n'
' get(key[, default])\n'
'\n'
' Return the value for *key* if *key* is in the '
'dictionary, else\n'
' *default*. If *default* is not given, it defaults to '
'"None", so\n'
' that this method never raises a "KeyError".\n'
'\n'
' items()\n'
'\n'
' Return a new view of the dictionary\'s items ("(key, '
'value)"\n'
' pairs). See the *documentation of view objects*.\n'
'\n'
' keys()\n'
'\n'
" Return a new view of the dictionary's keys. See the\n"
' *documentation of view objects*.\n'
'\n'
' pop(key[, default])\n'
'\n'
' If *key* is in the dictionary, remove it and return '
'its value,\n'
' else return *default*. If *default* is not given and '
'*key* is\n'
' not in the dictionary, a "KeyError" is raised.\n'
'\n'
' popitem()\n'
'\n'
' Remove and return an arbitrary "(key, value)" pair '
'from the\n'
' dictionary.\n'
'\n'
' "popitem()" is useful to destructively iterate over a\n'
' dictionary, as often used in set algorithms. If the '
'dictionary\n'
' is empty, calling "popitem()" raises a "KeyError".\n'
'\n'
' setdefault(key[, default])\n'
'\n'
' If *key* is in the dictionary, return its value. If '
'not, insert\n'
' *key* with a value of *default* and return *default*. '
'*default*\n'
' defaults to "None".\n'
'\n'
' update([other])\n'
'\n'
' Update the dictionary with the key/value pairs from '
'*other*,\n'
' overwriting existing keys. Return "None".\n'
'\n'
' "update()" accepts either another dictionary object or '
'an\n'
' iterable of key/value pairs (as tuples or other '
'iterables of\n'
' length two). If keyword arguments are specified, the '
'dictionary\n'
' is then updated with those key/value pairs: '
'"d.update(red=1,\n'
' blue=2)".\n'
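'\n'
'      For example (an illustrative sketch):\n'
'\n'
"         >>> d = {'red': 1}\n"
'         >>> d.update(blue=2)\n'
"         >>> d == {'red': 1, 'blue': 2}\n"
'         True\n'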
'\n'
' values()\n'
'\n'
" Return a new view of the dictionary's values. See "
'the\n'
' *documentation of view objects*.\n'
'\n'
' Dictionaries compare equal if and only if they have the '
'same "(key,\n'
' value)" pairs. Order comparisons (\'<\', \'<=\', \'>=\', '
"'>') raise\n"
' "TypeError".\n'
'\n'
'See also: "types.MappingProxyType" can be used to create a '
'read-only\n'
' view of a "dict".\n'
'\n'
'\n'
'Dictionary view objects\n'
'=======================\n'
'\n'
'The objects returned by "dict.keys()", "dict.values()" and\n'
'"dict.items()" are *view objects*. They provide a dynamic '
'view on the\n'
"dictionary's entries, which means that when the dictionary "
'changes,\n'
'the view reflects these changes.\n'
'\n'
'Dictionary views can be iterated over to yield their '
'respective data,\n'
'and support membership tests:\n'
'\n'
'len(dictview)\n'
'\n'
' Return the number of entries in the dictionary.\n'
'\n'
'iter(dictview)\n'
'\n'
' Return an iterator over the keys, values or items '
'(represented as\n'
' tuples of "(key, value)") in the dictionary.\n'
'\n'
' Keys and values are iterated over in an arbitrary order '
'which is\n'
' non-random, varies across Python implementations, and '
'depends on\n'
" the dictionary's history of insertions and deletions. If "
'keys,\n'
' values and items views are iterated over with no '
'intervening\n'
' modifications to the dictionary, the order of items will '
'directly\n'
' correspond. This allows the creation of "(value, key)" '
'pairs using\n'
' "zip()": "pairs = zip(d.values(), d.keys())". Another '
'way to\n'
' create the same list is "pairs = [(v, k) for (k, v) in '
'd.items()]".\n'
'\n'
' Iterating views while adding or deleting entries in the '
'dictionary\n'
' may raise a "RuntimeError" or fail to iterate over all '
'entries.\n'
'\n'
'x in dictview\n'
'\n'
' Return "True" if *x* is in the underlying dictionary\'s '
'keys, values\n'
' or items (in the latter case, *x* should be a "(key, '
'value)"\n'
' tuple).\n'
'\n'
'Keys views are set-like since their entries are unique and '
'hashable.\n'
'If all values are hashable, so that "(key, value)" pairs are '
'unique\n'
'and hashable, then the items view is also set-like. (Values '
'views are\n'
'not treated as set-like since the entries are generally not '
'unique.)\n'
'For set-like views, all of the operations defined for the '
'abstract\n'
'base class "collections.abc.Set" are available (for example, '
'"==",\n'
'"<", or "^").\n'
'\n'
'An example of dictionary view usage:\n'
'\n'
" >>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, "
"'spam': 500}\n"
' >>> keys = dishes.keys()\n'
' >>> values = dishes.values()\n'
'\n'
' >>> # iteration\n'
' >>> n = 0\n'
' >>> for val in values:\n'
' ... n += val\n'
' >>> print(n)\n'
' 504\n'
'\n'
' >>> # keys and values are iterated over in the same '
'order\n'
' >>> list(keys)\n'
" ['eggs', 'bacon', 'sausage', 'spam']\n"
' >>> list(values)\n'
' [2, 1, 1, 500]\n'
'\n'
' >>> # view objects are dynamic and reflect dict changes\n'
" >>> del dishes['eggs']\n"
" >>> del dishes['sausage']\n"
' >>> list(keys)\n'
" ['spam', 'bacon']\n"
'\n'
' >>> # set operations\n'
" >>> keys & {'eggs', 'bacon', 'salad'}\n"
" {'bacon'}\n"
" >>> keys ^ {'sausage', 'juice'}\n"
" {'juice', 'sausage', 'bacon', 'spam'}\n",
'typesmethods': '\n'
'Methods\n'
'*******\n'
'\n'
'Methods are functions that are called using the attribute '
'notation.\n'
'There are two flavors: built-in methods (such as "append()" '
'on lists)\n'
'and class instance methods. Built-in methods are described '
'with the\n'
'types that support them.\n'
'\n'
'If you access a method (a function defined in a class '
'namespace)\n'
'through an instance, you get a special object: a *bound '
'method* (also\n'
'called *instance method*) object. When called, it will add '
'the "self"\n'
'argument to the argument list. Bound methods have two '
'special read-\n'
'only attributes: "m.__self__" is the object on which the '
'method\n'
'operates, and "m.__func__" is the function implementing the '
'method.\n'
'Calling "m(arg-1, arg-2, ..., arg-n)" is completely '
'equivalent to\n'
'calling "m.__func__(m.__self__, arg-1, arg-2, ..., arg-n)".\n'
'\n'
'Like function objects, bound method objects support getting '
'arbitrary\n'
'attributes. However, since method attributes are actually '
'stored on\n'
'the underlying function object ("meth.__func__"), setting '
'method\n'
'attributes on bound methods is disallowed. Attempting to '
'set an\n'
'attribute on a method results in an "AttributeError" being '
'raised. In\n'
'order to set a method attribute, you need to explicitly set '
'it on the\n'
'underlying function object:\n'
'\n'
' >>> class C:\n'
' ... def method(self):\n'
' ... pass\n'
' ...\n'
' >>> c = C()\n'
" >>> c.method.whoami = 'my name is method' # can't set on "
'the method\n'
' Traceback (most recent call last):\n'
' File "<stdin>", line 1, in <module>\n'
" AttributeError: 'method' object has no attribute "
"'whoami'\n"
" >>> c.method.__func__.whoami = 'my name is method'\n"
' >>> c.method.whoami\n'
" 'my name is method'\n"
'\n'
'See *The standard type hierarchy* for more information.\n',
'typesmodules': '\n'
'Modules\n'
'*******\n'
'\n'
'The only special operation on a module is attribute access: '
'"m.name",\n'
'where *m* is a module and *name* accesses a name defined in '
"*m*'s\n"
'symbol table. Module attributes can be assigned to. (Note '
'that the\n'
'"import" statement is not, strictly speaking, an operation '
'on a module\n'
'object; "import foo" does not require a module object named '
'*foo* to\n'
'exist, rather it requires an (external) *definition* for a '
'module\n'
'named *foo* somewhere.)\n'
'\n'
'A special attribute of every module is "__dict__". This is '
'the\n'
"dictionary containing the module's symbol table. Modifying "
'this\n'
"dictionary will actually change the module's symbol table, "
'but direct\n'
'assignment to the "__dict__" attribute is not possible (you '
'can write\n'
'"m.__dict__[\'a\'] = 1", which defines "m.a" to be "1", but '
"you can't\n"
'write "m.__dict__ = {}"). Modifying "__dict__" directly is '
'not\n'
'recommended.\n'
'\n'
'Modules built into the interpreter are written like this: '
'"<module\n'
'\'sys\' (built-in)>". If loaded from a file, they are '
'written as\n'
'"<module \'os\' from '
'\'/usr/local/lib/pythonX.Y/os.pyc\'>".\n',
'typesseq': '\n'
'Sequence Types --- "list", "tuple", "range"\n'
'*******************************************\n'
'\n'
'There are three basic sequence types: lists, tuples, and range\n'
'objects. Additional sequence types tailored for processing of '
'*binary\n'
'data* and *text strings* are described in dedicated sections.\n'
'\n'
'\n'
'Common Sequence Operations\n'
'==========================\n'
'\n'
'The operations in the following table are supported by most '
'sequence\n'
'types, both mutable and immutable. The '
'"collections.abc.Sequence" ABC\n'
'is provided to make it easier to correctly implement these '
'operations\n'
'on custom sequence types.\n'
'\n'
'This table lists the sequence operations sorted in ascending '
'priority.\n'
'In the table, *s* and *t* are sequences of the same type, *n*, '
'*i*,\n'
'*j* and *k* are integers and *x* is an arbitrary object that '
'meets any\n'
'type and value restrictions imposed by *s*.\n'
'\n'
'The "in" and "not in" operations have the same priorities as '
'the\n'
'comparison operations. The "+" (concatenation) and "*" '
'(repetition)\n'
'operations have the same priority as the corresponding numeric\n'
'operations. [3]\n'
'\n'
'+----------------------------+----------------------------------+------------+\n'
'| Operation | Result '
'| Notes |\n'
'+============================+==================================+============+\n'
'| "x in s" | "True" if an item of *s* is '
'| (1) |\n'
'| | equal to *x*, else "False" '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "x not in s" | "False" if an item of *s* is '
'| (1) |\n'
'| | equal to *x*, else "True" '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "s + t" | the concatenation of *s* and *t* '
'| (6)(7) |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "s * n" or "n * s" | equivalent to adding *s* to '
'| (2)(7) |\n'
'| | itself *n* times '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "s[i]" | *i*th item of *s*, origin 0 '
'| (3) |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "s[i:j]" | slice of *s* from *i* to *j* '
'| (3)(4) |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "s[i:j:k]" | slice of *s* from *i* to *j* '
'| (3)(5) |\n'
'| | with step *k* '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "len(s)" | length of *s* '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "min(s)" | smallest item of *s* '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "max(s)" | largest item of *s* '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "s.index(x[, i[, j]])" | index of the first occurrence of '
'| (8) |\n'
'| | *x* in *s* (at or after index '
'| |\n'
'| | *i* and before index *j*) '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'| "s.count(x)" | total number of occurrences of '
'| |\n'
'| | *x* in *s* '
'| |\n'
'+----------------------------+----------------------------------+------------+\n'
'\n'
'Sequences of the same type also support comparisons. In '
'particular,\n'
'tuples and lists are compared lexicographically by comparing\n'
'corresponding elements. This means that to compare equal, every\n'
'element must compare equal and the two sequences must be of the '
'same\n'
'type and have the same length. (For full details see '
'*Comparisons* in\n'
'the language reference.)\n'
'\n'
'Notes:\n'
'\n'
'1. While the "in" and "not in" operations are used only for '
'simple\n'
' containment testing in the general case, some specialised '
'sequences\n'
' (such as "str", "bytes" and "bytearray") also use them for\n'
' subsequence testing:\n'
'\n'
' >>> "gg" in "eggs"\n'
' True\n'
'\n'
'2. Values of *n* less than "0" are treated as "0" (which yields '
'an\n'
' empty sequence of the same type as *s*). Note that items in '
'the\n'
' sequence *s* are not copied; they are referenced multiple '
'times.\n'
' This often haunts new Python programmers; consider:\n'
'\n'
' >>> lists = [[]] * 3\n'
' >>> lists\n'
' [[], [], []]\n'
' >>> lists[0].append(3)\n'
' >>> lists\n'
' [[3], [3], [3]]\n'
'\n'
' What has happened is that "[[]]" is a one-element list '
'containing\n'
' an empty list, so all three elements of "[[]] * 3" are '
'references\n'
' to this single empty list. Modifying any of the elements of\n'
' "lists" modifies this single list. You can create a list of\n'
' different lists this way:\n'
'\n'
' >>> lists = [[] for i in range(3)]\n'
' >>> lists[0].append(3)\n'
' >>> lists[1].append(5)\n'
' >>> lists[2].append(7)\n'
' >>> lists\n'
' [[3], [5], [7]]\n'
'\n'
' Further explanation is available in the FAQ entry *How do I '
'create\n'
' a multidimensional list?*.\n'
'\n'
'3. If *i* or *j* is negative, the index is relative to the end '
'of\n'
' the string: "len(s) + i" or "len(s) + j" is substituted. But '
'note\n'
' that "-0" is still "0".\n'
'\n'
'4. The slice of *s* from *i* to *j* is defined as the sequence '
'of\n'
' items with index *k* such that "i <= k < j". If *i* or *j* '
'is\n'
' greater than "len(s)", use "len(s)". If *i* is omitted or '
'"None",\n'
' use "0". If *j* is omitted or "None", use "len(s)". If *i* '
'is\n'
' greater than or equal to *j*, the slice is empty.\n'
'\n'
'5. The slice of *s* from *i* to *j* with step *k* is defined as '
'the\n'
' sequence of items with index "x = i + n*k" such that "0 <= n '
'<\n'
' (j-i)/k". In other words, the indices are "i", "i+k", '
'"i+2*k",\n'
' "i+3*k" and so on, stopping when *j* is reached (but never\n'
' including *j*). If *i* or *j* is greater than "len(s)", use\n'
' "len(s)". If *i* or *j* are omitted or "None", they become '
'"end"\n'
' values (which end depends on the sign of *k*). Note, *k* '
'cannot be\n'
' zero. If *k* is "None", it is treated like "1".\n'
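'\n'
'   For example (an illustrative sketch):\n'
'\n'
'      >>> list(range(10))[1:8:3]   # indices 1, 4 and 7\n'
'      [1, 4, 7]\n'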
'\n'
'6. Concatenating immutable sequences always results in a new\n'
' object. This means that building up a sequence by repeated\n'
' concatenation will have a quadratic runtime cost in the '
'total\n'
' sequence length. To get a linear runtime cost, you must '
'switch to\n'
' one of the alternatives below:\n'
'\n'
' * if concatenating "str" objects, you can build a list and '
'use\n'
' "str.join()" at the end or else write to an "io.StringIO"\n'
' instance and retrieve its value when complete\n'
'\n'
' * if concatenating "bytes" objects, you can similarly use\n'
' "bytes.join()" or "io.BytesIO", or you can do in-place\n'
' concatenation with a "bytearray" object. "bytearray" '
'objects are\n'
' mutable and have an efficient overallocation mechanism\n'
'\n'
' * if concatenating "tuple" objects, extend a "list" instead\n'
'\n'
' * for other types, investigate the relevant class '
'documentation\n'
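'\n'
'   For instance, a linear-time build with "str.join()" (a sketch):\n'
'\n'
'      >>> parts = []\n'
'      >>> for i in range(3):\n'
'      ...     parts.append(str(i))\n'
"      >>> ''.join(parts)\n"
"      '012'\n"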
'\n'
'7. Some sequence types (such as "range") only support item\n'
" sequences that follow specific patterns, and hence don't "
'support\n'
' sequence concatenation or repetition.\n'
'\n'
'8. "index" raises "ValueError" when *x* is not found in *s*. '
'When\n'
' supported, the additional arguments to the index method '
'allow\n'
' efficient searching of subsections of the sequence. Passing '
'the\n'
' extra arguments is roughly equivalent to using '
'"s[i:j].index(x)",\n'
' only without copying any data and with the returned index '
'being\n'
' relative to the start of the sequence rather than the start '
'of the\n'
' slice.\n'
'\n'
'\n'
'Immutable Sequence Types\n'
'========================\n'
'\n'
'The only operation that immutable sequence types generally '
'implement\n'
'that is not also implemented by mutable sequence types is '
'support for\n'
'the "hash()" built-in.\n'
'\n'
'This support allows immutable sequences, such as "tuple" '
'instances, to\n'
'be used as "dict" keys and stored in "set" and "frozenset" '
'instances.\n'
'\n'
'Attempting to hash an immutable sequence that contains '
'unhashable\n'
'values will result in "TypeError".\n'
'\n'
'\n'
'Mutable Sequence Types\n'
'======================\n'
'\n'
'The operations in the following table are defined on mutable '
'sequence\n'
'types. The "collections.abc.MutableSequence" ABC is provided to '
'make\n'
'it easier to correctly implement these operations on custom '
'sequence\n'
'types.\n'
'\n'
'In the table *s* is an instance of a mutable sequence type, *t* '
'is any\n'
'iterable object and *x* is an arbitrary object that meets any '
'type and\n'
'value restrictions imposed by *s* (for example, "bytearray" '
'only\n'
'accepts integers that meet the value restriction "0 <= x <= '
'255").\n'
'\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| Operation | '
'Result | Notes |\n'
'+================================+==================================+=======================+\n'
'| "s[i] = x" | item *i* of *s* is replaced '
'by | |\n'
'| | '
'*x* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j] = t" | slice of *s* from *i* to *j* '
'is | |\n'
'| | replaced by the contents of '
'the | |\n'
'| | iterable '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j]" | same as "s[i:j] = '
'[]" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j:k] = t" | the elements of "s[i:j:k]" '
'are | (1) |\n'
'| | replaced by those of '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j:k]" | removes the elements '
'of | |\n'
'| | "s[i:j:k]" from the '
'list | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.append(x)" | appends *x* to the end of '
'the | |\n'
'| | sequence (same '
'as | |\n'
'| | "s[len(s):len(s)] = '
'[x]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.clear()" | removes all items from "s" '
'(same | (5) |\n'
'| | as "del '
's[:]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.copy()" | creates a shallow copy of '
'"s" | (5) |\n'
'| | (same as '
'"s[:]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.extend(t)" or "s += t" | extends *s* with the contents '
'of | |\n'
'| | *t* (for the most part the '
'same | |\n'
'| | as "s[len(s):len(s)] = '
't") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s *= n" | updates *s* with its '
'contents | (6) |\n'
'| | repeated *n* '
'times | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.insert(i, x)" | inserts *x* into *s* at '
'the | |\n'
'| | index given by *i* (same '
'as | |\n'
'| | "s[i:i] = '
'[x]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.pop([i])" | retrieves the item at *i* '
'and | (2) |\n'
'| | also removes it from '
'*s* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.remove(x)" | remove the first item from '
'*s* | (3) |\n'
'| | where "s[i] == '
'x" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.reverse()" | reverses the items of *s* '
'in | (4) |\n'
'| | '
'place | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'\n'
'Notes:\n'
'\n'
'1. *t* must have the same length as the slice it is replacing.\n'
'\n'
'2. The optional argument *i* defaults to "-1", so that by '
'default\n'
' the last item is removed and returned.\n'
'\n'
'3. "remove" raises "ValueError" when *x* is not found in *s*.\n'
'\n'
'4. The "reverse()" method modifies the sequence in place for\n'
' economy of space when reversing a large sequence. To remind '
'users\n'
' that it operates by side effect, it does not return the '
'reversed\n'
' sequence.\n'
'\n'
'5. "clear()" and "copy()" are included for consistency with the\n'
" interfaces of mutable containers that don't support slicing\n"
' operations (such as "dict" and "set")\n'
'\n'
' New in version 3.3: "clear()" and "copy()" methods.\n'
'\n'
'6. The value *n* is an integer, or an object implementing\n'
' "__index__()". Zero and negative values of *n* clear the '
'sequence.\n'
' Items in the sequence are not copied; they are referenced '
'multiple\n'
' times, as explained for "s * n" under *Common Sequence '
'Operations*.\n'
'\n'
'\n'
'Lists\n'
'=====\n'
'\n'
'Lists are mutable sequences, typically used to store collections '
'of\n'
'homogeneous items (where the precise degree of similarity will '
'vary by\n'
'application).\n'
'\n'
'class list([iterable])\n'
'\n'
' Lists may be constructed in several ways:\n'
'\n'
' * Using a pair of square brackets to denote the empty list: '
'"[]"\n'
'\n'
' * Using square brackets, separating items with commas: '
'"[a]",\n'
' "[a, b, c]"\n'
'\n'
' * Using a list comprehension: "[x for x in iterable]"\n'
'\n'
' * Using the type constructor: "list()" or "list(iterable)"\n'
'\n'
' The constructor builds a list whose items are the same and in '
'the\n'
" same order as *iterable*'s items. *iterable* may be either "
'a\n'
' sequence, a container that supports iteration, or an '
'iterator\n'
' object. If *iterable* is already a list, a copy is made and\n'
' returned, similar to "iterable[:]". For example, '
'"list(\'abc\')"\n'
' returns "[\'a\', \'b\', \'c\']" and "list( (1, 2, 3) )" '
'returns "[1, 2,\n'
' 3]". If no argument is given, the constructor creates a new '
'empty\n'
' list, "[]".\n'
'\n'
' Many other operations also produce lists, including the '
'"sorted()"\n'
' built-in.\n'
'\n'
' Lists implement all of the *common* and *mutable* sequence\n'
' operations. Lists also provide the following additional '
'method:\n'
'\n'
'    sort(*, key=None, reverse=False)\n'
'\n'
' This method sorts the list in place, using only "<" '
'comparisons\n'
' between items. Exceptions are not suppressed - if any '
'comparison\n'
' operations fail, the entire sort operation will fail (and '
'the\n'
' list will likely be left in a partially modified state).\n'
'\n'
' "sort()" accepts two arguments that can only be passed by\n'
' keyword (*keyword-only arguments*):\n'
'\n'
' *key* specifies a function of one argument that is used '
'to\n'
' extract a comparison key from each list element (for '
'example,\n'
' "key=str.lower"). The key corresponding to each item in '
'the list\n'
' is calculated once and then used for the entire sorting '
'process.\n'
' The default value of "None" means that list items are '
'sorted\n'
' directly without calculating a separate key value.\n'
'\n'
' The "functools.cmp_to_key()" utility is available to '
'convert a\n'
' 2.x style *cmp* function to a *key* function.\n'
'\n'
' *reverse* is a boolean value. If set to "True", then the '
'list\n'
' elements are sorted as if each comparison were reversed.\n'
'\n'
' This method modifies the sequence in place for economy of '
'space\n'
' when sorting a large sequence. To remind users that it '
'operates\n'
' by side effect, it does not return the sorted sequence '
'(use\n'
' "sorted()" to explicitly request a new sorted list '
'instance).\n'
'\n'
' The "sort()" method is guaranteed to be stable. A sort '
'is\n'
' stable if it guarantees not to change the relative order '
'of\n'
' elements that compare equal --- this is helpful for '
'sorting in\n'
' multiple passes (for example, sort by department, then by '
'salary\n'
' grade).\n'
'\n'
' **CPython implementation detail:** While a list is being '
'sorted,\n'
' the effect of attempting to mutate, or even inspect, the '
'list is\n'
' undefined. The C implementation of Python makes the list '
'appear\n'
' empty for the duration, and raises "ValueError" if it can '
'detect\n'
' that the list has been mutated during a sort.\n'
'\n'
'\n'
'Tuples\n'
'======\n'
'\n'
'Tuples are immutable sequences, typically used to store '
'collections of\n'
'heterogeneous data (such as the 2-tuples produced by the '
'"enumerate()"\n'
'built-in). Tuples are also used for cases where an immutable '
'sequence\n'
'of homogeneous data is needed (such as allowing storage in a '
'"set" or\n'
'"dict" instance).\n'
'\n'
'class tuple([iterable])\n'
'\n'
' Tuples may be constructed in a number of ways:\n'
'\n'
' * Using a pair of parentheses to denote the empty tuple: '
'"()"\n'
'\n'
' * Using a trailing comma for a singleton tuple: "a," or '
'"(a,)"\n'
'\n'
' * Separating items with commas: "a, b, c" or "(a, b, c)"\n'
'\n'
' * Using the "tuple()" built-in: "tuple()" or '
'"tuple(iterable)"\n'
'\n'
' The constructor builds a tuple whose items are the same and '
'in the\n'
" same order as *iterable*'s items. *iterable* may be either "
'a\n'
' sequence, a container that supports iteration, or an '
'iterator\n'
' object. If *iterable* is already a tuple, it is returned\n'
' unchanged. For example, "tuple(\'abc\')" returns "(\'a\', '
'\'b\', \'c\')"\n'
' and "tuple( [1, 2, 3] )" returns "(1, 2, 3)". If no argument '
'is\n'
' given, the constructor creates a new empty tuple, "()".\n'
'\n'
' Note that it is actually the comma which makes a tuple, not '
'the\n'
' parentheses. The parentheses are optional, except in the '
'empty\n'
' tuple case, or when they are needed to avoid syntactic '
'ambiguity.\n'
' For example, "f(a, b, c)" is a function call with three '
'arguments,\n'
' while "f((a, b, c))" is a function call with a 3-tuple as the '
'sole\n'
' argument.\n'
'\n'
' Tuples implement all of the *common* sequence operations.\n'
'\n'
'For heterogeneous collections of data where access by name is '
'clearer\n'
'than access by index, "collections.namedtuple()" may be a more\n'
'appropriate choice than a simple tuple object.\n'
'\n'
'\n'
'Ranges\n'
'======\n'
'\n'
'The "range" type represents an immutable sequence of numbers and '
'is\n'
'commonly used for looping a specific number of times in "for" '
'loops.\n'
'\n'
'class range(stop)\n'
'class range(start, stop[, step])\n'
'\n'
' The arguments to the range constructor must be integers '
'(either\n'
' built-in "int" or any object that implements the "__index__"\n'
' special method). If the *step* argument is omitted, it '
'defaults to\n'
' "1". If the *start* argument is omitted, it defaults to "0". '
'If\n'
' *step* is zero, "ValueError" is raised.\n'
'\n'
' For a positive *step*, the contents of a range "r" are '
'determined\n'
' by the formula "r[i] = start + step*i" where "i >= 0" and '
'"r[i] <\n'
' stop".\n'
'\n'
' For a negative *step*, the contents of the range are still\n'
' determined by the formula "r[i] = start + step*i", but the\n'
' constraints are "i >= 0" and "r[i] > stop".\n'
'\n'
' A range object will be empty if "r[0]" does not meet the '
'value\n'
' constraint. Ranges do support negative indices, but these '
'are\n'
' interpreted as indexing from the end of the sequence '
'determined by\n'
' the positive indices.\n'
'\n'
' Ranges containing absolute values larger than "sys.maxsize" '
'are\n'
' permitted but some features (such as "len()") may raise\n'
' "OverflowError".\n'
'\n'
' Range examples:\n'
'\n'
' >>> list(range(10))\n'
' [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n'
' >>> list(range(1, 11))\n'
' [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n'
' >>> list(range(0, 30, 5))\n'
' [0, 5, 10, 15, 20, 25]\n'
' >>> list(range(0, 10, 3))\n'
' [0, 3, 6, 9]\n'
' >>> list(range(0, -10, -1))\n'
' [0, -1, -2, -3, -4, -5, -6, -7, -8, -9]\n'
' >>> list(range(0))\n'
' []\n'
' >>> list(range(1, 0))\n'
' []\n'
'\n'
' Ranges implement all of the *common* sequence operations '
'except\n'
' concatenation and repetition (due to the fact that range '
'objects\n'
' can only represent sequences that follow a strict pattern '
'and\n'
' repetition and concatenation will usually violate that '
'pattern).\n'
'\n'
'The advantage of the "range" type over a regular "list" or '
'"tuple" is\n'
'that a "range" object will always take the same (small) amount '
'of\n'
'memory, no matter the size of the range it represents (as it '
'only\n'
'stores the "start", "stop" and "step" values, calculating '
'individual\n'
'items and subranges as needed).\n'
'\n'
'Range objects implement the "collections.abc.Sequence" ABC, and\n'
'provide features such as containment tests, element index '
'lookup,\n'
'slicing and support for negative indices (see *Sequence Types '
'---\n'
'list, tuple, range*):\n'
'\n'
'>>> r = range(0, 20, 2)\n'
'>>> r\n'
'range(0, 20, 2)\n'
'>>> 11 in r\n'
'False\n'
'>>> 10 in r\n'
'True\n'
'>>> r.index(10)\n'
'5\n'
'>>> r[5]\n'
'10\n'
'>>> r[:5]\n'
'range(0, 10, 2)\n'
'>>> r[-1]\n'
'18\n'
'\n'
'Testing range objects for equality with "==" and "!=" compares '
'them as\n'
'sequences. That is, two range objects are considered equal if '
'they\n'
'represent the same sequence of values. (Note that two range '
'objects\n'
'that compare equal might have different "start", "stop" and '
'"step"\n'
'attributes, for example "range(0) == range(2, 1, 3)" or '
'"range(0, 3,\n'
'2) == range(0, 4, 2)".)\n'
'\n'
'Changed in version 3.2: Implement the Sequence ABC. Support '
'slicing\n'
'and negative indices. Test "int" objects for membership in '
'constant\n'
'time instead of iterating through all items.\n'
'\n'
"Changed in version 3.3: Define '==' and '!=' to compare range "
'objects\n'
'based on the sequence of values they define (instead of '
'comparing\n'
'based on object identity).\n'
'\n'
'New in version 3.3: The "start", "stop" and "step" attributes.\n',
'typesseq-mutable': '\n'
'Mutable Sequence Types\n'
'**********************\n'
'\n'
'The operations in the following table are defined on '
'mutable sequence\n'
'types. The "collections.abc.MutableSequence" ABC is '
'provided to make\n'
'it easier to correctly implement these operations on '
'custom sequence\n'
'types.\n'
'\n'
'In the table *s* is an instance of a mutable sequence '
'type, *t* is any\n'
'iterable object and *x* is an arbitrary object that '
'meets any type and\n'
'value restrictions imposed by *s* (for example, '
'"bytearray" only\n'
'accepts integers that meet the value restriction "0 <= x '
'<= 255").\n'
'\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| Operation | '
'Result | Notes '
'|\n'
'+================================+==================================+=======================+\n'
'| "s[i] = x" | item *i* of *s* is '
'replaced by | |\n'
'| | '
'*x* | '
'|\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j] = t" | slice of *s* from *i* '
'to *j* is | |\n'
'| | replaced by the '
'contents of the | |\n'
'| | iterable '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j]" | same as "s[i:j] = '
'[]" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s[i:j:k] = t" | the elements of '
'"s[i:j:k]" are | (1) |\n'
'| | replaced by those of '
'*t* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "del s[i:j:k]" | removes the elements '
'of | |\n'
'| | "s[i:j:k]" from the '
'list | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.append(x)" | appends *x* to the '
'end of the | |\n'
'| | sequence (same '
'as | |\n'
'| | "s[len(s):len(s)] = '
'[x]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.clear()" | removes all items '
'from "s" (same | (5) |\n'
'| | as "del '
's[:]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.copy()" | creates a shallow '
'copy of "s" | (5) |\n'
'| | (same as '
'"s[:]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.extend(t)" or "s += t" | extends *s* with the '
'contents of | |\n'
'| | *t* (for the most '
'part the same | |\n'
'| | as "s[len(s):len(s)] '
'= t") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s *= n" | updates *s* with its '
'contents | (6) |\n'
'| | repeated *n* '
'times | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.insert(i, x)" | inserts *x* into *s* '
'at the | |\n'
'| | index given by *i* '
'(same as | |\n'
'| | "s[i:i] = '
'[x]") | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.pop([i])" | retrieves the item at '
'*i* and | (2) |\n'
'| | also removes it from '
'*s* | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.remove(x)" | remove the first item '
'from *s* | (3) |\n'
'| | where "s[i] == '
'x" | |\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'| "s.reverse()" | reverses the items of '
'*s* in | (4) |\n'
'| | '
'place | '
'|\n'
'+--------------------------------+----------------------------------+-----------------------+\n'
'\n'
'Notes:\n'
'\n'
'1. *t* must have the same length as the slice it is '
'replacing.\n'
'\n'
'2. The optional argument *i* defaults to "-1", so that '
'by default\n'
' the last item is removed and returned.\n'
'\n'
'3. "remove" raises "ValueError" when *x* is not found in '
'*s*.\n'
'\n'
'4. The "reverse()" method modifies the sequence in place '
'for\n'
' economy of space when reversing a large sequence. To '
'remind users\n'
' that it operates by side effect, it does not return '
'the reversed\n'
' sequence.\n'
'\n'
'5. "clear()" and "copy()" are included for consistency '
'with the\n'
" interfaces of mutable containers that don't support "
'slicing\n'
' operations (such as "dict" and "set")\n'
'\n'
' New in version 3.3: "clear()" and "copy()" methods.\n'
'\n'
'6. The value *n* is an integer, or an object '
'implementing\n'
' "__index__()". Zero and negative values of *n* clear '
'the sequence.\n'
' Items in the sequence are not copied; they are '
'referenced multiple\n'
' times, as explained for "s * n" under *Common '
'Sequence Operations*.\n',
'unary': '\n'
'Unary arithmetic and bitwise operations\n'
'***************************************\n'
'\n'
'All unary arithmetic and bitwise operations have the same '
'priority:\n'
'\n'
' u_expr ::= power | "-" u_expr | "+" u_expr | "~" u_expr\n'
'\n'
'The unary "-" (minus) operator yields the negation of its numeric\n'
'argument.\n'
'\n'
'The unary "+" (plus) operator yields its numeric argument '
'unchanged.\n'
'\n'
'The unary "~" (invert) operator yields the bitwise inversion of '
'its\n'
'integer argument. The bitwise inversion of "x" is defined as\n'
'"-(x+1)". It only applies to integral numbers.\n'
'\n'
'In all three cases, if the argument does not have the proper type, '
'a\n'
'"TypeError" exception is raised.\n',
'while': '\n'
'The "while" statement\n'
'*********************\n'
'\n'
'The "while" statement is used for repeated execution as long as an\n'
'expression is true:\n'
'\n'
' while_stmt ::= "while" expression ":" suite\n'
' ["else" ":" suite]\n'
'\n'
'This repeatedly tests the expression and, if it is true, executes '
'the\n'
'first suite; if the expression is false (which may be the first '
'time\n'
'it is tested) the suite of the "else" clause, if present, is '
'executed\n'
'and the loop terminates.\n'
'\n'
'A "break" statement executed in the first suite terminates the '
'loop\n'
'without executing the "else" clause\'s suite. A "continue" '
'statement\n'
'executed in the first suite skips the rest of the suite and goes '
'back\n'
'to testing the expression.\n',
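# Illustrative aside (not part of the original topics data): the "else"
# clause of a loop runs only when the loop finishes without "break", e.g.
#   n = 3
#   while n:
#       n -= 1
#   else:
#       print("no break occurred")   # this line does run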
'with': '\n'
'The "with" statement\n'
'********************\n'
'\n'
'The "with" statement is used to wrap the execution of a block with\n'
'methods defined by a context manager (see section *With Statement\n'
'Context Managers*). This allows common "try"..."except"..."finally"\n'
'usage patterns to be encapsulated for convenient reuse.\n'
'\n'
' with_stmt ::= "with" with_item ("," with_item)* ":" suite\n'
' with_item ::= expression ["as" target]\n'
'\n'
'The execution of the "with" statement with one "item" proceeds as\n'
'follows:\n'
'\n'
'1. The context expression (the expression given in the "with_item")\n'
' is evaluated to obtain a context manager.\n'
'\n'
'2. The context manager\'s "__exit__()" is loaded for later use.\n'
'\n'
'3. The context manager\'s "__enter__()" method is invoked.\n'
'\n'
'4. If a target was included in the "with" statement, the return\n'
' value from "__enter__()" is assigned to it.\n'
'\n'
' Note: The "with" statement guarantees that if the "__enter__()"\n'
' method returns without an error, then "__exit__()" will always '
'be\n'
' called. Thus, if an error occurs during the assignment to the\n'
' target list, it will be treated the same as an error occurring\n'
' within the suite would be. See step 6 below.\n'
'\n'
'5. The suite is executed.\n'
'\n'
'6. The context manager\'s "__exit__()" method is invoked. If an\n'
' exception caused the suite to be exited, its type, value, and\n'
' traceback are passed as arguments to "__exit__()". Otherwise, '
'three\n'
' "None" arguments are supplied.\n'
'\n'
' If the suite was exited due to an exception, and the return '
'value\n'
' from the "__exit__()" method was false, the exception is '
'reraised.\n'
' If the return value was true, the exception is suppressed, and\n'
' execution continues with the statement following the "with"\n'
' statement.\n'
'\n'
' If the suite was exited for any reason other than an exception, '
'the\n'
' return value from "__exit__()" is ignored, and execution '
'proceeds\n'
' at the normal location for the kind of exit that was taken.\n'
'\n'
'With more than one item, the context managers are processed as if\n'
'multiple "with" statements were nested:\n'
'\n'
' with A() as a, B() as b:\n'
' suite\n'
'\n'
'is equivalent to\n'
'\n'
' with A() as a:\n'
' with B() as b:\n'
' suite\n'
'\n'
'Changed in version 3.1: Support for multiple context expressions.\n'
'\n'
'See also: **PEP 0343** - The "with" statement\n'
'\n'
' The specification, background, and examples for the Python '
'"with"\n'
' statement.\n',
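# Illustrative aside (not part of the original topics data): a minimal
# context manager following the protocol described above; if __exit__
# returned a true value, an exception raised in the suite would be
# suppressed instead of reraised.
#   class managed:
#       def __enter__(self):
#           print("enter"); return self
#       def __exit__(self, exc_type, exc_value, tb):
#           print("exit"); return False    # don't suppress exceptions
#   with managed():
#       print("body")                      # prints: enter, body, exit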
'yield': '\n'
'The "yield" statement\n'
'*********************\n'
'\n'
' yield_stmt ::= yield_expression\n'
'\n'
'A "yield" statement is semantically equivalent to a *yield\n'
'expression*. The yield statement can be used to omit the '
'parentheses\n'
'that would otherwise be required in the equivalent yield '
'expression\n'
'statement. For example, the yield statements\n'
'\n'
' yield <expr>\n'
' yield from <expr>\n'
'\n'
'are equivalent to the yield expression statements\n'
'\n'
' (yield <expr>)\n'
' (yield from <expr>)\n'
'\n'
'Yield expressions and statements are only used when defining a\n'
'*generator* function, and are only used in the body of the '
'generator\n'
'function. Using yield in a function definition is sufficient to '
'cause\n'
'that definition to create a generator function instead of a normal\n'
'function.\n'
'\n'
'For full details of "yield" semantics, refer to the *Yield\n'
'expressions* section.\n'}
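# Hedged illustration (not part of the package sources): a short, runnable
# sketch of the sequence operations documented in the topics above -- slice
# assignment, extended-slice deletion, in-place repetition, and the
# keyword-only arguments of list.sort().
s = [1, 2, 3]
s[1:1] = [9, 9]          # insert via slice assignment  -> [1, 9, 9, 2, 3]
del s[::2]               # delete every second item     -> [9, 2]
s *= 2                   # items referenced, not copied -> [9, 2, 9, 2]
words = ["pear", "Apple", "fig"]
words.sort(key=str.lower, reverse=True)   # stable sort, keyword-only args
assert words == ["pear", "fig", "Apple"]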
# pysqlite2/dbapi2.py: the DB-API 2.0 interface
#
# Copyright (C) 2004-2005 Gerhard Häring <[email protected]>
#
# This file is part of pysqlite.
#
# This software is provided 'as-is', without any express or implied
# warranty. In no event will the authors be held liable for any damages
# arising from the use of this software.
#
# Permission is granted to anyone to use this software for any purpose,
# including commercial applications, and to alter it and redistribute it
# freely, subject to the following restrictions:
#
# 1. The origin of this software must not be misrepresented; you must not
# claim that you wrote the original software. If you use this software
# in a product, an acknowledgment in the product documentation would be
# appreciated but is not required.
# 2. Altered source versions must be plainly marked as such, and must not be
# misrepresented as being the original software.
# 3. This notice may not be removed or altered from any source distribution.
import datetime
import time
import collections.abc
from _sqlite3 import *
paramstyle = "qmark"
threadsafety = 1
apilevel = "2.0"
Date = datetime.date
Time = datetime.time
Timestamp = datetime.datetime
def DateFromTicks(ticks):
return Date(*time.localtime(ticks)[:3])
def TimeFromTicks(ticks):
return Time(*time.localtime(ticks)[3:6])
def TimestampFromTicks(ticks):
return Timestamp(*time.localtime(ticks)[:6])
version_info = tuple([int(x) for x in version.split(".")])
sqlite_version_info = tuple([int(x) for x in sqlite_version.split(".")])
Binary = memoryview
collections.abc.Sequence.register(Row)
def register_adapters_and_converters():
def adapt_date(val):
return val.isoformat()
def adapt_datetime(val):
return val.isoformat(" ")
def convert_date(val):
return datetime.date(*map(int, val.split(b"-")))
def convert_timestamp(val):
datepart, timepart = val.split(b" ")
year, month, day = map(int, datepart.split(b"-"))
timepart_full = timepart.split(b".")
hours, minutes, seconds = map(int, timepart_full[0].split(b":"))
if len(timepart_full) == 2:
microseconds = int('{:0<6.6}'.format(timepart_full[1].decode()))
else:
microseconds = 0
val = datetime.datetime(year, month, day, hours, minutes, seconds, microseconds)
return val
register_adapter(datetime.date, adapt_date)
register_adapter(datetime.datetime, adapt_datetime)
register_converter("date", convert_date)
register_converter("timestamp", convert_timestamp)
register_adapters_and_converters()
# Clean up namespace
del(register_adapters_and_converters)
# Mimic the sqlite3 console shell's .dump command
# Author: Paul Kippes <[email protected]>
# Every identifier in sql is quoted based on a comment in sqlite
# documentation "SQLite adds new keywords from time to time when it
# takes on new features. So to prevent your code from being broken by
# future enhancements, you should normally quote any identifier that
# is an English language word, even if you do not have to."
def _iterdump(connection):
"""
Returns an iterator to the dump of the database in an SQL text format.
Used to produce an SQL dump of the database. Useful to save an in-memory
database for later restoration. This function should not be called
directly but instead called from the Connection method, iterdump().
"""
cu = connection.cursor()
yield('BEGIN TRANSACTION;')
# sqlite_master table contains the SQL CREATE statements for the database.
q = """
SELECT "name", "type", "sql"
FROM "sqlite_master"
WHERE "sql" NOT NULL AND
"type" == 'table'
ORDER BY "name"
"""
schema_res = cu.execute(q)
for table_name, type, sql in schema_res.fetchall():
if table_name == 'sqlite_sequence':
yield('DELETE FROM "sqlite_sequence";')
elif table_name == 'sqlite_stat1':
yield('ANALYZE "sqlite_master";')
elif table_name.startswith('sqlite_'):
continue
# NOTE: Virtual table support not implemented
#elif sql.startswith('CREATE VIRTUAL TABLE'):
# qtable = table_name.replace("'", "''")
# yield("INSERT INTO sqlite_master(type,name,tbl_name,rootpage,sql)"\
# "VALUES('table','{0}','{0}',0,'{1}');".format(
# qtable,
#                     sql.replace("'", "''")))
else:
yield('{0};'.format(sql))
# Build the insert statement for each row of the current table
table_name_ident = table_name.replace('"', '""')
res = cu.execute('PRAGMA table_info("{0}")'.format(table_name_ident))
column_names = [str(table_info[1]) for table_info in res.fetchall()]
q = """SELECT 'INSERT INTO "{0}" VALUES({1})' FROM "{0}";""".format(
table_name_ident,
",".join("""'||quote("{0}")||'""".format(col.replace('"', '""')) for col in column_names))
query_res = cu.execute(q)
for row in query_res:
yield("{0};".format(row[0]))
# Now when the type is 'index', 'trigger', or 'view'
q = """
SELECT "name", "type", "sql"
FROM "sqlite_master"
WHERE "sql" NOT NULL AND
"type" IN ('index', 'trigger', 'view')
"""
schema_res = cu.execute(q)
for name, type, sql in schema_res.fetchall():
yield('{0};'.format(sql))
yield('COMMIT;')
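# Hedged usage sketch (not part of the package sources): exercising the
# date/timestamp adapters registered above and the dump helper through the
# public sqlite3 API; PARSE_DECLTYPES makes the converters fire on SELECT.
import datetime
import sqlite3
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE t (d date, ts timestamp)")
conn.execute("INSERT INTO t VALUES (?, ?)",
             (datetime.date(2023, 7, 18),
              datetime.datetime(2023, 7, 18, 12, 30)))
d, ts = conn.execute("SELECT d, ts FROM t").fetchone()
assert isinstance(d, datetime.date) and isinstance(ts, datetime.datetime)
for line in conn.iterdump():    # Connection.iterdump() wraps _iterdump()
    print(line)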
# pysqlite2/__init__.py: the pysqlite2 package.
#
# Copyright (C) 2005 Gerhard Häring <[email protected]>
#
# This file is part of pysqlite.
#
# This software is provided 'as-is', without any express or implied
# warranty. In no event will the authors be held liable for any damages
# arising from the use of this software.
#
# Permission is granted to anyone to use this software for any purpose,
# including commercial applications, and to alter it and redistribute it
# freely, subject to the following restrictions:
#
# 1. The origin of this software must not be misrepresented; you must not
# claim that you wrote the original software. If you use this software
# in a product, an acknowledgment in the product documentation would be
# appreciated but is not required.
# 2. Altered source versions must be plainly marked as such, and must not be
# misrepresented as being the original software.
# 3. This notice may not be removed or altered from any source distribution.
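# On IronPython, sqlite3 support is provided by the IronPython.SQLite .NET
# assembly; the shim below loads it (if available) before the wildcard
# import, and is a no-op on other implementations.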
def _():
import sys
if sys.implementation.name == 'ironpython':
import clr
try:
clr.AddReference('IronPython.SQLite')
except:
pass
_()
del _
from sqlite3.dbapi2 import *
#!/usr/bin/env python3
""" turtle-example-suite:
tdemo_bytedesign.py
An example adapted from the example-suite
of PythonCard's turtle graphics.
It's based on an article in BYTE magazine
Problem Solving with Logo: Using Turtle
Graphics to Redraw a Design
November 1982, p. 118 - 134
-------------------------------------------
Due to the statement
t.delay(0)
in line 152, which sets the animation delay
to 0, this animation runs in "line per line"
mode as fast as possible.
"""
import math
from turtle import Turtle, mainloop
from time import clock
# wrapper for any additional drawing routines
# that need to know about each other
class Designer(Turtle):
def design(self, homePos, scale):
self.up()
for i in range(5):
self.forward(64.65 * scale)
self.down()
self.wheel(self.position(), scale)
self.up()
self.backward(64.65 * scale)
self.right(72)
self.up()
self.goto(homePos)
self.right(36)
self.forward(24.5 * scale)
self.right(198)
self.down()
self.centerpiece(46 * scale, 143.4, scale)
self.getscreen().tracer(True)
def wheel(self, initpos, scale):
self.right(54)
for i in range(4):
self.pentpiece(initpos, scale)
self.down()
self.left(36)
for i in range(5):
self.tripiece(initpos, scale)
self.left(36)
for i in range(5):
self.down()
self.right(72)
self.forward(28 * scale)
self.up()
self.backward(28 * scale)
self.left(54)
self.getscreen().update()
def tripiece(self, initpos, scale):
oldh = self.heading()
self.down()
self.backward(2.5 * scale)
self.tripolyr(31.5 * scale, scale)
self.up()
self.goto(initpos)
self.setheading(oldh)
self.down()
self.backward(2.5 * scale)
self.tripolyl(31.5 * scale, scale)
self.up()
self.goto(initpos)
self.setheading(oldh)
self.left(72)
self.getscreen().update()
def pentpiece(self, initpos, scale):
oldh = self.heading()
self.up()
self.forward(29 * scale)
self.down()
for i in range(5):
self.forward(18 * scale)
self.right(72)
self.pentr(18 * scale, 75, scale)
self.up()
self.goto(initpos)
self.setheading(oldh)
self.forward(29 * scale)
self.down()
for i in range(5):
self.forward(18 * scale)
self.right(72)
self.pentl(18 * scale, 75, scale)
self.up()
self.goto(initpos)
self.setheading(oldh)
self.left(72)
self.getscreen().update()
def pentl(self, side, ang, scale):
if side < (2 * scale): return
self.forward(side)
self.left(ang)
self.pentl(side - (.38 * scale), ang, scale)
def pentr(self, side, ang, scale):
if side < (2 * scale): return
self.forward(side)
self.right(ang)
self.pentr(side - (.38 * scale), ang, scale)
def tripolyr(self, side, scale):
if side < (4 * scale): return
self.forward(side)
self.right(111)
self.forward(side / 1.78)
self.right(111)
self.forward(side / 1.3)
self.right(146)
self.tripolyr(side * .75, scale)
def tripolyl(self, side, scale):
if side < (4 * scale): return
self.forward(side)
self.left(111)
self.forward(side / 1.78)
self.left(111)
self.forward(side / 1.3)
self.left(146)
self.tripolyl(side * .75, scale)
def centerpiece(self, s, a, scale):
self.forward(s); self.left(a)
if s < (7.5 * scale):
return
self.centerpiece(s - (1.2 * scale), a, scale)
def main():
t = Designer()
t.speed(0)
t.hideturtle()
t.getscreen().delay(0)
t.getscreen().tracer(0)
at = clock()
t.design(t.position(), 2)
et = clock()
return "runtime: %.2f sec." % (et-at)
if __name__ == '__main__':
msg = main()
print(msg)
mainloop()
# File: tdemo_chaos.py
# Author: Gregor Lingl
# Date: 2009-06-24
# A demonstration of chaos
from turtle import *
N = 80
def f(x):
return 3.9*x*(1-x)
def g(x):
return 3.9*(x-x**2)
def h(x):
return 3.9*x-3.9*x*x
def jumpto(x, y):
penup(); goto(x,y)
def line(x1, y1, x2, y2):
jumpto(x1, y1)
pendown()
goto(x2, y2)
def coosys():
line(-1, 0, N+1, 0)
line(0, -0.1, 0, 1.1)
def plot(fun, start, color):
pencolor(color)
x = start
jumpto(0, x)
pendown()
dot(5)
for i in range(N):
x=fun(x)
goto(i+1,x)
dot(5)
def main():
reset()
setworldcoordinates(-1.0,-0.1, N+1, 1.1)
speed(0)
hideturtle()
coosys()
plot(f, 0.35, "blue")
plot(g, 0.35, "green")
plot(h, 0.35, "red")
# Now zoom in:
for s in range(100):
setworldcoordinates(0.5*s,-0.1, N+1, 1.1)
return "Done!"
if __name__ == "__main__":
main()
mainloop()
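# Hedged aside (not part of the demo): f, g and h above are algebraically
# identical (all compute 3.9*x*(1-x)), yet floating-point rounding makes
# their iterates diverge -- which is exactly what the three plots show.
# A quick text-mode check of the same effect:
def _divergence_step(n=N, x1=0.35, x2=0.35, x3=0.35):
    for i in range(n):
        x1, x2, x3 = 3.9*x1*(1-x1), 3.9*(x2-x2**2), 3.9*x3-3.9*x3*x3
        if not (x1 == x2 == x3):
            return i + 1        # first step at which the orbits differ
    return None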
#!/usr/bin/env python3
# -*- coding: cp1252 -*-
""" turtle-example-suite:
tdemo_clock.py
Enhanced clock-program, showing date
and time
------------------------------------
Press STOP to exit the program!
------------------------------------
"""
from turtle import *
from datetime import datetime
def jump(distanz, winkel=0):
penup()
right(winkel)
forward(distanz)
left(winkel)
pendown()
def hand(laenge, spitze):
fd(laenge*1.15)
rt(90)
fd(spitze/2.0)
lt(120)
fd(spitze)
lt(120)
fd(spitze)
lt(120)
fd(spitze/2.0)
def make_hand_shape(name, laenge, spitze):
reset()
jump(-laenge*0.15)
begin_poly()
hand(laenge, spitze)
end_poly()
hand_form = get_poly()
register_shape(name, hand_form)
def clockface(radius):
reset()
pensize(7)
for i in range(60):
jump(radius)
if i % 5 == 0:
fd(25)
jump(-radius-25)
else:
dot(3)
jump(-radius)
rt(6)
def setup():
global second_hand, minute_hand, hour_hand, writer
mode("logo")
make_hand_shape("second_hand", 125, 25)
make_hand_shape("minute_hand", 130, 25)
make_hand_shape("hour_hand", 90, 25)
clockface(160)
second_hand = Turtle()
second_hand.shape("second_hand")
second_hand.color("gray20", "gray80")
minute_hand = Turtle()
minute_hand.shape("minute_hand")
minute_hand.color("blue1", "red1")
hour_hand = Turtle()
hour_hand.shape("hour_hand")
hour_hand.color("blue3", "red3")
for hand in second_hand, minute_hand, hour_hand:
hand.resizemode("user")
hand.shapesize(1, 1, 3)
hand.speed(0)
ht()
writer = Turtle()
#writer.mode("logo")
writer.ht()
writer.pu()
writer.bk(85)
def wochentag(t):
wochentag = ["Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday"]
return wochentag[t.weekday()]
def datum(z):
monat = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
"July", "Aug.", "Sep.", "Oct.", "Nov.", "Dec."]
j = z.year
m = monat[z.month - 1]
t = z.day
return "%s %d %d" % (m, t, j)
def tick():
t = datetime.today()
sekunde = t.second + t.microsecond*0.000001
minute = t.minute + sekunde/60.0
stunde = t.hour + minute/60.0
try:
tracer(False) # Terminator can occur here
writer.clear()
writer.home()
writer.forward(65)
writer.write(wochentag(t),
align="center", font=("Courier", 14, "bold"))
writer.back(150)
writer.write(datum(t),
align="center", font=("Courier", 14, "bold"))
writer.forward(85)
tracer(True)
second_hand.setheading(6*sekunde) # or here
minute_hand.setheading(6*minute)
hour_hand.setheading(30*stunde)
tracer(True)
ontimer(tick, 100)
except Terminator:
pass # turtledemo user pressed STOP
def main():
tracer(False)
setup()
tracer(True)
tick()
return "EVENTLOOP"
if __name__ == "__main__":
mode("logo")
msg = main()
print(msg)
mainloop()
# colormixer
from turtle import Screen, Turtle, mainloop
class ColorTurtle(Turtle):
def __init__(self, x, y):
Turtle.__init__(self)
self.shape("turtle")
self.resizemode("user")
self.shapesize(3,3,5)
self.pensize(10)
self._color = [0,0,0]
self.x = x
self._color[x] = y
self.color(self._color)
self.speed(0)
self.left(90)
self.pu()
self.goto(x,0)
self.pd()
self.sety(1)
self.pu()
self.sety(y)
self.pencolor("gray25")
self.ondrag(self.shift)
def shift(self, x, y):
self.sety(max(0,min(y,1)))
self._color[self.x] = self.ycor()
self.fillcolor(self._color)
setbgcolor()
def setbgcolor():
screen.bgcolor(red.ycor(), green.ycor(), blue.ycor())
def main():
global screen, red, green, blue
screen = Screen()
screen.delay(0)
screen.setworldcoordinates(-1, -0.3, 3, 1.3)
red = ColorTurtle(0, .5)
green = ColorTurtle(1, .5)
blue = ColorTurtle(2, .5)
setbgcolor()
writer = Turtle()
writer.ht()
writer.pu()
writer.goto(1,1.15)
writer.write("DRAG!",align="center",font=("Arial",30,("bold","italic")))
return "EVENTLOOP"
if __name__ == "__main__":
msg = main()
print(msg)
mainloop()
#!/usr/bin/env python3
""" turtlegraphics-example-suite:
tdemo_forest.py
Displays a 'forest' of 3 breadth-first-trees
similar to the one in tree.
For further remarks see tree.py
This example is a 'breadth-first'-rewrite of
a Logo program written by Erich Neuwirth. See
http://homepage.univie.ac.at/erich.neuwirth/
"""
from turtle import Turtle, colormode, tracer, mainloop
from random import randrange
from time import clock
def symRandom(n):
return randrange(-n,n+1)
def randomize( branchlist, angledist, sizedist ):
return [ (angle+symRandom(angledist),
sizefactor*1.01**symRandom(sizedist))
for angle, sizefactor in branchlist ]
def randomfd( t, distance, parts, angledist ):
for i in range(parts):
t.left(symRandom(angledist))
t.forward( (1.0 * distance)/parts )
def tree(tlist, size, level, widthfactor, branchlists, angledist=10, sizedist=5):
# uses a list of turtles and a list of branch lists,
# one branch list per turtle!
if level > 0:
lst = []
brs = []
for t, branchlist in list(zip(tlist,branchlists)):
t.pensize( size * widthfactor )
t.pencolor( 255 - (180 - 11 * level + symRandom(15)),
180 - 11 * level + symRandom(15),
0 )
t.pendown()
randomfd(t, size, level, angledist )
yield 1
for angle, sizefactor in branchlist:
t.left(angle)
lst.append(t.clone())
brs.append(randomize(branchlist, angledist, sizedist))
t.right(angle)
for x in tree(lst, size*sizefactor, level-1, widthfactor, brs,
angledist, sizedist):
yield None
def start(t,x,y):
colormode(255)
t.reset()
t.speed(0)
t.hideturtle()
t.left(90)
t.penup()
t.setpos(x,y)
t.pendown()
def doit1(level, pen):
pen.hideturtle()
start(pen, 20, -208)
t = tree( [pen], 80, level, 0.1, [[ (45,0.69), (0,0.65), (-45,0.71) ]] )
return t
def doit2(level, pen):
pen.hideturtle()
start(pen, -135, -130)
t = tree( [pen], 120, level, 0.1, [[ (45,0.69), (-45,0.71) ]] )
return t
def doit3(level, pen):
pen.hideturtle()
start(pen, 190, -90)
t = tree( [pen], 100, level, 0.1, [[ (45,0.7), (0,0.72), (-45,0.65) ]] )
return t
# Here: three tree generators:
def main():
p = Turtle()
p.ht()
tracer(75,0)
u = doit1(6, Turtle(undobuffersize=1))
s = doit2(7, Turtle(undobuffersize=1))
t = doit3(5, Turtle(undobuffersize=1))
a = clock()
while True:
done = 0
for b in u,s,t:
try:
b.__next__()
except:
done += 1
if done == 3:
break
tracer(1,10)
b = clock()
return "runtime: %.2f sec." % (b-a)
if __name__ == '__main__':
main()
mainloop()
#!/usr/bin/env python3
""" turtle-example-suite:
tdemo_fractalCurves.py
This program draws two fractal-curve-designs:
(1) A hilbert curve (in a box)
(2) A combination of Koch-curves.
The CurvesTurtle class and the fractal-curve-
methods are taken from the PythonCard example
scripts for turtle-graphics.
"""
from turtle import *
from time import sleep, clock
class CurvesTurtle(Pen):
# example derived from
# Turtle Geometry: The Computer as a Medium for Exploring Mathematics
# by Harold Abelson and Andrea diSessa
# p. 96-98
def hilbert(self, size, level, parity):
if level == 0:
return
# rotate and draw first subcurve with opposite parity to big curve
self.left(parity * 90)
self.hilbert(size, level - 1, -parity)
# interface to and draw second subcurve with same parity as big curve
self.forward(size)
self.right(parity * 90)
self.hilbert(size, level - 1, parity)
# third subcurve
self.forward(size)
self.hilbert(size, level - 1, parity)
# fourth subcurve
self.right(parity * 90)
self.forward(size)
self.hilbert(size, level - 1, -parity)
# a final turn is needed to make the turtle
# end up facing outward from the large square
self.left(parity * 90)
# Visual Modeling with Logo: A Structural Approach to Seeing
# by James Clayson
# Koch curve, after Helge von Koch who introduced this geometric figure in 1904
# p. 146
def fractalgon(self, n, rad, lev, dir):
import math
# if dir = 1 turn outward
# if dir = -1 turn inward
edge = 2 * rad * math.sin(math.pi / n)
self.pu()
self.fd(rad)
self.pd()
self.rt(180 - (90 * (n - 2) / n))
for i in range(n):
self.fractal(edge, lev, dir)
self.rt(360 / n)
self.lt(180 - (90 * (n - 2) / n))
self.pu()
self.bk(rad)
self.pd()
# p. 146
def fractal(self, dist, depth, dir):
if depth < 1:
self.fd(dist)
return
self.fractal(dist / 3, depth - 1, dir)
self.lt(60 * dir)
self.fractal(dist / 3, depth - 1, dir)
self.rt(120 * dir)
self.fractal(dist / 3, depth - 1, dir)
self.lt(60 * dir)
self.fractal(dist / 3, depth - 1, dir)
def main():
ft = CurvesTurtle()
ft.reset()
ft.speed(0)
ft.ht()
ft.getscreen().tracer(1,0)
ft.pu()
size = 6
ft.setpos(-33*size, -32*size)
ft.pd()
ta=clock()
ft.fillcolor("red")
ft.begin_fill()
ft.fd(size)
ft.hilbert(size, 6, 1)
# frame
ft.fd(size)
for i in range(3):
ft.lt(90)
ft.fd(size*(64+i%2))
ft.pu()
for i in range(2):
ft.fd(size)
ft.rt(90)
ft.pd()
for i in range(4):
ft.fd(size*(66+i%2))
ft.rt(90)
ft.end_fill()
tb=clock()
res = "Hilbert: %.2fsec. " % (tb-ta)
sleep(3)
ft.reset()
ft.speed(0)
ft.ht()
ft.getscreen().tracer(1,0)
ta=clock()
ft.color("black", "blue")
ft.begin_fill()
ft.fractalgon(3, 250, 4, 1)
ft.end_fill()
ft.begin_fill()
ft.color("red")
ft.fractalgon(3, 200, 4, -1)
ft.end_fill()
tb=clock()
res += "Koch: %.2fsec." % (tb-ta)
return res
if __name__ == '__main__':
msg = main()
print(msg)
mainloop()
#!/usr/bin/env python3
""" turtle-example-suite:
xtx_lindenmayer_indian.py
Each morning women in Tamil Nadu, in southern
India, place designs, created by using rice
flour and known as kolam on the thresholds of
their homes.
These can be described by Lindenmayer systems,
which can easily be implemented with turtle
graphics and Python.
Two examples are shown here:
(1) the snake kolam
(2) anklets of Krishna
Taken from Marcia Ascher: Mathematics
Elsewhere, An Exploration of Ideas Across
Cultures
"""
################################
# Mini Lindenmayer tool
###############################
from turtle import *
def replace( seq, replacementRules, n ):
for i in range(n):
newseq = ""
for element in seq:
newseq = newseq + replacementRules.get(element,element)
seq = newseq
return seq
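# Each pass rewrites every symbol via replacementRules, falling back to the
# symbol itself; e.g. replace("ab", {"a": "ab", "b": "a"}, 3) == "abaababa".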
def draw( commands, rules ):
for b in commands:
try:
rules[b]()
except TypeError:
try:
draw(rules[b], rules)
except:
pass
def main():
################################
# Example 1: Snake kolam
################################
def r():
right(45)
def l():
left(45)
def f():
forward(7.5)
snake_rules = {"-":r, "+":l, "f":f, "b":"f+f+f--f--f+f+f"}
snake_replacementRules = {"b": "b+f+b--f--b+f+b"}
snake_start = "b--f--b--f"
drawing = replace(snake_start, snake_replacementRules, 3)
reset()
speed(3)
tracer(1,0)
ht()
up()
backward(195)
down()
draw(drawing, snake_rules)
from time import sleep
sleep(3)
################################
# Example 2: Anklets of Krishna
################################
def A():
color("red")
circle(10,90)
def B():
from math import sqrt
color("black")
l = 5/sqrt(2)
forward(l)
circle(l, 270)
forward(l)
def F():
color("green")
forward(10)
krishna_rules = {"a":A, "b":B, "f":F}
krishna_replacementRules = {"a" : "afbfa", "b" : "afbfbfbfa" }
krishna_start = "fbfbfbfb"
reset()
speed(0)
tracer(3,0)
ht()
left(45)
drawing = replace(krishna_start, krishna_replacementRules, 3)
draw(drawing, krishna_rules)
tracer(1)
return "Done!"
if __name__=='__main__':
msg = main()
print(msg)
mainloop()
#!/usr/bin/env python3
""" turtle-example-suite:
tdemo_minimal_hanoi.py
A minimal 'Towers of Hanoi' animation:
A tower of 6 discs is transferred from the
left to the right peg.
A quite elegant and concise
implementation using a tower class, which
is derived from the built-in type list.
Discs are turtles with shape "square", but
stretched to rectangles by shapesize()
---------------------------------------
To exit press STOP button
---------------------------------------
"""
from turtle import *
class Disc(Turtle):
def __init__(self, n):
Turtle.__init__(self, shape="square", visible=False)
self.pu()
self.shapesize(1.5, n*1.5, 2) # square-->rectangle
self.fillcolor(n/6., 0, 1-n/6.)
self.st()
class Tower(list):
"Hanoi tower, a subclass of built-in type list"
def __init__(self, x):
"create an empty tower. x is x-position of peg"
self.x = x
def push(self, d):
d.setx(self.x)
d.sety(-150+34*len(self))
self.append(d)
def pop(self):
d = list.pop(self)
d.sety(150)
return d
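# Classic recursion: move n-1 discs out of the way, move the largest disc,
# then move the n-1 discs back on top of it -- 2**n - 1 moves in total.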
def hanoi(n, from_, with_, to_):
if n > 0:
hanoi(n-1, from_, to_, with_)
to_.push(from_.pop())
hanoi(n-1, with_, from_, to_)
def play():
onkey(None,"space")
clear()
try:
hanoi(6, t1, t2, t3)
write("press STOP button to exit",
align="center", font=("Courier", 16, "bold"))
except Terminator:
pass # turtledemo user pressed STOP
def main():
global t1, t2, t3
ht(); penup(); goto(0, -225) # writer turtle
t1 = Tower(-250)
t2 = Tower(0)
t3 = Tower(250)
# make tower of 6 discs
for i in range(6,0,-1):
t1.push(Disc(i))
# prepare a spartan user interface ;-)
write("press spacebar to start game",
align="center", font=("Courier", 16, "bold"))
onkey(play, "space")
listen()
return "EVENTLOOP"
if __name__=="__main__":
msg = main()
print(msg)
mainloop()
""" turtle-example-suite:
tdemo_nim.py
Play nim against the computer. The player
who takes the last stick is the winner.
Implements the model-view-controller
design pattern.
"""
import turtle
import random
import time
SCREENWIDTH = 640
SCREENHEIGHT = 480
MINSTICKS = 7
MAXSTICKS = 31
HUNIT = SCREENHEIGHT // 12
WUNIT = SCREENWIDTH // ((MAXSTICKS // 5) * 11 + (MAXSTICKS % 5) * 2)
SCOLOR = (63, 63, 31)
HCOLOR = (255, 204, 204)
COLOR = (204, 204, 255)
def randomrow():
return random.randint(MINSTICKS, MAXSTICKS)
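# computerzug ("computer move") plays the classical winning strategy: if the
# xor ("nim-sum") of the three row sizes is nonzero, some row can always be
# shrunk so the nim-sum becomes zero; otherwise fall back to a random legal
# move.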
def computerzug(state):
xored = state[0] ^ state[1] ^ state[2]
if xored == 0:
return randommove(state)
for z in range(3):
s = state[z] ^ xored
if s <= state[z]:
move = (z, s)
return move
def randommove(state):
m = max(state)
while True:
z = random.randint(0,2)
if state[z] > (m > 1):
break
rand = random.randint(m > 1, state[z]-1)
return z, rand
class NimModel(object):
def __init__(self, game):
self.game = game
def setup(self):
if self.game.state not in [Nim.CREATED, Nim.OVER]:
return
self.sticks = [randomrow(), randomrow(), randomrow()]
self.player = 0
self.winner = None
self.game.view.setup()
self.game.state = Nim.RUNNING
def move(self, row, col):
maxspalte = self.sticks[row]
self.sticks[row] = col
self.game.view.notify_move(row, col, maxspalte, self.player)
if self.game_over():
self.game.state = Nim.OVER
self.winner = self.player
self.game.view.notify_over()
elif self.player == 0:
self.player = 1
row, col = computerzug(self.sticks)
self.move(row, col)
self.player = 0
def game_over(self):
return self.sticks == [0, 0, 0]
def notify_move(self, row, col):
if self.sticks[row] <= col:
return
self.move(row, col)
class Stick(turtle.Turtle):
def __init__(self, row, col, game):
turtle.Turtle.__init__(self, visible=False)
self.row = row
self.col = col
self.game = game
x, y = self.coords(row, col)
self.shape("square")
self.shapesize(HUNIT/10.0, WUNIT/20.0)
self.speed(0)
self.pu()
self.goto(x,y)
self.color("white")
self.showturtle()
def coords(self, row, col):
packet, remainder = divmod(col, 5)
x = (3 + 11 * packet + 2 * remainder) * WUNIT
y = (2 + 3 * row) * HUNIT
return x - SCREENWIDTH // 2 + WUNIT // 2, SCREENHEIGHT // 2 - y - HUNIT // 2
def makemove(self, x, y):
if self.game.state != Nim.RUNNING:
return
self.game.controller.notify_move(self.row, self.col)
class NimView(object):
def __init__(self, game):
self.game = game
self.screen = game.screen
self.model = game.model
self.screen.colormode(255)
self.screen.tracer(False)
self.screen.bgcolor((240, 240, 255))
self.writer = turtle.Turtle(visible=False)
self.writer.pu()
self.writer.speed(0)
self.sticks = {}
for row in range(3):
for col in range(MAXSTICKS):
self.sticks[(row, col)] = Stick(row, col, game)
self.display("... a moment please ...")
self.screen.tracer(True)
def display(self, msg1, msg2=None):
self.screen.tracer(False)
self.writer.clear()
if msg2 is not None:
self.writer.goto(0, - SCREENHEIGHT // 2 + 48)
self.writer.pencolor("red")
self.writer.write(msg2, align="center", font=("Courier",18,"bold"))
self.writer.goto(0, - SCREENHEIGHT // 2 + 20)
self.writer.pencolor("black")
self.writer.write(msg1, align="center", font=("Courier",14,"bold"))
self.screen.tracer(True)
def setup(self):
self.screen.tracer(False)
for row in range(3):
for col in range(self.model.sticks[row]):
self.sticks[(row, col)].color(SCOLOR)
for row in range(3):
for col in range(self.model.sticks[row], MAXSTICKS):
self.sticks[(row, col)].color("white")
self.display("Your turn! Click leftmost stick to remove.")
self.screen.tracer(True)
def notify_move(self, row, col, maxspalte, player):
if player == 0:
farbe = HCOLOR
for s in range(col, maxspalte):
self.sticks[(row, s)].color(farbe)
else:
self.display(" ... thinking ... ")
time.sleep(0.5)
self.display(" ... thinking ... aaah ...")
farbe = COLOR
for s in range(maxspalte-1, col-1, -1):
time.sleep(0.2)
self.sticks[(row, s)].color(farbe)
self.display("Your turn! Click leftmost stick to remove.")
def notify_over(self):
if self.game.model.winner == 0:
msg2 = "Congrats. You're the winner!!!"
else:
msg2 = "Sorry, the computer is the winner."
self.display("To play again press space bar. To leave press ESC.", msg2)
def clear(self):
if self.game.state == Nim.OVER:
self.screen.clear()
class NimController(object):
def __init__(self, game):
self.game = game
self.sticks = game.view.sticks
self.BUSY = False
for stick in self.sticks.values():
stick.onclick(stick.makemove)
self.game.screen.onkey(self.game.model.setup, "space")
self.game.screen.onkey(self.game.view.clear, "Escape")
self.game.view.display("Press space bar to start game")
self.game.screen.listen()
def notify_move(self, row, col):
if self.BUSY:
return
self.BUSY = True
self.game.model.notify_move(row, col)
self.BUSY = False
class Nim(object):
CREATED = 0
RUNNING = 1
OVER = 2
def __init__(self, screen):
self.state = Nim.CREATED
self.screen = screen
self.model = NimModel(self)
self.view = NimView(self)
self.controller = NimController(self)
def main():
mainscreen = turtle.Screen()
mainscreen.mode("standard")
mainscreen.setup(SCREENWIDTH, SCREENHEIGHT)
nim = Nim(mainscreen)
return "EVENTLOOP"
if __name__ == "__main__":
main()
turtle.mainloop()
#!/usr/bin/env python3
""" turtle-example-suite:
tdemo_paint.py
A simple event-driven paint program
- left mouse button moves turtle
- middle mouse button changes color
- right mouse button toggles between pen up
(no line drawn when the turtle moves) and
pen down (line is drawn). If pen up follows
at least two pen-down moves, the polygon that
includes the starting point is filled.
-------------------------------------------
Play around by clicking into the canvas
using all three mouse buttons.
-------------------------------------------
To exit press STOP button
-------------------------------------------
"""
from turtle import *
def switchupdown(x=0, y=0):
if pen()["pendown"]:
end_fill()
up()
else:
down()
begin_fill()
def changecolor(x=0, y=0):
global colors
colors = colors[1:]+colors[:1]
color(colors[0])
def main():
global colors
shape("circle")
resizemode("user")
shapesize(.5)
width(3)
colors=["red", "green", "blue", "yellow"]
color(colors[0])
switchupdown()
onscreenclick(goto,1)
onscreenclick(changecolor,2)
onscreenclick(switchupdown,3)
return "EVENTLOOP"
if __name__ == "__main__":
msg = main()
print(msg)
mainloop()
#!/usr/bin/env python3
""" turtle-example-suite:
tdemo_peace.py
A simple drawing suitable as a beginner's
programming example. Aside from the
peacecolors assignment and the for loop,
it only uses turtle commands.
"""
from turtle import *
def main():
peacecolors = ("red3", "orange", "yellow",
"seagreen4", "orchid4",
"royalblue1", "dodgerblue4")
reset()
Screen()
up()
goto(-320,-195)
width(70)
for pcolor in peacecolors:
color(pcolor)
down()
forward(640)
up()
backward(640)
left(90)
forward(66)
right(90)
width(25)
color("white")
goto(0,-170)
down()
circle(170)
left(90)
forward(340)
up()
left(180)
forward(170)
right(45)
down()
forward(170)
up()
backward(170)
left(90)
down()
forward(170)
up()
goto(0,300) # vanish if hideturtle() is not available ;-)
return "Done!"
if __name__ == "__main__":
main()
mainloop()
#!/usr/bin/env python3
""" xturtle-example-suite:
xtx_kites_and_darts.py
Constructs two aperiodic penrose-tilings,
consisting of kites and darts, by the method
of inflation in six steps.
Starting points are the patterns "sun"
consisting of five kites and "star"
consisting of five darts.
For more information see:
http://en.wikipedia.org/wiki/Penrose_tiling
-------------------------------------------
"""
from turtle import *
from math import cos, pi
from time import clock, sleep
f = (5**0.5-1)/2.0 # (sqrt(5)-1)/2 -- golden ratio
d = 2 * cos(3*pi/10)
def kite(l):
fl = f * l
lt(36)
fd(l)
rt(108)
fd(fl)
rt(36)
fd(fl)
rt(108)
fd(l)
rt(144)
def dart(l):
fl = f * l
lt(36)
fd(l)
rt(144)
fd(fl)
lt(36)
fd(fl)
rt(144)
fd(l)
rt(144)
def inflatekite(l, n):
if n == 0:
px, py = pos()
h, x, y = int(heading()), round(px,3), round(py,3)
tiledict[(h,x,y)] = True
return
fl = f * l
lt(36)
inflatedart(fl, n-1)
fd(l)
rt(144)
inflatekite(fl, n-1)
lt(18)
fd(l*d)
rt(162)
inflatekite(fl, n-1)
lt(36)
fd(l)
rt(180)
inflatedart(fl, n-1)
lt(36)
def inflatedart(l, n):
if n == 0:
px, py = pos()
h, x, y = int(heading()), round(px,3), round(py,3)
tiledict[(h,x,y)] = False
return
fl = f * l
inflatekite(fl, n-1)
lt(36)
fd(l)
rt(180)
inflatedart(fl, n-1)
lt(54)
fd(l*d)
rt(126)
inflatedart(fl, n-1)
fd(l)
rt(144)
def draw(l, n, th=2):
clear()
l = l * f**n
shapesize(l/100.0, l/100.0, th)
for k in tiledict:
h, x, y = k
setpos(x, y)
setheading(h)
if tiledict[k]:
shape("kite")
color("black", (0, 0.75, 0))
else:
shape("dart")
color("black", (0.75, 0, 0))
stamp()
def sun(l, n):
for i in range(5):
inflatekite(l, n)
lt(72)
def star(l,n):
for i in range(5):
inflatedart(l, n)
lt(72)
def makeshapes():
tracer(0)
begin_poly()
kite(100)
end_poly()
register_shape("kite", get_poly())
begin_poly()
dart(100)
end_poly()
register_shape("dart", get_poly())
tracer(1)
def start():
reset()
ht()
pu()
makeshapes()
resizemode("user")
def test(l=200, n=4, fun=sun, startpos=(0,0), th=2):
global tiledict
goto(startpos)
setheading(0)
tiledict = {}
a = clock()
tracer(0)
fun(l, n)
b = clock()
draw(l, n, th)
tracer(1)
c = clock()
print("Calculation: %7.4f s" % (b - a))
print("Drawing: %7.4f s" % (c - b))
print("Together: %7.4f s" % (c - a))
nk = len([x for x in tiledict if tiledict[x]])
nd = len([x for x in tiledict if not tiledict[x]])
print("%d kites and %d darts = %d pieces." % (nk, nd, nk+nd))
def demo(fun=sun):
start()
for i in range(8):
a = clock()
test(300, i, fun)
b = clock()
t = b - a
if t < 2:
sleep(2 - t)
def main():
#title("Penrose-tiling with kites and darts.")
mode("logo")
bgcolor(0.3, 0.3, 0)
demo(sun)
sleep(2)
demo(star)
pencolor("black")
goto(0,-200)
pencolor(0.7,0.7,1)
write("Please wait...",
align="center", font=('Arial Black', 36, 'bold'))
test(600, 8, startpos=(70, 117))
return "Done"
if __name__ == "__main__":
msg = main()
mainloop()
#!/usr/bin/env python3
""" turtle-example-suite:
tdemo_planets_and_moon.py
Gravitational system simulation using the
approximation method from Feynman-lectures,
p.9-8, using turtlegraphics.
Example: heavy central body, light planet,
very light moon!
Planet has a circular orbit, moon a stable
orbit around the planet.
You can pause the movement temporarily by
pressing the left mouse button with the
mouse over the scrollbar of the canvas.
"""
from turtle import Shape, Turtle, mainloop, Vec2D as Vec
from time import sleep
G = 8
class GravSys(object):
def __init__(self):
self.planets = []
self.t = 0
self.dt = 0.01
def init(self):
for p in self.planets:
p.init()
def start(self):
for i in range(10000):
self.t += self.dt
for p in self.planets:
p.step()
class Star(Turtle):
def __init__(self, m, x, v, gravSys, shape):
Turtle.__init__(self, shape=shape)
self.penup()
self.m = m
self.setpos(x)
self.v = v
gravSys.planets.append(self)
self.gravSys = gravSys
self.resizemode("user")
self.pendown()
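# init() below applies the half-step velocity offset of the "leapfrog"
# scheme from the Feynman lectures: velocities are kept half a time step
# ahead of positions, which keeps the circular orbits stable.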
def init(self):
dt = self.gravSys.dt
self.a = self.acc()
self.v = self.v + 0.5*dt*self.a
def acc(self):
a = Vec(0,0)
for planet in self.gravSys.planets:
if planet != self:
v = planet.pos()-self.pos()
a += (G*planet.m/abs(v)**3)*v
return a
def step(self):
dt = self.gravSys.dt
self.setpos(self.pos() + dt*self.v)
if self.gravSys.planets.index(self) != 0:
self.setheading(self.towards(self.gravSys.planets[0]))
self.a = self.acc()
self.v = self.v + dt*self.a
## create compound yellow/blue turtleshape for planets
def main():
s = Turtle()
s.reset()
s.getscreen().tracer(0,0)
s.ht()
s.pu()
s.fd(6)
s.lt(90)
s.begin_poly()
s.circle(6, 180)
s.end_poly()
m1 = s.get_poly()
s.begin_poly()
s.circle(6,180)
s.end_poly()
m2 = s.get_poly()
planetshape = Shape("compound")
planetshape.addcomponent(m1,"orange")
planetshape.addcomponent(m2,"blue")
s.getscreen().register_shape("planet", planetshape)
s.getscreen().tracer(1,0)
## setup gravitational system
gs = GravSys()
sun = Star(1000000, Vec(0,0), Vec(0,-2.5), gs, "circle")
sun.color("yellow")
sun.shapesize(1.8)
sun.pu()
earth = Star(12500, Vec(210,0), Vec(0,195), gs, "planet")
earth.pencolor("green")
earth.shapesize(0.8)
moon = Star(1, Vec(220,0), Vec(0,295), gs, "planet")
moon.pencolor("blue")
moon.shapesize(0.5)
gs.init()
gs.start()
return "Done!"
if __name__ == '__main__':
main()
mainloop()
""" turtle-example-suite:
tdemo_round_dance.py
(Needs version 1.1 of the turtle module that
comes with Python 3.1)
Dancing turtles have a compound shape
consisting of a series of triangles of
decreasing size.
Turtles march along a circle while rotating
pairwise in opposite direction, with one
exception. Does that breaking of symmetry
enhance the attractiveness of the example?
Press any key to stop the animation.
Technically: demonstrates use of compound
shapes, transformation of shapes as well as
cloning turtles. The animation is
controlled through update().
"""
from turtle import *
def stop():
global running
running = False
def main():
global running
clearscreen()
bgcolor("gray10")
tracer(False)
shape("triangle")
f = 0.793402
phi = 9.064678
s = 5
c = 1
# create compound shape
sh = Shape("compound")
for i in range(10):
shapesize(s)
p = get_shapepoly()
s *= f
c *= f
tilt(-phi)
sh.addcomponent(p, (c, 0.25, 1-c), "black")
register_shape("multitri", sh)
# create dancers
shapesize(1)
shape("multitri")
pu()
setpos(0, -200)
dancers = []
for i in range(180):
fd(7)
tilt(-4)
lt(2)
update()
if i % 12 == 0:
dancers.append(clone())
home()
# dance
running = True
onkeypress(stop)
listen()
cs = 1
while running:
ta = -4
for dancer in dancers:
dancer.fd(7)
dancer.lt(2)
dancer.tilt(ta)
ta = -4 if ta > 0 else 2
if cs < 180:
right(4)
shapesize(cs)
cs *= 1.005
update()
return "DONE!"
if __name__ == '__main__':
print(main())
mainloop()
#!/usr/bin/env python3
""" turtle-example-suite:
tdemo_tree.py
Displays a 'breadth-first-tree' - in contrast
to the classical Logo tree drawing programs,
which use a depth-first-algorithm.
Uses:
(1) a tree-generator, where the drawing is
quasi the side-effect, whereas the generator
always yields None.
(2) Turtle-cloning: At each branching point
the current pen is cloned. So in the end
there are 1024 turtles.
"""
from turtle import Turtle, mainloop
from time import clock
def tree(plist, l, a, f):
""" plist is list of pens
l is length of branch
a is half of the angle between 2 branches
f is factor by which branch is shortened
from level to level."""
if l > 3:
lst = []
for p in plist:
p.forward(l)
q = p.clone()
p.left(a)
q.right(a)
lst.append(p)
lst.append(q)
for x in tree(lst, l*f, a, f):
yield None
def maketree():
p = Turtle()
p.setundobuffer(None)
p.hideturtle()
p.speed(0)
p.getscreen().tracer(30,0)
p.left(90)
p.penup()
p.forward(-210)
p.pendown()
t = tree([p], 200, 65, 0.6375)
for x in t:
pass
print(len(p.getscreen().turtles()))
def main():
a = clock()
maketree()
b = clock()
return "done: %.2f sec." % (b-a)
if __name__ == "__main__":
msg = main()
print(msg)
mainloop()
"""turtledemo.two_canvases
Use TurtleScreen and RawTurtle to draw on two
distinct canvases in a separate window. The
new window must be separately closed in
addition to pressing the STOP button.
"""
from turtle import TurtleScreen, RawTurtle, TK
def main():
root = TK.Tk()
cv1 = TK.Canvas(root, width=300, height=200, bg="#ddffff")
cv2 = TK.Canvas(root, width=300, height=200, bg="#ffeeee")
cv1.pack()
cv2.pack()
s1 = TurtleScreen(cv1)
s1.bgcolor(0.85, 0.85, 1)
s2 = TurtleScreen(cv2)
s2.bgcolor(1, 0.85, 0.85)
p = RawTurtle(s1)
q = RawTurtle(s2)
p.color("red", (1, 0.85, 0.85))
p.width(3)
q.color("blue", (0.85, 0.85, 1))
q.width(3)
for t in p,q:
t.shape("turtle")
t.lt(36)
q.lt(180)
for t in p, q:
t.begin_fill()
for i in range(5):
for t in p, q:
t.fd(50)
t.lt(72)
for t in p,q:
t.end_fill()
t.lt(54)
t.pu()
t.bk(50)
return "EVENTLOOP"
if __name__ == '__main__':
main()
TK.mainloop() # keep window open until user closes it
""" turtle-example-suite:
tdemo_wikipedia3.py
This example is
inspired by the Wikipedia article on turtle
graphics. (See example wikipedia1 for URLs)
First we create (ne-1) (i.e. 35 in this
example) copies of our first turtle p.
Then we let them perform their steps in
parallel.
Followed by a complete undo().
"""
from turtle import Screen, Turtle, mainloop
from time import clock, sleep
def mn_eck(p, ne, sz):
turtlelist = [p]
#create ne-1 additional turtles
for i in range(1,ne):
q = p.clone()
q.rt(360.0/ne)
turtlelist.append(q)
p = q
for i in range(ne):
c = abs(ne/2.0-i)/(ne*.7)
# let those ne turtles make a step
# in parallel:
for t in turtlelist:
t.rt(360./ne)
t.pencolor(1-c,0,c)
t.fd(sz)
def main():
s = Screen()
s.bgcolor("black")
p = Turtle()
p.speed(0)
p.hideturtle()
p.pencolor("red")
p.pensize(3)
s.tracer(36,0)
at = clock()
mn_eck(p, 36, 19)
et = clock()
z1 = et-at
sleep(1)
at = clock()
while any([t.undobufferentries() for t in s.turtles()]):
for t in s.turtles():
t.undo()
et = clock()
return "runtime: %.3f sec" % (z1+et-at)
if __name__ == '__main__':
msg = main()
print(msg)
mainloop()
"""
--------------------------------------
About this viewer
--------------------------------------
Tiny demo viewer to view turtle graphics example scripts.
Quickly and dirtily assembled by Gregor Lingl.
June, 2006
For more information see: turtledemo - Help
Have fun!
"""
#!/usr/bin/env python3
"""
----------------------------------------------
turtleDemo - Help
----------------------------------------------
This document has two sections:
(1) How to use the demo viewer
(2) How to add your own demos to the demo repository
(1) How to use the demo viewer.
Select a demoscript from the example menu.
The (syntax colored) source code appears in the left
source code window. IT CANNOT BE EDITED, but ONLY VIEWED!
The demo viewer windows can be resized. The divider between text
and canvas can be moved by grabbing it with the mouse. The text font
size can be changed from the menu and with Control/Command '-'/'+'.
It can also be changed on most systems with Control-mousewheel
when the mouse is over the text.
Press START button to start the demo.
Stop execution by pressing the STOP button.
Clear screen by pressing the CLEAR button.
Restart by pressing the START button again.
SPECIAL demos, such as clock.py, are those which run EVENTDRIVEN.
Press START button to start the demo.
- Until the EVENTLOOP is entered everything works
as in an ordinary demo script.
- When the EVENTLOOP is entered, you control the
application by using the mouse and/or keys (or it's
controlled by some timer events)
To stop it you must press the STOP button; there is no other way.
While the EVENTLOOP is running, the examples menu is disabled.
- Only after having pressed the STOP button, you may
restart it or choose another example script.
* * * * * * * *
In some rare situations, conflicts may occur between events belonging
to the demo script and events belonging to the demo viewer. (They run
in the same process.) Strange behaviour may result, and in the worst
case you must close and restart the viewer.
* * * * * * * *
(2) How to add your own demos to the demo repository
- Place the file in the same directory as turtledemo/__main__.py
IMPORTANT! When imported, the demo should not modify the system
by calling functions in other modules, such as sys, tkinter, or
turtle. Global variables should be initialized in main().
- The code must contain a main() function which will
be executed by the viewer (see provided example scripts).
It may return a string which will be displayed in the Label below
the source code window (when execution has finished.)
- In order to run mydemo.py by itself, such as during development,
add the following at the end of the file:
if __name__ == '__main__':
main()
mainloop() # keep window open
python -m turtledemo.mydemo # will then run it
- If the demo is EVENT DRIVEN, main must return the string
"EVENTLOOP". This informs the demo viewer that the script is
still running and must be stopped by the user!
If an "EVENTLOOP" demo runs by itself, as with clock, which uses
ontimer, or minimal_hanoi, which loops by recursion, then the
code should catch the turtle.Terminator exception that will be
raised when the user presses the STOP button. (Paint is not such
a demo; it only acts in response to mouse clicks and movements.)
"""
import sys
import os
from tkinter import *
from idlelib.Percolator import Percolator
from idlelib.ColorDelegator import ColorDelegator
from idlelib.textView import view_text
from turtledemo import __doc__ as about_turtledemo
import turtle
import time
demo_dir = os.path.dirname(os.path.abspath(__file__))
darwin = sys.platform == 'darwin'
STARTUP = 1
READY = 2
RUNNING = 3
DONE = 4
EVENTDRIVEN = 5
menufont = ("Arial", 12, NORMAL)
btnfont = ("Arial", 12, 'bold')
txtfont = ['Lucida Console', 10, 'normal']
MINIMUM_FONT_SIZE = 6
MAXIMUM_FONT_SIZE = 100
font_sizes = [8, 9, 10, 11, 12, 14, 18, 20, 22, 24, 30]
def getExampleEntries():
return [entry[:-3] for entry in os.listdir(demo_dir) if
entry.endswith(".py") and entry[0] != '_']
help_entries = ( # (help_label, help_doc)
('Turtledemo help', __doc__),
('About turtledemo', about_turtledemo),
('About turtle module', turtle.__doc__),
)
class DemoWindow(object):
def __init__(self, filename=None):
self.root = root = turtle._root = Tk()
root.title('Python turtle-graphics examples')
root.wm_protocol("WM_DELETE_WINDOW", self._destroy)
if darwin:
import subprocess
# Make sure we are the currently activated OS X application
# so that our menu bar appears.
p = subprocess.Popen(
[
'osascript',
'-e', 'tell application "System Events"',
'-e', 'set frontmost of the first process whose '
'unix id is {} to true'.format(os.getpid()),
'-e', 'end tell',
],
stderr=subprocess.DEVNULL,
stdout=subprocess.DEVNULL,)
root.grid_rowconfigure(0, weight=1)
root.grid_columnconfigure(0, weight=1)
root.grid_columnconfigure(1, minsize=90, weight=1)
root.grid_columnconfigure(2, minsize=90, weight=1)
root.grid_columnconfigure(3, minsize=90, weight=1)
self.mBar = Menu(root, relief=RAISED, borderwidth=2)
self.mBar.add_cascade(menu=self.makeLoadDemoMenu(self.mBar),
label='Examples', underline=0)
self.mBar.add_cascade(menu=self.makeFontMenu(self.mBar),
label='Fontsize', underline=0)
self.mBar.add_cascade(menu=self.makeHelpMenu(self.mBar),
label='Help', underline=0)
root['menu'] = self.mBar
pane = PanedWindow(orient=HORIZONTAL, sashwidth=5,
sashrelief=SOLID, bg='#ddd')
pane.add(self.makeTextFrame(pane))
pane.add(self.makeGraphFrame(pane))
pane.grid(row=0, columnspan=4, sticky='news')
self.output_lbl = Label(root, height= 1, text=" --- ", bg="#ddf",
font=("Arial", 16, 'normal'), borderwidth=2,
relief=RIDGE)
self.start_btn = Button(root, text=" START ", font=btnfont,
fg="white", disabledforeground = "#fed",
command=self.startDemo)
self.stop_btn = Button(root, text=" STOP ", font=btnfont,
fg="white", disabledforeground = "#fed",
command=self.stopIt)
self.clear_btn = Button(root, text=" CLEAR ", font=btnfont,
fg="white", disabledforeground="#fed",
command = self.clearCanvas)
self.output_lbl.grid(row=1, column=0, sticky='news', padx=(0,5))
self.start_btn.grid(row=1, column=1, sticky='ew')
self.stop_btn.grid(row=1, column=2, sticky='ew')
self.clear_btn.grid(row=1, column=3, sticky='ew')
Percolator(self.text).insertfilter(ColorDelegator())
self.dirty = False
self.exitflag = False
if filename:
self.loadfile(filename)
self.configGUI(DISABLED, DISABLED, DISABLED,
"Choose example from menu", "black")
self.state = STARTUP
def onResize(self, event):
cwidth = self._canvas.winfo_width()
cheight = self._canvas.winfo_height()
self._canvas.xview_moveto(0.5*(self.canvwidth-cwidth)/self.canvwidth)
self._canvas.yview_moveto(0.5*(self.canvheight-cheight)/self.canvheight)
def makeTextFrame(self, root):
self.text_frame = text_frame = Frame(root)
self.text = text = Text(text_frame, name='text', padx=5,
wrap='none', width=45)
self.vbar = vbar = Scrollbar(text_frame, name='vbar')
vbar['command'] = text.yview
vbar.pack(side=LEFT, fill=Y)
self.hbar = hbar = Scrollbar(text_frame, name='hbar', orient=HORIZONTAL)
hbar['command'] = text.xview
hbar.pack(side=BOTTOM, fill=X)
text['yscrollcommand'] = vbar.set
text['xscrollcommand'] = hbar.set
text['font'] = tuple(txtfont)
shortcut = 'Command' if darwin else 'Control'
text.bind_all('<%s-minus>' % shortcut, self.decrease_size)
text.bind_all('<%s-underscore>' % shortcut, self.decrease_size)
text.bind_all('<%s-equal>' % shortcut, self.increase_size)
text.bind_all('<%s-plus>' % shortcut, self.increase_size)
text.bind('<Control-MouseWheel>', self.update_mousewheel)
text.bind('<Control-Button-4>', self.increase_size)
text.bind('<Control-Button-5>', self.decrease_size)
text.pack(side=LEFT, fill=BOTH, expand=1)
return text_frame
def makeGraphFrame(self, root):
turtle._Screen._root = root
self.canvwidth = 1000
self.canvheight = 800
turtle._Screen._canvas = self._canvas = canvas = turtle.ScrolledCanvas(
root, 800, 600, self.canvwidth, self.canvheight)
canvas.adjustScrolls()
canvas._rootwindow.bind('<Configure>', self.onResize)
canvas._canvas['borderwidth'] = 0
self.screen = _s_ = turtle.Screen()
turtle.TurtleScreen.__init__(_s_, _s_._canvas)
self.scanvas = _s_._canvas
turtle.RawTurtle.screens = [_s_]
return canvas
def set_txtsize(self, size):
txtfont[1] = size
self.text['font'] = tuple(txtfont)
self.output_lbl['text'] = 'Font size %d' % size
def decrease_size(self, dummy=None):
self.set_txtsize(max(txtfont[1] - 1, MINIMUM_FONT_SIZE))
return 'break'
def increase_size(self, dummy=None):
self.set_txtsize(min(txtfont[1] + 1, MAXIMUM_FONT_SIZE))
return 'break'
def update_mousewheel(self, event):
# For wheel up, event.delta = 120 on Windows, -1 on darwin.
# X11 sends a Control-Button-4 event instead.
if (event.delta < 0) == (not darwin):
return self.decrease_size()
else:
return self.increase_size()
def configGUI(self, start, stop, clear, txt="", color="blue"):
self.start_btn.config(state=start,
bg="#d00" if start == NORMAL else "#fca")
self.stop_btn.config(state=stop,
bg="#d00" if stop == NORMAL else "#fca")
self.clear_btn.config(state=clear,
bg="#d00" if clear == NORMAL else"#fca")
self.output_lbl.config(text=txt, fg=color)
def makeLoadDemoMenu(self, master):
menu = Menu(master)
for entry in getExampleEntries():
def load(entry=entry):
self.loadfile(entry)
menu.add_command(label=entry, underline=0,
font=menufont, command=load)
return menu
def makeFontMenu(self, master):
menu = Menu(master)
menu.add_command(label="Decrease (C-'-')", command=self.decrease_size,
font=menufont)
menu.add_command(label="Increase (C-'+')", command=self.increase_size,
font=menufont)
menu.add_separator()
for size in font_sizes:
def resize(size=size):
self.set_txtsize(size)
menu.add_command(label=str(size), underline=0,
font=menufont, command=resize)
return menu
def makeHelpMenu(self, master):
menu = Menu(master)
for help_label, help_file in help_entries:
def show(help_label=help_label, help_file=help_file):
view_text(self.root, help_label, help_file)
menu.add_command(label=help_label, font=menufont, command=show)
return menu
def refreshCanvas(self):
if self.dirty:
self.screen.clear()
self.dirty=False
def loadfile(self, filename):
self.clearCanvas()
turtle.TurtleScreen._RUNNING = False
modname = 'turtledemo.' + filename
__import__(modname)
self.module = sys.modules[modname]
with open(self.module.__file__, 'r') as f:
chars = f.read()
self.text.delete("1.0", "end")
self.text.insert("1.0", chars)
self.root.title(filename + " - a Python turtle graphics example")
self.configGUI(NORMAL, DISABLED, DISABLED,
"Press start button", "red")
self.state = READY
def startDemo(self):
self.refreshCanvas()
self.dirty = True
turtle.TurtleScreen._RUNNING = True
self.configGUI(DISABLED, NORMAL, DISABLED,
"demo running...", "black")
self.screen.clear()
self.screen.mode("standard")
self.state = RUNNING
try:
result = self.module.main()
if result == "EVENTLOOP":
self.state = EVENTDRIVEN
else:
self.state = DONE
except turtle.Terminator:
if self.root is None:
return
self.state = DONE
result = "stopped!"
if self.state == DONE:
self.configGUI(NORMAL, DISABLED, NORMAL,
result)
elif self.state == EVENTDRIVEN:
self.exitflag = True
self.configGUI(DISABLED, NORMAL, DISABLED,
"use mouse/keys or STOP", "red")
def clearCanvas(self):
self.refreshCanvas()
self.screen._delete("all")
self.scanvas.config(cursor="")
self.configGUI(NORMAL, DISABLED, DISABLED)
def stopIt(self):
if self.exitflag:
self.clearCanvas()
self.exitflag = False
self.configGUI(NORMAL, DISABLED, DISABLED,
"STOPPED!", "red")
turtle.TurtleScreen._RUNNING = False
def _destroy(self):
turtle.TurtleScreen._RUNNING = False
self.root.destroy()
self.root = None
def main():
demo = DemoWindow()
demo.root.mainloop()
if __name__ == '__main__':
main()
"""Test case implementation"""
import sys
import functools
import difflib
import logging
import pprint
import re
import warnings
import collections
import contextlib
import traceback
from . import result
from .util import (strclass, safe_repr, _count_diff_all_purpose,
_count_diff_hashable, _common_shorten_repr)
__unittest = True
DIFF_OMITTED = ('\nDiff is %s characters long. '
'Set self.maxDiff to None to see it.')
class SkipTest(Exception):
"""
Raise this exception in a test to skip it.
Usually you can use TestCase.skipTest() or one of the skipping decorators
instead of raising this directly.
"""
class _ShouldStop(Exception):
"""
The test should stop.
"""
class _UnexpectedSuccess(Exception):
"""
The test was supposed to fail, but it didn't!
"""
class _Outcome(object):
def __init__(self, result=None):
self.expecting_failure = False
self.result = result
self.result_supports_subtests = hasattr(result, "addSubTest")
self.success = True
self.skipped = []
self.expectedFailure = None
self.errors = []
@contextlib.contextmanager
def testPartExecutor(self, test_case, isTest=False):
old_success = self.success
self.success = True
try:
yield
except KeyboardInterrupt:
raise
except SkipTest as e:
self.success = False
self.skipped.append((test_case, str(e)))
except _ShouldStop:
pass
except:
exc_info = sys.exc_info()
if self.expecting_failure:
self.expectedFailure = exc_info
else:
self.success = False
self.errors.append((test_case, exc_info))
# explicitly break a reference cycle:
# exc_info -> frame -> exc_info
exc_info = None
else:
if self.result_supports_subtests and self.success:
self.errors.append((test_case, None))
finally:
self.success = self.success and old_success
def _id(obj):
return obj
def skip(reason):
"""
Unconditionally skip a test.
"""
def decorator(test_item):
if not isinstance(test_item, type):
@functools.wraps(test_item)
def skip_wrapper(*args, **kwargs):
raise SkipTest(reason)
test_item = skip_wrapper
test_item.__unittest_skip__ = True
test_item.__unittest_skip_why__ = reason
return test_item
return decorator
def skipIf(condition, reason):
"""
Skip a test if the condition is true.
"""
if condition:
return skip(reason)
return _id
def skipUnless(condition, reason):
"""
Skip a test unless the condition is true.
"""
if not condition:
return skip(reason)
return _id
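# Usage sketch for the skipping decorators above (illustrative only;
# the test methods and conditions are placeholders):
#
#     @skip("demonstrating skipping")
#     def test_nothing(self): ...
#
#     @skipIf(sys.platform == 'win32', 'does not run on Windows')
#     def test_posix_only(self): ...
#
#     @skipUnless(hasattr(os, 'symlink'), 'requires os.symlink')
#     def test_symlinks(self): ...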
def expectedFailure(test_item):
test_item.__unittest_expecting_failure__ = True
return test_item
class _BaseTestCaseContext:
def __init__(self, test_case):
self.test_case = test_case
def _raiseFailure(self, standardMsg):
msg = self.test_case._formatMessage(self.msg, standardMsg)
raise self.test_case.failureException(msg)
class _AssertRaisesBaseContext(_BaseTestCaseContext):
def __init__(self, expected, test_case, callable_obj=None,
expected_regex=None):
_BaseTestCaseContext.__init__(self, test_case)
self.expected = expected
self.test_case = test_case
if callable_obj is not None:
try:
self.obj_name = callable_obj.__name__
except AttributeError:
self.obj_name = str(callable_obj)
else:
self.obj_name = None
if expected_regex is not None:
expected_regex = re.compile(expected_regex)
self.expected_regex = expected_regex
self.msg = None
def handle(self, name, callable_obj, args, kwargs):
"""
If callable_obj is None, assertRaises/Warns is being used as a
context manager, so check for a 'msg' kwarg and return self.
If callable_obj is not None, call it passing args and kwargs.
"""
if callable_obj is None:
self.msg = kwargs.pop('msg', None)
return self
with self:
callable_obj(*args, **kwargs)
class _AssertRaisesContext(_AssertRaisesBaseContext):
"""A context manager used to implement TestCase.assertRaises* methods."""
def __enter__(self):
return self
def __exit__(self, exc_type, exc_value, tb):
if exc_type is None:
try:
exc_name = self.expected.__name__
except AttributeError:
exc_name = str(self.expected)
if self.obj_name:
self._raiseFailure("{} not raised by {}".format(exc_name,
self.obj_name))
else:
self._raiseFailure("{} not raised".format(exc_name))
else:
traceback.clear_frames(tb)
if issubclass(exc_type, self.expected):
# store exception, without traceback, for later retrieval
self.exception = exc_value.with_traceback(None)
elif (hasattr(exc_value, 'clsException') and isinstance(exc_value.clsException, self.expected)):
# IronPython: allow .NET exceptions
self.exception = exc_value.clsException
else:
# let unexpected exceptions pass through
return False
if self.expected_regex is None:
return True
expected_regex = self.expected_regex
if not expected_regex.search(str(exc_value)):
self._raiseFailure('"{}" does not match "{}"'.format(
expected_regex.pattern, str(exc_value)))
return True
class _AssertWarnsContext(_AssertRaisesBaseContext):
"""A context manager used to implement TestCase.assertWarns* methods."""
def __enter__(self):
# The __warningregistry__'s need to be in a pristine state for tests
# to work properly.
for v in sys.modules.values():
if getattr(v, '__warningregistry__', None):
v.__warningregistry__ = {}
self.warnings_manager = warnings.catch_warnings(record=True)
self.warnings = self.warnings_manager.__enter__()
warnings.simplefilter("always", self.expected)
return self
def __exit__(self, exc_type, exc_value, tb):
self.warnings_manager.__exit__(exc_type, exc_value, tb)
if exc_type is not None:
# let unexpected exceptions pass through
return
try:
exc_name = self.expected.__name__
except AttributeError:
exc_name = str(self.expected)
first_matching = None
for m in self.warnings:
w = m.message
if not isinstance(w, self.expected):
continue
if first_matching is None:
first_matching = w
if (self.expected_regex is not None and
not self.expected_regex.search(str(w))):
continue
# store warning for later retrieval
self.warning = w
self.filename = m.filename
self.lineno = m.lineno
return
# Now we simply try to choose a helpful failure message
if first_matching is not None:
self._raiseFailure('"{}" does not match "{}"'.format(
self.expected_regex.pattern, str(first_matching)))
if self.obj_name:
self._raiseFailure("{} not triggered by {}".format(exc_name,
self.obj_name))
else:
self._raiseFailure("{} not triggered".format(exc_name))
_LoggingWatcher = collections.namedtuple("_LoggingWatcher",
["records", "output"])
class _CapturingHandler(logging.Handler):
"""
A logging handler capturing all (raw and formatted) logging output.
"""
def __init__(self):
logging.Handler.__init__(self)
self.watcher = _LoggingWatcher([], [])
def flush(self):
pass
def emit(self, record):
self.watcher.records.append(record)
msg = self.format(record)
self.watcher.output.append(msg)
class _AssertLogsContext(_BaseTestCaseContext):
"""A context manager used to implement TestCase.assertLogs()."""
LOGGING_FORMAT = "%(levelname)s:%(name)s:%(message)s"
def __init__(self, test_case, logger_name, level):
_BaseTestCaseContext.__init__(self, test_case)
self.logger_name = logger_name
if level:
self.level = logging._nameToLevel.get(level, level)
else:
self.level = logging.INFO
self.msg = None
def __enter__(self):
if isinstance(self.logger_name, logging.Logger):
logger = self.logger = self.logger_name
else:
logger = self.logger = logging.getLogger(self.logger_name)
formatter = logging.Formatter(self.LOGGING_FORMAT)
handler = _CapturingHandler()
handler.setFormatter(formatter)
self.watcher = handler.watcher
self.old_handlers = logger.handlers[:]
self.old_level = logger.level
self.old_propagate = logger.propagate
logger.handlers = [handler]
logger.setLevel(self.level)
logger.propagate = False
return handler.watcher
def __exit__(self, exc_type, exc_value, tb):
self.logger.handlers = self.old_handlers
self.logger.propagate = self.old_propagate
self.logger.setLevel(self.old_level)
if exc_type is not None:
# let unexpected exceptions pass through
return False
if len(self.watcher.records) == 0:
self._raiseFailure(
"no logs of level {} or higher triggered on {}"
.format(logging.getLevelName(self.level), self.logger.name))
class TestCase(object):
"""A class whose instances are single test cases.
By default, the test code itself should be placed in a method named
'runTest'.
If the fixture may be used for many test cases, create as
many test methods as are needed. When instantiating such a TestCase
subclass, specify in the constructor arguments the name of the test method
that the instance is to execute.
Test authors should subclass TestCase for their own tests. Construction
and deconstruction of the test's environment ('fixture') can be
implemented by overriding the 'setUp' and 'tearDown' methods respectively.
If it is necessary to override the __init__ method, the base class
__init__ method must always be called. It is important that subclasses
should not change the signature of their __init__ method, since instances
of the classes are instantiated automatically by parts of the framework
in order to be run.
When subclassing TestCase, you can set these attributes:
* failureException: determines which exception will be raised when
the instance's assertion methods fail; test methods raising this
exception will be deemed to have 'failed' rather than 'errored'.
* longMessage: determines whether long messages (including repr of
objects used in assert methods) will be printed on failure in *addition*
to any explicit message passed.
* maxDiff: sets the maximum length of a diff in failure messages
by assert methods using difflib. It is looked up as an instance
attribute so can be configured by individual tests if required.
"""
failureException = AssertionError
longMessage = True
maxDiff = 80*8
# If a string is longer than _diffThreshold, use normal comparison instead
# of difflib. See #11763.
_diffThreshold = 2**16
# Attribute used by TestSuite for classSetUp
_classSetupFailed = False
def __init__(self, methodName='runTest'):
"""Create an instance of the class that will use the named test
method when executed. Raises a ValueError if the instance does
not have a method with the specified name.
"""
self._testMethodName = methodName
self._outcome = None
self._testMethodDoc = 'No test'
try:
testMethod = getattr(self, methodName)
except AttributeError:
if methodName != 'runTest':
# we allow instantiation with no explicit method name
# but not an *incorrect* or missing method name
raise ValueError("no such test method in %s: %s" %
(self.__class__, methodName))
else:
self._testMethodDoc = testMethod.__doc__
self._cleanups = []
self._subtest = None
# Map types to custom assertEqual functions that will compare
# instances of said type in more detail to generate a more useful
# error message.
self._type_equality_funcs = {}
self.addTypeEqualityFunc(dict, 'assertDictEqual')
self.addTypeEqualityFunc(list, 'assertListEqual')
self.addTypeEqualityFunc(tuple, 'assertTupleEqual')
self.addTypeEqualityFunc(set, 'assertSetEqual')
self.addTypeEqualityFunc(frozenset, 'assertSetEqual')
self.addTypeEqualityFunc(str, 'assertMultiLineEqual')
def addTypeEqualityFunc(self, typeobj, function):
"""Add a type specific assertEqual style function to compare a type.
This method is for use by TestCase subclasses that need to register
their own type equality functions to provide nicer error messages.
Args:
typeobj: The data type to call this function on when both values
are of the same type in assertEqual().
function: The callable taking two arguments and an optional
msg= argument that raises self.failureException with a
useful error message when the two arguments are not equal.
"""
self._type_equality_funcs[typeobj] = function
def addCleanup(self, function, *args, **kwargs):
"""Add a function, with arguments, to be called when the test is
completed. Functions added are called on a LIFO basis and are
called after tearDown on test failure or success.
Cleanup items are called even if setUp fails (unlike tearDown)."""
self._cleanups.append((function, args, kwargs))
def setUp(self):
"Hook method for setting up the test fixture before exercising it."
pass
def tearDown(self):
"Hook method for deconstructing the test fixture after testing it."
pass
@classmethod
def setUpClass(cls):
"Hook method for setting up class fixture before running tests in the class."
@classmethod
def tearDownClass(cls):
"Hook method for deconstructing the class fixture after running all tests in the class."
def countTestCases(self):
return 1
def defaultTestResult(self):
return result.TestResult()
def shortDescription(self):
"""Returns a one-line description of the test, or None if no
description has been provided.
The default implementation of this method returns the first line of
the specified test method's docstring.
"""
doc = self._testMethodDoc
return doc and doc.split("\n")[0].strip() or None
def id(self):
return "%s.%s" % (strclass(self.__class__), self._testMethodName)
def __eq__(self, other):
if type(self) is not type(other):
return NotImplemented
return self._testMethodName == other._testMethodName
def __hash__(self):
return hash((type(self), self._testMethodName))
def __str__(self):
return "%s (%s)" % (self._testMethodName, strclass(self.__class__))
def __repr__(self):
return "<%s testMethod=%s>" % \
(strclass(self.__class__), self._testMethodName)
def _addSkip(self, result, test_case, reason):
addSkip = getattr(result, 'addSkip', None)
if addSkip is not None:
addSkip(test_case, reason)
else:
warnings.warn("TestResult has no addSkip method, skips not reported",
RuntimeWarning, 2)
result.addSuccess(test_case)
@contextlib.contextmanager
def subTest(self, msg=None, **params):
"""Return a context manager that will return the enclosed block
of code in a subtest identified by the optional message and
keyword parameters. A failure in the subtest marks the test
case as failed but resumes execution at the end of the enclosed
block, allowing further test code to be executed.
"""
if not self._outcome.result_supports_subtests:
yield
return
parent = self._subtest
if parent is None:
params_map = collections.ChainMap(params)
else:
params_map = parent.params.new_child(params)
self._subtest = _SubTest(self, msg, params_map)
try:
with self._outcome.testPartExecutor(self._subtest, isTest=True):
yield
if not self._outcome.success:
result = self._outcome.result
if result is not None and result.failfast:
raise _ShouldStop
elif self._outcome.expectedFailure:
# If the test is expecting a failure, we really want to
# stop now and register the expected failure.
raise _ShouldStop
finally:
self._subtest = parent
def _feedErrorsToResult(self, result, errors):
for test, exc_info in errors:
if isinstance(test, _SubTest):
result.addSubTest(test.test_case, test, exc_info)
elif exc_info is not None:
if issubclass(exc_info[0], self.failureException):
result.addFailure(test, exc_info)
else:
result.addError(test, exc_info)
def _addExpectedFailure(self, result, exc_info):
try:
addExpectedFailure = result.addExpectedFailure
except AttributeError:
warnings.warn("TestResult has no addExpectedFailure method, reporting as passes",
RuntimeWarning)
result.addSuccess(self)
else:
addExpectedFailure(self, exc_info)
def _addUnexpectedSuccess(self, result):
try:
addUnexpectedSuccess = result.addUnexpectedSuccess
except AttributeError:
warnings.warn("TestResult has no addUnexpectedSuccess method, reporting as failure",
RuntimeWarning)
# We need to pass an actual exception and traceback to addFailure,
# otherwise the legacy result can choke.
try:
raise _UnexpectedSuccess from None
except _UnexpectedSuccess:
result.addFailure(self, sys.exc_info())
else:
addUnexpectedSuccess(self)
def run(self, result=None):
orig_result = result
if result is None:
result = self.defaultTestResult()
startTestRun = getattr(result, 'startTestRun', None)
if startTestRun is not None:
startTestRun()
result.startTest(self)
testMethod = getattr(self, self._testMethodName)
if (getattr(self.__class__, "__unittest_skip__", False) or
getattr(testMethod, "__unittest_skip__", False)):
# If the class or method was skipped.
try:
skip_why = (getattr(self.__class__, '__unittest_skip_why__', '')
or getattr(testMethod, '__unittest_skip_why__', ''))
self._addSkip(result, self, skip_why)
finally:
result.stopTest(self)
return
expecting_failure_method = getattr(testMethod,
"__unittest_expecting_failure__", False)
expecting_failure_class = getattr(self,
"__unittest_expecting_failure__", False)
expecting_failure = expecting_failure_class or expecting_failure_method
outcome = _Outcome(result)
try:
self._outcome = outcome
with outcome.testPartExecutor(self):
self.setUp()
if outcome.success:
outcome.expecting_failure = expecting_failure
with outcome.testPartExecutor(self, isTest=True):
testMethod()
outcome.expecting_failure = False
with outcome.testPartExecutor(self):
self.tearDown()
self.doCleanups()
for test, reason in outcome.skipped:
self._addSkip(result, test, reason)
self._feedErrorsToResult(result, outcome.errors)
if outcome.success:
if expecting_failure:
if outcome.expectedFailure:
self._addExpectedFailure(result, outcome.expectedFailure)
else:
self._addUnexpectedSuccess(result)
else:
result.addSuccess(self)
return result
finally:
result.stopTest(self)
if orig_result is None:
stopTestRun = getattr(result, 'stopTestRun', None)
if stopTestRun is not None:
stopTestRun()
# explicitly break reference cycles:
# outcome.errors -> frame -> outcome -> outcome.errors
# outcome.expectedFailure -> frame -> outcome -> outcome.expectedFailure
outcome.errors.clear()
outcome.expectedFailure = None
# clear the outcome, no more needed
self._outcome = None
def doCleanups(self):
"""Execute all cleanup functions. Normally called for you after
tearDown."""
outcome = self._outcome or _Outcome()
while self._cleanups:
function, args, kwargs = self._cleanups.pop()
with outcome.testPartExecutor(self):
function(*args, **kwargs)
# return this for backwards compatibility
# even though we no longer use it internally
return outcome.success
def __call__(self, *args, **kwds):
return self.run(*args, **kwds)
def debug(self):
"""Run the test without collecting errors in a TestResult"""
self.setUp()
getattr(self, self._testMethodName)()
self.tearDown()
while self._cleanups:
function, args, kwargs = self._cleanups.pop(-1)
function(*args, **kwargs)
def skipTest(self, reason):
"""Skip this test."""
raise SkipTest(reason)
def fail(self, msg=None):
"""Fail immediately, with the given message."""
raise self.failureException(msg)
def assertFalse(self, expr, msg=None):
"""Check that the expression is false."""
if expr:
msg = self._formatMessage(msg, "%s is not false" % safe_repr(expr))
raise self.failureException(msg)
def assertTrue(self, expr, msg=None):
"""Check that the expression is true."""
if not expr:
msg = self._formatMessage(msg, "%s is not true" % safe_repr(expr))
raise self.failureException(msg)
def _formatMessage(self, msg, standardMsg):
"""Honour the longMessage attribute when generating failure messages.
If longMessage is False this means:
* Use only an explicit message if it is provided
* Otherwise use the standard message for the assert
If longMessage is True:
* Use the standard message
* If an explicit message is provided, plus ' : ' and the explicit message
"""
if not self.longMessage:
return msg or standardMsg
if msg is None:
return standardMsg
try:
# don't switch to '{}' formatting in Python 2.X
# it changes the way unicode input is handled
return '%s : %s' % (standardMsg, msg)
except UnicodeDecodeError:
return '%s : %s' % (safe_repr(standardMsg), safe_repr(msg))
def assertRaises(self, excClass, callableObj=None, *args, **kwargs):
"""Fail unless an exception of class excClass is raised
by callableObj when invoked with arguments args and keyword
arguments kwargs. If a different type of exception is
raised, it will not be caught, and the test case will be
deemed to have suffered an error, exactly as for an
unexpected exception.
If called with callableObj omitted or None, will return a
context object used like this::
with self.assertRaises(SomeException):
do_something()
An optional keyword argument 'msg' can be provided when assertRaises
is used as a context object.
The context manager keeps a reference to the exception as
the 'exception' attribute. This allows you to inspect the
exception after the assertion::
with self.assertRaises(SomeException) as cm:
do_something()
the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)
"""
context = _AssertRaisesContext(excClass, self, callableObj)
return context.handle('assertRaises', callableObj, args, kwargs)
def assertWarns(self, expected_warning, callable_obj=None, *args, **kwargs):
"""Fail unless a warning of class warnClass is triggered
by callable_obj when invoked with arguments args and keyword
arguments kwargs. If a different type of warning is
triggered, it will not be handled: depending on the other
warning filtering rules in effect, it might be silenced, printed
out, or raised as an exception.
If called with callable_obj omitted or None, will return a
context object used like this::
with self.assertWarns(SomeWarning):
do_something()
An optional keyword argument 'msg' can be provided when assertWarns
is used as a context object.
The context manager keeps a reference to the first matching
warning as the 'warning' attribute; similarly, the 'filename'
and 'lineno' attributes give you information about the line
of Python code from which the warning was triggered.
This allows you to inspect the warning after the assertion::
with self.assertWarns(SomeWarning) as cm:
do_something()
the_warning = cm.warning
self.assertEqual(the_warning.some_attribute, 147)
"""
context = _AssertWarnsContext(expected_warning, self, callable_obj)
return context.handle('assertWarns', callable_obj, args, kwargs)
def assertLogs(self, logger=None, level=None):
"""Fail unless a log message of level *level* or higher is emitted
on *logger_name* or its children. If omitted, *level* defaults to
INFO and *logger* defaults to the root logger.
This method must be used as a context manager, and will yield
a recording object with two attributes: `output` and `records`.
At the end of the context manager, the `output` attribute will
be a list of the matching formatted log messages and the
`records` attribute will be a list of the corresponding LogRecord
objects.
Example::
with self.assertLogs('foo', level='INFO') as cm:
logging.getLogger('foo').info('first message')
logging.getLogger('foo.bar').error('second message')
self.assertEqual(cm.output, ['INFO:foo:first message',
'ERROR:foo.bar:second message'])
"""
return _AssertLogsContext(self, logger, level)
def _getAssertEqualityFunc(self, first, second):
"""Get a detailed comparison function for the types of the two args.
Returns: A callable accepting (first, second, msg=None) that will
raise a failure exception if first != second with a useful human
readable error message for those types.
"""
#
# NOTE(gregory.p.smith): I considered isinstance(first, type(second))
# and vice versa. I opted for the conservative approach in case
# subclasses are not intended to be compared in detail to their super
# class instances using a type equality func. This means testing
# subtypes won't automagically use the detailed comparison. Callers
# should use their type specific assertSpamEqual method to compare
# subclasses if the detailed comparison is desired and appropriate.
# See the discussion in http://bugs.python.org/issue2578.
#
if type(first) is type(second):
asserter = self._type_equality_funcs.get(type(first))
if asserter is not None:
if isinstance(asserter, str):
asserter = getattr(self, asserter)
return asserter
return self._baseAssertEqual
def _baseAssertEqual(self, first, second, msg=None):
"""The default assertEqual implementation, not type specific."""
if not first == second:
standardMsg = '%s != %s' % _common_shorten_repr(first, second)
msg = self._formatMessage(msg, standardMsg)
raise self.failureException(msg)
def assertEqual(self, first, second, msg=None):
"""Fail if the two objects are unequal as determined by the '=='
operator.
"""
assertion_func = self._getAssertEqualityFunc(first, second)
assertion_func(first, second, msg=msg)
def assertNotEqual(self, first, second, msg=None):
"""Fail if the two objects are equal as determined by the '!='
operator.
"""
if not first != second:
msg = self._formatMessage(msg, '%s == %s' % (safe_repr(first),
safe_repr(second)))
raise self.failureException(msg)
def assertAlmostEqual(self, first, second, places=None, msg=None,
delta=None):
"""Fail if the two objects are unequal as determined by their
difference rounded to the given number of decimal places
(default 7) and comparing to zero, or by comparing that the
difference between the two objects is more than the given delta.
Note that decimal places (from zero) are usually not the same
as significant digits (measured from the most significant digit).
If the two objects compare equal then they will automatically
compare almost equal.
"""
if first == second:
# shortcut
return
if delta is not None and places is not None:
raise TypeError("specify delta or places not both")
if delta is not None:
if abs(first - second) <= delta:
return
standardMsg = '%s != %s within %s delta' % (safe_repr(first),
safe_repr(second),
safe_repr(delta))
else:
if places is None:
places = 7
if round(abs(second-first), places) == 0:
return
standardMsg = '%s != %s within %r places' % (safe_repr(first),
safe_repr(second),
places)
msg = self._formatMessage(msg, standardMsg)
raise self.failureException(msg)
def assertNotAlmostEqual(self, first, second, places=None, msg=None,
delta=None):
"""Fail if the two objects are equal as determined by their
difference rounded to the given number of decimal places
(default 7) and comparing to zero, or by comparing that the
difference between the two objects is less than the given delta.
Note that decimal places (from zero) are usually not the same
as significant digits (measured from the most significant digit).
Objects that are equal automatically fail.
"""
if delta is not None and places is not None:
raise TypeError("specify delta or places not both")
if delta is not None:
if not (first == second) and abs(first - second) > delta:
return
standardMsg = '%s == %s within %s delta' % (safe_repr(first),
safe_repr(second),
safe_repr(delta))
else:
if places is None:
places = 7
if not (first == second) and round(abs(second-first), places) != 0:
return
standardMsg = '%s == %s within %r places' % (safe_repr(first),
safe_repr(second),
places)
msg = self._formatMessage(msg, standardMsg)
raise self.failureException(msg)
def assertSequenceEqual(self, seq1, seq2, msg=None, seq_type=None):
"""An equality assertion for ordered sequences (like lists and tuples).
For the purposes of this function, a valid ordered sequence type is one
which can be indexed, has a length, and has an equality operator.
Args:
seq1: The first sequence to compare.
seq2: The second sequence to compare.
seq_type: The expected datatype of the sequences, or None if no
datatype should be enforced.
msg: Optional message to use on failure instead of a list of
differences.
"""
if seq_type is not None:
seq_type_name = seq_type.__name__
if not isinstance(seq1, seq_type):
raise self.failureException('First sequence is not a %s: %s'
% (seq_type_name, safe_repr(seq1)))
if not isinstance(seq2, seq_type):
raise self.failureException('Second sequence is not a %s: %s'
% (seq_type_name, safe_repr(seq2)))
else:
seq_type_name = "sequence"
differing = None
try:
len1 = len(seq1)
except (TypeError, NotImplementedError):
differing = 'First %s has no length. Non-sequence?' % (
seq_type_name)
if differing is None:
try:
len2 = len(seq2)
except (TypeError, NotImplementedError):
differing = 'Second %s has no length. Non-sequence?' % (
seq_type_name)
if differing is None:
if seq1 == seq2:
return
differing = '%ss differ: %s != %s\n' % (
(seq_type_name.capitalize(),) +
_common_shorten_repr(seq1, seq2))
for i in range(min(len1, len2)):
try:
item1 = seq1[i]
except (TypeError, IndexError, NotImplementedError):
differing += ('\nUnable to index element %d of first %s\n' %
(i, seq_type_name))
break
try:
item2 = seq2[i]
except (TypeError, IndexError, NotImplementedError):
differing += ('\nUnable to index element %d of second %s\n' %
(i, seq_type_name))
break
if item1 != item2:
differing += ('\nFirst differing element %d:\n%s\n%s\n' %
(i, item1, item2))
break
else:
if (len1 == len2 and seq_type is None and
type(seq1) != type(seq2)):
# The sequences are the same, but have differing types.
return
if len1 > len2:
differing += ('\nFirst %s contains %d additional '
'elements.\n' % (seq_type_name, len1 - len2))
try:
differing += ('First extra element %d:\n%s\n' %
(len2, seq1[len2]))
except (TypeError, IndexError, NotImplementedError):
differing += ('Unable to index element %d '
'of first %s\n' % (len2, seq_type_name))
elif len1 < len2:
differing += ('\nSecond %s contains %d additional '
'elements.\n' % (seq_type_name, len2 - len1))
try:
differing += ('First extra element %d:\n%s\n' %
(len1, seq2[len1]))
except (TypeError, IndexError, NotImplementedError):
differing += ('Unable to index element %d '
'of second %s\n' % (len1, seq_type_name))
standardMsg = differing
diffMsg = '\n' + '\n'.join(
difflib.ndiff(pprint.pformat(seq1).splitlines(),
pprint.pformat(seq2).splitlines()))
standardMsg = self._truncateMessage(standardMsg, diffMsg)
msg = self._formatMessage(msg, standardMsg)
self.fail(msg)
def _truncateMessage(self, message, diff):
max_diff = self.maxDiff
if max_diff is None or len(diff) <= max_diff:
return message + diff
return message + (DIFF_OMITTED % len(diff))
def assertListEqual(self, list1, list2, msg=None):
"""A list-specific equality assertion.
Args:
list1: The first list to compare.
list2: The second list to compare.
msg: Optional message to use on failure instead of a list of
differences.
"""
self.assertSequenceEqual(list1, list2, msg, seq_type=list)
def assertTupleEqual(self, tuple1, tuple2, msg=None):
"""A tuple-specific equality assertion.
Args:
tuple1: The first tuple to compare.
tuple2: The second tuple to compare.
msg: Optional message to use on failure instead of a list of
differences.
"""
self.assertSequenceEqual(tuple1, tuple2, msg, seq_type=tuple)
def assertSetEqual(self, set1, set2, msg=None):
"""A set-specific equality assertion.
Args:
set1: The first set to compare.
set2: The second set to compare.
msg: Optional message to use on failure instead of a list of
differences.
assertSetEqual uses ducktyping to support different types of sets, and
is optimized for sets specifically (parameters must support a
difference method).
"""
try:
difference1 = set1.difference(set2)
except TypeError as e:
self.fail('invalid type when attempting set difference: %s' % e)
except AttributeError as e:
self.fail('first argument does not support set difference: %s' % e)
try:
difference2 = set2.difference(set1)
except TypeError as e:
self.fail('invalid type when attempting set difference: %s' % e)
except AttributeError as e:
self.fail('second argument does not support set difference: %s' % e)
if not (difference1 or difference2):
return
lines = []
if difference1:
lines.append('Items in the first set but not the second:')
for item in difference1:
lines.append(repr(item))
if difference2:
lines.append('Items in the second set but not the first:')
for item in difference2:
lines.append(repr(item))
standardMsg = '\n'.join(lines)
self.fail(self._formatMessage(msg, standardMsg))
def assertIn(self, member, container, msg=None):
"""Just like self.assertTrue(a in b), but with a nicer default message."""
if member not in container:
standardMsg = '%s not found in %s' % (safe_repr(member),
safe_repr(container))
self.fail(self._formatMessage(msg, standardMsg))
def assertNotIn(self, member, container, msg=None):
"""Just like self.assertTrue(a not in b), but with a nicer default message."""
if member in container:
standardMsg = '%s unexpectedly found in %s' % (safe_repr(member),
safe_repr(container))
self.fail(self._formatMessage(msg, standardMsg))
def assertIs(self, expr1, expr2, msg=None):
"""Just like self.assertTrue(a is b), but with a nicer default message."""
if expr1 is not expr2:
standardMsg = '%s is not %s' % (safe_repr(expr1),
safe_repr(expr2))
self.fail(self._formatMessage(msg, standardMsg))
def assertIsNot(self, expr1, expr2, msg=None):
"""Just like self.assertTrue(a is not b), but with a nicer default message."""
if expr1 is expr2:
standardMsg = 'unexpectedly identical: %s' % (safe_repr(expr1),)
self.fail(self._formatMessage(msg, standardMsg))
def assertDictEqual(self, d1, d2, msg=None):
self.assertIsInstance(d1, dict, 'First argument is not a dictionary')
self.assertIsInstance(d2, dict, 'Second argument is not a dictionary')
if d1 != d2:
standardMsg = '%s != %s' % _common_shorten_repr(d1, d2)
diff = ('\n' + '\n'.join(difflib.ndiff(
pprint.pformat(d1).splitlines(),
pprint.pformat(d2).splitlines())))
standardMsg = self._truncateMessage(standardMsg, diff)
self.fail(self._formatMessage(msg, standardMsg))
def assertDictContainsSubset(self, subset, dictionary, msg=None):
"""Checks whether dictionary is a superset of subset."""
warnings.warn('assertDictContainsSubset is deprecated',
DeprecationWarning)
missing = []
mismatched = []
for key, value in subset.items():
if key not in dictionary:
missing.append(key)
elif value != dictionary[key]:
mismatched.append('%s, expected: %s, actual: %s' %
(safe_repr(key), safe_repr(value),
safe_repr(dictionary[key])))
if not (missing or mismatched):
return
standardMsg = ''
if missing:
standardMsg = 'Missing: %s' % ','.join(safe_repr(m) for m in
missing)
if mismatched:
if standardMsg:
standardMsg += '; '
standardMsg += 'Mismatched values: %s' % ','.join(mismatched)
self.fail(self._formatMessage(msg, standardMsg))
def assertCountEqual(self, first, second, msg=None):
"""An unordered sequence comparison asserting that the same elements,
regardless of order. If the same element occurs more than once,
it verifies that the elements occur the same number of times.
self.assertEqual(Counter(list(first)),
Counter(list(second)))
Example:
- [0, 1, 1] and [1, 0, 1] compare equal.
- [0, 0, 1] and [0, 1] compare unequal.
"""
first_seq, second_seq = list(first), list(second)
try:
first = collections.Counter(first_seq)
second = collections.Counter(second_seq)
except TypeError:
# Handle case with unhashable elements
differences = _count_diff_all_purpose(first_seq, second_seq)
else:
if first == second:
return
differences = _count_diff_hashable(first_seq, second_seq)
if differences:
standardMsg = 'Element counts were not equal:\n'
lines = ['First has %d, Second has %d: %r' % diff for diff in differences]
diffMsg = '\n'.join(lines)
standardMsg = self._truncateMessage(standardMsg, diffMsg)
msg = self._formatMessage(msg, standardMsg)
self.fail(msg)
def assertMultiLineEqual(self, first, second, msg=None):
"""Assert that two multi-line strings are equal."""
self.assertIsInstance(first, str, 'First argument is not a string')
self.assertIsInstance(second, str, 'Second argument is not a string')
if first != second:
# don't use difflib if the strings are too long
if (len(first) > self._diffThreshold or
len(second) > self._diffThreshold):
self._baseAssertEqual(first, second, msg)
firstlines = first.splitlines(keepends=True)
secondlines = second.splitlines(keepends=True)
if len(firstlines) == 1 and first.strip('\r\n') == first:
firstlines = [first + '\n']
secondlines = [second + '\n']
standardMsg = '%s != %s' % _common_shorten_repr(first, second)
diff = '\n' + ''.join(difflib.ndiff(firstlines, secondlines))
standardMsg = self._truncateMessage(standardMsg, diff)
self.fail(self._formatMessage(msg, standardMsg))
def assertLess(self, a, b, msg=None):
"""Just like self.assertTrue(a < b), but with a nicer default message."""
if not a < b:
standardMsg = '%s not less than %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertLessEqual(self, a, b, msg=None):
"""Just like self.assertTrue(a <= b), but with a nicer default message."""
if not a <= b:
standardMsg = '%s not less than or equal to %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertGreater(self, a, b, msg=None):
"""Just like self.assertTrue(a > b), but with a nicer default message."""
if not a > b:
standardMsg = '%s not greater than %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertGreaterEqual(self, a, b, msg=None):
"""Just like self.assertTrue(a >= b), but with a nicer default message."""
if not a >= b:
standardMsg = '%s not greater than or equal to %s' % (safe_repr(a), safe_repr(b))
self.fail(self._formatMessage(msg, standardMsg))
def assertIsNone(self, obj, msg=None):
"""Same as self.assertTrue(obj is None), with a nicer default message."""
if obj is not None:
standardMsg = '%s is not None' % (safe_repr(obj),)
self.fail(self._formatMessage(msg, standardMsg))
def assertIsNotNone(self, obj, msg=None):
"""Included for symmetry with assertIsNone."""
if obj is None:
standardMsg = 'unexpectedly None'
self.fail(self._formatMessage(msg, standardMsg))
def assertIsInstance(self, obj, cls, msg=None):
"""Same as self.assertTrue(isinstance(obj, cls)), with a nicer
default message."""
if not isinstance(obj, cls):
standardMsg = '%s is not an instance of %r' % (safe_repr(obj), cls)
self.fail(self._formatMessage(msg, standardMsg))
def assertNotIsInstance(self, obj, cls, msg=None):
"""Included for symmetry with assertIsInstance."""
if isinstance(obj, cls):
standardMsg = '%s is an instance of %r' % (safe_repr(obj), cls)
self.fail(self._formatMessage(msg, standardMsg))
def assertRaisesRegex(self, expected_exception, expected_regex,
callable_obj=None, *args, **kwargs):
"""Asserts that the message in a raised exception matches a regex.
Args:
expected_exception: Exception class expected to be raised.
expected_regex: Regex (re pattern object or string) expected
to be found in error message.
callable_obj: Function to be called.
msg: Optional message used in case of failure. Can only be used
when assertRaisesRegex is used as a context manager.
args: Extra args.
kwargs: Extra kwargs.
"""
context = _AssertRaisesContext(expected_exception, self, callable_obj,
expected_regex)
return context.handle('assertRaisesRegex', callable_obj, args, kwargs)
def assertWarnsRegex(self, expected_warning, expected_regex,
callable_obj=None, *args, **kwargs):
"""Asserts that the message in a triggered warning matches a regexp.
Basic functioning is similar to assertWarns() with the addition
that only warnings whose messages also match the regular expression
are considered successful matches.
Args:
expected_warning: Warning class expected to be triggered.
expected_regex: Regex (re pattern object or string) expected
to be found in error message.
callable_obj: Function to be called.
msg: Optional message used in case of failure. Can only be used
when assertWarnsRegex is used as a context manager.
args: Extra args.
kwargs: Extra kwargs.
"""
context = _AssertWarnsContext(expected_warning, self, callable_obj,
expected_regex)
return context.handle('assertWarnsRegex', callable_obj, args, kwargs)
def assertRegex(self, text, expected_regex, msg=None):
"""Fail the test unless the text matches the regular expression."""
if isinstance(expected_regex, (str, bytes)):
assert expected_regex, "expected_regex must not be empty."
expected_regex = re.compile(expected_regex)
if not expected_regex.search(text):
msg = msg or "Regex didn't match"
msg = '%s: %r not found in %r' % (msg, expected_regex.pattern, text)
raise self.failureException(msg)
def assertNotRegex(self, text, unexpected_regex, msg=None):
"""Fail the test if the text matches the regular expression."""
if isinstance(unexpected_regex, (str, bytes)):
unexpected_regex = re.compile(unexpected_regex)
match = unexpected_regex.search(text)
if match:
msg = msg or "Regex matched"
msg = '%s: %r matches %r in %r' % (msg,
text[match.start():match.end()],
unexpected_regex.pattern,
text)
raise self.failureException(msg)
def _deprecate(original_func):
def deprecated_func(*args, **kwargs):
warnings.warn(
'Please use {0} instead.'.format(original_func.__name__),
DeprecationWarning, 2)
return original_func(*args, **kwargs)
return deprecated_func
# see #9424
failUnlessEqual = assertEquals = _deprecate(assertEqual)
failIfEqual = assertNotEquals = _deprecate(assertNotEqual)
failUnlessAlmostEqual = assertAlmostEquals = _deprecate(assertAlmostEqual)
failIfAlmostEqual = assertNotAlmostEquals = _deprecate(assertNotAlmostEqual)
failUnless = assert_ = _deprecate(assertTrue)
failUnlessRaises = _deprecate(assertRaises)
failIf = _deprecate(assertFalse)
assertRaisesRegexp = _deprecate(assertRaisesRegex)
assertRegexpMatches = _deprecate(assertRegex)
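# Illustrative usage (not part of the original source): the regex-based
# asserts above accept a callable or act as context managers; ExampleTest
# is a hypothetical name.
#
#     class ExampleTest(TestCase):
#         def test_raises(self):
#             with self.assertRaisesRegex(ValueError, r'invalid literal'):
#                 int('not a number')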
class FunctionTestCase(TestCase):
"""A test case that wraps a test function.
This is useful for slipping pre-existing test functions into the
unittest framework. Optionally, set-up and tidy-up functions can be
supplied. As with TestCase, the tidy-up ('tearDown') function will
always be called if the set-up ('setUp') function ran successfully.
"""
def __init__(self, testFunc, setUp=None, tearDown=None, description=None):
super(FunctionTestCase, self).__init__()
self._setUpFunc = setUp
self._tearDownFunc = tearDown
self._testFunc = testFunc
self._description = description
def setUp(self):
if self._setUpFunc is not None:
self._setUpFunc()
def tearDown(self):
if self._tearDownFunc is not None:
self._tearDownFunc()
def runTest(self):
self._testFunc()
def id(self):
return self._testFunc.__name__
def __eq__(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
return self._setUpFunc == other._setUpFunc and \
self._tearDownFunc == other._tearDownFunc and \
self._testFunc == other._testFunc and \
self._description == other._description
def __hash__(self):
return hash((type(self), self._setUpFunc, self._tearDownFunc,
self._testFunc, self._description))
def __str__(self):
return "%s (%s)" % (strclass(self.__class__),
self._testFunc.__name__)
def __repr__(self):
return "<%s tec=%s>" % (strclass(self.__class__),
self._testFunc)
def shortDescription(self):
if self._description is not None:
return self._description
doc = self._testFunc.__doc__
return doc and doc.split("\n")[0].strip() or None
class _SubTest(TestCase):
def __init__(self, test_case, message, params):
super().__init__()
self._message = message
self.test_case = test_case
self.params = params
self.failureException = test_case.failureException
def runTest(self):
raise NotImplementedError("subtests cannot be run directly")
def _subDescription(self):
parts = []
if self._message:
parts.append("[{}]".format(self._message))
if self.params:
params_desc = ', '.join(
"{}={!r}".format(k, v)
for (k, v) in sorted(self.params.items()))
parts.append("({})".format(params_desc))
return " ".join(parts) or '(<subtest>)'
def id(self):
return "{} {}".format(self.test_case.id(), self._subDescription())
def shortDescription(self):
"""Returns a one-line description of the subtest, or None if no
description has been provided.
"""
return self.test_case.shortDescription()
def __str__(self):
return "{} {}".format(self.test_case, self._subDescription())
"""Loading unittests."""
import os
import re
import sys
import traceback
import types
import functools
from fnmatch import fnmatch
from . import case, suite, util
__unittest = True
# what about .pyc or .pyo (etc)
# we would need to avoid loading the same tests multiple times
# from '.py', '.pyc' *and* '.pyo'
VALID_MODULE_NAME = re.compile(r'[_a-z]\w*\.py$', re.IGNORECASE)
class _FailedTest(case.TestCase):
_testMethodName = None
def __init__(self, method_name, exception):
self._exception = exception
super(_FailedTest, self).__init__(method_name)
def __getattr__(self, name):
if name != self._testMethodName:
return super(_FailedTest, self).__getattr__(name)
def testFailure():
raise self._exception
return testFailure
def _make_failed_import_test(name, suiteClass):
message = 'Failed to import test module: %s\n%s' % (name, traceback.format_exc())
return _make_failed_test(name, ImportError(message), suiteClass)
def _make_failed_load_tests(name, exception, suiteClass):
return _make_failed_test(name, exception, suiteClass)
def _make_failed_test(methodname, exception, suiteClass):
test = _FailedTest(methodname, exception)
return suiteClass((test,))
def _make_skipped_test(methodname, exception, suiteClass):
@case.skip(str(exception))
def testSkipped(self):
pass
attrs = {methodname: testSkipped}
TestClass = type("ModuleSkipped", (case.TestCase,), attrs)
return suiteClass((TestClass(methodname),))
def _jython_aware_splitext(path):
if path.lower().endswith('$py.class'):
return path[:-9]
return os.path.splitext(path)[0]
class TestLoader(object):
"""
This class is responsible for loading tests according to various criteria
and returning them wrapped in a TestSuite
"""
testMethodPrefix = 'test'
sortTestMethodsUsing = staticmethod(util.three_way_cmp)
suiteClass = suite.TestSuite
_top_level_dir = None
def loadTestsFromTestCase(self, testCaseClass):
"""Return a suite of all tests cases contained in testCaseClass"""
if issubclass(testCaseClass, suite.TestSuite):
raise TypeError("Test cases should not be derived from "
"TestSuite. Maybe you meant to derive from "
"TestCase?")
testCaseNames = self.getTestCaseNames(testCaseClass)
if not testCaseNames and hasattr(testCaseClass, 'runTest'):
testCaseNames = ['runTest']
loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
return loaded_suite
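# Illustrative usage (not part of the original source; SomeTestCase is a
# hypothetical TestCase subclass):
#
#     loader = TestLoader()
#     suite = loader.loadTestsFromTestCase(SomeTestCase)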
def loadTestsFromModule(self, module, use_load_tests=True):
"""Return a suite of all tests cases contained in the given module"""
tests = []
for name in dir(module):
obj = getattr(module, name)
if isinstance(obj, type) and issubclass(obj, case.TestCase):
tests.append(self.loadTestsFromTestCase(obj))
load_tests = getattr(module, 'load_tests', None)
tests = self.suiteClass(tests)
if use_load_tests and load_tests is not None:
try:
return load_tests(self, tests, None)
except Exception as e:
return _make_failed_load_tests(module.__name__, e,
self.suiteClass)
return tests
def loadTestsFromName(self, name, module=None):
"""Return a suite of all tests cases given a string specifier.
The name may resolve either to a module, a test case class, a
test method within a test case class, or a callable object which
returns a TestCase or TestSuite instance.
The method optionally resolves the names relative to a given module.
"""
parts = name.split('.')
if module is None:
parts_copy = parts[:]
while parts_copy:
try:
module = __import__('.'.join(parts_copy))
break
except ImportError:
del parts_copy[-1]
if not parts_copy:
raise
parts = parts[1:]
obj = module
for part in parts:
parent, obj = obj, getattr(obj, part)
if isinstance(obj, types.ModuleType):
return self.loadTestsFromModule(obj)
elif isinstance(obj, type) and issubclass(obj, case.TestCase):
return self.loadTestsFromTestCase(obj)
elif (isinstance(obj, types.FunctionType) and
isinstance(parent, type) and
issubclass(parent, case.TestCase)):
name = parts[-1]
inst = parent(name)
# static methods follow a different path
if not isinstance(getattr(inst, name), types.FunctionType):
return self.suiteClass([inst])
elif isinstance(obj, suite.TestSuite):
return obj
if callable(obj):
test = obj()
if isinstance(test, suite.TestSuite):
return test
elif isinstance(test, case.TestCase):
return self.suiteClass([test])
else:
raise TypeError("calling %s returned %s, not a test" %
(obj, test))
else:
raise TypeError("don't know how to make test from: %s" % obj)
def loadTestsFromNames(self, names, module=None):
"""Return a suite of all tests cases found using the given sequence
of string specifiers. See 'loadTestsFromName()'.
"""
suites = [self.loadTestsFromName(name, module) for name in names]
return self.suiteClass(suites)
def getTestCaseNames(self, testCaseClass):
"""Return a sorted sequence of method names found within testCaseClass
"""
def isTestMethod(attrname, testCaseClass=testCaseClass,
prefix=self.testMethodPrefix):
return attrname.startswith(prefix) and \
callable(getattr(testCaseClass, attrname))
testFnNames = list(filter(isTestMethod, dir(testCaseClass)))
if self.sortTestMethodsUsing:
testFnNames.sort(key=functools.cmp_to_key(self.sortTestMethodsUsing))
return testFnNames
def discover(self, start_dir, pattern='test*.py', top_level_dir=None):
"""Find and return all test modules from the specified start
directory, recursing into subdirectories to find them and return all
tests found within them. Only test files that match the pattern will
be loaded. (Using shell style pattern matching.)
All test modules must be importable from the top level of the project.
If the start directory is not the top level directory then the top
level directory must be specified separately.
If a test package name (directory with '__init__.py') matches the
pattern then the package will be checked for a 'load_tests' function. If
this exists then it will be called with loader, tests, pattern.
If load_tests exists then discovery does *not* recurse into the package,
load_tests is responsible for loading all tests in the package.
The pattern is deliberately not stored as a loader attribute so that
packages can continue discovery themselves. top_level_dir is stored so
load_tests does not need to pass this argument in to loader.discover().
Paths are sorted before being imported to ensure reproducible execution
order even on filesystems with non-alphabetical ordering like ext3/4.
"""
set_implicit_top = False
if top_level_dir is None and self._top_level_dir is not None:
# make top_level_dir optional if called from load_tests in a package
top_level_dir = self._top_level_dir
elif top_level_dir is None:
set_implicit_top = True
top_level_dir = start_dir
top_level_dir = os.path.abspath(top_level_dir)
if not top_level_dir in sys.path:
# all test modules must be importable from the top level directory
# should we *unconditionally* put the start directory first
# in sys.path to minimise the likelihood of conflicts between
# installed modules and development versions?
sys.path.insert(0, top_level_dir)
self._top_level_dir = top_level_dir
is_not_importable = False
is_namespace = False
tests = []
if os.path.isdir(os.path.abspath(start_dir)):
start_dir = os.path.abspath(start_dir)
if start_dir != top_level_dir:
is_not_importable = not os.path.isfile(os.path.join(start_dir, '__init__.py'))
else:
# support for discovery from dotted module names
try:
__import__(start_dir)
except ImportError:
is_not_importable = True
else:
the_module = sys.modules[start_dir]
top_part = start_dir.split('.')[0]
try:
start_dir = os.path.abspath(
os.path.dirname((the_module.__file__)))
except AttributeError:
# look for namespace packages
try:
spec = the_module.__spec__
except AttributeError:
spec = None
if spec and spec.loader is None:
if spec.submodule_search_locations is not None:
is_namespace = True
for path in the_module.__path__:
if (not set_implicit_top and
not path.startswith(top_level_dir)):
continue
self._top_level_dir = \
(path.split(the_module.__name__
.replace(".", os.path.sep))[0])
tests.extend(self._find_tests(path,
pattern,
namespace=True))
elif the_module.__name__ in sys.builtin_module_names:
# builtin module
raise TypeError('Can not use builtin modules '
'as dotted module names') from None
else:
raise TypeError(
'don\'t know how to discover from {!r}'
.format(the_module)) from None
if set_implicit_top:
if not is_namespace:
self._top_level_dir = \
self._get_directory_containing_module(top_part)
sys.path.remove(top_level_dir)
else:
sys.path.remove(top_level_dir)
if is_not_importable:
raise ImportError('Start directory is not importable: %r' % start_dir)
if not is_namespace:
tests = list(self._find_tests(start_dir, pattern))
return self.suiteClass(tests)
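# Illustrative usage (not part of the original source; the 'tests' directory
# is hypothetical):
#
#     loader = TestLoader()
#     suite = loader.discover('tests', pattern='test_*.py')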
def _get_directory_containing_module(self, module_name):
module = sys.modules[module_name]
full_path = os.path.abspath(module.__file__)
if os.path.basename(full_path).lower().startswith('__init__.py'):
return os.path.dirname(os.path.dirname(full_path))
else:
# here we have been given a module rather than a package - so
# all we can do is search the *same* directory the module is in
# should an exception be raised instead?
return os.path.dirname(full_path)
def _get_name_from_path(self, path):
path = _jython_aware_splitext(os.path.normpath(path))
_relpath = os.path.relpath(path, self._top_level_dir)
assert not os.path.isabs(_relpath), "Path must be within the project"
assert not _relpath.startswith('..'), "Path must be within the project"
name = _relpath.replace(os.path.sep, '.')
return name
def _get_module_from_name(self, name):
__import__(name)
return sys.modules[name]
def _match_path(self, path, full_path, pattern):
# override this method to use alternative matching strategy
return fnmatch(path, pattern)
def _find_tests(self, start_dir, pattern, namespace=False):
"""Used by discovery. Yields test suites it loads."""
paths = sorted(os.listdir(start_dir))
for path in paths:
full_path = os.path.join(start_dir, path)
if os.path.isfile(full_path):
if not VALID_MODULE_NAME.match(path):
# valid Python identifiers only
continue
if not self._match_path(path, full_path, pattern):
continue
# if the test file matches, load it
name = self._get_name_from_path(full_path)
try:
module = self._get_module_from_name(name)
except case.SkipTest as e:
yield _make_skipped_test(name, e, self.suiteClass)
except:
yield _make_failed_import_test(name, self.suiteClass)
else:
mod_file = os.path.abspath(getattr(module, '__file__', full_path))
realpath = _jython_aware_splitext(os.path.realpath(mod_file))
fullpath_noext = _jython_aware_splitext(os.path.realpath(full_path))
if realpath.lower() != fullpath_noext.lower():
module_dir = os.path.dirname(realpath)
mod_name = _jython_aware_splitext(os.path.basename(full_path))
expected_dir = os.path.dirname(full_path)
msg = ("%r module incorrectly imported from %r. Expected %r. "
"Is this module globally installed?")
raise ImportError(msg % (mod_name, module_dir, expected_dir))
yield self.loadTestsFromModule(module)
elif os.path.isdir(full_path):
if (not namespace and
not os.path.isfile(os.path.join(full_path, '__init__.py'))):
continue
load_tests = None
tests = None
if fnmatch(path, pattern):
# only check load_tests if the package directory itself matches the filter
name = self._get_name_from_path(full_path)
package = self._get_module_from_name(name)
load_tests = getattr(package, 'load_tests', None)
tests = self.loadTestsFromModule(package, use_load_tests=False)
if load_tests is None:
if tests is not None:
# tests loaded from package file
yield tests
# recurse into the package
yield from self._find_tests(full_path, pattern,
namespace=namespace)
else:
try:
yield load_tests(self, tests, pattern)
except Exception as e:
yield _make_failed_load_tests(package.__name__, e,
self.suiteClass)
defaultTestLoader = TestLoader()
def _makeLoader(prefix, sortUsing, suiteClass=None):
loader = TestLoader()
loader.sortTestMethodsUsing = sortUsing
loader.testMethodPrefix = prefix
if suiteClass:
loader.suiteClass = suiteClass
return loader
def getTestCaseNames(testCaseClass, prefix, sortUsing=util.three_way_cmp):
return _makeLoader(prefix, sortUsing).getTestCaseNames(testCaseClass)
def makeSuite(testCaseClass, prefix='test', sortUsing=util.three_way_cmp,
suiteClass=suite.TestSuite):
return _makeLoader(prefix, sortUsing, suiteClass).loadTestsFromTestCase(
testCaseClass)
def findTestCases(module, prefix='test', sortUsing=util.three_way_cmp,
suiteClass=suite.TestSuite):
return _makeLoader(prefix, sortUsing, suiteClass).loadTestsFromModule(\
module)
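# Illustrative usage (not part of the original source; SomeTestCase and
# some_module are hypothetical):
#
#     suite = makeSuite(SomeTestCase)
#     suite = findTestCases(some_module)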
"""Unittest main program"""
import sys
import argparse
import os
from . import loader, runner
from .signals import installHandler
__unittest = True
MAIN_EXAMPLES = """\
Examples:
%(prog)s test_module - run tests from test_module
%(prog)s module.TestClass - run tests from module.TestClass
%(prog)s module.Class.test_method - run specified test method
"""
MODULE_EXAMPLES = """\
Examples:
%(prog)s - run default set of tests
%(prog)s MyTestSuite - run suite 'MyTestSuite'
%(prog)s MyTestCase.testSomething - run MyTestCase.testSomething
%(prog)s MyTestCase - run all 'test*' test methods
in MyTestCase
"""
def _convert_name(name):
# on Linux / Mac OS X 'foo.PY' is not importable, but on
# Windows it is. Simpler to do a case-insensitive match;
# a better check would be that the name is a valid Python
# module name.
if os.path.isfile(name) and name.lower().endswith('.py'):
if os.path.isabs(name):
rel_path = os.path.relpath(name, os.getcwd())
if os.path.isabs(rel_path) or rel_path.startswith(os.pardir):
return name
name = rel_path
# on Windows both '\' and '/' are used as path
# separators. Better to replace both than rely on os.path.sep
return name[:-3].replace('\\', '.').replace('/', '.')
return name
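# For example (illustrative, assuming the file exists below the cwd):
#     _convert_name('tests/test_spam.py')   ->  'tests.test_spam'
#     _convert_name('pkg.module.TestCase')  ->  returned unchanged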
def _convert_names(names):
return [_convert_name(name) for name in names]
class TestProgram(object):
"""A command-line program that runs a set of tests; this is primarily
for making test modules conveniently executable.
"""
# defaults for testing
module=None
verbosity = 1
failfast = catchbreak = buffer = progName = warnings = None
_discovery_parser = None
def __init__(self, module='__main__', defaultTest=None, argv=None,
testRunner=None, testLoader=loader.defaultTestLoader,
exit=True, verbosity=1, failfast=None, catchbreak=None,
buffer=None, warnings=None):
if isinstance(module, str):
self.module = __import__(module)
for part in module.split('.')[1:]:
self.module = getattr(self.module, part)
else:
self.module = module
if argv is None:
argv = sys.argv
self.exit = exit
self.failfast = failfast
self.catchbreak = catchbreak
self.verbosity = verbosity
self.buffer = buffer
if warnings is None and not sys.warnoptions:
# even if DeprecationWarnings are ignored by default
# print them anyway unless other warnings settings are
# specified by the warnings arg or the -W python flag
self.warnings = 'default'
else:
# here self.warnings is set either to the value passed
# to the warnings args or to None.
# If the user didn't pass a value self.warnings will
# be None. This means that the behavior is unchanged
# and depends on the values passed to -W.
self.warnings = warnings
self.defaultTest = defaultTest
self.testRunner = testRunner
self.testLoader = testLoader
self.progName = os.path.basename(argv[0])
self.parseArgs(argv)
self.runTests()
def usageExit(self, msg=None):
if msg:
print(msg)
if self._discovery_parser is None:
self._initArgParsers()
self._print_help()
sys.exit(2)
def _print_help(self, *args, **kwargs):
if self.module is None:
print(self._main_parser.format_help())
print(MAIN_EXAMPLES % {'prog': self.progName})
self._discovery_parser.print_help()
else:
print(self._main_parser.format_help())
print(MODULE_EXAMPLES % {'prog': self.progName})
def parseArgs(self, argv):
self._initArgParsers()
if self.module is None:
if len(argv) > 1 and argv[1].lower() == 'discover':
self._do_discovery(argv[2:])
return
self._main_parser.parse_args(argv[1:], self)
if not self.tests:
# this allows "python -m unittest -v" to still work for
# test discovery.
self._do_discovery([])
return
else:
self._main_parser.parse_args(argv[1:], self)
if self.tests:
self.testNames = _convert_names(self.tests)
if __name__ == '__main__':
# to support python -m unittest ...
self.module = None
elif self.defaultTest is None:
# createTests will load tests from self.module
self.testNames = None
elif isinstance(self.defaultTest, str):
self.testNames = (self.defaultTest,)
else:
self.testNames = list(self.defaultTest)
self.createTests()
def createTests(self):
if self.testNames is None:
self.test = self.testLoader.loadTestsFromModule(self.module)
else:
self.test = self.testLoader.loadTestsFromNames(self.testNames,
self.module)
def _initArgParsers(self):
parent_parser = self._getParentArgParser()
self._main_parser = self._getMainArgParser(parent_parser)
self._discovery_parser = self._getDiscoveryArgParser(parent_parser)
def _getParentArgParser(self):
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('-v', '--verbose', dest='verbosity',
action='store_const', const=2,
help='Verbose output')
parser.add_argument('-q', '--quiet', dest='verbosity',
action='store_const', const=0,
help='Quiet output')
if self.failfast is None:
parser.add_argument('-f', '--failfast', dest='failfast',
action='store_true',
help='Stop on first fail or error')
self.failfast = False
if self.catchbreak is None:
parser.add_argument('-c', '--catch', dest='catchbreak',
action='store_true',
help='Catch Ctrl-C and display results so far')
self.catchbreak = False
if self.buffer is None:
parser.add_argument('-b', '--buffer', dest='buffer',
action='store_true',
help='Buffer stdout and stderr during tests')
self.buffer = False
return parser
def _getMainArgParser(self, parent):
parser = argparse.ArgumentParser(parents=[parent])
parser.prog = self.progName
parser.print_help = self._print_help
parser.add_argument('tests', nargs='*',
help='a list of any number of test modules, '
'classes and test methods.')
return parser
def _getDiscoveryArgParser(self, parent):
parser = argparse.ArgumentParser(parents=[parent])
parser.prog = '%s discover' % self.progName
parser.epilog = ('For test discovery all test modules must be '
'importable from the top level directory of the '
'project.')
parser.add_argument('-s', '--start-directory', dest='start',
help="Directory to start discovery ('.' default)")
parser.add_argument('-p', '--pattern', dest='pattern',
help="Pattern to match tests ('test*.py' default)")
parser.add_argument('-t', '--top-level-directory', dest='top',
help='Top level directory of project (defaults to '
'start directory)')
for arg in ('start', 'pattern', 'top'):
parser.add_argument(arg, nargs='?',
default=argparse.SUPPRESS,
help=argparse.SUPPRESS)
return parser
def _do_discovery(self, argv, Loader=None):
self.start = '.'
self.pattern = 'test*.py'
self.top = None
if argv is not None:
# handle command line args for test discovery
if self._discovery_parser is None:
# for testing
self._initArgParsers()
self._discovery_parser.parse_args(argv, self)
loader = self.testLoader if Loader is None else Loader()
self.test = loader.discover(self.start, self.pattern, self.top)
def runTests(self):
if self.catchbreak:
installHandler()
if self.testRunner is None:
self.testRunner = runner.TextTestRunner
if isinstance(self.testRunner, type):
try:
testRunner = self.testRunner(verbosity=self.verbosity,
failfast=self.failfast,
buffer=self.buffer,
warnings=self.warnings)
except TypeError:
# didn't accept the verbosity, buffer or failfast arguments
testRunner = self.testRunner()
else:
# it is assumed to be a TestRunner instance
testRunner = self.testRunner
self.result = testRunner.run(self.test)
if self.exit:
sys.exit(not self.result.wasSuccessful())
main = TestProgram
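# Illustrative usage (not part of the original source): a test module usually
# ends with
#
#     if __name__ == '__main__':
#         unittest.main()
#
# which instantiates TestProgram(module='__main__'), parses sys.argv and runs
# the resulting tests.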
# mock.py
# Test tools for mocking and patching.
# Maintained by Michael Foord
# Backport for other versions of Python available from
# http://pypi.python.org/pypi/mock
__all__ = (
'Mock',
'MagicMock',
'patch',
'sentinel',
'DEFAULT',
'ANY',
'call',
'create_autospec',
'FILTER_DIR',
'NonCallableMock',
'NonCallableMagicMock',
'mock_open',
'PropertyMock',
)
__version__ = '1.0'
import inspect
import pprint
import sys
from functools import wraps, partial
BaseExceptions = (BaseException,)
if 'java' in sys.platform:
# jython
import java
BaseExceptions = (BaseException, java.lang.Throwable)
FILTER_DIR = True
# Workaround for issue #12370
# Without this, the __class__ properties wouldn't be set correctly
_safe_super = super
def _is_instance_mock(obj):
# can't use isinstance on Mock objects because they override __class__
# The base class for all mocks is NonCallableMock
return issubclass(type(obj), NonCallableMock)
def _is_exception(obj):
return (
isinstance(obj, BaseExceptions) or
isinstance(obj, type) and issubclass(obj, BaseExceptions)
)
class _slotted(object):
__slots__ = ['a']
DescriptorTypes = (
type(_slotted.a),
property,
)
def _get_signature_object(func, as_instance, eat_self):
"""
Given an arbitrary, possibly callable object, try to create a suitable
signature object.
Return a (reduced func, signature) tuple, or None.
"""
if isinstance(func, type) and not as_instance:
# If it's a type and should be modelled as a type, use __init__.
try:
func = func.__init__
except AttributeError:
return None
# Skip the `self` argument in __init__
eat_self = True
elif not isinstance(func, FunctionTypes):
# If we really want to model an instance of the passed type,
# __call__ should be looked up, not __init__.
try:
func = func.__call__
except AttributeError:
return None
if eat_self:
sig_func = partial(func, None)
else:
sig_func = func
try:
return func, inspect.signature(sig_func)
except ValueError:
# Certain callable types are not supported by inspect.signature()
return None
def _check_signature(func, mock, skipfirst, instance=False):
sig = _get_signature_object(func, instance, skipfirst)
if sig is None:
return
func, sig = sig
def checksig(_mock_self, *args, **kwargs):
sig.bind(*args, **kwargs)
_copy_func_details(func, checksig)
type(mock)._mock_check_sig = checksig
def _copy_func_details(func, funcopy):
funcopy.__name__ = func.__name__
funcopy.__doc__ = func.__doc__
try:
funcopy.__text_signature__ = func.__text_signature__
except AttributeError:
pass
# we explicitly don't copy func.__dict__ into this copy as it would
# expose original attributes that should be mocked
try:
funcopy.__module__ = func.__module__
except AttributeError:
pass
try:
funcopy.__defaults__ = func.__defaults__
except AttributeError:
pass
try:
funcopy.__kwdefaults__ = func.__kwdefaults__
except AttributeError:
pass
def _callable(obj):
if isinstance(obj, type):
return True
if getattr(obj, '__call__', None) is not None:
return True
return False
def _is_list(obj):
# checks for list or tuples
# XXXX badly named!
return type(obj) in (list, tuple)
def _instance_callable(obj):
"""Given an object, return True if the object is callable.
For classes, return True if instances would be callable."""
if not isinstance(obj, type):
# already an instance
return getattr(obj, '__call__', None) is not None
# *could* be broken by a class overriding __mro__ or __dict__ via
# a metaclass
for base in (obj,) + obj.__mro__:
if base.__dict__.get('__call__') is not None:
return True
return False
def _set_signature(mock, original, instance=False):
# creates a function with signature (*args, **kwargs) that delegates to a
# mock. It still does signature checking by calling a lambda with the same
# signature as the original.
if not _callable(original):
return
skipfirst = isinstance(original, type)
result = _get_signature_object(original, instance, skipfirst)
if result is None:
return
func, sig = result
def checksig(*args, **kwargs):
sig.bind(*args, **kwargs)
_copy_func_details(func, checksig)
name = original.__name__
if not name.isidentifier():
name = 'funcopy'
context = {'_checksig_': checksig, 'mock': mock}
src = """def %s(*args, **kwargs):
_checksig_(*args, **kwargs)
return mock(*args, **kwargs)""" % name
exec(src, context)
funcopy = context[name]
_setup_func(funcopy, mock)
return funcopy
def _setup_func(funcopy, mock):
funcopy.mock = mock
# can't use isinstance with mocks
if not _is_instance_mock(mock):
return
def assert_called_with(*args, **kwargs):
return mock.assert_called_with(*args, **kwargs)
def assert_called_once_with(*args, **kwargs):
return mock.assert_called_once_with(*args, **kwargs)
def assert_has_calls(*args, **kwargs):
return mock.assert_has_calls(*args, **kwargs)
def assert_any_call(*args, **kwargs):
return mock.assert_any_call(*args, **kwargs)
def reset_mock():
funcopy.method_calls = _CallList()
funcopy.mock_calls = _CallList()
mock.reset_mock()
ret = funcopy.return_value
if _is_instance_mock(ret) and ret is not mock:
ret.reset_mock()
funcopy.called = False
funcopy.call_count = 0
funcopy.call_args = None
funcopy.call_args_list = _CallList()
funcopy.method_calls = _CallList()
funcopy.mock_calls = _CallList()
funcopy.return_value = mock.return_value
funcopy.side_effect = mock.side_effect
funcopy._mock_children = mock._mock_children
funcopy.assert_called_with = assert_called_with
funcopy.assert_called_once_with = assert_called_once_with
funcopy.assert_has_calls = assert_has_calls
funcopy.assert_any_call = assert_any_call
funcopy.reset_mock = reset_mock
mock._mock_delegate = funcopy
def _is_magic(name):
return '__%s__' % name[2:-2] == name
class _SentinelObject(object):
"A unique, named, sentinel object."
def __init__(self, name):
self.name = name
def __repr__(self):
return 'sentinel.%s' % self.name
class _Sentinel(object):
"""Access attributes to return a named object, usable as a sentinel."""
def __init__(self):
self._sentinels = {}
def __getattr__(self, name):
if name == '__bases__':
# Without this help(unittest.mock) raises an exception
raise AttributeError
return self._sentinels.setdefault(name, _SentinelObject(name))
sentinel = _Sentinel()
DEFAULT = sentinel.DEFAULT
_missing = sentinel.MISSING
_deleted = sentinel.DELETED
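# Illustrative usage (not part of the original source): each attribute access
# returns the same named singleton, handy as a unique default marker.
#
#     MISSING = sentinel.MISSING
#     assert sentinel.MISSING is MISSING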
def _copy(value):
if type(value) in (dict, list, tuple, set):
return type(value)(value)
return value
_allowed_names = set(
[
'return_value', '_mock_return_value', 'side_effect',
'_mock_side_effect', '_mock_parent', '_mock_new_parent',
'_mock_name', '_mock_new_name'
]
)
def _delegating_property(name):
_allowed_names.add(name)
_the_name = '_mock_' + name
def _get(self, name=name, _the_name=_the_name):
sig = self._mock_delegate
if sig is None:
return getattr(self, _the_name)
return getattr(sig, name)
def _set(self, value, name=name, _the_name=_the_name):
sig = self._mock_delegate
if sig is None:
self.__dict__[_the_name] = value
else:
setattr(sig, name, value)
return property(_get, _set)
class _CallList(list):
def __contains__(self, value):
if not isinstance(value, list):
return list.__contains__(self, value)
len_value = len(value)
len_self = len(self)
if len_value > len_self:
return False
for i in range(0, len_self - len_value + 1):
sub_list = self[i:i+len_value]
if sub_list == value:
return True
return False
def __repr__(self):
return pprint.pformat(list(self))
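# Illustrative usage (not part of the original source): __contains__ above
# also matches contiguous sub-lists, which assert_has_calls relies on
# (assuming the module-level `call` helper defined later in this file):
#
#     m = Mock()
#     m(1); m(2); m(3)
#     assert [call(1), call(2)] in m.call_args_list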
def _check_and_set_parent(parent, value, name, new_name):
if not _is_instance_mock(value):
return False
if ((value._mock_name or value._mock_new_name) or
(value._mock_parent is not None) or
(value._mock_new_parent is not None)):
return False
_parent = parent
while _parent is not None:
# setting a mock (value) as a child or return value of itself
# should not modify the mock
if _parent is value:
return False
_parent = _parent._mock_new_parent
if new_name:
value._mock_new_parent = parent
value._mock_new_name = new_name
if name:
value._mock_parent = parent
value._mock_name = name
return True
# Internal class to identify whether we have wrapped an iterator object.
class _MockIter(object):
def __init__(self, obj):
self.obj = iter(obj)
def __iter__(self):
return self
def __next__(self):
return next(self.obj)
class Base(object):
_mock_return_value = DEFAULT
_mock_side_effect = None
def __init__(self, *args, **kwargs):
pass
class NonCallableMock(Base):
"""A non-callable version of `Mock`"""
def __new__(cls, *args, **kw):
# every instance has its own class
# so we can create magic methods on the
# class without stomping on other mocks
new = type(cls.__name__, (cls,), {'__doc__': cls.__doc__})
instance = object.__new__(new)
return instance
def __init__(
self, spec=None, wraps=None, name=None, spec_set=None,
parent=None, _spec_state=None, _new_name='', _new_parent=None,
_spec_as_instance=False, _eat_self=None, **kwargs
):
if _new_parent is None:
_new_parent = parent
__dict__ = self.__dict__
__dict__['_mock_parent'] = parent
__dict__['_mock_name'] = name
__dict__['_mock_new_name'] = _new_name
__dict__['_mock_new_parent'] = _new_parent
if spec_set is not None:
spec = spec_set
spec_set = True
if _eat_self is None:
_eat_self = parent is not None
self._mock_add_spec(spec, spec_set, _spec_as_instance, _eat_self)
__dict__['_mock_children'] = {}
__dict__['_mock_wraps'] = wraps
__dict__['_mock_delegate'] = None
__dict__['_mock_called'] = False
__dict__['_mock_call_args'] = None
__dict__['_mock_call_count'] = 0
__dict__['_mock_call_args_list'] = _CallList()
__dict__['_mock_mock_calls'] = _CallList()
__dict__['method_calls'] = _CallList()
if kwargs:
self.configure_mock(**kwargs)
_safe_super(NonCallableMock, self).__init__(
spec, wraps, name, spec_set, parent,
_spec_state
)
def attach_mock(self, mock, attribute):
"""
Attach a mock as an attribute of this one, replacing its name and
parent. Calls to the attached mock will be recorded in the
`method_calls` and `mock_calls` attributes of this one."""
mock._mock_parent = None
mock._mock_new_parent = None
mock._mock_name = ''
mock._mock_new_name = None
setattr(self, attribute, mock)
def mock_add_spec(self, spec, spec_set=False):
"""Add a spec to a mock. `spec` can either be an object or a
list of strings. Only attributes on the `spec` can be fetched as
attributes from the mock.
If `spec_set` is True then only attributes on the spec can be set."""
self._mock_add_spec(spec, spec_set)
def _mock_add_spec(self, spec, spec_set, _spec_as_instance=False,
_eat_self=False):
_spec_class = None
_spec_signature = None
if spec is not None and not _is_list(spec):
if isinstance(spec, type):
_spec_class = spec
else:
_spec_class = _get_class(spec)
res = _get_signature_object(spec,
_spec_as_instance, _eat_self)
_spec_signature = res and res[1]
spec = dir(spec)
__dict__ = self.__dict__
__dict__['_spec_class'] = _spec_class
__dict__['_spec_set'] = spec_set
__dict__['_spec_signature'] = _spec_signature
__dict__['_mock_methods'] = spec
def __get_return_value(self):
ret = self._mock_return_value
if self._mock_delegate is not None:
ret = self._mock_delegate.return_value
if ret is DEFAULT:
ret = self._get_child_mock(
_new_parent=self, _new_name='()'
)
self.return_value = ret
return ret
def __set_return_value(self, value):
if self._mock_delegate is not None:
self._mock_delegate.return_value = value
else:
self._mock_return_value = value
_check_and_set_parent(self, value, None, '()')
__return_value_doc = "The value to be returned when the mock is called."
return_value = property(__get_return_value, __set_return_value,
__return_value_doc)
@property
def __class__(self):
if self._spec_class is None:
return type(self)
return self._spec_class
called = _delegating_property('called')
call_count = _delegating_property('call_count')
call_args = _delegating_property('call_args')
call_args_list = _delegating_property('call_args_list')
mock_calls = _delegating_property('mock_calls')
def __get_side_effect(self):
delegated = self._mock_delegate
if delegated is None:
return self._mock_side_effect
sf = delegated.side_effect
if sf is not None and not callable(sf) and not isinstance(sf, _MockIter):
sf = _MockIter(sf)
delegated.side_effect = sf
return sf
def __set_side_effect(self, value):
value = _try_iter(value)
delegated = self._mock_delegate
if delegated is None:
self._mock_side_effect = value
else:
delegated.side_effect = value
side_effect = property(__get_side_effect, __set_side_effect)
def reset_mock(self, visited=None):
"Restore the mock object to its initial state."
if visited is None:
visited = []
if id(self) in visited:
return
visited.append(id(self))
self.called = False
self.call_args = None
self.call_count = 0
self.mock_calls = _CallList()
self.call_args_list = _CallList()
self.method_calls = _CallList()
for child in self._mock_children.values():
if isinstance(child, _SpecState):
continue
child.reset_mock(visited)
ret = self._mock_return_value
if _is_instance_mock(ret) and ret is not self:
ret.reset_mock(visited)
def configure_mock(self, **kwargs):
"""Set attributes on the mock through keyword arguments.
Attributes plus return values and side effects can be set on child
mocks using standard dot notation and unpacking a dictionary in the
method call:
>>> attrs = {'method.return_value': 3, 'other.side_effect': KeyError}
>>> mock.configure_mock(**attrs)"""
for arg, val in sorted(kwargs.items(),
# we sort on the number of dots so that
# attributes are set before we set attributes on
# attributes
key=lambda entry: entry[0].count('.')):
args = arg.split('.')
final = args.pop()
obj = self
for entry in args:
obj = getattr(obj, entry)
setattr(obj, final, val)
def __getattr__(self, name):
if name == '_mock_methods':
raise AttributeError(name)
elif self._mock_methods is not None:
if name not in self._mock_methods or name in _all_magics:
raise AttributeError("Mock object has no attribute %r" % name)
elif _is_magic(name):
raise AttributeError(name)
result = self._mock_children.get(name)
if result is _deleted:
raise AttributeError(name)
elif result is None:
wraps = None
if self._mock_wraps is not None:
# XXXX should we get the attribute without triggering code
# execution?
wraps = getattr(self._mock_wraps, name)
result = self._get_child_mock(
parent=self, name=name, wraps=wraps, _new_name=name,
_new_parent=self
)
self._mock_children[name] = result
elif isinstance(result, _SpecState):
result = create_autospec(
result.spec, result.spec_set, result.instance,
result.parent, result.name
)
self._mock_children[name] = result
return result
def __repr__(self):
_name_list = [self._mock_new_name]
_parent = self._mock_new_parent
last = self
dot = '.'
if _name_list == ['()']:
dot = ''
seen = set()
while _parent is not None:
last = _parent
_name_list.append(_parent._mock_new_name + dot)
dot = '.'
if _parent._mock_new_name == '()':
dot = ''
_parent = _parent._mock_new_parent
# use ids here so as not to call __hash__ on the mocks
if id(_parent) in seen:
break
seen.add(id(_parent))
_name_list = list(reversed(_name_list))
_first = last._mock_name or 'mock'
if len(_name_list) > 1:
if _name_list[1] not in ('()', '().'):
_first += '.'
_name_list[0] = _first
name = ''.join(_name_list)
name_string = ''
if name not in ('mock', 'mock.'):
name_string = ' name=%r' % name
spec_string = ''
if self._spec_class is not None:
spec_string = ' spec=%r'
if self._spec_set:
spec_string = ' spec_set=%r'
spec_string = spec_string % self._spec_class.__name__
return "<%s%s%s id='%s'>" % (
type(self).__name__,
name_string,
spec_string,
id(self)
)
def __dir__(self):
"""Filter the output of `dir(mock)` to only useful members."""
if not FILTER_DIR:
return object.__dir__(self)
extras = self._mock_methods or []
from_type = dir(type(self))
from_dict = list(self.__dict__)
from_type = [e for e in from_type if not e.startswith('_')]
from_dict = [e for e in from_dict if not e.startswith('_') or
_is_magic(e)]
return sorted(set(extras + from_type + from_dict +
list(self._mock_children)))
def __setattr__(self, name, value):
if name in _allowed_names:
# property setters go through here
return object.__setattr__(self, name, value)
elif (self._spec_set and self._mock_methods is not None and
name not in self._mock_methods and
name not in self.__dict__):
raise AttributeError("Mock object has no attribute '%s'" % name)
elif name in _unsupported_magics:
msg = 'Attempting to set unsupported magic method %r.' % name
raise AttributeError(msg)
elif name in _all_magics:
if self._mock_methods is not None and name not in self._mock_methods:
raise AttributeError("Mock object has no attribute '%s'" % name)
if not _is_instance_mock(value):
setattr(type(self), name, _get_method(name, value))
original = value
value = lambda *args, **kw: original(self, *args, **kw)
else:
# only set _new_name and not name so that mock_calls is tracked
# but not method calls
_check_and_set_parent(self, value, None, name)
setattr(type(self), name, value)
self._mock_children[name] = value
elif name == '__class__':
self._spec_class = value
return
else:
if _check_and_set_parent(self, value, name, name):
self._mock_children[name] = value
return object.__setattr__(self, name, value)
def __delattr__(self, name):
if name in _all_magics and name in type(self).__dict__:
delattr(type(self), name)
if name not in self.__dict__:
# for magic methods that are still MagicProxy objects and
# not set on the instance itself
return
if name in self.__dict__:
object.__delattr__(self, name)
obj = self._mock_children.get(name, _missing)
if obj is _deleted:
raise AttributeError(name)
if obj is not _missing:
del self._mock_children[name]
self._mock_children[name] = _deleted
def _format_mock_call_signature(self, args, kwargs):
name = self._mock_name or 'mock'
return _format_call_signature(name, args, kwargs)
def _format_mock_failure_message(self, args, kwargs):
message = 'Expected call: %s\nActual call: %s'
expected_string = self._format_mock_call_signature(args, kwargs)
call_args = self.call_args
if len(call_args) == 3:
call_args = call_args[1:]
actual_string = self._format_mock_call_signature(*call_args)
return message % (expected_string, actual_string)
def _call_matcher(self, _call):
"""
Given a call (or simply a (args, kwargs) tuple), return a
comparison key suitable for matching with other calls.
This is a best effort method which relies on the spec's signature,
if available, or falls back on the arguments themselves.
"""
sig = self._spec_signature
if sig is not None:
if len(_call) == 2:
name = ''
args, kwargs = _call
else:
name, args, kwargs = _call
try:
return name, sig.bind(*args, **kwargs)
except TypeError as e:
return e.with_traceback(None)
else:
return _call
def assert_called_with(_mock_self, *args, **kwargs):
"""assert that the mock was called with the specified arguments.
Raises an AssertionError if the args and keyword args passed in are
different to the last call to the mock."""
self = _mock_self
if self.call_args is None:
expected = self._format_mock_call_signature(args, kwargs)
raise AssertionError('Expected call: %s\nNot called' % (expected,))
def _error_message():
msg = self._format_mock_failure_message(args, kwargs)
return msg
expected = self._call_matcher((args, kwargs))
actual = self._call_matcher(self.call_args)
if expected != actual:
cause = expected if isinstance(expected, Exception) else None
raise AssertionError(_error_message()) from cause
def assert_called_once_with(_mock_self, *args, **kwargs):
"""assert that the mock was called exactly once and with the specified
arguments."""
self = _mock_self
if not self.call_count == 1:
msg = ("Expected '%s' to be called once. Called %s times." %
(self._mock_name or 'mock', self.call_count))
raise AssertionError(msg)
return self.assert_called_with(*args, **kwargs)
def assert_has_calls(self, calls, any_order=False):
"""assert the mock has been called with the specified calls.
The `mock_calls` list is checked for the calls.
If `any_order` is False (the default) then the calls must be
sequential. There can be extra calls before or after the
specified calls.
If `any_order` is True then the calls can be in any order, but
they must all appear in `mock_calls`."""
expected = [self._call_matcher(c) for c in calls]
cause = expected if isinstance(expected, Exception) else None
all_calls = _CallList(self._call_matcher(c) for c in self.mock_calls)
if not any_order:
if expected not in all_calls:
raise AssertionError(
'Calls not found.\nExpected: %r\n'
'Actual: %r' % (calls, self.mock_calls)
) from cause
return
all_calls = list(all_calls)
not_found = []
for kall in expected:
try:
all_calls.remove(kall)
except ValueError:
not_found.append(kall)
if not_found:
raise AssertionError(
'%r not all found in call list' % (tuple(not_found),)
) from cause
def assert_any_call(self, *args, **kwargs):
"""assert the mock has been called with the specified arguments.
The assert passes if the mock has *ever* been called, unlike
`assert_called_with` and `assert_called_once_with` that only pass if
the call is the most recent one."""
expected = self._call_matcher((args, kwargs))
actual = [self._call_matcher(c) for c in self.call_args_list]
if expected not in actual:
cause = expected if isinstance(expected, Exception) else None
expected_string = self._format_mock_call_signature(args, kwargs)
raise AssertionError(
'%s call not found' % expected_string
) from cause
def _get_child_mock(self, **kw):
"""Create the child mocks for attributes and return value.
By default child mocks will be the same type as the parent.
Subclasses of Mock may want to override this to customize the way
child mocks are made.
For non-callable mocks the callable variant will be used (rather than
any custom subclass)."""
_type = type(self)
if not issubclass(_type, CallableMixin):
if issubclass(_type, NonCallableMagicMock):
klass = MagicMock
elif issubclass(_type, NonCallableMock):
klass = Mock
else:
klass = _type.__mro__[1]
return klass(**kw)
def _try_iter(obj):
if obj is None:
return obj
if _is_exception(obj):
return obj
if _callable(obj):
return obj
try:
return iter(obj)
except TypeError:
# XXXX backwards compatibility
# but this will blow up on first call - so maybe we should fail early?
return obj
class CallableMixin(Base):
def __init__(self, spec=None, side_effect=None, return_value=DEFAULT,
wraps=None, name=None, spec_set=None, parent=None,
_spec_state=None, _new_name='', _new_parent=None, **kwargs):
self.__dict__['_mock_return_value'] = return_value
_safe_super(CallableMixin, self).__init__(
spec, wraps, name, spec_set, parent,
_spec_state, _new_name, _new_parent, **kwargs
)
self.side_effect = side_effect
def _mock_check_sig(self, *args, **kwargs):
# stub method that can be replaced with one with a specific signature
pass
def __call__(_mock_self, *args, **kwargs):
# can't use self in case a function / method we are mocking uses self
# in the signature
_mock_self._mock_check_sig(*args, **kwargs)
return _mock_self._mock_call(*args, **kwargs)
def _mock_call(_mock_self, *args, **kwargs):
self = _mock_self
self.called = True
self.call_count += 1
_new_name = self._mock_new_name
_new_parent = self._mock_new_parent
_call = _Call((args, kwargs), two=True)
self.call_args = _call
self.call_args_list.append(_call)
self.mock_calls.append(_Call(('', args, kwargs)))
seen = set()
skip_next_dot = _new_name == '()'
do_method_calls = self._mock_parent is not None
name = self._mock_name
while _new_parent is not None:
this_mock_call = _Call((_new_name, args, kwargs))
if _new_parent._mock_new_name:
dot = '.'
if skip_next_dot:
dot = ''
skip_next_dot = False
if _new_parent._mock_new_name == '()':
skip_next_dot = True
_new_name = _new_parent._mock_new_name + dot + _new_name
if do_method_calls:
if _new_name == name:
this_method_call = this_mock_call
else:
this_method_call = _Call((name, args, kwargs))
_new_parent.method_calls.append(this_method_call)
do_method_calls = _new_parent._mock_parent is not None
if do_method_calls:
name = _new_parent._mock_name + '.' + name
_new_parent.mock_calls.append(this_mock_call)
_new_parent = _new_parent._mock_new_parent
# use ids here so as not to call __hash__ on the mocks
_new_parent_id = id(_new_parent)
if _new_parent_id in seen:
break
seen.add(_new_parent_id)
ret_val = DEFAULT
effect = self.side_effect
if effect is not None:
if _is_exception(effect):
raise effect
if not _callable(effect):
result = next(effect)
if _is_exception(result):
raise result
if result is DEFAULT:
result = self.return_value
return result
ret_val = effect(*args, **kwargs)
if (self._mock_wraps is not None and
self._mock_return_value is DEFAULT):
return self._mock_wraps(*args, **kwargs)
if ret_val is DEFAULT:
ret_val = self.return_value
return ret_val
class Mock(CallableMixin, NonCallableMock):
"""
Create a new `Mock` object. `Mock` takes several optional arguments
that specify the behaviour of the Mock object:
* `spec`: This can be either a list of strings or an existing object (a
class or instance) that acts as the specification for the mock object. If
you pass in an object then a list of strings is formed by calling dir on
the object (excluding unsupported magic attributes and methods). Accessing
any attribute not in this list will raise an `AttributeError`.
If `spec` is an object (rather than a list of strings) then
`mock.__class__` returns the class of the spec object. This allows mocks
to pass `isinstance` tests.
* `spec_set`: A stricter variant of `spec`. If used, attempting to *set*
or get an attribute on the mock that isn't on the object passed as
`spec_set` will raise an `AttributeError`.
* `side_effect`: A function to be called whenever the Mock is called. See
the `side_effect` attribute. Useful for raising exceptions or
dynamically changing return values. The function is called with the same
arguments as the mock, and unless it returns `DEFAULT`, the return
value of this function is used as the return value.
If `side_effect` is an iterable then each call to the mock will return
the next value from the iterable. If any of the members of the iterable
are exceptions they will be raised instead of returned.
* `return_value`: The value returned when the mock is called. By default
this is a new Mock (created on first access). See the
`return_value` attribute.
* `wraps`: Item for the mock object to wrap. If `wraps` is not None then
calling the Mock will pass the call through to the wrapped object
(returning the real result). Attribute access on the mock will return a
Mock object that wraps the corresponding attribute of the wrapped object
(so attempting to access an attribute that doesn't exist will raise an
`AttributeError`).
If the mock has an explicit `return_value` set then calls are not passed
to the wrapped object and the `return_value` is returned instead.
* `name`: If the mock has a name then it will be used in the repr of the
mock. This can be useful for debugging. The name is propagated to child
mocks.
Mocks can also be called with arbitrary keyword arguments. These will be
used to set attributes on the mock after it is created.
"""
def _dot_lookup(thing, comp, import_path):
try:
return getattr(thing, comp)
except AttributeError:
__import__(import_path)
return getattr(thing, comp)
def _importer(target):
components = target.split('.')
import_path = components.pop(0)
thing = __import__(import_path)
for comp in components:
import_path += ".%s" % comp
thing = _dot_lookup(thing, comp, import_path)
return thing
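# For example (illustrative):
#     _importer('os.path')            ->  the os.path module
#     _importer('unittest.mock.Mock') ->  the Mock class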
def _is_started(patcher):
# XXXX horrible
return hasattr(patcher, 'is_local')
class _patch(object):
attribute_name = None
_active_patches = []
def __init__(
self, getter, attribute, new, spec, create,
spec_set, autospec, new_callable, kwargs
):
if new_callable is not None:
if new is not DEFAULT:
raise ValueError(
"Cannot use 'new' and 'new_callable' together"
)
if autospec is not None:
raise ValueError(
"Cannot use 'autospec' and 'new_callable' together"
)
self.getter = getter
self.attribute = attribute
self.new = new
self.new_callable = new_callable
self.spec = spec
self.create = create
self.has_local = False
self.spec_set = spec_set
self.autospec = autospec
self.kwargs = kwargs
self.additional_patchers = []
def copy(self):
patcher = _patch(
self.getter, self.attribute, self.new, self.spec,
self.create, self.spec_set,
self.autospec, self.new_callable, self.kwargs
)
patcher.attribute_name = self.attribute_name
patcher.additional_patchers = [
p.copy() for p in self.additional_patchers
]
return patcher
def __call__(self, func):
if isinstance(func, type):
return self.decorate_class(func)
return self.decorate_callable(func)
def decorate_class(self, klass):
for attr in dir(klass):
if not attr.startswith(patch.TEST_PREFIX):
continue
attr_value = getattr(klass, attr)
if not hasattr(attr_value, "__call__"):
continue
patcher = self.copy()
setattr(klass, attr, patcher(attr_value))
return klass
def decorate_callable(self, func):
if hasattr(func, 'patchings'):
func.patchings.append(self)
return func
@wraps(func)
def patched(*args, **keywargs):
extra_args = []
entered_patchers = []
exc_info = tuple()
try:
for patching in patched.patchings:
arg = patching.__enter__()
entered_patchers.append(patching)
if patching.attribute_name is not None:
keywargs.update(arg)
elif patching.new is DEFAULT:
extra_args.append(arg)
args += tuple(extra_args)
return func(*args, **keywargs)
except:
if (patching not in entered_patchers and
_is_started(patching)):
# the patcher may have been started, but an exception
# raised whilst entering one of its additional_patchers
entered_patchers.append(patching)
# Pass the exception to __exit__
exc_info = sys.exc_info()
# re-raise the exception
raise
finally:
for patching in reversed(entered_patchers):
patching.__exit__(*exc_info)
patched.patchings = [self]
return patched
def get_original(self):
target = self.getter()
name = self.attribute
original = DEFAULT
local = False
try:
original = target.__dict__[name]
except (AttributeError, KeyError):
original = getattr(target, name, DEFAULT)
else:
local = True
if not self.create and original is DEFAULT:
raise AttributeError(
"%s does not have the attribute %r" % (target, name)
)
return original, local
def __enter__(self):
"""Perform the patch."""
new, spec, spec_set = self.new, self.spec, self.spec_set
autospec, kwargs = self.autospec, self.kwargs
new_callable = self.new_callable
self.target = self.getter()
# normalise False to None
if spec is False:
spec = None
if spec_set is False:
spec_set = None
if autospec is False:
autospec = None
if spec is not None and autospec is not None:
raise TypeError("Can't specify spec and autospec")
if ((spec is not None or autospec is not None) and
spec_set not in (True, None)):
raise TypeError("Can't provide explicit spec_set *and* spec or autospec")
original, local = self.get_original()
if new is DEFAULT and autospec is None:
inherit = False
if spec is True:
# set spec to the object we are replacing
spec = original
if spec_set is True:
spec_set = original
spec = None
elif spec is not None:
if spec_set is True:
spec_set = spec
spec = None
elif spec_set is True:
spec_set = original
if spec is not None or spec_set is not None:
if original is DEFAULT:
raise TypeError("Can't use 'spec' with create=True")
if isinstance(original, type):
# If we're patching out a class and there is a spec
inherit = True
Klass = MagicMock
_kwargs = {}
if new_callable is not None:
Klass = new_callable
elif spec is not None or spec_set is not None:
this_spec = spec
if spec_set is not None:
this_spec = spec_set
if _is_list(this_spec):
not_callable = '__call__' not in this_spec
else:
not_callable = not callable(this_spec)
if not_callable:
Klass = NonCallableMagicMock
if spec is not None:
_kwargs['spec'] = spec
if spec_set is not None:
_kwargs['spec_set'] = spec_set
# add a name to mocks
if (isinstance(Klass, type) and
issubclass(Klass, NonCallableMock) and self.attribute):
_kwargs['name'] = self.attribute
_kwargs.update(kwargs)
new = Klass(**_kwargs)
if inherit and _is_instance_mock(new):
# we can only tell if the instance should be callable if the
# spec is not a list
this_spec = spec
if spec_set is not None:
this_spec = spec_set
if (not _is_list(this_spec) and not
_instance_callable(this_spec)):
Klass = NonCallableMagicMock
_kwargs.pop('name')
new.return_value = Klass(_new_parent=new, _new_name='()',
**_kwargs)
elif autospec is not None:
# spec is ignored, new *must* be default, spec_set is treated
# as a boolean. Should we check spec is not None and that spec_set
# is a bool?
if new is not DEFAULT:
raise TypeError(
"autospec creates the mock for you. Can't specify "
"autospec and new."
)
if original is DEFAULT:
raise TypeError("Can't use 'autospec' with create=True")
spec_set = bool(spec_set)
if autospec is True:
autospec = original
new = create_autospec(autospec, spec_set=spec_set,
_name=self.attribute, **kwargs)
elif kwargs:
# can't set keyword args when we aren't creating the mock
# XXXX If new is a Mock we could call new.configure_mock(**kwargs)
raise TypeError("Can't pass kwargs to a mock we aren't creating")
new_attr = new
self.temp_original = original
self.is_local = local
setattr(self.target, self.attribute, new_attr)
if self.attribute_name is not None:
extra_args = {}
if self.new is DEFAULT:
extra_args[self.attribute_name] = new
for patching in self.additional_patchers:
arg = patching.__enter__()
if patching.new is DEFAULT:
extra_args.update(arg)
return extra_args
return new
def __exit__(self, *exc_info):
"""Undo the patch."""
if not _is_started(self):
raise RuntimeError('stop called on unstarted patcher')
if self.is_local and self.temp_original is not DEFAULT:
setattr(self.target, self.attribute, self.temp_original)
else:
delattr(self.target, self.attribute)
if not self.create and not hasattr(self.target, self.attribute):
# needed for proxy objects like django settings
setattr(self.target, self.attribute, self.temp_original)
del self.temp_original
del self.is_local
del self.target
for patcher in reversed(self.additional_patchers):
if _is_started(patcher):
patcher.__exit__(*exc_info)
def start(self):
"""Activate a patch, returning any created mock."""
result = self.__enter__()
self._active_patches.append(self)
return result
def stop(self):
"""Stop an active patch."""
try:
self._active_patches.remove(self)
except ValueError:
# If the patch hasn't been started this will fail
pass
return self.__exit__()
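# Illustrative usage (not part of the original source; 'pkg.mod.Thing' is a
# hypothetical patch target):
#
#     patcher = patch('pkg.mod.Thing')
#     mock_thing = patcher.start()
#     try:
#         ...                       # code under test sees the mock
#     finally:
#         patcher.stop()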
def _get_target(target):
try:
target, attribute = target.rsplit('.', 1)
except (TypeError, ValueError):
raise TypeError("Need a valid target to patch. You supplied: %r" %
(target,))
getter = lambda: _importer(target)
return getter, attribute
def _patch_object(
target, attribute, new=DEFAULT, spec=None,
create=False, spec_set=None, autospec=None,
new_callable=None, **kwargs
):
"""
patch the named member (`attribute`) on an object (`target`) with a mock
object.
`patch.object` can be used as a decorator, class decorator or a context
manager. Arguments `new`, `spec`, `create`, `spec_set`,
`autospec` and `new_callable` have the same meaning as for `patch`. Like
`patch`, `patch.object` takes arbitrary keyword arguments for configuring
the mock object it creates.
When used as a class decorator `patch.object` honours `patch.TEST_PREFIX`
for choosing which methods to wrap.
"""
getter = lambda: target
return _patch(
getter, attribute, new, spec, create,
spec_set, autospec, new_callable, kwargs
)
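# Editor's note: an illustrative sketch of `patch.object` (the `_Sample`
# class is hypothetical, defined only for this example).
def _example_patch_object():
    class _Sample(object):
        def greet(self):
            return 'hello'
    with patch.object(_Sample, 'greet', return_value='patched') as mock_greet:
        assert _Sample().greet() == 'patched'
        mock_greet.assert_called_once_with()
    # outside the with-block the original method is restored
    assert _Sample().greet() == 'hello'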
def _patch_multiple(target, spec=None, create=False, spec_set=None,
autospec=None, new_callable=None, **kwargs):
"""Perform multiple patches in a single call. It takes the object to be
patched (either as an object or a string to fetch the object by importing)
and keyword arguments for the patches::
with patch.multiple(settings, FIRST_PATCH='one', SECOND_PATCH='two'):
...
Use `DEFAULT` as the value if you want `patch.multiple` to create
mocks for you. In this case the created mocks are passed into a decorated
function by keyword, and a dictionary is returned when `patch.multiple` is
used as a context manager.
`patch.multiple` can be used as a decorator, class decorator or a context
manager. The arguments `spec`, `spec_set`, `create`,
`autospec` and `new_callable` have the same meaning as for `patch`. These
arguments will be applied to *all* patches done by `patch.multiple`.
When used as a class decorator `patch.multiple` honours `patch.TEST_PREFIX`
for choosing which methods to wrap.
"""
if type(target) is str:
getter = lambda: _importer(target)
else:
getter = lambda: target
if not kwargs:
raise ValueError(
'Must supply at least one keyword argument with patch.multiple'
)
# need to wrap in a list for python 3, where items is a view
items = list(kwargs.items())
attribute, new = items[0]
patcher = _patch(
getter, attribute, new, spec, create, spec_set,
autospec, new_callable, {}
)
patcher.attribute_name = attribute
for attribute, new in items[1:]:
this_patcher = _patch(
getter, attribute, new, spec, create, spec_set,
autospec, new_callable, {}
)
this_patcher.attribute_name = attribute
patcher.additional_patchers.append(this_patcher)
return patcher
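# Editor's note: a sketch of `patch.multiple` with DEFAULT, so the mocks are
# created for us and returned as a dictionary (`_Thing` is hypothetical).
def _example_patch_multiple():
    class _Thing(object):
        first = 'one'
        second = 'two'
    with patch.multiple(_Thing, first=DEFAULT, second=DEFAULT) as mocks:
        assert isinstance(mocks['first'], MagicMock)
        assert isinstance(mocks['second'], MagicMock)
    # originals restored on exit
    assert _Thing.first == 'one' and _Thing.second == 'two'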
def patch(
target, new=DEFAULT, spec=None, create=False,
spec_set=None, autospec=None, new_callable=None, **kwargs
):
"""
`patch` acts as a function decorator, class decorator or a context
manager. Inside the body of the function or with statement, the `target`
is patched with a `new` object. When the function/with statement exits
the patch is undone.
If `new` is omitted, then the target is replaced with a
`MagicMock`. If `patch` is used as a decorator and `new` is
omitted, the created mock is passed in as an extra argument to the
decorated function. If `patch` is used as a context manager the created
mock is returned by the context manager.
`target` should be a string in the form `'package.module.ClassName'`. The
`target` is imported and the specified object replaced with the `new`
object, so the `target` must be importable from the environment you are
calling `patch` from. The target is imported when the decorated function
is executed, not at decoration time.
The `spec` and `spec_set` keyword arguments are passed to the `MagicMock`
if patch is creating one for you.
In addition you can pass `spec=True` or `spec_set=True`, which causes
patch to pass in the object being mocked as the spec/spec_set object.
`new_callable` allows you to specify a different class, or callable object,
that will be called to create the `new` object. By default `MagicMock` is
used.
A more powerful form of `spec` is `autospec`. If you set `autospec=True`
then the mock will be created with a spec from the object being replaced.
All attributes of the mock will also have the spec of the corresponding
attribute of the object being replaced. Methods and functions being
mocked will have their arguments checked and will raise a `TypeError` if
they are called with the wrong signature. For mocks replacing a class,
their return value (the 'instance') will have the same spec as the class.
Instead of `autospec=True` you can pass `autospec=some_object` to use an
arbitrary object as the spec instead of the one being replaced.
By default `patch` will fail to replace attributes that don't exist. If
you pass in `create=True`, and the attribute doesn't exist, patch will
create the attribute for you when the patched function is called, and
delete it again afterwards. This is useful for writing tests against
attributes that your production code creates at runtime. It is off by
default because it can be dangerous. With it switched on you can write
passing tests against APIs that don't actually exist!
Patch can be used as a `TestCase` class decorator. It works by
decorating each test method in the class. This reduces the boilerplate
code when your test methods share a common set of patches. `patch` finds
tests by looking for method names that start with `patch.TEST_PREFIX`.
By default this is `test`, which matches the way `unittest` finds tests.
You can specify an alternative prefix by setting `patch.TEST_PREFIX`.
Patch can be used as a context manager, with the with statement. Here the
patching applies to the indented block after the with statement. If you
use "as" then the patched object will be bound to the name after the
"as"; very useful if `patch` is creating a mock object for you.
`patch` takes arbitrary keyword arguments. These will be passed to
the `Mock` (or `new_callable`) on construction.
`patch.dict(...)`, `patch.multiple(...)` and `patch.object(...)` are
available for alternate use-cases.
"""
getter, attribute = _get_target(target)
return _patch(
getter, attribute, new, spec, create,
spec_set, autospec, new_callable, kwargs
)
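# Editor's note: a minimal sketch of `patch` as a context manager; the dotted
# target 'os.getcwd' is importable from anywhere, which is all `patch` needs.
def _example_patch():
    with patch('os.getcwd', return_value='/tmp/fake') as mock_getcwd:
        import os
        assert os.getcwd() == '/tmp/fake'
    mock_getcwd.assert_called_once_with()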
class _patch_dict(object):
"""
Patch a dictionary, or dictionary like object, and restore the dictionary
to its original state after the test.
`in_dict` can be a dictionary or a mapping like container. If it is a
mapping then it must at least support getting, setting and deleting items
plus iterating over keys.
`in_dict` can also be a string specifying the name of the dictionary, which
will then be fetched by importing it.
`values` can be a dictionary of values to set in the dictionary. `values`
can also be an iterable of `(key, value)` pairs.
If `clear` is True then the dictionary will be cleared before the new
values are set.
`patch.dict` can also be called with arbitrary keyword arguments to set
values in the dictionary::
with patch.dict('sys.modules', mymodule=Mock(), other_module=Mock()):
...
`patch.dict` can be used as a context manager, decorator or class
decorator. When used as a class decorator `patch.dict` honours
`patch.TEST_PREFIX` for choosing which methods to wrap.
"""
def __init__(self, in_dict, values=(), clear=False, **kwargs):
if isinstance(in_dict, str):
in_dict = _importer(in_dict)
self.in_dict = in_dict
# support any argument supported by dict(...) constructor
self.values = dict(values)
self.values.update(kwargs)
self.clear = clear
self._original = None
def __call__(self, f):
if isinstance(f, type):
return self.decorate_class(f)
@wraps(f)
def _inner(*args, **kw):
self._patch_dict()
try:
return f(*args, **kw)
finally:
self._unpatch_dict()
return _inner
def decorate_class(self, klass):
for attr in dir(klass):
attr_value = getattr(klass, attr)
if (attr.startswith(patch.TEST_PREFIX) and
hasattr(attr_value, "__call__")):
decorator = _patch_dict(self.in_dict, self.values, self.clear)
decorated = decorator(attr_value)
setattr(klass, attr, decorated)
return klass
def __enter__(self):
"""Patch the dict."""
self._patch_dict()
def _patch_dict(self):
values = self.values
in_dict = self.in_dict
clear = self.clear
try:
original = in_dict.copy()
except AttributeError:
# dict like object with no copy method
# must support iteration over keys
original = {}
for key in in_dict:
original[key] = in_dict[key]
self._original = original
if clear:
_clear_dict(in_dict)
try:
in_dict.update(values)
except AttributeError:
# dict like object with no update method
for key in values:
in_dict[key] = values[key]
def _unpatch_dict(self):
in_dict = self.in_dict
original = self._original
_clear_dict(in_dict)
try:
in_dict.update(original)
except AttributeError:
for key in original:
in_dict[key] = original[key]
def __exit__(self, *args):
"""Unpatch the dict."""
self._unpatch_dict()
return False
start = __enter__
stop = __exit__
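# Editor's note: a brief sketch of `patch.dict`; extra keyword arguments set
# keys just like the `values` mapping, and the original contents are restored
# on exit.
def _example_patch_dict():
    config = {'user': 'alice'}
    with patch.dict(config, {'user': 'bob'}, retries=3):
        assert config == {'user': 'bob', 'retries': 3}
    assert config == {'user': 'alice'}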
def _clear_dict(in_dict):
try:
in_dict.clear()
except AttributeError:
keys = list(in_dict)
for key in keys:
del in_dict[key]
def _patch_stopall():
"""Stop all active patches. LIFO to unroll nested patches."""
for patch in reversed(_patch._active_patches):
patch.stop()
patch.object = _patch_object
patch.dict = _patch_dict
patch.multiple = _patch_multiple
patch.stopall = _patch_stopall
patch.TEST_PREFIX = 'test'
magic_methods = (
"lt le gt ge eq ne "
"getitem setitem delitem "
"len contains iter "
"hash str sizeof "
"enter exit "
# we added divmod and rdivmod here instead of numerics
# because there is no idivmod
"divmod rdivmod neg pos abs invert "
"complex int float index "
"trunc floor ceil "
"bool next "
)
numerics = (
"add sub mul div floordiv mod lshift rshift and xor or pow truediv"
)
inplace = ' '.join('i%s' % n for n in numerics.split())
right = ' '.join('r%s' % n for n in numerics.split())
# not including __prepare__, __instancecheck__, __subclasscheck__
# (as they are metaclass methods)
# __del__ is not supported at all as it causes problems if it exists
_non_defaults = set('__%s__' % method for method in [
'get', 'set', 'delete', 'reversed', 'missing', 'reduce', 'reduce_ex',
'getinitargs', 'getnewargs', 'getstate', 'setstate', 'getformat',
'setformat', 'repr', 'dir', 'subclasses', 'format',
])
def _get_method(name, func):
"Turns a callable object (like a mock) into a real function"
def method(self, *args, **kw):
return func(self, *args, **kw)
method.__name__ = name
return method
_magics = set(
'__%s__' % method for method in
' '.join([magic_methods, numerics, inplace, right]).split()
)
_all_magics = _magics | _non_defaults
_unsupported_magics = set([
'__getattr__', '__setattr__',
'__init__', '__new__', '__prepare__',
'__instancecheck__', '__subclasscheck__',
'__del__'
])
_calculate_return_value = {
'__hash__': lambda self: object.__hash__(self),
'__str__': lambda self: object.__str__(self),
'__sizeof__': lambda self: object.__sizeof__(self),
}
_return_values = {
'__lt__': NotImplemented,
'__gt__': NotImplemented,
'__le__': NotImplemented,
'__ge__': NotImplemented,
'__int__': 1,
'__contains__': False,
'__len__': 0,
'__exit__': False,
'__complex__': 1j,
'__float__': 1.0,
'__bool__': True,
'__index__': 1,
}
def _get_eq(self):
def __eq__(other):
ret_val = self.__eq__._mock_return_value
if ret_val is not DEFAULT:
return ret_val
return self is other
return __eq__
def _get_ne(self):
def __ne__(other):
ret_val = self.__ne__._mock_return_value
if ret_val is not DEFAULT:
return ret_val
return self is not other
return __ne__
def _get_iter(self):
def __iter__():
ret_val = self.__iter__._mock_return_value
if ret_val is DEFAULT:
return iter([])
# if ret_val was already an iterator, then calling iter on it should
# return the iterator unchanged
return iter(ret_val)
return __iter__
_side_effect_methods = {
'__eq__': _get_eq,
'__ne__': _get_ne,
'__iter__': _get_iter,
}
def _set_return_value(mock, method, name):
fixed = _return_values.get(name, DEFAULT)
if fixed is not DEFAULT:
method.return_value = fixed
return
return_calculator = _calculate_return_value.get(name)
if return_calculator is not None:
try:
return_value = return_calculator(mock)
except AttributeError:
# XXXX why do we return AttributeError here?
# set it as a side_effect instead?
return_value = AttributeError(name)
method.return_value = return_value
return
side_effector = _side_effect_methods.get(name)
if side_effector is not None:
method.side_effect = side_effector(mock)
class MagicMixin(object):
def __init__(self, *args, **kw):
self._mock_set_magics() # make magic work for kwargs in init
_safe_super(MagicMixin, self).__init__(*args, **kw)
self._mock_set_magics() # fix magic broken by upper level init
def _mock_set_magics(self):
these_magics = _magics
if getattr(self, "_mock_methods", None) is not None:
these_magics = _magics.intersection(self._mock_methods)
remove_magics = set()
remove_magics = _magics - these_magics
for entry in remove_magics:
if entry in type(self).__dict__:
# remove unneeded magic methods
delattr(self, entry)
# don't overwrite existing attributes if called a second time
these_magics = these_magics - set(type(self).__dict__)
_type = type(self)
for entry in these_magics:
setattr(_type, entry, MagicProxy(entry, self))
class NonCallableMagicMock(MagicMixin, NonCallableMock):
"""A version of `MagicMock` that isn't callable."""
def mock_add_spec(self, spec, spec_set=False):
"""Add a spec to a mock. `spec` can either be an object or a
list of strings. Only attributes on the `spec` can be fetched as
attributes from the mock.
If `spec_set` is True then only attributes on the spec can be set."""
self._mock_add_spec(spec, spec_set)
self._mock_set_magics()
class MagicMock(MagicMixin, Mock):
"""
MagicMock is a subclass of Mock with default implementations
of most of the magic methods. You can use MagicMock without having to
configure the magic methods yourself.
If you use the `spec` or `spec_set` arguments then *only* magic
methods that exist in the spec will be created.
Attributes and the return value of a `MagicMock` will also be `MagicMocks`.
"""
def mock_add_spec(self, spec, spec_set=False):
"""Add a spec to a mock. `spec` can either be an object or a
list of strings. Only attributes on the `spec` can be fetched as
attributes from the mock.
If `spec_set` is True then only attributes on the spec can be set."""
self._mock_add_spec(spec, spec_set)
self._mock_set_magics()
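# Editor's note: a short sketch of MagicMock's preconfigured magic methods,
# and of overriding one of them via return_value.
def _example_magic_mock():
    m = MagicMock()
    assert len(m) == 0            # __len__ defaults to 0 (see _return_values)
    m.__len__.return_value = 3
    assert len(m) == 3
    m.__str__.return_value = 'stubbed'
    assert str(m) == 'stubbed'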
class MagicProxy(object):
def __init__(self, name, parent):
self.name = name
self.parent = parent
def __call__(self, *args, **kwargs):
m = self.create_mock()
return m(*args, **kwargs)
def create_mock(self):
entry = self.name
parent = self.parent
m = parent._get_child_mock(name=entry, _new_name=entry,
_new_parent=parent)
setattr(parent, entry, m)
_set_return_value(parent, m, entry)
return m
def __get__(self, obj, _type=None):
return self.create_mock()
class _ANY(object):
"A helper object that compares equal to everything."
def __eq__(self, other):
return True
def __ne__(self, other):
return False
def __repr__(self):
return '<ANY>'
ANY = _ANY()
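# Editor's note: a quick sketch of ANY in call assertions; because _ANY
# compares equal to everything, it can stand in for arguments you don't
# care about (`process` is a hypothetical method name).
def _example_any():
    m = MagicMock()
    m.process(b'payload', timeout=30)
    m.process.assert_called_once_with(ANY, timeout=30)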
def _format_call_signature(name, args, kwargs):
message = '%s(%%s)' % name
formatted_args = ''
args_string = ', '.join([repr(arg) for arg in args])
kwargs_string = ', '.join([
'%s=%r' % (key, value) for key, value in kwargs.items()
])
if args_string:
formatted_args = args_string
if kwargs_string:
if formatted_args:
formatted_args += ', '
formatted_args += kwargs_string
return message % formatted_args
class _Call(tuple):
"""
A tuple for holding the results of a call to a mock, either in the form
`(args, kwargs)` or `(name, args, kwargs)`.
If args or kwargs are empty then a call tuple will compare equal to
a tuple without those values. This makes comparisons less verbose::
_Call(('name', (), {})) == ('name',)
_Call(('name', (1,), {})) == ('name', (1,))
_Call(((), {'a': 'b'})) == ({'a': 'b'},)
The `_Call` object provides a useful shortcut for comparing with call::
_Call(((1, 2), {'a': 3})) == call(1, 2, a=3)
_Call(('foo', (1, 2), {'a': 3})) == call.foo(1, 2, a=3)
If the _Call has no name then it will match any name.
"""
def __new__(cls, value=(), name=None, parent=None, two=False,
from_kall=True):
name = ''
args = ()
kwargs = {}
_len = len(value)
if _len == 3:
name, args, kwargs = value
elif _len == 2:
first, second = value
if isinstance(first, str):
name = first
if isinstance(second, tuple):
args = second
else:
kwargs = second
else:
args, kwargs = first, second
elif _len == 1:
value, = value
if isinstance(value, str):
name = value
elif isinstance(value, tuple):
args = value
else:
kwargs = value
if two:
return tuple.__new__(cls, (args, kwargs))
return tuple.__new__(cls, (name, args, kwargs))
def __init__(self, value=(), name=None, parent=None, two=False,
from_kall=True):
self.name = name
self.parent = parent
self.from_kall = from_kall
def __eq__(self, other):
if other is ANY:
return True
try:
len_other = len(other)
except TypeError:
return False
self_name = ''
if len(self) == 2:
self_args, self_kwargs = self
else:
self_name, self_args, self_kwargs = self
other_name = ''
if len_other == 0:
other_args, other_kwargs = (), {}
elif len_other == 3:
other_name, other_args, other_kwargs = other
elif len_other == 1:
value, = other
if isinstance(value, tuple):
other_args = value
other_kwargs = {}
elif isinstance(value, str):
other_name = value
other_args, other_kwargs = (), {}
else:
other_args = ()
other_kwargs = value
elif len_other == 2:
# could be (name, args) or (name, kwargs) or (args, kwargs)
first, second = other
if isinstance(first, str):
other_name = first
if isinstance(second, tuple):
other_args, other_kwargs = second, {}
else:
other_args, other_kwargs = (), second
else:
other_args, other_kwargs = first, second
else:
return False
if self_name and other_name != self_name:
return False
# this order is important for ANY to work!
return (other_args, other_kwargs) == (self_args, self_kwargs)
def __ne__(self, other):
return not self.__eq__(other)
def __call__(self, *args, **kwargs):
if self.name is None:
return _Call(('', args, kwargs), name='()')
name = self.name + '()'
return _Call((self.name, args, kwargs), name=name, parent=self)
def __getattr__(self, attr):
if self.name is None:
return _Call(name=attr, from_kall=False)
name = '%s.%s' % (self.name, attr)
return _Call(name=name, parent=self, from_kall=False)
def __repr__(self):
if not self.from_kall:
name = self.name or 'call'
if name.startswith('()'):
name = 'call%s' % name
return name
if len(self) == 2:
name = 'call'
args, kwargs = self
else:
name, args, kwargs = self
if not name:
name = 'call'
elif not name.startswith('()'):
name = 'call.%s' % name
else:
name = 'call%s' % name
return _format_call_signature(name, args, kwargs)
def call_list(self):
"""For a call object that represents multiple calls, `call_list`
returns a list of all the intermediate calls as well as the
final call."""
vals = []
thing = self
while thing is not None:
if thing.from_kall:
vals.append(thing)
thing = thing.parent
return _CallList(reversed(vals))
call = _Call(from_kall=False)
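# Editor's note: a sketch of the `call` helper defined above, including the
# call_list expansion of a chained call.
def _example_call():
    m = MagicMock()
    m.method(1, 2, a=3)
    assert m.method.call_args == call(1, 2, a=3)
    chained = call.connect().cursor()
    assert chained.call_list() == [call.connect(), call.connect().cursor()]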
def create_autospec(spec, spec_set=False, instance=False, _parent=None,
_name=None, **kwargs):
"""Create a mock object using another object as a spec. Attributes on the
mock will use the corresponding attribute on the `spec` object as their
spec.
Functions or methods being mocked will have their arguments checked
to ensure that they are called with the correct signature.
If `spec_set` is True then attempting to set attributes that don't exist
on the spec object will raise an `AttributeError`.
If a class is used as a spec then the return value of the mock (the
instance of the class) will have the same spec. You can use a class as the
spec for an instance object by passing `instance=True`. The returned mock
will only be callable if instances of the mock are callable.
`create_autospec` also takes arbitrary keyword arguments that are passed to
the constructor of the created mock."""
if _is_list(spec):
# can't pass a list instance to the mock constructor as it will be
# interpreted as a list of strings
spec = type(spec)
is_type = isinstance(spec, type)
_kwargs = {'spec': spec}
if spec_set:
_kwargs = {'spec_set': spec}
elif spec is None:
# None we mock with a normal mock without a spec
_kwargs = {}
if _kwargs and instance:
_kwargs['_spec_as_instance'] = True
_kwargs.update(kwargs)
Klass = MagicMock
if type(spec) in DescriptorTypes:
# descriptors don't have a spec
# because we don't know what type they return
_kwargs = {}
elif not _callable(spec):
Klass = NonCallableMagicMock
elif is_type and instance and not _instance_callable(spec):
Klass = NonCallableMagicMock
_name = _kwargs.pop('name', _name)
_new_name = _name
if _parent is None:
# for a top level object no _new_name should be set
_new_name = ''
mock = Klass(parent=_parent, _new_parent=_parent, _new_name=_new_name,
name=_name, **_kwargs)
if isinstance(spec, FunctionTypes):
# should only happen at the top level because we don't
# recurse for functions
mock = _set_signature(mock, spec)
else:
_check_signature(spec, mock, is_type, instance)
if _parent is not None and not instance:
_parent._mock_children[_name] = mock
if is_type and not instance and 'return_value' not in kwargs:
mock.return_value = create_autospec(spec, spec_set, instance=True,
_name='()', _parent=mock)
for entry in dir(spec):
if _is_magic(entry):
# MagicMock already does the useful magic methods for us
continue
# XXXX do we need a better way of getting attributes without
# triggering code execution (?) Probably not - we need the actual
# object to mock it so we would rather trigger a property than mock
# the property descriptor. Likewise we want to mock out dynamically
# provided attributes.
# XXXX what about attributes that raise exceptions other than
# AttributeError on being fetched?
# we could be resilient against it, or catch and propagate the
# exception when the attribute is fetched from the mock
try:
original = getattr(spec, entry)
except AttributeError:
continue
kwargs = {'spec': original}
if spec_set:
kwargs = {'spec_set': original}
if not isinstance(original, FunctionTypes):
new = _SpecState(original, spec_set, mock, entry, instance)
mock._mock_children[entry] = new
else:
parent = mock
if isinstance(spec, FunctionTypes):
parent = mock.mock
skipfirst = _must_skip(spec, entry, is_type)
kwargs['_eat_self'] = skipfirst
new = MagicMock(parent=parent, name=entry, _new_name=entry,
_new_parent=parent,
**kwargs)
mock._mock_children[entry] = new
_check_signature(original, new, skipfirst=skipfirst)
# so functions created with _set_signature become instance attributes,
# *plus* their underlying mock exists in _mock_children of the parent
# mock. Adding to _mock_children may be unnecessary where we are also
# setting as an instance attribute?
if isinstance(new, FunctionTypes):
setattr(mock, entry, new)
return mock
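# Editor's note: a minimal sketch of create_autospec; `transfer` is a
# hypothetical function that exists only to be used as a spec.
def _example_create_autospec():
    def transfer(source, dest, amount=0):
        return amount
    mock_transfer = create_autospec(transfer, return_value=10)
    assert mock_transfer('a', 'b', amount=5) == 10
    try:
        mock_transfer('a')        # missing the required `dest` argument
    except TypeError:
        pass                      # the real signature is enforced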
def _must_skip(spec, entry, is_type):
"""
Return whether we should skip the first argument on spec's `entry`
attribute.
"""
if not isinstance(spec, type):
if entry in getattr(spec, '__dict__', {}):
# instance attribute - shouldn't skip
return False
spec = spec.__class__
for klass in spec.__mro__:
result = klass.__dict__.get(entry, DEFAULT)
if result is DEFAULT:
continue
if isinstance(result, (staticmethod, classmethod)):
return False
elif isinstance(getattr(result, '__get__', None), MethodWrapperTypes):
# Normal method => skip if looked up on type
# (if looked up on instance, self is already skipped)
return is_type
else:
return False
# shouldn't get here unless function is a dynamically provided attribute
# XXXX untested behaviour
return is_type
def _get_class(obj):
try:
return obj.__class__
except AttributeError:
# it is possible for objects to have no __class__
return type(obj)
class _SpecState(object):
def __init__(self, spec, spec_set=False, parent=None,
name=None, ids=None, instance=False):
self.spec = spec
self.ids = ids
self.spec_set = spec_set
self.parent = parent
self.instance = instance
self.name = name
FunctionTypes = (
# python function
type(create_autospec),
# instance method
type(ANY.__eq__),
)
MethodWrapperTypes = (
type(ANY.__eq__.__get__),
)
file_spec = None
def _iterate_read_data(read_data):
# Helper for mock_open:
# Retrieve lines from read_data via a generator so that separate calls to
# readline, read, and readlines are properly interleaved
sep = b'\n' if isinstance(read_data, bytes) else '\n'
data_as_list = [l + sep for l in read_data.split(sep)]
if data_as_list[-1] == sep:
# If the last line ended in a newline, the list comprehension will have an
# extra entry that's just a newline. Remove this.
data_as_list = data_as_list[:-1]
else:
# If there wasn't an extra newline by itself, then the file being
# emulated doesn't have a newline to end the last line; remove the
# newline that our naive format() added.
data_as_list[-1] = data_as_list[-1][:-1]
for line in data_as_list:
yield line
def mock_open(mock=None, read_data=''):
"""
A helper function to create a mock to replace the use of `open`. It works
for `open` called directly or used as a context manager.
The `mock` argument is the mock object to configure. If `None` (the
default) then a `MagicMock` will be created for you, with the API limited
to methods or attributes available on standard file handles.
`read_data` is a string for the `read`, `readline`, and `readlines` methods
of the file handle to return. This is an empty string by default.
"""
def _readlines_side_effect(*args, **kwargs):
if handle.readlines.return_value is not None:
return handle.readlines.return_value
return list(_state[0])
def _read_side_effect(*args, **kwargs):
if handle.read.return_value is not None:
return handle.read.return_value
return type(read_data)().join(_state[0])
def _readline_side_effect():
if handle.readline.return_value is not None:
while True:
yield handle.readline.return_value
for line in _state[0]:
yield line
global file_spec
if file_spec is None:
import _io
file_spec = list(set(dir(_io.TextIOWrapper)).union(set(dir(_io.BytesIO))))
if mock is None:
mock = MagicMock(name='open', spec=open)
handle = MagicMock(spec=file_spec)
handle.__enter__.return_value = handle
_state = [_iterate_read_data(read_data), None]
handle.write.return_value = None
handle.read.return_value = None
handle.readline.return_value = None
handle.readlines.return_value = None
handle.read.side_effect = _read_side_effect
_state[1] = _readline_side_effect()
handle.readline.side_effect = _state[1]
handle.readlines.side_effect = _readlines_side_effect
def reset_data(*args, **kwargs):
_state[0] = _iterate_read_data(read_data)
if handle.readline.side_effect == _state[1]:
# Only reset the side effect if the user hasn't overridden it.
_state[1] = _readline_side_effect()
handle.readline.side_effect = _state[1]
return DEFAULT
mock.side_effect = reset_data
mock.return_value = handle
return mock
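# Editor's note: a sketch of mock_open combined with patch; note how read and
# readline interleave over the same read_data, as described above (the path
# is hypothetical).
def _example_mock_open():
    m = mock_open(read_data='line1\nline2\n')
    with patch('builtins.open', m):
        with open('/hypothetical/path') as f:
            assert f.readline() == 'line1\n'
            assert f.read() == 'line2\n'
    m.assert_called_once_with('/hypothetical/path')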
class PropertyMock(Mock):
"""
A mock intended to be used as a property, or other descriptor, on a class.
`PropertyMock` provides `__get__` and `__set__` methods so you can specify
a return value when it is fetched.
Fetching a `PropertyMock` instance from an object calls the mock, with
no args. Setting it calls the mock with the value being set.
"""
def _get_child_mock(self, **kwargs):
return MagicMock(**kwargs)
def __get__(self, obj, obj_type):
return self()
def __set__(self, obj, val):
self(val)
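# Editor's note: an illustrative sketch of PropertyMock; like any descriptor
# it must be attached to the class, not to an instance (`_Sample` is
# hypothetical).
def _example_property_mock():
    class _Sample(object):
        pass
    prop = PropertyMock(return_value='computed')
    _Sample.value = prop
    s = _Sample()
    assert s.value == 'computed'  # fetching calls the mock with no args
    s.value = 'new'               # setting calls the mock with the value
    prop.assert_called_with('new')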
"""Test result object"""
import io
import sys
import traceback
from . import util
from functools import wraps
__unittest = True
def failfast(method):
@wraps(method)
def inner(self, *args, **kw):
if getattr(self, 'failfast', False):
self.stop()
return method(self, *args, **kw)
return inner
STDOUT_LINE = '\nStdout:\n%s'
STDERR_LINE = '\nStderr:\n%s'
class TestResult(object):
"""Holder for test result information.
Test results are automatically managed by the TestCase and TestSuite
classes, and do not need to be explicitly manipulated by writers of tests.
Each instance holds the total number of tests run, and collections of
failures and errors that occurred among those test runs. The collections
contain tuples of (testcase, exceptioninfo), where exceptioninfo is the
formatted traceback of the error that occurred.
"""
_previousTestClass = None
_testRunEntered = False
_moduleSetUpFailed = False
def __init__(self, stream=None, descriptions=None, verbosity=None):
self.failfast = False
self.failures = []
self.errors = []
self.testsRun = 0
self.skipped = []
self.expectedFailures = []
self.unexpectedSuccesses = []
self.shouldStop = False
self.buffer = False
self._stdout_buffer = None
self._stderr_buffer = None
self._original_stdout = sys.stdout
self._original_stderr = sys.stderr
self._mirrorOutput = False
def printErrors(self):
"Called by TestRunner after test run"
def startTest(self, test):
"Called when the given test is about to be run"
self.testsRun += 1
self._mirrorOutput = False
self._setupStdout()
def _setupStdout(self):
if self.buffer:
if self._stderr_buffer is None:
self._stderr_buffer = io.StringIO()
self._stdout_buffer = io.StringIO()
sys.stdout = self._stdout_buffer
sys.stderr = self._stderr_buffer
def startTestRun(self):
"""Called once before any tests are executed.
See startTest for a method called before each test.
"""
def stopTest(self, test):
"""Called when the given test has been run"""
self._restoreStdout()
self._mirrorOutput = False
def _restoreStdout(self):
if self.buffer:
if self._mirrorOutput:
output = sys.stdout.getvalue()
error = sys.stderr.getvalue()
if output:
if not output.endswith('\n'):
output += '\n'
self._original_stdout.write(STDOUT_LINE % output)
if error:
if not error.endswith('\n'):
error += '\n'
self._original_stderr.write(STDERR_LINE % error)
sys.stdout = self._original_stdout
sys.stderr = self._original_stderr
self._stdout_buffer.seek(0)
self._stdout_buffer.truncate()
self._stderr_buffer.seek(0)
self._stderr_buffer.truncate()
def stopTestRun(self):
"""Called once after all tests are executed.
See stopTest for a method called after each test.
"""
@failfast
def addError(self, test, err):
"""Called when an error has occurred. 'err' is a tuple of values as
returned by sys.exc_info().
"""
self.errors.append((test, self._exc_info_to_string(err, test)))
self._mirrorOutput = True
@failfast
def addFailure(self, test, err):
"""Called when an error has occurred. 'err' is a tuple of values as
returned by sys.exc_info()."""
self.failures.append((test, self._exc_info_to_string(err, test)))
self._mirrorOutput = True
def addSubTest(self, test, subtest, err):
"""Called at the end of a subtest.
'err' is None if the subtest ended successfully, otherwise it's a
tuple of values as returned by sys.exc_info().
"""
# By default, we don't do anything with successful subtests, but
# more sophisticated test results might want to record them.
if err is not None:
if getattr(self, 'failfast', False):
self.stop()
if issubclass(err[0], test.failureException):
errors = self.failures
else:
errors = self.errors
errors.append((subtest, self._exc_info_to_string(err, test)))
self._mirrorOutput = True
def addSuccess(self, test):
"Called when a test has completed successfully"
pass
def addSkip(self, test, reason):
"""Called when a test is skipped."""
self.skipped.append((test, reason))
def addExpectedFailure(self, test, err):
"""Called when an expected failure/error occured."""
self.expectedFailures.append(
(test, self._exc_info_to_string(err, test)))
@failfast
def addUnexpectedSuccess(self, test):
"""Called when a test was expected to fail, but succeed."""
self.unexpectedSuccesses.append(test)
def wasSuccessful(self):
"""Tells whether or not this result was a success."""
# The hasattr check is for test_result's OldResult test. That
# way this method works on objects that lack the attribute.
# (where would such result instances come from? old stored pickles?)
return ((len(self.failures) == len(self.errors) == 0) and
(not hasattr(self, 'unexpectedSuccesses') or
len(self.unexpectedSuccesses) == 0))
def stop(self):
"""Indicates that the tests should be aborted."""
self.shouldStop = True
def _exc_info_to_string(self, err, test):
"""Converts a sys.exc_info()-style tuple of values into a string."""
exctype, value, tb = err
# Skip test runner traceback levels
while tb and self._is_relevant_tb_level(tb):
tb = tb.tb_next
if exctype is test.failureException:
# Skip assert*() traceback levels
length = self._count_relevant_tb_levels(tb)
msgLines = traceback.format_exception(exctype, value, tb, length)
else:
msgLines = traceback.format_exception(exctype, value, tb)
if self.buffer:
output = sys.stdout.getvalue()
error = sys.stderr.getvalue()
if output:
if not output.endswith('\n'):
output += '\n'
msgLines.append(STDOUT_LINE % output)
if error:
if not error.endswith('\n'):
error += '\n'
msgLines.append(STDERR_LINE % error)
return ''.join(msgLines)
def _is_relevant_tb_level(self, tb):
return '__unittest' in tb.tb_frame.f_globals
def _count_relevant_tb_levels(self, tb):
length = 0
while tb and not self._is_relevant_tb_level(tb):
length += 1
tb = tb.tb_next
return length
def __repr__(self):
return ("<%s run=%i errors=%i failures=%i>" %
(util.strclass(self.__class__), self.testsRun, len(self.errors),
len(self.failures)))
"""Running tests"""
import sys
import time
import warnings
from . import result
from .signals import registerResult
__unittest = True
class _WritelnDecorator(object):
"""Used to decorate file-like objects with a handy 'writeln' method"""
def __init__(self,stream):
self.stream = stream
def __getattr__(self, attr):
if attr in ('stream', '__getstate__'):
raise AttributeError(attr)
return getattr(self.stream,attr)
def writeln(self, arg=None):
if arg:
self.write(arg)
self.write('\n') # text-mode streams translate to \r\n if needed
class TextTestResult(result.TestResult):
"""A test result class that can print formatted text results to a stream.
Used by TextTestRunner.
"""
separator1 = '=' * 70
separator2 = '-' * 70
def __init__(self, stream, descriptions, verbosity):
super(TextTestResult, self).__init__(stream, descriptions, verbosity)
self.stream = stream
self.showAll = verbosity > 1
self.dots = verbosity == 1
self.descriptions = descriptions
def getDescription(self, test):
doc_first_line = test.shortDescription()
if self.descriptions and doc_first_line:
return '\n'.join((str(test), doc_first_line))
else:
return str(test)
def startTest(self, test):
super(TextTestResult, self).startTest(test)
if self.showAll:
self.stream.write(self.getDescription(test))
self.stream.write(" ... ")
self.stream.flush()
def addSuccess(self, test):
super(TextTestResult, self).addSuccess(test)
if self.showAll:
self.stream.writeln("ok")
elif self.dots:
self.stream.write('.')
self.stream.flush()
def addError(self, test, err):
super(TextTestResult, self).addError(test, err)
if self.showAll:
self.stream.writeln("ERROR")
elif self.dots:
self.stream.write('E')
self.stream.flush()
def addFailure(self, test, err):
super(TextTestResult, self).addFailure(test, err)
if self.showAll:
self.stream.writeln("FAIL")
elif self.dots:
self.stream.write('F')
self.stream.flush()
def addSkip(self, test, reason):
super(TextTestResult, self).addSkip(test, reason)
if self.showAll:
self.stream.writeln("skipped {0!r}".format(reason))
elif self.dots:
self.stream.write("s")
self.stream.flush()
def addExpectedFailure(self, test, err):
super(TextTestResult, self).addExpectedFailure(test, err)
if self.showAll:
self.stream.writeln("expected failure")
elif self.dots:
self.stream.write("x")
self.stream.flush()
def addUnexpectedSuccess(self, test):
super(TextTestResult, self).addUnexpectedSuccess(test)
if self.showAll:
self.stream.writeln("unexpected success")
elif self.dots:
self.stream.write("u")
self.stream.flush()
def printErrors(self):
if self.dots or self.showAll:
self.stream.writeln()
self.printErrorList('ERROR', self.errors)
self.printErrorList('FAIL', self.failures)
def printErrorList(self, flavour, errors):
for test, err in errors:
self.stream.writeln(self.separator1)
self.stream.writeln("%s: %s" % (flavour,self.getDescription(test)))
self.stream.writeln(self.separator2)
self.stream.writeln("%s" % err)
class TextTestRunner(object):
"""A test runner class that displays results in textual form.
It prints out the names of tests as they are run, errors as they
occur, and a summary of the results at the end of the test run.
"""
resultclass = TextTestResult
def __init__(self, stream=None, descriptions=True, verbosity=1,
failfast=False, buffer=False, resultclass=None, warnings=None):
if stream is None:
stream = sys.stderr
self.stream = _WritelnDecorator(stream)
self.descriptions = descriptions
self.verbosity = verbosity
self.failfast = failfast
self.buffer = buffer
self.warnings = warnings
if resultclass is not None:
self.resultclass = resultclass
def _makeResult(self):
return self.resultclass(self.stream, self.descriptions, self.verbosity)
def run(self, test):
"Run the given test case or test suite."
result = self._makeResult()
registerResult(result)
result.failfast = self.failfast
result.buffer = self.buffer
with warnings.catch_warnings():
if self.warnings:
# if self.warnings is set, use it to filter all the warnings
warnings.simplefilter(self.warnings)
# if the filter is 'default' or 'always', special-case the
# warnings from the deprecated unittest methods to show them
# no more than once per module, because they can be fairly
# noisy. The -Wd and -Wa flags can be used to bypass this
# only when self.warnings is None.
if self.warnings in ['default', 'always']:
warnings.filterwarnings('module',
category=DeprecationWarning,
message=r'Please use assert\w+ instead.')
startTime = time.time()
startTestRun = getattr(result, 'startTestRun', None)
if startTestRun is not None:
startTestRun()
try:
test(result)
finally:
stopTestRun = getattr(result, 'stopTestRun', None)
if stopTestRun is not None:
stopTestRun()
stopTime = time.time()
timeTaken = stopTime - startTime
result.printErrors()
if hasattr(result, 'separator2'):
self.stream.writeln(result.separator2)
run = result.testsRun
self.stream.writeln("Ran %d test%s in %.3fs" %
(run, run != 1 and "s" or "", timeTaken))
self.stream.writeln()
expectedFails = unexpectedSuccesses = skipped = 0
try:
results = map(len, (result.expectedFailures,
result.unexpectedSuccesses,
result.skipped))
except AttributeError:
pass
else:
expectedFails, unexpectedSuccesses, skipped = results
infos = []
if not result.wasSuccessful():
self.stream.write("FAILED")
failed, errored = len(result.failures), len(result.errors)
if failed:
infos.append("failures=%d" % failed)
if errored:
infos.append("errors=%d" % errored)
else:
self.stream.write("OK")
if skipped:
infos.append("skipped=%d" % skipped)
if expectedFails:
infos.append("expected failures=%d" % expectedFails)
if unexpectedSuccesses:
infos.append("unexpected successes=%d" % unexpectedSuccesses)
if infos:
self.stream.writeln(" (%s)" % (", ".join(infos),))
else:
self.stream.write("\n")
return result
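# Editor's note: a small sketch wiring TextTestRunner to a trivial TestCase,
# using the public unittest API rather than the internal names above.
def _example_text_runner():
    import io
    import unittest
    class _DemoTest(unittest.TestCase):
        def test_truth(self):
            self.assertTrue(True)
    suite = unittest.TestLoader().loadTestsFromTestCase(_DemoTest)
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    assert result.wasSuccessful()
    assert 'Ran 1 test' in stream.getvalue()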
import signal
import weakref
from functools import wraps
__unittest = True
class _InterruptHandler(object):
def __init__(self, default_handler):
self.called = False
self.original_handler = default_handler
if isinstance(default_handler, int):
if default_handler == signal.SIG_DFL:
# Pretend it's signal.default_int_handler instead.
default_handler = signal.default_int_handler
elif default_handler == signal.SIG_IGN:
# Not quite the same thing as SIG_IGN, but the closest we
# can make it: do nothing.
def default_handler(unused_signum, unused_frame):
pass
else:
raise TypeError("expected SIGINT signal handler to be "
"signal.SIG_IGN, signal.SIG_DFL, or a "
"callable object")
self.default_handler = default_handler
def __call__(self, signum, frame):
installed_handler = signal.getsignal(signal.SIGINT)
if installed_handler is not self:
# if we aren't the installed handler, then delegate immediately
# to the default handler
self.default_handler(signum, frame)
if self.called:
self.default_handler(signum, frame)
self.called = True
for result in _results.keys():
result.stop()
_results = weakref.WeakKeyDictionary()
def registerResult(result):
_results[result] = 1
def removeResult(result):
return bool(_results.pop(result, None))
_interrupt_handler = None
def installHandler():
global _interrupt_handler
if _interrupt_handler is None:
default_handler = signal.getsignal(signal.SIGINT)
_interrupt_handler = _InterruptHandler(default_handler)
signal.signal(signal.SIGINT, _interrupt_handler)
def removeHandler(method=None):
if method is not None:
@wraps(method)
def inner(*args, **kwargs):
initial = signal.getsignal(signal.SIGINT)
removeHandler()
try:
return method(*args, **kwargs)
finally:
signal.signal(signal.SIGINT, initial)
return inner
global _interrupt_handler
if _interrupt_handler is not None:
signal.signal(signal.SIGINT, _interrupt_handler.original_handler)
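# Editor's note: a brief sketch of the public signal-handling helpers; after
# installHandler, a first Ctrl-C calls stop() on every registered result
# instead of killing the process outright.
def _example_signal_handling():
    import unittest
    unittest.installHandler()
    result = unittest.TestResult()
    unittest.registerResult(result)
    # ... run tests; a Ctrl-C here would set result.shouldStop ...
    unittest.removeResult(result)
    unittest.removeHandler()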
"""TestSuite"""
import sys
from . import case
from . import util
__unittest = True
def _call_if_exists(parent, attr):
func = getattr(parent, attr, lambda: None)
func()
class BaseTestSuite(object):
"""A simple test suite that doesn't provide class or module shared fixtures.
"""
_cleanup = True
def __init__(self, tests=()):
self._tests = []
self._removed_tests = 0
self.addTests(tests)
def __repr__(self):
return "<%s tests=%s>" % (util.strclass(self.__class__), list(self))
def __eq__(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
return list(self) == list(other)
def __iter__(self):
return iter(self._tests)
def countTestCases(self):
cases = self._removed_tests
for test in self:
if test:
cases += test.countTestCases()
return cases
def addTest(self, test):
# sanity checks
if not callable(test):
raise TypeError("{} is not callable".format(repr(test)))
if isinstance(test, type) and issubclass(test,
(case.TestCase, TestSuite)):
raise TypeError("TestCases and TestSuites must be instantiated "
"before passing them to addTest()")
self._tests.append(test)
def addTests(self, tests):
if isinstance(tests, str):
raise TypeError("tests must be an iterable of tests, not a string")
for test in tests:
self.addTest(test)
def run(self, result):
for index, test in enumerate(self):
if result.shouldStop:
break
test(result)
if self._cleanup:
self._removeTestAtIndex(index)
return result
def _removeTestAtIndex(self, index):
"""Stop holding a reference to the TestCase at index."""
try:
test = self._tests[index]
except TypeError:
# support for suite implementations that have overridden self._tests
pass
else:
# Some unittest tests add non-TestCase/TestSuite objects to
# the suite.
if hasattr(test, 'countTestCases'):
self._removed_tests += test.countTestCases()
self._tests[index] = None
def __call__(self, *args, **kwds):
return self.run(*args, **kwds)
def debug(self):
"""Run the tests without collecting errors in a TestResult"""
for test in self:
test.debug()
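# Editor's note: a sketch of composing suites by hand; suites nest, and
# countTestCases sums across children (`_T` is a hypothetical TestCase).
def _example_suite():
    import unittest
    class _T(unittest.TestCase):
        def test_a(self):
            pass
        def test_b(self):
            pass
    inner = unittest.TestLoader().loadTestsFromTestCase(_T)
    outer = unittest.TestSuite()
    outer.addTest(inner)
    assert outer.countTestCases() == 2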
class TestSuite(BaseTestSuite):
"""A test suite is a composite test consisting of a number of TestCases.
For use, create an instance of TestSuite, then add test case instances.
When all tests have been added, the suite can be passed to a test
runner, such as TextTestRunner. It will run the individual test cases
in the order in which they were added, aggregating the results. When
subclassing, do not forget to call the base class constructor.
"""
def run(self, result, debug=False):
topLevel = False
if getattr(result, '_testRunEntered', False) is False:
result._testRunEntered = topLevel = True
for index, test in enumerate(self):
if result.shouldStop:
break
if _isnotsuite(test):
self._tearDownPreviousClass(test, result)
self._handleModuleFixture(test, result)
self._handleClassSetUp(test, result)
result._previousTestClass = test.__class__
if (getattr(test.__class__, '_classSetupFailed', False) or
getattr(result, '_moduleSetUpFailed', False)):
continue
if not debug:
test(result)
else:
test.debug()
if self._cleanup:
self._removeTestAtIndex(index)
if topLevel:
self._tearDownPreviousClass(None, result)
self._handleModuleTearDown(result)
result._testRunEntered = False
return result
def debug(self):
"""Run the tests without collecting errors in a TestResult"""
debug = _DebugResult()
self.run(debug, True)
################################
def _handleClassSetUp(self, test, result):
previousClass = getattr(result, '_previousTestClass', None)
currentClass = test.__class__
if currentClass == previousClass:
return
if result._moduleSetUpFailed:
return
if getattr(currentClass, "__unittest_skip__", False):
return
try:
currentClass._classSetupFailed = False
except TypeError:
# test may actually be a function
# so its class will be a builtin-type
pass
setUpClass = getattr(currentClass, 'setUpClass', None)
if setUpClass is not None:
_call_if_exists(result, '_setupStdout')
try:
setUpClass()
except Exception as e:
if isinstance(result, _DebugResult):
raise
currentClass._classSetupFailed = True
className = util.strclass(currentClass)
errorName = 'setUpClass (%s)' % className
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
def _get_previous_module(self, result):
previousModule = None
previousClass = getattr(result, '_previousTestClass', None)
if previousClass is not None:
previousModule = previousClass.__module__
return previousModule
def _handleModuleFixture(self, test, result):
previousModule = self._get_previous_module(result)
currentModule = test.__class__.__module__
if currentModule == previousModule:
return
self._handleModuleTearDown(result)
result._moduleSetUpFailed = False
try:
module = sys.modules[currentModule]
except KeyError:
return
setUpModule = getattr(module, 'setUpModule', None)
if setUpModule is not None:
_call_if_exists(result, '_setupStdout')
try:
setUpModule()
except Exception as e:
if isinstance(result, _DebugResult):
raise
result._moduleSetUpFailed = True
errorName = 'setUpModule (%s)' % currentModule
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
def _addClassOrModuleLevelException(self, result, exception, errorName):
error = _ErrorHolder(errorName)
addSkip = getattr(result, 'addSkip', None)
if addSkip is not None and isinstance(exception, case.SkipTest):
addSkip(error, str(exception))
else:
result.addError(error, sys.exc_info())
def _handleModuleTearDown(self, result):
previousModule = self._get_previous_module(result)
if previousModule is None:
return
if result._moduleSetUpFailed:
return
try:
module = sys.modules[previousModule]
except KeyError:
return
tearDownModule = getattr(module, 'tearDownModule', None)
if tearDownModule is not None:
_call_if_exists(result, '_setupStdout')
try:
tearDownModule()
except Exception as e:
if isinstance(result, _DebugResult):
raise
errorName = 'tearDownModule (%s)' % previousModule
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
def _tearDownPreviousClass(self, test, result):
previousClass = getattr(result, '_previousTestClass', None)
currentClass = test.__class__
if currentClass == previousClass:
return
if getattr(previousClass, '_classSetupFailed', False):
return
if getattr(result, '_moduleSetUpFailed', False):
return
if getattr(previousClass, "__unittest_skip__", False):
return
tearDownClass = getattr(previousClass, 'tearDownClass', None)
if tearDownClass is not None:
_call_if_exists(result, '_setupStdout')
try:
tearDownClass()
except Exception as e:
if isinstance(result, _DebugResult):
raise
className = util.strclass(previousClass)
errorName = 'tearDownClass (%s)' % className
self._addClassOrModuleLevelException(result, e, errorName)
finally:
_call_if_exists(result, '_restoreStdout')
class _ErrorHolder(object):
"""
Placeholder for a TestCase inside a result. As far as a TestResult
is concerned, this looks exactly like a unit test. Used to insert
arbitrary errors into a test suite run.
"""
# Inspired by the ErrorHolder from Twisted:
# http://twistedmatrix.com/trac/browser/trunk/twisted/trial/runner.py
# attribute used by TestResult._exc_info_to_string
failureException = None
def __init__(self, description):
self.description = description
def id(self):
return self.description
def shortDescription(self):
return None
def __repr__(self):
return "<ErrorHolder description=%r>" % (self.description,)
def __str__(self):
return self.id()
def run(self, result):
# could call result.addError(...) - but this test-like object
# shouldn't be run anyway
pass
def __call__(self, result):
return self.run(result)
def countTestCases(self):
return 0
def _isnotsuite(test):
"A crude way to tell apart testcases and suites with duck-typing"
try:
iter(test)
except TypeError:
return True
return False
class _DebugResult(object):
"Used by the TestSuite to hold previous class when running in debug."
_previousTestClass = None
_moduleSetUpFailed = False
shouldStop = False
"""Various utility functions."""
from collections import namedtuple, OrderedDict
from os.path import commonprefix
__unittest = True
_MAX_LENGTH = 80
_PLACEHOLDER_LEN = 12
_MIN_BEGIN_LEN = 5
_MIN_END_LEN = 5
_MIN_COMMON_LEN = 5
_MIN_DIFF_LEN = _MAX_LENGTH - \
(_MIN_BEGIN_LEN + _PLACEHOLDER_LEN + _MIN_COMMON_LEN +
_PLACEHOLDER_LEN + _MIN_END_LEN)
assert _MIN_DIFF_LEN >= 0
def _shorten(s, prefixlen, suffixlen):
skip = len(s) - prefixlen - suffixlen
if skip > _PLACEHOLDER_LEN:
s = '%s[%d chars]%s' % (s[:prefixlen], skip, s[len(s) - suffixlen:])
return s
def _common_shorten_repr(*args):
args = tuple(map(safe_repr, args))
maxlen = max(map(len, args))
if maxlen <= _MAX_LENGTH:
return args
prefix = commonprefix(args)
prefixlen = len(prefix)
common_len = _MAX_LENGTH - \
(maxlen - prefixlen + _MIN_BEGIN_LEN + _PLACEHOLDER_LEN)
if common_len > _MIN_COMMON_LEN:
assert _MIN_BEGIN_LEN + _PLACEHOLDER_LEN + _MIN_COMMON_LEN + \
(maxlen - prefixlen) < _MAX_LENGTH
prefix = _shorten(prefix, _MIN_BEGIN_LEN, common_len)
return tuple(prefix + s[prefixlen:] for s in args)
prefix = _shorten(prefix, _MIN_BEGIN_LEN, _MIN_COMMON_LEN)
return tuple(prefix + _shorten(s[prefixlen:], _MIN_DIFF_LEN, _MIN_END_LEN)
for s in args)
def safe_repr(obj, short=False):
try:
result = repr(obj)
except Exception:
result = object.__repr__(obj)
if not short or len(result) < _MAX_LENGTH:
return result
return result[:_MAX_LENGTH] + ' [truncated]...'
def strclass(cls):
return "%s.%s" % (cls.__module__, cls.__name__)
def sorted_list_difference(expected, actual):
"""Finds elements in only one or the other of two, sorted input lists.
Returns a two-element tuple of lists. The first list contains those
elements in the "expected" list but not in the "actual" list, and the
second contains those elements in the "actual" list but not in the
"expected" list. Duplicate elements in either input list are ignored.
"""
i = j = 0
missing = []
unexpected = []
while True:
try:
e = expected[i]
a = actual[j]
if e < a:
missing.append(e)
i += 1
while expected[i] == e:
i += 1
elif e > a:
unexpected.append(a)
j += 1
while actual[j] == a:
j += 1
else:
i += 1
try:
while expected[i] == e:
i += 1
finally:
j += 1
while actual[j] == a:
j += 1
except IndexError:
missing.extend(expected[i:])
unexpected.extend(actual[j:])
break
return missing, unexpected
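# Editor's note: a quick check of the helper above on sorted inputs;
# duplicates are ignored, per the docstring.
def _example_sorted_list_difference():
    missing, unexpected = sorted_list_difference([1, 2, 4, 4], [1, 3, 4])
    assert (missing, unexpected) == ([2], [3])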
def unorderable_list_difference(expected, actual):
"""Same behavior as sorted_list_difference but
for lists of unorderable items (like dicts).
As it does a linear search per item (remove) it
has O(n*n) performance."""
missing = []
while expected:
item = expected.pop()
try:
actual.remove(item)
except ValueError:
missing.append(item)
# anything left in actual is unexpected
return missing, actual
def three_way_cmp(x, y):
"""Return -1 if x < y, 0 if x == y and 1 if x > y"""
return (x > y) - (x < y)
_Mismatch = namedtuple('Mismatch', 'actual expected value')
def _count_diff_all_purpose(actual, expected):
'Returns list of (cnt_act, cnt_exp, elem) triples where the counts differ'
# elements need not be hashable
s, t = list(actual), list(expected)
m, n = len(s), len(t)
NULL = object()
result = []
for i, elem in enumerate(s):
if elem is NULL:
continue
cnt_s = cnt_t = 0
for j in range(i, m):
if s[j] == elem:
cnt_s += 1
s[j] = NULL
for j, other_elem in enumerate(t):
if other_elem == elem:
cnt_t += 1
t[j] = NULL
if cnt_s != cnt_t:
diff = _Mismatch(cnt_s, cnt_t, elem)
result.append(diff)
for i, elem in enumerate(t):
if elem is NULL:
continue
cnt_t = 0
for j in range(i, n):
if t[j] == elem:
cnt_t += 1
t[j] = NULL
diff = _Mismatch(0, cnt_t, elem)
result.append(diff)
return result
def _ordered_count(iterable):
'Return dict of element counts, in the order they were first seen'
c = OrderedDict()
for elem in iterable:
c[elem] = c.get(elem, 0) + 1
return c
def _count_diff_hashable(actual, expected):
'Returns list of (cnt_act, cnt_exp, elem) triples where the counts differ'
# elements must be hashable
s, t = _ordered_count(actual), _ordered_count(expected)
result = []
for elem, cnt_s in s.items():
cnt_t = t.get(elem, 0)
if cnt_s != cnt_t:
diff = _Mismatch(cnt_s, cnt_t, elem)
result.append(diff)
for elem, cnt_t in t.items():
if elem not in s:
diff = _Mismatch(0, cnt_t, elem)
result.append(diff)
return result
"""
Python unit testing framework, based on Erich Gamma's JUnit and Kent Beck's
Smalltalk testing framework.
This module contains the core framework classes that form the basis of
specific test cases and suites (TestCase, TestSuite etc.), and also a
text-based utility class for running the tests and reporting the results
(TextTestRunner).
Simple usage:
import unittest
class IntegerArithmeticTestCase(unittest.TestCase):
def testAdd(self): ## test method names begin 'test*'
self.assertEqual((1 + 2), 3)
self.assertEqual(0 + 1, 1)
def testMultiply(self):
self.assertEqual((0 * 10), 0)
self.assertEqual((5 * 8), 40)
if __name__ == '__main__':
unittest.main()
Further information is available in the bundled documentation, and from
http://docs.python.org/library/unittest.html
Copyright (c) 1999-2003 Steve Purcell
Copyright (c) 2003-2010 Python Software Foundation
This module is free software, and you may redistribute it and/or modify
it under the same terms as Python itself, so long as this copyright message
and disclaimer are retained in their original form.
IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF
THIS CODE, EVEN IF THE AUTHOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.
THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE CODE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS,
AND THERE IS NO OBLIGATION WHATSOEVER TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
"""
__all__ = ['TestResult', 'TestCase', 'TestSuite',
'TextTestRunner', 'TestLoader', 'FunctionTestCase', 'main',
'defaultTestLoader', 'SkipTest', 'skip', 'skipIf', 'skipUnless',
'expectedFailure', 'TextTestResult', 'installHandler',
'registerResult', 'removeResult', 'removeHandler']
# Expose obsolete functions for backwards compatibility
__all__.extend(['getTestCaseNames', 'makeSuite', 'findTestCases'])
__unittest = True
from .result import TestResult
from .case import (TestCase, FunctionTestCase, SkipTest, skip, skipIf,
skipUnless, expectedFailure)
from .suite import BaseTestSuite, TestSuite
from .loader import (TestLoader, defaultTestLoader, makeSuite, getTestCaseNames,
findTestCases)
from .main import TestProgram, main
from .runner import TextTestRunner, TextTestResult
from .signals import installHandler, registerResult, removeResult, removeHandler
# deprecated
_TextTestResult = TextTestResult
"""Main entry point"""
import sys
if sys.argv[0].endswith("__main__.py"):
import os.path
# We change sys.argv[0] to make help message more useful
# use executable without path, unquoted
# (it's just a hint anyway)
# (if you have spaces in your executable you get what you deserve!)
executable = os.path.basename(sys.executable)
sys.argv[0] = executable + " -m unittest"
del os
__unittest = True
from .main import main, TestProgram
main(module=None)
"""Exception classes raised by urllib.
The base exception class is URLError, which inherits from OSError. It
doesn't define any behavior of its own, but is the base class for all
exceptions defined in this package.
HTTPError is an exception class that is also a valid HTTP response
instance. It behaves this way because HTTP protocol errors are valid
responses, with a status code, headers, and a body. In some contexts,
an application may want to handle an exception like a regular
response.
"""
import urllib.response
__all__ = ['URLError', 'HTTPError', 'ContentTooShortError']
# do these error classes make sense?
# make sure all of the OSError stuff is overridden. we just want to be
# subtypes.
class URLError(OSError):
# URLError is a sub-type of OSError, but it doesn't share any of
# the implementation. need to override __init__ and __str__.
# It sets self.args for compatibility with other EnvironmentError
# subclasses, but args doesn't have the typical format with errno in
# slot 0 and strerror in slot 1. This may be better than nothing.
def __init__(self, reason, filename=None):
self.args = reason,
self.reason = reason
if filename is not None:
self.filename = filename
def __str__(self):
return '<urlopen error %s>' % self.reason
class HTTPError(URLError, urllib.response.addinfourl):
"""Raised when HTTP error occurs, but also acts like non-error return"""
__super_init = urllib.response.addinfourl.__init__
def __init__(self, url, code, msg, hdrs, fp):
self.code = code
self.msg = msg
self.hdrs = hdrs
self.fp = fp
self.filename = url
# The addinfourl classes depend on fp being a valid file
# object. In some cases, the HTTPError may not have a valid
# file object. If this happens, the simplest workaround is to
# not initialize the base classes.
if fp is not None:
self.__super_init(fp, hdrs, url, code)
def __str__(self):
return 'HTTP Error %s: %s' % (self.code, self.msg)
# since URLError specifies a .reason attribute, HTTPError should also
# provide this attribute. See issue13211 for discussion.
@property
def reason(self):
return self.msg
@property
def headers(self):
return self.hdrs
@headers.setter
def headers(self, headers):
self.hdrs = headers
# exception raised when downloaded size does not match content-length
class ContentTooShortError(URLError):
def __init__(self, message, content):
URLError.__init__(self, message)
self.content = content
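# Illustrative example: because HTTPError doubles as a response object, a
# caller can read its status, headers, and body; URLError covers failures
# where no HTTP response exists at all. The URL is a placeholder.
import urllib.request
import urllib.error
try:
    resp = urllib.request.urlopen("http://example.com/may-not-exist")
except urllib.error.HTTPError as e:
    print(e.code, e.reason)              # e.g. 404 Not Found
    print(e.headers.get("Content-Type"))
    body = e.read()                      # the error page the server sent
except urllib.error.URLError as e:
    print("failed to reach server:", e.reason)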
"""Parse (absolute and relative) URLs.
The urlparse module is based upon the following RFC specifications:
RFC 3986 (STD66): "Uniform Resource Identifiers" by T. Berners-Lee, R. Fielding
and L. Masinter, January 2005.
RFC 2732: "Format for Literal IPv6 Addresses in URL's" by R. Hinden, B. Carpenter
and L. Masinter, December 1999.
RFC 2396: "Uniform Resource Identifiers (URI): Generic Syntax" by T.
Berners-Lee, R. Fielding, and L. Masinter, August 1998.
RFC 2368: "The mailto URL scheme" by P. Hoffman, L. Masinter, J. Zawinski, July 1998.
RFC 1808: "Relative Uniform Resource Locators" by R. Fielding, UC Irvine, June
1995.
RFC 1738: "Uniform Resource Locators (URL)" by T. Berners-Lee, L. Masinter, M.
McCahill, December 1994.
RFC 3986 is considered the current standard, and any future changes to the
urlparse module should conform to it. The urlparse module is currently not
entirely compliant with this RFC: to accommodate de facto parsing scenarios
and for backward compatibility, some parsing quirks from older RFCs are
retained. The test cases in test_urlparse.py provide a good indicator of
parsing behavior.
"""
import re
import sys
import collections
__all__ = ["urlparse", "urlunparse", "urljoin", "urldefrag",
"urlsplit", "urlunsplit", "urlencode", "parse_qs",
"parse_qsl", "quote", "quote_plus", "quote_from_bytes",
"unquote", "unquote_plus", "unquote_to_bytes"]
# A classification of schemes ('' means apply by default)
uses_relative = ['ftp', 'http', 'gopher', 'nntp', 'imap',
'wais', 'file', 'https', 'shttp', 'mms',
'prospero', 'rtsp', 'rtspu', '', 'sftp',
'svn', 'svn+ssh']
uses_netloc = ['ftp', 'http', 'gopher', 'nntp', 'telnet',
'imap', 'wais', 'file', 'mms', 'https', 'shttp',
'snews', 'prospero', 'rtsp', 'rtspu', 'rsync', '',
'svn', 'svn+ssh', 'sftp', 'nfs', 'git', 'git+ssh']
uses_params = ['ftp', 'hdl', 'prospero', 'http', 'imap',
'https', 'shttp', 'rtsp', 'rtspu', 'sip', 'sips',
'mms', '', 'sftp', 'tel']
# These are not actually used anymore, but should stay for backwards
# compatibility. (They are undocumented, but have a public-looking name.)
non_hierarchical = ['gopher', 'hdl', 'mailto', 'news',
'telnet', 'wais', 'imap', 'snews', 'sip', 'sips']
uses_query = ['http', 'wais', 'imap', 'https', 'shttp', 'mms',
'gopher', 'rtsp', 'rtspu', 'sip', 'sips', '']
uses_fragment = ['ftp', 'hdl', 'http', 'gopher', 'news',
'nntp', 'wais', 'https', 'shttp', 'snews',
'file', 'prospero', '']
# Characters valid in scheme names
scheme_chars = ('abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
'0123456789'
'+-.')
# Unsafe bytes to be removed per WHATWG spec
_UNSAFE_URL_BYTES_TO_REMOVE = ['\t', '\r', '\n']
# XXX: Consider replacing with functools.lru_cache
MAX_CACHE_SIZE = 20
_parse_cache = {}
def clear_cache():
"""Clear the parse cache and the quoters cache."""
_parse_cache.clear()
_safe_quoters.clear()
# Helpers for bytes handling
# For 3.2, we deliberately require applications that
# handle improperly quoted URLs to do their own
# decoding and encoding. If valid use cases are
# presented, we may relax this by using latin-1
# decoding internally for 3.3
_implicit_encoding = 'ascii'
_implicit_errors = 'strict'
def _noop(obj):
return obj
def _encode_result(obj, encoding=_implicit_encoding,
errors=_implicit_errors):
return obj.encode(encoding, errors)
def _decode_args(args, encoding=_implicit_encoding,
errors=_implicit_errors):
return tuple(x.decode(encoding, errors) if x else '' for x in args)
def _coerce_args(*args):
# Invokes decode if necessary to create str args
# and returns the coerced inputs along with
# an appropriate result coercion function
# - noop for str inputs
# - encoding function otherwise
str_input = isinstance(args[0], str)
for arg in args[1:]:
# We special-case the empty string to support the
# "scheme=''" default argument to some functions
if arg and isinstance(arg, str) != str_input:
raise TypeError("Cannot mix str and non-str arguments")
if str_input:
return args + (_noop,)
return _decode_args(args) + (_encode_result,)
# Result objects are more helpful than simple tuples
class _ResultMixinStr(object):
"""Standard approach to encoding parsed results from str to bytes"""
__slots__ = ()
def encode(self, encoding='ascii', errors='strict'):
return self._encoded_counterpart(*(x.encode(encoding, errors) for x in self))
class _ResultMixinBytes(object):
"""Standard approach to decoding parsed results from bytes to str"""
__slots__ = ()
def decode(self, encoding='ascii', errors='strict'):
return self._decoded_counterpart(*(x.decode(encoding, errors) for x in self))
class _NetlocResultMixinBase(object):
"""Shared methods for the parsed result objects containing a netloc element"""
__slots__ = ()
@property
def username(self):
return self._userinfo[0]
@property
def password(self):
return self._userinfo[1]
@property
def hostname(self):
hostname = self._hostinfo[0]
if not hostname:
hostname = None
elif hostname is not None:
hostname = hostname.lower()
return hostname
@property
def port(self):
port = self._hostinfo[1]
if port is not None:
port = int(port, 10)
# Return None on an illegal port
            if not (0 <= port <= 65535):
return None
return port
class _NetlocResultMixinStr(_NetlocResultMixinBase, _ResultMixinStr):
__slots__ = ()
@property
def _userinfo(self):
netloc = self.netloc
userinfo, have_info, hostinfo = netloc.rpartition('@')
if have_info:
username, have_password, password = userinfo.partition(':')
if not have_password:
password = None
else:
username = password = None
return username, password
@property
def _hostinfo(self):
netloc = self.netloc
_, _, hostinfo = netloc.rpartition('@')
_, have_open_br, bracketed = hostinfo.partition('[')
if have_open_br:
hostname, _, port = bracketed.partition(']')
_, _, port = port.partition(':')
else:
hostname, _, port = hostinfo.partition(':')
if not port:
port = None
return hostname, port
class _NetlocResultMixinBytes(_NetlocResultMixinBase, _ResultMixinBytes):
__slots__ = ()
@property
def _userinfo(self):
netloc = self.netloc
userinfo, have_info, hostinfo = netloc.rpartition(b'@')
if have_info:
username, have_password, password = userinfo.partition(b':')
if not have_password:
password = None
else:
username = password = None
return username, password
@property
def _hostinfo(self):
netloc = self.netloc
_, _, hostinfo = netloc.rpartition(b'@')
_, have_open_br, bracketed = hostinfo.partition(b'[')
if have_open_br:
hostname, _, port = bracketed.partition(b']')
_, _, port = port.partition(b':')
else:
hostname, _, port = hostinfo.partition(b':')
if not port:
port = None
return hostname, port
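# Illustrative example: the _userinfo/_hostinfo helpers above back the public
# username/password/hostname/port properties of split results.
from urllib.parse import urlsplit
r = urlsplit("http://user:secret@[::1]:8080/path")
print(r.username, r.password)  # user secret
print(r.hostname)              # ::1  (brackets stripped, lowercased)
print(r.port)                  # 8080 (converted to int; None if out of range)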
from collections import namedtuple
_DefragResultBase = namedtuple('DefragResult', 'url fragment')
_SplitResultBase = namedtuple('SplitResult', 'scheme netloc path query fragment')
_ParseResultBase = namedtuple('ParseResult', 'scheme netloc path params query fragment')
# For backwards compatibility, alias _NetlocResultMixinStr
# ResultBase is no longer part of the documented API, but it is
# retained since deprecating it isn't worth the hassle
ResultBase = _NetlocResultMixinStr
# Structured result objects for string data
class DefragResult(_DefragResultBase, _ResultMixinStr):
__slots__ = ()
def geturl(self):
if self.fragment:
return self.url + '#' + self.fragment
else:
return self.url
class SplitResult(_SplitResultBase, _NetlocResultMixinStr):
__slots__ = ()
def geturl(self):
return urlunsplit(self)
class ParseResult(_ParseResultBase, _NetlocResultMixinStr):
__slots__ = ()
def geturl(self):
return urlunparse(self)
# Structured result objects for bytes data
class DefragResultBytes(_DefragResultBase, _ResultMixinBytes):
__slots__ = ()
def geturl(self):
if self.fragment:
return self.url + b'#' + self.fragment
else:
return self.url
class SplitResultBytes(_SplitResultBase, _NetlocResultMixinBytes):
__slots__ = ()
def geturl(self):
return urlunsplit(self)
class ParseResultBytes(_ParseResultBase, _NetlocResultMixinBytes):
__slots__ = ()
def geturl(self):
return urlunparse(self)
# Set up the encode/decode result pairs
def _fix_result_transcoding():
_result_pairs = (
(DefragResult, DefragResultBytes),
(SplitResult, SplitResultBytes),
(ParseResult, ParseResultBytes),
)
for _decoded, _encoded in _result_pairs:
_decoded._encoded_counterpart = _encoded
_encoded._decoded_counterpart = _decoded
_fix_result_transcoding()
del _fix_result_transcoding
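# Illustrative example: the counterpart wiring above lets str results convert
# to bytes results and back via encode()/decode().
from urllib.parse import urlparse
p = urlparse("http://example.com/a?x=1#frag")
b = p.encode("ascii")           # ParseResult -> ParseResultBytes
print(type(b).__name__)         # ParseResultBytes
print(b.decode("ascii") == p)   # True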
def urlparse(url, scheme='', allow_fragments=True):
"""Parse a URL into 6 components:
<scheme>://<netloc>/<path>;<params>?<query>#<fragment>
Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
Note that we don't break the components up in smaller bits
(e.g. netloc is a single string) and we don't expand % escapes."""
url, scheme, _coerce_result = _coerce_args(url, scheme)
splitresult = urlsplit(url, scheme, allow_fragments)
scheme, netloc, url, query, fragment = splitresult
if scheme in uses_params and ';' in url:
url, params = _splitparams(url)
else:
params = ''
result = ParseResult(scheme, netloc, url, params, query, fragment)
return _coerce_result(result)
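# Illustrative example: the six components of a parsed URL.
from urllib.parse import urlparse
r = urlparse("http://netloc/path;param?query=arg#frag")
print(r.scheme, r.netloc, r.path, r.params, r.query, r.fragment)
# http netloc /path param query=arg frag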
def _splitparams(url):
if '/' in url:
i = url.find(';', url.rfind('/'))
if i < 0:
return url, ''
else:
i = url.find(';')
return url[:i], url[i+1:]
def _splitnetloc(url, start=0):
delim = len(url) # position of end of domain part of url, default is end
for c in '/?#': # look for delimiters; the order is NOT important
wdelim = url.find(c, start) # find first of this delim
if wdelim >= 0: # if found
delim = min(delim, wdelim) # use earliest delim position
return url[start:delim], url[delim:] # return (domain, rest)
def _checknetloc(netloc):
if not netloc or not any(ord(c) > 127 for c in netloc):
return
# looking for characters like \u2100 that expand to 'a/c'
# IDNA uses NFKC equivalence, so normalize for this check
import unicodedata
netloc2 = unicodedata.normalize('NFKC', netloc)
if netloc == netloc2:
return
_, _, netloc = netloc.rpartition('@') # anything to the left of '@' is okay
for c in '/?#@:':
if c in netloc2:
raise ValueError("netloc '" + netloc2 + "' contains invalid " +
"characters under NFKC normalization")
def _remove_unsafe_bytes_from_url(url):
for b in _UNSAFE_URL_BYTES_TO_REMOVE:
url = url.replace(b, "")
return url
def urlsplit(url, scheme='', allow_fragments=True):
"""Parse a URL into 5 components:
<scheme>://<netloc>/<path>?<query>#<fragment>
Return a 5-tuple: (scheme, netloc, path, query, fragment).
Note that we don't break the components up in smaller bits
(e.g. netloc is a single string) and we don't expand % escapes."""
url, scheme, _coerce_result = _coerce_args(url, scheme)
url = _remove_unsafe_bytes_from_url(url)
scheme = _remove_unsafe_bytes_from_url(scheme)
allow_fragments = bool(allow_fragments)
key = url, scheme, allow_fragments, type(url), type(scheme)
cached = _parse_cache.get(key, None)
if cached:
return _coerce_result(cached)
if len(_parse_cache) >= MAX_CACHE_SIZE: # avoid runaway growth
clear_cache()
netloc = query = fragment = ''
i = url.find(':')
if i > 0:
if url[:i] == 'http': # optimize the common case
scheme = url[:i].lower()
url = url[i+1:]
if url[:2] == '//':
netloc, url = _splitnetloc(url, 2)
if (('[' in netloc and ']' not in netloc) or
(']' in netloc and '[' not in netloc)):
raise ValueError("Invalid IPv6 URL")
if allow_fragments and '#' in url:
url, fragment = url.split('#', 1)
if '?' in url:
url, query = url.split('?', 1)
_checknetloc(netloc)
v = SplitResult(scheme, netloc, url, query, fragment)
_parse_cache[key] = v
return _coerce_result(v)
for c in url[:i]:
if c not in scheme_chars:
break
else:
# make sure "url" is not actually a port number (in which case
# "scheme" is really part of the path)
rest = url[i+1:]
if not rest or any(c not in '0123456789' for c in rest):
# not a port number
scheme, url = url[:i].lower(), rest
if url[:2] == '//':
netloc, url = _splitnetloc(url, 2)
if (('[' in netloc and ']' not in netloc) or
(']' in netloc and '[' not in netloc)):
raise ValueError("Invalid IPv6 URL")
if allow_fragments and '#' in url:
url, fragment = url.split('#', 1)
if '?' in url:
url, query = url.split('?', 1)
_checknetloc(netloc)
v = SplitResult(scheme, netloc, url, query, fragment)
_parse_cache[key] = v
return _coerce_result(v)
def urlunparse(components):
"""Put a parsed URL back together again. This may result in a
slightly different, but equivalent URL, if the URL that was parsed
originally had redundant delimiters, e.g. a ? with an empty query
(the draft states that these are equivalent)."""
scheme, netloc, url, params, query, fragment, _coerce_result = (
_coerce_args(*components))
if params:
url = "%s;%s" % (url, params)
return _coerce_result(urlunsplit((scheme, netloc, url, query, fragment)))
def urlunsplit(components):
"""Combine the elements of a tuple as returned by urlsplit() into a
complete URL as a string. The data argument can be any five-item iterable.
This may result in a slightly different, but equivalent URL, if the URL that
was parsed originally had unnecessary delimiters (for example, a ? with an
empty query; the RFC states that these are equivalent)."""
scheme, netloc, url, query, fragment, _coerce_result = (
_coerce_args(*components))
if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
if url and url[:1] != '/': url = '/' + url
url = '//' + (netloc or '') + url
if scheme:
url = scheme + ':' + url
if query:
url = url + '?' + query
if fragment:
url = url + '#' + fragment
return _coerce_result(url)
def urljoin(base, url, allow_fragments=True):
"""Join a base URL and a possibly relative URL to form an absolute
interpretation of the latter."""
if not base:
return url
if not url:
return base
base, url, _coerce_result = _coerce_args(base, url)
bscheme, bnetloc, bpath, bparams, bquery, bfragment = \
urlparse(base, '', allow_fragments)
scheme, netloc, path, params, query, fragment = \
urlparse(url, bscheme, allow_fragments)
if scheme != bscheme or scheme not in uses_relative:
return _coerce_result(url)
if scheme in uses_netloc:
if netloc:
return _coerce_result(urlunparse((scheme, netloc, path,
params, query, fragment)))
netloc = bnetloc
if path[:1] == '/':
return _coerce_result(urlunparse((scheme, netloc, path,
params, query, fragment)))
if not path and not params:
path = bpath
params = bparams
if not query:
query = bquery
return _coerce_result(urlunparse((scheme, netloc, path,
params, query, fragment)))
segments = bpath.split('/')[:-1] + path.split('/')
# XXX The stuff below is bogus in various ways...
if segments[-1] == '.':
segments[-1] = ''
while '.' in segments:
segments.remove('.')
while 1:
i = 1
n = len(segments) - 1
while i < n:
if (segments[i] == '..'
and segments[i-1] not in ('', '..')):
del segments[i-1:i+1]
break
i = i+1
else:
break
if segments == ['', '..']:
segments[-1] = ''
elif len(segments) >= 2 and segments[-1] == '..':
segments[-2:] = ['']
return _coerce_result(urlunparse((scheme, netloc, '/'.join(segments),
params, query, fragment)))
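# Illustrative example: relative resolution, including the dot-segment
# removal performed by the loop above (base URL taken from RFC 3986).
from urllib.parse import urljoin
print(urljoin("http://a/b/c/d;p?q", "g"))      # http://a/b/c/g
print(urljoin("http://a/b/c/d;p?q", "../g"))   # http://a/b/g
print(urljoin("http://a/b/c/d;p?q", "//h/x"))  # http://h/x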
def urldefrag(url):
"""Removes any existing fragment from URL.
Returns a tuple of the defragmented URL and the fragment. If
the URL contained no fragments, the second element is the
empty string.
"""
url, _coerce_result = _coerce_args(url)
if '#' in url:
s, n, p, a, q, frag = urlparse(url)
defrag = urlunparse((s, n, p, a, q, ''))
else:
frag = ''
defrag = url
return _coerce_result(DefragResult(defrag, frag))
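# Illustrative example:
from urllib.parse import urldefrag
url, frag = urldefrag("http://example.com/page#section-2")
print(url)   # http://example.com/page
print(frag)  # section-2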
_hexdig = '0123456789ABCDEFabcdef'
_hextobyte = None
def unquote_to_bytes(string):
"""unquote_to_bytes('abc%20def') -> b'abc def'."""
# Note: strings are encoded as UTF-8. This is only an issue if it contains
    # unescaped non-ASCII characters, which URIs should not contain.
if not string:
# Is it a string-like object?
string.split
return b''
if isinstance(string, str):
string = string.encode('utf-8')
bits = string.split(b'%')
if len(bits) == 1:
return string
res = [bits[0]]
append = res.append
# Delay the initialization of the table to not waste memory
# if the function is never called
global _hextobyte
if _hextobyte is None:
_hextobyte = {(a + b).encode(): bytes([int(a + b, 16)])
for a in _hexdig for b in _hexdig}
for item in bits[1:]:
try:
append(_hextobyte[item[:2]])
append(item[2:])
except KeyError:
append(b'%')
append(item)
return b''.join(res)
_asciire = re.compile('([\x00-\x7f]+)')
def unquote(string, encoding='utf-8', errors='replace'):
"""Replace %xx escapes by their single-character equivalent. The optional
encoding and errors parameters specify how to decode percent-encoded
sequences into Unicode characters, as accepted by the bytes.decode()
method.
By default, percent-encoded sequences are decoded with UTF-8, and invalid
sequences are replaced by a placeholder character.
unquote('abc%20def') -> 'abc def'.
"""
if '%' not in string:
string.split
return string
if encoding is None:
encoding = 'utf-8'
if errors is None:
errors = 'replace'
bits = _asciire.split(string)
res = [bits[0]]
append = res.append
for i in range(1, len(bits), 2):
append(unquote_to_bytes(bits[i]).decode(encoding, errors))
append(bits[i + 1])
return ''.join(res)
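# Illustrative example: percent-decoding to str and to raw bytes; a non-UTF-8
# encoding can be supplied when the producer used one.
from urllib.parse import unquote, unquote_to_bytes
print(unquote("abc%20def"))                   # abc def
print(unquote_to_bytes("abc%20def"))          # b'abc def'
print(unquote("caf%E9", encoding="latin-1"))  # café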
def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace'):
"""Parse a query given as a string argument.
Arguments:
qs: percent-encoded query string to be parsed
keep_blank_values: flag indicating whether blank values in
percent-encoded queries should be treated as blank strings.
A true value indicates that blanks should be retained as
blank strings. The default false value indicates that
blank values are to be ignored and treated as if they were
not included.
strict_parsing: flag indicating what to do with parsing errors.
If false (the default), errors are silently ignored.
If true, errors raise a ValueError exception.
encoding and errors: specify how to decode percent-encoded sequences
into Unicode characters, as accepted by the bytes.decode() method.
"""
parsed_result = {}
pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
encoding=encoding, errors=errors)
for name, value in pairs:
if name in parsed_result:
parsed_result[name].append(value)
else:
parsed_result[name] = [value]
return parsed_result
def parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
encoding='utf-8', errors='replace'):
"""Parse a query given as a string argument.
Arguments:
qs: percent-encoded query string to be parsed
keep_blank_values: flag indicating whether blank values in
percent-encoded queries should be treated as blank strings. A
true value indicates that blanks should be retained as blank
strings. The default false value indicates that blank values
are to be ignored and treated as if they were not included.
strict_parsing: flag indicating what to do with parsing errors. If
false (the default), errors are silently ignored. If true,
errors raise a ValueError exception.
encoding and errors: specify how to decode percent-encoded sequences
into Unicode characters, as accepted by the bytes.decode() method.
Returns a list, as G-d intended.
"""
qs, _coerce_result = _coerce_args(qs)
pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
r = []
for name_value in pairs:
if not name_value and not strict_parsing:
continue
nv = name_value.split('=', 1)
if len(nv) != 2:
if strict_parsing:
raise ValueError("bad query field: %r" % (name_value,))
# Handle case of a control-name with no equal sign
if keep_blank_values:
nv.append('')
else:
continue
if len(nv[1]) or keep_blank_values:
name = nv[0].replace('+', ' ')
name = unquote(name, encoding=encoding, errors=errors)
name = _coerce_result(name)
value = nv[1].replace('+', ' ')
value = unquote(value, encoding=encoding, errors=errors)
value = _coerce_result(value)
r.append((name, value))
return r
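# Illustrative example: parse_qs() groups repeated names into lists, while
# parse_qsl() preserves pair order; keep_blank_values retains empty values.
from urllib.parse import parse_qs, parse_qsl
qs = "a=1&a=2&b=&c=3"
print(parse_qs(qs))                           # {'a': ['1', '2'], 'c': ['3']}
print(parse_qsl(qs, keep_blank_values=True))  # [('a', '1'), ('a', '2'), ('b', ''), ('c', '3')]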
def unquote_plus(string, encoding='utf-8', errors='replace'):
"""Like unquote(), but also replace plus signs by spaces, as required for
unquoting HTML form values.
unquote_plus('%7e/abc+def') -> '~/abc def'
"""
string = string.replace('+', ' ')
return unquote(string, encoding, errors)
_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b'abcdefghijklmnopqrstuvwxyz'
b'0123456789'
b'_.-')
_ALWAYS_SAFE_BYTES = bytes(_ALWAYS_SAFE)
_safe_quoters = {}
class Quoter(collections.defaultdict):
"""A mapping from bytes (in range(0,256)) to strings.
    String values are percent-encoded byte values, unless the key < 128 and
    is in the "safe" set (either the specified safe set or the default set).
"""
# Keeps a cache internally, using defaultdict, for efficiency (lookups
# of cached keys don't call Python code at all).
def __init__(self, safe):
"""safe: bytes object."""
self.safe = _ALWAYS_SAFE.union(safe)
def __repr__(self):
# Without this, will just display as a defaultdict
return "<Quoter %r>" % dict(self)
def __missing__(self, b):
# Handle a cache miss. Store quoted string in cache and return.
res = chr(b) if b in self.safe else '%{:02X}'.format(b)
self[b] = res
return res
def quote(string, safe='/', encoding=None, errors=None):
"""quote('abc def') -> 'abc%20def'
Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.
RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.
By default, the quote function is intended for quoting the path
section of a URL. Thus, it will not encode '/'. This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.
string and safe may be either str or bytes objects. encoding and errors
must not be specified if string is a bytes object.
The optional encoding and errors parameters specify how to deal with
non-ASCII characters, as accepted by the str.encode method.
By default, encoding='utf-8' (characters are encoded with UTF-8), and
errors='strict' (unsupported characters raise a UnicodeEncodeError).
"""
if isinstance(string, str):
if not string:
return string
if encoding is None:
encoding = 'utf-8'
if errors is None:
errors = 'strict'
string = string.encode(encoding, errors)
else:
if encoding is not None:
raise TypeError("quote() doesn't support 'encoding' for bytes")
if errors is not None:
raise TypeError("quote() doesn't support 'errors' for bytes")
return quote_from_bytes(string, safe)
def quote_plus(string, safe='', encoding=None, errors=None):
"""Like quote(), but also replace ' ' with '+', as required for quoting
HTML form values. Plus signs in the original string are escaped unless
they are included in safe. It also does not have safe default to '/'.
"""
# Check if ' ' in string, where string may either be a str or bytes. If
# there are no spaces, the regular quote will produce the right answer.
if ((isinstance(string, str) and ' ' not in string) or
(isinstance(string, bytes) and b' ' not in string)):
return quote(string, safe, encoding, errors)
if isinstance(safe, str):
space = ' '
else:
space = b' '
string = quote(string, safe + space, encoding, errors)
return string.replace(' ', '+')
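# Illustrative example: quote() leaves '/' alone by default (path quoting),
# while quote_plus() encodes it and turns spaces into '+'.
from urllib.parse import quote, quote_plus
print(quote("/path/to/a file"))  # /path/to/a%20file
print(quote("a b", safe=""))     # a%20b
print(quote_plus("/path a+b"))   # %2Fpath+a%2Bb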
def quote_from_bytes(bs, safe='/'):
"""Like quote(), but accepts a bytes object rather than a str, and does
not perform string-to-bytes encoding. It always returns an ASCII string.
quote_from_bytes(b'abc def\x3f') -> 'abc%20def%3f'
"""
if not isinstance(bs, (bytes, bytearray)):
raise TypeError("quote_from_bytes() expected bytes")
if not bs:
return ''
if isinstance(safe, str):
# Normalize 'safe' by converting to bytes and removing non-ASCII chars
safe = safe.encode('ascii', 'ignore')
else:
safe = bytes([c for c in safe if c < 128])
if not bs.rstrip(_ALWAYS_SAFE_BYTES + safe):
return bs.decode()
try:
quoter = _safe_quoters[safe]
except KeyError:
_safe_quoters[safe] = quoter = Quoter(safe).__getitem__
return ''.join([quoter(char) for char in bs])
def urlencode(query, doseq=False, safe='', encoding=None, errors=None):
"""Encode a dict or sequence of two-element tuples into a URL query string.
If any values in the query arg are sequences and doseq is true, each
sequence element is converted to a separate parameter.
If the query arg is a sequence of two-element tuples, the order of the
parameters in the output will match the order of parameters in the
input.
The components of a query arg may each be either a string or a bytes type.
The safe, encoding, and errors parameters are passed down to quote_plus()
(encoding and errors only if a component is a str).
"""
if hasattr(query, "items"):
query = query.items()
else:
# It's a bother at times that strings and string-like objects are
# sequences.
try:
# non-sequence items should not work with len()
# non-empty strings will fail this
if len(query) and not isinstance(query[0], tuple):
raise TypeError
# Zero-length sequences of all types will get here and succeed,
# but that's a minor nit. Since the original implementation
            # allowed empty dicts, that type of behavior probably should be
            # preserved for consistency.
except TypeError:
ty, va, tb = sys.exc_info()
raise TypeError("not a valid non-string sequence "
"or mapping object").with_traceback(tb)
l = []
if not doseq:
for k, v in query:
if isinstance(k, bytes):
k = quote_plus(k, safe)
else:
k = quote_plus(str(k), safe, encoding, errors)
if isinstance(v, bytes):
v = quote_plus(v, safe)
else:
v = quote_plus(str(v), safe, encoding, errors)
l.append(k + '=' + v)
else:
for k, v in query:
if isinstance(k, bytes):
k = quote_plus(k, safe)
else:
k = quote_plus(str(k), safe, encoding, errors)
if isinstance(v, bytes):
v = quote_plus(v, safe)
l.append(k + '=' + v)
elif isinstance(v, str):
v = quote_plus(v, safe, encoding, errors)
l.append(k + '=' + v)
else:
try:
# Is this a sufficient test for sequence-ness?
x = len(v)
except TypeError:
# not a sequence
v = quote_plus(str(v), safe, encoding, errors)
l.append(k + '=' + v)
else:
# loop over the sequence
for elt in v:
if isinstance(elt, bytes):
elt = quote_plus(elt, safe)
else:
elt = quote_plus(str(elt), safe, encoding, errors)
l.append(k + '=' + elt)
return '&'.join(l)
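# Illustrative example: without doseq a sequence value is quoted as the
# string repr of the whole sequence; with doseq it becomes repeated
# parameters.
from urllib.parse import urlencode
print(urlencode({"q": "a b", "n": 1}))               # q=a+b&n=1
print(urlencode([("tag", ["x", "y"])]))              # tag=%5B%27x%27%2C+%27y%27%5D
print(urlencode([("tag", ["x", "y"])], doseq=True))  # tag=x&tag=y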
# Utilities to parse URLs (most of these return None for missing parts):
# unwrap('<URL:type://host/path>') --> 'type://host/path'
# splittype('type:opaquestring') --> 'type', 'opaquestring'
# splithost('//host[:port]/path') --> 'host[:port]', '/path'
# splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'
# splitpasswd('user:passwd') -> 'user', 'passwd'
# splitport('host:port') --> 'host', 'port'
# splitquery('/path?query') --> '/path', 'query'
# splittag('/path#tag') --> '/path', 'tag'
# splitattr('/path;attr1=value1;attr2=value2;...') ->
# '/path', ['attr1=value1', 'attr2=value2', ...]
# splitvalue('attr=value') --> 'attr', 'value'
# urllib.parse.unquote('abc%20def') -> 'abc def'
# quote('abc def') -> 'abc%20def'
def to_bytes(url):
"""to_bytes(u"URL") --> 'URL'."""
# Most URL schemes require ASCII. If that changes, the conversion
# can be relaxed.
# XXX get rid of to_bytes()
if isinstance(url, str):
try:
url = url.encode("ASCII").decode()
except UnicodeError:
raise UnicodeError("URL " + repr(url) +
" contains non-ASCII characters")
return url
def unwrap(url):
"""unwrap('<URL:type://host/path>') --> 'type://host/path'."""
url = str(url).strip()
if url[:1] == '<' and url[-1:] == '>':
url = url[1:-1].strip()
if url[:4] == 'URL:': url = url[4:].strip()
return url
_typeprog = None
def splittype(url):
"""splittype('type:opaquestring') --> 'type', 'opaquestring'."""
global _typeprog
if _typeprog is None:
_typeprog = re.compile('^([^/:]+):')
match = _typeprog.match(url)
if match:
scheme = match.group(1)
return scheme.lower(), url[len(scheme) + 1:]
return None, url
_hostprog = None
def splithost(url):
"""splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
global _hostprog
if _hostprog is None:
_hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
match = _hostprog.match(url)
if match:
host_port = match.group(1)
path = match.group(2)
if path and not path.startswith('/'):
path = '/' + path
return host_port, path
return None, url
_userprog = None
def splituser(host):
"""splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'."""
global _userprog
if _userprog is None:
_userprog = re.compile('^(.*)@(.*)$')
match = _userprog.match(host)
if match: return match.group(1, 2)
return None, host
_passwdprog = None
def splitpasswd(user):
"""splitpasswd('user:passwd') -> 'user', 'passwd'."""
global _passwdprog
if _passwdprog is None:
        _passwdprog = re.compile('^([^:]*):(.*)$', re.S)
match = _passwdprog.match(user)
if match: return match.group(1, 2)
return user, None
_portprog = None
def splitport(host):
"""splitport('host:port') --> 'host', 'port'."""
global _portprog
if _portprog is None:
_portprog = re.compile('^(.*):([0-9]*)$')
match = _portprog.match(host)
if match:
host, port = match.groups()
if port:
return host, port
return host, None
_nportprog = None
def splitnport(host, defport=-1):
"""Split host and port, returning numeric port.
Return given default port if no ':' found; defaults to -1.
    Return numerical port if a valid number is found after ':'.
Return None if ':' but not a valid number."""
global _nportprog
if _nportprog is None:
_nportprog = re.compile('^(.*):(.*)$')
match = _nportprog.match(host)
if match:
host, port = match.group(1, 2)
if port:
try:
nport = int(port)
except ValueError:
nport = None
return host, nport
return host, defport
_queryprog = None
def splitquery(url):
"""splitquery('/path?query') --> '/path', 'query'."""
global _queryprog
if _queryprog is None:
        _queryprog = re.compile(r'^(.*)\?([^?]*)$')
match = _queryprog.match(url)
if match: return match.group(1, 2)
return url, None
_tagprog = None
def splittag(url):
"""splittag('/path#tag') --> '/path', 'tag'."""
global _tagprog
if _tagprog is None:
_tagprog = re.compile('^(.*)#([^#]*)$')
match = _tagprog.match(url)
if match: return match.group(1, 2)
return url, None
def splitattr(url):
"""splitattr('/path;attr1=value1;attr2=value2;...') ->
'/path', ['attr1=value1', 'attr2=value2', ...]."""
words = url.split(';')
return words[0], words[1:]
_valueprog = None
def splitvalue(attr):
"""splitvalue('attr=value') --> 'attr', 'value'."""
global _valueprog
if _valueprog is None:
_valueprog = re.compile('^([^=]*)=(.*)$')
match = _valueprog.match(attr)
if match: return match.group(1, 2)
return attr, None
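# Illustrative example: the legacy helpers above take a URL apart step by
# step. They are superseded by urlsplit() and kept for compatibility (on
# newer CPython versions they are deprecated and eventually removed).
from urllib.parse import splittype, splithost, splituser, splitport
url = "http://user:pw@example.com:8080/path?query#tag"
scheme, rest = splittype(url)     # 'http', '//user:pw@example.com:8080/path?query#tag'
host, path = splithost(rest)      # 'user:pw@example.com:8080', '/path?query#tag'
user, hostport = splituser(host)  # 'user:pw', 'example.com:8080'
print(splitport(hostport))        # ('example.com', '8080')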
"""An extensible library for opening URLs using a variety of protocols
The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below). It opens the URL and returns the results as a file-like
object; the returned object has some extra methods described below.
The OpenerDirector manages a collection of Handler objects that do
all the actual work. Each Handler implements a particular protocol or
option. The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns. The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.
urlopen(url, data=None) -- Basic usage is the same as the original
urllib: pass the url and, optionally, data to POST to an HTTP URL, and
get a file-like object back. One difference is that you can also pass
a Request instance instead of a URL. Raises a URLError (subclass of
OSError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.
build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers. Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate. If one of the arguments is a subclass of the default
handler, the argument will be installed instead of the default.
install_opener -- Installs a new opener as the default opener.
objects of interest:
OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.
Request -- An object that encapsulates the state of a request. The
state can be as simple as the URL. It can also include extra HTTP
headers, e.g. a User-Agent.
BaseHandler --
internals:
BaseHandler and parent
_call_chain conventions
Example usage:
import urllib.request
# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
uri='https://mahler:8092/site-updates.py',
user='klem',
passwd='geheim$parole')
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.python.org/')
"""
# XXX issues:
# If an authentication error handler that tries to perform
# authentication for some reason but fails, how should the error be
# signalled? The client needs to know the HTTP error code. But if
# the handler knows what the problem was, e.g., that it didn't know
# the hash algorithm requested in the challenge, it would be good to
# pass that information along to the client, too.
# ftp errors aren't handled cleanly
# check digest against correct (i.e. non-apache) implementation
# Possible extensions:
# complex proxies XXX not sure what exactly was meant by this
# abstract factory for opener
import base64
import bisect
import email
import hashlib
import http.client
import io
import os
import posixpath
import re
import socket
import sys
import time
import collections
import tempfile
import contextlib
import warnings
from urllib.error import URLError, HTTPError, ContentTooShortError
from urllib.parse import (
urlparse, urlsplit, urljoin, unwrap, quote, unquote,
splittype, splithost, splitport, splituser, splitpasswd,
splitattr, splitquery, splitvalue, splittag, to_bytes,
unquote_to_bytes, urlunparse)
from urllib.response import addinfourl, addclosehook
# check for SSL
try:
import ssl
except ImportError:
_have_ssl = False
else:
_have_ssl = True
__all__ = [
# Classes
'Request', 'OpenerDirector', 'BaseHandler', 'HTTPDefaultErrorHandler',
'HTTPRedirectHandler', 'HTTPCookieProcessor', 'ProxyHandler',
'HTTPPasswordMgr', 'HTTPPasswordMgrWithDefaultRealm',
'AbstractBasicAuthHandler', 'HTTPBasicAuthHandler', 'ProxyBasicAuthHandler',
'AbstractDigestAuthHandler', 'HTTPDigestAuthHandler', 'ProxyDigestAuthHandler',
'HTTPHandler', 'FileHandler', 'FTPHandler', 'CacheFTPHandler', 'DataHandler',
'UnknownHandler', 'HTTPErrorProcessor',
# Functions
'urlopen', 'install_opener', 'build_opener',
'pathname2url', 'url2pathname', 'getproxies',
# Legacy interface
'urlretrieve', 'urlcleanup', 'URLopener', 'FancyURLopener',
]
# used in User-Agent header sent
__version__ = sys.version[:3]
_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
*, cafile=None, capath=None, cadefault=False, context=None):
global _opener
if cafile or capath or cadefault:
if context is not None:
raise ValueError(
"You can't pass both context and any of cafile, capath, and "
"cadefault"
)
if not _have_ssl:
raise ValueError('SSL support not available')
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
cafile=cafile,
capath=capath)
https_handler = HTTPSHandler(context=context)
opener = build_opener(https_handler)
elif context:
https_handler = HTTPSHandler(context=context)
opener = build_opener(https_handler)
elif _opener is None:
_opener = opener = build_opener()
else:
opener = _opener
return opener.open(url, data, timeout)
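# Illustrative example: a plain GET and a POST through the module-level
# opener. The URLs are placeholders; POST data must already be URL-encoded
# bytes.
import urllib.parse
import urllib.request
with urllib.request.urlopen("http://example.com/") as resp:
    print(resp.getcode())  # 200
    page = resp.read()
data = urllib.parse.urlencode({"key": "value"}).encode("ascii")
with urllib.request.urlopen("http://example.com/post", data=data) as resp:
    print(resp.getcode())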
def install_opener(opener):
global _opener
_opener = opener
_url_tempfiles = []
def urlretrieve(url, filename=None, reporthook=None, data=None):
"""
Retrieve a URL into a temporary location on disk.
Requires a URL argument. If a filename is passed, it is used as
the temporary file location. The reporthook argument should be
a callable that accepts a block number, a read size, and the
total file size of the URL target. The data argument should be
valid URL encoded data.
If a filename is passed and the URL points to a local resource,
the result is a copy from local file to new file.
Returns a tuple containing the path to the newly created
data file as well as the resulting HTTPMessage object.
"""
url_type, path = splittype(url)
with contextlib.closing(urlopen(url, data)) as fp:
headers = fp.info()
# Just return the local path and the "headers" for file://
# URLs. No sense in performing a copy unless requested.
if url_type == "file" and not filename:
return os.path.normpath(path), headers
# Handle temporary file setup.
if filename:
tfp = open(filename, 'wb')
else:
tfp = tempfile.NamedTemporaryFile(delete=False)
filename = tfp.name
_url_tempfiles.append(filename)
with tfp:
result = filename, headers
bs = 1024*8
size = -1
read = 0
blocknum = 0
if "content-length" in headers:
size = int(headers["Content-Length"])
if reporthook:
reporthook(blocknum, bs, size)
while True:
block = fp.read(bs)
if not block:
break
read += len(block)
tfp.write(block)
blocknum += 1
if reporthook:
reporthook(blocknum, bs, size)
if size >= 0 and read < size:
raise ContentTooShortError(
"retrieval incomplete: got only %i out of %i bytes"
% (read, size), result)
return result
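# Illustrative example: download with a progress callback; the hook receives
# (block number, block size, total size or -1 when unknown). The URL and
# filename are placeholders.
import urllib.request
def progress(blocknum, bs, size):
    if size > 0:
        done = min(blocknum * bs, size)
        print("%3.0f%%" % (100.0 * done / size))
path, headers = urllib.request.urlretrieve(
    "http://example.com/", "page.html", reporthook=progress)
print(path, headers.get("Content-Type"))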
def urlcleanup():
"""Clean up temporary files from urlretrieve calls."""
for temp_file in _url_tempfiles:
try:
os.unlink(temp_file)
except OSError:
pass
del _url_tempfiles[:]
global _opener
if _opener:
_opener = None
# copied from cookielib.py
_cut_port_re = re.compile(r":\d+$", re.ASCII)
def request_host(request):
"""Return request-host, as defined by RFC 2965.
Variation from RFC: returned value is lowercased, for convenient
comparison.
"""
url = request.full_url
host = urlparse(url)[1]
if host == "":
host = request.get_header("Host", "")
# remove port, if present
host = _cut_port_re.sub("", host, 1)
return host.lower()
class Request:
def __init__(self, url, data=None, headers={},
origin_req_host=None, unverifiable=False,
method=None):
self.full_url = url
self.headers = {}
self.unredirected_hdrs = {}
self._data = None
self.data = data
self._tunnel_host = None
for key, value in headers.items():
self.add_header(key, value)
if origin_req_host is None:
origin_req_host = request_host(self)
self.origin_req_host = origin_req_host
self.unverifiable = unverifiable
if method:
self.method = method
@property
def full_url(self):
if self.fragment:
return '{}#{}'.format(self._full_url, self.fragment)
return self._full_url
@full_url.setter
def full_url(self, url):
# unwrap('<URL:type://host/path>') --> 'type://host/path'
self._full_url = unwrap(url)
self._full_url, self.fragment = splittag(self._full_url)
self._parse()
@full_url.deleter
def full_url(self):
self._full_url = None
self.fragment = None
self.selector = ''
@property
def data(self):
return self._data
@data.setter
def data(self, data):
if data != self._data:
self._data = data
# issue 16464
# if we change data we need to remove content-length header
# (cause it's most probably calculated for previous value)
if self.has_header("Content-length"):
self.remove_header("Content-length")
@data.deleter
def data(self):
self.data = None
def _parse(self):
self.type, rest = splittype(self._full_url)
if self.type is None:
raise ValueError("unknown url type: %r" % self.full_url)
self.host, self.selector = splithost(rest)
if self.host:
self.host = unquote(self.host)
def get_method(self):
"""Return a string indicating the HTTP request method."""
default_method = "POST" if self.data is not None else "GET"
return getattr(self, 'method', default_method)
def get_full_url(self):
return self.full_url
def set_proxy(self, host, type):
if self.type == 'https' and not self._tunnel_host:
self._tunnel_host = self.host
else:
            self.type = type
self.selector = self.full_url
self.host = host
def has_proxy(self):
return self.selector == self.full_url
def add_header(self, key, val):
# useful for something like authentication
self.headers[key.capitalize()] = val
def add_unredirected_header(self, key, val):
# will not be added to a redirected request
self.unredirected_hdrs[key.capitalize()] = val
def has_header(self, header_name):
return (header_name in self.headers or
header_name in self.unredirected_hdrs)
def get_header(self, header_name, default=None):
return self.headers.get(
header_name,
self.unredirected_hdrs.get(header_name, default))
def remove_header(self, header_name):
self.headers.pop(header_name, None)
self.unredirected_hdrs.pop(header_name, None)
def header_items(self):
hdrs = self.unredirected_hdrs.copy()
hdrs.update(self.headers)
return list(hdrs.items())
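# Illustrative example: a Request carries the URL, headers, body, and method;
# supplying data switches the default method from GET to POST, and an
# explicit method overrides both.
import urllib.request
req = urllib.request.Request("http://example.com/api",
                             data=b"payload",
                             headers={"X-Token": "abc123"},
                             method="PUT")
print(req.get_method())           # PUT
print(req.get_header("X-token"))  # abc123  (header keys are capitalize()d)
print(req.host, req.selector)     # example.com /api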
class OpenerDirector:
def __init__(self):
client_version = "Python-urllib/%s" % __version__
self.addheaders = [('User-agent', client_version)]
# self.handlers is retained only for backward compatibility
self.handlers = []
# manage the individual handlers
self.handle_open = {}
self.handle_error = {}
self.process_response = {}
self.process_request = {}
def add_handler(self, handler):
if not hasattr(handler, "add_parent"):
raise TypeError("expected BaseHandler instance, got %r" %
type(handler))
added = False
for meth in dir(handler):
if meth in ["redirect_request", "do_open", "proxy_open"]:
# oops, coincidental match
continue
i = meth.find("_")
protocol = meth[:i]
condition = meth[i+1:]
if condition.startswith("error"):
j = condition.find("_") + i + 1
kind = meth[j+1:]
try:
kind = int(kind)
except ValueError:
pass
lookup = self.handle_error.get(protocol, {})
self.handle_error[protocol] = lookup
elif condition == "open":
kind = protocol
lookup = self.handle_open
elif condition == "response":
kind = protocol
lookup = self.process_response
elif condition == "request":
kind = protocol
lookup = self.process_request
else:
continue
handlers = lookup.setdefault(kind, [])
if handlers:
bisect.insort(handlers, handler)
else:
handlers.append(handler)
added = True
if added:
bisect.insort(self.handlers, handler)
handler.add_parent(self)
def close(self):
# Only exists for backwards compatibility.
pass
def _call_chain(self, chain, kind, meth_name, *args):
# Handlers raise an exception if no one else should try to handle
# the request, or return None if they can't but another handler
# could. Otherwise, they return the response.
handlers = chain.get(kind, ())
for handler in handlers:
func = getattr(handler, meth_name)
result = func(*args)
if result is not None:
return result
def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
# accept a URL or a Request object
if isinstance(fullurl, str):
req = Request(fullurl, data)
else:
req = fullurl
if data is not None:
req.data = data
req.timeout = timeout
protocol = req.type
# pre-process request
meth_name = protocol+"_request"
for processor in self.process_request.get(protocol, []):
meth = getattr(processor, meth_name)
req = meth(req)
response = self._open(req, data)
# post-process response
meth_name = protocol+"_response"
for processor in self.process_response.get(protocol, []):
meth = getattr(processor, meth_name)
response = meth(req, response)
return response
def _open(self, req, data=None):
result = self._call_chain(self.handle_open, 'default',
'default_open', req)
if result:
return result
protocol = req.type
result = self._call_chain(self.handle_open, protocol, protocol +
'_open', req)
if result:
return result
return self._call_chain(self.handle_open, 'unknown',
'unknown_open', req)
def error(self, proto, *args):
if proto in ('http', 'https'):
# XXX http[s] protocols are special-cased
dict = self.handle_error['http'] # https is not different than http
proto = args[2] # YUCK!
meth_name = 'http_error_%s' % proto
http_err = 1
orig_args = args
else:
dict = self.handle_error
meth_name = proto + '_error'
http_err = 0
args = (dict, proto, meth_name) + args
result = self._call_chain(*args)
if result:
return result
if http_err:
args = (dict, 'default', 'http_error_default') + orig_args
return self._call_chain(*args)
# XXX probably also want an abstract factory that knows when it makes
# sense to skip a superclass in favor of a subclass and when it might
# make sense to include both
def build_opener(*handlers):
"""Create an opener object from a list of handlers.
The opener will use several default handlers, including support
for HTTP, FTP and when applicable HTTPS.
If any of the handlers passed as arguments are subclasses of the
default handlers, the default handlers will not be used.
"""
opener = OpenerDirector()
default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor,
DataHandler]
if hasattr(http.client, "HTTPSConnection"):
default_classes.append(HTTPSHandler)
skip = set()
for klass in default_classes:
for check in handlers:
if isinstance(check, type):
if issubclass(check, klass):
skip.add(klass)
elif isinstance(check, klass):
skip.add(klass)
for klass in skip:
default_classes.remove(klass)
for klass in default_classes:
opener.add_handler(klass())
for h in handlers:
if isinstance(h, type):
h = h()
opener.add_handler(h)
return opener
class BaseHandler:
handler_order = 500
def add_parent(self, parent):
self.parent = parent
def close(self):
# Only exists for backwards compatibility
pass
def __lt__(self, other):
if not hasattr(other, "handler_order"):
# Try to preserve the old behavior of having custom classes
# inserted after default ones (works only for custom user
# classes which are not aware of handler_order).
return True
return self.handler_order < other.handler_order
class HTTPErrorProcessor(BaseHandler):
"""Process HTTP error responses."""
handler_order = 1000 # after all other processing
def http_response(self, request, response):
code, msg, hdrs = response.code, response.msg, response.info()
# According to RFC 2616, "2xx" code indicates that the client's
# request was successfully received, understood, and accepted.
if not (200 <= code < 300):
response = self.parent.error(
'http', request, response, code, msg, hdrs)
return response
https_response = http_response
class HTTPDefaultErrorHandler(BaseHandler):
def http_error_default(self, req, fp, code, msg, hdrs):
raise HTTPError(req.full_url, code, msg, hdrs, fp)
class HTTPRedirectHandler(BaseHandler):
# maximum number of redirections to any single URL
# this is needed because of the state that cookies introduce
max_repeats = 4
# maximum total number of redirections (regardless of URL) before
# assuming we're in a loop
max_redirections = 10
def redirect_request(self, req, fp, code, msg, headers, newurl):
"""Return a Request or None in response to a redirect.
This is called by the http_error_30x methods when a
redirection response is received. If a redirection should
take place, return a new Request to allow http_error_30x to
perform the redirect. Otherwise, raise HTTPError if no-one
else should try to handle this url. Return None if you can't
but another Handler might.
"""
m = req.get_method()
if (not (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
or code in (301, 302, 303) and m == "POST")):
raise HTTPError(req.full_url, code, msg, headers, fp)
# Strictly (according to RFC 2616), 301 or 302 in response to
# a POST MUST NOT cause a redirection without confirmation
# from the user (of urllib.request, in this case). In practice,
# essentially all clients do redirect in this case, so we do
# the same.
        # be lenient with URIs containing a space
newurl = newurl.replace(' ', '%20')
CONTENT_HEADERS = ("content-length", "content-type")
newheaders = dict((k, v) for k, v in req.headers.items()
if k.lower() not in CONTENT_HEADERS)
return Request(newurl,
headers=newheaders,
origin_req_host=req.origin_req_host,
unverifiable=True)
# Implementation note: To avoid the server sending us into an
# infinite loop, the request object needs to track what URLs we
# have already seen. Do this by adding a handler-specific
# attribute to the Request object.
def http_error_302(self, req, fp, code, msg, headers):
# Some servers (incorrectly) return multiple Location headers
# (so probably same goes for URI). Use first header.
if "location" in headers:
newurl = headers["location"]
elif "uri" in headers:
newurl = headers["uri"]
else:
return
# fix a possible malformed URL
urlparts = urlparse(newurl)
# For security reasons we don't allow redirection to anything other
# than http, https or ftp.
if urlparts.scheme not in ('http', 'https', 'ftp', ''):
raise HTTPError(
newurl, code,
"%s - Redirection to url '%s' is not allowed" % (msg, newurl),
headers, fp)
if not urlparts.path:
urlparts = list(urlparts)
urlparts[2] = "/"
newurl = urlunparse(urlparts)
newurl = urljoin(req.full_url, newurl)
# XXX Probably want to forget about the state of the current
# request, although that might interact poorly with other
# handlers that also use handler-specific request attributes
new = self.redirect_request(req, fp, code, msg, headers, newurl)
if new is None:
return
# loop detection
# .redirect_dict has a key url if url was previously visited.
if hasattr(req, 'redirect_dict'):
visited = new.redirect_dict = req.redirect_dict
if (visited.get(newurl, 0) >= self.max_repeats or
len(visited) >= self.max_redirections):
raise HTTPError(req.full_url, code,
self.inf_msg + msg, headers, fp)
else:
visited = new.redirect_dict = req.redirect_dict = {}
visited[newurl] = visited.get(newurl, 0) + 1
# Don't close the fp until we are sure that we won't use it
# with HTTPError.
fp.read()
fp.close()
return self.parent.open(new, timeout=req.timeout)
http_error_301 = http_error_303 = http_error_307 = http_error_302
inf_msg = "The HTTP server returned a redirect error that would " \
"lead to an infinite loop.\n" \
"The last 30x error message was:\n"
def _parse_proxy(proxy):
"""Return (scheme, user, password, host/port) given a URL or an authority.
If a URL is supplied, it must have an authority (host:port) component.
According to RFC 3986, having an authority component means the URL must
have two slashes after the scheme.
"""
scheme, r_scheme = splittype(proxy)
if not r_scheme.startswith("/"):
# authority
scheme = None
authority = proxy
else:
# URL
if not r_scheme.startswith("//"):
raise ValueError("proxy URL with no authority: %r" % proxy)
        # We have an authority, so for RFC 3986-compliant URLs (by ss. 3.2
        # and 3.3), path is empty or starts with '/'
end = r_scheme.find("/", 2)
if end == -1:
end = None
authority = r_scheme[2:end]
userinfo, hostport = splituser(authority)
if userinfo is not None:
user, password = splitpasswd(userinfo)
else:
user = password = None
return scheme, user, password, hostport
class ProxyHandler(BaseHandler):
# Proxies must be in front
handler_order = 100
def __init__(self, proxies=None):
if proxies is None:
proxies = getproxies()
assert hasattr(proxies, 'keys'), "proxies must be a mapping"
self.proxies = proxies
for type, url in proxies.items():
setattr(self, '%s_open' % type,
lambda r, proxy=url, type=type, meth=self.proxy_open:
meth(r, proxy, type))
def proxy_open(self, req, proxy, type):
orig_type = req.type
proxy_type, user, password, hostport = _parse_proxy(proxy)
if proxy_type is None:
proxy_type = orig_type
if req.host and proxy_bypass(req.host):
return None
if user and password:
user_pass = '%s:%s' % (unquote(user),
unquote(password))
creds = base64.b64encode(user_pass.encode()).decode("ascii")
req.add_header('Proxy-authorization', 'Basic ' + creds)
hostport = unquote(hostport)
req.set_proxy(hostport, proxy_type)
if orig_type == proxy_type or orig_type == 'https':
# let other handlers take care of it
return None
else:
# need to start over, because the other handlers don't
# grok the proxy's URL type
# e.g. if we have a constructor arg proxies like so:
# {'http': 'ftp://proxy.example.com'}, we may end up turning
# a request for http://acme.example.com/a into one for
# ftp://proxy.example.com/a
return self.parent.open(req, timeout=req.timeout)
class HTTPPasswordMgr:
def __init__(self):
self.passwd = {}
def add_password(self, realm, uri, user, passwd):
# uri could be a single URI or a sequence
if isinstance(uri, str):
uri = [uri]
if realm not in self.passwd:
self.passwd[realm] = {}
for default_port in True, False:
reduced_uri = tuple(
[self.reduce_uri(u, default_port) for u in uri])
self.passwd[realm][reduced_uri] = (user, passwd)
def find_user_password(self, realm, authuri):
domains = self.passwd.get(realm, {})
for default_port in True, False:
reduced_authuri = self.reduce_uri(authuri, default_port)
for uris, authinfo in domains.items():
for uri in uris:
if self.is_suburi(uri, reduced_authuri):
return authinfo
return None, None
def reduce_uri(self, uri, default_port=True):
"""Accept authority or URI and extract only the authority and path."""
# note HTTP URLs do not have a userinfo component
parts = urlsplit(uri)
if parts[1]:
# URI
scheme = parts[0]
authority = parts[1]
path = parts[2] or '/'
else:
# host or host:port
scheme = None
authority = uri
path = '/'
host, port = splitport(authority)
if default_port and port is None and scheme is not None:
dport = {"http": 80,
"https": 443,
}.get(scheme)
if dport is not None:
authority = "%s:%d" % (host, dport)
return authority, path
def is_suburi(self, base, test):
"""Check if test is below base in a URI tree
Both args must be URIs in reduced form.
"""
if base == test:
return True
if base[0] != test[0]:
return False
common = posixpath.commonprefix((base[1], test[1]))
if len(common) == len(base[1]):
return True
return False
class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):
def find_user_password(self, realm, authuri):
user, password = HTTPPasswordMgr.find_user_password(self, realm,
authuri)
if user is not None:
return user, password
return HTTPPasswordMgr.find_user_password(self, None, authuri)
class AbstractBasicAuthHandler:
# XXX this allows for multiple auth-schemes, but will stupidly pick
# the last one with a realm specified.
# allow for double- and single-quoted realm values
# (single quotes are a violation of the RFC, but appear in the wild)
rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
'realm=(["\']?)([^"\']*)\\2', re.I)
# XXX could pre-emptively send auth info already accepted (RFC 2617,
# end of section 2, and section 1.2 immediately after "credentials"
# production).
def __init__(self, password_mgr=None):
if password_mgr is None:
password_mgr = HTTPPasswordMgr()
self.passwd = password_mgr
self.add_password = self.passwd.add_password
def http_error_auth_reqed(self, authreq, host, req, headers):
# host may be an authority (without userinfo) or a URL with an
# authority
# XXX could be multiple headers
authreq = headers.get(authreq, None)
if authreq:
scheme = authreq.split()[0]
if scheme.lower() != 'basic':
raise ValueError("AbstractBasicAuthHandler does not"
" support the following scheme: '%s'" %
scheme)
else:
mo = AbstractBasicAuthHandler.rx.search(authreq)
if mo:
scheme, quote, realm = mo.groups()
if quote not in ['"',"'"]:
warnings.warn("Basic Auth Realm was unquoted",
UserWarning, 2)
if scheme.lower() == 'basic':
return self.retry_http_basic_auth(host, req, realm)
def retry_http_basic_auth(self, host, req, realm):
user, pw = self.passwd.find_user_password(realm, host)
if pw is not None:
raw = "%s:%s" % (user, pw)
auth = "Basic " + base64.b64encode(raw.encode()).decode("ascii")
if req.get_header(self.auth_header, None) == auth:
return None
req.add_unredirected_header(self.auth_header, auth)
return self.parent.open(req, timeout=req.timeout)
else:
return None
class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
auth_header = 'Authorization'
def http_error_401(self, req, fp, code, msg, headers):
url = req.full_url
response = self.http_error_auth_reqed('www-authenticate',
url, req, headers)
return response
class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
auth_header = 'Proxy-authorization'
def http_error_407(self, req, fp, code, msg, headers):
# http_error_auth_reqed requires that there is no userinfo component in
# authority. Assume there isn't one, since urllib.request does not (and
# should not, RFC 3986 s. 3.2.1) support requests for URLs containing
# userinfo.
authority = req.host
response = self.http_error_auth_reqed('proxy-authenticate',
authority, req, headers)
return response
# Return n random bytes.
_randombytes = os.urandom
class AbstractDigestAuthHandler:
# Digest authentication is specified in RFC 2617.
# XXX The client does not inspect the Authentication-Info header
# in a successful response.
# XXX It should be possible to test this implementation against
# a mock server that just generates a static set of challenges.
# XXX qop="auth-int" supports is shaky
def __init__(self, passwd=None):
if passwd is None:
passwd = HTTPPasswordMgr()
self.passwd = passwd
self.add_password = self.passwd.add_password
self.retried = 0
self.nonce_count = 0
self.last_nonce = None
def reset_retry_count(self):
self.retried = 0
def http_error_auth_reqed(self, auth_header, host, req, headers):
authreq = headers.get(auth_header, None)
if self.retried > 5:
# Don't fail endlessly - if we failed once, we'll probably
# fail a second time. Hm. Unless the Password Manager is
# prompting for the information. Crap. This isn't great
# but it's better than the current 'repeat until recursion
# depth exceeded' approach <wink>
raise HTTPError(req.full_url, 401, "digest auth failed",
headers, None)
else:
self.retried += 1
if authreq:
scheme = authreq.split()[0]
if scheme.lower() == 'digest':
return self.retry_http_digest_auth(req, authreq)
elif scheme.lower() != 'basic':
raise ValueError("AbstractDigestAuthHandler does not support"
" the following scheme: '%s'" % scheme)
def retry_http_digest_auth(self, req, auth):
token, challenge = auth.split(' ', 1)
chal = parse_keqv_list(filter(None, parse_http_list(challenge)))
auth = self.get_authorization(req, chal)
if auth:
auth_val = 'Digest %s' % auth
if req.headers.get(self.auth_header, None) == auth_val:
return None
req.add_unredirected_header(self.auth_header, auth_val)
resp = self.parent.open(req, timeout=req.timeout)
return resp
def get_cnonce(self, nonce):
# The cnonce-value is an opaque
# quoted string value provided by the client and used by both client
# and server to avoid chosen plaintext attacks, to provide mutual
# authentication, and to provide some message integrity protection.
# This isn't a fabulous effort, but it's probably Good Enough.
s = "%s:%s:%s:" % (self.nonce_count, nonce, time.ctime())
b = s.encode("ascii") + _randombytes(8)
dig = hashlib.sha1(b).hexdigest()
return dig[:16]
def get_authorization(self, req, chal):
try:
realm = chal['realm']
nonce = chal['nonce']
qop = chal.get('qop')
algorithm = chal.get('algorithm', 'MD5')
# mod_digest doesn't send an opaque, even though it isn't
# supposed to be optional
opaque = chal.get('opaque', None)
except KeyError:
return None
H, KD = self.get_algorithm_impls(algorithm)
if H is None:
return None
user, pw = self.passwd.find_user_password(realm, req.full_url)
if user is None:
return None
# XXX not implemented yet
if req.data is not None:
entdig = self.get_entity_digest(req.data, chal)
else:
entdig = None
A1 = "%s:%s:%s" % (user, realm, pw)
A2 = "%s:%s" % (req.get_method(),
# XXX selector: what about proxies and full urls
req.selector)
if qop == 'auth':
if nonce == self.last_nonce:
self.nonce_count += 1
else:
self.nonce_count = 1
self.last_nonce = nonce
ncvalue = '%08x' % self.nonce_count
cnonce = self.get_cnonce(nonce)
noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
respdig = KD(H(A1), noncebit)
elif qop is None:
respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
else:
# XXX handle auth-int.
raise URLError("qop '%s' is not supported." % qop)
# XXX should the partial digests be encoded too?
base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
'response="%s"' % (user, realm, nonce, req.selector,
respdig)
if opaque:
base += ', opaque="%s"' % opaque
if entdig:
base += ', digest="%s"' % entdig
base += ', algorithm="%s"' % algorithm
if qop:
base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
return base
    def get_algorithm_impls(self, algorithm):
        # lambdas assume digest modules are imported at the top level
        if algorithm == 'MD5':
            H = lambda x: hashlib.md5(x.encode("ascii")).hexdigest()
        elif algorithm == 'SHA':
            H = lambda x: hashlib.sha1(x.encode("ascii")).hexdigest()
        else:
            # get_authorization() treats H=None as "unsupported algorithm";
            # without this branch an unknown algorithm raised UnboundLocalError.
            H = None
        # XXX MD5-sess
        KD = lambda s, d: H("%s:%s" % (s, d))
        return H, KD
def get_entity_digest(self, data, chal):
# XXX not implemented yet
return None
class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
"""An authentication protocol defined by RFC 2069
Digest authentication improves on basic authentication because it
does not transmit passwords in the clear.
"""
auth_header = 'Authorization'
handler_order = 490 # before Basic auth
def http_error_401(self, req, fp, code, msg, headers):
host = urlparse(req.full_url)[1]
retry = self.http_error_auth_reqed('www-authenticate',
host, req, headers)
self.reset_retry_count()
return retry
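# A worked sketch of the RFC 2617 qop="auth" response that get_authorization()
# assembles above, computed with hashlib directly; every challenge value here
# is invented for illustration.
def _demo_digest_response():
    import hashlib
    H = lambda x: hashlib.md5(x.encode("ascii")).hexdigest()
    KD = lambda s, d: H("%s:%s" % (s, d))
    A1 = "user:testrealm:secret"        # user:realm:password
    A2 = "GET:/dir/index.html"          # method:selector
    nonce, nc, cnonce = "dcd98b7102dd2f0e", "00000001", "0a4f113b"
    # response = KD(H(A1), nonce ":" nc ":" cnonce ":" qop ":" H(A2))
    return KD(H(A1), "%s:%s:%s:%s:%s" % (nonce, nc, cnonce, "auth", H(A2)))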
class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
auth_header = 'Proxy-Authorization'
handler_order = 490 # before Basic auth
def http_error_407(self, req, fp, code, msg, headers):
host = req.host
retry = self.http_error_auth_reqed('proxy-authenticate',
host, req, headers)
self.reset_retry_count()
return retry
class AbstractHTTPHandler(BaseHandler):
def __init__(self, debuglevel=0):
self._debuglevel = debuglevel
def set_http_debuglevel(self, level):
self._debuglevel = level
def do_request_(self, request):
host = request.host
if not host:
raise URLError('no host given')
if request.data is not None: # POST
data = request.data
if isinstance(data, str):
msg = "POST data should be bytes or an iterable of bytes. " \
"It cannot be of type str."
raise TypeError(msg)
if not request.has_header('Content-type'):
request.add_unredirected_header(
'Content-type',
'application/x-www-form-urlencoded')
if not request.has_header('Content-length'):
try:
mv = memoryview(data)
except TypeError:
if isinstance(data, collections.Iterable):
raise ValueError("Content-Length should be specified "
"for iterable data of type %r %r" % (type(data),
data))
else:
request.add_unredirected_header(
'Content-length', '%d' % (len(mv) * mv.itemsize))
sel_host = host
if request.has_proxy():
scheme, sel = splittype(request.selector)
sel_host, sel_path = splithost(sel)
if not request.has_header('Host'):
request.add_unredirected_header('Host', sel_host)
for name, value in self.parent.addheaders:
name = name.capitalize()
if not request.has_header(name):
request.add_unredirected_header(name, value)
return request
def do_open(self, http_class, req, **http_conn_args):
"""Return an HTTPResponse object for the request, using http_class.
http_class must implement the HTTPConnection API from http.client.
"""
host = req.host
if not host:
raise URLError('no host given')
# will parse host:port
h = http_class(host, timeout=req.timeout, **http_conn_args)
headers = dict(req.unredirected_hdrs)
headers.update(dict((k, v) for k, v in req.headers.items()
if k not in headers))
# TODO(jhylton): Should this be redesigned to handle
# persistent connections?
# We want to make an HTTP/1.1 request, but the addinfourl
# class isn't prepared to deal with a persistent connection.
# It will try to read all remaining data from the socket,
# which will block while the server waits for the next request.
# So make sure the connection gets closed after the (only)
# request.
headers["Connection"] = "close"
headers = dict((name.title(), val) for name, val in headers.items())
if req._tunnel_host:
tunnel_headers = {}
proxy_auth_hdr = "Proxy-Authorization"
if proxy_auth_hdr in headers:
tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]
# Proxy-Authorization should not be sent to origin
# server.
del headers[proxy_auth_hdr]
h.set_tunnel(req._tunnel_host, headers=tunnel_headers)
try:
try:
h.request(req.get_method(), req.selector, req.data, headers)
except OSError as err: # timeout error
raise URLError(err)
r = h.getresponse()
except:
h.close()
raise
# If the server does not send us a 'Connection: close' header,
# HTTPConnection assumes the socket should be left open. Manually
# mark the socket to be closed when this response object goes away.
if h.sock:
h.sock.close()
h.sock = None
r.url = req.get_full_url()
        # This line replaces the .msg attribute of the HTTPResponse
        # with .headers, because urllib clients expect the response to
        # have the reason in .msg. It would be good to mark this
        # attribute as deprecated and get clients to use info() or
        # .headers instead.
r.msg = r.reason
return r
class HTTPHandler(AbstractHTTPHandler):
def http_open(self, req):
return self.do_open(http.client.HTTPConnection, req)
http_request = AbstractHTTPHandler.do_request_
if hasattr(http.client, 'HTTPSConnection'):
class HTTPSHandler(AbstractHTTPHandler):
def __init__(self, debuglevel=0, context=None, check_hostname=None):
AbstractHTTPHandler.__init__(self, debuglevel)
self._context = context
self._check_hostname = check_hostname
def https_open(self, req):
return self.do_open(http.client.HTTPSConnection, req,
context=self._context, check_hostname=self._check_hostname)
https_request = AbstractHTTPHandler.do_request_
__all__.append('HTTPSHandler')
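    # A minimal sketch of supplying a custom ssl.SSLContext to HTTPSHandler;
    # the URL is a placeholder.
    def _demo_https_context_opener():
        import ssl
        import urllib.request
        ctx = ssl.create_default_context()
        opener = urllib.request.build_opener(
            urllib.request.HTTPSHandler(context=ctx))
        return opener.open("https://example.com/")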
class HTTPCookieProcessor(BaseHandler):
def __init__(self, cookiejar=None):
import http.cookiejar
if cookiejar is None:
cookiejar = http.cookiejar.CookieJar()
self.cookiejar = cookiejar
def http_request(self, request):
self.cookiejar.add_cookie_header(request)
return request
def http_response(self, request, response):
self.cookiejar.extract_cookies(response, request)
return response
https_request = http_request
https_response = http_response
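# Sketch (placeholder URLs): sharing one CookieJar across requests, so that
# Set-Cookie headers extracted from one response are replayed on the next.
def _demo_cookie_opener():
    import http.cookiejar
    import urllib.request
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.open("http://example.com/login")     # cookies extracted here
    return opener.open("http://example.com/")   # ... and sent back here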
class UnknownHandler(BaseHandler):
def unknown_open(self, req):
type = req.type
raise URLError('unknown url type: %s' % type)
def parse_keqv_list(l):
"""Parse list of key=value strings where keys are not duplicated."""
parsed = {}
for elt in l:
k, v = elt.split('=', 1)
if v[0] == '"' and v[-1] == '"':
v = v[1:-1]
parsed[k] = v
return parsed
def parse_http_list(s):
"""Parse lists as described by RFC 2068 Section 2.
In particular, parse comma-separated lists where the elements of
the list may include quoted-strings. A quoted-string could
contain a comma. A non-quoted string could have quotes in the
middle. Neither commas nor quotes count if they are escaped.
Only double-quotes count, not single-quotes.
"""
res = []
part = ''
escape = quote = False
for cur in s:
if escape:
part += cur
escape = False
continue
if quote:
if cur == '\\':
escape = True
continue
elif cur == '"':
quote = False
part += cur
continue
if cur == ',':
res.append(part)
part = ''
continue
if cur == '"':
quote = True
part += cur
# append last part
if part:
res.append(part)
return [part.strip() for part in res]
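# Sketch: the two parsers above cooperate to turn a challenge header's
# parameter list into a dict, e.g. for a Digest WWW-Authenticate value.
def _demo_parse_challenge():
    challenge = 'realm="testrealm", nonce="dcd98b71", qop="auth"'
    return parse_keqv_list(parse_http_list(challenge))
    # -> {'realm': 'testrealm', 'nonce': 'dcd98b71', 'qop': 'auth'}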
class FileHandler(BaseHandler):
# Use local file or FTP depending on form of URL
def file_open(self, req):
url = req.selector
if url[:2] == '//' and url[2:3] != '/' and (req.host and
req.host != 'localhost'):
            if req.host not in self.get_names():
raise URLError("file:// scheme is supported only on localhost")
else:
return self.open_local_file(req)
# names for the localhost
names = None
def get_names(self):
if FileHandler.names is None:
try:
FileHandler.names = tuple(
socket.gethostbyname_ex('localhost')[2] +
socket.gethostbyname_ex(socket.gethostname())[2])
except socket.gaierror:
FileHandler.names = (socket.gethostbyname('localhost'),)
return FileHandler.names
# not entirely sure what the rules are here
def open_local_file(self, req):
import email.utils
import mimetypes
host = req.host
filename = req.selector
localfile = url2pathname(filename)
try:
stats = os.stat(localfile)
size = stats.st_size
modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
mtype = mimetypes.guess_type(filename)[0]
headers = email.message_from_string(
'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
(mtype or 'text/plain', size, modified))
if host:
host, port = splitport(host)
if not host or \
(not port and _safe_gethostbyname(host) in self.get_names()):
if host:
origurl = 'file://' + host + filename
else:
origurl = 'file://' + filename
return addinfourl(open(localfile, 'rb'), headers, origurl)
except OSError as exp:
# users shouldn't expect OSErrors coming from urlopen()
raise URLError(exp)
raise URLError('file not on local host')
def _safe_gethostbyname(host):
try:
return socket.gethostbyname(host)
except socket.gaierror:
return None
class FTPHandler(BaseHandler):
def ftp_open(self, req):
import ftplib
import mimetypes
host = req.host
if not host:
raise URLError('ftp error: no host given')
host, port = splitport(host)
if port is None:
port = ftplib.FTP_PORT
else:
port = int(port)
# username/password handling
user, host = splituser(host)
if user:
user, passwd = splitpasswd(user)
else:
passwd = None
host = unquote(host)
user = user or ''
passwd = passwd or ''
try:
host = socket.gethostbyname(host)
except OSError as msg:
raise URLError(msg)
path, attrs = splitattr(req.selector)
dirs = path.split('/')
dirs = list(map(unquote, dirs))
dirs, file = dirs[:-1], dirs[-1]
if dirs and not dirs[0]:
dirs = dirs[1:]
try:
fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
            type = 'I' if file else 'D'
for attr in attrs:
attr, value = splitvalue(attr)
if attr.lower() == 'type' and \
value in ('a', 'A', 'i', 'I', 'd', 'D'):
type = value.upper()
fp, retrlen = fw.retrfile(file, type)
headers = ""
mtype = mimetypes.guess_type(req.full_url)[0]
if mtype:
headers += "Content-type: %s\n" % mtype
if retrlen is not None and retrlen >= 0:
headers += "Content-length: %d\n" % retrlen
headers = email.message_from_string(headers)
return addinfourl(fp, headers, req.full_url)
except ftplib.all_errors as exp:
exc = URLError('ftp error: %r' % exp)
raise exc.with_traceback(sys.exc_info()[2])
def connect_ftp(self, user, passwd, host, port, dirs, timeout):
return ftpwrapper(user, passwd, host, port, dirs, timeout,
persistent=False)
class CacheFTPHandler(FTPHandler):
# XXX would be nice to have pluggable cache strategies
# XXX this stuff is definitely not thread safe
def __init__(self):
self.cache = {}
self.timeout = {}
self.soonest = 0
self.delay = 60
self.max_conns = 16
def setTimeout(self, t):
self.delay = t
def setMaxConns(self, m):
self.max_conns = m
def connect_ftp(self, user, passwd, host, port, dirs, timeout):
key = user, host, port, '/'.join(dirs), timeout
if key in self.cache:
self.timeout[key] = time.time() + self.delay
else:
self.cache[key] = ftpwrapper(user, passwd, host, port,
dirs, timeout)
self.timeout[key] = time.time() + self.delay
self.check_cache()
return self.cache[key]
def check_cache(self):
# first check for old ones
t = time.time()
if self.soonest <= t:
for k, v in list(self.timeout.items()):
if v < t:
self.cache[k].close()
del self.cache[k]
del self.timeout[k]
self.soonest = min(list(self.timeout.values()))
# then check the size
if len(self.cache) == self.max_conns:
for k, v in list(self.timeout.items()):
if v == self.soonest:
del self.cache[k]
del self.timeout[k]
break
self.soonest = min(list(self.timeout.values()))
def clear_cache(self):
for conn in self.cache.values():
conn.close()
self.cache.clear()
self.timeout.clear()
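# Sketch: installing CacheFTPHandler so FTP connections are reused across
# requests; the timeout and connection limit are arbitrary illustration values.
def _demo_cached_ftp_opener():
    import urllib.request
    handler = urllib.request.CacheFTPHandler()
    handler.setTimeout(30)    # seconds an idle connection stays cached
    handler.setMaxConns(4)    # upper bound on cached connections
    return urllib.request.build_opener(handler)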
class DataHandler(BaseHandler):
def data_open(self, req):
# data URLs as specified in RFC 2397.
#
# ignores POSTed data
#
# syntax:
# dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
# mediatype := [ type "/" subtype ] *( ";" parameter )
# data := *urlchar
# parameter := attribute "=" value
url = req.full_url
scheme, data = url.split(":",1)
mediatype, data = data.split(",",1)
# even base64 encoded data URLs might be quoted so unquote in any case:
data = unquote_to_bytes(data)
if mediatype.endswith(";base64"):
data = base64.decodebytes(data)
mediatype = mediatype[:-7]
if not mediatype:
mediatype = "text/plain;charset=US-ASCII"
headers = email.message_from_string("Content-type: %s\nContent-length: %d\n" %
(mediatype, len(data)))
return addinfourl(io.BytesIO(data), headers, url)
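# Sketch: with DataHandler installed (it is part of the default opener),
# urlopen() resolves RFC 2397 data: URLs entirely in memory.
def _demo_data_url():
    import urllib.request
    with urllib.request.urlopen("data:text/plain;base64,SGVsbG8=") as f:
        return f.read()   # -> b'Hello'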
# Code moved from the old urllib module
MAXFTPCACHE = 10 # Trim the ftp cache beyond this size
# Helper for non-unix systems
if os.name == 'nt':
from nturl2path import url2pathname, pathname2url
else:
def url2pathname(pathname):
"""OS-specific conversion from a relative URL of the 'file' scheme
to a file system path; not recommended for general use."""
return unquote(pathname)
def pathname2url(pathname):
"""OS-specific conversion from a file system path to a relative URL
of the 'file' scheme; not recommended for general use."""
return quote(pathname)
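# Sketch: round-tripping a local path through the helpers above; on POSIX they
# are thin wrappers over quote()/unquote(), on Windows nturl2path is used.
def _demo_pathname_roundtrip():
    url = pathname2url("/tmp/some file.txt")   # -> '/tmp/some%20file.txt'
    return url2pathname(url)                   # -> '/tmp/some file.txt'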
# This really consists of two pieces:
# (1) a class which handles opening of all sorts of URLs
# (plus assorted utilities etc.)
# (2) a set of functions for parsing URLs
# XXX Should these be separated out into different modules?
ftpcache = {}
class URLopener:
"""Class to open URLs.
This is a class rather than just a subroutine because we may need
more than one set of global protocol-specific options.
Note -- this is a base class for those who don't want the
automatic handling of errors type 302 (relocated) and 401
(authorization needed)."""
__tempfiles = None
version = "Python-urllib/%s" % __version__
# Constructor
def __init__(self, proxies=None, **x509):
msg = "%(class)s style of invoking requests is deprecated. " \
"Use newer urlopen functions/methods" % {'class': self.__class__.__name__}
warnings.warn(msg, DeprecationWarning, stacklevel=3)
if proxies is None:
proxies = getproxies()
assert hasattr(proxies, 'keys'), "proxies must be a mapping"
self.proxies = proxies
self.key_file = x509.get('key_file')
self.cert_file = x509.get('cert_file')
self.addheaders = [('User-Agent', self.version)]
self.__tempfiles = []
self.__unlink = os.unlink # See cleanup()
self.tempcache = None
# Undocumented feature: if you assign {} to tempcache,
# it is used to cache files retrieved with
# self.retrieve(). This is not enabled by default
# since it does not work for changing documents (and I
# haven't got the logic to check expiration headers
# yet).
self.ftpcache = ftpcache
# Undocumented feature: you can use a different
# ftp cache by assigning to the .ftpcache member;
# in case you want logically independent URL openers
# XXX This is not threadsafe. Bah.
def __del__(self):
self.close()
def close(self):
self.cleanup()
def cleanup(self):
# This code sometimes runs when the rest of this module
# has already been deleted, so it can't use any globals
# or import anything.
if self.__tempfiles:
for file in self.__tempfiles:
try:
self.__unlink(file)
except OSError:
pass
del self.__tempfiles[:]
if self.tempcache:
self.tempcache.clear()
def addheader(self, *args):
"""Add a header to be used by the HTTP interface only
e.g. u.addheader('Accept', 'sound/basic')"""
self.addheaders.append(args)
# External interface
def open(self, fullurl, data=None):
"""Use URLopener().open(file) instead of open(file, 'r')."""
fullurl = unwrap(to_bytes(fullurl))
fullurl = quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]|")
if self.tempcache and fullurl in self.tempcache:
filename, headers = self.tempcache[fullurl]
fp = open(filename, 'rb')
return addinfourl(fp, headers, fullurl)
urltype, url = splittype(fullurl)
if not urltype:
urltype = 'file'
if urltype in self.proxies:
proxy = self.proxies[urltype]
urltype, proxyhost = splittype(proxy)
host, selector = splithost(proxyhost)
url = (host, fullurl) # Signal special case to open_*()
else:
proxy = None
name = 'open_' + urltype
self.type = urltype
name = name.replace('-', '_')
if not hasattr(self, name):
if proxy:
return self.open_unknown_proxy(proxy, fullurl, data)
else:
return self.open_unknown(fullurl, data)
try:
if data is None:
return getattr(self, name)(url)
else:
return getattr(self, name)(url, data)
except (HTTPError, URLError):
raise
except OSError as msg:
raise OSError('socket error', msg).with_traceback(sys.exc_info()[2])
def open_unknown(self, fullurl, data=None):
"""Overridable interface to open unknown URL type."""
type, url = splittype(fullurl)
raise OSError('url error', 'unknown url type', type)
def open_unknown_proxy(self, proxy, fullurl, data=None):
"""Overridable interface to open unknown URL type."""
type, url = splittype(fullurl)
raise OSError('url error', 'invalid proxy for %s' % type, proxy)
# External interface
def retrieve(self, url, filename=None, reporthook=None, data=None):
"""retrieve(url) returns (filename, headers) for a local object
or (tempfilename, headers) for a remote object."""
url = unwrap(to_bytes(url))
if self.tempcache and url in self.tempcache:
return self.tempcache[url]
type, url1 = splittype(url)
if filename is None and (not type or type == 'file'):
try:
fp = self.open_local_file(url1)
hdrs = fp.info()
fp.close()
return url2pathname(splithost(url1)[1]), hdrs
except OSError as msg:
pass
fp = self.open(url, data)
try:
headers = fp.info()
if filename:
tfp = open(filename, 'wb')
else:
import tempfile
garbage, path = splittype(url)
garbage, path = splithost(path or "")
path, garbage = splitquery(path or "")
path, garbage = splitattr(path or "")
suffix = os.path.splitext(path)[1]
(fd, filename) = tempfile.mkstemp(suffix)
self.__tempfiles.append(filename)
tfp = os.fdopen(fd, 'wb')
try:
result = filename, headers
if self.tempcache is not None:
self.tempcache[url] = result
bs = 1024*8
size = -1
read = 0
blocknum = 0
if "content-length" in headers:
size = int(headers["Content-Length"])
if reporthook:
reporthook(blocknum, bs, size)
while 1:
block = fp.read(bs)
if not block:
break
read += len(block)
tfp.write(block)
blocknum += 1
if reporthook:
reporthook(blocknum, bs, size)
finally:
tfp.close()
finally:
fp.close()
# raise exception if actual size does not match content-length header
if size >= 0 and read < size:
raise ContentTooShortError(
"retrieval incomplete: got only %i out of %i bytes"
% (read, size), result)
return result
# Each method named open_<type> knows how to open that type of URL
def _open_generic_http(self, connection_factory, url, data):
"""Make an HTTP connection using connection_class.
This is an internal method that should be called from
open_http() or open_https().
Arguments:
- connection_factory should take a host name and return an
HTTPConnection instance.
        - url is the URL to retrieve, or a (host, relative-path) pair.
- data is payload for a POST request or None.
"""
user_passwd = None
        proxy_passwd = None
if isinstance(url, str):
host, selector = splithost(url)
if host:
user_passwd, host = splituser(host)
host = unquote(host)
realhost = host
else:
host, selector = url
# check whether the proxy contains authorization information
proxy_passwd, host = splituser(host)
# now we proceed with the url we want to obtain
urltype, rest = splittype(selector)
url = rest
user_passwd = None
if urltype.lower() != 'http':
realhost = None
else:
realhost, rest = splithost(rest)
if realhost:
user_passwd, realhost = splituser(realhost)
if user_passwd:
selector = "%s://%s%s" % (urltype, realhost, rest)
if proxy_bypass(realhost):
host = realhost
if not host: raise OSError('http error', 'no host given')
if proxy_passwd:
proxy_passwd = unquote(proxy_passwd)
proxy_auth = base64.b64encode(proxy_passwd.encode()).decode('ascii')
else:
proxy_auth = None
if user_passwd:
user_passwd = unquote(user_passwd)
auth = base64.b64encode(user_passwd.encode()).decode('ascii')
else:
auth = None
http_conn = connection_factory(host)
headers = {}
if proxy_auth:
headers["Proxy-Authorization"] = "Basic %s" % proxy_auth
if auth:
headers["Authorization"] = "Basic %s" % auth
if realhost:
headers["Host"] = realhost
# Add Connection:close as we don't support persistent connections yet.
# This helps in closing the socket and avoiding ResourceWarning
headers["Connection"] = "close"
for header, value in self.addheaders:
headers[header] = value
if data is not None:
headers["Content-Type"] = "application/x-www-form-urlencoded"
http_conn.request("POST", selector, data, headers)
else:
http_conn.request("GET", selector, headers=headers)
try:
response = http_conn.getresponse()
except http.client.BadStatusLine:
# something went wrong with the HTTP status line
raise URLError("http protocol error: bad status line")
# According to RFC 2616, "2xx" code indicates that the client's
# request was successfully received, understood, and accepted.
if 200 <= response.status < 300:
return addinfourl(response, response.msg, "http:" + url,
response.status)
else:
return self.http_error(
url, response.fp,
response.status, response.reason, response.msg, data)
def open_http(self, url, data=None):
"""Use HTTP protocol."""
return self._open_generic_http(http.client.HTTPConnection, url, data)
def http_error(self, url, fp, errcode, errmsg, headers, data=None):
"""Handle http errors.
Derived class can override this, or provide specific handlers
named http_error_DDD where DDD is the 3-digit error code."""
# First check if there's a specific handler for this error
name = 'http_error_%d' % errcode
if hasattr(self, name):
method = getattr(self, name)
if data is None:
result = method(url, fp, errcode, errmsg, headers)
else:
result = method(url, fp, errcode, errmsg, headers, data)
if result: return result
return self.http_error_default(url, fp, errcode, errmsg, headers)
def http_error_default(self, url, fp, errcode, errmsg, headers):
"""Default error handler: close the connection and raise OSError."""
fp.close()
raise HTTPError(url, errcode, errmsg, headers, None)
if _have_ssl:
def _https_connection(self, host):
return http.client.HTTPSConnection(host,
key_file=self.key_file,
cert_file=self.cert_file)
def open_https(self, url, data=None):
"""Use HTTPS protocol."""
return self._open_generic_http(self._https_connection, url, data)
def open_file(self, url):
"""Use local file or FTP depending on form of URL."""
if not isinstance(url, str):
raise URLError('file error: proxy support for file protocol currently not implemented')
if url[:2] == '//' and url[2:3] != '/' and url[2:12].lower() != 'localhost/':
raise ValueError("file:// scheme is supported only on localhost")
else:
return self.open_local_file(url)
def open_local_file(self, url):
"""Use local file."""
import email.utils
import mimetypes
host, file = splithost(url)
localname = url2pathname(file)
try:
stats = os.stat(localname)
except OSError as e:
raise URLError(e.strerror, e.filename)
size = stats.st_size
modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
mtype = mimetypes.guess_type(url)[0]
headers = email.message_from_string(
'Content-Type: %s\nContent-Length: %d\nLast-modified: %s\n' %
(mtype or 'text/plain', size, modified))
if not host:
urlfile = file
if file[:1] == '/':
urlfile = 'file://' + file
return addinfourl(open(localname, 'rb'), headers, urlfile)
host, port = splitport(host)
if (not port
and socket.gethostbyname(host) in ((localhost(),) + thishost())):
urlfile = file
if file[:1] == '/':
urlfile = 'file://' + file
elif file[:2] == './':
raise ValueError("local file url may start with / or file:. Unknown url of type: %s" % url)
return addinfourl(open(localname, 'rb'), headers, urlfile)
raise URLError('local file error: not on local host')
def open_ftp(self, url):
"""Use FTP protocol."""
if not isinstance(url, str):
raise URLError('ftp error: proxy support for ftp protocol currently not implemented')
import mimetypes
host, path = splithost(url)
if not host: raise URLError('ftp error: no host given')
host, port = splitport(host)
user, host = splituser(host)
if user: user, passwd = splitpasswd(user)
else: passwd = None
host = unquote(host)
user = unquote(user or '')
passwd = unquote(passwd or '')
host = socket.gethostbyname(host)
if not port:
import ftplib
port = ftplib.FTP_PORT
else:
port = int(port)
path, attrs = splitattr(path)
path = unquote(path)
dirs = path.split('/')
dirs, file = dirs[:-1], dirs[-1]
if dirs and not dirs[0]: dirs = dirs[1:]
if dirs and not dirs[0]: dirs[0] = '/'
key = user, host, port, '/'.join(dirs)
# XXX thread unsafe!
if len(self.ftpcache) > MAXFTPCACHE:
# Prune the cache, rather arbitrarily
for k in list(self.ftpcache):
if k != key:
v = self.ftpcache[k]
del self.ftpcache[k]
v.close()
try:
if key not in self.ftpcache:
self.ftpcache[key] = \
ftpwrapper(user, passwd, host, port, dirs)
            type = 'D' if not file else 'I'
for attr in attrs:
attr, value = splitvalue(attr)
if attr.lower() == 'type' and \
value in ('a', 'A', 'i', 'I', 'd', 'D'):
type = value.upper()
(fp, retrlen) = self.ftpcache[key].retrfile(file, type)
mtype = mimetypes.guess_type("ftp:" + url)[0]
headers = ""
if mtype:
headers += "Content-Type: %s\n" % mtype
if retrlen is not None and retrlen >= 0:
headers += "Content-Length: %d\n" % retrlen
headers = email.message_from_string(headers)
return addinfourl(fp, headers, "ftp:" + url)
except ftperrors() as exp:
raise URLError('ftp error %r' % exp).with_traceback(sys.exc_info()[2])
def open_data(self, url, data=None):
"""Use "data" URL."""
if not isinstance(url, str):
raise URLError('data error: proxy support for data protocol currently not implemented')
# ignore POSTed data
#
# syntax of data URLs:
# dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
# mediatype := [ type "/" subtype ] *( ";" parameter )
# data := *urlchar
# parameter := attribute "=" value
try:
[type, data] = url.split(',', 1)
except ValueError:
raise OSError('data error', 'bad data URL')
if not type:
type = 'text/plain;charset=US-ASCII'
semi = type.rfind(';')
if semi >= 0 and '=' not in type[semi:]:
encoding = type[semi+1:]
type = type[:semi]
else:
encoding = ''
msg = []
msg.append('Date: %s'%time.strftime('%a, %d %b %Y %H:%M:%S GMT',
time.gmtime(time.time())))
msg.append('Content-type: %s' % type)
if encoding == 'base64':
# XXX is this encoding/decoding ok?
data = base64.decodebytes(data.encode('ascii')).decode('latin-1')
else:
data = unquote(data)
msg.append('Content-Length: %d' % len(data))
msg.append('')
msg.append(data)
msg = '\n'.join(msg)
headers = email.message_from_string(msg)
f = io.StringIO(msg)
#f.fileno = None # needed for addinfourl
return addinfourl(f, headers, url)
class FancyURLopener(URLopener):
"""Derived class with handlers for errors we can handle (perhaps)."""
def __init__(self, *args, **kwargs):
URLopener.__init__(self, *args, **kwargs)
self.auth_cache = {}
self.tries = 0
self.maxtries = 10
def http_error_default(self, url, fp, errcode, errmsg, headers):
"""Default error handling -- don't raise an exception."""
return addinfourl(fp, headers, "http:" + url, errcode)
def http_error_302(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 302 -- relocated (temporarily)."""
self.tries += 1
if self.maxtries and self.tries >= self.maxtries:
if hasattr(self, "http_error_500"):
meth = self.http_error_500
else:
meth = self.http_error_default
self.tries = 0
return meth(url, fp, 500,
"Internal Server Error: Redirect Recursion", headers)
result = self.redirect_internal(url, fp, errcode, errmsg, headers,
data)
self.tries = 0
return result
def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
if 'location' in headers:
newurl = headers['location']
elif 'uri' in headers:
newurl = headers['uri']
else:
return
fp.close()
# In case the server sent a relative URL, join with original:
newurl = urljoin(self.type + ":" + url, newurl)
urlparts = urlparse(newurl)
# For security reasons, we don't allow redirection to anything other
# than http, https and ftp.
# We are using newer HTTPError with older redirect_internal method
# This older method will get deprecated in 3.3
if urlparts.scheme not in ('http', 'https', 'ftp', ''):
raise HTTPError(newurl, errcode,
errmsg +
" Redirection to url '%s' is not allowed." % newurl,
headers, fp)
return self.open(newurl)
def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 301 -- also relocated (permanently)."""
return self.http_error_302(url, fp, errcode, errmsg, headers, data)
def http_error_303(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 303 -- also relocated (essentially identical to 302)."""
return self.http_error_302(url, fp, errcode, errmsg, headers, data)
def http_error_307(self, url, fp, errcode, errmsg, headers, data=None):
"""Error 307 -- relocated, but turn POST into error."""
if data is None:
return self.http_error_302(url, fp, errcode, errmsg, headers, data)
else:
return self.http_error_default(url, fp, errcode, errmsg, headers)
def http_error_401(self, url, fp, errcode, errmsg, headers, data=None,
retry=False):
"""Error 401 -- authentication required.
This function supports Basic authentication only."""
if 'www-authenticate' not in headers:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
stuff = headers['www-authenticate']
match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
if not match:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
scheme, realm = match.groups()
if scheme.lower() != 'basic':
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
if not retry:
URLopener.http_error_default(self, url, fp, errcode, errmsg,
headers)
name = 'retry_' + self.type + '_basic_auth'
if data is None:
return getattr(self,name)(url, realm)
else:
return getattr(self,name)(url, realm, data)
def http_error_407(self, url, fp, errcode, errmsg, headers, data=None,
retry=False):
"""Error 407 -- proxy authentication required.
This function supports Basic authentication only."""
if 'proxy-authenticate' not in headers:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
stuff = headers['proxy-authenticate']
match = re.match('[ \t]*([^ \t]+)[ \t]+realm="([^"]*)"', stuff)
if not match:
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
scheme, realm = match.groups()
if scheme.lower() != 'basic':
URLopener.http_error_default(self, url, fp,
errcode, errmsg, headers)
if not retry:
URLopener.http_error_default(self, url, fp, errcode, errmsg,
headers)
name = 'retry_proxy_' + self.type + '_basic_auth'
if data is None:
return getattr(self,name)(url, realm)
else:
return getattr(self,name)(url, realm, data)
def retry_proxy_http_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
newurl = 'http://' + host + selector
proxy = self.proxies['http']
urltype, proxyhost = splittype(proxy)
proxyhost, proxyselector = splithost(proxyhost)
i = proxyhost.find('@') + 1
proxyhost = proxyhost[i:]
user, passwd = self.get_user_passwd(proxyhost, realm, i)
if not (user or passwd): return None
proxyhost = "%s:%s@%s" % (quote(user, safe=''),
quote(passwd, safe=''), proxyhost)
self.proxies['http'] = 'http://' + proxyhost + proxyselector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def retry_proxy_https_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
newurl = 'https://' + host + selector
proxy = self.proxies['https']
urltype, proxyhost = splittype(proxy)
proxyhost, proxyselector = splithost(proxyhost)
i = proxyhost.find('@') + 1
proxyhost = proxyhost[i:]
user, passwd = self.get_user_passwd(proxyhost, realm, i)
if not (user or passwd): return None
proxyhost = "%s:%s@%s" % (quote(user, safe=''),
quote(passwd, safe=''), proxyhost)
self.proxies['https'] = 'https://' + proxyhost + proxyselector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def retry_http_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
i = host.find('@') + 1
host = host[i:]
user, passwd = self.get_user_passwd(host, realm, i)
if not (user or passwd): return None
host = "%s:%s@%s" % (quote(user, safe=''),
quote(passwd, safe=''), host)
newurl = 'http://' + host + selector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def retry_https_basic_auth(self, url, realm, data=None):
host, selector = splithost(url)
i = host.find('@') + 1
host = host[i:]
user, passwd = self.get_user_passwd(host, realm, i)
if not (user or passwd): return None
host = "%s:%s@%s" % (quote(user, safe=''),
quote(passwd, safe=''), host)
newurl = 'https://' + host + selector
if data is None:
return self.open(newurl)
else:
return self.open(newurl, data)
def get_user_passwd(self, host, realm, clear_cache=0):
key = realm + '@' + host.lower()
if key in self.auth_cache:
if clear_cache:
del self.auth_cache[key]
else:
return self.auth_cache[key]
user, passwd = self.prompt_user_passwd(host, realm)
if user or passwd: self.auth_cache[key] = (user, passwd)
return user, passwd
def prompt_user_passwd(self, host, realm):
"""Override this in a GUI environment!"""
import getpass
try:
user = input("Enter username for %s at %s: " % (realm, host))
passwd = getpass.getpass("Enter password for %s in %s at %s: " %
(user, realm, host))
return user, passwd
except KeyboardInterrupt:
print()
return None, None
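# Sketch: a non-interactive subclass supplying fixed (placeholder) credentials
# instead of prompting; note the whole URLopener family is deprecated in
# favour of build_opener()/urlopen().
class _DemoNonInteractiveOpener(FancyURLopener):
    def prompt_user_passwd(self, host, realm):
        return "user", "secret"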
# Utility functions
_localhost = None
def localhost():
"""Return the IP address of the magic hostname 'localhost'."""
global _localhost
if _localhost is None:
_localhost = socket.gethostbyname('localhost')
return _localhost
_thishost = None
def thishost():
"""Return the IP addresses of the current host."""
global _thishost
if _thishost is None:
try:
_thishost = tuple(socket.gethostbyname_ex(socket.gethostname())[2])
except socket.gaierror:
_thishost = tuple(socket.gethostbyname_ex('localhost')[2])
return _thishost
_ftperrors = None
def ftperrors():
"""Return the set of errors raised by the FTP class."""
global _ftperrors
if _ftperrors is None:
import ftplib
_ftperrors = ftplib.all_errors
return _ftperrors
_noheaders = None
def noheaders():
"""Return an empty email Message object."""
global _noheaders
if _noheaders is None:
_noheaders = email.message_from_string("")
return _noheaders
# Utility classes
class ftpwrapper:
"""Class used by open_ftp() for cache of open FTP connections."""
def __init__(self, user, passwd, host, port, dirs, timeout=None,
persistent=True):
self.user = user
self.passwd = passwd
self.host = host
self.port = port
self.dirs = dirs
self.timeout = timeout
self.refcount = 0
self.keepalive = persistent
try:
self.init()
except:
self.close()
raise
def init(self):
import ftplib
self.busy = 0
self.ftp = ftplib.FTP()
self.ftp.connect(self.host, self.port, self.timeout)
self.ftp.login(self.user, self.passwd)
_target = '/'.join(self.dirs)
self.ftp.cwd(_target)
def retrfile(self, file, type):
import ftplib
self.endtransfer()
if type in ('d', 'D'): cmd = 'TYPE A'; isdir = 1
else: cmd = 'TYPE ' + type; isdir = 0
try:
self.ftp.voidcmd(cmd)
except ftplib.all_errors:
self.init()
self.ftp.voidcmd(cmd)
conn = None
if file and not isdir:
# Try to retrieve as a file
try:
cmd = 'RETR ' + file
conn, retrlen = self.ftp.ntransfercmd(cmd)
except ftplib.error_perm as reason:
if str(reason)[:3] != '550':
raise URLError('ftp error: %r' % reason).with_traceback(
sys.exc_info()[2])
if not conn:
# Set transfer mode to ASCII!
self.ftp.voidcmd('TYPE A')
# Try a directory listing. Verify that directory exists.
if file:
pwd = self.ftp.pwd()
try:
try:
self.ftp.cwd(file)
except ftplib.error_perm as reason:
raise URLError('ftp error: %r' % reason) from reason
finally:
self.ftp.cwd(pwd)
cmd = 'LIST ' + file
else:
cmd = 'LIST'
conn, retrlen = self.ftp.ntransfercmd(cmd)
self.busy = 1
ftpobj = addclosehook(conn.makefile('rb'), self.file_close)
self.refcount += 1
conn.close()
# Pass back both a suitably decorated object and a retrieval length
return (ftpobj, retrlen)
def endtransfer(self):
self.busy = 0
def close(self):
self.keepalive = False
if self.refcount <= 0:
self.real_close()
def file_close(self):
self.endtransfer()
self.refcount -= 1
if self.refcount <= 0 and not self.keepalive:
self.real_close()
def real_close(self):
self.endtransfer()
try:
self.ftp.close()
except ftperrors():
pass
# Proxy handling
def getproxies_environment():
"""Return a dictionary of scheme -> proxy server URL mappings.
Scan the environment for variables named <scheme>_proxy;
this seems to be the standard convention. If you need a
different way, you can pass a proxies dictionary to the
[Fancy]URLopener constructor.
"""
proxies = {}
for name, value in os.environ.items():
name = name.lower()
if value and name[-6:] == '_proxy':
proxies[name[:-6]] = value
# CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
# (non-all-lowercase) as it may be set from the web server by a "Proxy:"
# header from the client
if 'REQUEST_METHOD' in os.environ:
proxies.pop('http', None)
return proxies
def proxy_bypass_environment(host):
"""Test if proxies should not be used for a particular host.
Checks the environment for a variable named no_proxy, which should
be a list of DNS suffixes separated by commas, or '*' for all hosts.
"""
no_proxy = os.environ.get('no_proxy', '') or os.environ.get('NO_PROXY', '')
# '*' is special case for always bypass
if no_proxy == '*':
return 1
# strip port off host
hostonly, port = splitport(host)
# check if the host ends with any of the DNS suffixes
no_proxy_list = [proxy.strip() for proxy in no_proxy.split(',')]
for name in no_proxy_list:
if name and (hostonly.endswith(name) or host.endswith(name)):
return 1
# otherwise, don't bypass
return 0
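# Sketch: how the two environment helpers behave together; the variables are
# set here only for illustration.
def _demo_env_proxy_helpers():
    import os
    os.environ["http_proxy"] = "http://proxy.example.com:3128"
    os.environ["no_proxy"] = "localhost,.internal.example.com"
    assert getproxies_environment().get("http") == "http://proxy.example.com:3128"
    assert proxy_bypass_environment("host.internal.example.com")
    assert not proxy_bypass_environment("www.python.org")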
# This code tests an OSX specific data structure but is testable on all
# platforms
def _proxy_bypass_macosx_sysconf(host, proxy_settings):
"""
Return True iff this host shouldn't be accessed using a proxy
This function uses the MacOSX framework SystemConfiguration
to fetch the proxy information.
    proxy_settings comes from _scproxy._get_proxy_settings, or can be mocked, e.g.:
{ 'exclude_simple': bool,
'exceptions': ['foo.bar', '*.bar.com', '127.0.0.1', '10.1', '10.0/16']
}
"""
from fnmatch import fnmatch
hostonly, port = splitport(host)
def ip2num(ipAddr):
parts = ipAddr.split('.')
parts = list(map(int, parts))
if len(parts) != 4:
parts = (parts + [0, 0, 0, 0])[:4]
return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]
# Check for simple host names:
if '.' not in host:
if proxy_settings['exclude_simple']:
return True
hostIP = None
for value in proxy_settings.get('exceptions', ()):
# Items in the list are strings like these: *.local, 169.254/16
if not value: continue
m = re.match(r"(\d+(?:\.\d+)*)(/\d+)?", value)
if m is not None:
if hostIP is None:
try:
hostIP = socket.gethostbyname(hostonly)
hostIP = ip2num(hostIP)
except OSError:
continue
base = ip2num(m.group(1))
mask = m.group(2)
if mask is None:
mask = 8 * (m.group(1).count('.') + 1)
else:
mask = int(mask[1:])
mask = 32 - mask
if (hostIP >> mask) == (base >> mask):
return True
elif fnmatch(host, value):
return True
return False
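# Sketch: exercising the testable helper above with a mocked settings dict,
# as its docstring suggests; no macOS APIs are involved.
def _demo_macosx_bypass():
    settings = {"exclude_simple": True,
                "exceptions": ["*.local", "169.254/16"]}
    return (_proxy_bypass_macosx_sysconf("printer.local", settings),
            _proxy_bypass_macosx_sysconf("169.254.1.2", settings))
    # -> (True, True); an unrelated host such as "10.0.0.5" gives False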
if sys.platform == 'darwin' and sys.implementation.name != "ironpython": # https://github.com/IronLanguages/ironpython3/issues/1050
from _scproxy import _get_proxy_settings, _get_proxies
def proxy_bypass_macosx_sysconf(host):
proxy_settings = _get_proxy_settings()
return _proxy_bypass_macosx_sysconf(host, proxy_settings)
def getproxies_macosx_sysconf():
"""Return a dictionary of scheme -> proxy server URL mappings.
This function uses the MacOSX framework SystemConfiguration
to fetch the proxy information.
"""
return _get_proxies()
def proxy_bypass(host):
if getproxies_environment():
return proxy_bypass_environment(host)
else:
return proxy_bypass_macosx_sysconf(host)
def getproxies():
return getproxies_environment() or getproxies_macosx_sysconf()
elif os.name == 'nt':
def getproxies_registry():
"""Return a dictionary of scheme -> proxy server URL mappings.
Win32 uses the registry to store proxies.
"""
proxies = {}
try:
import winreg
except ImportError:
# Std module, so should be around - but you never know!
return proxies
try:
internetSettings = winreg.OpenKey(winreg.HKEY_CURRENT_USER,
r'Software\Microsoft\Windows\CurrentVersion\Internet Settings')
proxyEnable = winreg.QueryValueEx(internetSettings,
'ProxyEnable')[0]
if proxyEnable:
                # Returned as Unicode, but causes problems if not converted to ASCII
proxyServer = str(winreg.QueryValueEx(internetSettings,
'ProxyServer')[0])
if '=' in proxyServer:
# Per-protocol settings
for p in proxyServer.split(';'):
protocol, address = p.split('=', 1)
# See if address has a type:// prefix
if not re.match('^([^/:]+)://', address):
address = '%s://%s' % (protocol, address)
proxies[protocol] = address
else:
# Use one setting for all protocols
if proxyServer[:5] == 'http:':
proxies['http'] = proxyServer
else:
proxies['http'] = 'http://%s' % proxyServer
proxies['https'] = 'https://%s' % proxyServer
proxies['ftp'] = 'ftp://%s' % proxyServer
internetSettings.Close()
except (OSError, ValueError, TypeError):
# Either registry key not found etc, or the value in an
# unexpected format.
# proxies already set up to be empty so nothing to do
pass
return proxies
def getproxies():
"""Return a dictionary of scheme -> proxy server URL mappings.
Returns settings gathered from the environment, if specified,
or the registry.
"""
return getproxies_environment() or getproxies_registry()
def proxy_bypass_registry(host):
try:
import winreg
except ImportError:
# Std modules, so should be around - but you never know!
return 0
try:
internetSettings = winreg.OpenKey(winreg.HKEY_CURRENT_USER,
r'Software\Microsoft\Windows\CurrentVersion\Internet Settings')
proxyEnable = winreg.QueryValueEx(internetSettings,
'ProxyEnable')[0]
proxyOverride = str(winreg.QueryValueEx(internetSettings,
'ProxyOverride')[0])
            # ^^^^ Returned as Unicode, but causes problems if not converted to ASCII
except OSError:
return 0
if not proxyEnable or not proxyOverride:
return 0
# try to make a host list from name and IP address.
rawHost, port = splitport(host)
host = [rawHost]
try:
addr = socket.gethostbyname(rawHost)
if addr != rawHost:
host.append(addr)
except OSError:
pass
try:
fqdn = socket.getfqdn(rawHost)
if fqdn != rawHost:
host.append(fqdn)
except OSError:
pass
# make a check value list from the registry entry: replace the
# '<local>' string by the localhost entry and the corresponding
# canonical entry.
proxyOverride = proxyOverride.split(';')
# now check if we match one of the registry values.
for test in proxyOverride:
if test == '<local>':
if '.' not in rawHost:
return 1
test = test.replace(".", r"\.") # mask dots
test = test.replace("*", r".*") # change glob sequence
test = test.replace("?", r".") # change glob char
for val in host:
if re.match(test, val, re.I):
return 1
return 0
def proxy_bypass(host):
"""Return a dictionary of scheme -> proxy server URL mappings.
Returns settings gathered from the environment, if specified,
or the registry.
"""
if getproxies_environment():
return proxy_bypass_environment(host)
else:
return proxy_bypass_registry(host)
else:
# By default use environment variables
getproxies = getproxies_environment
proxy_bypass = proxy_bypass_environment
"""Response classes used by urllib.
The base class, addbase, defines a minimal file-like interface,
including read() and readline(). The typical response object is an
addinfourl instance, which defines an info() method that returns
headers and a geturl() method that returns the url.
"""
import tempfile
__all__ = ['addbase', 'addclosehook', 'addinfo', 'addinfourl']
class addbase(tempfile._TemporaryFileWrapper):
"""Base class for addinfo and addclosehook. Is a good idea for garbage collection."""
# XXX Add a method to expose the timeout on the underlying socket?
def __init__(self, fp):
super(addbase, self).__init__(fp, '<urllib response>', delete=False)
# Keep reference around as this was part of the original API.
self.fp = fp
def __repr__(self):
return '<%s at %r whose fp = %r>' % (self.__class__.__name__,
id(self), self.file)
def __enter__(self):
if self.fp.closed:
raise ValueError("I/O operation on closed file")
return self
def __exit__(self, type, value, traceback):
self.close()
class addclosehook(addbase):
"""Class to add a close hook to an open file."""
def __init__(self, fp, closehook, *hookargs):
super(addclosehook, self).__init__(fp)
self.closehook = closehook
self.hookargs = hookargs
def close(self):
try:
closehook = self.closehook
hookargs = self.hookargs
if closehook:
self.closehook = None
self.hookargs = None
closehook(*hookargs)
finally:
super(addclosehook, self).close()
class addinfo(addbase):
"""class to add an info() method to an open file."""
def __init__(self, fp, headers):
super(addinfo, self).__init__(fp)
self.headers = headers
def info(self):
return self.headers
class addinfourl(addinfo):
"""class to add info() and geturl() methods to an open file."""
def __init__(self, fp, headers, url, code=None):
super(addinfourl, self).__init__(fp, headers)
self.url = url
self.code = code
def getcode(self):
return self.code
def geturl(self):
return self.url
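# Sketch: addinfourl wrapping an in-memory stream, much as the handlers in
# urllib.request wrap real responses.
def _demo_addinfourl():
    import email
    import io
    headers = email.message_from_string("Content-Type: text/plain\n")
    resp = addinfourl(io.BytesIO(b"hello"), headers, "data:,hello", code=200)
    return resp.read(), resp.info()["Content-Type"], resp.geturl()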
""" robotparser.py
Copyright (C) 2000 Bastian Kleineidam
You can choose between two licenses when using this package:
1) GNU GPLv2
2) PSF license for Python 2.2
The robots.txt Exclusion Protocol is implemented as specified in
http://www.robotstxt.org/norobots-rfc.txt
"""
import urllib.parse, urllib.request
__all__ = ["RobotFileParser"]
class RobotFileParser:
""" This class provides a set of methods to read, parse and answer
questions about a single robots.txt file.
"""
def __init__(self, url=''):
self.entries = []
self.default_entry = None
self.disallow_all = False
self.allow_all = False
self.set_url(url)
self.last_checked = 0
def mtime(self):
"""Returns the time the robots.txt file was last fetched.
This is useful for long-running web spiders that need to
check for new robots.txt files periodically.
"""
return self.last_checked
def modified(self):
"""Sets the time the robots.txt file was last fetched to the
current time.
"""
import time
self.last_checked = time.time()
def set_url(self, url):
"""Sets the URL referring to a robots.txt file."""
self.url = url
self.host, self.path = urllib.parse.urlparse(url)[1:3]
def read(self):
"""Reads the robots.txt URL and feeds it to the parser."""
try:
f = urllib.request.urlopen(self.url)
except urllib.error.HTTPError as err:
if err.code in (401, 403):
self.disallow_all = True
elif err.code >= 400 and err.code < 500:
self.allow_all = True
else:
raw = f.read()
self.parse(raw.decode("utf-8").splitlines())
def _add_entry(self, entry):
if "*" in entry.useragents:
# the default entry is considered last
if self.default_entry is None:
# the first default entry wins
self.default_entry = entry
else:
self.entries.append(entry)
def parse(self, lines):
"""Parse the input lines from a robots.txt file.
        We allow a user-agent: line that is not preceded by
        one or more blank lines.
"""
# states:
# 0: start state
# 1: saw user-agent line
# 2: saw an allow or disallow line
state = 0
entry = Entry()
self.modified()
for line in lines:
if not line:
if state == 1:
entry = Entry()
state = 0
elif state == 2:
self._add_entry(entry)
entry = Entry()
state = 0
# remove optional comment and strip line
i = line.find('#')
if i >= 0:
line = line[:i]
line = line.strip()
if not line:
continue
line = line.split(':', 1)
if len(line) == 2:
line[0] = line[0].strip().lower()
line[1] = urllib.parse.unquote(line[1].strip())
if line[0] == "user-agent":
if state == 2:
self._add_entry(entry)
entry = Entry()
entry.useragents.append(line[1])
state = 1
elif line[0] == "disallow":
if state != 0:
entry.rulelines.append(RuleLine(line[1], False))
state = 2
elif line[0] == "allow":
if state != 0:
entry.rulelines.append(RuleLine(line[1], True))
state = 2
if state == 2:
self._add_entry(entry)
def can_fetch(self, useragent, url):
"""using the parsed robots.txt decide if useragent can fetch url"""
if self.disallow_all:
return False
if self.allow_all:
return True
# Until the robots.txt file has been read or found not
# to exist, we must assume that no url is allowable.
        # This prevents false positives when a user erroneously
# calls can_fetch() before calling read().
if not self.last_checked:
return False
# search for given user agent matches
# the first match counts
parsed_url = urllib.parse.urlparse(urllib.parse.unquote(url))
url = urllib.parse.urlunparse(('','',parsed_url.path,
parsed_url.params,parsed_url.query, parsed_url.fragment))
url = urllib.parse.quote(url)
if not url:
url = "/"
for entry in self.entries:
if entry.applies_to(useragent):
return entry.allowance(url)
# try the default entry last
if self.default_entry:
return self.default_entry.allowance(url)
# agent not found ==> access granted
return True
def __str__(self):
return ''.join([str(entry) + "\n" for entry in self.entries])
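# Sketch: feeding robots.txt lines straight to the parser (read() would fetch
# self.url over the network instead); the rules and URL are placeholders.
def _demo_robotparser():
    rp = RobotFileParser()
    rp.parse(["User-agent: *", "Disallow: /private/"])
    return rp.can_fetch("MyBot/1.0", "http://example.com/private/page")  # False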
class RuleLine:
"""A rule line is a single "Allow:" (allowance==True) or "Disallow:"
(allowance==False) followed by a path."""
def __init__(self, path, allowance):
if path == '' and not allowance:
# an empty value means allow all
allowance = True
path = urllib.parse.urlunparse(urllib.parse.urlparse(path))
self.path = urllib.parse.quote(path)
self.allowance = allowance
def applies_to(self, filename):
return self.path == "*" or filename.startswith(self.path)
def __str__(self):
        return ("Allow" if self.allowance else "Disallow") + ": " + self.path
class Entry:
"""An entry has one or more user-agents and zero or more rulelines"""
def __init__(self):
self.useragents = []
self.rulelines = []
def __str__(self):
ret = []
for agent in self.useragents:
ret.extend(["User-agent: ", agent, "\n"])
for line in self.rulelines:
ret.extend([str(line), "\n"])
return ''.join(ret)
def applies_to(self, useragent):
"""check if this entry applies to the specified agent"""
# split the name token and make it lower case
useragent = useragent.split("/")[0].lower()
for agent in self.useragents:
if agent == '*':
# we have the catch-all agent
return True
agent = agent.lower()
if agent in useragent:
return True
return False
def allowance(self, filename):
"""Preconditions:
- our agent applies to this entry
- filename is URL decoded"""
for line in self.rulelines:
if line.applies_to(filename):
return line.allowance
return True
"""
Virtual environment (venv) package for Python. Based on PEP 405.
Copyright (C) 2011-2014 Vinay Sajip.
Licensed to the PSF under a contributor agreement.
usage: python -m venv [-h] [--system-site-packages] [--symlinks] [--clear]
[--upgrade]
ENV_DIR [ENV_DIR ...]
Creates virtual Python environments in one or more target directories.
positional arguments:
ENV_DIR A directory to create the environment in.
optional arguments:
-h, --help show this help message and exit
--system-site-packages
Give the virtual environment access to the system
site-packages dir.
--symlinks Attempt to symlink rather than copy.
--clear Delete the environment directory if it already exists.
If not specified and the directory exists, an error is
raised.
--upgrade Upgrade the environment directory to use this version
of Python, assuming Python has been upgraded in-place.
--without-pip Skips installing or upgrading pip in the virtual
environment (pip is bootstrapped by default)
"""
import logging
import os
import shutil
import subprocess
import sys
import types
logger = logging.getLogger(__name__)
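# Sketch: driving the EnvBuilder class below programmatically, roughly what
# `python -m venv /tmp/demo-venv` does; the target directory is a placeholder.
def _demo_envbuilder():
    builder = EnvBuilder(system_site_packages=False, clear=True,
                         symlinks=False, with_pip=False)
    builder.create("/tmp/demo-venv")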
class EnvBuilder:
"""
This class exists to allow virtual environment creation to be
customized. The constructor parameters determine the builder's
behaviour when called upon to create a virtual environment.
By default, the builder makes the system (global) site-packages dir
*un*available to the created environment.
If invoked using the Python -m option, the default is to use copying
on Windows platforms but symlinks elsewhere. If instantiated some
other way, the default is to *not* use symlinks.
:param system_site_packages: If True, the system (global) site-packages
dir is available to created environments.
:param clear: If True and the target directory exists, it is deleted.
Otherwise, if the target directory exists, an error is
raised.
:param symlinks: If True, attempt to symlink rather than copy files into
virtual environment.
:param upgrade: If True, upgrade an existing virtual environment.
:param with_pip: If True, ensure pip is installed in the virtual
environment
"""
def __init__(self, system_site_packages=False, clear=False,
symlinks=False, upgrade=False, with_pip=False):
self.system_site_packages = system_site_packages
self.clear = clear
self.symlinks = symlinks
self.upgrade = upgrade
self.with_pip = with_pip
def create(self, env_dir):
"""
Create a virtual environment in a directory.
:param env_dir: The target directory to create an environment in.
"""
env_dir = os.path.abspath(env_dir)
context = self.ensure_directories(env_dir)
self.create_configuration(context)
self.setup_python(context)
if self.with_pip:
self._setup_pip(context)
if not self.upgrade:
self.setup_scripts(context)
self.post_setup(context)
def clear_directory(self, path):
for fn in os.listdir(path):
fn = os.path.join(path, fn)
if os.path.islink(fn) or os.path.isfile(fn):
os.remove(fn)
elif os.path.isdir(fn):
shutil.rmtree(fn)
def ensure_directories(self, env_dir):
"""
Create the directories for the environment.
Returns a context object which holds paths in the environment,
for use by subsequent logic.
"""
def create_if_needed(d):
if not os.path.exists(d):
os.makedirs(d)
elif os.path.islink(d) or os.path.isfile(d):
raise ValueError('Unable to create directory %r' % d)
if os.path.exists(env_dir) and self.clear:
self.clear_directory(env_dir)
context = types.SimpleNamespace()
context.env_dir = env_dir
context.env_name = os.path.split(env_dir)[1]
context.prompt = '(%s) ' % context.env_name
create_if_needed(env_dir)
env = os.environ
if sys.platform == 'darwin' and '__PYVENV_LAUNCHER__' in env:
executable = os.environ['__PYVENV_LAUNCHER__']
else:
executable = sys.executable
dirname, exename = os.path.split(os.path.abspath(executable))
context.executable = executable
context.python_dir = dirname
context.python_exe = exename
if sys.platform == 'win32':
binname = 'Scripts'
incpath = 'Include'
libpath = os.path.join(env_dir, 'Lib', 'site-packages')
else:
binname = 'bin'
incpath = 'include'
libpath = os.path.join(env_dir, 'lib',
'python%d.%d' % sys.version_info[:2],
'site-packages')
context.inc_path = path = os.path.join(env_dir, incpath)
create_if_needed(path)
create_if_needed(libpath)
# Issue 21197: create lib64 as a symlink to lib on 64-bit non-OS X POSIX
if ((sys.maxsize > 2**32) and (os.name == 'posix') and
(sys.platform != 'darwin')):
link_path = os.path.join(env_dir, 'lib64')
if not os.path.exists(link_path): # Issue #21643
os.symlink('lib', link_path)
context.bin_path = binpath = os.path.join(env_dir, binname)
context.bin_name = binname
context.env_exe = os.path.join(binpath, exename)
create_if_needed(binpath)
return context
def create_configuration(self, context):
"""
Create a configuration file indicating where the environment's Python
was copied from, and whether the system site-packages should be made
available in the environment.
:param context: The information for the environment creation request
being processed.
"""
context.cfg_path = path = os.path.join(context.env_dir, 'pyvenv.cfg')
with open(path, 'w', encoding='utf-8') as f:
f.write('home = %s\n' % context.python_dir)
if self.system_site_packages:
incl = 'true'
else:
incl = 'false'
f.write('include-system-site-packages = %s\n' % incl)
f.write('version = %d.%d.%d\n' % sys.version_info[:3])
if os.name == 'nt':
def include_binary(self, f):
if f.endswith(('.pyd', '.dll')):
result = True
else:
result = f.startswith('python') and f.endswith('.exe')
return result
def symlink_or_copy(self, src, dst, relative_symlinks_ok=False):
"""
Try symlinking a file, and if that fails, fall back to copying.
"""
force_copy = not self.symlinks
if not force_copy:
try:
if not os.path.islink(dst): # can't link to itself!
if relative_symlinks_ok:
assert os.path.dirname(src) == os.path.dirname(dst)
os.symlink(os.path.basename(src), dst)
else:
os.symlink(src, dst)
except Exception: # may need to use a more specific exception
logger.warning('Unable to symlink %r to %r', src, dst)
force_copy = True
if force_copy:
shutil.copyfile(src, dst)
def setup_python(self, context):
"""
Set up a Python executable in the environment.
:param context: The information for the environment creation request
being processed.
"""
binpath = context.bin_path
path = context.env_exe
copier = self.symlink_or_copy
copier(context.executable, path)
dirname = context.python_dir
if os.name != 'nt':
if not os.path.islink(path):
os.chmod(path, 0o755)
for suffix in ('python', 'python3'):
path = os.path.join(binpath, suffix)
if not os.path.exists(path):
# Issue 18807: make copies if
# symlinks are not wanted
copier(context.env_exe, path, relative_symlinks_ok=True)
if not os.path.islink(path):
os.chmod(path, 0o755)
else:
subdir = 'DLLs'
include = self.include_binary
files = [f for f in os.listdir(dirname) if include(f)]
for f in files:
src = os.path.join(dirname, f)
dst = os.path.join(binpath, f)
if dst != context.env_exe: # already done, above
copier(src, dst)
dirname = os.path.join(dirname, subdir)
if os.path.isdir(dirname):
files = [f for f in os.listdir(dirname) if include(f)]
for f in files:
src = os.path.join(dirname, f)
dst = os.path.join(binpath, f)
copier(src, dst)
# copy init.tcl over
for root, dirs, files in os.walk(context.python_dir):
if 'init.tcl' in files:
tcldir = os.path.basename(root)
tcldir = os.path.join(context.env_dir, 'Lib', tcldir)
if not os.path.exists(tcldir):
os.makedirs(tcldir)
src = os.path.join(root, 'init.tcl')
dst = os.path.join(tcldir, 'init.tcl')
shutil.copyfile(src, dst)
break
def _setup_pip(self, context):
"""Installs or upgrades pip in a virtual environment"""
# We run ensurepip in isolated mode to avoid side effects from
# environment vars, the current directory and anything else
# intended for the global Python environment
cmd = [context.env_exe, '-Im', 'ensurepip', '--upgrade',
'--default-pip']
subprocess.check_output(cmd, stderr=subprocess.STDOUT)
def setup_scripts(self, context):
"""
Set up scripts into the created environment from a directory.
This method installs the default scripts into the environment
being created. You can prevent the default installation by overriding
this method if you really need to, or if you need to specify
a different location for the scripts to install. By default, the
'scripts' directory in the venv package is used as the source of
scripts to install.
"""
path = os.path.abspath(os.path.dirname(__file__))
path = os.path.join(path, 'scripts')
self.install_scripts(context, path)
def post_setup(self, context):
"""
Hook for post-setup modification of the venv. Subclasses may install
additional packages or scripts here, add activation shell scripts, etc.
:param context: The information for the environment creation request
being processed.
"""
pass
def replace_variables(self, text, context):
"""
Replace variable placeholders in script text with context-specific
variables.
Return the text passed in, but with variables replaced.
:param text: The text in which to replace placeholder variables.
:param context: The information for the environment creation request
being processed.
"""
text = text.replace('__VENV_DIR__', context.env_dir)
text = text.replace('__VENV_NAME__', context.env_name)
text = text.replace('__VENV_PROMPT__', context.prompt)
text = text.replace('__VENV_BIN_NAME__', context.bin_name)
text = text.replace('__VENV_PYTHON__', context.env_exe)
return text
def install_scripts(self, context, path):
"""
Install scripts into the created environment from a directory.
:param context: The information for the environment creation request
being processed.
:param path: Absolute pathname of a directory containing script.
Scripts in the 'common' subdirectory of this directory,
and those in the directory named for the platform
being run on, are installed in the created environment.
Placeholder variables are replaced with environment-
specific values.
"""
binpath = context.bin_path
plen = len(path)
for root, dirs, files in os.walk(path):
if root == path: # at top-level, remove irrelevant dirs
for d in dirs[:]:
if d not in ('common', os.name):
dirs.remove(d)
continue # ignore files in top level
for f in files:
srcfile = os.path.join(root, f)
suffix = root[plen:].split(os.sep)[2:]
if not suffix:
dstdir = binpath
else:
dstdir = os.path.join(binpath, *suffix)
if not os.path.exists(dstdir):
os.makedirs(dstdir)
dstfile = os.path.join(dstdir, f)
with open(srcfile, 'rb') as f:
data = f.read()
if srcfile.endswith('.exe'):
mode = 'wb'
else:
mode = 'w'
try:
data = data.decode('utf-8')
data = self.replace_variables(data, context)
except UnicodeDecodeError as e:
data = None
logger.warning('unable to copy script %r, '
'may be binary: %s', srcfile, e)
if data is not None:
with open(dstfile, mode) as f:
f.write(data)
shutil.copymode(srcfile, dstfile)
def create(env_dir, system_site_packages=False, clear=False,
symlinks=False, with_pip=False):
"""
Create a virtual environment in a directory.
By default, makes the system (global) site-packages dir *un*available to
the created environment, and uses copying rather than symlinking for files
obtained from the source Python installation.
:param env_dir: The target directory to create an environment in.
:param system_site_packages: If True, the system (global) site-packages
dir is available to the environment.
:param clear: If True and the target directory exists, it is deleted.
Otherwise, if the target directory exists, an error is
raised.
:param symlinks: If True, attempt to symlink rather than copy files into
virtual environment.
:param with_pip: If True, ensure pip is installed in the virtual
environment
"""
builder = EnvBuilder(system_site_packages=system_site_packages,
clear=clear, symlinks=symlinks, with_pip=with_pip)
builder.create(env_dir)
def main(args=None):
compatible = True
if sys.version_info < (3, 3):
compatible = False
elif not hasattr(sys, 'base_prefix'):
compatible = False
if not compatible:
raise ValueError('This script is only for use with Python >= 3.3')
else:
import argparse
parser = argparse.ArgumentParser(prog=__name__,
description='Creates virtual Python '
'environments in one or '
'more target '
'directories.',
epilog='Once an environment has been '
'created, you may wish to '
'activate it, e.g. by '
'sourcing an activate script '
'in its bin directory.')
parser.add_argument('dirs', metavar='ENV_DIR', nargs='+',
help='A directory to create the environment in.')
parser.add_argument('--system-site-packages', default=False,
action='store_true', dest='system_site',
help='Give the virtual environment access to the '
'system site-packages dir.')
if os.name == 'nt':
use_symlinks = False
else:
use_symlinks = True
group = parser.add_mutually_exclusive_group()
group.add_argument('--symlinks', default=use_symlinks,
action='store_true', dest='symlinks',
help='Try to use symlinks rather than copies, '
'when symlinks are not the default for '
'the platform.')
group.add_argument('--copies', default=not use_symlinks,
action='store_false', dest='symlinks',
help='Try to use copies rather than symlinks, '
'even when symlinks are the default for '
'the platform.')
parser.add_argument('--clear', default=False, action='store_true',
dest='clear', help='Delete the contents of the '
'environment directory if it '
'already exists, before '
'environment creation.')
parser.add_argument('--upgrade', default=False, action='store_true',
dest='upgrade', help='Upgrade the environment '
'directory to use this version '
'of Python, assuming Python '
'has been upgraded in-place.')
parser.add_argument('--without-pip', dest='with_pip',
default=True, action='store_false',
help='Skips installing or upgrading pip in the '
'virtual environment (pip is bootstrapped '
'by default)')
options = parser.parse_args(args)
if options.upgrade and options.clear:
raise ValueError('you cannot supply --upgrade and --clear together.')
builder = EnvBuilder(system_site_packages=options.system_site,
clear=options.clear,
symlinks=options.symlinks,
upgrade=options.upgrade,
with_pip=options.with_pip)
for d in options.dirs:
builder.create(d)
if __name__ == '__main__':
rc = 1
try:
main()
rc = 0
except Exception as e:
print('Error: %s' % e, file=sys.stderr)
sys.exit(rc)
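The EnvBuilder hooks above (post_setup() in particular) are meant for subclassing. A minimal sketch of a builder that customizes a freshly created environment; the class and marker-file names are illustrative, not part of the module:
import os
import venv

class MarkerEnvBuilder(venv.EnvBuilder):
    """Illustrative subclass: drop a marker file into each new environment."""
    def post_setup(self, context):
        # 'context' is the SimpleNamespace built by ensure_directories()
        marker = os.path.join(context.env_dir, 'BUILT_BY_MARKERENVBUILDER')
        with open(marker, 'w') as f:
            f.write('created as %s\n' % context.env_name)

builder = MarkerEnvBuilder(symlinks=False, with_pip=False)
builder.create('demo-env')   # roughly what `python -m venv demo-env` drives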
import sys
from . import main
rc = 1
try:
main()
rc = 0
except Exception as e:
print('Error: %s' % e, file=sys.stderr)
sys.exit(rc)
@echo off
set "VIRTUAL_ENV=__VENV_DIR__"
if not defined PROMPT (
set "PROMPT=$P$G"
)
if defined _OLD_VIRTUAL_PROMPT (
set "PROMPT=%_OLD_VIRTUAL_PROMPT%"
)
if defined _OLD_VIRTUAL_PYTHONHOME (
set "PYTHONHOME=%_OLD_VIRTUAL_PYTHONHOME%"
)
set "_OLD_VIRTUAL_PROMPT=%PROMPT%"
set "PROMPT=__VENV_PROMPT__%PROMPT%"
if defined PYTHONHOME (
set "_OLD_VIRTUAL_PYTHONHOME=%PYTHONHOME%"
set PYTHONHOME=
)
if defined _OLD_VIRTUAL_PATH (
set "PATH=%_OLD_VIRTUAL_PATH%"
) else (
set "_OLD_VIRTUAL_PATH=%PATH%"
)
set "PATH=%VIRTUAL_ENV%\__VENV_BIN_NAME__;%PATH%"
:END
function global:deactivate ([switch]$NonDestructive) {
# Revert to original values
if (Test-Path function:_OLD_VIRTUAL_PROMPT) {
copy-item function:_OLD_VIRTUAL_PROMPT function:prompt
remove-item function:_OLD_VIRTUAL_PROMPT
}
if (Test-Path env:_OLD_VIRTUAL_PYTHONHOME) {
copy-item env:_OLD_VIRTUAL_PYTHONHOME env:PYTHONHOME
remove-item env:_OLD_VIRTUAL_PYTHONHOME
}
if (Test-Path env:_OLD_VIRTUAL_PATH) {
copy-item env:_OLD_VIRTUAL_PATH env:PATH
remove-item env:_OLD_VIRTUAL_PATH
}
if (Test-Path env:VIRTUAL_ENV) {
remove-item env:VIRTUAL_ENV
}
if (!$NonDestructive) {
# Self destruct!
remove-item function:deactivate
}
}
deactivate -nondestructive
$env:VIRTUAL_ENV="__VENV_DIR__"
# Set the prompt to include the env name
# Make sure _OLD_VIRTUAL_PROMPT is global
function global:_OLD_VIRTUAL_PROMPT {""}
copy-item function:prompt function:_OLD_VIRTUAL_PROMPT
function global:prompt {
Write-Host -NoNewline -ForegroundColor Green '__VENV_PROMPT__'
_OLD_VIRTUAL_PROMPT
}
# Clear PYTHONHOME
if (Test-Path env:PYTHONHOME) {
copy-item env:PYTHONHOME env:_OLD_VIRTUAL_PYTHONHOME
remove-item env:PYTHONHOME
}
# Add the venv to the PATH
copy-item env:PATH env:_OLD_VIRTUAL_PATH
$env:PATH = "$env:VIRTUAL_ENV\__VENV_BIN_NAME__;$env:PATH"
@echo off
if defined _OLD_VIRTUAL_PROMPT (
set "PROMPT=%_OLD_VIRTUAL_PROMPT%"
)
set _OLD_VIRTUAL_PROMPT=
if defined _OLD_VIRTUAL_PYTHONHOME (
set "PYTHONHOME=%_OLD_VIRTUAL_PYTHONHOME%"
set _OLD_VIRTUAL_PYTHONHOME=
)
if defined _OLD_VIRTUAL_PATH (
set "PATH=%_OLD_VIRTUAL_PATH%"
)
set _OLD_VIRTUAL_PATH=
set VIRTUAL_ENV=
:END
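The activate/deactivate scripts above are templates: install_scripts() copies them into the environment and replace_variables() fills in the __VENV_*__ placeholders. A minimal sketch of that substitution, with made-up values:
template = 'set "VIRTUAL_ENV=__VENV_DIR__"\nset "PROMPT=__VENV_PROMPT__%PROMPT%"'

# The same string replacements replace_variables() performs, with example values:
substitutions = {
    '__VENV_DIR__': r'C:\envs\demo',
    '__VENV_PROMPT__': '(demo) ',
}
for placeholder, value in substitutions.items():
    template = template.replace(placeholder, value)

print(template)
# set "VIRTUAL_ENV=C:\envs\demo"
# set "PROMPT=(demo) %PROMPT%"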
"""Base classes for server/gateway implementations"""
from .util import FileWrapper, guess_scheme, is_hop_by_hop
from .headers import Headers
import sys, os, time
__all__ = [
'BaseHandler', 'SimpleHandler', 'BaseCGIHandler', 'CGIHandler',
'IISCGIHandler', 'read_environ'
]
# Weekday and month names for HTTP date/time formatting; always English!
_weekdayname = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
_monthname = [None, # Dummy so we can use 1-based month numbers
"Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
def format_date_time(timestamp):
year, month, day, hh, mm, ss, wd, y, z = time.gmtime(timestamp)
return "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (
_weekdayname[wd], day, _monthname[month], year, hh, mm, ss
)
_is_request = {
'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING', 'REQUEST_METHOD', 'AUTH_TYPE',
'CONTENT_TYPE', 'CONTENT_LENGTH', 'HTTPS', 'REMOTE_USER', 'REMOTE_IDENT',
}.__contains__
def _needs_transcode(k):
return _is_request(k) or k.startswith('HTTP_') or k.startswith('SSL_') \
or (k.startswith('REDIRECT_') and _needs_transcode(k[9:]))
def read_environ():
"""Read environment, fixing HTTP variables"""
enc = sys.getfilesystemencoding()
esc = 'surrogateescape'
try:
''.encode('utf-8', esc)
except LookupError:
esc = 'replace'
environ = {}
# Take the basic environment from native-unicode os.environ. Attempt to
# fix up the variables that come from the HTTP request to compensate for
# the bytes->unicode decoding step that will already have taken place.
for k, v in os.environ.items():
if _needs_transcode(k):
# On win32, the os.environ is natively Unicode. Different servers
# decode the request bytes using different encodings.
if sys.platform == 'win32':
software = os.environ.get('SERVER_SOFTWARE', '').lower()
# On IIS, the HTTP request will be decoded as UTF-8 as long
# as the input is a valid UTF-8 sequence. Otherwise it is
# decoded using the system code page (mbcs), with no way to
# detect this has happened. Because UTF-8 is the more likely
# encoding, and mbcs is inherently unreliable (an mbcs string
# that happens to be valid UTF-8 will not be decoded as mbcs)
# always recreate the original bytes as UTF-8.
if software.startswith('microsoft-iis/'):
v = v.encode('utf-8').decode('iso-8859-1')
# Apache mod_cgi writes bytes-as-unicode (as if ISO-8859-1) direct
# to the Unicode environ. No modification needed.
elif software.startswith('apache/'):
pass
# Python 3's http.server.CGIHTTPRequestHandler decodes
# using the urllib.unquote default of UTF-8, amongst other
# issues.
elif (
software.startswith('simplehttp/')
and 'python/3' in software
):
v = v.encode('utf-8').decode('iso-8859-1')
# For other servers, guess that they have written bytes to
# the environ using stdio byte-oriented interfaces, ending up
# with the system code page.
else:
v = v.encode(enc, 'replace').decode('iso-8859-1')
# Recover bytes from unicode environ, using surrogate escapes
# where available (Python 3.1+).
else:
v = v.encode(enc, esc).decode('iso-8859-1')
environ[k] = v
return environ
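# A short illustration of the round-trip performed above, with illustrative
# bytes: the server decoded the raw request bytes with the filesystem
# encoding, so re-encoding with 'surrogateescape' recovers the original
# bytes, and decoding those bytes as ISO-8859-1 stores them losslessly,
# one byte per character, as PEP 3333 requires for native strings:
#
#     raw = b'/caf\xc3\xa9'                         # bytes on the wire
#     v = raw.decode('utf-8', 'surrogateescape')    # what os.environ holds
#     native = v.encode('utf-8', 'surrogateescape').decode('iso-8859-1')
#     assert native.encode('iso-8859-1') == raw     # bytes are recoverable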
class BaseHandler:
"""Manage the invocation of a WSGI application"""
# Configuration parameters; can override per-subclass or per-instance
wsgi_version = (1,0)
wsgi_multithread = True
wsgi_multiprocess = True
wsgi_run_once = False
origin_server = True # We are transmitting direct to client
http_version = "1.0" # Version that should be used for response
server_software = None # String name of server software, if any
# os_environ is used to supply configuration from the OS environment:
# by default it's a copy of 'os.environ' as of import time, but you can
# override this in e.g. your __init__ method.
os_environ= read_environ()
# Collaborator classes
wsgi_file_wrapper = FileWrapper # set to None to disable
headers_class = Headers # must be a Headers-like class
# Error handling (also per-subclass or per-instance)
traceback_limit = None # Print entire traceback to self.get_stderr()
error_status = "500 Internal Server Error"
error_headers = [('Content-Type','text/plain')]
error_body = b"A server error occurred. Please contact the administrator."
# State variables (don't mess with these)
status = result = None
headers_sent = False
headers = None
bytes_sent = 0
def run(self, application):
"""Invoke the application"""
# Note to self: don't move the close()! Asynchronous servers shouldn't
# call close() from finish_response(), so if you close() anywhere but
# the double-error branch here, you'll break asynchronous servers by
# prematurely closing. Async servers must return from 'run()' without
# closing if there might still be output to iterate over.
try:
self.setup_environ()
self.result = application(self.environ, self.start_response)
self.finish_response()
except:
try:
self.handle_error()
except:
# If we get an error handling an error, just give up already!
self.close()
raise # ...and let the actual server figure it out.
def setup_environ(self):
"""Set up the environment for one request"""
env = self.environ = self.os_environ.copy()
self.add_cgi_vars()
env['wsgi.input'] = self.get_stdin()
env['wsgi.errors'] = self.get_stderr()
env['wsgi.version'] = self.wsgi_version
env['wsgi.run_once'] = self.wsgi_run_once
env['wsgi.url_scheme'] = self.get_scheme()
env['wsgi.multithread'] = self.wsgi_multithread
env['wsgi.multiprocess'] = self.wsgi_multiprocess
if self.wsgi_file_wrapper is not None:
env['wsgi.file_wrapper'] = self.wsgi_file_wrapper
if self.origin_server and self.server_software:
env.setdefault('SERVER_SOFTWARE',self.server_software)
def finish_response(self):
"""Send any iterable data, then close self and the iterable
Subclasses intended for use in asynchronous servers will
want to redefine this method, such that it sets up callbacks
in the event loop to iterate over the data, and to call
'self.close()' once the response is finished.
"""
try:
if not self.result_is_file() or not self.sendfile():
for data in self.result:
self.write(data)
self.finish_content()
finally:
self.close()
def get_scheme(self):
"""Return the URL scheme being used"""
return guess_scheme(self.environ)
def set_content_length(self):
"""Compute Content-Length or switch to chunked encoding if possible"""
try:
blocks = len(self.result)
except (TypeError,AttributeError,NotImplementedError):
pass
else:
if blocks==1:
self.headers['Content-Length'] = str(self.bytes_sent)
return
# XXX Try for chunked encoding if origin server and client is 1.1
def cleanup_headers(self):
"""Make any necessary header changes or defaults
Subclasses can extend this to add other defaults.
"""
if 'Content-Length' not in self.headers:
self.set_content_length()
def start_response(self, status, headers,exc_info=None):
"""'start_response()' callable as specified by PEP 3333"""
if exc_info:
try:
if self.headers_sent:
# Re-raise original exception if headers sent
raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])
finally:
exc_info = None # avoid dangling circular ref
elif self.headers is not None:
raise AssertionError("Headers already set!")
self.status = status
self.headers = self.headers_class(headers)
status = self._convert_string_type(status, "Status")
assert len(status)>=4,"Status must be at least 4 characters"
assert int(status[:3]),"Status message must begin w/3-digit code"
assert status[3]==" ", "Status message must have a space after code"
if __debug__:
for name, val in headers:
name = self._convert_string_type(name, "Header name")
val = self._convert_string_type(val, "Header value")
assert not is_hop_by_hop(name),"Hop-by-hop headers not allowed"
return self.write
def _convert_string_type(self, value, title):
"""Convert/check value type."""
if type(value) is str:
return value
raise AssertionError(
"{0} must be of type str (got {1})".format(title, repr(value))
)
def send_preamble(self):
"""Transmit version/status/date/server, via self._write()"""
if self.origin_server:
if self.client_is_modern():
self._write(('HTTP/%s %s\r\n' % (self.http_version,self.status)).encode('iso-8859-1'))
if 'Date' not in self.headers:
self._write(
('Date: %s\r\n' % format_date_time(time.time())).encode('iso-8859-1')
)
if self.server_software and 'Server' not in self.headers:
self._write(('Server: %s\r\n' % self.server_software).encode('iso-8859-1'))
else:
self._write(('Status: %s\r\n' % self.status).encode('iso-8859-1'))
def write(self, data):
"""'write()' callable as specified by PEP 3333"""
assert type(data) is bytes, \
"write() argument must be a bytes instance"
if not self.status:
raise AssertionError("write() before start_response()")
elif not self.headers_sent:
# Before the first output, send the stored headers
self.bytes_sent = len(data) # make sure we know content-length
self.send_headers()
else:
self.bytes_sent += len(data)
# XXX check Content-Length and truncate if too many bytes written?
self._write(data)
self._flush()
def sendfile(self):
"""Platform-specific file transmission
Override this method in subclasses to support platform-specific
file transmission. It is only called if the application's
return iterable ('self.result') is an instance of
'self.wsgi_file_wrapper'.
This method should return a true value if it was able to actually
transmit the wrapped file-like object using a platform-specific
approach. It should return a false value if normal iteration
should be used instead. An exception can be raised to indicate
that transmission was attempted, but failed.
NOTE: this method should call 'self.send_headers()' if
'self.headers_sent' is false and it is going to attempt direct
transmission of the file.
"""
return False # No platform-specific transmission by default
def finish_content(self):
"""Ensure headers and content have both been sent"""
if not self.headers_sent:
# Only zero Content-Length if not set by the application (so
# that HEAD requests can be satisfied properly, see #3839)
self.headers.setdefault('Content-Length', "0")
self.send_headers()
else:
pass # XXX check if content-length was too short?
def close(self):
"""Close the iterable (if needed) and reset all instance vars
Subclasses may want to also drop the client connection.
"""
try:
if hasattr(self.result,'close'):
self.result.close()
finally:
self.result = self.headers = self.status = self.environ = None
self.bytes_sent = 0; self.headers_sent = False
def send_headers(self):
"""Transmit headers to the client, via self._write()"""
self.cleanup_headers()
self.headers_sent = True
if not self.origin_server or self.client_is_modern():
self.send_preamble()
self._write(bytes(self.headers))
def result_is_file(self):
"""True if 'self.result' is an instance of 'self.wsgi_file_wrapper'"""
wrapper = self.wsgi_file_wrapper
return wrapper is not None and isinstance(self.result,wrapper)
def client_is_modern(self):
"""True if client can accept status and headers"""
return self.environ['SERVER_PROTOCOL'].upper() != 'HTTP/0.9'
def log_exception(self,exc_info):
"""Log the 'exc_info' tuple in the server log
Subclasses may override to retarget the output or change its format.
"""
try:
from traceback import print_exception
stderr = self.get_stderr()
print_exception(
exc_info[0], exc_info[1], exc_info[2],
self.traceback_limit, stderr
)
stderr.flush()
finally:
exc_info = None
def handle_error(self):
"""Log current error, and send error output to client if possible"""
self.log_exception(sys.exc_info())
if not self.headers_sent:
self.result = self.error_output(self.environ, self.start_response)
self.finish_response()
# XXX else: attempt advanced recovery techniques for HTML or text?
def error_output(self, environ, start_response):
"""WSGI mini-app to create error output
By default, this just uses the 'error_status', 'error_headers',
and 'error_body' attributes to generate an output page. It can
be overridden in a subclass to dynamically generate diagnostics,
choose an appropriate message for the user's preferred language, etc.
Note, however, that it's not recommended from a security perspective to
spit out diagnostics to any old user; ideally, you should have to do
something special to enable diagnostic output, which is why we don't
include any here!
"""
start_response(self.error_status,self.error_headers[:],sys.exc_info())
return [self.error_body]
# Pure abstract methods; *must* be overridden in subclasses
def _write(self,data):
"""Override in subclass to buffer data for send to client
It's okay if this method actually transmits the data; BaseHandler
just separates write and flush operations for greater efficiency
when the underlying system actually has such a distinction.
"""
raise NotImplementedError
def _flush(self):
"""Override in subclass to force sending of recent '_write()' calls
It's okay if this method is a no-op (i.e., if '_write()' actually
sends the data).
"""
raise NotImplementedError
def get_stdin(self):
"""Override in subclass to return suitable 'wsgi.input'"""
raise NotImplementedError
def get_stderr(self):
"""Override in subclass to return suitable 'wsgi.errors'"""
raise NotImplementedError
def add_cgi_vars(self):
"""Override in subclass to insert CGI variables in 'self.environ'"""
raise NotImplementedError
class SimpleHandler(BaseHandler):
"""Handler that's just initialized with streams, environment, etc.
This handler subclass is intended for synchronous HTTP/1.0 origin servers,
and handles sending the entire response output, given the correct inputs.
Usage::
handler = SimpleHandler(
inp,out,err,env, multithread=False, multiprocess=True
)
handler.run(app)"""
def __init__(self,stdin,stdout,stderr,environ,
multithread=True, multiprocess=False
):
self.stdin = stdin
self.stdout = stdout
self.stderr = stderr
self.base_env = environ
self.wsgi_multithread = multithread
self.wsgi_multiprocess = multiprocess
def get_stdin(self):
return self.stdin
def get_stderr(self):
return self.stderr
def add_cgi_vars(self):
self.environ.update(self.base_env)
def _write(self,data):
self.stdout.write(data)
def _flush(self):
self.stdout.flush()
self._flush = self.stdout.flush
class BaseCGIHandler(SimpleHandler):
"""CGI-like systems using input/output/error streams and environ mapping
Usage::
handler = BaseCGIHandler(inp,out,err,env)
handler.run(app)
This handler class is useful for gateway protocols like ReadyExec and
FastCGI, that have usable input/output/error streams and an environment
mapping. It's also the base class for CGIHandler, which just uses
sys.stdin, os.environ, and so on.
The constructor also takes keyword arguments 'multithread' and
'multiprocess' (defaulting to 'True' and 'False' respectively) to control
the configuration sent to the application. It sets 'origin_server' to
False (to enable CGI-like output), and assumes that 'wsgi.run_once' is
False.
"""
origin_server = False
class CGIHandler(BaseCGIHandler):
"""CGI-based invocation via sys.stdin/stdout/stderr and os.environ
Usage::
CGIHandler().run(app)
The difference between this class and BaseCGIHandler is that it always
uses 'wsgi.run_once' of 'True', 'wsgi.multithread' of 'False', and
'wsgi.multiprocess' of 'True'. It does not take any initialization
parameters, but always uses 'sys.stdin', 'os.environ', and friends.
If you need to override any of these parameters, use BaseCGIHandler
instead.
"""
wsgi_run_once = True
# Do not allow os.environ to leak between requests in Google App Engine
# and other multi-run CGI use cases. This is not easily testable.
# See http://bugs.python.org/issue7250
os_environ = {}
def __init__(self):
BaseCGIHandler.__init__(
self, sys.stdin.buffer, sys.stdout.buffer, sys.stderr,
read_environ(), multithread=False, multiprocess=True
)
class IISCGIHandler(BaseCGIHandler):
"""CGI-based invocation with workaround for IIS path bug
This handler should be used in preference to CGIHandler when deploying on
Microsoft IIS without having set the config allowPathInfo option (IIS>=7)
or metabase allowPathInfoForScriptMappings (IIS<7).
"""
wsgi_run_once = True
os_environ = {}
# By default, IIS gives a PATH_INFO that duplicates the SCRIPT_NAME at
# the front, causing problems for WSGI applications that wish to implement
# routing. This handler strips any such duplicated path.
# IIS can be configured to pass the correct PATH_INFO, but this causes
# another bug where PATH_TRANSLATED is wrong. Luckily this variable is
# rarely used and is not guaranteed by WSGI. On IIS<7, though, the
# setting can only be made on a vhost level, affecting all other script
# mappings, many of which break when exposed to the PATH_TRANSLATED bug.
# For this reason IIS<7 is almost never deployed with the fix. (Even IIS7
# rarely uses it because there is still no UI for it.)
# There is no way for CGI code to tell whether the option was set, so a
# separate handler class is provided.
def __init__(self):
environ= read_environ()
path = environ.get('PATH_INFO', '')
script = environ.get('SCRIPT_NAME', '')
if (path+'/').startswith(script+'/'):
environ['PATH_INFO'] = path[len(script):]
BaseCGIHandler.__init__(
self, sys.stdin.buffer, sys.stdout.buffer, sys.stderr,
environ, multithread=False, multiprocess=True
)
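As the docstrings above note, CGIHandler is the usual entry point when running under a plain CGI server. A minimal sketch of a CGI script built on it; the application itself is illustrative:
#!/usr/bin/env python
from wsgiref.handlers import CGIHandler

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello from CGI\n']

if __name__ == '__main__':
    CGIHandler().run(application)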
"""Manage HTTP Response Headers
Much of this module is red-handedly pilfered from email.message in the stdlib,
so portions are Copyright (C) 2001,2002 Python Software Foundation, and were
written by Barry Warsaw.
"""
# Regular expression that matches `special' characters in parameters, the
# existence of which force quoting of the parameter value.
import re
tspecials = re.compile(r'[ \(\)<>@,;:\\"/\[\]\?=]')
def _formatparam(param, value=None, quote=1):
"""Convenience function to format and return a key=value pair.
This will quote the value if needed or if quote is true.
"""
if value is not None and len(value) > 0:
if quote or tspecials.search(value):
value = value.replace('\\', '\\\\').replace('"', r'\"')
return '%s="%s"' % (param, value)
else:
return '%s=%s' % (param, value)
else:
return param
class Headers:
"""Manage a collection of HTTP response headers"""
def __init__(self,headers):
if type(headers) is not list:
raise TypeError("Headers must be a list of name/value tuples")
self._headers = headers
if __debug__:
for k, v in headers:
self._convert_string_type(k)
self._convert_string_type(v)
def _convert_string_type(self, value):
"""Convert/check value type."""
if type(value) is str:
return value
raise AssertionError("Header names/values must be"
" of type str (got {0})".format(repr(value)))
def __len__(self):
"""Return the total number of headers, including duplicates."""
return len(self._headers)
def __setitem__(self, name, val):
"""Set the value of a header."""
del self[name]
self._headers.append(
(self._convert_string_type(name), self._convert_string_type(val)))
def __delitem__(self,name):
"""Delete all occurrences of a header, if present.
Does *not* raise an exception if the header is missing.
"""
name = self._convert_string_type(name.lower())
self._headers[:] = [kv for kv in self._headers if kv[0].lower() != name]
def __getitem__(self,name):
"""Get the first header value for 'name'
Return None if the header is missing instead of raising an exception.
Note that if the header appeared multiple times, exactly which
occurrence gets returned is undefined. Use get_all() to get all
the values matching a header field name.
"""
return self.get(name)
def __contains__(self, name):
"""Return true if the message contains the header."""
return self.get(name) is not None
def get_all(self, name):
"""Return a list of all the values for the named field.
These will be sorted in the order they appeared in the original header
list or were added to this instance, and may contain duplicates. Any
fields deleted and re-inserted are always appended to the header list.
If no fields exist with the given name, returns an empty list.
"""
name = self._convert_string_type(name.lower())
return [kv[1] for kv in self._headers if kv[0].lower()==name]
def get(self,name,default=None):
"""Get the first header value for 'name', or return 'default'"""
name = self._convert_string_type(name.lower())
for k,v in self._headers:
if k.lower()==name:
return v
return default
def keys(self):
"""Return a list of all the header field names.
These will be sorted in the order they appeared in the original header
list, or were added to this instance, and may contain duplicates.
Any fields deleted and re-inserted are always appended to the header
list.
"""
return [k for k, v in self._headers]
def values(self):
"""Return a list of all header values.
These will be sorted in the order they appeared in the original header
list, or were added to this instance, and may contain duplicates.
Any fields deleted and re-inserted are always appended to the header
list.
"""
return [v for k, v in self._headers]
def items(self):
"""Get all the header fields and values.
These will be sorted in the order they were in the original header
list, or were added to this instance, and may contain duplicates.
Any fields deleted and re-inserted are always appended to the header
list.
"""
return self._headers[:]
def __repr__(self):
return "Headers(%r)" % self._headers
def __str__(self):
"""str() returns the formatted headers, complete with end line,
suitable for direct HTTP transmission."""
return '\r\n'.join(["%s: %s" % kv for kv in self._headers]+['',''])
def __bytes__(self):
return str(self).encode('iso-8859-1')
def setdefault(self,name,value):
"""Return first matching header value for 'name', or 'value'
If there is no header named 'name', add a new header with name 'name'
and value 'value'."""
result = self.get(name)
if result is None:
self._headers.append((self._convert_string_type(name),
self._convert_string_type(value)))
return value
else:
return result
def add_header(self, _name, _value, **_params):
"""Extended header setting.
_name is the header field to add. keyword arguments can be used to set
additional parameters for the header field, with underscores converted
to dashes. Normally the parameter will be added as key="value" unless
value is None, in which case only the key will be added.
Example:
h.add_header('content-disposition', 'attachment', filename='bud.gif')
Note that unlike the corresponding 'email.message' method, this does
*not* handle '(charset, language, value)' tuples: all values must be
strings or None.
"""
parts = []
if _value is not None:
_value = self._convert_string_type(_value)
parts.append(_value)
for k, v in _params.items():
k = self._convert_string_type(k)
if v is None:
parts.append(k.replace('_', '-'))
else:
v = self._convert_string_type(v)
parts.append(_formatparam(k.replace('_', '-'), v))
self._headers.append((self._convert_string_type(_name), "; ".join(parts)))
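A short sketch of the Headers API in use ('X-Example' is an illustrative header name; expected output is shown as comments):
from wsgiref.headers import Headers

h = Headers([('Content-Type', 'text/plain')])
h.add_header('Content-Disposition', 'attachment', filename='bud.gif')
h.setdefault('X-Example', 'one')

print(h['content-type'])                  # text/plain  (lookups are case-insensitive)
print(h.get_all('Content-Disposition'))   # ['attachment; filename="bud.gif"']
print(bytes(h))                           # all headers, ISO-8859-1 encoded,
                                          # ending with a blank line, ready to transmit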
"""BaseHTTPServer that implements the Python WSGI protocol (PEP 3333)
This is both an example of how WSGI can be implemented, and a basis for running
simple web applications on a local machine, such as might be done when testing
or debugging an application. It has not been reviewed for security issues,
however, and we strongly recommend that you use a "real" web server for
production use.
For example usage, see the 'if __name__=="__main__"' block at the end of the
module. See also the BaseHTTPServer module docs for other API information.
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
import sys
import urllib.parse
from wsgiref.handlers import SimpleHandler
from platform import python_implementation
__version__ = "0.2"
__all__ = ['WSGIServer', 'WSGIRequestHandler', 'demo_app', 'make_server']
server_version = "WSGIServer/" + __version__
sys_version = python_implementation() + "/" + sys.version.split()[0]
software_version = server_version + ' ' + sys_version
class ServerHandler(SimpleHandler):
server_software = software_version
def close(self):
try:
self.request_handler.log_request(
self.status.split(' ',1)[0], self.bytes_sent
)
finally:
SimpleHandler.close(self)
class WSGIServer(HTTPServer):
"""BaseHTTPServer that implements the Python WSGI protocol"""
application = None
def server_bind(self):
"""Override server_bind to store the server name."""
HTTPServer.server_bind(self)
self.setup_environ()
def setup_environ(self):
# Set up base environment
env = self.base_environ = {}
env['SERVER_NAME'] = self.server_name
env['GATEWAY_INTERFACE'] = 'CGI/1.1'
env['SERVER_PORT'] = str(self.server_port)
env['REMOTE_HOST']=''
env['CONTENT_LENGTH']=''
env['SCRIPT_NAME'] = ''
def get_app(self):
return self.application
def set_app(self,application):
self.application = application
class WSGIRequestHandler(BaseHTTPRequestHandler):
server_version = "WSGIServer/" + __version__
def get_environ(self):
env = self.server.base_environ.copy()
env['SERVER_PROTOCOL'] = self.request_version
env['SERVER_SOFTWARE'] = self.server_version
env['REQUEST_METHOD'] = self.command
if '?' in self.path:
path,query = self.path.split('?',1)
else:
path,query = self.path,''
env['PATH_INFO'] = urllib.parse.unquote_to_bytes(path).decode('iso-8859-1')
env['QUERY_STRING'] = query
host = self.address_string()
if host != self.client_address[0]:
env['REMOTE_HOST'] = host
env['REMOTE_ADDR'] = self.client_address[0]
if self.headers.get('content-type') is None:
env['CONTENT_TYPE'] = self.headers.get_content_type()
else:
env['CONTENT_TYPE'] = self.headers['content-type']
length = self.headers.get('content-length')
if length:
env['CONTENT_LENGTH'] = length
for k, v in self.headers.items():
k=k.replace('-','_').upper(); v=v.strip()
if k in env:
continue # skip content length, type,etc.
if 'HTTP_'+k in env:
env['HTTP_'+k] += ','+v # comma-separate multiple headers
else:
env['HTTP_'+k] = v
return env
def get_stderr(self):
return sys.stderr
def handle(self):
"""Handle a single HTTP request"""
self.raw_requestline = self.rfile.readline(65537)
if len(self.raw_requestline) > 65536:
self.requestline = ''
self.request_version = ''
self.command = ''
self.send_error(414)
return
if not self.parse_request(): # An error code has been sent, just exit
return
handler = ServerHandler(
self.rfile, self.wfile, self.get_stderr(), self.get_environ()
)
handler.request_handler = self # backpointer for logging
handler.run(self.server.get_app())
def demo_app(environ,start_response):
from io import StringIO
stdout = StringIO()
print("Hello world!", file=stdout)
print(file=stdout)
h = sorted(environ.items())
for k,v in h:
print(k,'=',repr(v), file=stdout)
start_response("200 OK", [('Content-Type','text/plain; charset=utf-8')])
return [stdout.getvalue().encode("utf-8")]
def make_server(
host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler
):
"""Create a new WSGI server listening on `host` and `port` for `app`"""
server = server_class((host, port), handler_class)
server.set_app(app)
return server
if __name__ == '__main__':
httpd = make_server('', 8000, demo_app)
sa = httpd.socket.getsockname()
print("Serving HTTP on", sa[0], "port", sa[1], "...")
import webbrowser
webbrowser.open('http://localhost:8000/xyz?abc')
httpd.handle_request() # serve one request, then exit
httpd.server_close()
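Beyond the single-request demo above, make_server() works with any WSGI application; a minimal sketch that serves until interrupted:
from wsgiref.simple_server import make_server

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [('You requested %s\n' % environ['PATH_INFO']).encode('utf-8')]

httpd = make_server('', 8000, application)
print('Serving on port 8000, Ctrl-C to stop...')
try:
    httpd.serve_forever()
except KeyboardInterrupt:
    pass
finally:
    httpd.server_close()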
"""Miscellaneous WSGI-related Utilities"""
import posixpath
__all__ = [
'FileWrapper', 'guess_scheme', 'application_uri', 'request_uri',
'shift_path_info', 'setup_testing_defaults',
]
class FileWrapper:
"""Wrapper to convert file-like objects to iterables"""
def __init__(self, filelike, blksize=8192):
self.filelike = filelike
self.blksize = blksize
if hasattr(filelike,'close'):
self.close = filelike.close
def __getitem__(self,key):
data = self.filelike.read(self.blksize)
if data:
return data
raise IndexError
def __iter__(self):
return self
def __next__(self):
data = self.filelike.read(self.blksize)
if data:
return data
raise StopIteration
def guess_scheme(environ):
"""Return a guess for whether 'wsgi.url_scheme' should be 'http' or 'https'
"""
if environ.get("HTTPS") in ('yes','on','1'):
return 'https'
else:
return 'http'
def application_uri(environ):
"""Return the application's base URI (no PATH_INFO or QUERY_STRING)"""
url = environ['wsgi.url_scheme']+'://'
from urllib.parse import quote
if environ.get('HTTP_HOST'):
url += environ['HTTP_HOST']
else:
url += environ['SERVER_NAME']
if environ['wsgi.url_scheme'] == 'https':
if environ['SERVER_PORT'] != '443':
url += ':' + environ['SERVER_PORT']
else:
if environ['SERVER_PORT'] != '80':
url += ':' + environ['SERVER_PORT']
url += quote(environ.get('SCRIPT_NAME') or '/', encoding='latin1')
return url
def request_uri(environ, include_query=True):
"""Return the full request URI, optionally including the query string"""
url = application_uri(environ)
from urllib.parse import quote
path_info = quote(environ.get('PATH_INFO',''), safe='/;=,', encoding='latin1')
if not environ.get('SCRIPT_NAME'):
url += path_info[1:]
else:
url += path_info
if include_query and environ.get('QUERY_STRING'):
url += '?' + environ['QUERY_STRING']
return url
def shift_path_info(environ):
"""Shift a name from PATH_INFO to SCRIPT_NAME, returning it
If there are no remaining path segments in PATH_INFO, return None.
Note: 'environ' is modified in-place; use a copy if you need to keep
the original PATH_INFO or SCRIPT_NAME.
Note: when PATH_INFO is just a '/', this returns '' and appends a trailing
'/' to SCRIPT_NAME, even though empty path segments are normally ignored,
and SCRIPT_NAME doesn't normally end in a '/'. This is intentional
behavior, to ensure that an application can tell the difference between
'/x' and '/x/' when traversing to objects.
"""
path_info = environ.get('PATH_INFO','')
if not path_info:
return None
path_parts = path_info.split('/')
path_parts[1:-1] = [p for p in path_parts[1:-1] if p and p != '.']
name = path_parts[1]
del path_parts[1]
script_name = environ.get('SCRIPT_NAME','')
script_name = posixpath.normpath(script_name+'/'+name)
if script_name.endswith('/'):
script_name = script_name[:-1]
if not name and not script_name.endswith('/'):
script_name += '/'
environ['SCRIPT_NAME'] = script_name
environ['PATH_INFO'] = '/'.join(path_parts)
# Special case: '/.' on PATH_INFO doesn't get stripped,
# because we don't strip the last element of PATH_INFO
# if there's only one path part left. Instead of fixing this
# above, we fix it here so that PATH_INFO gets normalized to
# an empty string in the environ.
if name=='.':
name = None
return name
def setup_testing_defaults(environ):
"""Update 'environ' with trivial defaults for testing purposes
This adds various parameters required for WSGI, including HTTP_HOST,
SERVER_NAME, SERVER_PORT, REQUEST_METHOD, SCRIPT_NAME, PATH_INFO,
and all of the wsgi.* variables. It only supplies default values,
and does not replace any existing settings for these variables.
This routine is intended to make it easier for unit tests of WSGI
servers and applications to set up dummy environments. It should *not*
be used by actual WSGI servers or applications, since the data is fake!
"""
environ.setdefault('SERVER_NAME','127.0.0.1')
environ.setdefault('SERVER_PROTOCOL','HTTP/1.0')
environ.setdefault('HTTP_HOST',environ['SERVER_NAME'])
environ.setdefault('REQUEST_METHOD','GET')
if 'SCRIPT_NAME' not in environ and 'PATH_INFO' not in environ:
environ.setdefault('SCRIPT_NAME','')
environ.setdefault('PATH_INFO','/')
environ.setdefault('wsgi.version', (1,0))
environ.setdefault('wsgi.run_once', 0)
environ.setdefault('wsgi.multithread', 0)
environ.setdefault('wsgi.multiprocess', 0)
from io import StringIO, BytesIO
environ.setdefault('wsgi.input', BytesIO())
environ.setdefault('wsgi.errors', StringIO())
environ.setdefault('wsgi.url_scheme',guess_scheme(environ))
if environ['wsgi.url_scheme']=='http':
environ.setdefault('SERVER_PORT', '80')
elif environ['wsgi.url_scheme']=='https':
environ.setdefault('SERVER_PORT', '443')
_hoppish = {
'connection':1, 'keep-alive':1, 'proxy-authenticate':1,
'proxy-authorization':1, 'te':1, 'trailers':1, 'transfer-encoding':1,
'upgrade':1
}.__contains__
def is_hop_by_hop(header_name):
"""Return true if 'header_name' is an HTTP/1.1 "Hop-by-Hop" header"""
return _hoppish(header_name.lower())
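A short sketch of shift_path_info() consuming one path segment at a time, together with request_uri(); the environ values here are just the testing defaults:
from wsgiref.util import setup_testing_defaults, shift_path_info, request_uri

environ = {}
setup_testing_defaults(environ)
environ['PATH_INFO'] = '/app/users/42'

print(shift_path_info(environ))   # app
print(environ['SCRIPT_NAME'])     # /app
print(environ['PATH_INFO'])       # /users/42
print(request_uri(environ))       # http://127.0.0.1/app/users/42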
# (c) 2005 Ian Bicking and contributors; written for Paste (http://pythonpaste.org)
# Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
# Also licenced under the Apache License, 2.0: http://opensource.org/licenses/apache2.0.php
# Licensed to PSF under a Contributor Agreement
"""
Middleware to check for obedience to the WSGI specification.
Some of the things this checks:
* Signature of the application and start_response (including that
keyword arguments are not used).
* Environment checks:
- Environment is a dictionary (and not a subclass).
- That all the required keys are in the environment: REQUEST_METHOD,
SERVER_NAME, SERVER_PORT, wsgi.version, wsgi.input, wsgi.errors,
wsgi.multithread, wsgi.multiprocess, wsgi.run_once
- That HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH are not in the
environment (these headers should appear as CONTENT_LENGTH and
CONTENT_TYPE).
- Warns if QUERY_STRING is missing, as the cgi module acts
unpredictably in that case.
- That CGI-style variables (that don't contain a .) have
(non-unicode) string values
- That wsgi.version is a tuple
- That wsgi.url_scheme is 'http' or 'https' (@@: is this too
restrictive?)
- Warns if the REQUEST_METHOD is not known (@@: probably too
restrictive).
- That SCRIPT_NAME and PATH_INFO are empty or start with /
- That at least one of SCRIPT_NAME or PATH_INFO are set.
- That CONTENT_LENGTH is a positive integer.
- That SCRIPT_NAME is not '/' (it should be '', and PATH_INFO should
be '/').
- That wsgi.input has the methods read, readline, readlines, and
__iter__
- That wsgi.errors has the methods flush, write, writelines
* The status is a string, contains a space, starts with an integer,
and that integer is in range (> 100).
* That the headers is a list (not a subclass, not another kind of
sequence).
* That the items of the headers are tuples of strings.
* That there is no 'status' header (that is used in CGI, but not in
WSGI).
* That the headers don't contain newlines or colons, end in _ or -, or
contain characters codes below 037.
* That Content-Type is given if there is content (CGI often has a
default content type, but WSGI does not).
* That no Content-Type is given when there is no content (@@: is this
too restrictive?)
* That the exc_info argument to start_response is a tuple or None.
* That all calls to the writer are with strings, and no other methods
on the writer are accessed.
* That wsgi.input is used properly:
- .read() is called with zero or one argument
- That it returns a string
- That readline, readlines, and __iter__ return strings
- That .close() is not called
- No other methods are provided
* That wsgi.errors is used properly:
- .write() and .writelines() is called with a string
- That .close() is not called, and no other methods are provided.
* The response iterator:
- That it is not a string (it should be a list of a single string; a
string will work, but perform horribly).
- That .__next__() returns a string
- That the iterator is not iterated over until start_response has
been called (that can signal either a server or application
error).
- That .close() is called (doesn't raise exception, only prints to
sys.stderr, because we only know it isn't called when the object
is garbage collected).
"""
__all__ = ['validator']
import re
import sys
import warnings
header_re = re.compile(r'^[a-zA-Z][a-zA-Z0-9\-_]*$')
bad_header_value_re = re.compile(r'[\000-\037]')
class WSGIWarning(Warning):
"""
Raised in response to WSGI-spec-related warnings
"""
def assert_(cond, *args):
if not cond:
raise AssertionError(*args)
def check_string_type(value, title):
if type(value) is str:
return value
raise AssertionError(
"{0} must be of type str (got {1})".format(title, repr(value)))
def validator(application):
"""
When applied between a WSGI server and a WSGI application, this
middleware will check for WSGI compliancy on a number of levels.
This middleware does not modify the request or response in any
way, but will raise an AssertionError if anything seems off
(except for a failure to close the application iterator, which
will be printed to stderr -- there's no way to raise an exception
at that point).
"""
def lint_app(*args, **kw):
assert_(len(args) == 2, "Two arguments required")
assert_(not kw, "No keyword arguments allowed")
environ, start_response = args
check_environ(environ)
# We use this to check if the application returns without
# calling start_response:
start_response_started = []
def start_response_wrapper(*args, **kw):
assert_(len(args) == 2 or len(args) == 3, (
"Invalid number of arguments: %s" % (args,)))
assert_(not kw, "No keyword arguments allowed")
status = args[0]
headers = args[1]
if len(args) == 3:
exc_info = args[2]
else:
exc_info = None
check_status(status)
check_headers(headers)
check_content_type(status, headers)
check_exc_info(exc_info)
start_response_started.append(None)
return WriteWrapper(start_response(*args))
environ['wsgi.input'] = InputWrapper(environ['wsgi.input'])
environ['wsgi.errors'] = ErrorWrapper(environ['wsgi.errors'])
iterator = application(environ, start_response_wrapper)
assert_(iterator is not None and iterator != False,
"The application must return an iterator, if only an empty list")
check_iterator(iterator)
return IteratorWrapper(iterator, start_response_started)
return lint_app
class InputWrapper:
def __init__(self, wsgi_input):
self.input = wsgi_input
def read(self, *args):
assert_(len(args) == 1)
v = self.input.read(*args)
assert_(type(v) is bytes)
return v
def readline(self, *args):
assert_(len(args) <= 1)
v = self.input.readline(*args)
assert_(type(v) is bytes)
return v
def readlines(self, *args):
assert_(len(args) <= 1)
lines = self.input.readlines(*args)
assert_(type(lines) is list)
for line in lines:
assert_(type(line) is bytes)
return lines
def __iter__(self):
while 1:
line = self.readline()
if not line:
return
yield line
def close(self):
assert_(0, "input.close() must not be called")
class ErrorWrapper:
def __init__(self, wsgi_errors):
self.errors = wsgi_errors
def write(self, s):
assert_(type(s) is str)
self.errors.write(s)
def flush(self):
self.errors.flush()
def writelines(self, seq):
for line in seq:
self.write(line)
def close(self):
assert_(0, "errors.close() must not be called")
class WriteWrapper:
def __init__(self, wsgi_writer):
self.writer = wsgi_writer
def __call__(self, s):
assert_(type(s) is bytes)
self.writer(s)
class PartialIteratorWrapper:
def __init__(self, wsgi_iterator):
self.iterator = wsgi_iterator
def __iter__(self):
# We want to make sure __iter__ is called
return IteratorWrapper(self.iterator, None)
class IteratorWrapper:
def __init__(self, wsgi_iterator, check_start_response):
self.original_iterator = wsgi_iterator
self.iterator = iter(wsgi_iterator)
self.closed = False
self.check_start_response = check_start_response
def __iter__(self):
return self
def __next__(self):
assert_(not self.closed,
"Iterator read after closed")
v = next(self.iterator)
if type(v) is not bytes:
assert_(False, "Iterator yielded non-bytestring (%r)" % (v,))
if self.check_start_response is not None:
assert_(self.check_start_response,
"The application returns and we started iterating over its body, but start_response has not yet been called")
self.check_start_response = None
return v
def close(self):
self.closed = True
if hasattr(self.original_iterator, 'close'):
self.original_iterator.close()
def __del__(self):
if not self.closed:
sys.stderr.write(
"Iterator garbage collected without being closed")
assert_(self.closed,
"Iterator garbage collected without being closed")
def check_environ(environ):
assert_(type(environ) is dict,
"Environment is not of the right type: %r (environment: %r)"
% (type(environ), environ))
for key in ['REQUEST_METHOD', 'SERVER_NAME', 'SERVER_PORT',
'wsgi.version', 'wsgi.input', 'wsgi.errors',
'wsgi.multithread', 'wsgi.multiprocess',
'wsgi.run_once']:
assert_(key in environ,
"Environment missing required key: %r" % (key,))
for key in ['HTTP_CONTENT_TYPE', 'HTTP_CONTENT_LENGTH']:
assert_(key not in environ,
"Environment should not have the key: %s "
"(use %s instead)" % (key, key[5:]))
if 'QUERY_STRING' not in environ:
warnings.warn(
'QUERY_STRING is not in the WSGI environment; the cgi '
'module will use sys.argv when this variable is missing, '
'so application errors are more likely',
WSGIWarning)
for key in environ.keys():
if '.' in key:
# Extension, we don't care about its type
continue
assert_(type(environ[key]) is str,
"Environmental variable %s is not a string: %r (value: %r)"
% (key, type(environ[key]), environ[key]))
assert_(type(environ['wsgi.version']) is tuple,
"wsgi.version should be a tuple (%r)" % (environ['wsgi.version'],))
assert_(environ['wsgi.url_scheme'] in ('http', 'https'),
"wsgi.url_scheme unknown: %r" % environ['wsgi.url_scheme'])
check_input(environ['wsgi.input'])
check_errors(environ['wsgi.errors'])
# @@: these need filling out:
if environ['REQUEST_METHOD'] not in (
'GET', 'HEAD', 'POST', 'OPTIONS', 'PATCH', 'PUT', 'DELETE', 'TRACE'):
warnings.warn(
"Unknown REQUEST_METHOD: %r" % environ['REQUEST_METHOD'],
WSGIWarning)
assert_(not environ.get('SCRIPT_NAME')
or environ['SCRIPT_NAME'].startswith('/'),
"SCRIPT_NAME doesn't start with /: %r" % environ['SCRIPT_NAME'])
assert_(not environ.get('PATH_INFO')
or environ['PATH_INFO'].startswith('/'),
"PATH_INFO doesn't start with /: %r" % environ['PATH_INFO'])
if environ.get('CONTENT_LENGTH'):
assert_(int(environ['CONTENT_LENGTH']) >= 0,
"Invalid CONTENT_LENGTH: %r" % environ['CONTENT_LENGTH'])
if not environ.get('SCRIPT_NAME'):
assert_('PATH_INFO' in environ,
"One of SCRIPT_NAME or PATH_INFO are required (PATH_INFO "
"should at least be '/' if SCRIPT_NAME is empty)")
assert_(environ.get('SCRIPT_NAME') != '/',
"SCRIPT_NAME cannot be '/'; it should instead be '', and "
"PATH_INFO should be '/'")
def check_input(wsgi_input):
for attr in ['read', 'readline', 'readlines', '__iter__']:
assert_(hasattr(wsgi_input, attr),
"wsgi.input (%r) doesn't have the attribute %s"
% (wsgi_input, attr))
def check_errors(wsgi_errors):
for attr in ['flush', 'write', 'writelines']:
assert_(hasattr(wsgi_errors, attr),
"wsgi.errors (%r) doesn't have the attribute %s"
% (wsgi_errors, attr))
def check_status(status):
status = check_string_type(status, "Status")
# Implicitly check that we can turn it into an integer:
status_code = status.split(None, 1)[0]
assert_(len(status_code) == 3,
"Status codes must be three characters: %r" % status_code)
status_int = int(status_code)
assert_(status_int >= 100, "Status code is invalid: %r" % status_int)
if len(status) < 4 or status[3] != ' ':
warnings.warn(
"The status string (%r) should be a three-digit integer "
"followed by a single space and a status explanation"
% status, WSGIWarning)
def check_headers(headers):
assert_(type(headers) is list,
"Headers (%r) must be of type list: %r"
% (headers, type(headers)))
header_names = {}
for item in headers:
assert_(type(item) is tuple,
"Individual headers (%r) must be of type tuple: %r"
% (item, type(item)))
assert_(len(item) == 2)
name, value = item
name = check_string_type(name, "Header name")
value = check_string_type(value, "Header value")
assert_(name.lower() != 'status',
"The Status header cannot be used; it conflicts with CGI "
"script, and HTTP status is not given through headers "
"(value: %r)." % value)
header_names[name.lower()] = None
assert_('\n' not in name and ':' not in name,
"Header names may not contain ':' or '\\n': %r" % name)
assert_(header_re.search(name), "Bad header name: %r" % name)
assert_(not name.endswith('-') and not name.endswith('_'),
"Names may not end in '-' or '_': %r" % name)
if bad_header_value_re.search(value):
assert_(0, "Bad header value: %r (bad char: %r)"
% (value, bad_header_value_re.search(value).group(0)))
def check_content_type(status, headers):
status = check_string_type(status, "Status")
code = int(status.split(None, 1)[0])
# @@: need one more person to verify this interpretation of RFC 2616
# http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
NO_MESSAGE_BODY = (204, 304)
for name, value in headers:
name = check_string_type(name, "Header name")
if name.lower() == 'content-type':
if code not in NO_MESSAGE_BODY:
return
assert_(0, ("Content-Type header found in a %s response, "
"which must not return content.") % code)
if code not in NO_MESSAGE_BODY:
assert_(0, "No Content-Type header found in headers (%s)" % headers)
def check_exc_info(exc_info):
assert_(exc_info is None or type(exc_info) is tuple,
"exc_info (%r) is not a tuple: %r" % (exc_info, type(exc_info)))
# More exc_info checks?
def check_iterator(iterator):
    # Technically a bytestring is legal, but it is a really bad idea: it may
    # cause the response to be returned character-by-character.
assert_(not isinstance(iterator, (str, bytes)),
"You should not return a string as your application iterator, "
"instead return a single-item list containing a bytestring.")
"""wsgiref -- a WSGI (PEP 3333) Reference Library
Current Contents:
* util -- Miscellaneous useful functions and wrappers
* headers -- Manage response headers
* handlers -- base classes for server/gateway implementations
* simple_server -- a simple BaseHTTPServer that supports WSGI
* validate -- validation wrapper that sits between an app and a server
to detect errors in either
To-Do:
* cgi_gateway -- Run WSGI apps under CGI (pending a deployment standard)
* cgi_wrapper -- Run CGI apps under WSGI
* router -- a simple middleware component that handles URL traversal
"""
"""Core XML support for Python.
This package contains four sub-packages:
dom -- The W3C Document Object Model. This supports DOM Level 1 +
Namespaces.
parsers -- Python wrappers for XML parsers (currently only supports Expat).
sax -- The Simple API for XML, developed by XML-Dev, led by David
Megginson and ported to Python by Lars Marius Garshol. This
supports the SAX 2 API.
etree -- The ElementTree XML library. This is a subset of the full
ElementTree XML release.
"""
__all__ = ["dom", "parsers", "sax", "etree"]
"""Registration facilities for DOM. This module should not be used
directly. Instead, the functions getDOMImplementation and
registerDOMImplementation should be imported from xml.dom."""
# This is a list of well-known implementations. Well-known names
# should be published by posting to [email protected], and are
# subsequently recorded in this file.
import sys
well_known_implementations = {
'minidom':'xml.dom.minidom',
'4DOM': 'xml.dom.DOMImplementation',
}
# DOM implementations not officially registered should register
# themselves with their own names via registerDOMImplementation().
registered = {}
def registerDOMImplementation(name, factory):
"""registerDOMImplementation(name, factory)
Register the factory function with the name. The factory function
should return an object which implements the DOMImplementation
interface. The factory function can either return the same object,
or a new one (e.g. if that implementation supports some
customization)."""
registered[name] = factory
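# Example (editor's sketch): registering a factory under a custom name and
# fetching it back through the public xml.dom entry points. minidom's
# getDOMImplementation function is reused as the factory, since the object it
# returns already provides the DOMImplementation interface.
from xml.dom import registerDOMImplementation, getDOMImplementation
from xml.dom import minidom

registerDOMImplementation("demo-minidom", minidom.getDOMImplementation)
impl = getDOMImplementation("demo-minidom")
doc = impl.createDocument(None, "root", None)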
def _good_enough(dom, features):
"_good_enough(dom, features) -> Return 1 if the dom offers the features"
for f,v in features:
if not dom.hasFeature(f,v):
return 0
return 1
def getDOMImplementation(name=None, features=()):
"""getDOMImplementation(name = None, features = ()) -> DOM implementation.
Return a suitable DOM implementation. The name is either
well-known, the module name of a DOM implementation, or None. If
it is not None, imports the corresponding module and returns
DOMImplementation object if the import succeeds.
If name is not given, consider the available implementations to
find one with the required feature set. If no implementation can
be found, raise an ImportError. The features list must be a sequence
of (feature, version) pairs which are passed to hasFeature."""
import os
creator = None
mod = well_known_implementations.get(name)
if mod:
mod = __import__(mod, {}, {}, ['getDOMImplementation'])
return mod.getDOMImplementation()
elif name:
return registered[name]()
elif not sys.flags.ignore_environment and "PYTHON_DOM" in os.environ:
return getDOMImplementation(name = os.environ["PYTHON_DOM"])
# User did not specify a name, try implementations in arbitrary
# order, returning the one that has the required features
if isinstance(features, str):
features = _parse_feature_string(features)
for creator in registered.values():
dom = creator()
if _good_enough(dom, features):
return dom
for creator in well_known_implementations.keys():
try:
dom = getDOMImplementation(name = creator)
except Exception: # typically ImportError, or AttributeError
continue
if _good_enough(dom, features):
return dom
raise ImportError("no suitable DOM implementation found")
def _parse_feature_string(s):
features = []
parts = s.split()
i = 0
length = len(parts)
while i < length:
feature = parts[i]
if feature[0] in "0123456789":
raise ValueError("bad feature name: %r" % (feature,))
i = i + 1
version = None
if i < length:
v = parts[i]
if v[0] in "0123456789":
i = i + 1
version = v
features.append((feature, version))
return tuple(features)
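# Example (editor's sketch): selecting an implementation by feature set
# rather than by name. A feature string would be parsed by
# _parse_feature_string() above; ("core", "2.0") is satisfied by the bundled
# minidom implementation.
from xml.dom import getDOMImplementation

impl = getDOMImplementation(features=[("core", "2.0"), ("xml", "2.0")])
doc = impl.createDocument(None, "root", None)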
"""Facility to use the Expat parser to load a minidom instance
from a string or file.
This avoids all the overhead of SAX and pulldom to gain performance.
"""
# Warning!
#
# This module is tightly bound to the implementation details of the
# minidom DOM and can't be used with other DOM implementations. This
# is due, in part, to a lack of appropriate methods in the DOM (there is
# no way to create Entity and Notation nodes via the DOM Level 2
# interface), and for performance. The latter is the cause of some fairly
# cryptic code.
#
# Performance hacks:
#
# - .character_data_handler() has an extra case in which continuing
# data is appended to an existing Text node; this can be a
# speedup since pyexpat can break up character data into multiple
# callbacks even though we set the buffer_text attribute on the
# parser. This also gives us the advantage that we don't need a
# separate normalization pass.
#
# - Determining that a node exists is done using an identity comparison
# with None rather than a truth test; this avoids searching for and
# calling any methods on the node object if it exists. (A rather
# nice speedup is achieved this way as well!)
from xml.dom import xmlbuilder, minidom, Node
from xml.dom import EMPTY_NAMESPACE, EMPTY_PREFIX, XMLNS_NAMESPACE
from xml.parsers import expat
from xml.dom.minidom import _append_child, _set_attribute_node
from xml.dom.NodeFilter import NodeFilter
TEXT_NODE = Node.TEXT_NODE
CDATA_SECTION_NODE = Node.CDATA_SECTION_NODE
DOCUMENT_NODE = Node.DOCUMENT_NODE
FILTER_ACCEPT = xmlbuilder.DOMBuilderFilter.FILTER_ACCEPT
FILTER_REJECT = xmlbuilder.DOMBuilderFilter.FILTER_REJECT
FILTER_SKIP = xmlbuilder.DOMBuilderFilter.FILTER_SKIP
FILTER_INTERRUPT = xmlbuilder.DOMBuilderFilter.FILTER_INTERRUPT
theDOMImplementation = minidom.getDOMImplementation()
# Expat typename -> TypeInfo
_typeinfo_map = {
"CDATA": minidom.TypeInfo(None, "cdata"),
"ENUM": minidom.TypeInfo(None, "enumeration"),
"ENTITY": minidom.TypeInfo(None, "entity"),
"ENTITIES": minidom.TypeInfo(None, "entities"),
"ID": minidom.TypeInfo(None, "id"),
"IDREF": minidom.TypeInfo(None, "idref"),
"IDREFS": minidom.TypeInfo(None, "idrefs"),
"NMTOKEN": minidom.TypeInfo(None, "nmtoken"),
"NMTOKENS": minidom.TypeInfo(None, "nmtokens"),
}
class ElementInfo(object):
__slots__ = '_attr_info', '_model', 'tagName'
def __init__(self, tagName, model=None):
self.tagName = tagName
self._attr_info = []
self._model = model
def __getstate__(self):
return self._attr_info, self._model, self.tagName
def __setstate__(self, state):
self._attr_info, self._model, self.tagName = state
def getAttributeType(self, aname):
for info in self._attr_info:
if info[1] == aname:
t = info[-2]
if t[0] == "(":
return _typeinfo_map["ENUM"]
else:
return _typeinfo_map[info[-2]]
return minidom._no_type
def getAttributeTypeNS(self, namespaceURI, localName):
return minidom._no_type
def isElementContent(self):
if self._model:
type = self._model[0]
return type not in (expat.model.XML_CTYPE_ANY,
expat.model.XML_CTYPE_MIXED)
else:
return False
def isEmpty(self):
if self._model:
return self._model[0] == expat.model.XML_CTYPE_EMPTY
else:
return False
def isId(self, aname):
for info in self._attr_info:
if info[1] == aname:
return info[-2] == "ID"
return False
def isIdNS(self, euri, ename, auri, aname):
# not sure this is meaningful
return self.isId((auri, aname))
def _intern(builder, s):
return builder._intern_setdefault(s, s)
def _parse_ns_name(builder, name):
assert ' ' in name
parts = name.split(' ')
intern = builder._intern_setdefault
if len(parts) == 3:
uri, localname, prefix = parts
prefix = intern(prefix, prefix)
qname = "%s:%s" % (prefix, localname)
qname = intern(qname, qname)
localname = intern(localname, localname)
elif len(parts) == 2:
uri, localname = parts
prefix = EMPTY_PREFIX
qname = localname = intern(localname, localname)
else:
raise ValueError("Unsupported syntax: spaces in URIs not supported: %r" % name)
return intern(uri, uri), localname, prefix, qname
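# Example (editor's sketch): what _parse_ns_name() unpacks. With
# namespace_separator=" " and namespace_prefixes=True, expat reports
# namespaced names as "uri localname" or "uri localname prefix".
from xml.parsers import expat as _expat_demo

_p = _expat_demo.ParserCreate(namespace_separator=" ")
_p.namespace_prefixes = True
_seen = []
_p.StartElementHandler = lambda name, attrs: _seen.append(name)
_p.Parse("<a xmlns:x='urn:demo'><x:b/></a>", True)
# _seen == ['a', 'urn:demo b x']  ->  split into (uri, localname, prefix)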
class ExpatBuilder:
"""Document builder that uses Expat to build a ParsedXML.DOM document
instance."""
def __init__(self, options=None):
if options is None:
options = xmlbuilder.Options()
self._options = options
if self._options.filter is not None:
self._filter = FilterVisibilityController(self._options.filter)
else:
self._filter = None
# This *really* doesn't do anything in this case, so
# override it with something fast & minimal.
self._finish_start_element = id
self._parser = None
self.reset()
def createParser(self):
"""Create a new parser object."""
return expat.ParserCreate()
def getParser(self):
"""Return the parser object, creating a new one if needed."""
if not self._parser:
self._parser = self.createParser()
self._intern_setdefault = self._parser.intern.setdefault
self._parser.buffer_text = True
self._parser.ordered_attributes = True
self._parser.specified_attributes = True
self.install(self._parser)
return self._parser
def reset(self):
"""Free all data structures used during DOM construction."""
self.document = theDOMImplementation.createDocument(
EMPTY_NAMESPACE, None, None)
self.curNode = self.document
self._elem_info = self.document._elem_info
self._cdata = False
def install(self, parser):
"""Install the callbacks needed to build the DOM into the parser."""
# This creates circular references!
parser.StartDoctypeDeclHandler = self.start_doctype_decl_handler
parser.StartElementHandler = self.first_element_handler
parser.EndElementHandler = self.end_element_handler
parser.ProcessingInstructionHandler = self.pi_handler
if self._options.entities:
parser.EntityDeclHandler = self.entity_decl_handler
parser.NotationDeclHandler = self.notation_decl_handler
if self._options.comments:
parser.CommentHandler = self.comment_handler
if self._options.cdata_sections:
parser.StartCdataSectionHandler = self.start_cdata_section_handler
parser.EndCdataSectionHandler = self.end_cdata_section_handler
parser.CharacterDataHandler = self.character_data_handler_cdata
else:
parser.CharacterDataHandler = self.character_data_handler
parser.ExternalEntityRefHandler = self.external_entity_ref_handler
parser.XmlDeclHandler = self.xml_decl_handler
parser.ElementDeclHandler = self.element_decl_handler
parser.AttlistDeclHandler = self.attlist_decl_handler
def parseFile(self, file):
"""Parse a document from a file object, returning the document
node."""
parser = self.getParser()
first_buffer = True
try:
while 1:
buffer = file.read(16*1024)
if not buffer:
break
parser.Parse(buffer, 0)
if first_buffer and self.document.documentElement:
self._setup_subset(buffer)
first_buffer = False
parser.Parse("", True)
except ParseEscape:
pass
doc = self.document
self.reset()
self._parser = None
return doc
def parseString(self, string):
"""Parse a document from a string, returning the document node."""
parser = self.getParser()
try:
parser.Parse(string, True)
self._setup_subset(string)
except ParseEscape:
pass
doc = self.document
self.reset()
self._parser = None
return doc
def _setup_subset(self, buffer):
"""Load the internal subset if there might be one."""
if self.document.doctype:
extractor = InternalSubsetExtractor()
extractor.parseString(buffer)
subset = extractor.getSubset()
self.document.doctype.internalSubset = subset
def start_doctype_decl_handler(self, doctypeName, systemId, publicId,
has_internal_subset):
doctype = self.document.implementation.createDocumentType(
doctypeName, publicId, systemId)
doctype.ownerDocument = self.document
_append_child(self.document, doctype)
self.document.doctype = doctype
if self._filter and self._filter.acceptNode(doctype) == FILTER_REJECT:
self.document.doctype = None
del self.document.childNodes[-1]
doctype = None
self._parser.EntityDeclHandler = None
self._parser.NotationDeclHandler = None
if has_internal_subset:
if doctype is not None:
doctype.entities._seq = []
doctype.notations._seq = []
self._parser.CommentHandler = None
self._parser.ProcessingInstructionHandler = None
self._parser.EndDoctypeDeclHandler = self.end_doctype_decl_handler
def end_doctype_decl_handler(self):
if self._options.comments:
self._parser.CommentHandler = self.comment_handler
self._parser.ProcessingInstructionHandler = self.pi_handler
if not (self._elem_info or self._filter):
self._finish_end_element = id
def pi_handler(self, target, data):
node = self.document.createProcessingInstruction(target, data)
_append_child(self.curNode, node)
if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
self.curNode.removeChild(node)
def character_data_handler_cdata(self, data):
childNodes = self.curNode.childNodes
if self._cdata:
if ( self._cdata_continue
and childNodes[-1].nodeType == CDATA_SECTION_NODE):
childNodes[-1].appendData(data)
return
node = self.document.createCDATASection(data)
self._cdata_continue = True
elif childNodes and childNodes[-1].nodeType == TEXT_NODE:
node = childNodes[-1]
value = node.data + data
node.data = value
return
else:
node = minidom.Text()
node.data = data
node.ownerDocument = self.document
_append_child(self.curNode, node)
def character_data_handler(self, data):
childNodes = self.curNode.childNodes
if childNodes and childNodes[-1].nodeType == TEXT_NODE:
node = childNodes[-1]
node.data = node.data + data
return
node = minidom.Text()
node.data = node.data + data
node.ownerDocument = self.document
_append_child(self.curNode, node)
def entity_decl_handler(self, entityName, is_parameter_entity, value,
base, systemId, publicId, notationName):
if is_parameter_entity:
# we don't care about parameter entities for the DOM
return
if not self._options.entities:
return
node = self.document._create_entity(entityName, publicId,
systemId, notationName)
if value is not None:
# internal entity
# node *should* be readonly, but we'll cheat
child = self.document.createTextNode(value)
node.childNodes.append(child)
self.document.doctype.entities._seq.append(node)
if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
del self.document.doctype.entities._seq[-1]
def notation_decl_handler(self, notationName, base, systemId, publicId):
node = self.document._create_notation(notationName, publicId, systemId)
self.document.doctype.notations._seq.append(node)
        # Mirror entity_decl_handler(): drop the node if the filter rejects it.
        if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
            del self.document.doctype.notations._seq[-1]
def comment_handler(self, data):
node = self.document.createComment(data)
_append_child(self.curNode, node)
if self._filter and self._filter.acceptNode(node) == FILTER_REJECT:
self.curNode.removeChild(node)
def start_cdata_section_handler(self):
self._cdata = True
self._cdata_continue = False
def end_cdata_section_handler(self):
self._cdata = False
self._cdata_continue = False
def external_entity_ref_handler(self, context, base, systemId, publicId):
return 1
def first_element_handler(self, name, attributes):
if self._filter is None and not self._elem_info:
self._finish_end_element = id
self.getParser().StartElementHandler = self.start_element_handler
self.start_element_handler(name, attributes)
def start_element_handler(self, name, attributes):
node = self.document.createElement(name)
_append_child(self.curNode, node)
self.curNode = node
if attributes:
for i in range(0, len(attributes), 2):
a = minidom.Attr(attributes[i], EMPTY_NAMESPACE,
None, EMPTY_PREFIX)
value = attributes[i+1]
a.value = value
a.ownerDocument = self.document
_set_attribute_node(node, a)
if node is not self.document.documentElement:
self._finish_start_element(node)
def _finish_start_element(self, node):
if self._filter:
# To be general, we'd have to call isSameNode(), but this
# is sufficient for minidom:
if node is self.document.documentElement:
return
filt = self._filter.startContainer(node)
            if filt == FILTER_REJECT:
                # ignore this node & all descendants
                Rejecter(self)
            elif filt == FILTER_SKIP:
                # ignore this node, but make its children become
                # children of the parent node
                Skipper(self)
else:
return
self.curNode = node.parentNode
node.parentNode.removeChild(node)
node.unlink()
# If this ever changes, Namespaces.end_element_handler() needs to
# be changed to match.
#
def end_element_handler(self, name):
curNode = self.curNode
self.curNode = curNode.parentNode
self._finish_end_element(curNode)
def _finish_end_element(self, curNode):
info = self._elem_info.get(curNode.tagName)
if info:
self._handle_white_text_nodes(curNode, info)
if self._filter:
if curNode is self.document.documentElement:
return
if self._filter.acceptNode(curNode) == FILTER_REJECT:
self.curNode.removeChild(curNode)
curNode.unlink()
def _handle_white_text_nodes(self, node, info):
if (self._options.whitespace_in_element_content
or not info.isElementContent()):
return
# We have element type information and should remove ignorable
# whitespace; identify for text nodes which contain only
# whitespace.
L = []
for child in node.childNodes:
if child.nodeType == TEXT_NODE and not child.data.strip():
L.append(child)
# Remove ignorable whitespace from the tree.
for child in L:
node.removeChild(child)
def element_decl_handler(self, name, model):
info = self._elem_info.get(name)
if info is None:
self._elem_info[name] = ElementInfo(name, model)
else:
assert info._model is None
info._model = model
def attlist_decl_handler(self, elem, name, type, default, required):
info = self._elem_info.get(elem)
if info is None:
info = ElementInfo(elem)
self._elem_info[elem] = info
info._attr_info.append(
[None, name, None, None, default, 0, type, required])
def xml_decl_handler(self, version, encoding, standalone):
self.document.version = version
self.document.encoding = encoding
# This is still a little ugly, thanks to the pyexpat API. ;-(
if standalone >= 0:
if standalone:
self.document.standalone = True
else:
self.document.standalone = False
# Don't include FILTER_INTERRUPT, since that's checked separately
# where allowed.
_ALLOWED_FILTER_RETURNS = (FILTER_ACCEPT, FILTER_REJECT, FILTER_SKIP)
class FilterVisibilityController(object):
"""Wrapper around a DOMBuilderFilter which implements the checks
to make the whatToShow filter attribute work."""
__slots__ = 'filter',
def __init__(self, filter):
self.filter = filter
def startContainer(self, node):
mask = self._nodetype_mask[node.nodeType]
if self.filter.whatToShow & mask:
val = self.filter.startContainer(node)
if val == FILTER_INTERRUPT:
raise ParseEscape
if val not in _ALLOWED_FILTER_RETURNS:
raise ValueError(
"startContainer() returned illegal value: " + repr(val))
return val
else:
return FILTER_ACCEPT
def acceptNode(self, node):
mask = self._nodetype_mask[node.nodeType]
if self.filter.whatToShow & mask:
val = self.filter.acceptNode(node)
if val == FILTER_INTERRUPT:
raise ParseEscape
if val == FILTER_SKIP:
# move all child nodes to the parent, and remove this node
parent = node.parentNode
for child in node.childNodes[:]:
parent.appendChild(child)
# node is handled by the caller
return FILTER_REJECT
if val not in _ALLOWED_FILTER_RETURNS:
raise ValueError(
"acceptNode() returned illegal value: " + repr(val))
return val
else:
return FILTER_ACCEPT
_nodetype_mask = {
Node.ELEMENT_NODE: NodeFilter.SHOW_ELEMENT,
Node.ATTRIBUTE_NODE: NodeFilter.SHOW_ATTRIBUTE,
Node.TEXT_NODE: NodeFilter.SHOW_TEXT,
Node.CDATA_SECTION_NODE: NodeFilter.SHOW_CDATA_SECTION,
Node.ENTITY_REFERENCE_NODE: NodeFilter.SHOW_ENTITY_REFERENCE,
Node.ENTITY_NODE: NodeFilter.SHOW_ENTITY,
Node.PROCESSING_INSTRUCTION_NODE: NodeFilter.SHOW_PROCESSING_INSTRUCTION,
Node.COMMENT_NODE: NodeFilter.SHOW_COMMENT,
Node.DOCUMENT_NODE: NodeFilter.SHOW_DOCUMENT,
Node.DOCUMENT_TYPE_NODE: NodeFilter.SHOW_DOCUMENT_TYPE,
Node.DOCUMENT_FRAGMENT_NODE: NodeFilter.SHOW_DOCUMENT_FRAGMENT,
Node.NOTATION_NODE: NodeFilter.SHOW_NOTATION,
}
class FilterCrutch(object):
__slots__ = '_builder', '_level', '_old_start', '_old_end'
def __init__(self, builder):
self._level = 0
self._builder = builder
parser = builder._parser
self._old_start = parser.StartElementHandler
self._old_end = parser.EndElementHandler
parser.StartElementHandler = self.start_element_handler
parser.EndElementHandler = self.end_element_handler
class Rejecter(FilterCrutch):
__slots__ = ()
def __init__(self, builder):
FilterCrutch.__init__(self, builder)
parser = builder._parser
for name in ("ProcessingInstructionHandler",
"CommentHandler",
"CharacterDataHandler",
"StartCdataSectionHandler",
"EndCdataSectionHandler",
"ExternalEntityRefHandler",
):
setattr(parser, name, None)
def start_element_handler(self, *args):
self._level = self._level + 1
def end_element_handler(self, *args):
if self._level == 0:
# restore the old handlers
parser = self._builder._parser
self._builder.install(parser)
parser.StartElementHandler = self._old_start
parser.EndElementHandler = self._old_end
else:
self._level = self._level - 1
class Skipper(FilterCrutch):
__slots__ = ()
def start_element_handler(self, *args):
node = self._builder.curNode
self._old_start(*args)
if self._builder.curNode is not node:
self._level = self._level + 1
def end_element_handler(self, *args):
if self._level == 0:
# We're popping back out of the node we're skipping, so we
# shouldn't need to do anything but reset the handlers.
self._builder._parser.StartElementHandler = self._old_start
self._builder._parser.EndElementHandler = self._old_end
self._builder = None
else:
self._level = self._level - 1
self._old_end(*args)
# framework document used by the fragment builder.
# Takes a string for the doctype, subset string, and namespace attrs string.
_FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID = \
"http://xml.python.org/entities/fragment-builder/internal"
_FRAGMENT_BUILDER_TEMPLATE = (
'''\
<!DOCTYPE wrapper
%%s [
<!ENTITY fragment-builder-internal
SYSTEM "%s">
%%s
]>
<wrapper %%s
>&fragment-builder-internal;</wrapper>'''
% _FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID)
class FragmentBuilder(ExpatBuilder):
"""Builder which constructs document fragments given XML source
text and a context node.
The context node is expected to provide information about the
namespace declarations which are in scope at the start of the
fragment.
"""
def __init__(self, context, options=None):
if context.nodeType == DOCUMENT_NODE:
self.originalDocument = context
self.context = context
else:
self.originalDocument = context.ownerDocument
self.context = context
ExpatBuilder.__init__(self, options)
def reset(self):
ExpatBuilder.reset(self)
self.fragment = None
def parseFile(self, file):
"""Parse a document fragment from a file object, returning the
fragment node."""
return self.parseString(file.read())
def parseString(self, string):
"""Parse a document fragment from a string, returning the
fragment node."""
self._source = string
parser = self.getParser()
doctype = self.originalDocument.doctype
ident = ""
if doctype:
subset = doctype.internalSubset or self._getDeclarations()
if doctype.publicId:
ident = ('PUBLIC "%s" "%s"'
% (doctype.publicId, doctype.systemId))
elif doctype.systemId:
ident = 'SYSTEM "%s"' % doctype.systemId
else:
subset = ""
nsattrs = self._getNSattrs() # get ns decls from node's ancestors
document = _FRAGMENT_BUILDER_TEMPLATE % (ident, subset, nsattrs)
try:
parser.Parse(document, 1)
except:
self.reset()
raise
fragment = self.fragment
self.reset()
## self._parser = None
return fragment
def _getDeclarations(self):
"""Re-create the internal subset from the DocumentType node.
This is only needed if we don't already have the
internalSubset as a string.
"""
doctype = self.context.ownerDocument.doctype
s = ""
if doctype:
for i in range(doctype.notations.length):
notation = doctype.notations.item(i)
if s:
s = s + "\n "
s = "%s<!NOTATION %s" % (s, notation.nodeName)
if notation.publicId:
s = '%s PUBLIC "%s"\n "%s">' \
% (s, notation.publicId, notation.systemId)
else:
s = '%s SYSTEM "%s">' % (s, notation.systemId)
for i in range(doctype.entities.length):
entity = doctype.entities.item(i)
if s:
s = s + "\n "
s = "%s<!ENTITY %s" % (s, entity.nodeName)
if entity.publicId:
s = '%s PUBLIC "%s"\n "%s"' \
% (s, entity.publicId, entity.systemId)
elif entity.systemId:
s = '%s SYSTEM "%s"' % (s, entity.systemId)
else:
s = '%s "%s"' % (s, entity.firstChild.data)
if entity.notationName:
s = "%s NOTATION %s" % (s, entity.notationName)
s = s + ">"
return s
def _getNSattrs(self):
return ""
def external_entity_ref_handler(self, context, base, systemId, publicId):
if systemId == _FRAGMENT_BUILDER_INTERNAL_SYSTEM_ID:
# this entref is the one that we made to put the subtree
# in; all of our given input is parsed in here.
old_document = self.document
old_cur_node = self.curNode
parser = self._parser.ExternalEntityParserCreate(context)
# put the real document back, parse into the fragment to return
self.document = self.originalDocument
self.fragment = self.document.createDocumentFragment()
self.curNode = self.fragment
try:
parser.Parse(self._source, 1)
finally:
self.curNode = old_cur_node
self.document = old_document
self._source = None
return -1
else:
return ExpatBuilder.external_entity_ref_handler(
self, context, base, systemId, publicId)
class Namespaces:
"""Mix-in class for builders; adds support for namespaces."""
def _initNamespaces(self):
# list of (prefix, uri) ns declarations. Namespace attrs are
# constructed from this and added to the element's attrs.
self._ns_ordered_prefixes = []
def createParser(self):
"""Create a new namespace-handling parser."""
parser = expat.ParserCreate(namespace_separator=" ")
parser.namespace_prefixes = True
return parser
def install(self, parser):
"""Insert the namespace-handlers onto the parser."""
ExpatBuilder.install(self, parser)
if self._options.namespace_declarations:
parser.StartNamespaceDeclHandler = (
self.start_namespace_decl_handler)
def start_namespace_decl_handler(self, prefix, uri):
"""Push this namespace declaration on our storage."""
self._ns_ordered_prefixes.append((prefix, uri))
def start_element_handler(self, name, attributes):
if ' ' in name:
uri, localname, prefix, qname = _parse_ns_name(self, name)
else:
uri = EMPTY_NAMESPACE
qname = name
localname = None
prefix = EMPTY_PREFIX
node = minidom.Element(qname, uri, prefix, localname)
node.ownerDocument = self.document
_append_child(self.curNode, node)
self.curNode = node
if self._ns_ordered_prefixes:
for prefix, uri in self._ns_ordered_prefixes:
if prefix:
a = minidom.Attr(_intern(self, 'xmlns:' + prefix),
XMLNS_NAMESPACE, prefix, "xmlns")
else:
a = minidom.Attr("xmlns", XMLNS_NAMESPACE,
"xmlns", EMPTY_PREFIX)
a.value = uri
a.ownerDocument = self.document
_set_attribute_node(node, a)
del self._ns_ordered_prefixes[:]
if attributes:
node._ensure_attributes()
_attrs = node._attrs
_attrsNS = node._attrsNS
for i in range(0, len(attributes), 2):
aname = attributes[i]
value = attributes[i+1]
if ' ' in aname:
uri, localname, prefix, qname = _parse_ns_name(self, aname)
a = minidom.Attr(qname, uri, localname, prefix)
_attrs[qname] = a
_attrsNS[(uri, localname)] = a
else:
a = minidom.Attr(aname, EMPTY_NAMESPACE,
aname, EMPTY_PREFIX)
_attrs[aname] = a
_attrsNS[(EMPTY_NAMESPACE, aname)] = a
a.ownerDocument = self.document
a.value = value
a.ownerElement = node
if __debug__:
# This only adds some asserts to the original
# end_element_handler(), so we only define this when -O is not
# used. If changing one, be sure to check the other to see if
# it needs to be changed as well.
#
def end_element_handler(self, name):
curNode = self.curNode
if ' ' in name:
uri, localname, prefix, qname = _parse_ns_name(self, name)
assert (curNode.namespaceURI == uri
and curNode.localName == localname
and curNode.prefix == prefix), \
"element stack messed up! (namespace)"
else:
assert curNode.nodeName == name, \
"element stack messed up - bad nodeName"
assert curNode.namespaceURI == EMPTY_NAMESPACE, \
"element stack messed up - bad namespaceURI"
self.curNode = curNode.parentNode
self._finish_end_element(curNode)
class ExpatBuilderNS(Namespaces, ExpatBuilder):
"""Document builder that supports namespaces."""
def reset(self):
ExpatBuilder.reset(self)
self._initNamespaces()
class FragmentBuilderNS(Namespaces, FragmentBuilder):
"""Fragment builder that supports namespaces."""
def reset(self):
FragmentBuilder.reset(self)
self._initNamespaces()
def _getNSattrs(self):
"""Return string of namespace attributes from this element and
ancestors."""
# XXX This needs to be re-written to walk the ancestors of the
# context to build up the namespace information from
# declarations, elements, and attributes found in context.
# Otherwise we have to store a bunch more data on the DOM
# (though that *might* be more reliable -- not clear).
attrs = ""
context = self.context
L = []
while context:
if hasattr(context, '_ns_prefix_uri'):
for prefix, uri in context._ns_prefix_uri.items():
# add every new NS decl from context to L and attrs string
if prefix in L:
continue
L.append(prefix)
if prefix:
declname = "xmlns:" + prefix
else:
declname = "xmlns"
if attrs:
attrs = "%s\n %s='%s'" % (attrs, declname, uri)
else:
attrs = " %s='%s'" % (declname, uri)
context = context.parentNode
return attrs
class ParseEscape(Exception):
"""Exception raised to short-circuit parsing in InternalSubsetExtractor."""
pass
class InternalSubsetExtractor(ExpatBuilder):
"""XML processor which can rip out the internal document type subset."""
subset = None
def getSubset(self):
"""Return the internal subset as a string."""
return self.subset
def parseFile(self, file):
try:
ExpatBuilder.parseFile(self, file)
except ParseEscape:
pass
def parseString(self, string):
try:
ExpatBuilder.parseString(self, string)
except ParseEscape:
pass
def install(self, parser):
parser.StartDoctypeDeclHandler = self.start_doctype_decl_handler
parser.StartElementHandler = self.start_element_handler
def start_doctype_decl_handler(self, name, publicId, systemId,
has_internal_subset):
if has_internal_subset:
parser = self.getParser()
self.subset = []
parser.DefaultHandler = self.subset.append
parser.EndDoctypeDeclHandler = self.end_doctype_decl_handler
else:
raise ParseEscape()
def end_doctype_decl_handler(self):
s = ''.join(self.subset).replace('\r\n', '\n').replace('\r', '\n')
self.subset = s
raise ParseEscape()
def start_element_handler(self, name, attrs):
raise ParseEscape()
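# Example (editor's sketch): using the extractor above to rip the internal
# DTD subset out of a document without building a full DOM tree.
extractor = InternalSubsetExtractor()
extractor.parseString("<!DOCTYPE r [<!ENTITY e 'x'>]><r/>")
print(extractor.getSubset())  # "<!ENTITY e 'x'>" (surrounding whitespace may vary)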
def parse(file, namespaces=True):
"""Parse a document, returning the resulting Document node.
'file' may be either a file name or an open file object.
"""
if namespaces:
builder = ExpatBuilderNS()
else:
builder = ExpatBuilder()
if isinstance(file, str):
with open(file, 'rb') as fp:
result = builder.parseFile(fp)
else:
result = builder.parseFile(file)
return result
def parseString(string, namespaces=True):
"""Parse a document from a string, returning the resulting
Document node.
"""
if namespaces:
builder = ExpatBuilderNS()
else:
builder = ExpatBuilder()
return builder.parseString(string)
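# Example (editor's sketch): the module-level helpers in use. This is the
# same machinery xml.dom.minidom.parseString() drives under the hood.
demo_doc = parseString("<root a='1'><child/></root>")
print(demo_doc.documentElement.tagName)            # 'root'
print(demo_doc.documentElement.getAttribute('a'))  # '1'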
def parseFragment(file, context, namespaces=True):
"""Parse a fragment of a document, given the context from which it
was originally extracted. context should be the parent of the
node(s) which are in the fragment.
'file' may be either a file name or an open file object.
"""
if namespaces:
builder = FragmentBuilderNS(context)
else:
builder = FragmentBuilder(context)
if isinstance(file, str):
with open(file, 'rb') as fp:
result = builder.parseFile(fp)
else:
result = builder.parseFile(file)
return result
def parseFragmentString(string, context, namespaces=True):
"""Parse a fragment of a document from a string, given the context
from which it was originally extracted. context should be the
parent of the node(s) which are in the fragment.
"""
if namespaces:
builder = FragmentBuilderNS(context)
else:
builder = FragmentBuilder(context)
return builder.parseString(string)
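# Example (editor's sketch): parsing a fragment in the context of an existing
# node, then grafting it in. The wrapper/entity trick shown in
# _FRAGMENT_BUILDER_TEMPLATE above happens behind the scenes.
ctx_doc = parseString("<list></list>")
holder = ctx_doc.documentElement
frag = parseFragmentString("<item/>and text", holder)
holder.appendChild(frag)  # the fragment's children are spliced into <list>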
def makeBuilder(options):
"""Create a builder based on an Options object."""
if options.namespaces:
return ExpatBuilderNS(options)
else:
return ExpatBuilder(options)
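# Example (editor's sketch): driving the filter machinery above
# (FilterVisibilityController and friends) through makeBuilder(). The filter
# drops every comment node while the rest of the tree is built normally.
from xml.dom import xmlbuilder as _xb
from xml.dom.NodeFilter import NodeFilter as _NF

class _DropComments(_xb.DOMBuilderFilter):
    whatToShow = _NF.SHOW_COMMENT  # only comments are offered to the filter
    def acceptNode(self, node):
        return self.FILTER_REJECT

_opts = _xb.Options()
_opts.filter = _DropComments()
_doc = makeBuilder(_opts).parseString("<r><!-- dropped --><k/></r>")
# _doc.documentElement.childNodes now contains only the <k/> element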
"""Python version compatibility support for minidom.
This module contains internal implementation details and
should not be imported; use xml.dom.minidom instead.
"""
# This module should only be imported using "import *".
#
# The following names are defined:
#
# NodeList -- lightest possible NodeList implementation
#
# EmptyNodeList -- lightest possible NodeList that is guaranteed to
# remain empty (immutable)
#
# StringTypes -- tuple of defined string types
#
# defproperty -- function used in conjunction with GetattrMagic;
# using these together is needed to make them work
# as efficiently as possible in both Python 2.2+
# and older versions. For example:
#
# class MyClass(GetattrMagic):
# def _get_myattr(self):
# return something
#
# defproperty(MyClass, "myattr",
# "return some value")
#
# For Python 2.2 and newer, this will construct a
# property object on the class, which avoids
# needing to override __getattr__(). It will only
# work for read-only attributes.
#
# For older versions of Python, inheriting from
# GetattrMagic will use the traditional
# __getattr__() hackery to achieve the same effect,
# but less efficiently.
#
# defproperty() should be used for each version of
# the relevant _get_<property>() function.
__all__ = ["NodeList", "EmptyNodeList", "StringTypes", "defproperty"]
import xml.dom
StringTypes = (str,)
class NodeList(list):
__slots__ = ()
def item(self, index):
if 0 <= index < len(self):
return self[index]
def _get_length(self):
return len(self)
def _set_length(self, value):
raise xml.dom.NoModificationAllowedErr(
"attempt to modify read-only attribute 'length'")
length = property(_get_length, _set_length,
doc="The number of nodes in the NodeList.")
# For backward compatibility
def __setstate__(self, state):
if state is None:
state = []
self[:] = state
class EmptyNodeList(tuple):
__slots__ = ()
def __add__(self, other):
NL = NodeList()
NL.extend(other)
return NL
def __radd__(self, other):
NL = NodeList()
NL.extend(other)
return NL
def item(self, index):
return None
def _get_length(self):
return 0
def _set_length(self, value):
raise xml.dom.NoModificationAllowedErr(
"attempt to modify read-only attribute 'length'")
length = property(_get_length, _set_length,
doc="The number of nodes in the NodeList.")
def defproperty(klass, name, doc):
get = getattr(klass, ("_get_" + name))
def set(self, value, name=name):
raise xml.dom.NoModificationAllowedErr(
"attempt to modify read-only attribute " + repr(name))
assert not hasattr(klass, "_set_" + name), \
"expected not to find _set_" + name
prop = property(get, set, doc=doc)
setattr(klass, name, prop)
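# Example (editor's sketch): what defproperty() buys a class. _Demo is
# hypothetical; the pattern matches how Node, Attr, etc. use it below.
class _Demo:
    def _get_size(self):
        return 42

defproperty(_Demo, "size", doc="Read-only size.")
# _Demo().size == 42; assignment raises xml.dom.NoModificationAllowedErr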
"""Simple implementation of the Level 1 DOM.
Namespaces and other minor Level 2 features are also supported.
parse("foo.xml")
parseString("<foo><bar/></foo>")
Todo:
=====
* convenience methods for getting elements and text.
* more testing
* bring some of the writer and linearizer code into conformance with this
interface
* SAX 2 namespaces
"""
import io
import xml.dom
from xml.dom import EMPTY_NAMESPACE, EMPTY_PREFIX, XMLNS_NAMESPACE, domreg
from xml.dom.minicompat import *
from xml.dom.xmlbuilder import DOMImplementationLS, DocumentLS
# This is used by the ID-cache invalidation checks; the list isn't
# actually complete, since the nodes being checked will never be the
# DOCUMENT_NODE or DOCUMENT_FRAGMENT_NODE. (The node being checked is
# the node being added or removed, not the node being modified.)
#
_nodeTypes_with_children = (xml.dom.Node.ELEMENT_NODE,
xml.dom.Node.ENTITY_REFERENCE_NODE)
class Node(xml.dom.Node):
namespaceURI = None # this is non-null only for elements and attributes
parentNode = None
ownerDocument = None
nextSibling = None
previousSibling = None
prefix = EMPTY_PREFIX # non-null only for NS elements and attributes
def __bool__(self):
return True
def toxml(self, encoding=None):
return self.toprettyxml("", "", encoding)
def toprettyxml(self, indent="\t", newl="\n", encoding=None):
if encoding is None:
writer = io.StringIO()
else:
writer = io.TextIOWrapper(io.BytesIO(),
encoding=encoding,
errors="xmlcharrefreplace",
newline='\n')
if self.nodeType == Node.DOCUMENT_NODE:
# Can pass encoding only to document, to put it into XML header
self.writexml(writer, "", indent, newl, encoding)
else:
self.writexml(writer, "", indent, newl)
if encoding is None:
return writer.getvalue()
else:
return writer.detach().getvalue()
def hasChildNodes(self):
return bool(self.childNodes)
def _get_childNodes(self):
return self.childNodes
def _get_firstChild(self):
if self.childNodes:
return self.childNodes[0]
def _get_lastChild(self):
if self.childNodes:
return self.childNodes[-1]
def insertBefore(self, newChild, refChild):
if newChild.nodeType == self.DOCUMENT_FRAGMENT_NODE:
for c in tuple(newChild.childNodes):
self.insertBefore(c, refChild)
### The DOM does not clearly specify what to return in this case
return newChild
if newChild.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(newChild), repr(self)))
if newChild.parentNode is not None:
newChild.parentNode.removeChild(newChild)
if refChild is None:
self.appendChild(newChild)
else:
try:
index = self.childNodes.index(refChild)
except ValueError:
raise xml.dom.NotFoundErr()
if newChild.nodeType in _nodeTypes_with_children:
_clear_id_cache(self)
self.childNodes.insert(index, newChild)
newChild.nextSibling = refChild
refChild.previousSibling = newChild
if index:
node = self.childNodes[index-1]
node.nextSibling = newChild
newChild.previousSibling = node
else:
newChild.previousSibling = None
newChild.parentNode = self
return newChild
def appendChild(self, node):
if node.nodeType == self.DOCUMENT_FRAGMENT_NODE:
for c in tuple(node.childNodes):
self.appendChild(c)
### The DOM does not clearly specify what to return in this case
return node
if node.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(node), repr(self)))
elif node.nodeType in _nodeTypes_with_children:
_clear_id_cache(self)
if node.parentNode is not None:
node.parentNode.removeChild(node)
_append_child(self, node)
node.nextSibling = None
return node
def replaceChild(self, newChild, oldChild):
if newChild.nodeType == self.DOCUMENT_FRAGMENT_NODE:
refChild = oldChild.nextSibling
self.removeChild(oldChild)
return self.insertBefore(newChild, refChild)
if newChild.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(newChild), repr(self)))
if newChild is oldChild:
return
if newChild.parentNode is not None:
newChild.parentNode.removeChild(newChild)
try:
index = self.childNodes.index(oldChild)
except ValueError:
raise xml.dom.NotFoundErr()
self.childNodes[index] = newChild
newChild.parentNode = self
oldChild.parentNode = None
if (newChild.nodeType in _nodeTypes_with_children
or oldChild.nodeType in _nodeTypes_with_children):
_clear_id_cache(self)
newChild.nextSibling = oldChild.nextSibling
newChild.previousSibling = oldChild.previousSibling
oldChild.nextSibling = None
oldChild.previousSibling = None
if newChild.previousSibling:
newChild.previousSibling.nextSibling = newChild
if newChild.nextSibling:
newChild.nextSibling.previousSibling = newChild
return oldChild
def removeChild(self, oldChild):
try:
self.childNodes.remove(oldChild)
except ValueError:
raise xml.dom.NotFoundErr()
if oldChild.nextSibling is not None:
oldChild.nextSibling.previousSibling = oldChild.previousSibling
if oldChild.previousSibling is not None:
oldChild.previousSibling.nextSibling = oldChild.nextSibling
oldChild.nextSibling = oldChild.previousSibling = None
if oldChild.nodeType in _nodeTypes_with_children:
_clear_id_cache(self)
oldChild.parentNode = None
return oldChild
def normalize(self):
L = []
for child in self.childNodes:
if child.nodeType == Node.TEXT_NODE:
if not child.data:
# empty text node; discard
if L:
L[-1].nextSibling = child.nextSibling
if child.nextSibling:
child.nextSibling.previousSibling = child.previousSibling
child.unlink()
elif L and L[-1].nodeType == child.nodeType:
# collapse text node
node = L[-1]
node.data = node.data + child.data
node.nextSibling = child.nextSibling
if child.nextSibling:
child.nextSibling.previousSibling = node
child.unlink()
else:
L.append(child)
else:
L.append(child)
if child.nodeType == Node.ELEMENT_NODE:
child.normalize()
self.childNodes[:] = L
def cloneNode(self, deep):
return _clone_node(self, deep, self.ownerDocument or self)
def isSupported(self, feature, version):
return self.ownerDocument.implementation.hasFeature(feature, version)
def _get_localName(self):
        # Overridden in Element and Attr, where localName can be non-null
return None
# Node interfaces from Level 3 (WD 9 April 2002)
def isSameNode(self, other):
return self is other
def getInterface(self, feature):
if self.isSupported(feature, None):
return self
else:
return None
# The "user data" functions use a dictionary that is only present
# if some user data has been set, so be careful not to assume it
# exists.
def getUserData(self, key):
try:
return self._user_data[key][0]
except (AttributeError, KeyError):
return None
def setUserData(self, key, data, handler):
old = None
try:
d = self._user_data
except AttributeError:
d = {}
self._user_data = d
if key in d:
old = d[key][0]
if data is None:
# ignore handlers passed for None
handler = None
if old is not None:
del d[key]
else:
d[key] = (data, handler)
return old
def _call_user_data_handler(self, operation, src, dst):
if hasattr(self, "_user_data"):
for key, (data, handler) in list(self._user_data.items()):
if handler is not None:
handler.handle(operation, key, data, src, dst)
# minidom-specific API:
def unlink(self):
self.parentNode = self.ownerDocument = None
if self.childNodes:
for child in self.childNodes:
child.unlink()
self.childNodes = NodeList()
self.previousSibling = None
self.nextSibling = None
# A Node is its own context manager, to ensure that an unlink() call occurs.
# This is similar to how a file object works.
def __enter__(self):
return self
def __exit__(self, et, ev, tb):
self.unlink()
defproperty(Node, "firstChild", doc="First child node, or None.")
defproperty(Node, "lastChild", doc="Last child node, or None.")
defproperty(Node, "localName", doc="Namespace-local name of this node.")
def _append_child(self, node):
# fast path with less checks; usable by DOM builders if careful
childNodes = self.childNodes
if childNodes:
last = childNodes[-1]
node.previousSibling = last
last.nextSibling = node
childNodes.append(node)
node.parentNode = self
def _in_document(node):
# return True iff node is part of a document tree
while node is not None:
if node.nodeType == Node.DOCUMENT_NODE:
return True
node = node.parentNode
return False
def _write_data(writer, data):
"Writes datachars to writer."
if data:
data = data.replace("&", "&").replace("<", "<"). \
replace("\"", """).replace(">", ">")
writer.write(data)
def _get_elements_by_tagName_helper(parent, name, rc):
for node in parent.childNodes:
if node.nodeType == Node.ELEMENT_NODE and \
(name == "*" or node.tagName == name):
rc.append(node)
_get_elements_by_tagName_helper(node, name, rc)
return rc
def _get_elements_by_tagName_ns_helper(parent, nsURI, localName, rc):
for node in parent.childNodes:
if node.nodeType == Node.ELEMENT_NODE:
if ((localName == "*" or node.localName == localName) and
(nsURI == "*" or node.namespaceURI == nsURI)):
rc.append(node)
_get_elements_by_tagName_ns_helper(node, nsURI, localName, rc)
return rc
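# Example (editor's sketch): the recursive helpers above back the public
# getElementsByTagName()/getElementsByTagNameNS() methods.
from xml.dom.minidom import parseString as _ps2

_doc2 = _ps2("<r><k/><k xmlns='urn:n'/></r>")
print(len(_doc2.getElementsByTagName("k")))             # 2 (matches tagName)
print(len(_doc2.getElementsByTagNameNS("urn:n", "k")))  # 1 (namespace-aware)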
class DocumentFragment(Node):
nodeType = Node.DOCUMENT_FRAGMENT_NODE
nodeName = "#document-fragment"
nodeValue = None
attributes = None
parentNode = None
_child_node_types = (Node.ELEMENT_NODE,
Node.TEXT_NODE,
Node.CDATA_SECTION_NODE,
Node.ENTITY_REFERENCE_NODE,
Node.PROCESSING_INSTRUCTION_NODE,
Node.COMMENT_NODE,
Node.NOTATION_NODE)
def __init__(self):
self.childNodes = NodeList()
class Attr(Node):
__slots__=('_name', '_value', 'namespaceURI',
'_prefix', 'childNodes', '_localName', 'ownerDocument', 'ownerElement')
nodeType = Node.ATTRIBUTE_NODE
attributes = None
specified = False
_is_id = False
_child_node_types = (Node.TEXT_NODE, Node.ENTITY_REFERENCE_NODE)
def __init__(self, qName, namespaceURI=EMPTY_NAMESPACE, localName=None,
prefix=None):
self.ownerElement = None
self._name = qName
self.namespaceURI = namespaceURI
self._prefix = prefix
self.childNodes = NodeList()
# Add the single child node that represents the value of the attr
self.childNodes.append(Text())
# nodeValue and value are set elsewhere
def _get_localName(self):
try:
return self._localName
except AttributeError:
return self.nodeName.split(":", 1)[-1]
def _get_specified(self):
return self.specified
def _get_name(self):
return self._name
def _set_name(self, value):
self._name = value
if self.ownerElement is not None:
_clear_id_cache(self.ownerElement)
nodeName = name = property(_get_name, _set_name)
def _get_value(self):
return self._value
    def _set_value(self, value):
        self._value = value
        self.childNodes[0].data = value
        if self.ownerElement is not None:
            _clear_id_cache(self.ownerElement)
nodeValue = value = property(_get_value, _set_value)
def _get_prefix(self):
return self._prefix
def _set_prefix(self, prefix):
nsuri = self.namespaceURI
if prefix == "xmlns":
if nsuri and nsuri != XMLNS_NAMESPACE:
raise xml.dom.NamespaceErr(
"illegal use of 'xmlns' prefix for the wrong namespace")
self._prefix = prefix
if prefix is None:
newName = self.localName
else:
newName = "%s:%s" % (prefix, self.localName)
if self.ownerElement:
_clear_id_cache(self.ownerElement)
self.name = newName
prefix = property(_get_prefix, _set_prefix)
def unlink(self):
# This implementation does not call the base implementation
# since most of that is not needed, and the expense of the
# method call is not warranted. We duplicate the removal of
# children, but that's all we needed from the base class.
elem = self.ownerElement
if elem is not None:
del elem._attrs[self.nodeName]
del elem._attrsNS[(self.namespaceURI, self.localName)]
if self._is_id:
self._is_id = False
elem._magic_id_nodes -= 1
self.ownerDocument._magic_id_count -= 1
for child in self.childNodes:
child.unlink()
del self.childNodes[:]
def _get_isId(self):
if self._is_id:
return True
doc = self.ownerDocument
elem = self.ownerElement
if doc is None or elem is None:
return False
info = doc._get_elem_info(elem)
if info is None:
return False
if self.namespaceURI:
return info.isIdNS(self.namespaceURI, self.localName)
else:
return info.isId(self.nodeName)
def _get_schemaType(self):
doc = self.ownerDocument
elem = self.ownerElement
if doc is None or elem is None:
return _no_type
info = doc._get_elem_info(elem)
if info is None:
return _no_type
if self.namespaceURI:
return info.getAttributeTypeNS(self.namespaceURI, self.localName)
else:
return info.getAttributeType(self.nodeName)
defproperty(Attr, "isId", doc="True if this attribute is an ID.")
defproperty(Attr, "localName", doc="Namespace-local name of this attribute.")
defproperty(Attr, "schemaType", doc="Schema type for this attribute.")
class NamedNodeMap(object):
"""The attribute list is a transient interface to the underlying
dictionaries. Mutations here will change the underlying element's
dictionary.
Ordering is imposed artificially and does not reflect the order of
attributes as found in an input document.
"""
__slots__ = ('_attrs', '_attrsNS', '_ownerElement')
def __init__(self, attrs, attrsNS, ownerElement):
self._attrs = attrs
self._attrsNS = attrsNS
self._ownerElement = ownerElement
def _get_length(self):
return len(self._attrs)
def item(self, index):
try:
return self[list(self._attrs.keys())[index]]
except IndexError:
return None
def items(self):
L = []
for node in self._attrs.values():
L.append((node.nodeName, node.value))
return L
def itemsNS(self):
L = []
for node in self._attrs.values():
L.append(((node.namespaceURI, node.localName), node.value))
return L
def __contains__(self, key):
if isinstance(key, str):
return key in self._attrs
else:
return key in self._attrsNS
def keys(self):
return self._attrs.keys()
def keysNS(self):
return self._attrsNS.keys()
def values(self):
return self._attrs.values()
def get(self, name, value=None):
return self._attrs.get(name, value)
__len__ = _get_length
def _cmp(self, other):
if self._attrs is getattr(other, "_attrs", None):
return 0
else:
return (id(self) > id(other)) - (id(self) < id(other))
def __eq__(self, other):
return self._cmp(other) == 0
def __ge__(self, other):
return self._cmp(other) >= 0
def __gt__(self, other):
return self._cmp(other) > 0
def __le__(self, other):
return self._cmp(other) <= 0
def __lt__(self, other):
return self._cmp(other) < 0
def __ne__(self, other):
return self._cmp(other) != 0
def __getitem__(self, attname_or_tuple):
if isinstance(attname_or_tuple, tuple):
return self._attrsNS[attname_or_tuple]
else:
return self._attrs[attname_or_tuple]
# same as set
def __setitem__(self, attname, value):
if isinstance(value, str):
try:
node = self._attrs[attname]
except KeyError:
node = Attr(attname)
node.ownerDocument = self._ownerElement.ownerDocument
self.setNamedItem(node)
node.value = value
else:
if not isinstance(value, Attr):
raise TypeError("value must be a string or Attr object")
node = value
self.setNamedItem(node)
def getNamedItem(self, name):
try:
return self._attrs[name]
except KeyError:
return None
def getNamedItemNS(self, namespaceURI, localName):
try:
return self._attrsNS[(namespaceURI, localName)]
except KeyError:
return None
def removeNamedItem(self, name):
n = self.getNamedItem(name)
if n is not None:
_clear_id_cache(self._ownerElement)
del self._attrs[n.nodeName]
del self._attrsNS[(n.namespaceURI, n.localName)]
if hasattr(n, 'ownerElement'):
n.ownerElement = None
return n
else:
raise xml.dom.NotFoundErr()
def removeNamedItemNS(self, namespaceURI, localName):
n = self.getNamedItemNS(namespaceURI, localName)
if n is not None:
_clear_id_cache(self._ownerElement)
del self._attrsNS[(n.namespaceURI, n.localName)]
del self._attrs[n.nodeName]
if hasattr(n, 'ownerElement'):
n.ownerElement = None
return n
else:
raise xml.dom.NotFoundErr()
def setNamedItem(self, node):
if not isinstance(node, Attr):
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(node), repr(self)))
old = self._attrs.get(node.name)
if old:
old.unlink()
self._attrs[node.name] = node
self._attrsNS[(node.namespaceURI, node.localName)] = node
node.ownerElement = self._ownerElement
_clear_id_cache(node.ownerElement)
return old
def setNamedItemNS(self, node):
return self.setNamedItem(node)
def __delitem__(self, attname_or_tuple):
node = self[attname_or_tuple]
_clear_id_cache(node.ownerElement)
node.unlink()
def __getstate__(self):
return self._attrs, self._attrsNS, self._ownerElement
def __setstate__(self, state):
self._attrs, self._attrsNS, self._ownerElement = state
defproperty(NamedNodeMap, "length",
doc="Number of nodes in the NamedNodeMap.")
AttributeList = NamedNodeMap
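# Example (editor's sketch): NamedNodeMap as seen through Element.attributes.
# Mutations go straight through to the element's underlying dictionaries.
from xml.dom.minidom import parseString as _ps3

_el = _ps3("<r a='1' b='2'/>").documentElement
_am = _el.attributes
print(_am.length, _am['a'].value, 'b' in _am)  # 2 1 True
_am['c'] = '3'  # __setitem__ with a string builds and attaches an Attr
print(_el.getAttribute('c'))  # '3'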
class TypeInfo(object):
__slots__ = 'namespace', 'name'
def __init__(self, namespace, name):
self.namespace = namespace
self.name = name
def __repr__(self):
if self.namespace:
return "<TypeInfo %r (from %r)>" % (self.name, self.namespace)
else:
return "<TypeInfo %r>" % self.name
def _get_name(self):
return self.name
def _get_namespace(self):
return self.namespace
_no_type = TypeInfo(None, None)
class Element(Node):
__slots__=('ownerDocument', 'parentNode', 'tagName', 'nodeName', 'prefix',
'namespaceURI', '_localName', 'childNodes', '_attrs', '_attrsNS',
'nextSibling', 'previousSibling')
nodeType = Node.ELEMENT_NODE
nodeValue = None
schemaType = _no_type
_magic_id_nodes = 0
_child_node_types = (Node.ELEMENT_NODE,
Node.PROCESSING_INSTRUCTION_NODE,
Node.COMMENT_NODE,
Node.TEXT_NODE,
Node.CDATA_SECTION_NODE,
Node.ENTITY_REFERENCE_NODE)
def __init__(self, tagName, namespaceURI=EMPTY_NAMESPACE, prefix=None,
localName=None):
self.parentNode = None
self.tagName = self.nodeName = tagName
self.prefix = prefix
self.namespaceURI = namespaceURI
self.childNodes = NodeList()
self.nextSibling = self.previousSibling = None
# Attribute dictionaries are lazily created
# attributes are double-indexed:
# tagName -> Attribute
# URI,localName -> Attribute
        # In the future: consider lazy generation of attribute objects;
        # this is too tricky for now because of headaches with namespaces.
self._attrs = None
self._attrsNS = None
def _ensure_attributes(self):
if self._attrs is None:
self._attrs = {}
self._attrsNS = {}
def _get_localName(self):
try:
return self._localName
except AttributeError:
return self.tagName.split(":", 1)[-1]
def _get_tagName(self):
return self.tagName
def unlink(self):
if self._attrs is not None:
for attr in list(self._attrs.values()):
attr.unlink()
self._attrs = None
self._attrsNS = None
Node.unlink(self)
def getAttribute(self, attname):
if self._attrs is None:
return ""
try:
return self._attrs[attname].value
except KeyError:
return ""
def getAttributeNS(self, namespaceURI, localName):
if self._attrsNS is None:
return ""
try:
return self._attrsNS[(namespaceURI, localName)].value
except KeyError:
return ""
def setAttribute(self, attname, value):
attr = self.getAttributeNode(attname)
if attr is None:
attr = Attr(attname)
attr.value = value # also sets nodeValue
attr.ownerDocument = self.ownerDocument
self.setAttributeNode(attr)
elif value != attr.value:
attr.value = value
if attr.isId:
_clear_id_cache(self)
def setAttributeNS(self, namespaceURI, qualifiedName, value):
prefix, localname = _nssplit(qualifiedName)
attr = self.getAttributeNodeNS(namespaceURI, localname)
if attr is None:
attr = Attr(qualifiedName, namespaceURI, localname, prefix)
attr.value = value
attr.ownerDocument = self.ownerDocument
self.setAttributeNode(attr)
else:
if value != attr.value:
attr.value = value
if attr.isId:
_clear_id_cache(self)
if attr.prefix != prefix:
attr.prefix = prefix
attr.nodeName = qualifiedName
def getAttributeNode(self, attrname):
if self._attrs is None:
return None
return self._attrs.get(attrname)
def getAttributeNodeNS(self, namespaceURI, localName):
if self._attrsNS is None:
return None
return self._attrsNS.get((namespaceURI, localName))
def setAttributeNode(self, attr):
if attr.ownerElement not in (None, self):
raise xml.dom.InuseAttributeErr("attribute node already owned")
self._ensure_attributes()
old1 = self._attrs.get(attr.name, None)
if old1 is not None:
self.removeAttributeNode(old1)
old2 = self._attrsNS.get((attr.namespaceURI, attr.localName), None)
if old2 is not None and old2 is not old1:
self.removeAttributeNode(old2)
_set_attribute_node(self, attr)
if old1 is not attr:
# It might have already been part of this node, in which case
# it doesn't represent a change, and should not be returned.
return old1
if old2 is not attr:
return old2
setAttributeNodeNS = setAttributeNode
def removeAttribute(self, name):
if self._attrsNS is None:
raise xml.dom.NotFoundErr()
try:
attr = self._attrs[name]
except KeyError:
raise xml.dom.NotFoundErr()
self.removeAttributeNode(attr)
def removeAttributeNS(self, namespaceURI, localName):
if self._attrsNS is None:
raise xml.dom.NotFoundErr()
try:
attr = self._attrsNS[(namespaceURI, localName)]
except KeyError:
raise xml.dom.NotFoundErr()
self.removeAttributeNode(attr)
def removeAttributeNode(self, node):
if node is None:
raise xml.dom.NotFoundErr()
try:
self._attrs[node.name]
except KeyError:
raise xml.dom.NotFoundErr()
_clear_id_cache(self)
node.unlink()
# Restore this since the node is still useful and otherwise
# unlinked
node.ownerDocument = self.ownerDocument
removeAttributeNodeNS = removeAttributeNode
def hasAttribute(self, name):
if self._attrs is None:
return False
return name in self._attrs
def hasAttributeNS(self, namespaceURI, localName):
if self._attrsNS is None:
return False
return (namespaceURI, localName) in self._attrsNS
def getElementsByTagName(self, name):
return _get_elements_by_tagName_helper(self, name, NodeList())
def getElementsByTagNameNS(self, namespaceURI, localName):
return _get_elements_by_tagName_ns_helper(
self, namespaceURI, localName, NodeList())
def __repr__(self):
return "<DOM Element: %s at %#x>" % (self.tagName, id(self))
def writexml(self, writer, indent="", addindent="", newl=""):
# indent = current indentation
# addindent = indentation to add to higher levels
# newl = newline string
writer.write(indent+"<" + self.tagName)
attrs = self._get_attributes()
a_names = sorted(attrs.keys())
for a_name in a_names:
writer.write(" %s=\"" % a_name)
_write_data(writer, attrs[a_name].value)
writer.write("\"")
if self.childNodes:
writer.write(">")
if (len(self.childNodes) == 1 and
self.childNodes[0].nodeType == Node.TEXT_NODE):
self.childNodes[0].writexml(writer, '', '', '')
else:
writer.write(newl)
for node in self.childNodes:
node.writexml(writer, indent+addindent, addindent, newl)
writer.write(indent)
writer.write("</%s>%s" % (self.tagName, newl))
else:
writer.write("/>%s"%(newl))
def _get_attributes(self):
self._ensure_attributes()
return NamedNodeMap(self._attrs, self._attrsNS, self)
def hasAttributes(self):
if self._attrs:
return True
else:
return False
# DOM Level 3 attributes, based on the 22 Oct 2002 draft
def setIdAttribute(self, name):
idAttr = self.getAttributeNode(name)
self.setIdAttributeNode(idAttr)
def setIdAttributeNS(self, namespaceURI, localName):
idAttr = self.getAttributeNodeNS(namespaceURI, localName)
self.setIdAttributeNode(idAttr)
def setIdAttributeNode(self, idAttr):
if idAttr is None or not self.isSameNode(idAttr.ownerElement):
raise xml.dom.NotFoundErr()
if _get_containing_entref(self) is not None:
raise xml.dom.NoModificationAllowedErr()
if not idAttr._is_id:
idAttr._is_id = True
self._magic_id_nodes += 1
self.ownerDocument._magic_id_count += 1
_clear_id_cache(self)
defproperty(Element, "attributes",
doc="NamedNodeMap of attributes on the element.")
defproperty(Element, "localName",
doc="Namespace-local name of this element.")
def _set_attribute_node(element, attr):
_clear_id_cache(element)
element._ensure_attributes()
element._attrs[attr.name] = attr
element._attrsNS[(attr.namespaceURI, attr.localName)] = attr
# This creates a circular reference, but Element.unlink()
# breaks the cycle since the references to the attribute
# dictionaries are tossed.
attr.ownerElement = element
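# Usage sketch (illustrative only; this helper is never called by the
# module): exercises the Element attribute API defined above.  The
# document shape and the attribute names are arbitrary examples.
def _example_element_attributes():
    doc = getDOMImplementation().createDocument(None, "root", None)
    elem = doc.documentElement
    # setAttribute() creates a new Attr node on first use and updates
    # the existing node on later calls with the same name.
    elem.setAttribute("id", "n1")
    assert elem.getAttribute("id") == "n1"
    assert elem.hasAttribute("id")
    # Attribute nodes can also be handled directly.
    attr = elem.getAttributeNode("id")
    assert attr.ownerElement is elem
    elem.removeAttributeNode(attr)
    assert not elem.hasAttribute("id")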
class Childless:
"""Mixin that makes childless-ness easy to implement and avoids
the complexity of the Node methods that deal with children.
"""
__slots__ = ()
attributes = None
childNodes = EmptyNodeList()
firstChild = None
lastChild = None
def _get_firstChild(self):
return None
def _get_lastChild(self):
return None
def appendChild(self, node):
raise xml.dom.HierarchyRequestErr(
self.nodeName + " nodes cannot have children")
def hasChildNodes(self):
return False
def insertBefore(self, newChild, refChild):
raise xml.dom.HierarchyRequestErr(
self.nodeName + " nodes do not have children")
def removeChild(self, oldChild):
raise xml.dom.NotFoundErr(
self.nodeName + " nodes do not have children")
def normalize(self):
# For childless nodes, normalize() has nothing to do.
pass
def replaceChild(self, newChild, oldChild):
raise xml.dom.HierarchyRequestErr(
self.nodeName + " nodes do not have children")
class ProcessingInstruction(Childless, Node):
nodeType = Node.PROCESSING_INSTRUCTION_NODE
__slots__ = ('target', 'data')
def __init__(self, target, data):
self.target = target
self.data = data
# nodeValue is an alias for data
def _get_nodeValue(self):
return self.data
def _set_nodeValue(self, value):
self.data = value
nodeValue = property(_get_nodeValue, _set_nodeValue)
# nodeName is an alias for target
def _get_nodeName(self):
return self.target
def _set_nodeName(self, value):
self.target = value
nodeName = property(_get_nodeName, _set_nodeName)
def writexml(self, writer, indent="", addindent="", newl=""):
writer.write("%s<?%s %s?>%s" % (indent,self.target, self.data, newl))
class CharacterData(Childless, Node):
    __slots__ = ('_data', 'ownerDocument', 'parentNode', 'previousSibling', 'nextSibling')
def __init__(self):
self.ownerDocument = self.parentNode = None
self.previousSibling = self.nextSibling = None
self._data = ''
Node.__init__(self)
def _get_length(self):
return len(self.data)
__len__ = _get_length
def _get_data(self):
return self._data
def _set_data(self, data):
self._data = data
data = nodeValue = property(_get_data, _set_data)
def __repr__(self):
data = self.data
if len(data) > 10:
dotdotdot = "..."
else:
dotdotdot = ""
return '<DOM %s node "%r%s">' % (
self.__class__.__name__, data[0:10], dotdotdot)
def substringData(self, offset, count):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if count < 0:
raise xml.dom.IndexSizeErr("count cannot be negative")
return self.data[offset:offset+count]
def appendData(self, arg):
self.data = self.data + arg
def insertData(self, offset, arg):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if arg:
self.data = "%s%s%s" % (
self.data[:offset], arg, self.data[offset:])
def deleteData(self, offset, count):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if count < 0:
raise xml.dom.IndexSizeErr("count cannot be negative")
if count:
self.data = self.data[:offset] + self.data[offset+count:]
def replaceData(self, offset, count, arg):
if offset < 0:
raise xml.dom.IndexSizeErr("offset cannot be negative")
if offset >= len(self.data):
raise xml.dom.IndexSizeErr("offset cannot be beyond end of data")
if count < 0:
raise xml.dom.IndexSizeErr("count cannot be negative")
if count:
self.data = "%s%s%s" % (
self.data[:offset], arg, self.data[offset+count:])
defproperty(CharacterData, "length", doc="Length of the string data.")
class Text(CharacterData):
__slots__ = ()
nodeType = Node.TEXT_NODE
nodeName = "#text"
attributes = None
def splitText(self, offset):
if offset < 0 or offset > len(self.data):
raise xml.dom.IndexSizeErr("illegal offset value")
newText = self.__class__()
newText.data = self.data[offset:]
newText.ownerDocument = self.ownerDocument
next = self.nextSibling
if self.parentNode and self in self.parentNode.childNodes:
if next is None:
self.parentNode.appendChild(newText)
else:
self.parentNode.insertBefore(newText, next)
self.data = self.data[:offset]
return newText
def writexml(self, writer, indent="", addindent="", newl=""):
_write_data(writer, "%s%s%s" % (indent, self.data, newl))
# DOM Level 3 (WD 9 April 2002)
def _get_wholeText(self):
L = [self.data]
n = self.previousSibling
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
L.insert(0, n.data)
n = n.previousSibling
else:
break
n = self.nextSibling
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
L.append(n.data)
n = n.nextSibling
else:
break
return ''.join(L)
def replaceWholeText(self, content):
# XXX This needs to be seriously changed if minidom ever
# supports EntityReference nodes.
parent = self.parentNode
n = self.previousSibling
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
next = n.previousSibling
parent.removeChild(n)
n = next
else:
break
n = self.nextSibling
if not content:
parent.removeChild(self)
while n is not None:
if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
next = n.nextSibling
parent.removeChild(n)
n = next
else:
break
if content:
self.data = content
return self
else:
return None
def _get_isWhitespaceInElementContent(self):
if self.data.strip():
return False
elem = _get_containing_element(self)
if elem is None:
return False
info = self.ownerDocument._get_elem_info(elem)
if info is None:
return False
else:
return info.isElementContent()
defproperty(Text, "isWhitespaceInElementContent",
doc="True iff this text node contains only whitespace"
" and is in element content.")
defproperty(Text, "wholeText",
doc="The text of all logically-adjacent text nodes.")
def _get_containing_element(node):
c = node.parentNode
while c is not None:
if c.nodeType == Node.ELEMENT_NODE:
return c
c = c.parentNode
return None
def _get_containing_entref(node):
c = node.parentNode
while c is not None:
if c.nodeType == Node.ENTITY_REFERENCE_NODE:
return c
c = c.parentNode
return None
class Comment(CharacterData):
nodeType = Node.COMMENT_NODE
nodeName = "#comment"
def __init__(self, data):
CharacterData.__init__(self)
self._data = data
def writexml(self, writer, indent="", addindent="", newl=""):
if "--" in self.data:
raise ValueError("'--' is not allowed in a comment node")
writer.write("%s<!--%s-->%s" % (indent, self.data, newl))
class CDATASection(Text):
__slots__ = ()
nodeType = Node.CDATA_SECTION_NODE
nodeName = "#cdata-section"
def writexml(self, writer, indent="", addindent="", newl=""):
if self.data.find("]]>") >= 0:
raise ValueError("']]>' not allowed in a CDATA section")
writer.write("<![CDATA[%s]]>" % self.data)
class ReadOnlySequentialNamedNodeMap(object):
__slots__ = '_seq',
def __init__(self, seq=()):
# seq should be a list or tuple
self._seq = seq
def __len__(self):
return len(self._seq)
def _get_length(self):
return len(self._seq)
def getNamedItem(self, name):
for n in self._seq:
if n.nodeName == name:
return n
def getNamedItemNS(self, namespaceURI, localName):
for n in self._seq:
if n.namespaceURI == namespaceURI and n.localName == localName:
return n
def __getitem__(self, name_or_tuple):
if isinstance(name_or_tuple, tuple):
node = self.getNamedItemNS(*name_or_tuple)
else:
node = self.getNamedItem(name_or_tuple)
if node is None:
raise KeyError(name_or_tuple)
return node
def item(self, index):
if index < 0:
return None
try:
return self._seq[index]
except IndexError:
return None
def removeNamedItem(self, name):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def removeNamedItemNS(self, namespaceURI, localName):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def setNamedItem(self, node):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def setNamedItemNS(self, node):
raise xml.dom.NoModificationAllowedErr(
"NamedNodeMap instance is read-only")
def __getstate__(self):
return [self._seq]
def __setstate__(self, state):
self._seq = state[0]
defproperty(ReadOnlySequentialNamedNodeMap, "length",
doc="Number of entries in the NamedNodeMap.")
class Identified:
"""Mix-in class that supports the publicId and systemId attributes."""
__slots__ = 'publicId', 'systemId'
def _identified_mixin_init(self, publicId, systemId):
self.publicId = publicId
self.systemId = systemId
def _get_publicId(self):
return self.publicId
def _get_systemId(self):
return self.systemId
class DocumentType(Identified, Childless, Node):
nodeType = Node.DOCUMENT_TYPE_NODE
nodeValue = None
name = None
publicId = None
systemId = None
internalSubset = None
def __init__(self, qualifiedName):
self.entities = ReadOnlySequentialNamedNodeMap()
self.notations = ReadOnlySequentialNamedNodeMap()
if qualifiedName:
prefix, localname = _nssplit(qualifiedName)
self.name = localname
self.nodeName = self.name
def _get_internalSubset(self):
return self.internalSubset
def cloneNode(self, deep):
if self.ownerDocument is None:
            # cloning is only possible while the doctype is not yet
            # owned by a document
clone = DocumentType(None)
clone.name = self.name
clone.nodeName = self.name
operation = xml.dom.UserDataHandler.NODE_CLONED
if deep:
clone.entities._seq = []
clone.notations._seq = []
for n in self.notations._seq:
notation = Notation(n.nodeName, n.publicId, n.systemId)
clone.notations._seq.append(notation)
n._call_user_data_handler(operation, n, notation)
for e in self.entities._seq:
entity = Entity(e.nodeName, e.publicId, e.systemId,
e.notationName)
entity.actualEncoding = e.actualEncoding
entity.encoding = e.encoding
entity.version = e.version
clone.entities._seq.append(entity)
                    e._call_user_data_handler(operation, e, entity)
self._call_user_data_handler(operation, self, clone)
return clone
else:
return None
def writexml(self, writer, indent="", addindent="", newl=""):
writer.write("<!DOCTYPE ")
writer.write(self.name)
if self.publicId:
writer.write("%s PUBLIC '%s'%s '%s'"
% (newl, self.publicId, newl, self.systemId))
elif self.systemId:
writer.write("%s SYSTEM '%s'" % (newl, self.systemId))
if self.internalSubset is not None:
writer.write(" [")
writer.write(self.internalSubset)
writer.write("]")
writer.write(">"+newl)
class Entity(Identified, Node):
attributes = None
nodeType = Node.ENTITY_NODE
nodeValue = None
actualEncoding = None
encoding = None
version = None
def __init__(self, name, publicId, systemId, notation):
self.nodeName = name
self.notationName = notation
self.childNodes = NodeList()
self._identified_mixin_init(publicId, systemId)
def _get_actualEncoding(self):
return self.actualEncoding
def _get_encoding(self):
return self.encoding
def _get_version(self):
return self.version
def appendChild(self, newChild):
raise xml.dom.HierarchyRequestErr(
"cannot append children to an entity node")
def insertBefore(self, newChild, refChild):
raise xml.dom.HierarchyRequestErr(
"cannot insert children below an entity node")
def removeChild(self, oldChild):
raise xml.dom.HierarchyRequestErr(
"cannot remove children from an entity node")
def replaceChild(self, newChild, oldChild):
raise xml.dom.HierarchyRequestErr(
"cannot replace children of an entity node")
class Notation(Identified, Childless, Node):
nodeType = Node.NOTATION_NODE
nodeValue = None
def __init__(self, name, publicId, systemId):
self.nodeName = name
self._identified_mixin_init(publicId, systemId)
class DOMImplementation(DOMImplementationLS):
_features = [("core", "1.0"),
("core", "2.0"),
("core", None),
("xml", "1.0"),
("xml", "2.0"),
("xml", None),
("ls-load", "3.0"),
("ls-load", None),
]
def hasFeature(self, feature, version):
if version == "":
version = None
return (feature.lower(), version) in self._features
def createDocument(self, namespaceURI, qualifiedName, doctype):
if doctype and doctype.parentNode is not None:
raise xml.dom.WrongDocumentErr(
"doctype object owned by another DOM tree")
doc = self._create_document()
add_root_element = not (namespaceURI is None
and qualifiedName is None
and doctype is None)
if not qualifiedName and add_root_element:
# The spec is unclear what to raise here; SyntaxErr
# would be the other obvious candidate. Since Xerces raises
# InvalidCharacterErr, and since SyntaxErr is not listed
# for createDocument, that seems to be the better choice.
# XXX: need to check for illegal characters here and in
# createElement.
            # DOM Level III clears this up when talking about the return
            # value of this function.  If namespaceURI, qName and DocType
            # are None, the document is returned without a document
            # element; otherwise, if doctype or namespaceURI are not None,
            # we run into the problem described above.
raise xml.dom.InvalidCharacterErr("Element with no name")
if add_root_element:
prefix, localname = _nssplit(qualifiedName)
if prefix == "xml" \
and namespaceURI != "http://www.w3.org/XML/1998/namespace":
raise xml.dom.NamespaceErr("illegal use of 'xml' prefix")
if prefix and not namespaceURI:
raise xml.dom.NamespaceErr(
"illegal use of prefix without namespaces")
element = doc.createElementNS(namespaceURI, qualifiedName)
if doctype:
doc.appendChild(doctype)
doc.appendChild(element)
if doctype:
doctype.parentNode = doctype.ownerDocument = doc
doc.doctype = doctype
doc.implementation = self
return doc
def createDocumentType(self, qualifiedName, publicId, systemId):
doctype = DocumentType(qualifiedName)
doctype.publicId = publicId
doctype.systemId = systemId
return doctype
# DOM Level 3 (WD 9 April 2002)
def getInterface(self, feature):
if self.hasFeature(feature, None):
return self
else:
return None
# internal
def _create_document(self):
return Document()
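# Brief, hedged sketch of the DOMImplementation entry points above (the
# helper and the "html" names are illustrative; nothing runs at import):
def _example_dom_implementation():
    impl = DOMImplementation()
    assert impl.hasFeature("core", "2.0")
    doctype = impl.createDocumentType("html", None, None)
    doc = impl.createDocument(None, "html", doctype)
    # createDocument() wires the doctype and root element into place.
    assert doc.doctype is doctype
    assert doc.documentElement.tagName == "html"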
class ElementInfo(object):
"""Object that represents content-model information for an element.
This implementation is not expected to be used in practice; DOM
builders should provide implementations which do the right thing
using information available to it.
"""
__slots__ = 'tagName',
def __init__(self, name):
self.tagName = name
def getAttributeType(self, aname):
return _no_type
def getAttributeTypeNS(self, namespaceURI, localName):
return _no_type
def isElementContent(self):
return False
def isEmpty(self):
"""Returns true iff this element is declared to have an EMPTY
content model."""
return False
def isId(self, aname):
"""Returns true iff the named attribute is a DTD-style ID."""
return False
def isIdNS(self, namespaceURI, localName):
"""Returns true iff the identified attribute is a DTD-style ID."""
return False
def __getstate__(self):
return self.tagName
def __setstate__(self, state):
self.tagName = state
def _clear_id_cache(node):
if node.nodeType == Node.DOCUMENT_NODE:
node._id_cache.clear()
node._id_search_stack = None
elif _in_document(node):
node.ownerDocument._id_cache.clear()
        node.ownerDocument._id_search_stack = None
class Document(Node, DocumentLS):
__slots__ = ('_elem_info', 'doctype',
'_id_search_stack', 'childNodes', '_id_cache')
_child_node_types = (Node.ELEMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE,
Node.COMMENT_NODE, Node.DOCUMENT_TYPE_NODE)
implementation = DOMImplementation()
nodeType = Node.DOCUMENT_NODE
nodeName = "#document"
nodeValue = None
attributes = None
parentNode = None
previousSibling = nextSibling = None
# Document attributes from Level 3 (WD 9 April 2002)
actualEncoding = None
encoding = None
standalone = None
version = None
strictErrorChecking = False
errorHandler = None
documentURI = None
_magic_id_count = 0
def __init__(self):
self.doctype = None
self.childNodes = NodeList()
# mapping of (namespaceURI, localName) -> ElementInfo
# and tagName -> ElementInfo
self._elem_info = {}
self._id_cache = {}
self._id_search_stack = None
def _get_elem_info(self, element):
if element.namespaceURI:
key = element.namespaceURI, element.localName
else:
key = element.tagName
return self._elem_info.get(key)
def _get_actualEncoding(self):
return self.actualEncoding
def _get_doctype(self):
return self.doctype
def _get_documentURI(self):
return self.documentURI
def _get_encoding(self):
return self.encoding
def _get_errorHandler(self):
return self.errorHandler
def _get_standalone(self):
return self.standalone
def _get_strictErrorChecking(self):
return self.strictErrorChecking
def _get_version(self):
return self.version
def appendChild(self, node):
if node.nodeType not in self._child_node_types:
raise xml.dom.HierarchyRequestErr(
"%s cannot be child of %s" % (repr(node), repr(self)))
if node.parentNode is not None:
# This needs to be done before the next test since this
# may *be* the document element, in which case it should
# end up re-ordered to the end.
node.parentNode.removeChild(node)
if node.nodeType == Node.ELEMENT_NODE \
and self._get_documentElement():
raise xml.dom.HierarchyRequestErr(
"two document elements disallowed")
return Node.appendChild(self, node)
def removeChild(self, oldChild):
try:
self.childNodes.remove(oldChild)
except ValueError:
raise xml.dom.NotFoundErr()
oldChild.nextSibling = oldChild.previousSibling = None
oldChild.parentNode = None
if self.documentElement is oldChild:
self.documentElement = None
return oldChild
def _get_documentElement(self):
for node in self.childNodes:
if node.nodeType == Node.ELEMENT_NODE:
return node
def unlink(self):
if self.doctype is not None:
self.doctype.unlink()
self.doctype = None
Node.unlink(self)
def cloneNode(self, deep):
if not deep:
return None
clone = self.implementation.createDocument(None, None, None)
clone.encoding = self.encoding
clone.standalone = self.standalone
clone.version = self.version
for n in self.childNodes:
childclone = _clone_node(n, deep, clone)
assert childclone.ownerDocument.isSameNode(clone)
clone.childNodes.append(childclone)
if childclone.nodeType == Node.DOCUMENT_NODE:
assert clone.documentElement is None
elif childclone.nodeType == Node.DOCUMENT_TYPE_NODE:
assert clone.doctype is None
clone.doctype = childclone
childclone.parentNode = clone
self._call_user_data_handler(xml.dom.UserDataHandler.NODE_CLONED,
self, clone)
return clone
def createDocumentFragment(self):
d = DocumentFragment()
d.ownerDocument = self
return d
def createElement(self, tagName):
e = Element(tagName)
e.ownerDocument = self
return e
def createTextNode(self, data):
if not isinstance(data, str):
raise TypeError("node contents must be a string")
t = Text()
t.data = data
t.ownerDocument = self
return t
def createCDATASection(self, data):
if not isinstance(data, str):
raise TypeError("node contents must be a string")
c = CDATASection()
c.data = data
c.ownerDocument = self
return c
def createComment(self, data):
c = Comment(data)
c.ownerDocument = self
return c
def createProcessingInstruction(self, target, data):
p = ProcessingInstruction(target, data)
p.ownerDocument = self
return p
def createAttribute(self, qName):
a = Attr(qName)
a.ownerDocument = self
a.value = ""
return a
def createElementNS(self, namespaceURI, qualifiedName):
prefix, localName = _nssplit(qualifiedName)
e = Element(qualifiedName, namespaceURI, prefix)
e.ownerDocument = self
return e
def createAttributeNS(self, namespaceURI, qualifiedName):
prefix, localName = _nssplit(qualifiedName)
a = Attr(qualifiedName, namespaceURI, localName, prefix)
a.ownerDocument = self
a.value = ""
return a
# A couple of implementation-specific helpers to create node types
# not supported by the W3C DOM specs:
def _create_entity(self, name, publicId, systemId, notationName):
e = Entity(name, publicId, systemId, notationName)
e.ownerDocument = self
return e
def _create_notation(self, name, publicId, systemId):
n = Notation(name, publicId, systemId)
n.ownerDocument = self
return n
def getElementById(self, id):
if id in self._id_cache:
return self._id_cache[id]
if not (self._elem_info or self._magic_id_count):
return None
stack = self._id_search_stack
if stack is None:
# we never searched before, or the cache has been cleared
stack = [self.documentElement]
self._id_search_stack = stack
elif not stack:
# Previous search was completed and cache is still valid;
# no matching node.
return None
result = None
while stack:
node = stack.pop()
# add child elements to stack for continued searching
stack.extend([child for child in node.childNodes
if child.nodeType in _nodeTypes_with_children])
# check this node
info = self._get_elem_info(node)
if info:
# We have to process all ID attributes before
# returning in order to get all the attributes set to
# be IDs using Element.setIdAttribute*().
for attr in node.attributes.values():
if attr.namespaceURI:
if info.isIdNS(attr.namespaceURI, attr.localName):
self._id_cache[attr.value] = node
if attr.value == id:
result = node
elif not node._magic_id_nodes:
break
elif info.isId(attr.name):
self._id_cache[attr.value] = node
if attr.value == id:
result = node
elif not node._magic_id_nodes:
break
elif attr._is_id:
self._id_cache[attr.value] = node
if attr.value == id:
result = node
elif node._magic_id_nodes == 1:
break
elif node._magic_id_nodes:
for attr in node.attributes.values():
if attr._is_id:
self._id_cache[attr.value] = node
if attr.value == id:
result = node
if result is not None:
break
return result
def getElementsByTagName(self, name):
return _get_elements_by_tagName_helper(self, name, NodeList())
def getElementsByTagNameNS(self, namespaceURI, localName):
return _get_elements_by_tagName_ns_helper(
self, namespaceURI, localName, NodeList())
def isSupported(self, feature, version):
return self.implementation.hasFeature(feature, version)
def importNode(self, node, deep):
if node.nodeType == Node.DOCUMENT_NODE:
raise xml.dom.NotSupportedErr("cannot import document nodes")
elif node.nodeType == Node.DOCUMENT_TYPE_NODE:
raise xml.dom.NotSupportedErr("cannot import document type nodes")
return _clone_node(node, deep, self)
def writexml(self, writer, indent="", addindent="", newl="", encoding=None):
if encoding is None:
writer.write('<?xml version="1.0" ?>'+newl)
else:
writer.write('<?xml version="1.0" encoding="%s"?>%s' % (
encoding, newl))
for node in self.childNodes:
node.writexml(writer, indent, addindent, newl)
# DOM Level 3 (WD 9 April 2002)
def renameNode(self, n, namespaceURI, name):
if n.ownerDocument is not self:
raise xml.dom.WrongDocumentErr(
"cannot rename nodes from other documents;\n"
"expected %s,\nfound %s" % (self, n.ownerDocument))
if n.nodeType not in (Node.ELEMENT_NODE, Node.ATTRIBUTE_NODE):
raise xml.dom.NotSupportedErr(
"renameNode() only applies to element and attribute nodes")
if namespaceURI != EMPTY_NAMESPACE:
if ':' in name:
prefix, localName = name.split(':', 1)
if ( prefix == "xmlns"
and namespaceURI != xml.dom.XMLNS_NAMESPACE):
raise xml.dom.NamespaceErr(
"illegal use of 'xmlns' prefix")
else:
if ( name == "xmlns"
and namespaceURI != xml.dom.XMLNS_NAMESPACE
and n.nodeType == Node.ATTRIBUTE_NODE):
raise xml.dom.NamespaceErr(
"illegal use of the 'xmlns' attribute")
prefix = None
localName = name
else:
prefix = None
localName = None
if n.nodeType == Node.ATTRIBUTE_NODE:
element = n.ownerElement
if element is not None:
is_id = n._is_id
element.removeAttributeNode(n)
else:
element = None
n.prefix = prefix
n._localName = localName
n.namespaceURI = namespaceURI
n.nodeName = name
if n.nodeType == Node.ELEMENT_NODE:
n.tagName = name
else:
# attribute node
n.name = name
if element is not None:
element.setAttributeNode(n)
if is_id:
element.setIdAttributeNode(n)
# It's not clear from a semantic perspective whether we should
# call the user data handlers for the NODE_RENAMED event since
# we're re-using the existing node. The draft spec has been
# interpreted as meaning "no, don't call the handler unless a
# new node is created."
return n
defproperty(Document, "documentElement",
doc="Top-level element of this document.")
def _clone_node(node, deep, newOwnerDocument):
"""
Clone a node and give it the new owner document.
Called by Node.cloneNode and Document.importNode
"""
if node.ownerDocument.isSameNode(newOwnerDocument):
operation = xml.dom.UserDataHandler.NODE_CLONED
else:
operation = xml.dom.UserDataHandler.NODE_IMPORTED
if node.nodeType == Node.ELEMENT_NODE:
clone = newOwnerDocument.createElementNS(node.namespaceURI,
node.nodeName)
for attr in node.attributes.values():
clone.setAttributeNS(attr.namespaceURI, attr.nodeName, attr.value)
a = clone.getAttributeNodeNS(attr.namespaceURI, attr.localName)
a.specified = attr.specified
if deep:
for child in node.childNodes:
c = _clone_node(child, deep, newOwnerDocument)
clone.appendChild(c)
elif node.nodeType == Node.DOCUMENT_FRAGMENT_NODE:
clone = newOwnerDocument.createDocumentFragment()
if deep:
for child in node.childNodes:
c = _clone_node(child, deep, newOwnerDocument)
clone.appendChild(c)
elif node.nodeType == Node.TEXT_NODE:
clone = newOwnerDocument.createTextNode(node.data)
elif node.nodeType == Node.CDATA_SECTION_NODE:
clone = newOwnerDocument.createCDATASection(node.data)
elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
clone = newOwnerDocument.createProcessingInstruction(node.target,
node.data)
elif node.nodeType == Node.COMMENT_NODE:
clone = newOwnerDocument.createComment(node.data)
elif node.nodeType == Node.ATTRIBUTE_NODE:
clone = newOwnerDocument.createAttributeNS(node.namespaceURI,
node.nodeName)
clone.specified = True
clone.value = node.value
elif node.nodeType == Node.DOCUMENT_TYPE_NODE:
assert node.ownerDocument is not newOwnerDocument
operation = xml.dom.UserDataHandler.NODE_IMPORTED
clone = newOwnerDocument.implementation.createDocumentType(
node.name, node.publicId, node.systemId)
clone.ownerDocument = newOwnerDocument
if deep:
clone.entities._seq = []
clone.notations._seq = []
for n in node.notations._seq:
notation = Notation(n.nodeName, n.publicId, n.systemId)
notation.ownerDocument = newOwnerDocument
clone.notations._seq.append(notation)
if hasattr(n, '_call_user_data_handler'):
n._call_user_data_handler(operation, n, notation)
for e in node.entities._seq:
entity = Entity(e.nodeName, e.publicId, e.systemId,
e.notationName)
entity.actualEncoding = e.actualEncoding
entity.encoding = e.encoding
entity.version = e.version
entity.ownerDocument = newOwnerDocument
clone.entities._seq.append(entity)
if hasattr(e, '_call_user_data_handler'):
                    e._call_user_data_handler(operation, e, entity)
else:
# Note the cloning of Document and DocumentType nodes is
# implementation specific. minidom handles those cases
# directly in the cloneNode() methods.
raise xml.dom.NotSupportedErr("Cannot clone node %s" % repr(node))
    # Check for _call_user_data_handler() since this could conceivably
    # be used with other DOM implementations (one of the FourThought
    # DOMs, perhaps?).
if hasattr(node, '_call_user_data_handler'):
node._call_user_data_handler(operation, node, clone)
return clone
def _nssplit(qualifiedName):
fields = qualifiedName.split(':', 1)
if len(fields) == 2:
return fields
else:
return (None, fields[0])
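# _nssplit() splits at most once on ":"; note that it returns a list for
# prefixed names and a tuple otherwise (sketch only, never called here):
def _example_nssplit():
    assert _nssplit("svg:rect") == ["svg", "rect"]
    assert _nssplit("rect") == (None, "rect")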
def _do_pulldom_parse(func, args, kwargs):
events = func(*args, **kwargs)
toktype, rootNode = events.getEvent()
events.expandNode(rootNode)
events.clear()
return rootNode
def parse(file, parser=None, bufsize=None):
"""Parse a file into a DOM by filename or file object."""
if parser is None and not bufsize:
from xml.dom import expatbuilder
return expatbuilder.parse(file)
else:
from xml.dom import pulldom
return _do_pulldom_parse(pulldom.parse, (file,),
{'parser': parser, 'bufsize': bufsize})
def parseString(string, parser=None):
"""Parse a file into a DOM from a string."""
if parser is None:
from xml.dom import expatbuilder
return expatbuilder.parseString(string)
else:
from xml.dom import pulldom
return _do_pulldom_parse(pulldom.parseString, (string,),
{'parser': parser})
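# Minimal sketch of the two parse entry points above; the XML literal
# and tag names are arbitrary (the helper is never called by the module):
def _example_parse_usage():
    doc = parseString("<root><item id='a'/></root>")
    items = doc.getElementsByTagName("item")
    assert items.length == 1
    assert items[0].getAttribute("id") == "a"
    doc.unlink()  # break reference cycles once the tree is no longer needed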
def getDOMImplementation(features=None):
if features:
if isinstance(features, str):
features = domreg._parse_feature_string(features)
for f, v in features:
if not Document.implementation.hasFeature(f, v):
return None
return Document.implementation
# This is the Python mapping for interface NodeFilter from
# DOM2-Traversal-Range. It contains only constants.
class NodeFilter:
"""
This is the DOM2 NodeFilter interface. It contains only constants.
"""
FILTER_ACCEPT = 1
FILTER_REJECT = 2
FILTER_SKIP = 3
SHOW_ALL = 0xFFFFFFFF
SHOW_ELEMENT = 0x00000001
SHOW_ATTRIBUTE = 0x00000002
SHOW_TEXT = 0x00000004
SHOW_CDATA_SECTION = 0x00000008
SHOW_ENTITY_REFERENCE = 0x00000010
SHOW_ENTITY = 0x00000020
SHOW_PROCESSING_INSTRUCTION = 0x00000040
SHOW_COMMENT = 0x00000080
SHOW_DOCUMENT = 0x00000100
SHOW_DOCUMENT_TYPE = 0x00000200
SHOW_DOCUMENT_FRAGMENT = 0x00000400
SHOW_NOTATION = 0x00000800
def acceptNode(self, node):
raise NotImplementedError
import xml.sax
import xml.sax.handler
START_ELEMENT = "START_ELEMENT"
END_ELEMENT = "END_ELEMENT"
COMMENT = "COMMENT"
START_DOCUMENT = "START_DOCUMENT"
END_DOCUMENT = "END_DOCUMENT"
PROCESSING_INSTRUCTION = "PROCESSING_INSTRUCTION"
IGNORABLE_WHITESPACE = "IGNORABLE_WHITESPACE"
CHARACTERS = "CHARACTERS"
class PullDOM(xml.sax.ContentHandler):
_locator = None
document = None
def __init__(self, documentFactory=None):
from xml.dom import XML_NAMESPACE
self.documentFactory = documentFactory
self.firstEvent = [None, None]
self.lastEvent = self.firstEvent
self.elementStack = []
self.push = self.elementStack.append
try:
self.pop = self.elementStack.pop
except AttributeError:
            # fall back to the pop() method defined on the class below
pass
        self._ns_contexts = [{XML_NAMESPACE: 'xml'}]  # contains uri -> prefix dicts
self._current_context = self._ns_contexts[-1]
self.pending_events = []
def pop(self):
result = self.elementStack[-1]
del self.elementStack[-1]
return result
def setDocumentLocator(self, locator):
self._locator = locator
def startPrefixMapping(self, prefix, uri):
if not hasattr(self, '_xmlns_attrs'):
self._xmlns_attrs = []
self._xmlns_attrs.append((prefix or 'xmlns', uri))
self._ns_contexts.append(self._current_context.copy())
self._current_context[uri] = prefix or None
def endPrefixMapping(self, prefix):
self._current_context = self._ns_contexts.pop()
    def startElementNS(self, name, tagName, attrs):
# Retrieve xml namespace declaration attributes.
xmlns_uri = 'http://www.w3.org/2000/xmlns/'
xmlns_attrs = getattr(self, '_xmlns_attrs', None)
if xmlns_attrs is not None:
for aname, value in xmlns_attrs:
attrs._attrs[(xmlns_uri, aname)] = value
self._xmlns_attrs = []
uri, localname = name
if uri:
# When using namespaces, the reader may or may not
# provide us with the original name. If not, create
# *a* valid tagName from the current context.
if tagName is None:
prefix = self._current_context[uri]
if prefix:
tagName = prefix + ":" + localname
else:
tagName = localname
if self.document:
node = self.document.createElementNS(uri, tagName)
else:
node = self.buildDocument(uri, tagName)
else:
# When the tagname is not prefixed, it just appears as
# localname
if self.document:
node = self.document.createElement(localname)
else:
node = self.buildDocument(None, localname)
        for aname, value in attrs.items():
a_uri, a_localname = aname
if a_uri == xmlns_uri:
if a_localname == 'xmlns':
qname = a_localname
else:
qname = 'xmlns:' + a_localname
attr = self.document.createAttributeNS(a_uri, qname)
node.setAttributeNodeNS(attr)
elif a_uri:
prefix = self._current_context[a_uri]
if prefix:
qname = prefix + ":" + a_localname
else:
qname = a_localname
attr = self.document.createAttributeNS(a_uri, qname)
node.setAttributeNodeNS(attr)
else:
attr = self.document.createAttribute(a_localname)
node.setAttributeNode(attr)
attr.value = value
self.lastEvent[1] = [(START_ELEMENT, node), None]
self.lastEvent = self.lastEvent[1]
self.push(node)
def endElementNS(self, name, tagName):
self.lastEvent[1] = [(END_ELEMENT, self.pop()), None]
self.lastEvent = self.lastEvent[1]
def startElement(self, name, attrs):
if self.document:
node = self.document.createElement(name)
else:
node = self.buildDocument(None, name)
        for aname, value in attrs.items():
attr = self.document.createAttribute(aname)
attr.value = value
node.setAttributeNode(attr)
self.lastEvent[1] = [(START_ELEMENT, node), None]
self.lastEvent = self.lastEvent[1]
self.push(node)
def endElement(self, name):
self.lastEvent[1] = [(END_ELEMENT, self.pop()), None]
self.lastEvent = self.lastEvent[1]
def comment(self, s):
if self.document:
node = self.document.createComment(s)
self.lastEvent[1] = [(COMMENT, node), None]
self.lastEvent = self.lastEvent[1]
else:
event = [(COMMENT, s), None]
self.pending_events.append(event)
def processingInstruction(self, target, data):
if self.document:
node = self.document.createProcessingInstruction(target, data)
self.lastEvent[1] = [(PROCESSING_INSTRUCTION, node), None]
self.lastEvent = self.lastEvent[1]
else:
event = [(PROCESSING_INSTRUCTION, target, data), None]
self.pending_events.append(event)
def ignorableWhitespace(self, chars):
node = self.document.createTextNode(chars)
self.lastEvent[1] = [(IGNORABLE_WHITESPACE, node), None]
self.lastEvent = self.lastEvent[1]
def characters(self, chars):
node = self.document.createTextNode(chars)
self.lastEvent[1] = [(CHARACTERS, node), None]
self.lastEvent = self.lastEvent[1]
def startDocument(self):
if self.documentFactory is None:
import xml.dom.minidom
self.documentFactory = xml.dom.minidom.Document.implementation
def buildDocument(self, uri, tagname):
# Can't do that in startDocument, since we need the tagname
# XXX: obtain DocumentType
node = self.documentFactory.createDocument(uri, tagname, None)
self.document = node
self.lastEvent[1] = [(START_DOCUMENT, node), None]
self.lastEvent = self.lastEvent[1]
self.push(node)
# Put everything we have seen so far into the document
for e in self.pending_events:
if e[0][0] == PROCESSING_INSTRUCTION:
_,target,data = e[0]
n = self.document.createProcessingInstruction(target, data)
e[0] = (PROCESSING_INSTRUCTION, n)
elif e[0][0] == COMMENT:
n = self.document.createComment(e[0][1])
e[0] = (COMMENT, n)
else:
raise AssertionError("Unknown pending event ",e[0][0])
self.lastEvent[1] = e
self.lastEvent = e
self.pending_events = None
return node.firstChild
def endDocument(self):
self.lastEvent[1] = [(END_DOCUMENT, self.document), None]
self.pop()
def clear(self):
"clear(): Explicitly release parsing structures"
self.document = None
class ErrorHandler:
def warning(self, exception):
print(exception)
def error(self, exception):
raise exception
def fatalError(self, exception):
raise exception
class DOMEventStream:
def __init__(self, stream, parser, bufsize):
self.stream = stream
self.parser = parser
self.bufsize = bufsize
if not hasattr(self.parser, 'feed'):
self.getEvent = self._slurp
self.reset()
def reset(self):
self.pulldom = PullDOM()
# This content handler relies on namespace support
self.parser.setFeature(xml.sax.handler.feature_namespaces, 1)
self.parser.setContentHandler(self.pulldom)
def __getitem__(self, pos):
rc = self.getEvent()
if rc:
return rc
raise IndexError
def __next__(self):
rc = self.getEvent()
if rc:
return rc
raise StopIteration
def __iter__(self):
return self
def expandNode(self, node):
event = self.getEvent()
parents = [node]
while event:
token, cur_node = event
if cur_node is node:
return
if token != END_ELEMENT:
parents[-1].appendChild(cur_node)
if token == START_ELEMENT:
parents.append(cur_node)
elif token == END_ELEMENT:
del parents[-1]
event = self.getEvent()
def getEvent(self):
# use IncrementalParser interface, so we get the desired
# pull effect
if not self.pulldom.firstEvent[1]:
self.pulldom.lastEvent = self.pulldom.firstEvent
while not self.pulldom.firstEvent[1]:
buf = self.stream.read(self.bufsize)
if not buf:
self.parser.close()
return None
self.parser.feed(buf)
rc = self.pulldom.firstEvent[1][0]
self.pulldom.firstEvent[1] = self.pulldom.firstEvent[1][1]
return rc
def _slurp(self):
""" Fallback replacement for getEvent() using the
standard SAX2 interface, which means we slurp the
SAX events into memory (no performance gain, but
we are compatible to all SAX parsers).
"""
self.parser.parse(self.stream)
self.getEvent = self._emit
return self._emit()
def _emit(self):
""" Fallback replacement for getEvent() that emits
the events that _slurp() read previously.
"""
rc = self.pulldom.firstEvent[1][0]
self.pulldom.firstEvent[1] = self.pulldom.firstEvent[1][1]
return rc
def clear(self):
"""clear(): Explicitly release parsing objects"""
self.pulldom.clear()
del self.pulldom
self.parser = None
self.stream = None
class SAX2DOM(PullDOM):
    def startElementNS(self, name, tagName, attrs):
PullDOM.startElementNS(self, name, tagName, attrs)
curNode = self.elementStack[-1]
parentNode = self.elementStack[-2]
parentNode.appendChild(curNode)
def startElement(self, name, attrs):
PullDOM.startElement(self, name, attrs)
curNode = self.elementStack[-1]
parentNode = self.elementStack[-2]
parentNode.appendChild(curNode)
def processingInstruction(self, target, data):
PullDOM.processingInstruction(self, target, data)
node = self.lastEvent[0][1]
parentNode = self.elementStack[-1]
parentNode.appendChild(node)
def ignorableWhitespace(self, chars):
PullDOM.ignorableWhitespace(self, chars)
node = self.lastEvent[0][1]
parentNode = self.elementStack[-1]
parentNode.appendChild(node)
def characters(self, chars):
PullDOM.characters(self, chars)
node = self.lastEvent[0][1]
parentNode = self.elementStack[-1]
parentNode.appendChild(node)
default_bufsize = (2 ** 14) - 20
def parse(stream_or_string, parser=None, bufsize=None):
if bufsize is None:
bufsize = default_bufsize
if isinstance(stream_or_string, str):
stream = open(stream_or_string, 'rb')
else:
stream = stream_or_string
if not parser:
parser = xml.sax.make_parser()
return DOMEventStream(stream, parser, bufsize)
def parseString(string, parser=None):
from io import StringIO
bufsize = len(string)
buf = StringIO(string)
if not parser:
parser = xml.sax.make_parser()
return DOMEventStream(buf, parser, bufsize)
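# Sketch of the streaming pattern pulldom enables: iterate events and
# expand only the subtrees of interest.  The document literal and the
# "item" tag are illustrative; the helper is never called by this module.
def _example_pulldom_usage():
    events = parseString("<root><item>x</item><item>y</item></root>")
    for event, node in events:
        if event == START_ELEMENT and node.tagName == "item":
            # expandNode() pulls the element's full subtree into the DOM.
            events.expandNode(node)
            assert node.toxml() in ("<item>x</item>", "<item>y</item>")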
"""Implementation of the DOM Level 3 'LS-Load' feature."""
import copy
import xml.dom
from xml.dom.NodeFilter import NodeFilter
__all__ = ["DOMBuilder", "DOMEntityResolver", "DOMInputSource"]
class Options:
"""Features object that has variables set for each DOMBuilder feature.
The DOMBuilder class uses an instance of this class to pass settings to
the ExpatBuilder class.
"""
# Note that the DOMBuilder class in LoadSave constrains which of these
# values can be set using the DOM Level 3 LoadSave feature.
namespaces = 1
namespace_declarations = True
validation = False
external_parameter_entities = True
external_general_entities = True
external_dtd_subset = True
validate_if_schema = False
validate = False
datatype_normalization = False
create_entity_ref_nodes = True
entities = True
whitespace_in_element_content = True
cdata_sections = True
comments = True
charset_overrides_xml_encoding = True
infoset = False
supported_mediatypes_only = False
errorHandler = None
filter = None
class DOMBuilder:
entityResolver = None
errorHandler = None
filter = None
ACTION_REPLACE = 1
ACTION_APPEND_AS_CHILDREN = 2
ACTION_INSERT_AFTER = 3
ACTION_INSERT_BEFORE = 4
_legal_actions = (ACTION_REPLACE, ACTION_APPEND_AS_CHILDREN,
ACTION_INSERT_AFTER, ACTION_INSERT_BEFORE)
def __init__(self):
self._options = Options()
def _get_entityResolver(self):
return self.entityResolver
def _set_entityResolver(self, entityResolver):
self.entityResolver = entityResolver
def _get_errorHandler(self):
return self.errorHandler
def _set_errorHandler(self, errorHandler):
self.errorHandler = errorHandler
def _get_filter(self):
return self.filter
def _set_filter(self, filter):
self.filter = filter
def setFeature(self, name, state):
if self.supportsFeature(name):
            state = 1 if state else 0
try:
settings = self._settings[(_name_xform(name), state)]
except KeyError:
raise xml.dom.NotSupportedErr(
"unsupported feature: %r" % (name,))
else:
for name, value in settings:
setattr(self._options, name, value)
else:
raise xml.dom.NotFoundErr("unknown feature: " + repr(name))
def supportsFeature(self, name):
return hasattr(self._options, _name_xform(name))
def canSetFeature(self, name, state):
        key = (_name_xform(name), 1 if state else 0)
return key in self._settings
# This dictionary maps from (feature,value) to a list of
# (option,value) pairs that should be set on the Options object.
# If a (feature,value) setting is not in this dictionary, it is
# not supported by the DOMBuilder.
#
_settings = {
("namespace_declarations", 0): [
("namespace_declarations", 0)],
("namespace_declarations", 1): [
("namespace_declarations", 1)],
("validation", 0): [
("validation", 0)],
("external_general_entities", 0): [
("external_general_entities", 0)],
("external_general_entities", 1): [
("external_general_entities", 1)],
("external_parameter_entities", 0): [
("external_parameter_entities", 0)],
("external_parameter_entities", 1): [
("external_parameter_entities", 1)],
("validate_if_schema", 0): [
("validate_if_schema", 0)],
("create_entity_ref_nodes", 0): [
("create_entity_ref_nodes", 0)],
("create_entity_ref_nodes", 1): [
("create_entity_ref_nodes", 1)],
("entities", 0): [
("create_entity_ref_nodes", 0),
("entities", 0)],
("entities", 1): [
("entities", 1)],
("whitespace_in_element_content", 0): [
("whitespace_in_element_content", 0)],
("whitespace_in_element_content", 1): [
("whitespace_in_element_content", 1)],
("cdata_sections", 0): [
("cdata_sections", 0)],
("cdata_sections", 1): [
("cdata_sections", 1)],
("comments", 0): [
("comments", 0)],
("comments", 1): [
("comments", 1)],
("charset_overrides_xml_encoding", 0): [
("charset_overrides_xml_encoding", 0)],
("charset_overrides_xml_encoding", 1): [
("charset_overrides_xml_encoding", 1)],
("infoset", 0): [],
("infoset", 1): [
("namespace_declarations", 0),
("validate_if_schema", 0),
("create_entity_ref_nodes", 0),
("entities", 0),
("cdata_sections", 0),
("datatype_normalization", 1),
("whitespace_in_element_content", 1),
("comments", 1),
("charset_overrides_xml_encoding", 1)],
("supported_mediatypes_only", 0): [
("supported_mediatypes_only", 0)],
("namespaces", 0): [
("namespaces", 0)],
("namespaces", 1): [
("namespaces", 1)],
}
def getFeature(self, name):
xname = _name_xform(name)
try:
return getattr(self._options, xname)
except AttributeError:
if name == "infoset":
options = self._options
return (options.datatype_normalization
and options.whitespace_in_element_content
and options.comments
and options.charset_overrides_xml_encoding
and not (options.namespace_declarations
or options.validate_if_schema
or options.create_entity_ref_nodes
or options.entities
or options.cdata_sections))
raise xml.dom.NotFoundErr("feature %s not known" % repr(name))
def parseURI(self, uri):
if self.entityResolver:
input = self.entityResolver.resolveEntity(None, uri)
else:
input = DOMEntityResolver().resolveEntity(None, uri)
return self.parse(input)
def parse(self, input):
options = copy.copy(self._options)
options.filter = self.filter
options.errorHandler = self.errorHandler
fp = input.byteStream
if fp is None and options.systemId:
import urllib.request
fp = urllib.request.urlopen(input.systemId)
return self._parse_bytestream(fp, options)
def parseWithContext(self, input, cnode, action):
if action not in self._legal_actions:
raise ValueError("not a legal action")
raise NotImplementedError("Haven't written this yet...")
def _parse_bytestream(self, stream, options):
import xml.dom.expatbuilder
builder = xml.dom.expatbuilder.makeBuilder(options)
return builder.parseFile(stream)
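# Hedged usage sketch for DOMBuilder (illustrative only; never called by
# this module, and nothing below performs any I/O):
def _example_dombuilder_usage():
    builder = DOMBuilder()
    # Feature names are normalized by _name_xform() and validated
    # against the _settings table above.
    assert builder.supportsFeature("namespaces")
    assert builder.canSetFeature("namespaces", 1)
    builder.setFeature("namespaces", 1)
    assert builder.getFeature("namespaces") == 1
    # A real caller would now pass parse() a DOMInputSource with its
    # byteStream or systemId set, or call parseURI() directly.
    return builder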
def _name_xform(name):
return name.lower().replace('-', '_')
class DOMEntityResolver(object):
__slots__ = '_opener',
def resolveEntity(self, publicId, systemId):
assert systemId is not None
source = DOMInputSource()
source.publicId = publicId
source.systemId = systemId
source.byteStream = self._get_opener().open(systemId)
# determine the encoding if the transport provided it
source.encoding = self._guess_media_encoding(source)
        # determine the base URI if we can
import posixpath, urllib.parse
parts = urllib.parse.urlparse(systemId)
scheme, netloc, path, params, query, fragment = parts
# XXX should we check the scheme here as well?
if path and not path.endswith("/"):
path = posixpath.dirname(path) + "/"
parts = scheme, netloc, path, params, query, fragment
source.baseURI = urllib.parse.urlunparse(parts)
return source
def _get_opener(self):
try:
return self._opener
except AttributeError:
self._opener = self._create_opener()
return self._opener
def _create_opener(self):
import urllib.request
return urllib.request.build_opener()
    def _guess_media_encoding(self, source):
        info = source.byteStream.info()
        # Message.getplist() no longer exists in Python 3; the charset
        # parameter of the Content-Type header is available through
        # get_content_charset(), which already lower-cases its result.
        if "Content-Type" in info:
            return info.get_content_charset()
class DOMInputSource(object):
__slots__ = ('byteStream', 'characterStream', 'stringData',
'encoding', 'publicId', 'systemId', 'baseURI')
def __init__(self):
self.byteStream = None
self.characterStream = None
self.stringData = None
self.encoding = None
self.publicId = None
self.systemId = None
self.baseURI = None
def _get_byteStream(self):
return self.byteStream
def _set_byteStream(self, byteStream):
self.byteStream = byteStream
def _get_characterStream(self):
return self.characterStream
def _set_characterStream(self, characterStream):
self.characterStream = characterStream
def _get_stringData(self):
return self.stringData
def _set_stringData(self, data):
self.stringData = data
def _get_encoding(self):
return self.encoding
def _set_encoding(self, encoding):
self.encoding = encoding
def _get_publicId(self):
return self.publicId
def _set_publicId(self, publicId):
self.publicId = publicId
def _get_systemId(self):
return self.systemId
def _set_systemId(self, systemId):
self.systemId = systemId
def _get_baseURI(self):
return self.baseURI
def _set_baseURI(self, uri):
self.baseURI = uri
class DOMBuilderFilter:
"""Element filter which can be used to tailor construction of
a DOM instance.
"""
# There's really no need for this class; concrete implementations
# should just implement the endElement() and startElement()
# methods as appropriate. Using this makes it easy to only
# implement one of them.
FILTER_ACCEPT = 1
FILTER_REJECT = 2
FILTER_SKIP = 3
FILTER_INTERRUPT = 4
whatToShow = NodeFilter.SHOW_ALL
def _get_whatToShow(self):
return self.whatToShow
def acceptNode(self, element):
return self.FILTER_ACCEPT
def startContainer(self, element):
return self.FILTER_ACCEPT
del NodeFilter
class DocumentLS:
"""Mixin to create documents that conform to the load/save spec."""
async_ = False
def _get_async(self):
return False
def _set_async(self, flag):
if flag:
raise xml.dom.NotSupportedErr(
"asynchronous document loading is not supported")
def abort(self):
# What does it mean to "clear" a document? Does the
# documentElement disappear?
raise NotImplementedError(
"haven't figured out what this means yet")
def load(self, uri):
raise NotImplementedError("haven't written this yet")
def loadXML(self, source):
raise NotImplementedError("haven't written this yet")
def saveXML(self, snode):
if snode is None:
snode = self
elif snode.ownerDocument is not self:
raise xml.dom.WrongDocumentErr()
return snode.toxml()
class DOMImplementationLS:
MODE_SYNCHRONOUS = 1
MODE_ASYNCHRONOUS = 2
def createDOMBuilder(self, mode, schemaType):
if schemaType is not None:
raise xml.dom.NotSupportedErr(
"schemaType not yet supported")
if mode == self.MODE_SYNCHRONOUS:
return DOMBuilder()
if mode == self.MODE_ASYNCHRONOUS:
raise xml.dom.NotSupportedErr(
"asynchronous builders are not supported")
raise ValueError("unknown value for mode")
def createDOMWriter(self):
raise NotImplementedError(
"the writer interface hasn't been written yet!")
def createDOMInputSource(self):
return DOMInputSource()
"""W3C Document Object Model implementation for Python.
The Python mapping of the Document Object Model is documented in the
Python Library Reference in the section on the xml.dom package.
This package contains the following modules:
minidom -- A simple implementation of the Level 1 DOM with namespace
support added (based on the Level 2 specification) and other
minor Level 2 functionality.
pulldom -- DOM builder supporting on-demand tree-building for selected
subtrees of the document.
"""
class Node:
"""Class giving the NodeType constants."""
__slots__ = ()
# DOM implementations may use this as a base class for their own
# Node implementations. If they don't, the constants defined here
# should still be used as the canonical definitions as they match
# the values given in the W3C recommendation. Client code can
# safely refer to these values in all tests of Node.nodeType
# values.
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
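# A typical nodeType test against the constants above (a minimal sketch;
# minidom is used only for concreteness and the XML literal is arbitrary):
def _example_nodetype_test():
    from xml.dom.minidom import parseString
    doc = parseString("<a>text<!--c--></a>")
    kinds = [child.nodeType for child in doc.documentElement.childNodes]
    assert kinds == [Node.TEXT_NODE, Node.COMMENT_NODE]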
#ExceptionCode
INDEX_SIZE_ERR = 1
DOMSTRING_SIZE_ERR = 2
HIERARCHY_REQUEST_ERR = 3
WRONG_DOCUMENT_ERR = 4
INVALID_CHARACTER_ERR = 5
NO_DATA_ALLOWED_ERR = 6
NO_MODIFICATION_ALLOWED_ERR = 7
NOT_FOUND_ERR = 8
NOT_SUPPORTED_ERR = 9
INUSE_ATTRIBUTE_ERR = 10
INVALID_STATE_ERR = 11
SYNTAX_ERR = 12
INVALID_MODIFICATION_ERR = 13
NAMESPACE_ERR = 14
INVALID_ACCESS_ERR = 15
VALIDATION_ERR = 16
class DOMException(Exception):
"""Abstract base class for DOM exceptions.
Exceptions with specific codes are specializations of this class."""
def __init__(self, *args, **kw):
if self.__class__ is DOMException:
raise RuntimeError(
"DOMException should not be instantiated directly")
Exception.__init__(self, *args, **kw)
def _get_code(self):
return self.code
class IndexSizeErr(DOMException):
code = INDEX_SIZE_ERR
class DomstringSizeErr(DOMException):
code = DOMSTRING_SIZE_ERR
class HierarchyRequestErr(DOMException):
code = HIERARCHY_REQUEST_ERR
class WrongDocumentErr(DOMException):
code = WRONG_DOCUMENT_ERR
class InvalidCharacterErr(DOMException):
code = INVALID_CHARACTER_ERR
class NoDataAllowedErr(DOMException):
code = NO_DATA_ALLOWED_ERR
class NoModificationAllowedErr(DOMException):
code = NO_MODIFICATION_ALLOWED_ERR
class NotFoundErr(DOMException):
code = NOT_FOUND_ERR
class NotSupportedErr(DOMException):
code = NOT_SUPPORTED_ERR
class InuseAttributeErr(DOMException):
code = INUSE_ATTRIBUTE_ERR
class InvalidStateErr(DOMException):
code = INVALID_STATE_ERR
class SyntaxErr(DOMException):
code = SYNTAX_ERR
class InvalidModificationErr(DOMException):
code = INVALID_MODIFICATION_ERR
class NamespaceErr(DOMException):
code = NAMESPACE_ERR
class InvalidAccessErr(DOMException):
code = INVALID_ACCESS_ERR
class ValidationErr(DOMException):
code = VALIDATION_ERR
class UserDataHandler:
"""Class giving the operation constants for UserDataHandler.handle()."""
# Based on DOM Level 3 (WD 9 April 2002)
NODE_CLONED = 1
NODE_IMPORTED = 2
NODE_DELETED = 3
NODE_RENAMED = 4
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"
XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"
EMPTY_NAMESPACE = None
EMPTY_PREFIX = None
from .domreg import getDOMImplementation, registerDOMImplementation
# Deprecated alias for xml.etree.ElementTree
from xml.etree.ElementTree import *
#
# ElementTree
# $Id: ElementInclude.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xinclude support for element trees
#
# history:
# 2003-08-15 fl created
# 2003-11-14 fl fixed default loader
#
# Copyright (c) 2003-2004 by Fredrik Lundh. All rights reserved.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
##
# Limited XInclude support for the ElementTree package.
##
import copy
from . import ElementTree
XINCLUDE = "{http://www.w3.org/2001/XInclude}"
XINCLUDE_INCLUDE = XINCLUDE + "include"
XINCLUDE_FALLBACK = XINCLUDE + "fallback"
##
# Fatal include error.
class FatalIncludeError(SyntaxError):
pass
##
# Default loader. This loader reads an included resource from disk.
#
# @param href Resource reference.
# @param parse Parse mode. Either "xml" or "text".
# @param encoding Optional text encoding (UTF-8 by default for "text").
# @return The expanded resource. If the parse mode is "xml", this
# is an ElementTree instance. If the parse mode is "text", this
# is a Unicode string. If the loader fails, it can return None
# or raise an OSError exception.
# @throws OSError If the loader fails to load the resource.
def default_loader(href, parse, encoding=None):
if parse == "xml":
with open(href, 'rb') as file:
data = ElementTree.parse(file).getroot()
else:
if not encoding:
encoding = 'UTF-8'
with open(href, 'r', encoding=encoding) as file:
data = file.read()
return data
##
# Expand XInclude directives.
#
# @param elem Root element.
# @param loader Optional resource loader. If omitted, it defaults
# to {@link default_loader}. If given, it should be a callable
# that implements the same interface as <b>default_loader</b>.
# @throws FatalIncludeError If the function fails to include a given
# resource, or if the tree contains malformed XInclude elements.
# @throws OSError If the function fails to load a given resource.
def include(elem, loader=None):
if loader is None:
loader = default_loader
# look for xinclude elements
i = 0
while i < len(elem):
e = elem[i]
if e.tag == XINCLUDE_INCLUDE:
# process xinclude directive
href = e.get("href")
parse = e.get("parse", "xml")
if parse == "xml":
node = loader(href, parse)
if node is None:
raise FatalIncludeError(
"cannot load %r as %r" % (href, parse)
)
node = copy.copy(node)
if e.tail:
node.tail = (node.tail or "") + e.tail
elem[i] = node
elif parse == "text":
text = loader(href, parse, e.get("encoding"))
if text is None:
raise FatalIncludeError(
"cannot load %r as %r" % (href, parse)
)
if i:
node = elem[i-1]
node.tail = (node.tail or "") + text + (e.tail or "")
else:
elem.text = (elem.text or "") + text + (e.tail or "")
del elem[i]
continue
else:
raise FatalIncludeError(
"unknown parse type in xi:include tag (%r)" % parse
)
elif e.tag == XINCLUDE_FALLBACK:
raise FatalIncludeError(
"xi:fallback tag must be child of xi:include (%r)" % e.tag
)
else:
include(e, loader)
i = i + 1
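# Minimal usage sketch: parse a tree, expand its xi:include directives in
# place with the default loader, then use the tree normally.  The file
# name below is a placeholder; the helper is never called by this module.
def _example_include_usage():
    tree = ElementTree.parse("document.xml")  # placeholder path
    include(tree.getroot())
    return tree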
#
# ElementTree
# $Id: ElementPath.py 3375 2008-02-13 08:05:08Z fredrik $
#
# limited xpath support for element trees
#
# history:
# 2003-05-23 fl created
# 2003-05-28 fl added support for // etc
# 2003-08-27 fl fixed parsing of periods in element names
# 2007-09-10 fl new selection engine
# 2007-09-12 fl fixed parent selector
# 2007-09-13 fl added iterfind; changed findall to return a list
# 2007-11-30 fl added namespaces support
# 2009-10-30 fl added child element value filter
#
# Copyright (c) 2003-2009 by Fredrik Lundh. All rights reserved.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2009 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
##
# Implementation module for XPath support. There's usually no reason
# to import this module directly; the <b>ElementTree</b> does this for
# you, if needed.
##
import re
xpath_tokenizer_re = re.compile(
"("
"'[^']*'|\"[^\"]*\"|"
"::|"
"//?|"
"\.\.|"
"\(\)|"
"[/.*:\[\]\(\)@=])|"
"((?:\{[^}]+\})?[^/\[\]\(\)@=\s]+)|"
"\s+"
)
def xpath_tokenizer(pattern, namespaces=None):
for token in xpath_tokenizer_re.findall(pattern):
tag = token[1]
if tag and tag[0] != "{" and ":" in tag:
try:
prefix, uri = tag.split(":", 1)
if not namespaces:
raise KeyError
yield token[0], "{%s}%s" % (namespaces[prefix], uri)
except KeyError:
raise SyntaxError("prefix %r not found in prefix map" % prefix)
else:
yield token
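# Editor's note: a small illustrative check, not in the original source.
# The tokenizer yields (operator, tag) pairs; prefixed names are expanded
# to {uri}local form using the supplied namespace map. The helper name
# _example_tokenizer is hypothetical.
def _example_tokenizer():
    tokens = list(xpath_tokenizer("a/b", None))
    # [('', 'a'), ('/', ''), ('', 'b')]
    expanded = list(xpath_tokenizer("p:tag", {"p": "urn:demo"}))
    # [('', '{urn:demo}tag')]
    return tokens, expanded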
def get_parent_map(context):
parent_map = context.parent_map
if parent_map is None:
context.parent_map = parent_map = {}
for p in context.root.iter():
for e in p:
parent_map[e] = p
return parent_map
def prepare_child(next, token):
tag = token[1]
def select(context, result):
for elem in result:
for e in elem:
if e.tag == tag:
yield e
return select
def prepare_star(next, token):
def select(context, result):
for elem in result:
yield from elem
return select
def prepare_self(next, token):
def select(context, result):
yield from result
return select
def prepare_descendant(next, token):
token = next()
if token[0] == "*":
tag = "*"
elif not token[0]:
tag = token[1]
else:
raise SyntaxError("invalid descendant")
def select(context, result):
for elem in result:
for e in elem.iter(tag):
if e is not elem:
yield e
return select
def prepare_parent(next, token):
def select(context, result):
# FIXME: raise error if .. is applied at toplevel?
parent_map = get_parent_map(context)
result_map = {}
for elem in result:
if elem in parent_map:
parent = parent_map[elem]
if parent not in result_map:
result_map[parent] = None
yield parent
return select
def prepare_predicate(next, token):
# FIXME: replace with real parser!!! refs:
# http://effbot.org/zone/simple-iterator-parser.htm
# http://javascript.crockford.com/tdop/tdop.html
signature = []
predicate = []
while 1:
token = next()
if token[0] == "]":
break
if token[0] and token[0][:1] in "'\"":
token = "'", token[0][1:-1]
signature.append(token[0] or "-")
predicate.append(token[1])
signature = "".join(signature)
# use signature to determine predicate type
if signature == "@-":
# [@attribute] predicate
key = predicate[1]
def select(context, result):
for elem in result:
if elem.get(key) is not None:
yield elem
return select
if signature == "@-='":
# [@attribute='value']
key = predicate[1]
value = predicate[-1]
def select(context, result):
for elem in result:
if elem.get(key) == value:
yield elem
return select
if signature == "-" and not re.match("\-?\d+$", predicate[0]):
# [tag]
tag = predicate[0]
def select(context, result):
for elem in result:
if elem.find(tag) is not None:
yield elem
return select
if signature == "-='" and not re.match("\-?\d+$", predicate[0]):
# [tag='value']
tag = predicate[0]
value = predicate[-1]
def select(context, result):
for elem in result:
for e in elem.findall(tag):
if "".join(e.itertext()) == value:
yield elem
break
return select
if signature == "-" or signature == "-()" or signature == "-()-":
# [index] or [last()] or [last()-index]
if signature == "-":
# [index]
index = int(predicate[0]) - 1
if index < 0:
raise SyntaxError("XPath position >= 1 expected")
else:
if predicate[0] != "last":
raise SyntaxError("unsupported function")
if signature == "-()-":
try:
index = int(predicate[2]) - 1
except ValueError:
raise SyntaxError("unsupported expression")
if index > -2:
raise SyntaxError("XPath offset from last() must be negative")
else:
index = -1
def select(context, result):
parent_map = get_parent_map(context)
for elem in result:
try:
parent = parent_map[elem]
# FIXME: what if the selector is "*" ?
elems = list(parent.findall(elem.tag))
if elems[index] is elem:
yield elem
except (IndexError, KeyError):
pass
return select
raise SyntaxError("invalid predicate")
ops = {
"": prepare_child,
"*": prepare_star,
".": prepare_self,
"..": prepare_parent,
"//": prepare_descendant,
"[": prepare_predicate,
}
_cache = {}
class _SelectorContext:
parent_map = None
def __init__(self, root):
self.root = root
# --------------------------------------------------------------------
##
# Generate all matching objects.
def iterfind(elem, path, namespaces=None):
# compile selector pattern
cache_key = (path, None if namespaces is None
else tuple(sorted(namespaces.items())))
if path[-1:] == "/":
path = path + "*" # implicit all (FIXME: keep this?)
try:
selector = _cache[cache_key]
except KeyError:
if len(_cache) > 100:
_cache.clear()
if path[:1] == "/":
raise SyntaxError("cannot use absolute path on element")
next = iter(xpath_tokenizer(path, namespaces)).__next__
token = next()
selector = []
while 1:
try:
selector.append(ops[token[0]](next, token))
except StopIteration:
raise SyntaxError("invalid path")
try:
token = next()
if token[0] == "/":
token = next()
except StopIteration:
break
_cache[cache_key] = selector
# execute selector pattern
result = [elem]
context = _SelectorContext(elem)
for select in selector:
result = select(context, result)
return result
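# Editor's note: a hypothetical usage sketch, not part of the original
# source. iterfind() returns a lazy generator pipeline, so matches are
# produced on demand rather than collected up front.
def _example_iterfind():
    import xml.etree.ElementTree as ET
    root = ET.fromstring("<r><item>1</item><item>2</item></r>")
    it = root.iterfind("item")   # nothing is matched yet
    return next(it).text         # '1', evaluated on demand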
##
# Find first matching object.
def find(elem, path, namespaces=None):
try:
return next(iterfind(elem, path, namespaces))
except StopIteration:
return None
##
# Find all matching objects.
def findall(elem, path, namespaces=None):
return list(iterfind(elem, path, namespaces))
##
# Find text for first matching object.
def findtext(elem, path, default=None, namespaces=None):
try:
elem = next(iterfind(elem, path, namespaces))
return elem.text or ""
except StopIteration:
return default
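# Editor's note: an illustrative sketch (not in the original module) of
# the thin wrappers above; findtext() falls back to its *default* when
# no element matches. The helper name is hypothetical.
def _example_findtext():
    import xml.etree.ElementTree as ET
    root = ET.fromstring("<r><name>ada</name></r>")
    return (root.findtext("name"),             # 'ada'
            root.findtext("missing", "n/a"))   # 'n/a' (default)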
"""Lightweight XML support for Python.
XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. This module has two classes for this purpose:
1. ElementTree represents the whole XML document as a tree and
2. Element represents a single node in this tree.
Interactions with the whole document (reading and writing to/from files) are
usually done on the ElementTree level. Interactions with a single XML element
and its sub-elements are done on the Element level.
Element is a flexible container object designed to store hierarchical data
structures in memory. It can be described as a cross between a list and a
dictionary. Each Element has a number of properties associated with it:
'tag' - a string containing the element's name.
'attributes' - a Python dictionary storing the element's attributes.
'text' - a string containing the element's text content.
'tail' - an optional string containing text after the element's end tag.
And a number of child elements stored in a Python sequence.
To create an element instance, use the Element constructor,
or the SubElement factory function.
You can also use the ElementTree class to wrap an element structure
and convert it to and from XML.
"""
#---------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
#
# ElementTree
# Copyright (c) 1999-2008 by Fredrik Lundh. All rights reserved.
#
# [email protected]
# http://www.pythonware.com
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
__all__ = [
# public symbols
"Comment",
"dump",
"Element", "ElementTree",
"fromstring", "fromstringlist",
"iselement", "iterparse",
"parse", "ParseError",
"PI", "ProcessingInstruction",
"QName",
"SubElement",
"tostring", "tostringlist",
"TreeBuilder",
"VERSION",
"XML", "XMLID",
"XMLParser",
"register_namespace",
]
VERSION = "1.3.0"
import sys
import re
import warnings
import io
import contextlib
from . import ElementPath
class ParseError(SyntaxError):
"""An error when parsing an XML document.
In addition to its exception value, a ParseError contains
two extra attributes:
'code' - the specific exception code
'position' - the line and column of the error
"""
pass
# --------------------------------------------------------------------
def iselement(element):
"""Return True if *element* appears to be an Element."""
return hasattr(element, 'tag')
class Element:
"""An XML element.
This class is the reference implementation of the Element interface.
An element's length is its number of subelements. That means if you
want to check if an element is truly empty, you should check BOTH
its length AND its text attribute.
The element tag, attribute names, and attribute values can be either
bytes or strings.
*tag* is the element name. *attrib* is an optional dictionary containing
element attributes. *extra* are additional element attributes given as
keyword arguments.
Example form:
<tag attrib>text<child/>...</tag>tail
"""
tag = None
"""The element's name."""
attrib = None
"""Dictionary of the element's attributes."""
text = None
"""
Text before first subelement. This is either a string or the value None.
Note that if there is no text, this attribute may be either
None or the empty string, depending on the parser.
"""
tail = None
"""
Text after this element's end tag, but before the next sibling element's
start tag. This is either a string or the value None. Note that if there
was no text, this attribute may be either None or an empty string,
depending on the parser.
"""
def __init__(self, tag, attrib={}, **extra):
if not isinstance(attrib, dict):
raise TypeError("attrib must be dict, not %s" % (
attrib.__class__.__name__,))
attrib = attrib.copy()
attrib.update(extra)
self.tag = tag
self.attrib = attrib
self._children = []
def __repr__(self):
return "<Element %s at 0x%x>" % (repr(self.tag), id(self))
def makeelement(self, tag, attrib):
"""Create a new element with the same type.
*tag* is a string containing the element name.
*attrib* is a dictionary containing the element attributes.
Do not call this method, use the SubElement factory function instead.
"""
return self.__class__(tag, attrib)
def copy(self):
"""Return copy of current element.
This creates a shallow copy. Subelements will be shared with the
original tree.
"""
elem = self.makeelement(self.tag, self.attrib)
elem.text = self.text
elem.tail = self.tail
elem[:] = self
return elem
def __len__(self):
return len(self._children)
def __bool__(self):
warnings.warn(
"The behavior of this method will change in future versions. "
"Use specific 'len(elem)' or 'elem is not None' test instead.",
FutureWarning, stacklevel=2
)
return len(self._children) != 0 # emulate old behaviour, for now
def __getitem__(self, index):
return self._children[index]
def __setitem__(self, index, element):
# if isinstance(index, slice):
# for elt in element:
# assert iselement(elt)
# else:
# assert iselement(element)
self._children[index] = element
def __delitem__(self, index):
del self._children[index]
def append(self, subelement):
"""Add *subelement* to the end of this element.
The new element will appear in document order after the last existing
subelement (or directly after the text, if it's the first subelement),
but before the end tag for this element.
"""
self._assert_is_element(subelement)
self._children.append(subelement)
def extend(self, elements):
"""Append subelements from a sequence.
*elements* is a sequence with zero or more elements.
"""
for element in elements:
self._assert_is_element(element)
self._children.extend(elements)
def insert(self, index, subelement):
"""Insert *subelement* at position *index*."""
self._assert_is_element(subelement)
self._children.insert(index, subelement)
def _assert_is_element(self, e):
# Need to refer to the actual Python implementation, not the
# shadowing C implementation.
if not isinstance(e, _Element_Py):
raise TypeError('expected an Element, not %s' % type(e).__name__)
def remove(self, subelement):
"""Remove matching subelement.
Unlike the find methods, this method compares elements based on
identity, NOT ON tag value or contents. To remove subelements by
other means, the easiest way is to use a list comprehension to
select what elements to keep, and then use slice assignment to update
the parent element.
ValueError is raised if a matching element could not be found.
"""
# assert iselement(element)
self._children.remove(subelement)
def getchildren(self):
"""(Deprecated) Return all subelements.
Elements are returned in document order.
"""
warnings.warn(
"This method will be removed in future versions. "
"Use 'list(elem)' or iteration over elem instead.",
DeprecationWarning, stacklevel=2
)
return self._children
def find(self, path, namespaces=None):
"""Find first matching element by tag name or path.
*path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name.
Return the first matching element, or None if no element was found.
"""
return ElementPath.find(self, path, namespaces)
def findtext(self, path, default=None, namespaces=None):
"""Find text for first matching element by tag name or path.
*path* is a string having either an element tag or an XPath,
*default* is the value to return if the element was not found,
*namespaces* is an optional mapping from namespace prefix to full name.
Return text content of first matching element, or default value if
none was found. Note that if an element is found having no text
content, the empty string is returned.
"""
return ElementPath.findtext(self, path, default, namespaces)
def findall(self, path, namespaces=None):
"""Find all matching subelements by tag name or path.
*path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name.
Returns list containing all matching elements in document order.
"""
return ElementPath.findall(self, path, namespaces)
def iterfind(self, path, namespaces=None):
"""Find all matching subelements by tag name or path.
*path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name.
Return an iterable yielding all matching elements in document order.
"""
return ElementPath.iterfind(self, path, namespaces)
def clear(self):
"""Reset element.
This function removes all subelements, clears all attributes, and sets
the text and tail attributes to None.
"""
self.attrib.clear()
self._children = []
self.text = self.tail = None
def get(self, key, default=None):
"""Get element attribute.
Equivalent to attrib.get, but some implementations may handle this a
bit more efficiently. *key* is what attribute to look for, and
*default* is what to return if the attribute was not found.
Returns a string containing the attribute value, or the default if
attribute was not found.
"""
return self.attrib.get(key, default)
def set(self, key, value):
"""Set element attribute.
Equivalent to attrib[key] = value, but some implementations may handle
this a bit more efficiently. *key* is what attribute to set, and
*value* is the attribute value to set it to.
"""
self.attrib[key] = value
def keys(self):
"""Get list of attribute names.
Names are returned in an arbitrary order, just like an ordinary
Python dict. Equivalent to attrib.keys()
"""
return self.attrib.keys()
def items(self):
"""Get element attributes as a sequence.
The attributes are returned in arbitrary order. Equivalent to
attrib.items().
Return a list of (name, value) tuples.
"""
return self.attrib.items()
def iter(self, tag=None):
"""Create tree iterator.
The iterator loops over the element and all subelements in document
order, returning all elements with a matching tag.
If the tree structure is modified during iteration, new or removed
elements may or may not be included. To get a stable set, use the
list() function on the iterator, and loop over the resulting list.
*tag* is what tags to look for (default is to return all elements)
Return an iterator containing all the matching elements.
"""
if tag == "*":
tag = None
if tag is None or self.tag == tag:
yield self
for e in self._children:
yield from e.iter(tag)
# compatibility
def getiterator(self, tag=None):
# Change for a DeprecationWarning in 1.4
warnings.warn(
"This method will be removed in future versions. "
"Use 'elem.iter()' or 'list(elem.iter())' instead.",
PendingDeprecationWarning, stacklevel=2
)
return list(self.iter(tag))
def itertext(self):
"""Create text iterator.
The iterator loops over the element and all subelements in document
order, returning all inner text.
"""
tag = self.tag
if not isinstance(tag, str) and tag is not None:
return
if self.text:
yield self.text
for e in self:
yield from e.itertext()
if e.tail:
yield e.tail
def SubElement(parent, tag, attrib={}, **extra):
"""Subelement factory which creates an element instance, and appends it
to an existing parent.
The element tag, attribute names, and attribute values can be either
bytes or Unicode strings.
    *parent* is the parent element, *tag* is the subelement's name, *attrib* is
    an optional dictionary containing element attributes, *extra* are
additional attributes given as keyword arguments.
"""
attrib = attrib.copy()
attrib.update(extra)
element = parent.makeelement(tag, attrib)
parent.append(element)
return element
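# Editor's note: a minimal, hypothetical sketch (not part of the original
# source) showing Element and SubElement building a small tree.
def _example_build_tree():
    import xml.etree.ElementTree as ET
    root = ET.Element("config")
    db = ET.SubElement(root, "db", {"host": "localhost"}, port="5432")
    db.text = "primary"
    return ET.tostring(root, encoding="unicode")
    # '<config><db host="localhost" port="5432">primary</db></config>'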
def Comment(text=None):
"""Comment element factory.
This function creates a special element which the standard serializer
serializes as an XML comment.
*text* is a string containing the comment string.
"""
element = Element(Comment)
element.text = text
return element
def ProcessingInstruction(target, text=None):
"""Processing Instruction element factory.
This function creates a special element which the standard serializer
    serializes as an XML processing instruction.
*target* is a string containing the processing instruction, *text* is a
string containing the processing instruction contents, if any.
"""
element = Element(ProcessingInstruction)
element.text = target
if text:
element.text = element.text + " " + text
return element
PI = ProcessingInstruction
class QName:
"""Qualified name wrapper.
This class can be used to wrap a QName attribute value in order to get
proper namespace handing on output.
*text_or_uri* is a string containing the QName value either in the form
{uri}local, or if the tag argument is given, the URI part of a QName.
*tag* is an optional argument which if given, will make the first
argument (text_or_uri) be interpreted as a URI, and this argument (tag)
be interpreted as a local name.
"""
def __init__(self, text_or_uri, tag=None):
if tag:
text_or_uri = "{%s}%s" % (text_or_uri, tag)
self.text = text_or_uri
def __str__(self):
return self.text
def __repr__(self):
return '<QName %r>' % (self.text,)
def __hash__(self):
return hash(self.text)
def __le__(self, other):
if isinstance(other, QName):
return self.text <= other.text
return self.text <= other
def __lt__(self, other):
if isinstance(other, QName):
return self.text < other.text
return self.text < other
def __ge__(self, other):
if isinstance(other, QName):
return self.text >= other.text
return self.text >= other
def __gt__(self, other):
if isinstance(other, QName):
return self.text > other.text
return self.text > other
def __eq__(self, other):
if isinstance(other, QName):
return self.text == other.text
return self.text == other
def __ne__(self, other):
if isinstance(other, QName):
return self.text != other.text
return self.text != other
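# Editor's note: an illustrative sketch, not in the original source. A
# QName tag is expanded to {uri}local form, and the serializer reuses a
# well-known prefix for the URI when one is registered.
def _example_qname():
    import xml.etree.ElementTree as ET
    uri = "http://www.w3.org/1999/xhtml"
    elem = ET.Element(ET.QName(uri, "body"))   # same as "{%s}body" % uri
    return ET.tostring(elem, encoding="unicode")
    # '<html:body xmlns:html="http://www.w3.org/1999/xhtml" />'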
# --------------------------------------------------------------------
class ElementTree:
"""An XML element hierarchy.
This class also provides support for serialization to and from
standard XML.
*element* is an optional root element node,
*file* is an optional file handle or file name of an XML file whose
contents will be used to initialize the tree with.
"""
def __init__(self, element=None, file=None):
# assert element is None or iselement(element)
self._root = element # first node
if file:
self.parse(file)
def getroot(self):
"""Return root element of this tree."""
return self._root
def _setroot(self, element):
"""Replace root element of this tree.
This will discard the current contents of the tree and replace it
with the given element. Use with care!
"""
# assert iselement(element)
self._root = element
def parse(self, source, parser=None):
"""Load external XML document into element tree.
*source* is a file name or file object, *parser* is an optional parser
instance that defaults to XMLParser.
ParseError is raised if the parser fails to parse the document.
Returns the root element of the given source document.
"""
close_source = False
if not hasattr(source, "read"):
source = open(source, "rb")
close_source = True
try:
if parser is None:
# If no parser was specified, create a default XMLParser
parser = XMLParser()
if hasattr(parser, '_parse_whole'):
# The default XMLParser, when it comes from an accelerator,
# can define an internal _parse_whole API for efficiency.
# It can be used to parse the whole source without feeding
# it with chunks.
self._root = parser._parse_whole(source)
return self._root
while True:
data = source.read(65536)
if not data:
break
parser.feed(data)
self._root = parser.close()
return self._root
finally:
if close_source:
source.close()
def iter(self, tag=None):
"""Create and return tree iterator for the root element.
The iterator loops over all elements in this tree, in document order.
*tag* is a string with the tag name to iterate over
(default is to return all elements).
"""
# assert self._root is not None
return self._root.iter(tag)
# compatibility
def getiterator(self, tag=None):
# Change for a DeprecationWarning in 1.4
warnings.warn(
"This method will be removed in future versions. "
"Use 'tree.iter()' or 'list(tree.iter())' instead.",
PendingDeprecationWarning, stacklevel=2
)
return list(self.iter(tag))
def find(self, path, namespaces=None):
"""Find first matching element by tag name or path.
Same as getroot().find(path), which is Element.find()
*path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name.
Return the first matching element, or None if no element was found.
"""
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.find(path, namespaces)
def findtext(self, path, default=None, namespaces=None):
"""Find first matching element by tag name or path.
Same as getroot().findtext(path), which is Element.findtext()
*path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name.
Return the first matching element, or None if no element was found.
"""
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.findtext(path, default, namespaces)
def findall(self, path, namespaces=None):
"""Find all matching subelements by tag name or path.
Same as getroot().findall(path), which is Element.findall().
*path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name.
Return list containing all matching elements in document order.
"""
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.findall(path, namespaces)
def iterfind(self, path, namespaces=None):
"""Find all matching subelements by tag name or path.
Same as getroot().iterfind(path), which is element.iterfind()
*path* is a string having either an element tag or an XPath,
*namespaces* is an optional mapping from namespace prefix to full name.
Return an iterable yielding all matching elements in document order.
"""
# assert self._root is not None
if path[:1] == "/":
path = "." + path
warnings.warn(
"This search is broken in 1.3 and earlier, and will be "
"fixed in a future version. If you rely on the current "
"behaviour, change it to %r" % path,
FutureWarning, stacklevel=2
)
return self._root.iterfind(path, namespaces)
def write(self, file_or_filename,
encoding=None,
xml_declaration=None,
default_namespace=None,
method=None, *,
short_empty_elements=True):
"""Write element tree to a file as XML.
Arguments:
*file_or_filename* -- file name or a file object opened for writing
*encoding* -- the output encoding (default: US-ASCII)
*xml_declaration* -- bool indicating if an XML declaration should be
added to the output. If None, an XML declaration
is added if encoding IS NOT either of:
US-ASCII, UTF-8, or Unicode
*default_namespace* -- sets the default XML namespace (for "xmlns")
        *method* -- either "xml" (default), "html", "text", or "c14n"
*short_empty_elements* -- controls the formatting of elements
that contain no content. If True (default)
they are emitted as a single self-closed
tag, otherwise they are emitted as a pair
of start/end tags
"""
if not method:
method = "xml"
elif method not in _serialize:
raise ValueError("unknown method %r" % method)
if not encoding:
if method == "c14n":
encoding = "utf-8"
else:
encoding = "us-ascii"
enc_lower = encoding.lower()
with _get_writer(file_or_filename, enc_lower) as write:
if method == "xml" and (xml_declaration or
(xml_declaration is None and
enc_lower not in ("utf-8", "us-ascii", "unicode"))):
declared_encoding = encoding
if enc_lower == "unicode":
# Retrieve the default encoding for the xml declaration
import locale
declared_encoding = locale.getpreferredencoding()
write("<?xml version='1.0' encoding='%s'?>\n" % (
declared_encoding,))
if method == "text":
_serialize_text(write, self._root)
else:
qnames, namespaces = _namespaces(self._root, default_namespace)
serialize = _serialize[method]
serialize(write, self._root, qnames, namespaces,
short_empty_elements=short_empty_elements)
def write_c14n(self, file):
# lxml.etree compatibility. use output method instead
return self.write(file, method="c14n")
# --------------------------------------------------------------------
# serialization support
@contextlib.contextmanager
def _get_writer(file_or_filename, encoding):
# returns text write method and release all resources after using
try:
write = file_or_filename.write
except AttributeError:
# file_or_filename is a file name
if encoding == "unicode":
file = open(file_or_filename, "w")
else:
file = open(file_or_filename, "w", encoding=encoding,
errors="xmlcharrefreplace")
with file:
yield file.write
else:
# file_or_filename is a file-like object
# encoding determines if it is a text or binary writer
if encoding == "unicode":
# use a text writer as is
yield write
else:
# wrap a binary writer with TextIOWrapper
with contextlib.ExitStack() as stack:
if isinstance(file_or_filename, io.BufferedIOBase):
file = file_or_filename
elif isinstance(file_or_filename, io.RawIOBase):
file = io.BufferedWriter(file_or_filename)
# Keep the original file open when the BufferedWriter is
# destroyed
stack.callback(file.detach)
else:
# This is to handle passed objects that aren't in the
# IOBase hierarchy, but just have a write method
file = io.BufferedIOBase()
file.writable = lambda: True
file.write = write
try:
                # TextIOWrapper uses these methods to determine
                # if a BOM (for UTF-16, etc.) should be added
file.seekable = file_or_filename.seekable
file.tell = file_or_filename.tell
except AttributeError:
pass
file = io.TextIOWrapper(file,
encoding=encoding,
errors="xmlcharrefreplace",
newline="\n")
# Keep the original file open when the TextIOWrapper is
# destroyed
stack.callback(file.detach)
yield file.write
def _namespaces(elem, default_namespace=None):
# identify namespaces used in this tree
# maps qnames to *encoded* prefix:local names
qnames = {None: None}
# maps uri:s to prefixes
namespaces = {}
if default_namespace:
namespaces[default_namespace] = ""
def add_qname(qname):
# calculate serialized qname representation
try:
if qname[:1] == "{":
uri, tag = qname[1:].rsplit("}", 1)
prefix = namespaces.get(uri)
if prefix is None:
prefix = _namespace_map.get(uri)
if prefix is None:
prefix = "ns%d" % len(namespaces)
if prefix != "xml":
namespaces[uri] = prefix
if prefix:
qnames[qname] = "%s:%s" % (prefix, tag)
else:
qnames[qname] = tag # default element
else:
if default_namespace:
# FIXME: can this be handled in XML 1.0?
raise ValueError(
"cannot use non-qualified names with "
"default_namespace option"
)
qnames[qname] = qname
except TypeError:
_raise_serialization_error(qname)
# populate qname and namespaces table
for elem in elem.iter():
tag = elem.tag
if isinstance(tag, QName):
if tag.text not in qnames:
add_qname(tag.text)
elif isinstance(tag, str):
if tag not in qnames:
add_qname(tag)
elif tag is not None and tag is not Comment and tag is not PI:
_raise_serialization_error(tag)
for key, value in elem.items():
if isinstance(key, QName):
key = key.text
if key not in qnames:
add_qname(key)
if isinstance(value, QName) and value.text not in qnames:
add_qname(value.text)
text = elem.text
if isinstance(text, QName) and text.text not in qnames:
add_qname(text.text)
return qnames, namespaces
def _serialize_xml(write, elem, qnames, namespaces,
short_empty_elements, **kwargs):
tag = elem.tag
text = elem.text
if tag is Comment:
write("<!--%s-->" % text)
elif tag is ProcessingInstruction:
write("<?%s?>" % text)
else:
tag = qnames[tag]
if tag is None:
if text:
write(_escape_cdata(text))
for e in elem:
_serialize_xml(write, e, qnames, None,
short_empty_elements=short_empty_elements)
else:
write("<" + tag)
items = list(elem.items())
if items or namespaces:
if namespaces:
for v, k in sorted(namespaces.items(),
key=lambda x: x[1]): # sort on prefix
if k:
k = ":" + k
write(" xmlns%s=\"%s\"" % (
k,
_escape_attrib(v)
))
for k, v in sorted(items): # lexical order
if isinstance(k, QName):
k = k.text
if isinstance(v, QName):
v = qnames[v.text]
else:
v = _escape_attrib(v)
write(" %s=\"%s\"" % (qnames[k], v))
if text or len(elem) or not short_empty_elements:
write(">")
if text:
write(_escape_cdata(text))
for e in elem:
_serialize_xml(write, e, qnames, None,
short_empty_elements=short_empty_elements)
write("</" + tag + ">")
else:
write(" />")
if elem.tail:
write(_escape_cdata(elem.tail))
HTML_EMPTY = ("area", "base", "basefont", "br", "col", "frame", "hr",
"img", "input", "isindex", "link", "meta", "param")
try:
HTML_EMPTY = set(HTML_EMPTY)
except NameError:
pass
def _serialize_html(write, elem, qnames, namespaces, **kwargs):
tag = elem.tag
text = elem.text
if tag is Comment:
write("<!--%s-->" % _escape_cdata(text))
elif tag is ProcessingInstruction:
write("<?%s?>" % _escape_cdata(text))
else:
tag = qnames[tag]
if tag is None:
if text:
write(_escape_cdata(text))
for e in elem:
_serialize_html(write, e, qnames, None)
else:
write("<" + tag)
items = list(elem.items())
if items or namespaces:
if namespaces:
for v, k in sorted(namespaces.items(),
key=lambda x: x[1]): # sort on prefix
if k:
k = ":" + k
write(" xmlns%s=\"%s\"" % (
k,
_escape_attrib(v)
))
for k, v in sorted(items): # lexical order
if isinstance(k, QName):
k = k.text
if isinstance(v, QName):
v = qnames[v.text]
else:
v = _escape_attrib_html(v)
# FIXME: handle boolean attributes
write(" %s=\"%s\"" % (qnames[k], v))
write(">")
ltag = tag.lower()
if text:
if ltag == "script" or ltag == "style":
write(text)
else:
write(_escape_cdata(text))
for e in elem:
_serialize_html(write, e, qnames, None)
if ltag not in HTML_EMPTY:
write("</" + tag + ">")
if elem.tail:
write(_escape_cdata(elem.tail))
def _serialize_text(write, elem):
for part in elem.itertext():
write(part)
if elem.tail:
write(elem.tail)
_serialize = {
"xml": _serialize_xml,
"html": _serialize_html,
"text": _serialize_text,
# this optional method is imported at the end of the module
# "c14n": _serialize_c14n,
}
def register_namespace(prefix, uri):
"""Register a namespace prefix.
The registry is global, and any existing mapping for either the
given prefix or the namespace URI will be removed.
*prefix* is the namespace prefix, *uri* is a namespace uri. Tags and
attributes in this namespace will be serialized with prefix if possible.
ValueError is raised if prefix is reserved or is invalid.
"""
if re.match("ns\d+$", prefix):
raise ValueError("Prefix format reserved for internal use")
for k, v in list(_namespace_map.items()):
if k == uri or v == prefix:
del _namespace_map[k]
_namespace_map[uri] = prefix
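# Editor's note: an illustrative sketch (not part of the original source)
# showing a registered prefix being picked up during serialization.
def _example_register_namespace():
    import xml.etree.ElementTree as ET
    ET.register_namespace("ex", "urn:example")
    elem = ET.Element("{urn:example}title")
    return ET.tostring(elem, encoding="unicode")
    # '<ex:title xmlns:ex="urn:example" />'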
_namespace_map = {
# "well-known" namespace prefixes
"http://www.w3.org/XML/1998/namespace": "xml",
"http://www.w3.org/1999/xhtml": "html",
"http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
"http://schemas.xmlsoap.org/wsdl/": "wsdl",
# xml schema
"http://www.w3.org/2001/XMLSchema": "xs",
"http://www.w3.org/2001/XMLSchema-instance": "xsi",
# dublin core
"http://purl.org/dc/elements/1.1/": "dc",
}
# For tests and troubleshooting
register_namespace._namespace_map = _namespace_map
def _raise_serialization_error(text):
raise TypeError(
"cannot serialize %r (type %s)" % (text, type(text).__name__)
)
def _escape_cdata(text):
# escape character data
try:
        # it's worth avoiding do-nothing calls for strings that are
        # shorter than 500 characters or so; assume that's, by far,
        # the most common case in most applications.
        if "&" in text:
            text = text.replace("&", "&amp;")
        if "<" in text:
            text = text.replace("<", "&lt;")
        if ">" in text:
            text = text.replace(">", "&gt;")
return text
except (TypeError, AttributeError):
_raise_serialization_error(text)
def _escape_attrib(text):
# escape attribute value
try:
if "&" in text:
text = text.replace("&", "&")
if "<" in text:
text = text.replace("<", "<")
if ">" in text:
text = text.replace(">", ">")
if "\"" in text:
text = text.replace("\"", """)
if "\n" in text:
text = text.replace("\n", " ")
return text
except (TypeError, AttributeError):
_raise_serialization_error(text)
def _escape_attrib_html(text):
# escape attribute value
try:
if "&" in text:
text = text.replace("&", "&")
if ">" in text:
text = text.replace(">", ">")
if "\"" in text:
text = text.replace("\"", """)
return text
except (TypeError, AttributeError):
_raise_serialization_error(text)
# --------------------------------------------------------------------
def tostring(element, encoding=None, method=None, *,
short_empty_elements=True):
"""Generate string representation of XML element.
All subelements are included. If encoding is "unicode", a string
is returned. Otherwise a bytestring is returned.
*element* is an Element instance, *encoding* is an optional output
    encoding defaulting to US-ASCII, *method* is an optional output method which can
be one of "xml" (default), "html", "text" or "c14n".
Returns an (optionally) encoded string containing the XML data.
"""
stream = io.StringIO() if encoding == 'unicode' else io.BytesIO()
ElementTree(element).write(stream, encoding, method=method,
short_empty_elements=short_empty_elements)
return stream.getvalue()
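# Editor's note: a minimal, hypothetical sketch (not in the original
# module) contrasting the bytes and str forms of tostring().
def _example_tostring():
    import xml.etree.ElementTree as ET
    e = ET.fromstring("<r>data</r>")
    return (ET.tostring(e),                      # b'<r>data</r>'
            ET.tostring(e, encoding="unicode"))  # '<r>data</r>'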
class _ListDataStream(io.BufferedIOBase):
"""An auxiliary stream accumulating into a list reference."""
def __init__(self, lst):
self.lst = lst
def writable(self):
return True
def seekable(self):
return True
def write(self, b):
self.lst.append(b)
def tell(self):
return len(self.lst)
def tostringlist(element, encoding=None, method=None, *,
short_empty_elements=True):
lst = []
stream = _ListDataStream(lst)
ElementTree(element).write(stream, encoding, method=method,
short_empty_elements=short_empty_elements)
return lst
def dump(elem):
"""Write element tree or element structure to sys.stdout.
This function should be used for debugging only.
*elem* is either an ElementTree, or a single Element. The exact output
format is implementation dependent. In this version, it's written as an
ordinary XML file.
"""
# debugging
if not isinstance(elem, ElementTree):
elem = ElementTree(elem)
elem.write(sys.stdout, encoding="unicode")
tail = elem.getroot().tail
if not tail or tail[-1] != "\n":
sys.stdout.write("\n")
# --------------------------------------------------------------------
# parsing
def parse(source, parser=None):
"""Parse XML document into element tree.
*source* is a filename or file object containing XML data,
*parser* is an optional parser instance defaulting to XMLParser.
Return an ElementTree instance.
"""
tree = ElementTree()
tree.parse(source, parser)
return tree
def iterparse(source, events=None, parser=None):
"""Incrementally parse XML document into ElementTree.
    The returned iterator also reports what's going on to the user, based on the
*events* it is initialized with. The supported events are the strings
"start", "end", "start-ns" and "end-ns" (the "ns" events are used to get
detailed namespace information). If *events* is omitted, only
"end" events are reported.
*source* is a filename or file object containing XML data, *events* is
a list of events to report back, *parser* is an optional parser instance.
Returns an iterator providing (event, elem) pairs.
"""
close_source = False
if not hasattr(source, "read"):
source = open(source, "rb")
close_source = True
try:
return _IterParseIterator(source, events, parser, close_source)
except:
if close_source:
source.close()
raise
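# Editor's note: a hypothetical sketch (not part of the original source)
# of incremental parsing; clearing finished elements keeps memory bounded
# on large documents.
def _example_iterparse():
    import io
    import xml.etree.ElementTree as ET
    data = io.BytesIO(b"<log><rec>1</rec><rec>2</rec></log>")
    texts = []
    for event, elem in ET.iterparse(data, events=("end",)):
        if elem.tag == "rec":
            texts.append(elem.text)
            elem.clear()   # drop the finished subtree
    return texts           # ['1', '2']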
class XMLPullParser:
def __init__(self, events=None, *, _parser=None):
# The _parser argument is for internal use only and must not be relied
# upon in user code. It will be removed in a future release.
# See http://bugs.python.org/issue17741 for more details.
# _elementtree.c expects a list, not a deque
self._events_queue = []
self._index = 0
self._parser = _parser or XMLParser(target=TreeBuilder())
# wire up the parser for event reporting
if events is None:
events = ("end",)
self._parser._setevents(self._events_queue, events)
def feed(self, data):
"""Feed encoded data to parser."""
if self._parser is None:
raise ValueError("feed() called after end of stream")
if data:
try:
self._parser.feed(data)
except SyntaxError as exc:
self._events_queue.append(exc)
def _close_and_return_root(self):
# iterparse needs this to set its root attribute properly :(
root = self._parser.close()
self._parser = None
return root
def close(self):
"""Finish feeding data to parser.
Unlike XMLParser, does not return the root element. Use
read_events() to consume elements from XMLPullParser.
"""
self._close_and_return_root()
def read_events(self):
"""Return an iterator over currently available (event, elem) pairs.
Events are consumed from the internal event queue as they are
retrieved from the iterator.
"""
events = self._events_queue
while True:
index = self._index
try:
event = events[self._index]
# Avoid retaining references to past events
events[self._index] = None
except IndexError:
break
index += 1
# Compact the list in a O(1) amortized fashion
            # As noted above, _elementtree.c needs a list, not a deque
if index * 2 >= len(events):
events[:index] = []
self._index = 0
else:
self._index = index
if isinstance(event, Exception):
raise event
else:
yield event
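# Editor's note: an illustrative, non-original sketch of the pull API:
# feed() accepts arbitrary chunks and read_events() drains whatever
# events have become available so far.
def _example_pull_parser():
    import xml.etree.ElementTree as ET
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed("<root><child/>")
    events = [(ev, el.tag) for ev, el in parser.read_events()]
    # [('start', 'root'), ('start', 'child'), ('end', 'child')]
    parser.feed("</root>")
    parser.close()
    events += [(ev, el.tag) for ev, el in parser.read_events()]
    return events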
class _IterParseIterator:
def __init__(self, source, events, parser, close_source=False):
# Use the internal, undocumented _parser argument for now; When the
# parser argument of iterparse is removed, this can be killed.
self._parser = XMLPullParser(events=events, _parser=parser)
self._file = source
self._close_file = close_source
self.root = self._root = None
def __next__(self):
try:
while 1:
for event in self._parser.read_events():
return event
if self._parser._parser is None:
break
# load event buffer
data = self._file.read(16 * 1024)
if data:
self._parser.feed(data)
else:
self._root = self._parser._close_and_return_root()
self.root = self._root
except:
if self._close_file:
self._file.close()
raise
if self._close_file:
self._file.close()
raise StopIteration
def __iter__(self):
return self
def XML(text, parser=None):
"""Parse XML document from string constant.
This function can be used to embed "XML Literals" in Python code.
*text* is a string containing XML data, *parser* is an
optional parser instance, defaulting to the standard XMLParser.
Returns an Element instance.
"""
if not parser:
parser = XMLParser(target=TreeBuilder())
parser.feed(text)
return parser.close()
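# Editor's note: a tiny hypothetical sketch (not in the original module)
# of the "XML literal" helper; fromstring() below is the same function.
def _example_xml_literal():
    import xml.etree.ElementTree as ET
    root = ET.XML("<root><child/></root>")
    return root.tag, len(root)   # ('root', 1)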
def XMLID(text, parser=None):
"""Parse XML document from string constant for its IDs.
*text* is a string containing XML data, *parser* is an
optional parser instance, defaulting to the standard XMLParser.
Returns an (Element, dict) tuple, in which the
dict maps element id:s to elements.
"""
if not parser:
parser = XMLParser(target=TreeBuilder())
parser.feed(text)
tree = parser.close()
ids = {}
for elem in tree.iter():
id = elem.get("id")
if id:
ids[id] = elem
return tree, ids
# Parse XML document from string constant. Alias for XML().
fromstring = XML
def fromstringlist(sequence, parser=None):
"""Parse XML document from sequence of string fragments.
    *sequence* is a list or other sequence of strings, *parser* is an optional parser
instance, defaulting to the standard XMLParser.
Returns an Element instance.
"""
if not parser:
parser = XMLParser(target=TreeBuilder())
for text in sequence:
parser.feed(text)
return parser.close()
# --------------------------------------------------------------------
class TreeBuilder:
"""Generic element structure builder.
This builder converts a sequence of start, data, and end method
calls to a well-formed element structure.
You can use this class to build an element structure using a custom XML
parser, or a parser for some other XML-like format.
*element_factory* is an optional element factory which is called
to create new Element instances, as necessary.
"""
def __init__(self, element_factory=None):
self._data = [] # data collector
self._elem = [] # element stack
self._last = None # last element
self._tail = None # true if we're after an end tag
if element_factory is None:
element_factory = Element
self._factory = element_factory
def close(self):
"""Flush builder buffers and return toplevel document Element."""
assert len(self._elem) == 0, "missing end tags"
assert self._last is not None, "missing toplevel element"
return self._last
def _flush(self):
if self._data:
if self._last is not None:
text = "".join(self._data)
if self._tail:
assert self._last.tail is None, "internal error (tail)"
self._last.tail = text
else:
assert self._last.text is None, "internal error (text)"
self._last.text = text
self._data = []
def data(self, data):
"""Add text to current element."""
self._data.append(data)
def start(self, tag, attrs):
"""Open new element and return it.
*tag* is the element name, *attrs* is a dict containing element
attributes.
"""
self._flush()
self._last = elem = self._factory(tag, attrs)
if self._elem:
self._elem[-1].append(elem)
self._elem.append(elem)
self._tail = 0
return elem
def end(self, tag):
"""Close and return current Element.
*tag* is the element name.
"""
self._flush()
self._last = self._elem.pop()
assert self._last.tag == tag,\
"end tag mismatch (expected %s, got %s)" % (
self._last.tag, tag)
self._tail = 1
return self._last
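# Editor's note: an illustrative sketch (not part of the original source)
# driving TreeBuilder by hand, as a custom parser would.
def _example_treebuilder():
    import xml.etree.ElementTree as ET
    b = ET.TreeBuilder()
    b.start("root", {})
    b.data("hi")
    b.end("root")
    return ET.tostring(b.close(), encoding="unicode")   # '<root>hi</root>'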
# also see ElementTree and TreeBuilder
class XMLParser:
"""Element structure builder for XML source data based on the expat parser.
*html* are predefined HTML entities (not supported currently),
*target* is an optional target object which defaults to an instance of the
standard TreeBuilder class, *encoding* is an optional encoding string
which if given, overrides the encoding specified in the XML file:
http://www.iana.org/assignments/character-sets
"""
def __init__(self, html=0, target=None, encoding=None):
try:
from xml.parsers import expat
except ImportError:
try:
import pyexpat as expat
except ImportError:
raise ImportError(
"No module named expat; use SimpleXMLTreeBuilder instead"
)
parser = expat.ParserCreate(encoding, "}")
if target is None:
target = TreeBuilder()
# underscored names are provided for compatibility only
self.parser = self._parser = parser
self.target = self._target = target
self._error = expat.error
self._names = {} # name memo cache
# main callbacks
parser.DefaultHandlerExpand = self._default
if hasattr(target, 'start'):
parser.StartElementHandler = self._start
if hasattr(target, 'end'):
parser.EndElementHandler = self._end
if hasattr(target, 'data'):
parser.CharacterDataHandler = target.data
# miscellaneous callbacks
if hasattr(target, 'comment'):
parser.CommentHandler = target.comment
if hasattr(target, 'pi'):
parser.ProcessingInstructionHandler = target.pi
# Configure pyexpat: buffering, new-style attribute handling.
parser.buffer_text = 1
parser.ordered_attributes = 1
parser.specified_attributes = 1
self._doctype = None
self.entity = {}
try:
self.version = "Expat %d.%d.%d" % expat.version_info
except AttributeError:
pass # unknown
def _setevents(self, events_queue, events_to_report):
# Internal API for XMLPullParser
# events_to_report: a list of events to report during parsing (same as
# the *events* of XMLPullParser's constructor.
# events_queue: a list of actual parsing events that will be populated
# by the underlying parser.
#
parser = self._parser
append = events_queue.append
for event_name in events_to_report:
if event_name == "start":
parser.ordered_attributes = 1
parser.specified_attributes = 1
def handler(tag, attrib_in, event=event_name, append=append,
start=self._start):
append((event, start(tag, attrib_in)))
parser.StartElementHandler = handler
elif event_name == "end":
def handler(tag, event=event_name, append=append,
end=self._end):
append((event, end(tag)))
parser.EndElementHandler = handler
elif event_name == "start-ns":
def handler(prefix, uri, event=event_name, append=append):
append((event, (prefix or "", uri or "")))
parser.StartNamespaceDeclHandler = handler
elif event_name == "end-ns":
def handler(prefix, event=event_name, append=append):
append((event, None))
parser.EndNamespaceDeclHandler = handler
else:
raise ValueError("unknown event %r" % event_name)
def _raiseerror(self, value):
err = ParseError(value)
err.code = value.code
err.position = value.lineno, value.offset
raise err
def _fixname(self, key):
# expand qname, and convert name string to ascii, if possible
try:
name = self._names[key]
except KeyError:
name = key
if "}" in name:
name = "{" + name
self._names[key] = name
return name
def _start(self, tag, attr_list):
# Handler for expat's StartElementHandler. Since ordered_attributes
# is set, the attributes are reported as a list of alternating
# attribute name,value.
fixname = self._fixname
tag = fixname(tag)
attrib = {}
if attr_list:
for i in range(0, len(attr_list), 2):
attrib[fixname(attr_list[i])] = attr_list[i+1]
return self.target.start(tag, attrib)
def _end(self, tag):
return self.target.end(self._fixname(tag))
def _default(self, text):
prefix = text[:1]
if prefix == "&":
# deal with undefined entities
try:
data_handler = self.target.data
except AttributeError:
return
try:
data_handler(self.entity[text[1:-1]])
except KeyError:
from xml.parsers import expat
err = expat.error(
"undefined entity %s: line %d, column %d" %
(text, self.parser.ErrorLineNumber,
self.parser.ErrorColumnNumber)
)
err.code = 11 # XML_ERROR_UNDEFINED_ENTITY
err.lineno = self.parser.ErrorLineNumber
err.offset = self.parser.ErrorColumnNumber
raise err
elif prefix == "<" and text[:9] == "<!DOCTYPE":
self._doctype = [] # inside a doctype declaration
elif self._doctype is not None:
# parse doctype contents
if prefix == ">":
self._doctype = None
return
text = text.strip()
if not text:
return
self._doctype.append(text)
n = len(self._doctype)
if n > 2:
type = self._doctype[1]
if type == "PUBLIC" and n == 4:
name, type, pubid, system = self._doctype
if pubid:
pubid = pubid[1:-1]
elif type == "SYSTEM" and n == 3:
name, type, system = self._doctype
pubid = None
else:
return
if hasattr(self.target, "doctype"):
self.target.doctype(name, pubid, system[1:-1])
elif self.doctype != self._XMLParser__doctype:
# warn about deprecated call
self._XMLParser__doctype(name, pubid, system[1:-1])
self.doctype(name, pubid, system[1:-1])
self._doctype = None
def doctype(self, name, pubid, system):
"""(Deprecated) Handle doctype declaration
*name* is the Doctype name, *pubid* is the public identifier,
and *system* is the system identifier.
"""
warnings.warn(
"This method of XMLParser is deprecated. Define doctype() "
"method on the TreeBuilder target.",
DeprecationWarning,
)
# sentinel, if doctype is redefined in a subclass
__doctype = doctype
def feed(self, data):
"""Feed encoded data to parser."""
try:
self.parser.Parse(data, 0)
except self._error as v:
self._raiseerror(v)
def close(self):
"""Finish feeding data to parser and return element structure."""
try:
self.parser.Parse("", 1) # end of data
except self._error as v:
self._raiseerror(v)
try:
close_handler = self.target.close
except AttributeError:
pass
else:
return close_handler()
finally:
# get rid of circular references
del self.parser, self._parser
del self.target, self._target
# Import the C accelerators
try:
# Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" uses by external
    # code (see tests)
# (see tests)
_Element_Py = Element
# Element, SubElement, ParseError, TreeBuilder, XMLParser
from _elementtree import *
except ImportError:
pass
# $Id: __init__.py 3375 2008-02-13 08:05:08Z fredrik $
# elementtree package
# --------------------------------------------------------------------
# The ElementTree toolkit is
#
# Copyright (c) 1999-2008 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
# Licensed to PSF under a Contributor Agreement.
# See http://www.python.org/psf/license for licensing details.
"""Interface to the Expat non-validating XML parser."""
import sys
from pyexpat import *
# provide pyexpat submodules as xml.parsers.expat submodules
sys.modules['xml.parsers.expat.model'] = model
sys.modules['xml.parsers.expat.errors'] = errors
"""Python interfaces to XML parsers.
This package contains one module:
expat -- Python wrapper for James Clark's Expat parser, with namespace
support.
"""
"""
SAX driver for the pyexpat C module. This driver works with
pyexpat.__version__ == '2.22'.
"""
version = "0.20"
from xml.sax._exceptions import *
from xml.sax.handler import feature_validation, feature_namespaces
from xml.sax.handler import feature_namespace_prefixes
from xml.sax.handler import feature_external_ges, feature_external_pes
from xml.sax.handler import feature_string_interning
from xml.sax.handler import property_xml_string, property_interning_dict
# xml.parsers.expat does not raise ImportError in Jython
import sys
if sys.platform[:4] == "java":
raise SAXReaderNotAvailable("expat not available in Java", None)
del sys
try:
from xml.parsers import expat
except ImportError:
raise SAXReaderNotAvailable("expat not supported", None)
else:
if not hasattr(expat, "ParserCreate"):
raise SAXReaderNotAvailable("expat not supported", None)
from xml.sax import xmlreader, saxutils, handler
AttributesImpl = xmlreader.AttributesImpl
AttributesNSImpl = xmlreader.AttributesNSImpl
# If we're using a sufficiently recent version of Python, we can use
# weak references to avoid cycles between the parser and content
# handler, otherwise we'll just have to pretend.
try:
import _weakref
except ImportError:
def _mkproxy(o):
return o
else:
import weakref
_mkproxy = weakref.proxy
del weakref, _weakref
class _ClosedParser:
pass
# --- ExpatLocator
class ExpatLocator(xmlreader.Locator):
"""Locator for use with the ExpatParser class.
This uses a weak reference to the parser object to avoid creating
a circular reference between the parser and the content handler.
"""
def __init__(self, parser):
self._ref = _mkproxy(parser)
def getColumnNumber(self):
parser = self._ref
if parser._parser is None:
return None
return parser._parser.ErrorColumnNumber
def getLineNumber(self):
parser = self._ref
if parser._parser is None:
return 1
return parser._parser.ErrorLineNumber
def getPublicId(self):
parser = self._ref
if parser is None:
return None
return parser._source.getPublicId()
def getSystemId(self):
parser = self._ref
if parser is None:
return None
return parser._source.getSystemId()
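# Editor's note: a hypothetical usage sketch, not part of the original
# driver, showing the expat reader through the public xml.sax API; the
# Collector handler class is invented for illustration.
def _example_sax():
    import xml.sax
    from xml.sax.handler import ContentHandler

    class Collector(ContentHandler):
        def __init__(self):
            super().__init__()
            self.tags = []
        def startElement(self, name, attrs):
            self.tags.append(name)

    h = Collector()
    xml.sax.parseString(b"<a><b/></a>", h)
    return h.tags   # ['a', 'b']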
# --- ExpatParser
class ExpatParser(xmlreader.IncrementalParser, xmlreader.Locator):
"""SAX driver for the pyexpat C module."""
def __init__(self, namespaceHandling=0, bufsize=2**16-20):
xmlreader.IncrementalParser.__init__(self, bufsize)
self._source = xmlreader.InputSource()
self._parser = None
self._namespaces = namespaceHandling
self._lex_handler_prop = None
self._parsing = 0
self._entity_stack = []
self._external_ges = 1
self._interning = None
# XMLReader methods
def parse(self, source):
"Parse an XML document from a URL or an InputSource."
source = saxutils.prepare_input_source(source)
self._source = source
self.reset()
self._cont_handler.setDocumentLocator(ExpatLocator(self))
xmlreader.IncrementalParser.parse(self, source)
def prepareParser(self, source):
if source.getSystemId() is not None:
self._parser.SetBase(source.getSystemId())
# Redefined setContentHandler to allow changing handlers during parsing
def setContentHandler(self, handler):
xmlreader.IncrementalParser.setContentHandler(self, handler)
if self._parsing:
self._reset_cont_handler()
def getFeature(self, name):
if name == feature_namespaces:
return self._namespaces
elif name == feature_string_interning:
return self._interning is not None
elif name in (feature_validation, feature_external_pes,
feature_namespace_prefixes):
return 0
elif name == feature_external_ges:
return self._external_ges
raise SAXNotRecognizedException("Feature '%s' not recognized" % name)
def setFeature(self, name, state):
if self._parsing:
raise SAXNotSupportedException("Cannot set features while parsing")
if name == feature_namespaces:
self._namespaces = state
elif name == feature_external_ges:
self._external_ges = state
elif name == feature_string_interning:
if state:
if self._interning is None:
self._interning = {}
else:
self._interning = None
elif name == feature_validation:
if state:
raise SAXNotSupportedException(
"expat does not support validation")
elif name == feature_external_pes:
if state:
raise SAXNotSupportedException(
"expat does not read external parameter entities")
elif name == feature_namespace_prefixes:
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
else:
raise SAXNotRecognizedException(
"Feature '%s' not recognized" % name)
def getProperty(self, name):
if name == handler.property_lexical_handler:
return self._lex_handler_prop
elif name == property_interning_dict:
return self._interning
elif name == property_xml_string:
if self._parser:
if hasattr(self._parser, "GetInputContext"):
return self._parser.GetInputContext()
else:
raise SAXNotRecognizedException(
"This version of expat does not support getting"
" the XML string")
else:
raise SAXNotSupportedException(
"XML string cannot be returned when not parsing")
raise SAXNotRecognizedException("Property '%s' not recognized" % name)
def setProperty(self, name, value):
if name == handler.property_lexical_handler:
self._lex_handler_prop = value
if self._parsing:
self._reset_lex_handler_prop()
elif name == property_interning_dict:
self._interning = value
elif name == property_xml_string:
raise SAXNotSupportedException("Property '%s' cannot be set" %
name)
else:
raise SAXNotRecognizedException("Property '%s' not recognized" %
name)
# IncrementalParser methods
def feed(self, data, isFinal = 0):
if not self._parsing:
self.reset()
self._parsing = 1
self._cont_handler.startDocument()
try:
# The isFinal parameter is internal to the expat reader.
# If it is set to true, expat will check validity of the entire
# document. When feeding chunks, they are not normally final -
# except when invoked from close.
self._parser.Parse(data, isFinal)
except expat.error as e:
exc = SAXParseException(expat.ErrorString(e.code), e, self)
# FIXME: when to invoke error()?
self._err_handler.fatalError(exc)
def close(self):
if (self._entity_stack or self._parser is None or
isinstance(self._parser, _ClosedParser)):
# If we are completing an external entity, do nothing here
return
try:
self.feed("", isFinal = 1)
self._cont_handler.endDocument()
self._parsing = 0
# break cycle created by expat handlers pointing to our methods
self._parser = None
finally:
self._parsing = 0
if self._parser is not None:
# Keep ErrorColumnNumber and ErrorLineNumber after closing.
parser = _ClosedParser()
parser.ErrorColumnNumber = self._parser.ErrorColumnNumber
parser.ErrorLineNumber = self._parser.ErrorLineNumber
self._parser = parser
bs = self._source.getByteStream()
if bs is not None:
bs.close()
def _reset_cont_handler(self):
self._parser.ProcessingInstructionHandler = \
self._cont_handler.processingInstruction
self._parser.CharacterDataHandler = self._cont_handler.characters
def _reset_lex_handler_prop(self):
lex = self._lex_handler_prop
parser = self._parser
if lex is None:
parser.CommentHandler = None
parser.StartCdataSectionHandler = None
parser.EndCdataSectionHandler = None
parser.StartDoctypeDeclHandler = None
parser.EndDoctypeDeclHandler = None
else:
parser.CommentHandler = lex.comment
parser.StartCdataSectionHandler = lex.startCDATA
parser.EndCdataSectionHandler = lex.endCDATA
parser.StartDoctypeDeclHandler = self.start_doctype_decl
parser.EndDoctypeDeclHandler = lex.endDTD
def reset(self):
if self._namespaces:
self._parser = expat.ParserCreate(self._source.getEncoding(), " ",
intern=self._interning)
self._parser.namespace_prefixes = 1
self._parser.StartElementHandler = self.start_element_ns
self._parser.EndElementHandler = self.end_element_ns
else:
self._parser = expat.ParserCreate(self._source.getEncoding(),
intern = self._interning)
self._parser.StartElementHandler = self.start_element
self._parser.EndElementHandler = self.end_element
self._reset_cont_handler()
self._parser.UnparsedEntityDeclHandler = self.unparsed_entity_decl
self._parser.NotationDeclHandler = self.notation_decl
self._parser.StartNamespaceDeclHandler = self.start_namespace_decl
self._parser.EndNamespaceDeclHandler = self.end_namespace_decl
self._decl_handler_prop = None
if self._lex_handler_prop:
self._reset_lex_handler_prop()
# self._parser.DefaultHandler =
# self._parser.DefaultHandlerExpand =
# self._parser.NotStandaloneHandler =
self._parser.ExternalEntityRefHandler = self.external_entity_ref
try:
self._parser.SkippedEntityHandler = self.skipped_entity_handler
except AttributeError:
# This pyexpat does not support SkippedEntity
pass
self._parser.SetParamEntityParsing(
expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
self._parsing = 0
self._entity_stack = []
# Locator methods
def getColumnNumber(self):
if self._parser is None:
return None
return self._parser.ErrorColumnNumber
def getLineNumber(self):
if self._parser is None:
return 1
return self._parser.ErrorLineNumber
def getPublicId(self):
return self._source.getPublicId()
def getSystemId(self):
return self._source.getSystemId()
# event handlers
def start_element(self, name, attrs):
self._cont_handler.startElement(name, AttributesImpl(attrs))
def end_element(self, name):
self._cont_handler.endElement(name)
def start_element_ns(self, name, attrs):
pair = name.split()
if len(pair) == 1:
# no namespace
pair = (None, name)
elif len(pair) == 3:
pair = pair[0], pair[1]
else:
# default namespace
pair = tuple(pair)
newattrs = {}
qnames = {}
for (aname, value) in attrs.items():
parts = aname.split()
length = len(parts)
if length == 1:
# no namespace
qname = aname
apair = (None, aname)
elif length == 3:
qname = "%s:%s" % (parts[2], parts[1])
apair = parts[0], parts[1]
else:
# default namespace
qname = parts[1]
apair = tuple(parts)
newattrs[apair] = value
qnames[apair] = qname
self._cont_handler.startElementNS(pair, None,
AttributesNSImpl(newattrs, qnames))
def end_element_ns(self, name):
pair = name.split()
if len(pair) == 1:
pair = (None, name)
elif len(pair) == 3:
pair = pair[0], pair[1]
else:
pair = tuple(pair)
self._cont_handler.endElementNS(pair, None)
# this is not used (call directly to ContentHandler)
def processing_instruction(self, target, data):
self._cont_handler.processingInstruction(target, data)
# this is not used (call directly to ContentHandler)
def character_data(self, data):
self._cont_handler.characters(data)
def start_namespace_decl(self, prefix, uri):
self._cont_handler.startPrefixMapping(prefix, uri)
def end_namespace_decl(self, prefix):
self._cont_handler.endPrefixMapping(prefix)
def start_doctype_decl(self, name, sysid, pubid, has_internal_subset):
self._lex_handler_prop.startDTD(name, pubid, sysid)
def unparsed_entity_decl(self, name, base, sysid, pubid, notation_name):
self._dtd_handler.unparsedEntityDecl(name, pubid, sysid, notation_name)
def notation_decl(self, name, base, sysid, pubid):
self._dtd_handler.notationDecl(name, pubid, sysid)
def external_entity_ref(self, context, base, sysid, pubid):
if not self._external_ges:
return 1
source = self._ent_handler.resolveEntity(pubid, sysid)
source = saxutils.prepare_input_source(source,
self._source.getSystemId() or
"")
self._entity_stack.append((self._parser, self._source))
self._parser = self._parser.ExternalEntityParserCreate(context)
self._source = source
try:
xmlreader.IncrementalParser.parse(self, source)
except:
return 0 # FIXME: save error info here?
(self._parser, self._source) = self._entity_stack[-1]
del self._entity_stack[-1]
return 1
def skipped_entity_handler(self, name, is_pe):
if is_pe:
# The SAX spec requires to report skipped PEs with a '%'
name = '%'+name
self._cont_handler.skippedEntity(name)
# ---
def create_parser(*args, **kwargs):
return ExpatParser(*args, **kwargs)
# ---
if __name__ == "__main__":
import xml.sax.saxutils
p = create_parser()
p.setContentHandler(xml.sax.saxutils.XMLGenerator())
p.setErrorHandler(xml.sax.ErrorHandler())
p.parse("http://www.ibiblio.org/xml/examples/shakespeare/hamlet.xml")
"""
This module contains the core classes of version 2.0 of SAX for Python.
This file provides only default classes with absolutely minimum
functionality, from which drivers and applications can be subclassed.
Many of these classes are empty and are included only as documentation
of the interfaces.
$Id$
"""
version = '2.0beta'
#============================================================================
#
# HANDLER INTERFACES
#
#============================================================================
# ===== ERRORHANDLER =====
class ErrorHandler:
"""Basic interface for SAX error handlers.
If you create an object that implements this interface, then
register the object with your XMLReader, the parser will call the
methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly)
recoverable errors, and unrecoverable errors. All methods take a
SAXParseException as the only parameter."""
def error(self, exception):
"Handle a recoverable error."
raise exception
def fatalError(self, exception):
"Handle a non-recoverable error."
raise exception
def warning(self, exception):
"Handle a warning."
print(exception)
# ===== CONTENTHANDLER =====
class ContentHandler:
"""Interface for receiving logical document content events.
This is the main callback interface in SAX, and the one most
important to applications. The order of events in this interface
mirrors the order of the information in the document."""
def __init__(self):
self._locator = None
def setDocumentLocator(self, locator):
"""Called by the parser to give the application a locator for
locating the origin of document events.
SAX parsers are strongly encouraged (though not absolutely
required) to supply a locator: if it does so, it must supply
the locator to the application by invoking this method before
invoking any of the other methods in the DocumentHandler
interface.
The locator allows the application to determine the end
position of any document-related event, even if the parser is
not reporting an error. Typically, the application will use
this information for reporting its own errors (such as
character content that does not match an application's
business rules). The information returned by the locator is
probably not sufficient for use with a search engine.
Note that the locator will return correct information only
during the invocation of the events in this interface. The
application should not attempt to use it at any other time."""
self._locator = locator
def startDocument(self):
"""Receive notification of the beginning of a document.
The SAX parser will invoke this method only once, before any
other methods in this interface or in DTDHandler (except for
setDocumentLocator)."""
def endDocument(self):
"""Receive notification of the end of a document.
The SAX parser will invoke this method only once, and it will
be the last method invoked during the parse. The parser shall
not invoke this method until it has either abandoned parsing
(because of an unrecoverable error) or reached the end of
input."""
def startPrefixMapping(self, prefix, uri):
"""Begin the scope of a prefix-URI Namespace mapping.
The information from this event is not necessary for normal
Namespace processing: the SAX XML reader will automatically
replace prefixes for element and attribute names when the
http://xml.org/sax/features/namespaces feature is true (the
default).
There are cases, however, when applications need to use
prefixes in character data or in attribute values, where they
cannot safely be expanded automatically; the
start/endPrefixMapping event supplies the information to the
application to expand prefixes in those contexts itself, if
necessary.
Note that start/endPrefixMapping events are not guaranteed to
be properly nested relative to each other: all
startPrefixMapping events will occur before the corresponding
startElement event, and all endPrefixMapping events will occur
after the corresponding endElement event, but their order is
not guaranteed."""
def endPrefixMapping(self, prefix):
"""End the scope of a prefix-URI mapping.
See startPrefixMapping for details. This event will always
occur after the corresponding endElement event, but the order
of endPrefixMapping events is not otherwise guaranteed."""
def startElement(self, name, attrs):
"""Signals the start of an element in non-namespace mode.
The name parameter contains the raw XML 1.0 name of the
element type as a string and the attrs parameter holds an
instance of the Attributes class containing the attributes of
the element."""
def endElement(self, name):
"""Signals the end of an element in non-namespace mode.
The name parameter contains the name of the element type, just
as with the startElement event."""
def startElementNS(self, name, qname, attrs):
"""Signals the start of an element in namespace mode.
The name parameter contains the name of the element type as a
(uri, localname) tuple, the qname parameter the raw XML 1.0
name used in the source document, and the attrs parameter
holds an instance of the Attributes class containing the
attributes of the element.
The uri part of the name tuple is None for elements which have
no namespace."""
def endElementNS(self, name, qname):
"""Signals the end of an element in namespace mode.
The name parameter contains the name of the element type, just
as with the startElementNS event."""
def characters(self, content):
"""Receive notification of character data.
The Parser will call this method to report each chunk of
character data. SAX parsers may return all contiguous
character data in a single chunk, or they may split it into
several chunks; however, all of the characters in any single
event must come from the same external entity so that the
Locator provides useful information."""
def ignorableWhitespace(self, whitespace):
"""Receive notification of ignorable whitespace in element content.
Validating Parsers must use this method to report each chunk
of ignorable whitespace (see the W3C XML 1.0 recommendation,
section 2.10): non-validating parsers may also use this method
if they are capable of parsing and using content models.
SAX parsers may return all contiguous whitespace in a single
chunk, or they may split it into several chunks; however, all
of the characters in any single event must come from the same
external entity, so that the Locator provides useful
information."""
def processingInstruction(self, target, data):
"""Receive notification of a processing instruction.
The Parser will invoke this method once for each processing
instruction found: note that processing instructions may occur
before or after the main document element.
A SAX parser should never report an XML declaration (XML 1.0,
section 2.8) or a text declaration (XML 1.0, section 4.3.1)
using this method."""
def skippedEntity(self, name):
"""Receive notification of a skipped entity.
The Parser will invoke this method once for each entity
skipped. Non-validating processors may skip entities if they
have not seen the declarations (because, for example, the
entity was declared in an external DTD subset). All processors
may skip external entities, depending on the values of the
http://xml.org/sax/features/external-general-entities and the
http://xml.org/sax/features/external-parameter-entities
properties."""
# ===== DTDHandler =====
class DTDHandler:
"""Handle DTD events.
This interface specifies only those DTD events required for basic
parsing (unparsed entities and attributes)."""
def notationDecl(self, name, publicId, systemId):
"Handle a notation declaration event."
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
"Handle an unparsed entity declaration event."
# ===== ENTITYRESOLVER =====
class EntityResolver:
"""Basic interface for resolving entities. If you create an object
implementing this interface, then register the object with your
Parser, the parser will call the method in your object to
resolve all external entities. Note that DefaultHandler implements
this interface with the default behaviour."""
def resolveEntity(self, publicId, systemId):
"""Resolve the system identifier of an entity and return either
the system identifier to read from as a string, or an InputSource
to read from."""
return systemId
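# --- usage sketch (illustrative; not part of the original module) ---
# A minimal EntityResolver subclass that serves every external entity
# from an empty in-memory stream, a common way to disable external
# fetches.  The class name is a placeholder; it assumes only the stdlib
# classes named here.
class _NullEntityResolver(EntityResolver):
    def resolveEntity(self, publicId, systemId):
        from io import BytesIO
        from xml.sax.xmlreader import InputSource
        src = InputSource(systemId)
        src.setByteStream(BytesIO(b""))  # nothing to read
        return src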
#============================================================================
#
# CORE FEATURES
#
#============================================================================
feature_namespaces = "http://xml.org/sax/features/namespaces"
# true: Perform Namespace processing (default).
# false: Optionally do not perform Namespace processing
# (implies namespace-prefixes).
# access: (parsing) read-only; (not parsing) read/write
feature_namespace_prefixes = "http://xml.org/sax/features/namespace-prefixes"
# true: Report the original prefixed names and attributes used for Namespace
# declarations.
# false: Do not report attributes used for Namespace declarations, and
# optionally do not report original prefixed names (default).
# access: (parsing) read-only; (not parsing) read/write
feature_string_interning = "http://xml.org/sax/features/string-interning"
# true: All element names, prefixes, attribute names, Namespace URIs, and
# local names are interned using the built-in intern function.
# false: Names are not necessarily interned, although they may be (default).
# access: (parsing) read-only; (not parsing) read/write
feature_validation = "http://xml.org/sax/features/validation"
# true: Report all validation errors (implies external-general-entities and
# external-parameter-entities).
# false: Do not report validation errors.
# access: (parsing) read-only; (not parsing) read/write
feature_external_ges = "http://xml.org/sax/features/external-general-entities"
# true: Include all external general (text) entities.
# false: Do not include external general entities.
# access: (parsing) read-only; (not parsing) read/write
feature_external_pes = "http://xml.org/sax/features/external-parameter-entities"
# true: Include all external parameter entities, including the external
# DTD subset.
# false: Do not include any external parameter entities, even the external
# DTD subset.
# access: (parsing) read-only; (not parsing) read/write
all_features = [feature_namespaces,
feature_namespace_prefixes,
feature_string_interning,
feature_validation,
feature_external_ges,
feature_external_pes]
#============================================================================
#
# CORE PROPERTIES
#
#============================================================================
property_lexical_handler = "http://xml.org/sax/properties/lexical-handler"
# data type: xml.sax.sax2lib.LexicalHandler
# description: An optional extension handler for lexical events like comments.
# access: read/write
property_declaration_handler = "http://xml.org/sax/properties/declaration-handler"
# data type: xml.sax.sax2lib.DeclHandler
# description: An optional extension handler for DTD-related events other
# than notations and unparsed entities.
# access: read/write
property_dom_node = "http://xml.org/sax/properties/dom-node"
# data type: org.w3c.dom.Node
# description: When parsing, the current DOM node being visited if this is
# a DOM iterator; when not parsing, the root DOM node for
# iteration.
# access: (parsing) read-only; (not parsing) read/write
property_xml_string = "http://xml.org/sax/properties/xml-string"
# data type: String
# description: The literal string of characters that was the source for
# the current event.
# access: read-only
property_encoding = "http://www.python.org/sax/properties/encoding"
# data type: String
# description: The name of the encoding to assume for input data.
# access: write: set the encoding, e.g. established by a higher-level
# protocol. May change during parsing (e.g. after
# processing a META tag)
# read: return the current encoding (possibly established through
# auto-detection).
# initial value: UTF-8
#
property_interning_dict = "http://www.python.org/sax/properties/interning-dict"
# data type: Dictionary
# description: The dictionary used to intern common strings in the document
# access: write: Request that the parser uses a specific dictionary, to
# allow interning across different documents
# read: return the current interning dictionary, or None
#
all_properties = [property_lexical_handler,
property_dom_node,
property_declaration_handler,
property_xml_string,
property_encoding,
property_interning_dict]
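# --- usage sketch (illustrative; not part of the original module) ---
# The interfaces above are callback-based: an application subclasses
# ContentHandler and overrides only the events it cares about.  The
# class name below is a placeholder.
class _TagCounter(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.counts = {}
    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1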
"""\
A library of useful helper classes to the SAX classes, for the
convenience of application and driver writers.
"""
import os, urllib.parse, urllib.request
import io
import codecs
from . import handler
from . import xmlreader
def __dict_replace(s, d):
"""Replace substrings of a string using a dictionary."""
for key, value in d.items():
s = s.replace(key, value)
return s
def escape(data, entities={}):
"""Escape &, <, and > in a string of data.
You can escape other strings of data by passing a dictionary as
the optional entities parameter. The keys and values must all be
strings; each key will be replaced with its corresponding value.
"""
# must do ampersand first
data = data.replace("&", "&")
data = data.replace(">", ">")
data = data.replace("<", "<")
if entities:
data = __dict_replace(data, entities)
return data
def unescape(data, entities={}):
"""Unescape &, <, and > in a string of data.
You can unescape other strings of data by passing a dictionary as
the optional entities parameter. The keys and values must all be
strings; each key will be replaced with its corresponding value.
"""
data = data.replace("<", "<")
data = data.replace(">", ">")
if entities:
data = __dict_replace(data, entities)
# must do ampersand last
return data.replace("&", "&")
def quoteattr(data, entities={}):
"""Escape and quote an attribute value.
Escape &, <, and > in a string of data, then quote it for use as
an attribute value. The \" character will be escaped as well, if
necessary.
You can escape other strings of data by passing a dictionary as
the optional entities parameter. The keys and values must all be
strings; each key will be replaced with its corresponding value.
"""
entities = entities.copy()
entities.update({'\n': '&#10;', '\r': '&#13;', '\t': '&#9;'})
data = escape(data, entities)
if '"' in data:
if "'" in data:
data = '"%s"' % data.replace('"', """)
else:
data = "'%s'" % data
else:
data = '"%s"' % data
return data
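# --- usage sketch (illustrative; not part of the original module) ---
# escape() protects character data; quoteattr() additionally wraps the
# value in quotes chosen so no further escaping is needed:
#   escape('a < b & c')   returns  'a &lt; b &amp; c'
#   quoteattr('5 < 6')    returns  '"5 &lt; 6"'  (quotes included)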
def _gettextwriter(out, encoding):
if out is None:
import sys
return sys.stdout
if isinstance(out, io.TextIOBase):
# use a text writer as is
return out
if isinstance(out, (codecs.StreamWriter, codecs.StreamReaderWriter)):
# use a codecs stream writer as is
return out
# wrap a binary writer with TextIOWrapper
if isinstance(out, io.RawIOBase):
# Keep the original file open when the TextIOWrapper is
# destroyed
class _wrapper:
__class__ = out.__class__
def __getattr__(self, name):
return getattr(out, name)
buffer = _wrapper()
buffer.close = lambda: None
else:
# This is to handle passed objects that aren't in the
# IOBase hierarchy, but just have a write method
buffer = io.BufferedIOBase()
buffer.writable = lambda: True
buffer.write = out.write
try:
# TextIOWrapper uses these methods to determine
# whether a BOM (for UTF-16, etc.) should be added
buffer.seekable = out.seekable
buffer.tell = out.tell
except AttributeError:
pass
return io.TextIOWrapper(buffer, encoding=encoding,
errors='xmlcharrefreplace',
newline='\n',
write_through=True)
class XMLGenerator(handler.ContentHandler):
def __init__(self, out=None, encoding="iso-8859-1", short_empty_elements=False):
handler.ContentHandler.__init__(self)
out = _gettextwriter(out, encoding)
self._write = out.write
self._flush = out.flush
self._ns_contexts = [{}] # contains uri -> prefix dicts
self._current_context = self._ns_contexts[-1]
self._undeclared_ns_maps = []
self._encoding = encoding
self._short_empty_elements = short_empty_elements
self._pending_start_element = False
def _qname(self, name):
"""Builds a qualified name from a (ns_url, localname) pair"""
if name[0]:
# Per http://www.w3.org/XML/1998/namespace, The 'xml' prefix is
# bound by definition to http://www.w3.org/XML/1998/namespace. It
# does not need to be declared and will not usually be found in
# self._current_context.
if 'http://www.w3.org/XML/1998/namespace' == name[0]:
return 'xml:' + name[1]
# The name is in a non-empty namespace
prefix = self._current_context[name[0]]
if prefix:
# If it is not the default namespace, prepend the prefix
return prefix + ":" + name[1]
# Return the unqualified name
return name[1]
def _finish_pending_start_element(self,endElement=False):
if self._pending_start_element:
self._write('>')
self._pending_start_element = False
# ContentHandler methods
def startDocument(self):
self._write('<?xml version="1.0" encoding="%s"?>\n' %
self._encoding)
def endDocument(self):
self._flush()
def startPrefixMapping(self, prefix, uri):
self._ns_contexts.append(self._current_context.copy())
self._current_context[uri] = prefix
self._undeclared_ns_maps.append((prefix, uri))
def endPrefixMapping(self, prefix):
self._current_context = self._ns_contexts[-1]
del self._ns_contexts[-1]
def startElement(self, name, attrs):
self._finish_pending_start_element()
self._write('<' + name)
for (name, value) in attrs.items():
self._write(' %s=%s' % (name, quoteattr(value)))
if self._short_empty_elements:
self._pending_start_element = True
else:
self._write(">")
def endElement(self, name):
if self._pending_start_element:
self._write('/>')
self._pending_start_element = False
else:
self._write('</%s>' % name)
def startElementNS(self, name, qname, attrs):
self._finish_pending_start_element()
self._write('<' + self._qname(name))
for prefix, uri in self._undeclared_ns_maps:
if prefix:
self._write(' xmlns:%s="%s"' % (prefix, uri))
else:
self._write(' xmlns="%s"' % uri)
self._undeclared_ns_maps = []
for (name, value) in attrs.items():
self._write(' %s=%s' % (self._qname(name), quoteattr(value)))
if self._short_empty_elements:
self._pending_start_element = True
else:
self._write(">")
def endElementNS(self, name, qname):
if self._pending_start_element:
self._write('/>')
self._pending_start_element = False
else:
self._write('</%s>' % self._qname(name))
def characters(self, content):
if content:
self._finish_pending_start_element()
if not isinstance(content, str):
content = str(content, self._encoding)
self._write(escape(content))
def ignorableWhitespace(self, content):
if content:
self._finish_pending_start_element()
if not isinstance(content, str):
content = str(content, self._encoding)
self._write(content)
def processingInstruction(self, target, data):
self._finish_pending_start_element()
self._write('<?%s %s?>' % (target, data))
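# --- usage sketch (illustrative; not part of the original module) ---
# XMLGenerator turns SAX events back into markup, so feeding it a whole
# document echoes a normalized form of the input.  The helper name is a
# placeholder.
def _echo_document(data):
    import io
    import xml.sax
    out = io.StringIO()
    xml.sax.parseString(data, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()
# e.g. _echo_document(b"<root a='1'/>") yields the XML declaration
# followed by '<root a="1"></root>'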
class XMLFilterBase(xmlreader.XMLReader):
"""This class is designed to sit between an XMLReader and the
client application's event handlers. By default, it does nothing
but pass requests up to the reader and events on to the handlers
unmodified, but subclasses can override specific methods to modify
the event stream or the configuration requests as they pass
through."""
def __init__(self, parent = None):
xmlreader.XMLReader.__init__(self)
self._parent = parent
# ErrorHandler methods
def error(self, exception):
self._err_handler.error(exception)
def fatalError(self, exception):
self._err_handler.fatalError(exception)
def warning(self, exception):
self._err_handler.warning(exception)
# ContentHandler methods
def setDocumentLocator(self, locator):
self._cont_handler.setDocumentLocator(locator)
def startDocument(self):
self._cont_handler.startDocument()
def endDocument(self):
self._cont_handler.endDocument()
def startPrefixMapping(self, prefix, uri):
self._cont_handler.startPrefixMapping(prefix, uri)
def endPrefixMapping(self, prefix):
self._cont_handler.endPrefixMapping(prefix)
def startElement(self, name, attrs):
self._cont_handler.startElement(name, attrs)
def endElement(self, name):
self._cont_handler.endElement(name)
def startElementNS(self, name, qname, attrs):
self._cont_handler.startElementNS(name, qname, attrs)
def endElementNS(self, name, qname):
self._cont_handler.endElementNS(name, qname)
def characters(self, content):
self._cont_handler.characters(content)
def ignorableWhitespace(self, chars):
self._cont_handler.ignorableWhitespace(chars)
def processingInstruction(self, target, data):
self._cont_handler.processingInstruction(target, data)
def skippedEntity(self, name):
self._cont_handler.skippedEntity(name)
# DTDHandler methods
def notationDecl(self, name, publicId, systemId):
self._dtd_handler.notationDecl(name, publicId, systemId)
def unparsedEntityDecl(self, name, publicId, systemId, ndata):
self._dtd_handler.unparsedEntityDecl(name, publicId, systemId, ndata)
# EntityResolver methods
def resolveEntity(self, publicId, systemId):
return self._ent_handler.resolveEntity(publicId, systemId)
# XMLReader methods
def parse(self, source):
self._parent.setContentHandler(self)
self._parent.setErrorHandler(self)
self._parent.setEntityResolver(self)
self._parent.setDTDHandler(self)
self._parent.parse(source)
def setLocale(self, locale):
self._parent.setLocale(locale)
def getFeature(self, name):
return self._parent.getFeature(name)
def setFeature(self, name, state):
self._parent.setFeature(name, state)
def getProperty(self, name):
return self._parent.getProperty(name)
def setProperty(self, name, value):
self._parent.setProperty(name, value)
# XMLFilter methods
def getParent(self):
return self._parent
def setParent(self, parent):
self._parent = parent
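# --- usage sketch (illustrative; not part of the original module) ---
# A filter subclass overrides only the events it wants to rewrite;
# everything else is forwarded unchanged by the base class.  The class
# name is a placeholder.
class _UpperCaseFilter(XMLFilterBase):
    def startElement(self, name, attrs):
        self._cont_handler.startElement(name.upper(), attrs)
    def endElement(self, name):
        self._cont_handler.endElement(name.upper())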
# --- Utility functions
def prepare_input_source(source, base=""):
"""This function takes an InputSource and an optional base URL and
returns a fully resolved InputSource object ready for reading."""
if isinstance(source, str):
source = xmlreader.InputSource(source)
elif hasattr(source, "read"):
f = source
source = xmlreader.InputSource()
source.setByteStream(f)
if hasattr(f, "name") and isinstance(f.name, str):
source.setSystemId(f.name)
if source.getByteStream() is None:
sysid = source.getSystemId()
basehead = os.path.dirname(os.path.normpath(base))
sysidfilename = os.path.join(basehead, sysid)
if os.path.isfile(sysidfilename):
source.setSystemId(sysidfilename)
f = open(sysidfilename, "rb")
else:
source.setSystemId(urllib.parse.urljoin(base, sysid))
f = urllib.request.urlopen(source.getSystemId())
source.setByteStream(f)
return source
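# --- usage sketch (illustrative; not part of the original module) ---
# prepare_input_source accepts a system-id string, a file-like object,
# or an InputSource, and always hands back an InputSource with an open
# byte stream.  The helper name is a placeholder.
def _read_first_chunk(source, size=512):
    src = prepare_input_source(source)
    return src.getByteStream().read(size)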
"""An XML Reader is the SAX 2 name for an XML parser. XML Parsers
should be based on this code. """
from . import handler
from ._exceptions import SAXNotSupportedException, SAXNotRecognizedException
# ===== XMLREADER =====
class XMLReader:
"""Interface for reading an XML document using callbacks.
XMLReader is the interface that an XML parser's SAX2 driver must
implement. This interface allows an application to set and query
features and properties in the parser, to register event handlers
for document processing, and to initiate a document parse.
All SAX interfaces are assumed to be synchronous: the parse
methods must not return until parsing is complete, and readers
must wait for an event-handler callback to return before reporting
the next event."""
def __init__(self):
self._cont_handler = handler.ContentHandler()
self._dtd_handler = handler.DTDHandler()
self._ent_handler = handler.EntityResolver()
self._err_handler = handler.ErrorHandler()
def parse(self, source):
"Parse an XML document from a system identifier or an InputSource."
raise NotImplementedError("This method must be implemented!")
def getContentHandler(self):
"Returns the current ContentHandler."
return self._cont_handler
def setContentHandler(self, handler):
"Registers a new object to receive document content events."
self._cont_handler = handler
def getDTDHandler(self):
"Returns the current DTD handler."
return self._dtd_handler
def setDTDHandler(self, handler):
"Register an object to receive basic DTD-related events."
self._dtd_handler = handler
def getEntityResolver(self):
"Returns the current EntityResolver."
return self._ent_handler
def setEntityResolver(self, resolver):
"Register an object to resolve external entities."
self._ent_handler = resolver
def getErrorHandler(self):
"Returns the current ErrorHandler."
return self._err_handler
def setErrorHandler(self, handler):
"Register an object to receive error-message events."
self._err_handler = handler
def setLocale(self, locale):
"""Allow an application to set the locale for errors and warnings.
SAX parsers are not required to provide localization for errors
and warnings; if they cannot support the requested locale,
however, they must raise a SAX exception. Applications may
request a locale change in the middle of a parse."""
raise SAXNotSupportedException("Locale support not implemented")
def getFeature(self, name):
"Looks up and returns the state of a SAX2 feature."
raise SAXNotRecognizedException("Feature '%s' not recognized" % name)
def setFeature(self, name, state):
"Sets the state of a SAX2 feature."
raise SAXNotRecognizedException("Feature '%s' not recognized" % name)
def getProperty(self, name):
"Looks up and returns the value of a SAX2 property."
raise SAXNotRecognizedException("Property '%s' not recognized" % name)
def setProperty(self, name, value):
"Sets the value of a SAX2 property."
raise SAXNotRecognizedException("Property '%s' not recognized" % name)
class IncrementalParser(XMLReader):
"""This interface adds three extra methods to the XMLReader
interface that allow XML parsers to support incremental
parsing. Support for this interface is optional, since not all
underlying XML parsers support this functionality.
When the parser is instantiated it is ready to begin accepting
data from the feed method immediately. After parsing has been
finished with a call to close the reset method must be called to
make the parser ready to accept new data, either from feed or
using the parse method.
Note that these methods must _not_ be called during parsing, that
is, after parse has been called and before it returns.
By default, the class also implements the parse method of the XMLReader
interface using the feed, close and reset methods of the
IncrementalParser interface as a convenience to SAX 2.0 driver
writers."""
def __init__(self, bufsize=2**16):
self._bufsize = bufsize
XMLReader.__init__(self)
def parse(self, source):
from . import saxutils
source = saxutils.prepare_input_source(source)
self.prepareParser(source)
file = source.getByteStream()
buffer = file.read(self._bufsize)
while buffer:
self.feed(buffer)
buffer = file.read(self._bufsize)
self.close()
def feed(self, data):
"""This method gives the raw XML data in the data parameter to
the parser and makes it parse the data, emitting the
corresponding events. It is allowed for XML constructs to be
split across several calls to feed.
feed may raise SAXException."""
raise NotImplementedError("This method must be implemented!")
def prepareParser(self, source):
"""This method is called by the parse implementation to allow
the SAX 2.0 driver to prepare itself for parsing."""
raise NotImplementedError("prepareParser must be overridden!")
def close(self):
"""This method is called when the entire XML document has been
passed to the parser through the feed method, to notify the
parser that there is no more data. This allows the parser to
do the final checks on the document and empty the internal
data buffer.
The parser will not be ready to parse another document until
the reset method has been called.
close may raise SAXException."""
raise NotImplementedError("This method must be implemented!")
def reset(self):
"""This method is called after close has been called to reset
the parser so that it is ready to parse new documents. The
results of calling parse or feed after close without calling
reset are undefined."""
raise NotImplementedError("This method must be implemented!")
# ===== LOCATOR =====
class Locator:
"""Interface for associating a SAX event with a document
location. A locator object will return valid results only during
calls to DocumentHandler methods; at any other time, the
results are unpredictable."""
def getColumnNumber(self):
"Return the column number where the current event ends."
return -1
def getLineNumber(self):
"Return the line number where the current event ends."
return -1
def getPublicId(self):
"Return the public identifier for the current event."
return None
def getSystemId(self):
"Return the system identifier for the current event."
return None
# ===== INPUTSOURCE =====
class InputSource:
"""Encapsulation of the information needed by the XMLReader to
read entities.
This class may include information about the public identifier,
system identifier, byte stream (possibly with character encoding
information) and/or the character stream of an entity.
Applications will create objects of this class for use in the
XMLReader.parse method and for returning from
EntityResolver.resolveEntity.
An InputSource belongs to the application, the XMLReader is not
allowed to modify InputSource objects passed to it from the
application, although it may make copies and modify those."""
def __init__(self, system_id = None):
self.__system_id = system_id
self.__public_id = None
self.__encoding = None
self.__bytefile = None
self.__charfile = None
def setPublicId(self, public_id):
"Sets the public identifier of this InputSource."
self.__public_id = public_id
def getPublicId(self):
"Returns the public identifier of this InputSource."
return self.__public_id
def setSystemId(self, system_id):
"Sets the system identifier of this InputSource."
self.__system_id = system_id
def getSystemId(self):
"Returns the system identifier of this InputSource."
return self.__system_id
def setEncoding(self, encoding):
"""Sets the character encoding of this InputSource.
The encoding must be a string acceptable for an XML encoding
declaration (see section 4.3.3 of the XML recommendation).
The encoding attribute of the InputSource is ignored if the
InputSource also contains a character stream."""
self.__encoding = encoding
def getEncoding(self):
"Get the character encoding of this InputSource."
return self.__encoding
def setByteStream(self, bytefile):
"""Set the byte stream (a Python file-like object which does
not perform byte-to-character conversion) for this input
source.
The SAX parser will ignore this if there is also a character
stream specified, but it will use a byte stream in preference
to opening a URI connection itself.
If the application knows the character encoding of the byte
stream, it should set it with the setEncoding method."""
self.__bytefile = bytefile
def getByteStream(self):
"""Get the byte stream for this input source.
The getEncoding method will return the character encoding for
this byte stream, or None if unknown."""
return self.__bytefile
def setCharacterStream(self, charfile):
"""Set the character stream for this input source. (The stream
must be a Python 2.0 Unicode-wrapped file-like that performs
conversion to Unicode strings.)
If there is a character stream specified, the SAX parser will
ignore any byte stream and will not attempt to open a URI
connection to the system identifier."""
self.__charfile = charfile
def getCharacterStream(self):
"Get the character stream for this input source."
return self.__charfile
# ===== ATTRIBUTESIMPL =====
class AttributesImpl:
def __init__(self, attrs):
"""Non-NS-aware implementation.
attrs should be of the form {name : value}."""
self._attrs = attrs
def getLength(self):
return len(self._attrs)
def getType(self, name):
return "CDATA"
def getValue(self, name):
return self._attrs[name]
def getValueByQName(self, name):
return self._attrs[name]
def getNameByQName(self, name):
if name not in self._attrs:
raise KeyError(name)
return name
def getQNameByName(self, name):
if name not in self._attrs:
raise KeyError(name)
return name
def getNames(self):
return list(self._attrs.keys())
def getQNames(self):
return list(self._attrs.keys())
def __len__(self):
return len(self._attrs)
def __getitem__(self, name):
return self._attrs[name]
def keys(self):
return list(self._attrs.keys())
def __contains__(self, name):
return name in self._attrs
def get(self, name, alternative=None):
return self._attrs.get(name, alternative)
def copy(self):
return self.__class__(self._attrs)
def items(self):
return list(self._attrs.items())
def values(self):
return list(self._attrs.values())
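# --- usage sketch (illustrative; not part of the original module) ---
# AttributesImpl behaves like a read-only mapping over the dictionary
# handed to it:
def _attributes_demo():
    attrs = AttributesImpl({"id": "n1"})
    assert attrs.getLength() == 1
    assert attrs["id"] == "n1"
    assert attrs.get("missing") is None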
# ===== ATTRIBUTESNSIMPL =====
class AttributesNSImpl(AttributesImpl):
def __init__(self, attrs, qnames):
"""NS-aware implementation.
attrs should be of the form {(ns_uri, lname): value, ...}.
qnames of the form {(ns_uri, lname): qname, ...}."""
self._attrs = attrs
self._qnames = qnames
def getValueByQName(self, name):
for (nsname, qname) in self._qnames.items():
if qname == name:
return self._attrs[nsname]
raise KeyError(name)
def getNameByQName(self, name):
for (nsname, qname) in self._qnames.items():
if qname == name:
return nsname
raise KeyError(name)
def getQNameByName(self, name):
return self._qnames[name]
def getQNames(self):
return list(self._qnames.values())
def copy(self):
return self.__class__(self._attrs, self._qnames)
def _test():
XMLReader()
IncrementalParser()
Locator()
if __name__ == "__main__":
_test()
"""Different kinds of SAX Exceptions"""
import sys
if sys.platform[:4] == "java":
from java.lang import Exception
del sys
# ===== SAXEXCEPTION =====
class SAXException(Exception):
"""Encapsulate an XML error or warning. This class can contain
basic error or warning information from either the XML parser or
the application: you can subclass it to provide additional
functionality, or to add localization. Note that although you will
receive a SAXException as the argument to the handlers in the
ErrorHandler interface, you are not actually required to raise
the exception; instead, you can simply read the information in
it."""
def __init__(self, msg, exception=None):
"""Creates an exception. The message is required, but the exception
is optional."""
self._msg = msg
self._exception = exception
Exception.__init__(self, msg)
def getMessage(self):
"Return a message for this exception."
return self._msg
def getException(self):
"Return the embedded exception, or None if there was none."
return self._exception
def __str__(self):
"Create a string representation of the exception."
return self._msg
def __getitem__(self, ix):
"""Avoids weird error messages if someone does exception[ix] by
mistake, since Exception has __getitem__ defined."""
raise AttributeError("__getitem__")
# ===== SAXPARSEEXCEPTION =====
class SAXParseException(SAXException):
"""Encapsulate an XML parse error or warning.
This exception will include information for locating the error in
the original XML document. Note that although the application will
receive a SAXParseException as the argument to the handlers in the
ErrorHandler interface, the application is not actually required
to raise the exception; instead, it can simply read the
information in it and take a different action.
Since this exception is a subclass of SAXException, it inherits
the ability to wrap another exception."""
def __init__(self, msg, exception, locator):
"Creates the exception. The exception parameter is allowed to be None."
SAXException.__init__(self, msg, exception)
self._locator = locator
# We need to cache this stuff at construction time.
# If this exception is raised, the objects through which we must
# traverse to get this information may be deleted by the time
# it gets caught.
self._systemId = self._locator.getSystemId()
self._colnum = self._locator.getColumnNumber()
self._linenum = self._locator.getLineNumber()
def getColumnNumber(self):
"""The column number of the end of the text where the exception
occurred."""
return self._colnum
def getLineNumber(self):
"The line number of the end of the text where the exception occurred."
return self._linenum
def getPublicId(self):
"Get the public identifier of the entity where the exception occurred."
return self._locator.getPublicId()
def getSystemId(self):
"Get the system identifier of the entity where the exception occurred."
return self._systemId
def __str__(self):
"Create a string representation of the exception."
sysid = self.getSystemId()
if sysid is None:
sysid = "<unknown>"
linenum = self.getLineNumber()
if linenum is None:
linenum = "?"
colnum = self.getColumnNumber()
if colnum is None:
colnum = "?"
return "%s:%s:%s: %s" % (sysid, linenum, colnum, self._msg)
# ===== SAXNOTRECOGNIZEDEXCEPTION =====
class SAXNotRecognizedException(SAXException):
"""Exception class for an unrecognized identifier.
An XMLReader will raise this exception when it is confronted with an
unrecognized feature or property. SAX applications and extensions may
use this class for similar purposes."""
# ===== SAXNOTSUPPORTEDEXCEPTION =====
class SAXNotSupportedException(SAXException):
"""Exception class for an unsupported operation.
An XMLReader will raise this exception when a service it cannot
perform is requested (specifically setting a state or value). SAX
applications and extensions may use this class for similar
purposes."""
# ===== SAXREADERNOTAVAILABLE =====
class SAXReaderNotAvailable(SAXNotSupportedException):
"""Exception class for a missing driver.
An XMLReader module (driver) should raise this exception when it
is first imported, e.g. when a support module cannot be imported.
It also may be raised during parsing, e.g. if executing an external
program is not permitted."""
"""Simple API for XML (SAX) implementation for Python.
This module provides an implementation of the SAX 2 interface;
information about the Java version of the interface can be found at
http://www.megginson.com/SAX/. The Python version of the interface is
documented at <...>.
This package contains the following modules:
handler -- Base classes and constants which define the SAX 2 API for
the 'client-side' of SAX for Python.
saxutils -- Implementation of the convenience classes commonly used to
work with SAX.
xmlreader -- Base classes and constants which define the SAX 2 API for
the parsers used with SAX for Python.
expatreader -- Driver that allows use of the Expat parser with SAX.
"""
from .xmlreader import InputSource
from .handler import ContentHandler, ErrorHandler
from ._exceptions import SAXException, SAXNotRecognizedException, \
SAXParseException, SAXNotSupportedException, \
SAXReaderNotAvailable
def parse(source, handler, errorHandler=ErrorHandler()):
parser = make_parser()
parser.setContentHandler(handler)
parser.setErrorHandler(errorHandler)
parser.parse(source)
def parseString(string, handler, errorHandler=ErrorHandler()):
from io import BytesIO
if errorHandler is None:
errorHandler = ErrorHandler()
parser = make_parser()
parser.setContentHandler(handler)
parser.setErrorHandler(errorHandler)
inpsrc = InputSource()
inpsrc.setByteStream(BytesIO(string))
parser.parse(inpsrc)
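# --- usage sketch (illustrative; not part of the original module) ---
# parseString drives a throwaway handler over an in-memory document.
# The helper and class names are placeholders.
def _count_start_tags(data):
    class _Counter(ContentHandler):
        def __init__(self):
            ContentHandler.__init__(self)
            self.n = 0
        def startElement(self, name, attrs):
            self.n += 1
    counter = _Counter()
    parseString(data, counter)
    return counter.n
# _count_start_tags(b"<a><b/><b/></a>") returns 3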
# this is the parser list used by the make_parser function if no
# alternatives are given as parameters to the function
default_parser_list = ["xml.sax.expatreader"]
# tell modulefinder that importing sax potentially imports expatreader
_false = 0
if _false:
import xml.sax.expatreader
import os, sys
if not sys.flags.ignore_environment and "PY_SAX_PARSER" in os.environ:
default_parser_list = os.environ["PY_SAX_PARSER"].split(",")
del os
_key = "python.xml.sax.parser"
if sys.platform[:4] == "java" and sys.registry.containsKey(_key):
default_parser_list = sys.registry.getProperty(_key).split(",")
def make_parser(parser_list = []):
"""Creates and returns a SAX parser.
Creates the first parser it is able to instantiate of the ones
given in the list created by doing parser_list +
default_parser_list. The lists must contain the names of Python
modules containing both a SAX parser and a create_parser function."""
for parser_name in parser_list + default_parser_list:
try:
return _create_parser(parser_name)
except ImportError as e:
import sys
if parser_name in sys.modules:
# The parser module was found, but importing it
# failed unexpectedly, pass this exception through
raise
except SAXReaderNotAvailable:
# The parser module detected that it won't work properly,
# so try the next one
pass
raise SAXReaderNotAvailable("No parsers found", None)
# --- Internal utility methods used by make_parser
if sys.platform[ : 4] == "java":
def _create_parser(parser_name):
from org.python.core import imp
drv_module = imp.importName(parser_name, 0, globals())
return drv_module.create_parser()
else:
def _create_parser(parser_name):
drv_module = __import__(parser_name,{},{},['create_parser'])
return drv_module.create_parser()
del sys
#
# XML-RPC CLIENT LIBRARY
# $Id$
#
# an XML-RPC client interface for Python.
#
# the marshalling and response parser code can also be used to
# implement XML-RPC servers.
#
# Notes:
# this version is designed to work with Python 2.1 or newer.
#
# History:
# 1999-01-14 fl Created
# 1999-01-15 fl Changed dateTime to use localtime
# 1999-01-16 fl Added Binary/base64 element, default to RPC2 service
# 1999-01-19 fl Fixed array data element (from Skip Montanaro)
# 1999-01-21 fl Fixed dateTime constructor, etc.
# 1999-02-02 fl Added fault handling, handle empty sequences, etc.
# 1999-02-10 fl Fixed problem with empty responses (from Skip Montanaro)
# 1999-06-20 fl Speed improvements, pluggable parsers/transports (0.9.8)
# 2000-11-28 fl Changed boolean to check the truth value of its argument
# 2001-02-24 fl Added encoding/Unicode/SafeTransport patches
# 2001-02-26 fl Added compare support to wrappers (0.9.9/1.0b1)
# 2001-03-28 fl Make sure response tuple is a singleton
# 2001-03-29 fl Don't require empty params element (from Nicholas Riley)
# 2001-06-10 fl Folded in _xmlrpclib accelerator support (1.0b2)
# 2001-08-20 fl Base xmlrpclib.Error on built-in Exception (from Paul Prescod)
# 2001-09-03 fl Allow Transport subclass to override getparser
# 2001-09-10 fl Lazy import of urllib, cgi, xmllib (20x import speedup)
# 2001-10-01 fl Remove containers from memo cache when done with them
# 2001-10-01 fl Use faster escape method (80% dumps speedup)
# 2001-10-02 fl More dumps microtuning
# 2001-10-04 fl Make sure import expat gets a parser (from Guido van Rossum)
# 2001-10-10 sm Allow long ints to be passed as ints if they don't overflow
# 2001-10-17 sm Test for int and long overflow (allows use on 64-bit systems)
# 2001-11-12 fl Use repr() to marshal doubles (from Paul Felix)
# 2002-03-17 fl Avoid buffered read when possible (from James Rucker)
# 2002-04-07 fl Added pythondoc comments
# 2002-04-16 fl Added __str__ methods to datetime/binary wrappers
# 2002-05-15 fl Added error constants (from Andrew Kuchling)
# 2002-06-27 fl Merged with Python CVS version
# 2002-10-22 fl Added basic authentication (based on code from Phillip Eby)
# 2003-01-22 sm Add support for the bool type
# 2003-02-27 gvr Remove apply calls
# 2003-04-24 sm Use cStringIO if available
# 2003-04-25 ak Add support for nil
# 2003-06-15 gn Add support for time.struct_time
# 2003-07-12 gp Correct marshalling of Faults
# 2003-10-31 mvl Add multicall support
# 2004-08-20 mvl Bump minimum supported Python version to 2.1
# 2014-12-02 ch/doko Add workaround for gzip bomb vulnerability
#
# Copyright (c) 1999-2002 by Secret Labs AB.
# Copyright (c) 1999-2002 by Fredrik Lundh.
#
# [email protected]
# http://www.pythonware.com
#
# --------------------------------------------------------------------
# The XML-RPC client interface is
#
# Copyright (c) 1999-2002 by Secret Labs AB
# Copyright (c) 1999-2002 by Fredrik Lundh
#
# By obtaining, using, and/or copying this software and/or its
# associated documentation, you agree that you have read, understood,
# and will comply with the following terms and conditions:
#
# Permission to use, copy, modify, and distribute this software and
# its associated documentation for any purpose and without fee is
# hereby granted, provided that the above copyright notice appears in
# all copies, and that both that copyright notice and this permission
# notice appear in supporting documentation, and that the name of
# Secret Labs AB or the author not be used in advertising or publicity
# pertaining to distribution of the software without specific, written
# prior permission.
#
# SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
# TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
# ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
# BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
# DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
# WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
# ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
# OF THIS SOFTWARE.
# --------------------------------------------------------------------
"""
An XML-RPC client interface for Python.
The marshalling and response parser code can also be used to
implement XML-RPC servers.
Exported exceptions:
Error Base class for client errors
ProtocolError Indicates an HTTP protocol error
ResponseError Indicates a broken response package
Fault Indicates an XML-RPC fault package
Exported classes:
ServerProxy Represents a logical connection to an XML-RPC server
MultiCall Executor of boxcarred xmlrpc requests
DateTime dateTime wrapper for an ISO 8601 string or time tuple or
localtime integer value to generate a "dateTime.iso8601"
XML-RPC value
Binary binary data wrapper
Marshaller Generate an XML-RPC params chunk from a Python data structure
Unmarshaller Unmarshal an XML-RPC response from incoming XML event message
Transport Handles an HTTP transaction to an XML-RPC server
SafeTransport Handles an HTTPS transaction to an XML-RPC server
Exported constants:
(none)
Exported functions:
getparser Create instance of the fastest available parser & attach
to an unmarshalling object
dumps Convert an argument tuple or a Fault instance to an XML-RPC
request (or response, if the methodresponse option is used).
loads Convert an XML-RPC packet to unmarshalled data plus a method
name (None if not present).
"""
import base64
import sys
import time
from datetime import datetime
import http.client
import urllib.parse
from xml.parsers import expat
import errno
from io import BytesIO
try:
import gzip
except ImportError:
gzip = None  # Python can be built without zlib/gzip support
# --------------------------------------------------------------------
# Internal stuff
def escape(s):
s = s.replace("&", "&")
s = s.replace("<", "<")
return s.replace(">", ">",)
# used in User-Agent header sent
__version__ = sys.version[:3]
# xmlrpc integer limits
MAXINT = 2**31-1
MININT = -2**31
# --------------------------------------------------------------------
# Error constants (from Dan Libby's specification at
# http://xmlrpc-epi.sourceforge.net/specs/rfc.fault_codes.php)
# Ranges of errors
PARSE_ERROR = -32700
SERVER_ERROR = -32600
APPLICATION_ERROR = -32500
SYSTEM_ERROR = -32400
TRANSPORT_ERROR = -32300
# Specific errors
NOT_WELLFORMED_ERROR = -32700
UNSUPPORTED_ENCODING = -32701
INVALID_ENCODING_CHAR = -32702
INVALID_XMLRPC = -32600
METHOD_NOT_FOUND = -32601
INVALID_METHOD_PARAMS = -32602
INTERNAL_ERROR = -32603
# --------------------------------------------------------------------
# Exceptions
##
# Base class for all kinds of client-side errors.
class Error(Exception):
"""Base class for client errors."""
def __str__(self):
return repr(self)
##
# Indicates an HTTP-level protocol error. This is raised by the HTTP
# transport layer, if the server returns an error code other than 200
# (OK).
#
# @param url The target URL.
# @param errcode The HTTP error code.
# @param errmsg The HTTP error message.
# @param headers The HTTP header dictionary.
class ProtocolError(Error):
"""Indicates an HTTP protocol error."""
def __init__(self, url, errcode, errmsg, headers):
Error.__init__(self)
self.url = url
self.errcode = errcode
self.errmsg = errmsg
self.headers = headers
def __repr__(self):
return (
"<ProtocolError for %s: %s %s>" %
(self.url, self.errcode, self.errmsg)
)
##
# Indicates a broken XML-RPC response package. This exception is
# raised by the unmarshalling layer, if the XML-RPC response is
# malformed.
class ResponseError(Error):
"""Indicates a broken response package."""
pass
##
# Indicates an XML-RPC fault response package. This exception is
# raised by the unmarshalling layer, if the XML-RPC response contains
# a fault string. This exception can also be used as a class, to
# generate a fault XML-RPC message.
#
# @param faultCode The XML-RPC fault code.
# @param faultString The XML-RPC fault string.
class Fault(Error):
"""Indicates an XML-RPC fault package."""
def __init__(self, faultCode, faultString, **extra):
Error.__init__(self)
self.faultCode = faultCode
self.faultString = faultString
def __repr__(self):
return "<Fault %s: %r>" % (self.faultCode, self.faultString)
# --------------------------------------------------------------------
# Special values
##
# Backwards compatibility
boolean = Boolean = bool
##
# Wrapper for XML-RPC DateTime values. This converts a time value to
# the format used by XML-RPC.
# <p>
# The value can be given as a datetime object, as a string in the
# format "yyyymmddThh:mm:ss", as a 9-item time tuple (as returned by
# time.localtime()), or an integer value (as returned by time.time()).
# The wrapper uses time.localtime() to convert an integer to a time
# tuple.
#
# @param value The time, given as a datetime object, an ISO 8601 string,
# a time tuple, or an integer time value.
# Issue #13305: different format codes across platforms
_day0 = datetime(1, 1, 1)
if _day0.strftime('%Y') == '0001': # Mac OS X
def _iso8601_format(value):
return value.strftime("%Y%m%dT%H:%M:%S")
elif _day0.strftime('%4Y') == '0001': # Linux
def _iso8601_format(value):
return value.strftime("%4Y%m%dT%H:%M:%S")
else:
def _iso8601_format(value):
return value.strftime("%Y%m%dT%H:%M:%S").zfill(17)
del _day0
def _strftime(value):
if isinstance(value, datetime):
return _iso8601_format(value)
if not isinstance(value, (tuple, time.struct_time)):
if value == 0:
value = time.time()
value = time.localtime(value)
return "%04d%02d%02dT%02d:%02d:%02d" % value[:6]
class DateTime:
"""DateTime wrapper for an ISO 8601 string or time tuple or
localtime integer value to generate 'dateTime.iso8601' XML-RPC
value.
"""
def __init__(self, value=0):
if isinstance(value, str):
self.value = value
else:
self.value = _strftime(value)
def make_comparable(self, other):
if isinstance(other, DateTime):
s = self.value
o = other.value
elif isinstance(other, datetime):
s = self.value
o = _iso8601_format(other)
elif isinstance(other, str):
s = self.value
o = other
elif hasattr(other, "timetuple"):
s = self.timetuple()
o = other.timetuple()
else:
otype = (hasattr(other, "__class__")
and other.__class__.__name__
or type(other))
raise TypeError("Can't compare %s and %s" %
(self.__class__.__name__, otype))
return s, o
def __lt__(self, other):
s, o = self.make_comparable(other)
return s < o
def __le__(self, other):
s, o = self.make_comparable(other)
return s <= o
def __gt__(self, other):
s, o = self.make_comparable(other)
return s > o
def __ge__(self, other):
s, o = self.make_comparable(other)
return s >= o
def __eq__(self, other):
s, o = self.make_comparable(other)
return s == o
def __ne__(self, other):
s, o = self.make_comparable(other)
return s != o
def timetuple(self):
return time.strptime(self.value, "%Y%m%dT%H:%M:%S")
##
# Get date/time value.
#
# @return Date/time value, as an ISO 8601 string.
def __str__(self):
return self.value
def __repr__(self):
return "<DateTime %r at %x>" % (self.value, id(self))
def decode(self, data):
self.value = str(data).strip()
def encode(self, out):
out.write("<value><dateTime.iso8601>")
out.write(self.value)
out.write("</dateTime.iso8601></value>\n")
def _datetime(data):
# decode xml element contents into a DateTime structure.
value = DateTime()
value.decode(data)
return value
def _datetime_type(data):
return datetime.strptime(data, "%Y%m%dT%H:%M:%S")
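# Illustrative sketch of the DateTime wrapper (not part of the original
# module), assuming it is imported from xmlrpc.client:
#
#   import time
#   from xmlrpc.client import DateTime
#   dt = DateTime(time.localtime())    # from a 9-item time tuple
#   same = DateTime(str(dt))           # from the equivalent ISO 8601 string
#   assert dt == same                  # comparisons go via make_comparable()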
##
# Wrapper for binary data. This can be used to transport any kind
# of binary data over XML-RPC, using BASE64 encoding.
#
# @param data An 8-bit string containing arbitrary data.
class Binary:
"""Wrapper for binary data."""
def __init__(self, data=None):
if data is None:
data = b""
else:
if not isinstance(data, (bytes, bytearray)):
raise TypeError("expected bytes or bytearray, not %s" %
data.__class__.__name__)
data = bytes(data) # Make a copy of the bytes!
self.data = data
##
# Get buffer contents.
#
# @return Buffer contents, as an 8-bit string.
def __str__(self):
return str(self.data, "latin-1") # XXX encoding?!
def __eq__(self, other):
if isinstance(other, Binary):
other = other.data
return self.data == other
def __ne__(self, other):
if isinstance(other, Binary):
other = other.data
return self.data != other
def decode(self, data):
self.data = base64.decodebytes(data)
def encode(self, out):
out.write("<value><base64>\n")
encoded = base64.encodebytes(self.data)
out.write(encoded.decode('ascii'))
out.write("</base64></value>\n")
def _binary(data):
# decode xml element contents into a Binary structure
value = Binary()
value.decode(data)
return value
WRAPPERS = (DateTime, Binary)
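# Illustrative round-trip sketch for Binary (not part of the original
# module); dumps() and loads() are defined later in this file:
#
#   from xmlrpc.client import Binary, dumps, loads
#   packet = dumps((Binary(b"\x00\x01\xff"),), methodname="store")
#   (wrapped,), method = loads(packet)
#   assert method == "store" and wrapped.data == b"\x00\x01\xff"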
# --------------------------------------------------------------------
# XML parsers
class ExpatParser:
# fast expat parser for Python 2.0 and later.
def __init__(self, target):
self._parser = parser = expat.ParserCreate(None, None)
self._target = target
parser.StartElementHandler = target.start
parser.EndElementHandler = target.end
parser.CharacterDataHandler = target.data
encoding = None
target.xml(encoding, None)
def feed(self, data):
self._parser.Parse(data, 0)
def close(self):
try:
parser = self._parser
except AttributeError:
pass
else:
del self._target, self._parser # get rid of circular references
parser.Parse(b"", True) # end of data
# --------------------------------------------------------------------
# XML-RPC marshalling and unmarshalling code
##
# XML-RPC marshaller.
#
# @param encoding Default encoding for 8-bit strings. The default
# value is None (interpreted as UTF-8).
# @see dumps
class Marshaller:
"""Generate an XML-RPC params chunk from a Python data structure.
Create a Marshaller instance for each set of parameters, and use
the "dumps" method to convert your data (represented as a tuple)
to an XML-RPC params chunk. To write a fault response, pass a
Fault instance instead. You may prefer to use the "dumps" module
function for this purpose.
"""
# by the way, if you don't understand what's going on in here,
# that's perfectly ok.
def __init__(self, encoding=None, allow_none=False):
self.memo = {}
self.data = None
self.encoding = encoding
self.allow_none = allow_none
dispatch = {}
def dumps(self, values):
out = []
write = out.append
dump = self.__dump
if isinstance(values, Fault):
# fault instance
write("<fault>\n")
dump({'faultCode': values.faultCode,
'faultString': values.faultString},
write)
write("</fault>\n")
else:
# parameter block
# FIXME: the xml-rpc specification allows us to leave out
# the entire <params> block if there are no parameters.
# however, changing this may break older code (including
# old versions of xmlrpclib.py), so this is better left as
# is for now. See @XMLRPC3 for more information. /F
write("<params>\n")
for v in values:
write("<param>\n")
dump(v, write)
write("</param>\n")
write("</params>\n")
result = "".join(out)
return result
def __dump(self, value, write):
try:
f = self.dispatch[type(value)]
except KeyError:
# check if this object can be marshalled as a structure
if not hasattr(value, '__dict__'):
raise TypeError("cannot marshal %s objects" % type(value))
# check if this class is a sub-class of a basic type,
# because we don't know how to marshal these types
# (e.g. a string sub-class)
for type_ in type(value).__mro__:
if type_ in self.dispatch.keys():
raise TypeError("cannot marshal %s objects" % type(value))
# XXX(twouters): using "_arbitrary_instance" as key as a quick-fix
# for the p3yk merge, this should probably be fixed more neatly.
f = self.dispatch["_arbitrary_instance"]
f(self, value, write)
def dump_nil (self, value, write):
if not self.allow_none:
raise TypeError("cannot marshal None unless allow_none is enabled")
write("<value><nil/></value>")
dispatch[type(None)] = dump_nil
def dump_bool(self, value, write):
write("<value><boolean>")
write(value and "1" or "0")
write("</boolean></value>\n")
dispatch[bool] = dump_bool
def dump_long(self, value, write):
if value > MAXINT or value < MININT:
raise OverflowError("int exceeds XML-RPC limits")
write("<value><int>")
write(str(int(value)))
write("</int></value>\n")
dispatch[int] = dump_long
# backward compatible
dump_int = dump_long
def dump_double(self, value, write):
write("<value><double>")
write(repr(value))
write("</double></value>\n")
dispatch[float] = dump_double
def dump_unicode(self, value, write, escape=escape):
write("<value><string>")
write(escape(value))
write("</string></value>\n")
dispatch[str] = dump_unicode
def dump_bytes(self, value, write):
write("<value><base64>\n")
encoded = base64.encodebytes(value)
write(encoded.decode('ascii'))
write("</base64></value>\n")
dispatch[bytes] = dump_bytes
dispatch[bytearray] = dump_bytes
def dump_array(self, value, write):
i = id(value)
if i in self.memo:
raise TypeError("cannot marshal recursive sequences")
self.memo[i] = None
dump = self.__dump
write("<value><array><data>\n")
for v in value:
dump(v, write)
write("</data></array></value>\n")
del self.memo[i]
dispatch[tuple] = dump_array
dispatch[list] = dump_array
def dump_struct(self, value, write, escape=escape):
i = id(value)
if i in self.memo:
raise TypeError("cannot marshal recursive dictionaries")
self.memo[i] = None
dump = self.__dump
write("<value><struct>\n")
for k, v in value.items():
write("<member>\n")
if not isinstance(k, str):
raise TypeError("dictionary key must be string")
write("<name>%s</name>\n" % escape(k))
dump(v, write)
write("</member>\n")
write("</struct></value>\n")
del self.memo[i]
dispatch[dict] = dump_struct
def dump_datetime(self, value, write):
write("<value><dateTime.iso8601>")
write(_strftime(value))
write("</dateTime.iso8601></value>\n")
dispatch[datetime] = dump_datetime
def dump_instance(self, value, write):
# check for special wrappers
if value.__class__ in WRAPPERS:
self.write = write
value.encode(self)
del self.write
else:
# store instance attributes as a struct (really?)
self.dump_struct(value.__dict__, write)
dispatch[DateTime] = dump_instance
dispatch[Binary] = dump_instance
# XXX(twouters): using "_arbitrary_instance" as key as a quick-fix
# for the p3yk merge, this should probably be fixed more neatly.
dispatch["_arbitrary_instance"] = dump_instance
##
# XML-RPC unmarshaller.
#
# @see loads
class Unmarshaller:
"""Unmarshal an XML-RPC response, based on incoming XML event
messages (start, data, end). Call close() to get the resulting
data structure.
Note that this reader is fairly tolerant, and gladly accepts bogus
XML-RPC data without complaining (but not bogus XML).
"""
# and again, if you don't understand what's going on in here,
# that's perfectly ok.
def __init__(self, use_datetime=False, use_builtin_types=False):
self._type = None
self._stack = []
self._marks = []
self._data = []
self._methodname = None
self._encoding = "utf-8"
self.append = self._stack.append
self._use_datetime = use_builtin_types or use_datetime
self._use_bytes = use_builtin_types
def close(self):
# return response tuple and target method
if self._type is None or self._marks:
raise ResponseError()
if self._type == "fault":
raise Fault(**self._stack[0])
return tuple(self._stack)
def getmethodname(self):
return self._methodname
#
# event handlers
def xml(self, encoding, standalone):
self._encoding = encoding
# FIXME: assert standalone == 1 ???
def start(self, tag, attrs):
# prepare to handle this element
if tag == "array" or tag == "struct":
self._marks.append(len(self._stack))
self._data = []
self._value = (tag == "value")
def data(self, text):
self._data.append(text)
def end(self, tag):
# call the appropriate end tag handler
try:
f = self.dispatch[tag]
except KeyError:
pass # unknown tag ?
else:
return f(self, "".join(self._data))
#
# accelerator support
def end_dispatch(self, tag, data):
# dispatch data
try:
f = self.dispatch[tag]
except KeyError:
pass # unknown tag ?
else:
return f(self, data)
#
# element decoders
dispatch = {}
def end_nil (self, data):
self.append(None)
self._value = 0
dispatch["nil"] = end_nil
def end_boolean(self, data):
if data == "0":
self.append(False)
elif data == "1":
self.append(True)
else:
raise TypeError("bad boolean value")
self._value = 0
dispatch["boolean"] = end_boolean
def end_int(self, data):
self.append(int(data))
self._value = 0
dispatch["i4"] = end_int
dispatch["i8"] = end_int
dispatch["int"] = end_int
def end_double(self, data):
self.append(float(data))
self._value = 0
dispatch["double"] = end_double
def end_string(self, data):
if self._encoding:
data = data.decode(self._encoding)
self.append(data)
self._value = 0
dispatch["string"] = end_string
dispatch["name"] = end_string # struct keys are always strings
def end_array(self, data):
mark = self._marks.pop()
# map arrays to Python lists
self._stack[mark:] = [self._stack[mark:]]
self._value = 0
dispatch["array"] = end_array
def end_struct(self, data):
mark = self._marks.pop()
# map structs to Python dictionaries
dict = {}
items = self._stack[mark:]
for i in range(0, len(items), 2):
dict[items[i]] = items[i+1]
self._stack[mark:] = [dict]
self._value = 0
dispatch["struct"] = end_struct
def end_base64(self, data):
value = Binary()
value.decode(data.encode("ascii"))
if self._use_bytes:
value = value.data
self.append(value)
self._value = 0
dispatch["base64"] = end_base64
def end_dateTime(self, data):
value = DateTime()
value.decode(data)
if self._use_datetime:
value = _datetime_type(data)
self.append(value)
dispatch["dateTime.iso8601"] = end_dateTime
def end_value(self, data):
# if we stumble upon a value element with no internal
# elements, treat it as a string element
if self._value:
self.end_string(data)
dispatch["value"] = end_value
def end_params(self, data):
self._type = "params"
dispatch["params"] = end_params
def end_fault(self, data):
self._type = "fault"
dispatch["fault"] = end_fault
def end_methodName(self, data):
if self._encoding:
data = data.decode(self._encoding)
self._methodname = data
self._type = "methodName" # no params
dispatch["methodName"] = end_methodName
## Multicall support
#
class _MultiCallMethod:
# some lesser magic to store calls made to a MultiCall object
# for batch execution
def __init__(self, call_list, name):
self.__call_list = call_list
self.__name = name
def __getattr__(self, name):
return _MultiCallMethod(self.__call_list, "%s.%s" % (self.__name, name))
def __call__(self, *args):
self.__call_list.append((self.__name, args))
class MultiCallIterator:
"""Iterates over the results of a multicall. Exceptions are
raised in response to xmlrpc faults."""
def __init__(self, results):
self.results = results
def __getitem__(self, i):
item = self.results[i]
if type(item) == type({}):
raise Fault(item['faultCode'], item['faultString'])
elif type(item) == type([]):
return item[0]
else:
raise ValueError("unexpected type in multicall result")
class MultiCall:
"""server -> an object used to boxcar method calls
server should be a ServerProxy object.
Methods can be added to the MultiCall using normal
method call syntax e.g.:
multicall = MultiCall(server_proxy)
multicall.add(2,3)
multicall.get_address("Guido")
To execute the multicall, call the MultiCall object e.g.:
add_result, address = multicall()
"""
def __init__(self, server):
self.__server = server
self.__call_list = []
def __repr__(self):
return "<MultiCall at %x>" % id(self)
__str__ = __repr__
def __getattr__(self, name):
return _MultiCallMethod(self.__call_list, name)
def __call__(self):
marshalled_list = []
for name, args in self.__call_list:
marshalled_list.append({'methodName' : name, 'params' : args})
return MultiCallIterator(self.__server.system.multicall(marshalled_list))
# --------------------------------------------------------------------
# convenience functions
FastMarshaller = FastParser = FastUnmarshaller = None
##
# Create a parser object, and connect it to an unmarshalling instance.
# This function picks the fastest available XML parser.
#
# return A (parser, unmarshaller) tuple.
def getparser(use_datetime=False, use_builtin_types=False):
"""getparser() -> parser, unmarshaller
Create an instance of the fastest available parser, and attach it
to an unmarshalling object. Return both objects.
"""
if FastParser and FastUnmarshaller:
if use_builtin_types:
mkdatetime = _datetime_type
mkbytes = base64.decodebytes
elif use_datetime:
mkdatetime = _datetime_type
mkbytes = _binary
else:
mkdatetime = _datetime
mkbytes = _binary
target = FastUnmarshaller(True, False, mkbytes, mkdatetime, Fault)
parser = FastParser(target)
else:
target = Unmarshaller(use_datetime=use_datetime, use_builtin_types=use_builtin_types)
if FastParser:
parser = FastParser(target)
else:
parser = ExpatParser(target)
return parser, target
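# Illustrative sketch of the (parser, unmarshaller) pair (not part of the
# original module); the byte string below is a hand-written sample response:
#
#   from xmlrpc.client import getparser
#   p, u = getparser()
#   p.feed(b"<?xml version='1.0'?><methodResponse><params>"
#          b"<param><value><int>42</int></value></param>"
#          b"</params></methodResponse>")
#   p.close()
#   assert u.close() == (42,)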
##
# Convert a Python tuple or a Fault instance to an XML-RPC packet.
#
# @def dumps(params, **options)
# @param params A tuple or Fault instance.
# @keyparam methodname If given, create a methodCall request for
# this method name.
# @keyparam methodresponse If given, create a methodResponse packet.
# If used with a tuple, the tuple must be a singleton (that is,
# it must contain exactly one element).
# @keyparam encoding The packet encoding.
# @return A string containing marshalled data.
def dumps(params, methodname=None, methodresponse=None, encoding=None,
allow_none=False):
"""data [,options] -> marshalled data
Convert an argument tuple or a Fault instance to an XML-RPC
request (or response, if the methodresponse option is used).
In addition to the data object, the following options can be given
as keyword arguments:
methodname: the method name for a methodCall packet
methodresponse: true to create a methodResponse packet.
If this option is used with a tuple, the tuple must be
a singleton (i.e. it can contain only one element).
encoding: the packet encoding (default is UTF-8)
All byte strings in the data structure are assumed to use the
packet encoding. Unicode strings are automatically converted,
where necessary.
"""
assert isinstance(params, (tuple, Fault)), "argument must be tuple or Fault instance"
if isinstance(params, Fault):
methodresponse = 1
elif methodresponse and isinstance(params, tuple):
assert len(params) == 1, "response tuple must be a singleton"
if not encoding:
encoding = "utf-8"
if FastMarshaller:
m = FastMarshaller(encoding)
else:
m = Marshaller(encoding, allow_none)
data = m.dumps(params)
if encoding != "utf-8":
xmlheader = "<?xml version='1.0' encoding='%s'?>\n" % str(encoding)
else:
xmlheader = "<?xml version='1.0'?>\n" # utf-8 is default
# standard XML-RPC wrappings
if methodname:
# a method call
if not isinstance(methodname, str):
methodname = methodname.encode(encoding)
data = (
xmlheader,
"<methodCall>\n"
"<methodName>", methodname, "</methodName>\n",
data,
"</methodCall>\n"
)
elif methodresponse:
# a method response, or a fault structure
data = (
xmlheader,
"<methodResponse>\n",
data,
"</methodResponse>\n"
)
else:
return data # return as is
return "".join(data)
##
# Convert an XML-RPC packet to a Python object. If the XML-RPC packet
# represents a fault condition, this function raises a Fault exception.
#
# @param data An XML-RPC packet, given as an 8-bit string.
# @return A tuple containing the unpacked data, and the method name
# (None if not present).
# @see Fault
def loads(data, use_datetime=False, use_builtin_types=False):
"""data -> unmarshalled data, method name
Convert an XML-RPC packet to unmarshalled data plus a method
name (None if not present).
If the XML-RPC packet represents a fault condition, this function
raises a Fault exception.
"""
p, u = getparser(use_datetime=use_datetime, use_builtin_types=use_builtin_types)
p.feed(data)
p.close()
return u.close(), u.getmethodname()
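# Complementary sketch (not part of the original module): loads() raises
# Fault when the packet carries a fault structure:
#
#   from xmlrpc.client import Fault, dumps, loads
#   packet = dumps(Fault(4, "Too many parameters."))
#   try:
#       loads(packet)
#   except Fault as err:
#       assert err.faultCode == 4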
##
# Encode a string using the gzip content encoding, as specified by the
# Content-Encoding: gzip
# HTTP header and described in RFC 1952.
#
# @param data the unencoded data
# @return the encoded data
def gzip_encode(data):
"""data -> gzip encoded data
Encode data using the gzip content encoding as described in RFC 1952
"""
if not gzip:
raise NotImplementedError
f = BytesIO()
gzf = gzip.GzipFile(mode="wb", fileobj=f, compresslevel=1)
gzf.write(data)
gzf.close()
encoded = f.getvalue()
f.close()
return encoded
##
# Decode a string using the gzip content encoding, as specified by the
# Content-Encoding: gzip
# HTTP header and described in RFC 1952.
#
# @param data The encoded data
# @keyparam max_decode Maximum bytes to decode (20MB default), use negative
# values for unlimited decoding
# @return the unencoded data
# @raises ValueError if data is not correctly coded.
# @raises ValueError if max gzipped payload length exceeded
def gzip_decode(data, max_decode=20971520):
"""gzip encoded data -> unencoded data
Decode data using the gzip content encoding as described in RFC 1952
"""
if not gzip:
raise NotImplementedError
f = BytesIO(data)
gzf = gzip.GzipFile(mode="rb", fileobj=f)
try:
if max_decode < 0: # no limit
decoded = gzf.read()
else:
decoded = gzf.read(max_decode + 1)
except OSError:
raise ValueError("invalid data")
f.close()
gzf.close()
if max_decode >= 0 and len(decoded) > max_decode:
raise ValueError("max gzipped payload length exceeded")
return decoded
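# Illustrative round-trip sketch for the two gzip helpers (not part of the
# original module), assuming Python was built with zlib support:
#
#   from xmlrpc.client import gzip_encode, gzip_decode
#   blob = gzip_encode(b"<methodResponse/>")
#   assert gzip_decode(blob) == b"<methodResponse/>"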
##
# Return a decoded file-like object for the gzip encoding
# as described in RFC 1952.
#
# @param response A stream supporting a read() method
# @return a file-like object that the decoded data can be read() from
class GzipDecodedResponse(gzip.GzipFile if gzip else object):
"""a file-like object to decode a response encoded with the gzip
method, as described in RFC 1952.
"""
def __init__(self, response):
#response doesn't support tell() and read(), required by
#GzipFile
if not gzip:
raise NotImplementedError
self.io = BytesIO(response.read())
gzip.GzipFile.__init__(self, mode="rb", fileobj=self.io)
def close(self):
try:
gzip.GzipFile.close(self)
finally:
self.io.close()
# --------------------------------------------------------------------
# request dispatcher
class _Method:
# some magic to bind an XML-RPC method to an RPC server.
# supports "nested" methods (e.g. examples.getStateName)
def __init__(self, send, name):
self.__send = send
self.__name = name
def __getattr__(self, name):
return _Method(self.__send, "%s.%s" % (self.__name, name))
def __call__(self, *args):
return self.__send(self.__name, args)
##
# Standard transport class for XML-RPC over HTTP.
# <p>
# You can create custom transports by subclassing this class, and
# overriding selected methods.
class Transport:
"""Handles an HTTP transaction to an XML-RPC server."""
# client identifier (may be overridden)
user_agent = "Python-xmlrpc/%s" % __version__
#if true, we'll request gzip encoding
accept_gzip_encoding = True
# if positive, encode request using gzip if it exceeds this threshold
# note that many servers will get confused, so only use it if you know
# that they can decode such a request
encode_threshold = None #None = don't encode
def __init__(self, use_datetime=False, use_builtin_types=False):
self._use_datetime = use_datetime
self._use_builtin_types = use_builtin_types
self._connection = (None, None)
self._extra_headers = []
##
# Send a complete request, and parse the response.
# Retry request if a cached connection has disconnected.
#
# @param host Target host.
# @param handler Target RPC handler.
# @param request_body XML-RPC request body.
# @param verbose Debugging flag.
# @return Parsed response.
def request(self, host, handler, request_body, verbose=False):
#retry request once if cached connection has gone cold
for i in (0, 1):
try:
return self.single_request(host, handler, request_body, verbose)
except OSError as e:
if i or e.errno not in (errno.ECONNRESET, errno.ECONNABORTED,
errno.EPIPE):
raise
except http.client.BadStatusLine: #close after we sent request
if i:
raise
def single_request(self, host, handler, request_body, verbose=False):
# issue XML-RPC request
try:
http_conn = self.send_request(host, handler, request_body, verbose)
resp = http_conn.getresponse()
if resp.status == 200:
self.verbose = verbose
return self.parse_response(resp)
except Fault:
raise
except Exception:
#All unexpected errors leave connection in
# a strange state, so we clear it.
self.close()
raise
#We got an error response.
#Discard any response data and raise exception
if resp.getheader("content-length", ""):
resp.read()
raise ProtocolError(
host + handler,
resp.status, resp.reason,
dict(resp.getheaders())
)
##
# Create parser.
#
# @return A 2-tuple containing a parser and an unmarshaller.
def getparser(self):
# get parser and unmarshaller
return getparser(use_datetime=self._use_datetime,
use_builtin_types=self._use_builtin_types)
##
# Get authorization info from host parameter
# Host may be a string, or a (host, x509-dict) tuple; if a string,
# it is checked for a "user:pw@host" format, and a "Basic
# Authentication" header is added if appropriate.
#
# @param host Host descriptor (URL or (URL, x509 info) tuple).
# @return A 3-tuple containing (actual host, extra headers,
# x509 info). The header and x509 fields may be None.
def get_host_info(self, host):
x509 = {}
if isinstance(host, tuple):
host, x509 = host
auth, host = urllib.parse.splituser(host)
if auth:
auth = urllib.parse.unquote_to_bytes(auth)
auth = base64.encodebytes(auth).decode("utf-8")
auth = "".join(auth.split()) # get rid of whitespace
extra_headers = [
("Authorization", "Basic " + auth)
]
else:
extra_headers = []
return host, extra_headers, x509
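# Illustrative sketch of get_host_info() deriving a Basic auth header from
# a "user:pw@host" string (not part of the original module):
#
#   from xmlrpc.client import Transport
#   host, headers, x509 = Transport().get_host_info("user:secret@example.com")
#   # host == "example.com"
#   # headers == [("Authorization", "Basic dXNlcjpzZWNyZXQ=")]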
##
# Connect to server.
#
# @param host Target host.
# @return An HTTPConnection object
def make_connection(self, host):
#return an existing connection if possible. This allows
#HTTP/1.1 keep-alive.
if self._connection and host == self._connection[0]:
return self._connection[1]
# create a HTTP connection object from a host descriptor
chost, self._extra_headers, x509 = self.get_host_info(host)
self._connection = host, http.client.HTTPConnection(chost)
return self._connection[1]
##
# Clear any cached connection object.
# Used in the event of socket errors.
#
def close(self):
host, connection = self._connection
if connection:
self._connection = (None, None)
connection.close()
##
# Send HTTP request.
#
# @param host Host descriptor (URL or (URL, x509 info) tuple).
# @param handler Target RPC handler (a path relative to host)
# @param request_body The XML-RPC request body
# @param debug Enable debugging if debug is true.
# @return An HTTPConnection.
def send_request(self, host, handler, request_body, debug):
connection = self.make_connection(host)
headers = self._extra_headers[:]
if debug:
connection.set_debuglevel(1)
if self.accept_gzip_encoding and gzip:
connection.putrequest("POST", handler, skip_accept_encoding=True)
headers.append(("Accept-Encoding", "gzip"))
else:
connection.putrequest("POST", handler)
headers.append(("Content-Type", "text/xml"))
headers.append(("User-Agent", self.user_agent))
self.send_headers(connection, headers)
self.send_content(connection, request_body)
return connection
##
# Send request headers.
# This function provides a useful hook for subclassing
#
# @param connection httpConnection.
# @param headers list of key,value pairs for HTTP headers
def send_headers(self, connection, headers):
for key, val in headers:
connection.putheader(key, val)
##
# Send request body.
# This function provides a useful hook for subclassing
#
# @param connection httpConnection.
# @param request_body XML-RPC request body.
def send_content(self, connection, request_body):
#optionally encode the request
if (self.encode_threshold is not None and
self.encode_threshold < len(request_body) and
gzip):
connection.putheader("Content-Encoding", "gzip")
request_body = gzip_encode(request_body)
connection.putheader("Content-Length", str(len(request_body)))
connection.endheaders(request_body)
##
# Parse response.
#
# @param file Stream.
# @return Response tuple and target method.
def parse_response(self, response):
# read response data from httpresponse, and parse it
# Check for new http response object, otherwise it is a file object.
if hasattr(response, 'getheader'):
if response.getheader("Content-Encoding", "") == "gzip":
stream = GzipDecodedResponse(response)
else:
stream = response
else:
stream = response
p, u = self.getparser()
while 1:
data = stream.read(1024)
if not data:
break
if self.verbose:
print("body:", repr(data))
p.feed(data)
if stream is not response:
stream.close()
p.close()
return u.close()
##
# Standard transport class for XML-RPC over HTTPS.
class SafeTransport(Transport):
"""Handles an HTTPS transaction to an XML-RPC server."""
def __init__(self, use_datetime=False, use_builtin_types=False, *,
context=None):
super().__init__(use_datetime=use_datetime, use_builtin_types=use_builtin_types)
self.context = context
# FIXME: mostly untested
def make_connection(self, host):
if self._connection and host == self._connection[0]:
return self._connection[1]
if not hasattr(http.client, "HTTPSConnection"):
raise NotImplementedError(
"your version of http.client doesn't support HTTPS")
# create a HTTPS connection object from a host descriptor
# host may be a string, or a (host, x509-dict) tuple
chost, self._extra_headers, x509 = self.get_host_info(host)
self._connection = host, http.client.HTTPSConnection(chost,
None, context=self.context, **(x509 or {}))
return self._connection[1]
##
# Standard server proxy. This class establishes a virtual connection
# to an XML-RPC server.
# <p>
# This class is available as ServerProxy and Server. New code should
# use ServerProxy, to avoid confusion.
#
# @def ServerProxy(uri, **options)
# @param uri The connection point on the server.
# @keyparam transport A transport factory, compatible with the
# standard transport class.
# @keyparam encoding The default encoding used for 8-bit strings
# (default is UTF-8).
# @keyparam verbose Use a true value to enable debugging output.
# (printed to standard output).
# @see Transport
class ServerProxy:
"""uri [,options] -> a logical connection to an XML-RPC server
uri is the connection point on the server, given as
scheme://host/target.
The standard implementation always supports the "http" scheme. If
SSL socket support is available (Python 2.0), it also supports
"https".
If the target part and the slash preceding it are both omitted,
"/RPC2" is assumed.
The following options can be given as keyword arguments:
transport: a transport factory
encoding: the request encoding (default is UTF-8)
All 8-bit strings passed to the server proxy are assumed to use
the given encoding.
"""
def __init__(self, uri, transport=None, encoding=None, verbose=False,
allow_none=False, use_datetime=False, use_builtin_types=False,
*, context=None):
# establish a "logical" server connection
# get the url
type, uri = urllib.parse.splittype(uri)
if type not in ("http", "https"):
raise OSError("unsupported XML-RPC protocol")
self.__host, self.__handler = urllib.parse.splithost(uri)
if not self.__handler:
self.__handler = "/RPC2"
if transport is None:
if type == "https":
handler = SafeTransport
extra_kwargs = {"context": context}
else:
handler = Transport
extra_kwargs = {}
transport = handler(use_datetime=use_datetime,
use_builtin_types=use_builtin_types,
**extra_kwargs)
self.__transport = transport
self.__encoding = encoding or 'utf-8'
self.__verbose = verbose
self.__allow_none = allow_none
def __close(self):
self.__transport.close()
def __request(self, methodname, params):
# call a method on the remote server
request = dumps(params, methodname, encoding=self.__encoding,
allow_none=self.__allow_none).encode(self.__encoding)
response = self.__transport.request(
self.__host,
self.__handler,
request,
verbose=self.__verbose
)
if len(response) == 1:
response = response[0]
return response
def __repr__(self):
return (
"<ServerProxy for %s%s>" %
(self.__host, self.__handler)
)
__str__ = __repr__
def __getattr__(self, name):
# magic method dispatcher
return _Method(self.__request, name)
# note: to call a remote object with a non-standard name, use
# result = getattr(server, "strange-python-name")(args)
def __call__(self, attr):
"""A workaround to get special attributes on the ServerProxy
without interfering with the magic __getattr__
"""
if attr == "close":
return self.__close
elif attr == "transport":
return self.__transport
raise AttributeError("Attribute %r not found" % (attr,))
# compatibility
Server = ServerProxy
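# Illustrative end-to-end sketch for ServerProxy (not part of the original
# module), assuming a server listens on the hypothetical localhost:8000:
#
#   from xmlrpc.client import ServerProxy
#   server = ServerProxy("http://localhost:8000", allow_none=True)
#   print(server.examples.getStateName(41))   # nested names via _Method
#   server("close")()                         # the __call__ workaround above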
# --------------------------------------------------------------------
# test code
if __name__ == "__main__":
# simple test program (from the XML-RPC specification)
# local server, available from Lib/xmlrpc/server.py
server = ServerProxy("http://localhost:8000")
try:
print(server.currentTime.getCurrentTime())
except Error as v:
print("ERROR", v)
multi = MultiCall(server)
multi.getData()
multi.pow(2,9)
multi.add(1,2)
try:
for response in multi():
print(response)
except Error as v:
print("ERROR", v)
r"""XML-RPC Servers.
This module can be used to create simple XML-RPC servers
by creating a server and either installing functions, a
class instance, or by extending the SimpleXMLRPCServer
class.
It can also be used to handle XML-RPC requests in a CGI
environment using CGIXMLRPCRequestHandler.
The Doc* classes can be used to create XML-RPC servers that
serve pydoc-style documentation in response to HTTP
GET requests. This documentation is dynamically generated
based on the functions and methods registered with the
server.
A list of possible usage patterns follows:
1. Install functions:
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(pow)
server.register_function(lambda x,y: x+y, 'add')
server.serve_forever()
2. Install an instance:
class MyFuncs:
def __init__(self):
# make all of the sys functions available through sys.func_name
import sys
self.sys = sys
def _listMethods(self):
# implement this method so that system.listMethods
# knows to advertise the sys methods
return list_public_methods(self) + \
['sys.' + method for method in list_public_methods(self.sys)]
def pow(self, x, y): return pow(x, y)
def add(self, x, y) : return x + y
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_introspection_functions()
server.register_instance(MyFuncs())
server.serve_forever()
3. Install an instance with custom dispatch method:
class Math:
def _listMethods(self):
# this method must be present for system.listMethods
# to work
return ['add', 'pow']
def _methodHelp(self, method):
# this method must be present for system.methodHelp
# to work
if method == 'add':
return "add(2,3) => 5"
elif method == 'pow':
return "pow(x, y[, z]) => number"
else:
# By convention, return empty
# string if no help is available
return ""
def _dispatch(self, method, params):
if method == 'pow':
return pow(*params)
elif method == 'add':
return params[0] + params[1]
else:
raise ValueError('bad method')
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_introspection_functions()
server.register_instance(Math())
server.serve_forever()
4. Subclass SimpleXMLRPCServer:
class MathServer(SimpleXMLRPCServer):
def _dispatch(self, method, params):
try:
# We are forcing the 'export_' prefix on methods that are
# callable through XML-RPC to prevent potential security
# problems
func = getattr(self, 'export_' + method)
except AttributeError:
raise Exception('method "%s" is not supported' % method)
else:
return func(*params)
def export_add(self, x, y):
return x + y
server = MathServer(("localhost", 8000))
server.serve_forever()
5. CGI script:
server = CGIXMLRPCRequestHandler()
server.register_function(pow)
server.handle_request()
"""
# Written by Brian Quinlan ([email protected]).
# Based on code written by Fredrik Lundh.
from xmlrpc.client import Fault, dumps, loads, gzip_encode, gzip_decode
from http.server import BaseHTTPRequestHandler
import http.server
import socketserver
import sys
import os
import re
import pydoc
import inspect
import traceback
try:
import fcntl
except ImportError:
fcntl = None
def resolve_dotted_attribute(obj, attr, allow_dotted_names=True):
"""resolve_dotted_attribute(a, 'b.c.d') => a.b.c.d
Resolves a dotted attribute name to an object. Raises
an AttributeError if any attribute in the chain starts with a '_'.
If the optional allow_dotted_names argument is false, dots are not
supported and this function operates similar to getattr(obj, attr).
"""
if allow_dotted_names:
attrs = attr.split('.')
else:
attrs = [attr]
for i in attrs:
if i.startswith('_'):
raise AttributeError(
'attempt to access private attribute "%s"' % i
)
else:
obj = getattr(obj,i)
return obj
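# Illustrative sketch of resolve_dotted_attribute() on a hypothetical
# object (not part of the original module):
#
#   class Node: pass
#   a = Node(); a.b = Node(); a.b.c = 42
#   assert resolve_dotted_attribute(a, "b.c") == 42
#   # resolve_dotted_attribute(a, "_secret") raises AttributeError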
def list_public_methods(obj):
"""Returns a list of attribute strings, found in the specified
object, which represent callable attributes"""
return [member for member in dir(obj)
if not member.startswith('_') and
callable(getattr(obj, member))]
class SimpleXMLRPCDispatcher:
"""Mix-in class that dispatches XML-RPC requests.
This class is used to register XML-RPC method handlers
and then to dispatch them. This class doesn't need to be
instantiated directly when used by SimpleXMLRPCServer, but it
can be instantiated when used by the MultiPathXMLRPCServer
"""
def __init__(self, allow_none=False, encoding=None,
use_builtin_types=False):
self.funcs = {}
self.instance = None
self.allow_none = allow_none
self.encoding = encoding or 'utf-8'
self.use_builtin_types = use_builtin_types
def register_instance(self, instance, allow_dotted_names=False):
"""Registers an instance to respond to XML-RPC requests.
Only one instance can be installed at a time.
If the registered instance has a _dispatch method then that
method will be called with the name of the XML-RPC method and
its parameters as a tuple
e.g. instance._dispatch('add',(2,3))
If the registered instance does not have a _dispatch method
then the instance will be searched to find a matching method
and, if found, will be called. Methods beginning with an '_'
are considered private and will not be called by
SimpleXMLRPCServer.
If a registered function matches a XML-RPC request, then it
will be called instead of the registered instance.
If the optional allow_dotted_names argument is true and the
instance does not have a _dispatch method, method names
containing dots are supported and resolved, as long as none of
the name segments start with an '_'.
*** SECURITY WARNING: ***
Enabling the allow_dotted_names option allows intruders
to access your module's global variables and may allow
intruders to execute arbitrary code on your machine. Only
use this option on a secure, closed network.
"""
self.instance = instance
self.allow_dotted_names = allow_dotted_names
def register_function(self, function, name=None):
"""Registers a function to respond to XML-RPC requests.
The optional name argument can be used to set a Unicode name
for the function.
"""
if name is None:
name = function.__name__
self.funcs[name] = function
def register_introspection_functions(self):
"""Registers the XML-RPC introspection methods in the system
namespace.
see http://xmlrpc.usefulinc.com/doc/reserved.html
"""
self.funcs.update({'system.listMethods' : self.system_listMethods,
'system.methodSignature' : self.system_methodSignature,
'system.methodHelp' : self.system_methodHelp})
def register_multicall_functions(self):
"""Registers the XML-RPC multicall method in the system
namespace.
see http://www.xmlrpc.com/discuss/msgReader$1208"""
self.funcs.update({'system.multicall' : self.system_multicall})
def _marshaled_dispatch(self, data, dispatch_method = None, path = None):
"""Dispatches an XML-RPC method from marshalled (XML) data.
XML-RPC methods are dispatched from the marshalled (XML) data
using the _dispatch method and the result is returned as
marshalled data. For backwards compatibility, a dispatch
function can be provided as an argument (see comment in
SimpleXMLRPCRequestHandler.do_POST) but overriding the
existing method through subclassing is the preferred means
of changing method dispatch behavior.
"""
try:
params, method = loads(data, use_builtin_types=self.use_builtin_types)
# generate response
if dispatch_method is not None:
response = dispatch_method(method, params)
else:
response = self._dispatch(method, params)
# wrap response in a singleton tuple
response = (response,)
response = dumps(response, methodresponse=1,
allow_none=self.allow_none, encoding=self.encoding)
except Fault as fault:
response = dumps(fault, allow_none=self.allow_none,
encoding=self.encoding)
except:
# report exception back to server
exc_type, exc_value, exc_tb = sys.exc_info()
response = dumps(
Fault(1, "%s:%s" % (exc_type, exc_value)),
encoding=self.encoding, allow_none=self.allow_none,
)
return response.encode(self.encoding)
def system_listMethods(self):
"""system.listMethods() => ['add', 'subtract', 'multiple']
Returns a list of the methods supported by the server."""
methods = set(self.funcs.keys())
if self.instance is not None:
# Instance can implement _listMethods to return a list of
# methods
if hasattr(self.instance, '_listMethods'):
methods |= set(self.instance._listMethods())
# if the instance has a _dispatch method then we
# don't have enough information to provide a list
# of methods
elif not hasattr(self.instance, '_dispatch'):
methods |= set(list_public_methods(self.instance))
return sorted(methods)
def system_methodSignature(self, method_name):
"""system.methodSignature('add') => [double, int, int]
Returns a list describing the signature of the method. In the
above example, the add method takes two integers as arguments
and returns a double result.
This server does NOT support system.methodSignature."""
# See http://xmlrpc.usefulinc.com/doc/sysmethodsig.html
return 'signatures not supported'
def system_methodHelp(self, method_name):
"""system.methodHelp('add') => "Adds two integers together"
Returns a string containing documentation for the specified method."""
method = None
if method_name in self.funcs:
method = self.funcs[method_name]
elif self.instance is not None:
# Instance can implement _methodHelp to return help for a method
if hasattr(self.instance, '_methodHelp'):
return self.instance._methodHelp(method_name)
# if the instance has a _dispatch method then we
# don't have enough information to provide help
elif not hasattr(self.instance, '_dispatch'):
try:
method = resolve_dotted_attribute(
self.instance,
method_name,
self.allow_dotted_names
)
except AttributeError:
pass
# Note that we aren't checking that the method is actually
# a callable object of some kind
if method is None:
return ""
else:
return pydoc.getdoc(method)
def system_multicall(self, call_list):
"""system.multicall([{'methodName': 'add', 'params': [2, 2]}, ...]) => \
[[4], ...]
Allows the caller to package multiple XML-RPC calls into a single
request.
See http://www.xmlrpc.com/discuss/msgReader$1208
"""
results = []
for call in call_list:
method_name = call['methodName']
params = call['params']
try:
# XXX A marshalling error in any response will fail the entire
# multicall. If someone cares they should fix this.
results.append([self._dispatch(method_name, params)])
except Fault as fault:
results.append(
{'faultCode' : fault.faultCode,
'faultString' : fault.faultString}
)
except:
exc_type, exc_value, exc_tb = sys.exc_info()
results.append(
{'faultCode' : 1,
'faultString' : "%s:%s" % (exc_type, exc_value)}
)
return results
def _dispatch(self, method, params):
"""Dispatches the XML-RPC method.
XML-RPC calls are forwarded to a registered function that
matches the called XML-RPC method name. If no such function
exists then the call is forwarded to the registered instance,
if available.
If the registered instance has a _dispatch method then that
method will be called with the name of the XML-RPC method and
its parameters as a tuple
e.g. instance._dispatch('add',(2,3))
If the registered instance does not have a _dispatch method
then the instance will be searched to find a matching method
and, if found, will be called.
Methods beginning with an '_' are considered private and will
not be called.
"""
func = None
try:
# check to see if a matching function has been registered
func = self.funcs[method]
except KeyError:
if self.instance is not None:
# check for a _dispatch method
if hasattr(self.instance, '_dispatch'):
return self.instance._dispatch(method, params)
else:
# call instance method directly
try:
func = resolve_dotted_attribute(
self.instance,
method,
self.allow_dotted_names
)
except AttributeError:
pass
if func is not None:
return func(*params)
else:
raise Exception('method "%s" is not supported' % method)
class SimpleXMLRPCRequestHandler(BaseHTTPRequestHandler):
"""Simple XML-RPC request handler class.
Handles all HTTP POST requests and attempts to decode them as
XML-RPC requests.
"""
# Class attribute listing the accessible path components;
# paths not on this list will result in a 404 error.
rpc_paths = ('/', '/RPC2')
#if not None, encode responses larger than this, if possible
encode_threshold = 1400 #a common MTU
#Override from StreamRequestHandler: full buffering of output
#and no Nagle.
wbufsize = -1
disable_nagle_algorithm = True
# a re to match a gzip Accept-Encoding
aepattern = re.compile(r"""
\s* ([^\s;]+) \s* #content-coding
(;\s* q \s*=\s* ([0-9\.]+))? #q
""", re.VERBOSE | re.IGNORECASE)
def accept_encodings(self):
r = {}
ae = self.headers.get("Accept-Encoding", "")
for e in ae.split(","):
match = self.aepattern.match(e)
if match:
v = match.group(3)
v = float(v) if v else 1.0
r[match.group(1)] = v
return r
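# Illustrative note (not part of the original module): given the request
# header "Accept-Encoding: gzip;q=0.8, identity", accept_encodings()
# returns {"gzip": 0.8, "identity": 1.0}; do_POST() below gzips its
# response only when "gzip" maps to a non-zero q value.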
def is_rpc_path_valid(self):
if self.rpc_paths:
return self.path in self.rpc_paths
else:
# If .rpc_paths is empty, just assume all paths are legal
return True
def do_POST(self):
"""Handles the HTTP POST request.
Attempts to interpret all HTTP POST requests as XML-RPC calls,
which are forwarded to the server's _dispatch method for handling.
"""
# Check that the path is legal
if not self.is_rpc_path_valid():
self.report_404()
return
try:
# Get arguments by reading body of request.
# We read this in chunks to avoid straining
# socket.read(); around the 10 or 15Mb mark, some platforms
# begin to have problems (bug #792570).
max_chunk_size = 10*1024*1024
size_remaining = int(self.headers["content-length"])
L = []
while size_remaining:
chunk_size = min(size_remaining, max_chunk_size)
chunk = self.rfile.read(chunk_size)
if not chunk:
break
L.append(chunk)
size_remaining -= len(L[-1])
data = b''.join(L)
data = self.decode_request_content(data)
if data is None:
return #response has been sent
# In previous versions of SimpleXMLRPCServer, _dispatch
# could be overridden in this class, instead of in
# SimpleXMLRPCDispatcher. To maintain backwards compatibility,
# check to see if a subclass implements _dispatch and dispatch
# using that method if present.
response = self.server._marshaled_dispatch(
data, getattr(self, '_dispatch', None), self.path
)
except Exception as e: # This should only happen if the module is buggy
# internal error, report as HTTP server error
self.send_response(500)
# Send information about the exception if requested
if hasattr(self.server, '_send_traceback_header') and \
self.server._send_traceback_header:
self.send_header("X-exception", str(e))
trace = traceback.format_exc()
trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')
self.send_header("X-traceback", trace)
self.send_header("Content-length", "0")
self.end_headers()
else:
self.send_response(200)
self.send_header("Content-type", "text/xml")
if self.encode_threshold is not None:
if len(response) > self.encode_threshold:
q = self.accept_encodings().get("gzip", 0)
if q:
try:
response = gzip_encode(response)
self.send_header("Content-Encoding", "gzip")
except NotImplementedError:
pass
self.send_header("Content-length", str(len(response)))
self.end_headers()
self.wfile.write(response)
def decode_request_content(self, data):
#support gzip encoding of request
encoding = self.headers.get("content-encoding", "identity").lower()
if encoding == "identity":
return data
if encoding == "gzip":
try:
return gzip_decode(data)
except NotImplementedError:
self.send_response(501, "encoding %r not supported" % encoding)
except ValueError:
self.send_response(400, "error decoding gzip content")
else:
self.send_response(501, "encoding %r not supported" % encoding)
self.send_header("Content-length", "0")
self.end_headers()
def report_404 (self):
# Report a 404 error
self.send_response(404)
response = b'No such page'
self.send_header("Content-type", "text/plain")
self.send_header("Content-length", str(len(response)))
self.end_headers()
self.wfile.write(response)
def log_request(self, code='-', size='-'):
"""Selectively log an accepted request."""
if self.server.logRequests:
BaseHTTPRequestHandler.log_request(self, code, size)
class SimpleXMLRPCServer(socketserver.TCPServer,
SimpleXMLRPCDispatcher):
"""Simple XML-RPC server.
Simple XML-RPC server that allows functions and a single instance
to be installed to handle requests. The default implementation
attempts to dispatch XML-RPC calls to the functions or instance
installed in the server. Override the _dispatch method inherited
from SimpleXMLRPCDispatcher to change this behavior.
"""
allow_reuse_address = True
# Warning: this is for debugging purposes only! Never set this to True in
# production code, as it will send out sensitive information (exception
# and stack trace details) when exceptions are raised inside
# SimpleXMLRPCRequestHandler.do_POST
_send_traceback_header = False
def __init__(self, addr, requestHandler=SimpleXMLRPCRequestHandler,
logRequests=True, allow_none=False, encoding=None,
bind_and_activate=True, use_builtin_types=False):
self.logRequests = logRequests
SimpleXMLRPCDispatcher.__init__(self, allow_none, encoding, use_builtin_types)
socketserver.TCPServer.__init__(self, addr, requestHandler, bind_and_activate)
class MultiPathXMLRPCServer(SimpleXMLRPCServer):
"""Multipath XML-RPC Server
This specialization of SimpleXMLRPCServer allows the user to create
multiple Dispatcher instances and assign them to different
HTTP request paths. This makes it possible to run two or more
'virtual XML-RPC servers' at the same port.
Make sure that the requestHandler accepts the paths in question.
"""
def __init__(self, addr, requestHandler=SimpleXMLRPCRequestHandler,
logRequests=True, allow_none=False, encoding=None,
bind_and_activate=True, use_builtin_types=False):
SimpleXMLRPCServer.__init__(self, addr, requestHandler, logRequests, allow_none,
encoding, bind_and_activate, use_builtin_types)
self.dispatchers = {}
self.allow_none = allow_none
self.encoding = encoding or 'utf-8'
def add_dispatcher(self, path, dispatcher):
self.dispatchers[path] = dispatcher
return dispatcher
def get_dispatcher(self, path):
return self.dispatchers[path]
def _marshaled_dispatch(self, data, dispatch_method = None, path = None):
try:
response = self.dispatchers[path]._marshaled_dispatch(
data, dispatch_method, path)
except:
# report low level exception back to server
# (each dispatcher should have handled their own
# exceptions)
exc_type, exc_value = sys.exc_info()[:2]
response = dumps(
Fault(1, "%s:%s" % (exc_type, exc_value)),
encoding=self.encoding, allow_none=self.allow_none)
response = response.encode(self.encoding)
return response
class CGIXMLRPCRequestHandler(SimpleXMLRPCDispatcher):
"""Simple handler for XML-RPC data passed through CGI."""
def __init__(self, allow_none=False, encoding=None, use_builtin_types=False):
SimpleXMLRPCDispatcher.__init__(self, allow_none, encoding, use_builtin_types)
def handle_xmlrpc(self, request_text):
"""Handle a single XML-RPC request"""
response = self._marshaled_dispatch(request_text)
print('Content-Type: text/xml')
print('Content-Length: %d' % len(response))
print()
sys.stdout.flush()
sys.stdout.buffer.write(response)
sys.stdout.buffer.flush()
def handle_get(self):
"""Handle a single HTTP GET request.
Default implementation indicates an error because
XML-RPC uses the POST method.
"""
code = 400
message, explain = BaseHTTPRequestHandler.responses[code]
response = http.server.DEFAULT_ERROR_MESSAGE % \
{
'code' : code,
'message' : message,
'explain' : explain
}
response = response.encode('utf-8')
print('Status: %d %s' % (code, message))
print('Content-Type: %s' % http.server.DEFAULT_ERROR_CONTENT_TYPE)
print('Content-Length: %d' % len(response))
print()
sys.stdout.flush()
sys.stdout.buffer.write(response)
sys.stdout.buffer.flush()
def handle_request(self, request_text=None):
"""Handle a single XML-RPC request passed through a CGI post method.
If no XML data is given then it is read from stdin. The resulting
XML-RPC response is printed to stdout along with the correct HTTP
headers.
"""
if request_text is None and \
os.environ.get('REQUEST_METHOD', None) == 'GET':
self.handle_get()
else:
# POST data is normally available through stdin
try:
length = int(os.environ.get('CONTENT_LENGTH', None))
except (ValueError, TypeError):
length = -1
if request_text is None:
request_text = sys.stdin.read(length)
self.handle_xmlrpc(request_text)
# -----------------------------------------------------------------------------
# Self documenting XML-RPC Server.
class ServerHTMLDoc(pydoc.HTMLDoc):
"""Class used to generate pydoc HTML document for a server"""
def markup(self, text, escape=None, funcs={}, classes={}, methods={}):
"""Mark up some plain text, given a context of symbols to look for.
Each context dictionary maps object names to anchor names."""
escape = escape or self.escape
results = []
here = 0
# XXX Note that this regular expression does not allow for the
# hyperlinking of arbitrary strings being used as method
# names. Only methods with names consisting of word characters
# and '.'s are hyperlinked.
pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
r'RFC[- ]?(\d+)|'
r'PEP[- ]?(\d+)|'
r'(self\.)?((?:\w|\.)+))\b')
while 1:
match = pattern.search(text, here)
if not match: break
start, end = match.span()
results.append(escape(text[here:start]))
all, scheme, rfc, pep, selfdot, name = match.groups()
if scheme:
url = escape(all).replace('"', '&quot;')
results.append('<a href="%s">%s</a>' % (url, url))
elif rfc:
url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif pep:
url = 'http://www.python.org/dev/peps/pep-%04d/' % int(pep)
results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif text[end:end+1] == '(':
results.append(self.namelink(name, methods, funcs, classes))
elif selfdot:
results.append('self.<strong>%s</strong>' % name)
else:
results.append(self.namelink(name, classes))
here = end
results.append(escape(text[here:]))
return ''.join(results)
def docroutine(self, object, name, mod=None,
funcs={}, classes={}, methods={}, cl=None):
"""Produce HTML documentation for a function or method object."""
anchor = (cl and cl.__name__ or '') + '-' + name
note = ''
title = '<a name="%s"><strong>%s</strong></a>' % (
self.escape(anchor), self.escape(name))
if inspect.ismethod(object):
args = inspect.getfullargspec(object)
# exclude the argument bound to the instance, it will be
# confusing to the non-Python user
argspec = inspect.formatargspec (
args.args[1:],
args.varargs,
args.varkw,
args.defaults,
annotations=args.annotations,
formatvalue=self.formatvalue
)
elif inspect.isfunction(object):
args = inspect.getfullargspec(object)
argspec = inspect.formatargspec(
args.args, args.varargs, args.varkw, args.defaults,
annotations=args.annotations,
formatvalue=self.formatvalue)
else:
argspec = '(...)'
if isinstance(object, tuple):
argspec = object[0] or argspec
docstring = object[1] or ""
else:
docstring = pydoc.getdoc(object)
decl = title + argspec + (note and self.grey(
'<font face="helvetica, arial">%s</font>' % note))
doc = self.markup(
docstring, self.preformat, funcs, classes, methods)
doc = doc and '<dd><tt>%s</tt></dd>' % doc
return '<dl><dt>%s</dt>%s</dl>\n' % (decl, doc)
def docserver(self, server_name, package_documentation, methods):
"""Produce HTML documentation for an XML-RPC server."""
fdict = {}
for key, value in methods.items():
fdict[key] = '#-' + key
fdict[value] = fdict[key]
server_name = self.escape(server_name)
head = '<big><big><strong>%s</strong></big></big>' % server_name
result = self.heading(head, '#ffffff', '#7799ee')
doc = self.markup(package_documentation, self.preformat, fdict)
doc = doc and '<tt>%s</tt>' % doc
result = result + '<p>%s</p>\n' % doc
contents = []
method_items = sorted(methods.items())
for key, value in method_items:
contents.append(self.docroutine(value, key, funcs=fdict))
result = result + self.bigsection(
'Methods', '#ffffff', '#eeaa77', ''.join(contents))
return result
class XMLRPCDocGenerator:
"""Generates documentation for an XML-RPC server.
This class is designed as mix-in and should not
be constructed directly.
"""
def __init__(self):
# setup variables used for HTML documentation
self.server_name = 'XML-RPC Server Documentation'
self.server_documentation = \
"This server exports the following methods through the XML-RPC "\
"protocol."
self.server_title = 'XML-RPC Server Documentation'
def set_server_title(self, server_title):
"""Set the HTML title of the generated server documentation"""
self.server_title = server_title
def set_server_name(self, server_name):
"""Set the name of the generated HTML server documentation"""
self.server_name = server_name
def set_server_documentation(self, server_documentation):
"""Set the documentation string for the entire server."""
self.server_documentation = server_documentation
def generate_html_documentation(self):
"""generate_html_documentation() => html documentation for the server
Generates HTML documentation for the server using introspection for
installed functions and instances that do not implement the
_dispatch method. Alternatively, instances can choose to implement
the _get_method_argstring(method_name) method to provide the
argument string used in the documentation and the
_methodHelp(method_name) method to provide the help text used
in the documentation."""
methods = {}
for method_name in self.system_listMethods():
if method_name in self.funcs:
method = self.funcs[method_name]
elif self.instance is not None:
method_info = [None, None] # argspec, documentation
if hasattr(self.instance, '_get_method_argstring'):
method_info[0] = self.instance._get_method_argstring(method_name)
if hasattr(self.instance, '_methodHelp'):
method_info[1] = self.instance._methodHelp(method_name)
method_info = tuple(method_info)
if method_info != (None, None):
method = method_info
elif not hasattr(self.instance, '_dispatch'):
try:
method = resolve_dotted_attribute(
self.instance,
method_name
)
except AttributeError:
method = method_info
else:
method = method_info
else:
assert 0, "Could not find method in self.functions and no "\
"instance installed"
methods[method_name] = method
documenter = ServerHTMLDoc()
documentation = documenter.docserver(
self.server_name,
self.server_documentation,
methods
)
return documenter.page(self.server_title, documentation)
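# Illustrative sketch (not part of the original module): an installed
# instance can shape the generated HTML through the optional hooks named
# in generate_html_documentation's docstring. Hypothetical class:
#
#     class MathService:
#         def add(self, a, b):
#             return a + b
#         def _get_method_argstring(self, method_name):
#             return '(a, b)'          # argument string shown in the docs
#         def _methodHelp(self, method_name):
#             return 'Return the sum of a and b.'   # help text in the docs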
class DocXMLRPCRequestHandler(SimpleXMLRPCRequestHandler):
"""XML-RPC and documentation request handler class.
Handles all HTTP POST requests and attempts to decode them as
XML-RPC requests.
Handles all HTTP GET requests and interprets them as requests
for documentation.
"""
def do_GET(self):
"""Handles the HTTP GET request.
Interpret all HTTP GET requests as requests for server
documentation.
"""
# Check that the path is legal
if not self.is_rpc_path_valid():
self.report_404()
return
response = self.server.generate_html_documentation().encode('utf-8')
self.send_response(200)
self.send_header("Content-type", "text/html")
self.send_header("Content-length", str(len(response)))
self.end_headers()
self.wfile.write(response)
class DocXMLRPCServer(SimpleXMLRPCServer,
                      XMLRPCDocGenerator):
"""XML-RPC and HTML documentation server.
Adds the ability to serve server documentation to the capabilities
of SimpleXMLRPCServer.
"""
def __init__(self, addr, requestHandler=DocXMLRPCRequestHandler,
logRequests=True, allow_none=False, encoding=None,
bind_and_activate=True, use_builtin_types=False):
SimpleXMLRPCServer.__init__(self, addr, requestHandler, logRequests,
allow_none, encoding, bind_and_activate,
use_builtin_types)
XMLRPCDocGenerator.__init__(self)
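# Usage sketch (not part of the original module): the server answers
# XML-RPC POST requests like SimpleXMLRPCServer, while a GET request to
# http://localhost:8001/ returns the generated HTML documentation.
#
#     server = DocXMLRPCServer(('localhost', 8001))
#     server.set_server_title('Demo service')
#     server.register_function(pow)
#     server.serve_forever()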
class DocCGIXMLRPCRequestHandler(CGIXMLRPCRequestHandler,
                                 XMLRPCDocGenerator):
"""Handler for XML-RPC data and documentation requests passed through
CGI"""
def handle_get(self):
"""Handles the HTTP GET request.
Interpret all HTTP GET requests as requests for server
documentation.
"""
response = self.generate_html_documentation().encode('utf-8')
print('Content-Type: text/html')
print('Content-Length: %d' % len(response))
print()
sys.stdout.flush()
sys.stdout.buffer.write(response)
sys.stdout.buffer.flush()
def __init__(self):
CGIXMLRPCRequestHandler.__init__(self)
XMLRPCDocGenerator.__init__(self)
if __name__ == '__main__':
import datetime
class ExampleService:
def getData(self):
return '42'
class currentTime:
@staticmethod
def getCurrentTime():
return datetime.datetime.now()
server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(pow)
    server.register_function(lambda x, y: x + y, 'add')
server.register_instance(ExampleService(), allow_dotted_names=True)
server.register_multicall_functions()
print('Serving XML-RPC on localhost port 8000')
print('It is advisable to run this example server within a secure, closed network.')
try:
server.serve_forever()
except KeyboardInterrupt:
print("\nKeyboard interrupt received, exiting.")
server.server_close()
sys.exit(0)
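# Client-side sketch (not part of the original module): calling the example
# server above with the standard xmlrpc.client library.
#
#     import xmlrpc.client
#     proxy = xmlrpc.client.ServerProxy('http://localhost:8000/')
#     print(proxy.pow(2, 10))   # 1024
#     print(proxy.add(3, 4))    # 7
#     print(proxy.getData())    # '42'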
IronPython Console
==================
IronPython is an open-source implementation of the Python programming language that is tightly integrated with the .NET Framework. IronPython can use the .NET Framework and Python libraries, and other .NET languages can use Python code just as easily.
This package contains a standalone Python interpreter running on the .NET Framework, invokable from the command line as `ipy`. It also includes the Python Standard Library released by the Python project, slightly modified to work better with IronPython and .NET.
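As a minimal sketch of that integration (assuming a default `ipy` install on the .NET Framework; the types used below are standard .NET classes from mscorlib), a script can consume .NET types directly through IronPython's built-in `clr` bridge module:

```python
import clr                      # exposes clr.AddReference for loading additional assemblies
from System import DateTime, Environment
from System.Collections.Generic import List

print(DateTime.Now)             # a .NET struct used like a Python object
print(Environment.MachineName)  # a static .NET property

nums = List[int]([1, 2, 3])     # a generic .NET collection, parameterized in Python
nums.Add(4)
print(sum(nums))                # Python built-ins accept .NET enumerables
```

Saving this as `demo.py` and running `ipy demo.py` should print the current time, the machine name, and `10`.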
The current target is Python 3.4, although features and behaviors from later versions may be included. Refer to the [source code repository](https://github.com/IronLanguages/ironpython3) for a list of features from each version of CPython that have been implemented.
## Differences from CPython
While compatibility with CPython is one of our main goals with IronPython 3, there are still some differences that may cause issues. See [Differences from CPython](https://github.com/IronLanguages/ironpython3/blob/master/Documentation/differences-from-c-python.md) for details.
## Package compatibility
See the [Package compatibility](https://github.com/IronLanguages/ironpython3/blob/master/Documentation/package-compatibility.md) document for information on compatibility with popular Python packages.
VERIFICATION

Verification is intended to assist the Chocolatey moderators and community in verifying that this package's contents are trustworthy.
This package is published by the IronPython Project itself. The binaries are identical to other package types published by the project, in particular the IronPython NuGet package.
Log in or click on the link to see the number of positives.
- wininst-6.0.exe (751941b4e098) - ## / 75
- wininst-7.1.exe (4a3387a54eec) - ## / 75
- wininst-8.0.exe (e362670f93cd) - ## / 75
- wininst-10.0-amd64.exe (b2e32b3fa44b) - ## / 55
- wininst-10.0.exe (1aa3927c7985) - ## / 75
- wininst-9.0-amd64.exe (84fe7824717b) - ## / 74
- wininst-9.0.exe (13b98844b2fa) - ## / 75
- IKVM.Reflection.dll (0718c32f2184) - ## / 70
- System.Numerics.Vectors.dll (1d3ef8698281) - ## / 73
- System.Buffers.dll (accccfbe45d9) - ## / 73
- System.Runtime.CompilerServices.Unsafe.dll (66409f670315) - ## / 72
- System.Memory.dll (bf3fb84664f4) - ## / 73
- ipy.exe (e0d0ada32da4) - ## / 71
- ipy32.exe (9023ca3d1bb7) - ## / 71
- ipyc.exe (d2e05b45dbd7) - ## / 71
- ipyw.exe (3d9987a38417) - ## / 71
- ipyw32.exe (19920dd81602) - ## / 61
- ironpython.3.4.1.nupkg (54ccadfd118b) - ## / 64
- IronPython.dll (fa31585f4ee4) - ## / 67
- IronPython.Modules.dll (42154d6cf9fc) - ## / 62
- Microsoft.Dynamic.dll (237f9659526c) - ## / 70
- Microsoft.Scripting.dll (e5197479d47a) - ## / 49
- IronPython.SQLite.dll (5c43c75d7d88) - ## / 70
- IronPython.Wpf.dll (4e3c9e0d0ec9) - ## / 70
In cases where actual malware is found, the packages are subject to removal. Software sometimes has false positives. Moderators do not necessarily validate the safety of the underlying software, only that a package retrieves software from the official distribution point and/or validates embedded software against the official distribution point (where distribution rights allow redistribution).
Chocolatey Pro provides runtime protection from possible malware.
| Version | Downloads | Last Updated | Status |
|---|---|---|---|
| IronPython 3.4.1 | 2686 | Tuesday, July 18, 2023 | Approved |
| IronPython 3.4.0 | 1134 | Monday, December 12, 2022 | Approved |
| IronPython 2.7.12 | 5455 | Friday, January 21, 2022 | Approved |
| IronPython 2.7.11 | 5080 | Tuesday, November 17, 2020 | Approved |
| IronPython 2.7.11-candidate1 | 144 | Tuesday, September 15, 2020 | Approved |
| IronPython 2.7.10 | 1871 | Thursday, May 28, 2020 | Approved |
| IronPython 2.7.10-alpha0 | 424 | Saturday, February 23, 2019 | Approved |
| IronPython 2.7.9 | 6720 | Tuesday, October 9, 2018 | Approved |
| IronPython 2.7.9-candidate1 | 264 | Monday, August 20, 2018 | Approved |
| IronPython 2.7.8.1 | 9601 | Tuesday, March 20, 2018 | Approved |
| IronPython 2.7.5.20150916 | 1711 | Tuesday, September 15, 2015 | Approved |
| IronPython 2.7.5.20150915 | 400 | Tuesday, September 15, 2015 | Approved |
| IronPython 2.7.4.20140816 | 1444 | Friday, August 15, 2014 | Approved |
| IronPython 2.7.4.20140815 | 393 | Friday, August 15, 2014 | Approved |
© IronPython Contributors
This package has no dependencies.
Ground Rules:
- This discussion is only about IronPython and the IronPython package. If you have feedback for Chocolatey, please contact the Google Group.
- This discussion will carry over multiple versions. If you have a comment about a particular version, please note that in your comments.
- The maintainers of this Chocolatey Package will be notified about new comments that are posted to this Disqus thread; however, this is not a guarantee that you will get a response. If you do not hear back from the maintainers after posting a message below, please follow up by using the link on the left side of this page or follow this link to contact maintainers. If you still hear nothing back, please follow the package triage process.
- Tell us what you love about the package or IronPython, or tell us what needs improvement.
- Share your experiences with the package, or extra configuration or gotchas that you've found.
- If you use a URL, the comment will be flagged for moderation until you've been whitelisted. Disqus-moderated comments are approved on a weekly schedule, if not sooner, so it could take 1-5 days for your comment to show up.